CN113936246A - Unsupervised target pedestrian re-identification method based on joint local feature discriminant learning - Google Patents
- Publication number
- CN113936246A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
Abstract
The invention discloses an unsupervised target pedestrian re-identification method based on joint local feature discriminant learning. A Joint Local Feature Extraction Network (JLFEN), formed by parallel spatial transformers and several simple convolutional neural networks, horizontally and dynamically divides the same pedestrian into two groups of local regions with different numbers and scales and extracts effective features from them, so that the local features are effectively aligned in space. A Feature Joint Discrimination (FJD) loss function, consisting of a Local Feature Discrimination (LFD) loss function and a Cascade Feature Discrimination (CFD) loss function, is adopted to improve the model by performing discriminant learning on the unsupervised local features, which reduces the influence of different pedestrians with similar appearance on local feature learning.
Description
Technical Field
The invention relates to an unsupervised target pedestrian re-identification method based on joint local feature discriminant learning, and belongs to the field of computer vision.
Background
In deep-learning-based pedestrian re-identification research, the quality of local features affects how well a re-identification model learns features from unlabeled data. In order to learn more effective local features, an unsupervised target pedestrian re-identification method based on joint local feature discriminant learning is provided. The joint local feature extraction network divides the pedestrian features horizontally and dynamically and extracts the features of the corresponding regions to obtain two local feature groups, which reduces the influence of pedestrian posture changes and camera angle on local feature alignment. A local feature discrimination loss function guides discriminant learning of the local features, improving the ability of the unsupervised pedestrian re-identification model to learn local features. To reduce the influence on model learning of local features that look similar but belong to different pedestrians, a cascade feature discrimination loss function computes the relative and absolute distances between features, drawing similar features together and pushing different features apart, which further enhances the recognition performance of the unsupervised pedestrian re-identification model.
Disclosure of Invention
In order to solve some problems in the background art, the invention provides an unsupervised target pedestrian re-identification method based on joint local feature discriminant learning.
1. An unsupervised target pedestrian re-identification method based on joint local feature discriminant learning, characterized by comprising: a joint local feature extraction network.
To let the network extract local features of different regions according to changes in the posture and angle of the pedestrian in the image, and to reduce the misalignment of local features, two parallel spatial transformers are added after the ResNet50 network to divide the local regions, and simple convolutional neural networks are used for feature extraction. The intermediate feature map of the image is fed into several positioning (localization) networks; spatially transforming the feature map instead of the original image reduces the computational complexity of the network. Each positioning network consists of one convolutional layer with a 3 × 3 kernel and two fully connected layers, uses ReLU as the activation function, and has its last fully connected layer initialized with a bias. To obtain local sampling grids for the two division modes, the positioning networks predict two groups of spatial position parameters, $\theta = \{\theta_1, \theta_2, \ldots, \theta_M\}$ and $\eta = \{\eta_1, \eta_2, \ldots, \eta_N\}$.
So that the predicted spatial positions yield effective fine-grained features that are aligned in space, the position parameters in the two spatial transformers are predicted according to the parts of the human body in the vertical direction. The human body can be divided into three parts: the head, the upper body and the lower body. In general the upper body is shorter than the lower body; the head takes up the smallest proportion, followed by the upper body and then the lower body. Under a camera, however, changes in camera angle and pedestrian posture alter the proportions of the pedestrian in the captured image. If, for example, the upper body appears long and the lower body short, the local regions divided for the same pedestrian cannot be aligned.
The positioning network first predicts three groups of spatial position parameters to divide the pedestrian horizontally in unequal proportions, giving three local regions whose vertical extents are in a ratio close to 1:2:3, and local features are obtained from top to bottom. The head region takes the smallest proportion, the middle region is close to the size of the upper body at a normal viewing angle, and the last region contains the feature information below the pedestrian's hips. Because the regions are divided according to how the pedestrian actually appears in the image, this division copes with the misalignment of local features caused by changes in the pedestrian's proportions, posture and the like.
In addition, the upper and lower body of the pedestrian contain more detailed information and can be further divided into parts such as the chest, abdomen, thighs, calves and feet. Different parts may contain different feature information, and the features obtained from a finer division of local regions allow the model to better extract fine-grained features from the image. Therefore, considering the several parts that the human body contains, six groups of spatial position parameters are predicted to divide six local regions from the image, so that the network can mine more effective fine-grained feature information from the pedestrian's local regions. Combining the two kinds of local feature information improves the robustness and recognition accuracy of the model. The position parameters take the form of 2 × 3 affine transformations, and the local regions are obtained by cropping the feature map. $A_\theta$ and $A_\eta$ denote the unknown parameters in the two groups of positioning networks, and the image is cropped locally through the predicted parameters a, b, c and d.
According to the predicted parameters, the feature map is cropped so that local sampling grids with different positions and scales are divided according to the spatial position of the pedestrian in the image. In the generation process, the transformation parameterized by the positioning network is applied at each spatial location, relating the spatial coordinates of the input to the spatial coordinates of the output. This yields two groups of local sampling grid parameters for the two division modes. A sampler then samples the feature map, producing in the same image three local regions whose vertical extents are in a ratio close to 1:2:3 and six local regions. Because different local regions contain feature information from different parts of the pedestrian, each local region is fed into a simple convolutional neural network and encoded into a local feature. This convolutional neural network consists of an adaptive average pooling function, a convolutional layer, a BN layer and two fully connected layers. The adaptive average pooling function ensures that the feature region fed into the convolutional layer is a local feature map of the fixed size 2048 × 1; the convolutional layer, the BN layer and the two fully connected layers then extract features from each local region. Concatenating the local features also gives a spliced feature for the whole image, so that the model obtains the overall important information about the pedestrian in the image, which reduces inaccurate matching caused by similar local features of different pedestrians.
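The division and cropping steps described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the patented network: the crop parameters are fixed by hand to an exact 1:2:3 split (in the method they are predicted by the positioning networks), the affine matrix is assumed to have the cropping form [[a, 0, b], [0, c, d]], and the helper names `stripe_bounds`, `crop_matrix`, `affine_grid` and `sample_nearest` are hypothetical.

```python
import numpy as np

def stripe_bounds(height, ratios=(1, 2, 3)):
    """Split `height` rows into consecutive stripes whose sizes follow `ratios`."""
    edges = np.round(np.cumsum((0,) + tuple(ratios)) / sum(ratios) * height).astype(int)
    return [(int(edges[i]), int(edges[i + 1])) for i in range(len(ratios))]

def crop_matrix(top, bottom, height):
    """Assumed 2x3 affine form [[a, 0, b], [0, c, d]] in normalized [-1, 1] coordinates.
    Keeps the full width (a=1, b=0) and selects rows [top, bottom)."""
    c = (bottom - top) / height          # vertical scale of the stripe
    d = (top + bottom) / height - 1.0    # vertical position of the stripe centre
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, c, d]])

def affine_grid(A, out_h, out_w):
    """For every output cell centre (x_t, y_t), compute the input location A @ (x_t, y_t, 1)."""
    ys = (2 * np.arange(out_h) + 1) / out_h - 1.0
    xs = (2 * np.arange(out_w) + 1) / out_w - 1.0
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    target = np.stack([xx, yy, np.ones_like(xx)], axis=-1)  # (out_h, out_w, 3)
    return target @ A.T                                      # (out_h, out_w, 2) as (x_s, y_s)

def sample_nearest(feature_map, grid):
    """Nearest-neighbour sampler for a (C, H, W) feature map."""
    _, H, W = feature_map.shape
    cols = np.clip(np.floor((grid[..., 0] + 1) / 2 * W).astype(int), 0, W - 1)
    rows = np.clip(np.floor((grid[..., 1] + 1) / 2 * H).astype(int), 0, H - 1)
    return feature_map[:, rows, cols]

# Toy feature map: 2 channels, 24 rows, 8 columns.
fmap = np.arange(2 * 24 * 8, dtype=float).reshape(2, 24, 8)
regions = []
for top, bottom in stripe_bounds(24):
    grid = affine_grid(crop_matrix(top, bottom, 24), bottom - top, 8)
    regions.append(sample_nearest(fmap, grid))
```

The text gives no ratio for the six-region branch; calling `stripe_bounds(24, ratios=(1,) * 6)` for equal stripes would be one possible assumption. A trained model would typically use a differentiable bilinear sampler rather than the nearest-neighbour lookup used here for brevity.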
2. An unsupervised target pedestrian re-identification method based on joint local feature discriminant learning, characterized by comprising: local feature discriminant learning.
With the two divisions of local regions in the joint local feature extraction network, local features of different positions and scales can be extracted from the same image and then compared among the unlabeled local features. Such comparison is very difficult to handle in deep learning based on mini-batch optimization, so a feature memory is used to store and update the unsupervised sample features. Local feature learning is carried out by judging, according to the Euclidean distance metric, the similarity between each local feature of the input image (the m-th local feature of the i-th image) and the features at the corresponding position in other images. The feature memory is updated by using the similarity between sample features as supervisory information to assist clustering: training finds the nearest similar features for each sample feature, and the features are updated according to whether their pseudo-label classes are consistent. In the dynamic updating process (the formula is not reproduced in this text), the update rate is 0.1, the latest local feature is the one most recently extracted, and P is the training period. When P = 0, the feature memory is initialized as a feature library from the unlabeled data set before training; through updating, the features in memory approach the latest local features arbitrarily closely.
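A feature memory with an update rate of 0.1 can be sketched as follows. The update formula is not reproduced in this text, so the moving-average form used here (stored feature ← (1 − rate) · stored feature + rate · latest feature) is an assumption, as are the function name `update_memory` and the array layout.

```python
import numpy as np

def update_memory(memory, indices, latest, rate=0.1):
    """Assumed moving-average update of the memory slots listed in `indices`.

    memory:  (num_samples, dim) stored local features
    indices: sample indices present in the current mini-batch
    latest:  (len(indices), dim) local features just extracted for those samples
    """
    memory[indices] = (1.0 - rate) * memory[indices] + rate * latest
    return memory

# Memory for 4 samples, initialized to zeros here purely for illustration.
memory = np.zeros((4, 3))
latest = np.ones((2, 3))
update_memory(memory, [1, 2], latest)
first_step = memory[1].copy()

# Repeating the update moves the stored features ever closer to the latest ones.
for _ in range(200):
    update_memory(memory, [1, 2], latest)
```

Under this assumed form, each update closes 10% of the remaining gap, which matches the statement that the stored features approach the latest features arbitrarily closely.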
For each local feature, we compute the Euclidean distances to the sample features in the feature memory and find the K nearest local features, which form a set. We then calculate the sum of the similarities between these K local features and the feature, and the similarities between the feature and all m-th local features in the feature memory, to obtain the Local Feature Discrimination (LFD) loss function (the formula is not reproduced in this text), where M is the number of local features into which the image is divided (here 3 and 6), and $\|\cdot\|_2$ denotes the Euclidean distance.
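The neighbour search that feeds the LFD loss can be sketched as below. Only the lookup of the K nearest memory entries is shown, because the loss formula itself is missing from this text; the function name `k_nearest_in_memory` and the toy values are hypothetical.

```python
import numpy as np

def k_nearest_in_memory(feature, memory, k):
    """Return indices and Euclidean distances of the k memory entries closest to `feature`.

    feature: (dim,) the m-th local feature of one image
    memory:  (num_samples, dim) the m-th local features stored in the feature memory
    """
    dists = np.linalg.norm(memory - feature, axis=1)
    order = np.argsort(dists)[:k]
    return order, dists[order]

memory = np.array([[0.0, 0.0],
                   [1.0, 0.0],
                   [0.0, 3.0],
                   [5.0, 5.0]])
idx, dist = k_nearest_in_memory(np.array([0.0, 0.1]), memory, k=2)
```

If the query feature is itself stored in the memory, its own slot will normally be returned first; whether that entry is kept or excluded is a design choice the text does not specify.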
3. An unsupervised target pedestrian re-identification method based on joint local feature discriminant learning, characterized by comprising: cascade feature discriminant learning.
When the data carry no class labels, clustering local features that look similar but carry different identity information easily separates the local features of the same pedestrian and draws together the local features of different pedestrians. As a result, the local features of the same pedestrian cannot be matched and the model's ability to learn local features is reduced. To make the features learned by the model more robust, a Cascade Feature Discrimination (CFD) loss function is adopted to optimize the model. The local features output for an unlabeled image are concatenated to obtain the sum of all local feature information of the image, and the discriminability of the cascaded feature is learned by maximizing the inter-class distance and minimizing the intra-class distance. Learning a sample's features from its hardest positive sample and hardest negative samples strengthens the feature learning ability of the model and improves its robustness and accuracy, so a corresponding quadruplet loss function is proposed to guide the learning of the cascaded features. Because the cascade feature discrimination loss function uses the hardest positive sample and the hardest negative sample pair, these must be found for each sample, and they are found in different ways.
First, a mini-batch of samples is given. Each input image $X_i$ undergoes a series of simple random transformations, including image cropping and changes in contrast, saturation and brightness, to obtain a pseudo-positive sample $X_{pi}$. The pseudo-positive sample is given the same identity label as the input image and is used as the hardest positive sample. All image samples are fed into the network for the experiment; the randomly generated pseudo-positive samples benefit the discriminant feature learning of the unsupervised model. Next, samples that are not nearest neighbours have dissimilar identities and do not belong to the same class, and whether two samples are nearest neighbours can be determined from the similarity between them. The hardest negative sample pair is therefore determined from the ranked similarity results. The Euclidean distance between every two samples is measured, and for each sample $X_i$ an ordered list $N_i$ of the measurement results is generated. The farther a sample $X_j$ is from $X_i$, the lower their similarity; if $X_j$ is not among the top-n nearest neighbours of $X_i$, then $X_j$ can be identified as a negative sample of $X_i$. To mine the hardest negative pair, the first two negative samples in the ranking list, $x_{mi}$ and $x_{ni}$, are selected as the hardest negative sample pair, where $x_{mi}$ ranks before $x_{ni}$. Finally, the model performs feature learning on the four samples obtained, and its Cascade Feature Discrimination (CFD) loss function is expressed as:
$L_{CFD} = \left(\|x_{i} - x_{pi}\|_2 - \|x_{ai} - x_{mi}\|_2 + \alpha\right)_+ + \left(\|x_{ai} - x_{pi}\|_2 - \|x_{mi} - x_{ni}\|_2 + \beta\right)_+$
where $x_{ai}$ represents the input image, $x_{pi}$ represents the pseudo-positive sample, $x_{mi}$ and $x_{ni}$ represent the hardest negative sample pair, $(\cdot)_+$ denotes $\max(\cdot, 0)$, and the parameters $\alpha$ and $\beta$ are thresholds.
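The negative mining and the CFD loss above can be sketched in NumPy. This is an illustrative reading, not the authors' code: $x_i$ and $x_{ai}$ are both treated as the same anchor feature, the threshold values passed in are placeholders, and the function names `hardest_negative_pair` and `cfd_loss` are hypothetical.

```python
import numpy as np

def hardest_negative_pair(features, i, n):
    """Rank the other samples by Euclidean distance to sample i (nearest first),
    treat everything outside the top-n nearest neighbours as negative,
    and return the indices of the first two negatives in the ranking."""
    dists = np.linalg.norm(features - features[i], axis=1)
    ranked = [int(j) for j in np.argsort(dists) if j != i]
    negatives = ranked[n:]
    if len(negatives) < 2:
        raise ValueError("need at least two samples outside the top-n neighbours")
    return negatives[0], negatives[1]

def cfd_loss(x_a, x_p, x_m, x_n, alpha, beta):
    """Quadruplet-style loss with hinge (.)_+ = max(., 0), following the CFD formula."""
    def dist(u, v):
        return float(np.linalg.norm(u - v))
    first = max(dist(x_a, x_p) - dist(x_a, x_m) + alpha, 0.0)
    second = max(dist(x_a, x_p) - dist(x_m, x_n) + beta, 0.0)
    return first + second

# Toy cascaded features on a line, so the distances are easy to read off.
features = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [4.0, 0.0], [7.0, 0.0]])
m, n_idx = hardest_negative_pair(features, i=0, n=1)

anchor = features[0]
pseudo_positive = np.array([0.5, 0.0])   # stands in for the augmented copy of the anchor
loss_small_margin = cfd_loss(anchor, pseudo_positive, features[m], features[n_idx], 1.0, 0.5)
loss_large_margin = cfd_loss(anchor, pseudo_positive, features[m], features[n_idx], 2.0, 0.5)
```

With the smaller threshold both hinge terms are inactive, since the positive is already closer than the negatives by more than the threshold; raising α activates the first term.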
4. An unsupervised target pedestrian re-identification method based on joint local feature discriminant learning, characterized by comprising: feature joint discriminant learning.
The local feature discrimination loss function mainly performs discriminant learning on each local feature in the image, while the cascade feature discrimination loss function mainly learns the discriminability of all local features of each image taken together. Combining the two loss functions in a reasonable way into a feature joint discrimination loss function therefore improves the ability of the pedestrian re-identification model to learn features from unlabeled data. The Feature Joint Discrimination (FJD) loss function is expressed as a weighted combination of the two (the formula is not reproduced in this text),
where λ represents a weight.
Drawings
Figure 1 shows the experimental results for parameter K on different data sets.
Figure 2 shows the experimental results for parameter n on different data sets.
Figure 3 shows the results for different values of parameter λ on the Market-1501 data set.
Figure 4 shows the results for different values of parameter λ on the DukeMTMC-reID data set.
Figure 5 shows the structural model of the method.
Detailed Description
The invention comprises the following technical scheme:
1. an unsupervised target pedestrian re-identification method based on joint local feature discriminant learning is characterized by comprising the following steps of: combining local feature extraction networks;
in order to enable the network to extract local features of different areas according to the change of the posture angle of pedestrians in the image and reduce the problem that the local features cannot be well aligned, two parallel space converters are added behind the ResNet50 network to divide the local areas, a simple convolutional neural network is used for feature extraction, the intermediate feature mapping of the image is sent into a plurality of positioning networks, and compared with the original image, the spatial transformation of the feature mapping reduces the network calculation complexity; the core size of the positioning network is 3 x 3 volumeThe method comprises the steps that a layer is formed by stacking and two full-connection layers, ReLU is used as a network activation function, the last full-connection layer is initialized and biased, and in order to obtain local sampling grids of two division modes, a positioning network is used for predicting two groups of space position parameters theta ═ respectively1,θ2,...,θMEta ═ eta1,η2,...,ηN};
In order to enable the predicted space position to obtain effective fine-grained characteristics and align the characteristics in the space, position parameters in two space converters are predicted according to each part of the human body in the vertical direction; the human body can be divided into three parts, namely a head part, an upper body and a lower body, wherein the human body is generally short in the upper body and long in the lower body, the head part accounts for the least, the upper body is arranged, and then the lower body is arranged, but under the camera, due to the change of the angle of the camera and the change of the posture of the pedestrian, the proportion of the pedestrian in the obtained image is changed, and if the problem that the upper body is long and the lower body is short occurs, the partial areas divided by the same pedestrian can not be aligned;
the positioning network firstly predicts three groups of space position parameters to divide pedestrians in unequal proportion in the horizontal direction to obtain three local regions with the longitudinal width ratio close to 1:2:3, obtains local features from top to bottom, enables the proportion of heads to be minimum, enables the middle part to be close to the size of the upper body of a normal visual angle, and finally contains feature information below the buttocks of the pedestrians, can divide different local regions for the pedestrians in an image space according to the specific change condition of the pedestrians in the image, and can deal with the problem that the local features cannot be aligned due to the change of the proportion, the posture and the like of the pedestrians;
in addition, the upper body part and the lower body part of the pedestrian contain more detailed pedestrian information which can be divided into parts such as the chest, the abdomen, the thighs, the calves, the feet and the like, different parts of the pedestrian can contain different feature information, and the fine-grained features in the image can be better extracted by a model through the feature information obtained by fine local area divisionThe fine-granularity feature information is combined with two kinds of local feature information, so that the robustness and the recognition accuracy of the model are improved, the position parameters adopt affine transformation with the size of 2 multiplied by 3, and a local area is obtained by cutting the feature mapping; wherein A isθ,AηRespectively representing unknown parameters in the two groups of positioning networks, and locally cutting the image through prediction parameters a, b, c and d;
according to the predicted parameters, the feature mapping is cut to divide local sampling grids with different positions and scales according to the spatial position of the pedestrian in the image, and the generation process isWherein, for each spatial location parameterized by the positioning network,representing the spatial location coordinates of the input,representing the spatial location coordinates of the output; finally, two groups of local sampling grid parameters of different division modes are obtained, a sampler is used for sampling, three local areas and six local areas with the longitudinal width ratio close to 1:2:3 are respectively obtained in the same image, and because different local areas contain characteristic information of different parts of pedestrians, the obtained local areas are respectively sent into a simple convolutional neural network to be coded to obtain local characteristics; the convolutional neural network is composed of an adaptive average pooling function, a convolutional layer, a BN layer and two full-connection layers, wherein the adaptive average pooling function is used for ensuring that a feature region input into the convolutional layer is a local feature mapping with a specific size of 2048 multiplied by 1, and then the local region is divided through the convolutional layer, the BN layer and the two full-connection layersAnd feature extraction is carried out, and meanwhile, the feature information of local image splicing can be obtained by connecting the local feature information, so that the model obtains the overall important information of pedestrians in the image, and the problem of inaccurate matching caused by the similarity of local features of different pedestrians is reduced.
2. An unsupervised target pedestrian re-identification method based on joint local feature discriminant learning is characterized by comprising the following steps of: judging and learning local features;
by combining the division of two local areas in the local feature extraction network, local features of different positions and scales can be extracted from the same image, and then the local features are compared in the non-label local features, which is very difficult to process in deep learning based on small batch optimization, so that the feature memory is adopted to store and update the unsupervised sample features; judging local features of each block of input image according to Euclidean distance metric criterionThe similarity between the features of the similar positions with other images is subjected to local feature learning,representing the m local feature of the ith image; feature memoryThe updating method is characterized in that the similarity of sample characteristics is used as monitoring information for assisting clustering, the sample characteristics are trained to find out the nearest similar characteristics, and whether the categories of pseudo labels are consistent or not is judged to correspondingly update the characteristics; the dynamic updating process comprises the following steps:whereinIs composed ofThe rate of the update is 0.1,for the updated latest local feature, P is the training period, and when P is 0,initializing a feature library for an unlabeled database prior to training, and updating features in memoryInfinite proximity to
We compute each local featureFinding distances from Euclidean distances between sample features in feature memoryThe most recent K local features are obtainedSet and then calculate the K local features and featuresThe sum of the similarities between them, andand all mth local features in the feature memory to obtain a Local Feature Discrimination (LFD) loss function as:
wherein M represents the number of local features into which the image is divided, here 3 and 6, | · |. the luminance2Representing the euclidean distance.
3. An unsupervised target pedestrian re-identification method based on joint local feature discriminant learning is characterized by comprising the following steps of: judging and learning cascade features;
under the condition that data does not have class labels, when local features which are similar in appearance but contain different identity information are clustered, the problems that the local features of the same pedestrian are separated easily and the local features of different pedestrians are drawn close easily occur in the extracted local features, so that the local features of the same pedestrian cannot be registered and the learning capability of the local features of the model is reduced, therefore, in order to improve the feature robustness of a model learning sample, a cascade feature discrimination loss function (CFD) optimization model is adopted, the sum of all local feature information of an image is obtained by connecting the local features output by unlabeled images, the discrimination of the local cascade features is learned by maximizing the inter-class distance and minimizing the intra-class distance, and learning the feature of the sample by using the hardest positive sample and the hardest negative sample of the sample is beneficial to enhancing the feature learning capability of the model, the robustness and the accuracy of the model are improved, so that a corresponding quaternary loss function is provided to guide the learning of the cascade characteristics, and the most difficult positive sample and the most difficult negative sample are used in the cascade characteristic discrimination loss function, so that the learning of the characteristics by the model is improved, and therefore, the most difficult positive sample and the most difficult negative sample pair of the samples need to be found in different modes;
first, a small sample batch is givenFor the input image XiPerforming a series of simple random transformations including image cropping, contrast, saturation and brightness to obtain pseudo-positive sample X by image processingpiThe same as the marked identity label and the input image are used as the most difficult positive samples, all the image samples are sent to the network for experiment, and the marked identity label and the input image are usedThe machine-generated false positive sample is beneficial to feature discriminant learning of an unsupervised model; then, if they are not nearest neighbors, the identities of the samples are not similar and do not belong to the same class, whether the samples are nearest neighbors can be determined through the similarity between the samples, and the hardest negative sample pair is determined by using the cyclic ordering similarity result, so that the Euclidean distance between every two samples is measured according to the Euclidean distance to obtain each sample XiGenerates an ordered list N of the measurement resultsiSorting by measurement results, if sample XjThe farther the distance is from the sample XiThe lower the similarity of (A) is, not XiSo that X can be identified as the nearest neighbor top-njIs XiIn order to mine the most difficult negative pairs, the first two negative x samples in the ranking list are selectedmiAnd xniAs the most difficult negative sample pair, where xmiRank at xniBefore; and finally, performing feature learning on the model through the obtained four samples, wherein a Cascade Feature Discrimination (CFD) loss function of the model is expressed as:
$$L_{CFD} = \left(\|x_{i} - x_{pi}\|_2 - \|x_{ai} - x_{mi}\|_2 + \alpha\right)_+ + \left(\|x_{ai} - x_{pi}\|_2 - \|x_{mi} - x_{ni}\|_2 + \beta\right)_+$$
where x_ai denotes the input image, x_pi denotes the pseudo-positive sample, x_mi and x_ni denote the hardest negative sample pair, (·)_+ denotes max(·, 0), and the parameters α and β are thresholds.
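The CFD loss above can be illustrated with a minimal sketch. It assumes that x_i and x_ai both refer to the feature of the input image, that (·)_+ is the usual hinge max(·, 0), and that features are plain Python lists; the function names and the toy values are hypothetical and are not taken from the patent.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def hinge(z):
    """(z)_+ = max(z, 0)."""
    return max(z, 0.0)


def cfd_loss(x_a, x_p, x_m, x_n, alpha, beta):
    """Quadruplet-style CFD loss for one input image.

    x_a: concatenated local features of the input image (assumed x_i == x_ai)
    x_p: features of its pseudo-positive sample
    x_m, x_n: features of the hardest negative sample pair
    alpha, beta: the threshold parameters from the text
    """
    d_ap = euclidean(x_a, x_p)
    term1 = hinge(d_ap - euclidean(x_a, x_m) + alpha)
    term2 = hinge(d_ap - euclidean(x_m, x_n) + beta)
    return term1 + term2


# Toy 2-D features; the values are illustrative only.
anchor = [0.0, 0.0]
positive = [3.0, 4.0]      # distance 5 from the anchor
negative_m = [6.0, 8.0]    # distance 10 from the anchor
negative_n = [6.0, 11.0]   # distance 3 from negative_m
loss = cfd_loss(anchor, positive, negative_m, negative_n, alpha=1.0, beta=0.5)
```

In this toy case the first term vanishes because the negative is already farther from the input image than the positive by more than α, while the second term stays active because the two negatives lie closer to each other than the input image lies to its positive.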
4. An unsupervised target pedestrian re-identification method based on joint local feature discriminant learning is characterized by comprising the following steps of: feature joint discrimination learning;
The local feature discrimination loss function mainly performs discriminant learning on each local feature in the image, while the cascade feature discrimination loss function mainly learns the discriminability of all local features of each image taken together. Combining the two loss functions in a reasonable way yields the feature joint discrimination loss function, which can improve the ability of the pedestrian re-identification model to learn features from unlabeled data. The feature joint discrimination (FJD) loss function is expressed as:
where λ represents a weight.
Results and analysis of the experiments
To verify the effect of the joint local feature extraction network on the alignment of pedestrian local features, the network is evaluated on the DukeMTMC-reID and Market-1501 datasets, using the mean average precision (mAP) and the top-k matching rate as evaluation metrics. When the two local feature branches are combined, the mAP and top-1 values of the model are better than those of either single local feature branch. The joint local feature discriminant learning method extracts local features at different scales and positions from the same pedestrian image, so the network obtains effective local feature information about pedestrians at a finer level. This improves the accuracy of aligned local feature comparison and, in turn, the ability of the unsupervised pedestrian re-identification model to learn local features. The experimental results are shown in the following tables.
Table 1. Effect of the local feature branches on model performance on the Market-1501 dataset
Table 2. Effect of the local feature branches on model performance on the DukeMTMC-reID dataset
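For readers unfamiliar with the two evaluation metrics, the following sketch shows one common way to compute the mAP and the top-k matching rate from ranked gallery lists. It is an illustration under simplifying assumptions: the function names and the toy rankings are hypothetical, and the same-camera filtering that standard re-identification protocols apply is omitted.

```python
def average_precision(ranked_ids, query_id):
    """AP of one ranked gallery list: mean precision at each correct match."""
    hits, precisions = 0, []
    for rank, gallery_id in enumerate(ranked_ids, start=1):
        if gallery_id == query_id:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def top_k_hit(ranked_ids, query_id, k):
    """1 if a correct match appears among the first k results, else 0."""
    return int(query_id in ranked_ids[:k])


def evaluate(rankings, k):
    """rankings: list of (query_id, ranked gallery ids). Returns (mAP, top-k)."""
    aps = [average_precision(ranked, query) for query, ranked in rankings]
    hits = [top_k_hit(ranked, query, k) for query, ranked in rankings]
    return sum(aps) / len(aps), sum(hits) / len(hits)


# Two toy queries with identity labels "A" and "B".
rankings = [
    ("A", ["A", "B", "A", "C"]),  # AP = (1/1 + 2/3) / 2
    ("B", ["C", "B", "A", "A"]),  # AP = 1/2
]
m_ap, top1 = evaluate(rankings, k=1)
```

The mAP rewards ranking every correct match early, whereas top-k only checks whether at least one correct match appears in the first k positions, which is why the two metrics can move differently across the tables.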
Compared with PAUL, an unsupervised local feature method, the proposed method uses joint local feature branches divided according to the structure of human body parts, which yields local features containing more effective information for local feature comparison. It also learns local features with the FJD loss function, which has better discriminant learning capability. The tables show that the mAP and top-1 results are higher than those of PAUL on both datasets: on Market-1501 the mAP rises to 41.4% and the top-1 to 70.2%; on DukeMTMC-reID the mAP rises to 54.1% and the top-1 to 73.9%.
Table 3. Comparison with the latest methods on the Market-1501 dataset
Table 4. Comparison with the latest methods on the DukeMTMC-reID dataset
To analyze how the local feature discrimination (LFD) loss function and the cascade feature discrimination (CFD) loss function affect the discriminability of the local features learned by the model, the result of the pre-trained network JLFEN is taken as the experimental baseline, and experiments are carried out on the typical datasets Market-1501 and DukeMTMC-reID. The results show that the mAP and the top-k matching rate improve markedly after learning with the LFD loss function, and that they also improve after the CFD loss function is used, which indicates that the CFD loss function effectively discriminates the local features of different pedestrians with similar appearance. Finally, when the two loss functions are combined, the mAP and top-k results are clearly better than those of the other two configurations. On Market-1501 the mAP improves by 8.7%, top-1 by 8.3%, top-5 by 6.3% and top-15 by 5.5%; on DukeMTMC-reID the mAP improves by 7.2%, top-1 by 7.5%, top-5 by 2.8% and top-15 by 1.7%.
Table 5. Ablation results for the loss functions on the Market-1501 dataset
Table 6. Ablation results for the loss functions on the DukeMTMC-reID dataset
The parameter values in the loss functions also affect model performance. Through experiments on different datasets, the influence of the weight parameter λ in the FJD loss function, the parameter K in the LFD loss function and the parameter n in the CFD loss function on model performance is analyzed in turn. Each parameter is explored while the other parameters are held fixed, and the first hit rate top-1 is used as the evaluation metric. When K is 15, the top-1 value is relatively good.
The CFD loss function also plays an important role in the discriminability of the features the model learns. Finding better hardest negative samples effectively improves the learned features and strengthens the robustness of the unsupervised model, so the influence of the parameter n of the nearest-neighbor top-n on the cascade feature discrimination loss function is analyzed experimentally. CFD learning requires two negative samples. When n is too small, the obtained samples are insufficient and the loss function mistakenly learns positive samples as negative samples, which reduces the learning performance of the model. As n gradually increases, model performance first rises and then falls: with a larger n there are enough learned samples that the model finds the hardest samples too easily, which lowers the learning difficulty. When n is 6, the model obtains good performance.
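The role of n can be made concrete with a small sketch of the negative mining step: samples are ranked by Euclidean distance, everything outside the top-n nearest neighbors is treated as a negative, and the first two negatives form the hardest negative pair. The function name, the parameter name `top_n` and the one-dimensional toy features are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def hardest_negative_pair(features, i, top_n):
    """Return indices of the two closest samples outside the top-n
    nearest neighbours of sample i, in ranking order."""
    others = [j for j in range(len(features)) if j != i]
    ranked = sorted(others, key=lambda j: euclidean(features[i], features[j]))
    negatives = ranked[top_n:]  # beyond the top-n neighbours: negatives
    if len(negatives) < 2:
        raise ValueError("top_n leaves fewer than two negatives in this batch")
    return negatives[0], negatives[1]


# Toy batch of 1-D features; sample 0 is the input image.
batch = [[0.0], [1.0], [2.0], [4.0], [7.0], [11.0]]
neg_m, neg_n = hardest_negative_pair(batch, i=0, top_n=2)
```

With a small `top_n`, close samples that may share the identity of the input image fall into the negative list; with a large `top_n`, the selected negatives are far away and easy to separate, which matches the rise-then-fall behavior described above.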
Based on the above results, K = 15 and n = 6 are fixed and the parameter λ is analyzed experimentally. As Fig. 3 shows, on the different datasets the mAP and the first hit rate top-1 first rise and then fall as λ gradually increases, and model performance is best at λ = 2.
Claims (4)
1. An unsupervised target pedestrian re-identification method based on joint local feature discriminant learning is characterized by comprising the following steps of: joint local feature extraction network;
To enable the network to extract local features of different regions as the posture and viewing angle of pedestrians in the image change, and to reduce the misalignment of local features, two parallel spatial transformers are added after the ResNet50 network to divide the local regions, and a simple convolutional neural network is used for feature extraction. The intermediate feature map of the image is sent into several positioning networks; compared with transforming the original image, spatially transforming the feature map reduces the computational complexity of the network. Each positioning network consists of a convolutional layer with a 3 × 3 kernel and two fully connected layers, uses ReLU as the activation function, and has an initialized bias in its last fully connected layer. To obtain local sampling grids for the two division modes, the positioning networks predict two groups of spatial position parameters, θ = {θ_1, θ_2, ..., θ_M} and η = {η_1, η_2, ..., η_N};
So that the predicted spatial positions yield effective fine-grained features that are spatially aligned, the position parameters in the two spatial transformers are predicted according to the parts of the human body along the vertical direction. The human body can be divided into three parts: the head, the upper body and the lower body. The upper body is generally shorter than the lower body, so the head is the smallest part, followed by the upper body and then the lower body. Under a camera, however, changes in camera angle and pedestrian posture alter the proportions of the pedestrian in the captured image (for example, the upper body may appear long and the lower body short), so the local regions divided for the same pedestrian may fail to align;
the positioning network first predicts three groups of spatial position parameters that divide the pedestrian horizontally in unequal proportions, giving three local regions whose vertical extents are in a ratio close to 1:2:3. Local features are obtained from top to bottom: the head occupies the smallest proportion, the middle region is close to the size of the upper body at a normal viewing angle, and the last region contains the feature information below the pedestrian's hips;
in addition, the upper body and lower body contain more detailed pedestrian information and can be further divided into parts such as the chest, abdomen, thighs, calves and feet. Different parts carry different feature information, and the features obtained from a finer division of local regions let the model extract fine-grained features from the image more effectively. Therefore, according to the multiple parts of the human body, six groups of spatial position parameters are predicted to divide the image into six local regions, so that the network can mine more effective fine-grained feature information from the pedestrian's local regions. Combining the two kinds of local feature information improves the robustness and recognition accuracy of the model. The position parameters take the form of a 2 × 3 affine transformation, and the local regions are obtained by cropping the feature map, where A_θ and A_η denote the unknown parameters in the two groups of positioning networks, and the image is cropped locally through the predicted parameters a, b, c and d;
according to the predicted parameters, the feature map is cropped so that local sampling grids with different positions and scales are divided according to the spatial position of the pedestrian in the image. In the generation process, each sampling grid is parameterized by the spatial position parameters from the positioning network, which relate the spatial coordinates of the input to the spatial coordinates of the output. Two groups of local sampling grid parameters for the different division modes are finally obtained, and a sampler is used to sample three local regions with vertical extents in a ratio close to 1:2:3 and six local regions from the same image. Because different local regions contain feature information about different parts of the pedestrian, each local region is sent into a simple convolutional neural network and encoded to obtain its local feature. This convolutional neural network consists of an adaptive average pooling function, a convolutional layer, a BN layer and two fully connected layers. The adaptive average pooling function ensures that the feature region input to the convolutional layer is a local feature map of the specific size 2048 × 1; feature extraction is then performed on each local region through the convolutional layer, the BN layer and the two fully connected layers. The local features are also concatenated to obtain the locally spliced feature information of the image, so that the model captures the important overall information about the pedestrian in the image and the inaccurate matching caused by similar local features of different pedestrians is mitigated.
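The two division modes can be illustrated with a simplified sketch. It is not the learned spatial transformer of the claim: fixed horizontal stripes stand in for the predicted sampling grids, plain averaging stands in for adaptive average pooling, and the convolutional, BN and fully connected layers are omitted. The function names and the toy feature map are hypothetical.

```python
def stripe_bounds(height, ratios):
    """Split `height` rows into consecutive stripes proportional to `ratios`."""
    total = sum(ratios)
    bounds, start, acc = [], 0, 0
    for r in ratios:
        acc += r
        end = round(height * acc / total)
        bounds.append((start, end))
        start = end
    return bounds


def pool_stripe(fmap, top, bottom):
    """Average-pool rows [top, bottom) of a C x H x W map into a C-vector."""
    pooled = []
    for channel in fmap:
        values = [v for row in channel[top:bottom] for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def joint_local_features(fmap):
    """Pool the 3-part (1:2:3) and 6-part divisions, then concatenate them."""
    height = len(fmap[0])
    parts = stripe_bounds(height, [1, 2, 3]) + stripe_bounds(height, [1] * 6)
    local = [pool_stripe(fmap, top, bottom) for top, bottom in parts]
    concatenated = [v for feature in local for v in feature]
    return local, concatenated


# Toy map: 2 channels, 12 rows, 4 columns; each value equals its row index.
fmap = [[[float(h)] * 4 for h in range(12)] for _ in range(2)]
local, concatenated = joint_local_features(fmap)
```

Each image thus yields nine local features (three coarse and six fine) plus one concatenated feature, mirroring the separation between per-part features and the spliced feature described above.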
2. An unsupervised target pedestrian re-identification method based on joint local feature discriminant learning is characterized by comprising the following steps of: local feature discrimination learning;
by combining the two kinds of local region division in the joint local feature extraction network, local features at different positions and scales can be extracted from the same image. Comparing unlabeled local features is difficult to handle in deep learning based on mini-batch optimization, so a feature memory is used to store and update the unlabeled sample features. Local feature learning is performed by judging, under the Euclidean distance criterion, the similarity between each local feature of the input image (the m-th local feature of the i-th image) and the features at similar positions in other images. The feature memory is updated by using the similarity of sample features as supervisory information to assist clustering: the sample features are trained to find their nearest similar features, and the features are updated according to whether the pseudo-label categories are consistent. In the dynamic updating process, the update rate is 0.1 and P denotes the training epoch; when P is 0, the feature memory is initialized from the unlabeled database before training, and as updating proceeds the features in memory approach the latest updated local features.
For each local feature, the Euclidean distances to the sample features in the feature memory are computed, and the K nearest local features are collected into a set. The sum of the similarities between these K local features and the local feature, and the similarity between the local feature and all m-th local features in the feature memory, are then calculated to obtain the local feature discrimination (LFD) loss function:
where M denotes the number of local features into which the image is divided (here 3 and 6), and ||·||_2 denotes the Euclidean distance.
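The feature memory and the K-nearest search can be sketched as follows. The update rule shown is an assumption: a moving average with rate 0.1, chosen because it makes stored features approach the latest features as the claim describes; the exact update formula and the LFD loss itself are not reproduced here. The function names and toy features are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def update_memory(memory, index, new_feature, rate=0.1):
    """Move the stored feature toward the newly extracted one (assumed rule)."""
    old = memory[index]
    memory[index] = [(1 - rate) * o + rate * f for o, f in zip(old, new_feature)]


def k_nearest(memory, feature, k, exclude=None):
    """Indices of the k memory entries closest to `feature`."""
    candidates = [j for j in range(len(memory)) if j != exclude]
    candidates.sort(key=lambda j: euclidean(memory[j], feature))
    return candidates[:k]


# Memory holding the m-th local feature of four images (toy values).
memory = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
update_memory(memory, 0, [1.0, 1.0])  # entry 0 moves 10% toward [1, 1]
neighbours = k_nearest(memory, memory[0], k=2, exclude=0)
```

Keeping one memory per local position m means that each local feature is only compared with features from the corresponding position in other images.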
3. An unsupervised target pedestrian re-identification method based on joint local feature discriminant learning is characterized by comprising the following steps of: cascade feature discrimination learning;
when data carry no class labels, clustering local features that look alike but belong to different identities tends to pull apart the local features of the same pedestrian and draw together those of different pedestrians. The local features of the same pedestrian then cannot be aligned, and the model's ability to learn local features declines. To make the learned sample features more robust, the model is optimized with a cascade feature discrimination (CFD) loss function. The local features output for each unlabeled image are concatenated to obtain the sum of all local feature information of the image, and the discriminability of the cascaded local features is learned by maximizing the inter-class distance and minimizing the intra-class distance. Because learning from a sample's hardest positive and hardest negative samples strengthens feature learning and improves the robustness and accuracy of the model, a corresponding quadruplet loss function is proposed to guide the learning of the cascade features. The hardest positive sample and the hardest negative sample pair of each sample therefore need to be found in different ways;
first, given a mini-batch of samples, a series of simple random transformations (image cropping and changes in contrast, saturation and brightness) is applied to each input image X_i to obtain a pseudo-positive sample X_pi. The pseudo-positive sample is marked with the same identity label as the input image and is used as its hardest positive sample. All image samples are sent to the network, and the randomly generated pseudo-positive samples benefit the feature discriminant learning of the unsupervised model. Next, samples that are not nearest neighbors are taken to have dissimilar identities and to belong to different classes. Whether two samples are nearest neighbors can be determined from the similarity between them, so the hardest negative sample pair is determined from the ranked similarity results. The Euclidean distance between every two samples is measured, and for each sample X_i an ordered list N_i of the measurement results is generated. The farther a sample X_j is from X_i, the lower its similarity to X_i; if X_j is not among the top-n nearest neighbors of X_i, X_j is identified as a negative sample of X_i. To mine the hardest negative pair, the first two negative samples in the ranking list, x_mi and x_ni, are selected as the hardest negative sample pair, where x_mi ranks before x_ni. Finally, the model performs feature learning on the four samples obtained, and its cascade feature discrimination (CFD) loss function is expressed as:
$$L_{CFD} = \left(\|x_{i} - x_{pi}\|_2 - \|x_{ai} - x_{mi}\|_2 + \alpha\right)_+ + \left(\|x_{ai} - x_{pi}\|_2 - \|x_{mi} - x_{ni}\|_2 + \beta\right)_+$$
where x_ai denotes the input image, x_pi denotes the pseudo-positive sample, x_mi and x_ni denote the hardest negative sample pair, (·)_+ denotes max(·, 0), and the parameters α and β are thresholds.
4. An unsupervised target pedestrian re-identification method based on joint local feature discriminant learning is characterized by comprising the following steps of: feature joint discrimination learning;
the local feature discrimination loss function mainly performs discriminant learning on each local feature in the image, while the cascade feature discrimination loss function mainly learns the discriminability of all local features of each image taken together. Combining the two loss functions in a reasonable way yields the feature joint discrimination loss function, which can improve the ability of the pedestrian re-identification model to learn features from unlabeled data. The feature joint discrimination (FJD) loss function is expressed as:
where λ represents a weight.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111076953.2A CN113936246A (en) | 2021-09-14 | 2021-09-14 | Unsupervised target pedestrian re-identification method based on joint local feature discriminant learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113936246A true CN113936246A (en) | 2022-01-14 |
Family
ID=79275732
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111076953.2A Pending CN113936246A (en) | 2021-09-14 | 2021-09-14 | Unsupervised target pedestrian re-identification method based on joint local feature discriminant learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113936246A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20220114 |