CN111242064A - Pedestrian re-identification method and system based on camera style migration and single marking - Google Patents

Pedestrian re-identification method and system based on camera style migration and single marking

Info

Publication number
CN111242064A
Authority
CN
China
Prior art keywords
image
pedestrian
training
images
data
Prior art date
Legal status
Pending
Application number
CN202010053330.2A
Other languages
Chinese (zh)
Inventor
Li Qiang (李强)
Gao Ling (高玲)
Wu Shaojun (吴绍君)
Li Yang (李杨)
Current Assignee
Shandong Normal University
Original Assignee
Shandong Normal University
Priority date
Filing date
Publication date
Application filed by Shandong Normal University
Priority to CN202010053330.2A
Publication of CN111242064A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a pedestrian re-identification method and system based on camera style migration and single marking, comprising the following steps: acquiring a plurality of unlabeled images to be subjected to pedestrian re-identification; marking the pedestrian to be identified in one of all the unlabeled images; inputting the marked images into a pre-trained CycleGAN network and outputting a plurality of camera style migration images corresponding to each marked image, to realize data amplification; marking the pedestrian in the camera style migration images corresponding to the marked images; putting the unlabeled images into the pre-trained CycleGAN network to realize data amplification; and inputting the marked images, the unmarked images, the camera style migration images corresponding to the marked images and the camera style migration images corresponding to the unmarked images into a pre-trained CNN network, and outputting the identification result of the pedestrian to be identified in each unmarked image.

Description

Pedestrian re-identification method and system based on camera style migration and single marking
Technical Field
The disclosure relates to the technical field of pedestrian re-identification, in particular to a pedestrian re-identification method and system based on camera style migration and single labeling.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
Pedestrian re-identification (re-ID) uses computer vision techniques to find a particular pedestrian in an image library or video sequence: given a pedestrian of interest, the goal is to retrieve that same pedestrian from images or video captured by other surveillance cameras.
In the course of implementing the present disclosure, the inventors found that the following technical problems exist in the prior art:
In pedestrian re-identification research, the commonly used approach is supervised learning. A supervised training set requires input-output pairs, i.e., features together with manually annotated target labels; the machine learns the relationship between features and labels from the training data, so that when a new feature is input its label can be predicted. In other words, a model is learned from a given training data set and then used to predict labels for new data. Obtaining a reasonably good recognition model requires a large number of pictures with corresponding manually annotated labels, and the manual annotation process is labor- and time-consuming. Unlike supervised learning, unsupervised learning has no labels on the input data and no predefined targets; it directly learns the intrinsic relationships among the input features.
Existing single-sample-based studies have mostly focused on the selection of pseudo-labels. M. Ye, A. J. Ma, L. Zheng, J. Li, and P. C. Yuen, "Dynamic label graph matching for unsupervised video re-identification," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 5152-5160, and H. Fan, L. Zheng, C. Yan, and Y. Yang, "Unsupervised person re-identification: Clustering and fine-tuning," ACM Trans. Multimedia Comput. Commun. Appl., vol. 14, no. 4, p. 83, Oct. 2018, use a static policy to determine the number of pseudo-labels before the next round of training. These algorithms keep the size of the pseudo-label training set fixed throughout the iteration process.
Wu Y., Lin Y., Dong X., et al., "Exploit the Unknown Gradually: One-Shot Video-Based Person Re-Identification by Stepwise Learning," in Proc. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2018, propose a progressive learning framework that makes better use of unlabeled data when training pedestrian re-identification with limited samples. The algorithm initially trains a CNN model on the single labeled sample per identity, then generates a pseudo-label for every unlabeled sample and selects the most reliable pseudo-labeled data for training according to prediction confidence. Unlike the former methods, the size of the pseudo-label training set is not fixed but is continuously enlarged according to the sampling strategy; dynamically increasing the number of pseudo-labels achieves better results during iteration. However, none of the above methods considers the cross-camera pedestrian retrieval problem.
Pedestrian re-identification is a cross-camera picture retrieval problem: under different cameras, the same person can look very different due to different shooting angles, color differences caused by building occlusion, different picture backgrounds, and so on. Traditional single-sample labeling only labels a person under one particular camera and lacks discriminative learning of pictures across cameras, which leads to low cross-camera recognition rates. The weakness of previous models has two causes: 1) the amount of data from single-sample labeling is too small; 2) the model overfits to a particular camera and cannot adapt to a cross-camera data set. Pictures of the same pedestrian differ greatly under different cameras, while different pedestrians can look similar under the same camera.
Disclosure of Invention
In order to overcome the defects of the prior art, the disclosure provides a pedestrian re-identification method and system based on camera style migration and single marking;
in a first aspect, the present disclosure provides a pedestrian re-identification method based on camera style migration and single labeling;
the pedestrian re-identification method based on camera style migration and single marking comprises the following steps:
acquiring a plurality of non-label images to be subjected to pedestrian re-identification;
marking the pedestrian to be identified in one of all the unlabelled images to be subjected to pedestrian re-identification;
inputting the marked images into a pre-trained cycleGAN network, and outputting a plurality of camera style migration images corresponding to each image in the marked images to realize data amplification; marking the pedestrian of the camera style migration image corresponding to the marked image;
the unlabelled images are sent to a pre-trained cycleGAN network, and a plurality of camera style migration images corresponding to each image in the unlabelled images are output to realize data amplification;
and inputting the marked image, the unmarked image, the camera style migration image corresponding to the marked image and the camera style migration image corresponding to the unmarked image into a pre-trained CNN network, and outputting the identification result of the pedestrian to be identified in each image in the unmarked image.
In a second aspect, the present disclosure also provides a pedestrian re-identification system based on camera style migration and single annotation;
pedestrian re-identification system based on camera style migration and single marking includes:
an acquisition module configured to: acquiring a plurality of non-label images to be subjected to pedestrian re-identification;
a marking module configured to: marking the pedestrian to be identified in one of all the unlabelled images to be subjected to pedestrian re-identification;
a data amplification module configured to: inputting the marked images into a pre-trained cycleGAN network, and outputting a plurality of camera style migration images corresponding to each image in the marked images to realize data amplification; marking the pedestrian of the camera style migration image corresponding to the marked image;
the unlabelled images are sent to a pre-trained cycleGAN network, and a plurality of camera style migration images corresponding to each image in the unlabelled images are output to realize data amplification;
an identification module configured to: and inputting the marked image, the unmarked image, the camera style migration image corresponding to the marked image and the camera style migration image corresponding to the unmarked image into a pre-trained CNN network, and outputting the identification result of the pedestrian to be identified in each image in the unmarked image.
In a third aspect, the present disclosure also provides an electronic device comprising a memory and a processor, and computer instructions stored on the memory and executed on the processor, wherein when the computer instructions are executed by the processor, the method of the first aspect is performed.
In a fourth aspect, the present disclosure also provides a computer-readable storage medium for storing computer instructions which, when executed by a processor, perform the method of the first aspect.
Compared with the prior art, the beneficial effects of the present disclosure are:
the method and the device for expanding the data provide a solution idea for data expansion by using image style migration aiming at the condition that the model identification performance is poor due to insufficient data volume in the single-labeling problem, and provide a new scheme for expanding the sample for the single-labeling step-by-step iterative experiment.
During training, the method proposes initializing the CNN model jointly on the single-labeled samples of the original data set and the single-labeled samples of the camera style migration data set.
When searching for credible pictures and assigning pseudo-labels, picture features are computed on the original data set, and a portion of the original pictures and of the pictures generated by camera style migration are randomly selected, given pseudo-labels, and put into the iterative training of the model.
Most of the existing pedestrian re-identification methods rely on complete data labeling, namely, data of people in each training set under different cameras need to be labeled. However, for practical monitoring scenes, such as monitoring videos in a city, it is very costly to manually label the pedestrian label of each video segment from a plurality of cameras. Therefore, we try to label only the samples with a single label, and let the network learn to use those unlabelled samples by itself. That is, for each pedestrian, we only need to label one of the videos or one of the pictures, and the rest videos or pictures are searched by the algorithm itself.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application.
FIG. 1 is a flow chart of a method of the first embodiment;
FIG. 2(a) to FIG. 2(c) are schematic diagrams of image style migration and data augmentation of the first embodiment; cam1 denotes an original real picture shot by camera 1 in the data set, and c1s2 denotes a generated picture of a pedestrian shot by camera 1 rendered in the shooting style of camera 2;
FIGS. 3(a) to 3(c) are schematic views of the operation principle of the cycleGAN of the first embodiment;
FIGS. 4(a) -4 (b) are schematic diagrams illustrating initialization of the model by a single-label scheme according to the first embodiment;
FIGS. 5(a) to 5(d) are the prediction accuracy and recall of the selected pseudo label candidate set of the first embodiment.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
The first embodiment provides a pedestrian re-identification method based on camera style migration and single annotation;
the pedestrian re-identification method based on camera style migration and single marking comprises the following steps:
s1: acquiring a plurality of non-label images to be subjected to pedestrian re-identification;
s2: marking the pedestrian to be identified in one of all the unlabelled images to be subjected to pedestrian re-identification;
s3: inputting the marked images into a pre-trained cycleGAN network, and outputting a plurality of camera style migration images corresponding to each image in the marked images to realize data amplification; marking the pedestrian of the camera style migration image corresponding to the marked image;
the unlabelled images are sent to a pre-trained cycleGAN network, and a plurality of camera style migration images corresponding to each image in the unlabelled images are output to realize data amplification;
s4: and inputting the marked image, the unmarked image, the camera style migration image corresponding to the marked image and the camera style migration image corresponding to the unmarked image into a pre-trained CNN network, and outputting the identification result of the pedestrian to be identified in each image in the unmarked image.
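For orientation, the four steps S1-S4 can be summarized in a minimal Python sketch. This is purely illustrative, not the patented implementation: the `transfer_to_all_camera_styles` and `extract_features` interfaces and the distance-based scoring are assumptions.

```python
import torch

def reid_pipeline(unlabeled_images, labeled_index, cyclegan, cnn):
    """unlabeled_images: list of image tensors; labeled_index: the one image
    whose pedestrian has been marked (step S2). All interfaces are assumed."""
    # S3: style transfer for data amplification; a generated image inherits
    # the mark of its source image when the source is the marked one
    augmented = []
    for img in unlabeled_images:
        augmented.extend(cyclegan.transfer_to_all_camera_styles(img))

    # S4: embed originals plus style-transferred images, then rank every
    # original image against the single marked identity
    feats = cnn.extract_features(torch.stack(unlabeled_images + augmented))
    query = feats[labeled_index]
    dists = torch.cdist(query.unsqueeze(0), feats[:len(unlabeled_images)])
    return (-dists).squeeze(0)  # higher score = more likely the same pedestrian
```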
As one or more embodiments, the pre-trained cycleGAN network, the step of training comprising:
s31: constructing a cycleGAN network;
s32: constructing a training set; the training set includes: a Market-1501 data set or a DukeMTMC-reID data set;
s33: and taking the image containing the pedestrian b acquired by one camera a in the training set as an input value of the cycleGAN network, taking the images containing the pedestrian b acquired by all the cameras except the camera a in the training set as output values of the cycleGAN network, and training the cycleGAN network to obtain the trained cycleGAN network.
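A minimal sketch of the pairwise setup described in S31-S33 follows, assuming a generic `CycleGAN` class with a `train_step(x, y)` method; both names are hypothetical placeholders, not a specific library.

```python
# Sketch: one CycleGAN per camera pair (i, j), trained on unpaired image
# sets from the two cameras; the CycleGAN class itself is assumed.
def train_camera_cyclegans(images_by_camera, num_epochs=50):
    """images_by_camera: dict mapping camera id -> list of image tensors."""
    models = {}
    cams = sorted(images_by_camera)
    for i in cams:
        for j in cams:
            if i >= j:
                continue  # one model per unordered pair; it maps both ways
            gan = CycleGAN()  # two generators G_ij, G_ji and two discriminators
            for _ in range(num_epochs):
                # unpaired samples from the two domains (zip truncates, fine
                # for a sketch)
                for x, y in zip(images_by_camera[i], images_by_camera[j]):
                    gan.train_step(x, y)
            models[(i, j)] = gan
    return models
```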
As one or more embodiments, the pre-trained CNN network, the step of training comprising:
s41: performing primary training on the CNN network by using an ImageNet data set; in the primary training process, the image is used as an input value of the CNN network, and the pedestrian label of the image is used as an output value of the CNN network;
s42: performing secondary training on the CNN network after primary training by using a real label data set; in the secondary training process, the image in the real label data set is used as an input value of the CNN network after primary training, and the pedestrian label of the image in the real label data set is used as an output value of the CNN network after primary training;
s43: performing three-level training on the CNN network after the second-level training by using a plurality of unlabeled images to be subjected to pedestrian re-identification; in the third-level training process, all the unlabeled images are used as input values of the CNN network after the second-level training, and the CNN network after the second-level training outputs a pseudo label set; the reliability of the pseudo labels is discriminated, the pseudo labels with the reliability higher than a set threshold value are reserved, and the pseudo labels with the reliability lower than the set threshold value are removed;
s44: inputting the data set with the real label and the data set with the high-reliability pseudo label into the CNN network after the three-stage training for final training; and obtaining the trained CNN network.
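The four stages S41-S44 can be outlined as below; `build_cnn`, `finetune`, and `predict_pseudo_labels` are hypothetical helpers standing in for the steps just described.

```python
# Illustrative outline of the staged training S41-S44, under the stated
# assumptions about the helper functions.
def train_reid_cnn(imagenet_data, labeled_data, unlabeled_data, threshold):
    cnn = build_cnn(pretrained_on=imagenet_data)         # S41: primary training
    cnn = finetune(cnn, labeled_data)                    # S42: real-label training
    pseudo = predict_pseudo_labels(cnn, unlabeled_data)  # S43: pseudo-label pass
    reliable = [p for p in pseudo if p.reliability > threshold]
    cnn = finetune(cnn, labeled_data + reliable)         # S44: final training
    return cnn
```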
Further, before the step of performing secondary training on the CNN network after the primary training by using the real label dataset in S42, the method further includes:
and inputting the data set with the real label into a trained cycleGAN network for data amplification.
Further, in S43, before performing the third-level training step on the CNN network after the second-level training, using a plurality of unlabeled images to be subjected to pedestrian re-identification, the method further includes:
and inputting a plurality of unlabeled images to be subjected to pedestrian re-identification into a trained cycleGAN network for data amplification.
Further, in S43, the method for discriminating the reliability of the pseudo tag includes:
calculating the distance between the pedestrian image feature corresponding to the pseudo tag and the pedestrian image feature corresponding to the real tag of the same pedestrian, and if the distance is smaller than a set threshold value, indicating that the current pseudo tag is a reliable pseudo tag; otherwise, the current pseudo label is represented as an unreliable pseudo label.
The distance used is the Euclidean distance between samples (unlabeled and labeled) in the feature space:

d(θ; x_i, x_l) = ||φ(θ; x_i) − φ(θ; x_l)||

The closer an unlabeled sample is to a labeled sample in the feature space, the higher its reliability.
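A small PyTorch sketch of this reliability rule, assuming the feature tensors have already been extracted; the threshold is a free parameter.

```python
import torch

def is_reliable(unlabeled_feat, labeled_feats, threshold):
    """unlabeled_feat: (D,); labeled_feats: (N, D) features with real labels.
    Shapes and the threshold are assumptions for illustration."""
    dists = torch.norm(labeled_feats - unlabeled_feat, dim=1)  # Euclidean
    return bool(dists.min() < threshold)  # closer to a labeled sample = more reliable
```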
Single-sample labeling is set up as follows. The labeled data set is

L = {(x_i, y_i)},

where x represents input image data and y represents the identity label of that image data. The unlabeled data set is

U = {x_j},

which has only input data, without corresponding labels. The data set sizes are |L| = n_l + k and |U| = n_l + k + 2u.
Regarding the setting of ordinary single-sample labeling: a pedestrian re-identification data set O contains, in total, N pedestrian pictures of M pedestrians captured by K cameras. For each pedestrian, one picture is selected under camera 1 and given an initial label; if that pedestrian was not captured by camera 1, a picture is randomly selected from the next camera for labeling. This guarantees that every pedestrian has one labeled picture for initialization.
Regarding the data set E obtained by data style migration: it contains K × N pictures in total for the M pedestrians, and one picture of each pedestrian is selected and labeled under each camera style, so K × M labeled pictures in total are used for initialization. In the evaluation stage, the trained CNN model is applied to the query data and the gallery data, and the output is a ranking list over all gallery data obtained from the Euclidean distances between the query and the gallery. When using unlabeled data, a pseudo-label ŷ_i is predicted for each unlabeled sample x_i ∈ U. L_e, S_t and M_t denote the style migration label data set, the pseudo-label data set, and the index data set, respectively. Our approach trains the CNN model jointly with style migration and ordinary single-sample annotation.
CycleGAN, unlike a general GAN, does not require paired training images and can generate an image in another scene style from an image in a particular scene style. CycleGAN consists of two discriminators (D_X and D_Y) and two generators (G_XY and G_YX), and can perform not only the conversion from data set X to data set Y but also the conversion from Y back to X.
FIG. 2(a) to FIG. 2(c) are schematic diagrams of image style migration and data augmentation of the first embodiment; cam1 denotes an original real picture shot by camera 1 in the data set, and c1s2 denotes a generated picture of a pedestrian shot by camera 1 rendered in the shooting style of camera 2.
In FIGS. 3(a) to 3(c), X and Y are pedestrian picture data sets of two different styles; in the present disclosure, X and Y represent picture sets of two different camera styles.
FIG. 3(a) shows the interconversion of the two style data sets X and Y; the respective discriminators then adversarially distinguish real pictures from generated ones.
In FIG. 3(b), a picture x from data set X is input; the generator G_XY generates a Y-style picture ŷ = G_XY(x), and the generator G_YX then converts ŷ back into an X-style picture x̂ = G_YX(ŷ). This cyclic process achieves the conversion of the data set's image style: the new image keeps the pedestrian subject of data set X while taking on the image style and background of data set Y. Similarly, FIG. 3(c) shows the same operation with the data sets interchanged, so data set expansion can be realized by the cycle-consistency adversarial network.
CycleGAN is the combination of a unidirectional GAN for X -> Y and one for Y -> X. The two GANs share the two generators, and each has its own discriminator, giving two generators and two discriminators in total; CycleGAN therefore has four losses.
The discriminator (adversarial) loss for X -> Y is:

L_GAN(G_XY, D_Y, X, Y) = E_{y∼p(y)}[log D_Y(y)] + E_{x∼p(x)}[log(1 − D_Y(G_XY(x)))]   (1)

The discriminator loss for Y -> X is:

L_GAN(G_YX, D_X, Y, X) = E_{x∼p(x)}[log D_X(x)] + E_{y∼p(y)}[log(1 − D_X(G_YX(y)))]   (2)

The generator G_XY (cycle-consistency) loss is:

L_CYC(G_XY) = E_{x∼p(x)}[||G_YX(G_XY(x)) − x||_1]   (3)

The generator G_YX (cycle-consistency) loss is:

L_CYC(G_YX) = E_{y∼p(y)}[||G_XY(G_YX(y)) − y||_1]   (4)

The final loss of CycleGAN is therefore the sum of the four losses:

L(G_XY, G_YX, D_X, D_Y) = L_GAN(G_XY, D_Y, X, Y) + L_GAN(G_YX, D_X, Y, X) + L_CYC(G_XY) + L_CYC(G_YX)   (5)
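For concreteness, losses (1)-(5) might be computed as in the following PyTorch sketch, assuming discriminators that output probabilities in (0, 1); the cycle-loss weight `lam` is a conventional addition not stated in equation (5).

```python
import torch
import torch.nn.functional as F

def cyclegan_loss(x, y, g_xy, g_yx, d_x, d_y, lam=10.0):
    """A minimal rendering of (1)-(5); g_*/d_* are assumed callables, and
    lam is the usual cycle-loss weight (an assumption, not from eq. (5))."""
    fake_y, fake_x = g_xy(x), g_yx(y)
    # adversarial losses (1) and (2), written in the usual BCE form
    loss_gan_xy = F.binary_cross_entropy(d_y(y), torch.ones_like(d_y(y))) + \
                  F.binary_cross_entropy(d_y(fake_y), torch.zeros_like(d_y(fake_y)))
    loss_gan_yx = F.binary_cross_entropy(d_x(x), torch.ones_like(d_x(x))) + \
                  F.binary_cross_entropy(d_x(fake_x), torch.zeros_like(d_x(fake_x)))
    # cycle-consistency losses (3) and (4)
    loss_cyc = F.l1_loss(g_yx(fake_y), x) + F.l1_loss(g_xy(fake_x), y)
    return loss_gan_xy + loss_gan_yx + lam * loss_cyc  # total loss (5)
```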
First, each pair of cameras with different styles is regarded as two different domain spaces, and the image sets M_ij are interconverted by CycleGAN:

M_ij = G_ij(M_i), 1 ≤ i, j ≤ K   (6)

where i denotes the current camera, j ranges over the remaining cameras, and G_ij denotes the learned generator from the style of camera i to that of camera j. After all image sets of the different camera styles have been converted into one another, the data set becomes K times the original data set.
Since the real picture and the generated picture share an identity label, the experiments use ID-discriminative embedding (IDE) as the re-ID CNN model. Using a softmax loss, IDE treats re-ID training as an image classification task. In the implementation, because the generated pictures have low resolution, all input images are uniformly resized to 256 × 128. ResNet-50 is used as the backbone and fine-tuned from an ImageNet pre-trained model, with the last 1000-dimensional classification layer removed and two fully connected layers added. New training samples are generated with CycleGAN to expand the data set within a single data set, treating two different camera styles as two different domain spaces.
CycleGAN is used to learn the style of each camera, and the identity (mapping) loss for the mutual image-style conversion between every two cameras is:

L_identity(G_XY, G_YX) = E_{y∼p(y)}[||G_XY(y) − y||_1] + E_{x∼p(x)}[||G_YX(x) − x||_1]   (7)
Finally, for the K × N pictures captured by the K cameras for the M pedestrians, we obtain K × N training pictures in total, of which K × M are labeled and K × N − K × M are unlabeled.
After the cross-camera style picture migration is completed, all pictures are divided into an original picture data set O and a style migration picture data set E.
Then, the labeled data sets are set as

L = {(x_i, y_i)} and L_e = {(x_i^e, y_i^e)},

where x represents input image data and y represents the identity label of that image data. Similarly, according to the original real pictures and the generated pictures, the unpaired unlabeled data sets are set as

U = {x_j} and U_e = {x_j^e};

an unpaired unlabeled data set has only input data and no label information. According to the original image data set and the data set after cross-camera style migration, the pseudo-label data sets are set as

S = {(x_j, ŷ_j)} and S_e = {(x_j^e, ŷ_j^e)},

where the predicted identity label ŷ is produced by the CNN model trained on the initial labeled data.
The goal of single labeling is to make full use of a large amount of unlabeled data; one approach is to select the unlabeled data with useful value according to some criterion and assign it pseudo-labels. After the data set expansion is completed, the CNN model is updated in two steps. The first step trains the CNN model on four parts: the original label data set L, the data-style-migration label data set L_e, the pseudo-label set, and the unlabeled set. The second step selects some reliable pseudo-labels as candidates from the large amount of unlabeled data according to a prediction reliability criterion. In the first iteration, the CNN model is trained using only the label data sets, during which no unlabeled data is assigned a pseudo-label. As the iterations proceed, the pseudo-label candidate set is continuously enlarged. The style-migration + single-labeling method then learns step by step from the four parts of data, finally obtaining a robust model. A pseudo-label is assigned to an unlabeled candidate from the identity label of its nearest labeled neighbor in the feature space, and the reliability of the pseudo-label is measured by the distance between them.
The specific method is as follows: first, the two labeled data sets L and L_e are used to train an initial model; this initial model then predicts pseudo-labels for the unlabeled data, and all pseudo-labeled data is put into a candidate set, which is continuously updated. The most reliable samples are selected from the candidate set and given pseudo-labels, and the labeled data and pseudo-labeled data are trained together to produce a more robust CNN model. The pseudo-label selection process is dynamic: the number of pseudo-labels gradually increases, an iterative updating loop is entered, and the CNN model is updated step by step to make it more robust.
As shown in fig. 1, the original data set is first expanded by K (K is the number of cameras in the data set) times, and then the CNN network is initialized. And performing sample prediction on the unlabeled data through an initial CNN network, selecting a sample with high reliability to enter a candidate set, and then selecting a part of the sample as a pseudo label. After the selection of the pseudo label is finished, the CNN model is retrained together with the original label data, the whole process is dynamically changed, and the residual label-free sample data set is automatically updated after one iteration is finished. The number of pseudo-tags is increased gradually as the iteration progresses.
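A high-level sketch of the loop in FIG. 1 follows; every helper name here is a hypothetical placeholder for the corresponding step described above.

```python
# Hypothetical outline of the iterative scheme in FIG. 1; init_cnn,
# extract_features, assign_pseudo_labels_nn, most_reliable, remaining and
# retrain are placeholders for the steps described in the text.
def progressive_training(labeled, style_labeled, unlabeled, p, iterations):
    model = init_cnn(labeled + style_labeled)   # K-fold expanded initialization
    m_t = 0
    for t in range(iterations):
        feats = extract_features(model, unlabeled)
        pseudo, cost = assign_pseudo_labels_nn(feats, labeled + style_labeled)
        m_t += int(p * len(unlabeled))          # candidate set grows each round
        selected = most_reliable(pseudo, cost, m_t)   # smallest-cost samples
        model = retrain(model,
                        labeled + style_labeled + selected,   # CE losses
                        remaining(unlabeled, selected))       # exclusive loss
    return model
```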
Network training:
The updating step of the model is introduced first. In the t-th iteration, four kinds of data are used for training: the label set L of the original data set, the label set L_e of the style migration data set, the pseudo-label set S_t, and the unlabeled index set M_t. The two label data sets are labeled by the single-sample method, so L and L_e carry reliable label information. The pseudo-label set contains the most reliable pseudo-labels, so S_t has relatively reliable label information. The label and pseudo-label sets are optimized using the ID classifier and cross-entropy loss. The index set M_t has neither reliable pseudo-labels nor reliable usable information; M_t holds the remaining unlabeled data during the iterative process and, as the iterations progress, M_t also changes dynamically. Finally, the exclusive loss is used to optimize the CNN model.
a. Training of tagged pictures + tagged style transition pictures
The most critical step in the whole transfer learning + single-sample gradual learning process is to use the label data with the highest efficiency. The labeled data sets L and L_e carry real identity labels, and cross-entropy loss is uniformly used to optimize both parts of label data. Before optimizing the single-labeled data with cross-entropy loss, the data set undergoes cross-camera data migration, which expands the original data set to K times its size (K being the number of cameras of the original data set), so the original single-labeled data set also becomes K times larger. For ease of understanding they are written as L and L_e, but in the actual training process the two parts of data are trained together. The objective function for this part is:
min_{θ,ω} ∑_{(x_i, y_i) ∈ L ∪ L_e} ℓ_CE(f(ω; φ(θ; x_i)), y_i)   (8)
where x_i and y_i respectively denote the i-th input image and its corresponding identity label in the real picture data sets. f is the identity classifier parameterized by ω, applied to the feature embedding function φ, which is parameterized by θ. ℓ_CE measures the discrepancy between the predicted label f and the true identity label y_i; a smaller value indicates a prediction more similar to the true identity.
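A minimal PyTorch rendering of objective (8), with `backbone` playing the role of φ(θ; ·) and `classifier` the role of f(ω; ·); both are assumed modules.

```python
import torch.nn.functional as F

def labeled_ce_loss(backbone, classifier, images, identity_labels):
    """Sketch of eq. (8): cross-entropy over the single-labeled originals L
    and their style-transferred copies L_e, trained together."""
    feats = backbone(images)      # phi(theta; x_i)
    logits = classifier(feats)    # f(omega; phi(theta; x_i))
    return F.cross_entropy(logits, identity_labels)  # l_CE against y_i
```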
This stage is particularly important in the whole process of gradually iterating the model and fully using the unlabeled samples: since the initial iteration uses no pseudo-labeled data, a good initial model plays a key role in the subsequent training that adds pseudo-labeled data, allowing the pseudo-labeled data to be used more fully. Our method has K times the data volume of the original data set, providing a more robust initial model for the whole process.
b. Training of pseudo-tagged pictures + pseudo-tagged style migration pictures
In this part of the data training, our method is divided into two parts: the first part is the image data of the original data set, and the second part is the image data of the camera style migration data set. Features are computed from the pictures of the original data set, and for each picture one of the K camera styles is randomly selected for training. Whether the model can be optimized well on this part of data depends on the credibility of the pseudo-label set: selecting pseudo-labels with high credibility for joint training increases the robustness of the model, whereas selecting unreliable samples as pseudo-labels seriously damages it. The pseudo-label set S_t, formed by selecting candidates and assigning pseudo-labels under an efficient sampling criterion, also carries relatively reliable label supervision information, so cross-entropy loss is again used for optimization, following the same rule as for the labeled sets:
min_{θ,ω} ∑_{x_i ∈ U} s_i · ℓ_CE(f(ω; φ(θ; x_i)), ŷ_i)   (9)
where s_i ∈ {0,1} is the selection indicator of x_i, with x_i here denoting an unlabeled sample. s_i is generated by the preceding label-population process and decides whether the pseudo-labeled sample (x_i, ŷ_i) is used for identity classification training.
c. Training of unlabeled picture + unlabeled style transition picture
For exploiting the large amount of unlabeled data, the most common approach is to use the exclusive loss as an auxiliary self-supervision loss, extracting effective information to learn discriminative representations. This process mainly learns the differences between input images in order to distinguish samples, using the differences between pedestrian images to extract weak supervision information. An objective function is used to push each sample x_i in the unlabeled set away from the other samples x_j (i ≠ j) in the feature space:
max_θ ∑_{x_i, x_j ∈ M_t, i ≠ j} ||φ(θ; x_i) − φ(θ; x_j)||   (10)
A feature memory M stores all target image features and is updated after each iteration. M_i denotes the i-th column of M, i.e., the L2-normalized feature embedding of sample x_i. Because ||M_i − M_j||² = 2 − 2·M_iᵀM_j, maximizing the Euclidean distance between x_i and x_j is equivalent to minimizing the cosine similarity M_iᵀM_j, and the objective above is optimized by a softmax-like loss:
ℓ_E(θ; x_i) = −log( exp(φ(θ; x_i)ᵀ M_i / τ) / ∑_j exp(φ(θ; x_i)ᵀ M_j / τ) )   (11)
where φ(θ; ·) is our CNN model, whose role is to extract a D-dimensional feature for each picture. θ denotes the weights of the re-ID model, and the hyperparameter τ is the temperature factor of the softmax function; a higher temperature τ leads to a softer probability distribution. After each iteration, M is updated as follows:
M_i ← μ·M_i + (1 − μ)·φ(θ; x_i)   (12)
where the hyperparameter μ is the update rate of M. μ is not fixed to a constant: at the beginning of training a smaller μ is needed to accelerate the update of M, while as the epochs increase M must become gradually stable, which requires a larger μ; therefore μ is gradually increased over the whole process.
The exclusive loss is a self-supervised auxiliary loss, mainly used for learning discriminative representations from unlabeled data without identity labels. In the iterative optimization of the model, the exclusive loss mainly drives the model to learn the differences between input images so as to distinguish them, so more attention must be paid to the details of the input identities throughout the process. More samples are accessed as the iterations progress, and the differences between pedestrian images are exploited to provide useful supervisory information.
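A sketch of the exclusive loss (11) and the memory update (12) in PyTorch; the tensor shapes and the placement of the L2 normalization are assumptions consistent with the description above.

```python
import torch
import torch.nn.functional as F

def exclusive_loss(feats, memory, indices, tau=0.1):
    """feats: (B, D) batch embeddings; memory: (N, D) L2-normalized memory;
    indices: (B,) each sample's own slot in the memory. All shapes assumed."""
    feats = F.normalize(feats, dim=1)
    sims = feats @ memory.t() / tau        # cosine similarities over temperature
    # -log softmax at each sample's own slot: pushes x_i away from all x_j
    return F.cross_entropy(sims, indices)

@torch.no_grad()
def update_memory(memory, feats, indices, mu):
    """Eq. (12): running update of the memory, re-normalized afterwards."""
    feats = F.normalize(feats, dim=1)
    memory[indices] = mu * memory[indices] + (1 - mu) * feats
    memory[indices] = F.normalize(memory[indices], dim=1)
    return memory
```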
And (3) pseudo label assignment:
The selection and assignment of pseudo-labels play a crucial role in fully utilizing the unlabeled data. In label-estimation screening, the most common method is to rank unlabeled data by classification loss, but in practice the predictions of the classification loss do not adapt well to detection evaluation, and the classifier easily overfits single-labeled sample data. Instead, the distance in the feature space is used as the reference standard for pseudo-label credibility: a nearest-neighbor (NN) classifier assigns each unlabeled sample the pseudo-label of its nearest labeled neighbor. Euclidean distances are computed from the input features of the original data set, and all unlabeled data x_i ∈ U ∪ U_e are evaluated by the following formula:
x* = argmin_{(x_l, y_l) ∈ L ∪ L_e} ||φ(θ; x_i) − φ(θ; x_l)||   (13)
the dissimilarity cost is evaluated according to the following formula:
d(θ; x_i) = ||φ(θ; x_i) − φ(θ; x*)||   (14)
In the t-th iteration, the m_t unlabeled samples closest to the labeled samples are selected by:

S_t = argmin_{S ⊂ U ∪ U_e, |S| = m_t} ∑_{x_i ∈ S} d(θ; x_i)   (15)
where K denotes the number of cameras of the original picture data set, and m_t denotes the size of the pseudo-label set selected in the t-th iteration. The best prediction y* is then taken as the pseudo identity label, ŷ_i = y*, and the pseudo-labeled sample (x_i, ŷ_i) is put into the iterative training to optimize the model.
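The nearest-neighbor assignment of equations (13)-(14) might look as follows in PyTorch; feature tensors are assumed to be precomputed.

```python
import torch

def assign_pseudo_labels(unlabeled_feats, labeled_feats, labeled_ids):
    """unlabeled_feats: (U, D); labeled_feats: (N, D); labeled_ids: (N,).
    Returns the pseudo label y* and the dissimilarity cost d(theta; x_i)."""
    dists = torch.cdist(unlabeled_feats, labeled_feats)  # pairwise Euclidean
    cost, nearest = dists.min(dim=1)                     # eq. (14) and x* of (13)
    return labeled_ids[nearest], cost
```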
Regarding the iterative scheme: at each iteration, equations (9) and (11) are first optimized; labels are then estimated for the unlabeled data via equations (13)-(14), and reliable samples are selected for training by equation (15). The candidate-set size grows as m_t ← m_{t−1} + p × (n_l + k + 2u), where p ∈ (0,1) is an enlargement factor that controls the sampling size of the candidate set during the iteration process.
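A sketch of the selection rule (15) and the growth schedule of m_t; `cost` is the per-sample dissimilarity d(θ; x_i) produced by the previous sketch.

```python
import torch

def select_candidates(cost, m_t):
    """Eq. (15): keep the m_t unlabeled samples with the smallest cost."""
    order = torch.argsort(cost)   # most reliable (smallest cost) first
    return order[:m_t]            # indices forming the pseudo-label set S_t

def next_mt(m_t, p, total):
    """Growth schedule m_t <- m_{t-1} + p * total, with p in (0, 1)."""
    return m_t + int(p * total)
```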
The experimental scheme is as follows:
data set:
Market-1501: this data set was collected on the Tsinghua University campus, with images from 6 different cameras, one of which is low-resolution. The data set provides a training set and a test set. The training set contains 12,936 images and the test set contains 19,732 images. The images were automatically detected and cropped by a detector and contain some detection errors (close to actual use). There are 751 identities in the training data and 750 in the test set, so on average there are 17.2 training images per person.
DukeMTMC-reID: the images come from 8 different cameras. The data set provides a training set and a test set.
The training set contains 16,522 images and the test set contains 17,661 images. There are 702 identities in the training data, with an average of 23.5 training images per person.
Evaluation metrics: we use the cumulative matching characteristic (CMC) curve and mean average precision (mAP) to evaluate the performance of each method. The average precision (AP) is computed from the precision-recall curve, and mAP is the mean of the average precision over all queries. We list rank-1, rank-5, rank-10 and rank-20 scores to represent the CMC curve. The CMC score reflects retrieval accuracy, while mAP reflects recall.
Experimental setup
For this single-annotation experiment based on data augmentation, besides randomly selecting one image from camera 1 for each identity as initialization, we add the images interconverted between the different camera styles in all data sets. If a camera has no record of an identity, we randomly choose a sample from the next camera, ensuring that every identity has a sample for initialization.
Details of the experiment
We first remove the last classification layer of ResNet-50 to obtain a feature embedding model, initialized from an ImageNet pre-trained model. An additional fully connected layer with batch normalization and a classification layer are added on top of the CNN feature extractor to optimize the model with the label and pseudo-label losses. For the exclusive loss, the unlabeled features are passed through the batch-normalized fully connected layer and then L2-normalized.
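The described architecture might be sketched as below; the embedding width and any detail not stated in the text are assumptions.

```python
import torch.nn as nn
from torchvision.models import resnet50

class IDEModel(nn.Module):
    """Sketch: ResNet-50 without its 1000-way classifier, plus a
    batch-normalized fully connected layer and an identity head.
    embed_dim is an assumed value, not stated in the text."""
    def __init__(self, num_identities, feat_dim=2048, embed_dim=1024):
        super().__init__()
        backbone = resnet50(pretrained=True)  # ImageNet initialization
        self.backbone = nn.Sequential(*list(backbone.children())[:-1])  # drop fc
        self.embed = nn.Sequential(           # added fully connected layer
            nn.Linear(feat_dim, embed_dim), nn.BatchNorm1d(embed_dim), nn.ReLU())
        self.classifier = nn.Linear(embed_dim, num_identities)  # added head

    def forward(self, x):                     # x: (B, 3, 256, 128)
        f = self.backbone(x).flatten(1)       # (B, 2048) pooled features
        f = self.embed(f)
        return f, self.classifier(f)          # embedding and identity logits
```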
TABLE 1. Comparison with the state of the art on the Market-1501 data set
Compared with the state of the art on the Market-1501 data set, Baseline (supervised) shows the best, fully annotated performance. Baseline (ONE-EXAMPLE) shows the initial model trained with the single-annotation approach. TJOF denotes The Joint Objective Function, the latest method of Wu et al. in single-annotation research on image data sets. Ours is our method.
Baseline (supervised) is the experimental result under training with one hundred percent labeled data, with no unlabeled data in the process; in a real scene, however, complete manual annotation is impractical. In our method, thanks to camera style transfer learning, the migrated data also carry the labels of the original single-labeled data, so the amount of single-labeled data during initialization is K times the original. The initialization model trained this way is more robust. On the Market-1501 data set, the experimental results show that the rank-1 hit rate improves by 8.7% to 64.5%, the rank-5, rank-10 and rank-20 hit rates also improve considerably, and the mean average precision mAP improves by 5.2% over the most advanced method.
TABLE 2. Comparison with the state of the art on the DukeMTMC-reID data set
Compared with the state of the art on the DukeMTMC-reID data set, Baseline (supervised) shows the best, fully annotated performance. Baseline (ONE-EXAMPLE) shows the initial model trained with the single-annotation approach. TJOF denotes The Joint Objective Function [Wu Y., Lin Y., Dong X., et al., "Progressive Learning for Person Re-Identification with One Example," IEEE Transactions on Image Processing, 2019], the latest method of Wu et al. in single-annotation research on image data sets. Ours is our method.
The performance of our method on the DukeMTMC-reID data set remains excellent: on the commonly used evaluation metrics, it improves to varying degrees over the currently most advanced method. The specific percentage improvements can be found in Table 2. The superior performance on both data sets verifies the effectiveness of our method.
Comparison of training with original data and with migrated data
Even when our method is trained without migration data, the iterative gains from initializing the model on the original data set with the K-fold expanded single-label setting are much higher than those of the latest method, with the rank-1 hit rate and mean average precision mAP higher by 3.1% and 4%, respectively. To better show the benefit of the migration data set in training, comparison experiments were performed on the two image data sets Market-1501 and DukeMTMC-reID: training was run on the original data set and on the migration data set separately, still using the data-expanded single-label setting for initialization.
TABLE 3. Comparison of training with and without migration data
Table 3 compares the current most advanced method with ours. Ours (W.O/T) shows the result of our method without using migration data in training; Ours (W/T) shows the result of our method using migration data in training.
The results in Table 3 show that even if only the data-style-migration data set plus the single-annotation data of the original data set is used during initialization, the results are higher than those of the current state-of-the-art methods. From our own two control experiments: on the two data sets, the rank-1 hit rate and mean average precision mAP improve by 5.6% and 3.8%, and by 1.2% and 0.7%, respectively, when migration data is used compared with not using it.
After initialization with the single-labeling method, the improvement from training with migration data is even more pronounced. There are two main reasons for this good result. First, camera style migration provides K times the original single-label data volume at initialization, yielding a good initial CNN model that benefits the subsequent iterative optimization, since a good initialization shows stronger performance when exploiting unlabeled data. Second, after the migration data is put into training, every pedestrian has trainable pictures under all the different camera styles, which to some extent resolves the overfitting caused by a single camera scene and a small, unrepresentative amount of training data, and greatly improves the training effect.
FIGS. 4(a)-4(b) show the DukeMTMC-reID model initialized with our single-annotation scheme. Ours (W.O/T) indicates that no camera style migration data was added during training; Ours (W/T) indicates that migration data was added. The abscissa is the percentage of selected data relative to all unlabeled data, and each solid point represents one iteration result.
First, from the control experiments of FIGS. 4(a)-4(b) we can clearly conclude that introducing migration data during training yields a more robust model than training with the original data set alone. Second, as a higher and higher proportion of unlabeled data is added to training, the rank-1 hit rate and mean average precision mAP increase to a certain point and then grow slowly or even decline. This is because, as unlabeled data is gradually added during iterative training, fewer and fewer reliable pseudo-labeled samples remain available for selection in the later iterations, so more noise is introduced into training, which can damage the robustness of the model to some degree.
To further consolidate our results, we list the more detailed data in fig. 5(a) -5 (d).
Precision and Recall denote the prediction accuracy and recall of the selected pseudo-label candidate set. FIGS. 5(a)-5(b) show the experimental results on the Market-1501 data set, and FIGS. 5(c)-5(d) show the results on the DukeMTMC-reID data set.
Compared with the same single-labeling experiments, our method performs much better than the others even without training on migration data. Compared with the method of Wu Y., Lin Y., Dong X., et al., "Progressive Learning for Person Re-Identification with One Example," IEEE Transactions on Image Processing, 2019, our method reaches a precision of about 89.33%, well above that method's roughly 70%, and reaches 45.6% recall at the last iteration, also higher than any other single-labeled method. When migration data is used for training, almost all results surpass those on the original data set, showing that style migration data can effectively alleviate overfitting to training pictures from a single camera and, to a certain degree, the difficulty of recognizing a pedestrian under different cameras.
Regarding the effect of the enlargement factor p on the experiment:
TABLE 4. Effect of the enlargement factor p
It can be seen from Table 4 that the smaller the enlargement factor p, the better the experimental results. A smaller enlargement factor greatly improves the stability of the model during iterative training, but requires more time and more iteration steps.
In single-label pedestrian re-identification research, only one picture per pedestrian is labeled, so the initialized model has low performance. Camera style migration changes the data volume to K times that of the original data set, and applying the single-label setting to the expanded data likewise makes the labeled samples used for initialization K times the original. Such an initialization model is more robust. Using migration data in training gives better results than using the original data set alone and reduces the risk of overfitting to pictures shot by a single camera. The experimental results prove the effectiveness of the method: augmenting the data set through camera style migration inside the data set lets the single-label experiment and the self-paced learning scheme obtain better performance.
The present disclosure is primarily directed at single-sample pedestrian re-identification. Single-sample annotation means that each pedestrian in the data set has only one annotated sample and many unlabeled samples. The specific method is to train a CNN model with the small amount of labeled data, use the trained model to predict labels (pseudo-labels) for the unlabeled data, and finally retrain the model with the predicted pseudo-label data together with the original small amount of labeled data. However, because the angles, color rendition, backgrounds and so on differ across cameras, pictures of the same pedestrian differ greatly. With only one labeled sample, the pictures lack cross-camera learning and recognition efficiency is low. A new single-sample labeling scheme is therefore proposed: the image styles of different cameras are interconverted through CycleGAN, so that every pedestrian has at least one labeled image in each camera style. Giving each pedestrian a labeled picture under all the different camera styles effectively solves the problem that a single labeled sample is unrepresentative, and a self-paced learning framework then realizes an efficient iterative process. Compared with the latest technology: on the Market-1501 data set, the rank-1 hit rate improves from 55.8% to 64.5%, the rank-5, rank-10 and rank-20 hit rates improve by 6.2%, 5.2% and 4.4% respectively, and the mean average precision mAP improves from 26.2% to 31.4%; on the DukeMTMC-reID data set, our method likewise improves these evaluation metrics by 6.3%, 4.1%, 3.4%, 2.8% and 1.5%, respectively. The experiments prove that interconverting pictures of different styles under the different cameras of one data set effectively expands the data set while solving the low recognition rate caused by the lack of labeled cross-camera pedestrian pictures.
The present disclosure focuses on single-sample learning, which combines the advantages of both settings: it can fully utilize the data distribution information of the unlabeled samples together with the category labels of the labeled samples, automatically exploiting the unlabeled samples to improve learning performance.
The present disclosure proposes cross-camera similarity mining during model training and the search for trusted pictures. The specific method: style conversion between the images of different cameras is performed within the same data set using CycleGAN, which not only increases the amount of labeled pictures but also expands the labeled training pictures from the original single camera style to multiple camera styles.
In the present disclosure, we use GAN network to realize data expansion and cross-camera labeling.
The second embodiment provides a pedestrian re-identification system based on camera style migration and single marking;
pedestrian re-identification system based on camera style migration and single marking includes:
an acquisition module configured to: acquiring a plurality of non-label images to be subjected to pedestrian re-identification;
a marking module configured to: marking the pedestrian to be identified in one of all the unlabelled images to be subjected to pedestrian re-identification;
a data amplification module configured to: inputting the marked images into a pre-trained cycleGAN network, and outputting a plurality of camera style migration images corresponding to each image in the marked images to realize data amplification; marking the pedestrian of the camera style migration image corresponding to the marked image;
the unlabelled images are sent to a pre-trained cycleGAN network, and a plurality of camera style migration images corresponding to each image in the unlabelled images are output to realize data amplification;
an identification module configured to: and inputting the marked image, the unmarked image, the camera style migration image corresponding to the marked image and the camera style migration image corresponding to the unmarked image into a pre-trained CNN network, and outputting the identification result of the pedestrian to be identified in each image in the unmarked image.
In a third embodiment, the present embodiment further provides an electronic device, which includes a memory, a processor, and computer instructions stored in the memory and executed on the processor, where the computer instructions, when executed by the processor, implement the method in the first embodiment.
In a fourth embodiment, the present embodiment further provides a computer-readable storage medium for storing computer instructions, and the computer instructions, when executed by a processor, implement the method of the first embodiment.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. The pedestrian re-identification method based on camera style migration and single marking is characterized by comprising the following steps of:
acquiring a plurality of non-label images to be subjected to pedestrian re-identification;
marking the pedestrian to be identified in one of all the unlabelled images to be subjected to pedestrian re-identification;
inputting the marked images into a pre-trained cycleGAN network, and outputting a plurality of camera style migration images corresponding to each image in the marked images to realize data amplification; marking the pedestrian of the camera style migration image corresponding to the marked image;
the unlabelled images are sent to a pre-trained cycleGAN network, and a plurality of camera style migration images corresponding to each image in the unlabelled images are output to realize data amplification;
and inputting the marked image, the unmarked image, the camera style migration image corresponding to the marked image and the camera style migration image corresponding to the unmarked image into a pre-trained CNN network, and outputting the identification result of the pedestrian to be identified in each image in the unmarked image.
2. The method of claim 1, wherein the pre-trained cycleGAN network, the step of training comprising:
s31: constructing a cycleGAN network;
s32: constructing a training set;
s33: and taking the image containing the pedestrian b acquired by one camera a in the training set as an input value of the cycleGAN network, taking the images containing the pedestrian b acquired by all the cameras except the camera a in the training set as output values of the cycleGAN network, and training the cycleGAN network to obtain the trained cycleGAN network.
3. The method of claim 2, wherein the training set comprises the Market-1501 data set or the DukeMTMC-reID data set.
4. The method of claim 1, wherein training the pre-trained CNN network comprises:
S41: performing primary training of the CNN network on the ImageNet data set, with each image as an input value of the CNN network and its label as an output value of the CNN network;
S42: performing secondary training of the primarily trained CNN network on a real-label data set, with each image in the real-label data set as an input value of the primarily trained CNN network and its pedestrian label as an output value of the primarily trained CNN network;
S43: performing tertiary training of the secondarily trained CNN network using the plurality of unlabeled images to be subjected to pedestrian re-identification: all unlabeled images are fed to the secondarily trained CNN network as input values, and the network outputs a set of pseudo labels; the reliability of each pseudo label is then judged, pseudo labels whose reliability is above a set threshold are retained, and pseudo labels whose reliability is below the threshold are removed;
S44: inputting the real-label data set and the data set carrying the high-reliability pseudo labels into the tertiarily trained CNN network for final training, thereby obtaining the trained CNN network.
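(Editorial note, for orientation only; not part of the claim.) Purely as a reading aid, the four-stage schedule might be wired up as follows; train(), predict_pseudo_labels(), the data loaders, and the threshold are hypothetical placeholders, and only the staging itself mirrors S41-S44.

    # Hedged sketch of the S41-S44 schedule (PyTorch / torchvision).
    import torchvision

    # S41: primary training -- in practice, load ImageNet-pretrained weights.
    cnn = torchvision.models.resnet50(weights="IMAGENET1K_V1")

    # S42: secondary training on the (optionally CycleGAN-augmented)
    # real-label set; train() is a hypothetical helper.
    train(cnn, real_label_loader)

    # S43: tertiary pass -- pseudo-label the unlabeled images and keep only
    # those whose reliability clears the threshold (see claim 7).
    reliable = [(image, label)
                for image, label, reliability
                in predict_pseudo_labels(cnn, unlabeled_loader)
                if reliability > threshold]

    # S44: final training on real labels plus the reliable pseudo labels.
    train(cnn, real_label_loader, extra_samples=reliable)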
5. The method of claim 4, wherein before the secondary training of the primarily trained CNN network on the real-label data set in S42, the method further comprises:
inputting the real-label data set into the trained CycleGAN network for data augmentation.
6. The method of claim 4, wherein before the tertiary training of the secondarily trained CNN network using the plurality of unlabeled images to be subjected to pedestrian re-identification in S43, the method further comprises:
inputting the plurality of unlabeled images to be subjected to pedestrian re-identification into the trained CycleGAN network for data augmentation.
7. The method of claim 4, wherein in S43 the reliability of a pseudo label is judged as follows:
calculating the distance between the pedestrian image feature corresponding to the pseudo label and the pedestrian image feature corresponding to the real label of the same pedestrian; if the distance is smaller than a set threshold, the current pseudo label is a reliable pseudo label; otherwise, the current pseudo label is an unreliable pseudo label.
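(Editorial note, for orientation only; not part of the claim.) This test reduces to a distance threshold in feature space; the Euclidean metric and the free threshold parameter below are assumptions of the sketch, since the claim fixes neither.

    # Hedged sketch of the claim-7 reliability test (PyTorch).
    import torch

    def is_reliable(pseudo_feature: torch.Tensor,
                    real_feature: torch.Tensor,
                    threshold: float) -> bool:
        # Keep the pseudo label only if its image feature lies within
        # `threshold` of the feature of the real-labeled image of the
        # same pedestrian (Euclidean distance assumed).
        return torch.dist(pseudo_feature, real_feature).item() < threshold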
8. A pedestrian re-identification system based on camera style migration and single marking, characterized by comprising:
an acquisition module configured to acquire a plurality of unlabeled images to be subjected to pedestrian re-identification;
a marking module configured to mark the pedestrian to be identified in one of the unlabeled images to be subjected to pedestrian re-identification;
a data augmentation module configured to: input the marked image into a pre-trained CycleGAN network and output a plurality of camera style migration images corresponding to the marked image, thereby augmenting the data; mark the pedestrian in each camera style migration image corresponding to the marked image; and input the unlabeled images into the pre-trained CycleGAN network and output a plurality of camera style migration images corresponding to each unlabeled image, thereby augmenting the data; and
an identification module configured to input the marked image, the unlabeled images, the camera style migration images corresponding to the marked image, and the camera style migration images corresponding to the unlabeled images into a pre-trained CNN network, and output the identification result of the pedestrian to be identified in each unlabeled image.
9. An electronic device, comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the computer instructions, when executed by the processor, perform the method of any one of claims 1-7.
10. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the method of any one of claims 1-7.
CN202010053330.2A 2020-01-17 2020-01-17 Pedestrian re-identification method and system based on camera style migration and single marking Pending CN111242064A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010053330.2A CN111242064A (en) 2020-01-17 2020-01-17 Pedestrian re-identification method and system based on camera style migration and single marking

Publications (1)

Publication Number Publication Date
CN111242064A 2020-06-05

Family

ID=70868708

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010053330.2A Pending CN111242064A (en) 2020-01-17 2020-01-17 Pedestrian re-identification method and system based on camera style migration and single marking

Country Status (1)

Country Link
CN (1) CN111242064A (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110008842A * 2019-03-09 2019-07-12 Tongji University Pedestrian re-identification method based on a deep multi-loss fusion model

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
H. Fan et al., "Unsupervised person re-identification: Clustering and fine-tuning", ACM *
Jun-Yan Zhu et al., "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks", 2017 IEEE International Conference on Computer Vision *
Y. Lin et al., "Improving person re-identification by attribute and identity learning", arXiv *
Yu Wu et al., "Exploit the unknown gradually: One-shot video-based person re-identification by stepwise learning", IEEE *
Yu Wu et al., "Progressive Learning for Person Re-Identification With One Example", IEEE Transactions on Image Processing *
Z. Zhong et al., "Re-ranking person re-identification with k-reciprocal encoding", IEEE *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112069929A * 2020-08-20 2020-12-11 Zhejiang Lab Unsupervised pedestrian re-identification method and device, electronic equipment and storage medium
CN112069929B * 2020-08-20 2024-01-05 Zhejiang Lab Unsupervised pedestrian re-identification method and device, electronic equipment and storage medium
CN112131961B * 2020-08-28 2023-02-03 Ocean University of China Semi-supervised pedestrian re-identification method based on a single sample
CN112131961A * 2020-08-28 2020-12-25 Ocean University of China Semi-supervised pedestrian re-identification method based on a single sample
CN112528788A * 2020-12-01 2021-03-19 Chongqing Zhaoguang Technology Co., Ltd. Re-identification method based on domain-invariant features and spatio-temporal features
CN112528788B * 2020-12-01 2023-11-21 Chongqing Zhaoguang Technology Co., Ltd. Re-identification method based on domain-invariant features and spatio-temporal features
CN112507941A * 2020-12-17 2021-03-16 China University of Mining and Technology Cross-view pedestrian re-identification method and device for mine AI video analysis
CN112507941B * 2020-12-17 2024-05-10 China University of Mining and Technology Cross-view pedestrian re-identification method and device for mine AI video analysis
CN113222114A * 2021-04-22 2021-08-06 University of Science and Technology Beijing Image data augmentation method and device
CN113222114B * 2021-04-22 2023-08-15 University of Science and Technology Beijing Image data augmentation method and device
CN113449850A * 2021-07-05 2021-09-28 University of Electronic Science and Technology of China Intelligent suppression method for sea-surface surveillance radar clutter
CN113256778A * 2021-07-05 2021-08-13 Aibao Technology Co., Ltd. Method, device, medium and server for generating vehicle appearance part identification samples
CN114299543A * 2021-12-29 2022-04-08 Fuzhou University Unsupervised pedestrian re-identification method
CN115862087A * 2022-09-26 2023-03-28 Harbin Institute of Technology Unsupervised pedestrian re-identification method and system based on reliability modeling
CN115862087B * 2022-09-26 2023-06-23 Harbin Institute of Technology Unsupervised pedestrian re-identification method and system based on reliability modeling
CN115909464A * 2022-12-26 2023-04-04 Huaiyin Institute of Technology Self-adaptive weakly supervised label marking method for pedestrian re-identification
CN115909464B * 2022-12-26 2024-03-26 Huaiyin Institute of Technology Self-adaptive weakly supervised label marking method for pedestrian re-identification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200605)