CN110956158A - Pedestrian shielding re-identification method based on teacher and student learning frame - Google Patents

Info

Publication number
CN110956158A
Authority
CN
China
Prior art keywords
pedestrian
network
data
teacher
shielding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911289053.9A
Other languages
Chinese (zh)
Inventor
赖剑煌
卓嘉璇
陈培佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
National Sun Yat Sen University
Original Assignee
National Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Sun Yat Sen University filed Critical National Sun Yat Sen University
Priority to CN201911289053.9A priority Critical patent/CN110956158A/en
Publication of CN110956158A publication Critical patent/CN110956158A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Abstract

The invention discloses an occluded pedestrian re-identification method based on a teacher-student learning framework, comprising the following steps. First, a teacher network is trained: existing large-scale complete (unoccluded) pedestrian data are used to simulate the training process of occluded pedestrian re-identification. This is implemented by a pedestrian re-identification network with joint saliency detection, fed by a cross-domain simulator, and is called the teacher teaching process. Next, the teacher network is transferred to a student network, which starts from the teacher model and continues training on real, small-scale occluded pedestrian data; this is called the student practice process. Finally, through the teacher teaching and student practice processes, a model that is both discriminative for pedestrian identity and robust to occlusion is obtained, and it can be used for occluded pedestrian re-identification. The invention can greatly improve performance on the existing occluded pedestrian re-identification task and has wide application value.

Description

Pedestrian shielding re-identification method based on teacher and student learning frame
Technical Field
The invention relates to pedestrian re-identification under occlusion, and in particular to an occluded pedestrian re-identification method based on a teacher-student learning framework with multi-stage cross-domain learning.
Background
Pedestrian re-identification is the task of retrieving pedestrians with the same identity across different cameras, under varying time, viewing angle, illumination and other conditions. With the rapid development of intelligent surveillance systems, pedestrian re-identification is used in a range of public applications to find specific people, such as criminals, children and missing persons, across cameras. In real scenes, however, pedestrians captured by a camera are often occluded by static or moving objects in the environment, such as other pedestrians, moving vehicles, buildings, flowers and trees. Occlusion both removes information about the target and introduces interference from the occluder, which degrades re-identification performance. Because occlusion is an unavoidable and non-negligible challenge with real practical significance, occluded pedestrian re-identification has become a key and valuable topic in computer vision.
General research on pedestrian re-identification falls mainly into two areas: feature extraction and metric learning. Feature extraction derives the important descriptive information that represents the target; the result, called a feature descriptor, should be robust and discriminative so that it serves the matching task well. Metric learning builds a metric subspace in which the extracted feature descriptors are matched: descriptors of the same pedestrian are pulled close together in the metric space, while descriptors of different pedestrians are pushed further apart, which enables classification and recognition of pedestrian identity. Although general pedestrian re-identification is now fairly mature, these methods still perform poorly on occluded pedestrians, mainly because they attend to the whole image and are therefore strongly influenced by the occluded parts.
In recent years, some researchers have proposed solutions to occluded pedestrian re-identification. The main idea is to divide the image into blocks and extract local features, select the unoccluded local parts for matching and similarity measurement to reduce the influence of the occluded region, and combine these with global features to improve re-identification of occluded pedestrians. Although this approach achieves a certain effect, features must be extracted separately for every block, which greatly increases computational complexity, and block misalignment easily occurs, which harms re-identification. Moreover, occluded pedestrian re-identification has received only limited research attention in recent years, and the shortage of occlusion training data has further held the work back, so progress has been slow.
To address the shortcomings of previous work and the limitation of training data, an occluded pedestrian re-identification method based on a teacher-student learning framework is proposed, which has important research significance and practical value.
Disclosure of Invention
To address the difficulty of occluded pedestrian re-identification, the invention provides an occluded pedestrian re-identification method based on a teacher-student learning framework. The method can be trained with only small-scale real occlusion data and offers a high matching rate and strong robustness.
The purpose of the invention is achieved by the following technical scheme: an occluded pedestrian re-identification method based on a teacher-student learning framework, comprising the following steps:
first, a teacher network is trained: existing large-scale complete pedestrian data are used to simulate the training process of occluded pedestrian re-identification, which is implemented by a pedestrian re-identification network with joint saliency detection, fed by a cross-domain simulator; this is the teacher teaching process. Next, the teacher network is transferred to the student network, which starts from the teacher model and continues training on real, small-scale occluded pedestrian data; this is the student practice process. Finally, through the teacher teaching and student practice processes, a model that is both discriminative for pedestrian identity and robust to occlusion is obtained, and it can be used for occluded pedestrian re-identification. The invention can greatly improve performance on the existing occluded pedestrian re-identification task and has wide application value.
Specifically, the occluded pedestrian re-identification method based on the teacher-student learning framework comprises the following steps:
S1, generating salient pedestrian mask labels for the complete pedestrian data with an existing salient object detection model, and screening the generated samples. The complete pedestrian data then comprise complete unoccluded pedestrian images, the corresponding identity labels and the salient pedestrian mask labels, and provide the initial material for model training;
S2, establishing a cross-domain simulator that uses the complete pedestrian data to simulate occluded pedestrian re-identification, in order to realize the teacher teaching process. The cross-domain simulator sets an occlusion selection probability, selects a certain proportion of the complete pedestrian data for simulated occlusion each time data are loaded, and assigns new label information, providing a reliable data source for teacher network training;
S3, inputting the data processed by the cross-domain simulator into the pedestrian re-identification network with joint saliency detection, i.e. the teacher network, for training. The proportion of simulated-occlusion pedestrian data relative to complete pedestrian data increases as the training iterations proceed. The network is trained by repeated forward propagation and back-propagation until the network loss converges, yielding a base model that recognizes pedestrians robustly under occlusion;
S4, generating salient pedestrian mask labels for the real occluded pedestrian data with the joint saliency detection branch of the network obtained in step S3, providing material for training the student network;
S5, in the student practice process, building the student network with the same structure as the teacher network of step S3 and inheriting the network parameters obtained by the teacher network in step S3, then continuing training on real occluded pedestrian data; the final network model is obtained once training converges after multiple rounds.
Preferably, in step S1, the generated samples are screened as follows: if the average confidence of the salient pedestrian mask label generated for a sample is higher than a preset threshold, the sample is kept as training data; if the average confidence is lower than or equal to the preset threshold, the sample is rejected.
Preferably, in step S2, a cross-domain simulator is established to span gradually from the complete pedestrian data domain to the simulated occluded pedestrian data domain, with the following steps:
(1-1) loading all complete pedestrian data, and randomly selecting part of the complete pedestrian data, in proportion p, for occlusion processing;
(1-2) calculating the image area of each selected complete pedestrian sample, and calculating the size of the occlusion area from the set occlusion ratio;
(1-3) selecting a background block from the background area of the complete pedestrian image, and scaling it to the size of the occlusion area with a variable aspect ratio, so that each operation yields a different occluder block;
(1-4) generating a black block with the same size and shape as the occluder block, to be used for processing the salient pedestrian mask label;
(1-5) covering a randomly selected position of the complete pedestrian image with the occluder block, and covering the same position of the salient pedestrian mask label with the black block, which completes the simulated occlusion at the image level; the pedestrian identity label is kept unchanged;
(1-6) assigning each simulated occluded pedestrian sample a new binary label, the occluded/non-occluded label, with value 1, indicating that the pedestrian is occluded;
(1-7) assigning the complete pedestrian data that were not occlusion-processed an occluded/non-occluded label with value 0, indicating that the pedestrian is not occluded (complete); the pedestrian image, salient mask label and pedestrian identity label of these data are kept unchanged;
(1-8) repeating steps (1-1) - (1-6) in each iteration of teacher network training, with the proportion p growing as the number of iteration rounds increases, so that more and more complete pedestrian data are selected for occlusion processing until training stops.
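Steps (1-2) to (1-6) can be illustrated for a single sample with the minimal sketch below. It is an illustration under stated assumptions, not the patented implementation: images and masks are plain nested lists, the function name `simulate_occlusion`, the default `ratio_range` of (0.1, 0.3) and the aspect-ratio range 0.5 to 2.0 are all hypothetical, and the occluder is filled with values sampled from background pixels rather than with a rescaled background patch as the text describes.

```python
import random


def simulate_occlusion(image, mask, ratio_range=(0.1, 0.3), rng=None):
    """Return (occluded_image, occluded_mask, occlusion_label) for one sample.

    image: H x W nested list of pixel values.
    mask:  H x W nested list of 0/1 values (1 = pedestrian).
    Both inputs are left unmodified. All defaults are illustrative assumptions.
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])

    # (1-2) Occluder area from the image area and a ratio drawn from the range.
    target_area = rng.uniform(*ratio_range) * height * width

    # (1-3) Variable aspect ratio: derive the occluder height, then its width.
    aspect = rng.uniform(0.5, 2.0)
    occ_h = max(1, min(height, round((target_area * aspect) ** 0.5)))
    occ_w = max(1, min(width, round(target_area / occ_h)))

    # (1-3, simplified) Collect background pixel values (mask == 0) as filler.
    background = [image[y][x] for y in range(height) for x in range(width)
                  if mask[y][x] == 0]
    if not background:  # no background available: fall back to zeros
        background = [0]

    # (1-5) Random position, identical in the image and in the mask.
    top = rng.randint(0, height - occ_h)
    left = rng.randint(0, width - occ_w)
    new_image = [row[:] for row in image]
    new_mask = [row[:] for row in mask]
    for y in range(top, top + occ_h):
        for x in range(left, left + occ_w):
            new_image[y][x] = rng.choice(background)  # occluder block
            new_mask[y][x] = 0                        # (1-4) black block

    # (1-6) Occlusion label 1; the identity label is not touched here.
    return new_image, new_mask, 1
```

The caller keeps the identity label of the original sample and attaches the returned label 1; samples that are not selected keep their image and mask and receive label 0, as in step (1-7).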
Preferably, in step S3, the pedestrian re-identification network with joint saliency detection consists of three parts: a feature extraction backbone, a joint saliency detection branch and a pedestrian classification branch. The feature extraction backbone is a ResNet-50 deep network with the fully connected layer removed. The joint saliency detection branch predicts the salient pedestrian mask, using a softmax loss function to classify each pixel as belonging to the pedestrian region or not. The pedestrian classification branch classifies pedestrian identity, using a softmax loss function with each pedestrian as one category. The backbone and the two branches are trained together.
Furthermore, in step S3, the joint saliency detection branch computes the classification loss of each pixel and then back-propagates the accumulated per-pixel losses, so that each pixel of the predicted mask is classified as foreground or background, where the foreground is the pedestrian region and the background is the non-pedestrian region. The branch thereby marks the salient pedestrian region in the image.
Specifically, the loss function $L_S$ of the joint saliency detection branch is expressed as:
$$L_S(I_F, S_F) = \sum_{i=1}^{N_F} \sum_{p=1}^{H_F} \sum_{q=1}^{W_F} L_{softmax}\big(h(f(I_i^F))_{(p,q)},\ s_i^F(p,q)\big)$$
where $L_{softmax}$ denotes the softmax function followed by the cross-entropy loss, and $f(\cdot)$ and $h(\cdot)$ denote the feature extractor and the salient pedestrian detector, respectively. $D_F$ denotes the complete pedestrian data set, containing $C_F$ pedestrian identities and $N_F$ images in total; $I_i^F$ denotes the $i$-th image of $D_F$; $c_i^F$ and $s_i^F$ denote the corresponding pedestrian identity label and salient pedestrian mask label, where $c_F \in \{1, 2, \dots, C_F\}$ and $s_F \in S_F$. $(p, q)$ denotes the position coordinates of each pixel on the image, and $H_F$ and $W_F$ denote the height and width of the image.
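The per-pixel loss of the joint saliency (salient) detection branch can be sketched in a few lines. This is a minimal illustration, assuming the branch outputs two logits per pixel (background, pedestrian) and that the per-pixel losses are simply summed, as the word "accumulated" in the text suggests; the function names and the nested-list data layout are hypothetical.

```python
import math


def softmax_cross_entropy(logits, target):
    """Cross-entropy of softmax(logits) against an integer class index."""
    peak = max(logits)  # subtract the maximum for numerical stability
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum - logits[target]


def saliency_loss(pixel_logits, mask):
    """Sum the per-pixel loss over an H x W grid of two-class logits.

    pixel_logits[p][q] is [background_logit, pedestrian_logit];
    mask[p][q] is 0 (background) or 1 (pedestrian).
    """
    return sum(
        softmax_cross_entropy(pixel_logits[p][q], mask[p][q])
        for p in range(len(mask))
        for q in range(len(mask[0]))
    )
```

Summing this quantity over all images of a batch gives a batch-level saliency loss; whether the original method also normalizes by the number of pixels or images is not stated in the text.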
In the invention, the pedestrian classification branch classifies pedestrian identity, using softmax cross-entropy as the loss function, denoted $L_C$ and expressed as:
$$L_C(I_F, C_F) = \sum_{i=1}^{N_F} L_{softmax}\big(g(f(I_i^F)),\ c_i^F\big)$$
where $g(\cdot)$ denotes the pedestrian identity classifier. $D_F$ denotes the complete pedestrian data set, containing $C_F$ pedestrian identities and $N_F$ images in total; $I_i^F$ denotes the $i$-th image of $D_F$; $c_i^F$ denotes its pedestrian identity label; and $L_{softmax}$ denotes the softmax function followed by the cross-entropy loss.
In one implementation, the loss functions of the joint saliency detection branch and the pedestrian classification branch are given different weights to form the loss function of the pedestrian re-identification network with joint saliency detection, expressed as:
$$L(I_F, C_F, S_F) = \alpha L_C(I_F, C_F) + (1-\alpha) L_S(I_F, S_F)$$
where $\alpha$ is a hyperparameter that balances the weights of the two branch loss functions. $\alpha$ takes values between 0 and 1 and is set greater than 0.5, so that pedestrian classification is the main task and the joint saliency detection branch assists the network in attending to the pedestrian.
Preferably, in step S3, because the cross-domain simulator attaches a new label to each sample, namely the occluded/non-occluded binary label, an occluded/non-occluded binary classification loss is added to the pedestrian classification branch of the network with joint saliency detection. Judging whether the pedestrian is occluded improves the ability of the feature extraction process to attend to the pedestrian. The occluded/non-occluded binary loss function $L_O$ can be expressed as:
$$L_O(I_{F/O'}, O_{F/O'}) = \sum_{i} L_{softmax}\big(b(f(I_i^{F/O'})),\ o_i^{F/O'}\big)$$
where $b(\cdot)$ denotes the occluded/non-occluded binary classifier; $I_{F/O'}$ denotes the new training set obtained by putting together the occlusion-processed pictures (pictures in the $O'$ data set) and the pictures that were not occlusion-processed (pictures in the $F$ data set); $I_i^{F/O'}$ denotes the $i$-th sample in $I_{F/O'}$; and $o_i^{F/O'}$ takes the value 0 or 1, where 0 denotes a complete (non-occluded) pedestrian sample and 1 denotes a simulated occluded pedestrian sample. With the occluded/non-occluded binary loss added, the loss function of the pedestrian classification branch becomes a multitask loss function, denoted $L_M$ and expressed as:
$$L_M(I_{F/O'}, C_{F/O'}, O_{F/O'}) = \beta L_C(I_{F/O'}, C_{F/O'}) + (1-\beta) L_O(I_{F/O'}, O_{F/O'})$$
where $I_{F/O'}$ is the combined training set defined above; $C_{F/O'}$ denotes the pedestrian identity label of each sample, i.e. the classification label; and $O_{F/O'}$ denotes the occlusion attribute of each sample, i.e. the occluded or non-occluded label. $L_C$ and $L_O$ denote the pedestrian identity classification loss and the occluded/non-occluded binary classification loss, respectively, and $\beta \in (0, 1)$ is a hyperparameter that balances the weights of the two loss functions.
In another embodiment, the loss function of the joint saliency detection branch and that of the pedestrian classification branch extended with the occlusion label from the cross-domain simulator are given different weights to form the loss function of the pedestrian re-identification network with joint saliency detection, expressed as:
$$L(I_{F/O'}, C_{F/O'}, O_{F/O'}, S_{F/O'}) = \alpha L_M(I_{F/O'}, C_{F/O'}, O_{F/O'}) + (1-\alpha) L_S(I_{F/O'}, S_{F/O'})$$
where $\alpha$ is a hyperparameter that balances the weights of the two branch loss functions. $\alpha$ takes values between 0 and 1 and is set greater than 0.5, so that pedestrian classification is the main task and the joint saliency detection branch assists the network in attending to the pedestrian.
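As a worked example of how the weights combine, the sketch below composes the multitask loss and the overall loss from three branch losses that are assumed to have been computed already. The function names and the default values alpha = 0.7 and beta = 0.8 are illustrative assumptions; the text only requires both weights to lie between 0 and 1, with alpha greater than 0.5.

```python
def multitask_loss(identity_loss, occlusion_loss, beta=0.8):
    """L_M = beta * L_C + (1 - beta) * L_O, with 0 < beta < 1."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie in (0, 1)")
    return beta * identity_loss + (1.0 - beta) * occlusion_loss


def total_loss(identity_loss, occlusion_loss, saliency_loss,
               alpha=0.7, beta=0.8):
    """L = alpha * L_M + (1 - alpha) * L_S, with 0 < alpha < 1."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    l_m = multitask_loss(identity_loss, occlusion_loss, beta)
    return alpha * l_m + (1.0 - alpha) * saliency_loss


# Made-up branch losses: L_C = 2.0, L_O = 0.5, L_S = 1.0.
# L_M = 0.8 * 2.0 + 0.2 * 0.5 = 1.7; L = 0.7 * 1.7 + 0.3 * 1.0 = 1.49
print(round(total_loss(2.0, 0.5, 1.0), 4))  # 1.49
```

With alpha above 0.5 the classification terms dominate the gradient, which matches the stated intent that identity classification is the main task and saliency detection is auxiliary.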
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention provides an occluded pedestrian re-identification framework based on teacher-student learning, aimed at two challenges of the task: extracted features that are not robust to occlusion, and insufficient occlusion training data. In the teacher teaching stage, the framework uses existing large-scale unoccluded complete pedestrian data to simulate occluded pedestrian data; both take part in network training, yielding an occlusion-robust base model. In the student practice stage, the base model is trained further on real occluded pedestrian data, giving a final model that is robust to occlusion. The model can thus learn from a large number of occlusion samples, overcoming the shortage of occluded pedestrian data. At the same time, by focusing on the pedestrian, the network reduces the adverse influence of occluders, which effectively improves performance on the occluded pedestrian re-identification task.
2. For the teacher teaching process, a pedestrian re-identification network with joint saliency detection is designed. With a shared feature extraction backbone and two mutually assisting branches, the joint saliency detection branch and the pedestrian classification branch, the network attends to the pedestrian region and is not disturbed by the occluded region, which gives it robustness to occlusion.
3. To simulate the occluded pedestrian re-identification process, the invention processes a large-scale complete pedestrian data set and its labels with a cross-domain simulator. The simulator selects complete pedestrian data for occlusion processing with a probability that increases during training, which makes the transition from complete pedestrian data to simulated occluded pedestrian data smoother and provides sufficient material for the teacher teaching process.
Drawings
FIG. 1 is a schematic diagram of an embodiment of the method of the present invention.
Fig. 2 is a schematic diagram of the motivation of the present invention for pedestrian attention.
Fig. 3 is a schematic diagram of the joint significance detection branch of the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent; the present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
Examples
As shown in fig. 2, prior-art methods apply global attention to the occluded pedestrian re-identification task, which causes identification errors. To address this, the embodiment provides an occluded pedestrian re-identification method. As shown in fig. 1, the method applies a teacher-student learning framework to occluded pedestrian re-identification and optimizes and updates the parameters in two stages, the teacher network and the student network. First, given the original large-scale complete pedestrian training samples, the cross-domain simulator simulates occluded pedestrian data in the teacher teaching process. The data are then input into the teacher network, a pedestrian re-identification network with joint saliency detection, for training, giving a base network that is discriminative for pedestrian identity and robust to occlusion. Then, in the student practice process, the student network starts from the teacher network model and is further trained and adjusted on real occluded pedestrian data. Through the simulated training of teacher teaching and the real training of student practice, the model can learn from a large number of occlusion samples, overcoming the shortage of occluded pedestrian data. By focusing on the pedestrian, the network also reduces the adverse influence of occluders, which effectively improves performance on the occluded pedestrian re-identification task. The steps of the method are described in detail below with reference to the accompanying drawings.
S1, inputting complete pedestrian data, including pedestrian images and pedestrian identity information. An existing salient object detection algorithm is run on each pedestrian image to generate the corresponding salient pedestrian mask, which serves as a new label. The pedestrian samples are then screened: if the average confidence of the salient pedestrian mask label generated for a sample is higher than 0.5, the sample is kept as training data; if it is less than or equal to 0.5, the sample is rejected. After this operation, the complete pedestrian data, comprising complete pedestrian images, the corresponding identity labels and the salient pedestrian masks, serve as training data.
S2, passing the training data obtained in step S1 through the cross-domain simulator to generate the combined data, containing both complete and occluded pedestrians, that are input in each iteration.
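The screening rule in S1 can be sketched as a simple filter. The text does not say how the "average confidence" of a mask is computed, so this sketch assumes, purely for illustration, that each sample carries a per-pixel confidence map and that the average is taken over all pixels; the dictionary layout and function names are hypothetical.

```python
def mean_confidence(confidence_map):
    """Average of the per-pixel confidences in an H x W nested list."""
    values = [v for row in confidence_map for v in row]
    return sum(values) / len(values)


def screen_samples(samples, threshold=0.5):
    """Keep samples whose mask confidence is strictly above the threshold."""
    return [s for s in samples if mean_confidence(s["mask"]) > threshold]


samples = [
    {"id": "a", "mask": [[0.9, 0.8], [0.7, 0.6]]},  # mean 0.75: kept
    {"id": "b", "mask": [[0.5, 0.5], [0.5, 0.5]]},  # mean 0.50: rejected
    {"id": "c", "mask": [[0.1, 0.2], [0.3, 0.2]]},  # mean 0.20: rejected
]
print([s["id"] for s in screen_samples(samples)])  # ['a']
```

Note that a sample whose average confidence equals the threshold exactly is rejected, in line with the "less than or equal to 0.5" condition.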
Complete pedestrian data comprise complete unoccluded pedestrian images together with the corresponding identity labels and salient pedestrian masks. Let $D_F$ denote the complete pedestrian data set, containing $C_F$ pedestrian identities and $N_F$ images in total, and let $I_F$ denote an image in $D_F$, whose labels are the identity label $c_F \in \{1, 2, \dots, C_F\}$ and the salient pedestrian mask $s_F \in S_F$. The cross-domain simulator spans from the complete pedestrian data domain to the simulated occluded pedestrian data domain, expressed as a mapping function $F: D_F \to D_{O'}$, where $D_{O'}$ denotes the occluded pedestrian data generated by the occlusion simulator.
The steps for generating simulated occluded pedestrian data are:
(1-1) first, loading all the complete pedestrian data $D_F$, and randomly selecting part of the complete pedestrian data, in proportion $p$, for occlusion processing;
(1-2) for each selected complete pedestrian sample $I_F$, calculating the image area $A_F = H_F \times W_F$ and, according to the set occlusion ratio range $[r_1, r_2]$, calculating the size of the occlusion area $A_{O'} = r \cdot A_F$ with $r \in [r_1, r_2]$;
(1-3) selecting a small background block, patch, from the background area of the complete pedestrian image, and scaling it to the size of the occlusion area with a variable aspect ratio;
(1-4) generating a black block, black_patch, with the same size and shape as the occluder block, to be used for processing the salient pedestrian mask label;
(1-5) covering a randomly selected position of the complete pedestrian image with the occluder block, and covering the same position of the salient pedestrian mask label with the black block. This completes the simulated occlusion at the image level and gives a new simulated occluded pedestrian image $I_{O'}$ and salient pedestrian mask label $s_{O'}$, while the pedestrian identity label $c_{O'}$ is kept unchanged;
(1-6) assigning the generated simulated occluded pedestrian sample a new binary label (the occluded/non-occluded label), denoted $o_{O'} = 1$, indicating an occluded pedestrian;
(1-7) for the remaining complete pedestrian data that were not occlusion-processed, keeping the pedestrian image, pedestrian identity label and saliency mask label $(I_F, c_F, s_F)$ unchanged, and assigning the occluded/non-occluded binary label $o_F = 0$, indicating an unoccluded (complete) pedestrian;
(1-8) through the cross-domain simulator, the data used for network training become the combination of complete pedestrian data and simulated occluded pedestrian data, denoted $D_{F/O'}$. As the number of training iteration rounds increases, the share of simulated occluded pedestrian data in the training data grows, that is, more and more complete pedestrian data are selected for occlusion processing, until training stops. The network thus acquires robustness to occlusion by observing many different kinds of occluded images.
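The growth of the proportion p over training, and the resulting split of each epoch's data into occluded (label 1) and complete (label 0) samples, can be sketched as follows. The text states only that p increases with the number of iteration rounds; the linear schedule, the bounds `p_start` and `p_end`, and the function names are assumptions made for this illustration.

```python
import random


def occlusion_probability(epoch, total_epochs, p_start=0.1, p_end=0.5):
    """Linearly increase p from p_start to p_end (assumed schedule)."""
    if total_epochs <= 1:
        return p_end
    fraction = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return p_start + fraction * (p_end - p_start)


def assign_occlusion_labels(num_samples, p, rng=None):
    """Select round(p * N) samples for occlusion; return one 0/1 label each."""
    rng = rng or random.Random()
    chosen = set(rng.sample(range(num_samples), round(p * num_samples)))
    return [1 if i in chosen else 0 for i in range(num_samples)]


# Share of samples sent through the occlusion simulator in each epoch.
for epoch in range(5):
    p = occlusion_probability(epoch, total_epochs=5)
    labels = assign_occlusion_labels(1000, p, random.Random(epoch))
    print(epoch, round(p, 2), sum(labels))
```

Samples labelled 1 would be passed through the occlusion simulation of steps (1-2) to (1-6), while samples labelled 0 are used unchanged, so the combined set shifts gradually from mostly complete to increasingly occluded data.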
S3, inputting the combined data set of complete pedestrians and simulated occluded pedestrians into the teacher network, i.e. the pedestrian re-identification network with joint saliency detection, for training. The teacher network has three main parts: a feature extraction backbone and two different branches. The backbone uses ResNet-50 as its base deep network, with the fully connected layer removed, and acts as the feature extractor $f(\cdot)$ of the network.
One branch is the joint saliency detection branch, shown in fig. 3. The branch computes the classification loss of each pixel and then back-propagates the accumulated per-pixel losses, so that each pixel of the predicted mask is classified as foreground or background, where the foreground is the pedestrian region and the background is the non-pedestrian region. The branch can thereby mark the salient pedestrian region in the image. It uses pixel-level softmax cross-entropy as its loss function, that is, it classifies whether each pixel belongs to the pedestrian region. With the joint saliency detection branch denoted $h(\cdot)$ and the corresponding loss denoted $L_S$, the loss function is expressed as:
$$L_S(I_F, S_F) = \sum_{i=1}^{N_F} \sum_{p=1}^{H_F} \sum_{q=1}^{W_F} L_{softmax}\big(h(f(I_i^F))_{(p,q)},\ s_i^F(p,q)\big)$$
The other branch is the pedestrian classification branch, which classifies pedestrian identity.
In one embodiment, the pedestrian classification branch uses softmax cross-entropy as the loss function, with each pedestrian as one category. Denoted $L_C$, it is expressed as:
$$L_C(I_F, C_F) = \sum_{i=1}^{N_F} L_{softmax}\big(g(f(I_i^F)),\ c_i^F\big)$$
where $g(\cdot)$ denotes the pedestrian identity classifier.
In this embodiment, the loss functions of the two branches above are given different weights to form the loss function of the pedestrian re-identification network with joint saliency detection, expressed as:
$$L(I_F, C_F, S_F) = \alpha L_C(I_F, C_F) + (1-\alpha) L_S(I_F, S_F)$$
where $\alpha$ is a hyperparameter that balances the weights of the two branch loss functions. $\alpha$ takes values between 0 and 1 and is set greater than 0.5, so that pedestrian classification is the main task and the joint saliency detection branch assists the network in attending to the pedestrian.
As another embodiment, in addition to the pedestrian identity loss function L_C, the pedestrian classification and identification branch incorporates an auxiliary occluded/non-occluded binary classification loss function L_O, forming a multitask loss function denoted L_M:
$$L_M(I^{F/O'}, C^{F/O'}, O^{F/O'}) = \beta L_C(I^{F/O'}, C^{F/O'}) + (1-\beta) L_O(I^{F/O'}, O^{F/O'})$$
where β ∈ (0, 1) is a hyperparameter that balances the weights of the two loss functions in the multitask loss function. β is typically set greater than 0.5 so that pedestrian identity classification remains the main task.
The two branches of the teacher network share the feature extraction backbone, and each branch feeds its supervision information back to the backbone, thereby guiding the feature extraction process. Combining the two branches, the overall loss function is:
$$L(I^{F/O'}, C^{F/O'}, O^{F/O'}, S^{F/O'}) = \alpha L_M(I^{F/O'}, C^{F/O'}, O^{F/O'}) + (1-\alpha) L_S(I^{F/O'}, S^{F/O'})$$
where α ∈ (0, 1) is a hyperparameter that balances the weights of the two branches in the network. According to the task requirements, α is set greater than 0.5, so that pedestrian classification and identification is the main task and the joint saliency detection branch assists the network in focusing on the pedestrian.
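The weighting of the three loss terms can be made concrete with a short Python sketch. It assumes the three losses have already been computed as plain numbers; the function names and the values α = 0.8, β = 0.9 and the loss values are hypothetical, chosen only to satisfy the stated condition that both hyperparameters lie in (0, 1) and exceed 0.5.

```python
def multitask_loss(l_c, l_o, beta):
    """L_M = beta * L_C + (1 - beta) * L_O."""
    return beta * l_c + (1.0 - beta) * l_o

def teacher_loss(l_c, l_o, l_s, alpha, beta):
    """Overall loss L = alpha * L_M + (1 - alpha) * L_S."""
    if not (0.0 < alpha < 1.0 and 0.0 < beta < 1.0):
        raise ValueError("alpha and beta must lie strictly between 0 and 1")
    return alpha * multitask_loss(l_c, l_o, beta) + (1.0 - alpha) * l_s

# Hypothetical loss values and weights, for illustration only.
l_c, l_o, l_s = 2.0, 0.5, 1.0
alpha, beta = 0.8, 0.9
total = teacher_loss(l_c, l_o, l_s, alpha, beta)

# Effective weight of each term in the overall loss.
weights = {
    "identity": alpha * beta,
    "occlusion": alpha * (1.0 - beta),
    "saliency": 1.0 - alpha,
}
print(total, weights)
```

With these illustrative values the identity loss carries an effective weight of αβ = 0.72, the occlusion loss α(1-β) = 0.08 and the saliency loss 1-α = 0.20, so the three weights sum to 1 and identity classification dominates.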
As teacher network training progresses, the network's ability to discriminate pedestrian identities strengthens and its ability to detect salient pedestrians keeps improving, which reflects that the network focuses its attention on the pedestrian. Through the joint training of the two branches, the network's earlier global attention gradually shifts to the salient pedestrian region, its robustness to occlusion keeps increasing, and the model extracts better features for subsequent matching in occluded pedestrian re-identification.
S4. After teacher network training is finished, the joint saliency detection branch of the teacher network can predict pedestrian regions, and its ability to detect salient pedestrians has been strengthened by the training in step S3. This branch is therefore used to generate salient pedestrian mask labels for the real occluded pedestrian data, providing material for student network training.
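A minimal Python sketch of this label generation step follows. It assumes the saliency branch outputs a (background, foreground) logit pair per pixel and takes the more probable class as the mask value. The sketch also computes an average confidence so that the screening rule of claim 3 (stated there for step S1) could be applied; the source does not define "average confidence" and does not say whether step S4 screens its labels, so that definition, the function names and the threshold 0.8 are all assumptions.

```python
import math

def foreground_probability(bg_logit, fg_logit):
    """Softmax probability of the foreground (pedestrian) class for one pixel."""
    m = max(bg_logit, fg_logit)
    e_bg = math.exp(bg_logit - m)
    e_fg = math.exp(fg_logit - m)
    return e_fg / (e_bg + e_fg)

def generate_mask_label(pred_logits):
    """Return (mask, mean_confidence) for one image.

    mask[p][q] is 1 where the foreground class is more probable, else 0.
    mean_confidence averages the winning-class probability over all pixels
    (an assumed definition of "average confidence").
    """
    mask, confidences = [], []
    for row in pred_logits:
        mask_row = []
        for bg_logit, fg_logit in row:
            p_fg = foreground_probability(bg_logit, fg_logit)
            mask_row.append(1 if p_fg > 0.5 else 0)
            confidences.append(max(p_fg, 1.0 - p_fg))
        mask.append(mask_row)
    return mask, sum(confidences) / len(confidences)

def keep_sample(mean_confidence, threshold):
    """Keep a sample only if its confidence is strictly above the threshold."""
    return mean_confidence > threshold

# Hypothetical branch output for a 2x2 image.
logits = [[(3.0, -3.0), (-2.0, 2.0)],
          [(-4.0, 4.0), (0.0, 0.0)]]
mask, conf = generate_mask_label(logits)
print(mask, conf, keep_sample(conf, threshold=0.8))
```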
S5. To achieve better results in real applications, training on real occluded pedestrian data is carried out on the basis of the teacher network trained in step S3. This training is performed by the student network, whose structure follows that of the teacher network and which starts from the teacher network's model. The student network is trained on the real occluded pedestrian data until its parameters converge, yielding the final network model.
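The hand-over from teacher to student can be sketched as a parameter copy followed by continued gradient updates. The example below is a toy illustration under stated assumptions: parameters are plain lists of floats, the parameter names, gradients and learning rate are hypothetical, and a single plain gradient-descent step stands in for training on real occluded data.

```python
import copy

def init_student(teacher_params):
    """The student starts from an independent copy of the teacher's parameters."""
    return copy.deepcopy(teacher_params)

def sgd_step(params, grads, lr):
    """One plain gradient-descent update, applied in place to params."""
    for name, values in params.items():
        params[name] = [w - lr * g for w, g in zip(values, grads[name])]

# Hypothetical teacher parameters after step S3.
teacher = {"backbone": [0.5, -1.0], "id_head": [0.2], "saliency_head": [1.5]}
student = init_student(teacher)

# Hypothetical gradients computed on a batch of real occluded data.
grads = {"backbone": [0.1, -0.2], "id_head": [0.0], "saliency_head": [0.3]}
sgd_step(student, grads, lr=0.1)
print(teacher["backbone"], student["backbone"])
```

Because the student holds its own copy, continued training changes the student only and leaves the teacher model available for generating mask labels.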
In this embodiment, the effectiveness of the method is demonstrated through experiments on four occluded pedestrian databases: Occluded-REID, Partial-REID, P-DukeMTMC-reID and P-ETHZ. The Occluded-REID database contains 200 different pedestrians, each with 5 images under different types of occlusion and 5 non-occluded images, for a total of 2000 images. In the experiments, occluded pedestrian images are used as the query set and non-occluded pedestrian images as the gallery set; the images of 100 pedestrians form the training set, and the remaining 100 pedestrians form the test set. The Partial-REID database contains 900 images of 60 different pedestrians. Each pedestrian has 5 images with different types of occlusion, 5 partial images with the occlusion removed and 5 non-occluded full-body images; 30 pedestrians are selected for training and the remaining 30 for testing. The P-DukeMTMC-reID database contains 24143 images of 1299 pedestrians, each pedestrian having a non-occluded full-body image and multiple occluded images; 665 pedestrians form the training set and the remaining 634 form the test set. The P-ETHZ database contains 3897 images of 85 pedestrians, of which 43 form the training set and the remaining 42 form the test set.
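The database statistics above can be cross-checked with a few lines of Python. The numbers are taken from the paragraph above, with the 665 P-DukeMTMC-reID identities read as the training set; the dictionary layout and function names are hypothetical.

```python
# Identity and image counts as stated in the description.
datasets = {
    "Occluded-REID": {"identities": 200, "images": 2000, "train_ids": 100, "test_ids": 100},
    "Partial-REID": {"identities": 60, "images": 900, "train_ids": 30, "test_ids": 30},
    "P-DukeMTMC-reID": {"identities": 1299, "images": 24143, "train_ids": 665, "test_ids": 634},
    "P-ETHZ": {"identities": 85, "images": 3897, "train_ids": 43, "test_ids": 42},
}

def check_split(stats):
    """True if the training and test identities together cover all identities."""
    return stats["train_ids"] + stats["test_ids"] == stats["identities"]

def images_per_identity(stats):
    """Average number of images per pedestrian identity."""
    return stats["images"] / stats["identities"]

split_ok = {name: check_split(s) for name, s in datasets.items()}
average = {name: images_per_identity(s) for name, s in datasets.items()}

for name in datasets:
    print(name, split_ok[name], round(average[name], 2))
```

Occluded-REID and Partial-REID come out at exactly 10 and 15 images per identity, matching the stated 5 + 5 and 5 + 5 + 5 images per pedestrian, and in all four databases the training and test identities add up to the total.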
The invention uses ResNet-50 as the initialization network and verifies the effectiveness of three components of the teacher-student learning network, namely the teacher network, the joint saliency detection branch and the cross-domain simulator, on the Occluded-REID, Partial-REID, P-DukeMTMC-reID and P-ETHZ databases, as shown in Table 1:
TABLE 1 Effect of each component of the invention
Figure BDA0002315463560000111
As can be seen from Table 1, adding the teacher network greatly improves the recognition accuracy on the four databases, by 38.50%, 40.33%, 6.34% and 17.38%, respectively. The large scale of the P-DukeMTMC-reID data increases the difficulty of the student practice process, so the gain from adding the teacher network is smaller on this database than on the other three. Adding the joint saliency detection branch improves the recognition accuracy by a further 5.80%, 6.67%, 4.69% and 5.95%, respectively, and adding the cross-domain simulator improves it by 5.17%, 10.00%, 4.15% and 8.58%, respectively. The results in Table 1 show that combining the teacher network, the joint saliency detection branch and the cross-domain simulator achieves the best effect.
This embodiment also compares the method of the invention with existing mainstream methods based on conventional descriptors and on deep networks. The comparison results on the Occluded-REID, Partial-REID, P-DukeMTMC-reID and P-ETHZ databases are shown in Table 2.
TABLE 2 Comparison of the invention with mainstream algorithms
Figure BDA0002315463560000121
The results in Table 2 show that the recognition accuracy of the invention on the Occluded-REID, Partial-REID, P-DukeMTMC-reID and P-ETHZ databases reaches 73.69%, 82.67%, 51.42% and 62.86%, respectively. This is superior to most mainstream pedestrian re-identification algorithms and indicates that the invention reaches an advanced level in the field on the occluded pedestrian re-identification problem.
The techniques described herein may be implemented by various means. For example, these techniques may be implemented in hardware, firmware, software, or a combination thereof. For a hardware implementation, the processing modules may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Programmable Logic Devices (PLDs), field-programmable gate arrays (FPGAs), processors, controllers, micro-controllers, electronic devices, other electronic units designed to perform the functions described herein, or a combination thereof.
For a firmware and/or software implementation, the techniques may be implemented with modules (e.g., procedures, steps, flows, and so on) that perform the functions described herein. The firmware and/or software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
Those of ordinary skill in the art will understand that: all or part of the steps for realizing the method embodiments can be completed by hardware related to program instructions, the program can be stored in a computer readable storage medium, and the program executes the steps comprising the method embodiments when executed; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (10)

1. An occluded pedestrian re-identification method based on a teacher-student learning framework, characterized by comprising the following steps:
first, training a teacher network, which uses existing large-scale complete pedestrian data to simulate the training process of occluded pedestrian re-identification, wherein this process is realized by a joint saliency detection network with a cross-domain simulator and constitutes the teacher teaching process;
then, transferring the teacher network to a student network, so that the student network, starting from the model of the teacher network, continues training on real small-scale occluded pedestrian data, wherein this process constitutes the student practice process;
and finally, obtaining through the teacher teaching and student practice processes a model that is both discriminative for pedestrians and robust to occlusion, the model being used for occluded pedestrian re-identification.
2. The occluded pedestrian re-identification method based on the teacher-student learning framework according to claim 1, comprising the steps of:
S1, generating salient pedestrian mask labels for the complete pedestrian data and screening the generated samples, wherein the complete pedestrian data comprises complete non-occluded pedestrian images, the corresponding pedestrian identity labels and the salient pedestrian mask labels, thereby providing initial material for model training;
S2, establishing a cross-domain simulator that uses the complete pedestrian data to simulate the occluded pedestrian re-identification process, wherein the cross-domain simulator sets an occlusion selection probability, selects a certain proportion of the complete pedestrian data in each data loading pass for occlusion simulation, and assigns new label information, thereby providing a reliable data source for teacher network training;
S3, inputting the data processed by the cross-domain simulator into the pedestrian re-identification network with joint saliency detection, namely the teacher network, for training, wherein the proportion of simulated occluded pedestrian data relative to complete pedestrian data increases as the training iterations proceed, and training continues through repeated forward propagation and back-propagation until the network loss converges, yielding a base model that recognizes pedestrians robustly under occlusion;
S4, generating salient pedestrian mask labels for real occluded pedestrian data using the joint saliency detection branch of the pedestrian re-identification network with joint saliency detection obtained in step S3, thereby providing material for student network training;
S5, in the student practice process, building the student network with reference to the teacher network of step S3, inheriting the network parameters obtained by the teacher network in step S3, and continuing training on real occluded pedestrian data until convergence over multiple training rounds, yielding the final network model.
3. The occluded pedestrian re-identification method based on the teacher-student learning framework according to claim 2, wherein in step S1 the generated samples are screened as follows: if the average confidence of the salient pedestrian mask label generated for a sample is higher than a preset threshold, the sample is kept as training data; if the average confidence is lower than or equal to the preset threshold, the sample is rejected.
4. The occluded pedestrian re-identification method based on the teacher-student learning framework according to claim 2, wherein in step S2 a cross-domain simulator is established to realize a gradual transition from the complete pedestrian data domain to the simulated occluded pedestrian data domain, comprising the steps of:
(1-1) loading all complete pedestrian data and randomly selecting part of the complete pedestrian data, according to a proportion p, for occlusion processing;
(1-2) calculating the image area of each selected complete pedestrian sample and obtaining the size of the occluded area from the set occlusion ratio;
(1-3) selecting a background block from the background area of the complete pedestrian image and scaling it to the size of the occluded area with a variable aspect ratio, so that a different occluding block is obtained in each operation;
(1-4) generating a black block of the same size and shape as the occluding block, to be used for processing the salient pedestrian mask label;
(1-5) covering a randomly selected position of the complete pedestrian image with the occluding block and covering the same position of the salient pedestrian mask label with the black block, thereby completing the simulated occlusion at the image level, while the pedestrian identity label remains unchanged;
(1-6) assigning to each simulated occluded pedestrian sample a new occluded/non-occluded binary label with the value 1, indicating that the pedestrian is occluded;
(1-7) assigning to the complete pedestrian data that was not occlusion-processed an occluded/non-occluded binary label with the value 0, indicating that the pedestrian is not occluded, while its pedestrian image, salient pedestrian mask label and pedestrian identity label remain unchanged;
(1-8) repeating steps (1-1) - (1-6) in each iteration of teacher network training, wherein the proportion p increases with the number of iteration rounds, so that more and more complete pedestrian data is selected for occlusion processing until training stops.
5. The occluded pedestrian re-identification method based on the teacher-student learning framework according to claim 2, wherein in step S3 the pedestrian re-identification network with joint saliency detection consists of three parts: a feature extraction backbone, a joint saliency detection branch and a pedestrian classification and identification branch; the feature extraction backbone adopts the ResNet-50 deep network with the fully connected layer removed; the joint saliency detection branch predicts the salient pedestrian mask, using a softmax loss function to classify each pixel as belonging to the pedestrian region or not; the pedestrian classification and identification branch classifies the identity of the pedestrian, using a softmax loss function with each pedestrian as a category; the backbone and the two branches jointly train the model;
the joint saliency detection branch computes the classification loss of each pixel, accumulates the per-pixel losses for back-propagation, and classifies each pixel of the predicted mask as foreground or background, the foreground being the pedestrian region and the background the non-pedestrian region, so that the branch marks the pedestrian region in the image as salient.
6. The occluded pedestrian re-identification method based on the teacher-student learning framework according to claim 5, wherein the loss function L_S of the joint saliency detection branch is expressed as:
$$L_S(I^F, S^F) = \sum_{i=1}^{N^F} \sum_{p=1}^{H^F} \sum_{q=1}^{W^F} L_{softmax}\big(h(f(I_i^F))(p,q),\; s_i^F(p,q)\big)$$
where L_softmax denotes the softmax function with the cross entropy loss function; f(·) and h(·) denote the feature extractor and the salient pedestrian detector, respectively; I_i^F denotes the i-th image of the set D^F; D^F denotes the complete pedestrian data set, which comprises C^F pedestrian identities and N^F images in total; c_i^F and s_i^F denote the corresponding pedestrian identity class label and salient pedestrian mask label, respectively, where c^F ∈ {1, 2, …, C^F} and s^F ∈ S^F; (p, q) denotes the position coordinates of a pixel in the image; and H^F and W^F denote the height and width of the image.
7. The occluded pedestrian re-identification method based on the teacher-student learning framework according to claim 6, wherein the pedestrian classification and identification branch classifies the identity of the pedestrian using a softmax loss function, denoted L_C and expressed as:
$$L_C(I^F, C^F) = \sum_{i=1}^{N^F} L_{softmax}\big(g(f(I_i^F)),\; c_i^F\big)$$
where g(·) denotes the pedestrian identity classifier; D^F denotes the complete pedestrian data set, which comprises C^F pedestrian identities and N^F images in total; I_i^F denotes the i-th image of the data set D^F; c_i^F denotes its pedestrian identity class label; and L_softmax denotes the softmax function with the cross entropy loss function.
8. The occluded pedestrian re-identification method based on the teacher-student learning framework according to claim 7, wherein the loss functions of the joint saliency detection branch and the pedestrian classification and identification branch are given different weights to form the loss function of the pedestrian re-identification network with joint saliency detection, expressed as:
$$L(I^F, C^F, S^F) = \alpha L_C(I^F, C^F) + (1-\alpha) L_S(I^F, S^F)$$
where α is a hyperparameter that balances the weights of the two branch loss functions and takes a value between 0 and 1.
9. The occluded pedestrian re-identification method based on the teacher-student learning framework according to claim 6, wherein in the pedestrian classification and identification branch of the pedestrian re-identification network with joint saliency detection of step S3, the occluded/non-occluded binary classification loss function L_O is expressed as:
$$L_O(I^{F/O'}, O^{F/O'}) = \sum_{i} L_{softmax}\big(b(f(I_i^{F/O'})),\; o_i^{F/O'}\big)$$
where b(·) denotes the occluded/non-occluded binary classifier; I^{F/O'} denotes the new set used for network training, obtained by combining the occlusion-processed images with the images that were not occlusion-processed; I_i^{F/O'} denotes the i-th sample in I^{F/O'}; and o_i^{F/O'} takes the value 0 or 1, where 0 denotes a complete pedestrian sample and 1 denotes a simulated occluded pedestrian sample;
with the occluded/non-occluded binary classification loss function added, the loss function of the pedestrian classification and identification branch becomes a multitask loss function, denoted L_M and expressed as:
$$L_M(I^{F/O'}, C^{F/O'}, O^{F/O'}) = \beta L_C(I^{F/O'}, C^{F/O'}) + (1-\beta) L_O(I^{F/O'}, O^{F/O'})$$
where I^{F/O'} denotes the new set used for network training, obtained by combining the occlusion-processed images with the images that were not occlusion-processed; C^{F/O'} denotes the pedestrian identity label of each sample, namely its classification label; O^{F/O'} denotes the occluded/non-occluded attribute of each sample, namely its occluded or non-occluded label; L_C and L_O denote the pedestrian identity classification loss function and the occluded/non-occluded binary classification loss function, respectively; and β ∈ (0, 1) is a hyperparameter that balances the weights of the two loss functions.
10. The occluded pedestrian re-identification method based on the teacher-student learning framework according to claim 9, wherein, with the cross-domain simulator added, the loss functions of the joint saliency detection branch and the pedestrian classification and identification branch are given different weights to form the loss function of the pedestrian re-identification network with joint saliency detection, expressed as:
$$L(I^{F/O'}, C^{F/O'}, O^{F/O'}, S^{F/O'}) = \alpha L_M(I^{F/O'}, C^{F/O'}, O^{F/O'}) + (1-\alpha) L_S(I^{F/O'}, S^{F/O'})$$
where α is a hyperparameter that balances the weights of the two branch loss functions and takes a value between 0 and 1.
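The cross-domain simulator steps (1-1) to (1-8) of claim 4 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: images and masks are small lists of lists, the occluding block is filled with one sampled background pixel value instead of a cropped and rescaled background patch, and the linear schedule for p with its start and end values is hypothetical (the source only states that p grows with the number of iterations).

```python
import random

def occlusion_probability(epoch, total_epochs, p_start=0.1, p_end=0.9):
    """Hypothetical linear schedule: p grows as training proceeds (step 1-8)."""
    t = epoch / max(1, total_epochs - 1)
    return p_start + (p_end - p_start) * t

def simulate_occlusion(image, mask, ratio, rng):
    """Return an occluded copy of (image, mask), following steps (1-2) to (1-5).

    mask holds 1 for pedestrian pixels and 0 for background pixels.
    """
    h, w = len(image), len(image[0])
    area = max(1, int(ratio * h * w))            # (1-2) size of the occluded area
    block_h = rng.randint(1, min(h, area))       # (1-3) variable aspect ratio
    block_w = max(1, min(w, area // block_h))    # clipped, so the area is approximate
    background = [image[y][x] for y in range(h) for x in range(w) if mask[y][x] == 0]
    fill = rng.choice(background) if background else 0
    top = rng.randint(0, h - block_h)            # (1-5) random position
    left = rng.randint(0, w - block_w)
    new_image = [row[:] for row in image]
    new_mask = [row[:] for row in mask]
    for y in range(top, top + block_h):
        for x in range(left, left + block_w):
            new_image[y][x] = fill               # occluding block on the image
            new_mask[y][x] = 0                   # (1-4) black block on the mask label
    return new_image, new_mask

def load_batch(samples, p, ratio, rng):
    """Apply steps (1-1), (1-6) and (1-7) to a list of (image, mask, identity)."""
    batch = []
    for image, mask, identity in samples:
        if rng.random() < p:                     # (1-1) select with probability p
            image, mask = simulate_occlusion(image, mask, ratio, rng)
            occluded = 1                         # (1-6) occluded label
        else:
            occluded = 0                         # (1-7) non-occluded label
        batch.append((image, mask, identity, occluded))
    return batch

# Hypothetical 4x4 sample: pedestrian pixels are 200, background pixels are 10.
image = [[10, 10, 10, 10], [10, 200, 200, 10], [10, 200, 200, 10], [10, 10, 10, 10]]
mask = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
samples = [(image, mask, 7)]
rng = random.Random(0)
occluded_batch = load_batch(samples, p=1.0, ratio=0.25, rng=rng)
clean_batch = load_batch(samples, p=0.0, ratio=0.25, rng=rng)
```

Calling `load_batch` once per training iteration with `p = occlusion_probability(epoch, total_epochs)` would reproduce the gradual shift from complete to simulated occluded data; the identity label passes through unchanged in both cases.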
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911289053.9A CN110956158A (en) 2019-12-12 2019-12-12 Pedestrian shielding re-identification method based on teacher and student learning frame

Publications (1)

Publication Number Publication Date
CN110956158A true CN110956158A (en) 2020-04-03

Family

ID=69981622

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911289053.9A Pending CN110956158A (en) 2019-12-12 2019-12-12 Pedestrian shielding re-identification method based on teacher and student learning frame

Country Status (1)

Country Link
CN (1) CN110956158A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108596211A (en) * 2018-03-29 2018-09-28 中山大学 Occluded pedestrian re-identification method based on centralized learning and deep network learning
CN108960127A (en) * 2018-06-29 2018-12-07 厦门大学 Occluded pedestrian re-identification method based on adaptive deep metric learning
CN109934177A (en) * 2019-03-15 2019-06-25 艾特城信息科技有限公司 Pedestrian re-identification method, system and computer-readable storage medium
CN110197154A (en) * 2019-05-30 2019-09-03 汇纳科技股份有限公司 Pedestrian re-identification method, system, medium and terminal integrating three-dimensional mapping of body-part textures
CN110321801A (en) * 2019-06-10 2019-10-11 浙江大学 Clothes-changing pedestrian re-identification method and system based on autoencoder network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JIAXUAN ZHUO ET AL.: "A Novel Teacher-Student Learning Framework For Occluded Person Re-Identification" *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111738289A (en) * 2020-05-09 2020-10-02 北京三快在线科技有限公司 Computer vision CV model training method and device, electronic equipment and storage medium
CN111783606A (en) * 2020-06-24 2020-10-16 北京百度网讯科技有限公司 Training method, device, equipment and storage medium of face recognition network
CN111783606B (en) * 2020-06-24 2024-02-20 北京百度网讯科技有限公司 Training method, device, equipment and storage medium of face recognition network
CN111553333A (en) * 2020-07-10 2020-08-18 支付宝(杭州)信息技术有限公司 Face image recognition model training method, recognition method, device and electronic equipment
CN111553333B (en) * 2020-07-10 2020-10-16 支付宝(杭州)信息技术有限公司 Face image recognition model training method, recognition method, device and electronic equipment
CN111814705A (en) * 2020-07-14 2020-10-23 广西师范大学 Pedestrian re-identification method based on batch blocking shielding network
CN111814705B (en) * 2020-07-14 2022-08-02 广西师范大学 Pedestrian re-identification method based on batch blocking shielding network
CN112149542A (en) * 2020-09-15 2020-12-29 北京字节跳动网络技术有限公司 Training sample generation method, image classification method, device, equipment and medium
CN113076917A (en) * 2021-04-20 2021-07-06 南京甄视智能科技有限公司 Pedestrian quality evaluation method and system
CN113076917B (en) * 2021-04-20 2022-08-12 南京甄视智能科技有限公司 Pedestrian quality evaluation method and system
CN113505797A (en) * 2021-09-09 2021-10-15 深圳思谋信息科技有限公司 Model training method and device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
Tabernik et al. Deep learning for large-scale traffic-sign detection and recognition
CN110956158A (en) Pedestrian shielding re-identification method based on teacher and student learning frame
CN109034044B (en) Pedestrian re-identification method based on fusion convolutional neural network
CN109359559B (en) Pedestrian re-identification method based on dynamic shielding sample
Wang et al. Actionness estimation using hybrid fully convolutional networks
CN111027493B (en) Pedestrian detection method based on deep learning multi-network soft fusion
CN108596211B (en) Shielded pedestrian re-identification method based on centralized learning and deep network learning
CN110210551A (en) A kind of visual target tracking method based on adaptive main body sensitivity
CN111767882A (en) Multi-mode pedestrian detection method based on improved YOLO model
CN109902806A (en) Method is determined based on the noise image object boundary frame of convolutional neural networks
CN109711262B (en) Intelligent excavator pedestrian detection method based on deep convolutional neural network
CN106845499A (en) A kind of image object detection method semantic based on natural language
CN106096602A (en) A kind of Chinese licence plate recognition method based on convolutional neural networks
CN108304798A (en) The event video detecting method of order in the street based on deep learning and Movement consistency
CN107133569A (en) The many granularity mask methods of monitor video based on extensive Multi-label learning
CN108960184A (en) A kind of recognition methods again of the pedestrian based on heterogeneous components deep neural network
CN104021381B (en) Human movement recognition method based on multistage characteristics
Yu et al. Research of image main objects detection algorithm based on deep learning
Fadaeddini et al. A deep residual neural network for low altitude remote sensing image classification
CN104680193B (en) Online objective classification method and system based on quick similitude network integration algorithm
CN111368660A (en) Single-stage semi-supervised image human body target detection method
CN114758288A (en) Power distribution network engineering safety control detection method and device
Janku et al. Fire detection in video stream by using simple artificial neural network
CN108345866B (en) Pedestrian re-identification method based on deep feature learning
Li et al. Pedestrian detection based on light perception fusion of visible and thermal images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200403