CN110956158A

CN110956158A - Occluded pedestrian re-identification method based on a teacher-student learning framework

Info

Publication number: CN110956158A
Application number: CN201911289053.9A
Authority: CN
Inventors: 赖剑煌; 卓嘉璇; 陈培佳
Original assignee: National Sun Yat Sen University
Current assignee: Sun Yat Sen University; National Sun Yat Sen University
Priority date: 2019-12-12
Filing date: 2019-12-12
Publication date: 2020-04-03

Abstract

The invention discloses an occluded pedestrian re-identification method based on a teacher-student learning framework, comprising the following steps. First, a teacher network is trained: existing large-scale complete (non-occluded) pedestrian data are used to simulate the training process of occluded pedestrian re-identification, which is realized by a pedestrian re-identification network with joint saliency detection together with a cross-domain simulator; this is the teacher teaching process. Next, the teacher network is transferred to a student network, so that the student network, starting from the teacher network's model, continues training on small-scale real occluded pedestrian data; this is the student practice process. Finally, through the teacher teaching and student practice processes, a model with both pedestrian discriminability and occlusion robustness is obtained, which can be used for occluded pedestrian re-identification. The invention can greatly improve performance on the existing occluded pedestrian re-identification task and has wide application value.

Description

Occluded pedestrian re-identification method based on a teacher-student learning framework

Technical Field

The invention relates to a pedestrian re-identification method under occlusion conditions, and in particular to an occluded pedestrian re-identification method based on a teacher-student learning framework with multi-stage cross-domain learning.

Background

The pedestrian re-identification task is to retrieve pedestrians with the same identity across different cameras under varying conditions such as time, viewing angle and illumination. With the rapid development of intelligent surveillance systems, pedestrian re-identification technology is used in various public applications to find specific pedestrians, such as criminals, children and missing persons, across different cameras. In real scenes, however, pedestrians captured by a camera are often occluded by static or dynamic obstacles in the surrounding environment, such as other pedestrians, moving vehicles, buildings, flowers and trees. This causes loss of information about the target subject and introduces interference from the occluder, which degrades re-identification performance. Because occlusion is an unavoidable and non-negligible challenge in pedestrian re-identification and has important practical significance, occluded pedestrian re-identification has become a valuable key topic in computer vision.

General research on pedestrian re-identification can be divided into two main aspects: feature extraction and metric learning. Feature extraction extracts the important descriptive information that represents the target subject; the result is called a feature descriptor, which should be robust and discriminative to better serve the matching task. Metric learning establishes a metric subspace in which the feature descriptors representing pedestrians are matched: descriptors of the same identity are pulled close together in the metric space, and descriptors of different identities are pushed apart, thereby achieving classification and identification of pedestrian identity. Although general pedestrian re-identification research is now relatively mature, general methods still struggle with occluded pedestrian re-identification. This is mainly because a general pedestrian re-identification method attends to the whole image and is therefore more strongly affected by the occluded part.

In recent years, some researchers have proposed solutions to the occluded pedestrian re-identification problem. The main idea is to extract local features by partitioning the image into blocks, select the non-occluded local parts for matching and similarity measurement to reduce the influence of the occluded region, and combine these with global features to achieve better occluded pedestrian re-identification. Although such schemes achieve a certain effect, features must be extracted separately from every block, which greatly increases computational complexity, and block misalignment easily occurs, which harms re-identification. Moreover, occluded pedestrian re-identification has only been studied in recent years, and the research is limited by insufficient occlusion training data, so progress on the problem has been slow.

In view of the problems of previous research and the limitation of training data, an occluded pedestrian re-identification method based on a teacher-student learning framework is proposed, which has important research significance and practical value.

Disclosure of Invention

Aiming at the difficulties of the occluded pedestrian re-identification task, the invention provides an occluded pedestrian re-identification method based on a teacher-student learning framework. The method can achieve occluded pedestrian re-identification through training on small-scale real occlusion data, and has the advantages of a high matching rate and strong robustness.

The purpose of the invention is achieved by the following technical scheme: an occluded pedestrian re-identification method based on a teacher-student learning framework, comprising the following steps:

First, a teacher network is trained: existing large-scale complete pedestrian data are used to simulate the training process of occluded pedestrian re-identification, which is realized by a pedestrian re-identification network with joint saliency detection together with a cross-domain simulator; this is the teacher teaching process. Then the teacher network is transferred to the student network, so that the student network, starting from the teacher network's model, continues training on small-scale real occluded pedestrian data; this is the student practice process. Finally, through the teacher teaching and student practice processes, a model with both pedestrian discriminability and occlusion robustness is obtained, which can be used for occluded pedestrian re-identification. The invention can greatly improve performance on the existing occluded pedestrian re-identification task and has wide application value.

Specifically, the occluded pedestrian re-identification method based on the teacher-student learning framework comprises the following steps:

S1, generating salient pedestrian mask labels for the complete pedestrian data by using an existing salient object detection model, and screening the generated samples, wherein the complete pedestrian data comprise complete non-occluded pedestrian images, the identity labels of the pedestrians and the corresponding salient pedestrian mask labels, providing the initial material for model training;

S2, to realize the teacher teaching process, establishing a cross-domain simulator that uses the complete pedestrian data to simulate the process of occluded pedestrian re-identification; the cross-domain simulator sets an occlusion selection probability, selects a certain proportion of the complete pedestrian data for occlusion simulation each time data are loaded, and assigns new label information, providing a reliable data source for teacher network training;

S3, inputting the data processed by the cross-domain simulator into the pedestrian re-identification network with joint saliency detection, i.e. the teacher network, for training, wherein the proportion of simulated occluded pedestrian data relative to complete pedestrian data increases with the number of training iterations, and training proceeds through repeated forward propagation and backpropagation until the network loss converges, yielding a base model for pedestrian re-identification that is robust to occlusion;

S4, generating salient pedestrian mask labels for the real occluded pedestrian data by using the joint saliency detection branch of the network obtained in step S3, providing material for training the student network;

S5, for the student practice process, constructing the student network with reference to the teacher network of step S3, inheriting the network parameters obtained by the teacher network in step S3, continuing training on the real occluded pedestrian data, and obtaining the final network model after multiple rounds of training to convergence.

Preferably, in step S1, the generated samples are screened as follows: if the average confidence of the salient pedestrian mask label generated for a sample is higher than a preset threshold, the sample is kept as training data; if it is lower than or equal to the preset threshold, the sample is rejected.

Preferably, in step S2, a cross-domain simulator is established to realize a gradual transition from the complete pedestrian data domain to the simulated occluded pedestrian data domain, with the following steps:

(1-1) loading all complete pedestrian data, and randomly selecting a part of the complete pedestrian data according to a proportion p for occlusion processing;

(1-2) for each selected complete pedestrian sample, calculating the image area, and calculating the size of the occlusion region according to the set occlusion ratio;

(1-3) selecting a background block from the background region of the complete pedestrian image, and scaling it to the size of the occlusion region with a variable aspect ratio, so that a different occlusion block is obtained in each operation;

(1-4) generating a black block of the same size and shape as the occlusion block, to be used for processing the salient pedestrian mask label;

(1-5) covering a randomly selected position of the complete pedestrian image with the occlusion block, and covering the same position of the salient pedestrian mask label with the black block, thereby completing the simulated occlusion at the image level, while keeping the pedestrian identity label unchanged;

(1-6) assigning each simulated occluded pedestrian sample a new binary occlusion/non-occlusion label with value 1, indicating that the pedestrian is occluded;

(1-7) assigning the complete pedestrian data that did not undergo occlusion processing an occlusion/non-occlusion label with value 0, indicating that the pedestrian is not occluded (complete), while keeping their pedestrian images, salient mask labels and pedestrian identity labels unchanged;

(1-8) repeating steps (1-1) to (1-6) in each iteration of teacher network training, wherein the proportion p increases as the number of iteration rounds increases, so that more and more complete pedestrian data are selected for occlusion processing, until training stops.
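The selection in steps (1-1) and (1-8) can be illustrated with a minimal Python sketch. The text only states that p grows with the number of iteration rounds; the linear schedule, the bounds `p_start` and `p_end`, and the function names below are assumptions made for illustration.

```python
import random


def occlusion_proportion(epoch, total_epochs, p_start=0.0, p_end=0.5):
    """Proportion p of complete samples to occlude in a given epoch.

    Assumption: p rises linearly from p_start to p_end; the source only
    says that p increases as training proceeds.
    """
    if total_epochs <= 1:
        return p_end
    frac = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return p_start + frac * (p_end - p_start)


def select_for_occlusion(num_samples, p, rng=random):
    """Step (1-1): randomly pick round(p * num_samples) sample indices."""
    k = round(p * num_samples)
    return set(rng.sample(range(num_samples), k))


def occlusion_labels(num_samples, selected):
    """Steps (1-6)/(1-7): label 1 for occluded samples, 0 for complete ones."""
    return [1 if i in selected else 0 for i in range(num_samples)]
```

Calling `occlusion_proportion(epoch, total_epochs)` once per epoch and passing the result to `select_for_occlusion` gives a training set whose occluded share grows gradually, which is the behaviour the steps describe.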

Preferably, in step S3, the pedestrian re-identification network with joint saliency detection consists of three parts: a feature extraction backbone, a joint saliency detection branch and a pedestrian identity classification branch. The feature extraction backbone adopts the ResNet-50 deep network with the fully connected layer removed; the joint saliency detection branch predicts the salient pedestrian mask, using a softmax loss function to classify whether each pixel belongs to the pedestrian region; the pedestrian identity classification branch classifies pedestrian identity, using a softmax loss function with each pedestrian as one category; the backbone and the two branches are trained jointly.

Furthermore, in step S3, the joint saliency detection branch computes the classification loss of each pixel and backpropagates the accumulated per-pixel losses, so that each pixel in the predicted mask is classified as foreground or background, where the foreground is the pedestrian region and the background is the non-pedestrian region; the branch thereby marks the salient pedestrian region in the image.

Specifically, the loss function of the joint saliency detection branch is denoted L^S and is a pixel-level softmax cross-entropy loss.

Here L_softmax denotes the softmax function combined with the cross-entropy loss function, and f(·) and h(·) denote the feature extractor and the salient pedestrian detector, respectively. D_F denotes the complete pedestrian data set, which contains C_F pedestrian identities and N_F images in total; I_F^i denotes the i-th image in D_F; c_F^i and s_F^i denote its pedestrian identity class label and salient pedestrian mask label, respectively, where c_F ∈ {1, 2, …, C_F} and s_F ∈ S_F. (p, q) denotes the position coordinates of a pixel on the image, and H_F and W_F denote the height and width of the image.
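The display formula for L^S is not reproduced in this text. One form consistent with the definitions above is sketched below; it assumes that the per-pixel losses are summed over each image and averaged over the N_F images. The normalization and the subscript notation for the pixel at (p, q) are assumptions, not taken from the source.

```latex
L^{S}(I_F, S_F) = \frac{1}{N_F} \sum_{i=1}^{N_F} \sum_{p=1}^{H_F} \sum_{q=1}^{W_F}
  L_{\mathrm{softmax}}\!\left( h\big(f(I_F^{i})\big)_{(p,q)},\; s_{F,(p,q)}^{i} \right)
```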

In the invention, the pedestrian identity classification branch classifies pedestrian identity and uses softmax cross-entropy as its loss function, denoted L^C.

Here g(·) denotes the pedestrian identity classifier. D_F denotes the complete pedestrian data set, which contains C_F pedestrian identities and N_F images in total; I_F^i denotes the i-th image in D_F; c_F^i denotes its pedestrian identity class label; and L_softmax denotes the softmax function combined with the cross-entropy loss function.
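The display formula for L^C is likewise missing from this text. Under the same assumption of averaging over the N_F images, a form consistent with the definitions of f(·), g(·) and L_softmax would be:

```latex
L^{C}(I_F, C_F) = \frac{1}{N_F} \sum_{i=1}^{N_F}
  L_{\mathrm{softmax}}\!\left( g\big(f(I_F^{i})\big),\; c_F^{i} \right)
```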

In one implementation, the loss functions of the joint saliency detection branch and the pedestrian identity classification branch are given different weights to form the loss function of the pedestrian re-identification network with joint saliency detection, expressed as:

L(I_F,C_F,S_F)＝αL^C(I_F,C_F)+(1-α)L^S(I_F,S_F)

Here α is a hyperparameter that balances the weights of the two branch loss functions. It takes values between 0 and 1 and is set greater than 0.5, so that the pedestrian identity classification branch serves as the main task, while the joint saliency detection branch assists the network in attending to the pedestrian.

Preferably, in step S3, because the cross-domain simulator gives each sample a new label, namely the binary occlusion/non-occlusion label, an occlusion/non-occlusion binary classification loss is added to the pedestrian identity classification branch of the network with joint saliency detection; judging whether the pedestrian is occluded improves the ability of the feature extraction process to attend to the pedestrian. The occlusion/non-occlusion binary classification loss function is denoted L^O.

Here b(·) denotes the occlusion/non-occlusion binary classifier; I_F/O' denotes the new set used for network training, obtained by putting together the occlusion-processed pictures (pictures in the O' data set) and the pictures without occlusion processing (pictures in the F data set); I_F/O'^i denotes the i-th sample in I_F/O'; and its occlusion label o_F/O'^i is 0 or 1, where 0 denotes a complete (non-occluded) pedestrian sample and 1 denotes a simulated occluded pedestrian sample. With the occlusion/non-occlusion binary classification loss added, the loss function of the pedestrian identity classification branch becomes a multi-task loss function, denoted L^M and expressed as:

L^M(I_F/O',C_F/O',O_F/O')＝βL^C(I_F/O',C_F/O')+(1-β)L^O(I_F/O',O_F/O')

where I_F/O' denotes the new set for network training obtained by putting together the occlusion-processed pictures (pictures in the O' data set) and the pictures without occlusion processing (pictures in the F data set); C_F/O' denotes the pedestrian identity labels of the samples, i.e. the classification labels; and O_F/O' denotes the occlusion/non-occlusion attribute of the samples, i.e. the occlusion or non-occlusion labels. L^C and L^O denote the pedestrian identity classification loss function and the occlusion/non-occlusion binary classification loss function, respectively, and β ∈ (0, 1) is a hyperparameter that balances the weights of the two loss functions.
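The display formula for the occlusion/non-occlusion loss L^O does not appear in this text. A form consistent with the description of b(·) and the 0/1 occlusion label is sketched below. The symbol N_{F/O'} for the number of samples in the combined set and the averaging over samples are assumptions introduced here.

```latex
L^{O}(I_{F/O'}, O_{F/O'}) = \frac{1}{N_{F/O'}} \sum_{i=1}^{N_{F/O'}}
  L_{\mathrm{softmax}}\!\left( b\big(f(I_{F/O'}^{i})\big),\; o_{F/O'}^{i} \right)
```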

In another embodiment, the loss function of the joint saliency detection branch and the loss function of the pedestrian identity classification branch, augmented with the labels from the cross-domain simulator, are given different weights to form the loss function of the pedestrian re-identification network with joint saliency detection, expressed as:

L(I_F/O',C_F/O',O_F/O',S_F/O')＝αL^M(I_F/O',C_F/O',O_F/O')+(1-α)L^S(I_F/O',S_F/O')
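The two weighted combinations above can be written as a short Python sketch that takes already-computed scalar loss values. The default values alpha = 0.7 and beta = 0.7 are illustrative assumptions; the text only requires both weights to lie in (0, 1) and suggests values above 0.5.

```python
def multitask_loss(loss_id, loss_occ, beta=0.7):
    """L^M = beta * L^C + (1 - beta) * L^O (beta = 0.7 is an assumed value)."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie in (0, 1)")
    return beta * loss_id + (1.0 - beta) * loss_occ


def total_loss(loss_id, loss_occ, loss_sal, alpha=0.7, beta=0.7):
    """L = alpha * L^M + (1 - alpha) * L^S (alpha = 0.7 is an assumed value)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    return alpha * multitask_loss(loss_id, loss_occ, beta) + (1.0 - alpha) * loss_sal
```

With both weights above 0.5, the identity classification term dominates the total, which matches the stated intent of treating identity classification as the main task and the other two losses as auxiliary signals.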

Compared with the prior art, the invention has the following advantages and beneficial effects:

1. Aiming at two challenges in the occluded pedestrian re-identification task, namely that the extracted features are not robust to occlusion and that occlusion training data sets are insufficient, the invention provides an occluded pedestrian re-identification framework based on a teacher-student learning network. In the teacher teaching stage, the framework uses existing large-scale non-occluded complete pedestrian data to simulate and generate occluded pedestrian data; both kinds of data jointly participate in network training to obtain an occlusion-robust base model. In the student practice stage, the base model is further trained on real occluded pedestrian data, finally yielding a model that is robust to occlusion. The model can learn from a large number of samples with occlusion, breaking through the limitation of insufficient occluded pedestrian data; at the same time, by focusing its attention on the pedestrian, the network mitigates the adverse influence of occluders, effectively improving the performance of the occluded pedestrian re-identification task and achieving the desired re-identification effect under occlusion.

2. For the teacher teaching process, a pedestrian re-identification network with joint saliency detection is designed. Through a shared feature extraction backbone and the mutually assisting joint saliency detection branch and pedestrian identity classification branch, the network attends to the pedestrian region, so that it is not disturbed by the occluded region and gains occlusion robustness.

3. To simulate the occluded pedestrian re-identification process, the invention processes a large-scale complete pedestrian data set and its labels with a cross-domain simulator. The cross-domain simulator selects complete pedestrian data for processing with a probability that increases as training proceeds, which makes the transition from complete pedestrian data to simulated occluded pedestrian data smoother and provides sufficient material for the teacher teaching process.

Drawings

FIG. 1 is a schematic diagram of an embodiment of the method of the present invention.

FIG. 2 is a schematic diagram of the motivation for focusing attention on the pedestrian in the present invention.

FIG. 3 is a schematic diagram of the joint saliency detection branch of the present invention.

Detailed Description

The drawings are for illustrative purposes only and are not to be construed as limiting the patent; the present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.

Examples

As shown in FIG. 2, prior-art methods apply global attention to the occluded pedestrian re-identification task, which causes identification errors. To address this problem, this embodiment provides an occluded pedestrian re-identification method. As shown in FIG. 1, the method applies a teacher-student learning framework to the occluded pedestrian re-identification problem and optimizes and updates parameters in two stages, a teacher network stage and a student network stage. First, given the original large-scale complete pedestrian training samples, occluded pedestrian data are simulated by a cross-domain simulator in the teacher teaching process. The data are then input into the pedestrian re-identification network with joint saliency detection, i.e. the teacher network, for training, yielding a base network with pedestrian discriminability and occlusion robustness. Next, in the student practice process, the student network further trains and adjusts the network on real occluded pedestrian data, starting from the teacher network model. Through the simulated training of teacher network teaching and the actual training of student network practice, the model can learn from a large number of samples with occlusion, breaking through the limitation of insufficient occluded pedestrian data; by focusing its attention on the pedestrian, the network also mitigates the adverse influence of occluders, effectively improving the performance of the occluded pedestrian re-identification task and achieving the desired re-identification effect under occlusion. The steps of the method are described in detail below with reference to the accompanying drawings.

S1, inputting complete pedestrian data, including pedestrian images and pedestrian identity information. An existing salient object detection algorithm is run on each pedestrian image to generate a corresponding salient pedestrian mask, which serves as a new label. The pedestrian samples are then screened: if the average confidence of the salient pedestrian mask label generated for a sample is higher than 0.5, the sample is kept as training data; if it is less than or equal to 0.5, the sample is rejected. After this operation, the resulting complete pedestrian data, comprising the complete pedestrian images, the corresponding identity labels and the salient pedestrian masks, serve as the training data.
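The screening rule in S1 can be sketched in Python as follows. The sketch assumes that each mask is a 2-D grid of per-pixel confidences in [0, 1] and that the "average confidence" is the mean over all pixels; the text does not define the average more precisely, and the dictionary keys are hypothetical.

```python
def mean_confidence(mask):
    """Mean of all per-pixel confidences in a 2-D mask (assumed definition)."""
    values = [v for row in mask for v in row]
    return sum(values) / len(values)


def screen_samples(samples, threshold=0.5):
    """Keep samples whose mask confidence is strictly above the threshold.

    Each sample is a dict with a "mask" entry (hypothetical layout).
    Samples at or below the threshold are rejected, as in step S1.
    """
    return [s for s in samples if mean_confidence(s["mask"]) > threshold]
```

Note that a sample whose average confidence equals the threshold exactly is rejected, matching the "less than or equal to 0.5" condition in the text.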

S2, passing the training data obtained in step S1 through the cross-domain simulator to generate, for each iteration, the combined input data containing both complete pedestrians and occluded pedestrians.

The complete pedestrian data comprise complete non-occluded pedestrian images together with the corresponding pedestrian identity labels and salient pedestrian masks. Here D_F denotes the complete pedestrian data set, which contains C_F pedestrian identities and N_F images in total; I_F denotes an image in D_F, whose corresponding labels are the identity label c_F ∈ {1, 2, …, C_F} and the salient pedestrian mask s_F ∈ S_F. The cross-domain simulator realizes the transition from the complete pedestrian data domain to the simulated occluded pedestrian data domain, expressed as a mapping function F: D_F → D_O', where D_O' denotes the occluded pedestrian data generated by the occlusion simulation.

The steps for generating simulated occluded pedestrian data are:

(1-1) first, loading all the complete pedestrian data D_F, and randomly selecting a part of the complete pedestrian data according to the proportion p for occlusion processing;

(1-2) for each selected complete pedestrian sample I_F, calculating the image area and, according to the set occlusion ratio range [r₁, r₂], calculating the size of the occlusion region;

(1-3) selecting a small background patch from the background region of the complete pedestrian image, and scaling it to the size of the occlusion region with a variable aspect ratio;

(1-4) generating a black block black_patch of the same size and shape as the occlusion block, to be used for processing the salient pedestrian mask label;

(1-5) covering a randomly selected position of the complete pedestrian image with the occlusion block, and covering the same position of the salient pedestrian mask label with the black block, thereby completing the simulated occlusion at the image level and obtaining a new simulated occluded pedestrian image I_O' and salient pedestrian mask label s_O', while keeping the pedestrian identity label unchanged, denoted c_O';

(1-6) assigning the generated simulated occluded pedestrian sample a new binary label (the occlusion/non-occlusion label), denoted o_O' = 1, indicating an occluded pedestrian;

(1-7) for the remaining complete pedestrian data without occlusion processing, keeping the pedestrian image, salient mask label and pedestrian identity label (I_F, c_F, s_F) unchanged, and assigning the occlusion/non-occlusion binary label o_F = 0, indicating a non-occluded (complete) pedestrian;

(1-8) through the cross-domain simulator, the data used for network training become the combination of complete pedestrian data and simulated occluded pedestrian data, denoted D_F/O'. As the number of training iteration rounds increases, the proportion of simulated occluded pedestrian data in the training data increases, i.e. more and more complete pedestrian data are selected for occlusion processing, until training stops. The network can thus acquire occlusion robustness by observing many different types of occluded images.
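Steps (1-2) to (1-6) for a single sample can be sketched in plain Python, with the image and mask held as nested lists. This is an illustrative sketch under several assumptions that the text does not fix: the ratio range `r_range`, the aspect-ratio range 0.5 to 2.0, the source patch size `patch`, and nearest-neighbour scaling are all chosen here for the example.

```python
import random


def simulate_occlusion(image, mask, r_range=(0.2, 0.4), patch=2, rng=random):
    """Return (occluded_image, occluded_mask, occlusion_label).

    image: H x W nested list of pixel values.
    mask:  H x W nested list of 0/1, where 1 marks the pedestrian region.
    """
    H, W = len(image), len(image[0])
    patch = min(patch, H, W)
    # (1-2) occlusion area from the image area and a ratio drawn from [r1, r2]
    occ_area = rng.uniform(*r_range) * H * W
    # (1-3) variable aspect ratio (assumed range), clamped to the image size
    aspect = rng.uniform(0.5, 2.0)
    h = max(1, min(H, round((occ_area * aspect) ** 0.5)))
    w = max(1, min(W, round((occ_area / aspect) ** 0.5)))
    # candidate source patches that lie entirely in the background
    candidates = [
        (y, x)
        for y in range(H - patch + 1)
        for x in range(W - patch + 1)
        if all(mask[y + a][x + b] == 0 for a in range(patch) for b in range(patch))
    ]
    sy, sx = rng.choice(candidates) if candidates else (0, 0)
    # nearest-neighbour scaling of the source patch to h x w
    block = [
        [image[sy + (i * patch) // h][sx + (j * patch) // w] for j in range(w)]
        for i in range(h)
    ]
    # (1-4)/(1-5) paste the block into the image and a black block into the mask
    top, left = rng.randrange(H - h + 1), rng.randrange(W - w + 1)
    new_image = [row[:] for row in image]
    new_mask = [row[:] for row in mask]
    for i in range(h):
        for j in range(w):
            new_image[top + i][left + j] = block[i][j]
            new_mask[top + i][left + j] = 0
    # (1-6) occlusion label 1; the identity label is left to the caller, unchanged
    return new_image, new_mask, 1
```

The function copies its inputs rather than modifying them, so the same complete sample can be occluded differently in later iterations.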

S3, inputting the combined data set of complete pedestrians and simulated occluded pedestrians into the pedestrian re-identification network with joint saliency detection, i.e. the teacher network, for training. The teacher network comprises three main parts: a feature extraction backbone and two different branches. The base deep network used by the feature extraction backbone is ResNet-50 with the fully connected layer removed, which acts as the feature extractor f(·) of the network.

One of the branches is the joint saliency detection branch. As shown in FIG. 3, the branch computes the classification loss of each pixel and backpropagates the accumulated per-pixel losses, classifying each pixel in the predicted mask as foreground or background, where the foreground is the pedestrian region and the background is the non-pedestrian region, so that the branch marks the salient pedestrian region in the image. The branch uses pixel-level softmax cross-entropy as its loss function, i.e. it classifies whether each pixel belongs to the pedestrian region. The joint saliency detection branch is denoted h(·), and L^S denotes the corresponding joint saliency detection loss function.
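The pixel-level softmax cross-entropy used by this branch can be sketched in plain Python for a single image. The sketch assumes the branch outputs two scores per pixel (background, pedestrian) and that the per-pixel losses are summed; the function names and data layout are hypothetical.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of softmax(logits) against an integer class label."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def saliency_loss(pixel_logits, mask):
    """Sum of per-pixel losses for one image.

    pixel_logits: H x W x 2 nested list of (background, pedestrian) scores.
    mask:         H x W nested list of 0/1 labels, 1 marking the pedestrian.
    """
    return sum(
        softmax_cross_entropy(pixel_logits[p][q], mask[p][q])
        for p in range(len(mask))
        for q in range(len(mask[0]))
    )
```

When the two scores of a pixel are equal, its loss is log 2 regardless of the label, which gives a quick sanity check for an untrained branch.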

The other branch is the pedestrian identity classification branch, which classifies pedestrian identity.

In one embodiment, the pedestrian identity classification branch uses softmax cross-entropy as its loss function, denoted L^C, with each pedestrian as one category, where g(·) denotes the pedestrian identity classifier.

In this embodiment, the loss functions of the above two branches are given different weights to form the loss function of the pedestrian re-identification network with joint saliency detection, expressed as:

L(I_F,C_F,S_F)＝αL^C(I_F,C_F)+(1-α)L^S(I_F,S_F)

In another embodiment, in addition to the pedestrian identity loss function L^C, the pedestrian identity classification branch incorporates an auxiliary occlusion/non-occlusion binary classification loss function L^O to form a multi-task loss function, denoted L^M:

L^M(I_F/O',C_F/O',O_F/O')＝βL^C(I_F/O',C_F/O')+(1-β)L^O(I_F/O',O_F/O')

where β ∈ (0, 1) is a hyperparameter that balances the weights of the two loss functions in the multi-task loss function; β is typically set greater than 0.5 so that pedestrian identity classification is the main task.

The two branches of the teacher network share the feature extraction backbone, and each branch acts on the backbone by feeding back its supervision signal, thereby guiding the feature extraction process. Combining the two branches, the resulting overall loss function is:

L(I_F/O',C_F/O',O_F/O',S_F/O')＝αL^M(I_F/O',C_F/O',O_F/O')+(1-α)L^S(I_F/O',S_F/O')

where α ∈ (0, 1) is a hyperparameter that balances the proportion of the two branches in the network. According to the task requirements, α is set greater than 0.5, so that the pedestrian identity classification branch is the main task and the joint saliency detection branch assists the network in attending to the pedestrian.

As teacher network training progresses, the network's ability to discriminate pedestrian identities is enhanced on the one hand, and its ability to detect salient pedestrians continuously improves on the other, which reflects that the network can focus its attention on the pedestrian. Through joint training of the two branches, the former global attention gradually evolves into attention to the salient pedestrian region, and occlusion robustness is continuously strengthened, so that for occluded pedestrian re-identification the model can extract better features for subsequent matching.

S4, after the teacher network training is finished, the joint saliency detection branch of the teacher network can predict the pedestrian region, and its ability to detect salient pedestrians has been strengthened during the training of step S3. The joint saliency detection branch of the teacher network is therefore used to generate salient pedestrian mask labels for the real occluded pedestrian data, providing material for student network training.

S5, to achieve better results in real applications, training on real occluded pedestrian data is performed on the basis of the teacher network training result of step S3. This training is carried out by the student network, starting from the teacher network's model; the structure of the student network is set with reference to the teacher network, and training on the real occluded pedestrian data continues until the parameters converge, yielding the final network model.
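The hand-over from teacher to student in S5 amounts to initializing the student with the teacher's parameters and then continuing gradient updates on real occluded data. A toy Python sketch follows; parameters are plain dictionaries of lists, and the plain gradient-descent update and learning rate are assumptions, since the text does not name an optimizer.

```python
import copy


def init_student_from_teacher(teacher_params):
    """Student starts from an independent copy of the teacher's parameters."""
    return copy.deepcopy(teacher_params)


def sgd_step(params, grads, lr=0.01):
    """One plain gradient-descent update (assumed optimizer, for illustration)."""
    return {
        name: [w - lr * g for w, g in zip(values, grads[name])]
        for name, values in params.items()
    }
```

Copying rather than sharing the parameters keeps the teacher model intact, so it can still be used in step S4 to label further real data.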

In this embodiment, the effect of the method is demonstrated through experiments on four occluded pedestrian databases: Occluded-REID, Partial-REID, P-DukeMTMC-reID and P-ETHZ. The Occluded-REID database contains 200 different pedestrians, each with 5 images under different types of occlusion and 5 unoccluded images, for a total of 2000 images. In the experiments, the occluded pedestrian images serve as the query set and the unoccluded pedestrian images serve as the gallery set; the images of 100 pedestrians form the training set and the remaining 100 pedestrians form the test set. The Partial-REID database contains 900 images of 60 different pedestrians; each pedestrian has 5 images under different types of occlusion, 5 partial images with the occlusion removed and 5 unoccluded full-body images; 30 pedestrians are selected for training and the remaining 30 for testing. The P-DukeMTMC-reID database contains 24143 images of 1299 pedestrians, each with a full-body unoccluded image and multiple occluded images; 665 pedestrians form the training set and the remaining 634 form the test set. The P-ETHZ database contains 3897 images of 85 pedestrians, of which 43 form the training set and the remaining 42 form the test set.
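The database statistics above can be cross-checked with a short script. The dictionary layout and helper names are chosen here for illustration; the numbers are those stated in the text, with the P-DukeMTMC-reID training split taken as 665 identities.

```python
# Statistics as stated in the text: identities, total images, train/test identities.
DATASETS = {
    "Occluded-REID": {"identities": 200, "images": 2000, "train": 100, "test": 100},
    "Partial-REID": {"identities": 60, "images": 900, "train": 30, "test": 30},
    "P-DukeMTMC-reID": {"identities": 1299, "images": 24143, "train": 665, "test": 634},
    "P-ETHZ": {"identities": 85, "images": 3897, "train": 43, "test": 42},
}

# Fixed images per identity, stated only for the first two databases.
IMAGES_PER_IDENTITY = {"Occluded-REID": 5 + 5, "Partial-REID": 5 + 5 + 5}


def split_is_consistent(name):
    """Training and test identities must add up to all identities."""
    d = DATASETS[name]
    return d["train"] + d["test"] == d["identities"]


def image_count_is_consistent(name):
    """Identities times images per identity must equal the stated total."""
    d = DATASETS[name]
    return d["identities"] * IMAGES_PER_IDENTITY[name] == d["images"]
```

All four splits are identity-disjoint partitions of the stated identity counts, and the two databases with a fixed number of images per identity match their stated totals.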

The invention uses ResNet-50 as the initialization network and verifies the effectiveness of three components of the teacher-student learning network, namely the teacher network, the joint saliency detection branch and the cross-domain simulator, on the Occluded-REID, Partial-REID, P-DukeMTMC-reID and P-ETHZ databases, as shown in Table 1:

Table 1. Effect of each component of the invention

As can be seen from Table 1, adding the teacher network improves the recognition accuracy substantially, by 38.50%, 40.33%, 6.34% and 17.38% on the four databases, respectively. Because the P-DukeMTMC-reID data is large in scale, the student practice process is more difficult, so the improvement in recognition rate from adding the teacher network is smaller on this database than on the other three. In addition, adding the joint saliency detection branch improves the recognition accuracy by 5.80%, 6.67%, 4.69% and 5.95%, respectively, and adding the cross-domain simulator improves it by 5.17%, 10.00%, 4.15% and 8.58%, respectively. The results in Table 1 show that combining the teacher network, the joint saliency detection branch and the cross-domain simulator achieves the best effect.

This embodiment also compares the method of the invention with several existing mainstream methods based on conventional descriptors and on deep networks; the comparison results on the Occluded-REID, Partial-REID, P-DukeMTMC-reID and P-ETHZ databases are shown in Table 2.

Table 2. Comparison of the invention with mainstream algorithms

The results in Table 2 show that the recognition accuracy of the invention on the Occluded-REID, Partial-REID, P-DukeMTMC-reID and P-ETHZ databases reaches 73.69%, 82.67%, 51.42% and 62.86%, respectively, which is superior to most mainstream pedestrian re-identification algorithms; the invention thus reaches an advanced level in the field on the occluded pedestrian re-identification problem.

The techniques described herein may be implemented by various means. For example, these techniques may be implemented in hardware, firmware, software, or a combination thereof. For a hardware implementation, the processing modules may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Programmable Logic Devices (PLDs), field-programmable gate arrays (FPGAs), processors, controllers, micro-controllers, electronic devices, other electronic units designed to perform the functions described herein, or a combination thereof.

For a firmware and/or software implementation, the techniques may be implemented with modules (e.g., procedures, steps, flows, and so on) that perform the functions described herein. The firmware and/or software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.

Those of ordinary skill in the art will understand that all or part of the steps of the method embodiments may be carried out by hardware under the control of program instructions; the program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments; the storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks or optical disks.

It should be understood that the above-described embodiments of the present invention are merely examples given to illustrate the invention clearly, and are not intended to limit the embodiments of the invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.

Claims

1. An occluded pedestrian re-identification method based on a teacher-student learning framework, characterized by comprising the following steps:

first, training a teacher network, in which existing large-scale complete pedestrian data are used to simulate the training process of occluded pedestrian re-identification, this process being realized by a joint saliency detection network with a cross-domain simulator and constituting the teacher teaching process;

then, transferring the teacher network to a student network, so that the student network, starting from the model of the teacher network, continues training on real small-scale occluded pedestrian data, this process constituting the student practice process;

and finally, obtaining, through the teacher teaching and student practice processes, a trained model that is both discriminative for pedestrians and robust to occlusion, the model being used for occluded pedestrian re-identification.

2. The occluded pedestrian re-identification method based on the teacher-student learning framework according to claim 1, comprising the steps of:

S1, generating salient pedestrian mask labels for the complete pedestrian data and screening the generated samples, the complete pedestrian data comprising complete unoccluded pedestrian images, the identity labels of the corresponding pedestrians and the salient pedestrian mask labels, thereby providing initial material for model training;

S2, establishing a cross-domain simulator that uses the complete pedestrian data to simulate the occluded pedestrian re-identification process, the cross-domain simulator setting an occlusion selection probability, selecting a certain proportion of the complete pedestrian data for occlusion simulation in each data loading pass and assigning new label information at the same time, thereby providing a reliable data source for teacher network training;

3. The occluded pedestrian re-identification method based on the teacher-student learning framework according to claim 2, wherein in step S1 the generated samples are screened as follows: if the average confidence of the salient pedestrian mask label generated for a sample is higher than a preset threshold, the sample is retained as training data; if the average confidence of the salient pedestrian mask label generated for a sample is lower than or equal to the preset threshold, the sample is rejected.
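The screening rule of this claim can be sketched as follows. The text does not define how the average confidence is computed; this sketch assumes, purely for illustration, that each pixel carries a foreground probability p and that its confidence is max(p, 1 - p). The function names and the threshold value passed in are likewise hypothetical.

```python
def average_confidence(fg_prob_map):
    """Mean per-pixel confidence, taking confidence as max(p, 1 - p) (an assumption)."""
    values = [max(p, 1.0 - p) for row in fg_prob_map for p in row]
    return sum(values) / len(values)


def keep_sample(fg_prob_map, threshold):
    """Retain the sample only if its average confidence is strictly above threshold."""
    return average_confidence(fg_prob_map) > threshold
```

A sample whose average confidence equals the threshold exactly is rejected, matching the "lower than or equal to" condition.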

4. The occluded pedestrian re-identification method based on the teacher-student learning framework according to claim 2, wherein in step S2 a cross-domain simulator is established to realize a gradual transition from the complete pedestrian data domain to the simulated occluded pedestrian data domain, comprising the steps of:

(1-7) assigning an occluded/non-occluded classification label with a value of 0 to the complete pedestrian data that has not undergone occlusion processing, this label indicating that the pedestrian is not occluded, while the pedestrian image, the salient mask label and the pedestrian identity label of the complete pedestrian data remain unchanged;
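A rough sketch of the cross-domain simulator's per-sample behaviour is given below. Only two points come from the text: a sample is selected for occlusion simulation with a set probability, and unselected samples keep their image and mask and receive label 0. The rest is assumed for illustration: the occluder is a filled rectangle of random size and position, the mask is zeroed under the occluder, occluded samples receive label 1, and the names `simulate_occlusion`, `p_occ` and `fill` are hypothetical.

```python
import random


def simulate_occlusion(image, mask, rng, p_occ=0.5, fill=0):
    """Return (image, mask, occlusion_label) for one loaded sample.

    image, mask: 2-D lists of equal size; rng: a random.Random instance.
    """
    if rng.random() >= p_occ:
        # Not selected: image and mask stay unchanged, label 0 (not occluded).
        return image, mask, 0
    height, width = len(image), len(image[0])
    # Assumed occluder: a filled rectangle of random size and position.
    rect_h, rect_w = rng.randint(1, height), rng.randint(1, width)
    top, left = rng.randint(0, height - rect_h), rng.randint(0, width - rect_w)
    new_image = [row[:] for row in image]
    new_mask = [row[:] for row in mask]
    for r in range(top, top + rect_h):
        for c in range(left, left + rect_w):
            new_image[r][c] = fill  # paint the occluder over the pedestrian image
            new_mask[r][c] = 0      # assumed: occluded pixels are no longer foreground
    return new_image, new_mask, 1
```

Because the choice is made each time a sample is loaded, the same complete image can appear occluded in one epoch and unoccluded in another.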

5. The occluded pedestrian re-identification method based on the teacher-student learning framework according to claim 2, wherein in step S3 the pedestrian re-identification network with joint saliency detection consists of three parts: a feature extraction backbone, a joint saliency detection branch and a pedestrian classification and identification branch; the feature extraction backbone adopts the ResNet-50 deep network with its fully connected layer removed; the joint saliency detection branch predicts the salient pedestrian mask, using a softmax loss function to classify each point as belonging to the pedestrian region or not; the pedestrian classification and identification branch classifies pedestrian identities, using a softmax loss function with each pedestrian treated as one category; and the model is trained through the backbone and the two branches together;

the joint saliency detection branch computes the classification loss of each pixel and then accumulates the per-pixel classification losses for back-propagation, performing foreground/background classification on each pixel of the predicted mask, where the foreground is the pedestrian region and the background is the non-pedestrian region, so that the branch marks the salient pedestrian region in the image.
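The per-pixel loss accumulation described in this claim can be sketched as below. The sketch assumes two logits per pixel in the order (background, foreground) and sums the per-pixel softmax cross-entropy without normalization; whether the original formula normalizes the sum is not stated in this text, and the function names are illustrative.

```python
import math


def pixel_cross_entropy(logits, label):
    """Softmax cross-entropy of one pixel, computed as log-sum-exp minus the true logit."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[label]


def saliency_loss(logit_map, mask):
    """Accumulate the per-pixel classification losses over the whole predicted mask.

    logit_map[p][q] is a (background, foreground) pair; mask[p][q] is 0 or 1.
    """
    total = 0.0
    for logit_row, mask_row in zip(logit_map, mask):
        for logits, label in zip(logit_row, mask_row):
            total += pixel_cross_entropy(logits, label)
    return total
```

Subtracting the maximum logit before exponentiating keeps the computation numerically stable for large logits.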

6. The occluded pedestrian re-identification method based on the teacher-student learning framework according to claim 5, wherein the loss function of the joint saliency detection branch, denoted L^S, is expressed as:

wherein L_softmax denotes the softmax function combined with the cross-entropy loss function, and f(·) and h(·) denote the feature extractor and the salient pedestrian detector, respectively; I_F^i denotes the i-th image of the set D_F, where D_F denotes the complete pedestrian data set, comprising C_F pedestrian identities and N_F images in total; c_F^i and s_F^i denote the corresponding pedestrian identity class label and salient pedestrian mask label, respectively, where c_F ∈ {1, 2, …, C_F} and s_F ∈ S_F; (p, q) denotes the position coordinates of each pixel on the image; and H_F and W_F denote the height and width of the image.

7. The occluded pedestrian re-identification method based on the teacher-student learning framework according to claim 6, wherein the pedestrian classification and identification branch classifies pedestrian identities using a softmax loss function, denoted L^C, expressed as:

wherein g(·) denotes the pedestrian identity classifier; D_F denotes the complete pedestrian data set, comprising C_F pedestrian identities and N_F images in total;

I_F^i denotes the i-th image of the data set D_F,

8. The occluded pedestrian re-identification method based on the teacher-student learning framework according to claim 7, wherein the loss functions of the joint saliency detection branch and the pedestrian classification and identification branch are given different weights to form the loss function of the pedestrian re-identification network with joint saliency detection, expressed as:

L(I_F, C_F, S_F) = α L^C(I_F, C_F) + (1 - α) L^S(I_F, S_F)

wherein α is a hyperparameter that balances the weight between the two branch loss functions, and α takes a value in the range 0 to 1.

9. The occluded pedestrian re-identification method based on the teacher-student learning framework according to claim 6, wherein, in the pedestrian classification and identification branch of the pedestrian re-identification network with joint saliency detection of step S3, an occluded/non-occluded binary classification loss function, denoted L^O, is expressed as:

wherein b(·) denotes the occluded/non-occluded binary classifier; I_F/O' denotes the new set used for network training, obtained by putting the occlusion-processed images and the non-occlusion-processed images together;

I_F/O'^i denotes the i-th sample in I_F/O';

o_F/O'^i takes the value 0 or 1, where 0 denotes a complete pedestrian sample and 1 denotes a simulated occluded pedestrian sample;

after the occluded/non-occluded binary classification loss function is added, the loss function of the pedestrian classification and identification branch becomes a multi-task loss function, denoted L^M, expressed as:

L^M(I_F/O', C_F/O', O_F/O') = β L^C(I_F/O', C_F/O') + (1 - β) L^O(I_F/O', O_F/O')

wherein I_F/O' denotes the new set used for network training, obtained by putting the occlusion-processed images and the non-occlusion-processed images together; C_F/O' denotes the pedestrian identity label of each sample, namely the classification label; O_F/O' denotes the occluded/non-occluded attribute of each sample, namely the label indicating occlusion or non-occlusion; L^C and L^O denote the pedestrian identity classification loss function and the occluded/non-occluded binary classification loss function, respectively; and β ∈ (0, 1) is a hyperparameter that balances the weight between the two loss functions.

10. The occluded pedestrian re-identification method based on the teacher-student learning framework according to claim 9, wherein, after the cross-domain simulator is added, the loss functions of the joint saliency detection branch and the pedestrian classification and identification branch are given different weights to form the loss function of the pedestrian re-identification network with joint saliency detection, expressed as: