CN115376178A - Unknown domain pedestrian re-identification method and system based on domain style filtering


Info

Publication number
CN115376178A
Authority
CN
China
Prior art keywords: domain, pedestrian, image, encoder, network
Legal status: Pending
Application number
CN202210286465.2A
Other languages
Chinese (zh)
Inventor
种衍文
章郴
潘少明
Current Assignee: Wuhan University (WHU)
Original Assignee: Wuhan University (WHU)
Application filed by Wuhan University (WHU)
Priority to CN202210286465.2A
Publication of CN115376178A

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/173 — Classification, e.g. identification; face re-identification, e.g. recognising unknown faces across different face tracks
    • G06V 10/761 — Proximity, similarity or dissimilarity measures
    • G06V 10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 20/52 — Surveillance or monitoring of activities, e.g. for recognising suspicious objects


Abstract

The invention provides a method and a system for unknown-domain pedestrian re-identification based on domain style filtering. Aiming at the complex distribution condition in which the training set and the test set differ in data distribution and multiple data distributions exist in the test set at the same time, an unknown-domain pedestrian re-identification task is established: a model is first trained on a certain dataset and then tested jointly on all datasets except that dataset. During unknown-domain learning based on domain style filtering, which requires no domain adaptation, the model builds on the ability of a self-encoder to remove redundant information and, in a multi-task joint learning manner, filters out the domain style information with the help of the pedestrian identity (ID) loss so as to extract domain-invariant features. The domain style filtering network takes a pedestrian feature encoder network as its main module, with a pedestrian ID classification network and an image feature decoder network as auxiliary modules; the encoder and decoder form a self-encoder network used to realize the image reconstruction task, while the ID classification network is used for the pedestrian re-identification task.

Description

Unknown domain pedestrian re-identification method and system based on domain style filtering
Technical Field
The method can be applied to the field of pedestrian re-identification; a multi-task joint learning structure is constructed to realize the cross-camera pedestrian retrieval task, providing an unknown-domain pedestrian re-identification technical scheme based on domain style filtering.
Background
As a research hotspot in the fields of computer vision and image retrieval, pedestrian re-identification (ReID) can be widely applied to scenes such as intelligent security monitoring systems and intelligent retail stores; it is a subtask in the field of image retrieval.
The rapid development of computer hardware and deep learning theory has promoted the wide application of deep learning technology in fields including image classification, object detection, semantic segmentation, image retrieval and pedestrian re-identification. Many methods have been proposed in the pedestrian re-identification community, continuously raising the upper limit of accuracy in different difficult scenarios.
According to the application scenario, pedestrian re-identification can be divided into a single-domain task and a cross-domain task. Single-domain pedestrian re-identification considers the retrieval task in which the training set data and the test set data come from the same data distribution, and mainly aims to extract invariant pedestrian features under challenges such as occlusion of body parts, pose changes, motion blur and misaligned pedestrian images.
However, when there is a large domain style difference (Domain Gap) between the data distributions of the test samples and the training samples, for example when the model is trained on a public academic dataset and then tested in a scene different from the training set (e.g. a shopping mall), the accuracy drops greatly; this situation is called the cross-domain pedestrian re-identification problem. Cross-domain pedestrian re-identification, however, is not the end point of the field, for the following main reasons:
the data distribution of the target domain samples is not a single style distribution considered by the current cross-domain pedestrian re-identification. The ultimate goal of pedestrian re-identification is to achieve cross-camera pedestrian retrieval within a city, and even between cities. Thus, the scene of the target domain in the real world may not be a single scene as in cross-domain pedestrian re-recognition, but have images from multiple different scenes, such as subways, schools, shopping malls, etc., at the same time. The images in different scenes have great style difference due to camera models, camera settings and the like, so that the data distribution has great difference, and the style difference can be extracted into image feature representation by a convolutional neural network, so that the subsequent pedestrian recognition task is influenced.
In addition, existing pedestrian re-identification methods reduce the inter-domain difference by performing domain adaptation in the target domain, thereby improving the cross-domain performance of the model. However, this usually requires a significant amount of domain adaptation time and makes deployment difficult (this is actual required deployment time, rather than offline model training time that can be ignored). ReID is generally regarded as a short-term task: if the time span is too large, the most fatal problem of clothes-changing pedestrian re-identification arises (cases exceeding 24 h are generally difficult to detect). Under this constraint most current pedestrian re-identification methods fail, because while they perform domain adaptation for the target domain, the pedestrian to be queried is likely to have already left the current region, so the query result becomes meaningless. Therefore, this research is dedicated to proposing an unknown-domain learning method that requires no domain adaptation to solve the unknown-domain pedestrian re-identification problem, so as to realize a true "train once, deploy everywhere".
In summary, as image retrieval tasks, neither the single-domain nor the cross-domain pedestrian re-identification task can simulate a real-world pedestrian retrieval scene, and both suffer from the problem that a large amount of domain adaptation time is required before deployment. This patent therefore addresses the unknown-domain pedestrian re-identification problem, in which not only do the training set and test set samples come from different distributions, but the test set samples also come from multiple domain styles; this task goes beyond both the single-domain and cross-domain pedestrian re-identification tasks and tackles the more important real-world setting. In addition, the patent provides an unknown-domain pedestrian re-identification method based on domain style filtering that needs no target-domain data at all for model training, achieving zero deployment cost while reaching higher accuracy than existing methods, improving the generalization ability of the model in real-world scenes and assisting the application of pedestrian re-identification technology in real life.
References:
[1] Zhong Z, Zheng L, Li S, et al. Generalizing a person retrieval model hetero- and homogeneously[C]. Proceedings of the European Conference on Computer Vision (ECCV), 2018: 172-188.
[2] Deng W, Zheng L, Ye Q, et al. Image-image domain adaptation with preserved self-similarity and domain-dissimilarity for person re-identification[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018: 994-1003.
[3] Fu Y, Wei Y, Wang G, et al. Self-Similarity Grouping: A Simple Unsupervised Cross Domain Adaptation Approach for Person Re-Identification[C]. Proceedings of the IEEE International Conference on Computer Vision, 2019: 6112-6121.
[4] Luo H, Gu Y, Liao X, et al. Bag of tricks and a strong baseline for deep person re-identification[C]. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019.
Disclosure of Invention
In view of the problems and defects of existing pedestrian re-identification methods, the invention provides a new pedestrian re-identification task, called the unknown-domain pedestrian re-identification task. Different from traditional pedestrian re-identification technology, it considers the complex distribution condition in which the training set and the test set differ in data distribution and multiple data distributions exist in the test set at the same time. Specifically, the task first trains the model on a certain dataset and then tests jointly on all datasets except that dataset. The key point is to innovatively bring the real-world application problem of pedestrian re-identification into theoretical research and to propose the unknown-domain pedestrian re-identification task to solve it.
The invention provides a method for unknown-domain pedestrian re-identification based on domain style filtering. Aiming at the complex distribution condition in which the training set and the test set differ in data distribution and multiple data distributions exist in the test set at the same time, an unknown-domain pedestrian re-identification task is established: a model is first trained on a certain dataset and then tested jointly on all datasets except that dataset.
When unknown-domain learning based on domain style filtering, which requires no domain adaptation, is performed, the process is as follows.
The model builds on the ability of the self-encoder to remove redundant information and, in a multi-task joint learning manner, filters out the domain style information with the help of the pedestrian identity (ID) loss, so as to extract domain-invariant features. The domain style filtering network takes the pedestrian feature encoder network (Encoder) as its main module, with the pedestrian ID classification network (ID Classification) and the image feature decoder network (Decoder) as auxiliary modules; the encoder and decoder form a self-encoder network (Auto-Encoder) used to realize the image reconstruction task, while the ID classification network is used for the pedestrian re-identification task. The implementation steps of unknown-domain pedestrian re-identification are as follows:
1) Given a source domain dataset with pedestrian ID labels, denoted $\{(x_n, y_n^{ID})\}_{n=1}^{N}$, where $N$ denotes the number of images in the source domain dataset, $x_n$ denotes the $n$-th image, and $y_n^{ID}$ denotes the pedestrian ID label corresponding to image $x_n$;
2) Randomly select a group of images from the source domain dataset by PK sampling to form a mini-batch of training samples, wherein $P$ pedestrians are selected and $K$ images are selected for each pedestrian to form triplets;
3) Execute the forward propagation algorithm: an input image $x$ is passed through the encoder $E$ to obtain a hidden feature code $f$; the code $f$ is then fed into the decoding network $D$, which outputs the reconstruction result $\hat{x}$; the code $f$ is also fed into the ID classification network $C_{ID}$ to obtain the pedestrian ID feature $f_{ID}$ and the pedestrian ID prediction $\hat{y}^{ID}$;
4) According to the outputs of forward propagation, respectively compute the image reconstruction loss $L_R$ and the ID classification loss $L_{ID}$, then weight the different losses and compute the total model loss $L$;
5) Respectively compute the gradients of the total loss $L$ through $L_R$ and $L_{ID}$, perform the optimization operation using the back-propagation algorithm, and update the encoder $E$, the decoder $D$ and the ID classification network $C_{ID}$;
6) Repeat steps 2) to 5) until the model converges;
7) After the model converges, the trained encoder $E^*$ and ID classification network $C_{ID}^*$ are obtained;
8) In the inference phase, given a set of unlabeled target-domain samples under the unknown-domain condition, any image $y$ is passed through the encoder $E^*$ and the ID classification network $C_{ID}^*$, and forward propagation is executed to obtain the domain-invariant pedestrian feature $f_{ID}$; the cosine similarity between this feature and the gallery image features is then computed, and the results are sorted in descending order of similarity to obtain the pedestrian retrieval result.
Moreover, the domain style related information is filtered out based on the reconstruction of the input image by the self-encoder; the specific training process is as follows.
The mean square error loss is used to constrain the image reconstruction process of the self-encoder network. For the data distribution $p$ of the given input samples and the data distribution $q$ of the output reconstruction results, data sampling is performed at $N$ points respectively, where $N$ is the number of source domain training samples. The image reconstruction loss $L_R$ of the image reconstruction task is described by formula (1):
$$L_R = L_{MSE}(p, q) = \frac{1}{N}\sum_{i=1}^{N}\left(p_i - q_i\right)^2 \tag{1}$$
where $i$ denotes the index of the source domain training sample and $L_{MSE}(p, q)$ denotes the mean square error loss of data distribution $p$ and data distribution $q$.
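A sketch of the reconstruction loss of formula (1), assuming a standard pixel-wise mean square error between the input batch and its reconstruction:

```python
import torch.nn.functional as F

def reconstruction_loss(x, x_rec):
    # L_R = L_MSE(p, q): mean square error between input images and their reconstructions
    return F.mse_loss(x_rec, x, reduction="mean")
```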
Further, the specific process of the ID classification network is as follows.
The pedestrian ID loss is added on the hidden vector as supervision so that the encoder and the ID classification network learn information related to ID classification and discard information unrelated to the classification; combined with the self-encoder model, this further enables the model to remove redundant information unrelated to classification, which includes the domain style information.
The pedestrian re-identification task is treated as an ID classification task supervised by the ID loss $L_{ID}$, which includes the cross-entropy loss computed on the output layer and the triplet loss and center loss computed on the embedding layer. $L_{ID}$ is formulated as follows:
$$L_{ID} = L_{CE}(\hat{y}^{ID}, \tilde{y}^{ID}) + L_{tri}(f_{ID}^{a}, f_{ID}^{p}, f_{ID}^{n}) + w_1 L_{Center}(f_{ID}, y^{ID}) \tag{2}$$
where $f_{ID}$ denotes the ID feature of any image $x$ in a batch of input images; $f_{ID}^{a}$ denotes the ID feature of the anchor image $x_a$ in an image triplet, and similarly $f_{ID}^{p}$ and $f_{ID}^{n}$ denote the ID features of the positive sample image $x_p$ and the negative sample image $x_n$ in this triplet, respectively; $y^{ID}$ denotes the ID label of image $x$, $\tilde{y}^{ID}$ denotes the ID label of image $x$ after label smoothing regularization, and $\hat{y}^{ID}$ is the ID classification result predicted by the network. The weights of the cross-entropy loss and the triplet loss are both set to 1, while the weight $w_1$ weighs the importance of the center loss.
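The combination in formula (2) can be sketched as follows; the label-smoothing value 0.1 and the helper callables triplet_loss_fn and center_loss_fn are assumptions made for this illustration.

```python
import torch.nn as nn

def make_id_loss(triplet_loss_fn, center_loss_fn, w_center=5e-4):
    """L_ID = L_CE (label-smoothed) + L_tri + w_1 * L_Center, as in formula (2)."""
    ce = nn.CrossEntropyLoss(label_smoothing=0.1)
    def id_loss(id_logits, f_id, id_labels):
        return (ce(id_logits, id_labels)
                + triplet_loss_fn(f_id, id_labels)
                + w_center * center_loss_fn(f_id, id_labels))
    return id_loss
```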
Moreover, the cross-entropy loss $L_{CE}$, the triplet loss $L_{tri}$ and the center loss $L_{Center}$ are defined as follows:
$$L_{CE}(p, q) = -\sum_{c=1}^{C} q_c \log p_c \tag{3}$$
where $C$ denotes the total number of classification categories, $c$ denotes the category index, $p$ and $q$ denote the data distributions of the model prediction results and the sample labels respectively, and $p_c$ and $q_c$ denote the sampled values of the two distributions at position $c$;
$$L_{tri} = \frac{1}{PK}\sum_{t=1}^{P}\sum_{s=1}^{K}\left[m + d_{ap} - d_{an}\right]_{+} \tag{4}$$
where $PK$ indicates that $P$ IDs are sampled per batch and $K$ images are sampled per ID, and $t$ and $s$ index the IDs and the images of each ID, respectively; $m$ is the margin hyperparameter of the triplet loss; each triplet consists of an anchor image $a$, a positive sample image $p^{+}$ and a negative sample image $n^{-}$; $d_{ap}$ and $d_{an}$ denote the Euclidean distances between the anchor image features and the positive sample image features, and between the anchor image features and the negative sample image features, respectively;
$$L_{Center} = \frac{1}{2}\sum_{i=1}^{B}\left\| f_{ID}^{i} - c_{y_i^{ID}} \right\|_2^2 \tag{5}$$
where $B$ denotes the number of images in a batch, $f_{ID}^{i}$ denotes the ID feature of the $i$-th image, $y_i^{ID}$ denotes the ID label of the $i$-th image, and $c_{y_i^{ID}}$ denotes the feature center of the $y_i^{ID}$-th category.
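Batch-hard implementations of the triplet loss of formula (4) and the center loss of formula (5) are sketched below; the default margin value and the averaging over the batch are assumptions of this sketch, and the hard positive/negative selection follows the PK-sampling description given later.

```python
import torch

def batch_hard_triplet_loss(features, labels, margin=0.3):
    """Triplet loss over a PK-sampled batch: hardest positive / hardest negative per anchor."""
    dist = torch.cdist(features, features, p=2)                       # pairwise Euclidean distances
    same_id = labels.unsqueeze(0) == labels.unsqueeze(1)
    d_ap = (dist * same_id.float()).max(dim=1).values                 # farthest image with the same ID
    d_an = dist.masked_fill(same_id, float("inf")).min(dim=1).values  # closest image with a different ID
    return torch.clamp(margin + d_ap - d_an, min=0.0).mean()

def center_loss(features, labels, centers):
    """Center loss: squared distance of each ID feature to its class center c_{y_i}."""
    return 0.5 * ((features - centers[labels]) ** 2).sum(dim=1).mean()
```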
Moreover, the image reconstruction task and the pedestrian ID classification network are optimized in a multi-task joint training manner; the implementation steps are as follows.
The image reconstruction task and the ID classification task are jointly optimized to achieve the effect of filtering out the image style information, and the total target loss of the model is defined as:
$$L = L_{CE} + L_{tri} + w_1 L_{Center} + w_2 L_R(x_i, \hat{x}_i) \tag{6}$$
where $x_i$ denotes the input image and $\hat{x}_i$ denotes the output result of the input image after reconstruction by the self-encoder; the weights $w_1$ and $w_2$ are used to weigh the importance between the losses.
The above equation is then minimized in a multi-task joint training manner to reach the "optimal point" of the model. The model optimization objective is written as:
$$E^*, D^*, C_{ID}^* = \arg\min_{E,\, D,\, C_{ID}} L \tag{7}$$
where $E^*$ denotes the final optimization result of the encoder network, and $D^*$ and $C_{ID}^*$ denote the final optimization results of the decoder network and the ID classification network, respectively.
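A sketch of the joint optimization of formula (7): a single optimizer updates the encoder, decoder and ID classification network together against the total loss. The module names and the single shared learning rate are placeholders; the per-module learning rates of the embodiment are shown later.

```python
import itertools
import torch

def build_joint_optimizer(encoder, decoder, id_classifier, lr=3.5e-4):
    # E*, D*, C_ID* = argmin_{E, D, C_ID} L: all three modules are optimized jointly
    params = itertools.chain(encoder.parameters(),
                             decoder.parameters(),
                             id_classifier.parameters())
    return torch.optim.Adam(params, lr=lr)
```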
Also, the ResNet50 IBN-B model pre-trained on the ImageNet dataset is used as the backbone network of the encoder, with the last spatial down-sampling operation removed. During model testing, the output of the fully connected layer FC in the ID classification network is selected as the pedestrian feature, the cosine similarity between the query image features and the gallery image features is computed as the metric, and the retrieval result of the query image is finally obtained by sorting.
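A sketch of the encoder backbone setup is given below. The patent uses ResNet50 IBN-B; torchvision's plain ResNet-50 is used here only as a stand-in (an IBN-b implementation would be substituted in practice), and the last spatial down-sampling is removed by setting the stride of the first block of layer4 to 1.

```python
import torch
import torchvision

def build_encoder_backbone():
    net = torchvision.models.resnet50(weights="IMAGENET1K_V1")  # ImageNet pre-trained stand-in
    net.layer4[0].conv2.stride = (1, 1)                         # remove the last spatial down-sampling
    net.layer4[0].downsample[0].stride = (1, 1)
    net.fc = torch.nn.Identity()                                # keep the 2048-d pooled feature
    return net
```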
On the other hand, the invention also provides a domain style filtering-based unknown domain pedestrian re-identification system, which is used for realizing the domain style filtering-based unknown domain pedestrian re-identification method.
Further, a processor and a memory are included, the memory for storing program instructions, the processor for invoking the stored instructions in the memory to perform a domain style filtering based unknown domain pedestrian re-identification method as described above.
Alternatively, the present invention includes a readable storage medium on which a computer program is stored; when the computer program is executed, the domain-style-filtering-based unknown-domain pedestrian re-identification method described above is implemented.
The invention provides a method that can be applied to the fields of pedestrian re-identification and pedestrian retrieval, and compared with the existing pedestrian re-identification technology it has the following advantages:
1) Aiming at the defect that the existing single-domain pedestrian re-identification task and cross-domain pedestrian re-identification task cannot simulate a real world scene, the invention provides a new unknown-domain pedestrian re-identification task, which is used for discussing the problem of pedestrian re-identification under the condition that data distribution difference exists between a training set and a test set sample, and multiple data distributions exist in the test set at the same time. Specifically, unknown domain pedestrian re-identification considers training a model on a particular data set and performing model tests on all data sets except the data set.
2) The invention provides an unknown domain learning method based on domain style filtering, aiming at the problem of the pedestrian re-identification of the unknown domain, the method does not need any domain adaptation operation, and solves the problem that the current pedestrian re-identification method can not realize rapid deployment in a short time. Specifically, the method reconstructs an input image by designing a model based on a self-encoder, and removes redundant information related to the domain style by combining with the pedestrian re-identification loss, thereby extracting the characteristics of the pedestrian with the invariance of the domain.
3) The method can greatly improve the performance of the model on the pedestrian re-identification task in the unknown domain without using any target domain data for domain adaptive operation, is far superior to other pedestrian re-identification methods in performance and deployment cost, can realize the final target of rapid deployment, and meets the deployment requirement in the actual scene.
Drawings
Fig. 1 is a structural diagram of the self-encoder (Auto-Encoder) used in the embodiment of the present invention, where the network encodes the input data $x$ into an intermediate result $y$ through the encoder, and then restores the intermediate result $y$ into an approximation $\tilde{x}$ of the original input through the decoder.
Fig. 2 is a frame structure diagram of the domain style filtering method used in the embodiment of the present invention, in which an input image is encoded by the pedestrian feature encoder $E$ to obtain an intermediate result; the intermediate result is then decoded by the decoder $D$ and the mean square error loss is computed against the input image, and it is also classified by the identity (ID) classifier and the ID loss is computed.
Detailed Description
The unknown-domain pedestrian re-identification technical scheme is described in detail below with reference to the embodiments and the accompanying drawings.
As a key of public safety application based on video monitoring, the pedestrian re-identification technology can be used for quickly and accurately searching pedestrians in mass data acquired by a monitoring system deployed in large quantity.
However, the existing single-domain pedestrian re-identification algorithms, and the cross-domain pedestrian re-identification algorithms proposed to account for the distribution difference (domain deviation) between actual application-scene data and training data, cannot handle more complicated application scenarios. Owing to differences between equipment manufacturers, different parameter settings, and complicated and variable application scenes, pedestrian re-identification must face not only the difference in data distribution between the training set and the test set, but also the different data distributions (domain styles) that widely exist within the test set; this problem has so far been ignored by the pedestrian re-identification community. Aiming at this problem, this patent proposes a new unknown-domain pedestrian re-identification algorithm based on domain style filtering, whose main work comprises two aspects:
1) First, the patent proposes a new unknown domain pedestrian re-identification task. In the unknown domain pedestrian re-identification task, not only are there data distribution differences between the training set and the test set, but also there are samples from multiple data distributions simultaneously inside the test set.
2) Secondly, existing pedestrian re-identification algorithms consider domain adaptation in the target domain to improve model generalization, but this brings a large deployment time cost, so the technology has not yet been put into practical use. On the basis of the first contribution, this patent proposes a new domain style filtering algorithm that requires no domain adaptation, and assists pedestrian re-identification by means of image reconstruction realized by a self-encoder, so as to filter out redundant domain style information and extract domain-invariant pedestrian features.
Based on the above research content, the unknown-domain pedestrian re-identification dataset M2O is constructed from five open-source pedestrian re-identification datasets, and the domain adaptation time and accuracy of four classes of classical algorithms are reproduced on the unknown-domain pedestrian re-identification problem. The algorithm provided by this patent is far superior to the other algorithms in both deployment time and accuracy, realizing the final vision of "train in one place, deploy everywhere" for the pedestrian re-identification problem.
The unknown-domain pedestrian re-identification method based on domain style filtering provided by the embodiment of the invention is implemented as follows. In a specific implementation, the model can be built using the pytorch deep learning framework.
1. This patent proposes an unknown-domain pedestrian re-identification task in which the test set contains multiple different data distributions.
The invention provides a new pedestrian re-identification task, called the unknown-domain pedestrian re-identification task. Different from traditional pedestrian re-identification technology, it considers the complex distribution condition in which the training set and the test set differ in data distribution and multiple data distributions exist in the test set at the same time. Specifically, the task first trains the model on a certain dataset and then tests jointly on all datasets except that dataset.
2. This patent proposes a domain-adaptation-free unknown-domain pedestrian re-identification method based on domain style filtering.
the invention provides a domain-adaptation-free unknown domain pedestrian re-identification method based on domain style filtering. The network can remove the characteristic of redundant information through the self-Encoder as the basis, and filter the domain style information by means of the Classification loss of the pedestrian Identity (ID) in a multitask joint learning mode to achieve the aim of extracting the domain invariance characteristic. The method comprises the following specific steps:
1) Given a source domain dataset with pedestrian ID labels, denoted $\{(x_n, y_n^{ID})\}_{n=1}^{N}$, where $N$ denotes the number of images in the source domain dataset, $x_n$ denotes the $n$-th image, and $y_n^{ID}$ denotes the pedestrian ID label corresponding to image $x_n$;
2) Randomly select a group of images from the source domain dataset by PK sampling to form a mini-batch of training samples, i.e. select $P$ pedestrians and $K$ images for each pedestrian, and form a number of triplets consisting of anchor images, positive sample images and negative sample images;
3) Perform the forward propagation method: an input image $x$ is passed through the encoder $E$ to obtain a hidden feature code $f$; the hidden feature code $f$ is then fed into the decoding network $D$, which outputs the reconstruction result $\hat{x}$; in addition, the hidden feature $f$ is fed into the ID classification network $C_{ID}$ to obtain the pedestrian ID feature $f_{ID}$ and the pedestrian ID prediction $\hat{y}^{ID}$;
4) According to the outputs of forward propagation, compute the image reconstruction loss $L_R$ and the ID classification loss $L_{ID}$ using formula (1) and formula (2) respectively, then weight the different losses according to formula (6) and compute the total model loss $L$;
5) Respectively compute the gradients of the total loss $L$ through $L_R$ and $L_{ID}$, perform the optimization operation of formula (7) using the back-propagation method, and update the encoder $E$, the decoder $D$ and the ID classification network $C_{ID}$;
6) Repeat steps 2) to 5) until the model converges;
7) After the model converges, the trained encoder $E^*$ and ID classification network $C_{ID}^*$ are obtained;
8) In the inference phase, given the set of unlabeled target-domain samples under the unknown-domain condition, any image $y$ is forward propagated through the encoder $E^*$ and the ID classification network $C_{ID}^*$ to obtain the domain-invariant pedestrian feature $f_{ID}$; the cosine similarity between this feature and the gallery image features is then computed, and the results are sorted in descending order of similarity to obtain the pedestrian retrieval result.
3. This patent proposes a method of reconstructing the input image based on the self-encoder to filter out domain style related information; the specific training process is as follows.
For the image reconstruction task, applying a mean square error (MSE) constraint between the input image and the output image of a self-encoder model based on unsupervised learning allows the input image to be reconstructed more smoothly and accurately; this process is generally called the image reconstruction task. Specifically, the self-encoder consists of an encoder and a decoder. By extracting and compressing the features of the input data, the encoder encodes the input pedestrian image into discriminative features with a large amount of redundant information removed, and the decoder then reconstructs the hidden vector into an output result similar to the original input image. In the embodiment, for a given data distribution $p$ of the input samples and data distribution $q$ of the output reconstruction results, data sampling is performed at $N$ points respectively (where $N$ is the number of source domain training samples, i.e. the number of training set images), and the image reconstruction loss $L_R$ of the image reconstruction task can be described by formula (1):
$$L_R = L_{MSE}(p, q) = \frac{1}{N}\sum_{i=1}^{N}\left(p_i - q_i\right)^2 \tag{1}$$
where $i$ denotes the index of the source domain training sample and $L_{MSE}(p, q)$ denotes the mean square error loss of data distribution $p$ and data distribution $q$.
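A minimal self-encoder wrapper corresponding to the description above; the concrete encoder and decoder architectures are deliberately left open and are assumed to be supplied elsewhere.

```python
import torch.nn as nn

class SelfEncoder(nn.Module):
    """Encoder compresses the pedestrian image into a hidden code f; decoder reconstructs it."""
    def __init__(self, encoder: nn.Module, decoder: nn.Module):
        super().__init__()
        self.encoder = encoder
        self.decoder = decoder

    def forward(self, x):
        f = self.encoder(x)       # discriminative hidden features with redundant information removed
        x_rec = self.decoder(f)   # output result similar to the original input image
        return f, x_rec
```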
4. The specific process of the ID classification network proposed in this patent is as follows.
Although the hidden features of the middle layer of the self-encoder can effectively remove a large amount of redundant information, as an unsupervised method the self-encoder cannot decide what the removed redundant information is. In general, the redundant information removed during encoding consists of very basic relationships among pixels rather than high-level information with specific semantics, so this alone cannot constrain the network to remove domain-style related information. Therefore, the method applies a pedestrian recognition constraint on the hidden vector, so that redundant information irrelevant to pedestrian recognition is removed while discriminative pedestrian features are extracted. Theoretically, the purpose of removing designated information, such as the domain-style related information, can be achieved by designing a reasonable loss function.
Specifically, this patent chooses to add the pedestrian ID loss on the hidden vector as supervision, prompting the encoder and the ID classification network to learn information relevant to ID classification. In other words, information irrelevant to the classification is discarded, which, when combined with the self-encoder model, further prompts the model to remove such classification-irrelevant redundant information. This classification-irrelevant information contains a large amount of domain style information.
The main reasons are as follows: for the training set samples, all pedestrian images in the training set have the same style and the same data distribution due to the fact that the images are from the same camera acquisition, and the information has no effect on the pedestrian classification task. Because different images of either the same pedestrian or different images of different pedestrians have this same domain style information, this information cannot be retained during the gradient backpropagation process that occurs after the ID classification loss is minimized, but is removed as extraneous information.
Therefore, after the combined training with the image reconstruction task is carried out, the encoder effectively achieves the effect similar to a style mask (mask), the domain style information is filtered, and then the domain invariance pedestrian features irrelevant to the style are extracted, so that the identification difficulty of the model is greatly reduced.
This patent constructs triplets with PK sampling, i.e. $P$ pedestrians are randomly selected and $K$ images are selected for each pedestrian. For each anchor image $x_a$, the image $x_p$ with the same ID but the farthest distance is selected as the positive sample, and the image $x_n$ with a different ID but the closest distance is selected as the negative sample, composing a triplet. For the image reconstruction task, this patent uses the mean square error loss to constrain the self-encoder network.
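A sketch of PK sampling under the assumption that the dataset indices are grouped by pedestrian ID in a dictionary; the hard positive/negative selection then happens inside the batch as described above.

```python
import random

def pk_sample(indices_by_id, P=16, K=4):
    """Draw P pedestrian IDs and K image indices per ID to form one mini-batch."""
    chosen_ids = random.sample(list(indices_by_id.keys()), P)
    batch = []
    for pid in chosen_ids:
        idxs = indices_by_id[pid]
        # sample with replacement when a pedestrian has fewer than K images
        picks = random.sample(idxs, K) if len(idxs) >= K else random.choices(idxs, k=K)
        batch.extend(picks)
    return batch
```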
For the ID classification task (i.e. the pedestrian re-identification task), this patent adopts the ID loss $L_{ID}$ for supervision, where $L_{ID}$ includes the cross-entropy (CE) loss with label smoothing regularization (LSR) computed on the output layer, and the triplet loss and center loss computed on the embedding layer. $L_{ID}$ is formulated as follows:
$$L_{ID} = L_{CE}(\hat{y}^{ID}, \tilde{y}^{ID}) + L_{tri}(f_{ID}^{a}, f_{ID}^{p}, f_{ID}^{n}) + w_1 L_{Center}(f_{ID}, y^{ID}) \tag{2}$$
where $f_{ID}$ denotes the ID feature of any image $x$ in a batch of input images; $f_{ID}^{a}$ denotes the ID feature of the anchor image $x_a$ in an image triplet, and similarly $f_{ID}^{p}$ and $f_{ID}^{n}$ denote the ID features of the positive sample image $x_p$ and the negative sample image $x_n$ in this triplet, respectively (the triplet loss selects one image in a batch of input images as the anchor image $x_a$, then selects an image with the same ID as the anchor as the positive sample image $x_p$ and an image with a different ID from the anchor as the negative sample image $x_n$ to compose a triplet, and then decreases the distance between positive sample pairs and increases the distance between negative sample pairs); $y^{ID}$ denotes the ID label of image $x$, $\tilde{y}^{ID}$ denotes the ID label of image $x$ after label smoothing regularization, and $\hat{y}^{ID}$ is the ID classification result predicted by the network; the weight $w_1$ weighs the importance of the center loss. In the embodiment, the weights of the cross-entropy loss and the triplet loss are preferably both set to 1, and $w_1$ is set to $5\times10^{-4}$.
In addition, the cross-entropy loss $L_{CE}$, the triplet loss $L_{tri}$ and the center loss $L_{Center}$ used in formula (2) are defined as follows:
$$L_{CE}(p, q) = -\sum_{c=1}^{C} q_c \log p_c \tag{3}$$
where $C$ denotes the total number of classification categories, $c$ denotes the category index, $p$ and $q$ denote the data distributions of the model prediction results and the sample labels respectively, and $p_c$ and $q_c$ denote the sampled values of the two distributions at position $c$;
$$L_{tri} = \frac{1}{PK}\sum_{t=1}^{P}\sum_{s=1}^{K}\left[m + d_{ap} - d_{an}\right]_{+} \tag{4}$$
where $PK$ indicates that $P$ IDs are sampled per batch and $K$ images are sampled per ID, and $t$ and $s$ index the IDs and the images of each ID, respectively; $m$ is the margin hyperparameter of the triplet loss; each triplet consists of an anchor image $a$, a positive sample image $p^{+}$ and a negative sample image $n^{-}$; $d_{ap}$ and $d_{an}$ denote the Euclidean distances between the anchor image features and the positive sample image features, and between the anchor image features and the negative sample image features, respectively;
$$L_{Center} = \frac{1}{2}\sum_{i=1}^{B}\left\| f_{ID}^{i} - c_{y_i^{ID}} \right\|_2^2 \tag{5}$$
where $B$ denotes the number of images in a batch, $f_{ID}^{i}$ denotes the ID feature of the $i$-th image, $y_i^{ID}$ denotes the ID label of the $i$-th image, and $c_{y_i^{ID}}$ denotes the feature center of the $y_i^{ID}$-th category.
5. The image reconstruction task and the pedestrian ID classification network are optimized in a multi-task joint training manner; the specific implementation steps are as follows.
The image style information can be filtered out by jointly optimizing the image reconstruction task and the ID classification task. Specifically, the total target loss of the model is defined as:
$$L = L_{CE} + L_{tri} + w_1 L_{Center} + w_2 L_R(x_i, \hat{x}_i) \tag{6}$$
where $x_i$ denotes the input image and $\hat{x}_i$ denotes the output result of the input image after reconstruction by the self-encoder; $w_1$ and $w_2$ are used to weigh the importance between the losses, and this embodiment preferably sets $w_1 = 5\times10^{-4}$ and $w_2 = 2$. The above formula is then minimized in a multi-task joint training manner to reach the optimal point of the model. Thus, the model optimization objective is:
$$E^*, D^*, C_{ID}^* = \arg\min_{E,\, D,\, C_{ID}} L \tag{7}$$
where $E^*$ denotes the final optimization result of the encoder network, and $D^*$ and $C_{ID}^*$ denote the final optimization results of the decoder network and the ID classification network, respectively.
6. The effectiveness of the proposed unknown-domain pedestrian re-identification method based on domain style filtering is verified experimentally on the unknown-domain pedestrian re-identification dataset, specifically as follows:
the training set of Market-1501 is taken as the training set of the source domain, and the test sets of DukeMTMC-reiD, MSMT17, CUHK03 and VIPeR data sets are put together as the test set of the target domain, denoted as the M2O data set (Market 2 Others). Where the training set includes 12,936 images of 751 pedestrians from 6 cameras, the test set includes 121,073 images from an additional 27 cameras, where 15,603 images are query images and the remaining 10,5470 images are galleries. In order to reproduce other domain adaptation methods, a training set containing 70,076 images of 2,826 pedestrians is formed by using training data of DukeMTMC-reiD, MSMT17, CUHK03 and VIPeR data sets, and the domain adaptation work for other three types of classical domain adaptation methods [1-3] The patented method does not require the use of the training set at all.
The input images are resized to 256 × 128, and data augmentation including random horizontal flipping, random cropping and color jittering is used, but random erasing, which is commonly used in pedestrian re-identification, is not used, because random erasing is actually not conducive to model learning when the domain styles differ greatly [4]. In addition, the mean and variance of ImageNet are used to normalize all training and test sets, which are converted into tensor data that pytorch can process for model training and testing respectively. The method mainly comprises the following steps:
1) The initial network model is trained using the source domain (Market-1501) data, independent of any target domain data. Specifically, a group of images is first randomly selected from the source domain dataset by PK sampling to form a mini-batch of training samples. The forward propagation method is then performed: an input image $x$ is passed through the encoder $E$ to obtain a hidden feature code $f$; the code $f$ is then fed into the decoding network $D$, which outputs the reconstruction result $\hat{x}$; the encoding result $f$ is also fed into the ID classification network $C_{ID}$ to obtain the pedestrian ID feature $f_{ID}$ and the pedestrian ID prediction $\hat{y}^{ID}$. According to the outputs of forward propagation, the image reconstruction loss $L_R$ and the ID classification loss $L_{ID}$ are computed using formula (1) and formula (2) respectively, the different losses are then weighted according to formula (6), and the total model loss $L$ is computed; the optimization operation of formula (7) is performed using the back-propagation method to update the encoder $E$, the decoder $D$ and the ID classification network $C_{ID}$; the processes of data sampling, forward propagation, loss calculation, loss weighting and back propagation are repeated until the model converges.
This patent preferably suggests using the ResNet50 IBN-B model pre-trained on the ImageNet dataset as the backbone network of the encoder, while removing the last spatial down-sampling operation. 16 pedestrians are selected per batch, and 4 images are selected per pedestrian for training. The whole model is optimized using the Adam optimizer; the initial learning rate of the encoder is $3.5\times10^{-4}$, while the initial learning rates of the decoder and the ID classification network are set to $3.5\times10^{-3}$. The learning rates of all modules in the model are decayed by a factor of 10 at the 40th and 70th epochs (iteration cycles), and model training finishes at the 120th epoch.
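The optimizer and learning-rate schedule of the embodiment can be sketched as follows; the module variables encoder, decoder and id_classifier are assumed to have been built as described above.

```python
import torch

# Adam with per-module learning rates: encoder 3.5e-4, decoder and ID classifier 3.5e-3;
# the learning rate is divided by 10 at epochs 40 and 70, and training runs for 120 epochs.
optimizer = torch.optim.Adam([
    {"params": encoder.parameters(), "lr": 3.5e-4},
    {"params": decoder.parameters(), "lr": 3.5e-3},
    {"params": id_classifier.parameters(), "lr": 3.5e-3},
])
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[40, 70], gamma=0.1)

for epoch in range(120):
    # ... iterate over PK-sampled batches and call the training step sketched earlier ...
    scheduler.step()
```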
2) In the inference stage, the output of the fully connected layer FC(2048) with 2048 neurons in the ID classification network is selected as the pedestrian feature, the cosine similarity between the query image features and the gallery image features is computed as the metric, and the retrieval result of the query image is finally obtained by sorting. Specifically, given the unlabeled sample dataset of the target domain Market2other under the unknown-domain pedestrian re-identification condition, any given image $y$ is forward propagated through the encoder and the ID classification network to obtain the domain-invariant pedestrian feature $f_{ID}$; the cosine similarity between this feature and the gallery image features is then computed, and the results are sorted in descending order of similarity to obtain the pedestrian retrieval result of the query image among the gallery images of the Market2other dataset.
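An inference-time retrieval sketch: the query feature (the 2048-d FC output) is compared with pre-extracted gallery features by cosine similarity and ranked in descending order. The attribute name fc used to reach the FC layer is an assumption of this sketch.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def rank_gallery(query_image, gallery_features, encoder, id_classifier):
    """Return gallery indices sorted by descending cosine similarity to the query."""
    f = encoder(query_image.unsqueeze(0))      # domain-invariant hidden code
    q = id_classifier.fc(f)                    # 2048-d pedestrian feature from the FC layer
    sims = F.cosine_similarity(q, gallery_features, dim=1)
    return torch.argsort(sims, descending=True)
```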
The model is trained and tested on two 24 GB NVIDIA GeForce RTX 3090 graphics cards, and the model can achieve better performance with the support of more graphics card devices. The method requires only about 4 h of offline training and, without any domain adaptation time at all (zero-time deployment cost), achieves performance far exceeding that of the four existing classes of classical pedestrian re-identification methods, truly realizing the final vision of "train in one place, deploy everywhere". The specific network implemented by the embodiment method is exemplified in Table 1.
To compare the performance difference and deployment time difference with existing pedestrian re-identification methods, the invention experimentally reproduces, on the M2O dataset, the open-source classical work of each of the four currently most representative classes of methods, namely BoT-BS [4], SPGAN [2], HHL [1] and SSG [3]. These four works respectively address the supervised single-domain pedestrian re-identification problem, the cross-domain problem using style transfer to reduce inter-domain differences, the cross-domain problem exploring style transfer between cameras to reduce differences within the target domain, and the cross-domain problem of pre-training a simple model on the source domain and then extracting target-domain features for clustering and iteratively fine-tuning the model; the latter three consume a large amount of domain adaptation time. The reproduction results of the different methods and the performance of the studied method on the M2O dataset are shown in Table 2.
Table 1. Detailed network architecture and parameters of the unknown-domain learning method based on domain style filtering (the table is provided as an image in the original publication).
Table 2. Accuracy and deployment time comparison with the four most advanced methods on the unknown-domain pedestrian re-identification problem (the table is provided as an image in the original publication).
Through the joint training of the pedestrian re-identification task and the image reconstruction task, the unknown-domain pedestrian re-identification method based on domain style filtering can efficiently use the self-encoder to remove the domain style information specific to different domains, achieving recognition accuracy far exceeding that of the other four classes of pedestrian re-identification methods, namely Rank-1 = 28.4% and mAP = 11.8% as shown in the last row of Table 2, and it is far superior to the other methods in both recognition accuracy and deployment time.
In specific implementation, a person skilled in the art can realize the above process as an automatic operation flow using computer software technology. A system or device implementing the method, such as a computer-readable storage medium storing the corresponding computer program of the technical solution of the present invention, or a computer device including and running the corresponding computer program, should also fall within the protection scope of the present invention.
In some possible embodiments, a system for re-identifying pedestrians in unknown domains based on domain style filtering is provided, which includes a processor and a memory, the memory is used for storing program instructions, and the processor is used for calling the stored instructions in the memory to execute a method for re-identifying pedestrians in unknown domains based on domain style filtering as described above.
In some possible embodiments, a system for re-identifying pedestrians in an unknown domain based on domain style filtering is provided, which includes a readable storage medium, on which a computer program is stored, and when the computer program is executed, the method for re-identifying pedestrians in an unknown domain based on domain style filtering is implemented as described above.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Various modifications or additions may be made to the described embodiments or alternatives may be employed by those skilled in the art without departing from the spirit or ambit of the invention as defined in the appended claims.

Claims (9)

1. A method for unknown-domain pedestrian re-identification based on domain style filtering, characterized in that: aiming at the complex distribution condition in which the training set and the test set differ in data distribution and multiple data distributions exist in the test set at the same time, an unknown-domain pedestrian re-identification task is established; the task first trains a model on a certain dataset and then tests jointly on all datasets except that dataset;
when unknown-domain learning based on domain style filtering, which requires no domain adaptation, is performed, the process is as follows:
the model builds on the ability of the self-encoder to remove redundant information and, in a multi-task joint learning manner, filters out the domain style information with the help of the pedestrian identity (ID) loss, so as to extract domain-invariant features; the domain style filtering network takes the pedestrian feature encoder network (Encoder) as its main module, with the pedestrian ID classification network (ID Classification) and the image feature decoder network (Decoder) as auxiliary modules; the encoder and decoder form a self-encoder network (Auto-Encoder) used to realize the image reconstruction task, while the ID classification network is used for the pedestrian re-identification task; the implementation steps of unknown-domain pedestrian re-identification are as follows:
1) Given a source domain dataset with pedestrian ID labels, denoted $\{(x_n, y_n^{ID})\}_{n=1}^{N}$, where $N$ denotes the number of images in the source domain dataset, $x_n$ denotes the $n$-th image, and $y_n^{ID}$ denotes the pedestrian ID label corresponding to image $x_n$;
2) Randomly select a group of images from the source domain dataset by PK sampling to form a mini-batch of training samples, wherein $P$ pedestrians are selected and $K$ images are selected for each pedestrian to form triplets;
3) Execute the forward propagation algorithm: an input image $x$ is passed through the encoder $E$ to obtain a hidden feature code $f$; the code $f$ is then fed into the decoding network $D$, which outputs the reconstruction result $\hat{x}$; the code $f$ is also fed into the ID classification network $C_{ID}$ to obtain the pedestrian ID feature $f_{ID}$ and the pedestrian ID prediction $\hat{y}^{ID}$;
4) According to the outputs of forward propagation, respectively compute the image reconstruction loss $L_R$ and the ID classification loss $L_{ID}$, then weight the different losses and compute the total model loss $L$;
5) Respectively compute the gradients of the total loss $L$ through $L_R$ and $L_{ID}$, perform the optimization operation using the back-propagation algorithm, and update the encoder $E$, the decoder $D$ and the ID classification network $C_{ID}$;
6) Repeat steps 2) to 5) until the model converges;
7) After the model converges, the trained encoder $E^*$ and ID classification network $C_{ID}^*$ are obtained;
8) In the inference phase, given a set of unlabeled target-domain samples under the unknown-domain condition, any image $y$ is passed through the encoder $E^*$ and the ID classification network $C_{ID}^*$, and forward propagation is executed to obtain the domain-invariant pedestrian feature $f_{ID}$; the cosine similarity between this feature and the gallery image features is then computed, and the results are sorted in descending order of similarity to obtain the pedestrian retrieval result.
2. The unknown-domain pedestrian re-identification method based on domain style filtering according to claim 1, characterized in that: the domain-style related information is filtered out based on the reconstruction of the input image by the self-encoder, and the specific training process is as follows:
the mean square error loss is used to constrain the image reconstruction process of the self-encoder network; for the data distribution $p$ of the given input samples and the data distribution $q$ of the output reconstruction results, data sampling is performed at $N$ points respectively, where $N$ is the number of source domain training samples, and the image reconstruction loss $L_R$ of the image reconstruction task is described by formula (1):
$$L_R = L_{MSE}(p, q) = \frac{1}{N}\sum_{i=1}^{N}\left(p_i - q_i\right)^2 \tag{1}$$
where $i$ denotes the index of the source domain training sample and $L_{MSE}(p, q)$ denotes the mean square error loss of data distribution $p$ and data distribution $q$.
3. The method for re-identifying pedestrians in unknown domain based on domain style filtering, as claimed in claim 1, wherein: the specific process of classifying the network based on ID is as follows,
adding pedestrian ID loss on the hidden vector as supervision to enable an encoder and an ID classification network to learn information related to ID classification, discarding the information unrelated to the classification, and further enabling a model to remove redundant information unrelated to the classification after combining with a self-encoder model, wherein the redundant information comprises domain style information;
taking the pedestrian re-identification task as an ID classification task, and adopting ID loss L ID Supervision is carried out, L ID Including label-smoothed canonical cross-entropy losses computed on the output layer, triplet losses and center losses computed on the embedding layer, a formulaic description L ID As shown in the following formula:
L_{ID} = L_{CE}(\hat{y}_{ID}, \tilde{y}_{ID}) + L_{tri}(f_{ID}^{a}, f_{ID}^{p}, f_{ID}^{n}) + w_1 L_{Center}(f_{ID}, y_{ID})

where f_ID represents the ID feature of any image x in a batch of input images; f_ID^a represents the ID feature of the anchor image x_a in an image triplet, and similarly f_ID^p and f_ID^n respectively represent the ID features of the positive sample image x_p and the negative sample image x_n in this triplet; y_ID represents the ID label of image x, ỹ_ID represents the ID label of image x after label smoothing regularization, and ŷ_ID is the ID classification result predicted by the network; the weights of the cross-entropy loss and the triplet loss are both set to 1, while the weight w_1 weighs the importance of the center loss.
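A minimal sketch of this combined ID loss, using PyTorch built-ins, is given below. The margin 0.3, smoothing factor 0.1, and default w1 are illustrative values only, and the tensor shapes (logits [B, C], features [B, d], class centers [C, d]) are assumptions not stated in the claim.

```python
import torch
import torch.nn.functional as F

def id_loss(logits, labels, anchor, positive, negative, feats, centers, w1=0.0005):
    """L_ID = L_CE + L_tri + w1 * L_Center (cross-entropy and triplet weights fixed to 1)."""
    l_ce = F.cross_entropy(logits, labels, label_smoothing=0.1)           # label-smoothing-regularized CE on the output layer
    l_tri = F.triplet_margin_loss(anchor, positive, negative, margin=0.3) # triplet loss on the embedding layer
    l_center = 0.5 * ((feats - centers[labels]) ** 2).sum(dim=1).sum()    # center loss, summed over the batch as in claim 4
    return l_ce + 1.0 * l_tri + w1 * l_center
```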
4. The unknown-domain pedestrian re-identification method based on domain style filtering according to claim 3, characterized in that: the cross-entropy loss L_CE, the triplet loss L_tri, and the center loss L_Center are respectively defined as follows,
L_{CE}(p, q) = -\sum_{c=1}^{C} q_c \log(p_c)
where C represents the total number of classification categories, c indexes the categories, p and q represent the data distributions of the model prediction result and the sample label respectively, and p_c and q_c represent the values of the two distributions sampled at position c;
L_{tri} = \sum_{t=1}^{P}\sum_{s=1}^{K} \left[ m + d_{ap} - d_{an} \right]_{+}
where PK denotes that P IDs are sampled per batch with K images per ID, and t and s index the IDs and the images of each ID respectively; m is the margin hyperparameter of the triplet loss; each triplet consists of an anchor image a, a positive sample image p^+ and a negative sample image n^-, and d_ap and d_an respectively represent the Euclidean distances between the anchor image feature and the positive sample image feature and between the anchor image feature and the negative sample image feature;
L_{Center} = \frac{1}{2}\sum_{i=1}^{B} \left\| f_{ID}^{i} - c_{y_i} \right\|_2^2

where B denotes the number of images in a batch, f_ID^i denotes the ID feature of the i-th image, y_i denotes the ID label of the i-th image, and c_{y_i} denotes the feature center of the y_i-th category.
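The three definitions above can be transcribed directly; the sketch below assumes precomputed anchor-positive and anchor-negative Euclidean distances for the triplet term and a center matrix of shape [C, d] for the center term (both assumptions about how the quantities are materialized, not details given in the claim).

```python
import torch
import torch.nn.functional as F

def cross_entropy_loss(logits: torch.Tensor, target_dist: torch.Tensor) -> torch.Tensor:
    """L_CE(p, q) = -sum_c q_c * log(p_c); target_dist may be a label-smoothed one-hot distribution."""
    log_p = F.log_softmax(logits, dim=1)
    return -(target_dist * log_p).sum(dim=1).mean()

def triplet_loss(d_ap: torch.Tensor, d_an: torch.Tensor, m: float = 0.3) -> torch.Tensor:
    """L_tri = sum over the P*K triplets of [m + d_ap - d_an]_+ with margin m."""
    return F.relu(m + d_ap - d_an).sum()

def center_loss(feats: torch.Tensor, labels: torch.Tensor, centers: torch.Tensor) -> torch.Tensor:
    """L_Center = 1/2 * sum_i || f_ID^i - c_{y_i} ||_2^2."""
    return 0.5 * ((feats - centers[labels]) ** 2).sum()
```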
5. The unknown-domain pedestrian re-identification method based on domain style filtering according to claim 1, characterized in that: the image reconstruction task and the pedestrian ID classification network are optimized by means of multi-task joint training, and the implementation steps are as follows,
the image reconstruction task and the ID classification task are jointly optimized, the effect of filtering the image style information is achieved, and the total target loss of the model is defined as:
L = L_{ID} + w_2 L_{R}(x_i, \hat{x}_i)

where x_i represents the input image and x̂_i represents the output of the input image after reconstruction by the autoencoder; the weights w_1 and w_2 are used to weigh the importance between the losses;
the above equation is then minimized in a multitask joint training fashion to reach the "optimal point" for the model. The model optimization objective is noted as:
(E^*, D^*, C_{ID}^*) = \arg\min_{E, D, C_{ID}} L

where E^* represents the final optimization result of the encoder network, and D^* and C_{ID}^* respectively represent the final optimization results of the decoder network and the ID classification network.
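In practice the joint objective above amounts to optimizing the parameters of E, D and C_ID together; the sketch below shows one conventional way to set this up (the optimizer type and learning rate are assumptions, not specified by the claim).

```python
import itertools
import torch

def build_joint_optimizer(encoder, decoder, id_head, lr=3.5e-4):
    """Single optimizer over the parameters of E, D and C_ID for multi-task joint training."""
    params = itertools.chain(encoder.parameters(), decoder.parameters(), id_head.parameters())
    return torch.optim.Adam(params, lr=lr)
```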
6. The unknown-domain pedestrian re-identification method based on domain style filtering according to claim 1, characterized in that: a ResNet50 IBN-B model pre-trained on the ImageNet dataset is used as the backbone network of the encoder, with the last spatial downsampling operation removed; during model testing, the output of the fully connected layer FC in the ID classification network is selected as the pedestrian feature, the cosine similarity between the query image feature and the image features of the image library is calculated as the metric, and finally the retrieval result of the query image is obtained by ranking.
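For illustration, one common way to realize the "remove the last spatial downsampling" modification is shown below on the torchvision ResNet-50, used here only as a stand-in, since a ResNet50 IBN-B implementation and its weights are not part of the claim text.

```python
import torch.nn as nn
from torchvision.models import resnet50

def build_encoder_backbone():
    """ImageNet-pretrained ResNet-50 stand-in with the last spatial downsampling removed."""
    backbone = resnet50(weights="IMAGENET1K_V1")
    # set the stride of layer4's first block from 2 to 1 so the final feature map keeps its resolution
    backbone.layer4[0].conv2.stride = (1, 1)
    backbone.layer4[0].downsample[0].stride = (1, 1)
    backbone.fc = nn.Identity()   # expose the pooled feature; the ID classifier is a separate head
    return backbone
```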
7. An unknown-domain pedestrian re-identification system based on domain style filtering, characterized in that: it is used for implementing the unknown-domain pedestrian re-identification method based on domain style filtering according to any one of claims 1 to 6.
8. The unknown-domain pedestrian re-identification system based on domain style filtering according to claim 7, characterized in that: it comprises a processor and a memory, the memory being configured to store program instructions, and the processor being configured to call the instructions stored in the memory to execute the unknown-domain pedestrian re-identification method based on domain style filtering according to any one of claims 1 to 6.
9. The unknown-domain pedestrian re-identification system based on domain style filtering according to claim 7, characterized in that: it comprises a readable storage medium on which a computer program is stored, and when the computer program is executed, the unknown-domain pedestrian re-identification method based on domain style filtering according to any one of claims 1 to 6 is implemented.
CN202210286465.2A 2022-03-22 2022-03-22 Unknown domain pedestrian re-identification method and system based on domain style filtering Pending CN115376178A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210286465.2A CN115376178A (en) 2022-03-22 2022-03-22 Unknown domain pedestrian re-identification method and system based on domain style filtering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210286465.2A CN115376178A (en) 2022-03-22 2022-03-22 Unknown domain pedestrian re-identification method and system based on domain style filtering

Publications (1)

Publication Number Publication Date
CN115376178A true CN115376178A (en) 2022-11-22

Family

ID=84060335

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210286465.2A Pending CN115376178A (en) 2022-03-22 2022-03-22 Unknown domain pedestrian re-identification method and system based on domain style filtering

Country Status (1)

Country Link
CN (1) CN115376178A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116778233A (en) * 2023-06-07 2023-09-19 中国人民解放军国防科技大学 Incomplete depth multi-view semi-supervised classification method based on graph neural network
CN116778233B (en) * 2023-06-07 2024-02-06 中国人民解放军国防科技大学 Incomplete depth multi-view semi-supervised classification method based on graph neural network

Similar Documents

Publication Publication Date Title
Croitoru et al. Diffusion models in vision: A survey
CN110135366B (en) Shielded pedestrian re-identification method based on multi-scale generation countermeasure network
CN113936339B (en) Fighting identification method and device based on double-channel cross attention mechanism
CN109740413A (en) Pedestrian recognition methods, device, computer equipment and computer storage medium again
CN110009013A (en) Encoder training and characterization information extracting method and device
Ming et al. Simple triplet loss based on intra/inter-class metric learning for face verification
CN112052763B (en) Video abnormal event detection method based on two-way review generation countermeasure network
Robert et al. Hybridnet: Classification and reconstruction cooperation for semi-supervised learning
CN112396027A (en) Vehicle weight recognition method based on graph convolution neural network
CN112699786B (en) Video behavior identification method and system based on space enhancement module
Chen et al. Learning linear regression via single-convolutional layer for visual object tracking
CN105184260B (en) A kind of image characteristic extracting method and pedestrian detection method and device
CN108446589B (en) Face recognition method based on low-rank decomposition and auxiliary dictionary in complex environment
CN112307995A (en) Semi-supervised pedestrian re-identification method based on feature decoupling learning
Zhang et al. Gait-based age estimation with deep convolutional neural network
Chen et al. Automated design of neural network architectures with reinforcement learning for detection of global manipulations
CN112232374B (en) Irrelevant label filtering method based on depth feature clustering and semantic measurement
CN114693942A (en) Multimode fault understanding and auxiliary labeling method for intelligent operation and maintenance of instruments and meters
CN114419464A (en) Twin network change detection model based on deep learning
Li et al. Image manipulation localization using attentional cross-domain CNN features
Wang et al. Adt: Anti-deepfake transformer
Aygun et al. Exploiting convolution filter patterns for transfer learning
CN115376178A (en) Unknown domain pedestrian re-identification method and system based on domain style filtering
Cheng et al. An image-based deep learning approach with improved DETR for power line insulator defect detection
CN114170657A (en) Facial emotion recognition method integrating attention mechanism and high-order feature representation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination