CN111738039B - Pedestrian re-identification method, terminal and storage medium - Google Patents

Pedestrian re-identification method, terminal and storage medium

Info

Publication number
CN111738039B
CN111738039B (application CN201910391548.6A)
Authority
CN
China
Prior art keywords
image
target
model
pedestrian
images
Prior art date
Legal status
Active
Application number
CN201910391548.6A
Other languages
Chinese (zh)
Other versions
CN111738039A (en)
Inventor
王晓波
石海林
汪硕
傅天宇
梅涛
Current Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201910391548.6A
Publication of CN111738039A
Application granted
Publication of CN111738039B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103: Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G06F18/23: Clustering techniques
    • G06F18/232: Non-hierarchical techniques
    • G06F18/2321: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213: Non-hierarchical techniques using statistics or function optimisation, with fixed number of clusters, e.g. K-means clustering
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413: Classification techniques relating to the classification model, based on distances to training or reference patterns
    • G06F18/24133: Distances to prototypes
    • G06F18/24137: Distances to cluster centroïds
    • G06F18/2414: Smoothing the distance, e.g. radial basis function networks [RBFN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention provides a pedestrian re-identification method, a terminal and a storage medium, wherein the pedestrian re-identification method comprises the following steps: obtaining a first model associated with first images constituting a first image set; screening, from second images constituting a second image set, target images whose similarity with a target cluster center meets a target threshold, the target cluster center comprising a center point of each category of images in the second images; adjusting parameters of the first model based on the target images to obtain a second model associated with the second images, wherein the acquisition scene of the first images is different from the acquisition scene of the second images; and obtaining an image to be queried containing a target pedestrian, and identifying the image to be queried based on the second model to obtain a target image containing the target pedestrian in the second images, the acquisition scene of the image to be queried being the same as the acquisition scene of the second images.

Description

Pedestrian re-identification method, terminal and storage medium
Technical Field
The present invention relates to, but is not limited to, the field of computer technologies, and in particular, to a pedestrian re-identification method, a terminal, and a storage medium.
Background
Pedestrian re-identification (Person Re-identification), also known as person re-ID, is a technique that uses computer vision to determine whether a target pedestrian is present in an image or video sequence. It has important applications in fields such as urban security and intelligent video analysis. Pedestrian re-identification can be understood as retrieving, across devices, images of a pedestrian given a monitored pedestrian image. In recent years, with the development of deep learning and the appearance of large-scale open-source data sets, pedestrian re-identification algorithms have made great breakthroughs.
However, due to factors such as illumination, background, and pedestrian clothing style, different pedestrian data sets differ greatly, i.e., there are large domain differences. As a result, existing pedestrian re-identification algorithms perform well on the labeled source domain but poorly on the unlabeled target domain. For this reason, the related art adopts a cross-domain pedestrian re-identification method that migrates images of the target domain to the source domain and recognizes the image to be recognized associated with the target domain using a model trained on the source domain. However, the cross-domain pedestrian re-identification methods adopted in the related art cannot effectively improve the accuracy of recognizing the image to be recognized associated with the target domain, and therefore cannot achieve accurate recognition.
Disclosure of Invention
The embodiment of the invention provides a pedestrian re-identification method, a terminal and a storage medium, so as to solve the problem that the cross-domain pedestrian re-identification methods adopted in the related art cannot effectively improve the accuracy of recognizing the image to be recognized associated with the target domain and therefore cannot achieve accurate recognition.
The technical scheme of the embodiment of the invention is realized as follows:
a pedestrian re-identification method, the method comprising:
Obtaining a first model associated with a first image comprising a first set of images;
Screening, from second images forming a second image set, target images whose similarity with a target cluster center meets a target threshold; the target cluster center comprises a center point of each category of images in the second images;
Adjusting parameters of the first model based on the target image to obtain a second model associated with the second image; wherein the acquisition scene of the first image is different from the acquisition scene of the second image;
Obtaining an image to be queried containing a target pedestrian, and identifying the image to be queried based on the second model to obtain a target image containing the target pedestrian in the second image; the acquisition scene of the image to be queried is the same as the acquisition scene of the second image.
Optionally, the obtaining a first model associated with a first image that forms the first image set includes:
Acquiring pedestrian attribute information and pedestrian identification information of pedestrians included in the first image;
and training to obtain the first model based on the pedestrian attribute information and the pedestrian identification information.
Optionally, the screening the target image with the similarity with the target cluster center meeting the target threshold from the second images forming the second image set includes:
Setting a first category number of pedestrians included in the second image;
Determining a first cluster center of each category of images in the second image based on the first category number; wherein the target cluster center comprises the first cluster center;
Determining an image, of which the similarity with each first clustering center meets a first similarity threshold value, from the second image as a third image; wherein the target image includes the third image;
correspondingly, the adjusting the parameters of the first model based on the target image to obtain a second model associated with the second image includes:
and adjusting parameters of the first model based on the third image to obtain the second model.
Optionally, the setting the first category number of the second image includes:
Acquiring a second category number of pedestrians included in the second image;
And adding a first threshold to the second category number to obtain the first category number.
Optionally, the adjusting the parameter of the first model based on the third image to obtain the second model includes:
The parameters of the first model are adjusted based on the third image, so that a third model is obtained;
if the first similarity threshold is larger than the second threshold, generating a second similarity threshold; wherein the second similarity threshold is less than the first similarity threshold;
setting a third category number of pedestrians included in the third image;
determining a second cluster center for each category of images in the third image based on the third category number; wherein the target cluster center comprises the second cluster center;
Determining an image, of which the similarity with each second cluster center meets the second similarity threshold, from the third image as a fourth image; wherein the target image includes the fourth image;
and adjusting parameters of the third model based on the fourth image to obtain the second model.
Optionally, if the first similarity threshold is greater than the second threshold, generating the second similarity threshold includes:
if the first similarity threshold is larger than the second threshold, subtracting a preset lower similarity threshold from a preset upper similarity threshold to obtain a first parameter;
Multiplying the first parameter by a preset attenuation rate to obtain a second parameter;
Subtracting the second parameter from the first similarity threshold to obtain the second similarity threshold.
Optionally, the adjusting the parameter of the third model based on the fourth image to obtain the second model includes:
Determining a first target set corresponding to a fifth image in the fourth image; wherein the first target set comprises a plurality of images of the fourth image, wherein the distance between the images and the fifth image is within a target distance range; wherein the fifth image is any one of the fourth images;
Deleting, from the first target set, images acquired by the same image acquisition unit as the fifth image, to obtain a second target set;
Determining images except for the images in the first target set in the fourth image to form a third target set;
and adjusting parameters of the third model based on the second target set and the third target set to obtain the second model.
Optionally, the adjusting the parameter of the third model based on the second target set and the third target set to obtain the second model includes:
obtaining a first characteristic image with the farthest distance from the fifth image in the second target set;
Screening a second characteristic image from the third target set;
And if the distance between the first characteristic image and the fifth image is larger than the distance between the second characteristic image and the fifth image, adjusting the parameters of the third model based on the first characteristic image, the second characteristic image and the fifth image to obtain the second model.
A terminal, the terminal comprising:
A memory for storing executable instructions;
and the processor is used for executing the executable instructions stored in the memory to realize the steps of the pedestrian re-identification method.
A storage medium storing one or more programs executable by one or more processors to implement the steps of the pedestrian re-recognition method described above.
The embodiment of the invention has the following beneficial effects: in the first aspect, pedestrian attribute information and pedestrian identification information are used for joint training aiming at a first image set, namely a source domain, so that generalization capability of a source domain model is improved; in the second aspect, dynamic threshold sampling is used for a second image set, namely a target domain, so as to improve a self-supervision learning algorithm, and images which are reliable and can improve the retrieval capacity of the model across cameras are screened out; in a third aspect, reliable cross-camera samples are mined on a target domain using spatial relationships, where spatial relationship mining is also mining samples that may have the same pedestrian identification information under different image acquisition units, such as cameras, to train the model, improving the ability of the model to search across cameras, and thus improving the performance of the model on the target domain. The strategy based on the three aspects builds a model aiming at the target domain and carries out pedestrian re-recognition, so that the recognition performance is far superior to that of the cross-domain pedestrian re-recognition algorithm published at present. That is, for the source domain, the pedestrian attribute information and the pedestrian identification information are trained jointly through the source domain; and aiming at the target domain, sampling a dynamic threshold value and mining the spatial relationship to improve the identification performance of the source domain model in the target domain. In practical application, in the task of mutual migration of the public databases Market-1501 and DukeMTMC-reID, the performance of the model is far beyond the best unsupervised cross-domain pedestrian re-identification algorithm published at present.
Because a first model associated with a first image comprising the first set of images is obtained; screening target images, the similarity of which meets a target threshold, from second images forming a second image set; the target clustering center comprises a center point of each type of image in the second image; adjusting parameters of the first model based on the target image to obtain a second model associated with the second image; wherein the acquisition scene of the first image is different from the acquisition scene of the second image; obtaining an image to be queried containing a target pedestrian, and identifying the image to be queried based on a second model to obtain a target image containing the target pedestrian in a second image; the acquisition scene of the image to be queried is the same as the acquisition scene of the second image; that is, after the first model is obtained, screening the second images in the second image set to screen out target images with similarity between the second images and the center points of the images in each category meeting a target threshold, and adjusting parameters of the first model based on the target images to obtain a second model for the second image set, wherein the first model is not directly used for the second image set; therefore, the problem that the accuracy rate of identifying the image to be identified associated with the target domain cannot be effectively improved by the cross-domain pedestrian re-identification method adopted in the related technology, and further accurate identification cannot be achieved is solved, the accuracy rate of cross-domain identification is achieved, and the accuracy rate of target domain identification is improved.
Drawings
Fig. 1 is a schematic flow chart of a pedestrian re-recognition method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of another pedestrian re-recognition method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of source domain model training provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of training a target domain model according to an embodiment of the present invention;
FIG. 5 is a schematic flow chart of another pedestrian re-recognition method according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
Detailed Description
The present invention will be further described in detail with reference to the accompanying drawings, for the purpose of making the objects, technical solutions and advantages of the present invention more apparent, and the described embodiments should not be construed as limiting the present invention, and all other embodiments obtained by those skilled in the art without making any inventive effort are within the scope of the present invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.
In the embodiment of the invention, a labeled data set is defined as the source domain, and an unlabeled data set is defined as the target domain.
Here, pedestrian re-identification methods in the related art are briefly described:
1. Pedestrian re-identification algorithm based on feature partitioning (Beyond Part Models: Person Retrieval with Refined Part Pooling, PCB).
The general method uses a 50-layer residual network to extract features, applies global pooling to the extracted feature maps, and then trains with a normalized exponential function (Softmax). The PCB horizontally divides the extracted feature map into several parts, and each part is trained independently with Softmax; the features of each part can correspond to a part of the human body, so the description of the human body is more refined, and the PCB can significantly improve the performance of pedestrian re-identification algorithms. It should be noted that, although this method can significantly improve the accuracy of the algorithm, the accuracy of cross-dataset recognition is still low.
2. Pedestrian re-identification algorithm based on the triplet loss function (In Defense of the Triplet Loss for Person Re-Identification).
Introducing the triplet loss function (triplet loss) from metric learning into the pedestrian re-identification algorithm can reduce the intra-class distance and enlarge the inter-class distance, improving the accuracy of the pedestrian re-identification algorithm, but it is difficult to improve the cross-dataset re-identification ability of the model.
3. The reordering method (Re-ranking Person Re-identification with k-reciprocal Encoding).
The re-ranking algorithm performs a second retrieval using the model's initial retrieval results and re-orders the initial retrieval results with the results of the second retrieval; the re-ordered results are generally significantly better than the initially ordered results. However, in cross-dataset pedestrian re-identification, the cross-dataset re-identification ability of the model is poor, and the improvement brought by the re-ranking algorithm is hardly satisfactory.
The pedestrian re-identification algorithms above perform well on the source domain but poorly on the target domain. To improve the recognition accuracy on the target domain, the related art adopts the following two approaches:
First, performing image style transfer using a generative adversarial network.
Representative methods that use generative adversarial networks to solve cross-dataset pedestrian re-identification mainly include the Person Transfer Generative Adversarial Network (PTGAN) and the Similarity Preserving cycle-consistent Generative Adversarial Network (SPGAN). PTGAN migrates images of the target domain to the source domain and performs recognition directly with a model trained on the source domain; SPGAN migrates images of the source domain to the target domain, trains the model with the migrated images, and improves the performance of the model in the target domain. Both methods are unsatisfactory in improving the accuracy of target-domain recognition.
Second, using an unsupervised transfer learning algorithm to transfer the model from the source domain to the target domain.
Representative work on unsupervised transfer algorithms mainly includes transfer algorithms based on pedestrian attribute alignment, such as Transferable Joint Attribute-Identity Deep Learning (TJ-AIDL), the method ARN based on domain information separation, and the algorithm PUL based on domain-adaptive clustering. TJ-AIDL aligns pedestrian attribute information in the features of the source domain and the target domain to achieve model transfer. ARN separates domain information from pedestrian identity (ID) information and learns domain-invariant features.
As can be seen from the above, the pedestrian re-recognition methods in the related art have certain problems, and the improvement of the target domain recognition accuracy is limited, for the following reasons:
For the methods based on generative adversarial networks, the generative adversarial network regards the domain difference between data sets as a difference in image style. Therefore, to eliminate the domain difference between the source domain and the target domain, PTGAN migrates the style of target-domain images into the style of source-domain images and then uses the existing high-performance model on the source domain for recognition; SPGAN migrates the source-domain image style into the target-domain image style, trains a model with the migrated images, and finally recognizes the images of the target domain with the trained model. SPGAN and PTGAN solve the problem of domain differences to some extent, but in the process in which the generative adversarial network transforms image styles, the identity information of the images is lost, so it is difficult for the models to recognize the migrated images.
For the methods based on unsupervised transfer learning: on the one hand, TJ-AIDL considers that pedestrian attributes, such as gender, clothing color, and trouser color, are common to all pedestrian data sets and can help improve the accuracy of a pedestrian re-identification algorithm. Therefore, TJ-AIDL aligns the attributes of source-domain data and target-domain data, improving the performance of the model on the target domain. However, since the target domain has no labels, the attribute features extracted from the target domain are not correct, and forced alignment with the algorithm leads to a limited improvement of the target-domain recognition rate by TJ-AIDL. On the other hand, for ARN, two branches are connected behind the backbone network: one branch extracts features containing identity information, the other extracts features containing domain information, and the F-norms of the two features are maximized to separate domain information from identity information, thereby extracting features that are independent of domain information and effective for classification. However, it is very difficult to completely disentangle identity information and domain information, especially when the target domain has no labels. Besides, the branches extracting domain information and identity information behind the backbone network greatly increase the model parameters, and the addition of the F-norm makes training and optimization of the model extremely unstable. Although ARN greatly improves the recognition accuracy of the model on the target domain, it is complex to implement and difficult to train, so it is difficult to apply in practice. In another aspect, PUL uses a clustering method: it uses a model trained on the source domain to extract features of the target domain, clusters the extracted features, and trains the model with the pseudo labels obtained by clustering. Clustering and training constitute a self-supervised learning mode. PUL is easy to implement and robust to train, but it strongly depends on the starting point of learning: if the starting point is too low, the improvement from the self-supervised learning method is very limited, so there is a strong requirement on the generalization ability of the source-domain model. In addition, in self-supervised learning algorithms, the labels obtained by clustering often contain wrong samples, and how to select reliable samples from the clustering results has a great influence on the self-supervised learning algorithm.
The embodiment of the invention provides a pedestrian re-identification method, which is applied to a terminal, and is shown in fig. 1, and the method comprises the following steps:
Step 101, obtaining a first model associated with a first image constituting a first set of images.
In the embodiment of the present invention, the first image set may be a public data set. The first model is a model obtained by training the terminal based on the first image.
And 102, screening out target images, of which the similarity with the target clustering center meets a target threshold, from second images forming a second image set.
The target cluster center comprises a center point of each class of image in the second image. Here, the target image is used to adjust the parameters of the first model.
In an embodiment of the present invention, the second image set may be a public data set.
Illustratively, the disclosed data sets include Market-1501 and DukeMTMC-reID. For the task of Market-1501 migrating to DukeMTMC-reID, Market-1501 is used as the first image set, which may also be referred to herein as the labeled source-domain data set, and DukeMTMC-reID is used as the second image set, which may also be referred to herein as the unlabeled target-domain data set. Of course, for the task of DukeMTMC-reID migrating to Market-1501, DukeMTMC-reID is used as the first image set, namely the labeled source-domain data set, and Market-1501 is used as the second image set, namely the unlabeled target-domain data set.
And step 103, adjusting parameters of the first model based on the target image to obtain a second model associated with the second image.
Wherein the acquisition scene of the first image is different from the acquisition scene of the second image. Here, the second model is used to identify the image to be queried in the acquisition scene of the second image.
In the embodiment of the invention, the acquisition scene of the first image is different from the acquisition scene of the second image, so the identification of the pedestrian is cross-domain; it can be understood that different scenes are also called different domains.
Step 104, obtaining an image to be queried containing the target pedestrian, and identifying the image to be queried based on the second model to obtain a target image containing the target pedestrian in the second image.
The acquisition scene of the image to be queried is the same as the acquisition scene of the second image.
The pedestrian re-identification method provided by the embodiment of the invention obtains a first model associated with first images constituting a first image set; screens, from second images constituting a second image set, target images whose similarity with a target cluster center meets a target threshold, the target cluster center comprising a center point of each category of images in the second images; adjusts parameters of the first model based on the target images to obtain a second model associated with the second images, wherein the acquisition scene of the first images is different from the acquisition scene of the second images; and obtains an image to be queried containing a target pedestrian and identifies it based on the second model, to obtain a target image containing the target pedestrian in the second images, the acquisition scene of the image to be queried being the same as the acquisition scene of the second images. That is, after the first model is obtained, the second images in the second image set are screened to select target images whose similarity with the center point of each category of images meets the target threshold, and the parameters of the first model are adjusted based on these target images to obtain a second model for the second image set, instead of using the first model directly on the second image set. This solves the problem that the cross-domain pedestrian re-identification methods adopted in the related art cannot effectively improve the accuracy of recognizing the image to be recognized associated with the target domain and thus cannot achieve accurate recognition; accurate cross-domain recognition is achieved, and the accuracy of target-domain recognition is improved.
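For clarity, the overall flow of steps 101 to 104 can be summarized in the following Python-style sketch. This is only an illustrative outline, not the claimed implementation; all function names are hypothetical placeholders for the operations described above, and their bodies are left unimplemented here.

```python
# Illustrative outline of steps 101-104 only; the helper names are hypothetical
# placeholders, and the concrete operations are detailed further below.

def train_source_model(source_images, source_labels):
    """Step 101: obtain the first model by joint ID/attribute training on the labeled source domain."""
    raise NotImplementedError  # see the source-domain training sketch below

def select_reliable_samples(model, target_images):
    """Step 102: cluster target-domain features and keep images whose similarity to their
    cluster center meets the (dynamic) target threshold."""
    raise NotImplementedError  # see the clustering and threshold sketches below

def finetune_on_target(model, reliable_images, pseudo_labels):
    """Step 103: adjust the parameters of the first model on the screened target images
    to obtain the second model."""
    raise NotImplementedError

def retrieve(model, query_image, gallery_images):
    """Step 104: identify the image to be queried with the second model and return the
    target-domain images containing the target pedestrian."""
    raise NotImplementedError

def pedestrian_reid_pipeline(source_images, source_labels, target_images, query_image):
    first_model = train_source_model(source_images, source_labels)
    reliable_images, pseudo_labels = select_reliable_samples(first_model, target_images)
    second_model = finetune_on_target(first_model, reliable_images, pseudo_labels)
    return retrieve(second_model, query_image, target_images)
```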
According to the foregoing embodiment, the embodiment of the present invention provides a pedestrian re-recognition method, which is applied to a terminal, as shown in fig. 2, and includes:
Step 201, pedestrian attribute information and pedestrian identification information of a pedestrian included in the first image are obtained.
Step 202, training to obtain a first model based on pedestrian attribute information and pedestrian identification information.
In the embodiment of the invention, the source domain is trained jointly on two tasks: pedestrian attribute information and pedestrian identification information. The pedestrian attribute information comprises seven pedestrian attributes that are common to almost all pedestrian databases, namely age, hat, backpack, satchel, handbag, upper-body color, and lower-body color.
Illustratively, the source domain is trained as shown in Fig. 3, with Softmax as the loss function; age, hat, backpack, satchel, and handbag are binary classifiers, while upper-body color and lower-body color are multi-class classifiers.
The backbone network of the source-domain model comprises the convolution layers of the 50-layer residual network ResNet-50, a global average pooling (Global Average Pooling, GAP) layer, a batch normalization (Batch Normalization, BN) layer, a fully connected layer with a 1024-dimensional output, and a BN layer, connected in sequence.
In the training process of the source-domain model, the two tasks of pedestrian identification (ID) classification and pedestrian attribute classification are used for joint training. The seven attributes are: age, classified into child and adult; hat, i.e., whether a hat is worn; backpack, i.e., whether a backpack is carried; satchel, i.e., whether a satchel is carried; handbag, i.e., whether a handbag is carried; and upper-body color and lower-body color, which are multi-class tasks. During training, the input of the source-domain model is a pedestrian picture from the source-domain data set; the output is the prediction, by the classifiers of the source-domain model, of the ID and the seven pedestrian attributes of the input picture. The Softmax loss function is computed from the manually annotated ID and the labels of the seven pedestrian attributes of the source-domain pictures, together with the corresponding predictions of the source-domain model, and the source-domain model is obtained by back-propagating the gradient of this loss. The source-domain model consists of a backbone network and classifiers. For any picture, the 1024-dimensional output obtained by feeding the picture into the backbone network is the feature vector of the image. The classifiers include a pedestrian ID classifier and classifiers corresponding to the pedestrian attribute information. The test data are divided into a query set and a registration (gallery) set, and the backbone network of the source-domain model is used to extract the feature vectors of the query-set pictures and the registration-set pictures respectively; by comparing the Euclidean distances between the feature vectors of a query picture and the registration-set pictures, the retrieval result of that query picture in the registration set is obtained. The rank-1 accuracy (Rank-1) of all query-picture retrieval results and the mean average precision (mAP) of all retrieval results are computed as evaluation indices of the source-domain model. The source-domain model only needs to be tested on the source domain, and the model that performs best on the source domain is selected as the initialization model for the second-stage model, i.e., the model adapted to the second image set.
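A minimal PyTorch sketch of such a source-domain model is given below, assuming torchvision's ResNet-50 as the 50-layer residual backbone; the attribute class counts, weight initialization, and the exact head layout are illustrative assumptions rather than details specified in the text.

```python
import torch.nn as nn
from torchvision.models import resnet50

class SourceDomainModel(nn.Module):
    """Backbone (ResNet-50 conv layers + GAP + BN + FC-1024 + BN) with an ID head
    and seven attribute heads, roughly following the description above."""
    def __init__(self, num_ids, attr_num_classes):
        super().__init__()
        base = resnet50()  # weight loading omitted in this sketch
        self.conv = nn.Sequential(*list(base.children())[:-2])  # convolutional layers only
        self.gap = nn.AdaptiveAvgPool2d(1)                      # global average pooling
        self.reduce = nn.Sequential(                            # BN -> FC(1024) -> BN
            nn.BatchNorm1d(2048), nn.Linear(2048, 1024), nn.BatchNorm1d(1024))
        self.id_head = nn.Linear(1024, num_ids)                 # pedestrian ID classifier
        self.attr_heads = nn.ModuleList(                        # seven attribute classifiers
            [nn.Linear(1024, c) for c in attr_num_classes])

    def forward(self, x):
        feat = self.gap(self.conv(x)).flatten(1)
        feat = self.reduce(feat)                                 # 1024-d feature vector
        return feat, self.id_head(feat), [h(feat) for h in self.attr_heads]

def joint_loss(id_logits, attr_logits, id_label, attr_labels):
    """Joint Softmax (cross-entropy) loss over the ID task and the seven attribute tasks."""
    ce = nn.CrossEntropyLoss()
    loss = ce(id_logits, id_label)
    for logits, label in zip(attr_logits, attr_labels):
        loss = loss + ce(logits, label)
    return loss
```

For example, `attr_num_classes` could be `[2, 2, 2, 2, 2, 8, 9]` for the five binary attributes and the two color attributes; these counts are assumptions, not values given in the patent.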
Step 203, setting a first category number of pedestrians included in the second image.
In the embodiment of the present invention, step 203 sets a first class number of the second image, including the following steps:
step 203a, obtaining a second category number of pedestrians included in the second image.
The second category number is the true category number of the second image, and it can be understood that the second category number is the total number of all pedestrians included in the second image, and all pedestrians include all objects selected when the image acquisition unit associated with the second image set performs image acquisition.
In the embodiment of the invention, for the task of migrating the mark-1501 to DukeMTMC-reID, the second category number refers to the real category number of the sample image included in DukeMTMC-reID.
In the embodiment of the invention, for the task of DukeMTMC-reID to migrate to the mark-1501, the second category number refers to the real category number of the sample image included in the mark-1501.
Step 203b, adding the first threshold to the second class number to obtain a first class number.
In the embodiment of the invention, since the data of the target domain are not manually annotated, the model of the target domain is trained with pseudo labels obtained by a clustering algorithm during the experiments. For example, the K-means algorithm is adopted to cluster the target-domain data, and it is found that the pseudo labels obtained by the clustering algorithm improve the training of the target-domain model more noticeably when the given number of clusters in the K-means algorithm is slightly larger than the real number of categories in the data set.
In the embodiment of the invention, for the experimental data sets Market-1501 and DukeMTMC-reID, the number of true categories is about 700, so the number of classification categories of the classifier can be set to 900.
In the embodiment of the invention, a backbone network of a source domain model is shared to a target domain, when the target domain data training model is used, parameters of the backbone network of the source domain model are used as initialization of backbone network parameters of the target domain model, a new randomly initialized classifier is added for ID classification of pedestrians, and the classification number of the classifier is set to 900 in an exemplary manner.
In the embodiment of the invention, an upper bound U and a lower bound L of the dynamic threshold are set; the initial threshold λ1 is set to U, the initial value of the upper bound U is 0.8, and the initial value of the lower bound L is 0.7; the decay rate η of the threshold λ1 is initialized to 0.0015. Here, by setting the relevant parameters of the dynamic threshold and selecting more reliable samples from the second image set for training based on the dynamic threshold, a model applicable to the second image is obtained.
Step 204, determining a first cluster center of each class of images in the second image based on the first class number.
Wherein the target cluster center comprises a first cluster center.
In the embodiment of the present invention, as shown in Fig. 4, the training of the target domain uses the same backbone network as the source domain to extract features, clusters the features, and assigns each sample the category of its nearest cluster center.
Here, the output of ResNet-50 is 2048-dimensional; because this feature dimension is too high, the clustering result of the clustering algorithm is poor, so a structure of one BN layer, one fully connected layer with a 1024-dimensional output, and one BN layer is used for dimension reduction. To avoid the information loss of the feature vector caused by reducing the dimension too far, the reduced dimension is set to 1024, so the output of the backbone network is a 1024-dimensional feature.
Since the data of the target domain is not manually marked, we only classify the ID in the target domain and do not train by using the attribute of the pedestrian. The purpose of classifying by using pedestrian attributes in the source domain is to enhance the generalization capability of the source domain model, and provide a better basic model for training the target domain. The characteristics used by the target domain and the source domain are 1024-dimensional vectors output by the backbone network.
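As an illustration of this clustering step, the following sketch uses scikit-learn's K-means on the 1024-dimensional backbone features; the specific clustering library and its parameters are assumptions, and only the choice of a cluster count slightly larger than the true number of identities (e.g., 900) comes from the description above.

```python
from sklearn.cluster import KMeans

def cluster_target_features(features, num_clusters=900):
    """Cluster 1024-d target-domain features extracted by the shared backbone.

    features: array of shape (num_samples, 1024).
    num_clusters: set slightly larger than the true number of identities
    (about 700 for Market-1501 / DukeMTMC-reID), e.g. 900.
    """
    kmeans = KMeans(n_clusters=num_clusters, n_init=10, random_state=0)
    pseudo_labels = kmeans.fit_predict(features)   # pseudo label = index of nearest cluster center
    cluster_centers = kmeans.cluster_centers_      # first cluster centers, one per category
    return pseudo_labels, cluster_centers
```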
Step 205, determining, from the second images, an image whose similarity with each first cluster center meets the first similarity threshold as a third image.
Wherein the target image comprises a third image.
In the embodiment of the present invention, the similarity may include cosine similarity. The terminal determines, from the second images, an image whose cosine similarity with each first cluster center meets the first similarity threshold as a third image; for example, the terminal selects, as a training sample, a third image whose cosine similarity with its corresponding first cluster center is greater than λ1; otherwise, the image is discarded and not used for training.
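A minimal sketch of this similarity-based sample selection is given below; it assumes the features and cluster centers are stored as NumPy arrays, which is an implementation assumption.

```python
import numpy as np

def filter_by_center_similarity(features, pseudo_labels, cluster_centers, threshold):
    """Keep only samples whose cosine similarity to their own cluster center
    exceeds the current similarity threshold (lambda_1); discard the rest."""
    feat_norm = features / np.linalg.norm(features, axis=1, keepdims=True)
    center_norm = cluster_centers / np.linalg.norm(cluster_centers, axis=1, keepdims=True)
    cos_sim = np.sum(feat_norm * center_norm[pseudo_labels], axis=1)  # similarity to own center
    keep = cos_sim > threshold
    return keep  # boolean mask over the target-domain samples
```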
And step 206, adjusting parameters of the first model based on the third image to obtain a second model.
In the embodiment of the present invention, step 206 adjusts parameters of the first model based on the third image to obtain a second model, including the following steps:
Step 206a, adjusting parameters of the first model based on the third image to obtain a third model.
Step 206b, if the first similarity threshold is greater than the second threshold, generating a second similarity threshold.
Wherein the second similarity threshold is less than the first similarity threshold.
In the embodiment of the present invention, if the first similarity threshold is greater than the second threshold in step 206b, a second similarity threshold is generated, which includes the following steps:
first, if the first similarity threshold is greater than the second threshold, subtracting a preset lower similarity threshold from a preset upper similarity threshold to obtain a first parameter.
And then multiplying the first parameter by a preset attenuation rate to obtain a second parameter.
And finally, subtracting the second parameter from the first similarity threshold to obtain a second similarity threshold.
In the embodiment of the present invention, the terminal may generate a second similarity threshold λ2 smaller than the first similarity threshold λ1 by using the following formula 1,
λ2 = λ1 - η × (U - L)    (Formula 1)
In the embodiment of the invention, the similarity threshold is used to control the selection of training samples. If the similarity threshold is too large, the samples assigned to the same category among the selected samples are almost all pictures of adjacent frames of the same person under the same camera. The essence of the pedestrian re-identification problem is to find pictures of the same pedestrian under different cameras, i.e., it is a cross-camera retrieval problem; therefore, samples selected with an excessively large similarity threshold do not help improve the pedestrian re-identification task. If the similarity threshold is too small, there are a large number of wrong samples among the selected samples assigned to the same category. Therefore, at the beginning of training, when the model's discrimination ability in the target domain is weak, a larger similarity threshold is set to ensure the reliability of the selected training samples; as training proceeds and the model's discrimination ability on the target-domain data is enhanced, the similarity threshold is decayed so that some cross-camera samples can be selected, improving the cross-camera retrieval ability of the model. In the embodiment of the invention, if the threshold becomes smaller than the lower bound L, training is stopped.
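The decay of the dynamic threshold according to Formula 1 can be sketched as follows; the loop structure and function name are illustrative assumptions, while U = 0.8, L = 0.7 and η = 0.0015 are the initial values given above.

```python
def decayed_threshold(lam, upper=0.8, lower=0.7, eta=0.0015):
    """One decay step of the dynamic threshold (Formula 1): lambda_2 = lambda_1 - eta * (U - L).
    Training stops once the threshold falls below the lower bound L."""
    new_lam = lam - eta * (upper - lower)
    return new_lam, new_lam < lower

# Example: starting from lambda_1 = U = 0.8, the threshold decays by 0.00015 per round
# until it drops below L = 0.7, at which point training stops.
lam, stop = 0.8, False
while not stop:
    # ... cluster, select samples with similarity > lam, fine-tune the model ...
    lam, stop = decayed_threshold(lam)
```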
Step 206c, setting a third category number of pedestrians included in the third image.
In the embodiment of the invention, the setting method of the third category number is similar to the setting method of the first category number.
Step 206d, determining a second class center of the image of each class in the third image based on the third class number.
Wherein the target cluster center comprises a second cluster center.
Step 206e, determining, from the third image, an image whose similarity with each second cluster center meets the second similarity threshold as a fourth image.
Wherein the target image comprises a fourth image.
And step 206f, adjusting parameters of the third model based on the fourth image to obtain a second model.
In the embodiment of the present invention, step 206f adjusts parameters of the third model based on the fourth image to obtain the second model, and includes the following steps:
A first step of determining a first target set corresponding to a fifth image in the fourth image; wherein the first target set comprises a plurality of images in the fourth image, and the distance between the images and the fifth image is within the target distance range.
Wherein the fifth image is any one of the fourth images. Here, the fifth image may be denoted by x_a.
In an exemplary embodiment of the present invention, the first target set C1 includes 15 images, namely 15 samples {x_1, x_2, x_3, ..., x_15} found in the feature space whose distance to x_a is within the target distance range. These 15 samples can be considered the samples closest to x_a.
And deleting the image which is the same as the image acquisition unit to which the fifth image belongs in the first target set to obtain a second target set.
Here, the second target set C2 is a subset of the first target set.
And thirdly, determining images except for the images in the first target set in the fourth image to form a third target set.
Here, the fourth image includes an image in the third target set C3 and an image in the first target set C1.
And step four, adjusting parameters of the third model based on the second target set and the third target set to obtain a second model.
In the embodiment of the present invention, the fourth step adjusts parameters of the third model based on the second target set and the third target set to obtain the second model, and may further include the following steps:
First, a first feature image of the second target set that is farthest from the fifth image is obtained.
And secondly, screening out a second characteristic image from the third target set.
And finally, if the distance between the first characteristic image and the fifth image is larger than the distance between the second characteristic image and the fifth image, adjusting parameters of the third model based on the first characteristic image, the second characteristic image and the fifth image to obtain a second model.
In the second target set C2, the sample farthest from x_a is selected as the positive sample x_p, and a sample not belonging to the set C1 is randomly selected in the target domain as the negative sample x_n, where the negative sample also comes from a different camera than x_a. Finally, the model is further trained with the triplet loss function of Formula 2, and the model obtained after this training on the target domain is the second model.
L_triplet = max(||x_a - x_p||_2 - ||x_a - x_n||_2 + margin, 0)    (Formula 2)
In the embodiment of the invention, the purpose of the triplet loss function is to push the negative sample away from x_a and to pull the positive sample closer to x_a. max denotes taking the maximum, and margin denotes a threshold on the distance difference. If the negative-sample distance ||x_a - x_n||_2 exceeds the positive-sample distance ||x_a - x_p||_2 by more than margin, i.e., x_n is sufficiently farther from x_a than x_p is, then ||x_a - x_p||_2 - ||x_a - x_n||_2 + margin is less than 0 and L_triplet equals 0; this pair of positive and negative samples is considered safe enough for the model to distinguish, so it is not selected for optimization. Conversely, if the negative-sample distance exceeds the positive-sample distance by less than margin, then ||x_a - x_p||_2 - ||x_a - x_n||_2 + margin is greater than 0 and L_triplet equals that value; this pair of positive and negative samples is considered hard for the model to distinguish, so such positive and negative samples are needed to train the model and enhance its discrimination ability.
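A sketch of the cross-camera sample mining and the triplet loss of Formula 2 is given below; the margin value and the NumPy-based implementation are assumptions, while the selection rules (C1 = 15 nearest samples, C2 = C1 without same-camera samples, positive = farthest sample in C2, negative = a random cross-camera sample outside C1) follow the description above.

```python
import numpy as np

def mine_triplet(features, cam_ids, anchor_idx, k=15, rng=None):
    """Spatial-relationship mining for one anchor x_a."""
    rng = np.random.default_rng() if rng is None else rng
    dists = np.linalg.norm(features - features[anchor_idx], axis=1)
    order = np.argsort(dists)
    c1 = [int(i) for i in order[1:k + 1]]                              # first target set C1
    c2 = [i for i in c1 if cam_ids[i] != cam_ids[anchor_idx]]          # second target set C2
    if not c2:
        return None                                                    # no cross-camera neighbor
    pos = max(c2, key=lambda i: dists[i])                              # farthest sample in C2
    candidates = [i for i in range(len(features))
                  if i not in c1 and i != anchor_idx
                  and cam_ids[i] != cam_ids[anchor_idx]]
    neg = int(rng.choice(candidates))                                  # random sample outside C1
    return anchor_idx, pos, neg

def triplet_loss(f_a, f_p, f_n, margin=0.3):
    """Formula 2: L_triplet = max(||x_a - x_p||_2 - ||x_a - x_n||_2 + margin, 0).
    The margin value 0.3 is an assumption; the patent does not specify it."""
    return max(np.linalg.norm(f_a - f_p) - np.linalg.norm(f_a - f_n) + margin, 0.0)
```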
Step 207, obtaining an image to be queried containing the target pedestrian, and identifying the image to be queried based on the second model to obtain a target image containing the target pedestrian in the second image.
The acquisition scene of the image to be queried is the same as the acquisition scene of the second image.
Based on the pedestrian re-recognition method provided by the invention, the target pedestrian is recognized, and the following test conclusion is obtained:
(1) Improvement of the source-domain model brought by attributes
TABLE 1 training on Source Domain, results tested directly on target Domain
Table 1 shows the improvement of the source domain brought by pedestrian attribute information. As can be seen from the data in Table 1, when joint training with pedestrian attribute information is added on top of training with pedestrian identification information on the source domain, the values of the evaluation indices with attributes are larger than those without attributes; that is, joint training with pedestrian attribute information and pedestrian identification information can improve the generalization ability of the source-domain model and provide a high training starting point for self-supervised learning on the target domain.
(2) Improvement brought by dynamic threshold sampling
Table 2 comparison of dynamic threshold samples and fixed threshold samples
Table 2 shows the comparison between dynamic threshold sampling and fixed threshold sampling. From the data in Table 2, it is clear that when self-supervised training is performed on the target domain, the improvement brought by dynamic threshold sampling is higher than that brought by fixed threshold sampling.
(3) Improvement brought by spatial relationship mining
Table 3 results of mining using spatial relationships
Table 3 shows the results of spatial relationship mining. Based on the data in Table 3, the model trained on the target domain with the second-stage self-supervised learning algorithm is used to mine reliable cross-camera sample pairs, and the triplet loss function is used for training to achieve spatial relationship mining; after this, the performance of the model on the target domain far exceeds the best published results.
Referring to Fig. 5, in the pedestrian re-identification method provided in the embodiment of the present invention, in the first aspect, during source-domain training, seven kinds of pedestrian attribute information, namely age, hat, backpack, satchel, handbag, upper-body color, and lower-body color, are used together with pedestrian identification information for joint training, so as to improve the generalization of the source-domain model. Here, the general supervised algorithm trains the model only with pedestrian identities, but the pedestrian identities in different data sets basically do not overlap, so a model trained on source-domain data generalizes poorly to the target domain. Although pedestrian identities in different data sets do not overlap, pedestrian attributes such as age, hat, backpack, satchel, handbag, upper-body color, and lower-body color are almost the same across data sets; therefore, jointly training the source-domain model with pedestrian attribute information and pedestrian identification information on the source domain can improve its generalization ability on the target domain, providing a better basic model for target-domain training and creating a high starting point for self-supervised training.
In a second aspect, when the target domain is trained using a self-supervised learning algorithm, the training samples are sampled with dynamic thresholds, the upper and lower bounds of the dynamic thresholds, and the decay rate of the dynamic thresholds are given. In the training process, as the discrimination capability of the model in the target domain is enhanced, the dynamic threshold gradually decays from the upper bound to the lower bound, and more reliable sample training models can be selected.
In the third aspect, on the basis of the self-supervised learning algorithm, for any sample x_a in the target domain, the 15 samples {x_1, x_2, x_3, ..., x_15} closest to x_a in the feature space are found to form the set C1, and the samples in C1 that are under the same camera as x_a are removed to obtain C2. In the set C2, the sample farthest from x_a is selected as the positive sample x_p, and a sample not belonging to the set C1 is randomly selected in the target domain as the negative sample x_n. Finally, the model is further trained using the triplet loss function (triplet loss).
That is, the three strategies of joint training with pedestrian attributes on the source domain, dynamic threshold sampling, and spatial relationship mining improve the recognition performance of the source-domain model in the target domain. In the task of mutual migration between the public databases Market-1501 and DukeMTMC-reID, the performance of the model far exceeds the best unsupervised cross-domain pedestrian re-identification algorithms published so far.
It should be noted that, in this embodiment, the descriptions of the same steps and the same content as those in other embodiments may refer to the descriptions in other embodiments, and are not repeated here.
Based on the foregoing embodiments, an embodiment of the present invention provides a terminal, which may be applied to a pedestrian re-recognition method provided in the embodiments corresponding to fig. 1 and 2, and referring to fig. 6, the terminal 3 includes: a processor 31, a memory 32, and a communication bus 33, wherein:
The communication bus 33 is used to enable a communication connection between the processor 31 and the memory 32.
The processor 31 is configured to execute the pedestrian re-recognition program stored in the memory 32 to implement the steps of:
Obtaining a first model associated with a first image comprising a first set of images;
screening, from second images forming a second image set, target images whose similarity with a target cluster center meets a target threshold; the target cluster center comprises a center point of each category of images in the second images;
Adjusting parameters of the first model based on the target image to obtain a second model associated with the second image; wherein the acquisition scene of the first image is different from the acquisition scene of the second image;
Obtaining an image to be queried containing a target pedestrian, and identifying the image to be queried based on a second model to obtain a target image containing the target pedestrian in a second image; the acquisition scene of the image to be queried is the same as the acquisition scene of the second image.
In other embodiments of the present invention, the processor 31 is configured to perform the steps of obtaining the first model associated with the first image forming the first image set in the memory 32 by:
Acquiring pedestrian attribute information and pedestrian identification information of pedestrians included in a first image;
Based on the pedestrian attribute information and the pedestrian identification information, training to obtain a first model.
In other embodiments of the present invention, when the processor 31 is configured to perform the screening of the target images in the memory 32, from the second images forming the second image set, the target images having the similarity with the target cluster center meeting the target threshold value, the screening may be performed by:
Setting a first category number of pedestrians included in the second image;
Determining a first cluster center of each category of images in the second image based on the first category number; the target clustering center comprises a first clustering center;
Determining an image, the similarity of which between the image and each first clustering center accords with a first similarity threshold value, from the second image as a third image; wherein the target image comprises a third image;
Correspondingly, adjusting parameters of the first model based on the target image to obtain a second model associated with the second image, including:
and adjusting parameters of the first model based on the third image to obtain a second model.
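A minimal sketch of this screening step is given below, assuming that features for the second (target-domain) images have already been extracted with the first model and L2-normalized. K-means from scikit-learn stands in for the clustering; the margin added to the estimated number of identities (the "first threshold" of the following step) and the similarity threshold are placeholder values.

```python
import numpy as np
from sklearn.cluster import KMeans

def screen_target_images(features, est_num_ids, margin=50, sim_threshold=0.7, seed=0):
    """Cluster target-domain features and keep samples close to their own cluster center.

    features: (N, D) L2-normalized features of the second images under the first model.
    Returns the indices of the screened (third) images and their pseudo-labels.
    """
    # First category number = second category number (estimate) + a margin (the first threshold).
    n_clusters = est_num_ids + margin
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(features)
    centers = km.cluster_centers_
    centers = centers / np.linalg.norm(centers, axis=1, keepdims=True)

    # Cosine similarity between each sample and its first cluster center.
    sims = np.sum(features * centers[km.labels_], axis=1)
    keep = np.where(sims >= sim_threshold)[0]
    return keep, km.labels_[keep]
```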
In other embodiments of the present invention, when the processor 31 executes the program stored in the memory 32 to set the first category number of pedestrians included in the second image, the following steps may be implemented:
Acquiring a second category number of pedestrians included in the second image;
And adding a first threshold to the second category number to obtain the first category number.
In other embodiments of the present invention, when the processor 31 executes the program stored in the memory 32 to adjust the parameters of the first model based on the third image to obtain the second model, the adjustment may be implemented through the following steps:
adjusting parameters of the first model based on the third image to obtain a third model;
if the first similarity threshold is greater than the second threshold, generating a second similarity threshold; wherein the second similarity threshold is less than the first similarity threshold;
setting a third category number of pedestrians included in the third image;
Determining a second cluster center of each category of images in the third image based on the third category number; wherein the target cluster center comprises the second cluster center;
Determining, from the third image, an image whose similarity with each second cluster center meets the second similarity threshold as a fourth image; wherein the target image comprises the fourth image;
and adjusting parameters of the third model based on the fourth image to obtain a second model.
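Taken together, the steps above describe an iterative loop: fine-tune on the currently screened images, lower the similarity threshold, re-cluster and re-screen, then fine-tune again until the threshold no longer exceeds the second (stopping) threshold. The sketch below is one possible reading of that loop; `extract_features` and `finetune` are hypothetical callables standing in for feature extraction and model-parameter adjustment, the threshold constants are placeholders, and the threshold update anticipates the decay rule detailed in the next paragraphs. It reuses `screen_target_images` from the sketch above.

```python
def progressive_adaptation(model, target_images, est_num_ids,
                           extract_features, finetune,
                           t_upper=0.9, t_lower=0.5, decay_rate=0.05, t_stop=0.55):
    """Iteratively screen and fine-tune with a dynamically decreasing similarity threshold.

    extract_features(model, images) -> (N, D) L2-normalized features (assumed helper).
    finetune(model, images, pseudo_labels) -> updated model (assumed helper).
    """
    threshold = t_upper
    while True:
        feats = extract_features(model, target_images)
        keep, pseudo = screen_target_images(feats, est_num_ids, sim_threshold=threshold)
        model = finetune(model, [target_images[i] for i in keep], pseudo)
        if threshold <= t_stop:                            # second threshold reached: stop
            break
        threshold -= (t_upper - t_lower) * decay_rate      # generate the next, smaller threshold
    return model
```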
In other embodiments of the present invention, when the processor 31 executes the program stored in the memory 32 to generate the second similarity threshold if the first similarity threshold is greater than the second threshold, the following steps may be implemented:
If the first similarity threshold is larger than the second threshold, subtracting a preset lower similarity threshold from a preset upper similarity threshold to obtain a first parameter;
Multiplying the first parameter by a preset attenuation rate to obtain a second parameter;
subtracting the second parameter from the first similarity threshold to obtain a second similarity threshold.
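In other words, the updated (second) similarity threshold can be computed as threshold_new = threshold_old - (upper - lower) * attenuation_rate. A small sketch with placeholder constants:

```python
def next_similarity_threshold(current, t_upper=0.9, t_lower=0.5, attenuation_rate=0.05):
    """Dynamic threshold update as described above (placeholder constants)."""
    first_parameter = t_upper - t_lower                     # preset upper minus preset lower threshold
    second_parameter = first_parameter * attenuation_rate   # multiplied by the preset attenuation rate
    return current - second_parameter                       # first threshold minus the second parameter

# With these assumed defaults, next_similarity_threshold(0.9) returns approximately 0.88.
```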
In other embodiments of the present invention, when the processor 31 executes the program stored in the memory 32 to adjust the parameters of the third model based on the fourth image to obtain the second model, the adjustment may be implemented through the following steps:
Determining a first target set corresponding to a fifth image in the fourth image; wherein the first target set comprises a plurality of images in the fourth image whose distances from the fifth image are within a target distance range, and the fifth image is any image in the fourth image;
Deleting, from the first target set, images captured by the same image acquisition unit as the fifth image, to obtain a second target set;
Determining images in the fourth image other than the images in the first target set, to form a third target set;
and adjusting parameters of the third model based on the second target set and the third target set to obtain a second model.
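The following sketch shows one way the target sets could be built around a given anchor (the fifth image), assuming a precomputed pairwise distance matrix over the fourth images and a per-image camera (image acquisition unit) identifier; the neighborhood size k is a placeholder, and excluding the anchor itself from the third target set is an assumption made for the triplet construction that follows.

```python
import numpy as np

def build_target_sets(dist, cam_ids, anchor, k=10):
    """Build the second and third target sets around one anchor (the fifth image).

    dist:    (N, N) pairwise distance matrix over the fourth images.
    cam_ids: (N,) camera / image-acquisition-unit id of each image.
    anchor:  index of the fifth image.
    """
    order = np.argsort(dist[anchor])
    first_set = order[1:k + 1]                               # k nearest neighbors: first target set
    # Second target set: neighbors not captured by the same acquisition unit as the anchor.
    second_set = first_set[cam_ids[first_set] != cam_ids[anchor]]
    # Third target set: images outside the first target set (anchor excluded as well).
    third_set = np.setdiff1d(np.arange(dist.shape[0]),
                             np.concatenate(([anchor], first_set)))
    return second_set, third_set
```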
In other embodiments of the present invention, when the processor 31 executes the program stored in the memory 32 to adjust the parameters of the third model based on the second target set and the third target set to obtain the second model, the adjustment may be implemented through the following steps:
Obtaining, from the second target set, a first characteristic image that is farthest from the fifth image;
Screening a second characteristic image from the third target set;
And if the distance between the first characteristic image and the fifth image is larger than the distance between the second characteristic image and the fifth image, adjusting parameters of the third model based on the first characteristic image, the second characteristic image and the fifth image to obtain a second model.
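This condition can be read as selecting a hard triplet for the anchor: the hardest cross-camera positive candidate (the farthest image in the second target set) and a negative candidate screened from the third target set, with the third model updated only when that positive is currently farther than the negative. The sketch below follows that reading; taking the nearest element of the third target set as the negative and the margin value are assumptions, not specified by the text above.

```python
import numpy as np

def hard_triplet(dist, anchor, second_set, third_set, margin=0.3):
    """Pick (positive, negative) for one anchor and return a triplet loss, or None.

    dist: (N, N) feature-space distance matrix; second_set / third_set as built above.
    """
    if len(second_set) == 0 or len(third_set) == 0:
        return None
    # First characteristic image: farthest from the anchor within the second target set.
    pos = second_set[np.argmax(dist[anchor, second_set])]
    # Second characteristic image: screened from the third target set (nearest, by assumption).
    neg = third_set[np.argmin(dist[anchor, third_set])]
    # Adjust the third model only when the selected positive is farther than the negative.
    if dist[anchor, pos] > dist[anchor, neg]:
        loss = max(0.0, dist[anchor, pos] - dist[anchor, neg] + margin)
        return pos, neg, loss
    return None
```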
It should be noted that, for the specific implementation of the steps executed by the processor in this embodiment, reference may be made to the implementation of the pedestrian re-identification method provided in the embodiments corresponding to fig. 1-2, and details are not repeated here.
Based on the foregoing embodiments, embodiments of the present invention provide a storage medium storing one or more programs executable by one or more processors to implement the steps of:
Obtaining a first model associated with a first image forming a first image set;
screening, from second images forming a second image set, target images whose similarity with a target cluster center meets a target threshold; wherein the target cluster center comprises a center point of each category of images in the second image;
Adjusting parameters of the first model based on the target image to obtain a second model associated with the second image; wherein the acquisition scene of the first image is different from the acquisition scene of the second image;
Obtaining an image to be queried containing a target pedestrian, and identifying the image to be queried based on the second model to obtain a target image containing the target pedestrian in the second image; wherein the acquisition scene of the image to be queried is the same as the acquisition scene of the second image.
In other embodiments of the invention, the one or more programs may be executed by one or more processors to implement the steps of:
Acquiring pedestrian attribute information and pedestrian identification information of pedestrians included in the first image;
Based on the pedestrian attribute information and the pedestrian identification information, training to obtain the first model.
In other embodiments of the invention, the one or more programs may be executed by one or more processors to implement the steps of:
Setting a first category number of pedestrians included in the second image;
Determining a first cluster center of each category of images in the second image based on the first category number; wherein the target cluster center comprises the first cluster center;
Determining, from the second image, an image whose similarity with each first cluster center meets a first similarity threshold as a third image; wherein the target image comprises the third image;
Correspondingly, adjusting parameters of the first model based on the target image to obtain a second model associated with the second image, including:
and adjusting parameters of the first model based on the third image to obtain a second model.
In other embodiments of the invention, the one or more programs may be executed by one or more processors to implement the steps of:
Acquiring a second category number of pedestrians included in the second image;
And adding a first threshold to the second category number to obtain the first category number.
In other embodiments of the invention, the one or more programs may be executed by one or more processors to implement the steps of:
adjusting parameters of the first model based on the third image to obtain a third model;
if the first similarity threshold is greater than the second threshold, generating a second similarity threshold; wherein the second similarity threshold is less than the first similarity threshold;
setting a third category number of pedestrians included in the third image;
Determining a second cluster center of each category of images in the third image based on the third category number; wherein the target cluster center comprises the second cluster center;
Determining, from the third image, an image whose similarity with each second cluster center meets the second similarity threshold as a fourth image; wherein the target image comprises the fourth image;
and adjusting parameters of the third model based on the fourth image to obtain a second model.
In other embodiments of the invention, the one or more programs may be executed by one or more processors to implement the steps of:
If the first similarity threshold is larger than the second threshold, subtracting a preset lower similarity threshold from a preset upper similarity threshold to obtain a first parameter;
Multiplying the first parameter by a preset attenuation rate to obtain a second parameter;
subtracting the second parameter from the first similarity threshold to obtain a second similarity threshold.
In other embodiments of the invention, the one or more programs may be executed by one or more processors to implement the steps of:
Determining a first target set corresponding to a fifth image in the fourth image; wherein the first target set comprises a plurality of images in the fourth image whose distances from the fifth image are within a target distance range, and the fifth image is any image in the fourth image;
Deleting, from the first target set, images captured by the same image acquisition unit as the fifth image, to obtain a second target set;
Determining images in the fourth image other than the images in the first target set, to form a third target set;
and adjusting parameters of the third model based on the second target set and the third target set to obtain a second model.
In other embodiments of the invention, the one or more programs may be executed by one or more processors to implement the steps of:
Obtaining, from the second target set, a first characteristic image that is farthest from the fifth image;
Screening a second characteristic image from the third target set;
And if the distance between the first characteristic image and the fifth image is larger than the distance between the second characteristic image and the fifth image, adjusting parameters of the third model based on the first characteristic image, the second characteristic image and the fifth image to obtain a second model.
It should be noted that, for the specific implementation of the steps performed when the one or more programs are executed in this embodiment, reference may be made to the implementation of the pedestrian re-identification method provided in the embodiments corresponding to fig. 1-2, and details are not repeated here.
The above is merely an example of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and scope of the present invention are included in the protection scope of the present invention.

Claims (10)

1. A method of pedestrian re-identification, the method comprising:
Obtaining a first model associated with a first image forming a first image set;
Screening, from second images forming a second image set, target images whose similarity with a target cluster center meets a target threshold; wherein the target cluster center comprises a center point of each category of images in the second image;
Adjusting parameters of the first model based on the target image to obtain a second model associated with the second image; wherein the acquisition scene of the first image is different from the acquisition scene of the second image;
Obtaining an image to be queried containing a target pedestrian, and identifying the image to be queried based on the second model to obtain a target image containing the target pedestrian in the second image; wherein, the acquisition scene of the image to be queried is the same as the acquisition scene of the second image;
Wherein the adjusting the parameters of the first model based on the target image to obtain a second model associated with the second image includes:
Adjusting parameters of the first model based on a third image to obtain the second model; wherein the third image is an image, in the second image, whose similarity with each first cluster center meets a first similarity threshold;
the adjusting the parameters of the first model based on the third image to obtain the second model includes:
Adjusting parameters of the first model based on the third image to obtain a third model;
Adjusting parameters of the third model based on a fourth image to obtain the second model; wherein the fourth image is an image, in the third image, whose similarity with each second cluster center meets a second similarity threshold; and the second similarity threshold is less than the first similarity threshold.
2. The method of claim 1, wherein the obtaining a first model associated with a first image forming a first image set comprises:
Acquiring pedestrian attribute information and pedestrian identification information of pedestrians included in the first image;
and training to obtain the first model based on the pedestrian attribute information and the pedestrian identification information.
3. The method according to claim 1, wherein the screening out the target images, which have a similarity with the target cluster center that meets the target threshold, from the second images that constitute the second image set, includes:
Setting a first category number of pedestrians included in the second image;
Determining a first cluster center of each category of images in the second image based on the first category number; wherein the target cluster center comprises the first cluster center;
Determining, from the second image, an image whose similarity with each first cluster center meets the first similarity threshold as a third image; wherein the target image includes the third image;
correspondingly, the adjusting the parameters of the first model based on the target image to obtain a second model associated with the second image includes:
and adjusting parameters of the first model based on the third image to obtain the second model.
4. A method according to claim 3, wherein the setting a first category number of pedestrians included in the second image comprises:
Acquiring a second category number of pedestrians included in the second image;
And adding a first threshold to the second category number to obtain the first category number.
5. The method according to claim 3 or 4, wherein said adjusting parameters of the first model based on the third image to obtain the second model comprises:
Adjusting parameters of the first model based on the third image to obtain a third model;
If the first similarity threshold is larger than the second threshold, generating a second similarity threshold;
setting a third category number of pedestrians included in the third image;
determining a second cluster center of each category of images in the third image based on the third category number; wherein the target cluster center comprises the second cluster center;
Determining, from the third image, an image whose similarity with each second cluster center meets the second similarity threshold as a fourth image; wherein the target image includes the fourth image;
and adjusting parameters of the third model based on the fourth image to obtain the second model.
6. The method of claim 5, wherein generating a second similarity threshold if the first similarity threshold is greater than a second threshold comprises:
if the first similarity threshold is larger than the second threshold, subtracting a preset lower similarity threshold from a preset upper similarity threshold to obtain a first parameter;
Multiplying the first parameter by a preset attenuation rate to obtain a second parameter;
Subtracting the second parameter from the first similarity threshold to obtain the second similarity threshold.
7. The method of claim 5, wherein adjusting parameters of the third model based on the fourth image to obtain the second model comprises:
Determining a first target set corresponding to a fifth image in the fourth image; wherein the first target set comprises a plurality of images of the fourth image, wherein the distance between the images and the fifth image is within a target distance range; wherein the fifth image is any one of the fourth images;
Deleting, from the first target set, images captured by the same image acquisition unit as the fifth image, to obtain a second target set;
Determining images in the fourth image other than the images in the first target set, to form a third target set;
and adjusting parameters of the third model based on the second target set and the third target set to obtain the second model.
8. The method of claim 7, wherein adjusting parameters of the third model based on the second set of targets and the third set of targets to obtain the second model comprises:
obtaining, from the second target set, a first characteristic image that is farthest from the fifth image;
Screening a second characteristic image from the third target set;
And if the distance between the first characteristic image and the fifth image is larger than the distance between the second characteristic image and the fifth image, adjusting the parameters of the third model based on the first characteristic image, the second characteristic image and the fifth image to obtain the second model.
9. A terminal, the terminal comprising:
A memory for storing executable instructions;
A processor for executing executable instructions stored in the memory to implement the pedestrian re-identification method of any one of claims 1 to 8.
10. A storage medium storing executable instructions which, when executed, are adapted to cause a processor to perform the pedestrian re-identification method of any one of claims 1 to 8.
CN201910391548.6A 2019-05-10 2019-05-10 Pedestrian re-identification method, terminal and storage medium Active CN111738039B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910391548.6A CN111738039B (en) 2019-05-10 2019-05-10 Pedestrian re-identification method, terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910391548.6A CN111738039B (en) 2019-05-10 2019-05-10 Pedestrian re-identification method, terminal and storage medium

Publications (2)

Publication Number Publication Date
CN111738039A CN111738039A (en) 2020-10-02
CN111738039B true CN111738039B (en) 2024-05-24

Family

ID=72646291

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910391548.6A Active CN111738039B (en) 2019-05-10 2019-05-10 Pedestrian re-identification method, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN111738039B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112381056B (en) * 2020-12-02 2022-04-01 山西大学 Cross-domain pedestrian re-identification method and system fusing multiple source domains
CN116127298B (en) * 2023-02-22 2024-03-19 北京邮电大学 Small sample radio frequency fingerprint identification method based on triplet loss

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095230A (en) * 2014-04-29 2015-11-25 国际商业机器公司 Method and device for determining performance prediction model of data analysis application
CN107273836A (en) * 2017-06-07 2017-10-20 深圳市深网视界科技有限公司 A kind of pedestrian detection recognition methods, device, model and medium
CN107832672A (en) * 2017-10-12 2018-03-23 北京航空航天大学 A kind of pedestrian's recognition methods again that more loss functions are designed using attitude information
CN108334849A (en) * 2018-01-31 2018-07-27 中山大学 A kind of recognition methods again of the pedestrian based on Riemann manifold
CN108764065A (en) * 2018-05-04 2018-11-06 华中科技大学 A kind of method of pedestrian's weight identification feature fusion assisted learning
CN109214430A (en) * 2018-08-15 2019-01-15 天津大学 A kind of recognition methods again of the pedestrian based on feature space topology distribution
CN109446898A (en) * 2018-09-20 2019-03-08 暨南大学 A kind of recognition methods again of the pedestrian based on transfer learning and Fusion Features
CN109657533A (en) * 2018-10-27 2019-04-19 深圳市华尊科技股份有限公司 Pedestrian recognition methods and Related product again

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9396412B2 (en) * 2012-06-21 2016-07-19 Siemens Aktiengesellschaft Machine-learnt person re-identification
US9607245B2 (en) * 2014-12-02 2017-03-28 Xerox Corporation Adapted vocabularies for matching image signatures with fisher vectors
US10977565B2 (en) * 2017-04-28 2021-04-13 At&T Intellectual Property I, L.P. Bridging heterogeneous domains with parallel transport and sparse coding for machine learning models

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095230A (en) * 2014-04-29 2015-11-25 国际商业机器公司 Method and device for determining performance prediction model of data analysis application
CN107273836A (en) * 2017-06-07 2017-10-20 深圳市深网视界科技有限公司 A kind of pedestrian detection recognition methods, device, model and medium
CN107832672A (en) * 2017-10-12 2018-03-23 北京航空航天大学 A kind of pedestrian's recognition methods again that more loss functions are designed using attitude information
CN108334849A (en) * 2018-01-31 2018-07-27 中山大学 A kind of recognition methods again of the pedestrian based on Riemann manifold
CN108764065A (en) * 2018-05-04 2018-11-06 华中科技大学 A kind of method of pedestrian's weight identification feature fusion assisted learning
CN109214430A (en) * 2018-08-15 2019-01-15 天津大学 A kind of recognition methods again of the pedestrian based on feature space topology distribution
CN109446898A (en) * 2018-09-20 2019-03-08 暨南大学 A kind of recognition methods again of the pedestrian based on transfer learning and Fusion Features
CN109657533A (en) * 2018-10-27 2019-04-19 深圳市华尊科技股份有限公司 Pedestrian recognition methods and Related product again

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Hehe Fan et al. Unsupervised Person Re-identification: Clustering and Fine-tuning. arXiv, 2017, Section 3. *
Yu Wu et al. Progressive Learning for Person Re-Identification With One Example. IEEE Transactions on Image Processing, 2019, Vol. 28, pp. 2872-2881. *
Xu Simin et al. Pedestrian re-identification method with multi-attribute fusion network. 2019, Vol. 56, pp. 126-132. *
Bi Xiaojun et al. Pedestrian re-identification based on view information embedding. Acta Optica Sinica, 2019, Vol. 39, pp. 0615007-1 to 0615007-10. *

Also Published As

Publication number Publication date
CN111738039A (en) 2020-10-02

Similar Documents

Publication Publication Date Title
CN112101150B (en) Multi-feature fusion pedestrian re-identification method based on orientation constraint
CN108460356B (en) Face image automatic processing system based on monitoring system
CN105574505B (en) The method and system that human body target identifies again between a kind of multiple-camera
CN108520226B (en) Pedestrian re-identification method based on body decomposition and significance detection
US9224071B2 (en) Unsupervised object class discovery via bottom up multiple class learning
Rao et al. Multi-pose facial expression recognition based on SURF boosting
CN103020992B (en) A kind of video image conspicuousness detection method based on motion color-associations
CN103870816B (en) The method of the plants identification that a kind of discrimination is high
CN103714181B (en) A kind of hierarchical particular persons search method
CN108961675A (en) Fall detection method based on convolutional neural networks
CN106960182B (en) A kind of pedestrian's recognition methods again integrated based on multiple features
CN105718940B (en) The zero sample image classification method based on factorial analysis between multiple groups
CN112668544B (en) Pedestrian re-identification method based on hard sample confusion and feature robustness enhancement
CN111178403B (en) Method, device, electronic equipment and storage medium for training attribute identification model
CN108960142B (en) Pedestrian re-identification method based on global feature loss function
CN103996046A (en) Personnel recognition method based on multi-visual-feature fusion
CN110046544A (en) Digital gesture identification method based on convolutional neural networks
CN102194108A (en) Smiley face expression recognition method based on clustering linear discriminant analysis of feature selection
CN111738039B (en) Pedestrian re-identification method, terminal and storage medium
Du et al. Improving RGBD saliency detection using progressive region classification and saliency fusion
CN110728302A (en) Method for identifying color textile fabric tissue based on HSV (hue, saturation, value) and Lab (Lab) color spaces
CN111723600B (en) Pedestrian re-recognition feature descriptor based on multi-task learning
CN109977867A (en) A kind of infrared biopsy method based on machine learning multiple features fusion
CN104063721A (en) Human behavior recognition method based on automatic semantic feature study and screening
JP6448212B2 (en) Recognition device and recognition method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant