WO2022062419A1 - Target re-identification method and system based on non-supervised pyramid similarity learning - Google Patents


Info

Publication number
WO2022062419A1
WO2022062419A1 PCT/CN2021/092935 CN2021092935W
Authority
WO
WIPO (PCT)
Prior art keywords
target
pyramid
scene domain
unsupervised
samples
Prior art date
Application number
PCT/CN2021/092935
Other languages
French (fr)
Chinese (zh)
Inventor
董文会
曲培树
刘汉平
唐延柯
陈慧杰
高迎
张俊叶
Original Assignee
德州学院
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 德州学院 filed Critical 德州学院
Publication of WO2022062419A1 publication Critical patent/WO2022062419A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/10Machine learning using kernel methods, e.g. support vector machines [SVM]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects

Definitions

  • the invention belongs to the field of target re-identification, and in particular relates to a target re-identification method and system based on unsupervised pyramid similarity learning.
  • target re-identification compares and matches the pedestrian target image to be searched with pedestrian images obtained under different cameras, in order to determine whether the target pedestrian appears in the surveillance scenes of different cameras.
  • This technology plays an important role in intelligent surveillance and public safety. This problem has always been challenging in complex surveillance environments (such as illumination changes, objects occluded by other things, different surveillance perspectives, etc.).
  • the deep re-identification method based on unsupervised cross-domain learning uses the labeled source scene domain data to train a deep learning framework and obtain an original model, then trains the original model on unlabeled data in the target scene domain so that the model adapts to the target-domain data and yields an accurate target model. Due to the difference between the source scene domain and the target scene domain, how to obtain a good adaptive model is the key problem to be solved by such methods.
  • the current methods to solve this problem include: learning a target model with invariant features and adaptively updating it by aligning attributes and labels, and generating images in the target domain that are consistent with the style of the labeled images in the source scene as training samples through an adversarial network.
  • the inventor found that the target model constructed by the current target re-identification method is inaccurate, and the target model is not suitable for the characteristics of unlabeled samples.
  • the present invention provides a target re-identification method and system based on unsupervised pyramid similarity learning, which classifies and labels feature blocks of different scales by means of unsupervised clustering and screens out valid data samples.
  • the screened samples are used to train and update the initial model; through continuous iterative training and updating, the model becomes increasingly adapted to the sample data in the target scene domain, which can improve the accuracy of pedestrian target re-identification.
  • a first aspect of the present invention provides an object re-identification method based on unsupervised pyramid similarity learning.
  • An object re-identification method based on unsupervised pyramid similarity learning including:
  • the training and update process of the target re-identification model is as follows:
  • the sample images of the target scene domain are automatically marked and the training samples are selected to train and update the initial model, and obtain the target re-identification model.
  • a second aspect of the present invention provides an object re-identification system based on unsupervised pyramid similarity learning.
  • An object re-identification system based on unsupervised pyramid similarity learning including:
  • an image acquisition module which is used to acquire the sample image to be queried and the target scene domain image
  • the training and update process of the target re-identification model is as follows:
  • a fourth aspect of the present invention provides a computer apparatus.
  • a computer device comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the program, implements the steps of the above-mentioned target re-identification method based on unsupervised pyramid similarity learning.
  • the multi-scale pyramid feature block of the present invention is simple and general, can fully describe the sample features from the whole to the local, and fully mine the identification information of the samples.
  • the present invention integrates multi-scale pyramid similarity learning into an unsupervised deep convolutional neural network and constructs a multi-scale feature depth model to learn the characteristics of unlabeled samples; the model comprehensively learns the similarity between different samples and between feature blocks of different scales, and is stable and robust.
  • the present invention designs a function to measure the similarity between the source scene domain and the target scene domain and the similarity distance between samples in the target scene domain in the transfer learning process.
  • on this basis, the feature blocks at each scale use DBSCAN clustering to realize automatic labeling and filtering.
  • the samples screened by this method are more conducive to the transfer and adaptation of the model, resulting in better performance.
  • FIG. 1 is a flowchart of a target re-identification method based on unsupervised pyramid similarity learning according to an embodiment of the present invention
  • FIG. 2 is a framework diagram of an initial model deep convolutional neural network according to an embodiment of the present invention
  • Fig. 3 is the multi-scale pyramid feature block flow chart of the embodiment of the present invention.
  • FIG. 4 is a framework diagram of an adaptive transfer learning according to an embodiment of the present invention.
  • FIG. 7 is a Rank-1 recognition accuracy curve diagram corresponding to different parameters p according to an embodiment of the present invention.
  • the target re-identification method based on unsupervised pyramid similarity learning in this embodiment includes:
  • Step 2: output the target image matching the sample image to be queried in the target scene domain through the target re-identification model
  • the training and update process of the target re-identification model is as follows:
  • the sample images of the target scene domain are automatically marked and the training samples are selected to train and update the initial model, and obtain the target re-identification model.
  • the labeled and filtered samples are used to continue training the model. After several iterations of training, the updated model becomes better suited to the target scene domain, thereby achieving higher target re-identification accuracy.
  • the initial model is to provide experience for the early learning of unidentified samples in the target scene domain, and to improve the accuracy of the preliminary learning.
  • the initial model is obtained by training a deep convolutional neural network built on labeled samples in the source scene domain.
  • a specific example of the initial model of this embodiment is shown in FIG. 2; the initial model is a modified ResNet-50 deep convolutional neural network.
  • the output dimension of FC1 is 2048, and the output dimension of FC2 is the number of actual target identities.
  • the loss function is designed as a combination of the cross-entropy loss and the triplet loss: the triplet loss is used at the first fully connected layer and the cross-entropy loss at the second fully connected layer. The combination of the two loss functions gives full play to the advantages of both the classification and the verification approach.
  • the triplet loss function (triplet loss) adopts batch-hard triplet loss, and each mini-batch is constructed by randomly sampling K sample instances of P target entities, which are defined as follows:
  • m is the margin parameter.
  • the cross entropy loss function is defined as:
  • This embodiment provides a computer device, including a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the program, it implements the steps of the target re-identification method based on unsupervised pyramid similarity learning described in Embodiment 1 above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

A target re-identification method and system based on non-supervised pyramid similarity learning. The target re-identification method based on non-supervised pyramid similarity learning comprises: obtaining a sample image to be queried and a target scene domain image; and outputting a target image, matching the sample image to be queried, in a target scene domain via a target re-identification model, wherein a training and updating process of the target re-identification model comprises: performing non-supervised multiscale horizontal pyramid similarity learning on images of a source scene domain and the target scene domain; and automatically labeling a target scene domain sample image according to similarity and screening out a training sample to train and update an initial model so as to obtain the target re-identification model. By means of continuous iterative training and updating, the model is more and more adaptive to sample data in a target scene domain, and the accuracy of pedestrian target re-identification can be improved.

Description

Object Re-Identification Method and System Based on Unsupervised Pyramid Similarity Learning

TECHNICAL FIELD

The invention belongs to the field of target re-identification, and in particular relates to a target re-identification method and system based on unsupervised pyramid similarity learning.

BACKGROUND

The statements in this section merely provide background information related to the present invention and do not necessarily constitute prior art.

The purpose of target re-identification is to compare and match the pedestrian target image to be searched with pedestrian images obtained under different cameras, in order to determine whether the target pedestrian appears in the surveillance scenes of those cameras. This technology plays an important role in intelligent surveillance and public safety. The problem remains challenging in complex surveillance environments (e.g., illumination changes, occlusion by other objects, differing camera viewpoints).

Recently, target re-identification methods based on deep learning frameworks have achieved good performance. These methods can be divided into supervised and unsupervised deep re-identification methods. Supervised methods attain high recognition accuracy, but they require annotating a large number of pedestrian targets in the surveillance scene, which consumes considerable manpower and material resources; they are not adaptive to different application scenarios, where the data must be re-labeled. Unsupervised methods do not require labeling the data in the surveillance scene; their difficulty lies in how to effectively learn a model of the pedestrian targets. Among these, deep re-identification based on unsupervised cross-domain learning performs well: the labeled source scene domain data are used to train a deep learning framework and obtain an original model, which is then trained on unlabeled data in the target scene domain so that the model adapts to the target-domain data and yields an accurate target model. Because the source scene domain and the target scene domain differ, obtaining a good adaptive model is the key problem such methods must solve.

Current approaches to this problem include: learning a target model of invariant features and adaptively updating it by aligning attributes and labels; generating, through an adversarial network, target-domain images consistent in style with the labeled source-scene images and using them as training samples for adaptation; and learning the inconsistency of similarity across different cameras. These methods still underperform their supervised counterparts, and problems remain in model construction and transfer algorithms. Most of them adopt a holistic feature model, whose performance drops sharply when the target is occluded or the camera viewpoint changes.

To sum up, the inventor found that the target models constructed by current target re-identification methods are inaccurate, and that they do not fit the characteristics of unlabeled samples.
SUMMARY OF THE INVENTION

In order to solve the above problems, the present invention provides a target re-identification method and system based on unsupervised pyramid similarity learning, which classifies and labels feature blocks of different scales by means of unsupervised clustering and screens out valid data samples to train and update the initial model. Through continuous iterative training and updating, the model becomes increasingly adapted to the sample data in the target scene domain, which can improve the accuracy of pedestrian target re-identification.

In order to achieve the above object, the present invention adopts the following technical solutions:

A first aspect of the present invention provides a target re-identification method based on unsupervised pyramid similarity learning, including:

obtaining a sample image to be queried and target scene domain images;

outputting, through a target re-identification model, the target image in the target scene domain that matches the sample image to be queried;

wherein the training and updating process of the target re-identification model is:

performing unsupervised multi-scale horizontal pyramid similarity learning on the source scene domain and target scene domain images;

automatically labeling the target scene domain sample images according to the similarity, and screening out training samples to train and update the initial model, obtaining the target re-identification model.

A second aspect of the present invention provides a target re-identification system based on unsupervised pyramid similarity learning, including:

an image acquisition module, used to acquire the sample image to be queried and the target scene domain images;

a target re-identification module, used to output, through the target re-identification model, the target image in the target scene domain that matches the sample image to be queried;

wherein the training and updating process of the target re-identification model is:

performing unsupervised multi-scale horizontal pyramid similarity learning on the source scene domain and target scene domain images;

automatically labeling the target scene domain sample images according to the similarity, and screening out training samples to train and update the initial model, obtaining the target re-identification model.

A third aspect of the present invention provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of the target re-identification method based on unsupervised pyramid similarity learning described above are implemented.

A fourth aspect of the present invention provides a computer device, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor; when the processor executes the program, the steps of the target re-identification method based on unsupervised pyramid similarity learning described above are implemented.
Compared with the prior art, the beneficial effects of the present invention are:

The multi-scale pyramid feature partitioning of the present invention is simple and general; it can describe sample features comprehensively from the whole to the local and fully mine the discriminative information of the samples.

The present invention integrates multi-scale pyramid similarity learning into an unsupervised deep convolutional neural network and constructs a multi-scale feature depth model to learn the characteristics of unlabeled samples. The model comprehensively learns the similarity between different samples and between feature blocks of different scales, and is stable and robust.

In the transfer-learning process, the present invention designs distance functions that measure the similarity between the source scene domain and the target scene domain as well as the similarity between samples within the target scene domain. On this basis, the feature blocks at each scale use DBSCAN clustering to realize automatic sample labeling and filtering. The samples screened in this way are more conducive to model transfer and adaptation, resulting in better performance.
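The automatic labeling and filtering step described above can be sketched with a minimal pure-numpy DBSCAN over feature vectors. This is an illustrative sketch, not the patent's implementation: `eps` and `min_pts` are assumed hyperparameters, and points labeled -1 (DBSCAN noise) stand in for the samples that would be screened out rather than used for training.

```python
import numpy as np

def dbscan_pseudo_labels(features, eps=0.5, min_pts=3):
    """Cluster unlabeled feature vectors with DBSCAN and return pseudo-labels.

    Points labeled -1 are noise; in the screening step they would be
    discarded rather than used for training.  eps and min_pts are assumed
    hyperparameters, not values taken from the patent.
    """
    n = len(features)
    # Pairwise Euclidean distances between all feature vectors.
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1))
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]

    labels = np.full(n, -1)                       # -1 = noise / unlabeled
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(neighbors[i]) < min_pts:
            continue                              # already labeled, or not core
        labels[i] = cluster                       # start a new cluster at i
        queue = list(neighbors[i])
        while queue:                              # breadth-first expansion
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbors[j]) >= min_pts:  # j is also a core point
                    queue.extend(neighbors[j])
        cluster += 1
    return labels
```

Each cluster of block features then receives one pseudo-identity label, while noise points are excluded from the next training round.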
BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which form a part of the present invention, are provided for further understanding of the invention; the exemplary embodiments of the invention and their descriptions explain the invention and do not unduly limit it.

FIG. 1 is a flowchart of the target re-identification method based on unsupervised pyramid similarity learning according to an embodiment of the present invention;

FIG. 2 is a framework diagram of the deep convolutional neural network of the initial model according to an embodiment of the present invention;

FIG. 3 is a flowchart of multi-scale pyramid feature partitioning according to an embodiment of the present invention;

FIG. 4 is a framework diagram of adaptive transfer learning according to an embodiment of the present invention;

FIG. 5 is a curve of Rank-1 recognition accuracy for different scales according to an embodiment of the present invention;

FIG. 6 is a curve of Rank-1 recognition accuracy for different values of the parameter β∈[0,1] according to an embodiment of the present invention;

FIG. 7 is a curve of Rank-1 recognition accuracy for different values of the parameter p according to an embodiment of the present invention.
DETAILED DESCRIPTION

The present invention will be further described below with reference to the accompanying drawings and embodiments.

It should be noted that the following detailed description is exemplary and intended to provide further explanation of the invention. Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

It should also be noted that the terminology used herein is for the purpose of describing specific embodiments only and is not intended to limit the exemplary embodiments according to the present invention. As used herein, unless the context clearly dictates otherwise, singular forms are intended to include plural forms as well. Furthermore, it should be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of the stated features, steps, operations, devices, components and/or combinations thereof.

Embodiment 1

As shown in FIG. 1, the target re-identification method based on unsupervised pyramid similarity learning of this embodiment includes:

Step 1: obtain the sample image to be queried and the target scene domain images;

Step 2: output, through the target re-identification model, the target image in the target scene domain that matches the sample image to be queried;

wherein the training and updating process of the target re-identification model is:

performing unsupervised multi-scale horizontal pyramid similarity learning on the source scene domain and target scene domain images;

automatically labeling the target scene domain sample images according to the similarity, and screening out training samples to train and update the initial model, obtaining the target re-identification model.

The labeled and filtered samples are used to continue training the model; after several iterations of training, the updated model becomes better suited to the target scene domain, thereby achieving higher target re-identification accuracy.

In a specific implementation, the initial model provides prior experience for the early learning of the unlabeled samples in the target scene domain and improves the accuracy of the preliminary learning. The initial model is obtained by training a deep convolutional neural network on the labeled samples of the source scene domain.

A specific example of the initial model of this embodiment is shown in FIG. 2: the initial model is a modified ResNet-50 deep convolutional neural network.

It should be noted that, in other embodiments, the initial model may also be implemented with other existing deep convolutional neural network models, which will not be detailed here.

The modified ResNet-50 deep convolutional neural network is taken as an example below. The specific modification is as follows:

The first four stages of ResNet-50 are retained, and a uniform pooling layer and two fully connected layers, FC1 and FC2, are added. The output dimension of FC1 is 2048, and the output dimension of FC2 is the number of actual target identities.
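The shape of the modified head can be traced numerically. The sketch below is a shape-only illustration: the 8×4 spatial size of the stage-4 feature map (for a 256×128 input crop) and the random weights are assumptions; only the 2048 and 750 dimensions come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed backbone output: ResNet-50's first four stages produce a
# 2048-channel feature map; an 8x4 spatial size is assumed here.
feature_map = rng.standard_normal((2048, 8, 4))

# Uniform (average) pooling over the spatial dimensions -> 2048-d vector.
pooled = feature_map.mean(axis=(1, 2))

# FC1: 2048 -> 2048 embedding (the triplet loss is applied here).
W1 = rng.standard_normal((2048, 2048)) * 0.01
f1 = W1 @ pooled

# FC2: 2048 -> number of identities (the cross-entropy loss is applied
# to these logits); 750 matches the Market-1501 example in the text.
num_identities = 750
W2 = rng.standard_normal((num_identities, 2048)) * 0.01
logits = W2 @ f1
```

In the real network the two layers are trained jointly; here they only demonstrate the data flow and output dimensions.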
The loss function is designed as the combination of a cross-entropy loss and a triplet loss: the triplet loss is used at the first fully connected layer and the cross-entropy loss at the second fully connected layer. The combination of the two loss functions gives full play to the advantages of both the classification and the verification approach.

The triplet loss adopts the batch-hard triplet loss. Each mini-batch is constructed by randomly sampling K sample instances of each of P target identities, and the loss is defined as follows:

L_triplet = Σ_{i=1}^{P} Σ_{a=1}^{K} [ m + max_{p=1,…,K} ‖f(x_a^i) − f(x_p^i)‖_2 − min_{j=1,…,P, j≠i; n=1,…,K} ‖f(x_a^i) − f(x_n^j)‖_2 ]_+    (1)

where f(x_a^i) is the feature of the selected (anchor) sample, f(x_p^i) is the feature of a sample with the same label as x_a^i, f(x_n^j) is the feature of a sample with a different label, and m is the margin parameter.

The cross-entropy loss is defined as:

l_CE = − Σ_k y_k log ŷ_k    (2)

where y_k and ŷ_k are the actual label and the predicted label (class probability), respectively, and l_CE is the cross-entropy loss of a sample.

The loss function L_source used for training in the source scene domain is the sum of formulas (1) and (2):

L_source = L_triplet + L_CE    (3)

Taking training on the public Market-1501 database as an example, the number of pedestrian identities in the database is 750, so the output dimension of FC2 is 750. The loss functions used in the training process are the cross-entropy loss function and the triplet loss function.
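The source-domain objective described above (batch-hard triplet loss plus cross-entropy, summed as L_source) can be sketched in numpy. This is a minimal illustration, not the patent's implementation: the margin value and the toy batch below are assumptions.

```python
import numpy as np

def batch_hard_triplet_loss(features, labels, margin=0.3):
    """Batch-hard triplet loss: for each anchor, take the hardest positive
    (farthest same-label sample) and hardest negative (closest
    different-label sample).  The margin value is an assumption."""
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1) + 1e-12)     # pairwise L2 distances
    same = labels[:, None] == labels[None, :]
    hardest_pos = np.where(same, dist, 0.0).max(axis=1)
    hardest_neg = np.where(same, np.inf, dist).min(axis=1)
    return np.maximum(margin + hardest_pos - hardest_neg, 0.0).mean()

def cross_entropy_loss(logits, labels):
    """Cross-entropy loss over softmax class probabilities."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def source_loss(features, logits, labels, margin=0.3):
    """L_source = L_triplet + L_CE: the combined source-domain objective."""
    return (batch_hard_triplet_loss(features, labels, margin)
            + cross_entropy_loss(logits, labels))
```

With well-separated identities and confident logits both terms approach zero, which is the intended training signal.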
非督导多尺度金字塔相似性学习为:Unsupervised multi-scale pyramid similarity learning is:
非督导多尺度相似性学习用于挖掘目标场景域样本与源场景域样本之间以及目标场景域内样本之间在多种尺度上的相似性。目标场景域样本与源场景域样本之间相似性学习主要是为了挖掘源场景与目标场景域之间的相似性,该相似性的挖掘有助于初始模型向目标场景域的迁移,尤其初始学习阶段。目标场景域内样本间相似性学习主要是为了挖掘样本间的相似度,为目标域样本的自动标注提供依据。Unsupervised multi-scale similarity learning is used to mine the similarity at multiple scales between samples in the target scene domain and the source scene domain samples as well as between samples in the target scene domain. The similarity learning between the target scene domain samples and the source scene domain samples is mainly to mine the similarity between the source scene and the target scene domain. The similarity mining helps the migration of the initial model to the target scene domain, especially the initial learning stage. The purpose of learning the similarity between samples in the target scene domain is to mine the similarity between samples and provide a basis for the automatic labeling of samples in the target domain.
The specific scheme of unsupervised multi-scale pyramid similarity learning is as follows:
Let the j-th sample image of the target scene domain be
Figure PCTCN2021092935-appb-000009
and let the feature map obtained after inputting it into the initial model be
Figure PCTCN2021092935-appb-000010
According to the set scale parameter σ, the feature map is evenly split horizontally into 2^σ blocks; after uniformly pooling each block, the feature set
Figure PCTCN2021092935-appb-000011
is obtained. The multi-scale pyramid is embodied as follows: for σ = σ_0, the scale-parameter set is taken as the set of all non-negative integers not exceeding σ_0, i.e. {0, 1, …, σ_0}; then, for the feature map
Figure PCTCN2021092935-appb-000012
the multi-scale pyramid feature set finally obtained is
Figure PCTCN2021092935-appb-000013
This set contains features at different scales, from the global feature (scale parameter 0) down to 2^σ local features, and can fully describe the characteristics of the image.
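The horizontal partitioning and pooling step described above can be sketched as follows (an illustrative NumPy version assuming an H×W×C feature map; the function and argument names are ours, not the patent's):

```python
import numpy as np

def pyramid_features(feature_map, sigma0=3):
    """Split an (H, W, C) feature map horizontally into 2**s strips for each
    scale s in {0, 1, ..., sigma0} and average-pool each strip, yielding the
    multi-scale horizontal pyramid feature set described above."""
    h = feature_map.shape[0]
    feats = []
    for s in range(sigma0 + 1):
        blocks = 2 ** s
        for b in range(blocks):
            strip = feature_map[b * h // blocks:(b + 1) * h // blocks]
            feats.append(strip.mean(axis=(0, 1)))  # pool over H and W
    return np.stack(feats)  # (1 + 2 + ... + 2**sigma0, C) feature blocks
```

With sigma0 = 3 this yields 1 + 2 + 4 + 8 = 15 feature blocks per image, matching the {1, 2, 4, 8} decomposition of FIG. 3.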
The similarity between target scene domain samples and source scene domain samples is defined as:
Figure PCTCN2021092935-appb-000014
where
Figure PCTCN2021092935-appb-000015
is the nearest-neighbor sample, in the source scene domain, of the target scene domain sample feature
Figure PCTCN2021092935-appb-000016
The smaller
Figure PCTCN2021092935-appb-000017
is, the closer the sample is to the source scene domain. Computing the similarity of the corresponding block features in the source and target scene domains with formula (4) fully analyzes the similarity of the two different scene domains.
To learn the similarity among samples within the target scene domain more accurately, this scheme describes each sample by its context, using the k-reciprocal nearest-neighbor vector (k-reciprocal vector). For a sample
Figure PCTCN2021092935-appb-000018
the k-reciprocal vector v_{i,k} is defined as follows: when the sample
Figure PCTCN2021092935-appb-000019
is a k-reciprocal nearest neighbor of the sample
Figure PCTCN2021092935-appb-000020
then
Figure PCTCN2021092935-appb-000021
and when the two are not k-reciprocal nearest neighbors, v_{i,k} = 0.
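A minimal sketch of building such k-reciprocal vectors from a pairwise distance matrix (a simplified binary encoding; the patent's exact non-zero entries are in an image we cannot read, so we set them to 1):

```python
import numpy as np

def k_reciprocal_vectors(dist, k):
    """Build binary k-reciprocal encoding vectors from an (n, n) pairwise
    distance matrix: v[i, j] = 1 iff i and j are each within the other's
    k nearest neighbours, and 0 otherwise."""
    n = dist.shape[0]
    order = np.argsort(dist, axis=1)
    # k nearest neighbours of each sample, excluding the sample itself
    knn = [set(order[i][order[i] != i][:k]) for i in range(n)]
    v = np.zeros((n, n))
    for i in range(n):
        for j in knn[i]:
            if i in knn[j]:          # reciprocal condition
                v[i, j] = 1.0
    return v
```

Each row v[i] then serves as the context description v_{i,k} of sample i used in the target-domain similarity.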
The similarity between samples within the target scene domain is defined as:
Figure PCTCN2021092935-appb-000022
where
Figure PCTCN2021092935-appb-000023
are two sample features in the target scene domain, v_{i,k} and v_{j,k} are the k-reciprocal vectors of samples i and j, respectively, and N_T is the total number of samples in the target scene domain. Applying formula (5) to all sample feature blocks yields the similarity corresponding to each block feature.
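Reading claim 5 literally, the quantity for two target-domain samples could be computed as below (a hypothetical rendering of formula (5), which is itself an image in this copy: one minus the ratio of the summed element-wise minima to the summed element-wise maxima of the two k-reciprocal vectors):

```python
import numpy as np

def target_domain_similarity(v_i, v_j):
    """One minus the min/max ratio of two k-reciprocal vectors, per the
    wording of claim 5. Note this quantity decreases as the vectors'
    shared context grows, so in practice it behaves like a distance."""
    denom = np.maximum(v_i, v_j).sum()
    if denom == 0:
        return 1.0  # no context overlap at all
    return 1.0 - np.minimum(v_i, v_j).sum() / denom
```

This is the Jaccard-style measure commonly used with k-reciprocal encodings; whether the patent scales or smooths it further cannot be determined from the placeholder.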
FIG. 3 is a flowchart of an embodiment of multi-scale pyramid feature partitioning. Specifically, the feature map is evenly split into 2^σ blocks according to the scale parameter σ. The multi-scale property lies in partitioning the feature map at multiple scales: the scales used in FIG. 3 are {0, 1, 2, 3}, so the feature map is finally decomposed into {1, 2, 4, 8} blocks, and these feature blocks are uniformly pooled to form the multi-scale pyramid features.
In the process of automatically labeling target scene domain samples and screening training samples:
Sample labeling and sample screening are mainly used for model training; training the model with accurately labeled, appropriate samples helps achieve higher recognition accuracy.
The automatic sample labeling and screening scheme is as follows: the unsupervised clustering algorithm DBSCAN is applied to the block-feature sets at different scales to cluster the samples and assign pseudo-labels. The distance criterion used in DBSCAN clustering combines the two distances of formulas (4) and (5), specifically:
Figure PCTCN2021092935-appb-000024
where
Figure PCTCN2021092935-appb-000025
is the k-th pyramid feature block of the target-scene sample, and β ∈ [0, 1] is a balance parameter.
To screen the data samples, all sample-pair distances computed by formula (6) are sorted in ascending order, and the scanning radius ε of the DBSCAN clustering algorithm is set to the mean of the first pN distances, where p is a proportion factor and N is the total number of sample pairs in the target scene. Only samples within the scanning radius are selected.
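The ε selection rule above can be sketched as follows (names are illustrative; p = 1.7×10⁻³ is the value the simulation section later reports as best):

```python
import numpy as np

def dbscan_eps(pair_dists, p=1.7e-3):
    """Scanning radius for DBSCAN: the mean of the smallest p*N pairwise
    distances, where N is the total number of sample pairs."""
    d = np.sort(np.asarray(pair_dists, dtype=float))
    top = max(1, int(p * d.size))  # guard against an empty slice for tiny p*N
    return d[:top].mean()
```

The resulting ε would then be passed to a standard DBSCAN implementation together with the combined distance of formula (6).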
In the process of training and updating the model:
Training and updating the model realizes the migration of the model from the source scene domain to the target scene domain; the trained and updated model is better adapted to the target scene domain and therefore performs well.
The loss function used when training in the target scene domain treats all pyramid feature blocks as independent individuals: each block is substituted into formula (3) and the results are summed:
Figure PCTCN2021092935-appb-000026
The specific process of adaptive transfer learning in the target scene domain is shown in FIG. 4. All samples obtain multi-scale pyramid features by the process of FIG. 3 and are then labeled and screened with the unsupervised DBSCAN clustering algorithm, whose distance criterion is computed by formula (6). For screening, the distances computed by formula (6) are sorted in ascending order; samples within the scanning radius ε are used for adaptive transfer-learning training, and the rest are excluded. The pyramid features at each scale are clustered by DBSCAN as independent individuals, i.e., each sample obtains labels at multiple scales. The deep learning framework used for adaptive transfer learning in the target scene domain is basically similar to the initial model of FIG. 2; the difference is that the sample features at each scale participate in training as independent individuals, so the loss function, formula (7), is the cumulative sum of the loss functions over all scales.
The model is trained and updated over multiple iterations: in each iteration, sample features are re-extracted, samples are re-labeled, and training samples are re-screened. As the number of iterations increases, the model gradually adapts to the target scene domain samples, thus achieving an accurate recognition rate. For target re-identification, the query sample image is input into the model to obtain the matching target image, realizing the query purpose.
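The iterative scheme can be summarized by the following skeleton (all callables are injected placeholders standing in for the feature-extraction, DBSCAN labeling/screening, and training steps of FIG. 4; nothing here is an API from the patent):

```python
def adaptive_transfer_learning(extract_features, cluster_and_label,
                               train_one_round, model, images, n_iters=3):
    """Iteration skeleton of the target-domain adaptation described above:
    each round re-extracts multi-scale pyramid features, re-labels and
    screens samples with DBSCAN pseudo-labels, then retrains the model
    with the summed per-scale loss of formula (7)."""
    for _ in range(n_iters):
        feats = extract_features(model, images)    # multi-scale pyramid features
        labeled = cluster_and_label(feats)         # pseudo-labels + screening
        model = train_one_round(model, labeled)    # update with loss (7)
    return model
```

Each round uses the current model to generate better pseudo-labels, which in turn train a model better adapted to the target scene domain.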
This is further illustrated by the following simulation:
Simulation calculations were performed for the key parameter choices in the target re-identification method of this embodiment, including the scale parameter σ, the parameter β that balances source-scene similarity and target scene domain sample similarity in the distance-criterion calculation, and the proportion parameter p needed to compute ε. The source scene domain image dataset used in the simulation is DukeMTMC-ReID and the target scene domain image dataset is Market1501, both commonly used public target re-identification datasets. The simulation results can serve as a reference for practitioners in specific applications.
FIG. 5 shows the Rank-1 recognition accuracy of this embodiment's scheme for different scale parameters σ. Different scale parameters yield different recognition rates; the simulation results show that σ = 2, i.e., the parameter set σ = {0, 1, 2}, gives the highest recognition accuracy.
FIG. 6 shows the Rank-1 recognition accuracy for different values of the parameter β. As seen from the distance-criterion calculation of formula (6), β weights the two similarities in similarity learning. The simulation results show that the highest Rank-1 recognition accuracy is obtained when β = 0.1, i.e., when the source-scene similarity carries a weight of 0.1 and the target scene domain sample similarity carries a weight of 0.9.
FIG. 7 shows the Rank-1 recognition accuracy for different proportion parameters p. In this embodiment, the scanning radius ε is set to the mean of the first pN distances, where N is the number of sample pairs. Since N is very large, the specific choice of p strongly affects the recognition accuracy. The simulation results show that the recognition accuracy is highest when p is set to 1.7×10^-3.
Embodiment 2
The target re-identification system based on unsupervised pyramid similarity learning of this embodiment includes:
an image acquisition module for acquiring the sample image to be queried and target scene domain images;
a target re-identification module for outputting, through the target re-identification model, the target image in the target scene domain that matches the sample image to be queried;
wherein the training and updating process of the target re-identification model is:
performing unsupervised multi-scale horizontal pyramid similarity learning on source scene domain and target scene domain images; and
automatically labeling the target scene domain sample images according to the similarity and screening out training samples to train and update the initial model, obtaining the target re-identification model.
The modules of the target re-identification system based on unsupervised pyramid similarity learning of this embodiment correspond one-to-one to the steps of the target re-identification method based on unsupervised pyramid similarity learning in Embodiment 1; the specific implementation process is as described in Embodiment 1 and is not repeated here.
Embodiment 3
This embodiment provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the steps in the target re-identification method based on unsupervised pyramid similarity learning described in Embodiment 1 above.
Embodiment 4
This embodiment provides a computer device comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor; when executing the program, the processor implements the steps in the target re-identification method based on unsupervised pyramid similarity learning described in Embodiment 1 above.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a system, or a computer program product. Therefore, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage and optical storage) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operation steps to be performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be accomplished by instructing relevant hardware through a computer program; the program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The above descriptions are only preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

  1. A target re-identification method based on unsupervised pyramid similarity learning, characterized by comprising:
    acquiring a sample image to be queried and target scene domain images;
    outputting, through a target re-identification model, a target image in the target scene domain that matches the sample image to be queried;
    wherein the training and updating process of the target re-identification model is:
    performing unsupervised multi-scale horizontal pyramid similarity learning on source scene domain and target scene domain images; and
    automatically labeling the target scene domain sample images according to the similarity and screening out training samples to train and update an initial model, obtaining the target re-identification model.
  2. The target re-identification method based on unsupervised pyramid similarity learning according to claim 1, characterized in that the initial model is obtained by training a deep convolutional neural network constructed with labeled samples from the source scene domain.
  3. The target re-identification method based on unsupervised pyramid similarity learning according to claim 1, characterized in that, in the process of unsupervised multi-scale horizontal pyramid similarity learning, the initial model is used to extract feature maps of unlabeled samples in the target scene domain and split them horizontally into blocks at different scales, and features from the global level down to different local levels are used to mine the discriminative information of the unlabeled samples.
  4. The target re-identification method based on unsupervised pyramid similarity learning according to claim 1, characterized in that, in the process of unsupervised multi-scale horizontal pyramid similarity learning, the similarity between a target scene domain sample and the source scene domain samples can be expressed as the difference between 1 and a natural-logarithm term, the natural-logarithm term being taken on the negated distance between the target scene domain sample feature and its nearest-neighbor sample in the source scene domain.
  5. The target re-identification method based on unsupervised pyramid similarity learning according to claim 1, characterized in that, in the process of unsupervised multi-scale horizontal pyramid similarity learning, the similarity between samples within the target scene domain is the difference between 1 and a ratio, the ratio being the cumulative sum of the element-wise smaller of the k-reciprocal vectors of any two samples to the cumulative sum of the element-wise larger of the k-reciprocal vectors of those two samples.
  6. The target re-identification method based on unsupervised pyramid similarity learning according to claim 1, characterized in that, in the process of automatically labeling and screening out training samples, the feature blocks at different scales are classified and labeled by unsupervised clustering, and valid data samples are screened out.
  7. The target re-identification method based on unsupervised pyramid similarity learning according to claim 1, characterized in that the training and updating of the target re-identification model adopts multiple training iterations: in each iteration, sample features are re-extracted, samples are re-labeled, and training samples are re-screened; as the number of iterations increases, the target re-identification model gradually adapts to the target scene domain samples.
  8. A target re-identification system based on unsupervised pyramid similarity learning, characterized by comprising:
    an image acquisition module for acquiring a sample image to be queried and target scene domain images;
    a target re-identification module for outputting, through a target re-identification model, a target image in the target scene domain that matches the sample image to be queried;
    wherein the training and updating process of the target re-identification model is:
    performing unsupervised multi-scale horizontal pyramid similarity learning on source scene domain and target scene domain images; and
    automatically labeling the target scene domain sample images according to the similarity and screening out training samples to train and update an initial model, obtaining the target re-identification model.
  9. A computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the steps in the target re-identification method based on unsupervised pyramid similarity learning according to any one of claims 1-7 are implemented.
  10. A computer device, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, characterized in that, when executing the program, the processor implements the steps in the target re-identification method based on unsupervised pyramid similarity learning according to any one of claims 1-7.
PCT/CN2021/092935 2020-09-22 2021-05-11 Target re-identification method and system based on non-supervised pyramid similarity learning WO2022062419A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011003036.7 2020-09-22
CN202011003036.7A CN112132014B (en) 2020-09-22 2020-09-22 Target re-identification method and system based on non-supervised pyramid similarity learning

Publications (1)

Publication Number Publication Date
WO2022062419A1 true WO2022062419A1 (en) 2022-03-31

Family

ID=73842376

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/092935 WO2022062419A1 (en) 2020-09-22 2021-05-11 Target re-identification method and system based on non-supervised pyramid similarity learning

Country Status (3)

Country Link
CN (1) CN112132014B (en)
NL (1) NL2029214B1 (en)
WO (1) WO2022062419A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112132014B (en) * 2020-09-22 2022-04-12 德州学院 Target re-identification method and system based on non-supervised pyramid similarity learning
CN112949406A (en) * 2021-02-02 2021-06-11 西北农林科技大学 Sheep individual identity recognition method based on deep learning algorithm
CN112906557B (en) * 2021-02-08 2023-07-14 重庆兆光科技股份有限公司 Multi-granularity feature aggregation target re-identification method and system under multi-view angle
CN113420824A (en) * 2021-07-03 2021-09-21 上海理想信息产业(集团)有限公司 Pre-training data screening and training method and system for industrial vision application
CN114565839A (en) * 2022-02-17 2022-05-31 广州市城市规划勘测设计研究院 Remote sensing image target detection method, device, equipment and computer medium

Citations (4)

Publication number Priority date Publication date Assignee Title
US20160078359A1 (en) * 2014-09-12 2016-03-17 Xerox Corporation System for domain adaptation with a domain-specific class means classifier
CN110414462A (en) * 2019-08-02 2019-11-05 中科人工智能创新技术研究院(青岛)有限公司 A kind of unsupervised cross-domain pedestrian recognition methods and system again
CN111259756A (en) * 2020-01-10 2020-06-09 西安培华学院 Pedestrian re-identification method based on local high-frequency features and mixed metric learning
CN112132014A (en) * 2020-09-22 2020-12-25 德州学院 Target re-identification method and system based on non-supervised pyramid similarity learning

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
CN107622229B (en) * 2017-08-29 2021-02-02 中山大学 Video vehicle re-identification method and system based on fusion features
CN111259836A (en) * 2020-01-20 2020-06-09 浙江大学 Video pedestrian re-identification method based on dynamic graph convolution representation
CN111476168B (en) * 2020-04-08 2022-06-21 山东师范大学 Cross-domain pedestrian re-identification method and system based on three stages

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
US20160078359A1 (en) * 2014-09-12 2016-03-17 Xerox Corporation System for domain adaptation with a domain-specific class means classifier
CN110414462A (en) * 2019-08-02 2019-11-05 中科人工智能创新技术研究院(青岛)有限公司 A kind of unsupervised cross-domain pedestrian recognition methods and system again
CN111259756A (en) * 2020-01-10 2020-06-09 西安培华学院 Pedestrian re-identification method based on local high-frequency features and mixed metric learning
CN112132014A (en) * 2020-09-22 2020-12-25 德州学院 Target re-identification method and system based on non-supervised pyramid similarity learning

Also Published As

Publication number Publication date
CN112132014A (en) 2020-12-25
CN112132014B (en) 2022-04-12
NL2029214B1 (en) 2023-03-14
NL2029214A (en) 2022-05-23

Similar Documents

Publication Publication Date Title
WO2022062419A1 (en) Target re-identification method and system based on non-supervised pyramid similarity learning
Huixian The analysis of plants image recognition based on deep learning and artificial neural network
CN113378632B (en) Pseudo-label optimization-based unsupervised domain adaptive pedestrian re-identification method
CN111666843B (en) Pedestrian re-recognition method based on global feature and local feature splicing
CN105701502B (en) Automatic image annotation method based on Monte Carlo data equalization
CN107133569B (en) Monitoring video multi-granularity labeling method based on generalized multi-label learning
CN111126360A (en) Cross-domain pedestrian re-identification method based on unsupervised combined multi-loss model
CN110414462A (en) A kind of unsupervised cross-domain pedestrian recognition methods and system again
CN112507901B (en) Unsupervised pedestrian re-identification method based on pseudo tag self-correction
CN108491766B (en) End-to-end crowd counting method based on depth decision forest
CN103425996B (en) A kind of large-scale image recognition methods of parallel distributed
CN108875816A (en) Merge the Active Learning samples selection strategy of Reliability Code and diversity criterion
CN109299707A (en) A kind of unsupervised pedestrian recognition methods again based on fuzzy depth cluster
CN109063649B (en) Pedestrian re-identification method based on twin pedestrian alignment residual error network
CN110619059B (en) Building marking method based on transfer learning
CN110399895A (en) The method and apparatus of image recognition
CN112990282B (en) Classification method and device for fine-granularity small sample images
CN104268546A (en) Dynamic scene classification method based on topic model
CN111914902A (en) Traditional Chinese medicine identification and surface defect detection method based on deep neural network
Tan et al. Rapid fine-grained classification of butterflies based on FCM-KM and mask R-CNN fusion
CN112183464A (en) Video pedestrian identification method based on deep neural network and graph convolution network
Xu et al. Graphical modeling for multi-source domain adaptation
CN114782752A (en) Small sample image grouping classification method and device based on self-training
CN109784404A (en) A kind of the multi-tag classification prototype system and method for fusion tag information
Chang et al. Fine-grained butterfly and moth classification using deep convolutional neural networks

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21870806

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21870806

Country of ref document: EP

Kind code of ref document: A1