NL2029214A

NL2029214A - Target re-identification method and system based on non-supervised pyramid similarity learning

Info

Publication number: NL2029214A
Application number: NL2029214A
Authority: NL
Inventors: Liu Hanping; Tang Yanke; Chen Huijie; Zhang Junye; Dong Wenhui; Gao Ying; Qu Peishu
Original assignee: Univ Dezhou
Priority date: 2020-09-22
Filing date: 2021-09-21
Publication date: 2022-05-23
Also published as: NL2029214B1; CN112132014B; WO2022062419A1; CN112132014A

Abstract

The invention belongs to the field of target re-identification and provides a target re-identification method and system based on non-supervised pyramid similarity learning. The target re-identification method includes: obtaining a sample image to be queried and a target scene domain image; and outputting, by a target re-identification model, a target image in the target scene domain that matches the sample image to be queried. The training and updating process of the target re-identification model is: performing non-supervised multi-scale horizontal pyramid similarity learning on a source scene domain and the target scene domain image; and automatically labeling target scene domain sample images according to similarity and screening out training samples to train and update an initial model, thereby obtaining the target re-identification model. Through continuous iterative training and updating, the model becomes increasingly adapted to the sample data in the target scene domain, and the accuracy of pedestrian target re-identification is improved.

Description

TARGET RE-IDENTIFICATION METHOD AND SYSTEM BASED ON NON-SUPERVISED PYRAMID SIMILARITY LEARNING

Field of the Invention

The present invention belongs to the field of target re-identification, and particularly relates to a target re-identification method and system based on non-supervised pyramid similarity learning.

Background of the Invention

The statements in this section merely provide background information related to the present invention, and do not necessarily constitute prior art.

The purpose of target re-identification is to compare and match a pedestrian target image that needs to be searched with pedestrian images obtained from different cameras, and to find whether the target pedestrian appears in different camera surveillance scenes.

This technology plays an important role in intelligent monitoring and public safety.

In a complex monitoring environment (for example, with changes in lighting, occlusion of the target by other objects, or different monitoring perspectives), this problem has always been challenging.

Recently, a target re-identification method based on the deep learning framework has achieved a better performance.

This type of method can be divided into supervised deep target re-identification methods and non-supervised deep target re-identification methods.

The supervised deep target re-identification method has a high identification accuracy rate, but this method needs to label a large number of pedestrian targets in a monitoring scene, which will consume a lot of manpower and material resources.

For different application scenarios, the method does not have adaptability, and the data needs to be relabeled.

The non-supervised deep target re-identification method does not need to label the data in the monitoring scene.

The difficulty is how to effectively learn the pedestrian target model.

Among these methods, the deep re-identification method based on non-supervised cross-domain learning has a better performance.

The deep re-identification method based on non-supervised cross-domain learning uses the labeled source scene domain data to train the deep learning framework to obtain an original model, and uses the unlabeled data to train the original model in the target scene domain, so that the model can self-adapt to the data of the target scene domain and obtain an accurate target model.

Because the source scene domain and the target scene domain differ, how to obtain a well-adapted model is the key problem to be solved in this kind of method. At present, methods to solve this problem include: learning a target model of invariant features and updating it adaptively through the alignment of attributes and labels; using an adversarial network to generate, in the target domain, images consistent with the style of the labeled source scene images as training samples for adaptation; and learning the inconsistency of similarity across different cameras. These methods are still inferior in performance to the corresponding supervised methods, and problems remain in model construction and in the migration algorithms. Most of them use a holistic feature model, so performance drops sharply when the target is occluded or the monitoring perspective changes.

In summary, the inventor found that the target model constructed by the current target re-identification method is inaccurate, and the target model is not suitable for unlabeled sample characteristics.

Summary of the Invention

In order to solve the above problems, the present invention provides a target re-identification method and system based on non-supervised pyramid similarity learning, which classifies and labels feature blocks of different scales through non-supervised clustering and screens out effective data samples to train and update the initial model. Through continuous iterative training and updating, the model becomes increasingly adapted to the sample data in the target scene domain, and the accuracy of pedestrian target re-identification can be improved.

In order to achieve the above purpose, the present invention adopts the following technical scheme:

A first aspect of the present invention provides a target re-identification method based on non-supervised pyramid similarity learning.

A target re-identification method based on non-supervised pyramid similarity learning, includes: obtaining a sample image to be queried and a target scene domain image; outputting a target image matched with the sample image to be queried in the target scene domain by a target re-identification model; wherein, a training and updating process of the target re-identification model is: performing non-supervised multi-scale horizontal pyramid similarity learning on a source scene domain and the target scene domain image; automatically labeling a target scene domain sample image according to a similarity and screening out a training sample to train and update an initial model to obtain the target re-identification model.

A second aspect of the present invention provides a target re-identification system based on non-supervised pyramid similarity learning.

A target re-identification system based on non-supervised pyramid similarity learning, includes: an image obtaining module, obtaining a sample image to be queried and a target scene domain image; a target re-identification module, outputting a target image matched with the sample image to be queried in the target scene domain by a target re-identification model; wherein, a training and updating process of the target re-identification model is: performing non-supervised multi-scale horizontal pyramid similarity learning on a source scene domain and the target scene domain image; automatically labeling a target scene domain sample image according to a similarity and screening out a training sample to train and update an initial model to obtain the target re-identification model.

A third aspect of the present invention provides a computer-readable storage medium.

A computer-readable storage medium stores a computer program; when the computer program is executed by a processor, the steps in the target re-identification method based on non-supervised pyramid similarity learning as described above are realized.

A fourth aspect of the present invention provides a computing device.

A computing device, which includes a memory, a processor, and a computer program that is stored in the memory and operable on the processor, wherein the processor executes the computer program for implementing steps of the target re-identification method based on non-supervised pyramid similarity learning as described above.

Compared with the prior art, the beneficial effects of the present invention are: The multi-scale pyramid feature block of the present invention is simple and universal, which can fully describe sample feature from whole to parts, and fully mine identifying information of the sample. The present invention integrates the multi-scale pyramid similarity learning into non-supervised deep convolutional neural network, and constructs a multi-scale feature deep model to learn characteristics of unlabeled samples. The model comprehensively learns the similarity between different samples and feature blocks with different scales, and has the characteristics of stability and robustness.

The present invention designs a distance measurement function for measuring the similarity between the source scene domain and the target scene domain and the similarity between samples of the target scene domain in migration learning. On this basis, each scale feature block uses DBSCAN clustering to realize automatic sample labeling and screening. The samples screened by the method are more conducive to the migration and adaptation of the model, so as to obtain better performance.

Brief Description of the Drawings

The accompanying drawings of the specification constituting a part of the present invention are used to provide a further understanding of the present invention. The exemplary embodiments of the present invention and the description thereof are used to explain the present invention, and do not constitute an improper limitation of the present invention.

FIG. 1 is a flowchart of a target re-identification method based on non-supervised pyramid similarity learning of an embodiment of the present invention;

FIG. 2 is a frame diagram of a deep convolutional neural network of an initial model of an embodiment of the present invention;

FIG. 3 is a block flowchart of multi-scale pyramid feature of an embodiment of the present invention;

FIG. 4 is a framework diagram of adaptive migration learning of an embodiment of the present invention;

FIG. 5 is a graph of an identification accuracy of Rank-1 corresponding to different scales of an embodiment of the present invention;

FIG. 6 is a graph of an identification accuracy of Rank-1 corresponding to different parameters β ∈ [0,1] of an embodiment of the present invention;

FIG. 7 is a graph of an identification accuracy of Rank-1 corresponding to different parameters p according to an embodiment of the present invention.

Detailed Description of the Embodiments The present invention will be further described below in conjunction with the drawings and embodiments.

It should be pointed out that the following detailed descriptions are all illustrative and are intended to provide further descriptions of the present invention. Unless otherwise specified, all technical and scientific terms used herein have the same meaning as commonly understood by those of ordinary skill in the technical field to which the present invention belongs.

It should be noted that the terms used here are only for describing specific embodiments, and are not intended to limit the exemplary embodiments according to the present invention. As used herein, unless the context clearly indicates otherwise, the singular form is also intended to include the plural form. In addition, it also should be understood that when the terms "comprising" and/or "including" are used in this specification, it indicates the presence of features, steps, operations, devices, components and/or combinations thereof.

Embodiment 1

As shown in FIG. 1, the target re-identification method based on non-supervised pyramid similarity learning of the embodiment includes:

Step 1: obtaining a sample image to be queried and a target scene domain image;

Step 2: outputting a target image matched with the sample image to be queried in the target scene domain by a target re-identification model.

The training and updating process of the target re-identification model is: performing non-supervised multi-scale horizontal pyramid similarity learning on a source scene domain and the target scene domain image; automatically labeling a target scene domain sample image according to a similarity and screening out a training sample to train and update an initial model to obtain the target re-identification model. The labeled and screened samples are used to continue training the model. After several iterations of training, the updated model becomes better suited to the target scene domain, so that a higher target re-identification accuracy is obtained.

In a specific implementation, the initial model provides experience for the early learning of unlabeled samples of the target scene domain and improves the accuracy of initial learning. The initial model is obtained by training a deep convolutional neural network on labeled samples in the source scene domain.

A specific embodiment of the initial model of the embodiment is shown in FIG. 2, and the initial model is a modified ResNet-50 deep convolutional neural network.

It should be noted here that in other embodiments, the initial model also can be implemented by other existing deep convolutional neural network models, which will not be described in detail here.

The following takes the modified ResNet-50 deep convolutional neural network as an example. The specific modification is: the first four layers of ResNet-50 are kept, and a uniform pooling layer and two fully connected layers, FC1 and FC2, are added. The output dimension of FC1 is 2048, and the output dimension of FC2 is the number of actual entities.

The loss function is designed as a combination of a cross entropy loss function and a triplet loss function. The triplet loss function is applied at the first fully connected layer and the cross entropy loss function at the second fully connected layer. Combining the two loss functions exploits the advantages of both classification and verification.

The triplet loss function adopts the batch-hard triplet loss, and each mini-batch is constructed by randomly sampling K sample instances for each of P target entities. It is defined as follows:

$$L_{triplet} = \sum_{i=1}^{P} \sum_{a=1}^{K} \Big[ m + \max_{p=1,\dots,K} \| f_a^i - f_p^i \|_2 - \min_{\substack{j=1,\dots,P;\ j \neq i \\ n=1,\dots,K}} \| f_a^i - f_n^j \|_2 \Big]_+ \quad (1)$$

Wherein, $f_a$ is the feature of the selected sample, $f_p$ is the feature of a sample whose label is consistent with that of $f_a$, $f_n$ is the feature of a sample whose label is inconsistent with that of $f_a$, and $m$ is the margin parameter.

The cross entropy loss function is defined as:

$$L_{ce} = \sum_{i=1}^{P} \sum_{a=1}^{K} \ell_{ce}(y_a^i, \hat{y}_a^i) \quad (2)$$

Wherein, $y_a^i$ and $\hat{y}_a^i$ are the actual label and the predicted label respectively, and $\ell_{ce}$ is the cross-entropy loss of the sample.

The loss function $L_{source}$ used in the source scene domain training is the superposition of formulas (1) and (2):

$$L_{source} = L_{triplet} + L_{ce} \quad (3)$$

Taking training on the Market1501 public database as an example, the number of pedestrians in the database is 750, so the output dimension of FC2 is 750. The loss functions used in the training process are the cross-entropy loss function and the triplet loss function.
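As an illustration only, the source-domain loss described above (a batch-hard triplet term plus a cross-entropy term) can be sketched in plain Python. The function names, the use of Euclidean distance, the exclusion of the anchor from its own positives, and the default margin of 0.3 are assumptions made for this sketch and are not taken from the specification.

```python
import math

def l2(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def batch_hard_triplet(features, labels, margin):
    """Sum over anchors of [margin + hardest positive - hardest negative]_+."""
    total = 0.0
    for a, (fa, ya) in enumerate(zip(features, labels)):
        # Distances to samples with the same label (anchor itself excluded).
        pos = [l2(fa, f) for i, (f, y) in enumerate(zip(features, labels))
               if y == ya and i != a]
        # Distances to samples with a different label.
        neg = [l2(fa, f) for f, y in zip(features, labels) if y != ya]
        if not pos or not neg:
            continue  # no valid triplet for this anchor in the batch
        total += max(0.0, margin + max(pos) - min(neg))
    return total

def cross_entropy(logits, label):
    """Softmax cross-entropy of one sample; logits are raw classifier outputs."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def source_loss(features, logits_batch, labels, margin=0.3):
    """Triplet term plus summed cross-entropy term (margin value is an assumption)."""
    ce = sum(cross_entropy(z, y) for z, y in zip(logits_batch, labels))
    return batch_hard_triplet(features, labels, margin) + ce
```

In this sketch `features` would play the role of the FC1 outputs and `logits_batch` the role of the FC2 outputs for one mini-batch.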

The non-supervised multi-scale pyramid similarity learning is as follows. Non-supervised multi-scale similarity learning is used to mine, on multiple scales, the similarities between target scene domain samples and source scene domain samples and between samples within the target scene domain. The similarity learning between target and source scene domain samples mainly mines the similarity between the source scene domain and the target scene domain. This similarity mining helps the initial model migrate to the target scene domain, especially in the initial learning stage. The similarity learning between samples in the target scene domain mainly mines the similarity between samples and provides a basis for automatic labeling of samples in the target domain.

The specific scheme of non-supervised multi-scale pyramid similarity learning is as follows. Suppose that the feature map obtained after inputting the j-th sample image $x_t^j$ of the target scene domain into the initial model is $f_t^j$. According to a set scale parameter $\sigma$, the feature map is uniformly divided into $2^\sigma$ blocks, and after each block is uniformly pooled, a feature set $f_t^{j,\sigma}$ is obtained. The multi-scale pyramid is embodied as follows: if $\sigma = \sigma_0$, the scale parameter set can be set to the set $\{0, 1, \dots, \sigma_0\}$ of all non-negative integers not greater than $\sigma_0$. For the feature map $f_t^j$, the finally obtained multi-scale pyramid feature set is $\{f_t^{j,0}, f_t^{j,1}, \dots, f_t^{j,\sigma_0}\}$, which contains features of different scales, from the whole (scale parameter 0) to the $2^{\sigma_0}$ local features, and can fully describe the characteristics of the image.

The similarity between the target scene domain sample and the source scene domain sample is defined as:

$$d_s(f_t^j) = 1 - e^{-\| f_t^j - N_s(f_t^j) \|_2} \quad (4)$$

Wherein, $N_s(f_t^j)$ is the nearest neighbor sample of the target scene domain sample feature $f_t^j$ in the source scene domain. The smaller $d_s(f_t^j)$ is, the closer the sample is to the source scene domain. Formula (4) is used to calculate the similarity between corresponding block features in the source scene domain and the target scene domain, so that the similarity between the two scene domains can be fully analyzed.

In order to realize the similarity learning between samples in the target scene domain more accurately, the solution uses the context of each sample to describe that sample, and the context description specifically uses a K-reciprocal vector. The K-reciprocal vector $V_i$ of a sample $f_t^i$ is defined as follows: when the sample $f_t^k$ is a K-reciprocal neighbor of the sample $f_t^i$, $V_{i,k} = e^{-\| f_t^i - f_t^k \|_2}$; when the two are not K-reciprocal, $V_{i,k} = 0$.
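A minimal sketch of these two quantities follows, reading formula (4) as one minus the exponential of the negated distance to the nearest source-domain feature, and reading the K-reciprocal vector as an exponentially weighted indicator of mutual K-nearest neighbors. The function names, the Euclidean metric, and the convention that a sample counts among its own K nearest neighbors are assumptions of this sketch.

```python
import math

def source_distance(target_feature, source_features):
    """1 - exp(-distance to the nearest source-domain feature); lies in [0, 1)."""
    nearest = min(math.dist(target_feature, s) for s in source_features)
    return 1.0 - math.exp(-nearest)

def k_nearest(dist, i, k):
    """Indices of the k samples closest to sample i (i itself is included)."""
    order = sorted(range(len(dist)), key=lambda j: dist[i][j])
    return set(order[:k])

def k_reciprocal_vectors(features, k):
    """V[i][j] = exp(-dist(i, j)) if i and j are in each other's k-nearest set, else 0."""
    n = len(features)
    dist = [[math.dist(a, b) for b in features] for a in features]
    neighbours = [k_nearest(dist, i, k) for i in range(n)]
    return [
        [math.exp(-dist[i][j]) if j in neighbours[i] and i in neighbours[j] else 0.0
         for j in range(n)]
        for i in range(n)
    ]
```

A target feature that coincides with a source feature gets a source distance of 0, and samples that are not mutual neighbors contribute 0 to each other's vectors.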

The similarity between samples in the target scene domain is defined as:

$$d_t(f_t^i, f_t^j) = 1 - \frac{\sum_{k=1}^{N_t} \min(V_{i,k}, V_{j,k})}{\sum_{k=1}^{N_t} \max(V_{i,k}, V_{j,k})} \quad (5)$$

Wherein, $f_t^i$ and $f_t^j$ are two sample features in the target scene domain, $V_{i,k}$ and $V_{j,k}$ are the components of the K-reciprocal vectors of samples i and j respectively, and $N_t$ is the total number of samples in the target scene domain. For all sample feature blocks, the similarity between corresponding block features can be calculated using formula (5).
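Formula (5) has the form of a Jaccard-style distance between two K-reciprocal vectors: one minus the ratio of the summed element-wise minima to the summed element-wise maxima. The sketch below illustrates that reading; the function name and the handling of two all-zero vectors are assumptions of this sketch.

```python
def target_distance(v_i, v_j):
    """1 - sum(min) / sum(max) over the components of two K-reciprocal vectors."""
    numerator = sum(min(a, b) for a, b in zip(v_i, v_j))
    denominator = sum(max(a, b) for a, b in zip(v_i, v_j))
    if denominator == 0.0:
        return 1.0  # assumption: two empty contexts are treated as maximally distant
    return 1.0 - numerator / denominator
```

Identical vectors give a distance of 0, and vectors with no shared non-zero component give a distance of 1.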

FIG. 3 is a flowchart of an embodiment of multi-scale pyramid feature blocking. Specifically, the feature map is divided uniformly into $2^\sigma$ blocks according to the scale parameter $\sigma$. The multi-scale feature is reflected in the use of multiple scales to block the feature map. For example, the scale set used in FIG. 3 is {0, 1, 2, 3}, so the feature map is decomposed into {1, 2, 4, 8} blocks, which are uniformly pooled to form multi-scale pyramid features.
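The blocking step can be illustrated with a short sketch that splits a feature map into horizontal stripes at each scale and pools each stripe into one vector. Treating "uniform pooling" as average pooling, representing the feature map as nested lists of shape height x width x channels, and the function name are all assumptions of this sketch.

```python
def pyramid_features(feature_map, max_scale):
    """Return {scale: [pooled block vectors]} for scales 0..max_scale.

    feature_map is a list of H rows; each row is a list of W positions;
    each position is a list of C channel values.
    """
    height = len(feature_map)
    pyramid = {}
    for scale in range(max_scale + 1):
        n_blocks = 2 ** scale
        if height % n_blocks:
            raise ValueError("feature map height must be divisible by 2**scale")
        rows_per_block = height // n_blocks
        blocks = []
        for b in range(n_blocks):
            rows = feature_map[b * rows_per_block:(b + 1) * rows_per_block]
            cells = [cell for row in rows for cell in row]
            channels = len(cells[0])
            # Average every channel over all positions in this horizontal stripe.
            blocks.append([sum(c[ch] for c in cells) / len(cells)
                           for ch in range(channels)])
        pyramid[scale] = blocks
    return pyramid
```

With `max_scale=3` this yields 1, 2, 4 and 8 block vectors at scales 0 to 3, matching the block counts given for FIG. 3.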

In a process of automatic sample labeling of target scene domain and training sample screening: Sample labeling and sample screening are mainly used to train the model. Using accurate labeling and appropriate samples to train the model will help to obtain high identification accuracy.

The automatic labeling and screening scheme of samples is as follows: the non-supervised clustering algorithm DBSCAN is used to cluster the block sample sets of different scales and assign pseudo labels.

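The distance standard used in DBSCAN clustering is a combination of formulas (4) and (5), specifically:

$$d(f_t^{i,\sigma,k}, f_t^{j,\sigma,k}) = (1 - \beta)\, d_t(f_t^{i,\sigma,k}, f_t^{j,\sigma,k}) + \beta \big( d_s(f_t^{i,\sigma,k}) + d_s(f_t^{j,\sigma,k}) \big) \quad (6)$$

Wherein, $f_t^{i,\sigma,k}$ is the k-th pyramid feature block of the i-th target scene sample at scale $\sigma$, and $\beta \in [0,1]$ is the balance parameter.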

In order to screen out data samples, all the sample-pair distances calculated by formula (6) are sorted from small to large, and the scan radius ε of the DBSCAN clustering algorithm is set to the mean value of the first pN distances.

Wherein, p is a scale factor and N is the total number of sample pairs in the target scene.

Only samples within the scanning radius will be selected.
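The distance standard and the choice of scan radius can be sketched as follows, reading formula (6) as a weighted sum of the target-domain distance (weight 1 minus the balance parameter) and the two source-domain distances (weight equal to the balance parameter). The function names, the truncation of pN to an integer with a minimum of one pair, and fixing the self-distance at zero are assumptions of this sketch.

```python
def combined_distance(d_t, d_s, beta):
    """Pairwise clustering distance for one scale and block position.

    d_t is an n x n matrix of target-domain distances and d_s a length-n list
    of source-domain distances.
    """
    n = len(d_s)
    return [
        [0.0 if i == j  # self-distance fixed at 0 (an assumption)
         else (1 - beta) * d_t[i][j] + beta * (d_s[i] + d_s[j])
         for j in range(n)]
        for i in range(n)
    ]

def scan_radius(dist, p):
    """Mean of the smallest p*N pairwise distances, N = number of unordered pairs."""
    n = len(dist)
    pairs = sorted(dist[i][j] for i in range(n) for j in range(i + 1, n))
    count = max(1, int(p * len(pairs)))
    return sum(pairs[:count]) / count
```

The resulting matrix and radius could then be handed to any DBSCAN implementation that accepts precomputed distances, for example scikit-learn's `DBSCAN` with `metric="precomputed"`.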

In the process of model training and updating: The training and updating of the model is used to realize the migration of the model from the source scene domain to the target scene domain.

The trained and updated model will be more suitable for the target scene domain, thus having good performance.

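The loss function used for training in the target scene domain treats all pyramid feature blocks as independent individuals, which are substituted into formula (3) respectively and accumulated:

$$L_{target} = \sum_{\sigma=0}^{\sigma_0} \sum_{k=1}^{2^\sigma} \sum_{a} \Big[ L_{triplet}(f_a^{\sigma,k}, f_p^{\sigma,k}, f_n^{\sigma,k}) + L_{ce}(y_a^{\sigma,k}, \hat{y}_a^{\sigma,k}) \Big] \quad (7)$$

Here the innermost sum runs over the selected samples a, and $L_{triplet}(\cdot)$ and $L_{ce}(\cdot)$ denote the per-sample terms of formulas (1) and (2).

The specific process of adaptive migration learning in the target scene domain is shown in FIG. 4. All samples obtain multi-scale pyramid features according to the process in FIG. 3, and the DBSCAN non-supervised clustering algorithm is then used for labeling and screening.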

Screening of samples: after the distance calculated by a formula (6) is sorted from small to large, the samples within the scanning radius are used for the training of adaptive migration learning, and the rest will be excluded.

The pyramid features of each scale need to be DBSCAN clustered as independent individuals, that is, each sample will obtain labels in multiple scale ranges.

The deep learning framework used in the adaptive migration learning of the target scene domain is basically similar to the initial model in FIG. 2, and the difference is that the sample characteristics of each scale will participate as independent individuals in the training process.

Therefore, the loss function is formula (7), which is the cumulative sum of the loss functions on all scales.

The training and updating of the model adopt the multiple iterative training method.

Each iteration re-extracts the sample features, and the samples are re-labeled and re-screened. As the number of iterations increases, the target re-identification model gradually adapts to the target scene domain samples, so that an accurate identification rate is obtained. During target re-identification, the matched target image can be obtained by inputting the query sample image into the model, so as to achieve the query purpose.
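The iterative procedure can be summarized as a short control-flow sketch. The three callables `extract`, `cluster` and `fine_tune` are hypothetical placeholders for feature extraction, DBSCAN-based pseudo-labeling, and one round of training; the convention that a label of -1 marks a screened-out sample is likewise an assumption of this sketch.

```python
def adapt_to_target(model, target_images, extract, cluster, fine_tune, iterations):
    """Repeat: extract features, pseudo-label and screen, then fine-tune the model."""
    for _ in range(iterations):
        features = extract(model, target_images)
        pseudo_labels = cluster(features)  # -1 marks a sample that is screened out
        kept = [i for i, label in enumerate(pseudo_labels) if label != -1]
        model = fine_tune(
            model,
            [target_images[i] for i in kept],
            [pseudo_labels[i] for i in kept],
        )
    return model
```

Passing the steps in as functions keeps the loop independent of any particular network or clustering library.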

It is further illustrated by the following simulation. The selection of key parameters in the target re-identification method of the embodiment is simulated and calculated, including the scale parameter σ, the parameter β that balances the source scene similarity and the target scene domain sample similarity in the distance standard calculation, and the scale factor p required to calculate ε. The source scene domain image library used in the simulation is DukeMTMC-reID and the target scene domain image library is Market1501. Both are commonly used public libraries for target re-identification. The simulation results can serve as a reference for relevant technicians in specific applications.

FIG. 5 shows the Rank-1 identification accuracy of the scheme of the embodiment under different scale parameters σ. It can be seen that different recognition rates are obtained for different scale parameters. The simulation results show that the highest identification accuracy is achieved when σ = 2, that is, when the corresponding scale parameter set is {0, 1, 2}.

FIG. 6 shows the Rank-1 identification accuracy corresponding to different parameters β. It can be seen from the distance standard in formula (6) that β sets the relative weight of the two similarities in similarity learning. The simulation result shows that the highest Rank-1 identification accuracy is obtained when β = 0.1, that is, when the weight of the source scene similarity is 0.1 and the weight of the target scene domain sample similarity is 0.9.

FIG. 7 shows the Rank-1 identification accuracy corresponding to different parameters p. In the embodiment, the scanning radius ε is set as the average of the first pN distances, wherein N is the number of sample pairs. Because N is large, the specific setting of p has a great impact on the identification accuracy. The simulation results show that the identification accuracy is highest when p is set to 1.7 × 10^-3.

Embodiment 2

The target re-identification system based on non-supervised pyramid similarity learning of the present embodiment includes:

an image obtaining module, for obtaining a sample image to be queried and a target scene domain image;

a target re-identification module, for outputting a target image matched with the sample image to be queried in the target scene domain by a target re-identification model; wherein, a training and updating process of the target re-identification model is: performing non-supervised multi-scale horizontal pyramid similarity learning on a source scene domain and the target scene domain image; automatically labeling a target scene domain sample image according to a similarity and screening out a training sample to train and update an initial model to obtain the target re-identification model.

Each module of the target re-identification system based on non-supervised pyramid similarity learning in the embodiment corresponds to the steps in the target re-identification method based on non-supervised pyramid similarity learning in the first embodiment one by one. The specific implementation process is described in the first embodiment and will not be described here.

Embodiment 3

The embodiment provides a computer-readable storage medium which stores a computer program; when the computer program is executed by a processor, the steps in the target re-identification method based on non-supervised pyramid similarity learning as described in the first embodiment above are realized.

Embodiment 4

The embodiment provides a computing device, which includes a memory, a processor, and a computer program that is stored in the memory and operable on the processor, wherein the processor executes the computer program for implementing steps of the target re-identification method based on non-supervised pyramid similarity learning as described in the first embodiment.

Those skilled in the art should understand that the embodiments of the present invention can be provided as a method, a system, or a computer program product. Therefore, the present invention may adopt the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware. Moreover, the present invention may adopt the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, optical storage, etc.) containing computer-usable program codes.

The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each process and/or block in the flowchart and/or block diagram, and the combination of processes and/or blocks in the flowchart and/or block diagram can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing equipment to generate a machine, so that the instructions executed by the processor of the computer or other programmable data processing equipment generate means for implementing the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram. These computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing equipment to work in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including the instruction device. The device implements the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram. These computer program instructions can also be loaded on a computer or other programmable data processing equipment, so that a series of operation steps are executed on the computer or other programmable equipment to produce computer-implemented processing; thus, the instructions executed on the computer or other programmable devices provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more blocks in the block diagram.
Those of ordinary skill in the art can understand that all or part of the processes in the above-mentioned embodiment methods can be implemented by instructing relevant hardware through a computer program. The program can be stored in a computer readable storage medium. During execution, it may include the procedures of the above-mentioned method embodiments. Wherein, the storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), or a random access memory (RAM), etc.

The foregoing descriptions are only preferred embodiments of the present invention and are not used to limit the present invention. For those skilled in the art, the present invention can have various modifications and changes. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method for re-identifying a target based on unsupervised pyramid similarity learning, comprising: obtaining a sample image to be retrieved and a target scene domain image; outputting a target image corresponding to the sample image to be retrieved in the target scene domain by a target re-identification model; wherein a training and updating process of the target re-identification model is: performing unsupervised multi-scale horizontal pyramid similarity learning on a source scene domain and the target scene domain image; automatically labeling a sample image of the target scene domain according to a similarity and screening out a training sample to train and update an initial model to obtain the target re-identification model.

2. The method of target re-identification based on unsupervised pyramid similarity learning according to claim 1, wherein the initial model is obtained by training a deep convolutional neural network on labeled examples in the source scene domain.

3. The method for re-identifying a target based on unsupervised pyramid similarity learning according to claim 1 or 2, wherein in the method step of unsupervised learning of multi-scale horizontal pyramid similarities, a feature map of an unlabeled example in the target scene domain is obtained by extracting a target area, the feature map is divided into horizontal blocks of different scales, and identifying information is mined from the unlabeled example through features ranging from the global to the different parts.

4. The method of target re-identification based on unsupervised pyramid similarity learning according to claim 1, 2 or 3, wherein in the method step of unsupervised learning of multi-scale horizontal pyramid similarities, a similarity between a target scene domain example and a source scene domain example is expressed as a difference between 1 and a natural exponential term, the natural exponential term being the natural exponential of the negated distance between the feature of the example in the target scene domain and its nearest neighbor example in the source scene domain.

5. A target re-identification method based on unsupervised pyramid similarity learning according to any preceding claim, wherein in the method step of unsupervised learning of multi-scale horizontal pyramid similarities, a similarity between examples in the target scene domain is a difference between 1 and a ratio, the ratio being the sum of the smaller components of the K-reciprocal vectors of any two examples divided by the sum of the greater components of the K-reciprocal vectors of the two examples.

6. A target re-identification method based on unsupervised pyramid similarity learning according to any preceding claim, wherein the method step of automatically labeling and screening training samples comprises classifying and labeling feature blocks of different scales by unsupervised clustering and screening out effective data samples.

7. The method of target re-identification based on unsupervised pyramid similarity learning according to any preceding claim, wherein the method step of training and updating the target re-identification model comprises applying an iterative training method in which each iteration re-extracts the sample features and re-labels and re-screens the samples, so that, as the number of iterations increases, the target re-identification model gradually adapts to the target scene domain samples.

8. A target re-identification system based on unsupervised pyramid similarity learning, comprising:

an image acquisition module for obtaining a sample image to be retrieved and a target scene domain image; a target re-identification module for outputting a target image corresponding to the sample image to be retrieved in the target scene domain by a target re-identification model; wherein a training and updating process of the target re-identification model is: performing unsupervised multi-scale horizontal pyramid similarity learning on a source scene domain and the target scene domain image; automatically labeling a sample image of the target scene domain according to a similarity and screening out a training sample to train and update an initial model to obtain the target re-identification model.

9. A computer-readable storage medium that stores a computer program which, when executed by a processor, performs the steps in the method for re-identifying the target based on unsupervised pyramid similarity learning according to any one of claims 1 to 7.

10. A computing device comprising a memory, a processor and a computer program stored in the memory and operable on the processor, the processor executing the computer program for implementing steps of the target re-identification method based on unsupervised pyramid similarity learning according to any one of claims 1 to 7.