CN113837262B

CN113837262B - Unsupervised pedestrian re-identification method, system, terminal and medium

Info

Publication number: CN113837262B
Application number: CN202111097831.1A
Authority: CN
Inventors: 杨华; 陈琳
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2021-09-18
Filing date: 2021-09-18
Publication date: 2023-10-27
Anticipated expiration: 2041-09-18
Also published as: CN113837262A

Abstract

The invention provides an unsupervised pedestrian re-identification method and system based on a graph neural network. The method comprises the following steps: pedestrian features are extracted as nodes, and a graph convolutional neural network is constructed from neighborhood information; the graph network is trained under the constraint of a prior loss from labeled source-domain data and a consistency loss from unlabeled target-domain data; and a progressive alternating update algorithm governs the interaction between the graph-network-based clustering module and the pedestrian re-identification network to improve final performance. A corresponding terminal and medium are also provided. The method is robust, overcomes the shortcomings of traditional distance-sensitive clustering modules, and generates more accurate pseudo-labels, which in turn benefits subsequent pedestrian feature learning.

Description

Unsupervised pedestrian re-identification method, system, terminal and medium

Technical Field

The invention relates to the technical field of computer vision, in particular to an unsupervised pedestrian re-identification method, system, terminal and medium based on a graph neural network.

Background

Pedestrian re-identification in a surveillance network is a very challenging problem. Its main task is to identify the same pedestrian across different, non-overlapping cameras. How to extract sufficiently discriminative features from limited data is a key challenge in pedestrian re-identification. With the rise of deep learning, methods that design networks to adaptively learn pedestrian feature representations have been widely applied in recent years. In particular, methods that extract spatial features of pedestrians with deep convolutional neural networks (CNNs) achieve good results, for example E. Ahmed, M. Jones, and T. K. Marks, "An improved deep learning architecture for person re-identification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3908-3916, 2015. Deep learning has made unprecedented progress in pedestrian re-identification, which is fundamentally attributable to larger datasets and greater computational resources.

At present, pedestrian re-identification relies largely on supervised learning: each sample has a corresponding label, a deep neural network learns the features associated with each label, and classification is finally achieved. In this setting, the size of the dataset and the quality of its labels often play a decisive role in model performance. However, high-quality datasets are difficult to label. It is estimated that labeling a single object category in a single image takes about 2 to 3 seconds, yet datasets in practical applications often contain thousands of images, so the overall labeling process becomes prohibitively long. Especially for fine-grained classification and multi-label classification tasks, the labeling cost increases exponentially with the number of targets and the difficulty of recognition. Solving recognition problems from training samples of unknown class (unlabeled samples) is known as unsupervised learning. Unsupervised learning is one of the most difficult and important problems in computer vision and machine learning today. Many researchers believe that learning from large numbers of unlabeled samples can help answer questions about the nature of intelligence and learning. Furthermore, since unlabeled samples are easy to collect at low cost, unsupervised learning has practical value in many computer vision and robotics applications. Designing an end-to-end deep learning system that uses unlabeled samples to address unsupervised pedestrian re-identification is a technical challenge that remains to be solved.

Existing unsupervised classification methods mainly adopt distance-sensitive clustering, such as Zhun Zhong et al., "Invariance Matters: Exemplar Memory for Domain Adaptive Person Re-Identification," in CVPR, 2019, pp. 598-607, and Qian Yang et al., "Patch-Based Discriminative Feature Learning for Unsupervised Person Re-Identification," in CVPR, 2019, pp. 3633-3642. Unlabeled samples can also be assigned pseudo-labels by K-nearest-neighbor and similar methods to help feature learning, while learning relies mainly on the label information of the source-domain dataset (for example, Liangchen Song et al., "Unsupervised Domain Adaptive Re-Identification: Theory and Practice," Pattern Recognition 102 (2020), p. 107173). However, distance-sensitive clustering schemes are often limited by manually set parameters and cannot handle the bias between specific datasets; the predicted labels are not accurate enough, and performance in practical applications is poor. On the other hand, learning only from auxiliary source data may bias the result because of the difference between source-domain and target-domain data. Domain adaptation that exploits the useful information contained in the unlabeled target data is therefore highly necessary. How to automatically learn and predict labels is a key issue to be addressed.

A prior-art search found the following:

Chinese patent application publication No. CN111898665A, "Method for identifying pedestrians across domains based on neighbor sample information guidance", uses the PyTorch framework to construct a network. The method focuses on the important role of neighbor sample information in updating sample features, is based on a graph convolutional neural network, and integrates common-neighbor similarity. The graph convolution module is trained under supervision on source-domain data, and its ability to integrate sample information is transferred to the target domain to help cluster the target-domain data. Compared with similar methods, the graph convolution module of that method makes further use of the supervision information of the source domain and further improves cross-domain pedestrian re-identification performance. However, that method still has the following problems: the information in the labeled source-domain data and the unlabeled target-domain data is not fully utilized, and the deeper constraint relations among nodes are not considered when the graph of the graph convolutional neural network is built.

Chinese patent application publication No. CN111738090A, "Training method and device for pedestrian re-identification model and method and device for pedestrian re-identification", uses the convolutional network of the pedestrian re-identification model to extract features from a pedestrian image and obtain its original features; processes the original features with an attention module of the model to obtain a plurality of pedestrian local features; determines a similarity matrix between the pedestrian local features with a graph neural network of the model, and adjusts each local feature according to the similarity matrix; and determines a pedestrian recognition result and a training loss of the model based on the adjusted local features, optimizing the model parameters according to the training loss. Without introducing additional labeling information, that method automatically extracts important pedestrian local features from the image, so that the final local features are more discriminative and the recognition performance of the model is improved. However, that method and device still have the following problem: they belong to the field of supervised learning and depend entirely on the label information of the target data.

No description or report of technology similar to the present invention has been found so far, and no similar material has been collected at home or abroad.

Disclosure of Invention

In view of the defects in the prior art, the invention provides an unsupervised pedestrian re-identification method, system, terminal and medium based on a graph neural network.

According to one aspect of the present invention, there is provided an unsupervised pedestrian re-identification method based on a graph neural network, including:

constructing a deep pedestrian feature learning network;

for an image s in the labeled source-domain pedestrian data and an image t in the unlabeled target-domain pedestrian data, extracting the corresponding features $x_s$ and $x_t$, respectively, through the deep pedestrian feature learning network;

taking the extracted features $x_s$ and $x_t$ as graph nodes, and constructing a graph convolutional neural network;

acquiring neighborhood information among the graph nodes, and constructing a graph among the graph nodes to obtain an initial adjacency matrix A;

updating the initial adjacency matrix by hard example mining to obtain an updated adjacency matrix;

using the updated adjacency matrix to train the graph convolutional neural network to attend to erroneously connected graph nodes and to learn feature embeddings;

constraining the learning and updating of the graph convolutional neural network with a prior loss on the labeled source-domain pedestrian dataset and a consistency loss on the unlabeled target-domain pedestrian data, to obtain a trained graph convolutional neural network;

predicting pseudo-labels on the target-domain pedestrian data with the trained graph convolutional neural network, retraining the deep pedestrian feature learning network with the predicted pseudo-labels, and updating the feature extraction results;

and alternately optimizing the deep pedestrian feature learning network and the graph convolutional neural network, so as to improve the extracted pedestrian feature representation and obtain an unsupervised pedestrian re-identification result.

Preferably, the constructing the deep pedestrian feature learning network includes:

taking a ResNet50 feature learning network as the backbone and training it with triplet loss and cross-entropy loss as constraints, thereby constructing the deep pedestrian feature learning network.

Preferably, the feature vector dimension extracted by the deep pedestrian feature learning network is 2048.

Preferably, the neighborhood information among the graph nodes is obtained through an h-hop nearest neighbor search, where h ∈ {1, 2}: when h = 1, the number of nearest neighbors searched is 200; when h = 2, the number of nearest neighbors searched is 10. Each entry of the initial adjacency matrix A is 1 or 0, where 1 indicates that two nodes are connected and 0 indicates that they are not.
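The neighbor search and the binary adjacency matrix can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: Euclidean distance between feature vectors and a symmetric adjacency matrix are assumptions not stated in the text, and the function names `knn`, `h_hop_neighbors` and `initial_adjacency` are hypothetical.

```python
import math

def knn(features, idx, k):
    """Indices of the k nearest neighbors of node idx, excluding idx itself.
    Assumption: Euclidean distance between feature vectors."""
    dists = [(math.dist(features[idx], f), j)
             for j, f in enumerate(features) if j != idx]
    dists.sort()
    return [j for _, j in dists[:k]]

def h_hop_neighbors(features, p, ks=(200, 10)):
    """Collect neighbors of node p hop by hop.
    ks[0] nearest neighbors of p form hop 1; ks[1] nearest neighbors of
    each hop-1 node form hop 2 (200 and 10 in the preferred embodiment)."""
    frontier, seen, neighbors = [p], {p}, []
    for k in ks:
        next_frontier = []
        for node in frontier:
            for j in knn(features, node, k):
                if j not in seen:
                    seen.add(j)
                    neighbors.append(j)
                    next_frontier.append(j)
        frontier = next_frontier
    return neighbors

def initial_adjacency(features, ks=(200, 10)):
    """Binary adjacency matrix: 1 if two nodes are connected, else 0.
    Assumption: an edge is stored symmetrically."""
    n = len(features)
    A = [[0] * n for _ in range(n)]
    for p in range(n):
        for q in h_hop_neighbors(features, p, ks):
            A[p][q] = A[q][p] = 1
    return A
```

With only a handful of nodes, smaller values such as `ks=(1, 1)` make the behavior easy to inspect by hand.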

Preferably, updating the initial adjacency matrix by hard example mining to obtain an updated adjacency matrix includes:

updating the values of the initial adjacency matrix through hard example mining to obtain an updated adjacency matrix, wherein the updated adjacency matrix is used to train the graph convolutional neural network to attend to the relations among erroneously connected graph nodes and to learn feature embeddings.

Preferably, when hard example mining identifies an erroneous connection that requires attention, the corresponding entry of the updated adjacency matrix is 2.

Preferably, the weight ratio of the a priori loss and the consistency loss is 2:1.

According to another aspect of the present invention, there is provided an unsupervised pedestrian re-identification system comprising:

a deep pedestrian feature learning network module, which constructs a deep pedestrian feature learning network and extracts the corresponding features $x_s$ and $x_t$ for an image s in the labeled source-domain pedestrian data and an image t in the unlabeled target-domain pedestrian data, respectively;

a graph neural network module, which takes the features $x_s$ and $x_t$ extracted by the deep pedestrian feature learning network module as graph nodes and constructs a graph convolutional neural network; acquires neighborhood information among the graph nodes and constructs a graph among the graph nodes to obtain an initial adjacency matrix A; updates the initial adjacency matrix by hard example mining to obtain an updated adjacency matrix; uses the updated adjacency matrix to train the graph convolutional neural network to attend to erroneously connected graph nodes and to learn feature embeddings; and predicts pseudo-labels on the target-domain pedestrian data;

an optimization module, which constrains the learning and updating of the graph convolutional neural network with a prior loss on the labeled source-domain pedestrian dataset and a consistency loss on the unlabeled target-domain pedestrian data, and trains the deep pedestrian feature learning network with the pseudo-labels predicted on the target-domain pedestrian data by the updated graph convolutional neural network.

According to a third aspect of the present invention, there is provided a terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor, when executing the program, being operable to perform any of the methods described above or to run the system described above.

According to a fourth aspect of the present invention there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor is operable to perform a method of any of the above, or to run a system as described above.

Due to the adoption of the technical scheme, compared with the prior art, the invention has at least one of the following beneficial effects:

The unsupervised pedestrian re-identification method, system, terminal and medium provided by the invention adopt a four-layer graph convolutional neural network with a weighting matrix W, and use the context information of the feature nodes to learn graph embeddings for adaptive pseudo-label prediction.

The unsupervised pedestrian re-identification method, system, terminal and medium provided by the invention strengthen the graph convolutional network through a hard example mining strategy, so that the relations among nodes are learned better.

The unsupervised pedestrian re-identification method, system, terminal and medium provided by the invention use complementary constraints so that label prediction benefits from the class label information provided by the labeled source dataset while adapting to the information in the unlabeled target dataset.

The unsupervised pedestrian re-identification method, system, terminal and medium provided by the invention alternately optimize the labels and the feature learning network, which further improves the pedestrian recognition rate in subsequent steps.

The unsupervised pedestrian re-identification method, system, terminal and medium provided by the invention improve pseudo-label prediction accuracy and recognition performance in unsupervised pedestrian re-identification.

The unsupervised pedestrian re-identification method, system, terminal and medium provided by the invention are robust, overcome the shortcomings of traditional distance-sensitive clustering modules, and generate more accurate pseudo-labels, which benefits subsequent pedestrian feature learning.

The unsupervised pedestrian re-identification method, system, terminal and medium provided by the invention jointly consider information from the source-domain data and the target-domain data, and perform hard example mining to update the adjacency matrix and assist network learning, thereby achieving better results.

The unsupervised pedestrian re-identification method, system, terminal and medium provided by the invention belong to the field of cross-domain unsupervised learning, require no labels for the target-domain data, and can solve the problem of learning without labels.

Drawings

Other features, objects and advantages of the present invention will become more apparent upon reading of the detailed description of non-limiting embodiments, given with reference to the accompanying drawings in which:

FIG. 1 is a flowchart of an unsupervised pedestrian re-recognition method according to an embodiment of the present invention.

FIG. 2 is a flow chart of an unsupervised pedestrian re-identification method and a data flow chart according to a preferred embodiment of the present invention.

FIG. 3 is a schematic diagram of an unsupervised pedestrian re-recognition system according to an embodiment of the present invention.

Fig. 4 is a schematic diagram showing the working principle of the unsupervised pedestrian re-recognition method according to a preferred embodiment of the present invention.

Detailed Description

Embodiments of the present invention are described in detail below with reference to the accompanying drawings. The embodiments are implemented on the premise of the technical solution of the invention and give detailed implementations and specific operating procedures, but the scope of protection of the invention is not limited to the following embodiments.

As shown in fig. 1, the unsupervised pedestrian re-identification method provided in this embodiment may include the following steps:

S100, constructing a deep pedestrian feature learning network;

S200, for an image s in the labeled source-domain pedestrian data and an image t in the unlabeled target-domain pedestrian data, extracting the corresponding features $x_s$ and $x_t$, respectively, through the deep pedestrian feature learning network;

S300, taking the extracted features $x_s$ and $x_t$ as graph nodes, and constructing a graph convolutional neural network;

S400, acquiring neighborhood information among the graph nodes, and constructing a graph among the graph nodes to obtain an initial adjacency matrix A;

S500, updating the initial adjacency matrix A by hard example mining to obtain an updated adjacency matrix A;

S600, using the updated adjacency matrix A to train the graph convolutional neural network to attend to erroneously connected graph nodes and to learn feature embeddings;

S700, constraining the learning and updating of the graph convolutional neural network with a prior loss on the labeled source-domain pedestrian dataset and a consistency loss on the unlabeled target-domain pedestrian data, to obtain a trained graph convolutional neural network;

S800, predicting pseudo-labels on the target-domain pedestrian data with the trained graph convolutional neural network, retraining the deep pedestrian feature learning network with the predicted pseudo-labels, and updating the feature extraction results;

S900, alternately optimizing the deep pedestrian feature learning network and the graph convolutional neural network, so as to improve the extracted pedestrian feature representation and obtain an unsupervised pedestrian re-identification result.

In S100 of this embodiment, as a preferred embodiment, constructing the deep pedestrian feature learning network may include the steps of:

taking a ResNet50 feature learning network as the backbone and training it with triplet loss and cross-entropy loss as constraints, thereby constructing the deep pedestrian feature learning network.

In S100 of this embodiment, as a preferred embodiment, the feature vector dimension extracted by the deep pedestrian feature learning network constructed may be 2048.

In S400 of this embodiment, as a preferred embodiment, the neighborhood information among the graph nodes is obtained through an h-hop nearest neighbor search, where h ∈ {1, 2}: when h = 1, the number of nearest neighbors searched is 200; when h = 2, the number of nearest neighbors searched is 10. Each entry of the initial adjacency matrix A is 1 or 0, where 1 indicates that two nodes are connected and 0 indicates that they are not.

In S500 of this embodiment, as a preferred embodiment, updating the initial adjacency matrix A by hard example mining to obtain the updated adjacency matrix A may include the following step:

updating the values of the initial adjacency matrix A through hard example mining to obtain an updated adjacency matrix A, wherein the updated adjacency matrix A is later used to train the graph convolutional neural network to attend to the relations among erroneously connected graph nodes and to learn feature embeddings.

In S500 of this embodiment, as a preferred embodiment, when hard example mining identifies an erroneous connection that requires attention, the corresponding entry of the updated adjacency matrix A is 2. The larger the value in the adjacency matrix A, the higher the degree of attention.
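The adjacency update can be sketched as follows. This is an illustrative reading, not the definitive rule: it assumes that a hard positive is a pair with the same label that the neighbor search left unconnected, and that a hard negative is a pair with different labels that the neighbor search connected. The function name `mine_hard_examples` and its parameters `omega` and `v` are hypothetical; the default value 2 follows the preferred embodiment.

```python
def mine_hard_examples(A, labels, omega=2, v=2):
    """Return a copy of the binary adjacency matrix A in which hard pairs
    carry a larger attention value.

    Assumptions (not stated explicitly in the text):
      - hard positive: same label, but A[i][j] == 0  -> set to omega
      - hard negative: different labels, but A[i][j] == 1 -> set to v
    """
    n = len(A)
    updated = [row[:] for row in A]  # leave the initial matrix untouched
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            same_label = labels[i] == labels[j]
            if same_label and A[i][j] == 0:
                updated[i][j] = omega
            elif not same_label and A[i][j] == 1:
                updated[i][j] = v
    return updated
```

Labels are only available where ground truth or pseudo-labels exist, so in this sketch `labels` would come from the labeled source domain or from a previous round of pseudo-label prediction.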

In S700 of this embodiment, as a preferred embodiment, the weight ratio of the a priori loss to the consistency loss is 2:1.

The unsupervised pedestrian re-identification method provided by this embodiment of the invention uses a graph neural network in place of the traditional distance-sensitive clustering module, so that more accurate pseudo-labels are generated and subsequent feature learning benefits. Information from both the labeled source-domain dataset and the unlabeled target-domain dataset is used to constrain the graph convolutional neural network. Further, a progressive alternating update algorithm governs the interaction between the graph-network-based clustering module and the pedestrian re-identification network, thereby improving final performance.

In some embodiments of the invention:

First, a graph convolutional neural network that learns graph embeddings is designed, and the context information of the feature points is used to cluster the unlabeled data effectively and predict pseudo-labels. Neighbor node information is aggregated through graph convolution. Two complementary constraints, a prior knowledge loss and a triplet cross-consistency loss, exploit both the class label information provided by the labeled source dataset and the underlying structure information of the unlabeled target dataset. Because the quality of pseudo-label generation and feature learning depend on each other, the labels and the feature learning network are optimized alternately to keep improving final recognition performance.

The context information of the pedestrian data is fused efficiently in the graph convolutional neural network, so that the extracted features are more discriminative and more robust. In addition, the prior information from the labeled data and the underlying structure information of the unlabeled target dataset are combined to optimize the pseudo-label prediction network, so that it is more effective and efficient in the subsequent feature learning.

Fig. 2 is a flowchart of an unsupervised pedestrian re-recognition method according to a preferred embodiment of the present invention, in which the specific processes and data flows of the first to fourth steps in the preferred embodiment are explained in detail.

As shown in fig. 2, the unsupervised pedestrian re-recognition method provided in the preferred embodiment may include the following steps:

The first step: construct a graph convolutional neural network that aggregates context information to learn feature embeddings for pseudo-label prediction on the unlabeled target-domain pedestrian data.

The specific steps are as follows:

1. Training a deep pedestrian feature learning network, and extracting features of the input data to form the nodes of the graph:

$x_s = f(s),\ s \in S,$

$x_t = f(t),\ t \in T,$

where S is the labeled source-domain data, T is the unlabeled target-domain data, x is an extracted feature, f is the feature extraction network, and the node set is denoted V.

2. Constructing a graph among the nodes from neighborhood information. For a given node p, its h-hop neighbors are used as the neighbor nodes N for constructing the graph:

where x denotes the corresponding semantic feature, K is the number of neighbors, and h is the number of hops over which neighbors are selected. For node p, its set of neighbor nodes is denoted as:

where $x^q$ is the corresponding semantic feature, and $V(x^p)$ is abbreviated as $V^p$ hereinafter. The adjacency matrix A is expressed as:

3. Updating the adjacency matrix A through hard example mining, so that erroneously connected nodes receive a higher degree of attention. The final adjacency matrix is:

where y is the corresponding classification label, ω is the attention score of a hard positive sample node, and v is the attention score of a hard negative sample node.

4. Learning feature embeddings with a graph convolutional neural network having a weighting matrix W:

where χ is the feature representation, the output term is the prediction of the graph convolutional neural network for the node, and σ is a nonlinear activation function. When the semantic labels of two nodes are identical, an edge connects the two nodes.
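The feature-embedding step can be illustrated with a generic graph convolution layer. The propagation rule is not reproduced in this text, so the sketch below substitutes a common mean-aggregation form, σ(D⁻¹(A + I)XW), which is an assumption rather than the formula of the invention. The helper names `matmul`, `relu`, `gcn_layer` and `gcn_forward` are hypothetical, and ReLU is used as the activation σ.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def relu(X):
    """Element-wise ReLU activation."""
    return [[max(0.0, v) for v in row] for row in X]

def gcn_layer(A, X, W, activation=relu):
    """One graph convolution: activation(D^-1 (A + I) X W).

    A: adjacency matrix (entries may be 0, 1 or a larger attention value)
    X: node features, one row per node
    W: weighting matrix of this layer
    Assumption: self-loops plus row normalization; the source formula
    is not available.
    """
    n = len(A)
    A_hat = [[A[i][j] + (1 if i == j else 0) for j in range(n)]
             for i in range(n)]
    A_norm = [[v / sum(row) for v in row] for row in A_hat]
    return activation(matmul(matmul(A_norm, X), W))

def gcn_forward(A, X, weights):
    """Stack layers; a list of four weighting matrices gives a
    four-layer network."""
    for W in weights:
        X = gcn_layer(A, X, W)
    return X
```

Because the adjacency values act as aggregation weights here, an entry raised to 2 by hard example mining contributes twice as much to its neighbor's embedding as an ordinary edge.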

The second step: introduce loss functions to constrain the pseudo-label learning process of the JL-GCN.

The specific steps are as follows:

1. A prior knowledge constraint from the labeled source-domain dataset is introduced, which explicitly helps to distinguish pedestrian identities. Training on the source-domain dataset S is treated as a standard classification problem, and the graph convolutional neural network is optimized with a cross-entropy loss:

where the superscript i is the image index, the subscript s denotes the source domain, the remaining terms are the corresponding label and the network prediction output obtained from the graph convolutional network, and $N_s$ is the image batch size.

2. A triplet cross-consistency loss on the unlabeled target-domain data is introduced to penalize differences in predicted labels between original and augmented data in the target domain, thereby encouraging consistent predictions for a sample before and after transformation. Given target-domain data t and its corresponding data transformation, the graph neural network prediction outputs for the two should be consistent. The consistency loss is:

where m is a parameter, dist is the distance between the label prediction vectors, and the superscript + denotes taking the non-negative part.

3. Combining the two losses gives the final loss function of the JL-GCN:

$L_{label} = \alpha L_{prior} + \beta L_{tcc},$

where α and β are the loss weights.
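The weighted combination can be sketched as follows, together with a standard batch-averaged cross-entropy for the prior term. This is a minimal sketch under stated assumptions: the exact form of the prior loss is not reproduced in this text, the consistency loss is taken as an already computed number, and the function names `prior_loss` and `label_loss` are hypothetical. The default weights 2 and 1 follow the 2:1 ratio of the preferred embodiment.

```python
import math

def prior_loss(probs, labels):
    """Mean cross-entropy over a source-domain batch.

    probs[i]  : predicted class distribution for source image i
    labels[i] : index of its ground-truth identity
    Assumption: standard cross-entropy averaged over the batch size N_s.
    """
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

def label_loss(l_prior, l_tcc, alpha=2.0, beta=1.0):
    """L_label = alpha * L_prior + beta * L_tcc."""
    return alpha * l_prior + beta * l_tcc
```

For example, `label_loss(1.0, 0.5)` weights the prior term twice as heavily as the consistency term.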

The third step: unsupervised pedestrian re-identification, in which the pedestrian re-identification network is retrained with the predicted pseudo-labels.

The specific steps are as follows:

1. Information fusion is performed on the target data based on the features of the original data t and its augmented data:

where x is the extracted feature.

2. Pseudo-label prediction is performed on the target data based on the fused features:

where the prediction is produced by the JL-GCN model and serves as the pseudo-label of the target data t for training the recognition network in the next step.

3. A triplet loss is used to optimize the Re-ID network:

where $t^p$ and $t^n$ are the positive and negative samples paired with the target data t, and γ is a parameter.
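A triplet loss of this kind can be sketched as follows. The formula is not reproduced in this text, so the code uses the standard margin form, max(0, γ + d(anchor, positive) − d(anchor, negative)), with Euclidean distance between feature vectors; both are assumptions, and the function name `triplet_loss` is hypothetical. No value of γ is given here, so it is left as a required argument.

```python
import math

def triplet_loss(anchor, positive, negative, gamma):
    """Standard margin-based triplet loss on feature vectors.

    anchor   : feature of the target sample t
    positive : feature of a sample sharing its pseudo-label (t^p)
    negative : feature of a sample with a different pseudo-label (t^n)
    gamma    : margin parameter
    Assumption: Euclidean distance; the source formula is not available.
    """
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, gamma + d_pos - d_neg)
```

The loss is zero once the negative sample is farther from the anchor than the positive sample by at least γ.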

Fourth step: the Re-ID network R and the tag learning network JL-GCN are alternately optimized to improve the extraction of pedestrian feature representations.
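The alternating optimization can be outlined as a simple loop. This is a structural sketch only: the four callables are hypothetical placeholders standing in for the feature extractor, JL-GCN training, pseudo-label prediction and Re-ID retraining, and the number of rounds is an assumption that is not specified in the text.

```python
def alternate_optimization(extract_features, train_gcn,
                           predict_pseudo_labels, retrain_reid, rounds):
    """Alternate between label learning and feature learning.

    Each round:
      1. extract source and target features with the current Re-ID network
      2. train the label-learning graph network on those features
      3. predict pseudo-labels for the target data
      4. retrain the Re-ID network with the pseudo-labels
    All callables are hypothetical placeholders supplied by the caller.
    Returns the pseudo-labels produced in each round.
    """
    history = []
    for _ in range(rounds):
        source_feats, target_feats = extract_features()
        gcn = train_gcn(source_feats, target_feats)
        pseudo_labels = predict_pseudo_labels(gcn, target_feats)
        retrain_reid(pseudo_labels)
        history.append(pseudo_labels)
    return history
```

Keeping the per-round pseudo-labels makes it possible to check whether the assignments stabilize as the two networks improve each other.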

The unsupervised pedestrian re-identification method provided by the preferred embodiment is based on a graph neural network, performs pseudo-label prediction and automatic feature learning end to end, is easy to reproduce, and is widely applicable.

The technical scheme provided by the embodiment of the invention is further described in detail below in connection with a specific application example.

The image frames used in this specific application example come from surveillance videos in the Market-1501 and DukeMTMC-reID databases (video for traffic surveillance).

The video sequences are provided by Liang Zheng et al., "Scalable Person Re-identification: A Benchmark," in ICCV, 2015, pp. 1116-1124, and Zhedong Zheng, Liang Zheng, and Yi Yang, "Unlabeled Samples Generated by GAN Improve the Person Re-identification Baseline in Vitro," in ICCV, 2017, pp. 3774-3782, and are used to evaluate pedestrian re-identification performance.

The unsupervised pedestrian re-identification method related to the specific application example can comprise the following specific steps based on the graph neural network:

The first step: construct a graph convolutional neural network to realize pseudo-label prediction on the unlabeled target-domain pedestrian data.

The specific operations in this step are as follows:

Step 1: extract features of the input data to form the nodes of the graph convolutional network:

$x_s = f(s),\ s \in S,$

$x_t = f(t),\ t \in T,$

where S is the labeled source-domain data, T is the unlabeled target-domain data, x is an extracted feature, f is the feature extraction network, and the node set is denoted V. In this specific application example, the feature vector dimension may be 2048.

Step 2: construct a graph among the nodes from neighborhood information. For a given node p, its h-hop neighbors are used as the neighbor nodes N of the constructed graph:

where K is the number of neighbors and h is the hop level at which neighbors are selected. In this specific application example, h ∈ {1, 2}; K is 200 when h = 1 and 10 when h = 2. The set of neighbor nodes of node p is denoted as:

where V(x^p) is hereinafter abbreviated as V^p. The adjacency matrix A is expressed as:

Step 3: perform hard example mining to update the adjacency matrix A, assigning a higher attention weight to erroneously connected nodes. The final adjacency matrix is:

where y is the corresponding label, ω is the attention score of a hard positive sample node, and v is the attention score of a hard negative sample node. In this specific application example, ω and v are both 2.
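The graph construction and hard example mining of Steps 2 and 3 can be illustrated with a minimal sketch. The formulas themselves are not reproduced in this text, so the sketch rests on stated assumptions: the 2-hop neighbors of p are taken to be the nearest neighbors of each of p's 1-hop neighbors; a hard positive is read as a same-label pair that is not connected, and a hard negative as a different-label pair that is connected. The function names `knn`, `build_adjacency`, and `hard_mining` are hypothetical, and the tiny K values are chosen only so the example stays readable (the application example uses 200 and 10).

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn(features, p, k):
    """Indices of the k nodes closest to node p, excluding p itself."""
    others = [i for i in range(len(features)) if i != p]
    others.sort(key=lambda i: euclidean(features[p], features[i]))
    return others[:k]

def build_adjacency(features, k1, k2):
    """0/1 adjacency from 1-hop (k1) and 2-hop (k2) nearest neighbours.

    Assumption: the 2-hop neighbours of p are the k2 nearest neighbours
    of each of p's 1-hop neighbours.
    """
    n = len(features)
    A = [[0.0] * n for _ in range(n)]
    for p in range(n):
        hop1 = knn(features, p, k1)
        hop2 = [q for i in hop1 for q in knn(features, i, k2)]
        for q in set(hop1 + hop2) - {p}:
            A[p][q] = 1.0
    return A

def hard_mining(A, labels, omega=2.0, v=2.0):
    """Re-weight erroneous entries of A (assumed interpretation).

    Same label but not connected  -> hard positive, weight omega.
    Different label but connected -> hard negative, weight v.
    """
    n = len(A)
    out = [row[:] for row in A]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            same = labels[i] == labels[j]
            if same and A[i][j] == 0.0:
                out[i][j] = omega
            elif not same and A[i][j] == 1.0:
                out[i][j] = v
    return out

# Toy data: two tight pairs of 2-D features with deliberately mixed labels.
features = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
labels = [0, 1, 0, 0]
A = build_adjacency(features, k1=1, k2=1)
A_hard = hard_mining(A, labels)
```

Labels are only available for the source domain, so under this reading the re-weighting applies while the graph network is trained on labeled source data.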

Step 4: learn the feature embedding of the graph convolutional network using a weight matrix W:

where χ is the feature representation, the output term is the model's predicted output for the node, and σ is the nonlinear activation unit. In this specific application example, the graph convolutional network has 4 layers and σ is the ReLU activation function.
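To make the layer-wise propagation concrete, the following is a minimal sketch of a stacked graph convolution with ReLU activations. The propagation rule is not reproduced in this text, so the sketch assumes a common form, H' = σ(Â H W), where Â is the adjacency matrix with self-loops added and each row divided by its sum. The normalization, the helper names, and the toy weights are assumptions, not the formula of the embodiment.

```python
def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

def row_normalize(A):
    """Add self-loops, then divide each row by its sum (mean aggregation)."""
    n = len(A)
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    return [[value / sum(row) for value in row] for row in A_hat]

def relu(X):
    """Element-wise ReLU."""
    return [[max(0.0, value) for value in row] for row in X]

def gcn_forward(A, X, weights):
    """Apply one graph convolution per weight matrix: H <- ReLU(A_norm H W)."""
    A_norm = row_normalize(A)
    H = X
    for W in weights:
        H = relu(matmul(matmul(A_norm, H), W))
    return H

# Two connected nodes with one-hot features and four layers, as in the example.
A = [[0.0, 1.0], [1.0, 0.0]]
X = [[1.0, 0.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
H = gcn_forward(A, X, [identity] * 4)
```

With identity weights each layer simply averages a node with its neighbor, which shows how neighborhood context is mixed into every node embedding; trained weights would replace `identity` in practice.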

Second step: construct the loss functions that constrain the pseudo-label learning process of the JL-GCN.

The specific operations in this step are as follows:

Step 1: apply a prior-knowledge constraint from the labeled source domain dataset, which explicitly helps to distinguish pedestrian identities, namely:

where the prediction term is the network output obtained from the graph convolutional network. In this specific application example, the batch size N_s is 32.

Step 2: apply the triplet cross consistency loss to the unlabeled target domain data to penalize the difference in predicted labels between the original data and the augmented data in the target domain, thereby encouraging consistent predictions for a sample before and after transformation. Given target domain data t and its corresponding data transformation, the corresponding graph neural network prediction outputs should be consistent. The consistency loss is:

where m is a parameter, dist is the distance between label prediction vectors, and the superscript + denotes taking the non-negative part. In this specific application example, m is 0.3 and dist is the Euclidean distance.
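The exact consistency loss is not reproduced in this text. As a hedged illustration of the ingredients that are named (a parameter m, a Euclidean dist between prediction vectors, and the non-negative part), the sketch below uses a generic triplet-style hinge: the prediction for a sample should be closer to the prediction for its augmented version than to the prediction for a different sample, by at least m. The choice of a third, "other" prediction and the function name `hinge_consistency` are assumptions.

```python
import math

def dist(u, v):
    """Euclidean distance between two label prediction vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def hinge_consistency(y, y_aug, y_other, m=0.3):
    """[dist(y, y_aug) - dist(y, y_other) + m]^+ (assumed triplet form)."""
    return max(0.0, dist(y, y_aug) - dist(y, y_other) + m)

# Consistent predictions before and after augmentation incur no penalty.
consistent = hinge_consistency([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])

# A prediction that flips under augmentation is penalised.
inconsistent = hinge_consistency([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
```

The hinge means that pairs already separated by more than m contribute nothing, so the constraint concentrates on samples whose predictions change under transformation.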

Step 3: combine the two losses to obtain the final JL-GCN loss function:

L_label = αL_prior + βL_tcc,

where α and β are the loss function weights. In this specific application example, α is 2 and β is 1.
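As a small worked example of the weighted combination, the sketch below evaluates L_label for illustrative loss values. The numeric inputs 0.8 and 0.5 are made up for the example; only the weights 2 and 1 come from the text.

```python
def combined_loss(l_prior, l_tcc, alpha=2.0, beta=1.0):
    """L_label = alpha * L_prior + beta * L_tcc."""
    return alpha * l_prior + beta * l_tcc

# Illustrative values only: a prior loss of 0.8 and a consistency loss of 0.5.
l_label = combined_loss(0.8, 0.5)
```

With these weights the supervised prior term counts twice as much as the consistency term, matching the 2:1 ratio stated for this example.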

Third step: retrain the pedestrian re-identification network with the predicted pseudo labels.

The specific operations in this step are as follows:

Step 1: fuse information for the target data based on the features of the original data t and its augmented data:

where x is the extracted feature.

Step 2: predict pseudo labels for the target data based on the fused features.

where the prediction is made by the graph convolutional neural network and serves as the pseudo label of the target data t when the re-identification network is trained in the next step.

Step 3: optimize the Re-ID network using a triplet loss:

where t^p and t^n are the positive and negative samples paired with the target data t, and γ is a parameter. In this specific application example, γ is 0.3.
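One way to picture the triplet optimization with pseudo labels is the batch-hard variant sketched below. The triplet formula is not reproduced in this text, and the selection of t^p and t^n is not specified, so the sketch assumes that for each anchor the farthest sample with the same pseudo label is the positive and the closest sample with a different pseudo label is the negative. The function name `batch_hard_triplet` is hypothetical.

```python
import math

def dist(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def batch_hard_triplet(features, pseudo_labels, gamma=0.3):
    """Mean of [d(anchor, hardest pos) - d(anchor, hardest neg) + gamma]^+."""
    losses = []
    for i, (f, y) in enumerate(zip(features, pseudo_labels)):
        pos = [dist(f, g)
               for j, (g, z) in enumerate(zip(features, pseudo_labels))
               if j != i and z == y]
        neg = [dist(f, g)
               for g, z in zip(features, pseudo_labels) if z != y]
        if pos and neg:  # skip anchors without a valid triplet
            losses.append(max(0.0, max(pos) - min(neg) + gamma))
    return sum(losses) / len(losses) if losses else 0.0

features = [[0.0, 0.0], [0.0, 1.0], [3.0, 0.0], [3.0, 1.0]]

# Pseudo labels that agree with the feature clusters give zero loss.
good = batch_hard_triplet(features, [0, 0, 1, 1])

# Pseudo labels that cut across the clusters give a large loss.
bad = batch_hard_triplet(features, [0, 1, 0, 1])
```

The contrast between the two calls illustrates why pseudo-label quality matters: the same features yield very different training signals depending on the labels the graph network predicts.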

Fourth step: the Re-ID network R and the tag learning network JL-GCN are alternately optimized to improve extraction of pedestrian feature representations, with optimization iterations being performed on the entire network.

Table 1 gives a numerical comparison of the final recognition accuracy obtained with the technical scheme provided in the embodiment of the present invention. The results of the other methods used for comparison are listed from top to bottom alongside the results of the present invention. Compared with previous methods, the method provided by the embodiment of the invention extracts more discriminative features, performs well on different data sets, and markedly improves recognition accuracy.

TABLE 1

Table 2 shows intermediate results of the method of the above embodiment, which illustrate the effectiveness of the technical solution. From top to bottom, the seven rows give the recognition performance of the features obtained with: a network trained on the labeled source domain data set and applied directly to the target data set; the baseline network alone, with pseudo labels predicted by a traditional non-parametric, distance-sensitive clustering algorithm; pseudo-label prediction by a graph neural network built with a basic graph construction method; the proposed method constrained only by the triplet consistency loss; the proposed method constrained only by the prior loss; the proposed method without hard example mining; and the final, complete method.

TABLE 2

Table 2 shows that the method provided by the above embodiment of the present invention does improve performance and predicts pseudo labels automatically and more reliably. The contributing modules are the graph construction method, the complementary loss constraints, and hard example mining.

Fig. 3 is a schematic diagram of a component module of an unsupervised pedestrian re-recognition system according to an embodiment of the present invention.

As shown in fig. 3, the unsupervised pedestrian re-recognition system provided in this embodiment may include: the system comprises a deep pedestrian characteristic learning network module, a graph neural network module and an optimizing module; wherein:

the graph neural network module is configured to: take the features x_s and x_t extracted by the deep pedestrian feature learning network module as graph nodes and construct a graph convolutional neural network; acquire neighborhood information among the graph nodes and construct a graph among them to obtain an initial adjacency matrix A; update the initial adjacency matrix by hard example mining to obtain an updated adjacency matrix; and train the graph convolutional neural network with the updated adjacency matrix, so that it attends to erroneously connected graph nodes and learns its feature embedding, and predict pseudo labels for the target domain pedestrian data;

the optimization module constrains the learning and updating of the graph convolutional neural network using the prior loss on the labeled source domain pedestrian data set and the consistency loss on the unlabeled target domain pedestrian data, and trains the deep pedestrian feature learning network using the pseudo labels that the updated graph convolutional neural network predicts for the target domain pedestrian data.

An embodiment of the present invention provides a terminal, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the processor is configured to execute the method according to any one of the foregoing embodiments of the present invention or to run the system according to any one of the foregoing embodiments of the present invention when the processor executes the program.

Optionally, the memory is used for storing a program. The memory may include volatile memory, for example random-access memory (RAM) such as static random-access memory (SRAM) or double data rate synchronous dynamic random-access memory (DDR SDRAM); the memory may also include non-volatile memory, for example flash memory. The memory is used to store computer programs (e.g., application programs or functional modules that implement the methods described above), computer instructions, and the like, which may be stored in one or more memories in a partitioned manner. The above-described computer programs, computer instructions, data, and the like may be invoked by a processor.

A processor for executing the computer program stored in the memory to implement the steps in the method according to the above embodiment. Reference may be made in particular to the description of the embodiments of the method described above.

The processor and the memory may be separate structures or may be integrated structures that are integrated together. When the processor and the memory are separate structures, the memory and the processor may be connected by a bus coupling.

According to a fourth aspect of the present invention there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor is operative to perform a method according to any of the above embodiments of the present invention or to run a system according to any of the above embodiments of the present invention.

The unsupervised pedestrian re-identification method, system, terminal and medium provided by the embodiments of the invention use a framework that combines label learning and feature learning to solve the unsupervised domain-adaptive re-identification task. The graph convolutional network JL-GCN learns graph embeddings in combination with context information. Complementary constraints allow label prediction to benefit from the label information provided by the labeled source data set while adapting to the information in the unlabeled target data set. In addition, a hard example mining strategy strengthens the JL-GCN model so that it better learns the relations among nodes, which helps label prediction to perform cross-domain learning effectively. The pseudo-label prediction and feature learning modules are updated alternately, so that each is more effective and efficient in the subsequent pedestrian re-identification step. The difference between the technical solution provided by the above embodiment of the present invention and the prior art is shown in Fig. 4.

It should be noted that, the steps in the method provided by the present invention may be implemented by using corresponding modules, devices, units, etc. in the system, and those skilled in the art may refer to a technical solution of the method to implement the composition of the system, that is, the embodiment in the method may be understood as a preferred example of constructing the system, which is not described herein.

Those skilled in the art will appreciate that the invention provides a system and its individual devices that can be implemented entirely by logic programming of method steps, in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, etc., in addition to the system and its individual devices being implemented in pure computer readable program code. Therefore, the system and various devices thereof provided by the present invention may be considered as a hardware component, and the devices included therein for implementing various functions may also be considered as structures within the hardware component; means for achieving the various functions may also be considered as being either a software module that implements the method or a structure within a hardware component.

While the present invention has been described in detail through the foregoing description of the preferred embodiment, it should be understood that the foregoing description is not to be considered as limiting the invention. Many modifications and substitutions of the present invention will become apparent to those of ordinary skill in the art upon reading the foregoing. Accordingly, the scope of the invention should be limited only by the attached claims.

Claims

1. An unsupervised pedestrian re-recognition method, comprising:

constructing a deep pedestrian feature learning network;

acquiring neighborhood information among the graph nodes, and constructing a graph among the graph nodes to obtain an initial adjacency matrix;

2. The unsupervised pedestrian re-recognition method according to claim 1, wherein the constructing a deep pedestrian feature learning network comprises:

3. The unsupervised pedestrian re-recognition method according to claim 2, wherein the feature vector dimension extracted by the deep pedestrian feature learning network is 2048.

4. The unsupervised pedestrian re-recognition method according to claim 1, wherein the neighborhood information among the graph nodes is obtained by an h-hop nearest neighbor search method, wherein h is {1,2}; when h=1, the nearest neighbor search number is 200; when h=2, the nearest neighbor search number is 10; and each value of the initial adjacency matrix is 1 or 0, wherein 1 represents that two nodes are connected and 0 represents that the nodes are not connected.

5. The method of claim 1, wherein updating the initial adjacency matrix by hard example mining to obtain an updated adjacency matrix comprises:

6. The unsupervised pedestrian re-recognition method of claim 5, wherein, after hard example mining, the value of the updated adjacency matrix corresponding to an erroneous connection that receives attention is 2.

7. The unsupervised pedestrian re-recognition method of claim 1, wherein the weight ratio of the a priori loss to the consistency loss is 2:1.

8. An unsupervised pedestrian re-recognition system comprising:

a deep pedestrian feature learning network module, configured to construct a deep pedestrian feature learning network and to extract the corresponding features x_s and x_t from an image s in the labeled source domain pedestrian data and an image t in the unlabeled target domain pedestrian data, respectively;

a graph neural network module, configured to: take the features x_s and x_t extracted by the deep pedestrian feature learning network module as graph nodes and construct a graph convolutional neural network; acquire neighborhood information among the graph nodes and construct a graph among the graph nodes to obtain an initial adjacency matrix; update the initial adjacency matrix by hard example mining to obtain an updated adjacency matrix; and train the graph convolutional neural network with the updated adjacency matrix, so that it attends to erroneously connected graph nodes and learns its feature embedding, and predict pseudo labels for the target domain pedestrian data;

9. A terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor is operable to perform the method of any one of claims 1-7 or to run the system of claim 8 when the program is executed by the processor.

10. A computer readable storage medium having stored thereon a computer program, which when executed by a processor is operative to perform the method of any one of claims 1-7 or to run the system of claim 8.