CN115830548B - Unsupervised multi-domain fusion adaptive pedestrian re-identification method
Abstract
The invention provides an unsupervised multi-domain fusion adaptive pedestrian re-identification method. A Transformer is pre-trained on labeled surveillance pedestrian pictures captured by a plurality of monitoring devices at different places and in different time periods, and the trained network is then trained a second time on unlabeled surveillance pedestrian pictures. The method thereby finds the same pedestrian on different devices at different times and is not limited to particular devices or places.
Description
Technical Field
The invention belongs to the technical field of pedestrian re-identification, and particularly relates to an unsupervised multi-domain fusion adaptive pedestrian re-identification method.
Background
The purpose of pedestrian re-identification is to associate specific objects across different scenes and camera views; a key component is extracting robust and discriminative features. The field has long been dominated by CNN-based methods, but these focus mainly on small discriminative regions, and their downsampling operations (pooling and strided convolution) reduce the spatial resolution of the output feature map, which greatly weakens the ability to distinguish objects with similar appearances. Most attention-mechanism-based methods are embedded in deep layers, prefer larger contiguous regions, and have difficulty extracting multiple diverse discriminable regions. In addition, most existing methods are tied to particular devices and places, so their application range is limited and their matching accuracy is low.
Disclosure of Invention
In view of the above, the invention provides an unsupervised multi-domain fusion adaptive pedestrian re-identification method that finds the same pedestrian at different times across different devices, is not limited to particular devices or places, and offers a wider application range and higher matching accuracy.
To achieve the above purpose, the technical solution of the invention is realized as follows:
An unsupervised multi-domain fusion adaptive pedestrian re-identification method comprises the following steps:
Step 1: taking surveillance pedestrian pictures from a plurality of monitoring devices at different places and in different time periods as input to a Transformer network and a DBSCAN clustering step, and dividing the pictures into a source domain and a target domain according to the amount of label information;
Step 2: generating pseudo labels for the unlabeled target-domain pedestrian data with DBSCAN, so that the target-domain label data are kept consistent with the source-domain label data;
Step 3: slicing and position-encoding the pedestrian images input from the source domain, building a Transformer network that learns different degrees of attention to different pedestrian feature points, and retaining the trained model parameters;
Step 4: slicing and position-encoding the pedestrian images of the target domain and feeding them, together with the source-domain images, into the Transformer network, so that the attention patterns the Transformer network learned on the source domain transfer to the target domain;
Step 5: constructing a complex graph convolutional neural network, taking the complex domain centers of the several domain data sets extracted by the Transformer network as input to this multi-domain-information-fusion network, aligning the sample distributions of different domains, and reducing the differences between data sets;
Step 6: constructing a spatial context information fusion module with a two-layer MLP structure, in which the real-valued multi-domain center features extracted by the Transformer network serve as input, the input features are transposed and passed through the two-layer MLP for information interaction, and the MLP output is transposed again and added to the input features to form the output;
Step 7: generating image features of the source domain and the target domain with the trained Transformer network and matching them, so that the same pedestrian appearing on the target device can be found in the surveillance images of the source device.
Further, the step 3 specifically includes:
partitioning each pedestrian picture in the source-domain data into patches: the $n$-th picture of the $i$-th domain, $x_n^i$, is divided into $m$ patches, and each patch is encoded to obtain a word embedding representation, denoted $E_n^i=[e_1,\dots,e_m]$; the position information corresponding to each patch is obtained by the encoding calculation

$$p_{(pos,2j)}=\sin\!\left(pos/10000^{2j/d}\right),\qquad p_{(pos,2j+1)}=\cos\!\left(pos/10000^{2j/d}\right),$$

where $d$ denotes the dimension of a patch embedding after the input image is divided and $pos$ is the position index of the patch's word embedding, so that the input of the Transformer network can be expressed as

$$Z_n^i=E_n^i+P=[e_1+p_1,\dots,e_m+p_m];$$

the Transformer network contains two modules, one an MLP and the other a multi-head attention mechanism; $Z_n^i$ is input into the multi-head attention mechanism so that the network attends to the picture to different degrees, computed as

$$\mathrm{MHA}(Z_n^i)=\mathrm{softmax}\!\left(\frac{(Z_n^i W_Q)(Z_n^i W_K)^{\top}}{\sqrt{d}}\right)(Z_n^i W_V),$$

where $W=\{W_Q,W_K,W_V\}$ are the projection matrices of the pre-training model and $\mathrm{MHA}(Z_n^i)$ is the output feature of the mechanism for sample image $x_n^i$ and also the input of the MLP, so the output of the Transformer is

$$\hat{y}_n^i=\mathrm{MLP}\big(\mathrm{MHA}(Z_n^i)\big),$$

where $\hat{y}_n^i$ is the Transformer network's predicted label for picture $x_n^i$; the cross-entropy loss between the source-domain predicted labels and the source-domain true labels is computed and optimized, finally yielding a well-performing Transformer network pre-training model.
Further, the step 4 specifically includes:
constructing two models with exactly the same structure, one a training model and the other an ema model; slicing the image samples of the target domain: the $n$-th picture of the target domain, $x_n^t$, is divided into $m$ patches, and each patch is encoded to obtain a word embedding representation, denoted $E_n^t=[e_1,\dots,e_m]$, with the corresponding position code computed as

$$p_{(pos,2j)}=\sin\!\left(pos/10000^{2j/d}\right),\qquad p_{(pos,2j+1)}=\cos\!\left(pos/10000^{2j/d}\right),$$

where $d$ denotes the dimension of a patch embedding after the input image is divided and $pos$ is the position index of the patch's word embedding, so that the target-domain input of the Transformer network can be expressed as

$$Z_n^t=E_n^t+P;$$

inputting the encoded samples $Z$ of the source domain $S$ and the target domain $T$ into the multi-head attention mechanism, where $S$ and $T$ respectively denote the source-domain and target-domain data sample sets, so that the Transformer network also forms different degrees of attention to the target-domain pictures, realizing the migration of the attention patterns, computed as

$$f^d=\mathrm{MLP}\big(\mathrm{MHA}(Z^d;\theta)\big),\qquad d\in\{s,t\},$$

where $f^d=\{f_1^d,\dots,f_n^d\}$ is the set of sample features extracted from the $n$ image samples of the $d$-th domain, $s$ denoting a source domain and $t$ the target domain, and $\theta$ being the projection matrix of the current training; the optimal (ema) parameter matrix is updated as

$$\theta_{ema}^{(k)}=\alpha\,\theta_{ema}^{(k-1)}+(1-\alpha)\,\theta^{(k)},$$

where $\theta_{ema}$ is the parameter matrix of the optimal model and $\alpha$ is the ratio between the model obtained from historical training iterations and the current iteration; when the iteration count is 0, $\theta_{ema}^{(0)}$ is the parameter matrix obtained by pre-training; through continuous training and optimization, a training model that achieves the best effect on the same devices is finally obtained.
Further, the step 5 specifically includes:
using the sample features $f_n^d$ obtained in step 4, where $d$ indexes the $d$-th domain, generating the real-valued domain center of each domain as

$$c_d=\frac{1}{n}\sum_{k=1}^{n} f_k^d,$$

where $c_d$ denotes the real-valued domain center of the $d$-th domain, and mapping it to a complex feature space, each domain center being denoted

$$\tilde{c}_d=\mathrm{Re}(\tilde{c}_d)+i\,\mathrm{Im}(\tilde{c}_d)=W_1 c_d+i\,W_2 c_d,$$

where $\mathrm{Re}(\cdot)$ is the operation taking the real part of a feature in the complex feature space, $\mathrm{Im}(\cdot)$ is the operation taking the imaginary part, $\tilde{c}_d$ is the complex domain center of the $d$-th domain, and $W_1$ and $W_2$ are two trainable weights; two complex graph convolutional neural networks are constructed, one for the training model and one for the ema model, the complex graph convolutional neural network of the training model being denoted $G$ and that of the ema model $G_{ema}$;

for the complex graph convolutional neural network of the training model, each node $v_d$ is the complex domain center of the $d$-th domain, and its adjacency matrix $A$ encodes the vector relationships between the complex domain centers, which can be expressed as

$$A_{ab}=\frac{\tilde{c}_a\,\overline{\tilde{c}_b}^{\top}}{\lVert\tilde{c}_a\rVert\,\lVert\tilde{c}_b\rVert},$$

where $a$ and $b$ index the $a$-th and $b$-th domains, so that the complex graph convolutional neural network of the training model is updated as

$$\hat{C}=\sigma_c\!\left(\tilde{D}^{-1/2}(A+I)\,\tilde{D}^{-1/2}\,\tilde{C}\,W_c\right),$$

where the $d$-th row of $\tilde{C}$ is the complex domain center $\tilde{c}_d$ of the $d$-th domain, the $d$-th row $\hat{c}_d$ of $\hat{C}$ is the globally updated complex domain-center feature of the $d$-th domain, $A$ is the adjacency matrix of the complex graph convolutional neural network, $I$ is the complex identity matrix, $\tilde{D}$ is the complex degree matrix, $\sigma_c$ is a complex nonlinear activation function, and $W_c$ is a learnable complex parameter;

for the complex graph convolutional neural network of the ema model, the output $\hat{C}_{ema}$ is computed with the same method as for the training model; after a modulus operation on the output complex centers, the Euclidean distance is used to measure the distance between the complex domain centers updated by the two complex graph convolutional neural networks, and an MSE loss function is used to further reduce the sample distribution differences between different domains, thereby optimizing the Transformer network model.
Further, the step 6 specifically includes: constructing two two-layer MLP structures, one for the training model and one for the ema model;
for the training model, the real-valued domain center $c_d$ of the $d$-th domain obtained in step 5 is first transposed to obtain $c_d^{\top}$ and input into the two-layer MLP, whose output is

$$h_d=U_2\,\sigma(U_1\,c_d^{\top}),$$

where $U_1$ and $U_2$ are two trainable weights and $\sigma$ is a nonlinear activation function; the output $h_d$ of the two-layer MLP is then transposed and added to the original input $c_d$ as the output of the spatial context information fusion module:

$$s_d=h_d^{\top}+c_d;$$

the output $s_d^{ema}$ of the spatial context information fusion module on the ema model is obtained in the same way as for the training model; finally, a distance measurement is applied to the outputs of the spatial fusion modules of the training model and the ema model, further reducing the differences between the multiple domains and optimizing the Transformer network model.
Compared with the prior art, the unsupervised multi-domain fusion adaptive pedestrian re-identification method of the invention has the following advantages:
the invention provides a multi-source-domain Transformer and multi-domain information fusion technique that trains on data sets generated on different devices, so that the obtained sample features generalize better, solving the problem that the same pedestrian cannot be matched accurately because of inconsistent angles, shooting times, and spatial positions among devices;
the invention constructs a complex graph convolutional neural network that uses a complex feature space to explore vector semantic associations, further reducing inter-domain sample distribution differences at the vector-structure level;
the invention constructs a spatial context information fusion module that directly explores the spatial associations of the domain center features, further reducing inter-domain sample distribution differences at the spatial-context level.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention. In the drawings:
FIG. 1 is a flow chart of a first stage of training of the present invention;
FIG. 2 is a flow chart of the second stage of training of the present invention.
Detailed Description
It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.
In the description of the present invention, it should be understood that the terms "center", "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", etc. indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, are merely for convenience in describing the present invention and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first", "a second", etc. may explicitly or implicitly include one or more such feature. In the description of the present invention, unless otherwise indicated, the meaning of "a plurality" is two or more.
In the description of the present invention, it should be noted that, unless explicitly specified and limited otherwise, the terms "mounted" and "connected" are to be construed broadly; for example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; and it may be direct, indirect through an intermediate medium, or internal communication between two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art on a case-by-case basis.
The invention will be described in detail below with reference to the drawings in connection with embodiments.
The invention provides a Transformer-based unsupervised multi-domain fusion adaptive pedestrian re-identification method. Training the model requires two stages. As shown in FIG. 1, the first training stage pre-trains the model on abundantly labeled pedestrian data shot by several different monitoring devices in different time periods, and the model parameters obtained by this training are retained. As shown in FIG. 2, in the second training stage the first-stage model parameters are imported into the network, and the model is fine-tuned with the same labeled data as in the first stage plus unlabeled pedestrian data shot by an additional target monitoring device, so as to find the same pedestrian at different time periods and different places on the target device.
The unsupervised multi-domain fusion adaptive pedestrian re-identification method of the invention comprises the following steps:
Step 1: taking surveillance pedestrian pictures from a plurality of monitoring devices at different places and in different time periods as input to a Transformer network and a DBSCAN clustering step, and dividing the pictures into a source domain and a target domain according to the amount of label information.
Specifically, take 4 groups of monitoring-device images as an example, of which 3 groups are fully labeled and 1 group is unlabeled. The $i$-th group of $n$ fully labeled monitoring-device images is treated as source-domain data, defined as $S^i=\{x_1^i,\dots,x_n^i\}$, where each group contains $n$ samples $x_n^i$ with corresponding data labels $Y^i=\{y_1^i,\dots,y_n^i\}$. The $n$ unlabeled monitoring-device images are treated as target-domain data, defined as $T=\{x_1^t,\dots,x_n^t\}$, containing $n$ samples.
Step 2: generating pseudo labels for the unlabeled target-domain pedestrian data with DBSCAN, so that the target-domain label data are kept consistent with the source-domain label data. The invention uses DBSCAN to generate pseudo labels for the unlabeled target-domain pedestrian data.
Specifically, since the target-domain data contain no labels, the target-domain labels must be generated by clustering. DBSCAN is a representative density-based clustering algorithm that defines a cluster as a maximal set of density-connected points. The target-domain data labels are acquired with DBSCAN as follows:
(1) Take an arbitrary data point $x$ in the target domain; the parameter pair $(\epsilon, MinPts)$ describes the tightness of the sample distribution in its neighborhood, where $\epsilon$ is the neighborhood distance threshold of a core point and $MinPts$ is the minimum number of points required within radius $\epsilon$ for a core point;
(2) Judge, according to $(\epsilon, MinPts)$, whether the selected data point is a core point, an edge point, or a noise point;
(3) If, for the parameters $(\epsilon, MinPts)$, the selected data point $x$ is a core point, find all data points density-reachable from $x$; they form a cluster;
(4) If the selected data point is an edge point, select another data point as the core point;
(5) Repeat steps (2) and (3) until all points have been processed.
Finally, each obtained cluster represents one category; the pseudo labels of the target domain are therefore denoted $\hat{Y}^t=\{\hat{y}_1^t,\dots,\hat{y}_n^t\}$, where the target domain contains $n$ samples.
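Purely as an illustration (this code is not part of the patent text), the pseudo-label step can be sketched with scikit-learn's DBSCAN, assuming target-domain features have already been extracted by the pre-trained network; the `eps` and `min_samples` values below are assumed, not taken from the patent:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def generate_pseudo_labels(target_features: np.ndarray,
                           eps: float = 0.6,
                           min_samples: int = 4) -> np.ndarray:
    """Cluster target-domain features; each cluster id becomes a pseudo label.

    eps is the neighborhood distance threshold and min_samples the minimum
    number of neighbors a core point needs; both values are illustrative.
    """
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(target_features)
    return labels  # -1 marks noise points that received no cluster

# Usage: feats holds one feature vector per unlabeled target-domain image.
feats = np.random.randn(100, 768).astype(np.float32)
pseudo = generate_pseudo_labels(feats)
print(f"{(pseudo >= 0).sum()} clustered samples, {(pseudo == -1).sum()} noise samples")
```

Samples labeled -1 are DBSCAN noise; a common choice is to drop them from the current fine-tuning round and re-cluster in a later round.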
Step 3: in the first training stage, the pedestrian images input from the source domain are sliced and position-encoded, a Transformer network is built to learn different degrees of attention to different pedestrian feature points, and the trained model parameters are retained.
Specifically, we partition each pedestrian picture in the source-domain data into patches: the $n$-th picture of the $i$-th domain, $x_n^i$, is divided into $m$ patches, and each patch is encoded to obtain a word embedding representation, denoted $E_n^i=[e_1,\dots,e_m]$. By encoding, the position information corresponding to each patch is calculated as

$$p_{(pos,2j)}=\sin\!\left(pos/10000^{2j/d}\right),\qquad p_{(pos,2j+1)}=\cos\!\left(pos/10000^{2j/d}\right),$$

where $d$ denotes the dimension of a patch embedding after the input image is divided and $pos$ is the position index of the patch's word embedding. Thus, the input of the Transformer can be expressed as

$$Z_n^i=E_n^i+P=[e_1+p_1,\dots,e_m+p_m],$$

where $P=[p_1,\dots,p_m]$ is the $m$-patch position information of the $n$-th picture of the $i$-th domain. The Transformer network contains two modules, one an MLP and the other a multi-head attention mechanism. We input $Z_n^i$ into the multi-head attention mechanism so that the network attends to different parts of the picture to different degrees, computed as

$$\mathrm{MHA}(Z_n^i)=\mathrm{softmax}\!\left(\frac{(Z_n^i W_Q)(Z_n^i W_K)^{\top}}{\sqrt{d}}\right)(Z_n^i W_V),$$

where $W=\{W_Q,W_K,W_V\}$ are the projection matrices of the pre-trained model and $\mathrm{MHA}(Z_n^i)$ is the output feature of the mechanism for sample image $x_n^i$, which is also the input of the MLP; the output of the Transformer is therefore

$$\hat{y}_n^i=\mathrm{MLP}\big(\mathrm{MHA}(Z_n^i)\big),$$

where $\hat{y}_n^i$ is the predicted label of picture $x_n^i$ output by the Transformer network. The cross-entropy loss between the predicted labels and the true labels of the source domain is computed and optimized, finally yielding a well-performing Transformer network pre-training model.
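For concreteness, a minimal PyTorch sketch of this pre-training step is given below. It is a simplified illustration under assumptions — a single attention block, one small MLP, mean pooling, and an invented identity count — not the patent's actual network:

```python
import torch
import torch.nn as nn

class TinyViT(nn.Module):
    """Patch embedding + sinusoidal position encoding + one attention block."""
    def __init__(self, patch_dim: int, embed_dim: int, num_patches: int,
                 num_classes: int, num_heads: int = 8):
        super().__init__()
        self.embed = nn.Linear(patch_dim, embed_dim)   # word embedding per patch
        self.register_buffer("pos", self._sinusoidal(num_patches, embed_dim))
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(embed_dim, embed_dim), nn.GELU(),
                                 nn.Linear(embed_dim, embed_dim))
        self.head = nn.Linear(embed_dim, num_classes)

    @staticmethod
    def _sinusoidal(m: int, d: int) -> torch.Tensor:
        # p(pos, 2j) = sin(pos / 10000^(2j/d)), p(pos, 2j+1) = cos(...)
        pos = torch.arange(m, dtype=torch.float32).unsqueeze(1)
        div = torch.pow(10000.0, torch.arange(0, d, 2, dtype=torch.float32) / d)
        pe = torch.zeros(m, d)
        pe[:, 0::2] = torch.sin(pos / div)
        pe[:, 1::2] = torch.cos(pos / div)
        return pe

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        z = self.embed(patches) + self.pos             # Z = E + P
        a, _ = self.attn(z, z, z)                      # multi-head attention
        f = self.mlp(a).mean(dim=1)                    # pooled sample feature
        return self.head(f)                            # identity logits

# Pre-training step on labeled source-domain patches (all shapes illustrative;
# the class count 751 is an arbitrary placeholder for the identity count).
model = TinyViT(patch_dim=16 * 16 * 3, embed_dim=256, num_patches=196, num_classes=751)
x = torch.randn(8, 196, 16 * 16 * 3)                   # 8 images, 196 patches each
y = torch.randint(0, 751, (8,))
loss = nn.CrossEntropyLoss()(model(x), y)              # source-domain CE loss
loss.backward()
```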
Step 4: in the second training stage, the pedestrian images of the target domain are sliced and position-encoded and fed, together with the source-domain images, into the Transformer network, so that the attention patterns the Transformer network learned on the source domain migrate to the target domain.
In the second training stage we construct two models of exactly identical structure, one the training model and the other the ema model (i.e. the optimal model), both using the same initialization. The training model provides the model parameters of the current training batch; the ema model maintains a weighted mean of the parameters from previous batches and the current batch, thereby representing the optimal model.
Specifically, the image samples of the target domain are sliced: the $n$-th picture of the target domain, $x_n^t$, is divided into $m$ patches, each encoded into a word embedding $E_n^t=[e_1,\dots,e_m]$, and the corresponding position code is calculated as

$$p_{(pos,2j)}=\sin\!\left(pos/10000^{2j/d}\right),\qquad p_{(pos,2j+1)}=\cos\!\left(pos/10000^{2j/d}\right),$$

where $d$ denotes the dimension of a patch embedding after the input image is divided and $pos$ is the position index of the patch's word embedding. Thus, the target-domain input of the Transformer can be expressed as

$$Z_n^t=E_n^t+P.$$

We input the encoded samples $Z$ of the source domain $S$ and the target domain $T$ into the multi-head attention mechanism, where $S$ and $T$ respectively denote the source-domain and target-domain data sample sets, so that the network also forms different degrees of attention to the target-domain pictures, realizing migration of the attention patterns, computed as

$$f^d=\mathrm{MLP}\big(\mathrm{MHA}(Z^d;\theta)\big),\qquad d\in\{s,t\},$$

where $f^d=\{f_1^d,\dots,f_n^d\}$ is the set of sample features extracted from the $n$ image samples of the $d$-th domain, $s$ denoting a source domain and $t$ the target domain, and $\theta$ being the projection matrix of the current training; the optimal (ema) parameter matrix is updated as

$$\theta_{ema}^{(k)}=\alpha\,\theta_{ema}^{(k-1)}+(1-\alpha)\,\theta^{(k)},$$

where $\theta_{ema}$ is the parameter matrix of the optimal model and $\alpha$ is the ratio between the model obtained from historical training iterations and the current iteration. When the iteration count is 0, $\theta_{ema}^{(0)}$ is the parameter matrix obtained by pre-training. Through continuous training and optimization, a training model that achieves the best results on all four different devices is finally obtained.
Step 5: a complex graph convolutional neural network module is constructed; the complex domain centers of the several domain data sets extracted by the Transformer network are taken as input to this multi-domain-information-fusion module, and the vector-space structural associations between domains are explored, further aligning the distributions of samples from different domains and reducing the differences between data sets.
Specifically, using the sample features $f_n^d$ obtained in step 4, where $d$ indexes the $d$-th domain, the real-valued domain center of each domain is generated as

$$c_d=\frac{1}{n}\sum_{k=1}^{n} f_k^d,$$

where $c_d$ denotes the real-valued domain center of the $d$-th domain. To further consider the structured relationships between domain centers and mine their semantic associations, we map them to a complex feature space to explore the vector semantic associations between domain centers, i.e. each domain center is written as

$$\tilde{c}_d=\mathrm{Re}(\tilde{c}_d)+i\,\mathrm{Im}(\tilde{c}_d)=W_1 c_d+i\,W_2 c_d,$$

where $\mathrm{Re}(\cdot)$ takes the real part of a feature in the complex feature space, $\mathrm{Im}(\cdot)$ takes the imaginary part, $\tilde{c}_d$ is the complex domain center of the $d$-th domain, and $W_1$ and $W_2$ are two trainable weights. Two complex graph convolutional neural networks are constructed, one for the training model and one for the ema model, denoted $G$ and $G_{ema}$ respectively.

For the complex graph convolutional neural network of the training model, each node $v_d$ is the complex domain center of the $d$-th domain, and its adjacency matrix $A$ encodes the vector relationships between the complex domain centers, which can be expressed as

$$A_{ab}=\frac{\tilde{c}_a\,\overline{\tilde{c}_b}^{\top}}{\lVert\tilde{c}_a\rVert\,\lVert\tilde{c}_b\rVert},$$

where $a$ and $b$ index the $a$-th and $b$-th domains. The complex graph convolutional neural network is therefore updated as

$$\hat{C}=\sigma_c\!\left(\tilde{D}^{-1/2}(A+I)\,\tilde{D}^{-1/2}\,\tilde{C}\,W_c\right),$$

where the $d$-th row of $\tilde{C}$ is the complex domain center $\tilde{c}_d$ of the $d$-th domain, the $d$-th row $\hat{c}_d$ of $\hat{C}$ is the globally updated complex domain-center feature of the $d$-th domain, $A$ is the adjacency matrix of the complex graph convolutional neural network, $I$ is the complex identity matrix, $\tilde{D}$ is the complex degree matrix, $\sigma_c$ is a complex nonlinear activation function, and $W_c$ is a learnable complex parameter.

For the complex graph convolutional neural network of the ema model, the calculation method is consistent with that of the training model, yielding the output $\hat{C}_{ema}$.

After a modulus operation on the output complex centers, the Euclidean distance is used to measure the distance between the complex domain centers updated by the two complex graph convolutional neural networks, and an MSE loss function is used to further reduce the sample distribution differences between different domains, thereby optimizing the Transformer network model.
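The complex graph convolution can be sketched with PyTorch's complex tensors as follows. The cosine-style adjacency, the modulus-based degree matrix, and the split-ReLU complex activation are assumptions filling in details the patent text leaves unspecified:

```python
import torch
import torch.nn.functional as F

def complex_relu(z: torch.Tensor) -> torch.Tensor:
    # One common complex activation: ReLU applied separately to the real
    # and imaginary parts (an assumption; the patent does not fix sigma_c).
    return torch.complex(F.relu(z.real), F.relu(z.imag))

class ComplexGCNLayer(torch.nn.Module):
    """One propagation step over the complex domain-center graph."""
    def __init__(self, dim: int):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(dim, dim, dtype=torch.cfloat) * 0.1)

    def forward(self, centers: torch.Tensor) -> torch.Tensor:
        # centers: (D, dim) complex matrix, one row per domain center.
        norm = centers / centers.abs().pow(2).sum(dim=1, keepdim=True).sqrt()
        adj = norm @ norm.conj().t()                        # cosine-style similarity
        adj = adj + torch.eye(adj.size(0), dtype=adj.dtype) # add complex identity
        deg = adj.abs().sum(dim=1)                          # modulus-based degree (assumed)
        d_inv_sqrt = torch.diag(deg.rsqrt()).to(adj.dtype)
        prop = d_inv_sqrt @ adj @ d_inv_sqrt                # normalized propagation
        return complex_relu(prop @ centers @ self.weight)

# Map real domain centers into complex space, propagate, align train vs. ema.
dim, num_domains = 256, 4
w1, w2 = torch.randn(dim, dim) * 0.1, torch.randn(dim, dim) * 0.1
c_real = torch.randn(num_domains, dim)                      # real domain centers c_d
c_complex = torch.complex(c_real @ w1.t(), c_real @ w2.t()) # W1*c + i*W2*c
gcn_train, gcn_ema = ComplexGCNLayer(dim), ComplexGCNLayer(dim)
# In the full method the ema branch would consume centers from ema-model features;
# the same centers are reused here only to keep the sketch short.
out_train, out_ema = gcn_train(c_complex), gcn_ema(c_complex)
# Modulus, then Euclidean/MSE alignment between the two updated center sets.
loss = F.mse_loss(out_train.abs(), out_ema.abs())
```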
Step 6: a spatial context information fusion module is constructed with a two-layer MLP structure. The real-valued multi-domain center features extracted by the Transformer network serve as its input; the input features are transposed and passed through the two-layer MLP for information interaction, and the MLP output is transposed again and added to the input features. This explores the associations of each dimension of the center features across domains, i.e. a direct spatial association between domain centers, thereby spatially aligning the multi-domain sample distributions.
Specifically, we construct two two-layer MLP structures, one for the training model and one for the ema model. For the training model, the real-valued domain center $c_d$ of the $d$-th domain obtained in step 5 is first transposed to obtain $c_d^{\top}$ and input into the two-layer MLP, whose output is

$$h_d=U_2\,\sigma(U_1\,c_d^{\top}),$$

where $U_1$ and $U_2$ are two trainable weights and $\sigma$ is a nonlinear activation function. The output $h_d$ of the two-layer MLP is then transposed and added to the original input $c_d$ as the output of the spatial context information fusion module:

$$s_d=h_d^{\top}+c_d.$$

Similar to the training model, we obtain the output $s_d^{ema}$ of the spatial context information fusion module on the ema model. Finally, a distance measure is applied to the outputs of the spatial fusion modules of the training model and the ema model, further reducing the differences between the multiple domains and optimizing the Transformer network model.
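A sketch of this module (illustrative; the hidden width and the ReLU nonlinearity are assumptions):

```python
import torch

class SpatialContextFusion(torch.nn.Module):
    """Transpose -> two-layer MLP -> transpose back -> residual add."""
    def __init__(self, num_domains: int, hidden: int):
        super().__init__()
        # The MLP mixes information across domains (the transposed axis).
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(num_domains, hidden),
            torch.nn.ReLU(),
            torch.nn.Linear(hidden, num_domains),
        )

    def forward(self, centers: torch.Tensor) -> torch.Tensor:
        # centers: (num_domains, dim), the real domain centers c_d stacked row-wise.
        h = self.mlp(centers.t())     # operate on the transposed features
        return h.t() + centers        # transpose back and add the original input

fusion_train = SpatialContextFusion(num_domains=4, hidden=16)
fusion_ema = SpatialContextFusion(num_domains=4, hidden=16)
c = torch.randn(4, 256)
s_train, s_ema = fusion_train(c), fusion_ema(c)
align_loss = torch.nn.functional.mse_loss(s_train, s_ema)  # distance between modules
```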
Step 7: image features of the source domain and the target domain are generated by the trained Transformer network and matched, so that the same pedestrian appearing on the target device can be found in the surveillance images of the source device.
After the optimal model is obtained, the image features acquired by different devices are matched, and the Euclidean distance is used to measure the similarity between each pair of image features, so that images of the same pedestrian are found across different devices, realizing pedestrian re-identification.
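Matching by Euclidean distance can be sketched as follows (shapes are illustrative):

```python
import torch

def match_pedestrians(query_feats: torch.Tensor,
                      gallery_feats: torch.Tensor) -> torch.Tensor:
    """Rank gallery images for each query by Euclidean distance.

    query_feats: (q, dim) features from one device; gallery_feats: (g, dim)
    features from another device, both produced by the trained network.
    """
    dist = torch.cdist(query_feats, gallery_feats, p=2)  # pairwise Euclidean
    return dist.argsort(dim=1)                           # nearest gallery first

ranking = match_pedestrians(torch.randn(5, 256), torch.randn(50, 256))
best_match = ranking[:, 0]   # most likely same-pedestrian image per query
```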
On four data sets, the method of the invention was preliminarily compared with Transformer-based and CNN-based pedestrian re-identification algorithms: the average accuracy of the proposed method is 65.9%, against 62.8% for the Transformer-based algorithm and 56.3% for the CNN-based algorithm. The method of the invention thus improves the accuracy of pedestrian re-identification.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.
Claims (3)
1. An unsupervised multi-domain fusion adaptive pedestrian re-identification method, characterized in that the method comprises the following steps:
Step 1: taking surveillance pedestrian pictures from a plurality of monitoring devices at different places and in different time periods as input to a Transformer network and a DBSCAN clustering step, and dividing the pictures into a source domain and a target domain according to the amount of label information;
Step 2: generating pseudo labels for the unlabeled target-domain pedestrian data with DBSCAN, so that the target-domain label data are kept consistent with the source-domain label data;
Step 3: slicing and position-encoding the pedestrian images input from the source domain, building a Transformer network that learns different degrees of attention to different pedestrian feature points, and retaining the trained model parameters;
Step 4: slicing and position-encoding the pedestrian images of the target domain and feeding them, together with the source-domain images, into the Transformer network, so that the attention patterns the Transformer network learned on the source domain transfer to the target domain; this step specifically comprises:
constructing two models with exactly the same structure, one a training model and the other an ema model;
slicing the image samples of the target domain: the $n$-th picture of the target domain, $x_n^t$, is divided into $m$ patches, and each patch is encoded to obtain a word embedding representation, denoted $E_n^t=[e_1,\dots,e_m]$, with the corresponding position code computed as

$$p_{(pos,2j)}=\sin\!\left(pos/10000^{2j/d}\right),\qquad p_{(pos,2j+1)}=\cos\!\left(pos/10000^{2j/d}\right),$$

where $d$ denotes the dimension of a patch embedding after the input image is divided and $pos$ is the position index of the patch's word embedding, so that the target-domain input of the Transformer network can be expressed as

$$Z_n^t=E_n^t+P;$$

inputting the encoded samples $Z$ of the source domain $S$ and the target domain $T$ into the multi-head attention mechanism, where $S$ and $T$ respectively denote the source-domain and target-domain data sample sets, so that the Transformer network also forms different degrees of attention to the target-domain pictures, realizing the migration of the attention patterns, computed as

$$f^d=\mathrm{MLP}\big(\mathrm{MHA}(Z^d;\theta)\big),\qquad d\in\{s,t\},$$

where $f^d=\{f_1^d,\dots,f_n^d\}$ is the set of sample features extracted from the $n$ image samples of the $d$-th domain, $s$ denoting a source domain and $t$ the target domain, and $\theta$ being the projection matrix of the current training; the optimal (ema) parameter matrix is updated as

$$\theta_{ema}^{(k)}=\alpha\,\theta_{ema}^{(k-1)}+(1-\alpha)\,\theta^{(k)},$$

where $\theta_{ema}$ is the parameter matrix of the optimal model and $\alpha$ is the ratio between the model obtained from historical training iterations and the current iteration; when the iteration count is 0, $\theta_{ema}^{(0)}$ is the parameter matrix obtained by pre-training; through continuous training and optimization, a training model that achieves the best effect on the same devices is finally obtained;
Step 5: constructing a complex graph convolutional neural network, taking the complex domain centers of the several domain data sets extracted by the Transformer network as input to this multi-domain-information-fusion network, aligning the sample distributions of different domains, and reducing the differences between data sets; this step specifically comprises:
using the sample features $f_n^d$ obtained in step 4, where $d$ indexes the $d$-th domain, generating the real-valued domain center of each domain as

$$c_d=\frac{1}{n}\sum_{k=1}^{n} f_k^d,$$

where $c_d$ denotes the real-valued domain center of the $d$-th domain, and mapping it to a complex feature space, each domain center being denoted

$$\tilde{c}_d=\mathrm{Re}(\tilde{c}_d)+i\,\mathrm{Im}(\tilde{c}_d)=W_1 c_d+i\,W_2 c_d,$$

where $\mathrm{Re}(\cdot)$ is the operation taking the real part of a feature in the complex feature space, $\mathrm{Im}(\cdot)$ is the operation taking the imaginary part, $\tilde{c}_d$ is the complex domain center of the $d$-th domain, and $W_1$ and $W_2$ are two trainable weights; two complex graph convolutional neural networks are constructed, one for the training model and one for the ema model, the complex graph convolutional neural network of the training model being denoted $G$ and that of the ema model $G_{ema}$;

for the complex graph convolutional neural network of the training model, each node $v_d$ is the complex domain center of the $d$-th domain, and its adjacency matrix $A$ encodes the vector relationships between the complex domain centers, which can be expressed as

$$A_{ab}=\frac{\tilde{c}_a\,\overline{\tilde{c}_b}^{\top}}{\lVert\tilde{c}_a\rVert\,\lVert\tilde{c}_b\rVert},$$

where $a$ and $b$ index the $a$-th and $b$-th domains, so that the complex graph convolutional neural network of the training model is updated as

$$\hat{C}=\sigma_c\!\left(\tilde{D}^{-1/2}(A+I)\,\tilde{D}^{-1/2}\,\tilde{C}\,W_c\right),$$

where the $d$-th row of $\tilde{C}$ is the complex domain center $\tilde{c}_d$ of the $d$-th domain, the $d$-th row $\hat{c}_d$ of $\hat{C}$ is the globally updated complex domain-center feature of the $d$-th domain, $A$ is the adjacency matrix of the complex graph convolutional neural network, $I$ is the complex identity matrix, $\tilde{D}$ is the complex degree matrix, $\sigma_c$ is a complex nonlinear activation function, and $W_c$ is a learnable complex parameter;

for the complex graph convolutional neural network of the ema model, the output $\hat{C}_{ema}$ is computed with the same method as for the training model; after a modulus operation on the output complex centers, the Euclidean distance is used to measure the distance between the complex domain centers updated by the two complex graph convolutional neural networks, and an MSE loss function is used to further reduce the sample distribution differences between different domains, thereby optimizing the Transformer network model;
Step 6: constructing a spatial context information fusion module with a two-layer MLP structure, in which the real-valued multi-domain center features extracted by the Transformer network serve as input, the input features are transposed and passed through the two-layer MLP for information interaction, and the MLP output is transposed again and added to the input features to form the output;
Step 7: generating image features of the source domain and the target domain with the trained Transformer network and matching them, so that the same pedestrian appearing on the target device can be found in the surveillance images of the source device.
2. The unsupervised multi-domain fusion adaptive pedestrian re-identification method according to claim 1, characterized in that the step 3 specifically comprises:
partitioning each pedestrian picture in the source-domain data into patches: the $n$-th picture of the $i$-th domain, $x_n^i$, is divided into $m$ patches, and each patch is encoded to obtain a word embedding representation, denoted $E_n^i=[e_1,\dots,e_m]$; the position information corresponding to each patch is obtained by the encoding calculation

$$p_{(pos,2j)}=\sin\!\left(pos/10000^{2j/d}\right),\qquad p_{(pos,2j+1)}=\cos\!\left(pos/10000^{2j/d}\right),$$

where $d$ denotes the dimension of a patch embedding after the input image is divided and $pos$ is the position index of the patch's word embedding, so that the input of the Transformer network can be expressed as

$$Z_n^i=E_n^i+P=[e_1+p_1,\dots,e_m+p_m];$$

the Transformer network contains two modules, one an MLP and the other a multi-head attention mechanism; $Z_n^i$ is input into the multi-head attention mechanism so that the network attends to the picture to different degrees, computed as

$$\mathrm{MHA}(Z_n^i)=\mathrm{softmax}\!\left(\frac{(Z_n^i W_Q)(Z_n^i W_K)^{\top}}{\sqrt{d}}\right)(Z_n^i W_V),$$

where $W=\{W_Q,W_K,W_V\}$ are the projection matrices of the pre-training model and $\mathrm{MHA}(Z_n^i)$ is the output feature of the mechanism for sample image $x_n^i$ and also the input of the MLP, so the output of the Transformer is

$$\hat{y}_n^i=\mathrm{MLP}\big(\mathrm{MHA}(Z_n^i)\big),$$

where $\hat{y}_n^i$ is the Transformer network's predicted label for picture $x_n^i$; the cross-entropy loss between the source-domain predicted labels and the source-domain true labels is computed and optimized, finally yielding a well-performing Transformer network pre-training model.
3. The unsupervised multi-domain fusion adaptive pedestrian re-identification method according to claim 1, characterized in that the step 6 specifically comprises: constructing two two-layer MLP structures, one for the training model and one for the ema model;
for the training model, the real-valued domain center $c_d$ of the $d$-th domain obtained in step 5 is first transposed to obtain $c_d^{\top}$ and input into the two-layer MLP, whose output is

$$h_d=U_2\,\sigma(U_1\,c_d^{\top}),$$

where $U_1$ and $U_2$ are two trainable weights and $\sigma$ is a nonlinear activation function; the output $h_d$ of the two-layer MLP is then transposed and added to the original input $c_d$ as the output of the spatial context information fusion module:

$$s_d=h_d^{\top}+c_d;$$

the output $s_d^{ema}$ of the spatial context information fusion module on the ema model is obtained in the same way as for the training model; finally, a distance measurement is applied to the outputs of the spatial fusion modules of the training model and the ema model, further reducing the differences between the multiple domains and optimizing the Transformer network model.