CN116486483A - Cross-view pedestrian re-recognition method and device based on Gaussian modeling

Info

Publication number: CN116486483A
Application number: CN202310449124.7A
Authority: CN
Inventors: 李文辉; 周厚燃; 刘安安; 宋丹; 孙正雅; 聂婕; 魏志强
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2023-04-24
Filing date: 2023-04-24
Publication date: 2023-07-25

Abstract

The invention discloses a cross-view pedestrian re-identification method and device based on Gaussian modeling. The method comprises the following steps: re-modeling the visual features using the class-center features of the current batch and the predicted uncertainty of those class-center features, randomly sampling noise that is independent of the model parameters, equivalently re-expressing the visual features, and measuring the similarity of the Gaussian distributions of the two domains with the JS divergence; modeling a new representation of the target-domain multi-view using the multi-view features extracted by the neural network and the uncertainty between views; introducing a regularization term that explicitly constrains the new representation to approach the standard normal distribution, measured by the KL divergence; and using a memory network to retain the Gaussian distributions of the Gaussian prototype-level path, generating new Gaussian feature representations from them through a Gaussian mixture model, constraining the overall distribution with a normalized triplet loss, and outputting the final recognition result. The device comprises a processor and a memory.

Description

Cross-view pedestrian re-recognition method and device based on Gaussian modeling

Technical Field

The invention relates to the field of cross-view pedestrian re-identification, and in particular to a cross-view pedestrian re-identification method and device based on Gaussian modeling.

Background

With the wide application of deep learning, object retrieval has attracted attention in multimedia, biometrics, computer vision and other fields ^[1], and pedestrian re-identification is one of its important practical applications. Learning a highly accurate pedestrian re-identification model relies on large-scale annotated datasets, and the performance of such a model drops sharply when it is evaluated on a different dataset for which no ground-truth labels are available. The progress of pedestrian re-identification models is largely due to deep convolutional neural networks ^[2], most of which are trained in a supervised manner and are difficult to transfer. Unsupervised domain adaptation, which trains a pedestrian re-identification model on an existing labeled source-domain dataset together with an unlabeled target-domain dataset, has therefore received great attention. In particular, cross-domain multi-view pedestrian re-identification, which uses an image to retrieve the corresponding multi-view target in an unlabeled dataset, is attracting growing interest.

Most existing approaches are adversarial-based ^[3-5], instance-based ^[6,7], prototype-based ^[8,9] or based on fused view features ^[10,11], and complete the domain adaptation task by eliminating inter-domain differences. On the one hand, the prototype-based domain adaptation method DLEA ^[9] makes up for the fact that adversarial-based domain adaptation methods ignore semantic information, and it narrows the domain gap. However, it does not attend to the data uncertainty caused by mapping images or multi-view targets into the feature space and cannot handle intra-class differences and similarities well, so pedestrian re-identification is inaccurate; because of the inter-domain differences, the noisy pseudo-labels also remain unreliable.

On the other hand, these methods ignore the learning of uncertainty information. For example, GVCNN ^[11], a domain adaptation method based on fused view features, and HIFA ^[6], an instance-based method, neglect the uncertainty that multiple views introduce into the representation of a single instance, so they have difficulty capturing the subtly different information carried by each view.

In summary, existing cross-view pedestrian re-identification methods have the following three disadvantages:

1. Existing methods directly use the features extracted from images or multiple views for semantic alignment but do not learn the uncertainty information introduced by mapping into the feature space, which reduces the accuracy of pedestrian re-identification;

2. Conventional prototype-based methods simply apply pseudo-labels and do not address how to improve their reliability. By contrast, a new feature expression based on high-order Gaussian prototypes of the samples yields high-order prototypes without using pseudo-labels, so the model can be constrained after each round of training to improve the reliability of the predicted pseudo-labels and avoid the loss of re-identification accuracy caused by using pseudo-labels directly;

3. Using image features alone ignores the uncertainty that multiple views introduce into the instance representation, and existing methods for fusing view features tend to over-fit, which reduces the accuracy of pedestrian re-identification on new instances.

Disclosure of Invention

The invention provides a cross-view pedestrian re-identification method and device based on Gaussian modeling. A moving class center solves the problem that, with small batches, a given class may be absent from a given batch, and it reduces the adverse effect of pseudo-labels. The uncertainty is used to handle the inherent noise of the data and to overcome the data uncertainty in the continuously mapped feature space. An image enhancement module improves the robustness of the pedestrian re-identification model and, at the same time, explores the uncertainty information introduced by multiple images and multiple views. Inspired by the triplet loss, a memory network records global Gaussian prototypes under a loss constraint, so that samples of the same class are compact and samples of different classes are dispersed. This improves the accuracy of pedestrian re-identification on new instances and keeps the method usable across diverse application scenarios. The method is described in detail below:

In a first aspect, a cross-view pedestrian re-identification method based on Gaussian modeling comprises:

re-modeling the visual features using the class-center features of the current batch and the predicted uncertainty of those class-center features, randomly sampling noise that is independent of the model parameters, equivalently re-expressing the visual features, and measuring the similarity of the Gaussian distributions of the two domains with the JS divergence;

modeling a new representation of the target-domain multi-view using the multi-view features extracted by the neural network and the uncertainty between views;

introducing a regularization term that explicitly constrains the new representation to approach the standard normal distribution, measured by the KL divergence;

and using a memory network to retain the Gaussian distributions of the Gaussian prototype-level path, generating new Gaussian feature representations from them through a Gaussian mixture model, constraining the overall distribution with a normalized triplet loss, and outputting the final recognition result.

In a second aspect, a cross-view pedestrian re-identification device based on Gaussian modeling comprises a processor and a memory. The memory stores program instructions, and the processor invokes the program instructions stored in the memory to cause the device to perform the method steps of any of the first aspects.

In a third aspect, a computer readable storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to perform the method steps of any of the first aspects.

The technical scheme provided by the invention has the beneficial effects that:

1. In the visual and prototype paths, a moving semantic transfer network moves and aligns the class centers to address the erroneous labels among the pseudo-labels of unsupervised domain adaptation. Rather than directly treating pseudo-labeled samples as real samples, each class center is aligned between the source domain and the target domain, and the correct class centers neutralize the adverse effect of the erroneous ones. Because the batches in the experiments are small, the moving class centers enable more accurate semantic representation learning, which alleviates the matching deviation in pedestrian re-identification caused by noisy pseudo-labels;

2. In the visual and Gaussian prototype paths, a Gaussian distribution is modeled at the prototype level from the class-center feature m of the current batch and the intra-class uncertainty s, in order to handle the inherent noise of the dataset and overcome the data uncertainty in the continuously mapped feature space caused by mapping images and views. The feature is re-expressed as z = m + ωs, where ω ∼ N(0, 1). A regularization term is introduced during optimization: z(m, ω, s) is explicitly constrained to approach the standard normal distribution N(0, 1), measured by the KL divergence between the two distributions, so that the new feature expression z does not degenerate into the original deterministic representation z = m + c (c is a constant). Meanwhile, the JS divergence measures the similarity of the Gaussian distributions of the two domains, which reduces the inter-domain difference, improves the representation of pedestrian images and views mapped into the feature space under unsupervised domain adaptation, and improves the precision of pedestrian re-identification;

3. In the Gaussian visual and Gaussian mixture prototype paths, to address at the instance level the multi-view uncertainty in cross-view pedestrian re-identification, the invention replaces the original simple max-pooling of the target-domain multi-view features with a Gaussian distribution modeled over the multiple views, which captures the relation among the multi-view information of a pedestrian. Correspondingly, each source-domain image is turned into several different images by an image enhancement technique, and these are modeled as a Gaussian distribution, which improves the robustness of the pedestrian re-identification model. At the same time, the KL divergence prevents the new feature expression z from degenerating, and the domain adversarial similarity loss guides the global alignment of the source and target domains, which effectively enhances the domain-invariant feature expression capability of the visual feature extraction network for pedestrian re-identification;

4. A Gaussian mixture model generates high-order Gaussian feature representations, and the JS divergence measures the inter-domain divergence, so that the uncertainty information of the instance representation caused by multiple views is learned better. Meanwhile, a contrastive center loss shortens the distance in the Gaussian feature space between Gaussian instances and the high-order Gaussian prototype of the same class, which mitigates the tendency of pedestrian re-identification to over-fit and improves its accuracy on new instances;

5. In the global Gaussian mixture path, the embodiment of the invention uses a memory network to retain the prototype-level Gaussian features of pedestrian images and views after each round of training, generates a new global Gaussian mixture feature representation through a Gaussian mixture model, and again uses the KL divergence constraint to prevent the new feature expression from degenerating. Inspired by the triplet loss, the global Gaussian mixture prototypes recorded in the memory network are constrained by a normalized triplet loss, so that the global class centers give more accurate semantic representation learning, samples of the same class are more compact in the feature space and samples of different classes are more dispersed, which further improves the precision of pedestrian re-identification.

Drawings

FIG. 1 is a flow chart of a cross-view pedestrian re-recognition method based on Gaussian modeling;

FIG. 2 is a network structure diagram of cross-view pedestrian re-recognition based on Gaussian modeling.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in further detail below.

Example 1

Referring to FIG. 1, a cross-view pedestrian re-identification method based on Gaussian modeling comprises the following steps:

101: extracting visual features of an image (source domain) and visual features of a target pedestrian multi-view (target domain) by using a two-dimensional convolutional neural network;

The embodiment of the invention processes the visual features in two different ways: first, prototype-level processing after the visual features are obtained, namely steps 103 and 104; and second, data enhancement and instance-level processing of the images, namely step 105.

102: based on an unsupervised adversarial domain adaptation learning strategy under the covariate shift condition, minimizing the source-domain classification error and the difference between the source and target domains to train a classifier and a domain discriminator, thereby guiding the global alignment of the source and target domains;

This step effectively enhances the domain-invariant feature expression capability of the visual feature extraction network for pedestrian re-identification.

103: computing a moving class center from the class center of the previous batch and the class center of the current batch, making the local semantic representation more accurate through a loss constraint, and guiding the global alignment of the source and target domains with a domain discriminator;

In the visual and prototype paths, the moving semantic transfer network computes the moving class centers from the class centers of the previous batch and of the current batch. The squared Euclidean distance is used as the loss to align each class center between the source domain and the target domain, and the correct class centers neutralize the adverse effect of the erroneous ones;

Because the batches in the experiments are small, the moving-average class centers enable more accurate local semantic representation learning. The moving semantic transfer network addresses the erroneous labels among the pseudo-labels of unsupervised domain adaptation and thereby alleviates the matching deviation in pedestrian re-identification caused by noisy pseudo-labels.

The domain adversarial similarity loss measures the inter-domain divergence. That is, the embodiment of the invention uses an additional domain discriminator to determine whether a feature comes from the source domain or the target domain, while the visual feature extraction network tries to fool the domain discriminator. The two are expected to reach an equilibrium in this two-player game, which leads to global alignment of the source and target domains.

104: modeling a Gaussian distribution from the class-center feature m of the current batch and the uncertainty s, re-expressing the visual feature as z = m + ωs, where ω ∼ N(0, 1), and measuring the similarity of the Gaussian distributions of the two domains with the JS divergence;

In the visual and Gaussian prototype paths, once the features extracted by the neural network are available, the embodiment of the invention overcomes the data uncertainty in the continuously mapped feature space that arises from mapping images and views. It proposes, for the first time, to model a Gaussian distribution from the class-center feature m of the current batch and the intra-class uncertainty s, and re-expresses the visual feature as z = m + ωs, where ω ∼ N(0, 1). The JS divergence measures the similarity of the Gaussian distributions of the two domains, which improves the representation of pedestrian images mapped into the feature space under unsupervised domain adaptation and improves the precision of pedestrian re-identification.

Compared with previous prototype-based unsupervised domain adaptation methods, the method not only uses the new feature expression built on the current-batch class center to handle the uncertainty introduced by the mapping, but also uses the class centers of both the previous and the current batch to counter the adverse effects of small experimental batches and erroneous class centers.

105: modeling a Gaussian distribution of the target-domain multi-view, z_views = m_v + ωs_v, from the multi-view feature m_v extracted by the neural network and the uncertainty s_v between views; forming multiple images from each source-domain image through an image enhancement algorithm, extracting the feature m_imgs through the neural network, computing the uncertainty s_imgs among the multiple images, and modeling a Gaussian distribution of the source-domain multi-image, z_imgs = m_imgs + ωs_imgs. The uncertainty within the source-domain multi-image and the target-domain multi-view is thus used, for the first time, to characterize a single pedestrian, which improves the robustness of the pedestrian re-identification model.

The new Gaussian representations z_imgs and z_views generated from the source-domain and target-domain instances are each fitted with a Gaussian mixture model, and the expectation-maximization algorithm is iterated to obtain the feature and uncertainty of each class in the distribution, which yields new high-order Gaussian prototype feature expressions for the source-domain and target-domain classes. Because these high-order Gaussian prototypes are obtained without directly using pseudo-labels, the model can be constrained after each round of training, which improves the reliability of the predicted pseudo-labels and avoids the loss of re-identification precision caused by using pseudo-labels directly. At the same time, the JS divergence imposes an inter-domain distribution constraint that reduces the inter-domain difference;

In the Gaussian visual and Gaussian mixture prototype paths, Gaussian distributions are modeled to address, at the instance level, the uncertainty of representing a single instance by multiple views in cross-view pedestrian re-identification. The embodiment of the invention replaces the original simple generation of a compact descriptor feature by max-pooling over the target-domain multi-view with the following:

A Gaussian distribution of the target-domain multi-view, z_views = m_v + ωs_v, is modeled from the multi-view feature m_v extracted by the neural network and the uncertainty s_v between views. The uncertainty among different views can be learned while the network is trained, which improves the accuracy with which multiple views represent a given instance and captures the relation among the multi-view information of a pedestrian. Correspondingly, the embodiment of the invention forms multiple images from each source-domain image through an image enhancement algorithm, extracts the feature m_imgs through the neural network, computes the uncertainty s_imgs among the multiple images, and models a Gaussian distribution of the source-domain multi-image, z_imgs = m_imgs + ωs_imgs. The uncertainty within the source-domain multi-image and the target-domain multi-view is thus used, for the first time, to characterize a single pedestrian, which enhances the robustness of the pedestrian re-identification network.

Next, the new Gaussian representations z_imgs and z_views generated from the source-domain and target-domain instances are each fitted with a Gaussian mixture model, and the expectation-maximization (EM) algorithm is iterated to obtain the feature and uncertainty of each class in the distribution, which yields new high-order Gaussian prototype feature expressions for the source-domain and target-domain classes. Because these high-order Gaussian prototypes are obtained without directly using pseudo-labels, the model can be constrained after each round of training, which improves the reliability of the predicted pseudo-labels and avoids the loss of re-identification precision caused by using pseudo-labels directly. At the same time, the JS divergence and the contrastive center loss impose an inter-domain distribution constraint that reduces the inter-domain difference. The contrastive center loss shortens the distance in the Gaussian feature space between Gaussian instances and the high-order Gaussian prototype of the same class, which mitigates the tendency of pedestrian re-identification to over-fit and improves its accuracy on new instances.
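The Gaussian mixture fitting with EM described above can be illustrated with a minimal sketch. It is not the patent's implementation: it works on one-dimensional samples for readability, and the function names, the initialisation and the iteration count are assumptions made for this example. Each fitted component returns a mean (the "feature") and a standard deviation (the "uncertainty").

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a 1-D Gaussian N(mean, std^2) at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def fit_gmm_1d(samples, n_components=2, n_iters=50):
    """Fit a 1-D Gaussian mixture with EM; returns (weights, means, stds)."""
    lo, hi = min(samples), max(samples)
    # Assumed initialisation: spread the means evenly over the data range.
    steps = max(n_components - 1, 1)
    means = [lo + (hi - lo) * k / steps for k in range(n_components)]
    stds = [(hi - lo) / (2.0 * n_components) or 1.0] * n_components
    weights = [1.0 / n_components] * n_components
    for _ in range(n_iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in samples:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
            total = sum(p) or 1e-300
            resp.append([v / total for v in p])
        # M-step: re-estimate weight, mean and spread of each component.
        for k in range(n_components):
            nk = sum(r[k] for r in resp) + 1e-12
            weights[k] = nk / len(samples)
            means[k] = sum(r[k] * x for r, x in zip(resp, samples)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, samples)) / nk
            stds[k] = math.sqrt(var) + 1e-6
    return weights, means, stds


rng = random.Random(0)
demo = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(6.0, 1.0) for _ in range(200)]
demo_weights, demo_means, demo_stds = fit_gmm_1d(demo)
```

In practice the features are high-dimensional, so a library mixture model with diagonal covariances would be the more likely choice; the loop above only makes the E-step and M-step explicit.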

106: introducing a regularization term that explicitly constrains z(m, ω, s) to approach the standard normal distribution N(0, 1), measured by the KL divergence between the two distributions;

The KL divergence term prevents the new feature expression z obtained by prototype-level and instance-level Gaussian modeling from degenerating into the original deterministic representation z = m + c (c is a constant).
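To see why this regularizer discourages collapse to a deterministic feature, it helps to write out the well-known closed form of the KL divergence between a diagonal Gaussian and the standard normal. The sketch below assumes a diagonal covariance and uses a hypothetical function name; the patent does not state which form it uses.

```python
import math


def kl_to_standard_normal(mean, std):
    """KL( N(mean, diag(std^2)) || N(0, I) ), summed over feature dimensions."""
    return 0.5 * sum(m * m + s * s - math.log(s * s) - 1.0 for m, s in zip(mean, std))


# The divergence is zero only for the standard normal itself.
kl_standard = kl_to_standard_normal([0.0, 0.0], [1.0, 1.0])
# A near-deterministic feature (tiny std) is penalised heavily by the -log(s^2) term.
kl_collapsed = kl_to_standard_normal([0.0, 0.0], [1e-3, 1e-3])
```

As s approaches 0 the -log(s^2) term grows without bound, which is what keeps z = m + ωs from shrinking to a fixed point.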

107: using a memory network to retain the Gaussian distributions of the Gaussian prototype path, generating new Gaussian feature representations from them through a Gaussian mixture model, and constraining the overall distribution with a normalized triplet loss;

In the global Gaussian mixture path, the embodiment of the invention uses a memory network to retain the Gaussian features of the pedestrian images and views in the Gaussian prototype path after each round of training, and generates a new global Gaussian mixture feature representation through a Gaussian mixture model. Inspired by the triplet loss, the global Gaussian mixture prototypes recorded in the memory network are constrained by a normalized triplet loss, so that the class centers give more accurate semantic representation learning, samples of the same class are more compact in the feature space and samples of different classes are more dispersed, which improves the accuracy of pedestrian re-identification.
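The text does not define the normalized triplet loss precisely. One plausible reading, shown here purely as an assumption, is a standard margin-based triplet loss computed on L2-normalized prototype vectors; the function names and the margin value are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalized_triplet_loss(anchor, positive, negative, margin=0.3):
    """Assumed form: max(0, d(a, p) - d(a, n) + margin) on unit-length vectors."""
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    return max(0.0, euclidean(a, p) - euclidean(a, n) + margin)


# Same-class prototype close to the anchor, other-class prototype far away: no loss.
loss_good = normalized_triplet_loss([1.0, 0.0], [2.0, 0.0], [0.0, 3.0])
# Roles swapped: the loss becomes positive and pushes the classes apart.
loss_bad = normalized_triplet_loss([1.0, 0.0], [0.0, 3.0], [2.0, 0.0])
```

Minimizing such a loss pulls prototypes of the same class together and pushes prototypes of different classes at least a margin apart, which matches the compact and dispersed behaviour described above.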

108: applying the final constrained result of step 107 to pedestrian re-identification and outputting the final recognition result, thereby improving the precision of pedestrian re-identification.

Step 108 includes:

Using a cross-view pedestrian re-identification database as input, multiple rounds of training with steps 101-107 yield a complete cross-view pedestrian re-identification model with high recognition accuracy. A cross-view pedestrian re-identification test is then performed: an image of a single pedestrian is input, its features are extracted by the technique described in the invention, and the JS divergence between these features and the features of the other pedestrians in the database is computed, which measures the similarity between the input pedestrian image and all cross-view pedestrian views in the database. Because the JS divergence ranges from 0 to 1, it describes well the degree of difference between two distributions in the feature space, so the cross-view pedestrian views with the smallest difference, which rank first, are easily selected.
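The retrieval step can be sketched as ranking gallery entries by their JS divergence to the query. The example below is an illustration under strong simplifying assumptions: every pedestrian is summarised by a one-dimensional Gaussian (mean, std), the divergence is integrated numerically on a grid, and the gallery identifiers are invented for the example. Using a base-2 logarithm keeps the value within 0 to 1.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian N(mean, std^2) at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def js_divergence_1d(p, q, n_grid=2001, width=8.0):
    """JS divergence (log base 2) between 1-D Gaussians p=(mean, std) and q=(mean, std)."""
    lo = min(p[0] - width * p[1], q[0] - width * q[1])
    hi = max(p[0] + width * p[1], q[0] + width * q[1])
    dx = (hi - lo) / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        x = lo + i * dx
        px, qx = gaussian_pdf(x, *p), gaussian_pdf(x, *q)
        mx = 0.5 * (px + qx)  # density of the equal-weight mixture
        if px > 0.0:
            total += 0.5 * px * math.log2(px / mx) * dx
        if qx > 0.0:
            total += 0.5 * qx * math.log2(qx / mx) * dx
    return total


def rank_gallery(query, gallery):
    """Return gallery identifiers sorted from most to least similar to the query."""
    return sorted(gallery, key=lambda name: js_divergence_1d(query, gallery[name]))


# Hypothetical gallery: identifier -> (mean, std) of the pedestrian's Gaussian feature.
gallery = {"id_a": (0.1, 1.0), "id_b": (3.0, 1.0), "id_c": (10.0, 2.0)}
ranking = rank_gallery((0.0, 1.0), gallery)
```

The first entries of `ranking` are the candidates with the smallest divergence, corresponding to the views that rank first in the description above.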

In summary, the embodiment of the invention provides a new cross-view pedestrian re-identification method and a new network structure, and improves the performance of cross-view pedestrian re-identification.

Example 2

The scheme of example 1 is further described in conjunction with specific examples, as follows:

201: a two-dimensional convolutional neural network extracts the visual features of an image (source domain) and the visual features of a target pedestrian multi-view (target domain). The AlexNet architecture is used as the visual feature network, comprising a five-layer convolutional network and a three-layer fully connected network;

202: based on an unsupervised adversarial domain adaptation learning strategy, under the covariate shift condition the classifier C and the domain discriminator D are trained by minimizing the source-domain classification error and the difference between the source and target domains. This guides the global alignment of the source and target domains and effectively enhances both the accuracy and the domain-invariant feature expression capability of the visual feature extraction network for pedestrian re-identification. The objective function is as follows:

where F is the visual feature network whose parameters are shared by the source domain and the target domain, x^s denotes a source-domain instance, x^t denotes a target-domain instance, L_CE is the cross-entropy loss, and the two sample sets over which the objective is evaluated contain all samples of the source domain and of the target domain, respectively.
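For orientation only, a commonly used form of a domain-adversarial objective with these ingredients is given below. It is not recovered from the patent: the label symbol y^s, the trade-off weight λ and the set symbols D_s and D_t are assumptions introduced for this sketch.

```latex
\min_{F,\,C}\;\max_{D}\;
\mathbb{E}_{(x^{s},y^{s})\in\mathcal{D}_{s}}
  L_{CE}\bigl(C(F(x^{s})),\,y^{s}\bigr)
\;+\;\lambda\Bigl(
\mathbb{E}_{x^{s}\in\mathcal{D}_{s}}\log D\bigl(F(x^{s})\bigr)
+\mathbb{E}_{x^{t}\in\mathcal{D}_{t}}\log\bigl(1-D(F(x^{t}))\bigr)
\Bigr)
```

In this form the discriminator D maximizes its ability to tell the two domains apart, while the feature network F minimizes it, and F and C jointly minimize the source-domain classification error.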

203: after the visual features are obtained, in the visual and prototype paths the moving semantic transfer network aligns the moving class centers at the prototype level to address the erroneous labels among the pseudo-labels of unsupervised domain adaptation. The moving class center is obtained from the previous-batch class center and the current-batch class center, which counters the adverse effects of small experimental batches and erroneous class centers and thereby alleviates the matching deviation in pedestrian re-identification caused by noisy pseudo-labels. The moving class center is updated as follows:

where ρ is a hyperparameter of the moving class center, typically set to 0.3.
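The moving class-center update can be sketched as an exponential moving average. Which of the two centers ρ weights cannot be recovered from the text, so the form below (ρ on the previous center) is an assumption, as are the function name and the handling of a class that is missing from the batch.

```python
def update_moving_center(prev_center, batch_center, rho=0.3):
    """Assumed update: new = rho * previous + (1 - rho) * current-batch center."""
    if batch_center is None:
        # The class does not occur in this batch: keep the previous center.
        return list(prev_center)
    return [rho * p + (1.0 - rho) * c for p, c in zip(prev_center, batch_center)]


center = [0.0, 0.0]
center = update_moving_center(center, [1.0, 2.0])   # class seen in this batch
center = update_moving_center(center, None)         # class absent from this batch
```

Keeping the previous center when a class is absent is what lets small batches still provide a center for every class.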

The centroids of each class are aligned between the source and target domains. Because the batches in the experiments are small, the squared Euclidean distance d_ED is used to limit the distance between centroids that share a class label but come from different domains, which ensures that features of the same class are similar, as follows:

where the two centroid symbols denote the j-th class centers of the source domain and the target domain, respectively, and J is the number of class centers of the source and target domains.
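A minimal sketch of this alignment loss follows. It sums the squared Euclidean distance over the J class centers; whether the original formula sums or averages over classes is not recoverable, so the plain sum and the function name are assumptions.

```python
def center_alignment_loss(source_centers, target_centers):
    """Sum over classes of the squared Euclidean distance between paired centers."""
    return sum(
        sum((s - t) ** 2 for s, t in zip(src, tgt))
        for src, tgt in zip(source_centers, target_centers)
    )


# Two classes (J = 2): the first pair is 5 units apart, the second pair coincides.
loss = center_alignment_loss([[0.0, 0.0], [1.0, 1.0]], [[3.0, 4.0], [1.0, 1.0]])
```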

204: using the class-center feature C^j of the current batch and the predicted uncertainty s of the current-batch class-center feature to improve the representation of pedestrian images mapped into the feature space under unsupervised domain adaptation, thereby improving the accuracy of pedestrian re-identification;

In the visual and Gaussian prototype paths, once the features extracted by the neural network are available, the embodiment of the invention overcomes the data uncertainty in the continuously mapped feature space that arises from mapping images and views, and defines each image or view no longer as a deterministic point in the feature space but as following a Gaussian distribution z(m, ω, s). The embodiment uses the re-parameterization trick so that the model can still be trained by back-propagation as in conventional training. Specifically, the class-center feature of the current batch, m = C^j, and the predicted uncertainty s of that class-center feature are used to re-model the feature as z = m + ωs, where ω ∼ N(0, 1) is random noise sampled independently of the model parameters, and the visual feature is equivalently re-expressed as:

where n_class is the number of instances of a given class C^j in the current batch, and the per-instance term is the feature of the i-th instance of the j-th class. The class-center feature C^j of the current batch is computed as follows:

Re-expressing the features of the mapped images and views addresses the data uncertainty that arises when features are continuously mapped into the feature space. For the first time, a Gaussian distribution is modeled from the class-center features of the current batch and the intra-class uncertainty. This overcomes the shortcoming of conventional methods, which directly use the features extracted from images or multiple views for semantic alignment and do not learn the intra-class uncertainty information that arises after the images are mapped into the feature space. It improves the representation of pedestrian images mapped into the feature space under unsupervised domain adaptation and improves the precision of pedestrian re-identification.
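The re-parameterized feature z = m + ωs can be illustrated as follows. The class center is taken as the mean of the instance features of one class in the batch. The patent predicts the uncertainty s; here the empirical per-dimension standard deviation stands in for it, and ω is drawn independently per dimension, both of which are assumptions of this sketch.

```python
import math
import random


def class_center(features):
    """Mean feature of the instances of one class in the current batch."""
    n = len(features)
    return [sum(f[d] for f in features) / n for d in range(len(features[0]))]


def intra_class_std(features, center):
    """Per-dimension standard deviation around the class center (stand-in for s)."""
    n = len(features)
    return [
        math.sqrt(sum((f[d] - center[d]) ** 2 for f in features) / n)
        for d in range(len(center))
    ]


def reparameterize(mean, std, rng):
    """Sample z = m + omega * s with omega ~ N(0, 1), independent of m and s."""
    return [m + rng.gauss(0.0, 1.0) * s for m, s in zip(mean, std)]


batch_features = [[1.0, 2.0], [3.0, 6.0]]          # two instances of the same class
m = class_center(batch_features)                   # class-center feature
s = intra_class_std(batch_features, m)             # uncertainty around the center
z = reparameterize(m, s, random.Random(0))         # stochastic feature expression
```

Because the randomness enters only through ω, the sample z remains a differentiable function of m and s, which is why back-propagation still works.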

Because the Gaussian distribution is modeled, for the first time, from the class-center features of the current batch and the intra-class uncertainty, the JS divergence is used to measure the similarity of the Gaussian distributions of the two domains. Since the JS divergence ranges from 0 to 1, it describes well the degree of difference between two Gaussian distributions, as follows:

where z^s and z^t denote the new Gaussian distribution expressions of the source domain and the target domain, respectively, JS is the JS divergence and KL is the KL divergence. Compared with the common practice of measuring similarity with the JS divergence between individual instances of the current batch, this strengthens the feature representation of the classes, improves on the semantic representation capability of instance-based methods, narrows the domain gap and greatly reduces the running time of the model. Measuring the similarity of the inter-domain Gaussian distributions with the JS divergence strengthens the guidance toward global semantic alignment of the source and target domains, makes samples of the same class more compact and samples of different classes more dispersed in the Gaussian feature space of the pedestrian re-identification network, and reduces the inter-domain difference.
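As a reminder, the standard definition of the JS divergence in terms of the KL divergence is given below. The density symbols p_s, p_t and p_m are notation assumed for this sketch and denote the densities of z^s, z^t and their equal-weight mixture.

```latex
\mathrm{JS}\bigl(z^{s}\,\|\,z^{t}\bigr)
=\tfrac{1}{2}\,\mathrm{KL}\bigl(p_{s}\,\|\,p_{m}\bigr)
+\tfrac{1}{2}\,\mathrm{KL}\bigl(p_{t}\,\|\,p_{m}\bigr),
\qquad
p_{m}=\tfrac{1}{2}\bigl(p_{s}+p_{t}\bigr),
\qquad
\mathrm{KL}\bigl(p\,\|\,q\bigr)=\int p(x)\,\log_{2}\frac{p(x)}{q(x)}\,dx
```

With the base-2 logarithm the value lies between 0 (identical distributions) and 1 (distributions with disjoint support), which is the 0 to 1 range referred to above. The mixture p_m of two Gaussians is not itself Gaussian, so the divergence generally has to be estimated numerically or by sampling.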

The visual features required by steps 203 and 204 are obtained as follows: a convolutional network extracts visual features from the cross-view pedestrian views, the features are compared numerically and the maximum over the multiple views is taken (max-pooling), which yields a single feature representation of the cross-view pedestrian views.
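The view max-pooling just described can be sketched in a few lines, assuming the maximum is taken element by element across the per-view feature vectors; the function name is hypothetical.

```python
def max_pool_views(view_features):
    """Element-wise maximum over the feature vectors of all views of one pedestrian."""
    return [max(values) for values in zip(*view_features)]


# Two views, three feature dimensions each.
pooled = max_pool_views([[1.0, 5.0, 2.0], [4.0, 0.0, 3.0]])
```

The pooled vector keeps only the largest response per dimension, so any information about how much the views disagree is discarded.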

205: A new representation z_views of the target-domain multiple views is formed by modeling the multi-view features extracted by the neural network together with the uncertainty between views;

The visual features required for step 205 are obtained as follows. First, inspired by semi-supervised consistency regularization, the input images of the source domain are perturbed to different degrees such that the classification result of the model remains close to the correct result. To strengthen the perturbation, this example applies image enhancement with 16 different operations: maximizing image contrast; rotation; inverting pixel values; adjusting the brightness, color, sharpness, saturation and definition of the image; shearing and translating along the horizontal or vertical direction of the image; equalizing the image histogram; converting the image into a reverse-color image; reducing the number of bits of each color channel; and masking the information in a region of the image. In each training, N of the 16 operations are selected (N equals the number of cross-view pedestrian views), and the intensity M of the image change can be adjusted independently to ensure the robustness of the model.

Through the above operations, for the first time, each single image is transformed and the results are spliced to form the source-domain multi-image, which improves the robustness of cross-view pedestrian re-recognition. The source-domain multi-image and the target-domain multiple views are then passed through the visual feature extraction network to obtain the multi-image and multi-view features.
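The selection of N operations at intensity M can be sketched as follows. This is only an illustration under stated assumptions: images are small grayscale nested lists, only five simplified operations stand in for the 16 listed, and every function name (`invert`, `brightness`, `translate_x`, `posterize`, `cutout`, `make_multi_image`) is hypothetical rather than taken from the patent.

```python
import random

def invert(img, m):
    # Invert pixel values; the magnitude is unused for this operation.
    return [[255 - p for p in row] for row in img]

def brightness(img, m):
    # m in [0, 1]: brighten by up to 50% at full magnitude, clipped to 255.
    return [[min(255, int(p * (1 + 0.5 * m))) for p in row] for row in img]

def translate_x(img, m):
    # Shift right by up to half the width, padding the left side with zeros.
    shift = int(m * len(img[0]) / 2)
    return [[0] * shift + row[:len(row) - shift] for row in img]

def posterize(img, m):
    # Drop up to 4 low-order bits of every pixel.
    bits = int(round(4 * m))
    mask = 0xFF & ~((1 << bits) - 1)
    return [[p & mask for p in row] for row in img]

def cutout(img, m):
    # Mask a square region in the top-left corner (position fixed for brevity).
    size = int(m * min(len(img), len(img[0])) / 2)
    return [[0 if r < size and c < size else p for c, p in enumerate(row)]
            for r, row in enumerate(img)]

OPS = [invert, brightness, translate_x, posterize, cutout]

def make_multi_image(img, n, magnitude, rng):
    """Pick n distinct operations and apply each once, giving n variants."""
    return [op(img, magnitude) for op in rng.sample(OPS, n)]

image = [[10, 20, 30, 40] for _ in range(4)]
multi_image = make_multi_image(image, n=3, magnitude=0.5, rng=random.Random(0))
```

Passing an explicit `random.Random` instance keeps the selection reproducible, which is a design choice of this sketch rather than a requirement stated in the text.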

In the Gaussian visual and mixed Gaussian prototype path, in order to address, for the first time at the instance level, the uncertainty of representing a single instance in cross-view multi-view pedestrian re-recognition, the relation among the multi-view information of a pedestrian is captured. Similar to step 204, the multi-view visual features are modeled as a Gaussian distribution and equivalently re-expressed as z_views = m_v + ωs_v. The embodiment of the invention thereby converts the original, simple generation of a compact descriptor feature by max pooling over the target-domain views into:

A new representation z_views of the target-domain multiple views is formed by modeling the multi-view features extracted by the neural network together with the uncertainty between views. The uncertainty among different views can be learned during network training, which improves the accuracy with which multiple views represent a given instance. After the target-domain multi-view visual features are modeled as a Gaussian distribution, they are equivalently re-expressed as:

where k_views is the number of views of the target domain, and the per-view feature term denotes the feature extracted from the k-th view of the target domain by the visual feature network.
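The contrast between max pooling over views and the Gaussian re-expression z_views = m_v + ωs_v can be sketched as below. Since the formulas for m_v and s_v are not reproduced in this text, the sketch assumes they are the per-dimension mean and standard deviation over the k_views features; the function names are hypothetical.

```python
import math
import random

def max_pool_views(view_feats):
    """Baseline: element-wise maximum over the per-view feature vectors."""
    return [max(col) for col in zip(*view_feats)]

def gaussian_view_params(view_feats):
    """Assumed m_v and s_v: per-dimension mean and standard deviation over views."""
    k = len(view_feats)
    mean = [sum(col) / k for col in zip(*view_feats)]
    std = [math.sqrt(sum((x - mu) ** 2 for x in col) / k)
           for col, mu in zip(zip(*view_feats), mean)]
    return mean, std

def sample_z(mean, std, rng):
    """z = m + omega * s, with omega drawn from N(0, 1) independently of the model."""
    return [mu + rng.gauss(0.0, 1.0) * sd for mu, sd in zip(mean, std)]

feats = [[1.0, 4.0], [3.0, 4.0]]          # two views, two feature dimensions
m_v, s_v = gaussian_view_params(feats)    # m_v = [2.0, 4.0], s_v = [1.0, 0.0]
z_views = sample_z(m_v, s_v, random.Random(0))
```

In the second dimension both views agree, so s_v is zero there and the sampled z_views equals the mean; the first dimension, where the views disagree, carries the uncertainty that max pooling would discard.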

Existing methods ignore the uncertainty of single-instance representation caused by multiple views and have difficulty capturing the subtle differences contributed by each view. Methods based on fused view features also tend to make pedestrian re-recognition over-fit, which reduces accuracy when re-recognition is performed on new instances. In the Gaussian visual and mixed Gaussian prototype path, the invention for the first time replaces the original, simple generation of features by taking the maximum over the target-domain views with a Gaussian distribution z_views formed by multi-view modeling, thereby capturing the relation among the multi-view information of a pedestrian.

Correspondingly, inspired by semi-supervised consistency regularization, the embodiment of the invention for the first time forms a multi-image from each source-domain image through an image enhancement algorithm, extracts features through a neural network, calculates the uncertainty among the multiple images, and models them to form a new Gaussian distribution expression of the source-domain multi-image, z_imgs = m_imgs + ωs_imgs. This enhances the robustness of the pedestrian re-recognition network and improves the accuracy with which multiple images represent a given instance. The feature m_imgs and the uncertainty s_imgs are obtained in the same way as for the target domain, which is not described again.

Similar to step 202, based on an unsupervised adversarial domain adaptation learning strategy, the classifier and the domain discriminator are trained to guide the alignment of the source and target domains. Next, the new Gaussian representations z_imgs and z_views generated from the source-domain and target-domain instances are each fitted with a Gaussian mixture model. Rather than adopting the pseudo-labels directly, a new high-order Gaussian prototype feature expression is obtained, so that the model is constrained after each training and the reliability of the predicted pseudo-labels improves, which avoids the loss of pedestrian re-recognition accuracy caused by using pseudo-labels directly. The probability distribution of the Gaussian mixture model is as follows:

where the three mixture parameters denote the expectation, variance and probability of the j-th class in the mixture model, and m_j and s_j are the expectation and variance of each instance. The parameters of the Gaussian mixture model are obtained by iterating the EM algorithm, which yields the features and uncertainty of each class in these distributions and hence the new high-order Gaussian prototype feature expression of the source-domain and target-domain classes. At the same time, the inter-domain distribution is constrained through the JS divergence to reduce the inter-domain difference; the specific implementation is similar to formula (6) and is not repeated. In addition, the contrastive center loss L_ccl pulls Gaussian instances toward the high-order Gaussian prototype of the same class in the Gaussian feature space, as follows:

where N_class is the number of instances of the current batch; d_ED is the squared Euclidean distance; the instance term is the Gaussian feature of the n-th instance of class j; the prototype term is the high-order Gaussian prototype feature of the current batch; n_class is the total number of high-order Gaussian prototypes of the current batch; l≠j indicates that the high-order Gaussian prototype label l differs from the label j of the Gaussian instance; γ is a fixed hyperparameter that limits the maximum value of the latter term of formula (9), preventing the pedestrian re-recognition network from entering an unstable state during training; and θ is a hyperparameter of the contrastive center loss L_ccl. Both γ and θ require manual tuning.
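A contrastive center loss of this kind can be sketched as follows. Formula (9) is not reproduced in this text, so the sketch assumes the commonly used contrastive-center form, in which the squared distance to the same-class prototype is divided by the summed squared distances to the other prototypes plus γ, and θ scales the result. The placement of γ and θ and the function names are assumptions, not the patent's definition.

```python
def sq_dist(a, b):
    """Squared Euclidean distance d_ED between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def contrastive_center_loss(instances, labels, prototypes, gamma=1.0, theta=1.0):
    """Assumed form: theta * mean_n [ d(z_n, p_j) / (sum_{l != j} d(z_n, p_l) + gamma) ]."""
    total = 0.0
    for z, j in zip(instances, labels):
        pull = sq_dist(z, prototypes[j])                      # same-class term
        push = sum(sq_dist(z, p)                              # other-class term
                   for l, p in enumerate(prototypes) if l != j)
        total += pull / (push + gamma)                        # gamma bounds the ratio
    return theta * total / len(instances)

prototypes = [[0.0, 0.0], [4.0, 0.0]]
well_placed = contrastive_center_loss([[1.0, 0.0]], [0], prototypes)   # 1 / (9 + 1)
misplaced = contrastive_center_loss([[3.0, 0.0]], [0], prototypes)     # 9 / (1 + 1)
```

An instance near its own prototype and far from the others yields a small value, while an instance that drifts toward another class's prototype is penalized sharply, which matches the compact-within-class, dispersed-between-class behavior described above.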

Using the JS divergence together with the contrastive center loss L_ccl addresses both the domain gap in pedestrian re-recognition and its tendency to over-fit, and improves the accuracy of pedestrian re-recognition on new instances. It also overcomes the shortcoming that earlier instance-based and prototype-based methods did not combine instances with prototypes. With the Gaussian distributions, Gaussian instances lie closer in the feature space to the high-order Gaussian prototype of the same class, that is, their distributions become more similar and the link between instance and prototype becomes tighter, while Gaussian instances are dispersed from the high-order Gaussian prototypes of different classes, thereby improving the accuracy of pedestrian re-recognition.
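The EM iteration used in step 205 to fit the Gaussian mixture model can be illustrated with a one-dimensional sketch. The reduction to one dimension, the initialization, the iteration count and the names `normal_pdf` and `fit_gmm_1d` are assumptions made for brevity; the patent fits the mixture to the Gaussian representations z_imgs and z_views.

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(xs, means, iters=50, min_var=1e-6):
    """Return (means, variances, weights) of a 1-D Gaussian mixture fitted by EM."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            probs = [w * normal_pdf(x, mu, v)
                     for w, mu, v in zip(weights, means, variances)]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate probability, expectation and variance per component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(min_var, var)   # floor avoids a collapsed component
    return means, variances, weights

samples = [-0.2, 0.0, 0.2, 4.8, 5.0, 5.2]
gmm_means, gmm_vars, gmm_weights = fit_gmm_1d(samples, means=[0.5, 4.5])
```

On these two well-separated groups the fitted expectations settle near 0 and 5 with roughly equal probabilities; the per-component expectation and variance play the role of the class feature and class uncertainty described in the text.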

206: z(m, ω, s) is explicitly constrained to approach the standard normal distribution N(0, 1), measured by the KL divergence between the two distributions;

A regularization term is introduced in the model optimization process. For both the visual and Gaussian prototype path and the Gaussian visual and mixed Gaussian prototype path, z(m, ω, s) is explicitly constrained to approach the standard normal distribution N(0, 1), measured by the KL divergence between the two distributions. This prevents the new feature expression z = m + ωs obtained by prototype-level and instance-level Gaussian modeling from degenerating into the original deterministic expression z = m + c (c being a constant). The simplified formula is:

$$L_{sim} = \frac{m^{2} + s^{2} - \log s^{2} - 1}{2} \qquad (10)$$
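The simplified expression is the closed-form KL divergence between a scalar Gaussian N(m, s^2) and N(0, 1). The sketch below checks this numerically; the integration bounds, step count and function names are illustrative assumptions.

```python
import math

def kl_to_standard_normal(m, s):
    """Closed form of KL(N(m, s^2) || N(0, 1)), matching the simplified formula."""
    return (m ** 2 + s ** 2 - math.log(s ** 2) - 1) / 2

def kl_numeric(m, s, lo=-12.0, hi=12.0, steps=20000):
    """Midpoint-rule integral of p(x) * log(p(x) / q(x)) as an independent check."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        log_p = -0.5 * math.log(2 * math.pi * s * s) - (x - m) ** 2 / (2 * s * s)
        log_q = -0.5 * math.log(2 * math.pi) - x * x / 2
        total += math.exp(log_p) * (log_p - log_q) * dx
    return total

closed = kl_to_standard_normal(0.5, 2.0)
numeric = kl_numeric(0.5, 2.0)
```

The loss is zero exactly when m = 0 and s = 1, and the -log s^2 term grows without bound as s approaches 0, which is what discourages the uncertainty from collapsing and the representation from degenerating into a deterministic one.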

207: A new Gaussian feature representation is acquired, and the constraint of formula (9) is applied again; inspired by the triplet loss, L_norm is used to constrain the overall distribution, which addresses the tendency of pedestrian re-recognition to over-fit and improves the accuracy of pedestrian re-recognition on new instances;

In the global Gaussian mixture path, the embodiment of the invention uses a memory network so that each round of training has access to previous semantic information. The prototype-level Gaussian features of the pedestrian images and views are retained after each round of training and are each passed through a Gaussian mixture model, whose parameters are used to generate a new global Gaussian mixture feature representation. The constraint of formula (9) is applied again, so that Gaussian instances of the current batch lie closer in the feature space to the new global Gaussian mixture prototype of the same class and are dispersed from those of different classes. For the first time in cross-view pedestrian re-recognition, each round of training thus carries previous semantic information, which improves the accuracy of pedestrian re-recognition. Inspired by the triplet loss, the invention constrains the overall distribution using the normalized current-batch Gaussian instances and the new high-order Gaussian prototypes of different classes, giving the normalized triplet loss L_norm as follows:

where η is the minimum distance between the current Gaussian prototype and the global Gaussian mixture prototypes of different classes; J is the number of classes; the two prototype terms denote the new Gaussian feature representations of the same class and of different classes, respectively, at the current prototype level; d_ED is the squared Euclidean distance; and ‖z‖_2 denotes L2 normalization, which brings the new Gaussian feature representations of different classes to the same order of magnitude. To prevent the new Gaussian feature representations of the class centers from collapsing into a very small region, the distance from the current prototype to a global prototype of a different class is required to exceed its distance to the global prototype of the same class by at least η. As a result, the class centers are learned with more accurate semantic representations, the same class is more compact in space and different classes are more dispersed. Because each round of training carries previous semantic information, the accuracy of cross-view pedestrian re-recognition can be greatly improved.
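A margin-based loss over L2-normalized prototypes can be sketched as follows. Because the formula for L_norm is not reproduced in this text, the sketch assumes the standard hinge form of the triplet loss with margin η, averaged over the J classes; the averaging and the function names are assumptions.

```python
import math

def l2_normalize(v):
    # Assumes a non-zero vector; brings all prototypes to unit length.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def sq_dist(a, b):
    """Squared Euclidean distance d_ED."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def normalized_triplet_loss(current, global_protos, eta=0.5):
    """current[j] and global_protos[j] are the class-j prototype features."""
    cur = [l2_normalize(v) for v in current]
    glo = [l2_normalize(v) for v in global_protos]
    total = 0.0
    for j, anchor in enumerate(cur):
        pos = sq_dist(anchor, glo[j])                 # same-class global prototype
        for l, other in enumerate(glo):
            if l != j:                                # different-class prototypes
                total += max(0.0, pos - sq_dist(anchor, other) + eta)
    return total / len(cur)

aligned = normalized_triplet_loss([[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 3.0]])
swapped = normalized_triplet_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 3.0], [2.0, 0.0]])
```

Normalization makes the prototypes `[2, 0]` and `[1, 0]` coincide, so the aligned case incurs no loss, whereas swapping the global prototypes violates the margin for every class.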

208: The final constrained result of step 207 is applied to pedestrian re-recognition, improving its recognition accuracy.

Step 208 includes:

Using the cross-view pedestrian re-recognition database as input, multiple rounds of training with the method of steps 201-207 yield a complete cross-view pedestrian re-recognition technique with high recognition accuracy. A cross-view pedestrian re-recognition test is then performed: an image of a single pedestrian is input, features are extracted by the technique described in the invention, and the JS divergence between these features and the features of the other pedestrians in the database is calculated, thereby measuring the similarity between the input pedestrian image and the cross-view views of all pedestrians in the database. Because the JS divergence ranges from 0 to 1, it describes the degree of difference between two distributions in the feature space well, so the pedestrian cross-view views with smaller difference, which rank first, are easily screened out. The specific operation is as follows:

First, the pedestrian image to be recognized is uploaded as input on the recognition page and passed through the pedestrian re-recognition model trained on the cross-view pedestrian re-recognition database; the features of the input pedestrian image are obtained after the background model operation;

Second, similarity is measured between the pedestrian image features obtained in the background and the cross-view view features of all pedestrians in the database. Before this step, the features of the cross-view views of all pedestrians in the database are obtained by model operation and stored, so that after input only the features of the pedestrian image to be recognized need to be extracted, and the image features of the cross-view pedestrian recognition database do not have to be extracted again. A simple matrix multiplication between the input pedestrian image features and the stored cross-view view features is used to compute the JS-divergence similarity, which meets the practical requirements of pedestrian image recognition and saves time;

Third, the recognition result for the single pedestrian image is presented on the recognition result page, showing, ranked by similarity, the pedestrian views from different viewing angles that differ least from the input pedestrian image.
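The three test steps above can be sketched as a small retrieval routine. The sketch assumes that features are turned into distributions with a softmax before the JS divergence is computed, and that the gallery is a dictionary of precomputed features; both choices, the sample data and all names are hypothetical, and the matrix-multiplication shortcut mentioned in the text is not reproduced.

```python
import math

def softmax(v):
    top = max(v)
    exps = [math.exp(x - top) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

def js_divergence(p, q):
    """JS divergence in bits between two strictly positive distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    kl = lambda u, w: sum(a * math.log2(a / b) for a, b in zip(u, w))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def build_index(gallery_feats):
    """Done once, before any query: store a distribution per gallery view."""
    return {name: softmax(f) for name, f in gallery_feats.items()}

def rank_gallery(query_feat, index):
    """Return gallery names ordered from smallest to largest JS divergence."""
    q = softmax(query_feat)
    scored = sorted((js_divergence(q, dist), name) for name, dist in index.items())
    return [name for _, name in scored]

index = build_index({
    "person_a_view1": [2.0, 0.0, 0.0],
    "person_b_view1": [0.0, 2.0, 0.0],
    "person_a_view2": [1.9, 0.1, 0.0],
})
ranking = rank_gallery([2.0, 0.0, 0.0], index)
```

Building the index once mirrors the stored-feature step described in the text: each query then costs one feature extraction plus one comparison per gallery entry.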

In summary, through steps 201-208, the method and the device address the problem of erroneous labels among unsupervised domain adaptation pseudo-labels by using a moving semantic transfer network to move and align the class centers, achieving more accurate semantic representation learning and thereby relieving the pedestrian re-recognition matching deviation caused by noisy pseudo-labels. At the same time, the data uncertainty of the continuously mapped feature space is removed, addressing both the inherent noise of the data set and the data uncertainty caused by mapping images and views. The similarity of the Gaussian distributions between the two domains is measured by the JS divergence, which reduces the degree of difference between the domains; under unsupervised domain adaptation, the data representation capability of pedestrian images and views mapped into the feature space is improved, and with it the accuracy of pedestrian re-recognition. The uncertainty information of the multiple paths is exploited to explore the data representation capability of cross-view pedestrian images and views mapped into the feature space, so that the instance-representation uncertainty caused by multiple views is learned better. This addresses the tendency of pedestrian re-recognition to over-fit, improves the accuracy of pedestrian re-recognition on new instances in practical applications, and improves robustness, thereby improving the performance of cross-view pedestrian re-recognition.

Example 3

A cross-view pedestrian re-recognition device based on Gaussian modeling, the device comprising: a processor and a memory, the memory having stored therein program instructions, the processor invoking the program instructions stored in the memory to cause the device to perform the following method steps:

Through a memory network, the Gaussian distributions retained in the Gaussian prototype-level path are each passed through a Gaussian mixture model to generate a new Gaussian feature representation; the overall distribution is constrained by the normalized triplet loss, and the final recognition result is output.

Forming the new representation of the target-domain multiple views by modeling the multi-view features extracted by the neural network and the uncertainty between views specifically comprises:

The multi-view features extracted by the neural network and the uncertainty between views are modeled to form the Gaussian distribution of the target-domain multiple views. Each source-domain image is turned into a multi-image by image enhancement, features are extracted by the neural network, the uncertainty among the multiple images is calculated, and the Gaussian distribution of the source-domain multi-image is formed by modeling. The uncertainty within the cross-view pedestrian source-domain multi-image and target-domain multiple views is used to represent the features of a single pedestrian;

the new Gaussian distribution expressions generated from the source-domain and target-domain instances are each fitted with a Gaussian mixture model, and the expectation-maximization algorithm is iterated to obtain the features and uncertainty of each class;

and the new high-order Gaussian prototype feature expression of the source-domain and target-domain classes is obtained, with the inter-domain distribution constrained through the JS divergence.

The visual features are equivalently re-expressed as follows:

The class-center feature C^j is given by:

where n_class is the number of instances of class C^j in the current batch, and the instance term is the feature of the i-th instance of the j-th class.
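As a minimal sketch of the class-center computation, the following assumes that C^j is the arithmetic mean of the features of the n_class instances of class j in the current batch, which is what the symbol definitions suggest; the formula itself is not reproduced in this text, and the function name is hypothetical.

```python
from collections import defaultdict

def class_centers(features, labels):
    """Return {label: mean feature vector} over the instances of the current batch."""
    groups = defaultdict(list)
    for feat, label in zip(features, labels):
        groups[label].append(feat)
    return {label: [sum(col) / len(feats) for col in zip(*feats)]
            for label, feats in groups.items()}

batch_features = [[1.0, 2.0], [3.0, 4.0], [10.0, 0.0]]
batch_labels = [0, 0, 1]
centers = class_centers(batch_features, batch_labels)
```

Here class 0 has two instances and class 1 has one, so the centers are `[2.0, 3.0]` and `[10.0, 0.0]` respectively.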

Further, the new representation of the target-domain multiple views is expressed as:

The new Gaussian representation generated from the source-domain and target-domain instances is:

$$z_{imgs} = m_{imgs} + \omega s_{imgs}$$

where m_imgs is the feature, s_imgs is the uncertainty, and ω is random noise sampled from N(0, 1).

Further, a regularization term is introduced so that the representation is explicitly constrained to approach the standard normal distribution, measured by the KL divergence, specifically:

$$L_{sim} = \frac{m^{2} + s^{2} - \log s^{2} - 1}{2}$$

where m is the class-center feature and s is the predicted uncertainty of the class-center feature of the current batch.

Constraining the overall distribution by the normalized triplet loss is:

the overall distribution is constrained using the normalized current-batch Gaussian instances and the new high-order Gaussian prototypes of different classes, as follows:

where the two prototype terms denote the new Gaussian feature representations of the same class and of different classes, respectively, at the current prototype level; d_ED is the squared Euclidean distance; ‖z‖_2 denotes L2 normalization, which brings the new Gaussian feature representations of different classes to the same order of magnitude; η is the minimum distance between the current Gaussian prototype and the global mixed Gaussian prototypes of different classes; the global term is the global mixed Gaussian prototype; and J is the number of classes.

Outputting the final recognition result is:

An image of a single pedestrian is input, features are extracted, and the JS divergence between these features and the features of the other pedestrians in the database is calculated to measure the similarity between the input pedestrian image and the cross-view views of all pedestrians in the database; the pedestrian cross-view views with smaller difference, which rank first, are screened out.

It should be noted that the device description in the above embodiment corresponds to the method description in the embodiments, which is not repeated here.

The processor and the memory may be embodied in any device with computing capability, such as a computer, a single-chip microcomputer or a microcontroller. The embodiment of the invention does not limit the executing device, which is selected as needed in practical applications. Data signals are transmitted between the memory and the processor through a bus, which is not described in detail here.

Based on the same inventive concept, the embodiment of the present invention also provides a computer readable storage medium, where the storage medium includes a stored program, and when the program runs, the device on which the storage medium resides is controlled to execute the method steps in the above embodiment.

The computer readable storage medium includes, but is not limited to, flash memory, hard disk, solid state disk, and the like.

It should be noted that the readable storage medium description in the above embodiment corresponds to the method description in the embodiments, which is not repeated here.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions in accordance with embodiments of the invention are produced in whole or in part.

The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in or transmitted across a computer-readable storage medium. The computer readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The usable medium may be a magnetic medium or a semiconductor medium, or the like.

References:

[1] Oza P, Sindagi V A, Sharmini V V, et al. Unsupervised domain adaptation of object detectors: A survey[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023.

[2] LeCun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.

[3] Ganin Y, Lempitsky V. Unsupervised domain adaptation by backpropagation[C]//International conference on machine learning. PMLR, 2015: 1180-1189.

[4] Long M, Zhu H, Wang J, et al. Deep transfer learning with joint adaptation networks[C]//International conference on machine learning. PMLR, 2017: 2208-2217.

[5] Long M, Cao Z, Wang J, et al. Conditional adversarial domain adaptation[J]. Advances in neural information processing systems, 2018, 31.

[6] Zhou H, Nie W, Li W, et al. Hierarchical instance feature alignment for 2D image-based 3D shape retrieval[C]//Proceedings of the Twenty-Ninth International Conference on International Joint Conferences on Artificial Intelligence. 2021: 839-845.

[7] Zhou H, Nie W, Song D, et al. Semantic consistency guided instance feature alignment for 2D image-based 3D shape retrieval[C]//Proceedings of the 28th ACM International Conference on Multimedia. 2020: 925-933.

[8] Xie S, Zheng Z, Chen L, et al. Learning semantic representations for unsupervised domain adaptation[C]//International conference on machine learning. PMLR, 2018: 5423-5432.

[9] Zhou H, Liu A A, Nie W. Dual-level embedding alignment network for 2D image-based 3D object retrieval[C]//Proceedings of the 27th ACM International Conference on Multimedia. 2019: 1667-1675.

[10] Long M, Wang J, Ding G, et al. Transfer feature learning with joint distribution adaptation[C]//Proceedings of the IEEE international conference on computer vision. 2013: 2200-2207.

[11] Feng Y, Zhang Z, Zhao X, et al. Group-view convolutional neural networks for 3D shape recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 264-272.

Those skilled in the art will appreciate that the drawings are schematic representations of only one preferred embodiment, and that the above-described embodiment numbers are merely for illustration purposes and do not represent advantages or disadvantages of the embodiments.

The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.

Claims

1. A cross-view pedestrian re-recognition method based on Gaussian modeling, the method comprising:

2. The cross-view pedestrian re-recognition method based on Gaussian modeling according to claim 1, wherein the modeling with the multi-view features extracted by the neural network and the uncertainty between views specifically comprises:

3. The cross-view pedestrian re-recognition method based on Gaussian modeling according to claim 2, wherein the visual features are equivalently re-expressed as follows:

The class-center feature C^j is given by:

4. The cross-view pedestrian re-recognition method based on Gaussian modeling according to claim 1, wherein the new representation of the target-domain multiple views is expressed as:

5. The cross-view pedestrian re-recognition method based on Gaussian modeling according to claim 1, wherein the new Gaussian representation generated from the source-domain and target-domain instances is:

$$z_{imgs} = m_{imgs} + \omega s_{imgs}$$

where m_imgs is the feature, s_imgs is the uncertainty, and ω is random noise sampled from N(0, 1).

6. The cross-view pedestrian re-recognition method based on Gaussian modeling according to claim 1, wherein the regularization term is introduced so that the representation is explicitly constrained to approach the standard normal distribution, measured by the KL divergence, specifically:

$$L_{sim} = \frac{m^{2} + s^{2} - \log s^{2} - 1}{2}$$

7. The cross-view pedestrian re-recognition method based on Gaussian modeling according to claim 1, wherein constraining the overall distribution by the normalized triplet loss is:

8. The cross-view pedestrian re-recognition method based on Gaussian modeling according to claim 1, wherein outputting the final recognition result is:

9. A cross-view pedestrian re-recognition device based on Gaussian modeling, the device comprising: a processor and a memory, the memory having stored therein program instructions, the processor invoking the program instructions stored in the memory to cause the device to perform the method steps of any of claims 1-8.

10. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method steps of any of claims 1-8.