CN116486483A - Cross-view pedestrian re-recognition method and device based on Gaussian modeling - Google Patents

Info

Publication number
CN116486483A
Authority
CN
China
Prior art keywords
gaussian
pedestrian
view
uncertainty
domain
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310449124.7A
Other languages
Chinese (zh)
Inventor
李文辉
周厚燃
刘安安
宋丹
孙正雅
聂婕
魏志强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Application filed by Tianjin University
Priority to CN202310449124.7A
Publication of CN116486483A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/23 Recognition of whole body movements, e.g. for sport training
    • G06V40/25 Recognition of walking or running movements, e.g. gait recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/765 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a cross-view pedestrian re-recognition method and device based on Gaussian modeling. The method comprises: re-modeling the visual features as a Gaussian distribution from the class-center features of the current batch and the predicted uncertainty of those class-center features, randomly sampling noise that is independent of the model parameters so that the visual features are equivalently re-expressed, and measuring the similarity of the Gaussian distributions of the two domains with the JS divergence; modeling a new multi-view representation of the target domain from the multi-view features extracted by a neural network and the uncertainty between views; introducing a regularization term that explicitly constrains the new representation toward the standard normal distribution, measured by the KL divergence; and retaining in a memory network the Gaussian distributions of the Gaussian prototype-level path, generating new Gaussian feature representations from them with a Gaussian mixture model, constraining the overall distribution with a normalized triplet loss, and outputting the final recognition result. The device comprises a processor and a memory.

Description

Cross-view pedestrian re-recognition method and device based on Gaussian modeling
Technical Field
The invention relates to the field of cross-view pedestrian re-recognition, in particular to a cross-view pedestrian re-recognition method and device based on Gaussian modeling.
Background
With the wide application of deep learning, object retrieval has attracted attention in multimedia, biometric recognition, computer vision and related fields [1], and in particular its practical application to pedestrian re-recognition. Learning a highly accurate pedestrian re-recognition model relies on the availability of large-scale annotated datasets, and the performance of such a model can drop sharply when it is evaluated on a different dataset that lacks ground-truth labels. Progress in pedestrian re-recognition is largely due to deep convolutional neural networks [2], most of which are trained in a supervised manner and are difficult to transfer. Unsupervised domain adaptation, which trains a pedestrian re-recognition model from an existing labeled source-domain dataset and an unlabeled target-domain dataset, has therefore received great attention. In particular, cross-domain multi-view pedestrian re-recognition, the task of using an image to retrieve the corresponding multi-view target in an unlabeled dataset, is attracting growing interest.
Most existing approaches are adversarial-based [3-5], instance-based [6,7], prototype-based [8,9], or based on fused view features [10,11]; they eliminate inter-domain differences to accomplish the domain adaptation task. On the one hand, the prototype-based domain adaptation method DLEA [9] makes up for the fact that adversarial-based methods ignore semantic information, and it reduces the domain gap. However, it pays no attention to the data uncertainty introduced when images or multi-view targets are mapped into the feature space and cannot handle intra-class differences and similarities well, so pedestrian re-recognition is inaccurate; moreover, because of the inter-domain difference, the noisy pseudo labels remain unreliable.
On the other hand, these methods ignore the learning of uncertainty information. For example, GVCNN [11], which is based on fused view features, and HIFA [6], which is instance-based, neglect the uncertainty that multiple views introduce into the representation of a single instance, so they struggle to capture the subtly different information carried by each view.
In summary, existing cross-view pedestrian re-recognition methods have the following three shortcomings:
1. Existing methods directly use the features extracted from images or multiple views for semantic alignment and do not learn the uncertainty information introduced by mapping into the feature space, which reduces the accuracy of pedestrian re-recognition;
2. Conventional prototype-based methods simply apply pseudo labels and do not address how to make them more reliable. By contrast, a high-order prototype can be obtained without pseudo labels through a new feature expression, the high-order Gaussian prototype of the samples, so that the model can be constrained after each training round to improve the reliability of the predicted pseudo labels and avoid the loss of re-recognition accuracy caused by using pseudo labels directly;
3. Methods that use image features alone ignore the uncertainty information that multiple views introduce into an instance representation, and existing methods that fuse view features tend to make pedestrian re-recognition overfit, which reduces accuracy when a new instance is used for re-recognition.
Disclosure of Invention
The invention provides a cross-view pedestrian re-recognition method and device based on Gaussian modeling. A moving class center solves the problem that, with small batches, a class may be absent from a given batch, and reduces the adverse effect of pseudo labels. Modeling the uncertainty inherent in the data handles its noise and remedies data uncertainty in the continuous mapping space. An image enhancement module is designed to improve the robustness of the pedestrian re-recognition model while exploring the uncertainty information introduced by multiple images and multiple views. Inspired by the triplet loss, a memory network records the global Gaussian prototypes under a loss constraint, so that samples of the same class are compact and samples of different classes are dispersed. This improves accuracy when a new instance is used for re-recognition, improves overall re-recognition precision, and still meets requirements across diverse usage scenarios. The details are as follows:
In a first aspect, a cross-view pedestrian re-recognition method based on Gaussian modeling comprises:
re-modeling the visual features from the class-center features of the current batch and the predicted uncertainty of those class-center features, randomly sampling noise that is independent of the model parameters so that the visual features are equivalently re-expressed, and measuring the similarity of the Gaussian distributions of the two domains with the JS divergence;
modeling a new multi-view representation of the target domain from the multi-view features extracted by the neural network and the uncertainty between views;
introducing a regularization term that explicitly constrains the new representation toward the standard normal distribution, measured by the KL divergence;
retaining in a memory network the Gaussian distributions of the Gaussian prototype-level path, generating new Gaussian feature representations from them with a Gaussian mixture model, constraining the overall distribution with a normalized triplet loss, and outputting the final recognition result.
In a second aspect, a cross-view pedestrian re-recognition device based on Gaussian modeling comprises a processor and a memory, the memory storing program instructions, and the processor invoking the program instructions stored in the memory to cause the device to perform the method steps of any of the first aspects.
In a third aspect, a computer-readable storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to perform the method steps of any of the first aspects.
The technical solution provided by the invention has the following beneficial effects:
1. In the visual and prototype paths, a moving semantic transfer network moves and aligns the class centers to address the erroneous labels among the pseudo labels used in unsupervised domain adaptation. Instead of treating pseudo-labeled samples directly as real samples, each class center is aligned between the source domain and the target domain, so that correct class centers neutralize the adverse effect of wrong ones. Because the batches in the experiments are small, the moving class center yields more accurate semantic representation learning, which alleviates the re-recognition matching bias caused by noisy pseudo labels;
2. In the visual and Gaussian prototype paths, a Gaussian distribution is modeled at the prototype level from the class-center feature m of the current batch and the intra-class uncertainty s. This handles the inherent noise of the dataset, remedies data uncertainty in the continuously mapped feature space, and addresses the data uncertainty caused by mapping images and views. The feature is re-expressed as z = m + ωs, where ω ~ N(0, 1). A regularization term is introduced during optimization: z(m, ω, s) is explicitly constrained toward the standard normal distribution N(0, 1), measured by the KL divergence between the two distributions, to keep the new feature expression z from degenerating into the original deterministic representation z = m + c (c is a constant). Meanwhile, the JS divergence measures the similarity of the Gaussian distributions of the two domains, which reduces the inter-domain difference, improves the representation of pedestrian images and views mapped into the feature space under unsupervised domain adaptation, and improves re-recognition precision;
3. In the Gaussian visual and Gaussian-mixture prototype paths, to address the uncertainty of multiple views at the instance level in cross-view pedestrian re-recognition, the invention replaces the original simple max pooling over the multiple views of the target domain with modeling the multiple views as a Gaussian distribution, which captures the relationship among a pedestrian's multi-view information. Correspondingly, each source-domain image is expanded into several different images by image enhancement, and these images are modeled as a Gaussian distribution, which improves the robustness of the re-recognition model. At the same time, the KL divergence keeps the new feature expression z from degenerating, and a domain adversarial similarity loss guides the global alignment of the source and target domains, which effectively strengthens the domain-invariant feature expression of the visual feature extraction network;
4. A Gaussian mixture model generates high-order Gaussian feature representations, and the JS divergence measures the inter-domain divergence, so the uncertainty that multiple views introduce into instance representations is learned better. Meanwhile, a contrastive center loss in the Gaussian feature space shortens the distance between Gaussian instances and the high-order Gaussian prototype of the same class, which addresses the tendency of re-recognition to overfit and improves accuracy when a new instance is used;
5. In the global Gaussian-mixture path, a memory network retains the prototype-level Gaussian features of pedestrian images and views after each training round, and a Gaussian mixture model generates a new global Gaussian-mixture feature representation from them, again under a KL-divergence constraint to prevent degeneration of the new feature expression. Inspired by the triplet loss, the global Gaussian-mixture prototypes recorded in the memory network are constrained by a normalized triplet loss, so that the global class centers are learned as more accurate semantic representations, samples of the same class are more compact in the feature space, and samples of different classes are more dispersed, further improving re-recognition precision.
Drawings
FIG. 1 is a flow chart of a cross-view pedestrian re-recognition method based on Gaussian modeling;
FIG. 2 is a network structure diagram of cross-view pedestrian re-recognition based on Gaussian modeling.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in further detail below.
Example 1
Referring to FIG. 1, a cross-view pedestrian re-recognition method based on Gaussian modeling comprises the following steps:
101: Extract the visual features of images (source domain) and of the multiple views of target pedestrians (target domain) with a two-dimensional convolutional neural network;
The embodiment of the invention processes the visual features in two different ways: first, prototype-level processing, i.e., steps 103 and 104; second, data enhancement and instance-level processing, i.e., step 105.
102: Based on an unsupervised adversarial domain adaptation learning strategy under the covariate shift condition, minimize the source-domain classification error and the discrepancy between the source and target domains to train a classifier and a domain discriminator, guiding the global alignment of the source and target domains;
This step effectively strengthens the domain-invariant feature expression of the pedestrian re-recognition visual feature extraction network.
103: Compute a moving class center from the class centers of the previous batch and of the current batch, make the local semantic representation more accurate through a loss constraint, and guide the global alignment of the source and target domains with the domain discriminator;
In the visual and prototype paths, the moving semantic transfer network computes the moving class centers from the class centers of the previous batch and of the current batch. The squared Euclidean distance serves as the loss that aligns each class center between the source and target domains, and correct class centers neutralize the adverse effect of wrong ones;
Because the batches in the experiments are small, the moving-average class center yields more accurate local semantic representation learning. The moving semantic transfer network addresses the erroneous labels among the pseudo labels of unsupervised domain adaptation and thus alleviates the re-recognition matching bias caused by noisy pseudo labels.
A domain adversarial similarity loss measures the inter-domain divergence: an additional domain discriminator judges whether a feature comes from the source domain or the target domain, while the visual feature extraction network tries to fool the discriminator. The two are expected to reach an equilibrium in this game, which guides the global alignment of the source and target domains.
104: Model a Gaussian distribution from the class-center feature m of the current batch and the uncertainty s, re-express the visual feature as z = m + ωs, where ω ~ N(0, 1), and measure the similarity of the Gaussian distributions of the two domains with the JS divergence;
In the visual and Gaussian prototype paths, after the features extracted by the neural network are obtained, the embodiment of the invention remedies data uncertainty in the continuously mapped feature space and addresses the data uncertainty caused by mapping images and views. It is the first to propose modeling a Gaussian distribution from the class-center feature m of the current batch and the intra-class uncertainty s, re-expressing the visual feature as z = m + ωs, where ω ~ N(0, 1). The JS divergence measures the similarity of the Gaussian distributions of the two domains, which improves the representation of pedestrian images mapped into the feature space under unsupervised domain adaptation and improves re-recognition precision.
Compared with previous prototype-based unsupervised domain adaptation methods, the method not only uses the new feature expression built on the current-batch class center to handle the uncertainty introduced by the mapping, but also uses the class centers of both the previous batch and the current batch to counter the adverse effects of small experimental batches and wrong class centers.
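The re-expression z = m + ωs in step 104 can be illustrated with a minimal NumPy sketch. The function name `reparameterize` and the toy values of `m` and `s` are assumptions made for illustration; the patent does not give an implementation. The point of the sketch is that the noise ω is drawn independently of m and s, so z stays a simple function of them while its samples follow N(m, s²).

```python
import numpy as np

def reparameterize(m, s, rng):
    """Draw z = m + omega * s with omega ~ N(0, 1) sampled independently of m and s."""
    omega = rng.standard_normal(m.shape)
    return m + omega * s

rng = np.random.default_rng(0)
m = np.array([0.5, -1.0, 2.0])   # hypothetical class-center feature
s = np.array([0.1, 0.2, 0.05])   # hypothetical intra-class uncertainty (standard deviation)

# Many draws of z recover the mean m and the spread s of the modeled Gaussian.
samples = np.stack([reparameterize(m, s, rng) for _ in range(20000)])
print(samples.mean(axis=0), samples.std(axis=0))
```

In an automatic-differentiation framework the same expression lets gradients reach m and s, because the randomness is confined to ω.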
105: multi-view feature m extracted using neural networks v And uncertainty s between views v Modeling a Gaussian distribution z forming multiple views of a target domain views =m v +ωs v Each image of the source domain is formed into multiple images through an image enhancement algorithm, and the feature m is extracted through a neural network imgs And calculate the uncertainty s between multiple images imgs Modeling a Gaussian distribution z forming a source domain multiple image imgs =m imgs +ωs imgs The uncertainty in the multi-image and multi-view of the target domain of the cross-view pedestrian source domain is used for representing the characteristics of a single pedestrian for the first time, and the robustness of the pedestrian re-recognition model is improved.
Gaussian distribution for generating source domain and target domain instancesNew expression z imgs And z views And respectively obtaining the characteristics and uncertainty of each class in the distribution by fitting a Gaussian mixture model and utilizing an expected maximization algorithm to carry out iterative operation, so as to obtain a high-order Gaussian prototype new characteristic expression of the source domain class and the target domain class. The novel feature expression of the high-order Gaussian prototype is obtained without directly adopting the pseudo tag, so that the model can be constrained after each training, the authenticity of predicting the pseudo tag is improved, the degradation of the pedestrian re-identification precision caused by directly using the pseudo tag is avoided, meanwhile, inter-domain distribution constraint is carried out through JS divergence, and the inter-domain difference is reduced;
In the Gaussian visual and Gaussian-mixture prototype paths, to address at the instance level the uncertainty that multiple views introduce into the representation of a single instance in cross-view pedestrian re-recognition, a Gaussian distribution is modeled. The embodiment of the invention replaces the original simple max pooling over the target-domain views, which generates a compact descriptor feature, with the following:
A Gaussian distribution of the target-domain multiple views, z_views = m_v + ω s_v, is modeled from the multi-view feature m_v extracted by the neural network and the uncertainty s_v between views. The uncertainty among different views can be learned during network training, which makes the multi-view representation of an instance more accurate and captures the relationship among a pedestrian's multi-view information. Correspondingly, the embodiment of the invention expands each source-domain image into multiple images with an image enhancement algorithm, extracts the feature m_imgs with the neural network, computes the uncertainty s_imgs among the multiple images, and models a Gaussian distribution of the source-domain multiple images, z_imgs = m_imgs + ω s_imgs. This is the first use of the uncertainty within the multiple images of the source domain and the multiple views of the target domain to represent the features of a single pedestrian in cross-view re-recognition, and it strengthens the robustness of the pedestrian re-recognition network.
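As a rough illustration of the instance-level modeling above, the sketch below contrasts max pooling over views with a Gaussian built from the same views. It assumes, purely for illustration, that m_v is the per-dimension mean across views and s_v the per-dimension standard deviation; the patent describes the uncertainty as learned during training and does not specify how it is computed. The function name `gaussian_instance` and the toy features are hypothetical.

```python
import numpy as np

def gaussian_instance(view_features, rng, eps=1e-6):
    """Model the V per-view features of one pedestrian as a diagonal Gaussian and sample it."""
    m_v = view_features.mean(axis=0)           # mean feature across views (assumed form of m_v)
    s_v = view_features.std(axis=0) + eps      # spread across views (assumed form of s_v)
    omega = rng.standard_normal(m_v.shape)     # noise independent of m_v and s_v
    return m_v, s_v, m_v + omega * s_v         # z_views = m_v + omega * s_v

rng = np.random.default_rng(0)
views = np.array([[1.0, 2.0],                  # 3 views, 2-dimensional toy features
                  [3.0, 2.0],
                  [2.0, 2.0]])

max_pooled = views.max(axis=0)                 # the baseline: keeps only the largest response
m_v, s_v, z_views = gaussian_instance(views, rng)
print(max_pooled, m_v, s_v, z_views)
```

Max pooling discards how much the views disagree, while the pair (m_v, s_v) keeps that disagreement: the first dimension varies across views and gets a large s_v, the second does not and gets a value near zero.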
Next, the new Gaussian expressions z_imgs and z_views generated from the source-domain and target-domain instances are each fitted with a Gaussian mixture model, iterating with the expectation-maximization (EM) algorithm to obtain the feature and uncertainty of each class in the distribution, which gives a new high-order Gaussian prototype feature expression for the source-domain and target-domain classes. Because the high-order Gaussian prototype is obtained without directly using pseudo labels, the model can be constrained after each training round, which improves the reliability of the predicted pseudo labels and avoids the loss of re-recognition precision caused by using pseudo labels directly. Meanwhile, the JS divergence and a contrastive center loss constrain the inter-domain distributions and reduce the inter-domain difference. The contrastive center loss shortens the distance in the Gaussian feature space between Gaussian instances and the high-order Gaussian prototype of the same class, which addresses the tendency of re-recognition to overfit and improves accuracy when a new instance is used for re-recognition.
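The EM fitting described above can be sketched with a small diagonal-covariance Gaussian mixture in NumPy. This is a generic textbook EM loop, not the patent's implementation: the diagonal covariance, the deterministic initialisation, the function name `fit_gmm_em`, and the synthetic two-cluster data are all assumptions. Each fitted component yields a mean and a standard deviation, which play the role of the per-class feature and uncertainty.

```python
import numpy as np

def fit_gmm_em(x, k, n_iter=50, eps=1e-6):
    """Fit a k-component diagonal-covariance Gaussian mixture to x of shape (n, d) with EM."""
    n, d = x.shape
    # Deterministic initialisation: pick samples spread over the sorted first coordinate.
    order = np.argsort(x[:, 0])
    means = x[order[np.linspace(0, n - 1, k).astype(int)]].copy()
    variances = np.tile(x.var(axis=0) + eps, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample, computed in log space.
        diff = x[:, None, :] - means[None, :, :]                     # (n, k, d)
        log_p = -0.5 * np.sum(diff**2 / variances + np.log(2 * np.pi * variances), axis=2)
        log_p += np.log(weights)
        log_p -= log_p.max(axis=1, keepdims=True)                    # numerical stability
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture weights, means and variances.
        nk = resp.sum(axis=0) + eps
        weights = nk / n
        means = (resp.T @ x) / nk[:, None]
        diff = x[:, None, :] - means[None, :, :]
        variances = np.einsum("nk,nkd->kd", resp, diff**2) / nk[:, None] + eps
    return weights, means, np.sqrt(variances)

rng = np.random.default_rng(0)
x = np.vstack([rng.normal(-2.0, 0.3, size=(200, 2)),   # synthetic "class" 0
               rng.normal(2.0, 0.5, size=(200, 2))])   # synthetic "class" 1
weights, means, stds = fit_gmm_em(x, k=2)
print(np.round(means, 2), np.round(stds, 2))
```

No labels enter the fit, which mirrors the stated motivation of obtaining prototypes without relying directly on pseudo labels.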
106: introducing regularization terms, approximating the standard normal distribution N (0, 1) by an explicit constraint z (m, omega, s), measured by KL divergence between the two distributions;
where the degradation of the new feature expression z after prototype-level and instance-level gaussian modeling to the original deterministic representation z=m+c (c is a constant) is measured by KL divergence.
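For a diagonal Gaussian the KL divergence to the standard normal has a well-known closed form, 0.5 · Σ (m² + s² − ln s² − 1), which the sketch below evaluates. The function name and test values are hypothetical, and the patent does not state which form of the regularizer it uses. The example shows why such a term discourages collapse: as s shrinks toward zero (a nearly deterministic z), the −ln s² term grows without bound.

```python
import numpy as np

def kl_to_standard_normal(m, s):
    """KL( N(m, diag(s^2)) || N(0, I) ), summed over feature dimensions."""
    var = s**2
    return 0.5 * np.sum(m**2 + var - np.log(var) - 1.0)

zero_kl = kl_to_standard_normal(np.zeros(4), np.ones(4))              # already standard normal
mild_kl = kl_to_standard_normal(np.zeros(4), np.full(4, 0.5))         # moderate uncertainty
collapsed_kl = kl_to_standard_normal(np.zeros(4), np.full(4, 0.01))   # nearly deterministic
print(zero_kl, mild_kl, collapsed_kl)
```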
107: the Gaussian distribution remained in the Gaussian prototype path is respectively subjected to a Gaussian mixture model to generate new Gaussian characteristic representation through a memory network, and the whole distribution is constrained through standardized ternary loss;
in the global Gaussian mixture path, the embodiment of the invention uses a memory network to keep the Gaussian characteristics of the pedestrian image and the view in the Gaussian prototype path after each round of training, generates new global Gaussian mixture characteristic representation through a Gaussian mixture model, records the memory network into the global Gaussian mixture prototype through ternary standardized loss constraint by the heuristic of the ternary group loss, so that the category center is more accurate in semantic representation learning, the same category is more compact in space, and different categories are more dispersed in space, thereby improving the accuracy of pedestrian re-identification.
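The effect of a triplet-style constraint (same class compact, different classes dispersed) can be sketched as below. The patent does not define its normalized triplet loss, so this sketch assumes one plausible reading: features are L2-normalized and a hinge with a margin is applied to squared Euclidean distances. The function names, the margin of 0.3, and the toy vectors are all assumptions.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit Euclidean length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge triplet loss on L2-normalized rows, using squared Euclidean distance."""
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    d_ap = np.sum((a - p) ** 2, axis=-1)   # anchor to same-class prototype
    d_an = np.sum((a - n) ** 2, axis=-1)   # anchor to different-class prototype
    return np.maximum(d_ap - d_an + margin, 0.0).mean()

anchor = np.array([[1.0, 0.0]])
same_class = np.array([[0.9, 0.1]])
other_class = np.array([[0.0, 1.0]])

easy = triplet_loss(anchor, same_class, other_class)   # already well separated
hard = triplet_loss(anchor, other_class, same_class)   # roles swapped: heavily penalised
print(easy, hard)
```

The loss is zero once the same-class distance is smaller than the different-class distance by at least the margin, so only violating triplets drive the update.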
108: Apply the final constrained result of step 107 to pedestrian re-recognition and output the final recognition result, thereby improving re-recognition precision.
Step 108 includes:
With a cross-view pedestrian re-recognition database as input, multiple rounds of training with steps 101-107 yield a complete cross-view pedestrian re-recognition model with high recognition accuracy, after which the cross-view re-recognition test is performed. An image of a single pedestrian is input, its features are extracted with the technique described by the invention, and the JS divergence between these features and those of the other pedestrians in the database is computed, which measures the similarity between the input image and every pedestrian's cross-view views in the database. Because the JS divergence ranges from 0 to 1, it describes well how much two distributions differ in the feature space, so the views with the smallest difference, which rank first, are easy to select.
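The ranking step can be sketched with the Jensen-Shannon divergence computed with base-2 logarithms, which is what bounds it to the range 0 to 1. As a simplifying assumption, the sketch treats each feature as a discrete probability vector; how the patent evaluates the JS divergence between its Gaussian representations is not specified here. The names `js_divergence`, `query` and `gallery` are hypothetical.

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logarithms; the result lies in [0, 1]."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()
    q = q / q.sum()
    mix = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0                       # terms with a == 0 contribute nothing
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)

query = [0.7, 0.2, 0.1]                    # toy feature of the input pedestrian
gallery = [[0.1, 0.2, 0.7],                # toy features of database entries
           [0.6, 0.3, 0.1],
           [0.7, 0.2, 0.1]]

scores = np.array([js_divergence(query, g) for g in gallery])
ranking = np.argsort(scores)               # smallest divergence (most similar) first
print(scores, ranking)
```

Identical distributions score 0 and distributions with disjoint support score 1, so the scores are directly comparable across gallery entries.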
In summary, the embodiment of the invention provides a new cross-view pedestrian re-recognition method, designs a new network structure, and improves the performance of cross-view pedestrian re-recognition.
Example 2
The scheme of example 1 is further described in conjunction with specific examples, as follows:
201: A two-dimensional neural network extracts the visual features of images (source domain) and of the multiple views of target pedestrians (target domain). The AlexNet architecture is used as the visual feature network; it comprises five convolutional layers and three fully connected layers;
202: Based on an unsupervised adversarial domain adaptation learning strategy under the covariate shift condition, a classifier C and a domain discriminator D are trained by minimizing the source-domain classification error and the discrepancy between the source and target domains. This guides the global alignment of the source and target domains and effectively strengthens the accuracy and the domain-invariant feature expression of the pedestrian re-recognition visual feature extraction network. The objective function is as follows:
where F is the visual feature network whose parameters are shared by the source and target domains, x_s denotes a source-domain instance, x_t denotes a target-domain instance, L_CE is the cross-entropy loss, and the two sample-set symbols denote all samples of the source domain and of the target domain, respectively.
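The objective of step 202 is not legible in this text. As a hedged sketch only, a standard domain-adversarial objective that is consistent with the symbols F, C, D, x_s, x_t and L_CE defined here could take the form below. The source labels y_s, the trade-off weight λ, and the set symbols 𝒟_s and 𝒟_t are assumptions introduced for illustration and are not taken from the patent.

```latex
\min_{F,\,C}\;\max_{D}\quad
\mathbb{E}_{(x_s,\,y_s)\sim\mathcal{D}_s}\,
  L_{CE}\bigl(C(F(x_s)),\,y_s\bigr)
\;+\;\lambda\Bigl(
  \mathbb{E}_{x_s\sim\mathcal{D}_s}\log D\bigl(F(x_s)\bigr)
  +\mathbb{E}_{x_t\sim\mathcal{D}_t}\log\bigl(1-D(F(x_t))\bigr)
\Bigr)
```

In this form D is trained to tell source features from target features, while F is trained both to classify source samples correctly and to make the two domains indistinguishable to D.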
203: After the visual features are obtained in the visual and prototype path, the moving semantic transfer network aligns the moving class centers at the prototype level to address the erroneous labels among the pseudo labels of unsupervised domain adaptation. The class center of the previous batch and the class center of the current batch are combined to obtain the moving class center, which counteracts the adverse effects of small batches and wrong class centers and thereby alleviates the pedestrian re-recognition matching deviation caused by noisy pseudo labels. The moving class center is updated as follows:
where ρ is the hyperparameter of the moving class center, typically set to 0.3.
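The class center update governed by ρ can be sketched as an exponential moving average. The update formula itself is not reproduced in this text, so which term ρ weights is an assumption: below, ρ weights the current-batch center and (1 - ρ) the previous one. The function name `update_moving_center` is hypothetical.

```python
def update_moving_center(prev_center, batch_center, rho=0.3):
    """Blend the previous class center with the current-batch class center.

    Assumption: rho weights the current batch and (1 - rho) the history;
    the source text only states that rho is typically 0.3.
    """
    return [(1.0 - rho) * p + rho * c for p, c in zip(prev_center, batch_center)]


prev = [0.0, 1.0]   # class center carried over from the previous batch
cur = [1.0, 3.0]    # class center computed on the current batch
moved = update_moving_center(prev, cur)  # lies between prev and cur
```

A small ρ keeps the center stable when a batch contains few (or mislabeled) samples of a class, which is the motivation given in step 203.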
The centroids of each class in the source and target domains are aligned. Because the batches in the experiments are small, the squared Euclidean distance $d_{ED}$ is used to limit the distance between centroids that share a class label but belong to different domains, ensuring that features of the same class are similar, as follows:
where the two centroid symbols denote the j-th class centers of the source domain and the target domain, respectively, and J is the number of class centers in the source and target domains.
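As a minimal sketch of this centroid alignment, the snippet below accumulates the squared Euclidean distance between the j-th class centers of the two domains. Summing over the J classes (rather than averaging) is an assumption, and the function names are hypothetical.

```python
def squared_euclidean(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def center_alignment_loss(source_centers, target_centers):
    """Sum of squared distances between same-class centers of both domains.

    Both arguments are lists of J class centers, ordered by class index.
    """
    return sum(
        squared_euclidean(cs, ct)
        for cs, ct in zip(source_centers, target_centers)
    )


source = [[0.0, 0.0], [1.0, 1.0]]
target = [[1.0, 0.0], [1.0, 3.0]]
loss = center_alignment_loss(source, target)  # 1.0 for class 0 plus 4.0 for class 1
```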
204: The feature $C_j$ of the class center of the current batch and the predicted uncertainty s of that class-center feature are used to improve the representation capability of pedestrian images mapped into the feature space under unsupervised domain adaptation, thereby improving the accuracy of pedestrian re-recognition;
In the visual and Gaussian prototype path, after the features extracted by the neural network are obtained, the embodiment of the invention addresses the data uncertainty of the continuously mapped feature space that arises from mapping images and views, by defining each image or view not as a deterministic point in the feature space but as obeying a Gaussian distribution z(m, ω, s). The embodiment uses the reparameterization trick so that back-propagation can still be performed as in conventional model training. Specifically, the feature $m = C_j$ of the class center of the current batch and the predicted uncertainty s of that class-center feature are used to re-model the representation as $z = m + \omega s$, where $\omega \sim N(0, 1)$ is random noise sampled independently of the model parameters. The visual features are equivalently re-expressed as:
where $n_{class}$ is the number of instances of class j in the current batch, and the remaining symbol denotes the feature of the i-th instance of the j-th class. The feature $C_j$ of the class center of the current batch is given by:
Re-expressing the features of the mapped images and views addresses the data uncertainty caused by continuously mapping them into the feature space. For the first time, a Gaussian distribution is formed by modeling the feature of the class center of the current batch together with the intra-class uncertainty. This overcomes the limitation of conventional methods, which use the features extracted from images or multiple views directly for semantic alignment and do not learn the intra-class uncertainty information that arises after images are mapped into the feature space. Under unsupervised domain adaptation, the representation capability of pedestrian images mapped into the feature space is improved, and with it the accuracy of pedestrian re-recognition.
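The re-expression z = m + ωs can be illustrated with a short sketch. The class center is taken as the mean of the class's instance features in the batch; treating the uncertainty s as the per-dimension standard deviation within the class is an assumption, since the exact formula is not reproduced in this text. All function names are hypothetical.

```python
import math
import random


def class_center(features):
    """Mean feature of the instances of one class in the current batch."""
    n = len(features)
    return [sum(f[d] for f in features) / n for d in range(len(features[0]))]


def class_uncertainty(features, center):
    """Assumption: s is the per-dimension standard deviation within the class."""
    n = len(features)
    return [
        math.sqrt(sum((f[d] - center[d]) ** 2 for f in features) / n)
        for d in range(len(center))
    ]


def reparameterize(m, s, rng):
    """Draw z = m + omega * s with omega ~ N(0, 1) sampled per dimension.

    The noise does not depend on model parameters, so gradients can flow
    through m and s when this is written in an autodiff framework.
    """
    return [mi + rng.gauss(0.0, 1.0) * si for mi, si in zip(m, s)]


feats = [[1.0, 2.0], [3.0, 2.0]]   # two instances of the same class
m = class_center(feats)
s = class_uncertainty(feats, m)
z = reparameterize(m, s, random.Random(0))
```

In the second dimension both instances agree, so s is zero there and the sampled z equals the class center exactly in that dimension.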
Since the Gaussian distribution is formed for the first time by modeling the feature of the class center of the current batch together with the intra-class uncertainty, the similarity of the Gaussian distributions of the two domains is measured with the JS divergence. Because the JS divergence takes values in the range 0-1, it describes well the degree of difference between the two Gaussian distributions, as follows:
where $z_s$ and $z_t$ denote the new Gaussian distribution expressions of the source domain and the target domain, respectively, JS is the JS divergence, and KL is the KL divergence. The method uses the JS divergence to measure similarity within the current batch, which enhances the feature representation capability of the classes, improves on the semantic representation capability of instance-based methods, reduces the domain gap, and greatly reduces the running time of the model. Measuring the similarity of the Gaussian distributions with the JS divergence strengthens the global semantic alignment of the source and target domains, makes the same class more compact and different classes more dispersed in the Gaussian feature space of the pedestrian re-recognition network, and reduces the degree of difference between domains.
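For reference, the JS divergence is built from two KL divergences against the midpoint distribution, and with base-2 logarithms it lies in the range 0-1 as stated above. How the divergence is evaluated for the Gaussian features is not spelled out in this text; the sketch below assumes discrete probability vectors purely for illustration.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """JS(p || q) = 0.5 * KL(p || mix) + 0.5 * KL(q || mix), mix = (p + q) / 2."""
    mix = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mix) + 0.5 * kl_divergence(q, mix)


same = js_divergence([0.5, 0.5], [0.5, 0.5])      # identical distributions
disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])  # no overlap at all
```

Identical distributions give the lower bound 0 and non-overlapping distributions give the upper bound 1, which is what makes the value directly usable as a bounded dissimilarity score.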
The visual features required for steps 203 and 204 are processed as follows: the cross-view pedestrian views are passed through the convolutional network to extract visual features, the features of the views are compared element-wise and the maximum over the views is taken, yielding a single feature representation of the cross-view pedestrian views.
205: The features of the multiple views extracted by the neural network and the uncertainty between the views are modeled to form a new representation $z_{views}$ of the target-domain multiple views;
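The element-wise maximum over views described here can be sketched in a few lines. The feature values are made up for illustration and the function name is hypothetical.

```python
def max_pool_views(view_features):
    """Element-wise maximum over the per-view feature vectors."""
    return [max(column) for column in zip(*view_features)]


# Two views of the same pedestrian, three feature dimensions each.
views = [
    [0.1, 0.9, 0.3],
    [0.4, 0.2, 0.8],
]
pooled = max_pool_views(views)  # one feature vector for the whole view set
```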
The visual features required for step 205 are processed as follows. First, inspired by semi-supervised consistency regularization, the input images of the source domain are perturbed to different degrees such that the classification result of the model still remains close to the correct result. To increase the perturbation, this example applies image enhancement with the following transformations: maximizing the image contrast, rotation, inverting pixel values, adjusting the brightness, color, sharpness, saturation and definition of the image, shearing and translating along the horizontal or vertical direction of the image, equalizing the image histogram, converting to an inverted-color image, reducing the number of bits of each color channel, and masking the information in a region of the image, for 16 different operations in total. In each training iteration, N operations (N equals the number of cross-view pedestrian views) are selected from the 16 operations, and the magnitude M of each image change can be adjusted independently to ensure the robustness of the model.
Through the above operations, the transformed copies of a single image are concatenated to form, for the first time, the source-domain multi-image, which improves the robustness of cross-view pedestrian re-recognition. The multi-image and the target-domain multiple views are then passed through the visual feature extraction network to obtain the multi-image and multi-view features.
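The selection of N operations with magnitude M can be sketched as follows. This is a toy illustration on a small grayscale image stored as nested lists: only four stand-in operations are defined instead of the 16 listed above, each selected operation is applied to the original image to produce one copy (an assumption about how the copies are formed), and all function names are hypothetical.

```python
import random


def invert(img, magnitude):
    """Invert pixel values (magnitude is unused for this operation)."""
    return [[255 - p for p in row] for row in img]


def brighten(img, magnitude):
    """Add a constant to every pixel, clipped to the 8-bit range."""
    return [[min(255, p + magnitude) for p in row] for row in img]


def posterize(img, magnitude):
    """Keep only the top (8 - magnitude) bits of each pixel."""
    keep = min(8, max(1, 8 - magnitude))
    mask = (0xFF << (8 - keep)) & 0xFF
    return [[p & mask for p in row] for row in img]


def translate_x(img, magnitude):
    """Shift columns to the right with wrap-around."""
    shift = magnitude % len(img[0])
    return [row[-shift:] + row[:-shift] if shift else list(row) for row in img]


OPERATIONS = [invert, brighten, posterize, translate_x]


def make_multi_image(img, n_views, magnitude, rng):
    """Pick n_views distinct operations and apply each one to the input image."""
    ops = rng.sample(OPERATIONS, n_views)
    return [op(img, magnitude) for op in ops]


image = [[0, 64], [128, 255]]
multi = make_multi_image(image, n_views=3, magnitude=2, rng=random.Random(0))
```

Choosing `n_views` equal to the number of target-domain views gives the source domain a multi-image with the same cardinality as the target-domain view set.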
In the Gaussian visual and mixed Gaussian prototype path, in order to address, for the first time at the instance level, the uncertainty of representing a single instance in multi-view cross-view pedestrian re-recognition, the relations among the multi-view information of a pedestrian are captured. Similar to step 204, the multi-view visual features are modeled as a Gaussian distribution and equivalently re-expressed as $z_{views} = m_v + \omega s_v$. The embodiment of the invention converts the original compact descriptor feature, generated by simple max pooling over the target-domain multiple views, into:
The multi-view feature extracted by the neural network and the uncertainty between the views are modeled to form the new representation $z_{views}$ of the target-domain multiple views. The uncertainty among different views can be learned during network training, which improves the accuracy with which multiple views represent a given instance. After being modeled as a Gaussian distribution, the target-domain multi-view visual features are equivalently re-expressed as:
where $k_{views}$ is the number of views of the target domain, and the per-view feature symbol denotes the feature extracted from the k-th view of the target domain by the visual feature network.
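A minimal sketch of the instance-level Gaussian over views follows. Taking the mean over views as the feature and the population standard deviation over views as the uncertainty is an assumption, since the formula is not reproduced in this text; the names `multi_view_gaussian` and `sample_instance` are hypothetical.

```python
import random
import statistics


def multi_view_gaussian(view_features):
    """Return (m_v, s_v): per-dimension mean and spread across the views.

    Assumption: the uncertainty between views is the population standard
    deviation of each feature dimension over the views.
    """
    columns = list(zip(*view_features))
    m_v = [statistics.fmean(col) for col in columns]
    s_v = [statistics.pstdev(col) for col in columns]
    return m_v, s_v


def sample_instance(m_v, s_v, rng):
    """Draw z_views = m_v + omega * s_v with omega ~ N(0, 1)."""
    return [m + rng.gauss(0.0, 1.0) * s for m, s in zip(m_v, s_v)]


# Three views of one pedestrian, two feature dimensions each.
views = [[1.0, 4.0], [3.0, 4.0], [5.0, 4.0]]
m_v, s_v = multi_view_gaussian(views)
z_views = sample_instance(m_v, s_v, random.Random(1))
```

Unlike max pooling, this keeps a record of how much the views disagree: the first dimension varies across views and gets a non-zero spread, while the second is identical in every view and stays deterministic.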
Existing methods ignore the uncertainty of single-instance representation caused by multiple views and have difficulty capturing the subtly different information contributed by each view; methods based on fused view features easily cause pedestrian re-recognition to overfit, which reduces accuracy when re-recognizing pedestrians with new instances. In the Gaussian visual and mixed Gaussian prototype path, the invention for the first time converts the original feature, generated by simple max pooling over the target-domain multiple views, into a Gaussian distribution $z_{views}$ formed by multi-view modeling, capturing the relations among the multi-view information of a pedestrian.
Correspondingly, inspired by semi-supervised consistency regularization, the embodiment of the invention for the first time forms a multi-image from each source-domain image through an image enhancement algorithm, extracts features through the neural network, calculates the uncertainty among the images, and models them to form a new Gaussian distribution expression $z_{imgs} = m_{imgs} + \omega s_{imgs}$ of the source-domain multi-image. This enhances the robustness of the pedestrian re-recognition model and improves the accuracy with which multiple images represent a given instance. The feature $m_{imgs}$ and the uncertainty $s_{imgs}$ are computed in the same way as for the target domain and are not described again.
Similar to step 202, the classifier and the domain discriminator are trained based on the unsupervised adversarial domain adaptation learning strategy to guide the alignment of the source domain and the target domain. Next, the new Gaussian distribution representations $z_{imgs}$ and $z_{views}$ generated from the source-domain and target-domain instances are each fitted with a Gaussian mixture model to constrain the pedestrian re-recognition model. Instead of adopting the pseudo labels directly, a high-order Gaussian prototype feature expression is obtained, so that constraining the model after each training round improves the reliability of the predicted pseudo labels and avoids the loss of re-recognition accuracy caused by using pseudo labels directly. The probability distribution of the Gaussian mixture model is as follows:
where the three mixture parameters denote, respectively, the expectation, variance and probability of the j-th class in the mixture model, and $m_j$ and $s_j$ are the expectation and variance of each instance. The parameters of the Gaussian mixture model are obtained by iterating the EM algorithm, which yields the feature and uncertainty of each class in the distributions and hence the new high-order Gaussian prototype feature expression of the source-domain and target-domain classes. At the same time, an inter-domain distribution constraint is imposed through the JS divergence to reduce the difference between domains; the implementation is similar to formula (6) and is not repeated. In addition, the contrast center loss $L_{ccl}$ pulls Gaussian instances toward the high-order Gaussian prototype of the same class in the Gaussian feature space, as follows:
where $N_{class}$ is the number of instances in the current batch, $d_{ED}$ is the squared Euclidean distance, the instance symbol denotes the Gaussian feature of the n-th instance of class j, the prototype symbol denotes a high-order Gaussian prototype feature of the current batch, $n_{class}$ is the total number of high-order Gaussian prototypes in the current batch, and $l \neq j$ indicates that the label l of the high-order Gaussian prototype differs from the label j of the Gaussian instance. γ is a fixed hyperparameter that bounds the maximum value of the latter term of formula (9), preventing the pedestrian re-recognition network from entering an unstable state during training, and θ is a hyperparameter associated with the contrast center loss $L_{ccl}$; both γ and θ require manual tuning.
Using the JS divergence and the contrast center loss $L_{ccl}$ addresses the domain discrepancy in pedestrian re-recognition and the tendency to overfit, and improves accuracy when re-recognizing pedestrians with new instances. It also overcomes the shortcoming that earlier instance-based and prototype-based methods do not combine instances with prototypes: with Gaussian distributions, Gaussian instances lie closer to the high-order Gaussian prototype of the same class in the feature space, that is, the similarity of their distributions is increased and the connection between instances and prototypes becomes tighter, while Gaussian instances are kept apart from the high-order Gaussian prototypes of other classes, thereby improving the accuracy of pedestrian re-recognition.
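One way to picture a loss of this kind is sketched below. Formula (9) is not reproduced in this text, so the functional form is an assumption: a pull term (squared distance to the same-class prototype) plus a hinge-style push term for prototypes with l ≠ j, where γ caps each push contribution and θ is treated as an overall scale. The function names are hypothetical.

```python
def squared_euclidean(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def contrast_center_loss(instances, labels, prototypes, gamma=1.0, theta=1.0):
    """Assumed form: pull to the own-class prototype, push from the others.

    Each push term is max(0, gamma - distance), so it never exceeds gamma.
    """
    total = 0.0
    for z, y in zip(instances, labels):
        pull = squared_euclidean(z, prototypes[y])
        push = sum(
            max(0.0, gamma - squared_euclidean(z, g))
            for l, g in enumerate(prototypes)
            if l != y
        )
        total += pull + push
    return theta * total / len(instances)


prototypes = [[0.0, 0.0], [2.0, 0.0]]
# Instance of class 0 sitting on its own prototype and far from the other one.
loss_good = contrast_center_loss([[0.0, 0.0]], [0], prototypes)
# Instance of class 0 sitting on the prototype of class 1.
loss_bad = contrast_center_loss([[2.0, 0.0]], [0], prototypes)
```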
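The EM iteration used to fit the Gaussian mixture model can be illustrated on one-dimensional data. This is a simplified sketch: the patent fits mixtures over Gaussian features, whereas the example below uses scalar samples, hand-picked initial parameters, and a hypothetical function name `fit_gmm_1d`.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, means, variances, weights, iterations=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture with the EM algorithm."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(iterations):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in data:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(joint)
            resp.append([v / total for v in joint])
        # M-step: re-estimate expectation, variance and mixing probability.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(min_var, var)  # guard against collapse
            weights[j] = nj / len(data)
    return means, variances, weights


data = [-1.1, -0.9, -1.0, 4.9, 5.1, 5.0]   # two well-separated groups
means, variances, weights = fit_gmm_1d(
    data, means=[0.0, 1.0], variances=[1.0, 1.0], weights=[0.5, 0.5]
)
```

After fitting, each component's expectation and variance play the role of a class-level feature and uncertainty, which is how the mixture yields prototype-level Gaussians without relying directly on pseudo labels.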
206: z(m, ω, s) is explicitly constrained to approach the standard normal distribution N(0, 1), measured by the KL divergence between the two distributions;
A regularization term is introduced during model optimization, in both the visual and Gaussian prototype path and the Gaussian visual and mixed Gaussian prototype path: z(m, ω, s) is explicitly constrained to approach the standard normal distribution N(0, 1), measured by the KL divergence between the two distributions. This prevents the new prototype-level and instance-level Gaussian feature expression z = m + ωs from degenerating into the original deterministic expression z = m + c (where c is a constant). The simplified form is:
$L_{sim} = \left(m^{2} + s^{2} - \log s^{2} - 1\right)/2 \qquad (10)$
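Formula (10) is the closed-form KL divergence from a univariate Gaussian N(m, s²) to the standard normal N(0, 1). A small sketch evaluates it per dimension; for a feature vector the terms would be accumulated over dimensions, which is an assumption about the reduction. The function name `l_sim` is hypothetical.

```python
import math


def l_sim(m, s):
    """KL( N(m, s^2) || N(0, 1) ) for one dimension, as in formula (10)."""
    return (m ** 2 + s ** 2 - math.log(s ** 2) - 1.0) / 2.0


at_target = l_sim(0.0, 1.0)   # already standard normal, so no penalty
shifted = l_sim(1.0, 1.0)     # mean moved away from 0
collapsing = l_sim(0.0, 0.1)  # uncertainty shrinking toward a point estimate
```

The `-log s^2` term grows without bound as s approaches 0, which is what discourages the representation from collapsing back to a deterministic point.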
207: A new Gaussian feature representation is obtained and is likewise constrained by formula (9); inspired by the triplet loss, $L_{norm}$ constrains the overall distribution, which addresses the tendency of pedestrian re-recognition to overfit and improves accuracy when re-recognizing pedestrians with new instances;
In the global Gaussian mixture path, the embodiment of the invention uses a memory network so that each training round has access to earlier semantic information. The prototype-level Gaussian features of the pedestrian images and views are retained after each training round and are each passed through a Gaussian mixture model to obtain its parameters, generating a new global mixed Gaussian feature representation. The constraint of formula (9) is applied again, so that Gaussian instances of the current batch lie closer in the feature space to the new global mixed Gaussian prototype of the same class and are kept apart from those of different classes. This is the first time in cross-view pedestrian re-recognition that each training round carries earlier semantic information, which improves the accuracy of pedestrian re-recognition. Inspired by the triplet loss, the invention constrains the overall distribution using the normalized Gaussian instances of the current batch and the new high-order Gaussian prototypes of different classes, giving the normalized triplet loss $L_{norm}$ as follows:
where η is the minimum distance between the current Gaussian prototype and the global mixed Gaussian prototypes of different classes, J is the number of classes, the two prototype symbols denote the new Gaussian feature representations of the same class and of a different class at the current prototype level, $d_{ED}$ is the squared Euclidean distance, and $\|z\|_2$ denotes L2 normalization, which brings the new Gaussian feature representations of different classes to the same order of magnitude. To prevent the class-center new Gaussian feature representations from collapsing into a very small region, the distance to a different-class global prototype is constrained to exceed the distance to the same-class global prototype by at least η. Semantic representation learning of the class centers thus becomes more accurate, the same class is more compact and different classes are more dispersed in the space, and because each training round carries earlier semantic information, the accuracy of cross-view pedestrian re-recognition can be greatly improved.
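The margin behaviour described for η can be sketched with a standard triplet hinge on L2-normalized vectors. The exact formula for $L_{norm}$ is not reproduced in this text, so the form below (one anchor, one same-class prototype, one different-class prototype) is an assumption, and the function names are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def squared_euclidean(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def normalized_triplet_loss(anchor, positive, negative, eta=0.5):
    """Hinge that asks the negative to be at least eta farther than the positive."""
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    return max(0.0, squared_euclidean(a, p) - squared_euclidean(a, n) + eta)


# Same-class prototype points the same way as the anchor: constraint satisfied.
easy = normalized_triplet_loss([2.0, 0.0], [4.0, 0.0], [0.0, 3.0])
# Roles swapped: the different-class prototype is the closer one.
hard = normalized_triplet_loss([2.0, 0.0], [0.0, 3.0], [4.0, 0.0])
```

Because all vectors are normalized first, the differing magnitudes of the inputs (2, 3 and 4 here) do not affect the loss, which is the stated purpose of the L2 normalization.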
208: the final constraint result in the step 207 is applied to pedestrian re-recognition, so that the recognition accuracy of pedestrian re-recognition is improved.
Wherein, this step 208 includes:
Using the cross-view pedestrian re-recognition database as input, a complete cross-view pedestrian re-recognition model with high recognition accuracy is obtained through multiple rounds of training with steps 201-207, and a cross-view pedestrian re-recognition test is then performed. An image of a single pedestrian is input, its features are extracted by the technique described in the invention, and the JS divergence between these features and the features of the other pedestrians in the database is calculated, thereby measuring the similarity between the input pedestrian image and all cross-view pedestrian views in the database. Because the JS divergence takes values in the range 0-1, it describes well the degree of difference between two distributions in the feature space, so the cross-view pedestrian views with the smallest difference, which rank first, are easily screened out. The specific operation is as follows:
First, the pedestrian image to be recognized is uploaded as input on the recognition page and passed through the pedestrian re-recognition model trained on the cross-view pedestrian re-recognition database; the features of the input pedestrian image are obtained after the back-end model computation;
Second, the similarity between the pedestrian image features obtained by the back end and the cross-view features of all pedestrians in the database is measured. Before this step, the features of the cross-view views of all pedestrians in the database have already been computed by the model and stored, so that after an input only the features of the pedestrian image to be recognized need to be extracted, and the image features of the cross-view pedestrian recognition database need not be extracted again. The JS-divergence similarity is computed by a simple matrix multiplication between the input pedestrian image features and the stored cross-view features, which meets the practical requirements of pedestrian image recognition and saves time;
Third, the recognition result for the single pedestrian image is presented on the recognition result page, showing the pedestrian views from different viewing angles ranked by similarity, with those differing least from the input image first.
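The three test-time steps can be sketched as a small ranking routine. Several details are assumptions made for illustration: features are turned into probability vectors with a softmax before the JS divergence is computed, the gallery is a plain dictionary of precomputed features, a loop replaces the matrix formulation mentioned above, and the identifiers (`person_a`, `rank_gallery`, and so on) are hypothetical.

```python
import math


def softmax(v):
    """Map a feature vector to a probability vector (assumed preprocessing)."""
    peak = max(v)
    exps = [math.exp(x - peak) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def js_divergence(p, q):
    """JS divergence in bits between two probability vectors."""
    mix = [(a + b) / 2.0 for a, b in zip(p, q)]

    def kl_to_mix(u):
        return sum(a * math.log2(a / m) for a, m in zip(u, mix) if a > 0)

    return 0.5 * kl_to_mix(p) + 0.5 * kl_to_mix(q)


def rank_gallery(query_feature, gallery):
    """Return gallery identifiers ordered from most to least similar."""
    query = softmax(query_feature)
    scored = [
        (js_divergence(query, softmax(feature)), name)
        for name, feature in gallery.items()
    ]
    return [name for _, name in sorted(scored)]


# Features of the stored cross-view views, extracted once ahead of time.
gallery = {
    "person_a": [3.0, 0.0, 0.0],
    "person_b": [0.0, 3.0, 0.0],
    "person_c": [2.5, 0.5, 0.0],
}
ranking = rank_gallery([3.0, 0.1, 0.0], gallery)
```

Only the query feature is computed at request time; the gallery features are reused across queries, which mirrors the time saving described in the second step.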
In summary, through steps 201-208, the embodiment of the invention uses the moving semantic transfer network to align moving class centers, which addresses the erroneous labels among the pseudo labels of unsupervised domain adaptation and achieves more accurate semantic representation learning, thereby alleviating the pedestrian re-recognition matching deviation caused by noisy pseudo labels. It also addresses the data uncertainty of the continuously mapped feature space, including the inherent noise of the data set and the uncertainty introduced by mapping images and views; the JS divergence measures the similarity of the Gaussian distributions of the two domains and reduces the difference between domains, improving the representation capability of pedestrian images and views mapped into the feature space under unsupervised domain adaptation and hence the accuracy of pedestrian re-recognition. Finally, the uncertainty information of multiple paths is exploited to explore the representation capability of cross-view pedestrian images and views mapped into the feature space, so that the instance-representation uncertainty caused by multiple views is learned better. This addresses the tendency of pedestrian re-recognition to overfit, improves accuracy when re-recognizing pedestrians with new instances in practical applications, and improves robustness, thereby improving the performance of cross-view pedestrian re-recognition.
Example 3
A cross-view pedestrian re-recognition device based on Gaussian modeling, the device comprising a processor and a memory, the memory having program instructions stored therein, the processor invoking the program instructions stored in the memory to cause the device to perform the following method steps:
re-modeling using the feature of the class center of the current batch and the predicted uncertainty of that class-center feature, randomly sampling random noise independent of the model parameters, equivalently re-expressing the visual features, and measuring the similarity of the Gaussian distributions of the two domains with the JS divergence;
modeling the multi-view features extracted by the neural network and the uncertainty between the views to form a new expression of the target-domain multiple views;
introducing a regularization term that explicitly constrains the distribution to approach the standard normal distribution, measured by the KL divergence;
passing the Gaussian distributions retained from the Gaussian prototype-level path by a memory network through Gaussian mixture models to generate new Gaussian feature representations, constraining the overall distribution by the normalized triplet loss, and outputting the final recognition result.
Forming the new expression of the target-domain multiple views by modeling the multi-view features extracted by the neural network and the uncertainty between the views specifically comprises:
modeling the multi-view features extracted by the neural network and the uncertainty between the views to form the Gaussian distribution of the target-domain multiple views; forming a multi-image from each source-domain image by image enhancement, extracting features with the neural network, calculating the uncertainty among the images, and modeling them to form the Gaussian distribution of the source-domain multi-image; using the uncertainty within the source-domain multi-image and the target-domain multiple views to represent the features of a single pedestrian;
fitting a Gaussian mixture model to each of the new Gaussian distribution expressions generated from the source-domain and target-domain instances, and iterating the expectation-maximization algorithm to obtain the feature and uncertainty of each class;
obtaining the new high-order Gaussian prototype feature expression of the source-domain and target-domain classes, and imposing an inter-domain distribution constraint through the JS divergence.
The visual features are equivalently re-expressed as:
The feature $C_j$ of the class center is given by:
where $n_{class}$ is the number of instances of class j in the current batch, and the remaining symbol denotes the feature of the i-th instance of the j-th class.
Further, the new expression of the target-domain multiple views is:
where $k_{views}$ is the number of views of the target domain, and the per-view feature symbol denotes the feature extracted from the k-th view of the target domain by the visual feature network.
The Gaussian distribution generated by the source domain and the target domain examples is newly expressed as:
$z_{imgs} = m_{imgs} + \omega s_{imgs}$
where $m_{imgs}$ is the feature, $s_{imgs}$ is the uncertainty, and $\omega \sim N(0, 1)$.
Further, introducing the regularization term, explicitly constraining the distribution to approach the standard normal distribution, and measuring with the KL divergence is specifically:
$L_{sim} = \left(m^{2} + s^{2} - \log s^{2} - 1\right)/2$
where m is the feature of the class center and s is the predicted uncertainty of the class-center feature of the current batch.
Constraining the overall distribution by the normalized triplet loss is specifically:
the overall distribution is constrained using the normalized Gaussian instances of the current batch and the new high-order Gaussian prototypes of different classes, as follows:
where the two prototype symbols denote the new Gaussian feature representations of the same class and of a different class at the current prototype level, $d_{ED}$ is the squared Euclidean distance, $\|z\|_2$ denotes L2 normalization, which brings the new Gaussian feature representations of different classes to the same order of magnitude, η is the minimum distance between the current Gaussian prototype and the global mixed Gaussian prototypes of different classes, the remaining prototype symbol denotes the global mixed Gaussian prototype, and J is the number of classes.
Outputting the final recognition result is specifically:
an image of a single pedestrian is input, its features are extracted, and the JS divergence between these features and the features of the other pedestrians in the database is calculated to measure the similarity between the input pedestrian image and all cross-view pedestrian views in the database; the cross-view pedestrian views with the smallest difference, which rank first, are screened out.
It should be noted that, the device descriptions in the above embodiments correspond to the method descriptions in the embodiments, and the embodiments of the present invention are not described herein in detail.
The processor and the memory may be embodied in any device with a computing function, such as a computer, a single-chip microcomputer or a microcontroller; the embodiment of the invention does not limit the executing device, which is selected according to the needs of the practical application. Data signals are transmitted between the memory and the processor through a bus, which is not described in detail in the embodiments of the present invention.
Based on the same inventive concept, the embodiment of the present invention also provides a computer-readable storage medium. The storage medium includes a stored program, and when the program runs, the device on which the storage medium resides is controlled to execute the method steps in the above embodiments.
The computer readable storage medium includes, but is not limited to, flash memory, hard disk, solid state disk, and the like.
It should be noted that the readable storage medium descriptions in the above embodiments correspond to the method descriptions in the embodiments, and the embodiments of the present invention are not described herein.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the invention are produced in whole or in part.
The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in or transmitted across a computer-readable storage medium. Computer readable storage media can be any available media that can be accessed by a computer or data storage devices, such as servers, data centers, etc., that contain an integration of one or more available media. The usable medium may be a magnetic medium or a semiconductor medium, or the like.
References:
[1] Oza P, Sindagi VA, Sharmini V V, et al. Unsupervised domain adaptation of object detectors: A survey[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023.
[2] LeCun Y, Bottou L, Bengio Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
[3] Ganin Y, Lempitsky V. Unsupervised domain adaptation by backpropagation[C]//International conference on machine learning. PMLR, 2015: 1180-1189.
[4] Long M, Zhu H, Wang J, et al. Deep transfer learning with joint adaptation networks[C]//International conference on machine learning. PMLR, 2017: 2208-2217.
[5] Long M, Cao Z, Wang J, et al. Conditional adversarial domain adaptation[J]. Advances in neural information processing systems, 2018, 31.
[6] Zhou H, Nie W, Li W, et al. Hierarchical instance feature alignment for 2D image-based 3D shape retrieval[C]//Proceedings of the Twenty-Ninth International Conference on International Joint Conferences on Artificial Intelligence. 2021: 839-845.
[7] Zhou H, Nie W, Song D, et al. Semantic consistency guided instance feature alignment for 2D image-based 3D shape retrieval[C]//Proceedings of the 28th ACM International Conference on Multimedia. 2020: 925-933.
[8] Xie S, Zheng Z, Chen L, et al. Learning semantic representations for unsupervised domain adaptation[C]//International conference on machine learning. PMLR, 2018: 5423-5432.
[9] Zhou H, Liu AA, Nie W. Dual-level embedding alignment network for 2D image-based 3D object retrieval[C]//Proceedings of the 27th ACM International Conference on Multimedia. 2019: 1667-1675.
[10] Long M, Wang J, Ding G, et al. Transfer feature learning with joint distribution adaptation[C]//Proceedings of the IEEE international conference on computer vision. 2013: 2200-2207.
[11] Feng Y, Zhang Z, Zhao X, et al. Group-view convolutional neural networks for 3D shape recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 264-272.
Those skilled in the art will appreciate that the drawings are schematic representations of only one preferred embodiment, and that the above-described embodiment numbers are merely for illustration purposes and do not represent advantages or disadvantages of the embodiments.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.

Claims (10)

1. A cross-view pedestrian re-recognition method based on Gaussian modeling, the method comprising:
re-modeling using the feature of the class center of the current batch and the predicted uncertainty of that class-center feature, randomly sampling random noise independent of the model parameters, equivalently re-expressing the visual features, and measuring the similarity of the Gaussian distributions of the two domains with the JS divergence;
modeling the multi-view features extracted by the neural network and the uncertainty between the views to form a new expression of the target-domain multiple views;
introducing a regularization term that explicitly constrains the distribution to approach the standard normal distribution, measured by the KL divergence;
passing the Gaussian distributions retained from the Gaussian prototype-level path by a memory network through Gaussian mixture models to generate new Gaussian feature representations, constraining the overall distribution by the normalized triplet loss, and outputting the final recognition result.
2. The cross-view pedestrian re-identification method based on Gaussian modeling according to claim 1, wherein the modeling using the multi-view features extracted by the neural network and the uncertainty between views specifically comprises:
modeling a Gaussian distribution of the target-domain multi-view data using the multi-view features extracted by the neural network and the uncertainty between views; generating multiple images from each source-domain image by image augmentation, extracting their features with the neural network, calculating the uncertainty among the multiple images, and modeling a Gaussian distribution of the source-domain multi-image data; and using the uncertainty within the source-domain multiple images and within the target-domain multiple views to represent the features of a single pedestrian;
fitting a Gaussian mixture model to the new Gaussian representations generated from the source-domain and target-domain instances, respectively, and iterating with the expectation-maximization algorithm to obtain the features and uncertainty of each class;
and obtaining new high-order Gaussian prototype feature representations of the source-domain and target-domain classes, and applying an inter-domain distribution constraint through the JS divergence.
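The expectation-maximization iteration named in claim 2 can be illustrated with a minimal sketch. It assumes scalar (one-dimensional) features for brevity, and the function name `fit_gmm_1d`, the quantile initialization, and the toy data are all hypothetical; the claim does not specify them.

```python
import math
import random


def fit_gmm_1d(xs, k=2, iters=50):
    """Fit a k-component 1-D Gaussian mixture with EM.

    Returns (weights, means, stds). Initialisation from data quantiles is an
    assumption of this sketch, not something stated in the claim.
    """
    n = len(xs)
    ordered = sorted(xs)
    means = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    mu = sum(xs) / n
    spread = math.sqrt(sum((x - mu) ** 2 for x in xs) / n) or 1.0
    stds = [spread] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            dens = [
                w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
                for w, m, s in zip(weights, means, stds)
            ]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean (feature) and std (uncertainty).
        for j in range(k):
            nj = sum(r[j] for r in resp) or 1e-12
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            stds[j] = math.sqrt(max(var, 1e-6))
    return weights, means, stds


# Toy data: two well-separated classes (illustrative only).
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(10.0, 1.0) for _ in range(200)]
weights, means, stds = fit_gmm_1d(data, k=2)
```

In this sketch each component's mean plays the role of the per-class feature and its standard deviation the role of the per-class uncertainty.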
3. The cross-view pedestrian re-identification method based on Gaussian modeling according to claim 2, wherein the visual features are equivalently re-expressed as follows:
the feature $C_j$ of the class center is given by the following formula:
[formula omitted in source]
wherein $n_{class}$ is the number of instances of class $C_j$ in the current batch, and [symbol omitted in source] is the feature of the $i$-th instance of the $j$-th class.
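The formula of claim 3 did not survive in this text. The sketch below assumes that the class center is the arithmetic mean of the instance features of that class in the current batch, which matches the surviving definitions of the instance count and the per-instance feature; the function name `class_centers` is hypothetical.

```python
from collections import defaultdict


def class_centers(features, labels):
    """Return {label: mean feature vector} for one batch.

    Assumption: the class center is the plain average of the instance
    features; the source formula is not available to confirm this.
    """
    sums = {}
    counts = defaultdict(int)
    for feat, lab in zip(features, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(feat)
        sums[lab] = [a + b for a, b in zip(sums[lab], feat)]
        counts[lab] += 1
    return {lab: [v / counts[lab] for v in total] for lab, total in sums.items()}


centers = class_centers([[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]], [0, 0, 1])
```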
4. The cross-view pedestrian re-identification method based on Gaussian modeling according to claim 1, wherein the new representation of the target-domain multi-view data is expressed as:
[formula omitted in source]
wherein $k_{views}$ is the number of views of the target domain, and [symbol omitted in source] is the feature extracted from the $k$-th view of the target domain by the visual feature network.
5. The cross-view pedestrian re-identification method based on Gaussian modeling according to claim 1, wherein the new Gaussian representation generated from the source-domain and target-domain instances is:
$z_{imgs} = m_{imgs} + \omega s_{imgs}$
wherein $m_{imgs}$ is the feature, $s_{imgs}$ is the uncertainty, and $\omega \sim N(0, 1)$.
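The expression in claim 5 is a reparameterized sample: the noise is drawn independently of the model parameters and then scaled and shifted. A minimal sketch follows, assuming element-wise application to feature vectors held as Python lists; the function name `sample_gaussian_feature` is hypothetical.

```python
import random


def sample_gaussian_feature(mean, std, rng=random):
    """Draw z = m + omega * s element-wise, with omega ~ N(0, 1).

    `mean` and `std` stand for the feature and its uncertainty; applying the
    formula per dimension is an assumption of this sketch.
    """
    return [m + rng.gauss(0.0, 1.0) * s for m, s in zip(mean, std)]


rng = random.Random(0)
samples = [sample_gaussian_feature([1.0, -2.0], [2.0, 0.5], rng) for _ in range(10000)]
avg = [sum(col) / len(col) for col in zip(*samples)]
```

With zero uncertainty the sample collapses to the feature itself, and the sample average approaches the feature as the number of draws grows.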
6. The cross-view pedestrian re-identification method based on Gaussian modeling according to claim 1, wherein introducing the regularization term that explicitly constrains the distribution to approach the standard normal distribution, measured by the KL divergence, is specifically:
$L_{sim} = (m^2 + s^2 - \log s^2 - 1)/2$
wherein $m$ is the feature of the class center and $s$ is the predicted uncertainty of the class-center feature of the current batch.
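The expression in claim 6 is the closed-form KL divergence from a Gaussian with mean $m$ and standard deviation $s$ to the standard normal distribution. The sketch below evaluates it per dimension and sums the terms; the summation over dimensions and the function name `kl_to_standard_normal` are assumptions, since the claim states only the scalar form.

```python
import math


def kl_to_standard_normal(mean, std):
    """Sum over dimensions of (m^2 + s^2 - log(s^2) - 1) / 2."""
    return sum(
        (m * m + s * s - math.log(s * s) - 1.0) / 2.0
        for m, s in zip(mean, std)
    )


zero_penalty = kl_to_standard_normal([0.0, 0.0], [1.0, 1.0])   # already standard normal
shifted_penalty = kl_to_standard_normal([1.0], [1.0])          # mean shifted by 1
```

The term vanishes only when the mean is zero and the uncertainty is one, so minimizing it pulls each modeled Gaussian toward the standard normal.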
7. The cross-view pedestrian re-identification method based on Gaussian modeling according to claim 1, wherein constraining the overall distribution with the normalized triplet loss is as follows:
the overall distribution is constrained using the normalized Gaussian instances of the current batch and the new high-order Gaussian prototypes of different classes, as follows:
[formula omitted in source]
wherein [symbols omitted in source] denote the new Gaussian representations of the same class and of different classes, respectively, at the current prototype level; $d_{ED}$ is the squared Euclidean distance; $\|z\|_2$ denotes L2 normalization, which brings the new Gaussian representations of different classes to the same order of magnitude; $\eta$ is the minimum distance between the current Gaussian prototype and the global mixture Gaussian prototypes of different classes; [symbol omitted in source] is the global mixture Gaussian prototype; and $J$ is the number of classes.
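The loss formula of claim 7 is missing from this text, so the sketch below uses the standard hinge-style triplet loss as a stand-in, combined with the two ingredients the claim does name: L2 normalization and squared Euclidean distance. The hinge form, the averaging over different-class prototypes, and all function names are assumptions.

```python
import math


def l2_normalize(v):
    """Scale v to unit L2 norm (left unchanged if it is the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def sq_euclidean(a, b):
    """Squared Euclidean distance, the d_ED of the claim."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def normalized_triplet_loss(anchor, positive, negatives, margin):
    """Assumed form: mean over negatives of max(0, d(a, p) - d(a, n) + margin).

    `positive` is a same-class prototype, `negatives` are different-class
    prototypes, and `margin` plays the role of eta.
    """
    a = l2_normalize(anchor)
    d_pos = sq_euclidean(a, l2_normalize(positive))
    losses = [
        max(0.0, d_pos - sq_euclidean(a, l2_normalize(n)) + margin)
        for n in negatives
    ]
    return sum(losses) / len(losses)


separated = normalized_triplet_loss([1.0, 0.0], [2.0, 0.0], [[0.0, 3.0]], margin=0.5)
collapsed = normalized_triplet_loss([1.0, 0.0], [2.0, 0.0], [[1.0, 0.0]], margin=0.5)
```

Under these assumptions the loss is zero once every different-class prototype is farther from the anchor than the same-class prototype by at least the margin.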
8. The cross-view pedestrian re-identification method based on Gaussian modeling according to claim 1, wherein outputting the final recognition result is as follows:
inputting an image of a single pedestrian, extracting its features, calculating the JS divergence between these features and the features of the other pedestrians in the database so as to measure the similarity between the input pedestrian image and all cross-view pedestrian images in the database, and selecting the cross-view pedestrian images with the smallest differences, that is, the highest ranks.
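The retrieval step of claim 8 can be sketched as ranking database entries by JS divergence to the query. The JS divergence between two Gaussians has no closed form, so this sketch uses a Monte Carlo estimate. It assumes each pedestrian is stored as a diagonal Gaussian `(mean, std)` pair; that layout, the sample count, and the function names are hypothetical.

```python
import math
import random


def log_pdf(x, mean, std):
    """Log density of a diagonal Gaussian at point x."""
    return sum(
        -0.5 * ((xi - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2.0 * math.pi)
        for xi, m, s in zip(x, mean, std)
    )


def js_divergence(p, q, n_samples=2000, seed=0):
    """Monte Carlo estimate of JS(P, Q) for diagonal Gaussians p and q."""
    rng = random.Random(seed)

    def kl_to_mixture(a, b):
        # Estimates KL(a || 0.5*a + 0.5*b) from samples drawn from a.
        total = 0.0
        for _ in range(n_samples):
            x = [rng.gauss(m, s) for m, s in zip(*a)]
            la, lb = log_pdf(x, *a), log_pdf(x, *b)
            hi = max(la, lb)  # log-sum-exp shift for numerical stability
            log_mix = hi + math.log(0.5 * math.exp(la - hi) + 0.5 * math.exp(lb - hi))
            total += la - log_mix
        return total / n_samples

    return 0.5 * kl_to_mixture(p, q) + 0.5 * kl_to_mixture(q, p)


def rank_gallery(query, gallery):
    """Return gallery ids sorted by increasing JS divergence to the query."""
    return sorted(gallery, key=lambda pid: js_divergence(query, gallery[pid]))


query = ([0.0, 0.0], [1.0, 1.0])
gallery = {
    "a": ([0.1, 0.0], [1.0, 1.0]),
    "b": ([3.0, 3.0], [1.0, 1.0]),
    "c": ([1.0, 1.0], [1.0, 1.0]),
}
ranking = rank_gallery(query, gallery)
```

The estimate is zero for identical distributions and approaches $\log 2$ for distributions that barely overlap, so smaller values correspond to higher ranks.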
9. A cross-view pedestrian re-identification device based on Gaussian modeling, the device comprising: a processor and a memory, the memory having program instructions stored therein, wherein the processor invokes the program instructions stored in the memory to cause the device to perform the method steps of any of claims 1-8.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method steps of any of claims 1-8.
CN202310449124.7A 2023-04-24 2023-04-24 Cross-view pedestrian re-recognition method and device based on Gaussian modeling Pending CN116486483A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310449124.7A CN116486483A (en) 2023-04-24 2023-04-24 Cross-view pedestrian re-recognition method and device based on Gaussian modeling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310449124.7A CN116486483A (en) 2023-04-24 2023-04-24 Cross-view pedestrian re-recognition method and device based on Gaussian modeling

Publications (1)

Publication Number Publication Date
CN116486483A true CN116486483A (en) 2023-07-25

Family

ID=87219006

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310449124.7A Pending CN116486483A (en) 2023-04-24 2023-04-24 Cross-view pedestrian re-recognition method and device based on Gaussian modeling

Country Status (1)

Country Link
CN (1) CN116486483A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116958707A (en) * 2023-08-18 2023-10-27 武汉市万睿数字运营有限公司 Image classification method, device and related medium based on spherical machine monitoring equipment
CN116958707B (en) * 2023-08-18 2024-04-23 武汉市万睿数字运营有限公司 Image classification method, device and related medium based on spherical machine monitoring equipment
CN117333744A (en) * 2023-09-21 2024-01-02 南通大学 Unbiased scene graph generation method based on spatial feature fusion and prototype embedding
CN117333744B (en) * 2023-09-21 2024-05-28 南通大学 Unbiased scene graph generation method based on spatial feature fusion and prototype embedding
CN117456309A (en) * 2023-12-20 2024-01-26 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Cross-domain target identification method based on intermediate domain guidance and metric learning constraint
CN117456309B (en) * 2023-12-20 2024-03-15 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Cross-domain target identification method based on intermediate domain guidance and metric learning constraint

Similar Documents

Publication Publication Date Title
Ma et al. Feature split–merge–enhancement network for remote sensing object detection
Von Stumberg et al. Gn-net: The gauss-newton loss for multi-weather relocalization
Žbontar et al. Stereo matching by training a convolutional neural network to compare image patches
CN116486483A (en) Cross-view pedestrian re-recognition method and device based on Gaussian modeling
Chang et al. Deep, landmark-free fame: Face alignment, modeling, and expression estimation
CN111126482B (en) Remote sensing image automatic classification method based on multi-classifier cascade model
Liu et al. Graftnet: Towards domain generalized stereo matching with a broad-spectrum and task-oriented feature
CN111709311A (en) Pedestrian re-identification method based on multi-scale convolution feature fusion
CN115797736B (en) Training method, device, equipment and medium for target detection model and target detection method, device, equipment and medium
Liu et al. 3D Point cloud analysis
CN114529581A (en) Multi-target tracking method based on deep learning and multi-task joint training
Franchi et al. Latent discriminant deterministic uncertainty
Sugimura et al. Three-dimensional point cloud object detection using scene appearance consistency among multi-view projection directions
Mariotti et al. Viewnet: Unsupervised viewpoint estimation from conditional generation
Hou et al. Learning visual overlapping image pairs for SfM via CNN fine-tuning with photogrammetric geometry information
Liao et al. Multi-scale saliency features fusion model for person re-identification
Wang et al. MsRAN: A multi-scale residual attention network for multi-model image fusion
Vidhyalakshmi et al. Text detection in natural images with hybrid stroke feature transform and high performance deep Convnet computing
CN116109649A (en) 3D point cloud instance segmentation method based on semantic error correction
López‐Martinez et al. Vanishing point detection using the teaching learning‐based optimisation algorithm
Wang et al. An Improved Convolutional Neural Network‐Based Scene Image Recognition Method
CN110968735B (en) Unsupervised pedestrian re-identification method based on spherical similarity hierarchical clustering
Tang et al. Learning Hough regression models via bridge partial least squares for object detection
Wang et al. Learning to count objects with few exemplar annotations
Moussa et al. Efficient common objects localization based on deep hybrid Siamese network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination