CN114299542A - Video pedestrian re-identification method based on multi-scale feature fusion - Google Patents

Video pedestrian re-identification method based on multi-scale feature fusion

Info

Publication number
CN114299542A
CN114299542A, CN202111635259.XA, CN114299542B
Authority
CN
China
Prior art keywords
feature
local
time sequence
branch
features
Prior art date
Legal status
Granted
Application number
CN202111635259.XA
Other languages
Chinese (zh)
Other versions
CN114299542B (en)
Inventor
艾明晶
刘鹏高
Current Assignee
Beihang University
Original Assignee
Beihang University
Priority date
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202111635259.XA priority Critical patent/CN114299542B/en
Priority claimed from CN202111635259.XA external-priority patent/CN114299542B/en
Publication of CN114299542A publication Critical patent/CN114299542A/en
Application granted granted Critical
Publication of CN114299542B publication Critical patent/CN114299542B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a video pedestrian re-identification method based on multi-scale feature fusion, and proposes a video pedestrian re-identification network model based on multi-scale feature fusion to address the poor performance of traditional methods when temporally fusing complex appearance features. The model leads out three branches at the end of the backbone network: a global feature branch, a local feature branch and a temporal attention branch, which respectively extract image-level re-identification features of different scales and temporal attention weights. The re-identification feature vectors of different scales are concatenated and then fused according to the temporal attention weights; accurate pedestrian re-identification is achieved through a multi-feature independent training strategy, and structural parameters of the network, such as the number of local features, the local feature size and the number of Bottleneck blocks, are optimized through comparative experiments. Experiments show that the mAP and rank-1 indices of the invention reach 78.7% and 85.1% respectively on the Mars dataset, which is superior to most existing methods.

Description

Video pedestrian re-identification method based on multi-scale feature fusion
Technical Field
The invention relates to the fields of computer vision and image processing, and in particular to a video pedestrian re-identification method based on multi-scale feature fusion. The global feature branch and the local feature branch of the proposed network model mainly improve the discriminability of the image-level pedestrian appearance features, and the performance of the network model is further improved by optimizing the number and size of the local features and the number of Bottleneck structures. This addresses the difficulty of performing effective temporal fusion on complex image-level features, so that the resulting video-level pedestrian re-identification features achieve higher re-identification accuracy.
Background
As a biometric recognition technology, pedestrian re-identification (ReID) differs from unique identifiers such as the face, iris and fingerprint: it relies mainly on pedestrian appearance features, which are closely related to clothing, posture and similar characteristics, and it therefore has broader application prospects.
Image-based pedestrian re-identification performs well, but its robustness is poor in practical scenarios such as video surveillance, and mismatches occur easily when the pedestrian-flow environment is complex. The information in a single frame is generally limited, so video-based pedestrian re-identification has important research significance. A typical video-based pedestrian re-identification system consists of two parts: an image-level feature extractor (such as a convolutional neural network) and a modeling method that aggregates the temporal features. The central advantage of this approach is that it considers not only the content of a single frame but also the motion information between frames.
At present, research on video pedestrian re-identification algorithms focuses mainly on processing the temporal information in the image sequence. Because complex appearance features fuse poorly over time, most methods use only global image-level features, which leaves the re-identification features with low discriminability. Introducing local features, and combining them effectively with the temporal information extraction model, can therefore improve algorithm performance.
Early pedestrian re-identification methods obtained an identification feature vector from the whole image and focused mainly on global features. With larger pedestrian datasets and deeper network structures, local features must be introduced to meet the performance requirements of pedestrian re-identification; common local feature extraction approaches include skeleton key-point localization, image partitioning and pose correction.
In 2016, Varior et al. proposed a Siamese Long Short-Term Memory network for pedestrian re-identification: pictures are cut vertically and fed into the network, and the resulting features fuse the local features of the input image strips; the method requires a high degree of image alignment (see R. Varior, B. Shuai, J. Lu, D. Xu, and G. Wang, "A Siamese Long Short-Term Memory Architecture for Human Re-identification," in Proceedings of the European Conference on Computer Vision. Springer, Cham, 2016, pp. 135-153).
In 2017, Zhao et al. proposed Spindle Net: 14 key points of a pedestrian are located, the pose is estimated from them and the human body is segmented into 7 regions; local features of different scales are extracted from these regions, a global feature is extracted from the whole picture, and the two are fused (see H. Zhao, M. Tian, S. Sun et al., "Spindle Net: Person Re-identification with Human Body Region Guided Feature Decomposition and Fusion," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 907-915).
In 2017, Zhang et al. proposed AlignedReID, a local-stripe alignment method that surpasses human-level performance: after uniform partitioning, the parts are aligned automatically by computing the shortest path between local stripes, without supervision or pose estimation (see X. Zhang, H. Luo, X. Fan et al., "AlignedReID: Surpassing Human-Level Performance in Person Re-Identification," arXiv preprint arXiv:1711.08184, 2017).
In 2018, Sun et al. proposed the uniformly partitioned Part-based Convolutional Baseline (PCB) method, discussed better ways of combining the blocks, and further proposed the soft-partitioning Refined Part Pooling (RPP) method, which aligns each local block with an attention mechanism (see Y. Sun, L. Zheng, Y. Yang, Q. Tian, and S. Wang, "Beyond Part Models: Person Retrieval with Refined Part Pooling (and A Strong Convolutional Baseline)," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 501-518).
In 2018, Wei et al. proposed the Global-Local-Alignment Descriptor (GLAD) network, which, after extracting key points, divides the pedestrian into three parts (head, upper body and lower body), extracts local features from them and uses these to assist the global feature (see L. Wei, S. Zhang, H. Yao, W. Gao, and Q. Tian, "GLAD: Global-Local-Alignment Descriptor for Scalable Person Re-Identification," IEEE Transactions on Multimedia, 2018, pp. 1-1).
It should be noted that these local-feature methods were proposed for image-based pedestrian re-identification; adapting and applying them to the video pedestrian re-identification problem is a valuable research direction.
A typical video-based pedestrian re-identification system consists of an image-level feature extractor and a module that aggregates temporal features. Most recent video-based person ReID methods are based on deep neural networks, and research has focused mainly on the temporal modeling part, i.e. how to aggregate a series of image-level features into a video-level feature. Published comparative studies show that, with the other modules held fixed, the temporal attention (TA) weighting method achieves the highest accuracy. The invention therefore adopts a temporal-attention temporal modeling framework, introduces local features into the image-level feature extractor, concatenates the features of different scales, and fuses them according to the temporal attention mechanism, thereby improving the accuracy of video pedestrian re-identification.
Disclosure of Invention
The invention aims to solve the difficulty of performing effective temporal fusion of complex image-level appearance features in the video pedestrian re-identification task. By proposing a video pedestrian re-identification network model based on multi-scale feature fusion, image-level pedestrian appearance features of different scales and temporal attention weights are extracted synchronously, so that the video-level re-identification features generated after the multi-scale features of an image sequence pass through the temporal module have higher discriminability.
The invention studies the video pedestrian re-identification problem mainly from the perspective of the complexity of the image-level appearance features. Large-scale appearance features attend to the global information of a pedestrian, while small-scale appearance features attend to local information, so it is reasonable to infer that effectively organizing features of different scales provides richer feature information for the re-identification task and thus improves re-identification accuracy.
Based on the PCB-RPP local feature extraction method, the idea of multi-scale feature fusion was first verified experimentally: with ResNet50 as the backbone network, two branches extract global and local features respectively, and the global and local feature vectors are finally concatenated into a multi-scale feature vector. As shown in Table 1, the re-identification accuracy of the multi-scale features on the Market-1501 dataset is better than that of the single-scale global and local features.
TABLE 1 Multi-scale feature fusion verification results (Market-1501)
Feature                    mAP    Rank-1  Rank-5  Rank-10
ResNet50 global features   77.9   92.1    96.9    97.8
PCB-RPP local features     79.1   91.8    97.1    98.0
Multi-scale features       79.8   92.5    96.9    98.0
Based on these considerations, the invention proposes a video pedestrian re-identification network model based on multi-scale feature fusion, consisting of a shared backbone network and three branches. The backbone network is modified from ResNet50, and its end is connected to three branches (a global feature branch, a local feature branch and a temporal attention branch) that respectively extract global features, local features and temporal attention weights. The model concatenates the global and local feature vectors of each frame to obtain a multi-scale image-level feature vector, and finally the multi-scale feature vectors of the frames are weighted and fused according to the temporal attention weights to obtain the video-level pedestrian re-identification vector.
The main content of the invention specifically comprises the following steps:
step 1: video pedestrian re-identification network design based on multi-scale fusion
The designed video pedestrian re-identification network model based on multi-scale feature fusion is shown in fig. 1 and consists of a shared backbone network and three branches.
On the basis of the Resnet50 network, the backbone network removes the down-sampling operation in the last residual stage, so that the output feature map is twice its original size, providing more room for dividing local features.
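As an illustration only (the patent provides no source code), the following minimal sketch shows one way such a modification can be made to a torchvision ResNet-50; the input resolution and layer names are assumptions of the sketch, not part of the invention.

# Sketch: remove the stride-2 down-sampling in the last residual stage (layer4) of a
# torchvision ResNet-50 (torchvision >= 0.13), so a 256x128 crop yields a 16x8 feature
# map instead of 8x4, leaving more room for part division.
import torch
import torchvision

backbone = torchvision.models.resnet50(weights=None)
backbone.layer4[0].conv2.stride = (1, 1)            # main path of the first block in layer4
backbone.layer4[0].downsample[0].stride = (1, 1)    # matching shortcut convolution

stem = torch.nn.Sequential(*list(backbone.children())[:-2])   # drop avgpool and fc
x = torch.randn(1, 3, 256, 128)                     # one pedestrian crop
print(stem(x).shape)                                # torch.Size([1, 2048, 16, 8])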
Three branches are led out from the feature map obtained at the end of the backbone network and are used respectively to extract global features, local features and temporal information.
On the global feature branch, the feature map passes through one convolution, normalization and pooling operation to generate a group of 2048-dimensional global feature vectors.
On the local feature branch, the feature map is first decoupled by Bottleneck blocks and then soft-divided by the PCB-RPP algorithm to generate a group of 2048-dimensional local feature vectors, in which the two local features occupy 1024 dimensions each.
On the temporal attention branch, the feature map sequentially passes through a temporal convolution and a spatial convolution to generate temporal attention scores over the length of the input picture sequence, giving the temporal weights required for temporal fusion.
In addition, the head of the local branch adds two Bottleneck layers, which are the basic residual structure of ResNet50, as shown in fig. 2. Adding this structure at the front of the local feature branch reduces the coupling between the global and local features; if the two branches were both led directly from the output of the backbone network, the network would be difficult to converge during training. The Bottleneck structure is chosen because, with sufficient depth, it removes the coupling between the features, while its residual form greatly reduces the extra computation caused by deepening the network.
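For reference, a self-contained sketch of the standard ResNet-style Bottleneck block referred to in fig. 2 is given below (1x1, 3x3 and 1x1 convolutions with a skip connection); the channel widths are assumptions chosen to match the 2048-channel feature map, and the exact configuration used in the model is not specified by the text.

# Sketch of a ResNet-style Bottleneck residual block and the two-layer head that
# decouples the local branch from the shared backbone feature map (assumed sizes).
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    def __init__(self, channels=2048, mid=512):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, mid, 1, bias=False), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, padding=1, bias=False), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1, bias=False), nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.body(x))   # the residual connection keeps added depth cheap

local_head = nn.Sequential(Bottleneck(), Bottleneck())   # two stacked Bottleneck layers
print(local_head(torch.randn(2, 2048, 16, 8)).shape)     # torch.Size([2, 2048, 16, 8])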
The global and local feature vectors of each frame obtained from the global feature branch and the local feature branch are concatenated to generate a 4096-dimensional single-frame fusion feature; a weighted average is then taken according to the temporal attention scores of the different frames obtained from the temporal attention branch, giving the final 4096-dimensional video-level pedestrian re-identification feature vector.
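To make the data flow of Step 1 concrete, the following sketch assembles the three branches for a clip of T frames; all layer sizes, the hard horizontal split used here in place of the RPP soft division, and the order and kernel sizes of the convolutions in the attention branch are assumptions of this sketch rather than the exact claimed configuration.

# Sketch: per-frame 2048-d global and two 1024-d local vectors are concatenated into a
# 4096-d frame feature and averaged with temporal-attention weights (assumed shapes).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusionHead(nn.Module):
    def __init__(self, in_ch=2048, parts=2):
        super().__init__()
        self.parts = parts
        # Global branch: 1x1 convolution + BN, then global pooling -> 2048-d.
        self.global_conv = nn.Sequential(nn.Conv2d(in_ch, 2048, 1, bias=False),
                                         nn.BatchNorm2d(2048), nn.ReLU(inplace=True))
        # Local branch: per-part embedding down to 1024-d (hard split shown for brevity).
        self.local_embed = nn.ModuleList([nn.Linear(in_ch, 1024) for _ in range(parts)])
        # Temporal attention branch: a spatial conv collapses the assumed 16x8 map,
        # then a temporal conv scores the frames of the clip.
        self.spatial_conv = nn.Conv2d(in_ch, 256, kernel_size=(16, 8))
        self.temporal_conv = nn.Conv1d(256, 1, kernel_size=3, padding=1)

    def forward(self, fmap):                                   # fmap: (B, T, C, H, W)
        B, T, C, H, W = fmap.shape
        flat = fmap.view(B * T, C, H, W)
        g = F.adaptive_avg_pool2d(self.global_conv(flat), 1).flatten(1)       # (B*T, 2048)
        stripes = flat.chunk(self.parts, dim=2)                # upper / lower stripes
        locs = [emb(F.adaptive_avg_pool2d(s, 1).flatten(1))
                for emb, s in zip(self.local_embed, stripes)]  # each (B*T, 1024)
        frame_feat = torch.cat([g] + locs, dim=1).view(B, T, -1)              # (B, T, 4096)
        s = self.spatial_conv(flat).flatten(1).view(B, T, -1).transpose(1, 2) # (B, 256, T)
        att = torch.softmax(self.temporal_conv(s).squeeze(1), dim=1)          # (B, T) weights
        return (frame_feat * att.unsqueeze(-1)).sum(dim=1), frame_feat, att   # (B, 4096)

head = MultiScaleFusionHead()
video_feat, _, _ = head(torch.randn(2, 4, 2048, 16, 8))        # 2 clips of 4 frames
print(video_feat.shape)                                        # torch.Size([2, 4096])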
Step 2: multi-feature independent training strategy design
Because the feature vector finally generated by the network model is formed by concatenating and fusing several feature vectors, the fused feature vector is split and each part is trained independently in order to guarantee the training effect of the multiple features.
(1) Classifier design
In the training stage, a separate classifier is set for each concatenated part of the temporally fused feature vector output by the model, i.e. the features of each scale are trained independently and the classifier parameters are not shared. Each classifier is a fully connected layer of the neural network.
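A minimal sketch of such unshared per-scale classifiers is shown below; the identity count of 625 (the Mars training set) and the 2048/1024/1024 segment split follow the text, while the variable names are illustrative.

# Sketch: one independent fully connected classifier per concatenated feature segment.
import torch
import torch.nn as nn

num_ids = 625                        # identities in the Mars training set
segment_dims = [2048, 1024, 1024]    # global feature + two local features

classifiers = nn.ModuleList([nn.Linear(d, num_ids) for d in segment_dims])

video_feat = torch.randn(32, 4096)                             # fused video-level features
segments = torch.split(video_feat, segment_dims, dim=1)
logits = [clf(seg) for clf, seg in zip(classifiers, segments)] # three unshared predictions
print([tuple(l.shape) for l in logits])                        # [(32, 625), (32, 625), (32, 625)]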
(2) Loss function
For the features of each scale, the loss function used for training consists of two parts, as shown in equation (1).
Loss_i = Loss_cross-entropy + Loss_triplet    (1)
where Loss_cross-entropy and Loss_triplet denote the cross-entropy loss function and the triplet loss function respectively.
The final loss function is obtained by summing the loss functions of all feature parts, as shown in equation (2).
Loss = Σ_{i=1}^{N} Loss_i    (2)
where N denotes the number of features before concatenation; the invention uses one global feature and two local features, so N = 3.
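The following sketch computes this objective for the N = 3 segments; the triplet margin and the use of pre-selected positive and negative clips (rather than a particular mining strategy) are assumptions, since the text does not specify them.

# Sketch of equations (1) and (2): per-segment cross-entropy plus triplet loss, summed.
import torch
import torch.nn as nn

ce = nn.CrossEntropyLoss()
triplet = nn.TripletMarginLoss(margin=0.3)          # margin is an assumed value

def total_loss(logits_list, anchors, positives, negatives, labels):
    """Per-segment logits and embeddings for anchor clips, plus embeddings of
    positive/negative clips; returns the sum over the N segments (equation (2))."""
    loss = torch.zeros(())
    for logits, a, p, n in zip(logits_list, anchors, positives, negatives):
        loss = loss + ce(logits, labels) + triplet(a, p, n)    # equation (1) for segment i
    return loss

dims = (2048, 1024, 1024)
labels = torch.randint(0, 625, (32,))
anchors   = [torch.randn(32, d) for d in dims]
positives = [torch.randn(32, d) for d in dims]
negatives = [torch.randn(32, d) for d in dims]
logits    = [torch.randn(32, 625) for _ in dims]
print(total_loss(logits, anchors, positives, negatives, labels))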
(3) Training method
Because the local branch divides features according to the PCB-RPP idea, training of the model is divided into two stages: in the first stage, the local feature branch uniformly divides the feature map into an upper and a lower local feature by hard division; the second stage is trained on the basis of the converged first stage, i.e. a classifier replaces the uniform division of the first stage and each point on the feature map is assigned to the local features in a probabilistic manner.
Furthermore, in both training phases, all parameters of the network model participate in the iteration.
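As an illustration of the second-stage soft division, the sketch below scores every spatial position against each part with a 1x1 convolution and pools the feature map with the resulting probabilities; the exact layer form of the RPP-style part classifier is an assumption here.

# Sketch: probabilistic (soft) assignment of feature-map positions to local parts.
import torch
import torch.nn as nn

class SoftPartPooling(nn.Module):
    def __init__(self, channels=2048, parts=2):
        super().__init__()
        self.part_classifier = nn.Conv2d(channels, parts, kernel_size=1)   # scores per position

    def forward(self, fmap):                                    # fmap: (B, C, H, W)
        B, C, H, W = fmap.shape
        prob = torch.softmax(self.part_classifier(fmap), dim=1) # (B, parts, H, W)
        prob_flat = prob.view(B, -1, H * W)
        fmap_flat = fmap.view(B, C, H * W)
        # Probability-weighted average of all positions for each part: (B, parts, C).
        part_feats = torch.einsum('bph,bch->bpc', prob_flat, fmap_flat)
        return part_feats / (prob_flat.sum(dim=2, keepdim=True) + 1e-6)

pool = SoftPartPooling()
print(pool(torch.randn(2, 2048, 16, 8)).shape)                  # torch.Size([2, 2, 2048])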
And step 3: network model structure parameter optimization
Comparative experiments are carried out on the influence of three parameters, the number of local features, the local feature size and the number of Bottleneck layers, on model performance, with training and testing on the Mars dataset.
Specifically, the experimental optimization proceeds in the order of the number of local features, the local feature size and the number of Bottleneck layers; after each parameter is optimized, the result is fixed before the optimization experiment for the next parameter.
Drawings
Fig. 1 is a diagram of a video pedestrian re-identification network model based on multi-scale feature fusion.
FIG. 2 is a schematic diagram of the structure of Bottleneck.
Fig. 3 is an example of a data set sample used in an experiment.
Fig. 4 is a visualization heat map of feature extraction from an image sequence according to the invention.
Detailed Description
The technical scheme, the experimental method and the test result of the invention are further described in detail with reference to the accompanying drawings and specific experimental embodiments.
The experimental procedure is specifically described below.
Step one: construct the three-branch convolutional neural network, input the training set samples into the network for training, observe the training process, and iterate continuously to obtain a trained model.
Step two: and testing according to the training result, searching a pedestrian image sequence with the same id as each group of query image sequence in the query from the galery library to form a result sequence, and simultaneously calculating to obtain a corresponding evaluation index.
Step three: and performing a comparison experiment on the network structure parameters according to the evaluation indexes to determine the optimal network structure parameters.
The experimental conditions and conclusions obtained are described in detail below.
3.1 pedestrian re-identification dataset and evaluation index
The test datasets and evaluation indices used in the ReID experiments are presented next. As shown in FIG. 3, the proposed method is tested on two large public datasets, Market-1501 and Mars. Market-1501 contains 1501 pedestrians captured by 6 cameras and 32668 detected pedestrian bounding boxes; its training set has 751 identities and 12936 images, on average 17.2 training images per person, and its test set has 750 identities and 19732 images, on average 26.3 test images per person. Mars is the largest video-based ReID dataset: its training set has 8298 tracklets of 625 pedestrians containing 509914 images, and its test set has 12180 tracklets of 636 pedestrians containing 681089 images.
In the pedestrian re-identification task, the test procedure is generally to give an image to be queried (a group of images for video ReID), compute the similarity between the query and the images in the candidate set (gallery) according to the model, and sort the gallery images by similarity in descending order, so that images nearer the front are closer to the query. To evaluate the performance of a pedestrian re-identification algorithm, the current practice is to compute the corresponding indices on public datasets and compare with other models. The CMC curve (Cumulative Matching Characteristics) and mAP (mean Average Precision) are the two most commonly used evaluation criteria.
The experiments mainly use the most common CMC indices rank-1 and rank-5 together with mAP. rank-k is the probability that a correct result appears among the top k (highest-confidence) search results, while mAP reflects an average level: the higher the mAP, the higher the gallery results belonging to the same person as the query are ranked overall, and the better the model.
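For clarity, a simplified sketch of how rank-k and average precision can be computed for a single query is given below; standard Market-1501/Mars evaluation additionally filters same-camera and junk entries, which is omitted here, and mAP is the mean of the per-query AP values.

# Sketch: rank-k and AP for one query from a similarity-ranked gallery (no filtering).
import numpy as np

def rank_k_and_ap(sim, gallery_ids, query_id, k=1):
    order = np.argsort(-sim)                         # gallery indices, most similar first
    matches = gallery_ids[order] == query_id
    rank_k = bool(matches[:k].any())                 # correct result within the top k
    hits = np.cumsum(matches)
    precision_at_hits = hits[matches] / (np.flatnonzero(matches) + 1)
    ap = float(precision_at_hits.mean()) if matches.any() else 0.0
    return rank_k, ap

sim = np.array([0.9, 0.2, 0.75, 0.4])                # similarity of 4 gallery sequences
gallery_ids = np.array([7, 3, 7, 5])
print(rank_k_and_ap(sim, gallery_ids, query_id=7))   # (True, 1.0) for this toy query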
3.2 Primary parameter configuration for ReiD experiments
The specific training parameters are as follows:
The learning-rate decay strategy uses the lr_scheduler.StepLR function with 0.0003 as the initial learning rate, decaying the learning rate to one tenth of its previous value every 100 training epochs; the video clip sequence length is set to 4, sampled randomly from the dataset; the batch size is set to 32; the PCB stage and the RPP stage are each trained for 400 epochs.
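The corresponding configuration could be set up roughly as follows; the model and the optimizer type (Adam here) are placeholders and assumptions, while the learning rate, StepLR schedule, clip length, batch size and epoch counts follow the text.

# Sketch of the training configuration described above (placeholder model).
import torch

model = torch.nn.Linear(10, 10)                      # stands in for the re-identification network
optimizer = torch.optim.Adam(model.parameters(), lr=0.0003)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.1)

SEQ_LEN = 4              # frames per randomly sampled clip
BATCH_SIZE = 32
EPOCHS_PER_STAGE = 400   # 400 epochs each for the PCB stage and the RPP stage

for epoch in range(EPOCHS_PER_STAGE):
    # ... forward/backward passes over the training clips would go here ...
    optimizer.step()
    scheduler.step()                                 # lr drops to one tenth every 100 epochs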
3.3 re-identification of network experiment results
Based on the above evaluation indices and experimental details, tests were carried out on the Mars dataset and comparative experimental results for each parameter were obtained.
(1) Local feature quantity
Other parameters in the experiment were configured as follows: the global feature vector size is 2048 and the local feature vector size is 2048.
The test results are shown in Table 2. Two local features perform best; as the number of local features increases, the feature scale shrinks, and for fine-grained local features the large limb movements of a walking person mean that temporal weighted fusion blurs the local information.
TABLE 2 Influence of the number of local features on performance
Number  mAP   Rank-1  Rank-5
2       75.0  82.0    93.8
3       73.4  81.1    92.9
4       71.3  79.1    92.2
(2) Local feature size
Other parameters in the experiment were configured as follows: the number of local features is 2, the number of Bottleneck layers is 1, and the length of the global feature vector is 2048.
The test results are shown in Table 3: after the size of the local features is halved, performance improves noticeably, indicating that the global feature has the larger influence on re-identification performance.
TABLE 3 Influence of local feature size on performance
Size   mAP   Rank-1  Rank-5
2048   75.0  82.0    93.8
1024   77.7  83.8    94.3
(3) Number of Bottleneck layers
Other parameters in the experiment were configured as follows: the number of local features is 2, the global feature vector size is 2048, and the local feature vector size is 1024.
The test results are shown in Table 4. Adding a Bottleneck structure at the front of the local feature branch reduces the coupling between the global and local features; two Bottleneck layers perform best, while three layers make the network difficult to converge.
TABLE 4 Influence of the number of Bottleneck layers on performance
Number  mAP   Rank-1  Rank-5
0       77.1  82.7    93.8
1       77.7  83.8    94.3
2       78.7  85.1    94.6
3       74.1  81.3    93.3
In summary, the performance of the model of the invention is optimal with two local features, the local feature size halved, and two Bottleneck layers.
3.4 feature extraction visualization
To verify whether the global feature branch and the local feature branch extract features of different scales from the image sequence as designed, the invention uses the Class Activation Mapping algorithm (see B. Zhou, A. Khosla, A. Lapedriza, A. Oliva and A. Torralba, "Learning Deep Features for Discriminative Localization," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2921-2929) to visualize the regions of the image sequence to which the different features are sensitive. As shown in FIG. 4, the global feature responds to the whole human body, while the two local features focus on the head and the legs respectively, showing that the features of each scale effectively extract information from different positions and granularities of the pedestrian.
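A simplified sketch of the Class Activation Mapping computation behind the Fig. 4 heat maps follows; the feature-map size, identity count and upsampled resolution are assumptions used only for illustration.

# Sketch: weight the channels of a frame's feature map by the classifier weights of one
# identity and upsample the result to image resolution to obtain a heat map.
import torch
import torch.nn.functional as F

def class_activation_map(fmap, fc_weight, class_idx, out_size=(256, 128)):
    """fmap: (C, H, W) feature map; fc_weight: (num_ids, C) classifier weight matrix."""
    cam = torch.einsum('c,chw->hw', fc_weight[class_idx], fmap)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-6)       # normalise to [0, 1]
    return F.interpolate(cam[None, None], size=out_size,
                         mode='bilinear', align_corners=False)[0, 0]

heat = class_activation_map(torch.randn(2048, 16, 8), torch.randn(625, 2048), class_idx=3)
print(heat.shape)                                                  # torch.Size([256, 128])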
3.5 comparison with other methods
Without tricks such as re-ranking, the method reaches a competitive level compared with mainstream methods on the Mars dataset; as shown in Table 5, the mAP and Rank-1 indices improve by 3.3% and 3.4% respectively over the baseline.
Table 5 comparison with other methods
In summary, the invention provides a video pedestrian re-identification model based on multi-scale feature fusion. Its temporal information processing module adopts temporal-attention feature aggregation, and its single-frame feature extraction module effectively extracts multi-scale fusion features adapted to it, improving the discriminability of the features while cooperating effectively with the temporal module. In addition, comparative experiments on the number and size of the local features give the optimal local feature parameters under this algorithm framework, and the connection structure between the backbone network and the feature extraction branches of different scales is optimized, reducing the coupling of the different branches on the backbone network. Finally, testing shows that the performance of the invention improves significantly over the baseline and reaches a competitive level.

Claims (1)

1. A video pedestrian re-identification method based on multi-scale feature fusion is characterized by comprising the following steps:
Aiming at the problem that traditional methods perform poorly when temporally fusing complex appearance features, a video pedestrian re-identification network model based on multi-scale feature fusion is provided; three branches are led out from the end of the backbone network of the model to respectively extract image-level re-identification features of different scales and temporal attention weights; the re-identification feature vectors of different scales are concatenated and fused according to the temporal attention weights, accurate pedestrian re-identification is realized through a multi-feature independent training strategy, and the structural parameters of the network are optimized through comparative experiments;
the method specifically comprises the following steps:
step 1, video pedestrian re-identification network design based on multi-scale fusion
The designed video pedestrian re-identification network model based on multi-scale feature fusion is composed of a shared backbone network and three branches, the three branches being a global feature branch, a local feature branch and a temporal attention branch;
The shared backbone network removes the down-sampling operation in the last residual stage on the basis of the Resnet50 network, so that the output feature map is twice its original size, providing more room for dividing local features;
Three branches are led out from the feature map obtained at the end of the backbone network and are used respectively to extract global features, local features and temporal information; on the global feature branch, the feature map passes through one convolution, normalization and pooling operation to generate a group of 2048-dimensional global feature vectors; on the local feature branch, the feature map is decoupled by Bottleneck blocks and then soft-divided by the PCB-RPP algorithm, i.e. the Part-based Convolutional Baseline with Refined Part Pooling, to generate a group of 2048-dimensional local feature vectors in which the two local features occupy 1024 dimensions each; on the temporal attention branch, the feature map sequentially passes through a temporal convolution and a spatial convolution to generate temporal attention scores over the length of the input picture sequence and obtain the temporal weights required for temporal fusion;
The global and local feature vectors of each frame obtained from the global feature branch and the local feature branch are concatenated to generate a 4096-dimensional single-frame fusion feature; a weighted average is taken according to the temporal attention scores of the different frames obtained from the temporal attention branch to obtain the final 4096-dimensional video-level pedestrian re-identification feature vector;
step 2, designing a multi-feature independent training strategy
Because the feature vector finally generated by the network model is formed by concatenating and fusing several feature vectors, the fused feature vector is split and each part is trained independently in order to guarantee the multi-feature training effect;
Classifier design: in the training stage, a separate classifier is set for each concatenated part of the temporally fused feature vector output by the model, i.e. the features of each scale are trained independently and the classifier parameters are not shared; each classifier is a fully connected layer of the neural network;
Loss function: for the features of each scale, the loss function used for training consists of two parts, as shown in equation (1);
Loss_i = Loss_cross-entropy + Loss_triplet    (1)
where Loss_cross-entropy and Loss_triplet denote the cross-entropy loss function and the triplet loss function respectively;
The final loss function is obtained by summing the loss functions of all feature parts, as shown in equation (2);
Loss = Σ_{i=1}^{N} Loss_i    (2)
where N denotes the number of features before concatenation; since the method uses one global feature and two local features, N = 3;
Training method: the local branch divides features according to the PCB-RPP idea, so training of the model is divided into two stages; in the first stage, the local feature branch uniformly divides the feature map into an upper and a lower local feature by hard division; the second stage is trained on the basis of the converged first stage, i.e. a classifier replaces the uniform division of the first stage and each point on the feature map is assigned to the local features in a probabilistic manner;
In addition, in both training stages all parameters of the network model participate in the iterations;
step 3, optimizing the structural parameters of the network model
Performing a comparison experiment aiming at the influence of three parameters of the number of local features, the size of the local features and the number of Bottleneck on the performance of the model, and training and testing on a Mars data set;
specifically, experiment optimization is carried out according to the sequence of the local feature quantity, the local feature size and the Bottleneck quantity, and after each parameter is optimized, the optimization result is kept to enter a comparison experiment of the next parameter.
CN202111635259.XA 2021-12-29 Video pedestrian re-identification method based on multi-scale feature fusion Active CN114299542B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111635259.XA CN114299542B (en) 2021-12-29 Video pedestrian re-identification method based on multi-scale feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111635259.XA CN114299542B (en) 2021-12-29 Video pedestrian re-identification method based on multi-scale feature fusion

Publications (2)

Publication Number Publication Date
CN114299542A true CN114299542A (en) 2022-04-08
CN114299542B CN114299542B (en) 2024-07-05




Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210390338A1 (en) * 2020-06-15 2021-12-16 Dalian University Of Technology Deep network lung texture recogniton method combined with multi-scale attention
CN112101150A (en) * 2020-09-01 2020-12-18 北京航空航天大学 Multi-feature fusion pedestrian re-identification method based on orientation constraint
CN112163498A (en) * 2020-09-23 2021-01-01 华中科技大学 Foreground guiding and texture focusing pedestrian re-identification model establishing method and application thereof
CN112200111A (en) * 2020-10-19 2021-01-08 厦门大学 Global and local feature fused occlusion robust pedestrian re-identification method
CN112818790A (en) * 2021-01-25 2021-05-18 浙江理工大学 Pedestrian re-identification method based on attention mechanism and space geometric constraint
CN113298235A (en) * 2021-06-10 2021-08-24 浙江传媒学院 Neural network architecture of multi-branch depth self-attention transformation network and implementation method
CN113610144A (en) * 2021-08-02 2021-11-05 合肥市正茂科技有限公司 Vehicle classification method based on multi-branch local attention network

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115100730A (en) * 2022-07-21 2022-09-23 北京万里红科技有限公司 Iris living body detection model training method, iris living body detection method and device
CN115100730B (en) * 2022-07-21 2023-08-08 北京万里红科技有限公司 Iris living body detection model training method, iris living body detection method and device
CN115294601A (en) * 2022-07-22 2022-11-04 苏州大学 Pedestrian re-identification method based on multi-scale feature dynamic fusion
CN115294601B (en) * 2022-07-22 2023-07-11 苏州大学 Pedestrian re-recognition method based on multi-scale feature dynamic fusion
CN115393953A (en) * 2022-07-28 2022-11-25 深圳职业技术学院 Pedestrian re-identification method, device and equipment based on heterogeneous network feature interaction
CN115393953B (en) * 2022-07-28 2023-08-08 深圳职业技术学院 Pedestrian re-recognition method, device and equipment based on heterogeneous network feature interaction
WO2024021283A1 (en) * 2022-07-28 2024-02-01 深圳职业技术学院 Person re-identification method, apparatus, and device based on heterogeneous network feature interaction
CN115424022A (en) * 2022-11-03 2022-12-02 南方电网数字电网研究院有限公司 Power transmission corridor ground point cloud segmentation method and device and computer equipment
CN115424022B (en) * 2022-11-03 2023-03-03 南方电网数字电网研究院有限公司 Power transmission corridor ground point cloud segmentation method and device and computer equipment
CN117746462A (en) * 2023-12-19 2024-03-22 深圳职业技术大学 Pedestrian re-recognition method and device based on complementary feature dynamic fusion network model

Similar Documents

Publication Publication Date Title
Yang et al. Towards rich feature discovery with class activation maps augmentation for person re-identification
CN113378632B (en) Pseudo-label optimization-based unsupervised domain adaptive pedestrian re-identification method
Wang et al. Large-scale isolated gesture recognition using convolutional neural networks
CN110598543B (en) Model training method based on attribute mining and reasoning and pedestrian re-identification method
CN111738143B (en) Pedestrian re-identification method based on expectation maximization
CN111310668B (en) Gait recognition method based on skeleton information
CN110378208B (en) Behavior identification method based on deep residual error network
US20230267725A1 (en) Person re-identification method based on perspective-guided multi-adversarial attention
CN113239784A (en) Pedestrian re-identification system and method based on space sequence feature learning
CN111460914A (en) Pedestrian re-identification method based on global and local fine-grained features
CN110163117B (en) Pedestrian re-identification method based on self-excitation discriminant feature learning
Jiang et al. Rethinking temporal fusion for video-based person re-identification on semantic and time aspect
CN116030495A (en) Low-resolution pedestrian re-identification algorithm based on multiplying power learning
CN112070010A (en) Pedestrian re-recognition method combining multi-loss dynamic training strategy to enhance local feature learning
CN114818963A (en) Small sample detection algorithm based on cross-image feature fusion
Tian et al. Self-regulation feature network for person reidentification
CN117333908A (en) Cross-modal pedestrian re-recognition method based on attitude feature alignment
CN114821632A (en) Method for re-identifying blocked pedestrians
CN116343294A (en) Pedestrian re-identification method suitable for generalization of field
CN115830643A (en) Light-weight pedestrian re-identification method for posture-guided alignment
CN116343135A (en) Feature post-fusion vehicle re-identification method based on pure vision
CN114299542A (en) Video pedestrian re-identification method based on multi-scale feature fusion
CN111738039A (en) Pedestrian re-identification method, terminal and storage medium
CN114299542B (en) Video pedestrian re-identification method based on multi-scale feature fusion
CN113051962B (en) Pedestrian re-identification method based on twin Margin-Softmax network combined attention machine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant