CN112633417A - Pedestrian depth feature fusion method for pedestrian re-identification and with neural network modularization - Google Patents
- Publication number
- CN112633417A (Application No. CN202110059638.2A)
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- network
- feature
- identification
- loss function
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/29—Graphical models, e.g. Bayesian networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/103—Static body considered as a whole, e.g. static pedestrian or occupant recognition
Abstract
A pedestrian depth feature fusion method for pedestrian re-identification based on neural network modularization realizes a deep-learning re-identification approach applicable to different neural networks and loss functions, while fusing features drawn from different depths of different network structures. The feature fusion method therefore offers greater flexibility and robustness when applied to pedestrian re-identification, improves re-identification accuracy, and weakens the interference caused by factors such as changes in the monitoring view angle. The Resnet50 network and the Multiple Granularity Network (MGN) for pedestrian re-identification are adopted as baseline algorithms, with cross-entropy loss, triplet loss and circle loss as the candidate loss functions. The Market-1501, DukeMTMC-reID and CUHK03 data sets serve as test sets; on CUHK03, the disclosed method improves Rank-1/mAP by +2.71%/+2.11% over the MGN baseline.
Description
Technical Field
The invention relates to a pedestrian re-identification method, and in particular to a more flexible and robust re-identification method that fuses pedestrian features from different depths during neural network training.
Background
The pedestrian re-identification task can be regarded as a sub-task of image retrieval: given videos or images captured by cameras covering different areas, the goal is to find a pedestrian of a specific identity. As an emerging technology in the field of intelligent video analysis, pedestrian re-identification plays an important role in security and surveillance applications. At its core, the task detects, tracks and analyses the features of pedestrians in images or videos. Its accuracy is affected by the monitoring environment, changes in shooting angle, differences in shooting conditions, poor image quality and changes in human posture. With the rapid development of computer vision and machine learning, many key problems in pedestrian re-identification have been effectively addressed, and the task has gradually become a very active research direction.
Although deep learning has effectively improved the efficiency of pedestrian re-identification, the problem still presents many research difficulties owing to interference from changes in camera view angle and pedestrian posture. Because pictures of the same pedestrian captured from different camera angles can differ substantially, the features of the same pedestrian under different view angles must be fused. Existing deep learning methods for pedestrian re-identification usually focus on feature extraction for the classification problem, and feature learning for the same pedestrian under different view angles usually relies only on the constraint of the loss function to improve similarity. Effective feature fusion during network training can therefore improve the performance of a pedestrian re-identification network. The commonly used approach fuses features at a fixed node in the network, which limits the flexibility of its application.
The invention provides a pedestrian feature fusion method for pedestrian re-identification that modularizes the neural network, giving the fusion method greater flexibility and robustness. Modularization makes it possible to fuse features extracted at different network depths, to study the influence of features from different depths on re-identification, and to attach the feature fusion module more conveniently to different re-identification networks, thereby improving re-identification accuracy.
Disclosure of Invention
The pedestrian re-identification algorithm optimizes the fusion of image features of the same pedestrian under different view angles. By modularizing the neural network, the feature fusion module, which is composed of cascaded 1 x 1 convolution kernels, can be attached to a network architecture more flexibly, and the influence of features extracted at different network depths on the final identification features can be examined in order to find a comparatively good fusion configuration. The method comprises the following steps:
1) Divide the neural network architecture W into several training modules, taking selected inputs/outputs as nodes; the loss function is an independent module, and the architecture W is defined as the backbone network.
2) Select the output node p of a chosen partition module B_p as the input node of the feature fusion module B_f.
3) Define the sub-network behind node p in the original architecture W as W_q, and construct a sub-network M_q with the same structure as W_q.
4) Splice M_q after the feature fusion module B_f to construct the feature fusion branch network.
5) Select appropriate loss functions to constrain the backbone network and the feature fusion branch network respectively.
6) During training, the loss function of the whole architecture is the sum of the backbone-network loss and the feature-fusion-branch loss.
The candidate loss functions are the cross-entropy loss function L_softmax, the triplet loss function L_triplet and the circle loss function L_circle, defined as follows:
x_i is the i-th feature; W_k denotes the weight vector under which x_i would belong to the k-th person, and W_{y_i} denotes the weight vector of the true identity y_i of x_i. f_a, f_p and f_n denote the current feature (anchor), the hardest positive sample and the hardest negative sample respectively, where the hardest positive sample is the least similar feature in the current training batch that belongs to the same pedestrian as the anchor, and the hardest negative sample is the most similar feature in the current batch that belongs to a different pedestrian. α is a margin hyper-parameter of the network.
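The equations themselves are not reproduced in this text; a reconstruction using the standard formulations these symbol descriptions correspond to is sketched below. The circle-loss scale factor γ, pair similarities s_p, s_n and margins Δ_p, Δ_n follow the usual circle-loss formulation and are not named in the filing, so they are assumptions here:

```latex
L_{softmax} = -\frac{1}{N}\sum_{i=1}^{N} \log \frac{e^{W_{y_i}^{\top} x_i}}{\sum_{k=1}^{C} e^{W_k^{\top} x_i}}

L_{triplet} = \max\left( \lVert f_a - f_p \rVert_2 - \lVert f_a - f_n \rVert_2 + \alpha,\; 0 \right)

L_{circle} = \log\left[ 1 + \sum_{j} e^{\,\gamma \alpha_n^{j} (s_n^{j} - \Delta_n)} \sum_{i} e^{-\gamma \alpha_p^{i} (s_p^{i} - \Delta_p)} \right]
```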
7) Input each picture into the entire network architecture.
8) Realize cross-image feature fusion at node p.
9) Constrain the hyper-parameters of the whole network structure through the loss function of the backbone network and the loss function of the feature fusion branch network.
10) Judge whether the number of iterations set for network training has been reached; if so, execute step 11), otherwise execute step 7).
11) Encode the feature feature_m obtained for each pedestrian picture through the backbone network together with the feature feature_f obtained through the feature fusion branch network to produce the feature finally used for pedestrian re-identification: the encoding takes the mean value or the maximum value of each dimension of feature_m and feature_f as the retrieval feature feature_o.
12) Judge pedestrian identity by measuring the distance between the retrieval features feature_o of the pedestrians: the smaller the distance, the more likely two pictures show the same person; the larger the distance, the less likely.
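Steps 1) to 12) can be sketched compactly with plain NumPy stand-ins for the network modules. The module shapes, the linear+ReLU stand-ins and the uniform 1×1-convolution mixing weights are all illustrative assumptions, not the patent's actual Resnet50/MGN modules:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-ins for the patent's modules (assumed shapes):
# each "module" is simply a callable on feature vectors.
def make_module(din, dout):
    W = rng.normal(0.0, 0.1, (dout, din))
    return lambda x: np.maximum(W @ x, 0.0)  # linear + ReLU stand-in

backbone_head = make_module(128, 64)  # partition module B_p, ending at node p
backbone_tail = make_module(64, 32)   # W_q: sub-network behind node p
branch_tail   = make_module(64, 32)   # M_q: same structure as W_q

def fuse_1x1(feats):
    """Feature fusion module B_f. A cascade of 1x1 convolutions mixes the
    stacked inputs channel-wise at each position; on flat feature vectors
    this reduces to a weighted sum (uniform weights assumed here)."""
    mix = np.full(len(feats), 1.0 / len(feats))
    return np.tensordot(mix, np.stack(feats), axes=1)

def forward(images_same_id):
    at_p = [backbone_head(x) for x in images_same_id]  # features at node p
    f_m  = [backbone_tail(f) for f in at_p]            # backbone features
    f_f  = branch_tail(fuse_1x1(at_p))                 # branch feature (steps 2-4, 8)
    # Step 11: encode each backbone feature with the branch feature (mean).
    return [0.5 * (m + f_f) for m in f_m]

def dist(a, b):
    # Step 12: identity decision by Euclidean distance between feature_o's.
    return float(np.linalg.norm(a - b))

feats_o = forward([rng.normal(size=128) for _ in range(4)])
```

Fusing at node p rather than at the network output is the point of the modularization: B_f and the branch tail M_q can be re-attached at a different partition module without touching the backbone.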
Drawings
FIG. 1 shows the overall flow chart of the process
Table I: Comparison of Resnet50 with the results of the method of the invention
Table II: Comparison of MGN with the results of the method of the invention
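The Rank-1 and mAP figures reported in these tables can be computed with the standard retrieval evaluation. The sketch below is a generic implementation under that assumption, not code from the filing; the toy distance matrix is invented for illustration:

```python
def evaluate(dist_rows, query_ids, gallery_ids):
    """Rank-1: fraction of queries whose nearest gallery item shares its id.
    mAP: mean over queries of the average precision of the ranked gallery."""
    rank1_hits, aps = 0, []
    for drow, qid in zip(dist_rows, query_ids):
        order = sorted(range(len(drow)), key=lambda j: drow[j])
        good = [gallery_ids[j] == qid for j in order]
        rank1_hits += good[0]
        hits, precisions = 0, []
        for rank, g in enumerate(good, start=1):
            if g:
                hits += 1
                precisions.append(hits / rank)
        aps.append(sum(precisions) / max(hits, 1))
    return rank1_hits / len(query_ids), sum(aps) / len(aps)

# Toy example: 2 queries, 3 gallery images.
dist_rows = [[0.1, 0.9, 0.5],   # query 0: nearest gallery item is index 0
             [0.8, 0.2, 0.3]]   # query 1: nearest gallery item is index 1
rank1, mAP = evaluate(dist_rows, query_ids=[0, 1], gallery_ids=[0, 1, 0])
```

Both toy queries rank a correct match first, so both metrics come out at 1.0 here; real evaluations on Market-1501, DukeMTMC-reID or CUHK03 additionally exclude same-camera matches, a detail omitted from this sketch.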
Detailed Description
Taking the CUHK03 data set and the MGN network as an example, the following is a preferred embodiment of the invention; the specific implementation steps are as follows:
1) Take res_conv1, res_conv2 and res_conv3 of the neural network architecture MGN as input/output nodes to divide the network into several training modules; the loss function is an independent module, and the architecture W is defined as the backbone network.
2) Select res_conv3 as the partition module B_p, whose output node p is the input node of the feature fusion module B_f.
3) Define the sub-network behind node p in the original architecture W as W_q, and construct a sub-network M_q with the same structure as W_q.
4) Splice M_q after the feature fusion module B_f to construct the feature fusion branch network.
5) The backbone-network loss combines a cross-entropy loss and a circle loss; the loss of the feature fusion branch network likewise combines a cross-entropy loss and a circle loss.
6) During training, the loss function of the whole architecture is the sum of the backbone-network loss and the feature-fusion-branch loss.
The cross-entropy loss function L_softmax and the circle loss function L_circle are defined as above: x_i is the i-th feature; W_k denotes the weight vector under which x_i would belong to the k-th person, and W_{y_i} denotes the weight vector of the true identity y_i of x_i. f_a, f_p and f_n denote the current feature (anchor), the hardest positive sample and the hardest negative sample respectively, where the hardest positive sample is the least similar feature in the current training batch that belongs to the same pedestrian as the anchor, and the hardest negative sample is the most similar feature in the current batch that belongs to a different pedestrian. α is a hyper-parameter of the network.
7) Input the pictures into the whole network architecture: each batch contains 8 pedestrian identities with 8 images per identity, i.e. 64 images in total are input into the deep neural network MGN for pedestrian re-identification; the input requires that the images of each identity be adjacent within the batch during parallel training.
8) Realize cross-image feature fusion at node p.
9) Constrain the hyper-parameters of the whole network structure through the loss function of the backbone network and the loss function of the feature fusion branch network.
10) Set the number of iterations to 400; judge whether the set number has been reached; if so, execute step 11), otherwise execute step 7).
11) Encode the feature feature_m obtained for each pedestrian picture through the backbone network together with the feature feature_f obtained through the feature fusion branch network to produce the feature finally used for pedestrian re-identification: the encoding takes the mean value of each dimension of feature_m and feature_f as the retrieval feature feature_o.
12) Judge pedestrian identity by measuring the distance between the retrieval features feature_o of the pedestrians: the smaller the distance, the more likely two pictures show the same person; the larger the distance, the less likely.
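The batch layout required in step 7) can be sketched as an identity-balanced sampler. This is a minimal illustration under the 8-identities-by-8-images assumption, not the training code from the filing; the dataset and shuffling details are invented:

```python
import random

def make_batches(labels, ids_per_batch=8, imgs_per_id=8, seed=0):
    """Group image indices by identity, then emit batches of
    ids_per_batch * imgs_per_id indices in which all images of one
    identity are adjacent, as the parallel training in step 7) requires."""
    rng = random.Random(seed)
    by_id = {}
    for idx, pid in enumerate(labels):
        by_id.setdefault(pid, []).append(idx)
    # Only identities with enough images can fill an adjacent block.
    pids = [p for p, v in by_id.items() if len(v) >= imgs_per_id]
    rng.shuffle(pids)
    batches = []
    for i in range(0, len(pids) - ids_per_batch + 1, ids_per_batch):
        batch = []
        for pid in pids[i:i + ids_per_batch]:
            batch.extend(rng.sample(by_id[pid], imgs_per_id))  # adjacent block
        batches.append(batch)
    return batches

# e.g. 16 identities with 10 images each -> 2 batches of 64 indices
labels = [pid for pid in range(16) for _ in range(10)]
batches = make_batches(labels)
```

Keeping the 8 images of each identity adjacent is what lets the fusion module at node p combine same-pedestrian features within a batch.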
Claims (1)
1. A method for optimizing image feature fusion of the same pedestrian under different visual angles based on a neural network modularization mode is characterized by comprising the following steps:
1) dividing a neural network architecture W into a plurality of training modules by taking a plurality of input/outputs as nodes, wherein a loss function is an independent module, and the neural network architecture W is defined as a backbone network;
2) selecting the output node p of a chosen partition module B_p as the input node of the feature fusion module B_f;
3) Defining a sub-network behind a node p in an original network architecture W as W _ q, and constructing a sub-network M _ q with the same structure as W _ q;
4) splicing M_q after the feature fusion module B_f to construct the feature fusion branch network;
5) selecting appropriate loss functions, from among a cross-entropy loss function, a triplet loss function and a circle loss function, to constrain the backbone network and the feature fusion branch network respectively;
6) inputting each picture into the whole network architecture, and realizing cross image feature fusion at a node p;
7) constraining the hyper-parameters of the whole network structure through the loss function of the backbone network and the loss function of the feature fusion branch network;
8) judging whether the number of iterations set for network training has been reached; if so, executing step 9), and if not, executing step 6);
9) encoding the feature feature_m obtained for each pedestrian picture through the backbone network and the feature feature_f obtained through the feature fusion branch network to obtain the feature finally used for pedestrian re-identification, wherein the encoding takes the mean value or the maximum value of each dimension of feature_m and feature_f as the retrieval feature feature_o;
10) judging pedestrian identity by measuring the distance between the retrieval features feature_o of the pedestrians, wherein the smaller the distance, the more likely the pedestrians are the same person, and the larger the distance, the less likely.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110059638.2A CN112633417A (en) | 2021-01-18 | 2021-01-18 | Pedestrian depth feature fusion method for pedestrian re-identification and with neural network modularization |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112633417A true CN112633417A (en) | 2021-04-09 |
Family
ID=75294466
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113255604A (en) * | 2021-06-29 | 2021-08-13 | 苏州浪潮智能科技有限公司 | Pedestrian re-identification method, device, equipment and medium based on deep learning network |
CN113255604B (en) * | 2021-06-29 | 2021-10-15 | 苏州浪潮智能科技有限公司 | Pedestrian re-identification method, device, equipment and medium based on deep learning network |
WO2023272995A1 (en) * | 2021-06-29 | 2023-01-05 | 苏州浪潮智能科技有限公司 | Person re-identification method and apparatus, device, and readable storage medium |
US11810388B1 (en) | 2021-06-29 | 2023-11-07 | Inspur Suzhou Intelligent Technology Co., Ltd. | Person re-identification method and apparatus based on deep learning network, device, and medium |
US11830275B1 (en) | 2021-06-29 | 2023-11-28 | Inspur Suzhou Intelligent Technology Co., Ltd. | Person re-identification method and apparatus, device, and readable storage medium |
CN115100690A (en) * | 2022-08-24 | 2022-09-23 | 天津大学 | Image feature extraction method based on joint learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113516012B (en) | Pedestrian re-identification method and system based on multi-level feature fusion | |
CN112633417A (en) | Pedestrian depth feature fusion method for pedestrian re-identification and with neural network modularization | |
CN115171165A (en) | Pedestrian re-identification method and device with global features and step-type local features fused | |
CN112651262B (en) | Cross-modal pedestrian re-identification method based on self-adaptive pedestrian alignment | |
CN109508663A (en) | A kind of pedestrian's recognition methods again based on multi-level supervision network | |
Shankar et al. | Refining architectures of deep convolutional neural networks | |
CN113159466B (en) | Short-time photovoltaic power generation prediction system and method | |
CN114694185B (en) | Cross-modal target re-identification method, device, equipment and medium | |
CN110751018A (en) | Group pedestrian re-identification method based on mixed attention mechanism | |
CN109919084B (en) | Pedestrian re-identification method based on depth multi-index hash | |
Giraldo et al. | Graph CNN for moving object detection in complex environments from unseen videos | |
CN111639564A (en) | Video pedestrian re-identification method based on multi-attention heterogeneous network | |
CN110022422B (en) | Video frame sequence generation method based on dense connection network | |
CN111126223A (en) | Video pedestrian re-identification method based on optical flow guide features | |
CN115063832A (en) | Global and local feature-based cross-modal pedestrian re-identification method for counterstudy | |
CN112418087A (en) | Underwater video fish identification method based on neural network | |
CN109359530B (en) | Intelligent video monitoring method and device | |
CN116934796B (en) | Visual target tracking method based on twinning residual error attention aggregation network | |
CN110120009B (en) | Background blurring implementation method based on salient object detection and depth estimation algorithm | |
CN111079585A (en) | Image enhancement and pseudo-twin convolution neural network combined pedestrian re-identification method based on deep learning | |
CN109598227B (en) | Single-image mobile phone source re-identification method based on deep learning | |
CN114418003B (en) | Double-image recognition and classification method based on attention mechanism and multi-size information extraction | |
CN113537032B (en) | Diversity multi-branch pedestrian re-identification method based on picture block discarding | |
CN116385981A (en) | Vehicle re-identification method and device guided by camera topological graph | |
CN114821629A (en) | Pedestrian re-identification method for performing cross image feature fusion based on neural network parallel training architecture |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20210409 |