CN110765841A - Group pedestrian re-identification system and terminal based on mixed attention mechanism - Google Patents


Info

Publication number
CN110765841A
CN110765841A (application CN201910827177.1A)
Authority
CN
China
Prior art keywords
attention
module
feature
mixed
group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910827177.1A
Other languages
Chinese (zh)
Inventor
杨华
阳兵
许琪羚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
China Electronics Technology Group Corp CETC
Electronic Science Research Institute of CETC
Original Assignee
Shanghai Jiaotong University
China Electronics Technology Group Corp CETC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University, China Electronics Technology Group Corp CETC filed Critical Shanghai Jiaotong University
Priority to CN201910827177.1A
Publication of CN110765841A
Legal status: Pending

Classifications

    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands (under G: Physics; G06: Computing; G06V: Image or video recognition or understanding)
    • G06N 3/045: Combinations of networks (under G06N: Computing arrangements based on specific computational models; G06N 3/04: Neural network architecture)
    • G06N 3/08: Learning methods (under G06N 3/02: Neural networks)
    • G06V 20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames (under G06V 20/40: Scene-specific elements in video content)

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a group pedestrian re-identification system based on a mixed attention mechanism, comprising a backbone model module, a mixed attention model module, a feature extraction module, a feature fitting module and a feature evaluation module. Preliminary feature extraction is performed on group images with a deep convolutional neural network backbone model; the preliminarily extracted features are further refined with a mixed attention mechanism model; and the features processed by the mixed attention model are fitted and evaluated using a least-squares residual distance. A terminal for running the system is also provided. By using a mixed attention model that combines spatial attention and channel attention, the network is made to attend more closely to the key regions and features of group images, and a novel least-squares residual distance based on the least-squares algorithm better learns the metric between pairs of group images. The method adapts well to the varied challenges posed by group pedestrian images and is broadly applicable.

Description

Group pedestrian re-identification system and terminal based on mixed attention mechanism
Technical Field
The invention belongs to the technical field of computer vision, and in particular relates to a group pedestrian re-identification system and terminal based on a mixed attention mechanism, concerning pedestrian re-identification focused on groups under non-overlapping surveillance cameras.
Background
In recent years, public safety has received more and more attention, and concern for safety problems has risen to a whole new level: urban video surveillance networks have been improved, surveillance has been extended to every corner of cities, and both the quality and the quantity of surveillance data have greatly increased. Across the surveillance network, pedestrians are among the key objects of interest, given the flexibility and variability of pedestrian activity and the importance of monitoring pedestrians for protecting personal safety and for criminal investigation and case solving; video surveillance is therefore deployed wherever people frequently come and go.
At present, pedestrian surveillance mainly relies on manual monitoring of real-time camera feeds. This process requires large amounts of manpower and material resources, and the volume of video data produced by surveillance is so enormous that manual monitoring cannot fully cover it, so important information and data are easily missed. Under these circumstances, pedestrian re-identification algorithms based on deep learning and artificial intelligence become increasingly important. A pedestrian re-identification algorithm can identify the same pedestrian across different, non-overlapping surveillance cameras. The rapid development of pedestrian re-identification can bring a large accuracy improvement to video surveillance systems while saving substantial manpower and material resources, and is therefore of significant research interest.
Cross-camera pedestrian re-identification plays an important role in solving key problems in video surveillance. It is a key technology in surveillance scenarios: clear face images are scarce in surveillance networks, and the resolution of surveillance images or videos of pedestrians often cannot meet the requirements of face recognition, so face recognition is difficult to apply to footage from cameras with very low resolution or positioned far from pedestrians. Moreover, in practice a deployment of surveillance cameras leaves many blind areas that cannot be monitored; once a pedestrian leaves a camera's coverage and enters such a blind area, conventional target tracking no longer applies, and pedestrian re-identification becomes the key to cross-camera tracking.
Existing research pays close attention to single-person pedestrian re-identification but neglects the important role of group-based re-identification in pedestrian matching tasks. Group-based pedestrian re-identification can solve problems that single-person re-identification cannot: pedestrian behavior is often accompanied by group information, group behavior is very common, and the information provided by the joint movement of a group is highly valuable. For example, when a single pedestrian is occluded over a large area for a long time by other pedestrians in the group, single-person re-identification cannot achieve a match, whereas re-identification from the group perspective can solve this problem.
Group re-identification plays an important role in public safety and surveillance systems. Its goal is to identify the same group in different, non-overlapping camera views. Most existing pedestrian re-identification research is based on single persons, yet in real situations group behavior is not negligible: groups are very common and provide a large amount of useful information. Moreover, image-based group matching methods can be extended to video sequences; a video sequence carries more information than a single image, and integrating the analysis of spatio-temporal information can further improve system performance.
There are relatively few studies on group pedestrian re-identification, and each has its limitations. Some work studies group identification using information such as position, trajectory, speed and direction (see Ukita, Norimichi, Yusuke Moriguchi, and Norihiro Hagita, "People re-identification across non-overlapping cameras using group features," Computer Vision and Image Understanding 144 (2016): 228-236). Some work uses sparse feature coding and unsupervised transfer learning to transfer a sparse coding dictionary learned for single-person re-identification to the group problem (see Lisanti, Giuseppe, et al., "Group re-identification via unsupervised transfer of sparse features encoding," Proceedings of the IEEE International Conference on Computer Vision, 2017), but this method requires the outputs of DPM, ACF and R-CNN simultaneously, so preprocessing is costly. Some work extracts subsets of the group and matches them iteratively with a multi-grain matching algorithm, which fully exploits group features and handles membership changes within the group (see Xiao, Hao, et al., "Group Re-Identification: Leveraging and Integrating Multi-Grain Information," 2018 ACM Multimedia Conference, ACM, 2018), but the matching process is very time-consuming and unsuitable for large datasets and real-world scenarios.
Given the requirements of today's public safety systems, cross-camera group re-identification research plays an important role in the accuracy of surveillance systems and in achieving a high recall rate for target pedestrians. The problem of cross-camera group re-identification is therefore studied extensively in the present invention.
Disclosure of Invention
In view of the foregoing problems in the prior art, an object of the present invention is to provide a group pedestrian re-identification system and terminal based on a mixed attention mechanism. The system and terminal exploit the strengths of existing deep learning methods: group image features are extracted by deep learning and fused with a mixed attention mechanism, so that more discriminative features are extracted and greater weight is applied to key regions. The framework's attention thus shifts from the entire group image, background included, to the more discriminative key information among the group's pedestrians, improving the performance of group pedestrian re-identification under surveillance cameras.
Group behavior can provide more information for pedestrian re-identification and is of significant research value for further improving re-identification results. The invention aims at group-based re-identification: target groups can be compared and retrieved across cameras in a video surveillance system. In the target retrieval process, deep learning methods based on group re-identification and pedestrian re-identification are far superior to manual search. As the coverage of video surveillance keeps expanding, intelligent video surveillance technology performs far better than traditional manual monitoring of the network. Effective matching of target pedestrians through intelligent video surveillance can markedly improve social security and management efficiency, and research on this problem has very important practical significance for security and criminal investigation.
Multiple forms of attention, such as spatial attention and channel attention, allow the network to focus on discriminative group features. The invention designs a deep neural network framework for cross-camera group re-identification and, to address spatial variation within a group, designs an attention-model-based feature extraction algorithm that fuses global and local features to perform discriminative and efficient feature extraction and representation of groups in surveillance video. Meanwhile, considering that re-identification based on a single frame is limited to the static visual features of pedestrians and cannot handle practical challenges such as deformation and occlusion under different camera viewing angles, the invention further exploits the spatial and temporal features of pedestrians in video: on top of image feature extraction, a network based on video sequences that extracts spatial and temporal features simultaneously is designed, enabling comparison and correlation analysis of key pedestrian targets across the spatio-temporal domain.
In surveillance video, a target pedestrian is often occluded over a large area, and an occluded target is difficult to retrieve through plain pedestrian re-identification. The greatest advantage of group-based re-identification is therefore its ability to judge the re-identification of pedestrians with insufficient information by means of the group's matching information. In single-person re-identification, if a person is continuously occluded by others in the group under a certain camera's viewing angle, that person is difficult to re-identify in that camera; group-based re-identification can solve such cases. The key to matching such images is to fully exploit the image information of the pedestrians in the group who are fully visible, making them the anchors of image matching: the group is matched by matching these anchors while supplementing the information of the occluded pedestrians. Group re-identification can thus compensate for this shortcoming of single-person re-identification. Building on the group-comparison results of this research, they can further be applied to single-person re-identification. When pedestrian re-identification and group re-identification are applied together, the additional effective information provided by group comparison can improve the accuracy of pedestrian re-identification.
the attention mechanism in deep learning is substantially similar to the human selective visual attention mechanism. The core goal is to select more critical information from the multitude of information that is useful for the current task goal. Aiming at various challenges existing in group re-identification, a plurality of attention mechanisms including channel attention and space attention are adopted, and a feature vector with higher resolution can be obtained by extracting network features through a mixed attention model. The image of the group has more background information than the image of the individual pedestrian. Therefore, the local features of the pedestrian part are extracted by designing a feature extraction algorithm based on the attention model and fusing the global features and the local features, the extracted features are more focused on the pedestrian part, and the background part is given less weight. More discriminative feature vectors can be extracted using an attention mechanism. The spatial attention is attention corresponding to a position different from the picture in two dimensions of width and height of the image. In the traditional pedestrian comparison process, mismatching is easy to occur for different pedestrians with similar appearances. Under the condition, the spatial attention mechanism can better pay attention to local features with discrimination, and further improves the accuracy of overall pedestrian re-identification.
The invention is realized by the following technical scheme.
According to one aspect of the invention, a group pedestrian re-identification system based on a mixed attention mechanism is provided, comprising:
a backbone model module: forming the backbone feature extraction network P of the group pedestrian re-identification task based on a deep convolutional neural network, applying the network P, trained on image pairs, over the whole group pedestrian re-identification dataset, and generating a feature vector E for each image s of group pedestrians through the backbone feature extraction network P;
a mixed attention model module: on the basis of the backbone model feature extraction network P, adding a mixed attention mechanism network H, and further extracting the preliminarily extracted features; the mixed attention mechanism network H comprises a channel attention module C and a space attention module S, wherein the channel attention module C and the space attention module S respectively act on the features of the feature vector E in different dimensions;
a feature extraction module: capturing the global dependencies of the feature vector E through the mixed attention mechanism network H; each input feature vector E is processed by the channel attention module C and the spatial attention module S in the mixed attention network H to obtain a channel attention parameter w1 and a spatial attention parameter w2; the attention parameters w1 and w2 are combined into the overall attention weight w of the feature vector E; the feature vector E is multiplied by the attention weight w to obtain features F that attend more closely to the key regions of the group pedestrian images;
a feature fitting module: matching the extracted image features F against the features of the detection target; in the distance-measurement stage a least-squares residual distance is adopted, in which the features of the detection target and of the matching object, after extraction by the backbone feature extraction network P and the mixed attention network H, are Y and X respectively; a polynomial fitting function

f̂(X) = AW

is learned to approximate the feature Y of the true detection target, where A is the matrix obtained by expanding the matching-object feature X and W is the vector of coefficients of the polynomial fitting function; a regularization term is added to the polynomial fitting function, and the regularized function is solved by the least-squares method to find the optimal coefficients W of the polynomial fitting function;

a feature evaluation module: the function

r(X, Y) = ||Y - f̂(X)||

formed from the difference between the fitting result of the polynomial fitting function f̂(X) and the feature Y extracted by the backbone feature extraction network P is used as the distance for comparing and evaluating the features X extracted by the mixed attention network H.
Preferably, in the backbone model module, the backbone model feature extraction network P mainly consists of a deep convolutional neural network, and outputs the preliminary features of each image s of the group of pedestrians.
Preferably, in the feature extraction module:
for the channel attention module C, a global pooling operation is first applied to the feature vector:

g_k = (1/(h·w)) Σ_{i=1..h} Σ_{j=1..w} X_k(i, j),  k = 1, …, c   (1)

where X denotes the features extracted by the backbone feature extraction network P, and h, w, c respectively denote the lengths of the input to the attention module C in its three dimensions; after global pooling, the feature is reduced to a vector whose length equals the number of channels and which represents the attention weights of the different feature channels; this vector is then passed through two fully connected layers;

for the spatial attention module S, the feature vector is first pooled along the channel dimension, and a convolution layer with kernel size 1×1 and stride 1 learns the network's attention weights over the height and width dimensions; finally, the results of the spatial and channel attention are applied to the original features as weights, yielding more discriminative features.
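The channel and spatial attention branches described above can be sketched with NumPy. This is a minimal illustrative model under stated assumptions, not the patent's exact network: the hidden width `r` of the two fully connected layers, the ReLU between them, the sigmoid gating, and a single-scalar 1×1 kernel for the spatial branch are simplifications chosen for brevity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(X, W1, W2):
    """Global-average-pool each channel (Eq. (1)), then two FC layers.
    X has shape (c, h, w); returns per-channel weights of shape (c, 1, 1)."""
    g = X.mean(axis=(1, 2))                   # (c,): pooled over h and w
    w1 = sigmoid(W2 @ np.maximum(W1 @ g, 0))  # FC -> ReLU -> FC -> sigmoid
    return w1.reshape(-1, 1, 1)

def spatial_attention(X, k):
    """Pool along the channel dimension, then weight each (i, j) position."""
    p = X.mean(axis=0)                        # (h, w): channel pooling
    return sigmoid(k * p)[None, :, :]         # a 1x1 "conv" with scalar kernel k

def mixed_attention(X, W1, W2, k):
    w = channel_attention(X, W1, W2) * spatial_attention(X, k)  # (c, h, w)
    return X * w                              # feature F = E weighted by w

rng = np.random.default_rng(0)
c, h, w, r = 8, 4, 4, 2
X = rng.normal(size=(c, h, w))
F = mixed_attention(X, rng.normal(size=(r, c)), rng.normal(size=(c, r)), 0.5)
print(F.shape)  # same shape as X, re-weighted toward salient channels and positions
```

Since each weight lies in (0, 1), the output never amplifies a feature; it only attenuates less relevant channels and positions. A trained network would learn W1, W2 and the spatial kernel jointly with the backbone.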
Preferably, in the feature extraction module, the channel attention parameter w1 has one dimension and the spatial attention parameter w2 has two; the attention parameters w1 and w2 are merged to obtain a parameter w carrying attention weights in all three dimensions h, w and c, where h, w, c respectively denote the lengths of the input to the attention module C in its three dimensions.
Preferably, in the feature extraction module, the feature F is the feature of the group image obtained after the mixed attention mechanism is added, and is obtained by multiplying the feature vector E without the mixed attention mechanism by the overall attention weight w.
Preferably, in the feature fitting module, the objective after adding the regularization term is:

min_W ||AW - Y||² + β||W||²   (3)
preferably, in the feature learning module, the feature X is (X)1,x2,x3,……,xd) The expansion into a matrix is in the form of:
Figure BDA0002189461020000062
where d is the dimension of feature X.
Preferably, in the feature fitting module, solving the regularized polynomial fitting function by the least-squares method gives the optimal coefficients W of the polynomial fitting function as:

W = (AᵀA + βI)⁻¹AᵀY   (4)
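Equation (4) can be verified numerically. The sketch below uses hypothetical toy values (feature dimension d = 6, polynomial order n = 2, β = 0.1) and a Vandermonde-style expansion for A, all of which are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, beta = 6, 2, 0.1
X = rng.normal(size=d)                    # matching-object feature
Y = rng.normal(size=d)                    # detection-target feature
A = np.vander(X, n + 1, increasing=True)  # rows [1, x_i, x_i^2], shape (d, n+1)

# Eq. (4): W = (A^T A + beta*I)^{-1} A^T Y, via a linear solve for stability
W = np.linalg.solve(A.T @ A + beta * np.eye(n + 1), A.T @ Y)

# W is the stationary point of ||AW - Y||^2 + beta*||W||^2: its gradient vanishes
grad = 2 * A.T @ (A @ W - Y) + 2 * beta * W
print(np.allclose(grad, 0))
```

Using `np.linalg.solve` avoids forming the explicit inverse; for an ill-conditioned A, an SVD-based ridge solver would be preferable.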
Preferably, in the feature evaluation module, the function is:

r(X, Y) = ||Y - AW||   (5)

and the features that have passed through the mixed attention model are compared and evaluated using this distance.
According to another aspect of the invention, a terminal is provided, comprising a memory, a processor, and a program stored on the memory and executable by the processor to implement the system provided by any of the above.
Compared with the prior art, the invention has the beneficial effects that:
1) The invention uses deep learning to extract image features and to rank and identify group identities in the group re-identification task; compared with the conventional approach in which hand-crafted features and metric learning are two completely separate stages, the deep learning method can extract more effective group image features.
2) The invention uses a mixed attention mechanism and can extract more discriminative features. To address the various challenges of group re-identification, several attention mechanisms, including channel attention and spatial attention, are adopted; extracting network features with the mixed attention model yields more discriminative feature vectors and completes the group re-identification task better.
3) Unlike traditional distance measurements, which mostly use the Euclidean or cosine distance, the invention matches pictures using the extracted features and can adopt a least-squares-based residual distance in the distance-measurement stage. The association between the features of the target image and of the image to be matched is thus learned better; using the distance between feature residuals as the distance between the features of the two pictures represents their relationship more effectively and improves re-identification accuracy.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a schematic diagram of a hybrid attention mechanism network feature extraction according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating group re-identification of challenging problems according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating the least squares residual distance in accordance with an embodiment of the present invention.
Detailed Description
The embodiments of the present invention will be described in detail below with reference to the accompanying drawings: the present embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation manner and a specific operation process are given, but the scope of the present invention is not limited to the following embodiments.
The embodiment of the invention provides a group pedestrian re-identification system based on a mixed attention mechanism, which comprises the following modules:
A backbone model module: this module forms the backbone feature extraction network P of the group pedestrian re-identification task based on a deep convolutional neural network, applies the network P over the whole group pedestrian re-identification dataset, generates a feature vector E for each image s of group pedestrians through the network P, and outputs the preliminary features of the group images.
A mixed attention model module: this module adds a mixed attention network H on top of the backbone feature extraction network P to further refine the preliminarily extracted features. The mixed attention network H captures global dependencies within a single feature and makes the network attend more closely to the key regions and features of the group images. The network H is divided into a channel attention module C and a spatial attention module S, which act on different dimensions of the image features after extraction by the backbone feature extraction network P.
A feature extraction module: the mixed attention network H in the mixed attention model module captures global dependencies within a single feature; each input feature E is processed by the channel attention module C and the spatial attention module S in H to obtain attention parameters w1 and w2, representing the weights of the channel features and of the position features respectively; the attention parameters w1 and w2 are combined into the overall attention weight w of the feature vector E; the feature vector E is multiplied by the attention weight w to obtain features F that attend more closely to the key regions of the group pedestrian images;
specifically, the method comprises the following steps:
Each input feature E is processed by the channel attention module C and the spatial attention module S in the attention network H to obtain attention parameters w1 and w2, representing the weights of the channel features and of the position features respectively. For the channel attention module, a global pooling operation is first applied to the feature vector:

g_k = (1/(h·w)) Σ_{i=1..h} Σ_{j=1..w} X_k(i, j),  k = 1, …, c   (1)

where X denotes the features extracted by the backbone feature extraction network, and h, w, c respectively denote the lengths of the attention module's input in its three dimensions. After global pooling, the feature is reduced to a vector whose length equals the number of channels and which represents the attention weights of the different feature channels; this vector is then passed through two fully connected layers, an operation that increases the nonlinearity of the network and reduces the number of parameters required for training. For the spatial attention module, the feature vector is first pooled along the channel dimension, and a convolution layer with kernel size 1×1 and stride 1 learns the attention weights over the height and width dimensions. Finally, the results of the spatial and channel attention are applied to the original features as weights, yielding more discriminative features.
The channel-dimension attention parameter w1 and the position-dimension attention parameter w2 are applied to obtain the overall attention weight w of the feature E. The channel attention parameter w1 has one dimension and the spatial attention parameter w2 has two; after w1 and w2 are merged, a parameter w carrying attention weights in all three dimensions h, w and c is obtained.
The group pedestrian feature E is multiplied by the attention weight w; that is, the position and channel dimensions of the group feature are multiplied by the position-feature and channel-feature weights respectively, giving the feature F that attends to the key regions of the group image. The group-image feature F obtained after adding the mixed attention mechanism is thus the product of the feature E without the mixed attention mechanism and the attention weight w.
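The merge-and-multiply step can be illustrated with NumPy broadcasting; the shapes and weight values below are hypothetical and chosen only to make the arithmetic easy to follow:

```python
import numpy as np

c, h, w = 4, 3, 3
E = np.ones((c, h, w))                            # feature before attention
w1 = np.linspace(0.1, 0.4, c).reshape(c, 1, 1)    # channel weights: one per channel
w2 = np.full((1, h, w), 0.5)                      # spatial weights: one per position
wgt = w1 * w2                                     # merged weight over (c, h, w)
F = E * wgt                                       # feature F = E * w
print(F.shape)   # unchanged shape; each entry scaled by its channel and position weight
```

Broadcasting expands w1 across positions and w2 across channels, so the merged weight carries attention in all three dimensions h, w and c, exactly as described for the parameter w.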
A feature fitting module: the extracted image features are matched against the features of the detection target, using the least-squares residual distance in the distance-measurement stage. The features of the detection target and of the matching object, after extraction by the backbone model and the mixed attention model, are Y and X respectively. A polynomial fitting function

f̂(X) = AW

is learned to approximate the feature Y of the true detection target; a regularization term is added to the polynomial fitting function, and the regularized function is solved by the least-squares method to find the optimal coefficients W of the polynomial fitting function.
Specifically, the method comprises the following steps:
the features of the detection target and the matching object after extraction by the backbone model and the mixed attention model are Y and X respectively, and a polynomial fitting function Ŷ is learned to approximate the true detection-target feature Y. The polynomial fitting function is expressed as

Ŷ = AW (1)

where A is the feature X of the matching object expanded into matrix form: the feature X = (x1, x2, x3, ……, xd) is expanded as a polynomial,

A = [1, x1, x1², …, x1ⁿ; 1, x2, x2², …, x2ⁿ; …; 1, xd, xd², …, xdⁿ] (2)

where d is the dimension of the feature X and n is the order of the polynomial. The problem then becomes finding the W that minimizes ‖AW − Y‖², whose optimal solution follows from the least-squares formula W = (AᵀA)⁻¹AᵀY; W is the coefficient of the target polynomial fitting function.
a model obtained by considering only this optimal solution is very likely to overfit, and its predictions are poor. To solve this problem, a regularization term is added to the objective function; after adding the regularization term, the objective function becomes

L(W) = ‖AW − Y‖² + β‖W‖² (3)
the target fitting function is solved by the least-squares method to find the coefficient parameter W of the best-fitting function, using the equation

W = (AᵀA + βI)⁻¹AᵀY (4)
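The closed-form solve in equation (4) is ordinary ridge regression and can be illustrated with NumPy. The Vandermonde expansion of X, the polynomial order and the value of β below are assumptions for the sketch, not values fixed by the text:

```python
import numpy as np

def fit_residual_coeffs(X, Y, order=3, beta=0.1):
    """Solve W = (A^T A + beta*I)^{-1} A^T Y for the polynomial expansion A of X."""
    # Row i of A is (1, x_i, x_i^2, ..., x_i^order), as in equation (2)
    A = np.vander(X, N=order + 1, increasing=True)       # shape (d, order + 1)
    I = np.eye(order + 1)
    # Closed-form regularized least-squares solution, equation (4)
    W = np.linalg.solve(A.T @ A + beta * I, A.T @ Y)
    return A, W

rng = np.random.default_rng(1)
X = rng.standard_normal(16)                            # matching-object feature, d = 16
Y = 2.0 * X + 0.5 + 0.01 * rng.standard_normal(16)     # target feature to approximate
A, W = fit_residual_coeffs(X, Y)
Y_hat = A @ W                                          # fitted approximation of Y
```

Here the relation between X and Y is nearly linear by construction, so the cubic fit recovers Y up to the noise and the small bias introduced by β.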
a characteristic evaluation module: will be provided withFitting function results
Figure BDA0002189461020000104
Function formed by difference with real network extracted characteristic Y
Figure BDA0002189461020000101
The features that have been mixed attention models are compared and evaluated as distances. The final residual distance is
Figure BDA0002189461020000102
The features that have been mixed attention models are compared and evaluated as distances.
The ranking and retrieval of the target group are completed by comparing the distances between the detection target and the matching objects, and the accuracy of the results is evaluated with mAP and ranking metrics.
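Putting the pieces together, ranking gallery candidates by the least-squares residual distance might look like the following sketch; the feature dimension, polynomial order and β are illustrative assumptions:

```python
import numpy as np

def residual_distance(X, Y, order=3, beta=0.1):
    """d(X, Y) = ||Y - A W||, with W the regularized least-squares fit of Y from X."""
    A = np.vander(X, N=order + 1, increasing=True)
    W = np.linalg.solve(A.T @ A + beta * np.eye(order + 1), A.T @ Y)
    return float(np.linalg.norm(Y - A @ W))

rng = np.random.default_rng(2)
query = rng.standard_normal(32)                        # probe-group feature Y
same_id = query + 0.05 * rng.standard_normal(32)       # near-duplicate gallery feature
others = [rng.standard_normal(32) for _ in range(4)]   # unrelated gallery features
gallery = [same_id] + others

# Rank the gallery by residual distance to the query; the matching group,
# whose feature is almost linearly related to the query, should rank first.
ranked = sorted(range(len(gallery)),
                key=lambda i: residual_distance(gallery[i], query))
```

A gallery feature that is (approximately) a polynomial transform of the query yields a small residual, while an unrelated feature cannot be fitted well, which is what makes the residual usable as a matching distance.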
Based on the mixed attention mechanism-based group pedestrian re-identification system provided in the above embodiment of the invention, an embodiment of the invention also provides a terminal comprising a memory, a processor, and the system provided in the above embodiment, stored on the memory and executable by the processor.
The system and the terminal provided by the embodiments of the invention extract the features of group pedestrians in an image through the mixed attention model and, based on multiple forms of attention such as spatial attention and channel attention, can focus on the discriminative group features. A deep neural network framework is designed for cross-camera re-identification of groups; to address spatial variation within groups, a feature extraction algorithm fusing global and local features is designed on top of the attention model, producing discriminative and efficient feature representations of the groups in surveillance video. At the same time, the least-squares residual distance better learns the association between the features of the target image and the image to be matched; using the distance between feature residuals as the distance for comparing two images represents their relationship more effectively and improves re-identification accuracy.
Table 1 below gives a numerical comparison of the final recognition accuracy achieved by the system and the terminal provided in the above embodiment of the invention. The competing results are listed from top to bottom for numerical comparison with the result of the present embodiment. It can be seen that the accuracy of the above embodiment of the invention is improved.
TABLE 1
[Table 1 image: numerical accuracy comparison with prior methods]
Table 2 below compares the performance obtained when each of the two modules in the mixed attention model acts on the network separately with the experimental results of the full mixed attention module. Each attention component, acting on the network alone, already improves the results, and the full mixed attention model performs best.
TABLE 2
Method             R=1     R=5     R=10    mAP
Backbone model     80.7%   89.7%   92.6%   71.0%
Spatial attention  80.9%   89.1%   94.2%   73.4%
Channel attention  81.3%   89.9%   92.6%   73.8%
Mixed attention    82.7%   91.6%   94.6%   75.2%
In summary, the mixed attention mechanism-based group pedestrian re-identification system and terminal provided in the above embodiments of the invention add a mixed attention model to a convolutional neural network and adopt multiple attention mechanisms, including channel attention and spatial attention, to address the various challenges in group re-identification. Extracting network features through the mixed attention model yields more discriminative feature vectors and better completes the group re-identification task. The attention model captures the global dependence inside the features, corrects the features, and lets them participate in the training of the whole network in a more reasonable form; finally, the generality of the system and the terminal is improved.
While the present invention has been described in detail with reference to the preferred embodiments, it should be understood that the above description should not be taken as limiting the invention. Various modifications and alterations to this invention will become apparent to those skilled in the art upon reading the foregoing description. Accordingly, the scope of the invention should be determined from the following claims.

Claims (10)

1. A group pedestrian re-identification system based on a hybrid attentiveness mechanism, comprising:
a backbone model module: the backbone model module forms a backbone feature extraction network P for the group pedestrian re-identification task based on a deep convolutional neural network, trains the backbone feature extraction network P with image pairs over the whole group pedestrian re-identification data set, and generates a feature vector E for each group pedestrian image s through the backbone feature extraction network P;
a mixed attention model module: the mixed attention model module adds a mixed attention mechanism network H on top of the backbone feature extraction network P of the backbone model module and refines the preliminarily extracted features; the mixed attention mechanism network H comprises a channel attention module C and a spatial attention module S, which act on the feature vector E along different dimensions;
a feature extraction module: the feature extraction module captures the global dependence of the feature vector E through the mixed attention mechanism network H of the mixed attention model module; each input feature vector E is processed by the channel attention module C and the spatial attention module S in the mixed attention mechanism network H to obtain a channel attention parameter w1 and a spatial attention parameter w2; the attention parameters w1 and w2 are applied to obtain the overall attention weight w of the feature vector E; the feature vector E is multiplied by the attention weight w to obtain image features F that focus on the key regions of the group pedestrian images;
a feature fitting module: the feature fitting module matches the image features F extracted by the feature extraction module with the features of the detection target; wherein: the least-squares residual distance is adopted in the distance measurement stage; the features of the detection target and the matching object after extraction by the backbone feature extraction network P and the mixed attention mechanism network H are Y and X respectively, and a polynomial fitting function Ŷ is learned to approximate the true detection-target feature Y; the polynomial fitting function Ŷ is expressed as Ŷ = AW, where A is the feature X of the matching object expanded into matrix form and W is the coefficient of the polynomial fitting function; a regularization term is added to the polynomial fitting function Ŷ; the polynomial fitting function with the regularization term is solved by the least-squares method to find the coefficient W of the optimal polynomial fitting function;
a characteristic evaluation module: the feature evaluation module uses the function R = Y − Ŷ, formed by the difference between the fitting result of the polynomial fitting function Ŷ and the feature Y extracted by the backbone feature extraction network P, as the distance with which the features X extracted by the mixed attention mechanism network H are compared and evaluated.
2. The mixed attention mechanism-based group pedestrian re-identification system of claim 1, wherein: in the backbone model module, a backbone model feature extraction network P mainly comprises a deep convolutional neural network and outputs the preliminary features of each image s of the group of pedestrians.
3. The mixed attention mechanism-based group pedestrian re-identification system of claim 1, wherein: the feature extraction module is characterized in that:
for the channel attention module C, a global pooling operation is first performed on the feature vector; after global pooling, the feature becomes a vector whose length equals the number of channels, encoding the attention weights of the different feature channels, and the pooled feature vector is then fed through two fully connected layers;

for the spatial attention module S, the feature vector is first pooled along the channel dimension and passed through a fully connected layer to learn the network's attention weights over the length and width dimensions; finally, the spatial and channel attention results are applied to the original features as weights to obtain more discriminative features.
4. The mixed attention mechanism-based group pedestrian re-identification system of claim 1, wherein: in the feature extraction module, the channel attention parameter w1 has one dimension and the spatial attention parameter w2 has two different dimensions; the attention parameters w1 and w2 are merged to obtain a parameter w with attention weights in the h, w and c dimensions; h, w and c respectively represent the sizes of the input vector of the attention module C along its three dimensions.
5. The mixed attention mechanism-based group pedestrian re-identification system of claim 1, wherein: in the feature extraction module, the feature F is the feature of the group image obtained after the mixed attention mechanism is added, and is obtained by multiplying the feature vector E (without the mixed attention mechanism) by the overall attention weight w.
6. The mixed attention mechanism-based group pedestrian re-identification system of claim 1, wherein: in the feature learning module, the polynomial fitting function after adding the regularization term is as follows:
Figure FDA0002189461010000021
7. The mixed attention mechanism-based group pedestrian re-identification system of claim 1, wherein: in the feature fitting module, the feature X = (x1, x2, x3, ......, xd) is expanded into matrix form as:

A = [1, x1, x1², …, x1ⁿ; 1, x2, x2², …, x2ⁿ; …; 1, xd, xd², …, xdⁿ]

where d is the dimension of the feature X and n is the order of the polynomial.
8. The mixed attention mechanism-based group pedestrian re-identification system of claim 1, wherein: in the feature fitting module, the equation used to solve the regularized polynomial fitting function by the least-squares method for the coefficient W of the optimal polynomial fitting function is:

W = (AᵀA + βI)⁻¹AᵀY (4)
9. The mixed attention mechanism-based group pedestrian re-identification system of claim 1, wherein: in the feature evaluation module, the function R = Y − Ŷ is the distance with which the features processed by the mixed attention model are compared and evaluated.
10. A terminal comprising a memory, a processor, and the system of any one of claims 1 to 9 stored on the memory and executable by the processor.
CN201910827177.1A 2019-09-03 2019-09-03 Group pedestrian re-identification system and terminal based on mixed attention mechanism Pending CN110765841A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910827177.1A CN110765841A (en) 2019-09-03 2019-09-03 Group pedestrian re-identification system and terminal based on mixed attention mechanism


Publications (1)

Publication Number Publication Date
CN110765841A true CN110765841A (en) 2020-02-07

Family

ID=69330231

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910827177.1A Pending CN110765841A (en) 2019-09-03 2019-09-03 Group pedestrian re-identification system and terminal based on mixed attention mechanism

Country Status (1)

Country Link
CN (1) CN110765841A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111310633A (en) * 2020-02-10 2020-06-19 江南大学 Parallel space-time attention pedestrian re-identification method based on video
CN112200161A (en) * 2020-12-03 2021-01-08 北京电信易通信息技术股份有限公司 Face recognition detection method based on mixed attention mechanism
CN112489098A (en) * 2020-12-09 2021-03-12 福建农林大学 Image matching method based on spatial channel attention mechanism neural network
CN112733695A (en) * 2021-01-04 2021-04-30 电子科技大学 Unsupervised key frame selection method in pedestrian re-identification field
CN112733590A (en) * 2020-11-06 2021-04-30 哈尔滨理工大学 Pedestrian re-identification method based on second-order mixed attention
CN113239784A (en) * 2021-05-11 2021-08-10 广西科学院 Pedestrian re-identification system and method based on space sequence feature learning
CN113657534A (en) * 2021-08-24 2021-11-16 北京经纬恒润科技股份有限公司 Classification method and device based on attention mechanism
CN114581858A (en) * 2022-05-06 2022-06-03 中科智为科技(天津)有限公司 Method for identifying group of people with small shares and model training method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090303026A1 (en) * 2008-06-04 2009-12-10 Mando Corporation Apparatus, method for detecting critical areas and pedestrian detection apparatus using the same
CN110070073A (en) * 2019-05-07 2019-07-30 国家广播电视总局广播电视科学研究院 Pedestrian's recognition methods again of global characteristics and local feature based on attention mechanism
US20190259284A1 (en) * 2018-02-20 2019-08-22 Krishna Khadloya Pedestrian detection for vehicle driving assistance
CN110188611A (en) * 2019-04-26 2019-08-30 华中科技大学 A kind of pedestrian recognition methods and system again introducing visual attention mechanism


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
QILING XU 等: "Group Re-Identification with Hybrid Attention Model and Residual Distance", 《2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP)》 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111310633A (en) * 2020-02-10 2020-06-19 江南大学 Parallel space-time attention pedestrian re-identification method based on video
CN111310633B (en) * 2020-02-10 2023-05-05 江南大学 Parallel space-time attention pedestrian re-identification method based on video
CN112733590A (en) * 2020-11-06 2021-04-30 哈尔滨理工大学 Pedestrian re-identification method based on second-order mixed attention
CN112200161A (en) * 2020-12-03 2021-01-08 北京电信易通信息技术股份有限公司 Face recognition detection method based on mixed attention mechanism
CN112489098A (en) * 2020-12-09 2021-03-12 福建农林大学 Image matching method based on spatial channel attention mechanism neural network
CN112489098B (en) * 2020-12-09 2024-04-09 福建农林大学 Image matching method based on spatial channel attention mechanism neural network
CN112733695A (en) * 2021-01-04 2021-04-30 电子科技大学 Unsupervised key frame selection method in pedestrian re-identification field
CN113239784A (en) * 2021-05-11 2021-08-10 广西科学院 Pedestrian re-identification system and method based on space sequence feature learning
CN113657534A (en) * 2021-08-24 2021-11-16 北京经纬恒润科技股份有限公司 Classification method and device based on attention mechanism
CN114581858A (en) * 2022-05-06 2022-06-03 中科智为科技(天津)有限公司 Method for identifying group of people with small shares and model training method

Similar Documents

Publication Publication Date Title
CN110751018A (en) Group pedestrian re-identification method based on mixed attention mechanism
CN110765841A (en) Group pedestrian re-identification system and terminal based on mixed attention mechanism
Mou et al. A relation-augmented fully convolutional network for semantic segmentation in aerial scenes
Wen et al. Detection, tracking, and counting meets drones in crowds: A benchmark
Chen et al. Partition and reunion: A two-branch neural network for vehicle re-identification.
CN107153817B (en) Pedestrian re-identification data labeling method and device
CN108960141B (en) Pedestrian re-identification method based on enhanced deep convolutional neural network
CN110717411A (en) Pedestrian re-identification method based on deep layer feature fusion
CN111639564B (en) Video pedestrian re-identification method based on multi-attention heterogeneous network
Bedagkar-Gala et al. Multiple person re-identification using part based spatio-temporal color appearance model
Tang et al. Multi-modal metric learning for vehicle re-identification in traffic surveillance environment
Liu et al. Ktan: knowledge transfer adversarial network
CN110390308B (en) Video behavior identification method based on space-time confrontation generation network
CN112818790A (en) Pedestrian re-identification method based on attention mechanism and space geometric constraint
Czyżewski et al. Multi-stage video analysis framework
CN113822246A (en) Vehicle weight identification method based on global reference attention mechanism
CN115240121B (en) Joint modeling method and device for enhancing local features of pedestrians
CN113239885A (en) Face detection and recognition method and system
CN113269099B (en) Vehicle re-identification method under heterogeneous unmanned system based on graph matching
CN113326738B (en) Pedestrian target detection and re-identification method based on deep network and dictionary learning
CN114519863A (en) Human body weight recognition method, human body weight recognition apparatus, computer device, and medium
Guo et al. Multi-modal human authentication using silhouettes, gait and rgb
CN116824641B (en) Gesture classification method, device, equipment and computer storage medium
CN115393788B (en) Multi-scale monitoring pedestrian re-identification method based on global information attention enhancement
Huang et al. Whole-body detection, recognition and identification at altitude and range

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200207