CN111860368A - Pedestrian re-identification method, device, equipment and storage medium

Pedestrian re-identification method, device, equipment and storage medium

Info

Publication number
CN111860368A
CN111860368A (application CN202010724262.8A)
Authority
CN
China
Prior art keywords
feature map
vector
pedestrian
input
recognized
Prior art date
Legal status
Withdrawn
Application number
CN202010724262.8A
Other languages
Chinese (zh)
Inventor
范宝余
王立
郭振华
赵雅倩
Current Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202010724262.8A
Publication of CN111860368A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features

Abstract

The invention discloses a pedestrian re-identification method, a device, equipment and a storage medium, wherein the method comprises the following steps: acquiring a plurality of input characteristic graphs obtained by extracting pedestrian images to be recognized; fusing each input feature map to obtain a fused feature map, calculating statistics of feature maps corresponding to each channel in the fused feature map to obtain a statistic vector, learning the statistic vector based on different structures containing fully-connected layers to obtain an importance vector, and determining a weight vector corresponding to each channel in each input feature map based on the importance vector; and carrying out weighted summation on each input feature map and the corresponding weight vector to obtain a corresponding output feature map, obtaining features to be recognized based on the output feature map, comparing the features to be recognized with each standard feature in the database, and determining the pedestrians with the standard features matched with the features to be recognized as the pedestrians contained in the images of the pedestrians to be recognized. The pedestrian feature expression and the effective screening of discriminative features can be realized, and the identification accuracy is improved.

Description

Pedestrian re-identification method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of computer vision, in particular to a pedestrian re-identification method, a pedestrian re-identification device, pedestrian re-identification equipment and a storage medium.
Background
Pedestrian re-identification is a technology for determining whether a specific pedestrian appears in images or videos captured under non-overlapping fields of view. A pedestrian re-identification task is mainly divided into 2 steps: 1) acquiring pedestrian features; 2) metric learning. The pedestrian feature expression and the screening of discriminative features directly determine whether a target pedestrian can be correctly identified, and are important links of a pedestrian re-identification task.
Disclosure of Invention
The invention aims to provide a pedestrian re-identification method, a pedestrian re-identification device, pedestrian re-identification equipment and a pedestrian re-identification storage medium, which can realize effective screening of pedestrian feature expression and discriminative features and further improve identification accuracy.
In order to achieve the above purpose, the invention provides the following technical scheme:
a pedestrian re-identification method, comprising:
acquiring a plurality of input characteristic graphs which are obtained by extracting a pedestrian image to be identified and need to be input to multiple branches;
fusing each input feature map to obtain a corresponding fused feature map, calculating statistics of the feature map corresponding to each channel in the fused feature map to obtain a statistic vector containing the statistics corresponding to each channel, learning the statistic vector based on different structures containing fully-connected layers to obtain importance vectors respectively corresponding to each input feature map, and determining a weight vector corresponding to each input feature map based on the importance vectors; wherein, the weight vector corresponding to any input characteristic diagram comprises the weight of each channel in the input characteristic diagram;
and carrying out weighted summation on each input feature map and the corresponding weight vector to obtain a corresponding output feature map, obtaining features to be recognized based on the output feature map, comparing the features to be recognized with each standard feature in a database, and determining the pedestrians with the standard features matched with the features to be recognized as the pedestrians contained in the images of the pedestrians to be recognized.
Preferably, determining a weight vector corresponding to each input feature map based on the importance vector includes:
normalizing each importance vector to obtain a first weight vector corresponding to each input feature map; forming corresponding vectors by using elements at the same position in each importance vector to obtain a plurality of corresponding recombination vectors, normalizing each recombination vector, and replacing the value of the same element in each recombination vector with the value of each element in each recombination vector to obtain a second weight vector corresponding to each input feature map;
and fusing the first weight vector and the second weight vector respectively corresponding to each input feature map to obtain the weight vector respectively corresponding to each input feature map.
Preferably, the fusing the first weight vector and the second weight vector respectively corresponding to each of the input feature maps includes:
fusing the first weight vector and the second weight vector respectively corresponding to each input feature map according to the following formula:
W_i = α_i·P_i + β_i·Q_i
wherein P_i is the first weight vector of the i-th input feature map, Q_i is the second weight vector of the i-th input feature map, α_i and β_i are trainable hyper-parameters corresponding to the i-th input feature map, and W_i is the weight vector of the i-th input feature map.
Preferably, normalizing each of the importance vectors and the recombination vectors includes:
normalizing each recombination vector by using a softmax function, and normalizing each importance vector by using a sigmoid function.
Preferably, obtaining the feature to be identified based on the output feature map includes:
and processing the output characteristic graph on the basis of a Global Pooling layer, two full-connection layers and a softmax function in sequence to obtain corresponding characteristics to be recognized.
Preferably, the structure including the fully-connected layer includes a first fully-connected layer, a first activation function, a second fully-connected layer, and a second activation function in sequence, and values of parameters in different structures including the fully-connected layer are not completely the same.
Preferably, after determining that the pedestrian with the standard feature matching the feature to be recognized is a pedestrian included in the image of the pedestrian to be recognized, the method further includes:
and recording and displaying the pedestrian identification corresponding to the pedestrian contained in the pedestrian image to be recognized.
A pedestrian re-identification apparatus comprising:
an acquisition module to: acquiring a plurality of input characteristic graphs which are obtained by extracting a pedestrian image to be identified and need to be input to multiple branches;
a pre-processing module to: fusing each input feature map to obtain a corresponding fused feature map, calculating statistics of the feature map corresponding to each channel in the fused feature map to obtain a statistic vector containing the statistics corresponding to each channel, learning the statistic vector based on different structures containing fully-connected layers to obtain importance vectors respectively corresponding to each input feature map, and determining a weight vector corresponding to each input feature map based on the importance vectors; wherein, the weight vector corresponding to any input characteristic diagram comprises the weight of each channel in the input characteristic diagram;
an identification module to: and carrying out weighted summation on each input feature map and the corresponding weight vector to obtain a corresponding output feature map, obtaining features to be recognized based on the output feature map, comparing the features to be recognized with each standard feature in a database, and determining the pedestrians with the standard features matched with the features to be recognized as the pedestrians contained in the images of the pedestrians to be recognized.
A pedestrian re-identification apparatus comprising:
a memory for storing a computer program;
a processor for implementing the steps of the pedestrian re-identification method as described in any one of the above when the computer program is executed.
A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the pedestrian re-identification method of any one of the preceding claims.
The invention provides a pedestrian re-identification method, a device, equipment and a storage medium, wherein the method comprises the following steps: acquiring a plurality of input characteristic graphs which are obtained by extracting a pedestrian image to be identified and need to be input to multiple branches; fusing each input feature map to obtain a corresponding fused feature map, calculating statistics of the feature map corresponding to each channel in the fused feature map to obtain a statistic vector containing the statistics corresponding to each channel, learning the statistic vector based on different structures containing fully-connected layers to obtain importance vectors respectively corresponding to each input feature map, and determining a weight vector corresponding to each input feature map based on the importance vectors; wherein, the weight vector corresponding to any input characteristic diagram comprises the weight of each channel in the input characteristic diagram; and carrying out weighted summation on each input feature map and the corresponding weight vector to obtain a corresponding output feature map, obtaining features to be recognized based on the output feature map, comparing the features to be recognized with each standard feature in a database, and determining the pedestrians with the standard features matched with the features to be recognized as the pedestrians contained in the images of the pedestrians to be recognized. According to the technical scheme, after a plurality of input feature maps which are extracted from a pedestrian image to be recognized and need to be input to multiple branches are obtained, each input feature map is fused to obtain a fused feature map, statistics of feature maps corresponding to each channel in the fused feature maps are calculated to obtain statistic vectors consisting of the statistics, the statistic vectors are learned based on structures containing full connection layers of corresponding numbers to obtain importance vectors corresponding to the input feature maps respectively, then a weight vector corresponding to each feature map is obtained based on each importance vector, each input feature map and the corresponding weight vectors are subjected to weighted summation to obtain the feature to be recognized, and finally the feature to be recognized is compared with standard features in a database to determine the pedestrian contained in the pedestrian image; the method realizes the feature selection of the multi-branch channel level, thereby enhancing the expression capability of the channel with improved performance in the network, simultaneously inhibiting the expression capability of the channel with little influence on the final result, realizing the effective screening of pedestrian feature expression and discriminative features, and further improving the identification accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a flowchart of a pedestrian re-identification method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a basic block structure based on a novel attention model in the pedestrian re-identification method provided by the embodiment of the invention;
fig. 3 is a novel network structure diagram including a basic block structure based on a novel attention model in the pedestrian re-identification method according to the embodiment of the present invention;
fig. 4 is a schematic diagram of an implementation process of processing an input feature map in a pedestrian re-identification method according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a pedestrian re-identification apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Deep learning has achieved great success in solving problems in the computer vision field such as image classification, image segmentation and object detection, and many excellent convolutional neural network models already exist. Among them, the attention model (a class of deep learning models) is widely used in various deep learning tasks such as natural language processing, image recognition and speech recognition, and is one of the most important core technologies in deep learning; the attention model has therefore received wide attention and is an important research direction.
The visual attention mechanism is a brain signal processing mechanism specific to human vision: by rapidly scanning the global image, human vision obtains the target area that deserves attention, namely the focus of attention, and then devotes more attention resources to this area to obtain more detailed information about the target while suppressing other useless information. It is a means of rapidly screening high-value information from a large amount of information with limited attention resources, a survival mechanism formed in the long-term evolution of human beings, and it greatly improves the efficiency and accuracy of visual information processing. For networks, the attention mechanism is, colloquially, designed to focus attention on the important points and ignore other unimportant factors, where the judgment of importance depends on the network structure or application scenario.
With the continuous development of deep learning technology, network models emerge endlessly, but in order to further improve precision, researchers tend to design new network structures in the direction of deeper or wider networks. It is undeniable that as a network becomes deeper or wider, the learning capability of the model is enhanced, but the computation and parameter count of the model also increase rapidly, which is not conducive to deployment in practical applications. Meanwhile, as the number of model layers grows, a large amount of noise (i.e., many useless features) is inevitably introduced, and too many features usually not only fail to improve the capability of the network model but also confuse the classifier, thereby reducing the recognition capability of the network. Therefore, good discrimination and the maximum capability of the model can be achieved only if a limited number of discriminative features are selected, and the attention mechanism has been widely adopted because it shows a great advantage in feature selection.
In order to improve precision in neural network training, especially in a multi-branch network structure, the present application enhances the feature map channels that contribute to the final result and suppresses the channels that have little influence on the final network precision or performance.
Referring to fig. 1, a flowchart of a pedestrian re-identification method according to an embodiment of the present invention is shown, where the method includes:
s11: and acquiring a plurality of input characteristic graphs which are obtained by extracting the pedestrian image to be identified and need to be input to multiple branches.
The execution subject of the pedestrian re-identification method provided by the embodiment of the present invention can be a corresponding pedestrian re-identification device. The technical scheme provided by the present application is more suitable for multi-branch networks, such as GoogLeNet and ResNeXt. If a pedestrian image (i.e., an image containing a pedestrian) needs to be re-identified, the pedestrian image can first be processed to extract the input feature map of each branch that needs to be input to the multi-branch network, and this extraction can be implemented by the feature extraction part (also called a module) of any multi-branch network in the prior art.
S12: fusing each input feature map to obtain a corresponding fused feature map, calculating statistics of the feature map corresponding to each channel in the fused feature map to obtain a statistic vector containing the statistics corresponding to each channel, learning the statistic vector based on different structures containing full-connection layers to obtain importance vectors corresponding to each input feature map respectively, and determining a weight vector corresponding to each input feature map based on the importance vectors; wherein, the weight vector corresponding to any input feature map comprises the weight of each channel in the input feature map.
After the input feature map of each branch is obtained, the input feature maps of all branches can be fused into a total feature map, which can be referred to as a fusion feature map in the present application, so as to realize the fusion of the input feature maps of each branch; after the fused feature map is obtained, a certain statistic of the feature map corresponding to each channel in the fused feature map can be calculated, and then the statistic vector is formed by the certain statistic of the feature map corresponding to each channel, so that feature compression is realized; in order to better learn the importance of different channels, after obtaining the statistic vector, the application can establish a plurality of branches (the branches correspond to the input feature map one by one), and each branch is independent to calculate the importance degree of each channel in the input feature map, namely an attention mechanism; each branch comprises different structures containing fully-connected layers, so that the statistic vector is learned by the structure of each branch, an importance vector corresponding to each input feature map can be obtained, a weight vector corresponding to each feature map is determined based on the importance vectors, and the weight vector corresponding to each feature map comprises weights representing the importance of each channel in the feature map.
It should be noted that the same statistic is calculated for the feature map corresponding to each channel. The statistic can be any one of the mean, variance, coefficient of variation, skewness and kurtosis, and the mean is preferably adopted as the statistic to be calculated in the present application. Specifically, any statistic is calculated on the basis of the pixels in the corresponding sub-feature map. The mean describes the average magnitude of the data values, here the average of all pixel values contained in the sub-feature map, and can be calculated with formula (1); the variance reflects the fluctuation and stability of the data and can be calculated with formula (2); the coefficient of variation is the ratio of the standard deviation to the mean, a dimensionless quantity describing the relative dispersion of the data, and can be calculated with formula (3); skewness is an index describing the symmetry of the data and can be calculated with formula (4); kurtosis describes how steep the data distribution is relative to the normal distribution and can be calculated with formula (5). In the formulas, M is the number of pixels in any sub-feature map, X_i is the pixel value of the i-th pixel in the sub-feature map, X̄ is the mean of the sub-feature map, S_n is its variance, and CV, skewness and kurtosis are its coefficient of variation, skewness and kurtosis, respectively.

X̄ = (1/M) · Σ_{i=1}^{M} X_i    formula (1)

S_n = (1/M) · Σ_{i=1}^{M} (X_i − X̄)²    formula (2)

CV = √S_n / X̄    formula (3)

skewness = (1/M) · Σ_{i=1}^{M} ((X_i − X̄) / √S_n)³    formula (4)

kurtosis = (1/M) · Σ_{i=1}^{M} ((X_i − X̄) / √S_n)⁴    formula (5)
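To make formulas (1) to (5) concrete, the following minimal sketch computes the per-channel statistic vector V over the H × W pixels of each channel of a fused feature map. PyTorch and the (C, H, W) tensor layout are illustrative assumptions; the patent does not prescribe a framework.

```python
import torch

def channel_statistics(F: torch.Tensor, kind: str = "mean") -> torch.Tensor:
    """Return a C-dimensional statistic vector V for a fused feature map F of shape (C, H, W)."""
    x = F.flatten(1)                                   # (C, M), M = H * W pixels per channel
    mean = x.mean(dim=1)                               # formula (1)
    var = ((x - mean[:, None]) ** 2).mean(dim=1)       # formula (2)
    std = var.sqrt()
    eps = 1e-8                                         # numerical guard, not part of the patent
    if kind == "mean":
        return mean
    if kind == "variance":
        return var
    if kind == "cv":                                   # coefficient of variation, formula (3)
        return std / (mean + eps)
    z = (x - mean[:, None]) / (std[:, None] + eps)
    if kind == "skewness":                             # formula (4)
        return (z ** 3).mean(dim=1)
    if kind == "kurtosis":                             # formula (5)
        return (z ** 4).mean(dim=1)
    raise ValueError(f"unknown statistic: {kind}")

# The application prefers the mean: V has one entry per channel of the fused feature map.
V = channel_statistics(torch.randn(256, 24, 8), kind="mean")   # shape (256,)
```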
S13: and carrying out weighted summation on each input feature map and the corresponding weight vector to obtain a corresponding output feature map, obtaining features to be recognized based on the output feature map, comparing the features to be recognized with each standard feature in the database, and determining the pedestrians with the standard features matched with the features to be recognized as the pedestrians contained in the images of the pedestrians to be recognized.
After obtaining the weight vector of each input feature map, a corresponding weighted summation calculation may be implemented, specifically, the number of input feature maps may be represented as N, the input feature maps may be represented as b1 to bN, the corresponding weight vectors may be represented as W1 to WN, the W1 to WN are multiplied by the corresponding input feature maps b1 to bN at the pixel level according to the correspondence of the channels, respectively, channel weighted vectors wb1 to wbN are obtained, all channel weighted vectors wb1 to wbN are added, and the final result is output, which may be calculated by using equation (6):
O = Σ_{i=1}^{N} W_i ⊙ b_i    equation (6)

wherein ⊙ represents the Hadamard product, that is, the weight of each channel in W_i (the i-th weight vector) is multiplied by all elements of the corresponding channel of b_i (the i-th input feature map), and O represents the output feature map.
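As an illustration of equation (6), a hedged PyTorch sketch of the channel-wise weighting and summation might look as follows; the list-based data layout and function name are assumptions.

```python
import torch

def weighted_sum(feature_maps, weight_vectors):
    """feature_maps: list of N tensors of shape (C, H, W); weight_vectors: list of N tensors of shape (C,)."""
    O = torch.zeros_like(feature_maps[0])
    for b_i, W_i in zip(feature_maps, weight_vectors):
        O = O + W_i[:, None, None] * b_i   # channel-wise (Hadamard) weighting wb_i, then accumulation
    return O                               # output feature map O of equation (6)
```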
It should be noted that pedestrian images whose contained pedestrians are known can be processed in advance according to the method disclosed in the present application for processing the pedestrian image to be recognized, the corresponding standard features are obtained after the final weighted summation calculation, and these standard features are stored in the database; that is, each standard feature in the database is a feature obtained from a pedestrian image of a known pedestrian. The feature to be recognized is therefore compared with the standard features, and if the feature to be recognized matches a standard feature, the pedestrian images corresponding to the two features contain the same pedestrian. When determining whether the feature to be recognized matches any standard feature, the Euclidean distance between the feature to be recognized and each standard feature can be calculated, and the standard feature with the minimum Euclidean distance is regarded as the standard feature matching the feature to be recognized, so as to complete the final pedestrian retrieval task; of course, other settings according to actual needs are also within the protection scope of the present invention.
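A minimal, non-authoritative sketch of the retrieval step just described: the feature to be recognized is compared with every standard feature in the database by Euclidean distance, and the closest one is taken as the match. The identifiers and data layout are illustrative assumptions.

```python
import torch

def match_pedestrian(query_feature: torch.Tensor, gallery: torch.Tensor, pedestrian_ids):
    """query_feature: (D,); gallery: (K, D) standard features; pedestrian_ids: K identifiers."""
    distances = torch.cdist(query_feature[None, :], gallery)[0]   # (K,) Euclidean distances
    best = int(torch.argmin(distances))
    return pedestrian_ids[best], distances[best].item()           # matched pedestrian and its distance
```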
According to the technical scheme, after a plurality of input feature maps which are extracted from a pedestrian image to be recognized and need to be input to multiple branches are obtained, each input feature map is fused to obtain a fused feature map, statistics of feature maps corresponding to each channel in the fused feature maps are calculated to obtain statistic vectors consisting of the statistics, the statistic vectors are learned based on structures containing full connection layers of corresponding numbers to obtain importance vectors corresponding to the input feature maps respectively, then a weight vector corresponding to each feature map is obtained based on each importance vector, each input feature map and the corresponding weight vectors are subjected to weighted summation to obtain the feature to be recognized, and finally the feature to be recognized is compared with standard features in a database to determine the pedestrian contained in the pedestrian image; the method realizes the feature selection of the multi-branch channel level, thereby enhancing the expression capability of the channel with improved performance in the network, simultaneously inhibiting the expression capability of the channel with little influence on the final result, realizing the effective screening of pedestrian feature expression and discriminative features, and further improving the identification accuracy.
The pedestrian re-identification method provided by the embodiment of the invention determines the weight vector corresponding to each input feature map based on the importance vector, and can comprise the following steps:
normalizing each importance vector to obtain a first weight vector corresponding to each input feature map respectively; forming corresponding vectors by elements at the same position in each importance vector to obtain a plurality of corresponding recombination vectors, normalizing each recombination vector, and replacing the value of the same element in each recombination vector with the value of each element in each recombination vector to obtain a second weight vector corresponding to each input feature map;
and fusing the first weight vector and the second weight vector corresponding to each input feature map respectively to obtain the weight vector corresponding to each input feature map respectively.
It should be noted that after obtaining the importance vector, the present application proposes a bidirectional attention model, so as to simultaneously implement importance selection of channels in branches and importance selection of channels between branches; specifically, two parts, namely bidirectional channel importance screening and attention weight fusion, can be included, and the following steps are respectively included:
the first step is as follows:
in order to really realize the importance screening of the channels among the branches, the realization scheme of the invention is as follows:
for all the importance vectors, the elements at the same channel position are extracted to form a new vector vi; extracting the elements of the corresponding channel position means: given that the dimension of each importance vector is C × 1 and there are N feature maps, namely N branches, with importance vectors I1 to IN, traverse I1 to IN and take the first element of each of them (namely traverse all vectors Ii and take the value of the first element) to form a new vector v1; traverse I1 to IN and take the second element of each of them to form a new vector v2; and so on. The dimension of each vi is N × 1, where N is the number of branches;
normalizing each recombination vector vi through a softmax function to really realize the screening of the channel level of the input feature map;
replacing the value of the corresponding element in the original feature vector Ii with the value of each element in the normalized recombination vector in a one-to-one correspondence manner, namely, restoring the normalized feature to the original position to obtain new vectors Q1 to QN (second weight vector);
the second step is that:
in order to realize the normalization of the channel weight in the branch, the invention sequentially realizes the normalization of all the importance vectors through the activation function, thereby mapping the channel weight to a reasonable range [0,1], and the invention adopts the sigmoid activation function to realize the nonlinear mapping of the weight. The concrete implementation is as follows: the dimension of the given importance vector is C × 1, C represents the number of channels of the ith branch, because the output value of the importance vector is distributed from 0 to infinity after passing through the two fully-connected layers, in order to implement weight normalization, a sigmoid function is applied to all elements of the importance vector to obtain normalized intra-branch channel weight vectors P1 to PN (i.e., the first weight vector represents the importance degree of the channel corresponding to the input feature map), and normalization can be implemented by using formula (7):
P_i = sigmoid(I_i) = 1 / (1 + e^(−I_i))    formula (7)
the third step: weight fusion
The first two steps simultaneously calculate the importance weight of the channel in the branch and the importance weight of the channel between the branches, and the acquired channel weights can be fused in order to better utilize the acquired importance information of the channels.
Specifically, the fusing the first weight vector and the second weight vector respectively corresponding to each input feature map may include:
and (3) fusing the first weight vector and the second weight vector respectively corresponding to each input feature map according to the following formula (8):
W_i = α_i·P_i + β_i·Q_i    formula (8)

wherein P_i is the first weight vector of the i-th input feature map, Q_i is the second weight vector of the i-th input feature map, α_i and β_i are trainable hyper-parameters corresponding to the i-th input feature map, W_i is the weight vector of the i-th input feature map, and i denotes the i-th input feature map (since the input feature maps correspond to the branches one by one, it can also be understood as the i-th branch), where i ∈ [1, N] and N is the number of input feature maps (which can likewise be understood as the number of branches).
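Putting the three steps together, a hedged PyTorch sketch of the bidirectional weight computation might read as follows. The stacked (N, C) tensor layout and the parameter names are assumptions; the sigmoid of formula (7), the cross-branch softmax, and the fusion of formula (8) follow the description above.

```python
import torch

def bidirectional_weights(importance: torch.Tensor, alpha: torch.Tensor, beta: torch.Tensor) -> torch.Tensor:
    """importance: (N, C) stacked importance vectors I1..IN; alpha, beta: (N,) trainable hyper-parameters."""
    P = torch.sigmoid(importance)               # intra-branch normalization, formula (7)
    Q = torch.softmax(importance, dim=0)        # inter-branch softmax over the N branches, per channel
    W = alpha[:, None] * P + beta[:, None] * Q  # formula (8): one weight vector per branch
    return W                                    # (N, C)
```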
The embodiment of the invention provides a pedestrian re-identification method, which normalizes each importance vector and each recombination vector and comprises the following steps:
each recombined vector is normalized by a softmax function, and each importance vector is normalized by a sigmoid function.
The reconstruction vector can be normalized by utilizing a softmax function, and the importance vector can be normalized by utilizing a sigmoid function, so that the reconstruction vector and the importance vector can be quickly and effectively limited in a reasonable range, and the effectiveness of expressing the importance of the corresponding channel by the reconstruction vector and the importance vector is improved.
The pedestrian re-identification method provided by the embodiment of the invention obtains the features to be identified based on the output feature map, and comprises the following steps:
and processing the output characteristic graph based on the Global Pooling layer, the two full-link layers and the softmax function in sequence to obtain corresponding characteristics to be recognized.
According to the method and the device, the output characteristic diagram can be sequentially processed based on the Global Pooling layer, the two full connection layers and the softmax function, so that the characteristic to be identified, which can be compared with the standard characteristic, can be effectively and quickly obtained; of course, other settings according to actual needs are within the protection scope of the present invention. In addition, in a specific implementation manner, the step of processing the image of the pedestrian to be recognized by the present application to obtain the corresponding feature to be recognized may be encapsulated in a model, which is referred to as a novel attention model, and as shown in fig. 2, the present invention provides a basic block structure (E-block) based on the novel attention model, and it can be seen that the novel attention model provided by the present application is plug-and-play, i.e., corresponds to a multi-branch structure, and is directly inserted into the multi-branch output position, and finally a weighted feature map (i.e., an output feature map) is obtained; fig. 3 is a novel network structure diagram containing a basic block structure based on a novel attention model, which can implement image classification, and the feature to be identified extracted in the present application may be a feature output by a softmax layer, where the feature is a vector.
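A minimal sketch of such a recognition head, assuming PyTorch, illustrative layer sizes, and a ReLU between the two fully-connected layers (the activation is an assumption not fixed by the text):

```python
import torch
import torch.nn as nn

class RecognitionHead(nn.Module):
    def __init__(self, channels: int, hidden: int, num_ids: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)      # global pooling layer
        self.fc1 = nn.Linear(channels, hidden)   # first fully-connected layer
        self.fc2 = nn.Linear(hidden, num_ids)    # second fully-connected layer

    def forward(self, output_feature_map: torch.Tensor) -> torch.Tensor:
        # output_feature_map: (B, C, H, W) -> feature to be recognized: (B, num_ids)
        x = self.pool(output_feature_map).flatten(1)
        x = self.fc2(torch.relu(self.fc1(x)))
        return torch.softmax(x, dim=1)           # softmax output used as the feature vector
```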
According to the pedestrian re-identification method provided by the embodiment of the invention, the structure comprising the full connection layer can sequentially comprise the first full connection layer, the first activation function, the second full connection layer and the second activation function, and the values of parameters in different structures comprising the full connection layer are not completely the same.
When learning of the statistic vector is implemented, structures such as fully connected layer -> ReLU -> fully connected layer -> ReLU can be used, and the values of the parameters contained in each structure are not completely the same, so that effective learning of the corresponding vector is realized and the importance of the corresponding input feature map is obtained.
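As an illustration of one such structure, a single branch's importance learner might be sketched as follows; the reduction ratio r is an assumption, since the text only fixes the fully connected layer -> ReLU -> fully connected layer -> ReLU pattern and that each branch has its own parameter values.

```python
import torch.nn as nn

def importance_branch(channels: int, r: int = 16) -> nn.Sequential:
    """Maps the statistic vector V (length C) to an importance vector I_i (length C)."""
    return nn.Sequential(
        nn.Linear(channels, channels // r),
        nn.ReLU(inplace=True),
        nn.Linear(channels // r, channels),
        nn.ReLU(inplace=True),
    )
```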
The pedestrian re-identification method provided by the embodiment of the invention can further comprise the following steps after determining that the pedestrian with the standard characteristic matched with the characteristic to be identified is the pedestrian contained in the image of the pedestrian to be identified:
and recording and displaying the pedestrian identification corresponding to the pedestrian contained in the pedestrian image to be recognized.
In order to facilitate the staff and the like to acquire the information at any time, the pedestrian identification (such as a serial number) which can uniquely represent the identified pedestrian can be recorded and displayed, and of course, other settings can be performed according to actual needs, which are all within the protection scope of the invention.
For a multi-branch structure, each branch has a different function in the feature extraction stage; for example, each branch provides features of different receptive fields and can provide richer features in the fusion stage. However, as the branches increase to provide richer features, a large amount of noise is inevitably introduced; in most cases the redundant features do not improve the performance of the network and usually damage the classification capability of the classifier, so a mechanism needs to be designed to remove the redundant features and retain the features with the best discriminative power. In a specific implementation manner of the pedestrian re-identification method provided in the embodiment of the present invention, the input feature maps are illustrated by taking four branches as an example, and the implementation process may be as shown in fig. 4, including four stages: feature fusion 1, feature compression and splitting 2, feature screening 3, and feature weighting 4, wherein the dotted line part is the above novel attention model. The specific implementation may include:
1) extraction phase of multi-branch features
The feature input of the multi-branch pedestrian image (namely the input feature maps) can be obtained from any feature extraction module, and the feature extraction module of the multi-branch network is not limited; assuming that the input feature map of each branch has dimensions c × H × W, the feature of each branch is denoted by b_i.
2) Feature fusion phase
The multi-branch input feature maps input at this stage are fused, and the fusion operation can take two forms: a) all the input feature maps are added, or b) all the input feature maps are connected together in the channel dimension (concat). The fused feature (i.e., the fused feature map) can be denoted by F, whose dimension is C × H × W. When the numbers of channels of the 4 branch feature maps are the same, they can be added; when the numbers of channels are different, they can be concatenated together. The addition operation is the one drawn in fig. 4.
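A small sketch of the two fusion options, written in PyTorch as an assumption: addition when all branch feature maps share the same channel count, concatenation along the channel dimension otherwise.

```python
import torch

def fuse(feature_maps):
    """feature_maps: list of N tensors of shape (C_i, H, W)."""
    channel_counts = {fm.shape[0] for fm in feature_maps}
    if len(channel_counts) == 1:
        return torch.stack(feature_maps, dim=0).sum(dim=0)   # option a): element-wise addition
    return torch.cat(feature_maps, dim=0)                    # option b): concat on the channel dimension
```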
3) Feature compression
The fused feature map contains a large amount of information, and its dimension is C × H × W, where C represents the number of channels of the fused feature map. The attention mechanism of the present invention traverses each of the C channels, calculates the mean over the H × W feature map of the corresponding channel to represent the importance of that channel, and, after the means of the C channels are obtained, forms a vector V (the statistic vector) composed of the C channel means.
4) Feature screening phase
Firstly, establishing a plurality of branches according to the obtained V, as shown in FIG. 4;
each branch independently calculates the importance degree of the channels of its input feature map, namely an attention mechanism; each branch contains 2 fully connected layers with the specific structure fully connected layer -> ReLU -> fully connected layer -> ReLU, and this structure learns V to acquire the importance of the corresponding input feature map channels; a channel importance vector Ii can be obtained through this operation (Ii is the i-th importance vector);
secondly, the present invention provides a bidirectional attention model, whose essence is to simultaneously realize the importance selection of feature map channels within a branch and the importance selection of feature map channels between branches. To achieve this, the present invention provides a bidirectional attention mechanism comprising 2 parts, namely bidirectional channel importance screening and attention weight fusion, which are described as follows:
the first step is as follows:
in order to really realize the importance screening of the channels among the branches, the realization scheme of the invention is as follows:
elements at the same channel position are extracted from all the importance vectors to form a new vector vi, where extracting the elements of the corresponding channel position means: given that the dimension of each importance vector is C × 1 and there are N feature maps, namely N branches, with importance vectors I1 to IN, I1 to IN are traversed and the element at the i-th position is taken from each of them (namely all vectors Ii are traversed and the value of that element is taken) to form a new vector vi; the dimension of each vi is N × 1, where N is the number of branches;
normalizing each recombination vector vi through a softmax function to really realize the screening of the channel level of the input feature map;
replacing the value of the corresponding element in the original feature vector Ii with the value of each element in the normalized recombination vector in a one-to-one correspondence manner, namely, restoring the normalized feature to the original position to obtain new vectors Q1 to QN (second weight vector);
the second step is that:
in order to realize the normalization of the channel weight in the branch, the invention sequentially realizes the normalization of all the importance vectors through the activation function, thereby mapping the channel weight to a reasonable range [0,1], and the invention adopts the sigmoid activation function to realize the nonlinear mapping of the weight. The concrete implementation is as follows: the dimension of the given importance vector is C × 1, C represents the number of channels of the ith branch, because the output value of the importance vector is distributed from 0 to infinity after passing through the two fully-connected layers, in order to implement weight normalization, a sigmoid function is applied to all elements of the importance vector to obtain normalized intra-branch channel weight vectors P1 to PN (i.e. the first weight vector represents the importance degree of the channel corresponding to the input feature map), and normalization can be implemented by using formula (9):
P_i = sigmoid(I_i) = 1 / (1 + e^(−I_i))    formula (9)
the third step: weight fusion
The importance weights of the channels within a branch and between branches are calculated simultaneously in the first two steps, and the acquired channel weights can be fused in order to better utilize the acquired channel importance information. Specifically, 2 trainable hyper-parameters α_i and β_i are provided for each branch in the network, where i denotes the i-th branch, and the channel weight fusion calculation formula of each branch is:
W_i = α_i·P_i + β_i·Q_i
wherein W_i represents the finally fused channel weight feature, namely the weight vector.
5) In the feature weighting stage
Weighting the input feature maps by using the trained and screened channel importance features W1 to WN, wherein the weighting method comprises the steps of firstly, multiplying W1 to WN by corresponding input feature maps b1 to bN in a pixel level mode according to the correspondence of channels to obtain vectors wb1 to wbN after channel weighting, and secondly, adding all the weighted features of the channels and outputting a final result.
In addition, since the input feature maps correspond one to one to the branches that receive them, and also correspond one to one to the branches that process the statistic vector, the branches receiving the input feature maps and the branches processing the statistic vector likewise correspond one to one.
Therefore, in order to improve the accuracy in neural network training, especially in a multi-branch-based network structure, the method and the device enhance the characteristic diagram channels contributing to the final result, inhibit the channels with little influence on the final network accuracy or performance, and weight the channels of the characteristic diagram in the network layer, thereby improving the network accuracy.
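Putting the stages above together, a compact, non-authoritative PyTorch sketch of an E-block-like module might look as follows; layer sizes, the reduction ratio r, and the choice of addition for fusion are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class EBlock(nn.Module):
    def __init__(self, num_branches: int, channels: int, r: int = 16):
        super().__init__()
        # one fully connected -> ReLU -> fully connected -> ReLU structure per branch, independent parameters
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Linear(channels, channels // r), nn.ReLU(inplace=True),
                nn.Linear(channels // r, channels), nn.ReLU(inplace=True),
            )
            for _ in range(num_branches)
        ])
        # trainable hyper-parameters alpha_i and beta_i of the weight fusion formula
        self.alpha = nn.Parameter(torch.ones(num_branches))
        self.beta = nn.Parameter(torch.ones(num_branches))

    def forward(self, feature_maps):
        """feature_maps: list of N tensors of shape (B, C, H, W) with equal channel counts."""
        # 2)-3) feature fusion (addition variant) and feature compression into the statistic vector V
        F = torch.stack(feature_maps, dim=0).sum(dim=0)                   # (B, C, H, W)
        V = F.mean(dim=(2, 3))                                            # (B, C) channel means
        # 4) feature screening: importance vectors, bidirectional normalization, weight fusion
        I = torch.stack([branch(V) for branch in self.branches], dim=0)   # (N, B, C)
        P = torch.sigmoid(I)                                              # intra-branch weights
        Q = torch.softmax(I, dim=0)                                       # inter-branch weights
        W = self.alpha[:, None, None] * P + self.beta[:, None, None] * Q
        # 5) feature weighting: O = sum_i W_i (Hadamard) b_i
        O = torch.zeros_like(F)
        for b_i, W_i in zip(feature_maps, W):
            O = O + W_i[:, :, None, None] * b_i
        return O
```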
An embodiment of the present invention further provides a pedestrian re-identification apparatus, as shown in fig. 5, which may include:
an obtaining module 11, configured to: acquiring a plurality of input characteristic graphs which are obtained by extracting a pedestrian image to be identified and need to be input to multiple branches;
a pre-processing module 12 for: fusing each input feature map to obtain a corresponding fused feature map, calculating statistics of the feature map corresponding to each channel in the fused feature map to obtain a statistic vector containing the statistics corresponding to each channel, learning the statistic vector based on different structures containing full-connection layers to obtain importance vectors corresponding to each input feature map respectively, and determining a weight vector corresponding to each input feature map based on the importance vectors; wherein, the weight vector corresponding to any input characteristic diagram comprises the weight of each channel in the input characteristic diagram;
an identification module 13 configured to: and carrying out weighted summation on each input feature map and the corresponding weight vector to obtain a corresponding output feature map, obtaining features to be recognized based on the output feature map, comparing the features to be recognized with each standard feature in the database, and determining the pedestrians with the standard features matched with the features to be recognized as the pedestrians contained in the images of the pedestrians to be recognized.
In the pedestrian re-identification apparatus provided in the embodiment of the present invention, the preprocessing module may include:
a weight determination unit for: normalizing each importance vector to obtain a first weight vector corresponding to each input feature map respectively; forming corresponding vectors by elements at the same position in each importance vector to obtain a plurality of corresponding recombination vectors, normalizing each recombination vector, and replacing the value of the same element in each recombination vector with the value of each element in each recombination vector to obtain a second weight vector corresponding to each input feature map; and fusing the first weight vector and the second weight vector corresponding to each input feature map respectively to obtain the weight vector corresponding to each input feature map respectively.
In the pedestrian re-identification apparatus provided in the embodiment of the present invention, the weight determination unit may include:
a weight fusion subunit to: and fusing the first weight vector and the second weight vector respectively corresponding to each input feature map according to the following formula:
W_i = α_i·P_i + β_i·Q_i
wherein P_i is the first weight vector of the i-th input feature map, Q_i is the second weight vector of the i-th input feature map, α_i and β_i are trainable hyper-parameters corresponding to the i-th input feature map, and W_i is the weight vector of the i-th input feature map.
In the pedestrian re-identification apparatus provided in the embodiment of the present invention, the weight determination unit may include:
a normalizing subunit to: each recombined vector is normalized by a softmax function, and each importance vector is normalized by a sigmoid function.
The embodiment of the invention provides a pedestrian re-identification device, wherein an identification module comprises:
a feature acquisition unit configured to: and processing the output characteristic graph based on the Global Pooling layer, the two full-link layers and the softmax function in sequence to obtain corresponding characteristics to be recognized.
The pedestrian re-identification device provided by the embodiment of the invention comprises a structure comprising the full connection layer, wherein the structure comprises the first full connection layer, the first activation function, the second full connection layer and the second activation function in sequence, and the values of parameters in different structures comprising the full connection layer are not identical.
The pedestrian re-identification device provided by the embodiment of the invention can further comprise:
a record output module to: and after determining that the pedestrian with the standard characteristic matched with the characteristic to be recognized is the pedestrian contained in the image of the pedestrian to be recognized, recording and displaying the pedestrian identification corresponding to the pedestrian contained in the image of the pedestrian to be recognized.
An embodiment of the present invention further provides a pedestrian re-identification device, which may include:
a memory for storing a computer program;
a processor for implementing the steps of the pedestrian re-identification method as described above when executing the computer program.
The embodiment of the invention also provides a computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when being executed by a processor, the computer program realizes the steps of any pedestrian re-identification method.
It should be noted that for the description of the relevant parts in the pedestrian re-identification apparatus, the device and the storage medium provided in the embodiment of the present invention, reference is made to the detailed description of the corresponding parts in the pedestrian re-identification method provided in the embodiment of the present invention, and no further description is given here. In addition, parts of the above technical solutions provided in the embodiments of the present invention that are consistent with the implementation principles of the corresponding technical solutions in the prior art are not described in detail, so as to avoid redundant description.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A pedestrian re-identification method is characterized by comprising the following steps:
acquiring a plurality of input characteristic graphs which are obtained by extracting a pedestrian image to be identified and need to be input to multiple branches;
fusing each input feature map to obtain a corresponding fused feature map, calculating statistics of the feature map corresponding to each channel in the fused feature map to obtain a statistic vector containing the statistics corresponding to each channel, learning the statistic vector based on different structures containing fully-connected layers to obtain importance vectors respectively corresponding to each input feature map, and determining a weight vector corresponding to each input feature map based on the importance vectors; wherein, the weight vector corresponding to any input characteristic diagram comprises the weight of each channel in the input characteristic diagram;
and carrying out weighted summation on each input feature map and the corresponding weight vector to obtain a corresponding output feature map, obtaining features to be recognized based on the output feature map, comparing the features to be recognized with each standard feature in a database, and determining the pedestrians with the standard features matched with the features to be recognized as the pedestrians contained in the images of the pedestrians to be recognized.
2. The method of claim 1, wherein determining a weight vector for each of the input feature maps based on the importance vectors comprises:
normalizing each importance vector to obtain a first weight vector corresponding to each input feature map; forming corresponding vectors by using elements at the same position in each importance vector to obtain a plurality of corresponding recombination vectors, normalizing each recombination vector, and replacing the value of the same element in each recombination vector with the value of each element in each recombination vector to obtain a second weight vector corresponding to each input feature map;
and fusing the first weight vector and the second weight vector respectively corresponding to each input feature map to obtain the weight vector respectively corresponding to each input feature map.
3. The method of claim 2, wherein fusing the first weight vector and the second weight vector respectively corresponding to each of the input feature maps comprises:
fusing the first weight vector and the second weight vector respectively corresponding to each input feature map according to the following formula:
W_i = α_i·P_i + β_i·Q_i
wherein P_i is the first weight vector of the i-th input feature map, Q_i is the second weight vector of the i-th input feature map, α_i and β_i are trainable hyper-parameters corresponding to the i-th input feature map, and W_i is the weight vector of the i-th input feature map.
4. The method of claim 3, wherein normalizing each of the importance vectors and the rebinning vectors comprises:
normalizing each recombination vector by using a softmax function, and normalizing each importance vector by using a sigmoid function.
5. The method of claim 4, wherein obtaining the feature to be identified based on the output feature map comprises:
and processing the output feature map sequentially through a global pooling layer, two fully-connected layers and a softmax function to obtain the corresponding feature to be recognized.
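A minimal NumPy sketch of the head described in claim 5 follows: the output feature map passes through global pooling, two fully-connected layers and a softmax to yield the feature to be recognized, which is then compared with the standard features in the database. The use of average pooling, the ReLU between the two FC layers, the layer sizes, the random weights and the cosine-similarity comparison are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
C, H, W = 8, 16, 16
out_map = rng.standard_normal((C, H, W))          # output feature map after branch fusion

pooled = out_map.mean(axis=(1, 2))                # global (average) pooling -> (C,)

W1, b1 = rng.standard_normal((16, C)) * 0.1, np.zeros(16)   # first fully-connected layer
W2, b2 = rng.standard_normal((32, 16)) * 0.1, np.zeros(32)  # second fully-connected layer
z = W2 @ np.maximum(W1 @ pooled + b1, 0.0) + b2   # FC -> ReLU -> FC (ReLU assumed)

feat = np.exp(z - z.max()); feat /= feat.sum()    # softmax -> feature to be recognized

# Comparison against standard features in the database; cosine similarity assumed.
database = rng.standard_normal((5, 32))           # placeholder stored standard features
sims = database @ feat / (np.linalg.norm(database, axis=1) * np.linalg.norm(feat))
best = int(np.argmax(sims))                       # index of the matching pedestrian
```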
6. The method of claim 5, wherein each structure containing fully-connected layers comprises, in sequence, a first fully-connected layer, a first activation function, a second fully-connected layer and a second activation function, and the parameter values of the different structures containing fully-connected layers are not identical.
7. The method according to claim 6, wherein after the pedestrian whose standard feature matches the feature to be recognized is determined as the pedestrian contained in the pedestrian image to be recognized, the method further comprises:
recording and displaying the pedestrian identifier corresponding to the pedestrian contained in the pedestrian image to be recognized.
8. A pedestrian re-identification apparatus, characterized by comprising:
an acquisition module configured to: acquire a plurality of input feature maps which are extracted from a pedestrian image to be recognized and are to be input to multiple branches;
a pre-processing module configured to: fuse the input feature maps to obtain a fused feature map, calculate a statistic of the feature map of each channel in the fused feature map to obtain a statistic vector containing the statistic corresponding to each channel, learn from the statistic vector, by means of different structures containing fully-connected layers, importance vectors respectively corresponding to the input feature maps, and determine a weight vector corresponding to each input feature map based on the importance vectors; wherein the weight vector corresponding to any input feature map comprises a weight for each channel of that input feature map;
a recognition module configured to: perform weighted summation of each input feature map with its corresponding weight vector to obtain a corresponding output feature map, obtain a feature to be recognized based on the output feature map, compare the feature to be recognized with each standard feature in a database, and determine the pedestrian whose standard feature matches the feature to be recognized as the pedestrian contained in the pedestrian image to be recognized.
9. A pedestrian re-identification device, characterized by comprising:
a memory for storing a computer program;
a processor for implementing the steps of the pedestrian re-identification method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the pedestrian re-identification method according to any one of claims 1 to 7.
CN202010724262.8A 2020-07-24 2020-07-24 Pedestrian re-identification method, device, equipment and storage medium Withdrawn CN111860368A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010724262.8A CN111860368A (en) 2020-07-24 2020-07-24 Pedestrian re-identification method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010724262.8A CN111860368A (en) 2020-07-24 2020-07-24 Pedestrian re-identification method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111860368A true CN111860368A (en) 2020-10-30

Family

ID=72950857

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010724262.8A Withdrawn CN111860368A (en) 2020-07-24 2020-07-24 Pedestrian re-identification method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111860368A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112584108A (en) * 2021-03-01 2021-03-30 杭州科技职业技术学院 Line physical damage monitoring method for unmanned aerial vehicle inspection
CN112584108B (en) * 2021-03-01 2021-06-04 杭州科技职业技术学院 Line physical damage monitoring method for unmanned aerial vehicle inspection
CN113989579A (en) * 2021-10-27 2022-01-28 腾讯科技(深圳)有限公司 Image detection method, device, equipment and storage medium

Similar Documents

Publication Title
WO2021043168A1 (en) Person re-identification network training method and person re-identification method and apparatus
Shao et al. Feature learning for image classification via multiobjective genetic programming
CN111898736B (en) Efficient pedestrian re-identification method based on attribute perception
CN111259850A (en) Pedestrian re-identification method integrating random batch mask and multi-scale representation learning
CN110837846A (en) Image recognition model construction method, image recognition method and device
CN111783576A (en) Pedestrian re-identification method based on improved YOLOv3 network and feature fusion
CN105981008A (en) Learning deep face representation
CN110222718B (en) Image processing method and device
CN110390308B (en) Video behavior identification method based on space-time confrontation generation network
CN112464730B (en) Pedestrian re-identification method based on domain-independent foreground feature learning
CN112801015A (en) Multi-mode face recognition method based on attention mechanism
CN113011253B (en) Facial expression recognition method, device, equipment and storage medium based on ResNeXt network
CN113610046B (en) Behavior recognition method based on depth video linkage characteristics
CN112580458A (en) Facial expression recognition method, device, equipment and storage medium
CN111860368A (en) Pedestrian re-identification method, device, equipment and storage medium
CN115410087A (en) Transmission line foreign matter detection method based on improved YOLOv4
CN110135435B (en) Saliency detection method and device based on breadth learning system
CN111291785A (en) Target detection method, device, equipment and storage medium
CN113033587B (en) Image recognition result evaluation method and device, electronic equipment and storage medium
CN112528077A (en) Video face retrieval method and system based on video embedding
CN111881803A (en) Livestock face recognition method based on improved YOLOv3
CN114841887A (en) Image restoration quality evaluation method based on multi-level difference learning
CN115661754A (en) Pedestrian re-identification method based on dimension fusion attention
CN114202659A (en) Fine-grained image classification method based on spatial symmetry irregular local region feature extraction
CN111860374A (en) Pedestrian re-identification method, device, equipment and storage medium

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20201030