CN111860368A - Pedestrian re-identification method, device, equipment and storage medium

Pedestrian re-identification method, device, equipment and storage medium

Info

Publication number
CN111860368A
CN111860368A (application CN202010724262.8A)
Authority
CN
China
Prior art keywords
feature map
vector
pedestrian
input
recognized
Prior art date
Legal status
Withdrawn
Application number
CN202010724262.8A
Other languages
Chinese (zh)
Inventor
范宝余
王立
郭振华
赵雅倩
Current Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202010724262.8A
Publication of CN111860368A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features

Abstract

The invention discloses a pedestrian re-identification method, a device, equipment and a storage medium, wherein the method comprises the following steps: acquiring a plurality of input characteristic graphs obtained by extracting pedestrian images to be recognized; fusing each input feature map to obtain a fused feature map, calculating statistics of feature maps corresponding to each channel in the fused feature map to obtain a statistic vector, learning the statistic vector based on different structures containing fully-connected layers to obtain an importance vector, and determining a weight vector corresponding to each channel in each input feature map based on the importance vector; and carrying out weighted summation on each input feature map and the corresponding weight vector to obtain a corresponding output feature map, obtaining features to be recognized based on the output feature map, comparing the features to be recognized with each standard feature in the database, and determining the pedestrians with the standard features matched with the features to be recognized as the pedestrians contained in the images of the pedestrians to be recognized. The pedestrian feature expression and the effective screening of discriminative features can be realized, and the identification accuracy is improved.

Description

Pedestrian re-identification method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of computer vision, in particular to a pedestrian re-identification method, a pedestrian re-identification device, pedestrian re-identification equipment and a storage medium.
Background
Pedestrian re-identification is a technology for determining whether a specific pedestrian appears in images or videos captured under non-overlapping fields of view. A pedestrian re-identification task is mainly divided into 2 steps: 1) acquiring pedestrian features; 2) metric learning. The pedestrian feature expression and the screening of discriminative features directly determine whether a target pedestrian can be correctly identified, and are important links of a pedestrian re-identification task.
Disclosure of Invention
The invention aims to provide a pedestrian re-identification method, a pedestrian re-identification device, pedestrian re-identification equipment and a pedestrian re-identification storage medium, which can realize effective screening of pedestrian feature expression and discriminative features and further improve identification accuracy.
In order to achieve the above purpose, the invention provides the following technical scheme:
a pedestrian re-identification method, comprising:
acquiring a plurality of input characteristic graphs which are obtained by extracting a pedestrian image to be identified and need to be input to multiple branches;
fusing each input feature map to obtain a corresponding fused feature map, calculating statistics of the feature map corresponding to each channel in the fused feature map to obtain a statistic vector containing the statistics corresponding to each channel, learning the statistic vector based on different structures containing fully-connected layers to obtain importance vectors respectively corresponding to each input feature map, and determining a weight vector corresponding to each input feature map based on the importance vectors; wherein, the weight vector corresponding to any input characteristic diagram comprises the weight of each channel in the input characteristic diagram;
and carrying out weighted summation on each input feature map and the corresponding weight vector to obtain a corresponding output feature map, obtaining features to be recognized based on the output feature map, comparing the features to be recognized with each standard feature in a database, and determining the pedestrians with the standard features matched with the features to be recognized as the pedestrians contained in the images of the pedestrians to be recognized.
Preferably, determining a weight vector corresponding to each input feature map based on the importance vector includes:
normalizing each importance vector to obtain a first weight vector corresponding to each input feature map; forming corresponding vectors by using elements at the same position in each importance vector to obtain a plurality of corresponding recombination vectors, normalizing each recombination vector, and replacing the value of the same element in each recombination vector with the value of each element in each recombination vector to obtain a second weight vector corresponding to each input feature map;
and fusing the first weight vector and the second weight vector respectively corresponding to each input feature map to obtain the weight vector respectively corresponding to each input feature map.
Preferably, the fusing the first weight vector and the second weight vector respectively corresponding to each of the input feature maps includes:
fusing the first weight vector and the second weight vector respectively corresponding to each input feature map according to the following formula:
W_i = α_i·P_i + β_i·Q_i
wherein P_i is the first weight vector of the i-th input feature map, Q_i is the second weight vector of the i-th input feature map, α_i and β_i are trainable hyper-parameters corresponding to the i-th input feature map, and W_i is the weight vector of the i-th input feature map.
Preferably, normalizing each of the importance vectors and the recombination vectors includes:
normalizing each recombination vector by using a softmax function, and normalizing each importance vector by using a sigmoid function.
Preferably, obtaining the feature to be identified based on the output feature map includes:
and processing the output characteristic graph on the basis of a Global Pooling layer, two full-connection layers and a softmax function in sequence to obtain corresponding characteristics to be recognized.
Preferably, the structure including the fully-connected layer includes a first fully-connected layer, a first activation function, a second fully-connected layer, and a second activation function in sequence, and values of parameters in different structures including the fully-connected layer are not completely the same.
Preferably, after determining that the pedestrian with the standard feature matching the feature to be recognized is a pedestrian included in the image of the pedestrian to be recognized, the method further includes:
and recording and displaying the pedestrian identification corresponding to the pedestrian contained in the pedestrian image to be recognized.
A pedestrian re-identification apparatus comprising:
an acquisition module to: acquiring a plurality of input characteristic graphs which are obtained by extracting a pedestrian image to be identified and need to be input to multiple branches;
a pre-processing module to: fusing each input feature map to obtain a corresponding fused feature map, calculating statistics of the feature map corresponding to each channel in the fused feature map to obtain a statistic vector containing the statistics corresponding to each channel, learning the statistic vector based on different structures containing fully-connected layers to obtain importance vectors respectively corresponding to each input feature map, and determining a weight vector corresponding to each input feature map based on the importance vectors; wherein, the weight vector corresponding to any input characteristic diagram comprises the weight of each channel in the input characteristic diagram;
an identification module to: and carrying out weighted summation on each input feature map and the corresponding weight vector to obtain a corresponding output feature map, obtaining features to be recognized based on the output feature map, comparing the features to be recognized with each standard feature in a database, and determining the pedestrians with the standard features matched with the features to be recognized as the pedestrians contained in the images of the pedestrians to be recognized.
A pedestrian re-identification apparatus comprising:
a memory for storing a computer program;
a processor for implementing the steps of the pedestrian re-identification method as described in any one of the above when the computer program is executed.
A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the pedestrian re-identification method of any one of the preceding claims.
The invention provides a pedestrian re-identification method, a device, equipment and a storage medium, wherein the method comprises the following steps: acquiring a plurality of input characteristic graphs which are obtained by extracting a pedestrian image to be identified and need to be input to multiple branches; fusing each input feature map to obtain a corresponding fused feature map, calculating statistics of the feature map corresponding to each channel in the fused feature map to obtain a statistic vector containing the statistics corresponding to each channel, learning the statistic vector based on different structures containing fully-connected layers to obtain importance vectors respectively corresponding to each input feature map, and determining a weight vector corresponding to each input feature map based on the importance vectors; wherein, the weight vector corresponding to any input characteristic diagram comprises the weight of each channel in the input characteristic diagram; and carrying out weighted summation on each input feature map and the corresponding weight vector to obtain a corresponding output feature map, obtaining features to be recognized based on the output feature map, comparing the features to be recognized with each standard feature in a database, and determining the pedestrians with the standard features matched with the features to be recognized as the pedestrians contained in the images of the pedestrians to be recognized. According to the technical scheme, after a plurality of input feature maps which are extracted from a pedestrian image to be recognized and need to be input to multiple branches are obtained, each input feature map is fused to obtain a fused feature map, statistics of feature maps corresponding to each channel in the fused feature maps are calculated to obtain statistic vectors consisting of the statistics, the statistic vectors are learned based on structures containing full connection layers of corresponding numbers to obtain importance vectors corresponding to the input feature maps respectively, then a weight vector corresponding to each feature map is obtained based on each importance vector, each input feature map and the corresponding weight vectors are subjected to weighted summation to obtain the feature to be recognized, and finally the feature to be recognized is compared with standard features in a database to determine the pedestrian contained in the pedestrian image; the method realizes the feature selection of the multi-branch channel level, thereby enhancing the expression capability of the channel with improved performance in the network, simultaneously inhibiting the expression capability of the channel with little influence on the final result, realizing the effective screening of pedestrian feature expression and discriminative features, and further improving the identification accuracy.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a flowchart of a pedestrian re-identification method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a basic block structure based on a novel attention model in the pedestrian re-identification method provided by the embodiment of the invention;
fig. 3 is a novel network structure diagram including a basic block structure based on a novel attention model in the pedestrian re-identification method according to the embodiment of the present invention;
fig. 4 is a schematic diagram of an implementation process of processing an input feature map in a pedestrian re-identification method according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a pedestrian re-identification apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Deep learning has achieved great success in solving problems in the computer vision field such as image classification, image segmentation and object detection, and many excellent convolutional neural network models already exist. Among them, the attention model (a class of deep learning models) is widely used in various deep learning tasks such as natural language processing, image recognition and speech recognition, and is one of the most important core technologies in deep learning; the attention model has therefore received wide attention and is an important research direction.
The visual attention mechanism is a brain signal processing mechanism specific to human vision: by rapidly scanning the global image, human vision obtains the target area that deserves attention, namely the focus of attention, and then devotes more attention resources to this area to obtain more detailed information about the target while suppressing other useless information. It is a means of rapidly screening high-value information from a large amount of information with limited attention resources, a survival mechanism formed in the long-term evolution of human beings, and it greatly improves the efficiency and accuracy of visual information processing. For networks, the attention mechanism is, colloquially, designed to focus attention on the important points and ignore other unimportant factors, where the judgment of importance depends on the network structure or application scenario.
With the continuous development of deep learning technology, network models emerge endlessly, but in order to further improve precision, researchers tend to design new network structures in the direction of deeper or wider networks. It is undeniable that as a network becomes deeper or wider, the learning capability of the model is enhanced, but the computation and parameter count of the model also increase rapidly, which is not conducive to deployment in practical applications. Meanwhile, as the number of model layers grows, a large amount of noise (i.e., many useless features) is inevitably introduced, and too many features usually not only fail to improve the capability of the network model but also confuse the classifier, thereby reducing the recognition capability of the network. Therefore, good discrimination and the maximum capability of the model can be achieved only if a limited number of discriminative features are selected, and the attention mechanism has been widely adopted because it shows a great advantage in feature selection.
In order to improve precision in neural network training, especially in a multi-branch network structure, the present application enhances the feature map channels that contribute to the final result and suppresses the channels that have little influence on the final network precision or performance.
Referring to fig. 1, a flowchart of a pedestrian re-identification method according to an embodiment of the present invention is shown, where the method includes:
s11: and acquiring a plurality of input characteristic graphs which are obtained by extracting the pedestrian image to be identified and need to be input to multiple branches.
The execution subject of the pedestrian re-identification method provided by the embodiment of the present invention can be a corresponding pedestrian re-identification device. The technical scheme provided by the present application is more suitable for multi-branch networks, such as GoogLeNet and ResNeXt. If a pedestrian image (i.e., an image containing a pedestrian) needs to be re-identified, the pedestrian image can first be processed to extract the input feature map of each branch that needs to be input to the multi-branch network, and this extraction can be implemented by the feature extraction part (also called a module) of any multi-branch network in the prior art.
S12: fusing each input feature map to obtain a corresponding fused feature map, calculating statistics of the feature map corresponding to each channel in the fused feature map to obtain a statistic vector containing the statistics corresponding to each channel, learning the statistic vector based on different structures containing full-connection layers to obtain importance vectors corresponding to each input feature map respectively, and determining a weight vector corresponding to each input feature map based on the importance vectors; wherein, the weight vector corresponding to any input feature map comprises the weight of each channel in the input feature map.
After the input feature map of each branch is obtained, the input feature maps of all branches can be fused into a total feature map, which can be referred to as a fusion feature map in the present application, so as to realize the fusion of the input feature maps of each branch; after the fused feature map is obtained, a certain statistic of the feature map corresponding to each channel in the fused feature map can be calculated, and then the statistic vector is formed by the certain statistic of the feature map corresponding to each channel, so that feature compression is realized; in order to better learn the importance of different channels, after obtaining the statistic vector, the application can establish a plurality of branches (the branches correspond to the input feature map one by one), and each branch is independent to calculate the importance degree of each channel in the input feature map, namely an attention mechanism; each branch comprises different structures containing fully-connected layers, so that the statistic vector is learned by the structure of each branch, an importance vector corresponding to each input feature map can be obtained, a weight vector corresponding to each feature map is determined based on the importance vectors, and the weight vector corresponding to each feature map comprises weights representing the importance of each channel in the feature map.
It should be noted that the same statistic is calculated for the feature map corresponding to each channel. The statistic can be any one of the mean, variance, coefficient of variation, skewness and kurtosis, and the mean is preferably adopted as the statistic to be calculated in the present application. Specifically, any statistic is calculated on the basis of the pixels in the corresponding sub-feature map. The mean describes the average magnitude of the data values, here the average of all pixel values contained in the sub-feature map, and can be calculated with formula (1); the variance reflects the fluctuation and stability of the data and can be calculated with formula (2); the coefficient of variation is the ratio of the standard deviation to the mean, a dimensionless quantity describing the relative dispersion of the data, and can be calculated with formula (3); skewness is an index describing the symmetry of the data and can be calculated with formula (4); kurtosis describes how steep the data distribution is relative to the normal distribution and can be calculated with formula (5). In the formulas, M is the number of pixels in any sub-feature map, X_i is the pixel value of the i-th pixel in the sub-feature map, X̄ is the mean of the sub-feature map, S_n is its variance, and CV, skewness and kurtosis are its coefficient of variation, skewness and kurtosis, respectively.

X̄ = (1/M) · Σ_{i=1}^{M} X_i    formula (1)

S_n = (1/M) · Σ_{i=1}^{M} (X_i − X̄)²    formula (2)

CV = √S_n / X̄    formula (3)

skewness = (1/M) · Σ_{i=1}^{M} ((X_i − X̄) / √S_n)³    formula (4)

kurtosis = (1/M) · Σ_{i=1}^{M} ((X_i − X̄) / √S_n)⁴    formula (5)
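To make formulas (1) to (5) concrete, the following minimal sketch computes the per-channel statistic vector V over the H × W pixels of each channel of a fused feature map. PyTorch and the (C, H, W) tensor layout are illustrative assumptions; the patent does not prescribe a framework.

```python
import torch

def channel_statistics(F: torch.Tensor, kind: str = "mean") -> torch.Tensor:
    """Return a C-dimensional statistic vector V for a fused feature map F of shape (C, H, W)."""
    x = F.flatten(1)                                   # (C, M), M = H * W pixels per channel
    mean = x.mean(dim=1)                               # formula (1)
    var = ((x - mean[:, None]) ** 2).mean(dim=1)       # formula (2)
    std = var.sqrt()
    eps = 1e-8                                         # numerical guard, not part of the patent
    if kind == "mean":
        return mean
    if kind == "variance":
        return var
    if kind == "cv":                                   # coefficient of variation, formula (3)
        return std / (mean + eps)
    z = (x - mean[:, None]) / (std[:, None] + eps)
    if kind == "skewness":                             # formula (4)
        return (z ** 3).mean(dim=1)
    if kind == "kurtosis":                             # formula (5)
        return (z ** 4).mean(dim=1)
    raise ValueError(f"unknown statistic: {kind}")

# The application prefers the mean: V has one entry per channel of the fused feature map.
V = channel_statistics(torch.randn(256, 24, 8), kind="mean")   # shape (256,)
```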
S13: and carrying out weighted summation on each input feature map and the corresponding weight vector to obtain a corresponding output feature map, obtaining features to be recognized based on the output feature map, comparing the features to be recognized with each standard feature in the database, and determining the pedestrians with the standard features matched with the features to be recognized as the pedestrians contained in the images of the pedestrians to be recognized.
After obtaining the weight vector of each input feature map, a corresponding weighted summation calculation may be implemented, specifically, the number of input feature maps may be represented as N, the input feature maps may be represented as b1 to bN, the corresponding weight vectors may be represented as W1 to WN, the W1 to WN are multiplied by the corresponding input feature maps b1 to bN at the pixel level according to the correspondence of the channels, respectively, channel weighted vectors wb1 to wbN are obtained, all channel weighted vectors wb1 to wbN are added, and the final result is output, which may be calculated by using equation (6):
O = Σ_{i=1}^{N} W_i ⊙ b_i    equation (6)

wherein ⊙ represents the Hadamard product, that is, the weight of each channel in W_i (the i-th weight vector) is multiplied by all elements of the corresponding channel of b_i (the i-th input feature map), and O represents the output feature map.
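As an illustration of equation (6), a hedged PyTorch sketch of the channel-wise weighting and summation might look as follows; the list-based data layout and function name are assumptions.

```python
import torch

def weighted_sum(feature_maps, weight_vectors):
    """feature_maps: list of N tensors of shape (C, H, W); weight_vectors: list of N tensors of shape (C,)."""
    O = torch.zeros_like(feature_maps[0])
    for b_i, W_i in zip(feature_maps, weight_vectors):
        O = O + W_i[:, None, None] * b_i   # channel-wise (Hadamard) weighting wb_i, then accumulation
    return O                               # output feature map O of equation (6)
```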
It should be noted that pedestrian images whose contained pedestrians are known can be processed in advance according to the method disclosed in the present application for processing the pedestrian image to be recognized, the corresponding standard features are obtained after the final weighted summation calculation, and these standard features are stored in the database; that is, each standard feature in the database is a feature obtained from a pedestrian image of a known pedestrian. The feature to be recognized is therefore compared with the standard features, and if the feature to be recognized matches a standard feature, the pedestrian images corresponding to the two features contain the same pedestrian. When determining whether the feature to be recognized matches any standard feature, the Euclidean distance between the feature to be recognized and each standard feature can be calculated, and the standard feature with the minimum Euclidean distance is regarded as the standard feature matching the feature to be recognized, so as to complete the final pedestrian retrieval task; of course, other settings according to actual needs are also within the protection scope of the present invention.
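A minimal, non-authoritative sketch of the retrieval step just described: the feature to be recognized is compared with every standard feature in the database by Euclidean distance, and the closest one is taken as the match. The identifiers and data layout are illustrative assumptions.

```python
import torch

def match_pedestrian(query_feature: torch.Tensor, gallery: torch.Tensor, pedestrian_ids):
    """query_feature: (D,); gallery: (K, D) standard features; pedestrian_ids: K identifiers."""
    distances = torch.cdist(query_feature[None, :], gallery)[0]   # (K,) Euclidean distances
    best = int(torch.argmin(distances))
    return pedestrian_ids[best], distances[best].item()           # matched pedestrian and its distance
```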
According to the technical scheme, after a plurality of input feature maps which are extracted from a pedestrian image to be recognized and need to be input to multiple branches are obtained, each input feature map is fused to obtain a fused feature map, statistics of feature maps corresponding to each channel in the fused feature maps are calculated to obtain statistic vectors consisting of the statistics, the statistic vectors are learned based on structures containing full connection layers of corresponding numbers to obtain importance vectors corresponding to the input feature maps respectively, then a weight vector corresponding to each feature map is obtained based on each importance vector, each input feature map and the corresponding weight vectors are subjected to weighted summation to obtain the feature to be recognized, and finally the feature to be recognized is compared with standard features in a database to determine the pedestrian contained in the pedestrian image; the method realizes the feature selection of the multi-branch channel level, thereby enhancing the expression capability of the channel with improved performance in the network, simultaneously inhibiting the expression capability of the channel with little influence on the final result, realizing the effective screening of pedestrian feature expression and discriminative features, and further improving the identification accuracy.
The pedestrian re-identification method provided by the embodiment of the invention determines the weight vector corresponding to each input feature map based on the importance vector, and can comprise the following steps:
normalizing each importance vector to obtain a first weight vector corresponding to each input feature map respectively; forming corresponding vectors by elements at the same position in each importance vector to obtain a plurality of corresponding recombination vectors, normalizing each recombination vector, and replacing the value of the same element in each recombination vector with the value of each element in each recombination vector to obtain a second weight vector corresponding to each input feature map;
and fusing the first weight vector and the second weight vector corresponding to each input feature map respectively to obtain the weight vector corresponding to each input feature map respectively.
It should be noted that after obtaining the importance vector, the present application proposes a bidirectional attention model, so as to simultaneously implement importance selection of channels in branches and importance selection of channels between branches; specifically, two parts, namely bidirectional channel importance screening and attention weight fusion, can be included, and the following steps are respectively included:
the first step is as follows:
in order to really realize the importance screening of the channels among the branches, the realization scheme of the invention is as follows:
for all the importance vectors, the elements at the same channel position are extracted to form a new vector vi; extracting the elements of the corresponding channel position means: given that the dimension of each importance vector is C × 1 and there are N feature maps, namely N branches, with importance vectors I1 to IN, traverse I1 to IN and take the first element of each of them (namely traverse all vectors Ii and take the value of the first element) to form a new vector v1; traverse I1 to IN and take the second element of each of them to form a new vector v2; and so on. The dimension of each vi is N × 1, where N is the number of branches;
normalizing each recombination vector vi through a softmax function to really realize the screening of the channel level of the input feature map;
replacing the value of the corresponding element in the original feature vector Ii with the value of each element in the normalized recombination vector in a one-to-one correspondence manner, namely, restoring the normalized feature to the original position to obtain new vectors Q1 to QN (second weight vector);
the second step is that:
in order to realize the normalization of the channel weight in the branch, the invention sequentially realizes the normalization of all the importance vectors through the activation function, thereby mapping the channel weight to a reasonable range [0,1], and the invention adopts the sigmoid activation function to realize the nonlinear mapping of the weight. The concrete implementation is as follows: the dimension of the given importance vector is C × 1, C represents the number of channels of the ith branch, because the output value of the importance vector is distributed from 0 to infinity after passing through the two fully-connected layers, in order to implement weight normalization, a sigmoid function is applied to all elements of the importance vector to obtain normalized intra-branch channel weight vectors P1 to PN (i.e., the first weight vector represents the importance degree of the channel corresponding to the input feature map), and normalization can be implemented by using formula (7):
P_i = sigmoid(I_i) = 1 / (1 + e^(−I_i))    formula (7)
the third step: weight fusion
The first two steps simultaneously calculate the importance weight of the channel in the branch and the importance weight of the channel between the branches, and the acquired channel weights can be fused in order to better utilize the acquired importance information of the channels.
Specifically, the fusing the first weight vector and the second weight vector respectively corresponding to each input feature map may include:
and (3) fusing the first weight vector and the second weight vector respectively corresponding to each input feature map according to the following formula (8):
W_i = α_i·P_i + β_i·Q_i    formula (8)

wherein P_i is the first weight vector of the i-th input feature map, Q_i is the second weight vector of the i-th input feature map, α_i and β_i are trainable hyper-parameters corresponding to the i-th input feature map, W_i is the weight vector of the i-th input feature map, and i denotes the i-th input feature map (since the input feature maps correspond to the branches one by one, it can also be understood as the i-th branch), where i ∈ [1, N] and N is the number of input feature maps (which can likewise be understood as the number of branches).
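Putting the three steps together, a hedged PyTorch sketch of the bidirectional weight computation might read as follows. The stacked (N, C) tensor layout and the parameter names are assumptions; the sigmoid of formula (7), the cross-branch softmax, and the fusion of formula (8) follow the description above.

```python
import torch

def bidirectional_weights(importance: torch.Tensor, alpha: torch.Tensor, beta: torch.Tensor) -> torch.Tensor:
    """importance: (N, C) stacked importance vectors I1..IN; alpha, beta: (N,) trainable hyper-parameters."""
    P = torch.sigmoid(importance)               # intra-branch normalization, formula (7)
    Q = torch.softmax(importance, dim=0)        # inter-branch softmax over the N branches, per channel
    W = alpha[:, None] * P + beta[:, None] * Q  # formula (8): one weight vector per branch
    return W                                    # (N, C)
```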
The embodiment of the invention provides a pedestrian re-identification method, which normalizes each importance vector and each recombination vector and comprises the following steps:
each recombined vector is normalized by a softmax function, and each importance vector is normalized by a sigmoid function.
The reconstruction vector can be normalized by utilizing a softmax function, and the importance vector can be normalized by utilizing a sigmoid function, so that the reconstruction vector and the importance vector can be quickly and effectively limited in a reasonable range, and the effectiveness of expressing the importance of the corresponding channel by the reconstruction vector and the importance vector is improved.
The pedestrian re-identification method provided by the embodiment of the invention obtains the features to be identified based on the output feature map, and comprises the following steps:
and processing the output characteristic graph based on the Global Pooling layer, the two full-link layers and the softmax function in sequence to obtain corresponding characteristics to be recognized.
According to the method and the device, the output characteristic diagram can be sequentially processed based on the Global Pooling layer, the two full connection layers and the softmax function, so that the characteristic to be identified, which can be compared with the standard characteristic, can be effectively and quickly obtained; of course, other settings according to actual needs are within the protection scope of the present invention. In addition, in a specific implementation manner, the step of processing the image of the pedestrian to be recognized by the present application to obtain the corresponding feature to be recognized may be encapsulated in a model, which is referred to as a novel attention model, and as shown in fig. 2, the present invention provides a basic block structure (E-block) based on the novel attention model, and it can be seen that the novel attention model provided by the present application is plug-and-play, i.e., corresponds to a multi-branch structure, and is directly inserted into the multi-branch output position, and finally a weighted feature map (i.e., an output feature map) is obtained; fig. 3 is a novel network structure diagram containing a basic block structure based on a novel attention model, which can implement image classification, and the feature to be identified extracted in the present application may be a feature output by a softmax layer, where the feature is a vector.
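A minimal sketch of such a recognition head, assuming PyTorch, illustrative layer sizes, and a ReLU between the two fully-connected layers (the activation is an assumption not fixed by the text):

```python
import torch
import torch.nn as nn

class RecognitionHead(nn.Module):
    def __init__(self, channels: int, hidden: int, num_ids: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)      # global pooling layer
        self.fc1 = nn.Linear(channels, hidden)   # first fully-connected layer
        self.fc2 = nn.Linear(hidden, num_ids)    # second fully-connected layer

    def forward(self, output_feature_map: torch.Tensor) -> torch.Tensor:
        # output_feature_map: (B, C, H, W) -> feature to be recognized: (B, num_ids)
        x = self.pool(output_feature_map).flatten(1)
        x = self.fc2(torch.relu(self.fc1(x)))
        return torch.softmax(x, dim=1)           # softmax output used as the feature vector
```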
According to the pedestrian re-identification method provided by the embodiment of the invention, the structure comprising the full connection layer can sequentially comprise the first full connection layer, the first activation function, the second full connection layer and the second activation function, and the values of parameters in different structures comprising the full connection layer are not completely the same.
When learning of the statistic vector is implemented, structures such as fully connected layer -> ReLU -> fully connected layer -> ReLU can be used, and the values of the parameters contained in each structure are not completely the same, so that effective learning of the corresponding vector is realized and the importance of the corresponding input feature map is obtained.
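As an illustration of one such structure, a single branch's importance learner might be sketched as follows; the reduction ratio r is an assumption, since the text only fixes the fully connected layer -> ReLU -> fully connected layer -> ReLU pattern and that each branch has its own parameter values.

```python
import torch.nn as nn

def importance_branch(channels: int, r: int = 16) -> nn.Sequential:
    """Maps the statistic vector V (length C) to an importance vector I_i (length C)."""
    return nn.Sequential(
        nn.Linear(channels, channels // r),
        nn.ReLU(inplace=True),
        nn.Linear(channels // r, channels),
        nn.ReLU(inplace=True),
    )
```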
The pedestrian re-identification method provided by the embodiment of the invention can further comprise the following steps after determining that the pedestrian with the standard characteristic matched with the characteristic to be identified is the pedestrian contained in the image of the pedestrian to be identified:
and recording and displaying the pedestrian identification corresponding to the pedestrian contained in the pedestrian image to be recognized.
In order to facilitate the staff and the like to acquire the information at any time, the pedestrian identification (such as a serial number) which can uniquely represent the identified pedestrian can be recorded and displayed, and of course, other settings can be performed according to actual needs, which are all within the protection scope of the invention.
For a multi-branch structure, each branch has a different function in the feature extraction stage; for example, each branch provides features of different receptive fields and can provide richer features in the fusion stage. However, as the branches increase to provide richer features, a large amount of noise is inevitably introduced; in most cases the redundant features do not improve the performance of the network and usually damage the classification capability of the classifier, so a mechanism needs to be designed to remove the redundant features and retain the features with the best discriminative power. In a specific implementation manner of the pedestrian re-identification method provided in the embodiment of the present invention, the input feature maps are illustrated by taking four branches as an example, and the implementation process may be as shown in fig. 4, including four stages: feature fusion 1, feature compression and splitting 2, feature screening 3, and feature weighting 4, wherein the dotted line part is the above novel attention model. The specific implementation may include:
1) extraction phase of multi-branch features
The feature input of the multi-branch pedestrian image (namely the input feature maps) can be obtained from any feature extraction module, and the feature extraction module of the multi-branch network is not limited; assuming that the input feature map of each branch has dimensions c × H × W, the feature of each branch is denoted by b_i.
2) Feature fusion phase
The multi-branch input feature maps input at this stage are fused, and the fusion operation can take two forms: a) all the input feature maps are added, or b) all the input feature maps are connected together in the channel dimension (concat). The fused feature (i.e., the fused feature map) can be denoted by F, whose dimension is C × H × W. When the numbers of channels of the 4 branch feature maps are the same, they can be added; when the numbers of channels are different, they can be concatenated together. The addition operation is the one drawn in fig. 4.
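A small sketch of the two fusion options, written in PyTorch as an assumption: addition when all branch feature maps share the same channel count, concatenation along the channel dimension otherwise.

```python
import torch

def fuse(feature_maps):
    """feature_maps: list of N tensors of shape (C_i, H, W)."""
    channel_counts = {fm.shape[0] for fm in feature_maps}
    if len(channel_counts) == 1:
        return torch.stack(feature_maps, dim=0).sum(dim=0)   # option a): element-wise addition
    return torch.cat(feature_maps, dim=0)                    # option b): concat on the channel dimension
```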
3) Feature compression
The fused feature map contains a large amount of information, and its dimension is C × H × W, where C represents the number of channels of the fused feature map. The attention mechanism of the present invention traverses each of the C channels, calculates the mean over the H × W feature map of the corresponding channel to represent the importance of that channel, and, after the means of the C channels are obtained, forms a vector V (the statistic vector) composed of the C channel means.
4) Feature screening phase
Firstly, establishing a plurality of branches according to the obtained V, as shown in FIG. 4;
each branch independently calculates the importance degree of the channels of its input feature map, namely an attention mechanism; each branch contains 2 fully connected layers with the specific structure fully connected layer -> ReLU -> fully connected layer -> ReLU, and this structure learns V to acquire the importance of the corresponding input feature map channels; a channel importance vector Ii can be obtained through this operation (Ii is the i-th importance vector);
secondly, the present invention provides a bidirectional attention model, whose essence is to simultaneously realize the importance selection of feature map channels within a branch and the importance selection of feature map channels between branches. To achieve this, the present invention provides a bidirectional attention mechanism comprising 2 parts, namely bidirectional channel importance screening and attention weight fusion, which are described as follows:
the first step is as follows:
in order to really realize the importance screening of the channels among the branches, the realization scheme of the invention is as follows:
elements at the same channel position are extracted from all the importance vectors to form a new vector vi, where extracting the elements of the corresponding channel position means: given that the dimension of each importance vector is C × 1 and there are N feature maps, namely N branches, with importance vectors I1 to IN, I1 to IN are traversed and the element at the i-th position is taken from each of them (namely all vectors Ii are traversed and the value of that element is taken) to form a new vector vi; the dimension of each vi is N × 1, where N is the number of branches;
normalizing each recombination vector vi through a softmax function to really realize the screening of the channel level of the input feature map;
replacing the value of the corresponding element in the original feature vector Ii with the value of each element in the normalized recombination vector in a one-to-one correspondence manner, namely, restoring the normalized feature to the original position to obtain new vectors Q1 to QN (second weight vector);
the second step is that:
in order to realize the normalization of the channel weight in the branch, the invention sequentially realizes the normalization of all the importance vectors through the activation function, thereby mapping the channel weight to a reasonable range [0,1], and the invention adopts the sigmoid activation function to realize the nonlinear mapping of the weight. The concrete implementation is as follows: the dimension of the given importance vector is C × 1, C represents the number of channels of the ith branch, because the output value of the importance vector is distributed from 0 to infinity after passing through the two fully-connected layers, in order to implement weight normalization, a sigmoid function is applied to all elements of the importance vector to obtain normalized intra-branch channel weight vectors P1 to PN (i.e. the first weight vector represents the importance degree of the channel corresponding to the input feature map), and normalization can be implemented by using formula (9):
P_i = sigmoid(I_i) = 1 / (1 + e^(−I_i))    formula (9)
the third step: weight fusion
The importance weights of the channels within a branch and between branches are calculated simultaneously in the first two steps, and the acquired channel weights can be fused in order to better utilize the acquired channel importance information. Specifically, 2 trainable hyper-parameters α_i and β_i are provided for each branch in the network, where i denotes the i-th branch, and the channel weight fusion calculation formula of each branch is:
W_i = α_i·P_i + β_i·Q_i
wherein W_i represents the finally fused channel weight feature, namely the weight vector.
5) In the feature weighting stage
Weighting the input feature maps by using the trained and screened channel importance features W1 to WN, wherein the weighting method comprises the steps of firstly, multiplying W1 to WN by corresponding input feature maps b1 to bN in a pixel level mode according to the correspondence of channels to obtain vectors wb1 to wbN after channel weighting, and secondly, adding all the weighted features of the channels and outputting a final result.
In addition, since the input feature maps correspond one to one to the branches that receive them, and also correspond one to one to the branches that process the statistic vector, the branches receiving the input feature maps and the branches processing the statistic vector likewise correspond one to one.
Therefore, in order to improve the accuracy in neural network training, especially in a multi-branch-based network structure, the method and the device enhance the characteristic diagram channels contributing to the final result, inhibit the channels with little influence on the final network accuracy or performance, and weight the channels of the characteristic diagram in the network layer, thereby improving the network accuracy.
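Putting the stages above together, a compact, non-authoritative PyTorch sketch of an E-block-like module might look as follows; layer sizes, the reduction ratio r, and the choice of addition for fusion are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class EBlock(nn.Module):
    def __init__(self, num_branches: int, channels: int, r: int = 16):
        super().__init__()
        # one fully connected -> ReLU -> fully connected -> ReLU structure per branch, independent parameters
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Linear(channels, channels // r), nn.ReLU(inplace=True),
                nn.Linear(channels // r, channels), nn.ReLU(inplace=True),
            )
            for _ in range(num_branches)
        ])
        # trainable hyper-parameters alpha_i and beta_i of the weight fusion formula
        self.alpha = nn.Parameter(torch.ones(num_branches))
        self.beta = nn.Parameter(torch.ones(num_branches))

    def forward(self, feature_maps):
        """feature_maps: list of N tensors of shape (B, C, H, W) with equal channel counts."""
        # 2)-3) feature fusion (addition variant) and feature compression into the statistic vector V
        F = torch.stack(feature_maps, dim=0).sum(dim=0)                   # (B, C, H, W)
        V = F.mean(dim=(2, 3))                                            # (B, C) channel means
        # 4) feature screening: importance vectors, bidirectional normalization, weight fusion
        I = torch.stack([branch(V) for branch in self.branches], dim=0)   # (N, B, C)
        P = torch.sigmoid(I)                                              # intra-branch weights
        Q = torch.softmax(I, dim=0)                                       # inter-branch weights
        W = self.alpha[:, None, None] * P + self.beta[:, None, None] * Q
        # 5) feature weighting: O = sum_i W_i (Hadamard) b_i
        O = torch.zeros_like(F)
        for b_i, W_i in zip(feature_maps, W):
            O = O + W_i[:, :, None, None] * b_i
        return O
```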
An embodiment of the present invention further provides a pedestrian re-identification apparatus, as shown in fig. 5, which may include:
an obtaining module 11, configured to: acquiring a plurality of input characteristic graphs which are obtained by extracting a pedestrian image to be identified and need to be input to multiple branches;
a pre-processing module 12 for: fusing each input feature map to obtain a corresponding fused feature map, calculating statistics of the feature map corresponding to each channel in the fused feature map to obtain a statistic vector containing the statistics corresponding to each channel, learning the statistic vector based on different structures containing full-connection layers to obtain importance vectors corresponding to each input feature map respectively, and determining a weight vector corresponding to each input feature map based on the importance vectors; wherein, the weight vector corresponding to any input characteristic diagram comprises the weight of each channel in the input characteristic diagram;
an identification module 13 configured to: and carrying out weighted summation on each input feature map and the corresponding weight vector to obtain a corresponding output feature map, obtaining features to be recognized based on the output feature map, comparing the features to be recognized with each standard feature in the database, and determining the pedestrians with the standard features matched with the features to be recognized as the pedestrians contained in the images of the pedestrians to be recognized.
In the pedestrian re-identification apparatus provided in the embodiment of the present invention, the preprocessing module may include:
a weight determination unit for: normalizing each importance vector to obtain a first weight vector corresponding to each input feature map respectively; forming corresponding vectors by elements at the same position in each importance vector to obtain a plurality of corresponding recombination vectors, normalizing each recombination vector, and replacing the value of the same element in each recombination vector with the value of each element in each recombination vector to obtain a second weight vector corresponding to each input feature map; and fusing the first weight vector and the second weight vector corresponding to each input feature map respectively to obtain the weight vector corresponding to each input feature map respectively.
In the pedestrian re-identification apparatus provided in the embodiment of the present invention, the weight determination unit may include:
a weight fusion subunit to: and fusing the first weight vector and the second weight vector respectively corresponding to each input feature map according to the following formula:
W_i = α_i·P_i + β_i·Q_i
wherein P_i is the first weight vector of the i-th input feature map, Q_i is the second weight vector of the i-th input feature map, α_i and β_i are trainable hyper-parameters corresponding to the i-th input feature map, and W_i is the weight vector of the i-th input feature map.
In the pedestrian re-identification apparatus provided in the embodiment of the present invention, the weight determination unit may include:
a normalizing subunit to: each recombined vector is normalized by a softmax function, and each importance vector is normalized by a sigmoid function.
The embodiment of the invention provides a pedestrian re-identification device, wherein an identification module comprises:
a feature acquisition unit configured to: and processing the output characteristic graph based on the Global Pooling layer, the two full-link layers and the softmax function in sequence to obtain corresponding characteristics to be recognized.
The pedestrian re-identification device provided by the embodiment of the invention comprises a structure comprising the full connection layer, wherein the structure comprises the first full connection layer, the first activation function, the second full connection layer and the second activation function in sequence, and the values of parameters in different structures comprising the full connection layer are not identical.
The pedestrian re-identification device provided by the embodiment of the invention can further comprise:
a record output module to: and after determining that the pedestrian with the standard characteristic matched with the characteristic to be recognized is the pedestrian contained in the image of the pedestrian to be recognized, recording and displaying the pedestrian identification corresponding to the pedestrian contained in the image of the pedestrian to be recognized.
An embodiment of the present invention further provides a pedestrian re-identification device, which may include:
a memory for storing a computer program;
a processor for implementing the steps of the pedestrian re-identification method as described above when executing the computer program.
The embodiment of the invention also provides a computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when being executed by a processor, the computer program realizes the steps of any pedestrian re-identification method.
It should be noted that for the description of the relevant parts in the pedestrian re-identification apparatus, the device and the storage medium provided in the embodiment of the present invention, reference is made to the detailed description of the corresponding parts in the pedestrian re-identification method provided in the embodiment of the present invention, and no further description is given here. In addition, parts of the above technical solutions provided in the embodiments of the present invention that are consistent with the implementation principles of the corresponding technical solutions in the prior art are not described in detail, so as to avoid redundant description.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A pedestrian re-identification method is characterized by comprising the following steps:
acquiring a plurality of input characteristic graphs which are obtained by extracting a pedestrian image to be identified and need to be input to multiple branches;
fusing each input feature map to obtain a corresponding fused feature map, calculating statistics of the feature map corresponding to each channel in the fused feature map to obtain a statistic vector containing the statistics corresponding to each channel, learning the statistic vector based on different structures containing fully-connected layers to obtain importance vectors respectively corresponding to each input feature map, and determining a weight vector corresponding to each input feature map based on the importance vectors; wherein, the weight vector corresponding to any input characteristic diagram comprises the weight of each channel in the input characteristic diagram;
and carrying out weighted summation on each input feature map and the corresponding weight vector to obtain a corresponding output feature map, obtaining features to be recognized based on the output feature map, comparing the features to be recognized with each standard feature in a database, and determining the pedestrians with the standard features matched with the features to be recognized as the pedestrians contained in the images of the pedestrians to be recognized.
2. The method of claim 1, wherein determining a weight vector for each of the input feature maps based on the importance vectors comprises:
normalizing each importance vector to obtain a first weight vector corresponding to each input feature map; forming corresponding vectors by using elements at the same position in each importance vector to obtain a plurality of corresponding recombination vectors, normalizing each recombination vector, and replacing the value of the same element in each recombination vector with the value of each element in each recombination vector to obtain a second weight vector corresponding to each input feature map;
and fusing the first weight vector and the second weight vector respectively corresponding to each input feature map to obtain the weight vector respectively corresponding to each input feature map.
3. The method of claim 2, wherein fusing the first weight vector and the second weight vector respectively corresponding to each of the input feature maps comprises:
fusing the first weight vector and the second weight vector respectively corresponding to each input feature map according to the following formula:
W_i = α_i·P_i + β_i·Q_i
wherein P_i is the first weight vector of the i-th input feature map, Q_i is the second weight vector of the i-th input feature map, α_i and β_i are trainable hyper-parameters corresponding to the i-th input feature map, and W_i is the weight vector of the i-th input feature map.
4. The method of claim 3, wherein normalizing each of the importance vectors and the rebinning vectors comprises:
normalizing each recombination vector by using a softmax function, and normalizing each importance vector by using a sigmoid function.
5. The method of claim 4, wherein obtaining the feature to be identified based on the output feature map comprises:
and processing the output feature map sequentially through a global pooling layer, two fully-connected layers and a softmax function to obtain the corresponding feature to be recognized.
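A minimal NumPy sketch of the head described in claim 5 follows: the output feature map passes through global pooling, two fully-connected layers and a softmax to yield the feature to be recognized, which is then compared with the standard features in the database. The use of average pooling, the ReLU between the two FC layers, the layer sizes, the random weights and the cosine-similarity comparison are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
C, H, W = 8, 16, 16
out_map = rng.standard_normal((C, H, W))          # output feature map after branch fusion

pooled = out_map.mean(axis=(1, 2))                # global (average) pooling -> (C,)

W1, b1 = rng.standard_normal((16, C)) * 0.1, np.zeros(16)   # first fully-connected layer
W2, b2 = rng.standard_normal((32, 16)) * 0.1, np.zeros(32)  # second fully-connected layer
z = W2 @ np.maximum(W1 @ pooled + b1, 0.0) + b2   # FC -> ReLU -> FC (ReLU assumed)

feat = np.exp(z - z.max()); feat /= feat.sum()    # softmax -> feature to be recognized

# Comparison against standard features in the database; cosine similarity assumed.
database = rng.standard_normal((5, 32))           # placeholder stored standard features
sims = database @ feat / (np.linalg.norm(database, axis=1) * np.linalg.norm(feat))
best = int(np.argmax(sims))                       # index of the matching pedestrian
```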
6. The method of claim 5, wherein each structure containing fully-connected layers comprises, in sequence, a first fully-connected layer, a first activation function, a second fully-connected layer and a second activation function, and the parameter values of the different structures containing fully-connected layers are not identical.
7. The method according to claim 6, wherein after the pedestrian whose standard feature matches the feature to be recognized is determined as the pedestrian contained in the pedestrian image to be recognized, the method further comprises:
recording and displaying the pedestrian identifier corresponding to the pedestrian contained in the pedestrian image to be recognized.
8. A pedestrian re-identification apparatus, characterized by comprising:
an acquisition module configured to: acquire a plurality of input feature maps which are extracted from a pedestrian image to be recognized and are to be input to multiple branches;
a pre-processing module configured to: fuse the input feature maps to obtain a fused feature map, calculate a statistic of the feature map of each channel in the fused feature map to obtain a statistic vector containing the statistic corresponding to each channel, learn from the statistic vector, by means of different structures containing fully-connected layers, importance vectors respectively corresponding to the input feature maps, and determine a weight vector corresponding to each input feature map based on the importance vectors; wherein the weight vector corresponding to any input feature map comprises a weight for each channel of that input feature map;
a recognition module configured to: perform weighted summation of each input feature map with its corresponding weight vector to obtain a corresponding output feature map, obtain a feature to be recognized based on the output feature map, compare the feature to be recognized with each standard feature in a database, and determine the pedestrian whose standard feature matches the feature to be recognized as the pedestrian contained in the pedestrian image to be recognized.
9. A pedestrian re-identification device, characterized by comprising:
a memory for storing a computer program;
a processor for implementing the steps of the pedestrian re-identification method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the pedestrian re-identification method according to any one of claims 1 to 7.
CN202010724262.8A 2020-07-24 2020-07-24 Pedestrian re-identification method, device, equipment and storage medium Withdrawn CN111860368A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010724262.8A CN111860368A (en) 2020-07-24 2020-07-24 Pedestrian re-identification method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010724262.8A CN111860368A (en) 2020-07-24 2020-07-24 Pedestrian re-identification method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111860368A true CN111860368A (en) 2020-10-30

Family

ID=72950857

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010724262.8A Withdrawn CN111860368A (en) 2020-07-24 2020-07-24 Pedestrian re-identification method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111860368A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112584108A (en) * 2021-03-01 2021-03-30 杭州科技职业技术学院 Line physical damage monitoring method for unmanned aerial vehicle inspection
CN112584108B (en) * 2021-03-01 2021-06-04 杭州科技职业技术学院 Line physical damage monitoring method for unmanned aerial vehicle inspection
CN113989579A (en) * 2021-10-27 2022-01-28 腾讯科技(深圳)有限公司 Image detection method, device, equipment and storage medium

Similar Documents

Publication Title
WO2021043168A1 (en) Person re-identification network training method and person re-identification method and apparatus
Shao et al. Feature learning for image classification via multiobjective genetic programming
CN111898736B (en) Efficient pedestrian re-identification method based on attribute perception
CN111259850A (en) Pedestrian re-identification method integrating random batch mask and multi-scale representation learning
CN110837846A (en) Image recognition model construction method, image recognition method and device
CN111783576A (en) Pedestrian re-identification method based on improved YOLOv3 network and feature fusion
CN105981008A (en) Learning deep face representation
CN110222718B (en) Image processing method and device
CN110390308B (en) Video behavior identification method based on space-time confrontation generation network
CN112464730B (en) Pedestrian re-identification method based on domain-independent foreground feature learning
CN112801015A (en) Multi-mode face recognition method based on attention mechanism
CN113011253B (en) Facial expression recognition method, device, equipment and storage medium based on ResNeXt network
CN113610046B (en) Behavior recognition method based on depth video linkage characteristics
CN112580458A (en) Facial expression recognition method, device, equipment and storage medium
CN111860368A (en) Pedestrian re-identification method, device, equipment and storage medium
CN115410087A (en) Transmission line foreign matter detection method based on improved YOLOv4
CN110135435B (en) Saliency detection method and device based on breadth learning system
CN111291785A (en) Target detection method, device, equipment and storage medium
CN113033587B (en) Image recognition result evaluation method and device, electronic equipment and storage medium
CN112528077A (en) Video face retrieval method and system based on video embedding
CN111881803A (en) Livestock face recognition method based on improved YOLOv3
CN114841887A (en) Image restoration quality evaluation method based on multi-level difference learning
CN115661754A (en) Pedestrian re-identification method based on dimension fusion attention
CN114202659A (en) Fine-grained image classification method based on spatial symmetry irregular local region feature extraction
CN111860374A (en) Pedestrian re-identification method, device, equipment and storage medium

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20201030