CN112801008A - Pedestrian re-identification method and device, electronic equipment and readable storage medium - Google Patents

Pedestrian re-identification method and device, electronic equipment and readable storage medium

Publication number
CN112801008A
Authority
CN
China
Prior art keywords
pedestrian
image
identification
feature map
segmentation
Prior art date
Legal status
Granted
Application number
CN202110168058.7A
Other languages
Chinese (zh)
Other versions
CN112801008B (en)
Inventor
黄燕挺
冯子钜
叶润源
毛永雄
董帅
邹昆
Current Assignee
Zhongshan Xidao Technology Co ltd
University of Electronic Science and Technology of China Zhongshan Institute
Original Assignee
Zhongshan Xidao Technology Co ltd
University of Electronic Science and Technology of China Zhongshan Institute
Priority date
Filing date
Publication date
Application filed by Zhongshan Xidao Technology Co ltd and University of Electronic Science and Technology of China Zhongshan Institute
Priority to CN202110168058.7A
Publication of CN112801008A
Application granted
Publication of CN112801008B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a pedestrian re-identification method and device, an electronic device and a readable storage medium, and relates to the technical field of image processing. In the method, a first branch network is added to the pedestrian re-identification model to extract a pedestrian segmentation attention feature map. During pedestrian re-identification, the model can therefore pay more attention to the features of the region where the pedestrian is located and extract the features of the image that are most beneficial and most salient for re-identification. This yields a better recognition effect when the pedestrian is occluded and can effectively improve the accuracy of pedestrian re-identification.

Description

Pedestrian re-identification method and device, electronic equipment and readable storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a pedestrian re-identification method, apparatus, electronic device, and readable storage medium.
Background
Pedestrian re-identification has been a hot research topic in the field of computer vision in recent years. It identifies pedestrians by characteristics such as their clothing, posture and hairstyle, and is mainly oriented to cross-camera, cross-scene identification and retrieval of pedestrians. It has wide application prospects in fields such as video surveillance and intelligent security, and the development of pedestrian re-identification technology is of great significance for building safe cities.
Pedestrian re-identification is a very challenging computer vision task. Its goal is to retrieve the same pedestrian under different cameras; the difficulties include varying backgrounds and lighting, blurred pictures, different pedestrian poses, and occlusion by sundries.
In recent years, deep learning methods have been widely applied in many computer vision fields such as image classification and target recognition, and can achieve better performance than traditional hand-designed methods. However, because surveillance video is complex, the collected images are affected by various factors, and pedestrians may be occluded by, for example, trash cans, buildings or other pedestrians. Existing recognition methods then find it difficult to identify the same pedestrian under different cameras, so the accuracy of pedestrian re-identification is low.
Disclosure of Invention
An object of the embodiments of the present application is to provide a method, an apparatus, an electronic device and a readable storage medium for re-identifying a pedestrian, so as to solve the problem of low accuracy rate of re-identifying a pedestrian in the prior art.
In a first aspect, an embodiment of the present application provides a pedestrian re-identification method, where the method includes:
extracting image features of an image to be identified through a backbone network of a pedestrian re-identification model;
extracting a pedestrian segmentation attention feature map according to the image features through a first branch network of the pedestrian re-identification model, wherein the pedestrian segmentation attention feature map is used for marking the position of a pedestrian in the image to be identified;
extracting a global feature map of the pedestrian according to the image features through a second branch network of the pedestrian re-identification model;
fusing the pedestrian segmentation attention feature map and the global feature map through the pedestrian re-identification model to obtain a fused feature map;
and carrying out pedestrian re-identification on the basis of the fusion characteristic diagram through the pedestrian re-identification model to obtain an identification result.
In the implementation process, the first branch network is added to the pedestrian re-identification model to extract the pedestrian segmentation attention feature map. During pedestrian re-identification, the model can therefore pay more attention to the features of the region where the pedestrian is located and extract the features in the image that are most beneficial and most salient for re-identification. This yields a better recognition effect when the pedestrian is occluded and can effectively improve the accuracy of pedestrian re-identification.
Optionally, the extracting, by the first branch network of the pedestrian re-identification model, a pedestrian segmentation attention feature map according to the image feature includes:
detecting the pedestrian in the image to be identified through the first branch network according to the image characteristics to obtain a pedestrian detection frame;
segmenting the area framed by the pedestrian detection frame from the image to be identified through the first branch network to obtain a segmented image;
padding the segmented image through the first branch network to obtain a target segmentation image with the same size as the image to be identified;
and identifying pedestrians for each pixel point in the target segmentation image through the first branch network to obtain a pedestrian segmentation attention feature map.
In the implementation process, segmenting the region inside the pedestrian detection frame separates the pedestrian area from the rest of the image and eliminates the interference of background features with pedestrian identification.
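The detect/crop/pad sequence described above can be sketched as follows. This is an illustrative sketch only: `crop_and_pad`, the zero fill, and the toy 4x4 image are assumptions, not the patent's exact implementation.

```python
import numpy as np

def crop_and_pad(image, box):
    """Crop the region framed by a detection box, then zero-pad the crop
    back to the original image size so the target segmentation image
    aligns pixel-for-pixel with the image to be identified."""
    x1, y1, x2, y2 = box
    crop = image[y1:y2, x1:x2]        # the segmented image
    padded = np.zeros_like(image)     # same size as the input image
    padded[y1:y2, x1:x2] = crop       # put the crop back in place
    return padded

img = np.arange(16).reshape(4, 4)
out = crop_and_pad(img, (1, 1, 3, 3))
```

Padding back to the input size keeps the attention map spatially aligned with the feature maps of the other branch, which is what makes the later element-wise fusion possible.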
Optionally, the performing pedestrian re-identification through the pedestrian re-identification model based on the fusion feature map to obtain a recognition result includes:
segmenting the fusion feature map through the pedestrian re-identification model to obtain a plurality of feature blocks;
uniformly pooling each feature block through the pedestrian re-identification model to obtain a first local feature corresponding to each feature block;
performing dimensionality reduction processing on each first local feature through the pedestrian re-identification model by using a preset convolution kernel to obtain a corresponding second local feature;
and inputting each second local feature into a corresponding classifier in the pedestrian re-identification model to obtain an identification result output by the classifier.
In the implementation process, the fusion feature map is divided into local features for subsequent prediction, which provides finer-grained features for pedestrian re-identification and improves its accuracy.
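One common way to realize the block-splitting and uniform pooling described above is horizontal-stripe average pooling. The function below is a hedged sketch: the stripe count of 4 and the toy shapes are illustrative assumptions, not the patent's configuration.

```python
import numpy as np

def stripe_pool(fmap, num_parts=4):
    """Split a (C, H, W) fused feature map into num_parts horizontal
    stripes and average-pool each stripe into one first local feature."""
    c, h, w = fmap.shape
    assert h % num_parts == 0, "stripe height must divide H"
    stripes = fmap.reshape(c, num_parts, h // num_parts, w)
    return stripes.mean(axis=(2, 3)).T    # (num_parts, C)

fmap = np.ones((8, 12, 4))                # toy fused feature map
parts = stripe_pool(fmap, num_parts=4)
```

Each row of `parts` would then be reduced in dimension by a convolution and fed to its own classifier, as the steps above describe.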
Optionally, the inputting each second local feature into a corresponding classifier in the pedestrian re-identification model to obtain a recognition result output by the classifier includes:
connecting a plurality of second local features with the global feature map according to channel dimensions through the pedestrian re-identification model to obtain total features;
and inputting the total features into a corresponding classifier in the pedestrian re-identification model to obtain an identification result output by the classifier.
In the implementation process, inputting the obtained total features into the classifier realizes effective fusion among all input features, allows more features to be recognized, and is beneficial to improving the recognition effect.
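The channel-dimension concatenation described above can be sketched as follows; the feature dimensions and values are made up for illustration.

```python
import numpy as np

# Two "second local features" and a pooled global feature, with made-up
# dimensions; concatenating along the channel dimension yields the
# total feature that is fed to the classifier.
local_feats = [np.ones(4), np.full(4, 2.0)]
global_feat = np.full(8, 3.0)
total_feature = np.concatenate(local_feats + [global_feat])
```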
Optionally, the pedestrian segmentation attention feature map is a binary image in which a feature point with a feature value of 1 represents the position of a pedestrian, and the fusing of the pedestrian segmentation attention feature map and the global feature map by the pedestrian re-identification model to obtain a fused feature map includes:
and multiplying the pedestrian segmentation attention feature map by the global feature map through the pedestrian re-identification model to obtain a fusion feature map, so that the features belonging to the pedestrian in the global feature map can be effectively positioned.
Optionally, the pedestrian re-identification model is trained by:
inputting a training image into the pedestrian re-identification model to obtain a prediction result output by the model, wherein the training image comprises a pedestrian occlusion image;
calculating a total loss value by using a loss function according to the prediction result;
and updating the network parameters in the pedestrian re-identification model according to the total loss value.
In the implementation process, pedestrian occlusion images are added as training images during training, which can effectively improve the model's re-identification accuracy when pedestrians are occluded.
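The patent does not specify the loss function. As one hedged sketch, the total loss value could sum a softmax cross-entropy over the per-part classifiers; `softmax_ce`, the toy logits, and the 3-identity setup are assumptions.

```python
import numpy as np

def softmax_ce(logits, label):
    """Numerically stable softmax cross-entropy for a single sample."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

# Sum the classification losses of the per-part classifiers to get a
# total loss value (toy logits for a 3-identity problem).
part_logits = [np.array([2.0, 0.5, 0.1]), np.array([1.5, 1.0, 0.2])]
true_id = 0
total_loss = sum(softmax_ce(lg, true_id) for lg in part_logits)
```

The network parameters would then be updated by backpropagating this total loss, as the last training step describes.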
Optionally, the label information of the training image includes, for each pixel, a label indicating whether the pixel belongs to a pedestrian, and the method further includes:
carrying out pedestrian detection on the training image through an instance segmentation algorithm to obtain a segmentation mask image, wherein the segmentation mask image is used for marking the pixels belonging to pedestrians in the training image;
and marking the training image by using the segmentation mask image as a label.
In the implementation process, the segmentation mask image obtained by an instance segmentation algorithm serves as the label of the training image, which avoids the large amount of time consumed by manual labeling.
Optionally, the obtaining the segmentation mask image includes:
carrying out pedestrian detection on the training image through an instance segmentation algorithm to obtain at least one pedestrian detection frame;
if the at least one pedestrian detection frame comprises at least two detection frames, determining the positional relation between the center of each detection frame and the horizontal center of the training image;
determining a target detection frame according to the position relation, wherein the area framed by the target detection frame is the area where the pedestrian to be identified is located;
and obtaining a corresponding segmentation mask image according to the target detection frame.
In the implementation process, the detection frame close to the center of the image can be selected as the region where the pedestrian is located, so that the model pays more attention to the pedestrian in the middle during training and can accurately segment the pedestrian even when occluded.
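The center-box selection described above can be sketched as follows; `pick_center_box` and the box coordinates are illustrative, not from the patent.

```python
def pick_center_box(boxes, image_width):
    """Pick the detection box (x1, y1, x2, y2) whose horizontal center
    is closest to the image's horizontal center."""
    cx_img = image_width / 2.0
    return min(boxes, key=lambda b: abs((b[0] + b[2]) / 2.0 - cx_img))

# Three candidate pedestrian detection frames; the middle one wins.
boxes = [(0, 0, 20, 80), (40, 0, 60, 80), (70, 0, 95, 80)]
target = pick_center_box(boxes, image_width=100)
```

The segmentation mask would then be taken from the winning box only, so masks of passers-by at the image edges do not become labels.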
Optionally, the method further comprises:
and preprocessing the pedestrian images in the training images by a random erasing data enhancement algorithm to obtain pedestrian occlusion images, so that the number of pedestrian occlusion images can be increased.
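A minimal sketch of random-erasing preprocessing: a rectangular region is overwritten with random values to simulate occlusion. The size fractions, fill range, and single-channel image are illustrative choices, not parameters from the patent.

```python
import numpy as np

def random_erase(image, rng, min_frac=0.2, max_frac=0.4):
    """Overwrite a random rectangle with random values to simulate
    occlusion; returns a new array, leaving the input untouched."""
    out = image.copy()
    h, w = out.shape
    eh = int(h * rng.uniform(min_frac, max_frac))   # erased height
    ew = int(w * rng.uniform(min_frac, max_frac))   # erased width
    y = rng.integers(0, h - eh + 1)
    x = rng.integers(0, w - ew + 1)
    out[y:y + eh, x:x + ew] = rng.integers(1, 256, size=(eh, ew))
    return out

rng = np.random.default_rng(0)
img = np.zeros((64, 32), dtype=np.int64)            # toy grayscale image
occluded = random_erase(img, rng)
```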
In a second aspect, an embodiment of the present application provides a pedestrian re-identification apparatus, including:
the backbone feature extraction module is used for extracting the image features of the image to be identified through a backbone network of the pedestrian re-identification model;
the first branch feature extraction module is used for extracting a pedestrian segmentation attention feature map according to the image features through a first branch network of the pedestrian re-identification model, and the pedestrian segmentation attention feature map is used for marking the position of a pedestrian in the image to be identified;
the second branch feature extraction module is used for extracting a global feature map of the pedestrian according to the image features through a second branch network of the pedestrian re-identification model;
the feature fusion module is used for fusing the pedestrian segmentation attention feature map and the global feature map through the pedestrian re-identification model to obtain a fusion feature map;
and the pedestrian re-identification module is used for re-identifying the pedestrian through the pedestrian re-identification model based on the fusion characteristic diagram to obtain an identification result.
Optionally, the first branch feature extraction module is configured to:
detecting the pedestrian in the image to be identified through the first branch network according to the image characteristics to obtain a pedestrian detection frame;
segmenting the area framed by the pedestrian detection frame from the image to be identified through the first branch network to obtain a segmented image;
padding the segmented image through the first branch network to obtain a target segmentation image with the same size as the image to be identified;
and identifying pedestrians for each pixel point in the target segmentation image through the first branch network to obtain a pedestrian segmentation attention feature map.
Optionally, the pedestrian re-identification module is configured to:
segmenting the fusion feature map through the pedestrian re-identification model to obtain a plurality of feature blocks;
uniformly pooling each feature block through the pedestrian re-identification model to obtain a first local feature corresponding to each feature block;
performing dimensionality reduction processing on each first local feature through the pedestrian re-identification model by using a preset convolution kernel to obtain a corresponding second local feature;
and inputting each second local feature into a corresponding classifier in the pedestrian re-identification model to obtain an identification result output by the classifier.
Optionally, the pedestrian re-identification module is configured to connect the plurality of second local features and the global feature map according to a channel dimension through the pedestrian re-identification model to obtain a total feature; and inputting the total features into a corresponding classifier in the pedestrian re-identification model to obtain an identification result output by the classifier.
Optionally, the pedestrian segmentation attention feature map is a binary image, the feature point where the feature value is 1 represents the position of the pedestrian in the pedestrian segmentation attention feature map, and the feature fusion module is configured to multiply the pedestrian segmentation attention feature map and the global feature map by using the pedestrian re-identification model to obtain a fusion feature map.
Optionally, the apparatus further comprises:
the model training module is used for inputting a training image into the pedestrian re-identification model to obtain a prediction result output by the model, wherein the training image comprises a pedestrian occlusion image; calculating a total loss value by using a loss function according to the prediction result; and updating the network parameters in the pedestrian re-identification model according to the total loss value.
Optionally, the label information of the training image includes, for each pixel, a label indicating whether the pixel belongs to a pedestrian. The model training module is configured to perform pedestrian detection on the training image through an instance segmentation algorithm to obtain a segmentation mask image, where the segmentation mask image is used to mark the pixels belonging to pedestrians in the training image; and to mark the training image by using the segmentation mask image as a label.
Optionally, the model training module is configured to:
carrying out pedestrian detection on the training image through an instance segmentation algorithm to obtain at least one pedestrian detection frame;
if the at least one pedestrian detection frame comprises at least two detection frames, determining the positional relation between the center of each detection frame and the horizontal center of the training image;
determining a target detection frame according to the position relation, wherein the area framed by the target detection frame is the area where the pedestrian to be identified is located;
and obtaining a corresponding segmentation mask image according to the target detection frame.
Optionally, the model training module is configured to preprocess the pedestrian images in the training images by using a random erasing data enhancement algorithm to obtain pedestrian occlusion images.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, where the memory stores computer-readable instructions, and when the computer-readable instructions are executed by the processor, the steps in the method as provided in the first aspect are executed.
In a fourth aspect, embodiments of the present application provide a readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, performs the steps in the method as provided in the first aspect.
Additional features and advantages of the present application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the embodiments of the present application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a schematic structural diagram of an electronic device for performing a pedestrian re-identification method according to an embodiment of the present application;
fig. 2 is a flowchart of a pedestrian re-identification method according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a pedestrian re-identification model according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an image obtained by applying a random erasing data enhancement algorithm according to an embodiment of the present application;
fig. 5 is a block diagram of a pedestrian re-identification apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
The embodiment of the application provides a pedestrian re-identification method in which a first branch network is added to the pedestrian re-identification model to extract a pedestrian segmentation attention feature map. During pedestrian re-identification, the model can therefore pay more attention to the features of the region where the pedestrian is located and extract the features in the image that are most beneficial and most salient for re-identification. This yields a better recognition effect when the pedestrian is occluded and can effectively improve the accuracy of pedestrian re-identification.
Referring to fig. 1, fig. 1 is a schematic structural diagram of an electronic device for executing a pedestrian re-identification method according to an embodiment of the present disclosure, where the electronic device may include: at least one processor 110, such as a CPU, at least one communication interface 120, at least one memory 130, and at least one communication bus 140. Wherein the communication bus 140 is used for realizing direct connection communication of these components. The communication interface 120 of the device in the embodiment of the present application is used for performing signaling or data communication with other node devices. The memory 130 may be a high-speed RAM memory or a non-volatile memory (e.g., at least one disk memory). Memory 130 may optionally be at least one memory device located remotely from the aforementioned processor. The memory 130 stores computer readable instructions which, when executed by the processor 110, cause the electronic device to perform the method processes of fig. 2 described below. For example, the memory 130 is used for storing images, and various feature maps extracted, and the processor 110 can be used for feature extraction and pedestrian re-identification based on features.
It will be appreciated that the configuration shown in fig. 1 is merely illustrative and that the electronic device may also include more or fewer components than shown in fig. 1 or have a different configuration than shown in fig. 1. The components shown in fig. 1 may be implemented in hardware, software, or a combination thereof.
Referring to fig. 2, fig. 2 is a flowchart of a pedestrian re-identification method according to an embodiment of the present application, where the method includes the following steps:
step S110: and extracting the image characteristics of the image to be identified through a trunk network of the pedestrian re-identification model.
In some embodiments, the pedestrian re-identification model may be a neural network model, such as a convolutional neural network, a recurrent convolutional neural network, a ResNet, or a variant of these networks.
A schematic structural diagram of the pedestrian re-identification model in the embodiment of the present application is shown in fig. 3. The model includes a backbone network, a first branch network and a second branch network, where the backbone network is connected to the first branch network and the second branch network respectively. The backbone network is used to extract basic image features of an image, the first branch network is used to segment the pedestrian image and extract a pedestrian segmentation attention feature map, and the second branch network is used to extract the global feature map of a pedestrian.
The backbone network is used for extracting image features of the image to be identified. Its structure can be the convolutional layers of a convolutional neural network; of course, it can also be another network structure, such as VGG, and the convolutional layers can also be replaced by deformable convolutions.
The image to be recognized can be a series of pedestrian images acquired by a camera, or a video image and the like. After the image to be recognized is input into the pedestrian re-identification model, the image is processed by the convolutional layers in the backbone network, and image features are output.
For the convenience of calculation, the image features may be feature maps formed by means of feature descriptions or feature vectors and the like.
Step S120: and extracting a pedestrian segmentation attention feature map according to the image features through a first branch network of the pedestrian re-identification model.
To make the pedestrian re-identification model pay more attention to the pedestrian region in the image to be identified, so that the model can extract features beneficial to re-identification, a first branch network can be added to the model. It identifies the region where the pedestrian is located in the image to be identified and segments the pedestrian region from the original image, thereby avoiding the influence of background features on the re-identification result.
The pedestrian segmentation attention feature map is used for marking the position of a pedestrian. In some embodiments, the feature value of each feature point in the pedestrian segmentation attention feature map is used to characterize the probability that the feature point belongs to a pedestrian.
A feature point with a feature value greater than a preset threshold may be taken as a pixel belonging to a pedestrian, and the preset threshold may be set according to actual requirements, such as 0.5 or 0.8. For example, if the feature value of a certain feature point in the pedestrian segmentation attention feature map is 0.9, that feature point is determined to be a position where the pedestrian is located. In this way, the position of the pedestrian in the pedestrian segmentation attention feature map can be determined.
In some other embodiments, the pedestrian segmentation attention feature map may be a binary image whose feature values are 0 or 1: a feature point with value 1 corresponds to a pixel of the image to be recognized where a pedestrian is located, and a feature point with value 0 corresponds to a pixel where no pedestrian is located, which may be a background feature. In this way, the pedestrian segmentation attention feature map can be understood as a mask image.
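The two conventions above, a probability-valued map and a binary mask, are related by thresholding. A minimal sketch, using 0.5 (one of the example thresholds) and a made-up 2x2 map:

```python
import numpy as np

# A probability-valued attention map becomes the binary (mask) form by
# thresholding each feature value against the preset threshold.
prob_map = np.array([[0.9, 0.3],
                     [0.6, 0.1]])
binary_map = (prob_map > 0.5).astype(np.int64)   # 1 = pedestrian pixel
```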
In the embodiment of the application, the pedestrian segmentation attention feature map can be fused into the pedestrian global feature map as attention, which strengthens the model's extraction of features from the region where the pedestrian is located and increases the accuracy of pedestrian re-identification under occlusion.
Step S130: and extracting a global feature map of the pedestrian according to the image features through a second branch network of the pedestrian re-identification model.
The second branch network is used for extracting global features and may include a down-sampling layer, convolutional layers, a normalization layer, a dimension-reduction layer, and the like. For example, when image features are input into the second branch network, they may be down-sampled through global average pooling to obtain a feature vector, which is then reduced in dimension by the convolutional, normalization and dimension-reduction layers; the resulting reduced features are the global features, expressed in the form of a global feature map.
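The global-average-pooling step described above can be sketched as follows; the later dimension-reduction layers are omitted, and the toy (C, H, W) feature map is made up.

```python
import numpy as np

# Down-sample a (C, H, W) image feature map to a C-dimensional feature
# vector by global average pooling over the spatial dimensions.
image_features = np.arange(24, dtype=np.float64).reshape(2, 3, 4)
feature_vector = image_features.mean(axis=(1, 2))
```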
Step S140: and fusing the pedestrian segmentation attention feature map and the global feature map through the pedestrian re-identification model to obtain a fused feature map.
The attention mechanism means that after attention is applied to the global feature map, the features that need the most attention are obtained, and more attention resources are then devoted to them. Therefore, after the pedestrian segmentation attention feature map and the global feature map are fused, the feature regions needing attention are applied to the global feature map. The pedestrian re-identification model can then recognize the features of these regions, enhance the salient features in the global features and suppress meaningless ones, obtaining the features salient for pedestrian re-identification and effectively improving the recognition accuracy.
In some embodiments, if the pedestrian segmentation attention feature map is a binary image, the fusion mode may be to multiply the pedestrian segmentation attention feature map and the global feature map to obtain a fusion feature map, and the fusion feature map obtained in this way retains the global feature of the pedestrian position and removes other meaningless features except the pedestrian position.
Of course, in other embodiments, regardless of whether the pedestrian segmentation attention feature map is a binary image, the corresponding features in the pedestrian segmentation attention feature map and the global feature map may be combined by weighted summation to obtain the fused feature map.
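The two fusion modes described above can be illustrated with a minimal numpy sketch. The (C, H, W) feature layout and the mixing weight `alpha` in the weighted variant are illustrative assumptions, not values specified in this application:

```python
import numpy as np

def fuse_multiply(attn_mask, global_feat):
    # attn_mask: (H, W) binary map, global_feat: (C, H, W).
    # Broadcasting over channels zeroes the features outside the
    # pedestrian region (where the mask is 0).
    return global_feat * attn_mask[None, :, :]

def fuse_weighted(attn_map, global_feat, alpha=0.5):
    # Per-element weighted summation of the attention map and the
    # global feature map; alpha is a hypothetical mixing weight.
    return alpha * attn_map[None, :, :] + (1.0 - alpha) * global_feat
```

With a binary mask, `fuse_multiply` keeps the global features at the pedestrian's position and removes the rest, matching the behavior described above.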
Step S150: and carrying out pedestrian re-identification on the basis of the fusion characteristic diagram through the pedestrian re-identification model to obtain an identification result.
After the fused feature map is obtained, pedestrian re-identification may be performed based on it. During identification, a classifier in the pedestrian re-identification model may perform pedestrian ID classification on the fused feature map to obtain the identification result.
In the above implementation, a first branch network is added to the pedestrian re-identification model to extract the pedestrian segmentation attention feature map, so that during re-identification the model pays more attention to the features of the region where the pedestrian is located. The model can therefore extract the features in the image that are more beneficial and more salient for pedestrian re-identification, achieves a better identification effect when the pedestrian is occluded, and effectively improves the accuracy of pedestrian re-identification.
In some embodiments, there may be multiple pedestrians in the image to be recognized, some of whom are occluded. In this case, the pedestrian segmentation attention feature map may be obtained as follows: the first branch network first detects pedestrians in the image to be recognized according to the image features to obtain pedestrian detection frames; a region framed by a pedestrian detection frame is then segmented from the image to be recognized through the first branch network to obtain a segmented image; the segmented image is next padded through the first branch network to obtain a target segmented image of the same size as the image to be recognized; and finally, pedestrian recognition is performed on each pixel of the target segmented image to obtain the pedestrian segmentation attention feature map.
The region framed by a pedestrian detection frame is the region where a pedestrian is located; if multiple pedestrians exist in the image to be recognized, multiple pedestrian detection frames are obtained, and each detection frame is segmented separately. Since a segmented image may be smaller than the image to be recognized, it may be padded to the same size as the image to be recognized to facilitate the subsequent fusion with the global feature map.
After the target segmented image is obtained, pedestrian recognition can be performed on each of its pixels to obtain the pedestrian segmentation attention feature map.
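The crop-and-pad step can be sketched as follows, assuming a hypothetical `(x1, y1, x2, y2)` pixel-coordinate box convention; the region framed by the detection box is kept, and everything else is filled with a background value so the result has the same size as the input:

```python
import numpy as np

def crop_and_pad(image, box, fill=0):
    # Keep only the region framed by the pedestrian detection box and
    # paste it onto a background-filled canvas of the original size,
    # so the target segmented image stays aligned with the global
    # feature map. box = (x1, y1, x2, y2), an assumed convention.
    x1, y1, x2, y2 = box
    canvas = np.full_like(image, fill)
    canvas[y1:y2, x1:x2] = image[y1:y2, x1:x2]
    return canvas
```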
In addition, when there are multiple detection frames and two of them overlap, care must be taken during segmentation not to segment the wrong pedestrian. For example, suppose detection frame 1 corresponds to pedestrian 1 and detection frame 2 corresponds to pedestrian 2; when pedestrian 1 is partially occluded by pedestrian 2, there may be an overlapping area between the two frames, and pedestrian 2 could mistakenly be segmented along with pedestrian 1. To avoid this, when segmenting detection frame 1, the overlapping area between detection frame 1 and detection frame 2 may be filled with a background color; that is, the overlapping area is treated as a background area rather than a pedestrian area. During pedestrian recognition within detection frame 1, the overlapping area is then not recognized as a pedestrian area, which prevents pixels of pedestrian 2 from being mistakenly recognized as pixels of pedestrian 1.
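The overlap-filling step above can be sketched as follows. The `(x1, y1, x2, y2)` box convention and the assumption that the crop exactly covers box 1 are illustrative:

```python
import numpy as np

def mask_overlap(crop, box1, box2, background=0):
    # Inside the crop of detection box 1, overwrite the area that
    # overlaps detection box 2 with a background value, so pedestrian
    # 2's pixels are not segmented as pedestrian 1. Boxes are
    # (x1, y1, x2, y2) in full-image coordinates; crop covers box1.
    ox1 = max(box1[0], box2[0]); oy1 = max(box1[1], box2[1])
    ox2 = min(box1[2], box2[2]); oy2 = min(box1[3], box2[3])
    out = crop.copy()
    if ox1 < ox2 and oy1 < oy2:  # the boxes actually overlap
        # translate the overlap rectangle into box1-local coordinates
        out[oy1 - box1[1]:oy2 - box1[1], ox1 - box1[0]:ox2 - box1[0]] = background
    return out
```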
In this implementation, segmenting within the pedestrian detection frame isolates the pedestrian region and eliminates the interference of background features with pedestrian identification.
In some embodiments, to improve the accuracy of pedestrian re-identification, during re-identification the fused feature map may be partitioned by the pedestrian re-identification model into a plurality of feature blocks; each feature block is then uniformly pooled by the model to obtain a first local feature for that block; each first local feature is reduced in dimensionality using a preset convolution kernel to obtain a corresponding second local feature; and each second local feature is input into a corresponding classifier in the pedestrian re-identification model to obtain the identification result output by the classifier.
When the fused feature map is partitioned, it can be divided uniformly into a plurality of feature blocks in the horizontal direction, for example into 6 feature blocks. Each feature block is uniformly pooled to obtain a plurality of first local features, expressed in the form of feature vectors. For the dimensionality reduction, the preset convolution kernel may be a 1 × 1 convolution kernel, although kernels of other sizes may be chosen according to actual requirements; applying it to a first local feature yields the corresponding second local feature, which is also expressed as a feature vector. During identification, each second local feature may be input into a classifier composed of a fully connected layer and a Softmax function; each second local feature has its own classifier (i.e., the classifiers are not shared), and each classifier outputs an identification result.
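The partition-and-pool step can be sketched in numpy. The (C, H, W) layout, divisibility of H by the number of parts, and the treatment of a 1 × 1 convolution on a pooled vector as a plain matrix multiply are simplifying assumptions:

```python
import numpy as np

def split_and_pool(fused, num_parts=6):
    # Uniformly split a (C, H, W) fused feature map into num_parts
    # horizontal stripes and average-pool each stripe into a first
    # local feature vector of length C.
    C, H, W = fused.shape
    stripes = fused.reshape(C, num_parts, H // num_parts, W)
    return stripes.mean(axis=(2, 3)).T  # shape (num_parts, C)

def reduce_dim(local_feats, weight):
    # On an already-pooled vector, a 1x1 convolution reduces to a
    # matrix multiply; weight has shape (C_out, C_in). The result is
    # the set of second local features.
    return local_feats @ weight.T
```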
In the implementation process, the fusion feature map is divided into local features for later prediction, so that finer-grained features can be provided for pedestrian re-identification, and the accuracy of pedestrian re-identification is improved.
In some embodiments, when the second local features are input into the classifier for classification and identification, in order to improve the accuracy of the classification, the plurality of second local features may be concatenated with the global feature map along the channel dimension to obtain a total feature, which is then input into the classifier to obtain the identification result output by the classifier.
Concatenating the plurality of second local features with the global feature map along the channel dimension and inputting the resulting total feature into the classifier effectively fuses all the input features, allows more features to be used for identification, and is conducive to improving the identification effect.
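Once the spatial dimensions have been pooled away, concatenation along the channel dimension is just a vector concatenation, as this minimal sketch shows (the flattening convention is an assumption):

```python
import numpy as np

def total_feature(second_locals, global_vec):
    # second_locals: list of 1-D second local feature vectors;
    # global_vec: 1-D globally pooled feature vector. Channel-wise
    # concatenation of pooled features is a plain concatenation.
    return np.concatenate(second_locals + [global_vec])
```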
In some embodiments, in order to improve the recognition accuracy of the pedestrian re-recognition model, the pedestrian re-recognition model may be trained in the following manner:
inputting a training image into the pedestrian re-recognition model to obtain a prediction result output by the pedestrian re-recognition model, wherein the training image comprises a pedestrian shielding image, then calculating a total loss value by using a loss function according to the prediction result, and updating a network parameter in the pedestrian re-recognition model according to the total loss value.
The training images are taken from a plurality of data sets, including Market-1501, DukeMTMC-reID, OCHuman, a custom data set, and the like. The training sets of Market-1501 and DukeMTMC-reID are small: the Market-1501 training set has only 12,936 images of 751 persons, and the DukeMTMC-reID training set has only 16,522 images of 702 persons. Therefore, to form a larger-scale data set, pseudo-labeled data from field videos can be added, which makes the trained pedestrian re-identification model more accurate. In addition, the resolution of the Market-1501 and DukeMTMC-reID images is rather low, possibly because they were shot by distant cameras, so data sets or images in which persons appear larger and at higher resolution can also be added.
The Market-1501 data set contains 1,501 identities collected by 6 cameras, with 32,668 pedestrian images in total; it is divided into a training set and a test set, the training set containing 12,936 images of 751 identities and the test set containing 3,368 query images and 15,913 gallery images covering 750 identities. The DukeMTMC-reID data set contains 1,404 identities, each captured by more than 2 cameras, with 36,411 images in total; its training set contains 16,522 images of 702 identities, and its test set contains the other 702 identities.
In addition, pedestrian videos from an existing real-world scene can be processed with a ResNet-50 pedestrian detection model pre-trained on the COCO data set to cut out a number of human-body images, and images with high similarity are then screened with a general-purpose pedestrian re-identification model. Because some of the images are cut from consecutive video frames, the human bodies differ very little, and even two adjacent frames may be hard for the model to distinguish; in such cases the poorly distinguished images can be screened out manually, and the retained images finally serve as a custom data set.
It should be understood that the data flow inside the pedestrian re-identification model during training is similar to the re-identification process in the embodiments above, and for brevity it is not repeated here. During pedestrian re-identification, both the global feature map and the local features are input into classifiers, so the total loss may be the sum of the loss predicted from the global feature map and the weighted losses predicted from the local features, calculated as follows:
ID_Loss = Loss_global + λ · Σ_{i=1}^{n} Loss_i;

wherein ID_Loss represents the total loss, Loss_global represents the loss predicted from the global features, n represents the number of second local features, λ represents the weight in the weighting, and Loss_i represents the loss corresponding to the i-th second local feature.
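The weighted combination of the global and local losses can be written as a one-line helper. The placement of the weight λ on the summed local losses is inferred from the variable descriptions above:

```python
def total_id_loss(loss_global, local_losses, lam):
    # ID_Loss = Loss_global + lam * sum of the n local losses,
    # following the weighted combination described in the text.
    return loss_global + lam * sum(local_losses)
```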
The weight λ can be obtained by taking, for each second local feature, the dot product of its corresponding mask and the local feature, then summing and averaging; the calculation formula is:
λ = Avg(Σ P_mask · P_i);

wherein λ represents the weight, P_mask represents the mask corresponding to a second local feature, and P_i represents the i-th second local feature.
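A minimal sketch of this computation, assuming the masks and local features are vectors of matching length and that the average is taken over the n local features:

```python
import numpy as np

def local_weight(masks, local_feats):
    # lambda = Avg(sum(P_mask . P_i)): dot each second local feature
    # with its corresponding mask, then average over the n features.
    dots = [float(np.dot(m, p)) for m, p in zip(masks, local_feats)]
    return sum(dots) / len(dots)
```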
After the total loss is calculated, it is propagated back through the model, and the network parameters of the pedestrian re-identification model (including the parameters of the trunk network, the first branch network, and the second branch network) are updated in the direction that reduces the total loss.
The loss function for calculating the total loss can be a triplet loss function, which effectively shortens the intra-class distance and lengthens the inter-class distance. Of course, other loss functions may also be used, such as a quadruplet loss function. The loss corresponding to the global features can be calculated with a cross-entropy loss function or a triplet loss function, as can the loss corresponding to the local features.
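The standard triplet loss with Euclidean distance can be sketched as follows; the margin value of 0.3 is a commonly used default, not one specified in this application:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    # Pulls the positive (same identity) closer to the anchor than
    # the negative (different identity) by at least `margin`, which
    # shortens intra-class distance and lengthens inter-class distance.
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return max(0.0, d_ap - d_an + margin)
```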
Model training is finished when the model converges or the preset number of iterations is reached.
In this implementation, pedestrian occlusion images are added as training images, which effectively improves the model's re-identification accuracy when pedestrians are occluded.
In some embodiments, to enable segmentation of pedestrians, the label information of a training image includes a per-pixel label of whether the pixel belongs to a pedestrian. Since the label information required for pedestrian segmentation is at the pixel level, manual labeling would consume a large amount of time; an instance segmentation algorithm can therefore be used to generate the labels automatically.
The specific way of obtaining the segmentation mask image with an instance segmentation algorithm may follow related implementations in the prior art and is not described in detail here.
In some embodiments, to ensure segmentation accuracy, pedestrian detection is performed on a training image with an instance segmentation algorithm to obtain at least one pedestrian detection frame. If at least two pedestrian detection frames are obtained, the positional relationship between the center of each detection frame and the horizontal center of the training image is determined; a target detection frame is then chosen according to this positional relationship, the region framed by the target detection frame being the region where the pedestrian to be identified is located; and the corresponding segmentation mask image is obtained from the target detection frame.
Understandably, most pedestrian-occluded images contain two persons, so the detection frame should not simply be the largest one, but the one closest to the horizontal center of the image; if the horizontal centers of two detection frames are both close to the image center, the one whose vertical center is higher is selected, thereby determining the final target detection frame. Each pixel within the target detection frame can then be identified to determine which pixels belong to the pedestrian, yielding the segmentation mask image; in this way the pixels belonging to each pedestrian are accurately identified and the accuracy of the labels is improved.
In this implementation, the detection frame close to the center of the image is selected as the region where the pedestrian is located, so that during training the model pays more attention to the pedestrian in the middle and can segment the pedestrian accurately even when the pedestrian is occluded.
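The selection rule above can be sketched as follows. The `(x1, y1, x2, y2)` box convention is an assumption; exact ties on horizontal distance are broken by the higher (smaller y) vertical center via tuple ordering:

```python
def pick_target_box(boxes, image_width):
    # Choose the detection box whose horizontal center is closest to
    # the horizontal center of the image; on a tie, the box with the
    # higher vertical center (smaller cy) wins.
    cx_img = image_width / 2.0
    def key(box):
        x1, y1, x2, y2 = box
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        return (abs(cx - cx_img), cy)
    return min(boxes, key=key)
```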
In some embodiments, to expand the set of pedestrian occlusion images, the pedestrian images in the training set may be preprocessed with a random erasure data enhancement algorithm to obtain pedestrian occlusion images.
The random erasure data enhancement algorithm randomly selects a region in a training image and adds random normally distributed noise to it (a schematic diagram is shown in Fig. 4), which reduces overfitting of the model and improves its performance. The specific implementation of the random erasure algorithm may follow related implementations in the prior art and, for brevity, is not described further here.
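A minimal grayscale sketch of random erasure, assuming a fixed square erase region covering roughly `area_frac` of the image (the usual variant also randomizes the aspect ratio):

```python
import numpy as np

def random_erase(image, rng, area_frac=0.1):
    # Randomly pick a square region covering about area_frac of a 2-D
    # (grayscale) image and replace it with normally distributed
    # noise, producing a synthetic occlusion.
    h, w = image.shape
    side = max(1, int((area_frac * h * w) ** 0.5))
    y = int(rng.integers(0, h - side + 1))
    x = int(rng.integers(0, w - side + 1))
    out = image.astype(float).copy()
    out[y:y + side, x:x + side] = rng.normal(size=(side, side))
    return out
```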
Referring to fig. 5, fig. 5 is a block diagram of a pedestrian re-identification apparatus 200 according to an embodiment of the present disclosure, where the apparatus 200 may be a module, a program segment, or a code on an electronic device. It should be understood that the apparatus 200 corresponds to the above-mentioned embodiment of the method of fig. 2, and can perform various steps related to the embodiment of the method of fig. 2, and the specific functions of the apparatus 200 can be referred to the above description, and the detailed description is appropriately omitted here to avoid redundancy.
Optionally, the apparatus 200 comprises:
a trunk feature extraction module 210, configured to extract image features of the image to be recognized through a trunk network of the pedestrian re-recognition model;
a first branch feature extraction module 220, configured to extract a pedestrian segmentation attention feature map according to the image features through a first branch network of the pedestrian re-identification model, where the pedestrian segmentation attention feature map is used to mark a position of a pedestrian in the image to be identified;
a second branch feature extraction module 230, configured to extract a global feature map of the pedestrian according to the image features through a second branch network of the pedestrian re-identification model;
a feature fusion module 240, configured to fuse the pedestrian segmentation attention feature map and the global feature map through the pedestrian re-recognition model to obtain a fusion feature map;
and the pedestrian re-identification module 250 is used for carrying out pedestrian re-identification through the pedestrian re-identification model based on the fusion feature map to obtain an identification result.
Optionally, the first branch feature extraction module 220 is configured to:
detecting the pedestrian in the image to be identified through the first branch network according to the image characteristics to obtain a pedestrian detection frame;
segmenting the area framed by the pedestrian detection frame from the image to be identified through the first branch network to obtain a segmented image;
filling the segmentation image through the first branch network to obtain a target segmentation image with the same size as the image to be identified;
and identifying pedestrians for each pixel point in the target segmentation image through the first branch network to obtain a pedestrian segmentation attention feature map.
Optionally, the pedestrian re-identification module 250 is configured to:
segmenting the fusion feature map through the pedestrian re-recognition model to obtain a plurality of feature blocks;
uniformly pooling each feature block through the pedestrian re-identification model to obtain a first local feature corresponding to each feature block;
performing dimensionality reduction processing on each first local feature through the pedestrian re-identification model by using a preset convolution kernel to obtain a corresponding second local feature;
and inputting each second local feature into a corresponding classifier in the pedestrian re-identification model to obtain an identification result output by the classifier.
Optionally, the pedestrian re-identification module 250 is configured to connect, by using the pedestrian re-identification model, the plurality of second local features and the global feature map according to a channel dimension to obtain a total feature; and inputting the total features into a corresponding classifier in the pedestrian re-identification model to obtain an identification result output by the classifier.
Optionally, the pedestrian segmentation attention feature map is a binary image, the feature point where the feature value is 1 represents the position of the pedestrian in the pedestrian segmentation attention feature map, and the feature fusion module 240 is configured to multiply the pedestrian segmentation attention feature map and the global feature map by using the pedestrian re-identification model to obtain a fusion feature map.
Optionally, the apparatus 200 further comprises:
the model training module is used for inputting a training image into the pedestrian re-recognition model to obtain a prediction result output by the pedestrian re-recognition model, wherein the training image comprises a pedestrian shielding image; calculating a total loss value by using a loss function according to the prediction result; and updating the network parameters in the pedestrian re-identification model according to the total loss value.
Optionally, the label information of the training image includes whether each pixel belongs to a label of a pedestrian, the model training module is configured to perform pedestrian detection on the training image through an instance segmentation algorithm to obtain a segmentation mask image, and the segmentation mask image is used to mark the pixels belonging to pedestrians in the training image; and marking the training image by using the segmentation mask image as a label.
Optionally, the model training module is configured to:
carrying out pedestrian detection on the training image through an instance segmentation algorithm to obtain at least one pedestrian detection frame;
if the at least one pedestrian detection frame comprises at least two detection frames, determining the position relation between the center of the detection frame of each detection frame and the horizontal center of the training image;
determining a target detection frame according to the position relation, wherein the area framed by the target detection frame is the area where the pedestrian to be identified is located;
and obtaining a corresponding segmentation mask image according to the target detection frame.
Optionally, the model training module is configured to perform preprocessing on the pedestrian image in the training image by using a random erasure data enhancement algorithm, so as to obtain a pedestrian occlusion image.
It should be noted that, for the convenience and brevity of description, the specific working procedure of the above-described apparatus may refer to the corresponding procedure in the foregoing method embodiment, and the description is not repeated herein.
Embodiments of the present application provide a readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, performs the method processes performed by an electronic device in the method embodiment shown in fig. 1.
The present embodiments disclose a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the methods provided by the above-described method embodiments, for example, comprising: extracting image characteristics of an image to be identified through a trunk network of a pedestrian re-identification model; extracting a pedestrian segmentation attention feature map according to the image features through a first branch network of the pedestrian re-identification model, wherein the pedestrian segmentation attention feature map is used for marking the position of a pedestrian in the image to be identified; extracting a global feature map of the pedestrian according to the image features through a second branch network of the pedestrian re-identification model; fusing the pedestrian segmentation attention feature map and the global feature map through the pedestrian re-recognition model to obtain a fused feature map; and carrying out pedestrian re-identification on the basis of the fusion characteristic diagram through the pedestrian re-identification model to obtain an identification result.
To sum up, the embodiments of the present application provide a pedestrian re-identification method and apparatus, an electronic device, and a readable storage medium. A first branch network is added to the pedestrian re-identification model to extract a pedestrian segmentation attention feature map, so that during re-identification the model pays more attention to the features of the region where the pedestrian is located. The model can therefore extract the features in the image that are more beneficial and more salient for pedestrian re-identification, achieves a better identification effect when the pedestrian is occluded, and effectively improves the accuracy of pedestrian re-identification.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (12)

1. A pedestrian re-identification method, the method comprising:
extracting image characteristics of an image to be identified through a trunk network of a pedestrian re-identification model;
extracting a pedestrian segmentation attention feature map according to the image features through a first branch network of the pedestrian re-identification model, wherein the pedestrian segmentation attention feature map is used for marking the position of a pedestrian in the image to be identified;
extracting a global feature map of the pedestrian according to the image features through a second branch network of the pedestrian re-identification model;
fusing the pedestrian segmentation attention feature map and the global feature map through the pedestrian re-recognition model to obtain a fused feature map;
and carrying out pedestrian re-identification on the basis of the fusion characteristic diagram through the pedestrian re-identification model to obtain an identification result.
2. The method according to claim 1, wherein the extracting a pedestrian segmentation attention feature map from the image features through the first branch network of the pedestrian re-identification model comprises:
detecting the pedestrian in the image to be identified through the first branch network according to the image characteristics to obtain a pedestrian detection frame;
segmenting the area framed by the pedestrian detection frame from the image to be identified through the first branch network to obtain a segmented image;
filling the segmentation image through the first branch network to obtain a target segmentation image with the same size as the image to be identified;
and identifying pedestrians for each pixel point in the target segmentation image through the first branch network to obtain a pedestrian segmentation attention feature map.
3. The method according to claim 1, wherein the obtaining of the recognition result through the pedestrian re-recognition model based on the fused feature map comprises:
segmenting the fusion feature map through the pedestrian re-recognition model to obtain a plurality of feature blocks;
uniformly pooling each feature block through the pedestrian re-identification model to obtain a first local feature corresponding to each feature block;
performing dimensionality reduction processing on each first local feature through the pedestrian re-identification model by using a preset convolution kernel to obtain a corresponding second local feature;
and inputting each second local feature into a corresponding classifier in the pedestrian re-identification model to obtain an identification result output by the classifier.
4. The method according to claim 3, wherein the inputting each second local feature into a corresponding classifier in the pedestrian re-recognition model to obtain a recognition result output by the classifier comprises:
connecting a plurality of second local features with the global feature map according to channel dimensions through the pedestrian re-identification model to obtain total features;
and inputting the total features into a corresponding classifier in the pedestrian re-identification model to obtain an identification result output by the classifier.
5. The method according to claim 1, wherein the pedestrian segmentation attention feature map is a binary image, the feature point with the feature value of 1 represents the position of a pedestrian in the pedestrian segmentation attention feature map, and the fusing of the pedestrian segmentation attention feature map and the global feature map by the pedestrian re-recognition model to obtain a fused feature map comprises:
and multiplying the pedestrian segmentation attention feature map and the global feature map through the pedestrian re-identification model to obtain a fusion feature map.
6. The method according to any one of claims 1-5, wherein the pedestrian re-recognition model is trained by:
inputting a training image into the pedestrian re-recognition model to obtain a prediction result output by the pedestrian re-recognition model, wherein the training image comprises a pedestrian shielding image;
calculating a total loss value by using a loss function according to the prediction result;
and updating the network parameters in the pedestrian re-identification model according to the total loss value.
7. The method of claim 6, wherein the label information of the training image includes a label of whether each pixel point belongs to a pedestrian, the method further comprising:
carrying out pedestrian detection on the training image through an instance segmentation algorithm to obtain a segmentation mask image, wherein the segmentation mask image is used for marking pixel points belonging to pedestrians in the training image;
and marking the training image by using the segmentation mask image as a label.
8. The method of claim 7, wherein obtaining the segmented mask image comprises:
carrying out pedestrian detection on the training image through an instance segmentation algorithm to obtain at least one pedestrian detection frame;
if the at least one pedestrian detection frame comprises at least two detection frames, determining the position relation between the center of the detection frame of each detection frame and the horizontal center of the training image;
determining a target detection frame according to the position relation, wherein the area framed by the target detection frame is the area where the pedestrian to be identified is located;
and obtaining a corresponding segmentation mask image according to the target detection frame.
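The selection rule of claim 8 — when several pedestrians are detected, keep the one centred in the crop — can be sketched in a few lines. The box format and function name are illustrative; the claim only requires comparing box centers against the image's horizontal center:

```python
def select_target_box(boxes, image_width):
    """Target-box selection per claim 8: among multiple pedestrian
    detection boxes, pick the one whose horizontal center lies closest
    to the horizontal center of the training image, on the assumption
    that the pedestrian to be identified is the centred one.

    boxes: iterable of (x1, y1, x2, y2) tuples from the detector.
    """
    center_x = image_width / 2.0
    # Distance from each box's horizontal midpoint to the image midline.
    return min(boxes, key=lambda b: abs((b[0] + b[2]) / 2.0 - center_x))
```

The segmentation mask associated with the winning box would then serve as the pixel-level label of claim 7.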
9. The method of claim 6, further comprising:
preprocessing the pedestrian images in the training images with a random-erasing data augmentation algorithm to obtain occluded-pedestrian images.
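Random erasing occludes a random rectangle of each training crop so the model learns occlusion robustness. A sketch follows; the probability, area range, and constant grey fill are common defaults in the style of Zhong et al.'s Random Erasing, not values taken from the patent:

```python
import numpy as np

def random_erase(image, erase_prob=0.5, area_ratio=(0.02, 0.2),
                 fill=127, rng=None):
    """Random-erasing augmentation as used in claim 9: occlude a random
    rectangle of the pedestrian image to simulate occlusion."""
    rng = rng or np.random.default_rng()
    img = image.copy()
    if rng.random() > erase_prob:
        return img                      # leave this sample unoccluded
    h, w = img.shape[:2]
    # Pick a roughly square patch covering 2-20% of the image area.
    area = rng.uniform(*area_ratio) * h * w
    eh = max(1, min(h, int(round(np.sqrt(area)))))
    ew = max(1, min(w, int(round(area / eh))))
    top = int(rng.integers(0, h - eh + 1))
    left = int(rng.integers(0, w - ew + 1))
    img[top:top + eh, left:left + ew] = fill
    return img
```

Filling with per-pixel random values instead of a constant is an equally common variant of the same idea.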
10. A pedestrian re-identification apparatus, the apparatus comprising:
a trunk feature extraction module, configured to extract image features of an image to be identified through a backbone network of a pedestrian re-identification model;
a first branch feature extraction module, configured to extract a pedestrian segmentation attention feature map from the image features through a first branch network of the pedestrian re-identification model, the pedestrian segmentation attention feature map marking the positions of pedestrians in the image to be identified;
a second branch feature extraction module, configured to extract a global feature map of the pedestrians from the image features through a second branch network of the pedestrian re-identification model;
a feature fusion module, configured to fuse the pedestrian segmentation attention feature map and the global feature map through the pedestrian re-identification model to obtain a fused feature map;
a pedestrian re-identification module, configured to perform pedestrian re-identification through the pedestrian re-identification model based on the fused feature map to obtain an identification result.
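The five modules of claim 10 compose into a single forward pipeline. Sketched below with each module as a caller-supplied callable rather than a concrete network — all names and shapes are illustrative, and the fusion step follows the multiplication of claim 5:

```python
import numpy as np

def re_identify(image, backbone, seg_branch, global_branch, embed):
    """Data flow through the apparatus of claim 10, with each module as
    a plain callable supplied by the caller."""
    feats = backbone(image)                # trunk feature extraction module
    attention = seg_branch(feats)          # first branch: (H, W) pedestrian mask
    global_map = global_branch(feats)      # second branch: (C, H, W) features
    fused = global_map * attention[np.newaxis]  # feature fusion module
    return embed(fused)                    # re-identification embedding
```

In a real system the returned embedding would be compared against a gallery of known pedestrians to produce the identification result.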
11. An electronic device comprising a processor and a memory, the memory storing computer-readable instructions that, when executed by the processor, perform the method of any one of claims 1-9.
12. A readable storage medium having stored thereon a computer program which, when executed by a processor, performs the method of any one of claims 1-9.
CN202110168058.7A 2021-02-05 Pedestrian re-recognition method and device, electronic equipment and readable storage medium Active CN112801008B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110168058.7A CN112801008B (en) 2021-02-05 Pedestrian re-recognition method and device, electronic equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN112801008A true CN112801008A (en) 2021-05-14
CN112801008B CN112801008B (en) 2024-05-31


Patent Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180374233A1 (en) * 2017-06-27 2018-12-27 Qualcomm Incorporated Using object re-identification in video surveillance
CN110263604A (en) * 2018-05-14 2019-09-20 桂林远望智能通信科技有限公司 Method and device for separating pedestrian image background at pixel level
CN108960114A (en) * 2018-06-27 2018-12-07 腾讯科技(深圳)有限公司 Human body recognition method and device, computer readable storage medium and electronic equipment
CN110830709A (en) * 2018-08-14 2020-02-21 Oppo广东移动通信有限公司 Image processing method and device, terminal device and computer readable storage medium
CN109784182A (en) * 2018-12-17 2019-05-21 北京飞搜科技有限公司 Pedestrian re-identification method and device
CN110070073A (en) * 2019-05-07 2019-07-30 国家广播电视总局广播电视科学研究院 Pedestrian re-identification method based on global and local features with an attention mechanism
CN110532955A (en) * 2019-08-30 2019-12-03 中国科学院宁波材料技术与工程研究所 Instance segmentation method and device based on feature attention and sub-upsampling
CN110866928A (en) * 2019-10-28 2020-03-06 中科智云科技有限公司 Target boundary segmentation and background noise suppression method and device based on neural network
CN110909651A (en) * 2019-11-15 2020-03-24 腾讯科技(深圳)有限公司 Video subject person identification method, device, equipment and readable storage medium
CN111027455A (en) * 2019-12-06 2020-04-17 重庆紫光华山智安科技有限公司 Pedestrian feature extraction method and device, electronic equipment and storage medium
CN111126275A (en) * 2019-12-24 2020-05-08 广东省智能制造研究所 Pedestrian re-identification method and device based on multi-granularity feature fusion
CN111259850A (en) * 2020-01-23 2020-06-09 同济大学 Pedestrian re-identification method integrating random batch mask and multi-scale representation learning
CN111339874A (en) * 2020-02-18 2020-06-26 广州麦仑信息科技有限公司 Single-stage face segmentation method
CN111539370A (en) * 2020-04-30 2020-08-14 华中科技大学 Image pedestrian re-identification method and system based on multi-attention joint learning
CN111652142A (en) * 2020-06-03 2020-09-11 广东小天才科技有限公司 Topic segmentation method, device, equipment and medium based on deep learning
CN111639616A (en) * 2020-06-05 2020-09-08 上海一由科技有限公司 Person re-identification method based on deep learning
CN111783576A (en) * 2020-06-18 2020-10-16 西安电子科技大学 Pedestrian re-identification method based on improved YOLOv3 network and feature fusion
CN111914698A (en) * 2020-07-16 2020-11-10 北京紫光展锐通信技术有限公司 Method and system for segmenting human body in image, electronic device and storage medium
CN111985332A (en) * 2020-07-20 2020-11-24 浙江工业大学 Gait recognition method for improving loss function based on deep learning

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
GUANSHUO WANG et al.: "Receptive Multi-granularity Representation for Person Re-Identification", IEEE Transactions on Image Processing, pages 1-14 *
YANG FU et al.: "Horizontal Pyramid Matching for Person Re-Identification", The Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19), vol. 31, no. 01, pages 8297-8298 *
LIU Ziyan et al.: "Feature extraction method for pedestrian re-identification based on attention mechanism", Journal of Computer Applications, vol. 40, no. 3, pages 672-676 *
YAO Lewei: "Research on pedestrian re-identification algorithms based on deep learning", China Master's Theses Full-text Database, Information Science and Technology, no. 01, 15 January 2019 (2019-01-15), pages 138-3023 *
ZHANG Xiaoshuang: "Scene image text detection based on instance segmentation", China Master's Theses Full-text Database, Information Science and Technology, no. 01, pages 25-26 *
LI Qiang: "Research on pedestrian re-identification technology based on deep learning", China Master's Theses Full-text Database, Social Sciences I, no. 2, 15 February 2020 (2020-02-15), pages 113-77 *
WANG Yaodong: "Research on pedestrian re-identification based on Mask_RCNN neural network", China Master's Theses Full-text Database, Information Science and Technology, no. 01, pages 138-1533 *
GU Xusheng: "Research on representation learning for pedestrian re-identification based on semantic information and attention", China Master's Theses Full-text Database, Information Science and Technology, no. 07, 15 July 2020 (2020-07-15), pages 138-1156 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113408356A (en) * 2021-05-21 2021-09-17 深圳市广电信义科技有限公司 Pedestrian re-identification method, device and equipment based on deep learning and storage medium
WO2023279799A1 (en) * 2021-07-05 2023-01-12 北京旷视科技有限公司 Object identification method and apparatus, and electronic system
CN113469102A (en) * 2021-07-13 2021-10-01 浙江大华技术股份有限公司 Target object re-identification method and device, storage medium and electronic device
CN113516194B (en) * 2021-07-20 2023-08-08 海南长光卫星信息技术有限公司 Semi-supervised classification method, device, equipment and storage medium for hyperspectral remote sensing images
CN113516194A (en) * 2021-07-20 2021-10-19 海南长光卫星信息技术有限公司 Hyperspectral remote sensing image semi-supervised classification method, device, equipment and storage medium
CN113780463A (en) * 2021-09-24 2021-12-10 北京航空航天大学 Multi-head normalization long tail classification method based on deep neural network
CN113780463B (en) * 2021-09-24 2023-09-05 北京航空航天大学 Multi-head normalization long-tail classification method based on deep neural network
CN113780243A (en) * 2021-09-29 2021-12-10 平安科技(深圳)有限公司 Training method, device and equipment of pedestrian image recognition model and storage medium
CN113780243B (en) * 2021-09-29 2023-10-17 平安科技(深圳)有限公司 Training method, device, equipment and storage medium for pedestrian image recognition model
WO2023169369A1 (en) * 2022-03-11 2023-09-14 浪潮(北京)电子信息产业有限公司 Pedestrian re-identification method, system, apparatus and device, and medium
CN115631510A (en) * 2022-10-24 2023-01-20 智慧眼科技股份有限公司 Pedestrian re-identification method and device, computer equipment and storage medium
CN115631510B (en) * 2022-10-24 2023-07-04 智慧眼科技股份有限公司 Pedestrian re-identification method and device, computer equipment and storage medium
CN115631509A (en) * 2022-10-24 2023-01-20 智慧眼科技股份有限公司 Pedestrian re-identification method and device, computer equipment and storage medium
CN116188479A (en) * 2023-02-21 2023-05-30 北京长木谷医疗科技有限公司 Hip joint image segmentation method and system based on deep learning
CN116188479B (en) * 2023-02-21 2024-04-02 北京长木谷医疗科技股份有限公司 Hip joint image segmentation method and system based on deep learning

Similar Documents

Publication Publication Date Title
CN112232349B (en) Model training method, image segmentation method and device
CN110929578B (en) Anti-shielding pedestrian detection method based on attention mechanism
CN112597941B (en) Face recognition method and device and electronic equipment
CN110909690B (en) Method for detecting occluded face image based on region generation
Xu et al. Inter/intra-category discriminative features for aerial image classification: A quality-aware selection model
CN111027493B (en) Pedestrian detection method based on deep learning multi-network soft fusion
CN111178183B (en) Face detection method and related device
CN108230291B (en) Object recognition system training method, object recognition method, device and electronic equipment
CN112949508A (en) Model training method, pedestrian detection method, electronic device and readable storage medium
CN111723773B (en) Method and device for detecting carryover, electronic equipment and readable storage medium
CN113095263B (en) Training method and device for pedestrian re-recognition model under shielding and pedestrian re-recognition method and device under shielding
WO2021174941A1 (en) Physical attribute recognition method, system, computer device, and storage medium
CN111898431A (en) Pedestrian re-identification method based on attention mechanism part shielding
CN110781980B (en) Training method of target detection model, target detection method and device
CN107563299B (en) Pedestrian detection method using RecNN to fuse context information
CN109903339B (en) Video group figure positioning detection method based on multi-dimensional fusion features
CN114758288A (en) Power distribution network engineering safety control detection method and device
CN112926652B (en) Fish fine granularity image recognition method based on deep learning
CN112836625A (en) Face living body detection method and device and electronic equipment
CN111967464A (en) Weak supervision target positioning method based on deep learning
CN114519877A (en) Face recognition method, face recognition device, computer equipment and storage medium
CN115527050A (en) Image feature matching method, computer device and readable storage medium
CN111582057B (en) Face verification method based on local receptive field
CN116309590B (en) Visual computing method, system, electronic equipment and medium based on artificial intelligence
CN116091781B (en) Data processing method and device for image recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant