CN114022823A - Occlusion-driven pedestrian re-identification method and system, and storage medium

Info

Publication number: CN114022823A
Application number: CN202111354994.3A
Authority: CN
Inventors: 宋文凤; 叶莹; 尚钰; 许庆胜; 刘程锦; 那婷婷; 张欣宇
Original assignee: Beijing Information Science and Technology University
Current assignee: Beijing Information Science and Technology University
Priority date: 2021-11-16
Filing date: 2021-11-16
Publication date: 2022-02-08

Abstract

The invention discloses an occlusion-driven pedestrian re-identification method, system, and storage medium in the technical field of image processing. The method comprises the following steps: constructing a panoramic data set and applying data enhancement to it to obtain a training set; inputting the training set into an occlusion classification model and training until the model converges, to obtain a trained occlusion classification model; inputting the training set, according to occlusion degree, into the sub-models of a re-identification model that correspond to the different occlusion degrees, and training until the recognition accuracy of each sub-model reaches a preset value, to obtain a trained re-identification model; and determining the occlusion degree of a picture to be detected and obtaining a pedestrian feature recognition result through the trained occlusion classification model and the trained re-identification model. The method can still accurately identify pedestrian features under different occlusion scenes and with blurred, poor-quality pictures, and thereby improves recognition accuracy.

Description

Occlusion-driven pedestrian re-identification method and system, and storage medium

Technical Field

The invention relates to the technical field of image processing, and in particular to an occlusion-driven pedestrian re-identification method and system, and a storage medium.

Background

Traditional pedestrian attribute recognition methods usually focus on building a robust feature representation from hand-crafted features, powerful classifiers, or attribute relations, using models such as HOG, SIFT, SVM, or CRF; the performance of these traditional algorithms falls far short of the requirements of practical applications. Most deep learning methods for pedestrian attribute recognition now use shallow convolutional networks. With a convolutional neural network, features can be learned by stochastic gradient descent and trained end to end together with the classifier, which avoids some of the limitations of methods based on support vector machines and hand-crafted features. The extracted features are learned from the training data directly in the convolution filters. The main advantages are that the feature extractor and the classifier parameters are optimized in a simple, convenient end-to-end manner, and that the extracted features are adaptively optimized for specific attributes. Multi-label convolutional neural networks are preferred over support vector machines because they allow the relationships between attributes to be learned more comprehensively.

Pedestrian re-identification aims to match pedestrians across a multi-camera network with non-overlapping fields of view, that is, to confirm whether pedestrian targets captured by cameras at different positions and at different times are the same person. Deep-learning-based approaches have come to dominate the field of re-identification. These methods can be broadly divided into two categories: the first focuses on extracting invariant features from images to improve discrimination; the second extends feature search from images to video sequences by integrating the time dimension into the spatial features.

The early small data sets for pedestrian re-identification (VIPeR and the like) contain images from only two cameras, and each pedestrian has only one correct retrieval target, so they cannot provide a comprehensive evaluation and their value for practical applications is limited. The existing pedestrian re-identification data sets (Duke, CUHK03, Market-1501 and the like) consist of images acquired by cameras on real campuses, and the largest of them contains tens of thousands of pictures. Given the different shooting angles of different scenes, learning the feature changes needed to obtain a camera view transformation function model is a complex process.

Existing pedestrian re-identification technology identifies pedestrians with high accuracy when there is no occlusion, but in occluded scenes part of the feature information is lost, which makes identification more difficult. When the image quality of the data set is low and the pictures are blurred and noisy, extracting pedestrian features from the image also becomes harder, and in a panoramic data set the fisheye camera distorts the edges of the lens, which further reduces identification accuracy. Therefore, how to accurately identify pedestrian features under different occlusion scenes and with blurred, poor-quality pictures is a problem that those skilled in the art need to solve.

Disclosure of Invention

In view of this, the invention provides an occlusion-driven pedestrian re-identification method and system, and a storage medium, which can accurately identify pedestrian features and improve identification accuracy under different occlusion scenes and with blurred, poor-quality pictures.

To achieve the above object, the present invention provides an occlusion-driven pedestrian re-identification method, comprising the following steps:

acquiring a panoramic video and constructing a panoramic data set;

expanding the panoramic data set by a data enhancement method to obtain an enhanced panoramic data set, which serves as a training set;

constructing an occlusion classification model, inputting the training set into the occlusion classification model, and training until the model converges, to obtain a trained occlusion classification model;

constructing a re-identification model, inputting the training set, according to occlusion degree, into the sub-models of the re-identification model that correspond to the different occlusion degrees, and training until the recognition accuracy of each sub-model reaches a preset value, to obtain a trained re-identification model;

and inputting the picture to be detected into the trained occlusion classification model to obtain the occlusion degree of the picture to be detected, and then inputting the picture to be detected into the corresponding sub-model of the trained re-identification model for identification, to obtain a pedestrian feature recognition result for the picture to be detected.
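The final step above amounts to classifying first and then dispatching to one sub-model. The following is a minimal Python sketch of that control flow only. The names `OcclusionLevel`, `reidentify`, `toy_classifier` and `toy_submodels` are hypothetical stand-ins introduced for illustration; they are not the trained models of the invention, and the zero-pixel rule used by the toy classifier is an assumption.

```python
from enum import Enum
from typing import Callable, Dict, List, Tuple

Image = List[List[float]]   # placeholder image type for this sketch
Feature = List[float]       # placeholder pedestrian feature vector


class OcclusionLevel(Enum):
    NONE = 0
    SMALL = 1
    SEVERE = 2


def reidentify(
    image: Image,
    classifier: Callable[[Image], OcclusionLevel],
    submodels: Dict[OcclusionLevel, Callable[[Image], Feature]],
) -> Tuple[OcclusionLevel, Feature]:
    """Classify the occlusion degree, then run only the matching sub-model."""
    level = classifier(image)
    return level, submodels[level](image)


def toy_classifier(image: Image) -> OcclusionLevel:
    # Hypothetical rule for the sketch: zero-valued pixels count as occluded.
    pixels = [p for row in image for p in row]
    ratio = sum(1 for p in pixels if p == 0.0) / len(pixels)
    if ratio < 0.1:
        return OcclusionLevel.NONE
    if ratio < 0.5:
        return OcclusionLevel.SMALL
    return OcclusionLevel.SEVERE


# Hypothetical sub-models: each returns a dummy feature tagged with its level.
toy_submodels = {
    lvl: (lambda img, tag=lvl: [float(tag.value), float(len(img))])
    for lvl in OcclusionLevel
}

level, feature = reidentify([[0.0, 0.0], [0.0, 1.0]], toy_classifier, toy_submodels)
```

Keeping the sub-models in a mapping keyed by occlusion degree means that adding or replacing a sub-model does not change the dispatch logic.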

This technical scheme discloses the specific steps of the pedestrian re-identification method. Its advantages are that processing the panoramic video reduces the influence of lens distortion on pedestrian identification as far as possible, and that pedestrian features can be identified well in pictures from different occlusion scenes.

Optionally, acquiring the panoramic video and constructing the panoramic data set specifically comprises the following steps:

shooting a panoramic video with a panoramic camera;

editing the panoramic video to obtain video in which pedestrians appear in every frame;

and extracting frames from the video in which pedestrians appear in every frame, converting the video data into picture data, and constructing the panoramic data set from pictures that lack rich motion features but show the person clearly.

Optionally, the panoramic data set is expanded by a data enhancement method, specifically: OpenCV is used to apply brightness enhancement, chroma enhancement, and sharpness enhancement to the picture data to obtain an enhanced panoramic data set. This addresses the problems of insufficient data, low picture quality, and blur encountered when processing the data set.

Optionally, the constructed occlusion classification model is a ResNet18 model, and obtaining the trained occlusion classification model specifically comprises the following steps:

processing the training set and the corresponding labels with a dataset function to obtain the length of the training set;

iterating over the training set in a data iterator to obtain the corresponding picture tensors and label tensors;

migrating the ResNet18 model to a GPU to obtain the predicted occlusion degree;

and training the ResNet18 model with a cross-entropy loss function, and optimizing the network parameters with an optimizer until the cross-entropy loss function converges, to obtain the trained occlusion classification model.

This technical scheme discloses the specific process for training the occlusion classification model. Updating the network weights according to the cross-entropy loss function makes the predicted class of each picture more accurate, so that the trained occlusion classification model judges the occlusion degree more accurately.

Optionally, the cross entropy loss function is:

H(p, q) = -∑ p(x) log q(x);

wherein p(x) represents the true probability distribution of the feature x of the input data, and q(x) represents the predicted probability distribution of the feature x of the input data.
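As a worked illustration of the formula, the following Python sketch evaluates H(p, q) for a one-hot true distribution over three classes. The three-class setting mirrors the three occlusion degrees; the function name `cross_entropy`, the clipping constant `eps`, and the example probabilities are assumptions made for this sketch.

```python
import math
from typing import Sequence


def cross_entropy(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """H(p, q) = -sum_x p(x) * log q(x), with q clipped away from zero."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of classes")
    return -sum(pi * math.log(max(qi, eps)) for pi, qi in zip(p, q))


# True class is the second of three occlusion classes (one-hot p).
true_dist = [0.0, 1.0, 0.0]
confident = [0.1, 0.8, 0.1]   # prediction concentrated on the true class
unsure = [0.4, 0.3, 0.3]      # prediction spread over all classes

loss_confident = cross_entropy(true_dist, confident)
loss_unsure = cross_entropy(true_dist, unsure)
```

With a one-hot p, the sum reduces to the negative log of the probability predicted for the true class, so the more confident correct prediction gives the smaller loss.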

Optionally, the re-identification model is constructed based on the Transformer model, and comprises a no-occlusion sub-model, a small-occlusion sub-model, and a severe-occlusion sub-model.

Optionally, the training of each sub-model comprises two stages: feature extraction and supervised learning;

in the feature extraction stage, a picture is split into a plurality of patches with overlapping pixels, and the patches are input as a sequence into the encoder of the Transformer model;

in the supervised learning stage, the global features and the local features are encoded separately by two independent Transformer layers, one in a global branch and one in a patch regrouping branch; the patch regrouping module in the patch regrouping branch regroups all the patches, and the regrouped patches are input into a shared Transformer layer to obtain the local features.
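The regrouping performed in the patch regrouping branch can be sketched as a shuffle followed by a split into equal groups. This is a minimal Python sketch under stated assumptions: the function name `regroup_patches`, the use of a seeded random permutation, and the choice of 12 patches in 4 groups are hypothetical and are not specified by the invention.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def regroup_patches(patches: Sequence[T], num_groups: int, seed: int = 0) -> List[List[T]]:
    """Shuffle the patch sequence, then split it into equally sized groups."""
    if num_groups <= 0 or len(patches) % num_groups != 0:
        raise ValueError("patch count must be a positive multiple of num_groups")
    shuffled = list(patches)
    random.Random(seed).shuffle(shuffled)   # seeded so the sketch is repeatable
    size = len(shuffled) // num_groups
    return [shuffled[i * size:(i + 1) * size] for i in range(num_groups)]


# Patch indices stand in for patch embeddings in this sketch.
patch_ids = list(range(12))
groups = regroup_patches(patch_ids, num_groups=4)
```

Each group would then be passed through the shared Transformer layer to give one local feature; because of the shuffle, a group mixes patches from different parts of the picture rather than covering one contiguous region.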

This technical scheme discloses the specific process for training the re-identification model. The re-identification model is trained in three subspaces divided according to occlusion degree; the input set is classified by the occlusion classification model, and the sub-model trained in the subspace of the corresponding class is then used for identification, which reduces the interference of occlusion with the face recognition result.

The invention also provides an occlusion-driven pedestrian re-identification system, comprising:

an acquisition module, used for acquiring a panoramic video and constructing a panoramic data set;

an image enhancement module, used for expanding the panoramic data set by a data enhancement method to obtain an enhanced panoramic data set, which serves as a training set;

a first construction module, used for constructing an occlusion classification model;

a first training module, used for inputting the training set into the occlusion classification model and training until the model converges, to obtain a trained occlusion classification model;

a second construction module, used for constructing a re-identification model;

a second training module, used for inputting the training set, according to occlusion degree, into the sub-models of the re-identification model that correspond to the different occlusion degrees, and training until the recognition accuracy of each sub-model reaches a preset value, to obtain a trained re-identification model;

and a detection module, used for inputting the picture to be detected into the trained occlusion classification model to obtain the occlusion degree of the picture to be detected, and then inputting the picture to be detected into the corresponding sub-model of the trained re-identification model for identification, to obtain a pedestrian feature recognition result for the picture to be detected.

The present invention also provides a computer storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the above pedestrian re-identification method.

Compared with the prior art, the occlusion-driven pedestrian re-identification method and system and the storage medium disclosed by the invention have the following beneficial effects:

(1) in the pedestrian re-identification method, the picture is first classified by the occlusion classification model to determine its occlusion degree, and is then input into the sub-model of the re-identification model for the corresponding class for identification, which reduces the influence of occlusion on face identification;

(2) the pedestrian re-identification method processes the panoramic video, which reduces the influence of lens distortion on pedestrian identification as far as possible;

(3) the pedestrian re-identification method enhances the low-resolution, blurred, and noisy pictures in the data set, so that pedestrian features can be accurately identified under different occlusion scenes and with blurred, poor-quality pictures, which improves identification accuracy.

Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below are only embodiments of the present invention; those skilled in the art can obtain other drawings from the provided drawings without creative effort.

FIG. 1 is a flow chart of a pedestrian re-identification method of the present invention;

FIGS. 2(a) and 2(b) are images before and after data enhancement, respectively;

FIG. 3 is a schematic diagram of an occlusion classification model;

FIG. 4 is a schematic diagram of a training process of a re-recognition model;

FIG. 5 is a block diagram of a pedestrian re-identification system of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Example 1

This embodiment of the invention discloses an occlusion-driven pedestrian re-identification method which, as shown in FIG. 1, comprises the following steps:

acquiring a panoramic video and constructing a panoramic data set;

constructing an occlusion classification model, inputting a training set into the occlusion classification model, and training until the model converges, to obtain a trained occlusion classification model;

and inputting the picture to be detected into the trained occlusion classification model to obtain the occlusion degree of the picture to be detected, and then inputting the picture to be detected into the corresponding sub-model of the trained re-identification model for identification, to obtain a pedestrian feature recognition result for the picture to be detected.

Further, regarding the acquisition and construction of the panoramic data set: data were collected with an Insta360 panoramic camera. The acquisition environment has heavy pedestrian traffic and the video is unstable, so if data are collected continuously for a long time, much of the data is invalid. To solve this problem, a short-duration, high-frequency acquisition method is used to keep each data segment highly effective. To obtain better experimental results, video with a resolution of 1080 × 1920 and an average of 30 frames per second was collected, and the duration of each video segment was kept to around 20 seconds. The panoramic data set covers more angles and a larger range, and is closer to the data collected by the camera. Such a data set is more natural, but the fisheye camera distorts the edges of the lens, and brightness and resolution also affect training and testing. Therefore, before the panoramic data set is made, Insta360 Studio is used to edit the panoramic video so that pedestrians are present in each frame as far as possible; frames are then extracted using cv2.video, converting the video data into pictures. When several viewing angles in the panoramic video are valid, each of them is saved, and pictures that lack rich motion features but show the person clearly are used as the panoramic data set.

Further, when the data set is processed, the amount of data may be insufficient and the pictures may be of low quality and blurred, so the data set is expanded by a data enhancement method to improve its size and quality. In this embodiment, the picture data are enhanced using OpenCV. There are many ways to enhance a picture, such as changing its brightness, chroma, or sharpness. FIG. 2(a) shows an image before data enhancement. During enhancement, the brightness of the picture is increased by 0.2, the chroma by 0.8, and the sharpness by 3.0, giving the enhanced image shown in FIG. 2(b).
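The arithmetic behind the brightness and chroma changes can be sketched per pixel. The embodiment uses OpenCV; the sketch below deliberately swaps in Python's standard `colorsys` module so that it stays self-contained, and it is not the embodiment's implementation. It also assumes that "increased by 0.2" and "increased by 0.8" mean relative increases of the HSV value and saturation channels, which the text does not state. Sharpness enhancement needs a neighborhood filter and is omitted. The names `enhance_pixel` and `enhance_image` are hypothetical.

```python
import colorsys
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # 8-bit RGB


def enhance_pixel(rgb: Pixel, brightness_increase: float = 0.2,
                  chroma_increase: float = 0.8) -> Pixel:
    """Scale HSV value and saturation by (1 + increase), clamped to 1.0.

    Assumption: the increases are relative factors; the source text does
    not say whether they are relative or absolute.
    """
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    s = min(1.0, s * (1.0 + chroma_increase))
    v = min(1.0, v * (1.0 + brightness_increase))
    return tuple(round(c * 255) for c in colorsys.hsv_to_rgb(h, s, v))


def enhance_image(pixels: List[List[Pixel]]) -> List[List[Pixel]]:
    """Apply the per-pixel enhancement to every pixel of a nested-list image."""
    return [[enhance_pixel(px) for px in row] for row in pixels]


# A tiny one-row "image": one dull brownish pixel and one gray pixel.
dull = [[(100, 90, 80), (50, 50, 50)]]
enhanced = enhance_image(dull)
```

A gray pixel has zero saturation, so only its brightness changes, while the colored pixel becomes both brighter and more saturated.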

Further, the constructed occlusion classification model is the ResNet18 model; see FIG. 3. First, the training set and the corresponding labels are read into a dataset function for processing: each picture read is appended to a list, the length of the list gives the length of the data set, and each picture is read from the list and converted into a tensor. Then the training set is loaded into a data iterator, the learning rate is set to 0.001, and the ResNet18 model is loaded and configured for three occlusion classes, namely ResNet(3,3). The model is migrated to a GPU for training, a cross-entropy loss function is selected, the learnable parameters params (an iterable) of the model are updated with the optim.Adam() optimizer, a random number of 1000 is selected, and training then starts. During training, forward propagation is performed: the data in the data iterator are iterated to obtain the corresponding picture tensors and label tensors, which are transferred to the GPU; the predicted value, that is, the predicted occlusion degree of the input picture, is obtained; and the cross-entropy loss is calculated. The optimizer's gradients are then cleared, the loss is back-propagated, and the optimizer updates the network weights according to the cross-entropy loss so that the class output by the network for each picture becomes more accurate. This is repeated until the model converges, giving the trained occlusion classification model.

Specifically, the cross entropy loss function is:

H(p, q) = -∑ p(x) log q(x);

Further, FIG. 4 is a schematic diagram of the training process of the re-identification model. The re-identification model is based on the Transformer model and comprises a no-occlusion sub-model, a small-occlusion sub-model, and a severe-occlusion sub-model. Because these three sub-models are trained in subspaces divided according to occlusion degree, each sub-model focuses on one degree of occlusion: no occlusion, small occlusion, or severe occlusion. An input picture is first classified by the occlusion classification model to determine its occlusion degree, and is then identified by the sub-model trained in the subspace of the corresponding class, which reduces the interference of occlusion with the face identification result.

The training of each sub-model is divided into two stages: feature extraction and supervised learning. In the feature extraction stage, the image input to the module is split into patches, and the patches are input as a sequence to the encoder of the Transformer model. Patches with overlapping pixels are generated with a sliding window so that the local neighboring structure around each patch is not lost. Two-dimensional bilinear interpolation is also introduced for the learnable position encoding so that input images of any resolution can be processed. In the supervised learning stage, the global features and the local features are encoded separately by two independent Transformer layers, one in the global branch and one in the patch regrouping branch. The patch regrouping branch contains a patch regrouping module that shuffles and regroups all patches, which are then input into the shared Transformer layer to obtain local features.
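The sliding-window generation of overlapping patches can be illustrated by computing the patch grid for a given stride. In this Python sketch the image size of 256 × 128, the patch size of 16, and the stride of 12 are assumed example values, not values taken from the embodiment; the function names `patch_grid` and `patch_boxes` are hypothetical.

```python
from typing import List, Tuple


def patch_grid(height: int, width: int, patch: int, stride: int) -> Tuple[int, int]:
    """Number of patch rows and columns produced by a sliding window."""
    if patch <= 0 or stride <= 0 or patch > height or patch > width:
        raise ValueError("need 0 < patch <= image size and stride > 0")
    rows = (height - patch) // stride + 1
    cols = (width - patch) // stride + 1
    return rows, cols


def patch_boxes(height: int, width: int, patch: int,
                stride: int) -> List[Tuple[int, int, int, int]]:
    """(top, left, bottom, right) of every patch, in row-major order."""
    rows, cols = patch_grid(height, width, patch, stride)
    return [
        (r * stride, c * stride, r * stride + patch, c * stride + patch)
        for r in range(rows)
        for c in range(cols)
    ]


# Stride equal to the patch size gives non-overlapping patches;
# a smaller stride makes neighboring patches share pixels.
non_overlap = patch_grid(256, 128, patch=16, stride=16)
overlap = patch_grid(256, 128, patch=16, stride=12)
boxes = patch_boxes(256, 128, patch=16, stride=12)
shared_columns = boxes[0][3] - boxes[1][1]   # pixel columns shared by the first two patches
```

A stride smaller than the patch size lengthens the input sequence, so the overlap trades extra computation for preserving local structure at patch borders.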

In addition, when the sub-models are tested, the model in each subspace can be tested; the test data and results are shown in Table 1. First, within the subspaces formed by the occlusion classification model, the pictures of the test set are identified one by one with the corresponding sub-model, and the identification results are output. Then the accuracy and the head-to-tail hit rate of each sub-model are computed to judge whether using each sub-model for pedestrians with the corresponding occlusion degree gives more accurate re-identification.

TABLE 1 test data and results for submodels
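One common way to score a re-identification sub-model is the first-rank (Rank-1) hit rate: the fraction of query pictures whose nearest gallery feature belongs to the same person. The Python sketch below assumes that the hit rate mentioned above is of this kind, which the text does not confirm. The function name `rank1_hit_rate`, the use of Euclidean distance, and the toy features are illustrative assumptions and are not the data of Table 1.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[str, Sequence[float]]   # (person identity, feature vector)


def rank1_hit_rate(queries: List[Sample], gallery: List[Sample]) -> float:
    """Fraction of queries whose nearest gallery feature has the same identity."""
    if not queries or not gallery:
        raise ValueError("queries and gallery must be non-empty")
    hits = 0
    for query_id, query_feat in queries:
        # Nearest gallery entry by Euclidean distance between feature vectors.
        nearest_id, _ = min(gallery, key=lambda item: math.dist(query_feat, item[1]))
        hits += int(nearest_id == query_id)
    return hits / len(queries)


# Toy two-dimensional features for three identities (illustrative only).
gallery = [("a", [0.0, 0.0]), ("b", [1.0, 1.0]), ("c", [5.0, 5.0])]
queries = [("a", [0.1, 0.0]), ("b", [0.9, 1.2]), ("c", [1.2, 1.0])]
rate = rank1_hit_rate(queries, gallery)
```

In this toy set the third query lies closest to the gallery feature of a different person, so it counts as a miss.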

Example 2

This embodiment provides an occlusion-driven pedestrian re-identification system which, as shown in FIG. 5, comprises:

a first training module, used for inputting a training set into the occlusion classification model and training until the model converges, to obtain a trained occlusion classification model;

a second construction module, used for constructing a re-identification model;

The present invention also provides a computer storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the pedestrian re-identification method described in Embodiment 1.

The invention provides an occlusion-driven pedestrian re-identification method for panoramic cameras, whose components include an image enhancement model, an occlusion classification model, and a re-identification model. The scheme is suited to the test stage on images shot on an open campus. Compared with ordinary images, images taken on an open campus may contain more interference, such as occlusion and lighting. Therefore, the brightness, chroma, and sharpness of the image are enhanced first; the images are then divided into three classes according to occlusion degree, forming three subspaces, and each image is input into the corresponding sub-model for training. With this technical scheme, pedestrian features can still be accurately recognized under different occlusion scenes, such as partial occlusion, severe occlusion, and pedestrian self-occlusion, and with blurred, poor-quality pictures, which improves the accuracy of pedestrian recognition.

The embodiments in this description are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another. Because the system disclosed in the embodiment corresponds to the method disclosed in the embodiment, its description is relatively brief, and the relevant points can be found in the description of the method.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. An occlusion-driven pedestrian re-identification method, characterized by comprising the following steps:

acquiring a panoramic video and constructing a panoramic data set;

2. The occlusion-driven pedestrian re-identification method according to claim 1, wherein acquiring the panoramic video and constructing the panoramic data set specifically comprises the following steps:

shooting a panoramic video with a panoramic camera;

editing the panoramic video to obtain video in which pedestrians appear in every frame;

3. The occlusion-driven pedestrian re-identification method according to claim 1, wherein the panoramic data set is expanded by a data enhancement method, specifically: applying brightness enhancement, chroma enhancement, and sharpness enhancement to the picture data using OpenCV to obtain an enhanced panoramic data set.

4. The occlusion-driven pedestrian re-identification method according to claim 1, wherein the constructed occlusion classification model is a ResNet18 model, and obtaining the trained occlusion classification model specifically comprises the following steps:

processing a training set and the corresponding labels with a dataset function to obtain the length of the training set;

migrating the ResNet18 model to a GPU to obtain a predicted occlusion degree;

5. The occlusion driven pedestrian re-identification method according to claim 4, wherein the cross entropy loss function is:

H(p, q) = -∑ p(x) log q(x);

6. The occlusion-driven pedestrian re-identification method according to claim 1, wherein the re-identification model is constructed based on the Transformer model and comprises a no-occlusion sub-model, a small-occlusion sub-model, and a severe-occlusion sub-model.

7. The occlusion driven pedestrian re-identification method according to claim 6, wherein the training of the sub-model comprises two stages: feature extraction and supervised learning;

8. An occlusion-driven pedestrian re-identification system, comprising:

a second construction module, used for constructing a re-identification model;

9. A computer storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the pedestrian re-identification method according to any one of claims 1 to 7.