CN111783609A - Pedestrian re-identification method, device, equipment and computer readable storage medium

Pedestrian re-identification method, device, equipment and computer readable storage medium

Info

Publication number
CN111783609A
Authority
CN
China
Prior art keywords
model
features
pedestrian
image
target pedestrian
Prior art date
Legal status
Pending
Application number
CN202010594933.3A
Other languages
Chinese (zh)
Inventor
蒋旻悦
杨喜鹏
孙昊
谭啸
章宏武
文石磊
丁二锐
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010594933.3A
Publication of CN111783609A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/64 Three-dimensional objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a pedestrian re-identification method, apparatus, device and computer-readable storage medium, and relates to computer vision and deep learning technology in image processing. The specific implementation scheme is as follows: reconstructing a 3D model of a target pedestrian according to at least one 2D image of the target pedestrian; extracting 3D features of the target pedestrian according to the 3D model; extracting 2D features of the target pedestrian according to the at least one 2D image; and performing re-identification processing on the target pedestrian according to the fusion feature obtained by fusing the 2D features and 3D features of the target pedestrian, which improves the accuracy of pedestrian re-identification.

Description

Pedestrian re-identification method, device, equipment and computer readable storage medium
Technical Field
The embodiments of the present application relate to computer vision and deep learning technology in image processing, and in particular, to a method, an apparatus, a device, and a computer-readable storage medium for pedestrian re-identification.
Background
Pedestrian re-identification (person re-identification, or Re-ID) is a technique that uses computer vision to determine whether a particular pedestrian is present in an image or video sequence. Given a monitored pedestrian image, the same pedestrian is retrieved across camera devices. It overcomes the visual limitation of any single fixed camera, can be combined with pedestrian detection/tracking techniques, and is widely applicable to fields such as intelligent video surveillance and intelligent security.
In scenarios such as security monitoring and person search, pedestrian re-identification must be performed on unconstrained images of a pedestrian in order to lock onto a target. At present, most pedestrian re-identification methods are based on image convolutional neural networks and generally perform well on standard data sets or in strictly controlled application scenarios. However, because of differences between camera devices, and because a pedestrian's appearance during movement is easily affected by clothing, scale, occlusion, posture, viewing angle and the like, existing convolutional neural networks cannot extract features well, and the accuracy of pedestrian re-identification is low.
Disclosure of Invention
A method, apparatus, device and computer-readable storage medium for pedestrian re-identification are provided.
According to an aspect of the present application, there is provided a pedestrian re-identification method, including:
reconstructing a 3D model of the target pedestrian according to at least one 2D image of the target pedestrian;
extracting 3D features of the target pedestrian according to the 3D model;
extracting 2D features of the target pedestrian according to the at least one 2D image;
and according to the fusion feature obtained after the 2D feature and the 3D feature of the target pedestrian are fused, carrying out re-identification processing on the target pedestrian.
According to another aspect of the present application, there is provided a pedestrian re-recognition apparatus including:
the 3D model reconstruction module is used for reconstructing a 3D model of the target pedestrian by utilizing a trained 3D model reconstruction network according to at least one 2D image of the target pedestrian;
the 3D feature extraction module is used for extracting the 3D features of the target pedestrian according to the 3D model;
the 2D feature extraction module is used for extracting 2D features of the target pedestrian according to the at least one 2D image;
and the recognition processing module is used for carrying out re-recognition processing on the target pedestrian according to the fusion feature obtained by fusing the 2D feature and the 3D feature of the target pedestrian.
According to another aspect of the present application, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method described above.
According to another aspect of the present application, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method described above.
According to another aspect of the present application, there is provided a pedestrian re-identification method, including:
reconstructing a 3D model of a target pedestrian according to a 2D image of the target pedestrian;
extracting 3D features of the target pedestrian according to the 3D model of the target pedestrian;
fusing the 2D features and the 3D features of the target pedestrian to obtain fused features;
and according to the fusion features, carrying out re-identification processing on the target pedestrian.
According to another aspect of the present application, there is provided a pedestrian re-identification method, including:
according to at least one 2D image of the target pedestrian, reconstructing a network by using the trained 3D model, and reconstructing a 3D model of the target pedestrian;
according to the 3D model of the target pedestrian, cutting the 3D model of the target pedestrian into at least two 3D model blocks in the horizontal direction through a trained 3D feature extraction model, extracting the 3D features of the pedestrian in each 3D model block, and splicing the 3D features of the pedestrians in the at least two 3D model blocks to obtain the 3D features of the target pedestrian;
extracting 2D features of the target pedestrian according to the at least one 2D image;
and according to the fusion feature obtained after the 2D feature and the 3D feature of the target pedestrian are fused, carrying out re-identification processing on the target pedestrian.
According to the technology of the application, the accuracy of pedestrian re-identification is improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present application, nor do they limit the scope of the present application. Other features of the present application will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a flow chart of a method for pedestrian re-identification provided in a first embodiment of the present application;
FIG. 2 is a flow chart of a method for pedestrian re-identification provided by a second embodiment of the present application;
FIG. 3 is a schematic diagram of a pedestrian re-identification apparatus according to a third embodiment of the present application;
FIG. 4 is a diagram illustrating an apparatus for pedestrian re-identification according to a fourth embodiment of the present application;
FIG. 5 is a block diagram of an electronic device for implementing the method of pedestrian re-identification according to an embodiment of the present application.
Detailed Description
The following describes exemplary embodiments of the present application with reference to the accompanying drawings, including various details of the embodiments to aid understanding, which should be considered exemplary only. Those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.
The application provides a pedestrian re-identification method, apparatus, device and computer-readable storage medium, which apply computer vision and deep learning technology in image processing to achieve the technical effect of improving the accuracy of pedestrian re-identification.
The pedestrian re-identification method is particularly applicable to scenarios such as security monitoring and person search. 2D images containing the target pedestrian, shot by multiple shooting devices from different positions and angles or at different times, are acquired; features of the target pedestrian are extracted from the 2D images; and the extracted features are matched against the features corresponding to the pictures stored in a search library, so as to determine the pictures of the target pedestrian in the search library and realize re-identification of the target pedestrian.
Fig. 1 is a flowchart of a pedestrian re-identification method according to a first embodiment of the present application. As shown in fig. 1, the method comprises the following specific steps:
and S101, reconstructing a 3D model of the target pedestrian according to at least one 2D image of the target pedestrian.
Wherein the 2D image includes the target pedestrian.
In practice, in many application scenarios only 2D images of the target pedestrian can be acquired. In the embodiments of the present application, one or more 2D images containing the target pedestrian may be obtained from a single shooting device at different moments, or 2D images containing the target pedestrian may be obtained from multiple shooting devices at different positions and angles, so as to obtain multiple 2D images of the target pedestrian.
Illustratively, a 3D model of the target pedestrian may be reconstructed from each 2D image of the target pedestrian, and the resulting at least one 3D model of the target pedestrian may be fused into one 3D model.
Illustratively, at least one 2D image of the target pedestrian can be fused, and a 3D model of the target pedestrian can be reconstructed according to the fused 2D image.
Step S102: extracting the 3D features of the target pedestrian according to the 3D model.
After the 3D model of the target pedestrian is acquired, feature extraction may be performed on the 3D model to obtain the 3D features of the target pedestrian.
The 3D features of the target pedestrian are extracted according to the 3D model of the target pedestrian, and any algorithm model for extracting the 3D features in the prior art can be adopted, which is not described herein again.
Step S103: extracting the 2D features of the target pedestrian according to the at least one 2D image.
In the embodiments of the present application, the 3D model of the target pedestrian is reconstructed using the trained 3D model reconstruction network according to at least one 2D image of the target pedestrian, and the 3D features of the target pedestrian are extracted according to the 3D model; meanwhile, the 2D features of the target pedestrian can be extracted according to the at least one 2D image.
The process of extracting the 2D feature of the target pedestrian in this step may be performed in parallel with the process of extracting the 3D feature of the target pedestrian in steps S101 to S102, or the 2D feature of the target pedestrian may be extracted first and then the 3D feature of the target pedestrian may be extracted, or the 3D feature of the target pedestrian may be extracted first and then the 2D feature of the target pedestrian may be extracted, which is not specifically limited herein.
Step S104: performing re-identification processing on the target pedestrian according to the fusion feature obtained by fusing the 2D features and 3D features of the target pedestrian.
After the 2D features and 3D features of the target pedestrian are obtained, they are fused to obtain the fusion features. Then, re-identification of the pedestrian is performed according to the fusion features of the target pedestrian.
The process of re-identifying the pedestrian according to the fusion features of the target pedestrian can be implemented using any prior-art pedestrian re-identification model, which is not repeated here.
According to the embodiments of the present application, the 3D model of the target pedestrian is reconstructed according to at least one 2D image of the target pedestrian, so that different postures of the target pedestrian in different 2D images can be fused into the 3D model, and the 3D features extracted from the 3D model include pedestrian posture features. After the separately extracted 2D features and 3D features are fused, re-identification is performed according to the fusion features, so that pedestrian posture changes can be handled effectively and the accuracy of pedestrian re-identification is improved.
Fig. 2 is a flowchart of a pedestrian re-identification method according to a second embodiment of the present application. On the basis of the first embodiment, in this embodiment, before the pedestrian re-identification is performed, the 3D model reconstruction network is trained in advance by using training data, so as to obtain a trained 3D model reconstruction network; and performing combined training on the 2D feature extraction model and the 3D feature extraction model to obtain a trained 2D feature extraction model and a trained 3D feature extraction model. As shown in fig. 2, the method comprises the following specific steps:
step S201, training the 3D model reconstruction network by using the training data to obtain the trained 3D model reconstruction network.
The training data comprises a plurality of training samples. Each training sample comprises a pedestrian 2D image and its labeling information, and the labeling information comprises the SMPL (Skinned Multi-Person Linear) model parameters of the pedestrian 3D model corresponding to the pedestrian 2D image. The labeling of the 2D images may be done manually or automatically, which is not limited in this embodiment.
For example, the 3D model reconstruction network may be trained with ResNet (Residual Network), VGG (Oxford Visual Geometry Group) Net, ShuffleNet, etc. as its base network, which is not specifically limited in this embodiment.
The L1 loss function may be used in training of the 3D model reconstruction network.
The trained 3D model reconstruction network is used for outputting a group of SMPL model parameters corresponding to the target pedestrian according to the input 2D image containing the target pedestrian, and the group of SMPL model parameters can determine a 3D model of the target pedestrian.
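For illustration only, a minimal training-step sketch under stated assumptions (a PyTorch-style setup; ResNet-18 as the base network, which is one of the candidates named above; 82 SMPL parameters, i.e. 72 pose plus 10 shape values, an assumed parameterization) might look as follows:

```python
import torch
import torch.nn as nn
import torchvision.models as models

class SMPLRegressor(nn.Module):
    """Hypothetical regressor: a ResNet-18 backbone whose head outputs one
    group of SMPL parameters (assumed here: 72 pose + 10 shape = 82 values)
    per input pedestrian 2D image."""
    def __init__(self, num_params=82):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Linear(backbone.fc.in_features, num_params)
        self.backbone = backbone

    def forward(self, images):            # images: (B, 3, H, W)
        return self.backbone(images)      # -> (B, num_params)

model = SMPLRegressor()
criterion = nn.L1Loss()                   # the L1 loss named in the text
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images, gt_smpl_params):
    # gt_smpl_params: (B, 82) SMPL annotations from the training samples
    optimizer.zero_grad()
    loss = criterion(model(images), gt_smpl_params)
    loss.backward()
    optimizer.step()
    return loss.item()
```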
By training the 3D model reconstruction network in advance, the trained network can be used directly to reconstruct the 3D model of the target pedestrian during pedestrian re-identification, which improves the efficiency and accuracy of the reconstruction.
Step S202: jointly training the 2D feature extraction model and the 3D feature extraction model to obtain a trained 2D feature extraction model and a trained 3D feature extraction model.
The 2D feature extraction model may adopt ResNet (Residual Neural Network), VGG (Oxford Visual Geometry Group) Net, ShuffleNet, or another neural network for extracting pedestrian 2D features from the 2D image, which is not specifically limited in this embodiment.
The 3D feature extraction model may be obtained by training using PointNet or a PointNet improved version as a base model, or may be implemented by using other models for extracting 3D features in a human body 3D model, which is not specifically limited in this embodiment.
In the embodiments of the present application, the 2D feature extraction model and the 3D feature extraction model are jointly trained. In each training round, the first classification probability in the recognition result of the 2D features extracted by the 2D feature extraction model and the second classification probability in the recognition result of the 3D features extracted by the 3D feature extraction model are obtained, and the KL divergence or JS divergence between the first and second classification probabilities is used as the loss function, so that the two models supervise each other and are updated iteratively. After multiple rounds of training, the first and second classification probabilities tend to agree. Therefore, after the 2D features extracted by the trained 2D feature extraction model and the 3D features extracted by the trained 3D feature extraction model are fused, pedestrian re-identification based on the fusion features is more accurate.
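A minimal sketch of such a mutual-supervision loss, assuming PyTorch and assuming both branches end in classification heads over the same identity classes (the function name is illustrative):

```python
import torch.nn.functional as F

def mutual_supervision_loss(logits_2d, logits_3d):
    # Symmetric KL divergence between the first classification probability
    # (2D branch) and the second classification probability (3D branch), so
    # each model supervises the other; a JS-divergence loss could be
    # substituted, as the text allows.
    log_p2d = F.log_softmax(logits_2d, dim=1)
    log_p3d = F.log_softmax(logits_3d, dim=1)
    # F.kl_div(input, target) computes KL(target || input), input in log-space
    return (F.kl_div(log_p2d, log_p3d.exp(), reduction="batchmean")
            + F.kl_div(log_p3d, log_p2d.exp(), reduction="batchmean"))
```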
Step S203: acquiring at least one 2D image of the target pedestrian.
Wherein the 2D image includes the target pedestrian.
In practice, in many application scenarios only 2D images of the target pedestrian can be acquired. In the embodiments of the present application, one or more 2D images containing the target pedestrian may be obtained from a single shooting device at different moments, or 2D images containing the target pedestrian may be obtained from multiple shooting devices at different positions and angles, so as to obtain multiple 2D images of the target pedestrian.
Step S204: reconstructing the 3D model of the target pedestrian using the trained 3D model reconstruction network according to the at least one 2D image of the target pedestrian.
In a possible embodiment, this step may be implemented in the following manner:
determining a corresponding group of SMPL model parameters for each 2D image using the trained 3D model reconstruction network; combining the SMPL model parameters corresponding to all the 2D images to determine the final SMPL model parameters; and determining the 3D model of the target pedestrian according to the final SMPL model parameters.
Illustratively, combining the SMPL model parameters corresponding to all the 2D images to determine the final SMPL model parameters may be implemented by computing the average of the groups of SMPL model parameters.
Alternatively, a weight may be set for each 2D image according to the actual situation of the target pedestrian in each 2D image, the weight of the 2D image is used as the weight of the corresponding set of SMPL model parameters, and the set of SMPL model parameters are weighted and averaged to determine the final SMPL model parameters.
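A minimal sketch of both combination options, assuming each group of SMPL parameters is given as a NumPy vector (the weighting scheme is an illustrative assumption):

```python
import numpy as np

def fuse_smpl_params(param_groups, weights=None):
    # param_groups: one SMPL parameter vector per 2D image. With no weights
    # this is the plain average described above; otherwise a weighted average
    # using assumed per-image weights reflecting the pedestrian's condition
    # in each image.
    params = np.stack(param_groups)                  # (N, D)
    if weights is None:
        return params.mean(axis=0)                   # final SMPL parameters
    w = np.asarray(weights, dtype=np.float64)
    return (params * w[:, None]).sum(axis=0) / w.sum()
```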
Because the 3D model of the target pedestrian is determined from the final SMPL model parameters obtained by combining the SMPL model parameters corresponding to all the 2D images, different postures of the target pedestrian in different 2D images can be fused into the 3D model of the target pedestrian, improving the accuracy of the 3D model.
Step S205: adding texture information to the 3D model surface according to the 2D image of the target pedestrian.
In the embodiment of the application, the 3D model of the target pedestrian is reconstructed according to the 2D image, and a point in the 3D model corresponds to a point in the 2D image, so that a mapping relationship exists.
In this step, for each point on the 3D model surface, the texture information of its mapped point in the 2D image is extracted and added to that point on the 3D model surface, thereby completing the addition of texture information to the 3D model surface.
In addition, according to the 2D image of the target pedestrian, the texture information of the surface of the 3D model is added, and any method in the prior art that maps the texture information of the human body in the 2D image to the corresponding 3D model can be adopted, and this embodiment is not specifically limited here.
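As a sketch of the point-to-point mapping described above, assuming the reconstruction also yields a camera projection (the `project` callable is a hypothetical placeholder, not an API from the patent):

```python
import numpy as np

def add_surface_texture(vertices, image, project):
    # vertices: (V, 3) points on the 3D model surface; image: (H, W, 3) the
    # 2D image; project: hypothetical callable mapping a 3D point to its
    # mapped (u, v) pixel via the camera estimated during reconstruction.
    H, W, _ = image.shape
    colors = np.zeros((len(vertices), 3), dtype=image.dtype)
    for i, v in enumerate(vertices):
        u, vv = project(v)
        u = int(np.clip(u, 0, W - 1))
        vv = int(np.clip(vv, 0, H - 1))
        colors[i] = image[vv, u]   # copy the mapped point's texture to the vertex
    return colors
```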
Adding texture information to the 3D model surface makes the 3D model contain more features of the target pedestrian and come closer to the real pedestrian, so the 3D features extracted from the 3D model are more accurate, which can improve the accuracy of pedestrian re-identification.
Step S206: extracting the 3D features of the target pedestrian according to the 3D model.
In the embodiment of the application, the 3D feature extraction model is trained in advance and used for extracting the 3D features of the target pedestrian according to the 3D model of the target pedestrian.
In a preferred embodiment, this step can be specifically implemented as follows:
through the trained 3D feature extraction model, the 3D model is cut into at least two 3D model blocks in the horizontal direction; extracting 3D features of the pedestrians in each 3D model block; and splicing the 3D features of the pedestrians in the at least two 3D model blocks to obtain the 3D feature of the target pedestrian.
Here, the human body center is placed at the origin of the three-dimensional coordinate system in which the 3D features of the human body are located; the X axis runs from the feet to the head of the body, and the X-axis direction is taken as the horizontal direction; the Y axis runs along the width of the body; and the Z axis is perpendicular to the plane of the X and Y axes.
Specifically, cutting the 3D model into at least two 3D model blocks in the horizontal direction means dividing the 3D model into at least two segments along the X-axis direction (i.e., the body-length direction of the 3D model), where each segment is one 3D model block. The trained 3D feature extraction model is used to extract the pedestrian 3D features of each 3D model block, and the 3D features of the pedestrian in the at least two 3D model blocks are then spliced according to their positions in the 3D model to obtain the 3D features of the target pedestrian.
The number of the 3D model blocks into which the 3D model is divided may be configured and adjusted according to an actual application scenario, and this embodiment is not specifically limited herein.
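A minimal sketch of the cutting and splicing, assuming the 3D model is represented as a point cloud and assuming a hypothetical PointNet-style per-block encoder:

```python
import numpy as np

def extract_3d_feature(points, block_encoder, num_blocks=4):
    # points: (N, 3) point cloud of the 3D model, X axis running feet-to-head
    # (the cutting axis described above); block_encoder: hypothetical
    # PointNet-style extractor mapping a point subset to a feature vector.
    x = points[:, 0]
    edges = np.linspace(x.min(), x.max(), num_blocks + 1)
    edges[-1] += 1e-6                        # keep the topmost points in the last block
    features = []
    for i in range(num_blocks):
        in_block = (x >= edges[i]) & (x < edges[i + 1])
        features.append(block_encoder(points[in_block]))   # per-block 3D feature
    return np.concatenate(features)          # splice blocks in body order
```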
Further, the background information in the 2D image of the target pedestrian may be pasted onto the z = 0 plane of the 3D model to add the background information to the corresponding 3D model.
Segmenting the 3D model into a plurality of 3D model blocks makes the 3D feature extraction model pay "attention" to the detail features in each block, which can improve the accuracy of pedestrian re-identification.
In another implementation manner of this embodiment, the global feature of the 3D model may also be directly extracted through the 3D feature extraction model, and this embodiment is not specifically limited herein.
It should be noted that the implementation manner adopted by the 3D feature extraction model to extract the 3D features of the 3D model is consistent with the implementation manner of extracting the 3D features in the human body 3D model in the training process of the 3D feature extraction model.
Step S207: extracting the 2D features of the target pedestrian according to the at least one 2D image.
In the embodiment of the application, the 2D feature extraction model is trained in advance and used for extracting the 2D features of the target pedestrian according to the 2D image of the target pedestrian.
Specifically, this step may be implemented as follows:
respectively extracting 2D features of the pedestrians in each 2D image through the trained 2D feature extraction model; and fusing the 2D features of the pedestrians in the at least one 2D image to obtain the 2D features of the target pedestrian.
Specifically, fusing the 2D features of the pedestrian in the at least one 2D image to obtain the 2D features of the target pedestrian can be implemented in at least any one of the following ways:
one possible implementation is: and splicing the 2D features extracted according to each 2D image to obtain the 2D features of the target pedestrian.
Another possible implementation is: and summing the 2D features extracted according to each 2D image to obtain the 2D features of the target pedestrian.
Another possible implementation is: and setting weights for the 2D images according to the actual situation of the target pedestrian in the 2D images, taking the weights of the 2D images as the weights of the corresponding 2D features, and carrying out weighted average on the 2D features corresponding to the 2D images to obtain the 2D features of the target pedestrian.
By fusing the 2D features of multiple 2D images, more comprehensive and richer 2D features of the target pedestrian can be obtained, which helps improve the accuracy of pedestrian re-identification.
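A minimal sketch of the three fusion options described above, assuming per-image 2D features of equal dimension (the weighting scheme is an illustrative assumption):

```python
import numpy as np

def fuse_2d_features(feats, mode="concat", weights=None):
    # feats: per-image 2D feature vectors of the same pedestrian, equal dims.
    f = np.stack(feats)
    if mode == "concat":                     # splice the per-image features
        return f.reshape(-1)
    if mode == "sum":                        # sum the per-image features
        return f.sum(axis=0)
    if mode == "weighted":                   # weighted average (assumed weights)
        w = np.asarray(weights, dtype=np.float64)
        return (f * w[:, None]).sum(axis=0) / w.sum()
    raise ValueError(f"unknown mode: {mode}")
```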
Further, through a trained 2D feature extraction model, the 2D features of the pedestrian in each 2D image are respectively extracted, which can be implemented as follows:
for each 2D image, cutting the 2D image into at least two 2D image blocks along the body length direction of a human body through a trained 2D feature extraction model; extracting 2D features of pedestrians in each 2D image block; and splicing the 2D features of the pedestrians in the at least two 2D image blocks to obtain the 2D features of the pedestrians in the 2D image.
Specifically, cutting the 2D image into at least two 2D image blocks along the body-length direction means dividing the 2D image into at least two segments along the body-length direction in the image, where each segment is one 2D image block. The trained 2D feature extraction model is used to extract the pedestrian 2D features of each 2D image block, and the 2D features of the pedestrian in the at least two 2D image blocks are then spliced according to their positions in the 2D image to obtain the 2D features of the pedestrian in that image.
The number of 2D image blocks into which the 2D image is cut may be configured and adjusted according to an actual application scenario, and this embodiment is not specifically limited here.
Segmenting the 2D image into a plurality of 2D image blocks makes the 2D feature extraction model pay "attention" to the detail features in each image block, which can improve the accuracy of pedestrian re-identification.
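A minimal sketch of the block-wise 2D extraction, assuming a PyTorch tensor image and a hypothetical per-block backbone `cnn` (the block count of 6 is illustrative):

```python
import torch

def extract_blocked_2d_feature(image, cnn, num_blocks=6):
    # image: (3, H, W) pedestrian crop; cnn: hypothetical backbone applied to
    # each image block along the body-length (height) direction; the block
    # count is configurable, as noted above.
    H = image.shape[1]
    bounds = torch.linspace(0, H, num_blocks + 1).long()
    block_feats = [cnn(image[:, bounds[i]:bounds[i + 1], :].unsqueeze(0))
                   for i in range(num_blocks)]
    return torch.cat([f.flatten() for f in block_feats])  # splice by position
```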
In another implementation manner of this embodiment, the global feature of the 2D image may also be directly extracted through the 2D feature extraction model, and this embodiment is not specifically limited herein.
It should be noted that the implementation manner of extracting the 2D features of the human body in the 2D image by the 2D feature extraction model is consistent with the implementation manner of extracting the 2D features of the human body in the 2D image in the training process of the 2D feature extraction model.
Step S208: fusing the 2D features and the 3D features of the target pedestrian to obtain the fusion features of the target pedestrian.
In the embodiments of the present application, after the 2D features and 3D features of the target pedestrian are obtained, they are fused; because the resulting fusion features contain both the 2D features and the 3D features of the target pedestrian, performing re-identification of the target pedestrian according to the fusion features can improve the accuracy of pedestrian re-identification.
One preferred embodiment of this step is:
the 2D features and the 3D features of the target pedestrians are spliced to obtain fusion features, and accuracy of pedestrian re-identification can be improved.
Another possible implementation of this step is:
and adding the 2D features and the 3D features of the target pedestrian to obtain fusion features.
In this embodiment, optionally, the dimensions of the feature vectors output by the 2D feature extraction model and the 3D feature extraction model are set to be the same, so that the fusion feature can be obtained by adding the 2D feature and the 3D feature of the target pedestrian.
Optionally, if the dimensions of the 2D feature and the 3D feature of the target pedestrian are different, the 2D feature and the 3D feature of the target pedestrian may be adjusted to be the same dimension by zero padding and then added to obtain the fusion feature.
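A minimal sketch of both fusion options, including the zero-padding case for addition:

```python
import numpy as np

def fuse_2d_3d(feat_2d, feat_3d, mode="concat"):
    if mode == "concat":                       # splice the two feature vectors
        return np.concatenate([feat_2d, feat_3d])
    # addition: zero-pad the shorter vector to a common dimension first
    d = max(len(feat_2d), len(feat_3d))
    pad = lambda f: np.pad(f, (0, d - len(f)))
    return pad(feat_2d) + pad(feat_3d)
```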
Step S209: performing re-identification processing on the target pedestrian according to the fusion features of the target pedestrian.
In the embodiment of the present application, the step may be specifically implemented by the following method:
and determining the pedestrian picture with the similarity between the features and the fusion features larger than or equal to a preset threshold value according to the similarity between the features of the pedestrian picture stored in the search library and the fusion features.
The similarity between the features of a pedestrian picture and the fusion features may be computed as a cosine similarity or a Euclidean distance; any method for computing the similarity between two features may be used and is not repeated here.
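A minimal cosine-similarity retrieval sketch over the search library, assuming L2-normalizable feature vectors (the 0.7 threshold is illustrative, not from the patent):

```python
import numpy as np

def retrieve(fusion_feat, gallery_feats, threshold=0.7):
    # gallery_feats: (G, D) features of the pedestrian pictures in the search
    # library; returns indices of pictures whose cosine similarity to the
    # fusion feature meets the preset threshold.
    q = fusion_feat / np.linalg.norm(fusion_feat)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    sims = g @ q
    return np.where(sims >= threshold)[0], sims
```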
In the embodiments of the present application, because the 3D model reconstruction network is trained in advance, the trained network can be used directly to reconstruct the 3D model of the target pedestrian during re-identification, which improves the efficiency and accuracy of the reconstruction. By jointly training the 2D feature extraction model and the 3D feature extraction model, pedestrian re-identification based on the fusion of the 2D features extracted by the trained 2D feature extraction model and the 3D features extracted by the trained 3D feature extraction model is more accurate. Further, a corresponding group of SMPL model parameters is determined for each 2D image using the trained 3D model reconstruction network; the SMPL model parameters corresponding to all the 2D images are combined to determine the final SMPL model parameters; and the 3D model of the target pedestrian is determined from the final SMPL model parameters, so that different postures of the target pedestrian in different 2D images can be fused into the 3D model, improving its accuracy. Further, cutting the 3D model into multiple 3D model blocks makes the 3D feature extraction model pay "attention" to the detail features in each block; fusing the 2D features of multiple 2D images yields more comprehensive and richer 2D features of the target pedestrian; and cutting each 2D image into multiple 2D image blocks makes the 2D feature extraction model pay "attention" to the detail features in each image block, improving the accuracy of pedestrian re-identification.
Fig. 3 is a schematic diagram of a pedestrian re-identification device according to a third embodiment of the present application. The pedestrian re-identification device provided by the embodiment of the application can execute the processing flow provided by the embodiment of the pedestrian re-identification method. As shown in fig. 3, the pedestrian re-recognition apparatus 30 includes: a 3D model reconstruction module 301, a 3D feature extraction module 302, a 2D feature extraction module 303 and a recognition processing module 304.
Specifically, the 3D model reconstruction module 301 is configured to reconstruct a 3D model of the target pedestrian by using the trained 3D model reconstruction network according to at least one 2D image of the target pedestrian.
And a 3D feature extraction module 302, configured to extract a 3D feature of the target pedestrian according to the 3D model.
The 2D feature extraction module 303 is configured to extract a 2D feature of the target pedestrian according to the at least one 2D image.
And the identification processing module 304 is configured to perform re-identification processing on the target pedestrian according to the fusion feature obtained by fusing the 2D feature and the 3D feature of the target pedestrian.
The apparatus provided in this embodiment of the present application may be specifically configured to execute the method embodiment provided in the first embodiment, and specific functions are not described herein again.
According to the embodiments of the present application, the 3D model of the target pedestrian is reconstructed using the trained 3D model reconstruction network according to at least one 2D image of the target pedestrian, so that different postures of the target pedestrian in different 2D images can be fused into the 3D model, and the 3D features extracted from the 3D model include pedestrian posture features. After the separately extracted 2D features and 3D features are fused, re-identification is performed according to the fusion features, so that pedestrian posture changes can be handled effectively and the accuracy of pedestrian re-identification is improved.
Fig. 4 is a schematic diagram of a pedestrian re-identification apparatus according to a fourth embodiment of the present application. On the basis of the third embodiment, in this embodiment, the 3D model reconstruction module is further configured to:
determining a corresponding group of SMPL model parameters for each 2D image using the trained 3D model reconstruction network; combining the SMPL model parameters corresponding to all the 2D images to determine the final SMPL model parameters; and determining the 3D model of the target pedestrian according to the final SMPL model parameters.
In one possible embodiment, as shown in fig. 4, the pedestrian re-identification apparatus 30 further includes: a first model training module 305.
The first model training module 305 is configured to:
training the 3D model reconstruction network by using the training data to obtain a trained 3D model reconstruction network; the training data comprises a plurality of training samples, each training sample comprises a pedestrian 2D image and marking information thereof, and the marking information comprises SMPL model parameters of a pedestrian 3D model corresponding to the pedestrian 2D image.
In one possible implementation, the 3D feature extraction module is further configured to:
through the trained 3D feature extraction model, the 3D model is cut into at least two 3D model blocks in the horizontal direction; extracting 3D features of the pedestrians in each 3D model block; and splicing the 3D features of the pedestrians in the at least two 3D model blocks to obtain the 3D feature of the target pedestrian.
In one possible implementation, the 2D feature extraction module is further configured to:
respectively extracting 2D features of the pedestrians in each 2D image through the trained 2D feature extraction model; and fusing the 2D features of the pedestrians in the at least one 2D image to obtain the 2D features of the target pedestrian.
In one possible implementation, the 2D feature extraction module is further configured to:
for each 2D image, cutting the 2D image into at least two 2D image blocks along the body length direction of a human body through a trained 2D feature extraction model; extracting 2D features of pedestrians in each 2D image block; and splicing the 2D features of the pedestrians in the at least two 2D image blocks to obtain the 2D features of the pedestrians in the 2D image.
In one possible embodiment, as shown in fig. 4, the pedestrian re-identification apparatus 30 further includes: a second model training module 306.
The second model training module 306 is configured to:
and performing combined training on the 2D feature extraction model and the 3D feature extraction model to obtain a trained 2D feature extraction model and a trained 3D feature extraction model.
In one possible embodiment, the 3D model reconstruction module is further configured to:
and adding texture information of the surface of the 3D model according to the 2D image of the target pedestrian.
In one possible implementation, the identification processing module is further configured to:
and determining the pedestrian picture with the similarity between the features and the fusion features larger than or equal to a preset threshold value according to the similarity between the features of the pedestrian picture stored in the search library and the fusion features.
In one possible implementation, the identification processing module is further configured to:
splicing the 2D features and the 3D features of the target pedestrian to obtain fusion features; or adding the 2D features and the 3D features of the target pedestrian to obtain the fusion features.
The apparatus provided in the embodiment of the present application may be specifically configured to execute the method embodiment provided in the second embodiment, and specific functions are not described herein again.
In the embodiments of the present application, because the 3D model reconstruction network is trained in advance, the trained network can be used directly to reconstruct the 3D model of the target pedestrian during re-identification, which improves the efficiency and accuracy of the reconstruction. By jointly training the 2D feature extraction model and the 3D feature extraction model, pedestrian re-identification based on the fusion of the 2D features extracted by the trained 2D feature extraction model and the 3D features extracted by the trained 3D feature extraction model is more accurate. Further, a corresponding group of SMPL model parameters is determined for each 2D image using the trained 3D model reconstruction network; the SMPL model parameters corresponding to all the 2D images are combined to determine the final SMPL model parameters; and the 3D model of the target pedestrian is determined from the final SMPL model parameters, so that different postures of the target pedestrian in different 2D images can be fused into the 3D model, improving its accuracy. Further, cutting the 3D model into multiple 3D model blocks makes the 3D feature extraction model pay "attention" to the detail features in each block; fusing the 2D features of multiple 2D images yields more comprehensive and richer 2D features of the target pedestrian; and cutting each 2D image into multiple 2D image blocks makes the 2D feature extraction model pay "attention" to the detail features in each image block, improving the accuracy of pedestrian re-identification.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 5 is a block diagram of an electronic device for the pedestrian re-identification method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the present application described and/or claimed herein.
As shown in fig. 5, the electronic apparatus includes: one or more processors Y01, a memory Y02, and interfaces for connecting the various components, including a high speed interface and a low speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 5, one processor Y01 is taken as an example.
Memory Y02 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method for pedestrian re-identification provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method of pedestrian re-identification provided herein.
The memory Y02 is a non-transitory computer readable storage medium that can be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the method of pedestrian re-identification in the embodiment of the present application (for example, the 3D model reconstruction module 301, the 3D feature extraction module 302, the 2D feature extraction module 303, and the identification processing module 304 shown in fig. 3). The processor Y01 executes various functional applications of the server and data processing, i.e., the method of pedestrian re-identification in the above-described method embodiment, by executing non-transitory software programs, instructions, and modules stored in the memory Y02.
The memory Y02 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the stored data area may store data created from use of the electronic device for pedestrian re-recognition, and the like. Additionally, the memory Y02 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory Y02 may optionally include memory located remotely from the processor Y01, which may be connected to an electronic device for pedestrian re-identification over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the method of pedestrian re-identification may further include: an input device Y03 and an output device Y04. The processor Y01, the memory Y02, the input device Y03 and the output device Y04 may be connected by a bus or other means, and the connection by the bus is exemplified in fig. 5.
The input device Y03 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for pedestrian re-recognition, such as a touch screen, keypad, mouse, track pad, touch pad, pointer stick, one or more mouse buttons, track ball, joystick, or other input device. The output device Y04 may include a display device, an auxiliary lighting device (e.g., LED), a tactile feedback device (e.g., vibration motor), and the like. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the defects of difficult management and weak service scalability in traditional physical hosts and VPS (Virtual Private Server) services.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (22)

1. A method of pedestrian re-identification, comprising:
reconstructing a 3D model of the target pedestrian according to at least one 2D image of the target pedestrian;
extracting 3D features of the target pedestrian according to the 3D model;
extracting 2D features of the target pedestrian according to the at least one 2D image;
and according to the fusion feature obtained after the 2D feature and the 3D feature of the target pedestrian are fused, carrying out re-identification processing on the target pedestrian.
2. The method of claim 1, wherein said reconstructing a 3D model of a target pedestrian from at least one 2D image of said target pedestrian comprises:
determining, for each 2D image, a corresponding group of Skinned Multi-Person Linear (SMPL) model parameters by using a trained 3D model reconstruction network;
combining the SMPL model parameters corresponding to all the 2D images, and determining final SMPL model parameters;
and determining a 3D model of the target pedestrian according to the final SMPL model parameters.
3. The method of claim 2, further comprising:
training the 3D model reconstruction network by using the training data to obtain a trained 3D model reconstruction network;
the training data comprises a plurality of training samples, each training sample comprises a pedestrian 2D image and marking information thereof, and the marking information comprises SMPL model parameters of a pedestrian 3D model corresponding to the pedestrian 2D image.
4. The method of claim 1, wherein said extracting 3D features of said target pedestrian from said 3D model comprises:
through a trained 3D feature extraction model, cutting the 3D model into at least two 3D model blocks in the horizontal direction;
extracting 3D features of the pedestrians in each 3D model block;
and splicing the 3D features of the pedestrians in the at least two 3D model blocks to obtain the 3D feature of the target pedestrian.
5. The method of claim 4, wherein said extracting 2D features of the target pedestrian from the at least one 2D image comprises:
respectively extracting 2D features of pedestrians in each 2D image through a trained 2D feature extraction model;
and fusing 2D features of the pedestrians in the at least one 2D image to obtain the 2D features of the target pedestrian.
6. The method according to claim 5, wherein the separately extracting 2D features of the pedestrian in each 2D image through the trained 2D feature extraction model comprises:
for each 2D image, cutting the 2D image into at least two 2D image blocks along the length direction of the human body through a trained 2D feature extraction model;
extracting 2D features of pedestrians in each 2D image block;
and splicing the 2D features of the pedestrians in the at least two 2D image blocks to obtain the 2D features of the pedestrians in the 2D image.
7. The method of claim 5, further comprising:
and performing combined training on the 2D feature extraction model and the 3D feature extraction model to obtain a trained 2D feature extraction model and a trained 3D feature extraction model.
8. The method according to any one of claims 1-7, wherein after reconstructing the 3D model of the target pedestrian using the trained 3D model reconstruction network according to the at least one 2D image of the target pedestrian, the method further comprises:
and adding texture information of the 3D model surface according to the 2D image of the target pedestrian.
9. The method according to any one of claims 1 to 7, wherein the performing of the re-identification processing of the target pedestrian according to the fused feature of the fused 2D feature and 3D feature of the target pedestrian comprises:
and determining the pedestrian picture with the similarity between the features and the fusion features larger than or equal to a preset threshold value according to the similarity between the features of the pedestrian picture stored in the search library and the fusion features.
10. The method of claim 9, further comprising:
concatenating the 2D features and the 3D features of the target pedestrian to obtain the fusion feature;
or adding the 2D features and the 3D features of the target pedestrian to obtain the fusion feature.
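Both fusion options in code; note that element-wise addition requires the 2D and 3D features to share a dimension (or be projected to one), which the claim leaves implicit:

```python
import numpy as np

def fuse(feat_2d, feat_3d, mode="concat"):
    """Fuse 2D and 3D features by concatenation or element-wise addition (claim 10)."""
    if mode == "concat":
        return np.concatenate([feat_2d, feat_3d])
    return feat_2d + feat_3d  # requires feat_2d.shape == feat_3d.shape

f2d, f3d = np.random.rand(64), np.random.rand(64)
print(fuse(f2d, f3d).shape, fuse(f2d, f3d, mode="add").shape)  # (128,) (64,)
```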
11. A pedestrian re-identification apparatus comprising:
the 3D model reconstruction module is used for reconstructing a 3D model of the target pedestrian according to at least one 2D image of the target pedestrian;
the 3D feature extraction module is used for extracting the 3D features of the target pedestrian according to the 3D model;
the 2D feature extraction module is used for extracting 2D features of the target pedestrian according to the at least one 2D image;
and the identification processing module is used for performing re-identification processing on the target pedestrian according to the fusion feature obtained by fusing the 2D features and the 3D features of the target pedestrian.
12. The apparatus of claim 11, wherein the 3D model reconstruction module is further used for:
determining, for each 2D image, a corresponding set of SMPL model parameters by using a trained 3D model reconstruction network;
synthesizing the SMPL model parameters corresponding to all the 2D images to determine final SMPL model parameters;
and determining a 3D model of the target pedestrian according to the final SMPL model parameters.
13. The apparatus of claim 12, further comprising: a first model training module used for:
training the 3D model reconstruction network with training data to obtain the trained 3D model reconstruction network;
wherein the training data comprises a plurality of training samples, each training sample comprises a pedestrian 2D image and annotation information thereof, and the annotation information comprises the SMPL model parameters of the pedestrian 3D model corresponding to the pedestrian 2D image.
14. The apparatus of claim 11, wherein the 3D feature extraction module is further used for:
slicing, by a trained 3D feature extraction model, the 3D model into at least two 3D model blocks along the horizontal direction;
extracting the 3D features of the pedestrian in each 3D model block;
and concatenating the 3D features of the pedestrian in the at least two 3D model blocks to obtain the 3D features of the target pedestrian.
15. The apparatus of claim 14, wherein the 2D feature extraction module is further used for:
extracting, by a trained 2D feature extraction model, the 2D features of the pedestrian in each 2D image respectively;
and fusing the 2D features of the pedestrian in the at least one 2D image to obtain the 2D features of the target pedestrian.
16. The apparatus of claim 15, wherein the 2D feature extraction module is further used for:
cutting, by the trained 2D feature extraction model, each 2D image into at least two 2D image blocks along the length (height) direction of the human body;
extracting the 2D features of the pedestrian in each 2D image block;
and concatenating the 2D features of the pedestrian in the at least two 2D image blocks to obtain the 2D features of the pedestrian in the 2D image.
17. The apparatus of claim 15, further comprising: a second model training module used for:
jointly training the 2D feature extraction model and the 3D feature extraction model to obtain the trained 2D feature extraction model and the trained 3D feature extraction model.
18. The apparatus of any one of claims 11-17, wherein the 3D model reconstruction module is further used for:
adding texture information to the surface of the 3D model according to the at least one 2D image of the target pedestrian.
19. The apparatus of any one of claims 11-17, wherein the identification processing module is further used for:
determining, according to the similarity between the features of pedestrian pictures stored in a search library and the fusion feature, the pedestrian pictures whose feature similarity to the fusion feature is greater than or equal to a preset threshold.
20. The apparatus of claim 19, wherein the identification processing module is further used for:
concatenating the 2D features and the 3D features of the target pedestrian to obtain the fusion feature;
or adding the 2D features and the 3D features of the target pedestrian to obtain the fusion feature.
21. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-10.
22. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-10.
CN202010594933.3A 2020-06-28 2020-06-28 Pedestrian re-identification method, device, equipment and computer readable storage medium Pending CN111783609A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010594933.3A CN111783609A (en) 2020-06-28 2020-06-28 Pedestrian re-identification method, device, equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010594933.3A CN111783609A (en) 2020-06-28 2020-06-28 Pedestrian re-identification method, device, equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN111783609A true CN111783609A (en) 2020-10-16

Family

ID=72761575

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010594933.3A Pending CN111783609A (en) 2020-06-28 2020-06-28 Pedestrian re-identification method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111783609A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109859296A (en) * 2019-02-01 2019-06-07 腾讯科技(深圳)有限公司 Training method, server and the storage medium of SMPL parametric prediction model
CN110020620A (en) * 2019-03-29 2019-07-16 中国科学院深圳先进技术研究院 Face identification method, device and equipment under a kind of big posture
CN110197154A (en) * 2019-05-30 2019-09-03 汇纳科技股份有限公司 Pedestrian recognition methods, system, medium and the terminal again of fusion site texture three-dimensional mapping
CN110428493A (en) * 2019-07-12 2019-11-08 清华大学 Single image human body three-dimensional method for reconstructing and system based on grid deformation

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112966670A (en) * 2021-04-08 2021-06-15 北京的卢深视科技有限公司 Face recognition method, electronic device and storage medium

Similar Documents

Publication Publication Date Title
CN111598164B (en) Method, device, electronic equipment and storage medium for identifying attribute of target object
CN112529073A (en) Model training method, attitude estimation method and apparatus, and electronic device
CN111259751A (en) Video-based human behavior recognition method, device, equipment and storage medium
CN111860167A (en) Face fusion model acquisition and face fusion method, device and storage medium
CN111832568A (en) License plate recognition method, and training method and device of license plate recognition model
CN111178323B (en) Group behavior recognition method, device, equipment and storage medium based on video
CN110909701B (en) Pedestrian feature extraction method, device, equipment and medium
CN111626956A (en) Image deblurring method and device
CN112241716B (en) Training sample generation method and device
CN112001248A (en) Active interaction method and device, electronic equipment and readable storage medium
CN111539347A (en) Method and apparatus for detecting target
CN112580666A (en) Image feature extraction method, training method, device, electronic equipment and medium
CN113221771A (en) Living body face recognition method, living body face recognition device, living body face recognition equipment, storage medium and program product
CN113011309A (en) Image recognition method, apparatus, device, medium, and program product
CN111783619A (en) Human body attribute identification method, device, equipment and storage medium
CN111783609A (en) Pedestrian re-identification method, device, equipment and computer readable storage medium
CN112488126A (en) Feature map processing method, device, equipment and storage medium
CN113177466A (en) Identity recognition method and device based on face image, electronic equipment and medium
CN111967481A (en) Visual positioning method and device, electronic equipment and storage medium
CN111768485A (en) Three-dimensional image key point marking method and device, electronic equipment and storage medium
CN115409951B (en) Image processing method, image processing device, electronic equipment and storage medium
CN111275827A (en) Edge-based augmented reality three-dimensional tracking registration method and device and electronic equipment
CN114674328B (en) Map generation method, map generation device, electronic device, storage medium, and vehicle
CN111832483B (en) Point-of-interest validity identification method, device, equipment and storage medium
CN114067394A (en) Face living body detection method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination