CN115631510B - Pedestrian re-identification method and device, computer equipment and storage medium - Google Patents

Pedestrian re-identification method and device, computer equipment and storage medium Download PDF

Info

Publication number
CN115631510B
CN115631510B (granted patent), application number CN202211302703.0A
Authority
CN
China
Prior art keywords
pedestrian
feature map
image
convolution layer
images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211302703.0A
Other languages
Chinese (zh)
Other versions
CN115631510A (en
Inventor
谢喜林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Athena Eyes Co Ltd
Original Assignee
Athena Eyes Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Athena Eyes Co Ltd filed Critical Athena Eyes Co Ltd
Priority to CN202211302703.0A priority Critical patent/CN115631510B/en
Publication of CN115631510A publication Critical patent/CN115631510A/en
Application granted granted Critical
Publication of CN115631510B publication Critical patent/CN115631510B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/56 Extraction of image or video features relating to colour
    • G06V 10/70 Recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Recognition using classification, e.g. of video objects
    • G06V 10/765 Recognition using rules for classification or partitioning the feature space
    • G06V 10/77 Processing image or video features in feature spaces; data integration or data reduction, e.g. principal component analysis [PCA], independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V 10/774 Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level
    • G06V 10/806 Fusion of extracted features
    • G06V 10/82 Recognition using neural networks
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Abstract

The invention discloses a pedestrian re-identification method, apparatus, computer device and storage medium, the method comprising: acquiring an image to be identified; performing feature extraction on the image to be identified based on a backbone network to obtain a first feature map, and performing feature extraction on the first feature map with the current convolution layer to obtain a second feature map; performing perception processing on the second feature map to obtain a perception feature map; when the next convolution layer is not the last convolution layer, taking the second feature map as the first feature map, taking the next convolution layer as the current convolution layer, and returning to the step of performing feature extraction on the first feature map with the current convolution layer to obtain the second feature map; otherwise, performing convolution calculation on the second feature map to obtain a third feature map; and re-identifying the image to be identified based on the perception feature maps and the third feature map to obtain a pedestrian re-identification result. The invention improves pedestrian re-identification accuracy under different illumination conditions.

Description

Pedestrian re-identification method and device, computer equipment and storage medium
Technical Field
The present invention relates to the field of computer vision, and in particular to a pedestrian re-identification method, apparatus, computer device and storage medium.
Background
With the rapid growth of hardware computing power and the development and popularization of artificial intelligence, especially deep learning, computer vision technology is widely applied in the field of smart-city security.
Pedestrian re-identification (Person Re-identification, abbreviated Re-ID) is a technology that uses computer vision to search a large-scale distributed monitoring system and determine whether a specific pedestrian is present. Cameras in a monitoring network differ in installation position, installation angle, installation environment and resolution, and their video images further differ in background, pedestrian pose over space and time, and environmental illumination. All of these differences must be overcome when re-identifying a pedestrian. In particular, illumination effects in a monitored scene, such as differences in brightness, differences in exposure, front lighting versus back lighting, and even interference from coloured ambient light sources, greatly reduce the accuracy of pedestrian re-identification. In most scenes, illumination is a major cause of low re-identification accuracy.
Conventional pedestrian re-identification therefore suffers from low recognition accuracy under the influence of illumination.
Disclosure of Invention
The embodiments of the invention provide a pedestrian re-identification method, apparatus, computer device and storage medium to improve the accuracy of pedestrian re-identification under different illumination conditions.
In order to solve the above technical problem, an embodiment of the present application provides a pedestrian re-identification method, comprising:
acquiring an image to be identified, and inputting the image to be identified into a pedestrian re-identification model, wherein the pedestrian re-identification model comprises a backbone network and an illumination perception network, the backbone network comprises N+2 convolution layers, the illumination perception network comprises N illumination perception modules, the (i+1)-th convolution layer is connected with the i-th illumination perception module, and i is a positive integer ranging from 1 to N;
performing feature extraction on the image to be identified with the first convolution layer of the backbone network to obtain a first feature map, and taking the second convolution layer of the backbone network as the current convolution layer;
performing feature extraction on the first feature map with the current convolution layer to obtain a second feature map;
performing perception processing on the second feature map with the illumination perception module connected to the current convolution layer to obtain a perception feature map;
when the convolution layer following the current convolution layer is not the last convolution layer, taking the second feature map as the first feature map, taking the following convolution layer as the current convolution layer, and returning to the step of performing feature extraction on the first feature map with the current convolution layer to obtain the second feature map;
when the convolution layer following the current convolution layer is the last convolution layer, performing convolution calculation on the second feature map to obtain a third feature map; and
performing re-identification processing on the image to be identified based on all of the perception feature maps and the third feature map to obtain a pedestrian re-identification result.
In order to solve the above technical problem, an embodiment of the present application further provides a pedestrian re-identification apparatus, comprising:
an image acquisition module, configured to acquire an image to be identified and input the image to be identified into a pedestrian re-identification model, wherein the pedestrian re-identification model comprises a backbone network and an illumination perception network, the backbone network comprises N+2 convolution layers, the illumination perception network comprises N illumination perception modules, the (i+1)-th convolution layer is connected with the i-th illumination perception module, and i is a positive integer ranging from 1 to N;
a first feature map acquisition module, configured to perform feature extraction on the image to be identified with the first convolution layer of the backbone network to obtain a first feature map, and to take the second convolution layer of the backbone network as the current convolution layer;
a second feature map acquisition module, configured to perform feature extraction on the first feature map with the current convolution layer to obtain a second feature map;
a perception feature map acquisition module, configured to perform perception processing on the second feature map with the illumination perception module connected to the current convolution layer to obtain a perception feature map;
a circulation module, configured to, when the convolution layer following the current convolution layer is not the last convolution layer, take the second feature map as the first feature map, take the following convolution layer as the current convolution layer, and return to the step of performing feature extraction on the first feature map with the current convolution layer to obtain the second feature map;
a third feature map acquisition module, configured to perform convolution calculation on the second feature map to obtain a third feature map when the convolution layer following the current convolution layer is the last convolution layer; and
a pedestrian re-identification module, configured to perform re-identification processing on the image to be identified based on all of the perception feature maps and the third feature map to obtain a pedestrian re-identification result.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above pedestrian re-identification method when executing the computer program.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the above pedestrian re-identification method.
The pedestrian re-identification method, apparatus, computer device and storage medium provided by the embodiments of the invention operate as follows: an image to be identified is acquired and input into a pedestrian re-identification model; feature extraction is performed on the image with the first convolution layer of the backbone network to obtain a first feature map, and the second convolution layer of the backbone network is taken as the current convolution layer; feature extraction is performed on the first feature map with the current convolution layer to obtain a second feature map; perception processing is performed on the second feature map with the illumination perception module connected to the current convolution layer to obtain a perception feature map; when the convolution layer following the current convolution layer is not the last convolution layer, the second feature map is taken as the first feature map, the following convolution layer is taken as the current convolution layer, and the step of performing feature extraction on the first feature map with the current convolution layer to obtain the second feature map is repeated; when the following convolution layer is the last convolution layer, convolution calculation is performed on the second feature map to obtain a third feature map; and re-identification processing is performed on the image based on all of the perception feature maps and the third feature map to obtain a pedestrian re-identification result. Through these steps, the influence of different illumination conditions on pedestrian re-identification is weakened or even removed, and the re-identification accuracy under different illumination conditions is improved.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present invention; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied.
FIG. 2 is a flow chart of one embodiment of a pedestrian re-identification method of the present application.
FIG. 3 is an exemplary diagram of a pedestrian re-identification model of the present application.
FIG. 4 is an exemplary diagram of an illumination perception module of the present application.
FIG. 5 is a schematic structural diagram of an embodiment of a pedestrian re-identification apparatus according to the present application.
FIG. 6 is a schematic structural diagram of one embodiment of a computer device according to the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the description is for the purpose of describing particular embodiments only and is not intended to limit the application. The terms "comprising" and "having" and any variations thereof in the description, the claims and the above description of the figures are intended to cover non-exclusive inclusions. The terms "first", "second" and the like in the description, the claims and the figures are used to distinguish different objects and not necessarily to describe a sequential or chronological order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
The following description of the embodiments of the present invention is made clearly and completely with reference to the accompanying drawings. It is evident that the described embodiments are some, but not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the invention.
Referring to fig. 1, as shown in fig. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, electronic book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop computers and desktop computers, and so on.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that, the pedestrian re-recognition method provided by the embodiment of the present application is executed by a server, and accordingly, the pedestrian re-recognition device is disposed in the server.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. Any number of terminal devices, networks and servers may be provided according to implementation requirements, and the terminal devices 101, 102 and 103 in the embodiments of the present application may specifically correspond to application systems in actual production.
Referring to FIG. 2, FIG. 2 shows a pedestrian re-identification method according to an embodiment of the present invention; its application to the server in FIG. 1 is described as follows.
S201, acquiring an image to be identified, and inputting the image to be identified into a pedestrian re-identification model, wherein the pedestrian re-identification model comprises a backbone network and an illumination perception network, the backbone network comprises N+2 convolution layers, the illumination perception network comprises N illumination perception modules, the (i+1)-th convolution layer is connected with the i-th illumination perception module, and i is a positive integer ranging from 1 to N.
In step S201, the image to be identified refers to a pedestrian image.
Methods of acquiring the image to be identified include, but are not limited to, capturing a frame from surveillance video shot by a monitoring camera and shooting with a mobile phone; the acquisition method is adapted to the actual application scenario.
The pedestrian re-identification model is a model for identifying pedestrian data in the image to be identified. It should be understood that the model may target specific or non-specific pedestrians, depending on the actual application scenario.
Specifically, the pedestrian re-identification model comprises a backbone network and an illumination perception network. The backbone network, with its N+2 convolution layers, performs the re-identification of the pedestrian in the image; the illumination perception network further extracts features of the image and reduces interference from illumination. The illumination perception network comprises N illumination perception modules, the (i+1)-th convolution layer is connected with the i-th illumination perception module, and i is a positive integer ranging from 1 to N.
The number of convolution layers in the backbone network is adjusted according to the actual application scenario.
Acquiring the image to be identified and inputting it into the pedestrian re-identification model facilitates the subsequent feature extraction performed by the model, improves the robustness of the pedestrian re-identification algorithm to the illumination environment, and improves the re-identification accuracy under different illumination conditions.
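The loop over the backbone's N+2 convolution layers and the N attached illumination perception modules (steps S202 to S206 below) can be sketched as follows. This is an illustrative Python sketch only: the simple callables stand in for real convolution layers and illumination perception modules, and the names are hypothetical, not from the patent.

```python
import numpy as np

N = 3  # number of illumination perception modules (assumed for illustration)

# Placeholder "convolution layers": N + 2 simple callables on a feature array.
conv_layers = [lambda x, k=k: x * 0.9 + k for k in range(N + 2)]
# Placeholder illumination perception module.
perceive = lambda x: np.tanh(x)

def forward(image):
    # First conv layer produces the first feature map (S202).
    feat = conv_layers[0](image)
    perception_maps = []
    # Layers 2 .. N+1 each produce a second feature map (S203) and feed
    # their attached illumination perception module (S204, S205).
    for layer in conv_layers[1:-1]:
        feat = layer(feat)
        perception_maps.append(perceive(feat))
    # The last layer produces the third feature map (S206).
    third = conv_layers[-1](feat)
    return perception_maps, third

maps, third = forward(np.zeros((4, 4)))
```

Note that exactly one perception feature map is produced per module, so re-identification in step S207 receives N perception maps plus the third feature map.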
S202, performing feature extraction on the image to be identified based on the first convolution layer of the backbone network to obtain a first feature map, and taking the second convolution layer of the backbone network as the current convolution layer.
In step S202, the feature extraction refers to performing convolution calculation on the image to be identified.
The current convolution layer refers to a convolution layer which is used for processing the feature map currently in the backbone network.
The first feature map includes, but is not limited to, color features and pose features.
The color features describe the overall color distribution of the image to be identified; for example, the overall color features of the image can be extracted with a histogram algorithm.
The pose features are features of the pedestrian; for example, keypoints of the pedestrian are predicted by a pose estimation model, and the same keypoints are aligned by affine transformation to extract the pose features. It should be understood that a pedestrian is typically described by 14 keypoints, which divide the human body into several regions; region features are extracted and the same keypoints are aligned by affine transformation to complete pose feature extraction and obtain the pedestrian features of the image to be identified.
It should be noted that the features of the first feature map may be adjusted according to the actual situation and are not specifically limited here.
The first convolution layer of the backbone network performs convolution calculation on the image to be identified to obtain the first feature map, and the subsequent convolution layers of the backbone network perform deep feature extraction on it to improve pedestrian identification accuracy.
S203, performing feature extraction on the first feature map with the current convolution layer to obtain a second feature map.
In step S203, the feature extraction means performing convolution calculation on the first feature map.
The second feature map is a feature map obtained by performing convolution calculation on the first feature map.
Here the second feature map has the same kinds of features as the first feature map; for example, when the first feature map contains only pose features, the second feature map also contains only pose features.
Performing feature extraction on the first feature map with the current convolution layer to obtain the second feature map allows the backbone network to perform further feature calculation on the feature map, improving the robustness of the pedestrian re-identification algorithm to the illumination environment and the re-identification accuracy under different illumination conditions.
S204, performing perception processing on the second feature map with the illumination perception module connected to the current convolution layer to obtain a perception feature map.
In step S204, the illumination perception module is configured to extract illumination features from the second feature map.
Here, an illumination feature is a feature that appears under the influence of light. For example, when a red or green light source exists in the shooting environment, it casts red or green light on the pedestrian in the acquired image; the red cast on the pedestrian's body is then extracted as an illumination feature.
The perception feature map is the feature map obtained by removing the illumination influence from the second feature map.
It should be noted that each illumination perception module produces one perception feature map, and all illumination perception modules belong to the illumination perception network.
Implementations of the illumination perception network include, but are not limited to, a binary classification network and an attention network.
The binary classification network classifies the image to be identified with a classification algorithm and then performs perception processing on the second feature map in a manner chosen according to the classification result. For example, when the image is classified as daytime, visible light and thermal infrared contribute similarly, so both are given equal weight in the perception processing; when the image is classified as nighttime, thermal infrared dominates, so it is given a higher weight.
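The day/night weighting described above can be illustrated with a small sketch. This is a hypothetical example only: the weight values (0.5/0.5 for day, 0.2/0.8 for night) and the function name are assumptions for illustration and do not appear in the patent.

```python
import numpy as np

def fuse(visible, thermal, is_day):
    """Weight visible-light and thermal-infrared features according to a
    day/night classification result, then combine them."""
    # Day: both modalities contribute equally; night: favour thermal infrared.
    w_vis, w_thermal = (0.5, 0.5) if is_day else (0.2, 0.8)
    return w_vis * visible + w_thermal * thermal

vis = np.ones(4)    # placeholder visible-light feature vector
thr = np.zeros(4)   # placeholder thermal-infrared feature vector
```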
The attention network constructs several attention-based branches of the illumination perception network alongside the pedestrian re-identification trunk branch, and fuses illumination features into the multi-level feature maps of the trunk branch in a multi-level joint manner.
Preferably, the present application employs an attention-based network to construct the illumination perception network.
By fusing the multi-level feature maps of the trunk branch through the illumination perception modules in this multi-level joint manner, illumination features are fused and illumination interference is suppressed, which improves the robustness of the pedestrian re-identification algorithm to the illumination environment and the re-identification accuracy under different illumination conditions.
S205, when the convolution layer following the current convolution layer is not the last convolution layer, taking the second feature map as the first feature map, taking the following convolution layer as the current convolution layer, and returning to the step of performing feature extraction on the first feature map with the current convolution layer to obtain the second feature map.
In step S205, when the convolution layer following the current convolution layer is not the last convolution layer, that following layer is still connected to an illumination perception module.
By checking whether the convolution layer following the current convolution layer is the last one, different processing can be applied according to the result, which improves the processing speed for the image to be identified.
S206, when the next convolution layer corresponding to the current convolution layer is the last convolution layer, performing convolution calculation on the second feature map to obtain a third feature map.
In step S206, when the next convolution layer corresponding to the current convolution layer is the last convolution layer, then the next convolution layer corresponding to the current convolution layer has no connected illumination sensing module.
The third feature map is a feature map obtained by performing convolution calculation on the second feature map.
Here, the third feature map contains the same types of features as the second feature map. For example, when the second feature map includes only pose features, the third feature map also includes only pose features.
And the convolution calculation is carried out on the second feature map through the last convolution layer, so that the robustness of the pedestrian re-identification algorithm on the illumination environment is improved, and the pedestrian re-identification accuracy under different illumination influences is improved.
S207, re-recognition processing is carried out on the image to be recognized based on all the perception feature images and the third feature images, and a pedestrian re-recognition result is obtained.
In step S207, specifically:
And carrying out full connection processing on the third feature map, and carrying out connection processing on the feature map obtained by full connection and all the perception feature maps to obtain a connection feature map.
And performing full connection and re-identification processing on the connection feature map to obtain a pedestrian re-identification result.
The order of the concatenation can be adjusted according to the actual situation. Preferably, the fully connected feature map is placed first, followed by the perception feature maps arranged in order of their levels.
The pedestrian re-identification result indicates whether the image to be recognized contains the target pedestrian.
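As an illustrative sketch only (not the patent's implementation), the preferred concatenation order can be written as follows; `build_reid_feature` is a hypothetical helper name, and the toy vector shapes stand in for the real feature maps:

```python
import numpy as np

def build_reid_feature(fc_feature, perception_maps):
    """Concatenate the fully connected backbone feature with the
    perception feature maps, appended in ascending level order
    (hypothetical helper; shapes here are illustrative only)."""
    parts = [fc_feature] + list(perception_maps)
    return np.concatenate([p.ravel() for p in parts])

fc = np.zeros(4)                                # stand-in for the fully connected output
percs = [np.full(2, lvl) for lvl in (1, 2, 3)]  # three levels of perception features
feature = build_reid_feature(fc, percs)
print(feature.tolist())  # [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
```

The fully connected feature comes first, and the levels follow in ascending order, matching the preferred arrangement described above.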
Through the steps, the re-recognition of the image to be recognized is realized, the robustness of a pedestrian re-recognition algorithm to the illumination environment is improved, and the pedestrian re-recognition accuracy under different illumination influences is improved.
The following embodiment explains the pedestrian re-recognition model in the above steps S201 to S207, as shown in fig. 3, which is an exemplary diagram of the pedestrian re-recognition model. Conv, conv1_x, conv2_x, conv3_x, and conv4_x are the 5 convolution layers of the backbone network, and the illumination sensing network comprises 3 illumination sensing modules. Each illumination sensing module is connected with one convolution layer: the second convolution layer of the backbone network is connected with the first illumination sensing module of the illumination sensing network, the third convolution layer of the backbone network is connected with the second illumination sensing module of the illumination sensing network, and so on. fc denotes a fully connected layer, and concat denotes the concatenation operation. Softmax loss in the figure denotes the classification loss attached to the fully connected layer of the backbone network, and the fused pedestrian features are measured with a triplet loss.
Conv1_x, conv2_x, and conv3_x of the backbone network are each connected with an illumination sensing module, so that an illumination attention network is arranged on the multi-level feature maps of the backbone network. Finally, the outputs of the fully connected layer of the backbone network and of the three levels of illumination sensing modules are integrated, and the final pedestrian re-identification feature is generated through full connection. A classification loss is attached to the fully connected layer of the backbone network to guide the identity classification of pedestrians, and the final fused feature measures the pedestrian features with a triplet loss, which improves the robustness of the pedestrian re-recognition algorithm to the illumination environment and the pedestrian re-recognition accuracy under different illumination influences.
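The data flow of figure 3 can be sketched as follows. This is a minimal toy sketch, not the patented network: each convolution stage is replaced by a simple downsampling stub and each illumination sensing module by a channel-averaging stub, purely to show how the five stages, the three modules, and the final concatenation are wired together:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_stage(x):
    """Stand-in for one backbone convolution stage (downsamples 2x)."""
    return x[:, ::2, ::2]

def illumination_module(x):
    """Stand-in for an illumination sensing module: returns one
    per-channel perception feature (the real module is attention-based)."""
    return x.mean(axis=(1, 2))

def forward(image, n_modules=3):
    # conv: the first convolution layer (step S202).
    x = conv_stage(image)
    perception = []
    # conv1_x..conv3_x each feed one illumination sensing module
    # (the S203-S205 loop while the next layer is not the last one).
    for _ in range(n_modules):
        x = conv_stage(x)            # second feature map
        perception.append(illumination_module(x))
    # conv4_x: the last layer, no illumination module attached (step S206).
    x = conv_stage(x)                # third feature map
    fc = x.mean(axis=(1, 2))         # stand-in for the fully connected layer
    # Step S207: concatenate the fc output with all perception features.
    return np.concatenate([fc] + perception)

image = rng.random((8, 32, 32))      # channels-first toy input
feature = forward(image)
print(feature.shape)  # (32,)
```

With 8 channels, the fc output and the three perception features each contribute 8 dimensions, giving the 32-dimensional fused feature.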
In the embodiment, the image to be identified is acquired and input into the pedestrian re-identification model; based on a first convolution layer of a backbone network, extracting features of an image to be identified to obtain a first feature map, and taking a second convolution layer of the backbone network as a current convolution layer; carrying out feature extraction on the first feature map by adopting the current convolution layer to obtain a second feature map; adopting an illumination sensing module connected with the current convolution layer to perform sensing processing on the second feature map to obtain a sensing feature map; when the next layer of convolution layer corresponding to the current convolution layer is not the last layer of convolution layer, taking the second feature map as a first feature map, taking the next layer of convolution layer corresponding to the current convolution layer as the current convolution layer, and returning to perform feature extraction on the first feature map by adopting the current convolution layer, so that the step of obtaining the second feature map is continuously executed; when the next convolution layer corresponding to the current convolution layer is the last convolution layer, performing convolution calculation on the second feature map to obtain a third feature map; and carrying out re-recognition processing on the image to be recognized based on all the perception feature images and the third feature images to obtain a pedestrian re-recognition result. Through the steps, the influence of different illumination conditions on the re-identification of the pedestrians can be weakened or even removed, and the accuracy of the re-identification of the pedestrians under different illumination influences is improved.
In some optional implementations of the present embodiment, before step S201, the pedestrian re-recognition method further includes:
S101, acquiring a pedestrian image set.
S102, based on the residual network, pre-training the initialized backbone network by adopting the pedestrian images in the pedestrian image set to obtain a pre-training backbone network.
S103, constructing an initialized illumination sensing network based on an attention algorithm.
S104, constructing an initialized pedestrian re-recognition model based on the pre-training backbone network and the initialized illumination sensing network.
S105, based on a preset data enhancement mode, data enhancement is carried out on the pedestrian images in the pedestrian image set, and a training image set is obtained.
And S106, training the initialized pedestrian re-recognition model according to the training image set to obtain the pedestrian re-recognition model.
In step S101, the method for acquiring the pedestrian image set includes, but is not limited to, capturing an image from a monitoring video captured by a monitoring camera, and capturing by a mobile phone. Specifically, the above-mentioned manner of acquiring the pedestrian image set is adaptively adjusted according to the actual application scenario.
Preferably, the embodiment of the application acquires monitoring video data over a certain period using a plurality of monitoring cameras deployed in a defined geographical area, and then obtains the pedestrian image set by detection with a pedestrian detection model.
In step S102, specifically, a residual network is used to perform feature extraction on the pedestrian images in the pedestrian image set. The initialized backbone network is trained with the acquired pedestrian image set through repeated iterations until the loss and feature metrics reach their optimum, and the trained network serves as the pre-training backbone network for pedestrian re-identification in the subsequent steps.
The residual network may be ResNet-50, ResNet-101, or ResNet-152; the ResNet-50 model is preferred in the present application.
In step S104, the multi-layer convolution layer of the pre-training backbone network is sequentially connected with the illumination sensing module of the initialized illumination sensing network, that is, the illumination sensing network is set on the multi-layer convolution layer of the pre-training backbone network, so as to construct the initialized pedestrian re-recognition model.
In step S105, the above-mentioned data enhancement methods include, but are not limited to, data enhancement based on a generative adversarial network and random enhancement.
It should be appreciated that the above-described method of data enhancement is data enhancement of lighting conditions for pedestrian images in a pedestrian image set.
In step S106, specifically, according to the training image set, an initialized illumination sensing network for initializing a pedestrian re-recognition model is trained to obtain a pre-training model with illumination applicability.
And performing fusion training on the pre-training model by adopting a low learning rate and a training image set, performing supervised learning on a main network and an illumination perception network of the pre-training model, and finely adjusting the main network of the pre-training model to obtain the pedestrian re-recognition model.
Training the initialized illumination sensing network of the initialized pedestrian re-recognition model according to the training image set to obtain a pre-training model with illumination applicability specifically comprises importing the pre-training backbone network into the initialized illumination sensing network and freezing the backbone network parameters for a preset number of iterations while training the illumination sensing modules of the illumination sensing network. It should be noted that the preset number of iterations for which the backbone network parameters are frozen is preferably the first 20 iterations.
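The staged schedule above can be sketched in plain Python. This is a hedged illustration only: `Param` and `train` are hypothetical names, and the actual optimization step is elided, leaving just the freeze/unfreeze logic:

```python
class Param:
    """Minimal stand-in for a trainable network parameter."""
    def __init__(self):
        self.requires_grad = True

def train(backbone_params, light_params, iterations, freeze_iters=20):
    """Two-phase schedule sketched from the description: the backbone
    parameters stay frozen for the first `freeze_iters` iterations so
    only the illumination sensing network is updated, after which both
    networks are trained jointly (the fusion training at a low
    learning rate)."""
    history = []
    for it in range(iterations):
        frozen = it < freeze_iters
        for p in backbone_params:
            p.requires_grad = not frozen
        for p in light_params:
            p.requires_grad = True   # illumination network always trains
        history.append(frozen)
        # ... one supervised optimization step would run here ...
    return history

backbone = [Param() for _ in range(3)]
light = [Param() for _ in range(2)]
hist = train(backbone, light, iterations=25)
print(sum(hist))  # 20 frozen iterations
```

After the first 20 iterations the backbone is unfrozen, matching the fine-tuning phase described above.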
In this embodiment, the illumination perception and illumination estimation branches are constructed with an attention mechanism network, a weighting mechanism is realized in combination with the multi-level feature maps of the backbone network, the pedestrian re-recognition network is fused with pre-trained weights, and a multi-stage training mechanism fine-tunes the pre-training model. The expression of high-level illumination features in the pedestrian re-recognition features is thereby adaptively suppressed, the influence of different illumination conditions on pedestrian re-recognition can be weakened or even removed, and the pedestrian re-recognition accuracy under different illumination influences is improved.
In some optional implementations of the present embodiment, in step S105, the step of performing data enhancement on the pedestrian images in the pedestrian image set based on a preset data enhancement mode to obtain the training image set includes:
Extracting a preset number of pedestrian images from the pedestrian image set to serve as images to be enhanced, and adding all the images to be enhanced into the image set to be enhanced.
Based on a histogram algorithm, brightness processing is sequentially performed on the images selected from the image set to be enhanced, obtaining an enhancement result corresponding to each image to be enhanced.
And summarizing all the images to be enhanced and enhancement results corresponding to the images to be enhanced to obtain a training image set.
Specifically, the preset number can be adjusted according to actual situations, and the application is not particularly limited. For example, 20% of pedestrian images are randomly extracted from a pedestrian image set as images to be enhanced.
Based on the histogram algorithm, the overall brightness of the image to be enhanced is adjusted: the color channels of the image to be enhanced are separated, and the brightness of each separated channel is adjusted proportionally in steps, obtaining the enhancement result corresponding to the image to be enhanced.
Here, the brightness processing may be adjusted according to the actual situation, and the present application is not limited specifically.
In this embodiment, the brightness of the image to be enhanced is adjusted through a histogram algorithm, so that the number of samples is enhanced, data enhancement is realized, so that when the pedestrian re-recognition model is trained, the influence of different illumination conditions on pedestrian re-recognition can be weakened or even removed, and the pedestrian re-recognition accuracy under different illumination influences is improved.
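A minimal sketch of this enhancement step, under stated assumptions: the patent does not fix the per-channel gains or the exact histogram operation, so `enhance_brightness` and the gain values below are illustrative stand-ins for the channel-wise brightness adjustment, and `augment_set` shows the random 20% sampling:

```python
import numpy as np

rng = np.random.default_rng(0)

def enhance_brightness(image, channel_gains):
    """Hypothetical sketch of the histogram-based step: separate the
    colour channels and rescale each channel's intensities by its own
    gain, clipping back into the valid range."""
    out = image.astype(np.float64).copy()
    for c, gain in enumerate(channel_gains):
        out[..., c] = np.clip(out[..., c] * gain, 0, 255)
    return out.astype(np.uint8)

def augment_set(images, fraction=0.2):
    """Pick `fraction` of the set as images to be enhanced and pair
    each one with its brightness-adjusted result."""
    k = max(1, int(len(images) * fraction))
    idx = rng.choice(len(images), size=k, replace=False)
    gains = (1.3, 1.0, 0.8)  # illustrative per-channel gains, not from the patent
    return [(images[i], enhance_brightness(images[i], gains)) for i in idx]

images = [rng.integers(0, 256, (4, 4, 3), dtype=np.uint8) for _ in range(10)]
pairs = augment_set(images)
print(len(pairs))  # 2
```

The training set would then contain both members of each pair, doubling the sampled images as the summarizing step describes.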
In some optional implementations of the present embodiment, in step S105, based on a preset data enhancement mode, the step of performing data enhancement on the pedestrian image in the pedestrian image set to obtain the training image set includes:
based on a preset image segmentation mode, carrying out image segmentation on pedestrian images in a pedestrian image set to obtain segmentation results corresponding to each pedestrian image, wherein the segmentation results comprise a foreground region image and a background region image.
And respectively carrying out brightness adjustment on the foreground region image and the background region image corresponding to the pedestrian image by adopting Gaussian distribution aiming at each pedestrian image to obtain a foreground adjustment image corresponding to the foreground region image and a background adjustment image corresponding to the background region image.
And combining the foreground adjusting image and the background adjusting image aiming at each pedestrian image to obtain an exposure image corresponding to the pedestrian image.
And performing light source screening on all the exposure images, and adding the pedestrian images corresponding to the exposure images that pass the light source screening into the training image set.
The preset image segmentation mode includes, but is not limited to, equally dividing an image and unequally dividing an image. Dividing the image equally refers to dividing the image into picture blocks with consistent sizes, and dividing the image unequally refers to dividing the image into picture blocks with inconsistent sizes.
Preferably, the present application divides the image equally. Meanwhile, the PSPNet segmentation technique is adopted to perform image segmentation on the pedestrian images in the pedestrian image set. PSPNet is an improvement on FCN that introduces more context information; when the segmentation layers have more global information, the probability of incorrect segmentation is reduced.
And (3) aiming at each pedestrian image, adopting Gaussian distribution to respectively adjust the brightness distribution of the foreground region image and the background region image corresponding to the pedestrian image, and obtaining foreground adjustment images and background adjustment images with different brightness.
And combining the foreground adjustment images of different brightness with the background adjustment image to obtain pedestrian images under simulated exposure, front-lighting, and back-lighting conditions.
The light source screening refers to retaining images that meet the standard. For example, images whose light source brightness falls within 20%-80% are retained. It should be noted that the above light source screening is adjusted according to the actual situation, and the application is not specifically limited.
In this embodiment, the images are segmented and the brightness of the segmented foreground region image and background region image is adjusted to obtain pedestrian images under simulated exposure, front-lighting, and back-lighting conditions, thereby realizing data enhancement, so that when the pedestrian re-recognition model is trained, the influence of different illumination conditions on pedestrian re-recognition can be weakened or even removed, and the pedestrian re-recognition accuracy under different illumination influences is improved.
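The exposure simulation and screening can be sketched as follows. This is a hedged illustration: the foreground mask here is hard-coded (a real system would obtain it from a segmentation model such as PSPNet), the Gaussian parameters are illustrative, and the 20%-80% threshold is the example given above:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_exposure(image, mask, sigma=0.4):
    """Scale the foreground and background brightness by separate
    factors drawn from a Gaussian distribution, then recombine the two
    regions (sigma is an assumed value, not from the patent)."""
    fg_gain = abs(rng.normal(1.0, sigma))
    bg_gain = abs(rng.normal(1.0, sigma))
    img = image.astype(np.float64)
    out = np.where(mask[..., None], img * fg_gain, img * bg_gain)
    return np.clip(out, 0, 255).astype(np.uint8)

def passes_light_screening(image, low=0.2, high=0.8):
    """Keep only images whose mean brightness falls within [low, high]
    of the full range - the 20%-80% screening given as an example."""
    mean = image.mean() / 255.0
    return low <= mean <= high

image = rng.integers(0, 256, (8, 8, 3), dtype=np.uint8)
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True                 # toy foreground (pedestrian) region
exposed = simulate_exposure(image, mask)
print(exposed.shape, passes_light_screening(exposed))
```

Only images that pass the screening would be added to the training set, per the screening step above.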
In some optional implementations of the present embodiment, in step S105, the step of performing data enhancement on the pedestrian images in the pedestrian image set based on a preset data enhancement mode to obtain the training image set includes:
A pre-training network is determined based on a generative adversarial network (GAN) algorithm.
Training the pedestrian image set by adopting a pre-training network to obtain a colored light source migration model.
And based on the colored light source migration model, carrying out data enhancement on the pedestrian images in the pedestrian image set to obtain a training image set.
Specifically, a generator and a discriminator network are trained based on the generative adversarial network algorithm to obtain a colored light source migration model. Various colored light source templates can be generated from different images, and pedestrian images under colored light source environments are synthesized by migration, forming data enhancement for colored light source conditions.
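For intuition only, the effect of applying a colored light source template can be sketched as a simple per-channel blend. This deliberately omits the GAN itself: in the patent the template and transfer are learned by the generator/discriminator pair, whereas `apply_light_template` below is just a hypothetical linear stand-in showing the data flow:

```python
import numpy as np

def apply_light_template(image, template, alpha=0.5):
    """Illustrative stand-in for the generator's output: blend a
    per-channel coloured-light template over the image (alpha and the
    template values are assumptions, not from the patent)."""
    img = image.astype(np.float64)
    tinted = img * (1 - alpha) + img * template * alpha
    return np.clip(tinted, 0, 255).astype(np.uint8)

image = np.full((4, 4, 3), 200, dtype=np.uint8)
warm = np.array([1.2, 1.0, 0.7])      # hypothetical warm-light template (RGB gains)
out = apply_light_template(image, warm)
print(out[0, 0])  # [220 200 170]
```

Each synthesized image would then be added alongside its original, expanding the data set for colored light conditions.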
In the embodiment, the data set is expanded by the method, so that the pedestrian data are enriched, and when the pedestrian re-recognition model is trained, the influence of different illumination conditions on the pedestrian re-recognition can be weakened or even removed, and the pedestrian re-recognition accuracy under different illumination influences is improved.
In some optional implementations of this embodiment, in step S204, the step of performing sensing processing on the second feature map with the illumination sensing module connected to the current convolution layer to obtain the sensing feature map includes:
And carrying out convolution calculation on the second feature map based on the illumination sensing module connected with the current convolution layer to obtain a fourth feature map and a fifth feature map.
And carrying out pooling and full-connection processing on the fifth feature map to obtain a full-connection feature map.
And carrying out product calculation on the full-connection feature map and the fourth feature map to obtain a fusion feature map.
And carrying out fusion processing on the second feature map and the fusion feature map to obtain a perception feature map.
Specifically, as shown in fig. 4, fig. 4 is a schematic diagram of an illumination sensing module according to an embodiment of the present application.
The fourth feature map and the fifth feature map are obtained by the illumination sensing module connected to the current convolution layer performing convolution calculation on the second feature map. Here, the fifth and fourth feature maps are essentially identical.
The fifth feature map is pooled by an average global pooling layer, and the pooled fifth feature map is fully connected by the FC layer.
In this embodiment, a multi-level joint mode performs feature fusion on the multi-level feature maps of the backbone branches. By exchanging feature information between the feature maps through the two fully connected layers and the activation layer, the fused illumination features can incorporate the correlation between channels and suppress illumination interference.
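The module of figure 4 can be sketched in numpy as a channel-attention gate. This is a simplified sketch under stated assumptions: the two convolution branches are reduced to identity copies, the two fully connected layers are plain matrix products with ReLU and sigmoid activations, and the final fusion is taken as addition; the weight shapes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def illumination_module(second_map, w1, w2):
    """Sketch of figure 4: two convolution branches give the fourth and
    fifth feature maps (identity copies here); the fifth is average
    global pooled, passed through two FC layers with activations,
    multiplied channel-wise with the fourth map, and the product is
    fused (added) back onto the second feature map."""
    fourth = second_map.copy()            # conv branch kept as identity in the sketch
    fifth = second_map.copy()
    squeeze = fifth.mean(axis=(1, 2))     # average global pooling -> (C,)
    hidden = np.maximum(squeeze @ w1, 0)  # first FC layer + ReLU
    weights = sigmoid(hidden @ w2)        # second FC layer + sigmoid gate
    fused = fourth * weights[:, None, None]
    return second_map + fused             # fusion with the second feature map

C = 8
second = rng.random((C, 16, 16))
w1 = rng.normal(size=(C, C // 2))
w2 = rng.normal(size=(C // 2, C))
perception = illumination_module(second, w1, w2)
print(perception.shape)  # (8, 16, 16)
```

The gate reweights each channel before the result is fused back, which is how the inter-channel correlations enter the perception feature map.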
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and should not limit the implementation process of the embodiments of the present invention.
Fig. 5 shows a schematic block diagram of a pedestrian re-recognition apparatus in one-to-one correspondence with the pedestrian re-recognition method of the above embodiment. As shown in fig. 5, the pedestrian re-recognition device includes an image acquisition module 31 to be recognized, a first feature map acquisition module 32, a second feature map acquisition module 33, a perceived feature map acquisition module 34, a circulation module 35, a third feature map acquisition module 36, and a pedestrian re-recognition module 37. The functional modules are described in detail below.
The image to be identified obtaining module 31 is configured to obtain an image to be identified, and input the image to be identified into a pedestrian re-identification model, where the pedestrian re-identification model includes a backbone network and an illumination sensing network, the backbone network includes n+2 convolution layers, the illumination sensing network includes N illumination sensing modules, the i+1th convolution layer is connected with the i-th illumination sensing module, a value range of i is (1, N) and i is a positive integer.
The first feature map obtaining module 32 is configured to perform feature extraction on the image to be identified based on a first convolution layer of the backbone network, obtain a first feature map, and use a second convolution layer of the backbone network as a current convolution layer.
And the second feature map obtaining module 33 is configured to perform feature extraction on the first feature map by using the current convolution layer to obtain a second feature map.
The sensing feature map obtaining module 34 is configured to perform sensing processing on the second feature map with the illumination sensing module connected to the current convolution layer to obtain a sensing feature map.
And a circulation module 35, configured to take the second feature map as the first feature map when the next convolution layer corresponding to the current convolution layer is not the last convolution layer, take the next convolution layer corresponding to the current convolution layer as the current convolution layer, and return to use the current convolution layer to perform feature extraction on the first feature map, so that the step of obtaining the second feature map is continuously executed.
And the third feature map obtaining module 36 is configured to perform convolution calculation on the second feature map to obtain a third feature map when the next convolution layer corresponding to the current convolution layer is the last convolution layer.
The pedestrian re-recognition module 37 is configured to re-recognize the image to be recognized based on all the perceived feature maps and the third feature map, so as to obtain a pedestrian re-recognition result.
In some alternative implementations of the present embodiment, before the to-be-identified image acquisition module 31, the pedestrian re-recognition device further includes:
And the pedestrian image set acquisition module is used for acquiring the pedestrian image set.
The pre-training backbone network acquisition module is used for pre-training the initialized backbone network by adopting pedestrian images in the pedestrian image set based on the residual error network to obtain a pre-training backbone network.
The initialization illumination sensing network construction module is used for constructing an initialization illumination sensing network based on an attention algorithm.
The initialization pedestrian re-recognition model construction module is used for constructing an initialization pedestrian re-recognition model based on the pre-training backbone network and the initialization illumination perception network.
The training image set acquisition module is used for carrying out data enhancement on pedestrian images in the pedestrian image set based on a preset data enhancement mode to obtain a training image set.
And the pedestrian re-recognition model acquisition module is used for training the initialized pedestrian re-recognition model according to the training image set to obtain the pedestrian re-recognition model.
In some optional implementations of this embodiment, the training image set acquisition module includes:
The image to be enhanced acquisition unit is used for extracting a preset number of pedestrian images from the pedestrian image set to serve as images to be enhanced, and adding all the images to be enhanced into the image set to be enhanced.
And the histogram unit is used for sequentially carrying out brightness processing on the images to be enhanced selected from the image set to be enhanced based on a histogram algorithm to obtain enhancement results corresponding to the images to be enhanced.
And the summarizing unit is used for summarizing all the images to be enhanced and the enhancement results corresponding to the images to be enhanced to obtain a training image set.
In some optional implementations of this embodiment, the training image set acquisition module includes:
The segmentation unit is used for carrying out image segmentation on the pedestrian images in the pedestrian image set based on a preset image segmentation mode to obtain a segmentation result corresponding to each pedestrian image, wherein the segmentation result comprises a foreground area image and a background area image.
And the brightness adjustment unit is used for respectively carrying out brightness adjustment on the foreground region image and the background region image corresponding to the pedestrian image by adopting Gaussian distribution for each pedestrian image to obtain a foreground adjustment image corresponding to the foreground region image and a background adjustment image corresponding to the background region image.
The exposure image acquisition unit is used for combining the foreground adjustment image and the background adjustment image for each pedestrian image to obtain an exposure image corresponding to the pedestrian image.
The light source screening unit is used for carrying out light source screening on all the exposure images and adding pedestrian images corresponding to the exposure images screened by the light source into the training image set.
In some optional implementations of this embodiment, the training image set acquisition module includes:
A pre-training network acquisition unit, configured to determine a pre-training network based on a generative adversarial network algorithm.
A colored light source migration model construction unit, configured to train on the pedestrian image set with the pre-training network to obtain a colored light source migration model.
The training image set acquisition unit is used for carrying out data enhancement on the pedestrian images in the pedestrian image set based on the colored light source migration model to obtain a training image set.
Further, the perceptual feature map acquisition module 34 includes:
The convolution unit is used for carrying out convolution calculation on the second feature map based on the illumination perception module connected with the current convolution layer to obtain a fourth feature map and a fifth feature map.
And the full-connection unit is used for carrying out pooling and full-connection processing on the fifth characteristic diagram to obtain a full-connection characteristic diagram.
And the product unit is used for carrying out product calculation on the full-connection feature map and the fourth feature map to obtain a fusion feature map.
And the perception feature map acquisition unit is used for carrying out fusion processing on the second feature map and the fusion feature map to obtain the perception feature map.
For the specific limitations of the pedestrian re-recognition device, reference may be made to the limitations of the pedestrian re-recognition method above, which will not be repeated here. Each of the above modules in the pedestrian re-recognition device may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware in, or independent of, a processor in the computer device, or stored in software in a memory of the computer device, so that the processor can call and execute the operations corresponding to the above modules.
In order to solve the technical problems, the embodiment of the application also provides computer equipment. Referring specifically to fig. 6, fig. 6 is a basic structural block diagram of a computer device according to the present embodiment.
The computer device 4 comprises a memory 41, a processor 42, and a network interface 43 communicatively connected to each other via a system bus. It is noted that only a computer device 4 with the components memory 41, processor 42, and network interface 43 is shown in the figure, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead. It will be appreciated by those skilled in the art that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), digital signal processors (Digital Signal Processor, DSP), embedded devices, and the like.
The computer equipment can be a desktop computer, a notebook computer, a palm computer, a cloud server and other computing equipment. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.
The memory 41 includes at least one type of readable storage medium, including flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the computer device 4. Of course, the memory 41 may also comprise both an internal storage unit and an external storage device of the computer device 4. In this embodiment, the memory 41 is typically used for storing the operating system and various application software installed on the computer device 4, such as the program code of the pedestrian re-recognition method. Further, the memory 41 may be used to temporarily store various types of data that have been output or are to be output.
The processor 42 may in some embodiments be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to run the program code stored in the memory 41 or to process data, for example to run the program code of the pedestrian re-recognition method.
The network interface 43 may comprise a wireless network interface or a wired network interface, which network interface 43 is typically used for establishing a communication connection between the computer device 4 and other electronic devices.
The present application also provides another embodiment, namely a computer-readable storage medium storing a program executable by at least one processor, so that the at least one processor performs the steps of the pedestrian re-recognition method described above.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus the necessary general hardware platform, or by hardware alone, though in many cases the former is the preferred implementation. Based on such understanding, the technical solution of the present application, or the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, magnetic disk, or optical disk) comprising several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to perform the methods described in the embodiments of the present application.
It is apparent that the embodiments described above are only some, not all, of the embodiments of the present application; the preferred embodiments are shown in the drawings, but they do not limit the patent scope of the present application. This application may be embodied in many different forms; these embodiments are provided so that the present disclosure will be thorough and complete. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or substitute equivalents for some of their features. Any equivalent structure made using the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise falls within the protection scope of the present application.

Claims (10)

1. A pedestrian re-recognition method, characterized in that the pedestrian re-recognition method comprises:
acquiring an image to be identified, and inputting the image to be identified into a pedestrian re-identification model, wherein the pedestrian re-identification model comprises a backbone network and an illumination perception network, the backbone network comprises N+2 convolution layers, the illumination perception network comprises N illumination perception modules, the (i+1)-th convolution layer is connected to the i-th illumination perception module, and i is a positive integer ranging from 1 to N;
Based on a first convolution layer of the backbone network, extracting features of the image to be identified to obtain a first feature map, and taking a second convolution layer of the backbone network as a current convolution layer;
performing feature extraction on the first feature map by adopting the current convolution layer to obtain a second feature map;
extracting illumination features of the second feature map by using the illumination perception module connected to the current convolution layer, to obtain a perception feature map;
when the next convolution layer corresponding to the current convolution layer is not the last convolution layer, taking the second feature map as the first feature map, taking the next convolution layer corresponding to the current convolution layer as the current convolution layer, and returning to the step of performing feature extraction on the first feature map by using the current convolution layer to obtain a second feature map, to continue execution;
when the next convolution layer corresponding to the current convolution layer is the last convolution layer, performing convolution calculation on the second feature map to obtain a third feature map;
and performing re-recognition processing on the image to be recognized based on all the perception feature maps and the third feature map, to obtain a pedestrian re-recognition result.
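The claimed forward pass above can be sketched as follows. This is an illustrative NumPy mock-up, not the patented implementation: the 1x1 "convolutions", the channel-attention stand-in for the illumination perception module, and the pooled-embedding "re-identification" step are all assumptions made for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_conv(c_in, c_out):
    """Stand-in for a convolution layer: a random 1x1 projection + ReLU."""
    w = rng.standard_normal((c_in, c_out)) * 0.1
    return lambda x: np.maximum(x @ w, 0.0)

def illumination_module(feat):
    """Stand-in illumination perception module: simple channel attention."""
    weights = 1.0 / (1.0 + np.exp(-feat.mean(axis=(0, 1))))  # pool -> sigmoid
    return feat * weights                                     # reweighted map

N = 3                                    # N illumination modules, N+2 conv layers
convs = [make_conv(8, 8) for _ in range(N + 2)]

x = rng.standard_normal((16, 16, 8))     # "image to be identified" (H, W, C)
feat = convs[0](x)                       # first conv layer -> first feature map
perception_maps = []
for i in range(1, N + 1):                # conv layers 2 .. N+1, each with a module
    feat = convs[i](feat)                # second feature map
    perception_maps.append(illumination_module(feat))
third = convs[N + 1](feat)               # last conv layer -> third feature map

# "re-identification" here is just a pooled embedding of all maps (illustrative)
embedding = np.concatenate([m.mean(axis=(0, 1)) for m in perception_maps + [third]])
print(embedding.shape)                   # (N + 1) * 8 channels
```

Matching on the resulting embedding (e.g., by cosine distance against a gallery) would stand in for the unspecified re-recognition step.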
2. The pedestrian re-recognition method according to claim 1, characterized in that, before the step of acquiring an image to be recognized and inputting the image to be recognized into a pedestrian re-recognition model, the method comprises:
acquiring a pedestrian image set;
pre-training an initialized backbone network with the pedestrian images in the pedestrian image set based on a residual network, to obtain a pre-trained backbone network;
constructing an initialized illumination sensing network based on an attention algorithm;
constructing an initialized pedestrian re-recognition model based on the pre-trained backbone network and the initialized illumination perception network;
based on a preset data enhancement mode, enhancing the data of the pedestrian images in the pedestrian image set to obtain a training image set;
and training the initialized pedestrian re-recognition model according to the training image set to obtain a pedestrian re-recognition model.
3. The pedestrian re-recognition method of claim 2, wherein the step of data enhancing the pedestrian images in the pedestrian image set based on a preset data enhancement mode to obtain a training image set includes:
extracting a preset number of pedestrian images from the pedestrian image set to serve as images to be enhanced, and adding all the images to be enhanced into an image set to be enhanced;
sequentially performing brightness processing, based on a histogram algorithm, on the images to be enhanced selected from the image set to be enhanced, to obtain an enhancement result corresponding to each image to be enhanced;
and summarizing all the images to be enhanced and the enhancement results corresponding to the images to be enhanced, to obtain a training image set.
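Claim 3's "brightness processing based on a histogram algorithm" is not specified further; one common reading is histogram equalization, sketched here in NumPy under that assumption.

```python
import numpy as np

def equalize_histogram(img):
    """Histogram-equalize an 8-bit grayscale image — one plausible reading of
    the claim's 'brightness processing based on a histogram algorithm'."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = np.ma.masked_equal(cdf, 0).min()       # ignore empty leading bins
    lut = ((cdf - cdf_min) * 255 / (cdf[-1] - cdf_min)).clip(0, 255).astype(np.uint8)
    return lut[img]                                   # remap pixels through the LUT

# dark, low-contrast crop: values concentrated in [40, 80)
rng = np.random.default_rng(1)
dark = rng.integers(40, 80, size=(64, 32), dtype=np.uint8)
enhanced = equalize_histogram(dark)
print(dark.std(), enhanced.std())   # contrast (std) increases after equalization
```

In practice a library routine (e.g., OpenCV's `equalizeHist`) would be used per channel or on the luminance plane; the loop over the image set and the summarizing step are as in the claim.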
4. The pedestrian re-recognition method of claim 2, wherein the step of data enhancing the pedestrian images in the pedestrian image set based on a preset data enhancement mode to obtain a training image set includes:
based on a preset image segmentation mode, carrying out image segmentation on pedestrian images in the pedestrian image set to obtain segmentation results corresponding to each pedestrian image, wherein the segmentation results comprise a foreground region image and a background region image;
for each pedestrian image, respectively carrying out brightness adjustment on a foreground region image and a background region image corresponding to the pedestrian image by adopting Gaussian distribution to obtain a foreground adjustment image corresponding to the foreground region image and a background adjustment image corresponding to the background region image;
combining the foreground adjustment image and the background adjustment image for each pedestrian image to obtain an exposure image corresponding to the pedestrian image;
and performing light-source screening on all the exposure images, and adding the pedestrian images corresponding to the exposure images that pass the light-source screening into a training image set.
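The simulated-exposure augmentation of claim 4 can be sketched as follows, assuming that the foreground/background "Gaussian distribution" brightness adjustment means per-region multiplicative gains drawn from a normal distribution; the mask, image sizes, and sigma values are illustrative, not taken from the patent.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_exposure(img, fg_mask, fg_sigma=0.2, bg_sigma=0.4):
    """Brightness-jitter foreground and background independently with Gaussian
    gains and recombine them via the segmentation mask (assumed parameters)."""
    fg_gain = 1.0 + rng.normal(0.0, fg_sigma)         # per-image Gaussian gains
    bg_gain = 1.0 + rng.normal(0.0, bg_sigma)
    fg_adjusted = img * fg_gain                        # foreground adjustment image
    bg_adjusted = img * bg_gain                        # background adjustment image
    combined = np.where(fg_mask, fg_adjusted, bg_adjusted)  # combine by mask
    return np.clip(combined, 0.0, 1.0)                 # exposure image

img = rng.uniform(0.2, 0.8, size=(64, 32))             # normalized pedestrian crop
fg_mask = np.zeros((64, 32), dtype=bool)
fg_mask[8:56, 8:24] = True                             # toy person segmentation
exposure_img = simulate_exposure(img, fg_mask)
print(exposure_img.shape)
```

The subsequent light-source screening step (which exposure images are kept) is not detailed in the claim, so it is omitted here.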
5. The pedestrian re-recognition method of claim 2, wherein the step of data enhancing the pedestrian images in the pedestrian image set based on a preset data enhancement mode to obtain a training image set includes:
determining a pre-training network based on a generative adversarial network algorithm;
training the pre-training network with the pedestrian image set to obtain a colored light source transfer model;
and performing data enhancement on the pedestrian images in the pedestrian image set based on the colored light source transfer model, to obtain a training image set.
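Claim 5 leaves the generative adversarial pre-training network unspecified; the sketch below only illustrates the enhancement loop, with the trained generator reduced to a stub per-channel color gain. This is a loud simplification — a real colored light source transfer model would be a trained deep generator (e.g., CycleGAN-style), not a fixed gain.

```python
import numpy as np

rng = np.random.default_rng(3)

class ColorLightGenerator:
    """Stub for a trained GAN generator transferring a colored-light style.
    Illustrative only: reduced to a per-channel gain (assumed, not from patent)."""
    def __init__(self, channel_gains):
        self.gains = np.asarray(channel_gains)

    def __call__(self, img):
        return np.clip(img * self.gains, 0.0, 1.0)

generator = ColorLightGenerator([1.2, 1.0, 0.7])       # assumed "warm light" style
pedestrian_set = [rng.uniform(0, 1, size=(64, 32, 3)) for _ in range(4)]
# data enhancement: keep originals and add color-light-transferred copies
training_set = pedestrian_set + [generator(img) for img in pedestrian_set]
print(len(training_set))                               # originals + enhanced copies
```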
6. The pedestrian re-recognition method of claim 1, wherein the step of extracting illumination features of the second feature map by using the illumination perception module connected to the current convolution layer, to obtain a perception feature map, includes:
performing convolution calculation on the second feature map based on the illumination perception module connected to the current convolution layer, to obtain a fourth feature map and a fifth feature map;
sequentially performing pooling, full-connection, activation, full-connection and activation processing on the fifth feature map to obtain a full-connection feature map;
performing product calculation on the full-connection feature map and the fourth feature map to obtain a fusion feature map;
and performing fusion processing on the second feature map and the fusion feature map to obtain a perception feature map.
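The module of claim 6 reads like squeeze-and-excitation-style channel attention; a NumPy sketch under that interpretation follows, where the convolutions are reduced to 1x1 projections and the final "fusion processing" is assumed to be a residual addition (neither detail is stated in the claim).

```python
import numpy as np

rng = np.random.default_rng(4)
C = 8

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# assumed weights: two 1x1 convolutions and two fully connected layers
w4, w5 = rng.standard_normal((2, C, C)) * 0.1
fc1 = rng.standard_normal((C, C // 2)) * 0.5
fc2 = rng.standard_normal((C // 2, C)) * 0.5

def illumination_module(second):
    fourth = relu(second @ w4)               # fourth feature map (1x1 conv)
    fifth = relu(second @ w5)                # fifth feature map (1x1 conv)
    pooled = fifth.mean(axis=(0, 1))         # pooling
    full = sigmoid(relu(pooled @ fc1) @ fc2) # FC -> act -> FC -> act
    fused = fourth * full                    # product: fusion feature map
    return second + fused                    # fusion with second map (assumed residual add)

second = rng.standard_normal((16, 16, C))    # second feature map (H, W, C)
perception = illumination_module(second)
print(perception.shape)                      # same shape as the input map
```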
7. A pedestrian re-recognition device, characterized in that the pedestrian re-recognition device includes:
a to-be-identified image acquisition module, used for acquiring an image to be identified and inputting the image to be identified into a pedestrian re-identification model, wherein the pedestrian re-identification model comprises a backbone network and an illumination perception network, the backbone network comprises N+2 convolution layers, the illumination perception network comprises N illumination perception modules, the (i+1)-th convolution layer is connected to the i-th illumination perception module, and i is a positive integer ranging from 1 to N;
the first feature map acquisition module is used for extracting features of the image to be identified based on a first convolution layer of the backbone network to obtain a first feature map, and taking a second convolution layer of the backbone network as a current convolution layer;
the second feature map obtaining module is used for carrying out feature extraction on the first feature map by adopting the current convolution layer to obtain a second feature map;
the perception feature map acquisition module is used for extracting illumination features of the second feature map by using the illumination perception module connected to the current convolution layer, to obtain a perception feature map;
the circulation module is used for, when the next convolution layer corresponding to the current convolution layer is not the last convolution layer, taking the second feature map as the first feature map, taking the next convolution layer corresponding to the current convolution layer as the current convolution layer, and returning to the step of performing feature extraction on the first feature map by using the current convolution layer to obtain a second feature map, to continue execution;
the third feature map obtaining module is used for carrying out convolution calculation on the second feature map to obtain a third feature map when the next convolution layer corresponding to the current convolution layer is the last convolution layer;
and the pedestrian re-recognition module is used for performing re-recognition processing on the image to be recognized based on all the perception feature maps and the third feature map, to obtain a pedestrian re-recognition result.
8. The pedestrian re-recognition device of claim 7, wherein the perception feature map acquisition module comprises:
the convolution unit is used for carrying out convolution calculation on the second characteristic map based on the illumination sensing module connected with the current convolution layer to obtain a fourth characteristic map and a fifth characteristic map;
the full-connection unit is used for performing pooling and full-connection processing on the fifth feature map to obtain a full-connection feature map;
the product unit is used for carrying out product calculation on the full-connection feature map and the fourth feature map to obtain a fusion feature map;
and the perception feature map acquisition unit is used for carrying out fusion processing on the second feature map and the fusion feature map to obtain a perception feature map.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the pedestrian re-recognition method according to any one of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the pedestrian re-recognition method according to any one of claims 1 to 6.
CN202211302703.0A 2022-10-24 2022-10-24 Pedestrian re-identification method and device, computer equipment and storage medium Active CN115631510B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211302703.0A CN115631510B (en) 2022-10-24 2022-10-24 Pedestrian re-identification method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211302703.0A CN115631510B (en) 2022-10-24 2022-10-24 Pedestrian re-identification method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115631510A CN115631510A (en) 2023-01-20
CN115631510B true CN115631510B (en) 2023-07-04

Family

ID=84906425

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211302703.0A Active CN115631510B (en) 2022-10-24 2022-10-24 Pedestrian re-identification method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115631510B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112801008A (en) * 2021-02-05 2021-05-14 电子科技大学中山学院 Pedestrian re-identification method and device, electronic equipment and readable storage medium
CN114038052A (en) * 2021-09-24 2022-02-11 南京南瑞信息通信科技有限公司 Pedestrian re-identification method and system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108376407A (en) * 2018-02-05 2018-08-07 李刚毅 Hot-zone object aggregation detection method and system
CN113326735B (en) * 2021-04-29 2023-11-28 南京大学 YOLOv 5-based multi-mode small target detection method
CN114529890A (en) * 2022-02-24 2022-05-24 讯飞智元信息科技有限公司 State detection method and device, electronic equipment and storage medium
CN115131640A (en) * 2022-06-27 2022-09-30 武汉大学 Target detection method and system utilizing illumination guide and attention mechanism


Also Published As

Publication number Publication date
CN115631510A (en) 2023-01-20

Similar Documents

Publication Publication Date Title
Ko et al. Key points estimation and point instance segmentation approach for lane detection
CN109558832B (en) Human body posture detection method, device, equipment and storage medium
Xie et al. Multilevel cloud detection in remote sensing images based on deep learning
CN110175527B (en) Pedestrian re-identification method and device, computer equipment and readable medium
CN108960080B (en) Face recognition method based on active defense image anti-attack
Chen et al. Saliency detection via the improved hierarchical principal component analysis method
CN108197326B (en) Vehicle retrieval method and device, electronic equipment and storage medium
CN112052837A (en) Target detection method and device based on artificial intelligence
CN111325319B (en) Neural network model detection method, device, equipment and storage medium
CN111723645A (en) Multi-camera high-precision pedestrian re-identification method for in-phase built-in supervised scene
CN111340850A (en) Ground target tracking method of unmanned aerial vehicle based on twin network and central logic loss
CN111709311A (en) Pedestrian re-identification method based on multi-scale convolution feature fusion
CN109034086B (en) Vehicle weight identification method, device and system
CN112037142B (en) Image denoising method, device, computer and readable storage medium
CN114943937A (en) Pedestrian re-identification method and device, storage medium and electronic equipment
CN114821096A (en) Image processing method, neural network training method and related equipment
CN111652181B (en) Target tracking method and device and electronic equipment
CN115424335B (en) Living body recognition model training method, living body recognition method and related equipment
CN116597267B (en) Image recognition method, device, computer equipment and storage medium
CN113704276A (en) Map updating method and device, electronic equipment and computer readable storage medium
CN115631510B (en) Pedestrian re-identification method and device, computer equipment and storage medium
CN116994332A (en) Cross-mode pedestrian re-identification method and system based on contour map guidance
CN116935449A (en) Fingerprint image matching model training method, fingerprint matching method and related medium
Li et al. Detection of partially occluded pedestrians by an enhanced cascade detector
CN114842411A (en) Group behavior identification method based on complementary space-time information modeling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: No. 205, Building B1, Huigu Science and Technology Industrial Park, No. 336 Bachelor Road, Bachelor Street, Yuelu District, Changsha City, Hunan Province, 410000

Patentee after: Wisdom Eye Technology Co.,Ltd.

Country or region after: China

Address before: 410205, Changsha high tech Zone, Hunan Province, China

Patentee before: Wisdom Eye Technology Co.,Ltd.

Country or region before: China