CN115082748B - Classification network training and target re-identification method, device, terminal and storage medium


Info

Publication number
CN115082748B
CN115082748B
Authority
CN
China
Prior art keywords
feature map
network
image
classification network
region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211014862.0A
Other languages
Chinese (zh)
Other versions
CN115082748A (en)
Inventor
司永洁
潘华东
殷俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd
Priority to CN202211014862.0A
Publication of CN115082748A
Application granted
Publication of CN115082748B
Active legal status (current)
Anticipated expiration legal status



Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 - Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Abstract

The invention provides a classification network training method, a target re-identification method, a device, a terminal and a storage medium. The classification network training method comprises the following steps: inputting a first sample image containing a target object into an initial classification network; performing feature extraction on the first sample image through the initial classification network to obtain a first feature map; generating category labels corresponding to the regions in the first sample image based on the data information corresponding to the positions on the first feature map; predicting the prediction category corresponding to each region in the first sample image based on the first feature map; and iteratively training the initial classification network based on the error between the category label and the prediction category corresponding to the same region in the first sample image to obtain a classification network, wherein the classification network is used for identifying the category of each region in an image containing the target object. The classification network training method requires no manual labelling of the first sample image, which reduces the workload and improves the detection accuracy of the region categories.

Description

Classification network training and target re-identification method, device, terminal and storage medium
Technical Field
The invention relates to the technical field of image recognition, and in particular to a classification network training method, a target re-identification method, a classification network training device, a target re-identification device, a terminal and a computer-readable storage medium.
Background
In the security monitoring industry, pedestrian re-identification is very widely used, including but not limited to searching for a specific person in places with very dense pedestrian traffic; in such cases it becomes very important to quickly find where and when a person appeared from the massive amount of pedestrian data in the gallery. However, because of the installation angle of the device and the influence of environmental factors, the pedestrian pictures it captures may be incomplete, and it is relatively difficult to obtain a good retrieval result for such picture data.
Disclosure of Invention
The invention mainly aims to provide a classification network training method, a target re-identification method, a device, a terminal and a computer-readable storage medium, so as to solve the problem of the poor retrieval effect of pedestrian re-identification in the prior art.
In order to solve the above technical problems, the first technical solution adopted by the invention is as follows: a classification network training method is provided, comprising the following steps: inputting a first sample image containing a target object into an initial classification network; performing feature extraction on the first sample image through the initial classification network to obtain a first feature map; generating category labels corresponding to the regions in the first sample image based on the data information corresponding to the positions on the first feature map; predicting the prediction category corresponding to each region in the first sample image based on the first feature map; and iteratively training the initial classification network based on the errors between the category labels and the prediction categories corresponding to the same regions in the first sample image to obtain a classification network, wherein the classification network is used for identifying the category of each region in an image containing the target object.
In an optional embodiment, generating the category label corresponding to each region in the first sample image based on the data information corresponding to each position on the first feature map includes: performing feature extraction on the first feature map to obtain a second feature map; and generating the category label corresponding to each region in the first sample image based on the activation response value of each position on the second feature map in each channel.
In an optional embodiment, generating the category label corresponding to each region in the first sample image based on the activation response value of each position on the second feature map in each channel includes: generating a feature vector corresponding to each position based on the activation response values of the position in the channels of the second feature map; generating feature information of the second feature map from the feature vectors corresponding to the positions; and generating the category label corresponding to each region in the first sample image based on the feature information of the second feature map.
In an optional embodiment, generating the feature vector corresponding to each position based on the activation response values of the position in the channels of the second feature map includes: determining the channel identifier of the channel corresponding to each activation response value of the position based on the attribute category corresponding to that activation response value, where the attribute category corresponding to an activation response value is determined based on the magnitude of the activation response value; and generating the feature vector of the position according to the channel identifiers of the channels corresponding to the activation response values of the position.
In an alternative embodiment, the attribute categories include a target category and a non-target category; the target category is the category corresponding to the maximum activation response value of the position, and the non-target category is the category corresponding to a non-maximum activation response value of the position; the channel identifiers include a first identifier and a second identifier. Determining the channel identifier of the channel corresponding to each activation response value of the position based on the attribute category corresponding to that activation response value includes: in response to an activation response value of the position corresponding to the target category, assigning the channel corresponding to that activation response value the first identifier; and in response to an activation response value of the position corresponding to the non-target category, assigning the channel corresponding to that activation response value the second identifier. Generating the feature vector of the position according to the channel identifiers of the channels corresponding to the activation response values of the position includes: determining the feature vector of the position based on the first identifiers and second identifiers corresponding to the position in the channels.
In an optional embodiment, the generating a category label corresponding to each region in the first sample image based on the feature information of the second feature map includes: generating a feature map corresponding to each region in the first sample image based on the feature information of the second feature map; the number of the areas is the same as that of the channels; and determining the category label corresponding to the area based on the number of the first identifiers in the feature map corresponding to the area.
In an alternative embodiment, the category labels include unoccluded; determining the category label corresponding to the region based on the number of first identifiers in the feature map corresponding to the region includes: in response to the number of first identifiers exceeding a preset number, determining that the category label corresponding to the region is unoccluded.
In an alternative embodiment, the category labels include occluded; determining the category label corresponding to the region based on the number of first identifiers in the feature map corresponding to the region includes: in response to the number of first identifiers not exceeding the preset number, determining that the category label corresponding to the region is occluded.
In an optional embodiment, the initial classification network includes a first feature extraction network, and the first feature extraction network includes a pose estimation module and a feature extraction module which are sequentially cascaded; performing feature extraction on the first sample image through the initial classification network to obtain the first feature map of the first sample image includes: performing key point detection on the first sample image through the pose estimation module to obtain a key point heat map of the target object; and performing feature extraction on the key point heat map through the feature extraction module to determine the first feature map of the target object.
In an optional embodiment, the initial classification network further includes a class prediction network, and the class prediction network is connected to the first feature extraction network; predicting the prediction category corresponding to each region in the first sample image based on the first feature map, wherein the predicting comprises the following steps: and determining the prediction type corresponding to each area in the first sample image through a type prediction network based on the first feature map.
In an optional embodiment, the initial classification network further includes a label generation network, and the label generation network is connected to the first feature extraction network; generating the category labels corresponding to the regions in the first sample image based on the data information corresponding to the positions on the first feature map includes: generating, through the label generation network, the category labels corresponding to the regions in the first sample image based on the data information corresponding to the positions on the first feature map.
In an alternative embodiment, the classification network includes the first feature extraction network and the category prediction network; iteratively training the initial classification network based on the error between the category label and the prediction category corresponding to the same region in the first sample image to obtain the classification network further includes: iteratively training the initial classification network based on the error values between the category labels and the prediction categories corresponding to the same regions of the first sample image; and after training is finished, determining the networks in the iteratively trained initial classification network other than the label generation network as the classification network.
In order to solve the above technical problems, the second technical solution adopted by the present invention is: provided is a target re-identification method, which comprises the following steps: acquiring an image to be identified; the image to be recognized comprises a target object; detecting the image to be identified through a classification network, and determining detection categories respectively corresponding to all areas in the image to be identified; the classification network is obtained by training through the classification network training method; the detection categories include unoccluded and occluded; extracting the features of each region in the image to be recognized to obtain a local feature map corresponding to each region; and determining the re-identification result of the target object based on the detection type corresponding to each region and the local feature map corresponding to each region.
In an optional embodiment, determining the re-identification result of the target object based on the detection categories corresponding to the regions and the local feature maps corresponding to the regions includes: determining a region whose detection category is unoccluded as a target region based on the detection categories corresponding to the regions in the image to be recognized; and determining the re-identification result of the target object according to the local feature map corresponding to the target region.
In an optional embodiment, detecting the image to be recognized through the classification network and determining the detection categories respectively corresponding to the regions in the image to be recognized includes: performing feature extraction on the image to be recognized through the classification network to obtain a first feature map of the image to be recognized; and determining the detection categories respectively corresponding to the regions in the image to be recognized based on the first feature map. Performing feature extraction on each region in the image to be recognized to obtain the local feature map corresponding to each region includes: generating a third feature map of the image to be recognized based on the maximum activation response value corresponding to each position in the first feature map.
In an optional embodiment, performing feature extraction on each region in the image to be recognized to obtain the local feature map corresponding to each region includes: performing feature extraction on the image to be recognized to obtain a global feature map of the image to be recognized; performing feature fusion based on the third feature map and the global feature map corresponding to the image to be recognized to obtain a fourth feature map of the image to be recognized; and determining the local feature map corresponding to each region in the image to be recognized based on the fourth feature map of the image to be recognized.
In an optional embodiment, performing feature extraction on an image to be recognized to obtain a local feature map corresponding to each region in the image to be recognized includes: performing feature extraction on an image to be recognized by adopting a target recognition network to obtain a local feature map corresponding to each region in the image to be recognized; the training method of the target recognition network comprises the following steps: acquiring a second training sample set, wherein the second training sample set comprises a plurality of second sample images containing targets, and the second sample images have identity labels of the contained targets; identifying the target in the second sample image through an initial identification network to obtain the predicted identity of the target; the initial identification network comprises a second feature extraction network and an identity prediction network which are sequentially cascaded; iteratively training an initial recognition network based on an error value between an identity label and a predicted identity corresponding to the same target; and determining the networks except the identity prediction network in the initial recognition network after the iterative training as the trained target recognition network.
In an optional embodiment, the initial identification network further includes a first feature extraction network, and the first feature extraction network is connected to the second feature extraction network; the training method for the initial recognition network further comprises the following steps: performing feature extraction on the second sample image through a first feature extraction network to obtain a first feature map of the second sample image; identifying the target in the second sample image through the initial identification network to obtain the predicted identity of the target, wherein the method comprises the following steps: performing feature extraction on the second sample image through a second feature extraction network to obtain a global feature map of the second sample image; performing feature fusion on the first feature map and the global feature map corresponding to the second sample image to obtain a fusion feature map of the second sample image; and obtaining the predicted identity of the target in the second sample image based on the fusion feature map.
In an optional embodiment, determining the re-identification result of the target object based on the detection category corresponding to each region and the local feature map corresponding to each region includes: comparing the local feature map corresponding to a region whose detection category is unoccluded with a preset feature map of the corresponding region of a preset target object in a database; and in response to the similarity between the local feature map of the unoccluded region and the preset feature map of the corresponding region of the preset target object exceeding a similarity threshold, determining the identity information of the preset target object as the identity information of the target object.
In order to solve the above technical problems, the third technical solution adopted by the invention is as follows: a classification network training device is provided, including: a sample acquisition module, used for inputting a first sample image containing a target object into an initial classification network; a first feature extraction module, used for performing feature extraction on the first sample image through the initial classification network to obtain a first feature map; a category generating module, used for generating category labels corresponding to the regions in the first sample image based on the data information corresponding to the positions on the first feature map; a category prediction module, used for predicting the prediction categories corresponding to the regions in the first sample image based on the first feature map; and a training module, used for iteratively training the initial classification network based on the error between the category label and the prediction category corresponding to the same region in the first sample image to obtain a classification network, where the classification network is used for identifying the category of each region in an image containing the target object.
In order to solve the above technical problems, the fourth technical solution adopted by the invention is as follows: a target re-identification device is provided, including: an image acquisition module, used for acquiring an image to be recognized, where the image to be recognized contains a target object; a category detection module, used for detecting the image to be recognized through a classification network and determining the detection categories respectively corresponding to the regions in the image to be recognized, where the classification network is obtained by training with the classification network training method described above, and the detection categories include unoccluded and occluded; a second feature extraction module, used for performing feature extraction on each region in the image to be recognized to obtain a local feature map corresponding to each region; and an identity recognition module, used for determining the re-identification result of the target object based on the detection categories respectively corresponding to the regions and the local feature maps corresponding to the regions.
In order to solve the above technical problems, the fifth technical solution adopted by the invention is as follows: a terminal is provided, comprising a memory, a processor and a computer program stored in the memory and runnable on the processor, the processor being configured to execute the computer program to implement the steps of the classification network training method described above or of the target re-identification method described above.
In order to solve the technical problems, the sixth technical scheme adopted by the invention is as follows: there is provided a computer readable storage medium having stored thereon a computer program which, when being executed by a processor, carries out the steps of the method of classification network training as described above or the method of object re-identification as described above.
The beneficial effects of the invention are as follows: different from the prior art, a classification network training method, a target re-identification method, a device, a terminal and a computer-readable storage medium are provided, and the classification network training method comprises the following steps: inputting a first sample image containing a target object into an initial classification network; performing feature extraction on the first sample image through the initial classification network to obtain a first feature map; generating category labels corresponding to the regions in the first sample image based on the data information corresponding to the positions on the first feature map; predicting the prediction category corresponding to each region in the first sample image based on the first feature map; and iteratively training the initial classification network based on the errors between the category labels and the prediction categories corresponding to the same regions in the first sample image to obtain a classification network, wherein the classification network is used for identifying the category of each region in an image containing the target object. In this classification network training method, no manual labelling of the first sample image is needed: the category labels corresponding to the regions of the first sample image are generated from the data information corresponding to the positions on the first feature map of the first sample image and serve as pseudo labels for those regions, and the classification network is then trained from the error values between the prediction category predicted for each region and the pseudo label of the corresponding region. This reduces the workload, improves the detection accuracy of the region categories, and improves the retrieval effect of subsequent target re-identification.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic flow chart of a classification network training method provided by the present invention;
FIG. 2 is a schematic flow chart of a target re-identification method provided by the present invention;
FIG. 3 is a schematic block diagram of an embodiment of a target re-identification method provided by the present invention;
FIG. 4 is a flowchart illustrating an embodiment of step S21 of the object re-identification method shown in FIG. 2;
FIG. 5 is a flowchart illustrating an embodiment of step S212 of the object re-identification method provided in FIG. 4;
FIG. 6 is a schematic block diagram illustrating a method for re-identifying an object in accordance with one embodiment of the present invention;
FIG. 7 is a schematic block diagram of an embodiment of a classification network training apparatus provided in the present invention;
FIG. 8 is a schematic block diagram of an embodiment of a pedestrian re-identification apparatus provided in the present invention;
FIG. 9 is a schematic block diagram of one embodiment of a terminal provided by the present invention;
FIG. 10 is a schematic block diagram of one embodiment of a computer-readable storage medium provided by the present invention.
Detailed Description
The embodiments of the present application will be described in detail below with reference to the drawings.
In the following description, for purposes of explanation rather than limitation, specific details are set forth such as the particular system architecture, interfaces, techniques, etc., in order to provide a thorough understanding of the present application.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship. Further, "plurality" herein means two or more than two.
In order to make those skilled in the art better understand the technical solution of the present invention, the following describes a classification network training method and an object re-recognition method provided by the present invention in further detail with reference to the accompanying drawings and the detailed description.
Currently, there are two main approaches to the human-body occlusion problem. In the first, a local random-erasing strategy is applied to the training data so that the network becomes less sensitive to the loss of local information; when an occluded pedestrian picture is used to retrieve a complete picture, the influence of the occluded part on the feature distance is reduced as much as possible and the dominant features are used to compute the similarity distance. However, this method cannot clearly separate the feature distances of pictures with different IDs (IDentity numbers), and false detections often occur for targets whose clothing is similar to that of the query target. In the second, to provide more supervision for the occluded target, semantic segmentation information of the picture is added as auxiliary training: a pre-trained semantic segmentation model extracts semantic labels for the target, an encoder-decoder framework is trained on them, and in the test stage local features with high confidence and the global feature are selected for similarity measurement according to the segmentation result; the segmentation branch provides saliency information of the target foreground and thereby helps the network learn more discriminative local features. This approach is time-consuming, and the extra network structure inevitably introduces new errors, which makes it difficult to deploy in some practical application scenarios.
Referring to fig. 1, fig. 1 is a schematic flow chart of a classification network training method according to the present invention. The embodiment provides a classification network training method, which comprises the following steps.
S11: a first sample image containing a target object is input into an initial classification network.
In particular, the first training sample set includes a plurality of first sample images containing the target object. The target object in the first sample image may be partially occluded or completely unoccluded.
In a preferred embodiment, the first training sample set includes a plurality of first sample images containing a partially occluded target object and a plurality of first sample images containing an unoccluded target object. The target object may be a pedestrian.
In a particular embodiment, the initial classification network includes a first feature extraction network, a class prediction network, and a label generation network. The first feature extraction network is connected to the category prediction network and the label generation network, respectively. The first feature extraction network comprises an attitude estimation module and a feature extraction module which are sequentially cascaded.
And inputting each first sample image in the first training sample set into the initial classification network respectively.
S12: and performing feature extraction on the first sample image through an initial classification network to obtain a first feature map.
Specifically, key point detection is performed on the first sample image through the pose estimation module to obtain a key point heat map of the target object, from which the pose of the target object in the first sample image is determined. The pose estimation module may be the open-source OpenPose network structure.
And performing feature extraction on the key point heat map through a feature extraction module to determine a first feature map of the target object.
In a specific embodiment, the feature extraction module performs a series of convolution and dimension-reduction operations on the key point heat map, so that the spatial size of the first feature map is 1/16 of that of the first sample image. In addition, the number of channels of the first feature map is reduced to 6, so that the size of the first feature map of the first sample image becomes W × H × 6.
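By way of illustration, the feature-extraction step described above can be sketched as a small convolutional module: stride-2 convolutions downsample the key point heat map by a factor of 16 and a final 1 × 1 convolution reduces the channel count to 6. This is only a sketch; the layer widths, the number of input key point channels and the input resolution are illustrative assumptions, not values taken from this description.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Sketch of the feature extraction module: downsample by 16, project to 6 channels."""
    def __init__(self, in_channels: int = 17, num_regions: int = 6):
        super().__init__()
        widths = [32, 64, 128, 256]          # assumed layer widths
        layers, prev = [], in_channels
        for w in widths:
            layers += [nn.Conv2d(prev, w, kernel_size=3, stride=2, padding=1),
                       nn.BatchNorm2d(w),
                       nn.ReLU(inplace=True)]
            prev = w
        layers.append(nn.Conv2d(prev, num_regions, kernel_size=1))  # 6-channel first feature map
        self.net = nn.Sequential(*layers)

    def forward(self, heatmaps: torch.Tensor) -> torch.Tensor:
        return self.net(heatmaps)

# A 17-channel key point heat map at 256 x 128 yields a 6-channel map at 16 x 8 (1/16 size).
first_feature_map = FeatureExtractor()(torch.randn(1, 17, 256, 128))
print(first_feature_map.shape)  # torch.Size([1, 6, 16, 8])
```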
S13: and generating category labels corresponding to the regions in the first sample image based on the data information corresponding to the positions on the first feature map.
Specifically, feature extraction is performed on the first feature map to obtain a second feature map containing the target object; and the category labels of the corresponding regions in the first sample image are generated based on the activation response values of the positions on the second feature map in the channels.
In a specific embodiment, a feature map with richer semantic information is obtained by performing convolution processing on the first feature map, and then normalization processing and an activation function are applied to that feature map to obtain the second feature map corresponding to the first sample image. The second feature map may also be referred to as an attention map. The size of the second feature map may be the same as or different from the size of the first feature map, and is set according to the actual situation.
In one embodiment, the channel identifier of the channel corresponding to each activation response value of a position is determined based on the attribute category corresponding to that activation response value; the attribute category corresponding to an activation response value is determined based on the magnitude of the activation response value; and the feature vector of the position is generated according to the channel identifiers of the channels corresponding to the activation response values of the position. The type and number of attribute categories can be set according to the actual situation.
In a specific embodiment, the attribute categories include two categories, a target category and a non-target category. The target category is the category corresponding to the maximum activation response value of the position; the non-target category is the category corresponding to a non-maximum activation response value of the position; and the channel identifiers include a first identifier and a second identifier. In response to an activation response value of the position corresponding to the target category, the channel corresponding to that activation response value is assigned the first identifier; in response to an activation response value of the position corresponding to the non-target category, the channel corresponding to that activation response value is assigned the second identifier. The feature vector of the position is determined based on the first and second identifiers corresponding to the position in the channels.
In one embodiment, because the feature information of each position differs, the activation response values of a position in the 6 channels differ. For a position in the second feature map, the channel corresponding to its maximum activation response value is assigned the label 1 and its other channels are assigned the label 0, which determines the feature vector corresponding to that position in the second feature map. That is, the first identifier is 1 and the second identifier is 0. For example, for a position a in the W × H second feature map, if the activation response value of the sixth of the six channels corresponding to position a is the largest, the one-hot vector corresponding to position a is [0, 0, 0, 0, 0, 1], which is taken as the feature vector of position a.
All the positions contained in the second feature map are traversed, and a vector matrix X ∈ [32, W, H, 6] corresponding to the second feature map is generated based on the feature vectors corresponding to the positions.
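The assignment of first and second identifiers amounts to a per-position one-hot encoding over the channel dimension. A minimal sketch, assuming the second feature map is stored as a [batch, channels, W, H] tensor (batch size 32, as in the vector matrix above):

```python
import torch
import torch.nn.functional as F

def one_hot_vectors(attention: torch.Tensor) -> torch.Tensor:
    """For each position, set the channel holding the maximum activation response to 1 (first
    identifier) and all other channels to 0 (second identifier)."""
    max_channel = attention.argmax(dim=1)                            # [batch, W, H]
    one_hot = F.one_hot(max_channel, num_classes=attention.size(1))  # [batch, W, H, channels]
    return one_hot.permute(0, 3, 1, 2).float()                       # [batch, channels, W, H]

attention_map = torch.rand(32, 6, 16, 8)   # stand-in for 32 second feature maps
X = one_hot_vectors(attention_map)
print(X.shape)                             # torch.Size([32, 6, 16, 8])
```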
In one embodiment, feature maps corresponding to all the regions in the first sample image are generated based on feature information of the second feature map; the number of the areas is the same as that of the channels; and determining the category label corresponding to the area based on the number of the first identifiers in the feature map corresponding to the area.
In this embodiment, the number of channels is the same as the number of divided regions in the first sample image, and each channel is extracted to obtain the feature map of one region. In a specific embodiment, each region in the first sample image may be a part region of the target object, and feature extraction is performed on that part region based on the corresponding channel to obtain the feature map of the part region.
In one embodiment, the number of 1s in each channel of the vector matrix X is counted. The more 1s a channel contains, the lower the probability that the region corresponding to the channel is occluded; the fewer 1s a channel contains, the higher the probability that the region corresponding to the channel is occluded.
In a specific embodiment, the category labels include unoccluded: in response to the number of first identifiers exceeding a preset number, the category label of the corresponding region is unoccluded. The category labels also include occluded: in response to the number of first identifiers not exceeding the preset number, the category label of the corresponding region is occluded. The number of first identifiers in each channel is denoted t, and the preset number may be W × H / 7.
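A minimal sketch of the pseudo-label rule just described: count the first identifiers (1s) per channel and compare the count t with the preset number W × H / 7. The input tensor here is a random stand-in for the one-hot vector matrix generated above.

```python
import torch

def region_pseudo_labels(one_hot: torch.Tensor) -> torch.Tensor:
    """one_hot: [batch, channels, W, H]; returns [batch, channels], 1 = unoccluded, 0 = occluded."""
    batch, channels, w, h = one_hot.shape
    t = one_hot.sum(dim=(2, 3))        # number of first identifiers per channel
    threshold = w * h / 7              # preset number from the description above
    return (t > threshold).long()

stand_in = torch.randint(0, 2, (32, 6, 16, 8)).float()  # stand-in for the vector matrix X
labels = region_pseudo_labels(stand_in)
print(labels.shape)  # torch.Size([32, 6])
```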
In one embodiment, the tag generation network is connected to the first feature extraction network. And generating category labels corresponding to all the areas in the first sample image based on the data information corresponding to all the points on the first feature map through a label generation network.
S14: and predicting the prediction type corresponding to each area in the first sample image based on the first feature map.
Specifically, the prediction category corresponding to each region in the first sample image is determined through the category prediction network based on the first feature map. The category prediction network is a binary classification network, and the prediction categories likewise include occluded and unoccluded.
In a specific embodiment, an adaptive pooling operation is applied to the first feature map to obtain a pooled feature map of size 1 × 6, and the prediction categories of the 6 local regions of the first sample image are then predicted from the feature maps of the 6 local regions obtained after pooling.
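As an illustration of the category prediction branch, the sketch below pools the 6-channel first feature map to one value per region and maps it to a per-region occluded/unoccluded probability. The single linear layer and sigmoid output are assumptions made for the sketch, not a prescribed head structure.

```python
import torch
import torch.nn as nn

class RegionClassifier(nn.Module):
    """Sketch of the binary category prediction network over 6 local regions."""
    def __init__(self, num_regions: int = 6):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)             # first feature map -> [batch, 6, 1, 1]
        self.head = nn.Linear(num_regions, num_regions)

    def forward(self, first_feature_map: torch.Tensor) -> torch.Tensor:
        pooled = self.pool(first_feature_map).flatten(1)  # [batch, 6]
        return torch.sigmoid(self.head(pooled))           # per-region unoccluded probability

scores = RegionClassifier()(torch.randn(32, 6, 16, 8))
print(scores.shape)  # torch.Size([32, 6])
```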
S15: and iteratively training the initial classification network based on the error between the class label and the prediction class corresponding to the same region in the first sample image to obtain the classification network.
Specifically, the initial classification network is iteratively trained based on error values between class labels and prediction classes corresponding to the same region of the same first sample image.
In a specific embodiment, an error value between a class label and a prediction class corresponding to the same region of the same first sample image is calculated based on a BCE loss function, and then an initial classification network is iteratively trained based on the error value.
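A minimal sketch of one such training step with a BCE loss, using stand-in tensors for the extracted features and the generated pseudo labels; the tiny stand-in model and the SGD optimizer are illustrative choices only.

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss()
model = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                      nn.Linear(6, 6), nn.Sigmoid())          # stand-in prediction head
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

first_feature_map = torch.randn(32, 6, 16, 8)                 # stand-in extracted features
pseudo_labels = torch.randint(0, 2, (32, 6)).float()          # stand-in generated labels

predictions = model(first_feature_map)
loss = criterion(predictions, pseudo_labels)                  # error between labels and predictions
optimizer.zero_grad()
loss.backward()                                               # back-propagate the error
optimizer.step()                                              # update the network weights
```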
In an optional embodiment, the output of the initial classification network is back-propagated, and the weights of the initial classification network are updated according to the error value between the category label and the prediction category corresponding to the same region in the first sample image, so as to train the initial classification network.
The first sample image is input into the initial classification network, and the initial classification network detects the category of each region in the first sample image. When the error value between the category label and the prediction category corresponding to the same region in the first sample image is smaller than a preset threshold, which may be set as required, for example to 1% or 5%, the training of the initial classification network is stopped.
And after the initial classification network is converged, determining the networks except the label generation network in the initial classification network after the iterative training as the trained classification network. That is, the classification network includes a first feature extraction network and a category prediction network. The classification network is used for identifying the category of each area in the image containing the target object.
In the classification network training method provided by this embodiment, a first sample image containing a target object is input into an initial classification network; feature extraction is performed on the first sample image through the initial classification network to obtain a first feature map; category labels corresponding to the regions in the first sample image are generated based on the data information corresponding to the positions on the first feature map; the prediction category corresponding to each region in the first sample image is predicted based on the first feature map; and the initial classification network is iteratively trained based on the error between the category label and the prediction category corresponding to the same region in the first sample image to obtain a classification network, which is used for identifying the category of each region in an image containing the target object. With this method, no manual labelling of the first sample image is needed: the category labels corresponding to the regions of the first sample image are generated from the data information corresponding to the positions on the first feature map and serve as pseudo labels for those regions, and the classification network is then trained from the error values between the prediction category predicted for each region and the pseudo label of the corresponding region. This reduces the workload, improves the detection accuracy of the region categories, and improves the identification accuracy of the target object in the subsequent pedestrian re-identification process.
Referring to fig. 2 and fig. 3, fig. 2 is a schematic flow chart of a target re-identification method according to the present invention; fig. 3 is a schematic block diagram of an embodiment of a target re-identification method provided in the present invention.
The present embodiment provides a target re-identification method, which includes the following steps.
S21: and training the target recognition network.
Specifically, the target identification network comprises a first feature extraction network, a second feature extraction network and an identity prediction network, wherein the first feature extraction network is connected with the second feature extraction network, and the second feature extraction network is connected with the identity prediction network.
Specifically, the specific steps of training to obtain the target recognition network are as follows.
Referring to fig. 4, fig. 4 is a flowchart illustrating an embodiment of step S21 of the object re-identification method provided in fig. 2.
S211: a second set of training samples is obtained.
Specifically, the second training sample set includes a plurality of second sample images containing the target, the second sample images having the identity tag of the contained target. Wherein the target may be a pedestrian in particular.
In a specific embodiment, the second training sample set includes 32 second sample images. The 32 sample images correspond to 8 IDs, with 4 second sample images per ID. That is, the 4 images of one ID contain the same pedestrian, but they include both images in which part of the pedestrian is occluded and images in which the pedestrian is not occluded.
S212: and identifying the target in the second sample image through the initial identification network to obtain the predicted identity of the target.
Specifically, the initial recognition network includes a first feature extraction network, a second feature extraction network, and an identity prediction network. The first feature extraction network is connected with the second feature extraction network, and the second feature extraction network is connected with the identity prediction network.
The specific steps of identifying the pedestrian in the second sample image through the target identification network and obtaining the identity prediction type of the pedestrian are as follows.
Referring to fig. 5, fig. 5 is a flowchart illustrating an embodiment of step S212 in the object re-identification method provided in fig. 4.
S2121: and performing feature extraction on the second sample image through a first feature extraction network to obtain a first feature map of the second sample image.
Specifically, the pedestrian in the second sample image is subjected to key point detection through the first feature extraction network, and a key point heat map of the pedestrian is obtained. Pose information for the pedestrian may be determined based on the pedestrian's keypoint heat map. And performing feature extraction on the key point heat map to obtain a first feature map corresponding to the second sample image.
S2122: and performing feature extraction on the second sample image through a second feature extraction network to obtain a global feature map of the second sample image.
In particular, the second feature extraction network may be a backbone network. And performing feature extraction on the second sample image based on the backbone network to obtain a global feature map of the second sample image.
S2123: and performing feature fusion based on the first feature map and the global feature map of the same second sample image to obtain a fusion feature map of the second sample image.
Specifically, in order to obtain more semantic information in the second sample image, the first feature map and the global feature map of the same second sample image are subjected to spatial feature re-weighting, so that feature fusion is realized, and a fusion feature map of the second sample image is obtained.
S2124: and obtaining the predicted identity of the target in the corresponding second sample image based on the fused feature map.
Specifically, an adaptive pooling operation in the horizontal direction is performed on the fused feature map of the second sample image to obtain 6 horizontally segmented local features corresponding to the second sample image, and the predicted identity of the pedestrian contained in the second sample image is predicted from these 6 local features through a fully connected layer. The predicted identity may be the ID of the pedestrian.
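A minimal sketch of this identity prediction step: the fused feature map is pooled into 6 stripes and a fully connected layer maps each stripe to ID logits. The channel width and the number of IDs (8, matching the batch described above) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class IdentityHead(nn.Module):
    """Sketch: adaptive pooling into 6 segmented local features, then an FC layer predicting the ID."""
    def __init__(self, channels: int = 256, num_stripes: int = 6, num_ids: int = 8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d((num_stripes, 1))  # 6 segmented local features
        self.fc = nn.Linear(channels, num_ids)

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        stripes = self.pool(fused).squeeze(-1).transpose(1, 2)  # [batch, 6, channels]
        return self.fc(stripes)                                 # [batch, 6, num_ids]

logits = IdentityHead()(torch.randn(32, 256, 24, 8))  # stand-in fused feature map
print(logits.shape)  # torch.Size([32, 6, 8])
```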
S213: and iteratively training the initial recognition network based on the error value between the identity label corresponding to the same target and the predicted identity.
Specifically, the initial recognition network is iteratively trained based on error values between the identity labels and the predicted identities corresponding to the same pedestrian.
In one embodiment, an error value between the identity tag and the predicted identity corresponding to the same pedestrian may be calculated based on a cross entropy loss function, and then the initial recognition network may be iteratively trained based on the error value.
In an optional embodiment, the output of the initial recognition network is back-propagated, and the weights of the initial recognition network are updated according to the error value between the identity label corresponding to the pedestrian in the second sample image and the predicted identity, so as to train the initial recognition network.
The second sample image is input into the initial recognition network, and the initial recognition network detects the identity category of the pedestrian in the second sample image. When the error value between the identity label corresponding to the same pedestrian in the second sample image and the predicted identity is smaller than a preset threshold, which may be set as required, for example to 1% or 5%, the training of the initial recognition network is stopped.
And after the target recognition network is converged, determining the network except the identity prediction network in the initial recognition network after the iterative training as the trained target recognition network. The target recognition network includes a first feature extraction network and a second feature extraction network.
S22: and acquiring an image to be identified.
Specifically, images of a parking lot, a market or a road are acquired through image acquisition equipment, and target detection is performed on the images to obtain detection frames of all target objects.
And extracting a small picture containing the target object from the image as an image to be recognized based on the position information of each target object in the image. That is, one target object is included in the image to be recognized.
S23: and detecting the image to be recognized through a classification network, and determining the detection category corresponding to each region in the image to be recognized.
Specifically, the detection categories include two categories, unoccluded and occluded. Key point detection is first performed on the target object in the image to be recognized through the pose estimation module in the first feature extraction network to obtain a key point heat map of the target object. The pose information of the target object may be determined based on this key point heat map. Feature extraction is then performed on the key point heat map through the feature extraction module to obtain a first feature map corresponding to the image to be recognized.
And determining detection categories respectively corresponding to the areas in the image to be recognized based on the first feature map through a category prediction network.
S24: and performing feature extraction on the first feature map to obtain a second feature map corresponding to the image to be identified.
Specifically, convolution processing is performed on the first feature map to obtain a feature map with richer semantic information, and then normalization processing and an activation function are applied to that feature map to obtain a second feature map corresponding to the image to be recognized. The second feature map may also be referred to as an attention map. The size of the second feature map may be the same as or different from the size of the first feature map, and is set according to the actual situation.
S25: and generating a third feature map of the image to be recognized based on the maximum activation response value corresponding to each position in the second feature map.
Specifically, the maximum activation response value corresponding to each position in the second feature map is extracted, and the third feature map of the image to be recognized is generated from these maximum activation response values. The size of the third feature map is W × H × 1. The third feature map may also be referred to as the spatial attention weight of the second feature map.
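A one-line sketch of this step, assuming the second feature map is stored as a [batch, 6, W, H] tensor: the maximum over the channel dimension gives the W × H × 1 spatial attention weight.

```python
import torch

def spatial_attention(second_feature_map: torch.Tensor) -> torch.Tensor:
    """Third feature map: per-position maximum activation response across the 6 channels."""
    weights, _ = second_feature_map.max(dim=1, keepdim=True)  # [batch, 1, W, H]
    return weights

third_feature_map = spatial_attention(torch.rand(1, 6, 16, 8))
print(third_feature_map.shape)  # torch.Size([1, 1, 16, 8])
```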
S26: and performing feature extraction on the image to be recognized to obtain a local feature map corresponding to each region in the image to be recognized.
Specifically, feature extraction is performed on the image to be recognized to obtain a global feature map of the image to be recognized. In an embodiment, feature extraction is performed on the image to be recognized through the second feature extraction network to obtain the global feature map of the image to be recognized. The second feature extraction network may be a backbone network.
Feature fusion is performed based on the third feature map and the global feature map corresponding to the image to be recognized to obtain a fourth feature map of the image to be recognized. In an embodiment, spatial feature re-weighting is applied to the third feature map and the global feature map corresponding to the image to be recognized, so that feature fusion is realized and the resulting fourth feature map carries richer semantic information. The size of the fourth feature map is W × H × C.
The local feature map corresponding to each region in the image to be recognized is determined based on the fourth feature map of the image to be recognized. In one embodiment, a local pooling operation is performed on the fourth feature map to obtain the local feature maps of the 6 regions into which the image to be recognized is divided in the horizontal direction. The size of each local feature map is W × (H/6) × C.
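A minimal sketch of the fusion and local-feature steps above: the global feature map is re-weighted by the spatial attention map to give the fourth feature map, which is then pooled over 6 regions along one spatial axis. It assumes the attention map has already been resized to the global feature map's spatial size, and it reduces each region to a single C-dimensional descriptor, which is a simplification of the W × (H/6) × C local feature maps described here.

```python
import torch
import torch.nn as nn

def fuse_and_split(global_feat: torch.Tensor, attention: torch.Tensor, num_regions: int = 6):
    """global_feat: [batch, C, W, H]; attention: [batch, 1, W, H]; returns [batch, 6, C]."""
    fourth = global_feat * attention                     # spatial feature re-weighting
    pool = nn.AdaptiveAvgPool2d((num_regions, 1))        # one descriptor per region along the first spatial axis
    return pool(fourth).squeeze(-1).transpose(1, 2)

local_feats = fuse_and_split(torch.randn(1, 256, 24, 8), torch.rand(1, 1, 24, 8))
print(local_feats.shape)  # torch.Size([1, 6, 256])
```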
Obtaining detection types respectively corresponding to all areas in the image to be identified through the step S23; the local feature maps corresponding to the respective regions in the image to be recognized are obtained through step S26. That is, each region of the image to be recognized corresponds to one detection category and one local feature map, respectively. The detection category may also be referred to as a local label, and the local feature map may also be referred to as a local feature.
S27: and determining the identity information of the target object based on the local feature map corresponding to the region of which the detection type is not blocked.
Specifically, the local feature map corresponding to a region whose detection category is unoccluded is selected and compared with the preset feature map of the corresponding region of a preset target object in the database; and in response to the similarity between the local feature map of the unoccluded region and the preset feature map of the corresponding region of the preset target object exceeding a similarity threshold, the identity information of the preset target object is determined as the identity information of the target object.
Referring to fig. 6, fig. 6 is a schematic block diagram of a target re-identification method according to an embodiment of the present invention.
In a specific embodiment, the left image is the image to be recognized, and the right image is the image of the preset target object. Category detection is performed on the 6 regions in the image to be recognized to obtain the local labels corresponding to the 6 regions; the local labels of the first four regions are unoccluded, and the last two local labels are occluded. Feature extraction is performed on the image to be recognized to obtain the local features of the 6 regions of the image to be recognized. The distances between the local features of the first four regions and the local features of the first four regions of the preset target object are then matched, and the corresponding similarity is calculated. In response to the similarity exceeding the similarity threshold, the identity information of the preset target object is determined as the identity information of the target object in the image to be recognized.
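A minimal sketch of this matching step: similarity is computed only over regions labelled unoccluded in both images, and the preset target's identity is accepted when the averaged similarity exceeds a threshold. The cosine metric, the averaging over regions and the threshold value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def match(query_feats, gallery_feats, query_unoccluded, gallery_unoccluded, threshold=0.7):
    """*_feats: [6, C] local features; *_unoccluded: [6] bool masks, True = unoccluded."""
    valid = query_unoccluded & gallery_unoccluded        # use only regions unoccluded in both images
    if not valid.any():
        return False, 0.0
    sims = F.cosine_similarity(query_feats[valid], gallery_feats[valid], dim=1)
    score = sims.mean().item()
    return score > threshold, score

matched, score = match(torch.randn(6, 256), torch.randn(6, 256),
                       torch.tensor([1, 1, 1, 1, 0, 0], dtype=torch.bool),   # first four regions unoccluded
                       torch.tensor([1, 1, 1, 1, 1, 1], dtype=torch.bool))
print(matched, round(score, 3))
```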
In the target re-identification method provided by this embodiment, an image to be recognized containing a target object is acquired; the image to be recognized is detected through a classification network to determine the detection categories respectively corresponding to the regions in the image to be recognized; feature extraction is performed on each region in the image to be recognized to obtain a local feature map corresponding to each region; and the re-identification result of the target object is determined based on the detection category corresponding to each region and the local feature map corresponding to each region. Because the identity information of the target object is determined based on the local feature maps corresponding to the unoccluded regions, the identification accuracy of the target object is improved.
Referring to fig. 7, fig. 7 is a schematic block diagram of an embodiment of a classification network training apparatus provided in the present invention. The present embodiment provides a classification network training apparatus 50, and the classification network training apparatus 50 includes a sample obtaining module 51, a first feature extracting module 52, a category generating module 53, a category predicting module 54, and a training module 55.
The sample acquisition module 51 is used to input a first sample image containing a target object into the initial classification network.
The first feature extraction module 52 is configured to perform feature extraction on the first sample image through an initial classification network to obtain a first feature map.
The initial classification network comprises a first feature extraction network, a category prediction network and a label generation network, wherein the category prediction network is connected with the first feature extraction network, and the first feature extraction network is connected with the label generation network.
The first feature extraction network includes an attitude estimation module and a feature extraction module that are sequentially cascaded. The first feature extraction module 52 is further configured to perform key point detection on the first sample image through the attitude estimation module to obtain a key point heat map of the target object, and to perform feature extraction on the key point heat map through the feature extraction module to determine the first feature map of the target object.
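The cascaded structure of the first feature extraction network described above can be sketched as follows; the stand-in convolutional layers, the number of key points, and the output channel count are assumptions, since the patent does not fix a concrete backbone.

```python
import torch
import torch.nn as nn

class FirstFeatureExtractionNetwork(nn.Module):
    """Attitude (pose) estimation module followed by a feature extraction module, cascaded in sequence."""

    def __init__(self, num_keypoints: int = 17, out_channels: int = 6):
        super().__init__()
        # Stand-in attitude estimation module: maps the RGB image to one heat map per key point.
        self.pose_estimation = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, num_keypoints, kernel_size=1),
        )
        # Stand-in feature extraction module: maps the key point heat maps to the first feature map.
        self.feature_extraction = nn.Sequential(
            nn.Conv2d(num_keypoints, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, out_channels, kernel_size=1),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        heatmaps = self.pose_estimation(image)       # key point heat map of the target object
        return self.feature_extraction(heatmaps)     # first feature map

first_feature_map = FirstFeatureExtractionNetwork()(torch.randn(1, 3, 256, 128))
print(first_feature_map.shape)  # torch.Size([1, 6, 256, 128])
```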
The category generating module 53 is configured to generate category labels corresponding to the regions in the first sample image based on the data information corresponding to the respective points on the first feature map.
The category prediction module 54 is configured to predict a prediction category corresponding to each region in the first sample image based on the first feature map.
The category prediction module 54 is further configured to determine, through a category prediction network, a prediction category corresponding to each region in the first sample image based on the first feature map.
The category generating module 53 is further configured to perform feature extraction on the first feature map to obtain a second feature map, and to generate the category label corresponding to each region in the first sample image based on the activation response value of each site on the second feature map in each channel.
The category generating module 53 is further configured to generate, through the label generation network, the category labels corresponding to the regions in the first sample image based on the data information corresponding to the sites on the first feature map.
The category generating module 53 is further configured to generate a feature vector corresponding to each site based on the activation response value of the site in each channel of the second feature map; generate feature information of the second feature map according to the feature vectors corresponding to all the sites; and generate the category label corresponding to each region in the first sample image based on the feature information of the second feature map.
The category generating module 53 is further configured to determine, based on the attribute category corresponding to each activation response value of a site, the channel identifier of the channel corresponding to that activation response value, the attribute category corresponding to an activation response value being determined based on the size of the activation response value; and to generate the feature vector of the site according to the channel identifiers of the channels corresponding to the activation response values of the site.
The attribute categories include a target category and a non-target category; the target category is the category corresponding to the maximum activation response value of a site, and the non-target category is the category corresponding to a non-maximum activation response value of the site. The channel identifier includes a first identifier and a second identifier.
The category generating module 53 is further configured to, in response to an activation response value of the site corresponding to the target category, assign the channel corresponding to that activation response value the first identifier; in response to an activation response value of the site corresponding to the non-target category, assign the channel corresponding to that activation response value the second identifier; and determine the feature vector of the site based on the first identifier and the second identifier corresponding to the site in each channel.
The category generating module 53 is further configured to generate a feature map corresponding to each region in the first sample image based on the feature information of the second feature map, the number of regions being the same as the number of channels; and to determine the category label corresponding to a region based on the number of first identifiers in the feature map corresponding to that region.
The category labels include an unoccluded category and an occluded category. The category generating module 53 is further configured to determine that the category label corresponding to the region is unoccluded in response to the number of first identifiers exceeding a preset number, and to determine that the category label corresponding to the region is occluded in response to the number of first identifiers not exceeding the preset number.
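A minimal sketch of the pseudo-label generation described above, assuming one channel of the second feature map per horizontal region and a simple count threshold; the function name and the value of min_points are illustrative assumptions.

```python
import torch

def generate_region_labels(second_feature_map: torch.Tensor, min_points: int = 50):
    """Derive a pseudo occlusion label per region from channel-wise activation response values.

    second_feature_map: assumed shape (C, H, W), one channel per horizontal region.
    For every spatial site, the channel holding its maximum activation response is marked with
    the first identifier (1) and all other channels with the second identifier (0); a region is
    labelled unoccluded when its channel collects more than min_points first identifiers.
    """
    c, h, w = second_feature_map.shape
    argmax_channel = second_feature_map.argmax(dim=0)                     # (H, W) index of max response per site
    one_hot = torch.nn.functional.one_hot(argmax_channel, num_classes=c)  # (H, W, C) first/second identifiers
    first_identifier_counts = one_hot.sum(dim=(0, 1))                     # first identifiers per region channel
    return ["unoccluded" if n > min_points else "occluded" for n in first_identifier_counts.tolist()]

labels = generate_region_labels(torch.randn(6, 24, 8), min_points=20)
print(labels)  # e.g. ['unoccluded', 'occluded', ...]
```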
The training module 55 is configured to iteratively train an initial classification network based on an error between a class label and a prediction class corresponding to the same region in the first sample image, so as to obtain a classification network, where the classification network is configured to identify a class of each region in an image including a target object.
The training module 55 is further configured to iteratively train an initial classification network based on error values between class labels and prediction classes corresponding to the same region of the first sample image; and determining networks except the label generation network in the initial classification network after iterative training as the classification network after the training is finished. The classification network includes a first feature extraction network and a class prediction network.
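A rough training-loop sketch consistent with the description above; the attribute names feature_extractor, label_generator and class_predictor, the optimizer, and the cross-entropy loss are assumptions for illustration, not taken from the patent.

```python
import torch
import torch.nn as nn

def train_classification_network(initial_network, data_loader, epochs: int = 10, lr: float = 1e-3):
    """Iteratively fit the class-prediction branch to the pseudo labels from the label-generation branch.

    `initial_network` is assumed to expose three sub-networks: feature_extractor, label_generator
    and class_predictor; these names are hypothetical.
    """
    criterion = nn.CrossEntropyLoss()
    params = list(initial_network.feature_extractor.parameters()) + \
             list(initial_network.class_predictor.parameters())
    optimizer = torch.optim.SGD(params, lr=lr)

    for _ in range(epochs):
        for images in data_loader:
            first_feature_map = initial_network.feature_extractor(images)
            with torch.no_grad():
                pseudo_labels = initial_network.label_generator(first_feature_map)  # (N, regions) class labels
            predictions = initial_network.class_predictor(first_feature_map)        # (N, regions, num_classes)
            loss = criterion(predictions.flatten(0, 1), pseudo_labels.flatten())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    # After training, the label generation network is dropped; the remaining parts form the classification network.
    return initial_network.feature_extractor, initial_network.class_predictor
```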
In the classification network training device provided in this embodiment, no manual label annotation is required for the first sample image: the category label corresponding to each region of the first sample image is generated from the data information corresponding to each point on the first feature map and used as a pseudo label for that region, and the classification network is then trained on the error values between the prediction category of each region and the pseudo label of the corresponding region. This reduces annotation workload, improves the detection accuracy of the region categories, and thereby improves the identification accuracy of the target object in subsequent pedestrian re-identification.
Referring to fig. 8, fig. 8 is a schematic block diagram of an embodiment of a pedestrian re-identification device provided in the present invention. The embodiment provides a pedestrian re-identification device 60, and the pedestrian re-identification device 60 comprises an image acquisition module 61, a category detection module 62, a second feature extraction module 63 and an identity identification module 64.
The image acquisition module 61 is used for acquiring an image to be identified; the image to be recognized includes a target object.
The category detection module 62 is configured to detect the image to be recognized through a classification network, and determine detection categories corresponding to the regions in the image to be recognized; the classification network is obtained by training through the classification network training method; the detection category is unoccluded or occluded.
The category detection module 62 is configured to perform feature extraction on the image to be identified through the classification network to obtain a first feature map of the image to be identified; determine the detection categories respectively corresponding to the regions in the image to be identified based on the first feature map; and generate a third feature map of the image to be identified based on the maximum activation response values respectively corresponding to the points in the first feature map.
The category detection module 62 is further configured to perform feature extraction on the second sample image through a first feature extraction network, so as to obtain a first feature map of the second sample image.
The second feature extraction module 63 is configured to perform feature extraction on each region in the image to be recognized, so as to obtain a local feature map corresponding to each region.
The second feature extraction module 63 is further configured to perform feature extraction on the image to be recognized to obtain a global feature map of the image to be recognized; performing feature fusion on the basis of the third feature map and the global feature map corresponding to the image to be recognized to obtain a fourth feature map of the image to be recognized; and determining the local feature map corresponding to each region in the image to be recognized based on the fourth feature map of the image to be recognized.
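The construction of the third and fourth feature maps described above can be sketched as follows; keeping the per-point maximum activation response and fusing by element-wise multiplication are assumptions, since the description only states that the two maps are fused.

```python
import torch

def build_fourth_feature_map(first_feature_map: torch.Tensor, global_feature_map: torch.Tensor) -> torch.Tensor:
    """Fuse an occlusion-aware map derived from the classification branch with the global feature map.

    first_feature_map: (N, C_regions, H, W) output of the classification network.
    global_feature_map: (N, C, H, W) output of the re-identification backbone.
    The third feature map keeps, for every point, the maximum activation response across channels;
    the fusion by element-wise multiplication is an assumption for illustration.
    """
    third_feature_map = first_feature_map.max(dim=1, keepdim=True).values  # (N, 1, H, W)
    return global_feature_map * third_feature_map                          # broadcast over channels

fourth = build_fourth_feature_map(torch.randn(2, 6, 24, 8), torch.randn(2, 256, 24, 8))
print(fourth.shape)  # torch.Size([2, 256, 24, 8])
```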
The second feature extraction module 63 is further configured to perform feature extraction on the image to be recognized by using a target recognition network, so as to obtain a local feature map corresponding to each region in the image to be recognized.
The identity recognition module 64 is configured to determine a re-recognition result of the target object based on the detection category corresponding to each region and the local feature map corresponding to each region.
The identity recognition module 64 is configured to determine, based on the detection categories respectively corresponding to the regions in the image to be recognized, the regions whose detection category is unoccluded as target regions; and to determine the re-recognition result of the target object according to the local feature maps corresponding to the target regions.
The identity recognition module 64 is further configured to compare the local feature map corresponding to each region whose detection category is unoccluded with the preset feature map of the corresponding region of a preset target object in the database; and, in response to the similarity between the local feature map of an unoccluded region and the preset feature map of the corresponding region of the preset target object exceeding a similarity threshold, determine the identity information of the preset target object as the identity information of the target object.
The pedestrian re-identification device provided in this embodiment performs category detection on each region in the image to be identified to obtain the detection category corresponding to each region, and performs feature extraction on the image to be identified to obtain the local feature map corresponding to each region. The identity information of the target object in the image to be identified is then determined based on the local feature maps corresponding to the unoccluded regions, which helps improve the retrieval effect of pedestrian re-identification.
Referring to fig. 9, fig. 9 is a schematic block diagram of a terminal according to an embodiment of the present invention. The terminal 80 includes a memory 81 and a processor 82 coupled to each other, and the processor 82 is configured to execute program instructions stored in the memory 81 to implement any of the above-described classification network training methods or the above-described target re-identification method. In one specific implementation scenario, the terminal 80 may include, but is not limited to, a microcomputer or a server; in addition, the terminal 80 may also be a mobile device such as a notebook computer or a tablet computer, which is not limited herein.
In particular, the processor 82 is configured to control itself and the memory 81 to implement any of the above-described classification network training methods or the above-described target re-identification method. The processor 82 may also be referred to as a CPU (Central Processing Unit). The processor 82 may be an integrated circuit chip having signal processing capabilities. The processor 82 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. In addition, the processor 82 may be jointly implemented by a plurality of integrated circuit chips.
Referring to fig. 10, fig. 10 is a schematic block diagram of an embodiment of a computer-readable storage medium provided in the present invention. The computer readable storage medium 90 stores program instructions 901 capable of being executed by a processor, the program instructions 901 for implementing any of the above-described classification network training methods or the steps in the above-described object re-identification method.
In some embodiments, functions of or modules included in the apparatus provided in the embodiments of the present disclosure may be used to execute the method described in the above method embodiments, and for specific implementation, reference may be made to the description of the above method embodiments, and for brevity, details are not described here again.
The foregoing description of the various embodiments is intended to highlight various differences between the embodiments, and the same or similar parts may be referred to each other, and for brevity, will not be described again herein.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a module or a unit is merely one type of logical division, and an actual implementation may have another division, for example, a unit or a component may be combined or integrated with another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some interfaces, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor (processor) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
If the technical solution of the present application involves personal information, a product applying the technical solution clearly informs the user of the personal information processing rules and obtains the individual's separate consent before processing the personal information. If the technical solution involves sensitive personal information, a product applying the technical solution obtains the individual's separate consent and also satisfies the requirement of "explicit consent" before processing the sensitive personal information. For example, a clear and prominent sign is set at a personal information collection device such as a camera to inform that the device is within the personal information collection range and that personal information will be collected; if a person voluntarily enters the collection range, the person is regarded as consenting to the collection of his or her personal information. Alternatively, on a device that processes personal information, personal authorization is obtained by means of a pop-up message or by asking the person to upload his or her personal information, on the premise that the personal information processing rules are indicated by obvious signs or information. The personal information processing rules may include information such as the personal information processor, the purpose of processing, the processing method, and the types of personal information to be processed.
The above description is only an embodiment of the present invention, and is not intended to limit the scope of the present invention, and all equivalent structures or equivalent processes performed by the present invention or directly or indirectly applied to other related technical fields are also included in the scope of the present invention.

Claims (21)

1. A classification network training method is characterized by comprising the following steps:
inputting a first sample image containing a target object into an initial classification network;
performing feature extraction on the first sample image through the initial classification network to obtain a first feature map;
performing feature extraction on the first feature map to obtain a second feature map;
generating category labels corresponding to all the areas in the first sample image based on the activation response values of all the points on the second feature map in all the channels;
predicting a prediction category corresponding to each region in the first sample image based on the first feature map;
iteratively training the initial classification network based on an error between the class label corresponding to the same region in the first sample image and the prediction class to obtain a classification network, wherein the classification network is used for identifying the class of each region in the image containing the target object;
the initial classification network comprises a first feature extraction network, and the first feature extraction network comprises an attitude estimation module and a feature extraction module which are sequentially cascaded;
the obtaining of the first feature map of the first sample image by performing feature extraction on the first sample image through the initial classification network includes:
performing key point detection on the first sample image through the attitude estimation module to obtain a key point heat map of the target object;
and performing feature extraction on the key point heat map through the feature extraction module, and determining a first feature map of the target object.
2. The classification network training method according to claim 1,
generating a category label corresponding to each region in the first sample image based on the activation response value of each position on each channel on the second feature map, including:
generating a feature vector corresponding to each of the position points based on an activation response value of each of the position points in each of the channels in the second feature map;
generating feature information of the second feature map according to the feature vectors respectively corresponding to all the sites;
and generating a category label corresponding to each area in the first sample image based on the feature information of the second feature map.
3. The classification network training method according to claim 2,
generating a feature vector corresponding to each of the sites based on an activation response value of each of the sites in the second feature map in each of the channels, including:
determining channel identifications of channels corresponding to the activation response values of the sites based on the attribute categories corresponding to the activation response values of the sites; the attribute category corresponding to the activation response value is determined based on the size of the activation response value;
and generating a feature vector of the site according to the channel identifier of the channel corresponding to each activation response value of the site.
4. The classification network training method according to claim 3, wherein the attribute classes include a target class and a non-target class, and the target class is a class corresponding to the maximum activation response value of the site; the non-target class is a class of the site that corresponds to a non-maximum of the activation response values; the channel identification comprises a first identifier and a second identifier;
the determining, based on the attribute category corresponding to each activation response value of the site, a channel identifier of a channel corresponding to each activation response value of the site includes:
in response to the activation response value of the site corresponding to the target category, assigning the channel corresponding to the activation response value of the site as the first identifier;
in response to the activation response value of the site corresponding to the non-target category, assigning the channel corresponding to the activation response value of the site as the second identifier;
the generating a feature vector of the site according to the channel identifier of the channel corresponding to each of the activation response values of the site includes:
determining the feature vector of the site based on the first identifier and the second identifier corresponding to the site in each of the channels.
5. The classification network training method according to claim 4,
the generating of the category label corresponding to each region in the first sample image based on the feature information of the second feature map includes:
generating a feature map corresponding to each region in the first sample image based on feature information of the second feature map; the number of the regions is the same as that of the channels;
and determining the category label corresponding to the area based on the number of the first identifiers in the feature map corresponding to the area.
6. The classification network training method of claim 5, wherein the category label includes an unoccluded category;
the determining the category label corresponding to the region based on the number of the first identifiers in the feature map corresponding to the region includes:
in response to the number of the first identifiers exceeding a preset number, determining that the category label corresponding to the region is unoccluded.
7. The classification network training method of claim 5, wherein the category label includes an occluded category;
the determining, based on the number of the first identifiers in the feature map corresponding to the area, the category label corresponding to the area includes:
in response to the number of the first identifiers not exceeding a preset number, determining that the category label corresponding to the region is occluded.
8. The classification network training method of claim 1, wherein the initial classification network further comprises a class prediction network, the class prediction network being connected to the first feature extraction network;
the predicting the prediction category corresponding to each region in the first sample image based on the first feature map includes:
and determining a prediction category corresponding to each region in the first sample image based on the first feature map through the category prediction network.
9. The classification network training method of claim 8, wherein the initial classification network further comprises a label generation network, the label generation network being connected to the first feature extraction network;
the generating of the category label corresponding to each region in the first sample image based on the data information corresponding to each point on the first feature map includes:
and generating, through the label generation network, category labels corresponding to the regions in the first sample image based on the data information corresponding to the points on the first feature map.
10. The classification network training method according to claim 9, wherein the classification network includes the first feature extraction network and a class prediction network;
the iteratively training the initial classification network based on the error between the class label and the prediction class corresponding to the same region in the first sample image to obtain a classification network further includes:
iteratively training the initial classification network based on error values between the class labels and the prediction classes corresponding to the same region of the first sample image;
and determining networks except the label generation network in the initial classification network after iterative training as the classification network after training.
11. An object re-identification method, characterized in that the object re-identification method comprises:
acquiring an image to be identified; the image to be recognized comprises a target object;
detecting the image to be identified through a classification network, and determining detection categories corresponding to all areas in the image to be identified respectively; wherein the classification network is obtained by training through the classification network training method of any one of claims 1 to 10; the detection categories include unoccluded and occluded;
performing feature extraction on each region in the image to be recognized to obtain a local feature map corresponding to each region;
and determining the re-identification result of the target object based on the detection category corresponding to each region and the local feature map corresponding to each region.
12. The object re-recognition method according to claim 11,
determining a re-recognition result of the target object based on the detection category corresponding to each region and the local feature map corresponding to each region, respectively, including:
determining, based on the detection category corresponding to each region in the image to be identified, a region whose detection category is unoccluded as a target region;
and determining a re-identification result of the target object according to the local feature map corresponding to the target region.
13. The object re-recognition method according to claim 11,
the detecting the image to be recognized through the classification network, and determining the detection categories respectively corresponding to the areas in the image to be recognized, includes:
performing feature extraction on the image to be identified through the classification network to obtain a first feature map of the image to be identified;
determining detection categories respectively corresponding to all areas in the image to be recognized based on the first feature map;
the feature extraction is performed on each region in the image to be identified to obtain a local feature map corresponding to each region, and the method further comprises the following steps:
and generating a third feature map of the image to be identified based on the maximum activation response value corresponding to each position point in the first feature map.
14. The object re-recognition method of claim 13,
the step of performing feature extraction on each region in the image to be recognized to obtain a local feature map corresponding to each region includes:
extracting the features of the image to be recognized to obtain a global feature map of the image to be recognized;
performing feature fusion on the third feature map and the global feature map corresponding to the image to be recognized to obtain a fourth feature map of the image to be recognized;
and determining the local feature map corresponding to each region in the image to be recognized based on the fourth feature map of the image to be recognized.
15. The object re-recognition method according to claim 11,
the feature extraction of the image to be recognized to obtain the local feature map corresponding to each region in the image to be recognized includes:
extracting the features of the image to be recognized by adopting a target recognition network to obtain the local feature map corresponding to each region in the image to be recognized;
the training method of the target recognition network comprises the following steps:
acquiring a second training sample set, wherein the second training sample set comprises a plurality of second sample images containing targets, and the second sample images are provided with identity labels containing the targets;
identifying the target in the second sample image through an initial identification network to obtain the predicted identity of the target; the initial identification network comprises a second characteristic extraction network and an identity prediction network which are sequentially cascaded;
iteratively training the initial recognition network based on an error value between the identity label and the predicted identity corresponding to the same target;
and determining networks except the identity prediction network in the initial recognition network after iterative training as the trained target recognition network.
16. The object re-recognition method of claim 15, wherein the initial recognition network further comprises a first feature extraction network, the first feature extraction network being connected to the second feature extraction network;
the training method of the initial recognition network further comprises the following steps:
performing feature extraction on the second sample image through the first feature extraction network to obtain a first feature map of the second sample image;
the identifying the target in the second sample image through the initial identification network to obtain the predicted identity of the target includes:
performing feature extraction on the second sample image through the second feature extraction network to obtain a global feature map of the second sample image;
performing feature fusion on the first feature map and the global feature map corresponding to the second sample image to obtain a fusion feature map of the second sample image;
and obtaining the predicted identity of the target in the second sample image based on the fusion feature map.
17. The object re-recognition method according to claim 11,
determining a re-recognition result of the target object based on the detection categories respectively corresponding to the regions and the local feature maps corresponding to the regions, including:
comparing the local feature map corresponding to the region whose detection category is unoccluded with a preset feature map of a corresponding region of a preset target object in a database;
and determining that the identity information of the preset target object is the identity information of the target object in response to that the similarity between the local feature map of the unoccluded region and the preset feature map of the region corresponding to the preset target object exceeds a similarity threshold.
18. A classification network training apparatus, comprising:
the sample acquisition module is used for inputting a first sample image containing a target object into an initial classification network;
the first feature extraction module is used for performing feature extraction on the first sample image through the initial classification network to obtain a first feature map; the initial classification network comprises a first feature extraction network, and the first feature extraction network comprises an attitude estimation module and a feature extraction module which are sequentially cascaded; the first feature extraction module is used for performing key point detection on the first sample image through the attitude estimation module to obtain a key point heat map of the target object; performing feature extraction on the key point heat map through the feature extraction module to determine a first feature map of the target object;
the category generation module is used for extracting the features of the first feature map to obtain a second feature map; generating category labels corresponding to all the areas in the first sample image based on the activation response values of all the points on the second feature map in all the channels;
a category prediction module, configured to predict, based on the first feature map, a prediction category corresponding to each of the regions in the first sample image;
and the training module is used for iteratively training the initial classification network based on the error between the class label and the prediction class corresponding to the same region in the first sample image to obtain a classification network, and the classification network is used for identifying the class of each region in the image containing the target object.
19. An object re-recognition apparatus, characterized in that the object re-recognition apparatus comprises:
the image acquisition module is used for acquiring an image to be identified; the image to be recognized comprises a target object;
the class detection module is used for detecting the image to be identified through a classification network and determining detection classes respectively corresponding to all areas in the image to be identified; wherein the classification network is obtained by training through the classification network training method of any one of claims 1 to 10; the detection categories include unoccluded and occluded;
the second feature extraction module is used for extracting features of each region in the image to be identified to obtain a local feature map corresponding to each region;
and the identity recognition module is used for determining a re-recognition result of the target object based on the detection category corresponding to each region and the local feature map corresponding to each region.
20. A terminal, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor is configured to execute the computer program to implement the steps of the classification network training method according to any one of claims 1 to 10 or the target re-identification method according to any one of claims 11 to 17.
21. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the method for training a classification network according to any one of claims 1 to 10 or the method for re-identifying a target according to any one of claims 11 to 17.
CN202211014862.0A 2022-08-23 2022-08-23 Classification network training and target re-identification method, device, terminal and storage medium Active CN115082748B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211014862.0A CN115082748B (en) 2022-08-23 2022-08-23 Classification network training and target re-identification method, device, terminal and storage medium

Publications (2)

Publication Number Publication Date
CN115082748A CN115082748A (en) 2022-09-20
CN115082748B true CN115082748B (en) 2022-11-22

Family

ID=83244750

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211014862.0A Active CN115082748B (en) 2022-08-23 2022-08-23 Classification network training and target re-identification method, device, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN115082748B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113221787A (en) * 2021-05-18 2021-08-06 西安电子科技大学 Pedestrian multi-target tracking method based on multivariate difference fusion
CN113379687A (en) * 2021-05-28 2021-09-10 上海联影智能医疗科技有限公司 Network training method, image detection method, and medium
CN114462487A (en) * 2021-12-28 2022-05-10 浙江大华技术股份有限公司 Target detection network training and detection method, device, terminal and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113743535A (en) * 2019-05-21 2021-12-03 北京市商汤科技开发有限公司 Neural network training method and device and image processing method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Pose-guided Visible Part Matching for Occluded Person ReID;Shang Gao,and etc;《2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)》;20200805;第11744-11752页 *
智能视频监控关键技术: 行人再识别研究综述;赵才荣等;《中国科学:信息科学》;20211231;第52卷(第12期);第1979-2015页 *

Also Published As

Publication number Publication date
CN115082748A (en) 2022-09-20


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant