CN111191533B - Pedestrian re-identification processing method, device, computer device and storage medium


Publication number: CN111191533B
Authority: CN (China)
Prior art keywords: sub, feature, training sample, features, training
Legal status: Active
Application number: CN201911309205.7A
Other languages: Chinese (zh)
Other versions: CN111191533A
Inventor: 王鋆玙
Current Assignee: Beijing Megvii Technology Co Ltd
Original Assignee: Beijing Megvii Technology Co Ltd
Application filed by Beijing Megvii Technology Co Ltd
Priority claimed from CN201911309205.7A
Publication of CN111191533A
Application granted
Publication of CN111191533B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

The application relates to a pedestrian re-identification processing method, apparatus, computer device and storage medium. The method comprises the following steps: inputting an identification image to be re-identified into a pre-trained prediction neural network to obtain a plurality of sub-features of the identification image output by the prediction neural network and a visibility confidence corresponding to each sub-feature of the identification image; determining the feature of the identification image according to the plurality of sub-features and the visibility confidence corresponding to each sub-feature; and searching a preset image database, according to the feature of the identification image, for a target image containing the target object. The embodiments of the invention improve the accuracy of image search and reduce its difficulty.

Description

Pedestrian re-identification processing method, device, computer device and storage medium
Technical Field
The present disclosure relates to the field of pedestrian re-identification, and in particular to a pedestrian re-identification processing method, apparatus, computer device, and storage medium.
Background
Pedestrian re-identification (person re-identification), also called pedestrian re-recognition, is a technology that uses computer vision to determine whether a specific pedestrian is present in an image or a video sequence. It is widely applied in fields such as intelligent video surveillance and intelligent security.
At present, pedestrian re-identification discriminates well on salient characteristics of an object such as its colors and overall appearance. However, if the pedestrian is partially occluded in an image, the image area available for recognition shrinks, which increases the difficulty of re-identification. For example, if a pedestrian rides a non-motor vehicle in an image, the vehicle occludes part of the pedestrian's features, making it difficult to find other images containing the same pedestrian starting from that image.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a processing method, apparatus, computer device, and storage medium that can reduce the difficulty of pedestrian re-identification when the target object is occluded.
In a first aspect, an embodiment of the present invention provides a pedestrian re-identification processing method, the method including:
inputting an identification image to be re-identified into a pre-trained prediction neural network to obtain a plurality of sub-features of the identification image output by the prediction neural network and a visibility confidence corresponding to each sub-feature of the identification image, where the identification image contains a target object, the plurality of sub-features of the identification image correspond to a plurality of partial regions of the target object, the partial regions together form the target object, and the visibility confidence indicates the probability that the partial region of the target object corresponding to each sub-feature of the identification image is not occluded;
determining the feature of the identification image according to the plurality of sub-features of the identification image and the visibility confidence corresponding to each sub-feature of the identification image; and
searching a preset image database, according to the feature of the identification image, for a target image containing the target object.
In one embodiment, before the identification image to be re-identified is input into the pre-trained prediction neural network, the method further includes:
acquiring a training sample set, where the training sample set includes a plurality of training samples and a visibility label of each training sample, the training sample contains a training object, a plurality of sub-features of the training sample correspond to a plurality of partial regions of the training object, the partial regions together form the training object, and the visibility label indicates the visibility of the partial region of the training object corresponding to each sub-feature of the training sample; and
training a neural network based on the training sample set to obtain the prediction neural network.
In one embodiment, acquiring the training sample set includes:
acquiring a plurality of training samples; and
labeling each sub-feature in each training sample according to pixel values to obtain the visibility label of each training sample.
In one embodiment, labeling each sub-feature in each training sample according to pixel values to obtain the visibility label of each training sample includes:
calculating the average pixel value of the training sample;
if the average pixel value of the partial region of the training object corresponding to a sub-feature of the training sample is greater than the average pixel value of the training sample, labeling that sub-feature as visible; and
if the average pixel value of the partial region of the training object corresponding to a sub-feature of the training sample is less than or equal to the average pixel value of the training sample, labeling that sub-feature as invisible.
In one embodiment, training the neural network based on the training sample set to obtain the prediction neural network includes:
performing feature extraction on the training sample through a neural network to obtain the feature of the training sample, and dividing the feature of the training sample into a plurality of sub-features;
convolving the sub-features of the training sample with convolution layers whose weights are not shared, and obtaining a one-dimensional feature vector corresponding to each sub-feature of the training sample through a global average pooling layer and a fully connected layer;
obtaining the visibility confidence corresponding to each sub-feature of the training sample according to an activation function and the one-dimensional feature vector corresponding to each sub-feature of the training sample; and
training the neural network according to the visibility labels of the training samples and the visibility confidences corresponding to each sub-feature of the training samples to obtain the prediction neural network.
In one embodiment, determining the feature of the identification image according to the plurality of sub-features of the identification image and the visibility confidence corresponding to each sub-feature of the identification image includes:
weighting the plurality of sub-features of the identification image by the visibility confidences corresponding to them to obtain a plurality of intermediate features; and
summing the plurality of intermediate features to obtain the feature of the identification image.
In one embodiment, searching the preset image database, according to the feature of the identification image, for the target image containing the target object includes:
performing feature extraction on each candidate image in the image database to obtain the feature of each candidate image;
finding, among the features of the candidate images, a target feature that matches the feature of the identification image; and
determining the candidate image corresponding to the target feature as the target image.
In one embodiment, the feature of the identification image is a whole-body feature of the pedestrian, and the plurality of sub-features of the identification image include an upper-body feature and a lower-body feature of the pedestrian.
In a second aspect, an embodiment of the present invention provides a pedestrian re-identification processing apparatus, the apparatus including:
a visibility prediction module, configured to input an identification image to be re-identified into a pre-trained prediction neural network to obtain a plurality of sub-features of the identification image output by the prediction neural network and a visibility confidence corresponding to each sub-feature of the identification image, where the identification image contains a target object, the plurality of sub-features of the identification image correspond to a plurality of partial regions of the target object, the partial regions together form the target object, and the visibility confidence indicates the probability that the partial region of the target object corresponding to each sub-feature of the identification image is not occluded;
an identification image feature determining module, configured to determine the feature of the identification image according to the plurality of sub-features of the identification image and the visibility confidence corresponding to each sub-feature of the identification image; and
a target image searching module, configured to search a preset image database, according to the feature of the identification image, for a target image containing the target object.
In one embodiment, the apparatus further comprises:
a training sample set acquisition module, configured to acquire a training sample set, where the training sample set includes a plurality of training samples and a visibility label of each training sample, the training sample contains a training object, a plurality of sub-features of the training sample correspond to a plurality of partial regions of the training object, the partial regions together form the training object, and the visibility label indicates the visibility of the partial region of the training object corresponding to each sub-feature of the training sample; and
a training module, configured to train a neural network based on the training sample set to obtain the prediction neural network.
In one embodiment, the training sample set obtaining module includes:
the training sample acquisition sub-module is used for acquiring a plurality of training samples;
a visibility label obtaining sub-module, configured to label each sub-feature in each training sample according to pixel values to obtain the visibility label of each training sample.
In one embodiment, the visibility label obtaining sub-module is specifically configured to calculate the average pixel value of the training sample; if the average pixel value of the partial region of the training object corresponding to a sub-feature of the training sample is greater than the average pixel value of the training sample, label that sub-feature as visible; and if it is less than or equal to the average pixel value of the training sample, label that sub-feature as invisible.
In one embodiment, the training module is specifically configured to perform feature extraction on a training sample through a neural network to obtain the feature of the training sample, and divide the feature of the training sample into a plurality of sub-features; convolve the sub-features of the training sample with convolution layers whose weights are not shared, and obtain a one-dimensional feature vector corresponding to each sub-feature of the training sample through a global average pooling layer and a fully connected layer; obtain the visibility confidence corresponding to each sub-feature of the training sample according to an activation function and the one-dimensional feature vector corresponding to each sub-feature of the training sample; and train the neural network according to the visibility labels of the training samples and the visibility confidences corresponding to each sub-feature of the training samples to obtain the prediction neural network.
In one embodiment, the identification image feature determining module is specifically configured to weight the plurality of sub-features of the identification image by the visibility confidences corresponding to them to obtain a plurality of intermediate features, and sum the plurality of intermediate features to obtain the feature of the identification image.
In one embodiment, the target image searching module is specifically configured to perform feature extraction on each candidate image in the image database to obtain the feature of each candidate image; find, among the features of the candidate images, a target feature that matches the feature of the identification image; and determine the candidate image corresponding to the target feature as the target image.
In one embodiment, the feature of the identification image is a whole-body feature of the pedestrian, and the plurality of sub-features of the identification image include an upper-body feature and a lower-body feature of the pedestrian.
In a third aspect, an embodiment of the present invention provides a computer device including a memory and a processor, the memory storing a computer program, and the processor implementing the steps of the method described above when executing the computer program.
In a fourth aspect, an embodiment of the present invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method described above.
With the pedestrian re-identification processing method, apparatus, computer device, and storage medium, an identification image to be re-identified is input into a pre-trained prediction neural network to obtain a plurality of sub-features of the identification image output by the prediction neural network and a visibility confidence corresponding to each sub-feature; the feature of the identification image is determined according to the plurality of sub-features and the visibility confidences corresponding to them; and a preset image database is searched, according to the feature of the identification image, for a target image containing the target object. In the embodiments of the invention, the prediction neural network extracts features from the identification image containing the target object and divides the extracted feature, visibility prediction is performed on the resulting sub-features to obtain the probability that the partial region of the target object corresponding to each sub-feature is not occluded, the feature of the identification image is then determined from these probabilities, and the image search is finally performed based on that feature. Because non-occluded sub-features have higher visibility confidences, the sub-features corresponding to non-occluded partial regions carry more weight in the resulting feature of the identification image, so a subsequent image search is driven mainly by those sub-features; the sub-features of the occluded partial regions are still taken into account during the search, however, which improves the accuracy of the image search and reduces its difficulty.
Drawings
FIG. 1 is a diagram of an application environment of a pedestrian re-identification processing method in one embodiment;
FIG. 2 is a flow chart of a pedestrian re-identification processing method in one embodiment;
FIG. 3 is a flow chart illustrating the steps of training a prediction neural network in one embodiment;
FIG. 4 is a flow chart of a pedestrian re-identification processing method in another embodiment;
FIG. 5 is a block diagram of a pedestrian re-identification processing apparatus in one embodiment;
FIG. 6 is a diagram of the internal structure of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The pedestrian re-identification processing method provided by the application can be applied to the application environment shown in FIG. 1. The application environment includes a terminal 102 and a server 104, the terminal 102 communicating with the server 104 over a network. The terminal 102 may be, but is not limited to, a personal computer, a notebook computer, a smartphone, a tablet computer, or a portable wearable device, and the server 104 may be implemented by a stand-alone server or a server cluster composed of a plurality of servers.
In one embodiment, as shown in FIG. 2, a pedestrian re-identification processing method is provided. Taking as an example its application to the server in FIG. 1, the method includes the following steps:
step 201, inputting the identification image to be re-identified into a pre-trained prediction neural network, and obtaining a plurality of sub-features of the identification image output by the prediction neural network, and a visibility confidence corresponding to each sub-feature of the identification image.
In this embodiment, the identification image contains a target object, and the plurality of sub-features of the identification image correspond to a plurality of partial regions of the target object; these partial regions together form the target object. The visibility confidence indicates the probability that the partial region of the target object corresponding to each sub-feature of the identification image is not occluded. Specifically, a prediction neural network is trained in advance, and the identification image to be re-identified is input into it. After receiving the identification image, the prediction neural network performs feature extraction on it, divides the extracted feature into a plurality of sub-features, and predicts whether the partial region of the target object corresponding to each sub-feature is occluded, yielding a visibility confidence for each sub-feature.
For example, the target object contained in the identification image may be a pedestrian, a vehicle, or the like. Taking a pedestrian as an example, each sub-feature of the identification image may correspond to a partial region of the pedestrian, for example the upper half or the lower half, or the head, an arm, the torso, or a leg. After the prediction neural network extracts features from the identification image, the extracted feature A is divided into sub-features B1 and B2; predicting on sub-feature B1 outputs a visibility confidence of 0.92, and predicting on sub-feature B2 outputs a visibility confidence of 0.15. That is, the probability that the partial region X1 of the target object corresponding to sub-feature B1 is not occluded is 92%, and the probability that the partial region X2 corresponding to sub-feature B2 is not occluded is 15%: the region corresponding to B1 is not occluded, while the region corresponding to B2 is occluded.
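As a concrete illustration of this step, consider the following minimal Python/PyTorch sketch. Everything in it is an assumption for illustration: the feature-extraction backbone is reduced to a single convolution, the feature map is simply split into an upper and a lower half, and the visibility head that maps each half to a confidence such as 0.92 or 0.15 is sketched later in the training section.

    import torch
    import torch.nn as nn

    # Sketch only: a real backbone (e.g. a deep CNN) is abstracted to one conv.
    backbone = nn.Conv2d(3, 64, 3, padding=1)
    image = torch.randn(1, 3, 256, 128)         # identification image (N, C, H, W)
    feature_map = backbone(image)               # extracted feature A
    upper, lower = feature_map.chunk(2, dim=2)  # sub-features B1, B2 (split on height)
    # A visibility head (see the training sketch below) would map B1 and B2 to
    # confidences such as 0.92 and 0.15, i.e. the probability each region is visible.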
Step 202, determining features of the identification image according to the plurality of sub-features of the identification image and the visibility confidence corresponding to each sub-feature of the identification image.
In this embodiment, the feature of the identification image is calculated from its plurality of sub-features and the visibility confidences corresponding to them. In one embodiment, determining the feature of the identification image may specifically include the following steps: weighting each sub-feature of the identification image by its corresponding visibility confidence to obtain a plurality of intermediate features; and summing the plurality of intermediate features to obtain the feature of the identification image.
For example, sub-feature B1 of the identification image is multiplied by its visibility confidence 0.92 to obtain intermediate feature b1, sub-feature B2 is multiplied by its visibility confidence 0.15 to obtain intermediate feature b2, and the two intermediate features b1 and b2 are added to obtain the feature a of the identification image.
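The weighting-and-summing computation can be sketched as follows (a minimal illustration; the 256-dimensional sub-feature vectors are an assumption, since the patent does not specify a dimensionality):

    import torch

    def combine(sub_features, confidences):
        # Weight each sub-feature by its visibility confidence, then sum:
        # sketch of a = 0.92 * B1 + 0.15 * B2 from the example above.
        intermediates = [c * f for c, f in zip(confidences, sub_features)]
        return torch.stack(intermediates).sum(dim=0)

    b1 = torch.randn(256)                  # sub-feature B1 of the identification image
    b2 = torch.randn(256)                  # sub-feature B2
    a = combine([b1, b2], [0.92, 0.15])    # feature a of the identification image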
It can be understood that the visibility confidence of a non-occluded sub-feature is higher and that of an occluded sub-feature is lower, so the sub-features of non-occluded partial regions carry more weight in the feature of the identification image. A subsequent image search is therefore driven mainly by the sub-features of the non-occluded partial regions, while the sub-features of the occluded partial regions are still taken into account rather than cut away, which would impoverish the feature of the identification image; this improves the accuracy of the image search.
Step 203, searching a preset image database, according to the feature of the identification image, for a target image containing the target object.
In this embodiment, an image database is preset in which a large number of candidate images are stored, and feature extraction is performed on the candidate images to be searched to obtain the feature of each candidate image, where the feature of each candidate image has the same dimension as the feature of the identification image. The feature of the identification image is then compared with the features of the candidate images, and when the feature of a candidate image matches the feature of the identification image, that candidate image is determined as the target image; that is, the target image and the identification image contain the same target object.
For example, feature extraction is performed on candidate images C1, C2, …, C100 to be searched to obtain their features c1, c2, …, c100, and the feature a of the identification image is compared with c1, c2, …, c100 one by one. If the similarity between feature c15 of a candidate image and feature a of the identification image is greater than a preset similarity, candidate image C15 is determined as the target image. The preset similarity is not limited here and may be set according to actual conditions.
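A minimal retrieval sketch follows. The cosine similarity metric and the 0.8 threshold are illustrative choices only; the patent requires merely that the similarity exceed a preset value set according to actual conditions.

    import torch
    import torch.nn.functional as F

    def search(query, candidates, threshold=0.8):
        # Return indices of candidate images whose cosine similarity to the
        # query feature exceeds the preset similarity threshold.
        sims = F.cosine_similarity(query.unsqueeze(0), candidates, dim=1)
        return (sims > threshold).nonzero(as_tuple=True)[0].tolist()

    query = torch.randn(256)           # feature a of the identification image
    gallery = torch.randn(100, 256)    # features c1, ..., c100 of candidate images
    matches = search(query, gallery)   # indices of target images, e.g. [14] for C15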
In the above pedestrian re-identification processing method, the prediction neural network extracts and divides the features of the identification image containing the target object, performs visibility prediction on the resulting sub-features to obtain the probability that each sub-feature is not occluded, determines the feature of the identification image from these probabilities, and performs the image search based on that feature. Because non-occluded sub-features have higher visibility confidences, the sub-features corresponding to non-occluded partial regions carry more weight in the resulting feature, so the subsequent search is driven mainly by them; the sub-features of the occluded partial regions are still taken into account during the search, however, which improves the accuracy of the image search and reduces its difficulty.
In one embodiment, the feature of the identification image is a whole-body feature of the pedestrian, and the plurality of sub-features of the identification image include an upper-body feature and a lower-body feature of the pedestrian.
For example, suppose the target object contained in the identification image is a pedestrian riding a vehicle. After the prediction neural network extracts features from the identification image, the extracted feature is divided into an upper sub-feature corresponding to the pedestrian's upper body and a lower sub-feature corresponding to the pedestrian's lower body. For this identification image, the prediction neural network obtains a visibility confidence for the upper-body feature and a visibility confidence for the lower-body feature, and the whole-body feature of the pedestrian is then obtained from the two sub-features and their corresponding confidences. Since the pedestrian's upper body is not occluded by the vehicle, the visibility confidence of the upper-body feature is high; since the lower body is occluded by the vehicle, the visibility confidence of the lower-body feature is low. A search based on this identification image is therefore driven mainly by the upper-body feature, but the lower-body feature is still taken into account rather than the lower body being cut away, which improves the accuracy of the image search and reduces its difficulty.
In another embodiment, as shown in FIG. 3, an optional process of training the prediction neural network is provided. Building on the embodiment shown in FIG. 2, before step 201 the method may further include the following steps:
step 301, obtaining a training sample set; the training sample set comprises a plurality of training samples and visibility labels of the training samples; the training sample comprises a training object, a plurality of sub-features of the training sample correspond to a plurality of partial areas of the training object, the plurality of partial areas of the training object form the training object, and the visibility mark is used for indicating the visibility of each sub-feature of the training sample corresponds to the partial area of the training object.
In this embodiment, step 301 includes: acquiring a plurality of training samples; and labeling each sub-feature in each training sample according to pixel values to obtain the visibility label of each training sample. Since the magnitudes of a training sample's pixel values reflect the state of the partial regions of the training object, each sub-feature in the training sample can be given a visibility label based on pixel values. The training object may be a pedestrian contained in the training sample, and the plurality of sub-features of the training sample may include the pedestrian's upper-body feature and lower-body feature.
In one embodiment, labeling each sub-feature in each training sample according to pixel values to obtain the visibility label of each training sample includes: calculating the average pixel value of the training sample; if the average pixel value of the partial region of the training object corresponding to a sub-feature of the training sample is greater than the average pixel value of the training sample, labeling that sub-feature as visible; and if it is less than or equal to the average pixel value of the training sample, labeling that sub-feature as invisible.
For example, the pixel value of each pixel in training sample M is obtained and the average pixel value is calculated to be 150. If the average pixel value of the partial region of the training object corresponding to a sub-feature of the training sample is greater than 150, that sub-feature is labeled visible; if it is less than or equal to 150, that sub-feature is labeled invisible.
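A minimal labeling sketch under this rule (assuming grayscale samples and an upper/lower split into two sub-regions; both are simplifying assumptions for illustration):

    import numpy as np

    def label_visibility(sample):
        # Label the two halves of a training sample as visible (1) or invisible (0)
        # by comparing each half's average pixel value with the whole sample's.
        mean_all = sample.mean()                       # e.g. 150 in the example above
        upper, lower = np.array_split(sample, 2, axis=0)
        return [1 if part.mean() > mean_all else 0     # > visible, <= invisible
                for part in (upper, lower)]

    sample = np.random.randint(0, 256, (256, 128))     # training sample M (grayscale)
    labels = label_visibility(sample)                  # e.g. [1, 0]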
Step 302, training the neural network based on the training sample set to obtain the prediction neural network.
In this embodiment, after the training sample set is obtained, the neural network is trained using the plurality of training samples in the set and their visibility labels, yielding the prediction neural network.
In one embodiment, the training process may specifically include the following steps: performing feature extraction on the training sample through a neural network to obtain the feature of the training sample, and dividing it into a plurality of sub-features; convolving the sub-features of the training sample with convolution layers whose weights are not shared, and obtaining a one-dimensional feature vector corresponding to each sub-feature through a global average pooling layer and a fully connected layer; obtaining the visibility confidence corresponding to each sub-feature according to an activation function and the one-dimensional feature vector; and training the neural network according to the visibility labels of the training samples and the visibility confidences corresponding to each sub-feature to obtain the prediction neural network.
For example, training sample M is input into the neural network, which extracts its features and divides the extracted feature into an upper sub-feature N1 and a lower sub-feature N2. The network then convolves sub-features N1 and N2 with convolution layers whose weights are not shared, reduces their dimensionality through a global average pooling layer and a fully connected layer to obtain one-dimensional feature vectors, and finally obtains the visibility confidences corresponding to N1 and N2 through an activation function. A loss function measures the difference between the visibility confidences output by the network for the sub-features of the training sample and the visibility labels, and the network is optimized according to this difference. When the loss function converges, training ends and the prediction neural network used in the embodiments of the invention is obtained.
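A minimal sketch of this visibility branch and one training step follows. The channel sizes, the number of parts, and the binary cross-entropy loss are assumptions; the patent specifies only non-shared convolution weights, global average pooling, a fully connected layer, an activation function, and a loss measuring the difference between confidences and labels.

    import torch
    import torch.nn as nn

    class VisibilityHead(nn.Module):
        # One convolution per sub-feature, so weights are not shared across parts;
        # then global average pooling, a fully connected layer producing a
        # one-dimensional vector, and a sigmoid giving one confidence per part.
        def __init__(self, channels=64, num_parts=2):
            super().__init__()
            self.convs = nn.ModuleList(
                nn.Conv2d(channels, channels, 3, padding=1) for _ in range(num_parts))
            self.pool = nn.AdaptiveAvgPool2d(1)            # global average pooling
            self.fcs = nn.ModuleList(
                nn.Linear(channels, 1) for _ in range(num_parts))

        def forward(self, sub_features):                   # e.g. [N1, N2]
            confs = []
            for conv, fc, f in zip(self.convs, self.fcs, sub_features):
                v = self.pool(conv(f)).flatten(1)          # one-dimensional vector
                confs.append(torch.sigmoid(fc(v)))         # confidence in (0, 1)
            return torch.cat(confs, dim=1)                 # (batch, num_parts)

    # One optimisation step: compare predictions with the 0/1 visibility labels
    # and minimise the loss until it converges.
    head = VisibilityHead()
    n1, n2 = torch.randn(4, 64, 16, 8), torch.randn(4, 64, 16, 8)
    labels = torch.randint(0, 2, (4, 2)).float()
    loss = nn.BCELoss()(head([n1, n2]), labels)
    loss.backward()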
In this training procedure, a training sample set is obtained and the neural network is trained on it to obtain the prediction neural network. The prediction neural network trained in this way can predict the visibility of multiple parts within a whole image and output the probability that each part is not occluded. Applied to image retrieval, it avoids the situation where the occluded parts of an image are simply cut away, which makes pedestrian re-identification difficult.
In another embodiment, as shown in FIG. 4, an optional process for the image search is provided. Building on the embodiment shown in FIG. 2, the method specifically includes the following steps:
step 401, clipping the original image to obtain an identification image including the target object.
In this embodiment, if the target object in the original image to be searched is not in the center of the image, the original image may be cut to obtain the identification image, so that the target object is located in the center of the identification image.
For example, if the person image X is located in the lower half of the image in the original image D, the original image is cut to obtain the identification image a, so that the person image X is located in the center of the identification image a.
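A naive cropping sketch (the bounding-box input, e.g. from a detector, and the margin are assumptions not taken from the patent):

    import numpy as np

    def crop_to_center(image, box, margin=0.25):
        # Crop symmetrically around the target's (top, bottom, left, right) box
        # so the target ends up at the center of the identification image.
        t, b, l, r = box
        mh, mw = int((b - t) * margin), int((r - l) * margin)
        h, w = image.shape[:2]
        return image[max(0, t - mh):min(h, b + mh),
                     max(0, l - mw):min(w, r + mw)]

    original = np.zeros((480, 640, 3), dtype=np.uint8)          # original image D
    identification = crop_to_center(original, (300, 460, 200, 320))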
Step 402, acquiring a plurality of training samples, and labeling each sub-feature in each training sample according to pixel values to obtain the visibility label of each training sample, where the training sample contains a training object, a plurality of sub-features of the training sample correspond to a plurality of partial regions of the training object, the partial regions together form the training object, and the visibility label indicates the visibility of the partial region of the training object corresponding to each sub-feature of the training sample.
In one embodiment, the average pixel value of the training sample is calculated; if the average pixel value of the partial region of the training object corresponding to a sub-feature of the training sample is greater than the average pixel value of the training sample, that sub-feature is labeled visible; if it is less than or equal to the average pixel value of the training sample, that sub-feature is labeled invisible.
Step 403, training the neural network based on the training sample set to obtain the prediction neural network.
In one embodiment, feature extraction is performed on the training sample through a neural network to obtain the feature of the training sample, which is divided into a plurality of sub-features; the sub-features are convolved with convolution layers whose weights are not shared, and a one-dimensional feature vector corresponding to each sub-feature is obtained through a global average pooling layer and a fully connected layer; the visibility confidence corresponding to each sub-feature is obtained according to an activation function and the one-dimensional feature vector; and the neural network is trained according to the visibility labels of the training samples and the visibility confidences corresponding to each sub-feature to obtain the prediction neural network.
Step 404, inputting the identification image to be re-identified into a pre-trained prediction neural network, so as to obtain a plurality of sub-features of the identification image output by the prediction neural network, and visibility confidence corresponding to each sub-feature of the identification image.
The identification image contains a target object, and each sub-feature of the identification image corresponds to a partial region of the target object; the visibility confidence indicates the probability that the partial region of the target object corresponding to each sub-feature of the identification image is not occluded.
Step 405, determining features of the identification image based on the plurality of sub-features of the identification image and the visibility confidence level corresponding to each sub-feature of the identification image.
In one embodiment, the feature of the identification image is a whole-body feature of the pedestrian, and the plurality of sub-features of the identification image include an upper-body feature and a lower-body feature of the pedestrian.
Step 406, performing feature extraction on each candidate image in the image database to obtain the feature of each candidate image; finding, among the features of the candidate images, a target feature that matches the feature of the identification image; and determining the candidate image corresponding to the target feature as the target image.
In this embodiment, feature extraction is performed on the candidate images to be searched to obtain the feature of each candidate image. A difference value between the feature of the identification image and the feature of each candidate image may then be calculated; when a difference value is smaller than a preset difference value, a target feature matching the feature of the identification image has been found.
For example, difference values between feature a of the identification image and features c1, c2, …, c100 of the candidate images are calculated respectively. If the difference value between feature a and feature c15 is smaller than the preset difference value, feature c15 is determined to be the target feature and candidate image C15 is determined to be the target image, where target image C15 contains the same target object as the identification image. The preset difference value is not limited here and may be set according to actual conditions.
Alternatively, the Euclidean distance between two features may be calculated, and the two features are determined to match when the Euclidean distance is smaller than a preset distance. The matching method is not limited here and may be set according to actual conditions.
In the above pedestrian re-identification processing method, the original image is cropped to obtain an identification image containing the target object; the identification image is input into the pre-trained prediction neural network to obtain a plurality of sub-features of the identification image and the visibility confidence corresponding to each sub-feature; the feature of the identification image is then determined from the sub-features and their visibility confidences, and the target image is found by searching according to that feature. In the subsequent image search, the search is driven mainly by the sub-features of the non-occluded partial regions, while the sub-features of the occluded partial regions are still taken into account, which improves the accuracy of the image search and reduces its difficulty.
It should be understood that, although the steps in the flowcharts of FIGS. 2-4 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order of execution is not strictly limited and the steps may be executed in other orders. Moreover, at least some of the steps in FIGS. 2-4 may include multiple sub-steps or stages that are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages need not be executed sequentially, and may be executed in turn or alternately with at least a portion of the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 5, a pedestrian re-identification processing apparatus is provided, including:
a visibility prediction module 501, configured to input an identification image to be re-identified into a pre-trained prediction neural network to obtain a plurality of sub-features of the identification image output by the prediction neural network and a visibility confidence corresponding to each sub-feature of the identification image, where the identification image contains a target object, the plurality of sub-features of the identification image correspond to a plurality of partial regions of the target object, the partial regions together form the target object, and the visibility confidence indicates the probability that the partial region of the target object corresponding to each sub-feature of the identification image is not occluded;
an identification image feature determining module 502, configured to determine the feature of the identification image according to the plurality of sub-features of the identification image and the visibility confidence corresponding to each sub-feature of the identification image; and
a target image searching module 503, configured to search a preset image database, according to the feature of the identification image, for a target image containing the target object.
In one embodiment, the apparatus further comprises:
a training sample set acquisition module, configured to acquire a training sample set, where the training sample set includes a plurality of training samples and a visibility label of each training sample, the training sample contains a training object, a plurality of sub-features of the training sample correspond to a plurality of partial regions of the training object, the partial regions together form the training object, and the visibility label indicates the visibility of the partial region of the training object corresponding to each sub-feature of the training sample; and
a training module, configured to train a neural network based on the training sample set to obtain the prediction neural network.
In one embodiment, the training sample set obtaining module includes:
the training sample acquisition sub-module is used for acquiring a plurality of training samples;
a visibility label obtaining sub-module, configured to label each sub-feature in each training sample according to pixel values to obtain the visibility label of each training sample.
In one embodiment, the visibility label obtaining sub-module is specifically configured to calculate the average pixel value of the training sample; if the average pixel value of the partial region of the training object corresponding to a sub-feature of the training sample is greater than the average pixel value of the training sample, label that sub-feature as visible; and if it is less than or equal to the average pixel value of the training sample, label that sub-feature as invisible.
In one embodiment, the training module is specifically configured to perform feature extraction on a training sample through a neural network to obtain the feature of the training sample, and divide it into a plurality of sub-features; convolve the sub-features with convolution layers whose weights are not shared, and obtain a one-dimensional feature vector corresponding to each sub-feature through a global average pooling layer and a fully connected layer; obtain the visibility confidence corresponding to each sub-feature according to an activation function and the one-dimensional feature vector; and train the neural network according to the visibility labels of the training samples and the visibility confidences corresponding to each sub-feature to obtain the prediction neural network.
In one embodiment, the identification image feature determining module is specifically configured to weight the plurality of sub-features of the identification image by the visibility confidences corresponding to them to obtain a plurality of intermediate features, and sum the plurality of intermediate features to obtain the feature of the identification image.
In one embodiment, the target image searching module is specifically configured to perform feature extraction on each candidate image in the image database to obtain the feature of each candidate image; find, among the features of the candidate images, a target feature that matches the feature of the identification image; and determine the candidate image corresponding to the target feature as the target image.
In one embodiment, the feature of the identification image is a whole-body feature of the pedestrian, and the plurality of sub-features of the identification image include an upper-body feature and a lower-body feature of the pedestrian.
For specific limitations on the pedestrian re-identification processing apparatus, refer to the limitations on the pedestrian re-identification processing method above, which are not repeated here. Each module in the above apparatus may be implemented wholly or partly by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, whose internal structure may be as shown in FIG. 6. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store pedestrian re-identification processing data. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements a pedestrian re-identification processing method.
It will be appreciated by those skilled in the art that the structure shown in FIG. 6 is merely a block diagram of part of the structure related to the present application and does not limit the computer device to which the present application is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:
inputting an identification image to be re-identified into a pre-trained prediction neural network to obtain a plurality of sub-features of the identification image output by the prediction neural network and a visibility confidence corresponding to each sub-feature of the identification image, where the identification image contains a target object, the plurality of sub-features of the identification image correspond to a plurality of partial regions of the target object, the partial regions together form the target object, and the visibility confidence indicates the probability that the partial region of the target object corresponding to each sub-feature of the identification image is not occluded;
determining the feature of the identification image according to the plurality of sub-features of the identification image and the visibility confidence corresponding to each sub-feature of the identification image; and
searching a preset image database, according to the feature of the identification image, for a target image containing the target object.
In one embodiment, the processor, when executing the computer program, performs the steps of:
acquiring a training sample set, where the training sample set includes a plurality of training samples and a visibility label of each training sample, the training sample contains a training object, a plurality of sub-features of the training sample correspond to a plurality of partial regions of the training object, the partial regions together form the training object, and the visibility label indicates the visibility of the partial region of the training object corresponding to each sub-feature of the training sample; and
training a neural network based on the training sample set to obtain the prediction neural network.
In one embodiment, the processor, when executing the computer program, performs the steps of:
acquiring a plurality of training samples;
and labeling each sub-feature in each training sample according to pixel values to obtain the visibility label of each training sample.
In one embodiment, the processor, when executing the computer program, performs the steps of:
calculating the average pixel value of the training sample;
if the average pixel value of the partial region of the training object corresponding to a sub-feature of the training sample is greater than the average pixel value of the training sample, labeling that sub-feature as visible; and
if the average pixel value of the partial region of the training object corresponding to a sub-feature of the training sample is less than or equal to the average pixel value of the training sample, labeling that sub-feature as invisible.
In one embodiment, the processor, when executing the computer program, performs the steps of:
performing feature extraction on the training sample through a neural network to obtain the feature of the training sample, and dividing it into a plurality of sub-features;
convolving the sub-features of the training sample with convolution layers whose weights are not shared, and obtaining a one-dimensional feature vector corresponding to each sub-feature through a global average pooling layer and a fully connected layer;
obtaining the visibility confidence corresponding to each sub-feature according to an activation function and the one-dimensional feature vector; and
training the neural network according to the visibility labels of the training samples and the visibility confidences corresponding to each sub-feature to obtain the prediction neural network.
In one embodiment, the processor, when executing the computer program, performs the steps of:
weighting the plurality of sub-features of the identification image by the visibility confidences corresponding to them to obtain a plurality of intermediate features; and
summing the plurality of intermediate features to obtain the feature of the identification image.
In one embodiment, the processor, when executing the computer program, performs the steps of:
performing feature extraction on each candidate image in the image database to obtain the feature of each candidate image;
finding, among the features of the candidate images, a target feature that matches the feature of the identification image; and
determining the candidate image corresponding to the target feature as the target image.
In one embodiment, the feature of the identification image is a whole-body feature of the pedestrian, and the plurality of sub-features of the identification image include an upper-body feature and a lower-body feature of the pedestrian.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:
inputting an identification image to be re-identified into a pre-trained prediction neural network to obtain a plurality of sub-features of the identification image output by the prediction neural network and a visibility confidence corresponding to each sub-feature of the identification image, where the identification image contains a target object, the plurality of sub-features of the identification image correspond to a plurality of partial regions of the target object, the partial regions together form the target object, and the visibility confidence indicates the probability that the partial region of the target object corresponding to each sub-feature of the identification image is not occluded;
determining the feature of the identification image according to the plurality of sub-features of the identification image and the visibility confidence corresponding to each sub-feature of the identification image; and
searching a preset image database, according to the feature of the identification image, for a target image containing the target object.
In one embodiment, the computer program, when executed by a processor, performs the steps of:
acquiring a training sample set, wherein the training sample set comprises a plurality of training samples and visibility labels of the training samples; each training sample contains a training object, a plurality of sub-features of the training sample correspond to a plurality of partial regions of the training object, the plurality of partial regions of the training object together constitute the training object, and the visibility label indicates whether the partial region of the training object corresponding to each sub-feature of the training sample is visible;
and training the neural network based on the training sample set to obtain the prediction neural network.
In one embodiment, the computer program, when executed by a processor, performs the steps of:
acquiring a plurality of training samples;
and labeling each sub-feature of each training sample according to pixel values to obtain the visibility label of each training sample.
In one embodiment, the computer program, when executed by a processor, performs the steps of:
calculating the average pixel value of the training sample;
if the average pixel value of the partial region of the training object corresponding to a sub-feature of the training sample is greater than the average pixel value of the training sample, labeling that sub-feature of the training sample as visible;
and if the average pixel value of the partial region of the training object corresponding to a sub-feature of the training sample is less than or equal to the average pixel value of the training sample, labeling that sub-feature of the training sample as invisible.
In one embodiment, the computer program, when executed by a processor, performs the steps of:
extracting features of the training sample through the neural network to obtain the features of the training sample, and dividing the features of the training sample into a plurality of sub-features;
convolving each sub-feature of the training sample with a convolution layer whose weights are not shared, and obtaining a one-dimensional feature vector corresponding to each sub-feature of the training sample through a global average pooling layer and a fully connected layer;
obtaining the visibility confidence corresponding to each sub-feature of the training sample by applying an activation function to the one-dimensional feature vector corresponding to that sub-feature;
and training the neural network according to the visibility labels of the training samples and the visibility confidences corresponding to each sub-feature of the training samples to obtain the prediction neural network.
In one embodiment, the computer program, when executed by a processor, performs the steps of:
weighting each of the plurality of sub-features of the identification image by the visibility confidence corresponding to that sub-feature to obtain a plurality of intermediate features;
and summing the plurality of intermediate features to obtain the features of the identification image.
In one embodiment, the computer program, when executed by a processor, performs the steps of:
extracting the features of each candidate image in the image database to obtain the features of each candidate image;
finding, among the features of the candidate images, a target feature that matches the features of the identification image;
and determining the candidate image corresponding to the target feature as the target image.
In one embodiment, the features of the identification image are whole-body features of a pedestrian, and the plurality of sub-features of the identification image include an upper-body feature and a lower-body feature of the pedestrian.
Those skilled in the art will appreciate that all or part of the processes of the methods in the above embodiments may be implemented by a computer program instructing relevant hardware; the program may be stored on a non-transitory computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of these technical features has been described; nevertheless, any combination of them that involves no contradiction should be regarded as falling within the scope of this specification.
The foregoing embodiments represent only a few implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art may make various modifications and improvements without departing from the concept of the present application, and all such modifications and improvements fall within the protection scope of the present application. Accordingly, the protection scope of the present application shall be determined by the appended claims.

Claims (9)

1. A pedestrian re-recognition processing method, the method comprising:
inputting an identification image to be re-identified into a pre-trained prediction neural network to obtain a plurality of sub-features of the identification image output by the prediction neural network and a visibility confidence corresponding to each sub-feature of the identification image; wherein the identification image contains a target object, the plurality of sub-features of the identification image correspond to a plurality of partial regions of the target object, and the plurality of partial regions of the target object together constitute the target object; the visibility confidence indicates the probability that the partial region of the target object corresponding to each sub-feature of the identification image is not occluded;
determining features of the identification image according to the plurality of sub-features of the identification image and the visibility confidence corresponding to each sub-feature of the identification image;
searching a preset image database for a target image containing the target object according to the features of the identification image;
wherein, before the identification image to be re-identified is input into the pre-trained prediction neural network, the method further comprises:
acquiring a training sample set, wherein the training sample set comprises a plurality of training samples and visibility labels of the training samples; each training sample contains a training object, a plurality of sub-features of the training sample correspond to a plurality of partial regions of the training object, the plurality of partial regions of the training object together constitute the training object, and the visibility label indicates whether the partial region of the training object corresponding to each sub-feature of the training sample is visible;
extracting features of the training sample through the neural network to obtain the features of the training sample, and dividing the features of the training sample into a plurality of sub-features;
convolving each sub-feature of the training sample with a convolution layer whose weights are not shared, and obtaining a one-dimensional feature vector corresponding to each sub-feature of the training sample through a global average pooling layer and a fully connected layer;
obtaining the visibility confidence corresponding to each sub-feature of the training sample by applying an activation function to the one-dimensional feature vector corresponding to that sub-feature;
and training the neural network according to the visibility labels of the training samples and the visibility confidences corresponding to each sub-feature of the training samples to obtain the prediction neural network.
2. The method of claim 1, wherein acquiring the training sample set comprises:
acquiring a plurality of training samples;
and labeling each sub-feature of each training sample according to pixel values to obtain the visibility label of each training sample.
3. The method according to claim 2, wherein labeling each sub-feature of each training sample according to pixel values to obtain the visibility label of each training sample comprises:
calculating the average pixel value of the training sample;
if the average pixel value of the partial region of the training object corresponding to a sub-feature of the training sample is greater than the average pixel value of the training sample, labeling that sub-feature of the training sample as visible;
and if the average pixel value of the partial region of the training object corresponding to a sub-feature of the training sample is less than or equal to the average pixel value of the training sample, labeling that sub-feature of the training sample as invisible.
4. The method of claim 1, wherein determining the features of the identification image according to the plurality of sub-features of the identification image and the visibility confidence corresponding to each sub-feature of the identification image comprises:
weighting each of the plurality of sub-features of the identification image by the visibility confidence corresponding to that sub-feature to obtain a plurality of intermediate features;
and summing the plurality of intermediate features to obtain the features of the identification image.
5. The method according to claim 1, wherein searching the preset image database for the target image containing the target object according to the features of the identification image comprises:
extracting the features of each candidate image in the image database to obtain the features of each candidate image;
finding, among the features of the candidate images, a target feature that matches the features of the identification image;
and determining the candidate image corresponding to the target feature as the target image.
6. The method of claim 1, wherein the features of the identification image are whole-body features of a pedestrian, and the plurality of sub-features of the identification image include an upper-body feature and a lower-body feature of the pedestrian.
7. A pedestrian re-recognition processing apparatus, characterized in that the apparatus comprises:
a visibility prediction module, configured to input an identification image to be re-identified into a pre-trained prediction neural network to obtain a plurality of sub-features of the identification image output by the prediction neural network and a visibility confidence corresponding to each sub-feature of the identification image; wherein the identification image contains a target object, the plurality of sub-features of the identification image correspond to a plurality of partial regions of the target object, and the plurality of partial regions of the target object together constitute the target object; the visibility confidence indicates the probability that the partial region of the target object corresponding to each sub-feature of the identification image is not occluded;
an identification image feature determining module, configured to determine the features of the identification image according to the plurality of sub-features of the identification image and the visibility confidence corresponding to each sub-feature of the identification image;
and a target image searching module, configured to search a preset image database for a target image containing the target object according to the features of the identification image;
wherein the apparatus further comprises:
a training sample set acquisition module, configured to acquire a training sample set, wherein the training sample set comprises a plurality of training samples and visibility labels of the training samples; each training sample contains a training object, a plurality of sub-features of the training sample correspond to a plurality of partial regions of the training object, the plurality of partial regions of the training object together constitute the training object, and the visibility label indicates whether the partial region of the training object corresponding to each sub-feature of the training sample is visible;
and a training module, configured to extract features of the training sample through the neural network to obtain the features of the training sample and divide the features of the training sample into a plurality of sub-features; convolve each sub-feature of the training sample with a convolution layer whose weights are not shared, and obtain a one-dimensional feature vector corresponding to each sub-feature of the training sample through a global average pooling layer and a fully connected layer; obtain the visibility confidence corresponding to each sub-feature of the training sample by applying an activation function to the one-dimensional feature vector corresponding to that sub-feature; and train the neural network according to the visibility labels of the training samples and the visibility confidences corresponding to each sub-feature of the training samples to obtain the prediction neural network.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 6 when the computer program is executed.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
CN201911309205.7A 2019-12-18 2019-12-18 Pedestrian re-recognition processing method, device, computer equipment and storage medium Active CN111191533B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911309205.7A CN111191533B (en) 2019-12-18 2019-12-18 Pedestrian re-recognition processing method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111191533A CN111191533A (en) 2020-05-22
CN111191533B (en) 2024-03-19

Family

ID=70709983

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911309205.7A Active CN111191533B (en) 2019-12-18 2019-12-18 Pedestrian re-recognition processing method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111191533B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528864A (en) * 2020-12-14 2021-03-19 北京百度网讯科技有限公司 Model generation method and device, electronic equipment and storage medium
CN112614214B (en) * 2020-12-18 2023-10-27 北京达佳互联信息技术有限公司 Motion capture method, motion capture device, electronic equipment and storage medium
CN112257692B (en) * 2020-12-22 2021-03-12 湖北亿咖通科技有限公司 Pedestrian target detection method, electronic device and storage medium
CN112784835B (en) * 2021-01-21 2024-04-12 恒安嘉新(北京)科技股份公司 Method and device for identifying authenticity of circular seal, electronic equipment and storage medium
CN113221922B (en) * 2021-05-31 2023-02-03 深圳市商汤科技有限公司 Image processing method and related device
CN113516093A (en) * 2021-07-27 2021-10-19 浙江大华技术股份有限公司 Marking method and device of identification information, storage medium and electronic device


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103514432B * 2012-06-25 2017-09-01 Nokia Technologies Oy Face feature extraction method, device, and computer program product

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108932456A * 2017-05-23 2018-12-04 Beijing Megvii Technology Co Ltd Face recognition method, apparatus and system, and storage medium
CN107463920A * 2017-08-21 2017-12-12 Jilin University Face recognition method for eliminating the influence of partial occlusions
CN107798228A * 2017-09-27 2018-03-13 Vivo Mobile Communication Co Ltd Face recognition method and mobile terminal
CN108875533A * 2018-01-29 2018-11-23 Beijing Megvii Technology Co Ltd Face recognition method, apparatus, system, and computer storage medium
CN108875540A * 2018-03-12 2018-11-23 Beijing Megvii Technology Co Ltd Image processing method, apparatus and system, and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Ran He, Wei-Shi Zheng, Bao-Gang Hu. Maximum Correntropy Criterion for Robust Face Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, Vol. 33, No. 8. *
Qiao Rui, Li Jing. Face recognition with occluded regions detected using FW-PCA. Chinese Journal of Quantum Electronics, 2015, Vol. 32, No. 3, pp. 270-277. *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant