CN112149470A - Pedestrian re-identification method and device - Google Patents


Info

Publication number
CN112149470A
Authority
CN
China
Prior art keywords
image
human body
recognized
data
preset
Prior art date
Legal status
Granted
Application number
CN201910575126.4A
Other languages
Chinese (zh)
Other versions
CN112149470B (en)
Inventor
魏艾
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201910575126.4A
Publication of CN112149470A
Application granted
Publication of CN112149470B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 - Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 - Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 - Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application provides a pedestrian re-identification method and device. The method comprises the following steps: obtaining an image to be recognized that contains a picture of a person; performing human body feature extraction on the image to be recognized to obtain human body feature data of the image to be recognized; performing image segmentation on the image to be recognized to obtain image segmentation data of the image to be recognized; determining the image type of the image to be recognized according to the image segmentation data; if the image type is a human body partial image, obtaining human body local feature data of the image to be recognized based on a pre-trained local feature extraction network corresponding to the image type, the image segmentation data, and the human body feature data; and determining whether the person in the image to be recognized is the person represented by preset human body local feature data based on a result of matching the human body local feature data of the image to be recognized against the preset human body local feature data. Based on the above processing, the accuracy of the pedestrian re-identification result can be improved.

Description

Pedestrian re-identification method and device
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a pedestrian re-identification method and apparatus.
Background
Pedestrian re-identification (person re-identification, also known as person re-ID) is the process of determining, by means of image processing techniques, whether the person displayed in an image (which may be referred to as the image to be recognized) is a specific person. It is widely applied in fields such as intelligent video surveillance and intelligent security.
In the related art, pedestrian re-identification may include the following steps: performing human body feature extraction on the image to be recognized to obtain human body feature data of the image to be recognized (which may be referred to as the human body feature data to be recognized); matching the human body feature data to be recognized against preset human body feature data; and determining, according to the matching result, whether the person in the image to be recognized is the person represented by the preset human body feature data.
In the above process, if the picture of the person in the image to be recognized is incomplete, the human body feature data to be recognized cannot effectively represent the local features of the person. For example, when only the upper body of a person is displayed in the image to be recognized, the human body feature data extracted from that image cannot effectively represent the person's upper-body features. Therefore, performing pedestrian re-identification directly on such human body feature data may yield a result of low accuracy.
Disclosure of Invention
The embodiment of the application aims to provide a pedestrian re-identification method and device, which can improve the accuracy of a pedestrian re-identification result. The specific technical scheme is as follows:
in a first aspect, in order to achieve the above object, an embodiment of the present application discloses a pedestrian re-identification method, including:
acquiring an image to be identified containing a figure picture;
extracting human body characteristics of the image to be recognized to obtain human body characteristic data of the image to be recognized;
performing image segmentation on the image to be recognized to obtain image segmentation data of the image to be recognized, wherein the image segmentation data is used for representing human body parts to which pixel points contained in the figure picture belong;
determining the image type of the image to be identified according to the image segmentation data;
if the image type is a human body partial image, obtaining human body local feature data of the image to be recognized based on a pre-trained local feature extraction network corresponding to the image type, the image segmentation data, and the human body feature data;
and determining whether the person in the image to be recognized is the person represented by the preset human body local characteristic data or not based on the matching result of the human body local characteristic data of the image to be recognized and the preset human body local characteristic data.
Optionally, the local feature extraction network includes a squeeze-and-excitation network (SENet);
the obtaining of the human body local feature data of the image to be recognized based on the pre-trained local feature extraction network corresponding to the image type, the image segmentation data, and the human body feature data comprises:
inputting the human body feature data into the SENet to obtain output data of the SENet;
performing feature fusion on the output data of the SENet and the image segmentation data, and performing convolution processing on the feature fusion result;
and performing weighting processing on the output data of the SENet based on the convolution processing result to obtain the human body local feature data of the image to be recognized.
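The three steps above can be sketched as follows. This is a minimal NumPy illustration, assuming a ReLU bottleneck inside the SE block and a 1x1 convolution that produces a single spatial weight map; the patent does not fix these details, so treat the shapes and activations as assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(features, w1, w2):
    # Squeeze: global-average-pool each channel of (C, H, W) to one value.
    squeezed = features.mean(axis=(1, 2))                      # (C,)
    # Excite: two-layer bottleneck (ReLU then sigmoid), then rescale channels.
    excitation = sigmoid(w2 @ np.maximum(w1 @ squeezed, 0.0))  # (C,)
    return features * excitation[:, None, None]

def local_feature_data(features, seg_mask, w1, w2, conv_w):
    se_out = se_block(features, w1, w2)                        # (C, H, W)
    # Feature fusion: stack the segmentation mask onto the SE output.
    fused = np.concatenate([se_out, seg_mask[None]], axis=0)   # (C+1, H, W)
    # 1x1 convolution over the fused tensor yields one spatial weight map.
    weights = sigmoid(np.tensordot(conv_w, fused, axes=1))     # (H, W)
    # Weight the SE output with the resulting spatial attention.
    return se_out * weights[None, :, :]
```

In a trained network `w1`, `w2`, and `conv_w` would be learned parameters; here they are just placeholders showing the data flow.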
Optionally, the loss function of the local feature extraction network includes the difference between image feature data output by a preset network model and the human body local feature data output by the local feature extraction network, as well as the difference between the actual output result of the local feature extraction network and its expected output result. The training samples of the preset network model include images of the same image type as the image to be recognized, and the preset network model is used to obtain human body local feature data for images of that image type.
Optionally, the determining, based on a matching result of the local human feature data of the image to be recognized and preset local human feature data, whether a person in the image to be recognized is a person represented by the preset local human feature data includes:
calculating the similarity between the human body local feature data of the image to be recognized and preset human body local feature data;
and determining whether the person in the image to be recognized is the person represented by the preset human body local feature data or not according to the obtained similarity.
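A minimal sketch of this matching step, assuming cosine similarity and an illustrative decision threshold (the text specifies neither the similarity measure nor a threshold value):

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two feature vectors, in [-1, 1].
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_same_person(query_feat, preset_feat, threshold=0.8):
    # Declare a match when the similarity reaches the threshold;
    # the value 0.8 is an assumption for illustration only.
    return cosine_similarity(query_feat, preset_feat) >= threshold
```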
Optionally, the method further includes:
and if the image type of the image to be recognized is the whole human body image, determining whether the person in the image to be recognized is the person represented by the preset human body characteristic data or not according to the human body characteristic data of the image to be recognized and the preset human body characteristic data.
Optionally, the determining the image type of the image to be recognized according to the image segmentation data includes:
if the proportion, in the image to be recognized, of the pixel points of the character picture belonging to the upper half of the human body is greater than or equal to a first preset threshold, and the proportion of the pixel points belonging to the lower half of the human body is smaller than a second preset threshold, determining that the image type of the image to be recognized is an upper-body partial image;
if the proportion of the pixel points belonging to the upper half of the human body is smaller than the first preset threshold, and the proportion of the pixel points belonging to the lower half of the human body is greater than or equal to the second preset threshold, determining that the image type of the image to be recognized is a lower-body partial image;
and if the proportion of the pixel points belonging to the upper half of the human body is greater than or equal to the first preset threshold, and the proportion of the pixel points belonging to the lower half of the human body is greater than or equal to the second preset threshold, determining that the image type of the image to be recognized is a whole human body image.
In order to achieve the above object, an embodiment of the present application discloses a pedestrian re-identification apparatus, including:
the acquisition module is used for acquiring an image to be identified containing a figure picture;
the extraction module is used for extracting human body characteristics of the image to be recognized to obtain human body characteristic data of the image to be recognized;
the segmentation module is used for carrying out image segmentation on the image to be recognized to obtain image segmentation data of the image to be recognized, wherein the image segmentation data is used for representing a human body part to which pixel points contained in the figure picture belong;
the determining module is used for determining the image type of the image to be identified according to the image segmentation data;
the first processing module is used for obtaining the human body local feature data of the image to be recognized based on a local feature extraction network corresponding to the image type trained in advance, the image segmentation data and the human body feature data if the image type is a human body local image;
and the identification module is used for determining whether the person in the image to be identified is the person represented by the preset human body local characteristic data or not based on the matching result of the human body local characteristic data of the image to be identified and the preset human body local characteristic data.
Optionally, the local feature extraction network includes a squeeze-and-excitation network (SENet);
the first processing module is specifically configured to input the human body feature data into the SENet to obtain output data of the SENet;
perform feature fusion on the output data of the SENet and the image segmentation data, and perform convolution processing on the feature fusion result;
and perform weighting processing on the output data of the SENet based on the convolution processing result to obtain the human body local feature data of the image to be recognized.
Optionally, the loss function of the local feature extraction network includes the difference between image feature data output by a preset network model and the human body local feature data output by the local feature extraction network, as well as the difference between the actual output result of the local feature extraction network and its expected output result. The training samples of the preset network model include images of the same image type as the image to be recognized, and the preset network model is used to obtain human body local feature data for images of that image type.
Optionally, the identification module is specifically configured to calculate a similarity between the human body local feature data of the image to be identified and preset human body local feature data;
and determining whether the person in the image to be recognized is the person represented by the preset human body local feature data or not according to the obtained similarity.
Optionally, the apparatus further comprises:
and the second processing module is used for determining whether the person in the image to be recognized is the person represented by the preset human body characteristic data or not according to the human body characteristic data of the image to be recognized and the preset human body characteristic data if the image type of the image to be recognized is the human body whole image.
Optionally, the determining module is specifically configured to determine that the image type of the image to be recognized is an upper-body partial image if the proportion, in the image to be recognized, of the pixel points of the character picture belonging to the upper half of the human body is greater than or equal to a first preset threshold and the proportion of the pixel points belonging to the lower half of the human body is smaller than a second preset threshold;
determine that the image type of the image to be recognized is a lower-body partial image if the proportion of the pixel points belonging to the upper half of the human body is smaller than the first preset threshold and the proportion of the pixel points belonging to the lower half of the human body is greater than or equal to the second preset threshold;
and determine that the image type of the image to be recognized is a whole human body image if the proportion of the pixel points belonging to the upper half of the human body is greater than or equal to the first preset threshold and the proportion of the pixel points belonging to the lower half of the human body is greater than or equal to the second preset threshold.
In another aspect of this application, in order to achieve the above object, an embodiment of this application further discloses an electronic device, which includes a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is used for storing a computer program;
the processor is configured to implement the pedestrian re-identification method according to the first aspect when executing the program stored in the memory.
In yet another aspect of this application, there is also provided a computer-readable storage medium having stored therein instructions which, when run on a computer, cause the computer to perform the pedestrian re-identification method according to the first aspect described above.
In yet another aspect of this application, this application embodiment further provides a computer program product containing instructions, which when run on a computer, causes the computer to execute the pedestrian re-identification method described in the above first aspect.
The embodiment of the application provides a pedestrian re-identification method in which, if the image type of the image to be recognized is a human body partial image, the human body local feature data of the image to be recognized is obtained based on a pre-trained local feature extraction network corresponding to the image type, the image segmentation data of the image to be recognized, and the human body feature data; whether the person in the image to be recognized is the person represented by preset human body local feature data is then determined based on the result of matching the human body local feature data of the image to be recognized against the preset human body local feature data. Because the human body local feature data obtained through the local feature extraction network, which draws on both the image segmentation data and the human body feature data, can effectively reflect the local features of the person in the image, performing pedestrian re-identification according to the human body local feature data of the image to be recognized and the preset human body local feature data can improve the accuracy of the pedestrian re-identification result, compared with the prior art in which pedestrian re-identification is performed directly according to the human body feature data of the image to be recognized.
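Tying the steps above together, the overall decision flow can be sketched as follows. Every function body here is a placeholder stub standing in for a trained network or matcher, not the patent's implementation; only the control flow mirrors the described method.

```python
# Stub components; in a real system each would be a trained network.
def extract_human_features(image):
    return [sum(image) / len(image)]              # placeholder feature vector

def segment_image(image):
    # Placeholder per-pixel labels for a 1-D "image" of pixel values.
    return ["upper" if p > 0 else "lower" for p in image]

def determine_image_type(seg):
    upper = seg.count("upper") / len(seg)
    lower = seg.count("lower") / len(seg)
    if upper >= 0.2 and lower < 0.2:
        return "upper_body_partial"
    if upper < 0.2 and lower >= 0.2:
        return "lower_body_partial"
    return "whole_body"

def extract_local_features(image_type, feats, seg):
    return feats                                  # placeholder local features

def match(feats, preset):
    return feats == preset                        # placeholder matcher

def reidentify(image, preset_local, preset_global):
    feats = extract_human_features(image)
    seg = segment_image(image)
    image_type = determine_image_type(seg)
    if image_type.endswith("partial"):
        # Partial image: match on local features from the type-specific network.
        local = extract_local_features(image_type, feats, seg)
        return match(local, preset_local)
    # Whole-body image: match directly on the global human body features.
    return match(feats, preset_global)
```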
Of course, not all advantages described above need to be achieved at the same time in the practice of any one product or method of the present application.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart of a pedestrian re-identification method according to an embodiment of the present application;
fig. 2 is a flowchart of a pedestrian re-identification method according to an embodiment of the present application;
fig. 3 is a structural diagram of a SENet according to an embodiment of the present application;
fig. 4 is a structural diagram of a spatial selection network according to an embodiment of the present application;
fig. 5 is an architecture diagram of a feature extraction network according to an embodiment of the present application;
fig. 6 is a flowchart of an example of a pedestrian re-identification method according to an embodiment of the present application;
fig. 7 is a structural diagram of a pedestrian re-identification apparatus according to an embodiment of the present application;
fig. 8 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In the prior art, if the person displayed in the image to be recognized is incomplete, the human body feature data obtained by feature extraction on the image to be recognized cannot effectively reflect the local features of the person; as a result, performing pedestrian re-identification directly according to the human body feature data may yield a result of low accuracy.
In order to solve the above problem, an embodiment of the present application provides a pedestrian re-identification method, which may be applied to an electronic device, where the electronic device may be a terminal or a server, and the electronic device is used to perform pedestrian re-identification on an image to be identified.
After the electronic device obtains the image to be recognized, the electronic device may perform image segmentation on the image to be recognized to obtain image segmentation data of the image to be recognized, and perform human body feature extraction on the image to be recognized to obtain human body feature data of the image to be recognized.
Then, the electronic device may determine the image type of the image to be recognized according to the image segmentation data of the image to be recognized. If the image type of the image to be recognized is a human body partial image, the electronic device may obtain the human body local feature data of the image to be recognized based on a pre-trained local feature extraction network corresponding to the image type, the image segmentation data of the image to be recognized, and the human body feature data.
Furthermore, the electronic device may determine whether the person in the image to be recognized is the person represented by the preset human body local feature data based on a matching result of the human body local feature data of the image to be recognized and the preset human body local feature data.
Because the human body local feature data obtained through the local feature extraction network, which draws on the image segmentation data and the human body feature data, can effectively reflect the local features of the person in the image, the electronic device performs pedestrian re-identification according to the human body local feature data of the image to be recognized and the preset human body local feature data; compared with the prior art, in which pedestrian re-identification is performed directly according to the human body feature data of the image to be recognized, this can improve the accuracy of the pedestrian re-identification result.
Referring to fig. 1, fig. 1 is a flowchart of a pedestrian re-identification method provided in an embodiment of the present application, where the method may include the following steps:
s101: and acquiring the image to be identified containing the figure picture.
The character picture in the image to be recognized may be a complete character picture or an incomplete character picture, for example, the character picture in the image to be recognized may be a picture of the upper half of the character, or the character picture in the image to be recognized may be a picture of the lower half of the character.
The number of images to be recognized may be one or more. For example, the image to be recognized may be a single image frame of a surveillance video, or the images to be recognized may be a plurality of image frames of a surveillance video.
In the embodiment of the application, when a user needs to judge whether a person displayed in a certain image (i.e. an image to be recognized) is a specific person, the user can input the image to be recognized into the electronic device. For example, the user may input the image to be recognized to the electronic device through an input section of the electronic device.
Correspondingly, the electronic device can acquire the image to be identified, and further, the electronic device can judge whether the person in the image to be identified is the specific person according to the pedestrian re-identification method of the embodiment of the application.
If the image to be recognized is an image frame, the electronic device can process the image to be recognized according to the pedestrian re-recognition method of the embodiment of the application; if the image to be recognized is a plurality of image frames, the electronic device may sequentially process each image to be recognized according to the pedestrian re-recognition method of the embodiment of the application.
S102: and extracting human body characteristics of the image to be recognized to obtain human body characteristic data of the image to be recognized.
In the embodiment of the application, after the electronic device acquires the image to be recognized, the electronic device can extract the human body feature of the image to be recognized, and then the human body feature data of the image to be recognized can be obtained.
In this step, the method by which the electronic device extracts human body features from the image to be recognized may be the same as in the prior art. In one implementation, the electronic device may process the image to be recognized with a pre-trained human body feature extraction network to obtain the human body feature data of the image to be recognized. The human body feature extraction network may be a ResNet-50 (a 50-layer residual neural network) or another neural network.
S103: and carrying out image segmentation on the image to be identified to obtain image segmentation data of the image to be identified.
The image segmentation data can be used for representing human body parts to which pixel points contained in the human figure picture belong.
In the embodiment of the application, after the electronic device acquires the image to be recognized, the electronic device can also perform feature extraction on the image to be recognized, and further, image segmentation data of the image to be recognized can be obtained.
In one implementation, according to a preset image segmentation method, the electronic device may use the identifier a to represent the pixel points belonging to the upper half of the human body in the image to be recognized, and use the identifier B to represent the pixel points belonging to the lower half of the human body in the image to be recognized, so as to obtain the image segmentation data of the image to be recognized.
In another implementation, according to a preset image segmentation method, the electronic device may use the identifier C to represent the pixel points belonging to the head of the human body in the image to be recognized, use the identifier D to represent the pixel points belonging to the trunk of the human body in the image to be recognized, and use the identifier E to represent the pixel points belonging to the limbs of the human body in the image to be recognized, so as to obtain image segmentation data of the image to be recognized.
The method for the electronic device to perform image segmentation on the image to be recognized is not limited to this, and in actual operation, the method for the electronic device to perform image segmentation on the image to be recognized may be determined by a user according to business requirements and experience.
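A label map of the kind described above might be produced from per-pixel class scores emitted by a segmentation network. The following is a minimal sketch, with hypothetical numeric identifiers standing in for the labels A/B (or C/D/E) mentioned in the text:

```python
import numpy as np

# Hypothetical numeric identifiers for the two-part labeling scheme.
BACKGROUND, UPPER_BODY, LOWER_BODY = 0, 1, 2

def segmentation_data_from_scores(scores):
    # scores: (num_classes, H, W) per-pixel class scores from a
    # segmentation network; each pixel is assigned the identifier of
    # its highest-scoring class.
    return scores.argmax(axis=0)
```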
S104: and determining the image type of the image to be identified according to the image segmentation data.
The image types may include a partial image of a human body and a whole image of the human body. The human body partial image can be further divided into an upper body partial image and a lower body partial image.
In the embodiment of the application, after the electronic device acquires the image segmentation data of the image to be recognized, the electronic device may determine the image type of the image to be recognized according to the image segmentation data of the image to be recognized.
Optionally, the electronic device may determine the image type of the image to be recognized according to the number of pixel points belonging to the upper half of the human body and the number of pixel points belonging to the lower half of the human body in the image to be recognized, that is, S104 may include the following three cases:
in the first case, if the proportion of the pixel points belonging to the upper half of the human body in the character picture in the image to be recognized is greater than or equal to the first preset threshold, and the proportion of the pixel points belonging to the lower half of the human body in the character picture in the image to be recognized is less than the second preset threshold, it is determined that the image type of the image to be recognized is the upper half human body partial image.
In the second case, if the proportion of the pixel points belonging to the upper half of the human body in the character picture in the image to be recognized is smaller than the first preset threshold, and the proportion of the pixel points belonging to the lower half of the human body in the character picture in the image to be recognized is greater than or equal to the second preset threshold, it is determined that the image type of the image to be recognized is the local image of the lower half of the human body.
In the third case, if the proportion of the pixel points belonging to the upper half of the human body in the character picture in the image to be recognized is greater than or equal to the first preset threshold, and the proportion of the pixel points belonging to the lower half of the human body in the character picture is greater than or equal to the second preset threshold, it is determined that the image type of the image to be recognized is the whole human body image.
The first preset threshold and the second preset threshold may be set by a user according to experience.
In one implementation, the electronic device may count the number of pixel points belonging to the upper half of the human body (which may be referred to as a first number) and the number of pixel points belonging to the lower half of the human body (which may be referred to as a second number) in the image to be recognized according to image segmentation data of the image to be recognized.
If the ratio of the first number to the total number of the pixel points in the image to be recognized is greater than or equal to a first preset threshold value, and the ratio of the second number to the total number of the pixel points in the image to be recognized is less than a second preset threshold value, the electronic device may determine that the image to be recognized is the upper half human body partial image.
If the ratio of the first number to the total number of the pixel points in the image to be recognized is smaller than a first preset threshold value, and the ratio of the second number to the total number of the pixel points in the image to be recognized is greater than or equal to a second preset threshold value, the electronic device may determine that the image to be recognized is a lower half body partial image.
If the ratio of the first number to the total number of the pixel points in the image to be recognized is greater than or equal to a first preset threshold value, and the ratio of the second number to the total number of the pixel points in the image to be recognized is greater than or equal to a second preset threshold value, the electronic device may determine that the image to be recognized is the whole human body image.
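The three cases above can be sketched in Python. The identifiers, threshold values, and function name below are illustrative assumptions, not part of the patent:

```python
def classify_image_type(segmentation, upper_id="A", lower_id="B",
                        t_upper=0.3, t_lower=0.3):
    """Classify an image as upper-body partial, lower-body partial, or whole-body.

    segmentation: iterable of per-pixel identifiers, e.g. "A" for pixels
    belonging to the upper half of the human body, "B" for the lower half,
    and None for background pixels.
    t_upper / t_lower: the first and second preset thresholds (the patent
    leaves their values to user experience; 0.3 is purely illustrative).
    """
    total = len(segmentation)
    first = sum(1 for p in segmentation if p == upper_id)   # first number
    second = sum(1 for p in segmentation if p == lower_id)  # second number
    upper_ok = first / total >= t_upper
    lower_ok = second / total >= t_lower
    if upper_ok and not lower_ok:
        return "upper_partial"
    if lower_ok and not upper_ok:
        return "lower_partial"
    if upper_ok and lower_ok:
        return "whole_body"
    return "unknown"  # neither ratio reaches its threshold
```

For instance, a crop in which 70% of pixels are upper-body and 10% lower-body would be classified as an upper-body partial image under these thresholds.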
S105: and if the image type is a human body local image, obtaining the human body local feature data of the image to be recognized based on a pre-trained local feature extraction network corresponding to the image type, the image segmentation data and the human body feature data.
In the embodiment of the application, when the electronic device determines that the image to be recognized is a human body local image, the electronic device may process the image segmentation data and the human body feature data of the image to be recognized according to the local feature extraction network corresponding to the image type of the image to be recognized, so as to obtain the human body local feature data of the image to be recognized.
It can be understood that, if the image to be recognized is the upper half body human body partial image, the electronic device may process the image segmentation data and the human body feature data of the image to be recognized according to a local feature extraction network (which may be referred to as a first local feature extraction network) corresponding to the upper half body human body partial image, so as to obtain the human body partial feature data of the image to be recognized. The obtained human body local feature data (which can be called as upper half body feature data) can effectively represent the upper half body features of the people in the image to be recognized.
If the image to be recognized is a partial image of the lower body, the electronic device may process the image segmentation data and the human body feature data of the image to be recognized according to a local feature extraction network (which may be referred to as a second local feature extraction network) corresponding to the partial image of the lower body, so as to obtain the human body partial feature data of the image to be recognized. The obtained human body local feature data (which may be referred to as lower-body feature data) can effectively represent the lower-body features of the person in the image to be recognized.
S106: and determining whether the person in the image to be recognized is the person represented by the preset human body local characteristic data or not based on the matching result of the human body local characteristic data of the image to be recognized and the preset human body local characteristic data.
The preset human body local feature data may be human body local feature data of a specific person image. The specific person image may be a person image in a preset gallery.
In the embodiment of the application, after the electronic device obtains the human body local feature data of the image to be recognized, the electronic device may match the human body local feature data of the image to be recognized with the preset human body local feature data, and determine whether a person in the image to be recognized is the person represented by the preset human body local feature data according to the matching result.
In one implementation, for each person image in the preset gallery, the electronic device may perform human feature extraction on the person image to obtain human feature data of the person image, and perform image segmentation on the person image to obtain image segmentation data of the person image.
The electronic device may then process the human body feature data and the image segmentation data of the human image to obtain upper-half body feature data of the human image based on the first local feature extraction network, process the human body feature data and the image segmentation data of the human image to obtain lower-half body feature data of the human image based on the second local feature extraction network, and further obtain upper-half body feature data and lower-half body feature data of each human image in a preset gallery.
If the image to be recognized is the upper half body human body local image, the electronic equipment can match the upper half body characteristic data of the image to be recognized with the upper half body characteristic data of each human body image in the preset image library, and determine whether the person in the image to be recognized is the person to which the human body image in the preset image library belongs according to the matching result.
If the image to be recognized is a partial image of the lower body of the human body, the electronic device may match the lower body feature data of the image to be recognized with the lower body feature data of each human image in the preset image library, and determine whether the person in the image to be recognized is the person to which the human image in the preset image library belongs according to the matching result.
Optionally, the electronic device may perform pedestrian re-identification according to the similarity of the feature data, and referring to fig. 2, S106 may include the following steps:
S1061: and calculating the similarity between the human body local characteristic data of the image to be recognized and the preset human body local characteristic data.
In the embodiment of the application, after the electronic device acquires the human body local feature data of the image to be recognized, the electronic device may calculate the similarity between the human body local feature data of the image to be recognized and the preset human body local feature data, so as to facilitate subsequent processing.
In this step, the electronic device may calculate the similarity between the human body local feature data of the image to be recognized and the preset human body local feature data according to a preset similarity algorithm. For example, the electronic device may calculate the cosine similarity between the human body local feature data of the image to be recognized and the preset human body local feature data, and use it as the similarity between the two.
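As an illustration of the cosine-similarity option mentioned above, a minimal sketch over feature vectors (the function name and vector shapes are assumptions):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors.

    Returns a value in [-1, 1]; 1 means the vectors point in the
    same direction (most similar features).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```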
In one implementation, if the image to be recognized is an upper-half body local image, the electronic device may calculate a similarity between the upper-half body feature data of the image to be recognized and the upper-half body feature data of each human image in the preset gallery (which may be referred to as an upper-half body similarity).
If the image to be recognized is a partial image of the lower body of the human body, the electronic device may calculate the similarity between the lower body feature data of the image to be recognized and the lower body feature data of each human body image in the preset image library (which may be referred to as lower body similarity).
S1062: and determining whether the person in the image to be recognized is the person represented by the preset human body local feature data or not according to the obtained similarity.
In the embodiment of the application, the electronic device may determine whether the person in the image to be recognized is the person represented by the preset human body local feature data according to the similarity between the human body local feature data of the image to be recognized and the preset human body local feature data.
In one implementation, if the image to be recognized is a local image of an upper half body of a human body, and the calculated upper half body similarity includes an upper half body similarity greater than a first preset similarity threshold, the electronic device may determine that a person in the image to be recognized is a person to which a person image in a preset gallery belongs. In addition, the electronic equipment can also determine a person image with the highest similarity with the upper half-body feature data of the image to be recognized in the preset image library, and determine the person in the image to be recognized as the person in the person image.
If the image to be recognized is a partial image of the lower body of the human body, and the calculated lower-body similarities include a lower-body similarity greater than a second preset similarity threshold, the electronic device may determine that the person in the image to be recognized is the person to which a person image in the preset image library belongs. In addition, the electronic device can also determine the person image in the preset image library with the highest similarity to the lower-body feature data of the image to be recognized, and determine the person in the image to be recognized as the person in that person image.
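The threshold-plus-highest-similarity matching described in the two cases above might be sketched as follows (the function name, the similarity callback, and the return convention are illustrative assumptions):

```python
def match_person(query_feat, gallery_feats, sim_fn, threshold):
    """Return the index of the best-matching gallery person image, or None.

    gallery_feats: list of feature vectors, one per person image in the
    preset gallery.
    sim_fn: similarity function (e.g. cosine similarity).
    A match is reported only when the highest similarity exceeds the
    preset similarity threshold, mirroring the two cases above.
    """
    sims = [sim_fn(query_feat, g) for g in gallery_feats]
    best = max(range(len(sims)), key=lambda i: sims[i])  # highest similarity
    return best if sims[best] > threshold else None
```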
Optionally, to further improve the accuracy of pedestrian re-identification, the local feature extraction network may include a SeNet (Squeeze-and-Excitation Network). Accordingly, S105 may include the following steps:
step one, inputting human body characteristic data into SeNet to obtain output data of SeNet.
In the embodiment of the application, the electronic device can input the human body feature data of the image to be recognized into the pre-trained SeNet to obtain the output data of the SeNet.
The structure of SeNet can be seen in fig. 3. The human body feature data of the image to be recognized may include C feature maps, each of which is H × W. The number of channels of the neural network for extracting the human body features of the image to be recognized can be represented by C, and the numerical value of C can be set by a user according to experience. H is the height of the image to be recognized, and W is the width of the image to be recognized.
The C H × W feature maps are subjected to global pooling to obtain a 1 × 1 × C vector. The 1 × 1 × C vector passes through a Fully Connected layer (FC) to obtain a 1 × 1 × C/r vector, where r is a preset coefficient whose value may be set by a user according to experience. Then, the 1 × 1 × C/r vector passes through a Rectified Linear Unit (ReLU) and another fully connected layer to obtain a 1 × 1 × C vector. The resulting 1 × 1 × C vector is processed according to the activation function, still resulting in a 1 × 1 × C vector. Then, the human body feature data of the image to be recognized can be weighted (Scale) according to the 1 × 1 × C vector obtained from the activation function, so as to obtain C weighted H × W feature maps. Wherein the activation function may be a Sigmoid function.
The C number values in the 1 × 1 × C vector obtained from the activation function may represent the weighted weights of the C H × W feature maps, respectively. In the weighting process, each H × W feature map may be multiplied by a weighting weight corresponding to the H × W feature map, and C H × W feature maps after the weighting process may be obtained.
Because different parts of the human body have different influences on different channels, the C weighting weights obtained by the pre-trained SeNet can reflect the influence degree of the different parts of the human body on the channels, and further, the characteristic maps are weighted on the channels according to the different parts of the human body, and the C H multiplied by W characteristic maps after weighting can more effectively reflect the characteristics of the different parts of the human body.
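A minimal numeric sketch of the squeeze, excitation, and scale operations described above. The FC weights here are untrained placeholders and biases are omitted; a real SeNet learns these parameters:

```python
import numpy as np

def se_channel_weighting(feats, w1, w2):
    """Squeeze-and-Excitation channel weighting (inference sketch).

    feats: array of shape (C, H, W) -- the C H×W feature maps.
    w1: shape (C, C//r) and w2: shape (C//r, C) -- illustrative FC
    weights for the two fully connected layers (biases omitted).
    """
    squeeze = feats.mean(axis=(1, 2))           # global pooling -> (C,)
    hidden = np.maximum(squeeze @ w1, 0.0)      # FC + ReLU -> (C/r,)
    logits = hidden @ w2                        # second FC -> (C,)
    weights = 1.0 / (1.0 + np.exp(-logits))     # Sigmoid activation
    # Scale: multiply each H×W feature map by its channel weight
    return feats * weights[:, None, None]
```

With zero weights, the Sigmoid outputs 0.5 for every channel, so each feature map is simply halved; trained weights would instead emphasize the channels most affected by the relevant human body part.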
And step two, performing feature fusion on the output data of the SeNet and the image segmentation data, and performing convolution processing on a feature fusion result.
In the embodiment of the application, after the electronic device obtains the output data of SeNet, the electronic device can perform feature fusion on the output data of SeNet and the image segmentation data of the image to be recognized, and perform convolution processing on the result of the feature fusion.
In one implementation, the electronic device may input the output data of the SeNet and the image segmentation data of the image to be recognized to a fusion (fusion) layer of the network for feature fusion, and then, the electronic device may input a result of the feature fusion to a Convolution (Convolution) layer of the network to perform Convolution processing on the result of the feature fusion and obtain a result of the Convolution processing.
And thirdly, weighting the output data of the SeNet based on the convolution processing result to obtain the human body local characteristic data of the image to be recognized.
In the embodiment of the application, after the result of the convolution processing is obtained, the electronic device may perform weighting processing on the output data of the SeNet based on the result of the convolution processing to obtain the human body local feature data of the image to be recognized.
Based on the second step and the third step, the image segmentation data can represent different image areas (also referred to as different image spaces in the image to be recognized) in the image to be recognized, and the electronic device processes the image segmentation data and the human body feature data of the image to be recognized according to the neural network, so that the neural network can learn how to use the image segmentation data and the human body feature data of the image, spatially weight the feature map according to different parts of the human body, and the obtained human body local feature data can effectively represent the local features of the human body.
In one implementation, the network part corresponding to the second step and the third step (which may be referred to as a spatial selection network) can be seen in fig. 4. The feature data may be the C weighted H × W feature maps obtained in the first step, and the image segmentation data may be an H × W × 1 vector. After the fusion layer and the convolution layer, an H × W × 1 vector can be obtained. The H × W values in the obtained vector may respectively represent the weighting weight of each pixel point in the feature map.
For each feature map obtained in the first step, the electronic device may multiply the feature value of each pixel in the feature map by the weighting weight corresponding to the pixel in the H × W × 1 vector obtained in the second step, so as to obtain a feature map after weighting processing.
Then, the electronic device can obtain the characteristic map subjected to channel weighting and spatial weighting, and the obtained characteristic map can effectively embody the local characteristics of the human body.
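The per-pixel spatial weighting of the third step amounts to broadcasting one H × W weight map across all C channels (the function name is an assumption; shapes follow the description of fig. 4):

```python
import numpy as np

def spatial_weighting(feats, weight_map):
    """Spatially weight C feature maps with one H×W weight map.

    feats: (C, H, W) channel-weighted feature maps from step one.
    weight_map: (H, W) per-pixel weighting weights produced by fusing
    and convolving the image segmentation data (values illustrative).
    Each pixel's feature value is multiplied by its weighting weight.
    """
    return feats * weight_map[None, :, :]
```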
It can be understood that, in the actual operation, after the feature map is obtained, the electronic device may further perform pooling, feature encoding, and other operations on the feature map to obtain corresponding character features.
In addition, in order to further improve the accuracy of the local feature extraction network, the electronic device may further instruct a training process of the local feature extraction network based on a teacher network.
Optionally, the loss function of the local feature extraction network may include a difference between image feature data output by the preset network model and human body local feature data output by the local feature extraction network, and a difference between an actual output result of the local feature extraction network and an expected output result, the training sample of the preset network model includes an image having the same image type as the image to be recognized, and the preset network model is used to obtain the human body local feature data of the image having the same image type as the image to be recognized.
In the embodiment of the application, the electronic device may use the output result of a preset network model as a teacher signal for guiding the training process of the local feature extraction network. The preset network model has a more complex structure and can effectively extract human body local feature data. The preset network model may include a network part for extracting feature data of the upper half of the human body (which may be referred to as an upper-body teacher network), and a network part for extracting feature data of the lower half of the human body (which may be referred to as a lower-body teacher network).
The expected output result of the local feature extraction network may be a sample identifier of the training sample, where the sample identifier is used to represent a person to which the training sample belongs.
The electronic device may obtain a difference value between the image feature data output by the preset network model and the human body local feature data output by the local feature extraction network (which may be referred to as a first difference value), and a difference value between the actual output result and the expected output result of the local feature extraction network (which may be referred to as a second difference value), and train the local feature extraction network according to the first difference value and the second difference value. In one implementation, the first difference value may be represented by the Euclidean distance, and the second difference value may be represented by a softmax loss.
For example, the electronic device may train the upper-body teacher network, calculate the Euclidean distance between the image feature data output by the upper-body teacher network and the human body local feature data output by the first local feature extraction network, and the softmax loss between the actual output result and the expected output result of the first local feature extraction network, and train the first local feature extraction network according to the sum of the obtained Euclidean distance and softmax loss. The input data of the upper-body teacher network may be an upper half body human body local image of a pedestrian, the output data may be pedestrian information of the pedestrian, and the upper-body teacher network is used for acquiring upper half body feature data.
The electronic device may train the lower-body teacher network, calculate the Euclidean distance between the image feature data output by the lower-body teacher network and the human body local feature data output by the second local feature extraction network, and the softmax loss between the actual output result and the expected output result of the second local feature extraction network, and train the second local feature extraction network according to the sum of the obtained Euclidean distance and softmax loss. The input data of the lower-body teacher network may be a partial image of the lower body of a pedestrian, the output data may be pedestrian information of the pedestrian, and the lower-body teacher network is used for acquiring lower half body feature data.
Based on the processing, the upper body teacher network can effectively extract the characteristics of the upper body of the human body, and the upper body teacher network is used for guiding the first local characteristic extraction network to train, so that the first local characteristic extraction network can also effectively extract the characteristics of the upper body of the human body; similarly, the lower body teacher network can effectively extract the characteristics of the lower body of the human body, and the second local characteristic extraction network is guided to train by the lower body teacher network, so that the second local characteristic extraction network can also effectively extract the characteristics of the lower body of the human body.
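A minimal numeric sketch of the combined training loss described above: the Euclidean distance between teacher and student features plus a softmax (cross-entropy) loss against the sample identifier. Names, shapes, and the exact cross-entropy formulation are illustrative assumptions:

```python
import numpy as np

def distillation_loss(student_feat, teacher_feat, student_logits, label):
    """Sum of the two difference values used to train the student network.

    First difference value: Euclidean distance between the teacher's
    image feature data and the student's human body local feature data.
    Second difference value: softmax (cross-entropy) loss between the
    student's actual output (logits) and the expected sample identifier.
    """
    first = np.linalg.norm(np.asarray(student_feat) - np.asarray(teacher_feat))
    logits = np.asarray(student_logits, dtype=float)
    probs = np.exp(logits - logits.max())       # numerically stable softmax
    probs /= probs.sum()
    second = -np.log(probs[label])              # cross-entropy on true label
    return first + second
```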
In addition, the electronic device may further perform pedestrian re-identification on the whole human body, and optionally, the method may further include the following steps:
and if the image type of the image to be recognized is the whole human body image, determining whether the person in the image to be recognized is the person represented by the preset human body characteristic data or not according to the human body characteristic data of the image to be recognized and the preset human body characteristic data.
In the embodiment of the application, when the electronic device determines that the image to be recognized is the whole human body image, the electronic device may determine whether a person in the image to be recognized is a person represented by the preset human body feature data according to the human body feature data of the image to be recognized and the preset human body feature data.
In one implementation, the electronic device may calculate similarity between the human characteristic data of the image to be recognized and the human characteristic data of each human image in the preset gallery, and then, the electronic device may determine whether the person in the image to be recognized is the person to which the human image in the preset gallery belongs according to the similarity.
The method for determining whether the person in the image to be recognized is the person belonging to the person image in the preset gallery by the electronic device according to the similarity is similar to the step S1062, and details are not repeated here.
In another implementation, the electronic device may perform weighting processing on the human body feature data of the image to be recognized based on the image segmentation data of the image to be recognized, to obtain weighted human body feature data (which may be referred to as first human body feature data). For each person picture in the preset gallery, the electronic device may likewise perform weighting processing on the human body feature data of the person picture based on the image segmentation data of that picture, to obtain weighted human body feature data (which may be referred to as second human body feature data). The electronic device may then calculate the similarity between the first human body feature data and each second human body feature data, and determine, according to each similarity, whether the person in the image to be recognized is a person in the preset gallery.
The method for determining whether the person in the image to be recognized is the person belonging to the person image in the preset gallery by the electronic device according to the similarity is similar to the step S1062, and details are not repeated here.
For example, the human body feature data of the image to be recognized may include C feature maps, each of which is H × W. H is the height of the image to be recognized, and W is the width of the image to be recognized. The image segmentation data of the image to be recognized may be an H × W × 1 vector, and the H × W values in the vector may respectively represent the weighting weight of each pixel point in the feature map.
For each feature map, the electronic device may multiply the feature value of each pixel in the feature map by the weighting weight corresponding to the pixel in the image segmentation data, so as to obtain the weighted overall feature data (i.e., the first human body feature data).
The method for acquiring the second human body characteristic data by the electronic device is similar to the method for acquiring the first human body characteristic data, and is not described herein again.
With reference to fig. 5, fig. 5 is a structural diagram of a feature extraction network according to the embodiment of the present application.
The backbone network may perform human feature extraction on the input image to obtain human feature data of the image, and may be the human feature extraction network.
If the image is a whole human body image, the human body feature data of the image is subjected to pooling and feature coding to obtain the whole body features of the people in the image.
If the image is the upper half body human body local image, the human body characteristic data of the image is subjected to upper half body channel selection processing to obtain characteristic data after weighting processing on the channel. The upper body channel selection may be SeNet in the above embodiment.
Then, the upper body space selection processing may be performed on the processing result of the upper body channel selection and the image segmentation data of the image, so as to obtain feature data subjected to the spatial weighting processing. The upper body space selection may be the space selection network of fig. 4 in the above embodiment. Similarly, the upper body features of the people in the image can be obtained by pooling and feature coding the processing result of the upper body space selection.
The network part formed by upper body channel selection, upper body space selection, pooling and feature coding can be called an upper body feature extraction network.
In fig. 5, the process of acquiring the lower-body feature of the person in the image is similar to the process of acquiring the upper-body feature of the person in the image, and is not described herein again. Similarly, the network portion formed by the lower body channel selection, the lower body space selection, the pooling and the feature encoding may be referred to as a lower body feature extraction network.
Referring to fig. 6, fig. 6 is a flowchart of an example of a pedestrian re-identification method provided in an embodiment of the present application, where the method may include the following steps:
S601: and acquiring the image to be identified containing the figure picture.
S602: and extracting human body characteristics of the image to be recognized to obtain human body characteristic data of the image to be recognized.
S603: and carrying out image segmentation on the image to be identified to obtain image segmentation data of the image to be identified.
The image segmentation data is used for representing human body parts to which pixel points contained in the character picture belong.
S604: and determining the image type of the image to be recognized according to the image segmentation data of the image to be recognized.
Wherein the image type is an upper half body human body partial image, a lower half body human body partial image or a human body whole image.
S605: and if the image to be recognized is the upper half body human body local image, processing the image segmentation data and the human body characteristic data of the image to be recognized based on the pre-trained upper half body characteristic extraction network to obtain the upper half body characteristic data of the person in the image to be recognized.
S606: and calculating the similarity between the upper half body feature data of the person in the image to be recognized and preset upper half body feature data, and determining whether the person in the image to be recognized is the person represented by the preset upper half body feature data according to the obtained similarity.
S607: and if the image to be recognized is a partial image of the lower half body of the human body, processing image segmentation data and human body characteristic data of the image to be recognized based on a lower half body characteristic extraction network trained in advance to obtain the lower half body characteristic data of the person in the image to be recognized.
S608: and calculating the similarity between the lower half body feature data of the person in the image to be recognized and preset lower half body feature data, and determining whether the person in the image to be recognized is the person represented by the preset lower half body feature data according to the obtained similarity.
S609: and if the image to be recognized is the whole human body image, calculating the similarity between the human body feature data of the image to be recognized and preset human body feature data, and determining whether the person in the image to be recognized is the person represented by the preset human body feature data according to the obtained similarity.
Corresponding to the embodiment of the method in fig. 1, referring to fig. 7, fig. 7 is a block diagram of a pedestrian re-identification apparatus provided in the embodiment of the present application, where the apparatus may include:
an obtaining module 701, configured to obtain an image to be identified, where the image includes a character picture;
an extraction module 702, configured to perform human body feature extraction on the image to be recognized to obtain human body feature data of the image to be recognized;
a segmentation module 703, configured to perform image segmentation on the image to be recognized to obtain image segmentation data of the image to be recognized, where the image segmentation data is used to represent a human body part to which a pixel point included in the person picture belongs;
a determining module 704, configured to determine an image type of the image to be identified according to the image segmentation data;
a first processing module 705, configured to, if the image type is a human body local image, obtain human body local feature data of the image to be recognized based on a local feature extraction network corresponding to the image type trained in advance, the image segmentation data, and the human body feature data;
the identifying module 706 is configured to determine whether a person in the image to be identified is a person represented by preset human body local feature data based on a matching result of the human body local feature data of the image to be identified and the preset human body local feature data.
Optionally, the local feature extraction network includes a Squeeze-and-Excitation network (SeNet);
the first processing module 705 is specifically configured to input the human body feature data into the SeNet to obtain output data of the SeNet;
performing feature fusion on the output data of the SeNet and the image segmentation data, and performing convolution processing on a feature fusion result;
and performing weighting processing on the output data of the SeNet based on the convolution processing result to obtain the human body local feature data of the image to be recognized.
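The processing performed by the first processing module 705 can be sketched in NumPy as follows. This is an illustrative sketch only: the layer sizes, the randomly initialized weights, and the use of a 1x1 convolution for the convolution processing are assumptions, not the trained network of the embodiment:

```python
import numpy as np

def se_block(x, reduction=4, rng=np.random.default_rng(0)):
    """Squeeze-and-Excitation: re-weight the channels of a (C, H, W) feature map."""
    c = x.shape[0]
    squeeze = x.mean(axis=(1, 2))                        # global average pool -> (C,)
    w1 = rng.standard_normal((c // reduction, c)) * 0.1  # illustrative random FC weights
    w2 = rng.standard_normal((c, c // reduction)) * 0.1
    excite = 1 / (1 + np.exp(-(w2 @ np.maximum(w1 @ squeeze, 0))))  # FC-ReLU-FC-sigmoid
    return x * excite[:, None, None]                     # channel-wise re-weighting

def local_feature(body_feat, seg_mask, rng=np.random.default_rng(1)):
    """Fuse SeNet output with a (1, H, W) segmentation mask, then weight spatially."""
    se_out = se_block(body_feat)                         # output data of the SeNet
    fused = np.concatenate([se_out, seg_mask], axis=0)   # feature fusion by channel concat
    conv_w = rng.standard_normal((1, fused.shape[0])) * 0.1  # 1x1 conv as channel mixing
    attn = 1 / (1 + np.exp(-np.tensordot(conv_w, fused, axes=([1], [0]))))  # (1, H, W)
    return se_out * attn                                 # weighted human body local features
```

The design choice mirrored here is that the segmentation data acts as a spatial prior: the convolution over the fused tensor produces a per-pixel weight that emphasizes the visible body part in the SeNet output.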
Optionally, the loss function of the local feature extraction network includes a difference between image feature data output by a preset network model and human body local feature data output by the local feature extraction network, and a difference between an actual output result of the local feature extraction network and an expected output result, the training sample of the preset network model includes an image having the same image type as the image to be recognized, and the preset network model is used to obtain human body local feature data of an image having the same image type as the image to be recognized.
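The loss function described above, which combines a feature-difference term against the preset (teacher) network with a term for the difference between the actual and expected outputs, might look as follows. Mean-squared error and cross-entropy are illustrative choices; the embodiment does not name the exact difference measures:

```python
import numpy as np

def local_feature_loss(teacher_feat, student_feat, logits, label):
    """Distillation-style loss: feature difference to the preset network plus
    classification error against the expected output."""
    # Term 1: difference between the preset network's image feature data and
    # the local feature extraction network's human body local feature data.
    feat_diff = np.mean((teacher_feat - student_feat) ** 2)
    # Term 2: cross-entropy between the actual output and the expected result.
    shifted = logits - logits.max()                       # numerically stable softmax
    log_probs = shifted - np.log(np.exp(shifted).sum())
    ce = -log_probs[label]
    return feat_diff + ce
```

Under this reading, the preset network is trained on images of the same image type and serves as a teacher, while the classification term keeps the local network discriminative for identities.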
Optionally, the identifying module 706 is specifically configured to calculate a similarity between the local human body feature data of the image to be identified and preset local human body feature data;
and determining whether the person in the image to be recognized is the person represented by the preset human body local feature data or not according to the obtained similarity.
Optionally, the apparatus further comprises:
and the second processing module is used for determining whether the person in the image to be recognized is the person represented by the preset human body characteristic data or not according to the human body characteristic data of the image to be recognized and the preset human body characteristic data if the image type of the image to be recognized is the human body whole image.
Optionally, the determining module 704 is specifically configured to determine that the image type of the image to be recognized is a partial image of the upper half of the human body if the proportion of the pixel points belonging to the upper half of the human body in the figure picture in the image to be recognized is greater than or equal to a first preset threshold, and the proportion of the pixel points belonging to the lower half of the human body in the figure picture in the image to be recognized is smaller than a second preset threshold;
if the proportion of the pixel points belonging to the upper half of the human body in the figure picture in the image to be recognized is smaller than the first preset threshold, and the proportion of the pixel points belonging to the lower half of the human body in the figure picture in the image to be recognized is greater than or equal to the second preset threshold, determine that the image type of the image to be recognized is a partial image of the lower half of the human body;
and if the proportion of the pixel points belonging to the upper half of the human body in the figure picture in the image to be recognized is greater than or equal to the first preset threshold, and the proportion of the pixel points belonging to the lower half of the human body in the figure picture in the image to be recognized is greater than or equal to the second preset threshold, determine that the image type of the image to be recognized is the whole human body image.
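The image-type decision made by the determining module 704 can be expressed as a small function. The threshold values and the handling of the case where neither proportion reaches its threshold are illustrative assumptions, since the embodiment leaves the preset thresholds open:

```python
def classify_image_type(upper_ratio, lower_ratio, t_upper=0.3, t_lower=0.3):
    """Classify an image by the proportions of pixel points belonging to the
    upper and lower halves of the human body. Thresholds are illustrative."""
    if upper_ratio >= t_upper and lower_ratio < t_lower:
        return "upper_body_partial"
    if upper_ratio < t_upper and lower_ratio >= t_lower:
        return "lower_body_partial"
    if upper_ratio >= t_upper and lower_ratio >= t_lower:
        return "whole_body"
    return "unknown"  # neither half sufficiently visible; not covered by the claims
```

The ratios themselves would come from the image segmentation data: count the pixels labeled as each body part and divide by the total number of pixels in the image.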
The embodiment of the present application further provides an electronic device, as shown in fig. 8, comprising a processor 801, a communication interface 802, a memory 803, and a communication bus 804, where the processor 801, the communication interface 802, and the memory 803 communicate with each other through the communication bus 804;
a memory 803 for storing a computer program;
the processor 801 is configured to implement the following steps when executing the program stored in the memory 803:
acquiring an image to be identified containing a figure picture;
extracting human body characteristics of the image to be recognized to obtain human body characteristic data of the image to be recognized;
performing image segmentation on the image to be recognized to obtain image segmentation data of the image to be recognized, wherein the image segmentation data is used for representing human body parts to which pixel points contained in the figure picture belong;
determining the image type of the image to be identified according to the image segmentation data;
if the image type is a human body local image, obtaining human body local feature data of the image to be recognized based on a local feature extraction network corresponding to the image type trained in advance, the image segmentation data and the human body feature data;
and determining whether the person in the image to be recognized is the person represented by the preset human body local characteristic data or not based on the matching result of the human body local characteristic data of the image to be recognized and the preset human body local characteristic data.
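The steps executed by the processor 801 can be sketched as a single control flow. Every callable argument below is an illustrative stand-in for a trained model or preset gallery data, not part of the embodiment:

```python
def re_identify(image, extract_features, segment, classify_type,
                local_networks, gallery_feat, similarity, threshold=0.8):
    """End-to-end flow of the claimed method; all callables are assumed stubs."""
    body_feat = extract_features(image)    # human body feature extraction
    seg_data = segment(image)              # image segmentation (per-pixel body part)
    image_type = classify_type(seg_data)   # determine the image type
    if image_type in ("upper_body_partial", "lower_body_partial"):
        # human body local image: use the local feature extraction network
        feat = local_networks[image_type](seg_data, body_feat)
    else:
        feat = body_feat                   # whole human body image: global features
    return similarity(feat, gallery_feat) >= threshold
```

The branch structure is the substance of the claim: partial images are routed through a type-specific local feature extraction network before matching, while whole-body images are matched on the global human body feature data directly.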
The communication bus 804 mentioned in the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 804 may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface 802 is used for communication between the above-described electronic apparatus and other apparatuses.
The memory 803 may include a Random Access Memory (RAM) or a non-volatile memory (NVM), such as at least one disk memory. Optionally, the memory 803 may also be at least one storage device located remotely from the aforementioned processor.
The processor 801 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
When the electronic device provided by the embodiment of the present application performs pedestrian re-identification, it does so according to the human body local feature data of the image to be recognized and the preset human body local feature data. Compared with the prior art, in which pedestrian re-identification is performed directly according to the human body feature data of the image to be recognized, this can improve the accuracy of the pedestrian re-identification result.
The embodiment of the present application further provides a computer-readable storage medium in which instructions are stored; when the instructions are run on a computer, the computer is caused to execute the pedestrian re-identification method provided by the embodiments of the present application.
Specifically, the pedestrian re-identification method includes:
acquiring an image to be identified containing a figure picture;
extracting human body characteristics of the image to be recognized to obtain human body characteristic data of the image to be recognized;
performing image segmentation on the image to be recognized to obtain image segmentation data of the image to be recognized, wherein the image segmentation data is used for representing human body parts to which pixel points contained in the figure picture belong;
determining the image type of the image to be identified according to the image segmentation data;
if the image type is a human body local image, obtaining human body local feature data of the image to be recognized based on a local feature extraction network corresponding to the image type trained in advance, the image segmentation data and the human body feature data;
and determining whether the person in the image to be recognized is the person represented by the preset human body local characteristic data or not based on the matching result of the human body local characteristic data of the image to be recognized and the preset human body local characteristic data.
It should be noted that other implementation manners of the pedestrian re-identification method are partially the same as those of the embodiment of the foregoing method, and are not described again here.
By running the instructions stored in the computer-readable storage medium provided by the embodiment of the present application, pedestrian re-identification is performed according to the human body local feature data of the image to be recognized and the preset human body local feature data. Compared with the prior art, in which pedestrian re-identification is performed directly according to the human body feature data of the image to be recognized, the accuracy of the pedestrian re-identification result can be improved.
The embodiment of the present application further provides another computer program product containing instructions, which when run on a computer, causes the computer to execute the pedestrian re-identification method provided by the embodiment of the present application.
Specifically, the pedestrian re-identification method includes:
acquiring an image to be identified containing a figure picture;
extracting human body characteristics of the image to be recognized to obtain human body characteristic data of the image to be recognized;
performing image segmentation on the image to be recognized to obtain image segmentation data of the image to be recognized, wherein the image segmentation data is used for representing human body parts to which pixel points contained in the figure picture belong;
determining the image type of the image to be identified according to the image segmentation data;
if the image type is a human body local image, obtaining human body local feature data of the image to be recognized based on a local feature extraction network corresponding to the image type trained in advance, the image segmentation data and the human body feature data;
and determining whether the person in the image to be recognized is the person represented by the preset human body local characteristic data or not based on the matching result of the human body local characteristic data of the image to be recognized and the preset human body local characteristic data.
It should be noted that other implementation manners of the pedestrian re-identification method are partially the same as those of the embodiment of the foregoing method, and are not described again here.
By running the computer program product provided by the embodiment of the present application, pedestrian re-identification is performed according to the human body local feature data of the image to be recognized and the preset human body local feature data. Compared with the prior art, in which pedestrian re-identification is performed directly according to the human body feature data of the image to be recognized, the accuracy of the pedestrian re-identification result can be improved.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the present application occur, in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via a wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) connection. The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus, the electronic device, the computer-readable storage medium, and the computer program product embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiments.
The above description is only for the preferred embodiment of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application are included in the protection scope of the present application.

Claims (14)

1. A pedestrian re-identification method, the method comprising:
acquiring an image to be identified containing a figure picture;
extracting human body characteristics of the image to be recognized to obtain human body characteristic data of the image to be recognized;
performing image segmentation on the image to be recognized to obtain image segmentation data of the image to be recognized, wherein the image segmentation data is used for representing human body parts to which pixel points contained in the figure picture belong;
determining the image type of the image to be identified according to the image segmentation data;
if the image type is a human body local image, obtaining human body local feature data of the image to be recognized based on a local feature extraction network corresponding to the image type trained in advance, the image segmentation data and the human body feature data;
and determining whether the person in the image to be recognized is the person represented by the preset human body local characteristic data or not based on the matching result of the human body local characteristic data of the image to be recognized and the preset human body local characteristic data.
2. The method of claim 1, wherein the local feature extraction network comprises a Squeeze-and-Excitation network (SeNet);
the obtaining of the human body local feature data of the image to be recognized based on the pre-trained local feature extraction network corresponding to the image type, the image segmentation data and the human body feature data comprises:
inputting the human body characteristic data into the SeNet to obtain output data of the SeNet;
performing feature fusion on the output data of the SeNet and the image segmentation data, and performing convolution processing on a feature fusion result;
and performing weighting processing on the output data of the SeNet based on the convolution processing result to obtain the human body local feature data of the image to be recognized.
3. The method according to claim 1 or 2, wherein the loss function of the local feature extraction network comprises a difference value between image feature data output by a preset network model and human body local feature data output by the local feature extraction network, and a difference value between an actual output result and an expected output result of the local feature extraction network, the training sample of the preset network model comprises an image of the same image type as the image to be recognized, and the preset network model is used for acquiring the human body local feature data of the image of the same image type as the image to be recognized.
4. The method of claim 1, wherein the determining whether the person in the image to be recognized is the person represented by the preset human body local feature data based on the matching result of the human body local feature data of the image to be recognized and the preset human body local feature data comprises:
calculating the similarity between the human body local feature data of the image to be recognized and preset human body local feature data;
and determining whether the person in the image to be recognized is the person represented by the preset human body local feature data or not according to the obtained similarity.
5. The method of claim 1, further comprising:
and if the image type of the image to be recognized is the whole human body image, determining whether the person in the image to be recognized is the person represented by the preset human body characteristic data or not according to the human body characteristic data of the image to be recognized and the preset human body characteristic data.
6. The method of claim 1, wherein determining the image type of the image to be recognized according to the image segmentation data comprises:
if the proportion of the pixel points belonging to the upper half of the human body in the figure picture in the image to be recognized is greater than or equal to a first preset threshold, and the proportion of the pixel points belonging to the lower half of the human body in the figure picture in the image to be recognized is smaller than a second preset threshold, determining that the image type of the image to be recognized is a partial image of the upper half of the human body;
if the proportion of the pixel points belonging to the upper half of the human body in the figure picture in the image to be recognized is smaller than the first preset threshold, and the proportion of the pixel points belonging to the lower half of the human body in the figure picture in the image to be recognized is greater than or equal to the second preset threshold, determining that the image type of the image to be recognized is a partial image of the lower half of the human body;
and if the proportion of the pixel points belonging to the upper half of the human body in the figure picture in the image to be recognized is greater than or equal to the first preset threshold, and the proportion of the pixel points belonging to the lower half of the human body in the figure picture in the image to be recognized is greater than or equal to the second preset threshold, determining that the image type of the image to be recognized is the whole human body image.
7. A pedestrian re-identification apparatus, the apparatus comprising:
the acquisition module is used for acquiring an image to be identified containing a figure picture;
the extraction module is used for extracting human body characteristics of the image to be recognized to obtain human body characteristic data of the image to be recognized;
the segmentation module is used for carrying out image segmentation on the image to be recognized to obtain image segmentation data of the image to be recognized, wherein the image segmentation data is used for representing a human body part to which pixel points contained in the figure picture belong;
the determining module is used for determining the image type of the image to be identified according to the image segmentation data;
the first processing module is used for obtaining the human body local feature data of the image to be recognized based on a local feature extraction network corresponding to the image type trained in advance, the image segmentation data and the human body feature data if the image type is a human body local image;
and the identification module is used for determining whether the person in the image to be identified is the person represented by the preset human body local characteristic data or not based on the matching result of the human body local characteristic data of the image to be identified and the preset human body local characteristic data.
8. The apparatus of claim 7, wherein the local feature extraction network comprises a Squeeze-and-Excitation network (SeNet);
the first processing module is specifically configured to input the human body feature data into the SeNet to obtain output data of the SeNet;
performing feature fusion on the output data of the SeNet and the image segmentation data, and performing convolution processing on a feature fusion result;
and performing weighting processing on the output data of the SeNet based on the convolution processing result to obtain the human body local feature data of the image to be recognized.
9. The apparatus according to claim 7 or 8, wherein the loss function of the local feature extraction network comprises a difference between image feature data output by a preset network model and human body local feature data output by the local feature extraction network, and a difference between an actual output result and an expected output result of the local feature extraction network, the training sample of the preset network model comprises an image of the same image type as the image to be recognized, and the preset network model is used for obtaining human body local feature data of an image of the same image type as the image to be recognized.
10. The apparatus according to claim 7, wherein the recognition module is specifically configured to calculate a similarity between the human body local feature data of the image to be recognized and preset human body local feature data;
and determining whether the person in the image to be recognized is the person represented by the preset human body local feature data or not according to the obtained similarity.
11. The apparatus of claim 7, further comprising:
and the second processing module is used for determining whether the person in the image to be recognized is the person represented by the preset human body characteristic data or not according to the human body characteristic data of the image to be recognized and the preset human body characteristic data if the image type of the image to be recognized is the human body whole image.
12. The apparatus according to claim 7, wherein the determining module is specifically configured to determine that the image type of the image to be recognized is a partial image of the upper half of the human body if the proportion of the pixel points belonging to the upper half of the human body in the figure picture in the image to be recognized is greater than or equal to a first preset threshold, and the proportion of the pixel points belonging to the lower half of the human body in the figure picture in the image to be recognized is smaller than a second preset threshold;
if the proportion of the pixel points belonging to the upper half of the human body in the figure picture in the image to be recognized is smaller than the first preset threshold, and the proportion of the pixel points belonging to the lower half of the human body in the figure picture in the image to be recognized is greater than or equal to the second preset threshold, determine that the image type of the image to be recognized is a partial image of the lower half of the human body;
and if the proportion of the pixel points belonging to the upper half of the human body in the figure picture in the image to be recognized is greater than or equal to the first preset threshold, and the proportion of the pixel points belonging to the lower half of the human body in the figure picture in the image to be recognized is greater than or equal to the second preset threshold, determine that the image type of the image to be recognized is the whole human body image.
13. An electronic device, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other via the communication bus;
the memory is used for storing a computer program;
the processor, when executing the program stored in the memory, implementing the method steps of any of claims 1-6.
14. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1 to 6.
CN201910575126.4A 2019-06-28 2019-06-28 Pedestrian re-identification method and device Active CN112149470B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910575126.4A CN112149470B (en) 2019-06-28 2019-06-28 Pedestrian re-identification method and device

Publications (2)

Publication Number Publication Date
CN112149470A true CN112149470A (en) 2020-12-29
CN112149470B CN112149470B (en) 2023-09-05

Family

ID=73869397

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910575126.4A Active CN112149470B (en) 2019-06-28 2019-06-28 Pedestrian re-identification method and device

Country Status (1)

Country Link
CN (1) CN112149470B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5444821A (en) * 1993-11-10 1995-08-22 United Microelectronics Corp. Artificial neuron element with electrically programmable synaptic weight for neural networks
US20030219157A1 (en) * 2002-05-27 2003-11-27 Tetsushi Koide Image segmentation method, image segmentation apparatus, image processing method, and image processing apparatus
WO2017101434A1 (en) * 2015-12-16 2017-06-22 深圳大学 Human body target re-identification method and system among multiple cameras
CN109325412A (en) * 2018-08-17 2019-02-12 平安科技(深圳)有限公司 Pedestrian recognition method, device, computer equipment and storage medium
US20190087686A1 (en) * 2017-09-21 2019-03-21 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for detecting human face
CN109740541A (en) * 2019-01-04 2019-05-10 重庆大学 A kind of pedestrian weight identifying system and method


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
XUAN WANG et al.: "Pedestrian abnormal event detection based on multi-feature fusion in traffic video", Optik, vol. 154, pages 22-32, XP085267666, DOI: 10.1016/j.ijleo.2017.09.104 *
张秋红; 苏锦; 杨新锋: "Simulation research on gait recognition based on feature fusion and neural networks", Computer Simulation (计算机仿真), no. 08, pages 235-237 *
蒋建国 et al.: "Person re-identification by region block segmentation and fusion", Journal of Image and Graphics (中国图象图形学报), vol. 24, no. 04, pages 513-522 *

Also Published As

Publication number Publication date
CN112149470B (en) 2023-09-05

Similar Documents

Publication Publication Date Title
CN111950638B (en) Image classification method and device based on model distillation and electronic equipment
CN106484837A (en) The detection method of similar video file and device
CN115311730B (en) Face key point detection method and system and electronic equipment
CN112580458A (en) Facial expression recognition method, device, equipment and storage medium
CN111067522A (en) Brain addiction structural map assessment method and device
CN112464760A (en) Training method and device for target recognition model
CN116229530A (en) Image processing method, device, storage medium and electronic equipment
CN114140831B (en) Human body posture estimation method and device, electronic equipment and storage medium
CN114005019B (en) Method for identifying flip image and related equipment thereof
CN111901594A (en) Visual analysis task-oriented image coding method, electronic device and medium
CN113255701B (en) Small sample learning method and system based on absolute-relative learning framework
CN115131695A (en) Training method of video detection model, video detection method and device
CN110659641B (en) Text recognition method and device and electronic equipment
CN113140012B (en) Image processing method, device, medium and electronic equipment
CN110717407A (en) Human face recognition method, device and storage medium based on lip language password
CN113435531A (en) Zero sample image classification method and system, electronic equipment and storage medium
CN111428612A (en) Pedestrian re-identification method, terminal, device and storage medium
CN114387524B (en) Image identification method and system for small sample learning based on multilevel second-order representation
CN112149470A (en) Pedestrian re-identification method and device
CN113221922B (en) Image processing method and related device
CN112766282B (en) Image recognition method, device, equipment and computer readable medium
CN111798376A (en) Image recognition method and device, electronic equipment and storage medium
CN112950652A (en) Robot and hand image segmentation method and device thereof
CN113392859A (en) Method and device for determining type of city functional area
CN110991460B (en) Image recognition processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant