CN115690840A - Pedestrian re-identification method, terminal equipment and storage medium - Google Patents

Pedestrian re-identification method, terminal equipment and storage medium

Info

Publication number
CN115690840A
Authority
CN
China
Prior art keywords
image
pedestrian
inquired
data set
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211308979.XA
Other languages
Chinese (zh)
Inventor
王宗跃
谢道顺
陈屹东
苏锦河
陈文平
陈智鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Saiwei Network Technology Co ltd
Jimei University
Original Assignee
Shenzhen Saiwei Network Technology Co ltd
Jimei University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Saiwei Network Technology Co ltd, Jimei University filed Critical Shenzhen Saiwei Network Technology Co ltd
Priority to CN202211308979.XA priority Critical patent/CN115690840A/en
Publication of CN115690840A publication Critical patent/CN115690840A/en
Pending legal-status Critical Current


Abstract

The invention relates to a pedestrian re-identification method, terminal equipment and a storage medium, wherein the method comprises the following steps: extracting multi-channel features of each image in the image data set; extracting local features of each image in the image data set; merging the multi-channel features and the local features into global features, and building the global features of all images in the image data set into a feature library of the image data set; extracting high-level semantic features of the pedestrian image to be queried; and calculating the similarity between the high-level semantic features of the pedestrian image to be queried and the global features of the images in the image data set, and taking the image in the image data set with the highest similarity as the same-person image of the pedestrian image to be queried. The invention mitigates the limited global representation capability of current deep-learning pedestrian re-identification networks, and at the same time reduces the errors caused by semantic misalignment due to spatial misalignment of the human body.

Description

Pedestrian re-identification method, terminal equipment and storage medium
Technical Field
The invention relates to the field of computer vision, in particular to a pedestrian re-identification method, terminal equipment and a storage medium.
Background
The purpose of pedestrian re-identification is to retrieve a particular pedestrian from an image or video sequence using computer vision techniques. The pedestrian re-identification task is an important component of intelligent surveillance systems. Early person re-identification methods consist of two parts: feature learning and metric learning. Feature learning is used to extract invariant features representing persons of the same identity, while metric learning is used to measure the similarity between person images.
The pedestrian re-identification task aims at matching images of the same person captured at different physical locations. The task is challenging due to large variations in person pose and viewing angle, imperfect person detection, cluttered backgrounds, occlusion, and differences in lighting. Some deep learning approaches pay insufficient attention to local differences and have no explicit mechanism to solve the misalignment problem, so their global representation capability is limited. Furthermore, pose-based body-part alignment does not achieve satisfactory alignment: owing to the spatial misalignment of the human body, the same spatial position does not correspond to the same semantics.
Disclosure of Invention
In order to solve the above problem, the present invention provides a pedestrian re-identification method, a terminal device and a storage medium.
The specific scheme is as follows:
a pedestrian re-identification method comprises the following steps:
s1: acquiring an image data set consisting of pedestrian images corresponding to different persons, meanwhile acquiring, as pedestrian images to be queried, different images belonging to the same persons as the images in the image data set, labeling each pedestrian image to be queried with its same-person image in the image data set, and constructing a training set based on the labeled pedestrian images to be queried and the image data set;
s2: constructing a pedestrian re-identification network model, taking the image data set and the pedestrian image to be queried as the input of the model, the output of the model being the same-person image of the pedestrian image to be queried in the image data set; and training the model through the training set;
the implementation process of the model comprises the following steps:
s21: extracting multi-channel features of each image in the image data set;
s22: extracting local features of each image in the image data set;
s23: merging the multi-channel features and the local features into global features, and building the global features of all images in the image data set into a feature library of the image data set;
s24: extracting high-level semantic features of the pedestrian image to be queried;
s25: calculating the similarity between the high-level semantic features of the pedestrian image to be queried and the global features of the images in the image data set, and taking the image in the image data set with the highest similarity as the same-person image of the pedestrian image to be queried;
s3: identifying the pedestrian image to be queried through the trained pedestrian re-identification network model.
Further, the process of extracting the multi-channel features in step S21 comprises the following steps:
s211: inputting the image into a DenseNet network to obtain a feature map of the image;
s212: inputting the feature map into a max-pooling layer to obtain a first feature vector;
s213: inputting the feature map into an average-pooling layer to obtain a second feature vector;
s214: adding the first feature vector and the second feature vector to obtain the multi-channel feature.
Further, the process of extracting the local features in step S22 comprises the following steps:
s221: uniformly cropping each image into a plurality of sub-images;
s222: inputting the sub-images into a basic convolutional neural network to obtain a feature map of each sub-image;
s223: inputting the feature map of each sub-image into a max-pooling layer to obtain sub-image-level features, and inputting the feature map of each sub-image into a multilayer perceptron to obtain a correlation matrix of the sub-images;
s224: inputting the sub-image-level features and the correlation matrix into a graph convolutional neural network to obtain the association relations between the sub-images;
s225: inputting the association relations between the sub-images into a multilayer perceptron to obtain the local features of the image.
Further, the process of extracting the high-level semantic features in step S24 comprises the following steps:
s241: extracting a 5-dimensional feature vector of the pedestrian image to be queried;
s242: inputting the 5-dimensional feature vector into a feature pyramid network to obtain semantic features of the pedestrian image to be queried;
s243: grouping the image pixels into superpixels with a clustering algorithm, based on the semantic features of the pedestrian image to be queried;
s244: extracting the high-level semantic features of the pedestrian image to be queried with a multi-head attention mechanism, based on the superpixel grouping result.
Further, the 5 dimensions in step S241 comprise the two coordinates in the image coordinate system and the three primary colors.
A pedestrian re-identification terminal device comprises a processor, a memory and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the computer program, implements the steps of the method of the embodiment of the invention.
A computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the method of the embodiment of the invention described above.
By adopting the above technical scheme, the invention mitigates the limited global representation capability of current deep-learning pedestrian re-identification networks, and at the same time reduces the errors caused by semantic misalignment due to spatial misalignment of the human body.
Drawings
FIG. 1 is a flow chart of a method according to an embodiment of the present invention.
Fig. 2 is a flowchart illustrating a process of implementing a model according to an embodiment of the present invention.
Detailed Description
To further illustrate the various embodiments, the invention provides the accompanying drawings. The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the embodiments. With these references, one of ordinary skill in the art will appreciate other possible embodiments and advantages of the present invention.
The invention will now be further described with reference to the drawings and the detailed description.
Embodiment 1:
The embodiment of the invention provides a pedestrian re-identification method which, as shown in FIG. 1, comprises the following steps:
s1: acquiring an image data set X = { I } consisting of pedestrian images corresponding to different persons 1 ,I 2 ,…,I N And N represents the number of images in the input image data set X, different images belonging to the same person as the images in the image data set are collected at the same time to serve as the images of the pedestrian to be inquired, the images of the pedestrian to be inquired are marked on the images of the same person in the image data set, and a training set is constructed based on the marked images of the pedestrian to be inquired and the image data set.
S2: constructing a pedestrian re-identification network model, taking the image data set and the pedestrian image to be queried as the input of the model, the output of the model being the same-person image of the pedestrian image to be queried in the image data set. The model is trained through the training set.
As shown in fig. 2, the implementation process of the model includes:
s21: and extracting the multichannel characteristics of each image in the image data set.
The extraction process of the multi-channel features in the embodiment comprises the following steps
S211: inputting the image I into a DenseNet network to obtain the feature map of the image, F_M = DenseNet(I), F_M ∈ R^(H×W×d), where H denotes the height of the image, W the width of the image, and d the number of feature channels.
S212: inputting the feature map into a max-pooling layer to obtain the first feature vector F_Max = Maxpooling(F_M).
S213: inputting the feature map into an average-pooling layer to obtain the second feature vector F_Avg = Avgpooling(F_M).
S214: adding the first feature vector F_Max and the second feature vector F_Avg to obtain the multi-channel feature F_merge = F_Max + F_Avg.
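Steps S211-S214 can be sketched as follows. This is a minimal NumPy illustration, not the patented network: the DenseNet backbone is replaced by a toy feature map, and the function name `multichannel_feature` is ours.

```python
import numpy as np

def multichannel_feature(feature_map):
    """Merge global max- and average-pooled descriptors (steps S212-S214).

    feature_map: array of shape (H, W, d), standing in for the DenseNet
    output F_M. Returns F_merge = F_Max + F_Avg, a d-dimensional vector.
    """
    f_max = feature_map.max(axis=(0, 1))   # F_Max = Maxpooling(F_M)
    f_avg = feature_map.mean(axis=(0, 1))  # F_Avg = Avgpooling(F_M)
    return f_max + f_avg                   # F_merge

# toy feature map: H=4, W=4, d=8
fm = np.arange(4 * 4 * 8, dtype=float).reshape(4, 4, 8)
f_merge = multichannel_feature(fm)
print(f_merge.shape)  # (8,)
```

Summing (rather than concatenating) the two pooled vectors keeps the multi-channel feature at the same dimensionality d as the backbone output.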
S22: local features are extracted for each image of the image dataset.
The process of extracting the local features in this embodiment includes the following steps:
s221: uniformly cutting each image I into a plurality of sub-images I = { s = {(s) } 1 ,s 2 ,…,s k },
Figure BDA0003907158320000053
h represents the height of the subgraph, w represents the width of the subgraph, and k represents the number of the subgraphs.
S222: inputting the subgraphs into a basic convolutional neural network to obtain a feature graph f of each subgraph M =CNN(s k )。
S223: inputting the feature graph of the subgraph into the maximum pooling layer to obtain a subgraph-level feature f k =Maxpooling(f M ),
Figure BDA0003907158320000054
Inputting the feature graph of the subgraph into a multilayer perceptron to obtain a correlation matrix of the subgraph
Figure BDA0003907158320000061
S224: after the subgraph level features and the correlation matrix of the subgraphs are input into a graph convolution neural network, the incidence relation F between the subgraphs is obtained R =GCN(f k ,M)。
S225: relating relationship F between subgraphs R Inputting the multilayer perceptron to obtain local features F of the image I local =MLP(F R )。
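The graph-convolution step of S223-S225 can be sketched as below. This is a hedged NumPy illustration under simplifying assumptions: the learnable weights are random, a single propagation layer stands in for the GCN, and a mean over nodes stands in for the final MLP of S225.

```python
import numpy as np

rng = np.random.default_rng(0)

def gcn_layer(sub_feats, corr, weight):
    """One graph-convolution step over the k sub-image nodes (step S224).

    sub_feats: (k, d) sub-image-level features f_k.
    corr:      (k, k) correlation matrix M (from the MLP of S223).
    weight:    (d, d_out) projection matrix, random here for illustration.
    """
    # row-normalise the correlation matrix so each node aggregates a
    # weighted average of its neighbours, then apply ReLU
    norm = corr / corr.sum(axis=1, keepdims=True)
    return np.maximum(norm @ sub_feats @ weight, 0.0)

k, d = 6, 16                              # 6 sub-images, 16-dim features
f_k = rng.normal(size=(k, d))             # stand-in for Maxpooling(f_M)
M = rng.uniform(0.1, 1.0, size=(k, k))    # positive pairwise correlations
W = rng.normal(size=(d, d))
F_R = gcn_layer(f_k, M, W)                # inter-sub-image relations
F_local = F_R.mean(axis=0)                # stand-in for the MLP of S225
print(F_R.shape, F_local.shape)           # (6, 16) (16,)
```

The row normalisation is one common GCN convention; the patent does not specify how M is normalised.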
S23: merging the multi-channel feature F_merge and the local feature F_local into the global feature F_global = Concat(F_merge, F_local), and building the global features of all images in the image data set into the feature library of the image data set, {F_global^1, F_global^2, …, F_global^N}.
S24: extracting the high-level semantic features of the pedestrian image to be queried.
The extraction process of the high-level semantic features in this embodiment comprises the following steps:
s241: extracting pedestrian image I to be inquired q 5-dimensional feature vector of (a), the 5-dimensional feature vector in this embodiment includes two coordinates (x, y) and three primary colors (R, G, B) in a coordinate system.
S242: inputting the 5-dimensional feature vector into a feature pyramid network to obtain a pedestrian image I to be inquired q Semantic feature F of sem =FPN(F a )。
S243: based on waiting to inquire pedestrian's image I q Semantic feature F of sem Grouping the image into superpixels P by adopting a clustering algorithm super
Clustering algorithm in this embodiment, a Kmeans algorithm is used.
S244: based on superpixels P super Grouping results, using a multi-head attention mechanismObtaining high-level semantic features F of pedestrian image to be inquired q =MHAttention(P super )。
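Steps S241 and S243 can be illustrated with a toy K-means grouping of the per-pixel 5-d vectors (x, y, R, G, B). The cluster count and iteration budget are illustrative choices that the patent does not fix, and the FPN and multi-head attention stages are omitted here.

```python
import numpy as np

rng = np.random.default_rng(1)

def kmeans_superpixels(feats, n_clusters=4, iters=10):
    """Group per-pixel 5-d vectors into superpixel clusters (step S243).

    A minimal K-means sketch; n_clusters=4 is an illustrative choice.
    """
    centers = feats[rng.choice(len(feats), n_clusters, replace=False)]
    for _ in range(iters):
        # assign each pixel to its nearest centre
        labels = np.argmin(((feats[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # move each centre to the mean of its assigned pixels
        for c in range(n_clusters):
            if (labels == c).any():
                centers[c] = feats[labels == c].mean(axis=0)
    return labels, centers

# toy 8x8 image: pixel coordinates plus random RGB values
h, w = 8, 8
xy = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"), -1).reshape(-1, 2)
rgb = rng.uniform(0, 255, size=(h * w, 3))
feats = np.concatenate([xy, rgb], axis=1).astype(float)  # the 5-d vectors of S241

labels, centers = kmeans_superpixels(feats)
print(labels.shape, centers.shape)  # (64,) (4, 5)
```

Each cluster groups pixels that are close both spatially (x, y) and in color (R, G, B), which is the intuition behind superpixels.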
S25: calculating the similarity between the high-level semantic features of the pedestrian image to be queried and the global features of the images in the image data set, and taking the image in the image data set with the highest similarity as the same-person image of the pedestrian image to be queried.
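Step S25 can be sketched as a nearest-neighbour search over the feature library. Cosine similarity is an assumption on our part; the patent speaks only of "similarity" without naming a metric, and the toy 3-dimensional library below is purely illustrative.

```python
import numpy as np

def retrieve_same_person(query_feat, feature_library):
    """Rank the gallery by cosine similarity to the query feature (step S25).

    query_feat:      the high-level semantic feature F_q of the query image.
    feature_library: (N, d) matrix of global features, one row per gallery image.
    Returns the index of the most similar gallery image and all similarities.
    """
    q = query_feat / np.linalg.norm(query_feat)
    lib = feature_library / np.linalg.norm(feature_library, axis=1, keepdims=True)
    sims = lib @ q                  # cosine similarity per gallery image
    return int(np.argmax(sims)), sims

library = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.6, 0.8, 0.0]])   # toy 3-image feature library
query = np.array([0.5, 0.9, 0.0])
best, sims = retrieve_same_person(query, library)
print(best)  # 2 — the closest gallery feature
```

Note that F_q and the global features must share a dimensionality for this comparison, which the model's projection layers are presumed to guarantee.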
The model loss function L in this embodiment combines a triplet loss L_tri and a cross-entropy loss L_ce, i.e.
L = λ·L_tri + (1 − λ)·L_ce,
where 0 < λ < 1 denotes a weight coefficient.
The triplet loss L_tri is calculated as:
L_tri = max(d(F_q, F_pos) − d(F_q, F_neg) + m, 0),
where d(·, ·) denotes the distance metric, F_pos denotes a positive sample belonging to the same person as F_q, F_neg denotes a negative sample belonging to a different person, and m denotes a margin parameter.
The cross-entropy loss comprises a first cross-entropy loss and a second cross-entropy loss, wherein:
the first cross-entropy loss measures the accuracy of the multi-channel feature extraction for each image in the image data set; to compute it, the model further inputs the multi-channel feature F_merge into a fully connected layer and obtains the classification probability of the image through a Softmax function;
the second cross-entropy loss measures the accuracy of the same-person image identification for the pedestrian image to be queried.
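The combined loss L = λ·L_tri + (1 − λ)·L_ce can be evaluated numerically as below; the margin m = 0.3 and weight λ = 0.5 are illustrative values not fixed by the patent, and scalar distances stand in for d(F_q, ·).

```python
import numpy as np

def triplet_loss(d_pos, d_neg, margin=0.3):
    """L_tri = max(d(F_q, F_pos) - d(F_q, F_neg) + m, 0)."""
    return max(d_pos - d_neg + margin, 0.0)

def cross_entropy(logits, label):
    """Softmax cross-entropy for an identity classifier (numerically stable)."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

lam = 0.5  # illustrative weight coefficient, 0 < lambda < 1
l_tri = triplet_loss(d_pos=0.4, d_neg=1.0)          # positive pair already closer
l_ce = cross_entropy(np.array([2.0, 0.5, 0.1]), 0)  # correct class has top logit
total = lam * l_tri + (1 - lam) * l_ce
print(round(l_tri, 4))  # 0.0 — the margin is already satisfied
```

When the positive distance plus the margin is smaller than the negative distance, the triplet term contributes nothing and only the classification loss drives the update.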
S3: identifying the pedestrian image to be queried through the trained pedestrian re-identification network model.
According to the embodiment of the invention, the global features are obtained by merging the multi-channel features and the local features, and the global features of all images jointly form the feature library of the image data set, which mitigates the limited global representation capability of current deep-learning pedestrian re-identification networks. In addition, the embodiment of the invention extracts the semantic features of the pedestrian image to be queried with a feature pyramid network, groups the image into superpixels with a clustering algorithm, and further extracts high-level semantic features with multi-head attention. The embodiment of the invention thereby reduces the errors caused by semantic misalignment due to spatial misalignment of the human body.
Embodiment 2:
the invention further provides a pedestrian re-identification terminal device, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor executes the computer program to realize the steps in the above method embodiment of the first embodiment of the invention.
Further, as an executable scheme, the pedestrian re-identification terminal device may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. The pedestrian re-identification terminal device can comprise, but is not limited to, a processor and a memory. It is understood by those skilled in the art that the above-mentioned constituent structure of the pedestrian re-identification terminal device is only an example of the pedestrian re-identification terminal device, and does not constitute a limitation to the pedestrian re-identification terminal device, and may include more or less components than the above, or combine some components, or different components, for example, the pedestrian re-identification terminal device may further include an input/output device, a network access device, a bus, and the like, which is not limited by the embodiment of the present invention.
Further, as an executable solution, the Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, a discrete hardware component, and the like. The general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc., and the processor is a control center of the pedestrian re-identification terminal device and connects various parts of the entire pedestrian re-identification terminal device by using various interfaces and lines.
The memory can be used to store the computer program and/or modules, and the processor implements the various functions of the pedestrian re-identification terminal device by running or executing the computer program and/or modules stored in the memory and calling the data stored in the memory. The memory mainly comprises a program storage area and a data storage area, wherein the program storage area stores the operating system and the application program required by at least one function, and the data storage area stores data created according to the use of the device. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card, at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage device.
The present invention also provides a computer-readable storage medium, which stores a computer program, which, when executed by a processor, implements the steps of the above-mentioned method of an embodiment of the present invention.
The modules/units integrated in the pedestrian re-identification terminal device may be stored in a computer-readable storage medium if implemented in the form of software functional units and sold or used as a separate product. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments described above are implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), a software distribution medium, and the like.
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. A pedestrian re-identification method, characterized by comprising the following steps:
s1: acquiring an image data set consisting of pedestrian images corresponding to different persons, meanwhile acquiring, as pedestrian images to be queried, different images belonging to the same persons as the images in the image data set, labeling each pedestrian image to be queried with its same-person image in the image data set, and constructing a training set based on the labeled pedestrian images to be queried and the image data set;
s2: constructing a pedestrian re-identification network model, taking the image data set and the pedestrian image to be queried as the input of the model, the output of the model being the same-person image of the pedestrian image to be queried in the image data set; and training the model through the training set;
the implementation process of the model comprises the following steps:
s21: extracting multi-channel features of each image in the image data set;
s22: extracting local features of each image in the image data set;
s23: merging the multi-channel features and the local features into global features, and building the global features of all images in the image data set into a feature library of the image data set;
s24: extracting high-level semantic features of the pedestrian image to be queried;
s25: calculating the similarity between the high-level semantic features of the pedestrian image to be queried and the global features of the images in the image data set, and taking the image in the image data set with the highest similarity as the same-person image of the pedestrian image to be queried;
s3: identifying the pedestrian image to be queried through the trained pedestrian re-identification network model.
2. The pedestrian re-identification method according to claim 1, characterized in that: the process of extracting the multi-channel features in step S21 comprises the following steps:
s211: inputting the image into a DenseNet network to obtain a feature map of the image;
s212: inputting the feature map into a max-pooling layer to obtain a first feature vector;
s213: inputting the feature map into an average-pooling layer to obtain a second feature vector;
s214: adding the first feature vector and the second feature vector to obtain the multi-channel feature.
3. The pedestrian re-identification method according to claim 1, characterized in that: the process of extracting the local features in step S22 comprises the following steps:
s221: uniformly cropping each image into a plurality of sub-images;
s222: inputting the sub-images into a basic convolutional neural network to obtain a feature map of each sub-image;
s223: inputting the feature map of each sub-image into a max-pooling layer to obtain sub-image-level features, and inputting the feature map of each sub-image into a multilayer perceptron to obtain a correlation matrix of the sub-images;
s224: inputting the sub-image-level features and the correlation matrix into a graph convolutional neural network to obtain the association relations between the sub-images;
s225: inputting the association relations between the sub-images into a multilayer perceptron to obtain the local features of the image.
4. The pedestrian re-identification method according to claim 1, characterized in that: the process of extracting the high-level semantic features in step S24 comprises the following steps:
s241: extracting a 5-dimensional feature vector of the pedestrian image to be queried;
s242: inputting the 5-dimensional feature vector into a feature pyramid network to obtain semantic features of the pedestrian image to be queried;
s243: grouping the image pixels into superpixels with a clustering algorithm, based on the semantic features of the pedestrian image to be queried;
s244: extracting the high-level semantic features of the pedestrian image to be queried with a multi-head attention mechanism, based on the superpixel grouping result.
5. The pedestrian re-identification method according to claim 1, characterized in that: the 5 dimensions in step S241 comprise the two coordinates in the image coordinate system and the three primary colors.
6. The pedestrian re-identification method according to claim 1, characterized in that: the loss function employed by the model training process includes triplet losses and cross entropy losses.
7. A pedestrian re-identification terminal device, characterized by comprising a processor, a memory and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 6.
8. A computer-readable storage medium storing a computer program, characterized in that: the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
CN202211308979.XA 2022-10-25 2022-10-25 Pedestrian re-identification method, terminal equipment and storage medium Pending CN115690840A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211308979.XA CN115690840A (en) 2022-10-25 2022-10-25 Pedestrian re-identification method, terminal equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211308979.XA CN115690840A (en) 2022-10-25 2022-10-25 Pedestrian re-identification method, terminal equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115690840A true CN115690840A (en) 2023-02-03

Family

ID=85099141

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211308979.XA Pending CN115690840A (en) 2022-10-25 2022-10-25 Pedestrian re-identification method, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115690840A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination