CN115690840A - Pedestrian re-identification method, terminal equipment and storage medium - Google Patents

Pedestrian re-identification method, terminal equipment and storage medium

Info

Publication number
CN115690840A
Authority
CN
China
Prior art keywords
image
pedestrian
inquired
data set
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211308979.XA
Other languages
Chinese (zh)
Inventor
王宗跃
谢道顺
陈屹东
苏锦河
陈文平
陈智鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Saiwei Network Technology Co ltd
Jimei University
Original Assignee
Shenzhen Saiwei Network Technology Co ltd
Jimei University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Saiwei Network Technology Co ltd, Jimei University filed Critical Shenzhen Saiwei Network Technology Co ltd
Priority to CN202211308979.XA priority Critical patent/CN115690840A/en
Publication of CN115690840A publication Critical patent/CN115690840A/en
Pending legal-status Critical Current


Abstract

The invention relates to a pedestrian re-identification method, terminal equipment and a storage medium, wherein the method comprises the following steps: extracting multi-channel features of each image in the image data set; extracting local features of each image in the image data set; merging the multi-channel features and the local features into global features, and building the global features of all images in the image data set into a feature library of the image data set; extracting high-level semantic features of the pedestrian image to be queried; and calculating the similarity between the high-level semantic features of the pedestrian image to be queried and the global features of the images in the image data set, and taking the image in the image data set with the highest similarity as the same-person image of the pedestrian image to be queried. The invention mitigates the limited global representation capability of current deep-learning pedestrian re-identification networks, and at the same time reduces the errors caused by semantic misalignment due to spatial misalignment of the human body.

Description

Pedestrian re-identification method, terminal equipment and storage medium
Technical Field
The invention relates to the field of computer vision, in particular to a pedestrian re-identification method, terminal equipment and a storage medium.
Background
The purpose of pedestrian re-identification is to retrieve a particular pedestrian from an image or video sequence using computer vision techniques. The pedestrian re-identification task is an important component of intelligent surveillance systems. Early person re-identification methods consist of two parts: feature learning and metric learning. Feature learning is used to extract invariant features representing persons of the same identity, while metric learning is used to measure the similarity between person images.
The pedestrian re-identification task aims at matching images of the same person captured at different physical locations. The task is challenging due to large variations in person pose and viewing angle, imperfect person detection, cluttered backgrounds, occlusion, and differences in lighting. Some deep learning approaches pay insufficient attention to local differences and have no explicit mechanism to solve the misalignment problem, so their global representation capability is limited. Furthermore, pose-based body-part alignment does not achieve satisfactory alignment: owing to the spatial misalignment of the human body, the same spatial position does not correspond to the same semantics.
Disclosure of Invention
In order to solve the above problem, the present invention provides a pedestrian re-identification method, a terminal device and a storage medium.
The specific scheme is as follows:
a pedestrian re-identification method comprises the following steps:
s1: acquiring an image data set consisting of pedestrian images corresponding to different persons, meanwhile acquiring, as pedestrian images to be queried, different images belonging to the same persons as the images in the image data set, labeling each pedestrian image to be queried with its same-person image in the image data set, and constructing a training set based on the labeled pedestrian images to be queried and the image data set;
s2: constructing a pedestrian re-identification network model, taking the image data set and the pedestrian image to be queried as the input of the model, the output of the model being the same-person image of the pedestrian image to be queried in the image data set; and training the model through the training set;
the implementation process of the model comprises the following steps:
s21: extracting multi-channel features of each image in the image data set;
s22: extracting local features of each image in the image data set;
s23: merging the multi-channel features and the local features into global features, and building the global features of all images in the image data set into a feature library of the image data set;
s24: extracting high-level semantic features of the pedestrian image to be queried;
s25: calculating the similarity between the high-level semantic features of the pedestrian image to be queried and the global features of the images in the image data set, and taking the image in the image data set with the highest similarity as the same-person image of the pedestrian image to be queried;
s3: identifying the pedestrian image to be queried through the trained pedestrian re-identification network model.
Further, the process of extracting the multi-channel features in step S21 comprises the following steps:
s211: inputting the image into a DenseNet network to obtain a feature map of the image;
s212: inputting the feature map into a max-pooling layer to obtain a first feature vector;
s213: inputting the feature map into an average-pooling layer to obtain a second feature vector;
s214: adding the first feature vector and the second feature vector to obtain the multi-channel feature.
Further, the process of extracting the local features in step S22 comprises the following steps:
s221: uniformly cropping each image into a plurality of sub-images;
s222: inputting the sub-images into a basic convolutional neural network to obtain a feature map of each sub-image;
s223: inputting the feature map of each sub-image into a max-pooling layer to obtain sub-image-level features, and inputting the feature map of each sub-image into a multilayer perceptron to obtain a correlation matrix of the sub-images;
s224: inputting the sub-image-level features and the correlation matrix into a graph convolutional neural network to obtain the association relations between the sub-images;
s225: inputting the association relations between the sub-images into a multilayer perceptron to obtain the local features of the image.
Further, the process of extracting the high-level semantic features in step S24 comprises the following steps:
s241: extracting a 5-dimensional feature vector of the pedestrian image to be queried;
s242: inputting the 5-dimensional feature vector into a feature pyramid network to obtain semantic features of the pedestrian image to be queried;
s243: grouping the image pixels into superpixels with a clustering algorithm, based on the semantic features of the pedestrian image to be queried;
s244: extracting the high-level semantic features of the pedestrian image to be queried with a multi-head attention mechanism, based on the superpixel grouping result.
Further, the 5 dimensions in step S241 comprise the two coordinates in the image coordinate system and the three primary colors.
A pedestrian re-identification terminal device comprises a processor, a memory and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the computer program, implements the steps of the method of the embodiment of the invention.
A computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the method of the embodiment of the invention described above.
By adopting the above technical scheme, the invention mitigates the limited global representation capability of current deep-learning pedestrian re-identification networks, and at the same time reduces the errors caused by semantic misalignment due to spatial misalignment of the human body.
Drawings
FIG. 1 is a flow chart of a method according to an embodiment of the present invention.
Fig. 2 is a flowchart illustrating a process of implementing a model according to an embodiment of the present invention.
Detailed Description
To further illustrate the various embodiments, the invention provides the accompanying drawings. The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the embodiments. With these references, one of ordinary skill in the art will appreciate other possible embodiments and advantages of the present invention.
The invention will now be further described with reference to the drawings and the detailed description.
Embodiment 1:
The embodiment of the invention provides a pedestrian re-identification method which, as shown in FIG. 1, comprises the following steps:
s1: acquiring an image data set X = { I } consisting of pedestrian images corresponding to different persons 1 ,I 2 ,…,I N And N represents the number of images in the input image data set X, different images belonging to the same person as the images in the image data set are collected at the same time to serve as the images of the pedestrian to be inquired, the images of the pedestrian to be inquired are marked on the images of the same person in the image data set, and a training set is constructed based on the marked images of the pedestrian to be inquired and the image data set.
S2: constructing a pedestrian re-identification network model, taking the image data set and the pedestrian image to be queried as the input of the model, the output of the model being the same-person image of the pedestrian image to be queried in the image data set. The model is trained through the training set.
As shown in fig. 2, the implementation process of the model includes:
s21: and extracting the multichannel characteristics of each image in the image data set.
The extraction process of the multi-channel features in the embodiment comprises the following steps
S211: inputting the image I into a DenseNet network to obtain the feature map of the image, F_M = DenseNet(I), F_M ∈ R^(H×W×d), where H denotes the height of the image, W the width of the image, and d the number of feature channels.
S212: inputting the feature map into a max-pooling layer to obtain the first feature vector F_Max = Maxpooling(F_M).
S213: inputting the feature map into an average-pooling layer to obtain the second feature vector F_Avg = Avgpooling(F_M).
S214: adding the first feature vector F_Max and the second feature vector F_Avg to obtain the multi-channel feature F_merge = F_Max + F_Avg.
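Steps S211-S214 can be sketched as follows. This is a minimal NumPy illustration, not the patented network: the DenseNet backbone is replaced by a toy feature map, and the function name `multichannel_feature` is ours.

```python
import numpy as np

def multichannel_feature(feature_map):
    """Merge global max- and average-pooled descriptors (steps S212-S214).

    feature_map: array of shape (H, W, d), standing in for the DenseNet
    output F_M. Returns F_merge = F_Max + F_Avg, a d-dimensional vector.
    """
    f_max = feature_map.max(axis=(0, 1))   # F_Max = Maxpooling(F_M)
    f_avg = feature_map.mean(axis=(0, 1))  # F_Avg = Avgpooling(F_M)
    return f_max + f_avg                   # F_merge

# toy feature map: H=4, W=4, d=8
fm = np.arange(4 * 4 * 8, dtype=float).reshape(4, 4, 8)
f_merge = multichannel_feature(fm)
print(f_merge.shape)  # (8,)
```

Summing (rather than concatenating) the two pooled vectors keeps the multi-channel feature at the same dimensionality d as the backbone output.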
S22: local features are extracted for each image of the image dataset.
The process of extracting the local features in this embodiment includes the following steps:
s221: uniformly cutting each image I into a plurality of sub-images I = { s = {(s) } 1 ,s 2 ,…,s k },
Figure BDA0003907158320000053
h represents the height of the subgraph, w represents the width of the subgraph, and k represents the number of the subgraphs.
S222: inputting the subgraphs into a basic convolutional neural network to obtain a feature graph f of each subgraph M =CNN(s k )。
S223: inputting the feature graph of the subgraph into the maximum pooling layer to obtain a subgraph-level feature f k =Maxpooling(f M ),
Figure BDA0003907158320000054
Inputting the feature graph of the subgraph into a multilayer perceptron to obtain a correlation matrix of the subgraph
Figure BDA0003907158320000061
S224: after the subgraph level features and the correlation matrix of the subgraphs are input into a graph convolution neural network, the incidence relation F between the subgraphs is obtained R =GCN(f k ,M)。
S225: relating relationship F between subgraphs R Inputting the multilayer perceptron to obtain local features F of the image I local =MLP(F R )。
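The graph-convolution step of S223-S225 can be sketched as below. This is a hedged NumPy illustration under simplifying assumptions: the learnable weights are random, a single propagation layer stands in for the GCN, and a mean over nodes stands in for the final MLP of S225.

```python
import numpy as np

rng = np.random.default_rng(0)

def gcn_layer(sub_feats, corr, weight):
    """One graph-convolution step over the k sub-image nodes (step S224).

    sub_feats: (k, d) sub-image-level features f_k.
    corr:      (k, k) correlation matrix M (from the MLP of S223).
    weight:    (d, d_out) projection matrix, random here for illustration.
    """
    # row-normalise the correlation matrix so each node aggregates a
    # weighted average of its neighbours, then apply ReLU
    norm = corr / corr.sum(axis=1, keepdims=True)
    return np.maximum(norm @ sub_feats @ weight, 0.0)

k, d = 6, 16                              # 6 sub-images, 16-dim features
f_k = rng.normal(size=(k, d))             # stand-in for Maxpooling(f_M)
M = rng.uniform(0.1, 1.0, size=(k, k))    # positive pairwise correlations
W = rng.normal(size=(d, d))
F_R = gcn_layer(f_k, M, W)                # inter-sub-image relations
F_local = F_R.mean(axis=0)                # stand-in for the MLP of S225
print(F_R.shape, F_local.shape)           # (6, 16) (16,)
```

The row normalisation is one common GCN convention; the patent does not specify how M is normalised.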
S23: merging the multi-channel feature F_merge and the local feature F_local into the global feature F_global = Concat(F_merge, F_local), and building the global features of all images in the image data set into the feature library of the image data set, {F_global^1, F_global^2, …, F_global^N}.
S24: extracting the high-level semantic features of the pedestrian image to be queried.
The extraction process of the high-level semantic features in this embodiment comprises the following steps:
s241: extracting pedestrian image I to be inquired q 5-dimensional feature vector of (a), the 5-dimensional feature vector in this embodiment includes two coordinates (x, y) and three primary colors (R, G, B) in a coordinate system.
S242: inputting the 5-dimensional feature vector into a feature pyramid network to obtain a pedestrian image I to be inquired q Semantic feature F of sem =FPN(F a )。
S243: based on waiting to inquire pedestrian's image I q Semantic feature F of sem Grouping the image into superpixels P by adopting a clustering algorithm super
Clustering algorithm in this embodiment, a Kmeans algorithm is used.
S244: based on superpixels P super Grouping results, using a multi-head attention mechanismObtaining high-level semantic features F of pedestrian image to be inquired q =MHAttention(P super )。
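Steps S241 and S243 can be illustrated with a toy K-means grouping of the per-pixel 5-d vectors (x, y, R, G, B). The cluster count and iteration budget are illustrative choices that the patent does not fix, and the FPN and multi-head attention stages are omitted here.

```python
import numpy as np

rng = np.random.default_rng(1)

def kmeans_superpixels(feats, n_clusters=4, iters=10):
    """Group per-pixel 5-d vectors into superpixel clusters (step S243).

    A minimal K-means sketch; n_clusters=4 is an illustrative choice.
    """
    centers = feats[rng.choice(len(feats), n_clusters, replace=False)]
    for _ in range(iters):
        # assign each pixel to its nearest centre
        labels = np.argmin(((feats[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # move each centre to the mean of its assigned pixels
        for c in range(n_clusters):
            if (labels == c).any():
                centers[c] = feats[labels == c].mean(axis=0)
    return labels, centers

# toy 8x8 image: pixel coordinates plus random RGB values
h, w = 8, 8
xy = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"), -1).reshape(-1, 2)
rgb = rng.uniform(0, 255, size=(h * w, 3))
feats = np.concatenate([xy, rgb], axis=1).astype(float)  # the 5-d vectors of S241

labels, centers = kmeans_superpixels(feats)
print(labels.shape, centers.shape)  # (64,) (4, 5)
```

Each cluster groups pixels that are close both spatially (x, y) and in color (R, G, B), which is the intuition behind superpixels.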
S25: calculating the similarity between the high-level semantic features of the pedestrian image to be queried and the global features of the images in the image data set, and taking the image in the image data set with the highest similarity as the same-person image of the pedestrian image to be queried.
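Step S25 can be sketched as a nearest-neighbour search over the feature library. Cosine similarity is an assumption on our part; the patent speaks only of "similarity" without naming a metric, and the toy 3-dimensional library below is purely illustrative.

```python
import numpy as np

def retrieve_same_person(query_feat, feature_library):
    """Rank the gallery by cosine similarity to the query feature (step S25).

    query_feat:      the high-level semantic feature F_q of the query image.
    feature_library: (N, d) matrix of global features, one row per gallery image.
    Returns the index of the most similar gallery image and all similarities.
    """
    q = query_feat / np.linalg.norm(query_feat)
    lib = feature_library / np.linalg.norm(feature_library, axis=1, keepdims=True)
    sims = lib @ q                  # cosine similarity per gallery image
    return int(np.argmax(sims)), sims

library = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.6, 0.8, 0.0]])   # toy 3-image feature library
query = np.array([0.5, 0.9, 0.0])
best, sims = retrieve_same_person(query, library)
print(best)  # 2 — the closest gallery feature
```

Note that F_q and the global features must share a dimensionality for this comparison, which the model's projection layers are presumed to guarantee.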
The model loss function L in this embodiment combines a triplet loss L_tri and a cross-entropy loss L_ce, i.e.
L = λ·L_tri + (1 − λ)·L_ce,
where 0 < λ < 1 denotes a weight coefficient.
The triplet loss L_tri is calculated as:
L_tri = max(d(F_q, F_pos) − d(F_q, F_neg) + m, 0),
where d(·, ·) denotes the distance metric, F_pos denotes a positive sample belonging to the same person as F_q, F_neg denotes a negative sample belonging to a different person, and m denotes a margin parameter.
The cross-entropy loss comprises a first cross-entropy loss and a second cross-entropy loss, wherein:
the first cross-entropy loss measures the accuracy of the multi-channel feature extraction for each image in the image data set; to compute it, the model further inputs the multi-channel feature F_merge into a fully connected layer and obtains the classification probability of the image through a Softmax function;
the second cross-entropy loss measures the accuracy of the same-person image identification for the pedestrian image to be queried.
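The combined loss L = λ·L_tri + (1 − λ)·L_ce can be evaluated numerically as below; the margin m = 0.3 and weight λ = 0.5 are illustrative values not fixed by the patent, and scalar distances stand in for d(F_q, ·).

```python
import numpy as np

def triplet_loss(d_pos, d_neg, margin=0.3):
    """L_tri = max(d(F_q, F_pos) - d(F_q, F_neg) + m, 0)."""
    return max(d_pos - d_neg + margin, 0.0)

def cross_entropy(logits, label):
    """Softmax cross-entropy for an identity classifier (numerically stable)."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

lam = 0.5  # illustrative weight coefficient, 0 < lambda < 1
l_tri = triplet_loss(d_pos=0.4, d_neg=1.0)          # positive pair already closer
l_ce = cross_entropy(np.array([2.0, 0.5, 0.1]), 0)  # correct class has top logit
total = lam * l_tri + (1 - lam) * l_ce
print(round(l_tri, 4))  # 0.0 — the margin is already satisfied
```

When the positive distance plus the margin is smaller than the negative distance, the triplet term contributes nothing and only the classification loss drives the update.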
S3: identifying the pedestrian image to be queried through the trained pedestrian re-identification network model.
According to the embodiment of the invention, the global features are obtained by merging the multi-channel features and the local features, and the global features of all images jointly form the feature library of the image data set, which mitigates the limited global representation capability of current deep-learning pedestrian re-identification networks. In addition, the embodiment of the invention extracts the semantic features of the pedestrian image to be queried with a feature pyramid network, groups the image into superpixels with a clustering algorithm, and further extracts high-level semantic features with multi-head attention. The embodiment of the invention thereby reduces the errors caused by semantic misalignment due to spatial misalignment of the human body.
Embodiment 2:
the invention further provides a pedestrian re-identification terminal device, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor executes the computer program to realize the steps in the above method embodiment of the first embodiment of the invention.
Further, as an executable scheme, the pedestrian re-identification terminal device may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. The pedestrian re-identification terminal device can comprise, but is not limited to, a processor and a memory. It is understood by those skilled in the art that the above-mentioned constituent structure of the pedestrian re-identification terminal device is only an example of the pedestrian re-identification terminal device, and does not constitute a limitation to the pedestrian re-identification terminal device, and may include more or less components than the above, or combine some components, or different components, for example, the pedestrian re-identification terminal device may further include an input/output device, a network access device, a bus, and the like, which is not limited by the embodiment of the present invention.
Further, as an executable solution, the Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, a discrete hardware component, and the like. The general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc., and the processor is a control center of the pedestrian re-identification terminal device and connects various parts of the entire pedestrian re-identification terminal device by using various interfaces and lines.
The memory can be used to store the computer program and/or modules, and the processor implements the various functions of the pedestrian re-identification terminal device by running or executing the computer program and/or modules stored in the memory and calling the data stored in the memory. The memory mainly comprises a program storage area and a data storage area, wherein the program storage area stores the operating system and the application program required by at least one function, and the data storage area stores data created according to the use of the device. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card, at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage device.
The present invention also provides a computer-readable storage medium, which stores a computer program, which, when executed by a processor, implements the steps of the above-mentioned method of an embodiment of the present invention.
The modules/units integrated in the pedestrian re-identification terminal device may be stored in a computer-readable storage medium if implemented in the form of software functional units and sold or used as a separate product. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments described above are implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), a software distribution medium, and the like.
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. A pedestrian re-identification method, characterized by comprising the following steps:
s1: acquiring an image data set consisting of pedestrian images corresponding to different persons, meanwhile acquiring, as pedestrian images to be queried, different images belonging to the same persons as the images in the image data set, labeling each pedestrian image to be queried with its same-person image in the image data set, and constructing a training set based on the labeled pedestrian images to be queried and the image data set;
s2: constructing a pedestrian re-identification network model, taking the image data set and the pedestrian image to be queried as the input of the model, the output of the model being the same-person image of the pedestrian image to be queried in the image data set; and training the model through the training set;
the implementation process of the model comprises the following steps:
s21: extracting multi-channel features of each image in the image data set;
s22: extracting local features of each image in the image data set;
s23: merging the multi-channel features and the local features into global features, and building the global features of all images in the image data set into a feature library of the image data set;
s24: extracting high-level semantic features of the pedestrian image to be queried;
s25: calculating the similarity between the high-level semantic features of the pedestrian image to be queried and the global features of the images in the image data set, and taking the image in the image data set with the highest similarity as the same-person image of the pedestrian image to be queried;
s3: identifying the pedestrian image to be queried through the trained pedestrian re-identification network model.
2. The pedestrian re-identification method according to claim 1, characterized in that: the process of extracting the multi-channel features in step S21 comprises the following steps:
s211: inputting the image into a DenseNet network to obtain a feature map of the image;
s212: inputting the feature map into a max-pooling layer to obtain a first feature vector;
s213: inputting the feature map into an average-pooling layer to obtain a second feature vector;
s214: adding the first feature vector and the second feature vector to obtain the multi-channel feature.
3. The pedestrian re-identification method according to claim 1, characterized in that: the process of extracting the local features in step S22 comprises the following steps:
s221: uniformly cropping each image into a plurality of sub-images;
s222: inputting the sub-images into a basic convolutional neural network to obtain a feature map of each sub-image;
s223: inputting the feature map of each sub-image into a max-pooling layer to obtain sub-image-level features, and inputting the feature map of each sub-image into a multilayer perceptron to obtain a correlation matrix of the sub-images;
s224: inputting the sub-image-level features and the correlation matrix into a graph convolutional neural network to obtain the association relations between the sub-images;
s225: inputting the association relations between the sub-images into a multilayer perceptron to obtain the local features of the image.
4. The pedestrian re-identification method according to claim 1, characterized in that: the process of extracting the high-level semantic features in step S24 comprises the following steps:
s241: extracting a 5-dimensional feature vector of the pedestrian image to be queried;
s242: inputting the 5-dimensional feature vector into a feature pyramid network to obtain semantic features of the pedestrian image to be queried;
s243: grouping the image pixels into superpixels with a clustering algorithm, based on the semantic features of the pedestrian image to be queried;
s244: extracting the high-level semantic features of the pedestrian image to be queried with a multi-head attention mechanism, based on the superpixel grouping result.
5. The pedestrian re-identification method according to claim 1, characterized in that: the 5 dimensions in step S241 comprise the two coordinates in the image coordinate system and the three primary colors.
6. The pedestrian re-identification method according to claim 1, characterized in that: the loss function employed by the model training process includes triplet losses and cross entropy losses.
7. A pedestrian re-identification terminal device, characterized by comprising a processor, a memory and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 6.
8. A computer-readable storage medium storing a computer program, characterized in that: the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 6.
CN202211308979.XA 2022-10-25 2022-10-25 Pedestrian re-identification method, terminal equipment and storage medium Pending CN115690840A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211308979.XA CN115690840A (en) 2022-10-25 2022-10-25 Pedestrian re-identification method, terminal equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211308979.XA CN115690840A (en) 2022-10-25 2022-10-25 Pedestrian re-identification method, terminal equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115690840A true CN115690840A (en) 2023-02-03

Family

ID=85099141

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211308979.XA Pending CN115690840A (en) 2022-10-25 2022-10-25 Pedestrian re-identification method, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115690840A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination