CN111783753A - Pedestrian re-identification method based on semantic consistency horizontal bar and foreground correction - Google Patents


Info

Publication number
CN111783753A
CN111783753A
Authority
CN
China
Prior art keywords
pedestrian
features
image
foreground
input image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010918791.1A
Other languages
Chinese (zh)
Other versions
CN111783753B (en)
Inventor
郭海云 (Guo Haiyun)
朱宽 (Zhu Kuan)
王金桥 (Wang Jinqiao)
唐明 (Tang Ming)
Current Assignee
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN202010918791.1A priority Critical patent/CN111783753B/en
Publication of CN111783753A publication Critical patent/CN111783753A/en
Application granted granted Critical
Publication of CN111783753B publication Critical patent/CN111783753B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06V 40/10 — Recognition of human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06F 18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F 18/23213 — Non-hierarchical clustering using statistics or function optimisation, with a fixed number of clusters, e.g. k-means clustering
    • G06F 18/24 — Classification techniques
    • G06N 3/045 — Neural network architectures; combinations of networks
    • G06N 3/08 — Neural network learning methods
    • G06V 10/462 — Salient features, e.g. scale-invariant feature transforms [SIFT]

Abstract

The invention belongs to the field of computer vision and pattern recognition, and specifically relates to a pedestrian re-identification method based on semantically consistent horizontal stripes and foreground correction, aiming to solve the poor robustness of existing pedestrian re-identification methods. The method comprises the following steps: acquiring an image to be identified as an input image; extracting features of the input image as first features; based on the first features, obtaining through a row classifier in a pedestrian re-identification model the foreground features of the pedestrian in the input image as second features, and the features of the horizontal-stripe regions of each set body part of the pedestrian in the input image as third features; multiplying the second features and the third features point-to-point, and splicing the result with the first features to obtain fourth features; calculating and sorting the Euclidean distances between the fourth features and the features corresponding to each image in an image library, and outputting the sorting result as the re-identification result. The invention improves the robustness of pedestrian re-identification.

Description

Pedestrian re-identification method based on semantic consistency horizontal bar and foreground correction
Technical Field
The invention belongs to the field of computer vision and pattern recognition, and specifically relates to a pedestrian re-identification method, system and device based on semantically consistent horizontal stripes and foreground correction.
Background
Pedestrian re-identification is a sub-problem in the field of image retrieval. Given a pedestrian image, the pedestrian re-identification task aims to find images of the same pedestrian in other scenes. However, owing to viewpoint changes, pose differences and object occlusion, a body part may appear at any position in the picture. It is therefore important to learn features that can effectively locate each part of the human body and to extract sufficiently discriminative parts.
Existing position-alignment-based pedestrian re-identification methods fall roughly into four types: horizontal-stripe-based methods, bounding-box-based methods, attention-based methods, and methods based on additional semantic information. Among these, the horizontal-stripe-based approach is particularly popular because of its convenience, speed and relatively high performance; representative methods include PCB, MGN and Pyramid. PCB (Yifan Sun, Liang Zheng, Yi Yang, Qi Tian, Shengjin Wang. Beyond Part Models: Person Retrieval with Refined Part Pooling (and A Strong Convolutional Baseline). ECCV, 2018) first proposed dividing a pedestrian picture into horizontal stripes of equal height, then average-pooling each stripe separately to obtain features and computing the loss for each stripe separately. MGN (Guanshuo Wang, Yufeng Yuan, Xiong Chen, Jiwei Li. Learning Discriminative Features with Multiple Granularities for Person Re-Identification. ACM MM, 2018) and Pyramid (Zheng F, Deng C, Sun X, et al. Pyramidal Person Re-IDentification via Multi-Loss Dynamic Training. CVPR, 2019) build on PCB with multi-granularity and overlapping horizontal stripes, greatly improving the robustness of the algorithm. However, none of the above methods solves the following two problems: (1) The height and position of the horizontal stripes are fixed. Owing to pose differences, viewpoint changes, occlusion and similar issues, the semantics within each horizontal stripe cannot be guaranteed to be consistent; the methods above nevertheless employ fixed horizontal stripes and do not attempt to address this. (2) Interference from background noise. Inside each horizontal stripe there is inevitably interference from background information, and no existing method addresses how to remove background noise within a stripe.
Based on the method, the invention provides a pedestrian re-identification method based on semantic consistency horizontal bars and foreground correction.
Disclosure of Invention
In order to solve the above problems in the prior art, namely the poor robustness of existing pedestrian re-identification methods caused by the fixed height and position of the horizontal stripes and the failure to eliminate background noise, the invention provides a pedestrian re-identification method based on semantically consistent horizontal stripes and foreground correction, comprising the following steps:
step S10, acquiring an image to be recognized as an input image;
step S20, extracting features of the input image as first features through the feature extraction layer of a pedestrian re-identification model;
step S30, based on the first features, respectively acquiring foreground features corresponding to pedestrians in the input image as second features and acquiring features of horizontal bar areas of all set parts of the pedestrians in the input image as third features through a pre-trained row classifier in a pedestrian re-recognition model;
step S40, multiplying the second features and the third features point-to-point, and splicing the result with the first features to obtain fourth features;
step S50, calculating and sorting the Euclidean distance between the fourth feature and the feature corresponding to each image in the image library, and outputting the sorting result as a re-identification result;
the pedestrian re-identification model is constructed on the basis of a deep convolutional neural network; the line classifier is built based on a fully connected layer and a softmax layer.
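The fusion in step S40 can be sketched as follows. This is a minimal illustration, assuming the three feature arrays are already extracted and spatially aligned; the function name and array shapes are illustrative, not from the patent.

```python
import numpy as np

def fuse_features(first, second, third):
    """Step S40 sketch: correct the horizontal-stripe features (third)
    with the foreground features (second) by a point-to-point product,
    then splice the result with the global first features."""
    corrected = second * third                        # element-wise product
    return np.concatenate([corrected.ravel(), first.ravel()])
```

The resulting fourth feature is what step S50 compares, by Euclidean distance, against the features of each image in the image library.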
In some preferred embodiments, the line classifier is trained by:
step A10, obtaining a training sample image set;
step A20, extracting the row features of any image in the training sample image set and pooling them to obtain the corresponding average features;
step A30, judging whether the current iteration number M is a multiple of N; if so, executing step A40, otherwise jumping to step A50, wherein M and N are natural numbers;
step A40, extracting the row features of each training sample image in the training sample set, obtaining the pseudo label corresponding to each set part through self-similarity clustering, and executing step A50;
step A50, calculating the loss between the average features acquired in step A20 and the pseudo labels acquired in step A40, and updating the parameters of the row classifier.
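The iterative loop of steps A10-A50 can be sketched as follows; the helper callables (`extract_row_feats`, `cluster_pseudo_labels`, `update_classifier`) are hypothetical stand-ins for the actual feature extractor, self-similarity clustering and classifier update, not names from the patent.

```python
# Sketch of the row-classifier training loop (steps A10-A50): pseudo
# labels are refreshed by clustering every N iterations, and in every
# iteration the classifier is trained against the current labels.
def train_row_classifier(images, iterations, N,
                         extract_row_feats, cluster_pseudo_labels,
                         update_classifier):
    pseudo_labels = cluster_pseudo_labels(images)     # initial pseudo labels
    for m in range(1, iterations + 1):
        if m % N == 0:                                # steps A30/A40: refresh labels
            pseudo_labels = cluster_pseudo_labels(images)
        for img in images:                            # steps A20/A50: train classifier
            update_classifier(extract_row_feats(img), pseudo_labels)
    return pseudo_labels
```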
In some preferred embodiments, the self-similar clustering is a k-means clustering method.
In some preferred embodiments, the method for respectively acquiring foreground features corresponding to pedestrians in the input image as second features by using a pre-trained row classifier in a pedestrian re-recognition model includes:
obtaining, through the row classifier, the confidence of each pixel point in the input image belonging to human foreground semantics;
taking the pixel points whose confidence is greater than a first set threshold as foreground pixels, and the pixel points whose confidence is less than a second set threshold as background pixels;
and constructing features based on the extracted foreground pixels to serve as foreground features corresponding to the pedestrians in the input image.
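The thresholding above can be sketched as follows, assuming `confidence` holds the row classifier's per-pixel foreground confidence; the default threshold values 0.8 and 0.2 are the ones given later in the detailed embodiment.

```python
import numpy as np

# Sketch of the foreground/background split: pixels above the first
# threshold are foreground, below the second threshold background, and
# everything else neutral.
def split_foreground(confidence, hi=0.8, lo=0.2):
    fg = confidence > hi                  # foreground pixels
    bg = confidence < lo                  # background pixels
    neutral = ~(fg | bg)                  # neither: neutral pixels
    return fg, bg, neutral
```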
In some preferred embodiments, the method for acquiring, as the third feature, the feature of the horizontal bar region of each set part of the pedestrian in the input image by using the pre-trained row classifier in the pedestrian re-recognition model includes:
performing semantic segmentation on the input image through a line classifier to obtain a confidence map of a horizontal bar region of each set part of a pedestrian in the input image;
and respectively carrying out point-to-point product operation on each confidence map and the first characteristics to obtain the characteristics of the horizontal bar area of each set position of the pedestrian in the input image.
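The point-to-point product of the confidence maps with the first features can be sketched as follows, assuming channels-first (C, H, W) feature maps and one (H, W) confidence map per set part; the layout is an assumption for illustration.

```python
import numpy as np

# Weight the first features by each part's confidence map via a
# point-to-point product, yielding one weighted map per set part.
def bar_features(first_features, confidence_maps):
    return [first_features * m[None, :, :] for m in confidence_maps]
```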
In some preferred embodiments, the loss function of the pedestrian re-identification model during training is:

$$L = \sum_{k=1}^{K}\sum_{i=1}^{n}\Big[m + \max_{x_a \in A} D\big(f(x_i), f(x_a)\big) - \min_{x_b \in B} D\big(f(x_i), f(x_b)\big)\Big]_{+}$$

wherein $L$ represents the loss value of the pedestrian re-identification model, $n$ represents the number of training sample images in one batch during training, $K$ represents the number of batches, $x_i$ represents any image in a batch of training sample images, $x_a$ represents the training sample image in image set $A$ whose features have the largest Euclidean distance to the features of $x_i$, $x_b$ represents the training sample image in image set $B$ whose features have the smallest Euclidean distance to the features of $x_i$, $m$ indicates a preset distance margin, $A$ represents the image set comprising all images with the same ID as $x_i$, $B$ represents the image set constructed from all images in the current batch except those contained in $A$, and $D(\cdot,\cdot)$ represents the Euclidean distance.
In a second aspect of the present invention, a pedestrian re-identification system based on semantic consistency horizontal stripes and foreground correction is provided, the system comprising: the system comprises an image acquisition module, a global feature extraction module, a local feature extraction module, a feature splicing module and an identification output module;
the image acquisition module is configured to acquire an image to be identified as an input image;
the global feature extraction module is configured to extract features of the input image as first features through a feature extraction layer of a pedestrian re-recognition model;
the local feature extraction module is configured to respectively acquire foreground features corresponding to pedestrians in the input image as second features and acquire features of horizontal bar areas of all set parts of the pedestrians in the input image as third features through a pre-trained row classifier in a pedestrian re-recognition model based on the first features;
the feature splicing module is configured to multiply the second features and the third features point-to-point, and splice the result with the first features to obtain the fourth features;
the recognition output module is configured to calculate and sort the Euclidean distances between the fourth features and the features corresponding to each image in the image library, and output the sorting result as the re-identification result;
the pedestrian re-identification model is constructed on the basis of a deep convolutional neural network; the line classifier is built based on a fully connected layer and a softmax layer.
In a third aspect of the present invention, a storage device is provided, in which a plurality of programs are stored, the programs being adapted to be loaded and executed by a processor to implement the above pedestrian re-identification method based on semantically consistent horizontal stripes and foreground correction.
In a fourth aspect of the present invention, a processing apparatus is provided, which includes a processor, a storage device; a processor adapted to execute various programs; a storage device adapted to store a plurality of programs; the program is adapted to be loaded and executed by a processor to implement the pedestrian re-identification method based on semantic consistent horizontal bars and foreground correction described above.
The invention has the beneficial effects that:
the invention improves the robustness of pedestrian re-identification. According to the invention, each line is divided into specific semantics through the pre-trained line classifier so as to form the horizontal bars with consistent semantics, and the height and the position of the horizontal bars can be adjusted in a self-adaptive manner, so that the semantics contained in each horizontal bar are consistent, and the problem of semantic consistency of the horizontal bars is solved.
At the same time, each pixel is also assigned to foreground or background semantics. By taking the intersection of the horizontal bar semantics and the foreground region, the positions of all parts of the human body can be approximately obtained, the interference of background information is solved, and the positioning accuracy of all parts and the discrimination of local features are improved.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings.
FIG. 1 is a schematic flow chart of a pedestrian re-identification method based on semantic consistency horizontal bars and foreground correction according to an embodiment of the invention;
FIG. 2 is a block diagram of a pedestrian re-identification system based on semantic consistency horizontal bars and foreground correction according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a pedestrian re-identification method based on semantic consistency horizontal bars and foreground correction according to an embodiment of the present invention;
FIG. 4 is a schematic comparison of the row classifier of the present invention with a prior-art row classifier using fixed horizontal-stripe heights and positions, in accordance with one embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
A pedestrian re-identification method based on semantic consistency horizontal stripes and foreground correction according to a first embodiment of the present invention is, as shown in fig. 1, including the following steps:
step S10, acquiring an image to be recognized as an input image;
step S20, extracting the characteristics of the input image as first characteristics through a characteristic extraction layer of a pedestrian re-identification model;
step S30, based on the first features, respectively acquiring foreground features corresponding to pedestrians in the input image as second features and acquiring features of horizontal bar areas of all set parts of the pedestrians in the input image as third features through a pre-trained row classifier in a pedestrian re-recognition model;
step S40, multiplying the second features and the third features point-to-point, and splicing the result with the first features to obtain fourth features;
step S50, calculating and sorting the Euclidean distance between the fourth feature and the feature corresponding to each image in the image library, and outputting the sorting result as a re-identification result;
the pedestrian re-identification model is constructed on the basis of a deep convolutional neural network; the line classifier is built based on a fully connected layer and a softmax layer.
In order to more clearly explain the pedestrian re-identification method based on semantic consistency horizontal stripes and foreground correction, the following is a detailed description of each step in one embodiment of the method.
In the following embodiments, the training process of the pedestrian re-recognition model is detailed first, and then the process of obtaining the pedestrian re-recognition result by the pedestrian re-recognition method based on the semantic consistency horizontal bar and the foreground correction is detailed.
1. Training process of pedestrian re-identification model
Step B10, pre-training the pedestrian re-identification model
In the invention, the pedestrian re-identification model is constructed based on a deep convolutional neural network, preferably the HRNet network proposed in the document "Sun K, Xiao B, Liu D, et al. Deep High-Resolution Representation Learning for Human Pose Estimation [J]. 2019". HRNet retains multi-scale semantic information and is therefore well suited as a network shared by human semantic parsing and pedestrian re-identification. The pedestrian re-identification model is shown in fig. 3, where the neural network model in fig. 3 refers to the convolutional neural network used for feature extraction, and the aligned features of the pedestrian parts represent the features obtained by sequentially splicing the features of the human body parts, as explained in detail below.
In this embodiment, the pedestrian re-identification model is pre-trained using the ImageNet data set to initialize its network parameters. During pre-training, the selected sample images are compressed to a fixed size, 64 images are input for each iteration, and 6000 training iterations are performed; in other embodiments, the number of pre-training iterations and the number of sample images input per iteration can be chosen according to actual requirements.
Step B20, obtaining training sample image set
In the present embodiment, training sample images including pedestrians are acquired, and a training sample image set is constructed.
Step B30, extracting the feature of each training sample image in the training sample image set as the global feature
In this embodiment, the features of the training sample image are extracted as global features through the feature extraction layer of the pedestrian re-identification model, which is constructed based on a convolutional neural network. The feature map finally output by HRNet has size 64×32×2048; it is reduced to 64×32×512 by a 1×1 convolution, after which the division and foreground-correction operations are performed.
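Since a 1×1 convolution acts as a per-pixel linear map over channels, the 2048-to-512 channel reduction can be sketched as a single matrix product; a channels-last layout is assumed here purely for simplicity.

```python
import numpy as np

# A 1x1 convolution over a (H, W, C_in) feature map with a
# (C_in, C_out) weight matrix is exactly a per-pixel matrix product.
def conv1x1(feature_map, weight):
    return feature_map @ weight
```

With the shapes from the text, `feature_map` would be 64×32×2048 and `weight` 2048×512, giving a 64×32×512 output.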
And step B40, based on the global features, respectively obtaining the foreground features corresponding to the pedestrians in each training sample image and obtaining the features of the horizontal bar areas of each set part of the pedestrians in each training sample image through the pre-trained row classifier in the pedestrian re-recognition model.
The invention provides a semantically consistent horizontal-stripe and foreground-correction method, which divides a pedestrian image into semantically consistent horizontal stripes and then removes the background inside each stripe. The network includes a semantic-consistency horizontal-stripe module and a foreground-correction module. The former generates pseudo labels for the horizontal stripes by iterative clustering and then guides the learning of the stripe division; the latter uses the learned horizontal-stripe classifier to obtain a foreground response map and uses it to guide the division of foreground and background. Finally, an effective pedestrian representation is obtained by combining global and local features. The specific steps are as follows:
In this embodiment, the semantically consistent horizontal-stripe division is the row division. The row division mainly relies on a row classifier, constructed from a fully connected layer and a softmax layer, which can assign each row to a different semantic class. First, a pooling operation is performed on each row of the training sample image (i.e. row-unit pooling in fig. 3) to obtain the average feature of each row, namely the row feature. Then the row classifier classifies the average feature of each row; the predicted class of a row's average feature represents the semantic part of that row. In other words, the row classifier performs semantic segmentation of the set body-part regions of the pedestrian in the training sample image, producing a confidence map for each set body-part region. Each confidence map is multiplied point-to-point with the global features to form the weighted feature maps corresponding to the different body parts (the features of the horizontal-stripe (row) regions of the set parts of the pedestrian).
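Row-unit pooling and the row classifier (fully connected layer plus softmax) can be sketched as follows; `W` and `b` stand in for the learned classifier parameters and are hypothetical, as is the (C, H, W) layout.

```python
import numpy as np

def row_features(feature_map):
    """Row-unit pooling: average the (C, H, W) features over the
    width, giving one C-dimensional feature per image row."""
    return feature_map.mean(axis=2).T                 # shape (H, C)

def classify_rows(row_feats, W, b):
    """Row classifier: a fully connected layer followed by softmax,
    assigning each row a distribution over the set body parts."""
    logits = row_feats @ W + b
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)
```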
The present invention preferably extracts horizontal-stripe features of five body parts of a pedestrian, corresponding to the head, chest, abdomen, legs and feet, denoted M1, M2, M3, M4 and M5 respectively. As shown in fig. 4, (a), (b) and (c) are feature maps obtained by a conventional row classifier with fixed horizontal-stripe heights and positions; it can be seen that the feature maps obtained with a conventional human-parsing model (i.e. the spliced feature maps) fail to exploit useful information such as a backpack, which degrades performance.
During training, the row classifier assigns a pseudo label to each row using an iterative clustering method. That is, every N training stages, the feature means of each row (the horizontal-stripe regions of the set pedestrian parts) of the images are clustered, and semantics are then assigned according to position from top to bottom. In the subsequent training process, the assigned semantic pseudo labels supervise the learning of the row classifier. The method can therefore adaptively divide the horizontal stripes of different semantic parts in a picture and obtain semantically consistent horizontal stripes. The training process of the row classifier is as follows:
step B41, extracting line characteristics of any image in the training sample image set and pooling the line characteristics to obtain corresponding average characteristics;
step B42, judging whether the current iteration number M is a multiple of N; if so, executing step B43, otherwise jumping to step B44, wherein M and N are natural numbers. The current iteration number M is also the current iteration number of the pedestrian re-identification model training;
step B43, extracting the average characteristics of all training sample images in the training sample set, acquiring the corresponding pseudo label of each line through self-similar clustering, updating, and executing step B44;
step B44, calculating the loss between the average features obtained in step B41 and the updated pseudo labels, and updating the parameters of the row classifier;
And step B45, circularly executing the steps B41-B44 until a trained line classifier is obtained.
In the invention, self-similarity clustering is performed by the k-means clustering algorithm.
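A minimal sketch of this self-similarity clustering step follows: plain k-means over the per-row mean features, then relabelling the clusters top-to-bottom so the pseudo labels follow vertical position as described above. The deterministic initialisation from evenly spaced rows is an implementation choice, not from the patent.

```python
import numpy as np

def kmeans_pseudo_labels(row_feats, k=5, iters=20):
    """Cluster per-row mean features with k-means and return one
    pseudo label per row, ordered from top to bottom."""
    # initialise centres from evenly spaced rows (deterministic)
    idx = np.linspace(0, len(row_feats) - 1, k).astype(int)
    centers = row_feats[idx].astype(float)
    for _ in range(iters):
        d = np.linalg.norm(row_feats[:, None] - centers[None], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            if (assign == j).any():
                centers[j] = row_feats[assign == j].mean(axis=0)
    # relabel clusters by their average row index, top to bottom
    mean_pos = [np.where(assign == j)[0].mean() if (assign == j).any() else np.inf
                for j in range(k)]
    remap = {old: new for new, old in enumerate(np.argsort(mean_pos))}
    return np.array([remap[a] for a in assign])
```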
After the row classifier has obtained the local features of each set pedestrian part, background pixels inside each part's horizontal stripe are further removed to reduce noise interference. The invention designs a foreground-guided part refinement method: a foreground-background classifier is added to predict whether each pixel of the training sample image belongs to the foreground or the background. Since the row classifier has already been learned, the invention preferably uses it to obtain each pixel's confidence for each set part of the human body; pixels with confidence greater than 0.8 are preferably taken as foreground pixels, pixels with confidence less than 0.2 as background pixels, and the rest as neutral pixels (i.e. neutral in fig. 3). The features constructed from the extracted foreground pixels serve as the foreground features of the pedestrian in the training sample image (i.e. the pixel-wise foreground/background feature extraction in fig. 3).
Step B50, feature splicing
In the present embodiment, M1-M5 are first compressed by global pooling into five 256-dimensional vectors, denoted S1-S5. Then M1-M5 are summed to obtain the foreground feature map, which is compressed by average pooling and mapping into a 256-dimensional vector denoted S6. The global features are likewise compressed by average pooling and mapping into a 256-dimensional vector denoted S7, which conveys the overall abstract features well. Finally, the three groups of feature vectors are spliced to obtain a 7×256-dimensional feature representing the fused pedestrian features.
The foreground features of the training sample image can be obtained directly through the row classifier as S6; S1-S5 are multiplied point-to-point with S6 and then spliced with S7 to represent the fused pedestrian features.
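The splicing of step B50 can be sketched as follows; `proj` is a hypothetical (C, 256) mapping standing in for the pooling/mapping that compresses each pooled vector to 256 dimensions, and is not a parameter named in the patent.

```python
import numpy as np

# Sketch of feature splicing: pool the five part maps M1-M5 into
# S1-S5, sum them into a foreground map pooled to S6, pool the global
# features to S7, and concatenate into a 7 x 256-dim descriptor.
def splice_features(part_maps, global_features, proj):
    s_parts = [m.mean(axis=(1, 2)) @ proj for m in part_maps]   # S1-S5
    foreground = sum(part_maps)                                 # sum of M1-M5
    s6 = foreground.mean(axis=(1, 2)) @ proj                    # foreground vector
    s7 = global_features.mean(axis=(1, 2)) @ proj               # global vector
    return np.concatenate(s_parts + [s6, s7])                   # 7 x 256 dims
```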
And step B60, calculating and sequencing the Euclidean distance between the spliced features and the corresponding features of the images in the image library, and outputting the sequencing result as a re-identification result.
In this embodiment, the Euclidean distances between the spliced pedestrian features and the features of the images in the image library are calculated and sorted in ascending order; the higher the Rank-1 (first-rank) matching rate at the top of the ranking, the better the performance on the re-identification task. The image library is a database storing a plurality of pedestrian images.
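The ascending-distance ranking and the Rank-1 metric described above can be sketched as follows; the function names are illustrative.

```python
import numpy as np

def rank_gallery(query_feat, gallery_feats):
    """Sort gallery images by ascending Euclidean distance to the
    query's spliced feature: the closest match comes first."""
    dists = np.linalg.norm(gallery_feats - query_feat, axis=1)
    return np.argsort(dists)

def rank1(orders, true_matches):
    """Rank-1: fraction of queries whose top-ranked gallery image
    has the correct identity."""
    hits = [order[0] in matches for order, matches in zip(orders, true_matches)]
    return float(np.mean(hits))
```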
Based on the re-identification results, the invention uses a triplet loss to supervise the training of the whole network. The core idea of this loss is to push unmatched pedestrian pairs apart from matched pedestrian pairs by a distance margin, increasing the inter-class difference and reducing the intra-class difference, as shown in equation (1):
Figure 953037DEST_PATH_IMAGE001
(1)
wherein the content of the first and second substances,
Figure 737322DEST_PATH_IMAGE002
a loss value representing a pedestrian re-recognition model,
Figure 696051DEST_PATH_IMAGE003
representing the number of training sample images of a batch of the pedestrian re-identification model during training,
Figure 757548DEST_PATH_IMAGE004
the number of the batches is represented by,
Figure 776319DEST_PATH_IMAGE005
any image in a batch of training sample images representing the pedestrian re-identification model during training,
Figure 47900DEST_PATH_IMAGE013
representing features of images in image set A
Figure 810320DEST_PATH_IMAGE005
The image of the training sample with the largest characteristic euclidean distance, i.e. the least likely positive sample,
Figure 726324DEST_PATH_IMAGE007
representing image features in image set B
Figure 915996DEST_PATH_IMAGE005
The image of the training sample with the minimum Euclidean distance of the features of (1), namely the most image negative sample,
Figure 550240DEST_PATH_IMAGE014
a three-element group is formed,
Figure 975405DEST_PATH_IMAGE008
indicating a preset distance interval between the first and second electrodes,
Figure 745915DEST_PATH_IMAGE009
representation comprises and
Figure 106489DEST_PATH_IMAGE005
a set of images of all images of the same ID,
Figure 493608DEST_PATH_IMAGE010
indicating the current batch except
Figure 332251DEST_PATH_IMAGE009
All images except the image contained in (1) construct an image set,
Figure 347481DEST_PATH_IMAGE011
representing the euclidean distance.
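As an illustrative sketch (not the patent's actual implementation), this style of loss, commonly called a batch-hard triplet loss, can be computed for a small batch as follows; the feature vectors and IDs are toy values, and the `margin` argument plays the role of the preset distance margin m:

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(features, ids, margin=0.3):
    """Batch-hard triplet loss: for each anchor, take the hardest
    positive (largest distance, same ID) and the hardest negative
    (smallest distance, different ID), then apply a hinge with `margin`."""
    total = 0.0
    for a, (fa, ida) in enumerate(zip(features, ids)):
        pos = [dist(fa, f) for i, (f, pid) in enumerate(zip(features, ids))
               if pid == ida and i != a]
        neg = [dist(fa, f) for f, pid in zip(features, ids) if pid != ida]
        if not pos or not neg:
            continue  # anchor has no valid triplet in this batch
        total += max(0.0, margin + max(pos) - min(neg))
    return total
```

For example, with features `[[0.0], [5.0], [6.0]]` and IDs `[1, 1, 2]`, only the second anchor produces a non-zero hinge term; a batch whose identities are already well separated yields zero loss.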
The network parameters of the pedestrian re-identification model are then updated based on this loss, and the process jumps back to step B20 until a trained pedestrian re-identification model is obtained.
2. Pedestrian re-identification method based on semantic consistency horizontal bar and foreground correction
Step S10, acquiring an image to be recognized as an input image;
In the present embodiment, an image of the pedestrian to be identified is first acquired as the input image.
Step S20, extracting the characteristics of the input image as first characteristics through a characteristic extraction layer of a pedestrian re-identification model;
In this embodiment, the global features of the pedestrian in the input image are obtained through the trained pedestrian re-identification model; that is, the features of the input image are extracted as the first features by the feature extraction layer of the pedestrian re-identification model.
Step S30, based on the first features, respectively acquiring foreground features corresponding to pedestrians in the input image as second features and acquiring features of horizontal bar areas of all set parts of the pedestrians in the input image as third features through a pre-trained row classifier in a pedestrian re-recognition model;
In this embodiment, the confidence of each pixel point in the input image with respect to the human foreground semantics is obtained through the row classifier; pixel points with confidence greater than a first set threshold are taken as foreground pixels, pixel points with confidence less than a second set threshold are taken as background pixels, and the feature constructed from the extracted foreground pixels is taken as the foreground feature corresponding to the pedestrian in the input image, i.e. the second feature.
Semantic segmentation is then performed on the input image through the row classifier to obtain a confidence map for the horizontal bar region of each set part of the pedestrian in the input image, and a point-to-point product is computed between each confidence map and the first features to obtain the features of the horizontal bar regions of each set part of the pedestrian in the input image as the third features.
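The thresholding and point-to-point product described above can be sketched as follows (thresholds, map sizes, and values are illustrative; the real model operates on confidence maps and multi-channel feature maps produced by the network):

```python
def split_foreground(confidences, fg_thresh=0.7, bg_thresh=0.3):
    """Split pixel indices into foreground/background by confidence.

    Pixels with confidence above `fg_thresh` (the first set threshold) are
    foreground; pixels below `bg_thresh` (the second set threshold) are
    background; the rest are left unassigned."""
    foreground, background = [], []
    for idx, c in enumerate(confidences):
        if c > fg_thresh:
            foreground.append(idx)
        elif c < bg_thresh:
            background.append(idx)
    return foreground, background

def mask_features(confidence_map, feature_map):
    """Point-to-point product of an H x W confidence map with an
    H x W x C feature map: every channel is weighted by the confidence."""
    return [[[c * f for f in feat] for c, feat in zip(crow, frow)]
            for crow, frow in zip(confidence_map, feature_map)]
```

Here a pixel with confidence 0.9 is kept as foreground, one at 0.1 as background, and masking with a zero-confidence cell suppresses that location's features entirely.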
Step S40, point-to-point multiplication is carried out on the second feature and the third feature, and the result is spliced with the first feature to obtain a fourth feature;
In the present embodiment, the product of the second and third features is spliced with the first feature to obtain the fused pedestrian feature as the fourth feature.
And step S50, calculating and sorting the Euclidean distance between the fourth feature and the feature corresponding to each image in the image library, and outputting the sorting result as a re-identification result.
In this embodiment, the Euclidean distance between the spliced fourth feature and the feature corresponding to each pedestrian image in the image library is calculated, the distances are sorted, and the sorted result is output as the re-identification result. In the present invention, ascending sorting is preferably adopted; the higher an image ranks, the higher its matching rate.
A pedestrian re-recognition system based on semantic consistency horizontal bars and foreground correction according to a second embodiment of the present invention, as shown in fig. 2, includes: the image processing system comprises an image acquisition module 100, a global feature extraction module 200, a local feature extraction module 300, a feature splicing module 400 and an identification output module 500;
the image acquisition module 100 is configured to acquire an image to be recognized as an input image;
the global feature extraction module 200 is configured to extract features of the input image as first features through a feature extraction layer of a pedestrian re-recognition model;
the local feature extraction module 300 is configured to, based on the first feature, respectively acquire, by using a pre-trained row classifier in a pedestrian re-recognition model, foreground features corresponding to pedestrians in the input image as second features, and acquire, as third features, features of horizontal bar regions of each set part of the pedestrians in the input image;
the feature splicing module 400 is configured to multiply the second feature and the third feature point to point, and splice the result with the first feature to obtain a fourth feature;
the recognition output module 500 is configured to calculate and sort the euclidean distances between the fourth features and the features corresponding to the images in the image library, and output a sorting result as a re-recognition result;
the pedestrian re-identification model is constructed on the basis of a deep convolutional neural network; the row classifier is built based on a fully connected layer and a softmax layer.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiment, and details are not described herein again.
It should be noted that, the pedestrian re-identification system based on semantic consistency horizontal bars and foreground correction provided in the foregoing embodiment is only illustrated by the division of the functional modules, and in practical applications, the functions may be allocated to different functional modules according to needs, that is, the modules or steps in the embodiments of the present invention are further decomposed or combined, for example, the modules in the foregoing embodiments may be combined into one module, or may be further split into multiple sub-modules, so as to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.
A storage device according to a third embodiment of the present invention stores a plurality of programs, the programs being adapted to be loaded and executed by a processor to implement the above-described pedestrian re-identification method based on semantic consistency horizontal bars and foreground correction.
A processing apparatus according to a fourth embodiment of the present invention includes a processor and a storage device, the processor being adapted to execute various programs and the storage device being adapted to store a plurality of programs, the programs being adapted to be loaded and executed by the processor to implement the pedestrian re-identification method based on semantic consistency horizontal bars and foreground correction described above.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method examples, and are not described herein again.
Those of skill in the art would appreciate that the various illustrative modules and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that programs corresponding to the software modules and method steps may be located in random access memory (RAM), read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The terms "first," "second," "third," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (9)

1. A pedestrian re-identification method based on semantic consistency horizontal bars and foreground correction, characterized by comprising the following steps:
step S10, acquiring an image to be recognized as an input image;
step S20, extracting the characteristics of the input image as first characteristics through a characteristic extraction layer of a pedestrian re-identification model;
step S30, based on the first features, respectively acquiring foreground features corresponding to pedestrians in the input image as second features and acquiring features of horizontal bar areas of all set parts of the pedestrians in the input image as third features through a pre-trained row classifier in a pedestrian re-recognition model;
step S40, point-to-point multiplication is carried out on the second feature and the third feature, and the result is spliced with the first feature to obtain a fourth feature;
step S50, calculating and sorting the Euclidean distance between the fourth feature and the feature corresponding to each image in the image library, and outputting the sorting result as a re-identification result;
the pedestrian re-identification model is constructed on the basis of a deep convolutional neural network; the row classifier is built based on a fully connected layer and a softmax layer.
2. The pedestrian re-identification method based on semantic consistency horizontal bars and foreground correction according to claim 1, wherein the training method of the row classifier is as follows:
step A10, obtaining a training sample image set;
step A20, extracting the row features of any image in the training sample image set and pooling them to obtain the corresponding average features;
step A30, judging whether the current iteration number M is a multiple of N; if so, executing step A40, otherwise jumping to step A50, wherein N and M are natural numbers;
step A40, extracting the average features of all training sample images in the training sample set, acquiring a pseudo label corresponding to each row through self-similarity clustering, and executing step A50;
step A50, calculating the loss between the average features obtained in step A20 and the pseudo labels, and updating the parameters of the row classifier.
3. The pedestrian re-identification method based on semantic consistency horizontal bars and foreground correction according to claim 2, wherein the self-similarity clustering adopts a k-means clustering method.
4. The pedestrian re-identification method based on semantic consistency horizontal bars and foreground correction according to claim 1, wherein "respectively acquiring foreground features corresponding to pedestrians in the input image as second features through a row classifier pre-trained in a pedestrian re-identification model" comprises:
obtaining, through the row classifier, the confidence of each pixel point in the input image with respect to each part of the human body;
taking the pixel points with the confidence degrees larger than a first set threshold value as foreground pixels, and taking the pixel points with the confidence degrees smaller than a second set threshold value as background pixels;
and constructing features based on the extracted foreground pixels to serve as foreground features corresponding to the pedestrians in the input image.
5. The pedestrian re-identification method based on semantic consistency horizontal bars and foreground correction according to claim 1, wherein "the features of the horizontal bar regions of each set part of the pedestrian in the input image are obtained as third features by a row classifier pre-trained in a pedestrian re-identification model" comprises:
performing semantic segmentation on the input image through the row classifier to obtain a confidence map of the horizontal bar region of each set part of a pedestrian in the input image;
and respectively performing a point-to-point product operation between each confidence map and the first features to obtain the features of the horizontal bar region of each set part of the pedestrian in the input image.
6. The pedestrian re-identification method based on semantic consistency horizontal bars and foreground correction according to claim 5, wherein the loss function of the pedestrian re-identification model during training is as follows:

$$L = \sum_{i=1}^{P} \sum_{a=1}^{K} \left[ m + \max_{p \in A} d\left(f_{a}, f_{p}\right) - \min_{n \in B} d\left(f_{a}, f_{n}\right) \right]_{+}$$

wherein $L$ represents the loss value of the pedestrian re-identification model, $K$ represents the number of training sample images per identity in a batch during training, $P$ represents the number of identities per batch, $a$ denotes any image in a batch of training sample images, $f_{p}$ denotes the feature of the training sample image in image set $A$ with the largest Euclidean distance to the feature $f_{a}$ of image $a$, $f_{n}$ denotes the feature of the training sample image in image set $B$ with the smallest Euclidean distance to $f_{a}$, $m$ denotes a preset distance margin, $A$ denotes the image set comprising all images with the same ID as $a$, $B$ denotes the image set constructed from all images in the current batch except those contained in $A$, and $d(\cdot, \cdot)$ denotes the Euclidean distance.
7. A pedestrian re-identification system based on semantic consistency horizontal bars and foreground correction, characterized in that the system comprises: an image acquisition module, a global feature extraction module, a local feature extraction module, a feature splicing module and an identification output module;
the image acquisition module is configured to acquire an image to be identified as an input image;
the global feature extraction module is configured to extract features of the input image as first features through a feature extraction layer of a pedestrian re-recognition model;
the local feature extraction module is configured to respectively acquire foreground features corresponding to pedestrians in the input image as second features and acquire features of horizontal bar areas of all set parts of the pedestrians in the input image as third features through a pre-trained row classifier in a pedestrian re-recognition model based on the first features;
the feature splicing module is configured to multiply the second feature and the third feature point to point, and splice the result with the first feature to obtain a fourth feature;
the recognition output module is configured to calculate and sort the Euclidean distance between the fourth feature and the feature corresponding to each image in the image library, and output a sorting result as a re-recognition result;
the pedestrian re-identification model is constructed on the basis of a deep convolutional neural network; the row classifier is built based on a fully connected layer and a softmax layer.
8. A storage device having stored thereon a plurality of programs, wherein the programs are adapted to be loaded and executed by a processor to implement the pedestrian re-identification method based on semantic consistency horizontal bars and foreground correction of any one of claims 1-6.
9. A processing device comprising a processor and a storage device, the processor being adapted to execute various programs and the storage device being adapted to store a plurality of programs, characterized in that the programs are adapted to be loaded and executed by the processor to implement the pedestrian re-identification method based on semantic consistency horizontal bars and foreground correction according to any one of claims 1 to 6.
CN202010918791.1A 2020-09-04 2020-09-04 Pedestrian re-identification method based on semantic consistency horizontal bar and foreground correction Active CN111783753B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010918791.1A CN111783753B (en) 2020-09-04 2020-09-04 Pedestrian re-identification method based on semantic consistency horizontal bar and foreground correction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010918791.1A CN111783753B (en) 2020-09-04 2020-09-04 Pedestrian re-identification method based on semantic consistency horizontal bar and foreground correction

Publications (2)

Publication Number Publication Date
CN111783753A true CN111783753A (en) 2020-10-16
CN111783753B CN111783753B (en) 2020-12-15

Family

ID=72762343

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010918791.1A Active CN111783753B (en) 2020-09-04 2020-09-04 Pedestrian re-identification method based on semantic consistency horizontal bar and foreground correction

Country Status (1)

Country Link
CN (1) CN111783753B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112801238A (en) * 2021-04-15 2021-05-14 中国科学院自动化研究所 Image classification method and device, electronic equipment and storage medium
CN112906614A (en) * 2021-03-08 2021-06-04 中南大学 Pedestrian re-identification method and device based on attention guidance and storage medium
CN113077556A (en) * 2021-03-29 2021-07-06 深圳大学 Ticket checking system and method based on pedestrian re-identification
CN113158901A (en) * 2021-04-22 2021-07-23 天津大学 Domain-adaptive pedestrian re-identification method
CN114842512A (en) * 2022-07-01 2022-08-02 山东省人工智能研究院 Shielded pedestrian re-identification and retrieval method based on multi-feature cooperation and semantic perception

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108108674A (en) * 2017-12-08 2018-06-01 浙江捷尚视觉科技股份有限公司 A kind of recognition methods again of the pedestrian based on joint point analysis
CN110321813A (en) * 2019-06-18 2019-10-11 南京信息工程大学 Cross-domain pedestrian recognition methods again based on pedestrian's segmentation
CN111046732A (en) * 2019-11-11 2020-04-21 华中师范大学 Pedestrian re-identification method based on multi-granularity semantic analysis and storage medium
US20200242404A1 (en) * 2019-01-24 2020-07-30 Casio Computer Co., Ltd. Image searching apparatus, classifier training method, and recording medium


Also Published As

Publication number Publication date
CN111783753B (en) 2020-12-15

Similar Documents

Publication Publication Date Title
CN111783753B (en) Pedestrian re-identification method based on semantic consistency horizontal bar and foreground correction
Rocco et al. Efficient neighbourhood consensus networks via submanifold sparse convolutions
CN108399386B (en) Method and device for extracting information in pie chart
Tong et al. Salient object detection via bootstrap learning
Xie et al. Multilevel cloud detection in remote sensing images based on deep learning
CN111795704B (en) Method and device for constructing visual point cloud map
Krajník et al. Image features for visual teach-and-repeat navigation in changing environments
Lazebnik et al. Affine-invariant local descriptors and neighborhood statistics for texture recognition
Costea et al. Creating roadmaps in aerial images with generative adversarial networks and smoothing-based optimization
CN109522908A (en) Image significance detection method based on area label fusion
Carmona et al. Human action recognition by means of subtensor projections and dense trajectories
Xia et al. Loop closure detection for visual SLAM using PCANet features
CN111931703B (en) Object detection method based on human-object interaction weak supervision label
Bojanić et al. On the comparison of classic and deep keypoint detector and descriptor methods
CN111680678B (en) Target area identification method, device, equipment and readable storage medium
CN111753119A (en) Image searching method and device, electronic equipment and storage medium
Haindl et al. A competition in unsupervised color image segmentation
CN108932455A (en) Remote sensing images scene recognition method and device
Bappy et al. Real estate image classification
CN113723492A (en) Hyperspectral image semi-supervised classification method and device for improving active deep learning
CN113111716A (en) Remote sensing image semi-automatic labeling method and device based on deep learning
CN112052819A (en) Pedestrian re-identification method, device, equipment and storage medium
CN114168768A (en) Image retrieval method and related equipment
CN113255394A (en) Pedestrian re-identification method and system based on unsupervised learning
Elfiky et al. Compact and adaptive spatial pyramids for scene recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant