CN113378729A - Pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method - Google Patents
- Publication number
- CN113378729A (application CN202110667913.9A)
- Authority
- CN
- China
- Prior art keywords
- image
- pedestrian
- pose
- embedding
- fusion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method, which comprises the following steps: preprocessing an original pedestrian image by random erasing to obtain a pedestrian image, optimizing the baseline network of a Resnet-50 network model, and extracting deep convolution features; extracting a salient human body image from the original pedestrian image; extracting the pose from the salient human body image, then extracting local semantic features from the resulting body part image; performing weighted fusion of the deep convolution features and the local semantic features, and performing distance measurement on the weighted fusion features to generate an initial measurement list; and reordering the images in the initial measurement list with a re-ranking algorithm to obtain the correct image match ranking, and outputting matching pedestrian images to identify a specific pedestrian. The accuracy of identification and localization can be greatly improved.
Description
Technical Field
The invention belongs to the technical field of image processing methods, and relates to a pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method.
Background
In recent years, artificial intelligence has become a focal point of scientific and technological development, and its use in the intelligent monitoring domain has become correspondingly important. As cities expand, monitoring systems spread further; a single city may have thousands of cameras covering its streets from end to end. With cameras multiplying, relying on human operators alone is extremely expensive, and no operator can watch so many feeds at once. Pedestrian re-identification technology has therefore attracted the attention of researchers: it helps people monitor, track, and identify pedestrians. Human beings receive and perceive external information mainly through vision, and human vision can extract the required information directly from cluttered images; researchers therefore want cameras that, mimicking the human visual system, capture objects in an environment effectively and quickly. This line of work ultimately led to today's pedestrian re-identification technology. The technology is widely used. Intelligent monitoring systems, for example, rely on it: exploiting the computer's powerful data-processing capability, a video monitoring system can automatically filter out useless information and actively identify human bodies, enabling comprehensive monitoring and a 24-hour monitoring system with early warning and after-the-fact evidence collection. Pedestrian traffic statistics use the same technique: useless information is filtered out automatically, pedestrians are identified and counted automatically, and pedestrians who appear repeatedly in different areas are not double-counted, so pedestrian flow can be counted effectively and accurately.
Pedestrian re-identification accuracy is strongly affected by one key factor: pedestrian misalignment. The mutual occlusion of body parts and the continual pose changes that misalignment brings are a major challenge for pedestrian re-identification research. First, a pedestrian's posture changes constantly during movement, which means that local changes of the body within the bounding box are unpredictable. For example, a pedestrian may place a hand behind the back or on top of the head while moving, causing local occlusion due to misalignment, which strongly affects the extracted features. Second, detection when pedestrians are irregularly arranged affects the accuracy of pedestrian re-identification. One method commonly used in pedestrian re-identification is to divide the bounding box into horizontal stripes; however, this method only holds up under slight vertical deviation. Under severe vertical misalignment, stripes meant to cover the body and head may instead be matched against background, causing erroneous matches in the re-identification task, so the horizontal striping approach is not ideal in cases of severe misalignment. Moreover, when a pedestrian's posture changes, the background changes too, so the background may be wrongly weighted by the convolutional neural network and degrade recognition accuracy. How to overcome the misalignment and background changes caused by pedestrian pose variation is therefore the key to improving pedestrian re-identification accuracy.
Disclosure of Invention
The invention aims to provide a pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method, which solves the prior-art problem of low pedestrian re-identification precision caused by the misalignment and background changes that result from pedestrian pose variation.
The invention adopts the technical scheme that a pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method comprises the following steps:
step 1, preprocessing an original pedestrian image by random erasing to obtain a pedestrian image, performing baseline network optimization on a Resnet-50 network model, and inputting the pedestrian image into the optimized Resnet-50 network model to obtain deep convolution features;
step 2, performing feature extraction with the original pedestrian image as the input image to obtain a salient human body image;
step 3, first adopting a pose convolver (convolutional pose machine) to extract the pose from the salient human body image to obtain a body part image, then inputting the body part image into a ResNet-50 network to extract local semantic features;
step 4, performing weighted fusion of the deep convolution features and the local semantic features to obtain weighted fusion features, measuring the distances between the fusion features of the images in the image test library and the image query library, and generating an initial measurement list from the distance measurement results;
and step 5, reordering the images in the initial measurement list with a re-ranking algorithm to obtain the correct image match ranking, and outputting matching pedestrian images to identify a specific pedestrian.
The invention is also characterized in that:
the specific mode for carrying out the baseline network optimization on the Resnet-50 network model is as follows:
and optimizing a loss function of the Resnet-50 network model by combining Softmax loss and triple loss, wherein the optimized loss function is as follows:
in the above formula, m is the number of loss functions;
in the above formula, the first and second carbon atoms are,is the feature vector of the anchor point sample,is the feature vector of the positive sample,is a feature vector of negative samples, alpha isA distance betweenThe distance between them is the smallest distance, + represents [, ]]When the value of the internal is more than zero, the value is a loss value, and when the value is less than zero, the loss is zero.
The step 2 specifically comprises the following steps:
step 2.1, removing the last pooling stage of the VGG-16 network structure to obtain the network structure, inputting the original pedestrian image into the network structure as the input image, and outputting a feature map;
step 2.2, deconvolving the feature map to the size of the input image and adding a new convolution layer to generate a predicted saliency map;
step 2.3, first applying a convolution layer with kernel size 1 × 1 to the conv1-2 layer of the network structure to generate a boundary prediction, then adding the boundary prediction to the predicted saliency map to obtain a refined boundary, and then applying one more convolution layer to convolve the refined result to obtain the salient human body image.
The step 3 specifically comprises the following steps:
step 3.1, taking the salient human body image as the input of a pose estimator and locating 14 joint points;
step 3.2, grouping the 14 located human body joints into 6 sub-regions, cropping, rotating, and resizing the 6 sub-regions to a fixed size and orientation, and combining them into a stitched body part image;
step 3.3, performing a pose transformation on the size of each body part in the stitched body part image to obtain the body part image;
step 3.4, inputting the body part image into a ResNet-50 network for training and extracting local semantic features.
The specific process of the step 5 is as follows:
for a pedestrian test image p and an image set G = {g_i}, the k-reciprocal nearest neighbors are encoded by weighting into a single vector to form the k-reciprocal feature; the Jaccard distances between the pedestrian test image p and the image set are then computed from the images' k-reciprocal features; finally, the original distances between the pedestrian test image p and the image set are weighted together with the Jaccard distances to obtain the distance formula. The distances between the images in the initial measurement list and the fusion features are computed with this distance formula, the list is reordered to obtain the correct image match ranking, and matching pedestrian images are output to identify the specific pedestrian.
The invention has the following beneficial effects:
The pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method of the invention fuses deep global features with local semantic features, measures the distances between different images using the fused weighted features, and identifies and retrieves images of the same pedestrian; applied to an original image database, it retrieves the images of a specific pedestrian, making it well suited to a pose embedding-based multi-scale convolution feature fusion pedestrian re-identification system. The performance of the baseline network is improved by random erasing and the triplet loss function, and the local features obtained by pose estimation are weighted and aggregated with the global features obtained by the baseline network, achieving global optimization; this aids target identification and localization, speeds up the algorithm, and improves the stability of the system. The method can greatly improve the accuracy of identification and localization, and can be used for target identification and retrieval of pedestrian images as well as in other fields.
Drawings
FIG. 1 is a flow chart of a pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method;
FIG. 2 is a diagram of the effect of random erasure processing of a pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method of the invention;
FIG. 3 is a triple loss schematic diagram of a pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method of the invention;
FIG. 4 is a pose embedding effect diagram of a pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
A pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method is shown in figure 1 and comprises the following steps:
step 1, establishing an image database; in this embodiment the image database consists of pedestrian images collected manually and corrected by computer, 72000 images in total. The original pedestrian image is preprocessed by random erasing to obtain the pedestrian image, baseline network optimization is performed on the Resnet-50 network model, and the pedestrian image is input into the optimized Resnet-50 network model to obtain deep convolution features;
step 1.1, randomly erasing an original pedestrian image by adopting a random erasing enhancement processing method to obtain a pedestrian image;
specifically, Random erase enhancement (REA) is an effective data enhancement method. The method aims to shield different training images, randomly generate a rectangular area in the images, randomly generate the position and the size of the rectangular area, shield partial pedestrian images, and set the pixel value of the image shielding area as a random value. By the method, the occurrence of over-fitting can be reduced, and the convergence capability of the network model is improved, so that the performance of the deep learning model is improved.
In the network model training, for an original training data set, assuming that the probability of random erasure of the original data set is P, the probability of non-erasure is 1-P. In the random erasing process, a rectangular area is generated with a set probability P to shield the image, and the position and the size of the shielded area which are randomly erased and shielded in the process are random.
Assume the size of the image to be randomly erased, i.e. the original pedestrian image, is:
S = W × H (1);
in the above formula, W is the width of the pedestrian image and H is the height of the pedestrian image.
Assume the area of the rectangular region to be randomly erased is S_e, with the ratio S_e/S constrained between a minimum value S_l and a maximum value S_h, and let the aspect ratio of the erased region be r_e. The height H_e and width W_e of the erased rectangle are then:
H_e = √(S_e × r_e) (2);
W_e = √(S_e / r_e) (3);
in the above formulas, S_e is the area of the erased rectangle, r_e is the aspect ratio of the erased rectangle, H_e is the height of the erased rectangle, and W_e is the width of the erased rectangle.
A point P = (x_e, y_e) is randomly selected on the original pedestrian image. If it satisfies formulas (4) and (5):
x_e + W_e ≤ W (4);
y_e + H_e ≤ H (5);
then the rectangular region of the original pedestrian image to be erased is (x_e, y_e, x_e + W_e, y_e + H_e). The region to be erased is chosen by random erasing, and each pixel in the rectangular region is assigned a random value in [0, 255] to replace the original rectangular region. If the randomly selected point P = (x_e, y_e) does not satisfy the conditions of formulas (4) and (5), the above process is repeated and a new point P = (x_e, y_e) is selected on the image until a suitable random point is found. Finally, the randomly erased original pedestrian image (i.e. the pedestrian image) is output, as shown in fig. 2.
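To make the procedure concrete, the following is a minimal Python sketch of this random erasing step following formulas (1)–(5); the default probability and the area/aspect-ratio ranges are illustrative assumptions, not values fixed by the patent:

```python
import math
import random
import numpy as np

def random_erase(img, p=0.5, s_l=0.02, s_h=0.4, r1=0.3, r2=3.33):
    """Randomly erase a rectangle in `img` (H x W x C uint8 array).

    With probability 1 - p the image is returned unchanged. Otherwise a
    rectangle of area S_e in [s_l*S, s_h*S] and aspect ratio r_e in
    [r1, r2] is filled with random values in [0, 255].
    """
    if random.random() > p:
        return img
    h, w = img.shape[:2]
    area = h * w                                # S = W x H, formula (1)
    for _ in range(100):                        # retry until the box fits
        s_e = random.uniform(s_l, s_h) * area   # erased area S_e
        r_e = random.uniform(r1, r2)            # aspect ratio r_e
        h_e = int(round(math.sqrt(s_e * r_e)))  # H_e, formula (2)
        w_e = int(round(math.sqrt(s_e / r_e)))  # W_e, formula (3)
        x_e = random.randint(0, w)              # candidate point P = (x_e, y_e)
        y_e = random.randint(0, h)
        if x_e + w_e <= w and y_e + h_e <= h:   # constraints (4) and (5)
            img = img.copy()
            img[y_e:y_e + h_e, x_e:x_e + w_e] = np.random.randint(
                0, 256, (h_e, w_e) + img.shape[2:], dtype=img.dtype)
            return img
    return img                                  # no valid box found; unchanged
```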
Step 1.2, optimizing the loss function of the Resnet-50 network model by combining the Softmax loss and the triplet loss;
Specifically, in the field of pedestrian re-identification the triplet loss (Triplet loss) is widely applied, often together with the Softmax loss in one network model. As shown in fig. 3, when the triplet loss function is used, three images are taken as the input of the network: (x^a, x^p, x^n), where x^a is the anchor sample (Anchor), randomly selected from the data set used to train the network model; x^p is a training sample whose pedestrian identity belongs to the same class as the anchor sample, i.e. the positive sample; and x^n is a training sample whose pedestrian identity is not of the same class as the anchor sample, i.e. the negative sample. These training samples are input into identical network structures for feature extraction, as shown in fig. 3; after learning through the triplet loss, the distance between the anchor sample and the positive sample becomes the smallest and the distance between the anchor sample and the negative sample becomes the largest. The final formula for calculating the triplet loss is:
L_Triplet = [ ‖f(x^a) − f(x^p)‖² − ‖f(x^a) − f(x^n)‖² + α ]_+ (6);
in the above formula, f(x^a) is the feature vector of the anchor sample, f(x^p) is the feature vector of the positive sample, f(x^n) is the feature vector of the negative sample, α is the margin by which the distance between f(x^a) and f(x^n) must exceed the distance between f(x^a) and f(x^p), and [·]_+ means that when the bracketed value is greater than zero it is taken as the loss value, and when it is less than zero the loss is zero.
As can be seen from the objective function: when the distance between f(x^a) and f(x^p) plus α is greater than the distance between f(x^a) and f(x^n), the bracketed value is greater than zero and there is a loss value; when the distance between f(x^a) and f(x^n) is greater than or equal to the distance between f(x^a) and f(x^p) plus α, the loss value is zero.
Through the triplet loss function the network model shortens the distance between pedestrian images with the same label and enlarges the distance between pedestrian images with different labels, so that the trained network model is more discriminative.
The loss function of the Resnet-50 network model is optimized by combining the Softmax loss and the triplet loss; the optimized loss function is:
L = Σ_{j=1}^{m} L_j (7);
in the above formula, m is the number of loss functions combined (here the Softmax loss and the triplet loss).
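As an illustration of the combined objective, here is a minimal PyTorch sketch of a Softmax (cross-entropy) loss summed with a triplet loss of margin α, as in formulas (6) and (7); the batch-hard mining strategy and the margin value are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def combined_loss(features, logits, labels, margin=0.3):
    """Softmax (cross-entropy) loss plus triplet loss with margin `margin`.

    features: (B, D) embedding vectors f(x); logits: (B, C) class scores;
    labels: (B,) pedestrian identity labels. Batch-hard mining: for each
    anchor, the farthest positive and the closest negative are used.
    """
    softmax_loss = F.cross_entropy(logits, labels)

    dist = torch.cdist(features, features)            # pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    pos_mask = same & ~eye                            # positives, excluding self
    neg_mask = ~same

    d_ap = (dist * pos_mask).max(dim=1).values        # hardest positive
    d_an = dist.masked_fill(~neg_mask, float('inf')).min(dim=1).values
    triplet_loss = F.relu(d_ap - d_an + margin).mean()  # the [.]_+ hinge of (6)

    return softmax_loss + triplet_loss                # sum of the m losses, (7)
```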
and step 1.3, inputting the pedestrian image into the optimized Resnet-50 network model to obtain the deep convolution characteristic.
Step 2, taking the original pedestrian image as the input image for feature extraction and separating the foreground from the background to obtain a salient human body image;
step 2.1, removing the last pooling stage of the VGG-16 network structure to obtain the network structure, inputting the original pedestrian image into the network structure as the input image, and outputting a feature map;
specifically, the VGG-16 model has ideal effects in the aspects of image classification and generalization special effects, so the significance model also uses the VGG-16 to construct a network structure. Given an input image of size WXH, the output map has a size [ W/2 ]5,H/25]So a network structure built based on VGG-16 reduces the output by a factor of 32 of feature mapping. In this embodiment, the last pooling stage of VGG-16 is eliminated, so that the size of the input image can be enlarged, and the semantic context and image details can be balanced. Therefore, the feature map output by the network structure of the present invention is reduced by 16 times compared to the input image.
Step 2.2, the integrated feature map already contains various saliency cues, so it can be used to predict the saliency map. Specifically, the feature map is deconvolved to the size of the input image and a new convolution layer is added to generate the predicted saliency map;
and 2.3, adding boundary refinement by introducing short connection into the prediction result, further performing boundary refinement to separate the foreground from the background, and expecting that the bottom layer features are helpful for predicting the boundary of the object. Furthermore, these features also have the same spatial resolution for the input image. Specifically, a convolutional layer with the core size of 1 × 1 in the network structure is applied to a conv1-2 layer to generate boundary prediction, the boundary prediction is added to a prediction significance map to obtain a refined boundary frame, and then the refined boundary frame is convolved by applying one convolutional layer to obtain a significant human body image.
And step 3, first a pose convolver is adopted to extract the pose from the salient human body image to obtain the body part image, and the body part image is then input into a ResNet-50 network to extract local semantic features. Specifically, pose extraction is performed with a ready-made pose convolver (convolutional pose machine) model, a sequential convolution structure that can detect 14 body joints, i.e. the head, neck, left and right shoulders, left and right elbows, left and right wrists, left and right hips, left and right knees, and left and right ankles, as shown in fig. 4.
Step 3.1, taking the salient human body image as the input of the pose estimator and locating 14 joint points; the 14 joints are the head, neck, right shoulder, right elbow, right wrist, left shoulder, left elbow, left wrist, left hip, left knee, left ankle, right hip, right knee, and right ankle;
step 3.2, grouping the 14 located human body joints into 6 sub-regions (head, upper body, left arm, right arm, left leg, and right leg) as human body parts, cropping, rotating, and resizing the 6 sub-regions to a fixed size and orientation, and combining them into a stitched body part image; because the 6 parts of a human body differ in size, black regions inevitably appear in the stitched image;
step 3.3, performing a pose transformation on the size of each body part in the stitched body part image to obtain the body part image;
since black regions appear in the stitched body part image, a pose transformation must be applied to the size of each body part to remove them; the size of each body part is determined mainly by observation. For example, in this embodiment the arm width is observed to be about 20 pixels and the leg width about 30 pixels; decreasing these parameter values causes information loss, while increasing them may introduce more background noise. As long as the parameter variation is small, however, system performance remains stable: when a part's size varies within a small range, the discriminative information it contains does not change much, so the network can still learn a discriminative embedding given the supervision signal.
And step 3.4, dividing the body part images into a test set and a training set, inputting them into a ResNet-50 network for training, and extracting local semantic features. The ResNet-50 network in this step does not share weights with the optimized ResNet-50 network of step 1; instead, a separate set of weights is trained to discriminate the local semantic images and extract the local semantic features.
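For illustration, a simplified sketch of assembling the 6 part crops from the 14 estimated joints follows; the joint grouping matches the text, while the crop padding and the fixed output size are illustrative assumptions:

```python
import numpy as np
import cv2  # OpenCV, assumed available for cropping and resizing

# 14 joints in the order listed in step 3.1
JOINTS = ["head", "neck", "r_shoulder", "r_elbow", "r_wrist",
          "l_shoulder", "l_elbow", "l_wrist", "l_hip", "l_knee",
          "l_ankle", "r_hip", "r_knee", "r_ankle"]

# 6 sub-regions of step 3.2 as groups of joint indices
PARTS = {
    "head":       [0, 1],
    "upper_body": [1, 2, 5, 8, 11],
    "l_arm":      [5, 6, 7],
    "r_arm":      [2, 3, 4],
    "l_leg":      [8, 9, 10],
    "r_leg":      [11, 12, 13],
}

def crop_part(img, joints_xy, idxs, out_size=(64, 64), pad=10):
    """Crop the padded bounding box of the listed joints and resize it to
    a fixed size and orientation, as in steps 3.2-3.3."""
    pts = joints_xy[idxs]                       # (n, 2) array of (x, y)
    x0, y0 = np.maximum(pts.min(axis=0).astype(int) - pad, 0)
    x1 = min(int(pts[:, 0].max()) + pad, img.shape[1])
    y1 = min(int(pts[:, 1].max()) + pad, img.shape[0])
    return cv2.resize(img[y0:y1, x0:x1], out_size)

def stitch_parts(img, joints_xy):
    """Return the 6 normalized part crops, ready to be tiled into the
    stitched body part image fed to the ResNet-50 of step 3.4."""
    return {name: crop_part(img, joints_xy, idxs)
            for name, idxs in PARTS.items()}
```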
Step 4, performing weighted fusion of the deep convolution features and the local semantic features to obtain the weighted fusion features, measuring the distances between the fusion features of the images in the image test library and the image query library, generating an initial measurement list ranking from the distance measurement results, and returning a query score. The feature weighted aggregation takes the form:
d = α f_DEEP + (1 − α) f_SOD (8);
in the above formula, the parameter α, with 0 ≤ α ≤ 1, assigns different weights to the deep global feature and the local semantic feature.
And 5, reordering the images in the initial measurement list according to a reordering algorithm to obtain a correct image matching ranking, and outputting a pedestrian matching image to identify a specific pedestrian.
Specifically, for a pedestrian test image p and an image set G = {g_i | i = 1, 2, ..., N}, the k-reciprocal nearest neighbors are encoded by weighting into a single vector to form the k-reciprocal feature; the Jaccard distances between the pedestrian test image p and the image set are then computed from the images' k-reciprocal features; finally, the original distances between the pedestrian test image p and the image set are weighted together with the Jaccard distances to obtain the final distance. The distances between the images in the initial measurement list and the fusion features are computed, the list is sorted to obtain the correct image match ranking, and matching pedestrian images are output to identify the specific pedestrian.
Step 5.1, first a pedestrian image p is given for testing, and an image set G = {g_i | i = 1, 2, ..., N} is given as the pedestrian image gallery; the original distance between the pedestrian image p and a gallery image g_i is measured by the Mahalanobis distance:
d(p, g_i) = (x_p − x_{g_i})ᵀ M (x_p − x_{g_i}) (9);
in the above formula, x_p is the appearance feature of the test image p, x_{g_i} is the appearance feature of the gallery image g_i, and M is a positive semi-definite matrix.
From the original distances between the test image p and the gallery images g_i, the initial ranked list is obtained:
L(p, G) = {g⁰_1, g⁰_2, ..., g⁰_N}, where d(p, g⁰_i) < d(p, g⁰_{i+1}) (10).
and 5.2, the purpose of the reordering strategy is to reorder the L (p, G) initial list ranking, so that more correctly matched image samples are arranged at the first position of the list, and the identification precision of pedestrian re-identification is improved.
The top k ranked samples in the initial ranking list, i.e., k neighbors (k-nearest neighbors, k-nn):
the k-reciprocal nearest neighbors (k-reciprocal nearest neighbors, k-rnn) are expressed as:
R(p,k)=gi|(gi∈N(p,k))∧p∈N(gi,k) (12);
however, due to a series of influencing factors such as brightness variation, posture variation, view angle variation and occlusion, the correctly matched samples may be excluded from the nearest neighbors. To solve this problem, each candidate nearest neighbor set is converted into a more robust set:
for each test image sample in the original set R (p, k), find their k-reciprocal nearest neighbor setWhen the number of the overlapped samples reaches a certain condition, the overlapped samples and the R (p, k) are merged, and more positive samples can be added into the R (p, k) set after expansion;
step 5.3, according to the original distance between the retrieval image and the near neighbor, the weight is redistributed, and the k-inverted nearest neighbor set of the sample image is encoded into an N-dimensional vector through a Gaussian kernel, which is defined as Expressed as:
based on neighbors being assigned greater weights and distant neighbors being assigned lesser weights, the candidates for intersection and union needed to compute the Jacobian distances may be computed as:
the intersection sets take the minimum value in the corresponding dimensionality of the two feature vectors as the degree that the two feature vectors contain gi together through minimum operation, and the maximum operation of the union set is to count the total set of matching candidates in the two sets;
step 5.4, the final Jacobian distance is expressed as:
and correcting the initial sorted list by combining the original distance and the Jacobi distance, wherein the final distance is defined as:
d*(p,gi)=(1-λ)dJ(p,gi)+λd(p,gi) (18);
in the above formula, λ is a weighting parameter λ representing the weight of two distances, and when λ is 0, only the jacobian distance is considered, and when λ is 1, only the original distance is considered, where λ is set to 0.3;
and 5.5, calculating the distance between the image and the fusion feature in the initial measurement list by using a formula (18), sequencing to obtain a correct image matching ranking, outputting a pedestrian matching image to identify a specific pedestrian, and finishing identification.
In this way, the pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method mainly aims at retrieving the corresponding pedestrian pictures from a large pedestrian image database and, given a pair of images, finding the pictures of the same pedestrian in the database. The influence of complex backgrounds is filtered out by separating foreground from background; the local features of pedestrians are extracted using a human key-point estimation method; and the robustness of the network model is strengthened by preprocessing the baseline network's input images with random erasing, so that more robust global features are extracted. Finally, the features of different scales undergo deep weighted fusion, and the similarity measurement between features is improved by the re-ranking method.
Claims (5)
1. A pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method, characterized by comprising the following steps:
step 1, preprocessing an original pedestrian image by random erasing to obtain a pedestrian image, performing baseline network optimization on a Resnet-50 network model, and inputting the pedestrian image into the optimized Resnet-50 network model to obtain deep convolution features;
step 2, performing feature extraction with the original pedestrian image as the input image to obtain a salient human body image;
step 3, first adopting a pose convolver to extract the pose from the salient human body image to obtain a body part image, then inputting the body part image into a ResNet-50 network to extract local semantic features;
step 4, performing weighted fusion of the deep convolution features and the local semantic features to obtain weighted fusion features, measuring the distances between the fusion features of the images in the image test library and the image query library, and generating an initial measurement list from the distance measurement results;
and step 5, reordering the images in the initial measurement list with a re-ranking algorithm to obtain the correct image match ranking, and outputting matching pedestrian images to identify a specific pedestrian.
2. The pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method according to claim 1, characterized in that the specific way of performing baseline network optimization on the Resnet-50 network model is:
the loss function of the Resnet-50 network model is optimized by combining the Softmax loss and the triplet loss; the optimized loss function is L = Σ_{j=1}^{m} L_j, where m is the number of loss functions combined, with the triplet term:
L_Triplet = [ ‖f(x^a) − f(x^p)‖² − ‖f(x^a) − f(x^n)‖² + α ]_+ ;
in the above formulas, f(x^a) is the feature vector of the anchor sample, f(x^p) is the feature vector of the positive sample, f(x^n) is the feature vector of the negative sample, α is the margin by which the distance between f(x^a) and f(x^n) must exceed the distance between f(x^a) and f(x^p), and [·]_+ means that when the bracketed value is greater than zero it is taken as the loss value, and when it is less than zero the loss is zero.
3. The pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method according to claim 1, wherein the step 2 specifically comprises the following steps:
step 2.1, removing the last pooling stage of the VGG-16 network structure to obtain the network structure, inputting the original pedestrian image into the network structure as the input image, and outputting a feature map;
step 2.2, deconvolving the feature map to the size of the input image and adding a new convolution layer to generate a predicted saliency map;
step 2.3, first applying a convolution layer with kernel size 1 × 1 to the conv1-2 layer of the network structure to generate a boundary prediction, then adding the boundary prediction to the predicted saliency map to obtain a refined boundary, and then applying one more convolution layer to convolve the refined result to obtain the salient human body image.
4. The pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method according to claim 1, wherein the step 3 specifically comprises the following steps:
step 3.1, taking the salient human body image as the input of a pose estimator and locating 14 joint points;
step 3.2, grouping the 14 located human body joints into 6 sub-regions, cropping, rotating, and resizing the 6 sub-regions to a fixed size and orientation, and combining them into a stitched body part image;
step 3.3, performing a pose transformation on the size of each body part in the stitched body part image to obtain the body part image;
step 3.4, inputting the body part image into a ResNet-50 network for training and extracting local semantic features.
5. The pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method according to claim 1, characterized in that the specific process of the step 5 is:
for a pedestrian test image p and an image set G = {g_i}, the k-reciprocal nearest neighbors are encoded by weighting into a single vector to form the k-reciprocal feature; the Jaccard distances between the pedestrian test image p and the image set are then computed from the images' k-reciprocal features; finally, the original distances between the pedestrian test image p and the image set are weighted together with the Jaccard distances to obtain the distance formula; the distances between the images in the initial measurement list and the fusion features are computed with this distance formula, the list is reordered to obtain the correct image match ranking, and matching pedestrian images are output to identify the specific pedestrian.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110667913.9A CN113378729A (en) | 2021-06-16 | 2021-06-16 | Pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113378729A true CN113378729A (en) | 2021-09-10 |
Family
ID=77572789
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110667913.9A Pending CN113378729A (en) | 2021-06-16 | 2021-06-16 | Pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113378729A (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105787439A (en) * | 2016-02-04 | 2016-07-20 | 广州新节奏智能科技有限公司 | Depth image human body joint positioning method based on convolution nerve network |
CN109359684A (en) * | 2018-10-17 | 2019-02-19 | 苏州大学 | Fine granularity model recognizing method based on Weakly supervised positioning and subclass similarity measurement |
CN111401113A (en) * | 2019-01-02 | 2020-07-10 | 南京大学 | Pedestrian re-identification method based on human body posture estimation |
CN109740541A (en) * | 2019-01-04 | 2019-05-10 | 重庆大学 | A kind of pedestrian weight identifying system and method |
CN110163110A (en) * | 2019-04-23 | 2019-08-23 | 中电科大数据研究院有限公司 | A kind of pedestrian's recognition methods again merged based on transfer learning and depth characteristic |
CN110717411A (en) * | 2019-09-23 | 2020-01-21 | 湖北工业大学 | Pedestrian re-identification method based on deep layer feature fusion |
CN111709311A (en) * | 2020-05-27 | 2020-09-25 | 西安理工大学 | Pedestrian re-identification method based on multi-scale convolution feature fusion |
CN111860147A (en) * | 2020-06-11 | 2020-10-30 | 北京市威富安防科技有限公司 | Pedestrian re-identification model optimization processing method and device and computer equipment |
CN111783736A (en) * | 2020-07-23 | 2020-10-16 | 上海高重信息科技有限公司 | Pedestrian re-identification method, device and system based on human body semantic alignment |
Non-Patent Citations (1)
Title |
---|
ZHENG Ye; ZHAO Jieyu; WANG Chong; ZHANG Yi: "基于姿态引导对齐网络的局部行人再识别" (Partial pedestrian re-identification based on a pose-guided alignment network), 计算机工程 (Computer Engineering), no. 05, 15 May 2020 (2020-05-15) *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108960140B (en) | Pedestrian re-identification method based on multi-region feature extraction and fusion | |
Wan et al. | DA-RoadNet: A dual-attention network for road extraction from high resolution satellite imagery | |
Yin et al. | Hot region selection based on selective search and modified fuzzy C-means in remote sensing images | |
WO2019232894A1 (en) | Complex scene-based human body key point detection system and method | |
CN109949340A (en) | Target scale adaptive tracking method based on OpenCV | |
CN111046856B (en) | Parallel pose tracking and map creating method based on dynamic and static feature extraction | |
CN112101150A (en) | Multi-feature fusion pedestrian re-identification method based on orientation constraint | |
CN111046732B (en) | Pedestrian re-recognition method based on multi-granularity semantic analysis and storage medium | |
Cao et al. | A coarse-to-fine weakly supervised learning method for green plastic cover segmentation using high-resolution remote sensing images | |
CN111709311A (en) | Pedestrian re-identification method based on multi-scale convolution feature fusion | |
CN110008913A (en) | The pedestrian's recognition methods again merged based on Attitude estimation with viewpoint mechanism | |
CN112699834B (en) | Traffic identification detection method, device, computer equipment and storage medium | |
CN112395977A (en) | Mammal posture recognition method based on body contour and leg joint skeleton | |
CN114596500A (en) | Remote sensing image semantic segmentation method based on channel-space attention and DeeplabV3plus | |
CN105574545B (en) | The semantic cutting method of street environment image various visual angles and device | |
Gong et al. | A two-level framework for place recognition with 3D LiDAR based on spatial relation graph | |
CN115527269A (en) | Intelligent human body posture image identification method and system | |
CN111709317A (en) | Pedestrian re-identification method based on multi-scale features under saliency model | |
Guo et al. | Image classification based on SURF and KNN | |
Pang et al. | Analysis of computer vision applied in martial arts | |
Shanmugavadivu et al. | FOSIR: fuzzy-object-shape for image retrieval applications | |
CN105825215A (en) | Instrument positioning method based on local neighbor embedded kernel function and carrier of method | |
Zhang | Sports action recognition based on particle swarm optimization neural networks | |
CN111862147A (en) | Method for tracking multiple vehicles and multiple human targets in video | |
Zhou et al. | Place recognition and navigation of outdoor mobile robots based on random Forest learning with a 3D LiDAR |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||