CN113378729A - Pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method - Google Patents

Pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method

Info

Publication number: CN113378729A
Application number: CN202110667913.9A
Authority: CN (China)
Prior art keywords: image, pedestrian, pose, embedding, fusion
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 廖开阳 (Liao Kaiyang), 雷浩 (Lei Hao), 郑元林 (Zheng Yuanlin), 章明珠 (Zhang Mingzhu), 范冰 (Fan Bing), 黄港 (Huang Gang)
Assignee (original and current): Xi'an University of Technology
Priority/filing date: 2021-06-16
Publication date: 2021-09-10

Classifications

    • G06F 18/253 - Pattern recognition; Analysing; Fusion techniques of extracted features
    • G06N 3/045 - Computing arrangements based on biological models; Neural networks; Architecture, e.g. interconnection topology; Combinations of networks
    • G06N 3/08 - Computing arrangements based on biological models; Neural networks; Learning methods

Abstract

The invention discloses a pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method comprising the following steps: preprocessing an original pedestrian image by random erasing to obtain a pedestrian image, optimizing the baseline ResNet-50 network model, and extracting deep convolution features; extracting a salient human body image from the original pedestrian image; first extracting the pose from the salient human body image, then extracting local semantic features from the resulting body-part image; performing weighted fusion of the deep convolution features and the local semantic features, and measuring distances with the weighted fusion features to generate an initial ranking list; and re-ranking the images in the initial list with a re-ranking algorithm to obtain the correct match ranking, and outputting the matched pedestrian images to identify a specific pedestrian. The method can greatly improve the accuracy of identification and localization.

Description

Pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method
Technical Field
The invention belongs to the technical field of image processing methods, and relates to a pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method.
Background
In recent years artificial intelligence has become a focal point of scientific and technological development, and its use in the field of intelligent surveillance has become extremely important. As cities expand, surveillance systems spread further; every city has thousands of cameras covering its streets from end to end. With the growing number of cameras, relying solely on human observers is extremely expensive, and people cannot watch so many screens at the same time. The technology of pedestrian re-identification has therefore attracted the attention of researchers: it can help people monitor, track, and identify pedestrians. Human beings receive and perceive external information mainly through vision, and human vision can extract the required information directly from cluttered images; researchers likewise want cameras that imitate the human visual system and capture objects in the environment effectively and quickly. This goal eventually evolved into today's pedestrian re-identification technology, which is widely used. An intelligent surveillance system, for example, needs pedestrian re-identification: using the powerful data-processing capability of a computer, a video surveillance system can automatically filter out useless information and actively identify human bodies, enabling comprehensive monitoring and a 24-hour surveillance system with early warning and after-the-fact evidence collection. Pedestrian-flow statistics also use this technique: it likewise borrows the computer's power to process data, automatically filters out useless information, and automatically identifies and counts pedestrians, while pedestrians who appear repeatedly in different areas are not counted twice, so pedestrian flow can be counted effectively and accurately.
Pedestrian re-identification accuracy is strongly affected by one key factor: pedestrian misalignment. The mutual occlusion of body parts and the constantly changing posture caused by misalignment are a great challenge for pedestrian re-identification research. First, a pedestrian's posture changes constantly and unavoidably during movement, which means that local changes of the body inside the bounding box are unpredictable. For example, a pedestrian may place a hand behind the back or on top of the head while moving, causing local occlusion due to misalignment, which strongly affects the extracted features. Second, detection when pedestrians are irregularly aligned affects the accuracy of pedestrian re-identification research. A method commonly used in pedestrian re-identification is to divide the bounding box into horizontal stripes; however, this method only holds under slight vertical deviation. Under severe vertical misalignment, stripes covering the body or head may be matched against background, causing erroneous matches in the re-identification task, so the horizontal-striping approach is not ideal in cases of severe misalignment. Moreover, as a pedestrian's posture changes, the background changes as well, so the background may be wrongly weighted by the convolutional neural network and degrade recognition accuracy. Therefore, how to overcome the misalignment and background variation caused by pedestrian posture change is the key to improving the accuracy of pedestrian re-identification.
Disclosure of Invention
The invention aims to provide a pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method, which solves the problem in the prior art of low pedestrian re-identification accuracy caused by the misalignment and background variation that accompany pedestrian posture change.
The technical scheme adopted by the invention is a pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method comprising the following steps:
Step 1, preprocessing an original pedestrian image by random erasing to obtain a pedestrian image, performing baseline network optimization on a ResNet-50 network model, and inputting the pedestrian image into the optimized ResNet-50 network model to obtain deep convolution features;
Step 2, performing feature extraction with the original pedestrian image as the input image to obtain a salient human body image;
Step 3, first extracting the pose of the salient human body image with a convolutional pose machine to obtain a body-part image, and then inputting the body-part image into a ResNet-50 network to extract local semantic features;
Step 4, performing weighted fusion on the deep convolution features and the local semantic features to obtain weighted fusion features, measuring the distances between the images in the image test gallery and the image query set and the fusion features, and generating an initial ranking list from the distance-measurement results;
Step 5, re-ranking the images in the initial ranking list according to a re-ranking algorithm to obtain the correct match ranking, and outputting the matched pedestrian image to identify a specific pedestrian.
The invention is also characterized in that:
the specific mode for carrying out the baseline network optimization on the Resnet-50 network model is as follows:
and optimizing a loss function of the Resnet-50 network model by combining Softmax loss and triple loss, wherein the optimized loss function is as follows:
Figure BDA0003117648780000031
in the above formula, m is the number of loss functions;
Figure BDA0003117648780000032
in the above formula, the first and second carbon atoms are,
Figure BDA0003117648780000033
is the feature vector of the anchor point sample,
Figure BDA0003117648780000034
is the feature vector of the positive sample,
Figure BDA0003117648780000035
is a feature vector of negative samples, alpha is
Figure BDA0003117648780000036
A distance between
Figure BDA0003117648780000037
The distance between them is the smallest distance, + represents [, ]]When the value of the internal is more than zero, the value is a loss value, and when the value is less than zero, the loss is zero.
Step 2 specifically comprises the following steps:
Step 2.1, removing the last pooling stage of the VGG-16 network and using the result as the network structure, inputting the original pedestrian image into the network structure as the input image, and outputting a feature map;
Step 2.2, deconvolving the feature map to the size of the input image and adding a new convolutional layer to generate a predicted saliency map;
Step 2.3, first applying a convolutional layer with kernel size 1 × 1 to the conv1-2 layer of the network structure to generate a boundary prediction, then adding the boundary prediction to the predicted saliency map to obtain a refined boundary, and then applying one more convolutional layer to convolve the refined result to obtain the salient human body image.
Step 3 specifically comprises the following steps:
Step 3.1, taking the salient human body image as the input of a pose estimator and locating 14 joint points;
Step 3.2, grouping the 14 human body joints into 6 sub-regions, cropping, rotating, and resizing the 6 sub-regions to fixed sizes and orientations, and combining them to form a stitched body-part image;
Step 3.3, performing pose transformation on the size of each body part in the stitched body-part image to obtain a body-part image;
Step 3.4, inputting the body-part image into a ResNet-50 network for training and extracting local semantic features.
The specific process of step 5 is as follows:
For a pedestrian test image p and an image set G = {g_i | i = 1, 2, ..., N}, the k-reciprocal nearest neighbors are weighted and encoded into a single vector to form the k-reciprocal feature; the Jaccard distances between the pedestrian test image p and the image set are then calculated using the k-reciprocal features of the images; finally, the original distance between the pedestrian test image p and the image set and the Jaccard distance are weighted to obtain the distance formula. The distances between the images in the initial ranking list and the fusion features are calculated according to the distance formula and re-ranked to obtain the correct match ranking, and the matched pedestrian image is output to identify a specific pedestrian.
The beneficial effects of the invention are as follows:
The pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method of the invention fuses deep global features with local semantic features, measures the distances between different images with the fused weighted features, and identifies and retrieves images of the same pedestrian; applied to an original image database, it retrieves the images of a specific pedestrian and is thus well suited to a pedestrian re-identification system. The performance of the baseline network is improved by random erasing and the triplet loss function, and the local features obtained by pose estimation are aggregated with the global features obtained by the baseline network through feature weighting, achieving global optimization; this aids target identification and localization, speeds up the algorithm, and improves the stability of the system. The method can greatly improve the accuracy of identification and localization and can be used for target identification and retrieval on pedestrian images as well as in other fields.
Drawings
FIG. 1 is a flow chart of the pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method of the invention;
FIG. 2 shows the effect of the random erasing preprocessing of the pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method of the invention;
FIG. 3 is a schematic diagram of the triplet loss of the pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method of the invention;
FIG. 4 shows the pose embedding effect of the pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method of the invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
The pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method, as shown in FIG. 1, comprises the following steps:
Step 1, an image database is established; in this embodiment the database consists of pedestrian images collected manually and corrected by computer, 72000 images in total. The original pedestrian images are preprocessed by random erasing to obtain pedestrian images, baseline network optimization is performed on the ResNet-50 network model, and the pedestrian images are input into the optimized ResNet-50 model to obtain deep convolution features;
Step 1.1, the original pedestrian image is randomly erased using the random erasing augmentation method to obtain the pedestrian image;
Specifically, Random Erasing Augmentation (REA) is an effective data augmentation method. It occludes different training images: a rectangular region with random position and size is generated in the image, occluding part of the pedestrian, and the pixel values inside the occluded region are set to random values. This reduces overfitting and improves the convergence of the network model, thereby improving the performance of the deep learning model.
During network model training, for the original training data set, suppose the probability that an image is randomly erased is P; the probability that it is left unchanged is then 1 − P. During random erasing, a rectangular region is generated with the set probability P to occlude the image, and the position and size of the occluded region are random.
Assume the size of the image to be randomly erased, i.e., the original pedestrian image, is:

S = W × H    (1)

where W is the width of the pedestrian image and H is its height.

Assume the area of the randomly erased rectangular region is S_e, constrained to lie between a minimum value S_l and a maximum value S_h. With the aspect ratio of the erased region denoted r_e, the height H_e and width W_e of the erased rectangle are:

H_e = sqrt(S_e · r_e)    (2)

W_e = sqrt(S_e / r_e)    (3)

where S_e is the area of the erased rectangle, r_e is its aspect ratio, H_e is its height, and W_e is its width.

A point P = (x_e, y_e) is randomly selected on the original pedestrian image. If it satisfies equations (4) and (5):

x_e + W_e ≤ W    (4)

y_e + H_e ≤ H    (5)

then the rectangular region to be erased from the original pedestrian image is (x_e, y_e, x_e + W_e, y_e + H_e); the region is selected by random erasing, and each pixel inside the rectangle is assigned a random value in [0, 255] to replace the original content. If the randomly selected point P = (x_e, y_e) does not satisfy the conditions of equations (4) and (5), the process is repeated and a new point P = (x_e, y_e) is selected on the image until a suitable random point is found. Finally, the randomly erased original pedestrian image (i.e., the pedestrian image) is output, as shown in FIG. 2.
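For illustration, a minimal Python sketch of this random erasing procedure, following equations (1) to (5), is given below. The probability p and the area and aspect-ratio bounds (s_l, s_h, r1, r2) are hyperparameters whose concrete values are assumptions, not taken from this description.

```python
import math
import random
import numpy as np

def random_erase(img, p=0.5, s_l=0.02, s_h=0.4, r1=0.3, r2=3.33, max_tries=100):
    """Sketch of Random Erasing Augmentation on an H x W (x C) uint8 array.

    With probability p, a rectangle whose area lies in [s_l*S, s_h*S] and
    whose aspect ratio lies in [r1, r2] is filled with random pixel values.
    """
    if random.random() > p:
        return img  # kept unchanged with probability 1 - p
    H, W = img.shape[:2]
    S = W * H  # equation (1): S = W x H
    for _ in range(max_tries):
        S_e = random.uniform(s_l, s_h) * S      # erased area
        r_e = random.uniform(r1, r2)            # aspect ratio of erased box
        H_e = int(round(math.sqrt(S_e * r_e)))  # equation (2)
        W_e = int(round(math.sqrt(S_e / r_e)))  # equation (3)
        x_e, y_e = random.randint(0, W - 1), random.randint(0, H - 1)
        if x_e + W_e <= W and y_e + H_e <= H:   # equations (4) and (5)
            # replace the region with random values in [0, 255]
            img[y_e:y_e + H_e, x_e:x_e + W_e] = np.random.randint(
                0, 256, size=(H_e, W_e) + img.shape[2:], dtype=np.uint8)
            return img
    return img  # no valid rectangle found within max_tries
```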
Step 1.2, the loss function of the ResNet-50 network model is optimized by combining the Softmax loss and the triplet loss;
Specifically, in the field of pedestrian re-identification the triplet loss (Triplet loss) is also widely applied, and it is frequently used together with the Softmax loss in a network model. As shown in FIG. 3, when the triplet loss function is used, three images are taken as input to the network: a triplet (x^a, x^p, x^n), where x^a is the anchor sample (Anchor), a sample randomly selected from the data set used to train the network model; x^p is a training sample whose pedestrian identity belongs to the same class as the anchor sample, i.e., the positive sample; and x^n is a training sample whose pedestrian identity does not belong to the same class as the anchor sample, i.e., the negative sample. These training samples are input into networks of identical structure for feature extraction, as shown in FIG. 3; after learning through the triplet loss, the distance between the anchor sample and the positive sample becomes the smallest and the distance between the anchor sample and the negative sample becomes the largest. The formula for calculating the triplet loss is:
L_t = [d(f(x^a), f(x^p)) − d(f(x^a), f(x^n)) + α]_+    (6)

where f(x^a) is the feature vector of the anchor sample, f(x^p) is the feature vector of the positive sample, f(x^n) is the feature vector of the negative sample, α is the margin by which the distance between f(x^a) and f(x^n) must exceed the distance between f(x^a) and f(x^p), and [·]_+ denotes max(·, 0): when the bracketed value is greater than zero it is taken as the loss value, and when it is less than zero the loss is zero.
From the objective function it can be seen that when the distance between f(x^a) and f(x^p) plus α is greater than the distance between f(x^a) and f(x^n), the value of [·]_+ is greater than zero and there is a loss; when the distance between f(x^a) and f(x^n) is greater than or equal to the distance between f(x^a) and f(x^p) plus α, the loss is zero.
Through the triplet loss function, the network model can reduce the distance between pedestrian images with the same label and enlarge the distance between pedestrian images with different labels, so that the trained network model is more discriminative.
The loss function of the ResNet-50 network model is optimized by combining the Softmax loss and the triplet loss; the optimized loss function is:

L = Σ_{i=1}^{m} L_i    (7)

where m is the number of loss functions.
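As a concrete illustration, the following PyTorch sketch combines a cross-entropy (Softmax) loss with a triplet loss as in equation (7), here with m = 2. The margin value of 0.3 and the equal weighting of the two terms are illustrative assumptions; the description does not fix these choices.

```python
import torch.nn as nn

# Softmax (identity classification) loss plus triplet loss of equation (6).
ce_loss = nn.CrossEntropyLoss()              # Softmax loss over identity logits
tri_loss = nn.TripletMarginLoss(margin=0.3)  # [d(a,p) - d(a,n) + alpha]_+

def combined_loss(logits, labels, f_anchor, f_pos, f_neg):
    """L = L_softmax + L_triplet, i.e. equation (7) with m = 2."""
    return ce_loss(logits, labels) + tri_loss(f_anchor, f_pos, f_neg)
```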
and step 1.3, inputting the pedestrian image into the optimized Resnet-50 network model to obtain the deep convolution characteristic.
Step 2, feature extraction is performed with the original pedestrian image as the input image, separating the foreground from the background to obtain a salient human body image;
Step 2.1, the last pooling stage of the VGG-16 network is removed and the result is used as the network structure; the original pedestrian image is input into the network structure as the input image, and a feature map is output;
Specifically, the VGG-16 model performs well in image classification and generalizes well, so the saliency model also uses VGG-16 to construct the network structure. Given an input image of size W × H, the output feature map has size [W/2^5, H/2^5], so a network built directly on VGG-16 reduces the feature map by a factor of 32. In this embodiment the last pooling stage of VGG-16 is removed, which enlarges the output feature map and balances semantic context against image detail. The feature map output by the network structure of the invention is therefore reduced by a factor of 16 relative to the input image.
Step 2.2, the integrated feature map already contains various saliency cues, so it can be used to predict the saliency map. Specifically, the feature map is deconvolved to the size of the input image, and a new convolutional layer is added to generate a predicted saliency map;
Step 2.3, boundary refinement is added to the prediction by introducing short connections, further refining the boundary to separate the foreground from the background; the low-level features are expected to help predict object boundaries, and they have the same spatial resolution as the input image. Specifically, a convolutional layer with kernel size 1 × 1 in the network structure is applied to the conv1-2 layer to generate a boundary prediction, the boundary prediction is added to the predicted saliency map to obtain a refined boundary, and one more convolutional layer is then applied to convolve the refined result to obtain the salient human body image.
Step 3, the pose of the salient human body image is first extracted with a convolutional pose machine to obtain a body-part image, and the body-part image is then input into a ResNet-50 network to extract local semantic features. Specifically, pose extraction uses an off-the-shelf convolutional pose machine, a sequential convolutional architecture that can detect 14 body joints, i.e., the head, neck, left and right shoulders, left and right elbows, left and right wrists, left and right hips, left and right knees, and left and right ankles, as shown in FIG. 4.
Step 3.1, the salient human body image is taken as the input of the pose estimator, and 14 joint points are located: head, neck, right shoulder, right elbow, right wrist, left shoulder, left elbow, left wrist, left hip, left knee, left ankle, right hip, right knee, and right ankle;
Step 3.2, the 14 human joints are grouped into 6 sub-regions (head, upper body, left arm, right arm, left leg, and right leg) as the human body parts; the 6 sub-regions are cropped, rotated, and resized to fixed sizes and orientations and combined to form a stitched body-part image. Because the 6 parts of a human body differ in size, black regions inevitably appear in the stitched image;
Step 3.3, pose transformation is performed on the size of each body part in the stitched body-part image to obtain the body-part image;
Since black regions appear in the stitched body-part image, the size of each body part must be pose-transformed to remove them; the size of each part is determined mainly by observation. For example, in this embodiment the arm width is observed to be about 20 pixels and the leg width about 30 pixels; decreasing these parameter values causes information loss, while increasing them may introduce more background noise. As long as the parameter variation is small, however, system performance remains stable: when a part's size varies within a small range, the identity information it contains does not change much, so given the supervision signal the network can still learn a discriminative embedding.
Step 3.4, the body-part image is divided into a test set and a training set and input into a ResNet-50 network for training, and the local semantic features are extracted. The ResNet-50 network in this step does not share weights with the ResNet-50 optimized in step 1; a separate set of weights is trained to evaluate the local semantic image and extract the local semantic features.
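The part extraction of steps 3.1 to 3.3 can be sketched as follows; the joint grouping, padding, and per-part target size are illustrative assumptions (rotation to a canonical orientation is omitted for brevity), and `joints` is assumed to map joint names to (x, y) coordinates from the pose estimator.

```python
import numpy as np
import cv2

# Grouping of the 14 joints into the 6 body parts named in step 3.2.
PARTS = {
    "head":       ["head", "neck"],
    "upper_body": ["neck", "right_shoulder", "left_shoulder",
                   "right_hip", "left_hip"],
    "left_arm":   ["left_shoulder", "left_elbow", "left_wrist"],
    "right_arm":  ["right_shoulder", "right_elbow", "right_wrist"],
    "left_leg":   ["left_hip", "left_knee", "left_ankle"],
    "right_leg":  ["right_hip", "right_knee", "right_ankle"],
}

def extract_parts(img, joints, size=(64, 64), pad=10):
    """Crop each of the 6 sub-regions around its joints, resize to fixed size."""
    h, w = img.shape[:2]
    crops = {}
    for name, joint_names in PARTS.items():
        pts = np.array([joints[j] for j in joint_names])
        x0, y0 = np.maximum(pts.min(axis=0).astype(int) - pad, 0)
        x1 = min(int(pts[:, 0].max()) + pad, w)
        y1 = min(int(pts[:, 1].max()) + pad, h)
        crops[name] = cv2.resize(img[y0:y1, x0:x1], size)  # fixed size
    return crops
```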
Step 4, weighted fusion is performed on the deep convolution features and the local semantic features to obtain the weighted fusion features; the distances between the images in the image test gallery and the image query set and the fusion features are measured, an initial ranking list is generated from the distance-measurement results, and a query score is returned. The feature weighted aggregation is given by the following formula:

d = α f_DEEP + (1 − α) f_SOD    (8)

where the parameter 0 ≤ α ≤ 1 assigns different weights to the deep global feature f_DEEP and the local semantic feature f_SOD.
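Equation (8) amounts to a convex combination of the two feature vectors, as in the following sketch; the value α = 0.5 is only a placeholder, since the description constrains α merely to [0, 1].

```python
import numpy as np

def fuse_features(f_deep: np.ndarray, f_sod: np.ndarray, alpha: float = 0.5):
    """Weighted feature aggregation of equation (8)."""
    assert f_deep.shape == f_sod.shape and 0.0 <= alpha <= 1.0
    return alpha * f_deep + (1.0 - alpha) * f_sod
```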
Step 5, the images in the initial ranking list are re-ranked according to a re-ranking algorithm to obtain the correct match ranking, and the matched pedestrian image is output to identify the specific pedestrian.
Specifically, for a pedestrian test image p and an image set G = {g_i | i = 1, 2, ..., N}, the k-reciprocal nearest neighbors are weighted and encoded into a single vector to form the k-reciprocal feature; the Jaccard distances between the pedestrian test image p and the image set are then calculated using the k-reciprocal features of the images; finally, the original distance between the pedestrian test image p and the image set and the Jaccard distance are weighted to obtain the final distance. The distances between the images in the initial ranking list and the fusion features are calculated and sorted to obtain the correct match ranking, and the matched pedestrian image is output to identify the specific pedestrian.
Step 5.1, first, a pedestrian test image p is given, together with an image set G = {g_i | i = 1, 2, ..., N} used as the gallery reference; the original distance between the pedestrian image p and a reference image g_i is measured by the Mahalanobis distance:

d(p, g_i) = (x_p − x_{g_i})^T M (x_p − x_{g_i})    (9)

where x_p is the appearance feature of the test image p, x_{g_i} is the appearance feature of the reference image g_i, and M is a positive semi-definite matrix.

From the original distances between the test image p and the reference images g_i, the initial ranking list is obtained:

L(p, G) = {g_1^0, g_2^0, ..., g_N^0}    (10)
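A sketch of step 5.1, assuming features are given as NumPy vectors; with M equal to the identity matrix, equation (9) reduces to the squared Euclidean distance.

```python
import numpy as np

def mahalanobis(x_p: np.ndarray, x_g: np.ndarray, M: np.ndarray) -> float:
    """Mahalanobis distance of equation (9); M must be positive semi-definite."""
    diff = x_p - x_g
    return float(diff @ M @ diff)

def initial_ranking(x_p, gallery, M):
    """Gallery indices sorted by original distance: the list of equation (10)."""
    d = np.array([mahalanobis(x_p, x_g, M) for x_g in gallery])
    return np.argsort(d), d
```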
and 5.2, the purpose of the reordering strategy is to reorder the L (p, G) initial list ranking, so that more correctly matched image samples are arranged at the first position of the list, and the identification precision of pedestrian re-identification is improved.
The top k samples of the initial ranking list, i.e., the k-nearest neighbors (k-nn), are:

N(p, k) = {g_1^0, g_2^0, ..., g_k^0}    (11)

The k-reciprocal nearest neighbors (k-rnn) are expressed as:

R(p, k) = {g_i | (g_i ∈ N(p, k)) ∧ (p ∈ N(g_i, k))}    (12)
however, due to a series of influencing factors such as brightness variation, posture variation, view angle variation and occlusion, the correctly matched samples may be excluded from the nearest neighbors. To solve this problem, each candidate nearest neighbor set is converted into a more robust set:
Figure BDA0003117648780000131
Figure BDA0003117648780000132
for each test image sample in the original set R (p, k), find their k-reciprocal nearest neighbor set
Figure BDA0003117648780000133
When the number of the overlapped samples reaches a certain condition, the overlapped samples and the R (p, k) are merged, and more positive samples can be added into the R (p, k) set after expansion;
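Equations (11) to (13) can be sketched as follows, with `dist` a precomputed matrix of original distances over all query and gallery images; following the common convention, each image counts among its own nearest neighbors.

```python
import numpy as np

def k_nn(dist, i, k):
    """N(i, k): indices of the k nearest neighbors of image i (plus i itself)."""
    return np.argsort(dist[i])[:k + 1]

def k_reciprocal(dist, i, k):
    """R(i, k) = {j in N(i, k) : i in N(j, k)}, equation (12)."""
    return np.array([j for j in k_nn(dist, i, k) if i in k_nn(dist, j, k)])

def expanded_set(dist, i, k):
    """R*(i, k): merge R(q, k/2) when it overlaps R(i, k) by >= 2/3, eq. (13)."""
    R = set(k_reciprocal(dist, i, k).tolist())
    for q in list(R):
        Rq = set(k_reciprocal(dist, q, k // 2).tolist())
        if len(R & Rq) >= (2 / 3) * len(Rq):
            R |= Rq
    return np.array(sorted(R))
```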
Step 5.3, weights are reassigned according to the original distances between the query image and its neighbors, and the k-reciprocal nearest-neighbor set of a sample image is encoded through a Gaussian kernel into an N-dimensional vector V_p = [V_{p,g_1}, V_{p,g_2}, ..., V_{p,g_N}], defined as:

V_{p,g_i} = exp(−d(p, g_i)) if g_i ∈ R*(p, k), and V_{p,g_i} = 0 otherwise    (14)
Since nearer neighbors are assigned larger weights and farther neighbors smaller weights, the intersection and union terms needed to compute the Jaccard distance can be computed as:

|V_p ∩ V_{g_i}| = Σ_{j=1}^{N} min(V_{p,g_j}, V_{g_i,g_j})    (15)

|V_p ∪ V_{g_i}| = Σ_{j=1}^{N} max(V_{p,g_j}, V_{g_i,g_j})    (16)

The intersection takes, through the minimum operation, the smaller value in each corresponding dimension of the two feature vectors as the degree to which both contain g_j, while the maximum operation of the union counts the total set of matching candidates in the two sets;
Step 5.4, the final Jaccard distance is expressed as:

d_J(p, g_i) = 1 − Σ_{j=1}^{N} min(V_{p,g_j}, V_{g_i,g_j}) / Σ_{j=1}^{N} max(V_{p,g_j}, V_{g_i,g_j})    (17)

The initial ranking list is corrected by combining the original distance with the Jaccard distance, the final distance being defined as:

d*(p, g_i) = (1 − λ)d_J(p, g_i) + λd(p, g_i)    (18)

where λ is a weighting parameter representing the weight of the two distances: when λ = 0 only the Jaccard distance is considered, and when λ = 1 only the original distance is considered; here λ is set to 0.3;
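Putting equations (14) to (18) together, a sketch of the re-ranking computation (reusing `mahalanobis` and `expanded_set` from the sketches above) might look as follows; the neighborhood size k = 20 is an assumption, while λ = 0.3 follows the choice above. `gallery_ids` is assumed to be a NumPy array of indices into `dist`.

```python
import numpy as np

def rerank_distances(dist, p, gallery_ids, k=20, lam=0.3):
    """Re-rank gallery images for probe index p using equations (14)-(18)."""
    n = dist.shape[0]
    V = np.zeros((n, n))
    for i in range(n):
        R = expanded_set(dist, i, k)
        V[i, R] = np.exp(-dist[i, R])            # equation (14)
    d_final = np.empty(len(gallery_ids))
    for idx, g in enumerate(gallery_ids):
        inter = np.minimum(V[p], V[g]).sum()     # equation (15)
        union = np.maximum(V[p], V[g]).sum()     # equation (16)
        d_j = 1.0 - inter / union                # equation (17), Jaccard
        d_final[idx] = (1 - lam) * d_j + lam * dist[p, g]  # equation (18)
    return gallery_ids[np.argsort(d_final)]     # re-ranked gallery order
```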
and 5.5, calculating the distance between the image and the fusion feature in the initial measurement list by using a formula (18), sequencing to obtain a correct image matching ranking, outputting a pedestrian matching image to identify a specific pedestrian, and finishing identification.
In this way, the pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method retrieves the corresponding pedestrian images from a large pedestrian image database, finding images of the same pedestrian in the database from a pair of images. The influence of complex backgrounds is filtered out by separating foreground from background; local pedestrian features are extracted with a human key-point estimation method; and the robustness of the network model is strengthened by preprocessing the images of the baseline network with random erasing, so that more robust global features are extracted. Finally, the features of different scales are fused with deep weighting, and the similarity measurement between features is improved by the re-ranking method.

Claims (5)

1. A pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method, characterized by comprising the following steps:
step 1, preprocessing an original pedestrian image by random erasing to obtain a pedestrian image, performing baseline network optimization on a ResNet-50 network model, and inputting the pedestrian image into the optimized ResNet-50 network model to obtain deep convolution features;
step 2, performing feature extraction with the original pedestrian image as the input image to obtain a salient human body image;
step 3, first extracting the pose of the salient human body image with a convolutional pose machine to obtain a body-part image, and then inputting the body-part image into a ResNet-50 network to extract local semantic features;
step 4, performing weighted fusion on the deep convolution features and the local semantic features to obtain weighted fusion features, measuring the distances between the images in the image test gallery and the image query set and the fusion features, and generating an initial ranking list from the distance-measurement results;
step 5, re-ranking the images in the initial ranking list according to a re-ranking algorithm to obtain the correct match ranking, and outputting the matched pedestrian image to identify a specific pedestrian.
2. The pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method according to claim 1, characterized in that the baseline network optimization of the ResNet-50 network model is performed as follows:
the loss function of the ResNet-50 network model is optimized by combining the Softmax loss and the triplet loss, the optimized loss function being:

L = Σ_{i=1}^{m} L_i

where m is the number of loss functions, and the triplet loss is:

L_t = [d(f(x^a), f(x^p)) − d(f(x^a), f(x^n)) + α]_+

where f(x^a) is the feature vector of the anchor sample, f(x^p) is the feature vector of the positive sample, f(x^n) is the feature vector of the negative sample, α is the margin by which the distance between f(x^a) and f(x^n) must exceed the distance between f(x^a) and f(x^p), and [·]_+ denotes max(·, 0): when the bracketed value is greater than zero it is taken as the loss value, and when it is less than zero the loss is zero.
3. The pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method according to claim 1, characterized in that step 2 specifically comprises the following steps:
step 2.1, removing the last pooling stage of the VGG-16 network and using the result as the network structure, inputting the original pedestrian image into the network structure as the input image, and outputting a feature map;
step 2.2, deconvolving the feature map to the size of the input image and adding a new convolutional layer to generate a predicted saliency map;
step 2.3, first applying a convolutional layer with kernel size 1 × 1 to the conv1-2 layer of the network structure to generate a boundary prediction, then adding the boundary prediction to the predicted saliency map to obtain a refined boundary, and then applying one more convolutional layer to convolve the refined result to obtain the salient human body image.
4. The pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method according to claim 1, characterized in that step 3 specifically comprises the following steps:
step 3.1, taking the salient human body image as the input of a pose estimator and locating 14 joint points;
step 3.2, grouping the 14 human body joints into 6 sub-regions, cropping, rotating, and resizing the 6 sub-regions to fixed sizes and orientations, and combining them to form a stitched body-part image;
step 3.3, performing pose transformation on the size of each body part in the stitched body-part image to obtain a body-part image;
step 3.4, inputting the body-part image into a ResNet-50 network for training and extracting local semantic features.
5. The pose embedding-based multi-scale convolution feature fusion pedestrian re-identification method according to claim 1, characterized in that the specific process of step 5 is as follows:
for a pedestrian test image p and an image set G = {g_i | i = 1, 2, ..., N}, weighting and encoding the k-reciprocal nearest neighbors into a single vector to form the k-reciprocal feature, then calculating the Jaccard distances between the pedestrian test image p and the image set using the k-reciprocal features of the images, and finally weighting the original distance between the pedestrian test image p and the image set and the Jaccard distance to obtain the distance formula; and calculating the distances between the images in the initial ranking list and the fusion features according to the distance formula, re-ranking to obtain the correct match ranking, and outputting the matched pedestrian image to identify a specific pedestrian.

