CN111062329B - Unsupervised pedestrian re-identification method based on augmented network - Google Patents

Unsupervised pedestrian re-identification method based on augmented network

Info

Publication number
CN111062329B
CN111062329B
Authority
CN
China
Prior art keywords
augmentation
network
pedestrian
image
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911310016.1A
Other languages
Chinese (zh)
Other versions
CN111062329A (en)
Inventor
Wei-Shi Zheng
Ziyi Yuan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN201911310016.1A priority Critical patent/CN111062329B/en
Publication of CN111062329A publication Critical patent/CN111062329A/en
Application granted granted Critical
Publication of CN111062329B publication Critical patent/CN111062329B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4007Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides an unsupervised pedestrian re-identification method based on an augmentation network. Starting from the pedestrian image data in an original database, several forms of data augmentation are applied, and the features of the augmented data, which share the identity of the original data they were derived from, are extracted by networks whose parameters are not shared, which helps the network train. The method mainly considers how to utilize unlabeled data, which cannot be used directly as input, when datasets are not abundant; the main network model obtained by the method can extract features from a test set directly and thus be used for testing as-is. The method can also pre-train the several augmentation networks and the main network with unlabeled data and then fine-tune the main network parameters with labeled data, so that unlabeled information is exploited effectively and the accuracy of pedestrian re-identification is improved.

Description

Unsupervised pedestrian re-identification method based on augmented network
Technical Field
The invention relates to the field of deep learning, and in particular to an unsupervised pedestrian re-identification method.
Background
In recent years, deep learning technology has developed continuously, and deep learning methods based on deep neural networks have been applied to many aspects of our lives, such as text translation and text classification in the field of natural language processing (Natural Language Processing), and image retrieval and face recognition in the field of computer vision (Computer Vision). The emergence of deep learning methods has brought great convenience to human society.
The pedestrian re-identification method is an important application of deep learning. Pedestrian re-identification (Person re-identification), also called person re-identification, is a technique that uses computer vision to judge whether a specific pedestrian is present in images or video sequences captured by cameras whose fields of view do not overlap. Because different camera devices differ from one another, pedestrians are both rigid and deformable, and their appearance is easily affected by clothing, scale, occlusion, pose and viewing angle, pedestrian re-identification has become a research topic in computer vision that is both valuable and highly challenging.
Dedicated datasets for pedestrian re-identification exist in academia, but because data acquisition and annotation require substantial manpower and financial resources, these datasets contain relatively few images. Market-1501 and DukeMTMC-reID are two commonly used datasets.
The Market-1501 dataset was collected on the Tsinghua University campus, with images from 6 different cameras. The training set contains 12,936 images and the test set contains 19,732 images. The training data covers 751 identities and the test set 750, with an average of 17.2 training images per class (per person).
The DukeMTMC-reID dataset was collected at Duke University, with images from 8 different cameras. The training set contains 16,522 images and the test set contains 17,661 images. The training data covers 702 identities, with an average of 23.5 training images per class (per person).
These two common pedestrian re-identification datasets contain only around 33,000 images each, a marked contrast to the tens of millions of images held by enterprises. When the dataset is too small, neural network training tends to overfit (overfitting), reducing the accuracy of a network trained on the original dataset when it is tested on other datasets.
Against this background, many data augmentation (Data Augmentation) methods have begun to be used in pedestrian re-identification, such as random cropping and random flipping. However, such methods only apply secondary processing to the images of the original labeled dataset; other unlabeled datasets still go unexploited.
Disclosure of Invention
The invention mainly aims to overcome the defects and shortcomings of the prior art, and provides an unsupervised pedestrian re-identification method based on an augmentation network, which makes effective use of unlabeled pedestrian image data that could not otherwise serve as training input, and thereby improves the accuracy of pedestrian re-identification.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
an unsupervised pedestrian re-identification method based on an augmented network comprises the following steps:
s1: performing an augmentation operation on an unlabeled original pedestrian image dataset D0, wherein the augmentation operation comprises one or more of image scaling, random cropping, random erasing, noise addition and Gaussian blur, to obtain M new augmented datasets D1-DM, where M is a positive integer;
s2: the original image data in the original pedestrian image data set D0 is introduced into a convolutional neural network as a main network N0 to carry out forward propagation extraction to obtain a feature F0;
s3: respectively inputting the corresponding augmented image data of the M augmented datasets D1-DM into M convolutional neural networks with unshared parameters, serving as augmentation networks N1-NM, for forward propagation to extract features F1-FM;
s4: randomly selecting an image Inegative from the original pedestrian image dataset D0 as a negative sample, and feeding it into the main network N0 for forward propagation to extract a feature Fnegative;
s5: calculating the Euclidean distance between the output feature F0 and each of the output features F1-FM to obtain M loss values L1-LM;
s6: calculating the Euclidean distance between the output feature Fnegative and each of the output features F0-FM to obtain M+1 loss values L0negative-LMnegative;
s7: taking, for each augmentation network, the result of subtracting the corresponding loss value among L1negative-LMnegative obtained in S6 from the corresponding loss value among L1-LM obtained in S5 as the loss, and performing backward propagation on the augmentation networks N1-NM to compute gradients and update their parameters;
s8: summing the M loss values L1-LM obtained in S5 and subtracting the sum of the loss values L0negative-LMnegative obtained in S6 to obtain a total loss value L0;
s9: taking the total loss value L0 obtained in S8 as the loss and performing backward propagation on the main network N0 to compute gradients and update the main network parameters;
s10: repeating the operations of S2-S9 until the main network and the augmentation networks converge;
s11: the master network model is taken as output.
As a preferred technical solution, in step S1, when the augmentation operation includes image scaling, the image is scaled using bilinear interpolation, so as to simulate the images of various resolutions that may occur in natural datasets. The specific calculation is:
\[ f(x,y) \approx \frac{f(Q_{11})(x_2-x)(y_2-y) + f(Q_{21})(x-x_1)(y_2-y) + f(Q_{12})(x_2-x)(y-y_1) + f(Q_{22})(x-x_1)(y-y_1)}{(x_2-x_1)(y_2-y_1)} \]
where Q11=(x1,y1), Q12=(x1,y2), Q21=(x2,y1) and Q22=(x2,y2) are the four pixel points closest to the point (x,y).
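For illustration, the following NumPy sketch implements the bilinear scaling described above; the function name and interface are assumptions of this sketch, not part of the patent:

```python
import numpy as np

def bilinear_resize(img, out_h, out_w):
    """Scale an H x W (x C) image by bilinear interpolation (illustrative sketch)."""
    in_h, in_w = img.shape[:2]
    out = np.zeros((out_h, out_w) + img.shape[2:], dtype=np.float32)
    for i in range(out_h):
        for j in range(out_w):
            # Map the output pixel back to source coordinates.
            y = i * (in_h - 1) / max(out_h - 1, 1)
            x = j * (in_w - 1) / max(out_w - 1, 1)
            y1, x1 = int(y), int(x)
            y2, x2 = min(y1 + 1, in_h - 1), min(x1 + 1, in_w - 1)
            dy, dx = y - y1, x - x1
            # Weighted sum of the four nearest pixels Q11, Q21, Q12, Q22.
            out[i, j] = ((1 - dx) * (1 - dy) * img[y1, x1]
                         + dx * (1 - dy) * img[y1, x2]
                         + (1 - dx) * dy * img[y2, x1]
                         + dx * dy * img[y2, x2])
    return out
```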
As a preferred technical solution, in step S1, when the augmentation operation includes random cropping, the augmentation is performed using a random cropping method, so as to simulate the various partial pedestrian images that may occur in natural datasets. The specific method is as follows:
firstly, a pixel point is randomly selected in the image; then, taking this pixel as the upper left corner, a rectangle of random length and width is formed, and the pixels within the whole rectangle are output as the cropping result.
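A minimal sketch of such a random crop; the crop size is passed in as assumed parameters:

```python
import random

def random_crop(img, crop_h, crop_w):
    """Pick a random upper-left pixel, then return the crop_h x crop_w rectangle."""
    h, w = img.shape[:2]
    top = random.randint(0, h - crop_h)    # random upper-left corner
    left = random.randint(0, w - crop_w)
    return img[top:top + crop_h, left:left + crop_w]
```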
As a preferred technical solution, in step S1, when the augmentation operation includes random erasing, the augmentation is performed using a random erasing method, so as to simulate the various missing or incomplete pedestrian images that may occur in natural datasets. The specific method is as follows:
a pixel point is randomly selected in the image; then, taking this pixel as the upper left corner, a rectangle of random length and width is formed, all pixels within the whole rectangle are set to black, i.e. pixel value (0, 0, 0), and the whole image after this operation is output as the random erasing result.
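A matching sketch of the erasing step, under the same assumptions (NumPy image array, hypothetical function name):

```python
import random

def random_erase(img, erase_h, erase_w):
    """Blacken a random erase_h x erase_w rectangle, i.e. set it to (0, 0, 0)."""
    out = img.copy()
    h, w = out.shape[:2]
    top = random.randint(0, h - erase_h)   # random upper-left corner
    left = random.randint(0, w - erase_w)
    out[top:top + erase_h, left:left + erase_w] = 0
    return out
```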
As a preferred technical solution, in step S1, when the augmentation operation includes noise addition, the augmentation is performed using a noise-adding method, so as to simulate the image noise that may occur in natural datasets. The specific operation is as follows:
each pixel becomes, with a certain probability, a white point, i.e. pixel value (255, 255, 255), or a black point, i.e. pixel value (0, 0, 0), and the whole image after this operation is output as the noise-adding result.
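A salt-and-pepper sketch of this operation; the per-pixel probability p is an assumed parameter, since the text only specifies "a certain probability":

```python
import numpy as np

def add_salt_pepper(img, p=0.05):
    """With probability p per pixel, set it to white (255, 255, 255) or black (0, 0, 0)."""
    out = img.copy()
    r = np.random.rand(*out.shape[:2])
    out[r < p / 2] = 0                  # pepper: black points
    out[(r >= p / 2) & (r < p)] = 255   # salt: white points
    return out
```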
As a preferred technical solution, in step S1, when the augmentation operation includes Gaussian blur, the augmentation is performed using a Gaussian blur method, so as to simulate the image blur that may occur in natural datasets, according to the following formula:
\[ G(x,y) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2+y^2}{2\sigma^2}} \]
once the value of sigma is set, a weight matrix can be computed; performing this matrix operation centered on each pixel of the image achieves the blurring of the image.
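A sketch of computing and applying such a weight matrix, assuming a single-channel image and an assumed kernel size; scipy's convolve2d performs the per-pixel matrix operation:

```python
import numpy as np
from scipy.signal import convolve2d

def gaussian_kernel(size=5, sigma=1.0):
    """Weight matrix G(x, y) for a given sigma, normalized so it sums to 1."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    g = np.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2)) / (2 * np.pi * sigma ** 2)
    return g / g.sum()

def gaussian_blur_gray(img, size=5, sigma=1.0):
    """Apply the kernel centered on every pixel (single-channel sketch)."""
    return convolve2d(img, gaussian_kernel(size, sigma), mode='same', boundary='symm')
```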
As a preferable technical solution, in step S2 and step S3, the respective pedestrian image data are fed into the corresponding convolutional neural networks, and feature extraction is performed by forward propagation. The specific forward propagation formula is:
\[ a^{l} = \sigma(z^{l}) = \sigma(a^{l-1} * W^{l} + b^{l}) \]
where a denotes the intermediate layer output; σ denotes the activation function; z denotes the input of the activation layer; the superscript denotes the layer number; * denotes the convolution operation; W denotes the convolution kernel; and b denotes the bias.
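A one-layer illustration of this forward step in PyTorch; the shapes are assumptions of this sketch:

```python
import torch
import torch.nn.functional as F

# One layer of the forward pass: z^l = a^{l-1} * W^l + b^l, then a^l = sigma(z^l).
a_prev = torch.randn(1, 64, 32, 16)         # a^{l-1}: previous layer's output
W = torch.randn(128, 64, 3, 3)              # W^l: convolution kernels
b = torch.randn(128)                        # b^l: bias
z = F.conv2d(a_prev, W, bias=b, padding=1)  # convolution plus bias gives z^l
a = torch.relu(z)                           # sigma: here ReLU as the activation
```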
As a preferable technical scheme, step S5 specifically includes:
the Euclidean distance is calculated by the characteristic F0 extracted by the main network N0 and the characteristics F1-FM extracted by the augmentation networks N1-NM respectively, and the specific formula is as follows:
\[ d(x,y) = \sqrt{\sum_{i=1}^{n}(x_i - y_i)^2} \]
where x takes the feature F0 extracted by the main network; y takes, in turn, the features F1-FM extracted by the M augmentation networks; and xi and yi are the values of the corresponding features in each dimension;
the step S6 specifically comprises the following steps:
an image is randomly selected from the dataset as the negative sample, its feature Fnegative is extracted by the main network, and the Euclidean distance is calculated between Fnegative and each of the output features F0-FM.
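A short PyTorch sketch of this distance computation for S5/S6, with dummy feature shapes:

```python
import torch

def euclidean_dist(x, y):
    """d(x, y) = sqrt(sum_i (x_i - y_i)^2) over the feature dimension."""
    return torch.sqrt(((x - y) ** 2).sum(dim=-1))

F0 = torch.randn(8, 256)            # batch of main-network features (dummy shapes)
Fi = torch.randn(8, 256)            # features from one augmentation network
Li = euclidean_dist(F0, Fi).mean()  # one scalar loss value of L1..LM
```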
As a preferable technical scheme, step S7 specifically includes:
the calculated error value is propagated back to the corresponding convolutional neural network, and the parameter values of the convolutional neural network are iteratively updated using the backward propagation algorithm. The specific formula is:
\[ \delta^{l-1} = \delta^{l} * \operatorname{rot180}(W^{l}) \odot \sigma'(z^{l-1}) \]
where the superscript denotes the layer number; δ denotes the gradient value; * denotes the convolution operation; W denotes the convolution kernel; rot180 denotes rotating the matrix by 180 degrees, i.e. flipping it vertically and then horizontally; ⊙ denotes element-wise multiplication; and σ' denotes the derivative of the activation function.
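For illustration, a single-channel NumPy sketch of this rule. In practice frameworks compute it by automatic differentiation; note also that the "convolution" in a CNN forward pass is implemented as cross-correlation, which is why the rotated kernel is applied with correlate2d here:

```python
import numpy as np
from scipy.signal import correlate2d

def backprop_delta(delta_l, W, z_prev, sigma_prime):
    """delta^{l-1} = delta^l * rot180(W^l), element-wise times sigma'(z^{l-1})."""
    W_rot = np.rot90(W, 2)  # flip up-down then left-right = 180-degree rotation
    # 'full' mode grows the error map back to the previous layer's size.
    return correlate2d(delta_l, W_rot, mode='full') * sigma_prime(z_prev)
```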
As a preferable technical scheme, step S8 specifically includes:
the loss values L1-LM, obtained from the Euclidean distances between the features extracted by the augmentation networks and the feature extracted by the main network, are summed, and the sum of L0negative-LMnegative is subtracted from the result to obtain the total error value L0. The specific formula is:
\[ L_0 = \sum_{i=1}^{M} \lambda_i L_i - \sum_{j=0}^{M} \lambda_{j,negative} L_{j,negative} \]
where λi, i ∈ [1, M], are the positive-sample weight values and Li, i ∈ [1, M], the corresponding positive-sample error values, here taking λi = 1; and λj,negative, j ∈ [0, M], are the negative-sample weight values and Lj,negative, j ∈ [0, M], the corresponding negative-sample error values, here taking λj,negative = 1.
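With the weights fixed to 1 as above, the combined loss reduces to a one-liner:

```python
def total_loss(pos_losses, neg_losses, lam_pos=1.0, lam_neg=1.0):
    """L0 = sum_i lam_i * L_i - sum_j lam_j,negative * L_j,negative (weights 1 here)."""
    return lam_pos * sum(pos_losses) - lam_neg * sum(neg_losses)
```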
Compared with the prior art, the invention has the following advantages and beneficial effects:
the method uses the method of the augmentation network to utilize the unlabeled pedestrian image data which cannot be directly used as the input of the deep neural network, and the augmentation network and the main network can be trained by end-to-end operation after the unlabeled data set is subjected to the augmentation operation. The deep neural network is trained by utilizing the information that the characteristics extracted by the original data and the augmented data obtained by the original data are consistent as much as possible. The method has great benefits for the pedestrian re-recognition field in which the data set and the data quantity are relatively lacking, and in addition, various different augmentation operations simulate the possible condition of blurring and missing of the pedestrian re-recognition data to a certain extent, so the method provided by the invention can promote the generalization of the trained deep neural network, relieve the overfitting and finally achieve the effect of improving the recognition accuracy
Drawings
FIG. 1 is a flow chart of an unsupervised pedestrian re-identification method based on an augmented network of the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but embodiments of the present invention are not limited thereto.
Examples
As shown in fig. 1, the embodiment provides an unsupervised pedestrian re-identification method based on an augmentation network, which includes the following steps:
s1: the enhancement operation is performed on the unlabeled pedestrian image data set D0, including image scaling, random clipping, random erasure, noise addition and gaussian blur (several of these five enhancement operations may be selected to perform the combined call volume, or all of them may be selected, and the embodiment further illustrates the enhancement operation in step 5), so as to obtain five new enhancement data sets D1 to D5.
In the step S1, the image scaling, random clipping, random erasing, noise adding and Gaussian blur are specifically as follows:
s11: the original unlabeled pedestrian image data is scaled by a bilinear interpolation method, so that various images with different resolutions possibly appearing in a natural dataset are simulated, and the specific calculation mode is as follows:
\[ f(x,y) \approx \frac{f(Q_{11})(x_2-x)(y_2-y) + f(Q_{21})(x-x_1)(y_2-y) + f(Q_{12})(x_2-x)(y-y_1) + f(Q_{22})(x-x_1)(y-y_1)}{(x_2-x_1)(y_2-y_1)} \]
where Q11=(x1,y1), Q12=(x1,y2), Q21=(x2,y1) and Q22=(x2,y2) are the four pixel points closest to the point (x,y).
S12: The original unlabeled pedestrian image data is augmented using a random cropping method, so as to simulate the various partial pedestrian images that may occur in natural datasets. The specific operation is as follows: firstly, a pixel point is randomly selected in the image; then, taking this pixel as the upper left corner, a rectangle of random length and width is formed, and the pixels within the whole rectangle are output as the cropping result.
S13: The original unlabeled pedestrian image data is augmented using a random erasing method, so as to simulate the various missing or incomplete pedestrian images that may occur in natural datasets. The specific operation is as follows:
a pixel point is randomly selected in the image; then, taking this pixel as the upper left corner, a rectangle of random length and width is formed, all pixels within the whole rectangle are set to black (i.e. pixel value (0, 0, 0)), and the whole image after this operation is output as the random erasing result.
S14: The original unlabeled pedestrian image data is augmented using a noise-adding method, so as to simulate the image noise that may occur in natural datasets. Specifically:
each pixel becomes, with a certain probability, a white point (i.e. pixel value (255, 255, 255)) or a black point (i.e. pixel value (0, 0, 0)), and the whole image after this operation is output as the noise-adding result.
S15: The original unlabeled pedestrian image data is augmented with a Gaussian blur method, so as to simulate the image blur that may occur in natural datasets, according to the following formula:
\[ G(x,y) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2+y^2}{2\sigma^2}} \]
once the value of sigma is set, a weight matrix can be computed; performing this matrix operation centered on each pixel of the image achieves the blurring of the image.
S2: the original image data in the original pedestrian image data set D0 is introduced into a convolutional neural network as a main network N0 to carry out forward propagation extraction to obtain a feature F0; the respective pedestrian image data are transmitted to the corresponding convolutional neural network, and feature extraction is carried out by using a forward propagation method, wherein the specific formula of forward propagation is as follows:
\[ a^{l} = \sigma(z^{l}) = \sigma(a^{l-1} * W^{l} + b^{l}) \]
where a denotes the intermediate layer output; σ denotes the activation function; z denotes the input of the activation layer; the superscript denotes the layer number; * denotes the convolution operation; W denotes the convolution kernel; and b denotes the bias.
S3: The corresponding augmented image data of the five augmented datasets D1-D5 are respectively input into five convolutional neural networks with unshared parameters, serving as augmentation networks N1-N5, for forward propagation to extract features F1-F5; the forward propagation in step S3 is performed in the same way as in step S2.
S4: An image Inegative is randomly selected from the original pedestrian image dataset D0 as a negative sample and fed into the main network N0 for forward propagation to extract a feature Fnegative;
s5: the Euclidean distance is calculated by using the output characteristic F0 and the output characteristics F1 to F5 respectively to obtain five loss values L1 to L5, specifically:
the Euclidean distance is calculated by the characteristic F0 extracted by the main network N0 and the characteristics F1 to F5 extracted by the augmentation networks N1 to N5 respectively, and the specific formula is as follows:
\[ d(x,y) = \sqrt{\sum_{i=1}^{n}(x_i - y_i)^2} \]
where x takes the feature F0 extracted by the main network; y takes, in turn, the features F1 to F5 extracted by the five augmentation networks; and xi and yi are the values of the corresponding features in each dimension.
S6: The Euclidean distance is calculated between the output feature Fnegative and each of the output features F0-F5 to obtain six loss values L0negative-L5negative; since the amount of data in a dataset is generally large and each class accounts for only a small proportion of the total, taking a randomly selected image as a negative sample is valid in most cases.
S7: For each augmentation network, the result of subtracting the corresponding loss value among L1negative-L5negative obtained in step S6 from the corresponding loss value among L1-L5 obtained in step S5 is taken as the loss, and backward propagation is performed on the augmentation networks N1-N5 to compute gradients and update their parameters.
The calculated error value is transmitted back to the corresponding convolutional neural network, and the parameter value of the convolutional neural network is iteratively updated by using a backward propagation algorithm, wherein the specific formula is as follows:
\[ \delta^{l-1} = \delta^{l} * \operatorname{rot180}(W^{l}) \odot \sigma'(z^{l-1}) \]
where the superscript denotes the layer number; δ denotes the gradient value; * denotes the convolution operation; W denotes the convolution kernel; rot180 denotes rotating the matrix by 180 degrees, i.e. flipping it vertically and then horizontally; ⊙ denotes element-wise multiplication; and σ' denotes the derivative of the activation function.
S8: The five loss values L1 to L5 obtained in step S5 are summed, and the sum of the loss values L0negative to L5negative obtained in step S6 is subtracted to obtain the total loss value L0. The specific formula is:
\[ L_0 = \sum_{i=1}^{5} \lambda_i L_i - \sum_{j=0}^{5} \lambda_{j,negative} L_{j,negative} \]
where λi, i ∈ [1, 5], are the positive-sample weight values and Li, i ∈ [1, 5], the corresponding positive-sample error values, here taking λi = 1; and λj,negative, j ∈ [0, 5], are the negative-sample weight values and Lj,negative, j ∈ [0, 5], the corresponding negative-sample error values, here taking λj,negative = 1.
S9: The total loss value L0 obtained in step S8 is taken as the loss, and backward propagation is performed on the main network N0 to compute gradients and update the main network parameters.
S10: The operations of S2 to S9 are repeated until the main network and the augmentation networks converge.
S11: the master network model is taken as output.
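To tie the steps together, the following self-contained PyTorch sketch runs one training iteration of S2-S9. The tiny CNN, the placeholder augmentations and the random tensors are illustrative stand-ins, not the networks or data of the patent:

```python
import torch
import torch.nn as nn

def backbone():
    """A tiny feature-extraction CNN standing in for the real networks."""
    return nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())

M = 5
N0 = backbone()                                    # main network N0
aug_nets = [backbone() for _ in range(M)]          # N1-N5, parameters not shared
opt_main = torch.optim.SGD(N0.parameters(), lr=1e-3)
opt_augs = [torch.optim.SGD(n.parameters(), lr=1e-3) for n in aug_nets]
augment = [lambda x: x.flip(-1)] * M               # placeholder augmentations

def dist(a, b):                                    # Euclidean distance per sample
    return ((a - b) ** 2).sum(dim=1).sqrt()

img = torch.rand(8, 3, 128, 64)                    # a batch from D0 (dummy data)
neg_img = torch.rand(8, 3, 128, 64)                # randomly chosen negatives

F0, Fneg = N0(img), N0(neg_img)                    # S2 and S4
feats = [aug_nets[i](augment[i](img)) for i in range(M)]              # S3
pos = [dist(F0, f).mean() for f in feats]                             # S5: L1..L5
neg = [dist(Fneg, F0).mean()] + \
      [dist(Fneg, f).mean() for f in feats]                           # S6: L0neg..L5neg

# S7: each augmentation network is driven by Li - Linegative.
for i in range(M):
    grads = torch.autograd.grad(pos[i] - neg[i + 1],
                                list(aug_nets[i].parameters()),
                                retain_graph=True)
    for p, g in zip(aug_nets[i].parameters(), grads):
        p.grad = g

# S8-S9: the main network is driven by L0 = sum(Li) - sum(Linegative).
grads = torch.autograd.grad(sum(pos) - sum(neg), list(N0.parameters()))
for p, g in zip(N0.parameters(), grads):
    p.grad = g

for opt in opt_augs + [opt_main]:                  # apply all updates (S7, S9)
    opt.step()
```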
The above examples are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to them; any change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention is an equivalent replacement and falls within the protection scope of the present invention.

Claims (10)

1. An unsupervised pedestrian re-identification method based on an augmented network is characterized by comprising the following steps:
s1: performing an augmentation operation on an unlabeled original pedestrian image dataset D0, wherein the augmentation operation comprises one or more of image scaling, random cropping, random erasing, noise addition and Gaussian blur, to obtain M new augmented datasets D1-DM, where M is a positive integer;
s2: the original image data in the original pedestrian image data set D0 is introduced into a convolutional neural network as a main network N0 to carry out forward propagation extraction to obtain a feature F0;
s3: respectively inputting the corresponding augmented image data of the M augmented datasets D1-DM into M convolutional neural networks with unshared parameters, serving as augmentation networks N1-NM, for forward propagation to extract features F1-FM;
s4: randomly selecting an image Inegative from the original pedestrian image dataset D0 as a negative sample, and feeding it into the main network N0 for forward propagation to extract a feature Fnegative;
s5: calculating the Euclidean distance between the output feature F0 and each of the output features F1-FM to obtain M loss values L1-LM;
s6: calculating the Euclidean distance between the output feature Fnegative and each of the output features F0-FM to obtain M+1 loss values L0negative-LMnegative;
s7: taking, for each augmentation network, the result of subtracting the corresponding loss value among L1negative-LMnegative obtained in S6 from the corresponding loss value among L1-LM obtained in S5 as the loss, and performing backward propagation on the augmentation networks N1-NM to compute gradients and update their parameters;
s8: summing the M loss values L1-LM obtained in S5 and subtracting the sum of the loss values L0negative-LMnegative obtained in S6 to obtain a total loss value L0;
s9: taking the total loss value L0 obtained in S8 as the loss and performing backward propagation on the main network N0 to compute gradients and update the main network parameters;
s10: repeating the operations of S2-S9 until the main network and the augmentation networks converge;
s11: the master network model is taken as output.
2. The unsupervised pedestrian re-identification method based on the augmentation network according to claim 1, wherein in step S1, when the augmentation operation includes image scaling, the image is scaled using bilinear interpolation, so as to simulate the images of various resolutions that may occur in natural datasets. The specific calculation is:
\[ f(x,y) \approx \frac{f(Q_{11})(x_2-x)(y_2-y) + f(Q_{21})(x-x_1)(y_2-y) + f(Q_{12})(x_2-x)(y-y_1) + f(Q_{22})(x-x_1)(y-y_1)}{(x_2-x_1)(y_2-y_1)} \]
where Q11=(x1,y1), Q12=(x1,y2), Q21=(x2,y1) and Q22=(x2,y2) are the four pixel points closest to the point (x,y).
3. The unsupervised pedestrian re-identification method based on the augmentation network according to claim 1, wherein in step S1, when the augmentation operation includes random cropping, the augmentation is performed using a random cropping method, so as to simulate the various partial pedestrian images that may occur in natural datasets. The specific method is as follows:
firstly, a pixel point is randomly selected in the image; then, taking this pixel as the upper left corner, a rectangle of random length and width is formed, and the pixels within the whole rectangle are output as the cropping result.
4. The unsupervised pedestrian re-identification method based on the augmentation network according to claim 1, wherein in step S1, when the augmentation operation includes random erasing, the augmentation is performed using a random erasing method, so as to simulate the various missing or incomplete pedestrian images that may occur in natural datasets. The specific method is as follows:
a pixel point is randomly selected in the image; then, taking this pixel as the upper left corner, a rectangle of random length and width is formed, all pixels within the whole rectangle are set to black, i.e. pixel value (0, 0, 0), and the whole image after this operation is output as the random erasing result.
5. The unsupervised pedestrian re-identification method based on the augmentation network according to claim 1, wherein in step S1, when the augmentation operation includes noise addition, the augmentation is performed using a noise-adding method, so as to simulate the image noise that may occur in natural datasets. The specific operation is as follows:
each pixel becomes, with a certain probability, a white point, i.e. pixel value (255, 255, 255), or a black point, i.e. pixel value (0, 0, 0), and the whole image after this operation is output as the noise-adding result.
6. The unsupervised pedestrian re-identification method based on the augmentation network according to claim 1, wherein in step S1, when the augmentation operation includes Gaussian blur, the augmentation is performed using a Gaussian blur method, so as to simulate the image blur that may occur in natural datasets, according to the following formula:
\[ G(x,y) = \frac{1}{2\pi\sigma^2} e^{-\frac{x^2+y^2}{2\sigma^2}} \]
once the value of sigma is set, a weight matrix can be computed; performing this matrix operation centered on each pixel of the image achieves the blurring of the image.
7. The unsupervised pedestrian re-identification method based on the augmentation network according to claim 1, wherein in step S2 and step S3, the respective pedestrian image data are fed into the corresponding convolutional neural networks, and feature extraction is performed by forward propagation. The specific forward propagation formula is:
\[ a^{l} = \sigma(z^{l}) = \sigma(a^{l-1} * W^{l} + b^{l}) \]
where a denotes the intermediate layer output; σ denotes the activation function; z denotes the input of the activation layer; the superscript denotes the layer number; * denotes the convolution operation; W denotes the convolution kernel; and b denotes the bias.
8. The method for unsupervised pedestrian re-identification based on the augmented network according to claim 1, wherein step S5 specifically comprises:
the Euclidean distance is calculated by the characteristic F0 extracted by the main network N0 and the characteristics F1-FM extracted by the augmentation networks N1-NM respectively, and the specific formula is as follows:
\[ d(x,y) = \sqrt{\sum_{i=1}^{n}(x_i - y_i)^2} \]
where x takes the feature F0 extracted by the main network; y takes, in turn, the features F1-FM extracted by the M augmentation networks; and xi and yi are the values of the corresponding features in each dimension;
the step S6 specifically comprises the following steps:
an image is randomly selected from the dataset as the negative sample, its feature Fnegative is extracted by the main network, and the Euclidean distance is calculated between Fnegative and each of the output features F0-FM.
9. The method for unsupervised pedestrian re-identification based on the augmented network according to claim 1, wherein step S7 specifically comprises:
the calculated error value is propagated back to the corresponding convolutional neural network, and the parameter values of the convolutional neural network are iteratively updated using the backward propagation algorithm. The specific formula is:
\[ \delta^{l-1} = \delta^{l} * \operatorname{rot180}(W^{l}) \odot \sigma'(z^{l-1}) \]
where the superscript denotes the layer number; δ denotes the gradient value; * denotes the convolution operation; W denotes the convolution kernel; rot180 denotes rotating the matrix by 180 degrees, i.e. flipping it vertically and then horizontally; ⊙ denotes element-wise multiplication; and σ' denotes the derivative of the activation function.
10. The method for unsupervised pedestrian re-identification based on the augmented network according to claim 1, wherein step S8 specifically comprises:
the loss values L1-LM, obtained from the Euclidean distances between the features extracted by the augmentation networks and the feature extracted by the main network, are summed, and the sum of L0negative-LMnegative is subtracted from the result to obtain the total error value L0. The specific formula is:
\[ L_0 = \sum_{i=1}^{M} \lambda_i L_i - \sum_{j=0}^{M} \lambda_{j,negative} L_{j,negative} \]
where λi, i ∈ [1, M], are the positive-sample weight values and Li, i ∈ [1, M], the corresponding positive-sample error values, here taking λi = 1; and λj,negative, j ∈ [0, M], are the negative-sample weight values and Lj,negative, j ∈ [0, M], the corresponding negative-sample error values, here taking λj,negative = 1.
CN201911310016.1A 2019-12-18 2019-12-18 Unsupervised pedestrian re-identification method based on augmented network Active CN111062329B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911310016.1A CN111062329B (en) 2019-12-18 2019-12-18 Unsupervised pedestrian re-identification method based on augmented network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911310016.1A CN111062329B (en) 2019-12-18 2019-12-18 Unsupervised pedestrian re-identification method based on augmented network

Publications (2)

Publication Number Publication Date
CN111062329A CN111062329A (en) 2020-04-24
CN111062329B (en) 2023-05-30

Family

ID=70302269

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911310016.1A Active CN111062329B (en) 2019-12-18 2019-12-18 Unsupervised pedestrian re-identification method based on augmented network

Country Status (1)

Country Link
CN (1) CN111062329B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111985645A (en) * 2020-08-28 2020-11-24 北京市商汤科技开发有限公司 Neural network training method and device, electronic equipment and storage medium
CN112043260B (en) * 2020-09-16 2022-11-15 杭州师范大学 Electrocardiogram classification method based on local mode transformation
CN112200187A (en) * 2020-10-16 2021-01-08 广州云从凯风科技有限公司 Target detection method, device, machine readable medium and equipment
CN112580720B (en) * 2020-12-18 2024-07-09 华为技术有限公司 Model training method and device
CN113033410B (en) * 2021-03-26 2023-06-06 中山大学 Domain generalization pedestrian re-recognition method, system and medium based on automatic data enhancement

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109583379A (en) * 2018-11-30 2019-04-05 常州大学 A kind of pedestrian's recognition methods again being aligned network based on selective erasing pedestrian

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109426858B (en) * 2017-08-29 2021-04-06 京东方科技集团股份有限公司 Neural network, training method, image processing method, and image processing apparatus

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109583379A (en) * 2018-11-30 2019-04-05 常州大学 A kind of pedestrian's recognition methods again being aligned network based on selective erasing pedestrian

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Fast Open-World Person Re-Identification; Wei-Shi Zheng et al.; IEEE Transactions on Image Processing; 2017-08-16; Vol. 27, No. 5; pp. 1-2 *

Also Published As

Publication number Publication date
CN111062329A (en) 2020-04-24

Similar Documents

Publication Publication Date Title
CN111062329B (en) Unsupervised pedestrian re-identification method based on augmented network
CN112287940B (en) Semantic segmentation method of attention mechanism based on deep learning
Li et al. Semantic relationships guided representation learning for facial action unit recognition
CN109886121B (en) Human face key point positioning method for shielding robustness
CN110210551B (en) Visual target tracking method based on adaptive subject sensitivity
CN108256562B (en) Salient target detection method and system based on weak supervision time-space cascade neural network
CN107945118B (en) Face image restoration method based on generating type confrontation network
WO2019136591A1 (en) Salient object detection method and system for weak supervision-based spatio-temporal cascade neural network
CN111861886B (en) Image super-resolution reconstruction method based on multi-scale feedback network
CN113052775B (en) Image shadow removing method and device
CN116682120A (en) Multilingual mosaic image text recognition method based on deep learning
CN114898284B (en) Crowd counting method based on feature pyramid local difference attention mechanism
CN116310693A (en) Camouflage target detection method based on edge feature fusion and high-order space interaction
CN110751271B (en) Image traceability feature characterization method based on deep neural network
CN114821050B (en) Method for dividing reference image based on transformer
CN113378812A (en) Digital dial plate identification method based on Mask R-CNN and CRNN
Wang et al. Msfnet: multistage fusion network for infrared and visible image fusion
Pang et al. PTRSegNet: A Patch-to-Region Bottom-Up Pyramid Framework for the Semantic Segmentation of Large-Format Remote Sensing Images
Li et al. Semantic prior-driven fused contextual transformation network for image inpainting
AU2021104479A4 (en) Text recognition method and system based on decoupled attention mechanism
CN114708591A (en) Document image Chinese character detection method based on single character connection
Sun et al. Knock knock, who’s there: Facial recognition using CNN-based classifiers
CN116188774B (en) Hyperspectral image instance segmentation method and building instance segmentation method
Ding et al. Pointnet: Learning Point Representation for High-Resolution Remote Sensing Imagery Land-Cover Classification
Zhang Image Style Transfer based on DeepLabV3+ Semantic Segmentation

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant