CN110175506B - Pedestrian re-identification method and device based on parallel dimensionality reduction convolutional neural network


Info

Publication number
CN110175506B
Authority
CN
China
Prior art keywords
layer
convolution
pedestrian
module
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910277665.XA
Other languages
Chinese (zh)
Other versions
CN110175506A (en)
Inventor
熊贇 (Xiong Yun)
朱旭东 (Zhu Xudong)
段宇 (Duan Yu)
朱扬勇 (Zhu Yangyong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University filed Critical Fudan University
Priority to CN201910277665.XA priority Critical patent/CN110175506B/en
Publication of CN110175506A publication Critical patent/CN110175506A/en
Application granted granted Critical
Publication of CN110175506B publication Critical patent/CN110175506B/en
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 - Static body considered as a whole, e.g. static pedestrian or occupant recognition

Abstract

The invention belongs to the technical field of machine learning, and particularly relates to a pedestrian re-identification method and device based on a parallel dimension-reduction convolutional neural network. The method comprises the following steps: constructing and training a convolutional neural network based on parallel dimension-reduction convolution kernels as the feature extraction model; preprocessing the target image to be retrieved and the candidate target images to obtain the preprocessed target image and the corresponding images to be judged; inputting the target image and the images to be judged into the feature extraction model in turn to obtain the feature vectors of the pedestrians to be judged and the target feature vectors; and searching among the candidate images, according to the feature vectors, for the pedestrian image consistent with the target image. The method adopts parallel convolution kernels to reduce the convolution parameters, and replaces higher-dimensional convolution kernels with several low-dimensional symmetric and asymmetric convolution kernels, reducing the amount of computation while achieving higher pedestrian re-identification precision than existing pedestrian re-identification methods.

Description

Pedestrian re-identification method and device based on parallel dimensionality reduction convolutional neural network
Technical Field
The invention belongs to the technical field of machine learning, and particularly relates to a pedestrian image re-identification method and device.
Background
Pedestrian re-identification identifies a specified target in images or video captured by cameras across different regions by analyzing image features with computer vision techniques. It is of great significance to research in the security field and has strong potential in everyday application scenarios.
The technology of pedestrian re-identification originated in multi-target tracking and has gradually developed into an independent research field. Early pedestrian re-identification adopted machine learning methods on manually selected features such as color, texture, edges and shape. Such hand-crafted features and methods struggle to analyze the target comprehensively, and recognition accuracy is unsatisfactory under varying environmental factors such as lighting and weather.
With the advent and popularity of deep learning methods and convolutional neural networks in recent years, many corresponding methods have been applied to pedestrian re-identification research and have made progress in recognition accuracy. The typical pipeline trains a convolutional neural network on a training set, uses the trained network to extract features from the target image and the images to be judged, obtains their feature expression vectors, and finds the target with the highest similarity by comparing those vectors.
Pedestrian images contain many features that are difficult to describe with manual feature extraction, whereas a convolutional neural network can extract deep features effectively; researchers have achieved high recognition rates on the Market1501 data set with convolutional neural networks.
However, common convolutional neural network models face growing parameter counts and computation as the number of layers and the model depth increase, and in deep training of a multilayer network the large amount of pedestrian re-identification training data still leads, to some extent, to excessive parameters, excessive computation and low efficiency. Storage and computation costs are therefore very high, and in large-scale pedestrian re-identification (for example, when the number of images to be judged is large) the problems of excessive parameters, oversized models, massive computation and low efficiency arise even more easily. This not only raises the hardware requirements for completing pedestrian re-identification but also increases the time needed for model training, to the point where the training process may hardly be completed at all.
Disclosure of Invention
The invention aims to provide a pedestrian image re-identification method and device that can complete the pedestrian re-identification task on large-scale data sets with a small amount of computation.
The invention provides a pedestrian image re-identification method based on parallel dimension-reduction convolutional neural network technology, which finds a given target in images or video sequences acquired by a number of different cameras, and which replaces the original symmetric convolution kernels by adding dimension-reduction convolutions and asymmetric convolution kernels. The specific steps are as follows:
Step S1: construct a convolutional neural network based on parallel dimension-reduction convolution kernels, train the convolutional neural network model with a number of pedestrian images selected from a standard data set as the training set, and take the trained model as the feature extraction model.
Step S2: preprocess the target image to be retrieved to obtain the corresponding preprocessed target image, and preprocess the candidate target images to obtain the corresponding images to be judged.
Step S3: input the preprocessed target image and the images to be judged into the feature extraction model in turn to obtain the feature vectors of the pedestrians to be judged corresponding to the preprocessed images to be judged and the target feature vectors corresponding to the preprocessed target image.
Step S4: search among the images to be judged, according to the target feature vectors and the pedestrian feature vectors to be judged, for the pedestrian image consistent with the target image.
Step S1 comprises the following substeps:
S1-1, preprocess the existing target pedestrian images used as the training set to obtain images of uniform size;
S1-2, construct a convolutional neural network model based on parallel dimension-reduction convolution kernels. The model comprises an input module, dimension-reduction convolution modules, reduction modules, a pooling layer and a fully connected layer. The input module receives the image data to be examined and extracts the relevant features; the dimension-reduction convolution modules perform convolution with a multi-channel parallel structure, reducing the image feature parameters; the reduction modules reduce the dimensionality of the extracted image features. The parameters in each layer's parameter matrix are set randomly; the structure is shown in Fig. 2. The dimension-reduction convolution modules are of 3 types, denoted modules A, B and C, and the reduction modules are of 2 types, denoted reduction modules A and B, where:
The input module is followed by 4 sequentially connected dimension-reduction convolution modules A; each module A is divided into 4 groups of convolutions;
module A is followed by reduction module A, which pools the data down to a reduced scale; reduction module A consists of 3 groups of convolutions;
reduction module A is followed by 7 sequentially connected dimension-reduction convolution modules B; each module B is divided into 4 convolution channels;
module B is connected to reduction module B, which likewise has a pooling effect; reduction module B consists of 3 groups of convolutions;
reduction module B is connected to 3 sequentially connected dimension-reduction convolution modules C; each module C consists of 4 groups of convolutions;
module C is followed by an average pooling layer and then a random discard (dropout) layer; each layer contains weight values (parameters) used to compute the data passed to the next layer. The model is then trained as follows:
s1-3, inputting the preprocessed pedestrian image serving as a training set into a convolutional neural network model;
s1-4, forward transmitting a calculation error to the convolutional neural network model;
s1-5, transmitting error updating parameters by adopting a back propagation method;
and S1-6, repeating the steps S1-3 to S1-5 until the training requirement condition is met, and obtaining a trained convolutional neural network as a feature extraction model.
A technical feature of this convolutional neural network is that parallel convolution kernels are added within the same convolutional layer, so that a single-layer convolution can extract parameter attributes of different sparsity, increasing the network width and at the same time the adaptability of the network.
In the invention, the convolutional neural network has the further technical feature that the commonly used m×m convolution kernel is replaced by two n×n convolution kernels (n < m), which reduces the number of convolutional-layer parameters while obtaining the same field of view, and increases the depth of the network.
In the invention, the convolutional neural network also has the technical feature that large n×n symmetric convolution kernels of size greater than 5×5 are replaced by a combination of 1×n and n×1 asymmetric convolution kernels, further reducing the number of parameters and the amount of computation while the number of extracted features is unchanged, which yields a better training effect.
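To make the two substitutions concrete, the sketch below is a minimal illustration in PyTorch (a framework choice the patent does not prescribe; channel counts and input size are arbitrary): two stacked 3×3 convolutions replace a 5×5 kernel, and a 7×7 receptive field is factored into 1×7 and 7×1 convolutions.

```python
import torch
import torch.nn as nn

# One 5x5 kernel: 5*5 = 25 weights per input/output channel pair.
conv5x5 = nn.Conv2d(64, 64, kernel_size=5, padding=2)

# Two stacked 3x3 kernels cover the same 5x5 receptive field
# with 2 * 3*3 = 18 weights per channel pair instead of 25.
two_3x3 = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
)

# A 7x7 receptive field factored into 1x7 then 7x1:
# 2 * 7 = 14 weights per channel pair instead of 49.
asym_7x7 = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=(1, 7), padding=(0, 3)),
    nn.Conv2d(64, 64, kernel_size=(7, 1), padding=(3, 0)),
)

x = torch.randn(1, 64, 35, 35)
# All three keep the 35x35 spatial size, so they are drop-in replacements.
print(conv5x5(x).shape, two_3x3(x).shape, asym_7x7(x).shape)
```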
The convolutional neural network also has the technical feature that the dimension-reduction convolution modules are of 3 types, each with several groups of convolutions, and the reduction modules are of 2 types, each likewise with several groups of convolutions.
In the invention, the output of each convolutional layer in step S1 is batch-normalized, i.e. normalized to an N(0, 1) normal distribution, which prevents the gradient from vanishing during back propagation.
In the invention, additional 1×1 convolutional layers are added before the 3×3 and 5×5 convolutional layers in step S1, limiting the number of input channels and reducing the amount of computation.
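A minimal sketch of these two conventions together, assuming PyTorch and illustrative channel counts: every convolution is followed by batch normalization, and a 1×1 convolution narrows the channels before the more expensive 3×3 kernel.

```python
import torch.nn as nn

def conv_bn(in_ch, out_ch, **kwargs):
    # Convolution followed by batch normalization, which drives the layer
    # output toward an N(0, 1) distribution before the nonlinearity.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, bias=False, **kwargs),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

# 1x1 bottleneck: reduce 384 input channels to 64 before the 3x3 kernel,
# so the 3x3 convolution operates on 64 channels instead of 384.
bottleneck = nn.Sequential(
    conv_bn(384, 64, kernel_size=1),
    conv_bn(64, 96, kernel_size=3, padding=1),
)
```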
In the invention, dedicated reduction modules are introduced in step S1 to change the width and height of the grid, reducing the output dimensionality and the number of parameters involved in training.
In the invention, the training completion condition in step S1-6 is: the predetermined number of cycles is completed, the parameters have converged, or the training error is essentially eliminated.
The invention further provides a pedestrian re-identification device based on the above method, comprising: a convolutional neural network model construction and training module, a preprocessing module for the images to be judged and the target image, a feature extraction module, and a consistency judgment module. The four modules perform the functions of steps S1, S2, S3 and S4 of the pedestrian re-identification method in sequence.
By adopting parallel convolution kernels to reduce the convolution parameters, and replacing higher-dimensional convolution kernels with several low-dimensional symmetric and asymmetric convolution kernels, the method reduces both the number of parameters and the amount of computation. Model computation can therefore be completed quickly, speeding up feature-vector extraction and the training of the pedestrian re-identification model, while the re-identification precision obtained on the data set is far higher than that of existing pedestrian re-identification methods.
Drawings
Fig. 1 is a flowchart of the pedestrian re-identification method based on a parallel dimension-reduction convolution-kernel convolutional neural network according to an embodiment of the invention.
Fig. 2 is a schematic diagram of the convolutional neural network structure of the embodiment.
Fig. 3 is a diagram of dimension-reduction convolution module A of the embodiment.
Fig. 4 is a diagram of dimension-reduction convolution module B of the embodiment.
Fig. 5 is a diagram of dimension-reduction convolution module C of the embodiment.
Fig. 6 is a diagram of reduction module A of the embodiment.
Fig. 7 is a diagram of reduction module B of the embodiment.
Detailed Description
The following description of the embodiments of the present invention will be made with reference to the accompanying drawings.
The model construction and related operations in this embodiment are all implemented on a Linux platform equipped with at least one graphics processing unit (GPU).
Fig. 1 shows the flow of the pedestrian re-identification method based on the parallel dimension-reduction convolution-kernel neural network according to this embodiment, which comprises the following steps:
S1: construct and train the convolutional neural network model. A convolutional neural network model based on parallel dimension-reduction convolution kernels is constructed and then trained with a number of existing pedestrian images; the trained model serves as the feature extraction model. Model construction and training comprise the following steps:
and S1-1, preprocessing a plurality of existing pedestrian images used as training sets to obtain preprocessed training images which are uniform in size and respectively correspond to the existing pedestrian images.
In this embodiment, the image as the training set is from a data set Market1501, the data set includes 1501 pedestrian IDs including 750 training IDs and 751 test IDs to be retrieved, the training data set includes 12936 pictures, which are captured by cameras in 6 different regions, and the input image format is 128 × 64, respectively. The pretreatment comprises the following steps:
and S1-1-1, carrying out face detection on the image to be processed, and finding out the face position in the image. In this embodiment, the pedestrian in the image is detected by using the Faster-RCNN in the related art.
S1-1-2: detect a number of key position points on the pedestrian found in step S1-1-1, covering at least the head, the trunk and the limbs.
S1-1-3: align the image to be processed according to the key position points and unify its size. In this embodiment, alignment is performed on the key points of the head, trunk and limbs; after alignment, the images are uniformly resized to 299×299 pixels by standard image resizing. The number of channels of each image is left unchanged.
S1-1-4: crop the size-unified images to obtain the corresponding preprocessed images.
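A sketch of the resizing and cropping steps S1-1-3/S1-1-4 with torchvision; detection and keypoint alignment (S1-1-1/S1-1-2) are delegated to an external detector and not reproduced here. The intermediate 320×320 size and the random-crop policy for training are assumptions, since the patent specifies only the 299×299 target size and, for the test set, a center crop.

```python
from torchvision import transforms

# Training images (S1-1-3/S1-1-4): unify size, then crop.
train_preprocess = transforms.Compose([
    transforms.Resize((320, 320)),   # assumed intermediate size
    transforms.RandomCrop(299),      # crop policy assumed for training
    transforms.ToTensor(),           # channel count is left unchanged
])

# Test images (step S2-4) use a center crop instead.
test_preprocess = transforms.Compose([
    transforms.Resize((320, 320)),
    transforms.CenterCrop(299),
    transforms.ToTensor(),
])
```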
S1-2: construct the model. The model adopted in this embodiment is a convolutional neural network based on parallel dimension-reduction convolution kernels, comprising an input module, dimension-reduction convolution modules, reduction modules and a fully connected module; its structure is shown in Fig. 2. The dimension-reduction convolution modules are of 3 types, denoted modules A, B and C; the reduction modules are of 2 types, denoted reduction modules A and B.
The network first feeds the data into the input module, which is followed by 4 sequentially connected dimension-reduction convolution modules A. Module A is divided into 4 groups of convolutions: LA1, LA2, LA3 and LA4. In LA1, the first layer is the average pooling layer P1 and the second layer is the convolutional layer LA1C1. LA2 contains only the convolutional layer LA2C1. In LA3, the first layer is the convolutional layer LA3C1 and the second layer is LA3C2. In LA4, the first layer is LA4C1, the second layer LA4C2 and the third layer LA4C3.
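Module A can be sketched as four parallel branches whose outputs are concatenated channel-wise, using the layer parameters of Table 1 below. The 384 input channels and the stride-1 padded pooling (so that every branch keeps the 35×35 spatial size needed for concatenation) are assumptions not stated in the patent's table.

```python
import torch
import torch.nn as nn

def conv_bn(in_ch, out_ch, **kw):
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, bias=False, **kw),
                         nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

class ModuleA(nn.Module):
    """Dimension-reduction convolution module A: 4 parallel branches."""
    def __init__(self, in_ch=384):
        super().__init__()
        self.la1 = nn.Sequential(nn.AvgPool2d(3, stride=1, padding=1),       # P1
                                 conv_bn(in_ch, 96, kernel_size=1))          # LA1C1
        self.la2 = conv_bn(in_ch, 96, kernel_size=1)                         # LA2C1
        self.la3 = nn.Sequential(conv_bn(in_ch, 64, kernel_size=1),          # LA3C1
                                 conv_bn(64, 96, kernel_size=3, padding=1))  # LA3C2
        self.la4 = nn.Sequential(conv_bn(in_ch, 64, kernel_size=1),          # LA4C1
                                 conv_bn(64, 96, kernel_size=3, padding=1),  # LA4C2
                                 conv_bn(96, 96, kernel_size=3, padding=1))  # LA4C3
    def forward(self, x):
        # Concatenate the four branches: 96 * 4 = 384 output channels.
        return torch.cat([self.la1(x), self.la2(x), self.la3(x), self.la4(x)], dim=1)

print(ModuleA()(torch.randn(1, 384, 35, 35)).shape)  # [1, 384, 35, 35]
```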
Module A is followed by reduction module A, which pools the input 35×35 feature maps down to 17×17. Reduction module A consists of 3 groups of convolutions: RA1, RA2 and RA3. RA1 is the max pooling layer P2; RA2 is the convolutional layer RA2C1; in RA3, the first layer is the convolutional layer RA3C1, the second RA3C2 and the third RA3C3.
Reduction module A is followed by 7 sequentially connected dimension-reduction convolution modules B, each divided into 4 convolution channels: LB1, LB2, LB3 and LB4. In LB1, the first layer is the average pooling layer P3 and the second layer the convolutional layer LB1C1. LB2 contains only the convolutional layer LB2C1. In LB3, the first layer is the convolutional layer LB3C1, the second LB3C2 and the third LB3C3. In LB4, the first layer is LB4C1, the second LB4C2, the third LB4C3, the fourth LB4C4 and the fifth LB4C5.
Module B is connected to reduction module B, which likewise has a pooling effect, reducing the 17×17 feature maps input to it to 8×8. Reduction module B consists of 3 groups of convolutions: RB1, RB2 and RB3. RB1 is the max pooling layer P4; in RB2, the first layer is the convolutional layer RB2C1 and the second RB2C2; in RB3, the first layer is RB3C1, the second RB3C2, the third RB3C3 and the fourth RB3C4.
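Reduction module B can be sketched the same way: three parallel branches that each end with stride 2, halving the 17×17 maps to 8×8 before concatenation. Parameters follow Table 1; the 1024-channel input is an assumption consistent with module B's four branch outputs (128 + 384 + 256 + 256 channels).

```python
import torch
import torch.nn as nn

def conv_bn(in_ch, out_ch, **kw):
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, bias=False, **kw),
                         nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

class ReductionB(nn.Module):
    """Reduction module B: pooling effect via stride-2 branches."""
    def __init__(self, in_ch=1024):
        super().__init__()
        self.rb1 = nn.MaxPool2d(3, stride=2)                                 # P4
        self.rb2 = nn.Sequential(conv_bn(in_ch, 192, kernel_size=1),         # RB2C1
                                 conv_bn(192, 192, kernel_size=3, stride=2)) # RB2C2
        self.rb3 = nn.Sequential(
            conv_bn(in_ch, 256, kernel_size=1),                              # RB3C1
            conv_bn(256, 256, kernel_size=(1, 7), padding=(0, 3)),           # RB3C2
            conv_bn(256, 320, kernel_size=(7, 1), padding=(3, 0)),           # RB3C3
            conv_bn(320, 320, kernel_size=3, stride=2))                      # RB3C4
    def forward(self, x):
        return torch.cat([self.rb1(x), self.rb2(x), self.rb3(x)], dim=1)

print(ReductionB()(torch.randn(1, 1024, 17, 17)).shape)  # [1, 1536, 8, 8]
```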
Reduction module B is followed by 3 sequentially connected dimension-reduction convolution modules C, each composed of 4 groups of convolutions: LC1, LC2, LC3 and LC4. In LC1, the first layer is the average pooling layer P5 and the second the convolutional layer LC1C1. LC2 contains only the convolutional layer LC2C1. In LC3, the first layer is the convolutional layer LC3C1 and the second layer consists of the 2 parallel convolutions LC3C21 and LC3C22. In LC4, the first layer is LC4C1, the second LC4C2, the third LC4C3, and the fourth consists of the 2 parallel convolutions LC4C41 and LC4C42.
Module C is followed by an average pooling layer and then a random discard (dropout) layer; each layer contains weight values (parameters) used to compute the data passed to the next layer.
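The head of the network can be sketched as follows, taking the 1024-dimensional P6 output and the FC6 size from Table 1 at face value; the pooled 1024-dimensional vector is what later serves as the pedestrian feature vector.

```python
import torch.nn as nn

head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),   # P6: average pooling over the spatial grid
    nn.Flatten(),              # -> 1024-dimensional feature vector
    nn.Dropout(p=0.5),         # random discard layer, ratio 0.5
    nn.Linear(1024, 10575),    # FC6: classification output
)
```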
The parameters of each layer of the convolutional neural network model in this embodiment are shown in table 1 below.
TABLE 1
Layer name Parameters
Input layer 299×299×3
P1 2×2 average pooling
LA1C1 Convolution kernel 1×1, 96 channels, stride 1
LA2C1 Convolution kernel 1×1, 96 channels, stride 1
LA3C1 Convolution kernel 1×1, 64 channels, stride 1
LA3C2 Convolution kernel 3×3, 96 channels, stride 1
LA4C1 Convolution kernel 1×1, 64 channels, stride 1
LA4C2 Convolution kernel 3×3, 96 channels, stride 1
LA4C3 Convolution kernel 3×3, 96 channels, stride 1
P2 3×3 max pooling, stride 2
RA2C1 Convolution kernel 3×3, 384 channels, stride 2
RA3C1 Convolution kernel 1×1, 192 channels, stride 1
RA3C2 Convolution kernel 3×3, 224 channels, stride 1
RA3C3 Convolution kernel 3×3, 256 channels, stride 2
P3 2×2 average pooling
LB1C1 Convolution kernel 1×1, 128 channels, stride 1
LB2C1 Convolution kernel 1×1, 384 channels, stride 1
LB3C1 Convolution kernel 1×1, 192 channels, stride 1
LB3C2 Convolution kernel 1×7, 224 channels, stride 1
LB3C3 Convolution kernel 7×1, 256 channels, stride 1
LB4C1 Convolution kernel 1×1, 192 channels, stride 1
LB4C2 Convolution kernel 1×7, 192 channels, stride 1
LB4C3 Convolution kernel 7×1, 224 channels, stride 1
LB4C4 Convolution kernel 1×7, 224 channels, stride 1
LB4C5 Convolution kernel 7×1, 256 channels, stride 1
P4 3×3 max pooling, stride 2
RB2C1 Convolution kernel 1×1, 192 channels, stride 1
RB2C2 Convolution kernel 3×3, 192 channels, stride 2
RB3C1 Convolution kernel 1×1, 256 channels, stride 1
RB3C2 Convolution kernel 1×7, 256 channels, stride 1
RB3C3 Convolution kernel 7×1, 320 channels, stride 1
RB3C4 Convolution kernel 3×3, 320 channels, stride 2
P5 2×2 average pooling
LC1C1 Convolution kernel 1×1, 256 channels, stride 1
LC2C1 Convolution kernel 1×1, 256 channels, stride 1
LC3C1 Convolution kernel 1×1, 384 channels, stride 1
LC3C21 Convolution kernel 1×3, 256 channels, stride 1
LC3C22 Convolution kernel 3×1, 256 channels, stride 1
LC4C1 Convolution kernel 1×1, 384 channels, stride 1
LC4C2 Convolution kernel 1×3, 448 channels, stride 1
LC4C3 Convolution kernel 3×1, 512 channels, stride 1
LC4C41 Convolution kernel 1×1, 256 channels, stride 1
LC4C42 Convolution kernel 1×3, 256 channels, stride 1
P6 Output 1024 dimensions
Dropout Random discard ratio 0.5
FC6 Output 10575 dimensions
As can be seen from table 1, after the model is constructed, the training set can be used to train the model.
S1-3: input the preprocessed training images of the training set into the convolutional neural network model.
S1-4: perform a forward pass through the model and compute the training error.
S1-5: update the parameters with the back propagation algorithm, i.e. propagate the training error backwards through the model and adjust the parameters of each layer step by step so that the training error gradually decreases.
S1-6: repeat steps S1-3 to S1-5 until the training completion condition is reached (the preset number of cycles is completed, the parameters have converged, or the training error is essentially eliminated), yielding the trained convolutional neural network model as the feature extraction model.
To simplify image input and accelerate training, this embodiment processes the data in batches: the training-set images are divided into 203 batches of 64 images each, and steps S1-4 to S1-5 are performed for each batch in turn. When all batches have been input and processed, one cycle is complete and the batch processing of the next cycle begins.
In this embodiment, the total number of cycles is 200. The initial learning rate is set to 0.003 and is reduced by a factor of 10 at cycles 40, 80 and 110. The model is supervised by a Softmax loss function, and the parameters are updated by back propagation as set in step S1-5.
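A sketch of this schedule, assuming PyTorch; the optimizer type and the `model`/`train_dataset` arguments are assumptions, as the patent specifies only the batch size, cycle count, learning-rate schedule, Softmax loss and back propagation.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

def train(model, train_dataset):
    loader = DataLoader(train_dataset, batch_size=64, shuffle=True)  # 203 batches
    criterion = nn.CrossEntropyLoss()                     # Softmax loss
    optimizer = torch.optim.SGD(model.parameters(), lr=0.003)  # optimizer type assumed
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[40, 80, 110], gamma=0.1)   # divide lr by 10
    for cycle in range(200):                              # 200 training cycles
        for images, labels in loader:                     # S1-3: one batch at a time
            loss = criterion(model(images), labels)       # S1-4: forward pass, error
            optimizer.zero_grad()
            loss.backward()                               # S1-5: back propagation
            optimizer.step()                              # parameter update
        scheduler.step()                                  # lr drops at 40/80/110
    return model
```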
With the above steps, the convolutional neural network model of this embodiment is constructed and trained and can be used for pedestrian re-identification. The trained model serves as the feature extraction unit: it extracts the feature vectors of the target image and of the images to be judged, and the resulting vectors are used to measure the similarity between the target image and each image to be judged, so as to find the pedestrian image consistent with the target image.
Before the target image and the images to be judged are input into the trained model, they must be preprocessed to obtain images of consistent size, i.e. step S2: preprocess the target image to obtain the preprocessed target image, and preprocess the images to be judged to obtain the corresponding preprocessed images to be judged.
In this embodiment, the Market1501 data set is used as the test data set; it contains 19281 images of 750 people.
The images to be judged and the target image are preprocessed before being input into the trained model. The procedure is essentially the same as the preprocessing of the training-set pedestrian images and comprises the following steps:
s2-1, carrying out pedestrian detection on the image to be processed, and finding out the position of a pedestrian in the image to be processed;
s2-2, detecting a plurality of key position points of the pedestrian human body, which are found out in the S2-1, at least comprising the head, the trunk and the limbs;
s2-3, aligning the image to be processed according to the key position points and unifying the size of the image to be processed;
and S2-4, performing center cutting on the image to be processed after the size is unified to obtain a corresponding preprocessed image. In this embodiment, unlike the training set, center clipping is adopted for each pedestrian image as the test set.
After preprocessing, feature extraction and judgment can be carried out on the images to be judged and the target image, i.e. the following steps:
S3: input the preprocessed images to be judged and the preprocessed target image into the feature extraction model in turn, obtaining the feature vectors to be judged corresponding to the preprocessed images to be judged and the target feature vectors corresponding to the preprocessed target image.
S4: judge, from the target feature vectors and the vectors to be judged, which images to be judged show the same pedestrian as the target image.
In this embodiment, to make it easy to check the judgment precision of the trained model, each preprocessed target image and the corresponding preprocessed image to be judged are set up as a picture pair. After both images of a pair have been fed through the model to obtain their feature vectors, the similarity of the two vectors is computed as a cosine distance: if the distance is larger than a preset value, the target image and the image to be judged are judged not to be the same person; if it is smaller, they are judged to be the same person.
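A sketch of this pair judgment, assuming the trained model's pooled 1024-dimensional output is used as the feature vector; the helper names and the 0.5 threshold are illustrative, since the patent leaves the preset value unspecified.

```python
import torch
import torch.nn.functional as F

def extract_feature(model, image):
    # image: preprocessed 3x299x299 tensor; returns the feature vector (S3).
    model.eval()
    with torch.no_grad():
        return model(image.unsqueeze(0)).squeeze(0)

def same_person(target_vec, candidate_vec, threshold=0.5):
    # Cosine distance = 1 - cosine similarity; a distance below the
    # preset value means the pair is judged to be the same person (S4).
    distance = 1.0 - F.cosine_similarity(target_vec, candidate_vec, dim=0)
    return distance.item() < threshold
```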
Since each picture pair is formed simply by flipping the target image, the two images are in fact the same person. After image preprocessing, feature extraction by the model, cosine-distance computation and judgment, a pair judged "not the same person" therefore counts as an incorrect pedestrian re-identification result, and a pair judged "the same person" counts as a correct one.
Table 2 shows the judgment precision of the pedestrian re-identification method based on the parallel dimension-reduction convolution-kernel neural network according to this embodiment of the invention, compared with the precision of common existing models. Inception V1, Inception V2, AlexNet and DenseNet are all common image recognition neural networks.
TABLE 2
[Table 2 is reproduced only as an image in the original publication and is omitted here.]
As the comparison shows, the pedestrian re-identification method based on the parallel dimension-reduction convolution-kernel neural network achieves high recognition precision, higher than that of the pedestrian re-identification methods in common use.
Effects of the embodiment
In this embodiment, the convolution kernels are decomposed in parallel and high-dimensional convolution kernels are replaced by several symmetric or asymmetric low-dimensional convolutions. This greatly reduces the number of parameters to be computed and the computation time, so model training and feature-vector extraction with the trained model both complete faster: the training of the pedestrian re-identification model, the feature extraction for the target image and the feature extraction for the images to be judged are all accelerated.
The above examples are only for illustrating the specific embodiments of the present invention and are not intended to limit the present invention.
In accordance with the method, the invention also provides a corresponding pedestrian re-identification device, comprising: a construction and training module that encapsulates the convolutional neural network model constructed and trained as above, a preprocessing module for the images to be judged and the target image, a feature extraction module, and a consistency judgment module that judges consistency from the target feature vectors extracted by the feature extraction module and the vectors to be judged. The four modules perform in sequence the operations of steps S1, S2, S3 and S4 of the pedestrian re-identification method.
In the embodiment, consistency between the target image and an image to be judged is determined by computing the cosine distance between their feature vectors; in the invention, other vector distance measures may also be used for this judgment.
In the embodiment, batch input is used during training to simplify image input and speed up training. With other training sets containing fewer images, the whole training set may instead be input directly, without batch processing, and the processing of steps S1-4 to S1-5 then performed.

Claims (10)

1. A pedestrian re-identification method based on a parallel dimension-reduction convolutional neural network, characterized by comprising the following specific steps:
step S1: constructing a convolutional neural network based on parallel dimension-reduction convolution kernels, training the convolutional neural network model with a number of pedestrian images selected from a standard data set as the training set, and taking the trained convolutional neural network model as the feature extraction model;
step S2: preprocessing the target image to be retrieved to obtain the corresponding preprocessed target image, and preprocessing the candidate target images to obtain the corresponding images to be judged;
step S3: inputting the preprocessed target image and the images to be judged into the feature extraction model in turn to obtain the feature vectors of the pedestrians to be judged corresponding to the preprocessed images to be judged and the target feature vectors corresponding to the preprocessed target image;
step S4: searching among the images to be judged, according to the target feature vectors and the pedestrian feature vectors to be judged, for the pedestrian image consistent with the target image;
the step S1 includes the following substeps:
s1-1, preprocessing a plurality of existing target pedestrian images used as training sets to obtain images with uniform sizes;
s1-2, constructing a convolutional neural network model based on a parallel deconvolution kernel, wherein the convolutional neural network model comprises an input module, a deconvolution module, a reduction module, a pooling layer and a full-connection layer; the input module is used for inputting image data to be detected and extracting relevant features; the deconvolution module performs convolution by adopting a multi-channel parallel structure so as to reduce image characteristic parameters; the reduction module reduces the dimension of the extracted image features; the parameters in the parameter matrix of each layer are randomly set; the deconvolution module is divided into 3 types, which are recorded as: reduce convolution module A, reduce convolution module B, reduce convolution module C, reduce the module and divide into 2 types, record as: a reduction module A and a reduction module B;
the input module is followed by 4 sequentially connected dimension-reduction convolution modules A; each module A is composed of 4 groups of convolutions;
module A is followed by reduction module A, which pools the data down to a reduced scale; reduction module A is composed of 3 groups of convolutions;
reduction module A is followed by 7 sequentially connected dimension-reduction convolution modules B; each module B is divided into 4 convolution channels;
module B is connected to reduction module B, which has a pooling effect; reduction module B consists of 3 groups of convolutions;
reduction module B is connected to 3 sequentially connected dimension-reduction convolution modules C; each module C consists of 4 groups of convolutions;
module C is connected to an average pooling layer and a random discard layer, each layer containing weight values used to compute the data passed to the next layer;
s1-3, inputting the preprocessed pedestrian image as a training set into a convolutional neural network model;
s1-4, forward transmitting a calculation error to the convolutional neural network model;
s1-5, transmitting error updating parameters by adopting a back propagation method;
s1-6, repeating the steps S1-3 to S1-5 until the training requirement condition is met, and obtaining a trained convolutional neural network as a feature extraction model;
in step S1-2:
the dimension-reduction convolution module A is divided into 4 groups of convolutions, denoted LA1, LA2, LA3 and LA4; in LA1, the first layer is an average pooling layer, denoted P1, and the second layer is a convolutional layer, denoted LA1C1; LA2 contains only one convolutional layer, denoted LA2C1; in LA3, the first layer is a convolutional layer, denoted LA3C1, and the second layer is a convolutional layer, denoted LA3C2; in LA4, the first layer is a convolutional layer, denoted LA4C1, the second layer is a convolutional layer, denoted LA4C2, and the third layer is a convolutional layer, denoted LA4C3;
the dimension-reduction convolution module B is divided into 4 convolution channels, denoted LB1, LB2, LB3 and LB4; in LB1, the first layer is an average pooling layer, denoted P3, and the second layer is a convolutional layer, denoted LB1C1; LB2 contains only one convolutional layer, denoted LB2C1; in LB3, the first layer is a convolutional layer, denoted LB3C1, the second layer is a convolutional layer, denoted LB3C2, and the third layer is a convolutional layer, denoted LB3C3; in LB4, the first layer is a convolutional layer, denoted LB4C1, the second layer is a convolutional layer, denoted LB4C2, the third layer is a convolutional layer, denoted LB4C3, the fourth layer is a convolutional layer, denoted LB4C4, and the fifth layer is a convolutional layer, denoted LB4C5;
the dimension-reduction convolution module C is composed of 4 groups of convolutions, denoted LC1, LC2, LC3 and LC4; in LC1, the first layer is an average pooling layer, denoted P5, and the second layer is a convolutional layer, denoted LC1C1; LC2 contains only one convolutional layer, denoted LC2C1; in LC3, the first layer is a convolutional layer, denoted LC3C1, and the second layer is 2 groups of parallel convolutions, denoted LC3C21 and LC3C22; in LC4, the first layer is a convolutional layer, denoted LC4C1, the second layer is a convolutional layer, denoted LC4C2, the third layer is a convolutional layer, denoted LC4C3, and the fourth layer is 2 groups of parallel convolutions, denoted LC4C41 and LC4C42;
the reduction module A is composed of 3 groups of convolutions, denoted RA1, RA2 and RA3; RA1 is a max pooling layer, denoted P2; RA2 is a convolutional layer, denoted RA2C1; in RA3, the first layer is a convolutional layer, denoted RA3C1, the second layer is a convolutional layer, denoted RA3C2, and the third layer is a convolutional layer, denoted RA3C3;
the reduction module B is composed of 3 groups of convolutions, denoted RB1, RB2 and RB3; RB1 is a max pooling layer, denoted P4; in RB2, the first layer is a convolutional layer, denoted RB2C1, and the second layer is a convolutional layer, denoted RB2C2; in RB3, the first layer is denoted RB3C1, the second layer RB3C2, the third layer RB3C3, and the fourth layer RB3C4.
2. The pedestrian re-identification method based on the parallel dimension-reduction convolutional neural network according to claim 1, characterized in that step S1-1 comprises the following substeps:
S1-1-1, performing pedestrian detection on an image to be processed to find the position of the pedestrian in the image;
S1-1-2, detecting a plurality of key position points on the pedestrian found in step S1-1-1, the key position points covering at least the head, the trunk and the limbs;
S1-1-3, aligning the images to be processed according to the key position points and unifying their sizes;
S1-1-4, cropping the size-unified images to obtain the corresponding preprocessed images.
3. The pedestrian re-identification method based on the parallel dimension-reduction convolutional neural network according to claim 1, characterized in that parallel convolution kernels are added within the same convolutional layer of the convolutional neural network, so that a single-layer convolution can extract parameter attributes of different sparsity, increasing the network width and at the same time the adaptability of the network.
4. The pedestrian re-identification method based on the parallel dimension-reduction convolutional neural network according to claim 1, characterized in that, in the convolutional neural network, the commonly used m×m convolution kernel is replaced by two n×n convolution kernels, where n is smaller than m, so that the number of convolutional-layer parameters is reduced while the same field of view is obtained, and the depth of the neural network is increased.
5. The pedestrian re-identification method based on the parallel dimension-reduction convolutional neural network according to claim 1, characterized in that, in the convolutional neural network, large n×n symmetric convolution kernels of size greater than 5×5 are replaced by combining 1×n and n×1 asymmetric convolution kernels, so that the number of parameters and the amount of calculation are further reduced on the premise that the number of extracted features is unchanged, and a better training effect is obtained.
6. The pedestrian re-identification method based on the parallel dimensionality reduction convolutional neural network as claimed in claim 1, wherein the parameters of each layer of the convolutional neural network model are shown in the following table:
Figure FDA0003876904130000031
Figure FDA0003876904130000041
7. The pedestrian re-identification method based on the parallel dimension-reduction convolutional neural network according to claim 1, characterized in that the output of each convolutional layer in step S1 is batch-normalized, the output of each layer being normalized to an N(0, 1) normal distribution.
8. The pedestrian re-identification method based on the parallel dimension-reduction convolutional neural network according to claim 1, characterized in that the training completion condition in step S1-6 is: the predetermined number of cycles is completed, the parameters have converged, or the training error is eliminated.
9. The pedestrian re-identification method based on the parallel dimension-reduction convolutional neural network according to claim 1, characterized in that step S2 comprises the following substeps:
S2-1, performing pedestrian detection on the image to be processed and finding the position of the pedestrian in it;
S2-2, detecting a plurality of key position points on the pedestrian found in step S2-1, the key position points covering at least the head, the trunk and the limbs;
S2-3, aligning the image to be processed according to the key position points and unifying its size;
S2-4, center-cropping the size-unified image to obtain the corresponding preprocessed image.
10. A pedestrian re-identification device based on the method of any one of claims 1 to 9, characterized by comprising: a convolutional neural network model construction and training module, a preprocessing module for the images to be judged and the target image, a feature extraction module and a consistency judgment module; the four modules perform in sequence the functions corresponding to the operations of steps S1, S2, S3 and S4 of the pedestrian re-identification method.
CN201910277665.XA 2019-04-08 2019-04-08 Pedestrian re-identification method and device based on parallel dimensionality reduction convolutional neural network Active CN110175506B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910277665.XA CN110175506B (en) 2019-04-08 2019-04-08 Pedestrian re-identification method and device based on parallel dimensionality reduction convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910277665.XA CN110175506B (en) 2019-04-08 2019-04-08 Pedestrian re-identification method and device based on parallel dimensionality reduction convolutional neural network

Publications (2)

Publication Number Publication Date
CN110175506A CN110175506A (en) 2019-08-27
CN110175506B true CN110175506B (en) 2023-01-06

Family

ID=67689490

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910277665.XA Active CN110175506B (en) 2019-04-08 2019-04-08 Pedestrian re-identification method and device based on parallel dimensionality reduction convolutional neural network

Country Status (1)

Country Link
CN (1) CN110175506B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111311599B (en) * 2020-01-17 2024-03-26 北京达佳互联信息技术有限公司 Image processing method, device, electronic equipment and storage medium
CN111680595A (en) * 2020-05-29 2020-09-18 新疆爱华盈通信息技术有限公司 Face recognition method and device and electronic equipment
CN112529767B (en) * 2020-12-01 2023-07-25 平安科技(深圳)有限公司 Image data processing method, device, computer equipment and storage medium
CN112818797B (en) * 2021-01-26 2024-03-01 厦门大学 Consistency detection method and storage device for online examination answer document images

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2402916A1 (en) * 2000-03-16 2001-09-20 Yuan Yan Chen Apparatus and method for fuzzy analysis of statistical evidence
CN102496289B (en) * 2011-12-09 2013-10-23 浙江省交通规划设计研究院 Road section pedestrian street-crossing sensing control method based on number of pedestrians to cross street
US9767385B2 (en) * 2014-08-12 2017-09-19 Siemens Healthcare Gmbh Multi-layer aggregation for object detection
CN104899561A (en) * 2015-05-27 2015-09-09 华南理工大学 Parallelized human body behavior identification method
CN106651830A (en) * 2016-09-28 2017-05-10 华南理工大学 Image quality test method based on parallel convolutional neural network
CN107657249A (en) * 2017-10-26 2018-02-02 珠海习悦信息技术有限公司 Method, apparatus, storage medium and the processor that Analysis On Multi-scale Features pedestrian identifies again
CN108647595B (en) * 2018-04-26 2021-08-03 华中科技大学 Vehicle weight identification method based on multi-attribute depth features
CN108961245A (en) * 2018-07-06 2018-12-07 西安电子科技大学 Picture quality classification method based on binary channels depth parallel-convolution network
CN109508675B (en) * 2018-11-14 2020-07-28 广州广电银通金融电子科技有限公司 Pedestrian detection method for complex scene

Also Published As

Publication number Publication date
CN110175506A (en) 2019-08-27

Similar Documents

Publication Publication Date Title
CN110175506B (en) Pedestrian re-identification method and device based on parallel dimensionality reduction convolutional neural network
CN108510485B (en) Non-reference image quality evaluation method based on convolutional neural network
CN108596143B (en) Face recognition method and device based on residual error quantization convolutional neural network
CN107451565B (en) Semi-supervised small sample deep learning image mode classification and identification method
Smith et al. Classification of archaeological ceramic fragments using texture and color descriptors
CN108154133B (en) Face portrait-photo recognition method based on asymmetric joint learning
CN111709980A (en) Multi-scale image registration method and device based on deep learning
CN111738344A (en) Rapid target detection method based on multi-scale fusion
CN111310633A (en) Parallel space-time attention pedestrian re-identification method based on video
CN109919832B (en) Traffic image splicing method for unmanned driving
CN106529441B Depth motion figure human bodys' response method based on smeared out boundary fragment
CN111242026A (en) Remote sensing image target detection method based on spatial hierarchy perception module and metric learning
CN109753996A (en) Hyperspectral image classification method based on D light quantisation depth network
CN116052016A (en) Fine segmentation detection method for remote sensing image cloud and cloud shadow based on deep learning
CN110991210A (en) Embedded face recognition method and device based on deep learning
CN113822825A (en) Optical building target three-dimensional reconstruction method based on 3D-R2N2
CN112102379B (en) Unmanned aerial vehicle multispectral image registration method
CN111127407B (en) Fourier transform-based style migration forged image detection device and method
CN107133634B (en) Method and device for acquiring plant water shortage degree
CN109886996B (en) Visual tracking optimization method
CN114612305B (en) Event-driven video super-resolution method based on stereogram modeling
CN110929773A (en) Chip surface defect classification device and method based on deep learning
CN107358200B (en) Multi-camera non-overlapping vision field pedestrian matching method based on sparse learning
CN113469224A (en) Rice classification method based on fusion of convolutional neural network and feature description operator
CN108268533A (en) A kind of Image Feature Matching method for image retrieval

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant