CN114863505A - Pedestrian re-identification method based on trident convolutional neural network


Info

Publication number
CN114863505A
CN114863505A (application CN202210215993.9A; granted publication CN114863505B)
Authority
CN
China
Prior art keywords
pedestrian
layer
network structure
network
neural network
Prior art date
Legal status: Granted (assumed; Google has not performed a legal analysis)
Application number
CN202210215993.9A
Other languages
Chinese (zh)
Other versions
CN114863505B (en)
Inventor
熊明福
高志於
李家辉
胡新荣
陈佳
张俊杰
Current Assignee
Wuhan Textile University
Original Assignee
Wuhan Textile University
Priority date (assumed; not a legal conclusion)
Filing date
Publication date
Application filed by Wuhan Textile University filed Critical Wuhan Textile University
Priority to CN202210215993.9A
Publication of CN114863505A
Application granted
Publication of CN114863505B
Legal status: Active

Classifications

    • G06V40/173 — Classification, e.g. identification; face re-identification across different face tracks
    • G06N3/045 — Neural network architectures; combinations of networks
    • G06N3/084 — Learning methods; backpropagation, e.g. using gradient descent
    • G06V10/462 — Extraction of image features; salient features, e.g. SIFT
    • G06V10/761 — Image pattern matching; proximity, similarity or dissimilarity measures
    • G06V10/82 — Image recognition using neural networks
    • Y02T10/40 — Engine management systems (climate-change mitigation tag)


Abstract

The invention discloses a pedestrian re-identification method based on a trident (three-branch) convolutional neural network. Three networks are designed: a convolutional neural network based on color, one based on spatial position information, and one based on high-level semantic information. The resulting model has a lighter network structure and fewer network parameters, which makes mobile pedestrian re-identification feasible. The designed network model acquires hierarchical depth features of pedestrians, matching the human recognition process from coarse to fine and from shallow to deep. In addition, a unified network training scheme facilitates the optimization of network parameters; compared with traditional training, which ignores network intermediate-layer information, the proposed structure better combines the low-level visual features and high-level semantic attributes of pedestrians and obtains more discriminative pedestrian descriptors, thereby achieving more effective pedestrian re-identification.

Description

Pedestrian re-identification method based on trident convolutional neural network
Technical Field
The invention relates to pedestrian re-identification technology, and in particular to a pedestrian re-identification method based on a trident convolutional neural network.
Background
Pedestrian re-identification is a technology that matches pedestrians captured by cameras at different physical locations to judge whether they are the same target. It has received wide attention and application in academia and industry (artificial intelligence, public security and criminal investigation). However, pedestrian re-identification remains a challenging problem owing to objective environmental factors such as illumination, viewing angle and scale. Practical research on pedestrian re-identification mainly involves three steps: feature extraction (appearance representation of pedestrian objects), distance measurement (similarity comparison of pedestrian objects) and feedback optimization (refinement of ranking results), and researchers have devoted specific work to each link. In recent years, with the development of deep learning, it has become the mainstream approach to pedestrian re-identification and achieved good results. In practice, however, methods based on deep neural network models incur huge computational cost and hardware resources for parameter storage, network optimization and so on, which limits their operability and wide application; the technical bottleneck is particularly severe for deployment on intelligent mobile terminal devices.
Disclosure of Invention
The invention addresses the problem that pedestrian re-identification algorithms cannot be effectively deployed on mobile terminal devices because traditional deep neural networks are time- and labor-consuming in model design, network optimization and other respects. Targeting the requirements of actual intelligent mobile terminal devices, the invention designs an efficient and lightweight deep neural network model for pedestrian re-identification: a structure based on a trident convolutional neural network is adopted to extract pedestrian depth features. Pedestrian appearance is expressed through cascaded learning of hierarchical pedestrian features, achieving an accurate representation of the pedestrian and completing the re-identification task. An analysis of model complexity demonstrates the applicability of the invention on intelligent mobile terminal devices while the accuracy of pedestrian identification is maintained.
The invention provides a light and simple trident convolutional neural network structure to solve the pedestrian re-identification problem: hierarchical pedestrian depth features are extracted through the joint training of structurally simple multi-branch networks, and L2 normalization is applied to these features to realize pedestrian re-identification. The specific implementation steps of the multi-branch convolutional neural network structure designed by the invention are as follows:
the designed trident multi-branch network structure mainly comprises three convolutional neural network structures: a semantic convolutional neural network (semantic CNN, S-CNN), a color convolutional neural network (color CNN, C-CNN) and a position convolutional neural network (location CNN, L-CNN). The S-CNN acquires the semantic information features of pedestrians, mainly the final high-level abstract features of a deep network structure; the C-CNN acquires the low-level visual features of pedestrians (such as color and texture); and the L-CNN extracts the spatial position information of pedestrians. In the invention, hierarchical depth feature extraction of pedestrians is realized by training the trident convolutional neural network structure as a whole, specifically comprising the following steps:
step 1, acquiring pedestrian image data under different cameras and using the pedestrian image data as input data of a network, wherein the pedestrian image data comprises a triple image consisting of an anchor point sample, a positive sample image and a negative sample image;
step 2, preprocessing the pedestrian image obtained in the step 1, whitening each pedestrian image sample, and normalizing and subtracting a mean value;
step 3, designing and training a network structure; inputting the preprocessed pedestrian image data into a trident network structure for optimization training, designing the trident network structure, and then realizing the optimization training of the network structure in a triple loss mode to obtain the depth features of the pedestrian levels;
the three-fork network structure comprises three convolutional neural network structures, namely a semantic convolutional neural network (semantic CNN, S-CNN), a color convolutional neural network (color CNN, C-CNN) and a position convolutional neural network (location CNN, L-CNN); the S-CNN is used for acquiring the semantic information features of the pedestrians, including the acquisition of the final high-level abstract features of the deep network structure; the C-CNN is used for acquiring the visual characteristics of the bottom layer of the pedestrian, and the L-CNN is used for extracting the spatial position information of the pedestrian;
step 4, applying L2 normalization to the different depth-level features of the pedestrian obtained in step 3, so as to obtain the final hierarchical depth feature describing the pedestrian;
and 5, measuring the pedestrian similarity of the pedestrian level depth features obtained in the step 4 by adopting a distance measuring mode for different pedestrian samples so as to obtain a final pedestrian re-identification result.
Further, in step 1, acquiring pedestrian image data under different cameras includes:
step 1.1, an image I of a pedestrian is taken as the anchor sample; its positive sample image, denoted I+, is an image of the same person as the anchor sample, while the negative sample image, denoted I−, is an image of a different person from the anchor sample;
step 1.2, for the selection of the triple group image, the generation of triple group pedestrian image data is realized by adopting a data enhancement method so as to obtain pedestrian triple group image data suitable for network structure training.
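The triplet construction of steps 1.1 and 1.2 can be sketched in code. The concrete enhancement operations below (horizontal flip plus brightness jitter) and the random negative sampling are illustrative assumptions, since the patent does not specify which data-enhancement method is used:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img):
    """Simple data enhancement: horizontal flip plus small brightness jitter."""
    flipped = img[:, ::-1, :]                      # flip along the width axis
    jitter = rng.uniform(-0.05, 0.05)              # small global brightness shift
    return np.clip(flipped + jitter, 0.0, 1.0)

def make_triplet(gallery, person_id):
    """Build an <anchor, positive, negative> triplet for one identity.

    gallery: dict mapping person id -> list of HxWx3 float images in [0, 1].
    The positive is an augmented view of the same person; the negative is an
    image of a different, randomly chosen person.
    """
    anchor = gallery[person_id][0]
    positive = augment(anchor)
    other_ids = [pid for pid in gallery if pid != person_id]
    negative = gallery[rng.choice(other_ids)][0]
    return anchor, positive, negative
```
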
Further, in step 2, the pedestrian image is preprocessed; the specific steps include:
step 2.1, whitening is combined with dimensionality reduction so that the covariance matrix of the input data becomes the identity matrix I. Specifically, if R is an arbitrary orthogonal matrix, i.e. it satisfies RR^T = R^T R = I (R is a rotation or reflection matrix), the ZCA whitening result is defined as x_ZCAwhite = U x_PCAwhite, where U is the eigenvector matrix of the covariance matrix of the data and x_PCAwhite is the PCA-whitened data. ZCA whitening is thus equivalent to rotating the PCA-whitened data back to the original space, so that the whitened result stays as close as possible to the original input data x;
and 2.2, the ZCA-whitened data are normalized and the mean is subtracted, so that the data better match the input form expected by the trident network structure.
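A minimal NumPy sketch of the ZCA whitening of step 2.1, rotating PCA-whitened data back into the original space; the small regularizer `eps` is an added assumption for numerical stability and is not part of the text:

```python
import numpy as np

def zca_whiten(X, eps=1e-5):
    """ZCA whitening of a (n_samples, n_features) data matrix.

    Returns whitened data whose covariance matrix is (approximately) the
    identity, while staying as close as possible to the original input:
    x_ZCAwhite = U x_PCAwhite, with U the eigenvector matrix of the
    data covariance.
    """
    Xc = X - X.mean(axis=0)                 # subtract the mean
    cov = Xc.T @ Xc / Xc.shape[0]           # covariance matrix of the data
    U, S, _ = np.linalg.svd(cov)            # eigenvectors U, eigenvalues S
    pca_white = Xc @ U / np.sqrt(S + eps)   # PCA whitening
    return pca_white @ U.T                  # rotate back to the original space
```
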
Further, the network structure of the color convolutional neural network is as follows: the original pedestrian surveillance image is first decomposed into its RGB component maps; the color features of each component are extracted through a network model based on CaffeNet, namely through convolutional layer 1, convolutional layer 2 and convolutional layer 3; the color components are then combined, namely through sub-convolutional layer 1 and three sub-fully-connected layers, to obtain the color features of the pedestrian;
the network structure of the position convolutional neural network extracts pedestrian convolutional features through convolutional layer 1, convolutional layer 2, convolutional layer 3 and convolutional layer 4, and a multi-scale spatial pyramid pooling layer structure is designed to obtain pedestrian features at different scales: features related to spatial position information are extracted through sub-convolutional layer 2 and then passed through a spatial pyramid layer and three sub-fully-connected layers, where the pyramid structure is designed as 16 × 256-, 4 × 256- and 256-dimensional pooling layer structures respectively; finally the features are learned in cascade to form the spatial features of the pedestrian;
the network structure of the semantic convolutional neural network is based on the CaffeNet network structure: the convolutional features of the input image are extracted through convolutional layer 1, convolutional layer 2 and convolutional layer 3, and the convolutional features are locally segmented with a segmentation algorithm, namely the head, trunk, leg and shoe parts are respectively obtained through convolutional layer 4, convolutional layer 5, convolutional layer 6 and convolutional layer 7 to obtain the regions of interest; finally the corresponding semantic attributes of the pedestrian are obtained through a global average pooling operation and three fully-connected layers;
wherein, except for convolutional layers 3 and 4, every convolutional layer is followed by a pooling layer.
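The multi-scale spatial pyramid pooling used in the L-CNN can be sketched as follows. The 4 × 4, 2 × 2 and 1 × 1 grid levels reproduce the 16 × 256-, 4 × 256- and 256-dimensional structure described above; the choice of max-pooling within each cell is an assumption, as the patent does not name the pooling operator:

```python
import numpy as np

def spatial_pyramid_pool(fmap, levels=(4, 2, 1)):
    """Multi-scale spatial pyramid pooling over one feature map.

    fmap: (C, H, W) convolutional feature map (C = 256 in the text).
    Each level g pools the map into a g x g grid by max-pooling, giving
    g*g*C values; concatenating levels (4, 2, 1) yields the
    16*C + 4*C + C dimensional descriptor.
    """
    C, H, W = fmap.shape
    pooled = []
    for g in levels:
        hs = np.linspace(0, H, g + 1).astype(int)   # bin edges along height
        ws = np.linspace(0, W, g + 1).astype(int)   # bin edges along width
        for i in range(g):
            for j in range(g):
                cell = fmap[:, hs[i]:hs[i + 1], ws[j]:ws[j + 1]]
                pooled.append(cell.max(axis=(1, 2)))  # C-dim vector per cell
    return np.concatenate(pooled)
```

With C = 256 this produces a (16 + 4 + 1) × 256 = 5376-dimensional spatial descriptor regardless of the input map size, which is what allows pedestrian features at different scales to be cascaded.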
Further, in step 3, the optimization training of the trident convolutional neural network structure specifically includes the steps of:
step 3.1, data are input into the convolutional neural networks of different structures: the semantic convolutional neural network comprises 7 convolutional layers, 5 pooling layers and 3 fully-connected layers, and its last fully-connected layer outputs the semantic feature information of pedestrians; the color convolutional neural network comprises 4 convolutional layers, 3 pooling layers and 3 fully-connected layers, and its last fully-connected layer outputs the color information of pedestrians; the position convolutional neural network comprises 5 convolutional layers, 2 pooling layers, a spatial pyramid layer and 3 fully-connected layers, and outputs the spatial position information of pedestrians;
step 3.2, hierarchical pedestrian features are acquired from the network structure designed in step 3.1: the last-layer feature of each branch network is taken as the pedestrian feature, and the penultimate-layer feature as a pedestrian descriptor, so as to obtain more discriminative hierarchical pedestrian features;
step 3.3, for the input pedestrian image triplet <I, I+, I−>, the network structure is optimized according to the inherent property of pedestrian re-identification that the distance between images of the same person should be smaller than the distance between images of different persons, i.e.:

d(I, I+) < d(I, I−)

where d(I, I+) denotes the distance between images of the same person and d(I, I−) the distance between images of different pedestrians. Specifically, after the input pedestrian triplet data <I, I+, I−> are trained through the trident network, the obtained depth features are expressed as <g_w(I), g_w(I+), g_w(I−)>, where g_w(I) is the anchor pedestrian feature, g_w(I+) the positive-sample pedestrian feature, g_w(I−) the negative-sample pedestrian feature, and w the network parameters. The specific network training process is to make

‖g_w(I) − g_w(I+)‖ < ‖g_w(I) − g_w(I−)‖   (1)

hold at all times; to facilitate error calculation, formula (1) can be expressed as:

‖g_w(I) − g_w(I+)‖² < ‖g_w(I) − g_w(I−)‖²   (2)

A derivative calculation is performed on the transformed cost function.
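The triplet objective behind formulas (1)-(2) can be sketched numerically in NumPy; the hinge form with a margin anticipates the boundary distance C introduced in the loss below, and the default margin value of 0.2 is an illustrative assumption:

```python
import numpy as np

def triplet_loss(g_a, g_p, g_n, margin=0.2):
    """Hinge triplet loss on embedded features.

    g_a, g_p, g_n: (N, d) embeddings g_w(I), g_w(I+), g_w(I-) of anchor,
    positive and negative images; margin plays the role of the boundary
    distance C. The loss is zero whenever every same-person distance is
    smaller than the corresponding different-person distance by at least
    the margin.
    """
    d_pos = np.sum((g_a - g_p) ** 2, axis=1)   # squared distance, same person
    d_neg = np.sum((g_a - g_n) ** 2, axis=1)   # squared distance, different people
    return np.maximum(d_pos - d_neg + margin, 0.0).sum()
```
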
Further, the back-propagation idea is adopted to perform the derivative calculation on the transformed cost function; the specific steps are as follows:
step 3.3.1, first, the similarity distances between different pedestrians are calculated, and the loss function expressed by formula (2) is transformed into:

L = Σ_{j=1}^{N} max{ d(w, I_j) + C, 0 }   (3)

where L is the loss of the three networks in the network structure, d(w, I_j) is the difference between the distance within the same pedestrian pair and the distance between different pedestrians, N is the total number of pedestrian samples, and C is the boundary (margin) distance constraining the positive and negative samples;
step 3.3.2, in the overall network structure, convolutional layers 1, 2 and 3 use convolution kernels of size 11 × 11, 5 × 5 and 3 × 3 respectively, and their forward convolution operation is expressed as:

x_i^(l) = ReLU( Σ_j x_j^(l−1) * k^(l) + b_k^(l) )   (4)

where x_i^(l) and x_j^(l−1) respectively represent the output of the i-th neuron of layer l and of the j-th neuron of layer l−1, k^(l) is the convolution kernel between layer l and layer l−1, b_k^(l) is the bias term of the k-th feature map of layer l, and ReLU (Rectified Linear Unit) is the activation function applied between two convolutional layers;
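The forward operation of formula (4) can be sketched for a single 2-D feature map and kernel; valid (no-padding) cross-correlation is assumed here, as is conventional for "convolution" layers in deep networks:

```python
import numpy as np

def conv_forward(x, k, b):
    """One forward convolution step followed by ReLU.

    x: (H, W) input feature map from layer l-1, k: (kh, kw) kernel,
    b: scalar bias term. Computes a valid cross-correlation, then applies
    the ReLU activation, mirroring formula (4) for a single channel.
    """
    kh, kw = k.shape
    H, W = x.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k) + b
    return np.maximum(out, 0.0)                 # ReLU activation
```
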
step 3.3.3, the error after one forward propagation is calculated so as to realize back propagation; the partial derivative of formula (3) is computed as:

∂L/∂W = Σ_{j=1}^{N} ∂ max{ d(W, I_j) + C, 0 } / ∂W

∂ max{ d(W, I_j) + C, 0 } / ∂W = ∂d(W, I_j)/∂W when d(W, I_j) + C > 0, and 0 otherwise

where W represents the network parameters and I_j represents the j-th pedestrian image;
by differentiating d(W, I_j), the gradient is obtained as:

∂d(W, I_j)/∂W = 2(g_w(I_j) − g_w(I_j+)) ∂(g_w(I_j) − g_w(I_j+))/∂W − 2(g_w(I_j) − g_w(I_j−)) ∂(g_w(I_j) − g_w(I_j−))/∂W

step 3.3.4: according to the results derived in step 3.3.3, ∂g_w(I_j+)/∂W and ∂g_w(I_j−)/∂W are respectively calculated to obtain the final loss;
the partial-derivative formulas are then substituted into a gradient descent algorithm to minimize L, whereby the back-propagated loss of each layer is solved and the final network optimization process is realized.
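The chain-rule gradients of steps 3.3.3-3.3.4 can be sketched with respect to the network outputs g_w; back-propagating further to the raw weights W would multiply in the per-layer Jacobians ∂g_w/∂W, which are omitted here, and the margin value is an illustrative assumption:

```python
import numpy as np

def triplet_grads(g_a, g_p, g_n, margin=0.2):
    """Gradients of the hinge triplet loss with respect to the three
    embeddings g_w(I), g_w(I+), g_w(I-) of one triplet.

    When the hinge is inactive (the constraint is satisfied by more than
    the margin), all gradients are zero; otherwise they follow the
    2*(difference) terms of the derivative of the squared distances.
    """
    d = np.sum((g_a - g_p) ** 2) - np.sum((g_a - g_n) ** 2) + margin
    if d <= 0:                                  # hinge inactive: zero gradient
        z = np.zeros_like(g_a)
        return z, z.copy(), z.copy()
    grad_a = 2 * (g_a - g_p) - 2 * (g_a - g_n)  # d loss / d anchor
    grad_p = -2 * (g_a - g_p)                   # d loss / d positive
    grad_n = 2 * (g_a - g_n)                    # d loss / d negative
    return grad_a, grad_p, grad_n
```

A gradient-descent step would then update each embedding (or, through the layer Jacobians, each weight) in the negative gradient direction.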
Further, in step 4, L2 normalization is performed on the hierarchical depth features; the specific steps are as follows:
step 4.1, according to the network structure optimized in step 3, depth features of different levels are obtained: color, spatial position information and high-level semantic information. For the different input features, L2 normalization preprocessing is adopted, with the specific calculation:

y = f / ( Σ_{i=1}^{p} f_i² )^{1/2}

where f = [f_1, f_2, …, f_p] is a network output feature of dimension p;
step 4.2, PCA (principal component analysis) is performed on the L2-normalized feature y of step 4.1 to obtain a pedestrian hierarchical descriptor with high discriminative power.
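The L2 normalization of step 4.1 is a one-liner in NumPy; the tiny `eps` guard against a zero vector is an added assumption:

```python
import numpy as np

def l2_normalize(f, eps=1e-12):
    """L2-normalize a feature vector f = [f_1, ..., f_p]:
    y_i = f_i / sqrt(sum_j f_j^2), so that ||y||_2 = 1."""
    return f / (np.sqrt(np.sum(f ** 2)) + eps)
```
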
Further, in step 5, the specific steps of the similarity measurement include:
pedestrian similarity discrimination is performed based on the Euclidean distance. For two points x_1, x_2 in N-dimensional Euclidean space, the distance formula can be expressed as:

d(x_1, x_2) = ( Σ_{i=1}^{N} (x_{1,i} − x_{2,i})² )^{1/2}

where N is the dimension of the feature space, and x_1, x_2 respectively represent the features of pedestrians under two different camera views. The final measurement result is obtained through Euclidean distance calculation, and the measurement results are ranked to obtain the final pedestrian re-identification result.
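The Euclidean-distance measurement and ranking of step 5 can be sketched as follows, with the query feature taken from one camera view and the gallery features from the other:

```python
import numpy as np

def rank_gallery(query, gallery):
    """Rank gallery pedestrians by Euclidean distance to the query feature.

    query: (d,) feature of the probe pedestrian; gallery: (m, d) features
    under the other camera view. Returns gallery indices sorted from most
    to least similar, i.e. the re-identification ranking list.
    """
    dists = np.sqrt(np.sum((gallery - query) ** 2, axis=1))
    return np.argsort(dists)
```
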
The invention has the following positive effects and advantages:
1) the invention designs a trident multi-branch neural network model to enable the application of pedestrian re-identification technology on mobile terminal devices. Specifically, a convolutional neural network based on color, a convolutional neural network based on spatial position information and a convolutional neural network based on high-level semantic information are designed. The resulting model has a shallower network structure and fewer network parameters, making mobile pedestrian re-identification feasible.
2) The designed network model acquires hierarchical depth features of pedestrians, matching the human recognition process from coarse to fine and from shallow to deep. In addition, a unified network training scheme facilitates the optimization of network parameters; compared with traditional training, which ignores network intermediate-layer information, the proposed structure better combines the low-level visual features and high-level semantic attributes of pedestrians and obtains more discriminative pedestrian descriptors.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
FIG. 2 is a schematic diagram of the trident multi-branch convolutional neural network model.
Fig. 3 shows a process of the color convolutional neural network.
Fig. 4 shows a process of the spatial convolutional neural network.
FIG. 5 is a process of semantic convolutional neural network processing.
Detailed Description
To further clarify the technical means and effects of the present invention, the following description will be made with reference to the accompanying drawings and specific embodiments.
The invention provides a pedestrian re-identification method based on a tridentate multi-branch network structure, which comprises the following steps of:
and step 1, data acquisition. For several different camera views (the invention takes two cameras as an example, namely C_a and C_b), the pedestrian image sets are respectively expressed as {I_i^a, i = 1, …, n} and {I_j^b, j = 1, …, m}, where n and m represent the number of people under each camera view (in our problem, n = m typically holds). Corresponding pedestrian images are taken from the views of cameras a and b respectively as input data for the network;
and 2, preprocessing. The pedestrian image obtained in step 1 is preprocessed: each pedestrian image sample is whitened, the mean is subtracted and the image is normalized to fit the network structure, which facilitates the training on subsequent data;
and 3, training a network structure. Inputting the preprocessed pedestrian image data into a trident network structure for optimization training, wherein a triple training mode is adopted in the step to realize optimization of the network structure so as to obtain depth features of pedestrian levels;
step 4, L2 normalization of hierarchical features. The different depth-level pedestrian features obtained in step 3 are L2-normalized to obtain the final features describing the pedestrian;
and 5, measuring the similarity. And (4) for the pedestrian level depth features obtained in the step (4), measuring the pedestrian similarity by adopting a distance measuring mode (the Euclidean distance and the XQDA distance are respectively adopted by the invention, and details are described later) for different pedestrian samples so as to obtain a final pedestrian re-identification result.
In the above pedestrian re-identification method based on the tridentate convolutional neural network structure, in step 1, the acquiring of pedestrian image data under different cameras specifically includes:
step 1.1: because the network structure provided by the invention is trained on a triplet loss function, the input image data are organized as triplet images. An image I of a pedestrian is taken as the anchor sample; its positive sample image is denoted I+ (an image of the same person as the anchor sample), and the negative sample image is denoted I− (an image of a different person from the anchor sample);
step 1.2: for the selection of the triple images, the generation of the triple pedestrian image data is realized by adopting a data enhancement method, so that the pedestrian triple image data adaptive to the network structure training is obtained.
In the above pedestrian re-identification method based on the tridentate convolutional neural network structure, in step 2, the obtained triple image data is preprocessed, and the specific steps include:
step 2.1: ZCA whitening is performed on the pedestrian image pairs. In the present invention, whitening is combined with dimensionality reduction to transform the covariance matrix of the input data into the identity matrix I. Specifically, if R is an arbitrary orthogonal matrix (i.e. it satisfies RR^T = R^T R = I, where R is a rotation or reflection matrix), the defined ZCA whitening result is x_ZCAwhite = U x_PCAwhite, where U is the eigenvector matrix of the covariance matrix of the data and x_PCAwhite is the PCA-whitened data. ZCA whitening is equivalent to rotating the PCA-whitened data back to the original space, so the result stays as close as possible to the original input data x;
step 2.2: the ZCA-whitened data are normalized and the mean is subtracted, so that the data better match the input form of the trident network structure.
In the above pedestrian re-identification method based on the trident convolutional neural network structure, in step 3, the optimization training of the trident convolutional neural network structure includes the specific steps of:
step 3.1: we input data to convolutional neural networks of different structures (color, spatial location information and semantics). Specifically, the semantic convolutional neural network comprises 7 convolutional layers, 5 pooling layers and 3 full-connection layers, and the output characteristic of the last full-connection layer is considered as the semantic information of pedestrians; the color convolutional neural network structure comprises 4 convolutional layers, 3 pooling layers and 3 full-connection layers, and the last full-connection layer is considered as color information of pedestrians; the position convolution neural network comprises 5 convolution layers, 2 pooling layers, a space pyramid layer and 3 full-connection layers and is used for outputting the space position information of the pedestrian;
step 3.1.1: aiming at the designed network trident network structure, specifically, the design of the color convolution neural network structure is as follows: firstly, decomposing an original pedestrian monitoring image into an RGB component diagram, respectively extracting the color features of each component through a network model taking CaffeNet as a reference, namely a convolution layer 1, a convolution layer 2 and a convolution layer 3 in the attached figure 2, then combining the color components, and obtaining the color features of pedestrians through a sub-convolution layer 1 and a sub-full connection layer;
step 3.1.2: the design of the position convolution neural network is specifically to extract the pedestrian convolution characteristics and design a multi-scale-based spatial pyramid pooling layer structure so as to obtain the pedestrian characteristics under different scales. After passing through the common convolutional layer 1, convolutional layer 2 and convolutional layer 3 in fig. 2, extracting features related to spatial position information through the sub-convolutional layer 2, then passing through a spatial pyramid layer, wherein the pyramid structure is respectively designed into 16 × 256 dimensional, 4 × 256 dimensional and 256 dimensional pooling layer structures, and finally performing cascade learning on the features to form spatial features of pedestrians;
step 3.1.3: the semantic convolutional neural network is designed on the basis of the CaffeNet network structure: after convolutional layer 1, convolutional layer 2 and convolutional layer 3 in fig. 2, the convolution features are locally segmented using a segmentation algorithm, and the head, trunk, leg and shoe parts are respectively obtained through convolutional layer 4, convolutional layer 5, convolutional layer 6 and convolutional layer 7 so as to obtain the regions of interest;
step 3.2: according to the network structure designed in step 3.1, the hierarchical pedestrian features are obtained: the last-layer feature of each branch network is used as the pedestrian feature, and the penultimate-layer feature of each branch network is also used as a pedestrian descriptor, so as to obtain more discriminative hierarchical pedestrian features;
step 3.3: for an input pedestrian image triplet <I, I+, I->, the network structure is optimized according to the inherent property of pedestrian re-identification that the distance between images of the same person is smaller than the distance between images of different persons, namely:

d(I, I+) < d(I, I-)

where d(I, I+) denotes the distance between two images of the same pedestrian and d(I, I-) the distance between images of different pedestrians. Specifically, in the network structure training of the present invention, the input pedestrian triplet data <I, I+, I-> is trained through the trident network and the obtained depth features are expressed as <g_w(I), g_w(I+), g_w(I-)>, where g_w(I) is the anchor pedestrian feature, g_w(I+) the positive-sample pedestrian feature, g_w(I-) the negative-sample pedestrian feature, and w the network parameters. The specific network training process must keep

‖g_w(I) - g_w(I+)‖ < ‖g_w(I) - g_w(I-)‖    (1)

always true; to facilitate the error calculation, formula (1) can be expressed as:

‖g_w(I) - g_w(I+)‖² < ‖g_w(I) - g_w(I-)‖²    (2)
The transformed cost function is then differentiated, specifically adopting the idea of back propagation, as follows:
step 3.3.1: in the pedestrian re-identification problem, to facilitate the optimization of the network parameters, the similarity distances between different pedestrians are calculated, so the loss function expressed by formula (2) can be transformed into:

L = (1/N) Σ_{i=1}^{N} max(0, d(w, I_i) + C)    (3)

where L is the loss of the network structure (the three networks and their three losses are collectively denoted by L in the present invention), d(w, I_i) is the difference between the distance of the same pedestrian and the distance of different pedestrians, N is the total number of pedestrian samples, and C is the boundary margin that constrains the positive and negative samples;
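The hinge form of the loss in step 3.3.1 can be sketched as follows (a NumPy toy assuming row-wise embeddings; the function name and margin value are illustrative, not from the patent):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, C=0.2):
    """Mean hinge triplet loss over N triplets of embedding rows:
    L = (1/N) * sum_i max(0, ||a_i - p_i||^2 - ||a_i - n_i||^2 + C)."""
    d_pos = np.sum((anchor - positive) ** 2, axis=1)  # same-person distance
    d_neg = np.sum((anchor - negative) ** 2, axis=1)  # cross-person distance
    return np.mean(np.maximum(0.0, d_pos - d_neg + C))

a = np.array([[0.0, 0.0]])
p = np.array([[0.1, 0.0]])   # close to the anchor
n = np.array([[1.0, 0.0]])   # far from the anchor
loss = triplet_loss(a, p, n, C=0.2)
# d_pos = 0.01, d_neg = 1.0, so max(0, 0.01 - 1.0 + 0.2) = 0
```

A well-separated triplet contributes zero loss; a violating triplet (positive farther than negative) contributes its full margin violation.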
step 3.3.2: in the overall structure, the present invention employs convolution kernels of size 11 × 11, 5 × 5 and 3 × 3 for convolutional layer 1, convolutional layer 2 and convolutional layer 3 of fig. 2 respectively, and the forward convolution operation is expressed as:

x_i^(l) = ReLU(k^(l) * x_i^(l-1) + b_k^(l))    (4)

where x_i^(l) and x_i^(l-1) respectively denote the outputs of the i-th neuron of layer l and of layer l-1, k^(l) is the convolution kernel between layer l and layer l-1, b_k^(l) is the bias term of the k-th feature map of layer l, and ReLU (Rectified Linear Unit) is the activation function applied between two convolutional layers;
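A minimal single-channel sketch of the convolution-plus-ReLU operation of formula (4) (a valid convolution with one kernel; the actual 11 × 11 / 5 × 5 / 3 × 3 multi-channel layers are not reproduced here):

```python
import numpy as np

def conv2d_relu(x, k, b):
    """Valid 2-D convolution of a single-channel map x with kernel k,
    plus a scalar bias b, followed by ReLU: the per-layer forward step."""
    kh, kw = k.shape
    h = x.shape[0] - kh + 1
    w = x.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k) + b
    return np.maximum(out, 0.0)  # ReLU clips negative responses to zero

x = np.ones((5, 5))
k = np.full((3, 3), -1.0)
y = conv2d_relu(x, k, b=10.0)
# each raw response is -9 + 10 = 1, left untouched by ReLU
```

With b = 0 the same input yields raw responses of -9, which ReLU clips to 0.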
step 3.3.3: a forward propagation pass is first performed to obtain an output result, the error between the output result and the actual result is calculated, the error is propagated backwards, and the value of each parameter is adjusted according to the error during back propagation; this process is iterated until convergence. The partial derivative of formula (3) is calculated as:

∂L/∂W = (1/N) Σ_{i=1}^{N} ∂ max(0, d(W, I_i) + C) / ∂W    (5)

∂ max(0, d(W, I_i) + C) / ∂W = ∂d(W, I_i)/∂W if d(W, I_i) + C > 0, and 0 otherwise    (6)

where W denotes the network parameters and I_i the i-th pedestrian image.

Differentiating d(W, I_i), the gradient is obtained as:

∂d(W, I_i)/∂W = 2(g_w(I_i) - g_w(I_i+)) ∂(g_w(I_i) - g_w(I_i+))/∂W - 2(g_w(I_i) - g_w(I_i-)) ∂(g_w(I_i) - g_w(I_i-))/∂W    (7)
step 3.3.4: from the results derived in step 3.3.3, the gradients ∂d(W, I_i)/∂g_w(I_i+) and ∂d(W, I_i)/∂g_w(I_i-) with respect to the positive-sample and negative-sample features are respectively calculated to obtain the final loss.
Substituting the above partial derivative formulas (5) and (6) into an algorithm such as gradient descent to minimize L, the back-propagated loss of each layer is solved, thereby realizing the final network optimization process.
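To make the optimization concrete, here is a toy sketch of one gradient-descent step implementing the hinged gradient of formulas (5) to (7). A linear embedding g_W(x) = W x stands in for the trident network; all names, the margin and the learning rate are assumptions of this sketch:

```python
import numpy as np

def triplet_grad_step(W, I, Ip, In, C=0.5, lr=0.1):
    """One gradient-descent step on the hinge triplet loss for a linear
    embedding g_W(x) = W @ x. The gradient is non-zero only while the
    hinge is active, i.e. d(W, I) + C > 0, as in formula (6)."""
    a, p, n = W @ I, W @ Ip, W @ In
    d = np.sum((a - p) ** 2) - np.sum((a - n) ** 2)
    if d + C > 0:  # hinge active: apply the gradient of formula (7)
        grad = 2 * np.outer(a - p, I - Ip) - 2 * np.outer(a - n, I - In)
        W = W - lr * grad
    return W, max(0.0, d + C)

rng = np.random.default_rng(0)
W = rng.standard_normal((2, 3))
I, Ip, In = (rng.standard_normal(3) for _ in range(3))
losses = []
for _ in range(200):
    W, loss = triplet_grad_step(W, I, Ip, In)
    losses.append(loss)
# on this toy triplet the hinge eventually deactivates (loss reaches 0)
```

Once the constraint d(W, I) + C ≤ 0 holds, the gradient vanishes and the parameters stop changing, mirroring the margin behavior of the patented loss.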
In the above pedestrian re-identification method based on the trident convolutional neural network structure, in step 4 the L2 normalization of the hierarchical features comprises the following specific steps:

step 4.1: according to the network structure optimized in step 3, depth features of different levels (color, spatial position information and high-level semantic attributes) are obtained, and L2 normalization preprocessing is applied to the output features of different dimensions, computed as:

y_i = f_i / ‖f‖_2 = f_i / sqrt(f_1² + f_2² + … + f_p²)

where f = [f_1, f_2, …, f_p] is a network output feature of dimension p;

step 4.2: the feature y obtained from the L2 normalization of step 4.1 is processed with PCA (Principal Component Analysis), the most widely used data dimensionality reduction algorithm, to yield a highly discriminative hierarchical pedestrian descriptor.
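The L2 normalization and PCA post-processing of steps 4.1 and 4.2 can be sketched as follows (SVD-based PCA; function names and the target dimension are assumptions of this sketch):

```python
import numpy as np

def l2_normalize(f, eps=1e-12):
    """y_i = f_i / ||f||_2, applied row-wise to a batch of features."""
    return f / (np.linalg.norm(f, axis=1, keepdims=True) + eps)

def pca_reduce(y, dim):
    """Project centered features onto their top `dim` principal axes."""
    yc = y - y.mean(axis=0)
    # rows of vt are principal axes, ordered by decreasing variance
    _, _, vt = np.linalg.svd(yc, full_matrices=False)
    return yc @ vt[:dim].T

feats = np.random.rand(50, 8)   # 50 pedestrian features of dimension 8
y = l2_normalize(feats)
z = pca_reduce(y, dim=3)        # reduced hierarchical descriptors
```

Normalizing before PCA keeps features of different branches on a comparable scale, so no single branch dominates the projection.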
In the above pedestrian re-identification method based on the trident convolutional neural network structure, in step 5 the specific steps of measuring the similarity include:

step 5.1: generally, methods based on the Euclidean distance and on XQDA (Cross-view Quadratic Discriminant Analysis) are adopted to judge pedestrian similarity;

step 5.2: the Euclidean distance between two points x_1, x_2 in M-dimensional Euclidean space is expressed as:

d(x_1, x_2) = sqrt(Σ_{k=1}^{M} (x_{1k} - x_{2k})²)

where M denotes the feature dimension. In the present invention, x_1 and x_2 respectively represent the pedestrian features of two persons under different viewing angles, and the final measurement result is obtained by the Euclidean distance calculation. XQDA (Cross-view Quadratic Discriminant Analysis) is used to calculate the similarity between samples under different views; its distance is computed as:

d(x_i, x_j) = (x_i - x_j)^T (Σ_I^{-1} - Σ_E^{-1}) (x_i - x_j)

where x_i, x_j represent two cross-view samples, and Σ_I and Σ_E are the intra-personal and extra-personal sample covariance matrices. The pedestrian similarity is measured with these two distance calculations respectively, and the measurement results are ranked to obtain the final pedestrian re-identification result.
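The two metrics of step 5.2 can be sketched as follows. The XQDA metric matrix M = Σ_I⁻¹ − Σ_E⁻¹ is taken as given here (learning it from training data is outside this sketch), and all names are illustrative:

```python
import numpy as np

def euclidean(x1, x2):
    """Euclidean distance between two M-dimensional feature vectors."""
    return np.sqrt(np.sum((x1 - x2) ** 2))

def quadratic_distance(xi, xj, m):
    """XQDA-style quadratic distance d = (xi - xj)^T M (xi - xj), where
    M plays the role of (Sigma_I^{-1} - Sigma_E^{-1})."""
    diff = xi - xj
    return float(diff @ m @ diff)

query = np.array([1.0, 0.0])
gallery = np.array([[1.0, 0.1], [0.0, 1.0]])
# rank gallery entries by increasing distance to the query
ranking = np.argsort([euclidean(query, g) for g in gallery])
```

With M set to the identity, the quadratic distance reduces to the squared Euclidean distance, which is a convenient sanity check.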
Example 1
Preparation work:
1. hypothesis: C_a and C_b are two camera views under different spatial and regional environments (the invention takes two cameras as an example); the pedestrian data sets under the two cameras are {x_i^a} (i = 1, …, n) and {x_j^b} (j = 1, …, m) respectively, where n and m denote the number of persons under each camera view;
2. the pedestrian data collected from actual monitoring are preprocessed, mainly by ZCA whitening together with normalization, mean removal and other related operations, to obtain de-noised pedestrian images;
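A minimal numerical sketch of the ZCA whitening mentioned above (eigendecomposition-based, with a small smoothing constant eps; all names are assumptions of this sketch):

```python
import numpy as np

def zca_whiten(x, eps=1e-5):
    """ZCA-whiten row-wise samples: z = x_c U diag(1/sqrt(s + eps)) U^T,
    where U, s come from the eigendecomposition of the sample covariance.
    The whitened data has near-identity covariance while staying close
    to the original input space."""
    xc = x - x.mean(axis=0)                # mean removal
    cov = xc.T @ xc / xc.shape[0]          # sample covariance
    s, u = np.linalg.eigh(cov)             # eigenvalues s, eigenvectors u
    w = u @ np.diag(1.0 / np.sqrt(s + eps)) @ u.T  # symmetric ZCA transform
    return xc @ w.T

rng = np.random.default_rng(0)
x = rng.standard_normal((2000, 3)) * np.array([1.0, 3.0, 0.5])
z = zca_whiten(x)
```

Unlike PCA whitening, the ZCA transform rotates the decorrelated data back into the original coordinate system, which is why the whitened image still resembles the input.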
3. the triplet data are obtained: anchor pedestrians, positive samples and negative samples are selected, mainly in an online-generation manner, to form the corresponding input data.
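The online triplet generation of item 3 can be sketched as follows (toy images represented by strings; the random sampling policy is an assumption of this sketch, not the patented selection strategy):

```python
import random

def make_triplets(dataset):
    """Online triplet sampling: for each (image, person_id) entry pick a
    positive with the same id and a negative with a different id.
    `dataset` is a list of (image, person_id) pairs."""
    by_id = {}
    for img, pid in dataset:
        by_id.setdefault(pid, []).append(img)
    triplets = []
    for img, pid in dataset:
        positives = [p for p in by_id[pid] if p is not img]
        negatives = [q for qid, imgs in by_id.items() if qid != pid
                     for q in imgs]
        if positives and negatives:
            triplets.append((img, random.choice(positives),
                             random.choice(negatives)))
    return triplets

data = [("a1", 1), ("a2", 1), ("b1", 2), ("b2", 2)]
trips = make_triplets(data)  # one triplet per anchor image
```

Each generated triplet then supplies one <I, I+, I-> input to the trident network.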
The method comprises the following specific implementation steps:
1. design of trifurcate multi-branch convolutional neural network
Specifically, the trident convolutional neural network model designed by the present invention mainly comprises a semantic convolutional neural network, a spatial-position-information convolutional neural network and a color convolutional neural network, as shown in fig. 2. The semantic convolutional neural network comprises 7 convolutional layers and 5 pooling layers, and the output features of the final fully connected layer are taken as the semantic information of the pedestrian; the color convolutional neural network comprises 4 convolutional layers and 3 pooling layers, and the last fully connected layer is taken as the color information of the pedestrian; the position convolutional neural network comprises 5 convolutional layers, 2 pooling layers and a spatial pyramid layer, and outputs the spatial position information of the pedestrian; and the last two fully connected layers of each sub-network serve as the hierarchical pedestrian feature description;
2. training of trident multi-branch convolutional neural networks
The convolutional neural network model designed by the invention is trained by back propagation, with the triplet loss as the concrete cost function, expressed as:

L = (1/N) Σ_{i=1}^{N} max(0, d(W, I_i) + C)

The loss is computed by means of back propagation, whose gradient

∂L/∂W = (1/N) Σ_{i=1}^{N} ∂ max(0, d(W, I_i) + C) / ∂W

realizes the calculation of the network model.
3. Acquisition of hierarchical depth features
After the network structure has been optimized, the test image is propagated forward, its forward convolution operation being expressed, as in formula (4), as:

x_i^(l) = ReLU(k^(l) * x_i^(l-1) + b_k^(l))
In the invention, the last two fully connected layers of each convolutional neural network are used as pedestrian features, yielding the hierarchical features of each network structure. Compared with traditional deep-learning-based methods, the depth features obtained in this way are more discriminative.
4. Normalization processing of depth-level features
For the hierarchical output features of each sub deep neural network, the invention applies L2 normalization followed by PCA (Principal Component Analysis) to achieve dimensionality reduction and noise reduction of the hierarchical features, thereby obtaining more discriminative pedestrian features.
5. Similarity measure to pedestrian features
According to the processed pedestrian features obtained in step 4, the similarity measurement of pedestrians is realized using the Euclidean distance and XQDA metrics (the detailed principle is described in step 5 of the summary of the invention).
The above-mentioned embodiments merely illustrate the implementation of the present invention; although the description is specific and detailed, it should not be understood as limiting the scope of the invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the present invention, all of which fall within the protection scope of the invention; the scope of the present invention should therefore be determined by the appended claims.

Claims (8)

1. A pedestrian re-identification method based on a tridentate multi-branch network structure is characterized by comprising the following steps:
step 1, acquiring pedestrian image data under different cameras and using the pedestrian image data as input data of a network, wherein the pedestrian image data comprises a triple image consisting of an anchor point sample, a positive sample image and a negative sample image;
step 2, preprocessing the pedestrian image obtained in the step 1, whitening each pedestrian image sample, and normalizing and subtracting a mean value;
step 3, designing and training a network structure; inputting the preprocessed pedestrian image data into a trident network structure for optimization training, designing the trident network structure, and then realizing the optimization training of the network structure in a triple loss mode to obtain the depth features of the pedestrian levels;
the trident network structure comprises three convolutional neural network structures, namely a semantic convolutional neural network (semantic CNN, S-CNN), a color convolutional neural network (color CNN, C-CNN) and a position convolutional neural network (location CNN, L-CNN); the S-CNN is used for acquiring the semantic information features of the pedestrian, i.e. the final high-level abstract features of the deep network structure; the C-CNN is used for acquiring the low-level visual features of the pedestrian; and the L-CNN is used for extracting the spatial position information of the pedestrian;
step 4, applying L2 normalization to the different pedestrian depth-level features obtained in step 3, so as to obtain the final hierarchical depth feature describing the pedestrian;
and 5, measuring the pedestrian similarity of the pedestrian level depth features obtained in the step 4 by adopting a distance measurement mode on different pedestrian samples so as to obtain a final pedestrian re-identification result.
2. The method according to claim 1, wherein in step 1 the acquisition of pedestrian image data under different cameras comprises the following specific steps:
step 1.1, take a pedestrian image I as the anchor sample; its positive sample image, denoted I+, is an image of the same person as the anchor sample, and its negative sample image, denoted I-, is an image of a different person from the anchor sample;
and step 1.2, for the selection of the triple images, generating triple pedestrian image data by adopting a data enhancement method so as to obtain pedestrian triple image data adaptive to network structure training.
3. The method according to claim 1, wherein in step 2 the preprocessing of the pedestrian image comprises the following specific steps:
step 2.1, whitening and dimensionality reduction are combined to change the covariance matrix of the input data into the identity matrix I. Specifically, if R is any orthogonal matrix, it satisfies RR^T = R^T R = I (R is a rotation or reflection matrix), and the ZCA whitening result is defined as: x_ZCAwhite = U x_PCAwhite, where U is the eigenvector matrix of the covariance matrix of the data and x_PCAwhite is the PCA-whitened data. ZCA whitening is thus equivalent to rotating the PCA-whitened data back into the original space, so that the result x_ZCAwhite is as close as possible to the original input data x;
and 2.2, obtaining a data result after ZCA whitening, carrying out normalization processing on the data, and subtracting the mean value of the data to make the data more accord with the data form input by the tridentate network structure.
4. The method according to claim 1, wherein the network structure of the color convolutional neural network is as follows: the original pedestrian surveillance image is first decomposed into RGB component maps, the color features of each component are extracted through a network model based on CaffeNet, namely through convolutional layer 1, convolutional layer 2 and convolutional layer 3, and the color components are then merged and passed through sub-convolutional layer 1 and three sub-fully-connected layers to obtain the color features of the pedestrian;
the network structure of the position convolutional neural network is as follows: pedestrian convolution features are extracted through convolutional layer 1, convolutional layer 2, convolutional layer 3 and convolutional layer 4, and a multi-scale spatial pyramid pooling layer structure is designed to obtain pedestrian features at different scales; features related to spatial position information are extracted through sub-convolutional layer 2 and then passed through the spatial pyramid layer and three sub-fully-connected layers, the pyramid structure being designed as 16 × 256, 4 × 256 and 256 dimensional pooling layer structures respectively; finally the features are cascaded to form the spatial features of the pedestrian;
the network structure of the semantic convolutional neural network is as follows: on the basis of the CaffeNet network structure, the input image convolution features are extracted through convolutional layer 1, convolutional layer 2 and convolutional layer 3, and the convolution features are locally segmented using a segmentation algorithm, i.e. the head, trunk, leg and shoe parts are respectively obtained through convolutional layer 4, convolutional layer 5, convolutional layer 6 and convolutional layer 7 so as to obtain the regions of interest; finally the corresponding semantic attributes of the pedestrian are obtained through a global average pooling operation and three fully connected layers;
wherein, except for convolutional layers 3 and 4, each convolutional layer is followed by a pooling layer.
5. The method according to claim 1, wherein in step 3 the optimization training of the trident convolutional neural network structure comprises the following specific steps:
step 3.1, the data are input into convolutional neural networks of different structures: the semantic convolutional neural network comprises 7 convolutional layers, 5 pooling layers and 3 fully connected layers, and the last fully connected layer outputs the semantic information features of the pedestrian; the color convolutional neural network structure comprises 4 convolutional layers, 3 pooling layers and 3 fully connected layers, and the last fully connected layer gives the color information of the pedestrian; the position convolutional neural network comprises 5 convolutional layers, 2 pooling layers, a spatial pyramid layer and 3 fully connected layers, and outputs the spatial position information of the pedestrian;
step 3.2, the hierarchical pedestrian features are acquired according to the network structure designed in step 3.1: the last-layer feature of each branch network is taken as the pedestrian feature and the penultimate-layer feature as the pedestrian descriptor, so as to obtain more discriminative hierarchical pedestrian features;
step 3.3, for the input pedestrian image triplet <I, I+, I->, the network structure is optimized according to the inherent property of pedestrian re-identification that the distance between images of the same person is smaller than the distance between images of different persons, namely:
d(I, I+) < d(I, I-)
where d(I, I+) denotes the distance between images of the same person and d(I, I-) the distance between images of different pedestrians; specifically, the input pedestrian triplet data <I, I+, I-> is trained through the trident network and the obtained depth features are expressed as <g_w(I), g_w(I+), g_w(I-)>, where g_w(I) is the anchor pedestrian feature, g_w(I+) the positive-sample pedestrian feature, g_w(I-) the negative-sample pedestrian feature, and w the network parameters; the specific network training process must keep
‖g_w(I) - g_w(I+)‖ < ‖g_w(I) - g_w(I-)‖    (1)
always true; to facilitate the error calculation, formula (1) can be expressed as:
‖g_w(I) - g_w(I+)‖² < ‖g_w(I) - g_w(I-)‖²    (2)
a derivative calculation is performed on the transformed cost function.
6. The method according to claim 5, wherein the derivative calculation on the transformed cost function adopts the idea of back propagation, and specifically comprises the following steps:
step 3.3.1, the similarity distances between different pedestrians are first calculated, and the loss function expressed by formula (2) is transformed into:
L = (1/N) Σ_{i=1}^{N} max(0, d(w, I_i) + C)    (3)
where L is the loss of the three networks in the network structure, d(w, I_i) is the difference between the distance of the same pedestrian and the distance of different pedestrians, N is the total number of pedestrian samples, and C is the boundary margin constraining the positive and negative samples;
step 3.3.2, in the overall network structure, convolutional layers 1, 2 and 3 use convolution kernels of size 11 × 11, 5 × 5 and 3 × 3 respectively, and the forward convolution operation is expressed as:
x_i^(l) = ReLU(k^(l) * x_i^(l-1) + b_k^(l))    (4)
where x_i^(l) and x_i^(l-1) respectively denote the outputs of the i-th neuron of layer l and of layer l-1, k^(l) is the convolution kernel between layer l and layer l-1, b_k^(l) is the bias term of the k-th feature map of layer l, and ReLU (Rectified Linear Unit) is the activation function between two convolutional layers;
step 3.3.3, after one forward propagation pass the error is calculated, and back propagation is performed; the partial derivative of formula (3) is calculated as:
∂L/∂W = (1/N) Σ_{j=1}^{N} ∂ max(0, d(W, I_j) + C) / ∂W    (5)
∂ max(0, d(W, I_j) + C) / ∂W = ∂d(W, I_j)/∂W if d(W, I_j) + C > 0, and 0 otherwise    (6)
wherein W represents the network parameters and I_j the j-th pedestrian image;
differentiating d(W, I_j), the gradient is obtained as:
∂d(W, I_j)/∂W = 2(g_w(I_j) - g_w(I_j+)) ∂(g_w(I_j) - g_w(I_j+))/∂W - 2(g_w(I_j) - g_w(I_j-)) ∂(g_w(I_j) - g_w(I_j-))/∂W    (7)
step 3.3.4: from the results derived in step 3.3.3, the gradients ∂d(W, I_j)/∂g_w(I_j+) and ∂d(W, I_j)/∂g_w(I_j-) with respect to the positive-sample and negative-sample features are respectively calculated to obtain the final loss;
and the partial derivative formulas are substituted into a gradient descent algorithm to minimize L, so that the back-propagated loss of each layer is solved and the final network optimization process is realized.
7. The method according to claim 1, wherein in step 4 the L2 normalization of the depth-level features comprises the following specific steps:
step 4.1, according to the network structure optimized in step 3, depth features of different levels (color, spatial position information and high-level semantic information) are obtained, and L2 normalization preprocessing is applied to the different input features, computed as:
y_i = f_i / ‖f‖_2 = f_i / sqrt(f_1² + f_2² + … + f_p²)
where f = [f_1, f_2, …, f_p] is a network output feature of dimension p;
step 4.2, the feature y obtained from the L2 normalization of step 4.1 is processed with PCA (Principal Component Analysis) to obtain a highly discriminative hierarchical pedestrian descriptor.
8. The method according to claim 1, wherein in step 5 the specific steps of the similarity measurement include:
pedestrian similarity discrimination based on the Euclidean distance, where the distance between two points x_1, x_2 in M-dimensional Euclidean space is expressed as:
d(x_1, x_2) = sqrt(Σ_{k=1}^{M} (x_{1k} - x_{2k})²)
where M denotes the feature dimension; x_1 and x_2 respectively represent the pedestrian features of two persons under different viewing angles, the final measurement result is obtained by the Euclidean distance calculation, and the measurement results are ranked to obtain the final pedestrian re-identification result.
CN202210215993.9A 2022-03-07 2022-03-07 Pedestrian re-identification method based on trigeminal convolutional neural network Active CN114863505B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210215993.9A CN114863505B (en) 2022-03-07 2022-03-07 Pedestrian re-identification method based on trigeminal convolutional neural network


Publications (2)

Publication Number Publication Date
CN114863505A true CN114863505A (en) 2022-08-05
CN114863505B CN114863505B (en) 2024-04-16


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180137415A1 (en) * 2016-11-11 2018-05-17 Minitab, Inc. Predictive analytic methods and systems
AU2020103901A4 (en) * 2020-12-04 2021-02-11 Chongqing Normal University Image Semantic Segmentation Method Based on Deep Full Convolutional Network and Conditional Random Field
CN113516012A (en) * 2021-04-09 2021-10-19 湖北工业大学 Pedestrian re-identification method and system based on multi-level feature fusion
CN113553908A (en) * 2021-06-23 2021-10-26 中国科学院自动化研究所 Heterogeneous iris identification method based on equipment unique perception

Non-Patent Citations (1)

Title
Xia Kaiguo; Tian Chang: "Pedestrian Re-identification Based on Multi-Auxiliary-Branch Deep Networks", Communications Technology, no. 11, 10 November 2018 (2018-11-10) *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant