CN112766180B - Pedestrian re-identification method based on feature fusion and multi-core learning - Google Patents

Pedestrian re-identification method based on feature fusion and multi-core learning

Info

Publication number
CN112766180B
Authority
CN
China
Prior art keywords
pedestrian
kernel
fusion
features
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110087196.2A
Other languages
Chinese (zh)
Other versions
CN112766180A (en)
Inventor
雷大江
张鑫
冉港生
张莉萍
吴渝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202110087196.2A priority Critical patent/CN112766180B/en
Publication of CN112766180A publication Critical patent/CN112766180A/en
Application granted granted Critical
Publication of CN112766180B publication Critical patent/CN112766180B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features

Abstract

The invention relates to the field of pedestrian re-identification, and in particular to a pedestrian re-identification method based on feature fusion and multi-kernel learning. The method comprises: acquiring a pedestrian image and preprocessing it; extracting bottom-layer features and middle-layer semantic features of the pedestrian image and performing feature fusion with a feature-weight combination method; mapping the fused features with different kernel functions respectively; assigning a weight to each kernel function by the centered-alignment method, combining the kernels linearly, and performing a composite mapping; and processing the composite-mapped fused features with a multinomial logistic regression algorithm, computing the similarity between pedestrian images, and sorting the similarities in descending order to obtain the pedestrian classification value, i.e. the pedestrian re-identification result. The fused features adopted by the invention are robust and reflect the characteristics of pedestrians more faithfully; by adding a multi-kernel learning method and mapping the fused features into a high-dimensional space, the features are better expressed, thereby enhancing the classification performance on the pedestrian re-identification problem.

Description

Pedestrian re-identification method based on feature fusion and multi-kernel learning
Technical Field
The invention relates to the field of pedestrian re-identification, and in particular to a pedestrian re-identification method based on feature fusion and multi-kernel learning.
Background
Pedestrian re-identification is a computer vision technology that has attracted wide attention in recent years. Research on it has produced a series of results, it has become a research hotspot in the field of computer vision, and it is widely applied in pedestrian tracking, behavior analysis, criminal investigation and other fields.
Pedestrian re-identification refers to determining whether images captured by different cameras with non-overlapping fields of view show the same person; current research mainly targets pedestrians appearing in videos and in still images. Pedestrian re-identification developed out of pedestrian detection and pedestrian recognition. The purpose of pedestrian detection is to segment pedestrians appearing in videos or images from the background and locate them accurately; this technology is mainly used in related fields such as intelligent driving, driver assistance and intelligent surveillance. Pedestrian recognition identifies a specified person from an input picture or video in order to determine information such as identity or attributes, and is mainly used for image retrieval and the like. Pedestrian re-identification differs from both in that it must compare the features of the pedestrian to be searched with those of other pedestrians in order to judge whether they are the same person.
In practical application scenarios, the resolution of pedestrian images in video is often low, the image information is blurred, and the appearance of the same person varies greatly under the influence of camera characteristics, illumination, viewing angle, posture, occlusion and other factors. This makes feature extraction complex, and features that are both discriminative and robust are difficult to obtain. Moreover, conventional pedestrian re-identification methods classify pedestrian samples with linear methods, whereas real pedestrian sample data are often nonlinear, which further increases the difficulty of re-identification.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a pedestrian re-identification method based on feature fusion and multi-kernel learning, which comprises the following steps:
acquiring a pedestrian image and preprocessing it;
extracting bottom-layer features and middle-layer semantic features of the pedestrian image, and performing feature fusion by a feature-weight combination method;
mapping the fused features with different kernel functions respectively;
assigning a weight to each kernel function by a centered-alignment method, combining the kernels linearly to obtain a composite kernel function, and performing a composite mapping of the fused features with the composite kernel function;
and processing the composite-mapped fused features with a multinomial logistic regression algorithm, computing the similarity between pedestrian images, and sorting the similarities in descending order to obtain the pedestrian classification value, i.e. the pedestrian re-identification result.
The embodiment of the invention has the following beneficial effects:
(1) The invention extracts bottom-layer features and middle-layer semantic features of pedestrians and fuses them with a feature-weight combination method.
(2) The invention adds a multi-kernel learning method: the fused features are mapped into a high-dimensional space with different kernel functions, or with the same kernel function under different parameters, so that the features are better expressed and the classification performance on the pedestrian re-identification problem is enhanced.
Drawings
Fig. 1 is a flowchart of the pedestrian re-identification method based on feature fusion and multi-kernel learning in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment provides a pedestrian re-identification method based on feature fusion and multi-kernel learning, as shown in fig. 1, the method includes:
s101, acquiring a pedestrian image, and preprocessing the pedestrian image;
In some embodiments, the pedestrian image is divided into a pedestrian region and a background region, and the information of the background part is removed.
First, the pedestrian images used in the invention mainly come from video recordings of pedestrians on roads or in other scenes. In an embodiment of the invention, different surveillance cameras on the same street whose shooting ranges do not overlap are selected, and surveillance video from a preset time period is captured as the basis for selecting pedestrian images. The captured video is preprocessed and pedestrian images are extracted to obtain a number of pedestrian sample images; the same pedestrian is then identified across these samples so that different pedestrians can be distinguished, yielding a set of pedestrian sample images that carry pedestrian identity labels.
It should be noted that the number of surveillance videos selected on the same street may be chosen according to the actual situation, and the number of pedestrian sample images should be as large as practical to ensure the accuracy of the subsequent steps; the invention does not limit either of these.
For example, a first surveillance camera and a second surveillance camera, adjacent on a certain street and with non-overlapping coverage, are selected. The video from 8 to 10 a.m. of each camera is captured, preprocessed and subjected to pedestrian image extraction, yielding 100 pedestrian sample images from the first camera and 100 from the second. The obtained sample images are then labeled so that images of the same pedestrian carry the same label; these labels are used in the subsequent multinomial logistic regression algorithm.
In some preferred embodiments, the collected pedestrian images are segmented with deep learning, which may include, but is not limited to, ordinary image segmentation, semantic segmentation and instance segmentation. In this embodiment ordinary segmentation is preferred: it separates the pixel regions of objects of different categories, such as foreground and background, or pedestrian and background regions. The category of each pixel can be determined from the image patch around it, and using a CNN greatly improves segmentation accuracy.
In the present invention, any conventional technique may be adopted for the segmentation method of the pedestrian image, and the present invention is not limited to this.
S102, extracting bottom layer features and middle layer semantic features of the pedestrian image, and performing feature fusion by adopting a feature weight combination method;
In the embodiment of the invention, the bottom-layer features and the middle-layer semantic features of the pedestrian are extracted separately and then fused, so that each type of feature compensates for what the other fails to express and the fused representation is more robust.
In some embodiments, the bottom-layer features include basic image features such as color, texture and gradient; these single features can be computed quickly, for example with the integral image technique.
In this embodiment, the bottom-layer features include a color histogram feature and a multi-level guided edge energy feature.
For the color histogram feature, the pedestrian image is divided into six horizontal bands and a color histogram is extracted from each band. Since different color spaces have different characteristics, the embodiment of the invention fuses histograms from four color spaces: RGB, HSV, CIELAB and YCbCr.
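As an illustration, the following is a minimal sketch of the band-wise color histogram (assuming OpenCV and NumPy): six horizontal bands as in the text, while the bin count, per-histogram ℓ1 normalization and the exact OpenCV color-space constants are assumptions rather than details from the patent.

```python
# Sketch of band-wise color histograms over four color spaces (not the patented
# implementation); assumes an OpenCV BGR uint8 input and 6 horizontal bands.
import cv2
import numpy as np

COLOR_SPACES = {
    "RGB": cv2.COLOR_BGR2RGB,
    "HSV": cv2.COLOR_BGR2HSV,
    "Lab": cv2.COLOR_BGR2LAB,
    "YCbCr": cv2.COLOR_BGR2YCrCb,
}

def band_color_histograms(img_bgr, n_bands=6, bins=16):
    """Concatenate per-band, per-channel color histograms over 4 color spaces."""
    h = img_bgr.shape[0]
    feats = []
    for code in COLOR_SPACES.values():
        converted = cv2.cvtColor(img_bgr, code)
        for b in range(n_bands):
            band = converted[b * h // n_bands:(b + 1) * h // n_bands]
            for c in range(3):
                hist, _ = np.histogram(band[..., c], bins=bins, range=(0, 256))
                feats.append(hist / max(hist.sum(), 1))  # L1-normalise each histogram
    return np.concatenate(feats)   # 4 spaces x 6 bands x 3 channels x bins values
```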
The multi-level guided edge energy feature is a multi-scale variant of the HOG feature; it is simpler, has no overlapping cells, and has a lower dimension than HOG. For a 64 × 128 input image, the HOG feature has 3780 dimensions while this feature has 1360 dimensions, roughly a third of the size, yet its classification accuracy is not reduced.
The multi-level guided edge energy feature is extracted as follows: first, the edge energy responses of the image in several directions are computed; assume 8 directions are set. The 8-direction energy responses can be ℓ1-normalized within overlapping cells of fixed size (6 × 16). Several levels can then be set according to the user's requirements: at each level the image is divided into non-overlapping cells, the normalized energy responses within each cell are accumulated into a histogram feature, and finally the levels are weighted and summed to form the feature vector.
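The following is a rough sketch of this idea, not the patent's exact feature: gradient energy is binned into 8 orientations, ℓ1-normalized per cell and pooled over several grid levels with level weights. The grid sizes and weights are assumptions, and for simplicity the normalization is done per non-overlapping cell rather than over overlapping blocks.

```python
# Sketch of a multi-level oriented edge-energy feature (an interpretation of the
# description above, not the patent's exact definition).
import numpy as np

def edge_energy_feature(gray, n_orient=8, levels=(2, 4, 8), level_weights=None):
    gy, gx = np.gradient(gray.astype(np.float64))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)                      # unsigned orientation
    obin = np.minimum((ang / np.pi * n_orient).astype(int), n_orient - 1)
    level_weights = level_weights or [1.0] * len(levels)
    h, w = gray.shape
    feats = []
    for wt, n in zip(level_weights, levels):                     # n x n cell grid per level
        hist = np.zeros((n, n, n_orient))
        for i in range(n):
            for j in range(n):
                ys, ye = i * h // n, (i + 1) * h // n
                xs, xe = j * w // n, (j + 1) * w // n
                cell_mag = mag[ys:ye, xs:xe]
                cell_bin = obin[ys:ye, xs:xe]
                for o in range(n_orient):
                    hist[i, j, o] = cell_mag[cell_bin == o].sum()
        hist /= np.maximum(hist.sum(axis=-1, keepdims=True), 1e-12)   # per-cell L1 norm
        feats.append(wt * hist.ravel())
    return np.concatenate(feats)
```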
For the middle-layer semantic feature, a color name descriptor is extracted. The color name descriptor maps the (R, G, B) triple of each pixel to an 11-dimensional probability vector over 11 predefined basic colors; that is, each (R, G, B) triple belongs to each color with a certain probability, rather than through a simple one-to-one mapping. Each (R, G, B) triple is first mapped to an index according to its pixel value, with each of the R, G and B channels divided into 32 small intervals, and the relation between this index and the 11-dimensional probability vector is learned by probabilistic latent semantic analysis.
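A sketch of applying such a color-name mapping follows. The 32×32×32 → 11-dimensional probability table cn_table is assumed to have been learned offline (e.g. by probabilistic latent semantic analysis, as the text describes) and is not derived here; the pooling over six horizontal bands follows the band division used for the other features.

```python
# Sketch of the colour-name descriptor lookup; `cn_table` is an assumed,
# externally learned (32*32*32, 11) row-stochastic table.
import numpy as np

def color_name_descriptor(img_rgb, cn_table, n_bands=6):
    """img_rgb: HxWx3 uint8; cn_table: (32*32*32, 11) probability table."""
    idx = (img_rgb // 8).astype(np.int64)                     # 256 / 8 = 32 bins per channel
    flat_idx = idx[..., 0] * 32 * 32 + idx[..., 1] * 32 + idx[..., 2]
    probs = cn_table[flat_idx]                                # HxWx11 soft colour assignment
    h = img_rgb.shape[0]
    bands = [probs[b * h // n_bands:(b + 1) * h // n_bands].mean(axis=(0, 1))
             for b in range(n_bands)]
    return np.concatenate(bands)                              # 6 bands x 11 dimensions
```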
Conventional hybrid features are fusions of various bottom-layer features, or higher-order statistics of them. They can describe the image from different aspects and improve detection accuracy, but as the feature dimension grows, both feature computation and classifier detection time increase, which hurts real-time performance. Learning-based features generally refer to features learned by a neural network directly from raw images. They can learn highly discriminative features from a large number of samples and perform excellently in pedestrian detection, but their computation depends on high-performance hardware and is closely tied to the training samples; if the samples are not representative, good features are hard to learn.
The present method breaks through these limitations of the conventional techniques by fusing bottom-layer and middle-layer semantic features: specifically, the color histogram feature and the multi-level guided edge energy feature from the bottom layer are fused with the color name descriptor from the middle layer, achieving feature complementarity and enhancing the robustness of the pedestrian representation.
In the embodiment of the invention, the three extracted features are each reduced in dimension by PCA and then fused with a feature-weight combination method. The weighting follows the LDA idea that the similarity between samples of the same class is far higher than the similarity between samples of different classes: the weight coefficient of each feature is the square of the difference between the mean intra-class sample similarity and the mean inter-class sample similarity, divided by the sum of the intra-class similarity covariance and the inter-class similarity covariance.
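A sketch of this weighted fusion, assuming scikit-learn's PCA and cosine similarity as the pairwise similarity; the variance of the pairwise similarities stands in for the covariance terms in the ratio described above.

```python
# Sketch of PCA reduction plus Fisher-style feature weighting and concatenation.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics.pairwise import cosine_similarity

def fisher_weight(feat, labels):
    """(mean_intra - mean_inter)^2 / (var_intra + var_inter) for one feature type."""
    sim = cosine_similarity(feat)
    same = labels[:, None] == labels[None, :]
    iu = np.triu_indices_from(sim, k=1)                  # use each sample pair once
    intra, inter = sim[iu][same[iu]], sim[iu][~same[iu]]
    return (intra.mean() - inter.mean()) ** 2 / (intra.var() + inter.var() + 1e-12)

def fuse_features(feature_blocks, labels, n_components=64):
    """feature_blocks: list of (n_samples, d_i) arrays (colour hist, edge energy, colour names)."""
    labels = np.asarray(labels)
    fused = []
    for feat in feature_blocks:
        reduced = PCA(n_components=min(n_components, *feat.shape)).fit_transform(feat)
        fused.append(fisher_weight(reduced, labels) * reduced)
    return np.hstack(fused)
```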
S103, mapping the fusion characteristics by adopting different kernel functions respectively;
In the embodiment of the present invention, the fused features may be mapped with different kernel functions; that is, the fused features are given different mapped representations through the kernel matrices corresponding to those kernel functions.
The different kernel functions include kernel functions of different types as well as kernel functions of the same type with different parameters; for example, the kernel functions used in the invention may include Gaussian kernel functions and polynomial kernel functions.
Gaussian kernel function:

K(x_i, x_j) = exp( −‖x_i − x_j‖² / (2σ²) )

where K(x_i, x_j) denotes the mapping of the fused features x_i and x_j of two pedestrian sample images to a high-dimensional space with the Gaussian kernel, and σ > 0 is the Gaussian kernel bandwidth.
Polynomial kernel function:

K(x_i, x_j) = (x_i^T x_j)^d

where K(x_i, x_j) denotes the mapping of the fused features x_i and x_j to a high-dimensional space with the polynomial kernel, and d ≥ 1 is the degree of the polynomial.
In the above implementation, the invention can thus use different kernel functions to map the fused features x_i and x_j into different high-dimensional spaces.
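By way of illustration, a minimal NumPy sketch of step S103: Gaussian kernels with several bandwidths and polynomial kernels with several degrees are computed on the fused feature matrix. The particular parameter grid is an assumption, not a value taken from the patent.

```python
# Sketch of building the base kernel matrices K_1..K_p on the fused features.
import numpy as np

def gaussian_kernel(X, Y=None, sigma=2.0):
    Y = X if Y is None else Y
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)      # squared Euclidean distances
    return np.exp(-d2 / (2.0 * sigma ** 2))

def polynomial_kernel(X, Y=None, degree=2):
    Y = X if Y is None else Y
    return (X @ Y.T) ** degree

def base_kernels(X, sigmas=(0.5, 1.0, 2.0), degrees=(1, 2, 3)):
    """Return the list of base kernel matrices (several bandwidths and degrees)."""
    return ([gaussian_kernel(X, sigma=s) for s in sigmas] +
            [polynomial_kernel(X, degree=d) for d in degrees])
```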
S104, giving weight to each kernel function by using a center alignment method, carrying out linear combination to obtain a composite kernel function, and carrying out composite mapping on the fusion characteristics according to the composite kernel function;
In the embodiment of the present specification, the kernel functions need to be combined into a composite kernel function; the combination uses the centered-alignment method.
Centered kernel definition:
If a pedestrian sample image x is mapped onto a high-dimensional feature space and represented as φ(x), the centered kernel function K_c can be defined as

K_c(x, x') = (φ(x) − E_x[φ])^T (φ(x') − E_x'[φ])
           = K(x, x') − E_x[K(x, x')] − E_x'[K(x, x')] + E_{x,x'}[K(x, x')]

where K_c(x, x') denotes the centered kernel evaluated on the pedestrian sample images x and x'. As the second equality shows, the centered kernel can be computed from the kernel values alone, without explicit access to the feature mapping. From the definition of the centered kernel function, a centered kernel matrix can be defined analogously. Suppose the samples are D = (x_1, x_2, ..., x_m), with feature vectors φ(x_i), i ∈ [1, m], in the high-dimensional space, where m is the number of pedestrian sample images. Subtracting the empirical expectation centers the feature map, which can then be expressed as

φ_c(x_i) = φ(x_i) − (1/m) Σ_{a=1}^m φ(x_a).

The centered kernel matrix K_c then replaces the original kernel matrix K, so that the sample set D is centered. For all i, j ∈ [1, m]:

[K_c]_ij = K_ij − (1/m) Σ_{a=1}^m K_aj − (1/m) Σ_{b=1}^m K_ib + (1/m²) Σ_{a,b=1}^m K_ab

where [K_c]_ij is the centered kernel value for pedestrian sample images i and j, and K_ij the original kernel value. Writing Φ = [φ(x_1), φ(x_2), ..., φ(x_m)] and Φ_c = [φ_c(x_1), φ_c(x_2), ..., φ_c(x_m)], it follows that

K_c = Φ_c^T Φ_c,

and the centered kernel matrix K_c is positive semi-definite.
In addition, for kernel matrices, let ⟨·,·⟩_F denote the Frobenius inner product and ‖·‖_F the Frobenius norm, with ‖K‖_F = sqrt(⟨K, K⟩_F). The centered alignment of two kernel matrices K and K' is then

ρ(K, K') = ⟨K_c, K'_c⟩_F / ( ‖K_c‖_F · ‖K'_c‖_F ).

Define the all-ones vector 1 ∈ R^m and the identity matrix I ∈ R^{m×m}; then for any kernel matrix K ∈ R^{m×m} the centered kernel matrix can be written as

K_c = [ I − (1/m)·11^T ] K [ I − (1/m)·11^T ],

where 11^T is the outer product of the all-ones vector with itself. Moreover, for any two kernel matrices K and K':

⟨K_c, K'_c⟩_F = ⟨K, K'_c⟩_F = ⟨K_c, K'⟩_F.
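As a concrete illustration of these definitions, the following NumPy sketch centers a kernel matrix with I − 11^T/m and computes the centered alignment of two kernel matrices.

```python
# Sketch of kernel centring and centred alignment:
# K_c = (I - 11^T/m) K (I - 11^T/m),  rho(K, K') = <K_c, K'_c>_F / (||K_c|| * ||K'_c||).
import numpy as np

def center_kernel(K):
    m = K.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m        # centring matrix I - 11^T / m
    return H @ K @ H

def centered_alignment(K1, K2):
    K1c, K2c = center_kernel(K1), center_kernel(K2)
    num = np.sum(K1c * K2c)                    # Frobenius inner product
    return num / (np.linalg.norm(K1c) * np.linalg.norm(K2c) + 1e-12)
```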
Linear combination of kernels:
The optimal linear combination of kernels is learned by the centered-alignment method. Suppose p different kernel matrices K_1, K_2, ..., K_p have been obtained; the desired multi-kernel linear combination is then

K_μ = Σ_{q=1}^p μ_q · K_cq,

where K_cq is the centered kernel matrix of the q-th kernel, i.e. for each kernel matrix K_q

K_cq = [ I − (1/m)·11^T ] K_q [ I − (1/m)·11^T ].

The weight vector μ is obtained by maximizing the centered alignment with the ideal kernel, so the optimization objective is

max_μ  ρ(K_μ, y·y^T) = ⟨(K_μ)_c, y·y^T⟩_F / ( ‖(K_μ)_c‖_F · ‖y·y^T‖_F ).

Solving with the centered-alignment method yields μ = (μ_1, μ_2, ..., μ_p). Here y·y^T is the ideal kernel built from the target vector y, and ⟨·,·⟩_F is the matrix inner product defined above. Each weight μ_q is determined independently:
μ_q = ⟨K_cq, y·y^T⟩_F^d / Σ_{g=1}^p ⟨K_cg, y·y^T⟩_F^d,

where d is a user-defined hyper-parameter and the weights satisfy Σ_q μ_q = 1.
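A minimal sketch of this weighting under the independent-alignment reading above. It reuses center_kernel and centered_alignment from the previous sketch; the exponent d and the clipping of negative alignments to zero are assumptions rather than details taken from the patent.

```python
# Sketch of step S104: per-kernel weights from centred alignment with the ideal
# kernel yy^T, then a weighted composite kernel matrix.
import numpy as np

def alignment_weights(kernels, labels, d=1.0):
    labels = np.asarray(labels)
    Y = np.equal.outer(labels, labels).astype(float)     # ideal kernel: 1 if same pedestrian
    scores = np.array([max(centered_alignment(K, Y), 0.0) for K in kernels]) ** d
    return scores / scores.sum()                         # weights sum to 1

def composite_kernel(kernels, labels, d=1.0):
    mu = alignment_weights(kernels, labels, d)
    K_mu = sum(m * center_kernel(K) for m, K in zip(mu, kernels))
    return K_mu, mu
```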
And S105, processing the fusion features after the composite mapping by adopting a multiple logistic regression algorithm, calculating the similarity between the pedestrian images, and performing descending order arrangement by utilizing the similarity to obtain a pedestrian classification value, namely a pedestrian re-identification result.
In the embodiment of the invention, pedestrian re-identification can be treated as a multi-class classification problem: whether two pedestrians are the same person is judged by comparing the similarity between the image of the pedestrian to be searched and the other pedestrian images, and the pedestrians are classified with a multinomial logistic regression algorithm. The multinomial logistic regression algorithm is extended to the multi-class case, and, considering the high-dimensional features that arise in certain domains, a Laplace prior is introduced so that feature selection is embedded in the classification process and the solution of the optimization problem is sparse.
The loss function of the sparse multinomial logistic regression algorithm is then solved by optimization.
for the loss function is l1The non-convex function of the regularization term can be solved by using a soft threshold iteration Algorithm (ISTA).
For SMLR, the optimization objective is

min_W  l(W) + λ‖W‖_1,

where W ∈ R^{n×k} and ‖·‖_1 denotes the ℓ1 regularization term. It is not easy to minimize this directly, so the invention converts it into an easier form: l(W) is expanded to second order around W_t, with the Hessian approximated as H_l(W_t) ≈ I/τ, where I ∈ R^{n×n} and τ is the step size. Then

l(W) ≈ l(W_t) + ⟨∇l(W_t), W − W_t⟩ + (1/(2τ))·‖W − W_t‖_F².
The objective of ISTA can then be rewritten in the following form:

W_{t+1} = argmin_W { l(W_t) + ⟨∇l(W_t), W − W_t⟩ + (1/(2τ))·‖W − W_t‖_F² + λ‖W‖_1 }.

Thus, the minimization problem becomes

W_{t+1} = argmin_W { ⟨∇l(W_t), W − W_t⟩ + (1/(2τ))·‖W − W_t‖_F² + λ‖W‖_1 }.

By a simple algebraic transformation, this minimization can be rewritten as

W_{t+1} = argmin_W { (1/(2τ))·‖W − (W_t − τ·∇l(W_t))‖_F² + λ‖W‖_1 },
where constants irrelevant to the minimization over W have been dropped. This minimization can be solved quickly. Let

W'_t = W_t − τ·∇l(W_t);

the solution of the rewritten problem is then S_{λτ}(W'_t), where the soft-threshold operator S_λ : R^n → R^n is defined element-wise as

[S_λ(x)]_i = sign(x_i) · max(|x_i| − λ, 0),

the index i indicating that the soft-threshold operation is applied element by element.
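The soft-threshold operator and one proximal-gradient (ISTA) step can be written directly from the formulas above; the sketch below assumes a function grad_l that returns the gradient of the smooth multinomial logistic loss term.

```python
# Sketch of the element-wise soft-threshold operator and a single ISTA update
# W_{t+1} = S_{lambda*tau}(W_t - tau * grad_l(W_t)).
import numpy as np

def soft_threshold(W, thresh):
    return np.sign(W) * np.maximum(np.abs(W) - thresh, 0.0)

def ista_step(W, grad_l, lam, tau):
    """One proximal-gradient step of sparse multinomial logistic regression."""
    return soft_threshold(W - tau * grad_l(W), lam * tau)
```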
Combining the composite kernel function obtained from the several kernel functions with the sparse multinomial logistic regression algorithm enhances the classification performance.
The kernel method is added to the sparse multinomial logistic regression algorithm to handle the nonlinear multi-class problem: multi-kernel learning is combined with the SMLR algorithm, and fusing several kernel functions yields the Multiple Kernel Sparse Multinomial Logistic Regression algorithm (MKSMLR). After the kernel matrix is introduced, line-search iterations become very slow, so, considering the efficiency of the optimization, a fast iterative shrinkage-thresholding algorithm (FISTA) is adopted to accelerate the solution. On top of the original ISTA algorithm it adds the Nesterov acceleration strategy, improving the convergence rate: over T iterations the worst-case convergence rate improves from O(1/T) to O(1/T²).
The loss function of the multi-kernel-learning-based multinomial logistic regression algorithm MKSMLR is expressed as

L(α) = − Σ_{i=1}^m Σ_{j=1}^k y_i^(j) · log( exp(α_j^T k_μ(x_i)) / Σ_{l=1}^k exp(α_l^T k_μ(x_i)) ) + λ · Σ_{j=1}^k ‖α_j‖_1,

where y_i^(j) is the true-class label indicator of the i-th pedestrian sample image for the j-th class; k_μ(x_i) denotes the mapping of the i-th pedestrian sample image under the composite kernel function; and α_j^T is the transpose of the parameter vector associated with the j-th class, each such vector being n-dimensional.
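A hedged sketch of this objective under the per-class reading used above: softmax cross-entropy on the composite-kernel rows plus an ℓ1 penalty on α. The helper names and the 1/m scaling of the data term are assumptions.

```python
# Sketch of the MKSMLR loss and the gradient of its smooth part.
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    e = np.exp(Z)
    return e / e.sum(axis=1, keepdims=True)

def mksmlr_loss_grad(alpha, K_mu, Y_onehot, lam):
    """alpha: (n, k); K_mu: (m, n) composite-kernel rows; Y_onehot: (m, k)."""
    P = softmax(K_mu @ alpha)
    loss = -np.sum(Y_onehot * np.log(P + 1e-12)) / len(K_mu) + lam * np.abs(alpha).sum()
    grad = K_mu.T @ (P - Y_onehot) / len(K_mu)      # gradient of the smooth part only
    return loss, grad
```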
In some preferred embodiments, the fast soft-threshold iterative algorithm used in the invention is set up as follows: step size τ = 1/L with L > 0; initialize the parameters α ∈ R^{n×k}; initialize the kernel parameters, specifically the Gaussian kernel bandwidth σ = 2; maximum number of iterations iter = 500; backtracking parameter β ∈ (0, 1); the algorithm outputs the final parameters α_{t+1}.
The specific iteration proceeds as follows:
1: compute the p different kernel matrices from the pedestrian images X^(i);
2: compute the multi-kernel learning weights μ by the centered-alignment method and generate the new composite kernel matrix K;
3: initialize the counter t ← 0;
4: initialize the parameters α_t ← α, μ_t ← 1, v_t ← α_t;
5: α_{t+1} = p_τ(v_t);
6: μ_{t+1} = (1 + sqrt(1 + 4·μ_t²)) / 2;
7: v_{t+1} = α_{t+1} + ((μ_t − 1)/μ_{t+1})·(α_{t+1} − α_t);
8: τ = β·τ;
9: if |l(α_{t+1}) − l(α_t)| ≤ ε, or the specified number of iteration steps is reached, the algorithm terminates and step 10 is performed; otherwise let t ← t + 1 and return to step 5;
10: return the updated algorithm parameters α_{t+1}.
Here p_τ(v_t) denotes the proximal operator applied to the first momentum point with step size τ at iteration t; μ and v are the momentum parameters; μ_{t+1} is the first momentum parameter at iteration t + 1; v_{t+1} is the second momentum parameter at iteration t + 1; α_{t+1} is the vector of parameters for all kernel functions output at iteration t + 1; l(α_{t+1}) denotes the objective value at α_{t+1}; and ε is the preset tolerance on the change between α_{t+1} and α_t.
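A sketch of the FISTA loop of steps 3 to 10 above, reusing mksmlr_loss_grad and soft_threshold from the earlier sketches; backtracking is simplified to the fixed shrinkage τ = βτ of step 8, and the tolerance eps stands in for the preset value ε.

```python
# Sketch of accelerated proximal-gradient (FISTA) optimisation of the MKSMLR loss.
import numpy as np

def fista(K_mu, Y_onehot, lam, tau=1.0, beta=0.9, max_iter=500, eps=1e-6):
    n, k = K_mu.shape[1], Y_onehot.shape[1]
    alpha = np.zeros((n, k))
    v, mu_t = alpha.copy(), 1.0
    prev_loss = np.inf
    for _ in range(max_iter):
        _, grad = mksmlr_loss_grad(v, K_mu, Y_onehot, lam)
        alpha_next = soft_threshold(v - tau * grad, lam * tau)             # step 5
        mu_next = (1.0 + np.sqrt(1.0 + 4.0 * mu_t ** 2)) / 2.0             # step 6 (Nesterov momentum)
        v = alpha_next + (mu_t - 1.0) / mu_next * (alpha_next - alpha)     # step 7
        tau *= beta                                                        # step 8
        loss, _ = mksmlr_loss_grad(alpha_next, K_mu, Y_onehot, lam)
        if abs(prev_loss - loss) < eps:                                    # step 9 stopping rule
            return alpha_next
        alpha, mu_t, prev_loss = alpha_next, mu_next, loss
    return alpha                                                           # step 10
```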
In the description of the present invention, it is to be understood that the terms "coaxial", "bottom", "one end", "top", "middle", "other end", "upper", "one side", "top", "inner", "outer", "front", "center", "both ends", and the like, indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience of description and simplicity of description, and do not indicate or imply that the devices or elements referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, are not to be construed as limiting the present invention.
In the present invention, unless otherwise expressly stated or limited, the terms "mounted," "disposed," "connected," "fixed," "rotated," and the like are to be construed broadly, e.g., as meaning fixedly connected, detachably connected, or integrally formed; can be mechanically or electrically connected; the terms may be directly connected or indirectly connected through an intermediate, and may be communication between two elements or interaction relationship between two elements, unless otherwise specifically limited, and the specific meaning of the terms in the present invention will be understood by those skilled in the art according to specific situations.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (9)

1. A pedestrian re-identification method based on feature fusion and multi-kernel learning, characterized by comprising the following steps:
acquiring a pedestrian image and preprocessing it;
extracting bottom-layer features and middle-layer semantic features of the pedestrian image, and performing feature fusion by a feature-weight combination method;
mapping the fused features with different kernel functions respectively;
assigning a weight to each kernel function by a centered-alignment method, combining the kernels linearly to obtain a composite kernel function, and performing a composite mapping of the fused features with the composite kernel function;
and processing the composite-mapped fused features with a multinomial logistic regression algorithm, computing the similarity between pedestrian images, and sorting the similarities in descending order to obtain the pedestrian classification value, i.e. the pedestrian re-identification result.
2. The method according to claim 1, wherein the preprocessing of the pedestrian image comprises segmenting the acquired pedestrian image in a deep learning manner, dividing the pedestrian image into a pedestrian region and a background region, and acquiring the pedestrian image without the background region.
3. The method of claim 1, wherein the bottom-layer features of the pedestrian image comprise color histogram features and multi-level guided edge energy features.
4. The method according to claim 1, wherein the middle-layer semantic features of the pedestrian image comprise color name descriptors, and their extraction comprises dividing the pedestrian image into a plurality of horizontal bands and assigning color name features to each band; and learning the relation between each pixel in the pedestrian image and the color probability dimensions by probabilistic latent semantic analysis, mapping each pixel to a multi-dimensional color probability representation.
5. The pedestrian re-identification method based on the feature fusion and the multi-kernel learning as claimed in claim 1, wherein the feature weight combination method comprises performing principal component analysis dimensionality reduction on the bottom layer features and the middle layer semantic features respectively, providing weight coefficient values by linear discriminant analysis, and performing feature fusion according to the weight coefficient values.
6. The pedestrian re-identification method based on feature fusion and multi-kernel learning according to claim 1, wherein the different kernel functions comprise different types of kernel functions and kernel functions with different parameters in the same type.
7. The pedestrian re-identification method based on feature fusion and multi-kernel learning according to claim 1, wherein solving for the composite kernel function comprises setting an initial weight value for each kernel function to form an initial multi-kernel linear combination; and maximizing the centered alignment of the centered kernel matrix to obtain a set of weights, from which a corresponding weight value is assigned to each individual kernel function.
8. The pedestrian re-identification method based on feature fusion and multi-kernel learning according to claim 1, wherein processing the composite-mapped fused features with the multinomial logistic regression algorithm comprises constructing a loss function of the multi-kernel-learning-based multinomial logistic regression algorithm with a Laplace prior, and solving the loss function with a soft-threshold iterative algorithm.
9. The pedestrian re-identification method based on feature fusion and multi-kernel learning according to claim 8, wherein an acceleration strategy is adopted on the soft threshold iterative algorithm, that is, a fast soft threshold iterative algorithm is adopted to solve the loss function.
CN202110087196.2A 2021-01-22 2021-01-22 Pedestrian re-identification method based on feature fusion and multi-core learning Active CN112766180B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110087196.2A CN112766180B (en) 2021-01-22 2021-01-22 Pedestrian re-identification method based on feature fusion and multi-core learning

Publications (2)

Publication Number Publication Date
CN112766180A CN112766180A (en) 2021-05-07
CN112766180B true CN112766180B (en) 2022-07-12

Family

ID=75705612

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110087196.2A Active CN112766180B (en) 2021-01-22 2021-01-22 Pedestrian re-identification method based on feature fusion and multi-core learning

Country Status (1)

Country Link
CN (1) CN112766180B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116206166B (en) * 2023-05-05 2023-08-11 西南科技大学 Data dimension reduction method, device and medium based on kernel projection learning

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101482926A (en) * 2009-02-19 2009-07-15 北京大学 Extensible self-adapting multi-core classification method
CN104361313A (en) * 2014-10-16 2015-02-18 辽宁石油化工大学 Gesture recognition method based on multi-kernel learning heterogeneous feature fusion
CN106778506A (en) * 2016-11-24 2017-05-31 重庆邮电大学 A kind of expression recognition method for merging depth image and multi-channel feature
CN108460407A (en) * 2018-02-02 2018-08-28 东华大学 A kind of pedestrian's attribute fining recognition methods based on deep learning
CN108960331A (en) * 2018-07-10 2018-12-07 重庆邮电大学 A kind of recognition methods again of the pedestrian based on pedestrian image feature clustering
CN108985377A (en) * 2018-07-18 2018-12-11 太原理工大学 A kind of image high-level semantics recognition methods of the multiple features fusion based on deep layer network
CN109583482A (en) * 2018-11-13 2019-04-05 河海大学 A kind of infrared human body target image identification method based on multiple features fusion Yu multicore transfer learning
CN109635636A (en) * 2018-10-30 2019-04-16 国家新闻出版广电总局广播科学研究院 The pedestrian that blocking characteristic based on attributive character and weighting blends recognition methods again
CN110046599A (en) * 2019-04-23 2019-07-23 东北大学 Intelligent control method based on depth integration neural network pedestrian weight identification technology
CN110110742A (en) * 2019-03-26 2019-08-09 北京达佳互联信息技术有限公司 Multiple features fusion method, apparatus, electronic equipment and storage medium
CN110991389A (en) * 2019-12-16 2020-04-10 西安建筑科技大学 Matching method for judging appearance of target pedestrian in non-overlapping camera view angle
CN111046732A (en) * 2019-11-11 2020-04-21 华中师范大学 Pedestrian re-identification method based on multi-granularity semantic analysis and storage medium
CN112016417A (en) * 2020-08-17 2020-12-01 山东师范大学 Pedestrian re-identification method and system based on free energy fractional space
KR102187302B1 (en) * 2020-01-13 2020-12-04 서강대학교 산학협력단 System and method for searching for pedestrian using by pedestrian fashion information
CN112052771A (en) * 2020-08-31 2020-12-08 腾讯科技(深圳)有限公司 Object re-identification method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9117147B2 (en) * 2011-04-29 2015-08-25 Siemens Aktiengesellschaft Marginal space learning for multi-person tracking over mega pixel imagery

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Surpassing Human-Level Performance in Person Re-Identification; Xuan Zhang et al.; Computer Vision and Pattern Recognition; 2018-01-31; full text *
Pedestrian re-identification method based on deep feature fusion; 熊炜 et al.; Computer Engineering and Science; 2020-02-29; Vol. 42, No. 2; full text *
Semantic-based feature model reconstruction method; 张力生 et al.; Computer Science; 2019-05-31; full text *
Research on optimization algorithms for the sparse multinomial logistic regression problem; 雷大江; Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition); 2019-06-30; full text *

Also Published As

Publication number Publication date
CN112766180A (en) 2021-05-07

Similar Documents

Publication Publication Date Title
CN107330397B (en) Pedestrian re-identification method based on large-interval relative distance measurement learning
Cheng et al. Exploiting effective facial patches for robust gender recognition
Wang et al. Clothes search in consumer photos via color matching and attribute learning
Cevikalp et al. Semi-supervised dimensionality reduction using pairwise equivalence constraints
Lin et al. Learning pairwise dissimilarity profiles for appearance recognition in visual surveillance
CN111738143A (en) Pedestrian re-identification method based on expectation maximization
CN108596195B (en) Scene recognition method based on sparse coding feature extraction
Lu et al. Feature fusion with covariance matrix regularization in face recognition
Demirkus et al. Hierarchical temporal graphical model for head pose estimation and subsequent attribute classification in real-world videos
Dubey et al. Interactive Biogeography Particle Swarm Optimization for Content Based Image Retrieval
Gan et al. Class-oriented weighted kernel sparse representation with region-level kernel for hyperspectral imagery classification
Jia et al. Geometric preserving local fisher discriminant analysis for person re-identification
Cai et al. Rgb-d scene classification via multi-modal feature learning
CN112766180B (en) Pedestrian re-identification method based on feature fusion and multi-core learning
Wilber et al. Exemplar codes for facial attributes and tattoo recognition
Fendri et al. Multi-level semantic appearance representation for person re-identification system
Soleimanipour et al. Classification of Anthurium flowers using combination of PCA, LDA and support vector machine.
Nanda et al. A person re-identification framework by inlier-set group modeling for video surveillance
Bashier et al. Face detection based on graph structure and neural networks
Lahmyed et al. A novel visible spectrum images-based pedestrian detection and tracking system for surveillance in non-controlled environments
Wang et al. Cascading classifier with discriminative multi-features for a specific 3D object real-time detection
He et al. Covariance matrix based feature fusion for scene classification
Hassanpour et al. Sensing Image Regions for Enhancing Accuracy in People Re-identification
Rajput et al. Face photo recognition from sketch images using HOG descriptors
Mousa Pasandi Face, Age and Gender Recognition Using Local Descriptors

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant