CN112418262A - Vehicle re-identification method, client and system

Vehicle re-identification method, client and system

Info

Publication number
CN112418262A
Authority
CN
China
Prior art keywords
vehicle, feature, image, distance, images
Legal status: Pending
Application number
CN202011011077.0A
Other languages
Chinese (zh)
Inventor
王茜
刘民
张鸿洲
Current Assignee
SHANGHAI CRIMINAL SCIENCE TECHNOLOGY RESEARCH INSTITUTE
Original Assignee
SHANGHAI CRIMINAL SCIENCE TECHNOLOGY RESEARCH INSTITUTE
Application filed by Shanghai Criminal Science Technology Research Institute
Priority to CN202011011077.0A
Publication of CN112418262A

Classifications

    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2413 Classification techniques relating to the classification model, based on distances to training or reference patterns
    • G06F18/253 Fusion techniques of extracted features
    • G06N3/045 Neural network architectures; Combinations of networks
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06V10/267 Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V10/56 Extraction of image or video features relating to colour
    • G06V2201/08 Indexing scheme: Detecting or categorising vehicles

Abstract

The invention discloses a vehicle re-identification method, a client and a system, relating to the technical field of computer vision and pattern recognition. The method comprises a training process: a vehicle image set containing images of different vehicles is input into a convolutional neural network model as input data and trained until the model converges, obtaining a basic network model; and a re-identification process: a query vehicle image input by a user is acquired, the feature vectors of the query vehicle image and of each image in a comparison library are obtained based on the basic network model, the distances between the feature vectors are calculated by constructing a deep metric learning network, the results are sorted according to the distance, and the recognition result is output; the feature vector is a global feature generated by feature fusion of a plurality of local block features that incorporate the view-angle feature and the relative position information. The invention fuses multi-attribute features, adds the position information corresponding to the features, can adapt to view-angle changes, and improves the trade-off between recognition rate and computational cost.

Description

Vehicle re-identification method, client and system
Technical Field
The invention relates to the technical field of computer vision and pattern recognition, in particular to a vehicle re-recognition method, a client and a system.
Background
With the widespread application of monitoring cameras in the field of public safety, vehicles, as important objects in urban monitoring, have attracted wide attention in a large number of vehicle-related tasks such as detection, tracking, classification and verification. Vehicle re-identification is the task of finding, among the images captured by other cameras (or by the same camera under different lighting, viewing angles and so on), those images that contain the same vehicle as the query image. Through vehicle re-identification, a target vehicle can be automatically found, located and tracked across multiple cameras, which plays an important role in the automatic analysis of the ever-growing volume of urban monitoring video. The application scenarios of vehicle re-identification are very wide, such as vehicle tracking, vehicle positioning and criminal investigation, and it is also very important for intelligent transportation and smart city applications. Vehicle images captured by monitoring systems without overlapping fields of view are the main objects processed in vehicle re-identification. Because the captured vehicle images involve factors such as view-angle (pose) changes, resolution, illumination changes, blur, camera settings, complex backgrounds and occlusion, the vehicle re-identification problem has particular difficulties, and solutions to these difficulties are still being researched by many scholars.
Currently, methods for vehicle re-identification under non-overlapping field-of-view monitoring systems can be broadly divided into two categories: vehicle re-identification methods based on feature learning and vehicle re-identification algorithms based on metric learning. In feature-learning-based methods, the vehicle image is represented as a feature vector by descriptors with robustness, stability and discriminative power. In the vehicle re-identification problem, the similarity between vehicle images is measured directly or indirectly by extracting descriptive features that are distinctive for different vehicles and robust to view angle and illumination. Commonly used target re-identification features include color histograms (such as HSV, RGB and YCbCr color histograms), texture histograms, Local Binary Patterns (LBP), the Scale-Invariant Feature Transform (SIFT) and the like. However, since such feature matching is not very robust to illumination and view-angle (perspective) changes, it is difficult to satisfy the requirements of robustness, stability and discrimination at the same time. A metric-learning-based vehicle re-identification method converts the vehicle re-identification problem into computing a distance metric between the target image and the candidate images. Commonly used distances include the Euclidean distance, Mahalanobis distance, cosine distance, Hamming distance and the like. In recent years, deep metric learning has also become a common approach to the vehicle re-identification problem, for example training a multi-layer convolutional neural network to simultaneously extract features of vehicle images and learn the corresponding similarity metric function. Vehicle re-identification based on deep metric learning does not need manually extracted vehicle features, but trains a neural network model on a large number of vehicle images. However, the training complexity is high; for example, training on an imbalanced data set causes an imbalance problem, and for image databases of different magnitudes the imbalance of the image data set means that unmatched pairs often far outnumber matched pairs.
In summary, conventional vehicle re-identification methods struggle with view-angle-adaptive recognition, data imbalance, and the trade-off between recognition rate and computational cost. How to provide, on the basis of the prior art, a vehicle re-identification scheme that can adapt to view-angle changes, reduce the influence of data imbalance, and offer a superior balance of recognition rate and computational cost is a technical problem that urgently needs to be solved.
Disclosure of Invention
The object of the invention is to overcome the defects of the prior art and to provide a vehicle re-identification method, client and system. The vehicle re-identification method provided by the invention fuses multi-attribute features, adds the position information corresponding to the features, can adapt to view-angle changes, and improves the trade-off between recognition rate and computational cost.
In order to achieve the above object, the present invention provides the following technical solutions:
a vehicle re-identification method comprises a training process and a re-identification process, and comprises the following steps:
training process: inputting a vehicle image set containing different vehicle images into a convolutional neural network model as input data to train until the model converges to obtain a basic network model;
and a re-identification process: acquiring a query vehicle image input by a user, obtaining the feature vectors of the query vehicle image and of each image in a comparison library based on the basic network model, calculating the distance between the feature vector of the query vehicle and the feature vectors of the library images by constructing a deep metric learning network, sorting the results according to the distance, and outputting the recognition result;
the feature vector is a global feature generated by feature fusion of a plurality of local block features that incorporate the view-angle feature and the relative position information.
Further, the training process may be preceded by the step of constructing a set of vehicle images, as follows:
erecting a plurality of camera terminal groups in a real lane, wherein each camera terminal group comprises a plurality of camera terminal devices to form a group of linked camera groups of 0-180 degrees around a vehicle; each group of camera terminal groups comprises 7 camera terminal devices, and the visual angle is divided into 7 visual angles which are respectively 0 degree, 30 degrees, 60 degrees, 90 degrees, 120 degrees, 150 degrees and 180 degrees;
the method comprises the steps that a plurality of groups of vehicle video images and/or vehicle static images at a plurality of visual angles are collected through the camera terminal equipment;
and screening the images, and selecting a clear picture corresponding to each view angle of each vehicle to be selected into a vehicle image set.
Further, the training process comprises a feature extraction step, a metric learning step and a classification step;
the characteristic extraction step comprises the steps of obtaining each image and extracting a plurality of local block characteristics of the vehicle in the image;
the metric learning step comprises the steps of converting the local block features into local block features fused with view angle features through a plurality of set basic feature views, carrying out fusion calculation on the local block features fused with the view angle features to obtain global features, and carrying out distance metric learning on the global features.
Further, the local block features include a color histogram feature f_cr of the host vehicle with a relative position dimension, a full-pixel color histogram feature f_ch and a texture feature f_v, and the feature extraction step is as follows:
the image set is a vehicle image set whose images are of size W × D; two different vehicles are taken from the training set and recorded as M and N, belonging to the training set; the test vehicle is recorded as T, belonging to the test set;
extracting, for each vehicle image, a color component f_cr of the host vehicle with a relative position dimension, as follows:
dividing the vehicle image into 12 × 12 non-overlapping blocks P_{w,d} and acquiring the color histogram of each block, recorded as:
h_{w,d} = {h(k)}, k = 1, 2, …, K    (1)
wherein k denotes a feature (bin) value and K denotes the number of possible feature values; h(k) is the number of pixels in the block whose feature value is k; w and d are respectively the horizontal and vertical coordinates of the block center;
for dimension reduction and improvement of separability, the first 10% of the component values h(k) in h_{w,d} are taken to represent the block color feature, recorded as h′_{w,d}; meanwhile, the center position of the block is (c_w, c_d), and the center position of the block relative to the whole image is (c_w/W, c_d/D);
the block color feature containing the block position information is obtained as:
f_patch^{w,d} = [h′_{w,d}, c_w/W, c_d/D]    (2)
wherein pw and pd respectively denote the numbers of block rows and columns, and f_cr is calculated by collecting the block features of all blocks:
f_cr = {f_patch^{(s)}}, s = 1, 2, …, pw × pd    (3)
based on formula (1), the full-pixel color histogram information f_ch is acquired over all pixels of the image:
f_ch = {h(k)}, k = 1, 2, …, K, computed over the whole image    (4)
obtaining the texture feature by extracting FV-coded dense features, as follows:
converting the vehicle image into a gray image and, for the vehicle images of the multiple view angles, dividing it into 24 × 24 non-overlapping blocks P_{w′,d′}; extracting low-level SIFT descriptors point by point in each 24 × 24 image block; w′ and d′ are respectively the horizontal and vertical coordinates of the block center, and the block center position is recorded as (c′_w, c′_d);
reducing the SIFT descriptors to l = 128 dimensions with the PCA algorithm to obtain v_sift, and appending the center point position to obtain the block feature containing the block position information, as follows:
v_patch = [v_sift, c′_w/W, c′_d/D]    (5)
performing feature fitting on the extracted v_patch to obtain a Gaussian mixture model GMM containing H Gaussian components; letting ω_h, γ_h, S_h respectively denote the weight, mean and covariance matrix of the h-th component of the GMM, h = 0, 1, …, H−1; calculating the first-order differential v^(1) and the second-order differential v^(2) of each Gaussian component as:
v^(1)_h = (1 / (N √ω_h)) Σ_n μ_{n,h} (x_n − γ_h) / S_h    (6)
v^(2)_h = (1 / (N √(2ω_h))) Σ_n μ_{n,h} [ (x_n − γ_h)² / S_h² − 1 ]    (7)
wherein N denotes the number of features; μ_{n,h} denotes the weight of the n-th feature for the h-th Gaussian component, n = 0, 1, …, N−1; x_n denotes the value of the n-th feature;
connecting v^(1) and v^(2) in series to obtain:
v^FV_h = [v^(1)_h, v^(2)_h]    (8)
and, with the position information added, obtaining the overall FV feature of dimension (l + 2) × H × 2, where l = 128:
f_FV = [v^FV_0, v^FV_1, …, v^FV_{H−1}]    (9)
carrying out sort(·) sorting on the components by energy and selecting the Z key positions that best represent the texture to form f_v, expressed as:
f_v = [f_FV(ss)], ss = 1, 2, …, Z    (10)
wherein Z is an integer greater than 1.
Further, 0°, 90° and 180° are set as the three basic feature view angles, and the global feature is calculated according to the following steps:
the three basic feature view angles i of 0°, 90° and 180° are recorded as V_fro, V_sid and V_rev respectively, and all images in the training set are converted into a picture feature function v_null blended with the view-angle feature, as follows:
v_null = Σ_i α_{i,θ} V_i, i ∈ {fro, sid, rev}    (11)
wherein α_{i,θ} denotes the view transformation characteristic function of view i;
obtaining the block FV features according to formula (9) and formula (11), formula (9) can be expressed as a feature blended with the view-angle feature:
f′_v = Σ_i α_{i,θ} f_v^(i)    (12)
which is calculated to obtain f′_v; f′_ch is obtained simultaneously:
f′_ch = Σ_i α_{i,θ} f_ch^(i)    (13)
formula (3) is similarly rewritten and expressed as a feature blended with the view-angle feature, giving f′_cr;
the global feature F_full is calculated as:
F_full = β_{1,θ} f′_cr + β_{2,θ} f′_ch + β_{3,θ} f′_v    (14)
wherein β_{j,θ} denotes the weight of each feature component as a function of the view angle, and 1 ≤ j ≤ 3 indexes the classification feature categories.
Further, the distance metric learning adopts the Mahalanobis distance, and for any two vehicle images the distance function between their feature vectors is calculated through the Mahalanobis distance as follows:
for any two vehicle images si and sj, the feature-vector-based distance function d(y_si, y_sj) between si and sj is calculated through the Mahalanobis distance as:
d(y_si, y_sj) = (z_si − z_sj)^T A (z_si − z_sj)    (15)
wherein A is a diagonal matrix of weights that expresses the relative importance of the feature-vector components; A is decomposed as A = W^T W, and solving for W replaces the role of the weights β_{j,θ} in formula (14); z_si is the feature vector of si and z_sj is the feature vector of sj;
whether two images are similar is expressed through r_ij ∈ {1, −1}: r_ij = 1 if they belong to the same class, otherwise r_ij = −1, and the constraint function requires:
r_ij (ξ − d(y_si, y_sj)) > 0    (16)
wherein ξ is the critical value parameter separating the inter-class and intra-class distances, and δ(si, sj) denotes the positive and negative sample equalization function, defined in terms of z_sn, the minimum of the distances of all negative samples;
pairwise metric learning is realized by solving W, and the final classification information is obtained through the KNN method.
Further, in the training process, model training for a large image database is carried out by adopting a triplet deep metric learning algorithm, wherein a triplet comprises a test image T and two training images M and N; when the input test image T is close to M and far from N, the global features {F_full(M), F_full(N), F_full(T)} corresponding to M, N and T are adjusted to {y_M, y_N, y_T}, and the distance constraint condition on the three feature vectors is rewritten according to formula (15) as follows:
d(y_T, y_M) + ξ1 < d(y_T, y_N)    (18)
wherein ξ1 is the critical value parameter of the inter-class and intra-class distances, and y_T, y_M and y_N are the adjusted feature vectors of images T, M and N respectively.
Further, in the training process, for the pairwise method the asymmetry problem is solved by adopting a weight strategy for asymmetric samples, as follows: num is incremented once per iteration, and a weight parameter depending on num is used to adjust the weight values;
for the triplet deep metric learning algorithm, in order to reduce the amount of computation, only triplets that violate the constraint condition, i.e. that reach a non-zero loss, are calculated; and
the distance values in formula (18) are sorted from large to small, and sample matching is performed starting from the maximum value to reduce the interference of noise images.
The invention also provides a vehicle re-identification client, which comprises the following structure:
the model training module is used for inputting a vehicle image set containing different vehicle images into the convolutional neural network model as input data to train until the model converges to obtain a basic network model;
and a re-identification module, configured to acquire a query vehicle image input by a user, obtain the feature vectors of the query vehicle image and of each image in a comparison library based on the basic network model, calculate the distance between the query feature vector and the library feature vectors by constructing a deep metric learning network, sort the results by distance and output the recognition result; the feature vector is a global feature generated by feature fusion of a plurality of local block features that incorporate the view-angle feature and the relative position information.
The invention also provides a vehicle re-identification system, which comprises a user terminal and a server,
the user terminal is provided with a human-computer interaction interface, and a vehicle image set for training and query information input by a user are acquired through the human-computer interaction interface;
the server side includes a processor and a memory for storing processor-executable instructions and parameters, the processor configured to:
inputting a vehicle image set containing images of different vehicles into a convolutional neural network model as input data and training until the model converges to obtain a basic network model; acquiring a query vehicle image input by a user, obtaining the feature vectors of the query vehicle image and of each image in the comparison library based on the basic network model, calculating the distance between the query feature vector and the library feature vectors by constructing a deep metric learning network, sorting the results according to the distance, and outputting the recognition result; the feature vector is a global feature generated by feature fusion of a plurality of local block features that incorporate the view-angle feature and the relative position information.
Owing to the above technical solution, and compared with the prior art, the present invention has, by way of example, the following advantages and positive effects:
aiming at the characteristics of a vehicle image set, a vehicle re-identification algorithm which is based on fusion characteristics and metric learning and can adapt to visual angle change is realized, links such as fusion of multi-attribute characteristics, increase of position information corresponding to the characteristics, coping with data unbalance, identification rate and calculation capacity cost ratio selection schemes are improved, and the identification rate is improved by testing on image data sets with different magnitudes and a public image data set. Further, in a metric learning link of model training, an improved pair metric learning method with high recognition rate and high computation cost performance and a triple depth metric learning (triple deep metric learning) algorithm with high recognition rate combined with a CNN network are respectively selected, and the asymmetry problem is solved by setting a weight strategy aiming at the asymmetry problem of image sets (samples) with different magnitudes.
On the one hand, the 5 view angles of previous multi-view-angle identification database building are expanded to 7 view angles, and the basic image database is collected and modelled with a purpose-built, real-time linked hardware acquisition scheme. Meanwhile, the relative position information acquired during feature extraction is added as a dimension of the metric-learning features; a vehicle at an arbitrary view angle is defined through a feature transformation function based on the three basic components of 0°, 90° and 180°; a block mapping scheme is applied to each primary feature component; and the training set is trained to obtain a composite feature function that can recognise and adapt to view-angle changes by itself, improving the recognition effect.
On the other hand, when training on an imbalanced image library, different solutions are adopted according to the data volume of the image set: a medium-sized database may use the pairwise method, and a large-sized database may use the triplet method. Meanwhile, the composite distance values are sorted and sample matching is then performed from the maximum value downwards to reduce the interference of noise images.
Drawings
Fig. 1 is an information processing diagram of a vehicle re-identification method according to an embodiment of the present invention.
Fig. 2 is a schematic flow chart of vehicle re-identification according to an embodiment of the present invention.
Fig. 3 is a block diagram of a client according to an embodiment of the present invention.
Fig. 4 is a block diagram of a system according to an embodiment of the present invention.
Description of reference numerals:
a client 200, a model training module 210, and a re-identification module 220;
system 300, user terminal 310, server 320.
Detailed Description
The method, client and system for vehicle re-identification disclosed in the present invention are further described in detail with reference to the accompanying drawings and specific embodiments. It should be noted that technical features or combinations of technical features described in the following embodiments should not be considered as being isolated, and they may be combined with each other to achieve better technical effects. In the drawings of the embodiments described below, the same reference numerals appearing in the respective drawings denote the same features or components, and may be applied to different embodiments. Thus, once an item is defined in one drawing, it need not be further discussed in subsequent drawings.
It should be noted that the structures, proportions, sizes, and other dimensions shown in the drawings and described in the specification are only for the purpose of understanding and reading the present disclosure, and are not intended to limit the scope of the invention, which is defined by the claims, and any modifications of the structures, changes in the proportions and adjustments of the sizes and other dimensions, should be construed as falling within the scope of the invention unless the function and objectives of the invention are affected. The scope of the preferred embodiments of the present invention includes additional implementations in which functions may be executed out of order from that described or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present invention.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate. In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
Examples
Referring to fig. 1, a vehicle re-identification method provided by the present invention includes the following steps:
step S100 training process: and (3) taking a vehicle image set containing different vehicle images as input data, inputting the input data into the convolutional neural network model, and training until the model converges to obtain a basic network model.
Step S200 re-identification process: acquiring a query vehicle image input by a user, obtaining the feature vectors of the query vehicle image and of each image in a comparison library based on the basic network model, calculating the distance between the feature vector of the query vehicle and the feature vectors of the library images by constructing a deep metric learning network, sorting the results according to the distance, and outputting the recognition result; the feature vector is a global feature generated by feature fusion of a plurality of local block features that incorporate the view-angle feature and the relative position information.
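For orientation only, the following minimal sketch outlines the query-versus-comparison-library ranking flow of step S200. The helper names extract_fused_feature and metric_distance are hypothetical placeholders for the fused-feature extractor and the learned distance of this embodiment; they are not functions disclosed in the original.

import numpy as np

def rank_gallery(query_img, gallery_imgs, extract_fused_feature, metric_distance):
    """Rank comparison-library images by distance to the query vehicle image.

    extract_fused_feature: assumed callable returning the fused global feature
    described in the text (local block features with view-angle and relative
    position information, fused into one vector).
    metric_distance: assumed callable implementing the learned distance
    (e.g. a Mahalanobis-style metric), returning a scalar.
    """
    q = extract_fused_feature(query_img)                   # feature of the query vehicle
    dists = [metric_distance(q, extract_fused_feature(g))  # distance to each library image
             for g in gallery_imgs]
    order = np.argsort(dists)                              # ascending: most similar first
    return order, np.asarray(dists)[order]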
Preferably, the training process further comprises a step of constructing a vehicle image set, and the specific steps are as follows:
firstly, erecting a plurality of camera terminal groups in a real lane, wherein each camera terminal group comprises a plurality of camera terminal devices to form a 0-180-degree linkage camera group around a vehicle. Preferably, each group of camera terminal devices includes 7 camera terminal devices, and the viewing angles are divided into 7 viewing angles, which are 0 °, 30 °, 60 °, 90 °, 120 °, 150 °, and 180 °, respectively.
And then, a plurality of groups of vehicle video images and/or vehicle still images with a plurality of visual angles are collected through the camera terminal equipment.
And then, screening the images, and selecting a clear picture corresponding to each view angle of each vehicle to be selected into a vehicle image set.
In this embodiment, referring to fig. 1, the training process may include three steps of feature extraction, metric learning and classification, and correspondingly the obtained basic network model includes a feature extraction module, a distance metric learning module and a classification module. The basic network model may be based on a mainstream neural network structure, including but not limited to GoogLeNet, VGGNet, ResNet, etc.; in this embodiment the GoogLeNet V4 network structure is preferred.
The feature extraction step includes acquiring each image, and extracting a plurality of local block features of the vehicle in the image.
By way of example and not limitation, the local blocking features used for identification may be a color histogram (such as HSV, RGB, YCbCr color histogram), a texture histogram, a Local Binary Pattern (LBP), and/or a scale-invariant feature transform (SIFT).
The metric learning step comprises the steps of converting the local block features into local block features fused with view angle features through a plurality of set basic feature views, carrying out fusion calculation on the local block features fused with the view angle features to obtain global features, and carrying out distance metric learning on the global features.
By way of example and not limitation, the distance metric learning may employ Euclidean distance, Mahalanobis distance, cosine distance, and Hamming distance.
Taking as the local block features the color histogram feature f_cr of the host vehicle with a relative position dimension, the full-pixel color histogram feature f_ch and the texture feature f_v, and taking the Mahalanobis distance as an example of the distance metric, a typical implementation of this embodiment is described in detail below.
A vehicle image set is constructed by sample acquisition, the sample acquisition scheme being as follows:
42 camera terminal devices are constructed to collect vehicle videos and images, erected in 6 groups: starting from the left side of a real lane, one camera terminal is set up every 30°, so that one group covers 0–180° around the vehicle with 7 camera terminals in total, numbered S_{a,b} (1 ≤ a ≤ 6, 1 ≤ b ≤ 7), where a is the group index and b is the view-angle index within the group. The multiple vehicle videos captured by S_{a,b} are collected and screened over multiple frames, and for each vehicle a clear picture is cropped out and stored for each of its 7 view angles. This hardware erection method reduces the classification errors caused by view-angle labelling when building the library.
Training samples with labelled view-angle classifications are thus obtained. Preferably, all 6 groups of camera terminals are arranged on one straight road section, which eliminates cases where some cameras do not capture the vehicle. The number of pictures taken of the same vehicle passing the erection section once is 7 to 42. Considering that long-term data accumulation can make the vehicle image database samples imbalanced, since the number of images of the same vehicle can reach 200, different algorithms are used according to the circumstances when training on the imbalanced database, effectively reducing the estimation errors caused by the imbalance, as described in detail later.
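Purely as an illustration of the camera numbering S_{a,b} described above (6 groups, 7 view angles from 0° to 180° in 30° steps), a small sketch follows; the dictionary layout for organising collected frames is an assumption and not part of the disclosure.

ANGLES = [0, 30, 60, 90, 120, 150, 180]

camera_ids = {(a, b): f"S_{a}_{b}"          # S_{a,b}, 1 <= a <= 6, 1 <= b <= 7
              for a in range(1, 7)
              for b in range(1, 8)}

def view_angle(b: int) -> int:
    """Map the in-group camera index b (1..7) to its nominal view angle in degrees."""
    return ANGLES[b - 1]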
Then, feature extraction, metric learning, and classification are performed as follows.
1) Feature extraction
Let the image set be a vehicle image set whose images are of size W × D; two different vehicles are taken from the training set and recorded as M and N, belonging to the training set; the test vehicle is recorded as T, belonging to the test set, as shown in fig. 2. For each vehicle image, first the color component f_cr of the host vehicle with a relative position dimension is extracted, as follows:
the vehicle image is divided into 12 × 12 non-overlapping blocks P_{w,d} and the color histogram of each block is acquired and recorded as:
h_{w,d} = {h(k)}, k = 1, 2, …, K    (1)
wherein k denotes a feature (bin) value and K denotes the number of possible feature values; h(k) is the number of pixels in the block whose feature value is k; w and d are respectively the horizontal and vertical coordinates of the block center.
In order to reduce the feature dimension and improve separability, in this embodiment the first 10% of the component values h(k) in h_{w,d} are preferably taken to represent the block color feature, recorded as h′_{w,d}; meanwhile, the center position of the block is (c_w, c_d), and the center position of the block relative to the whole image is (c_w/W, c_d/D).
The block color feature containing the block position information is obtained as:
f_patch^{w,d} = [h′_{w,d}, c_w/W, c_d/D]    (2)
where pw and pd respectively denote the numbers of block rows and columns, and f_cr is calculated by collecting the block features of all blocks:
f_cr = {f_patch^{(s)}}, s = 1, 2, …, pw × pd    (3)
Based on formula (1), the full-pixel color histogram information f_ch can be acquired over all pixels of the image:
f_ch = {h(k)}, k = 1, 2, …, K, computed over the whole image    (4)
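A minimal sketch of the position-aware block colour feature f_cr and the full-pixel histogram f_ch described above is given below. The bin count, the use of a single grayscale channel and the exact normalisation are illustrative assumptions; the original uses 12 × 12 blocks and keeps the first 10% of the histogram components.

import numpy as np

def block_color_features(img, blocks=12, bins=16, top_frac=0.10):
    """Compute f_cr (block colour histograms with relative block-centre
    positions, collected over all blocks) and f_ch (full-pixel histogram).
    img is assumed to be a single-channel uint8 image."""
    H, W = img.shape[:2]
    bh, bw = H // blocks, W // blocks
    keep = max(1, int(bins * top_frac))               # first 10% of histogram components
    f_cr = []
    for r in range(blocks):
        for c in range(blocks):
            patch = img[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
            h, _ = np.histogram(patch, bins=bins, range=(0, 256))
            h_top = np.sort(h)[::-1][:keep]           # block colour feature h'
            cw = (c * bw + bw / 2) / W                # relative centre position
            cd = (r * bh + bh / 2) / H
            f_cr.append(np.concatenate([h_top, [cw, cd]]))
    f_cr = np.concatenate(f_cr)                       # over all blocks, s = 1..pw*pd
    f_ch, _ = np.histogram(img, bins=bins, range=(0, 256))   # full-pixel histogram
    return f_cr, f_ch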
Preferably, the texture feature is obtained by extracting FV (i.e. Fisher vector) encoded dense features, as follows:
First, the vehicle image is converted into a grayscale image and, for the vehicle images of the multiple view angles, divided into 24 × 24 non-overlapping blocks P_{w′,d′}; low-level SIFT descriptors are extracted point by point for each 24 × 24 block of the 7 views. This way of taking values places more emphasis on part- or patch-based feature selection and is suitable for recognising larger-scale targets such as vehicles.
w′ and d′ are respectively the horizontal and vertical coordinates of the block center, and the block center position is recorded as (c′_w, c′_d). In this embodiment, w′ = w, d′ = d, c′_w = c_w and c′_d = c_d may be set. The SIFT descriptors are then reduced to l = 128 dimensions with the PCA algorithm to obtain v_sift, and the center point position is appended to obtain the block feature containing the block position information, as follows:
v_patch = [v_sift, c′_w/W, c′_d/D]    (5)
Feature fitting is performed on the extracted v_patch to obtain a Gaussian mixture model GMM containing H Gaussian components; let ω_h, γ_h, S_h respectively denote the weight, mean and covariance matrix of the h-th component of the GMM, h = 0, 1, …, H−1.
The first-order differential v^(1) and the second-order differential v^(2) of each Gaussian component are calculated as follows:
v^(1)_h = (1 / (N √ω_h)) Σ_n μ_{n,h} (x_n − γ_h) / S_h    (6)
v^(2)_h = (1 / (N √(2ω_h))) Σ_n μ_{n,h} [ (x_n − γ_h)² / S_h² − 1 ]    (7)
wherein N denotes the number of features; μ_{n,h} denotes the weight of the n-th feature for the h-th Gaussian component, n = 0, 1, …, N−1; x_n denotes the value of the n-th feature.
v^(1) and v^(2) are connected in series to obtain:
v^FV_h = [v^(1)_h, v^(2)_h]    (8)
and, with the position information added, the overall FV feature of dimension (l + 2) × H × 2 is obtained, where l = 128:
f_FV = [v^FV_0, v^FV_1, …, v^FV_{H−1}]    (9)
sort(·) sorting is carried out on the components by energy, and the Z key positions that best represent the texture are selected to form f_v, expressed as:
f_v = [f_FV(ss)], ss = 1, 2, …, Z    (10)
wherein Z is an integer greater than 1; in this example Z is taken as 100.
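As a hedged sketch of the FV-coded texture feature f_v, the snippet below fits a GMM to the position-augmented SIFT descriptors and computes the standard Fisher-vector first and second order statistics; the GMM component count, the use of scikit-learn and the energy-based selection of Z components are assumptions consistent with, but not taken verbatim from, the description.

import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector_texture(patch_descriptors, n_gauss=8, z_keep=100):
    """patch_descriptors: array of shape (N, l+2), i.e. PCA-reduced SIFT
    descriptors (l = 128) with the two relative block-centre coordinates
    appended. Returns the Z highest-energy FV components (f_v)."""
    gmm = GaussianMixture(n_components=n_gauss, covariance_type='diag').fit(patch_descriptors)
    N, _ = patch_descriptors.shape
    post = gmm.predict_proba(patch_descriptors)       # weight of feature n for Gaussian h
    fv = []
    for h in range(n_gauss):
        diff = (patch_descriptors - gmm.means_[h]) / np.sqrt(gmm.covariances_[h])
        w = post[:, h:h + 1]
        v1 = (w * diff).sum(axis=0) / (N * np.sqrt(gmm.weights_[h]))                 # first-order differential
        v2 = (w * (diff ** 2 - 1)).sum(axis=0) / (N * np.sqrt(2 * gmm.weights_[h]))  # second-order differential
        fv.extend([v1, v2])
    fv = np.concatenate(fv)                           # (l+2) * H * 2 dimensional FV feature
    idx = np.argsort(-np.abs(fv))[:z_keep]            # keep the Z largest-energy components
    return fv[np.sort(idx)]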
2) Metric learning incorporating view transformation
Three basic feature view angles are set: 0°, 90° and 180°.
The three basic feature view angles i of 0°, 90° and 180° are recorded as V_fro, V_sid and V_rev respectively, and all images in the training set can be converted into a picture feature function v_null blended with the view-angle feature, as follows:
v_null = Σ_i α_{i,θ} V_i, i ∈ {fro, sid, rev}    (11)
wherein α_{i,θ} denotes the view transformation characteristic function of view i; in this example α_{i,θ} may take the value 0.
Obtaining the block FV features according to formula (9) (by default each block is not further divided and only a single basic feature is mapped) and formula (11), formula (9) can be expressed as a feature blended with the view-angle feature:
f′_v = Σ_i α_{i,θ} f_v^(i)    (12)
which is calculated to obtain f′_v; f′_ch is obtained simultaneously:
f′_ch = Σ_i α_{i,θ} f_ch^(i)    (13)
Formula (3) is similarly rewritten and expressed as a feature blended with the view-angle feature, giving f′_cr.
Finally, the global feature F_full is calculated as:
F_full = β_{1,θ} f′_cr + β_{2,θ} f′_ch + β_{3,θ} f′_v    (14)
wherein β_{j,θ} denotes the weight of each feature component as a function of the view angle, and 1 ≤ j ≤ 3 indexes the classification feature categories.
The global feature is generated by feature fusion of a plurality of local block features that incorporate the view-angle feature, with the relative position information added, and may also be referred to as a fusion feature. As shown in fig. 2, after the fusion feature is generated, a depth metric network (also called a deep metric learning network) can be constructed.
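The weighted fusion of formula (14) can be sketched as follows. The alignment of the three component features to a common length is an assumption made only so the sketch runs; the original does not specify how the components are combined dimensionally.

import numpy as np

def fuse_global_feature(f_cr_view, f_ch_view, f_v_view, betas):
    """F_full = beta_1 * f'_cr + beta_2 * f'_ch + beta_3 * f'_v, where betas are
    the view-dependent weights beta_{j,theta}."""
    parts = [np.asarray(f_cr_view, float),
             np.asarray(f_ch_view, float),
             np.asarray(f_v_view, float)]
    dim = max(p.size for p in parts)
    padded = [np.pad(p, (0, dim - p.size)) for p in parts]   # illustrative alignment only
    return sum(b * p for b, p in zip(betas, padded))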
In this embodiment, for any two vehicle images si and sj in the image set, the feature-vector-based distance function d(y_si, y_sj) between si and sj is calculated through the Mahalanobis distance, as follows:
d(y_si, y_sj) = (z_si − z_sj)^T A (z_si − z_sj)    (15)
where A is a diagonal matrix of weights that expresses the relative importance of each attribute; A is decomposed as A = W^T W, and solving for W replaces the role of the weights β_{j,θ} in (14); z_si is the feature vector of the vehicle image si and z_sj is the feature vector of the vehicle image sj.
In this embodiment, the constraint function is obtained by expressing whether the two images belong to the same class (1 for the same class, −1 otherwise).
Specifically, whether two images are similar is expressed through r_ij ∈ {1, −1}: if they belong to the same class, r_ij = 1, otherwise r_ij = −1, and the constraint function requires:
r_ij (ξ − d(y_si, y_sj)) > 0    (16)
where ξ is the critical value parameter separating the inter-class and intra-class distances, and δ(si, sj) denotes the positive and negative sample equalization function, defined in terms of z_sn, the minimum of the distances of all negative samples.
Finally, pair metric learning can be achieved by solving W, and final classification information is obtained by a KNN (i.e., K Nearest Neighbor) method.
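A small sketch of how the learned metric and the KNN step fit together is given below; the optimisation of W under the pairwise constraints (with the equalisation function δ) is not shown, and the use of scikit-learn is an assumption made for illustration only.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def mahalanobis_dist(z_i, z_j, W):
    """d(y_si, y_sj) = (z_si - z_sj)^T W^T W (z_si - z_sj), with A = W^T W."""
    diff = W @ (z_i - z_j)
    return float(diff @ diff)

def knn_classify(query_feat, gallery_feats, gallery_labels, W, k=5):
    """Project features with W (Euclidean distance in the projected space matches
    the learned metric above) and vote among the k nearest gallery vehicles."""
    proj_gallery = gallery_feats @ W.T
    proj_query = (W @ query_feat).reshape(1, -1)
    knn = KNeighborsClassifier(n_neighbors=k).fit(proj_gallery, gallery_labels)
    return knn.predict(proj_query)[0]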
In this embodiment, to adapt to different data sets, it is preferable to adopt a pairwise method for the medium-size image database, and it is preferable to adopt a triplets (triple) method for the large-size image database.
Specifically, a triple deep metric learning (triplet deep learning) algorithm is adopted for model training of a large image database.
A triplet includes a test image T and two training images M and N. When the input test image T is close to M and far from N, the global features {F_full(M), F_full(N), F_full(T)} corresponding to M, N and T are adjusted to {y_M, y_N, y_T}. The distance constraint condition on the three feature vectors is then rewritten according to formula (15) as follows:
d(y_T, y_M) + ξ1 < d(y_T, y_N)    (18)
where ξ1 is the critical value parameter of the inter-class and intra-class distances, and y_T, y_M and y_N are the adjusted feature vectors of images T, M and N respectively.
Finally, GoogLeNet V4 is selected as the CNN to complete the triplet deep metric learning algorithm.
According to the data volume property of the image data set, the method adopts the formula (16) and the formula (18) respectively, and effectively solves the problem caused by the imbalance of the database.
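The triplet constraint of formula (18) corresponds, in hinge form, to the loss sketched below; the squared Euclidean embedding distance and the margin value are assumptions, and in the text the embeddings come from the GoogLeNet V4 backbone.

import numpy as np

def triplet_loss(y_T, y_M, y_N, xi=1.0):
    """Zero when d(y_T, y_M) + xi <= d(y_T, y_N), i.e. when the anchor T is
    closer to the positive M than to the negative N by at least the margin."""
    d_pos = np.sum((np.asarray(y_T) - np.asarray(y_M)) ** 2)
    d_neg = np.sum((np.asarray(y_T) - np.asarray(y_N)) ** 2)
    return max(0.0, d_pos - d_neg + xi)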
In this embodiment, for the pairwise method, in addition to setting the positive and negative sample equalization function δ(si, sj) in distance learning, a weight strategy for asymmetric samples is also adopted to solve the asymmetry problem, as follows: num is incremented by 1 at each iteration, and a weight parameter depending on num is used to adjust the weight values.
For the triplet deep metric learning algorithm, in order to reduce the amount of computation, only triplets that violate the constraint condition, i.e. that reach a non-zero loss, are calculated.
At the same time, the distance values in formula (18) are sorted from large to small, and sample matching is performed starting from the maximum value to reduce the interference of noise images.
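The triplet-mining strategy just described (keep only non-zero-loss triplets, sort the distance values from large to small, and match from the maximum value downwards) can be sketched as follows; the concrete sort key and data layout are assumptions based on the description, and the iteration-dependent weight parameter for the pairwise case is not reproduced here.

import numpy as np

def select_hard_triplets(triplets, embed, xi=1.0):
    """triplets: iterable of (T, M, N) index triples; embed: mapping from index
    to embedding vector. Returns the violating triplets, hardest first."""
    kept = []
    for T, M, N in triplets:
        d_pos = np.sum((embed[T] - embed[M]) ** 2)
        d_neg = np.sum((embed[T] - embed[N]) ** 2)
        loss = d_pos - d_neg + xi
        if loss > 0:                      # only non-zero-loss triplets are used
            kept.append((loss, (T, M, N)))
    kept.sort(key=lambda x: -x[0])        # from the largest value downwards
    return [t for _, t in kept]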
After testing the vehicle re-identification scheme with view-angle adaptation and block-position metric learning, the results show that by adding the view-angle (pose) transformation and block-position metric learning, the recognition rate is improved from 73.7% to 87.7% for the pairwise method and from 87.5% to 91.8% for the triplet method, a clear improvement in vehicle recognition rate.
Referring to fig. 3, a vehicle re-identification client is further provided as another embodiment of the present invention.
The client 200 includes a model training module 210 and a re-recognition module 220.
The model training module 210 is configured to input a vehicle image set including different vehicle images as input data into the convolutional neural network model for training until the model converges, so as to obtain a basic network model.
The re-identification module 220 is configured to: acquire a query vehicle image input by a user, obtain the feature vectors of the query vehicle image and of each image in a comparison library based on the basic network model, calculate the distance between the query feature vector and the library feature vectors by a distance metric method, judge from the distance whether the query vehicle is similar to an image in the comparison library, and output the recognition result; the feature vector is a global feature generated by fusing a plurality of local block features that incorporate the view-angle feature.
Other technical features refer to the foregoing embodiment, and the modules can be configured to execute the corresponding information processing method, which is not described herein again.
Referring to fig. 4, a vehicle re-identification system is further provided as another embodiment of the present invention.
The system 300 includes a user terminal 310 and a server 320.
The user terminal 310 is provided with a human-computer interaction interface, and a vehicle image set for training and query information input by a user are acquired through the human-computer interaction interface.
The server side 320 includes a processor and a memory for storing processor-executable instructions and parameters.
The processor is configured to:
inputting a vehicle image set containing different vehicle images into a convolutional neural network model as input data to train until the model converges to obtain a basic network model; and the number of the first and second groups,
acquiring a query vehicle image input by a user, obtaining the feature vectors of the query vehicle image and of each image in a comparison library based on the basic network model, calculating the distance between the query feature vector and the library feature vectors by a distance metric method, judging from the distance whether the query vehicle is similar to an image in the comparison library, and outputting the recognition result; the feature vector is a global feature generated by fusing a plurality of local block features that incorporate the view-angle feature.
Other technical features refer to the previous embodiment, and the processor can be configured to execute the corresponding information processing method, which is not described herein again.
The foregoing description is not intended to limit the present disclosure to the aspects described. Rather, the various components may be selectively and operatively combined in any number within the intended scope of the present disclosure. In addition, terms like "comprising," "including," and "having" should by default be interpreted as inclusive or open-ended rather than exclusive or closed-ended, unless explicitly defined to the contrary. All technical, scientific or other terms used herein have the meaning commonly understood by one of ordinary skill in the art to which this invention belongs, unless defined otherwise. Common terms found in dictionaries should not be interpreted too ideally or too formally in the context of the related art documents unless the present disclosure expressly so limits them. Any changes and modifications of the present invention based on the above disclosure fall within the scope of the appended claims.

Claims (10)

1. A vehicle re-identification method is characterized by comprising a training process and a re-identification process, and the steps are as follows:
training process: inputting a vehicle image set containing different vehicle images into a convolutional neural network model as input data to train until the model converges to obtain a basic network model;
and (3) re-identification process: acquiring a query vehicle image input by a user, respectively acquiring a feature vector of each image in the query vehicle image and a comparison library based on the basic network model, calculating the distance between the feature vector of the query vehicle and the feature vector of the image in the comparison library by constructing a depth measurement learning network, implementing sequencing according to the distance, and outputting an identification result;
the feature vector is a global feature generated by adding a plurality of local block features blended into the view angle feature and the relative position information and then performing feature fusion.
2. The method of claim 1, wherein the training process is preceded by the step of constructing a set of vehicle images, as follows:
erecting a plurality of camera terminal groups in a real lane, wherein each camera terminal group comprises a plurality of camera terminal devices to form a group of linked camera groups of 0-180 degrees around a vehicle; each group of camera terminal groups comprises 7 camera terminal devices, and the visual angle is divided into 7 visual angles which are respectively 0 degree, 30 degrees, 60 degrees, 90 degrees, 120 degrees, 150 degrees and 180 degrees;
the method comprises the steps that a plurality of groups of vehicle video images and/or vehicle static images at a plurality of visual angles are collected through the camera terminal equipment; and screening the images, and selecting a clear picture corresponding to each view angle of each vehicle to be selected into a vehicle image set.
3. The method of claim 2, wherein: the training process comprises a feature extraction step, a measurement learning step and a classification step;
the characteristic extraction step comprises the steps of obtaining each image and extracting a plurality of local block characteristics of the vehicle in the image;
the metric learning step comprises the steps of converting the local block features into local block features fused with view angle features through a plurality of set basic feature views, carrying out fusion calculation on the local block features fused with the view angle features to obtain global features, and carrying out distance metric learning on the global features.
4. The method of claim 3, wherein: the local block features include a color histogram feature f_cr of the host vehicle with a relative position dimension, a full-pixel color histogram feature f_ch and a texture feature f_v, and the feature extraction step is as follows:
the image set is a vehicle image set whose images are of size W × D; two different vehicles are taken from the training set and recorded as M and N, belonging to the training set; the test vehicle is recorded as T, belonging to the test set;
extracting, for each vehicle image, a color component f_cr of the host vehicle with a relative position dimension, as follows:
dividing the vehicle image into 12 × 12 non-overlapping blocks P_{w,d} and acquiring the color histogram of each block, recorded as:
h_{w,d} = {h(k)}, k = 1, 2, …, K    (1)
wherein k denotes a feature (bin) value and K denotes the number of possible feature values; h(k) is the number of pixels in the block whose feature value is k; w and d are respectively the horizontal and vertical coordinates of the block center;
for dimension reduction and improvement of separability, the first 10% of the component values h(k) in h_{w,d} are taken to represent the block color feature, recorded as h′_{w,d}; meanwhile, the center position of the block is (c_w, c_d), and the center position of the block relative to the whole image is (c_w/W, c_d/D);
the block color feature containing the block position information is obtained as:
f_patch^{w,d} = [h′_{w,d}, c_w/W, c_d/D]    (2)
wherein pw and pd respectively denote the numbers of block rows and columns, and f_cr is calculated by collecting the block features of all blocks:
f_cr = {f_patch^{(s)}}, s = 1, 2, …, pw × pd    (3)
based on formula (1), the full-pixel color histogram information f_ch is acquired over all pixels of the image:
f_ch = {h(k)}, k = 1, 2, …, K, computed over the whole image    (4)
obtaining the texture feature by extracting FV-coded dense features, as follows:
converting the vehicle image into a gray image and, for the vehicle images of the multiple view angles, dividing it into 24 × 24 non-overlapping blocks P_{w′,d′}; extracting low-level SIFT descriptors point by point in each 24 × 24 image block; w′ and d′ are respectively the horizontal and vertical coordinates of the block center, and the block center position is recorded as (c′_w, c′_d);
reducing the SIFT descriptors to l = 128 dimensions with the PCA algorithm to obtain v_sift, and appending the center point position to obtain the block feature containing the block position information, as follows:
v_patch = [v_sift, c′_w/W, c′_d/D]    (5)
performing feature fitting on the extracted v_patch to obtain a Gaussian mixture model GMM containing H Gaussian components; letting ω_h, γ_h, S_h respectively denote the weight, mean and covariance matrix of the h-th component of the GMM, h = 0, 1, …, H−1; calculating the first-order differential v^(1) and the second-order differential v^(2) of each Gaussian component as:
v^(1)_h = (1 / (N √ω_h)) Σ_n μ_{n,h} (x_n − γ_h) / S_h    (6)
v^(2)_h = (1 / (N √(2ω_h))) Σ_n μ_{n,h} [ (x_n − γ_h)² / S_h² − 1 ]    (7)
wherein N denotes the number of features; μ_{n,h} denotes the weight of the n-th feature for the h-th Gaussian component, n = 0, 1, …, N−1; x_n denotes the value of the n-th feature;
connecting v^(1) and v^(2) in series to obtain:
v^FV_h = [v^(1)_h, v^(2)_h]    (8)
and, with the position information added, obtaining the overall FV feature of dimension (l + 2) × H × 2, where l = 128:
f_FV = [v^FV_0, v^FV_1, …, v^FV_{H−1}]    (9)
carrying out sort(·) sorting on the components by energy and selecting the Z key positions that best represent the texture to form f_v, expressed as:
f_v = [f_FV(ss)], ss = 1, 2, …, Z    (10)
wherein Z is an integer greater than 1.
5. The method of claim 4, wherein: setting 0, 90 and 180 degrees as three basic feature perspectives, and calculating to obtain global features according to the following steps:
denoting the three basic feature view angles i of 0°, 90° and 180° as V_fro, V_sid and V_rev, respectively, and converting all images in the training set into an image feature function v_null blended with the view-angle feature, as follows:
[formula image in the original: v_null]
where α_{i,θ} denotes the view transformation feature function of view angle i;
for the block FV features obtained according to formula (9) and formula (11), formula (9) can be rewritten as a feature blended with the view-angle feature:
[formula image in the original]
which is calculated to obtain
[formula image in the original]
and, at the same time, f′_ch is obtained:
[formula image in the original: f′_ch]
formula (3) is similarly rewritten and expressed as a feature blended with the view-angle feature, as follows:
[formula image in the original: f′_cr]
calculating the global feature F_full as follows:
F_full = β_{1,θ}·f′_cr + β_{2,θ}·f′_ch + β_{3,θ}·f′_v    (14)
where β_{j,θ}, 1 ≤ j ≤ 3, is the weight of the j-th classification feature category.
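For illustration, a minimal sketch of the fusion in formula (14). Because the three feature vectors generally differ in length, a weighted concatenation is used here; this and the helper name fuse_global_feature are assumptions, since the claim does not state how the weighted combination is realised.

```python
import numpy as np

def fuse_global_feature(f_cr, f_ch, f_v, betas):
    """Illustrative realisation of F_full from formula (14)."""
    b1, b2, b3 = betas
    return np.concatenate([b1 * np.asarray(f_cr),
                           b2 * np.asarray(f_ch),
                           b3 * np.asarray(f_v)])
```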
6. The method of claim 5, wherein: the distance metric learning adopts the Mahalanobis distance, and the step of calculating, for any two vehicle images, the distance function between their feature vectors via the Mahalanobis distance comprises:
for any two vehicle images si and sj, a distance function d(y_si, y_sj) between the feature vectors of images si and sj is calculated via the Mahalanobis distance, as follows:
[formula image in the original: d(y_si, y_sj)]
where A is a diagonal matrix whose entries are weights distinguishing the importance of the feature vector components; A is decomposed into W^T·W, and solving for W takes over the role of the weights β_{j,θ} in equation (14); z_si is the feature vector of si, and z_sj is the feature vector of sj;
r_ij ∈ {1, −1} expresses whether the two images are similar: if they belong to the same class, r_ij = 1; otherwise, if they are not of the same class, r_ij = −1; the constraint function is given by:
[formula image in the original: the pairwise constraint function]
where
[formula image in the original]
is a threshold parameter for the inter-class and intra-class distances; δ(si, sj) denotes a positive/negative sample equalization function, calculated as follows:
[formula image in the original: δ(si, sj)]
where z_sn corresponds to the minimum distance among all negative samples;
the classification step comprises: implementing the pairwise metric learning by solving for W, and obtaining the final classification information by the KNN method.
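For illustration only, a sketch of the learned distance with A = W^T·W and of the final KNN classification step; W is assumed to be already learned, and the helper names are hypothetical.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def learned_distance(z_i, z_j, W):
    """d(z_i, z_j) = (z_i - z_j)^T A (z_i - z_j) with A = W^T W."""
    p = W @ (z_i - z_j)
    return float(p @ p)

def knn_classify(train_feats, train_labels, query_feats, W, k=5):
    """Project features by W, then apply KNN with Euclidean distance in the
    projected space, which equals the learned distance above."""
    knn = KNeighborsClassifier(n_neighbors=k).fit(train_feats @ W.T, train_labels)
    return knn.predict(query_feats @ W.T)
```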
7. The method of claim 6, wherein: in the training process, model training is carried out on a large image database by adopting a triplet deep metric learning algorithm, wherein a triplet comprises a group of test images T and two groups of training images M and N; requiring the test image T to be close to M and far from N, the global features {F_full(M), F_full(N), F_full(T)} corresponding to M, N and T are written as {y_M, y_N, y_T}, and the distance constraint condition on the three feature vectors is rewritten from formula (15) as follows:
[formula image in the original: the triplet distance constraint, formula (18)]
where
[formula image in the original]
is a threshold parameter for the inter-class and intra-class distances; Z_T, Z_N and Z_M are the feature vectors of images T, M and N, respectively.
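As an illustrative aid, a hinge-style sketch of the triplet constraint: T should be closer to M than to N by at least a threshold. The exact form of formula (18) is shown only as an image, so this standard formulation and the threshold name tau are assumptions.

```python
import numpy as np

def triplet_violation(y_T, y_M, y_N, W, tau=1.0):
    """Return the amount by which a triplet violates the margin constraint
    under the learned metric W (0.0 means the constraint is satisfied)."""
    d_pos = float(np.sum((W @ (y_T - y_M)) ** 2))   # distance T to M (same identity)
    d_neg = float(np.sum((W @ (y_T - y_N)) ** 2))   # distance T to N (different identity)
    return max(0.0, tau + d_pos - d_neg)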
8. The method of claim 7, wherein: in the training process, for the pairwise method, the sample-asymmetry problem is addressed by adopting an asymmetric-sample weighting strategy, namely: on each iteration, num++ is executed and the weight value is adjusted by the weight parameter
[formula image in the original: the weight parameter]
; for the triplet deep metric learning algorithm, in order to reduce the computational load, only triplets that violate the constraint condition and thus yield a non-zero loss are calculated; and,
the values of
[formula image in the original: the quantity in formula (18)]
in formula (18) are sorted from largest to smallest, and sample matching is performed starting from the maximum value to reduce the interference of noisy images.
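A small sketch of this training shortcut: keep only triplets whose loss is non-zero (constraint violators) and visit them from the largest violation downwards. The iteration-dependent weight parameter is shown only as an image in the original and is therefore omitted; triplet_violation refers to the sketch given after claim 7.

```python
def select_active_triplets(triplets, W, tau=1.0):
    """Illustrative hard-triplet selection: non-zero-loss triplets only,
    sorted from the largest violation to the smallest."""
    scored = [(triplet_violation(y_T, y_M, y_N, W, tau), (y_T, y_M, y_N))
              for (y_T, y_M, y_N) in triplets]
    active = [(loss, t) for loss, t in scored if loss > 0.0]   # constraint violators
    active.sort(key=lambda item: item[0], reverse=True)        # largest values first
    return [t for _, t in active]
```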
9. A vehicle re-identification client, comprising:
a model training module, configured to input a vehicle image set containing different vehicle images into a convolutional neural network model as input data and train until the model converges, obtaining a base network model;
and a re-identification module, configured to: acquire a query vehicle image input by a user; obtain, based on the base network model, a feature vector for the query vehicle image and for each image in the comparison library; calculate, by constructing a deep metric learning network, the distance between the feature vector of the query vehicle and the feature vectors of the images in the comparison library; perform ranking according to the distance; and output the recognition result; wherein the feature vector is a global feature generated by feature fusion of a plurality of local block features blended with the view-angle feature and relative position information.
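Purely as an illustration of the re-identification module's ranking step, assuming the feature extractor, the gallery features and the learned metric W already exist; the function name re_identify and the top_k parameter are hypothetical.

```python
import numpy as np

def re_identify(query_feat, gallery_feats, gallery_ids, W, top_k=10):
    """Rank gallery images by the learned distance to the query feature."""
    proj = (gallery_feats - query_feat) @ W.T        # apply the learned metric
    dists = np.einsum('ij,ij->i', proj, proj)        # squared distances to the query
    order = np.argsort(dists)[:top_k]                # closest matches first
    return [(gallery_ids[i], float(dists[i])) for i in order]
```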
10. A vehicle re-identification system, characterized by comprising a user terminal and a server side, wherein
the user terminal is provided with a human-computer interaction interface, through which the vehicle image set for training and the query information input by the user are acquired;
the server side includes a processor and a memory for storing processor-executable instructions and parameters, the processor configured to:
inputting a vehicle image set containing different vehicle images into a convolutional neural network model as input data and training until the model converges, to obtain a base network model; and,
acquiring a query vehicle image input by a user, obtaining, based on the base network model, a feature vector for the query vehicle image and for each image in the comparison library, calculating, by constructing a deep metric learning network, the distance between the feature vector of the query vehicle and the feature vectors of the images in the comparison library, performing ranking according to the distance, and outputting the recognition result;
wherein the feature vector is a global feature generated by feature fusion of a plurality of local block features blended with the view-angle feature and relative position information.
CN202011011077.0A 2020-09-23 2020-09-23 Vehicle re-identification method, client and system Pending CN112418262A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011011077.0A CN112418262A (en) 2020-09-23 2020-09-23 Vehicle re-identification method, client and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011011077.0A CN112418262A (en) 2020-09-23 2020-09-23 Vehicle re-identification method, client and system

Publications (1)

Publication Number Publication Date
CN112418262A true CN112418262A (en) 2021-02-26

Family

ID=74854760

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011011077.0A Pending CN112418262A (en) 2020-09-23 2020-09-23 Vehicle re-identification method, client and system

Country Status (1)

Country Link
CN (1) CN112418262A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104463151A (en) * 2015-01-05 2015-03-25 北京邮电大学 Rapid target matching method based on fusion of main color histograms and spatial position information
US20180178792A1 (en) * 2016-12-26 2018-06-28 Toyota Jidosha Kabushiki Kaisha Vehicle identification device and vehicle identification method
CN108197538A (en) * 2017-12-21 2018-06-22 浙江银江研究院有限公司 A kind of bayonet vehicle searching system and method based on local feature and deep learning
CN109583305A (en) * 2018-10-30 2019-04-05 南昌大学 A kind of advanced method that the vehicle based on critical component identification and fine grit classification identifies again
CN109740479A (en) * 2018-12-25 2019-05-10 苏州科达科技股份有限公司 A kind of vehicle recognition methods, device, equipment and readable storage medium storing program for executing again
CN110309770A (en) * 2019-06-28 2019-10-08 华侨大学 A kind of vehicle discrimination method again based on the study of four-tuple loss metric
CN110516640A (en) * 2019-08-30 2019-11-29 华侨大学 It is a kind of to combine the vehicle indicated discrimination method again based on feature pyramid
CN111291821A (en) * 2020-02-20 2020-06-16 上海眼控科技股份有限公司 Vehicle weight recognition method and device, computer equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YI ZHOU ET AL.: "Vehicle Re-Identification by Deep Hidden Multi-View Inference", 《IEEE TRANSACTIONS ON IMAGE PROCESSING》 *
王茜: "Research on Several Key Technologies for Surveillance-Based Recognition and Classification of Vehicle Targets in Large Cities", 《China Doctoral Dissertations Full-text Database, Information Science and Technology》 *
蔡晓东 et al.: "Vehicle Image Comparison Method Based on Multi-Branch Convolutional Neural Networks", 《Video Engineering (电视技术)》 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115761659A (en) * 2023-01-09 2023-03-07 南京隼眼电子科技有限公司 Recognition model construction method, vehicle type recognition method, electronic device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20210226)