CN112906557B

CN112906557B - Multi-granularity feature aggregation target re-identification method and system under multi-view angle

Info

Publication number: CN112906557B
Application number: CN202110183597.8A
Authority: CN
Inventors: 彭德光
Original assignee: Chongqing Zhaoguang Technology Co ltd
Current assignee: Chongqing Zhaoguang Technology Co ltd
Priority date: 2021-02-08
Filing date: 2021-02-08
Publication date: 2023-07-14
Anticipated expiration: 2041-02-08
Also published as: CN112906557A

Abstract

The invention provides a multi-granularity characteristic aggregation target re-identification method and system under a multi-view angle, comprising the following steps: constructing a multi-view neural network, and acquiring target characteristics of multiple views of a target object through the multi-view neural network; constructing a multi-granularity hypergraph based on the target characteristics of each target object in a set time period; inputting a target image to be queried, and acquiring a neighboring feature set of the target image to be queried from the multi-granularity hypergraph; performing similarity comparison on the adjacent feature set of the target image to be queried and the adjacent feature set of each target object in the multi-granularity hypergraph to obtain a target object re-identification result; the invention can effectively improve the re-identification precision.

Description

Multi-granularity feature aggregation target re-identification method and system under multi-view angle

Technical Field

The invention relates to the field of multi-granularity feature aggregation target re-identification method and system under multiple view angles.

Background

Pedestrian re-recognition based on video sequences is widely discussed because rich time information can be used for solving visual ambiguity, and currently, a classical method for video pedestrian re-recognition is to adopt a deep learning method to project a high-dimensional feature space on the video sequences, then perform identity matching sorting by calculating distances among samples, and mainly comprise the steps of representing video pedestrian features by aggregating frame-level time features by adopting a recurrent neural network and learning the time features by extracting video frame dynamic time information by using an optical flow field. Disadvantages of the prior art: 1. the most discriminant features cannot be learned based on the video learning of the recurrent neural network, and the training of the model on the long-segment video is complex and takes a long time. 2. The method for extracting the time features by means of the light field exploration flow structure is often easy to generate optical flow estimation errors due to the fact that adjacent frames of a certain video segment are not aligned. In order to solve the problems, the invention 1 provides a video pedestrian re-identification method based on multi-granularity feature aggregation under multi-view angles, simultaneously captures multi-granularity spatial information and time information of a video sequence, and adopts a simple and efficient hypergraph construction mode to reserve and enhance the diversity discrimination feature representation of different spatial granularities.

Disclosure of Invention

In view of the problems in the prior art, the invention provides a multi-granularity feature aggregation target re-identification method and system under multiple visual angles, which mainly solve the problems of long time consumption and low accuracy in training in the prior art.

In order to achieve the above and other objects, the present invention adopts the following technical scheme.

A multi-granularity characteristic aggregation target re-identification method under a multi-view angle comprises the following steps:

constructing a multi-view neural network, and acquiring target characteristics of multiple views of a target object through the multi-view neural network;

constructing a multi-granularity hypergraph based on the target characteristics of each target object in a set time period;

inputting a target image to be queried, and acquiring a neighboring feature set of the target image to be queried from the multi-granularity hypergraph;

and comparing the similarity between the adjacent feature set of the target image to be queried and the adjacent feature set of each target object in the multi-granularity hypergraph to obtain a target object re-identification result.

Optionally, the multi-view neural network comprises a convolutional neural network and a classification output layer, and after the image is subjected to feature extraction by the convolutional neural network, the image is input into the classification output layer to obtain target feature output of different views.

Optionally, the multi-view neural network is pre-trained by inputting a set of images containing pre-labeled different views into the multi-view neural network, constructing a loss function by cross entropy, and updating network parameters by using back propagation.

Optionally, the loss function is expressed as:

wherein y is _i For a label to be of a corresponding viewing angle,

for classification prediction results, N is the number of views.

Optionally, the target object includes a pedestrian, a vehicle.

Optionally, acquiring the adjacent feature set of the target image to be queried from the multi-granularity hypergraph includes:

calculating Euclidean distance among target features in the multi-granularity hypergraph, and acquiring first K target features with the nearest feature distance corresponding to the target image to be queried;

and acquiring a neighbor set of each target feature in the K target features, and selecting neighbor sets containing corresponding features of the target image to be queried in each neighbor set to form a neighbor feature set of the target image to be queried.

Optionally, performing similarity comparison between the adjacent feature set of the target image to be queried and the adjacent feature set of each target object in the multi-granularity hypergraph to obtain a target object re-identification result, which includes:

and selecting a target object corresponding to the adjacent feature set, the similarity of which reaches a set threshold value, as re-identification output by measuring the similarity between the adjacent feature sets through the Jaccard distance.

Alternatively, the similarity calculation method is expressed as:

wherein I is _i ，I _j Respectively representing two frames of images, R (I _i K) represents image I _i Is described.

A multi-granularity feature aggregation target re-identification system at multiple perspectives, comprising:

the network construction module is used for constructing a multi-view neural network, and acquiring target characteristics of multiple views of a target object through the multi-view neural network;

the hypergraph construction module is used for constructing a multi-granularity hypergraph based on the target characteristics of each target object in a set time period;

the feature set acquisition module is used for inputting a target image to be queried and acquiring a neighboring feature set of the target image to be queried from the multi-granularity hypergraph;

and the identification module is used for comparing the similarity between the adjacent feature set of the target image to be queried and the adjacent feature set of each target object in the multi-granularity hypergraph to obtain a target object re-identification result.

As described above, the multi-granularity characteristic aggregation target re-identification method and system under multi-view angle have the following beneficial effects.

The viewing angle information is increased, and the problems of shielding, viewing angle difference and the like are overcome; and enhancing the re-identification precision through the neighbor feature set.

Drawings

FIG. 1 is a flowchart of a multi-granularity feature aggregation target re-identification method under a multi-view in an embodiment of the invention.

Detailed Description

Other advantages and effects of the present invention will become apparent to those skilled in the art from the following disclosure, which describes the embodiments of the present invention with reference to specific examples. The invention may be practiced or carried out in other embodiments that depart from the specific details, and the details of the present description may be modified or varied from the spirit and scope of the present invention. It should be noted that the following embodiments and features in the embodiments may be combined with each other without conflict.

It should be noted that the illustrations provided in the following embodiments merely illustrate the basic concept of the present invention by way of illustration, and only the components related to the present invention are shown in the drawings and are not drawn according to the number, shape and size of the components in actual implementation, and the form, number and proportion of the components in actual implementation may be arbitrarily changed, and the layout of the components may be more complicated.

Referring to fig. 1, the present invention provides a multi-granularity feature aggregation target re-identification method under multi-view, which includes steps S01-S04.

In step S01, a multi-view neural network is constructed, and target characteristics of multiple views of a target object are acquired through the multi-view neural network:

in one embodiment, the target object may comprise a pedestrian, a vehicle, etc., and the video image containing the target object is acquired in advance, and the video sequence is acquired as an input to the multi-view neural network.

In an embodiment, the multi-view neural network comprises a convolutional neural network and a classified output layer, and after the image is subjected to feature extraction through the convolutional neural network, the classified output layer is input to obtain target feature output of different views.

In one embodiment, a set of images containing pre-labeled different perspectives is input into a multi-perspective neural network, and the multi-perspective neural network is pre-trained by constructing a loss function through cross entropy and updating network parameters by using back propagation.

Specifically, a ternary classification output layer is added after the traditional CNN, and the marked image x is utilized _i As input, it corresponds to the viewing angle label y _i As a supervisory signal, for the prediction result

Supervision training by cross entropy, cross entropy loss function can be adoptedExpressed as:

the forward and backward algorithms are used to complete the update calculation of the loss function.

Extracting video frame characteristics;

for video sequences i= { I containing a sheet of pictures ₁ ,I ₂ ,...,I _T And extracting the characteristics of each image by adopting the constructed multi-view neural network, which can be expressed as follows:

F _i ＝CNN(I _i ),i＝1,...,T,

wherein F is _i The three-dimensional tensor of each dimension c×h×w, the channel size, and the height and width of the feature map.

In step S02, a multi-granularity hypergraph is constructed based on the target features of each target object within a set period of time:

and (3) dividing the image features extracted in the step (S01) into p E {1,2,4,8} horizontal blocks according to a horizontal division mode, and carrying out average combination on the divided feature images to construct partial feature vectors. For each granularity, the entire sequence generates N _p T×p partial-level features, respectively denoted as

The first granularity of a video sequence contains a single global feature vector, and the other granularity consists of partial feature vectors.

First using v _i ∈V _p ,i∈{1,2,...,N _p The pre-candidate nodes needed for building the hypergraph are defined as a group of hyperedges E for capturing time information _p To model short-to-long-term correlations in hypergraphs. Specifically, for any one candidate node v _i Select it at time T _t The most similar K adjacent nodes in the network

The k+1 nodes are related by using the hyperedge as shown in the following formula, which is specifically expressed as follows:

hypergraph feature updating;

for a node v of hypergraph _i Definition of

Representing all supersides associated with the point, since the point associated with one superside has extremely strong association, the feature of defining the superside with the aggregation operation is as follows:

wherein,,

representing v _j Node characteristics at the layer. To calculate the association relation between node features and the association feature of the superside, calculate its similarity +.>

Wherein,,

representing the similarity between features. In addition, the softMax is adopted to normalize the similarity weight and aggregate the superside information to respectively calculate and obtain ++>

After the aggregated superside information is acquired, node characteristics can be associated by a full connection layer:

wherein W is ^l Representing a weight matrix, σ represents the excitation equation. Thus repeating the update mechanism more than L times, a series of output node characteristics can be calculated

Hypergraph feature aggregation based on an attention mechanism;

after obtaining the final updated node characteristics for each hypergraph, it is considered that in one hypergraph, different nodes have different importance. For example: the lower the importance of the blocked part or background is, the better the feature discrimination is. Thus, discriminant computation based on the mechanism of attention is designed, with nodes of each hypergraph being noted

Wherein W is _u Representing a weight matrix. The hypergraph feature can thus be computed as a weight aggregation of node features:

aggregating the multi-granularity hypergraph based on the mutual information minimizing retention loss;

to optimize the framework, the training process is co-supervised with cross entropy loss and triplet loss:

wherein y is _i Representing characteristics

N, C represent the size of mini-batch and the number of training set categories, respectively,

the query sample, positive sample and negative sample are respectively represented when the division granularity is p. After training the model based on the two loss terms, each hypergraph will output a distinct graph-level feature.

In order to acquire the characteristics fusing multi-granularity hypergraph information, mutual information minimization loss is adopted, mutual information among different hypergraph characteristics is reduced, and uncertainty of final video representation is further increased by combining all the characteristics. Thus, for hypergraph features of different granularity p, we define the mutual information minimization penalty:

kappa is used to measure the mutual information of different hypergraph features. Finally, the loss functions of all parts are combined as in the formula (13), and a forward and backward algorithm is adopted to finish the updating calculation of the loss functions.

L _all ＝L _xent +L _tri +L _MI

In step S03, inputting a target image to be queried, and acquiring a set of neighboring features of the target image to be queried from the multi-granularity hypergraph:

in an embodiment, calculating Euclidean distance among target features in the multi-granularity hypergraph, and acquiring first K target features with the nearest feature distance corresponding to the target image to be queried;

Specifically, the Euclidean distance d between hypergraph features obtained by step S03 is calculated _m (F′i,F′ _j ) The k nearest distances of the query image probe correspond to a neighbor set N (probe, k), wherein the set contains positive samples and negative samples, and is defined as:

wherein,,

samples 1,2, and k near the Euclidean distance between the samples are shown. At the same time, for each of the neighbor sets N +.>

Also have their respective neighbor sets N ', which are adjacent to each other if probes are included in N', or are not adjacent to each other. Thus, the k-mutual proximity set R of the probe can be obtained, and all elements in the R are target objects which are mutually adjacent to the probe.

R(probe,k)＝{(t _i ∈N(probe,k)∩(p∈N(t _i ，k))} (16)

The set can be regarded as a probe k-mutual proximity feature, which is more suitable for similarity measurement between pedestrians than a hypergraph feature.

In step S04, similarity comparison is performed between the adjacent feature set of the target image to be queried and the adjacent feature set of each target object in the multi-granularity hypergraph, and a target object re-recognition result is obtained:

in an embodiment, through the similarity between adjacent feature sets measured by the Jaccard distance, selecting a target object corresponding to the adjacent feature set with the similarity reaching a set threshold as the re-recognition output, specifically:

to describe any two images I specifically from the point of view of collection _i ，I _j Variability between nearest neighbor sets, defining Jaccard distance between two neighbor sets

And measuring the similarity between the target objects by the distance, and re-identifying the query target objects.

The embodiment also provides a multi-granularity characteristic aggregation target re-identification system under the multi-view angle, which is used for executing the multi-granularity characteristic aggregation target re-identification method under the multi-view angle in the embodiment of the method. Since the technical principle of the system embodiment is similar to that of the foregoing method embodiment, the same technical details will not be repeated.

In one embodiment, a multi-granularity feature aggregation target re-identification system under multiple perspectives comprises: the device comprises a network construction module, a hypergraph construction module, a feature set acquisition module and an identification module, wherein the network construction module is used for assisting in executing the step S01 in the embodiment of the method; the hypergraph construction module is used for assisting in executing the step S02 in the embodiment of the method; the feature set obtaining module is used for assisting in executing step S03 in the method embodiment; the identification module is used to assist in executing step S04 in the foregoing method embodiment.

In summary, according to the multi-granularity feature aggregation target re-identification method and system under the view angle, the three-dimensional view angle classification is adopted to enable the pedestrian feature to contain view angle information in subsequent processing, so that the problems of shielding, view angle difference and the like are overcome; the hypergraph neural network structure can simultaneously extract the spatial characteristics and the time dependence of the video frame, and the hypergraph diversity corresponding to different spatial granularities can be reserved and enhanced by using mutual information to minimize loss; the k-mutual adjacent coding method is utilized to improve the re-recognition precision of pedestrians, and the defect that the hypergraph learning is too focused on local information is overcome. Therefore, the invention effectively overcomes various defects in the prior art and has high industrial utilization value.

The above embodiments are merely illustrative of the principles of the present invention and its effectiveness, and are not intended to limit the invention. Modifications and variations may be made to the above-described embodiments by those skilled in the art without departing from the spirit and scope of the invention. Accordingly, it is intended that all equivalent modifications and variations of the invention be covered by the claims, which are within the ordinary skill of the art, be within the spirit and scope of the present disclosure.

Claims

1. The multi-granularity characteristic aggregation target re-identification method under the multi-view angle is characterized by comprising the following steps of:

constructing a multi-view neural network, acquiring target characteristics of multiple views of a target object through the multi-view neural network, wherein the multi-view neural network comprises a convolutional neural network and a classified output layer, and after an image is subjected to characteristic extraction through the convolutional neural network, inputting the image into the classified output layer to acquire target characteristic output of different views;

the similarity comparison is carried out on the adjacent feature set of the target image to be queried and the adjacent feature set of each target object in the multi-granularity hypergraph, and a target object re-identification result is obtained, which comprises the following steps: and (3) measuring the similarity between adjacent feature sets through the Jaccard distance, selecting a target object corresponding to the adjacent feature set, the similarity of which reaches a set threshold value, as re-identification output, wherein the similarity calculation mode is expressed as follows:

2. The multi-granularity feature aggregation target re-recognition method under multi-view according to claim 1, wherein a set of images including pre-labeled different views is input into the multi-view neural network, and the multi-view neural network is pre-trained by constructing a loss function through cross entropy and updating network parameters by adopting back propagation.

3. The multi-granularity feature aggregation target re-identification method under multi-view according to claim 2, wherein the loss function is expressed as:

wherein y is _i For a label to be of a corresponding viewing angle,

for classification prediction results, N is the number of views.

4. The multi-granularity feature aggregation target re-recognition method under multiple perspectives of claim 1, wherein the target object comprises a pedestrian or a vehicle.

5. The multi-granularity feature aggregation target re-identification method under the multi-view angle according to claim 1, wherein the acquiring the adjacent feature set of the target image to be queried from the multi-granularity hypergraph comprises:

6. A multi-granularity feature aggregation target re-identification system under multiple viewing angles, comprising:

the system comprises a network construction module, a multi-view neural network, a classification output layer and a target characteristic acquisition module, wherein the network construction module is used for constructing a multi-view neural network, acquiring target characteristics of a target object at multiple views through the multi-view neural network, the multi-view neural network comprises a convolutional neural network and the classification output layer, and after an image is subjected to characteristic extraction through the convolutional neural network, the image is input into the classification output layer to acquire target characteristic output at different views;

the feature set acquisition module is used for inputting a target image to be queried and acquiring an adjacent feature set of the target image to be queried from the multi-granularity hypergraph;

the identification module is used for comparing the similarity between the adjacent feature set of the target image to be queried and the adjacent feature set of each target object in the multi-granularity hypergraph to obtain a target object re-identification result, and comprises the following steps: and (3) measuring the similarity between adjacent feature sets through the Jaccard distance, selecting a target object corresponding to the adjacent feature set, the similarity of which reaches a set threshold value, as re-identification output, wherein the similarity calculation mode is expressed as follows: