CN117788511A

CN117788511A - Multi-expansion target tracking method based on deep neural network

Info

Publication number: CN117788511A
Application number: CN202311800574.2A
Authority: CN
Inventors: 陈辉; 胡荣海; 张小娟; 梁建虎; 边斌超; 崔婧; 杜双燕
Original assignee: Lanzhou University of Technology
Current assignee: Lanzhou University of Technology
Priority date: 2023-12-26
Filing date: 2023-12-26
Publication date: 2024-03-29
Anticipated expiration: 2043-12-26

Abstract

The invention discloses a multi-expansion target tracking method based on a deep neural network, which belongs to the technical field of multi-expansion target tracking and comprises the following steps: constructing a Kalman filter and an LSTM network architecture, and constructing a depth-associated multi-target tracking network structure based on the Kalman filter and the LSTM network architecture; performing target tracking based on the depth-associated multi-target tracking network structure to obtain target association probability, and obtaining a multi-target tracking result based on the target association probability; converting the measurement of a target into a two-channel image, constructing a CNN network model, and estimating the elliptical shape of the two-channel image to obtain a prediction result of a major-minor axis of the ellipse; and generating a multi-expansion target tracking result based on the multi-target tracking result and the prediction result of the ellipse major-minor semi-axis. In the patent, the LSTM neural network is used for processing the multi-target data association problem, so that the multi-target tracking data association problem can be effectively solved.

Description

Multi-expansion target tracking method based on deep neural network

Technical Field

The invention belongs to the field of multi-expansion target tracking, and particularly relates to a multi-expansion target tracking method based on a deep neural network.

Background

The target tracking technology is an important branch in the field of information fusion, and has wide application in the fields of military and civilian life. Data association is a key part of a multi-target tracking technology, and has the potential of improving the multi-target tracking performance of an airborne radar system, so that the data association is an important research topic in the field of airborne radars. The hungarian algorithm is one of the earliest algorithms to solve the linear allocation problem. The HA minimizes the estimated allocation cost of the target by maximizing the sum of the log likelihood functions. However, HA does not perform well in the context of, in particular, noisy interference. The probability data association filter adopts a Bayesian method to search the distribution probability of the detected target, but the PDA can only effectively process the single target data association problem under the clutter background. The joint probability data association can solve the problem of multi-objective data association of measurement belonging to a plurality of tracking gates. However, as the number of targets increases, the validation matrix becomes larger and larger, a large number of federated events will be obtained by splitting the validation matrix. Therefore, the algorithm is difficult to apply in engineering applications. Meanwhile, through years of development, a plurality of extended target modeling methods are developed aiming at different types of targets. Koch is proposing a Random Matrix Model (RMM) describing the shape of a target as an ellipse. Subsequently, the first and second heat exchangers are connected,gaussian Inverse Wishart (GIW) filters were developed to enable RMM to estimate motion states and extension states. Baum also proposes a Random Hypersurface Model (RHM) to handle elliptic expansionThe targets are expanded and multiple expanded targets are tracked using rhm, and Zea et al application rhm tracks non-convex shaped targets. For irregularly shaped extended targets, lan et al propose a multiple random matrix model that approximates the target shape in combination with multiple ellipses. Another irregularly shaped method is the star-convex RHM proposed by Baum. The Gaussian Process (GP) is a common machine learning method. />A GP extended kalman filter (GPEKF) was proposed by et al for single extended target tracking in ideal environments. In the detailed description of the present invention,the main extended target tracking algorithm at present is explained. Up to now, the GP algorithm is the most dominant shape estimation method. However, the shape estimation accuracy of the GP algorithm is related to the covariance function, and is more likely to be degraded when the motion state deviates from the true value, and the calculation amount of the measurement model is increased when tracking a complex target. Conventional multi-expansion target tracking faces two difficulties: 1) How to solve the multi-objective data association problem. 2) How to better estimate the target shape.

In recent years, deep learning is a new research hotspot in the field of artificial intelligence. Deep neural networks have emerged in a wide variety of applications successfully addressing what has previously been considered too complex for the prior art. The cross fusion of the deep learning and the target tracking problem provides a new approach for improving the target tracking performance and estimating the target shape. The initial artificial neural network is proposed to be the beginning of deep learning development, and then various neural networks such as RNN, CNN, transducer and the like emerge, while common neural networks have good effects in treating some simple problems, targets with various shapes are difficult to treat when the networks lack data and training data are not related to test data, and the optimal data association in multi-target tracking is solved in the existing multi-extended target tracking field. And as the number and amount of targets increases, the complexity of the algorithm is also increasing, and the calculation amount of the algorithm is also becoming larger and larger. For the traditional extended target tracking algorithm, for example, the shape estimation accuracy of the GP algorithm is related to a covariance function, when the motion state deviates from a true value, the shape estimation accuracy is easier to be reduced, and when a complex target is tracked, the calculated amount of a measurement model is increased.

Disclosure of Invention

The invention aims to provide a multi-expansion target tracking method based on a deep neural network, which aims to solve the problems existing in the prior art.

In order to achieve the above object, the present invention provides a multi-expansion target tracking method based on a deep neural network, including:

constructing a Kalman filter and an LSTM network architecture, and constructing a depth-associated multi-target tracking network structure based on the Kalman filter and the LSTM network architecture;

performing target tracking based on the depth-associated multi-target tracking network structure to obtain target association probability, and obtaining a multi-target tracking result based on the target association probability;

converting the measurement of a target into a two-channel image, constructing a CNN network model, and estimating the elliptical shape of the two-channel image to obtain a prediction result of a major-minor axis of the ellipse;

and generating a multi-expansion target tracking result based on the multi-target tracking result and the prediction result of the ellipse major-minor semi-axis.

Preferably, before the construction of the Kalman filter and the LSTM network architecture, the method further comprises confirming a complete data association problem model tracked by the radar in clutter;

the expression of the complete data association problem model is as follows:

wherein,representing the cost of assigning target j to measurement i at time k,/>For decision variables, i.e.)>Indicating that target i is assigned to measurement j, N _k For the total number of targets that have been started and maintained at time k, N is the number of scans, M _k To scan the total number of measurements received.

Preferably, the input vector of the LSTM network architecture is preprocessed through a pair of distance matrixes generated by all measurement and state prediction of each target;

the expression of the input vector is:

wherein,is a measure of the predicted target,/->Representing repetition->An M x D dimension matrix is formed once, Z (k) is the actual measurement.

Preferably, the training process of the LSTM network architecture includes:

training the LSTM network architecture based on an Adam optimization algorithm, and simultaneously adopting a root mean square error loss function to minimize errors;

the hidden state dimension of the LSTM network architecture is set to 128, and the number of units of the full connection layer is set to 64.

Preferably, the process of converting the measurement of the target into the two-channel image includes:

centering the measured value of the target by taking the estimated target center as a center, and aligning the measured value of the target with the direction of the target to obtain a processed measured value;

performing spatial transformation on the processed measurement value based on the discretization operation of the super parameter to obtain a measurement image;

and constructing an empty double-channel image internal intensity measurement image with the same size as the existing image, and constructing the double-channel image based on the empty double-channel image internal intensity measurement image and the measurement image.

Preferably, the discretization operation based on the super parameter spatially transforms the processed measurement value to obtain the expression of the measurement image as follows:

wherein c _i,pixel To measure the image c _i For the individual coordinates, p is the pixel and s is the hyper-parameter.

Preferably, the process of constructing a CNN network model to estimate the elliptical shape of the two-channel image and obtain the prediction result of the major and minor axes of the ellipse includes:

constructing a CNN network, and inputting the two-channel image into the CNN network for processing to obtain a predicted value;

amplifying the predicted value until the complete target length and width are obtained, and outputting the amplified predicted value to obtain the predicted result of the elliptical long and short half shaft.

Preferably, the convolution layer of the CNN network consists of a Batchnormal layer, a relu layer and a Max Pool layer;

the architecture of the CNN network comprises 5 convolution layers, a Relu activation function and a full connection layer;

the kernel size of the first two layers of the convolution layer is 3x3, and the kernel size of the last three layers is 4x4.

The invention has the technical effects that:

the LSTM neural network is used for processing the multi-target data association problem, the multi-target tracking data association problem can be effectively solved, then the measurement is converted into a double-channel image and is input as a CNN neural network architecture, so that the length of the elliptical long and short half shafts is inferred, the combined optimization of the target motion state and the profile information (expansion state) is comprehensively considered, and compared with the traditional data association algorithm, the LSTM data association algorithm is used for more accurately matching the targets with the measurement, so that the multi-target tracking effect is better. Meanwhile, the estimation of the target contour information by the CNN network frame is more accurate, so that the tracking effect of the multi-expansion target is optimized on the whole.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application, illustrate and explain the application and are not to be construed as limiting the application. In the drawings:

FIG. 1 is a flow chart of multi-expansion target tracking in an embodiment of the invention;

FIG. 2 is a schematic diagram of an LSTM-based data correlation model in accordance with an embodiment of the present invention;

FIG. 3 is a graph of the true value of a target trajectory in an embodiment of the present invention;

FIG. 4 is a diagram of the JPDA multi-target tracking result in an embodiment of the present invention;

FIG. 5 is a diagram of LSTM multi-target tracking results in an embodiment of the invention;

FIG. 6 is a diagram of multi-expansion target tracking in an embodiment of the invention;

FIG. 7 is a graph of extended target Neisseria distance in an embodiment of the present invention.

Detailed Description

It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.

Example 1

As shown in fig. 1, the embodiment provides a multi-expansion target tracking method based on a deep neural network, which includes:

and generating a multi-expansion target tracking result based on the target association probability, the multi-target tracking result and the prediction result of the ellipse minor-minor axis.

The specific implementation process is as follows:

1: constructing an LSTM network architecture to process data association problems;

2: constructing a depth-associated multi-target tracking frame, predicting and updating targets by using a tracking filter, and matching targets with measurement by using an LSTM algorithm;

3: constructing CNN network architecture for ellipse shape estimation

4: converting the measurement into an image representation and taking the image representation as an input of CNN so as to predict an ellipse major-minor half axis;

5: and combining the depth-associated multi-target tracking with the CNN as an expansion target to perform multi-expansion target tracking.

The step 1 specifically comprises the following steps: an LSTM network architecture is constructed.

The step 2 specifically comprises the following steps: a depth-dependent multi-objective tracking network structure is constructed, which includes a Kalman filter and an LSTM network whose inputs are preprocessed by a pair-wise distance matrix generated from all measurements and state predictions for each objective. The network training algorithm is trained by utilizing an Adam optimization algorithm, error minimization is carried out by adopting a root mean square error loss function, the loss function is represented by (8), the hidden state dimension of the LSTM network is set to be 128, and the unit number of the full connection layer is set to be 64. The training output through the LSTM network is the probability of association of the target with the measurement.

The step 3 specifically comprises the following steps: and (3) converting the measurement of the ellipse expansion target generated by the formulas (9) - (11) into an image representation by the formulas (12) - (17), predicting the converted image as a network input of CNN, and finally amplifying the predicted value to obtain a major half axis of the ellipse. The used CNN convolutional layer internal architecture consists of a Batchnormal layer, a Relu layer and a Max Pool layer, and the used CNN network architecture consists of 5 convolutional layers, followed by a Relu activation function and a fully connected layer that produces two-dimensional regression output. The first two convolutions are filled in order to keep the image size constant throughout the layers, with the convolution steps of the other convolutions being 2. The kernel size is chosen to be 3x3 in the first two layers and 4x4 in the deeper layers.

The step 4 specifically comprises the following steps: combining the depth-associated multi-target tracking frame with the ellipse extended target tracking, obtaining the motion state of the target through the depth-associated multi-target tracking frame, converting measurement into image representation, inputting the image representation into a CNN framework, and finally obtaining the minor and major half axes of the ellipse through conversion. Finally, a multi-ellipse target tracking algorithm is realized.

The method also comprises the step of confirming a complete data association problem model tracked by the radar in clutter before constructing the Kalman filter and the LSTM network architecture;

description of data association problem:

the definition measurement is:

wherein N is the number of scans, M _k To scan the received measurement total, a false alarm i is introduced _k =0, then Z (k) is re-expressed as

Defining a target set as:

wherein,for the j-th target state prediction existing at time k, N _k Is the total number of targets that have been started and maintained at time k.

For each ofIntroducing a variable 0-1->Definition:

the complete data correlation problem model for radar tracking in clutter using N probes can be defined as:

wherein,representing the cost of assigning target j to measurement i at time k,/>For decision variables, i.e.)>Indicating that target i is assigned to measurement j.

Constraint conditions:

where (5) indicates that only one target can be assigned to each measurement and (6) indicates that each target can be associated with only one measurement.

The expression of the complete data association problem model is as follows:

The input vector of the LSTM network architecture is preprocessed through paired distance matrixes generated by all measurement and state prediction of each target;

the expression of the input vector is:

The training process of the LSTM network architecture comprises the following steps:

The process of converting the measurement of the target into the two-channel image comprises the following steps:

Performing spatial transformation on the processed measurement value based on the discretization operation of the super parameter, and obtaining the expression of the measurement image as follows:

wherein c _i,pixel To measure the image c _i For the individual coordinates, p is the pixel,s is a super parameter.

The inputs to the LSTM network may be preprocessed by pairs of distance matrices generated from all measurements and state predictions for each target. The LSTM network is then used to calculate the metrology-trace association probabilities. Each target is updated by an associated measurement.

FIG. 2 shows the matching of all targets and measurements at the same time, in each prediction step the network outputs a probability distribution vector, which is the probability of correlation of a target with all measurements at time k, i.eMissing targets will be assigned to virtual metrology. The input is +.>It reaches the hidden state h through a full connection layer, and outputs the probability of being associated with the ith target obtained through full connection layer transformation and subsequent sigmoid transformation>Input vector->Remodelling from a distance matrix, which is a distance matrix from all measurements to all targets. Its remodeling was calculated as follows:

wherein the method comprises the steps ofMeasurement of predicted target->Representing repetition->Secondary formation of M x D dimensional matrix (6)

The algorithm uses a mean square error function as a loss function to minimize the mean square error between the predicted associated probability and the associated probability of the actual target measurement. The loss function is defined as:

wherein beta is _i A predicted value representing a probability of the data association of the ith target with the metrology value,a true value representing the probability of the data association of the ith target with all metrology values.

CNN-based shape estimation:

setting the state of the expansion target to be from the motion stateThe decoupled shape state is composed of a shape parameter consisting of a direction θ and a half-axis length l, _w composition, motion state is Gaussian distribution +.>The direction changes over time, aligned with the velocity vector. The sensor can detect the measurement source y in shape, so that the measurement z is generated as:

z＝y+v(9)

the sensor noise is v-N (0, R), the measurement source is uniformly distributed on the elliptical surface, and the multiplication error h= [ h ] is used ₁ ,h ₂ ] ^Τ It consists of two independent noise terms, according to h _i -N (0,0.25) distribution to approximate a uniform distribution on a circle. The equation obtained is as follows:

y＝Hx+rot(θ)·diag([t,w])·h (10)

where rot (θ) is a rotation matrix around θ, diag (·) represents a diagonal matrix with the input vector as the main diagonal, and the metrology matrix H. Is obtained by two modes (9) and (10)

z＝Hx+rot(θ)·diag([l,w])·h+v (11)

In the formation process of the measurement image, the updating of the intensity image is divided into four steps:

1. all measured values Z _k Converted metrology centered on an estimated target centerExpressed as:

2. the measurement is aligned with the direction of the target, and->Representing the velocity components of the y and x coordinates, respectively.

3. Converting the centered and aligned measurement values into image space by performing a discretization operation based on a super parameter s, which represents the maximum distance the measurement value may be from the target center horizontally or vertically, for inclusion in the image, and the image size (pixel p), for a color having c _i ∈[-s,s]Is the individual coordinate c of (2) _i The following spatial transformation may be performed.

Wherein c _i,pixel Representing the input coordinates transformed to the pixel point of the image, c _i,pixel Rounding to the nearest integerAnd obtaining pixel coordinates corresponding to the input coordinates, and setting a threshold value to adapt to the outermost pixel in the corresponding direction.

4. An empty dual channel image internal intensity measurement image of equal size to the existing image is created (intel intersity measurementimage,) To manage the updating step for +.>For the center, each coordinate of each measurement point aligned is subjected to pixel point conversion, and the intensity of the corresponding pixel is increased by 1 in two image channels, and when all the measurements are combined, the second channel is obtained by applying the position uncertainty P based on measurement noise covariance and Kalman filter _loc ＝HP _k|k H ^Τ Is processed by standard gaussian blur. "update image" of the measurement construction of the current time step>Adding the image to the existing measurement image;

obtaining IIMI in the prediction step by equation 16 _k|k-1 Wherein IIMI _k|k Representing the measurement image obtained by updating the k moment IIMI _k|k-1 A measurement image predicted at the time k is represented;

IIMI _k|k-1 ＝γ·IIMI _k-1|k-1 (16)；

wherein 0 < gamma < 1 is a superparameter that achieves a forgetting factor such that the most recently obtained measured intensity is higher than the measured intensity of the past time step. Wherein IIMI _k-1|k-1 And represents the measurement image updated at time k-1.

The current acquisition of shape estimates may be accomplished simply by combining IIMI _k|k Pass to CNN for completion. Using normalization of images, the normalization is defined byEquation 17 represents:

wherein,the method is characterized in that the updated measurement image standard is normalized, the highest pixel intensity is set to be the intensity 255, the prediction of the neural network is performed in a reduced image space, and the predicted object size is s, so that the predicted value is finally amplified by s to obtain the complete target length and width, and the half axis is obtained.

The foregoing is merely a preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. The multi-expansion target tracking method based on the deep neural network is characterized by comprising the following steps of:

2. The deep neural network-based multi-extended target tracking method of claim 1, further comprising, prior to said constructing a kalman filter and LSTM network architecture, validating a complete data correlation problem model for radar tracking in clutter;

the expression of the complete data association problem model is as follows:

3. The deep neural network-based multi-extension target tracking method according to claim 1, wherein the input vector of the LSTM network architecture is preprocessed by a pair-wise distance matrix generated by all measurements and state prediction of each target;

the expression of the input vector is:

wherein,is the aim of predictionMeasurement of target->Representing repetition->An M x D dimension matrix is formed once, Z (k) is the actual measurement.

4. The deep neural network-based multi-extension target tracking method of claim 1, wherein the training process of the LSTM network architecture comprises:

5. The depth neural network based multi-extension target tracking method of claim 1, wherein the converting the measurement of the target into the two-channel image comprises:

6. The deep neural network-based multi-extension target tracking method of claim 5, wherein the performing spatial transformation on the processed measurement values based on the discretization operation of the super parameters, the expression for obtaining the measurement image is:

7. The multi-expansion target tracking method based on deep neural network according to claim 1, wherein the process of constructing a CNN network model to estimate the elliptical shape of the two-channel image and obtain the prediction result of the major and minor ellipses comprises:

8. The deep neural network-based multi-extension target tracking method of claim 7, wherein the convolutional layer of the CNN network consists of a batch normal layer, a relu layer and a Max Pool layer;