CN114358211B - Multi-mode deep learning-based aircraft behavior intention recognition method - Google Patents


Info

Publication number
CN114358211B
CN114358211B (application CN202210044234.0A)
Authority
CN
China
Prior art keywords
aircraft
behavior
sequence
encoder
cnn
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210044234.0A
Other languages
Chinese (zh)
Other versions
CN114358211A (en)
Inventor
朱秀翠
黄宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Shitong Hengqi Beijing Technology Co ltd
Original Assignee
Zhongke Shitong Hengqi Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongke Shitong Hengqi Beijing Technology Co ltd filed Critical Zhongke Shitong Hengqi Beijing Technology Co ltd
Priority to CN202210044234.0A priority Critical patent/CN114358211B/en
Publication of CN114358211A publication Critical patent/CN114358211A/en
Application granted granted Critical
Publication of CN114358211B publication Critical patent/CN114358211B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an aircraft behavior intention recognition method based on multi-modal deep learning, belonging to the technical field of aircraft behavior recognition. Using deep learning, the method identifies the flight behavior intentions of moving targets such as aircraft from a new perspective, based on two modalities of data: the trajectory sequence of the aircraft's flight behavior and projection images of the trajectory at different angles. In addition, a behavior intention recognition model is trained separately with the behavior categories obtained by clustering as labels, so that the behavior intentions of subsequently added aircraft flight trajectories can be recognized directly.

Description

Multi-mode deep learning-based aircraft behavior intention recognition method
Technical Field
The invention relates to the technical field of behavior recognition, in particular to an aircraft behavior intention recognition method based on multi-mode deep learning.
Background
With the rapid development of mobile communication and location-awareness technology, it has become increasingly easy to acquire real-time position information for various moving targets such as aircraft, and the scale of moving-target trajectory data is growing rapidly. In recent years, industries and departments such as transportation, the military and internet enterprises have accumulated large data assets containing geographic position information. These assets contain the original trajectories of moving-target activity and are of great value for mining behavior patterns of moving targets and identifying their behavioral intentions.
At present, most methods for analyzing the trajectory behavior of moving targets suffer from the following two problems:
1. The input data modality is single: either only text data or only image data is used as input, ignoring the influence that possible correlations between different modalities may have on the accuracy of the trajectory behavior analysis.
2. Most trajectory behavior analysis relies on clustering. Limited by the clustering algorithm, the behavior intention of a newly added trajectory cannot be identified directly; to identify it, the cluster analysis has to be run again, which is cumbersome.
The development of big data and artificial intelligence provides a new perspective for analyzing moving-target activity, and the new generation of artificial intelligence technology characterized by deep learning provides technical support for behavior intention recognition based on moving-target trajectory data. Multi-modal deep learning is currently a leading and active technical field: many technology companies have invested heavily in it, and results such as multi-modal text classification have appeared in quick succession, but the technology has not yet been applied to behavior intention recognition for moving targets such as aircraft.
Disclosure of Invention
The invention innovatively provides an aircraft behavior intention recognition method based on multi-modal deep learning. Using deep learning, and based on two modalities of data (the trajectory sequence of the aircraft's flight behavior and projection images of the trajectory at different angles), the flight behavior intentions of moving targets such as aircraft are recognized from a new perspective. In addition, a behavior intention recognition model is trained separately with the behavior categories obtained by clustering as labels, so that the behavior intention of a subsequently added aircraft flight trajectory can be recognized directly.
In order to achieve the purpose, the invention adopts the following technical scheme:
The aircraft behavior intention recognition method based on multi-modal deep learning comprises the following steps:
S1, acquiring flight trajectory data of aircraft, the flight trajectory data comprising flight data of each track point and projection image data of the aircraft projected onto the ground at different angles at each track point; ordering the flight data of the track points into a track sequence according to the flight time order, and ordering the projection images projected onto the ground at the same angle at the track points of an aircraft into a projection image sequence;
S2, respectively extracting the behavior feature vector of the track sequence and the projection image feature vector of each projection image sequence;
S3, performing cluster analysis on the feature vectors, extracted in step S2, that are associated with different aircraft, to obtain the behavior intention category of each aircraft;
S4, training a behavior intention multi-classification model with the feature vectors associated with each aircraft extracted in step S2 and the behavior intention category of that aircraft obtained in step S3 as model training samples;
and S5, inputting acquired flight trajectory data associated with the same aircraft into the behavior intention multi-classification model, which outputs the behavior intention recognition result of the aircraft.
As a preferred aspect of the present invention, the data dimensions of each track point in the track sequence include the flight time, flight altitude, flight speed, and the longitude and latitude of the track point.
In a preferred embodiment of the present invention, in step S2, the trajectory behavior feature corresponding to the i-th track point in the track sequence, taken as the object of behavior feature extraction, is written as b_i; b_i includes the flight position variation Δd_i of the aircraft at the i-th track point, the flight speed v_i, the flight angle θ_i, the variation Δv_i in flight speed and the variation Δθ_i in flight angle, i.e. b_i = (Δd_i, v_i, θ_i, Δv_i, Δθ_i).
As a preferred embodiment of the present invention, Δd_i, v_i, θ_i, Δv_i and Δθ_i are calculated by the following equations (1) to (5), respectively:
Δd_i = sqrt((lat_i - lat_{i-1})² + (lon_i - lon_{i-1})² + (h_i - h_{i-1})²)   (1)
v_i = Δd_i / (t_i - t_{i-1})   (2)
θ_i = arctan((lat_i - lat_{i-1}) / (lon_i - lon_{i-1}))   (3)
Δv_i = v_i - v_{i-1}   (4)
Δθ_i = θ_i - θ_{i-1}   (5)
In equations (1) to (5), lat_i and lon_i respectively represent the latitude and longitude of the i-th track point; lat_{i-1} and lon_{i-1} respectively represent the latitude and longitude of the (i-1)-th track point, which precedes the i-th track point in the track sequence; h_i and h_{i-1} respectively represent the altitudes of the i-th and (i-1)-th track points; t_i and t_{i-1} respectively represent the times at which the aircraft flies to the i-th and (i-1)-th track points.
As a preferred aspect of the present invention, the aircraft is projected at each track point in two ways, parallel to the ground and perpendicular to the ground, giving two projection images respectively.
As a preferred embodiment of the present invention, in step S2, the pre-trained auto-encoder model based on LSTM is used to extract the behavior features of the track sequence, and the method for extracting the behavior features of the track sequence based on the auto-encoder model based on LSTM includes:
the LSTM-based auto-encoder model comprises an LSTM encoder and an LSTM decoder; the behavior feature sequence of the track sequence is B_TR = (b_1, b_2, …, b_i, …, b_T), where b_1, b_2, …, b_i, …, b_T are the behavior features of the track points, i = 1, 2, …, T, and T denotes the number of track points in the track sequence; the input of the LSTM encoder is the sequence B_TR, and the LSTM encoder reads the input sequence in order and updates the hidden layer state h_t accordingly, the update being:
h_t = f_LSTM(h_{t-1}, b_t), where f_LSTM is the activation function and b_t is the element of the input sequence currently read by the LSTM encoder;
after the last track point b_T has been processed, the hidden layer state h_T serves as a low-dimensional implicit representation of the track sequence B_TR;
the LSTM decoder first starts with h T Generating c as an initialized implicit State 1 And then further generates (c) 2 ,c 3 ,…,c T ) The updating mode of the LSTM decoder is as follows:
Figure BDA0003471495040000034
wherein f is LSTM Is an activation function;
the goal of the LSTM decoder is to reconstruct the input sequence B_TR; the LSTM encoder and the LSTM decoder are trained by minimizing the reconstruction error between (b_1, b_2, …, b_i, …, b_T) and (c_1, c_2, …, c_i, …, c_T); the loss function of the LSTM-based auto-encoder model is the mean square error, calculated as:
L(B_TR, C) = (1/T) Σ_{i=1}^{T} ||b_i - c_i||²
as a preferred embodiment of the present invention, in step S2, the pre-trained auto-encoder model based on CNN is used to extract the behavior features of the projection image sequence, and the method for extracting the behavior features of the projection image sequence based on the auto-encoder model based on CNN comprises:
the CNN-based auto-encoder model includes a CNN encoder and a CNN decoder; with the projection image sequence denoted I, the CNN encoder aims to convert the input vector I into a potential representation Z_2, and the CNN decoder aims to reconstruct the potential representation Z_2 into I′,
the CNN encoder comprises 3 conv layers, 1 reshape layer and 1 FC layer which are sequentially connected, a LeakyRelu activation function is adopted behind each conv layer, the conv layers in the CNN encoder are used for extracting image features of each element in an input vector I, the reshape layers in the CNN encoder are used for changing the size of a feature map output by the conv layers, and the FC layers in the CNN encoder are used for reducing the dimension of input data;
the CNN decoder comprises 3 deconv layers, 1 reshape layer and 1 FC layer, a LeakyRelu activation function is adopted after each deconv layer, and the FC layer in the CNN decoder is used for performing dimensionality raising on output data of the CNN encoder; the reshape layer in the CNN decoder is used for changing the size of the feature map after being subjected to dimension upgrading by the FC layer; the 3 deconv layers in the CNN decoder are used for reconstructing the feature map output by the reshape layer into I';
the loss function of the auto-encoder model based on the CNN adopts the mean square error, and the calculation formula is as follows:
L(I, I′) = |I - I′|², where L(I, I′) denotes the loss function.
As a preferred aspect of the present invention, in step S3, the feature vectors associated with the different aircraft extracted in step S2 are cluster-analyzed using the DBSCAN density clustering algorithm.
As a preferred embodiment of the present invention, in step S4, the method for training the behavior intention multi-classification model includes:
L1, the behavior feature of the trajectory sequence extracted in step S2 is denoted Z_1(p); Z_1(p) is converted to z_1(p)′ by min-max normalization, with the transfer function:
z′ = (z - min) / (max - min)
where max denotes the maximum value of the sample data and min denotes the minimum value of the sample data.
Let K denote the side length of the target matrix, determined from the length p of z_1(p)′; z′ is converted to obtain I_1 ∈ R^{K×K}, whose m-th row (m = 0, 1, …, K - 1) is calculated as:
I_1(m) = A + B
where A denotes one-dimensional data consisting of m zero elements, B = [z′_{1,m}, z′_{1,m+1}, …, z′_{1,K-1}], "+" denotes splicing, and p denotes the length of z_1(p);
L2, I_1 ∈ R^{K×K} and the two projection images I_2 ∈ R^{K×K} associated with the same aircraft are spliced to obtain I ∈ R^{K×K×3};
L3, the image I is data-enhanced using any one or more of image flipping, rotation, scaling, cropping and translation to expand the model training samples, and the model training samples are divided into a training data set, a test data set and a validation data set in proportion;
l4, reading the training data set and the testing data set into a Darknet-53 network, using the class obtained by clustering in the step S3 as a label and using a cross entropy loss function to train the network to finally form the behavior intention multi-classification model.
The method uses deep learning and, based on two modalities of data (the trajectory sequence of the aircraft's flight behavior and the projection images of the trajectory at different angles), identifies the flight behavior intentions of moving targets such as aircraft from a new perspective. In addition, a behavior intention recognition model is trained separately with the behavior categories obtained by clustering as labels, so that the behavior intentions of subsequently added aircraft flight trajectories can be recognized directly.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the embodiments of the present invention will be briefly described below. It is obvious that the drawings described below are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
FIG. 1 is a flowchart of an implementation of a method for recognizing an aircraft behavior intention based on multi-modal deep learning according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method for auto-encoder based extraction of behavioral characteristics associated with a sequence of trajectories and a sequence of projection images of the same aircraft;
FIG. 3 is an internal structural view of an LSTM-based auto-encoder model employed in the present embodiment;
FIG. 4 is an internal structural diagram of the auto-encoder model based on CNN employed in the present embodiment;
FIG. 5 is a clustering diagram of DBSCAN algorithm;
FIG. 6 is a schematic diagram of cluster analysis process of DBSCAN algorithm implemented by codes;
FIG. 7 is a logic block diagram of a training behavioral intent multi-classification model;
FIG. 8 is a structural diagram of a Residual Block.
Detailed Description
The technical scheme of the invention is further explained by the specific implementation mode in combination with the attached drawings.
The drawings are for illustrative purposes only and show schematic rather than actual forms; they are not to be construed as limiting this patent. To better illustrate the embodiments of the present invention, some parts of the drawings may be omitted, enlarged or reduced, and they do not represent the size of an actual product. It will be understood by those skilled in the art that certain well-known structures in the drawings, and descriptions thereof, may be omitted.
The same or similar reference numerals in the drawings of the embodiments of the present invention correspond to the same or similar components; in the description of the present invention, it should be understood that if the terms "upper", "lower", "left", "right", "inner", "outer", etc. are used for indicating the orientation or positional relationship based on the orientation or positional relationship shown in the drawings, it is only for convenience of description and simplification of description, but it is not indicated or implied that the referred device or element must have a specific orientation, be constructed in a specific orientation and be operated, and therefore, the terms describing the positional relationship in the drawings are only used for illustrative purposes and are not to be construed as limitations of the present patent, and the specific meanings of the terms may be understood by those skilled in the art according to specific situations.
In the description of the present invention, unless otherwise explicitly specified or limited, the term "connected" or the like, if appearing to indicate a connection relationship between the components, is to be understood broadly, for example, as being fixed or detachable or integral; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be connected through any combination of two or more members or structures. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
The aircraft behavior intention recognition method based on multi-modal deep learning provided by the embodiment of the invention, as shown in FIG. 1, comprises 5 steps: data preparation, data preprocessing, auto-encoder-based feature representation, DBSCAN-based behavior intention clustering, and Darknet-53-based behavior intention multi-classification. The specific contents are as follows:
step 1, data preparation
The flight trajectory data of aircraft are crawled from open-source websites related to aircraft trajectories using web crawler technology, yielding a trajectory sequence tra = (x_0, …, x_i, x_{i+1}, …) consisting of a series of multi-dimensional data points arranged in flight time order, where x_i represents a point in the trajectory with dimensions such as altitude and time. The details are shown in Table a below:
Table a. Dimensions of each track point x_i: flight time, longitude, latitude, flight speed, flight altitude.
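By way of illustration only, a track point and a trajectory sequence of step 1 could be represented as follows in Python; the field names and types are assumptions chosen to match the dimensions listed in Table a, not identifiers fixed by the patent.

```python
# Illustrative sketch only: field names are assumptions based on the
# dimensions listed in Table a (time, longitude, latitude, speed, altitude).
from dataclasses import dataclass
from typing import List

@dataclass
class TrackPoint:
    t: float        # flight time (e.g. a timestamp in seconds)
    lon: float      # longitude in degrees
    lat: float      # latitude in degrees
    speed: float    # flight speed
    height: float   # flight altitude

# A trajectory sequence tra = (x_0, ..., x_i, x_{i+1}, ...) ordered by flight time
Trajectory = List[TrackPoint]

def sort_by_time(points: Trajectory) -> Trajectory:
    """Order raw crawled track points into a trajectory sequence by flight time."""
    return sorted(points, key=lambda p: p.t)
```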
Step 2: data preprocessing.
2.1 extracting trajectory behavior features
The behavior features of the 2nd, 3rd, … track points of the track sequence tra = (x_0, …, x_i, x_{i+1}, …) obtained in step 1 are extracted in the following specific way:
The variations of position, speed and angle between two adjacent track points are calculated respectively; the calculation formulas are:
Δd_i = sqrt((lat_i - lat_{i-1})² + (lon_i - lon_{i-1})² + (h_i - h_{i-1})²)   (1)
v_i = Δd_i / (t_i - t_{i-1})   (2)
θ_i = arctan((lat_i - lat_{i-1}) / (lon_i - lon_{i-1}))   (3)
Δv_i = v_i - v_{i-1}   (4)
Δθ_i = θ_i - θ_{i-1}   (5)
In formulas (1) to (5), lat_i and lon_i respectively represent the latitude and longitude of the i-th track point; lat_{i-1} and lon_{i-1} respectively represent the latitude and longitude of the (i-1)-th track point, which precedes the i-th track point in the track sequence; h_i and h_{i-1} respectively represent the altitudes of the i-th and (i-1)-th track points; t_i and t_{i-1} respectively represent the times at which the aircraft flies to the i-th and (i-1)-th track points.
The trajectory behavior feature corresponding to the i-th track point in the extracted track sequence is written as b_i; b_i includes the flight position variation Δd_i of the aircraft at the i-th track point, the flight speed v_i, the flight angle θ_i, the variation Δv_i in flight speed and the variation Δθ_i in flight angle, i.e. b_i = (Δd_i, v_i, θ_i, Δv_i, Δθ_i).
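A minimal Python sketch of this per-point feature extraction is given below; the variable names and the handling of the first point (which has no predecessor) are illustrative assumptions rather than details fixed by the embodiment.

```python
import math
from typing import List, Tuple

def behavior_features(lats: List[float], lons: List[float],
                      hs: List[float], ts: List[float]) -> List[Tuple[float, ...]]:
    """Sketch of per-point behavior features b_i = (dd_i, v_i, theta_i, dv_i, dtheta_i).

    Follows the forms of equations (1)-(5) above; boundary handling for the
    first pair of points is an assumption.
    """
    feats = []
    prev_v, prev_theta = None, None
    for i in range(1, len(lats)):
        dlat, dlon, dh = lats[i] - lats[i-1], lons[i] - lons[i-1], hs[i] - hs[i-1]
        dt = ts[i] - ts[i-1]
        dd = math.sqrt(dlat**2 + dlon**2 + dh**2)          # position variation, eq. (1)
        v = dd / dt if dt > 0 else 0.0                     # flight speed, eq. (2)
        theta = math.atan2(dlat, dlon)                     # flight angle, eq. (3)
        dv = v - prev_v if prev_v is not None else 0.0     # speed variation, eq. (4)
        dtheta = theta - prev_theta if prev_theta is not None else 0.0  # eq. (5)
        feats.append((dd, v, theta, dv, dtheta))
        prev_v, prev_theta = v, theta
    return feats
```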
2.2 acquiring a sequence of trajectory projection images
The trajectory sequence tra = (x_0, …, x_i, x_{i+1}, …) is projected onto RN planes to obtain projection image sequences of the aircraft trajectory. In the invention, two projection image sequences related to the same aircraft trajectory are obtained by projecting parallel to the ground and perpendicular to the ground, respectively, i.e. RN = 2.
It should be emphasized here that a trajectory is a directed polyline that connects the trajectory points of the same aircraft in order of flight time. The projection images obtained by projecting the aircraft trajectory at different angles are grayscale images: the projected trajectory line is brightest white, and the rest of the image is darkest black. In a grayscale image, each pixel has a single sample value that is displayed on a gray scale from darkest black to brightest white.
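As a purely illustrative sketch, the ground projection of a trajectory could be rasterized into such a grayscale image as follows; the image size, line thickness and the use of OpenCV are implementation assumptions not specified by the invention.

```python
import numpy as np
import cv2  # OpenCV; using it here is an implementation choice, not mandated by the patent

def project_to_ground_image(lons, lats, size=256):
    """Rasterize the trajectory's projection onto the ground plane as a grayscale
    image: brightest white trajectory polyline on a darkest black background
    (a sketch; the image size and line thickness are assumed values)."""
    img = np.zeros((size, size), dtype=np.uint8)           # all black
    lons, lats = np.asarray(lons, float), np.asarray(lats, float)
    # normalize coordinates into pixel space
    x = (lons - lons.min()) / max(lons.ptp(), 1e-9) * (size - 1)
    y = (lats - lats.min()) / max(lats.ptp(), 1e-9) * (size - 1)
    pts = np.stack([x, y], axis=1).astype(np.int32).reshape(-1, 1, 2)
    cv2.polylines(img, [pts], isClosed=False, color=255, thickness=1)  # white polyline
    return img
```

The perpendicular (side-view) projection would be obtained analogously, e.g. from longitude and altitude instead of longitude and latitude.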
Step 3: auto-encoder-based feature representation, which automatically extracts fixed-length feature vector representations.
The auto-encoder-based method for extracting the behavior features of the trajectory sequence and of each projection image sequence associated with the same aircraft is shown in FIG. 2. First, an LSTM-based auto-encoder model (LSTM: Long Short-Term Memory network, an improved Recurrent Neural Network (RNN)) is trained to extract a fixed-length one-dimensional behavior feature vector, denoted Z_1 ∈ R^{1×p}, where p is the length of Z_1. Then a CNN-based auto-encoder model is trained to extract a fixed-length one-dimensional projection image feature vector, denoted Z_2 ∈ R^{1×q}, where q is the length of Z_2. Finally the two feature vectors are spliced to obtain the one-dimensional input vector Z ∈ R^{1×(p+q)} of the next step, as shown in FIG. 2. In FIG. 2, the auto-encoder is a self-encoding network that trains a model using the input data as the output labels; the number of neurons in the middle layer is usually smaller than the number of features, and the output of the middle-layer neurons is used as the feature, thereby achieving feature-space transformation and dimensionality reduction.
3.1 auto-encoder model based on LSTM
The internal structure of the LSTM-based auto-encoder model is shown in FIG. 3; it includes an LSTM encoder (denoted LSTM Encoder in FIG. 3) and an LSTM decoder (denoted LSTM Decoder in FIG. 3). For a given behavior feature sequence B_TR = (b_1, b_2, …, b_i, …, b_T), where b_1, b_2, …, b_T are the behavior features of the track points in the track sequence and T denotes the number of track points, the input of the LSTM encoder is the sequence B_TR. The LSTM encoder reads the input sequence in order and updates the hidden layer state h_t accordingly; as shown in FIG. 3, the update is:
h_t = f_LSTM(h_{t-1}, b_t), where f_LSTM is the activation function and b_t is the element of the input sequence currently read by the LSTM encoder;
after the last track point b_T has been processed, the hidden layer state h_T serves as a low-dimensional implicit representation of the track sequence B_TR;
the LSTM decoder first starts with h T Generating c as an initialized implicit State 1 Then further generate (c) 2 ,c 3 ,…,c T ) As shown in fig. 3, the LSTM decoder updates in the following manner:
Figure BDA0003471495040000081
wherein f is LSTM Is an activation function;
the goal of the LSTM decoder is to reconstruct the input sequence B TR The LSTM decoder takes the output of the LSTM encoder as input and the input of the LSTM encoder as its learning target, and is a self-supervised learning method without additional tag data. LSTM encoder and LSTM decoder pass minimization (b) 1 ,b 2 ,…,b i ,…,b T ) And (c) 1 ,c 2 ,…,c i ,…,c T ) Training the trajectory sequence by the reconstruction error; the loss function of the auto-encoder model based on the LSTM adopts the mean square error, and the calculation formula is as follows:
Figure BDA0003471495040000082
since the entire input sequence can be reconstructed by an LSTM decoder, a fixed-length vector Z is set in this model 1 ∈R 1×p Behavior characteristic sequence B capable of well representing input TR =(b 1 ,b 2 ,…,b T )。
3.2 auto-encoder model based on CNN
A CNN (Convolutional Neural Network) is a feedforward neural network that involves convolution computations and has a deep structure; it is commonly used for image-related tasks such as image classification and image detection.
As shown in FIG. 4, the CNN-based auto-encoder model takes each projection image sequence I as input, and the output layer has the same size as the input layer; the aim is to reconstruct the input itself. The CNN encoder acts as a compression filter that converts the input vector I into a smaller potential representation Z_2, and the CNN decoder then attempts to reconstruct it into I′.
As shown in FIG. 4, the CNN-based auto-encoder model includes a CNN encoder (denoted CNN Encoder in FIG. 4) and a CNN decoder (denoted CNN Decoder in FIG. 4). The CNN encoder includes 3 conv layers (convolutional layers), 1 reshape layer and 1 FC layer (fully connected layer) connected in sequence, with a LeakyReLU activation function after each conv layer.
the conv layers in the CNN encoder are used to extract the image features of each element in the input vector I, and the parameters of each conv layer relate to the convolution kernel size, the number of convolution kernels and the step size. Taking the first conv layer in the CNN encoder as an example, conv specifically operates as follows,
let us note the I size K of the first layer of input data, the KS size of the convolution kernel s ×K s If the number of convolution kernels is num and the step length is s, the size of the output data O of the first conv layer is
Figure BDA0003471495040000091
The last dimension of O is the channel c; O_c is the data of O in the c-th channel, so that
O_c = W_c I
where W_c is a parameter to be learned in the model.
Let O_c(i, j) denote the value of the data O at location (i, j) on the c-th channel, w_c(i, j) denote the value of the data W_c at location (i, j), and I(i, j) denote the value of the data I at location (i, j); then O_c(i, j) is calculated as
O_c(i, j) = Σ_{m=0}^{K_s-1} Σ_{n=0}^{K_s-1} w_c(m, n) · I(i·s + m, j·s + n)
after each convolutional layer in the CNN encoder, a LeakyReLU activation function is used, and the expression is
Figure BDA0003471495040000093
Wherein a ∈ (1, + ∞) fixed parameter
The reshape layer in the CNN encoder is used to change the size of the feature map output by the conv layer.
The FC layer in the CNN encoder is used to reduce the dimensionality of the input data, reducing the data X input to the FC layer from M dimensions to M′ dimensions (M′ < M) to obtain Y, expressed as
Y = W X^T
where X ∈ R^{1×M}, Y ∈ R^{M′×1}, W ∈ R^{M′×M}, and here W is also a parameter to be learned in the model.
The CNN decoder includes 3 deconv layers (deconvolution layers), 1 reshape layer and 1 FC layer, with a LeakyReLU activation function after each deconv layer.
the FC layer in the CNN decoder is used to upscale the output data of the CNN encoder, increasing the input data X from M 'dimension to M dimension (M' < M) to get Y, the expression is,
Y=W d X T
wherein X ∈ R 1×M′ ,Y∈R M×1 ,W d ∈R M×M′ And here W d And is also a hyper-parameter to be learned by the CNN decoder.
The reshape layer in the CNN decoder is used to change the size of the feature map after the dimension-raising by the FC layer. This layer converts the input data into data of dimension M × N × C: for input data X ∈ R^{1×M} and output data Y ∈ R^{M×N×C}, the value Y(m, n, c) at position (m, n, c) in the data Y is calculated as
Y(m, n, c) = X(0, i), where i = mn + c
The 3 deconv layers in the CNN decoder are used to reconstruct the feature map output by the reshape layer into I′, and the parameters of each deconv layer likewise concern the convolution kernel size, the number of convolution kernels and the stride. Taking the last deconv layer of the CNN decoder as an example, the deconv operation is as follows.
Let the input data I have size K × K × C and the convolution kernel size be K_s × K_s; if the number of convolution kernels is 1 and the stride is s, the output data O has size
((K - 1)·s + K_s) × ((K - 1)·s + K_s) × 1
Then
O = W_d I
where W_d is a parameter to be learned in the CNN decoder model.
Let O(i, j) denote the value of the output data O at position (i, j), w(m, n, c) denote the value of the coefficient matrix W_d at position (m, n, c), and I(i, j, c) denote the value of the data I at position (i, j, c); O(i, j) is then obtained by summing w(m, n, c) · I(·, ·, c) over the kernel window positions m = 0, 1, …, K_s - 1 and n = 0, 1, …, K_s - 1 and over the C input channels.
The loss function of the auto-encoder model based on the CNN adopts the mean square error, and the calculation formula is as follows:
L(I, I′) = |I - I′|², where L(I, I′) denotes the loss function.
The output Z_2 ∈ R^{1×q} of the CNN encoder is the input of the CNN decoder, and it is also the length-q feature vector representation of the projection image to be acquired in this step.
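A minimal PyTorch sketch of this CNN auto-encoder (3 conv layers + reshape + FC in the encoder, FC + reshape + 3 deconv layers in the decoder, LeakyReLU activations, MSE loss) is given below; the kernel sizes, strides, channel counts, the input size K = 64 and the latent length q = 128 are assumed values not fixed by the text.

```python
import torch
import torch.nn as nn

class CNNAutoEncoder(nn.Module):
    """Sketch of section 3.2: 3 conv + reshape + FC encoder producing Z_2,
    FC + reshape + 3 deconv decoder reconstructing I'. Shapes are assumptions
    (K = 64, q = 128); the embodiment leaves these hyper-parameters open."""
    def __init__(self, q=128):
        super().__init__()
        act = nn.LeakyReLU(0.1)
        self.enc_conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), act,   # 64 -> 32
            nn.Conv2d(16, 32, 3, stride=2, padding=1), act,  # 32 -> 16
            nn.Conv2d(32, 64, 3, stride=2, padding=1), act,  # 16 -> 8
        )
        self.enc_fc = nn.Linear(64 * 8 * 8, q)                # dimension reduction
        self.dec_fc = nn.Linear(q, 64 * 8 * 8)                # dimension raising
        self.dec_deconv = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), act,  # 8 -> 16
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), act,  # 16 -> 32
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),        # 32 -> 64
        )

    def forward(self, img):                        # img: (batch, 1, 64, 64)
        f = self.enc_conv(img)
        z2 = self.enc_fc(f.flatten(1))             # latent representation Z_2
        g = self.dec_fc(z2).view(-1, 64, 8, 8)     # reshape back to a feature map
        return z2, self.dec_deconv(g)              # reconstruction I'

model = CNNAutoEncoder()
img = torch.rand(4, 1, 64, 64)
z2, recon = model(img)
loss = nn.MSELoss()(recon, img)                    # L(I, I') = |I - I'|^2
loss.backward()
```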
3.3 splicing
The feature representations Z_1(p) and Z_2(q) obtained in 3.1 and 3.2 are spliced to obtain Z(p + q), where
Z(p + q) = Z_1(p) + Z_2(q) = [z_{1,0}, z_{1,1}, …, z_{1,p-1}] + [z_{2,0}, z_{2,1}, …, z_{2,q-1}] = [z_{1,0}, z_{1,1}, …, z_{1,p-1}, z_{2,0}, z_{2,1}, …, z_{2,q-1}]
Step 4, clustering behavioral intention based on DBSCAN
DBSCAN-based behavior intention clustering takes the feature representation Z(p + q) obtained in step 3 as the algorithm input and applies the DBSCAN algorithm to obtain the behavior intention categories. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based spatial clustering algorithm. The algorithm characterizes how closely the samples are distributed using a pair of neighborhood parameters (ε, MinPts). Given a dataset D = {x_1, x_2, …, x_m}, the following concepts are defined and illustrated:
(a) ε-neighborhood: for x_j ∈ D, its ε-neighborhood contains the samples in D whose distance from x_j is not greater than ε, i.e. N_ε(x_j) = {x_i ∈ D | dist(x_i, x_j) ≤ ε}, as shown by the dashed circle in FIG. 5;
(b) core object: if the ε-neighborhood of x_j contains at least MinPts samples, i.e. |N_ε(x_j)| ≥ MinPts, then x_j is a core object, such as x_1 in FIG. 5;
(c) directly density-reachable: if x_j lies in the ε-neighborhood of x_i and x_i is a core object, then x_j is said to be directly density-reachable from x_i, e.g. x_2 is directly density-reachable from x_1 in FIG. 5;
(d) density-reachable: for x_i and x_j, if there exists a sample sequence p_1, p_2, …, p_n with p_1 = x_i, p_n = x_j and each p_{i+1} directly density-reachable from p_i, then x_j is said to be density-reachable from x_i, e.g. x_3 is density-reachable from x_1 in FIG. 5;
(e) density-connected: for x_i and x_j, if there exists x_k such that x_i and x_j are both density-reachable from x_k, then x_i and x_j are said to be density-connected, e.g. x_3 and x_4 are density-connected in FIG. 5.
FIG. 6 is a schematic diagram of the cluster analysis process of the DBSCAN algorithm implemented in code. As shown in FIG. 6, the two core steps of the DBSCAN algorithm used in the invention to cluster the feature vectors of the aircraft are:
(1) find all core objects according to the given neighborhood parameters (ε, MinPts), as shown in lines 1-7 of FIG. 6;
(2) starting from any core object, find the cluster generated by the samples density-reachable from it, until all core objects have been visited, as shown in lines 10-24 of FIG. 6.
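For illustration, this clustering step could be carried out with the DBSCAN implementation in scikit-learn as sketched below; the neighborhood parameter values eps and min_samples are assumptions and would have to be tuned to the actual feature data.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Z: one row per aircraft, each row the spliced feature Z(p+q) from step 3.
# eps and min_samples correspond to the neighborhood parameters (epsilon, MinPts);
# the concrete values below are illustrative assumptions, not values from the patent.
Z = np.random.rand(100, 192)                 # toy stand-in for 100 aircraft features
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(Z)
# labels[i] is the behavior-intention cluster of aircraft i; -1 marks noise samples
print(sorted(set(labels)))
```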
Step 5: behavior intention multi-classification model based on Darknet-53
The behavior feature representations Z_1 ∈ R^{1×p} of the track sequences of the different aircraft obtained in 3.1 of step 3, together with the projection images I_2 ∈ R^{K×K} obtained in 2.2 of step 2, are processed into I ∈ R^{K×K×3} and used as the input of the multi-classification model; the corresponding cluster categories obtained in step 4 are used as the output labels of the multi-classification model, and a Darknet-53-based multi-classification model for recognizing the behavior intention of the aircraft is trained.
Two steps 5.1, 5.2 of training the behavioral intent multi-classification model are specifically set forth below in conjunction with fig. 7:
5.1 data processing
(1) The behavior feature representation Z_1 (1 × p) is converted to Z_1′ (1 × p) using min-max normalization, with the transfer function
z′ = (z - min) / (max - min)
where max is the maximum value of the sample data and min is the minimum value of the sample data.
(2) Let K denote the side length of the target matrix, determined from the length p of z_1′; z_1′ is converted to obtain I_1 ∈ R^{K×K}, whose m-th row (m = 0, 1, …, K - 1) is calculated as
I_1(m) = A + B
where "+" is consistent with the splicing operation of 3.3 in step 3, A is one-dimensional data consisting of m zero elements, e.g. [0, 0, …, 0], and B = [z′_{1,m}, z′_{1,m+1}, …, z′_{1,K-1}].
(3) The I_1 ∈ R^{K×K} obtained in (2) is spliced with the two projection images I_2 ∈ R^{K×K} to obtain I ∈ R^{K×K×3}, which can be viewed approximately as a three-channel image.
(4) The training data are expanded by applying one or more image data enhancement techniques to the image, such as flipping, rotation, scaling, cropping and translation.
(5) The data set is divided in the ratio 6:2:2 to obtain the training, test and validation data sets.
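A small NumPy sketch of this data processing (min-max normalization, conversion of z_1′ into the K × K matrix I_1, and stacking with the two projection images into a three-channel input) is given below; the rule that fixes K is not reproduced here, so K is passed in as an assumed parameter.

```python
import numpy as np

def to_square_matrix(z1, K):
    """Sketch of step 5.1(2): row m of I_1 is m zeros followed by z1'[m:K]
    (as described in the text; K itself is passed in here as an assumed parameter)."""
    z = (z1 - z1.min()) / (z1.max() - z1.min() + 1e-12)    # min-max normalization, 5.1(1)
    I1 = np.zeros((K, K), dtype=np.float32)
    for m in range(K):
        I1[m, m:K] = z[m:K]                                 # [0]*m spliced with [z'_m ... z'_{K-1}]
    return I1

def make_model_input(z1, proj_a, proj_b, K):
    """Stack I_1 with the two K x K projection images into I in R^{K x K x 3} (5.1(3))."""
    I1 = to_square_matrix(np.asarray(z1, np.float32), K)
    return np.stack([I1, proj_a, proj_b], axis=-1)          # approximately a 3-channel image

# toy usage
K = 64
I = make_model_input(np.random.rand(K), np.random.rand(K, K), np.random.rand(K, K), K)
print(I.shape)   # (64, 64, 3)
```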
5.2 Darknet-53-based Multi-Classification model
The training set data and the validation set data obtained in step 5.1 are read into a Darknet-53 network, and the network is trained using the categories obtained by clustering in step 4 as labels and a cross entropy loss function. For input data I ∈ R^{K×K×3}, the parameters and output data of each layer of the Darknet-53 network are shown in Table b below, where
stride is the step length, kernel is the filter dimension, channel is the number of filters, and class_num is the total number of classes of the classification model;
1×, 2×, 8× and 4× in the Number column indicate that the module is repeated 1, 2, 8 and 4 times, respectively;
conv layers denote convolutional layers; Residual Block denotes that the block is a residual layer;
global avgpool denotes global average pooling;
the fully connected layer (FC layer for short) denotes the fully connected layer;
the third dimension of the data denotes the channel, e.g. "32" in K × K × 32 is the number of channels; Class_N is the total number of cluster categories obtained in step 4;
the output layer uses the softmax activation function, and the convolutional layers other than the output layer use LeakyReLU.
(1) conv layer
In Table b, the input of the first conv layer is the data I ∈ R^{K×K×3} obtained in 5.1(3); here the convolution kernel size is 3 × 3, the stride is 1, and the output is of size K × K × 32.
For each output channel, the calculation formula of Y_c is as follows:
Y_c = W_c X_0
where X_0 is the input data of dimension K × K × 3; W_c is the coefficient matrix on the c-th channel, of dimension K × K × 3; Y_c is the data of dimension K × K on the c-th channel.
Let Y_c(i, j) denote the data in row i and column j of the array Y_c, x_0(m, n, l) denote the data at position (m, n, l) in the array X_0, and w_c(m, n, l) denote the data at position (m, n, l) in the W array on the c-th channel; Y_c(i, j) is then obtained by summing w_c(m, n, l) · x_0(m, n, l) over the 3 × 3 convolution window around position (i, j) and over the input channels l.
(2) Activation function
Given a set of inputs in a neural network, the activation function of a neuron defines its outputs. The Darknet-53 network used in this patent employs the softmax activation function at the output layer, where z denotes the vector of output layer inputs. The expression of the softmax activation function is
softmax(z)_i = e^{z_i} / Σ_j e^{z_j}
The convolutional layers other than the output layer are activated using LeakyReLU:
f(x) = x,   x ≥ 0
f(x) = x / a,   x < 0
where a ∈ (1, +∞) is a fixed parameter.
Table b. Parameters (kernel size, number of filters, stride) and output data sizes of each layer of the Darknet-53 network.
(3) Residual Block
The input and output of the Residual Block have the same size and number of channels. The structure of the Residual Block, derived from the residual network ResNet, is shown in FIG. 8 and includes two branches: an identity mapping and a residual branch. The solid circles with a plus sign in the figure represent skip connections, whose corresponding formula is defined as follows:
x_{t+1} = F_t(x_t) + x_t
where x_t and x_{t+1} are the input and output vectors of the t-th residual block, respectively, and F_t(x_t) denotes the transfer function, corresponding to the branch stacked from convolution and LeakyReLU layers. A deep residual network composed in this way allows information to flow easily and is easy to train.
(4) FC layer
The FC layer acts as a classifier in the Darknet-53 network and maps the learned distributed feature representation to the sample category space. The core operation of the fully connected layer is the matrix-vector product, expressed as:
Y = W X^T
where W denotes a Class_N × 1024 coefficient matrix, X denotes the 1 × 1024 matrix output by the global avgpool layer, Y denotes a Class_N × 1 matrix, and X^T denotes the transpose of X.
(5) global avgpool layer
global avgpool is also called the global average pooling layer (GAP for short). The input of the GAP layer is the output of the previous layer, namely (K/32) × (K/32) × 1024; the kernel is (K/32) × (K/32) with stride 1, so the average value over each (K/32) × (K/32) region is taken as the output, giving an output of size 1 × 1024. GAP not only avoids the overfitting risk brought by full connection but also reduces the number of parameters.
(6) Cross entropy loss function
The expression of the cross entropy loss function is as follows:
L = -(1/Sample_N) Σ_{i=1}^{Sample_N} Σ_{c=1}^{Class_N} y_ic · log(p_ic)
where Class_N denotes the number of categories of behavior intentions of the aircraft;
Sample_N denotes the total number of samples participating in training;
y_ic denotes an indicator function (taking the value 0 or 1): it is 1 if the true category of sample i equals c, and 0 otherwise;
p_ic denotes the predicted probability that the observed sample i belongs to category c.
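The training step itself can be sketched as follows in PyTorch; the Darknet-53 backbone is replaced by a tiny stand-in module purely to keep the example self-contained and runnable, and class_N, K, the optimizer and the learning rate are assumed values.

```python
import torch
import torch.nn as nn

# Sketch of step 5.2: train a classifier on I (K x K x 3) with the cluster labels from
# step 4 and a cross-entropy loss. Darknet-53 itself is not reproduced here; the tiny
# stand-in below only illustrates the training loop, and class_N / K are assumed values.
class StandInBackbone(nn.Module):
    def __init__(self, class_n):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=1, padding=1), nn.LeakyReLU(0.1),
            nn.AdaptiveAvgPool2d(1),                     # global average pooling (GAP)
        )
        self.fc = nn.Linear(32, class_n)                 # FC layer acting as the classifier

    def forward(self, x):
        return self.fc(self.features(x).flatten(1))     # logits; softmax is applied inside the loss

class_N, K = 5, 64
model = StandInBackbone(class_N)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()                          # combines log-softmax and NLL

images = torch.rand(16, 3, K, K)                         # batch of stitched inputs I
labels = torch.randint(0, class_N, (16,))                # cluster categories from step 4
opt.zero_grad()
loss = loss_fn(model(images), labels)
loss.backward()
opt.step()
```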
In conclusion, the method makes full use of the trajectory data of each modality of the aircraft; it can not only mine the behavior intention categories of aircraft but also identify the behavior intention of a new trajectory segment. The input of the DBSCAN-based behavior category clustering adopted by the invention includes not only text data (with time, longitude, latitude, speed and altitude dimensions) but also images of the aircraft trajectory projected onto the ground. To judge the flight behavior intention of a new trajectory segment, the invention further adds a CNN-based multi-classification deep learning model and uses the behavior cluster categories as the label data of the classification algorithm, thereby saving manual labeling cost.
It should be understood that the above-described embodiments are merely preferred embodiments of the invention and the technical principles applied thereto. It will be understood by those skilled in the art that various modifications, equivalents, changes, and the like can be made to the present invention. However, such variations are within the scope of the invention as long as they do not depart from the spirit of the invention. In addition, certain terms used in the specification and claims of the present application are not limiting, but are used merely for convenience of description.

Claims (9)

1. An aircraft behavior intention recognition method based on multi-modal deep learning, characterized by comprising the following steps:
S1, acquiring flight trajectory data of aircraft, the flight trajectory data comprising flight data of each track point and projection image data of the aircraft projected onto the ground at different angles at each track point; ordering the flight data of the track points into a track sequence according to the flight time order, and ordering the projection images projected onto the ground at the same angle at the track points of an aircraft into a projection image sequence;
S2, respectively extracting the behavior feature vector of the track sequence and the projection image feature vector of each projection image sequence;
S3, performing cluster analysis on the feature vectors, extracted in step S2, that are associated with different aircraft, to obtain the behavior intention category of each aircraft;
S4, converting the behavior feature vector associated with each aircraft extracted in step S2 into a matrix I_1 of dimension K × K, converting each projection image feature vector extracted in step S2 into the corresponding projection image I_2, splicing I_1 and each I_2 into an image I_splice, and finally training a behavior intention multi-classification model with the image I_splice of each associated aircraft and the behavior intention category of that aircraft obtained in step S3 as model training samples;
S5, converting acquired flight trajectory data associated with the same aircraft into the matrix I_1 and the projection images I_2 according to the method of steps S1-S2, splicing I_1 and each I_2 into the image I_splice, inputting it into the behavior intention multi-classification model, and outputting the behavior intention recognition result of the aircraft from the model.
2. The multi-modal deep learning-based aircraft behavioral intention recognition method according to claim 1, characterized in that the data dimensions of each of the trajectory points in the trajectory sequence include time of flight, altitude of flight, speed of flight, and longitude and latitude of the trajectory point.
3. The multi-modal deep learning-based aircraft behavior intention recognition method according to claim 1, wherein in step S2 the trajectory behavior feature corresponding to the i-th track point in the track sequence, which is the object of behavior feature extraction, is written as b_i; b_i includes the flight position variation Δd_i of the aircraft at the i-th track point, the flight speed v_i, the flight angle θ_i, the variation Δv_i in flight speed and the variation Δθ_i in flight angle, i.e. b_i = (Δd_i, v_i, θ_i, Δv_i, Δθ_i).
4. The multi-modal deep learning-based aircraft behavior intention recognition method according to claim 3, wherein Δd_i, v_i, θ_i, Δv_i and Δθ_i are calculated by the following formulas (1) to (5), respectively:
Δd_i = sqrt((lat_i - lat_{i-1})² + (lon_i - lon_{i-1})² + (h_i - h_{i-1})²)   (1)
v_i = Δd_i / (t_i - t_{i-1})   (2)
θ_i = arctan((lat_i - lat_{i-1}) / (lon_i - lon_{i-1}))   (3)
Δv_i = v_i - v_{i-1}   (4)
Δθ_i = θ_i - θ_{i-1}   (5)
In formulas (1) to (5), lat_i and lon_i respectively represent the latitude and longitude of the i-th track point; lat_{i-1} and lon_{i-1} respectively represent the latitude and longitude of the (i-1)-th track point, which precedes the i-th track point in the track sequence; h_i and h_{i-1} respectively represent the altitudes of the i-th and (i-1)-th track points; t_i and t_{i-1} respectively represent the times at which the aircraft flies to the i-th and (i-1)-th track points.
5. The method for recognizing the behavioral intention of the aircraft based on the multi-modal deep learning according to claim 1, wherein the aircraft projects two projection images at each track point in a manner of projecting in parallel to the ground and in perpendicular to the ground.
6. The method for recognizing aircraft behavior intention based on multi-modal deep learning as claimed in any one of claims 1 to 5, wherein in step S2, the behavior features of the track sequence are extracted by using a pre-trained LSTM-based auto-encoder model, and the method for extracting the behavior features of the track sequence based on the LSTM-based auto-encoder model comprises the following steps:
the LSTM-based auto-encoder model comprises an LSTM encoder and an LSTM decoder; the behavior feature sequence of the track sequence is B_TR = (b_1, b_2, …, b_i, …, b_T), where b_1, b_2, …, b_i, …, b_T are the behavior features of the track points, i = 1, 2, …, T, and T denotes the number of track points in the track sequence; the input of the LSTM encoder is the sequence B_TR, and the LSTM encoder reads the input sequence in order and updates the hidden layer state h_t accordingly, the update being:
h_t = f_LSTM(h_{t-1}, b_t), where f_LSTM is the activation function and b_t is the element of the input sequence currently read by the LSTM encoder;
after the last track point b_T has been processed, the hidden layer state h_T serves as a low-dimensional implicit representation of the track sequence B_TR;
the LSTM decoder first takes h_T as its initialized hidden state and generates c_1, then further generates (c_2, c_3, …, c_T), the LSTM decoder being updated as:
c_t = f_LSTM(c_{t-1}), where f_LSTM is the activation function;
the goal of the LSTM decoder is to reconstruct the input sequence B_TR; the LSTM encoder and the LSTM decoder are trained by minimizing the reconstruction error between (b_1, b_2, …, b_i, …, b_T) and (c_1, c_2, …, c_i, …, c_T); the loss function of the LSTM-based auto-encoder model is the mean square error, calculated as:
L(B_TR, C) = (1/T) Σ_{i=1}^{T} ||b_i - c_i||²
7. the method for recognizing aircraft behavior intention based on multi-modal deep learning as claimed in any one of claims 1 to 5, wherein in step S2, the pre-trained auto-encoder model based on CNN is used to extract the behavior characteristics of the projection image sequence, and the method for extracting the behavior characteristics of the projection image sequence based on CNN is as follows:
the CNN-based auto-encoder model includes a CNN encoder and a CNN decoder; with the projection image sequence being I, the aim of the CNN encoder is to convert the input vector I into a potential representation Z_2, and the aim of the CNN decoder is to reconstruct the potential representation Z_2 into I′,
the CNN encoder comprises 3 conv layers, 1 reshape layer and 1 FC layer which are sequentially connected, a LeakyRelu activation function is adopted behind each conv layer, the conv layers in the CNN encoder are used for extracting image features of each element in an input vector I, the reshape layers in the CNN encoder are used for changing the size of a feature map output by the conv layers, and the FC layers in the CNN encoder are used for reducing the dimension of input data;
the CNN decoder comprises 3 deconv layers, 1 reshape layer and 1 FC layer, a LeakyRelu activation function is adopted after each deconv layer, and the FC layer in the CNN decoder is used for performing dimensionality raising on output data of the CNN encoder; the reshape layer in the CNN decoder is used for changing the size of the feature map after being subjected to dimension upgrading by the FC layer; the 3 deconv layers in the CNN decoder are used for reconstructing the feature map output by the reshape layer into I';
the loss function of the auto-encoder model based on the CNN adopts a mean square error, and the calculation formula is as follows:
L(I, I′) = |I - I′|², where L(I, I′) denotes the loss function.
8. The method for recognizing behavioral intention of aircraft according to claim 1, wherein in step S3, the feature vectors extracted in step S2 and related to different aircraft are subjected to cluster analysis by DBSCAN density clustering algorithm.
9. The method for recognizing the behavioral intention of the aircraft based on the multi-modal deep learning according to claim 1, wherein in the step S4, the method for training the behavioral intention multi-classification model comprises the steps of:
L1, the behavior feature of the trajectory sequence extracted in step S2 is denoted Z_1(p); Z_1(p) is converted to z_1(p)′ by min-max normalization, with the transfer function:
z′ = (z - min) / (max - min)
where max denotes the maximum value of the sample data and min denotes the minimum value of the sample data;
let K denote the side length of the target matrix, determined from the length p of z_1(p)′; z_1′ is converted to obtain a matrix of dimension K × K, denoted I_1 ∈ R^{K×K}, whose m-th row is calculated as:
I_1(m) = A + B, where m = 0, 1, …, K - 1
where A denotes one-dimensional data consisting of m zero elements, with matrix expression [0, 0, …, 0], the number of zero elements being m;
B = [z′_{1,m}, z′_{1,m+1}, …, z′_{1,K-1}];
and p denotes the length of z_1(p);
L2, I_1 ∈ R^{K×K} and the two projection images I_2 ∈ R^{K×K} associated with the same aircraft are spliced to obtain I_splice ∈ R^{K×K×3};
L3, the image I_splice is data-enhanced using any one or more of image flipping, rotation, scaling, cropping and translation to expand the model training samples, and the model training samples are divided into a training data set, a test data set and a validation data set in proportion;
l4, reading the training data set and the testing data set into a Darknet-53 network, using the class obtained by clustering in the step S3 as a label and using a cross entropy loss function to train the network to finally form the behavior intention multi-classification model.
CN202210044234.0A 2022-01-14 2022-01-14 Multi-mode deep learning-based aircraft behavior intention recognition method Active CN114358211B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210044234.0A CN114358211B (en) 2022-01-14 2022-01-14 Multi-mode deep learning-based aircraft behavior intention recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210044234.0A CN114358211B (en) 2022-01-14 2022-01-14 Multi-mode deep learning-based aircraft behavior intention recognition method

Publications (2)

Publication Number Publication Date
CN114358211A CN114358211A (en) 2022-04-15
CN114358211B true CN114358211B (en) 2022-08-23

Family

ID=81090499

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210044234.0A Active CN114358211B (en) 2022-01-14 2022-01-14 Multi-mode deep learning-based aircraft behavior intention recognition method

Country Status (1)

Country Link
CN (1) CN114358211B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2040137A1 (en) * 2007-09-21 2009-03-25 The Boeing Company Predicting aircraft trajectory
EP2685440A1 (en) * 2012-07-09 2014-01-15 The Boeing Company Using aircraft trajectory data to infer aircraft intent
CN109508812A (en) * 2018-10-09 2019-03-22 南京航空航天大学 A kind of aircraft Trajectory Prediction method based on profound memory network
EP3459857A1 (en) * 2017-09-22 2019-03-27 Aurora Flight Sciences Corporation Systems and methods for monitoring pilot health
CN110111608A (en) * 2019-05-15 2019-08-09 南京莱斯信息技术股份有限公司 Method based on radar track building machine level ground scene moving target operation intention assessment
CN110751099A (en) * 2019-10-22 2020-02-04 东南大学 Unmanned aerial vehicle aerial video track high-precision extraction method based on deep learning
CN112488061A (en) * 2020-12-18 2021-03-12 电子科技大学 Multi-aircraft detection and tracking method combined with ADS-B information
CN112947541A (en) * 2021-01-15 2021-06-11 南京航空航天大学 Unmanned aerial vehicle intention track prediction method based on deep reinforcement learning

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103336863B (en) * 2013-06-24 2016-06-01 北京航空航天大学 The flight intent recognition methods of flight path observed data of flying based on radar
US9934453B2 (en) * 2014-06-19 2018-04-03 Bae Systems Information And Electronic Systems Integration Inc. Multi-source multi-modal activity recognition in aerial video surveillance
US9691286B2 (en) * 2015-04-22 2017-06-27 The Boeing Company Data driven airplane intent inferencing
CA2984425C (en) * 2015-05-08 2023-10-10 Bombardier Inc. Systems and methods for assisting with aircraft landing
US10453351B2 (en) * 2017-07-17 2019-10-22 Aurora Flight Sciences Corporation System and method for detecting obstacles in aerial systems

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2040137A1 (en) * 2007-09-21 2009-03-25 The Boeing Company Predicting aircraft trajectory
EP2685440A1 (en) * 2012-07-09 2014-01-15 The Boeing Company Using aircraft trajectory data to infer aircraft intent
EP3459857A1 (en) * 2017-09-22 2019-03-27 Aurora Flight Sciences Corporation Systems and methods for monitoring pilot health
CN109508812A (en) * 2018-10-09 2019-03-22 南京航空航天大学 A kind of aircraft Trajectory Prediction method based on profound memory network
CN110111608A (en) * 2019-05-15 2019-08-09 南京莱斯信息技术股份有限公司 Method based on radar track building machine level ground scene moving target operation intention assessment
CN110751099A (en) * 2019-10-22 2020-02-04 东南大学 Unmanned aerial vehicle aerial video track high-precision extraction method based on deep learning
CN112488061A (en) * 2020-12-18 2021-03-12 电子科技大学 Multi-aircraft detection and tracking method combined with ADS-B information
CN112947541A (en) * 2021-01-15 2021-06-11 南京航空航天大学 Unmanned aerial vehicle intention track prediction method based on deep reinforcement learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
New Algorithms for Aircraft Intent Inference and Trajectory Prediction; Yepes J L et al; Journal of Guidance, Control, and Dynamics; 20151231; full text *
A method for predicting the attack intention of a hypersonic aircraft; Luo Yi et al; Journal of Xidian University; 20191231; Vol. 26, No. 05; full text *
Intelligent autonomous control technology for spacecraft in uncertain environments; Yuan Li; Journal of Astronautics; 20211231; Vol. 42, No. 07; full text *

Also Published As

Publication number Publication date
CN114358211A (en) 2022-04-15

Similar Documents

Publication Publication Date Title
Song et al. A survey of remote sensing image classification based on CNNs
Hong et al. Endmember-guided unmixing network (EGU-Net): A general deep learning framework for self-supervised hyperspectral unmixing
Ball et al. Comprehensive survey of deep learning in remote sensing: theories, tools, and challenges for the community
Li et al. Segmenting objects in day and night: Edge-conditioned CNN for thermal image semantic segmentation
Kang et al. Classification of hyperspectral images by Gabor filtering based deep network
Gao et al. Road extraction from high-resolution remote sensing imagery using refined deep residual convolutional neural network
CN106815601B (en) Hyperspectral image classification method based on recurrent neural network
Hosseiny et al. WetNet: A spatial–temporal ensemble deep learning model for wetland classification using Sentinel-1 and Sentinel-2
Yasrab et al. An encoder-decoder based convolution neural network (CNN) for future advanced driver assistance system (ADAS)
Qayyum et al. Scene classification for aerial images based on CNN using sparse coding technique
Pan et al. A collaborative region detection and grading framework for forest fire smoke using weakly supervised fine segmentation and lightweight faster-RCNN
Derksen et al. Geometry aware evaluation of handcrafted superpixel-based features and convolutional neural networks for land cover mapping using satellite imagery
Ahmed et al. A real-time efficient object segmentation system based on U-Net using aerial drone images
Baeta et al. Learning deep features on multiple scales for coffee crop recognition
Zhang et al. Polygon structure-guided hyperspectral image classification with single sample for strong geometric characteristics scenes
Shu Deep convolutional neural networks for object extraction from high spatial resolution remotely sensed imagery
Soroush et al. NIR/RGB image fusion for scene classification using deep neural networks
Panetta et al. Ftnet: Feature transverse network for thermal image semantic segmentation
Burns et al. Machine-learning for mapping and monitoring shallow coral reef habitats
Savelonas et al. Computer Vision and Pattern Recognition for the Analysis of 2D/3D Remote Sensing Data in Geoscience: A Survey
Zaman et al. Analysis of hyperspectral data to develop an approach for document images
Ichim et al. Segmentation of vegetation and flood from aerial images based on decision fusion of neural networks
Mittal et al. On the performance evaluation of object classification models in low altitude aerial data
Dai et al. Research on hyper-spectral remote sensing image classification by applying stacked de-noising auto-encoders neural network
Zhang et al. Hyperspectral image classification based on joint spectrum of spatial space and spectral space

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant