CN116629115A - Bidirectional data driving ship track prediction method and system based on attention mechanism - Google Patents
- Publication number
- CN116629115A (application number CN202310583916.3A)
- Authority
- CN
- China
- Prior art keywords
- sequence
- machine learning
- learning model
- input
- prediction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
The invention discloses a bidirectional data-driven ship track prediction method and system based on an attention mechanism, and relates to the technical field of track prediction. An observation sequence of forward length g is obtained from an AIS data set and input into a first machine learning model to obtain an intermediate prediction sequence of length l; at the same time, an observation sequence of backward length g is obtained from the AIS data set and input into a second machine learning model to obtain a second intermediate prediction sequence of length l. The two intermediate prediction sequences are spliced to form composite training data, which serves as the input of a third machine learning model to obtain the final prediction result. The method solves the limitation and low prediction accuracy of prior-art methods that predict the future track from one direction only.
Description
Technical Field
The invention relates to the technical field of track prediction, in particular to a bidirectional data driving ship track prediction method and system based on an attention mechanism.
Background
The Automatic Identification System (AIS) is an important technology for ensuring the safety of marine transportation. An AIS system records static information of a ship, such as its Maritime Mobile Service Identity (MMSI), as well as dynamic information such as position and speed over ground. AIS supports offshore navigation decision-making, information broadcasting, collision avoidance at sea, and environmental protection. However, marine vessels are susceptible to problems such as bad weather or mis-planned routes, which reduce sailing efficiency; collisions and channel blockage may also occur due to improper handling by the captain and poor maneuverability. There is therefore a need to explore the great potential of AIS data for maritime risk early warning and route optimization.
Current ship track prediction methods mainly train a model on unidirectional historical track data to predict the future track. They do not consider bidirectional historical track data, i.e., the historical track data recorded after the target prediction window is left out of consideration. In traffic-scene reconstruction and prediction tasks for some maritime areas, existing methods therefore under-exploit the ship's multidirectional historical track data, leaving room to improve prediction accuracy.
Linear models handle track prediction well when a ship sails in a straight line. A typical linear model is the constant velocity model (CVM); representative linear prediction methods include kinematics-based models, constant velocity models, the Ornstein-Uhlenbeck model, and Kalman filter variants. However, linear models perform poorly when the vessel must change heading or speed, or navigate through turns. To overcome this weakness, researchers have studied nonlinear models to improve prediction accuracy. Typical nonlinear trajectory prediction models are machine learning and deep learning based methods. Common machine learning methods include trajectory clustering, support vector machines, and Gaussian models; however, the accuracy of machine-learning-based methods relies on labels and physical domain knowledge. Deep-learning-based methods can capture more features of the input data and improve track prediction accuracy; common deep learning predictors include artificial neural networks, recurrent neural networks, and their variants. Existing methods, however, only consider training and learning on unidirectional historical track data to predict the target track, and research on bidirectional historical track data is lacking. In reconstructed historical sea traffic scenes, how to use bidirectional historical track data features to improve track prediction accuracy remains a difficulty.
Disclosure of Invention
The invention aims to provide a bidirectional data-driven ship track prediction method based on an attention mechanism, so as to solve the prior-art problems of insufficient extraction of bidirectional historical track data features and low prediction accuracy.
In order to solve these technical problems, the invention adopts the following technical scheme: a bidirectional data-driven ship track prediction method based on an attention mechanism, comprising the following steps:
S1, acquiring an observation sequence X_g^f of forward length g from an AIS data set and inputting it into a first machine learning model to obtain an intermediate prediction sequence Ŷ_l^f of length l; at the same time, acquiring an observation sequence X_g^b of backward length g from the AIS data set and inputting it into a second machine learning model to obtain an intermediate prediction sequence Ŷ_l^b of length l;
S2, splicing the intermediate prediction sequence Ŷ_l^f of the first machine learning model and the intermediate prediction sequence Ŷ_l^b of the second machine learning model to form composite training data;
S3, taking the composite training data as the input of a third machine learning model to obtain the prediction result.
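Hedged in code, the three-step flow S1-S3 can be sketched as follows; the three `model_*` callables are hypothetical stand-ins for the trained sub-blocks, and concatenation along the feature axis is an assumption about the splicing operation:

```python
import numpy as np

def predict_bidirectional(x_fwd, x_bwd, model_f, model_b, model_fuse):
    """S1-S3: run both sub-blocks, splice their outputs, fuse.

    x_fwd, x_bwd: (g, d) forward/backward observation sequences.
    model_f, model_b, model_fuse: placeholder callables standing in
    for the trained GRU+attention, BiGRU+attention and MLP blocks.
    """
    y_f = model_f(x_fwd)                             # S1: (l, d) intermediate prediction
    y_b = model_b(x_bwd)                             # S1: (l, d) intermediate prediction
    composite = np.concatenate([y_f, y_b], axis=-1)  # S2: splice to (l, 2d)
    return model_fuse(composite)                     # S3: final (l, d) prediction
```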
Preferably, the acquisition process of the first, second and third machine learning models includes: taking the acquired ship time-series track data as the AIS data set and dividing it into a training set and a test set; taking the training set as the input of the forward sub-block and backward sub-block models and training them to obtain the first and second machine learning models; and combining the intermediate outputs of the optimal forward sub-block and backward sub-block models obtained during training into a composite fusion training set, inputting it into the fusion prediction block, and training the fusion prediction block to obtain the third machine learning model.
More preferably, the first machine learning model is composed of a GRU neural network and an attention mechanism; the second machine learning model is composed of a BiGRU neural network and an attention mechanism; the third machine learning model is a multi-layer perceptron network.
More preferably, in step S1, the first machine learning model maps the input forward observation sequence X_g^f to an output sequence; the acquisition process of the intermediate prediction sequence Ŷ_l^f of length l includes:
sequentially inputting the elements of the forward observation sequence X_g^f = (x_1, x_2, ..., x_g) in the data set into the first machine learning model, wherein t denotes the position in the current forward track sequence;
updating the hidden sequence H^f = (h_1, h_2, ..., h_g) of the first machine learning model as h_t = GRU(x_t, h_{t-1}; θ_gru), wherein each element h_t represents the feature extracted from the sample point at time t among the g intermediate feature states of the forward track sequence, x_t is the track point input to the GRU neural network at the current time, i.e., the input sample point of the first machine learning model, and θ_gru denotes the parameter values in each input-to-output mapping;
computing the forward attention distribution coefficient α_t of the t-th sample point in the forward track sequence as α_t = exp(e_t) / Σ_k exp(e_k), where the attention weighting score of the t-th sample point is e_t = v^T tanh(W h_t + U s); v, W, U are the network parameters of the additive calculation process, h_t is the spatio-temporal feature code extracted by the GRU network from the input sample point at time t of the g-length input track sequence, and s is the spatio-temporal feature extracted from the last sample point of the g-length input sequence;
θ_gru denotes the set of parameters of the GRU mapping process in the first machine learning model, obtained by minimizing the quantization error L = (1/N) Σ_{i=1}^{N} Σ_{j=1}^{l} (ŷ_j^{(i)} − y_j^{(i)})², where x_i denotes the i-th time-series point in the forward track sequence of length g input into the first machine learning model, L denotes the quantization error, N the total number of training samples, ŷ_j the j-th time-series point in the mapped output sample of length l, and θ the parameter values in each input-to-output mapping;
M_{g,l} denotes the mapping that, given a forward input sequence X_g of length g, predicts the output sequence Y_l of length l so as to maximize the conditional probability: M_{g,l} = argmax_Y p(Y_l | X_g), where p(Y_l | X_g) is the probability of mapping the given g observation points X_g to the future l predicted track points Y_l;
inputting the vector composed of the outputs of all hidden layers into a fully connected layer to obtain the intermediate prediction sequence Ŷ_l^f of length l.
In step S1, the second machine learning model maps the input backward observation sequence X_g^b to an output sequence; the acquisition process of the intermediate prediction sequence Ŷ_l^b of length l includes:
sequentially inputting the elements of the backward observation sequence X_g^b = (x_1, x_2, ..., x_g) in the data set into the second machine learning model, wherein t denotes the position in the current backward track sequence;
updating the hidden sequence H^b = (h_1, h_2, ..., h_g) of the second machine learning model, wherein each element h_t represents the feature extracted from the sample point at time t among the g intermediate feature states of the backward input sequence;
computing the backward attention distribution coefficient α_t of the t-th sample point in the backward track sequence as α_t = exp(e_t) / Σ_k exp(e_k), where the attention weighting score of the t-th sample point is e_t = v^T tanh(W h_t + U s); v, W, U are the network parameters of the additive calculation process, and h_t is the spatio-temporal feature code extracted by the BiGRU network from the input sample point at time t of the g-length input track sequence;
the BiGRU network fuses the hidden states of the forward layer and the reverse layer according to h_t = W(h_t^→ ⊕ h_t^←) + b, wherein ⊕ denotes splicing, W is a weight matrix of the network, and b is a bias value;
wherein h_t^→ = GRU(x_t, h_{t-1}^→; θ^→) and h_t^← = GRU(x_t, h_{t+1}^←; θ^←), and x_t denotes the track point of the backward sequence input at the current time t, i.e., the input sample point of the second machine learning model;
θ^→ denotes the set of parameters of the forward-layer mapping process in the second machine learning model, with x_i the i-th time-series point in the backward track sequence of length g input into the forward layer; θ^← denotes the set of parameters of the reverse-layer mapping process, with x_i the i-th time-series point in the backward track sequence of length g input into the reverse layer; L denotes the quantization error, N the total number of training samples, ŷ_j the j-th time-series point in the mapped output sample of length l, and θ the parameter values in each input-to-output mapping;
inputting the vector composed of the outputs of all hidden layers into a fully connected layer to obtain the intermediate prediction sequence Ŷ_l^b of length l.
More preferably, in step S3, the third machine learning model maps the spliced intermediate prediction sequence Ŷ_l^{fb} to an output sequence; the acquisition process of the predicted output Ŷ_l includes:
sequentially inputting the elements of the spliced intermediate prediction sequence into the third machine learning model, wherein t denotes the t-th intermediate prediction pair in the current intermediate prediction sequence;
obtaining the hidden sequence of the third machine learning model as h_t = MLP(ŷ_t^{fb}, h_{t-1}; θ), wherein ŷ_t^{fb} denotes the intermediate prediction splice value input at the current time t, i.e., the input data of the third machine learning model, h_{t-1} denotes the output hidden-state value obtained at the previous time, and θ denotes the parameter values of each mapping from an input splice value to an output;
computing the output of the j-th hidden layer as h_j = σ_j(w_j h_{j-1} + b_j), wherein the hidden state of the 1st hidden layer is h_1 = σ(Σ_{t=1}^{l} w_t ŷ_t^{fb} + b_t), σ is the activation function of the 1st hidden layer, l is the total number of inputs, b_t is the bias of the 1st hidden layer, w_t is the weight of the connection layer, σ_j is a nonlinear activation function of the j-th hidden layer with a learnable parameter θ, b_j is the bias of the j-th hidden layer, and w_j is the weight of the j-th connection layer;
inputting the vector composed of the outputs of all hidden layers into a fully connected layer to obtain the predicted output sequence Ŷ_l of length l.
In addition, the invention also provides a bidirectional data-driven ship track prediction system based on an attention mechanism, comprising:
a data acquisition module for acquiring, from the AIS data set, an observation sequence X_g^f of forward length g and an observation sequence X_g^b of backward length g;
a data processing module for inputting the observation sequence X_g^f into the first machine learning model to obtain an intermediate prediction sequence Ŷ_l^f of length l, and for inputting the observation sequence X_g^b into the second machine learning model to obtain an intermediate prediction sequence Ŷ_l^b of length l;
a data splicing module for splicing the intermediate prediction sequence Ŷ_l^f of the first machine learning model and the intermediate prediction sequence Ŷ_l^b of the second machine learning model to obtain composite training data;
and a result prediction module for taking the composite training data as the input of the third machine learning model to obtain the prediction result.
The system operates according to the above bidirectional data-driven ship track prediction method based on the attention mechanism.
Compared with the prior art, the invention can extract features of bidirectional historical time-series track data, introduces an attention mechanism to assign weights to the output features of the neural network so as to enhance the model's feature-learning perception of the data, and uses a multi-layer perceptron network to fuse and learn the intermediate output features of the different sub-models, thereby greatly improving the model's feature-learning capability on time-series track data and its prediction accuracy.
Drawings
FIG. 1 is a schematic diagram of a prediction method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a prediction method according to an embodiment of the present invention;
FIG. 3 is a block diagram of a GRU network in an embodiment of the invention;
FIG. 4 is a diagram of a BiGRU network in an embodiment of the invention;
FIG. 5 is a diagram of a multi-layer perceptron (MLP) network architecture in an embodiment of the invention;
fig. 6 is a diagram of the results of exploring the optimal input and output steps in RMSE performance in accordance with an embodiment of the present invention.
Detailed Description
The invention will be further described with reference to examples and drawings, to which reference is made, but which are not intended to limit the scope of the invention.
The machine learning model training process adopted by this embodiment is as follows. The existing historical track data in the AIS data set is divided into two parts: a training set and a test set. Those skilled in the art will appreciate that, in practical application of the method, the test set should be obtained from the time-series data set whose ship track is to be predicted; since this embodiment merely explains and verifies the method, time-series data from the existing data set may be used as the test set.
940,943 ship time-series track records of 164 ships, downloaded from an official website and collected by AIS receiver devices, are taken as the AIS data set, which is divided into a training set and a test set at a ratio of 7:3.
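As a minimal sketch of the 7:3 partition, a sequential cut over the records might look like the following; the exact partitioning scheme (sequential vs. shuffled) is not specified in the text, so a chronological cut is an assumption:

```python
def split_train_test(records, train_ratio=0.7):
    """Split a list of AIS track records into train/test sets at the
    7:3 ratio used in the embodiment. A sequential cut is assumed;
    the patent does not specify the exact partitioning scheme."""
    cut = int(len(records) * train_ratio)
    return records[:cut], records[cut:]
```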
In the training set, the track segments of each ship in the data set are preprocessed and then divided, according to the temporal relation to the preprocessed real target labels, into a forward input track sequence and a backward input track sequence, which are input into the first and second machine learning models respectively. Deep features of the time-series data are extracted, prediction is performed with a tanh function at the output layer, the error between the predicted and real track labels is computed with a root-mean-square error function, and the weights and biases of all layers of the neural networks are updated by the back-propagation algorithm. The neural networks in the first and second machine learning models are trained iteratively until the loss function converges, yielding the optimally trained first and second machine learning models and the intermediate prediction sequences of the training data.
The intermediate prediction sequences of the optimally trained first and second machine learning models are spliced, and the spliced intermediate prediction sequence vector is input into the fusion prediction block. The multi-layer perceptron network of the fusion prediction block learns and extracts depth features, the output layer predicts with a tanh function, the error between the predicted and real track labels is computed with a root-mean-square error function, and the weights and biases of all layers are updated by the back-propagation algorithm. The network in the fusion prediction block is trained iteratively until the loss function converges, yielding the optimally trained fusion prediction block; finally, the first, second and third machine learning models are saved as a whole.
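The iterate-until-the-loss-converges loop described above can be hedged into a toy sketch. The one-parameter scalar model below is purely illustrative, not the patent's networks; the RMSE gradient is derived analytically for this single-weight case:

```python
import numpy as np

def rmse(pred, target):
    """Root-mean-square error, the loss function named in the embodiment."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))

def train_until_converged(x, y, lr=0.01, tol=1e-9, max_iter=10000):
    """Gradient descent on a toy model y_hat = w * x, iterating until
    the RMSE loss stops decreasing (the convergence criterion above)."""
    w, prev = 0.0, float("inf")
    loss = rmse(w * x, y)
    for _ in range(max_iter):
        pred = w * x
        loss = rmse(pred, y)
        if prev - loss < tol:
            break                                  # loss has converged
        prev = loss
        grad = np.mean((pred - y) * x) / max(loss, 1e-12)  # d(RMSE)/dw
        w -= lr * grad
    return w, loss
```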
Example 1
The embodiment of the invention uses the trained GRU neural network with attention mechanism network, the trained BiGRU neural network with attention mechanism network, and the trained multi-layer perceptron network to realize feature extraction from forward and backward historical track data and better prediction performance of the prediction model.
Fig. 2 shows a structural overview of the proposed bi-directional data-driven ship track prediction method, wherein the forward sub-block, the backward sub-block and the fusion prediction block form an integral prediction framework for track prediction. The key characteristics of the bidirectional data driving prediction method of this embodiment are as follows:
Firstly, the bidirectional data-driven prediction method of this embodiment takes the g observed existing track points as input and, after learning and fusion by the three key blocks, outputs l predicted future track points. The values of g and l can be adjusted according to actual service requirements.
Secondly, the bidirectional data-driven prediction method of this embodiment constructs sub-prediction blocks over the same data set and uses different neural network structures within them to learn and predict from the information in the tracks. The forward and backward track sequence segments of the data set are learned, and intermediate prediction sequences produced, by the forward sub-block and the backward sub-block simultaneously, so that comprehensive track features can be learned and extracted from the bidirectional track data; the final prediction is obtained by splice-fusion learning of the intermediate prediction sequences with the multi-layer perceptron network. The attention-based bidirectional data-driven trajectory prediction flow comprises the following steps, as shown in fig. 2:
Step (1): the forward sub-block receives a forward sequence of length g observed from the data set and inputs it into the GRU network and attention mechanism network for learning; each sequence point in the data set comprises the following data types: time, latitude, longitude, heading, and speed.
Step (2): the backward sub-block receives a backward sequence of length g observed from the data set and inputs it into the BiGRU network and attention mechanism network for learning; each sequence point comprises the same data types: time, latitude, longitude, heading, and speed.
Step (3): the state sequence composed of hidden-layer outputs, obtained after training and learning of the GRU network and attention mechanism network structure, is input into a fully connected layer to obtain the intermediate prediction sequence Ŷ_l^f of length l.
Step (4): the state sequence composed of hidden-layer outputs, obtained after training and learning of the BiGRU network and attention mechanism network structure, is input into a fully connected layer to obtain the intermediate feature-state output sequence Ŷ_l^b of length l.
Step (5): the intermediate prediction sequences of the forward and backward sub-blocks are spliced to form new training data Ŷ_l^{fb}, which is input to the fusion prediction block.
Step (6): the new training data Ŷ_l^{fb} is input into the fusion prediction block, which learns with a multi-layer perceptron network and finally outputs the result Ŷ_l of the overall prediction model.
The attention-based bidirectional data-driven ship track prediction method designed in this embodiment can extract bidirectional historical track data to enhance feature-learning perception of that data, and introduces an attention mechanism to assign weights to behavior feature states in the track data, which strengthens the model's association between different data dimensions of the time-series data and improves prediction model accuracy.
The forward sub-block employs a GRU neural network and an attention mechanism network. The gated recurrent unit (GRU) network is a variant of the recurrent neural network (RNN); as shown in fig. 3, its interior comprises two structures: a reset gate and an update gate. The reset gate reduces information from the previous cell deemed irrelevant, and the update gate determines how much information needs to be transferred from the previous cell to the next.
This embodiment defines the observed forward sequence of the data set as X^f = (x_1, x_2, ..., x_g), where t denotes the position in the current track sequence. The input sequence is sequentially computed into a hidden vector sequence H = (h_1, h_2, ..., h_g) by the GRU network. The specific GRU model is governed by the following formulas (1)-(4):
r_t = σ(W_r x_t + U_r h_{t-1} + b_r)  (1)
z_t = σ(W_z x_t + U_z h_{t-1} + b_z)  (2)
h̃_t = tanh(W_h x_t + U_h (r_t ⊙ h_{t-1}) + b_h)  (3)
h_t = (1 − z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t  (4)
where σ denotes the sigmoid activation function, tanh is the hyperbolic tangent function, r_t and z_t denote the outputs of the reset gate and the update gate, h̃_t and h_t denote the candidate output and the actual output, ⊙ denotes element-wise multiplication, the U's and W's are weight matrices, and the b's are bias terms.
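The GRU step of formulas (1)-(4) can be exercised directly in a small sketch; the dict layout and weight shapes below are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x_t, h_prev, p):
    """One GRU step per formulas (1)-(4): reset gate, update gate,
    candidate state, and the convex combination giving the new state."""
    r = sigmoid(p["Wr"] @ x_t + p["Ur"] @ h_prev + p["br"])             # (1) reset gate
    z = sigmoid(p["Wz"] @ x_t + p["Uz"] @ h_prev + p["bz"])             # (2) update gate
    h_cand = np.tanh(p["Wh"] @ x_t + p["Uh"] @ (r * h_prev) + p["bh"])  # (3) candidate
    return (1.0 - z) * h_prev + z * h_cand                              # (4) output
```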
A fully-connected-layer calculation is performed on the output vectors H = (h_1, ..., h_g) to output a feature-state output sequence of length l, wherein each element encodes the spatio-temporal feature data extracted from the track sequence of the t-th component input into the GRU neural network. H is the output set of the hidden-state fully connected layer attached to the GRU neural network; the dimension of the hidden state is q, which equals the dimension of the model input data.
Weight-distribution assignment learning is performed on the output vectors of the GRU network using the attention mechanism network to obtain the prediction output. The specific attention mechanism network is calculated by the following formulas (5)-(6):
e_t = v^T tanh(W h_t + U s)  (5)
α_t = exp(e_t) / Σ_{k=1}^{g} exp(e_k)  (6)
where v, W, U are the network parameters of the additive calculation process, h_t is the spatio-temporal feature code extracted by the GRU network from the input sample point at time t of the g-length forward track sequence, e_t denotes the attention weighting score value of the t-th sample point in the forward track sequence, and α_t denotes the forward attention distribution coefficient of the t-th sample point.
A fully-connected-layer calculation is performed on the attention-weighted output vector to output the intermediate prediction sequence Ŷ_l^f of length l.
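A minimal numeric sketch of the additive attention of formulas (5)-(6), with a numerically stabilized softmax; the query vector s and the parameter shapes are illustrative assumptions:

```python
import numpy as np

def additive_attention(H, s, v, W, U):
    """Additive attention over GRU outputs H of shape (g, q):
    scores e_t = v^T tanh(W h_t + U s)   -- formula (5)
    coefficients alpha_t = softmax(e_t)  -- formula (6)
    Returns the attention-weighted context and the coefficients."""
    e = np.array([v @ np.tanh(W @ h + U @ s) for h in H])
    alpha = np.exp(e - e.max())   # subtract max for numerical stability
    alpha /= alpha.sum()
    return alpha @ H, alpha
```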
The backward sub-block adopts a BiGRU neural network and an attention mechanism network. As shown in fig. 4, the BiGRU model has one more set of backward-propagating GRU units than the unidirectional GRU model, which enables the BiGRU to exploit both past and future information in the observed backward sequence and thereby provide more effective predictions. The BiGRU maps the input backward sequence X^b = (x_1, ..., x_g) to two output sequences, namely the forward hidden sequence H^→ and the backward hidden sequence H^←, and operates by the following formulas (8)-(10):
h_t^→ = GRU(x_t, h_{t-1}^→; θ^→)  (8)
h_t^← = GRU(x_t, h_{t+1}^←; θ^←)  (9)
h_t = W(h_t^→ ⊕ h_t^←) + b  (10)
where each GRU function is the recurrent network of formulas (1)-(4) and nonlinearly converts the input time-series track vector into the corresponding GRU hidden state, x_t denotes the track point of the backward sequence input at the current time t, θ^→ denotes the parameter set of the forward-layer mapping process in the BiGRU network, θ^← denotes the parameter set of the reverse-layer mapping process, ⊕ denotes splicing, W is a weight matrix of the network, and b is a bias value. The forward-layer and reverse-layer hidden states computed by the two unidirectional GRU networks are spliced by formulas (8)-(10) into a compact bidirectional representation, finally yielding the feature-state output vector of the BiGRU block.
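Hedged sketch of formulas (8)-(10): one step function scans the sequence forward, another backward, and each time step's two hidden states are spliced and linearly fused. The step callables are stand-ins for the two GRU directions:

```python
import numpy as np

def bigru_scan(X, step_fwd, step_bwd, W, b, h0):
    """Formulas (8)-(10): forward pass, backward pass, then per-step
    fusion h_t = W (h_fwd spliced with h_bwd) + b."""
    hf, h = [], h0
    for x in X:                        # forward layer, formula (8)
        h = step_fwd(x, h)
        hf.append(h)
    hb, h = [], h0
    for x in X[::-1]:                  # reverse layer, formula (9)
        h = step_bwd(x, h)
        hb.append(h)
    hb = hb[::-1]                      # re-align to forward time order
    return [W @ np.concatenate([f, bw]) + b for f, bw in zip(hf, hb)]  # (10)
```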
For output vectorMake a full connection layer calculation and output a characteristic state output sequence with length of l>Wherein each element->Representing the encoding of spatiotemporal feature data extracted from the sequence of trajectories of the t-th component of the sequence input into the biglu neural network. />Is an output set representing a hidden state full connection layer connected to the BiGRU neural network, the dimension size of the hidden state is q, and q is the same as the dimension size of model input data.
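The bidirectional pass of formulas (8)-(10) can be sketched as follows. This is an assumption-laden NumPy sketch, not the embodiment's code: a toy tanh cell stands in for the GRU of formulas (1)-(4), and the final linear fusion of formula (10) is left out so the splice itself is visible.

```python
import numpy as np

def bidirectional_pass(step, X, h0):
    """Run a recurrent `step` function over the sequence X in both time
    directions and splice the two hidden sequences per time step, as the
    BiGRU block does.  `step(x, h) -> h'` stands in for the GRU cell."""
    fwd, h = [], h0
    for x in X:                       # forward layer: x_1 .. x_g
        h = step(x, h)
        fwd.append(h)
    bwd, h = [], h0
    for x in X[::-1]:                 # reverse layer: x_g .. x_1
        h = step(x, h)
        bwd.append(h)
    bwd.reverse()                     # realign with forward time order
    # compact bidirectional representation: per-step concatenation
    return np.concatenate([np.stack(fwd), np.stack(bwd)], axis=-1)

rng = np.random.default_rng(1)
q = 3
Wx, Wh = rng.normal(size=(q, q)), rng.normal(size=(q, q))
step = lambda x, h: np.tanh(x @ Wx + h @ Wh)   # toy recurrent cell
X = rng.normal(size=(5, q))
H = bidirectional_pass(step, X, np.zeros(q))
```

Each output row splices a forward state that has seen x_1..x_t with a backward state that has seen x_g..x_t, which is why the BiGRU can use both past and future context at every position.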
The output vector of the BiGRU neural network is fed to the attention mechanism network, which learns a weight distribution over it to obtain the prediction output of the backward sub-block. The attention mechanism network is calculated by the following formulas (11)-(13):

e_t = v·tanh(W·s + U·h_t)    (11)

α_t = exp(e_t) / Σ_{k=1..g} exp(e_k)    (12)

c = Σ_{t=1..g} α_t·h_t    (13)

where v, W, U are network parameters of the additive computation process, s is the query state, h_t is the space-time feature code extracted by the BiGRU network from the input sample point at the t-th moment of the backward track sequence of length g, e_t is the attention weighting score of the t-th sample point in the backward track sequence, and α_t is the backward attention distribution coefficient of the t-th sample point in the backward track sequence.
The output vector c is passed through a fully connected layer to output the intermediate prediction sequence of length l.
The present embodiment splices the intermediate prediction sequences of the forward and backward sub-blocks to form the input sequence of the final fusion prediction block.
The fusion prediction block adopts a multi-layer perceptron network. A multi-layer perceptron (MLP) is a neural network trained by supervised learning using the back-propagation method. The multi-layer perceptron network reads in turn the intermediate prediction sequences output by the forward and backward blocks and updates its internal hidden state according to the following formula (14):

h_1 = σ(Σ_{t=1..l} w_t·ẑ_t + b_t)    (14)

where σ is the activation function of the 1st hidden layer, ẑ_t represents the splice state at the t-th moment in the input variable, l is the total number of inputs, b_t is the bias of the layer, and w_t is the weight of the connection layer. Each subsequent hidden layer then updates the internal hidden state by the following formula:

h_j = σ_j(w_j·h_{j-1} + b_j)    (15)

where σ_j is a nonlinear activation function with a learnable parameter θ for the j-th hidden layer, b_j is the bias of the j-th hidden layer, w_j is the weight of the j-th layer connection, h_{j-1} represents the hidden-state value output at the previous step, and θ represents the parameter values of the mapping from each pair of input splice values to the output. Finally, an output layer is added that accepts the hidden state of formula (15) as input and makes predictions in order.
In the fusion prediction block, the training process of the multi-layer perceptron (MLP) network structure, shown in FIG. 5, maps the input splice sequence to an output sequence, i.e. the hidden sequence, calculated by formula (16):

H = MLP(Ẑ)    (16)

where MLP denotes the operations of formula (14) and formula (15). A fully connected layer operation is then applied to the hidden-layer output vector, producing a sequence of length l. Finally, the predicted output sequence of the fusion prediction block is calculated by formula (17):

ŷ_t = W_y·h_t + b_y    (17)

where W_y and b_y are the trainable parameters of the neural network that map the MLP output to the next predicted position, and ŷ_t represents the output value of the t-th track sequence point in the output sequence of length l.
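The fusion block described by formulas (14)-(17) can be sketched as a small feed-forward pass in NumPy. The depth (two hidden layers), tanh activations, and all dimensions are assumptions for illustration, not the embodiment's configuration.

```python
import numpy as np

def mlp_fusion(Z, W1, b1, W2, b2, Wy, by):
    """Fusion prediction block sketch: hidden layers over the spliced
    forward/backward intermediate predictions Z (shape (l, d)), then the
    linear output layer of formula (17)."""
    h1 = np.tanh(Z @ W1 + b1)     # 1st hidden layer, formula (14)-style
    h2 = np.tanh(h1 @ W2 + b2)    # further hidden layer, formula (15)-style
    return h2 @ Wy + by           # output layer, formula (17)

rng = np.random.default_rng(2)
l, d, hdim, q = 4, 6, 8, 3       # seq length, splice dim, hidden dim, point dim
Z = rng.normal(size=(l, d))
Y = mlp_fusion(Z,
               rng.normal(size=(d, hdim)), np.zeros(hdim),
               rng.normal(size=(hdim, hdim)), np.zeros(hdim),
               rng.normal(size=(hdim, q)), np.zeros(q))
```

The output Y has one q-dimensional predicted point per position of the length-l input splice, matching the predicted output sequence of formula (17).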
The following describes the prediction scene and application model in which the attention-based bidirectional data-driven prediction method of this embodiment operates, and then analyzes the effectiveness of the method in that scene.
Prediction model:
(1) It is assumed that the user knows the track observation points of the ship at sea over a certain period of time, and that the track observation sequence comes from an AIS (Automatic Identification System) dataset.
(2) By inputting the known observed forward and backward sequences into the attention-mechanism-based bidirectional data-driven prediction method, the user can derive the final future predicted track sequence from the forward sub-block, the backward sub-block, and the fusion prediction block.
(3) In the attention-mechanism-based bidirectional data-driven prediction method of this embodiment, the number of track points in the input observation sequence must not be less than 2, and the number of track points in the output prediction sequence must not be less than 1.
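The three-stage flow just outlined (sub-blocks, then splice, then fusion) can be sketched as a single data-flow function. The stand-in lambda models and all shapes below are hypothetical; only the order of operations follows the described method.

```python
import numpy as np

def predict_trajectory(x_fwd, x_bwd, model_f, model_b, model_fuse):
    """Run both observation sequences through their sub-models, splice
    the intermediate predictions, and feed the splice to the fusion
    model to obtain the final predicted track sequence."""
    y_f = model_f(x_fwd)                       # forward intermediate sequence
    y_b = model_b(x_bwd)                       # backward intermediate sequence
    z = np.concatenate([y_f, y_b], axis=-1)    # splice into composite input
    return model_fuse(z)                       # final fused prediction

# stand-in models that only fix the output shapes (l = 3, point dim 2)
model_f = lambda x: np.zeros((3, 2))
model_b = lambda x: np.ones((3, 2))
model_fuse = lambda z: z.mean(axis=-1, keepdims=True)
y = predict_trajectory(np.zeros((5, 2)), np.zeros((5, 2)),
                       model_f, model_b, model_fuse)
```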
Validity analysis:
This section takes the ship track sequences of the west coast region of the United States as an example to analyze the effectiveness of the method provided by this embodiment.
(1) Comparison with existing work. This embodiment compares the method with naive LSTM and GRU networks and with three state-of-the-art existing works. In terms of RMSE, this embodiment outperforms the prior studies by 85.42% on average, with an improvement of at least 30.77% and of up to 99.15% over individual prior studies. The attention-mechanism-based bidirectional data-driven prediction method provided by this embodiment therefore has good prediction precision. In addition, in experiments predicting target track sequences of different lengths, the method proposed by this embodiment is superior to existing research in short-term, medium-term, and long-term target track prediction tasks.
(2) Study of the optimal input-step and prediction-step parameters, to further explore the optimal parameters of the proposed method. In this embodiment, each record in the dataset represents a track point of the ship in the sea area; the number of records input into the model is taken as the input step size, and the number of records the model outputs as predictions is taken as the output step size. This embodiment explores the best relationship between input step size and output step size. As shown in FIG. 6, in terms of RMSE the prediction effect is good when the input step size is 4, 5, 8 or 9. Reasonable input and output step sizes can therefore be set in actual work to achieve the best working effect.
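The input/output step-size notion can be made concrete with a small windowing helper. This is a hypothetical sketch (the function name and the list-of-points representation are assumptions, not part of the embodiment):

```python
def make_windows(track, g, l):
    """Slice one ship track into (input, target) training pairs with
    input step size g and output step size l, as in the FIG. 6 study."""
    return [(track[i:i + g], track[i + g:i + g + l])
            for i in range(len(track) - g - l + 1)]

# toy track of 10 record indices, input step 4, output step 2
pairs = make_windows(list(range(10)), g=4, l=2)
```

Varying g and l in such a helper is exactly the parameter sweep behind the step-size study: each (g, l) pair yields a different training set shape for the same underlying track.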
Example 2
The present embodiment relates to a bidirectional data-driven ship track prediction device based on an attention mechanism, which comprises a processor and a memory, wherein the memory stores a computer program which, when executed by the processor, implements the attention-mechanism-based bidirectional data-driven ship track prediction method of Embodiment 1.
Specifically, the processor may be an Intel(R) Core(TM) i7-1165G7 @ 2.80GHz processor with 16GB of memory, programmed in Python 3.6 on the Keras framework.
The device provided by this embodiment is used to implement the attention-mechanism-based bidirectional data-driven ship track prediction method of Embodiment 1, and therefore has the technical effects of Embodiment 1, which will not be repeated here.
In order to make it easier for a person skilled in the art to understand the improvements of the present invention over the prior art, some of the figures and descriptions of the present invention have been simplified. The above-described embodiments are preferred implementations of the present invention; the present invention may, however, be implemented in other ways, and any obvious substitution that does not depart from the concept of the present technical solution falls within the scope of protection of the present invention.
Claims (8)
1. A bidirectional data-driven ship track prediction method based on an attention mechanism, characterized by comprising the following steps:
S1, acquiring an observation sequence of forward length g from an AIS dataset and inputting it into a first machine learning model to obtain an intermediate prediction sequence of length l; at the same time, acquiring an observation sequence of backward length g from the AIS dataset and inputting it into a second machine learning model to obtain an intermediate prediction sequence of length l;
S2, splicing the intermediate prediction sequence of the first machine learning model and the intermediate prediction sequence of the second machine learning model to form composite training data;
S3, taking the composite training data as the input of a third machine learning model to obtain a prediction result.
2. The attention-mechanism-based bidirectional data-driven vessel trajectory prediction method of claim 1, wherein the acquisition process of the first, second and third machine learning models comprises:
taking the acquired ship time-series track data as the AIS dataset, and dividing the AIS dataset into a training set and a testing set;
training the forward sub-block and backward sub-block models with the training set as input, to obtain the first machine learning model and the second machine learning model; and
combining the intermediate outputs of the optimal forward sub-block and backward sub-block models obtained during training into a composite fusion training set, inputting it into the fusion prediction block, and training the fusion prediction block to obtain the third machine learning model.
3. The attention-mechanism-based bidirectional data-driven vessel trajectory prediction method according to claim 1 or 2, wherein: the first machine learning model is composed of a GRU neural network and an attention mechanism; the second machine learning model is composed of a BiGRU neural network and an attention mechanism; and the third machine learning model is a multi-layer perceptron network.
4. The attention-mechanism-based bidirectional data-driven vessel trajectory prediction method of claim 3, wherein in step S1 the first machine learning model maps the input forward observation sequence to an output sequence, and the acquisition process of the intermediate prediction sequence of length l comprises:
sequentially inputting the elements of the forward observation sequence in the dataset into the first machine learning model, wherein t represents the position in the current forward track sequence;
updating the hidden sequence of the first machine learning model, wherein each element represents the features extracted from the sample point at time t among the g intermediate feature states of the forward track sequence;
calculating the forward attention distribution coefficient α_t of the t-th sample point in the forward track sequence as α_t = exp(e_t)/Σ_{k=1..g} exp(e_k), and the attention weighting score e_t of the t-th sample point in the forward track sequence as e_t = v·tanh(W·s + U·h_t), wherein v, W, U are network parameters of the additive calculation process and h_t is the space-time feature code extracted by the GRU network from the input sample point at the t-th moment of the input track sequence of length g;
wherein x_t represents the track point of the sequence input into the GRU neural network at the current moment, i.e. the input sample point of the first machine learning model, and θ_gru represents the parameter values in each input-to-output mapping process;
θ_gru represents the set of parameters of the GRU network mapping process in the first machine learning model; x_i represents the i-th time-series point in the forward track sequence of length g input into the first machine learning model; L represents the quantization error; N represents the total number of training samples; ŷ_j represents the j-th time-series point in the mapped output sample of length l; and θ represents the parameter values in each input-to-output mapping process;
M_{g,l} represents predicting the output sequence Y_l of length l given the forward input sequence X_g of length g, thereby maximizing the conditional probability: M_{g,l} = argmax_Y p(Y_l|X_g), where p(Y_l|X_g) represents the probability of mapping the given g-point observation sequence X_g to the future l-point predicted track sequence Y_l;
inputting the vector composed of the outputs of all hidden layers into a fully connected layer to obtain the intermediate prediction sequence of length l.
5. The attention-mechanism-based bidirectional data-driven ship track prediction method of claim 3, wherein in step S1 the second machine learning model maps the input backward observation sequence to an output sequence, and the acquisition process of the intermediate prediction sequence of length l comprises:
sequentially inputting the elements of the backward observation sequence in the dataset into the second machine learning model, wherein t represents the position in the current backward track sequence;
updating the hidden sequence of the second machine learning model, wherein each element represents the features extracted from the sample point at time t among the g intermediate feature states of the backward input time series;
calculating the backward attention distribution coefficient α_t of the t-th sample point in the backward track sequence as α_t = exp(e_t)/Σ_{k=1..g} exp(e_k), and the attention weighting score e_t of the t-th sample point in the backward track sequence as e_t = v·tanh(W·s + U·h_t), wherein v, W, U are network parameters of the additive calculation process and h_t is the space-time feature code extracted by the BiGRU network from the input sample point at the t-th moment of the input track sequence of length g;
fusing the hidden states of the forward layer and the reverse layer of the BiGRU network by the formula h_t = W·[h_t^(f); h_t^(b)] + b, wherein W is a weight matrix in the network and b is a bias value;
wherein x_t represents the track point of the backward sequence input at the current time t, i.e. the input sample point of the second machine learning model;
θ^(f) represents the set of parameters of the forward-layer mapping process in the second machine learning model, with x_i representing the i-th time-series point of the backward track sequence of length g input into the forward layer of the second machine learning model; θ^(b) represents the set of parameters of the reverse-layer mapping process in the second machine learning model, with x_i representing the i-th time-series point of the backward track sequence of length g input into the reverse layer; L represents the quantization error; N represents the total number of training samples; ŷ_j represents the j-th time-series point in the mapped output sample of length l; and θ represents the parameter values in each input-to-output mapping process;
inputting the vector composed of the outputs of all hidden layers into a fully connected layer to obtain the intermediate prediction sequence of length l.
6. The attention-mechanism-based bidirectional data-driven vessel trajectory prediction method of claim 3, wherein in step S3 the third machine learning model maps the spliced input intermediate prediction sequence to an output sequence, and the acquisition process of the predicted output comprises:
sequentially inputting the elements of the spliced intermediate prediction sequence into the third machine learning model, wherein t represents the t-th intermediate prediction pair in the current intermediate prediction sequence;
obtaining the hidden sequence of the third machine learning model, wherein ẑ_t represents the intermediate prediction splice value of the input data at the current time t, i.e. the input data of the third machine learning model, h̄ represents the output hidden-state value obtained at the previous moment, and θ represents the parameter values of the mapping process from each pair of input splice values to the output;
outputting according to the j-th hidden layer h_j = σ_j(w_j·h_{j-1} + b_j), wherein the hidden state of the 1st hidden layer is h_1 = σ(Σ_{t=1..l} w_t·ẑ_t + b_t), σ is the activation function of the 1st hidden layer, l is the total number of inputs, b_t is the bias of the 1st hidden layer, w_t is the weight of the connection layer, σ_j is a nonlinear activation function with a learnable parameter θ for the j-th hidden layer, b_j is the bias of the j-th hidden layer, and w_j is the weight of the j-th connecting layer;
inputting the vector composed of the outputs of all hidden layers into the fully connected layer to obtain the predicted output sequence of length l.
7. A bi-directional data-driven marine vessel trajectory prediction system based on an attention mechanism, comprising:
a data acquisition module for acquiring from an AIS dataset an observation sequence of forward length g and an observation sequence of backward length g;
a data processing module for inputting the forward observation sequence into a first machine learning model to obtain an intermediate prediction sequence of length l, and for inputting the backward observation sequence into a second machine learning model to obtain an intermediate prediction sequence of length l;
a data splicing module for splicing the intermediate prediction sequence of the first machine learning model and the intermediate prediction sequence of the second machine learning model to obtain composite training data;
and the result prediction module is used for taking the composite training data as the input of a third machine learning model to obtain a predicted result.
8. The attention-mechanism-based bidirectional data-driven vessel trajectory prediction system of claim 7, wherein the system is configured to implement the attention-mechanism-based bidirectional data-driven vessel trajectory prediction method of any one of claims 1 to 6.
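For reference, the GRU recurrence of formulas (1)-(4) cited in claims 4 and 5 can be sketched in NumPy as follows. The gate order and parameter shapes below are the standard textbook form of the GRU, assumed rather than taken from the patent:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh):
    """One step of the standard GRU recurrence: update gate z,
    reset gate r, candidate state, and the new hidden state."""
    z = sigmoid(x @ Wz + h @ Uz + bz)               # update gate
    r = sigmoid(x @ Wr + h @ Ur + br)               # reset gate
    h_cand = np.tanh(x @ Wh + (r * h) @ Uh + bh)    # candidate state
    return (1.0 - z) * h + z * h_cand               # new hidden state

rng = np.random.default_rng(3)
d, q = 2, 4                                         # input dim, hidden dim
params = [rng.normal(size=s) for s in
          [(d, q), (q, q), q, (d, q), (q, q), q, (d, q), (q, q), q]]
h = gru_step(rng.normal(size=d), np.zeros(q), *params)
```

Iterating gru_step over a track sequence yields the hidden states h_1..h_g that the attention networks of the claims score and combine.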
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310583916.3A CN116629115A (en) | 2023-05-23 | 2023-05-23 | Bidirectional data driving ship track prediction method and system based on attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116629115A true CN116629115A (en) | 2023-08-22 |
Family
ID=87612828
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310583916.3A Pending CN116629115A (en) | 2023-05-23 | 2023-05-23 | Bidirectional data driving ship track prediction method and system based on attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116629115A (en) |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109410575B (en) | Road network state prediction method based on capsule network and nested long-time memory neural network | |
Tang et al. | A model for vessel trajectory prediction based on long short-term memory neural network | |
Han et al. | Convective precipitation nowcasting using U-Net model | |
Wang et al. | Multi-vehicle collaborative learning for trajectory prediction with spatio-temporal tensor fusion | |
CN109214452B (en) | HRRP target identification method based on attention depth bidirectional cyclic neural network | |
EP4180892A1 (en) | Self-aware visual-textual co-grounded navigation agent | |
Choi et al. | Real-time significant wave height estimation from raw ocean images based on 2D and 3D deep neural networks | |
CN111832615A (en) | Sample expansion method and system based on foreground and background feature fusion | |
CN115829171B (en) | Pedestrian track prediction method combining space-time information and social interaction characteristics | |
CN112785077A (en) | Travel demand prediction method and system based on space-time data | |
CN115690153A (en) | Intelligent agent track prediction method and system | |
CN117494871A (en) | Ship track prediction method considering ship interaction influence | |
CN114913434B (en) | High-resolution remote sensing image change detection method based on global relation reasoning | |
CN115204032A (en) | ENSO prediction method and device based on multi-channel intelligent model | |
CN114152257A (en) | Ship prediction navigation method based on attention mechanism and environment perception LSTM | |
CN114511710A (en) | Image target detection method based on convolutional neural network | |
CN117743795A (en) | Multi-ship track prediction method and system based on dynamic space-time refinement network | |
Madhukumar et al. | Consensus forecast of rainfall using hybrid climate learning model | |
CN113887330A (en) | Target detection system based on remote sensing image | |
CN117275222A (en) | Traffic flow prediction method integrating one-dimensional convolution and attribute enhancement units | |
Chaganti et al. | Predicting Landslides and Floods with Deep Learning | |
CN116629115A (en) | Bidirectional data driving ship track prediction method and system based on attention mechanism | |
CN116503700A (en) | Track prediction method and device, storage medium and vehicle | |
CN115880660A (en) | Track line detection method and system based on structural characterization and global attention mechanism | |
CN114140524B (en) | Closed loop detection system and method for multi-scale feature fusion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||