CN115063676A - Ship target classification method based on AIS data - Google Patents


Info

Publication number
CN115063676A
Authority
CN
China
Prior art keywords: model, layer, AIS data, ship, ship target
Prior art date
Legal status: Pending (assumed; not a legal conclusion)
Application number
CN202210594360.3A
Other languages
Chinese (zh)
Inventor
王宇君
郭健
李可欣
李宗明
缪坤
陈辉
徐立
Current Assignee: Information Engineering University of PLA Strategic Support Force
Original Assignee: Information Engineering University of PLA Strategic Support Force
Application filed by Information Engineering University of PLA Strategic Support Force filed Critical Information Engineering University of PLA Strategic Support Force
Priority to CN202210594360.3A
Publication of CN115063676A

Classifications

    • G06V20/10 Scenes; scene-specific elements — terrestrial scenes
    • G06N3/084 Neural networks; learning methods — backpropagation, e.g. using gradient descent
    • G06V10/40 Image or video recognition — extraction of image or video features
    • G06V10/765 Image or video recognition using machine learning — classification using rules for partitioning the feature space
    • G06V10/82 Image or video recognition using machine learning — using neural networks


Abstract

The invention relates to a ship target classification method based on AIS data, belonging to the technical field of trajectory classification. The method constructs a first and a second input feature vector for ship target classification; both are simple, can be obtained directly from AIS data, and jointly capture spatio-temporal features, so human intervention and complex feature engineering are avoided. A ship target classification model based on a combined CNN-BiGRU model then feeds the first and second input feature vectors into the CNN model and the BiGRU model respectively, and the combined model classifies and identifies ship targets using the combined spatio-temporal features.

Description

Ship target classification method based on AIS data
Technical Field
The invention relates to a ship target classification method based on AIS data, and belongs to the technical field of track classification methods.
Background
Trajectory classification has long been an important research topic in traffic engineering and transport geography. Rapid advances in technologies such as the mobile internet and satellite positioning, together with the wide adoption of the 21st Century Maritime Silk Road initiative, have accelerated the growth of global maritime transport, and ship trajectory data are increasing daily; the open AIS ship data transmission system is an important source of maritime ship trajectory data. Classifying ship targets from trajectory data makes it possible to analyze the motion characteristics and patterns of different ships, helps to mine the inner relations among them, and has important application value for identifying abnormal ships, supporting shipping analysis and ship scheduling decisions, and promoting intelligent maritime traffic. Current research mainly extracts multi-dimensional features from ship track segments by hand and builds a single model on them to mine the shallow spatial information in the features for classification. However, sea areas are vast, ship types and structures are increasingly diverse, and ship classification involves both the spatial and the temporal dependence of ship target motion, making it a fairly complex nonlinear modeling problem. How to simplify such a complex ship target classification task and construct an effective classification model is therefore one of the main challenges in this field.
Early research classified trajectories mainly by manually extracting features and applying conventional machine learning methods, such as decision tree models (DT), support vector machines (SVM), and tree-based ensemble models. Several studies have examined ship target classification based on AIS data in depth: classifying fishing-vessel operating modes by extracting speed and course features and applying a BP neural network; classifying cargo ships and fishing ships by extracting 17-dimensional ship motion features and using a logistic regression model; classifying ships with a Random Forest (RF) model after adding geographic features and ship size as auxiliary features on top of multi-dimensional motion features; and classifying multiple ship types by extracting 119-dimensional motion features and applying an XGBoost model. On the one hand, these methods depend on tediously hand-built feature spaces, and the classification results are easily influenced by subjective cognition; on the other hand, traditional machine learning considers only spatial dependence, ignores the influence of temporal features on ship target classification, and, because of its shallow structure, cannot model the complex nonlinear relations in AIS data, so it has clear limitations.
Against the background of geospatial artificial intelligence (GeoAI), technological advances in the AI field have brought new opportunities to geospatial research. Deep learning methods need no complex feature extraction and fit nonlinear problems well, and in recent years they have increasingly been used for trajectory classification. Convolutional neural networks (CNN) have been used to mine the spatial features of AIS data to classify fishing boats, passenger ships, cargo ships and oil tankers; recurrent neural networks (RNN) have been used to extract the temporal features of AIS data to classify high-speed craft, oil tankers, passenger ships, sailing ships and fishing boats. These methods consider spatial and temporal dependence separately, but the models are disjoint, i.e. spatial and temporal features are not learned jointly. One approach constructs a distribution feature vector and a time-series feature vector for a 1-D convolutional neural network and a Long Short-Term Memory network (LSTM) respectively, and then fuses their ship classification results by weighted voting.
Disclosure of Invention
The invention aims to provide a ship target classification method based on AIS data, so as to solve the low classification accuracy of existing AIS-based ship target classification methods.
The invention provides a ship target classification method based on AIS data for solving the technical problems, which comprises the following steps:
1) acquiring AIS data, preprocessing the AIS data, extracting the speed, the course, the bow direction and the acceleration from the AIS data to construct a first input feature vector for learning the relation and the overall features between the parts of each track segment; extracting the speed, the course, the bow direction and the time interval as second input feature vectors;
2) constructing a ship target classification model, wherein the ship target classification model comprises a CNN model, a BiGRU model, a fusion layer and a full connection layer; the CNN model is used for processing the first input feature vector to obtain high-level features representing spatial information; the BiGRU model is used for processing the second input feature vector to obtain high-level features representing time sequence information; the fusion layer is used for fusing and summarizing the obtained high-level features representing the spatial information and the high-level features representing the time sequence information; and the full connection layer calculates the distribution probability of each type of ship target according to the fused and summarized features so as to realize the classification of the ship targets.
The method first constructs a first and a second input feature vector for ship target classification; both are simple, can be obtained directly from the AIS data, and jointly cover spatio-temporal features, so manual intervention and complex feature engineering are avoided. A ship target classification model based on the combined CNN-BiGRU model then feeds the first and second input feature vectors into the CNN model and the BiGRU model respectively, and the combined model classifies and identifies ship targets using the combined spatio-temporal features.
Further, the extraction process of the first input feature vector in step 1) is as follows:
segmenting the acquired AIS data to obtain a track segment corresponding to each ship;
and acquiring the navigational speed, the heading and the bow direction of each track point in the track segment according to the set track segment length, and calculating the acceleration of the corresponding track point according to the speed and the time interval of the adjacent track points.
In order to facilitate later CNN model training, the lengths of all track segments are limited to be fixed, and meanwhile, the acquired first input feature vector can learn the relation and the overall features between the local parts of the track segments.
The extraction process of the second input feature vector in the step 1) is as follows:
segmenting the acquired AIS data to obtain a track segment corresponding to each ship;
and acquiring the navigational speed, the course and the ship heading direction of each track point in the track segment according to the set track segment length, and acquiring the time interval of adjacent track points.
In order to facilitate later BiGRU model training, the lengths of all track segments are limited to a fixed size; meanwhile, the obtained second input feature vector helps the subsequent model mine the temporal patterns and characteristics of different ships' motion information, and strengthens the model's learning of the correlation between time and changes in motion information.
Further, when the AIS data is preprocessed, the AIS data is cleaned, which includes deleting track point records with missing key fields, repeated timestamps, or values exceeding a set range.
According to the invention, the AIS data is cleaned, so that the problem of low classification precision caused by noise of the AIS data is avoided, and an accurate and reliable data source is provided for the classification of subsequent ship targets.
Further, the BiGRU model includes two gating cycle units in opposite directions.
According to the invention, two gated recurrent units in opposite directions are stacked, so that long-term dependence characteristics of the trajectory data on both the past and the future can be captured, alleviating the vanishing- and exploding-gradient problems that RNNs face when processing long-term dependencies in sequence data.
Further, the gated recurrent unit comprises an input layer, a hidden layer and an output layer, wherein the hidden layer comprises an update gate and a reset gate, which jointly determine whether historical information is retained and passed on.
Further, the CNN model comprises an input layer, convolution layers and pooling layers, wherein the input layer is used for acquiring the first input feature vector; there are several convolution layers, the output of each convolution layer is activated by a ReLU function before being used as the input of the next layer, and a pooling layer is arranged after each convolution layer.
In the CNN model of the invention, ReLU activation after each convolution layer enables rapid convergence and improves the network's nonlinear feature learning, while a pooling layer after each convolution layer reduces the amount of computation and prevents overfitting.
Further, the CNN model further includes a Dropout layer, and the Dropout layer is disposed after the last pooling layer.
The invention can relieve the over-fitting problem of the model by adding the Dropout layer after the last pooling layer, and meanwhile, in order to take account of the precision and the complexity, the Dropout layer is only added after the last pooling layer.
Further, when training the ship target classification model, the CNN model is trained first; once CNN training is finished, the CNN model and the BiGRU model are trained jointly.
The invention trains the CNN model alone before training the whole model, improving training efficiency and shortening training time.
Drawings
FIG. 1 is a flow chart of the AIS data based ship target classification method of the present invention;
FIG. 2 is a schematic diagram of a first input feature vector structure constructed according to the present invention;
FIG. 3 is a diagram of a BiGRU model structure in the ship target classification model according to the present invention;
FIG. 4 is a schematic structural diagram of a ship target classification model constructed by the invention;
FIG. 5 is a schematic view of a visualization of selected experimental data in a simulation experiment according to the present invention;
FIG. 6 is an overall structure diagram of a classification model of a ship target selected in a simulation experiment according to the present invention;
FIG. 7-a is a schematic diagram of the variation of the loss value of the training and testing of models of different iteration times in the simulation experiment of the present invention;
FIG. 7-b is a schematic diagram of the accuracy of training and testing of models with different iteration numbers in the simulation experiment of the present invention.
Detailed Description
The following further describes embodiments of the present invention with reference to the drawings.
As shown in fig. 1, the AIS data obtained first is preprocessed, and the speed, heading, bow direction and acceleration are extracted from the preprocessed AIS data to construct a first input feature vector for learning the connection and integral features between the local parts of each track segment; extracting the speed, the course, the bow direction and the time interval as second input feature vectors; and then constructing a ship target classification model, wherein the ship target classification model comprises an optimal Convolutional Neural Network (CNN) model, a BiGRU model, a fusion layer and a full connection layer, and is shown in figure 4. Standardizing the first input feature vector, inputting the first input feature vector into an optimal Convolutional Neural Network (CNN) model, and standardizing the second feature vector, and inputting the second feature vector into a BiGRU model; respectively mining spatial and temporal features contained in the track sequence data by using the two models; fusing the high-level features learned by the two models through a fusion (Merge) layer; and finally, the gathered features are delivered to a full connection layer with an activation function of softmax to calculate the distribution probability of each type of ship target so as to realize the classification of the ship targets.
1. Acquire AIS data and preprocess it.
AIS data is a sequence of samples carrying time, position and other information. Each record of the original AIS data comprises 16 fields, mainly covering ship dynamic information, ship static information and ship voyage information. As shown in table 1, the following fields of each record are used in this experiment for ship target classification.
TABLE 1
(table content reproduced as images in the original publication)
Because of improper operation of the equipment by crew, equipment failures, limited signal transmission range and similar causes, AIS data is noisy, so it must be cleaned: track point records with missing key fields, repeated timestamps, or values outside the normal range are deleted, providing an accurate and reliable data source for ship target classification. On this basis, the AIS data is segmented to obtain voyage track segments containing ship motion information: first, the track points belonging to each ship are grouped by MMSI; then, to bring the classification problem closer to the real situation, besides track points in the under-way state, track points in the working states "engaged in fishing" and "restricted manoeuvrability" (which generally means the manoeuvring capability of the operating ship is limited) are also considered, and track segments composed of such consecutive points are extracted; finally, track segments containing at least 30 data points are retained so that each segment carries enough information.
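The cleaning and segmentation steps above can be sketched as follows; this is a minimal Python illustration, and the record field names (`mmsi`, `ts`, `lat`, `lon`, `sog`, `cog`, `heading`) are assumptions, not the patent's actual field names:

```python
from itertools import groupby

# Hypothetical key fields; the real AIS schema has 16 fields per record.
KEY_FIELDS = ("mmsi", "ts", "lat", "lon", "sog", "cog", "heading")

def clean(records):
    """Drop records with missing key fields, duplicate timestamps
    (per ship), or positions outside the valid range."""
    seen = set()
    out = []
    for r in records:
        if any(r.get(k) is None for k in KEY_FIELDS):
            continue  # missing key field
        if not (-90 <= r["lat"] <= 90 and -180 <= r["lon"] <= 180):
            continue  # position outside the normal range
        key = (r["mmsi"], r["ts"])
        if key in seen:
            continue  # time repetition for the same ship
        seen.add(key)
        out.append(r)
    return out

def segment_by_mmsi(records, min_len=30):
    """Group cleaned points into per-ship track segments and keep
    only segments with at least `min_len` points."""
    records = sorted(records, key=lambda r: (r["mmsi"], r["ts"]))
    segments = []
    for _, pts in groupby(records, key=lambda r: r["mmsi"]):
        seg = list(pts)
        if len(seg) >= min_len:
            segments.append(seg)
    return segments
```

A real pipeline would additionally split a ship's points into separate voyages by navigation status, as described above; this sketch only shows the cleaning and minimum-length filtering.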
2. Construct the input feature vectors.
The preprocessed AIS data contains 3 items of basic spatio-temporal information (time, longitude, latitude) and 3 items of motion information (speed, course, heading). The basic spatio-temporal information cannot directly express the ship target type, and a classification model cannot establish an effective mapping from it to the ship target. Therefore, before the model classifies ship targets, input feature vectors must be constructed to convert the data into a form the classification model can easily use. The features extracted from a track segment can be divided into track point features, track segment features, and the attributes carried by the track data itself. To avoid extracting excessive statistics, increasing the amount of manual computation, and relying too much on personal experience, and to fully exploit the deep learning model's ability to learn basic and deep features of ship track segments on its own, the invention combines track point features derived from the basic spatio-temporal information of the AIS data with its 3 items of motion information to construct the CNN input feature vector (i.e. the first input feature vector) and the BiGRU input feature vector (i.e. the second input feature vector) respectively.
The method selects 4 effective attributes — speed, course, heading and acceleration — to construct the CNN input feature vector, which is used to learn the relations between the local parts of each track segment and its overall characteristics. Speed, course and heading are motion information already present in the AIS data, so only the acceleration of each track point needs to be calculated from the basic spatio-temporal information, as follows:
a_n = (V_n − V_{n−1}) / t_n,  V_n = d_n / t_n    (1)

where a_n is the acceleration of the n-th track point in m/s²; V_n is the speed of the track point in m/s; t_n is the time interval between adjacent track points in s; and d_n is the distance between adjacent track points, obtained from the Haversine spherical distance formula, with r the earth radius, taken as 6371.393 km:

d_n = 2r · arcsin( √( sin²(Δφ′/2) + cos φ′_{n−1} · cos φ′_n · sin²(Δλ′/2) ) )

Δφ′ = |φ′_n − φ′_{n−1}|,  Δλ′ = |λ′_n − λ′_{n−1}|

where (φ′_n, λ′_n) and (φ′_{n−1}, λ′_{n−1}) are the positions of track points n and n−1, in rad.
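A small Python sketch of the acceleration computation and the Haversine distance it relies on, assuming time in seconds and positions in degrees (converted to radians internally); the function names are illustrative:

```python
import math

R_EARTH_KM = 6371.393  # earth radius used in the patent

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two track points (degrees in, km out)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * R_EARTH_KM * math.asin(math.sqrt(a))

def accelerations(track):
    """Per-point acceleration (m/s^2): speed from consecutive distances,
    then a finite difference over the time interval. `track` is a list of
    (t_seconds, lat, lon) tuples; the first two points get no value."""
    speeds = []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(track, track[1:]):
        d_m = haversine_km(la0, lo0, la1, lo1) * 1000.0
        speeds.append(d_m / (t1 - t0))
    return [(v1 - v0) / (track[i + 2][0] - track[i + 1][0])
            for i, (v0, v1) in enumerate(zip(speeds, speeds[1:]))]
```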
The input feature vector of the CNN model has 3 dimensions: height, width and depth (channels). The input layer holds a set of independent samples D; each sample D_i represents one ship track segment and comprises 4 channels: speed, acceleration, course and heading. As shown in fig. 2, each channel has shape 1 × L, where L is the length of the track segment, i.e. the number of AIS track points in it, so the input vector has shape 1 × L × 4. To facilitate later CNN model training, the input vectors must have a uniform size, so the lengths of all track segments are limited to a fixed size L: longer segments are split and shorter segments are padded with zero values.
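The fixed-length 1 × L × 4 input construction (truncate long segments, zero-pad short ones) might look like this NumPy sketch; the helper name is hypothetical:

```python
import numpy as np

def to_cnn_input(segment, L):
    """Build the 1 x L x 4 CNN input from a track segment given as an
    (n, 4) array of [speed, acceleration, course, heading] rows:
    longer segments are truncated, shorter ones zero-padded."""
    seg = np.asarray(segment, dtype=float)[:L]     # truncate to L points
    pad = np.zeros((L - len(seg), 4))              # zero-value padding
    return np.concatenate([seg, pad])[None, :, :]  # shape (1, L, 4)
```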
The BiGRU model is formed by stacking 2 gated recurrent units (GRUs) in opposite directions, in order to capture long-term dependence of the trajectory data on both the past and the future. The invention therefore uses 4 attributes — speed, course, heading and time interval — to construct the BiGRU time-series feature vector. Speed, course and heading change over time, which helps the model mine the temporal patterns and characteristics of different ships' motion information; on this basis, the time interval is added to strengthen the model's learning of the correlation between time and changes in motion information.
The BiGRU input feature vector has 2 dimensions: features and time steps. The number of time steps is the temporal length of the input and generally depends on the length of the ship track segment; to simplify computation, all segment lengths are limited to the fixed size L. T is the set of independent samples of the BiGRU input layer; the input feature vector T_i formed from a single ship track segment sample is the matrix of the speed, course, heading and time-interval series over the L time steps, as in equation (2).
3. Construct the ship classification model.
The ship classification model constructed by the invention adopts the CNN-BiGRU model shown in fig. 4. Its body consists of two parts: (1) a CNN extracts the spatial features in the AIS data to characterize the spatial dependence of the track segments; (2) a BiGRU captures the timing features in the AIS data to characterize their temporal dependence. The CNN-BiGRU model combines an optimal convolutional neural network with a bidirectional gated recurrent unit; the structures of the CNN, the BiGRU and the combined CNN-BiGRU model are introduced in turn below.
1) CNN model
The CNN model adopted by the invention comprises an input layer, convolution layers, pooling layers, fully connected layers and an output layer. The parameters of each layer are configured as follows:
an input layer: put the sample set D into the input layer, use CNN network to every D i And (5) extracting the spatial features.
Convolution layers: the invention adopts convolution kernels of shape 1 × 3 × C, where C is the number of channels of the layer's input vector. The 3-dimensional output shape of each convolution layer is controlled by 3 hyperparameters: the number of convolution kernels (filters), which determines the depth of the output shape; the stride S by which the kernel steps through the input vector; and zero padding, which controls the output size. To keep the input and output shapes of each layer the same, S = 1 and zero padding are used, and the number of kernels is adjusted according to the actual situation. To achieve rapid convergence and improve the network's nonlinear feature learning, the output of each convolution layer is activated by a ReLU function before being used as the input of the next layer.
A pooling layer: to reduce the amount of computation and prevent overfitting, the present invention periodically inserts a Max Pooling layer (Max Pooling) with a convolution kernel shape of 1 × 2 and S ═ 1 between each convolution layer. The output after passing through the maximum pooling layer is shown in equation (3).
Figure BDA0003667165420000082
In the formula (I), the compound is shown in the specification,
Figure BDA0003667165420000083
using c for the l-th layer l The convolution kernel is used for convolution and pooling to obtain output, l is the depth of the CNN model, and is in the range of {1,2,3, … }, c l ∈{1,2,3,…,C l },C l Is the number of convolution kernels; pool (. cndot.) is a pooling operation; x is the number of l,a Inputting a vector for the ith feature of the ith layer;
Figure BDA0003667165420000092
performing convolution operation; an activation function of the relu convolution kernel;
Figure BDA0003667165420000093
and
Figure BDA0003667165420000094
respectively the first layer and the second layer l Weight values and bias vectors for each convolution kernel.
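A NumPy illustration of one conv → ReLU → pool stage in the spirit of equation (3); kernel size 3, stride 1 and "same" zero padding follow the text, while the non-overlapping pool of size 2 is a simplifying assumption:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv1d_same(x, w, b):
    """'Same' 1-D convolution: x is (L, C_in), w is (k, C_in, C_out),
    b is (C_out,); stride 1 and zero padding keep the length L."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([np.tensordot(xp[i:i + k], w, axes=([0, 1], [0, 1])) + b
                     for i in range(x.shape[0])])

def maxpool1d(x, size=2):
    """Non-overlapping max pooling along the time axis."""
    L = (x.shape[0] // size) * size
    return x[:L].reshape(-1, size, x.shape[1]).max(axis=1)

def conv_block(x, w, b):
    """One conv -> ReLU -> max-pool stage, as in equation (3)."""
    return maxpool1d(relu(conv1d_same(x, w, b)))
```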
Fully connected layers (FC): each neuron of an FC layer is connected to all neurons of the previous layer, and the flattened input is mapped by element-wise multiply-accumulate operations. As shown in equation (4), all FC layers except the last extract features; the last FC layer performs the classification task on the final high-level features, and the softmax activation function ensures the number of output neurons equals the number of class labels, producing a probability distribution over the ship types:

y_CNN = softmax( ω_CNN · O_l + b_CNN )    (4)

where O_l is the high-level feature obtained by flattening the output of layer l − 1; y_CNN is the probability distribution vector over all ship types output by the CNN; softmax is the activation function of the final FC layer; ω_CNN and b_CNN are the weight values and bias vector of that layer.
To alleviate model overfitting, a Dropout layer can be added after the pooling layers and the feature-extracting fully connected layers; Dropout layers can be added or removed according to the actual situation and are generally placed after a pooling layer.
2) BiGRU model
A GRU can only predict the output at the next moment from the timing information of past moments, but the output at the current moment may be related not only to the past state but also to the future state; a bidirectional structure provides the output layer with complete "context" about every point in the input sequence. The invention therefore extracts the timing features in the AIS data with a bidirectional gated recurrent unit. The structure of the BiGRU designed by the invention is shown in fig. 3, with each layer configured as follows:
an input layer: putting the sample set T into an input layer, and utilizing a BiGRU network to perform on each T i And (5) extracting time sequence characteristics.
A BiGRU layer: for each time t, the input is simultaneously provided to two GRUs in opposite directions, and the output is jointly determined by the two unidirectional GRUs; as shown in the right diagram of fig. 3, the GRU is composed of an input layer, a hidden layer and an output layer, wherein the hidden layer includes an Update Gate (Update Gate) and a Reset Gate (Reset Gate), and 2 gates together determine whether history information can be retained and transferred. Time sequence characteristic input matrix T i The memory information H of the corresponding hidden layer is:
H=(H 1 ,H 2 ,…,H t ,…,H L ) (5)
in the formula, H 1 ~H L The memory information obtained by the GRU neural network in the 1 st to L time intervals respectively. At time t, from the current input X t And hidden output of forward state at time t-1
Figure BDA0003667165420000106
The outputs of the reset gate and the update gate in the GRU neural network can be calculated to be R respectively t And Z t The formula is as follows:
R t =σ(X t W xr +H t-1 W hr +b r ) (6)
Z t =σ(X t W xz +H t-1 W hz +b z ) (7)
wherein σ is an activation function; w xr And W xz Selecting weights for the reset gate and the update gate; b r And b z The offset vectors are selected for the reset gate and the update gate, respectively. Based on R t And Z t Candidate hidden states may be computed
Figure BDA0003667165420000107
And current hidden state forward output
Figure BDA0003667165420000108
The formula is as follows:
Figure BDA0003667165420000101
Figure BDA0003667165420000102
wherein, tanh is an activation function; w is a group of xh And b h For selectively memorizing the current input X t The selected weights and bias vectors; multiplication of corresponding elements in the operation matrix;
Figure BDA0003667165420000103
showing that the selective memory of the important information of the current node is performed, (1-Z) t )⊙H t-1 Indicating that the otherwise hidden state of unimportant information is selectively forgotten.
Similarly, from the current input X_t and the backward hidden output H_{t+1}^← of time t+1, the backward output H_t^← of the current hidden state is obtained.
And finally, splicing the forward output and the backward output of each time step into the final output of the BiGRU layer.
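The gate equations (6)-(9) above can be sketched as a single NumPy time step. This is an illustrative re-implementation for clarity, not the patented Keras model; the weight names mirror the symbols in the formulas, and the recurrent weight W_hh in the candidate state follows the standard GRU formulation:

```python
import numpy as np

def gru_step(x_t, h_prev, W_xr, W_hr, b_r, W_xz, W_hz, b_z, W_xh, W_hh, b_h):
    """One GRU time step following Eqs. (6)-(9): reset gate, update gate,
    candidate hidden state, and the new hidden state."""
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    r_t = sigmoid(x_t @ W_xr + h_prev @ W_hr + b_r)             # reset gate, Eq. (6)
    z_t = sigmoid(x_t @ W_xz + h_prev @ W_hz + b_z)             # update gate, Eq. (7)
    h_cand = np.tanh(x_t @ W_xh + (r_t * h_prev) @ W_hh + b_h)  # candidate state, Eq. (8)
    h_t = z_t * h_cand + (1.0 - z_t) * h_prev                   # keep important new info, forget the rest, Eq. (9)
    return h_t
```

A BiGRU runs one such recurrence forward and one backward over the sequence and concatenates the two outputs at each step, as described above.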
Dropout layer: to reduce the overfitting problem, the invention adds a Dropout layer for regularization, which randomly drops input and output connections of the neural network with probability p = 0.5.
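The Dropout regularization described here can be sketched as follows; this is a generic "inverted dropout" illustration, and the rescaling by 1/(1-p) is an implementation convention of common frameworks rather than something stated in the text:

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each activation with probability p during
    training and rescale the survivors by 1/(1-p); identity at inference."""
    if not training or p == 0.0:
        return x
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= p   # keep each unit with probability 1-p
    return x * mask / (1.0 - p)
```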
Fully connected layer: an FC layer is added after the BiGRU layer to further extract features, and a final FC layer performs classification, outputting the probability distribution of the current temporal feature vector over each ship class.
In order to avoid the problem that a single model cannot learn the spatial and temporal characteristics of AIS data simultaneously, the invention proposes, on the basis of the above models, a CNN-BiGRU combined model structure; as shown in FIG. 4, the optimal convolutional neural network and the bidirectional gated recurrent unit serve as the two branches of the combined model. When combining them, the last fully connected layer of the CNN model and the last fully connected layer of the BiGRU model are removed; on this basis, a fusion (Merge) layer and a fully connected layer are added.
The specific processing steps are as follows: first, the feature vectors D_i and T_i are standardized and used as the input data of the CNN and the BiGRU, respectively; then, the 2 models mine the spatial and temporal features contained in the trajectory sequence data; next, the high-level features learned by the 2 models are fused through the Merge layer; finally, the fused features are passed to a fully connected layer with softmax activation to compute the distribution probability of each ship target class.
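The fusion and classification steps can be illustrated with a minimal NumPy sketch. Concatenation is assumed as the Merge operation (the text does not specify the fusion operator), and the weight matrix W and bias b stand in for the final softmax fully connected layer:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fuse_and_classify(cnn_feat, bigru_feat, W, b):
    """Merge layer + softmax FC: concatenate the spatial features from the
    CNN branch with the temporal features from the BiGRU branch, then map
    the fused vector to a probability distribution over the ship classes."""
    fused = np.concatenate([cnn_feat, bigru_feat], axis=-1)  # Merge layer
    return softmax(fused @ W + b)                            # softmax FC layer
```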
When the CNN-BiGRU combined model is trained, the CNN model may be trained first, and then the overall training may be performed.
Experimental verification
To further illustrate the classification effect of the AIS-data-based ship target classification method, the method is verified experimentally on real AIS data.
1) Acquiring experimental data and preprocessing
The experiment takes the local ocean between 126°-138° west longitude and 10°-85° north latitude, off the west coast of North America, as the study area. The experimental data come from the full-year 2015 AIS data of the National Oceanic and Atmospheric Administration (NOAA); part of the data is visualized in FIG. 5. The region mainly contains cargo ships, fishing vessels, passenger ships and tugboats, so the ships are divided into these 4 types.
The raw AIS data comprise 31805155 trajectory point records, from which 192226 ship trajectories are obtained after preprocessing; cargo ships, fishing vessels, passenger ships and tugboats account for 33.08%, 19.80%, 20.15% and 26.97% of the total, respectively. Then, with a fixed length L = 200, a CNN input feature vector of shape 1 × 200 × 4 consisting of the 4 channels speed, course, heading and acceleration, and a BiGRU input feature vector of shape 200 × 4 consisting of the 4 channels speed, course, heading and time interval, are constructed. To accelerate the convergence of deep learning model training, mean-variance normalization is applied to the 2 input feature vectors. Finally, the sample set consisting of the 2 input feature vectors is divided into a training set and a test set at a ratio of 7:3, used respectively for building and training the classifier and for verifying and evaluating the classification effect.
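The mean-variance normalization and the 7:3 split described above can be sketched as follows; applying the z-score per channel and shuffling with a fixed seed are assumptions about details the text leaves open:

```python
import numpy as np

def standardize(x):
    """Mean-variance (z-score) normalization, computed per channel over a
    batch of samples shaped (n_samples, length, channels)."""
    mu = x.mean(axis=(0, 1), keepdims=True)
    sigma = x.std(axis=(0, 1), keepdims=True)
    return (x - mu) / (sigma + 1e-8)

def train_test_split(samples, labels, ratio=0.7, seed=42):
    """Shuffle and split the sample set into training and test sets (7:3)."""
    idx = np.random.default_rng(seed).permutation(len(samples))
    cut = int(len(samples) * ratio)
    return (samples[idx[:cut]], labels[idx[:cut]],
            samples[idx[cut:]], labels[idx[cut:]])
```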
The experimental environment is Ubuntu 20.04 with an Intel i9-10900K CPU and an RTX 3090 GPU, based on the Keras deep learning library with TensorFlow as the back end. To obtain the optimal CNN and then build the CNN-BiGRU model, the parameters need to be optimized through model training so as to minimize the loss function value. This experiment uses categorical cross-entropy as the loss function to compute the error of the output layer. Adam is well suited to large data sets and parameter optimization and is widely applied in deep learning methods; therefore, during back propagation the model parameters are updated with the default Adam optimizer: learning rate 0.001, β1 = 0.9, β2 = 0.999, ε = 10^-8. The batch size is set to 64, and to avoid overfitting, early stopping is used during training to determine the optimal number of iterations (epochs).
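The early-stopping rule can be sketched as a small helper; patience=10 and min_delta=0.001 mirror the "accuracy gain below 0.1% for 10 consecutive epochs" criterion reported for the combined model, and the class itself is an illustration rather than the Keras callback actually used:

```python
class EarlyStopping:
    """Stop training once the monitored accuracy has improved by less than
    min_delta for `patience` consecutive epochs."""

    def __init__(self, patience=10, min_delta=0.001):
        self.patience, self.min_delta = patience, min_delta
        self.best, self.wait = -float("inf"), 0

    def step(self, acc):
        """Record one epoch's accuracy; return True when training should stop."""
        if acc > self.best + self.min_delta:
            self.best, self.wait = acc, 0   # meaningful improvement: reset counter
        else:
            self.wait += 1                  # stagnant epoch
        return self.wait >= self.patience
```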
2) Selecting evaluation criteria
The classification effect of the deep learning model is comprehensively evaluated with the accuracy, precision, recall, f-score and confusion matrix. The accuracy (A) is the ratio of the number of correctly classified samples to the total number of test set samples; this index most intuitively evaluates the classification performance of the model. In the multi-classification task, matching the test set labels against the predictions yields the following 4 kinds of outcomes: true positives, true negatives, false positives and false negatives. The precision (P), viewed from the classification results, is the ratio of actual positive samples among those predicted positive by the model; the recall (R), viewed from the samples, is the ratio of correctly identified samples among all positive samples; the f-score (F) is the harmonic mean of precision and recall, combining the two into one evaluation index with equal weight. The confusion matrix visualizes the above classification effect as an n × n matrix: each column represents a predicted class, with the column total giving the number of samples predicted for that class, and each row represents an actual class, with the row total giving the actual number of samples of that class. The indices are calculated as
A = (T_positive + T_negative) / (T_positive + T_negative + F_positive + F_negative)
P = T_positive / (T_positive + F_positive)
R = T_positive / (T_positive + F_negative)
F = 2PR / (P + R)

where T_positive is the number of true positive samples; F_positive the number of false positive samples; T_negative the number of true negative samples; and F_negative the number of false negative samples.
3) CNN training results
To find an optimal CNN model suitable for ship target classification while avoiding complex and tedious hyper-parameter optimization, the experiment gradually increases the number of network layers and convolution kernels according to the classification effect of the model. In addition, considering that neural networks with large numbers of weights and complex input-output relationships are prone to overfitting, and that Dropout is the most practical and widely applied regularization method against overfitting in CNNs, Dropout layers are added during training in an attempt to construct the optimal CNN model.
The various structural configurations of the CNN and the corresponding training results are shown in Table 2. From model A to model C, the number of convolutional layers increases from 2 to 6 while the number of convolution kernels grows to capture deeper features, improving the accuracy by 0.77%. Model D adds a fully connected layer, which causes overfitting, so the accuracy on the test set drops instead of rising. To evaluate the effect of max pooling, model E adds a max pooling layer after each group of convolutional layers; its test accuracy improves over model D, but overfitting persists. To relieve the overfitting, model F adds a Dropout layer after each max pooling layer and after the fully connected layer used for feature extraction; its test accuracy falls below that of model E and underfitting appears, because too many Dropout layers oversimplify the model and the loss of a large number of features increases the classification error. The Dropout layers must therefore be placed so as to balance overfitting and underfitting: model G removes the Dropout layers after the first and second groups of convolutional layers and achieves the highest test accuracy of 78.74%. To further verify that model G is the optimal CNN, models H and I add, respectively, a group of convolutional layers and a fully connected layer on top of model G; although deeper, they do not improve the test accuracy but increase model complexity and computational cost, lengthening training. Considering classification accuracy and efficiency together, the optimal CNN structure is model G.
TABLE 2
[Table 2, showing the structural configurations of models A-I and the corresponding training results, is rendered as an image in the original publication.]
4) CNN-BiGRU training results
Combining the optimal CNN model with the BiGRU model yields the CNN-BiGRU model structure shown in FIG. 6. To obtain the optimal number of iterations, epoch is set to 80 and the model performance is computed on the training set and the test set at every iteration; training stops, and the optimal number of iterations is obtained, once the accuracy has increased by less than 0.1% for 10 consecutive epochs. As shown in FIGS. 7-a and 7-b, after about 19 training rounds the test accuracy stabilizes around 79%-80% and the test loss stays around 0.51-0.52; the highest test accuracy of 80.6% and the minimum loss value of 0.507 are reached at epoch 34. In the 10 epochs after the model reaches its peak accuracy, the test accuracy remains almost unchanged and shows no notable decline as the iterations increase, indicating that the CNN-BiGRU model does not overfit and fits the ship targets well by mining the spatio-temporal features contained in the input vectors.
To evaluate the performance of the CNN-BiGRU model, Table 3 lists its confusion matrix, classification precision, recall and f-score. Overall, the predictions for all ship types are distributed along the diagonal of the confusion matrix and every classification index exceeds 62%, so the combined model can identify ship targets essentially accurately. Locally, cargo ships have the largest sample volume, fixed routes and mostly constant-speed sailing, and their classification precision and recall reach 94.4% and 94.1%. Passenger ships behave similarly to cargo ships, and the model reaches a classification precision of 86.6% for them; however, their sample volume is the smallest, so the combined model cannot fully learn their motion patterns and the recall is only 73.7%. Compared with the previous 2 types, tugboats and fishing vessels are highly maneuverable, and the invention considers that trajectory points recorded while these vessels are working make classification harder; tugboats and fishing vessels are therefore confused most often, with fishing vessels lowest at 66.6% precision and 62.5% recall, and tugboats, which have more samples, at 71.1% precision and 76.3% recall. These results show that the classification performance of the CNN-BiGRU combined model correlates strongly with the number of sample instances; adding more instances could be considered for identifying passenger ships and for distinguishing tugboats from fishing vessels.
TABLE 3
[Table 3, showing the confusion matrix and the per-class precision, recall and f-score of the CNN-BiGRU model, is rendered as an image in the original publication.]
5) Comparative experiment
To further evaluate the feasibility and effectiveness of the CNN-BiGRU model adopted in the invention, a set of comparative experiments was constructed using the same training and test trajectory segments as the CNN-BiGRU model. On one hand, 4 machine learning methods commonly used for multi-classification tasks were selected for comparison: K-nearest neighbors (KNN), SVM, DT and RF; training and evaluation of all these models is based on the scikit-learn machine learning library. Because the machine learning methods require manually extracted trajectory-segment features as input, and in order to fully cover the motion characteristics of ships, the invention extracts 5 statistics of each trajectory segment as the feature space, drawn from the 4 feature types speed, acceleration, course and distance: maximum speed, average speed, maximum acceleration, average course change and total sailed distance. The optimal parameters of each classifier were found on the training set with grid search and 5-fold cross-validation, ensuring that the trained model is optimal and fits the data well. The most important parameters of the 4 machine learning models are the number of neighbors n_neighbors for KNN, the penalty coefficient C for the SVM, the maximum depth max_depth for DT and the number of decision trees n_estimators for RF. Finally, the tuned machine learning models were evaluated on the test set. On the other hand, deep learning models were selected for comparison: the single models include LSTM, the CNN that learns the spatial trajectory features in the combined model, and the BiGRU that learns the temporal trajectory features; as a combined deep learning model, a CNN-LSTM model was constructed.
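The 5 hand-crafted statistics fed to the machine learning baselines can be sketched as follows; treating the per-step speed difference as the acceleration and summing per-step distances are assumptions about details the text does not specify:

```python
import numpy as np

def segment_features(speed, course, distance_step):
    """The 5 statistics used as the machine learning feature space for one
    trajectory segment: max speed, mean speed, max acceleration, mean course
    change and total sailed distance. Inputs are per-point arrays."""
    accel = np.diff(speed)                   # per-step speed change as a proxy for acceleration
    course_change = np.abs(np.diff(course))  # absolute heading change between points
    return np.array([speed.max(),
                     speed.mean(),
                     np.abs(accel).max(),
                     course_change.mean(),
                     distance_step.sum()])
```

A feature matrix built row-by-row from such vectors would then be passed to scikit-learn classifiers such as KNN, SVM, DT and RF.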
The deep learning models are fed the same feature vectors as the combined model; the optimal number of iterations is determined on the training set with early stopping to obtain the optimal model, which is then evaluated on the test set.
Table 4 contains the optimal parameters of each model within the specified search range and the classification performance on 4 indices in total: the accuracy A and the weighted average precision P̄, weighted average recall R̄ and weighted average f-score F̄. Overall, the CNN-BiGRU adopted by the invention outperforms the other 6 models on all 4 evaluation indices, with an accuracy nearly 15% higher than the others. Locally, the classification effect of the deep learning methods is superior to that of the machine learning methods: the LSTM model, the worst performer among the deep learning methods, is still 13.3% more accurate than the RF model, the best performer among the machine learning methods. The classification effect of the combined model is superior to that of a single deep learning model: compared with CNN, which learns only the spatial features of AIS data, and BiGRU and LSTM, which learn only the temporal features, the combined model learns spatial and temporal features simultaneously to aid ship target classification, with an accuracy 1.9% above the average accuracy of the 3 single deep learning models, effectively improving classification precision. Finally, the classification effect of CNN-BiGRU is better than that of CNN-LSTM, because the BiGRU in the combined model, unlike the LSTM, mines the AIS trajectory sequence bidirectionally, improving the classification effect of the combined model.
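The weighted averages P̄, R̄ and F̄ reported in Table 4 can be computed as follows; weighting each class by its number of test samples (its support) is the standard definition assumed here:

```python
import numpy as np

def weighted_average(per_class_metric, support):
    """Support-weighted average of a per-class metric (precision, recall or
    f-score): each class is weighted by its share of test samples."""
    support = np.asarray(support, dtype=float)
    metric = np.asarray(per_class_metric, dtype=float)
    return float(np.sum(metric * support) / support.sum())
```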
The above analysis shows that the CNN-BiGRU model adopted by the invention, with its ability to extract high-level features through multiple layers of non-linear processing units, is superior to traditional machine learning algorithms that depend on manual feature engineering, and that combining the CNN and BiGRU models effectively mines spatial and temporal features from AIS data simultaneously, further improving the classification precision of ship targets.
The method first cleans the AIS data and divides the ship trajectories into fixed-length samples; then, according to the characteristics of each model, it constructs a 4-channel input feature vector for the CNN and another for the BiGRU; next, it trains an optimal CNN model and combines the optimal CNN with the BiGRU to obtain the CNN-BiGRU model; finally, it trains the CNN-BiGRU model on the 2 input feature vectors and classifies the ship targets.
Experimental results show that the CNN-BiGRU combined model constructed by the method effectively achieves accurate classification and identification of different ship targets, with a particularly good classification effect on cargo ships. Compared with traditional machine learning methods including KNN, SVM, DT and RF, on one hand only simple input feature vectors need to be built from the AIS data, avoiding manual intervention and complex feature engineering; on the other hand, the deep learning method can self-learn and extract the high-level features of the ship motion patterns contained in the AIS data, so the classification effect of the CNN-BiGRU model on ship targets is superior to that of the machine learning methods. Compared with the deep learning methods including the optimal CNN, BiGRU, LSTM and CNN-LSTM, the CNN-BiGRU combined model classifies and identifies ship targets using spatio-temporal features jointly, effectively improving the precision of ship target classification.

Claims (9)

1. A ship target classification method based on AIS data is characterized by comprising the following steps:
1) acquiring AIS data, preprocessing the AIS data, extracting the speed, the course, the bow direction and the acceleration from the AIS data to construct a first input feature vector for learning the connection and integral features among the local parts of each track segment; extracting the speed, the course, the bow direction and the time interval as second input feature vectors;
2) constructing a ship target classification model, wherein the ship target classification model comprises a CNN model, a BiGRU model, a fusion layer and a full connection layer; the CNN model is used for processing the first input feature vector to obtain high-level features representing spatial information; the BiGRU model is used for processing the second input feature vector to obtain high-level features representing time sequence information; the fusion layer is used for fusing and summarizing the obtained high-level features representing the spatial information and the high-level features representing the time sequence information; and the full connection layer calculates the distribution probability of each type of ship target according to the fused and summarized features so as to realize the classification of the ship targets.
2. The AIS data-based ship target classification method according to claim 1, wherein the extraction process of the first input feature vector in the step 1) is as follows:
segmenting the acquired AIS data to obtain a track segment corresponding to each ship;
and acquiring the navigational speed, the heading and the bow direction of each track point in the track segment according to the set track segment length, and calculating the acceleration of the corresponding track point according to the speed and the time interval of the adjacent track points.
3. The AIS data-based ship target classification method according to claim 1, wherein the extraction process of the second input feature vector in the step 1) is as follows:
segmenting the acquired AIS data to obtain a track segment corresponding to each ship;
and acquiring the navigational speed, the course and the ship heading direction of each track point in the track segment according to the set track segment length, and acquiring the time interval of adjacent track points.
4. The AIS data-based ship target classification method according to claim 2 or 3, characterized in that when the AIS data is preprocessed, the AIS data is cleaned, and the cleaning comprises deleting missing key fields, time repetition and track point records beyond a set range.
5. The AIS data based vessel object classification method according to claim 1 wherein the BiGRU model includes two gated loop units in opposite directions.
6. The AIS data based ship target classification method according to claim 5 wherein the gated loop unit includes an input layer, a hidden layer and an output layer, the hidden layer including an update gate and a reset gate, the update gate and the reset gate being used together to determine whether historical information can be retained and transferred.
7. The AIS data-based ship target classification method according to claim 1, wherein the CNN model comprises an input layer, a convolutional layer and a pooling layer, the input layer is used for obtaining a first input feature vector; the convolution layers are several, and the pooling layer is arranged behind each convolution layer.
8. The AIS data-based ship target classification method according to claim 7, wherein the CNN model further includes a Dropout layer, and the Dropout layer is disposed after the last pooling layer.
9. The AIS data-based ship target classification method according to claim 1, wherein the ship target classification model is trained by first training a CNN model, and then integrally training the CNN model and the BiGRU model when the CNN model training is finished.
CN202210594360.3A 2022-05-27 2022-05-27 Ship target classification method based on AIS data Pending CN115063676A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210594360.3A CN115063676A (en) 2022-05-27 2022-05-27 Ship target classification method based on AIS data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210594360.3A CN115063676A (en) 2022-05-27 2022-05-27 Ship target classification method based on AIS data

Publications (1)

Publication Number Publication Date
CN115063676A true CN115063676A (en) 2022-09-16

Family

ID=83197693

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210594360.3A Pending CN115063676A (en) 2022-05-27 2022-05-27 Ship target classification method based on AIS data

Country Status (1)

Country Link
CN (1) CN115063676A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115345257A (en) * 2022-09-22 2022-11-15 中山大学 Flight trajectory classification model training method, classification method, device and storage medium
CN115841004A (en) * 2023-02-24 2023-03-24 北京科技大学 Strip steel hot rolling process mechanical property soft measurement method and device based on multidimensional data
CN116150618A (en) * 2023-02-02 2023-05-23 中国水产科学研究院东海水产研究所 Fishing boat operation type identification method based on deep learning neural network
CN116738324A (en) * 2023-08-11 2023-09-12 太极计算机股份有限公司 Model training method and identification method for single-towing operation behavior of fishing boat
CN117713912A (en) * 2024-02-05 2024-03-15 成都大公博创信息技术有限公司 CVCNN-BiGRU-based star link terminal signal identification method and device
CN117893919A (en) * 2024-01-12 2024-04-16 西南交通大学 Method for generating high-space-time resolution leaf area index product in cloudy and foggy region

Similar Documents

Publication Publication Date Title
CN115063676A (en) Ship target classification method based on AIS data
Lou et al. Prediction of ocean wave height suitable for ship autopilot
CN112906858A (en) Real-time prediction method for ship motion trail
CN111382686B (en) Lane line detection method based on semi-supervised generation confrontation network
CN113780395A (en) Mass high-dimensional AIS trajectory data clustering method
CN111931602A (en) Multi-stream segmented network human body action identification method and system based on attention mechanism
CN113297936B (en) Volleyball group behavior identification method based on local graph convolution network
Groba et al. Integrating forecasting in metaheuristic methods to solve dynamic routing problems: Evidence from the logistic processes of tuna vessels
CN116110022B (en) Lightweight traffic sign detection method and system based on response knowledge distillation
CN116563680B (en) Remote sensing image feature fusion method based on Gaussian mixture model and electronic equipment
CN115512152A (en) Ship track classification method and system combining CNN (CNN) neural network and LSTM neural network
CN114626598A (en) Multi-modal trajectory prediction method based on semantic environment modeling
CN114882293A (en) Random forest and ship target classification method based on AIS data feature optimization
CN114942951A (en) Fishing vessel fishing behavior analysis method based on AIS data
Du et al. Autonomous landing scene recognition based on transfer learning for drones
CN117636183A (en) Small sample remote sensing image classification method based on self-supervision pre-training
CN116341612A (en) AUV drift track prediction method based on ABiLSTM-QSOA network
Gunawan et al. Long Short-Term Memory Approach for Predicting Air Temperature In Indonesia
EP3965021B1 (en) A method of using clustering-based regularization in training a deep neural network to classify images
CN114358247A (en) Intelligent agent behavior interpretation method based on causal relationship inference
CN112687294A (en) Vehicle-mounted noise identification method
CN112015894A (en) Text single classification method and system based on deep learning
Chen et al. A bidirectional context-aware and multi-scale fusion hybrid network for short-term traffic flow prediction
CN117784615B (en) Fire control system fault prediction method based on IMPA-RF
Smitha et al. Efficient moving vehicle detection for intelligent traffic surveillance system using optimal probabilistic neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination