CN113569921A

CN113569921A - Ship classification and identification method and device based on GNN

Info

Publication number: CN113569921A
Application number: CN202110766734.0A
Authority: CN
Inventors: 李湉雨; 胥辉旗; 曾维贵; 张润萍; 程永茂; 刘亮; 刘明刚; 杨利斌
Original assignee: Coastal Defense College Of Naval Aviation University Of Chinese Pla
Current assignee: Coastal Defense College Of Naval Aviation University Of Chinese Pla
Priority date: 2021-07-07
Filing date: 2021-07-07
Publication date: 2021-10-29

Abstract

The invention provides a ship classification and identification method and device based on GNN, wherein the method comprises the following steps: extracting the characteristics of ship AIS data, and constructing a sample total set, wherein the sample total set is a three-dimensional matrix; converting the sample collection into graph structure data, and dividing the sample collection into a training set and a testing set; the GNN network model is trained through a training set, the characteristics of ship AIS data of all samples to be tested in a test set are input into the trained GNN network model to test the validity of the GNN network, the GNN network passing the test is used for classifying ships to be classified, and the GNN network model is a GNN neural network model with two layers of graph convolution layers. According to the scheme of the invention, the ship track is utilized, the spatial characteristics can be effectively extracted for machine learning, and the accuracy of classification and identification of the ship track can be improved.

Description

Ship classification and identification method and device based on GNN

Technical Field

The invention relates to the field of pattern recognition, in particular to a ship classification recognition method and device based on GNN.

Background

Ship classification is widely applied in both military and civilian fields, for example in detecting illegal vessels, guarding against maritime terrorism, and combating smuggling. At present, research on ship type classification at home and abroad relies mainly on traditional radar identification and optical identification, both of which have limitations. Optical identification depends on video monitoring equipment, has a limited field of view and a short operating range, is easily affected by weather such as rain and fog, and is severely limited under maritime conditions such as high humidity and low cloud. Radar identification is less affected by the environment, but a target may be detectable without being distinguishable, and co-frequency interference clutter easily arises in a complex electromagnetic environment. Ship classification and identification based on the Automatic Identification System (AIS) is little affected by weather, can automatically identify the state of a ship around the clock, offers high data acquisition precision, and provides static data such as the voyage number and the ship attributes, so AIS is of great significance for ship classification and identification. AIS data is large in volume and wide in coverage, which poses certain challenges for classification and identification.

Traditional research methods for ship classification mainly comprise clustering algorithms based on the distance between track points, machine learning algorithms applied to manually extracted features, and neural network classification methods. In recent years, deep neural networks have gradually become a research hotspot in place of methods that combine manual feature extraction with machine learning. At present, the neural networks used for classifying ship tracks are mainly traditional convolutional and recurrent neural networks such as CNN, MCDCNN, 1DCNN and RNN. The data processed by a convolutional neural network takes the form of a matrix formed by arranging samples and therefore has a Euclidean structure; if the features of the samples are regarded as vertices, the vertices are independent of one another in a traditional neural network, and the connections between vertices are not used. A recurrent neural network models a time sequence; its drawbacks are that the sample features are insufficient and the connections between the features of different samples cannot be exploited, so feature learning is incomplete and the classification result is not ideal.

Disclosure of Invention

In order to solve the technical problems, the invention provides a ship classification and identification method and device based on GNN, and the method and device are used for solving the technical problems that in the prior art, data relation is not utilized, and the classification result is not ideal enough.

According to a first aspect of the present invention, there is provided a GNN-based vessel classification identification method, the method comprising the steps of:

step S101: extracting the characteristics of ship AIS data, and constructing a sample total set, wherein the sample total set is a three-dimensional matrix; converting the sample collection into graph structure data, and dividing the sample collection into a training set and a testing set;

step S102: training a GNN network model by a training set, inputting the characteristics of ship AIS data of all samples to be tested in a test set into the trained GNN network model to test the validity of the GNN network, and classifying ships to be classified by using the GNN network passing the test;

each track is taken as a sample, and the sample total set is a three-dimensional matrix; the first dimension of the three-dimensional matrix is the number of tracks of the AIS data, the set of tracks being $S = \{S_1, \dots, S_i, \dots, S_{Num}\}$; the second dimension is the number N of track points on track $S_i$ of the AIS data; the third dimension is the attributes of each track point, including IMO, h, v, t-stamp, lat and lon; wherein IMO is the ship IMO code, h is the ship heading feature, v is the speed, t-stamp is the timestamp, lat is the latitude of the track point, and lon is the longitude of the track point;

converting the sample total set into graph structure data G(V, Edge), wherein V is the set of vertices and Edge is the set of edges connecting the vertices; the graph structure data takes the heading feature h of the track points as the vertex feature to construct a vertex feature matrix M; the weights of the edges connecting the vertices are calculated from the speed feature v to construct an adjacency matrix B;

the GNN network model is a GNN neural network model with two graph convolution layers.

According to a second aspect of the present invention, there is provided a GNN-based vessel classification identifying apparatus, the apparatus comprising:

a feature acquisition module: configured to extract the features of ship AIS data and construct a sample total set, wherein the sample total set is a three-dimensional matrix; to convert the sample total set into graph structure data; and to divide the sample total set into a training set and a testing set;

a classification module: configured to train a GNN network model with the training set, to input the features of the ship AIS data of all samples to be tested in the test set into the trained GNN network model to test the validity of the GNN network, and to classify ships to be classified using the GNN network that passes the test.

According to a third aspect of the present invention, there is provided a GNN-based vessel classification recognition system, comprising:

a processor for executing a plurality of instructions;

a memory to store a plurality of instructions;

wherein the instructions are configured to be stored by the memory and loaded and executed by the processor to perform the GNN based vessel classification identification method as described above.

According to a fourth aspect of the present invention, there is provided a computer readable storage medium having a plurality of instructions stored therein; the plurality of instructions for loading and executing by the processor the GNN based vessel classification identification method as described above.

According to the scheme of the invention, a traditional neural network can only process regularly arranged Euclidean structures and cannot effectively use the feature relations among different vertices, whereas a ship track has time-space domain characteristics and its sample features belong to an irregularly arranged non-Euclidean structure. The method uses the time-space domain characteristics contained in ship tracks, such as position, distance and speed features, to extract the connection relations between track points and to establish a topological correlation network, so that spatial features can be effectively extracted for machine learning. First, a mapping between the track point data and the vertices and edges of a graph structure is established; key vertex features are extracted, weights are assigned to the edges to build an adjacency matrix, the track point data is thereby converted into a graph data structure, and the graph data is input into the GNN for training. This method of classifying and identifying ship types overcomes the above defects and can improve the accuracy of ship track classification and identification.

The foregoing description is only an overview of the technical solutions of the present invention, and in order to make the technical solutions of the present invention more clearly understood and to implement them in accordance with the contents of the description, the following detailed description is given with reference to the preferred embodiments of the present invention and the accompanying drawings.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention. In the drawings:

FIG. 1 is a flow chart of the GNN-based ship classification identification method according to the present invention;

FIG. 2 is a graph data diagram of one embodiment of the present invention;

fig. 3 is a schematic structural diagram of a GNN network model according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of GNN network model training according to an embodiment of the present invention;

fig. 5 is a block diagram of a GNN-based ship classification recognition apparatus according to an embodiment of the present invention.

Detailed Description

First, a GNN-based ship classification recognition method according to an embodiment of the present invention will be described with reference to fig. 1. As shown in fig. 1, the method comprises the steps of:

Graph Neural Networks (GNNs) are deep learning models for processing graph data; they can effectively use the feature connections among different samples and are widely used in fields such as social networks, knowledge graphs and molecular chemistry.

As shown in FIG. 2, a graph is composed of edges and vertices, where edges are denoted by e and vertices by v. Each vertex in the graph carries its own features; the features of the vertices can be represented by a matrix M of dimension X × Y, where X is the number of vertices and Y is the number of features per vertex. The edges represent the relationships between the vertices and form a matrix B of dimension X × X, called the adjacency matrix. M and B are the inputs of the graph neural network model.

Before the step S101, a step S100 is included;

the step S100: determining the proportion of a training set and a test set in a sample total set;

in this embodiment, the ratio of the data in the training set to the data in the test set is 8:2; the training set is used to train the network model, and the test set is input into the network to classify the data to be tested.
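The 8:2 division can be sketched as follows. This is only an illustration: the text does not say how samples are assigned to each set, so the random shuffle, the fixed seed, the use of NumPy and the name `split_indices` are all assumptions.

```python
import numpy as np


def split_indices(num_samples, train_ratio=0.8, seed=0):
    """Return (train_idx, test_idx) for an 8:2 split of sample indices.

    Assumption: samples are assigned at random; the text only gives the ratio.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(num_samples)          # shuffled sample indices
    n_train = int(round(train_ratio * num_samples))
    return order[:n_train], order[n_train:]
```

Because each sample becomes one vertex of the graph, the two index arrays can be used as vertex masks during training and testing.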

The step S101: extracting the characteristics of ship AIS data, and constructing a sample total set, wherein the sample total set is a three-dimensional matrix; converting the sample collection into graph structure data, and dividing the sample collection into a training set and a testing set, including:

AIS data with the same ship IMO code has at least one track $S_i$; each track $S_i$ has a plurality of track points, and the track points in each track $S_i$ are arranged in timestamp order; N consecutive track points that conform to a preset rule are acquired from each track $S_i$; the N consecutive track points form a track $S_i'$, and the attributes of each track point of the track $S_i'$ are extracted, comprising IMO, h, v, t-stamp, lat and lon; wherein IMO is the ship IMO code, h is the ship heading, v is the speed, t-stamp is the timestamp, lat is the latitude of the track point, and lon is the longitude of the track point; 1 ≤ i ≤ Num, where Num is the total number of tracks of the same IMO; the extracted AIS data features form a three-dimensional matrix $M_{Num \times N \times 6}$; the features of all samples are extracted to construct the sample total set, and the sample total set is divided into a training set and a testing set.

In this embodiment, the preset rule is: a time interval threshold is preset, and the time interval between every two adjacent track points among the N consecutive track points is smaller than the preset time interval threshold.

Because the acquisition interval of port AIS data is irregular, some adjacent track points are a few seconds apart while others are several minutes or even ten minutes apart. The longer the acquisition interval, the worse the sample quality, so a suitable time interval threshold TT (time threshold) that is as small as possible must be selected to ensure data reliability. The number N of track points contained in a data segment determines the classification and identification accuracy: the larger N is, the higher the accuracy, so N should be chosen as large as practical to ensure the validity of the sample. Generally, a larger time interval threshold allows more track points in a sample segment, so sample reliability and sample validity conflict with each other, and a balance must be found that gives a sample as many points N as possible while keeping TT small. In this embodiment, after repeated experimental comparison, TT = 20 (in seconds) and N = 160 were found to meet the requirement.
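The preset rule can be illustrated with a short sketch that scans one timestamp-ordered track for the first run of N consecutive points whose adjacent time gaps are all below TT. The function name `first_valid_segment` and the choice to return the first qualifying run are assumptions; the text only states the rule and the values TT = 20 s and N = 160.

```python
from typing import Optional, Sequence


def first_valid_segment(timestamps: Sequence[float],
                        n_points: int = 160,
                        max_gap_s: float = 20.0) -> Optional[range]:
    """Index range of the first n_points consecutive track points whose
    adjacent time gaps are all smaller than max_gap_s, or None if absent."""
    start = 0  # first index of the current run of closely spaced points
    for k in range(len(timestamps)):
        if k > 0 and timestamps[k] - timestamps[k - 1] >= max_gap_s:
            start = k  # gap too large: a new candidate run starts here
        if k - start + 1 >= n_points:
            return range(start, start + n_points)
    return None
```

Slicing the six attributes of a track with the returned range gives one N × 6 sample; stacking the samples of all tracks gives the Num × N × 6 matrix.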

The ship feature data is converted into topological graph data: features are used to describe the vertices and edges, and an adjacency matrix is constructed. The effectiveness of ship track classification can be improved only by selecting suitable feature data as the input of the neural network. In this embodiment, the ship heading feature is used as the vertex feature, and the speed feature is used as the edge weight to construct the adjacency matrix.

In this embodiment, one trace is taken as one sample, and one sample is converted into one vertex of the graph structure.

Converting the sample collection into graph structure data, including:

step S1011: determining the receptive field of the vertices in the graph structure, comprising:

calculating the average haversine distance between the samples to be measured:

$$\bar{l}_{ij} = \frac{1}{N}\sum_{n=1}^{N} l_{ij}^{n}$$

wherein $l_{ij}^{n}$ is the haversine distance between the nth track point of track $S_i$ and the nth track point of track $S_j$.

The average haversine distances are sorted in ascending order, and the smallest 5% are retained to ensure a strong connection relation between vertices. A relation strength threshold is set; for example, the value lying exactly at the 5% position is taken as the relation strength threshold. A strong connection relation is expressed as 1 and a weak connection relation as 0, and a relation matrix R based on the spatial-distance connection strength feature is constructed to determine the receptive field of each vertex. The dimension of the relation matrix R is X × X, where X is the number of samples, i.e., the number of vertices of the graph:

$$R(i,j) = \begin{cases} 1, & \bar{l}_{ij} \le Thr \\ 0, & \bar{l}_{ij} > Thr \end{cases}$$

where Thr represents the relation strength threshold and $\bar{l}_{ij}$ is the average haversine distance between samples $S_i$ and $S_j$.

In this embodiment, a small haversine distance indicates a strong distance-feature connection between the vertices, and a large haversine distance indicates a weak one.
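The receptive-field step can be sketched in NumPy as follows. The function names, the (X, N) array layout, taking the threshold as the 5% quantile of the off-diagonal pairs, and treating a distance equal to the threshold as a strong connection are assumptions made for illustration; the text fixes only the 5% rule and the 1/0 encoding.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # earth radius used in the text


def mean_haversine_matrix(lat_deg, lon_deg):
    """Average point-wise haversine distance (km) between every pair of tracks.

    lat_deg, lon_deg: arrays of shape (X, N), one row per track, in degrees.
    Returns an (X, X) matrix; the intermediate array has shape (X, X, N).
    """
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    dlat = lat[:, None, :] - lat[None, :, :]
    dlon = lon[:, None, :] - lon[None, :, :]
    a = (np.sin(dlat / 2.0) ** 2
         + np.cos(lat[:, None, :]) * np.cos(lat[None, :, :])
         * np.sin(dlon / 2.0) ** 2)
    dist = 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
    return dist.mean(axis=2)  # average over the N track points


def relation_matrix(mean_dist, keep_fraction=0.05):
    """Return (R, thr): R[i, j] = 1 where the mean distance is <= thr, else 0.

    Assumption: thr is the keep_fraction quantile of the off-diagonal entries,
    so at least two tracks are required.
    """
    x = mean_dist.shape[0]
    off_diagonal = mean_dist[~np.eye(x, dtype=bool)]
    thr = np.quantile(off_diagonal, keep_fraction)
    return (mean_dist <= thr).astype(int), thr
```

With this choice the diagonal of R is 1, since every track is at distance 0 from itself; the text does not say how the diagonal is treated.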

Step S1012: calculating, from the average speed ave_v of all track points in each sample, the two-norm of the difference between the average speeds of any two samples, and taking the two-norm as the weight of the edge in the graph structure to obtain a weight matrix E of dimension X × X:

$$E(i,j) = \left\lVert \mathrm{ave\_v}_i - \mathrm{ave\_v}_j \right\rVert_2$$

wherein $\mathrm{ave\_v}_i$ is the average speed of track $S_i$ and $\mathrm{ave\_v}_j$ is the average speed of track $S_j$.
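A minimal sketch of the weight matrix E, assuming the speeds of all samples are held in an (X, N) array; the function name is hypothetical. Since each average speed is a scalar, the two-norm of the difference reduces to its absolute value.

```python
import numpy as np


def speed_weight_matrix(speeds):
    """E[i, j] = |ave_v_i - ave_v_j| for an (X, N) array of track-point speeds."""
    ave_v = np.asarray(speeds, dtype=float).mean(axis=1)  # one average per track
    return np.abs(ave_v[:, None] - ave_v[None, :])
```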

Step S1013: constructing an adjacent matrix B based on the weight matrix E:

multiplying the relation matrix R based on the spatial-distance connection strength feature element-wise by the edge weight matrix E to obtain an adjacency matrix B of dimension X × X,

B = R · E

and normalizing the adjacency matrix B:

$$B(i,j) = \frac{B(i,j) - \min(B)}{\max(B) - \min(B)}$$

where min(B) is the minimum value in matrix B, max(B) is the maximum value in matrix B, and B(i, j) on the left-hand side is an element of the normalized adjacency matrix;
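The construction and min-max normalization of the adjacency matrix can be sketched as follows. The function name and the handling of the degenerate case in which all entries are equal are assumptions; the text does not cover that case.

```python
import numpy as np


def build_adjacency(R, E):
    """Element-wise product B = R * E followed by min-max normalization."""
    B = np.asarray(R, dtype=float) * np.asarray(E, dtype=float)
    span = B.max() - B.min()
    if span == 0.0:
        # Assumption: if every entry is equal, return zeros instead of dividing by 0.
        return np.zeros_like(B)
    return (B - B.min()) / span
```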

step S1014: and extracting the bow-direction characteristics of the track points as vertex characteristics, and constructing a vertex characteristic matrix M with the dimension of X multiplied by 1.

The step S102: training a GNN network model by a training set, inputting the characteristics of ship AIS data of all samples to be tested in a test set into the trained GNN network model to test the validity of the GNN network, and classifying ships to be classified by utilizing the GNN network passing the test, wherein the steps comprise:

the GNN network model is a GNN neural network model with two graph convolution layers, and further comprises the following steps:

the GNN network structure in this embodiment is shown in fig. 3, where dots represent vertices of the graph and have different labels, and the input graph structure data passes through the GNN network structure and outputs classification results with different labels.

In this embodiment, the input-output relation of the first graph convolution layer is:

$$h_i = \sigma\Big(\sum_{n_j \in \mathrm{Neigh}(n_i)} L[i,j]\, W_{1\tau}\, h_j\Big)$$

where $h_j$ is the feature value of vertex j in the input data, $h_i$ is the feature value of vertex i in the output data, $\sigma$ is the activation function, $n_j \in \mathrm{Neigh}(n_i)$ denotes the receptive field of vertex i, $W_{1\tau}$ is the convolution kernel of the first graph convolution layer, and $L[i,j]$ is the normalized Laplacian matrix:

$$L[i,j] = \tilde{A}^{-1/2}\,\tilde{B}_{ij}\,\tilde{A}^{1/2}$$

Here B is the normalized adjacency matrix and I is the identity matrix, with $\tilde{B} = B + I$; $\tilde{A}$ is the degree matrix of $\tilde{B}$, given by $\tilde{A}_{ii} = \sum_j \tilde{B}_{ij}$.

The input-output relation of the second graph convolution layer is:

$$h_i = \sigma\Big(\sum_{n_j \in \mathrm{Neigh}(n_i)} L[i,j]\, W_{2\tau}\, h_j\Big)$$

where $h_j$ is the feature value of vertex j in the input data, $h_i$ is the feature value of vertex i in the output data, $\sigma$ is the activation function, $n_j \in \mathrm{Neigh}(n_i)$ denotes the receptive field of vertex i, $W_{2\tau}$ is the convolution kernel of the second graph convolution layer, and $L[i,j]$ is the normalized Laplacian matrix:

$$L[i,j] = \tilde{A}^{-1/2}\,\tilde{B}_{ij}\,\tilde{A}^{1/2}$$

$\tilde{B} = B + I$ is the sum of the normalized adjacency matrix and the identity matrix; $\tilde{A}$ is the diagonal matrix formed from the sums of the columns of $\tilde{B}$, where $\tilde{A}_{ii} = \sum_j \tilde{B}_{ji}$ is the element in row i and column i of the diagonal matrix; $\tilde{A}^{-1/2}$ is the diagonal matrix raised to the power −1/2, and $\tilde{A}^{1/2}$ is the diagonal matrix raised to the power 1/2.
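The forward pass of the two graph convolution layers can be sketched in NumPy as follows. The normalization follows the exponents as written in the text (−1/2 on the left, 1/2 on the right). The choice of ReLU for the first layer, softmax for the output, the hidden width and all function names are assumptions for illustration; the text only states that σ is an activation function and that the convolution kernels are trained.

```python
import numpy as np


def normalized_laplacian(B):
    """L = A^(-1/2) (B + I) A^(1/2), with A the diagonal matrix of column sums."""
    B_tilde = np.asarray(B, dtype=float) + np.eye(len(B))
    deg = B_tilde.sum(axis=0)  # >= 1 for non-negative B because of the added identity
    return np.diag(deg ** -0.5) @ B_tilde @ np.diag(deg ** 0.5)


def relu(x):
    return np.maximum(x, 0.0)


def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))  # shift for numerical stability
    return e / e.sum(axis=1, keepdims=True)


def gnn_forward(M, B, W1, W2):
    """Two graph convolution layers.

    M: (X, Y) vertex features, B: (X, X) normalized adjacency matrix,
    W1: (Y, hidden) and W2: (hidden, classes) convolution kernels.
    Returns (X, classes) class probabilities, one row per vertex.
    """
    L = normalized_laplacian(B)
    H1 = relu(L @ M @ W1)         # first layer (ReLU is an assumption)
    return softmax(L @ H1 @ W2)   # second layer (softmax is an assumption)
```

The matrix product `L @ H` is the same as summing over the receptive field of each vertex, because `L[i, j]` is zero wherever vertices i and j are not connected.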

As shown in fig. 4, the GNN network model training process includes:

step S301: acquiring a training set in a total sample set, selecting AIS sample data of all types of ships in the training set, and extracting the characteristics of the AIS data of each sample data;

in this embodiment, 80% of the total sample set is used as the training set, and 20% is used as the test set.

Wherein the extracting the characteristics of the AIS data of each sample data includes:

AIS data with the same ship IMO code has at least one track $S_i$; each track $S_i$ has a plurality of track points, and the track points in each track $S_i$ are arranged in timestamp order; N consecutive track points that conform to a preset rule are acquired from each track $S_i$; the N consecutive track points form a track $S_i'$, and the attributes of each track point of the track $S_i'$ are extracted, comprising IMO, h, v, t-stamp, lat and lon; wherein IMO is the ship IMO code, h is the ship heading, v is the speed, t-stamp is the timestamp, lat is the latitude of the track point, and lon is the longitude of the track point; 1 ≤ i ≤ Num, where Num is the total number of tracks of the same IMO; the extracted AIS data features form a three-dimensional matrix $M_{Num \times N \times 6}$.

Step S302: inputting the characteristics of the AIS data of each sample data in the training set into the GNN network until the preset conditions for stopping training are met, and obtaining a trained GNN network;

step S303: inputting the characteristics of the ship AIS data of all samples to be tested in the test set into the trained GNN network model for vertex classification so as to test the validity of the GNN network, and classifying the ships to be classified by using the GNN network passed by the test.

In this embodiment, training the neural network means training the convolution kernels.

Further, before step S100, ship AIS data preprocessing is performed, including:

step S1: constructing a ship feature information table, comprising: constructing a ship feature database; extracting the 6 fields IMO, timestamp, heading, speed, track point latitude and track point longitude from the AIS data as values; taking the IMO of the ship as the primary key, i.e., saving the track features of each ship under its IMO number; and arranging the track point data of each IMO in timestamp order;
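Step S1 can be sketched with an in-memory table keyed by IMO. The dictionary field names (`imo`, `t_stamp` and so on) and the function name are hypothetical; an actual system would more likely use a database with IMO as the primary key, as the text describes.

```python
from collections import defaultdict


def build_track_table(records):
    """Group AIS records by IMO and sort each ship's track points by timestamp.

    records: iterable of dicts with the keys
    'imo', 't_stamp', 'h', 'v', 'lat', 'lon' (hypothetical field names).
    """
    table = defaultdict(list)
    for rec in records:
        table[rec["imo"]].append(rec)
    for track in table.values():
        track.sort(key=lambda r: r["t_stamp"])  # timestamp order within each IMO
    return dict(table)
```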

step S2: data cleaning is carried out on ship AIS data, and the method comprises the following steps:

dirty data meeting the data cleaning conditions is discarded through data analysis, wherein the dirty data comprises abnormal position data and redundant position data; abnormal position data means that the distance between two adjacent track points is greater than a first preset distance threshold while their time interval is smaller than a first preset time interval; redundant position data means that the feature attributes of two adjacent track points are completely identical.

The data analyzed and processed in this embodiment is ship track data in a certain fixed sea area, and dirty data needs to be discarded in order to achieve an ideal classification effect.

The judgment strategy is as follows:

1. For data with the same key, the time interval and the distance between the (i + 1)th track point and the ith track point are calculated. To account for the curvature of the earth, the distance is calculated with the Haversine formula.

The distance calculated with the Haversine formula is called the haversine distance for short.

$$\operatorname{haversin}\!\left(\frac{l}{R}\right) = \operatorname{haversin}(x_{lat2} - x_{lat1}) + \cos(x_{lat1})\cos(x_{lat2})\operatorname{haversin}(y_{lon2} - y_{lon1})$$

$$\operatorname{haversin}(\varphi) = \sin^2\!\left(\frac{\varphi}{2}\right) = \frac{1 - \cos\varphi}{2}$$

where l is the distance between the two track points, and R is the earth radius, generally taken as 6371 km; $x_{lat1}$ and $y_{lon1}$ are the latitude and longitude of the first track point, $x_{lat2}$ and $y_{lon2}$ are the latitude and longitude of the second track point, and $\varphi$ is the argument of the haversine function.

If the haversine distance between two track points within a short time interval is too large, the (i + 1)th to the nth track points of that IMO's ship are abnormal data and are discarded, where n is the number of all track points of that ship in the same time window.

2. If the (i + 1) th track point is completely the same as the ith track point, the (i + 1) th track point is redundant data and needs to be discarded.
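The two judgment rules can be sketched as follows for the track points of one ship in one time window. The threshold values `max_gap_s` and `max_jump_km`, the tuple layout and the function names are assumptions for illustration; the text refers only to a "first preset time interval" and a "first preset distance threshold" without giving values.

```python
import math

EARTH_RADIUS_KM = 6371.0  # earth radius used in the text


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def clean_track(points, max_gap_s=60.0, max_jump_km=5.0):
    """Apply the two rules to one ship's points within one time window.

    points: list of (t_stamp, lat, lon, h, v) tuples sorted by t_stamp.
    """
    cleaned = []
    for p in points:
        if cleaned:
            prev = cleaned[-1]
            if p == prev:
                continue  # rule 2: identical to the previous point, redundant
            dt = p[0] - prev[0]
            jump = haversine_km(prev[1], prev[2], p[1], p[2])
            if dt < max_gap_s and jump > max_jump_km:
                break  # rule 1: discard this point and all later points in the window
        cleaned.append(p)
    return cleaned
```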

An embodiment of the present invention further provides a GNN-based ship classification and identification apparatus, as shown in fig. 5, the apparatus includes:

The embodiment of the invention further provides a GNN-based ship classification and identification system, which comprises:

a processor for executing a plurality of instructions;

a memory to store a plurality of instructions;

The embodiment of the invention further provides a computer readable storage medium, wherein a plurality of instructions are stored in the storage medium; the plurality of instructions for loading and executing by the processor the GNN based vessel classification identification method as described above.

It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.

In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions in actual implementation, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.

The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a physical machine Server, or a network cloud Server, etc., and needs to install a Windows or Windows Server operating system) to perform some steps of the method according to various embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and any simple modification, equivalent change and modification made to the above embodiment according to the technical spirit of the present invention are still within the scope of the technical solution of the present invention.

Claims

1. A GNN-based ship classification identification method is characterized by comprising the following steps:

2. The GNN-based vessel classification identifying method according to claim 1, wherein the converting the sample collection into graph structure data comprises:

calculating the average haversine distance between the samples to be measured:

$$\bar{l}_{ij} = \frac{1}{N}\sum_{n=1}^{N} l_{ij}^{n}$$

wherein $l_{ij}^{n}$ is the haversine distance between the nth track point of track $S_i$ and the nth track point of track $S_j$;

setting a relation strength threshold and sorting the average haversine distances in ascending order, wherein a haversine distance smaller than the relation strength threshold indicates a strong distance-feature connection between the vertices, and a haversine distance larger than the relation strength threshold indicates a weak one; expressing the strong connection relation by 1 and the weak connection relation by 0, and constructing a relation matrix R based on the spatial-distance connection strength feature to determine the receptive field of each vertex; the dimension of the relation matrix R is X × X, X being the number of samples, i.e., the number of vertices of the graph:

$$R(i,j) = \begin{cases} 1, & \bar{l}_{ij} \le Thr \\ 0, & \bar{l}_{ij} > Thr \end{cases}$$

where Thr represents the relation strength threshold and $\bar{l}_{ij}$ is the average haversine distance between the samples to be measured;

$$E(i,j) = \left\lVert \mathrm{ave\_v}_i - \mathrm{ave\_v}_j \right\rVert_2$$

wherein $\mathrm{ave\_v}_i$ is the average speed of track $S_i$ and $\mathrm{ave\_v}_j$ is the average speed of track $S_j$;

step S1013: constructing an adjacency matrix B based on the weight matrix E:

B = R · E

and normalizing the adjacency matrix B:

$$B(i,j) = \frac{B(i,j) - \min(B)}{\max(B) - \min(B)}$$

3. The GNN-based vessel classification and identification method according to claim 2, wherein before the step S101, the vessel AIS data is subjected to data cleaning, which includes:

discarding dirty data meeting the data cleaning conditions, wherein the dirty data comprises abnormal position data and redundant position data; abnormal position data means that the distance between two adjacent track points is greater than a first preset distance threshold while their time interval is smaller than a first preset time interval; redundant position data means that the feature attributes of two adjacent track points are completely identical.

4. A GNN-based vessel classification and identification apparatus, the apparatus comprising:

5. The GNN-based vessel classification and identification apparatus according to claim 4, wherein the feature obtaining module comprises:

receptive field determination submodule: configured to determine a receptive field for a vertex in a graph structure, comprising:

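calculating the average haversine distance between the samples to be measured:

$$\bar{l}_{ij} = \frac{1}{N}\sum_{n=1}^{N} l_{ij}^{n}$$

wherein $l_{ij}^{n}$ is the haversine distance between the nth track point of track $S_i$ and the nth track point of track $S_j$; and constructing a relation matrix R:

$$R(i,j) = \begin{cases} 1, & \bar{l}_{ij} \le Thr \\ 0, & \bar{l}_{ij} > Thr \end{cases}$$

where Thr represents the relation strength threshold and $\bar{l}_{ij}$ is the average haversine distance between the samples to be measured;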

a weight matrix acquisition submodule: configured to calculate, from the average speed ave_v of all track points in each sample, the two-norm of the difference between the average speeds of any two samples, and to take the two-norm as the weight of the edge in the graph structure to obtain a weight matrix E of dimension X × X:

$$E(i,j) = \left\lVert \mathrm{ave\_v}_i - \mathrm{ave\_v}_j \right\rVert_2$$

wherein $\mathrm{ave\_v}_i$ is the average speed of track $S_i$ and $\mathrm{ave\_v}_j$ is the average speed of track $S_j$;

an adjacency matrix acquisition submodule: configured to construct an adjacency matrix B based on the weight matrix E:

B = R · E

and to normalize the adjacency matrix B:

$$B(i,j) = \frac{B(i,j) - \min(B)}{\max(B) - \min(B)}$$

a track point extraction submodule: configured to extract the heading features of the track points as vertex features and to construct a vertex feature matrix M of dimension X × 1.

6. A GNN-based vessel classification identifying apparatus according to claim 6, wherein said apparatus comprises:

a data cleaning module configured to discard dirty data meeting the data cleaning conditions, wherein the dirty data comprises abnormal position data and redundant position data; abnormal position data means that the distance between two adjacent track points is greater than a first preset distance threshold while their time interval is smaller than a first preset time interval; redundant position data means that the feature attributes of two adjacent track points are completely identical.

7. A GNN-based vessel classification recognition system comprising:

a processor for executing a plurality of instructions;

a memory to store a plurality of instructions;

wherein the plurality of instructions are to be stored by the memory and loaded and executed by the processor to perform the GNN based vessel classification identifying method according to any of claims 1-3.

8. A computer-readable storage medium having stored therein a plurality of instructions; the plurality of instructions for loading and executing by a processor the GNN based vessel classification identification method according to any of claims 1-3.