WO2023029461A1 - Clustering method for massive high-dimensional AIS trajectory data - Google Patents

Clustering method for massive high-dimensional AIS trajectory data

Info

Publication number
WO2023029461A1
Authority
WO
WIPO (PCT)
Prior art keywords
trajectory
clustering
data
ais
dimensional
Prior art date
Application number
PCT/CN2022/083839
Other languages
English (en)
French (fr)
Inventor
廖泓舟
代翔
王侃
戴礼灿
潘磊
高翔
崔莹
陈伟晴
Original Assignee
西南电子技术研究所(中国电子科技集团公司第十研究所)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 西南电子技术研究所(中国电子科技集团公司第十研究所)
Publication of WO2023029461A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks

Definitions

  • The present invention relates to data clustering technology and, more specifically, to a clustering method for ship automatic identification system (AIS) trajectories based on deep embedded clustering, which addresses the problem of clustering massive high-dimensional ship AIS trajectory data.
  • Spatio-temporal trajectories are records of the position and time of moving objects.
  • Spatio-temporal trajectories are widely used in research on traffic flow patterns and characteristics, resource allocation, sea ice monitoring and other fields.
  • Similarity features in spatio-temporal trajectory data can be extracted, and meaningful trajectory patterns can be found.
  • Ship trajectory data is a kind of spatiotemporal trajectory data, which records the ship's navigation process and corresponding behavior characteristics. With the wide application of automatic ship identification system (AIS) on ships, it is becoming easier to obtain ship trajectory data.
  • AIS is the abbreviation of Automatic Identification System. It consists of shore-based (base station) facilities and shipborne equipment, and is a digital aid-to-navigation system that combines network technology, modern communication technology, computer technology and electronic information display technology.
  • The AIS continuously broadcasts information about the ship, and the data can be received by an AIS receiver.
  • Shore-based AIS receiving stations are deployed on land, so they can only receive ship data within about 60 kilometers of the station. Satellite AIS mounts the receiver on a satellite and can therefore receive AIS information from ships around the world without geographical restrictions. As ship traffic density in port waters keeps increasing, the navigation situation in these waters becomes more complex, which places higher demands on ship traffic management.
  • AIS ship trajectory data mainly comprises static ship information, dynamic information and voyage-related information.
  • Ship AIS trajectory data is mainly obtained through AIS base stations. During a voyage, the AIS transmitter is generally connected directly to the global positioning system for position, speed and other information; it encodes this information and broadcasts it, and the messages are received by nearby ships or shore-based AIS receivers.
  • Raw AIS data usually suffers from out-of-order timestamps, anomalies, missing values and unequal numbers of trajectory points per trajectory. Therefore, to improve the quality of ship AIS trajectory data, the data must be preprocessed before use.
  • Preprocessing of ship AIS trajectory data generally includes the following aspects:
  • Missing data processing mainly concerns the static data in the ship trajectory data, such as ship name, ship width and ship type. Such data can be looked up in the ship directory or ship database of the maritime administration. If dynamic data is missing, the record is generally treated as erroneous.
  • Ship trajectory data contains many attributes, but not all of them are needed. Unnecessary attributes can be eliminated according to the needs of the study to obtain a simplified representation of the data set. For example, when only the spatial information of ship trajectories is studied, only the ship position and ship name attributes need to be retained.
  • Ship trajectory clustering divides objects with similar behavior into the same group, so that differences within a group are as small as possible and differences between groups are as large as possible.
  • The purpose of ship AIS trajectory clustering is to apply clustering algorithms to trajectory data, find trajectory clusters with similar patterns of ship motion, reveal latent relationships between ship trajectories, and analyze the characteristics of ship traffic flow or the behavior of individual ships.
  • The essence of distance-based ship AIS trajectory clustering is to partition objects according to the similarity of their trajectory data, and clustering usually amounts to optimizing an evaluation function that represents clustering quality. How to evaluate the distance or similarity between trajectories is therefore one of the key issues in clustering.
  • Clustering is an unsupervised data mining method.
  • The original data set is divided into multiple clusters by measuring the similarity between objects.
  • The similarity of objects within a cluster is high, and the similarity of objects in different clusters is low.
  • Trajectory data clustering first obtains the similarity between trajectories by analyzing and comparing trajectory feature information, and then classifies trajectories with high similarity into one category.
  • Cluster analysis of ship AIS trajectory data provides effective support for technologies such as typical route extraction, abnormal trajectory discovery, navigation trajectory prediction and traffic flow analysis, and is of significant practical value for solving ship navigation safety problems and improving the efficiency of entering and leaving ports.
  • Ship AIS trajectory data has not only spatio-temporal attributes but also attributes such as speed over ground, course over ground, ship heading, navigation status and ship type. It has a large data volume and many feature dimensions, and is typical spatio-temporal trajectory big data.
  • Existing ship AIS trajectory clustering methods mainly comprise two steps: (1) similarity measurement, which measures the similarity between trajectories; and (2) clustering, which groups similar trajectories into one category.
  • Similarity is usually measured by the distance between two trajectories. Commonly used measures are the Euclidean distance, the Hausdorff distance (HD), the dynamic time warping (DTW) distance and the Fréchet distance (FD).
  • Clustering algorithms mainly include partition-based algorithms represented by K-means, hierarchical algorithms represented by BIRCH, grid-based algorithms represented by STING, spectral clustering algorithms, and density-based algorithms represented by DBSCAN.
  • Distance-based similarity measurement between trajectories is common. Among such methods, algorithms based on the Hausdorff distance, the longest common subsequence (LCSS) and the edit distance (ED) are frequently used. K-means has shortcomings: the number of clusters must be specified, and the result is often strongly affected by the initial cluster centers. Work that relies on K-means without addressing these defects yields ship trajectory clustering results that are affected by them.
  • Vries et al. use DTW and ED to calculate trajectory similarity and, combined with trajectory compression, cluster ship AIS trajectories with the kernel k-means method. Both DTW and ED can be used to calculate trajectory similarity, but they compute it differently.
  • DTW matches the points of two trajectories point by point.
  • Its shortcomings are a large amount of computation and sensitivity to isolated points. In addition, when two trajectories are dissimilar over a short period, the clustering result is unsatisfactory.
  • Although ED can be used to calculate the similarity between trajectories when clustering ship trajectories and overcomes the gap problem of DTW, it still requires a large amount of computation and is sensitive to abnormal trajectories.
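  • To make the point-by-point matching of DTW concrete, the following is a minimal sketch of the classic dynamic-programming DTW distance between two trajectories given as lists of (x, y) points. The function name `dtw_distance` and the use of the planar Euclidean distance between points are illustrative assumptions, not details from the cited work. The nested loops also show where the quadratic cost mentioned above comes from.

```python
import math

def dtw_distance(a, b):
    """DTW distance between two point sequences (hypothetical helper).

    cost[i][j] holds the cheapest cumulative cost of aligning the first
    i points of `a` with the first j points of `b`.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):          # O(n * m) cell updates
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # point-to-point distance
            cost[i][j] = d + min(cost[i - 1][j],      # repeat a point of b
                                 cost[i][j - 1],      # repeat a point of a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]
```

  • A trajectory that merely repeats a point of another trajectory has DTW distance 0 to it, which is the time-warping tolerance that the Euclidean distance lacks.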
  • Another approach is a spectral clustering method for ship motion pattern identification based on one-way distance. It defines the one-way distance as the average of the minimum distances from each point on one ship's trajectory to the points on another ship's trajectory. Exploiting the robustness of the one-way distance to interference, it constructs a similarity measure for ship AIS trajectories, obtains the similarity matrix of the trajectories, uses the spectral clustering algorithm to learn the spatial distribution of the trajectories, and obtains the normal movement patterns of ships.
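  • The one-way distance defined above can be sketched in a few lines. This is an illustrative reading of the definition (mean over the points of one trajectory of the minimum distance to the points of the other), using planar Euclidean distance between sampled points. The function name `one_way_distance` is hypothetical.

```python
import math

def one_way_distance(a, b):
    """Mean, over points p of trajectory `a`, of the distance from p to
    the nearest point of trajectory `b` (hypothetical helper)."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)
```

  • The measure is not symmetric: a single stray point on `b` barely affects `one_way_distance(a, b)` but raises `one_way_distance(b, a)`.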
  • Distance-based clustering of ship trajectories has the advantages of a simple algorithm and easy implementation, but because of the shortcomings of distance-based trajectory similarity measures it still tends to lose the local feature information of the trajectory.
  • MinSpd maximum ship speed change
  • MaxDir MaxDir
  • the neighborhood of the object the speed over the ground, and the course over the ground
  • MaxSpd and MaxDir are adjusted according to the International Maritime Organization's definition of shipping routes.
  • Compared with distance-based ship trajectory clustering, clustering ship trajectories with DBSCAN and its improved variants to extract the main routes has several advantages: it can find trajectory clusters of arbitrary shape, it is robust to abnormal trajectories, and the resulting cluster structure does not depend on the order in which the sample trajectories are traversed.
  • Statistics-based methods rest on mature mathematical methods but have the disadvantage of high computational complexity.
  • Because ship AIS trajectory data is multi-dimensional spatio-temporal data with a large volume, its cluster analysis still poses technical problems that need urgent solutions, such as how to process massive ship trajectory data efficiently and how to better represent ship AIS trajectories for clustering.
  • How to combine the results of ship AIS trajectory data mining with visualization is also a problem worthy of further study.
  • K-means clustering assigns categories by determining the cluster centers and calculating the distance from each data point to each cluster center.
  • General clustering algorithms such as K-means and GMM are fast and suit a variety of problems. However, their distance measures are limited to the original data space, and they are often ineffective when the input dimension is high.
  • The spectral clustering algorithm consumes a great deal of memory and computation on high-dimensional data sets, so it is not practical for clustering large data sets.
  • Related dimensionality reduction methods can only reduce the dimensionality of linear data and cannot learn a nonlinear mapping of the original high-dimensional data.
  • Clustering algorithms based on deep learning have attracted the interest of many researchers in recent years. Deep learning has been widely used in computer vision and image processing, and its effectiveness in processing high-dimensional data has been proven.
  • The parameters of a deep neural network are generally learned under the guidance of supervised labels, whereas in unsupervised clustering no labels are available to guide the parameter updates.
  • The crowding problem means that clusters are packed together and cannot be distinguished. For example, data that is well represented after reduction to 10 dimensions may have no faithful mapping in two dimensions: in two dimensions at most 3 points can be mutually equidistant. As the dimension increases, most data points gather near the surface of the m-dimensional sphere, and the distribution of their distances from a point x_i is extremely unbalanced. If this distance relationship is preserved directly in low dimensions, the crowding problem arises. A direct consequence is that clusters that are separated in the high-dimensional space are not clearly divided in the low-dimensional space (although they can be divided into blocks).
  • Although Stochastic Neighbor Embedding (SNE) provides a good visualization method, it is difficult to optimize and suffers from the crowding problem.
  • The cost function of SNE focuses on the local structure of the data in the mapping and is very difficult to optimize, whereas t-SNE uses a heavy-tailed distribution, which alleviates both the crowding problem and the optimization problem of SNE.
  • The algorithm computes conditional probabilities in both spaces and tries to minimize the sum of the differences between the high-dimensional and low-dimensional probabilities, which involves a lot of computation and demands considerable system resources.
  • The complexity of t-SNE scales quadratically in time and space with the number of data points.
  • Comparing t-SNE with PCA and other linear dimensionality reduction models on the accuracy achieved shows that t-SNE provides better results, because the algorithm defines a soft boundary between the local and global structure of the data. t-SNE is currently the best data dimensionality reduction and visualization method, but its shortcomings are also obvious: it takes up a lot of memory and runs for a long time. Because the cost function is non-convex, the result varies between runs, and multiple runs are required to select the best result.
  • The purpose of the present invention is to provide a high-accuracy, fast-running clustering method for massive high-dimensional AIS trajectory data that overcomes the deficiencies of the prior art.
  • A massive high-dimensional AIS trajectory data clustering method includes the following steps:
  • AIS trajectory data preprocessing: extract the ship trajectory data, take the trajectory points with the same MMSI number as one trajectory, and divide it into multiple trajectories according to the course information. Delete abnormal points within a trajectory and points that deviate from all trajectory points. Calculate the number of trajectory points to insert after the deletion, and perform linear interpolation and data completion for the gaps left by the deleted points and for values missing in the original AIS data. Normalize the data so that each attribute component of a trajectory point is mapped to the range 0 to 1.
  • Autoencoder pre-training: pre-train an autoencoder network consisting of an encoder and a decoder by feeding it the preprocessed AIS trajectory data over many iterations. Once the network completes the "input, dimensionality reduction, feature, dimensionality increase, reconstruction" process, the encoder parameters are successfully initialized and the network outputs the dimensionality-reduced trajectory feature embedding points z_i.
  • The gradient descent algorithm is used to compute the gradient of the loss function L with respect to each trajectory feature embedding point z_i and each cluster center μ_j, reducing the distance between the two distributions.
  • When the change in cluster assignment between two consecutive iterations falls below a set value, the clustering process stops and the final clustering result is obtained.
  • The present invention treats the trajectory points with the same MMSI number in the AIS data as one trajectory and divides it into multiple trajectories according to the heading information. It deletes abnormal points within a trajectory and points that deviate from all trajectory points, calculates the number of trajectory points to insert after the deletion, and performs linear interpolation and data completion for the gaps left by the deleted points and for values missing in the original AIS data. The interpolated and completed AIS data is then normalized so that each attribute component of the trajectory points is mapped to the range 0 to 1. This addresses the difficulty of trajectory similarity measurement and feature extraction, and the low clustering accuracy and computational efficiency, of existing AIS trajectory clustering methods, and so improves the final clustering result.
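  • The first preprocessing step, treating all points that share an MMSI number as one trajectory, can be sketched as follows. The record layout (dictionaries with `mmsi` and `t` keys) and the function name `group_by_mmsi` are assumptions for illustration. The subsequent split by heading is not shown because the text does not specify its rule.

```python
from collections import defaultdict

def group_by_mmsi(records):
    """Group AIS records by MMSI and sort each group by timestamp.

    `records` is assumed to be an iterable of dicts with at least the
    keys "mmsi" and "t" (hypothetical layout).
    """
    tracks = defaultdict(list)
    for rec in records:
        tracks[rec["mmsi"]].append(rec)
    for points in tracks.values():
        points.sort(key=lambda r: r["t"])  # repair out-of-order timestamps
    return dict(tracks)
```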
  • This disclosure uses an autoencoder network composed of an encoder and a decoder to extract AIS trajectory features and reduce dimensionality. The preprocessed AIS trajectories are fed into the network; after many iterations the network completes the "input, dimensionality reduction, feature, dimensionality increase, reconstruction" process, the encoder parameters are successfully initialized, and the dimensionality-reduced trajectory features are output.
  • The encoder of the trained autoencoder maps the original massive high-dimensional AIS trajectory data to a 10-dimensional feature space. The autoencoder is used for dimensionality reduction and feature extraction of trajectory data; compared with principal component analysis (PCA) dimensionality reduction or manual feature engineering, it learns a good set of feature representations automatically.
  • The encoder of the trained autoencoder is extracted and connected to the deep embedded clustering layer. The k-means algorithm based on the Euclidean distance clusters the dimensionality-reduced trajectory feature embedding points to obtain the initial cluster centers, and the soft assignment probability of each feature embedding point to each initial cluster center is calculated as the original target distribution.
  • An auxiliary target distribution is constructed. The KL divergence measures the distance between the original target distribution and the auxiliary target distribution, the network is trained iteratively, and the network parameters and clustering parameters are updated and optimized at the same time.
  • Deep embedded clustering (DEC) uses the feature extraction ability of a deep neural network to map the original data space to a low-dimensional feature space and automatically learns the feature representation of trajectories in that space. It uses the KL divergence as the cluster assignment loss and iteratively optimizes the clustering objective, so that feature representation and cluster assignment are carried out at the same time. This improves computational efficiency while preserving clustering accuracy. It also has the advantage of a low complexity of O(nk), where k is the number of cluster centers.
  • This disclosure addresses the fact that traditional clustering algorithms do not cluster high-dimensional big data well.
  • The encoder is taken out to perform feature-based trajectory clustering and to initialize the soft assignment clustering layer. The soft assignment between embedding points and cluster centers is calculated and normalized, and an auxiliary target distribution and a loss function are constructed to train the clustering. The gradient descent algorithm computes the gradient of the loss function L with respect to each feature embedding point z_i and each cluster center μ_j, learns the mapping from the data space to the low-dimensional feature space, and iteratively optimizes the clustering in the feature space. When the change in cluster assignment between two consecutive iterations is less than the set value, the clustering process stops and the final clustering result is obtained.
  • The method is simple to implement and effective, can be applied to different trajectory clustering scenarios, and offers a new solution for clustering massive high-dimensional trajectory big data.
  • The ship AIS trajectory clustering method based on deep embedded clustering proposed in this disclosure does not require a similarity measure chosen from experience. It performs similarity measurement and cluster assignment at the same time, so that both the feature representation and the cluster assignment of trajectory data achieve good results. Compared with the prior art, it has the following beneficial effects:
  • The present invention meets the clustering requirements of massive high-dimensional AIS trajectory big data. The trajectory features are extracted by the autoencoder in DEC, which is simple to implement and has low implementation complexity, and the extracted features express most of the original AIS trajectory information. These features can therefore be used by different algorithms and improve efficiency without sacrificing accuracy. Any common clustering algorithm, such as K-means, DBSCAN or STING, can be used to obtain the initial cluster centers. In practice, K-means is used because it is simple and efficient.
  • Fig. 1 is a flow chart of massive high-dimensional AIS trajectory data clustering according to the present invention.
  • Fig. 2 is a schematic diagram of AIS trajectory clustering based on DEC.
  • Fig. 3 is a diagram of the network structure of the autoencoder.
  • Fig. 4 is a diagram of the deep clustering network structure.
  • Fig. 5 shows AIS trajectory extraction according to the present invention.
  • Fig. 6 shows the deletion of abnormal points from AIS data according to the present invention.
  • Fig. 7 shows AIS data interpolation according to the present invention.
  • Fig. 8 is a visualization of the AIS data after preprocessing according to the present invention.
  • Fig. 9 shows the result of deep embedded clustering of AIS data according to the present invention.
  • Fig. 10 is decomposition diagram 1 of the deep embedded clustering of AIS data according to the present invention.
  • Fig. 11 is decomposition diagram 2 of the deep embedded clustering of AIS data according to the present invention.
  • Fig. 12 is decomposition diagram 3 of the deep embedded clustering of AIS data according to the present invention.
  • AIS trajectory data preprocessing: extract the ship trajectory data, take the trajectory points with the same MMSI number as one trajectory, and divide it into multiple trajectories according to the course information. Delete abnormal points within a trajectory and points that deviate from all trajectory points. Calculate the number of trajectory points to insert after the deletion, and perform linear interpolation and data completion for the gaps left by the deleted points and for values missing in the original AIS data. Normalize the data so that each attribute component of a trajectory point is mapped to the range 0 to 1.
  • Autoencoder pre-training: pre-train an autoencoder network consisting of an encoder and a decoder by feeding it the preprocessed AIS trajectory data over many iterations. Once the network completes the "input, dimensionality reduction, feature, dimensionality increase, reconstruction" process, the encoder parameters are successfully initialized and the network outputs the dimensionality-reduced trajectory feature embedding points z_i.
  • The KL divergence is used as the loss function of the deep clustering network.
  • The gradients of the loss function L with respect to each trajectory feature embedding point z_i and each cluster center μ_j are computed separately, and the distance between the two distributions is reduced.
  • When the change in cluster assignment between two consecutive iterations falls below a set value, the clustering process stops and the final clustering result is obtained.
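  • The KL-divergence loss and the stopping rule can be sketched as follows. The auxiliary target distribution used here (square each soft assignment, divide by the soft cluster frequency, renormalize per point) follows the commonly published DEC formulation and is an assumption about this method's exact formula. All function names are hypothetical, and `q` is a list of per-point soft assignment rows.

```python
import math

def target_distribution(q):
    """Sharpen soft assignments q[i][j] into an auxiliary target p[i][j]
    (assumed DEC-style formula: p_ij proportional to q_ij^2 / f_j)."""
    k = len(q[0])
    f = [sum(row[j] for row in q) for j in range(k)]  # soft cluster sizes
    p = []
    for row in q:
        w = [row[j] ** 2 / f[j] for j in range(k)]
        s = sum(w)
        p.append([v / s for v in w])
    return p

def kl_divergence(p, q):
    """Clustering loss L = KL(P || Q), summed over points and clusters."""
    return sum(pij * math.log(pij / qij)
               for p_row, q_row in zip(p, q)
               for pij, qij in zip(p_row, q_row) if pij > 0)

def assignment_change(old_labels, new_labels):
    """Fraction of points whose hard cluster label changed; training
    stops when this falls below a chosen tolerance."""
    changed = sum(a != b for a, b in zip(old_labels, new_labels))
    return changed / len(new_labels)
```

  • Because the target emphasizes assignments that are already confident, minimizing the loss pulls each embedding point toward its most likely cluster center.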
  • The specific implementation can be divided into four parts: 1) preprocessing the AIS trajectory data; 2) pre-training the autoencoder network to extract trajectory features; 3) initializing the cluster centers; 4) constructing a deep clustering network for clustering.
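  • $T_i = (p_{i1}, p_{i2}, p_{i3}, \ldots, p_{in})$ (1)
  • where $p_{ik}$, $k = 1, 2, \ldots, n$, is the k-th trajectory point of trajectory $T_i$
  • n is the number of trajectory points contained in the trajectory
  • t is the time at which a trajectory point was collected
  • lon is the longitude
  • lat is the latitude
  • sog is the course over the ground
  • head is the heading of the ship.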
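  • As a small illustration of this data structure, a trajectory can be held as a list of point records and flattened into one feature vector before it is fed to a network. The class name `TrackPoint`, the helper `flatten` and the choice to flatten all five fields are assumptions for illustration. The text does not state how its input vector is assembled.

```python
from dataclasses import dataclass, astuple
from typing import List

@dataclass
class TrackPoint:
    t: float     # collection time
    lon: float   # longitude
    lat: float   # latitude
    sog: float   # attribute named "sog" in the text
    head: float  # ship heading

def flatten(track: List[TrackPoint]) -> List[float]:
    """Concatenate the fields of every point into one flat vector."""
    return [value for point in track for value in astuple(point)]
```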
  • Outliers are removed.
  • Abnormal points within a trajectory include points with negative speed and points that deviate from all other trajectory points.
  • The abnormal points are deleted.
  • The entire trajectory is deleted and does not participate in the later trajectory clustering.
  • $t(p_b - p_a)$ represents the time interval between trajectory points $p_b$ and $p_a$
  • $t_{threshold}$ is a predefined time threshold
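  • A gap-filling step based on the time threshold might look like the sketch below. The rule for how many points to insert (enough that no remaining interval exceeds the threshold) is an assumption, since the formula is not reproduced in the text. The function name `fill_gap` and the tuple layout `(t, lon, lat)` are also hypothetical. Angular attributes such as heading would need wrap-around handling that is not shown.

```python
import math

def fill_gap(pa, pb, t_threshold):
    """Linearly interpolated points to insert between pa and pb.

    pa and pb are tuples whose first element is the timestamp, e.g.
    (t, lon, lat). Returns [] when the gap is within the threshold.
    """
    dt = pb[0] - pa[0]
    if dt <= t_threshold:
        return []
    k = math.ceil(dt / t_threshold) - 1  # assumed count of new points
    inserted = []
    for step in range(1, k + 1):
        w = step / (k + 1)               # interpolation weight in (0, 1)
        inserted.append(tuple(x + w * (y - x) for x, y in zip(pa, pb)))
    return inserted
```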
  • Data normalization: to speed up network training and improve computational efficiency, each attribute component of a trajectory point is mapped to the range 0 to 1. After interpolation and completion, the AIS data is normalized to obtain the normalized attribute values x' of longitude lon, latitude lat, course over ground sog and ship heading head:
  • $x' = \dfrac{x - x_{\min}}{x_{\max} - x_{\min}}$
  • where $x$ is the attribute value before normalization (longitude lon, latitude lat, course over ground sog or ship heading head), $x_{\max}$ is the maximum attribute value, $x_{\min}$ is the minimum attribute value, and $x'$ is the normalized attribute value.
  • After normalization, the attribute values of all trajectory points are mapped to the range 0 to 1.
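  • Min-max normalization of each attribute column can be sketched as follows. The function name `min_max_normalize` and the handling of constant columns (mapped to 0) are illustrative choices, not details from the text.

```python
def min_max_normalize(rows):
    """Scale every column of `rows` (a list of equal-length tuples)
    to the range [0, 1] using that column's minimum and maximum."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    normalized = []
    for row in rows:
        normalized.append(tuple(
            (x - lo) / (hi - lo) if hi > lo else 0.0  # constant column
            for x, lo, hi in zip(row, lows, highs)))
    return normalized
```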
  • Trajectory features are extracted by feeding the preprocessed trajectories into the autoencoder network for training.
  • The trajectory feature data that goes through "input, dimensionality reduction, feature, dimensionality increase, reconstruction" is
  • $Trj_i = (p_{i1}, p_{i2}, \ldots, p_{im})$ (5)
  • After many training iterations, when the output is arbitrarily close to the input, the autoencoder network has completed the "input, dimensionality reduction, feature, dimensionality increase, reconstruction" process and the encoder parameters are successfully initialized. At this point, the encoder output is the dimensionality-reduced feature of the trajectory $Trj_i$.
  • The encoder can be regarded as a neural network that maps the high-dimensional data space to a low-dimensional feature space, which can be expressed as $z_i = f_\theta(Trj_i)$
  • where $p_{ik}$, $k = 1, 2, \ldots, m$, is the k-th trajectory point of $Trj_i$
  • m is the number of trajectory points contained in the trajectory
  • f is a nonlinear mapping function
  • t is the time at which a trajectory point was collected
  • $\theta$ denotes the learnable nonlinear mapping parameters of the neural network, and $z_i$ is the feature embedding point of the trajectory $Trj_i$ in the low-dimensional feature space after mapping by the encoder network, which is the trajectory feature that is clustered later.
  • The structure of the autoencoder network is shown in Figure 3.
  • The autoencoder network is pre-trained to initialize the network parameters and extract trajectory features.
  • The autoencoder consists of an encoder and a decoder, which are fully symmetrical neural networks.
  • The encoder encodes the input trajectory data and maps the high-dimensional trajectory data features to low-dimensional trajectory data features.
  • The decoder does the opposite: it restores the original input data from the low-dimensional trajectory data features of the autoencoder network.
  • The autoencoder network has 9 layers. The first layer is the input feature dimension of the ship trajectory, set to 682 dimensions; the second and third layers have 500 dimensions, the fourth layer 200 dimensions, the fifth layer 10 dimensions, the sixth layer 200 dimensions, the seventh layer 200 dimensions, the eighth layer 500 dimensions, and the ninth layer, the output data feature dimension, 682 dimensions.
  • The intermediate layers use the ReLU function as the activation function, and the encoder output is a 10-dimensional feature.
  • The network is trained with the mean squared error (MSE) as the loss function, and all the neural networks in the experiment are fully connected.
  • Initialize the cluster centers: to obtain the initial cluster centers, the k-means algorithm based on Euclidean distance can be used.
  • the k-means algorithm based on Euclidean distance clusters the trajectory feature set Z.
  • the number of clusters after clustering is K, and the center of each cluster is μ_j, 1 ≤ j ≤ K.
  • the deep clustering network structure is shown in Figure 4.
  • the encoder part of the pre-trained autoencoder network is taken out and added to the clustering layer to form a deep clustering network.
  • two distributions need to be constructed, and iterative clustering is achieved by drawing the two distributions together.
  • the Euclidean distance between a data point and a cluster center is converted into a conditional probability that represents the probability of assigning the point to that center; the probability q_ij that the trajectory feature embedding point z_i is assigned to the initial center μ_j, also called the soft assignment probability, is computed and serves as the initial target distribution for clustering,
  • z_i is the feature embedding point of trajectory Trj_i in the low-dimensional feature space after mapping by the encoder network
  • μ_j is the center of the j-th cluster
  • α is the degrees of freedom of the t-distribution used in t-SNE, usually set to 1.
  • an auxiliary target distribution p_ij is then constructed; as in Stochastic Neighbor Embedding (SNE), the high-dimensional Euclidean distances between data points are converted into conditional probabilities q_{j|i}
  • the relative entropy (KL divergence) measures the difference between the two distributions and is used as the loss function of the deep clustering network
  • during training, the nonlinear mapping parameters θ (step S2 is only pre-training) and the cluster centers μ_j (k-means yields only the initial cluster centers) are learned simultaneously.
  • real ship AIS data from a sea area are taken as an example below.
  • the data source for the whole experiment consists of AIS data
  • the raw data, stored as .csv files, include the Maritime Mobile Service Identity (MMSI), reception time BaseDataTime, latitude LAT, longitude LON, speed over ground SOG, course over ground COG, heading, navigation Status, VesselType and other attributes; once a ship leaves port for the open sea, its routes diverge.
  • the AIS trajectory data extracted near the port in the embodiment span longitude -123.93299 to -112.64193 and latitude 48.10732 to 48.50108, 104,930 trajectory points in total, shown in Figure 6 after visualization.
  • Outliers are removed.
  • ship trajectories with fewer than 100 AIS points are removed because they do not form a clear route.
  • clump-shaped trajectories are deleted, and points showing an obvious jump are deleted.
  • for example, the last track point shown in Figure 8 has a time attribute that suddenly jumps to the middle of the track, an obvious error that must be deleted.
  • the number of pre-training passes is set to 100
  • the data batch size is 8
  • the iteration stopping threshold is 2×10⁻³
  • the maximum number of iterations is 2×10⁴
  • the preprocessed AIS trajectory data is subjected to deep clustering, and the clustering results are shown in Figure 9.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

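The invention discloses a clustering method for massive high-dimensional AIS trajectory data that offers high accuracy and fast execution. It is implemented as follows: AIS tracks are split into multiple trajectories according to heading information, then preprocessed with linear interpolation and data completion; the preprocessed AIS trajectory data are fed into an autoencoder network for reconstruction training, which outputs dimensionality-reduced trajectory feature embedding points; a k-means algorithm based on Euclidean distance clusters the embedding points to obtain initial cluster centers; the pre-trained encoder is combined with a clustering layer to build a deep clustering network, which computes the soft assignment probability of each embedding point to the initial centers and the auxiliary assignment probability of belonging to a given cluster, and minimizes the KL divergence between the two by gradient descent; when the change in cluster assignments between consecutive iterations falls below a set value, clustering stops and the final clustering result is obtained.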

Description

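Clustering method for massive high-dimensional AIS trajectory data
This application claims priority to Chinese patent application No. 202111012775.7, entitled "Clustering method for massive high-dimensional AIS trajectory data", filed with the China Patent Office on August 31, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to data clustering technology, and more specifically to a trajectory clustering method for the ship Automatic Identification System (AIS) based on deep embedded clustering, used to solve the clustering problem for massive high-dimensional ship AIS trajectory data.
Background Art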
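A spatio-temporal trajectory is a recorded sequence of the positions and times of a moving object. As an important type of spatio-temporal data, trajectories are widely used in research on traffic flow patterns and characteristics, resource allocation, sea ice monitoring and other fields. Analyzing trajectory data reveals similarity features and meaningful trajectory patterns. Ship trajectory data are one kind of spatio-temporal trajectory data; they record a ship's voyage and the corresponding behavioral characteristics. With the widespread installation of AIS on ships, ship trajectory data have become increasingly easy to obtain. AIS trajectory data, with attributes such as position, time, speed, course and rate of turn, are the data source for analyzing ship aggregation characteristics. Mining the valuable information contained in these massive data is of great significance for studying ship traffic behavior patterns and analyzing ship traffic flow characteristics.
AIS is short for Automatic Identification System. It consists of shore-based (base station) facilities and shipborne equipment, and is a digital navigation aid that integrates network, modern communication, computer and electronic information display technologies. A ship's AIS continuously broadcasts information about the ship, which any AIS receiver can pick up. AIS receiving base stations are usually deployed on land and can therefore only receive data from ships within about 60 km of the station; satellite AIS mounts the receiver on a satellite, so AIS messages from ships worldwide can be received without geographic restriction. As ship traffic density in port waters keeps rising, navigation conditions in those waters become more complex, placing higher demands on vessel traffic management.
Ship AIS data mainly comprise static information, dynamic information and voyage-related information, and are obtained mainly through AIS base stations. During a voyage, position, speed and similar information are generally taken directly from the Global Positioning System, encoded and broadcast by the ship's AIS transmitter, and received by nearby ships or shore-based AIS receivers. In this collection process, errors can arise in manual input by the ship's crew, in transmission of the AIS messages, and in storage of the collected information. Raw AIS data also commonly exhibit out-of-order timestamps, abnormal values, missing values, and unequal numbers of trajectory points. To improve the quality of ship AIS trajectory data, preprocessing is therefore necessary before use. Preprocessing of ship AIS trajectory data broadly covers the following aspects: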
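1) Handling missing data. This mainly concerns static data in ship trajectory records, such as ship name, beam and ship type, which can be checked against the maritime authority's ship register or a ship database. If dynamic data are missing, the record is generally treated as erroneous.
2) Dimensionality reduction. Ship trajectory data contain many attributes, not all of which are needed; unneeded attributes can be removed according to the research at hand, giving a simplified representation of the dataset. For example, when only the spatial information of ship trajectories is studied, only the ship position and ship name attributes need be kept.
3) Concept hierarchies for numeric values. Some ship data, such as length, beam and tonnage, can be organized into concept hierarchies according to the application. Container ships, for example, can be classified by length as ultra-large, large, medium and small container ships, and ships are assigned to the corresponding level of the hierarchy.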
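Clustering ship trajectories means dividing objects with similar behavior into the same group, so that differences within a group are as small as possible and differences between groups are as large as possible. The purpose of ship AIS trajectory clustering is to apply a clustering algorithm to trajectory data in order to find trajectory clusters with similar patterns of ship motion, reveal latent relationships between ship trajectories, and analyze ship traffic flow characteristics or the behavior of individual ships. Distance-based ship AIS trajectory clustering essentially partitions objects by the similarity of their trajectory data, and the resulting partition usually optimizes some evaluation function of clustering quality; how to evaluate the distance or similarity between trajectories is therefore one of the key problems. Clustering is an unsupervised data mining method that uses a similarity measure between objects to divide the original dataset into clusters, with high similarity within clusters and low similarity between clusters. Trajectory clustering first compares trajectory feature information to obtain the similarity between trajectories, then groups highly similar trajectories together. Cluster analysis of ship AIS trajectory data supports techniques such as typical route extraction, abnormal trajectory detection, trajectory prediction and traffic flow analysis, and has important application value for navigation safety and for improving the efficiency of port entry and exit. However, compared with the more common pedestrian and vehicle trajectories, ship AIS trajectory data carry, besides spatio-temporal attributes, speed over ground, course over ground, heading, navigation status, ship type and other attributes; the data volume is large and the feature dimension is high, making them typical spatio-temporal trajectory big data.
Existing ship AIS trajectory clustering methods comprise two main steps: (1) similarity measurement, which quantifies the similarity between trajectories; and (2) clustering, which groups similar trajectories together.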
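Similarity is usually measured by the distance between two trajectories; common choices are the Euclidean distance, the Hausdorff distance (HD), the dynamic time warping distance (DTW) and the Fréchet distance (FD). Clustering algorithms mainly include partition-based algorithms represented by K-means, hierarchical algorithms represented by BIRCH, grid-based algorithms represented by STING, spectral algorithms represented by Spectral Clustering, and density-based algorithms represented by DBSCAN. Among distance-based similarity measures, algorithms based on the Hausdorff distance, the longest common subsequence (LCSS) and the edit distance (ED) are in common use. K-means clustering has known drawbacks, such as the need to specify the number of clusters and a strong dependence of the result on the initial cluster centers; where these drawbacks are not addressed, the clustering of ship trajectory information is adversely affected by them.
VRIES et al. treated ship trajectories as time series, computed trajectory similarity with DTW and ED combined with trajectory compression, and clustered ship AIS trajectories with kernel k-means. Both DTW and ED can compute ship trajectory similarity, but in different ways. DTW matches trajectories point by point, which is computationally expensive and sensitive to isolated points; moreover, when two trajectories are dissimilar over a short segment, the clustering result is unsatisfactory. ED can also be used and overcomes the gap problem of DTW, but it remains computationally expensive and sensitive to abnormal trajectories.
Ma Wenyao et al. proposed a spectral clustering method for identifying ship motion patterns based on the one-way distance, defined as the average of the minimum distances from each point on one ship trajectory to the points of another. Exploiting the noise robustness of the one-way distance, the method builds a similarity measure and a similarity matrix for ship AIS trajectories and uses spectral clustering to learn the spatial distribution of trajectories and obtain the normal motion patterns of ships. However, because ship trajectories are sampled at a high rate and the data volume is large, the one-way distances must be computed one by one, which is computationally expensive. Distance-based clustering of ship trajectories is simple and easy to implement, but because of the inherent limitations of distance-based similarity measures it tends to lose local feature information of the trajectories.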
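Liu Tao et al. introduced the concept of the ship into the DBSCAN algorithm to cluster ship trajectories. DBSCAN treats each object as a point mass, whereas ships in real traffic flow differ in size, so within a small water area the clustering result does not reflect the true traffic flow well. LIU et al., taking into account non-spatial attributes in ship trajectory data (such as speed and course), improved DBSCAN by adding two input parameters, the maximum speed variation (MaxSpd) and the maximum course variation (MaxDir). They redefined the core object by jointly considering the object's neighborhood, speed over ground and course over ground, and tuned MaxSpd and MaxDir according to the International Maritime Organization's definition of routes, thereby clustering ship trajectories and extracting the main routes. Compared with distance-based methods, DBSCAN and its variants can discover trajectory clusters of arbitrary shape, are robust to abnormal trajectories, and produce a cluster structure that is independent of the order in which sample trajectories are visited. They also have drawbacks: clustering quality is poor when trajectory density is uneven or inter-cluster spacing varies widely; the neighborhood radius and the number of samples in the neighborhood must be entered manually, so subjective judgment affects the result; and as the volume of trajectory data grows, memory requirements and I/O cost become large.
Because most trajectory clustering methods operate only in the original data space, both accuracy and efficiency are low on large, high-dimensional ship AIS trajectory data; and because similarity measurement is separated from the clustering task, there is no guarantee that the extracted trajectory features suit clustering, which degrades clustering quality. Even DBSCAN, the most widely used algorithm in trajectory clustering, which finds clusters of arbitrary shape and is insensitive to noise, still requires manual selection of trajectory features and preset values for the radius (eps) and the minimum number of points (minPts), and performs poorly and inefficiently when trajectory density is uneven. Each family of methods has its own characteristics: distance-based methods are simple in theory and easy to implement but tend to lose local trajectory information; density-based methods handle trajectory clusters of arbitrary shape but perform poorly when inter-cluster density is uneven; statistical methods rest on mature mathematics but have high computational complexity.
Since ship AIS trajectory data are multi-dimensional spatio-temporal data of large volume, several technical problems in their cluster analysis remain open: how to process massive ship trajectory data efficiently; how to better represent the multi-dimensional attributes of AIS trajectory data in clustering; and how to cluster AIS trajectories while fully accounting for natural conditions such as wind, current and visibility. Visual analysis of ship AIS trajectories, and how to combine the results of AIS trajectory data mining with visualization, also merit further study.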
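Traditional clustering algorithms can be divided into partitional algorithms (e.g. K-means), graph-based algorithms (e.g. spectral clustering) and hierarchical algorithms (e.g. AGNES). The two most widely used are K-means and spectral clustering. K-means determines cluster centers and assigns each data point to a class according to its distance to those centers. General-purpose algorithms such as K-means and GMM are fast and applicable to many problems, but their distance measures are confined to the original data space and they tend to be ineffective when the input dimension is high. Applying K-means or other traditional distance-based algorithms directly, for example computing Euclidean distances on raw-pixel high-dimensional image datasets, is inefficient: computation becomes very time-consuming at high dimension. Traditional reduce-then-cluster pipelines can only learn a linear embedding of the raw data, so many important features are lost.
Spectral clustering is one of the most popular clustering algorithms; it is simple to implement and often outperforms traditional algorithms such as K-means. Its main idea is to treat all data as points in space connected by weighted edges, with low weights between distant points and high weights between nearby points, and to cut the resulting graph so that the total edge weight between different subgraphs is as low as possible and the total edge weight within each subgraph is as high as possible. Although spectral clustering performs well on high-dimensional datasets, the memory and computation needed to compute its feature matrix grow sharply as the dataset grows, making it unsuitable for clustering large real-world datasets. Traditional clustering algorithms therefore struggle to achieve good results on high-dimensional datasets. Existing dimensionality-reduction methods can be applied to the raw high-dimensional data, but they handle only linear structure and cannot model nonlinear relationships.
To address this, deep-learning-based clustering has attracted wide research interest in recent years. Deep learning is widely used in computer vision, image processing and other fields and has proven effective on high-dimensional data. The parameters of deep neural networks are generally learned under the guidance of supervised labels, whereas in unsupervised clustering no labels are available to guide parameter updates. Most current trajectory clustering methods operate only in the original data space; when the volume of trajectory data is large, both clustering quality and efficiency are low, and because similarity measurement is separated from the clustering task, there is no guarantee that the extracted features suit clustering, which affects clustering accuracy and efficiency.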
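In the big-data era, data volume is growing rapidly, and data are becoming more complex and higher-dimensional. In a two-dimensional map, the area available to accommodate points at moderate distance (in the high-dimensional space) is not much larger than the area available for nearby points. In other words, points that are far apart in the high-dimensional space cannot be given enough room in the low-dimensional space. As a result, high-dimensional points, especially those at moderate and large distances, end up crammed together in the low-dimensional space. This is called the "crowding problem": clusters collapse together and cannot be distinguished. For example, high-dimensional data may be well represented after reduction to 10 dimensions but admit no faithful mapping in two dimensions: 11 mutually equidistant points exist in 10 dimensions, but in two dimensions at most 3 points can be mutually equidistant. As the dimension m increases, most data points concentrate near the surface of an m-dimensional ball, and the distribution of their distances to a point x_i becomes highly unbalanced. If these distance relationships are carried directly into low dimension, crowding occurs. A direct consequence is that clusters separated in the high-dimensional space are not clearly separated in low dimension (although they may still form distinguishable blocks), as in SNE visualizations of the MNIST dataset.
t-SNE finds structure in data based on the probability distribution of a random walk on a neighborhood graph; it focuses on local structure and tends to extract local clusters, which is effective for visualizing high-dimensional data that lie on several manifolds at once (such as MNIST). Stochastic Neighbor Embedding (SNE) begins by converting the high-dimensional Euclidean distances between data points into conditional probabilities that represent similarity. To minimize the sum of differences between the conditional probabilities, SNE minimizes the KL divergence by gradient descent. Although SNE gives good visualizations, it is hard to optimize and suffers from the crowding problem. SNE's cost function focuses on the local structure of the data in the map, and optimizing it is very difficult; t-SNE uses a heavy-tailed distribution, which alleviates both the crowding problem and SNE's optimization difficulties. The algorithm computes conditional probabilities and tries to minimize the sum of differences between the high- and low-dimensional probabilities, which involves heavy computation and demands substantial system resources: the time and space complexity of t-SNE are quadratic in the number of data points. In terms of achieved accuracy, comparisons of t-SNE with PCA and other linear dimensionality-reduction models show that t-SNE gives better results, because the algorithm defines a soft boundary between the local and global structure of the data. t-SNE is among the most effective current methods for dimensionality reduction and visualization, but its drawbacks are clear: high memory use and long running time. Because the cost function is non-convex, results vary between runs, and the algorithm must be run several times to select the best result.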
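Summary of the Invention
To meet the clustering needs of massive high-dimensional AIS trajectory data, the object of the present invention is to address the shortcomings of the prior art by providing a clustering method for massive high-dimensional AIS trajectory data with high accuracy and fast execution.
The above object of the present disclosure can be achieved by the following technical solution: a clustering method for massive high-dimensional AIS trajectory data, comprising the following steps: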
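1) AIS trajectory data preprocessing: extract the ship trajectory data; treat trajectory points with the same MMSI number as one track and split it into multiple trajectories according to heading information; delete abnormal points within a trajectory, including points that deviate from all other trajectory points; compute the number of trajectory points to insert after deletion; fill the gaps left by deleted points, and the values missing from the raw AIS data, by linear interpolation and data completion; normalize the interpolated and completed AIS data so that every attribute component of a trajectory point is mapped to the range 0 to 1;
2) Autoencoder pre-training: pre-train an autoencoder network consisting of an encoder and a decoder; feed the preprocessed AIS trajectory data into the autoencoder and train iteratively; after many iterations the autoencoder completes the "input → dimensionality reduction → feature → dimensionality increase → reconstruction" process, the network parameters of the encoder part are successfully initialized, and the dimensionality-reduced trajectory feature embedding points z_i are output;
3) Initializing the cluster centers: apply the k-means algorithm based on Euclidean distance to the set of low-dimensional trajectory features extracted by the encoder part of the autoencoder to obtain the initial cluster centers μ_j;
4) Building the deep clustering network: add a clustering layer to the pre-trained encoder to build the deep clustering network. Following the idea of t-SNE in machine learning, convert the Euclidean distance between a data point and a cluster center into a conditional probability that represents the probability of assigning the point to that center; compute the soft assignment probability of each trajectory feature embedding point z_i to each initial center μ_j, which serves as the initial target distribution for clustering; construct an auxiliary target distribution that measures how the samples are distributed over the clusters; use the KL divergence between the two as the loss function of the deep clustering network; using gradient descent, compute the gradients of the loss L with respect to each embedding point z_i and each cluster center μ_j so as to draw the two distributions together, forming a probability distribution; when the change in cluster assignments between two consecutive iterations is less than a set value, clustering stops and the final clustering result is obtained.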
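Compared with the prior art, the present disclosure has the following beneficial effects.
The invention treats AIS points with the same MMSI number as one track and splits it into multiple trajectories according to heading information; deletes abnormal points within a trajectory, including points that deviate from all other trajectory points; computes the number of trajectory points to insert after deletion; fills the gaps left by deleted points, and the values missing from the raw AIS data, by linear interpolation and data completion; and normalizes the interpolated and completed AIS data so that every attribute component of a trajectory point is mapped to the range 0 to 1. This addresses the difficulty of trajectory similarity measurement and feature extraction, and the low clustering accuracy and computational efficiency, of existing AIS trajectory clustering methods, and thereby improves the final clustering result.
The disclosure uses an autoencoder network consisting of an encoder and a decoder for AIS trajectory feature extraction and dimensionality reduction. Preprocessed AIS trajectories are fed into the network; after many iterations the network completes the "input → dimensionality reduction → feature → dimensionality increase → reconstruction" process, the encoder parameters are successfully initialized, and the dimensionality-reduced trajectory features are output. The encoder part of the trained autoencoder maps the original massive high-dimensional AIS trajectory data to, and represents them in, a 10-dimensional feature space. Compared with traditional approaches such as principal component analysis (PCA) or manual feature engineering, the autoencoder automatically learns a good feature representation.
The disclosure extracts the encoder part of the trained autoencoder and adds a deep embedded clustering layer. The k-means algorithm based on Euclidean distance clusters the dimensionality-reduced embedding points to obtain the initial cluster centers, and the soft assignment probability of each embedding point to the initial centers is computed as the original target distribution. An auxiliary target distribution (the target distribution) is then constructed, the KL divergence measures the distance between the original and the target distribution, and the network is trained iteratively, updating the network parameters and the clustering parameters together. Deep embedded clustering (DEC) uses the feature-extraction capability of deep neural networks to map the original data space to a low-dimensional feature space, learns the feature representation of trajectories automatically in that space, and iteratively optimizes the clustering objective with the KL divergence as the cluster-assignment loss, so that feature representation and cluster assignment are learned simultaneously, improving computational efficiency while maintaining clustering accuracy. It also has the advantage of a reduced complexity of O(nk), where k is the number of cluster centers.
Because traditional clustering algorithms do not cluster high-dimensional big data well, the disclosure, once the autoencoder is trained, takes out the encoder and performs feature-based trajectory clustering: it initializes the soft-assignment clustering layer; measures the similarity between embedding points and cluster centers and computes the soft assignment between them; normalizes the soft assignment using the deep network's feature representation and cluster assignment; and constructs the auxiliary target distribution and the loss function to train the clustering. Using gradient descent, the gradients of the loss L with respect to each embedding point z_i and each cluster center μ_j are computed; a mapping from the data space to the low-dimensional feature space is learned, and the clustering objective is iteratively optimized in the feature space; when the change in cluster assignments between two consecutive iterations is less than a set value, clustering stops and the final clustering result is obtained. The method is simple to implement and effective, applies to different trajectory clustering scenarios, and offers a new solution for clustering massive high-dimensional trajectory data.
The ship AIS trajectory clustering method based on deep embedded clustering proposed in the disclosure does not require an empirically chosen similarity measure and performs similarity measurement and cluster assignment simultaneously, ensuring that both the feature representation and the cluster assignment of trajectory data achieve good results. Compared with the prior art, it has the following beneficial effects:
The invention meets the clustering needs of massive, high-dimensional AIS trajectory data. Extracting trajectory features with the autoencoder in DEC is simple and of low implementation complexity, and the extracted features express most of the information in the original AIS trajectories; they can therefore be used in different algorithms, improving efficiency while preserving accuracy. The initial cluster centers can be obtained with any common clustering algorithm, such as K-means, DBSCAN or STING. In practice K-means is used to compute the initial centers because it is simple and efficient.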
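Brief Description of the Drawings
Figure 1 is a flowchart of the clustering of massive high-dimensional AIS trajectory data according to the invention;
Figure 2 is a schematic of DEC-based AIS trajectory clustering;
Figure 3 shows the structure of the autoencoder network;
Figure 4 shows the structure of the deep clustering network;
Figure 5 shows AIS trajectory extraction according to the invention;
Figure 6 shows AIS outlier deletion according to the invention;
Figure 7 shows AIS data interpolation according to the invention;
Figure 8 visualizes the preprocessed AIS data according to the invention;
Figure 9 shows the result of deep embedded clustering of AIS data according to the invention;
Figure 10 is decomposed view 1 of the deep embedded clustering of AIS data;
Figure 11 is decomposed view 2 of the deep embedded clustering of AIS data;
Figure 12 is decomposed view 3 of the deep embedded clustering of AIS data.
The concept, specific structure and technical effects of the invention are further described below with reference to the drawings and embodiments, so that its objects, features and effects can be fully understood.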
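Detailed Description
Refer to Figures 1 to 5. According to the invention, the following steps are used:
1) AIS trajectory data preprocessing: extract the ship trajectory data; treat trajectory points with the same MMSI number as one track and split it into multiple trajectories according to heading information; delete abnormal points within a trajectory, including points that deviate from all other trajectory points; compute the number of trajectory points to insert after deletion; fill the gaps left by deleted points, and the values missing from the raw AIS data, by linear interpolation and data completion; normalize the interpolated and completed AIS data so that every attribute component of a trajectory point is mapped to the range 0 to 1;
2) Autoencoder pre-training: pre-train an autoencoder network consisting of an encoder and a decoder; feed the preprocessed AIS trajectory data into the autoencoder and train iteratively; after many iterations the autoencoder completes the "input → dimensionality reduction → feature → dimensionality increase → reconstruction" process, the network parameters of the encoder part are successfully initialized, and the dimensionality-reduced trajectory feature embedding points z_i are output;
3) Initializing the cluster centers: apply the k-means algorithm based on Euclidean distance to the set of low-dimensional trajectory features extracted by the encoder part of the autoencoder to obtain the initial cluster centers μ_j;
4) Building the deep clustering network: add a clustering layer to the pre-trained encoder to build the deep clustering network. Following the idea of t-SNE in machine learning, convert the Euclidean distance between a data point and a cluster center into a conditional probability that represents the probability of assigning the point to that center; compute the soft assignment probability of each trajectory feature embedding point z_i to each initial center μ_j (the soft assignment probability is the initial target distribution for clustering); and construct an auxiliary target distribution that measures how the samples are distributed over the clusters (the target distribution). To draw the two distributions together, the KL divergence is used as the loss function of the deep clustering network. Using gradient descent, the gradients of the loss L with respect to each embedding point z_i and each cluster center μ_j are computed, drawing the two distributions together and forming a probability distribution; when the change in cluster assignments between two consecutive iterations is less than a set value, clustering stops and the final clustering result is obtained.
The implementation can be divided into four parts: 1) AIS trajectory data preprocessing; 2) pre-training the autoencoder network and extracting trajectory features; 3) initializing the cluster centers; 4) building the deep clustering network and clustering.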
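In AIS data preprocessing, when the same ship travels back and forth several times within the region, its track is split into multiple trajectories according to heading information. A trajectory point is p_i = (t, lon, lat, sog, head), and the i-th trajectory is written as
$$T_i = (p_{i1}, p_{i2}, p_{i3}, \ldots, p_{in}) \tag{1}$$
where the trajectory points are indexed 1, 2, …, n; n is the number of trajectory points in the trajectory; t is the time at which a trajectory point was collected; lon is longitude; lat is latitude; sog is speed over ground; and head is heading.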
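The source does not state how heading information is used to split a track. A minimal sketch, assuming each point is a row (t, lon, lat, sog, head) and assuming a cut wherever the heading changes by more than a threshold between consecutive points; the function name `split_by_heading` and the 90° default are hypothetical choices for illustration, not the patent's rule:

```python
import numpy as np

def split_by_heading(points, max_turn_deg=90.0):
    """Split one vessel's time-ordered points at sharp heading changes.

    points: array of rows (t, lon, lat, sog, head), head in degrees.
    max_turn_deg: assumed threshold; the source gives no value.
    """
    points = np.asarray(points, dtype=float)
    head = points[:, 4]
    # Circular difference in degrees, so 350 -> 10 counts as a 20 degree turn.
    turn = np.abs((np.diff(head) + 180.0) % 360.0 - 180.0)
    cuts = np.flatnonzero(turn > max_turn_deg) + 1
    return np.split(points, cuts)
```

Grouping rows by MMSI beforehand (for example with a dictionary keyed on MMSI) would give the per-vessel input this function expects.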
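Outlier deletion. Abnormal points within a trajectory, such as points with negative speed or points that deviate from all other trajectory points, are deleted. In addition, any trajectory containing fewer than half the average number of points per trajectory is deleted in its entirety and takes no part in the subsequent trajectory clustering.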
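The two deletion rules can be sketched as follows. This is an illustration under assumptions: rows are (t, lon, lat, sog, head); "deviates from all other trajectory points" is approximated by distance from the trajectory's median position; and `max_dev_deg` is a hypothetical threshold not given in the source.

```python
import numpy as np

# Assumed column layout: t, lon, lat, sog, head
T, LON, LAT, SOG, HEAD = range(5)

def drop_outlier_points(traj, max_dev_deg=1.0):
    """Remove points with negative speed or far from the median position."""
    traj = np.asarray(traj, dtype=float)
    median_pos = np.median(traj[:, [LON, LAT]], axis=0)
    deviation = np.linalg.norm(traj[:, [LON, LAT]] - median_pos, axis=1)
    keep = (traj[:, SOG] >= 0) & (deviation <= max_dev_deg)
    return traj[keep]

def drop_short_trajectories(trajs):
    """Delete trajectories with fewer than half the mean number of points."""
    mean_len = np.mean([len(t) for t in trajs])
    return [t for t in trajs if len(t) >= mean_len / 2]
```

A median-based test suits compact port-area trajectories; for long routes a per-segment speed or distance check would be a more natural assumption.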
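Data interpolation. The gaps left by deleted abnormal points, and intermediate values missing from the raw AIS data, are filled by linear interpolation. When the time interval between two adjacent trajectory points exceeds a given threshold, the number of points to insert is computed first and interpolation is then performed. The time interval between the two points to be interpolated gives the number of trajectory points to insert:
[Equation (2), an image in the source (PCTCN2022083839-appb-000001): the number N of trajectory points to insert, computed from the time interval t(p_b − p_a) and the threshold t_threshold]
Once N is known, the longitude, latitude, speed over ground and heading of the trajectory are interpolated to compute the ship trajectory data p_i missing within the time interval:
[Equation (3), an image in the source (PCTCN2022083839-appb-000002): linear interpolation of the missing trajectory points p_i between p_a and p_b]
where t(p_b − p_a) is the time interval between trajectory points p_b and p_a, and t_threshold is the predefined time threshold.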
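Gap filling by linear interpolation might look like the sketch below. The exact formula for N is not recoverable from the source, so the rule used here (the smallest N that makes every resulting gap at most t_threshold) is an assumption, as is the function name `interpolate_gap`.

```python
import math
import numpy as np

def interpolate_gap(p_a, p_b, t_threshold):
    """Return the N points linearly interpolated between p_a and p_b.

    Points are rows (t, lon, lat, sog, head). N is chosen so that every
    resulting time gap is at most t_threshold (assumed rule, see lead-in).
    """
    p_a = np.asarray(p_a, dtype=float)
    p_b = np.asarray(p_b, dtype=float)
    gap = p_b[0] - p_a[0]
    if gap <= t_threshold:
        return np.empty((0, p_a.size))       # nothing to insert
    n = math.ceil(gap / t_threshold) - 1
    fractions = np.arange(1, n + 1) / (n + 1)
    # All attributes, including time, are interpolated linearly.
    return p_a + fractions[:, None] * (p_b - p_a)
```

Heading is interpolated as a plain number here; a pair of headings that straddles 0°/360° would need circular interpolation instead.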
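Data completion. Because the AIS reporting rate varies with ship speed, trajectories differ in length. To meet the input requirements of the neural network in the subsequent DEC stage, ship trajectories of different lengths must be converted to a fixed length, and the longest trajectory in the dataset is taken as the standard length. Since ships sailing the same route should share the same start and end positions, completion is done at both ends: the padded trajectory points differ only in their time attribute, with all other attributes unchanged. After completion, all trajectories have the same standard length.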
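A minimal sketch of two-end completion, assuming rows (t, lon, lat, sog, head) with time in column 0. How the missing points are divided between the two ends and the time step `dt` are not specified in the source; the even split and `dt` below are assumptions.

```python
import numpy as np

def pad_both_ends(traj, target_len, dt=1.0):
    """Pad a trajectory to target_len rows by repeating its end points.

    Copies of the first point are prepended and copies of the last point are
    appended; only the time column (index 0) changes, in steps of dt.
    """
    traj = np.asarray(traj, dtype=float)
    missing = target_len - len(traj)
    if missing <= 0:
        return traj.copy()
    n_head = missing // 2            # assumed: split the padding evenly
    n_tail = missing - n_head
    head = np.repeat(traj[:1], n_head, axis=0)
    head[:, 0] = traj[0, 0] - dt * np.arange(n_head, 0, -1)
    tail = np.repeat(traj[-1:], n_tail, axis=0)
    tail[:, 0] = traj[-1, 0] + dt * np.arange(1, n_tail + 1)
    return np.vstack([head, traj, tail])
```

In use, `target_len` would be the length of the longest trajectory in the dataset, as the text describes.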
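Data normalization. To speed up network training and improve computational efficiency, every attribute component of a trajectory point is mapped to the range 0 to 1. (Given N high-dimensional data points x_1, x_2, …, x_N, where N is the number of samples rather than the dimension, t-SNE first computes probabilities p_{ji} proportional to the similarity between data points x_i and x_j.) The interpolated and completed AIS data are normalized to give the normalized attribute value x' for longitude lon, latitude lat, speed over ground sog and heading head:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{4}$$
where x is the attribute value before normalization (longitude lon, latitude lat, speed over ground sog or heading head), x_max is the maximum attribute value, x_min is the minimum attribute value, and x' is the normalized attribute value. All attribute values of all trajectory points are then mapped to the range 0 to 1.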
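Min-max normalization per attribute column can be sketched as below. The guard for constant columns (`eps`) is an addition for numerical safety and is not part of the source.

```python
import numpy as np

def min_max_normalize(points, eps=1e-12):
    """Scale each attribute column to [0, 1]: x' = (x - x_min) / (x_max - x_min)."""
    points = np.asarray(points, dtype=float)
    x_min = points.min(axis=0)
    x_max = points.max(axis=0)
    # Constant columns would divide by zero; map them to 0 instead (assumption).
    span = np.where(x_max - x_min > eps, x_max - x_min, 1.0)
    return (points - x_min) / span
```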
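In autoencoder pre-training, trajectory features are extracted by feeding the preprocessed trajectories into the autoencoder network for training. After preprocessing, a raw AIS trajectory is passed through many training iterations of the network, completing the "input → dimensionality reduction → feature → dimensionality increase → reconstruction" process; the trajectory data take the form
$$Trj_i = (p_{i1}, p_{i2}, \ldots, p_{im}) \tag{5}$$
When iterative training is complete, that is, when the output closely approximates the input, the autoencoder has completed the "input → dimensionality reduction → feature → dimensionality increase → reconstruction" process and the network parameters of its encoder part are successfully initialized. The encoder output is then the dimensionality-reduced feature of trajectory Trj_i, and the encoder can be regarded as a neural network that maps the high-dimensional data space to a low-dimensional data space:
$$f(Trj_i, \theta) = z_i \tag{6}$$
where p_i is the i-th trajectory point, i = 1, 2, …, m; m is the number of trajectory points in the trajectory; f is the nonlinear mapping function; t is the time at which a trajectory point was collected; θ denotes the learnable nonlinear mapping parameters of the neural network; and z_i is the feature embedding point of trajectory Trj_i in the low-dimensional feature space after mapping by the encoder network, i.e. the trajectory feature that is clustered subsequently.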
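Refer to Figure 3, which shows the structure of the autoencoder network. The autoencoder is pre-trained to initialize the network parameters and extract trajectory features. The autoencoder consists of an encoder and a decoder that form a fully symmetric neural network. The encoder encodes the input trajectory data, mapping high-dimensional trajectory data features to low-dimensional ones; the decoder does the opposite, restoring the original input data from the low-dimensional trajectory data features. In this embodiment the autoencoder network has 9 layers: layer 1 is the input feature dimension of the ship trajectory, set to 682; layers 2 and 3 are 500-dimensional; layer 4 is 200-dimensional; layer 5 is 10-dimensional; layer 6 is 200-dimensional; layers 7 and 8 are 500-dimensional; and layer 9 is the 682-dimensional data feature dimension. The intermediate layers all use the ReLU function as the activation function, and the feature extracted by the network (the output of layer 5) is 10-dimensional. To measure the difference between the input and output vectors, training uses the mean squared error (MSE) as the loss function. All neural networks in the experiments are fully connected.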
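The layer layout can be illustrated with a forward pass in plain NumPy. This is a sketch under assumptions, not the embodiment's implementation: the layer sizes follow the symmetric reading 682-500-500-200-10-200-500-500-682, the 10-dimensional code layer and the output layer are left linear (a common DEC convention that the source does not confirm), and the He-style initialization is arbitrary.

```python
import numpy as np

# Assumed symmetric reading of the 9-layer description.
LAYER_SIZES = [682, 500, 500, 200, 10, 200, 500, 500, 682]

def init_params(sizes, seed=0):
    """Random weights and zero biases for each fully connected layer."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, np.sqrt(2.0 / m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(x, params, code_layer=4):
    """Return (embedding z, reconstruction) for a batch x of shape (batch, 682)."""
    h, z = x, None
    last = len(params) - 1
    for k, (w, b) in enumerate(params):
        h = h @ w + b
        if k + 1 == code_layer:
            z = h                      # 10-dimensional embedding, left linear
        elif k != last:
            h = np.maximum(h, 0.0)     # ReLU on the other hidden layers
    return z, h

def mse(x, x_hat):
    """Mean squared reconstruction error."""
    return float(np.mean((x - x_hat) ** 2))
```

Pre-training would minimize `mse(x, x_hat)` over the parameters with a gradient-based optimizer; after training, only the first four weight matrices (the encoder) are kept.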
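Initializing the cluster centers: the initial cluster centers can be obtained with the k-means algorithm based on Euclidean distance. k-means clusters the trajectory feature set Z into K clusters with centers μ_j, 1 ≤ j ≤ K. The trajectory dataset (Trj_1, Trj_2, …) yields, after feature extraction by the autoencoder, the low-dimensional feature set Z = (z_1, z_2, …), and the set of initial centers is μ = (μ_1, μ_2, …, μ_K).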
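For illustration, a compact Lloyd-style k-means on the embedding set Z. The random initialization from data points and the iteration cap are assumptions; a library implementation would normally be used in practice.

```python
import numpy as np

def kmeans(z, k, n_iter=100, seed=0):
    """Lloyd's k-means with Euclidean distance; returns (centers, labels)."""
    z = np.asarray(z, dtype=float)
    rng = np.random.default_rng(seed)
    centers = z[rng.choice(len(z), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center.
        d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Empty clusters keep their previous center.
        new_centers = np.array([z[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

The returned centers play the role of the initial μ_j that the clustering layer then refines.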
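Refer to Figure 4, which shows the structure of the deep clustering network. The encoder part of the pre-trained autoencoder network is taken out and combined with a clustering layer to form the deep clustering network. To measure the similarity between trajectories, two distributions are constructed following the idea of t-SNE in machine learning, and iterative clustering is achieved by drawing the two distributions together. First, the Euclidean distance between a data point and a cluster center is converted into a conditional probability that represents the probability of assigning the point to that center; the probability q_ij that the trajectory feature embedding point z_i is assigned to the initial center μ_j, also called the soft assignment probability, serves as the initial target distribution for clustering:
$$q_{ij} = \frac{\left(1 + \lVert z_i - \mu_j \rVert^2 / \alpha\right)^{-\frac{\alpha+1}{2}}}{\sum_{j'} \left(1 + \lVert z_i - \mu_{j'} \rVert^2 / \alpha\right)^{-\frac{\alpha+1}{2}}} \tag{7}$$
where z_i is the feature embedding point of trajectory Trj_i in the low-dimensional feature space after mapping by the encoder network, μ_j is the center of the j-th cluster, and α is the degrees of freedom of the t-distribution used in t-SNE, usually set to 1.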
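The soft assignment can be sketched in NumPy as a Student's t kernel between embeddings and centers, normalized over clusters. The formula follows the standard DEC form, which is assumed here to match the image-only equation in the source; `soft_assign` is a hypothetical name.

```python
import numpy as np

def soft_assign(z, mu, alpha=1.0):
    """Soft assignment q[i, j] of embeddings z (n, d) to centers mu (K, d)."""
    z = np.asarray(z, dtype=float)
    mu = np.asarray(mu, dtype=float)
    d2 = ((z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    kernel = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return kernel / kernel.sum(axis=1, keepdims=True)   # each row sums to 1
```

With α = 1 the kernel reduces to 1 / (1 + ‖z_i − μ_j‖²), so a point at squared distance 1 from one center and 0 from another gets weights 1/3 and 2/3.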
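Next, an auxiliary target distribution p_ij is constructed. As in Stochastic Neighbor Embedding (SNE), the high-dimensional Euclidean distances between data points are converted into conditional probabilities q_{j|i} that represent similarity, giving the probability q_ij that trajectory i is assigned to cluster center μ_j; the auxiliary distribution is then
$$p_{ij} = \frac{q_{ij}^2 / f_j}{\sum_{j'} q_{ij'}^2 / f_{j'}} \tag{8}$$
where the soft cluster frequency is
$$f_j = \sum_i q_{ij}$$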
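A sketch of the auxiliary target distribution, again assuming the standard DEC form: the soft assignments are squared, divided by the soft cluster frequencies, and renormalized per sample, which sharpens confident assignments.

```python
import numpy as np

def target_distribution(q):
    """Auxiliary target p[i, j] computed from soft assignments q of shape (n, K)."""
    q = np.asarray(q, dtype=float)
    f = q.sum(axis=0)                 # soft cluster frequencies f_j
    weight = q ** 2 / f
    return weight / weight.sum(axis=1, keepdims=True)
```

For example, a row (2/3, 1/3) with equal cluster frequencies becomes (0.8, 0.2): the dominant assignment is emphasized.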
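Constructing the loss function and training the clustering. To bring the soft assignment probabilities q_ij of the clustering layer into agreement with the auxiliary target distribution p_ij, the relative entropy (KL divergence) measures the difference between the two distributions and serves as the loss function of the deep clustering network:
$$L = \mathrm{KL}(P \,\|\, Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}} \tag{9}$$
Using gradient descent, the gradients of the loss L with respect to each feature embedding point z_i and each cluster center μ_j are:
$$\frac{\partial L}{\partial z_i} = \frac{\alpha+1}{\alpha} \sum_j \left(1 + \frac{\lVert z_i - \mu_j \rVert^2}{\alpha}\right)^{-1} (p_{ij} - q_{ij})(z_i - \mu_j), \qquad \frac{\partial L}{\partial \mu_j} = -\frac{\alpha+1}{\alpha} \sum_i \left(1 + \frac{\lVert z_i - \mu_j \rVert^2}{\alpha}\right)^{-1} (p_{ij} - q_{ij})(z_i - \mu_j) \tag{10}$$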
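During training, the nonlinear mapping parameters θ (step S2 is only pre-training) and the cluster centers μ_j (k-means yields only the initial cluster centers) are learned simultaneously.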
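Refer to Figure 6. Real ship AIS data from a sea area are taken as an example. The experimental data source consists of AIS data stored as .csv files; the raw data contain attributes including the Maritime Mobile Service Identity (MMSI), reception time BaseDataTime, latitude LAT, longitude LON, speed over ground SOG, course over ground COG, heading Heading, navigation status Status and vessel type VesselType. Once a ship leaves port for the open sea, its routes diverge. The embodiment therefore extracts AIS trajectory data near the port, spanning longitude -123.93299 to -112.64193 and latitude 48.10732 to 48.50108, 104,930 trajectory points in total, visualized in Figure 6.
Outlier deletion. In the embodiment, ship trajectories with fewer than 100 AIS points are removed because they do not form a clear route; clump-shaped trajectories are deleted based on the maximum distance between two points of a ship trajectory; and points showing an obvious jump are deleted, such as the last trajectory point shown in Figure 8, whose time attribute suddenly jumps to the middle of the trajectory, an obvious error.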
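Refer to Figure 9. This embodiment sets the number of pre-training passes to 100, the batch size to 8, the iteration stopping threshold to 2×10⁻³, the maximum number of iterations to 2×10⁴, the initial number of clusters to 18, and the degrees of freedom of the t-distribution to α = 1. The preprocessed AIS trajectory data are deep-clustered; the clustering result is shown in Figure 9.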
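The stopping rule can be sketched as below, assuming "change in cluster assignments" means the fraction of samples whose hard label (the argmax of q_ij) differs between two consecutive iterations. That interpretation, and the function names, are assumptions; the defaults mirror the threshold and iteration cap quoted in the embodiment.

```python
import numpy as np

def assignment_change(prev_labels, labels):
    """Fraction of samples whose hard cluster label changed."""
    return float(np.mean(np.asarray(prev_labels) != np.asarray(labels)))

def should_stop(prev_labels, labels, iteration, tol=2e-3, max_iter=20000):
    """Stop when assignments are stable or the iteration cap is reached."""
    return iteration >= max_iter or assignment_change(prev_labels, labels) < tol
```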
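The above is a preferred embodiment of the invention. It should be noted that the embodiment illustrates rather than limits the invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. Those of ordinary skill in the art may make various modifications and improvements without departing from the spirit and substance of the invention, and such modifications and improvements are also regarded as falling within the scope of protection of the invention.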

Claims (10)

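  1. A clustering method for massive high-dimensional AIS trajectory data, comprising the following steps:
    1) AIS trajectory data preprocessing: extracting ship trajectory data; treating trajectory points with the same MMSI number as one track and splitting it into multiple trajectories according to heading information; deleting abnormal points within a trajectory, including points that deviate from all other trajectory points; computing the number of trajectory points to be inserted after deletion; filling the gaps left by deleted points, and the values missing from the raw AIS data, by linear interpolation and data completion; and normalizing the interpolated and completed ship Automatic Identification System (AIS) data so that every attribute component of a trajectory point is mapped to the range 0 to 1;
    2) autoencoder pre-training: pre-training an autoencoder network consisting of an encoder and a decoder; feeding the preprocessed AIS trajectory data into the autoencoder network and training iteratively; after many iterations, completing the "input → dimensionality reduction → feature → dimensionality increase → reconstruction" process of the autoencoder network; and, once the parameters of the encoder part of the autoencoder network are successfully initialized, outputting the dimensionality-reduced trajectory feature embedding points z_i;
    3) initializing the cluster centers: applying the k-means algorithm based on Euclidean distance to the set of low-dimensional trajectory features extracted by the encoder part of the autoencoder to obtain the initial cluster centers μ_j;
    4) building the deep clustering network: adding a clustering layer to the pre-trained encoder to build the deep clustering network; following the idea of t-SNE in machine learning, constructing two distributions and achieving iterative clustering by drawing the two distributions together;
    first converting the Euclidean distance between a data point and a cluster center into a conditional probability that represents the probability of assigning the point to that center, and computing the soft assignment probability of each trajectory feature embedding point z_i to each initial center μ_j; from this initial clustering target distribution, constructing an auxiliary target distribution that measures the clustering target distribution of the samples; using the KL divergence as the loss function of the deep clustering network; using gradient descent to compute the gradients of the loss L with respect to each trajectory feature embedding point z_i and each cluster center μ_j, drawing the two distributions together and forming a probability distribution; and, when the change in cluster assignments between two consecutive iterations is less than a set value, stopping clustering and obtaining the final clustering result.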
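  2. The clustering method for massive high-dimensional AIS trajectory data according to claim 1, wherein: in AIS data preprocessing, when the same ship travels back and forth several times within the region, its track is split into multiple trajectories according to heading information, a trajectory point being p_i = (t, lon, lat, sog, head) and the j-th trajectory being expressed as
    $$T_j = (p_{j1}, p_{j2}, p_{j3}, \ldots, p_{jn})$$
    where i = 1, 2, …, n; n is the number of trajectory points in the trajectory; t is the time at which a trajectory point was collected; lon is longitude; lat is latitude; sog is speed over ground; and head is heading.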
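  3. The clustering method for massive high-dimensional AIS trajectory data according to claim 2, wherein: the gaps left by deleted abnormal points, and intermediate values missing from the raw AIS data, are filled by linear interpolation; when the time interval between two adjacent trajectory points exceeds a given threshold, the number of trajectory points to insert is computed and interpolation is then performed; the time interval between the two trajectory points b and a to be interpolated is computed first, giving the number of trajectory points to insert:
    [Equation image PCTCN2022083839-appb-100002 in the source: the number N of trajectory points to insert, computed from the time interval t(p_b − p_a) and the threshold t_threshold]
    once the number N of inserted trajectory points is known, the longitude, latitude, speed over ground and heading of the trajectory are interpolated to compute the ship trajectory data p_i missing within the time interval:
    [Equation image PCTCN2022083839-appb-100003 in the source: linear interpolation of the missing trajectory points p_i between p_a and p_b]
    where t(p_b − p_a) is the time interval between trajectory points p_b and p_a, and t_threshold is the predefined time threshold.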
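  4. The clustering method for massive high-dimensional AIS trajectory data according to claim 1, wherein: to speed up network training and improve computational efficiency, given N high-dimensional data points x_1, x_2, …, x_N, probabilities p_{ji} proportional to the similarity between data points x_i and x_j are first computed; every attribute component of a trajectory point is mapped to the range 0 to 1; and the interpolated and completed AIS data are normalized to give the normalized attribute value x' for longitude lon, latitude lat, speed over ground sog and heading head:
    $$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
    all attribute values of all trajectory points then being mapped to the range 0 to 1;
    where N is the number of data samples, x is the attribute value before normalization, x_max is the maximum attribute value, and x_min is the minimum attribute value.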
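  5. The clustering method for massive high-dimensional AIS trajectory data according to claim 1, wherein: in autoencoder pre-training, trajectory features are extracted by feeding the preprocessed trajectory Trj_i into the autoencoder network for training; after preprocessing, the raw AIS trajectory T_i, with trajectory points p_i = (t, lon, lat, sog, head), is passed through many training iterations of the network, completing the "input → dimensionality reduction → feature → dimensionality increase → reconstruction" process and forming the trajectory feature data
    $$Trj_i = (p_{i1}, p_{i2}, \ldots, p_{im}) \tag{5}$$
    the encoder outputs the dimensionality-reduced feature of the trajectory feature data Trj_i, the encoder being a neural network that maps the high-dimensional data space to a low-dimensional data space:
    $$f(Trj_i, \theta) = z_i \tag{6}$$
    where p_i is the i-th trajectory point, i = 1, 2, …, m; m is the number of trajectory points in the trajectory; f is the nonlinear mapping function; t is the time at which a trajectory point was collected; θ denotes the learnable nonlinear mapping parameters of the neural network; and z_i is the feature embedding point of trajectory Trj_i in the low-dimensional feature space after mapping by the encoder network.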
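  6. The clustering method for massive high-dimensional AIS trajectory data according to claim 1, wherein: the autoencoder consists of an encoder and a decoder that form a fully symmetric neural network; the encoder encodes the input trajectory data, mapping high-dimensional trajectory data features to low-dimensional trajectory data features; the decoder does the opposite of the encoder, restoring the original input data from the low-dimensional trajectory data features of the autoencoder network; the autoencoder network has 9 layers, in which layer 1 is the input feature dimension of the ship trajectory, set to 682, layers 2 and 3 are 500-dimensional, layer 4 is 200-dimensional, layer 5 is 10-dimensional, layer 6 is 200-dimensional, layers 7 and 8 are 500-dimensional, and layer 9 is the 682-dimensional data feature dimension; the intermediate layers all use the ReLU function as the activation function; and the feature extracted by the autoencoder network is 10-dimensional.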
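  7. The clustering method for massive high-dimensional AIS trajectory data according to claim 1, wherein: to obtain the initial cluster centers, the k-means algorithm based on Euclidean distance is applied to the trajectory feature set Z to obtain the initial centers; the number of clusters after clustering is K and the center of each cluster is μ_j, 1 ≤ j ≤ K; the trajectory dataset (Trj_1, Trj_2, …) yields, after feature extraction by the autoencoder, the low-dimensional feature set Z = (z_1, z_2, …); and the set of initial centers is μ = (μ_1, μ_2, …, μ_K).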
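  8. The clustering method for massive high-dimensional AIS trajectory data according to claim 1, wherein: the dimensionality-reduction and reconstruction capability of the autoencoder network is used to reduce the dimension of the original high-dimensional data and extract features, after which the encoder part is taken out and a clustering layer is added to build the deep embedded clustering network; following the idea of t-SNE in machine learning, the Euclidean distance between a data point and a cluster center is converted into a conditional probability that represents the probability of assigning the point to that center, and the soft assignment probability of the trajectory feature embedding point z_i to the initial center μ_j (the initial clustering target distribution) is computed as
    $$q_{ij} = \frac{\left(1 + \lVert z_i - \mu_j \rVert^2 / \alpha\right)^{-\frac{\alpha+1}{2}}}{\sum_{j'} \left(1 + \lVert z_i - \mu_{j'} \rVert^2 / \alpha\right)^{-\frac{\alpha+1}{2}}}$$
    where z_i is the feature embedding point of trajectory Trj_i in the low-dimensional feature space after mapping by the encoder network, μ_j is the center of the j-th cluster, and α is the degrees of freedom of the t-distribution used in t-SNE, usually set to 1.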
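  9. The clustering method for massive high-dimensional AIS trajectory data according to claim 8, wherein: an auxiliary target distribution with probability values p_ij, expressed by the following formula, is constructed to measure how the samples are distributed over the clusters; the high-dimensional Euclidean distances between data points are converted into conditional probabilities q_{j|i} that represent similarity, giving the probability value p_ij for assigning trajectory i to cluster center μ_j,
    $$p_{ij} = \frac{q_{ij}^2 / f_j}{\sum_{j'} q_{ij'}^2 / f_{j'}}$$
    where the soft cluster frequency is
    $$f_j = \sum_i q_{ij}$$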
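  10. The clustering method for massive high-dimensional AIS trajectory data according to claim 1, wherein: a loss function is constructed to train the clustering; to bring the soft assignment probabilities q_ij of the clustering layer into agreement with the auxiliary target distribution p_ij, the relative entropy (KL divergence) measures the difference between the two distributions and serves as the loss function of the deep clustering network,
    $$L = \mathrm{KL}(P \,\|\, Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}$$
    during training, gradient descent is used to compute the gradients of the loss L with respect to each feature embedding point z_i and each cluster center μ_j, as follows:
    $$\frac{\partial L}{\partial z_i} = \frac{\alpha+1}{\alpha} \sum_j \left(1 + \frac{\lVert z_i - \mu_j \rVert^2}{\alpha}\right)^{-1} (p_{ij} - q_{ij})(z_i - \mu_j), \qquad \frac{\partial L}{\partial \mu_j} = -\frac{\alpha+1}{\alpha} \sum_i \left(1 + \frac{\lVert z_i - \mu_j \rVert^2}{\alpha}\right)^{-1} (p_{ij} - q_{ij})(z_i - \mu_j)$$
    and the nonlinear mapping parameters θ and the cluster centers μ_j are learned simultaneously.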
PCT/CN2022/083839 2021-08-31 2022-03-29 Clustering method for massive high-dimensional AIS trajectory data WO2023029461A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111012775.7A CN113780395B (zh) 2021-08-31 2021-08-31 Clustering method for massive high-dimensional AIS trajectory data
CN202111012775.7 2021-08-31

Publications (1)

Publication Number Publication Date
WO2023029461A1 true WO2023029461A1 (zh) 2023-03-09

Family

ID=78840433

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/083839 WO2023029461A1 (zh) 2021-08-31 2022-03-29 Clustering method for massive high-dimensional AIS trajectory data

Country Status (2)

Country Link
CN (1) CN113780395B (zh)
WO (1) WO2023029461A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200133269A1 (en) * 2018-10-30 2020-04-30 The Regents Of The University Of Michigan Unsurpervised classification of encountering scenarios using connected vehicle datasets
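CN111178427A (zh) * 2019-12-27 2020-05-19 杭州电子科技大学 Deep autoencoder embedded clustering method based on Sliced-Wasserstein distance
CN111694913A (zh) * 2020-06-05 2020-09-22 海南大学 Ship AIS trajectory clustering method and device based on a convolutional autoencoder
CN112884010A (zh) * 2021-01-25 2021-06-01 浙江师范大学 Autoencoder-based multimodal adaptive fusion deep clustering model and method
CN113780395A (zh) * 2021-08-31 2021-12-10 西南电子技术研究所(中国电子科技集团公司第十研究所) Clustering method for massive high-dimensional AIS trajectory data
CN113988203A (zh) * 2021-11-01 2022-01-28 之江实验室 Trajectory sequence clustering method based on deep learning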

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Master's Thesis", 1 June 2020, HEFEI UNIVERSITY OF TECHNOLOGY, CN, article WEN, PENGFEI: "Research on Track Recognition and Clustering Based on Radar Data", pages: 1 - 68, XP009544124, DOI: 10.27101/d.cnki.ghfgu.2020.001861 *

Also Published As

Publication number Publication date
CN113780395A (zh) 2021-12-10
CN113780395B (zh) 2023-02-03

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22862608

Country of ref document: EP

Kind code of ref document: A1