CN110363115B - AIS (automatic identification system) track data based ship operation abnormity semi-supervised real-time detection method - Google Patents


Info

Publication number
CN110363115B
CN110363115B CN201910574738.1A CN201910574738A CN110363115B CN 110363115 B CN110363115 B CN 110363115B CN 201910574738 A CN201910574738 A CN 201910574738A CN 110363115 B CN110363115 B CN 110363115B
Authority
CN
China
Prior art keywords
sample
model
time
anomaly
real
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910574738.1A
Other languages
Chinese (zh)
Other versions
CN110363115A (en
Inventor
钱诗友
程彬
曹健
薛广涛
李明禄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN201910574738.1A
Publication of CN110363115A
Application granted
Publication of CN110363115B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Alarm Systems (AREA)

Abstract

The invention provides a semi-supervised real-time method for detecting ship operation anomalies based on AIS trajectory data. During port or river dredging operations, some work vessels do not transport the sludge to the designated dumping area as required but dump it near the work area, so the sludge may return to the port or channel within a short time. The invention provides a semi-supervised method for detecting dredging operation anomalies based on Automatic Identification System (AIS) data. First, the invention establishes a feature system for extracting behavioral features from AIS data. Then, t-distributed stochastic neighbor embedding (t-SNE) is combined with a Gaussian mixture model (GMM) through a neural network, and the detection model is trained in a semi-supervised manner. With the trained model, abnormal behavior during dredging operations can be detected effectively in real time.

Description

AIS (automatic identification system) track data based ship operation abnormity semi-supervised real-time detection method
Technical Field
The invention relates to the detection of abnormal behavior of vessels engaged in port or river dredging operations, and in particular to a semi-supervised real-time method for detecting ship operation anomalies based on AIS (Automatic Identification System) trajectory data.
Background
Both ports and rivers require regular maintenance dredging to keep ports and channels navigable. During a dredging operation, the vessel excavates sludge from the work area and dumps it in a designated dumping area. In practice, however, some dredging vessels dump sludge at a location near the work area rather than in the designated dumping area. At the same time, a dredging vessel may encounter another vessel traveling along the channel and must then give way by leaving the channel. Therefore, when a dredging vessel sails away from the work area, it is necessary to distinguish abnormal, illegal sludge dumping from normal avoidance behavior; this is the anomaly detection problem for dredging vessel operations. Accurate anomaly detection both regulates the behavior of dredging vessels during operation and safeguards the quality of the whole project.
With regard to anomaly detection in inland navigation, patent document CN106816038A discloses a system and method for automatically identifying ships with abnormal behavior in inland waters. It acquires, in real time, AIS messages, environmental information from the hydrological and meteorological departments, and CCTV video images and depth images of ships navigating inland waters; it analyzes the types of abnormal ship behavior patterns and builds a sample library of abnormal ship behavior; it builds a deep learning network model to analyze ship behavior and obtain the abnormal behavior pattern and GPS positioning information of a ship; it detects ships in the CCTV video images and, combined with the depth images, obtains the three-dimensional spatial information and the video positioning information of each ship; and it fuses the GPS positioning information, video positioning information, abnormal behavior patterns and ship detection features, associates the ship targets, and automatically identifies ships with abnormal behavior in the CCTV video.
Few patent documents address the detection of dredging vessel operation anomalies. In general, existing anomaly detection methods fall into three categories, namely unsupervised learning, supervised learning and semi-supervised learning, according to whether sample labels are available.
The first category is unsupervised learning, which uses unlabeled samples. The basic idea of these methods is to find anomalies with respect to some metric; an anomaly is determined by a threshold or a ratio on that metric. Several representative unsupervised anomaly detection methods exist. Not requiring labels is the major advantage of unsupervised learning. However, anomalous data must be separated by a threshold or ratio, which makes this approach inflexible.
The second category is supervised learning, which treats anomaly detection as a classification problem whose goal is to classify data as normal or abnormal. Traditional machine learning and deep learning classification methods can therefore be applied to train the classification model, and several classical supervised classification methods have been proposed. Because labels are provided, supervised learning achieves better performance than unsupervised learning.
The third category, semi-supervised learning, is a compromise between the first two, since labels are usually expensive to obtain in practical problems. It requires only a small amount of labeled data, yet improves learning accuracy considerably compared with unsupervised learning. In general, semi-supervised learning algorithms can be classified as generative methods, co-training methods, graph-based semi-supervised learning methods and semi-supervised support vector machines (S3VM). A generative method is based on a generative model. It assumes that both labeled and unlabeled data are generated from the same underlying model, which connects the unlabeled data to the learning objective through the parameters of that model. A co-training method assumes that each data item can be classified from different views, so that different classification models can be trained on the labeled data. The trained models are then used to classify unlabeled data, and the reliably classified data are added to the labeled data. A graph-based semi-supervised learning method constructs a graph with data points as vertices and similarities between points as edges. Unlabeled points that are highly similar to labeled points receive the same label. Unlike a standard SVM, an S3VM seeks a hyperplane that separates the two classes of labeled data while passing through low-density regions of the data.
Existing methods cannot be applied directly to online anomaly detection in dredging operations. The method proposed by the invention is inspired by the generative method. To cast the real-world problem as an anomaly detection problem, basic features must first be extracted through feature engineering. Since applying the generative method directly to high-dimensional data may lead to poor performance, the invention reduces dimensionality with t-SNE before applying it.
Disclosure of Invention
In view of the defects in the prior art, the invention aims to provide a semi-supervised real-time method for detecting ship operation anomalies based on AIS trajectory data.
The invention provides a semi-supervised real-time method for detecting ship operation anomalies based on AIS trajectory data, comprising a preprocessing stage, an offline model training stage and an online detection stage;
preprocessing stage: the operation trajectories are filtered using a feature system, and behavior features are extracted;
offline model training stage: the model parameters are trained offline on the behavior features of the operation trajectories obtained in the preprocessing stage;
online detection stage: the online anomaly detection task is executed, and an alarm is raised for potential abnormal behavior during the dredging operation.
Preferably, the features in the preprocessing stage comprise global features and/or real-time features.
Preferably, the global features comprise any one or any combination of the following eight features:
1) Total time $s$ of one work trajectory $T$;
2) Total travel distance $d$ of one work trajectory $T$;
3) Number of turns $n_{turn}$ of the ship in the work area;
4) Total number of exits $n_{out}$ from the work area within one operation;
5) Total stay time $t_{out}$ outside the work area within one operation;
6) Total distance $d_{out}$ traveled outside the work area within one operation;
7) Maximum dwell time $max_t$ outside the work area within a single operation;
8) Maximum distance $max_d$ from the boundary after leaving the work area.
Preferably, the real-time features include any one or any combination of the following four features:
1) Time $t_{outc}$ of leaving the work area;
2) Distance $d_{outc}$ from the work area;
3) Time $t_{last}$ of last leaving the work area;
4) Current speed $v$.
Preferably, the offline model training stage comprises a t-SNE model for nonlinear dimensionality reduction and/or a probability model that models the behavior features after dimensionality reduction.
Preferably, the t-SNE model computes the similarity probabilities of points in the high-dimensional space and the corresponding similarity probabilities of points in the low-dimensional space. The conditional similarity probability $p_{j|i}$ of data point $x_j$ given data point $x_i$ satisfies:
$$p_{j|i} = \frac{S(x_i, x_j)}{\sum_{k \neq i} S(x_i, x_k)}$$
where $S(x_i, x_j)$ is the similarity between $x_i$ and $x_j$, and $i$, $j$ and $k$ are indices of data points. In the high-dimensional space, $S(x_i, x_j)$ is defined as
$$S(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma_i^2}\right)$$
where $\sigma_i$ is the bandwidth of the Gaussian kernel centered on $x_i$. In the low-dimensional space, $S(x_i, x_j)$ is defined as $(1 + \lVert x_i - x_j \rVert^2)^{-1}$.
Preferably, a Gaussian mixture model (GMM) is used as the probability model that models the behavior features after dimensionality reduction;
the goal of the GMM in semi-supervised learning is to maximize the log-likelihood, i.e.
$$\log L(\theta) = \sum_{(x_r, \hat{y}_r)} \log P_\theta(x_r, \hat{y}_r) + \sum_{x_u} \log P_\theta(x_u)$$
where $\log$ is the logarithm, $L(\theta)$ is the likelihood function, and $P_\theta$ is the probability under the model with parameters $\theta$; $x_r$ and $\hat{y}_r$ are a labeled sample and its label, $x_u$ is an unlabeled sample, and $\theta$ denotes the GMM parameters $w_i$, $\mu_i$ and $\Sigma_i$, which are the mixing proportion, mean and covariance of class $i$, respectively.
Preferably, the parameter $\theta$ that maximizes $\log L(\theta)$ is found using the expectation-maximization (EM) algorithm, which comprises an E step and an M step;
in the E step, the posterior probability $P_\theta(C_i \mid x_u)$ of each unlabeled sample is calculated, namely:
$$P_\theta(C_i \mid x_u) = \frac{w_i \, \mathcal{N}(x_u \mid \mu_i, \Sigma_i)}{\sum_{j=1}^{c} w_j \, \mathcal{N}(x_u \mid \mu_j, \Sigma_j)}$$
where $C_i$ denotes class $i$, $c$ is the total number of classes, $\mathcal{N}(\cdot \mid \mu_i, \Sigma_i)$ is the Gaussian density of class $i$, $x_u$ is an unlabeled sample, and $\theta$ denotes the GMM parameters $w_i$, $\mu_i$ and $\Sigma_i$, which are the mixing proportion, mean and covariance of class $i$, respectively;
in the M step, the model parameters $\theta$ are updated according to their definitions, namely:
$$w_i = \frac{N_i + \sum_{x_u} P_\theta(C_i \mid x_u)}{N}$$
$$\mu_i = \frac{\sum_{x_r \in C_i} x_r + \sum_{x_u} P_\theta(C_i \mid x_u)\, x_u}{N_i + \sum_{x_u} P_\theta(C_i \mid x_u)}$$
$$\Sigma_i = \frac{\sum_{x_r \in C_i} (x_r - \mu_i)(x_r - \mu_i)^{\top} + \sum_{x_u} P_\theta(C_i \mid x_u)\,(x_u - \mu_i)(x_u - \mu_i)^{\top}}{N_i + \sum_{x_u} P_\theta(C_i \mid x_u)}$$
where $N$ is the total number of samples, $N_i$ is the number of labeled samples of class $C_i$, $x_r$ is a labeled sample and $x_u$ is an unlabeled sample; $w_i$, $\mu_i$ and $\Sigma_i$ are the mixing proportion, mean and covariance of class $i$, respectively;
the E step and the M step are repeated until $\theta$ converges.
The invention provides a semi-supervised real-time method for detecting ship operation anomalies based on AIS trajectory data, comprising a preprocessing stage, an offline model training stage and an online detection stage;
preprocessing stage: according to the positions of the work area and the sludge dumping area, the raw trajectories collected from AIS are first divided into multiple work trajectories. For each work trajectory, the points outside the work area are extracted and described by the feature system. After preprocessing, trajectory points with 12 features are obtained as training samples for the next stage;
offline model training stage: first, the original samples are mapped to a two-dimensional space by t-SNE; at the same time, a fully connected neural network with three hidden layers is trained to reproduce this mapping. After t-SNE, the mapped data are used as input, and the EM algorithm is used to train the GMM. The GMM is a mixture of two Gaussian distributions corresponding to the normal and abnormal classes;
online anomaly detection stage: the main objective is to determine in real time whether each trajectory point $p_i$ of an ongoing dredging operation is normal. For a point outside the work area, the real-time features are first extracted and then mapped to the two-dimensional space by the trained neural network. The mapped point $m_i$ is then fed into the GMM to obtain $P(C_{anomaly} \mid m_i)$ and $P(C_{normal} \mid m_i)$, the probabilities that $m_i$ belongs to the abnormal and normal classes, respectively. If $P(C_{anomaly} \mid m_i)$ is greater than $P(C_{normal} \mid m_i)$, $p_i$ is identified as likely abnormal behavior;
to evaluate the degree of abnormality of one operation, the anomaly score of the operation may be defined as
$$\mathrm{score} = \sum_{p_i \notin W} P(C_{anomaly} \mid m_i) \cdot \mathrm{dist}(p_i, p_{i-1})$$
where $W$ denotes the work area, $p_i \notin W$ denotes the points outside the work area, and $\mathrm{dist}(p_i, p_{i-1})$ is the distance between points $p_i$ and $p_{i-1}$. The anomaly score accumulates the anomaly values of all points outside the work area, where each anomaly value is the anomaly probability multiplied by the distance to the previous point.
According to the invention, a computer readable storage medium is provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the above-mentioned AIS trajectory data-based ship operation anomaly semi-supervised real-time detection method.
Compared with the prior art, the invention has the following beneficial effects:
The invention provides a real-time method for detecting abnormal behavior of work vessels. Real-time detection is critical in construction operation management: when an anomaly alarm is raised in real time, an inspector can go to the site immediately to collect evidence, which is not possible with offline detection methods.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a schematic diagram of the online dredging operation anomaly detection framework of the present invention;
FIG. 2 is a diagram of the relationship between the t-SNE and GMM models.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, all of which fall within the scope of the present invention.
The invention provides a semi-supervised real-time method for detecting ship operation anomalies based on AIS trajectory data, comprising a preprocessing stage, an offline model training stage and an online detection stage;
preprocessing stage: the operation trajectories are filtered using a feature system, and behavior features are extracted;
offline model training stage: the model parameters are trained offline on the behavior features of the operation trajectories obtained in the preprocessing stage;
online detection stage: the online anomaly detection task is executed, and an alarm is raised for potential abnormal behavior during the dredging operation.
In particular, the features in the preprocessing stage comprise global features and/or real-time features. The global features comprise any one or any combination of the following eight features:
1) Total time $s$ of one work trajectory $T$;
2) Total travel distance $d$ of one work trajectory $T$;
3) Number of turns $n_{turn}$ of the ship in the work area;
4) Total number of exits $n_{out}$ from the work area within one operation;
5) Total stay time $t_{out}$ outside the work area within one operation;
6) Total distance $d_{out}$ traveled outside the work area within one operation;
7) Maximum dwell time $max_t$ outside the work area within a single operation;
8) Maximum distance $max_d$ from the boundary after leaving the work area.
The real-time features comprise any one or any combination of the following four features:
1) Time $t_{outc}$ of leaving the work area;
2) Distance $d_{outc}$ from the work area;
3) Time $t_{last}$ of last leaving the work area;
4) Current speed $v$.
The offline model training stage comprises a t-SNE model for nonlinear dimensionality reduction and/or a probability model that models the behavior features after dimensionality reduction. The t-SNE model computes the similarity probabilities of points in the high-dimensional space and the corresponding similarity probabilities of points in the low-dimensional space. The conditional similarity probability $p_{j|i}$ of data point $x_j$ given data point $x_i$ satisfies:
$$p_{j|i} = \frac{S(x_i, x_j)}{\sum_{k \neq i} S(x_i, x_k)}$$
where $S(x_i, x_j)$ is the similarity between $x_i$ and $x_j$, and $i$, $j$ and $k$ are indices of data points. In the high-dimensional space, $S(x_i, x_j)$ is defined as
$$S(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma_i^2}\right)$$
where $\sigma_i$ is the bandwidth of the Gaussian kernel centered on $x_i$. In the low-dimensional space, $S(x_i, x_j)$ is defined as $(1 + \lVert x_i - x_j \rVert^2)^{-1}$.
A Gaussian mixture model (GMM) is used as the probability model that models the behavior features after dimensionality reduction. The goal of the GMM in semi-supervised learning is to maximize the log-likelihood, i.e.
$$\log L(\theta) = \sum_{(x_r, \hat{y}_r)} \log P_\theta(x_r, \hat{y}_r) + \sum_{x_u} \log P_\theta(x_u)$$
where $\log$ is the logarithm, $L(\theta)$ is the likelihood function, and $P_\theta$ is the probability under the model with parameters $\theta$; $x_r$ and $\hat{y}_r$ are a labeled sample and its label, $x_u$ is an unlabeled sample, and $\theta$ denotes the GMM parameters $w_i$, $\mu_i$ and $\Sigma_i$, which are the mixing proportion, mean and covariance of class $i$, respectively.
More specifically, the parameter $\theta$ that maximizes $\log L(\theta)$ is found using the expectation-maximization (EM) algorithm, which comprises an E step and an M step;
in the E step, the posterior probability $P_\theta(C_i \mid x_u)$ of each unlabeled sample is calculated, namely:
$$P_\theta(C_i \mid x_u) = \frac{w_i \, \mathcal{N}(x_u \mid \mu_i, \Sigma_i)}{\sum_{j=1}^{c} w_j \, \mathcal{N}(x_u \mid \mu_j, \Sigma_j)}$$
where $C_i$ denotes class $i$, $c$ is the total number of classes, $\mathcal{N}(\cdot \mid \mu_i, \Sigma_i)$ is the Gaussian density of class $i$, $x_u$ is an unlabeled sample, and $\theta$ denotes the GMM parameters $w_i$, $\mu_i$ and $\Sigma_i$, which are the mixing proportion, mean and covariance of class $i$, respectively;
in the M step, the model parameters $\theta$ are updated according to their definitions, namely:
$$w_i = \frac{N_i + \sum_{x_u} P_\theta(C_i \mid x_u)}{N}$$
$$\mu_i = \frac{\sum_{x_r \in C_i} x_r + \sum_{x_u} P_\theta(C_i \mid x_u)\, x_u}{N_i + \sum_{x_u} P_\theta(C_i \mid x_u)}$$
$$\Sigma_i = \frac{\sum_{x_r \in C_i} (x_r - \mu_i)(x_r - \mu_i)^{\top} + \sum_{x_u} P_\theta(C_i \mid x_u)\,(x_u - \mu_i)(x_u - \mu_i)^{\top}}{N_i + \sum_{x_u} P_\theta(C_i \mid x_u)}$$
where $N$ is the total number of samples, $N_i$ is the number of labeled samples of class $C_i$, $x_r$ is a labeled sample and $x_u$ is an unlabeled sample; $w_i$, $\mu_i$ and $\Sigma_i$ are the mixing proportion, mean and covariance of class $i$, respectively;
the E step and the M step are repeated until $\theta$ converges.
The invention provides a semi-supervised real-time method for detecting ship operation anomalies based on AIS trajectory data, comprising a preprocessing stage, an offline model training stage and an online detection stage;
preprocessing stage: according to the positions of the work area and the sludge dumping area, the raw trajectories collected from AIS are first divided into multiple work trajectories. For each work trajectory, the points outside the work area are extracted and described by the feature system. After preprocessing, trajectory points with 12 features are obtained as training samples for the next stage;
offline model training stage: first, the original samples are mapped to a two-dimensional space by t-SNE; at the same time, a fully connected neural network with three hidden layers is trained to reproduce this mapping. After t-SNE, the mapped data are used as input, and the EM algorithm is used to train the GMM. The GMM is a mixture of two Gaussian distributions corresponding to the normal and abnormal classes;
online anomaly detection stage: the main objective is to determine in real time whether each trajectory point $p_i$ of an ongoing dredging operation is normal. For a point outside the work area, the real-time features are first extracted and then mapped to the two-dimensional space by the trained neural network. The mapped point $m_i$ is then fed into the GMM to obtain $P(C_{anomaly} \mid m_i)$ and $P(C_{normal} \mid m_i)$, the probabilities that $m_i$ belongs to the abnormal and normal classes, respectively. If $P(C_{anomaly} \mid m_i)$ is greater than $P(C_{normal} \mid m_i)$, $p_i$ is identified as likely abnormal behavior;
to evaluate the degree of abnormality of one operation, the anomaly score of the operation may be defined as
$$\mathrm{score} = \sum_{p_i \notin W} P(C_{anomaly} \mid m_i) \cdot \mathrm{dist}(p_i, p_{i-1})$$
where $W$ denotes the work area, $p_i \notin W$ denotes the points outside the work area, and $\mathrm{dist}(p_i, p_{i-1})$ is the distance between points $p_i$ and $p_{i-1}$. The anomaly score accumulates the anomaly values of all points outside the work area, where each anomaly value is the anomaly probability multiplied by the distance to the previous point.
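The anomaly score described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: points are planar coordinates, `dist` is taken to be the Euclidean distance, the per-point anomaly probabilities and the inside/outside flags are supplied by the caller, and the first point is skipped because it has no predecessor. The function and parameter names are hypothetical and do not come from the patent.

```python
import math

def anomaly_score(points, p_anomaly, inside):
    """Accumulate P(C_anomaly | m_i) * dist(p_i, p_{i-1}) over points outside W.

    points    -- list of (x, y) planar coordinates (assumed projection)
    p_anomaly -- p_anomaly[i] is the anomaly probability of point i
    inside    -- inside[i] is True when point i lies in the work area W
    """
    score = 0.0
    # Start at 1: the first point has no previous point to measure from.
    for i in range(1, len(points)):
        if not inside[i]:
            score += p_anomaly[i] * math.dist(points[i], points[i - 1])
    return score
```

Weighting by distance means that a long excursion with a high anomaly probability contributes more than a brief one, which matches the accumulation rule stated in the text.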
According to the invention, a computer readable storage medium is provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the above-mentioned AIS trajectory data-based ship operation anomaly semi-supervised real-time detection method.
Further, preferred embodiments of the present invention are described below with reference to the accompanying drawings:
1. Integrated framework
Inspired by the generative methods used in semi-supervised learning, the invention detects abnormal dredging behavior probabilistically: it builds a probabilistic model of the distribution of dredging behaviors and reports behaviors with a high anomaly probability as anomalies.
Using a probabilistic model to detect anomalies is promising because the model can be viewed as a compact representation and summary of the historical dredging trajectories. Therefore, for a new dredging behavior, the probabilities of normality and abnormality are obtained directly from the probability model without searching the historical dredging trajectories, which improves detection speed.
FIG. 1 shows the framework of the ship operation anomaly detection method, which is divided into three stages: a preprocessing stage, an offline model training stage and an online detection stage.
In the preprocessing stage, a feature system is used to filter the operation trajectories and extract behavior features.
The offline model training stage is the key part of the system and uses two models: a t-SNE model for nonlinear dimensionality reduction, and a probability model, namely a Gaussian mixture model (GMM), that models the behavior features after dimensionality reduction. The parameters of both models are trained offline on the behavior features of the operation trajectories obtained in the preprocessing stage.
After the model parameters have been inferred, the online anomaly detection task is performed, with the aim of raising alerts about potential abnormal behavior during dredging operations.
2. Feature system
Since the raw AIS information cannot be used directly to detect abnormal behavior, higher-level features must be constructed to distinguish these behaviors. These new features can be classified into global features and real-time features. Intuitively, global features are extracted to represent a complete dredging operation and serve offline anomaly detection. When anomalies must be detected while an operation is still in progress, the global features are not yet available, so real-time features are designed to ensure that the detection method can still identify abnormal behavior in real time. In other words, the global features represent a whole dredging trajectory, while the real-time features describe a single point on the trajectory.
(1) Global features
Global features focus on metrics that can be collected when a job is completed. For a clearer description, the following 8 global features are extracted from the AIS data.
1) Total time $s$ of one work trajectory $T$
2) Total travel distance $d$ of one work trajectory $T$
3) Number of turns $n_{turn}$ of the ship in the work area
4) Total number of exits $n_{out}$ from the work area within one operation
5) Total stay time $t_{out}$ outside the work area within one operation
6) Total distance $d_{out}$ traveled outside the work area within one operation
7) Maximum dwell time $max_t$ outside the work area within a single operation
8) Maximum distance $max_d$ from the boundary after leaving the work area
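As an illustration of how the global features listed above might be computed, the following Python sketch processes one completed work trajectory. It rests on several assumptions that are not in the source: AIS positions have already been projected to planar metres, the work area is an axis-aligned rectangle, and a segment is attributed to the outside of the work area whenever its end point lies outside. The turn count $n_{turn}$ is omitted because the text does not define what counts as a turn. All names (`Point`, `Rect`, `global_features`) are hypothetical.

```python
import math
from typing import NamedTuple

class Point(NamedTuple):
    t: float  # timestamp in seconds
    x: float  # planar easting in metres (assumed projection of AIS lon/lat)
    y: float  # planar northing in metres

class Rect(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float

def dist_to_rect(p, w):
    """Distance from p to the rectangle w; 0.0 when p is inside or on the edge."""
    dx = max(w.xmin - p.x, 0.0, p.x - w.xmax)
    dy = max(w.ymin - p.y, 0.0, p.y - w.ymax)
    return math.hypot(dx, dy)

def global_features(track, w):
    """Return seven of the eight global features for one finished trajectory."""
    d = t_out = d_out = max_t = max_d = 0.0
    n_out = 0
    stay = 0.0  # duration of the excursion currently in progress
    for prev, cur in zip(track, track[1:]):
        step = math.hypot(cur.x - prev.x, cur.y - prev.y)
        dt = cur.t - prev.t
        d += step
        if dist_to_rect(cur, w) > 0.0:
            if dist_to_rect(prev, w) == 0.0:
                n_out += 1   # inside -> outside transition starts a new excursion
                stay = 0.0
            stay += dt
            t_out += dt
            d_out += step
            max_t = max(max_t, stay)
            max_d = max(max_d, dist_to_rect(cur, w))
        else:
            stay = 0.0       # back inside: the excursion has ended
    return {"s": track[-1].t - track[0].t, "d": d, "n_out": n_out,
            "t_out": t_out, "d_out": d_out, "max_t": max_t, "max_d": max_d}
```

A trajectory that starts outside the work area is not counted as an exit in this sketch; a real implementation would need to decide how to treat that case and would use the actual polygon of the work area.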
(2) Real-time features
The real-time features are mainly used for online anomaly detection during an operation. Since the operation is not yet complete, the full information cannot be captured. Nevertheless, the trajectory from the starting point to the current point can be treated as a complete operation, so that the statistical features above can be computed in real time. Additional real-time features include:
1) Time $t_{outc}$ of leaving the work area
2) Distance $d_{outc}$ from the work area
3) Time $t_{last}$ of last leaving the work area
4) Current speed $v$
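The four additional real-time features can be sketched as follows. The text does not define them precisely, so this example fixes one possible reading and labels it as an assumption: $t_{outc}$ is the time elapsed since the current excursion began, $d_{outc}$ is the current distance to the work area, $t_{last}$ is the time elapsed since the previous exit (0 if there is none), and $v$ is estimated from the last two fixes. The rectangular work area, planar coordinates and all function names are likewise hypothetical.

```python
import math

def dist_to_rect(x, y, rect):
    """Distance from (x, y) to rect = (xmin, ymin, xmax, ymax); 0.0 if inside."""
    xmin, ymin, xmax, ymax = rect
    dx = max(xmin - x, 0.0, x - xmax)
    dy = max(ymin - y, 0.0, y - ymax)
    return math.hypot(dx, dy)

def realtime_features(track, rect):
    """track: list of (t, x, y) fixes up to and including the current point."""
    is_out = [dist_to_rect(x, y, rect) > 0.0 for _, x, y in track]
    # Timestamps of the first outside fix of every excursion seen so far.
    exits = [track[i][0] for i in range(1, len(track))
             if is_out[i] and not is_out[i - 1]]
    t, x, y = track[-1]
    d_outc = dist_to_rect(x, y, rect)
    if is_out[-1] and exits:
        t_outc = t - exits[-1]   # time spent in the current excursion
        earlier = exits[:-1]     # exits before the current excursion
    else:
        t_outc = 0.0
        earlier = exits
    t_last = t - earlier[-1] if earlier else 0.0
    v = 0.0
    if len(track) >= 2:
        t0, x0, y0 = track[-2]
        if t > t0:
            v = math.hypot(x - x0, y - y0) / (t - t0)
    return {"t_outc": t_outc, "d_outc": d_outc, "t_last": t_last, "v": v}
```

Because the function only looks at the fixes received so far, it can be called on every new AIS message during an ongoing operation.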
3. t-SNE feature dimension reduction
t-distributed stochastic neighbor embedding (t-SNE) is a machine learning algorithm for visualization. It is a nonlinear dimensionality reduction technique that is well suited to embedding high-dimensional data in a two- or three-dimensional space for visualization. The t-SNE algorithm computes the similarity probabilities of points in the high-dimensional space and the corresponding similarity probabilities of points in the low-dimensional space. The conditional similarity probability $p_{j|i}$ of data point $x_j$ given data point $x_i$ is
$$p_{j|i} = \frac{S(x_i, x_j)}{\sum_{k \neq i} S(x_i, x_k)}$$
where $S(x_i, x_j)$ is the similarity between $x_i$ and $x_j$, and $i$, $j$ and $k$ are indices of data points. In the high-dimensional space, $S(x_i, x_j)$ is defined as
$$S(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma_i^2}\right)$$
where $\sigma_i$ is the bandwidth of the Gaussian kernel centered on $x_i$. In the low-dimensional space, $S(x_i, x_j)$ is defined as $(1 + \lVert x_i - x_j \rVert^2)^{-1}$. t-SNE then tries to minimize the difference between these conditional probabilities in the high- and low-dimensional spaces so that the data points are represented as faithfully as possible in the low-dimensional space. To do so, t-SNE uses gradient descent to minimize the sum of the Kullback-Leibler divergences over all data points.
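To make the similarity probabilities concrete, the following Python sketch computes the row-normalized conditional probabilities for a Gaussian similarity in the high-dimensional space and a Student-t similarity in the low-dimensional space, and then evaluates the summed Kullback-Leibler divergence between them. It is a simplification under stated assumptions: a single fixed kernel bandwidth `sigma` is used for all points (t-SNE implementations normally tune it per point from a perplexity setting), and no gradient descent is performed. The function names are hypothetical.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def conditional_probs(points, similarity):
    """probs[i][j] = S(x_i, x_j) / sum_{k != i} S(x_i, x_k), with probs[i][i] = 0."""
    n = len(points)
    probs = []
    for i in range(n):
        row = [0.0 if j == i else similarity(points[i], points[j])
               for j in range(n)]
        total = sum(row)
        probs.append([s / total for s in row])
    return probs

def gaussian_similarity(sigma):
    """High-dimensional similarity with one shared bandwidth (an assumption)."""
    return lambda a, b: math.exp(-sq_dist(a, b) / (2.0 * sigma ** 2))

def student_t_similarity(a, b):
    """Low-dimensional similarity (1 + ||a - b||^2)^-1."""
    return 1.0 / (1.0 + sq_dist(a, b))

def kl_cost(p, q):
    """Sum over all points i of KL(P_i || Q_i); terms with p = 0 contribute 0."""
    return sum(pij * math.log(pij / qij)
               for p_row, q_row in zip(p, q)
               for pij, qij in zip(p_row, q_row) if pij > 0.0)
```

The cost is zero when the two sets of probabilities coincide and grows as the low-dimensional layout distorts the neighborhood structure, which is the quantity the embedding step tries to reduce.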
t-SNE learns a non-parametric mapping, meaning that it does not learn an explicit function that maps data from the high-dimensional space to the low-dimensional space. Therefore, new points cannot be embedded directly in an existing map. One possible remedy is to train a multivariate regressor by directly minimizing the t-SNE loss function; another is to train a neural network to fit the mapping from the raw data to the low-dimensional data produced by t-SNE.
The present invention uses t-SNE to reduce the feature data from the high-dimensional space to two dimensions and preserves the mapping with a fully connected neural network. Using t-SNE has two major advantages. On the one hand, t-SNE helps visualize the extracted feature data and check whether the data are broadly separable. On the other hand, the mapped data are more likely to cluster into classes according to their similarity, which helps in training the GMM.
4. GMM
A Gaussian mixture model (GMM) is a probabilistic model that assumes all data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters. For semi-supervised learning, GMM is a generative method. Unlike GMM in supervised learning, the goal of GMM in semi-supervised learning is to maximize the log-likelihood, i.e.
Figure BDA0002111805750000103
where log denotes the logarithm, L(θ) denotes the loss function, and P_θ denotes the posterior probability; x_r and
Figure BDA0002111805750000104
are a labeled sample and its label, x_u is an unlabeled sample, and θ denotes the GMM model parameters, comprising w_i, μ_i and Σ_i, which respectively represent the proportion, sample mean and sample covariance of class i.
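The objective itself appears only as an image reference in this text. As a hedged reconstruction, assuming the standard generative formulation of a semi-supervised GMM and writing the label of x_r as \hat{y}_r (a notation chosen here, not taken from the source), the log-likelihood would read:

```latex
\log L(\theta)
  = \sum_{(x_r,\,\hat{y}_r)} \log P_\theta(x_r, \hat{y}_r)
  + \sum_{x_u} \log P_\theta(x_u),
\qquad
P_\theta(x_r, \hat{y}_r = C_i) = w_i\,\mathcal{N}(x_r \mid \mu_i, \Sigma_i),
\qquad
P_\theta(x_u) = \sum_{i=1}^{c} w_i\,\mathcal{N}(x_u \mid \mu_i, \Sigma_i)
```

The first sum runs over the labeled pairs and the second over the unlabeled samples; c is the number of classes.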
The expectation-maximization (EM) algorithm can be used to find the parameter θ that maximizes log L(θ). Unlike the initialization used in supervised learning, the EM algorithm here uses the labeled samples to initialize the parameters of each Gaussian component. The EM algorithm consists of an E step and an M step.
In the E step, the posterior probability P_θ(C_i | x_u) of each unlabeled sample is calculated, that is,
Figure BDA0002111805750000105
where C_i denotes class i, c denotes the total number of classes, x_u is an unlabeled sample, and θ denotes the GMM model parameters, comprising w_i, μ_i and Σ_i, which respectively represent the proportion, sample mean and sample covariance of class i.
In the M step, the model parameters θ are updated according to their definitions, i.e.,
Figure BDA0002111805750000111
Figure BDA0002111805750000112
Figure BDA0002111805750000113
where N is the total number of samples, N_i is the number of samples belonging to class C_i, x_r denotes a labeled sample, and x_u is an unlabeled sample. w_i, μ_i and Σ_i respectively represent the proportion, sample mean and sample covariance of class i.
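The E-step and M-step formulas are present only as image references in this text. A hedged reconstruction, assuming the standard semi-supervised GMM updates and reading N_i as the number of labeled samples in class C_i (an interpretation, not confirmed by the source), is:

```latex
% E step
P_\theta(C_i \mid x_u)
  = \frac{w_i\,\mathcal{N}(x_u \mid \mu_i, \Sigma_i)}
         {\sum_{j=1}^{c} w_j\,\mathcal{N}(x_u \mid \mu_j, \Sigma_j)}

% M step
w_i = \frac{N_i + \sum_{x_u} P_\theta(C_i \mid x_u)}{N}

\mu_i = \frac{\sum_{x_r \in C_i} x_r + \sum_{x_u} P_\theta(C_i \mid x_u)\,x_u}
             {N_i + \sum_{x_u} P_\theta(C_i \mid x_u)}

\Sigma_i = \frac{\sum_{x_r \in C_i} (x_r - \mu_i)(x_r - \mu_i)^{\top}
                 + \sum_{x_u} P_\theta(C_i \mid x_u)\,(x_u - \mu_i)(x_u - \mu_i)^{\top}}
                {N_i + \sum_{x_u} P_\theta(C_i \mid x_u)}
```

Labeled samples contribute with weight 1 to their own class, while each unlabeled sample contributes to every class in proportion to its posterior.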
The E and M steps are repeated until θ converges.
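The EM loop described above can be sketched as follows. This is a simplified illustration rather than the patented implementation: it works on one-dimensional samples with scalar variances instead of the two-dimensional mapped data with full covariances, adds a small variance floor for numerical stability, and uses standard semi-supervised updates. The function names, the tolerance and the toy data are all assumptions.

```python
import math

def normal_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def init_from_labeled(labeled, n_classes):
    """Initialize (w_i, mu_i, var_i) for each class from the labeled samples only."""
    params = []
    for c in range(n_classes):
        xs = [x for x, y in labeled if y == c]
        mu = sum(xs) / len(xs)
        var = sum((x - mu) ** 2 for x in xs) / len(xs) + 1e-6  # variance floor (assumption)
        params.append((len(xs) / len(labeled), mu, var))
    return params

def e_step(params, samples):
    """Posterior P(C_i | x) for every sample in `samples`."""
    posteriors = []
    for x in samples:
        joint = [w * normal_pdf(x, mu, var) for w, mu, var in params]
        total = sum(joint)
        posteriors.append([j / total for j in joint])
    return posteriors

def m_step(labeled, unlabeled, posteriors, n_classes):
    """Re-estimate parameters: labeled samples count fully, unlabeled ones by posterior."""
    n_total = len(labeled) + len(unlabeled)
    params = []
    for c in range(n_classes):
        xs = [x for x, y in labeled if y == c]
        resp = [p[c] for p in posteriors]
        mass = len(xs) + sum(resp)
        mu = (sum(xs) + sum(r * x for r, x in zip(resp, unlabeled))) / mass
        var = (sum((x - mu) ** 2 for x in xs)
               + sum(r * (x - mu) ** 2 for r, x in zip(resp, unlabeled))) / mass + 1e-6
        params.append((mass / n_total, mu, var))
    return params

def fit(labeled, unlabeled, n_classes=2, tol=1e-8, max_iter=200):
    """Alternate E and M steps until the largest parameter change falls below tol."""
    params = init_from_labeled(labeled, n_classes)
    for _ in range(max_iter):
        posteriors = e_step(params, unlabeled)
        new_params = m_step(labeled, unlabeled, posteriors, n_classes)
        shift = max(abs(a - b) for old, new in zip(params, new_params)
                    for a, b in zip(old, new))
        params = new_params
        if shift < tol:
            break
    return params

# Toy data: class 0 stands for "normal", class 1 for "abnormal".
labeled = [(0.0, 0), (0.4, 0), (5.0, 1), (5.6, 1)]
unlabeled = [0.1, 0.3, -0.2, 0.5, 5.2, 4.9, 5.4, 5.8]
params = fit(labeled, unlabeled)
posterior = e_step(params, [0.2, 5.3])  # soft assignment of two new samples
```

Once `fit` has returned, calling `e_step` with the fixed parameters gives the class probabilities of any new sample.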
Once θ is fixed, the GMM is fully determined. Thus, for a new sample x, it is easy to calculate the probability P_θ(C_i | x) that x belongs to class i. In other words, when the classes are defined as normal and abnormal, the probability that a sample is normal or abnormal is readily obtained.
GMM is well suited to anomaly detection in semi-supervised settings. Compared with unsupervised anomaly detection methods, GMM can exploit labeled data, which is hard to obtain, and thereby achieve better performance. In contrast to most semi-supervised methods, GMM performs soft assignment rather than directly assigning each sample to a single class at classification time.
5. Dredging operation abnormity detection process
(1) Preprocessing
In the preprocessing stage, a plurality of work trajectories are first divided from the raw trajectory collected from the AIS according to the positions of the working area and the sludge dumping area. For each work trajectory, the points outside the working area are extracted and characterized by the feature system. After preprocessing, track points with 12 features are extracted as training samples for the next stage.
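The text does not state the exact splitting rule, so the following is only a sketch under explicit assumptions: both areas are modeled as hypothetical axis-aligned rectangles, a work trajectory is assumed to start when the ship first enters the working area and to end when it reaches the dumping area, and track points are `(time, lon, lat)` tuples. None of these names or rules come from the source.

```python
from dataclasses import dataclass

@dataclass
class Rect:
    """Hypothetical axis-aligned area given by longitude/latitude bounds."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float

    def contains(self, lon, lat):
        return (self.min_lon <= lon <= self.max_lon
                and self.min_lat <= lat <= self.max_lat)

def split_work_trajectories(track, work_area, dump_area):
    """Split a raw track into work trajectories (assumed rule, see lead-in)."""
    trajectories, current = [], []
    for t, lon, lat in track:
        if dump_area.contains(lon, lat):
            # Reaching the dumping area closes the current work trajectory.
            if current:
                trajectories.append(current)
                current = []
        elif current or work_area.contains(lon, lat):
            # Start on first entry into the working area, then keep every point.
            current.append((t, lon, lat))
    if current:
        trajectories.append(current)
    return trajectories

def points_outside(trajectory, work_area):
    """Points of one work trajectory that lie outside the working area."""
    return [p for p in trajectory if not work_area.contains(p[1], p[2])]

work = Rect(0, 0, 10, 10)
dump = Rect(20, 20, 30, 30)
track = [(0, -1, -1), (1, 1, 1), (2, 5, 5), (3, 11, 5), (4, 6, 5),
         (5, 15, 15), (6, 25, 25), (7, 26, 26), (8, 2, 2), (9, 3, 3)]
trajectories = split_work_trajectories(track, work, dump)
outside = points_outside(trajectories[0], work)
```

The points returned by `points_outside` are the ones for which the 12 features would then be computed.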
(2) Offline model training
FIG. 2 shows how the t-SNE and GMM models cooperate. First, the original samples are mapped to a two-dimensional space by t-SNE. At the same time, a fully connected neural network with three hidden layers of 5, 5 and 10 units, respectively, is trained to preserve the mapping so that new data can be embedded. The data mapped by t-SNE are then used as input to train the GMM with the EM algorithm. The GMM is a mixture of two Gaussian distributions corresponding to the normal and abnormal classes.
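The shape of the mapping network described above can be illustrated with a forward pass in plain Python. This sketch shows only the architecture (12 input features, hidden layers of 5, 5 and 10 units, a two-dimensional output). The tanh activation, the linear output layer and the random placeholder weights are assumptions; in the described method the weights would be fitted so that the output reproduces the t-SNE coordinates of the training samples.

```python
import math
import random

# 12 input features -> hidden layers of 5, 5 and 10 units -> 2-D output.
LAYER_SIZES = [12, 5, 5, 10, 2]

def init_network(sizes, seed=0):
    """Create placeholder (weights, biases) per layer; real weights come from training."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers

def forward(layers, x):
    """Map one feature vector to 2-D: tanh on hidden layers (assumed), linear output."""
    for idx, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if idx < len(layers) - 1:
            x = [math.tanh(v) for v in x]
    return x

net = init_network(LAYER_SIZES)
sample = [0.1] * 12            # one track point described by 12 features
mapped = forward(net, sample)  # its position in the 2-D space fed to the GMM
```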
(3) Online anomaly detection
The main objective of the online anomaly detection stage is to detect in real time whether a track point p_i of the ongoing dredging operation is normal. For points outside the working area, the real-time features are first extracted and then mapped to the two-dimensional space by the trained neural network. Subsequently, the mapped point m_i is fed into the GMM to obtain P(C_anomaly | m_i) and P(C_normal | m_i), which respectively represent the probabilities that m_i belongs to the abnormal and normal classes. If P(C_anomaly | m_i) is greater than P(C_normal | m_i), p_i is identified as likely exhibiting abnormal behavior.
To evaluate the degree of abnormality of an operation, the anomaly score of the operation is defined as
Figure BDA0002111805750000121
where W denotes the working area,
Figure BDA0002111805750000122
denotes the points outside the working area, and dist(p_i, p_{i-1}) denotes the distance between points p_i and p_{i-1}. The anomaly score is accumulated from the anomaly values of the points outside the working area, where each anomaly value is calculated by multiplying the anomaly probability by the distance to the previous point.
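The score described above, a sum of P(C_anomaly | m_i) · dist(p_i, p_{i-1}) over the points outside W, can be sketched as follows. The sketch assumes planar coordinates with Euclidean distance (real AIS positions would need a geodesic distance), skips the first point because it has no predecessor, and takes P(C_normal) = 1 - P(C_anomaly) for the two-class decision rule. The function names and toy values are illustrative.

```python
import math

def dist(p, q):
    """Euclidean distance between two planar points (assumption: projected coordinates)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def is_abnormal(p_anomaly):
    """Decision rule from the text: flag when P(C_anomaly) > P(C_normal)."""
    return p_anomaly > 1.0 - p_anomaly

def anomaly_score(points, p_anomaly, in_work_area):
    """Sum of P(C_anomaly | m_i) * dist(p_i, p_{i-1}) over points outside the working area."""
    score = 0.0
    for i in range(1, len(points)):
        if not in_work_area(points[i]):
            score += p_anomaly[i] * dist(points[i], points[i - 1])
    return score

points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (6.0, 9.0)]
p_anomaly = [0.0, 0.5, 1.0, 0.2]        # GMM output per point (toy values)
inside = lambda p: p[0] < 1.0           # hypothetical working-area test
flags = [is_abnormal(pa) for pa in p_anomaly]
score = anomaly_score(points, p_anomaly, inside)
```

Weighting by distance means that a long excursion with a high anomaly probability contributes more than a brief one.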
Furthermore, the technical problem to be solved by the invention is as follows:
When a ship carrying out dredging operation in a designated harbor area leaves the working area, whether this is a normal avoidance behavior or an abnormal sludge dumping behavior is detected in real time, so as to improve the quality of engineering management and the strength of environmental protection.
The technical solution of the invention is as follows:
The invention establishes a feature system for extracting behavior features from AIS data. In addition, t-distributed stochastic neighbor embedding (t-SNE) is combined with a Gaussian mixture model (GMM) through a neural network, and the detection model is trained in a semi-supervised manner.
In the description of the present application, it is to be understood that the terms "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like indicate orientations or positional relationships based on those shown in the drawings, and are only for convenience in describing the present application and simplifying the description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation, and be operated, and thus, should not be construed as limiting the present application.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (6)

1. A ship operation abnormity semi-supervised real-time detection method based on AIS trajectory data is characterized by comprising a preprocessing stage, an off-line model training stage and an on-line detection stage;
a preprocessing stage: filtering the operation trajectories using a feature system and extracting behavior features;
an off-line model training stage: training the model parameters off-line according to the behavior features of the operation trajectories obtained in the preprocessing stage;
an online detection stage: executing an online anomaly detection task and raising an alarm for potentially abnormal behavior during dredging operation;
finding the parameter θ that maximizes log L(θ) using the expectation-maximization (EM) algorithm; the EM algorithm comprises a step E and a step M;
in step E, the posterior probability P_θ(C_i | x_u) of the unlabeled sample is calculated, namely:
Figure FDA0003156632930000011
wherein C_i denotes class i, c denotes the total number of classes, x_u is an unlabeled sample, and θ denotes the GMM model parameters, comprising w_i, μ_i and Σ_i, which respectively represent the proportion, the sample mean and the sample covariance of class i;
in step M, the model parameters θ are updated according to their definitions, namely:
Figure FDA0003156632930000012
Figure FDA0003156632930000013
Figure FDA0003156632930000014
where N is the total number of samples, N_i is the number of samples belonging to class C_i, x_r denotes a sample, and x_u is an unlabeled sample; w_i, μ_i and Σ_i respectively represent the proportion, the sample mean and the sample covariance of class i;
repeating step E and step M until θ converges;
the off-line model training stage comprises a nonlinear dimension-reduction t-SNE model and/or a probability model that models the behavior features after dimension reduction;
the t-SNE model calculates the similarity probability of points in the corresponding low-dimensional space by calculating the similarity probability of points in the high-dimensional space; the conditional similarity probability p_{j|i} of data point x_i and data point x_j satisfies:
Figure FDA0003156632930000021
wherein S(x_i, x_j) is the similarity between x_i and x_j, and i, j and k are indices of data points; in the high-dimensional space, S(x_i, x_j) is defined as
Figure FDA0003156632930000022
in the low-dimensional space, S(x_i, x_j) is defined as (1 + ||x_i - x_j||^2)^(-1);
a Gaussian mixture model (GMM) is adopted as the probability model for modeling the behavior features after dimension reduction;
the goal of the Gaussian mixture model GMM for semi-supervised learning is to maximize the log-likelihood, namely:
Figure FDA0003156632930000023
where log denotes the logarithm, L(θ) represents a loss function, and P_θ represents a posterior probability; x_r and
Figure FDA0003156632930000024
are a labeled sample and its label, x_u is an unlabeled sample, and θ denotes the GMM model parameters, comprising w_i, μ_i and Σ_i, which respectively represent the proportion, the sample mean and the sample covariance of class i.
2. The AIS trajectory data-based semi-supervised real-time detection method of marine operations anomalies according to claim 1, wherein the features in the preprocessing stage include global features and/or real-time features.
3. The AIS trajectory data-based ship operation anomaly semi-supervised real-time detection method according to claim 2, wherein the global features comprise any one or any combination of the following eight features:
1) total time s of one work trajectory T;
2) total travel distance d of one work trajectory T;
3) number of turns n_turn of the ship within the working area;
4) total number of exits n_out from the working area within one operation;
5) total stay time t_out outside the working area within one operation;
6) total distance d_out traveled outside the working area within one operation;
7) maximum stay time max_t outside the working area within one operation;
8) maximum distance max_d from the boundary after leaving the working area.
4. The AIS trajectory data-based ship operation anomaly semi-supervised real-time detection method according to claim 2, wherein the real-time characteristics comprise any one or any combination of four characteristics:
1) time t_outc of leaving the working area;
2) distance d_outc from the working area;
3) time t_last of last leaving the working area;
4) the current speed v.
5. A ship operation abnormity semi-supervised real-time detection method based on AIS trajectory data is characterized by comprising a preprocessing stage, an off-line model training stage and an on-line detection stage;
a preprocessing stage: according to the positions of the working area and the sludge dumping area, a plurality of work trajectories are first divided from the raw trajectories collected from the AIS; for each work trajectory, the points outside the working area are extracted and characterized by a feature system; after preprocessing, track points with 12 features are extracted as training samples for the next stage;
an off-line model training stage: first, the original samples are mapped to a two-dimensional space by t-SNE; at the same time, a fully connected neural network containing three hidden layers is trained; after t-SNE, the mapped data are used as input and the Gaussian mixture model (GMM) is trained using the expectation-maximization (EM) algorithm; the GMM is generated from a mixture of two Gaussian distributions corresponding to the normal and abnormal classes;
an online anomaly detection stage: the main objective of the online anomaly detection stage is to detect in real time whether a track point p_i of the ongoing dredging operation is normal; for points outside the working area, the real-time features are first extracted and then mapped to the two-dimensional space by the trained neural network; subsequently, the mapped point m_i is fed into the Gaussian mixture model GMM to obtain P(C_anomaly | m_i) and P(C_normal | m_i), which respectively represent the probabilities that m_i belongs to the abnormal and normal classes; if P(C_anomaly | m_i) is greater than P(C_normal | m_i), p_i is identified as likely exhibiting abnormal behavior;
to evaluate the degree of abnormality of one operation, the anomaly score of the operation is defined as:
Figure FDA0003156632930000031
wherein W denotes the working area,
Figure FDA0003156632930000032
denotes the points outside the working area, and dist(p_i, p_{i-1}) denotes the distance between points p_i and p_{i-1}; the anomaly score is accumulated from the anomaly values of the points outside the working area, each anomaly value being calculated by multiplying the anomaly probability by the distance to the previous point;
finding the parameter θ that maximizes log L(θ) using the expectation-maximization (EM) algorithm; the EM algorithm comprises a step E and a step M;
in step E, the posterior probability P_θ(C_i | x_u) of the unlabeled sample is calculated, namely:
Figure FDA0003156632930000033
wherein C_i denotes class i, c denotes the total number of classes, x_u is an unlabeled sample, and θ denotes the GMM model parameters, comprising w_i, μ_i and Σ_i, which respectively represent the proportion, the sample mean and the sample covariance of class i;
in step M, the model parameters θ are updated according to their definitions, namely:
Figure FDA0003156632930000034
Figure FDA0003156632930000041
Figure FDA0003156632930000042
where N is the total number of samples, N_i is the number of samples belonging to class C_i, x_r denotes a sample, and x_u is an unlabeled sample; w_i, μ_i and Σ_i respectively represent the proportion, the sample mean and the sample covariance of class i;
and repeating step E and step M until θ converges.
6. A computer readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the AIS trajectory data based vessel operation anomaly semi-supervised real-time detection method of any one of claims 1 to 5.
CN201910574738.1A 2019-06-28 2019-06-28 AIS (automatic identification system) track data based ship operation abnormity semi-supervised real-time detection method Active CN110363115B (en)

Publications (2)

Publication Number Publication Date
CN110363115A CN110363115A (en) 2019-10-22
CN110363115B true CN110363115B (en) 2021-10-15

Family

ID=68217566

Country Status (1)

Country Link
CN (1) CN110363115B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant