Disclosure of Invention
In view of the above, the present invention aims to provide a method for automatically generating a ship route based on big data, so as to solve the following problems in the prior art: route planning depends excessively on the navigation experience and professional background of the navigator, so that the resulting routes have limited reference value, productivity is low, navigation safety cannot be well guaranteed, and the inherent navigation patterns of ships are not accommodated.
In order to achieve the purpose, the invention provides the following technical scheme:
A ship route automatic generation method based on big data comprises the following steps:
S1: acquiring original data of ship navigation tracks, and sorting and cleaning the original data to establish an original set of ship navigation tracks;
S2: extracting the starting points and end points of all navigation tracks in the original set of ship navigation tracks, and classifying the ship navigation tracks through a clustering algorithm to obtain a ship navigation track cluster set;
S3: giving a navigation plan and ship information of a ship, searching for similar navigation tracks in the ship navigation track cluster set, obtaining a navigable path set based on the similar navigation tracks, calculating the transition probability corresponding to each navigable path in the navigable path set, and determining the navigable path with the maximum transition probability as the planned route of the voyage;
S4: executing step S3 repeatedly, calculating and storing a plurality of planned routes corresponding to the navigation plans of the given ships, setting constraint conditions based on the planned routes and the corresponding ship information, and establishing a ship route mapping library;
S5: giving the constraint conditions of the current ship navigation plan, carrying out route mapping in the ship route mapping library, and automatically generating the planned route of the current ship.
Further, the specific method of step S1 is as follows:
acquiring navigation tracks of all voyages of each ship in a preset time and a preset sea area to form original data, sorting and cleaning the original data, eliminating the navigation track with abnormal data in the original data, obtaining the normal navigation track of each voyage of each ship, and establishing an original set of the ship navigation tracks.
Further, the step S2 includes the following steps:
S201: extracting the starting point and the end point of each navigation track in the original set of ship navigation tracks to form a starting point set and an end point set, and marking the navigation tracks according to the corresponding ship information;
S202: clustering the starting points and the end points extracted in step S201 respectively using a clustering algorithm to obtain corresponding clustering results;
S203: traversing all navigation tracks in the original set of ship navigation tracks, and classifying the navigation tracks according to the clustering results of the starting points and the end points in step S202 to form a ship navigation track cluster set.
Further, the step S202 adopts the DBSCAN algorithm, a density-based clustering method, to cluster the starting points and the end points respectively, and the specific steps are as follows:
S2021: setting a neighborhood radius of the DBSCAN algorithm and a preset number of starting points/end points within the neighborhood radius;
S2022: randomly selecting an unprocessed first starting point/first end point in the starting point set/the end point set, and checking the number of starting points/end points contained within the neighborhood radius of the first starting point/first end point;
S2023: judging whether the number of starting points/end points contained within the neighborhood radius of the first starting point/first end point in step S2022 is greater than or equal to the preset number; if so, continuing to step S2024; otherwise, repeating step S2022;
S2024: establishing a corresponding first cluster set with the first starting point/first end point, establishing a candidate set, and classifying into the candidate set the starting points/end points that are contained within the neighborhood radius of the first starting point/first end point and have not been classified into any cluster set or marked as noise;
S2025: selecting an unprocessed second starting point/second end point in the candidate set, and checking the number of starting points/end points contained within the neighborhood radius of the second starting point/second end point;
S2026: judging whether the number of starting points/end points contained within the neighborhood radius of the second starting point/second end point in step S2025 is greater than or equal to the preset number; if so, adding the second starting point/second end point, together with the starting points/end points within its neighborhood radius that have not been classified into any cluster set or marked as noise, to the corresponding first cluster set; otherwise, adding only the second starting point/second end point to the corresponding first cluster set;
S2027: repeating steps S2025-S2026 until the candidate set is empty, and then continuing to step S2028;
S2028: repeating steps S2022-S2027 until all starting points/end points in the starting point set/the end point set have been classified into a cluster set or marked as noise, thereby obtaining the cluster sets of the starting points/end points of the ship navigation tracks.
Further, when determining which starting points/end points are contained within the neighborhood radius of a given starting point/end point, the distance between the two starting points/end points is calculated; if the distance is smaller than the neighborhood radius, the starting point/end point is contained within the neighborhood radius of the given starting point/end point.
Further, the distance between the two starting points/end points is calculated using the Euclidean distance formula:
$\rho = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}$
wherein: $(x_1, x_2)$ and $(y_1, y_2)$ are the two-dimensional coordinates of the two starting points/end points, respectively.
Further, the step S3 includes the following steps:
S301: giving a ship navigation plan, classifying the navigation plan into the category represented by the corresponding ship navigation track cluster set according to the starting point and the end point of the navigation plan, dynamically searching for similar ship navigation tracks in the ship navigation track cluster set of the same category according to the ship information of the ship, and establishing an approximate track set;
S302: establishing a transition probability directed graph for the approximate track set in step S301;
S303: searching for the navigable paths of the ship in the transition probability directed graph, establishing a navigable path set, calculating the transition probability of each navigable path in the navigable path set, and taking the navigable path with the maximum transition probability as the planned route from the starting point to the end point.
Further, the specific method for establishing the transition probability directed graph in step S302 is as follows:
marking a plurality of nodes on each navigation track in the approximate track set at equal distance, calculating the transition probability from each node position to the adjacent node position, and establishing a transition probability directed graph.
Further, in step S303, a calculation formula of the transition probability of the navigable path is as follows:
wherein: $P(Tr_j)$ is the transition probability of the jth navigable path in the navigable path set;
for the ith node in the jth navigable path, i = 1, 2, …, m, where m is the number of nodes in any navigable path;
the transition probability of the ith node in the jth navigable path is obtained;
the planned route from the starting point to the end point, taken as the navigable path with the highest transition probability, may be represented as:
$path = \arg\max_j P(Tr_j)$;
wherein: path is the planned route of the voyage, generated from the navigable path with the highest transition probability.
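The path probability described above can be written out in one plausible form, assuming the nodes of a navigable path form a first-order Markov chain so that the path probability is the product of the node-to-node transition probabilities. The node symbol $v_i^j$ (the ith node of the jth navigable path) and the conditional notation are assumptions introduced here for illustration only:

```latex
P(Tr_j) = \prod_{i=1}^{m-1} P\left(v_{i+1}^{j} \mid v_{i}^{j}\right),
\qquad
path = \arg\max_{j} P(Tr_j)
```

Under this reading, a path with m nodes contributes m-1 factors, one per directed edge it traverses.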
Further, the ship information includes ship type, ship size, loading state, draft and sailing season.
According to this scheme, the historical navigation tracks of ships are classified using a density-based clustering method, which improves the generality and accuracy of ship navigation track clustering and lays a data foundation for real-time route generation. In addition, when the ship route mapping library is established, multiple constraint conditions such as the current navigation season and the ship information are combined, and the big data search over ship navigation tracks improves the accuracy of real-time route generation and makes route generation more scientifically grounded. Finally, dynamic queries of ship routes are carried out in the ship route mapping library, which greatly shortens the time required for route generation, greatly reduces the computation cost, and improves the real-time performance of route generation.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objectives and other advantages of the invention may be realized and attained by the means of the instrumentalities and combinations particularly pointed out hereinafter.
Detailed Description
The invention is described in further detail below by way of specific embodiments:
Embodiment
Fig. 1 is a flow chart of a big data-based ship route automatic generation method according to the present invention. The ship route automatic generation method based on big data of the embodiment specifically comprises the following steps:
S1: Establishing an original set of ship navigation tracks.
Specifically, acquiring navigation tracks of all voyages of each ship in a preset time and a preset sea area, and taking all the navigation tracks as original data; and sorting and cleaning the original data, and eliminating the navigation track with abnormal data (such as abnormal navigation speed, abnormal navigation track and the like) in the original data to obtain the normal navigation track of each ship in each voyage, so as to establish an original set of ship navigation tracks.
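The cleaning in step S1 can be sketched as follows. This is a minimal illustration, not the method of the invention: the `TrackPoint` type, the planar coordinates in metres, the 15 m/s speed threshold and the rule that non-increasing timestamps count as abnormal are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class TrackPoint:
    t: float  # timestamp in seconds (hypothetical field)
    x: float  # planar coordinates in metres, assumed already projected
    y: float


def is_normal_track(points, max_speed=15.0, min_points=2):
    """Return True if no consecutive pair of points implies an abnormal speed.

    max_speed (m/s) is an assumed threshold, not a value from the text.
    """
    if len(points) < min_points:
        return False
    for a, b in zip(points, points[1:]):
        dt = b.t - a.t
        if dt <= 0:
            return False  # assumption: non-increasing timestamps are abnormal
        if hypot(b.x - a.x, b.y - a.y) / dt > max_speed:
            return False  # abnormal navigation speed
    return True


def build_original_set(raw_tracks, max_speed=15.0):
    """Keep only the tracks that pass the abnormal-data check."""
    return [trk for trk in raw_tracks if is_normal_track(trk, max_speed)]
```

A track is a list of `TrackPoint` objects; `build_original_set` returns the subset of tracks that would form the original set under these assumed rules.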
S2: Classifying the ship navigation tracks and establishing a ship navigation track cluster set.
Extracting the starting points and the end points of all navigation tracks in the original set of the navigation tracks of the ship, clustering all the starting points and the end points through a clustering algorithm, and classifying the navigation tracks of the ship according to the clustering results of the starting points and the end points to obtain a ship navigation track clustering set.
As shown in fig. 2, the step S2 includes the following steps:
S201: Extracting the starting points and end points and marking ship information.
Firstly, the starting point and the end point of each navigation track in the original set of ship navigation tracks are extracted to form a starting point set $D_S$ and an end point set $D_E$, each containing n points, where n is the number of starting points or the number of end points.
Because the ship has an inherent navigation rule when sailing on the sea, in order to increase the seaworthiness of the ship in the sea and integrate the navigation rules of historical ships, each navigation track needs to be marked according to corresponding ship information. In the present embodiment, the ship information includes, but is not limited to, ship type, ship size, ship state, draught, and sailing season.
S202: Clustering the starting points and the end points.
In order to improve the generality and accuracy of ship track clustering, a clustering algorithm is used to cluster the starting points and the end points extracted in step S201 respectively, obtaining the corresponding clustering results. In this embodiment, the DBSCAN algorithm, a density-based clustering method, is used to cluster the starting points and the end points respectively.
Since the same method is used when clustering is performed on the start point and the end point, in this embodiment, the start point is taken as an example for description.
As shown in fig. 3, the step S202 includes the following steps:
S2021: Initializing the DBSCAN algorithm by setting the neighborhood radius e and the preset number minPts, i.e., the minimum number of starting points (or end points) that must be contained within the neighborhood radius e of the object being processed (here, a starting point).
S2022: Randomly selecting from the starting point set $D_S$ an unprocessed first starting point (i.e., one not yet classified into a cluster set or marked as noise), denoted as the first object p, and checking the number $Pts_1$ of starting points contained within the set neighborhood radius e of the first object p.
To determine which other starting points are contained within the neighborhood radius e of the first object p, the distance ρ from the first object p to each other starting point is calculated using the Euclidean distance formula:
$\rho = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}$ (3)
wherein: $(x_1, x_2)$ and $(y_1, y_2)$ are the two-dimensional coordinates of the two starting points, respectively.
If the Euclidean distance ρ between the two starting points calculated by formula (3) is smaller than the neighborhood radius e, the starting point is contained within the neighborhood radius e of the first object p; the number $Pts_1$ of starting points contained within the neighborhood radius e of the first object p is obtained in this way.
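The neighborhood test of step S2022 can be sketched in a few lines. The function names and the representation of points as `(x, y)` tuples are assumptions for illustration; the strict "smaller than" comparison follows the text above.

```python
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two 2-D points a = (x1, x2) and b = (y1, y2)."""
    return sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)


def neighbors(points, idx, eps):
    """Indices of the other points whose distance to points[idx] is smaller than eps."""
    p = points[idx]
    return [j for j, q in enumerate(points)
            if j != idx and euclidean(p, q) < eps]
```

The length of the list returned by `neighbors` corresponds to the count of starting points within the neighborhood radius of the chosen object, with the object itself excluded.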
S2023: Determining whether the number $Pts_1$ of starting points contained within the neighborhood radius e of the first object p in step S2022 is greater than or equal to the preset number minPts.
If the number $Pts_1$ of starting points contained within the neighborhood radius e of the first object p is greater than or equal to the preset number minPts, the process continues to step S2024.
If the number $Pts_1$ of starting points contained within the neighborhood radius e of the first object p is less than the preset number minPts, step S2022 is repeated to select another unprocessed first starting point for processing.
S2024: With the first object p (i.e., the first starting point selected in step S2022) as a core point, a corresponding first cluster set $C_1$ is established, and a candidate set N is established; the starting points that are contained within the neighborhood radius e of the first object p and have not been classified into any cluster set or marked as noise are then classified into the candidate set N.
S2025: Selecting an unprocessed second starting point in the candidate set N, denoted as the second object q; determining the other starting points contained within the neighborhood radius e of the second object q by the method of step S2022; and obtaining the number $Pts_2$ of starting points contained within the neighborhood radius e of the second object q.
S2026: Determining whether the number $Pts_2$ of starting points contained within the neighborhood radius e of the second object q in step S2025 is greater than or equal to the preset number minPts.
If the number $Pts_2$ of starting points contained within the neighborhood radius e of the second object q is greater than or equal to the preset number minPts, the second object q (i.e., the second starting point), together with the starting points within its neighborhood radius e that have not been classified into any cluster set or marked as noise, is added to the corresponding first cluster set $C_1$. Here r denotes the number of starting points within the neighborhood radius e of the second object q that have already been classified into a cluster set or marked as noise; these starting points are not added again.
If the number $Pts_2$ of starting points contained within the neighborhood radius e of the second object q is less than the preset number minPts, only the second object q (i.e., the second starting point) is added to the corresponding first cluster set $C_1$.
S2027: Repeating steps S2025-S2026 until the candidate set N is empty (i.e., all second starting points in the candidate set N have been classified into a cluster set or marked as noise), and then continuing to step S2028.
S2028: Repeating steps S2022-S2027 until all starting points in the starting point set $D_S$ have been classified into a cluster set or marked as noise, thereby obtaining the cluster sets of the starting points of the ship navigation tracks. The starting point set $D_S$ can be expressed as:
$D_S = C_1 + C_2 + \dots + C_M + \varepsilon$ (6)
wherein: $C_1, \dots, C_M$ are the starting point cluster sets; M is the number of starting point cluster sets, i.e., the number of categories into which all starting points in the starting point set $D_S$ are divided; and ε is the set of starting points marked as noise.
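Steps S2021-S2028 can be sketched as a small self-contained routine. This is an illustrative reading under stated assumptions, not a definitive implementation: a first object with too few neighbors is marked as noise before the next one is selected; the neighbors of a core second object are also queued for expansion, as in standard DBSCAN; the neighbor count excludes the object itself; and the `seed` parameter exists only to make the random selection repeatable.

```python
import random
from math import hypot

NOISE = -1  # label used here for points marked as noise


def cluster_points(points, eps, min_pts, seed=0):
    """Label each 2-D point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means unprocessed

    def region(i):
        xi, yi = points[i]
        return [j for j, (x, y) in enumerate(points)
                if j != i and hypot(x - xi, y - yi) < eps]

    order = list(range(len(points)))
    random.Random(seed).shuffle(order)  # S2022: random selection order
    cluster_id = 0
    for p in order:
        if labels[p] is not None:
            continue
        nbrs = region(p)
        if len(nbrs) < min_pts:  # S2023: not enough neighbors
            labels[p] = NOISE    # assumption: mark as noise and move on
            continue
        labels[p] = cluster_id   # S2024: p founds a new cluster set
        candidates = [j for j in nbrs if labels[j] is None]
        queued = set(candidates)
        while candidates:        # S2025-S2027: drain the candidate set
            q = candidates.pop()
            labels[q] = cluster_id
            q_nbrs = region(q)
            if len(q_nbrs) >= min_pts:  # S2026: q is itself a core point
                for j in q_nbrs:
                    if labels[j] is None and j not in queued:
                        queued.add(j)
                        candidates.append(j)
        cluster_id += 1          # S2028: continue with the next cluster
    return labels
```

Because points already marked as noise are never relabeled in this sketch, the result can depend on the selection order for borderline points; production code would more likely rely on an established library implementation.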
S203: Classifying the navigation tracks according to the clustering results.
Traversing all navigation tracks in the original set of the ship navigation tracks, classifying the navigation tracks according to the clustering results of the starting points and the end points in the step S202, and classifying all the ship navigation tracks into a plurality of categories to form a ship navigation track clustering set.
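The classification in step S203 can be sketched as grouping tracks by the pair of cluster labels of their starting point and end point. The function name, the label lists and the choice to leave out tracks whose starting point or end point was marked as noise are assumptions made for this example.

```python
from collections import defaultdict


def classify_tracks(tracks, start_labels, end_labels, noise=-1):
    """Group tracks by (start cluster, end cluster).

    start_labels[i] and end_labels[i] are the cluster labels of the
    starting point and end point of tracks[i].
    """
    groups = defaultdict(list)
    for track, s, e in zip(tracks, start_labels, end_labels):
        if s == noise or e == noise:
            continue  # assumption: tracks with a noise endpoint stay unclassified
        groups[(s, e)].append(track)
    return dict(groups)
```

Each key of the returned dictionary then stands for one category of the ship navigation track cluster set.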
S3: Calculating the transition probabilities and selecting a planned route.
Specifically, a navigation plan and the corresponding ship information of a ship are given; navigation tracks similar to the given navigation plan and ship information are searched for in the ship navigation track cluster set; a set of navigable paths for the given ship is obtained from the similar navigation tracks; the transition probability of each navigable path in the set is calculated; and the navigable path with the maximum transition probability is determined as the planned route for the current voyage of the ship.
As shown in fig. 4, the step S3 includes the following steps:
S301: Dynamically searching the ship navigation tracks and establishing an approximate track set.
A navigation plan of a ship is given, and the navigation plan is classified, according to its starting point and end point, into the category represented by the corresponding ship navigation track cluster set. Similar ship navigation tracks are then dynamically searched for in the ship navigation track cluster set of the same category according to the ship information of the ship: the ship information of the given ship (ship type, ship size, loading state, draft, navigation season, etc.) is compared with the ship type, ship size, loading state, draft and navigation season marked on the navigation tracks in that cluster set, and the similar ship navigation tracks found are used to establish an approximate track set of the given ship for the current voyage.
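The comparison of ship information in step S301 can be sketched as a simple similarity filter. The text does not define how similarity is measured, so everything here is an assumption: the `ShipInfo` fields, exact matching on ship type, loading state and season, and the tolerances on length and draft.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShipInfo:
    ship_type: str
    length_m: float   # stands in for "ship size" (hypothetical choice)
    loaded: bool      # stands in for "loading state"
    draft_m: float
    season: str


def is_similar(a, b, length_tol=0.2, draft_tol=1.0):
    """Assumed rule: categorical fields match, numeric fields are close."""
    return (a.ship_type == b.ship_type
            and a.loaded == b.loaded
            and a.season == b.season
            and abs(a.length_m - b.length_m) <= length_tol * a.length_m
            and abs(a.draft_m - b.draft_m) <= draft_tol)


def approximate_track_set(cluster, ship):
    """cluster: list of (ShipInfo, track) pairs from one start/end category."""
    return [trk for info, trk in cluster if is_similar(ship, info)]
```

Looser or stricter tolerances trade the size of the approximate track set against how closely the historical ships resemble the given ship.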
S302: Establishing a transition probability directed graph.
For the approximate track set in step S301, a plurality of nodes are marked at equal intervals on each navigation track, the transition probability from each node position to the adjacent node position (the adjacent node position may be the node position on the same navigation track, or the node position on other navigation tracks in the approximate track set) is calculated, and a transition probability directed graph is established.
In this embodiment, the established transition probability directed graph is defined as G (V, E, W), where V represents a set of all nodes on each navigation track in the approximate track set; e represents a set of directed edges formed by connecting lines of the current node position and the next adjacent node position; w represents the transition probability of each directed edge, i.e., the probability of transitioning from the current node position to the next node position.
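One way to build such a graph G (V, E, W) is sketched below. The text does not state how the transition probabilities are estimated, so this example assumes a frequency estimate: each track has already been resampled at equal distance and snapped to shared node identifiers (for instance grid cells), and the weight of an edge is the share of departures from a node that go to a given next node.

```python
from collections import Counter, defaultdict


def build_transition_graph(node_tracks):
    """Return {node: {next_node: probability}} from tracks given as node sequences.

    Assumption: probabilities are relative frequencies of observed transitions.
    """
    counts = defaultdict(Counter)
    for nodes in node_tracks:
        for u, v in zip(nodes, nodes[1:]):
            counts[u][v] += 1  # one observed move along directed edge u -> v
    graph = {}
    for u, outs in counts.items():
        total = sum(outs.values())
        graph[u] = {v: c / total for v, c in outs.items()}
    return graph
```

In this representation the keys are the node set V, the nested keys are the directed edges E, and the nested values are the weights W.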
S303: Calculating the transition probabilities to obtain a planned route.
The navigable paths of the given ship for the current voyage are searched for in the transition probability directed graph established in step S302, and a navigable path set $\{Tr_1, Tr_2, \dots, Tr_k\}$ is established, where $Tr_j$ (j = 1, 2, …, k) denotes the jth navigable path in the set and k is the number of navigable paths in the set. The transition probability of each navigable path in the set is then calculated, and the navigable path with the maximum transition probability is taken as the planned route from the starting point to the end point.
Each navigable path $Tr_j$ in the navigable path set can be regarded as a chain with m nodes. For i = 1, 2, …, m-1, the transition probability of the next node can be calculated from the transition probability of the current node, and the transition probability of each navigable path $Tr_j$ can thus be calculated,
wherein: $P(Tr_j)$ is the transition probability of the jth navigable path in the navigable path set, and i indexes the nodes in the jth navigable path.
Thus, the planned route path from the starting point to the end point, taken as the navigable path with the highest transition probability, can be represented as:
$path = \arg\max_j P(Tr_j)$ (8)
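The search and selection in step S303 can be sketched as follows. This is a minimal illustration under assumptions: the graph is a dictionary `{node: {next_node: probability}}`, navigable paths are the cycle-free paths from the starting node to the end node, and the probability of a path is taken as the product of the transition probabilities along its edges. Exhaustive enumeration is only practical for small graphs.

```python
from math import prod


def navigable_paths(graph, start, end, path=None):
    """Yield every cycle-free path from start to end in the directed graph."""
    path = (path or []) + [start]
    if start == end:
        yield path
        return
    for nxt in graph.get(start, {}):
        if nxt not in path:  # skip nodes already visited on this path
            yield from navigable_paths(graph, nxt, end, path)


def path_probability(graph, path):
    """Assumed definition: product of edge transition probabilities."""
    return prod(graph[u][v] for u, v in zip(path, path[1:]))


def plan_route(graph, start, end):
    """Return the navigable path with the highest probability, or None."""
    paths = list(navigable_paths(graph, start, end))
    if not paths:
        return None
    return max(paths, key=lambda p: path_probability(graph, p))
```

For larger graphs, a shortest-path search over negative log probabilities would find the same maximum-probability path without enumerating every path.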
S4: Establishing a ship route mapping library.
A large number of ship navigation plans and the corresponding ship information are given, step S3 is executed repeatedly, and the planned route corresponding to the navigation plan of each given ship is calculated and stored; constraint conditions are set based on each planned route (including its starting point, end point, etc.) and the corresponding ship information (ship type, ship size, loading state, draft, navigation season, etc.), and the ship route mapping library is established.
S5: Automatically planning a ship route.
Before the ship sails, a sailing plan and constraint conditions of the ship are given, route mapping is carried out in the ship route mapping library, and a planned route of the current ship is automatically generated.
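The storage in step S4 and the lookup in step S5 can be sketched as a keyed mapping. The text does not specify the structure of the library, so the `RouteLibrary` class, the `make_key` helper and the choice of constraint fields are assumptions for illustration; exact-match lookup is the simplest possible form of route mapping.

```python
def make_key(start_cluster, end_cluster, ship_type, loaded, season):
    """Build a hashable constraint key (the field choice is hypothetical)."""
    return (start_cluster, end_cluster, ship_type, loaded, season)


class RouteLibrary:
    """Maps constraint keys to precomputed planned routes."""

    def __init__(self):
        self._routes = {}

    def store(self, key, route):
        # S4: save the planned route computed in step S3 under its constraints
        self._routes[key] = route

    def lookup(self, key):
        # S5: return the stored route for these constraints, or None
        return self._routes.get(key)
```

Continuous quantities such as draft or ship size would need to be bucketed before being used in a key of this kind.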
According to this scheme, multiple factors such as the navigation season and the ship information are considered together, a dynamic search is carried out on the ship track big data, and a ship route mapping library is established. In the subsequent route design process, dynamic queries can be made directly in the ship route mapping library according to the constraint conditions, so that the current planned route is generated automatically and the time for route design is greatly shortened.
The foregoing is merely an example of the present invention and common general knowledge of known specific structures and features of the embodiments is not described herein in any greater detail. It should be noted that, for those skilled in the art, without departing from the structure of the present invention, several changes and modifications can be made, which should also be regarded as the protection scope of the present invention, and these will not affect the effect of the implementation of the present invention and the practicability of the present invention.