CN108820157A

CN108820157A - Intelligent ship collision avoidance method based on reinforcement learning

Info

Publication number: CN108820157A
Application number: CN201810378954.4A
Authority: CN
Inventors: 张蕊; 王潇; 刘克中; 吴晓烈; 刘炯炯
Original assignee: Wuhan University of Technology WUT
Current assignee: Wuhan University of Technology WUT
Priority date: 2018-04-25
Filing date: 2018-04-25
Publication date: 2018-11-16
Anticipated expiration: 2038-04-25
Also published as: CN108820157B

Abstract

The invention discloses an intelligent ship collision avoidance method based on reinforcement learning. First, the static and dynamic data of two ships are acquired. Next, the validity of the data is checked to decide whether the collision avoidance procedure needs to be started, and the relevant collision avoidance parameters are calculated to judge whether a dangerous situation will arise. If no collision risk will arise, the ship keeps its speed and course in accordance with the COLREGs. If a collision risk will arise, a collision avoidance strategy is learned by reinforcement learning: the calculated parameters serve as the training input, and the strategy produced by training is output as the rudder angle the own ship must turn. The strategy is then executed, the dynamic data of the two ships from step 1 are updated, and a reward value is returned. After the strategy has been executed, the moment to resume the original voyage is determined according to the COLREGs and the ship resumes its voyage. The invention enables ship collision avoidance to be learned and improved autonomously, avoiding the adverse situations that arise when seafarers rely purely on experience.

Description

Intelligent ship collision avoidance method based on reinforcement learning

Technical field

The invention belongs to the field of artificial intelligence and relates to an intelligent ship collision avoidance method, specifically an intelligent ship collision avoidance method based on reinforcement learning.

Background technique

Collision avoidance is a problem that cannot be ignored during navigation, and many different solutions exist, for example AIS-based intelligent collision avoidance decision-making, collision avoidance using intelligent algorithms based on evolutionary genetic algorithms, and collision avoidance algorithms based on Bayesian networks. These algorithms can solve the collision avoidance problem to a certain extent, but they also have limitations: they cannot learn and improve their collision avoidance strategies by themselves.

At present, ships in open waters must avoid one another in multi-ship encounters, and collision avoidance in open waters is still based mainly on the International Regulations for Preventing Collisions at Sea (COLREGs). Because the avoidance provisions of the COLREGs are largely qualitative, in actual avoidance practice the ordinary practice of seamen and the officer's practical ship-handling experience significantly affect both the specific decision and the effectiveness of the avoidance.

In practice, ship collision avoidance depends mainly on human operation and relies heavily on the ordinary practice of seamen and the officer's practical ship-handling experience, which introduces considerable instability.

Summary of the invention

To solve the above technical problem, the present invention uses reinforcement learning to optimize the collision avoidance strategy and algorithm, and provides an intelligent ship collision avoidance method based on reinforcement learning that enables autonomous learning and improvement of ship collision avoidance and avoids the adverse situations that arise when seafarers rely purely on experience.

The technical solution adopted by the present invention to solve the technical problem is an intelligent ship collision avoidance method based on reinforcement learning, characterized in that it comprises the following steps:

Step 1: Acquire the static data and dynamic data of two ships;

Step 2: Check the validity of the data, calculate the relevant collision avoidance parameters, judge whether a dangerous situation will arise, and start the collision avoidance procedure;

Step 3: If no collision risk will arise, keep speed and course in accordance with the COLREGs. If a collision risk will arise, learn a collision avoidance strategy by reinforcement learning: the calculated parameters are the training input, the output is the strategy produced by training, and from it the rudder angle the own ship must turn is obtained;

Step 4: Execute the strategy generated in step 3, then dynamically update the dynamic data of the two ships from step 1 and return a reward value; the reward value is used to evaluate the quality of the collision avoidance strategy;

Step 5: After the strategy has been executed, determine the moment to resume the original voyage according to the COLREGs and then resume the voyage.

The advantages of the invention are as follows. Reinforcement learning is used to optimize the strategy, which effectively assists the operator, reduces operating errors caused by intuition and experience, and improves the efficiency of ship collision avoidance. Because a machine learning method is used, collision avoidance can, unlike with traditional avoidance algorithms, improve its strategy through autonomous learning. Once the strategy has been optimized, the optimal policy learned by the machine can readily be provided to the operator as a reference, supporting high-quality decisions and avoiding close-quarters situations.

Detailed description of the invention

Fig. 1 is the schematic diagram of the embodiment of the present invention.

Specific embodiment

To help those of ordinary skill in the art understand and implement the present invention, the invention is described in further detail below with reference to the accompanying drawing and an embodiment. It should be understood that the embodiment described here serves only to illustrate and explain the present invention and does not limit it.

In the field of machine learning, reinforcement learning is an artificial intelligence approach. The DeepMind team, as a representative research group, first proposed a deep reinforcement learning method based on DQN (Deep Q-Network) and tested it on a set of Atari 2600 games, where it outperformed human players by a significant margin. In 2012, Lange went further toward applications and proposed Deep Fitted Q learning for vehicle control. Experiments have shown that such methods are suitable for fields such as intelligent control, robotics, and analysis and prediction, offering new ideas and opportunities for optimizing collision avoidance handling. The present invention can closely fit the actions of human seafarers, giving intelligent ship collision avoidance decisions the ability to learn and improve autonomously.

Referring to Fig. 1, the intelligent ship collision avoidance method based on reinforcement learning provided by the invention comprises the following steps:

Step 1: Acquire the static data and dynamic data of two ships;

The static and dynamic data of the two ships comprise own-ship information and target-ship information. Own-ship information includes navigational status, turning-ability index, course-following index, course over ground, heading, speed over ground, speed through water, longitude, latitude, rudder angle and draught. Target-ship information includes ship name, MMSI, call sign, ship type, length, beam, course over ground, heading, speed over ground, speed through water, longitude, latitude, distance, true bearing and relative bearing.
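The patent does not state how the validity ("legitimacy") of the data is checked in Step 2. A minimal sketch, assuming a simple range check on a few of the dynamic fields listed above, is shown below; the class ShipDynamics, the function is_valid and the 50-knot speed ceiling are hypothetical choices for illustration. A check of this kind would also reject AIS "not available" placeholders such as heading 511 or latitude 91.

```python
from dataclasses import dataclass


@dataclass
class ShipDynamics:
    """Hypothetical container for some of the dynamic fields listed in Step 1."""
    course_over_ground: float  # degrees, 0 <= value < 360
    heading: float             # degrees, 0 <= value < 360
    speed_over_ground: float   # knots
    latitude: float            # degrees, -90..90
    longitude: float           # degrees, -180..180


def is_valid(ship: ShipDynamics, max_speed_kn: float = 50.0) -> bool:
    """Plausibility check standing in for the data-validity test of Step 2.

    The ranges (and the assumed speed ceiling) are illustrative, not from the patent.
    """
    return (
        0.0 <= ship.course_over_ground < 360.0
        and 0.0 <= ship.heading < 360.0
        and 0.0 <= ship.speed_over_ground <= max_speed_kn
        and -90.0 <= ship.latitude <= 90.0
        and -180.0 <= ship.longitude <= 180.0
    )
```

Only records that pass such a check would go on to the parameter calculation; a failed check could, for example, trigger a fresh sensor or AIS read.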

The relevant collision avoidance parameters include the time to closest point of approach (TCPA), the distance at closest point of approach (DCPA), the safe distance of approach (SDA), the close-quarters situation distance (CQS), the immediate danger distance (IMD), the speed of relative motion (VR) and the direction of relative motion (AR);

To judge whether a dangerous situation will arise: when TCPA > 0 and DCPA < SDA, a collision risk will arise.
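The risk test above can be illustrated with the standard relative-motion formulas on a local flat plane. This is a sketch under stated assumptions rather than the patent's own implementation: positions are (east, north) offsets in nautical miles, courses are degrees clockwise from north, speeds are in knots, SDA is supplied by the caller, and the function names are hypothetical. With relative position $\mathbf{p}$ of the target and relative velocity $\mathbf{v}_r$, TCPA $= -\mathbf{p}\cdot\mathbf{v}_r / |\mathbf{v}_r|^2$ and DCPA $= |\mathbf{p} + \mathbf{v}_r\,\mathrm{TCPA}|$.

```python
import math


def velocity(course_deg: float, speed_kn: float) -> tuple[float, float]:
    """Convert course (clockwise from north) and speed to (east, north) components."""
    c = math.radians(course_deg)
    return speed_kn * math.sin(c), speed_kn * math.cos(c)


def encounter_parameters(own_pos, own_course, own_speed,
                         tgt_pos, tgt_course, tgt_speed):
    """Return (VR, AR, TCPA, DCPA) for the target relative to the own ship.

    Positions in nautical miles and speeds in knots give TCPA in hours
    and DCPA in nautical miles.
    """
    ox, oy = velocity(own_course, own_speed)
    tx, ty = velocity(tgt_course, tgt_speed)
    vrx, vry = tx - ox, ty - oy                      # relative velocity of the target
    px, py = tgt_pos[0] - own_pos[0], tgt_pos[1] - own_pos[1]
    vr = math.hypot(vrx, vry)                        # speed of relative motion (VR)
    ar = math.degrees(math.atan2(vrx, vry)) % 360.0  # direction of relative motion (AR)
    if vr == 0.0:
        # No relative motion: the range never changes, so treat "now" as the CPA.
        return vr, ar, 0.0, math.hypot(px, py)
    tcpa = -(px * vrx + py * vry) / vr**2
    dcpa = math.hypot(px + vrx * tcpa, py + vry * tcpa)
    return vr, ar, tcpa, dcpa


def collision_risk(tcpa: float, dcpa: float, sda: float) -> bool:
    """Risk criterion from the text: TCPA > 0 and DCPA < SDA."""
    return tcpa > 0 and dcpa < sda
```

For a head-on encounter (own ship at the origin steering 000 at 10 kn, target 6 nm ahead steering 180 at 10 kn) this gives VR = 20 kn, AR = 180°, TCPA = 0.3 h and DCPA of essentially zero, so with an SDA of 1 nm the criterion reports a collision risk.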

A collision avoidance strategy is learned by reinforcement learning; the specific implementation comprises the following sub-steps:

Step 4.1: Input the static and dynamic parameters of the ships used for training;

Step 4.2: Feed all the parameters into the reinforcement learning DQN (Deep Q-Network) to train on the data, continuously updating the Q-value function until it converges, to obtain the best model;

Step 4.3: Input the static and dynamic parameters of the ships used for testing into the trained model;

Step 4.4: Output the rudder angle the own ship must turn;

In this embodiment, the static and dynamic data of the two ships, together with marine environment data, are first obtained through equipment such as various sensors. A Markov decision process four-tuple $E = \langle S, A, P, R \rangle$ is then formed, where $S$ is the state set describing the ship's heading and speed; $A$ is the action set describing the rudder angle the ship should turn; $P: S \times A \times S \to \mathbb{R}$ is the transition function, specifying the state transition probabilities; and $R: S \times A \times S \to \mathbb{R}$ is the reward function, specifying the rewards. Existing algorithms usually train on the data with DQN (Deep Q-Network). A Q-table is first initialized whose rows and columns correspond to $S$ and $A$ respectively; each Q-table value measures how good it is to take action $a$ in the current state $s$. In this embodiment the Q-table is updated during training using the Bellman equation:

$$Q(s,a) = r + \gamma \max_{a'} Q(s',a')$$

That is, $Q(s,a)$ equals the immediate reward $r$ obtained after taking action $a$ in the current state $s$, plus the maximum future value $\max_{a'} Q(s',a')$ discounted by $\gamma$.
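The Q-table and the Bellman update can be sketched in tabular form. The discretisation (36 heading bins by 5 speed bins for $S$), the five-element rudder-angle action set $A$, $\gamma = 0.9$, the 25-knot speed ceiling and the names state_index and q_update are all illustrative assumptions. With alpha = 1.0 the update is exactly the assignment written above; standard Q-learning instead moves toward the target with a learning rate $0 < \alpha < 1$, i.e. $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$.

```python
import numpy as np

# Hypothetical discretisation: 36 heading bins (10 degrees each) x 5 speed bins.
HEADING_BINS, SPEED_BINS = 36, 5
RUDDER_ANGLES = [-20, -10, 0, 10, 20]  # illustrative action set A, degrees

# Q-table: one row per state s, one column per action a, initialised to zero.
q_table = np.zeros((HEADING_BINS * SPEED_BINS, len(RUDDER_ANGLES)))


def state_index(heading_deg: float, speed_kn: float, max_speed_kn: float = 25.0) -> int:
    """Map (heading, speed) to a row of the Q-table."""
    h = int(heading_deg % 360.0 // (360.0 / HEADING_BINS))
    v = max(0, min(int(speed_kn / max_speed_kn * SPEED_BINS), SPEED_BINS - 1))
    return h * SPEED_BINS + v


def q_update(q, s, a, r, s_next, gamma=0.9, alpha=1.0):
    """Move Q(s, a) toward the Bellman target r + gamma * max_a' Q(s', a').

    alpha=1.0 reproduces the assignment in the text; alpha < 1 is the usual
    Q-learning rule with a learning rate.
    """
    target = r + gamma * np.max(q[s_next])
    q[s, a] += alpha * (target - q[s, a])
    return q
```

The greedy rudder angle for a state is then RUDDER_ANGLES[int(np.argmax(q_table[s]))].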

In this embodiment, DQN realizes the Q-table with a neural network that takes the state $s$ as input and outputs the Q value of each action $a$. The corresponding algorithm is as follows:

1. Use a deep neural network with parameters $\omega$ as the Q-value network:

$$Q(s,a,\omega) \approx Q^{\pi}(s,a)$$

2. Use the mean squared error of the Q value as the objective function, i.e. the loss function:

$$L(\omega) = \mathbb{E}\left[\left(r + \gamma \max_{a'} Q(s',a',\omega) - Q(s,a,\omega)\right)^2\right]$$

Here $s'$ and $a'$ are the next state and action; the notation follows David Silver's, which is relatively clear. The Q-learning target $r + \gamma \max_{a'} Q(s',a',\omega)$ serves as the target value for updating Q. With a target value and a current value, the deviation between them can be computed as the mean squared error.

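3. Compute the gradient of the loss function with respect to the parameters $\omega$, treating the target as a constant:

$$\frac{\partial L(\omega)}{\partial \omega} = -2\,\mathbb{E}\left[\left(r + \gamma \max_{a'} Q(s',a',\omega) - Q(s,a,\omega)\right)\frac{\partial Q(s,a,\omega)}{\partial \omega}\right]$$

4. Use SGD to achieve the end-to-end optimization objective;

With the gradient above, in which $\partial Q(s,a,\omega)/\partial \omega$ is computed by the deep neural network, the parameters can be updated by stochastic gradient descent (SGD) to obtain the optimal Q value.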

5. With probability $\varepsilon$ select a random action $a_t$; otherwise select the action $a_t$ with the largest Q value output by the network. Executing $a_t$ yields the reward $r_t$ and the next input to the network; the network then computes its output for the next time step from its current values, and the cycle repeats.
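Steps 1 to 5 can be illustrated with a deliberately small stand-in for the deep network. This sketch swaps the deep neural network for a linear approximator $Q(s,a,\omega) = \omega_a \cdot \phi(s)$, where $\phi(s)$ is a hypothetical feature vector of the encounter state, so that $\partial Q/\partial \omega_a = \phi(s)$ and the gradient of step 3 can be written by hand. It shows the TD target, the squared-error loss, one SGD step and $\varepsilon$-greedy selection; the function names are hypothetical, and this is not the patent's network. The DQN published by DeepMind additionally uses a multi-layer network, experience replay and a periodically updated target network.

```python
import numpy as np


def q_values(w: np.ndarray, phi: np.ndarray) -> np.ndarray:
    """Q(s, ., w) for every action; w (omega) has shape (n_actions, n_features)."""
    return w @ phi


def select_action(w, phi, epsilon, rng):
    """Epsilon-greedy (step 5): random action with probability epsilon, else argmax Q."""
    if rng.random() < epsilon:
        return int(rng.integers(w.shape[0]))
    return int(np.argmax(q_values(w, phi)))


def sgd_step(w, phi, a, r, phi_next, gamma=0.9, lr=0.01, done=False):
    """One semi-gradient SGD step on (r + gamma*max_a' Q(s',a',w) - Q(s,a,w))^2.

    The target is held constant, and for this linear model dQ(s,a,w)/dw[a] = phi,
    so the gradient with respect to w[a] is -2 * td_error * phi.
    """
    target = r if done else r + gamma * np.max(q_values(w, phi_next))
    td_error = target - q_values(w, phi)[a]
    w = w.copy()                          # leave the caller's weights untouched
    w[a] += lr * 2.0 * td_error * phi     # step against the gradient
    return w, td_error**2                 # updated weights, loss before the step
```

In a training loop one would repeatedly call select_action with a small $\varepsilon$, execute the chosen rudder angle, observe $r_t$ and the next features, and pass them to sgd_step until the Q values stop changing.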

After a number of training iterations, the model is considered trained when the Q value converges to its maximum. The trained model is then used for collision avoidance between two ships: in an emergency it predicts the optimal collision avoidance strategy, i.e. the rudder angle to turn, assisting the operator in controlling the ship and changing the ship's state until the avoidance manoeuvre is complete.

The reward value reflects minimum track deviation, shortest avoidance time, shortest avoidance path and minimum avoidance amplitude. The quality of a strategy depends on the cumulative reward obtained by executing it over the long term; during training the strategy is continuously optimized over a number of iterations, and training is complete when the Q value representing the reward converges to its maximum.
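The text names the reward criteria but not how they are combined. A minimal sketch, assuming the four criteria enter as a weighted per-step penalty (the weights and the names step_reward and discounted_return are hypothetical), together with the discounted cumulative reward $G = \sum_k \gamma^k r_k$ by which the long-term quality of a strategy is judged:

```python
def step_reward(track_deviation_nm, elapsed_h, path_nm, rudder_change_deg,
                weights=(1.0, 1.0, 1.0, 0.01)):
    """Hypothetical per-step reward built from the four criteria in the text.

    Smaller track deviation, elapsed time, path length and manoeuvre amplitude
    give a larger (less negative) reward. The weights are illustrative only.
    """
    w_dev, w_time, w_path, w_amp = weights
    return -(w_dev * track_deviation_nm + w_time * elapsed_h
             + w_path * path_nm + w_amp * abs(rudder_change_deg))


def discounted_return(rewards, gamma=0.9):
    """Cumulative discounted reward G = sum_k gamma**k * r_k, computed backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

Under this assumption, two candidate strategies can be compared by the discounted return of the rewards they collect over an encounter.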

It should be understood that parts not described in detail in this specification belong to the prior art.

It should be understood that the above description of the preferred embodiment is relatively detailed and should not therefore be regarded as limiting the scope of patent protection of the invention. Under the teaching of the present invention, those of ordinary skill in the art may make substitutions or variations without departing from the scope protected by the claims, and these all fall within the protection scope of the invention. The claimed scope of the invention is determined by the appended claims.

Claims

1. An intelligent ship collision avoidance method based on reinforcement learning, characterized in that it comprises the following steps:

Step 1: Acquire the static data and dynamic data of two ships;

Step 2: Check the validity of the data, calculate the relevant collision avoidance parameters, judge whether a dangerous situation will arise, and start the collision avoidance procedure;

Step 3: If no collision risk will arise, keep speed and course in accordance with the COLREGs. If a collision risk will arise, learn a collision avoidance strategy by reinforcement learning: the calculated parameters are the training input, the output is the strategy produced by training, and from it the rudder angle the own ship must turn is obtained;

Step 4: Execute the strategy generated in step 3, then dynamically update the dynamic data of the two ships from step 1 and return a reward value; the reward value is used to evaluate the quality of the collision avoidance strategy;

2. The intelligent ship collision avoidance method based on reinforcement learning according to claim 1, characterized in that in step 1, the static and dynamic data of the two ships comprise own-ship information and target-ship information; the own-ship information includes navigational status, turning-ability index, course-following index, course over ground, heading, speed over ground, speed through water, longitude, latitude, rudder angle and draught; the target-ship information includes ship name, MMSI, call sign, ship type, length, beam, course over ground, heading, speed over ground, speed through water, longitude, latitude, distance, true bearing and relative bearing.

3. The intelligent ship collision avoidance method based on reinforcement learning according to claim 1, characterized in that in step 2, the relevant collision avoidance parameters include the time to closest point of approach TCPA, the distance at closest point of approach DCPA, the safe distance of approach SDA, the close-quarters situation distance CQS, the immediate danger distance IMD, the speed of relative motion VR and the direction of relative motion AR;

whether a dangerous situation will arise is judged as follows: when TCPA > 0 and DCPA < SDA, a collision risk will arise.

4. The intelligent ship collision avoidance method based on reinforcement learning according to claim 1, characterized in that in step 3, learning the collision avoidance strategy by reinforcement learning specifically comprises the following sub-steps:

Step 4.2: Feed all the parameters into the reinforcement learning DQN to train on the data, continuously updating the Q-value function until it converges, to obtain the best model;

Step 4.4: Output the rudder angle the own ship must turn.

5. The intelligent ship collision avoidance method based on reinforcement learning according to any one of claims 1-4, characterized in that in step 4, the reward value reflects minimum track deviation, shortest avoidance time, shortest avoidance path and minimum avoidance amplitude; the quality of a strategy depends on the cumulative reward obtained by executing it over the long term, and during training the strategy is continuously optimized over a number of iterations until the Q value representing the reward converges to its maximum.