CN112540614B - Unmanned ship track control method based on deep reinforcement learning - Google Patents

Unmanned ship track control method based on deep reinforcement learning Download PDF

Info

Publication number
CN112540614B
Authority
CN
China
Prior art keywords
unmanned
reward
unmanned ship
network
state
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011353012.4A
Other languages
Chinese (zh)
Other versions
CN112540614A (en)
Inventor
仲伟波
李浩东
冯友兵
常琦
许强
林伟
孙彬
胡智威
齐国庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University of Science and Technology
Original Assignee
Jiangsu University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University of Science and Technology filed Critical Jiangsu University of Science and Technology
Priority to CN202011353012.4A priority Critical patent/CN112540614B/en
Publication of CN112540614A publication Critical patent/CN112540614A/en
Application granted granted Critical
Publication of CN112540614B publication Critical patent/CN112540614B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/02Control of position or course in two dimensions
    • G05D1/0206Control of position or course in two dimensions specially adapted to water vehicles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Abstract

The invention belongs to the field of unmanned ship track control and discloses an unmanned ship track control method based on deep reinforcement learning. A deep reinforcement learning framework is designed for track control of the unmanned boat, a system with large hysteresis, and this framework enables a large-hysteresis, non-Markov system such as an unmanned boat to obtain a good training effect through deep reinforcement learning.

Description

Unmanned ship track control method based on deep reinforcement learning
Technical Field
The invention belongs to the field of unmanned ship track control, and particularly relates to an unmanned ship track control method based on deep reinforcement learning.
Background
In recent years deep neural networks have developed rapidly, and reinforcement learning combined with deep neural networks has achieved remarkable results in board games, video games, recommendation systems and other areas. Deep reinforcement learning achieves good training effects in these fields because their rules are relatively clear, the state transitions strictly satisfy the Markov property, and the factors influencing the agent are relatively few and controllable. When deep reinforcement learning is applied to an unmanned boat, the boat is influenced by many environmental factors, and the factors that must be considered differ between tasks and environments. Whether the unmanned boat can obtain sufficient and accurate environmental information is an important factor affecting the learning effect of deep reinforcement learning. Track control is the basis on which the unmanned boat completes other tasks, and applying deep reinforcement learning to track control is an important step for the unmanned boat to move from automatic control toward artificial intelligence.
Disclosure of Invention
The invention designs a deep reinforcement learning framework for track control of the unmanned boat, a system with large hysteresis; this framework enables a large-hysteresis, non-Markov system such as an unmanned boat to obtain a good training effect through deep reinforcement learning.
The invention is realized by the following technical scheme: an unmanned ship track control method based on deep reinforcement learning comprises the following steps:
Step one: initializing the network parameters of the decision network Q and the target network Q';
Step two: obtaining the current state S_t of the unmanned boat, comprising the position information and speed information at the current moment, the data of the obstacle-avoidance sensor carried by the unmanned boat, and the rudder angle position and propeller output power at the previous moment;
Step three: preprocessing the state information of the unmanned boat: to account for the large inertia of the boat, difference quantities of the length and angle information are introduced into the state information; to account for the delay of the computing board, integral (delayed) quantities of the state information are introduced into the state information;
Step four: substituting the state S_t′ into the decision network Q and obtaining an action ac and a reward r according to the policy π(ac|s);
Step five: executing the action, entering the next state S_{t+1}, and preprocessing it to obtain the state S_{t+1}′;
Step six: storing (S_t′, S_{t+1}′, ac, r), together with its sampling priority, as one piece of data in the experience pool;
Step seven: sampling m pieces of data with the sampling priority as the basis of the sampling probability and feeding them into the target network to obtain the loss function ω;
Step eight: updating the decision network Q using ω;
Step nine: if i ≥ n, updating the target network Q′ once with the parameters of the decision network Q and setting i = 0;
Step ten: checking whether the training termination condition is reached; if so, ending the training, otherwise jumping to step two.
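The ten steps above can be summarised in a short training-loop sketch. The following Python code is purely illustrative: the environment interface (env), the helper functions preprocess and dqn_loss, and the PyTorch-style network and optimizer calls are assumptions introduced for the example and are not part of the disclosed method.

```python
import random

def train(env, decision_net, target_net, replay_pool, optimizer,
          preprocess, dqn_loss, n_target_sync=100, batch_size=32, epsilon=0.1):
    """Illustrative loop for steps one to ten (all helper objects are hypothetical)."""
    target_net.load_state_dict(decision_net.state_dict())   # step one: initialise Q and Q'
    i = 0
    state = preprocess(env.reset())                          # steps two and three
    while not env.training_finished():                       # step ten
        # step four: choose an action with the decision network (epsilon-greedy policy)
        if random.random() < epsilon:
            action = env.random_action()
        else:
            action = decision_net.best_action(state)
        # step five: execute the action and preprocess the next state
        next_raw, reward, done = env.step(action)
        next_state = preprocess(next_raw)
        # step six: store the transition; new records enter at the highest sampling level
        replay_pool.add(state, next_state, reward, action)
        # step seven: sample m transitions by priority and compute the loss omega
        batch = replay_pool.sample(batch_size)
        loss = dqn_loss(decision_net, target_net, batch)
        # step eight: update the decision network Q
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        i += 1
        # step nine: copy Q into the target network Q' every n updates
        if i >= n_target_sync:
            target_net.load_state_dict(decision_net.state_dict())
            i = 0
        state = preprocess(env.reset()) if done else next_state
```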
Further, in step two, the actuation information of the rudder angle and the propeller output power at the previous moment is also used as a part of the state information.
Further, in step three, the data of the state S is preprocessed when it is input into the decision network, so that the large-hysteresis system, which does not satisfy the Markov property, can also satisfy the Markov property to a certain extent.
Further, the rewards received by the unmanned boat are specified in detail to prevent the low learning and training efficiency caused by sparse rewards.
Further, in step two, the probability with which the data used to train the neural network is sampled is dynamically adjusted, so that the newest data can be used as early as possible while all data are used evenly, improving the overall utilization of the data.
Compared with the prior art, the invention has the following beneficial effects: a deep reinforcement learning framework is designed for track control of the unmanned boat, a system with large hysteresis, and this framework enables such a large-hysteresis, non-Markov system to obtain a good training effect through deep reinforcement learning. Through difference preprocessing of the state information, the state transition of the unmanned boat conforms to the Markov property to a certain extent, and through delay preprocessing the influence of the actuation delay of the unmanned boat on the training effect is reduced adaptively. Detailed reward functions are set with track control as the main objective, the relations among the reward components are analysed, and accidents that the unmanned boat may encounter during training are taken into account in the reward design so that training is not disrupted by them.
Drawings
FIG. 1 is a block diagram of an algorithm flow of an unmanned ship track control method based on deep reinforcement learning according to the present invention;
FIG. 2 is a data flow diagram of the unmanned surface vehicle track control method based on deep reinforcement learning;
fig. 3 is a diagram of unmanned ship hardware distribution and connection of the unmanned ship track control method based on deep reinforcement learning according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
Moreover, the technical solutions in the embodiments of the present invention may be combined with each other, provided that the combination can be realized by a person skilled in the art; when technical solutions are contradictory or cannot be realized, the combination should be considered not to exist and does not fall within the protection scope of the present invention.
The specific implementation process is explained with reference to the drawings:
(1) Network parameter initialization: if training is performed for the first time, the weight parameters of the networks are initialized randomly; otherwise the networks are initialized with the parameters saved at the end of the previous training. The parameter i counts the updates of the decision (evaluation) network, and the target network is updated once after the decision network has been updated n times. The unmanned boat samples the environmental data at an interval of T seconds, updates the decision network once every T seconds, and updates the target network once every n·T seconds.
(2) The acquired current state information includes the position of the unmanned boat. Since the position of the target track point is known, a coordinate system is established with the current target track point as the origin and the target track direction as the positive x axis, so the coordinates of the unmanned boat can be calculated as G_t = (x_t, y_t). The angle from the current target track direction to the next target track direction is Δθ_t, with -180° < Δθ_t ≤ 180°. The data of the obstacle-avoidance sensor of the unmanned boat is D_t.
Because the computation on the computing board is time-consuming, the response delay of the motors is not negligible; moreover, the influence of the rudder angle and the propeller power output on the state of the unmanned boat is continuous, and the action at the previous moment influences the state at the next moment, so this information needs to be included in the state. The propeller output power of the unmanned boat is Pu_t and the rudder angle output is Ang_t, so the total current power output is F_t = (Pu_t, Ang_t); F_{t-1} is therefore incorporated into the state information.
The finally obtained current state information is S_t = (G_t, Δθ_t, D_t, F_{t-1}).
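As an illustration only, the raw state S_t = (G_t, Δθ_t, D_t, F_{t-1}) could be flattened into a single vector as sketched below; the field names are assumptions made for the example, not the patent's own data structures.

```python
import numpy as np

def build_raw_state(pos_xy, dtheta_deg, obstacle_ranges, prev_thrust, prev_rudder_deg):
    """Assemble S_t = (G_t, delta_theta_t, D_t, F_{t-1}) as one flat float vector.

    pos_xy           -- (x_t, y_t) in the frame centred on the current track point
    dtheta_deg       -- angle to the next track direction, in (-180, 180]
    obstacle_ranges  -- readings D_t of the obstacle-avoidance sensor
    prev_thrust, prev_rudder_deg -- propeller output and rudder angle at t-1, i.e. F_{t-1}
    """
    return np.concatenate([
        np.asarray(pos_xy, dtype=np.float32),
        np.array([dtheta_deg], dtype=np.float32),
        np.asarray(obstacle_ranges, dtype=np.float32),
        np.array([prev_thrust, prev_rudder_deg], dtype=np.float32),
    ])
```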
(3) The difference and integral quantities of the state information are introduced to eliminate the influence of the large hysteresis of the unmanned boat. In practice the data are sampled discretely on the time axis, so in the discrete system the differential and integral quantities are replaced by difference and delay quantities. The state information before preprocessing is S_t, and the preprocessed state information is S_t′.
The difference quantity is used to eliminate the influence of the large inertia of the unmanned boat: the rudder angle and the propeller action directly influence the acceleration of the boat. The change of the boat's speed conforms to the Markov property, whereas the change of its position does not: the position at the next moment is influenced not only by the current output power but also by the current speed, so the speed is also listed as part of the state information. The information of the unmanned boat related to distance or heading, together with its difference, should be introduced into the state information.
The delay quantity is used to eliminate the influence of the time difference between making a decision and the action taking effect; the delayed states of the previous λ moments are introduced into the state space. Let the actual delay be τ and let T be the sampling interval of the unmanned boat system; λ is set to satisfy λT > τ. During training, the network weights corresponding to the state information at the moment closest to the actual delay rise quickly, while the weights on state information with no corresponding effect decay quickly towards 0, because their actions and outcomes have little or no correlation. In this way the delay of the unmanned boat from decision to action is handled adaptively.
Considering that the difference quantity can be represented linearly by the delay quantities, that the influence of the delay quantities on the state transition of the unmanned boat is reflected in the weights of the neural network, and in order to simplify the deep neural network, the state preprocessing is finally simplified to:
S_t′ = (S_t, S_{t-1}, S_{t-2}, …, S_{t-λ})
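A minimal sketch of this preprocessing, assuming the raw states are fixed-length vectors: the current state is stacked with the previous λ states, with λ chosen so that λT exceeds the actual delay τ as described above.

```python
from collections import deque
import numpy as np

class StatePreprocessor:
    """Form S_t' = (S_t, S_{t-1}, ..., S_{t-lambda}) from successive raw states."""

    def __init__(self, lam, state_dim):
        self.lam = lam
        # pre-fill with zeros so S_t' has a fixed length from the first sample onwards
        self.history = deque([np.zeros(state_dim, dtype=np.float32)] * lam, maxlen=lam)

    def __call__(self, raw_state):
        raw_state = np.asarray(raw_state, dtype=np.float32)
        stacked = np.concatenate([raw_state, *self.history])  # S_t first, then S_{t-1} ... S_{t-lambda}
        self.history.appendleft(raw_state)                     # becomes S_{t-1} at the next call
        return stacked
```

For example, with a sampling interval of T = 0.5 s and an actual delay of roughly τ = 1.5 s (values assumed only for illustration), λ = 4 would satisfy λT > τ.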
(4) The system is applied to an unmanned surface vehicle with two propellers and a single rudder. The two propellers are controlled by one signal and have the same power output. To make it convenient for the control board to control the unmanned boat, a discrete action space is set: the propeller output thrust Pu_t is divided into 10 gears from zero thrust to maximum thrust, and the rudder angle Ang_t ranges from -60° to 60° with a resolution of 5°, giving 25 angles. The action is A_t = (Pu_t, Ang_t).
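The 250 discrete actions described above (10 thrust gears × 25 rudder angles at 5° steps) can be enumerated as in the sketch below; the normalisation of the thrust gears to the range 0 to 1 is an assumption made for the example.

```python
import numpy as np

N_THRUST_GEARS = 10
THRUSTS = np.linspace(0.0, 1.0, N_THRUST_GEARS)   # gears from zero thrust to maximum thrust
RUDDER_ANGLES = np.arange(-60, 61, 5)              # -60 deg ... 60 deg in 5 deg steps -> 25 angles

# every action A_t = (Pu_t, Ang_t); a DQN-style network outputs one index into this table
ACTION_TABLE = [(float(pu), int(ang)) for pu in THRUSTS for ang in RUDDER_ANGLES]
assert len(ACTION_TABLE) == 250
```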
(5) Reward setting: in order to achieve the training objective, the reward function is set in detail:
r = k·r_v·r_y + r_s + r_z
each component is explained separately below, where the individual letters a, b, c, d, g, h, k are all constants.
r_v is the speed reward for moving in the direction approaching the current target track point. The horizontal distance between the unmanned boat and the target track is x_t, with x_t ≥ 0. (The expressions for the approach speed and for r_v are given only as equation images in the original and are not reproduced here.)
r_y is the track-keeping reward: the more accurately the unmanned boat holds the track line, the larger the reward. The vertical distance between the unmanned boat and the target track is y_t (y_t ≥ 0). (The expression for r_y is given only as an equation image in the original.)
r_s is the position reward: the closer the unmanned boat is to the target position, i.e. the smaller its distance d_t to the target track point, the larger the reward. (The expressions for d_t and for the piecewise form of r_s are given only as equation images in the original.)
As soon as the unmanned boat comes within the range threshold d of the target track point, its current track point is updated to the next track point, so c/d_t in the above formula does not tend to infinity. However, since the unmanned boat may be very close to a track point when it starts or finishes sailing, the piecewise function above limits the maximum value of the position reward to prevent the boat from obtaining an unreasonably large reward.
r_z is the obstacle-avoidance reward: the unmanned boat obtains information about obstacles in front of it through the obstacle-avoidance sensor. Based on the magnitude of its sailing speed, a dynamic safety distance gv_d is set, and the unmanned boat receives a negative reward when its distance to an obstacle is less than this safety distance. (The expressions for the speed magnitude and for r_z are given only as equation images in the original.)
The final reward function is r = k·r_v·r_y + r_s + r_z. The terms r_v and r_y are multiplied rather than added because approaching the track point and keeping the track must be achieved simultaneously; if the two rewards were simply added, the unmanned boat could still receive an unreasonable positive reward by holding the track while no longer moving forward.
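The component definitions above appear only as equation images, so the sketch below merely wires already-computed component values into the published combination r = k·r_v·r_y + r_s + r_z; the components themselves are passed in as arguments rather than reimplemented.

```python
def total_reward(r_v, r_y, r_s, r_z, k=1.0):
    """Combine the reward components as r = k * r_v * r_y + r_s + r_z.

    r_v -- speed reward toward the current target track point
    r_y -- track-keeping reward (larger for smaller cross-track distance y_t)
    r_s -- position reward (capped by the piecewise rule described above)
    r_z -- obstacle-avoidance reward (negative inside the dynamic safety distance)
    k   -- constant weight on the coupled speed/track-keeping term
    """
    # r_v and r_y are multiplied rather than added, so the boat cannot keep
    # collecting a positive reward by holding the track while no longer moving
    # toward the track point.
    return k * r_v * r_y + r_s + r_z
```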
(6) Two deep neural networks with the same structure are set: the decision network Q and the target network Q′. The specific updating process is shown in fig. 1 and the data flow in fig. 2. After the environmental information is collected, the decision network Q selects the action to be executed by the unmanned boat. Each time the decision network is updated for an action, the error function for the update is derived from the target network Q′. The target network Q′ cannot be updated at every step, otherwise the target would keep changing, which is unfavourable for the convergence of the parameters; therefore the constant n is set, and the target network Q′ is updated once every time the decision network Q has been updated n times.
(7) Each record in the experience pool contains the preprocessed state information, the reward, the action information and the next state information, i.e. (S_t′, S_{t+1}′, R_t′, A_t). The reward is also preprocessed data; the reason for preprocessing is the same as for the state preprocessing and is not repeated.
The data actually stored by the experience pool should also include the unique number N of the piece of data, the sampling probability level P, and the number of times M the data was sampled.
Each piece of data in the experience pool therefore has the format (N, P, M, S_t′, S_{t+1}′, R_t′, A_t).
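One possible in-memory representation of such a record is sketched below; the field names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Record:
    """One experience-pool record in the format (N, P, M, S_t', S_{t+1}', R_t', A_t)."""
    N: int          # unique number of the record
    P: int          # sampling probability level (3, 2 or 1)
    M: int          # how many times the record has been sampled so far
    s: Any          # preprocessed state S_t'
    s_next: Any     # preprocessed next state S_{t+1}'
    r: float        # preprocessed reward R_t'
    a: int          # action A_t (e.g. an index into the discrete action table)
```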
(8) The data in the experience pool are divided into three sampling levels; data with a higher sampling level is sampled with a higher probability.
Newly stored data initially has sampling level three, to ensure that the newest data put into the experience pool is used as soon as possible. After data with sampling level three has been sampled three times, its level drops to two; after data with sampling level two has been sampled five times, its level drops to one. Ten pieces of data are sampled for each update. This arrangement keeps most of the data in the experience pool at level one. The sampling levels improve the efficiency of data use and accelerate convergence.
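A sketch of this three-level sampling scheme, reusing the hypothetical Record class from (7) above. The relative weights of the three levels, and the interpretation that the five samples of a level-two record are counted after its demotion, are assumptions: the text only states that higher levels are sampled with higher probability.

```python
import random

class TieredReplayPool:
    """Experience pool with three sampling levels; new records enter at level 3."""

    LEVEL_WEIGHT = {3: 4.0, 2: 2.0, 1: 1.0}   # assumed relative sampling weights

    def __init__(self, capacity=100_000):
        self.capacity = capacity
        self.records = []
        self.next_id = 0

    def add(self, s, s_next, reward, action):
        if len(self.records) >= self.capacity:
            self.records.pop(0)                # drop the oldest record when full
        self.records.append(Record(self.next_id, 3, 0, s, s_next, reward, action))
        self.next_id += 1

    def sample(self, m=10):
        weights = [self.LEVEL_WEIGHT[rec.P] for rec in self.records]
        batch = random.choices(self.records, weights=weights, k=m)
        for rec in batch:
            rec.M += 1
            if rec.P == 3 and rec.M >= 3:      # level 3 -> level 2 after three samples
                rec.P = 2
            elif rec.P == 2 and rec.M >= 8:    # level 2 -> level 1 after five further samples
                rec.P = 1
        return batch
```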
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Claims (4)

1. An unmanned ship track control method based on deep reinforcement learning, characterized by comprising the following steps:
Step one: initializing the network parameters of the decision network Q and the target network Q';
Step two: obtaining the current state S_t of the unmanned boat, comprising the position information and speed information at the current moment, the data of the obstacle-avoidance sensor carried by the unmanned boat, and the rudder angle position and propeller output power at the previous moment;
Step three: preprocessing the state information of the unmanned boat: to account for the large inertia of the boat, difference quantities of the length and angle information are introduced into the state information; to account for the hysteresis of the boat, integral (delayed) quantities of the state information are introduced to form the state S_t′, where S_t′ = (S_t, S_{t-1}, S_{t-2}, …, S_{t-λ});
Step four: will be state S' t Substituting into the decision network Q and obtaining the action ac and the reward r according to the strategy pi (ac | s),
the reward function is:
r = k·r_v·r_y + r_s + r_z
wherein: r_v is the speed reward for moving in the direction approaching the current target track point; the horizontal distance between the unmanned boat and the target track is x_t, with x_t ≥ 0 (the expressions for the approach speed and for r_v are given only as equation images in the original);
r_y is the track-keeping reward: the more accurately the unmanned boat holds the track line, the larger the reward; the vertical distance between the unmanned boat and the target track is y_t, with y_t ≥ 0 (the expression for r_y is given only as an equation image in the original);
r_s is the position reward: the closer the unmanned boat is to the target position, i.e. the smaller its distance to the target track point, the larger the reward (the expressions for the distance and for the piecewise form of r_s are given only as equation images in the original);
the current track point of the unmanned boat is updated to the next track point as soon as the unmanned boat comes within the range threshold d of the target track point;
r_z is the obstacle-avoidance reward: the unmanned boat obtains information about obstacles in front of it through the obstacle-avoidance sensor; based on the magnitude of its sailing speed, a dynamic safety distance gv_d is set, and the unmanned boat receives a negative reward when its distance to an obstacle is less than this safety distance (the expressions for the speed magnitude and for r_z are given only as equation images in the original);
in the above formula, the letters a, b, c, d, g, h and k are constants;
Step five: executing the action, entering the next state S_{t+1}, and preprocessing it to obtain the state S_{t+1}′;
Step six: will (S) t ′,S′ t+1 Ac, r) as a piece of data together with the sampling priority is stored in an experience pool;
Step seven: sampling m pieces of data with the sampling priority as the basis of the sampling probability and feeding them into the target network to obtain the loss function ω;
Step eight: updating the decision network Q using the loss function ω;
Step nine: if i ≥ n, updating the target network Q′ once with the parameters of the decision network Q and setting i = 0, where i is the number of updates of the decision network Q and n is a preset constant;
Step ten: checking whether the training termination condition is reached; if so, ending the training, otherwise jumping to step two.
2. The unmanned ship track control method based on deep reinforcement learning of claim 1, characterized in that: in step two, the actuation information of the rudder angle and the propeller output power at the previous moment is also used as a part of the current state information.
3. The unmanned ship track control method based on deep reinforcement learning of claim 1, characterized in that: in step three, the state S_t′ is input into the state-action value function network, so that the large-hysteresis system, which does not satisfy the Markov property, can also satisfy the Markov property to a certain extent.
4. The unmanned ship track control method based on deep reinforcement learning of claim 1, characterized in that: in step two, the probability with which the data used to train the neural network is sampled is dynamically adjusted, so that the newest data can be used as early as possible while all data are used evenly.
CN202011353012.4A 2020-11-26 2020-11-26 Unmanned ship track control method based on deep reinforcement learning Active CN112540614B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011353012.4A CN112540614B (en) 2020-11-26 2020-11-26 Unmanned ship track control method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011353012.4A CN112540614B (en) 2020-11-26 2020-11-26 Unmanned ship track control method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN112540614A CN112540614A (en) 2021-03-23
CN112540614B true CN112540614B (en) 2022-10-25

Family

ID=75016863

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011353012.4A Active CN112540614B (en) 2020-11-26 2020-11-26 Unmanned ship track control method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN112540614B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114839884B (en) * 2022-07-05 2022-09-30 山东大学 Underwater vehicle bottom layer control method and system based on deep reinforcement learning
CN115657683B (en) * 2022-11-14 2023-05-02 中国电子科技集团公司第十研究所 Unmanned cable-free submersible real-time obstacle avoidance method capable of being used for inspection operation task

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109765916A (en) * 2019-03-26 2019-05-17 武汉欣海远航科技研发有限公司 A kind of unmanned surface vehicle path following control device design method
CN110109355A (en) * 2019-04-29 2019-08-09 山东科技大学 A kind of unmanned boat unusual service condition self-healing control method based on intensified learning
CN110658829A (en) * 2019-10-30 2020-01-07 武汉理工大学 Intelligent collision avoidance method for unmanned surface vehicle based on deep reinforcement learning
WO2020056299A1 (en) * 2018-09-14 2020-03-19 Google Llc Deep reinforcement learning-based techniques for end to end robot navigation
CN111880535A (en) * 2020-07-23 2020-11-03 上海交通大学 Unmanned ship hybrid sensing autonomous obstacle avoidance method and system based on reinforcement learning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020056299A1 (en) * 2018-09-14 2020-03-19 Google Llc Deep reinforcement learning-based techniques for end to end robot navigation
CN109765916A (en) * 2019-03-26 2019-05-17 武汉欣海远航科技研发有限公司 A kind of unmanned surface vehicle path following control device design method
CN110109355A (en) * 2019-04-29 2019-08-09 山东科技大学 A kind of unmanned boat unusual service condition self-healing control method based on intensified learning
CN110658829A (en) * 2019-10-30 2020-01-07 武汉理工大学 Intelligent collision avoidance method for unmanned surface vehicle based on deep reinforcement learning
CN111880535A (en) * 2020-07-23 2020-11-03 上海交通大学 Unmanned ship hybrid sensing autonomous obstacle avoidance method and system based on reinforcement learning

Also Published As

Publication number Publication date
CN112540614A (en) 2021-03-23

Similar Documents

Publication Publication Date Title
CN111694365B (en) Unmanned ship formation path tracking method based on deep reinforcement learning
CN111061277B (en) Unmanned vehicle global path planning method and device
CN108820157B (en) Intelligent ship collision avoidance method based on reinforcement learning
CN111667513A (en) Unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning
CN113095481B (en) Air combat maneuver method based on parallel self-game
CN112540614B (en) Unmanned ship track control method based on deep reinforcement learning
CN111483468B (en) Unmanned vehicle lane change decision-making method and system based on confrontation and imitation learning
CN112100917B (en) Expert countermeasure system-based intelligent ship collision avoidance simulation test system and method
CN110658829A (en) Intelligent collision avoidance method for unmanned surface vehicle based on deep reinforcement learning
CN112286218B (en) Aircraft large-attack-angle rock-and-roll suppression method based on depth certainty strategy gradient
CN111260027B (en) Intelligent agent automatic decision-making method based on reinforcement learning
CN112433525A (en) Mobile robot navigation method based on simulation learning and deep reinforcement learning
CN112180950B (en) Intelligent ship autonomous collision avoidance and path planning method based on reinforcement learning
CN109145451B (en) Motion behavior identification and track estimation method for high-speed gliding aircraft
CN115033022A (en) DDPG unmanned aerial vehicle landing method based on expert experience and oriented to mobile platform
CN114967721B (en) Unmanned aerial vehicle self-service path planning and obstacle avoidance strategy method based on DQ-CapsNet
CN113110546A (en) Unmanned aerial vehicle autonomous flight control method based on offline reinforcement learning
CN114859910A (en) Unmanned ship path following system and method based on deep reinforcement learning
CN113268074A (en) Unmanned aerial vehicle flight path planning method based on joint optimization
CN114089776B (en) Unmanned aerial vehicle obstacle avoidance method based on deep reinforcement learning
CN112651374A (en) Future trajectory prediction method based on social information and automatic driving system
CN114371729B (en) Unmanned aerial vehicle air combat maneuver decision method based on distance-first experience playback
CN115373415A (en) Unmanned aerial vehicle intelligent navigation method based on deep reinforcement learning
CN115933712A (en) Bionic fish leader-follower formation control method based on deep reinforcement learning
CN114997048A (en) Automatic driving vehicle lane keeping method based on TD3 algorithm improved by exploration strategy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20210323

Assignee: CSIC PRIDE (NANJING) ATMOSPHERE MARINE INFORMATION SYSTEM Co.,Ltd.

Assignor: JIANGSU University OF SCIENCE AND TECHNOLOGY

Contract record no.: X2022320000094

Denomination of invention: A path control method for unmanned craft based on deep reinforcement learning

License type: Common License

Record date: 20220609

GR01 Patent grant