CN108803321A - Autonomous underwater vehicle trajectory tracking control method based on deep reinforcement learning - Google Patents

Autonomous underwater vehicle trajectory tracking control method based on deep reinforcement learning Download PDF

Info

Publication number
CN108803321A
CN108803321A (application CN201810535773.8A)
Authority
CN
China
Prior art keywords
auv
network
strategy
evaluation
trajectory tracking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810535773.8A
Other languages
Chinese (zh)
Other versions
CN108803321B (en)
Inventor
宋士吉
石文杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN201810535773.8A priority Critical patent/CN108803321B/en
Publication of CN108803321A publication Critical patent/CN108803321A/en
Application granted granted Critical
Publication of CN108803321B publication Critical patent/CN108803321B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05B - CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion; electric
    • G05B13/04 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion; electric; involving the use of models or simulators
    • G05B13/042 - Adaptive control systems in which a parameter or coefficient is automatically adjusted to optimise the performance

Abstract

The present invention proposes an autonomous underwater vehicle (AUV) trajectory tracking control method based on deep reinforcement learning, belonging to the fields of deep reinforcement learning and intelligent control. The AUV trajectory tracking control problem is first defined; the Markov decision process model of the AUV trajectory tracking problem is then established; next, a hybrid policy-critic network consisting of multiple policy networks and multiple critic (evaluation) networks is constructed; finally, the target policy of AUV trajectory tracking control is solved with the constructed hybrid policy-critic network. For the multiple critic networks, the performance of each critic is assessed through a defined expected Bellman absolute error, and only the worst-performing critic is updated at each time step; for the multiple policy networks, one policy network is randomly selected at each time step and updated with the deterministic policy gradient. The final learned policy is the mean of all policy networks. The present invention is not easily affected by poor historical AUV tracking trajectories and achieves high precision.

Description

Autonomous underwater vehicle trajectory tracking control method based on deep reinforcement learning
Technical field
The invention belongs to the fields of deep reinforcement learning and intelligent control, and relates to an autonomous underwater vehicle (AUV) trajectory tracking control method based on deep reinforcement learning.
Background technology
Progress in deep-sea science depends heavily on deep-sea exploration technology and equipment. Because the deep-sea environment is complex and its conditions are extreme, deep-sea work-class autonomous underwater vehicles (AUVs) are currently the main means of replacing or assisting humans in deep-sea detection, observation and sampling. For task scenarios that humans cannot reach for on-site operation, such as marine resource exploration, seabed survey and marine charting, guaranteeing the autonomy and controllability of the AUV's underwater motion is the most basic and important functional requirement and the prerequisite for accomplishing any complex mission. However, many offshore applications of AUVs (such as trajectory tracking control and target tracking control) are extremely challenging, mainly because of three characteristics of AUV systems. First, as a multi-input multi-output system, the dynamics and kinematics model of an AUV (hereinafter referred to as the model) is complex, exhibiting strong nonlinearity, tight coupling, input or state constraints and time-varying behavior. Second, uncertainty exists in the model parameters or in the hydrodynamic environment, which makes AUV system modeling difficult. Third, most current AUVs are underactuated systems, i.e. the number of degrees of freedom exceeds the number of independent actuators (each independent actuator corresponds to one degree of freedom). In general, the model and parameters of an AUV are determined by a combination of mathematical-physical reasoning, numerical simulation and full-scale experiments, and the uncertain parts of the model are characterized as well as possible. The complexity of the model makes the AUV control problem extremely difficult. Moreover, as AUV application scenarios keep expanding, higher requirements are placed on the precision and stability of AUV motion control, and improving the control performance of AUVs in various motion scenarios has become an important research direction.
Over the past decades, researchers have designed and validated various AUV motion control methods for different application scenarios such as trajectory tracking, waypoint tracking, path planning and formation control. A representative example is the model-based output feedback control method proposed by Refsnes et al., which uses two decoupled system models: a three-degree-of-freedom current-induced hull model that captures the ocean current load, and a five-degree-of-freedom model that describes the system dynamics. In addition, Healey et al. designed a tracking control method based on state feedback, which assumes a fixed propulsion speed, linearizes the system model, and uses three decoupled models: a surge model, a horizontal steering model (sway and yaw) and a vertical model (heave and pitch). However, all of these methods decouple or linearize the system model, so it is difficult for them to meet the high-precision control requirements of an AUV in specific application scenarios.
Because of the limitations of the above classical motion control methods and the powerful self-learning capability of reinforcement learning, researchers have in recent years shown great interest in intelligent control methods represented by reinforcement learning. Various intelligent control methods based on reinforcement learning techniques (such as Q-learning, direct policy search, actor-critic networks and adaptive reinforcement learning) have continuously been proposed and successfully applied to different complex applications, such as robot motion control, unmanned aerial vehicle flight control, hypersonic vehicle tracking control and traffic signal light control. The core idea of reinforcement-learning-based control is to optimize the performance of the control system without prior knowledge. For AUV systems, many researchers have designed reinforcement-learning-based control methods and verified their feasibility in practice. For the autonomous underwater cable tracking control problem, El-Fakdi et al. used direct policy search to learn the state/action mapping, but this method is only applicable when both the state space and the action space are discrete. For continuous action spaces, Paula et al. used a radial basis function network to approximate the policy function; however, because the function approximation capability of radial basis function networks is weak, this method cannot guarantee high tracking control accuracy.
In recent years, with the development of deep neural network (DNN) training techniques such as batch learning, experience replay and batch normalization, deep reinforcement learning has shown excellent performance in complex tasks such as robot motion control, autonomous ground vehicle motion control, quadrotor control and automated driving. In particular, the recently proposed deep Q-network (DQN) has demonstrated human-level control performance in many extremely challenging tasks. However, DQN cannot handle problems that have both a high-dimensional state space and a continuous action space. Building on DQN, the deep deterministic policy gradient (DDPG) algorithm was further proposed to realize continuous control. However, DDPG estimates the target value of the critic network with a target critic network, so the critic network cannot effectively evaluate the policy learned by the policy network, and the learned action-value function has a large variance. Consequently, when DDPG is applied to the AUV trajectory tracking control problem, it cannot satisfy the requirements of high tracking control accuracy and stable learning.
Summary of the invention
The purpose of the present invention is to propose an AUV trajectory tracking control method based on deep reinforcement learning. The method adopts a hybrid policy-critic network structure and uses multi pseudo Q-learning and the deterministic policy gradient to train the critic networks and the policy networks, respectively. It overcomes the problems of previous reinforcement-learning-based methods, such as low control accuracy, inability to realize continuous control and unstable learning, and achieves high-precision AUV trajectory tracking control with stable learning.
To achieve the above goal, the present invention adopts the following technical solution:
An autonomous underwater vehicle trajectory tracking control method based on deep reinforcement learning, comprising the following steps:
1) Defining the autonomous underwater vehicle (AUV) trajectory tracking control problem
Defining the AUV trajectory tracking control problem comprises four parts: determining the AUV system input, determining the AUV system output, defining the trajectory tracking control error, and establishing the AUV trajectory tracking control objective. The specific steps are as follows:
1-1) Determining the AUV system input
Let the AUV system input vector be τ_k=[ξ_k, δ_k]^T, where ξ_k and δ_k are the propeller thrust and the rudder angle of the AUV respectively, and the subscript k denotes the k-th time step; the admissible ranges of ξ_k and δ_k are bounded by the maximum propeller thrust and the maximum rudder angle, respectively;
1-2) Determining the AUV system output
Let the AUV system output vector be η_k=[x_k, y_k, ψ_k]^T, where x_k and y_k are the coordinates of the AUV along the X and Y axes of the inertial coordinate frame I-XYZ at the k-th time step, and ψ_k is the angle between the AUV heading direction and the X axis at the k-th time step;
1-3) Defining the trajectory tracking control error
A reference trajectory d_k is chosen according to the desired path of the AUV; the AUV trajectory tracking control error of the k-th time step is defined as:
1-4) Establishing the AUV trajectory tracking control objective
For the reference trajectory d_k in step 1-3), an objective function of the following form is selected:
where γ is the discount factor and H is a weight matrix;
The objective of AUV trajectory tracking control is to find an optimal system input sequence τ* that minimizes the objective function P_0(τ) of the initial time, computed as follows:
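The formula images of the original publication are not reproduced in this text. As a hedged reconstruction consistent with the definitions above (the patent's exact expressions may differ), the tracking error, the discounted quadratic objective and the optimal input sequence can be written as:

e_k = \eta_k - d_k, \qquad
P_k(\tau) = \sum_{i=k}^{K} \gamma^{\,i-k}\, e_i^{\top} H\, e_i, \qquad
\tau^{*} = \arg\min_{\tau} P_0(\tau)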
2) Establishing the Markov decision process model of the AUV trajectory tracking problem
A Markov decision process model is built for the AUV trajectory tracking problem of step 1); the specific steps are as follows:
2-1) Defining the state vector
The velocity vector of the AUV system is defined as φ_k=[u_k, v_k, χ_k]^T, where u_k and v_k are the linear velocities of the AUV along the heading direction and perpendicular to the heading direction at the k-th time step, respectively, and χ_k is the angular velocity of the AUV heading at the k-th time step;
According to the AUV system output vector η_k determined in step 1-2) and the reference trajectory defined in step 1-3), the state vector of the k-th time step is defined as follows:
2-2) Defining the action vector
The action vector of the k-th time step is defined as the AUV system input vector of that time step, i.e. a_k=τ_k;
2-3) Defining the reward function
The reward function of the k-th time step characterizes the effect of taking action a_k in state s_k. According to the trajectory tracking control error e_k defined in step 1-3) and the action vector a_k defined in step 2-2), the AUV reward function of the k-th time step is defined as follows:
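The reward formula itself is likewise not reproduced here. One plausible form, mirroring the objective of step 1-4) (an assumption for illustration, and possibly including an additional penalty on the control effort a_k), is:

r_{k+1} = -\, e_k^{\top} H\, e_k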
2-4) Converting the AUV trajectory tracking control objective τ* established in step 1-4) into the AUV trajectory tracking control objective under the reinforcement learning framework
A policy π is defined as the probability of selecting each possible action in a given state; the action-value function is then defined as follows:
where the expectation is taken over the reward function, the states and the actions; K is the maximum time step;
The action-value function describes the expected cumulative discounted reward obtained by following policy π from the current state onward. Therefore, under the reinforcement learning framework, the AUV trajectory tracking control objective is to learn, through interaction with the environment in which the AUV operates, an optimal target policy π* that maximizes the action value of the initial time, computed as follows:
where p(s_0) is the distribution of the initial state s_0 and a_0 is the initial action vector;
The solution of the AUV trajectory tracking control objective τ* established in step 1-4) is thus converted into the solution of π*;
2-5) Simplifying the AUV trajectory tracking control objective under the reinforcement learning framework
The action-value function of step 2-4) is solved through the following iterative Bellman equation:
If the policy π is deterministic, i.e. there is a one-to-one mapping from the AUV state vector space to the AUV action vector space, denoted μ, the above iterative Bellman equation reduces to:
For the deterministic policy μ, the optimal target policy π* of step 2-4) reduces to the deterministic optimal target policy μ*;
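In standard reinforcement learning notation, the action-value function of step 2-4), its iterative Bellman equation, and the deterministic simplification of step 2-5) take the following form; this is a hedged reconstruction of the omitted formulas, not necessarily the patent's exact expressions:

Q^{\pi}(s_k, a_k) = \mathbb{E}\Big[\sum_{i=k}^{K} \gamma^{\,i-k}\, r_{i+1} \,\Big|\, s_k, a_k\Big], \qquad
Q^{\pi}(s_k, a_k) = \mathbb{E}\big[r_{k+1} + \gamma\, \mathbb{E}_{a_{k+1}\sim\pi}\, Q^{\pi}(s_{k+1}, a_{k+1})\big], \qquad
Q^{\mu}(s_k, a_k) = \mathbb{E}\big[r_{k+1} + \gamma\, Q^{\mu}\big(s_{k+1}, \mu(s_{k+1})\big)\big]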
3) Constructing the hybrid policy-critic network
The deterministic optimal target policy μ* and the corresponding optimal action-value function are estimated by constructing a hybrid policy-critic network. Constructing the hybrid policy-critic network comprises three parts: constructing the policy networks, constructing the critic networks, and determining the target policy. The specific steps are as follows:
3-1) Constructing the policy networks
The hybrid policy-critic network structure estimates the deterministic optimal target policy μ* by constructing n policy networks, where θ_p denotes the weight parameters of the p-th policy network, p=1,…,n. Each policy network is realized by a fully connected deep neural network consisting of an input layer, two hidden layers and an output layer; the input of each policy network is the state vector s_k and the output is the action vector a_k;
3-2) Constructing the critic networks
The hybrid policy-critic network structure estimates the optimal action-value function by constructing m critic (evaluation) networks, where w_q denotes the weight parameters of the q-th critic network, q=1,…,m. Each critic network is realized by a fully connected deep neural network consisting of an input layer, two hidden layers and an output layer. The inputs of each critic network are the state vector s_k and the action vector a_k; the state vector s_k enters the network at the input layer, while the action vector a_k enters at the first hidden layer. The output of each critic network is the value of taking action a_k in state s_k;
3-3) Determining the target policy
According to the constructed hybrid policy-critic network, the target policy μ_f(s_k) of AUV trajectory tracking control learned at the k-th time step is defined as the mean of the outputs of the n policy networks, computed as follows:
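Written out, with μ(s_k|θ_p) denoting the output of the p-th policy network (notation added here for clarity), the averaging stated above is:

\mu_f(s_k) = \frac{1}{n} \sum_{p=1}^{n} \mu(s_k \mid \theta_p)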
4) Solving the target policy μ_f(s_k) of AUV trajectory tracking control; the specific steps are as follows:
4-1) Parameter setting
Set the maximum number of iterations M, the maximum number of time steps K of each iteration, the mini-batch size N drawn by experience replay, the learning rate α_ω of each critic network, the learning rate α_θ of each policy network, the discount factor γ, and the weight matrix H of the reward function;
4-2) Initializing the hybrid policy-critic network
Randomly initialize the weight parameters θ_p and w_q of the n policy networks and the m critic networks; randomly select one policy network from the n policy networks and denote it as the d-th policy network, d=1,…,n;
Construct an experience replay buffer R with maximum capacity B, initialized to be empty;
4-3) The iterations start and the hybrid policy-critic network is trained; the iteration counter is initialized as episode=1;
4-4) Set the current time step k=0, randomly initialize the AUV state variable s_0, and let the state variable of the current time step be s_k=s_0; generate an exploration noise Noise_k;
4-5) Determine the action vector a_k of the current time step from the n current policy networks and the exploration noise Noise_k as:
4-6) The AUV executes action a_k in the current state s_k, receives the reward r_{k+1} according to step 2-3), and observes a new state s_{k+1}; e_k=(s_k, a_k, r_{k+1}, s_{k+1}) is recorded as an experience sample. If the number of samples in the replay buffer R has reached the maximum capacity B, the sample added earliest is deleted first and the experience sample e_k is then stored in R; otherwise the experience sample e_k is stored in R directly;
A experience samples are selected from the replay buffer R as follows: when the number of samples in R does not exceed N, all experience samples currently in R are selected; when the number of samples in R exceeds N, N experience samples (s_l, a_l, r_{l+1}, s_{l+1}) are randomly drawn from R;
4-7) The expected Bellman absolute error EBAE_q of each critic network is computed from the A selected experience samples to characterize the performance of each critic network; the formula is as follows:
The critic network with the worst performance is selected; its index is obtained through the following formula and denoted c:
4-8) For each experience sample, the action vector of the next time step is obtained from the c-th critic network through the following greedy policy:
4-9) The target value of the c-th critic network is computed by multi pseudo Q-learning; the formula is as follows:
4-10) The loss function L(w_c) of the c-th critic network is computed; the formula is as follows:
4-11) The weight parameters of the c-th critic network are updated through the derivative of the loss function L(w_c) with respect to the weight parameters w_c; the formula is as follows:
The weight parameters of the remaining critic networks remain unchanged;
4-12) A policy network is randomly selected from the n policy networks to reset the d-th policy network;
4-13) The deterministic policy gradient of the d-th policy network is computed from the updated c-th critic network, and the weight parameters θ_d of the d-th policy network are updated accordingly; the respective formulas are as follows:
The weight parameters of the remaining policy networks remain unchanged;
4-14) Let k=k+1 and test k: if k<K, return to step 4-5) and the AUV continues to track the reference trajectory; otherwise, go to step 4-15);
4-15) Let episode=episode+1 and test episode: if episode<M, return to step 4-4) and the AUV performs the next iteration; otherwise, go to step 4-16);
4-16) The iterations end and the training of the hybrid policy-critic network terminates; the outputs of the n policy networks at the end of the iterations are combined through the formula of step 3-3) to obtain the final target policy μ_f(s_k) of AUV trajectory tracking control, and trajectory tracking control of the AUV is realized with this target policy.
Features and beneficial effects of the present invention:
The method proposed by the present invention uses multiple policy networks and multiple critic networks. For the multiple critic networks, the performance of each critic is assessed through a defined expected Bellman absolute error, and only the worst-performing critic is updated at each time step. Unlike existing reinforcement-learning-based control methods, the present invention proposes multi pseudo Q-learning to compute more accurate critic target values; this can resolve the overestimation problem of the action-value function and can stabilize the learning process without using target critic networks. For the multiple policy networks, one policy network is randomly selected at each time step and updated with the deterministic policy gradient. The final learned policy is the mean of all policy networks.
1) The AUV trajectory tracking control method proposed by the present invention does not depend on a model; it autonomously learns, from the data sampled by the AUV during motion, a target policy that achieves the optimal control objective. The method requires no assumptions about the AUV model, is particularly suitable for AUVs working in complex deep-sea environments, and has high practical application value.
2) The method of the present invention uses multi pseudo Q-learning to obtain more accurate critic target values than existing methods, which both reduces the variance of the action-value function approximated by the critic networks and resolves the overestimation problem of the action-value function, thereby obtaining a better target policy and realizing high-precision AUV trajectory tracking control.
3) The method of the present invention decides which critic network to update at each time step based on the expected Bellman absolute error; this update rule weakens the influence of poor critic networks and thus guarantees fast convergence of the learning process.
4) Because multiple critic networks are used, the learning process of the method of the present invention is not easily affected by poor historical AUV tracking trajectories; the method is robust and the learning process is stable.
5) The method of the present invention combines reinforcement learning with deep neural networks and has a strong self-learning capability; it can realize high-precision adaptive control of an AUV in an uncertain deep-sea environment and has good application prospects in scenarios such as AUV trajectory tracking and underwater obstacle avoidance.
Description of the drawings
Fig. 1 is a performance comparison of the method proposed by the present invention and the existing DDPG method; figure (a) compares the learning curves and figure (b) compares the AUV trajectory tracking results.
Fig. 2 is a performance comparison of the method proposed by the present invention and the neural network PID method; figure (a) compares the coordinate trajectory tracking of the AUV along the X and Y directions and figure (b) compares the tracking errors of the AUV in the X and Y directions.
Detailed description of the embodiments
The autonomous underwater vehicle trajectory tracking control method based on deep reinforcement learning proposed by the present invention is further described below with reference to the drawings and specific embodiments.
The present invention proposes an autonomous underwater vehicle trajectory tracking control method based on deep reinforcement learning, which mainly comprises four parts: defining the AUV trajectory tracking control problem, establishing the Markov decision process model of the AUV trajectory tracking problem, constructing the hybrid policy-critic network structure, and solving the target policy of AUV trajectory tracking control.
1) Defining the AUV trajectory tracking control problem
Defining the AUV trajectory tracking control problem comprises four parts: determining the AUV system input, determining the AUV system output, defining the trajectory tracking control error, and establishing the AUV trajectory tracking control objective. The specific steps are as follows:
1-1) Determining the AUV system input
Let the AUV system input vector be τ_k=[ξ_k, δ_k]^T, where ξ_k and δ_k are the propeller thrust and the rudder angle of the AUV respectively, and the subscript k denotes the k-th time step, i.e. the value at time kt, where t is the time step length (the same applies below); the admissible ranges of ξ_k and δ_k are bounded by the maximum propeller thrust and the maximum rudder angle, which are determined according to the type of propeller used by the AUV.
1-2) Determining the AUV system output
Let the AUV system output vector be η_k=[x_k, y_k, ψ_k]^T, where x_k and y_k are the coordinates of the AUV along the X and Y axes of the inertial coordinate frame I-XYZ at the k-th time step, and ψ_k is the angle between the AUV heading direction and the X axis at the k-th time step.
1-3) Defining the trajectory tracking control error
A reference trajectory d_k is chosen according to the desired path of the AUV; the AUV trajectory tracking control error of the k-th time step is defined as:
1-4) Establishing the AUV trajectory tracking control objective
For the reference trajectory d_k in step 1-3), an objective function of the following form is selected:
where γ is the discount factor and H is a weight matrix;
The objective of AUV trajectory tracking control is to find an optimal system input sequence τ* that minimizes the objective function P_0(τ) of the initial time, computed as follows:
2) Establishing the Markov decision process model of the AUV trajectory tracking problem
The Markov decision process (MDP) is the foundation of reinforcement learning theory, so an MDP model must be built for the AUV trajectory tracking problem of step 1). The essential elements of reinforcement learning are the agent, the environment, the states, the actions and the reward function; the goal of the agent is to learn, through interaction with the environment in which the AUV operates, an optimal sequence of actions (or control inputs) that maximizes the cumulative reward (or, equivalently, minimizes the cumulative tracking control error), thereby solving the AUV trajectory tracking objective. The specific steps are as follows:
2-1) Defining the state vector
The velocity vector of the AUV system is defined as φ_k=[u_k, v_k, χ_k]^T, where u_k and v_k are the linear velocities of the AUV along the heading direction and perpendicular to the heading direction at the k-th time step, respectively, and χ_k is the angular velocity of the AUV heading at the k-th time step.
According to the AUV system output vector η_k determined in step 1-2) and the reference trajectory defined in step 1-3), the state vector of the k-th time step is defined as follows:
2-2) Defining the action vector
The action vector of the k-th time step is defined as the AUV system input vector of that time step, i.e.: a_k=τ_k.
2-3) Defining the reward function
The reward function of the k-th time step characterizes the effect of taking action a_k in state s_k. According to the trajectory tracking control error e_k defined in step 1-3) and the action vector a_k defined in step 2-2), the AUV reward function of the k-th time step is defined as follows:
2-4) Converting the AUV trajectory tracking control objective τ* established in step 1-4) into the AUV trajectory tracking control objective under the reinforcement learning framework
A policy π is defined as the probability of selecting each possible action in a given state; the action-value function is then defined as follows:
where the expectation is taken over the reward function, the states and the actions (the same applies below); K is the maximum time step;
The action-value function describes the expected cumulative discounted reward obtained by following policy π from the current state onward. Therefore, under the reinforcement learning framework, the AUV trajectory tracking control objective (i.e. the objective of the agent) is to learn, through interaction with the environment in which the AUV operates, an optimal target policy π* that maximizes the action value of the initial time, i.e.:
where p(s_0) is the distribution of the initial state s_0 and a_0 is the initial action vector.
Therefore, the solution of the AUV trajectory tracking control objective τ* established in step 1-4) can be converted into the solution of π*.
2-5) Simplifying the AUV trajectory tracking control objective under the reinforcement learning framework
Similar to dynamic programming, many reinforcement learning methods solve the action-value function of step 2-4) using the following iterative Bellman equation:
Assuming that the policy π is deterministic, i.e. there is a one-to-one mapping from the AUV state vector space to the AUV action vector space, denoted μ, the above iterative Bellman equation can be reduced to:
In addition, for the deterministic policy μ, the optimal target policy π* of step 2-4) reduces to the deterministic optimal target policy μ*.
3) Constructing the hybrid policy-critic network
From step 2-5) it follows that the core of solving the AUV trajectory tracking problem with reinforcement learning is how to solve the deterministic optimal target policy μ* and the corresponding optimal action-value function. The method of the present invention uses a hybrid policy-critic network to estimate μ* and the optimal action-value function, respectively. Constructing the hybrid policy-critic network comprises three parts: constructing the policy networks, constructing the critic networks, and determining the target policy. The specific steps are as follows:
3-1) Constructing the policy networks
The hybrid policy-critic network structure estimates the deterministic optimal target policy μ* by constructing n policy networks (n is chosen to balance the tracking control accuracy of the algorithm of the present invention against the network training speed; its value should be neither too large nor too small), where θ_p denotes the weight parameters of the p-th policy network, p=1,…,n. Each policy network is realized by a fully connected deep neural network consisting of an input layer, two hidden layers and an output layer; the input of each policy network is the state vector s_k, the output is the action vector a_k, and the two hidden layers contain 400 and 300 units respectively.
3-2) Constructing the critic networks
The hybrid policy-critic network structure estimates the optimal action-value function by constructing m critic networks (m is selected on the same basis as the number of policy networks described above), where w_q denotes the weight parameters of the q-th critic network, q=1,…,m. Each critic network is realized by a fully connected deep neural network consisting of an input layer, two hidden layers and an output layer, with the two hidden layers containing 400 and 300 units respectively. The inputs of each critic network are the state vector s_k and the action vector a_k; the state vector s_k enters the network at the input layer, while the action vector a_k enters at the first hidden layer. The output of each critic network is the value of taking action a_k in state s_k.
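A minimal sketch of the two network types in PyTorch, matching the layer sizes stated above (hidden layers of 400 and 300 units, with the action injected at the first hidden layer of the critic). The tanh output scaling to the actuator limits is an assumption added for illustration and is not taken from the patent.

import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    # Fully connected policy (actor) network: state -> action.
    def __init__(self, state_dim, action_dim, action_bound):
        super().__init__()
        self.fc1 = nn.Linear(state_dim, 400)
        self.fc2 = nn.Linear(400, 300)
        self.out = nn.Linear(300, action_dim)
        # Assumed scaling to the actuator limits, e.g. [xi_max, delta_max].
        self.action_bound = torch.as_tensor(action_bound, dtype=torch.float32)

    def forward(self, s):
        h = torch.relu(self.fc1(s))
        h = torch.relu(self.fc2(h))
        return torch.tanh(self.out(h)) * self.action_bound

class CriticNet(nn.Module):
    # Fully connected critic (evaluation) network: the state enters at the
    # input layer and the action is injected at the first hidden layer.
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.fc1 = nn.Linear(state_dim, 400)
        self.fc2 = nn.Linear(400 + action_dim, 300)
        self.out = nn.Linear(300, 1)

    def forward(self, s, a):
        h = torch.relu(self.fc1(s))
        h = torch.relu(self.fc2(torch.cat([h, a], dim=-1)))
        return self.out(h)  # estimated value of taking action a in state s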
3-3) Determining the target policy
According to the constructed hybrid policy-critic network, the target policy μ_f(s_k) of AUV trajectory tracking control learned at the k-th time step is defined as the mean of the outputs of the n policy networks, computed as follows:
4) Solving the target policy μ_f(s_k) of AUV trajectory tracking control; the specific steps are as follows:
4-1) Parameter setting
Set the maximum number of iterations M, the maximum number of time steps K of each iteration, the mini-batch size N drawn by experience replay, the learning rate α_ω of each critic network, the learning rate α_θ of each policy network, the discount factor γ, and the weight matrix H of the reward function. In this embodiment, M=1500, K=1000 (with time step length t=0.2 s), N=64, α_ω=0.01 for each critic network, α_θ=0.001 for each policy network, γ=0.99, and H=[0.001, 0; 0, 0.001];
4-2) Initializing the hybrid policy-critic network
Randomly initialize the weight parameters θ_p and w_q of the n policy networks and the m critic networks; randomly select one policy network from the n policy networks and denote it as the d-th policy network (d=1,…,n);
Construct an experience replay buffer R with maximum capacity B (B=10000 in this embodiment), initialized to be empty;
4-3) The iterations start and the hybrid policy-critic network is trained; the iteration counter is initialized as episode=1;
4-4) Set the current time step k=0, randomly initialize the AUV state variable s_0, and let the state variable of the current time step be s_k=s_0; generate an exploration noise Noise_k (this embodiment uses Ornstein-Uhlenbeck exploration noise);
4-5) Determine the action vector a_k of the current time step from the n current policy networks and the exploration noise Noise_k as:
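A sketch of discrete-time Ornstein-Uhlenbeck exploration noise and of one plausible way to form the exploration action of step 4-5), namely averaging the n policy outputs and adding the noise. The averaging and the parameter values theta and sigma are assumptions for illustration, since the exact formula is not reproduced above; policy_nets is assumed to be a list of callables mapping a state array to an action array.

import numpy as np

class OUNoise:
    # Discrete-time Ornstein-Uhlenbeck process, a common exploration noise
    # for deterministic-policy-gradient methods.
    def __init__(self, dim, mu=0.0, theta=0.15, sigma=0.2):
        self.mu, self.theta, self.sigma = mu, theta, sigma
        self.x = np.full(dim, mu)

    def sample(self):
        dx = self.theta * (self.mu - self.x) + self.sigma * np.random.randn(*self.x.shape)
        self.x = self.x + dx
        return self.x

def exploration_action(policy_nets, s_k, noise):
    # Assumed form: mean of the n policy outputs plus exploration noise.
    a_k = np.mean([p(s_k) for p in policy_nets], axis=0)
    return a_k + noise.sample()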
4-6) The AUV executes action a_k in the current state s_k, receives the reward r_{k+1} according to step 2-3), and observes a new state s_{k+1}; e_k=(s_k, a_k, r_{k+1}, s_{k+1}) is recorded as an experience sample. If the number of samples in the replay buffer R has reached the maximum capacity B, the sample added earliest is deleted first and the experience sample e_k is then stored in R; otherwise the experience sample e_k is stored in R directly;
A experience samples (A ≤ N) are selected from the replay buffer R as follows: when the number of samples in R does not exceed N, all experience samples currently in R are selected; when the number of samples in R exceeds N, N experience samples (s_l, a_l, r_{l+1}, s_{l+1}) are randomly drawn from R, where l is the time step at which the selected experience sample was collected;
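A minimal replay-buffer sketch matching step 4-6) and the sample selection above: bounded capacity B with first-in-first-out eviction, and a draw of A = min(N, current size) samples. Names and structure are illustrative.

import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity):              # capacity corresponds to B
        self.buffer = deque(maxlen=capacity)   # the deque drops the oldest sample automatically

    def store(self, s, a, r_next, s_next):
        self.buffer.append((s, a, r_next, s_next))

    def sample(self, batch_size):              # batch_size corresponds to N
        if len(self.buffer) <= batch_size:
            return list(self.buffer)           # A = current buffer size
        return random.sample(self.buffer, batch_size)  # A = N random samples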
4-7) The expected Bellman absolute error EBAE_q of each critic network is computed from the A selected experience samples to characterize the performance of each critic network; the formula is as follows:
The critic network with the worst performance is selected; its index is obtained through the following formula and denoted c:
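The EBAE formula is not reproduced above; the sketch below assumes one natural reading, the mean absolute one-step Bellman residual of each critic over the A sampled transitions, with the critic having the largest EBAE treated as the worst. Both the residual form and the use of the averaged policy for the next action are assumptions; critics and policy_nets are callables as in the earlier sketches.

import numpy as np

def ebae(critic, policy_nets, batch, gamma):
    # Assumed: mean absolute one-step Bellman residual of one critic over the batch.
    errors = []
    for (s, a, r_next, s_next) in batch:
        a_next = np.mean([p(s_next) for p in policy_nets], axis=0)
        residual = r_next + gamma * critic(s_next, a_next) - critic(s, a)
        errors.append(abs(float(residual)))
    return float(np.mean(errors))

def worst_critic_index(critics, policy_nets, batch, gamma):
    # The critic with the largest expected Bellman absolute error is updated (step 4-7).
    scores = [ebae(q, policy_nets, batch, gamma) for q in critics]
    return int(np.argmax(scores))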
4-8) For each experience sample, the action vector of the next time step is obtained from the c-th critic network through the following greedy policy:
4-9) The target value of the c-th critic network is computed by multi pseudo Q-learning; the formula is as follows:
4-10) The loss function L(w_c) of the c-th critic network is computed; the formula is as follows:
4-11) The weight parameters of the c-th critic network are updated through the derivative of the loss function L(w_c) with respect to the weight parameters w_c; the formula is as follows:
The weight parameters of the remaining critic networks remain unchanged;
4-12) A policy network is randomly selected from the n policy networks to reset the d-th policy network;
4-13) The deterministic policy gradient of the d-th policy network is computed from the updated c-th critic network, and the weight parameters θ_d of the d-th policy network are updated accordingly; the respective formulas are as follows:
The weight parameters of the remaining policy networks remain unchanged.
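A sketch of steps 4-8) to 4-13) under explicit assumptions, since the corresponding formulas are not reproduced here: the greedy next action is taken from the policy ensemble, the multi pseudo Q-learning target is assumed to average the m critics at that action, critic c is fitted to the target by a gradient step on a squared loss, and the randomly selected policy network d is updated with the deterministic policy gradient through critic c. These concrete forms are illustrative readings of the steps above, not the patent's literal equations; the networks are the PyTorch modules from the earlier sketch and the batch is assumed to be pre-stacked into tensors.

import torch

def update_networks(critics, policies, c, d, batch, gamma, alpha_w, alpha_theta):
    # batch: tensors s (A, state_dim), a (A, act_dim), r_next (A, 1), s_next (A, state_dim)
    s, a, r_next, s_next = batch

    # Steps 4-8)/4-9) (assumed form): greedy next action from the policy ensemble,
    # target value averaged over the m critics ("multi pseudo Q-learning" reading).
    with torch.no_grad():
        a_next = torch.mean(torch.stack([p(s_next) for p in policies]), dim=0)
        y = r_next + gamma * torch.mean(torch.stack([q(s_next, a_next) for q in critics]), dim=0)

    # Steps 4-10)/4-11): squared loss of critic c against the target, gradient step on w_c only.
    critic_opt = torch.optim.SGD(critics[c].parameters(), lr=alpha_w)
    critic_opt.zero_grad()
    loss = torch.mean((y - critics[c](s, a)) ** 2)
    loss.backward()
    critic_opt.step()

    # Step 4-13): deterministic policy gradient of policy d through the updated critic c
    # (ascending the critic value is implemented as descending its negative).
    policy_opt = torch.optim.SGD(policies[d].parameters(), lr=alpha_theta)
    policy_opt.zero_grad()
    policy_loss = -torch.mean(critics[c](s, policies[d](s)))
    policy_loss.backward()
    policy_opt.step()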
4-14) Let k=k+1 and test k: if k<K, return to step 4-5) and the AUV continues to track the reference trajectory; otherwise, go to step 4-15).
4-15) Let episode=episode+1 and test episode: if episode<M, return to step 4-4) and the AUV performs the next iteration; otherwise, go to step 4-16).
4-16) The iterations end and the training of the hybrid policy-critic network terminates; the outputs of the n policy networks at the end of the iterations are combined through the formula of step 3-3) to obtain the final target policy μ_f(s_k) of AUV trajectory tracking control, and trajectory tracking control of the AUV is realized with this target policy.
Validation of the effectiveness of the embodiment of the present invention
The performance evaluation of the AUV trajectory tracking control method based on deep reinforcement learning proposed by the present invention (hereinafter referred to as MPQ-DPG) is as follows. All comparison experiments are based on the widely used REMUS autonomous underwater vehicle, whose maximum propeller thrust and maximum rudder angle are 86 N and 0.24 rad respectively, and use the following reference trajectory:
In addition, in the embodiments of the present invention, the number of critic networks m is identical to the number of policy networks n, and both are uniformly denoted n hereinafter.
1) Comparative analysis of MPQ-DPG and the existing DDPG method
Fig. 1 compares the trajectory tracking control method based on deep reinforcement learning proposed by the present invention (MPQ-DPG) with the existing DDPG method in terms of the learning curves during training and the trajectory tracking results. The learning curves in figure (a) are obtained from five independent experiments, and Ref in figure (b) denotes the reference trajectory.
The following conclusions can be drawn from the analysis of Fig. 1:
a) Compared with the DDPG method, MPQ-DPG has better learning stability, because MPQ-DPG uses multiple critic networks and policy networks, which reduces the influence of poor samples on learning stability.
b) The average cumulative reward to which the MPQ-DPG method finally converges is clearly higher than that of the DDPG method, which indicates that the tracking control accuracy of MPQ-DPG is clearly higher than that of DDPG.
c) From Fig. 1(b) it can be observed that the tracking trajectory obtained by the MPQ-DPG method almost coincides with the reference trajectory, which indicates that MPQ-DPG can realize high-precision AUV tracking control.
d) As the numbers of policy networks and critic networks increase, the tracking control accuracy of the MPQ-DPG method gradually improves, but the improvement is no longer obvious for n>4.
2) Comparative analysis of the MPQ-DPG method and the existing neural network PID method
Fig. 2 compares the MPQ-DPG method proposed by the present invention for underwater unmanned vehicle trajectory tracking control with the neural network PID method in terms of the coordinate tracking curves and the coordinate tracking errors. In the figure, Ref denotes the reference coordinate trajectory, PIDNN denotes the neural network PID algorithm, and n=4.
Analysis of Fig. 2 shows that the tracking performance of the neural network PID control method is significantly worse than that of the MPQ-DPG method proposed by the present invention. In addition, the tracking errors in Fig. 2(b) show that the MPQ-DPG method achieves faster error convergence; in particular, in the initial stage the MPQ-DPG method still achieves fast, high-precision tracking, whereas the response time of the neural network PID method is considerably longer than that of MPQ-DPG and its tracking error converges poorly.
The above embodiment is a preferred embodiment of the present invention, but the embodiments of the present invention are not limited by the above embodiment. Any change, modification, substitution, combination or simplification made without departing from the spirit and principles of the present invention shall be regarded as an equivalent replacement and is included within the scope of protection of the present invention.

Claims (1)

1. An autonomous underwater vehicle trajectory tracking control method based on deep reinforcement learning, characterized in that the method comprises the following steps:
1) Defining the autonomous underwater vehicle (AUV) trajectory tracking control problem
Defining the AUV trajectory tracking control problem comprises four parts: determining the AUV system input, determining the AUV system output, defining the trajectory tracking control error, and establishing the AUV trajectory tracking control objective; the specific steps are as follows:
1-1) Determining the AUV system input
Let the AUV system input vector be τ_k=[ξ_k, δ_k]^T, where ξ_k and δ_k are the propeller thrust and the rudder angle of the AUV respectively, and the subscript k denotes the k-th time step; the admissible ranges of ξ_k and δ_k are bounded by the maximum propeller thrust and the maximum rudder angle, respectively;
1-2) Determining the AUV system output
Let the AUV system output vector be η_k=[x_k, y_k, ψ_k]^T, where x_k and y_k are the coordinates of the AUV along the X and Y axes of the inertial coordinate frame I-XYZ at the k-th time step, and ψ_k is the angle between the AUV heading direction and the X axis at the k-th time step;
1-3) Defining the trajectory tracking control error
A reference trajectory d_k is chosen according to the desired path of the AUV; the AUV trajectory tracking control error of the k-th time step is defined as:
1-4) Establishing the AUV trajectory tracking control objective
For the reference trajectory d_k in step 1-3), an objective function of the following form is selected:
where γ is the discount factor and H is a weight matrix;
The objective of AUV trajectory tracking control is to find an optimal system input sequence τ* that minimizes the objective function P_0(τ) of the initial time, computed as follows:
2) Establishing the Markov decision process model of the AUV trajectory tracking problem
A Markov decision process model is built for the AUV trajectory tracking problem of step 1); the specific steps are as follows:
2-1) Defining the state vector
The velocity vector of the AUV system is defined as φ_k=[u_k, v_k, χ_k]^T, where u_k and v_k are the linear velocities of the AUV along the heading direction and perpendicular to the heading direction at the k-th time step, respectively, and χ_k is the angular velocity of the AUV heading at the k-th time step;
According to the AUV system output vector η_k determined in step 1-2) and the reference trajectory defined in step 1-3), the state vector of the k-th time step is defined as follows:
2-2) Defining the action vector
The action vector of the k-th time step is defined as the AUV system input vector of that time step, i.e. a_k=τ_k;
2-3) Defining the reward function
The reward function of the k-th time step characterizes the effect of taking action a_k in state s_k; according to the trajectory tracking control error e_k defined in step 1-3) and the action vector a_k defined in step 2-2), the AUV reward function of the k-th time step is defined as follows:
2-4) Converting the AUV trajectory tracking control objective τ* established in step 1-4) into the AUV trajectory tracking control objective under the reinforcement learning framework
A policy π is defined as the probability of selecting each possible action in a given state; the action-value function is then defined as follows:
where the expectation is taken over the reward function, the states and the actions; K is the maximum time step;
The action-value function describes the expected cumulative discounted reward obtained by following policy π from the current state onward; therefore, under the reinforcement learning framework, the AUV trajectory tracking control objective is to learn, through interaction with the environment in which the AUV operates, an optimal target policy π* that maximizes the action value of the initial time, computed as follows:
where p(s_0) is the distribution of the initial state s_0 and a_0 is the initial action vector;
The solution of the AUV trajectory tracking control objective τ* established in step 1-4) is thus converted into the solution of π*;
2-5) Simplifying the AUV trajectory tracking control objective under the reinforcement learning framework
The action-value function of step 2-4) is solved through the following iterative Bellman equation:
If the policy π is deterministic, i.e. there is a one-to-one mapping from the AUV state vector space to the AUV action vector space, denoted μ, the above iterative Bellman equation is reduced to:
For the deterministic policy μ, the optimal target policy π* of step 2-4) reduces to the deterministic optimal target policy μ*;
3) Constructing the hybrid policy-critic network
The deterministic optimal target policy μ* and the corresponding optimal action-value function are estimated by constructing a hybrid policy-critic network; constructing the hybrid policy-critic network comprises three parts: constructing the policy networks, constructing the critic networks, and determining the target policy; the specific steps are as follows:
3-1) Constructing the policy networks
The hybrid policy-critic network structure estimates the deterministic optimal target policy μ* by constructing n policy networks, where θ_p denotes the weight parameters of the p-th policy network, p=1,…,n; each policy network is realized by a fully connected deep neural network consisting of an input layer, two hidden layers and an output layer; the input of each policy network is the state vector s_k and the output is the action vector a_k;
3-2) Constructing the critic networks
The hybrid policy-critic network structure estimates the optimal action-value function by constructing m critic networks, where w_q denotes the weight parameters of the q-th critic network, q=1,…,m; each critic network is realized by a fully connected deep neural network consisting of an input layer, two hidden layers and an output layer; the inputs of each critic network are the state vector s_k and the action vector a_k, the state vector s_k entering the network at the input layer and the action vector a_k entering at the first hidden layer; the output of each critic network is the value of taking action a_k in state s_k;
3-3) Determining the target policy
According to the constructed hybrid policy-critic network, the target policy μ_f(s_k) of AUV trajectory tracking control learned at the k-th time step is defined as the mean of the outputs of the n policy networks, computed as follows:
4) Solving the target policy μ_f(s_k) of AUV trajectory tracking control; the specific steps are as follows:
4-1) Parameter setting
Set the maximum number of iterations M, the maximum number of time steps K of each iteration, the mini-batch size N drawn by experience replay, the learning rate α_ω of each critic network, the learning rate α_θ of each policy network, the discount factor γ, and the weight matrix H of the reward function;
4-2) Initializing the hybrid policy-critic network
Randomly initialize the weight parameters θ_p and w_q of the n policy networks and the m critic networks; randomly select one policy network from the n policy networks and denote it as the d-th policy network, d=1,…,n;
Construct an experience replay buffer R with maximum capacity B, initialized to be empty;
4-3) The iterations start and the hybrid policy-critic network is trained; the iteration counter is initialized as episode=1;
4-4) Set the current time step k=0, randomly initialize the AUV state variable s_0, and let the state variable of the current time step be s_k=s_0; generate an exploration noise Noise_k;
4-5) Determine the action vector a_k of the current time step from the n current policy networks and the exploration noise Noise_k as:
4-6) The AUV executes action a_k in the current state s_k, receives the reward r_{k+1} according to step 2-3), and observes a new state s_{k+1}; e_k=(s_k, a_k, r_{k+1}, s_{k+1}) is recorded as an experience sample. If the number of samples in the replay buffer R has reached the maximum capacity B, the sample added earliest is deleted first and the experience sample e_k is then stored in R; otherwise the experience sample e_k is stored in R directly;
A experience samples are selected from the replay buffer R as follows: when the number of samples in R does not exceed N, all experience samples currently in R are selected; when the number of samples in R exceeds N, N experience samples (s_l, a_l, r_{l+1}, s_{l+1}) are randomly drawn from R;
4-7) The expected Bellman absolute error EBAE_q of each critic network is computed from the A selected experience samples to characterize the performance of each critic network; the formula is as follows:
The critic network with the worst performance is selected; its index is obtained through the following formula and denoted c:
4-8) For each experience sample, the action vector of the next time step is obtained from the c-th critic network through the following greedy policy:
4-9) The target value of the c-th critic network is computed by multi pseudo Q-learning; the formula is as follows:
4-10) The loss function L(w_c) of the c-th critic network is computed; the formula is as follows:
4-11) The weight parameters of the c-th critic network are updated through the derivative of the loss function L(w_c) with respect to the weight parameters w_c; the formula is as follows:
The weight parameters of the remaining critic networks remain unchanged;
4-12) A policy network is randomly selected from the n policy networks to reset the d-th policy network;
4-13) The deterministic policy gradient of the d-th policy network is computed from the updated c-th critic network, and the weight parameters θ_d of the d-th policy network are updated accordingly; the respective formulas are as follows:
The weight parameters of the remaining policy networks remain unchanged;
4-14) Let k=k+1 and test k: if k<K, return to step 4-5) and the AUV continues to track the reference trajectory; otherwise, go to step 4-15);
4-15) Let episode=episode+1 and test episode: if episode<M, return to step 4-4) and the AUV performs the next iteration; otherwise, go to step 4-16);
4-16) The iterations end and the training of the hybrid policy-critic network terminates; the outputs of the n policy networks at the end of the iterations are combined through the formula of step 3-3) to obtain the final target policy μ_f(s_k) of AUV trajectory tracking control, and trajectory tracking control of the AUV is realized with this target policy.
CN201810535773.8A 2018-05-30 2018-05-30 Autonomous underwater vehicle track tracking control method based on deep reinforcement learning Active CN108803321B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810535773.8A CN108803321B (en) 2018-05-30 2018-05-30 Autonomous underwater vehicle track tracking control method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810535773.8A CN108803321B (en) 2018-05-30 2018-05-30 Autonomous underwater vehicle track tracking control method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN108803321A true CN108803321A (en) 2018-11-13
CN108803321B CN108803321B (en) 2020-07-10

Family

ID=64089259

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810535773.8A Active CN108803321B (en) 2018-05-30 2018-05-30 Autonomous underwater vehicle track tracking control method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN108803321B (en)

Cited By (73)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109361700A (en) * 2018-12-06 2019-02-19 郑州航空工业管理学院 A kind of unmanned plane self-organizing network system adaptive recognition protocol frame
CN109696830A (en) * 2019-01-31 2019-04-30 天津大学 The reinforcement learning adaptive control method of small-sized depopulated helicopter
CN109719721A (en) * 2018-12-26 2019-05-07 北京化工大学 A kind of autonomous emergence of imitative snake search and rescue robot adaptability gait
CN109726866A (en) * 2018-12-27 2019-05-07 浙江农林大学 Unmanned boat paths planning method based on Q learning neural network
CN109765916A (en) * 2019-03-26 2019-05-17 武汉欣海远航科技研发有限公司 A kind of unmanned surface vehicle path following control device design method
CN109828463A (en) * 2019-02-18 2019-05-31 哈尔滨工程大学 A kind of adaptive wave glider bow of ocean current interference is to control method
CN109828467A (en) * 2019-03-01 2019-05-31 大连海事大学 A kind of the unmanned boat intensified learning controller architecture and design method of data-driven
CN109870162A (en) * 2019-04-04 2019-06-11 北京航空航天大学 A kind of unmanned plane during flying paths planning method based on competition deep learning network
CN109960259A (en) * 2019-02-15 2019-07-02 青岛大学 A kind of unmanned guiding vehicle paths planning method of the multiple agent intensified learning based on gradient gesture
CN110045614A (en) * 2019-05-16 2019-07-23 河海大学常州校区 A kind of traversing process automatic learning control system of strand suction ship and method based on deep learning
CN110083064A (en) * 2019-04-29 2019-08-02 辽宁石油化工大学 A kind of network optimal track control method based on non-strategy Q- study
CN110321666A (en) * 2019-08-09 2019-10-11 重庆理工大学 Multi-robots Path Planning Method based on priori knowledge Yu DQN algorithm
CN110333739A (en) * 2019-08-21 2019-10-15 哈尔滨工程大学 A kind of AUV conduct programming and method of controlling operation based on intensified learning
CN110362089A (en) * 2019-08-02 2019-10-22 大连海事大学 A method of the unmanned boat independent navigation based on deeply study and genetic algorithm
CN110428615A (en) * 2019-07-12 2019-11-08 中国科学院自动化研究所 Learn isolated intersection traffic signal control method, system, device based on deeply
CN110673602A (en) * 2019-10-24 2020-01-10 驭势科技(北京)有限公司 Reinforced learning model, vehicle automatic driving decision method and vehicle-mounted equipment
CN110716574A (en) * 2019-09-29 2020-01-21 哈尔滨工程大学 UUV real-time collision avoidance planning method based on deep Q network
CN110806759A (en) * 2019-11-12 2020-02-18 清华大学 Aircraft route tracking method based on deep reinforcement learning
CN110806756A (en) * 2019-09-10 2020-02-18 西北工业大学 Unmanned aerial vehicle autonomous guidance control method based on DDPG
CN110989576A (en) * 2019-11-14 2020-04-10 北京理工大学 Target following and dynamic obstacle avoidance control method for differential slip steering vehicle
CN111027677A (en) * 2019-12-02 2020-04-17 西安电子科技大学 Multi-maneuvering-target tracking method based on depth certainty strategy gradient DDPG
CN111061277A (en) * 2019-12-31 2020-04-24 歌尔股份有限公司 Unmanned vehicle global path planning method and device
CN111091710A (en) * 2019-12-18 2020-05-01 上海天壤智能科技有限公司 Traffic signal control method, system and medium
CN111240345A (en) * 2020-02-11 2020-06-05 哈尔滨工程大学 Underwater robot trajectory tracking method based on double BP network reinforcement learning framework
CN111310384A (en) * 2020-01-16 2020-06-19 香港中文大学(深圳) Wind field cooperative control method, terminal and computer readable storage medium
CN111580544A (en) * 2020-03-25 2020-08-25 北京航空航天大学 Unmanned aerial vehicle target tracking control method based on reinforcement learning PPO algorithm
TWI706238B (en) * 2018-12-18 2020-10-01 大陸商北京航跡科技有限公司 Systems and methods for autonomous driving
CN111736617A (en) * 2020-06-09 2020-10-02 哈尔滨工程大学 Speed observer-based benthonic underwater robot preset performance track tracking control method
CN111813143A (en) * 2020-06-09 2020-10-23 天津大学 Underwater glider intelligent control system and method based on reinforcement learning
CN111856936A (en) * 2020-07-21 2020-10-30 天津蓝鳍海洋工程有限公司 Control method for underwater high-flexibility operation platform with cable
CN112100834A (en) * 2020-09-06 2020-12-18 西北工业大学 Underwater glider attitude control method based on deep reinforcement learning
CN112132263A (en) * 2020-09-11 2020-12-25 大连理工大学 Multi-agent autonomous navigation method based on reinforcement learning
CN112148025A (en) * 2020-09-24 2020-12-29 东南大学 Unmanned aerial vehicle stability control algorithm based on integral compensation reinforcement learning
CN112162555A (en) * 2020-09-23 2021-01-01 燕山大学 Vehicle control method based on reinforcement learning control strategy in hybrid vehicle fleet
CN112179367A (en) * 2020-09-25 2021-01-05 广东海洋大学 Intelligent autonomous navigation method based on deep reinforcement learning
CN112241176A (en) * 2020-10-16 2021-01-19 哈尔滨工程大学 Path planning and obstacle avoidance control method of underwater autonomous vehicle in large-scale continuous obstacle environment
CN112462792A (en) * 2020-12-09 2021-03-09 哈尔滨工程大学 Underwater robot motion control method based on Actor-Critic algorithm
CN112506210A (en) * 2020-12-04 2021-03-16 东南大学 Unmanned aerial vehicle control method for autonomous target tracking
US10955853B2 (en) 2018-12-18 2021-03-23 Beijing Voyager Technology Co., Ltd. Systems and methods for autonomous driving
CN112558465A (en) * 2020-12-03 2021-03-26 大连海事大学 Unknown unmanned ship finite time reinforcement learning control method with input limitation
CN112698572A (en) * 2020-12-22 2021-04-23 西安交通大学 Structural vibration control method, medium and equipment based on reinforcement learning
CN112929900A (en) * 2021-01-21 2021-06-08 华侨大学 MAC protocol for realizing time domain interference alignment based on deep reinforcement learning in underwater acoustic network
CN113029123A (en) * 2021-03-02 2021-06-25 西北工业大学 Multi-AUV collaborative navigation method based on reinforcement learning
CN113052372A (en) * 2021-03-17 2021-06-29 哈尔滨工程大学 Dynamic AUV tracking path planning method based on deep reinforcement learning
CN113095463A (en) * 2021-03-31 2021-07-09 南开大学 Robot confrontation method based on evolution reinforcement learning
CN113095500A (en) * 2021-03-31 2021-07-09 南开大学 Robot tracking method based on multi-agent reinforcement learning
CN113359448A (en) * 2021-06-03 2021-09-07 清华大学 Autonomous underwater vehicle track tracking control method aiming at time-varying dynamics
CN113370205A (en) * 2021-05-08 2021-09-10 浙江工业大学 Baxter mechanical arm track tracking control method based on machine learning
CN113467248A (en) * 2021-07-22 2021-10-01 南京大学 Fault-tolerant control method for unmanned aerial vehicle sensor during fault based on reinforcement learning
CN113595768A (en) * 2021-07-07 2021-11-02 西安电子科技大学 Distributed cooperative transmission algorithm for guaranteeing control performance of mobile information physical system
CN113821035A (en) * 2021-09-22 2021-12-21 北京邮电大学 Unmanned ship trajectory tracking control method and device
CN113829351A (en) * 2021-10-13 2021-12-24 广西大学 Collaborative control method of mobile mechanical arm based on reinforcement learning
CN113885330A (en) * 2021-10-26 2022-01-04 哈尔滨工业大学 Information physical system safety control method based on deep reinforcement learning
CN114020001A (en) * 2021-12-17 2022-02-08 中国科学院国家空间科学中心 Mars unmanned aerial vehicle intelligent control method based on depth certainty strategy gradient learning
CN114089633A (en) * 2021-11-19 2022-02-25 江苏科技大学 Multi-motor coupling drive control device and method for underwater robot
CN114357884A (en) * 2022-01-05 2022-04-15 厦门宇昊软件有限公司 Reaction temperature control method and system based on deep reinforcement learning
CN114527642A (en) * 2022-03-03 2022-05-24 东北大学 AGV automatic PID parameter adjusting method based on deep reinforcement learning
CN114721408A (en) * 2022-04-18 2022-07-08 哈尔滨理工大学 Underwater robot path tracking method based on reinforcement learning
CN114839884A (en) * 2022-07-05 2022-08-02 山东大学 Underwater vehicle bottom layer control method and system based on deep reinforcement learning
CN114967713A (en) * 2022-07-28 2022-08-30 山东大学 Underwater vehicle buoyancy discrete change control method based on reinforcement learning
CN114954840A (en) * 2022-05-30 2022-08-30 武汉理工大学 Stability changing control method, system and device for stability changing ship and storage medium
CN114995137A (en) * 2022-06-01 2022-09-02 哈尔滨工业大学 Rope-driven parallel robot control method based on deep reinforcement learning
CN115330276A (en) * 2022-10-13 2022-11-11 北京云迹科技股份有限公司 Method and device for robot to automatically select elevator based on reinforcement learning
CN115562345A (en) * 2022-10-28 2023-01-03 北京理工大学 Unmanned aerial vehicle detection track planning method based on deep reinforcement learning
CN115657683A (en) * 2022-11-14 2023-01-31 中国电子科技集团公司第十研究所 Unmanned and cableless submersible real-time obstacle avoidance method capable of being used for inspection task
WO2023019536A1 (en) * 2021-08-20 2023-02-23 上海电气电站设备有限公司 Deep reinforcement learning-based photovoltaic module intelligent sun tracking method
CN115826594A (en) * 2023-02-23 2023-03-21 北京航空航天大学 Unmanned underwater vehicle switching topology formation control method independent of dynamic model parameters
CN115857556A (en) * 2023-01-30 2023-03-28 中国人民解放军96901部队 Unmanned aerial vehicle collaborative detection planning method based on reinforcement learning
CN115855226A (en) * 2023-02-24 2023-03-28 青岛科技大学 Multi-AUV cooperative underwater data acquisition method based on DQN and matrix completion
CN116295449A (en) * 2023-05-25 2023-06-23 吉林大学 Method and device for indicating path of autonomous underwater vehicle
CN116578102A (en) * 2023-07-13 2023-08-11 清华大学 Obstacle avoidance method and device for autonomous underwater vehicle, computer equipment and storage medium
CN116827685A (en) * 2023-08-28 2023-09-29 成都乐超人科技有限公司 Dynamic defense strategy method of micro-service system based on deep reinforcement learning
CN114089633B (en) * 2021-11-19 2024-04-26 江苏科技大学 Multi-motor coupling driving control device and method for underwater robot

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120188365A1 (en) * 2009-07-20 2012-07-26 Precitec Kg Laser processing head and method for compensating for the change in focus position in a laser processing head
KR101545731B1 (en) * 2014-04-30 2015-08-20 인하대학교 산학협력단 System and method for video tracking
CN107065881A (en) * 2017-05-17 2017-08-18 清华大学 A kind of robot global path planning method learnt based on deeply
CN107102644A (en) * 2017-06-22 2017-08-29 华南师范大学 The underwater robot method for controlling trajectory and control system learnt based on deeply
CN107368076A (en) * 2017-07-31 2017-11-21 中南大学 Robot motion's pathdepth learns controlling planning method under a kind of intelligent environment
CN107856035A (en) * 2017-11-06 2018-03-30 深圳市唯特视科技有限公司 A kind of robustness dynamic motion method based on intensified learning and whole body controller

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
LI ZHOU et al., "AUV Based Source Seeking with Estimated Gradients", Journal of Systems Science & Complexity *
RUNSHENG YU et al., "Deep Reinforcement Learning Based Optimal Trajectory Tracking Control of Autonomous Underwater Vehicle", Proceedings of the 36th Chinese Control Conference *
DUAN Yong et al., "Evolutionary Reinforcement Learning and Its Application in Robot Path Tracking", Control and Decision *
MA Qiongxiong et al., "Optimal Trajectory Control of Underwater Robots Based on Deep Reinforcement Learning", Journal of South China Normal University (Natural Science Edition) *

Cited By (110)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109361700A (en) * 2018-12-06 2019-02-19 郑州航空工业管理学院 A kind of unmanned plane self-organizing network system adaptive recognition protocol frame
US10955853B2 (en) 2018-12-18 2021-03-23 Beijing Voyager Technology Co., Ltd. Systems and methods for autonomous driving
US11669097B2 (en) 2018-12-18 2023-06-06 Beijing Voyager Technology Co., Ltd. Systems and methods for autonomous driving
TWI706238B (en) * 2018-12-18 2020-10-01 大陸商北京航跡科技有限公司 Systems and methods for autonomous driving
CN109719721A (en) * 2018-12-26 2019-05-07 北京化工大学 A kind of autonomous emergence of imitative snake search and rescue robot adaptability gait
CN109719721B (en) * 2018-12-26 2020-07-24 北京化工大学 Adaptive gait autonomous emerging method of snake-like search and rescue robot
CN109726866A (en) * 2018-12-27 2019-05-07 浙江农林大学 Unmanned boat paths planning method based on Q learning neural network
CN109696830A (en) * 2019-01-31 2019-04-30 天津大学 The reinforcement learning adaptive control method of small-sized depopulated helicopter
CN109696830B (en) * 2019-01-31 2021-12-03 天津大学 Reinforced learning self-adaptive control method of small unmanned helicopter
CN109960259B (en) * 2019-02-15 2021-09-24 青岛大学 Multi-agent reinforcement learning unmanned guided vehicle path planning method based on gradient potential
CN109960259A (en) * 2019-02-15 2019-07-02 青岛大学 A kind of unmanned guiding vehicle paths planning method of the multiple agent intensified learning based on gradient gesture
CN109828463A (en) * 2019-02-18 2019-05-31 哈尔滨工程大学 A kind of adaptive wave glider bow of ocean current interference is to control method
CN109828467A (en) * 2019-03-01 2019-05-31 大连海事大学 A kind of the unmanned boat intensified learning controller architecture and design method of data-driven
CN109765916A (en) * 2019-03-26 2019-05-17 武汉欣海远航科技研发有限公司 A kind of unmanned surface vehicle path following control device design method
CN109870162A (en) * 2019-04-04 2019-06-11 北京航空航天大学 A kind of unmanned plane during flying paths planning method based on competition deep learning network
CN110083064B (en) * 2019-04-29 2022-02-15 辽宁石油化工大学 Network optimal tracking control method based on non-strategy Q-learning
CN110083064A (en) * 2019-04-29 2019-08-02 辽宁石油化工大学 A kind of network optimal tracking control method based on non-strategy Q-learning
CN110045614A (en) * 2019-05-16 2019-07-23 河海大学常州校区 A kind of traversing process automatic learning control system of strand suction ship and method based on deep learning
CN110428615B (en) * 2019-07-12 2021-06-22 中国科学院自动化研究所 Single intersection traffic signal control method, system and device based on deep reinforcement learning
CN110428615A (en) * 2019-07-12 2019-11-08 中国科学院自动化研究所 Isolated intersection traffic signal control method, system and device based on deep reinforcement learning
CN110362089A (en) * 2019-08-02 2019-10-22 大连海事大学 A method of the unmanned boat independent navigation based on deeply study and genetic algorithm
CN110321666A (en) * 2019-08-09 2019-10-11 重庆理工大学 Multi-robots Path Planning Method based on priori knowledge Yu DQN algorithm
CN110321666B (en) * 2019-08-09 2022-05-03 重庆理工大学 Multi-robot path planning method based on priori knowledge and DQN algorithm
CN110333739A (en) * 2019-08-21 2019-10-15 哈尔滨工程大学 A kind of AUV conduct programming and method of controlling operation based on intensified learning
CN110333739B (en) * 2019-08-21 2020-07-31 哈尔滨工程大学 AUV (autonomous Underwater vehicle) behavior planning and action control method based on reinforcement learning
CN110806756B (en) * 2019-09-10 2022-08-02 西北工业大学 Unmanned aerial vehicle autonomous guidance control method based on DDPG
CN110806756A (en) * 2019-09-10 2020-02-18 西北工业大学 Unmanned aerial vehicle autonomous guidance control method based on DDPG
CN110716574B (en) * 2019-09-29 2023-05-02 哈尔滨工程大学 UUV real-time collision avoidance planning method based on deep Q network
CN110716574A (en) * 2019-09-29 2020-01-21 哈尔滨工程大学 UUV real-time collision avoidance planning method based on deep Q network
CN110673602A (en) * 2019-10-24 2020-01-10 驭势科技(北京)有限公司 Reinforced learning model, vehicle automatic driving decision method and vehicle-mounted equipment
CN110673602B (en) * 2019-10-24 2022-11-25 驭势科技(北京)有限公司 Reinforced learning model, vehicle automatic driving decision method and vehicle-mounted equipment
CN110806759A (en) * 2019-11-12 2020-02-18 清华大学 Aircraft route tracking method based on deep reinforcement learning
CN110989576B (en) * 2019-11-14 2022-07-12 北京理工大学 Target following and dynamic obstacle avoidance control method for differential slip steering vehicle
CN110989576A (en) * 2019-11-14 2020-04-10 北京理工大学 Target following and dynamic obstacle avoidance control method for differential slip steering vehicle
CN111027677A (en) * 2019-12-02 2020-04-17 西安电子科技大学 Multi-maneuvering-target tracking method based on depth certainty strategy gradient DDPG
CN111091710A (en) * 2019-12-18 2020-05-01 上海天壤智能科技有限公司 Traffic signal control method, system and medium
CN111061277A (en) * 2019-12-31 2020-04-24 歌尔股份有限公司 Unmanned vehicle global path planning method and device
US11747155B2 (en) 2019-12-31 2023-09-05 Goertek Inc. Global path planning method and device for an unmanned vehicle
CN111310384A (en) * 2020-01-16 2020-06-19 香港中文大学(深圳) Wind field cooperative control method, terminal and computer readable storage medium
CN111240345A (en) * 2020-02-11 2020-06-05 哈尔滨工程大学 Underwater robot trajectory tracking method based on double BP network reinforcement learning framework
CN111240345B (en) * 2020-02-11 2023-04-07 哈尔滨工程大学 Underwater robot trajectory tracking method based on double BP network reinforcement learning framework
CN111580544A (en) * 2020-03-25 2020-08-25 北京航空航天大学 Unmanned aerial vehicle target tracking control method based on reinforcement learning PPO algorithm
CN111736617B (en) * 2020-06-09 2022-11-04 哈尔滨工程大学 Track tracking control method for preset performance of benthonic underwater robot based on speed observer
CN111813143B (en) * 2020-06-09 2022-04-19 天津大学 Underwater glider intelligent control system and method based on reinforcement learning
CN111813143A (en) * 2020-06-09 2020-10-23 天津大学 Underwater glider intelligent control system and method based on reinforcement learning
CN111736617A (en) * 2020-06-09 2020-10-02 哈尔滨工程大学 Speed observer-based benthonic underwater robot preset performance track tracking control method
CN111856936A (en) * 2020-07-21 2020-10-30 天津蓝鳍海洋工程有限公司 Control method for underwater high-flexibility operation platform with cable
CN111856936B (en) * 2020-07-21 2023-06-02 天津蓝鳍海洋工程有限公司 Control method for cabled underwater high-flexibility operation platform
CN112100834A (en) * 2020-09-06 2020-12-18 西北工业大学 Underwater glider attitude control method based on deep reinforcement learning
CN112132263A (en) * 2020-09-11 2020-12-25 大连理工大学 Multi-agent autonomous navigation method based on reinforcement learning
CN112162555B (en) * 2020-09-23 2021-07-16 燕山大学 Vehicle control method based on reinforcement learning control strategy in hybrid vehicle fleet
CN112162555A (en) * 2020-09-23 2021-01-01 燕山大学 Vehicle control method based on reinforcement learning control strategy in hybrid vehicle fleet
CN112148025A (en) * 2020-09-24 2020-12-29 东南大学 Unmanned aerial vehicle stability control algorithm based on integral compensation reinforcement learning
CN112179367A (en) * 2020-09-25 2021-01-05 广东海洋大学 Intelligent autonomous navigation method based on deep reinforcement learning
CN112179367B (en) * 2020-09-25 2023-07-04 广东海洋大学 Intelligent autonomous navigation method based on deep reinforcement learning
CN112241176B (en) * 2020-10-16 2022-10-28 哈尔滨工程大学 Path planning and obstacle avoidance control method of underwater autonomous vehicle in large-scale continuous obstacle environment
CN112241176A (en) * 2020-10-16 2021-01-19 哈尔滨工程大学 Path planning and obstacle avoidance control method of underwater autonomous vehicle in large-scale continuous obstacle environment
CN112558465A (en) * 2020-12-03 2021-03-26 大连海事大学 Unknown unmanned ship finite time reinforcement learning control method with input limitation
CN112506210A (en) * 2020-12-04 2021-03-16 东南大学 Unmanned aerial vehicle control method for autonomous target tracking
CN112506210B (en) * 2020-12-04 2022-12-27 东南大学 Unmanned aerial vehicle control method for autonomous target tracking
CN112462792A (en) * 2020-12-09 2021-03-09 哈尔滨工程大学 Underwater robot motion control method based on Actor-Critic algorithm
CN112698572A (en) * 2020-12-22 2021-04-23 西安交通大学 Structural vibration control method, medium and equipment based on reinforcement learning
CN112929900A (en) * 2021-01-21 2021-06-08 华侨大学 MAC protocol for realizing time domain interference alignment based on deep reinforcement learning in underwater acoustic network
CN112929900B (en) * 2021-01-21 2022-08-02 华侨大学 MAC protocol for realizing time domain interference alignment based on deep reinforcement learning in underwater acoustic network
CN113029123A (en) * 2021-03-02 2021-06-25 西北工业大学 Multi-AUV collaborative navigation method based on reinforcement learning
CN113052372A (en) * 2021-03-17 2021-06-29 哈尔滨工程大学 Dynamic AUV tracking path planning method based on deep reinforcement learning
CN113052372B (en) * 2021-03-17 2022-08-02 哈尔滨工程大学 Dynamic AUV tracking path planning method based on deep reinforcement learning
CN113095500B (en) * 2021-03-31 2023-04-07 南开大学 Robot tracking method based on multi-agent reinforcement learning
CN113095500A (en) * 2021-03-31 2021-07-09 南开大学 Robot tracking method based on multi-agent reinforcement learning
CN113095463A (en) * 2021-03-31 2021-07-09 南开大学 Robot confrontation method based on evolution reinforcement learning
CN113370205A (en) * 2021-05-08 2021-09-10 浙江工业大学 Baxter mechanical arm track tracking control method based on machine learning
CN113370205B (en) * 2021-05-08 2022-06-17 浙江工业大学 Baxter mechanical arm track tracking control method based on machine learning
CN113359448A (en) * 2021-06-03 2021-09-07 清华大学 Autonomous underwater vehicle track tracking control method aiming at time-varying dynamics
CN113595768A (en) * 2021-07-07 2021-11-02 西安电子科技大学 Distributed cooperative transmission algorithm for guaranteeing control performance of mobile information physical system
CN113467248A (en) * 2021-07-22 2021-10-01 南京大学 Fault-tolerant control method for unmanned aerial vehicle sensor during fault based on reinforcement learning
WO2023019536A1 (en) * 2021-08-20 2023-02-23 上海电气电站设备有限公司 Deep reinforcement learning-based photovoltaic module intelligent sun tracking method
CN113821035A (en) * 2021-09-22 2021-12-21 北京邮电大学 Unmanned ship trajectory tracking control method and device
CN113829351B (en) * 2021-10-13 2023-08-01 广西大学 Cooperative control method of mobile mechanical arm based on reinforcement learning
CN113829351A (en) * 2021-10-13 2021-12-24 广西大学 Collaborative control method of mobile mechanical arm based on reinforcement learning
CN113885330A (en) * 2021-10-26 2022-01-04 哈尔滨工业大学 Information physical system safety control method based on deep reinforcement learning
CN113885330B (en) * 2021-10-26 2022-06-17 哈尔滨工业大学 Information physical system safety control method based on deep reinforcement learning
CN114089633B (en) * 2021-11-19 2024-04-26 江苏科技大学 Multi-motor coupling driving control device and method for underwater robot
CN114089633A (en) * 2021-11-19 2022-02-25 江苏科技大学 Multi-motor coupling drive control device and method for underwater robot
CN114020001A (en) * 2021-12-17 2022-02-08 中国科学院国家空间科学中心 Mars unmanned aerial vehicle intelligent control method based on depth certainty strategy gradient learning
CN114357884A (en) * 2022-01-05 2022-04-15 厦门宇昊软件有限公司 Reaction temperature control method and system based on deep reinforcement learning
CN114527642A (en) * 2022-03-03 2022-05-24 东北大学 AGV automatic PID parameter adjusting method based on deep reinforcement learning
CN114527642B (en) * 2022-03-03 2024-04-02 东北大学 Method for automatically adjusting PID parameters by AGV based on deep reinforcement learning
CN114721408A (en) * 2022-04-18 2022-07-08 哈尔滨理工大学 Underwater robot path tracking method based on reinforcement learning
CN114954840B (en) * 2022-05-30 2023-09-05 武汉理工大学 Method, system and device for controlling stability of ship
CN114954840A (en) * 2022-05-30 2022-08-30 武汉理工大学 Stability changing control method, system and device for stability changing ship and storage medium
CN114995137B (en) * 2022-06-01 2023-04-28 哈尔滨工业大学 Rope-driven parallel robot control method based on deep reinforcement learning
CN114995137A (en) * 2022-06-01 2022-09-02 哈尔滨工业大学 Rope-driven parallel robot control method based on deep reinforcement learning
CN114839884B (en) * 2022-07-05 2022-09-30 山东大学 Underwater vehicle bottom layer control method and system based on deep reinforcement learning
CN114839884A (en) * 2022-07-05 2022-08-02 山东大学 Underwater vehicle bottom layer control method and system based on deep reinforcement learning
CN114967713A (en) * 2022-07-28 2022-08-30 山东大学 Underwater vehicle buoyancy discrete change control method based on reinforcement learning
CN114967713B (en) * 2022-07-28 2022-11-29 山东大学 Underwater vehicle buoyancy discrete change control method based on reinforcement learning
CN115330276A (en) * 2022-10-13 2022-11-11 北京云迹科技股份有限公司 Method and device for robot to automatically select elevator based on reinforcement learning
CN115330276B (en) * 2022-10-13 2023-01-06 北京云迹科技股份有限公司 Method and device for robot to automatically select elevator based on reinforcement learning
CN115562345A (en) * 2022-10-28 2023-01-03 北京理工大学 Unmanned aerial vehicle detection track planning method based on deep reinforcement learning
CN115657683A (en) * 2022-11-14 2023-01-31 中国电子科技集团公司第十研究所 Unmanned and cableless submersible real-time obstacle avoidance method capable of being used for inspection task
CN115857556A (en) * 2023-01-30 2023-03-28 中国人民解放军96901部队 Unmanned aerial vehicle collaborative detection planning method based on reinforcement learning
CN115826594A (en) * 2023-02-23 2023-03-21 北京航空航天大学 Unmanned underwater vehicle switching topology formation control method independent of dynamic model parameters
CN115855226B (en) * 2023-02-24 2023-05-30 青岛科技大学 Multi-AUV cooperative underwater data acquisition method based on DQN and matrix completion
CN115855226A (en) * 2023-02-24 2023-03-28 青岛科技大学 Multi-AUV cooperative underwater data acquisition method based on DQN and matrix completion
CN116295449A (en) * 2023-05-25 2023-06-23 吉林大学 Method and device for indicating path of autonomous underwater vehicle
CN116295449B (en) * 2023-05-25 2023-09-12 吉林大学 Method and device for indicating path of autonomous underwater vehicle
CN116578102B (en) * 2023-07-13 2023-09-19 清华大学 Obstacle avoidance method and device for autonomous underwater vehicle, computer equipment and storage medium
CN116578102A (en) * 2023-07-13 2023-08-11 清华大学 Obstacle avoidance method and device for autonomous underwater vehicle, computer equipment and storage medium
CN116827685A (en) * 2023-08-28 2023-09-29 成都乐超人科技有限公司 Dynamic defense strategy method of micro-service system based on deep reinforcement learning
CN116827685B (en) * 2023-08-28 2023-11-14 成都乐超人科技有限公司 Dynamic defense strategy method of micro-service system based on deep reinforcement learning

Also Published As

Publication number Publication date
CN108803321B (en) 2020-07-10

Similar Documents

Publication Publication Date Title
CN108803321A (en) Autonomous Underwater Vehicle Trajectory Tracking Control method based on deeply study
CN107748566B (en) Underwater autonomous robot fixed depth control method based on reinforcement learning
CN111667513B (en) Unmanned aerial vehicle maneuvering target tracking method based on DDPG transfer learning
CN109655066A (en) A kind of unmanned plane path planning method based on the Q(λ) algorithm
CN111240345B (en) Underwater robot trajectory tracking method based on double BP network reinforcement learning framework
CN108319293A (en) A kind of UUV real-time collision-free planning method based on LSTM networks
CN110362089A (en) A method of the unmanned boat independent navigation based on deeply study and genetic algorithm
CN107255923A (en) Underactuated unmanned boat trajectory tracking control method based on identified RBF ICA CMAC neural networks
CN109634307A (en) A kind of compound trajectory tracking control method of UAV navigation
CN113052372B (en) Dynamic AUV tracking path planning method based on deep reinforcement learning
CN112286218B (en) Aircraft large-attack-angle rock-and-roll suppression method based on depth certainty strategy gradient
CN109189103B (en) Under-actuated AUV trajectory tracking control method with transient performance constraint
CN115016496A (en) Water surface unmanned ship path tracking method based on deep reinforcement learning
CN108334677A (en) A kind of UUV real-time collision-free planning method based on GRU networks
CN106338919A (en) USV (Unmanned Surface Vehicle) track tracking control method based on enhanced learning type intelligent algorithm
CN111240356A (en) Unmanned aerial vehicle cluster convergence method based on deep reinforcement learning
CN114115262B (en) Multi-AUV actuator saturation cooperative formation control system and method based on azimuth information
Fang et al. Autonomous underwater vehicle formation control and obstacle avoidance using multi-agent generative adversarial imitation learning
CN110658814A (en) Self-adaptive ship motion modeling method applied to ship motion control
CN113741449A (en) Multi-agent control method for air-sea cooperative observation task
Zang et al. Standoff tracking control of underwater glider to moving target
CN115033022A (en) DDPG unmanned aerial vehicle landing method based on expert experience and oriented to mobile platform
Jin et al. Soft formation control for unmanned surface vehicles under environmental disturbance using multi-task reinforcement learning
Meng et al. A Fully-Autonomous Framework of Unmanned Surface Vehicles in Maritime Environments Using Gaussian Process Motion Planning
Song et al. Surface path tracking method of autonomous surface underwater vehicle based on deep reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant