CN109240091B - Underwater robot control method based on reinforcement learning and tracking control method thereof - Google Patents


Info

Publication number
CN109240091B (application number CN201811342346.4A)
Authority
CN
China
Prior art keywords
underwater robot
robot
underwater
control method
control strategy
Prior art date
Legal status: Active (an assumption, not a legal conclusion)
Application number
CN201811342346.4A
Other languages
Chinese (zh)
Other versions
CN109240091A (en
Inventor
闫敬
公雅迪
罗小元
杨晛
李鑫
Current Assignee: Wang Bo (the listed assignee may be inaccurate)
Original Assignee
Yanshan University
Priority date (an assumption, not a legal conclusion)
Filing date
Publication date
Application filed by Yanshan University
Priority claimed from CN201811342346.4A
Publication of application CN109240091A
Application granted; publication of CN109240091B
Legal status: Active; anticipated expiration recorded

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02: Adaptive control systems as above, electric
    • G05B13/04: Adaptive control systems as above, electric, involving the use of models or simulators
    • G05B13/042: Adaptive control systems as above, in which a parameter or coefficient is automatically adjusted to optimise the performance

Abstract

The invention discloses an underwater robot control method based on reinforcement learning and a tracking control method thereof, belonging to the field of underwater robot control. The control center gives expected track information of the underwater robot and sends the expected track information to the underwater robot; respectively selecting sampling points for the uncertain parameters according to the probability density function of the uncertain parameters in the underwater robot model, and reducing the order of the original dynamic model by using the sampling points; the method comprises the steps that an underwater robot interacts with the surrounding environment to learn environment information, one-step cost functions are calculated in different states to update values, the weight of the value functions corresponding to a control strategy is solved by a least square method, the control strategy is improved by a gradient descent method, and the two processes of value updating and strategy improving are iterated circularly until convergence, so that the optimal control strategy of the current position tracking expected track is obtained; and repeating the steps to obtain the optimal control strategy for tracking the rest expected tracks, and finally completing the tracking task.

Description

Underwater robot control method based on reinforcement learning and tracking control method thereof
Technical Field
The invention relates to the field of underwater robot control, in particular to an underwater robot control method based on reinforcement learning and a tracking control method thereof.
Background
As ocean resources are exploited ever more widely, underwater robots are attracting increasing attention. One important marine application of the underwater robot is position tracking, but the underwater environment is complex and changeable, so the model parameters of the underwater robot are difficult to obtain and control is difficult.
In the prior art, the patent application with publication number CN106708069A discloses a coordinated planning and control method for an underwater mobile robot. The method plans the current desired velocity and state in real time through a dynamic tracking differentiator, converts the task plan in Cartesian space into velocity and acceleration plans for each joint coordinate system through an iterative task-priority method, and controls the underwater robot and its manipulator arm by a dynamics-based method according to those plans, so that the underwater mobile robot can cruise and operate. However, that invention does not consider the influence of uncertainty in the underwater environment on the underwater robot: in the marine environment the robot is subject to disturbances such as the forces of surge, sway and heave during operation, and if these uncertain factors are not considered in the algorithm, the desired effect cannot be achieved in actual operation.
Furthermore, the patent application with publication number CN107544256A provides an underwater robot sliding-mode control method based on the adaptive backstepping method. Based on a decomposition of the complex nonlinear system, a virtual control quantity is designed for each subsystem and the control quantity of the whole system is obtained by step-by-step sliding-mode recursion; to address the chattering caused by the uncertain upper bound of the system, a radial basis function neural network is introduced into the controller to adaptively approximate the internal uncertainty and external disturbance of the system, finally suppressing the chattering, realizing high-precision tracking control, improving the robustness of the closed-loop system, and meeting engineering requirements. However, the internal uncertainty and external disturbance in that invention are fixed parameters, whereas in an actual working environment the parameters that disturb the underwater robot should be treated as time-varying uncertain parameters.
Disclosure of Invention
The invention aims to overcome the above defects and provides an underwater robot control method based on reinforcement learning, which can accurately track a target trajectory, reduce the number of samples required for a system with uncertain parameters, and realize control by having the underwater robot learn from its environment.
In order to achieve the purpose, the invention adopts the following technical scheme:
an underwater robot control method based on reinforcement learning is characterized by comprising the following steps:
step 1, establishing, for the underwater robot's own position, a fixed reference frame based on the robot's desired trajectory position and an inertial reference frame based on the uncertain factors of the underwater environment;
step 2, for an inertial reference system, constructing an output model of the system mapping robot based on uncertain factors in the front-back direction, the left-right direction and the up-down direction:
G(a_1, a_2, a_3) (equation image not reproduced)
where a_i is the i-th uncertain factor acting on the underwater robot and its coefficients are given by an equation image (not reproduced); each uncertain factor a_i follows an independent probability density function (equation image not reproduced).
Sampling each uncertain factor at fixed points according to its probability density function, training the system mapping robot output model with the sampling points, and constructing a reduced-order system mapping robot output model:
G'(a_1, a_2, a_3) (equation image not reproduced)
where the coefficients of the uncertain factors in the low-order mapping are given by an equation image (not reproduced);
step 3, converting the real position of the underwater robot into the coordinates in the fixed reference system in the step 1, and obtaining model output mapped by the robot reduced-order system in the inertial reference system in the step 2;
step 4, defining the real positions of the underwater robot in different states k as follows:
p(k) = [x(k), y(k), z(k)]^T
defining expected track positions of the underwater robot in different states k as follows:
p_r(k) = [x_r(k), y_r(k), z_r(k)]^T
defining the one-step cost function of the underwater robot's next action in each state k as
g_k(p, u) = (x(k) - x_r(k))^2 + (y(k) - y_r(k))^2 + (z(k) - z_r(k))^2 + u^2(k)
where (x - x_r)^2 + (y - y_r)^2 + (z - z_r)^2 represents the cost of the underwater robot's position error, u is the underwater robot controller input, and u^2 represents the energy-consumption cost;
training the robot according to a one-step cost function generated by the position movement of the underwater robot to obtain a value function
V(p(k)) = E_{a(k)}{ g_k(p, u) + γ V(p(k+1)) }
where γ ∈ (0, 1) is a discount factor and E_{a(k)} denotes the expectation at state k;
let V = W^T Φ(p) and obtain the value model of the control method using an iterative weighting method:
W_{j+1} Φ(p(k)) = E_{a(k)}[ g_k(p, u) + γ W_j Φ(p(k+1)) ]
where Φ(p) is the basis vector and W is the weight vector;
step 5, solving the value model of the control method; let h(p) = U^T σ(p), where the weight vector U is updated with a gradient descent method, and improve the control method by minimizing the cost function:
h(p(k)) = arg min_u E_{a(k)}[ g_k(p, u) + γ V(p(k+1)) ]
where h(p) is the next action performed in each state when the underwater robot tracks its position, and h(p) serves as the optimal control strategy;
step 6, simultaneously converging the two processes of updating the value model of the control method and improving the control strategy by using an iterative weight method, and completing the solution of the optimal control strategy in the current state;
and step 7, inputting the real position from step 3 into step 4, obtaining the optimal control strategy for the next action through the operations of steps 5-6, feeding that strategy as input into the system mapping robot output model of step 2, and cyclically repeating the operations of steps 3 to 7 to complete the tracking task of the underwater robot.
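The seven steps above amount to one outer control loop: at each state, solve for a control action, feed it to the system model, and move on. As a minimal sketch (not the patent's implementation: the proportional policy, integrator model, and reference trajectory below are toy stand-ins), the loop can be written as:

```python
import numpy as np

def track(p0, p_ref, steps, policy_solve, model_step):
    """Outer loop of steps 3-7: at each state solve for the control
    action (steps 4-6), feed it to the system output model (step 2),
    and repeat until the trajectory is covered."""
    p = np.asarray(p0, dtype=float)
    path = [p.copy()]
    for k in range(steps):
        u = policy_solve(p, p_ref(k))   # stands in for value update + policy improvement
        p = model_step(p, u)            # stands in for the reduced-order output model
        path.append(p.copy())
    return np.array(path)

# Toy stand-ins: a proportional policy and an integrator model.
policy = lambda p, pr: 0.5 * (pr - p)
model = lambda p, u: p + u
ref = lambda k: np.array([2 * np.sin(0.1 * k), 0.1 * k, 1.0])

path = track([0.0, 0.0, 0.0], ref, 50, policy, model)
```

With these stand-ins the tracked position converges to the constant z-reference exactly and follows the ramp in y with a fixed lag, which is the qualitative behavior the loop is meant to illustrate.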
The further technical scheme is that the uncertain factors in step 1 are underwater surge, sway and heave.
The further technical scheme is that the output mean E'(G'(a_1, a_2, a_3)) of the reduced-order system mapping robot output model in step 2 is identical to the output mean E(G(a_1, a_2, a_3)) of the system mapping robot output model.
The further technical scheme is that the specific steps of the step 4 are as follows:
the underwater robot's own position in each state k is p(k) = [x(k), y(k), z(k)]^T and the desired trajectory is p_r(k) = [x_r(k), y_r(k), z_r(k)]^T; to obtain the optimal control strategy, namely the action h performed in each state when the underwater robot tracks its position, the one-step cost function of the underwater robot in each state is set as g_k(p, u) = (x(k) - x_r(k))^2 + (y(k) - y_r(k))^2 + (z(k) - z_r(k))^2 + u^2(k), where (x - x_r)^2 + (y - y_r)^2 + (z - z_r)^2 represents the cost of the tracking error, u is the underwater robot controller input, and u^2 represents the energy-consumption cost; the value function is calculated from the set one-step cost function:
V(p(k)) = E_{a(k)}{ g_k + γ V(p(k+1)) }
where γ ∈ (0, 1) is a discount factor and E_{a(k)} denotes the expectation at state k;
in the value-update process, let V = W^T Φ(p); the value function can then be expressed as:
W_{j+1} Φ(p(k)) = E_{a(k)}[ g_k(p, u) + γ W_j Φ(p(k+1)) ]
where Φ(p) is the basis vector and W is the weight vector, solved iteratively by the least squares method. After the value function is obtained, in the strategy-improvement step the optimal tracking control strategy is solved by again setting a basis vector and a weight vector: let h(p) = U^T σ(p), where the weight vector U is updated by gradient descent and σ(p) is the basis vector; the control strategy is improved by minimizing the cost function:
h(p(k)) = arg min_u E_{a(k)}[ g_k(p, u) + γ V(p(k+1)) ]
h(p) is the control strategy the underwater robot obtains by learning from the environment, and this strategy is the optimal control strategy.
The further technical scheme is that the specific content of the step 6 is as follows:
and when the value model of the control method is updated and the control strategy is improved by using an iterative weight method each time, and the obtained weight change is smaller than a threshold value of 0.001, the convergence is regarded, and h after the iteration is finished is input to the underwater robot as the input u of the controller.
The reinforcement-learning-based underwater robot control method controls the underwater robot so as to realize tracking of the tracked object.
Compared with the prior art, the invention has the following advantages:
the invention samples the uncertain parameters of the underwater robot relating to the underwater uncertain factors by using a reduction method, and can give accurate output statistics of the original mapping, thereby reducing the calculation cost and effectively reducing the simulation times.
The invention uses the reinforcement learning method to track the position of the underwater robot, integrates the advantages of self-adaption and optimal control, and seeks an optimal feedback strategy by using the response of the environment. By utilizing the surrounding environment information, the underwater robot can find the control strategy which best accords with the target track through self learning by multiple iterations.
The invention realizes the intelligent tracking of the underwater robot. The uncertain parameters of the underwater robot are sampled by using a reduction method and are combined with reinforcement learning, so that the backward real-time optimal control of an underwater robot system becomes forward self-adaptive control, and the underwater robot can better complete track tracking.
Drawings
FIG. 1 is a flow chart of trajectory tracking according to the present invention.
Fig. 2 is a schematic diagram of the structure of the underwater mobile sensor network of the invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings:
as shown in fig. 2, the invention deploys a buoy relay on the water surface; the buoy relay is used for self-positioning of the underwater robot, and the control center gives the desired trajectory information of the underwater robot and sends it to the underwater robot. The underwater robot controller then drives the actuator according to the control signal to complete the motion of the underwater robot.
As shown in fig. 1, the present invention provides a reinforcement learning-based underwater robot control method, which comprises the following steps:
step one, the underwater robot is affected underwater by its surrounding environment, so the uncertain factors in the underwater robot model must be evaluated before the underwater robot controller can command the actuator; the underwater robot has six degrees of freedom, and its dynamic characteristics can be described by two reference frames, namely a fixed reference frame based on the robot's desired trajectory position and an inertial reference frame based on the uncertain factors of the underwater environment. Both frames consider the up-down, left-right and front-back directions, and the underwater inertial reference frame takes underwater surge, sway and heave as the uncertain factors that introduce the uncertain parameters.
In the inertial reference frame, the linear velocities in the surge, sway and heave directions are pairwise perpendicular, while the influences of roll, pitch and yaw on the angular velocity of the underwater robot are considered along the linear-velocity directions.
Step two, due to the random influence of the underwater environment, respectively estimating uncertain parameters of the underwater robot in three directions: and selecting a group of sampling points for each parameter according to the respective probability density functions of the uncertain parameters, and respectively calculating by using the sampling points to reduce the order of the robot model, so that the output result of the controller can be obtained through calculation for a few times, the output mean value is ensured to be the same as the output mean value of the original model, the underwater robot adapts to the underwater environment, and more accurate control is realized.
The specific steps are as follows: for the inertial reference frame, an output model of the system mapping robot based on the uncertain factors is constructed in the front-back, left-right and up-down directions:
G(a_1, a_2, a_3) (equation image not reproduced)
where a_i is the i-th uncertain factor acting on the underwater robot; in this embodiment, underwater surge, sway and heave are taken as the uncertain factors that introduce the uncertain parameters, and the coefficients of the map are given by an equation image (not reproduced). Each uncertain parameter (or uncertain factor) a_i follows an independent probability density function (equation image not reproduced).
Each uncertain factor is sampled at fixed points according to its probability density function, the system mapping robot output model is trained with the sampling points, and a reduced-order system mapping robot output model is constructed:
G'(a_1, a_2, a_3) (equation image not reproduced)
where the new coefficients of the uncertain parameters in the low-order mapping are given by an equation image (not reproduced). The output mean E'(G'(a_1, a_2, a_3)) of the reduced-order system mapping robot output model is identical to the output mean E(G(a_1, a_2, a_3)) of the original system mapping robot output model, i.e., E'(G'(a_1, a_2, a_3)) = E(G(a_1, a_2, a_3)).
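The idea of replacing random sampling with a few deterministic points whose weighted mean reproduces the original mean can be sketched as follows. This is a hedged illustration only: the uniform distribution (bounds borrowed from the embodiment), the two-point Gauss-Legendre rule, and the scalar map G are assumptions, not the patent's actual reduction.

```python
import numpy as np

# Uncertain parameter a1 assumed uniform on [-0.2, 0.3].
lo, hi = -0.2, 0.3

# Two-point Gauss-Legendre rule mapped to [lo, hi]: exact for polynomials
# up to degree 3, so the sampled mean of a smooth output map matches the
# true mean with only two model evaluations.
nodes = np.array([-1 / np.sqrt(3), 1 / np.sqrt(3)])
pts = 0.5 * (hi + lo) + 0.5 * (hi - lo) * nodes
wts = np.array([0.5, 0.5])

G = lambda a: 1.0 + 2.0 * a + 0.5 * a**2   # hypothetical scalar output map

mean_reduced = np.dot(wts, G(pts))                      # E'(G'): two evaluations
mean_dense = np.mean(G(np.linspace(lo, hi, 200001)))    # dense reference ~ E(G)
```

With only two evaluations the reduced mean agrees with the dense reference, which is the property the reduced-order model is required to preserve.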
Step three, converting the real position of the underwater robot into the coordinates in the fixed reference system in the step 1, and acquiring model output mapped by the robot reduced-order system in the inertial reference system in the step 2;
step four, define the underwater robot's own position in each state k as
p(k) = [x(k), y(k), z(k)]^T
The desired trajectory to be tracked is:
p_r(k) = [x_r(k), y_r(k), z_r(k)]^T
To obtain the optimal control strategy, namely the action h performed in each state when the underwater robot tracks its position, the one-step cost function of the underwater robot in each state is set as
g_k(p, u) = (x(k) - x_r(k))^2 + (y(k) - y_r(k))^2 + (z(k) - z_r(k))^2 + u^2(k)
where (x - x_r)^2 + (y - y_r)^2 + (z - z_r)^2 represents the cost of the tracking error, u is the underwater robot controller input, and u^2 represents the energy-consumption cost. The value function is calculated from the one-step cost function:
V(p(k)) = E_{a(k)}{ g_k(p, u) + γ V(p(k+1)) }
where γ ∈ (0, 1) is a discount factor and E_{a(k)} denotes the expectation at state k;
Let V = W^T Φ(p); the value function can then be expressed as:
W_{j+1} Φ(p(k)) = E_{a(k)}[ g_k(p, u) + γ W_j Φ(p(k+1)) ]
where Φ(p) is the basis vector and W is the weight vector, solved iteratively by the least squares method.
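One least-squares pass of the weight update W_{j+1} Φ(p(k)) = g_k + γ W_j Φ(p(k+1)) can be sketched as follows. This is an illustrative fit only: the sampled states, the contraction dynamics, and the cost with u = 0 are toy assumptions, not the patent's model; the basis is the one given in the embodiment.

```python
import numpy as np

def phi(p):
    """Basis vector Phi(p) from the embodiment: [1, x, y, x^2, y^2, xy]^T."""
    return np.array([1.0, p[0], p[1], p[0]**2, p[1]**2, p[0] * p[1]])

def value_update(states, next_states, costs, W_j, gamma=0.9):
    """One least-squares pass of W_{j+1} Phi(p(k)) = g_k + gamma * W_j Phi(p(k+1))."""
    A = np.array([phi(p) for p in states])
    B = np.array([phi(p) for p in next_states])
    b = costs + gamma * B @ W_j
    W_next, *_ = np.linalg.lstsq(A, b, rcond=None)
    return W_next

rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(50, 2))   # sampled (x, y) positions
next_states = 0.9 * states                      # toy contraction dynamics
costs = np.sum(states**2, axis=1)               # one-step cost g_k with u = 0
W = np.zeros(6)
for _ in range(60):                             # iterate the value update
    W = value_update(states, next_states, costs, W)
```

With these toy dynamics the fixed point is V(p) = (x^2 + y^2) / (1 - 0.9 * 0.81), so the quadratic weights converge to about 3.69 and the others to zero, showing the iteration settling on a representable value function.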
Step five: after the value function is obtained, the optimal tracking control strategy is solved in the strategy-improvement step by again setting a basis vector and a weight vector: let h(p) = U^T σ(p), where the weight vector U is updated with a gradient descent method and σ(p) is the basis vector. The control strategy is improved by minimizing the cost function:
h(p(k)) = arg min_u E_{a(k)}[ g_k(p, u) + γ V(p(k+1)) ]
h(p) is the control strategy the underwater robot obtains by learning from the environment, and this strategy is the optimal control strategy.
Step six: the two processes of value updating and strategy improvement are iterated in a loop. When the weight change produced by each value update and strategy improvement is smaller than the threshold 0.001, convergence is deemed reached; h at the end of the iteration is input as the controller output u to the actuator of the underwater robot, completing the solution of the optimal control strategy in the current state.
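The convergence test of step six, cycling the two updates until the weight change drops below 0.001, can be sketched generically. The contraction map below is a toy stand-in for one value-update plus policy-improvement pass; it is not the patent's update.

```python
import numpy as np

def iterate_to_convergence(update, w0, tol=1e-3, max_iter=1000):
    """Cycle value update + strategy improvement (abstracted into one
    update map) until the weight change falls below the 0.001 threshold."""
    w = np.asarray(w0, dtype=float)
    for _ in range(max_iter):
        w_new = update(w)
        if np.linalg.norm(w_new - w) < tol:
            return w_new
        w = w_new
    return w

# Toy contraction standing in for one combined pass; its fixed point is [2, -1].
U = iterate_to_convergence(lambda w: 0.5 * w + np.array([1.0, -0.5]), np.zeros(2))
```

Because the step size of a contraction shrinks geometrically, stopping when successive weights differ by less than the threshold also bounds the distance to the fixed point.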
Step seven: the optimal control strategy is input into the reduced-order system obtained in step two and the state of the underwater robot is updated; steps five and six are then repeated to obtain the optimal control strategy for the next action, which is again input into step two.
The invention also discloses a control method for tracking with the underwater robot, which uses the trajectory information generated by the continuous movement of the tracked object as the desired trajectory information of step 1 above, and controls the underwater robot with the reinforcement-learning-based underwater robot control method to realize tracking of the tracked object.
The track information of the tracked object can be obtained by positioning the buoy relay.
An embodiment is specifically described below:
(1) As shown in figure 2, an underwater robot is deployed in a given water area 6 m long, 5 m wide and 1.5 m deep, and a buoy relay is arranged on the water surface for self-positioning of the underwater robot. The control center gives the desired trajectory information of the underwater robot, x_r = 2 sin(0.1k), y_r = 0.1k, z_r = 1, where k ∈ [0, ..., 100 s], and sends it to the underwater robot.
(2) The underwater robot has the kinematic model S_{k+1} = S_k + U_k + A_k, where S_k = [x(k), y(k), z(k)]^T is the underwater robot's own position, U_k = [u_x, u_y, u_z]^T is obtained by reinforcement learning, and A_k = [a_1(k), a_2(k), a_3(k)]^T is the uncertain parameter vector, in which a_1(k) follows its probability density function (equation image not reproduced) with -0.2 ≤ a_1(k) ≤ 0.3, a_2(k) follows its probability density function (equation image not reproduced) with -0.8 ≤ a_2(k) ≤ 0.7, and a_3(k) = 0.
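The embodiment's kinematics S_{k+1} = S_k + U_k + A_k with the stated parameter bounds can be simulated directly. This is a sketch under assumptions: uniform distributions stand in for the unreproduced density functions, and the idealized action that aims at the next reference point replaces the learned policy.

```python
import numpy as np

rng = np.random.default_rng(1)

def step(S, U):
    """One step of the embodiment's kinematics S_{k+1} = S_k + U_k + A_k.
    The uncertain parameters use the stated bounds a1 in [-0.2, 0.3],
    a2 in [-0.8, 0.7], a3 = 0; uniform distributions are assumed here
    because the patent's density functions are not reproduced."""
    A = np.array([rng.uniform(-0.2, 0.3), rng.uniform(-0.8, 0.7), 0.0])
    return S + np.asarray(U) + A

ref = lambda k: np.array([2 * np.sin(0.1 * k), 0.1 * k, 1.0])  # x_r, y_r, z_r

S = np.zeros(3)
for k in range(100):
    U = ref(k + 1) - S   # idealized action aiming at the next reference point
    S = step(S, U)
```

Even with a perfect one-step action, the final position deviates from the reference by exactly the last disturbance sample, which is why the learned controller must account for the uncertain parameters rather than the nominal model alone.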
(3) The position is tracked by the reinforcement learning method: the value function V(p(k)) = E_{a(k)}{ g_k(p, u) + γ V(p(k+1)) } is set with discount factor γ = 0.9. To obtain the value function, let V = W^T Φ(p); the weight vector is obtained by least squares iteration, with basis vector Φ(p) = [1, x, y, x^2, y^2, xy]^T. After the value function is obtained, the optimal tracking control strategy is solved in the strategy-improvement step by setting a basis vector and a weight vector: h(p) = U^T σ(p), where the weight vector U is updated by gradient descent and σ(p) = [1, x, y]^T. The control strategy is improved by minimizing the cost function.
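The gradient-descent update of the policy weights U in h(p) = U^T σ(p), with the embodiment's basis σ(p) = [1, x, y]^T, can be sketched as follows. This is a hedged illustration: the quadratic cost-to-go and its gradient are toy assumptions standing in for the patent's learned value function.

```python
import numpy as np

def sigma(p):
    """Policy basis vector sigma(p) = [1, x, y]^T from the embodiment."""
    return np.array([1.0, p[0], p[1]])

def improve_policy(U, states, q_grad, lr=0.05, iters=200):
    """Gradient descent on the policy weights: for u = U^T sigma(p),
    the chain rule gives dQ/dU = (dQ/du) * sigma(p)."""
    for _ in range(iters):
        g = np.zeros_like(U)
        for p in states:
            s = sigma(p)
            g += q_grad(p, U @ s) * s
        U = U - lr * g / len(states)
    return U

# Toy cost-to-go Q(p, u) = (u - x)^2: the minimizing policy is u = x,
# i.e. weights U* = [0, 1, 0] on the basis [1, x, y].
states = [np.array([x, y]) for x in (-1.0, 0.0, 1.0) for y in (-1.0, 1.0)]
U = improve_policy(np.zeros(3), states, q_grad=lambda p, u: 2.0 * (u - p[0]))
```

The descent recovers the weights of the action that minimizes the assumed cost-to-go at every sampled state, which is the role the strategy-improvement step plays in the iteration.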
(4) Through the loop of iterative value updating and strategy improvement, when the weight change produced by each value update and strategy improvement is smaller than the threshold 0.001, convergence is deemed reached; h at the end of the iteration is input as the controller output u to the actuator of the underwater robot, completing the solution of the optimal control strategy in the current state.
(5) The optimal control strategy is input as output into the system mapping robot output model of step (2), and the above steps are cycled to accomplish the tracking task. The above-described embodiments are merely illustrative of preferred embodiments of the present invention and are not restrictive; those skilled in the art may make various changes and modifications to the technical solution of the present invention without departing from its spirit, which is defined by the claims.

Claims (6)

1. An underwater robot control method based on reinforcement learning is characterized by comprising the following steps:
step 1, establishing a fixed reference system based on the self expected track position of the robot and an inertial reference system based on uncertain factors of an underwater environment for the self position of the underwater robot;
step 2, for an inertial reference system, constructing an output model of the system mapping robot based on uncertain factors in the front-back direction, the left-right direction and the up-down direction:
G(a_1, a_2, a_3) (equation image not reproduced)
where a_i is the i-th uncertain factor acting on the underwater robot and its coefficients are given by an equation image (not reproduced); each uncertain factor a_i follows an independent probability density function (equation image not reproduced);
sampling each uncertain factor at fixed points according to its probability density function, training the system mapping robot output model with the sampling points, and constructing a reduced-order system mapping robot output model:
G'(a_1, a_2, a_3) (equation image not reproduced)
where the coefficients of the uncertain factors in the low-order mapping are given by an equation image (not reproduced);
step 3, converting the real position of the underwater robot into the coordinates in the fixed reference system in the step 1, and obtaining model output mapped by the robot reduced-order system in the inertial reference system in the step 2;
step 4, defining the real position of the underwater robot in each state k as:
p(k) = [x(k), y(k), z(k)]^T
defining the desired trajectory position of the underwater robot in each state k as:
p_r(k) = [x_r(k), y_r(k), z_r(k)]^T
defining the one-step cost function of the underwater robot's next action in each state k as
g_k(p, u) = (x(k) - x_r(k))^2 + (y(k) - y_r(k))^2 + (z(k) - z_r(k))^2 + u^2(k)
where (x - x_r)^2 + (y - y_r)^2 + (z - z_r)^2 represents the cost of the underwater robot's position error, u is the underwater robot controller input, and u^2 represents the energy-consumption cost;
training the robot according to the one-step cost function generated by the position movement of the underwater robot to obtain the value function
V(p(k)) = E_{a(k)}{ g_k(p, u) + γ V(p(k+1)) }
where γ ∈ (0, 1) is a discount factor and E_{a(k)} denotes the expectation at state k;
letting V = W^T Φ(p) and obtaining the value model of the control method using an iterative weighting method:
W_{j+1} Φ(p(k)) = E_{a(k)}[ g_k(p, u) + γ W_j Φ(p(k+1)) ]
where Φ(p) is the basis vector and W is the weight vector;
step 5, solving the value model of the control method; letting h(p) = U^T σ(p), where the weight vector U is updated with a gradient descent method, and improving the control method by minimizing the cost function,
where h(p) is the next action performed in each state when the underwater robot tracks its position, and h(p) serves as the optimal control strategy;
step 6, simultaneously converging the two processes of updating the value model of the control method and improving the control strategy by using an iterative weight method, and completing the solution of the optimal control strategy in the current state;
and step 7, inputting the real position from step 3 into step 4, obtaining the optimal control strategy of the next action through the operations of steps 5-6, feeding that strategy as input into the system mapping robot output model of step 2, and cyclically repeating the operations of steps 3 to 7 to complete the tracking task of the underwater robot.
2. The reinforcement learning-based underwater robot control method according to claim 1, wherein the uncertain factors in step 1 are underwater surge, sway and heave.
3. The reinforcement learning-based underwater robot control method according to claim 1, wherein the output mean E'(G'(a_1, a_2, a_3)) of the reduced-order system mapping robot output model in step 2 is identical to the output mean E(G(a_1, a_2, a_3)) of the system mapping robot output model.
4. The reinforcement learning-based underwater robot control method according to claim 1, wherein the specific steps of the step 4 are as follows:
the underwater robot's own position in each state k is p(k) = [x(k), y(k), z(k)]^T and the desired trajectory is p_r(k) = [x_r(k), y_r(k), z_r(k)]^T; to obtain the optimal control strategy, namely the action h performed in each state when the underwater robot tracks its position, the one-step cost function of the underwater robot in each state is set as g_k(p, u) = (x(k) - x_r(k))^2 + (y(k) - y_r(k))^2 + (z(k) - z_r(k))^2 + u^2(k), where (x - x_r)^2 + (y - y_r)^2 + (z - z_r)^2 represents the cost of the tracking error, u is the underwater robot controller input, and u^2 represents the energy-consumption cost; the value function is calculated from the set one-step cost function:
V(p(k)) = E_{a(k)}{ g_k + γ V(p(k+1)) }
where γ ∈ (0, 1) is a discount factor and E_{a(k)} denotes the expectation at state k;
in the value-update process, letting V = W^T Φ(p), the value function can then be expressed as: W_{j+1} Φ(p(k)) = E_{a(k)}[ g_k(p, u) + γ W_j Φ(p(k+1)) ]
where Φ(p) is the basis vector and W is the weight vector, solved iteratively by the least squares method; the specific step of step 5 is that, after the value function is obtained, the optimal tracking control strategy is solved in the strategy-improvement step by setting a basis vector and a weight vector: h(p) = U^T σ(p), where the weight vector U is updated by gradient descent and σ(p) is the basis vector; the control strategy is improved by minimizing the cost function:
h(p(k)) = arg min_u E_{a(k)}[ g_k(p, u) + γ V(p(k+1)) ]
and h(p) is the control strategy the underwater robot obtains by learning from the environment, this strategy being the optimal control strategy.
5. The reinforcement learning-based underwater robot control method according to claim 1, wherein the details of the step 6 are as follows:
and when the value model of the control method is updated and the control strategy is improved by using an iterative weight method each time, and the obtained weight change is smaller than a threshold value of 0.001, the convergence is regarded, and h after the iteration is finished is input to the underwater robot as the input u of the controller.
6. A control method for tracking with an underwater robot, characterized in that the trajectory of a tracked object moving underwater is used as the desired trajectory of the underwater robot, and the underwater robot is controlled with the reinforcement learning-based underwater robot control method according to claim 1, thereby realizing tracking of the tracked object.
CN201811342346.4A 2018-11-13 2018-11-13 Underwater robot control method based on reinforcement learning and tracking control method thereof Active CN109240091B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811342346.4A CN109240091B (en) 2018-11-13 2018-11-13 Underwater robot control method based on reinforcement learning and tracking control method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811342346.4A CN109240091B (en) 2018-11-13 2018-11-13 Underwater robot control method based on reinforcement learning and tracking control method thereof

Publications (2)

Publication Number Publication Date
CN109240091A CN109240091A (en) 2019-01-18
CN109240091B true CN109240091B (en) 2020-08-11

Family

ID=65078187

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811342346.4A Active CN109240091B (en) 2018-11-13 2018-11-13 Underwater robot control method based on reinforcement learning and tracking control method thereof

Country Status (1)

Country Link
CN (1) CN109240091B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109947131A (en) * 2019-04-08 2019-06-28 燕山大学 A kind of underwater multi-robot formation control method based on intensified learning
CN110246151B (en) * 2019-06-03 2023-09-15 南京工程学院 Underwater robot target tracking method based on deep learning and monocular vision
CN110597058B (en) * 2019-08-28 2022-06-17 浙江工业大学 Three-degree-of-freedom autonomous underwater vehicle control method based on reinforcement learning
CN110703792B (en) * 2019-11-07 2022-12-30 江苏科技大学 Underwater robot attitude control method based on reinforcement learning
CN111198568A (en) * 2019-12-23 2020-05-26 燕山大学 Underwater robot obstacle avoidance control method based on Q learning
CN112124537B (en) * 2020-09-23 2021-07-13 哈尔滨工程大学 Intelligent control method for underwater robot for autonomous absorption and fishing of benthos
CN112947430B (en) * 2021-02-03 2022-07-15 浙江工业大学 Intelligent trajectory tracking control method for mobile robot
CN112965487B (en) * 2021-02-05 2022-06-17 浙江工业大学 Mobile robot trajectory tracking control method based on strategy iteration
CN113359448A (en) * 2021-06-03 2021-09-07 清华大学 Autonomous underwater vehicle track tracking control method aiming at time-varying dynamics
CN114003042B (en) * 2021-11-02 2023-05-12 福建省海峡智汇科技有限公司 Mobile robot path tracking method based on reinforcement learning
CN116690561B (en) * 2023-05-30 2024-01-23 渤海大学 Self-adaptive optimal backstepping control method and system for single-connecting-rod mechanical arm

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105929842A (en) * 2016-04-20 2016-09-07 哈尔滨工程大学 Underactuated UUV plane trajectory tracking control method based on dynamic speed adjustment
CN107748566A (en) * 2017-09-20 2018-03-02 清华大学 A kind of underwater autonomous robot constant depth control method based on intensified learning
CN108008628A (en) * 2017-11-17 2018-05-08 华南理工大学 A kind of default capabilities control method of uncertain drive lacking unmanned boat system
CN108303891A (en) * 2018-02-11 2018-07-20 浙江大学 More AUV distributed collaborations tracking and controlling methods under being disturbed based on uncertain ocean current
WO2018186750A1 (en) * 2017-04-05 2018-10-11 Blueye Robotics As Camera assisted control system for an underwater vehicle
CN108762249A (en) * 2018-04-26 2018-11-06 常熟理工学院 Clean robot optimum path planning method based on the optimization of approximate model multistep

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107168312B (en) * 2017-05-17 2019-12-06 哈尔滨工程大学 Space trajectory tracking control method for compensating UUV kinematic and dynamic interference

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105929842A (en) * 2016-04-20 2016-09-07 哈尔滨工程大学 Underactuated UUV plane trajectory tracking control method based on dynamic speed adjustment
WO2018186750A1 (en) * 2017-04-05 2018-10-11 Blueye Robotics As Camera assisted control system for an underwater vehicle
CN107748566A (en) * 2017-09-20 2018-03-02 清华大学 A kind of underwater autonomous robot constant depth control method based on intensified learning
CN108008628A (en) * 2017-11-17 2018-05-08 华南理工大学 A kind of default capabilities control method of uncertain drive lacking unmanned boat system
CN108303891A (en) * 2018-02-11 2018-07-20 浙江大学 More AUV distributed collaborations tracking and controlling methods under being disturbed based on uncertain ocean current
CN108762249A (en) * 2018-04-26 2018-11-06 常熟理工学院 Clean robot optimum path planning method based on the optimization of approximate model multistep

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on AUV target tracking control in the marine environment; Zhang Hui; Engineering Science and Technology II; 2013-02-15; full text *

Also Published As

Publication number Publication date
CN109240091A (en) 2019-01-18

Similar Documents

Publication Publication Date Title
CN109240091B (en) Underwater robot control method based on reinforcement learning and tracking control method thereof
CN110385720B (en) Robot positioning error compensation method based on deep neural network
JP5475629B2 (en) Trajectory planning method, trajectory planning system, and robot
CN112558612B (en) Heterogeneous intelligent agent formation control method based on cloud model quantum genetic algorithm
CN107633105B (en) Improved hybrid frog-leaping algorithm-based quad-rotor unmanned aerial vehicle parameter identification method
CN107168309A (en) A kind of underwater multi-robot paths planning method of Behavior-based control
CN111522341A (en) Multi-time-varying formation tracking control method and system for network heterogeneous robot system
CN113031528B (en) Multi-legged robot non-structural ground motion control method based on depth certainty strategy gradient
CN108427282A (en) A kind of solution of Inverse Kinematics method based on learning from instruction
CN112809665B (en) Mechanical arm motion planning method based on improved RRT algorithm
CN111152227A (en) Mechanical arm control method based on guided DQN control
CN112091976A (en) Task space control method for underwater mechanical arm
CN112631305A (en) Anti-collision anti-interference control system for formation of multiple unmanned ships
CN114995479A (en) Parameter control method of quadruped robot virtual model controller based on reinforcement learning
CN115781685A (en) High-precision mechanical arm control method and system based on reinforcement learning
CN114397899A (en) Bionic robot fish three-dimensional path tracking control method and device
CN112947430B (en) Intelligent trajectory tracking control method for mobile robot
CN116533249A (en) Mechanical arm control method based on deep reinforcement learning
CN116690557A (en) Method and device for controlling humanoid three-dimensional scanning motion based on point cloud
CN113829351B (en) Cooperative control method of mobile mechanical arm based on reinforcement learning
CN110703792B (en) Underwater robot attitude control method based on reinforcement learning
CN115674204A (en) Robot shaft hole assembling method based on deep reinforcement learning and admittance control
CN115446867A (en) Industrial mechanical arm control method and system based on digital twinning technology
CN111830832B (en) Bionic gliding machine dolphin plane path tracking method and system
CN114012733A (en) Mechanical arm control method for scribing PC (personal computer) component mold

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20210610

Address after: Room 801, 85 Kefeng Road, Huangpu District, Guangzhou City, Guangdong Province

Patentee after: Yami Technology (Guangzhou) Co.,Ltd.

Address before: 066004 No. 438 west section of Hebei Avenue, seaport District, Hebei, Qinhuangdao

Patentee before: Yanshan University

TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20231215

Address after: No. 286 Nanning North Road, Qilin District, Qujing City, Yunnan Province, 655000

Patentee after: Wang Bo

Address before: Room 801, 85 Kefeng Road, Huangpu District, Guangzhou City, Guangdong Province

Patentee before: Yami Technology (Guangzhou) Co.,Ltd.