CN114771783A - Control method and system for submarine stratum space robot - Google Patents


Info

Publication number
CN114771783A
CN114771783A (application CN202210623726.5A)
Authority
CN
China
Prior art keywords: robot, current, critic, actor, network
Prior art date
Legal status
Granted
Application number
CN202210623726.5A
Other languages
Chinese (zh)
Other versions
CN114771783B (en)
Inventor
陈家旺
林型双
张培豪
翁子欣
郭进
王荧
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202210623726.5A priority Critical patent/CN114771783B/en
Publication of CN114771783A publication Critical patent/CN114771783A/en
Application granted granted Critical
Publication of CN114771783B publication Critical patent/CN114771783B/en
Legal status: Active

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B63SHIPS OR OTHER WATERBORNE VESSELS; RELATED EQUIPMENT
    • B63CLAUNCHING, HAULING-OUT, OR DRY-DOCKING OF VESSELS; LIFE-SAVING IN WATER; EQUIPMENT FOR DWELLING OR WORKING UNDER WATER; MEANS FOR SALVAGING OR SEARCHING FOR UNDERWATER OBJECTS
    • B63C11/00Equipment for dwelling or working underwater; Means for searching for underwater objects
    • B63C11/52Tools specially adapted for working underwater, not otherwise provided for
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Landscapes

  • Engineering & Computer Science (AREA)
  • Mechanical Engineering (AREA)
  • Ocean & Marine Engineering (AREA)
  • Manipulator (AREA)

Abstract

The invention relates to a control method and a control system for a submarine stratum space robot, in the technical field of submarine stratum space robots. Before the robot starts to move, a preset operation point is set; during motion, the action for the next moment is obtained from the preset operation point, the state at the current moment, and the control model at the current moment. The state comprises the attitude information, positioning information, and resistance of the submarine drilling robot during motion, together with the working state of each driving hydraulic cylinder inside it. The action, which controls the motion of the submarine stratum space robot, comprises the control command for each driving hydraulic cylinder. The control model at the current moment is obtained by updating the control model at the previous moment based on a DDPG algorithm. The invention can realize automatic control of the submarine stratum space robot.

Description

Control method and system for submarine stratum space robot
Technical Field
The invention relates to the technical field of submarine stratum space robots, in particular to a control method and a control system of a submarine stratum space robot.
Background
As humanity continues to develop and exploit the oceans, demand for operation tasks in the submarine stratum space, such as resource exploration and environmental monitoring, is growing. For these tasks, a new type of submarine stratum space robot is a candidate solution. After being deployed into the seabed stratum through a base station, the robot can drill and move freely within the stratum and, by carrying various sensor arrays, achieve preset operation targets.
When executing a task, the submarine stratum space robot usually needs to move to a preset operation point after receiving a control instruction in order to complete the operation. However, control in the submarine stratum field is at present essentially manual, so devising a suitable control method and device is of great importance for realizing automatic control of the submarine stratum space robot.
Disclosure of Invention
The invention aims to provide a control method and a control system for a submarine stratum space robot, which can realize automatic control of the submarine stratum space robot.
In order to achieve the purpose, the invention provides the following scheme:
a method of controlling a subsea stratigraphic space robot, comprising:
step 1, presetting a preset operation point;
step 2, acquiring the current time state of the submarine stratum space robot to be controlled; the states comprise attitude information, positioning information and resistance in the motion process of the submarine drilling robot and the working states of all driving hydraulic cylinders in the submarine drilling robot;
step 3, obtaining the action at the next moment according to the preset operation point, the state at the current moment and the control model at the current moment, wherein the action at the next moment is used for controlling the motion of the submarine stratum space robot; the actions comprise control instructions of each driving hydraulic cylinder in the submarine drilling robot; the control model at the current moment is obtained by updating the control model at the previous moment based on a DDPG algorithm;
and 4, repeatedly executing the step 2 to the step 3 until the seabed stratum space robot moves to the preset operation point.
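As a hedged illustration, the step 1 to step 4 loop above can be sketched as follows; `get_state`, `control_model`, `apply_action`, and `reached` are hypothetical stand-ins for the robot's sensor interface, the trained Actor network, the hydraulic-cylinder drivers, and the arrival test, none of which are specified by the patent at this level of detail.

```python
def control_loop(goal, get_state, control_model, apply_action,
                 reached, max_steps=1000):
    """Drive the robot toward the preset operation point `goal`.

    Step 2: read the current state; step 3: query the control model
    for the next action; step 4: repeat until the goal is reached.
    """
    for _ in range(max_steps):
        state = get_state()                  # attitude, position, resistance, cylinder states
        if reached(state, goal):
            return True                      # arrived at the preset operation point
        action = control_model(state, goal)  # control command for each hydraulic cylinder
        apply_action(action)
    return False
```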
Optionally, the specific training process of the initial control model is as follows:
constructing a simulation environment model; the simulation environment model comprises a submarine stratum environment simulation model and a robot simulation motion model;
setting a training preset operation point;
initializing a Critic network, an Actor network, a Critic target network and an Actor target network;
under the current iteration times, determining the current moment state of the robot simulation motion model according to the simulation environment model;
obtaining the action of the robot simulation motion model at the current moment according to the training preset operation point, the state of the robot simulation motion model at the current moment and an Actor network under the current iteration times;
the robot simulation motion model executes the action of the robot simulation motion model at the current moment to obtain behavior reward and the state of the next moment;
inputting the state of the robot simulation motion model at the next moment into an Actor target network under the current iteration times to obtain the action of the robot simulation motion model at the next moment;
inputting the state of the robot simulation motion model at the current moment and the action of the robot simulation motion model at the current moment into a Critic network under the current iteration number to obtain an estimated Q value;
obtaining an actual Q value according to the action of the robot simulation motion model at the next moment, the state of the robot simulation motion model at the next moment, the Critic target network under the current iteration number and the behavior reward;
updating the Critic network under the current iteration times and the Actor network under the current iteration times according to the actual Q value and the estimated Q value to obtain the Critic network under the next iteration times and the Actor network under the next iteration times;
and updating the Critic target network and the Actor target network under the current iteration count according to the Critic network and the Actor network under the next iteration count, to obtain the Critic target network and the Actor target network under the next iteration count; the iteration count is then updated and the next iteration begins. The current training episode ends when the robot simulation motion model collides or reaches the preset training operation point, and iteration stops once the set iteration count is reached, yielding the initial control model.
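The training procedure above follows the standard DDPG pattern. The following is a minimal one-dimensional sketch of that pattern, not the patent's actual implementation: the toy environment, the linear "networks", and all hyperparameters (`GAMMA`, `TAU`, `LR`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
GAMMA, TAU, LR = 0.99, 0.05, 1e-3

w_a = rng.normal(size=1)                # Actor:  a = w_a * s
w_c = rng.normal(size=2)                # Critic: Q(s, a) = w_c . [s, a]
w_a_t, w_c_t = w_a.copy(), w_c.copy()   # target networks start as copies

def env_step(s, a):
    """Toy stand-in for the simulation environment model."""
    s_next = float(np.clip(s + 0.1 * a, -2.0, 2.0))
    return s_next, -abs(s_next)         # behavior reward: stay near the origin

for episode in range(100):
    s = float(rng.uniform(-1, 1))
    for _ in range(10):
        a = float(np.clip(w_a[0] * s + rng.normal(scale=0.1), -1, 1))
        s_next, r = env_step(s, a)
        a_next = float(w_a_t[0] * s_next)                  # Actor target network
        q_actual = r + GAMMA * float(w_c_t @ [s_next, a_next])  # "actual" Q value
        q_est = float(w_c @ [s, a])                        # estimated Q value
        # Critic update: gradient descent on (q_est - q_actual)^2
        w_c -= LR * 2.0 * (q_est - q_actual) * np.array([s, a])
        # Actor update: ascend dQ/da * da/dw_a = w_c[1] * s
        w_a += LR * w_c[1] * np.array([s])
        # Soft update of both target networks
        w_a_t += TAU * (w_a - w_a_t)
        w_c_t += TAU * (w_c - w_c_t)
        s = s_next
```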
Optionally, the updating of the Critic network and the Actor network under the current iteration count according to the actual Q value and the estimated Q value, to obtain the Critic network and the Actor network under the next iteration count, specifically includes:
obtaining a Critic network loss function under the current iteration number according to the actual Q value and the estimated Q value;
training the Critic network under the current iteration number according to the Critic network loss function under the current iteration number to obtain the Critic network under the next iteration number;
obtaining an Actor network loss function under the current iteration times according to the estimated Q value;
and updating the Actor network under the current iteration times according to the Actor network loss function under the current iteration times.
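A hedged sketch of the two loss functions described above, using the conventional DDPG forms (mean squared error for the Critic, negated mean Q for the Actor); the patent does not spell out the exact expressions, so these forms are assumptions.

```python
def critic_loss(q_estimated, q_actual):
    """Critic loss: mean squared error between estimated and actual Q values."""
    n = len(q_estimated)
    return sum((qe - qa) ** 2 for qe, qa in zip(q_estimated, q_actual)) / n

def actor_loss(q_estimated):
    """Actor loss: the Actor maximizes Q, i.e. minimizes -mean(Q)."""
    return -sum(q_estimated) / len(q_estimated)
```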
Optionally, the updating of the Critic target network and the Actor target network under the current iteration count according to the Critic network and the Actor network under the next iteration count, to obtain the Critic target network and the Actor target network under the next iteration, specifically includes:
updating the Critic target network under the current iteration number according to the Critic network under the next iteration number to be used as the Critic target network under the next iteration;
and updating the Actor target network under the current iteration times according to the Actor network under the next iteration times to serve as the Actor target network under the next iteration.
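The target-network updates above can be sketched as a soft (Polyak) update, which is the conventional DDPG choice; the patent does not state the exact update rule, so the rule and the coefficient `tau` here are assumptions.

```python
def soft_update(target_params, online_params, tau=0.005):
    """Move each target parameter a fraction tau toward its online counterpart."""
    return [t + tau * (o - t) for t, o in zip(target_params, online_params)]
```

With `tau = 1` this degenerates to a hard copy of the online network into the target network; small values of `tau` keep the target networks slowly varying, which stabilizes the Q-value targets during training.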
A control system for a subsea stratigraphic space robot, comprising:
the first preset module is used for presetting a preset operation point;
the acquisition module is used for acquiring the current time state of the submarine stratum space robot to be controlled; the states comprise attitude information, positioning information and resistance in the motion process of the submarine drilling robot and the working states of all driving hydraulic cylinders in the submarine drilling robot;
the control module is used for obtaining the action at the next moment according to the preset operation point, the state at the current moment and the control model at the current moment, and the action at the next moment is used for controlling the motion of the submarine stratum space robot; the actions include control commands for each drive hydraulic cylinder inside the subsea drilling robot; the control model at the current moment is obtained by updating the control model at the previous moment based on a DDPG algorithm;
and the execution module is used for repeatedly executing the acquisition module and the control module until the seabed stratum space robot moves to the preset operation point.
Optionally, the control system of the submarine stratum space robot further includes:
the building module is used for building a simulation environment model; the simulation environment model comprises a seabed stratum environment simulation model and a robot simulation motion model;
the second preset module is used for setting a preset training operation point;
the network initialization module is used for initializing a Critic network, an Actor network, a Critic target network and an Actor target network;
the current state determining module is used for determining the current state of the robot simulation motion model according to the simulation environment model under the current iteration times;
the current action determining module is used for obtaining the action of the robot simulation motion model at the current moment according to the preset training operation point, the state of the robot simulation motion model at the current moment and an Actor network under the current iteration times;
the next moment state determining module is used for the robot simulation motion model to execute the action of the robot simulation motion model at the current moment to obtain behavior rewards and the state of the next moment;
the next moment action determining module is used for inputting the state of the robot simulation motion model at the next moment into an Actor target network under the current iteration times to obtain the action of the robot simulation motion model at the next moment;
the estimated Q value calculation module is used for inputting the state of the robot simulation motion model at the current moment and the action of the robot simulation motion model at the current moment into a Critic network under the current iteration number to obtain an estimated Q value;
the actual Q value calculation module is used for obtaining an actual Q value according to the action of the robot simulation motion model at the next moment, the state of the robot simulation motion model at the next moment, the Critic target network under the current iteration number and the behavior reward;
the network updating module is used for updating the Critic network under the current iteration times and the Actor network under the current iteration times according to the actual Q value and the estimated Q value to obtain the Critic network under the next iteration times and the Actor network under the next iteration times;
and the target network updating module is used for updating the Critic target network and the Actor target network under the current iteration count according to the Critic network and the Actor network under the next iteration count, to obtain the Critic target network and the Actor target network under the next iteration; the iteration count is then updated and the next iteration begins. The current training episode ends when the robot simulation motion model collides or reaches the preset training operation point, and iteration stops once the set iteration count is reached, yielding the initial control model.
Optionally, the network update module specifically includes:
the Critic network loss function calculation unit is used for obtaining a Critic network loss function under the current iteration number according to the actual Q value and the estimated Q value;
the Critic network updating unit is used for training the Critic network under the current iteration times according to the Critic network loss function under the current iteration times to obtain the Critic network under the next iteration times;
the Actor network loss function calculation unit is used for obtaining an Actor network loss function under the current iteration times according to the estimated Q value;
and the Actor network updating unit is used for updating the Actor network under the current iteration times according to the Actor network loss function under the current iteration times.
Optionally, the target network updating module specifically includes:
the Critic target network updating unit is used for updating the Critic target network under the current iteration times according to the Critic network under the next iteration times to be used as the Critic target network under the next iteration;
and the Actor target network updating unit is used for updating the Actor target network under the current iteration times according to the Actor network under the next iteration times to be used as the Actor target network under the next iteration times.
According to the specific embodiments provided herein, the invention discloses the following technical effects. A preset operation point is set before the robot starts to move; during motion, the action for the next moment is obtained from the preset operation point, the state at the current moment, and the control model at the current moment. The state comprises the attitude information, positioning information, and resistance of the submarine drilling robot during motion, together with the working state of each driving hydraulic cylinder inside it; the action, which controls the motion of the submarine stratum space robot, comprises the control command for each driving hydraulic cylinder. The control model at the current moment is obtained by updating the control model at the previous moment based on a DDPG algorithm, so automatic control of the submarine stratum space robot can be realized.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present invention; other drawings can be obtained from them by those skilled in the art without inventive effort.
FIG. 1 is a flow chart of a method for controlling a subsea stratigraphic space robot according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a hydraulic cylinder driving form change of a submarine stratum space robot provided by an embodiment of the invention;
fig. 3 is a flowchart of training a control model of a subsea stratigraphic space robot according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
In recent years, control methods based on reinforcement learning have been widely applied to robot control with good results. The invention therefore combines the motion characteristics of the submarine stratum space robot, the submarine stratum environment, and a reinforcement learning algorithm to provide a control method for the submarine stratum space robot, so as to solve the control problem of the robot operating in the submarine stratum space and realize automatic control. The control method specifically comprises the following steps:
step 1, presetting a preset operation point.
Step 2, acquiring the current time state of the submarine stratum space robot to be controlled; the states comprise attitude information, positioning information and resistance in the motion process of the submarine drilling robot and the working states of all driving hydraulic cylinders in the submarine drilling robot.
Step 3, obtaining the action at the next moment according to the preset operation point, the state at the current moment, and the control model at the current moment; the action at the next moment is used for controlling the motion of the submarine stratum space robot. The action comprises a control instruction for each driving hydraulic cylinder in the submarine drilling robot. The control model at the current moment is obtained by updating the control model at the previous moment based on a DDPG algorithm. The control model also continues to be adjusted during actual use, as follows: the state at the current moment is input into the control model (the Actor network) at the current moment to obtain the action at the next moment, and the control model at the current moment is then updated with the DDPG algorithm using the Critic network, the Critic target network, and the Actor target network.
And 4, repeatedly executing the steps 2 to 3 until the seabed stratum space robot moves to the preset operation point.
In practical application, the specific training process of the initial control model (the control model at time 0 in practical use) is as follows:
constructing a simulation environment model; the simulation environment model comprises a seabed stratum environment simulation model and a robot simulation motion model.
And setting a training preset operation point.
Initializing a Critic network, an Actor network, a Critic target network and an Actor target network.
And under the current iteration times, determining the current state of the robot simulation motion model according to the seabed stratum environment simulation model and the robot simulation motion model.
And obtaining the action of the robot simulation motion model at the current moment according to the training preset operation point, the state of the robot simulation motion model at the current moment and the Actor network under the current iteration times.
And the robot simulation motion model executes the action of the robot simulation motion model at the current moment to obtain a behavior reward and the state at the next moment.
And inputting the state of the robot simulation motion model at the next moment into an Actor target network under the current iteration times to obtain the action of the robot simulation motion model at the next moment.
And inputting the state of the robot simulation motion model at the current moment and the action of the robot simulation motion model at the current moment into a Critic network under the current iteration number to obtain an estimated Q value.
And obtaining an actual Q value according to the action of the robot simulation motion model at the next moment, the state of the robot simulation motion model at the next moment, the Critic target network under the current iteration number and the behavior reward.
And updating the Critic network under the current iteration number and the Actor network under the current iteration number according to the actual Q value and the estimated Q value to obtain the Critic network under the next iteration number and the Actor network under the next iteration number.
And updating the Critic target network and the Actor target network under the current iteration count according to the Critic network and the Actor network under the next iteration count, to obtain the Critic target network and the Actor target network under the next iteration count; the iteration count is then updated and the next iteration begins. The current training episode ends when the robot simulation motion model collides or reaches the preset training operation point, and iteration stops once the set iteration count is reached, yielding the initial control model.
In practical application, the updating the Critic network under the current iteration number and the Actor network under the current iteration number according to the actual Q value and the estimated Q value to obtain the Critic network under the next iteration number and the Actor network under the next iteration number specifically includes:
and obtaining a Critic network loss function under the current iteration number according to the actual Q value and the estimated Q value.
And training the Critic network under the current iteration number according to the Critic network loss function under the current iteration number to obtain the Critic network under the next iteration number.
And obtaining an Actor network loss function under the current iteration number according to the estimated Q value.
And updating the Actor network under the current iteration times according to the Actor network loss function under the current iteration times.
In practical application, the updating of the Critic target network and the Actor target network under the current iteration count according to the Critic network and the Actor network under the next iteration count, to obtain the Critic target network and the Actor target network under the next iteration, specifically includes:
and updating the Critic target network under the current iteration times according to the Critic network under the next iteration times to serve as the Critic target network under the next iteration.
And updating the Actor target network under the current iteration times according to the Actor network under the next iteration times to serve as the Actor target network under the next iteration.
As shown in fig. 1, the embodiment of the present invention provides a more specific control method for a submarine stratigraphic space robot based on the motion characteristics of the submarine stratigraphic space robot, the submarine stratigraphic environment and the reinforcement learning algorithm, which specifically comprises the following steps:
s101, defining the task target, state, action and reward function of the submarine stratum space robot.
S102, arranging a sensor array on the body of the submarine stratum space robot, and collecting state parameters of the robot in the motion process.
And S103, building a control model of the seabed stratum space robot based on reinforcement learning.
And S104, learning a training control model, and applying the training control model to an actual operation task of the submarine stratum space robot.
In practical applications, the task objective is: given an input preset operation point, the robot autonomously plans and moves to that point, and the path the submarine stratum space robot follows to the preset operation point is the optimal path. The optimal path avoids obstructed areas during motion while being the shortest in length. Obstructed areas are regions of the stratum that are difficult to drill through.
The states specifically refer to attitude information, positioning information, blocked condition in the drilling process of the submarine stratum space robot and the working states of all driving hydraulic cylinders in the robot in the moving process of the robot.
Based on the definition of the attitude information, the positioning information and the obstructed information in the drilling process of the submarine stratum space robot, the state of the submarine stratum space robot can be defined as follows:
s = [x, y, z, ψ, θ, φ, f1, …, fn, h1, …, hm]
The attitude information of the submarine stratum space robot is expressed with spatial rotation Euler angles; specifically, ψ, θ, and φ are defined as the rotation angles about the Z, Y, and X axes, i.e., the yaw, pitch, and roll angles respectively.
The positioning information of the submarine stratum space robot is expressed by adopting a space coordinate value, specifically, a northeast coordinate system is established by taking a release point of the submarine stratum space robot as a coordinate origin, and X, Y and Z are defined as coordinate values of the submarine stratum space robot on an X axis, a Y axis and a Z axis, namely the positioning information.
The obstruction information of the submarine stratum space robot during drilling is specifically the robot resistance value measured at specific points on the robot body, defined as fi (1 ≤ i ≤ n), where n is the number of specific points defined on the body.
The working state of each driving hydraulic cylinder in the submarine stratum space robot is defined as hi (1 ≤ i ≤ m), where m is the number of driving hydraulic cylinders in the robot. The working state of each driving hydraulic cylinder specifically comprises its open-close state, oil pressure state, and flow state, defined as oi, pi, and ui respectively. That is:
hi = [oi, pi, ui]
The action of the submarine stratum space robot is its control instruction. The robot as a whole is driven by a combination of multiple hydraulic cylinders, and coordinated control of these cylinders translates into a specific motion form, so the drilling speed and direction of the robot can be controlled. Control of a hydraulic cylinder comprises control of its open-close state, its oil pressure state, and its flow state. Specifically, the action may be represented as:
a = [h1, …, hm]
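As an illustration, the state vector s and action vector a defined above can be assembled as follows; the dimensions n (resistance points) and m (hydraulic cylinders), and all numeric values, are made-up examples rather than the patent's actual configuration.

```python
def make_state(x, y, z, yaw, pitch, roll, resistances, cylinders):
    """s = [x, y, z, psi, theta, phi, f1..fn, h1..hm flattened]."""
    s = [x, y, z, yaw, pitch, roll]
    s.extend(resistances)                   # f_i, 1 <= i <= n
    for o_i, p_i, u_i in cylinders:         # h_i = [o_i, p_i, u_i]
        s.extend([o_i, p_i, u_i])
    return s

def make_action(cylinders):
    """a = [h1, ..., hm]: one (open/close, pressure, flow) triple per cylinder."""
    return [list(h) for h in cylinders]
```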
the subsea stratigraphic space robot is a multi-body segment structure including, but not limited to, a kinematic support body segment, a steering body segment, a propulsion body segment, and a drilling body segment; fig. 2 is a schematic diagram showing the variation of the hydraulic cylinder driving form of the submarine stratum space robot.
When the hydraulic cylinder inside a support body segment is not displaced, its form is as shown in part (a)-1 of fig. 2, and the segment supports the drilling motion. After the internal hydraulic cylinder is displaced, the form is as shown in part (a)-2 of fig. 2, which facilitates movement.
When the displacement of the internal hydraulic cylinder of a propulsion body segment increases, the form is as shown in part (b)-1 of fig. 2, and, in cooperation with the support body segments, it propels the front of the robot forward. When the displacement decreases, the form is as shown in part (b)-2 of fig. 2, and, in cooperation with the support body segments, it pulls the rear of the robot forward.
Four hydraulic cylinders are arranged inside the steering body segment. When they are not displaced, the form of the steering body segment is as shown in part (c)-1 of fig. 2; when the four cylinders are displaced by different lengths, a displacement difference forms between them and steering of the segment is achieved, as shown in part (c)-2 of fig. 2.
The reward function specifically comprises a control behavior reward, an obstacle avoidance behavior reward and a target achievement degree reward.
The control behavior reward is a reward-and-punishment evaluation of the control behavior during the motion of the submarine stratum space robot, and is defined as r1. (The defining formula of r1 and the symbol denoting the steering angle appear only as equation images in the original.) When the robot drills straight, it is given the maximum positive reward; when the robot performs steering drilling, the positive reward given decreases as the steering angle increases.
The obstacle avoidance behavior reward is a reward-and-punishment evaluation of the obstacle avoidance behavior of the submarine stratum space robot, and is defined as r2. (Its defining formula appears only as an equation image in the original.) When the robot moves away from an obstacle area, a positive reward is given; when the robot approaches an obstacle area, a negative reward is given; when the robot enters an obstacle area, a large negative reward is given.
The target achievement degree reward is a reward-and-punishment evaluation of the degree of completion of the operation task of the submarine stratum space robot, and is defined as r3. (Its defining formula appears only as an equation image in the original.) When the robot moves away from the preset operation point, a negative reward is given; when the robot approaches the preset operation point, a positive reward is given; when the robot reaches the preset operation point, a large positive reward is given.
Based on the definitions of the control behavior reward, the obstacle avoidance behavior reward and the target achievement degree reward of the submarine stratum space robot, the reward function r can be defined as the weighted sum of the three terms, where k_i denotes the reward coefficient of each term:

r = k1·r1 + k2·r2 + k3·r3
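The combination of the three reward terms can be sketched as below. The weighted sum follows the definition of r above; the cosine shape of the control reward and the equal coefficient values are illustrative assumptions, since the original formulas are not reproduced in the text.

```python
import math

def control_reward(steer_angle, r_max=1.0):
    # Assumed shape only: maximal when drilling straight (angle 0),
    # decreasing as the steering angle grows, as the text describes.
    return r_max * math.cos(steer_angle)

def total_reward(r1, r2, r3, k=(1.0, 1.0, 1.0)):
    # r = k1*r1 + k2*r2 + k3*r3, with k_i the reward coefficients
    # (values assumed equal here for illustration).
    return k[0] * r1 + k[1] * r2 + k[2] * r3
```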
In practical applications, the sensor array comprises a positioning sensor, an attitude sensor and a plurality of resistance sensors. The state of the robot during motion is s: the positioning information of the submarine stratum space robot is measured by the positioning sensor in the sensor array, the attitude information is measured by the attitude sensor, and the obstruction information during drilling is measured by the resistance sensors. By calculating the Euclidean distance between the measured position and the preset operation point, it can be judged whether the robot is moving away from, approaching, or has arrived at the preset operation point. The attitude information measured by the attitude sensor is used to judge whether the robot moves straight during motion, or to determine its specific steering angle.
By calculating the change in the accumulated resistance readings of the resistance sensors, the change in the obstruction encountered over the whole drilling process can be judged, i.e. whether the robot is moving away from or approaching an obstacle area. By defining a resistance threshold ε, whether the robot has entered an obstacle area can be judged: when f_i > ε (1 ≤ i ≤ n), the robot is judged to have entered an obstacle area; otherwise, it has not.
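The threshold test just described can be sketched directly; the function below flags entry into an obstacle area when any resistance reading f_i exceeds ε:

```python
def entered_obstacle_area(resistances, eps):
    # True when f_i > eps for any sensor i (1 <= i <= n),
    # i.e. the robot is judged to have entered an obstacle area.
    return any(f > eps for f in resistances)
```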
In practical application, the third step is specifically:
The reinforcement-learning-based control model of the seabed stratum space robot is built on the DDPG algorithm. On the basis of the DQN algorithm, DDPG adds a policy network that outputs the action values of the robot, and learns the Q network and the policy network simultaneously. The DDPG algorithm has an Actor-Critic structure.
Constructing the DDPG algorithm specifically comprises constructing the Q network, the policy network and the target networks. The Q network is the Critic network, and the policy network is the Actor network. The Actor network takes the state of the environment at the current time as input and outputs a control instruction for each drive hydraulic cylinder of the robot.
The weight of the Critic network is defined as θ^Q, i.e. the Critic network may be denoted as Q(s_t, a_t | θ^Q); the weight of the Actor network is defined as θ^μ, i.e. the Actor network may be denoted as μ(s_t | θ^μ). The initial weights of the target networks are copied from the Critic network and the Actor network, and specifically comprise two parts, defined as θ^Q′ and θ^μ′; the target networks may be denoted as Q′(s_{t+1}, a_{t+1} | θ^Q′) and μ′(s_t | θ^μ′) respectively.
The loss for training the Critic network is calculated from the actual Q value computed by the target network and the estimated Q value computed by the Critic network.
The actual Q value calculated by the target network is as follows:
y_t = r_t + γ·Q′(s_{t+1}, a_{t+1} | θ^Q′)
the Critic network loss function is calculated as follows:
Loss_Critic = (1/N)·Σ_t (y_t − Q(s_t, a_t | θ^Q))²
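The TD target y_t and the mean-squared Critic loss can be sketched numerically in plain Python (an illustration of the two formulas, not the patent's implementation; the discount γ = 0.99 default is a common choice, not specified in the text):

```python
def td_target(r_t, q_next, gamma=0.99):
    # y_t = r_t + gamma * Q'(s_{t+1}, a_{t+1} | theta^Q')
    return r_t + gamma * q_next

def critic_loss(targets, estimates):
    # Loss_Critic = (1/N) * sum_t (y_t - Q(s_t, a_t | theta^Q))^2
    n = len(targets)
    return sum((y - q) ** 2 for y, q in zip(targets, estimates)) / n
```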
In the Actor network, the Q value fed back by the Critic network is used as the loss function for model training, specifically:

Loss_Actor = −Q(s_t, a_t | θ^Q)
The Actor network is then updated by gradient descent:

∇_{θ^μ} J ≈ (1/N)·Σ_t ∇_a Q(s, a | θ^Q)|_{s=s_t, a=μ(s_t)} · ∇_{θ^μ} μ(s | θ^μ)|_{s=s_t}
The Actor network takes the state of the environment at the current time as input and outputs a control instruction for each drive hydraulic cylinder of the robot according to a_t = μ(s_t | θ^μ). Specifically,

a_t = [h_{1t}, …, h_{mt}] = μ(s_t | θ^μ) = μ(x_t, y_t, z_t, φ_t, θ_t, ψ_t, f_{1t}, …, f_{nt}, h_{1t}, …, h_{mt} | θ^μ)
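As a hedged sketch of this mapping, μ can be pictured as a network that takes the full state vector and returns one bounded command per cylinder. The single linear layer and tanh squashing below are placeholders for the real deep policy network, and the stroke bound is an assumption:

```python
import math

def actor(state, weights, bias, stroke_max=0.10):
    # Placeholder for mu(s_t | theta^mu): one linear layer followed by
    # tanh squashing, keeping each cylinder command within its stroke.
    # A real Actor network is a deep MLP; weights/bias here are illustrative.
    out = []
    for row, b in zip(weights, bias):
        z = sum(w * s for w, s in zip(row, state)) + b
        out.append(stroke_max * math.tanh(z))
    return out
```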
Step four specifically: as shown in fig. 3, the training of the control model includes the following steps:
S1, establishing a simulation environment model of the seabed stratum and a simulation motion model of the robot by using a computer, and setting the release point of the robot base station and the preset operation point.
S2, initializing the Critic network Q and the Actor network μ, initializing the target networks Q′ and μ′, and setting the number of training rounds of the model.
S3, each training round includes a plurality of time steps. Each training time step specifically comprises the following steps:
(1) According to the state s_t of the environment in the current time step, the Actor network outputs the action a_t to be executed by the submarine stratum space robot.
(2) The submarine stratum space robot executes a_t; the robot obtains the feedback r_t, and the state s_t transitions to s_{t+1}.
(3) The Critic network calculates the estimated Q value, i.e. Q(s_t, a_t | θ^Q); the target network calculates the actual Q value, i.e. y_t.
(4) Loss_Actor and Loss_Critic are calculated, and the weights of the Critic network and the Actor network are updated.
(5) The target network weights are softly updated, specifically:

θ^Q′ ← τ·θ^Q + (1−τ)·θ^Q′,  θ^μ′ ← τ·θ^μ + (1−τ)·θ^μ′
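The soft update rule above can be sketched element-wise. The value τ = 0.005 is a common choice for this interpolation factor, not one specified in the patent:

```python
def soft_update(target_weights, online_weights, tau=0.005):
    # theta' <- tau * theta + (1 - tau) * theta', applied element-wise
    # to corresponding weights of the online and target networks.
    return [tau * w + (1.0 - tau) * wt
            for wt, w in zip(target_weights, online_weights)]
```

Applying this every time step moves the target networks slowly toward the online networks, which stabilizes the TD targets.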
After the learning and training of the control model of the submarine stratum space robot is completed, the model can be applied to actual operation tasks of the robot.
The embodiment of the invention further provides a control system of the submarine stratum space robot corresponding to the above method, the system comprising:
the first preset module is used for presetting a preset operation point.
The acquisition module is used for acquiring the current time state of the submarine stratum space robot to be controlled; the states comprise attitude information, positioning information and resistance in the motion process of the submarine drilling robot and the working states of all driving hydraulic cylinders in the submarine drilling robot.
The control module is used for obtaining the action at the next moment according to the preset operation point, the state at the current moment and the control model at the current moment, and the action at the next moment is used for controlling the motion of the submarine stratum space robot; the actions include control commands for each drive hydraulic cylinder inside the subsea drilling robot; the control model at the current moment is obtained by updating the control model at the last moment based on a DDPG algorithm.
And the execution module is used for repeatedly executing the acquisition module and the control module until the submarine stratum space robot moves to the preset operation point.
As an optional implementation manner, the control system of the subsea stratigraphic space robot further comprises:
the building module is used for building a simulation environment model; the simulation environment model comprises a submarine stratum environment simulation model and a robot simulation motion model.
And the second preset module is used for setting a preset training operation point.
And the network initialization module is used for initializing a Critic network, an Actor network, a Critic target network and an Actor target network.
And the current state determining module is used for determining the current state of the robot simulation motion model according to the simulation environment model under the current iteration times.
And the current action determining module is used for obtaining the action of the robot simulation motion model at the current moment according to the preset training operation point, the state of the robot simulation motion model at the current moment and an Actor network under the current iteration times.
And the next moment state determining module is used for the robot simulation motion model to execute the action of the robot simulation motion model at the current moment so as to obtain behavior rewards and the state of the next moment.
And the next moment action determining module is used for inputting the state of the robot simulation motion model at the next moment into the Actor target network under the current iteration times to obtain the action of the robot simulation motion model at the next moment.
And the estimated Q value calculation module is used for inputting the state of the robot simulation motion model at the current moment and the action of the robot simulation motion model at the current moment into the Critic network under the current iteration number to obtain an estimated Q value.
And the actual Q value calculation module is used for obtaining an actual Q value according to the action of the robot simulation motion model at the next moment, the state of the robot simulation motion model at the next moment, the Critic target network under the current iteration number and the behavior reward.
And the network updating module is used for updating the Critic network under the current iteration times and the Actor network under the current iteration times according to the actual Q value and the estimated Q value to obtain the Critic network under the next iteration times and the Actor network under the next iteration times.
And the target network updating module is used for updating the Critic target network under the current iteration number and the Actor target network under the current iteration number according to the Critic network under the next iteration number and the Actor network under the next iteration number, to obtain the Critic target network and the Actor target network under the next iteration; the iteration number is then updated to enter the next iteration, until the robot simulation motion model collides or reaches the preset training operation point, and iteration stops when the set number of iterations is reached, yielding the initial control model.
As an optional implementation manner, the network update module specifically includes:
and the Critic network loss function calculation unit is used for obtaining the Critic network loss function under the current iteration number according to the actual Q value and the estimated Q value.
And the Critic network updating unit is used for training the Critic network under the current iteration number according to the Critic network loss function under the current iteration number to obtain the Critic network under the next iteration number.
And the Actor network loss function calculating unit is used for obtaining the Actor network loss function under the current iteration times according to the estimated Q value.
And the Actor network updating unit is used for updating the Actor network under the current iteration times according to the Actor network loss function under the current iteration times.
As an optional implementation manner, the target network updating module specifically includes:
and the Critic target network updating unit is used for updating the Critic target network under the current iteration times according to the Critic network under the next iteration times to be used as the Critic target network under the next iteration.
And the Actor target network updating unit is used for updating the Actor target network under the current iteration times according to the Actor network under the next iteration times to be used as the Actor target network under the next iteration times.
The invention has the following technical effects:
(1) According to the reinforcement-learning-based control method of the seabed stratum space robot, the robot autonomously plans its motion to a preset operation point once that point is input. The autonomously planned obstacle-avoidance motion path enables the seabed stratum space robot to avoid obstacle areas while keeping the path length shortest.
(2) The control method provided by the invention is designed around the motion characteristics of the submarine stratum space robot, the seabed stratum environment and the reinforcement learning algorithm, and achieves a strong technical effect in controlling the motion of the submarine stratum space robot.
(3) The control method of the submarine stratum space robot provided by the invention can also control various other types of robots for submarine stratum space operation, and has good transportability.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the description of the method part.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (8)

1. A method of controlling a subsea stratigraphic space robot, comprising:
step 1, presetting a preset operation point;
step 2, acquiring the current time state of the submarine stratum space robot to be controlled; the states comprise attitude information, positioning information and resistance in the motion process of the submarine drilling robot and the working states of all driving hydraulic cylinders in the submarine drilling robot;
step 3, obtaining the action at the next moment according to the preset operation point, the state at the current moment and the control model at the current moment, wherein the action at the next moment is used for controlling the motion of the submarine stratum space robot; the actions comprise control instructions of each driving hydraulic cylinder in the submarine drilling robot; the control model at the current moment is obtained by updating the control model at the previous moment based on a DDPG algorithm;
and 4, repeatedly executing the step 2 to the step 3 until the seabed stratum space robot moves to the preset operation point.
2. The control method of the submarine stratum space robot according to claim 1, wherein the specific training process of the initial control model is as follows:
constructing a simulation environment model; the simulation environment model comprises a submarine stratum environment simulation model and a robot simulation motion model;
setting a training preset operation point;
initializing a Critic network, an Actor network, a Critic target network and an Actor target network;
under the current iteration times, determining the current state of the robot simulation motion model according to the simulation environment model;
obtaining the action of the robot simulation motion model at the current moment according to the training preset operation point, the state of the robot simulation motion model at the current moment and an Actor network under the current iteration times;
the robot simulation motion model executes the action of the robot simulation motion model at the current moment to obtain behavior reward and the state of the next moment;
inputting the state of the robot simulation motion model at the next moment into an Actor target network under the current iteration times to obtain the action of the robot simulation motion model at the next moment;
inputting the state of the robot simulation motion model at the current moment and the action of the robot simulation motion model at the current moment into a Critic network under the current iteration number to obtain an estimated Q value;
obtaining an actual Q value according to the action of the robot simulation motion model at the next moment, the state of the robot simulation motion model at the next moment, the Critic target network under the current iteration number and the behavior reward;
updating the Critic network under the current iteration times and the Actor network under the current iteration times according to the actual Q value and the estimated Q value to obtain the Critic network under the next iteration times and the Actor network under the next iteration times;
and updating the Critic target network under the current iteration number and the Actor target network under the current iteration number according to the Critic network under the next iteration number and the Actor network under the next iteration number to obtain the Critic target network under the next iteration number and the Actor target network under the next iteration number, updating the iteration number to enter the next iteration until the robot simulation motion model collides or a preset training operation point is reached, and stopping the iteration until a set iteration number is reached to obtain the initial control model.
3. The method for controlling a submarine stratum space robot according to claim 2, wherein the step of updating the Critic network for the current iteration count and the Actor network for the current iteration count according to the actual Q value and the estimated Q value to obtain the Critic network for the next iteration count and the Actor network for the next iteration count specifically comprises:
obtaining a Critic network loss function under the current iteration number according to the actual Q value and the estimated Q value;
training the Critic network under the current iteration times according to the Critic network loss function under the current iteration times to obtain the Critic network under the next iteration times;
obtaining an Actor network loss function under the current iteration times according to the estimated Q value;
and updating the Actor network under the current iteration times according to the Actor network loss function under the current iteration times.
4. The method for controlling a submarine stratigraphic space robot according to claim 2, wherein the updating of the Critic target network under the current iteration number and the Actor target network under the current iteration number according to the Critic network under the next iteration number and the Actor network under the next iteration number to obtain the Critic target network under the next iteration and the Actor target network under the next iteration specifically comprises:
updating the Critic target network under the current iteration times according to the Critic network under the next iteration times to be used as the Critic target network under the next iteration;
and updating the Actor target network under the current iteration times according to the Actor network under the next iteration times to serve as the Actor target network under the next iteration.
5. A control system for a subsea stratigraphic space robot, comprising:
the presetting module is used for presetting a preset operation point;
the acquisition module is used for acquiring the current time state of the submarine stratum space robot to be controlled; the states comprise attitude information, positioning information and resistance in the motion process of the submarine drilling robot and the working states of all driving hydraulic cylinders in the submarine drilling robot;
the control module is used for obtaining the action at the next moment according to the preset operation point, the state at the current moment and the control model at the current moment, and the action at the next moment is used for controlling the motion of the submarine stratum space robot; the actions include control commands for each drive hydraulic cylinder inside the subsea drilling robot; the control model at the current moment is obtained by updating the control model at the last moment based on a DDPG algorithm;
and the execution module is used for repeatedly executing the acquisition module and the control module until the seabed stratum space robot moves to the preset operation point.
6. The control system of a subsea stratigraphic space robot according to claim 5, characterized by further comprising:
the building module is used for building a simulation environment model; the simulation environment model comprises a seabed stratum environment simulation model and a robot simulation motion model;
the second preset module is used for setting a preset training operation point;
the network initialization module is used for initializing a Critic network, an Actor network, a Critic target network and an Actor target network;
the current state determining module is used for determining the current state of the robot simulation motion model according to the simulation environment model under the current iteration times;
the current action determining module is used for obtaining the action of the robot simulation motion model at the current moment according to the preset training operation point, the state of the robot simulation motion model at the current moment and an Actor network under the current iteration times;
the next moment state determining module is used for the robot simulation motion model to execute the action of the robot simulation motion model at the current moment so as to obtain behavior reward and the state of the next moment;
the next moment action determining module is used for inputting the state of the robot simulation motion model at the next moment into an Actor target network under the current iteration times to obtain the action of the robot simulation motion model at the next moment;
the estimated Q value calculation module is used for inputting the state of the robot simulation motion model at the current moment and the action of the robot simulation motion model at the current moment into a Critic network under the current iteration number to obtain an estimated Q value;
the actual Q value calculation module is used for obtaining an actual Q value according to the action of the robot simulation motion model at the next moment, the state of the robot simulation motion model at the next moment, the Critic target network under the current iteration number and the behavior reward;
the network updating module is used for updating the Critic network under the current iteration times and the Actor network under the current iteration times according to the actual Q value and the estimated Q value to obtain the Critic network under the next iteration times and the Actor network under the next iteration times;
and the target network updating module is used for updating the Critic target network under the current iteration number and the Actor target network under the current iteration number according to the Critic network under the next iteration number and the Actor network under the next iteration number to obtain the Critic target network under the next iteration and the Actor target network under the next iteration, and updating the iteration number to enter the next iteration until the robot simulation motion model collides or reaches a preset training operation point, and stopping the iteration until the set iteration number is reached to obtain the initial control model.
7. The control system of a subsea stratigraphic space robot according to claim 6, characterized in that said network update module comprises in particular:
the Critic network loss function calculation unit is used for obtaining a Critic network loss function under the current iteration number according to the actual Q value and the estimated Q value;
the Critic network updating unit is used for training the Critic network under the current iteration times according to the Critic network loss function under the current iteration times to obtain the Critic network under the next iteration times;
the Actor network loss function calculation unit is used for obtaining an Actor network loss function under the current iteration times according to the estimated Q value;
and the Actor network updating unit is used for updating the Actor network under the current iteration times according to the Actor network loss function under the current iteration times.
8. The control system of a subsea stratigraphic space robot according to claim 6, characterized in that said target network update module comprises in particular:
the Critic target network updating unit is used for updating the Critic target network under the current iteration times according to the Critic network under the next iteration times to be used as the Critic target network under the next iteration;
and the Actor target network updating unit is used for updating the Actor target network under the current iteration times according to the Actor network under the next iteration times to be used as the Actor target network under the next iteration times.
CN202210623726.5A 2022-06-02 2022-06-02 Control method and system for submarine stratum space robot Active CN114771783B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210623726.5A CN114771783B (en) 2022-06-02 2022-06-02 Control method and system for submarine stratum space robot

Publications (2)

Publication Number Publication Date
CN114771783A true CN114771783A (en) 2022-07-22
CN114771783B CN114771783B (en) 2023-08-22

Family

ID=82420784

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210623726.5A Active CN114771783B (en) 2022-06-02 2022-06-02 Control method and system for submarine stratum space robot

Country Status (1)

Country Link
CN (1) CN114771783B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006065703A (en) * 2004-08-30 2006-03-09 Inst Of Systems Information Technologies Kyushu Self-position estimation device, self-position estimation method, program capable of executing self-position estimation method by computer, and recording medium recording program
WO2015050884A1 (en) * 2013-10-03 2015-04-09 Halliburton Energy Services, Inc. Multi-layer sensors for downhole inspection
CN112668235A (en) * 2020-12-07 2021-04-16 中原工学院 Robot control method of DDPG algorithm based on offline model pre-training learning
CN113236223A (en) * 2021-06-10 2021-08-10 安徽理工大学 Intelligent design system and method for coal mine underground gas prevention and control drilling
CN114035568A (en) * 2021-03-27 2022-02-11 浙江大学 Method for planning path of stratum drilling robot in combustible ice trial production area
CN114563011A (en) * 2022-01-24 2022-05-31 北京大学 Active auditory localization method for map-free navigation


Also Published As

Publication number Publication date
CN114771783B (en) 2023-08-22


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant