CN113119132A - Deep-sea fine teleoperation task implementation method based on imitation learning - Google Patents
Deep-sea fine teleoperation task implementation method based on imitation learning
- Publication number
- CN113119132A (application CN202110430739.6A)
- Authority
- CN
- China
- Prior art keywords
- imitation
- deep sea
- robotic arm
- teleoperation
- learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/0081—Programme-controlled manipulators with master teach-in means
Abstract
The invention relates to a method for implementing fine deep-sea teleoperation tasks based on imitation learning. Virtual operation data are used as teaching demonstrations, and a model-free reinforcement learning algorithm is applied to train, in a simulation environment, a manipulation policy that can be transferred to the real scene and has a degree of robustness and generalization. This lowers the technical difficulty of fine deep-sea teleoperation and raises the level of autonomy of deep-sea operations.
Description
Technical Field
The invention relates to the technical fields of robot control and machine learning, in particular to intelligent control methods, and specifically to a method for implementing fine deep-sea teleoperation tasks based on imitation learning.
Background
With the continuous development of marine equipment and technology, human exploration of the deep sea has entered a more advanced stage. With tools such as autonomous underwater vehicles, deep-sea electric robotic arms and underwater dexterous hands, people hope to carry out fine manipulation on the seabed, for example construction and assembly, much as on land, so as to explore the ocean better. In such a complex and harsh working environment, teleoperation remains the principal means of deep-sea operation at present: an operator in a cabin or on shore controls the robotic arm through a master device. However, for some contact-rich fine manipulation tasks, affected by water currents, lighting and other factors, even experienced operators need repeated attempts before succeeding. In addition, signal-transmission delay adds considerable difficulty to the operation.
Deep-sea teleoperation is a skill learned from experience, and operators need to accumulate that experience through extensive practice. However, deep-sea robots and the associated equipment are expensive to build and maintain, so frequent practice is impractical. In real operations, the cost of going to sea and the risk of failure must both be weighed, and to keep the failure probability low, novices rarely get a chance to participate. All of these factors make it costly to train an experienced deep-sea operator.
The invention provides an end-to-end method for implementing fine deep-sea manipulation tasks based on imitation learning, aiming to lower the technical threshold of fine deep-sea teleoperation and save the high cost of training skilled operators.
Disclosure of Invention
The invention aims to provide a method for implementing fine deep-sea manipulation tasks based on imitation learning, in which virtual operation data are used as teaching demonstrations and a model-free reinforcement learning algorithm is applied to train, in a simulation environment, a manipulation policy that can be transferred to the real scene and has a degree of robustness and generalization, thereby lowering the technical difficulty of fine deep-sea teleoperation and raising the level of autonomy of deep-sea operations.
The invention provides a method for implementing fine deep-sea teleoperation tasks based on imitation learning, characterized by comprising the following steps.
Step S1: build a MuJoCo simulation environment for the target task, including the actuators, such as a robotic arm, and the object to be manipulated;
Step S2: an operator controls the robotic arm in the simulation environment built in step S1 to complete the target task, while software records the motion sequence of each arm joint, thereby providing a teaching demonstration for the imitation training of step S4;
Step S3: build the networks with an Actor-Critic framework; the observation input of the policy network includes the arm's own motion, force feedback, vision and similar information, and its output is a joint-position command for each arm joint;
Step S4: taking the demonstration of step S2 as the imitation target, train the networks with a model-free reinforcement learning algorithm until the robotic arm in the simulation environment can complete the target task with a success rate above 99%;
Step S5: transfer the trained policy network to the real scene and complete the target task without human intervention.
Further, step S1 also includes adding noise to the simulation environment, the noise being applied to the joint positions of the robotic arm.
Compared with the prior art, the beneficial effects of the invention are reflected in the following aspects:
the invention applies the simulation learning or the reinforcement learning to the deep sea fine manipulation task for the first time, is a new teleoperation means facing to the rich contact manipulation task in the extreme environment, improves the intelligent degree of similar operation to a great extent, and reduces the on-site operation difficulty. For a target task with a fixed scene, the establishment and the teaching of the simulation environment can be performed in advance to reduce the workload.
The invention adopts an end-to-end method, with a neural network as the carrier, and learns in simulation a policy with strong robustness and generalization. This robustness and generalization allow finer contact-rich manipulation tasks to be achieved and some disturbances and deviations to be tolerated, whereas traditional direct teleoperation depends entirely on the operator's skill and has no such advantage.
The proposed method requires no real-time communication between the master and slave ends: once the trained policy network is deployed at the slave end, the operation proceeds without human intervention, avoiding the delay inherent in signal transmission. Unlike model-free reinforcement learning that attends only to the target task, the method trains the policy against the teaching demonstration recorded in simulation and introduces an imitation term into the reward function, so that exploration of the observation and action spaces concentrates around the demonstration, lowering the training cost and raising the success rate on the target task.
Drawings
FIG. 1 is a schematic diagram comparing the end-to-end, imitation-learning-based implementation of fine deep-sea teleoperation tasks with traditional direct teleoperation;
FIG. 2 is a flow chart of the end-to-end, imitation-learning-based implementation method for fine deep-sea teleoperation tasks of the present invention;
FIG. 3 is a schematic input/output diagram of the policy network of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the field of deep-sea exploration, traditional teleoperation has many limitations. First, it depends heavily on the operator's experience, skill and composure, making it expensive and unreliable; second, for contact-rich fine manipulation tasks, water currents, lighting, communication delay and other factors make control very difficult, and the target task is often hard to achieve. The imitation-learning-based method for fine deep-sea manipulation provided by the invention overcomes these limitations and improves the autonomy of the operation.
As shown in fig. 2, the proposed imitation-learning-based method for implementing fine deep-sea teleoperation tasks comprises the following five steps:
and step S1, building a MuJoCo simulation environment of the target task, wherein the MuJoCo simulation environment comprises an executing mechanism such as a mechanical arm and a control object, and adding proper noise in the simulation environment to simulate the influence of unstable water flow on the task executing process.
Step S1 specifically comprises: to avoid the high cost of training in a real scene, a MuJoCo simulation environment of the target task is built with the real scene as reference; because modeling is imperfect and unstable water flow must be accounted for, noise is added in simulation to strengthen the robustness of the policy; the noise is applied mainly to the joint positions of the robotic arm.
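The joint-position noise described above can be sketched as follows. The patent states only that noise is added to the arm's joint positions; the zero-mean Gaussian form, the 0.005 rad scale and the function name are illustrative assumptions.

```python
import numpy as np

def noisy_joint_observation(q, rng, sigma=0.005):
    """Perturb joint positions (rad) with zero-mean Gaussian noise.

    Hypothetical helper: stands in for the modeling errors and unstable
    water flow that the patent injects into the simulated joint positions.
    """
    return q + rng.normal(0.0, sigma, size=q.shape)

rng = np.random.default_rng(0)
q_true = np.zeros(6)                       # a 6-DOF arm at its zero pose
q_obs = noisy_joint_observation(q_true, rng)
```

Training the policy on such perturbed observations is what gives it tolerance to the disturbances mentioned above.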
Step S2: the operator controls the robotic arm in the simulation environment built in step S1 to complete the target task, while software records the motion sequence of each arm joint, thereby providing a teaching demonstration for the imitation training of step S4.
Step S2 specifically comprises: as shown in fig. 1, the operator can teach the target task in the simulation environment with simple devices such as a mobile phone or a joystick; only a reasonable trajectory need be presented, and the task itself need not be completed. The purpose of this step is to provide a demonstration for the subsequent training process and to improve training efficiency.
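The per-step recording of the arm's motion in step S2 can be sketched as below. The patent only says that software records the motion sequence of each joint; the class name and the in-memory list format are illustrative assumptions.

```python
class DemonstrationRecorder:
    """Stores the joint-position vector of the simulated arm at every control
    step, so the finished trajectory can be replayed as the teaching
    demonstration during training (hypothetical sketch)."""

    def __init__(self):
        self.trajectory = []

    def step(self, joint_positions):
        # copy, so later in-place updates of the arm state don't corrupt the log
        self.trajectory.append(list(joint_positions))

rec = DemonstrationRecorder()
for q in ([0.00, 0.00], [0.05, 0.02], [0.10, 0.04]):  # three steps of a 2-joint arm
    rec.step(q)
```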
Step S3: build the networks with an Actor-Critic framework; the observation input of the policy network includes the arm's own motion, force feedback, vision and similar information, and its output is a joint-position command for each arm joint.
In a real scene this information comes from sensors such as motor encoders, force/torque sensors and cameras, while in the simulation environment it comes from the corresponding virtual sensors.
Step S3 specifically comprises: the Actor-Critic framework has two parts, where the Actor is the policy network and the Critic is the value-function network.
As shown in fig. 3, the three kinds of information (vision, force feedback and the arm's own motion) are first processed to extract useful features and then used as the observation input of the policy network.
The visual information is mainly images and is processed by a convolutional neural network; the force-feedback information is a time series of signals, from which collisions are predicted by an LSTM network; the arm's own motion information consists of the motion of each joint and is reduced in dimension by a fully connected network. The output of the policy network is a joint-position command for each arm joint, driving the arm to complete the target task.
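The fusion of the three processed modalities into the policy input can be sketched as below. The CNN, LSTM and fully connected extractors are replaced here by pre-computed feature vectors and a random linear policy head, purely to illustrate the data flow; all dimensions and names are assumptions.

```python
import numpy as np

def policy_observation(image_feat, force_feat, joint_feat):
    """Concatenate the per-modality features into one observation vector.

    In the patent the image is processed by a CNN, the force/torque time
    series by an LSTM that predicts collisions, and the joint motion by a
    fully connected network; those extractors are assumed to have already
    produced the three feature vectors passed in here.
    """
    return np.concatenate([image_feat, force_feat, joint_feat])

rng = np.random.default_rng(0)
obs = policy_observation(rng.normal(size=8),   # stand-in CNN features
                         rng.normal(size=8),   # stand-in LSTM features
                         rng.normal(size=8))   # stand-in FC features
W_pi = rng.normal(size=(6, obs.size))          # toy linear policy head
joint_command = W_pi @ obs                     # one position command per joint
```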
Step S4: taking the demonstration of step S2 as the imitation target, train the networks with a model-free reinforcement learning algorithm until the robotic arm in the simulation environment can complete the target task with a success rate above 99%.
Step S4 specifically comprises: the model-free reinforcement learning algorithm relies on a reward function, continually exploring unknown situations and exploiting known information to accomplish the target manipulation task. The reward function r consists of two parts, an imitation reward r_im and a goal reward r_goal. On the one hand, the algorithm maximizes the goal reward r_goal to accomplish the target task; on the other hand, it maximizes the imitation reward r_im so as to follow the teaching demonstration as closely as possible and avoid unnecessary exploration. The imitation reward is defined from the joint positions: writing q̂_j^t for the position of the j-th arm joint at time t in the demonstration and q_j^t for the position of the same joint at time t during simulation, r_im is larger the closer the arm's joint positions stay to the demonstration at every instant. The goal reward depends on the specific task. The criterion for successful training, that is, for model convergence, is that the arm in the simulation environment completes the target task with a success rate above 99%.
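The imitation reward described above can be sketched as follows. The exact formula is not legible in the published text, so the negative squared joint-space distance used here (a common choice when imitating joint trajectories) and the function name are assumptions.

```python
import numpy as np

def imitation_reward(q_demo_t, q_sim_t, scale=1.0):
    """Reward that is largest (zero) when the simulated joint positions match
    the demonstration at the same timestep, and more negative the further
    the arm strays from it. The quadratic form is an assumption.
    """
    diff = np.asarray(q_demo_t) - np.asarray(q_sim_t)
    return -scale * float(np.sum(diff ** 2))

# perfect tracking gives the maximal reward; deviations are penalized
r_match = imitation_reward([0.1, 0.2], [0.1, 0.2])
r_off = imitation_reward([0.1, 0.2], [0.3, 0.2])
```

During training this term would be added to the task-specific goal reward, so that exploration stays concentrated around the demonstrated trajectory.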
Step S5: transfer the trained policy network to the real scene and complete the target task without human intervention.
Step S5 specifically comprises: the trained policy network is transferred to a processor at the execution end. In the real scene, the images, the force-feedback information and the arm's own motion information are processed and fed to the policy network, which outputs the motion sequence of each arm joint. As shown in fig. 1, no human intervention is required while the target task is actually executed.
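The unattended execution loop of step S5 can be sketched as below. The sensor and arm interfaces are hypothetical stand-ins; the patent does not specify the slave-end API, only that processed sensor data feed the policy and its output drives the joints without an operator.

```python
from collections import deque

class StubSensors:
    """Hypothetical stand-in for the encoder/force-torque/camera pipeline."""
    def __init__(self, readings):
        self._readings = deque(readings)
    def read(self):
        return self._readings.popleft()

class StubArm:
    """Hypothetical stand-in for the slave-end joint controller."""
    def __init__(self):
        self.commands = []
    def command(self, joint_positions):
        self.commands.append(joint_positions)

def run_policy(policy, sensors, arm, n_steps):
    """Closed-loop deployment sketch: each control step feeds the processed
    observations to the trained policy and sends the resulting joint-position
    command straight to the arm, with no operator or real-time master link."""
    for _ in range(n_steps):
        arm.command(policy(sensors.read()))

# toy policy that commands each joint to half the observed value
arm = StubArm()
run_policy(lambda obs: [0.5 * x for x in obs],
           StubSensors([[0.2, 0.4], [0.6, 0.8]]), arm, 2)
```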
Examples of applications of the invention are listed below:
example 1: application of deep sea fine control task implementation method based on simulation learning in deep sea detection activity
The invention can be applied to deep-sea exploration activities such as sample collection and in-situ observation of living organisms. At present, technical limitations make fine manipulation in the deep sea hard to achieve, which hinders exploration activities such as collecting complex samples and observing seabed organisms in vivo. The technology of the invention enables the deep-sea actuator to complete complex manipulation tasks autonomously, extending the depth and breadth of exploration.
Example 2: application of the imitation-learning-based method for fine deep-sea manipulation tasks to underwater operations

The method can be applied to underwater operations such as pipeline laying, seafood harvesting and the installation of underwater equipment. Because of environmental constraints, long-duration underwater work must rely on underwater robots. However, contact-rich fine manipulation tasks such as assembly and careful retrieval call for dexterous control, and conventional robot-control technology falls short in both dexterity and autonomy. The end-to-end method of the invention needs no human intervention during execution and has a degree of robustness and generalization, which improves the autonomy and dexterity of underwater operations. Moreover, for relatively uniform scenarios such as pipeline laying and equipment installation, a single round of training yields a policy suitable for the whole scenario, which is highly efficient.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although the present description refers to embodiments, not every embodiment may contain only a single embodiment, and such description is for clarity only, and those skilled in the art should integrate the description, and the embodiments may be combined as appropriate to form other embodiments understood by those skilled in the art.
Claims (3)
1. A method for implementing fine deep-sea teleoperation tasks based on imitation learning, characterized by comprising the following steps:
Step S1: build a MuJoCo simulation environment for the target task, including the actuators, such as a robotic arm, and the object to be manipulated;
Step S2: an operator controls the robotic arm in the simulation environment built in step S1 to complete the target task, while software records the motion sequence of each arm joint, thereby providing a teaching demonstration for the imitation training of step S4;
Step S3: build the networks with an Actor-Critic framework; the observation input of the policy network includes the arm's own motion, force feedback, vision and similar information, and its output is a joint-position command for each arm joint;
Step S4: taking the demonstration of step S2 as the imitation target, train the networks with a model-free reinforcement learning algorithm until the robotic arm in the simulation environment can complete the target task with a success rate above 99%;
Step S5: transfer the trained policy network to the real scene and complete the target task without human intervention.
2. The imitation-learning-based method for implementing fine deep-sea teleoperation tasks of claim 1, wherein step S1 further comprises adding noise to each joint position of the robotic arm in the simulation environment.
3. An application of the imitation-learning-based method for implementing fine deep-sea teleoperation tasks, characterized in that the application comprises deep-sea exploration or underwater operations.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110430739.6A CN113119132A (en) | 2021-04-21 | 2021-04-21 | Deep-sea fine teleoperation task implementation method based on imitation learning
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110430739.6A CN113119132A (en) | 2021-04-21 | 2021-04-21 | Deep-sea fine teleoperation task implementation method based on imitation learning
Publications (1)
Publication Number | Publication Date |
---|---|
CN113119132A true CN113119132A (en) | 2021-07-16 |
Family
ID=76778551
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110430739.6A Pending CN113119132A (en) | Deep-sea fine teleoperation task implementation method based on imitation learning
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113119132A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114779661A (en) * | 2022-04-22 | 2022-07-22 | 北京科技大学 | Chemical synthesis robot system based on multi-classification generation confrontation imitation learning algorithm |
CN116394276A (en) * | 2023-06-06 | 2023-07-07 | 帕西尼感知科技(张家港)有限公司 | Sample generation and model training method, device and system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107263449A (en) * | 2017-07-05 | 2017-10-20 | 中国科学院自动化研究所 | Robot remote teaching system based on virtual reality |
CN108284436A (en) * | 2018-03-17 | 2018-07-17 | 北京工业大学 | Remote mechanical dual arm system and method with learning by imitation mechanism |
CN110450153A (en) * | 2019-07-08 | 2019-11-15 | 清华大学 | A kind of mechanical arm article active pick-up method based on deeply study |
CN112338921A (en) * | 2020-11-16 | 2021-02-09 | 西华师范大学 | Mechanical arm intelligent control rapid training method based on deep reinforcement learning |
- 2021-04-21: CN application CN202110430739.6A filed; status: active, Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107263449A (en) * | 2017-07-05 | 2017-10-20 | 中国科学院自动化研究所 | Robot remote teaching system based on virtual reality |
CN108284436A (en) * | 2018-03-17 | 2018-07-17 | 北京工业大学 | Remote mechanical dual arm system and method with learning by imitation mechanism |
CN110450153A (en) * | 2019-07-08 | 2019-11-15 | 清华大学 | A kind of mechanical arm article active pick-up method based on deeply study |
CN112338921A (en) * | 2020-11-16 | 2021-02-09 | 西华师范大学 | Mechanical arm intelligent control rapid training method based on deep reinforcement learning |
Non-Patent Citations (2)
Title |
---|
LU Binpeng, "Robotic-arm motion-skill acquisition based on imitation learning and reinforcement learning", China Masters' Theses Full-text Database, Information Science and Technology series * |
XIE Yongchun et al., "Learning-based on-orbit servicing manipulation technology for space robots", Aerospace Control and Application * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114779661A (en) * | 2022-04-22 | 2022-07-22 | 北京科技大学 | Chemical synthesis robot system based on multi-classification generation confrontation imitation learning algorithm |
CN116394276A (en) * | 2023-06-06 | 2023-07-07 | 帕西尼感知科技(张家港)有限公司 | Sample generation and model training method, device and system |
CN116394276B (en) * | 2023-06-06 | 2023-08-15 | 帕西尼感知科技(张家港)有限公司 | Sample generation and model training method, device and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111618847B (en) | Mechanical arm autonomous grabbing method based on deep reinforcement learning and dynamic motion elements | |
Simetti et al. | Task priority control of underwater intervention systems: Theory and applications | |
CN113119132A (en) | Deep-sea fine teleoperation task implementation method based on imitation learning | |
CN110989639B (en) | Underwater vehicle formation control method based on stress matrix | |
Meger et al. | Learning legged swimming gaits from experience | |
Vuković et al. | Trajectory learning and reproduction for differential drive mobile robots based on GMM/HMM and dynamic time warping using learning from demonstration framework | |
CN112809689B (en) | Language-guidance-based mechanical arm action element simulation learning method and storage medium | |
Carrera et al. | Towards autonomous robotic valve turning | |
CN111152227A (en) | Mechanical arm control method based on guided DQN control | |
Liu et al. | A swarm of unmanned vehicles in the shallow ocean: A survey | |
Palnitkar et al. | Chatsim: Underwater simulation with natural language prompting | |
Pan et al. | Learning for depth control of a robotic penguin: A data-driven model predictive control approach | |
CN117549310A (en) | General system of intelligent robot with body, construction method and use method | |
Richard et al. | How to train your heron | |
Peng et al. | Moving object grasping method of mechanical arm based on deep deterministic policy gradient and hindsight experience replay | |
Bonin-Font et al. | ARSEA: a virtual reality subsea exploration assistant | |
CN113927593B (en) | Mechanical arm operation skill learning method based on task decomposition | |
Fornas et al. | Fitting primitive shapes in point clouds: a practical approach to improve autonomous underwater grasp specification of unknown objects | |
Yu et al. | Neural-Dynamics-Based Path Planning of a Bionic Robotic Fish | |
Fu et al. | A review of formation control methods for MAUV systems | |
Goel et al. | Leveraging Competitive Robotics Experience to Spread Marine Education | |
Boje et al. | A review of robotics research in South Africa | |
Giovannangeli et al. | Human-robot interactions as a cognitive catalyst for the learning of behavioral attractors | |
Guan | 3D locomotion biomimetic robot fish with haptic feedback | |
Watanabe et al. | Efficient body babbling for robot's drawing motion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210716 |