CN113119132A - Deep-sea fine teleoperation task implementation method based on imitation learning - Google Patents


Info

Publication number
CN113119132A
Authority
CN
China
Prior art keywords
imitation
deep sea
robotic arm
teleoperation
learning
Prior art date
Legal status
Pending
Application number
CN202110430739.6A
Other languages
Chinese (zh)
Inventor
Li Tiefeng
Fan Yaowei
Current Assignee
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date: 2021-04-21
Filing date: 2021-04-21
Publication date: 2021-07-16
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202110430739.6A
Publication of CN113119132A


Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B25: HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00: Programme-controlled manipulators
    • B25J9/0081: Programme-controlled manipulators with master teach-in means

Abstract

The invention relates to a method for implementing deep-sea fine teleoperation tasks based on imitation learning. In the method, virtual manipulation data serve as teaching demonstrations, and a model-free reinforcement learning algorithm is applied to train, in a simulation environment, a manipulation policy that can be transferred to the real scene and possesses a degree of robustness and generalization, thereby lowering the technical difficulty of deep-sea fine teleoperation and raising the degree of intelligence of deep-sea operations.

Description

Deep-sea fine teleoperation task implementation method based on imitation learning
Technical Field
The invention relates to the technical fields of robot control and machine learning, in particular to intelligent control methods, and specifically to a method for implementing deep-sea fine teleoperation tasks based on imitation learning.
Background
With the continuous development of marine equipment and technology, deep-sea exploration has entered a deeper stage. With tools such as autonomous underwater vehicles, deep-sea electric robotic arms, and underwater dexterous hands, people hope to perform fine manipulations on the ocean floor similar to those on land, e.g., construction and assembly, so as to explore the ocean better. In such a complex and harsh working environment, teleoperation remains the most important means of deep-sea operation at present: an operator in a cockpit or on shore controls the robotic arm through a master device. However, for some contact-rich fine manipulation tasks, even experienced operators need repeated attempts before succeeding, owing to disturbances from water currents, lighting, and the like. In addition, signal transmission delay adds great difficulty to the operation.
Deep-sea teleoperation is an experience-driven skill, and an operator needs to accumulate experience through extensive practice. However, deep-sea robots and associated equipment are expensive to build and maintain, so frequent practice is impractical. In actual operations, the cost of going to sea and the risk of failure must both be considered, and to reduce the probability of failure, novices rarely get a chance to participate. All these factors make cultivating an experienced deep-sea operator very costly.
The invention provides an end-to-end method for implementing deep-sea fine manipulation tasks based on imitation learning, aiming to lower the technical threshold of deep-sea fine teleoperation and save the high cost of training skilled operators.
Disclosure of Invention
The invention aims to provide a method for implementing deep-sea fine manipulation tasks based on imitation learning, in which virtual manipulation data serve as teaching demonstrations and a model-free reinforcement learning algorithm trains, in a simulation environment, a manipulation policy that can be transferred to the real scene and possesses a degree of robustness and generalization, thereby lowering the technical difficulty of deep-sea fine teleoperation and raising the degree of intelligence of deep-sea operations.
The invention provides a method for implementing deep-sea fine teleoperation tasks based on imitation learning, characterized by comprising the following steps:
Step S1: build a MuJoCo simulation environment of the target task, including actuators such as a robotic arm, together with the manipulated object;
Step S2: an operator operates the robotic arm in the simulation environment built in step S1 to complete the target task, while the software records the motion sequence of every joint of the arm, providing a teaching demonstration for the imitation training in step S4;
Step S3: construct networks with an Actor-Critic framework; the observation input of the policy network comprises robotic-arm body motion, force feedback, and vision, and its output is a joint-position command for every joint of the arm;
Step S4: with the teaching demonstration from step S2 as the imitation target, train the networks with a model-free reinforcement learning algorithm until the robotic arm in the simulation environment completes the target task with a success rate above 99%;
Step S5: transfer the trained policy network to the real scene and complete the target task without human intervention.
Further, step S1 also includes adding noise parameters to the simulation environment, the noise being added to the joint positions of the robotic arm.
Compared with the prior art, the beneficial effects of the invention are embodied in the following aspects:
the invention applies the simulation learning or the reinforcement learning to the deep sea fine manipulation task for the first time, is a new teleoperation means facing to the rich contact manipulation task in the extreme environment, improves the intelligent degree of similar operation to a great extent, and reduces the on-site operation difficulty. For a target task with a fixed scene, the establishment and the teaching of the simulation environment can be performed in advance to reduce the workload.
The invention adopts an end-to-end method, with a neural network as the carrier, to learn in simulation a policy with strong robustness and generalization. This robustness and generalization allow finer, contact-rich manipulation tasks to be accomplished while tolerating some disturbance and deviation, whereas traditional direct teleoperation, which depends entirely on the operator's skill, has no such advantage.
The proposed method requires no real-time communication between the master and slave ends; once the trained policy network is deployed on the slave end, the operation proceeds without human intervention, avoiding the time-delay problem of signal transmission. Unlike model-free reinforcement learning algorithms that attend only to the target task, the method trains the policy with the teaching demonstration in the simulation environment as the imitation target and introduces an imitation reward term into the reward function, so that the explored observation and action spaces concentrate around the demonstration, lowering the training cost and raising the success rate of the target task.
Drawings
FIG. 1 is a schematic comparison between the end-to-end, imitation-learning-based implementation of deep-sea fine teleoperation tasks and traditional direct teleoperation;
FIG. 2 is a flow chart of the end-to-end, imitation-learning-based implementation of deep-sea fine teleoperation tasks of the present invention;
FIG. 3 is a schematic input/output diagram of the policy network of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the field of deep-sea exploration, traditional teleoperation has many limitations. First, it depends heavily on the operator's experience, skill, and psychological resilience, making it costly and unreliable; second, for contact-rich fine manipulation tasks, factors such as water currents, lighting, and communication delay make the manipulation extremely difficult and the target task often unachievable. The imitation-learning-based method for implementing deep-sea fine manipulation tasks provided by the invention overcomes these limitations and improves the intelligence of the operation.
As shown in fig. 2, the imitation-learning-based method for implementing deep-sea fine teleoperation tasks proposed by the present invention comprises the following five steps:
and step S1, building a MuJoCo simulation environment of the target task, wherein the MuJoCo simulation environment comprises an executing mechanism such as a mechanical arm and a control object, and adding proper noise in the simulation environment to simulate the influence of unstable water flow on the task executing process.
Step S1 specifically includes: to avoid the high cost of training in a real scene, a MuJoCo simulation environment of the target task is built with the real scene as the reference; because modeling has errors and unstable water currents must be considered, noise is added to the simulation environment to enhance the robustness of the policy; the noise is mainly added to the joint positions of the robotic arm.
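For illustration only, a minimal Python sketch of such an environment follows, assuming the official `mujoco` bindings, a hypothetical model file `deep_sea_task.xml` describing the arm and the manipulated object, and an illustrative noise magnitude `NOISE_STD`:

```python
import numpy as np
import mujoco

# Hypothetical MJCF model of the robotic arm and the manipulated object (step S1).
model = mujoco.MjModel.from_xml_path("deep_sea_task.xml")
data = mujoco.MjData(model)

NOISE_STD = 0.01  # rad; assumed standard deviation of the joint-position noise

def step_with_noise(ctrl: np.ndarray) -> None:
    """Advance the simulation one step, then perturb the arm's joint positions
    to mimic modeling error and unstable water currents, as described in step S1.
    Assumes the arm's hinge joints occupy the first model.nu position coordinates."""
    data.ctrl[:] = ctrl
    mujoco.mj_step(model, data)
    data.qpos[:model.nu] += np.random.normal(0.0, NOISE_STD, size=model.nu)
    mujoco.mj_forward(model, data)  # refresh derived quantities after the perturbation
```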
Step S2: the operator operates the robotic arm in the simulation environment built in step S1 to complete the target task, while the software records the motion sequence of every joint of the arm, providing a teaching demonstration for the imitation training in step S4.
Step S2 specifically includes: as shown in fig. 1, the operator can teach the target task in the simulation environment in a simple way, through a mobile phone, a joystick, or similar devices; it suffices to present a reasonable trajectory, and the task need not be completed perfectly. The purpose of this step is to provide a demonstration for the subsequent training process and to improve training efficiency.
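Continuing the step-S1 sketch, the demonstration logging can look like the following; `read_operator_command` is a hypothetical helper wrapping the joystick or phone input, and `MAX_STEPS` is an assumed episode length:

```python
MAX_STEPS = 2000  # assumed episode length

demo_qpos = []  # joint-position sequence taught by the operator (imitation target in step S4)
for t in range(MAX_STEPS):
    ctrl = read_operator_command()             # joystick / phone input (assumed helper)
    step_with_noise(ctrl)                      # from the step-S1 sketch above
    demo_qpos.append(data.qpos[:model.nu].copy())

np.save("demo_trajectory.npy", np.asarray(demo_qpos))  # the teaching demonstration
```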
Step S3: construct networks with an Actor-Critic framework; the observation input of the policy network comprises robotic-arm body motion, force feedback, and vision; the output of the policy network is joint-position information for every joint of the arm.
In the real scene this information comes from sensors such as motor encoders, force/torque sensors, and cameras, whereas in the simulation environment it comes from the corresponding virtual sensors.
Step S3 specifically includes: the Actor-Critic framework comprises two parts, where the Actor part is the policy network and the Critic part is the value-function network.
As shown in fig. 3, the three kinds of information, i.e., vision, force feedback, and robotic-arm body motion, are processed to extract useful features and then serve as the observation input of the policy network.
The visual information is mainly images, processed by a convolutional neural network; the force-feedback information is a time series of signals, from which collisions are predicted by an LSTM network; the body-motion information consists of the motion of every joint of the arm and is reduced in dimension by a fully connected network. The output of the policy network is the joint-position command for every joint, driving the arm to complete the target task.
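A minimal PyTorch sketch of this observation pipeline follows; all layer sizes and input shapes are illustrative assumptions (e.g., an 84x84 grayscale image and a 6-axis force/torque window):

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    def __init__(self, n_joints: int = 7, force_dim: int = 6):
        super().__init__()
        # Vision branch: CNN over an assumed 84x84 grayscale image.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(), nn.Linear(32 * 9 * 9, 64), nn.ReLU())
        # Force branch: LSTM over a window of force/torque samples (contact cues).
        self.lstm = nn.LSTM(force_dim, 32, batch_first=True)
        # Proprioception branch: fully connected reduction of joint positions/velocities.
        self.fc = nn.Sequential(nn.Linear(2 * n_joints, 32), nn.ReLU())
        # Fused features mapped to joint-position commands (the Actor output).
        self.head = nn.Linear(64 + 32 + 32, n_joints)

    def forward(self, image, force_seq, joint_state):
        v = self.cnn(image)                  # (B, 64) visual features
        _, (h, _) = self.lstm(force_seq)     # last hidden state summarizes contacts
        p = self.fc(joint_state)             # (B, 32) proprioceptive features
        return self.head(torch.cat([v, h[-1], p], dim=-1))
```

The Critic of the Actor-Critic framework can be a parallel value head over the same fused features; whether the Actor and Critic share feature extractors is a design choice the description does not fix.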
Step S4: with the teaching demonstration from step S2 as the imitation target, train the networks with a model-free reinforcement learning algorithm until the robotic arm in the simulation environment completes the target task with a success rate above 99%.
Step S4 specifically includes: the model-free reinforcement learning algorithm relies on a reward function, accomplishing the target manipulation task by continually exploring unknown states and exploiting known information. The reward function is

$r_t = r_t^{\mathrm{im}} + r_t^{\mathrm{goal}}$

composed of an imitation reward $r_t^{\mathrm{im}}$ and a goal reward $r_t^{\mathrm{goal}}$. On the one hand, the reinforcement learning algorithm maximizes the goal reward $r_t^{\mathrm{goal}}$ to accomplish the target task; on the other hand, it maximizes the imitation reward $r_t^{\mathrm{im}}$ to imitate the teaching demonstration as closely as possible and avoid unnecessary exploration. Specifically, the imitation reward decays with the tracking error, e.g.

$r_t^{\mathrm{im}} = \exp\Bigl(-\sum_j \bigl(\hat{q}_t^{\,j} - q_t^{\,j}\bigr)^2\Bigr)$

where $\hat{q}_t^{\,j}$ denotes the joint position of the $j$-th joint of the robotic arm at time $t$ in the teaching demonstration, and $q_t^{\,j}$ denotes the joint position of the $j$-th joint at time $t$ in simulation; maximizing this term keeps the arm's joint positions close to the demonstration at every moment. The goal reward is task-specific. The criterion for successful training, i.e., model convergence, is that the robotic arm in the simulation environment completes the target task with a success rate above 99%.
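For illustration, the per-step reward can be computed as in the following NumPy sketch, under the exponential tracking form assumed above; the goal term is task-specific and passed in as a parameter:

```python
import numpy as np

def imitation_reward(q_demo_t: np.ndarray, q_sim_t: np.ndarray) -> float:
    """r_t^im = exp(-sum_j (qhat_t^j - q_t^j)^2): largest when the simulated
    joint positions track the teaching demonstration exactly."""
    return float(np.exp(-np.sum((q_demo_t - q_sim_t) ** 2)))

def total_reward(q_demo_t: np.ndarray, q_sim_t: np.ndarray, goal_reward_t: float) -> float:
    """r_t = r_t^im + r_t^goal; the goal reward is defined per task (step S4)."""
    return imitation_reward(q_demo_t, q_sim_t) + goal_reward_t
```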
Step S5: transfer the trained policy network to the real scene and complete the target task without human intervention.
Step S5 specifically includes: the trained policy network is transferred to the processor of the execution end. In the real scene, the image, force-feedback, and robotic-arm body-motion information are processed and then fed to the policy network, which outputs the motion sequence of every joint of the arm. As shown in fig. 1, no human intervention is required during the actual execution of the target task.
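A sketch of this execution loop on the slave-end processor follows; the sensor and actuator helpers (`read_camera`, `read_force_torque_window`, `read_joint_encoders`, `send_joint_commands`, `task_done`) are assumptions standing in for the real drivers:

```python
import torch

policy = PolicyNet()                                   # architecture from the step-S3 sketch
policy.load_state_dict(torch.load("policy_trained_in_sim.pt"))
policy.eval()

with torch.no_grad():
    while not task_done():                             # hypothetical completion check
        img = read_camera()                            # image tensor, shape (1, 1, 84, 84)
        force = read_force_torque_window()             # force/torque window, shape (1, T, 6)
        joints = read_joint_encoders()                 # joint state, shape (1, 2 * n_joints)
        q_cmd = policy(img, force, joints)             # joint-position commands
        send_joint_commands(q_cmd.squeeze(0).numpy())  # no master-end communication needed
```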
Examples of applications of the invention are listed below:
example 1: application of deep sea fine control task implementation method based on simulation learning in deep sea detection activity
The invention can be applied to deep-sea exploration activities such as sample collection and the examination of living organisms. At present, technical limitations make fine manipulation in the deep sea hard to achieve, which hinders some exploration activities, such as collecting intricate samples and examining living seabed organisms in situ. The technology of the invention enables deep-sea actuators to complete complex manipulation tasks autonomously, expanding the depth and breadth of exploration activities.
Example 2: application of the deep sea fine manipulation task implementation method based on simulation learning in underwater operation, the method can be applied to underwater operation, such as pipeline laying, marine product picking, underwater equipment installation and the like. Due to environmental restrictions, long-time underwater operation needs to depend on an underwater robot. However, for the delicate control tasks such as assembly and fine fishing, which are rich in contact, a smart control means is required, and the conventional robot control technology has great defects in both dexterity and autonomy. The end-to-end control method provided by the invention does not need human intervention in the execution stage, and has certain robustness and generalization capability, so that the autonomy and the flexibility of underwater operation are improved. Meanwhile, for the operation scene similar to pipeline laying, underwater equipment installation and the like which is relatively single underwater operation, the training process can be carried out once to obtain the strategy suitable for the scene, and the efficiency is high.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although the present description refers to embodiments, not every embodiment may contain only a single embodiment, and such description is for clarity only, and those skilled in the art should integrate the description, and the embodiments may be combined as appropriate to form other embodiments understood by those skilled in the art.

Claims (3)

1. A method for implementing deep-sea fine teleoperation tasks based on imitation learning, characterized by comprising the following steps:
Step S1: build a MuJoCo simulation environment of the target task, including actuators such as a robotic arm, together with the manipulated object;
Step S2: an operator operates the robotic arm in the simulation environment built in step S1 to complete the target task, while the software records the motion sequence of every joint of the arm, providing a teaching demonstration for the imitation training in step S4;
Step S3: construct networks with an Actor-Critic framework; the observation input of the policy network comprises robotic-arm body motion, force feedback, and vision, and its output is a joint-position command for every joint of the arm;
Step S4: with the teaching demonstration from step S2 as the imitation target, train the networks with a model-free reinforcement learning algorithm until the robotic arm in the simulation environment completes the target task with a success rate above 99%;
Step S5: transfer the trained policy network to the real scene and complete the target task without human intervention.
2. The method for implementing deep-sea fine teleoperation tasks based on imitation learning of claim 1, wherein step S1 further comprises adding noise parameters to each joint position of the robotic arm in the simulation environment.
3. An application of the method for implementing deep-sea fine teleoperation tasks based on imitation learning, characterized in that the application comprises deep-sea exploration or underwater operations.
CN202110430739.6A 2021-04-21 2021-04-21 Deep-sea fine teleoperation task implementation method based on imitation learning Pending CN113119132A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110430739.6A CN113119132A (en) Deep-sea fine teleoperation task implementation method based on imitation learning


Publications (1)

Publication Number Publication Date
CN113119132A true CN113119132A (en) 2021-07-16

Family

ID=76778551

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110430739.6A Pending CN113119132A (en) Deep-sea fine teleoperation task implementation method based on imitation learning

Country Status (1)

Country Link
CN (1) CN113119132A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107263449A (en) * 2017-07-05 2017-10-20 中国科学院自动化研究所 Robot remote teaching system based on virtual reality
CN108284436A (en) * 2018-03-17 2018-07-17 北京工业大学 Remote mechanical dual arm system and method with learning by imitation mechanism
CN110450153A (en) * 2019-07-08 2019-11-15 清华大学 A kind of mechanical arm article active pick-up method based on deeply study
CN112338921A (en) * 2020-11-16 2021-02-09 西华师范大学 Mechanical arm intelligent control rapid training method based on deep reinforcement learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Lu Binpeng: "Robotic arm motion skill acquisition based on imitation learning and reinforcement learning", China Master's Theses Full-text Database, Information Science and Technology *
Xie Yongchun et al.: "Learning-based on-orbit servicing manipulation technology for space robots", Aerospace Control and Application *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114779661A (en) * 2022-04-22 2022-07-22 北京科技大学 Chemical synthesis robot system based on multi-classification generation confrontation imitation learning algorithm
CN116394276A (en) * 2023-06-06 2023-07-07 帕西尼感知科技(张家港)有限公司 Sample generation and model training method, device and system
CN116394276B (en) * 2023-06-06 2023-08-15 帕西尼感知科技(张家港)有限公司 Sample generation and model training method, device and system

Similar Documents

Publication Publication Date Title
CN111618847B (en) Mechanical arm autonomous grabbing method based on deep reinforcement learning and dynamic motion elements
Simetti et al. Task priority control of underwater intervention systems: Theory and applications
CN113119132A (en) Deep-sea fine teleoperation task implementation method based on imitation learning
CN110989639B (en) Underwater vehicle formation control method based on stress matrix
Meger et al. Learning legged swimming gaits from experience
Vuković et al. Trajectory learning and reproduction for differential drive mobile robots based on GMM/HMM and dynamic time warping using learning from demonstration framework
CN112809689B (en) Language-guidance-based mechanical arm action element simulation learning method and storage medium
Carrera et al. Towards autonomous robotic valve turning
CN111152227A (en) Mechanical arm control method based on guided DQN control
Liu et al. A swarm of unmanned vehicles in the shallow ocean: A survey
Palnitkar et al. Chatsim: Underwater simulation with natural language prompting
Pan et al. Learning for depth control of a robotic penguin: A data-driven model predictive control approach
CN117549310A (en) General system of intelligent robot with body, construction method and use method
Richard et al. How to train your heron
Peng et al. Moving object grasping method of mechanical arm based on deep deterministic policy gradient and hindsight experience replay
Bonin-Font et al. ARSEA: a virtual reality subsea exploration assistant
CN113927593B (en) Mechanical arm operation skill learning method based on task decomposition
Fornas et al. Fitting primitive shapes in point clouds: a practical approach to improve autonomous underwater grasp specification of unknown objects
Yu et al. Neural-Dynamics-Based Path Planning of a Bionic Robotic Fish
Fu et al. A review of formation control methods for MAUV systems
Goel et al. Leveraging Competitive Robotics Experience to Spread Marine Education
Boje et al. A review of robotics research in South Africa
Giovannangeli et al. Human-robot interactions as a cognitive catalyst for the learning of behavioral attractors
Guan 3D locomotion biomimetic robot fish with haptic feedback
Watanabe et al. Efficient body babbling for robot's drawing motion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210716)