CN113119132A - Deep-sea fine teleoperation task implementation method based on imitation learning - Google Patents
Deep-sea fine teleoperation task implementation method based on imitation learning
- Publication number
- CN113119132A (application CN202110430739.6A)
- Authority
- CN
- China
- Prior art keywords
- imitation
- deep sea
- robotic arm
- teleoperation
- learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/0081—Programme-controlled manipulators with master teach-in means
Abstract
The invention relates to a method for implementing fine deep-sea teleoperation tasks based on imitation learning. Virtual operation data are used as teaching demonstrations, and a model-free reinforcement learning algorithm is applied to train, in a simulation environment, a manipulation policy that can be transferred to the real scene and has a degree of robustness and generalization. This lowers the technical difficulty of fine deep-sea teleoperation and raises the level of autonomy of deep-sea operations.
Description
Technical Field
The invention relates to the technical fields of robot control and machine learning, in particular to intelligent control methods, and specifically to a method for implementing fine deep-sea teleoperation tasks based on imitation learning.
Background
With the continuous development of marine equipment and technology, human exploration of the deep sea has entered a more advanced stage. With tools such as autonomous underwater vehicles, deep-sea electric robotic arms and underwater dexterous hands, people hope to carry out fine manipulation on the seabed, for example construction and assembly, much as on land, so as to explore the ocean better. In such a complex and harsh working environment, teleoperation remains the principal means of deep-sea operation at present: an operator in a cabin or on shore controls the robotic arm through a master device. However, for some contact-rich fine manipulation tasks, affected by water currents, lighting and other factors, even experienced operators need repeated attempts before succeeding. In addition, signal-transmission delay adds considerable difficulty to the operation.
Deep-sea teleoperation is a skill learned from experience, and operators need to accumulate that experience through extensive practice. However, deep-sea robots and the associated equipment are expensive to build and maintain, so frequent practice is impractical. In real operations, the cost of going to sea and the risk of failure must both be weighed, and to keep the failure probability low, novices rarely get a chance to participate. All of these factors make it costly to train an experienced deep-sea operator.
The invention provides an end-to-end method for implementing fine deep-sea manipulation tasks based on imitation learning, aiming to lower the technical threshold of fine deep-sea teleoperation and save the high cost of training skilled operators.
Disclosure of Invention
The invention aims to provide a method for implementing fine deep-sea manipulation tasks based on imitation learning, in which virtual operation data are used as teaching demonstrations and a model-free reinforcement learning algorithm is applied to train, in a simulation environment, a manipulation policy that can be transferred to the real scene and has a degree of robustness and generalization, thereby lowering the technical difficulty of fine deep-sea teleoperation and raising the level of autonomy of deep-sea operations.
The invention provides a method for implementing fine deep-sea teleoperation tasks based on imitation learning, characterized by comprising the following steps.
Step S1: build a MuJoCo simulation environment for the target task, including the actuators, such as a robotic arm, and the object to be manipulated;
Step S2: an operator controls the robotic arm in the simulation environment built in step S1 to complete the target task, while software records the motion sequence of each arm joint, thereby providing a teaching demonstration for the imitation training of step S4;
Step S3: build the networks with an Actor-Critic framework; the observation input of the policy network includes the arm's own motion, force feedback, vision and similar information, and its output is a joint-position command for each arm joint;
Step S4: taking the demonstration of step S2 as the imitation target, train the networks with a model-free reinforcement learning algorithm until the robotic arm in the simulation environment can complete the target task with a success rate above 99%;
Step S5: transfer the trained policy network to the real scene and complete the target task without human intervention.
Further, step S1 also includes adding noise to the simulation environment, the noise being applied to the joint positions of the robotic arm.
Compared with the prior art, the beneficial effects of the invention are reflected in the following aspects:
the invention applies the simulation learning or the reinforcement learning to the deep sea fine manipulation task for the first time, is a new teleoperation means facing to the rich contact manipulation task in the extreme environment, improves the intelligent degree of similar operation to a great extent, and reduces the on-site operation difficulty. For a target task with a fixed scene, the establishment and the teaching of the simulation environment can be performed in advance to reduce the workload.
The invention adopts an end-to-end method, with a neural network as the carrier, and learns in simulation a policy with strong robustness and generalization. This robustness and generalization allow finer contact-rich manipulation tasks to be achieved and some disturbances and deviations to be tolerated, whereas traditional direct teleoperation depends entirely on the operator's skill and has no such advantage.
The proposed method requires no real-time communication between the master and slave ends: once the trained policy network is deployed at the slave end, the operation proceeds without human intervention, avoiding the delay inherent in signal transmission. Unlike model-free reinforcement learning that attends only to the target task, the method trains the policy against the teaching demonstration recorded in simulation and introduces an imitation term into the reward function, so that exploration of the observation and action spaces concentrates around the demonstration, lowering the training cost and raising the success rate on the target task.
Drawings
FIG. 1 is a schematic diagram comparing the end-to-end, imitation-learning-based implementation of fine deep-sea teleoperation tasks with traditional direct teleoperation;
FIG. 2 is a flow chart of the end-to-end, imitation-learning-based implementation method for fine deep-sea teleoperation tasks of the present invention;
FIG. 3 is a schematic input/output diagram of the policy network of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the field of deep-sea exploration, traditional teleoperation has many limitations. First, it depends heavily on the operator's experience, skill and composure, making it expensive and unreliable; second, for contact-rich fine manipulation tasks, water currents, lighting, communication delay and other factors make control very difficult, and the target task is often hard to achieve. The imitation-learning-based method for fine deep-sea manipulation provided by the invention overcomes these limitations and improves the autonomy of the operation.
As shown in fig. 2, the proposed imitation-learning-based method for implementing fine deep-sea teleoperation tasks comprises the following five steps:
and step S1, building a MuJoCo simulation environment of the target task, wherein the MuJoCo simulation environment comprises an executing mechanism such as a mechanical arm and a control object, and adding proper noise in the simulation environment to simulate the influence of unstable water flow on the task executing process.
Step S1 specifically comprises: to avoid the high cost of training in a real scene, a MuJoCo simulation environment of the target task is built with the real scene as reference; because modeling is imperfect and unstable water flow must be accounted for, noise is added in simulation to strengthen the robustness of the policy; the noise is applied mainly to the joint positions of the robotic arm.
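The joint-position noise described above can be sketched as follows. The patent states only that noise is added to the arm's joint positions; the zero-mean Gaussian form, the 0.005 rad scale and the function name are illustrative assumptions.

```python
import numpy as np

def noisy_joint_observation(q, rng, sigma=0.005):
    """Perturb joint positions (rad) with zero-mean Gaussian noise.

    Hypothetical helper: stands in for the modeling errors and unstable
    water flow that the patent injects into the simulated joint positions.
    """
    return q + rng.normal(0.0, sigma, size=q.shape)

rng = np.random.default_rng(0)
q_true = np.zeros(6)                       # a 6-DOF arm at its zero pose
q_obs = noisy_joint_observation(q_true, rng)
```

Training the policy on such perturbed observations is what gives it tolerance to the disturbances mentioned above.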
Step S2: the operator controls the robotic arm in the simulation environment built in step S1 to complete the target task, while software records the motion sequence of each arm joint, thereby providing a teaching demonstration for the imitation training of step S4.
Step S2 specifically comprises: as shown in fig. 1, the operator can teach the target task in the simulation environment with simple devices such as a mobile phone or a joystick; only a reasonable trajectory need be presented, and the task itself need not be completed. The purpose of this step is to provide a demonstration for the subsequent training process and to improve training efficiency.
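The per-step recording of the arm's motion in step S2 can be sketched as below. The patent only says that software records the motion sequence of each joint; the class name and the in-memory list format are illustrative assumptions.

```python
class DemonstrationRecorder:
    """Stores the joint-position vector of the simulated arm at every control
    step, so the finished trajectory can be replayed as the teaching
    demonstration during training (hypothetical sketch)."""

    def __init__(self):
        self.trajectory = []

    def step(self, joint_positions):
        # copy, so later in-place updates of the arm state don't corrupt the log
        self.trajectory.append(list(joint_positions))

rec = DemonstrationRecorder()
for q in ([0.00, 0.00], [0.05, 0.02], [0.10, 0.04]):  # three steps of a 2-joint arm
    rec.step(q)
```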
Step S3: build the networks with an Actor-Critic framework; the observation input of the policy network includes the arm's own motion, force feedback, vision and similar information, and its output is a joint-position command for each arm joint.
In a real scene this information comes from sensors such as motor encoders, force/torque sensors and cameras, while in the simulation environment it comes from the corresponding virtual sensors.
Step S3 specifically comprises: the Actor-Critic framework has two parts, where the Actor is the policy network and the Critic is the value-function network.
As shown in fig. 3, the three kinds of information (vision, force feedback and the arm's own motion) are first processed to extract useful features and then used as the observation input of the policy network.
The visual information is mainly images and is processed by a convolutional neural network; the force-feedback information is a time series of signals, from which collisions are predicted by an LSTM network; the arm's own motion information consists of the motion of each joint and is reduced in dimension by a fully connected network. The output of the policy network is a joint-position command for each arm joint, driving the arm to complete the target task.
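The fusion of the three processed modalities into the policy input can be sketched as below. The CNN, LSTM and fully connected extractors are replaced here by pre-computed feature vectors and a random linear policy head, purely to illustrate the data flow; all dimensions and names are assumptions.

```python
import numpy as np

def policy_observation(image_feat, force_feat, joint_feat):
    """Concatenate the per-modality features into one observation vector.

    In the patent the image is processed by a CNN, the force/torque time
    series by an LSTM that predicts collisions, and the joint motion by a
    fully connected network; those extractors are assumed to have already
    produced the three feature vectors passed in here.
    """
    return np.concatenate([image_feat, force_feat, joint_feat])

rng = np.random.default_rng(0)
obs = policy_observation(rng.normal(size=8),   # stand-in CNN features
                         rng.normal(size=8),   # stand-in LSTM features
                         rng.normal(size=8))   # stand-in FC features
W_pi = rng.normal(size=(6, obs.size))          # toy linear policy head
joint_command = W_pi @ obs                     # one position command per joint
```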
Step S4: taking the demonstration of step S2 as the imitation target, train the networks with a model-free reinforcement learning algorithm until the robotic arm in the simulation environment can complete the target task with a success rate above 99%.
Step S4 specifically comprises: the model-free reinforcement learning algorithm relies on a reward function, continually exploring unknown situations and exploiting known information to accomplish the target manipulation task. The reward function r consists of two parts, an imitation reward r_im and a goal reward r_goal. On the one hand, the algorithm maximizes the goal reward r_goal to accomplish the target task; on the other hand, it maximizes the imitation reward r_im so as to follow the teaching demonstration as closely as possible and avoid unnecessary exploration. The imitation reward is defined from the joint positions: writing q̂_j^t for the position of the j-th arm joint at time t in the demonstration and q_j^t for the position of the same joint at time t during simulation, r_im is larger the closer the arm's joint positions stay to the demonstration at every instant. The goal reward depends on the specific task. The criterion for successful training, that is, for model convergence, is that the arm in the simulation environment completes the target task with a success rate above 99%.
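The imitation reward described above can be sketched as follows. The exact formula is not legible in the published text, so the negative squared joint-space distance used here (a common choice when imitating joint trajectories) and the function name are assumptions.

```python
import numpy as np

def imitation_reward(q_demo_t, q_sim_t, scale=1.0):
    """Reward that is largest (zero) when the simulated joint positions match
    the demonstration at the same timestep, and more negative the further
    the arm strays from it. The quadratic form is an assumption.
    """
    diff = np.asarray(q_demo_t) - np.asarray(q_sim_t)
    return -scale * float(np.sum(diff ** 2))

# perfect tracking gives the maximal reward; deviations are penalized
r_match = imitation_reward([0.1, 0.2], [0.1, 0.2])
r_off = imitation_reward([0.1, 0.2], [0.3, 0.2])
```

During training this term would be added to the task-specific goal reward, so that exploration stays concentrated around the demonstrated trajectory.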
Step S5: transfer the trained policy network to the real scene and complete the target task without human intervention.
Step S5 specifically comprises: the trained policy network is transferred to a processor at the execution end. In the real scene, the images, the force-feedback information and the arm's own motion information are processed and fed to the policy network, which outputs the motion sequence of each arm joint. As shown in fig. 1, no human intervention is required while the target task is actually executed.
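The unattended execution loop of step S5 can be sketched as below. The sensor and arm interfaces are hypothetical stand-ins; the patent does not specify the slave-end API, only that processed sensor data feed the policy and its output drives the joints without an operator.

```python
from collections import deque

class StubSensors:
    """Hypothetical stand-in for the encoder/force-torque/camera pipeline."""
    def __init__(self, readings):
        self._readings = deque(readings)
    def read(self):
        return self._readings.popleft()

class StubArm:
    """Hypothetical stand-in for the slave-end joint controller."""
    def __init__(self):
        self.commands = []
    def command(self, joint_positions):
        self.commands.append(joint_positions)

def run_policy(policy, sensors, arm, n_steps):
    """Closed-loop deployment sketch: each control step feeds the processed
    observations to the trained policy and sends the resulting joint-position
    command straight to the arm, with no operator or real-time master link."""
    for _ in range(n_steps):
        arm.command(policy(sensors.read()))

# toy policy that commands each joint to half the observed value
arm = StubArm()
run_policy(lambda obs: [0.5 * x for x in obs],
           StubSensors([[0.2, 0.4], [0.6, 0.8]]), arm, 2)
```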
Examples of applications of the invention are listed below:
example 1: application of deep sea fine control task implementation method based on simulation learning in deep sea detection activity
The invention can be applied to deep-sea exploration activities such as sample collection and in-situ observation of living organisms. At present, technical limitations make fine manipulation in the deep sea hard to achieve, which hinders exploration activities such as collecting complex samples and observing seabed organisms in vivo. The technology of the invention enables the deep-sea actuator to complete complex manipulation tasks autonomously, extending the depth and breadth of exploration.
Example 2: application of the imitation-learning-based method for fine deep-sea manipulation tasks to underwater operations

The method can be applied to underwater operations such as pipeline laying, seafood harvesting and the installation of underwater equipment. Because of environmental constraints, long-duration underwater work must rely on underwater robots. However, contact-rich fine manipulation tasks such as assembly and careful retrieval call for dexterous control, and conventional robot-control technology falls short in both dexterity and autonomy. The end-to-end method of the invention needs no human intervention during execution and has a degree of robustness and generalization, which improves the autonomy and dexterity of underwater operations. Moreover, for relatively uniform scenarios such as pipeline laying and equipment installation, a single round of training yields a policy suitable for the whole scenario, which is highly efficient.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although the present description refers to embodiments, not every embodiment may contain only a single embodiment, and such description is for clarity only, and those skilled in the art should integrate the description, and the embodiments may be combined as appropriate to form other embodiments understood by those skilled in the art.
Claims (3)
1. A method for implementing fine deep-sea teleoperation tasks based on imitation learning, characterized by comprising the following steps:
Step S1: build a MuJoCo simulation environment for the target task, including the actuators, such as a robotic arm, and the object to be manipulated;
Step S2: an operator controls the robotic arm in the simulation environment built in step S1 to complete the target task, while software records the motion sequence of each arm joint, thereby providing a teaching demonstration for the imitation training of step S4;
Step S3: build the networks with an Actor-Critic framework; the observation input of the policy network includes the arm's own motion, force feedback, vision and similar information, and its output is a joint-position command for each arm joint;
Step S4: taking the demonstration of step S2 as the imitation target, train the networks with a model-free reinforcement learning algorithm until the robotic arm in the simulation environment can complete the target task with a success rate above 99%;
Step S5: transfer the trained policy network to the real scene and complete the target task without human intervention.
2. The imitation-learning-based method for implementing fine deep-sea teleoperation tasks of claim 1, wherein step S1 further comprises adding noise to each joint position of the robotic arm in the simulation environment.
3. An application of the imitation-learning-based method for implementing fine deep-sea teleoperation tasks, characterized in that the application comprises deep-sea exploration or underwater operations.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110430739.6A CN113119132A (en) | 2021-04-21 | 2021-04-21 | Deep-sea fine teleoperation task implementation method based on imitation learning
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110430739.6A CN113119132A (en) | 2021-04-21 | 2021-04-21 | Deep-sea fine teleoperation task implementation method based on imitation learning
Publications (1)
Publication Number | Publication Date |
---|---|
CN113119132A true CN113119132A (en) | 2021-07-16 |
Family
ID=76778551
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110430739.6A Pending CN113119132A (en) | Deep-sea fine teleoperation task implementation method based on imitation learning
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113119132A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114779661A (en) * | 2022-04-22 | 2022-07-22 | 北京科技大学 | Chemical synthesis robot system based on multi-classification generation confrontation imitation learning algorithm |
CN116394276A (en) * | 2023-06-06 | 2023-07-07 | 帕西尼感知科技(张家港)有限公司 | Sample generation and model training method, device and system |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107263449A (en) * | 2017-07-05 | 2017-10-20 | 中国科学院自动化研究所 | Robot remote teaching system based on virtual reality |
CN108284436A (en) * | 2018-03-17 | 2018-07-17 | 北京工业大学 | Remote mechanical dual arm system and method with learning by imitation mechanism |
CN110450153A (en) * | 2019-07-08 | 2019-11-15 | 清华大学 | A kind of mechanical arm article active pick-up method based on deeply study |
CN112338921A (en) * | 2020-11-16 | 2021-02-09 | 西华师范大学 | Mechanical arm intelligent control rapid training method based on deep reinforcement learning |
- 2021-04-21: CN application CN202110430739.6A filed; status: active, Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107263449A (en) * | 2017-07-05 | 2017-10-20 | 中国科学院自动化研究所 | Robot remote teaching system based on virtual reality |
CN108284436A (en) * | 2018-03-17 | 2018-07-17 | 北京工业大学 | Remote mechanical dual arm system and method with learning by imitation mechanism |
CN110450153A (en) * | 2019-07-08 | 2019-11-15 | 清华大学 | A kind of mechanical arm article active pick-up method based on deeply study |
CN112338921A (en) * | 2020-11-16 | 2021-02-09 | 西华师范大学 | Mechanical arm intelligent control rapid training method based on deep reinforcement learning |
Non-Patent Citations (2)
Title |
---|
LU Binpeng, "Robotic-arm motion-skill acquisition based on imitation learning and reinforcement learning", China Masters' Theses Full-text Database, Information Science and Technology series * |
XIE Yongchun et al., "Learning-based on-orbit servicing manipulation technology for space robots", Aerospace Control and Application * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114779661A (en) * | 2022-04-22 | 2022-07-22 | 北京科技大学 | Chemical synthesis robot system based on multi-classification generation confrontation imitation learning algorithm |
CN116394276A (en) * | 2023-06-06 | 2023-07-07 | 帕西尼感知科技(张家港)有限公司 | Sample generation and model training method, device and system |
CN116394276B (en) * | 2023-06-06 | 2023-08-15 | 帕西尼感知科技(张家港)有限公司 | Sample generation and model training method, device and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111618847B (en) | Mechanical arm autonomous grabbing method based on deep reinforcement learning and dynamic motion elements | |
Simetti et al. | Task priority control of underwater intervention systems: Theory and applications | |
CN113119132A (en) | Deep-sea fine teleoperation task implementation method based on imitation learning | |
CN110989639B (en) | Underwater vehicle formation control method based on stress matrix | |
Meger et al. | Learning legged swimming gaits from experience | |
Vuković et al. | Trajectory learning and reproduction for differential drive mobile robots based on GMM/HMM and dynamic time warping using learning from demonstration framework | |
CN112809689B (en) | Language-guidance-based mechanical arm action element simulation learning method and storage medium | |
Carrera et al. | Towards autonomous robotic valve turning | |
CN111152227A (en) | Mechanical arm control method based on guided DQN control | |
Liu et al. | A swarm of unmanned vehicles in the shallow ocean: A survey | |
Palnitkar et al. | Chatsim: Underwater simulation with natural language prompting | |
Pan et al. | Learning for depth control of a robotic penguin: A data-driven model predictive control approach | |
CN117549310A (en) | General system of intelligent robot with body, construction method and use method | |
Richard et al. | How to train your heron | |
Peng et al. | Moving object grasping method of mechanical arm based on deep deterministic policy gradient and hindsight experience replay | |
Bonin-Font et al. | ARSEA: a virtual reality subsea exploration assistant | |
CN113927593B (en) | Mechanical arm operation skill learning method based on task decomposition | |
Fornas et al. | Fitting primitive shapes in point clouds: a practical approach to improve autonomous underwater grasp specification of unknown objects | |
Yu et al. | Neural-Dynamics-Based Path Planning of a Bionic Robotic Fish | |
Fu et al. | A review of formation control methods for MAUV systems | |
Goel et al. | Leveraging Competitive Robotics Experience to Spread Marine Education | |
Boje et al. | A review of robotics research in South Africa | |
Giovannangeli et al. | Human-robot interactions as a cognitive catalyst for the learning of behavioral attractors | |
Guan | 3D locomotion biomimetic robot fish with haptic feedback | |
Watanabe et al. | Efficient body babbling for robot's drawing motion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210716 |