CN113524173A - End-to-end intelligent capture method for extraterrestrial detection sample - Google Patents

End-to-end intelligent capture method for extraterrestrial detection sample

Info

Publication number
CN113524173A
CN113524173A (application CN202110674012.2A)
Authority
CN
China
Prior art keywords
grabbing
extraterrestrial
environment
target object
mechanical arm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110674012.2A
Other languages
Chinese (zh)
Other versions
CN113524173B (en)
Inventor
黄煌
高锡珍
汤亮
刘昊
谢心如
刘乃龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Control Engineering
Original Assignee
Beijing Institute of Control Engineering
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Control Engineering filed Critical Beijing Institute of Control Engineering
Priority to CN202110674012.2A priority Critical patent/CN113524173B/en
Publication of CN113524173A publication Critical patent/CN113524173A/en
Application granted granted Critical
Publication of CN113524173B publication Critical patent/CN113524173B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1628 Programme controls characterised by the control loop
    • B25J9/163 Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1679 Programme controls characterised by the tasks executed

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Image Analysis (AREA)

Abstract

An end-to-end intelligent capture method for an extraterrestrial detection sample, in which a digital-physical test is carried out by first performing digital training and then a physical test. The method comprises the following steps: a reinforcement-learning-based sample collection method is designed, a digital simulation training environment for sample collection is then constructed to train a model, and the model is finally transferred to a physical environment for verification. The results show that the model can capture objects of unknown and irregular geometric shape with a high success rate, guaranteeing the success of extraterrestrial sampling tasks.

Description

End-to-end intelligent capture method for extraterrestrial detection sample
Technical Field
The invention relates to an end-to-end intelligent capture method for an extraterrestrial exploration sample, and belongs to the technical field of aerospace.
Background
Extraterrestrial exploration is an important means for humanity to investigate the origin of the universe and the evolution of celestial bodies and to develop space resources peacefully, and it is a main future development direction of the world aerospace field. Extraterrestrial exploration is gradually extending from the nearby Moon to ever more distant bodies such as Mars and asteroids, and the exploration mode is progressing from flyby to landing, roving and sample return. The collection of extraterrestrial detection samples is a core link of sample return and has important scientific value and engineering significance.
At present, extraterrestrial detection samples are collected mainly by on-site sampling with a mechanical arm, laser or drilling equipment, but the process still depends on ground commands or human-in-the-loop operation, making it difficult to perform various complex detection tasks autonomously in unknown, changing environments. Meanwhile, conventional sampling methods suffer from characteristic problems: detecting irregular, unknown objects is time-consuming and error-prone, and the grabbing pose of an irregular object is hard to describe accurately and depends on manual setting. Against the background of a new generation of artificial intelligence, introducing artificial intelligence technology is an extremely effective way to improve the sampling autonomy of extraterrestrial probes.
Disclosure of Invention
The invention aims to solve the problem of extraterrestrial detection sample acquisition by providing an end-to-end intelligent grabbing method for extraterrestrial detection samples. Taking sample acquisition in Mars detection as the application background, learning and training of the full digital-physical integrated grabbing, analyzing and boxing process are carried out, achieving fully autonomous target finding, grabbing and fine operation.
The purpose of the invention is realized by the following technical scheme:
an end-to-end intelligent capture method for an extraterrestrial exploration sample comprises the following steps:
selecting a reinforcement learning method;
constructing an extraterrestrial exploration sample acquisition simulation training environment;
performing digital training in the constructed simulation training environment to obtain a grabbing model;
and transferring the obtained grabbing model to an extraterrestrial exploration sample grabbing physical experiment system, and carrying out an extraterrestrial exploration sample acquisition physical experiment based on reinforcement learning, so as to finish the intelligent grabbing of the end-to-end extraterrestrial exploration sample.
Further, the proximal policy optimization method PPO is adopted as the selected reinforcement learning method.
Further, the multi-platform robot simulation software Webots is adopted to construct the extraterrestrial exploration sample acquisition simulation training environment.
Further, when the extraterrestrial detection sample acquisition simulation training environment is established, models of a target mechanical arm, a gripper, a camera, a target object, a box and a desktop are established;
the gripper is arranged at the front end of the target mechanical arm and used for grabbing a target object on the desktop;
the camera is arranged above the desktop and used for observing the target object to be grabbed;
the box is used for placing the target object after the gripper grabs it.
Further, the digital training specifically comprises: a deep neural network is trained by designing a reward function and a network structure; RGB-D images obtained from the camera are taken as input, and the optimal grabbing pose in the corresponding image coordinate system is output.
Further, the reward function is as follows:
(The reward function is given as an equation image, Figure BDA0003120319820000021, in the original publication.)
In the proximal policy optimization method PPO, the dense neural network DenseNet is adopted for both the execution network Actor and the evaluation network Critic, with the following specific parameters: a DenseNet-121 network is selected, whose 121 layers comprise an initialization layer, densely connected layers, transition layers and a fully connected layer.
Further, the training process comprises the following steps:
(1) according to the current article grabbing environment state, the mechanical arm selects and executes grabbing actions according to an initial grabbing strategy; the initial grabbing strategy is obtained according to the selected reinforcement learning method;
(2) after the grabbing action is executed, the grabbing environment is transferred to a new state, and corresponding action rewards are obtained through a reward function;
(3) repeating the process until all the objects in the training environment are successfully grabbed;
(4) and obtaining a deep neural network model, namely a grabbing model.
Further, the extraterrestrial detection sample grabbing physical experiment system comprises a target mechanical arm, a gripper, a camera, a target object, a box and a table;
the gripper is arranged at the front end of the target mechanical arm and used for grabbing a target object on the desktop;
the camera is arranged above the desktop and used for observing the target object to be grabbed;
the box is used for placing the target object after the gripper grabs it;
transferring the obtained grabbing model to the extraterrestrial exploration sample grabbing physical experiment system means establishing a one-to-one correspondence between grabbing poses in the simulation environment and grabbing poses in the physical experiment environment;
in the physical test environment, the pose of the camera relative to the mechanical arm base coordinate system is solved with a calibration plate, and the grabbing pose obtained in the simulation environment is converted into the mechanical arm base coordinate system, so that the mechanical arm is controlled to complete the sample grabbing.
Further, the reinforcement-learning-based extraterrestrial exploration sample collection physical test specifically comprises: the trained neural network parameters are transferred to the physical environment for test verification, and the mechanical arm keeps updating the grabbing model through continuous interaction with the environment, realizing continual learning and improving the success rate of sample collection.
Furthermore, the invention also provides an intelligent selection system for the capture pose of the extraterrestrial detection sample, which comprises:
a reinforcement learning method determination module: selecting the proximal policy optimization method PPO as the reinforcement learning method;
a simulation training environment construction module: constructing an extraterrestrial exploration sample acquisition simulation training environment with the multi-platform robot simulation software Webots; establishing models of a target mechanical arm, a gripper, a camera, a target object, a box and a desktop; the gripper is arranged at the front end of the target mechanical arm and used for grabbing a target object on the desktop; the camera is arranged above the desktop and used for observing the target object to be grabbed; the box is used for placing the target object after the gripper grabs it;
a training module: performing digital training in the constructed simulation training environment to obtain a grabbing model, specifically: training a deep neural network by designing a reward function and a network structure, taking RGB-D images obtained from the camera as input and outputting the corresponding optimal grabbing pose;
the reward function is as follows:
(The reward function is given as an equation image, Figure BDA0003120319820000041, in the original publication.)
In PPO, the dense neural network DenseNet is adopted for both the execution network Actor and the evaluation network Critic, with the following specific parameters: a DenseNet-121 network is selected, whose 121 layers comprise an initialization layer, densely connected layers, transition layers and a fully connected layer;
a test verification module: transferring the obtained grabbing model to the extraterrestrial exploration sample grabbing physical experiment system and carrying out a reinforcement-learning-based extraterrestrial exploration sample acquisition physical experiment, so as to complete the end-to-end intelligent selection of the extraterrestrial exploration sample grabbing pose.
Compared with the prior art, the invention has the following beneficial effects:
(1) The end-to-end intelligent capture method for the extraterrestrial exploration sample disclosed by the invention requires no supervised training with labeled samples, is a self-learning mechanism, and can be improved online.
(2) The method needs no prior information such as the shape and size of the sample during capture training, and can capture objects of unknown and irregular geometric shape with a high success rate.
(3) The trained neural network parameters are transferred to the physical environment for test verification, and the mechanical arm can keep updating the model through continuous interaction with the environment, realizing continual learning and continuously improving the success rate of sample collection.
Drawings
FIG. 1 is a flow chart of the end-to-end intelligent grabbing method for an extraterrestrial probe sample according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the extraterrestrial exploration sample collection simulation training environment according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of grabbing irregular stones in the physical environment for extraterrestrial exploration sample acquisition according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Fig. 1 is a flow chart of the end-to-end intelligent grabbing method for an extraterrestrial probe sample according to an embodiment of the present invention, in which digital-physical test verification is performed by first carrying out digital training and then a physical test; the method comprises the following steps:
step one, selecting a reinforcement learning method;
step two, constructing an extraterrestrial detection sample acquisition simulation training environment;
step three, performing digital training in the constructed simulation training environment to obtain a grabbing model;
and step four, transferring the obtained grabbing model to an extraterrestrial exploration sample grabbing physical experiment system, carrying out an extraterrestrial exploration sample acquisition physical experiment based on reinforcement learning, and grabbing articles such as stones, so as to finish end-to-end extraterrestrial exploration sample grabbing pose intelligent selection.
In the embodiment of the invention, the proximal policy optimization method PPO is adopted as the reinforcement learning algorithm.
In the embodiment of the invention, a digital simulation environment is constructed by adopting multi-platform robot simulation software Webots.
In the embodiment of the invention, models of a target mechanical arm, a gripper, a camera, a target object, a box, a desktop and the like are established in the simulation environment.
The gripper is arranged at the front end of the target mechanical arm and used for grabbing a target object on the desktop;
the camera is arranged above the desktop and used for observing the target object to be grabbed;
the box is used for placing the target object after the gripper grabs it.
In the embodiment of the present invention, the digital training specifically comprises: a deep neural network is trained by designing a reward function and a network structure; RGB-D images obtained from the camera are taken as input, and the corresponding optimal grabbing pose is output.
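As an illustration only, the trained network could be queried for a grabbing pose roughly as in the following sketch. It assumes a PyTorch policy network, the 200 × 200 × 4 RGB-D input described later in the embodiment, a [0, 1] normalisation, and an output layout of three means followed by three log-standard-deviations; none of these details are specified in the patent.

```python
import numpy as np
import torch

def predict_grasp(policy_net: torch.nn.Module, rgb: np.ndarray, depth: np.ndarray):
    """Map one RGB-D observation to a grabbing pose in image coordinates.

    rgb: HxWx3 uint8 colour image; depth: HxW float depth map (200x200 here).
    The network is assumed to output 3 means followed by 3 log-stds for the
    grabbing action (u, v, alpha); this layout is an assumption, not from the patent.
    """
    # Stack colour and depth into a 4-channel observation; normalisation to [0, 1] is an assumption.
    obs = np.dstack([rgb.astype(np.float32) / 255.0, depth.astype(np.float32)])
    x = torch.from_numpy(obs).permute(2, 0, 1).unsqueeze(0)   # 1 x 4 x 200 x 200
    with torch.no_grad():
        out = policy_net(x).squeeze(0)
    u, v, alpha = out[:3].tolist()    # act on the mean, i.e. deterministically, at test time
    return u, v, alpha
```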
In the embodiment of the present invention, the training process includes the following steps:
(1) according to the current article grabbing environment state, the mechanical arm selects and executes grabbing actions according to an initial grabbing strategy; the initial grabbing strategy is obtained according to the selected reinforcement learning method;
(2) after the grabbing action is executed, the grabbing environment is transferred to a new state, and corresponding action rewards are obtained through a reward function;
(3) repeating the process until all the objects in the training environment are successfully grabbed;
(4) and obtaining a deep neural network model, namely a grabbing model.
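A hedged sketch of this interaction loop follows, assuming a Gym-style wrapper (env.reset / env.step) around the simulated scene, a policy object with a sample method, and a 0.99 discount factor; these interfaces and values are assumptions, not the patent's code.

```python
def collect_episode(env, policy, gamma=0.99):
    """One training episode following steps (1)-(4) above: act with the current
    grabbing policy, collect rewards, and stop once every object is grabbed."""
    states, actions, rewards = [], [], []
    state = env.reset()                                  # RGB-D observation of the grabbing scene
    done = False
    while not done:                                      # (3) repeat until all objects are grabbed
        action = policy.sample(state)                    # (1) grabbing action from the current policy
        state_next, reward, done, _ = env.step(action)   # (2) execute it, receive the action reward
        states.append(state); actions.append(action); rewards.append(reward)
        state = state_next
    # Discounted returns, later consumed by the PPO update.
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    return states, actions, returns                      # (4) training data for the grabbing model
```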
In the embodiment of the invention, the action reward function is designed as follows:
(The action reward function is given as an equation image, Figure BDA0003120319820000061, in the original publication.)
in the embodiment of the invention, a dense neural network DenseNet is adopted in both the execution network Actor and the evaluation network Critic in PPO, and the specific parameters are as follows:
a DenseNet-121 network is selected; its 121 layers comprise an initialization layer, densely connected layers, Transition Layers (TL) and a fully connected layer.
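As an illustration, such a DenseNet-121 Actor and Critic could be built with torchvision roughly as in the sketch below; the head dimensions, the 4-channel first convolution for the RGB-D input, and the 6-value actor output are assumptions, not specified in the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import densenet121

class DenseNetHead(nn.Module):
    """DenseNet-121 backbone (initialization layer, dense blocks, transition
    layers) followed by a small fully connected head, used for both networks."""
    def __init__(self, out_dim: int):
        super().__init__()
        backbone = densenet121()
        # Accept a 4-channel RGB-D image instead of the default 3-channel RGB.
        backbone.features.conv0 = nn.Conv2d(4, 64, kernel_size=7, stride=2,
                                            padding=3, bias=False)
        self.features = backbone.features
        self.head = nn.Linear(1024, out_dim)          # replaces the ImageNet classifier

    def forward(self, x):                             # x: N x 4 x 200 x 200
        h = F.relu(self.features(x))
        h = F.adaptive_avg_pool2d(h, 1).flatten(1)    # 1024-dimensional feature vector
        return self.head(h)

actor = DenseNetHead(out_dim=6)    # 3 means + 3 log-stds of the grabbing action
critic = DenseNetHead(out_dim=1)   # scalar state value
value = critic(torch.zeros(1, 4, 200, 200))           # example forward pass
```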
In the embodiment of the invention, the extraterrestrial detection sample grabbing physical experiment system comprises a target mechanical arm, a gripper, a camera, a target object, a box and a table;
the gripper is arranged at the front end of the target mechanical arm and used for grabbing a target object on the desktop;
the camera is arranged above the desktop and used for observing the target object to be grabbed;
the box is used for placing the target object after the gripper grabs it;
transferring the obtained grabbing model to the extraterrestrial exploration sample grabbing physical experiment system means establishing a one-to-one correspondence between grabbing poses in the simulation environment and grabbing poses in the physical environment;
in the physical experiment system, the pose of the camera relative to the mechanical arm base coordinate system is solved with a calibration plate, and the grabbing pose obtained in the simulation environment is converted into the mechanical arm base coordinate system, so that the mechanical arm is controlled to complete the sample grabbing.
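A minimal sketch of this coordinate conversion is given below, assuming a pinhole camera model: the pixel grabbing point is back-projected into the camera frame using its depth value and the camera intrinsics, then mapped into the arm base frame with the camera-to-base transform obtained from the calibration plate. The intrinsic values and the transform in the example are placeholders, not values from the patent.

```python
import numpy as np

def image_grasp_to_base(u, v, depth, K, T_base_cam):
    """Convert a grabbing point (u, v) in image coordinates plus its depth value
    into the mechanical arm base coordinate system.

    K          : 3x3 camera intrinsic matrix.
    T_base_cam : 4x4 homogeneous transform of the camera in the base frame,
                 solved beforehand with the calibration plate.
    """
    # Back-project the pixel into the camera frame.
    x_cam = (u - K[0, 2]) * depth / K[0, 0]
    y_cam = (v - K[1, 2]) * depth / K[1, 1]
    p_cam = np.array([x_cam, y_cam, depth, 1.0])
    # Express the point in the arm base frame.
    p_base = T_base_cam @ p_cam
    return p_base[:3]

# Example with placeholder values (not from the patent):
K = np.array([[525.0, 0.0, 100.0],
              [0.0, 525.0, 100.0],
              [0.0,   0.0,   1.0]])
T_base_cam = np.eye(4)
print(image_grasp_to_base(120, 80, 0.6, K, T_base_cam))
```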
In the embodiment of the invention, the trained neural network parameters are transferred to the physical environment for test verification, and the mechanical arm can keep updating the model through continuous interaction with the environment, realizing continual learning and continuously improving the success rate of sample collection.
Example:
Sample collection in Mars detection is used as the application background; learning and training of the whole digital-physical integrated process of grabbing, analyzing and boxing is carried out, and fully autonomous target finding, grabbing and fine operation are achieved. The sample collection training and grabbing process is shown in fig. 1. On the basis of the selected reinforcement learning training method and initial network, iterative training is repeatedly carried out to determine a suitable network structure, reward design and training hyper-parameters, so as to obtain the optimal grabbing-point network model and finally realize mechanical arm sampling control.
Step 1: design of the reinforcement-learning-based extraterrestrial exploration sample collection method.
The grab task is regarded as a Markov decision process: given the state s_t at time t, the mechanical arm selects and executes an action a_t according to the policy π(s_t), transitions to a new state s_{t+1}, and obtains a corresponding reward r_t. The grabbing task is to find the policy that maximizes the accumulated reward
R_t = Σ_{k=0}^{∞} γ^k · r_{t+k},
where γ is the discount factor. The test adopts the proximal policy optimization method PPO, whose optimization problem is defined as shown in formula (1):
max_θ  E_t[ (π_θ(a_t|s_t) / π_θold(a_t|s_t)) · A_t - β · KL(π_θold(·|s_t), π_θ(·|s_t)) ]        (1)
where θ_old denotes the policy parameter vector before the update, A_t denotes the estimated advantage function at time t, and β denotes the penalty coefficient of the KL divergence. The optimal parameters can be solved directly by gradient descent.
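As an illustration, formula (1) could be implemented as a loss for gradient descent roughly as follows. This is a generic PPO-with-KL-penalty sketch using PyTorch Gaussian policies; the tensor shapes and example values are assumptions, not the patent's implementation.

```python
import torch
from torch.distributions import Normal, kl_divergence

def ppo_kl_penalty_loss(new_dist: Normal, old_dist: Normal,
                        actions: torch.Tensor, advantages: torch.Tensor,
                        beta: float) -> torch.Tensor:
    """Negative of objective (1), so that gradient descent maximizes it."""
    # Probability ratio pi_theta(a|s) / pi_theta_old(a|s).
    log_ratio = new_dist.log_prob(actions).sum(-1) - old_dist.log_prob(actions).sum(-1)
    ratio = log_ratio.exp()
    # KL divergence between the old and new policies, penalized with coefficient beta.
    kl = kl_divergence(old_dist, new_dist).sum(-1)
    objective = ratio * advantages - beta * kl
    return -objective.mean()

# Example with placeholder tensors (batch of 8 three-dimensional actions):
old = Normal(torch.zeros(8, 3), torch.ones(8, 3))
new = Normal(torch.zeros(8, 3, requires_grad=True), torch.ones(8, 3))
a = old.sample()
adv = torch.randn(8)
loss = ppo_kl_penalty_loss(new, old, a, adv, beta=1.0)
loss.backward()
```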
In deep learning networks, increasing the network depth easily causes gradient vanishing. The densely connected network (DenseNet) establishes connections between different layers; through these connections it achieves a structure that outperforms ResNet, further alleviates the gradient-vanishing problem, keeps the network narrow, greatly reduces the number of parameters, and helps suppress over-fitting, so the policy network is designed on the basis of DenseNet. Because the action space of the mechanical arm is continuous, direct discretization leads to poor policy-training performance or the curse of dimensionality; therefore PPO is combined with DenseNet to directly output the mean and variance of the continuous grabbing-pose control quantity f. Considering the planar grabbing task on the planetary surface, the position (x, y) of the mechanical arm gripper at the moment of grabbing and its rotation angle α about the z axis are obtained by randomly sampling the continuous control quantity f.
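A minimal sketch of this sampling step follows, assuming the actor outputs three means followed by three log-standard-deviations for f; this 6-value layout matches the DenseNet sketch above and is an assumption, as the patent only states that the mean and variance of f are produced.

```python
import torch
from torch.distributions import Normal

def sample_grasp_action(actor_output: torch.Tensor):
    """Randomly sample the continuous control quantity f = (x, y, alpha) from the
    Gaussian whose statistics the actor outputs (3 means then 3 log-stds)."""
    mean, log_std = actor_output[:3], actor_output[3:]
    dist = Normal(mean, log_std.exp())
    f = dist.sample()                        # random sampling of f
    x, y, alpha = f.tolist()                 # grabbing position (x, y) and rotation about z
    return (x, y, alpha), dist.log_prob(f).sum()

# Example with a placeholder actor output (zero means, unit standard deviations):
(x, y, alpha), log_prob = sample_grasp_action(torch.zeros(6))
```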
Step 2: construction of the extraterrestrial exploration sample acquisition simulation training environment.
Directly using the mechanical arm for physical training is costly and inefficient, so a digital simulation environment is constructed with the open-source multi-platform robot simulation software Webots, and training is carried out based on the reinforcement learning algorithm and the deep neural network of step 1. By designing the reward function and the network structure, the deep neural network takes RGB-D images as input and outputs the corresponding action states.
The mechanical arm grabbing training simulation system is built on the existing UR5 seven-degree-of-freedom mechanical arm model in Webots; each joint angle can be controlled in joint space, the pose of the arm's end effector relative to the base can be controlled in Cartesian space, and the arm's own inverse kinematics is used to solve the corresponding joint-space displacements. Considering that the shapes of the objects to be manipulated are irregular, a collaborative mechanical arm with a three-finger gripper at its end is adopted to grab and place objects, and the gripper is closed by controlling the jaw angle, which improves the grabbing success rate. In addition, models of the target objects, the box and the desktop are built in the simulation environment, as shown in FIG. 2. To compensate for information missing due to the viewing angle, the RGB-D image of the objects acquired by a depth camera is used as the input of the policy network, i.e., the state of the Markov decision process. The camera is mounted pointing 45° downward, and the image size is 200 × 200 × 4.
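As an illustration only, a Webots controller for such a scene could read the RGB and depth streams and command the arm roughly as follows; the device names ("camera", "range-finder", "shoulder_pan_joint") depend on the actual world file and are assumptions, since the patent does not give them.

```python
from controller import Robot   # Webots Python controller API

robot = Robot()
timestep = int(robot.getBasicTimeStep())

# Device names are placeholders and must match the Webots world file.
camera = robot.getDevice("camera")                 # RGB camera
range_finder = robot.getDevice("range-finder")     # depth image source
camera.enable(timestep)
range_finder.enable(timestep)
shoulder = robot.getDevice("shoulder_pan_joint")   # one UR5 joint motor

while robot.step(timestep) != -1:
    rgb = camera.getImage()                        # raw BGRA bytes
    depth = range_finder.getRangeImage()           # list of float depths
    # ...feed (rgb, depth) to the policy network, map its (x, y, alpha) output to
    # joint targets via inverse kinematics, then command the motors, e.g.:
    shoulder.setPosition(0.0)                      # placeholder joint command
```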
Step 3: physical test of the reinforcement-learning-based extraterrestrial exploration sample collection method.
The neural network parameters obtained from the training of step 2 are transferred to the physical environment for test verification; through continuous interaction with the environment, the robot keeps updating the model, realizing continual learning and continuously improving the success rate of sample collection.
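A minimal sketch of that continual-learning loop on the physical system is given below; the buffer handling, the update cadence, and the collect_fn / update_fn hooks (for example the collect_episode and ppo_kl_penalty_loss sketches above) are assumptions, not the patent's code.

```python
def continual_update(env, policy, optimizer, collect_fn, update_fn, episodes_per_update=5):
    """Keep refining the simulation-trained grabbing model on the physical arm.

    collect_fn gathers one real grabbing episode; update_fn performs one PPO
    update from the collected episodes. Both are injected assumptions.
    """
    buffer, episode = [], 0
    while True:                                    # run for as long as sampling continues
        buffer.append(collect_fn(env, policy))     # interact with the real environment
        episode += 1
        if episode % episodes_per_update == 0:
            update_fn(policy, optimizer, buffer)   # continual learning: refresh the model online
            buffer.clear()
```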
On this basis, a pile of objects with complex shapes is placed into a 20 × 10 cm box in a given order and pose, the limited space being required to hold as many objects as possible. In this unstructured environment it is difficult to obtain the accurate position, pose and shape of each object directly; the model trained in simulation is transferred to the actual scene to grab arbitrary irregular stones directly, stones are continually added during grabbing, and the grabbing success rate reaches 83.33%. The grabbing results in the real environment are shown in fig. 3.
Those skilled in the art will appreciate that those matters not described in detail in the present specification are well known in the art.
Although the present invention has been described with reference to the preferred embodiments, it is not intended to limit the present invention, and those skilled in the art can make variations and modifications of the present invention without departing from the spirit and scope of the present invention by using the methods and technical contents disclosed above.

Claims (10)

1. An end-to-end intelligent capture method for an extraterrestrial exploration sample is characterized by comprising the following steps:
selecting a reinforcement learning method;
constructing an extraterrestrial exploration sample acquisition simulation training environment;
performing digital training in the constructed simulation training environment to obtain a grabbing model;
and transferring the obtained grabbing model to an extraterrestrial exploration sample grabbing physical experiment system, and carrying out an extraterrestrial exploration sample acquisition physical experiment based on reinforcement learning, so as to finish end-to-end extraterrestrial exploration sample grabbing.
2. The method for intelligently grabbing the end-to-end extra-terrestrial exploration samples according to claim 1, is characterized in that: the proximal policy optimization method PPO is adopted as the selected reinforcement learning method.
3. The method for intelligently grabbing the end-to-end extra-terrestrial exploration samples according to claim 1, is characterized in that: and constructing an extraterrestrial detection sample acquisition simulation training environment by adopting multi-platform robot simulation software Webots.
4. The method for intelligently grabbing the end-to-end extra-terrestrial exploration samples according to claim 3, wherein the method comprises the following steps: when the extraterrestrial detection sample acquisition simulation training environment is established, models of a target mechanical arm, a gripper, a camera, a target object, a box and a desktop are established;
the gripper is arranged at the front end of the target mechanical arm and used for grabbing a target object on the desktop;
the camera is arranged above the desktop and used for observing the target object to be grabbed;
the box is used for placing the target object after the gripper grabs it.
5. The method for intelligently grabbing the end-to-end extra-terrestrial exploration samples according to claim 1, is characterized in that: the digital training is specifically as follows: by designing a reward function and a network structure, a deep neural network is trained, RGB-D images obtained through a camera are input, and the optimal grabbing pose under a corresponding image coordinate system is output.
6. The method for intelligently grabbing the end-to-end extra-terrestrial exploration samples according to claim 5, wherein the method comprises the following steps: the reward function is as follows:
(The reward function is given as an equation image, Figure FDA0003120319810000021, in the original publication.)
in PPO, the dense neural network DenseNet is adopted for both the execution network Actor and the evaluation network Critic, with the following specific parameters: a DenseNet-121 network is selected, whose 121 layers comprise an initialization layer, densely connected layers, transition layers and a fully connected layer.
7. The method for intelligently grabbing end-to-end extra-terrestrial exploration samples according to claim 6, wherein the training process comprises the following steps:
(1) according to the current article grabbing environment state, the mechanical arm selects and executes grabbing actions according to an initial grabbing strategy; the initial grabbing strategy is obtained according to the selected reinforcement learning method;
(2) after the grabbing action is executed, the grabbing environment is transferred to a new state, and corresponding action rewards are obtained through a reward function;
(3) repeating the process until all the objects in the training environment are successfully grabbed;
(4) and obtaining a deep neural network model, namely a grabbing model.
8. The method for intelligently grabbing the end-to-end extra-terrestrial exploration samples according to claim 1, is characterized in that: the extraterrestrial detection sample grabbing physical experiment system comprises a target mechanical arm, a gripper, a camera, a target object, a box and a table;
the gripper is arranged at the front end of the target mechanical arm and used for grabbing a target object on the desktop;
the camera is arranged above the desktop and used for observing the target object to be grabbed;
the box is used for placing the target object after the gripper grabs it;
transferring the obtained grabbing model to the extraterrestrial exploration sample grabbing physical experiment system means establishing a one-to-one correspondence between grabbing poses in the simulation environment and grabbing poses in the physical experiment environment;
in the physical test environment, the pose of the camera relative to the mechanical arm base coordinate system is solved with a calibration plate, and the grabbing pose obtained in the simulation environment is converted into the mechanical arm base coordinate system, so that the mechanical arm is controlled to complete the sample grabbing.
9. The method for intelligently grabbing the end-to-end extra-terrestrial exploration samples according to claim 1, is characterized in that: the sample collection physical test for the extraterrestrial exploration based on reinforcement learning specifically comprises the following steps: the neural network parameters obtained by training are tested and verified in the migration physical environment, and the mechanical arm continuously updates the grabbing model through continuous interaction with the environment, so that continuous learning is realized, and the success rate of sample collection is improved.
10. An intelligent capture system for the extraterrestrial exploration sample, which is implemented by the intelligent capture method for the end-to-end extraterrestrial exploration sample according to any one of claims 1 to 9, and is characterized by comprising:
the reinforcement learning method determination module: selecting the proximal policy optimization method PPO as the reinforcement learning method;
the simulation training environment construction module: constructing an extraterrestrial exploration sample acquisition simulation training environment with the multi-platform robot simulation software Webots; establishing models of a target mechanical arm, a gripper, a camera, a target object, a box and a desktop; the gripper is arranged at the front end of the target mechanical arm and used for grabbing a target object on the desktop; the camera is arranged above the desktop and used for observing the target object to be grabbed; the box is used for placing the target object after the gripper grabs it;
a training module: in the constructed simulation training environment, carrying out digital training to obtain a grabbing model, which specifically comprises the following steps: training a deep neural network by designing a reward function and a network structure, inputting an RGB-D image obtained by a camera, and outputting a corresponding optimal grabbing pose;
the reward function is as follows:
(The reward function is given as an equation image, Figure FDA0003120319810000031, in the original publication.)
in PPO, the dense neural network DenseNet is adopted for both the execution network Actor and the evaluation network Critic, with the following specific parameters: a DenseNet-121 network is selected, whose 121 layers comprise an initialization layer, densely connected layers, transition layers and a fully connected layer;
a test verification module: and transferring the obtained grabbing model to an extraterrestrial exploration sample grabbing physical experiment system, and carrying out an extraterrestrial exploration sample acquisition physical experiment based on reinforcement learning, so as to finish the end-to-end extraterrestrial exploration sample grabbing pose intelligent selection.
CN202110674012.2A 2021-06-17 2021-06-17 End-to-end intelligent capture method for extraterrestrial exploration sample Active CN113524173B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110674012.2A CN113524173B (en) 2021-06-17 2021-06-17 End-to-end intelligent capture method for extraterrestrial exploration sample

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110674012.2A CN113524173B (en) 2021-06-17 2021-06-17 End-to-end intelligent capture method for extraterrestrial exploration sample

Publications (2)

Publication Number Publication Date
CN113524173A true CN113524173A (en) 2021-10-22
CN113524173B CN113524173B (en) 2022-12-27

Family

ID=78125077

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110674012.2A Active CN113524173B (en) 2021-06-17 2021-06-17 End-to-end intelligent capture method for extraterrestrial exploration sample

Country Status (1)

Country Link
CN (1) CN113524173B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114888801A (en) * 2022-05-16 2022-08-12 南京邮电大学 Mechanical arm control method and system based on offline strategy reinforcement learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111496794A (en) * 2020-04-29 2020-08-07 华中科技大学 Kinematics self-grabbing learning method and system based on simulation industrial robot
CN112102405A (en) * 2020-08-26 2020-12-18 东南大学 Robot stirring-grabbing combined method based on deep reinforcement learning
CN112338921A (en) * 2020-11-16 2021-02-09 西华师范大学 Mechanical arm intelligent control rapid training method based on deep reinforcement learning
CN112605983A (en) * 2020-12-01 2021-04-06 浙江工业大学 Mechanical arm pushing and grabbing system suitable for intensive environment
CN112631131A (en) * 2020-12-19 2021-04-09 北京化工大学 Motion control self-generation and physical migration method for quadruped robot

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111496794A (en) * 2020-04-29 2020-08-07 华中科技大学 Kinematics self-grabbing learning method and system based on simulation industrial robot
CN112102405A (en) * 2020-08-26 2020-12-18 东南大学 Robot stirring-grabbing combined method based on deep reinforcement learning
CN112338921A (en) * 2020-11-16 2021-02-09 西华师范大学 Mechanical arm intelligent control rapid training method based on deep reinforcement learning
CN112605983A (en) * 2020-12-01 2021-04-06 浙江工业大学 Mechanical arm pushing and grabbing system suitable for intensive environment
CN112631131A (en) * 2020-12-19 2021-04-09 北京化工大学 Motion control self-generation and physical migration method for quadruped robot

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114888801A (en) * 2022-05-16 2022-08-12 南京邮电大学 Mechanical arm control method and system based on offline strategy reinforcement learning
CN114888801B (en) * 2022-05-16 2023-10-13 南京邮电大学 Mechanical arm control method and system based on offline strategy reinforcement learning

Also Published As

Publication number Publication date
CN113524173B (en) 2022-12-27

Similar Documents

Publication Publication Date Title
CN112102405B (en) Robot stirring-grabbing combined method based on deep reinforcement learning
CN111203878B (en) Robot sequence task learning method based on visual simulation
CN111515961B (en) Reinforcement learning reward method suitable for mobile mechanical arm
Xu et al. An end-to-end differentiable framework for contact-aware robot design
Kiatos et al. Robust object grasping in clutter via singulation
CN106651949A (en) Teleoperation method and system for grabbing objects using space mechanical arm based on simulation
CN113826051A (en) Generating digital twins of interactions between solid system parts
CN111325768B (en) Free floating target capture method based on 3D vision and simulation learning
Zhou et al. 6dof grasp planning by optimizing a deep learning scoring function
CN112605983A (en) Mechanical arm pushing and grabbing system suitable for intensive environment
CN112183188B (en) Method for simulating learning of mechanical arm based on task embedded network
McConachie et al. Bandit-based model selection for deformable object manipulation
CN113524173B (en) End-to-end intelligent capture method for extraterrestrial exploration sample
CN111152227A (en) Mechanical arm control method based on guided DQN control
Si et al. Grasp stability prediction with sim-to-real transfer from tactile sensing
CN113650015A (en) Dynamic task planning method for lunar surface sampling mechanical arm
CN114131603B (en) Deep reinforcement learning robot grabbing method based on perception enhancement and scene migration
CN111814823A (en) Transfer learning method based on scene template generation
Li et al. Hierarchical learning from demonstrations for long-horizon tasks
Zhou et al. Robot manipulator visual servoing based on image moments and improved firefly optimization algorithm-based extreme learning machine
US11383386B2 (en) Robotic drawing
CN116541701A (en) Training data generation method, intelligent body training device and electronic equipment
Guo et al. Learning pushing skills using object detection and deep reinforcement learning
CN111331598A (en) Robot attitude control method based on genetic algorithm optimization neural network structure
CN114882113A (en) Five-finger mechanical dexterous hand grabbing and transferring method based on shape correspondence of similar objects

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant