CN114367988A - Mechanical arm motion planning method and device on autonomous inspection platform of coal mine


Info

Publication number
CN114367988A
CN114367988A
Authority
CN
China
Prior art keywords
mechanical arm
vector
sample
neural network
joint angle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210127986.3A
Other languages
Chinese (zh)
Other versions
CN114367988B (en)
Inventor
付鑫
王海军
杨晓辉
王洪磊
孟繁悦
吴太晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
General Coal Research Institute Co Ltd
Original Assignee
General Coal Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by General Coal Research Institute Co Ltd filed Critical General Coal Research Institute Co Ltd
Priority to CN202210127986.3A priority Critical patent/CN114367988B/en
Publication of CN114367988A publication Critical patent/CN114367988A/en
Application granted granted Critical
Publication of CN114367988B publication Critical patent/CN114367988B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B25 - HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J - MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 - Programme-controlled manipulators
    • B25J9/16 - Programme controls
    • B25J9/1656 - Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1664 - Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning
    • B25J9/1666 - Avoiding collision or forbidden zones
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B25 - HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J - MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 - Programme-controlled manipulators
    • B25J9/16 - Programme controls
    • B25J9/1602 - Programme controls characterised by the control system, structure, architecture
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B25 - HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J - MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 - Programme-controlled manipulators
    • B25J9/16 - Programme controls
    • B25J9/1628 - Programme controls characterised by the control loop
    • B25J9/163 - Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control

Abstract

The application provides a method and a device for planning the motion of a mechanical arm on an autonomous inspection platform of a coal mine, wherein the method comprises the following steps: detecting obstacles within the reach range of the mechanical arm; carrying out voxelization processing on the obstacle to obtain a voxel vector of the obstacle; acquiring a target pose vector of the mechanical arm and a joint angle vector of the mechanical arm at the current moment; inputting the voxel vector, the target pose vector of the mechanical arm and the joint angle vector of the mechanical arm at the current moment into a pre-trained motion planner to obtain the joint angle variation of the mechanical arm at the next moment; and controlling the mechanical arm to complete the next action according to the joint angle variation of the mechanical arm at the next moment. By voxelizing the obstacles within the reach of the mechanical arm, obtaining the joint angle variation of the mechanical arm at the next moment from the pre-trained motion planner, and controlling the mechanical arm to complete the next action, the method and the device are suitable for complex working scenes and meet the real-time control requirements of the mobile inspection platform in a coal mine scene.

Description

Mechanical arm motion planning method and device on autonomous inspection platform of coal mine
Technical Field
The application relates to the technical field of mechanical arm motion planning, in particular to a mechanical arm motion planning method and device on an autonomous inspection platform of a coal mine.
Background
In coal mines, many devices on the surface and underground need periodic monitoring, such as belt conveyors, power distribution rooms, fully mechanized coal mining face equipment, ventilation equipment, and drainage equipment; when faults occur, simple operations are often required, such as closing a switch in a power distribution room. To realize these functions, an autonomous navigation mobile platform is combined with a mechanical arm, and the mechanical arm is applied to the inspection platform. Compared with a traditional factory workshop, the working environment of the underground inspection platform is far more complex: it is unstructured, involves personnel, cannot be protected by fixed safety zones, and changes dynamically and unpredictably. This places higher requirements on the motion planning of the mechanical arm.
Disclosure of Invention
The application provides a method and a device for planning the motion of a mechanical arm on an autonomous inspection platform of a coal mine, which are used for improving the trajectory quality of the motion planning of the mechanical arm in a complex environment.
The embodiment of the first aspect of the application provides a method for planning the motion of a mechanical arm on an autonomous inspection platform of a coal mine, which comprises the following steps:
detecting obstacles within the reach range of the mechanical arm;
carrying out voxelization processing on the obstacle to obtain a voxel vector of the obstacle;
acquiring a target pose vector of the mechanical arm and a joint angle vector of the mechanical arm at the current moment;
inputting the voxel vector, the target pose vector of the mechanical arm and the joint angle vector of the mechanical arm at the current moment into a pre-trained motion planner to obtain the joint angle variation of the mechanical arm at the next moment;
and controlling the mechanical arm to finish the next action according to the joint angle variation of the mechanical arm at the next moment.
The embodiment of the second aspect of the application provides a mechanical arm motion planning device on a coal mine autonomous inspection platform, which comprises:
the detection module is used for detecting obstacles within the reach range of the mechanical arm;
the processing module is used for carrying out voxelization processing on the obstacle to obtain a voxel vector of the obstacle;
the acquisition module is used for acquiring a target pose vector of the mechanical arm and a joint angle vector of the mechanical arm at the current moment;
the planning module is used for inputting the voxel vector, the target pose vector of the mechanical arm and the joint angle vector of the mechanical arm at the current moment into a pre-trained motion planner to obtain the joint angle variation of the mechanical arm at the next moment;
and the control module is used for controlling the mechanical arm to complete the next action according to the joint angle variation of the mechanical arm at the next moment.
The technical scheme provided by the embodiments of the application can have the following beneficial effects: the obstacles within the reach of the mechanical arm are voxelized, the joint angle variation of the mechanical arm at the next moment is obtained with the pre-trained motion planner, and the mechanical arm is controlled to complete the next action. The method and the device are suitable for complex working scenes, compute quickly, and meet the real-time control requirements of the mobile inspection platform in the coal mine scene.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a schematic flow chart of a method for planning the movement of a mechanical arm on an autonomous inspection platform for a coal mine according to an embodiment of the present application;
fig. 2 is a schematic diagram of a motion planner according to an embodiment of the present application;
fig. 3 is a schematic diagram of a training process of a motion planner according to an embodiment of the present application;
fig. 4 is a schematic diagram of another training process of a motion planner according to an embodiment of the present application;
fig. 5 is a schematic diagram of a method for planning the motion of a mechanical arm on an autonomous inspection platform for a coal mine according to an embodiment of the application;
fig. 6 is a schematic structural diagram of a mechanical arm movement planning device on an autonomous inspection platform for a coal mine according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of another device for planning the motion of a mechanical arm on an autonomous inspection platform for a coal mine according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
The method and the device for planning the motion of the mechanical arm on the autonomous inspection platform for the coal mine according to the embodiment of the application are described below with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of a method for planning the movement of a mechanical arm on an autonomous inspection platform for a coal mine according to an embodiment of the present application. As shown in fig. 1, the method for planning the motion of the mechanical arm on the autonomous inspection platform for the coal mine may include the following steps:
step 101, detecting obstacles within the reach of the mechanical arm.
It should be noted that an obstacle is anything within the reach of the mechanical arm that may interfere with its movement, for example inspection platform equipment, sundries, or staff.
And 102, carrying out voxelization processing on the obstacle to obtain a voxel vector of the obstacle.
In some embodiments of the present application, an obstacle may be decomposed into a fixed number N of cube voxels with a fixed side length L, and the three-dimensional coordinates (x, y, z) of the center points of the cube voxels are concatenated into a voxel vector O of the obstacle with length 3N:
O = (x_1, y_1, z_1, x_2, y_2, z_2, …, x_N, y_N, z_N)
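For illustration, a minimal sketch of this voxelization step is given below, assuming the obstacle is available as a point cloud in the arm base frame; the side length, voxel count and zero-padding scheme are illustrative assumptions, not values fixed by the application.

```python
import numpy as np

def voxelize_obstacle(points: np.ndarray, side: float = 0.05, n_voxels: int = 64) -> np.ndarray:
    """Quantize an (M, 3) obstacle point cloud to cube voxels of side `side`
    and return the flat 3N vector O of occupied-voxel center coordinates."""
    centers = (np.floor(points / side) + 0.5) * side   # snap points to voxel centers
    occupied = np.unique(centers, axis=0)[:n_voxels]   # keep at most N occupied voxels
    out = np.zeros((n_voxels, 3))                      # zero-pad to a fixed size
    out[: len(occupied)] = occupied
    return out.reshape(-1)                             # O = (x1, y1, z1, ..., xN, yN, zN)

# Example: 1000 obstacle points -> a fixed-length voxel vector of 3 * 64 = 192 entries.
O = voxelize_obstacle(np.random.rand(1000, 3))
assert O.shape == (192,)
```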
and 103, acquiring a target pose vector of the mechanical arm and a joint angle vector of the mechanical arm at the current moment.
It should be noted that the target pose of the mechanical arm indicates the position and orientation with which the end effector of the mechanical arm (e.g., a gripper or a welding gun) reaches the target point. Optionally, a coordinate system may be set; for example, a rectangular coordinate system may be established with the center point of the mechanical arm base as the origin. In this coordinate system, the tool center point (TCP) of the mechanical arm represents the end effector, and the target pose vector of the TCP of the mechanical arm is obtained as
P_target = (x, y, z, w, i, j, k)
where x, y and z are the three-dimensional space coordinates of the tool center point of the mechanical arm, and w, i, j and k are the spatial rotation coordinates of the end effector of the mechanical arm expressed as a quaternion.
Further, as an example, assuming that the mechanical arm is a six-degree-of-freedom mechanical arm, the joint angle vector of the mechanical arm at the current moment may be represented as θ_current = (θ_1, θ_2, θ_3, θ_4, θ_5, θ_6), where θ_i is the angle of the ith joint of the mechanical arm at the current moment.
And 104, inputting the voxel vector, the target pose vector of the mechanical arm and the joint angle vector of the mechanical arm at the current moment into a pre-trained motion planner to obtain the joint angle variation of the mechanical arm at the next moment.
As an example, as shown in fig. 2, the voxel vector O, the target pose vector P_target of the mechanical arm and the joint angle vector θ_current of the mechanical arm at the current moment may be concatenated into the input vector I_current of the motion planner:
I_current = (O, P_target, θ_current)
The input vector I_current is input into the pre-trained motion planner to obtain the joint angle variation O_neural of the mechanical arm at the next moment:
O_neural = Δθ = (Δθ_1, Δθ_2, Δθ_3, Δθ_4, Δθ_5, Δθ_6)
where Δθ_i is the joint angle variation of the ith joint of the mechanical arm at the next moment.
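A minimal sketch of this forward pass follows; the multilayer-perceptron architecture and layer sizes are assumptions for illustration (the application does not fix the planner's internal structure), and only the input and output dimensions follow the text.

```python
import torch
import torch.nn as nn

n_voxels, n_joints = 64, 6
planner = nn.Sequential(                             # assumed MLP planner
    nn.Linear(3 * n_voxels + 7 + n_joints, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, n_joints),                        # outputs O_neural = (dtheta_1, ..., dtheta_6)
)

O = torch.zeros(3 * n_voxels)                        # voxel vector of the obstacle
P_target = torch.zeros(7)                            # target pose (x, y, z, w, i, j, k)
theta_current = torch.zeros(n_joints)                # joint angles at the current moment

I_current = torch.cat([O, P_target, theta_current])  # concatenated input vector
delta_theta = planner(I_current)                     # joint angle variation at the next moment
theta_next = theta_current + delta_theta             # joint command for the next action
```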
Optionally, in some embodiments of the present application, the motion planner may be generated by updating the parameters of a strategy neural network with a reinforcement learning algorithm, using a large amount of training data collected in a simulation environment. Specific implementations are described in the following embodiments.
And 105, controlling the mechanical arm to complete the next action according to the joint angle variation of the mechanical arm at the next moment.
That is, each joint of the mechanical arm is controlled to complete the corresponding action according to the joint angle variation O_neural of the mechanical arm at the next moment.
According to the motion planning method for the mechanical arm on the coal mine autonomous inspection platform, the obstacles within the reach of the mechanical arm are voxelized, the joint angle variation of the mechanical arm at the next moment is obtained with the pre-trained motion planner, and the mechanical arm is controlled to complete the next action. The method is suitable for complex working scenes, computes quickly, and meets the real-time control requirements of the mobile inspection platform in the coal mine scene.
It should be noted that the motion planner in the above embodiment needs a large amount of training data for training and learning in order to optimize its performance. A simulation environment can therefore be built to simulate the working scene of the mechanical arm, and the training data are collected in this simulation environment. As an example, as shown in fig. 3, an implementation process of pre-training the motion planner provided in an embodiment of the present application may include the following steps:
step 301, create a simulation environment.
It should be noted that the simulation environment is built based on a physics engine and is used for simulating the working scene of the mechanical arm; it includes the mechanical arm, the inspection platform equipment and the environmental obstacles.
And 302, observing a joint angle vector of the mechanical arm sample at the current moment, a target pose vector of the mechanical arm sample and a voxel vector of the barrier sample at the current moment in the simulation environment, and obtaining a joint angle variation of the mechanical arm sample at the next moment.
As an example, the joint angle vector θ_current of the mechanical arm sample at the current moment, the target pose vector P_target of the mechanical arm sample, and the voxel vector O of the obstacle sample at the current moment are observed in the simulation environment. The joint angle vector of the mechanical arm sample at the current moment, the voxel vector of the obstacle sample at the current moment and the target pose vector of the mechanical arm sample are input into the strategy neural network, and the action a executed by the mechanical arm sample at the next moment is obtained from the action probability distribution output by the strategy neural network.
For example, suppose that at a certain moment the joint angle vector θ_current of the mechanical arm sample, the target pose vector P_target of the mechanical arm sample and the voxel vector O of the obstacle sample at the current moment in the simulation environment are input into the strategy neural network, and the action probability distribution output by the strategy neural network gives three possible values of the action a: action 1 occurs with probability 0.3, action 2 with probability 0.6, and action 3 with probability 0.1. According to this action probability distribution, the action a' executed by the mechanical arm sample at the next moment is most likely to be action 2.
And 303, controlling the mechanical arm sample to complete the next movement according to the joint angle variation of the mechanical arm sample at the next moment in the simulation environment.
At step 304, the status of the robotic arm sample and the status of the obstacle sample, reward signal, and task termination signal after the next action is performed are observed.
Optionally, in some embodiments of the present application, a reward function may be set based on the states of the mechanical arm sample and the obstacle after the next action is executed. As an example, the reward function may be set to a negative joint-angle error of the form
r = -Σ_{i=1}^{6} |θ_i - θ_i^target|
where θ_i is the current angle of the ith joint of the mechanical arm and θ_i^target is the target joint angle of the ith joint. The reward function value corresponding to the states of the mechanical arm sample and the obstacle after the next action is executed is taken as the reward signal r.
In addition, if the mechanical arm sample reaches the target point, the task termination signal may be set to 1, representing that the task in the current simulation environment has ended. The task termination signal is also set to 1 if the number of actions of the mechanical arm sample in the current simulation environment exceeds a preset threshold, for example 1000 actions; this represents that the current simulation environment is abandoned, and the simulation environment may be reset. If the mechanical arm sample has not reached the target point, the task termination signal is set to 0, representing that the next action continues in the current simulation environment.
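The reward and termination logic described above can be sketched as follows; the reward is the illustrative joint-angle error from the formula above, and the tolerance and step budget are assumed values.

```python
import numpy as np

def reward_and_done(theta: np.ndarray, theta_target: np.ndarray,
                    step_count: int, tol: float = 1e-2, max_steps: int = 1000):
    r = -np.sum(np.abs(theta - theta_target))           # reward signal r
    reached = bool(np.all(np.abs(theta - theta_target) < tol))
    # d = 1 if the target is reached or the action budget is exhausted, else 0.
    d = 1 if (reached or step_count > max_steps) else 0
    return r, d
```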
And 305, determining the state of the mechanical arm sample after the next action is executed as a joint angle vector at the next moment of the mechanical arm sample, and determining the state of the obstacle sample after the next action is executed as a voxel vector at the next moment of the obstacle sample.
Step 306, taking the joint angle vector of the mechanical arm sample at the current moment, the target pose vector of the mechanical arm sample, the voxel vector of the barrier sample at the current moment, the action executed by the mechanical arm sample at the next moment, the joint angle vector of the mechanical arm sample at the next moment, the voxel vector of the barrier sample at the next moment, the reward signal and the task termination signal as training data.
As an example, the joint angle vector of the mechanical arm sample at the current moment, the target pose vector of the mechanical arm sample and the voxel vector of the obstacle sample at the current moment may be represented by a state vector s at the current moment, and the joint angle vector of the mechanical arm sample at the next moment, the target pose vector of the mechanical arm sample and the voxel vector of the obstacle sample at the next moment may be represented by a state vector s' at the next moment. The training data (s, a, r, s', d) comprises the state vector s of the mechanical arm sample at the current moment, the action a performed by the mechanical arm sample at the next moment, the reward signal r obtained after the mechanical arm sample performs the action a, the state vector s' of the mechanical arm sample at the next moment after performing the action a, and the task termination signal d. The training data (s, a, r, s', d) of this action of the mechanical arm are put into an experience replay pool.
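A minimal experience replay pool for these (s, a, r, s', d) tuples might look like the following; the capacity is an illustrative choice.

```python
import random
from collections import deque

class ReplayPool:
    """Stores (s, a, r, s', d) transitions and samples uniform mini-batches."""

    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)   # oldest transitions are evicted first

    def put(self, s, a, r, s_next, d):
        self.buffer.append((s, a, r, s_next, d))

    def sample(self, batch_size: int):
        return random.sample(self.buffer, batch_size)
```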
And 307, updating parameters of the strategy neural network by adopting training data based on a reinforcement learning algorithm to obtain model parameters of the strategy neural network, and generating a motion planner according to the model parameters.
As an example, based on the SAC (Soft Actor-Critic) reinforcement learning algorithm, a batch of training data may be sampled from the experience replay pool; a Q neural network training label is calculated from the training data; Q neural network prediction data are calculated from the training data; the gradient vector of the Q neural network is calculated from the Q neural network training label and the Q neural network prediction data; and the parameters of the strategy neural network are updated according to the gradient vector of the Q neural network, with the motion planner generated from the resulting model parameters.
By implementing this embodiment of the application, a motion planner for the mechanical arm can be provided, and this motion planner can be used to plan the motion of the mechanical arm in the actual scene of the coal mine autonomous inspection platform, so as to meet the real-time control requirements of the mobile inspection platform in the coal mine scene.
It should be noted that the motion planner generated by using the SAC reinforcement learning algorithm is more suitable for a complex working environment and has sufficiently high sample efficiency, which further improves the performance of the motion planner. As an example, as shown in fig. 4, another implementation process of pre-training the motion planner provided in an embodiment of the present application may include the following steps:
step 401, creating a simulation environment.
And 402, observing a joint angle vector of the mechanical arm sample at the current moment, a target pose vector of the mechanical arm sample and a voxel vector of the barrier sample at the current moment in the simulation environment, and obtaining a joint angle variation of the mechanical arm sample at the next moment.
And step 403, controlling the mechanical arm sample to complete the next movement according to the joint angle variation of the mechanical arm sample at the next moment in the simulation environment.
At step 404, the robot arm sample and obstacle sample status, reward signal, and task termination signal after performing the next action are observed.
Step 405, determining the state of the mechanical arm sample after the next action is executed as the joint angle vector of the mechanical arm sample at the next moment, and determining the state of the obstacle sample after the next action is executed as the voxel vector of the obstacle sample at the next moment.
And step 406, taking the joint angle vector of the mechanical arm sample at the current moment, the target pose vector of the mechanical arm sample, the voxel vector of the barrier sample at the current moment, the action executed by the mechanical arm sample at the next moment, the joint angle vector of the mechanical arm sample at the next moment, the voxel vector of the barrier sample at the next moment, the reward signal and the task termination signal as training data.
Step 407, calculating a Q neural network training label according to the training data based on a reinforcement learning algorithm.
As an example, the Q neural network training label y may be calculated by the following formula, which matches the standard SAC target:
y(r, s', d) = r + γ(1 - d)[ min_{i=1,2} Q_{φ_targ,i}(s', ã') - α·log π_θ(ã'|s') ]
where r is the reward signal; s' is the state vector at the next moment, comprising the joint angle vector of the mechanical arm sample at the next moment, the voxel vector of the obstacle sample at the next moment and the target pose vector of the mechanical arm sample; d is the task termination signal; γ is a first discount coefficient, where γ ranges from 0 to 1; ã' is the action made after the state vector s' at the next moment is input to the strategy neural network; Q_{φ_targ,i}(s', ã') is the value of the ith target Q neural network for the action ã' under the state vector s' at the next moment; α is a second discount coefficient (playing the role of the entropy temperature in SAC), and as an example α may be 0.01; and log π_θ(ã'|s') is the log-probability of the strategy neural network making the action ã' under the state vector s' at the next moment.
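A sketch of this label computation is given below, assuming twin target Q networks and a policy `pi` that returns an action together with its log-probability (standard SAC components; the network objects themselves are placeholders).

```python
import torch

def q_training_label(r, s_next, d, pi, q_targ1, q_targ2,
                     gamma: float = 0.99, alpha: float = 0.01):
    """y = r + gamma * (1 - d) * (min_i Q_targ_i(s', a~') - alpha * log pi(a~'|s'))."""
    with torch.no_grad():                           # training labels carry no gradient
        a_next, logp_next = pi(s_next)              # a~' and log pi(a~'|s')
        q_min = torch.min(q_targ1(s_next, a_next),
                          q_targ2(s_next, a_next))  # min over the two target critics
        return r + gamma * (1.0 - d) * (q_min - alpha * logp_next)
```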
And step 408, calculating Q neural network prediction data according to the training data.
The training data (s, a, r, s', d) are input into the Q neural network to obtain the Q neural network prediction data Q_{φ_i}(s, a), i.e., the predicted value of the mechanical arm sample executing the action a under the state vector s at the current moment.
And step 409, calculating the gradient vector of the Q neural network according to the Q neural network training label and the Q neural network prediction data.
As an example, the gradient vector of the Q neural network may be calculated by the following formula:
∇_{φ_i} (1/|B|) Σ_{(s,a,r,s',d)∈B} ( Q_{φ_i}(s, a) - y(r, s', d) )²
where B is the batch of training data collected from the experience replay pool and |B| is its size, Q_{φ_i}(s, a) is the prediction data of the ith Q neural network, and y(r, s', d) is the training label of the ith Q neural network.
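Equivalently, each Q network can be trained by minimizing the mean squared error between its prediction and the label, as sketched below (the batch is assumed to come from the replay pool above).

```python
import torch.nn.functional as F

def critic_loss(q_net, s, a, y):
    q_pred = q_net(s, a)            # Q neural network prediction data Q(s, a)
    return F.mse_loss(q_pred, y)    # (1/|B|) * sum (Q(s, a) - y)^2

# Usage: loss = critic_loss(q1, s, a, y); loss.backward(); optimizer1.step()
```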
Step 410, updating parameters of the strategy neural network according to the gradient vector of the Q neural network, obtaining model parameters of the strategy neural network, and generating a motion planner according to the model parameters.
Optionally, in some embodiments of the present application, the Q neural network parameters φ_i may be updated by descending the gradient vector of the Q neural network, and the target Q neural network parameters φ_targ,i updated from them (in standard SAC, by the polyak average φ_targ,i ← ρ·φ_targ,i + (1 - ρ)·φ_i); the parameters of the strategy neural network are then updated according to the Q neural network function values, in standard SAC by ascending the objective
(1/|B|) Σ_{s∈B} [ min_{i=1,2} Q_{φ_i}(s, ã_θ(s)) - α·log π_θ(ã_θ(s)|s) ].
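A sketch of these two updates under the standard SAC formulation follows; the polyak coefficient rho is an assumed hyperparameter not given in the text.

```python
import torch

def polyak_update(q_targ, q, rho: float = 0.995):
    """phi_targ <- rho * phi_targ + (1 - rho) * phi (target Q network update)."""
    with torch.no_grad():
        for p_targ, p in zip(q_targ.parameters(), q.parameters()):
            p_targ.mul_(rho).add_((1.0 - rho) * p)

def policy_loss(pi, q1, q2, s, alpha: float = 0.01):
    """Minimizing alpha * log pi - min_i Q_i ascends the SAC policy objective."""
    a, logp = pi(s)
    q_min = torch.min(q1(s, a), q2(s, a))
    return (alpha * logp - q_min).mean()
```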
In the embodiment of the present application, steps 401 to 406 may be implemented by any one of the methods in the embodiments of the present application, and the present application is not specifically limited and is not described again.
By implementing this embodiment of the application, a motion planner for the mechanical arm generated with the SAC reinforcement learning algorithm can be provided, which is better suited to the actual scene of the coal mine autonomous inspection platform.
In order to better understand the method for planning the motion of the mechanical arm on the autonomous inspection platform for the coal mine provided by the embodiments of the application, a detailed description is given below with reference to fig. 5. Fig. 5 is a schematic diagram of a method for planning the motion of a mechanical arm on an autonomous inspection platform for a coal mine according to an embodiment of the application. As shown in fig. 5, a simulation environment is created (501), and the joint angle vector of the mechanical arm sample at the current moment, the target pose vector of the mechanical arm sample and the voxel vector of the obstacle sample at the current moment in the simulation environment are observed and input to the strategy neural network (502), so that the joint angle variation (i.e., the action value) of the mechanical arm sample at the next moment is obtained (503). The mechanical arm sample in the simulation environment is controlled to execute the next action according to the acquired joint angle variation of the mechanical arm sample at the next moment. The joint angle vector of the mechanical arm sample at the next moment and the voxel vector of the obstacle sample at the next moment after the next action are observed, and the joint angle vector of the mechanical arm sample at the current moment, the target pose vector of the mechanical arm sample, the voxel vector of the obstacle sample at the current moment, the action executed by the mechanical arm sample at the next moment, the joint angle vector of the mechanical arm sample at the next moment, the voxel vector of the obstacle sample at the next moment, the reward signal and the task termination signal are put into the experience replay pool as training data (504). When enough training data have accumulated, part of the training data is sampled from the experience replay pool (505) and the parameters of the strategy neural network are adjusted (506), so that the motion planner is generated. The pre-trained motion planner is deployed into the actual scene (507); the sensing signals in the real working scene of the mechanical arm (the target pose vector of the mechanical arm, the joint angle vector of the mechanical arm at the current moment and the voxel vector of the obstacle at the current moment) are sent to the motion planner (508) to obtain the joint angle variation of the mechanical arm at the next moment, and the mechanical arm is controlled to execute the next action according to the joint angle variation of the mechanical arm at the next moment (509).
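The deployed control loop (steps 507-509) can be sketched as below; the sensing and actuation helpers are hypothetical stubs standing in for the real inspection-platform interfaces, and only the data flow follows the text.

```python
import torch

def read_voxel_vector():  return torch.zeros(192)  # stub: perception output, step 508
def read_target_pose():   return torch.zeros(7)    # stub: task target pose
def read_joint_angles():  return torch.zeros(6)    # stub: joint encoder readings
def send_joint_command(theta):  pass               # stub: arm controller interface

def control_step(planner):
    O, p_target, theta = read_voxel_vector(), read_target_pose(), read_joint_angles()
    i_current = torch.cat([O, p_target, theta])    # sensing signals to the planner
    delta_theta = planner(i_current)               # joint angle variation at next moment
    send_joint_command(theta + delta_theta)        # execute the next action, step 509
```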
In order to realize the embodiment, the application also provides a mechanical arm motion planning device on the autonomous inspection platform of the coal mine. Fig. 6 is a schematic structural diagram of a mechanical arm movement planning device on an autonomous inspection platform for a coal mine according to an embodiment of the present application, and as shown in fig. 6, the mechanical arm movement planning device on the autonomous inspection platform for a coal mine includes: a detection module 601, a processing module 602, an acquisition module 603, a planning module 604, and a control module 605.
The detection module 601 is configured to detect an obstacle within a reach of the mechanical arm.
The processing module 602 is configured to perform voxelization processing on the obstacle to obtain a voxel vector of the obstacle.
The obtaining module 603 is configured to obtain a target pose vector of the mechanical arm and a joint angle vector of the mechanical arm at the current time.
And the planning module 604 is configured to input the voxel vector, the target pose vector of the mechanical arm, and the joint angle vector of the mechanical arm at the current moment to a pre-trained motion planner to obtain a joint angle variation of the mechanical arm at the next moment.
And the control module 605 is configured to control the robot arm to complete the next action according to the joint angle variation at the next moment of the robot arm.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
According to the mechanical arm motion planning device on the coal mine autonomous inspection platform, the obstacles within the reach of the mechanical arm are voxelized, the joint angle variation of the mechanical arm at the next moment is obtained with the pre-trained motion planner, and the mechanical arm is controlled to complete the next action. The device is suitable for complex working scenes, computes quickly, and meets the real-time control requirements of the mobile inspection platform in the coal mine scene.
Fig. 7 is a schematic structural diagram of another device for planning the motion of the mechanical arm on the autonomous inspection platform for the coal mine according to the embodiment of the present application, and as shown in fig. 7, the device for planning the motion of the mechanical arm on the autonomous inspection platform for the coal mine further includes: a training module 706. Wherein 701-705 in fig. 7 and 601-605 in fig. 6 have the same functions and structures.
Wherein the training module 706 is specifically configured to: creating a simulation environment; observing a joint angle vector of a mechanical arm sample at the current moment, a target pose vector of the mechanical arm sample and a voxel vector of an obstacle sample at the current moment in the simulation environment, and obtaining an action executed by the mechanical arm sample at the next moment; executing the action at the next moment in the simulation environment, and controlling the mechanical arm sample to complete the next movement; observing the states of the mechanical arm sample and the obstacle sample, a reward signal and a task termination signal after the next action is executed; determining the state of the mechanical arm sample after the next action is executed as a joint angle vector of the mechanical arm sample at the next moment, and determining the state of the obstacle sample after the next action is executed as a voxel vector of the obstacle sample at the next moment; taking a joint angle vector of the mechanical arm sample at the current moment, a target pose vector of the mechanical arm sample, a voxel vector of the barrier sample at the current moment, an action executed by the mechanical arm sample at the next moment, a joint angle vector of the mechanical arm sample at the next moment, a voxel vector of the barrier sample at the next moment, a reward signal and a task termination signal as training data; based on a reinforcement learning algorithm, parameters of the strategy neural network are updated by adopting training data to obtain model parameters of the strategy neural network, and a motion planner is generated according to the model parameters.
In some embodiments of the present application, the training module 706 is further configured to: calculating a Q neural network training label according to the training data; calculating Q neural network prediction data according to the training data; calculating a gradient vector of the Q neural network according to the Q neural network training label and the Q neural network prediction data; and updating parameters of the strategy neural network according to the gradient vector of the Q neural network.
In some embodiments of the present application, the training module 706 is further configured to: updating a Q neural network function value according to the gradient vector of the Q neural network; and updating parameters of the strategy neural network according to the Q neural network function value.
In some embodiments of the present application, the Q neural network training labels are calculated by the following formula:
Figure BDA0003501334590000131
with regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
According to the mechanical arm motion planning device on the coal mine autonomous inspection platform, the motion planner generated by the SAC reinforcement learning algorithm is better suited to the actual scene of the coal mine autonomous inspection platform, has sufficiently high sample efficiency, and further improves the performance of the motion planner.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (10)

1. A method for planning the motion of a mechanical arm on an autonomous inspection platform of a coal mine is characterized by comprising the following steps:
detecting obstacles within the reach range of the mechanical arm;
carrying out voxelization processing on the obstacle to obtain a voxel vector of the obstacle;
acquiring a target pose vector of the mechanical arm and a joint angle vector of the mechanical arm at the current moment;
inputting the voxel vector, the target pose vector of the mechanical arm and the joint angle vector of the mechanical arm at the current moment into a pre-trained motion planner to obtain the joint angle variation of the mechanical arm at the next moment;
and controlling the mechanical arm to finish the next action according to the joint angle variation of the mechanical arm at the next moment.
2. The method of claim 1, wherein the motion planner is pre-trained by:
creating a simulation environment;
observing a joint angle vector of a mechanical arm sample at the current moment, a target pose vector of the mechanical arm sample and a voxel vector of an obstacle sample at the current moment in the simulation environment, and obtaining a joint angle variation of the mechanical arm sample at the next moment;
controlling the mechanical arm sample to complete the next step of movement according to the joint angle variation at the next moment in the simulation environment;
observing the state of the mechanical arm sample and the state of the obstacle sample, a reward signal and a task termination signal after the next action is performed;
determining the state of the mechanical arm sample after the next action is executed as a joint angle vector of the mechanical arm sample at the next moment, and determining the state of the obstacle sample after the next action is executed as a voxel vector of the obstacle sample at the next moment;
taking the joint angle vector of the mechanical arm sample at the current moment, the target pose vector of the mechanical arm sample and the voxel vector of the barrier sample at the current moment, the action executed by the mechanical arm sample at the next moment, the joint angle vector of the mechanical arm sample at the next moment, the voxel vector of the barrier sample at the next moment, the reward signal and the task termination signal as training data;
and updating parameters of the strategy neural network by adopting the training data based on a reinforcement learning algorithm to obtain model parameters of the strategy neural network, and generating the motion planner according to the model parameters.
3. The method of claim 2, wherein the updating parameters of the neural network of the strategy using the training data based on the reinforcement learning algorithm comprises:
calculating a Q neural network training label according to the training data;
calculating Q neural network prediction data according to the training data;
calculating a gradient vector of the Q neural network according to the Q neural network training label and the Q neural network prediction data;
and updating parameters of the strategy neural network according to the gradient vector of the Q neural network.
4. The method of claim 3, wherein updating parameters of a policy neural network according to gradient vectors of the Q neural network comprises:
updating the Q neural network function value according to the gradient vector of the Q neural network;
and updating parameters of the strategy neural network according to the Q neural network function value.
5. The method of claim 3 or 4, wherein the Q neural network training labels are calculated by the following formula:
y(r, s', d) = r + γ(1 - d)[ min_{i=1,2} Q_{φ_targ,i}(s', ã') - α·log π_θ(ã'|s') ]
wherein r is the reward signal; s' is the state vector at the next moment, and includes the joint angle vector of the mechanical arm sample at the next moment, the voxel vector of the obstacle sample at the next moment, and the target pose vector of the mechanical arm sample; d is the task termination signal; γ is a first discount coefficient, wherein the range of γ is 0-1; ã' is the action made after the state vector s' at the next moment is input to the strategy neural network; Q_{φ_targ,i}(s', ã') is the value of the ith Q neural network for the action ã' under the state vector s' at the next moment; α is a second discount coefficient; and log π_θ(ã'|s') is the log-probability of the strategy neural network making the action ã' under the state vector s' at the next moment.
6. A mechanical arm motion planning device on a coal mine autonomous inspection platform, characterized in that it comprises:
the detection module is used for detecting obstacles within the reach range of the mechanical arm;
the processing module is used for carrying out voxelization processing on the obstacle to obtain a voxel vector of the obstacle;
the acquisition module is used for acquiring a target pose vector of the mechanical arm and a joint angle vector of the mechanical arm at the current moment;
the planning module is used for inputting the voxel vector, the target pose vector of the mechanical arm and the joint angle vector of the mechanical arm at the current moment into a pre-trained motion planner to obtain the joint angle variation of the mechanical arm at the next moment;
and the control module is used for controlling the mechanical arm to complete the next action according to the joint angle variation of the mechanical arm at the next moment.
7. The apparatus according to claim 6, further comprising a training module, specifically configured to:
creating a simulation environment;
observing a joint angle vector of a mechanical arm sample at the current moment, a target pose vector of the mechanical arm sample and a voxel vector of an obstacle sample at the current moment in the simulation environment, and obtaining an action executed by the mechanical arm sample at the next moment;
executing the action at the next moment in the simulation environment, and controlling the mechanical arm sample to complete the next movement;
observing the robot arm sample and the obstacle sample status, reward signals and task termination signals after the next action is performed;
determining the state of the mechanical arm sample after the next action is executed as a joint angle vector of the mechanical arm sample at the next moment, and determining the state of the obstacle sample after the next action is executed as a voxel vector of the obstacle sample at the next moment;
taking the joint angle vector of the mechanical arm sample at the current moment, the target pose vector of the mechanical arm sample and the voxel vector of the barrier sample at the current moment, the action executed by the mechanical arm sample at the next moment, the joint angle vector of the mechanical arm sample at the next moment, the voxel vector of the barrier sample at the next moment, the reward signal and the task termination signal as training data;
and updating parameters of the strategy neural network by adopting the training data based on a reinforcement learning algorithm to obtain model parameters of the strategy neural network, and generating the motion planner according to the model parameters.
8. The apparatus of claim 7, wherein the training module is further configured to:
calculating a Q neural network training label according to the training data;
calculating Q neural network prediction data according to the training data;
calculating a gradient vector of the Q neural network according to the Q neural network training label and the Q neural network prediction data;
and updating parameters of the strategy neural network according to the gradient vector of the Q neural network.
9. The apparatus of claim 8, wherein the training module is further configured to:
updating the Q neural network function value according to the gradient vector of the Q neural network;
and updating parameters of the strategy neural network according to the Q neural network function value.
10. The apparatus of claim 8 or 9, wherein the Q neural network training labels are calculated by the following formula:
y(r, s', d) = r + γ(1 - d)[ min_{i=1,2} Q_{φ_targ,i}(s', ã') - α·log π_θ(ã'|s') ]
wherein r is the reward signal; s' is the state vector at the next moment, and includes the joint angle vector of the mechanical arm sample at the next moment, the voxel vector of the obstacle sample at the next moment, and the target pose vector of the mechanical arm sample; d is the task termination signal; γ is a first discount coefficient, wherein the range of γ is 0-1; ã' is the action made after the state vector s' at the next moment is input to the strategy neural network; Q_{φ_targ,i}(s', ã') is the value of the ith Q neural network for the action ã' under the state vector s' at the next moment; α is a second discount coefficient; and log π_θ(ã'|s') is the log-probability of the strategy neural network making the action ã' under the state vector s' at the next moment.
CN202210127986.3A 2022-02-11 2022-02-11 Mechanical arm motion planning method and device on autonomous inspection platform of coal mine Active CN114367988B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210127986.3A CN114367988B (en) 2022-02-11 2022-02-11 Mechanical arm motion planning method and device on autonomous inspection platform of coal mine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210127986.3A CN114367988B (en) 2022-02-11 2022-02-11 Mechanical arm motion planning method and device on autonomous inspection platform of coal mine

Publications (2)

Publication Number Publication Date
CN114367988A true CN114367988A (en) 2022-04-19
CN114367988B CN114367988B (en) 2023-11-17

Family

ID=81145638

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210127986.3A Active CN114367988B (en) 2022-02-11 2022-02-11 Mechanical arm motion planning method and device on autonomous inspection platform of coal mine

Country Status (1)

Country Link
CN (1) CN114367988B (en)


Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10936947B1 (en) * 2017-01-26 2021-03-02 Amazon Technologies, Inc. Recurrent neural network-based artificial intelligence system for time series predictions
CN107169986A (en) * 2017-05-23 2017-09-15 北京理工大学 A kind of obstacle detection method and system
US10668621B1 (en) * 2017-06-23 2020-06-02 Amazon Technologies, Inc. Fast motion planning collision detection
CN108375379A (en) * 2018-02-01 2018-08-07 上海理工大学 The fast path planing method and mobile robot of dual DQN based on variation
CN108762281A (en) * 2018-06-08 2018-11-06 哈尔滨工程大学 An embedded real-time underwater intelligent robot decision-making method based on memory-association reinforcement learning
WO2020062165A1 (en) * 2018-09-29 2020-04-02 区链通网络有限公司 Method, node and system for training reinforcement learning model, and storage medium
US20200130177A1 (en) * 2018-10-29 2020-04-30 Hrl Laboratories, Llc Systems and methods for few-shot transfer learning
CN109960880A (en) * 2019-03-26 2019-07-02 上海交通大学 A kind of industrial robot obstacle-avoiding route planning method based on machine learning
KR20210063791A (en) * 2019-11-25 2021-06-02 한국기술교육대학교 산학협력단 System for mapless navigation based on dqn and slam considering characteristic of obstacle and processing method thereof
CN110977967A (en) * 2019-11-29 2020-04-10 天津博诺智创机器人技术有限公司 Robot path planning method based on deep reinforcement learning
CN113492402A (en) * 2020-04-03 2021-10-12 发那科株式会社 Fast robot motion optimization with distance field
CN112153615A (en) * 2020-09-15 2020-12-29 南京信息工程大学滨江学院 Deep learning-based user association method in multi-cell cellular D2D equipment

Also Published As

Publication number Publication date
CN114367988B (en) 2023-11-17

Similar Documents

Publication Publication Date Title
Lakshmanan et al. Complete coverage path planning using reinforcement learning for tetromino based cleaning and maintenance robot
CN110000785B (en) Agricultural scene calibration-free robot motion vision cooperative servo control method and equipment
CN109397285B (en) Assembly method, assembly device and assembly equipment
CN109910018B (en) Robot virtual-real interaction operation execution system and method with visual semantic perception
CN105082132A (en) Rapid robotic imitation learning of force-torque tasks
EP3585569B1 (en) Systems, apparatus, and methods for robotic learning and execution of skills
Leiva et al. Robust rl-based map-less local planning: Using 2d point clouds as observations
Sadeghzadeh et al. Self-learning visual servoing of robot manipulator using explanation-based fuzzy neural networks and Q-learning
CN107309873B (en) Mechanical arm motion control method and system
Tung et al. Socially aware robot navigation using deep reinforcement learning
CN115038554A (en) Construction of complex scenarios for autonomous machines based on sensors
Gutzeit et al. The besman learning platform for automated robot skill learning
CN114367988B (en) Mechanical arm motion planning method and device on autonomous inspection platform of coal mine
WO2021008798A1 (en) Training of a convolutional neural network
Martyshkin et al. Development and investigation of a motion planning algorithm for a mobile robot with a smart machine vision system
Xu et al. Indoor multi-sensory self-supervised autonomous mobile robotic navigation
Nikitenko et al. RRTs postprocessing for uncertain environments
Han Trajectory tracking control method for flexible robot of construction machinery based on computer vision
Tian et al. Fruit Picking Robot Arm Training Solution Based on Reinforcement Learning in Digital Twin
Anderson et al. Autonomous navigation via a deep Q network with one-hot image encoding
Ding et al. Improving motion of robotic manipulators by an embedded optimizer
Guan Self-inspection method of unmanned aerial vehicles in power plants using deep q-network reinforcement learning
Supakar et al. PSO obstacle avoidance algorithm for robot in unknown environment
US11921492B2 (en) Transfer between tasks in different domains
Sunwoo et al. Optimal Path Search for Robot Manipulator using Deep Reinforcement Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant