CN114367988A - Mechanical arm motion planning method and device on autonomous inspection platform of coal mine


Info

Publication number
CN114367988A
CN114367988A
Authority
CN
China
Prior art keywords
mechanical arm
vector
sample
neural network
joint angle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210127986.3A
Other languages
Chinese (zh)
Other versions
CN114367988B (en)
Inventor
付鑫
王海军
杨晓辉
王洪磊
孟繁悦
吴太晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
General Coal Research Institute Co Ltd
Original Assignee
General Coal Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by General Coal Research Institute Co Ltd filed Critical General Coal Research Institute Co Ltd
Priority to CN202210127986.3A priority Critical patent/CN114367988B/en
Publication of CN114367988A publication Critical patent/CN114367988A/en
Application granted granted Critical
Publication of CN114367988B publication Critical patent/CN114367988B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B25 - HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J - MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 - Programme-controlled manipulators
    • B25J9/16 - Programme controls
    • B25J9/1656 - Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1664 - Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning
    • B25J9/1666 - Avoiding collision or forbidden zones
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B25 - HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J - MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 - Programme-controlled manipulators
    • B25J9/16 - Programme controls
    • B25J9/1602 - Programme controls characterised by the control system, structure, architecture
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B25 - HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J - MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 - Programme-controlled manipulators
    • B25J9/16 - Programme controls
    • B25J9/1628 - Programme controls characterised by the control loop
    • B25J9/163 - Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control

Abstract

The application provides a method and a device for planning the motion of a mechanical arm on an autonomous inspection platform of a coal mine, wherein the method comprises the following steps: detecting obstacles within the reach range of the mechanical arm; carrying out voxelization processing on the obstacle to obtain a voxel vector of the obstacle; acquiring a target pose vector of the mechanical arm and a joint angle vector of the mechanical arm at the current moment; inputting the voxel vector, the target pose vector of the mechanical arm and the joint angle vector of the mechanical arm at the current moment into a pre-trained motion planner to obtain the joint angle variation of the mechanical arm at the next moment; and controlling the mechanical arm to complete the next action according to the joint angle variation of the mechanical arm at the next moment. By voxelizing the obstacles within the reach of the mechanical arm, obtaining the joint angle variation of the mechanical arm at the next moment from the pre-trained motion planner, and controlling the mechanical arm to complete the next action, the method and the device are suitable for complex working scenes and meet the real-time control requirements of the mobile inspection platform in a coal mine scene.

Description

Mechanical arm motion planning method and device on autonomous inspection platform of coal mine
Technical Field
The application relates to the technical field of mechanical arm motion planning, in particular to a mechanical arm motion planning method and device on an autonomous inspection platform of a coal mine.
Background
In coal mines, many devices on the surface and underground need periodic monitoring, such as belt conveyors, power distribution rooms, fully mechanized coal mining face equipment, ventilation equipment, and drainage equipment; when faults occur, simple operations are often required, such as closing a switch in a power distribution room. To realize these functions, an autonomous navigation mobile platform is combined with a mechanical arm, and the mechanical arm is applied to the inspection platform. Compared with a traditional factory workshop, the working environment of the underground inspection platform is far more complex: it is unstructured, involves personnel, cannot be protected by fixed safety zones, and changes dynamically and unpredictably. This places higher requirements on the motion planning of the mechanical arm.
Disclosure of Invention
The application provides a method and a device for planning the motion of a mechanical arm on an autonomous inspection platform of a coal mine, which are used for improving the trajectory quality of the motion planning of the mechanical arm in a complex environment.
The embodiment of the first aspect of the application provides a method for planning the motion of a mechanical arm on an autonomous inspection platform of a coal mine, which comprises the following steps:
detecting obstacles within the reach range of the mechanical arm;
carrying out voxelization processing on the obstacle to obtain a voxel vector of the obstacle;
acquiring a target pose vector of the mechanical arm and a joint angle vector of the mechanical arm at the current moment;
inputting the voxel vector, the target pose vector of the mechanical arm and the joint angle vector of the mechanical arm at the current moment into a pre-trained motion planner to obtain the joint angle variation of the mechanical arm at the next moment;
and controlling the mechanical arm to finish the next action according to the joint angle variation of the mechanical arm at the next moment.
The embodiment of the second aspect of the application provides a mechanical arm motion planning device on a coal mine autonomous inspection platform, which comprises:
the detection module is used for detecting obstacles within the reach range of the mechanical arm;
the processing module is used for carrying out voxelization processing on the obstacle to obtain a voxel vector of the obstacle;
the acquisition module is used for acquiring a target pose vector of the mechanical arm and a joint angle vector of the mechanical arm at the current moment;
the planning module is used for inputting the voxel vector, the target pose vector of the mechanical arm and the joint angle vector of the mechanical arm at the current moment into a pre-trained motion planner to obtain the joint angle variation of the mechanical arm at the next moment;
and the control module is used for controlling the mechanical arm to complete the next action according to the joint angle variation of the mechanical arm at the next moment.
The technical scheme provided by the embodiments of the application can have the following beneficial effects: the obstacles within the reach of the mechanical arm are voxelized, the joint angle variation of the mechanical arm at the next moment is obtained with the pre-trained motion planner, and the mechanical arm is controlled to complete the next action. The method and the device are suitable for complex working scenes, compute quickly, and meet the real-time control requirements of the mobile inspection platform in the coal mine scene.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a schematic flow chart of a method for planning the movement of a mechanical arm on an autonomous inspection platform for a coal mine according to an embodiment of the present application;
fig. 2 is a schematic diagram of a motion planner according to an embodiment of the present application;
fig. 3 is a schematic diagram of a training process of a motion planner according to an embodiment of the present application;
fig. 4 is a schematic diagram of another training process of a motion planner according to an embodiment of the present application;
fig. 5 is a schematic diagram of a method for planning the motion of a mechanical arm on an autonomous inspection platform for a coal mine according to an embodiment of the application;
fig. 6 is a schematic structural diagram of a mechanical arm movement planning device on an autonomous inspection platform for a coal mine according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of another device for planning the motion of a mechanical arm on an autonomous inspection platform for a coal mine according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
The method and the device for planning the motion of the mechanical arm on the autonomous inspection platform for the coal mine according to the embodiment of the application are described below with reference to the accompanying drawings.
Fig. 1 is a schematic flow chart of a method for planning the movement of a mechanical arm on an autonomous inspection platform for a coal mine according to an embodiment of the present application. As shown in fig. 1, the method for planning the motion of the mechanical arm on the autonomous inspection platform for the coal mine may include the following steps:
step 101, detecting obstacles within the reach of the mechanical arm.
It should be noted that an obstacle is anything within the reach of the mechanical arm that may interfere with its movement, for example inspection platform equipment, sundries, or staff.
And 102, carrying out voxelization processing on the obstacle to obtain a voxel vector of the obstacle.
In some embodiments of the present application, an obstacle may be decomposed into a fixed number N of cube voxels with a fixed side length L, and the three-dimensional coordinates (x, y, z) of the center points of the cube voxels are concatenated into a voxel vector O of the obstacle with length 3N:
O = (x_1, y_1, z_1, x_2, y_2, z_2, …, x_N, y_N, z_N)
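For illustration, a minimal sketch of this voxelization step is given below, assuming the obstacle is available as a point cloud in the arm base frame; the side length, voxel count and zero-padding scheme are illustrative assumptions, not values fixed by the application.

```python
import numpy as np

def voxelize_obstacle(points: np.ndarray, side: float = 0.05, n_voxels: int = 64) -> np.ndarray:
    """Quantize an (M, 3) obstacle point cloud to cube voxels of side `side`
    and return the flat 3N vector O of occupied-voxel center coordinates."""
    centers = (np.floor(points / side) + 0.5) * side   # snap points to voxel centers
    occupied = np.unique(centers, axis=0)[:n_voxels]   # keep at most N occupied voxels
    out = np.zeros((n_voxels, 3))                      # zero-pad to a fixed size
    out[: len(occupied)] = occupied
    return out.reshape(-1)                             # O = (x1, y1, z1, ..., xN, yN, zN)

# Example: 1000 obstacle points -> a fixed-length voxel vector of 3 * 64 = 192 entries.
O = voxelize_obstacle(np.random.rand(1000, 3))
assert O.shape == (192,)
```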
and 103, acquiring a target pose vector of the mechanical arm and a joint angle vector of the mechanical arm at the current moment.
It should be noted that the target pose of the mechanical arm indicates the position and orientation with which the end effector of the mechanical arm (e.g., a gripper or a welding gun) reaches the target point. Optionally, a coordinate system may be set; for example, a rectangular coordinate system may be established with the center point of the mechanical arm base as the origin. In this coordinate system, the tool center point (TCP) of the mechanical arm represents the end effector, and the target pose vector of the TCP of the mechanical arm is obtained as
P_target = (x, y, z, w, i, j, k)
where x, y and z are the three-dimensional space coordinates of the tool center point of the mechanical arm, and w, i, j and k are the spatial rotation coordinates of the end effector of the mechanical arm expressed as a quaternion.
Further, as an example, assuming that the mechanical arm is a six-degree-of-freedom mechanical arm, the joint angle vector of the mechanical arm at the current moment may be represented as θ_current = (θ_1, θ_2, θ_3, θ_4, θ_5, θ_6), where θ_i is the angle of the ith joint of the mechanical arm at the current moment.
And 104, inputting the voxel vector, the target pose vector of the mechanical arm and the joint angle vector of the mechanical arm at the current moment into a pre-trained motion planner to obtain the joint angle variation of the mechanical arm at the next moment.
As an example, as shown in fig. 2, the voxel vector O, the target pose vector P_target of the mechanical arm and the joint angle vector θ_current of the mechanical arm at the current moment may be concatenated into the input vector I_current of the motion planner:
I_current = (O, P_target, θ_current)
The input vector I_current is input into the pre-trained motion planner to obtain the joint angle variation O_neural of the mechanical arm at the next moment:
O_neural = Δθ = (Δθ_1, Δθ_2, Δθ_3, Δθ_4, Δθ_5, Δθ_6)
where Δθ_i is the joint angle variation of the ith joint of the mechanical arm at the next moment.
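A minimal sketch of this forward pass follows; the multilayer-perceptron architecture and layer sizes are assumptions for illustration (the application does not fix the planner's internal structure), and only the input and output dimensions follow the text.

```python
import torch
import torch.nn as nn

n_voxels, n_joints = 64, 6
planner = nn.Sequential(                             # assumed MLP planner
    nn.Linear(3 * n_voxels + 7 + n_joints, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, n_joints),                        # outputs O_neural = (dtheta_1, ..., dtheta_6)
)

O = torch.zeros(3 * n_voxels)                        # voxel vector of the obstacle
P_target = torch.zeros(7)                            # target pose (x, y, z, w, i, j, k)
theta_current = torch.zeros(n_joints)                # joint angles at the current moment

I_current = torch.cat([O, P_target, theta_current])  # concatenated input vector
delta_theta = planner(I_current)                     # joint angle variation at the next moment
theta_next = theta_current + delta_theta             # joint command for the next action
```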
Optionally, in some embodiments of the present application, the motion planner may be generated by updating the parameters of a strategy neural network with a reinforcement learning algorithm, using a large amount of training data collected in a simulation environment. Specific implementations are described in the following embodiments.
And 105, controlling the mechanical arm to complete the next action according to the joint angle variation of the mechanical arm at the next moment.
That is, each joint of the mechanical arm is controlled to complete the corresponding action according to the joint angle variation O_neural of the mechanical arm at the next moment.
According to the motion planning method for the mechanical arm on the coal mine autonomous inspection platform, the obstacles within the reach of the mechanical arm are voxelized, the joint angle variation of the mechanical arm at the next moment is obtained with the pre-trained motion planner, and the mechanical arm is controlled to complete the next action. The method is suitable for complex working scenes, computes quickly, and meets the real-time control requirements of the mobile inspection platform in the coal mine scene.
It should be noted that the motion planner in the above embodiment needs a large amount of training data for training and learning in order to optimize its performance. A simulation environment can therefore be built to simulate the working scene of the mechanical arm, and the training data are collected in this simulation environment. As an example, as shown in fig. 3, an implementation process of pre-training the motion planner provided in an embodiment of the present application may include the following steps:
step 301, create a simulation environment.
It should be noted that the simulation environment is built based on a physics engine and is used for simulating the working scene of the mechanical arm; it includes the mechanical arm, the inspection platform equipment and the environmental obstacles.
And 302, observing a joint angle vector of the mechanical arm sample at the current moment, a target pose vector of the mechanical arm sample and a voxel vector of the barrier sample at the current moment in the simulation environment, and obtaining a joint angle variation of the mechanical arm sample at the next moment.
As an example, the joint angle vector θ_current of the mechanical arm sample at the current moment, the target pose vector P_target of the mechanical arm sample, and the voxel vector O of the obstacle sample at the current moment are observed in the simulation environment. The joint angle vector of the mechanical arm sample at the current moment, the voxel vector of the obstacle sample at the current moment and the target pose vector of the mechanical arm sample are input into the strategy neural network, and the action a executed by the mechanical arm sample at the next moment is obtained from the action probability distribution output by the strategy neural network.
For example, suppose that at a certain moment the joint angle vector θ_current of the mechanical arm sample, the target pose vector P_target of the mechanical arm sample and the voxel vector O of the obstacle sample at the current moment in the simulation environment are input into the strategy neural network, and the action probability distribution output by the strategy neural network gives three possible values of the action a: action 1 occurs with probability 0.3, action 2 with probability 0.6, and action 3 with probability 0.1. According to this action probability distribution, the action a' executed by the mechanical arm sample at the next moment is most likely to be action 2.
And 303, controlling the mechanical arm sample to complete the next movement according to the joint angle variation of the mechanical arm sample at the next moment in the simulation environment.
At step 304, the status of the robotic arm sample and the status of the obstacle sample, reward signal, and task termination signal after the next action is performed are observed.
Optionally, in some embodiments of the present application, a reward function may be set based on the states of the mechanical arm sample and the obstacle after the next action is executed. As an example, the reward function may be set to a negative joint-angle error of the form
r = -Σ_{i=1}^{6} |θ_i - θ_i^target|
where θ_i is the current angle of the ith joint of the mechanical arm and θ_i^target is the target joint angle of the ith joint. The reward function value corresponding to the states of the mechanical arm sample and the obstacle after the next action is executed is taken as the reward signal r.
In addition, if the mechanical arm sample reaches the target point, the task termination signal may be set to 1, representing that the task in the current simulation environment has ended. The task termination signal is also set to 1 if the number of actions of the mechanical arm sample in the current simulation environment exceeds a preset threshold, for example 1000 actions; this represents that the current simulation environment is abandoned, and the simulation environment may be reset. If the mechanical arm sample has not reached the target point, the task termination signal is set to 0, representing that the next action continues in the current simulation environment.
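The reward and termination logic described above can be sketched as follows; the reward is the illustrative joint-angle error from the formula above, and the tolerance and step budget are assumed values.

```python
import numpy as np

def reward_and_done(theta: np.ndarray, theta_target: np.ndarray,
                    step_count: int, tol: float = 1e-2, max_steps: int = 1000):
    r = -np.sum(np.abs(theta - theta_target))           # reward signal r
    reached = bool(np.all(np.abs(theta - theta_target) < tol))
    # d = 1 if the target is reached or the action budget is exhausted, else 0.
    d = 1 if (reached or step_count > max_steps) else 0
    return r, d
```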
And 305, determining the state of the mechanical arm sample after the next action is executed as a joint angle vector at the next moment of the mechanical arm sample, and determining the state of the obstacle sample after the next action is executed as a voxel vector at the next moment of the obstacle sample.
Step 306, taking the joint angle vector of the mechanical arm sample at the current moment, the target pose vector of the mechanical arm sample, the voxel vector of the barrier sample at the current moment, the action executed by the mechanical arm sample at the next moment, the joint angle vector of the mechanical arm sample at the next moment, the voxel vector of the barrier sample at the next moment, the reward signal and the task termination signal as training data.
As an example, the joint angle vector of the mechanical arm sample at the current moment, the target pose vector of the mechanical arm sample and the voxel vector of the obstacle sample at the current moment may be represented by a state vector s at the current moment, and the joint angle vector of the mechanical arm sample at the next moment, the target pose vector of the mechanical arm sample and the voxel vector of the obstacle sample at the next moment may be represented by a state vector s' at the next moment. The training data (s, a, r, s', d) comprises the state vector s of the mechanical arm sample at the current moment, the action a performed by the mechanical arm sample at the next moment, the reward signal r obtained after the mechanical arm sample performs the action a, the state vector s' of the mechanical arm sample at the next moment after performing the action a, and the task termination signal d. The training data (s, a, r, s', d) of this action of the mechanical arm are put into an experience replay pool.
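A minimal experience replay pool for these (s, a, r, s', d) tuples might look like the following; the capacity is an illustrative choice.

```python
import random
from collections import deque

class ReplayPool:
    """Stores (s, a, r, s', d) transitions and samples uniform mini-batches."""

    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)   # oldest transitions are evicted first

    def put(self, s, a, r, s_next, d):
        self.buffer.append((s, a, r, s_next, d))

    def sample(self, batch_size: int):
        return random.sample(self.buffer, batch_size)
```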
And 307, updating parameters of the strategy neural network by adopting training data based on a reinforcement learning algorithm to obtain model parameters of the strategy neural network, and generating a motion planner according to the model parameters.
As an example, based on the SAC (Soft Actor-Critic) reinforcement learning algorithm, a batch of training data may be sampled from the experience replay pool; a Q neural network training label is calculated from the training data; Q neural network prediction data are calculated from the training data; the gradient vector of the Q neural network is calculated from the Q neural network training label and the Q neural network prediction data; and the parameters of the strategy neural network are updated according to the gradient vector of the Q neural network, with the motion planner generated from the resulting model parameters.
By implementing this embodiment of the application, a motion planner for the mechanical arm can be provided, and this motion planner can be used to plan the motion of the mechanical arm in the actual scene of the coal mine autonomous inspection platform, so as to meet the real-time control requirements of the mobile inspection platform in the coal mine scene.
It should be noted that the motion planner generated by using the SAC reinforcement learning algorithm is more suitable for a complex working environment and has sufficiently high sample efficiency, which further improves the performance of the motion planner. As an example, as shown in fig. 4, another implementation process of pre-training the motion planner provided in an embodiment of the present application may include the following steps:
step 401, creating a simulation environment.
And 402, observing a joint angle vector of the mechanical arm sample at the current moment, a target pose vector of the mechanical arm sample and a voxel vector of the barrier sample at the current moment in the simulation environment, and obtaining a joint angle variation of the mechanical arm sample at the next moment.
And step 403, controlling the mechanical arm sample to complete the next movement according to the joint angle variation of the mechanical arm sample at the next moment in the simulation environment.
At step 404, the robot arm sample and obstacle sample status, reward signal, and task termination signal after performing the next action are observed.
Step 405, determining the state of the mechanical arm sample after the next action is executed as the joint angle vector of the mechanical arm sample at the next moment, and determining the state of the obstacle sample after the next action is executed as the voxel vector of the obstacle sample at the next moment.
And step 406, taking the joint angle vector of the mechanical arm sample at the current moment, the target pose vector of the mechanical arm sample, the voxel vector of the barrier sample at the current moment, the action executed by the mechanical arm sample at the next moment, the joint angle vector of the mechanical arm sample at the next moment, the voxel vector of the barrier sample at the next moment, the reward signal and the task termination signal as training data.
Step 407, calculating a Q neural network training label according to the training data based on a reinforcement learning algorithm.
As an example, the Q neural network training label y may be calculated by the following formula, which matches the standard SAC target:
y(r, s', d) = r + γ(1 - d)[ min_{i=1,2} Q_{φ_targ,i}(s', ã') - α·log π_θ(ã'|s') ]
where r is the reward signal; s' is the state vector at the next moment, comprising the joint angle vector of the mechanical arm sample at the next moment, the voxel vector of the obstacle sample at the next moment and the target pose vector of the mechanical arm sample; d is the task termination signal; γ is a first discount coefficient, where γ ranges from 0 to 1; ã' is the action made after the state vector s' at the next moment is input to the strategy neural network; Q_{φ_targ,i}(s', ã') is the value of the ith target Q neural network for the action ã' under the state vector s' at the next moment; α is a second discount coefficient (playing the role of the entropy temperature in SAC), and as an example α may be 0.01; and log π_θ(ã'|s') is the log-probability of the strategy neural network making the action ã' under the state vector s' at the next moment.
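A sketch of this label computation is given below, assuming twin target Q networks and a policy `pi` that returns an action together with its log-probability (standard SAC components; the network objects themselves are placeholders).

```python
import torch

def q_training_label(r, s_next, d, pi, q_targ1, q_targ2,
                     gamma: float = 0.99, alpha: float = 0.01):
    """y = r + gamma * (1 - d) * (min_i Q_targ_i(s', a~') - alpha * log pi(a~'|s'))."""
    with torch.no_grad():                           # training labels carry no gradient
        a_next, logp_next = pi(s_next)              # a~' and log pi(a~'|s')
        q_min = torch.min(q_targ1(s_next, a_next),
                          q_targ2(s_next, a_next))  # min over the two target critics
        return r + gamma * (1.0 - d) * (q_min - alpha * logp_next)
```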
And step 408, calculating Q neural network prediction data according to the training data.
The training data (s, a, r, s', d) are input into the Q neural network to obtain the Q neural network prediction data Q_{φ_i}(s, a), i.e., the predicted value of the mechanical arm sample executing the action a under the state vector s at the current moment.
And step 409, calculating the gradient vector of the Q neural network according to the Q neural network training label and the Q neural network prediction data.
As an example, the gradient vector of the Q neural network may be calculated by the following formula:
∇_{φ_i} (1/|B|) Σ_{(s,a,r,s',d)∈B} ( Q_{φ_i}(s, a) - y(r, s', d) )²
where B is the batch of training data collected from the experience replay pool and |B| is its size, Q_{φ_i}(s, a) is the prediction data of the ith Q neural network, and y(r, s', d) is the training label of the ith Q neural network.
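Equivalently, each Q network can be trained by minimizing the mean squared error between its prediction and the label, as sketched below (the batch is assumed to come from the replay pool above).

```python
import torch.nn.functional as F

def critic_loss(q_net, s, a, y):
    q_pred = q_net(s, a)            # Q neural network prediction data Q(s, a)
    return F.mse_loss(q_pred, y)    # (1/|B|) * sum (Q(s, a) - y)^2

# Usage: loss = critic_loss(q1, s, a, y); loss.backward(); optimizer1.step()
```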
Step 410, updating parameters of the strategy neural network according to the gradient vector of the Q neural network, obtaining model parameters of the strategy neural network, and generating a motion planner according to the model parameters.
Optionally, in some embodiments of the present application, the Q neural network parameters φ_i may be updated by descending the gradient vector of the Q neural network, and the target Q neural network parameters φ_targ,i updated from them (in standard SAC, by the polyak average φ_targ,i ← ρ·φ_targ,i + (1 - ρ)·φ_i); the parameters of the strategy neural network are then updated according to the Q neural network function values, in standard SAC by ascending the objective
(1/|B|) Σ_{s∈B} [ min_{i=1,2} Q_{φ_i}(s, ã_θ(s)) - α·log π_θ(ã_θ(s)|s) ].
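A sketch of these two updates under the standard SAC formulation follows; the polyak coefficient rho is an assumed hyperparameter not given in the text.

```python
import torch

def polyak_update(q_targ, q, rho: float = 0.995):
    """phi_targ <- rho * phi_targ + (1 - rho) * phi (target Q network update)."""
    with torch.no_grad():
        for p_targ, p in zip(q_targ.parameters(), q.parameters()):
            p_targ.mul_(rho).add_((1.0 - rho) * p)

def policy_loss(pi, q1, q2, s, alpha: float = 0.01):
    """Minimizing alpha * log pi - min_i Q_i ascends the SAC policy objective."""
    a, logp = pi(s)
    q_min = torch.min(q1(s, a), q2(s, a))
    return (alpha * logp - q_min).mean()
```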
In the embodiment of the present application, steps 401 to 406 may be implemented by any one of the methods in the embodiments of the present application, and the present application is not specifically limited and is not described again.
By implementing this embodiment of the application, a motion planner for the mechanical arm generated with the SAC reinforcement learning algorithm can be provided, which is better suited to the actual scene of the coal mine autonomous inspection platform.
In order to better understand the method for planning the motion of the mechanical arm on the autonomous inspection platform for the coal mine provided by the embodiments of the application, a detailed description is given below with reference to fig. 5. Fig. 5 is a schematic diagram of a method for planning the motion of a mechanical arm on an autonomous inspection platform for a coal mine according to an embodiment of the application. As shown in fig. 5, a simulation environment is created (501), and the joint angle vector of the mechanical arm sample at the current moment, the target pose vector of the mechanical arm sample and the voxel vector of the obstacle sample at the current moment in the simulation environment are observed and input to the strategy neural network (502), so that the joint angle variation (i.e., the action value) of the mechanical arm sample at the next moment is obtained (503). The mechanical arm sample in the simulation environment is controlled to execute the next action according to the acquired joint angle variation of the mechanical arm sample at the next moment. The joint angle vector of the mechanical arm sample at the next moment and the voxel vector of the obstacle sample at the next moment after the next action are observed, and the joint angle vector of the mechanical arm sample at the current moment, the target pose vector of the mechanical arm sample, the voxel vector of the obstacle sample at the current moment, the action executed by the mechanical arm sample at the next moment, the joint angle vector of the mechanical arm sample at the next moment, the voxel vector of the obstacle sample at the next moment, the reward signal and the task termination signal are put into the experience replay pool as training data (504). When enough training data have accumulated, part of the training data is sampled from the experience replay pool (505) and the parameters of the strategy neural network are adjusted (506), so that the motion planner is generated. The pre-trained motion planner is deployed into the actual scene (507); the sensing signals in the real working scene of the mechanical arm (the target pose vector of the mechanical arm, the joint angle vector of the mechanical arm at the current moment and the voxel vector of the obstacle at the current moment) are sent to the motion planner (508) to obtain the joint angle variation of the mechanical arm at the next moment, and the mechanical arm is controlled to execute the next action according to the joint angle variation of the mechanical arm at the next moment (509).
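The deployed control loop (steps 507-509) can be sketched as below; the sensing and actuation helpers are hypothetical stubs standing in for the real inspection-platform interfaces, and only the data flow follows the text.

```python
import torch

def read_voxel_vector():  return torch.zeros(192)  # stub: perception output, step 508
def read_target_pose():   return torch.zeros(7)    # stub: task target pose
def read_joint_angles():  return torch.zeros(6)    # stub: joint encoder readings
def send_joint_command(theta):  pass               # stub: arm controller interface

def control_step(planner):
    O, p_target, theta = read_voxel_vector(), read_target_pose(), read_joint_angles()
    i_current = torch.cat([O, p_target, theta])    # sensing signals to the planner
    delta_theta = planner(i_current)               # joint angle variation at next moment
    send_joint_command(theta + delta_theta)        # execute the next action, step 509
```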
In order to realize the embodiment, the application also provides a mechanical arm motion planning device on the autonomous inspection platform of the coal mine. Fig. 6 is a schematic structural diagram of a mechanical arm movement planning device on an autonomous inspection platform for a coal mine according to an embodiment of the present application, and as shown in fig. 6, the mechanical arm movement planning device on the autonomous inspection platform for a coal mine includes: a detection module 601, a processing module 602, an acquisition module 603, a planning module 604, and a control module 605.
The detection module 601 is configured to detect an obstacle within a reach of the mechanical arm.
The processing module 602 is configured to perform voxelization processing on the obstacle to obtain a voxel vector of the obstacle.
The obtaining module 603 is configured to obtain a target pose vector of the mechanical arm and a joint angle vector of the mechanical arm at the current time.
And the planning module 604 is configured to input the voxel vector, the target pose vector of the mechanical arm, and the joint angle vector of the mechanical arm at the current moment to a pre-trained motion planner to obtain a joint angle variation of the mechanical arm at the next moment.
And the control module 605 is configured to control the robot arm to complete the next action according to the joint angle variation at the next moment of the robot arm.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
According to the mechanical arm motion planning device on the coal mine autonomous inspection platform, the obstacles within the reach of the mechanical arm are voxelized, the joint angle variation of the mechanical arm at the next moment is obtained with the pre-trained motion planner, and the mechanical arm is controlled to complete the next action. The device is suitable for complex working scenes, computes quickly, and meets the real-time control requirements of the mobile inspection platform in the coal mine scene.
Fig. 7 is a schematic structural diagram of another device for planning the motion of the mechanical arm on the autonomous inspection platform for the coal mine according to the embodiment of the present application, and as shown in fig. 7, the device for planning the motion of the mechanical arm on the autonomous inspection platform for the coal mine further includes: a training module 706. Wherein 701-705 in fig. 7 and 601-605 in fig. 6 have the same functions and structures.
Wherein the training module 706 is specifically configured to: creating a simulation environment; observing a joint angle vector of a mechanical arm sample at the current moment, a target pose vector of the mechanical arm sample and a voxel vector of an obstacle sample at the current moment in the simulation environment, and obtaining an action executed by the mechanical arm sample at the next moment; executing the action at the next moment in the simulation environment, and controlling the mechanical arm sample to complete the next movement; observing the states of the mechanical arm sample and the obstacle sample, a reward signal and a task termination signal after the next action is executed; determining the state of the mechanical arm sample after the next action is executed as a joint angle vector of the mechanical arm sample at the next moment, and determining the state of the obstacle sample after the next action is executed as a voxel vector of the obstacle sample at the next moment; taking a joint angle vector of the mechanical arm sample at the current moment, a target pose vector of the mechanical arm sample, a voxel vector of the barrier sample at the current moment, an action executed by the mechanical arm sample at the next moment, a joint angle vector of the mechanical arm sample at the next moment, a voxel vector of the barrier sample at the next moment, a reward signal and a task termination signal as training data; based on a reinforcement learning algorithm, parameters of the strategy neural network are updated by adopting training data to obtain model parameters of the strategy neural network, and a motion planner is generated according to the model parameters.
In some embodiments of the present application, the training module 706 is further configured to: calculating a Q neural network training label according to the training data; calculating Q neural network prediction data according to the training data; calculating a gradient vector of the Q neural network according to the Q neural network training label and the Q neural network prediction data; and updating parameters of the strategy neural network according to the gradient vector of the Q neural network.
In some embodiments of the present application, the training module 706 is further configured to: updating a Q neural network function value according to the gradient vector of the Q neural network; and updating parameters of the strategy neural network according to the Q neural network function value.
In some embodiments of the present application, the Q neural network training labels are calculated by the following formula:
Figure BDA0003501334590000131
with regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
According to the mechanical arm motion planning device on the coal mine autonomous inspection platform, the motion planner generated by the SAC reinforcement learning algorithm is better suited to the actual scene of the coal mine autonomous inspection platform, has sufficiently high sample efficiency, and further improves the performance of the motion planner.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (10)

1. A method for planning the motion of a mechanical arm on an autonomous inspection platform of a coal mine is characterized by comprising the following steps:
detecting obstacles within the reach range of the mechanical arm;
carrying out voxelization processing on the obstacle to obtain a voxel vector of the obstacle;
acquiring a target pose vector of the mechanical arm and a joint angle vector of the mechanical arm at the current moment;
inputting the voxel vector, the target pose vector of the mechanical arm and the joint angle vector of the mechanical arm at the current moment into a pre-trained motion planner to obtain the joint angle variation of the mechanical arm at the next moment;
and controlling the mechanical arm to finish the next action according to the joint angle variation of the mechanical arm at the next moment.
2. The method of claim 1, wherein the motion planner is pre-trained by:
creating a simulation environment;
observing a joint angle vector of a mechanical arm sample at the current moment, a target pose vector of the mechanical arm sample and a voxel vector of an obstacle sample at the current moment in the simulation environment, and obtaining a joint angle variation of the mechanical arm sample at the next moment;
controlling the mechanical arm sample to complete the next step of movement according to the joint angle variation at the next moment in the simulation environment;
observing the state of the mechanical arm sample and the state of the obstacle sample, a reward signal and a task termination signal after the next action is performed;
determining the state of the mechanical arm sample after the next action is executed as a joint angle vector of the mechanical arm sample at the next moment, and determining the state of the obstacle sample after the next action is executed as a voxel vector of the obstacle sample at the next moment;
taking the joint angle vector of the mechanical arm sample at the current moment, the target pose vector of the mechanical arm sample and the voxel vector of the barrier sample at the current moment, the action executed by the mechanical arm sample at the next moment, the joint angle vector of the mechanical arm sample at the next moment, the voxel vector of the barrier sample at the next moment, the reward signal and the task termination signal as training data;
and updating parameters of the strategy neural network by adopting the training data based on a reinforcement learning algorithm to obtain model parameters of the strategy neural network, and generating the motion planner according to the model parameters.
3. The method of claim 2, wherein the updating parameters of the neural network of the strategy using the training data based on the reinforcement learning algorithm comprises:
calculating a Q neural network training label according to the training data;
calculating Q neural network prediction data according to the training data;
calculating a gradient vector of the Q neural network according to the Q neural network training label and the Q neural network prediction data;
and updating parameters of the strategy neural network according to the gradient vector of the Q neural network.
4. The method of claim 3, wherein updating parameters of a policy neural network according to gradient vectors of the Q neural network comprises:
updating the Q neural network function value according to the gradient vector of the Q neural network;
and updating parameters of the strategy neural network according to the Q neural network function value.
5. The method of claim 3 or 4, wherein the Q neural network training labels are calculated by the following formula:
y(r, s', d) = r + γ(1 - d)[ min_{i=1,2} Q_{φ_targ,i}(s', ã') - α·log π_θ(ã'|s') ]
wherein r is the reward signal; s' is the state vector at the next moment, and includes the joint angle vector of the mechanical arm sample at the next moment, the voxel vector of the obstacle sample at the next moment, and the target pose vector of the mechanical arm sample; d is the task termination signal; γ is a first discount coefficient, wherein the range of γ is 0-1; ã' is the action made after the state vector s' at the next moment is input to the strategy neural network; Q_{φ_targ,i}(s', ã') is the value of the ith Q neural network for the action ã' under the state vector s' at the next moment; α is a second discount coefficient; and log π_θ(ã'|s') is the log-probability of the strategy neural network making the action ã' under the state vector s' at the next moment.
6. A mechanical arm motion planning device on a coal mine autonomous inspection platform, characterized in that it comprises:
the detection module is used for detecting obstacles within the reach range of the mechanical arm;
the processing module is used for carrying out voxelization processing on the obstacle to obtain a voxel vector of the obstacle;
the acquisition module is used for acquiring a target pose vector of the mechanical arm and a joint angle vector of the mechanical arm at the current moment;
the planning module is used for inputting the voxel vector, the target pose vector of the mechanical arm and the joint angle vector of the mechanical arm at the current moment into a pre-trained motion planner to obtain the joint angle variation of the mechanical arm at the next moment;
and the control module is used for controlling the mechanical arm to complete the next action according to the joint angle variation of the mechanical arm at the next moment.
7. The apparatus according to claim 6, further comprising a training module, specifically configured to:
creating a simulation environment;
observing a joint angle vector of a mechanical arm sample at the current moment, a target pose vector of the mechanical arm sample and a voxel vector of an obstacle sample at the current moment in the simulation environment, and obtaining an action executed by the mechanical arm sample at the next moment;
executing the action at the next moment in the simulation environment, and controlling the mechanical arm sample to complete the next movement;
observing the robot arm sample and the obstacle sample status, reward signals and task termination signals after the next action is performed;
determining the state of the mechanical arm sample after the next action is executed as a joint angle vector of the mechanical arm sample at the next moment, and determining the state of the obstacle sample after the next action is executed as a voxel vector of the obstacle sample at the next moment;
taking the joint angle vector of the mechanical arm sample at the current moment, the target pose vector of the mechanical arm sample and the voxel vector of the barrier sample at the current moment, the action executed by the mechanical arm sample at the next moment, the joint angle vector of the mechanical arm sample at the next moment, the voxel vector of the barrier sample at the next moment, the reward signal and the task termination signal as training data;
and updating parameters of the strategy neural network by adopting the training data based on a reinforcement learning algorithm to obtain model parameters of the strategy neural network, and generating the motion planner according to the model parameters.
8. The apparatus of claim 7, wherein the training module is further configured to:
calculating a Q neural network training label according to the training data;
calculating Q neural network prediction data according to the training data;
calculating a gradient vector of the Q neural network according to the Q neural network training label and the Q neural network prediction data;
and updating parameters of the strategy neural network according to the gradient vector of the Q neural network.
9. The apparatus of claim 8, wherein the training module is further configured to:
updating the Q neural network function value according to the gradient vector of the Q neural network;
and updating parameters of the strategy neural network according to the Q neural network function value.
10. The apparatus of claim 8 or 9, wherein the Q neural network training labels are calculated by the following formula:
y(r, s', d) = r + γ(1 - d)[ min_{i=1,2} Q_{φ_targ,i}(s', ã') - α·log π_θ(ã'|s') ]
wherein r is the reward signal; s' is the state vector at the next moment, and includes the joint angle vector of the mechanical arm sample at the next moment, the voxel vector of the obstacle sample at the next moment, and the target pose vector of the mechanical arm sample; d is the task termination signal; γ is a first discount coefficient, wherein the range of γ is 0-1; ã' is the action made after the state vector s' at the next moment is input to the strategy neural network; Q_{φ_targ,i}(s', ã') is the value of the ith Q neural network for the action ã' under the state vector s' at the next moment; α is a second discount coefficient; and log π_θ(ã'|s') is the log-probability of the strategy neural network making the action ã' under the state vector s' at the next moment.
CN202210127986.3A 2022-02-11 2022-02-11 Mechanical arm motion planning method and device on autonomous inspection platform of coal mine Active CN114367988B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210127986.3A CN114367988B (en) 2022-02-11 2022-02-11 Mechanical arm motion planning method and device on autonomous inspection platform of coal mine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210127986.3A CN114367988B (en) 2022-02-11 2022-02-11 Mechanical arm motion planning method and device on autonomous inspection platform of coal mine

Publications (2)

Publication Number Publication Date
CN114367988A true CN114367988A (en) 2022-04-19
CN114367988B CN114367988B (en) 2023-11-17

Family

ID=81145638

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210127986.3A Active CN114367988B (en) 2022-02-11 2022-02-11 Mechanical arm motion planning method and device on autonomous inspection platform of coal mine

Country Status (1)

Country Link
CN (1) CN114367988B (en)


Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10936947B1 (en) * 2017-01-26 2021-03-02 Amazon Technologies, Inc. Recurrent neural network-based artificial intelligence system for time series predictions
CN107169986A (en) * 2017-05-23 2017-09-15 北京理工大学 A kind of obstacle detection method and system
US10668621B1 (en) * 2017-06-23 2020-06-02 Amazon Technologies, Inc. Fast motion planning collision detection
CN108375379A (en) * 2018-02-01 2018-08-07 上海理工大学 The fast path planing method and mobile robot of dual DQN based on variation
CN108762281A (en) * 2018-06-08 2018-11-06 哈尔滨工程大学 An embedded real-time underwater intelligent robot decision-making method based on memory-association reinforcement learning
WO2020062165A1 (en) * 2018-09-29 2020-04-02 区链通网络有限公司 Method, node and system for training reinforcement learning model, and storage medium
US20200130177A1 (en) * 2018-10-29 2020-04-30 Hrl Laboratories, Llc Systems and methods for few-shot transfer learning
CN109960880A (en) * 2019-03-26 2019-07-02 上海交通大学 A kind of industrial robot obstacle-avoiding route planning method based on machine learning
KR20210063791A (en) * 2019-11-25 2021-06-02 한국기술교육대학교 산학협력단 System for mapless navigation based on dqn and slam considering characteristic of obstacle and processing method thereof
CN110977967A (en) * 2019-11-29 2020-04-10 天津博诺智创机器人技术有限公司 Robot path planning method based on deep reinforcement learning
CN113492402A (en) * 2020-04-03 2021-10-12 发那科株式会社 Fast robot motion optimization with distance field
CN112153615A (en) * 2020-09-15 2020-12-29 南京信息工程大学滨江学院 Deep learning-based user association method in multi-cell cellular D2D equipment

Also Published As

Publication number Publication date
CN114367988B (en) 2023-11-17

Similar Documents

Publication Publication Date Title
Lakshmanan et al. Complete coverage path planning using reinforcement learning for tetromino based cleaning and maintenance robot
CN110000785B (en) Agricultural scene calibration-free robot motion vision cooperative servo control method and equipment
CN109397285B (en) Assembly method, assembly device and assembly equipment
CN109910018B (en) Robot virtual-real interaction operation execution system and method with visual semantic perception
CN105082132A (en) Rapid robotic imitation learning of force-torque tasks
EP3585569B1 (en) Systems, apparatus, and methods for robotic learning and execution of skills
Leiva et al. Robust rl-based map-less local planning: Using 2d point clouds as observations
Sadeghzadeh et al. Self-learning visual servoing of robot manipulator using explanation-based fuzzy neural networks and Q-learning
CN107309873B (en) Mechanical arm motion control method and system
Tung et al. Socially aware robot navigation using deep reinforcement learning
CN115038554A (en) Construction of complex scenarios for autonomous machines based on sensors
Gutzeit et al. The besman learning platform for automated robot skill learning
CN114367988B (en) Mechanical arm motion planning method and device on autonomous inspection platform of coal mine
WO2021008798A1 (en) Training of a convolutional neural network
Martyshkin et al. Development and investigation of a motion planning algorithm for a mobile robot with a smart machine vision system
Xu et al. Indoor multi-sensory self-supervised autonomous mobile robotic navigation
Nikitenko et al. RRTs postprocessing for uncertain environments
Han Trajectory tracking control method for flexible robot of construction machinery based on computer vision
Tian et al. Fruit Picking Robot Arm Training Solution Based on Reinforcement Learning in Digital Twin
Anderson et al. Autonomous navigation via a deep Q network with one-hot image encoding
Ding et al. Improving motion of robotic manipulators by an embedded optimizer
Guan Self-inspection method of unmanned aerial vehicles in power plants using deep q-network reinforcement learning
Supakar et al. PSO obstacle avoidance algorithm for robot in unknown environment
US11921492B2 (en) Transfer between tasks in different domains
Sunwoo et al. Optimal Path Search for Robot Manipulator using Deep Reinforcement Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant