CN113910221A - Mechanical arm autonomous motion planning method, device, equipment and storage medium


Info

Publication number: CN113910221A (application number CN202111143685.1A); granted as CN113910221B
Authority: CN (China)
Other languages: Chinese (zh)
Inventors: 林凡, 李沐, 卢泉州
Current and original assignee: GCI Science and Technology Co Ltd
Application filed by GCI Science and Technology Co Ltd; priority to CN202111143685.1A
Legal status: Granted; Active

Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B25: HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 9/00: Programme-controlled manipulators
    • B25J 9/16: Programme controls
    • B25J 9/1656: Programme controls characterised by programming, planning systems for manipulators
    • B25J 9/1664: Programme controls characterised by programming, planning systems for manipulators, characterised by motion, path, trajectory planning
    • B25J 9/1666: Avoiding collision or forbidden zones


Abstract

The invention discloses a method, a device, equipment and a storage medium for planning the autonomous motion of a mechanical arm. The method comprises the following steps: acquiring mechanical arm data, which comprise the position coordinate values, the current moving speed value and the current yaw speed value of the mechanical arm, and obstacle data, which comprise the position coordinate values and size data of obstacles; obtaining an action evaluation index value with an artificial potential field algorithm according to the mechanical arm data and the obstacle data; and inputting the mechanical arm data, the obstacle data and the action evaluation index value as state values into a preset decision model, and selecting, through the decision model, the movement action of the mechanical arm, the continuous speed value of the movement action, the yaw action, and the continuous angular speed value of the yaw action. The decision model adopts an action strategy function based on a normal distribution. Based on this normally distributed action strategy function, the invention can output actions with continuous moving speed values and continuous yaw angular velocity values, thereby improving the operation accuracy of the mechanical arm in continuous motion.

Description

Mechanical arm autonomous motion planning method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of robot control, in particular to a method, a device, equipment and a storage medium for planning autonomous movement of a mechanical arm.
Background
A precision-operation mechanical arm is a new type of product that can perform precision machining on products on a factory assembly line after an operator loads pre-written motion code; it is an intelligent product that combines high operation precision, high execution efficiency and a high degree of automation. In a traditional workshop, staff often have to watch the assembly line at all times so that semi-finished products passing along the line are processed in time. With the continuous advance of factory intelligentization, however, the precision-operation mechanical arm is gradually being widely applied to factory assembly lines by virtue of its high operation precision, small error, short operation time and high efficiency, and its ability to reduce the consumption of factory human resources and lower enterprise costs; it is an indispensable link in the construction of future intelligent factories.
However, most of the algorithms currently adopted by mechanical arms are autonomous-navigation algorithms such as A*, D* and RRT. Their idea is to locate the position of the mechanical arm and then search for an optimal path according to the position of the target point and the positions of obstacles. These algorithms work well in simple real-world scenes, but they plan discrete paths and therefore struggle to maintain the operation accuracy of the mechanical arm during continuous motion in complex actual environments.
Disclosure of Invention
The invention provides a mechanical arm autonomous motion planning method, device, equipment and storage medium to solve the problem that existing mechanical arm motion planning methods cannot improve the operation accuracy of a mechanical arm in continuous motion. The motion of the mechanical arm is decided by an action strategy function based on a normal distribution; the decided actions sample the moving speed and the yaw angular velocity from the normal distribution, so that actions with a continuous moving speed value and a continuous yaw angular velocity value are output. This improves the operation accuracy of the mechanical arm in continuous motion and makes the method suitable for complex actual environments.
In order to solve the above technical problem, a first aspect of an embodiment of the present invention provides a method for planning autonomous motion of a robot arm, including:
acquiring mechanical arm data and obstacle data, wherein the mechanical arm data comprises position coordinate values of a mechanical arm, a current movement speed value and a current yaw speed value, and the obstacle data comprises position coordinate values and size data of an obstacle;
obtaining an action evaluation index value by adopting an artificial potential field algorithm according to the mechanical arm data and the obstacle data;
inputting the mechanical arm data, the obstacle data and the action evaluation index value as state values into a preset decision model, and selecting the movement action of the mechanical arm, the continuous speed value of the movement action, the yaw action and the continuous angular speed value of the yaw action through the decision model; and the decision model adopts an action strategy function based on normal distribution.
As an improvement, the action policy function is specifically:
π(x|s,θ) = (1/(σ(s,θ)·√(2π)))·exp(-(x - μ(s,θ))^2 / (2σ(s,θ)^2))
wherein s represents a state value, θ represents a parameter vector value, α represents a learning rate, σ (s, θ) represents a strategy distribution variance with the parameter vector value θ, μ (s, θ) represents a strategy distribution expectation with the parameter vector value θ, and π (x | s, θ) represents a probability of selecting an action x when the state value of the decision model is s and the parameter vector value is θ.
As an improvement, the selecting, by the decision model, the moving motion of the mechanical arm, the continuous velocity value of the moving motion, the yaw motion, and the continuous angular velocity value of the yaw motion specifically includes:
respectively limiting the normal distribution expectation of the moving speed of the mechanical arm and the normal distribution expectation of the yaw angular velocity of the mechanical arm by adopting a tanh activation function, and respectively obtaining a speed strategy distribution expectation and an angular velocity strategy distribution expectation;
obtaining a normal distribution and a moving action strategy function of the moving speed according to the strategy distribution variance of the moving speed and the speed strategy distribution expectation;
selecting the moving action of the mechanical arm according to the moving action strategy function, sampling the moving speed according to the normal distribution of the moving speed by the moving action, and selecting the continuous speed value;
obtaining normal distribution and a yaw action strategy function of the yaw angular velocity according to the strategy distribution variance of the yaw angular velocity and the strategy distribution expectation of the angular velocity;
and selecting the yaw action of the mechanical arm according to the yaw action strategy function, wherein the yaw action samples the yaw velocity according to the normal distribution of the yaw velocity to obtain the continuous angular velocity value.
As an improvement, the method obtains the decision model in advance by the following steps:
acquiring mechanical arm data and obstacle data at multiple moments;
obtaining action evaluation index values at a plurality of moments by adopting an artificial potential field algorithm according to the mechanical arm data and the obstacle data at the plurality of moments;
inputting the mechanical arm data, the obstacle data and the action evaluation index value at a plurality of moments as state values into a PPO model;
and training the PPO model by adopting a PPO algorithm based on the action strategy function of normal distribution, and obtaining the decision model.
As one improvement, the decision model comprises an input layer, a full connection layer, a selection network and an evaluation network;
the input layer is used for inputting a state space sequence (S_1, S_2, …, S_t) to the full connection layer, where S_t represents the state value at time t;
the fully-connected layer comprises a first active layer and a second active layer;
the first active layer has 256 nodes, one node corresponds to one moving action or yaw action of the mechanical arm, and according to the state space sequence, the first active layer selects an evaluation function corresponding to the moving action or yaw action of the mechanical arm, wherein the evaluation function comprises a state cost function and an action cost function;
the second activation layer is used for limiting normal distribution expectation of the moving speed of the mechanical arm and normal distribution expectation of the yaw velocity of the mechanical arm by adopting a tanh activation function according to the state space sequence, and obtaining speed strategy distribution expectation and angular velocity strategy distribution expectation respectively;
the evaluation network is used for estimating the evaluation function according to the state space sequence, obtaining an advantage function according to the estimated evaluation function, obtaining a return value according to the advantage function, and updating the parameter vector value according to the return value;
the selection network constructs the moving action strategy function, the normal distribution of the moving speed, the yaw action strategy function and the normal distribution of the angular velocity according to the speed strategy distribution expectation and the angular velocity strategy distribution expectation obtained by the second activation layer, selects the moving action of the mechanical arm according to the moving action strategy function, samples the moving speed according to the normal distribution of the moving speed by the moving action, selects the continuous velocity value, selects the yaw action of the mechanical arm according to the yaw action strategy function, samples the yaw velocity according to the normal distribution of the yaw velocity, and obtains the continuous angular velocity value.
As an improvement, the obtaining of the action evaluation index value by using an artificial potential field algorithm according to the mechanical arm data and the obstacle data specifically includes:
acquiring a risk factor according to the mechanical arm data and the obstacle data;
acquiring repulsive potential energy of the obstacle according to the risk factor and a preset danger threshold;
obtaining the action evaluation index value according to the repulsive force potential energy and the attractive force potential energy of the destination area;
wherein the risk factor is defined as:
t = d_0 - max(L, W)
where t represents the value of the risk factor, d_0 represents the distance between the mechanical arm and the obstacle, and L and W represent the length and width of the obstacle, respectively;
the repulsive potential energy is defined as:
U'_r = (1/2)·η·(1/t* - 1/t_0)^2, if t* ≤ t_0; U'_r = 0, if t* > t_0
where U'_r denotes the repulsive potential energy, η denotes the repulsion factor, t* represents the minimum value of the risk factor, and t_0 represents the preset danger threshold;
the gravitational potential energy is defined as:
U'_a = (1/2)·k_p·d_g^2, if d_g > d_g*; U'_a = 0, if d_g ≤ d_g*
where U'_a represents the gravitational potential energy, k_p denotes the gravity factor, d_g represents the distance between the mechanical arm and the destination, and d_g* represents a distance threshold between the moving platform of the mechanical arm and the destination;
and the action evaluation index value is obtained from the formula U = U'_r + U'_a.
As an improvement, the acquiring of the mechanical arm data and the obstacle data specifically includes:
acquiring a position coordinate value of the mechanical arm based on a base position induction sensor arranged on the mechanical arm;
obtaining a current movement speed value of the mechanical arm based on at least one speed sensing sensor mounted on the mechanical arm, and obtaining a current yaw speed value of the mechanical arm based on at least one yaw type angular velocity sensor mounted on the mechanical arm;
and obtaining laser radar point cloud data of the obstacle according to the laser radar, and clustering the laser radar point cloud data by adopting an Euclidean distance clustering method of a K-D tree to obtain the obstacle data.
A second aspect of the embodiments of the present invention provides a robot arm autonomous movement planning apparatus, including:
the data acquisition module is used for acquiring mechanical arm data and obstacle data, wherein the mechanical arm data comprises position coordinate values of a mechanical arm, a current moving speed value and a current yaw speed value, and the obstacle data comprises position coordinate values and size data of an obstacle;
the action evaluation index value acquisition module is used for acquiring an action evaluation index value by adopting an artificial potential field algorithm according to the mechanical arm data and the obstacle data;
the action decision module is used for inputting the mechanical arm data, the obstacle data and the action evaluation index value into a preset decision model as state values, and selecting the movement action of the mechanical arm, the continuous speed value of the movement action, the yaw action and the continuous angular speed value of the yaw action through the decision model; and the decision model adopts an action strategy function based on normal distribution.
A third aspect of an embodiment of the present invention provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the method for planning autonomous movement of a robot arm according to any one of the first aspect.
A fourth aspect of embodiments of the present invention provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program, and when the computer program runs, the apparatus on which the computer-readable storage medium is located is controlled to perform the method for planning autonomous movement of a robot arm according to any one of the first aspects.
Compared with the prior art, in the mechanical arm autonomous motion planning method, device, equipment and storage medium provided by the invention, an action strategy function based on a normal distribution is used as the action strategy function of the decision model, and the actions of the mechanical arm are decided based on this function. The decided actions sample the moving speed and the yaw angular velocity from the normal distribution, so that actions with a continuous moving speed value and a continuous yaw angular velocity value are output. This improves the operation accuracy of the mechanical arm during continuous motion and makes the method suitable for complex actual environments.
Drawings
Fig. 1 is a schematic flow chart of a robot autonomous movement planning method according to a preferred embodiment of the present invention;
FIG. 2 is a network architecture diagram of a preferred embodiment of a decision model provided by the present invention;
fig. 3 is a schematic structural diagram of a preferred embodiment of the autonomous movement planning apparatus for a robot arm according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart of a method for planning autonomous movement of a robot arm according to a preferred embodiment of the present invention.
The first aspect of the embodiment of the present invention provides a method for planning autonomous motion of a robot arm, including steps S1 to S3, which are specifically as follows:
step S1: acquiring mechanical arm data and obstacle data, wherein the mechanical arm data comprises position coordinate values of a mechanical arm, a current movement speed value and a current yaw speed value, and the obstacle data comprises position coordinate values and size data of an obstacle.
As an improvement, the acquiring of the mechanical arm data and the obstacle data specifically includes:
acquiring a position coordinate value of the mechanical arm based on a base position induction sensor arranged on the mechanical arm;
obtaining a current movement speed value of the mechanical arm based on at least one speed sensing sensor mounted on the mechanical arm, and obtaining a current yaw speed value of the mechanical arm based on at least one yaw type angular velocity sensor mounted on the mechanical arm;
and obtaining laser radar point cloud data of the obstacle according to the laser radar, and clustering the laser radar point cloud data by adopting an Euclidean distance clustering method of a K-D tree to obtain the obstacle data.
In particular, in order to define the state S_t in the reinforcement learning algorithm, a number of data, such as the position and speed of the mechanical arm itself and the obstacle data, need to be acquired. In the factory, the magnetic north direction is set as the positive direction of the y axis and the direction 90 degrees east of magnetic north as the positive direction of the x axis, and routers are placed at intervals, with the spacing determined by the scale of the factory and appropriately shortened when the factory is larger. The information sent by the sensors is collected through the routers, and the wireless sensor network provides the mechanical arm with data such as its position, its speed and the obstacle data.
In the embodiment of the invention, a position sensing sensor is mounted on the base of the mechanical arm, and its coordinate values (N, E, U) in the factory are sensed through the wireless sensor network formed by the routers; several speed sensing sensors and yaw-type angular velocity sensors are mounted on the mechanical arm, through which the moving speed v of the mechanical arm and its rotating yaw angular velocity ω are obtained.
Further, a laser radar is used as the obstacle sensing sensor, and the Euclidean distance clustering method based on a K-D tree is used to cluster the laser radar point cloud, so as to obtain the position coordinate values (X_i, Y_i, Z_i) of each obstacle relative to the arm itself and its size data (L_i, W_i, H_i), which include length, width and height information.
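For illustration, the following is a minimal sketch (not the patented implementation) of K-D-tree-based Euclidean distance clustering that turns a lidar point cloud into per-obstacle data of the form (X_i, Y_i, Z_i, L_i, W_i, H_i); the clustering radius, the minimum cluster size and the use of SciPy's cKDTree are assumptions made for the example.

```python
# Minimal sketch: Euclidean distance clustering of a lidar point cloud with a K-D tree,
# producing per-obstacle position and size data. The 0.3 m radius, the 10-point minimum
# and the axis-aligned bounding box are assumptions, not values from the patent.
import numpy as np
from scipy.spatial import cKDTree

def cluster_obstacles(points, radius=0.3, min_points=10):
    """points: (N, 3) array of lidar returns. Returns a list of
    (X_i, Y_i, Z_i, L_i, W_i, H_i) tuples, one per detected obstacle."""
    points = np.asarray(points, dtype=float)
    tree = cKDTree(points)
    visited = np.zeros(len(points), dtype=bool)
    obstacles = []
    for seed in range(len(points)):
        if visited[seed]:
            continue
        visited[seed] = True
        queue, cluster = [seed], []
        while queue:                      # region growing through neighbours within `radius`
            idx = queue.pop()
            cluster.append(idx)
            for nb in tree.query_ball_point(points[idx], r=radius):
                if not visited[nb]:
                    visited[nb] = True
                    queue.append(nb)
        if len(cluster) < min_points:     # discard sparse clusters as noise
            continue
        pts = points[cluster]
        lo, hi = pts.min(axis=0), pts.max(axis=0)
        center, size = (lo + hi) / 2.0, hi - lo
        obstacles.append(tuple(center) + tuple(size))
    return obstacles
```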
Reinforcement learning selects actions based on the state S_t at a given moment. In the autonomous navigation scenario of the embodiment of the invention, the state S_t comprises the position coordinate values (N, E, U), the motion information (v, ω) of the mechanical arm, and the obstacle information (X_i, Y_i, Z_i, L_i, W_i, H_i).
Step S2: obtaining an action evaluation index value by adopting an artificial potential field algorithm according to the mechanical arm data and the obstacle data;
As an improvement, the obtaining of the action evaluation index value by using an artificial potential field algorithm according to the mechanical arm data and the obstacle data specifically includes:
acquiring a risk factor according to the mechanical arm data and the obstacle data;
acquiring repulsive potential energy of the obstacle according to the risk factor and a preset danger threshold;
obtaining the action evaluation index value according to the repulsive force potential energy and the attractive force potential energy of the destination area;
wherein the risk factor is defined as:
t = d_0 - max(L, W)
where t represents the value of the risk factor, d_0 represents the distance between the mechanical arm and the obstacle, and L and W represent the length and width of the obstacle, respectively;
the repulsive potential energy is defined as:
U'_r = (1/2)·η·(1/t* - 1/t_0)^2, if t* ≤ t_0; U'_r = 0, if t* > t_0
where U'_r denotes the repulsive potential energy, η denotes the repulsion factor, t* represents the minimum value of the risk factor, and t_0 represents the preset danger threshold;
the gravitational potential energy is defined as:
U'_a = (1/2)·k_p·d_g^2, if d_g > d_g*; U'_a = 0, if d_g ≤ d_g*
where U'_a represents the gravitational potential energy, k_p denotes the gravity factor, d_g represents the distance between the mechanical arm and the destination, and d_g* represents a distance threshold between the moving platform of the mechanical arm and the destination;
and the action evaluation index value is obtained from the formula U = U'_r + U'_a.
Specifically, the embodiment of the invention provides an improved artificial potential field algorithm to evaluate how well the mechanical arm completes its task, and the result is used as a state value of the reinforcement-learning decision model.
The artificial potential field algorithm is one of the motion planning algorithms for a mechanical arm. Its basic idea is that obstacles exert a repulsive force on the mechanical arm while the destination exerts an attractive force on it, and the potential energy is the sum of the attractive and repulsive terms. The traditional gravitational potential energy U_a and repulsive potential energy U_r are respectively:
U_a = (1/2)·k_p·d_g^2  (1)
U_r = (1/2)·η·(1/ρ - 1/ρ_0)^2, if ρ ≤ ρ_0; U_r = 0, if ρ > ρ_0  (2)
In formula (1), U_a is the gravitational potential energy, k_p is the gravity factor, and d_g is the distance between the mechanical arm and the destination; in formula (2), U_r is the repulsive potential energy, η is the repulsion factor, ρ is the distance between the mechanical arm and the nearest obstacle, and ρ_0 is the distance threshold between the mechanical arm and the obstacle.
However, the repulsive potential energy of the conventional artificial potential field only considers the distance ρ to the nearest obstacle. In many cases an obstacle that is farther away may still pose a collision risk because of its large size, so a risk factor t is designed here to determine the danger level of an obstacle. The risk factor is defined as follows:
t = d_0 - max(L, W)  (3)
In formula (3), d_0 represents the distance between the mechanical arm itself and the obstacle, and L and W represent the length and width of the obstacle, respectively.
The risk factor of each obstacle is calculated and the minimum value t* is taken; the repulsive potential energy is then redefined as:
U'_r = (1/2)·η·(1/t* - 1/t_0)^2, if t* ≤ t_0; U'_r = 0, if t* > t_0  (4)
In formula (4), t_0 is the preset danger threshold.
In addition, when the mechanical arm operates in a complex environment, the destination of its movement is often not a single point but a region. To define the potential energy of the destination region more accurately, the gravitational potential energy is redefined as follows:
U'_a = (1/2)·k_p·d_g^2, if d_g > d_g*; U'_a = 0, if d_g ≤ d_g*  (5)
In formula (5), d_g* is the distance threshold between the moving platform of the mechanical arm and the destination; if the distance between the moving platform and the destination is smaller than this threshold, the gravitational potential energy is 0.
And finally, taking the potential energy sum of the attraction force and the repulsion force as an evaluation index of the motion action of the mechanical arm:
U = U'_r + U'_a  (6)
the larger the U value, the worse the completion of the operation, and the smaller the U value, the better the completion of the operation. The evaluation index is used as a state value of the decision model, so that the model can conveniently carry out action decision design.
Step S3: inputting the mechanical arm data, the obstacle data and the action evaluation index value as state values into a preset decision model, and selecting the movement action of the mechanical arm, the continuous speed value of the movement action, the yaw action and the continuous angular speed value of the yaw action through the decision model; and the decision model adopts an action strategy function based on normal distribution.
As an improvement, the action policy function is specifically:
Figure BDA0003284607100000094
wherein s represents a state value, θ represents a parameter vector value, α represents a learning rate, σ (s, θ) represents a strategy distribution variance with the parameter vector value θ, μ (s, θ) represents a strategy distribution expectation with the parameter vector value θ, and π (x | s, θ) represents a probability of selecting an action x when the state value of the decision model is s and the parameter vector value is θ.
As an improvement, the selecting, by the decision model, the moving motion of the mechanical arm, the continuous velocity value of the moving motion, the yaw motion, and the continuous angular velocity value of the yaw motion specifically includes:
respectively limiting the normal distribution expectation of the moving speed of the mechanical arm and the normal distribution expectation of the yaw angular velocity of the mechanical arm by adopting a tanh activation function, and respectively obtaining a speed strategy distribution expectation and an angular velocity strategy distribution expectation;
obtaining a normal distribution and a moving action strategy function of the moving speed according to the strategy distribution variance of the moving speed and the speed strategy distribution expectation;
selecting the moving action of the mechanical arm according to the moving action strategy function, sampling the moving speed according to the normal distribution of the moving speed by the moving action, and selecting the continuous speed value;
obtaining normal distribution and a yaw action strategy function of the yaw angular velocity according to the strategy distribution variance of the yaw angular velocity and the strategy distribution expectation of the angular velocity;
and selecting the yaw action of the mechanical arm according to the yaw action strategy function, wherein the yaw action samples the yaw velocity according to the normal distribution of the yaw velocity to obtain the continuous angular velocity value.
As an improvement, the method obtains the decision model in advance by the following steps:
acquiring mechanical arm data and obstacle data at multiple moments;
obtaining action evaluation index values at a plurality of moments by adopting an artificial potential field algorithm according to the mechanical arm data and the obstacle data at the plurality of moments;
inputting the mechanical arm data, the obstacle data and the action evaluation index value at a plurality of moments as state values into a PPO model;
and training the PPO model by adopting a PPO algorithm based on the action strategy function of normal distribution, and obtaining the decision model.
It should be noted that the basic idea of a reinforcement learning algorithm is to obtain rewards by interacting with the environment and thereby learn. A reinforcement learning algorithm comprises several main components: the agent, the environment, states, actions and rewards. If the agent is in state S_t at time t, it can select an action A_t according to the current policy function π; this action affects the environment, and a reward R_{t+1} is returned at the next moment. The trajectory sequence of the interaction with the environment can be represented as: S_0, A_0, R_1, S_1, A_1, R_2, …, S_t, A_t, R_{t+1}.
The goal of training the agent is to maximize the reward obtained over a round, i.e., the expected return G_t, which represents the accumulated discounted reward until the end of the round. G_t is defined as:
G_t = Σ_{k=0}^{∞} γ^k·R_{t+k+1}  (7)
in equation (7), γ is a discount factor having a value less than 1.
For evaluating the quality of a certain state or action, a state cost function V_π(s) and an action cost function Q_π(s, a) are introduced; their equations are as follows:
V_π(s) = E_π[G_t | S_t = s]  (8)
Q_π(s, a) = E_π[G_t | S_t = s, A_t = a]  (9)
the state cost function and the action cost function reflect the average expected return value of the rounds available in the current state or action, and therefore can be used as decision indexes for reinforcement learning. Eπ[Gt|St=s]Is shown in state StWhen s, the desired reward GtAverage expected return value of; eπ[Gt|St=s,At=a]Is shown in state StModel selection action a ═ stIn the case of a probability of a, the desired reward GtAverage expected return value of.
Further, many current reinforcement learning algorithms are based on Q values: the model parameters are used to update the action cost function Q_π(s, a), and the strategy is to select the action with the largest Q_π(s, a). Unlike Q-value-based reinforcement learning algorithms, the PPO algorithm adopted in the embodiment of the invention defines the model parameters in a policy function π(a|s, θ):
π(a|s, θ_t) = Pr{A_t = a | S_t = s, θ_t = θ}  (10)
In formula (10), Pr{A_t = a | S_t = s, θ_t = θ} represents the probability that the model selects action A_t = a when the state at time t is S_t = s and the parameter vector is θ_t = θ.
By updating the parameter vector θ_t, a better policy function can be obtained. The parameter vector update equation is:
θ_{t+1} = θ_t + α·δ_t·∇π(a|s, θ_t)  (11)
In formula (11), α is the learning rate and δ_t is a reference value solved during the training process.
The purpose of updating the parameter vector is to maximize the return function η(π), which is defined as follows:
η(π) = E[Σ_{t=0}^{∞} γ^t·r(s_t)]  (12)
In formula (12), r(s_t) denotes the reward value in state s_t. The reward function r(s_t) that produces this value is defined piecewise in formula (13) in terms of the distance d_a from the mechanical arm to the center of the destination and a collision detection factor c, where c is 1 when a collision occurs and 0 when no collision occurs: a negative reward is received when the mechanical arm hits an obstacle, and a positive reward is received when the destination is reached. In addition, a drivable area is defined, beyond which a negative reward is also obtained.
Further, in the actual training process the decision model may perform worse and worse if the chosen learning rate α is inappropriate. To solve this problem, the PPO algorithm defines an advantage function A_π(s_t, a_t):
A_π(s_t, a_t) = Q_π(s, a) - V_π(s)  (14)
Formula (14) represents the difference between the round reward obtained by action a and the average round reward over all actions. If A_π(s_t, a_t) > 0, action a performs better than average.
According to the advantage function, an approximate return function L_π(π̃) is constructed:
L_π(π̃) = η(π) + Σ_s ρ_{π_θ0}(s)·Σ_a π̃(a|s)·A_π(s, a)  (15)
In formula (15), π̃ represents the updated strategy, L_π(π̃) represents the approximate return function corresponding to π̃, η(π) represents the return function corresponding to the strategy π before updating, and ρ_{π_θ0}(s) represents the state distribution corresponding to the policy parameters θ_0 before updating.
If the advantage-function term in formula (15), Σ_s ρ_{π_θ0}(s)·Σ_a π̃(a|s)·A_π(s, a), is non-negative, the return function after the strategy update is monotonically non-decreasing, that is, the model strategy becomes better or remains unchanged. The optimal model strategy can therefore be screened out through formula (15).
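For illustration, the sketch below estimates the advantage of formula (14) and a sample-based version of the approximate return of formula (15); the importance-ratio sample form and all example numbers are assumptions made for the sketch rather than details stated in the patent.

```python
# Sketch of formula (14) and a sample-based estimate of the approximate return of formula (15).
import numpy as np

def advantage(q_values, v_values):
    """A_pi(s_t, a_t) = Q_pi(s, a) - V_pi(s), formula (14)."""
    return np.asarray(q_values, dtype=float) - np.asarray(v_values, dtype=float)

def surrogate_return(eta_old, new_probs, old_probs, adv):
    """L_pi(pi_new) ~= eta(pi_old) + mean[(pi_new(a|s) / pi_old(a|s)) * A_pi(s, a)], cf. formula (15)."""
    ratio = np.asarray(new_probs, dtype=float) / np.asarray(old_probs, dtype=float)
    return eta_old + float(np.mean(ratio * np.asarray(adv, dtype=float)))

adv = advantage(q_values=[1.2, 0.4, 0.9], v_values=[0.8, 0.8, 0.8])
# Raising the probability of positive-advantage actions increases the surrogate return,
# which is the monotonic-improvement criterion described above.
L_new = surrogate_return(eta_old=5.0, new_probs=[0.5, 0.1, 0.4], old_probs=[0.4, 0.3, 0.3], adv=adv)
```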
Further, in order to solve the problem of the continuity of the output actions of the mechanical arm during motion, on the basis of the PPO algorithm and starting from the PPO action strategy function of formula (10), the embodiment of the invention redefines the action strategy function with a normal distribution function as follows:
π(x|s,θ) = (1/(σ(s,θ)·√(2π)))·exp(-(x - μ(s,θ))^2 / (2σ(s,θ)^2))  (16)
Based on the action strategy function defined in formula (16), the moving speed and yaw angular velocity of the mechanical arm are sampled from a normal distribution. Compared with discrete sampling, sampling from a normal distribution makes the output action values continuous, which solves the problem of output-action continuity. Moreover, when the parameters are updated through formula (11), the expectation and the variance of the normal distribution change, which changes the sampling probabilities of actions: good actions become more likely to be sampled and bad actions less likely, so that a better decision model is obtained.
However, considering that the moving speed and yaw angular velocity of the mechanical arm have upper limit values, if the normal distribution is not constrained, most sampled actions would exceed those upper limits; such samples would not contribute to updating the model, and the model would converge slowly. The normal distribution expectation is therefore limited using the tanh activation function. Since the tanh activation function takes values in (-1, 1), multiplying it by an expectation factor δ_μ limits the policy distribution expectation, defined as follows:
μ(s,θ) = δ_μ·tanh(x)  (17)
In formula (17), the expectation factor δ_μ takes the upper-limit values of the moving speed and the yaw angular velocity of the mechanical arm.
Fig. 2 is a schematic diagram of a network architecture of a preferred embodiment of the decision model provided in the present invention.
As one improvement, the decision model comprises an input layer 201, a full connection layer 202, a selection network 203 and an evaluation network 204;
the input layer 201 is used to encode a state space sequence (S)1,S2,…,St) Input to the full connection layer 202, StRepresents the state value at time t;
the fully-connected layer 202 comprises a first active layer 301 and a second active layer 302;
the number of the nodes of the first active layer 301 is 256, one node corresponds to one moving action or one yawing action of the mechanical arm, and according to the state space sequence, the first active layer 301 selects an evaluation function corresponding to the moving action or the yawing action of the mechanical arm, wherein the evaluation function comprises a state cost function and an action cost function;
the second active layer 302 is configured to respectively limit normal distribution expectation of the moving speed of the mechanical arm and normal distribution expectation of the yaw rate of the mechanical arm by using a tanh active function according to the state space sequence, and respectively obtain speed strategy distribution expectation and angular velocity strategy distribution expectation;
the evaluation network 204 is configured to estimate the evaluation function according to the state space sequence, obtain an advantage function according to the estimated evaluation function, obtain a report value according to the advantage function, and update the parameter vector value according to the report value;
the selection network 203 constructs the moving action strategy function, the normal distribution of the moving velocity, the yaw action strategy function, and the normal distribution of the angular velocity according to the velocity strategy distribution expectation and the angular velocity strategy distribution expectation obtained by the second active layer 302, selects the moving action of the robot arm according to the moving action strategy function, samples the moving velocity according to the normal distribution of the moving velocity by the moving action, selects the continuous velocity value, selects the yaw action of the robot arm according to the yaw action strategy function, samples the yaw velocity according to the normal distribution of the yaw velocity, and obtains the continuous angular velocity value.
Specifically, according to the definition of the action policy function of equation (16), the network framework of the decision model proposed in the embodiment of the present invention is shown in fig. 2, and includes an input layer 201, a fully-connected layer 202, a selection network 203, and an evaluation network 204.
The input layer inputs the state space sequence (S_1, S_2, …, S_t) to the to-be-activated layers of the selection network 203 and the evaluation network 204, respectively. These to-be-activated layers form the full connection layer 202: the first layer has 256 nodes, and a proper evaluation function is selected as its activation function according to the operation purpose of the mechanical arm; when this function is applied, i.e., activated, the data enters the evaluation network 204. The second layer has 128 nodes with a tanh activation function; similarly, when this function is activated, the data enters the selection network 203.
The selection network 203 is used to select the expectation μ(s, θ) and the variance σ(s, θ) of the strategy distribution π(x|s, θ), so as to construct the normal distribution of the mechanical arm's moving speed v and the normal distribution of its yaw angular velocity ω, from which the decision actions are sampled.
The evaluation network 204 is used to estimate the evaluation function, from which the advantage function A_π(s_t, a_t) and then the approximate return function L_π(π̃) can be calculated according to formulas (14) and (15); these participate in updating the parameters of the selection network's strategy distribution, gradually screening out policies such that, in the motion scene of the mechanical arm, a high return function L_π(π̃) is obtained only when the arm selects a proper moving speed v and yaw angular velocity ω.
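The sketch below is one possible realization of the decision-model network of Fig. 2 under stated assumptions (use of PyTorch, ReLU on the first layer, a learned state-independent variance, and illustrative speed limits); it is illustrative only, not the patented implementation.

```python
# One possible realization, under stated assumptions, of the decision-model network of Fig. 2:
# a 256-node first layer feeding the evaluation (critic) head and a 128-node tanh second layer
# feeding the selection (actor) head that parameterizes the normal action policy.
import torch
import torch.nn as nn

class DecisionModel(nn.Module):
    def __init__(self, state_dim, v_max=0.5, omega_max=1.0):
        super().__init__()
        self.fc1 = nn.Linear(state_dim, 256)            # first activation layer: 256 nodes
        self.fc2 = nn.Linear(256, 128)                  # second activation layer: 128 nodes, tanh
        self.mu_head = nn.Linear(128, 2)                # selection network: means for (v, omega)
        self.log_sigma = nn.Parameter(torch.zeros(2))   # strategy distribution variance (learned)
        self.value_head = nn.Linear(256, 1)             # evaluation network: state value estimate
        self.register_buffer("limits", torch.tensor([v_max, omega_max]))

    def forward(self, state):
        h1 = torch.relu(self.fc1(state))
        h2 = torch.tanh(self.fc2(h1))
        mu = self.limits * torch.tanh(self.mu_head(h2))               # tanh-limited expectations, formula (17)
        dist = torch.distributions.Normal(mu, self.log_sigma.exp())   # normal action policy, formula (16)
        value = self.value_head(h1)                                   # used by the advantage function (14)
        return dist, value

# Usage: dist, value = model(state); v, omega = dist.sample() gives a continuous (v, omega) action.
```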
By adopting the mechanical arm autonomous motion planning method provided by the embodiment of the invention, an action strategy function based on a normal distribution is used as the action strategy function of the decision model, and the actions of the mechanical arm are decided based on this function. The decided actions sample the moving speed and the yaw angular velocity from the normal distribution, so that actions with continuous moving speed values and continuous yaw angular velocity values are output. This improves the operation accuracy of the mechanical arm in continuous motion and makes the method suitable for complex actual environments.
A second aspect of the embodiments of the present invention provides a robot arm autonomous movement planning apparatus, including:
a data obtaining module 401, configured to obtain mechanical arm data and obstacle data, where the mechanical arm data includes a position coordinate value of a mechanical arm, a current movement speed value, and a current yaw speed value, and the obstacle data includes a position coordinate value and size data of an obstacle;
an action evaluation index value acquisition module 402, configured to obtain an action evaluation index value by using an artificial potential field algorithm according to the mechanical arm data and the obstacle data;
a motion decision module 403, configured to input the mechanical arm data, the obstacle data, and the motion evaluation index value as state values into a preset decision model, and select a movement motion of the mechanical arm, a continuous velocity value of the movement motion, a yaw motion, and a continuous angular velocity value of the yaw motion through the decision model; and the decision model adopts an action strategy function based on normal distribution.
As an improvement, the action policy function is specifically:
π(x|s,θ) = (1/(σ(s,θ)·√(2π)))·exp(-(x - μ(s,θ))^2 / (2σ(s,θ)^2))
wherein s represents a state value, θ represents a parameter vector value, α represents a learning rate, σ (s, θ) represents a strategy distribution variance with the parameter vector value θ, μ (s, θ) represents a strategy distribution expectation with the parameter vector value θ, and π (x | s, θ) represents a probability of selecting an action x when the state value of the decision model is s and the parameter vector value is θ.
As an improvement, the action decision module 403 is further configured to:
respectively limiting the normal distribution expectation of the moving speed of the mechanical arm and the normal distribution expectation of the yaw angular velocity of the mechanical arm by adopting a tanh activation function, and respectively obtaining a speed strategy distribution expectation and an angular velocity strategy distribution expectation;
obtaining a normal distribution and a moving action strategy function of the moving speed according to the strategy distribution variance of the moving speed and the speed strategy distribution expectation;
selecting the moving action of the mechanical arm according to the moving action strategy function, sampling the moving speed according to the normal distribution of the moving speed by the moving action, and selecting the continuous speed value;
obtaining normal distribution and a yaw action strategy function of the yaw angular velocity according to the strategy distribution variance of the yaw angular velocity and the strategy distribution expectation of the angular velocity;
and selecting the yaw action of the mechanical arm according to the yaw action strategy function, wherein the yaw action samples the yaw velocity according to the normal distribution of the yaw velocity to obtain the continuous angular velocity value.
As an improvement, the apparatus for planning autonomous movement of a mechanical arm further includes a decision model obtaining module 404, configured to:
acquiring mechanical arm data and obstacle data at multiple moments;
obtaining action evaluation index values at a plurality of moments by adopting an artificial potential field algorithm according to the mechanical arm data and the obstacle data at the plurality of moments;
inputting the mechanical arm data, the obstacle data and the action evaluation index value at a plurality of moments as state values into a PPO model;
and training the PPO model by adopting a PPO algorithm based on the action strategy function of normal distribution, and obtaining the decision model.
As one improvement, the decision model comprises an input layer 201, a full connection layer 202, a selection network 203 and an evaluation network 204;
the input layer 201 is used to encode a state space sequence (S)1,S2,…,St) Input to the full connection layer 202, StRepresents the state value at time t;
the fully-connected layer 202 comprises a first active layer 301 and a second active layer 302;
the number of the nodes of the first active layer 301 is 256, one node corresponds to one moving action or one yawing action of the mechanical arm, and according to the state space sequence, the first active layer 301 selects an evaluation function corresponding to the moving action or the yawing action of the mechanical arm, wherein the evaluation function comprises a state cost function and an action cost function;
the second active layer 302 is configured to respectively limit normal distribution expectation of the moving speed of the mechanical arm and normal distribution expectation of the yaw rate of the mechanical arm by using a tanh active function according to the state space sequence, and respectively obtain speed strategy distribution expectation and angular velocity strategy distribution expectation;
the evaluation network 204 is configured to estimate the evaluation function according to the state space sequence, obtain an advantage function according to the estimated evaluation function, obtain a report value according to the advantage function, and update the parameter vector value according to the report value;
the selection network 203 constructs the moving action strategy function, the normal distribution of the moving velocity, the yaw action strategy function, and the normal distribution of the angular velocity according to the velocity strategy distribution expectation and the angular velocity strategy distribution expectation obtained by the second active layer 302, selects the moving action of the robot arm according to the moving action strategy function, samples the moving velocity according to the normal distribution of the moving velocity by the moving action, selects the continuous velocity value, selects the yaw action of the robot arm according to the yaw action strategy function, samples the yaw velocity according to the normal distribution of the yaw velocity, and obtains the continuous angular velocity value.
As an improvement, the action evaluation index value acquisition module 402 is further configured to:
acquiring a risk factor according to the mechanical arm data and the obstacle data;
acquiring repulsive potential energy of the obstacle according to the risk factor and a preset danger threshold;
obtaining the action evaluation index value according to the repulsive force potential energy and the attractive force potential energy of the destination area;
wherein the risk factor is defined as:
t = d_0 - max(L, W)
where t represents the value of the risk factor, d_0 represents the distance between the mechanical arm and the obstacle, and L and W represent the length and width of the obstacle, respectively;
the repulsive potential energy is defined as:
U'_r = (1/2)·η·(1/t* - 1/t_0)^2, if t* ≤ t_0; U'_r = 0, if t* > t_0
where U'_r denotes the repulsive potential energy, η denotes the repulsion factor, t* represents the minimum value of the risk factor, and t_0 represents the preset danger threshold;
the gravitational potential energy is defined as:
U'_a = (1/2)·k_p·d_g^2, if d_g > d_g*; U'_a = 0, if d_g ≤ d_g*
where U'_a represents the gravitational potential energy, k_p denotes the gravity factor, d_g represents the distance between the mechanical arm and the destination, and d_g* represents a distance threshold between the moving platform of the mechanical arm and the destination;
and the action evaluation index value is obtained from the formula U = U'_r + U'_a.
As an improvement, the data obtaining module 401 is further configured to:
acquiring a position coordinate value of the mechanical arm based on a base position induction sensor arranged on the mechanical arm;
obtaining a current movement speed value of the mechanical arm based on at least one speed sensing sensor mounted on the mechanical arm, and obtaining a current yaw speed value of the mechanical arm based on at least one yaw type angular velocity sensor mounted on the mechanical arm;
and obtaining laser radar point cloud data of the obstacle according to the laser radar, and clustering the laser radar point cloud data by adopting an Euclidean distance clustering method of a K-D tree to obtain the obstacle data.
It should be noted that the apparatus for planning autonomous motion of a mechanical arm according to the embodiment of the present invention can implement all the processes of the method for planning autonomous motion of a mechanical arm according to any one of the embodiments, and the functions and technical effects of the modules in the apparatus are respectively the same as those of the method for planning autonomous motion of a mechanical arm according to the embodiment, and are not described herein again.
A third aspect of the embodiments of the present invention provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the method for planning autonomous movement of a robot arm according to any one of the embodiments of the first aspect.
The terminal device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The terminal device may include, but is not limited to, a processor, a memory. The terminal device may also include input and output devices, network access devices, buses, etc.
The Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, or the like. The general-purpose processor may be a microprocessor or the processor may be any conventional processor or the like, which is the control center of the terminal device and connects the various parts of the whole terminal device using various interfaces and lines.
The memory may be used for storing the computer programs and/or modules, and the processor may implement various functions of the terminal device by executing or executing the computer programs and/or modules stored in the memory and calling data stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the cellular phone, and the like. In addition, the memory may include high speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other volatile solid state storage device.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium, where the computer-readable storage medium includes a stored computer program, where the computer program, when running, controls an apparatus where the computer-readable storage medium is located to perform the method for planning autonomous movement of a robot arm according to any one of the embodiments of the first aspect.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present invention may be implemented by software plus a necessary hardware platform, and may also be implemented by hardware entirely. With this understanding in mind, all or part of the technical solutions of the present invention that contribute to the background can be embodied in the form of a software product, which can be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes instructions for causing a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods according to the embodiments or some parts of the embodiments of the present invention.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (10)

1. A method for planning autonomous motion of a mechanical arm is characterized by comprising the following steps:
acquiring mechanical arm data and obstacle data, wherein the mechanical arm data comprises position coordinate values of a mechanical arm, a current movement speed value and a current yaw speed value, and the obstacle data comprises position coordinate values and size data of an obstacle;
obtaining an action evaluation index value by adopting an artificial potential field algorithm according to the mechanical arm data and the obstacle data;
inputting the mechanical arm data, the obstacle data and the action evaluation index value as state values into a preset decision model, and selecting the movement action of the mechanical arm, the continuous speed value of the movement action, the yaw action and the continuous angular speed value of the yaw action through the decision model; and the decision model adopts an action strategy function based on normal distribution.
2. The method for planning the autonomous motion of a mechanical arm according to claim 1, wherein the action strategy function is specifically:
π(x|s,θ) = (1/(σ(s,θ)·√(2π)))·exp(-(x - μ(s,θ))^2 / (2σ(s,θ)^2))
wherein s represents a state value, θ represents a parameter vector value, α represents a learning rate, σ (s, θ) represents a strategy distribution variance with the parameter vector value θ, μ (s, θ) represents a strategy distribution expectation with the parameter vector value θ, and π (x | s, θ) represents a probability of selecting an action x when the state value of the decision model is s and the parameter vector value is θ.
3. The method for planning the autonomous motion of the mechanical arm according to claim 2, wherein the selecting, by the decision model, the moving motion of the mechanical arm, the continuous velocity value of the moving motion, the yaw motion, and the continuous angular velocity value of the yaw motion comprises:
respectively limiting the normal distribution expectation of the moving speed of the mechanical arm and the normal distribution expectation of the yaw angular velocity of the mechanical arm by adopting a tanh activation function, and respectively obtaining a speed strategy distribution expectation and an angular velocity strategy distribution expectation;
obtaining a normal distribution and a moving action strategy function of the moving speed according to the strategy distribution variance of the moving speed and the speed strategy distribution expectation;
selecting the moving action of the mechanical arm according to the moving action strategy function, sampling the moving speed according to the normal distribution of the moving speed by the moving action, and selecting the continuous speed value;
obtaining normal distribution and a yaw action strategy function of the yaw angular velocity according to the strategy distribution variance of the yaw angular velocity and the strategy distribution expectation of the angular velocity;
and selecting the yaw action of the mechanical arm according to the yaw action strategy function, wherein the yaw action samples the yaw velocity according to the normal distribution of the yaw velocity to obtain the continuous angular velocity value.
4. The autonomous motion planning method for a robot arm according to claim 2, wherein the method pre-acquires the decision model by:
acquiring mechanical arm data and obstacle data at multiple moments;
obtaining action evaluation index values at a plurality of moments by adopting an artificial potential field algorithm according to the mechanical arm data and the obstacle data at the plurality of moments;
inputting the mechanical arm data, the obstacle data and the action evaluation index value at a plurality of moments as state values into a PPO model;
and training the PPO model by adopting a PPO algorithm based on the action strategy function of normal distribution, and obtaining the decision model.
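Claim 4 trains the model with the PPO algorithm; the standard clipped surrogate loss is one concrete form that training step can take. The sketch below assumes that form, with a conventional clip range of 0.2 rather than a value stated in the patent:

```python
import numpy as np

def ppo_clip_loss(log_prob_new, log_prob_old, advantage, epsilon=0.2):
    """Clipped surrogate objective of the PPO algorithm.

    log_prob_new / log_prob_old are log pi(x | s, theta) under the current
    and the data-collecting policy; advantage is the estimated advantage of
    the sampled action. Returns the loss to minimise.
    """
    ratio = np.exp(log_prob_new - log_prob_old)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    return -np.minimum(unclipped, clipped).mean()
```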
5. The autonomous motion planning method for a robot arm according to claim 3, wherein the decision model comprises an input layer, a full connection layer, a selection network and an evaluation network;
the input layer is used for inputting a state space sequence (S1, S2, …, St) to the full connection layer, where St represents the state value at time t;
the fully-connected layer comprises a first activation layer and a second activation layer;
the first activation layer has 256 nodes, each node corresponding to one moving action or yaw action of the mechanical arm; according to the state space sequence, the first activation layer selects an evaluation function corresponding to the moving action or yaw action of the mechanical arm, wherein the evaluation function comprises a state value function and an action value function;
the second activation layer is used for limiting normal distribution expectation of the moving speed of the mechanical arm and normal distribution expectation of the yaw velocity of the mechanical arm by adopting a tanh activation function according to the state space sequence, and obtaining speed strategy distribution expectation and angular velocity strategy distribution expectation respectively;
the evaluation network is used for estimating the evaluation function according to the state space sequence, obtaining an advantage function according to the estimated evaluation function, obtaining a return value according to the advantage function, and updating the parameter vector value according to the return value;
the selection network constructs the moving action strategy function, the normal distribution of the moving speed, the yaw action strategy function and the normal distribution of the angular velocity according to the speed strategy distribution expectation and the angular velocity strategy distribution expectation obtained by the second activation layer, selects the moving action of the mechanical arm according to the moving action strategy function, samples the moving speed according to the normal distribution of the moving speed by the moving action, selects the continuous velocity value, selects the yaw action of the mechanical arm according to the yaw action strategy function, samples the yaw velocity according to the normal distribution of the yaw velocity, and obtains the continuous angular velocity value.
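A structural sketch of the decision model described in claim 5, written with PyTorch for concreteness; the 256-node first layer comes from the claim, while the input dimension, the choice of ReLU, and the use of a state-independent learned log standard deviation are assumptions:

```python
import torch
import torch.nn as nn

class DecisionModel(nn.Module):
    """Shared trunk with a 256-node first activation layer, a tanh-bounded
    second activation layer producing the two strategy distribution
    expectations (selection network side), and a state-value head
    (evaluation network side)."""

    def __init__(self, state_dim, hidden=256):
        super().__init__()
        self.first_layer = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        # Second activation layer: tanh-bounded expectations for speed and yaw rate
        self.mu_head = nn.Sequential(nn.Linear(hidden, 2), nn.Tanh())
        self.log_sigma = nn.Parameter(torch.zeros(2))   # learned log std-devs
        # Evaluation network: state-value estimate used for the advantage function
        self.value_head = nn.Linear(hidden, 1)

    def forward(self, state):
        h = self.first_layer(state)
        mu = self.mu_head(h)          # [mu_v, mu_w], each in (-1, 1)
        sigma = self.log_sigma.exp()  # [sigma_v, sigma_w]
        value = self.value_head(h)
        return mu, sigma, value
```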
6. The method for planning autonomous motion of a mechanical arm according to claim 1, wherein obtaining an action evaluation index value by using an artificial potential field algorithm according to the mechanical arm data and the obstacle data specifically comprises:
acquiring a risk factor according to the mechanical arm data and the obstacle data;
acquiring repulsive potential energy of the obstacle according to the risk factor and a preset danger threshold;
obtaining the action evaluation index value according to the repulsive potential energy and the gravitational potential energy of the destination area;
wherein the risk factor is defined as:
t=d0-max(L,W)
wherein t represents the value of the risk factor, d0 represents the distance between the mechanical arm and the obstacle, and L and W represent the length and width of the obstacle, respectively;
the repulsive potential energy is defined as:
U′r = (1/2)·η·(1/t* − 1/t0)², when t* ≤ t0;  U′r = 0, when t* > t0
wherein U′r represents the repulsive potential energy, η represents the repulsive factor, t* represents the minimum value of the risk factor, and t0 represents the preset danger threshold;
the gravitational potential energy is defined as:
U′a = (1/2)·kp·dg², when dg ≤ d*g;  U′a = kp·d*g·dg − (1/2)·kp·(d*g)², when dg > d*g
wherein U′a represents the gravitational potential energy, kp represents the gravity factor, dg represents the distance between the mechanical arm and the destination, and
d*g represents a distance threshold on the distance between the moving platform of the mechanical arm and the destination;
the motion evaluation index value is determined by a formula U ═ U'r+U′aAnd (4) obtaining.
7. The method for planning autonomous motion of a robotic arm of claim 1, wherein said acquiring robotic arm data and obstacle data specifically comprises:
acquiring a position coordinate value of the mechanical arm based on a position sensing sensor arranged at the base of the mechanical arm;
obtaining a current movement speed value of the mechanical arm based on at least one speed sensor mounted on the mechanical arm, and obtaining a current yaw speed value of the mechanical arm based on at least one yaw angular velocity sensor mounted on the mechanical arm;
and obtaining laser radar point cloud data of the obstacle from a laser radar, and clustering the laser radar point cloud data by using a K-D-tree-based Euclidean distance clustering method to obtain the obstacle data.
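A common way to realise the K-D-tree Euclidean clustering step of claim 7 is a region-growing search over radius queries; the sketch below uses SciPy's cKDTree, with the search radius and minimum cluster size as illustrative parameters rather than values from the patent:

```python
import numpy as np
from scipy.spatial import cKDTree

def euclidean_cluster(points, radius=0.3, min_size=5):
    """Group lidar points into obstacle clusters via K-D-tree radius search."""
    points = np.asarray(points)
    tree = cKDTree(points)
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue, cluster = [seed], [seed]
        while queue:
            idx = queue.pop()
            for nb in tree.query_ball_point(points[idx], r=radius):
                if nb in unvisited:
                    unvisited.remove(nb)
                    queue.append(nb)
                    cluster.append(nb)
        if len(cluster) >= min_size:
            clusters.append(points[cluster])   # each cluster yields position and size data
    return clusters
```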
8. An autonomous movement planning apparatus for a robot arm, comprising:
the data acquisition module is used for acquiring mechanical arm data and obstacle data, wherein the mechanical arm data comprises position coordinate values of a mechanical arm, a current moving speed value and a current yaw speed value, and the obstacle data comprises position coordinate values and size data of an obstacle;
the action evaluation index value acquisition module is used for acquiring an action evaluation index value by adopting an artificial potential field algorithm according to the mechanical arm data and the barrier data;
the action decision module is used for inputting the mechanical arm data, the obstacle data and the action evaluation index value into a preset decision model as state values, and selecting the movement action of the mechanical arm, the continuous speed value of the movement action, the yaw action and the continuous angular speed value of the yaw action through the decision model; and the decision model adopts an action strategy function based on normal distribution.
9. A terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the method for planning autonomous movement of a robot arm according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, comprising a stored computer program, wherein the computer program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform the method for planning autonomous movement of a robot arm according to any of claims 1 to 7.
CN202111143685.1A 2021-09-28 2021-09-28 Mechanical arm autonomous motion planning method, device, equipment and storage medium Active CN113910221B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111143685.1A CN113910221B (en) 2021-09-28 2021-09-28 Mechanical arm autonomous motion planning method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113910221A true CN113910221A (en) 2022-01-11
CN113910221B CN113910221B (en) 2023-01-17

Family

ID=79236646

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111143685.1A Active CN113910221B (en) 2021-09-28 2021-09-28 Mechanical arm autonomous motion planning method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113910221B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008242859A (en) * 2007-03-27 2008-10-09 Sony Corp Motion control device for object, motion control method, and computer program
CN109960880A (en) * 2019-03-26 2019-07-02 上海交通大学 A kind of industrial robot obstacle-avoiding route planning method based on machine learning
CN110632931A (en) * 2019-10-09 2019-12-31 哈尔滨工程大学 Mobile robot collision avoidance planning method based on deep reinforcement learning in dynamic environment
CN110794842A (en) * 2019-11-15 2020-02-14 北京邮电大学 Reinforced learning path planning algorithm based on potential field
CN111645065A (en) * 2020-03-25 2020-09-11 南京大学 Mechanical arm motion planning method based on deep reinforcement learning

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114667852A (en) * 2022-03-14 2022-06-28 广西大学 Hedge trimming robot intelligent cooperative control method based on deep reinforcement learning
CN114667852B (en) * 2022-03-14 2023-04-14 广西大学 Hedge trimming robot intelligent cooperative control method based on deep reinforcement learning

Similar Documents

Publication Publication Date Title
Siekmann et al. Learning memory-based control for human-scale bipedal locomotion
CN112937564B (en) Lane change decision model generation method and unmanned vehicle lane change decision method and device
CN110520868B (en) Method, program product and storage medium for distributed reinforcement learning
Pineau et al. Policy-contingent abstraction for robust robot control
US20210158162A1 (en) Training reinforcement learning agents to learn farsighted behaviors by predicting in latent space
Datta et al. Integrating egocentric localization for more realistic point-goal navigation agents
CN107967513B (en) Multirobot intensified learning collaboratively searching method and system
US9384448B2 (en) Action-based models to identify learned tasks
JP7448683B2 (en) Learning options for action selection using meta-gradient in multi-task reinforcement learning
Bakker et al. Quasi-online reinforcement learning for robots
Nicola et al. A LSTM neural network applied to mobile robots path planning
WO2022205844A1 (en) Robot forward kinematics solution method and apparatus, readable storage medium, and robot
CN111830822A (en) System for configuring interaction with environment
Pan et al. Additional planning with multiple objectives for reinforcement learning
CN113910221B (en) Mechanical arm autonomous motion planning method, device, equipment and storage medium
EP4014162A1 (en) Controlling agents using causally correct environment models
CN115860107A (en) Multi-machine search method and system based on multi-agent deep reinforcement learning
CN115265547A (en) Robot active navigation method based on reinforcement learning in unknown environment
CN116834014A (en) Intelligent cooperative control method and system for capturing non-cooperative targets by space dobby robot
Fang et al. Quadrotor navigation in dynamic environments with deep reinforcement learning
Riccio et al. LoOP: Iterative learning for optimistic planning on robots
Palunko et al. Learning near‐optimal broadcasting intervals in decentralized multi‐agent systems using online least‐square policy iteration
KR20230151736A (en) Method and system for selecting movement routs of mobile robot performing multiple tasks based on deep reinforcement learning
CN114529010A (en) Robot autonomous learning method, device, equipment and storage medium
Tang et al. Reinforcement learning for robots path planning with rule-based shallow-trial

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant