CN114918912A

CN114918912A - Control method and device of mechanical arm, electronic equipment and storage medium

Info

Publication number: CN114918912A
Application number: CN202210225857.8A
Authority: CN
Inventors: 吴悦晨; 陈赢峰; 范长杰; 胡志鹏
Original assignee: Netease Hangzhou Network Co Ltd
Current assignee: Netease Lingdong Hangzhou Technology Co ltd
Priority date: 2022-03-07
Filing date: 2022-03-07
Publication date: 2022-08-19
Anticipated expiration: 2042-03-07
Also published as: CN114918912B

Abstract

The application discloses a control method and a control device for a mechanical arm, electronic equipment and a storage medium, wherein the control method for the mechanical arm comprises the following steps: acquiring a first state and a first control signal sequence corresponding to the current time step of the mechanical arm, wherein each control signal in the first control signal sequence is used for driving the mechanical arm to move at a plurality of time steps; on the basis of the first state, sequentially inputting each control signal in the first control signal sequence into a state prediction model for predicting the state of the mechanical arm at the next time step to obtain the predicted state of the mechanical arm at each time step; optimizing the first control signal sequence according to the predicted state of the mechanical arm at each time step and a preset target state to obtain a second control signal sequence; controlling the mechanical arm to move according to the second control signal sequence to obtain a second state of the mechanical arm at a time step next to the current time step; the first state is updated according to the second state, and the first control signal sequence is updated according to the second control signal sequence.

Description

Control method and device of mechanical arm, electronic equipment and storage medium

Technical Field

The present application relates to the field of automatic control, and in particular, to a method and an apparatus for controlling a robot arm, an electronic device, and a computer storage medium.

Background

With the development of industrial technology and artificial intelligence, robotic arms have found applications in more and more fields. For example, in construction they serve as the mechanical arm of an excavator, and in robotics they are used for the limb control of robots.

Currently, robotic arms are controlled in different ways in different fields. For example, in construction, the control of an excavator depends on the practical operating experience of the operator and a PID control algorithm configured for the excavator; in robotics, the control of robotic arms that form a robot's limbs relies on computer control software and manually issued control instructions. However, although the prior-art control methods can make the robot arm complete a predetermined movement, the control accuracy is low, and the control information during the movement of the robot arm cannot be effectively constrained.

Therefore, how to precisely and effectively control the mechanical arm becomes a problem that needs to be solved by those skilled in the art.

Disclosure of Invention

Embodiments of the present application provide a method and an apparatus for controlling a robot arm, an electronic device, and a computer storage medium, so as to solve the above problems in the prior art.

The control method of the mechanical arm provided by the embodiment of the application comprises the following steps:

acquiring a first state and a first control signal sequence corresponding to a mechanical arm to be controlled at a current time step, wherein each control signal in the first control signal sequence is used for driving the mechanical arm to move at a plurality of time steps;

on the basis of the first state, sequentially inputting each control signal in the first control signal sequence into a state prediction model for predicting the state of the mechanical arm at the next time step to obtain the predicted state of the mechanical arm at each time step;

optimizing the first control signal sequence according to the predicted state of the mechanical arm at each time step and a preset target state to obtain a second control signal sequence;

controlling the mechanical arm to move according to the second control signal sequence to obtain a second state of the mechanical arm at a time step next to the current time step;

updating the first state according to the second state and updating the first control signal sequence according to the second control signal sequence.

Optionally, the method further includes:

on the basis of the updated first state, sequentially inputting each control signal in the updated first control signal sequence into the state prediction model for predicting the state of the mechanical arm at the next time step to obtain the predicted state of the mechanical arm at each time step;

and controlling the mechanical arm to move according to the second control signal sequence to obtain a second state of the mechanical arm at a time step next to the current time step.

Optionally, the sequentially inputting, based on the first state, each control signal in the first control signal sequence into a state prediction model for predicting the state of the mechanical arm at the next time step to obtain the predicted state of the mechanical arm at each time step includes:

inputting the first state and a first control signal in the first control signal sequence into the state prediction model, and obtaining a predicted state of the mechanical arm at a first time step after the current time step, which is output by the state prediction model;

and for the i-th time step after the current time step, inputting the predicted state of the (i-1)-th time step after the current time step and the i-th control signal in the first control signal sequence into the state prediction model, and obtaining the predicted state of the mechanical arm at the i-th time step after the current time step, which is output by the state prediction model, wherein i is an integer greater than 1.

Optionally, the optimizing the first control signal sequence according to the predicted state of the mechanical arm at each time step and a preset target state to obtain a second control signal sequence includes:

inputting the target state and the predicted state of each time step into a preset loss function to obtain a loss value output by the loss function, wherein the preset loss function is generated based on a difference value between the target state and the predicted state of each time step;

and optimizing the first control signal sequence based on the loss value to obtain the second control signal sequence.

Optionally, controlling the mechanical arm to move according to the second control signal sequence, including:

acquiring a first control signal in the second control signal sequence;

controlling the mechanical arm to move based on a first control signal in the second control signal sequence.

Optionally, the updating the first control signal sequence according to the second control signal sequence includes:

acquiring the rest control signals except the first control signal in the second control signal sequence;

generating a random control signal;

combining the other control signals except the first control signal in the second control signal sequence with the random control signal to obtain a target control signal sequence;

and replacing the first control signal sequence with the target control signal sequence to obtain an updated first control signal sequence.
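The update of the first control signal sequence described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `update_control_sequence`, the placement of the random signal at the end of the sequence, and the sampling range are assumptions made for the example.

```python
import random


def update_control_sequence(second_sequence, low=-100.0, high=100.0, rng=random):
    """Build the target control signal sequence from the second sequence.

    The first signal (already used to move the arm) is dropped and one
    random signal is added so the sequence keeps its length.
    Appending at the end and the range [low, high] are assumptions.
    """
    remaining = list(second_sequence[1:])    # all signals except the first
    random_signal = rng.uniform(low, high)   # newly generated random signal
    return remaining + [random_signal]


second = [12.0, -5.0, 30.0, 8.0]
updated = update_control_sequence(second, rng=random.Random(0))
```

Keeping the remaining signals gives the next optimization round a warm start, since they were already optimized for the following time steps.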

Optionally, the state of the mechanical arm includes the angle of each mechanical arm joint and the angular velocity of the mechanical arm joint movement, each of which is configured with a corresponding weight;

the step of inputting the target state and the predicted state of each time step into a preset loss function to obtain a loss value output by the loss function includes:

determining the sum of first differences between the predicted angle of the mechanical arm joint in the predicted state of each time step and the target angle in the target state through the preset loss function; and

determining the sum of second difference values between the predicted angular velocity of the mechanical arm joint in the predicted state of each time step and the target angular velocity in the target state through the preset loss function;

and obtaining the loss values of the predicted angle and the predicted angular velocity of the mechanical arm joint output by the loss function according to the weight corresponding to the angle, the weight corresponding to the angular velocity, the sum of the first difference values and the sum of the second difference values.
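As an illustration of the weighted loss described above, the sketch below sums the angle differences and the angular velocity differences over all predicted time steps and combines them with their weights. The text does not state how each difference is measured; squared differences, the function name `weighted_loss` and the default weights are assumptions for the example.

```python
def weighted_loss(predicted_states, target_state, w_angle=1.0, w_velocity=0.5):
    """Weighted loss over all predicted time steps.

    predicted_states: list of (angles, velocities), one entry per time step.
    target_state: (target_angles, target_velocities).
    Squared differences are an assumption; the source only says "difference".
    """
    target_angles, target_velocities = target_state
    angle_sum = 0.0      # sum of first differences (angles)
    velocity_sum = 0.0   # sum of second differences (angular velocities)
    for angles, velocities in predicted_states:
        angle_sum += sum((a - t) ** 2 for a, t in zip(angles, target_angles))
        velocity_sum += sum((v - t) ** 2 for v, t in zip(velocities, target_velocities))
    return w_angle * angle_sum + w_velocity * velocity_sum


predicted = [([0.5, 1.0], [0.2, 0.0]), ([1.0, 1.0], [0.0, 0.0])]
target = ([1.0, 1.0], [0.0, 0.0])
loss_value = weighted_loss(predicted, target)
```

With these inputs the angle term contributes 0.25 and the velocity term 0.5 × 0.04, so the loss value is about 0.27.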

Optionally, the optimizing the first control signal sequence based on the loss value to obtain the second control signal sequence includes:

and when the loss value is larger than a preset loss threshold value, optimizing the first control signal sequence according to the gradient of the loss function with respect to the control signals to obtain the second control signal sequence.
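A minimal sketch of threshold-gated, gradient-based optimization of a control signal sequence is given below. It assumes plain gradient descent and uses a central finite difference in place of whatever differentiation method the actual system uses; `toy_loss`, the learning rate and the threshold value are hypothetical choices for the example.

```python
def numerical_gradient(loss_fn, u, eps=1e-5):
    """Central finite-difference gradient of loss_fn at the sequence u."""
    grad = []
    for k in range(len(u)):
        up = list(u)
        up[k] += eps
        down = list(u)
        down[k] -= eps
        grad.append((loss_fn(up) - loss_fn(down)) / (2 * eps))
    return grad


def optimize_sequence(loss_fn, first_sequence, loss_threshold=1e-4,
                      lr=0.1, max_iters=500):
    """Update the sequence only while the loss exceeds the threshold."""
    u = list(first_sequence)
    for _ in range(max_iters):
        if loss_fn(u) <= loss_threshold:
            break                      # loss small enough: keep the sequence
        grad = numerical_gradient(loss_fn, u)
        u = [uk - lr * gk for uk, gk in zip(u, grad)]
    return u


def toy_loss(u):
    """Hypothetical loss: the angle accumulates the signals; target angle is 1.0."""
    angle, total = 0.0, 0.0
    for uk in u:
        angle += uk
        total += (angle - 1.0) ** 2
    return total


second_sequence = optimize_sequence(toy_loss, [0.0, 0.0, 0.0])
```

A sequence whose loss is already at or below the threshold is returned unchanged, which matches the condition stated above.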

Optionally, the method further includes:

acquiring a first state and a first control signal sequence corresponding to each time step;

taking the first state and a first control signal sequence as training samples of the state prediction model;

performing iterative training on the state prediction model based on the training samples to obtain a trained state prediction model;

updating the state prediction model based on the trained state prediction model.
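The iterative training step can be sketched as below. The sketch rests on several assumptions: each training sample is taken to be a (state, control signal, observed next state) triple, a two-parameter linear model stands in for the neural network, and the data come from a made-up rule rather than a real arm.

```python
def train_state_model(samples, lr=0.5, epochs=50):
    """Fit next_state = a * state + b * control by full-batch gradient descent."""
    a, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for state, control, next_state in samples:
            error = a * state + b * control - next_state
            grad_a += 2.0 * error * state / n     # d(mean squared error)/da
            grad_b += 2.0 * error * control / n   # d(mean squared error)/db
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


# Hypothetical recorded transitions generated from a known toy rule.
samples = [(x, u, 0.9 * x + 0.1 * u)
           for x in (-1.0, 0.0, 1.0)
           for u in (-1.0, 0.0, 1.0)]
a, b = train_state_model(samples)
```

On this toy data the fitted parameters converge to the generating values 0.9 and 0.1; the retrained parameters would then replace those of the model used for prediction.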

The present application also provides a control device for a mechanical arm, including:

an acquisition unit configured to acquire a first state and a first control signal sequence corresponding to a mechanical arm to be controlled at a current time step, wherein each control signal in the first control signal sequence is used for driving the mechanical arm to move at a plurality of time steps;

a prediction unit configured to sequentially input, based on the first state, each control signal in the first control signal sequence to a state prediction model for predicting a state of the robot arm at a next time step, and obtain a predicted state of the robot arm at each time step;

the optimization unit is used for optimizing the first control signal sequence according to the predicted state of the mechanical arm at each time step and a preset target state to obtain a second control signal sequence;

the control unit is used for controlling the mechanical arm to move according to the second control signal sequence to obtain a second state of the mechanical arm at a time step next to the current time step;

and the updating unit is used for updating the first state according to the second state and updating the first control signal sequence according to the second control signal sequence.

The present application also provides an electronic device, including:

a processor;

a memory for storing a program of the method, which program, when read and executed by the processor, performs the above method.

The present application also provides a computer storage medium storing a computer program which, when executed, performs the above method.

Compared with the prior art, the method has the following advantages:

according to the control method of the mechanical arm, states of the mechanical arm at a plurality of time steps are predicted by combining a first state and a first control signal sequence of the mechanical arm to be controlled at the current time step through a mechanical arm state prediction model, and the predicted state of the mechanical arm at each time step is obtained; then, optimizing the first control signal sequence according to the predicted state of each time step and a target state preset for the mechanical arm to obtain an optimized second control signal sequence, and controlling the mechanical arm to move according to the second control signal sequence to obtain a second state of the mechanical arm; and finally, updating the first state of the mechanical arm at the current time step through the second state, and updating the first control signal sequence through the second control signal sequence, so that the state of the mechanical arm is further controlled by repeating the processes. The method can enable the mechanical arm to move from the current state to the preset target state stably, and realizes accurate and effective control over the movement of the mechanical arm.

Drawings

FIG. 1a is a control flowchart of an excavator mechanical arm according to an embodiment of the present disclosure;

FIG. 1 is a flow chart of a method for controlling a robotic arm according to another embodiment of the present disclosure;

FIG. 2 is a schematic view of a mechanical structure of an excavator according to another embodiment of the present disclosure;

FIG. 3 is a schematic diagram of a PWM signal according to another embodiment of the present application;

FIG. 4 is a flowchart of a method for obtaining a sub-state of a mechanical arm corresponding to each time step according to another embodiment of the present disclosure;

FIG. 5 is a schematic structural diagram of a control device of a robot arm according to another embodiment of the present disclosure;

FIG. 6 is a schematic structural diagram of an electronic device according to another embodiment of the present application.

Detailed Description

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, the present application can be implemented in many ways other than those described herein, and those skilled in the art can make similar generalizations without departing from the spirit and scope of the present application; the present application is therefore not limited to the specific implementations disclosed below.

The application provides a control method of a mechanical arm, which is characterized in that: optimizing a first control signal sequence for driving the mechanical arm to move at a plurality of time steps through a state prediction model for predicting the state of the mechanical arm at the next time step and combining a first state of the mechanical arm at the current time step to obtain a second control signal sequence, and further controlling the mechanical arm to move from the current time step to the next time step through the second control signal sequence to obtain a second state of the next time step; furthermore, the first state of the current time step is updated through the second state, and the first control signal sequence is updated according to the second control signal sequence, so that the process is circulated, the mechanical arm continues to move, and the control flow is completed.

For the convenience of understanding of the present application, first, a control method of the robot arm is described with reference to a specific use scenario.

The specific use scenario described in this embodiment of the present application is the control of an excavator mechanical arm. Referring to FIG. 1a, FIG. 1a is a control flowchart of an excavator mechanical arm provided in an embodiment of the present application.

As shown in FIG. 1a, the control flow for the excavator mechanical arm involves an excavator mechanical arm joint control module 101a and a model prediction optimization module 102a, wherein the model prediction optimization module 102a comprises a model prediction submodule 102-1a and a control signal optimization module 102-2a.

First, in order to realize accurate control of the mechanical arm of the excavator, a preset first control signal sequence is provided to the model prediction optimization module 102a, and each control signal in the first control signal sequence is used for driving the mechanical arm to move at a plurality of time steps.

After receiving the first control signal sequence, the model prediction optimization module 102a inputs the first control signal sequence to the model prediction submodule 102-1a. The model prediction submodule 102-1a stores a state prediction model for predicting the motion state of the mechanical arm at a plurality of future time steps based on the first state of the mechanical arm at the current time step and the first control signal sequence.

After the model prediction submodule 102-1a receives the first control signal sequence, it obtains the predicted state of the mechanical arm at each time step for the case where the mechanical arm is controlled by each control signal in the first control signal sequence. For example, assuming that the first control signal sequence contains, in order, control signal 1, control signal 2 and control signal 3, the predicted states of the mechanical arm at each time step include, under the control of the first control signal sequence: a predicted state 1 of the mechanical arm after moving from the current time step to the first time step after the current time step, a predicted state 2 of the mechanical arm after moving from the current time step to the second time step after the current time step, and a predicted state 3 of the mechanical arm after moving from the current time step to the third time step after the current time step.

Further, after the predicted state of the mechanical arm at each time step is obtained, the model prediction submodule 102-1a sends the predicted states to the control signal optimization module 102-2a. The control signal optimization module 102-2a optimizes the first control signal sequence according to the predicted state of each time step predicted by the state prediction model and a target state preset for the mechanical arm, so as to obtain an optimized second control signal sequence, wherein the first control signal in the second control signal sequence is used for controlling the mechanical arm to move for one time step.

After the mechanical arm has been controlled to move for one time step by the first control signal in the second control signal sequence, a second state of the mechanical arm is obtained, the first state corresponding to the current time step is updated with the second state, and the updated first state is input into the model prediction optimization module 102a again. In addition, a random control signal is generated and combined with the control signals other than the first control signal in the second control signal sequence to obtain a target control signal sequence; the first control signal sequence is updated with the target control signal sequence, and the updated first control signal sequence is input into the model prediction optimization module 102a.

After receiving the updated first state and the updated first control signal sequence, the model prediction optimization module 102a optimizes the updated first control signal sequence through its model prediction submodule 102-1a and control signal optimization module 102-2a, and controls the mechanical arm to move for one time step with the first control signal in the optimized control signal sequence.

According to the above method, each time the mechanical arm has been controlled to move for one time step, the first state and the first control signal sequence of the mechanical arm at the current time step are updated and optimized again, until the mechanical arm has moved for a plurality of time steps, so that the first state of the mechanical arm at the current time step is as close as possible to the preset target state.
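The repeated predict, optimize, move and update cycle described above can be sketched end to end as follows. This is a toy sketch under stated assumptions: `predict` is a hypothetical single-joint stand-in for the state prediction model and also plays the role of the real arm, the gradient is taken by finite differences, and the gain, weights, learning rate and horizon are illustrative values rather than values from the application.

```python
import random


def predict(state, u):
    """Toy stand-in for the state prediction model; state = (angle, velocity)."""
    angle, _ = state
    velocity = 0.01 * u                 # assumed gain from PWM value to velocity
    return (angle + velocity, velocity)


def loss(state, seq, target_angle):
    """Roll the model over the sequence and accumulate the weighted error."""
    total = 0.0
    for u in seq:
        state = predict(state, u)
        total += (state[0] - target_angle) ** 2 + 0.1 * state[1] ** 2
    return total


def optimize(state, seq, target_angle, lr=300.0, iters=30, eps=1e-3):
    """Gradient descent on the sequence, kept inside the PWM range."""
    seq = list(seq)
    for _ in range(iters):
        grad = []
        for k in range(len(seq)):
            hi = seq[:k] + [seq[k] + eps] + seq[k + 1:]
            lo = seq[:k] + [seq[k] - eps] + seq[k + 1:]
            grad.append((loss(state, hi, target_angle)
                         - loss(state, lo, target_angle)) / (2 * eps))
        seq = [max(-100.0, min(100.0, u - lr * g)) for u, g in zip(seq, grad)]
    return seq


def control_loop(first_state, first_seq, target_angle, steps=20, seed=0):
    rng = random.Random(seed)
    state, seq = first_state, list(first_seq)
    for _ in range(steps):
        second_seq = optimize(state, seq, target_angle)
        state = predict(state, second_seq[0])   # move one time step (toy "arm")
        seq = second_seq[1:] + [rng.uniform(-100.0, 100.0)]  # updated first sequence
    return state


final_state = control_loop((0.0, 0.0), [0.0] * 5, target_angle=1.0)
```

Only the first signal of each optimized sequence is applied before the sequence is re-optimized from the newly observed state, so prediction errors are corrected at every time step instead of accumulating over the whole horizon.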

It should be understood that the above description of the usage scenario of the robot arm control method provided in the present application is only for facilitating understanding of the present application, and the robot arm control method provided in the present application may also be applied to other scenarios, for example, the control method is used in the field of robots to control the motion of a robot arm, and the like. The present application is not limited thereto.

Referring to FIG. 1, FIG. 1 is a flowchart illustrating a method for controlling a robot arm according to another embodiment of the present disclosure.

As shown in fig. 1, the method includes the following steps S101 to S105.

Step S101, acquiring a first state and a first control signal sequence corresponding to a mechanical arm to be controlled at a current time step, wherein each control signal in the first control signal sequence is used for driving the mechanical arm to move at a plurality of time steps.

In the embodiment of the present application, the robot arm may be a robot arm applied in different fields. For example, in industry, the mechanical arm may consist of a boom, an arm and a bucket which are mounted on the excavator body and connected in sequence; in the field of artificial intelligence, the mechanical arm may be the mechanical arm of an intelligent robot or the like. The present application is not limited thereto.

The first state of the mechanical arm corresponding to the current time step may be understood as the current state of the mechanical arm at the current time step, for example, the initial state of the mechanical arm before it is controlled to move, or the motion state of the mechanical arm at the current moment during its motion.

In an alternative implementation of the embodiments of the present application, the state of the robotic arm includes an angle and an angular velocity at a joint of the robotic arm. To facilitate understanding of the state of the robot arm, the angle and angular velocity of the robot arm will be described below with reference to fig. 2.

Referring to fig. 2, fig. 2 is a schematic view of a mechanical structure of an excavator according to another embodiment of the present disclosure.

FIG. 2 shows an excavator body 201, and a boom 202, an arm 203 and a bucket 204 which are sequentially connected and mounted on the excavator body 201. The boom 202, the arm 203 and the bucket 204 together constitute the mechanical arm of the excavator.

As shown in FIG. 2, the angles of the excavator mechanical arm include: an included angle α formed between the boom 202 and the excavator body 201, an included angle β formed between the arm 203 and the boom 202, and an included angle θ formed between the bucket 204 and the arm 203. The angular velocity of the excavator mechanical arm is the rate of change of each of these included angles as the mechanical arm moves at each time step.

As can be seen from FIG. 2, once the position at which the robot arm is mounted is fixed, the angle of each joint determines the position of the robot arm, and the angular velocity of each joint determines its direction of movement.
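To illustrate how joint angles determine the position of the arm, the sketch below computes the bucket tip position of a planar three-link chain. It assumes that each angle is expressed as a rotation relative to the direction of the previous link, which may differ from the included angles drawn in FIG. 2, and the link lengths are hypothetical.

```python
import math


def bucket_tip_position(alpha, beta, theta, lengths=(5.0, 3.0, 1.0)):
    """Planar forward kinematics for boom, arm and bucket.

    Angles are in radians, each relative to the previous link (assumption).
    lengths are hypothetical link lengths in metres.
    """
    heading = 0.0
    x = y = 0.0
    for angle, length in zip((alpha, beta, theta), lengths):
        heading += angle                  # accumulate rotation along the chain
        x += length * math.cos(heading)
        y += length * math.sin(heading)
    return x, y


stretched = bucket_tip_position(0.0, 0.0, 0.0)
upright = bucket_tip_position(math.pi / 2, 0.0, 0.0)
```

With all angles at zero the links lie along one line and the tip sits at a distance equal to the sum of the link lengths; rotating only the first joint swings the whole chain about the mounting point.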

Further, in an optional embodiment of the present application, the state of the robot arm at a given time step (including the first state corresponding to the current time step) may be obtained by sensors mounted on the robot arm, for example by an angular displacement sensor and a photoelectric angle sensor mounted on the mechanical arm.

In another alternative embodiment of the present application, the state of the robot arm may instead be represented by pose information of the robot arm, which may be obtained from an IMU (Inertial Measurement Unit) installed on the robot arm together with an image sensing device that captures the robot arm. Specifically, parameters such as the speed, direction, rotation angle, acceleration and gravity of each joint of the mechanical arm can be measured by the IMU, and the pose information of each joint of the mechanical arm is then determined in combination with an image of the mechanical arm captured by the image sensing device; the image sensing device may be a 2D/3D camera or another optical image capturing device.

The first control signal sequence in step S101 is used to drive the mechanical arm to move. Specifically, the first control signal sequence includes control signals for a plurality of adjacent time steps, where each control signal is used to control the mechanical arm to move from the state corresponding to one time step to the state corresponding to the next time step. For example, suppose the mechanical arm is the arm of a robot and the arm is in a horizontally raised state (the first state) at the current time step. If n time steps are required for the arm to move from the horizontally raised state to the hand-raised state, the control signal sequence required for this movement includes n control signals; denoting a control signal by u, the corresponding control signal sequence is $(u_0, u_1, u_2, u_3, \dots, u_{n-1})$.

In an alternative embodiment of the present application, PWM (pulse width modulation) may be adopted as each control signal in the first control signal sequence.

Specifically, PWM is a method of digitally encoding analog signal levels: using a high-resolution counter, the duty cycle of a square wave is modulated to encode the level of a specific analog signal. The PWM signal itself is still digital, since at any given moment the full-amplitude DC supply is either completely present (ON) or completely absent (OFF). The voltage or current source is applied to the analog load as a repetitive sequence of ON and OFF pulses. The on-time is the period during which the DC supply is applied to the load, and the off-time is the period during which the supply is disconnected. Any analog value can be encoded using PWM as long as the bandwidth is sufficient.

Please refer to fig. 3, which is a schematic diagram of a PWM signal according to another embodiment of the present application.

Fig. 3 shows three different PWM signals. FIG. 3 (a) is a PWM output with a duty cycle of 10%, i.e., 10% of the time is on and the remaining 90% of the time is off during the signal period; shown in fig. 3 (b) and (c) are PWM outputs with duty cycles of 50% and 90%, respectively. The three PWM outputs encode three different analog signal values having intensities of 10%, 50%, and 90% of the full scale value, respectively. For example, assuming that the power supply is 9V and the duty ratio is 10%, an analog signal with an amplitude of 0.9V is provided.
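The relationship between duty cycle and encoded analog value can be checked with a short calculation. The helper names `pwm_average_voltage` and `pwm_samples` and the 100-sample period are hypothetical and only serve the illustration.

```python
def pwm_average_voltage(supply_voltage, duty_cycle):
    """Average voltage encoded by a PWM signal with the given duty cycle (0..1)."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return supply_voltage * duty_cycle


def pwm_samples(duty_cycle, resolution=100):
    """One signal period as ON (1) / OFF (0) samples."""
    on_count = round(duty_cycle * resolution)
    return [1] * on_count + [0] * (resolution - on_count)


example_voltage = pwm_average_voltage(9.0, 0.10)   # the 9 V, 10 % case above
half_period = pwm_samples(0.5)
```

Averaging the ON/OFF samples over one period and scaling by the supply voltage gives the same value as multiplying the supply voltage by the duty cycle.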

In practical applications, in order to control the movement of the robot arm, a PWM signal representing a certain amplitude is generally input to a motor for controlling the movement of the robot arm, and the motor is then started to control the movement of the robot arm to a certain extent. It should be noted that, in order to realize safe control of the mechanical arm, that is, to keep the angular velocity of the mechanical arm movement within a certain safety range, each control signal in the first control signal sequence needs to be within a stable PWM signal range; optionally, in the embodiment of the present application, the PWM signal range of each control signal in the first control signal sequence is between -100 and +100.
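One simple way to keep every control signal inside the stated PWM range is to clamp each value, as sketched below. Clamping is an assumption for illustration; the text only requires that the signals lie within the range and does not say how this is enforced.

```python
def clamp_control_sequence(sequence, low=-100.0, high=100.0):
    """Limit every control signal to the PWM range [low, high]."""
    return [max(low, min(high, u)) for u in sequence]


safe_sequence = clamp_control_sequence([-150.0, 20.0, 130.0])
```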

In the embodiment of the present application, the method for controlling a robot arm is to predict the states of the robot arm at different time steps based on the first control signal sequence and the first state, and further to optimally update the first control signal sequence based on the predicted state of the robot arm and a target state preset for the robot arm, so that the robot arm can move from the first state at the current time step to the preset target state as smoothly as possible in the actual movement process. Therefore, it is necessary to predict the predicted state of the robot arm at each time step under the control of the first control signal sequence based on the first state of the robot arm at the current time step and the first control signal sequence.

Step S102, sequentially inputting each control signal in the first control signal sequence into a state prediction model for predicting the state of the mechanical arm at the next time step based on the first state, and obtaining the predicted state of the mechanical arm at each time step.

In the embodiment of the present application, the state prediction model is a neural network obtained by Machine Learning (ML) training. Machine learning is an interdisciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines; it studies how a computer can simulate or realize human learning behavior in order to acquire new knowledge or skills, and how it can reorganize existing knowledge structures to continuously improve its own performance. Machine learning generally includes techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning and inductive learning, and is a branch of Artificial Intelligence (AI) technology.

In the embodiment of the present application, the state prediction model is configured to predict the state of the robot arm at the next time step based on the state of the robot arm and the control signal at a given time step. That is, the input data of the state prediction model are the angle and angular velocity of each joint of the robot arm together with the PWM control signal, and the output data are the predicted values of the angle and angular velocity of each joint of the robot arm at the next time step.

Specifically, the state prediction model obtained in the above manner may be expressed by the formula $x' = f(\zeta, x, u)$, where $\zeta$ represents the model parameters, which determine the accuracy of the model's prediction; $x$ represents the first state of the robotic arm, i.e., the angle and angular velocity of each joint of the robotic arm at the current time step; $u$ represents the control signal applied at the current time step; and $x'$ is the predicted state of the mechanical arm at the next time step after the current time step, i.e., the angle and angular velocity of each joint of the mechanical arm at that time step.
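The mapping from the current state and control signal to the predicted next state can be sketched as a small feed-forward network. The application does not specify the network architecture, so the single hidden layer, the tanh activation, the layer sizes and the scaling of PWM values by 100 are all assumptions, and the weights below are random rather than trained.

```python
import math
import random


def init_params(state_dim, control_dim, hidden=8, seed=0):
    """Randomly initialized parameters (zeta) of a one-hidden-layer network."""
    rng = random.Random(seed)

    def rand(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {"W1": rand(hidden, state_dim + control_dim), "b1": [0.0] * hidden,
            "W2": rand(state_dim, hidden), "b2": [0.0] * state_dim}


def predict_next_state(zeta, x, u):
    """x' = f(zeta, x, u): state and control in, predicted next state out."""
    inputs = list(x) + [uk / 100.0 for uk in u]   # assumed PWM normalization
    hidden = [math.tanh(sum(w * v for w, v in zip(row, inputs)) + b)
              for row, b in zip(zeta["W1"], zeta["b1"])]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(zeta["W2"], zeta["b2"])]


zeta = init_params(state_dim=6, control_dim=3)
x = [0.4, 1.2, 0.8, 0.0, 0.0, 0.0]   # three joint angles, three angular velocities
u = [50.0, -20.0, 0.0]               # one PWM value per joint
x_next = predict_next_state(zeta, x, u)
```

The output has the same layout as the input state, which is what allows a predicted state to be fed back into the model for the following time step.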

Further, after the state prediction model is obtained, the predicted state of the mechanical arm at each time step can be predicted and obtained based on the current state of the mechanical arm and the first control signal sequence.

In the specific application process, the step S102 includes the following steps S102-1 and S102-2:

step S102-1, inputting the first state and a first control signal in the first control signal sequence into the state prediction model, and obtaining a predicted state of the mechanical arm at a first time step after the current time step, which is output by the state prediction model;

step S102-2, inputting the predicted state of the (i-1)-th time step after the current time step and the i-th control signal in the first control signal sequence into the state prediction model, and obtaining the predicted state of the mechanical arm at the i-th time step after the current time step, which is output by the state prediction model, wherein i is an integer greater than 1.

Please refer to fig. 4, which is a flowchart illustrating a method for obtaining a sub-state of a robot arm at each time step according to another embodiment of the present application.

That is, if the first state of the robot arm corresponding to the current time step is x₀ and the first control signal sequence is (u₀, u₁, u₂, u₃, …, uₙ₋₁), then the state prediction model may, based on the first state x₀ and the control signal u₀, predict the predicted state x₁ of the robot arm at the first time step after the current time step;

Further, the state prediction model may, based on the predicted state x₁ corresponding to the first time step after the current time step and the second control signal u₁ in the first control signal sequence, predict the predicted state x₂ of the robot arm at the second time step after the current time step;

And so on, until the state prediction model has traversed all control signals in the first control signal sequence, thereby obtaining the predicted states of the mechanical arm over the plurality of time steps.
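The chained prediction of steps S102-1 and S102-2 can be sketched as follows. This is a minimal illustration only: `toy_model`, its time increment `dt`, and the single-joint `State` tuple are hypothetical stand-ins for the trained state prediction model x' = f(ζ, x, u) and are not taken from the application.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: one joint, state = (joint angle, joint angular velocity).
State = Tuple[float, float]


def toy_model(state: State, control: float, dt: float = 0.1) -> State:
    """Hypothetical stand-in for the trained model x' = f(zeta, x, u)."""
    angle, velocity = state
    new_velocity = velocity + control * dt
    new_angle = angle + new_velocity * dt
    return (new_angle, new_velocity)


def rollout(
    model: Callable[[State, float], State],
    first_state: State,
    controls: Sequence[float],
) -> List[State]:
    """Feed each control signal in turn, reusing the previous prediction."""
    predicted: List[State] = []
    state = first_state  # step S102-1 starts from the first state
    for u in controls:
        state = model(state, u)  # step S102-2 reuses the (i-1)th prediction
        predicted.append(state)
    return predicted


states = rollout(toy_model, (0.0, 0.0), [1.0, 1.0, 0.0])
```

One predicted state is produced per control signal, so a sequence of n control signals yields n predicted states.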

And S103, optimizing the first control signal sequence according to the predicted state of the mechanical arm at each time step and a preset target state to obtain a second control signal sequence.

The purpose of step S103 is to find a control signal for controlling the mechanical arm to move for one time step by optimizing the first control signal sequence.

Specifically, the process of optimizing the initialization control sequence of the mechanical arm includes the following steps S103-1 to S103-2:

step S103-1, inputting the target state and the predicted state of each time step into a preset loss function, and obtaining a loss value output by the loss function, wherein the preset loss function is generated based on a difference value between the target state and the predicted state of each time step;

The loss function is used to measure the degree of difference between the predicted states output by the state prediction model and the target state. In this embodiment, the loss function is represented by f = f(u), where f is the loss value and u represents the first control signal sequence.

In order to facilitate understanding of the preset loss function provided in the embodiment of the present application, the preset loss function and a process of calculating a loss value by the loss function are described as follows:

Here, the state of the robot arm includes the angle of the mechanical arm joint and the angular velocity of the mechanical arm joint motion, each of which is assigned a corresponding weight.

Further, the above step S103-1 includes the following steps 1 to 3.

Step 1, determining, through the preset loss function, the sum of first differences between the predicted angle of the mechanical arm joint in the predicted state of each time step and the target angle in the target state;

Step 2, determining, through the preset loss function, the sum of second differences between the predicted angular velocity of the mechanical arm joint in the predicted state of each time step and the target angular velocity in the target state;

Step 3, obtaining the loss value of the predicted angle and the predicted angular velocity of the mechanical arm joint output by the loss function according to the weight corresponding to the angle, the weight corresponding to the angular velocity, the sum of the first differences, and the sum of the second differences.

For example, assume that the predicted states of the mechanical arm at each time step predicted by the state prediction model are (x₁, y₁), (x₂, y₂), (x₃, y₃), ……, (xₙ, yₙ), respectively, where n is an integer greater than 1, and that the preset target state is (X, Y).

The sum of said first differences is equal to:

the sum of said second differences being equal to:

Further, assume that the weight assigned to the angle is q₁ and the weight assigned to the angular velocity is q₂; then the preset loss function can be written as the following equation (1):

wherein n in formula (1) represents the number of control signals in the first control signal sequence.
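As an illustration of the quantities defined above, the two sums and equation (1) could take the following form. This rests on two assumptions that the description does not confirm: that x_i denotes the predicted angle and y_i the predicted angular velocity, and that each difference is taken as a squared difference. The names S_1 and S_2 are introduced here only for readability.

```latex
S_1 = \sum_{i=1}^{n} (x_i - X)^2, \qquad
S_2 = \sum_{i=1}^{n} (y_i - Y)^2, \qquad
f(u) = q_1 S_1 + q_2 S_2
```

Under this form, S_1 is the sum of the first differences, S_2 is the sum of the second differences, and the loss value is their weighted combination.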

Further, since the state of the robot arm at a given time step is determined by the control signal input to the robot arm at the preceding time step, the above equation (1) is essentially a loss function in which the control signals of the robot arm are the independent variable and the loss value is the dependent variable, that is, f = f(u).
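The weighted loss of steps 1 to 3 can be sketched as below. The sketch assumes squared differences, which the description does not specify, and the function and parameter names (`loss_value`, `q_angle`, `q_velocity`) are hypothetical.

```python
from typing import Sequence, Tuple

# Assumption: one joint, state = (joint angle, joint angular velocity).
State = Tuple[float, float]


def loss_value(
    predicted: Sequence[State],
    target: State,
    q_angle: float,
    q_velocity: float,
) -> float:
    """Weighted sum of angle and angular-velocity differences.

    Squared differences are an assumption made for this sketch.
    """
    target_angle, target_velocity = target
    # Step 1: sum of first differences (angle).
    angle_sum = sum((a - target_angle) ** 2 for a, _ in predicted)
    # Step 2: sum of second differences (angular velocity).
    velocity_sum = sum((v - target_velocity) ** 2 for _, v in predicted)
    # Step 3: combine the two sums using their weights.
    return q_angle * angle_sum + q_velocity * velocity_sum


example_loss = loss_value([(1.0, 0.0), (2.0, 1.0)], (2.0, 1.0), 2.0, 0.5)
```

Because the predicted states are themselves produced from the control signals, composing this function with the state prediction model gives the loss as a function of the control signal sequence.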

And S103-2, optimizing the first control signal sequence based on the loss value to obtain the second control signal sequence.

The purpose of step S103-2 is to adjust and optimize the state of the robot arm at each time step, so that the loss value generated in the process of moving the robot arm from the current state to the target state is reduced. Since the state of the robot arm at each time step is determined by the first control signal sequence, what is actually optimized is the first control signal sequence.

Specifically, step S103-2 includes: when the loss value is greater than a preset loss threshold, optimizing the first control signal sequence according to the gradient (derivative) of the loss function to obtain the second control signal sequence.
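A threshold-gated gradient update of this kind can be sketched as follows. Plain gradient descent with finite-difference gradients is used here as an assumption, since the application does not name a specific optimizer; `toy_loss`, the learning rate, and the iteration limit are likewise hypothetical.

```python
from typing import Callable, List, Sequence


def optimize(
    controls: Sequence[float],
    loss_fn: Callable[[List[float]], float],
    threshold: float,
    lr: float = 0.1,
    max_iters: int = 200,
    eps: float = 1e-6,
) -> List[float]:
    """Gradient descent on the control sequence while loss > threshold."""
    current = list(controls)
    for _ in range(max_iters):
        base = loss_fn(current)
        if base <= threshold:
            break  # the loss is already acceptable, stop optimizing
        grad = []
        for i in range(len(current)):
            bumped = list(current)
            bumped[i] += eps
            # Forward finite difference approximates d(loss)/d(u_i).
            grad.append((loss_fn(bumped) - base) / eps)
        current = [u - lr * g for u, g in zip(current, grad)]
    return current


def toy_loss(controls: List[float]) -> float:
    """Hypothetical loss whose minimum is at every control equal to 1."""
    return sum((u - 1.0) ** 2 for u in controls)


first_sequence = [0.0, 0.0, 0.0]
second_sequence = optimize(first_sequence, toy_loss, threshold=0.01)
```

In practice `loss_fn` would roll the control sequence through the state prediction model before comparing the predicted states with the target state.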

And step S104, controlling the mechanical arm to move according to the second control signal sequence, and obtaining a second state of the mechanical arm at a time step next to the current time step.

In the embodiment of the present application, it is considered that the first control signal in the second control signal sequence may be used to control the robot arm to move to the next time step.

Based on this, the above step S104 includes the following steps 5 and 6:

step 5, acquiring a first control signal in the second control signal sequence;

step 6, controlling the mechanical arm to move based on the first control signal in the second control signal sequence.

Specifically, after the mechanical arm is controlled to move for a time step by a first control signal in the second control signal sequence, a second state of the mechanical arm at a time step next to the current time step may be obtained by a sensor or a sensing device mounted on the mechanical arm.

Step S105, updating the first state according to the second state, and updating the first control signal sequence according to the second control signal sequence.

Further, after the robot arm moves from the current time step to the next time step, in order for the robot arm to carry out the expected movement process completely, that is, to approach the preset target state smoothly after moving from the first state for a plurality of time steps, the first state of the current time step needs to be updated (that is, the second state is taken as the new first state), and the first control signal sequence needs to be updated based on the optimized second control signal sequence.

In an optional embodiment of the present application, the updating the first control signal sequence according to the second control signal sequence comprises the following steps 7 to 10:

step 7, obtaining the other control signals except the first control signal in the second control signal sequence;

step 8, generating a random control signal;

step 9, combining the other control signals except the first control signal in the second control signal sequence with the random control signal to obtain a target control signal sequence;

step 10, replacing the first control signal sequence with the target control signal sequence to obtain an updated first control signal sequence.

For example, assume that the second control signal sequence is (k₁, k₂, ……, kₙ₋₁). A control signal sequence (k₂, k₃, ……, kₙ₋₁) consisting of the control signals other than k₁ is selected; then a random control signal k' is generated and appended to the control signal sequence (k₂, k₃, ……, kₙ₋₁) to obtain the target control signal sequence (k₂, k₃, ……, kₙ₋₁, k'), and the target control signal sequence is taken as the new first control signal sequence.
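Steps 7 to 10 amount to dropping the first control signal and appending a random one, which can be sketched as follows. The function name `shift_and_append` and the sampling range `[low, high]` are assumptions; the application does not state how the random control signal is drawn.

```python
import random
from typing import List, Sequence


def shift_and_append(
    second_sequence: Sequence[float],
    rng: random.Random,
    low: float = -1.0,
    high: float = 1.0,
) -> List[float]:
    """Build the updated first control signal sequence (steps 7 to 10)."""
    remaining = list(second_sequence[1:])   # step 7: drop the first signal
    random_signal = rng.uniform(low, high)  # step 8: assumed uniform draw
    return remaining + [random_signal]      # steps 9 and 10: new sequence


updated = shift_and_append([0.1, 0.2, 0.3], random.Random(0))
```

The updated sequence keeps the same length as the second control signal sequence, so the prediction horizon stays constant from one time step to the next.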

In another optional embodiment of the present application, the manner of updating the first control signal sequence according to the second control signal sequence is not limited to the method provided in steps 7 to 10 above. For example, the step of generating a random control signal may be omitted, and the remaining control signals other than the first control signal in the second control signal sequence are selected as the target control signal sequence; alternatively, a random control signal sequence is generated and used as the target control signal sequence; and so on, which is not limited in this application.

Further, after the above steps S101 to S105 are completed, that is, the movement of the robot arm within a time step is completed, in order to move the robot arm toward the preset target state, the method further includes the following steps S11 to S13:

step S11, based on the updated first state, sequentially inputting each control signal in the updated first control signal sequence into the state prediction model for predicting the state of the mechanical arm at the next time step, and obtaining the predicted state of the mechanical arm at each time step;

step S12, optimizing the first control signal sequence according to the predicted state and the preset target state of the mechanical arm at each time step to obtain a second control signal sequence;

and step S13, controlling the mechanical arm to move according to the second control signal sequence, and obtaining a second state of the mechanical arm at a time step next to the current time step.

That is to say, after the first state and the first control signal sequence are updated, the method of steps S101 to S105 is performed again based on the updated first state and the updated first control signal sequence, so that the robot arm completes, according to the newly obtained second control signal sequence, a further movement from the current time step to the time step next to the current time step; this is repeated until the robot arm has moved for a preset number of time steps.
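The repeated cycle of predicting, optimizing, moving one time step, and updating can be sketched end to end as below. Everything concrete in it is an assumption made for illustration: `predict` is a toy stand-in for the trained state prediction model (and also stands in for the real arm and its sensors), the loss uses squared differences with assumed weights, and the optimizer is plain gradient descent with a learning rate chosen for this toy model only.

```python
import random
from typing import List, Tuple

State = Tuple[float, float]  # assumed: (joint angle, angular velocity)


def predict(state: State, u: float, dt: float = 0.1) -> State:
    """Hypothetical stand-in for the state prediction model."""
    angle, velocity = state
    velocity = velocity + u * dt
    return (angle + velocity * dt, velocity)


def loss(state: State, controls: List[float], target: State) -> float:
    """Roll out the sequence and sum weighted squared differences."""
    q1, q2 = 1.0, 0.1  # assumed weights for angle and angular velocity
    total = 0.0
    for u in controls:
        state = predict(state, u)
        total += q1 * (state[0] - target[0]) ** 2
        total += q2 * (state[1] - target[1]) ** 2
    return total


def optimize(state, controls, target, threshold=1e-3, lr=10.0,
             iters=50, eps=1e-6):
    """Finite-difference gradient descent; lr suits this toy model only."""
    controls = list(controls)
    for _ in range(iters):
        base = loss(state, controls, target)
        if base <= threshold:
            break
        grad = []
        for i in range(len(controls)):
            bumped = list(controls)
            bumped[i] += eps
            grad.append((loss(state, bumped, target) - base) / eps)
        controls = [u - lr * g for u, g in zip(controls, grad)]
    return controls


def control_loop(first_state, first_sequence, target, steps, rng):
    state, sequence = first_state, list(first_sequence)
    history, losses = [state], []
    for _ in range(steps):
        second_sequence = optimize(state, sequence, target)
        losses.append((loss(state, sequence, target),
                       loss(state, second_sequence, target)))
        # Apply only the first control signal; a real system would move
        # the arm and read the second state from its sensors here.
        state = predict(state, second_sequence[0])
        # Update the first control signal sequence: shift, append random.
        sequence = second_sequence[1:] + [rng.uniform(-1.0, 1.0)]
        history.append(state)
    return history, losses


history, losses = control_loop((0.0, 0.0), [0.0] * 5, (1.0, 0.0),
                               steps=10, rng=random.Random(0))
```

`history` holds the first state followed by the state reached after each executed time step, and `losses` records the loss before and after optimization at each time step.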

Further, in order to facilitate understanding of the control method of the robot arm provided in the present application, a detailed description of the control method is provided below as a specific application of the method.

First, assume that the first control signal sequence is uₛ = (u₀, u₁, …, u₉), the first state of the current time step is x₀, and the preset target state is xₙ;

After the first control signal sequence uₛ and the first state x₀ are input into the state prediction model for predicting the state of the mechanical arm at the next time step, the state prediction model obtains the predicted state of the mechanical arm at each time step in the following way:

x₀ + u₀ → x₁, i.e., the state prediction model predicts, based on the first state x₀ and the first control signal u₀ in the first control signal sequence, the predicted state x₁ of the first time step after the current time step;

x₁ + u₁ → x₂, i.e., the state prediction model predicts, based on the predicted state x₁ and the second control signal u₁ in the first control signal sequence, the predicted state x₂ of the second time step after the current time step;

And so on, until the predicted state x₁₀ of the tenth time step after the current time step is obtained, namely x₉ + u₉ → x₁₀.

In the above manner, the predicted state sequence xₛ = (x₁, x₂, …, x₁₀), which is output by the state prediction model and identifies the predicted state of the mechanical arm at each time step, can be obtained.

Further, the predicted state sequence xₛ and the preset target state xₙ are input into the preset loss function. If the loss value output by the preset loss function is greater than the preset loss threshold, the first control signal sequence needs to be optimized; specifically, the first control signal sequence is optimized according to the gradient (derivative) of the loss function to obtain a second control signal sequence;

It is assumed here that, after the first control signal sequence has been optimized, the second control signal sequence is u'ₛ = (u'₀, u'₁, …, u'₉).

Here, u'₀ is selected as the control signal for the first time step after the current time step, and the mechanical arm is controlled to move from the current time step to the time step next to the current time step. It is further assumed that the state of the mechanical arm after this movement is the second state x'₁.

Since the state of the mechanical arm has changed, the first state of the mechanical arm is updated with the second state x'₁, that is, the second state is taken as the new first state, and the first state is updated to x₀ = x'₁;

Further, the first control signal sequence uₛ needs to be updated;

In an alternative embodiment of the present application, updating the first control signal sequence uₛ uses u'₁ to u'₉ of the second control signal sequence u'ₛ together with a random control signal u₁₀. The first control signal sequence is updated with these control signals, and the updated first control signal sequence is uₛ = (u'₁, u'₂, …, u'₉, u₁₀).

Further, the updated first state x₀ and the updated first control signal sequence uₛ are input into the state prediction model, which predicts the state of the mechanical arm at each of the following 10 time steps to obtain the predicted state of each time step; the process of optimizing the first control signal sequence through the loss function is then repeated to obtain a second control signal sequence u'ₛ.

The first control signal in the second control signal sequence u'ₛ is used to control the mechanical arm to move to the time step next to the current time step.

And so on, according to the above method, until the mechanical arm has moved for 10 time steps. With the control method of the mechanical arm, after the mechanical arm has moved for these time steps, the final state of the mechanical arm tends toward the preset target state xₙ, and the motion process of the mechanical arm is stable.

In an optional embodiment of the present application, in order to ensure the accuracy of the state prediction model for predicting the state of the robot arm, the control method of the robot arm provided by the present application further includes a training process for the state prediction model, and specifically, the control method of the robot arm further includes the following steps S14 to S17.

Step S14, acquiring a first state and a first control signal sequence corresponding to each time step;

step S15, using the first state and the first control signal sequence as training samples of the state prediction model;

step S16, performing iterative training on the state prediction model based on the training sample to obtain a trained state prediction model;

step S17, updating the state prediction model based on the trained state prediction model.

In a specific application process, the sample data may be obtained by modifying the mechanical arm, for example, the first state of each time step may be obtained by an angle sensor mounted on the mechanical arm, and the first control signal sequence corresponding to each time step may be obtained by a pulse width modulation signal sensor mounted on the mechanical arm.
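The iterative training of steps S14 to S17 can be illustrated with a deliberately small sketch. It assumes a one-parameter model in which the next angular velocity equals the current angular velocity plus ζ times the control signal, and it assumes that each training sample also carries the measured state of the following time step as its label; neither the model form nor the sample layout is specified in the application.

```python
from typing import Sequence, Tuple

# Assumed sample layout: (angular velocity, control signal,
# measured angular velocity at the next time step).
Sample = Tuple[float, float, float]


def train_zeta(
    samples: Sequence[Sample],
    zeta: float = 0.0,
    lr: float = 0.1,
    epochs: int = 200,
) -> float:
    """Fit zeta in the toy model v' = v + zeta * u by gradient descent."""
    for _ in range(epochs):
        grad = 0.0
        for velocity, u, next_velocity in samples:
            error = (velocity + zeta * u) - next_velocity
            grad += 2.0 * error * u  # derivative of the squared error
        zeta -= lr * grad / len(samples)
    return zeta


# Samples generated from the toy model with zeta = 0.1.
samples = [(0.0, 1.0, 0.1), (0.1, 2.0, 0.3), (0.3, -1.0, 0.2)]
trained_zeta = train_zeta(samples)
```

A real state prediction model would have many parameters and would be fitted to sensor recordings, but the loop has the same shape: predict, compare with the measured next state, and adjust the parameters.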

In summary, according to the control method of the mechanical arm provided by the application, states of the mechanical arm at multiple time steps are predicted by using a mechanical arm state prediction model and combining a first state and a first control signal sequence of the mechanical arm to be controlled at a current time step, so that a predicted state of the mechanical arm at each time step is obtained; then, optimizing the first control signal sequence according to the predicted state of each time step and a target state preset for the mechanical arm to obtain an optimized second control signal sequence, and controlling the mechanical arm to move according to the second control signal sequence to obtain a second state of the mechanical arm; and finally, updating the first state of the mechanical arm at the current time step through the second state, and updating the first control signal sequence through the second control signal sequence, so that the process is repeated to further control the motion of the mechanical arm. The method can enable the mechanical arm to move from the current state to the preset target state stably, and realizes accurate and effective control over the movement of the mechanical arm.

Referring to fig. 5, fig. 5 is a schematic structural diagram of a control device of a robot arm according to another embodiment of the present disclosure. Since the embodiment of the apparatus is basically similar to the embodiment of the method, the description is simple, and for the relevant points, reference may be made to the partial description of the embodiment of the method.

As shown in fig. 5, the control device for a robot arm according to an embodiment of the present invention includes:

an obtaining unit 501, configured to obtain a first state and a first control signal sequence corresponding to a current time step of a mechanical arm to be controlled, where each control signal in the first control signal sequence is used to drive motion of the mechanical arm at multiple time steps;

a prediction unit 502, configured to sequentially input, based on the first state, each control signal in the first control signal sequence into a state prediction model for predicting a state of the mechanical arm at a next time step, and obtain a predicted state of the mechanical arm at each time step;

an optimizing unit 503, configured to optimize the first control signal sequence according to the predicted state of the mechanical arm at each time step and a preset target state, to obtain a second control signal sequence;

a control unit 504, configured to control the mechanical arm to move according to the second control signal sequence, so as to obtain a second state of the mechanical arm at a time step next to the current time step;

an updating unit 505, configured to update the first state according to the second state, and update the first control signal sequence according to the second control signal sequence.

Optionally, the apparatus is further configured to:

optimizing the first control signal sequence based on the loss value to obtain the second control signal sequence.

Optionally, the controlling the mechanical arm to move according to the second control signal sequence includes:

acquiring a first control signal in the second control signal sequence;

acquiring the other control signals except the first control signal in the second control signal sequence;

generating a random control signal;

Optionally, the state of the mechanical arm includes the angle of the mechanical arm joint and the angular velocity of the mechanical arm joint movement, each of which is configured with a corresponding weight;

the inputting the target state and the predicted state of each time step into a preset loss function to obtain a loss value output by the loss function includes:

Optionally, the apparatus is further configured to:

Please refer to fig. 6, where fig. 6 is a schematic structural diagram of an electronic device according to another embodiment of the present application, and the description of the embodiment of the electronic device is basically similar to that of the embodiment of the method and the embodiment of the apparatus described above, so that the description is relatively simple.

The electronic device includes: a processor 601;

a memory 602 for storing a program of the method, which program when read and executed by the processor 601 performs any of the methods described above.

The embodiment of the application also provides a computer storage medium, wherein the computer storage medium stores a computer program which, when executed, performs any one of the methods of the above embodiments.

It should be noted that, for the detailed description of the computer storage medium provided in this application, reference may also be made to the related description of the above method embodiments, and details are not repeated here.

Although the present application has been described with reference to the preferred embodiments, it is not intended to limit the present application, and those skilled in the art can make variations and modifications without departing from the spirit and scope of the present application, therefore, the scope of the present application should be determined by the claims that follow.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.

1. Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media does not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.

2. It will be apparent to those skilled in the art that embodiments of the present application may be provided as a system or an electronic device. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

Claims

1. A method for controlling a robot arm, comprising:

acquiring a first state and a first control signal sequence corresponding to a current time step of a mechanical arm to be controlled, wherein each control signal in the first control signal sequence is used for driving the mechanical arm to move at a plurality of time steps;

updating the first state according to the second state, and updating the first control signal sequence according to the second control signal sequence.

2. The control method according to claim 1, characterized in that the method further comprises:

3. The control method according to claim 1, wherein the obtaining the predicted state of the robot arm at each time step by sequentially inputting each control signal in the first control signal sequence into a state prediction model for predicting the state of the robot arm at a next time step based on the first state comprises:

4. The control method according to claim 1, wherein the optimizing the first control signal sequence according to the predicted state of the robot arm at each time step and a preset target state to obtain a second control signal sequence comprises:

5. The control method of claim 1, wherein said controlling the robot arm movement according to the second sequence of control signals comprises:

acquiring a first control signal in the second control signal sequence;

6. The control method of claim 5, wherein the updating the first control signal sequence in accordance with the second control signal sequence comprises:

generating a random control signal;

7. The control method according to claim 4, wherein the state of the robot arm includes: the angle of the mechanical arm joint and the angular velocity of the mechanical arm joint movement are respectively configured with corresponding weights;

8. The control method of claim 4, wherein optimizing the first control signal sequence based on the loss value to obtain the second control signal sequence comprises:

9. The control method according to claim 1, characterized by further comprising:

performing iterative training on the state prediction model based on the training sample to obtain a trained state prediction model;

10. A control device for a robot arm, comprising:

the prediction unit is used for sequentially inputting each control signal in the first control signal sequence into a state prediction model for predicting the state of the mechanical arm at the next time step based on the first state to obtain the predicted state of the mechanical arm at each time step;

11. An electronic device, comprising:

a processor;

a memory for storing a program of a method, which, when read and run by the processor, performs the method of any of claims 1-9.

12. A computer storage medium, characterized in that it stores a computer program which, when executed, performs the method of any one of claims 1-9.