CN115256401A - Space manipulator shaft hole assembly variable impedance control method based on reinforcement learning


Info

Publication number
CN115256401A
Authority
CN
China
Prior art keywords
impedance
space manipulator
reinforcement learning
training
shaft hole
Prior art date
Legal status
Pending
Application number
CN202211038250.5A
Other languages
Chinese (zh)
Inventor
詹腾达
高鼎峰
余朝宝
周宇航
许铭轩
郭毓
Current Assignee
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Nanjing University of Science and Technology
Priority to CN202211038250.5A
Publication of CN115256401A

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1656 Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1664 Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning
    • B25J9/1694 Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J9/1697 Vision controlled systems

Abstract

The invention discloses a reinforcement-learning-based variable impedance control method for space manipulator shaft hole assembly. The scheme performs variable impedance control of the space manipulator shaft hole assembly based on reinforcement learning; the control can track a dynamic force, the dynamic error is smaller than that of traditional fixed impedance control, the response speed is higher, the influence of uncertain factors in the environment is effectively weakened, and the tracking accuracy is better than that of traditional fixed impedance control.

Description

Space manipulator shaft hole assembly variable impedance control method based on reinforcement learning
Technical Field
The invention belongs to the field of space manipulator control, and particularly relates to a space manipulator shaft hole assembly variable impedance control method based on reinforcement learning.
Background
With the progress and development of space technology, the application of spacecraft and space stations has greatly affected human production and life. In the space environment, characterized by vacuum and weightlessness, a large amount of space debris and garbage floats in the space surrounding the Earth and seriously threatens the safety of on-orbit spacecraft and space stations; at the same time, as service time increases, various space facilities inevitably face problems such as equipment aging and faults, so the maintenance of space facilities is very necessary.
In the course of completing service tasks such as on-orbit assembly, the space manipulator inevitably comes into contact with the external environment and generates contact forces, which places high requirements on the contact force control of the space manipulator; meanwhile, in the space environment there are various external disturbances such as gravity gradient torque and friction, so the influence of external interference must be overcome. Compliance control can adjust the motion of the manipulator according to changes in the external environment and can effectively improve the control accuracy and stability of the assembly operation.
To coordinate the contact force between the manipulator and the environment, Hogan first proposed impedance control, which realizes compliant contact between the robot and the environment by establishing an ideal dynamic relation between the contact force at the manipulator end and the deviation between the expected and actual trajectories; however, constant impedance control can hardly maintain a stable contact force when the geometric and stiffness parameters of the environment are uncertain. The environment in which the space manipulator executes its tasks is complex and changeable, the environmental information is difficult to identify accurately, and nonlinear time-varying factors exist in the target environment, so the target task is hard to accomplish with a fixed-parameter impedance control method. If the impedance control parameters can be adjusted dynamically in real time according to changes in the task and the environment, better control performance can be obtained.
Disclosure of Invention
Based on the above problems, the present invention aims to provide a space manipulator shaft hole assembly variable impedance control method based on reinforcement learning, which can update the parameters of an impedance controller in the interaction with a complex environment, ensure the rapidity of static force response and the accuracy of dynamic force tracking, and realize the flexible control of the space manipulator assembly operation.
The technical scheme for realizing the purpose of the invention is as follows:
a space manipulator shaft hole assembly variable impedance control method based on reinforcement learning comprises the following steps:
step 1, constructing a space manipulator model based on a DH parameter method;
step 2, constructing a conversion model of the joint angle state and the terminal pose of the space manipulator based on a forward and inverse kinematics algorithm;
step 3, initializing internal and external parameters of a binocular camera, acquiring images by using the binocular camera, and acquiring position information of the assembly holes;
step 4, constructing an impedance controller based on reinforcement learning, and setting an impedance parameter action table, a reward function and a suspension condition in the training process according to an expected target;
step 5, training an impedance controller based on a neural network;
and 6, inputting real-time information of the tail end of the mechanical arm, updating the impedance parameters of the impedance controller, outputting the position correction quantity of the tail end of the mechanical arm, and finishing the variable impedance control of the shaft hole assembly of the space mechanical arm.
Compared with the prior art, the invention has the remarkable advantages that:
(1) The technical scheme performs variable impedance control of the space manipulator shaft hole assembly based on reinforcement learning; the control can track a dynamic force, the dynamic error is smaller than that of traditional fixed impedance control, the response speed is higher, and the tracking accuracy is better than that of traditional fixed impedance control;
(2) The technical scheme realizes variable impedance control in the shaft hole assembly of the manipulator based on reinforcement learning, can effectively weaken the influence of uncertain factors in the environment, and improves the accuracy and rapidity of the end force control of the space manipulator.
The present invention is described in further detail below with reference to the attached drawing figures.
Drawings
Fig. 1 is a flowchart of the steps of the reinforcement-learning-based variable impedance control method for space manipulator shaft hole assembly according to the invention.
Fig. 2 is a schematic structural diagram of the reinforcement-learning-based impedance controller of the invention.
Fig. 3 is a schematic flow chart of training the impedance controller based on a neural network according to the invention.
Fig. 4 is a schematic diagram of the fully-connected neural network structure of the invention.
Fig. 5 is a schematic view of the space manipulator assembly in an embodiment of the invention.
Fig. 6 is a schematic diagram of the space manipulator simulation in an embodiment of the invention.
Fig. 7 is a trace diagram of the simulated impedance parameters in an embodiment of the invention.
Fig. 8 is a schematic diagram of the simulated end position trajectory of the space manipulator in an embodiment of the invention.
Fig. 9 is a schematic diagram of the simulated end velocity trajectory of the space manipulator in an embodiment of the invention.
Fig. 10 is a schematic diagram of the simulated static force tracking trajectory of the space manipulator in an embodiment of the invention.
Fig. 11 is a schematic diagram of the simulated dynamic force tracking trajectory of the space manipulator in an embodiment of the invention.
Detailed Description
A space manipulator shaft hole assembly variable impedance control method based on reinforcement learning comprises the following steps:
step 1, constructing a space manipulator model based on a DH parameter method;
step 2, constructing a conversion model of the joint angle state and the tail end pose of the space manipulator based on a forward and inverse kinematics algorithm;
step 3, initializing internal and external parameters of a binocular camera, acquiring images by using the binocular camera, and acquiring position information of the assembly holes;
step 4, constructing an impedance controller based on reinforcement learning, and setting an impedance parameter action table, a reward function and a suspension condition in the training process according to an expected target, wherein the impedance parameter action table, the reward function and the suspension condition are specifically as follows:
step 4-1, constructing an impedance controller:
the impedance control strategy aims at realizing an ideal dynamic relation between the tail end position of the space robot and the tail end contact force, the relation between a mechanical arm tail end tooling device and an assembly plane is simplified into a spring-mass block-damping model, and the mathematical model is as follows:
Figure BDA0003819517640000031
wherein, x d Respectively representing the actual motion trail and the expected motion trail of the tail end of the space manipulator, F e Representing the force of the end of the arm against the external environment, M d ,K d ,C d Respectively corresponding to an expected inertia matrix, an expected rigidity matrix and an expected damping matrix of the impedance controller;
Figure BDA0003819517640000032
respectively representing the actual acceleration, the expected acceleration, the actual speed and the expected speed of the tail end of the space manipulator, and selecting K from the impedance controller d ,C d As a control quantity, M d Set to a constant value of 1;
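Purely as an illustration, the impedance relation above can be discretized so that the end position correction is computed from the measured contact force at each control cycle; the explicit-Euler integration and the 0.005 s sampling period in the Python sketch below are assumptions of this sketch, not requirements of the method.

```python
def impedance_correction(F_e, e, de, K_d, C_d, M_d=1.0, dt=0.005):
    """One discrete step of M_d*dde + C_d*de + K_d*e = F_e, with e = x - x_d.

    Returns the updated position error e (the end position correction added to
    the expected trajectory) and velocity error de. The explicit-Euler scheme
    and the sampling period dt are illustrative assumptions.
    """
    dde = (F_e - C_d * de - K_d * e) / M_d  # acceleration error from the impedance model
    de = de + dde * dt                      # integrate to the velocity error
    e = e + de * dt                         # integrate to the position error
    return e, de


# Example: accumulate the correction over time and command x_d + e.
e, de = 0.0, 0.0
e, de = impedance_correction(F_e=2.0, e=e, de=de, K_d=100.0, C_d=40.0)
```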
step 4-2, the control target of the impedance controller in this application is to track the expected force quickly so that the speed of the manipulator end quickly approaches 0, while optimizing the overshoot in the static force tracking process (the overshoot refers to the deviation between the maximum actual force of the system and the expected force, i.e. the deviation between the force peak and the expected force);
for this purpose, corresponding rewards and punishments are given to the state of the manipulator end during training: a corresponding positive reward is given when the end state reaches the expected target, so as to find the optimal control parameters, and the reward function is set as:
(reward function given as an equation image in the original document)
wherein T represents the duration of a single training episode, v represents the end speed of the space manipulator, and E_f is the error between the expected force and the force at the current moment; the reward function is set as above so that the speed value can quickly approach the range 0-0.2.
This function gives a greater reward for a smaller steady-state force error and a greater penalty the more the speed deviates from 0.
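The reward formula itself appears only as an equation image in the original document. Purely to illustrate the behaviour described above (a larger reward for a smaller force error E_f, a larger penalty the further the end speed |v| strays from the 0-0.2 band, with the episode duration T entering the weighting), one possible form is sketched below; the exact expression and the weights are assumptions, not the patent's formula.

```python
def reward(E_f, v, t, T, v_band=0.2):
    """Illustrative reward only; the patent's actual formula is an image.

    E_f: error between the expected force and the force at the current moment.
    v:   end speed of the space manipulator.
    t:   elapsed time within the episode; T: duration of a single training episode.
    """
    force_term = 1.0 / (1.0 + abs(E_f))       # grows as the force error shrinks
    speed_term = -max(abs(v) - v_band, 0.0)   # penalty once |v| leaves the 0-0.2 band
    return (t / T) * force_term + speed_term
```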
step 4-3, considering that if a single change of the impedance parameters is too small, the impedance control of the manipulator end position can hardly achieve a noticeable effect, while if the range of change of the impedance parameters is too large, the stability of the end impedance control is reduced, an impedance parameter action table for reinforcement learning is set:
δC_d ∈ {±2, ±1, 0}, δK_d ∈ {±5, ±4, ±3, ±2, ±1, 0}
wherein δ denotes the set incremental correction, δC_d is the correction to the damping coefficient and δK_d is the correction to the stiffness coefficient; a corresponding action is selected in each sampling period, and the optimal action strategy is obtained after repeated training.
In addition, the training suspension condition is set as: the number of training episodes reaches the set threshold.
Or, when the error between the expected force and the force at the current moment during training is greater than the set threshold, or the error between the maximum force reached by the system during training and the expected force exceeds the set threshold, the strategy of this training episode is judged to be developing in a divergent direction; the parameters are then restored to their initially set values and the training is repeated.
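For concreteness, the discrete action set of step 4-3 and the suspension check described above can be encoded as follows; the 55-action table follows directly from the listed increments, while the numeric thresholds in the check are placeholders, since the patent only states that set thresholds are used.

```python
from itertools import product

# Every pairing of a damping increment dC_d with a stiffness increment dK_d
# from step 4-3 gives one discrete action (5 x 11 = 55 actions in total).
DELTA_C = [-2, -1, 0, 1, 2]
DELTA_K = [-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5]
ACTIONS = list(product(DELTA_C, DELTA_K))


def training_suspended(episode, force_error, peak_force_error,
                       max_episodes=500, error_limit=30.0, peak_limit=30.0):
    """Suspension check sketch; the three threshold values are assumptions."""
    return (episode >= max_episodes
            or abs(force_error) > error_limit
            or abs(peak_force_error) > peak_limit)
```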
Step 5, training the impedance controller based on the neural network, specifically:
the Q-learning (Q-learning) algorithm is essentially a markov decision process, which performs actions in the current state to find the reward value of the next state, and continuously updates the Q table, and the specific formula is as follows:
Q(s t ,a t )←Q(s t ,a t )+α[r t +γmaxQ(s t+1 ,a)-(s t ,a t )]
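A tabular form of this update, written out for reference; the dictionary-based Q table and the values of α and γ are illustrative.

```python
def q_update(Q, s, a, r, s_next, n_actions, alpha=0.1, gamma=0.9):
    """Tabular Q-learning update: Q(s,a) <- Q(s,a) + alpha*[r + gamma*max_a' Q(s',a') - Q(s,a)].

    Q is a dict keyed by (state, action); unseen entries default to 0.
    """
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in range(n_actions))
    td_error = r + gamma * best_next - Q.get((s, a), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * td_error
    return Q
```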
The traditional Q-learning method updates the current state according to the Q value of the next state and relies on a Q-value table, so an excessive number of system states wastes a large amount of memory. DDQN instead uses a fully-connected neural network, the "strategy network", to predict the Q value of the current state: the manipulator end state information is input into the "strategy network" to obtain the Q value at the current moment, a "target network" is introduced to predict the state at the next moment, the mean square error of the difference between the prediction results of the two neural networks is used as the loss function of the model, and the "strategy network" is finally updated by back-propagating the network parameters according to the formula given below.
Specifically: the total number of training episodes is set first, and the experience table collected from the space manipulator in a single training episode is placed into an experience pool (a queue with a maximum storage length; once the maximum length is exceeded, the worst-performing experience is popped out). At fixed intervals, the higher-reward experiences in the pool, together with experiences randomly extracted from the pool, are input into the strategy network, and the strategy network is updated through the residual between the predicted values of the strategy network and the target network. An update time is set; once it is exceeded, the target network is replaced by the strategy network, realizing the target network update, and the action with the highest score is output through the feedback of the target network in the environment. This cycle is repeated until the number of completed training episodes exceeds the set value, and training ends.
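The training procedure just described can be summarised by the following skeleton. The callables passed in (environment reset/step, action selection, network update and synchronisation) are hypothetical stand-ins for components defined elsewhere in the method, and the pool size, batch size and synchronisation interval are assumptions of this sketch.

```python
import random
from collections import deque


def train(env_reset, env_step, choose_action, update_strategy_net, sync_target_net,
          total_episodes=500, pool_size=10000, batch_size=32, sync_every=20):
    """Skeleton of the described DDQN-style training loop (a sketch, not the
    patent's exact procedure)."""
    pool = deque(maxlen=pool_size)                 # experience pool (bounded queue)
    for episode in range(total_episodes):
        state, done = env_reset(), False
        while not done:
            action = choose_action(state)          # highest-scoring action from the strategy net
            next_state, reward, done = env_step(action)
            pool.append((state, action, reward, next_state))
            state = next_state
        if len(pool) >= batch_size:
            update_strategy_net(random.sample(list(pool), batch_size))  # residual-based update
        if episode % sync_every == 0:
            sync_target_net()                      # copy the strategy net into the target net
```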
Further, the strategy network predicts the Q value at the current moment in the reinforcement-learning-based impedance controller, the target network predicts the Q value at the next moment, and the mean square error of the difference between the two is taken as the loss function:
L = Mse(Q(s_t, a) - r - γQ(s_{t+1}, a))
wherein Mse represents the mean square error, Q(s_t, a) represents the Q value at time t, γ ∈ (0, 1) represents the decay rate in the learning process, and α ∈ (0, 1) represents the learning rate of the model.
Further, the strategy network adopts a fully-connected neural network structure: the position, speed, acceleration and force-error information of the manipulator end are used as the network input, the number of neurons in the hidden layer is set to 400, the ReLU function is selected as the activation function, and the Q value of each action at the current moment is output.
The target network likewise adopts a fully-connected neural network structure: the position, speed, acceleration and force-error information of the manipulator end are used as the network input, the number of neurons in the hidden layer is set to 400, the ReLU function is selected as the activation function, and the Q value of each action at the next moment is output.
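Since the embodiment mentions a TensorFlow 2.0 implementation, the two 400-unit fully-connected networks and the mean-square-error loss can be sketched as below. The state dimension, the action count of 55, and the double-DQN action-selection detail are assumptions of this sketch rather than statements of the patent.

```python
import tensorflow as tf


def build_q_network(state_dim: int, n_actions: int) -> tf.keras.Model:
    """Fully-connected network as described: end position/speed/acceleration/
    force-error as input, one 400-unit ReLU hidden layer, one Q value per action."""
    return tf.keras.Sequential([
        tf.keras.layers.Dense(400, activation="relu", input_shape=(state_dim,)),
        tf.keras.layers.Dense(n_actions),
    ])


strategy_net = build_q_network(state_dim=4, n_actions=55)  # "strategy network"
target_net = build_q_network(state_dim=4, n_actions=55)    # "target network"
target_net.set_weights(strategy_net.get_weights())

mse = tf.keras.losses.MeanSquaredError()


def ddqn_loss(states, actions, rewards, next_states, gamma=0.9):
    """Loss L = Mse(Q(s_t, a) - r - gamma*Q(s_{t+1}, a)), with the next-state Q
    value taken from the target network (DDQN-style action selection assumed)."""
    q_taken = tf.gather(strategy_net(states), actions, batch_dims=1)
    next_actions = tf.argmax(strategy_net(next_states), axis=1)
    q_next = tf.gather(target_net(next_states), next_actions, batch_dims=1)
    target = tf.stop_gradient(rewards + gamma * q_next)
    return mse(target, q_taken)
```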
The invention improves the updating process of the experience pool: the optimal state information generated during training is marked and stored, and input into the experience pool once every training period; this high-reward state information speeds up convergence of the DDQN model.
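The improved experience-pool update can be pictured as a bounded queue plus a small store of marked high-reward transitions that is mixed back in periodically; the class below is a sketch of that idea, and the capacity, the number of kept entries and the mixing ratio are chosen arbitrarily.

```python
import random
from collections import deque


class ExperiencePool:
    """Bounded experience pool with marked high-reward experiences (a sketch)."""

    def __init__(self, capacity=10000, n_best=50):
        self.pool = deque(maxlen=capacity)  # ordinary experiences; oldest dropped when full
        self.best = []                      # marked optimal state information
        self.n_best = n_best

    def add(self, transition, episode_reward):
        self.pool.append(transition)
        self.best.append((episode_reward, transition))
        self.best.sort(key=lambda item: item[0], reverse=True)
        del self.best[self.n_best:]         # keep only the highest-reward entries

    def sample(self, batch_size, mix_best=True):
        batch = random.sample(list(self.pool), min(batch_size, len(self.pool)))
        if mix_best and self.best:
            batch += [t for _, t in self.best[: max(1, batch_size // 4)]]
        return batch
```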
And 6, inputting real-time information of the tail end of the mechanical arm, updating the impedance parameters of the impedance controller, outputting the position correction quantity of the tail end of the mechanical arm, and finishing the variable impedance control of the shaft hole assembly of the space mechanical arm.
A space manipulator shaft hole assembly variable impedance control system based on reinforcement learning comprises the following modules:
Space manipulator model construction module: used for constructing a space manipulator model based on the DH parameter method;
End pose conversion model construction module: used for constructing a conversion model between the joint angle state and the end pose of the space manipulator based on forward and inverse kinematics algorithms;
Assembly hole position information acquisition module: used for initializing the internal and external parameters of the binocular camera, acquiring images with the binocular camera, and obtaining the position information of the assembly hole;
Impedance controller construction module: used for constructing an impedance controller based on reinforcement learning, and setting the impedance parameter action table, reward function and suspension condition of the training process according to the expected target;
Training module: used for training the impedance controller based on a neural network;
Space manipulator shaft hole assembly variable impedance control module: used for inputting real-time information of the manipulator end, updating the impedance parameters of the impedance controller, outputting the position correction of the manipulator end, and completing the variable impedance control of the space manipulator shaft hole assembly.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps when executing the computer program:
step 1, constructing a space manipulator model based on a DH parameter method;
step 2, constructing a conversion model of the joint angle state and the tail end pose of the space manipulator based on a forward and inverse kinematics algorithm;
step 3, initializing internal and external parameters of a binocular camera, acquiring images by using the binocular camera, and acquiring position information of the assembly holes;
step 4, constructing an impedance controller based on reinforcement learning, and setting an impedance parameter action table, a reward function and a suspension condition in the training process according to an expected target;
step 5, training an impedance controller based on a neural network;
and 6, inputting real-time information of the tail end of the mechanical arm, updating the impedance parameters of the impedance controller, outputting the position correction quantity of the tail end of the mechanical arm, and finishing the variable impedance control of the shaft hole assembly of the space mechanical arm.
A computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the following steps:
step 1, constructing a space manipulator model based on a DH parameter method;
step 2, constructing a conversion model of the joint angle state and the terminal pose of the space manipulator based on a forward and inverse kinematics algorithm;
step 3, initializing internal and external parameters of a binocular camera, acquiring images by using the binocular camera, and acquiring position information of the assembly holes;
step 4, constructing an impedance controller based on reinforcement learning, and setting an impedance parameter action table, a reward function and a suspension condition in the training process according to an expected target;
step 5, training an impedance controller based on a neural network;
and 6, inputting real-time information of the tail end of the mechanical arm, updating the impedance parameters of the impedance controller, outputting the position correction quantity of the tail end of the mechanical arm, and finishing the variable impedance control of the shaft hole assembly of the space mechanical arm.
The present invention will be further described with reference to the following examples.
Examples
With reference to fig. 1, a space manipulator shaft hole assembly variable impedance control method based on reinforcement learning includes the following steps:
step 1, constructing a space manipulator model based on a DH parameter method;
step 2, constructing a conversion model of the joint angle state and the terminal pose of the space manipulator based on a forward and inverse kinematics algorithm;
step 3, initializing internal and external parameters of a binocular camera, acquiring images by using the binocular camera, and acquiring position information of the assembly holes;
step 4, constructing an impedance controller based on reinforcement learning, and setting an impedance parameter action table, a reward function and a suspension condition in the training process according to an expected target, wherein the impedance parameter action table, the reward function and the suspension condition are specifically as follows:
step 4-1, constructing an impedance controller:
with reference to fig. 2 and 3, the objective of the impedance control strategy is to achieve an ideal dynamic relation between the end position of the space manipulator and the end contact force; this patent simplifies the relation between the manipulator end tooling device and the assembly plane into a spring-mass-damper model, whose mathematical model is:
M_d(ẍ - ẍ_d) + C_d(ẋ - ẋ_d) + K_d(x - x_d) = F_e
wherein x and x_d respectively represent the actual and expected motion trajectories of the space manipulator end, F_e represents the force between the manipulator end and the external environment, M_d, K_d and C_d are respectively the expected inertia, stiffness and damping matrices of the impedance controller, and ẍ, ẍ_d, ẋ, ẋ_d respectively represent the actual acceleration, expected acceleration, actual speed and expected speed of the manipulator end; the impedance controller selects K_d and C_d as the control quantities, and M_d is set to the constant value 1;
step 4-2, the control target of the impedance controller in this application is to track the expected force quickly so that the speed of the manipulator end quickly approaches 0, while optimizing the overshoot in the static force tracking process (the overshoot refers to the deviation between the maximum actual force of the system and the expected force, i.e. the deviation between the force peak and the expected force);
for this purpose, corresponding rewards and punishments are given to the state of the manipulator end during training: a corresponding positive reward is given when the end state reaches the expected target, so as to find the optimal control parameters, and the reward function is set as:
(reward function given as an equation image in the original document)
wherein T represents the duration of a single training episode and E_f is the error between the expected force and the force at the current moment; the reward function is set as above so that the speed value can quickly approach the range 0-0.2.
This function gives a greater reward for a smaller steady-state force error and a greater penalty the more the speed deviates from 0.
step 4-3, considering that if a single change of the impedance parameters is too small, the impedance control of the manipulator end position can hardly achieve a noticeable effect, while if the range of change of the impedance parameters is too large, the stability of the end impedance control is reduced, an impedance parameter action table for reinforcement learning is set:
δC_d ∈ {±2, ±1, 0}, δK_d ∈ {±5, ±4, ±3, ±2, ±1, 0}
A corresponding action is selected in each sampling period, and the optimal action strategy is obtained after repeated training.
In addition, the training suspension condition is set as: the number of training episodes reaches the set threshold.
Or, when the error between the expected force and the force at the current moment during training is greater than the set threshold, or the error between the maximum force reached by the system during training and the expected force exceeds the set threshold, the strategy of this training episode is judged to be developing in a divergent direction; the parameters are then restored to their initially set values and the training is repeated.
Step 5, training the impedance controller based on the neural network, specifically:
the Q-learning (Q-learning) algorithm is essentially a markov decision process, which performs actions in the current state to find the reward value of the next state, and continuously updates the Q table, and the specific formula is as follows:
Q(s t ,a t )←Q(s t ,a t )+α[r t +γmaxQ(s t+1 ,a)-(s t ,a t )]
The traditional Q-learning method updates the current state according to the Q value of the next state and relies on a Q-value table, so an excessive number of system states wastes a large amount of memory. DDQN instead uses a fully-connected neural network, the "strategy network", to predict the Q value of the current state: the manipulator end state information is input into the "strategy network" to obtain the Q value at the current moment, a "target network" is introduced to predict the state at the next moment, the mean square error of the difference between the prediction results of the two neural networks is used as the loss function of the model, and the "strategy network" is finally updated by back-propagating the network parameters according to the formula given below.
Specifically: the total number of training episodes is set first, and the experience table collected from the space manipulator in a single training episode is placed into an experience pool (a queue with a maximum storage length; once the maximum length is exceeded, the worst-performing experience is popped out). At fixed intervals, the higher-reward experiences in the pool, together with experiences randomly extracted from the pool, are input into the strategy network, and the strategy network is updated through the residual between the predicted values of the strategy network and the target network. An update time is set; once it is exceeded, the target network is replaced by the strategy network, realizing the target network update, and the action with the highest score is output through the feedback of the target network in the environment. This cycle is repeated until the number of completed training episodes exceeds the set value, and training ends.
Further, the strategy network predicts the Q value at the current moment in the reinforcement-learning-based impedance controller, the target network predicts the Q value at the next moment, and the mean square error of the difference between the two is taken as the loss function:
L = Mse(Q(s_t, a) - r - γQ(s_{t+1}, a))
wherein Mse represents the mean square error, Q(s_t, a) represents the Q value at time t, γ ∈ (0, 1) represents the decay rate in the learning process, and α ∈ (0, 1) represents the learning rate of the model.
Further, with reference to fig. 4, the strategy network adopts a fully-connected neural network structure: the position, speed, acceleration and force-error information of the manipulator end are used as the network input, the number of neurons in the hidden layer is set to 400, the ReLU function is selected as the activation function, and the Q value of each action at the current moment is output.
The target network likewise adopts a fully-connected neural network structure: the position, speed, acceleration and force-error information of the manipulator end are used as the network input, the number of neurons in the hidden layer is set to 400, the ReLU function is selected as the activation function, and the Q value of each action at the next moment is output.
The invention improves the updating process of the experience pool: the optimal state information generated during training is marked and stored, and input into the experience pool once every training period; this high-reward state information speeds up convergence of the DDQN model.
And 6, inputting real-time information of the tail end of the mechanical arm, updating the impedance parameters of the impedance controller, outputting the position correction quantity of the tail end of the mechanical arm, and finishing the variable impedance control of the shaft hole assembly of the space mechanical arm.
A schematic diagram of the space manipulator assembly is shown in Fig. 5. In this embodiment, the simulation of the impedance control of the manipulator end is implemented by combining the Robotics Toolbox in MATLAB with Python TensorFlow 2.0, and the UR5 manipulator simulation environment of Fig. 6 is created using the Robotics Toolbox.
When the manipulator is hindered by the environment while moving along the expected trajectory, the environment generally exhibits rigid properties, and the manipulator end then generates an interaction force F_e with the external environment. The force/position relation between the robot and the environment can be regarded as a spring model, as follows:
F_e = K_e(x - x_e)    (6)
wherein K_e represents the environment stiffness and x_e represents the environment position offset. K_e is set to 500 N/m.
In the simulation, the manipulator is set to move downward along the Z axis, and the expected force is set to [F_x, F_y, F_z] = [0, 0, 15 N], i.e. only the force information in the Z-axis direction is considered; the initial state of the manipulator end is set to [x, v, a] = [0, -0.5 m/s, 0].
The simulation time is t ∈ (0, 2) s, the simulation period is T = 0.005 s, and the expected position x_d is 0.2 m. The final simulation result is shown in Fig. 7, which gives the optimal impedance parameters selected after reinforcement learning. The simulation is divided into three stages, as shown in Fig. 7:
1) In the first stage, there is a large error between the manipulator end and the expected position, so a high-stiffness, high-damping strategy is selected to make the impedance controller respond quickly.
2) In the second stage, after the manipulator reaches the target plane, a stiffness-reducing strategy is adopted, which gradually reduces the force error of the system.
3) In the third stage, once the overshoot at the manipulator end has fallen to 0, the position error and speed of the manipulator end are small, so a low-stiffness, low-damping strategy is adopted, making the static force error of the system approach 0.
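To make the embodiment's numbers concrete, the sketch below runs a 1-D (Z-axis) contact simulation with the stated environment stiffness, expected force, simulation period and expected position, but with fixed, illustrative K_d and C_d values, an assumed surface location coinciding with the expected position, and the force error F_d - F_e as the driving term (a common force-tracking form); in the invention the impedance parameters are instead adjusted every sampling period by the trained agent, which is exactly what removes the steady-state force error visible here.

```python
# 1-D sketch of the contact simulation with fixed impedance parameters.
# Assumptions of this sketch (not stated in the embodiment): the surface sits at
# the expected position 0.2 m, the relation is driven by F_d - F_e, and K_d/C_d
# are fixed illustrative values rather than the learned, time-varying ones.
dt, t_end = 0.005, 2.0             # simulation period and duration [s]
K_e = 500.0                        # environment stiffness [N/m]
x_env = 0.2                        # assumed environment surface position [m]
x_d, F_d = 0.2, 15.0               # expected position [m] and expected Z force [N]
M_d, C_d, K_d = 1.0, 40.0, 100.0   # illustrative fixed impedance parameters

x, v = 0.0, 0.5                    # initial end state, moving toward the surface
for _ in range(int(t_end / dt)):
    F_e = K_e * max(x - x_env, 0.0)                    # spring contact model, eq. (6)
    a = (F_d - F_e - C_d * v - K_d * (x - x_d)) / M_d  # force-tracking impedance relation
    v += a * dt
    x += v * dt

F_ss = K_e * max(x - x_env, 0.0)
print(f"steady-state contact force {F_ss:.1f} N vs expected {F_d} N")
# With fixed parameters the force settles at K_e/(K_d + K_e) of F_d, about
# 12.5 N here; shrinking K_d, as the learned strategy does in stages 2-3 above,
# drives this steady-state error toward zero.
```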
Fig. 8 and Fig. 10 show the control effect for the manipulator end position and the static force. Comparing the two shows that the variable impedance method provided by the invention responds faster to the force error at the initial moment and tracks the target force with a small overshoot, and its static error is smaller than that of traditional constant impedance control.
Fig. 9 is a tracking simulation of the manipulator end velocity; in the simulation results the end velocity stays within the set threshold |v| < 0.2, and the velocity of the method of the invention reaches 0 faster than that of the traditional method.
Fig. 11 shows the tracking curve of the manipulator end for a dynamic force; the dynamic error of the proposed variable impedance control in tracking the dynamic force is smaller than that of traditional impedance control, the response speed is faster, and the tracking accuracy is better than that of traditional constant impedance control.
The foregoing embodiments illustrate and describe the general principles and principal features of the invention. It will be understood by those skilled in the art that the invention is not limited to the above embodiments; the above embodiments and description only illustrate the principle of the invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, all of which fall within the scope of the claimed invention.

Claims (10)

1. A space manipulator shaft hole assembly variable impedance control method based on reinforcement learning is characterized by comprising the following steps:
step 1, constructing a space manipulator model based on a DH parameter method;
step 2, constructing a conversion model of the joint angle state and the terminal pose of the space manipulator based on a forward and inverse kinematics algorithm;
step 3, initializing internal and external parameters of a binocular camera, acquiring images by using the binocular camera, and acquiring position information of the assembly holes;
step 4, constructing an impedance controller based on reinforcement learning, and setting an impedance parameter action table, a reward function and a suspension condition in the training process according to an expected target;
step 5, training an impedance controller based on a neural network;
and 6, inputting real-time information of the tail end of the mechanical arm, updating the impedance parameters of the impedance controller, outputting the position correction quantity of the tail end of the mechanical arm, and finishing the variable impedance control of the shaft hole assembly of the space mechanical arm.
2. The reinforcement learning-based variable impedance control method for the shaft hole assembly of the space manipulator of claim 1, wherein the step 4 of constructing the reinforcement learning-based impedance controller comprises the following steps:
step 4-1, constructing an impedance controller:
M_d(ẍ - ẍ_d) + C_d(ẋ - ẋ_d) + K_d(x - x_d) = F_e
wherein x and x_d respectively represent the actual and expected motion trajectories of the space manipulator end, F_e represents the force between the manipulator end and the external environment, M_d, K_d and C_d are respectively the expected inertia, stiffness and damping matrices of the impedance controller, and ẍ, ẍ_d, ẋ, ẋ_d respectively represent the actual acceleration, expected acceleration, actual speed and expected speed of the manipulator end; the impedance controller selects K_d and C_d as the control quantities;
step 4-2, setting a reward function:
(reward function given as an equation image in the original document)
wherein T represents the duration of a single training episode and E_f is the error between the expected force and the force at the current moment;
step 4-3, setting an impedance parameter action table for reinforcement learning:
δC_d ∈ {±2, ±1, 0}, δK_d ∈ {±5, ±4, ±3, ±2, ±1, 0}
wherein δ is the set incremental correction.
3. The reinforcement learning-based space manipulator shaft hole assembly variable impedance control method according to claim 2, wherein the training suspension condition is set as:
the training times reach the set threshold value.
4. The space manipulator shaft hole assembly variable impedance control method based on reinforcement learning of claim 2, wherein the impedance controller trained based on the neural network in the step 5 is specifically:
the method comprises the steps of firstly setting the total times of training, collecting an experience table of the space manipulator in single training, placing the experience table in an experience pool, inputting higher-rewarded experiences in the experience pool into a strategy network at intervals together with experiences randomly extracted from the experience pool, updating the strategy network through a residual error between a predicted value in the strategy network and a target network, setting updating time, replacing the target network with the strategy network once the time is exceeded, updating the target network, finally outputting an action with the highest score through the feedback of the target network in an environment, circulating in sequence until the finally set total times of training is larger than a set value, and finishing training.
5. The reinforcement learning-based space manipulator shaft hole assembly variable impedance control method according to claim 4, wherein the strategy network predicts the Q value of the current time in the reinforcement learning-based impedance controller, predicts the Q value of the next time in the reinforcement learning-based impedance controller based on the target network, and takes the mean square error of the difference between the two times as a loss function:
L = Mse(Q(s_t, a) - r - γQ(s_{t+1}, a))
wherein Mse represents the mean square error, Q(s_t, a) represents the Q value at time t, γ ∈ (0, 1) represents the decay rate in the learning process, and α ∈ (0, 1) represents the learning rate of the model.
6. The reinforcement learning-based variable impedance control method for space manipulator shaft hole assembly according to claim 5, wherein the strategy network adopts a fully-connected neural network structure, the position, speed, acceleration and force-error information of the manipulator end are used as the network input, the number of hidden-layer neurons is set to 400, the ReLU function is selected as the activation function, and the Q value of each action at the current moment is output.
7. The reinforcement learning-based variable impedance control method for space manipulator shaft hole assembly according to claim 5, wherein the target network adopts a fully-connected neural network structure, the position, speed, acceleration and force-error information of the manipulator end are used as the network input, the number of hidden-layer neurons is set to 400, the ReLU function is selected as the activation function, and the Q value of each action at the next moment is output.
8. A reinforcement learning-based variable impedance control system for space manipulator shaft hole assembly, characterized by comprising the following modules:
Space manipulator model construction module: used for constructing a space manipulator model based on the DH parameter method;
End pose conversion model construction module: used for constructing a conversion model between the joint angle state and the end pose of the space manipulator based on forward and inverse kinematics algorithms;
Assembly hole position information acquisition module: used for initializing the internal and external parameters of the binocular camera, acquiring images with the binocular camera, and obtaining the position information of the assembly hole;
Impedance controller construction module: used for constructing an impedance controller based on reinforcement learning, and setting the impedance parameter action table, reward function and suspension condition of the training process according to the expected target;
Training module: used for training the impedance controller based on a neural network;
Space manipulator shaft hole assembly variable impedance control module: used for inputting real-time information of the manipulator end, updating the impedance parameters of the impedance controller, outputting the position correction of the manipulator end, and completing the variable impedance control of the space manipulator shaft hole assembly.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1-7.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, carries out the steps of the method according to any one of claims 1-7.
CN202211038250.5A 2022-08-29 2022-08-29 Space manipulator shaft hole assembly variable impedance control method based on reinforcement learning Pending CN115256401A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211038250.5A CN115256401A (en) 2022-08-29 2022-08-29 Space manipulator shaft hole assembly variable impedance control method based on reinforcement learning


Publications (1)

Publication Number Publication Date
CN115256401A true CN115256401A (en) 2022-11-01

Family

ID=83755665



Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115421387A (en) * 2022-09-22 2022-12-02 中国科学院自动化研究所 Variable impedance control system and control method based on inverse reinforcement learning
CN115421387B (en) * 2022-09-22 2023-04-14 中国科学院自动化研究所 Variable impedance control system and control method based on inverse reinforcement learning
CN116619383A (en) * 2023-06-21 2023-08-22 山东大学 Mechanical arm PID control method and system based on definite learning
CN116619383B (en) * 2023-06-21 2024-02-20 山东大学 Mechanical arm PID control method and system based on definite learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination