CN113290554A - Intelligent optimization control method for Baxter mechanical arm based on value iteration - Google Patents

Publication number
CN113290554A
CN113290554A CN202110464400.8A CN202110464400A CN113290554A CN 113290554 A CN113290554 A CN 113290554A CN 202110464400 A CN202110464400 A CN 202110464400A CN 113290554 A CN113290554 A CN 113290554A
Authority
CN
China
Prior art keywords
vector
strategy
optimal
mechanical arm
matrix
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110464400.8A
Other languages
Chinese (zh)
Other versions
CN113290554B (en
Inventor
王波
朱俊威
董子源
张恒
夏振浩
周巧倩
张钧涵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority to CN202110464400.8A
Publication of CN113290554A
Application granted
Publication of CN113290554B
Active
Anticipated expiration

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1628 Programme controls characterised by the control loop

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Feedback Control In General (AREA)

Abstract

A value-iteration-based intelligent optimization control method for a Baxter mechanical arm. The method first initializes the Baxter mechanical arm system and selects a basis function. It then samples the system state and input, computes the state at the next moment from the state at the current moment, and computes the optimal value function online. Once the optimal value function is obtained, the strategy is updated with a greedy algorithm; when the strategy converges it is optimal and is no longer updated, so that optimal control of the system is achieved. The method obtains the optimal control strategy through value-iteration adaptive control. When some of the model parameters of the system are unknown, the system does not need to be identified; optimal control is instead achieved online by the value-iteration adaptive control method, and the algorithm has been debugged on a robot platform so that the effect is demonstrated at the practical level.

Description

Intelligent optimization control method for Baxter mechanical arm based on value iteration
Technical Field
The invention belongs to the technical field of control, and in particular provides a value-iteration-based intelligent optimization control method for a Baxter mechanical arm, which achieves optimal control of the Baxter mechanical arm system when the system model is unknown.
Background
Owing to their distinctive structure, multi-axis mechanical arms are widely used in many fields. Using industrial mechanical arms in place of manual labor raises the automation level of industrial production and processing, so breakthroughs in mechanical arm technology and its industrial expansion are of great significance.
The traditional development process for a control system relies mainly on mathematical simulation. For strongly coupled nonlinear plants such as the Baxter mechanical arm, mathematical simulation is difficult to carry out and its results have low confidence, so the expected effect is often hard to achieve. Meanwhile, current control research on multi-axis mechanical arms mostly adopts traditional model-based methods, which cannot use a data-driven approach to control the system online and which require a completely known system model. Because the model parameters of the Baxter mechanical arm are unknown, the usable traditional model-based control methods are even more limited. Modeling the arm through system identification involves a huge workload and consumes a great deal of time and effort, and problems such as model mismatch and unmodeled dynamics may remain.
Disclosure of Invention
To overcome the shortcomings of existing methods, the invention provides a value-iteration-based intelligent optimization control method for a Baxter mechanical arm. It proposes an adaptive value iteration algorithm that combines the ideas of adaptive dynamic programming (ADP) with intelligent optimization control theory to give an online ADP technique, which can solve, forward in time, the continuous-time infinite-horizon optimal control problem for a system with unknown dynamics parameters. The controller parameters are updated according to a signal sequence that measures controller performance, and they are driven toward the optimal control strategy and the corresponding optimal value function through an iterative process that alternates control-strategy updates with value-function estimation. Each iteration step consists of updating the value-function estimate under the current control strategy and then updating the control strategy based on the new value-function estimate.
The technical scheme adopted by the invention for solving the technical problems is as follows:
A value-iteration-based intelligent optimization control method for a Baxter mechanical arm considers the following dynamic equation of the Baxter mechanical arm system:
Figure BDA0003043114290000011
where $q$, $\dot{q}$, $\ddot{q}$ denote the position, angular velocity and angular acceleration vectors of the mechanical arm, respectively; $M_j(q)$ denotes the inertia matrix of the arm; $C_j$
Figure BDA0003043114290000013
denotes the Coriolis moment vector of the arm; $G_j(q)$ denotes the gravity moment vector of the arm; $\tau$ denotes the control moment vector of the arm; and $\tau_d$ denotes the unknown disturbance moment vector from the external environment;
the system state vector is represented by:
Figure BDA0003043114290000021
the state space equation for the Baxter manipulator is given as follows:
Figure BDA0003043114290000022
where $u = \tau$ is the system torque input,
Figure BDA0003043114290000023
is the state vector, $y$ is the output, and the matrices $A_c$, $B_c$, $h_c$ are defined as follows:
Figure BDA0003043114290000024
where $O_n$ is the $(n \times n)$ zero matrix and $I_n$ is the $(n \times n)$ identity matrix;
Figure BDA0003043114290000025
where $0_n$ is the $(n \times 1)$ zero vector and $n(x_1, x_2)$ collects the terms related to the Coriolis moment and the gravity moment;
The Q-learning value-iteration optimal control problem:
Figure BDA0003043114290000026
the infinite-horizon optimal control problem is as follows:
Figure BDA0003043114290000027
Select $Q = 1$, $R = 1$, and let $(A, B)$ be controllable. The controller is obtained from Bellman's principle of optimality as $u = -Kx$, where $K = R^{-1}B^{T}H$ and $H$ satisfies the algebraic Riccati equation:
$A^{T}H + HA - HBR^{-1}B^{T}H + Q = 0$ (8)
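As a worked illustration of equation (8), consider a hypothetical scalar system $\dot{x} = ax + bu$ with the usual stabilizing feedback $u = -Kx$. The values of `a` and `b` below are assumptions chosen for illustration and are not Baxter parameters. In the scalar case the Riccati equation reduces to a quadratic in $H$ with a closed-form positive root:

```python
import math

def scalar_lqr(a, b, q=1.0, r=1.0):
    """Solve 2*a*H - (b*H)**2 / r + q = 0 for the positive root H; return (H, K)."""
    # Positive root of the scalar algebraic Riccati equation.
    H = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    # Feedback gain K = R^{-1} B^T H.
    K = b * H / r
    return H, K

a, b = 1.0, 2.0                      # hypothetical open-loop unstable plant
H, K = scalar_lqr(a, b)

# Substitute back into the Riccati equation (Q = R = 1); this should be ~0.
residual = 2 * a * H - (b * H) ** 2 / 1.0 + 1.0
# Closed-loop pole under u = -K x; negative means the loop is stable.
closed_loop_pole = a - b * K
```

The closed-loop pole equals $-\sqrt{a^2 + b^2 Q/R}$, so the feedback stabilizes the plant even though the open loop is unstable.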
The intelligent optimization control method comprises the following steps:
Step 1) Initialize the system, as follows:
1.1) Select a basis function: for continuous-time LQR, the value function is quadratic in the state,
Figure BDA0003043114290000028
Therefore, the basis function of the actor neural network in equation (9),
Figure BDA0003043114290000029
$\mathbb{R}^n \to \mathbb{R}^L$, is selected as the quadratic polynomial vector of the state components. With $n$ states in the vector, the basis function contains $n(n+1)/2$ components, and the weight vector $W$ is composed of the elements of the matrix $H$;
Figure BDA0003043114290000032
1.2) Initialize the system: select an initial state $x_0$, compute the initial value of the basis function, and determine an initial strategy $K_0$.
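A minimal sketch of the quadratic polynomial basis described in step 1.1, assuming the components are ordered as the products $x_i x_j$ with $i \le j$. The ordering and the function name `quadratic_basis` are illustrative choices, not taken from the source. For a state of dimension n it returns n(n+1)/2 components:

```python
def quadratic_basis(x):
    """Return the products x[i] * x[j] for all i <= j, i.e. n(n+1)/2 components."""
    n = len(x)
    return [x[i] * x[j] for i in range(n) for j in range(i, n)]

x0 = [1.0, 2.0, 3.0]               # hypothetical 3-state example
phi0 = quadratic_basis(x0)         # order: x1^2, x1*x2, x1*x3, x2^2, x2*x3, x3^2

# A six-dimensional state (three joint positions and three joint velocities)
# gives 6 * 7 / 2 = 21 basis components.
n_components_six_states = len(quadratic_basis([1.0] * 6))
```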
Step 2) Sample the system and compute the optimal value function by the least-squares method; this is the strategy evaluation process. To obtain the Q function at each step under strategy $K_i$, the parameter matrix $H_i$ is used in the calculation. Let
$z = [x^{T}\ u^{T}]^{T}$
The above formula becomes:
Figure BDA0003043114290000034
where
Figure BDA0003043114290000035
is the quadratic polynomial basis vector formed from the Kronecker product, with elements $\{z_i(t)z_j(t)\}_{i=1,n;\,j=i,n}$, and
Figure BDA0003043114290000036
is a vector-valued matrix function acting on an $n \times n$ matrix: it stacks the elements of the symmetric matrix into a column vector, with each pair of off-diagonal elements summed as $H_{ij} + H_{ji}$;
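The stacking operation described above can be sketched as follows. The helper names `sym_to_vec` and `vec_to_sym` and the upper-triangle ordering are assumptions made for illustration. The point of summing the off-diagonal pairs is that the quadratic form $z^{T}Hz$ then equals the inner product of the stacked vector with the quadratic basis of $z$:

```python
def quadratic_basis(z):
    """Products z[i] * z[j] for all i <= j."""
    n = len(z)
    return [z[i] * z[j] for i in range(n) for j in range(i, n)]

def sym_to_vec(H):
    """Stack the upper triangle of symmetric H; off-diagonals become H[i][j] + H[j][i]."""
    n = len(H)
    return [H[i][j] if i == j else H[i][j] + H[j][i]
            for i in range(n) for j in range(i, n)]

def vec_to_sym(h, n):
    """Inverse mapping: rebuild the symmetric matrix from the stacked vector."""
    H = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i, n):
            if i == j:
                H[i][j] = h[k]
            else:
                H[i][j] = H[j][i] = h[k] / 2.0   # split the summed pair evenly
            k += 1
    return H

H = [[2.0, 0.5], [0.5, 1.0]]       # hypothetical symmetric parameter matrix
z = [3.0, -1.0]
h = sym_to_vec(H)

quad_form = sum(z[i] * H[i][j] * z[j] for i in range(2) for j in range(2))
inner_product = sum(hk * pk for hk, pk in zip(h, quadratic_basis(z)))
```

Because the quadratic form is linear in the stacked vector, the unknown entries of the matrix can be estimated by ordinary least squares.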
In each iteration step, after a sufficient number of position and angular velocity trajectory points have been collected under the same control strategy $K_i$, the Q-function parameters
Figure BDA0003043114290000037
are solved by the least-squares method, thereby obtaining $H_{i+1}$. The parameter vector is found by minimizing, in the least-squares sense, the error between the target functions, evaluated at $N > n(n+1)$ points $Z_i$ in the state space, which gives the least-squares solution:
Figure BDA0003043114290000038
where
Figure BDA0003043114290000039
Figure BDA00030431142900000310
The states are measured at the discrete times $t$ and $t + T$, together with the reward observed over the sampling interval:
Figure BDA00030431142900000311
$H_{i+1} = f(h_{i+1})$ (15)
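The least-squares step can be illustrated with a small NumPy sketch. It is not the source's exact regression: in the method the targets are built from measured states and rewards, whereas here synthetic targets generated from a hypothetical matrix `H_true` stand in for them, so that the recovered parameters can be checked. The helper names are illustrative.

```python
import numpy as np

def quadratic_basis(z):
    """Products z[i] * z[j] for all i <= j."""
    n = len(z)
    return np.array([z[i] * z[j] for i in range(n) for j in range(i, n)])

def vec_to_sym(h, n):
    """Rebuild a symmetric matrix from its stacked vector (off-diagonals halved)."""
    H = np.zeros((n, n))
    k = 0
    for i in range(n):
        for j in range(i, n):
            H[i, j] = H[j, i] = h[k] if i == j else h[k] / 2.0
            k += 1
    return H

rng = np.random.default_rng(0)
n = 3
H_true = np.array([[2.0, 0.3, 0.0],
                   [0.3, 1.0, -0.2],
                   [0.0, -0.2, 0.5]])            # hypothetical target parameters

N = 20                                           # N > n(n+1) = 12 sample points
Z = rng.normal(size=(N, n))                      # sampled points z_k
Phi = np.stack([quadratic_basis(z) for z in Z])  # N x n(n+1)/2 regressor matrix
Y = np.array([z @ H_true @ z for z in Z])        # synthetic targets

# Least-squares solution for the stacked parameter vector h.
h, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
H_est = vec_to_sym(h, n)
```

With noise-free targets and more samples than unknowns, `H_est` matches `H_true` up to floating-point error; with measured data the fit would only be approximate.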
Step 3) Based on the obtained optimal value function, update the strategy parameters with a greedy algorithm:
Figure BDA00030431142900000312
When the least-squares solution converges, the strategy is no longer updated and the optimal strategy is obtained. The continuous-time ADP algorithm consists of iterating between (14) and (6). Updating the control strategy with (15) does not require a system matrix containing knowledge of the dynamics, which allows the algorithm to be implemented without a model.
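The alternation between value update and greedy strategy update can be sketched on a discrete-time analogue. This is an assumption-laden illustration, not the continuous-time update of the source, whose equation is not reproduced in the text. It assumes the common Q-learning form in which the Q-function kernel $H$ is partitioned into state and input blocks and the greedy gain is $K = H_{uu}^{-1}H_{ux}$ with $u = -Kx$. The model matrices `A` and `B` are hypothetical and are used here only to build $H$; in the method, $H$ would come from the least-squares fit of measured data, and the gain is read from $H$ alone.

```python
import numpy as np

def q_matrix(A, B, Q, R, P):
    """Q-function kernel H for value kernel P: Q(x, u) = [x; u]^T H [x; u]."""
    return np.block([[Q + A.T @ P @ A, A.T @ P @ B],
                     [B.T @ P @ A,     R + B.T @ P @ B]])

def greedy_gain(H, n):
    """Greedy strategy u = -K x with K = Huu^{-1} Hux, using only blocks of H."""
    Hux, Huu = H[n:, :n], H[n:, n:]
    return np.linalg.solve(Huu, Hux)

# Hypothetical discretized double integrator (one joint), step 0.1 s.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.005], [0.1]])
Q, R = np.eye(2), np.eye(1)
n = 2

P = np.zeros((n, n))                  # value iteration may start from zero
for _ in range(2000):
    H = q_matrix(A, B, Q, R, P)
    K = greedy_gain(H, n)             # greedy strategy update
    # Value update: minimum of the Q-function over u, P = Hxx - Hxu Huu^{-1} Hux.
    P_next = H[:n, :n] - H[:n, n:] @ K
    done = np.max(np.abs(P_next - P)) < 1e-12
    P = P_next
    if done:
        break

# Spectral radius of the closed loop; below 1 means the final strategy is stabilizing.
spectral_radius = max(abs(np.linalg.eigvals(A - B @ K)))
```

Unlike policy iteration, this loop does not need a stabilizing initial strategy, which is consistent with starting the experiment from a zero gain.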
The working principle of the invention is as follows: the system is initialized and the initial control is determined; the system is sampled and the value function is computed online by the least-squares method for strategy evaluation; once the optimal value function is obtained, the strategy is updated with a greedy algorithm, finally yielding the optimal strategy.
The robot platform is a Baxter robot, a dual-arm robot developed by Rethink Robotics in the United States; each of its arms is a redundant flexible-joint mechanical arm with seven degrees of freedom. The robot body is supported on a movable base, and the rigid links of each arm are connected by rotary joints. The joints are driven through series elastic actuators, in which the motor and reducer drive the load in series with a spring; this protects people and the robot body during human-robot cooperation or under external impact. The flexible joints can also detect angular deviation through the Hall effect. Each Baxter joint has a torque sensor. The front and rear sections of the arm are driven by 26 W and 63 W servo motors, and the joint angles are read by 14-bit encoders. The Baxter robot is an open-source robot based on ROS (Robot Operating System) and runs on a Linux platform. A user can connect to the robot's internal computer over a network to read information or send commands, or remotely control the internal computer through SSH to run programs. Information reading and real-time control of the Baxter robot are carried out with the Baxter SDK (software development kit) through the ROS API (application programming interface). The Baxter SDK provides function interfaces and important tools such as the Gazebo simulator and the MoveIt motion planning package.
The beneficial effects of the invention are as follows: intelligent optimization control of the system is achieved by solving for the optimal control strategy through value-iteration adaptive control. When some of the model parameters of the system are unknown, the system does not need to be identified; optimal control is instead achieved online by the value-iteration-based adaptive control method. In addition, the algorithm has been debugged on the robot platform, so the effect is demonstrated at the practical level.
Drawings
FIG. 1 is a flow chart of a value iteration-based Baxter manipulator intelligent optimization control method;
FIG. 2 is a diagram of system position and angle changes based on value iterative adaptive control;
FIG. 3 is a graph comparing performance indicators based on value iterations and under control of any given policy;
FIG. 4 is a diagram of system input changes based on policy iteration.
Detailed Description
To make the technical features, purposes and advantages of the invention clearer, the technical scheme of the invention is further described below with reference to the accompanying drawings and practical experiments.
Referring to FIG. 1 to FIG. 4, a value-iteration-based intelligent optimization control method for a Baxter mechanical arm first initializes the Baxter mechanical arm system and selects a basis function; it then samples the system state and input, computes the state at the next moment from the state at the current moment, and computes the optimal value function online; once the optimal value function is obtained, the strategy is updated with a greedy algorithm. When the strategy converges it is optimal and is no longer updated, so that optimal control of the system is achieved.
The invention relates to a value iteration-based intelligent optimal control method for a Baxter mechanical arm, which comprises the following steps of:
1) initializing a system and selecting a basis function;
2) sampling the system and collecting input and output data; calculating the optimal value of the value function by using a least square method, and performing strategy evaluation;
3) the policy is updated using a greedy algorithm.
Further, in step 1), consider the following three-joint Baxter mechanical arm system:
Figure BDA0003043114290000051
where
Figure BDA0003043114290000052
Bcc is unknown,
Figure BDA0003043114290000053
and $Q = 1$, $R = 1$.
The experiment is based on the value-iteration adaptive control algorithm. The position and angular velocity of the mechanical arm are collected, and neither strategy evaluation nor strategy update in the control algorithm uses a matrix containing knowledge of the dynamics. $q_1$ denotes the position of a joint of the mechanical arm, and
Figure BDA0003043114290000054
denotes the angular velocity of a joint of the mechanical arm. Initialize the system, take the initial state $x_0 = [1\ 1\ 1\ 1\ 1\ 1]^{T}$, and select the basis function
Figure BDA0003043114290000055
Further, in step 2), an arbitrary strategy is given, and strategy evaluation and strategy improvement are carried out on the system:
2.1) Strategy evaluation: with the given initial strategy $K_0 = O_{3\times 6}$, the sampling time is taken as $T = 0.004$ s. The system is sampled over the finite interval $[t, t+T]$; the position and angular velocity $x(t+T)$ at the next moment are updated from the position and angular velocity $x(t)$ of the mechanical arm at the current moment, and the value function is computed by the least-squares method. The changes in the position and angular velocity of the mechanical arm and in the value function are shown in FIG. 2 and FIG. 3;
2.2) Strategy improvement: after strategy evaluation the optimal value function is obtained, the strategy is updated with a greedy algorithm, and the optimal strategy is obtained when the strategy no longer changes over time.
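The sampling step in 2.1 can be sketched with the values stated in the text ($x_0$ of all ones, $K_0 = O_{3\times 6}$, $T = 0.004$ s). Everything else is an assumption for illustration: the stand-in double-integrator joint model replaces the unknown arm dynamics, $Q = 1$ and $R = 1$ are read as identity weightings, and the reward over the interval is approximated by a rectangle rule. On the real robot the next state would be measured rather than simulated.

```python
import numpy as np

n_joints = 3
T = 0.004                                    # sampling time from the text, in seconds
x0 = np.ones(2 * n_joints)                   # x0 = [1 1 1 1 1 1]^T
K0 = np.zeros((n_joints, 2 * n_joints))      # initial strategy K0 = O_{3x6}
Q, R = np.eye(2 * n_joints), np.eye(n_joints)

# ASSUMPTION: stand-in double-integrator joints (q_ddot = u); the real dynamics are unknown.
A = np.block([[np.zeros((3, 3)), np.eye(3)],
              [np.zeros((3, 3)), np.zeros((3, 3))]])
B = np.vstack([np.zeros((3, 3)), np.eye(3)])

def sample_step(x, K, T):
    """Return (x(t), u(t), x(t+T), reward) using one forward-Euler step."""
    u = -K @ x
    x_next = x + T * (A @ x + B @ u)
    reward = T * (x @ Q @ x + u @ R @ u)     # rectangle-rule integral of the stage cost
    return x, u, x_next, reward

x_t, u_t, x_next, reward = sample_step(x0, K0, T)
```

Each such tuple supplies one row of the least-squares problem used for strategy evaluation.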
The experimental results in FIG. 3 show that after 60 strategy updates the strategy has converged and is no longer updated; the joint velocities of the mechanical arm finally converge to near 0, and the control effect meets the expected requirements.
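The stopping condition (the strategy no longer changes) can be sketched as a comparison of successive gain matrices. The tolerance `tol` and the function name are hypothetical; the source does not state a numerical threshold.

```python
def strategy_converged(K_prev, K_new, tol=1e-6):
    """Return True when the largest entry-wise change between two gains is below tol."""
    largest_change = max(
        abs(a - b)
        for row_prev, row_new in zip(K_prev, K_new)
        for a, b in zip(row_prev, row_new)
    )
    return largest_change < tol

K_a = [[0.5, 1.0], [0.0, 2.0]]
K_b = [[0.5, 1.0 + 1e-9], [0.0, 2.0]]   # differs from K_a by far less than tol
K_c = [[0.6, 1.0], [0.0, 2.0]]          # differs from K_a by 0.1

unchanged = strategy_converged(K_a, K_b)
still_changing = not strategy_converged(K_a, K_c)
```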
Compared with the case of an arbitrarily given strategy
Figure BDA0003043114290000056
under a known dynamics model, FIG. 3 shows that the system state under this method converges smoothly and quickly without excessive overshoot, so the expected control effect is achieved. The performance index comparison in FIG. 4 further shows that the method reaches the optimal performance index better and faster.
The invention provides a value-iteration-based intelligent optimization control method for a multi-axis mechanical arm. Using value-iteration adaptive control, it solves the optimal control problem of the system online through the two steps of strategy evaluation and strategy improvement. Compared with the prior art, its practical advantages are: the system model parameters do not need to be identified, and system information is obtained by collecting system trajectory data, from which the optimal control strategy is obtained; and debugging on the Baxter robot platform shows that the method achieves good control on an actual platform.
The technical solution of the invention has been described in detail above with reference to the accompanying drawings, but the invention is not limited thereto; various changes and modifications can be made within the knowledge of those skilled in the art on the basis of the concept of the invention.

Claims (1)

1. A value-iteration-based intelligent optimization control method for a Baxter mechanical arm, characterized in that the method considers the following dynamic equation of the Baxter mechanical arm system:
Figure FDA0003043114280000011
where $q$, $\dot{q}$, $\ddot{q}$ denote the position, angular velocity and angular acceleration vectors of the mechanical arm, respectively; $M_j(q)$ denotes the inertia matrix of the arm;
Figure FDA0003043114280000013
denotes the Coriolis moment vector of the arm; $G_j(q)$ denotes the gravity moment vector of the arm; $\tau$ denotes the control moment vector of the arm; and $\tau_d$ denotes the unknown disturbance moment vector from the external environment;
the system state vector is represented by:
Figure FDA0003043114280000014
the state space equation for the Baxter manipulator is given as follows:
Figure FDA0003043114280000015
where $u = \tau$ is the system torque input,
Figure FDA0003043114280000016
is the state vector, $y$ is the output, and the matrices $A_c$, $B_c$, $h_c$ are defined as follows:
Figure FDA0003043114280000017
where $O_n$ is the $(n \times n)$ zero matrix and $I_n$ is the $(n \times n)$ identity matrix;
Figure FDA0003043114280000018
where $0_n$ is the $(n \times 1)$ zero vector and $n(x_1, x_2)$ collects the terms related to the Coriolis moment and the gravity moment;
the Q-learning value-iteration optimal control problem:
Figure FDA0003043114280000019
the infinite-horizon optimal control problem is as follows:
Figure FDA00030431142800000110
select $Q = 1$, $R = 1$, and let $(A, B)$ be controllable; the controller is obtained from Bellman's principle of optimality as $u = -Kx$, where $K = R^{-1}B^{T}H$ and $H$ satisfies the algebraic Riccati equation:
$A^{T}H + HA - HBR^{-1}B^{T}H + Q = 0$ (8);
the intelligent optimization control method comprises the following steps:
step 1) initialize the system, as follows:
1.1) select a basis function: for continuous-time LQR, the value function is quadratic in the state,
Figure FDA0003043114280000021
therefore, the basis function of the actor neural network in equation (9),
Figure FDA0003043114280000022
$\mathbb{R}^n \to \mathbb{R}^L$, is selected as the quadratic polynomial vector of the state components; with $n$ states in the vector, the basis function contains $n(n+1)/2$ components, and the weight vector $W$ is composed of the elements of the matrix $H$;
Figure FDA0003043114280000023
1.2) initialize the system: select an initial state $x_0$, compute the initial value of the basis function, and determine an initial strategy $K_0$;
step 2) sample the system and compute the optimal value function by the least-squares method, this being the strategy evaluation process; to obtain the Q function at each step under strategy $K_i$, the parameter matrix $H_i$ is used in the calculation; let $z = [x^{T}\ u^{T}]^{T}$, and the above formula becomes:
Figure FDA0003043114280000024
where
Figure FDA0003043114280000025
is the quadratic polynomial basis vector formed from the Kronecker product, with elements $\{z_i(t)z_j(t)\}_{i=1,n;\,j=i,n}$, and
Figure FDA0003043114280000026
is a vector-valued matrix function acting on an $n \times n$ matrix, which stacks the elements of the symmetric matrix into a column vector, with each pair of off-diagonal elements summed as $H_{ij} + H_{ji}$;
in each iteration step, after a sufficient number of position and angular velocity trajectory points have been collected under the same control strategy $K_i$, the Q-function parameters
Figure FDA0003043114280000027
are solved by the least-squares method, thereby obtaining $H_{i+1}$; the parameter vector is found by minimizing, in the least-squares sense, the error between the target functions, evaluated at $N > n(n+1)$ points $Z_i$ in the state space, which gives the least-squares solution:
Figure FDA0003043114280000028
where
Figure FDA0003043114280000029
Figure FDA00030431142800000210
the states are measured at the discrete times $t$ and $t + T$, together with the reward observed over the sampling interval:
Figure FDA00030431142800000211
$H_{i+1} = f(h_{i+1})$ (15)
step 3) based on the obtained optimal value function, update the strategy parameters with a greedy algorithm:
Figure FDA00030431142800000212
when the least-squares solution converges, the strategy is no longer updated and the optimal strategy is obtained; the continuous-time ADP algorithm consists of iterating between (14) and (6); updating the control strategy with (15) does not require a system matrix containing knowledge of the dynamics, which allows the algorithm to be implemented without a model.
CN202110464400.8A 2021-04-28 2021-04-28 Intelligent optimization control method for Baxter mechanical arm based on value iteration Active CN113290554B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110464400.8A CN113290554B (en) 2021-04-28 2021-04-28 Intelligent optimization control method for Baxter mechanical arm based on value iteration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110464400.8A CN113290554B (en) 2021-04-28 2021-04-28 Intelligent optimization control method for Baxter mechanical arm based on value iteration

Publications (2)

Publication Number Publication Date
CN113290554A true CN113290554A (en) 2021-08-24
CN113290554B CN113290554B (en) 2022-06-17

Family

ID=77320428

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110464400.8A Active CN113290554B (en) 2021-04-28 2021-04-28 Intelligent optimization control method for Baxter mechanical arm based on value iteration

Country Status (1)

Country Link
CN (1) CN113290554B (en)

Also Published As

Publication number Publication date
CN113290554B (en) 2022-06-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant