CN113290554A

CN113290554A - Intelligent optimization control method for Baxter mechanical arm based on value iteration

Info

Publication number: CN113290554A
Application number: CN202110464400.8A
Authority: CN
Inventors: 王波; 朱俊威; 董子源; 张恒; 夏振浩; 周巧倩; 张钧涵
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2021-04-28
Filing date: 2021-04-28
Publication date: 2021-08-24
Anticipated expiration: 2041-04-28
Also published as: CN113290554B

Abstract

A value-iteration-based intelligent optimization control method for the Baxter mechanical arm. First, the Baxter mechanical arm system is initialized and the basis functions are selected. The system state and input are then sampled, the state at the next instant is computed from the state at the current instant, and the optimal value function is computed online. Once the optimal value function is obtained, the policy is updated with a greedy algorithm; when the policy converges it is optimal and is no longer updated, which achieves optimal control of the system. The method obtains the optimal control policy through value-iteration adaptive control. When some model parameters of the system are unknown, it requires no system identification and instead achieves optimal control online, and its practical effectiveness is demonstrated by debugging the algorithm on a robot platform.

Description

Intelligent optimization control method for Baxter mechanical arm based on value iteration

Technical Field

The invention belongs to the field of control technology and specifically provides a value-iteration-based intelligent optimization control method for the Baxter mechanical arm, which achieves optimal control of the Baxter mechanical arm system when the system model is unknown.

Background

Owing to their distinctive design, multi-axis mechanical arms are widely used in many fields. Replacing manual labor with industrial mechanical arms raises the automation level of industrial production and processing, so advances in mechanical arm technology are of great significance for industrial expansion.

Traditional control-system development relies mainly on mathematical simulation, which is difficult to carry out for strongly coupled nonlinear plants such as the Baxter mechanical arm; the simulation results have low confidence, so the expected performance is often hard to achieve. Moreover, current control research on multi-axis mechanical arms mostly uses traditional model-based methods, which cannot control the system online in a data-driven way and require a completely known system model. Because the model parameters of the Baxter mechanical arm are unknown, the applicable model-based methods are even more limited. Modeling the Baxter arm through system identification involves a huge workload and consumes a great deal of time and effort, and the result may still suffer from model mismatch, unmodeled dynamics and similar problems.

Disclosure of Invention

To overcome the shortcomings of existing methods, the invention provides a value-iteration-based intelligent optimization control method for the Baxter mechanical arm. It proposes an adaptive value-iteration algorithm that combines adaptive dynamic programming (ADP) with the theory of intelligent optimal control, yielding an online ADP technique that solves, forward in time, the continuous-time infinite-horizon optimal control problem of a system with unknown dynamic parameters. The controller parameters are updated from a measured signal sequence that reflects controller performance, and the iterative process of updating the control policy and the value-function estimate drives the parameters toward the optimal control policy and the corresponding optimal value function. Each iteration step consists of updating the value-function estimate based on the current control policy and updating the control policy based on the new value-function estimate.
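As a sketch of the iteration just described, assuming the standard continuous-time value-iteration form with a quadratic value function $V_i(x) = x^T H_i x$ and a sampling interval of length $T$ (which may differ in detail from the patent's equations (6) and (14)), step $i$ evaluates the current policy $u_i = -K_i x$ on measured data and then improves it greedily:

$$V_{i+1}\big(x(t)\big) = \int_{t}^{t+T} \left( x^T Q x + u_i^T R u_i \right) d\tau + V_i\big(x(t+T)\big)$$

$$K_{i+1} = R^{-1} B^T H_{i+1}, \qquad u_{i+1} = -K_{i+1} x$$

Both sides of the first equation are computed from measured states and rewards, so the drift matrix $A$ is not needed. The greedy step written this way still uses $B$, which the patent states its Q-function formulation avoids.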

The technical scheme adopted by the invention for solving the technical problems is as follows:

The value-iteration-based intelligent optimization control method for the Baxter mechanical arm considers the following dynamic equation of the Baxter mechanical arm system:

where $q$, $\dot q$ and $\ddot q$ respectively denote the position, angular velocity and angular acceleration vectors of the mechanical arm, $M_j(q)$ denotes the arm inertia matrix, $C_j(q, \dot q)$ denotes the Coriolis torque vector of the arm, $G_j(q)$ denotes the gravity torque vector, $\tau$ denotes the control torque vector, and $\tau_d$ denotes the unknown disturbance torque vector from the external environment;

The system state vector is represented by $x = [x_1^T\ x_2^T]^T = [q^T\ \dot q^T]^T$;

The state-space equation of the Baxter manipulator is given as follows:

where $u = \tau$ is the system torque input,

$x$ is the state vector, $y$ is the output, and the matrices $A_c$, $B_c$, $\eta_c$ are defined as follows:

where $O_n$ is the $n \times n$ zero matrix and $I_n$ is the $n \times n$ identity matrix;

where $0_n$ is the $n \times 1$ zero vector and $\eta(x_1, x_2)$ collects the Coriolis and gravity torque terms;

Q-learning value-iteration optimal control problem:

the infinite-horizon optimal control problem is as follows:

select $Q = 1$ and $R = 1$ with $(A, B)$ controllable; by Bellman's principle of optimality the controller is $u = -Kx$, where $K = R^{-1}B^T H$ and $H$ satisfies the algebraic Riccati equation:

$$A^T H + HA - HBR^{-1}B^T H + Q = 0 \qquad (8)$$
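Equation (8) is model-based: solving it directly needs $A$ and $B$. For reference only, the sketch below solves it numerically for a hypothetical single-joint surrogate (a double integrator with state $[q, \dot q]$, not the Baxter model, whose $B_c$ is unknown), using NumPy and the stable eigenvectors of the Hamiltonian matrix. The function names `solve_care` and `lqr_gain` are illustrative; a learned $H$ should approach this kind of reference solution.

```python
import numpy as np

def solve_care(A, B, Q, R):
    """Stabilizing solution H of A^T H + H A - H B R^-1 B^T H + Q = 0 (eq. (8)),
    taken from the stable invariant subspace of the Hamiltonian matrix."""
    n = A.shape[0]
    Rinv = np.linalg.inv(R)
    ham = np.block([[A, -B @ Rinv @ B.T],
                    [-Q, -A.T]])
    eigvals, eigvecs = np.linalg.eig(ham)
    stable = eigvecs[:, eigvals.real < 0]   # the n eigenvectors with Re(lambda) < 0
    U1, U2 = stable[:n, :], stable[n:, :]
    H = np.real(U2 @ np.linalg.inv(U1))
    return (H + H.T) / 2                    # symmetrize against round-off

def lqr_gain(B, R, H):
    """Optimal feedback gain K = R^-1 B^T H, applied as u = -K x."""
    return np.linalg.solve(R, B.T @ H)

# Hypothetical single-joint surrogate: x = [q, qdot], u = joint torque.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)

H = solve_care(A, B, Q, R)
K = lqr_gain(B, R, H)
residual = A.T @ H + H @ A - H @ B @ np.linalg.solve(R, B.T @ H) + Q
```

For this surrogate the exact solution is $H = \begin{bmatrix}\sqrt3 & 1\\ 1 & \sqrt3\end{bmatrix}$ and $K = [1\ \ \sqrt3]$, and `residual` is zero up to round-off. Where SciPy is available, `scipy.linalg.solve_continuous_are(A, B, Q, R)` solves the same equation.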

the intelligent optimization control method comprises the following steps:

step 1) initializing the system, comprising the following steps:

1.1) Selecting the basis functions: for continuous-time LQR, the value function is quadratic in the state,

so the basis functions of the actor neural network in equation (9), a map $\mathbb{R}^n \to \mathbb{R}^L$,

are chosen as the quadratic polynomial vector of the state components; with $n$ states, the basis contains $L = n(n+1)/2$ components, and the weight vector $W$ is composed of the elements of the matrix $H$;
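A minimal sketch of such a quadratic basis, assuming NumPy and an upper-triangle ordering of the products $x_i x_j$ ($i \le j$) that the text does not specify; `quad_basis` is a hypothetical name. For the six-dimensional state $x = [q^T\ \dot q^T]^T$ of the three-joint arm used in the experiment, with initial state $x_0 = [1\ 1\ 1\ 1\ 1\ 1]^T$, it yields $6 \cdot 7 / 2 = 21$ components.

```python
import numpy as np

def quad_basis(x):
    """Quadratic polynomial basis R^n -> R^L with L = n(n+1)/2.
    Components are x_i * x_j for i <= j, in row-major upper-triangle order."""
    x = np.asarray(x, dtype=float)
    i, j = np.triu_indices(x.size)
    return x[i] * x[j]

# Six-dimensional state of a three-joint arm, x = [q; qdot].
x0 = np.ones(6)          # initial state used in the experiment
phi0 = quad_basis(x0)    # all 21 products equal 1 at this state
L = phi0.size            # 6 * 7 / 2 = 21
```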

1.2) Initializing the system: select an initial state $x_0$, compute the initial value of the basis functions, and determine an initial policy $K_0$;

Step 2) sampling the system and computing the optimal value function by least squares, i.e., the policy evaluation process. To obtain the Q function at each step under policy $K_i$, it is computed with the parameter matrix $H_i$; letting $z = [x^T\ u^T]^T$,

the above formula becomes:

where

the second-order polynomial basis vector, formed by the Kronecker product, has elements $\{z_i(t) z_j(t)\}_{i=1,\dots,n;\ j=i,\dots,n}$, and

the vectorization operator is a vector-valued function acting on an $n \times n$ matrix: it stacks the elements of the symmetric matrix into a column vector, with each pair of off-diagonal elements summed as $H_{ij} + H_{ji}$;
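The vectorization can be sketched as follows, assuming NumPy and the same upper-triangle ordering as the basis; `sym_to_vec` and `vec_to_sym` are hypothetical names, and reading the map $f$ in (15) as the inverse `vec_to_sym` is an interpretation. The check shows why off-diagonal entries are summed: the stacked vector dotted with the quadratic basis reproduces the quadratic form $z^T H z$, so the Q function becomes linear in the unknown parameters.

```python
import numpy as np

def quad_basis(z):
    """Products z_i * z_j for i <= j, upper-triangle order (hypothetical ordering)."""
    z = np.asarray(z, dtype=float)
    i, j = np.triu_indices(z.size)
    return z[i] * z[j]

def sym_to_vec(H):
    """Stack the upper triangle of a symmetric n x n matrix into a vector;
    each off-diagonal pair is summed as H_ij + H_ji."""
    H = np.asarray(H, dtype=float)
    i, j = np.triu_indices(H.shape[0])
    return np.where(i == j, H[i, j], H[i, j] + H[j, i])

def vec_to_sym(h, n):
    """Inverse map: rebuild the symmetric matrix, halving each off-diagonal weight."""
    i, j = np.triu_indices(n)
    H = np.zeros((n, n))
    H[i, j] = np.where(i == j, h, h / 2.0)
    H[j, i] = H[i, j]
    return H

# Hypothetical symmetric kernel and vector, for illustration only.
H = np.array([[2.0, 0.5, -1.0],
              [0.5, 3.0, 0.25],
              [-1.0, 0.25, 1.0]])
z = np.array([0.3, -1.2, 2.0])
h = sym_to_vec(H)
quad_form = z @ H @ z            # z^T H z
linear_form = h @ quad_basis(z)  # same value, linear in the parameter vector h
```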

In each iteration step the same control policy $K_i$ is used; after a sufficient number of position and angular-velocity trajectory points have been collected, the Q-function parameters are solved by least squares,

thereby obtaining $H_{i+1}$. The parameter vector is found by minimizing, in the least-squares sense, the error between the target functions evaluated at $N > n(n+1)/2$ points $z_i$ in the state space, which gives the least-squares solution:

where

the states are measured at the discrete times $t$ and $t+T$, together with the reward observed over the sampling interval:

$$H_{i+1} = f(h_{i+1}) \qquad (15)$$
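One least-squares value-iteration update can be sketched as below, assuming the standard value-iteration target (reward measured over $[t, t+T]$ plus the previous value evaluated at $x(t+T)$) and NumPy's `lstsq`. The name `vi_least_squares` is hypothetical, and the synthetic data are constructed so that the exact answer is known; they are not taken from the experiment.

```python
import numpy as np

def quad_basis(x):
    """Products x_i * x_j for i <= j (hypothetical upper-triangle ordering)."""
    x = np.asarray(x, dtype=float)
    i, j = np.triu_indices(x.size)
    return x[i] * x[j]

def vec_to_sym(h, n):
    """Rebuild the symmetric matrix from its stacked upper triangle."""
    i, j = np.triu_indices(n)
    H = np.zeros((n, n))
    H[i, j] = np.where(i == j, h, h / 2.0)
    H[j, i] = H[i, j]
    return H

def vi_least_squares(x_t, x_tT, rewards, H_prev):
    """One value-iteration update from N measured samples.

    x_t, x_tT : (N, n) states measured at t and t + T
    rewards   : (N,) reward integrated over [t, t + T]
    Fits h so that h . basis(x(t)) ~= reward + x(t+T)^T H_prev x(t+T).
    """
    Phi = np.array([quad_basis(x) for x in x_t])                     # (N, L)
    target = rewards + np.einsum("ki,ij,kj->k", x_tT, H_prev, x_tT)  # (N,)
    h_next, *_ = np.linalg.lstsq(Phi, target, rcond=None)
    return vec_to_sym(h_next, x_t.shape[1])                          # H_{i+1} = f(h_{i+1})

# Synthetic check: targets are built so the exact answer is H_goal.
rng = np.random.default_rng(0)
H_prev = np.eye(3)
H_goal = np.array([[2.0, 0.5, 0.0],
                   [0.5, 1.0, 0.2],
                   [0.0, 0.2, 3.0]])
x_t = rng.standard_normal((20, 3))
x_tT = rng.standard_normal((20, 3))
rewards = (np.einsum("ki,ij,kj->k", x_t, H_goal, x_t)
           - np.einsum("ki,ij,kj->k", x_tT, H_prev, x_tT))
H_next = vi_least_squares(x_t, x_tT, rewards, H_prev)
```

Using 20 samples for 6 unknown parameters keeps the regression over-determined, matching the requirement that more than $n(n+1)/2$ points be evaluated.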

Step 3) updating the policy parameters through a greedy algorithm according to the obtained optimal value function:

When the least-squares iteration converges, the policy is no longer updated and the optimal policy is obtained. The continuous-time ADP algorithm iterates between (14) and (6); however, updating the control policy using (15) does not require a system matrix containing dynamics knowledge, which allows the algorithm to be implemented without a model.
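The text does not spell out the greedy update. Under a common Q-function reading (an assumption here), $Q(x, u) = z^T H z$ with $z = [x^T\ u^T]^T$; setting $\partial Q / \partial u = 0$ gives $u = -H_{uu}^{-1} H_{ux} x$, which uses only learned blocks of $H$ and no system matrices. `greedy_gain` and `policy_converged` are hypothetical names, and the tolerance is illustrative.

```python
import numpy as np

def greedy_gain(H, n):
    """Greedy policy from a Q-function kernel H over z = [x; u], x with n entries.
    dQ/du = 2 H_ux x + 2 H_uu u = 0 gives u = -K x with K = H_uu^-1 H_ux."""
    H_ux = H[n:, :n]
    H_uu = H[n:, n:]
    return np.linalg.solve(H_uu, H_ux)

def policy_converged(K_new, K_old, tol=1e-6):
    """Stop updating once the policy no longer changes (Frobenius-norm test)."""
    return np.linalg.norm(K_new - K_old) < tol

# Hypothetical kernel with n = 2 states and m = 1 input.
H = np.array([[4.0, 1.0, 2.0],
              [1.0, 3.0, 4.0],
              [2.0, 4.0, 2.0]])
K = greedy_gain(H, n=2)   # H_uu = [[2]], H_ux = [[2, 4]]  ->  K = [[1, 2]]
```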

The working principle of the invention is as follows: initialize the system and determine the system control; sample the system and evaluate the policy by computing the value function online with least squares; once the optimal value function is obtained, update the policy with a greedy algorithm, finally obtaining the optimal policy.

The robot platform is the Baxter robot, a dual-arm robot developed by Rethink Robotics (USA); each arm is a redundant, flexible-joint manipulator with seven degrees of freedom. The robot body is supported by a movable base, and the rigid links of the arm are connected by revolute joints. Each joint is driven through a series elastic actuator, i.e., the motor and gear reducer drive the load in series with a spring, which protects people and the robot itself during human-robot collaboration or under external impacts. The flexible joints can also detect angular deflection through Hall-effect sensing. The joints of both Baxter arms are equipped with torque sensors. The distal and proximal joints of the arm are driven by 26 W and 63 W servo motors, respectively, and joint angles are read by 14-bit encoders. Baxter is an open-source robot based on ROS (Robot Operating System) running on a Linux platform; users can connect to the robot's internal computer over the network to read information or send commands, or run programs on it remotely via SSH (Secure Shell). Information reading and real-time control of Baxter can be achieved with the Baxter SDK (software development kit) through the ROS API (application programming interface). The Baxter SDK provides the relevant function interfaces and important tools, such as the Gazebo simulator and the MoveIt motion-planning package.

The beneficial effects of the invention are as follows: the optimal control policy is obtained through value-iteration adaptive control, achieving intelligent optimization control of the system; when some model parameters are unknown, no system identification is required, and optimal control is achieved online by the value-iteration adaptive control method; in addition, the algorithm has been debugged on a robot platform, demonstrating its practical effectiveness.

Drawings

FIG. 1 is a flow chart of a value iteration-based Baxter manipulator intelligent optimization control method;

FIG. 2 is a diagram of the changes in system position and angle under value-iteration adaptive control;

FIG. 3 is a graph comparing the performance index under value-iteration control with that under an arbitrary given policy;

FIG. 4 is a diagram of system input changes based on policy iteration.

Detailed Description

To make the technical features, purposes and advantages of the present invention clearer, the technical solution of the present invention is further described below with reference to the accompanying drawings and practical experiments.

Referring to Figs. 1 to 4, the value-iteration-based intelligent optimization control method for the Baxter mechanical arm first initializes the Baxter mechanical arm system and selects the basis functions; it then samples the system state and input, computes the state at the next instant from the current state, and computes the optimal value function online. Once the optimal value function is obtained, the policy is updated with a greedy algorithm; when it converges, the policy is optimal and is no longer updated, achieving optimal control of the system.

The value-iteration-based intelligent optimization control method for the Baxter mechanical arm according to the invention comprises the following steps:

1) initializing a system and selecting a basis function;

2) sampling the system and collecting input and output data; computing the optimal value function by least squares, i.e., policy evaluation;

3) the policy is updated using a greedy algorithm.
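The three steps can be tied together in a small simulation. This is a hedged sketch, not the patent's implementation: the plant is a hypothetical double-integrator joint integrated with Euler sub-steps, $Q = R = I$, the sampling interval is $T = 0.05$ s (larger than the experiment's 0.004 s, to keep the run short), and the greedy step uses the value-function form $K = R^{-1} B^T H$, which needs $B$. The learner itself only uses $x(t)$, $x(t+T)$ and the reward; all names are illustrative.

```python
import numpy as np

def quad_basis(x):
    i, j = np.triu_indices(x.size)
    return x[i] * x[j]

def vec_to_sym(h, n):
    i, j = np.triu_indices(n)
    H = np.zeros((n, n))
    H[i, j] = np.where(i == j, h, h / 2.0)
    H[j, i] = H[i, j]
    return H

def sample_interval(A, B, K, x, T, substeps=50):
    """Hypothetical plant x' = Ax + Bu under u = -Kx, Euler-integrated over [t, t+T].
    Returns x(t+T) and the reward integral of x^T x + u^T u (Q = I, R = I)."""
    dt = T / substeps
    reward = 0.0
    for _ in range(substeps):
        u = -K @ x
        reward += (x @ x + u @ u) * dt
        x = x + dt * (A @ x + B @ u)
    return x, reward

def value_iteration(A, B, T=0.05, n_samples=6, max_iters=1000, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    n, m = B.shape
    H = np.zeros((n, n))          # V_0 = 0
    K = np.zeros((m, n))          # K_0 = 0
    for it in range(1, max_iters + 1):
        # Step 2: policy evaluation from sampled (x(t), x(t+T), reward) data.
        Phi, target = [], []
        for _ in range(n_samples):
            x_t = rng.standard_normal(n)
            x_tT, r = sample_interval(A, B, K, x_t, T)
            Phi.append(quad_basis(x_t))
            target.append(r + x_tT @ H @ x_tT)
        h, *_ = np.linalg.lstsq(np.array(Phi), np.array(target), rcond=None)
        H = vec_to_sym(h, n)
        # Step 3: greedy update, value-function form K = R^-1 B^T H (uses B).
        K_new = B.T @ H
        if np.linalg.norm(K_new - K) < tol:   # policy no longer changes
            return H, K_new, it
        K = K_new
    return H, K, max_iters

A = np.array([[0.0, 1.0], [0.0, 0.0]])   # hypothetical double-integrator joint
B = np.array([[0.0], [1.0]])
H, K, iters = value_iteration(A, B)
```

The loop stops once the policy no longer changes, and for this surrogate $H$ and $K$ end up close to the Riccati solution $H = \begin{bmatrix}\sqrt3 & 1\\ 1 & \sqrt3\end{bmatrix}$, $K = [1\ \ \sqrt3]$; the small remaining gap comes from the Euler discretization.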

Further, in step 1), consider the following three-joint Baxter mechanical arm system:

where

$B_c$ and $\eta_c$ are unknown,

and $Q = 1$, $R = 1$.

The experiment applies the value-iteration adaptive control algorithm and acquires the position and angular velocity of the mechanical arm; the policy evaluation and policy update steps of the algorithm do not use any matrix containing dynamics knowledge. Here $q_1$ denotes the joint position of the mechanical arm and $\dot q_1$ the joint angular velocity. The system is initialized with the initial state $x_0 = [1\ 1\ 1\ 1\ 1\ 1]^T$, and the basis functions are selected.

Further, in step 2), an arbitrary initial policy is given, and policy evaluation and policy improvement are carried out on the system:

2.1) Policy evaluation: with the given initial policy $K_0 = O_{3\times 6}$ and sampling period $T = 0.004$ s, the system is sampled over each finite interval $[t, t+T]$. The position and angular velocity $x(t)$ of the mechanical arm at the current instant and $x(t+T)$ at the next instant are used to compute the value function by least squares; the changes in the position and angular velocity of the mechanical arm and in the value function are shown in Fig. 2 and Fig. 3.

2.2) Policy improvement: after policy evaluation yields the optimal value function, the policy is updated with a greedy algorithm, and the optimal policy is obtained when the policy no longer changes over time.

From the experimental results in Fig. 3, after 60 policy updates the policy converges and is no longer updated; the joint velocities of the mechanical arm finally converge to near 0, and the control performance meets the expected requirements.

Compared with an arbitrary given policy under a known dynamic model,

Fig. 3 shows that the system state under the proposed method converges smoothly and quickly without excessive overshoot, achieving the expected control effect; the performance-index comparison in Fig. 4 likewise shows that the method reaches the optimal performance index better and faster.

The invention provides a value-iteration-based intelligent optimization control method for multi-axis mechanical arms, which uses value-iteration adaptive control to solve the system's optimal control problem online through the two steps of policy evaluation and policy improvement. Compared with the prior art, its practical advantages are: the system model parameters need not be identified, since system information is obtained by collecting system trajectory data, from which the optimal control policy is derived; and debugging on the Baxter robot platform shows that the method achieves good control on a real platform.

The technical solution of the present invention has been described in detail above with reference to the accompanying drawings, but it is not limited thereto; various changes and modifications can be made by those skilled in the art based on the concept of the present invention.

Claims

1. An intelligent optimization control method for a Baxter mechanical arm based on value iteration, characterized in that the method considers the following dynamic equation of the Baxter mechanical arm system:

where $q$, $\dot q$ and $\ddot q$ respectively denote the position, angular velocity and angular acceleration vectors of the mechanical arm, and $M_j(q)$ denotes the arm inertia matrix,

the system state vector is represented by:

the state space equation for the Baxter manipulator is given as follows:

where $u = \tau$ is the system torque input,

where $O_n$ is the $n \times n$ zero matrix and $I_n$ is the $n \times n$ identity matrix;

where $0_n$ is the $n \times 1$ zero vector and $\eta(x_1, x_2)$ collects the Coriolis and gravity torque terms; Q-learning value-iteration optimal control problem:

the infinite-horizon optimal control problem is as follows:

$$A^T H + HA - HBR^{-1}B^T H + Q = 0 \qquad (8);$$

the intelligent optimization control method comprises the following steps:

step 1) initializing the system, comprising the following steps:

the basis functions, a map $\mathbb{R}^n \to \mathbb{R}^L$, are a quadratic polynomial vector of the state components; with $n$ states, the basis contains $n(n+1)/2$ components, and the weight vector $W$ is composed of the elements of the matrix $H$;

Step 2) sampling the system and computing the optimal value function by least squares, i.e., the policy evaluation process; to obtain the Q function at each step under policy $K_i$, it is computed with the parameter matrix $H_i$, and letting $z = [x^T\ u^T]^T$, the above formula becomes:

where

the second-order polynomial basis vector formed by the Kronecker product has elements $\{z_i(t) z_j(t)\}_{i=1,\dots,n;\ j=i,\dots,n}$,

where

$$H_{i+1} = f(h_{i+1}) \qquad (15)$$