CN112947078A - Servo motor intelligent optimization control method based on value iteration - Google Patents
Servo motor intelligent optimization control method based on value iteration
- Publication number
- CN112947078A (application CN202110148138.6A)
- Authority
- CN
- China
- Prior art keywords
- strategy
- state
- optimal
- control
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
- G05B13/042—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Automation & Control Theory (AREA)
- Feedback Control In General (AREA)
Abstract
A servo motor intelligent optimization control method based on value iteration: the system is initialized and the system control is determined; the system is then sampled, strategy evaluation is performed by computing the value function online with the least-squares method, and when the optimal value function is obtained the strategy is updated with a greedy algorithm, finally yielding the optimal strategy. The method realizes intelligent optimization control of the system by solving for the optimal control strategy through value-iteration adaptive control. Compared with the prior art, when some of the system model parameters are unknown the method does not need to identify the system; instead, it achieves optimal control of the system online through the value-iteration adaptive control method.
Description
Technical Field
The invention belongs to the field of control technology, and particularly relates to a servo motor intelligent optimization control method based on value iteration that realizes optimal control of a servo motor when the system model is unknown.
Background
Traditionally, the optimal control problem of a continuous-time linear time-invariant system is solved by computing the solution of the algebraic Riccati equation (ARE), for example with eigenvector-based algorithms, numerical methods, or the LQR design procedure. All of these methods and their numerically efficient variants are offline procedures, although they have been proven to converge to the desired ARE solution. They either operate on the Hamiltonian matrix associated with the ARE (eigenvector- and matrix-sign-based algorithms) or require solving Lyapunov equations (Newton's method). In all cases a model of the system is required, so a prior identification step is always needed. In practice, however, system modeling is difficult and requires a great deal of time and effort. Furthermore, even if a model is available, the state-feedback controller derived from it is optimal only for the model approximation of the actual system dynamics.
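For contrast, a minimal sketch of this conventional offline, model-based route; it assumes full knowledge of A and B (the matrices below are illustrative placeholders) and uses SciPy's continuous-time ARE solver:

```python
# Illustrative sketch only: the offline, model-based baseline criticized above.
# Solving the ARE requires the full model (A, B); the values here are placeholders.
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [0.0, -2.0]])   # assumed plant dynamics (must be known)
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P = solve_continuous_are(A, B, Q, R)       # A^T P + P A - P B R^-1 B^T P + Q = 0
K = np.linalg.solve(R, B.T @ P)            # u = -K x is the optimal LQR gain
print("P =", P)
print("K =", K)
```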
When the global model of the controlled system is unknown, data-driven control theory and methods are considered for solving the actual control problem. The data-driven idea is to use the online and offline data of the controlled system to realize functions such as data-based forecasting, evaluation, scheduling, monitoring, diagnosis, decision-making and optimization without a completely known model. The concept originated in the field of computer science and, in the control field, has only begun to develop in recent years. At present, research on servo motor control mostly adopts traditional model-based methods: control of the system is not realized online with a data-driven approach, and the model must be completely known, so a large amount of time is spent on modeling, while problems such as model mismatch and unmodeled dynamics may remain.
Disclosure of Invention
In order to overcome the shortcomings of existing methods, the invention provides a servo motor intelligent optimization control method based on value iteration. It proposes an adaptive value-iteration algorithm that combines the ideas of ADP (adaptive dynamic programming) with intelligent optimal control theory, yielding a new ADP technique that solves, forward in time, the continuous-time infinite-horizon optimal control problem for a linear system with partially unknown dynamics (namely, the internal dynamics specified by the system matrix A). The controller parameters are updated from a sequence of signals measuring controller performance, through an iterative process that updates the control strategy and the value-function estimate so that they approach the optimal control strategy and the corresponding optimal value function. Each iteration step consists of an update of the value-function estimate based on the current control strategy, after which the control strategy is updated based on the new value-function estimate.
The physical plant considered by the invention (a networked multi-axis motion control system) is shown in FIG. 1 and mainly comprises an upper computer, an ARM microprocessor, an AC servo system, a servo motor, a power supply, a switch and a CAN bus. The algorithm is programmed in a Visual Studio C++ environment; Visual Studio is a powerful IDE (integrated development environment) developed by Microsoft, mainly for the Windows platform, which supports development in languages such as C#, F#, VB and C/C++. The main task of the upper computer is to receive data from the ARM microprocessor over the TCP/IP protocol, run the embedded control algorithm, and then send control instructions back to the ARM microprocessor. The ARM microprocessor acts as a data relay station: it obtains information such as the speed, position and torque of the servo motor from the servo system over the CAN bus and forwards it to the upper computer, while receiving control commands from the upper computer and sending them on to the servo system. The AC servo system is a high-performance communication-type servo driver of the Delta (Taida) ASDA-A2 series; it responds in real time to the control commands from the PC and drives the servo motor to execute the corresponding actions. The power supply and the switch are responsible for powering the system on and off.
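An illustrative sketch of the upper-computer side of this exchange; the address, port and message framing below are assumptions, since only the use of TCP/IP for state data and control commands is specified:

```python
# Illustrative sketch only: hypothetical framing (two little-endian float64 states in,
# one float64 command out); the real message format of the ARM firmware is not given.
import socket
import numpy as np

HOST, PORT = "192.168.1.10", 5000              # placeholder address of the ARM board

def run_upper_computer(K):
    """Receive the measured state over TCP, apply u = K x, send the command back."""
    with socket.create_connection((HOST, PORT)) as sock:
        while True:
            raw = sock.recv(16)                    # assumed frame: position and speed
            if len(raw) < 16:
                break
            x = np.frombuffer(raw, dtype="<f8")    # state vector from the servo system
            u = float(K @ x)                       # gain K produced by the algorithm below
            sock.sendall(np.float64(u).tobytes())  # control command back to the ARM
```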
The technical scheme adopted by the invention for solving the technical problems is as follows:
a servo motor intelligent optimization control method based on value iteration comprises the following steps:
Step 1) establishing the motor system state-space equation:

dx(t)/dt = A x(t) + B u(t)   (1)

wherein A is the system state matrix, part of whose parameters are unknown; the information it carries is obtained only from the current state x(t) and the next-time state x(t+T); B is the input matrix, x(t) ∈ R^n, u(t) ∈ R^m, and (A, B) is controllable. The optimal control problem is to minimize the cost

V(x(t)) = ∫_t^∞ ( x^T(τ) Q x(τ) + u^T(τ) R u(τ) ) dτ   (2)

which, written over a finite sampling interval, becomes

V(x(t)) = ∫_t^{t+T} ( x^T Q x + u^T R u ) dτ + V(x(t+T))   (3)

Selecting Q = 1 and R = 1, with (A, B) controllable, the controller is determined by Bellman's principle of optimality as u = Kx, where K = -R^{-1} B^T P and P satisfies the algebraic Riccati equation:

A^T P + P A - P B R^{-1} B^T P + Q = 0   (4);
Step 2) initializing the system, comprising the following steps:

2.1) selecting the basis functions: for the continuous-time LQR problem the value is quadratic in the state, V(x) = x^T P x, so the basis functions of the critic neural network in equation (5) are selected as a quadratic polynomial vector of the state components, φ(x): R^n → R^L,

V(x) = W^T φ(x)   (5)

where n is the number of states, the basis vector contains L = n(n+1)/2 components, and the weight vector W is composed of the elements of the matrix P;

the system control is linear in the state, u = Kx, so the basis functions of the actor neural network in equation (6) are the state components themselves:

u_k = h(x_k) = U^T x_k   (6)

2.2) initializing the system: selecting an initial state x_0, calculating the initial values of the basis functions, and determining an initial strategy K_0;
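An illustrative sketch (in Python, for brevity; the implementation environment described above is Visual Studio C++) of the quadratic basis φ(x) with n(n+1)/2 components and of the stacking of the symmetric matrix P into the weight vector W described in step 2.1):

```python
# Illustrative sketch only: quadratic basis phi(x) and the stacking convention in
# which off-diagonal entries of the symmetric matrix P appear as P_ij + P_ji.
import numpy as np

def quad_basis(x):
    """phi(x) = [x_i * x_j for i = 1..n, j = i..n], with n(n+1)/2 components."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([x[i] * x[j] for i in range(n) for j in range(i, n)])

def stack_P(P):
    """Vectorize symmetric P so that stack_P(P) @ quad_basis(x) == x^T P x."""
    n = P.shape[0]
    return np.array([P[i, j] if i == j else P[i, j] + P[j, i]
                     for i in range(n) for j in range(i, n)])

def unstack_P(w, n):
    """Inverse of stack_P: rebuild the symmetric matrix from the weight vector."""
    P = np.zeros((n, n))
    k = 0
    for i in range(n):
        for j in range(i, n):
            P[i, j] = P[j, i] = w[k] if i == j else w[k] / 2.0
            k += 1
    return P
```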
Step 3) sampling the system and computing the optimal value function by the least-squares method, namely the strategy evaluation process. To obtain, under strategy K_i, the value function of each next step, V(x(t+T)) = x^T(t+T) P_i x(t+T), the calculation uses the parameter matrix P_i and the formula becomes:

wherein x̄(t) is the second-order polynomial basis vector formed from the Kronecker product of the state, with elements {x_i(t) x_j(t)}, i = 1..n, j = i..n, and v(·) acts on the n×n matrix as a vector-valued matrix function that stacks the elements of the symmetric matrix into a vector, the off-diagonal elements being summed as P_ij + P_ji, thereby obtaining a column vector;

in each iteration step the same control strategy K_i is applied; after enough state-trajectory points have been collected, the V-function parameter vector is solved by the least-squares method, thereby obtaining P_{i+1}; the minimum of the parameter vector P in the least-squares sense is found by minimizing the error between the objective functions, and N > n(n+1) points x_i in the state space are evaluated to obtain the least-squares solution:

wherein X = [ x̄(t) - x̄(t+T) ]^T   (9)

The states at the discrete times t and t+T are measured, together with the reward observed over the sampling interval, ∫_t^{t+T} ( x^T Q x + u_i^T R u_i ) dτ;
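An illustrative sketch of this strategy-evaluation step, under the assumption that the regressor is the difference of basis vectors in (9) and the target is the reward accumulated over each sampling interval; it reuses quad_basis and unstack_P from the previous sketch:

```python
# Illustrative sketch only: batch least-squares value-function fit for a fixed K_i.
# Each sample is (x(t), x(t+T), integrated reward over [t, t+T]).
import numpy as np

def evaluate_policy(samples, n):
    """Solve for P such that phi(x_t)^T w - phi(x_next)^T w matches the interval reward."""
    X = np.array([quad_basis(x_t) - quad_basis(x_next) for x_t, x_next, _ in samples])
    Y = np.array([reward for _, _, reward in samples])
    w, *_ = np.linalg.lstsq(X, Y, rcond=None)   # least-squares value-function weights
    return unstack_P(w, n)
```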
Step 4) updating the optimal parameters through a greedy algorithm according to the obtained optimal value function:

u_i = -R^{-1} B^T P_i x = K_i x   (12)

When the least-squares solution converges, the strategy is no longer updated and the optimal strategy is obtained.
In the present invention, the continuous-time ADP algorithm consists of iterating between (11) and (12). However, updating the control strategy (the actor) with (12) requires the B matrix, which makes the tuning algorithm only partially model-free.
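An illustrative end-to-end sketch of the tuning loop; the plant matrices below are stand-ins used only to generate sampled trajectories (on the real system these come from the servo measurements), and, as noted above, the greedy update uses only the B matrix. It builds on the helpers from the previous sketches:

```python
# Illustrative sketch only: value-iteration adaptive loop over sampling intervals.
# The Euler simulation stands in for the measured motor trajectories.
import numpy as np

def simulate_interval(A, B, K, x0, T, steps=50):
    """Integrate x' = A x + B K x over one interval; return next state and
    the accumulated reward int (x^T Q x + u^T R u) dt with Q = I, R = 1."""
    dt = T / steps
    x = np.array(x0, dtype=float)
    Q, R = np.eye(len(x)), np.array([[1.0]])
    reward = 0.0
    for _ in range(steps):
        u = K @ x
        reward += (x @ Q @ x + u @ R @ u) * dt
        x = x + (A @ x + B @ u) * dt
    return x, reward

def value_iteration_control(A, B, x0, T=0.05, iters=20, batch=12, seed=0):
    """Alternate strategy evaluation (least squares) and greedy update (12)."""
    rng = np.random.default_rng(seed)
    n = len(x0)
    R = np.array([[1.0]])
    K = np.zeros((1, n))                            # initial strategy K_0 = 0
    span = np.abs(np.array(x0, dtype=float)) + 1.0  # excitation range for sampling
    for _ in range(iters):
        samples = []
        for _ in range(batch):                      # N > n(n+1) interval samples
            x_start = rng.uniform(-1.0, 1.0, size=n) * span
            x_next, r = simulate_interval(A, B, K, x_start, T)
            samples.append((x_start, x_next, r))
        P = evaluate_policy(samples, n)             # strategy evaluation (least squares)
        K = -np.linalg.solve(R, B.T @ P)            # greedy strategy update, cf. (12)
    return K, P
```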
The working principle of the invention is as follows: the system is initialized and the system control is determined; the system is then sampled, strategy evaluation is performed by computing the value function online with the least-squares method, and when the optimal value function is obtained the strategy is updated with a greedy algorithm, finally yielding the optimal strategy.
The beneficial effects of the invention are as follows: compared with the prior art, when some of the system model parameters are unknown the system does not need to be identified; instead, optimal control is realized online by the value-iteration-based adaptive control method.
Drawings
FIG. 1 is a diagram of the platform on which the designed method is implemented and operates;
FIG. 2 is a system input u variation curve based on value iterative adaptive control;
FIG. 3 is a diagram of system state change based on value iterative adaptive control;
FIG. 4 is a graph of the system state change under an arbitrarily given strategy when the system dynamics model is known;
FIG. 5 is a graph of the variation of the matrix P under value-iteration adaptive control;
FIG. 6 is a graph of performance index change based on value iterative adaptive control and any given policy K.
Detailed Description
In order to make the technical features, purposes and advantages of the present invention clearer, the technical solution of the present invention is further described below with reference to the accompanying drawings and practical experiments.
Referring to FIGS. 1-6, in the servo motor intelligent optimization control method based on value iteration, the target system is first initialized and suitable basis functions are selected; the system is then sampled, the state at the next moment is computed from the state at the current moment, and the optimal value function is calculated online; after the optimal value function is obtained, the strategy is updated with a greedy algorithm, the strategy is optimal when it converges and is then no longer updated, and optimal control of the system is thus realized.
The optimal control method of the servo motor based on value iteration comprises the following steps:
Step 1) establishing the motor system state-space equation:

dx(t)/dt = A x(t) + B u(t)   (1)

wherein A is the system state matrix, part of whose parameters are unknown; the information it carries is obtained only from the current state x(t) and the next-time state x(t+T); B is the input matrix, x(t) ∈ R^n, u(t) ∈ R^m, and (A, B) is controllable. The optimal control problem is to minimize the cost

V(x(t)) = ∫_t^∞ ( x^T(τ) Q x(τ) + u^T(τ) R u(τ) ) dτ   (2)

which, written over a finite sampling interval, becomes

V(x(t)) = ∫_t^{t+T} ( x^T Q x + u^T R u ) dτ + V(x(t+T))   (3)

Selecting Q = 1 and R = 1, with (A, B) controllable, the controller is determined by Bellman's principle of optimality as u = Kx, where K = -R^{-1} B^T P and P satisfies the algebraic Riccati equation:

A^T P + P A - P B R^{-1} B^T P + Q = 0   (4);
Step 2) initializing the system, comprising the following steps:

2.1) selecting the basis functions: for the continuous-time LQR problem the value is quadratic in the state, V(x) = x^T P x, so the basis functions of the critic neural network in equation (5) are selected as a quadratic polynomial vector of the state components, φ(x): R^n → R^L,

V(x) = W^T φ(x)   (5)

where n is the number of states, the basis vector contains L = n(n+1)/2 components, and the weight vector W is composed of the elements of the matrix P;

the system control is linear in the state, u = Kx, so the basis functions of the actor neural network in equation (6) are the state components themselves:

u_k = h(x_k) = U^T x_k   (6)

2.2) initializing the system: selecting an initial state x_0, calculating the initial values of the basis functions, and determining an initial strategy K_0;
Step 3) sampling the system and computing the optimal value function by the least-squares method, namely the strategy evaluation process. To obtain, under strategy K_i, the value function of each next step, V(x(t+T)) = x^T(t+T) P_i x(t+T), the calculation uses the parameter matrix P_i and the formula becomes:

wherein x̄(t) is the second-order polynomial basis vector formed from the Kronecker product of the state, with elements {x_i(t) x_j(t)}, i = 1..n, j = i..n, and v(·) acts on the n×n matrix as a vector-valued matrix function that stacks the elements of the symmetric matrix into a vector, the off-diagonal elements being summed as P_ij + P_ji, thereby obtaining a column vector;

in each iteration step the same control strategy K_i is applied; after enough state-trajectory points have been collected, the V-function parameter vector is solved by the least-squares method, thereby obtaining P_{i+1}; the minimum of the parameter vector P in the least-squares sense is found by minimizing the error between the objective functions, and N > n(n+1) points x_i in the state space are evaluated to obtain the least-squares solution:

wherein X = [ x̄(t) - x̄(t+T) ]^T   (9)

The states at the discrete times t and t+T are measured, together with the reward observed over the sampling interval, ∫_t^{t+T} ( x^T Q x + u_i^T R u_i ) dτ;
Step 4) updating the optimal parameters through a greedy algorithm according to the obtained optimal value function:

u_i = -R^{-1} B^T P_i x = K_i x   (12)

When the least-squares solution converges, the strategy is no longer updated and the optimal strategy is obtained.
In this embodiment, in step 1), a second-order motor system is considered as follows:
in the step 2), the self-adaptive control algorithm based on value iteration is used for the experiment, the matrix A is only used for obtaining sampling data, and the evaluation and the updating of the strategy in the control algorithm do not involve the use of the matrix A. x is the number of1Finger motor position, x2Refers to the motor speed. Initializing the system and taking an initial state x0=[4 0.4]TSelecting basis functions
In step 3), a policy is arbitrarily given and policy evaluation is performed on the system: with the initial policy K_0 = [0 0]^T, the sampling period is taken as T = 0.05 s; the system is sampled over each finite interval [t, t+T], the next state x(t+T) is obtained from the current state x(t), and the value function is calculated by the least-squares method.
In step 4), strategy improvement is performed on the system: after strategy evaluation, the optimal value function is obtained and the strategy is updated with a greedy algorithm; when the strategy no longer changes with time, the optimal strategy has been obtained. The control action is continuous-time control, with the gain updated at the sampling instants; its variation is shown in FIG. 2.
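An illustrative usage of the sketches above with this embodiment's settings (x_0 = [4 0.4]^T, T = 0.05 s, K_0 = [0 0]); the second-order plant matrices below are assumed placeholders rather than the embodiment's actual motor parameters:

```python
# Illustrative sketch only: placeholder second-order position/speed dynamics,
# used with value_iteration_control from the earlier sketch.
import numpy as np

A = np.array([[0.0, 1.0],
              [0.0, -5.0]])        # assumed dynamics, not the patent's motor values
B = np.array([[0.0], [1.0]])

K, P = value_iteration_control(A, B, x0=[4.0, 0.4], T=0.05)
print("learned gain K =", K)
print("learned kernel P =", P)
```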
The experimental results in FIG. 3 show that after the policy is updated 3 times the policy converges, and the system state finally converges to 0. FIG. 4 illustrates the final value of P obtained by value iteration; the control effect meets the expected requirement.
Compared with the case in which the dynamics model is known and an arbitrary strategy K = [3 0.3] is given, FIGS. 5 and 6 illustrate that the state convergence of the proposed method is smoother and faster and that no excessive overshoot occurs in the process, as can be seen from the performance index in FIG. 6; the method obtains the optimal performance index better and faster.
The invention provides a servo motor intelligent optimization control method based on value iteration, which uses a value-iteration adaptive control method to solve the optimal control problem of the system online through the two steps of strategy evaluation and strategy improvement. Compared with the prior art, its practicality lies in that the system model parameters do not need to be identified; the required system information can be acquired from the system trajectory data, from which the optimal control strategy is obtained.
The technical solution of the present invention has been described in detail above with reference to the accompanying drawings but is not limited thereto; various changes and modifications can be made within the knowledge of those skilled in the art based on the concept of the present invention.
Claims (1)
1. An intelligent optimization control method for a servo motor based on value iteration is characterized by comprising the following steps:
Step 1) establishing the motor system state-space equation:

dx(t)/dt = A x(t) + B u(t)   (1)

wherein A is the system state matrix, part of whose parameters are unknown; the information it carries is obtained only from the current state x(t) and the next-time state x(t+T); B is the input matrix, x(t) ∈ R^n, u(t) ∈ R^m, and (A, B) is controllable. The optimal control problem is to minimize the cost

V(x(t)) = ∫_t^∞ ( x^T(τ) Q x(τ) + u^T(τ) R u(τ) ) dτ   (2)

which, written over a finite sampling interval, becomes

V(x(t)) = ∫_t^{t+T} ( x^T Q x + u^T R u ) dτ + V(x(t+T))   (3)

Selecting Q = 1 and R = 1, with (A, B) controllable, the controller is determined by Bellman's principle of optimality as u = Kx, where K = -R^{-1} B^T P and P satisfies the algebraic Riccati equation:

A^T P + P A - P B R^{-1} B^T P + Q = 0   (4);
Step 2) initializing the system, comprising the following steps:

2.1) selecting the basis functions: for the continuous-time LQR problem the value is quadratic in the state, V(x) = x^T P x, so the basis functions of the critic neural network in equation (5) are selected as a quadratic polynomial vector of the state components, φ(x): R^n → R^L,

V(x) = W^T φ(x)   (5)

where n is the number of states, the basis vector contains L = n(n+1)/2 components, and the weight vector W is composed of the elements of the matrix P;

the system control is linear in the state, u = Kx, so the basis functions of the actor neural network in equation (6) are the state components themselves:

u_k = h(x_k) = U^T x_k   (6)

2.2) initializing the system: selecting an initial state x_0, calculating the initial values of the basis functions, and determining an initial strategy K_0;
Step 3) sampling the system and computing the optimal value function by the least-squares method, namely the strategy evaluation process. To obtain, under strategy K_i, the value function of each next step, V(x(t+T)) = x^T(t+T) P_i x(t+T), the calculation uses the parameter matrix P_i and the formula becomes:

wherein x̄(t) is the second-order polynomial basis vector formed from the Kronecker product of the state, with elements {x_i(t) x_j(t)}, i = 1..n, j = i..n, and v(·) acts on the n×n matrix as a vector-valued matrix function that stacks the elements of the symmetric matrix into a vector, the off-diagonal elements being summed as P_ij + P_ji, thereby obtaining a column vector;

in each iteration step the same control strategy K_i is applied; after enough state-trajectory points have been collected, the V-function parameter vector is solved by the least-squares method, thereby obtaining P_{i+1}; the minimum of the parameter vector P in the least-squares sense is found by minimizing the error between the objective functions, and N > n(n+1) points x_i in the state space are evaluated to obtain the least-squares solution:

wherein X = [ x̄(t) - x̄(t+T) ]^T   (9)

The states at the discrete times t and t+T are measured, together with the reward observed over the sampling interval, ∫_t^{t+T} ( x^T Q x + u_i^T R u_i ) dτ;
Step 4) updating the optimal parameters through a greedy algorithm according to the obtained optimal value function:

u_i = -R^{-1} B^T P_i x = K_i x   (12)

When the least-squares solution converges, the strategy is no longer updated and the optimal strategy is obtained.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110148138.6A CN112947078A (en) | 2021-02-03 | 2021-02-03 | Servo motor intelligent optimization control method based on value iteration |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110148138.6A CN112947078A (en) | 2021-02-03 | 2021-02-03 | Servo motor intelligent optimization control method based on value iteration |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112947078A (en) | 2021-06-11
Family
ID=76242039
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110148138.6A Pending CN112947078A (en) | 2021-02-03 | 2021-02-03 | Servo motor intelligent optimization control method based on value iteration |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112947078A (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108181816A (en) * | 2018-01-05 | 2018-06-19 | 南京航空航天大学 | A kind of synchronization policy update method for optimally controlling based on online data |
CN110262235A (en) * | 2019-06-18 | 2019-09-20 | 北京理工大学 | A kind of model-free optimal switching method of switching system |
CN110782011A (en) * | 2019-10-21 | 2020-02-11 | 辽宁石油化工大学 | Networked multi-agent system distributed optimization control method based on reinforcement learning |
CN111740658A (en) * | 2020-04-29 | 2020-10-02 | 南京理工大学 | Optimal regulation control method of motor system based on strategy iteration |
CN111722531A (en) * | 2020-05-12 | 2020-09-29 | 天津大学 | Online model-free optimal control method for switching linear system |
CN112149361A (en) * | 2020-10-10 | 2020-12-29 | 中国科学技术大学 | Adaptive optimal control method and device for linear system |
CN112180730A (en) * | 2020-10-10 | 2021-01-05 | 中国科学技术大学 | Hierarchical optimal consistency control method and device for multi-agent system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113498523B (en) | Apparatus and method for controlling operation of machine object and storage medium | |
US20220326664A1 (en) | Improved machine learning for technical systems | |
JP3556956B2 (en) | System identification device and method | |
Rubies-Royo et al. | A classification-based approach for approximate reachability | |
CN105955206A (en) | Multi-shaft motion control method based on data driving and parameter mixing optimization | |
CN113874865A (en) | Method and device for determining model parameters of a control strategy of a technical system by means of a Bayesian optimization method | |
Solgi et al. | Variable structure fuzzy wavelet neural network controller for complex nonlinear systems | |
Balakrishna et al. | On-policy robot imitation learning from a converging supervisor | |
CN108972546A (en) | A kind of robot constant force curved surface tracking method based on intensified learning | |
Precup et al. | A survey on fuzzy control for mechatronics applications | |
Dong et al. | Friction modeling and compensation for haptic master manipulator based on deep Gaussian process | |
KR102382047B1 (en) | Automatic learning tuning system of motor controller using PSO | |
CN114063438B (en) | Data-driven multi-agent system PID control protocol self-learning method | |
Ma et al. | Active manipulation of elastic rods using optimization-based shape perception and sensorimotor model approximation | |
Hsu et al. | Recurrent fuzzy-neural approach for nonlinear control using dynamic structure learning scheme | |
Hager et al. | Adaptive Neural network control of a helicopter system with optimal observer and actor-critic design | |
Pham et al. | Fault-tolerant control for robotic systems using a wavelet type-2 fuzzy brain emotional learning controller and a topsis-based self-organizing algorithm | |
US20220067504A1 (en) | Training actor-critic algorithms in laboratory settings | |
Ennen et al. | Learning robust manipulation skills with guided policy search via generative motor reflexes | |
CN116382093A (en) | Optimal control method and equipment for nonlinear system with unknown model | |
CN112947078A (en) | Servo motor intelligent optimization control method based on value iteration | |
Toner et al. | Probabilistically safe mobile manipulation in an unmodeled environment with automated feedback tuning | |
Wang et al. | A learning-based tune-free control framework for large scale autonomous driving system deployment | |
CN117321511A (en) | Robust adaptive dynamic pattern decomposition for modeling, prediction and control of high-dimensional physical systems | |
Gukov et al. | Real-time Multi-Objective Trajectory Optimization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | | |
SE01 | Entry into force of request for substantive examination | | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210611 | |