NL2027701B1 - Point-to-point tracking control method for multi-agent trajectory-updating iterative learning - Google Patents

Point-to-point tracking control method for multi-agent trajectory-updating iterative learning

Info

Publication number
NL2027701B1
NL2027701B1
Authority
NL
Netherlands
Prior art keywords
point
trajectory
agent
updating
agents
Prior art date
Application number
NL2027701A
Other languages
Dutch (nl)
Other versions
NL2027701A (en)
Inventor
Liu Chenglin
Luo Yujuan
Original Assignee
Univ Jiangnan
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Univ Jiangnan filed Critical Univ Jiangnan
Publication of NL2027701A
Application granted
Publication of NL2027701B1

Links

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0221Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/0265Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0287Control of position or course in two dimensions specially adapted to land vehicles involving a plurality of land vehicles, e.g. fleet or convoy travelling
    • G05D1/0291Fleet control
    • G05D1/0295Fleet control by at least one leading vehicle of the fleet
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00Program-control systems
    • G05B2219/30Nc systems
    • G05B2219/33Director till display
    • G05B2219/33051BBC behavior based control, stand alone module, cognitive, independent agent
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00Program-control systems
    • G05B2219/30Nc systems
    • G05B2219/39Robotics, robotics to robotics hand
    • G05B2219/39219Trajectory tracking
    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00Program-control systems
    • G05B2219/30Nc systems
    • G05B2219/42Servomotor, servo controller kind till VSS
    • G05B2219/42342Path, trajectory tracking control

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Medicines That Contain Protein Lipid Enzymes And Other Medicines (AREA)
  • Feedback Control In General (AREA)

Abstract

The present invention discloses a point-to-point tracking control method for multi-agent trajectory-updating iterative learning, belonging to the field of control technology. The method includes: first constructing a mathematical model of a discrete heterogeneous multi-agent system, where an expected position point is considered as being generated by a virtual leader, and determining a spanning tree structure with the leader as a root node according to a communication topology structure of the multi-agent system; next, designing a target-trajectory updating method according to an expected point, and updating a target trajectory to enable a new target trajectory to converge to a system output; and finally, designing a P-type iterative learning method based on target-trajectory updating for follower agents, to implement complete tracking of the expected point by the multi-agent system. By means of the foregoing method, the present invention resolves a point-to-point tracking control problem in a heterogeneous multi-agent system, and the system output tracks the updated target trajectory faster than it tracks a fixed target trajectory, so that the agents complete the tracking of an expected point.

Description

Point-to-point tracking control method for multi-agent trajectory-updating iterative learning
FIELD OF THE INVENTION
The present invention relates to an iterative learning method based on target-trajectory updating to resolve a point-to-point tracking control problem in a heterogeneous multi-agent system, belonging to the field of control technology.
DESCRIPTION OF THE RELATED ART
In recent decades, with the continuous development of artificial intelligence and industrial technologies, many large-scale control systems with complex structures have emerged, in which a plurality of subsystems need to communicate and cooperate with each other to complete macro tasks. Coordination and cooperation among agents significantly improve the intelligence of individual behavior and accomplish much work that cannot be completed by single individuals. So far, multi-agent coordination control technologies have been widely applied in fields such as sensor networks, robotics, and traffic signal control. In actual industrial production, many controlled systems perform repeated movement tasks within a limited range, for example, a servo system whose instruction signal is a periodic function, a satellite that makes a periodic movement around the earth, and a robotic arm that completes repetitive tasks such as welding and transport on an assembly line. In consideration of the wear and aging generated during equipment operation, it is generally very difficult to obtain an accurate system model for a controlled system. For this type of multi-agent system that performs repeated movement tasks within a limited range, the system output needs to achieve zero-error tracking of an expected trajectory over the entire operating range. To achieve such accurate tracking in a multi-agent system with a repeated movement property, the concept of iterative learning is introduced into the consensus tracking control problem of the multi-agent system. Research on consensus in iterative learning-based multi-agent systems usually requires that the system output achieve full trajectory tracking over the entire operating range. However, in automated coordination and control of production, the system output often only needs to track an expected position point at a specific time point. For example, when a robotic arm grabs and places a workpiece, only the outputs at the grabbing and placing time points need to be considered, and the outputs at other time points need no additional consideration. For some complex process procedures, due to equipment limitations, not all data can be detected; tracking all data points is difficult, and only some detectable position points can be tracked. Therefore, the tracking control of specific points is of significant research value.
At present, the research of point-to-point tracking control has drawn the attention of some scholars. A conventional method for implementing point-to-point tracking control designs an arbitrary trajectory passing through the expected position points, so that the point-to-point tracking control problem is converted into a full trajectory tracking control problem for a fixed target trajectory. Full trajectory tracking of a fixed target trajectory is a relatively simple way to resolve a point-to-point tracking control problem, but its tracking performance depends on the choice of the fixed target trajectory passing through the expected position points, and selecting an optimal fixed target trajectory requires particular a priori knowledge. This somewhat limits the solution of the point-to-point tracking control problem. In addition, this method cannot fully utilize the degrees of freedom at the other time points. To remedy these deficiencies of fixed-trajectory point-to-point tracking control, some scholars proposed control methods based on target-trajectory updating. Son T D, Ahn H S, and Moore K L (Iterative learning control in optimal tracking problems with specified data points. Automatica, 2013) used the tracking error between the target trajectory of the previous iteration and the system output trajectory to obtain the target trajectory of the current iteration, thereby establishing a target-trajectory update function. AN Tongjian and LIU Xiangguan (Robust iterative learning control of target-trajectory updating point-to-point. Journal of Zhejiang University, 2015) used an interpolation method to propose an iterative learning method based on target-trajectory updating for a point-to-point tracking problem with initial disturbance, and concluded that the algorithm has better tracking performance than a fixed-trajectory point-to-point tracking control algorithm. TAO Hongfeng, DONG Xiaoqi, and YANG Huizhong (Optimization and application of a point-to-point iterative learning control algorithm with reference trajectory updating. Control Theory & Applications, 2016) introduced norm optimization into a target-trajectory updating iterative learning algorithm to improve its tracking precision and speed, and analyzed the convergence and robustness of the system under no disturbance and under non-repetitive disturbance. So far, however, this research has addressed point-to-point tracking control of a single system. For a multi-agent system formed by a plurality of agents that collaborate and cooperate with each other, how to use an iterative learning method to resolve the point-to-point tracking control problem is one of the difficulties in the current control field.
SUMMARY OF THE INVENTION
An objective of the present invention is to provide an iterative learning method based on target-trajectory updating to resolve a point-to-point tracking control problem in a heterogeneous multi-agent system. The technical solution for achieving the objective of the present invention is as follows: a multi-agent trajectory-updating iterative-learning point-to-point tracking control method includes the following steps:
step 1. constructing a model of a discrete heterogeneous multi-agent system;
step 2. analyzing an information exchange relationship among agents in the discrete heterogeneous multi-agent system, and constructing a communication topology structure of the multi-agent system by using a directed graph, where only one or more follower agents are capable of acquiring leader information, and a communication topology diagram formed by a leader and followers includes one spanning tree with the leader as a root node;
step 3. giving an initial state condition of all follower agents;
step 4. designing a target-trajectory updating method according to an expected position point, solving parameters of the target-trajectory updating method, and updating a target trajectory to enable a new target trajectory to asymptotically converge to a system output; and
step 5. designing a P-type iterative learning method based on target-trajectory updating for the follower agents, and solving parameters of the P-type iterative learning method, to implement complete tracking of the expected position point within a limited time in the multi-agent system.
Compared with the prior art, the obvious advantages of the present invention lie in that a point-to-point tracking control problem in a heterogeneous multi-agent system is resolved, and the updated target trajectory is closer to the system output than a fixed target trajectory; that is, the system output converges to the new target trajectory faster than to a fixed target trajectory, so that the agents can complete the tracking of the given expected points, and the control better conforms to practical applications.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a structural diagram of a network topology according to the present invention.
FIG. 2 shows the tracking process at the 10th iteration under the communication topology of FIG. 1 according to the present invention.
FIG. 3 shows the tracking process at the 80th iteration under the communication topology of FIG. 1 according to the present invention.
FIG. 4 is an error convergence diagram under the communication topology of FIG. 1 according to the present invention.
FIG. 5 shows the tracking process at the 10th iteration under the communication topology of FIG. 1 for the iterative learning method based on a fixed target trajectory.
FIG. 6 shows the tracking process at the 100th iteration under the communication topology of FIG. 1 for the iterative learning method based on a fixed target trajectory.
FIG. 7 is an error convergence diagram under the communication topology of FIG. 1 for the iterative learning method based on a fixed target trajectory.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
The solution of the present invention is further described below with reference to the accompanying drawings and specific embodiments.
The present invention provides an iterative learning method based on target-trajectory updating to resolve a point-to-point tracking problem in a multi-agent system, including the following steps: Step 1. Construct a mathematical model of a discrete heterogeneous multi-agent system.
A model of a discrete heterogeneous multi-agent system formed by $n$ different agents is:
$$x_{i,k}(t+1) = A_i x_{i,k}(t) + B_i u_{i,k}(t), \quad y_{i,k}(t) = C_i x_{i,k}(t), \quad i = 1,2,\cdots,n \quad (1)$$
where $k$ denotes the number of iterations, $i$ represents the $i$-th agent, $i = 1,2,\cdots,n$, and $t \in [0,N]$ is a sampling time point within one period; $x_{i,k}(t) \in R^{p_i}$, $u_{i,k}(t) \in R^{q_i}$, and $y_{i,k}(t) \in R^{m}$ respectively denote the state, the control input, and the system output of agent $i$; and $A_i \in R^{p_i \times p_i}$, $B_i \in R^{p_i \times q_i}$, and $C_i \in R^{m \times p_i}$ are matrices of corresponding dimensions.
It is defined that $x_k(t) = [x_{1,k}^T(t), x_{2,k}^T(t), \cdots, x_{n,k}^T(t)]^T$, $u_k(t) = [u_{1,k}^T(t), u_{2,k}^T(t), \cdots, u_{n,k}^T(t)]^T$, and $y_k(t) = [y_{1,k}^T(t), y_{2,k}^T(t), \cdots, y_{n,k}^T(t)]^T$, so that the system (1) is written in the compact matrix form:
$$x_k(t+1) = A x_k(t) + B u_k(t), \quad y_k(t) = C x_k(t) \quad (2)$$
where $A = \mathrm{diag}\{A_1, A_2, \cdots, A_n\}$, $B = \mathrm{diag}\{B_1, B_2, \cdots, B_n\}$, and $C = \mathrm{diag}\{C_1, C_2, \cdots, C_n\}$. The system (2) is converted into an input-output matrix model based on the time sequence:
$$y_k = G u_k + Q x_k(0) \quad (3)$$
where $y_k = [y_k^T(0), y_k^T(1), \cdots, y_k^T(N)]^T$, $u_k = [u_k^T(0), u_k^T(1), \cdots, u_k^T(N)]^T$,
$$G = \begin{bmatrix} 0 & 0 & 0 & \cdots & 0 \\ CB & 0 & 0 & \cdots & 0 \\ CAB & CB & 0 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ CA^{N-1}B & CA^{N-2}B & CA^{N-3}B & \cdots & 0 \end{bmatrix}, \quad Q = \begin{bmatrix} C \\ CA \\ CA^2 \\ \vdots \\ CA^N \end{bmatrix}$$
For a conventional iterative-learning control algorithm, the control target is usually full trajectory tracking of a fixed trajectory $y_d(t)$: as the iterations continue, the system output keeps approaching the fixed trajectory, that is, $y_{i,k}(t) \to y_d(t)$, $t \in \{0,1,2,\cdots,N\}$. In actual engineering, however, it is in most cases only necessary to implement tracking at the time points $T = \{t_1, t_2, \cdots, t_M\}$ that require tracking.
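As an illustration of Formula (3), the following Python sketch (not part of the patent; the function name and array layout are the author's assumptions) builds the lifted matrices $G$ and $Q$ from the compact matrices $A$, $B$, $C$ and the horizon $N$:

```python
import numpy as np

def lifted_system(A, B, C, N):
    """Build G and Q of Formula (3): y_k = G u_k + Q x_k(0), t = 0..N."""
    m, p = C.shape
    q = B.shape[1]
    G = np.zeros(((N + 1) * m, (N + 1) * q))
    Q = np.zeros(((N + 1) * m, p))
    A_pow = np.eye(p)
    for t in range(N + 1):
        Q[t * m:(t + 1) * m, :] = C @ A_pow        # block row C A^t
        A_pow = A_pow @ A
    for row in range(1, N + 1):                    # y(row) depends on u(0..row-1)
        for col in range(row):
            G[row * m:(row + 1) * m, col * q:(col + 1) * q] = \
                C @ np.linalg.matrix_power(A, row - 1 - col) @ B
    return G, Q
```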
Therefore, in the present invention, an iterative-learning control algorithm based on target-trajectory updating is used to implement the tracking of the expected position points in the multi-agent system, that is, $y_{i,k}(t_s) \to y_d(t_s)$, $s = 1,2,\cdots,M$, with $0 \le t_1 < t_2 < \cdots < t_M \le N$, where $y_d(t_s)$ is the expected position point.
Based on a leader-follower communication structure, the expected position points $y_d(t_s)$, $s = 1,2,\cdots,M$, are regarded as being generated by a virtual leader, the $n$ agents in the system (1) are considered followers, and only some follower agents can directly acquire the leader information.
The main work content of the present invention is that for the multi-agent system (1) in which only some follower agents can directly acquire information of the expected position point, in a fixed communication topology, an appropriate learning method is designed to implement complete tracking of the expected position point within a limited time in the multi-agent system (1). Step 2. Analyze an information exchange relationship among agents in the multi-agent system, construct a communication topology structure of the multi-agent system by using a directed graph, and determine a directed spanning tree structure with the leader as a root node according to the communication topology structure of the multi-agent system.
The directed graph $G = (V, E, A)$ is used to denote the topology structure of the multi-agent system, where the node set $V = \{1,2,\cdots,n\}$ of the graph $G$ corresponds to the $n$ agents, the edge set $E \subseteq V \times V$ of the graph $G$ corresponds to information exchange and transfer among the agents, the weight of an edge is $a_{ij} \ge 0$ with $a_{ii} = 0$ $(i,j \in V)$, and the matrix $A = [a_{ij}] \in R^{n \times n}$ is the weighted adjacency matrix. If a node $j$ is capable of obtaining information from a node $i$ in the directed graph, the connecting edge is denoted by $e_{ij} = (i,j) \in E$; if $e_{ij} \in E$, the corresponding element of the weighted adjacency matrix is $a_{ij} > 0$, and otherwise the element is 0, with $a_{ii} = 0$ for all $i \in V$. The neighbor set of agent $i$ is $N_i = \{j \in V : (i,j) \in E\}$, and the Laplacian matrix of the graph $G$ is $L = D - A = [l_{ij}] \in R^{n \times n}$, where the matrix $D = \mathrm{diag}\{\sum_{j=1}^{n} a_{ij},\ i = 1,\cdots,n\}$ is the degree matrix of the graph $G$, so that $l_{ii} = \sum_{j \in N_i} a_{ij}$ and $l_{ij} = -a_{ij}$ for $i \ne j$. In the directed graph $G$, a directed path from a node $i_1$ to a node $i_s$ is an ordered sequence $(i_1,i_2), \cdots, (i_{s-1},i_s)$ of a series of edges.
If one node $i$ has a directed path to all other nodes in the directed graph $G$, the node $i$ serves as a root node, and if the graph $G$ has a root node, the directed graph has a spanning tree. In the present invention, a leader-follower coordination control structure is used to study the multi-agent consensus tracking problem. After a leader is added, the $n$ follower agents and the leader form the graph $\bar{G} = \{0\} \cup G$. Information transfer between agent $i$ and the leader is denoted by $s_i$: $s_i > 0$ denotes that the agent is connected to the leader, and $s_i = 0$ denotes that it is not.
In the directed graph $\bar{G}$, if there is a directed spanning tree with the leader as the root node, the leader has a directed path to all the follower agents.
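The following minimal sketch (not from the patent; the adjacency convention and names are assumptions) builds $D$ and $L = D - A$ from a weighted adjacency matrix and checks leader-to-follower reachability, that is, the spanning-tree condition:

```python
import numpy as np

def laplacian(Adj):
    """L = D - A, with D the degree matrix d_ii = sum_j a_ij."""
    return np.diag(Adj.sum(axis=1)) - Adj

def leader_spans(Adj, s):
    """True if the virtual leader reaches every follower.

    Assumes Adj[j, i] > 0 means agent j obtains information from agent i,
    and s[i] > 0 means agent i obtains information from the leader.
    """
    n = Adj.shape[0]
    reached = {i for i in range(n) if s[i] > 0}
    frontier = set(reached)
    while frontier:
        frontier = {j for i in frontier for j in range(n)
                    if Adj[j, i] > 0 and j not in reached}
        reached |= frontier
    return len(reached) == n
```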
Step 3. Give an initial state condition of all follower agents. The initial state reset condition of all the follower agents is: $x_{i,k}(0) = 0$.
Step 4. Design a target-trajectory updating method according to an expected position point, solve parameters of the target-trajectory updating method, and update a target trajectory to enable a new target trajectory to asymptotically converge to a system output.
The tracking of a fixed trajectory by an iterative-learning control algorithm usually requires that, as the number of iterations increases, the system output $y_{i,k}(t)$ asymptotically converge to the fixed trajectory $y_d(t)$, that is,
$$\|y_d - y_{k+1}\| \le \|y_d - y_k\| \quad (4)$$
The target-trajectory updating algorithm proposed in the present invention instead makes a new target trajectory $r_{i,k}(t)$ asymptotically converge to the system output $y_{i,k}(t)$, that is,
$$\|r_{k+1} - y_k\| \le \|r_k - y_k\| \quad (5)$$
First, the target-trajectory updating algorithm is defined as:
$$r_{i,k+1}(t) = y_d(t) + h(t) f_{i,k}(t) \quad (6)$$
where $r_{i,k+1}(t)$ is the target trajectory of the $i$-th agent obtained after the learning and update of the $k$-th iteration, $y_d(t)$ is any trajectory passing through the expected position points $y_d(t_s)$, $h(t) = (t - t_1)(t - t_2) \cdots (t - t_M)$, and $f_{i,k}(t)$ is any discrete function. Let $r_{k+1}(t) = [r_{1,k+1}(t), r_{2,k+1}(t), \cdots, r_{n,k+1}(t)]^T$, $f_k(t) = [f_{1,k}(t), f_{2,k}(t), \cdots, f_{n,k}(t)]^T$, $H(t) = \mathrm{diag}\{h(t), h(t), \cdots, h(t)\}$, and $Y_d(t) = [y_d(t), y_d(t), \cdots, y_d(t)]^T$. Formula (6) is converted into:
$$r_{k+1}(t) = Y_d(t) + H(t) f_k(t) \quad (7)$$
Formula (7) is then written in a time sequence-based form:
$$r_{k+1} = Y_d + H f_k \quad (8)$$
where $r_{k+1} = [r_{k+1}^T(0), r_{k+1}^T(1), \cdots, r_{k+1}^T(N)]^T$, $Y_d = [Y_d^T(0), Y_d^T(1), \cdots, Y_d^T(N)]^T$, $H = \mathrm{diag}\{H(0), H(1), \cdots, H(N)\}$, and $f_k = [f_k^T(0), f_k^T(1), \cdots, f_k^T(N)]^T$.
Because point-to-point tracking requires that the value of the target trajectory at the time points $T = \{t_1, t_2, \cdots, t_M\}$ that require tracking be kept consistent with the given expected points in each update, that is, $r_{i,k}(t_s) = y_d(t_s)$, Formula (8) may further be converted into a target trajectory at any sampling point:
$$r_{k+1} = r_k + H f_k \quad (9)$$
Let $f_k = F(r_k - y_k)$, where $F$ is a real diagonal matrix; Formula (9) is then denoted as:
$$r_{k+1} = r_k + H F (r_k - y_k) \quad (10)$$
Let $\Lambda_k = HF$. Because the matrix $H$ and the matrix $F$ are both diagonal matrices, $\Lambda_k$ is also a real diagonal matrix, with
$$\Lambda_k = \mathrm{diag}\{\Lambda_k(0), \Lambda_k(1), \cdots, \Lambda_k(N)\}, \quad \Lambda_k(t) = \mathrm{diag}\{\lambda_{1,k}(t), \lambda_{2,k}(t), \cdots, \lambda_{n,k}(t)\}$$
and the target-trajectory updating algorithm (10) turns into:
$$r_{k+1} = r_k + \Lambda_k (r_k - y_k) \quad (11)$$
As can be learned from Formula (11):
$$r_{k+1} - y_k = r_k - y_k + \Lambda_k (r_k - y_k) = (I + \Lambda_k)(r_k - y_k) \quad (12)$$
Taking norms on both sides of Formula (12):
$$\|r_{k+1} - y_k\| \le \|I + \Lambda_k\| \, \|r_k - y_k\| \quad (13)$$
Therefore, when $\|I + \Lambda_k\| < 1$, it is obtained that $\|r_{k+1} - y_k\| < \|r_k - y_k\|$. In a point-to-point tracking control problem based on target-trajectory updating, the value of the target trajectory at the time points $T = \{t_1, t_2, \cdots, t_M\}$ that require tracking is fixed and kept consistent with the expected points, that is:
$$r_{i,k}(t_s) = y_d(t_s), \quad s = 1,2,\cdots,M \quad (14)$$
Therefore, it is obtained that:
$$r_{i,k+1}(t_s) = r_{i,k}(t_s) \quad (15)$$
As can be learned from Formula (11), when $\lambda_{i,k}(t_s) = 0$ $(s = 1,2,\cdots,M)$ is satisfied at the time points $T = \{t_1, t_2, \cdots, t_M\}$ that require tracking and $r_{i,1}(t_s) = y_d(t_s)$ is satisfied, Formula (15) holds.
Therefore, if $\|I + \Lambda_k\| = 1$ is satisfied and $\lambda_{i,k}(t_s) = 0$ $(s = 1,2,\cdots,M)$, it is obtained that $\|r_{k+1} - y_k\| \le \|r_k - y_k\|$.
As can be seen from Formula (5), as the number of iterations increases, the updated target trajectory is closer to the system output than a fixed target trajectory; that is, the system output converges to the new target trajectory faster than to the fixed target trajectory. A point-to-point tracking control algorithm based on target-trajectory updating therefore enables the system to track the expected points faster and achieve a better tracking effect, remedying the deficiency of the fixed-target-trajectory point-to-point tracking control algorithm.
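A minimal sketch of the update law (11) for one stacked trajectory follows (illustrative, not from the patent); the entries of $\Lambda_k$ are assumed to lie in $(-2, 0)$ off the tracked points, so $|1 + \lambda| < 1$ there, and are forced to zero at the tracked points so the expected values never move:

```python
import numpy as np

def update_target(r, y, lam, tracked_idx):
    """r_{k+1} = r_k + Lambda_k (r_k - y_k), Formula (11).

    lam: diagonal of Lambda_k, entries in (-2, 0) away from tracked points.
    """
    lam = lam.copy()
    lam[tracked_idx] = 0.0          # lambda(t_s) = 0 keeps r(t_s) = y_d(t_s)
    return r + lam * (r - y)
```

With this choice $\|I + \Lambda_k\|_\infty = 1$ and the tracked entries of $r$ are fixed points of the update, matching Formulas (14) and (15).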
Step 5. Design a P-type iterative learning method based on target-trajectory updating for the follower agents, and solve parameters of the P-type iterative learning method, to implement complete tracking of the expected position point within a limited time in the multi-agent system.
First, the tracking error of each agent is given as:
$$e_{i,k}(t) = r_{i,k}(t) - y_{i,k}(t) \quad (16)$$
$$e_{ij,k}(t) = y_{j,k}(t) - y_{i,k}(t), \quad j \in N_i \quad (17)$$
where $e_{i,k}(t)$ represents the error between the output of agent $i$ during the $k$-th iteration and the target trajectory obtained after iterative updating, and $e_{ij,k}(t)$ denotes the error between the agent and its neighbor agents in the $k$-th iteration.
Let $\xi_{i,k}(t)$ denote the information received or measured by agent $i$ during the $k$-th iteration; it is obtained that:
$$\xi_{i,k}(t) = \sum_{j \in N_i} a_{ij} e_{ij,k}(t) + s_i e_{i,k}(t) \quad (18)$$
where $a_{ij}$ is the weight of an edge, and $s_i$ is the coupling weight between agent $i$ and the leader.
Because $e_{ij,k}(t) = e_{i,k}(t) - e_{j,k}(t)$, Formula (18) is converted into:
$$\xi_{i,k}(t) = \sum_{j \in N_i} a_{ij} \left( e_{i,k}(t) - e_{j,k}(t) \right) + s_i e_{i,k}(t) \quad (19)$$
It is defined that $e_k(t) = [e_{1,k}^T(t), e_{2,k}^T(t), \cdots, e_{n,k}^T(t)]^T$ and $\xi_k(t) = [\xi_{1,k}^T(t), \xi_{2,k}^T(t), \cdots, \xi_{n,k}^T(t)]^T$; by using graph theory, Formula (19) is written as:
$$\xi_k(t) = \left( (L + S) \otimes I_m \right) e_k(t) \quad (20)$$
where $S = \mathrm{diag}\{s_1, s_2, \cdots, s_n\}$, $L$ is the Laplacian matrix of $G$, and $I_m$ denotes the $m \times m$-dimensional identity matrix.
Formula (20) is also written in a time sequence-based form, that is:
$$\xi_k = M e_k \quad (21)$$
where $e_k = [e_k^T(0), e_k^T(1), \cdots, e_k^T(N)]^T$, $\xi_k = [\xi_k^T(0), \xi_k^T(1), \cdots, \xi_k^T(N)]^T$, and $M = \mathrm{diag}\{(L + S) \otimes I_m\}_{N \times N}$. In the present invention, the P-type iterative learning method is used for each follower agent to resolve the tracking control problem of the expected points in the multi-agent system, and the iterative learning law is as follows:
$$u_{i,k+1}(t) = u_{i,k}(t) + \Gamma_i \xi_{i,k+1}(t) \quad (22)$$
where $\Gamma_i \in R^{q_i \times m}$ is a learning gain.
Let $u_k(t) = [u_{1,k}^T(t), u_{2,k}^T(t), \cdots, u_{n,k}^T(t)]^T$ and $\xi_k(t) = [\xi_{1,k}^T(t), \xi_{2,k}^T(t), \cdots, \xi_{n,k}^T(t)]^T$; Formula (22) is converted into:
$$u_{k+1}(t) = u_k(t) + \hat{\Gamma} \xi_{k+1}(t) \quad (23)$$
where $\hat{\Gamma} = \mathrm{diag}\{\Gamma_1, \Gamma_2, \cdots, \Gamma_n\}$. Next, let $\xi_k = [\xi_k^T(0), \xi_k^T(1), \cdots, \xi_k^T(N)]^T$ and $u_k = [u_k^T(0), u_k^T(1), \cdots, u_k^T(N)]^T$; Formula (23) is converted into:
$$u_{k+1} = u_k + \Gamma \xi_{k+1} \quad (24)$$
where $\Gamma = \mathrm{diag}\{\hat{\Gamma}\}_{N \times N}$.
The iterative learning law is obtained by substituting Formula (21) into Formula (24):
$$u_{k+1} = u_k + \Gamma M e_{k+1} \quad (25)$$
An iterative learning method based on target-trajectory updating is obtained by combining Formula (11) and Formula (25):
$$\begin{cases} u_{k+1} = u_k + \Gamma M e_{k+1} \\ r_{k+1} = r_k + \Lambda_k (r_k - y_k) \end{cases} \quad (26)$$
When $\Lambda_k = 0$, Formula (26) turns into:
$$\begin{cases} u_{k+1} = u_k + \Gamma M e_{k+1} \\ r_{k+1} = r_k \end{cases} \quad (27)$$
In this case, the target trajectory is not iteratively updated.
Therefore, Formula (27) is an iterative learning method based on a fixed target trajectory.
As can be seen, Formula (27) is a special form of Formula (26).
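As an illustration of one pass of Formula (26) (a sketch under the author's naming assumptions, not the patent's implementation), the current-iteration error $e_{k+1}$ can be computed from the closed form derived below as Formula (34), after which the input and target updates are explicit:

```python
import numpy as np

def ilc_iteration(u, r, G, Q, x0, GammaM, Lam):
    """One pass of the trajectory-updating P-type law, Formula (26).

    GammaM is the product Gamma @ M; Lam is the diagonal matrix Lambda_k.
    """
    y = G @ u + Q @ x0                                       # Formula (3)
    e = r - y                                                # Formula (28)
    r_next = r + Lam @ (r - y)                               # Formula (11)
    I = np.eye(G.shape[0])
    e_next = np.linalg.solve(I + G @ GammaM, (I + Lam) @ e)  # Formula (34), below
    u_next = u + GammaM @ e_next                             # Formula (25)
    return u_next, r_next
```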
It is obtained from Formula (16) that:
$$e_k = r_k - y_k \quad (28)$$
It may then further be obtained from Formula (3) and Formula (26) that:
$$e_{k+1} = r_{k+1} - y_{k+1} = r_k + \Lambda_k (r_k - y_k) - G u_{k+1} - Q x_{k+1}(0) = r_k + \Lambda_k e_k - G u_k - G \Gamma M e_{k+1} - Q x_{k+1}(0) \quad (29)$$
Rearranging Formula (29) yields:
$$(I + G \Gamma M) e_{k+1} = r_k + \Lambda_k e_k - G u_k - Q x_{k+1}(0) \quad (30)$$
It is obtained from Formula (3) that:
$$G u_k = y_k - Q x_k(0) \quad (31)$$
Substituting Formula (31) into Formula (30) gives:
$$(I + G \Gamma M) e_{k+1} = r_k - y_k + \Lambda_k e_k + Q x_k(0) - Q x_{k+1}(0) = (I + \Lambda_k) e_k + Q x_k(0) - Q x_{k+1}(0) \quad (32)$$
Because all the follower agents satisfy $x_{i,k}(0) = 0$, it follows that $x_{k+1}(0) - x_k(0) = 0$, so Formula (32) is simplified as:
$$(I + G \Gamma M) e_{k+1} = (I + \Lambda_k) e_k \quad (33)$$
Left-multiplying both sides of Formula (33) by $(I + G \Gamma M)^{-1}$ gives:
$$e_{k+1} = (I + G \Gamma M)^{-1} (I + \Lambda_k) e_k \quad (34)$$
Taking norms on both sides of Formula (34), it is obtained that:
$$\|e_{k+1}\| \le \left\| (I + G \Gamma M)^{-1} \right\| \, \|I + \Lambda_k\| \, \|e_k\| \quad (35)$$
Because it has been proved that $\|I + \Lambda_k\| = 1$, it is obtained that:
$$\|e_{k+1}\| \le \left\| (I + G \Gamma M)^{-1} \right\| \, \|e_k\| \quad (36)$$
As can be learned from Formula (36), when $\left\| (I + G \Gamma M)^{-1} \right\| < 1$, it is obtained that $\|e_k\| \to 0$ as $k \to \infty$.
Therefore, for $t \in [0,N]$, $e_k(t) \to 0$ as $k \to \infty$. For all $t_s \in T \subseteq [0,N]$, as $k \to \infty$, it can be seen from Formula (14) and Formula (16) that:
$$y_{i,k+1}(t_s) \to r_{i,k+1}(t_s) = y_d(t_s) \quad (37)$$
In summary, for a discrete heterogeneous multi-agent system under the action of the iterative learning method based on target-trajectory updating, if the matrix $\Gamma$ makes the inequality $\left\| (I + G \Gamma M)^{-1} \right\| < 1$ hold, then as the iterations continue, the output trajectory of each follower converges to the expected points, that is, $y_{i,k}(t_s) \to y_d(t_s)$ as $k \to \infty$.
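The convergence condition of Formula (36) can be checked numerically; the following sketch (an illustration under the author's assumptions, not part of the patent) returns the contraction factor $\|(I + G\Gamma M)^{-1}\|$ for a candidate gain:

```python
import numpy as np

def contraction_factor(G, Gamma, M):
    """Induced 2-norm of (I + G Gamma M)^{-1}; below one implies Formula (36) contracts."""
    I = np.eye(G.shape[0])
    return np.linalg.norm(np.linalg.inv(I + G @ Gamma @ M), 2)

# Usage sketch: accept a learning gain Gamma only if contraction_factor(G, Gamma, M) < 1.
```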
Embodiment
A discrete heterogeneous multi-agent system formed by six different follower agents and one leader agent is considered. The communication topology of the system is shown in FIG. 1: the sequence number 0 represents the leader agent, and the sequence numbers 1 to 6 represent the follower agents. The follower agents have heterogeneous discrete state-space dynamics of the form of Formula (1): agents 1 and 2 are second-order, agents 3 and 4 are third-order, and agents 5 and 6 are fourth-order, with output matrices $C_1 = [0 \;\; 0.1]$, $C_2 = [0.2 \;\; 1]$, $C_3 = C_4 = [0.1 \;\; 0.2 \;\; 0.4]$, and $C_5 = C_6 = [0 \;\; 0 \;\; 0 \;\; 0.2]$ (the full state matrices $A_i$ and $B_i$ are given in the original filing). The system simulation time is $t \in [0,2]$ s and the sampling time is 0.01 s. Five points are selected as expected position points for the tracking control study: the tracked sampling instants are $T = \{20, 60, 100, 140, 180\}$, and the expected output is $y_d(T) = \{5, 3, -3, -5, 1.5\}$.
The expected position points $y_d(T) = \{5, 3, -3, -5, 1.5\}$ are considered as being generated by a virtual leader with sequence number 0. The foregoing six agents are considered followers, and only some follower agents can directly acquire the leader information. As can be learned from the communication topology in FIG. 1, only agent 1 and agent 4 can directly obtain information from the leader 0. Therefore, $S = \mathrm{diag}\{1.5, 0, 0, 2, 0, 0\}$, and the Laplacian matrix $L$ among the followers is determined by the edge weights of FIG. 1. In the simulation, the initial states of the agents are set to $x_{1,k}(0) = [0 \;\; 1]^T$, $x_{2,k}(0) = [0 \;\; 1]^T$, $x_{3,k}(0) = [2 \;\; 2 \;\; 1]^T$, $x_{4,k}(0) = [2 \;\; 2 \;\; 1]^T$, $x_{5,k}(0) = [0 \;\; 0 \;\; 0 \;\; 5]^T$, and $x_{6,k}(0) = [0 \;\; 0 \;\; 0 \;\; 5]^T$, and the control input signal of each agent during the first iteration is set to 0.
For the iterative learning method (27) based on a fixed target trajectory, the trajectory through the foregoing expected position points $y_d(T) = \{5, 3, -3, -5, 1.5\}$ is chosen as $y_d(t) = -6.5 t^4 + 41.7 t^3 - 72.4 t^2 + 33.3 t + 1$.
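As a quick sanity check (not in the patent text), evaluating this polynomial at the tracked instants $t = 0.2, 0.6, 1.0, 1.4, 1.8$ s (sample indices 20, 60, 100, 140, 180 at a 0.01 s sampling time) shows it passes close to the five expected values:

```python
import numpy as np

t = np.array([0.2, 0.6, 1.0, 1.4, 1.8])          # tracked instants in seconds
yd = -6.5 * t**4 + 41.7 * t**3 - 72.4 * t**2 + 33.3 * t + 1
print(np.round(yd, 2))   # [ 5.09  3.08 -2.9  -4.83  1.32], close to {5, 3, -3, -5, 1.5}
```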
For the iterative learning method (26) based on target-trajectory updating, the initial target trajectory is $r_1(t) = y_d(t)$, and $\Lambda_k$ is selected with $\lambda_{i,k}(t_s) = 0$ at the tracked instants, which yields the convergence condition $\left\| \left( I + G \Gamma \left( (L + S) \otimes I_m \right) \right)^{-1} \right\| = 0.73 < 1$ for the multi-agent system. Under the action of the iterative learning method (26) based on target-trajectory updating, FIG. 2 and FIG. 3 respectively show the tracking process of the sixth agent at the 10th and the 80th iteration. It can be very clearly seen that as the iteration process continues, the agents track the expected position points. FIG. 4 shows the error convergence of the six follower agents under the iterative learning method based on target-trajectory updating, where $\max_{t \in T} \|e_k(t)\|$ falling below a set tolerance is taken as the error precision requirement.
As can be seen, after the iterative learning has run 80 times, the six follower agents completely track the expected position points.
To compare the tracking performance of the iterative learning method (26) based on target-trajectory updating with that of the iterative learning method (27) based on a fixed target trajectory, $r_{k+1} = r_k = y_d(t)$ is selected.
In this case, the algorithm (26) is converted into an iterative-learning control algorithm of the fixed target trajectory.
Under the action of the iterative-learning algorithm with a fixed target trajectory, FIG. 5 and FIG. 6 respectively show the tracking processes of the sixth agent at the 10th and the 100th iteration, and it is very clearly seen that as the iterative process continues, the agents track the fixed target trajectory $y_d(t)$. Because the fixed target trajectory $y_d(t)$ passes through the expected position points $y_d(T)$, the algorithm (27) can complete the tracking of the expected position points.
It can be seen from FIG. 7 that the follower agents using the iterative-learning control algorithm of the fixed target trajectory completely track the expected trajectory only after the 100th iteration, and the convergence speed is slower than that of the target-trajectory updating iterative-learning algorithm.
In summary, it is found that an updated target trajectory can implement point-to-point tracking in the multi-agent system more rapidly than the fixed target trajectory.

Claims (6)

1. A point-to-point tracking control method for multi-agent trajectory-updating iterative learning, comprising the following steps:
step 1, constructing a model of a discrete heterogeneous multi-agent system;
step 2, analyzing an information exchange relationship among agents in the discrete heterogeneous multi-agent system, and constructing a communication topology structure of the multi-agent system by using a directed graph, wherein only one or more follower agents are capable of acquiring leader information, and a communication topology diagram formed by a leader and followers comprises one spanning tree with the leader as a root node;
step 3, giving an initial state condition of all follower agents;
step 4, designing a target-trajectory updating method according to expected position points, solving parameters of the target-trajectory updating method, and updating a target trajectory to enable a new target trajectory to asymptotically converge to a system output; and
step 5, designing a P-type iterative learning method based on target-trajectory updating for the follower agents, and solving parameters of the P-type iterative learning method, to implement complete tracking of the expected position points within a limited time in the multi-agent system.

2. The method according to claim 1, wherein in step 1 the model of the discrete heterogeneous multi-agent system formed by $n$ different agents is:
$$x_{i,k}(t+1) = A_i x_{i,k}(t) + B_i u_{i,k}(t), \quad y_{i,k}(t) = C_i x_{i,k}(t) \quad (1)$$
wherein $k$ denotes the number of iterations, $i$ represents the $i$-th agent, $i = 1,2,\cdots,n$, and $t \in [0,N]$ is a sampling time point within one period; $x_{i,k}(t) \in R^{p_i}$, $u_{i,k}(t) \in R^{q_i}$, and $y_{i,k}(t) \in R^{m}$ respectively denote a state, a control input, and a system output of agent $i$; and $A_i \in R^{p_i \times p_i}$, $B_i \in R^{p_i \times q_i}$, and $C_i \in R^{m \times p_i}$ are matrices of corresponding dimensions; it is defined that $x_k(t) = [x_{1,k}^T(t), \cdots, x_{n,k}^T(t)]^T$, $u_k(t) = [u_{1,k}^T(t), \cdots, u_{n,k}^T(t)]^T$, and $y_k(t) = [y_{1,k}^T(t), \cdots, y_{n,k}^T(t)]^T$, such that the system (1) is written in a compact matrix form as:
$$x_k(t+1) = A x_k(t) + B u_k(t), \quad y_k(t) = C x_k(t) \quad (2)$$
wherein $A = \mathrm{diag}\{A_1, \cdots, A_n\}$, $B = \mathrm{diag}\{B_1, \cdots, B_n\}$, and $C = \mathrm{diag}\{C_1, \cdots, C_n\}$; the system (2) is converted into an input-output matrix model based on a time sequence:
$$y_k = G u_k + Q x_k(0) \quad (3)$$
wherein $y_k = [y_k^T(0), \cdots, y_k^T(N)]^T$, $u_k = [u_k^T(0), \cdots, u_k^T(N)]^T$,
$$G = \begin{bmatrix} 0 & 0 & \cdots & 0 \\ CB & 0 & \cdots & 0 \\ \vdots & \ddots & \ddots & \vdots \\ CA^{N-1}B & \cdots & CB & 0 \end{bmatrix}, \quad Q = \begin{bmatrix} C \\ CA \\ \vdots \\ CA^N \end{bmatrix};$$
time points $T = \{t_1, t_2, \cdots, t_M\}$ to be tracked are given, wherein a control method is used to implement the tracking of the expected position points in the multi-agent system, that is, $y_{i,k}(t_s) \to y_d(t_s)$, $s = 1,2,\cdots,M$, and $0 \le t_1 < t_2 < \cdots < t_M \le N$, wherein $y_d(t_s)$ is the expected position point; the expected position point $y_d(t_s)$, $s = 1,2,\cdots,M$, is considered as generated by a virtual leader; and the $n$ agents in the system are considered as followers, and only some follower agents can directly acquire the leader information.

3. The method according to claim 1 or 2, wherein in step 2 the directed graph $G = (V, E, A)$ is used to denote the topology structure of the multi-agent system, wherein a node set $V = \{1,2,\cdots,n\}$ of the graph $G$ corresponds to the $n$ agents, an edge set $E \subseteq V \times V$ of the graph $G$ corresponds to information exchange and transfer among the agents, the weight of an edge is $a_{ij} \ge 0$ with $a_{ii} = 0$ $(i,j \in V)$, and a matrix $A = [a_{ij}] \in R^{n \times n}$ is a weighted adjacency matrix; if a node $j$ is capable of obtaining information from a node $i$ in the directed graph, a node connecting edge is denoted by $e_{ij} = (i,j) \in E$; if $e_{ij} \in E$, the corresponding element of the weighted adjacency matrix is $a_{ij} > 0$, or otherwise the element is 0, and $a_{ii} = 0$ for all $i \in V$; a neighbor set of agent $i$ is $N_i = \{j \in V : (i,j) \in E\}$; a Laplacian matrix of the graph $G$ is $L = D - A = [l_{ij}] \in R^{n \times n}$, and a matrix $D = \mathrm{diag}\{\sum_{j=1}^{n} a_{ij},\ i = 1,\cdots,n\}$ is a degree matrix of the graph $G$; in the directed graph $G$, a directed path from a node $i_1$ to a node $i_s$ is an ordered sequence $(i_1,i_2), \cdots, (i_{s-1},i_s)$ of a series of edges; if one node $i$ has a directed path to all other nodes in the directed graph $G$, the node $i$ is a root node, and if the graph $G$ has a root node, the directed graph has one spanning tree; after a leader is added, the $n$ follower agents and the leader form a graph $\bar{G} = \{0\} \cup G$; information transfer between agent $i$ and the leader is denoted by $s_i$, wherein $s_i > 0$ denotes that the agent has a relation to the leader and $s_i = 0$ denotes that the agent does not; and in the directed graph $\bar{G}$, if there is one directed spanning tree with the leader as a root node, the leader has one directed path to all the follower agents.

4. The method according to any one of claims 1-3, wherein in step 3 an initial state reset condition of all the follower agents is: $x_{i,k}(0) = 0$ (4).

5. The method according to any one of claims 1-4, wherein in step 4 the target-trajectory updating method is as follows:
$$r_{i,k+1}(t) = y_d(t) + h(t) f_{i,k}(t) \quad (5)$$
wherein $r_{i,k+1}(t)$ is a target trajectory of the $i$-th agent obtained after learning and update of a $k$-th iteration, $y_d(t)$ is a trajectory passing through the expected position points $y_d(t_s)$, $h(t) = (t - t_1)(t - t_2) \cdots (t - t_M)$, and $f_{i,k}(t)$ is a discrete function; let $r_{k+1}(t) = [r_{1,k+1}(t), \cdots, r_{n,k+1}(t)]^T$, $f_k(t) = [f_{1,k}(t), \cdots, f_{n,k}(t)]^T$, $H(t) = \mathrm{diag}\{h(t), \cdots, h(t)\}$, and $Y_d(t) = [y_d(t), \cdots, y_d(t)]^T$; Formula (5) is converted into:
$$r_{k+1}(t) = Y_d(t) + H(t) f_k(t) \quad (6)$$
Formula (6) is rewritten into a time sequence-based form:
$$r_{k+1} = Y_d + H f_k \quad (7)$$
wherein $r_{k+1} = [r_{k+1}^T(0), \cdots, r_{k+1}^T(N)]^T$, $Y_d = [Y_d^T(0), \cdots, Y_d^T(N)]^T$, $H = \mathrm{diag}\{H(0), H(1), \cdots, H(N)\}$, and $f_k = [f_k^T(0), \cdots, f_k^T(N)]^T$; because point-to-point tracking requires that the value of the target trajectory at the time points $T = \{t_1, t_2, \cdots, t_M\}$ requiring tracking be kept consistent with the given expected points in each update, that is, $r_{i,k}(t_s) = y_d(t_s)$, Formula (7) is further converted into a target trajectory at any sampling point:
$$r_{k+1} = r_k + H f_k \quad (8)$$
let $f_k = F(r_k - y_k)$, wherein $F$ is a real diagonal matrix; Formula (8) is denoted as:
$$r_{k+1} = r_k + H F (r_k - y_k) \quad (9)$$
let $\Lambda_k = HF$; because the matrix $H$ and the matrix $F$ are both diagonal matrices, $\Lambda_k$ is also a real diagonal matrix, with $\Lambda_k = \mathrm{diag}\{\Lambda_k(0), \Lambda_k(1), \cdots, \Lambda_k(N)\}$ and $\Lambda_k(t) = \mathrm{diag}\{\lambda_{1,k}(t), \cdots, \lambda_{n,k}(t)\}$; the target-trajectory updating method (9) is turned into:
$$r_{k+1} = r_k + \Lambda_k (r_k - y_k) \quad (10)$$
tracking a fixed trajectory by an iterative learning algorithm requires that, as the number of iterations increases, the system output $y_{i,k}(t)$ asymptotically converge to a fixed trajectory $y_d(t)$, that is:
$$\|y_d - y_{k+1}\| \le \|y_d - y_k\| \quad (11)$$
the present target-trajectory update algorithm instead makes a new target trajectory $r_{i,k}(t)$ asymptotically converge to the system output $y_{i,k}(t)$, that is:
$$\|r_{k+1} - y_k\| \le \|r_k - y_k\| \quad (12)$$
and for the point-to-point tracking control problem, the target-trajectory update algorithm $r_{k+1} = r_k + \Lambda_k(r_k - y_k)$ is used, and if $\|I + \Lambda_k\| = 1$ is satisfied and $\Lambda_k$ satisfies $-2 < \lambda_{i,k}(t) < 0$ for $t \in [0,N] \setminus T$ and $\lambda_{i,k}(t) = 0$ for $t \in T$, then $\|r_{k+1} - y_k\| \le \|r_k - y_k\|$ is obtained, wherein $T$ denotes the time points $T = \{t_1, t_2, \cdots, t_M\}$ to be tracked.

6. The method according to any one of claims 1-5, wherein in step 5 the P-type iterative learning method based on target-trajectory updating is as follows: first, the tracking error of each agent is given as:
$$e_{i,k}(t) = r_{i,k}(t) - y_{i,k}(t) \quad (13)$$
$$e_{ij,k}(t) = y_{j,k}(t) - y_{i,k}(t), \quad j \in N_i \quad (14)$$
wherein $e_{i,k}(t)$ represents an error between the output of agent $i$ during the $k$-th iteration and the target trajectory obtained after iterative updating, and $e_{ij,k}(t)$ denotes an error between the agent and its neighbor agents during the $k$-th iteration; let $\xi_{i,k}(t)$ denote information received or measured by agent $i$ during the $k$-th iteration; it is obtained that:
$$\xi_{i,k}(t) = \sum_{j \in N_i} a_{ij} e_{ij,k}(t) + s_i e_{i,k}(t) \quad (15)$$
wherein $a_{ij}$ is the weight of an edge, and $s_i$ is a coupling weight between agent $i$ and the leader; because $e_{ij,k}(t) = e_{i,k}(t) - e_{j,k}(t)$, Formula (15) is converted into:
$$\xi_{i,k}(t) = \sum_{j \in N_i} a_{ij} \left( e_{i,k}(t) - e_{j,k}(t) \right) + s_i e_{i,k}(t) \quad (16)$$
it is defined that $e_k(t) = [e_{1,k}^T(t), \cdots, e_{n,k}^T(t)]^T$ and $\xi_k(t) = [\xi_{1,k}^T(t), \cdots, \xi_{n,k}^T(t)]^T$, and by using graph theory, Formula (16) is written as:
$$\xi_k(t) = \left( (L + S) \otimes I_m \right) e_k(t) \quad (17)$$
wherein $S = \mathrm{diag}\{s_1, s_2, \cdots, s_n\}$, $L$ is the Laplacian matrix of $G$, and $I_m$ denotes an $m \times m$-dimensional identity matrix; Formula (17) is also written into a time sequence-based form, that is:
$$\xi_k = M e_k \quad (18)$$
wherein $e_k = [e_k^T(0), \cdots, e_k^T(N)]^T$, $\xi_k = [\xi_k^T(0), \cdots, \xi_k^T(N)]^T$, and $M = \mathrm{diag}\{(L + S) \otimes I_m\}_{N \times N}$; the P-type iterative learning method is used for each follower agent to resolve the tracking control problem of the expected points in the multi-agent system, shown as follows:
$$u_{i,k+1}(t) = u_{i,k}(t) + \Gamma_i \xi_{i,k+1}(t) \quad (19)$$
wherein $\Gamma_i \in R^{q_i \times m}$ is a learning gain; let $u_k(t) = [u_{1,k}^T(t), \cdots, u_{n,k}^T(t)]^T$ and $\xi_k(t) = [\xi_{1,k}^T(t), \cdots, \xi_{n,k}^T(t)]^T$; Formula (19) is converted into:
$$u_{k+1}(t) = u_k(t) + \hat{\Gamma} \xi_{k+1}(t) \quad (20)$$
wherein $\hat{\Gamma} = \mathrm{diag}\{\Gamma_1, \Gamma_2, \cdots, \Gamma_n\}$; next, let $\xi_k = [\xi_k^T(0), \cdots, \xi_k^T(N)]^T$ and $u_k = [u_k^T(0), \cdots, u_k^T(N)]^T$; Formula (20) is converted into:
$$u_{k+1} = u_k + \Gamma \xi_{k+1} \quad (21)$$
wherein $\Gamma = \mathrm{diag}\{\hat{\Gamma}\}_{N \times N}$; Formula (18) is substituted into Formula (21) to obtain the iterative-learning control method:
$$u_{k+1} = u_k + \Gamma M e_{k+1} \quad (22)$$
an iterative learning method based on target-trajectory updating obtained from Formula (10) and Formula (22) is:
$$\begin{cases} u_{k+1} = u_k + \Gamma M e_{k+1} \\ r_{k+1} = r_k + \Lambda_k (r_k - y_k) \end{cases} \quad (23)$$
and for the discrete heterogeneous multi-agent system (1), under the action of the iterative learning method (23) based on target-trajectory updating, if the inequality $\left\| (I + G \Gamma M)^{-1} \right\| < 1$ holds, then as the iterations continue, the output trajectory of each follower converges to the expected points, that is, $y_{i,k+1}(t_s) \to y_d(t_s)$ as $k \to \infty$.
NL2027701A 2020-06-19 2021-03-03 Point-to-point tracking control method for multi-agent trajectory-updating iterative learning NL2027701B1 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010565612.0A CN111722628B (en) 2020-06-19 2020-06-19 Point-to-point tracking control method for multi-agent track updating iterative learning

Publications (2)

Publication Number Publication Date
NL2027701A NL2027701A (en) 2022-01-28
NL2027701B1 true NL2027701B1 (en) 2022-03-15

Family

ID=72567744

Family Applications (1)

Application Number Title Priority Date Filing Date
NL2027701A NL2027701B1 (en) 2020-06-19 2021-03-03 Point-to-point tracking control method for multi-agent trajectory-updating iterative learning

Country Status (2)

Country Link
CN (1) CN111722628B (en)
NL (1) NL2027701B1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112526886A (en) * 2020-12-08 2021-03-19 北京航空航天大学 Iterative learning formation control method for discrete multi-agent system under random test length
CN113342002B (en) * 2021-07-05 2022-05-20 湖南大学 Multi-mobile-robot scheduling method and system based on topological map
CN113791611B (en) * 2021-08-16 2024-03-05 北京航空航天大学 Real-time tracking iterative learning control system and method for vehicle under interference

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108803349B (en) * 2018-08-13 2020-06-26 中国地质大学(武汉) Optimal consistency control method and system for nonlinear multi-agent system
CN110815225B (en) * 2019-11-15 2020-12-25 江南大学 Point-to-point iterative learning optimization control method of motor-driven single mechanical arm system
CN110948504B (en) * 2020-02-20 2020-06-19 中科新松有限公司 Normal constant force tracking method and device for robot machining operation

Also Published As

Publication number Publication date
CN111722628B (en) 2021-07-09
NL2027701A (en) 2022-01-28
CN111722628A (en) 2020-09-29

Similar Documents

Publication Publication Date Title
NL2027701B1 (en) Point-to-point tracking control method for multi-agent trajectory-updating iterative learning
Wang et al. Containment control for general second-order multiagent systems with switched dynamics
CN109409524B (en) A kind of quantum program operating method and device, storage medium and electronic device
Saravanakumar et al. Reliable memory sampled-data consensus of multi-agent systems with nonlinear actuator faults
Xia et al. Dynamic leader-following consensus for asynchronous sampled-data multi-agent systems under switching topology
Ai et al. A zero-gradient-sum algorithm for distributed cooperative learning using a feedforward neural network with random weights
Cai et al. Distributed consensus control for second-order nonlinear multi-agent systems with unknown control directions and position constraints
Partovi et al. Structural controllability of high order dynamic multi-agent systems
Ai et al. Distributed stochastic configuration networks with cooperative learning paradigm
Chen et al. Distributed fault-tolerant consensus protocol for fuzzy multi-agent systems
Ruano et al. An overview of nonlinear identification and control with neural networks
Liu et al. Fault-tolerant consensus control with control allocation in a leader-following multi-agent system
Zhao et al. Distributed adaptive fuzzy fault-tolerant control for multi-agent systems with node faults and denial-of-service attacks
Sebastián et al. LEMURS: Learning distributed multi-robot interactions
Bouteraa et al. Adaptive backstepping synchronization for networked Lagrangian systems
Gao et al. Effects of adding arcs on the consensus convergence rate of leader-follower multi-agent systems
Elimelech et al. Fast action elimination for efficient decision making and belief space planning using bounded approximations
Ge et al. A novel method for distributed optimization with globally coupled constraints based on multi-agent systems
Selvam et al. Domination in join of fuzzy graphs using strong arcs
CN114637278A (en) Multi-agent fault-tolerant formation tracking control method under multi-leader and switching topology
Ma Neural-network-based containment control of nonlinear multi-agent systems under communication constraints
Zhang et al. Consensus of Second-Order Heterogeneous Hybrid Multiagent Systems via Event-Triggered Protocols
Basterrech et al. A more powerful random neural network model in supervised learning applications
Alaya et al. A CPS-Agent self-adaptive quality control platform for industry 4.0
Liang et al. Distributed data-driven iterative learning point-to-point consensus tracking control for unknown nonlinear multi-agent systems