CN111948937A

CN111948937A - Multi-gradient recursive reinforcement learning fuzzy control method and system of multi-agent system

Info

Publication number: CN111948937A
Application number: CN202010697834.8A
Authority: CN
Inventors: 李铁山; 龙跃; 程玉华; 李美霖; 李耀仑
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2020-07-20
Filing date: 2020-07-20
Publication date: 2020-11-17
Anticipated expiration: 2040-07-20
Also published as: CN111948937B

Abstract

The invention provides a multi-gradient recursive reinforcement learning fuzzy control method and device for a multi-agent system, and belongs to the technical field of ship control for multi-agent systems. The invention targets the discrete multi-agent ship course system: through multi-gradient recursive reinforcement learning fuzzy control, it achieves the optimized control objective with lower system energy consumption while improving the speed and accuracy of multi-agent course tracking. In addition, the invention provides a multi-gradient recursive learning algorithm that addresses the local-extremum problem in learning the fuzzy logic system weights, so that the weights converge faster and more accurately, improving the reliability and stability of the system.

Description

Multi-gradient recursive reinforcement learning fuzzy control method and system of multi-agent system

Technical Field

The invention belongs to the technical field of ship control of a multi-agent system, and particularly relates to a multi-gradient recursive reinforcement learning fuzzy control method and system of the multi-agent system.

Background

Intelligent ship motion is characterized by large time lag, large inertia, and nonlinearity. Changes in speed and loading perturb the parameters of the control model, while changing navigation conditions, environmental disturbances, and measurement inaccuracy introduce uncertainty into the ship's course control system. To address the problems caused by these nonlinear uncertain dynamics, intelligent algorithms have been increasingly applied to intelligent ship course control, including adaptive control, robust control, fuzzy adaptive control, iterative sliding mode control, and minimal-parameter learning methods. Consider a multi-agent system composed of several intelligent ships, each with independent dynamics and able to interact with its environment: the complex multi-ship course control problem is transformed by treating the reference signal as a virtual leader, so that course consistency control of the multi-agent system is achieved at low cost. At present, most fuzzy-logic-system-based course consistency design methods for multi-agent ship systems do not consider the local-extremum problem in weight convergence, and the large inertia of ships makes course tracking slow, which leads to high controller energy consumption and severe steering engine wear. In addition, existing results on multi-agent ship course consistency control seldom consider the trade-off between control performance and control cost; their high cost of use hinders engineering implementation.

Disclosure of Invention

In view of the problems in the background art, the present invention provides a multi-gradient recursive reinforcement learning fuzzy control method and apparatus for a multi-agent system. The invention targets the discrete multi-agent ship course system. Through multi-gradient recursive reinforcement learning fuzzy control, it effectively reduces controller energy consumption and steering engine wear, and improves the speed and accuracy of course consistency control for multi-agent ships.

In order to achieve the purpose, the technical scheme of the invention is as follows:

The multi-gradient recursive reinforcement learning fuzzy control method of the multi-agent system comprises the following steps:

S1, transmitting the collected multi-agent course information to a shipborne computer, wherein the shipborne computer establishes a mathematical model of the multi-agent ship course discrete nonlinear control system in terms of the ship course angles by taking the ship's steady-state turning nonlinearity into account, and the course information comprises rudder angle information measured by the steering engines of the multi-agent ships and current course angle information measured by a compass;

S2, the multi-agent shipborne computer obtains the multi-agent course tracking dynamic error and the multi-agent system course tracking transformation system based on (i) the course angle dynamic error between the agents and the virtual leader reference signal and (ii) the dynamic error between the course angle change rate of the agents and the virtual control function;

S3, according to the multi-agent course tracking dynamic error and the tracking performance threshold, designing a utility function in a fuzzy evaluation module and using it to obtain a strategy utility function, obtaining the cost function of the fuzzy evaluation module by using the universal approximation principle of the fuzzy logic system and the Bellman principle, and designing the adaptive update rate of the fuzzy evaluation module based on a multi-gradient recursion method;

S4, designing a virtual controller and a strategy utility function of the multi-agent system in a fuzzy execution module according to the connection weights of the agents in the multi-agent system, and designing the adaptive update rate of the fuzzy execution module based on the multi-gradient recursion method;

S5, designing the course consistency controller of the multi-agent system from the multi-agent course tracking transformation system, the course tracking error dynamics, the fuzzy evaluation cost function, the fuzzy evaluation adaptive update rate, the virtual control function, the strategy utility function, and the multi-gradient recursive adaptive update rate, thereby obtaining the control input rudder angle of the multi-agent system, and transmitting the rudder angle command to the steering engines of the multi-agent ships to output the multi-agent ship course angles, thereby realizing course consistency control of the multi-agent system.

Further, in step S1, the mathematical model of the multi-agent ship course discrete nonlinear control system is established by the following specific process:

the multi-agent shipborne computer uses the collected rudder angle information and course angle information, takes the ship's steady-state turning nonlinearity into account, and establishes the multi-agent nonlinear discrete system mathematical model of formula (1).

In formula (1), $\xi_{i,1}(k)$ is the course angle of the $i$-th agent in the multi-agent system, where $i = 1, \dots, N$ is the index of the agent in the multi-agent system, the subscript 1 denotes the first subsystem, and $k$ is the time; $\xi_{i,2}(k)$ is the rate of change of the course angle, where the subscript 2 denotes the second subsystem; $u_i(k)$ is the rudder angle input; $y_i(k)$ is the output of the system; $g_i = K_i/T_i$ is the control gain, where $K_i$ is the ship turning index and $T_i$ is the ship following index; $f_{i,2}(\xi_{i,2}(k))$ is an unknown nonlinear function; and $d_i(k)$ is an unknown but bounded external disturbance whose bound is an unknown positive number;
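The roles of the symbols defined for formula (1) can be illustrated with a short simulation. This is only a sketch under stated assumptions: formula (1) itself is not reproduced in this text, so the Euler-type step, the sampling period `h`, the cubic nonlinearity `f`, and the numeric values of `K` and `T` are all hypothetical. Only the relation $g_i = K_i/T_i$ and the meaning of the two states come from the description.

```python
import math

def step_heading(xi1, xi2, u, g, f, d=0.0, h=0.1):
    """Advance an ASSUMED Euler-discretized two-state heading model by one sample.

    xi1: course angle (rad), xi2: course angle rate (rad/s),
    u: rudder angle (rad), g: control gain, f: nonlinearity, d: disturbance.
    """
    xi1_next = xi1 + h * xi2
    xi2_next = xi2 + h * (f(xi2) + g * u + d)
    return xi1_next, xi2_next

K, T = 0.5, 20.0   # hypothetical turning index and following index
g = K / T          # control gain g_i = K_i / T_i, as stated in the description

def f(rate):
    # Hypothetical steady-state turning nonlinearity (cubic in the turn rate).
    return -(rate + 0.3 * rate ** 3) / T

xi1, xi2 = 0.0, 0.0
for _ in range(100):   # 100 samples with a constant 10 degree rudder angle
    xi1, xi2 = step_heading(xi1, xi2, u=math.radians(10.0), g=g, f=f)

print(math.degrees(xi1), math.degrees(xi2))
```

With a constant rudder angle, the turn rate in this sketch rises toward a steady value and the course angle keeps increasing, which reflects the large-inertia behavior the background section describes.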

Further, the multi-agent system course tracking transformation system in step S2 is established by the following specific process:

the multi-agent shipborne computer designs the course tracking error variables of formula (2) by using the course information.

In formula (2), the error variable with subscript $i,1$ is the course angle dynamic error between the $i$-th agent, the $j$-th agents, and the reference signal in the multi-agent system; the error variable with subscript $i,2$ is the error between the course angle change rate $\xi_{i,2}(k)$ of the $i$-th agent and the virtual control function $\alpha_{i,1}(k)$; $a_{i,j}$ is the connection weight between the $i$-th agent and the $j$-th agent; $a_{i,0}$ is the connection weight between the $i$-th agent and the virtual leader in the multi-agent system; and $y_d(k)$ is the smooth, bounded reference trajectory of the virtual leader;

to facilitate the course consistency control design of the multi-agent system and avoid the problem of no correlation between subsystems, a system transformation is applied to formula (1) to establish the multi-agent course tracking transformation system;

Further, the fuzzy evaluation module of the multi-agent system in step S3 is established by the following specific process:

based on the course angle dynamic error (the error variable with subscript $i,1$) from the multi-agent shipborne computer and the tracking performance threshold, the utility function $\pi_i(k)$ is designed.

In the formula, the tracking performance threshold is greater than 0; $\pi_i(k) = 0$ means that the tracking performance is acceptable, and $\pi_i(k) = 1$ means that the tracking performance is unacceptable. Using the utility function $\pi_i(k)$, the strategy utility function $M_i(k)$ is designed as formula (5),

where $0 < \gamma_i < 1$ is a design parameter and $L$ is the time range; formula (5) can also be re-expressed in a second form.

Using the universal approximation principle of the fuzzy logic system, the strategy utility function $M_i(k)$ can be expressed in terms of a weight vector, a fuzzy basis function, and an approximation error.

In the formula, $\theta_{i,c}$ is the ideal weight vector, which satisfies a bound given by an unknown positive number, and the subscript $c$ denotes the evaluation module; $\theta_{i,c}^{T}$ is the transpose of $\theta_{i,c}$; the fuzzy basis function is bounded and appears together with its transpose; and $v_{i,c}(k)$ is the approximation error, which satisfies a bound given by an unknown positive number;

further, the Bellman error is defined.

In the formula, the estimated weight vector is the estimate of the ideal parameter $\theta_{i,c}$ and appears together with its transpose, and the estimated strategy utility function is the estimate of $M_i(k)$;

according to formula (7), the cost function $\phi_{i,c}(k)$ is defined. To minimize the cost function $\phi_{i,c}(k)$, the adaptive update rate of the evaluation module is designed.

In the formula, the design parameter with subscript $i,c$ is greater than 0 and $0 < \gamma_i < 1$; the fuzzy basis function is bounded; $T$ denotes the transpose; $l$ is the gradient index; and $p$ is a positive integer denoting the gradient length;
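The idea of a gradient index $l$ running over a gradient length $p$ can be illustrated as follows. This is a minimal sketch and not the update rate of the invention, whose formula is not reproduced in this text: it assumes a squared-error cost on a linear-in-weights model and sums the gradients evaluated at the last $p$ regressors. The names `multi_gradient_step`, `window`, and `mu`, and all numeric values, are hypothetical.

```python
from collections import deque

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def multi_gradient_step(theta, window, mu):
    """One weight update that sums squared-error gradients over the window.

    window holds (phi, target) pairs for the last p samples (l = 1, ..., p).
    ASSUMPTION: cost = 0.5 * (theta . phi - target)^2 for each stored sample.
    """
    grad = [0.0] * len(theta)
    for phi, target in window:
        err = dot(theta, phi) - target
        for n, phi_n in enumerate(phi):
            grad[n] += err * phi_n
    return [t - mu * g_n for t, g_n in zip(theta, grad)]

p = 3                                  # gradient length
window = deque(maxlen=p)               # keeps only the last p samples
theta = [0.0, 0.0]                     # weight estimate
true_theta = [1.0, -2.0]               # hypothetical "ideal" weights
samples = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8], [0.8, -0.6]]

for k in range(200):
    phi = samples[k % len(samples)]    # regressor at time k
    window.append((phi, dot(true_theta, phi)))
    theta = multi_gradient_step(theta, window, mu=0.3)

print(theta)
```

Using several past regressors per step means each update sees more than one direction in weight space, so in this toy problem the estimate converges to the hypothetical ideal weights even though any single regressor excites only one direction.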

Further, the virtual controller $\alpha_{i,1}(k)$ of the multi-agent system and the multi-gradient recursive adaptive update rate of the fuzzy execution module in step S4 are established by the following specific process:

the virtual controller is designed, and the strategy utility function of the fuzzy execution module of the multi-agent system is defined.

In the formula, $S_{i,2}(k) = [\xi_{i,1}(k), \xi_{i,2}(k), y_d(k)]^{T}$; the fuzzy basis function is bounded; and the weight estimate is the estimate of the ideal parameter and appears together with its transpose;

according to formula (10), the cost function $\phi_{i,2}(k)$ is defined. To minimize the cost function $\phi_{i,2}(k)$, the multi-gradient recursive adaptive update rate is designed according to the multi-gradient recursive algorithm.

In the formula, $\mu_{i,2} > 0$, and the fuzzy basis function is bounded and satisfies an inequality.
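A bounded fuzzy basis function of the kind referred to above can be sketched as follows. The description does not state which membership functions are used, so the Gaussian memberships, the rule centers, and the width are assumptions; the input vector follows the definition of $S_{i,2}(k)$ given in the text. The function names and numeric values are hypothetical.

```python
import math

def fuzzy_basis(s, centers, width):
    """Normalized Gaussian fuzzy basis functions (ASSUMED membership shape).

    s: input vector, e.g. [xi_1, xi_2, y_d]; centers: one center per rule.
    Returns one activation per rule; activations are in [0, 1] and sum to 1.
    """
    acts = [
        math.exp(-sum((x - c) ** 2 for x, c in zip(s, center)) / width ** 2)
        for center in centers
    ]
    total = sum(acts)
    return [a / total for a in acts]

def fuzzy_output(theta, phi):
    """Fuzzy logic system output: weights times basis activations."""
    return sum(t * p_n for t, p_n in zip(theta, phi))

# Hypothetical rule centers over (course angle, course rate, reference).
centers = [[-0.5, -0.1, -0.5], [0.0, 0.0, 0.0], [0.5, 0.1, 0.5]]
s_i2 = [0.1, 0.0, 0.1]               # S_{i,2}(k) = [xi_{i,1}, xi_{i,2}, y_d]^T
phi = fuzzy_basis(s_i2, centers, width=1.0)
theta_hat = [0.2, 0.0, -0.2]         # hypothetical weight estimate

print(phi, fuzzy_output(theta_hat, phi))
```

Because the activations are normalized, the basis vector has a norm of at most 1, which is one common way to obtain the boundedness that the description requires.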

Further, the control input rudder angle in step S5 is solved by the following specific process:

the multi-gradient recursive reinforcement learning controller of the multi-agent system is determined: combining the obtained virtual controller $\alpha_{i,1}(k)$ with the adaptive update rate of the evaluation module and the multi-gradient recursive adaptive update rate, the multi-agent shipborne computer obtains the actual control input $u_i(k)$ of the system.

In the formula, the parameters satisfy $c_{i,1} > 0$ and $c_{i,2} > 0$.

Further, the virtual leader reference signal in step S2 and the tracking performance threshold in step S3 are designed according to actual requirements; the tracking performance threshold is preferably less than 5°.
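The binary utility function and a discounted strategy utility built from it can be sketched as follows. The description states only that a utility of 0 means acceptable tracking and 1 means unacceptable tracking relative to a positive threshold, and that a design parameter between 0 and 1 and a time range are involved. The comparison on the absolute error and the finite discounted sum below are assumptions, since the exact formulas are not reproduced in this text; the function names are hypothetical.

```python
import math

def utility(error, threshold):
    """Return 0 if tracking is acceptable, 1 otherwise.

    ASSUMPTION: 'acceptable' means |error| <= threshold.
    """
    return 0 if abs(error) <= threshold else 1

def discounted_utility(utilities, gamma):
    """ASSUMED finite-horizon discounted sum of utility values, 0 < gamma < 1."""
    return sum(gamma ** n * value for n, value in enumerate(utilities))

threshold = math.radians(5.0)                       # threshold below 5 degrees
errors_deg = [7.0, 3.0, 6.0]                        # hypothetical tracking errors
values = [utility(math.radians(e), threshold) for e in errors_deg]

print(values, discounted_utility(values, gamma=0.5))
```

A smaller discount parameter makes the evaluation focus on the most recent tracking performance, while a value close to 1 weights the whole time range almost equally.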

A multi-gradient recursion reinforcement learning fuzzy control system of a multi-agent system comprises a data acquisition unit, a data transmission unit, a multi-agent shipborne computer and a data feedback unit;

the data acquisition unit is used for acquiring course information during ship navigation; the data transmission unit is used for transmitting the collected course information of the multi-agent ships to the shipborne computer; the multi-agent shipborne computer is used for processing the collected course information and completing the multi-gradient recursive reinforcement learning fuzzy control of the multi-agent ship course; the data feedback unit is used for transmitting the rudder angle commands obtained by the multi-agent shipborne computer to the steering engines of the multi-agent ships to output the multi-agent ship course angles, thereby realizing course tracking consistency control of the multi-agent system;

the multi-agent shipborne computer comprises a multi-agent ship course tracking system mathematical model building module, a multi-agent ship course system tracking error module, a fuzzy evaluation module, a virtual controller building module, a multi-gradient recursion reinforcement learning self-adaptive update rate module, a multi-gradient recursion reinforcement learning fuzzy controller building module and a data feedback unit;

the multi-agent ship course tracking system mathematical model building module is used for building a multi-agent ship course discrete nonlinear control system mathematical model between the input and the output of the system based on the multi-agent course information;

the multi-agent ship course system tracking error module is used for constructing a multi-agent course tracking error dynamic model and a transformation system based on the course information of each agent and the virtual leader in the multi-agent system;

the fuzzy evaluation module is used for designing a fuzzy evaluation cost function from the multi-agent course tracking error and a preset tracking performance threshold, and for completing the design of the fuzzy evaluation adaptive update rate;

the virtual controller construction module is used for designing a virtual control function of the multi-agent system by utilizing the error between the output signal and the reference signal, and designing a virtual controller according to the virtual control function;

the multi-gradient recursion reinforcement learning self-adaptive update rate module is used for obtaining a multi-gradient recursion self-adaptive update rate based on the fuzzy evaluation module information and the strategy utility function;

the multi-gradient recursion reinforcement learning fuzzy controller construction module is used for obtaining the course consistency controller of the multi-agent system through the multi-agent course tracking transformation system, the course tracking error dynamic state, the fuzzy evaluation cost function, the fuzzy evaluation self-adaption updating rate, the virtual control function, the strategy utility function and the multi-gradient recursion self-adaption updating rate.

In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:

Compared with the prior art, on the one hand, the invention considers a multi-agent ship course system and uses a fuzzy evaluation signal together with a multi-gradient recursive reinforcement learning controller to solve the multi-agent course tracking consistency control problem. This effectively reduces controller energy consumption and steering engine wear, is better suited to multi-agent ship motion control problems characterized by large time lag, large inertia, and nonlinearity, and improves the speed and accuracy of multi-agent course tracking while achieving the optimized control objective with lower system energy consumption. On the other hand, the invention provides a multi-gradient recursive learning algorithm that addresses the local-extremum problem in learning the fuzzy logic system weights, so that the weights converge faster and more accurately, improving the reliability and stability of the system.

Drawings

FIG. 1 is a flow chart of a control method of the present invention.

FIG. 2 is a communication topology diagram of the multi-agent system of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

FIG. 1 is a flow chart of a control method of the present invention. As shown in FIG. 1, the present invention discloses a multi-gradient recursive reinforcement learning fuzzy control method for a multi-agent system, which specifically comprises the following steps:

Step 1: rudder angle data measured by the steering engines of the multi-agent ships and current course angle data measured by the compass are transmitted to the shipborne computer as the course information of the multi-agent system. Taking the ship's steady-state turning nonlinearity into account, the mathematical model of the multi-agent ship course discrete nonlinear control system is established as formula (1).

In formula (1), $\xi_{i,1}(k)$ is the course angle of the $i$-th agent in the multi-agent system, where $i = 1, \dots, N$ is the index of the agent in the multi-agent system, the subscript 1 denotes the first subsystem, and $k$ is the time; $\xi_{i,2}(k)$ is the rate of change of the course angle, where the subscript 2 denotes the second subsystem; $u_i(k)$ is the rudder angle input; $y_i(k)$ is the output of the system; $g_i = K_i/T_i$ is the control gain, where $K_i$ is the ship turning index and $T_i$ is the ship following index; $f_{i,2}(\xi_{i,2}(k))$ is an unknown nonlinear function; and $d_i(k)$ is an unknown but bounded external disturbance whose bound is an unknown positive number;

Step 2: designing the multi-agent system ship course transformation system: the multi-agent shipborne computer designs the course tracking error variables of formula (2) by using the course information.

In formula (2), the error variable with subscript $i,1$ is the course angle dynamic error between the $i$-th agent, the $j$-th agents, and the reference signal in the multi-agent system; the error variable with subscript $i,2$ is the error between the state variable $\xi_{i,2}(k)$ of the $i$-th agent and the virtual controller $\alpha_{i,1}(k)$; $a_{i,j}$ is the connection weight between the $i$-th agent and the $j$-th agent; $a_{i,0}$ is the connection weight between the $i$-th agent and the virtual leader in the multi-agent system; $y_d(k)$ is the smooth, bounded reference trajectory of the virtual leader; and $\alpha_{i,1}(k)$ is the virtual controller to be designed;
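The two error variables described for formula (2) can be illustrated in code. Since the formula itself is not reproduced in this text, the sketch below assumes the neighborhood-error form commonly used in leader-following consensus, in which differences to neighbors and to the leader are weighted by the connection weights; the function name `tracking_errors` and all numeric values are hypothetical.

```python
def tracking_errors(i, y, y_d, a, a0, xi2_i, alpha_i1):
    """Return the two error variables of agent i.

    ASSUMED form (not taken verbatim from the patent):
      first error  = sum_j a[i][j] * (y[i] - y[j]) + a0[i] * (y[i] - y_d)
      second error = xi2_i - alpha_i1
    y: outputs (course angles) of all agents, y_d: leader reference,
    a: agent-to-agent connection weights, a0: agent-to-leader weights.
    """
    neighbor_part = sum(a[i][j] * (y[i] - y[j]) for j in range(len(y)))
    leader_part = a0[i] * (y[i] - y_d)
    return neighbor_part + leader_part, xi2_i - alpha_i1

# Hypothetical three-agent example; only agent 0 is connected to the leader.
y = [1.0, 2.0, 4.0]
a = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
a0 = [1, 0, 0]

e1, e2 = tracking_errors(0, y, y_d=0.5, a=a, a0=a0, xi2_i=0.2, alpha_i1=0.05)
print(e1, e2)
```

In this form an agent without a direct link to the leader (a zero leader weight) still tracks the reference indirectly through its neighbors, which is why the connection weights enter the error definition.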

Step 3: designing the fuzzy evaluation module of the multi-agent system: based on the course tracking dynamic error (the error variable with subscript $i,1$) from the multi-agent shipborne computer and a preset tracking performance threshold, the utility function $\pi_i(k)$ is designed.

In the formula, the tracking performance threshold is greater than 0; $\pi_i(k) = 0$ means that the tracking performance is acceptable, and $\pi_i(k) = 1$ means that the tracking performance is unacceptable. Using the utility function $\pi_i(k)$, the shipborne computer designs the strategy utility function $M_i(k)$.

In the formula, $\theta_{i,c}$ is the ideal adjustable parameter, which satisfies a bound given by an unknown positive number; $\theta_{i,c}^{T}$ is the transpose of the weight vector $\theta_{i,c}$; the fuzzy basis function is bounded and appears together with its transpose; and $v_{i,c}(k)$ is the approximation error, which satisfies a bound given by an unknown positive number;

further, the Bellman error is defined.

In the formula, the estimated weight vector is the estimate of the ideal parameter $\theta_{i,c}$ and appears together with its transpose, and the estimated strategy utility function is the estimate of $M_i(k)$;

according to formula (7), the cost function is defined, and the adaptive update rate of the evaluation module is designed.

In the formula, the parameter with subscript $i,c$ is greater than 0 and $0 < \gamma_i < 1$, both being parameters to be designed; the fuzzy basis function is bounded; $l$ is the gradient index; and $p$ is a positive integer denoting the gradient length;

Step 4: designing the virtual controller $\alpha_{i,1}(k)$ of the multi-agent system and the multi-gradient recursive adaptive update rate of the fuzzy execution module: the virtual controller is designed, and the strategy utility function of the fuzzy execution module is defined.

In the formula, $S_{i,2}(k) = [\xi_{i,1}(k), \xi_{i,2}(k), y_d(k)]^{T}$; the fuzzy basis function is bounded; and the weight estimate is the estimate of the ideal parameter and appears together with its transpose;

according to formula (10), the cost function is defined, and the multi-gradient recursive adaptive update rate is designed.

In the formula, $\mu_{i,2} > 0$ is a parameter to be designed, and the fuzzy basis function is bounded and satisfies an inequality.

Step 5: determining the multi-gradient recursive reinforcement learning controller of the multi-agent system: combining the obtained virtual controller $\alpha_{i,1}(k)$ with the adaptive update rate of the evaluation module and the multi-gradient recursive adaptive update rate, the multi-agent shipborne computer obtains the actual control input command $u_i(k)$ of the system.

In the formula, the parameters satisfy $c_{i,1} > 0$ and $c_{i,2} > 0$.

The invention also provides a multi-gradient recursion reinforcement learning fuzzy control system of the multi-agent system, which comprises the following components:

the data acquisition unit is used for acquiring course information in the ship navigation process, wherein the course information comprises rudder angle data and current course angle data;

the data transmission unit is used for transmitting the collected course information of the multi-agent ship in the sailing process to the onboard computer;

the multi-agent shipborne computer is used for processing the collected course information of the multi-agent ship in the sailing process and finishing multi-gradient recursive reinforcement learning fuzzy control of the course of the multi-agent ship, and specifically comprises the following steps:

the multi-agent ship course tracking system mathematical model building module is used for building a multi-agent ship system mathematical model between the input and the output of the system based on the multi-agent course information;

the virtual controller building module is used for designing a virtual control function of the multi-agent system by utilizing the error between the output signal and the reference signal and designing a virtual controller according to the virtual control function;

the multi-gradient recursion reinforcement learning self-adaptive update rate module is used for obtaining a multi-gradient recursion self-adaptive update rate based on the fuzzy evaluation module information and the strategy utility function;

a multi-gradient recursion reinforcement learning fuzzy controller building module used for obtaining a course consistency controller of the multi-agent system through the multi-agent course tracking transformation system, the course tracking error dynamic state, the fuzzy evaluation cost function, the fuzzy evaluation self-adaptive update rate, the virtual control function, the strategy utility function and the multi-gradient recursion self-adaptive update rate,

and the data feedback unit is used for transmitting the rudder angle command computed by the multi-agent shipborne computer to the multi-agent ship steering engine module, and the steering engine module outputs the multi-agent ship course angles to realize course consistency control of the multi-agent system.

FIG. 2 is an example of a communication topology of a multi-agent system employed by the present invention. In the figure, node 0 is the virtual leader, and nodes 1, 2, 3, and 4 are individual agents in the multi-agent system. Information flows only one way between the virtual leader and an individual agent, while the information flow between individual agents can be one-way or two-way. An individual agent can only receive information from its neighbors and cannot obtain the information of all individuals. The communication topology graph contains a spanning tree, which satisfies the necessary condition for achieving consistency control.
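The spanning-tree condition can be checked mechanically. The sketch below treats the topology as a directed graph and tests whether every agent is reachable from the virtual leader by following the direction of information flow, which is equivalent to the graph containing a directed spanning tree rooted at the leader. The edge list is a hypothetical example and is not read from FIG. 2.

```python
from collections import deque

def has_spanning_tree(num_nodes, edges, root=0):
    """Return True if every node is reachable from root along directed edges.

    edges: (sender, receiver) pairs; information flows from sender to receiver.
    """
    successors = {node: [] for node in range(num_nodes)}
    for sender, receiver in edges:
        successors[sender].append(receiver)

    seen = {root}
    queue = deque([root])
    while queue:                       # breadth-first search from the leader
        node = queue.popleft()
        for nxt in successors[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) == num_nodes

# Hypothetical topology: leader 0 informs agent 1; agents 1 and 2 exchange
# information in both directions; agent 2 informs 3; agent 3 informs 4.
edges = [(0, 1), (1, 2), (2, 1), (2, 3), (3, 4)]
print(has_spanning_tree(5, edges))
```

If any agent were cut off from the leader's information, for example by removing the link from agent 1 to agent 2 in this example, the check would fail and the reference signal could not propagate to the whole group.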

While the invention has been described with reference to specific embodiments, any feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise; all of the disclosed features, or all of the method or process steps, may be combined in any combination, except mutually exclusive features and/or steps.

Claims

1. The multi-gradient recursion reinforcement learning fuzzy control method of the multi-agent system is characterized by comprising the following steps:

S2, the multi-agent shipborne computer obtains the multi-agent course tracking dynamic error and the multi-agent system course tracking transformation system based on (i) the course angle dynamic error between the agents and the virtual leader reference signal and (ii) the dynamic error between the course angle change rate of the agents and the virtual controller;

S3, according to the multi-agent course tracking dynamic error and the tracking performance threshold, designing a utility function in the fuzzy evaluation module and using it to obtain a strategy utility function, obtaining the cost function of the fuzzy evaluation module by using the universal approximation principle of the fuzzy logic system and the Bellman principle, and designing the adaptive update rate of the fuzzy evaluation module based on the multi-gradient recursion method;

S5, designing the course consistency controller of the multi-agent system from the multi-agent course tracking transformation system, the course tracking dynamic error, the fuzzy evaluation cost function, the fuzzy evaluation adaptive update rate, the virtual control function, the strategy utility function, and the multi-gradient recursive adaptive update rate, thereby obtaining the control input rudder angle of the multi-agent system, and transmitting the rudder angle command to the steering engines of the multi-agent ships to output the multi-agent ship course angles, thereby realizing course consistency control of the multi-agent system.

2. The multi-gradient recursive reinforcement learning fuzzy control method of the multi-agent system as claimed in claim 1, wherein in step S1, the multi-agent shipborne computer uses the collected rudder angle information and course angle information to establish a mathematical model of the multi-agent ship course discrete nonlinear control system by considering the ship steady-state rotation nonlinear characteristic, and the specific formula is as follows:

In formula (1), $\xi_{i,1}(k)$ is the course angle of the $i$-th agent in the multi-agent system, where $i = 1, \dots, N$ is the index of the agent in the multi-agent system, the subscript 1 denotes the first subsystem, and $k$ is the time; $\xi_{i,2}(k)$ is the rate of change of the course angle, where the subscript 2 denotes the second subsystem; $u_i(k)$ is the rudder angle input; $y_i(k)$ is the output of the system; $g_i = K_i/T_i$ is the control gain, where $K_i$ is the ship turning index and $T_i$ is the ship following index; $f_{i,2}(\xi_{i,2}(k))$ is an unknown nonlinear function; and $d_i(k)$ is an unknown but bounded external disturbance whose bound is an unknown positive number.

3. The multi-gradient recursive reinforcement learning fuzzy control method of the multi-agent system as claimed in claim 1, wherein the specific process of establishing the multi-agent system course tracking transformation system in the step S2 is as follows:

the multi-agent shipborne computer designs a course tracking dynamic error by utilizing course information:

In formula (2), the error variable with subscript $i,1$ is the course angle dynamic error between the $i$-th agent, the $j$-th agents, and the reference signal in the multi-agent system; the error variable with subscript $i,2$ is the error between the course angle change rate $\xi_{i,2}(k)$ of the $i$-th agent and the virtual control function $\alpha_{i,1}(k)$; $a_{i,j}$ is the connection weight between the $i$-th agent and the $j$-th agent; $a_{i,0}$ is the connection weight between the $i$-th agent and the virtual leader in the multi-agent system; and $y_d(k)$ is the smooth, bounded reference trajectory of the virtual leader.

4. The multi-gradient recursive reinforcement learning fuzzy control method of the multi-agent system as claimed in claim 1, wherein the fuzzy evaluation module of the multi-agent system in step S3 is specifically established as follows:

the strategy utility function $M_i(k)$ is obtained by using the universal approximation principle of the fuzzy logic system,

where $\theta_{i,c}$ is the ideal weight vector, which satisfies a bound given by an unknown positive number, and the subscript $c$ denotes the evaluation module; $\theta_{i,c}^{T}$ is the transpose of the weight vector $\theta_{i,c}$; the fuzzy basis function is bounded and appears together with its transpose; and $v_{i,c}(k)$ is the approximation error, which satisfies a bound given by an unknown positive number;

the Bellman error is defined,

where the estimated weight vector is the estimate of the ideal parameter $\theta_{i,c}$ and appears together with its transpose, and the estimated strategy utility function is the estimate of $M_i(k)$;

according to formula (7), the cost function is defined, and the adaptive update rate of the evaluation module is designed,

where the design parameter with subscript $i,c$ is greater than 0 and $0 < \gamma_i < 1$; the fuzzy basis function is bounded; $T$ denotes the transpose; $l$ is the gradient index; and $p$ is a positive integer denoting the gradient length.

5. The multi-gradient recursive reinforcement learning fuzzy control method of the multi-agent system as claimed in claim 1, wherein the virtual controller $\alpha_{i,1}(k)$ of the multi-agent system and the multi-gradient recursive adaptive update rate of the fuzzy execution module in step S4 are specifically established as follows:

the virtual controller is designed, and the strategy utility function of the fuzzy execution module is defined,

where $S_{i,2}(k) = [\xi_{i,1}(k), \xi_{i,2}(k), y_d(k)]^{T}$; the fuzzy basis function is bounded; and the weight estimate is the estimate of the ideal parameter and appears together with its transpose;

according to formula (10), the cost function is defined, and the multi-gradient recursive adaptive update rate is designed,

where $\mu_{i,2} > 0$, $S_{i,2}(k-l+1) = [\xi_{i,1}(k-l+1), \xi_{i,2}(k-l+1), y_d(k-l+1)]^{T}$, and the fuzzy basis function is bounded and satisfies an inequality.

6. The multi-gradient recursive reinforcement learning fuzzy control method of the multi-agent system as claimed in claim 1, wherein the control input rudder angle in step S5 is specifically solved as follows: the multi-gradient recursive reinforcement learning controller of the multi-agent system is determined, and, combining the obtained virtual controller $\alpha_{i,1}(k)$ with the adaptive update rate of the evaluation module and the multi-gradient recursive adaptive update rate, the multi-agent shipborne computer calculates the actual control input $u_i(k)$ of the system,

where the parameters satisfy $c_{i,1} > 0$ and $c_{i,2} > 0$, and $S_{i,2}(k-l+1) = [\xi_{i,1}(k-l+1), \xi_{i,2}(k-l+1), a_{i,0} y_0(k-l+1)]$.

7. The multi-gradient recursive reinforcement learning fuzzy control method of the multi-agent system as claimed in claim 4, wherein the tracking performance threshold in step S3 is designed according to actual requirements.

8. A multi-gradient recursion reinforcement learning fuzzy control device of a multi-agent system comprises a data acquisition unit, a data transmission unit, a multi-agent shipborne computer and a data feedback unit;

the data acquisition unit is used for acquiring course information in the ship navigation process; the data transmission unit is used for transmitting the collected course information of the multi-agent ship in the sailing process to the onboard computer; the multi-agent shipborne computer is used for processing the collected course information of the multi-agent ship in the sailing process and completing multi-gradient recursive reinforcement learning fuzzy control of the course of the multi-agent ship; the data feedback unit is used for transmitting rudder angle instructions obtained by the multi-agent ship-borne computer to the multi-agent ship steering engine to output multi-agent ship course angles, so as to realize course track tracking consistency control of the multi-agent system,

the system is characterized in that the multi-agent shipborne computer comprises a multi-agent ship course tracking system mathematical model building module, a multi-agent ship course system tracking error module, a fuzzy evaluation module, a virtual controller building module, a multi-gradient recursion reinforcement learning self-adaption updating rate module, a multi-gradient recursion reinforcement learning fuzzy controller building module and a data feedback unit;