CN111948937B - Multi-gradient recursive reinforcement learning fuzzy control method and system of multi-agent system - Google Patents

Publication number
CN111948937B
CN111948937B CN202010697834.8A CN202010697834A CN111948937B CN 111948937 B CN111948937 B CN 111948937B CN 202010697834 A CN202010697834 A CN 202010697834A CN 111948937 B CN111948937 B CN 111948937B
Authority
CN
China
Prior art keywords
agent
course
fuzzy
ship
gradient
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010697834.8A
Other languages
Chinese (zh)
Other versions
CN111948937A (en
Inventor
李铁山
龙跃
程玉华
李美霖
李耀仑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Application filed by University of Electronic Science and Technology of China
Priority to CN202010697834.8A
Publication of CN111948937A
Application granted
Publication of CN111948937B

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/0265 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
    • G05B13/0275 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using fuzzy logic only
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/042 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/0206 Control of position or course in two dimensions specially adapted to water vehicles

Abstract

The invention provides a multi-gradient recursive reinforcement learning fuzzy control method and device for a multi-agent system, and belongs to the technical field of ship control for multi-agent systems. The invention is mainly aimed at a discrete multi-agent ship course system: through multi-gradient recursive reinforcement learning fuzzy control, it achieves the optimized control objective with lower system energy consumption while improving the speed and precision of multi-agent course tracking. In addition, the invention provides a multi-gradient recursive learning algorithm that addresses the local-extremum problem in learning the fuzzy logic system weights, so that the weights converge faster and more accurately, which improves the reliability and stability of the system.

Description

Multi-gradient recursive reinforcement learning fuzzy control method and system of multi-agent system
Technical Field
The invention belongs to the technical field of ship control of a multi-agent system, and particularly relates to a multi-gradient recursive reinforcement learning fuzzy control method and system of the multi-agent system.
Background
The motion of an intelligent ship is characterized by large time lag, large inertia and nonlinearity. Changes in speed and loading perturb the parameters of the control model, and changes in navigation conditions, environmental disturbances and measurement inaccuracy introduce uncertainty into the course control system. To deal with the problems caused by these nonlinear uncertain dynamics, intelligent algorithms have been applied continuously in the field of intelligent ship course control, for example adaptive control, robust control, fuzzy adaptive control, iterative sliding mode control and the minimal-parameter learning method. Consider a multi-agent system composed of several intelligent ships, each of which has independent dynamics and can interact with the environment. The complex multi-ship course control problem is then transformed: a reference signal is used as a virtual leader, and course consistency control of the multi-agent system is achieved at low cost. At present, most existing course consistency design methods for multi-agent ship systems based on the fuzzy logic system do not consider the problem of the weights converging to local extrema, and because of the large inertia of the ship the course tracking speed is slow, which leads to high controller energy consumption and serious steering engine wear. In addition, existing course consistency control results for multi-agent systems seldom consider the trade-off between control performance and control cost, so the cost of use is high, which hinders engineering implementation.
Disclosure of Invention
In view of the problems in the background art, the present invention provides a multi-gradient recursive reinforcement learning fuzzy control method and apparatus for a multi-agent system. The invention mainly aims at the multi-agent ship course discrete system, and can effectively reduce the energy consumption of the controller, reduce the abrasion of the steering engine and improve the course consistency control speed and precision of the multi-agent ship course through multi-gradient recursion reinforcement learning fuzzy control.
In order to achieve the purpose, the technical scheme of the invention is as follows:
the multi-gradient recursive reinforcement learning fuzzy control method of the multi-agent system comprises the following steps:
S1, transmitting the collected multi-agent course information to a shipborne computer, wherein the shipborne computer, taking the ship steady-state rotation nonlinear characteristic into account, establishes a mathematical model of the multi-agent ship course discrete nonlinear control system in terms of the ship course angle, and the course information comprises rudder angle information measured by the multi-agent ship steering engine and current course angle information measured by a compass;
S2, the multi-agent shipborne computer obtains the multi-agent course tracking dynamic error and the multi-agent system course tracking transformation system based on the course angle dynamic error between the agents and the virtual leader reference signal, and on the dynamic error between the course angle change rate of each agent and the virtual control function;
S3, in a fuzzy evaluation module, designing a utility function according to the multi-agent course tracking dynamic error and the tracking performance threshold and using it to obtain a strategy utility function, obtaining the cost function of the fuzzy evaluation module by using the universal approximation principle of the fuzzy logic system and the Bellman principle, and designing the fuzzy evaluation adaptive update rate based on a multi-gradient recursive method;
S4, in a fuzzy execution module, designing a virtual control function and a strategy utility function of the multi-agent system according to the connection weights of the agents in the multi-agent system, and designing the adaptive update rate of the fuzzy execution module based on the multi-gradient recursive method;
S5, designing the course consistency controller of the multi-agent system from the multi-agent course tracking transformation system, the course tracking error dynamics, the fuzzy evaluation cost function, the fuzzy evaluation adaptive update rate, the virtual control function, the strategy utility function and the multi-gradient recursive adaptive update rate, thereby obtaining the control input rudder angle of the multi-agent system, and transmitting the rudder angle instruction to the multi-agent ship steering engine, which outputs the multi-agent ship course angle, thereby realizing course consistency control of the multi-agent system.
Further, in step S1, establishing a mathematical model of the multi-agent ship heading discrete nonlinear control system, which includes the specific processes:
the multi-agent shipborne computer utilizes the collected rudder angle information and course angle information, considers the ship steady-state rotation nonlinear characteristic, and establishes a multi-agent nonlinear discrete system mathematical model as follows:
Figure GDA0003023422650000021
in the formula (1), $\xi_{i,1}(k)$ is the course angle of the $i$-th agent in the multi-agent system, where $i = 1, \dots, N$ is the sequence number of the agent, the subscript 1 denotes the first subsystem, and $k$ is the time; $\xi_{i,2}(k)$ is the rate of change of the course angle, where the subscript 2 denotes the second subsystem; $u_i(k)$ is the rudder angle input; $y_i(k)$ is the output of the system; $g_i = K_i/T_i$ is the control gain, where $K_i$ is the ship turning index and $T_i$ is the ship following index; $f_{i,2}(\cdot)$ is an unknown nonlinear function; and $d_i(k)$ is an unknown but bounded external disturbance that satisfies
Figure GDA0003023422650000022
Figure GDA0003023422650000023
Is an unknown positive number;
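To illustrate the kind of model that formula (1) describes, the following Python sketch advances a second-order discrete course model by one sampling step. The Euler discretization of a Norrbin-type model, the function name `heading_step`, the cubic damping term and all numerical values (`K`, `T`, `alpha`, `h`) are assumptions made for this sketch and are not taken from formula (1); only the control gain $g_i = K_i/T_i$ follows the text.

```python
def heading_step(psi, r, u, K=0.5, T=20.0, alpha=0.1, h=0.1, d=0.0):
    """Advance an ASSUMED discrete course model by one step.

    psi: course angle (rad), r: course angle change rate (rad/s),
    u: rudder angle input (rad), d: bounded external disturbance.
    K, T, alpha and h are hypothetical values chosen for illustration.
    """
    f = -(r + alpha * r ** 3) / T   # assumed nonlinear rotation dynamics
    g = K / T                       # control gain g = K / T, as in the text
    psi_next = psi + h * r          # Euler step for the course angle
    r_next = r + h * (f + g * u + d)  # Euler step for the change rate
    return psi_next, r_next


# Usage: hold a constant 0.1 rad rudder angle for 100 steps.
psi, r = 0.0, 0.0
for _ in range(100):
    psi, r = heading_step(psi, r, u=0.1)
```

In this form the unknown function of the text corresponds to `f` and the external disturbance to `d`; a controller would only see `psi` and `r`, not the parameters.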
further, the specific process of establishing the multi-agent system course tracking transformation system in the step S2 is as follows:
the multi-agent shipborne computer designs a course tracking error variable by utilizing course information:
Figure GDA0003023422650000024
in the formula (2), $\delta_{i,1}(k)$ is the course angle dynamic error between the $i$-th agent, the $j$-th agent and the reference signal in the multi-agent system; $\delta_{i,2}(k)$ is the error variable between the course angle change rate $\xi_{i,2}(k)$ of the $i$-th agent and the virtual control function $\alpha_{i,1}(k)$; $a_{i,j}$ is the connection weight between the $i$-th agent and the $j$-th agent; $a_{i,0}$ is the connection weight between the $i$-th agent and the virtual leader in the multi-agent system; and $y_d(k)$ is the smooth, bounded reference trajectory of the virtual leader;
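The role of the connection weights can be sketched in Python as follows. The sketch assumes the standard neighborhood consensus error, in which each agent sums its weighted output differences to its neighbors and to the virtual leader; this form, its sign convention and the function name `consensus_errors` are assumptions and may differ from formula (2).

```python
def consensus_errors(y, y_d, A, a0):
    """Neighborhood course errors for all agents (ASSUMED standard form).

    y:   list of agent outputs y_i(k)
    y_d: virtual leader reference y_d(k)
    A:   matrix of connection weights, A[i][j] = a_{i,j}
    a0:  list of leader connection weights, a0[i] = a_{i,0}
    """
    n = len(y)
    errors = []
    for i in range(n):
        # weighted disagreement with neighboring agents
        e = sum(A[i][j] * (y[i] - y[j]) for j in range(n))
        # weighted disagreement with the virtual leader
        e += a0[i] * (y[i] - y_d)
        errors.append(e)
    return errors


# Usage: two agents, only agent 0 is connected to the leader.
errs = consensus_errors([1.0, 2.0], 0.0, [[0, 1], [1, 0]], [1, 0])
```

When every agent output equals the leader reference, all errors are zero, which is the consistency condition the controller aims for.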
in order to facilitate the course consistency control design of the multi-agent system and avoid the problem of no correlation of subsystems, the system transformation is carried out on the formula (1) to establish a multi-agent course tracking transformation system:
Figure GDA0003023422650000031
further, the specific establishment process of the fuzzy evaluation module of the multi-agent system in the step S3 is as follows:
based on the course angle dynamic error $\delta_{i,1}(k)$ and the tracking performance threshold $\varepsilon$, the multi-agent shipborne computer designs the utility function $\pi_i(k)$ as
Figure GDA0003023422650000032
where $\varepsilon > 0$; $\pi_i(k) = 0$ means that the tracking performance is acceptable, and $\pi_i(k) = 1$ means that it is unacceptable. Using the utility function $\pi_i(k)$, the strategy utility function $M_i(k)$ is designed as
Figure GDA0003023422650000033
where $0 < \gamma_i < 1$ is a design parameter and $L$ is the time range; formula (5) can be expressed as
Figure GDA0003023422650000034
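The binary utility function described above can be sketched in Python as follows. Whether the boundary case (error magnitude exactly equal to the threshold) counts as acceptable is an assumption, as are the function name `utility` and the use of the absolute value; the discounted strategy utility function built from it is not reproduced here.

```python
import math


def utility(delta, eps):
    """Binary utility: 0 = acceptable tracking, 1 = unacceptable tracking.

    delta: course angle dynamic error, eps: tracking threshold (eps > 0).
    Treating |delta| == eps as acceptable is an assumption of this sketch.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    return 0 if abs(delta) <= eps else 1


# Usage: a threshold of 5 degrees, expressed in radians.
eps = math.radians(5.0)
signal = [utility(d, eps) for d in (0.01, -0.2, 0.05)]
```

Because the signal only takes the values 0 and 1, the evaluation module receives a coarse success or failure indication rather than the full error.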
Using the universal approximation principle of the fuzzy logic system, the strategy utility function $M_i(k)$ can be obtained as
Figure GDA0003023422650000035
where $\theta_{i,c}$ is the ideal adjustable parameter satisfying
Figure GDA0003023422650000036
the subscript $c$ denotes the evaluation module,
Figure GDA0003023422650000037
is an unknown positive number,
Figure GDA0003023422650000038
is the transpose of the weight vector $\theta_{i,c}(k)$,
Figure GDA0003023422650000039
is a bounded fuzzy basis function that satisfies
Figure GDA00030234226500000310
Figure GDA00030234226500000311
is the transpose of
Figure GDA00030234226500000312
and $v_{i,c}(k)$ is the approximation error, which satisfies
Figure GDA00030234226500000313
Figure GDA00030234226500000314
is an unknown positive number;
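A fuzzy logic system used as an approximator is linear in its weights: the output is the inner product of a weight vector and a vector of bounded fuzzy basis functions. The following Python sketch uses normalized Gaussian membership functions, which is a common choice but an assumption here; the names `fuzzy_basis` and `fls_output`, the rule centers and the width are hypothetical.

```python
import math


def fuzzy_basis(s, centers, width):
    """Normalized Gaussian fuzzy basis functions (ASSUMED membership shape).

    s: input vector, centers: one center vector per fuzzy rule,
    width: common Gaussian width. Each returned element lies in [0, 1]
    and the elements sum to 1, so the basis vector is bounded.
    """
    acts = [
        math.exp(-sum((si - ci) ** 2 for si, ci in zip(s, c)) / width ** 2)
        for c in centers
    ]
    total = sum(acts)  # may underflow to 0 if s is very far from all centers
    return [a / total for a in acts]


def fls_output(theta, basis):
    """Fuzzy logic system output: inner product of weights and basis vector."""
    return sum(w * b for w, b in zip(theta, basis))


# Usage: two rules placed symmetrically around the input.
basis = fuzzy_basis([0.0], [[-1.0], [1.0]], 1.0)
value = fls_output([2.0, 4.0], basis)
```

The boundedness of the basis vector is what the inequality on the fuzzy basis function in the text expresses.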
further, the Bellman error
Figure GDA00030234226500000315
is defined as
Figure GDA0003023422650000041
where
Figure GDA0003023422650000042
Figure GDA0003023422650000043
is the estimate of the ideal parameter $\theta_{i,c}$,
Figure GDA0003023422650000044
is the transpose of
Figure GDA0003023422650000045
and
Figure GDA0003023422650000046
is the estimate of $M_i(k)$;
according to equation (7), the cost function is defined as
Figure GDA0003023422650000047
To minimize the cost function $\phi_{i,c}(k)$, the adaptive update rate of the evaluation module
Figure GDA0003023422650000048
is designed as
Figure GDA0003023422650000049
where $\mu_{i,c} > 0$ and $0 < \gamma_i < 1$,
Figure GDA00030234226500000410
Figure GDA00030234226500000411
is a bounded fuzzy basis function that satisfies
Figure GDA00030234226500000412
$T$ denotes transposition, $l$ is the gradient index, and the positive integer $p$ is the gradient length;
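The idea of a gradient index $l$ running over a gradient length $p$ can be illustrated with a generic multi-sample gradient step in Python. This sketch is not the patent's update rate: it assumes a plain squared-error cost on a linear-in-weights approximator and accumulates the gradients of the last $p$ stored samples before updating. The names `multi_gradient_update` and `history` and the numerical values are hypothetical.

```python
from collections import deque


def multi_gradient_update(theta, history, mu):
    """One weight update using the gradients of all stored samples.

    theta:   current weight estimate (list of floats)
    history: iterable of (basis_vector, target) pairs, at most p entries
    mu:      positive learning gain
    ASSUMPTION: cost = 0.5 * sum of squared output errors over the history.
    """
    grad = [0.0] * len(theta)
    for basis, target in history:  # gradient index l = 1..p
        error = sum(w * b for w, b in zip(theta, basis)) - target
        for n, b in enumerate(basis):
            grad[n] += error * b   # accumulate the gradient of sample l
    return [w - mu * g for w, g in zip(theta, grad)]


# Usage: keep only the p most recent samples in a bounded buffer.
p = 2
history = deque(maxlen=p)
history.append(([1.0, 0.0], 1.0))
history.append(([0.0, 1.0], 2.0))
theta = multi_gradient_update([0.0, 0.0], history, mu=0.5)
```

Using several past gradients instead of only the newest one averages out the influence of a single sample, which is one plausible reading of why the text associates the multi-gradient method with faster and more reliable weight convergence.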
further, the virtual control function $\alpha_{i,1}(k)$ of the multi-agent system in step S4 and the multi-gradient recursive adaptive update rate of the fuzzy execution module
Figure GDA00030234226500000413
are established as follows:
the virtual control function is designed as
Figure GDA00030234226500000414
The strategy utility function of the fuzzy execution module of the multi-agent system
Figure GDA00030234226500000415
is defined as
Figure GDA00030234226500000416
where $S_{i,2}(k) = [\xi_{i,1}(k), \xi_{i,2}(k), y_d(k)]^T$,
Figure GDA00030234226500000417
is a bounded fuzzy basis function that satisfies
Figure GDA00030234226500000418
Figure GDA00030234226500000419
is the estimate of the ideal parameter, and
Figure GDA00030234226500000420
is the transpose of
Figure GDA00030234226500000421
according to equation (10), the cost function is defined as
Figure GDA00030234226500000422
To minimize the cost function $\phi_{i,2}(k)$, the multi-gradient recursive adaptive update rate
Figure GDA00030234226500000423
is designed according to the multi-gradient recursive algorithm as
Figure GDA00030234226500000424
where $\mu_{i,2} > 0$, $S_{i,2}(k-l+1) = [\xi_{i,1}(k-l+1), \xi_{i,2}(k-l+1), y_d(k-l+1)]^T$, and
Figure GDA0003023422650000051
is a bounded fuzzy basis function that satisfies the inequality
Figure GDA0003023422650000052
Further, the specific process of solving for the control input rudder angle in step S5 is as follows:
determining the multi-gradient recursive reinforcement learning controller of the multi-agent system: using the obtained virtual controller $\alpha_{i,1}(k)$, the adaptive update rate of the evaluation module and the multi-gradient recursive adaptive update rate
Figure GDA0003023422650000053
and
Figure GDA0003023422650000054
the multi-agent shipborne computer obtains the actual control input $u_i(k)$ of the system as:
Figure GDA0003023422650000055
where the parameters $c_{i,1} > 0$, $c_{i,2} > 0$, and $S_{i,2}(k-l+1) = [\xi_{i,1}(k-l+1), \xi_{i,2}(k-l+1), a_{i,0} y_0(k-l+1)]$.
Further, the virtual leader reference signal in step S2 and the tracking performance threshold $\varepsilon$ in step S3 are designed according to actual requirements; preferably, $\varepsilon$ is less than 5 degrees.
A multi-gradient recursion reinforcement learning fuzzy control system of a multi-agent system comprises a data acquisition unit, a data transmission unit, a multi-agent shipborne computer and a data feedback unit;
the data acquisition unit is used for acquiring course information during ship navigation; the data transmission unit is used for transmitting the collected course information of the multi-agent ships during navigation to the shipborne computer; the multi-agent shipborne computer is used for processing the collected course information and completing the multi-gradient recursive reinforcement learning fuzzy control of the multi-agent ship course; the data feedback unit is used for transmitting the rudder angle instruction obtained by the multi-agent shipborne computer to the multi-agent ship steering engine, which outputs the multi-agent ship course angle, thereby realizing course tracking consistency control of the multi-agent system;
the multi-agent shipborne computer comprises a multi-agent ship course tracking system mathematical model building module, a multi-agent ship course system tracking error module, a fuzzy evaluation module, a virtual controller building module, a multi-gradient recursion reinforcement learning self-adaptive update rate module and a multi-gradient recursion reinforcement learning fuzzy controller building module;
the multi-agent ship course tracking system mathematical model building module is used for building a multi-agent ship course discrete nonlinear control system mathematical model between the input and the output of the system based on the multi-agent course information;
the multi-agent ship course system tracking error module is used for constructing a multi-agent course tracking error dynamic model and a transformation system based on the course information of each agent and the virtual leader in the multi-agent system;
the fuzzy evaluation module is used for designing a fuzzy evaluation cost function from the multi-agent course tracking error and a preset tracking performance threshold, and for completing the design of the fuzzy evaluation adaptive update rate;
the virtual controller construction module is used for designing a virtual control function of the multi-agent system by utilizing the error between the output signal and the reference signal, and designing a virtual controller according to the virtual control function;
the multi-gradient recursion reinforcement learning self-adaptive update rate module is used for obtaining a multi-gradient recursion self-adaptive update rate based on the fuzzy evaluation module information and the strategy utility function;
the multi-gradient recursion reinforcement learning fuzzy controller construction module is used for obtaining the course consistency controller of the multi-agent system through the multi-agent course tracking transformation system, the course tracking error dynamic state, the fuzzy evaluation cost function, the fuzzy evaluation self-adaption updating rate, the virtual control function, the strategy utility function and the multi-gradient recursion self-adaption updating rate.
In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:
Compared with the prior art, on the one hand, the invention considers a multi-agent ship course system and uses a fuzzy evaluation signal together with a multi-gradient recursive reinforcement learning controller to solve the multi-agent course tracking consistency control problem. This effectively reduces controller energy consumption and steering engine wear, is better suited to multi-agent ship motion control problems characterized by large time lag, large inertia and nonlinearity, and improves the speed and precision of multi-agent course tracking while achieving the optimized control objective with lower system energy consumption. On the other hand, the invention provides a multi-gradient recursive learning algorithm that addresses the local-extremum problem in learning the fuzzy logic system weights, so that the weights converge faster and more accurately, which improves the reliability and stability of the system.
Drawings
FIG. 1 is a flow chart of a control method of the present invention.
FIG. 2 is a communication topology diagram of the multi-agent system of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
FIG. 1 is a flow chart of a control method of the present invention. As shown in FIG. 1, the present invention discloses a multi-gradient recursive reinforcement learning fuzzy control method for a multi-agent system, which specifically comprises the following steps:
step 1: transmitting the multi-agent system course information to the shipborne computer, wherein the course information comprises rudder angle data measured by the multi-agent ship steering engine and current course angle data measured by the compass; taking the steady-state rotation nonlinear characteristic of the ship into account, the mathematical model of the multi-agent ship course nonlinear discrete system is established as:
Figure GDA0003023422650000071
in the formula (1), $\xi_{i,1}(k)$ is the course angle of the $i$-th agent in the multi-agent system, where $i = 1, \dots, N$ is the sequence number of the agent, the subscript 1 denotes the first subsystem, and $k$ is the time; $\xi_{i,2}(k)$ is the rate of change of the course angle, where the subscript 2 denotes the second subsystem; $u_i(k)$ is the rudder angle input; $y_i(k)$ is the output of the system; $g_i = K_i/T_i$ is the control gain, where $K_i$ is the ship turning index and $T_i$ is the ship following index; $f_{i,2}(\cdot)$ is an unknown nonlinear function; and $d_i(k)$ is an unknown but bounded external disturbance that satisfies
Figure GDA0003023422650000072
Figure GDA0003023422650000073
Is an unknown positive number;
step 2: designing a multi-agent system ship course transformation system: the multi-agent shipborne computer designs a course tracking error variable by utilizing course information:
Figure GDA0003023422650000074
in the formula (2), $\delta_{i,1}(k)$ is the course angle dynamic error between the $i$-th agent, the $j$-th agent and the reference signal in the multi-agent system; $\delta_{i,2}(k)$ is the error variable between the state variable $\xi_{i,2}(k)$ of the $i$-th agent and the virtual controller $\alpha_{i,1}(k)$; $a_{i,j}$ is the connection weight between the $i$-th agent and the $j$-th agent; $a_{i,0}$ is the connection weight between the $i$-th agent and the virtual leader in the multi-agent system; $y_d(k)$ is the smooth, bounded reference trajectory of the virtual leader; and $\alpha_{i,1}(k)$ is the virtual controller to be designed;
in order to facilitate the course consistency control design of the multi-agent system and avoid the problem of no correlation of subsystems, the system transformation is carried out on the formula (1) to establish a multi-agent course tracking transformation system:
Figure GDA0003023422650000075
step 3: designing a fuzzy evaluation module of the multi-agent system: based on the tracking dynamic error $\delta_{i,1}(k)$ and a preset tracking performance threshold $\varepsilon$, the multi-agent shipborne computer designs the utility function $\pi_i(k)$ as
Figure GDA0003023422650000076
where $\varepsilon > 0$; $\pi_i(k) = 0$ means that the tracking performance is acceptable, and $\pi_i(k) = 1$ means that it is unacceptable. Using the utility function $\pi_i(k)$, the shipborne computer designs the strategy utility function $M_i(k)$ as
Figure GDA0003023422650000081
where $0 < \gamma_i < 1$ is a design parameter and $L$ is the time range; formula (5) can be expressed as
Figure GDA0003023422650000082
Using the universal approximation principle of the fuzzy logic system, the strategy utility function $M_i(k)$ can be obtained as
Figure GDA0003023422650000083
where $\theta_{i,c}$ is the ideal adjustable parameter satisfying
Figure GDA0003023422650000084
Figure GDA0003023422650000085
is an unknown positive number,
Figure GDA0003023422650000086
is the transpose of the weight vector $\theta_{i,c}(k)$,
Figure GDA0003023422650000087
is a bounded fuzzy basis function that satisfies
Figure GDA0003023422650000088
Figure GDA0003023422650000089
is the transpose of
Figure GDA00030234226500000810
and $v_{i,c}(k)$ is the approximation error, which satisfies
Figure GDA00030234226500000811
Figure GDA00030234226500000812
is an unknown positive number;
further, the Bellman error
Figure GDA00030234226500000813
is defined as
Figure GDA00030234226500000814
where
Figure GDA00030234226500000815
Figure GDA00030234226500000816
is the estimate of the ideal parameter $\theta_{i,c}$,
Figure GDA00030234226500000817
is the transpose of
Figure GDA00030234226500000818
and
Figure GDA00030234226500000819
is the estimate of $M_i(k)$;
according to equation (7), the cost function is defined as
Figure GDA00030234226500000820
To minimize the cost function $\phi_{i,c}(k)$, the adaptive update rate of the evaluation module
Figure GDA00030234226500000821
is designed as
Figure GDA00030234226500000822
where $\mu_{i,c} > 0$ and $0 < \gamma_i < 1$ are parameters to be designed,
Figure GDA00030234226500000823
Figure GDA00030234226500000824
is a bounded fuzzy basis function that satisfies
Figure GDA00030234226500000825
$l$ is the gradient index, and the positive integer $p$ is the gradient length;
step 4: designing the virtual control function $\alpha_{i,1}(k)$ of the multi-agent system and the multi-gradient recursive adaptive update rate of the fuzzy execution module
Figure GDA0003023422650000091
The virtual control function is designed as
Figure GDA0003023422650000092
The strategy utility function of the fuzzy execution module of the multi-agent system
Figure GDA0003023422650000093
is defined as
Figure GDA0003023422650000094
where $S_{i,2}(k) = [\xi_{i,1}(k), \xi_{i,2}(k), y_d(k)]^T$,
Figure GDA0003023422650000095
is a bounded fuzzy basis function that satisfies
Figure GDA0003023422650000096
Figure GDA0003023422650000097
is the estimate of the ideal parameter, and
Figure GDA0003023422650000098
is the transpose of
Figure GDA0003023422650000099
according to equation (10), the cost function is defined as
Figure GDA00030234226500000910
To minimize the cost function $\phi_{i,2}(k)$, the multi-gradient recursive adaptive update rate
Figure GDA00030234226500000911
is designed according to the multi-gradient recursive algorithm as
Figure GDA00030234226500000912
where $\mu_{i,2} > 0$ is a parameter to be designed, $S_{i,2}(k-l+1) = [\xi_{i,1}(k-l+1), \xi_{i,2}(k-l+1), y_d(k-l+1)]^T$, and
Figure GDA00030234226500000913
is a bounded fuzzy basis function that satisfies the inequality
Figure GDA00030234226500000914
step 5: determining the multi-gradient recursive reinforcement learning controller of the multi-agent system: using the obtained virtual controller $\alpha_{i,1}(k)$, the adaptive update rate of the evaluation module and the multi-gradient recursive adaptive update rate
Figure GDA00030234226500000915
and
Figure GDA00030234226500000916
the multi-agent shipborne computer obtains the actual control input instruction $u_i(k)$ of the system as:
Figure GDA00030234226500000917
where the parameters $c_{i,1} > 0$, $c_{i,2} > 0$, and $S_{i,2}(k-l+1) = [\xi_{i,1}(k-l+1), \xi_{i,2}(k-l+1), a_{i,0} y_0(k-l+1)]$.
The invention also provides a multi-gradient recursion reinforcement learning fuzzy control system of the multi-agent system, which comprises the following components:
the data acquisition unit is used for acquiring course information in the ship navigation process, wherein the course information comprises rudder angle data and current course angle data;
the data transmission unit is used for transmitting the collected course information of the multi-agent ships during navigation to the shipborne computer; the data feedback unit is used for transmitting the rudder angle instruction computed by the multi-agent shipborne computer to the multi-agent ship steering engine module, and the steering engine module outputs the multi-agent ship course angle to realize course consistency control of the multi-agent system;
the multi-agent shipborne computer is used for processing the collected course information of the multi-agent ship in the sailing process and finishing multi-gradient recursive reinforcement learning fuzzy control of the course of the multi-agent ship, and specifically comprises the following steps:
the multi-agent ship course tracking system mathematical model building module is used for building a multi-agent ship system mathematical model between the input and the output of the system based on the multi-agent course information;
the multi-agent ship course system tracking error module is used for constructing a multi-agent course tracking error dynamic model and a transformation system based on the course information of each agent and the virtual leader in the multi-agent system;
the fuzzy evaluation module is used for designing a fuzzy evaluation cost function from the multi-agent course tracking error and a preset tracking performance threshold, and for completing the design of the fuzzy evaluation adaptive update rate;
the virtual controller building module is used for designing a virtual control function of the multi-agent system by utilizing the error between the output signal and the reference signal and designing a virtual controller according to the virtual control function;
the multi-gradient recursion reinforcement learning self-adaptive update rate module is used for obtaining a multi-gradient recursion self-adaptive update rate based on the evaluation fuzzy evaluation module information and the strategy utility function;
and the multi-gradient recursion reinforcement learning fuzzy controller building module is used for obtaining the course consistency controller of the multi-agent system through the multi-agent course tracking transformation system, the course tracking error dynamic state, the fuzzy evaluation cost function, the fuzzy evaluation self-adaption updating rate, the virtual control function, the strategy utility function and the multi-gradient recursion self-adaption updating rate.
FIG. 2 is an example of a communication topology of the multi-agent system employed by the present invention. In the figure, node 0 is the virtual leader, and nodes 1, 2, 3 and 4 are the individual agents of the multi-agent system. Only one-way information flow exists between the virtual leader and an agent, whereas the information flow between agents may be one-way or two-way. Each agent can only receive information from its neighbors and cannot obtain the information of all individuals. The communication topology graph contains a spanning tree, so the necessary condition for achieving consistency control is satisfied.
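The spanning-tree condition mentioned for FIG. 2 can be checked mechanically. The following is a minimal sketch, assuming that information flows from the virtual leader (node 0) toward the agents and that "a spanning tree exists" means every agent is reachable from the leader along directed edges. The edge list is hypothetical and is not the actual topology of FIG. 2; the function name `has_spanning_tree` is likewise illustrative.

```python
from collections import deque


def has_spanning_tree(edges, root, nodes):
    """Return True if every node is reachable from root along directed edges."""
    children = {n: [] for n in nodes}
    for src, dst in edges:
        children[src].append(dst)
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in children[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen == set(nodes)


# Hypothetical topology (src -> dst means dst receives information from src).
# Node 0 is the virtual leader; agents 1 and 2 exchange information both ways.
NODES = [0, 1, 2, 3, 4]
EDGES = [(0, 1), (1, 2), (2, 1), (2, 3), (3, 4)]

if __name__ == "__main__":
    print(has_spanning_tree(EDGES, root=0, nodes=NODES))
```

A breadth-first search is enough here because only reachability from the leader matters, not the edge weights.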
While the invention has been described with reference to specific embodiments, any feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise; all of the disclosed features, or all of the method or process steps, may be combined in any combination, except mutually exclusive features and/or steps.

Claims (3)

1. The multi-gradient recursion reinforcement learning fuzzy control method of the multi-agent system is characterized by comprising the following steps:
S1, the collected multi-agent course information is transmitted to the multi-agent shipborne computer, which, taking the nonlinear steady-state turning characteristic of the ship into account, establishes a mathematical model of the multi-agent ship course discrete nonlinear control system with respect to the ship course angle, the specific formula being as follows:
Figure FDA0003023422640000011
in formula (1), ξ_{i,1}(k) is the course angle of the i-th agent in the multi-agent system, where i = 1, ..., N is the index of the agent in the multi-agent system, the subscript 1 denotes the 1st subsystem, and k is the time step; ξ_{i,2}(k) is the rate of change of the course angle, where the subscript 2 denotes the 2nd subsystem; u_i(k) is the rudder angle input; y_i(k) is the output of the system; g_i = K_i/T_i is the control gain, where K_i is the ship's turning index and T_i is the ship's following index; f_{i,2}(·) is an unknown nonlinear function; and d_i(k) is an unknown but bounded external disturbance satisfying
Figure FDA0003023422640000012
where
Figure FDA0003023422640000013
is an unknown positive number;
the course information comprises rudder angle information measured at the steering engine of each multi-agent ship and current course angle information measured by the compass;
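The structure described in step S1 (a course angle state, a course angle rate state, and a control gain g_i = K_i/T_i) can be illustrated with a small simulation step. The figure for formula (1) is not reproduced in this text, so the sketch below is not the patent's formula. It assumes the well-known Norrbin steering model T·ψ'' + ψ' + α·ψ'³ = K·δ, discretized with a forward Euler step of size h; the function name and all parameter values (K, T, alpha, h) are hypothetical, and the external disturbance is omitted.

```python
def step_course_model(psi, r, rudder, K=0.5, T=20.0, alpha=10.0, h=0.1):
    """One forward-Euler step of the assumed Norrbin model.

    psi    : course angle (plays the role of xi_{i,1})
    r      : course angle rate (plays the role of xi_{i,2})
    rudder : rudder angle input (plays the role of u_i)
    """
    g = K / T                        # control gain, as in g_i = K_i / T_i
    f = -(r + alpha * r ** 3) / T    # assumed nonlinear term (unknown in the patent)
    psi_next = psi + h * r
    r_next = r + h * (f + g * rudder)
    return psi_next, r_next


if __name__ == "__main__":
    psi, r = 0.0, 0.0
    for _ in range(5):
        psi, r = step_course_model(psi, r, rudder=0.1)
    print(psi, r)
```

In the patent the nonlinear function is treated as unknown and is approximated by a fuzzy logic system; the explicit cubic term here only serves to make the example runnable.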
S2, the multi-agent shipborne computer obtains the multi-agent course tracking dynamic errors and the course tracking transformation system of the multi-agent system, based on the course angle dynamic error between the agents and the virtual leader reference signal and on the dynamic error between the course angle change rate of each agent and the virtual control function; the specific process is as follows:
the multi-agent shipborne computer designs the course tracking dynamic errors using the course information:
Figure FDA0003023422640000014
in formula (2), δ_{i,1}(k) is the course angle dynamic error of the i-th agent with respect to the j-th agent and the reference signal in the multi-agent system; δ_{i,2}(k) is the error variable between the course angle change rate ξ_{i,2}(k) of the i-th agent and the virtual control function α_{i,1}(k); a_{i,j} is the connection weight between the i-th agent and the j-th agent; a_{i,0} is the connection weight between the i-th agent and the virtual leader in the multi-agent system; and y_d(k) is the smooth, bounded reference trajectory of the virtual leader;
to facilitate the course consistency control design of the multi-agent system and to avoid the problem of no correlation between subsystems, a system transformation is applied to formula (1) to establish the multi-agent course tracking transformation system:
Figure FDA0003023422640000021
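The tracking errors of step S2 can be sketched numerically. The figure for formula (2) is not reproduced in this text, so the code below assumes the standard neighborhood consensus error, δ_{i,1} = Σ_j a_{i,j}(y_i − y_j) + a_{i,0}(y_i − y_d), together with δ_{i,2} = ξ_{i,2} − α_{i,1}, which matches the verbal definitions of the symbols but may differ in detail from the patent's formula. The function name and the 0-based agent indexing are illustrative.

```python
def tracking_errors(i, y, xi2, alpha1, a, a0, y_d):
    """Assumed consensus tracking errors for agent i (0-based index).

    y      : list of agent outputs (course angles)
    xi2    : list of course angle rates
    alpha1 : list of virtual control values
    a      : adjacency weights, a[i][j] is the weight between agents i and j
    a0     : a0[i] is the weight between agent i and the virtual leader
    y_d    : reference trajectory value of the virtual leader
    """
    delta1 = sum(a[i][j] * (y[i] - y[j]) for j in range(len(y)))
    delta1 += a0[i] * (y[i] - y_d)
    delta2 = xi2[i] - alpha1[i]
    return delta1, delta2


if __name__ == "__main__":
    y = [1.0, 2.0]
    a = [[0.0, 1.0], [1.0, 0.0]]
    a0 = [1.0, 0.0]          # only agent 0 hears the virtual leader
    print(tracking_errors(0, y, [0.3, 0.0], [0.1, 0.0], a, a0, y_d=0.5))
```

An agent with a0[i] = 0 never sees y_d directly; it is driven toward the reference only through its neighbors, which is why the spanning-tree condition on the topology matters.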
S3, according to the course tracking dynamic errors of the multi-agent system and the tracking performance threshold, a utility function based on the tracking performance threshold is designed in the fuzzy evaluation module and is used to obtain the strategy utility function; the cost function of the fuzzy evaluation module is obtained by utilizing the universal approximation principle of the fuzzy logic system and the Bellman principle; and the fuzzy evaluation self-adaptive update rate is designed based on the multi-gradient recursive method, wherein the fuzzy evaluation module is specifically established as follows:
based on the course angle dynamic error δ_{i,1}(k) and the tracking performance threshold ε, the multi-agent shipborne computer designs the utility function π_i(k) as
Figure FDA0003023422640000022
where ε > 0; π_i(k) = 0 means that the tracking performance is acceptable, and π_i(k) = 1 means that the tracking performance is unacceptable. Using the utility function π_i(k), the strategy utility function M_i(k) is designed as
Figure FDA0003023422640000023
where 0 < γ_i < 1 is a design parameter and L is the time range; formula (5) can be expressed as
Figure FDA0003023422640000024
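The binary utility and the strategy utility of step S3 can be sketched as follows. The text fixes only the meaning of the two utility values (0 acceptable, 1 unacceptable) and the roles of ε, γ_i and L; the exact formulas are in figures not reproduced here. The sketch therefore assumes that performance is acceptable when |δ_{i,1}(k)| ≤ ε and that the strategy utility is a discounted sum of the next L utility values. Both function names and both forms are assumptions.

```python
def utility(delta1, eps):
    """Assumed binary utility: 0 if |delta1| <= eps (acceptable), else 1."""
    return 0 if abs(delta1) <= eps else 1


def strategy_utility(future_pi, gamma):
    """Assumed discounted sum over a horizon of L = len(future_pi) steps.

    future_pi[m] is the utility m+1 steps ahead and is weighted by gamma**(m+1).
    """
    return sum(gamma ** (m + 1) * p for m, p in enumerate(future_pi))


if __name__ == "__main__":
    errors = [0.3, 0.05, -0.2]                     # hypothetical tracking errors
    pis = [utility(e, eps=0.1) for e in errors]
    print(pis, strategy_utility(pis, gamma=0.5))
```

A smaller γ_i makes the evaluation concentrate on near-term tracking violations, whereas a value close to 1 weighs the whole horizon almost equally.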
Using the universal approximation principle of the fuzzy logic system, the strategy utility function M_i(k) is obtained as follows,
Figure FDA0003023422640000025
in the formula, θ_{i,c} is the weight vector satisfying
Figure FDA0003023422640000026
where the subscript c denotes the evaluation module and
Figure FDA0003023422640000027
is an unknown positive number;
Figure FDA0003023422640000028
is the transpose of θ_{i,c}(k);
Figure FDA0003023422640000029
is a bounded fuzzy basis function satisfying
Figure FDA00030234226400000210
Figure FDA00030234226400000211
is the transpose of
Figure FDA00030234226400000212
and v_{i,c}(k) is the approximation error, satisfying
Figure FDA00030234226400000213
where
Figure FDA00030234226400000214
is an unknown positive number;
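The approximation used above has the linear-in-parameters form "weight vector transposed times bounded fuzzy basis function". A minimal sketch of such an approximator follows, assuming normalized Gaussian membership functions, which is a common choice but is not stated in this text. The centers, the width and the function names are hypothetical. With this normalization the basis entries are positive and sum to 1, so the squared norm of the basis vector lies in (0, 1], which is one way to obtain the boundedness the claim relies on.

```python
import math


def fuzzy_basis(s, centers, width=1.0):
    """Normalized Gaussian fuzzy basis vector for the input vector s."""
    acts = [
        math.exp(-sum((x - c) ** 2 for x, c in zip(s, ctr)) / width ** 2)
        for ctr in centers
    ]
    total = sum(acts)  # assumed nonzero: s must lie near at least one center
    return [a / total for a in acts]


def fls_output(theta, s, centers, width=1.0):
    """Fuzzy logic system output theta^T * phi(s)."""
    phi = fuzzy_basis(s, centers, width)
    return sum(t * p for t, p in zip(theta, phi))


if __name__ == "__main__":
    centers = [[-1.0], [0.0], [1.0]]               # hypothetical rule centers
    print(fuzzy_basis([0.3], centers))
    print(fls_output([0.0, 1.0, 2.0], [0.3], centers))
```

Only the weight vector is adapted online in this scheme; the centers and widths stay fixed, which keeps the update linear in the estimated parameters.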
the Bellman error
Figure FDA00030234226400000215
is defined as
Figure FDA00030234226400000216
In the formula,
Figure FDA00030234226400000217
Figure FDA00030234226400000218
is the estimate of the ideal parameter θ_{i,c},
Figure FDA00030234226400000219
is the transpose of
Figure FDA00030234226400000220
and
Figure FDA00030234226400000221
is the estimate of M_i(k);
according to formula (7), the cost function is defined as
Figure FDA0003023422640000031
To minimize the cost function φ_{i,c}(k), the fuzzy evaluation self-adaptive update rate
Figure FDA0003023422640000032
is designed as
Figure FDA0003023422640000033
In the formula, μ_{i,c} > 0, 0 < γ_i < 1,
Figure FDA0003023422640000034
Figure FDA0003023422640000035
is a bounded fuzzy basis function satisfying
Figure FDA0003023422640000036
T denotes transposition, l is the gradient index, and p is a positive integer denoting the gradient length;
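The idea of a multi-gradient recursive update, with gradient index l running over a gradient length p, can be sketched generically. The update formula itself is in a figure not reproduced here, so the code below is an assumption: it takes a quadratic cost 0.5·e² on a linear-in-parameters estimate and subtracts, in one step, the sum of the gradients evaluated at the p most recent samples (time indices k−l+1 for l = 1, ..., p). The `target` values stand in for whatever quantity the Bellman error compares against; the function name is illustrative.

```python
def multi_gradient_step(theta, history, mu, p):
    """One assumed multi-gradient update of the weight vector theta.

    history : list of (phi, target) pairs, newest last
    mu      : positive learning gain
    p       : gradient length (number of most recent samples used)
    """
    grad = [0.0] * len(theta)
    for phi, target in history[-p:]:
        # Error of the current estimate on a stored sample.
        err = sum(t * f for t, f in zip(theta, phi)) - target
        for n, f in enumerate(phi):
            grad[n] += err * f       # gradient of 0.5 * err**2 w.r.t. theta[n]
    return [t - mu * g for t, g in zip(theta, grad)]


if __name__ == "__main__":
    history = [([1.0, 0.0], 1.0), ([0.0, 1.0], 2.0)]   # hypothetical samples
    print(multi_gradient_step([0.0, 0.0], history, mu=0.5, p=2))
    print(multi_gradient_step([0.0, 0.0], history, mu=0.5, p=1))
```

With p = 1 this reduces to an ordinary single-gradient descent step; a larger p reuses past data in each step, at the cost of storing the last p basis vectors.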
S4, according to the connection weights of the agents in the multi-agent system, the virtual control function and the strategy utility function of the multi-agent system are designed in the fuzzy execution module, and the multi-gradient recursive self-adaptive update rate of the fuzzy execution module is designed based on the multi-gradient recursive method; the specific process is as follows:
the virtual control function is designed as
Figure FDA0003023422640000037
The strategy utility function
Figure FDA0003023422640000038
of the fuzzy execution module of the multi-agent system is defined as
Figure FDA0003023422640000039
In the formula, S_{i,2}(k) = [ξ_{i,1}(k), ξ_{i,2}(k), y_d(k)]^T,
Figure FDA00030234226400000310
is a bounded fuzzy basis function satisfying
Figure FDA00030234226400000311
Figure FDA00030234226400000312
is the estimate of the ideal parameter, and
Figure FDA00030234226400000313
is the transpose of
Figure FDA00030234226400000314
according to formula (10), the cost function is defined as
Figure FDA00030234226400000315
To minimize the cost function φ_{i,2}(k), the multi-gradient recursive self-adaptive update rate
Figure FDA00030234226400000316
of the fuzzy execution module is designed according to the multi-gradient recursive algorithm as
Figure FDA00030234226400000317
In the formula, μ_{i,2} > 0, S_{i,2}(k-l+1) = [ξ_{i,1}(k-l+1), ξ_{i,2}(k-l+1), y_d(k-l+1)]^T, and
Figure FDA0003023422640000041
is a bounded fuzzy basis function satisfying the inequality
Figure FDA0003023422640000042
S5, the course consistency controller of the multi-agent system is designed from the multi-agent course tracking transformation system, the course tracking dynamic error, the fuzzy evaluation cost function, the fuzzy evaluation adaptive update rate, the virtual control function, the strategy utility function and the multi-gradient recursive adaptive update rate of the fuzzy execution module, thereby obtaining the control input rudder angle of the multi-agent system; the rudder angle command is transmitted to the multi-agent ship steering engine, which outputs the multi-agent ship course angle, thereby achieving course consistency control of the multi-agent system. The control input rudder angle is solved as follows: the multi-gradient recursive reinforcement learning controller of the multi-agent system is determined by combining the obtained virtual control function α_{i,1}(k) with the multi-gradient recursive adaptive update rates of the evaluation module and the fuzzy execution module,
Figure FDA0003023422640000043
and
Figure FDA0003023422640000044
and the multi-agent shipborne computer calculates the actual control input u_i(k) of the system as
Figure FDA0003023422640000045
In the formula, the parameters c_{i,1} > 0 and c_{i,2} > 0, S_{i,2}(k-l+1) = [ξ_{i,1}(k-l+1), ξ_{i,2}(k-l+1), a_{i,0}y_0(k-l+1)], and y_0(k-l+1) represents the reference signal at time (k-l+1).
2. The multi-gradient recursive reinforcement learning fuzzy control method of the multi-agent system as claimed in claim 1, wherein the tracking performance threshold ε in step S3 is designed according to actual requirements.
3. An apparatus for implementing the multi-gradient recursive reinforcement learning fuzzy control method of the multi-agent system as claimed in claim 1, comprising a data acquisition unit, a data transmission unit, a multi-agent shipborne computer and a data feedback unit;
the data acquisition unit is used for acquiring course information during ship navigation; the data transmission unit is used for transmitting the collected course information of the multi-agent ships during navigation to the multi-agent shipborne computer; the multi-agent shipborne computer is used for processing the collected course information of the multi-agent ships during navigation and performing multi-gradient recursive reinforcement learning fuzzy control of the course of the multi-agent ships; the data feedback unit is used for transmitting the rudder angle command obtained by the multi-agent shipborne computer to the multi-agent ship steering engine, which outputs the multi-agent ship course angle, so as to achieve course track tracking consistency control of the multi-agent system,
the apparatus is characterized in that the multi-agent shipborne computer comprises a multi-agent ship course tracking system mathematical model building module, a multi-agent ship course system tracking error module, a fuzzy evaluation module, a virtual controller building module, a multi-gradient recursive reinforcement learning self-adaptive update rate module and a multi-gradient recursive reinforcement learning fuzzy controller building module;
the multi-agent ship course tracking system mathematical model building module is used for building a multi-agent ship course discrete nonlinear control system mathematical model between the input and the output of the system based on the multi-agent course information;
the multi-agent ship course system tracking error module is used for constructing a multi-agent course tracking error dynamic model and a transformation system based on the course information of each agent and the virtual leader in the multi-agent system;
the fuzzy evaluation module is used for designing, from the multi-agent course tracking error and a preset tracking performance threshold, a fuzzy evaluation cost function, and for completing the design of the fuzzy evaluation self-adaptive update rate;
the virtual controller construction module is used for designing a virtual control function of the multi-agent system by using the error between the output signal and the reference signal;
the multi-gradient recursion reinforcement learning self-adaptive update rate module is used for obtaining a multi-gradient recursion self-adaptive update rate based on the fuzzy evaluation module information and the strategy utility function;
the multi-gradient recursive reinforcement learning fuzzy controller construction module is used for obtaining the course consistency controller of the multi-agent system from the multi-agent course tracking transformation system, the course tracking dynamic error, the fuzzy evaluation cost function, the fuzzy evaluation self-adaptive update rate, the virtual control function, the strategy utility function and the multi-gradient recursive self-adaptive update rate.
CN202010697834.8A 2020-07-20 2020-07-20 Multi-gradient recursive reinforcement learning fuzzy control method and system of multi-agent system Active CN111948937B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010697834.8A CN111948937B (en) 2020-07-20 2020-07-20 Multi-gradient recursive reinforcement learning fuzzy control method and system of multi-agent system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010697834.8A CN111948937B (en) 2020-07-20 2020-07-20 Multi-gradient recursive reinforcement learning fuzzy control method and system of multi-agent system

Publications (2)

Publication Number Publication Date
CN111948937A CN111948937A (en) 2020-11-17
CN111948937B true CN111948937B (en) 2021-07-06

Family

ID=73340723

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010697834.8A Active CN111948937B (en) 2020-07-20 2020-07-20 Multi-gradient recursive reinforcement learning fuzzy control method and system of multi-agent system

Country Status (1)

Country Link
CN (1) CN111948937B (en)




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant