CN111948937A - Multi-gradient recursive reinforcement learning fuzzy control method and system of multi-agent system

Info

Publication number
CN111948937A
Authority
CN
China
Prior art keywords
agent
course
fuzzy
ship
gradient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010697834.8A
Other languages
Chinese (zh)
Other versions
CN111948937B (en)
Inventor
李铁山
龙跃
程玉华
李美霖
李耀仑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202010697834.8A
Publication of CN111948937A
Application granted
Publication of CN111948937B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/0265 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
    • G05B13/0275 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using fuzzy logic only
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/042 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/0206 Control of position or course in two dimensions specially adapted to water vehicles

Landscapes

  • Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Remote Sensing (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention provides a multi-gradient recursive reinforcement learning fuzzy control method and device for a multi-agent system, and belongs to the technical field of ship control of multi-agent systems. The invention is aimed mainly at the discrete course system of multi-agent ships: through multi-gradient recursive reinforcement learning fuzzy control, it achieves the optimized control objective with lower system energy consumption while improving the speed and precision of multi-agent course tracking. In addition, the invention provides a multi-gradient recursive learning algorithm that addresses the local-extremum problem in learning the weights of the fuzzy logic system, so that the weights converge faster and more accurately and the reliability and stability of the system are improved.

Description

Multi-gradient recursive reinforcement learning fuzzy control method and system of multi-agent system
Technical Field
The invention belongs to the technical field of ship control of a multi-agent system, and particularly relates to a multi-gradient recursive reinforcement learning fuzzy control method and system of the multi-agent system.
Background
The motion of an intelligent ship is characterized by large time lag, large inertia and nonlinearity. Changes in speed and loading perturb the parameters of the control model, and changes in navigation conditions, environmental disturbances and measurement inaccuracy introduce uncertainty into the course control system. To address the problems caused by these nonlinear uncertain dynamics, intelligent algorithms have been increasingly applied to intelligent ship course control, for example adaptive control, robust control, fuzzy adaptive control, iterative sliding mode control and the minimal learning parameter method. Consider a multi-agent system composed of several intelligent ships, in which each ship has independent dynamics and can interact with the environment: the complex multi-ship course control problem is transformed by treating the reference signal as a virtual leader, and course consistency control of the multi-agent system is completed at low cost. At present, most existing course consistency design methods for multi-agent ship systems based on the fuzzy logic system do not consider the problem of the weights converging to local extrema. In addition, because of the large inertia of the ship, course tracking is slow, which leads to high controller energy consumption and serious steering engine wear. Furthermore, existing results on course consistency control of multi-agent ship systems seldom consider the trade-off between control performance and control cost, and the resulting high cost of use hinders engineering implementation.
Disclosure of Invention
In view of the problems in the background art, the present invention provides a multi-gradient recursive reinforcement learning fuzzy control method and apparatus for a multi-agent system. The invention is aimed mainly at the discrete course system of multi-agent ships. Through multi-gradient recursive reinforcement learning fuzzy control, it can effectively reduce the energy consumption of the controller, reduce the wear of the steering engine, and improve the speed and precision of course consistency control of the multi-agent ships.
In order to achieve the purpose, the technical scheme of the invention is as follows:
The multi-gradient recursive reinforcement learning fuzzy control method of the multi-agent system comprises the following steps:
S1, transmitting the collected multi-agent course information to a shipborne computer, which establishes a mathematical model of the multi-agent ship course discrete nonlinear control system in terms of the ship course angle, taking the nonlinear steady-state turning characteristic of the ship into account; the course information comprises rudder angle information measured by the steering engine of each ship and current course angle information measured by a compass;
S2, the multi-agent shipborne computer obtains the multi-agent course tracking error dynamics and the multi-agent system course tracking transformation system from the dynamic error between the course angles of the agents and the virtual leader reference signal, and from the dynamic error between the course angle change rate of each agent and the virtual control function;
S3, according to the multi-agent course tracking dynamic error and the tracking performance threshold, designing a utility function from which the strategy utility function in the fuzzy evaluation module is obtained; obtaining the cost function of the fuzzy evaluation module by using the universal approximation principle of the fuzzy logic system and the Bellman principle; and designing the adaptive update rate of the fuzzy evaluation module based on the multi-gradient recursion method;
S4, designing the virtual controller and the strategy utility function of the multi-agent system in the fuzzy execution module according to the connection weights of the agents in the multi-agent system, and designing the adaptive update rate of the fuzzy execution module based on the multi-gradient recursion method;
S5, obtaining the course consistency controller of the multi-agent system from the multi-agent course tracking transformation system, the course tracking error dynamics, the fuzzy evaluation cost function, the fuzzy evaluation adaptive update rate, the virtual control function, the strategy utility function and the multi-gradient recursive adaptive update rate, thereby obtaining the control input rudder angle of the multi-agent system; the rudder angle instruction is transmitted to the steering engine of each ship, which outputs the ship course angle, thereby realizing course consistency control of the multi-agent system.
Further, in step S1, establishing a mathematical model of the multi-agent ship heading discrete nonlinear control system, which includes the specific processes:
the multi-agent shipborne computer utilizes the collected rudder angle information and course angle information, considers the ship steady-state rotation nonlinear characteristic, and establishes a multi-agent nonlinear discrete system mathematical model as follows:
Figure BDA0002591888780000021
In formula (1), $\xi_{i,1}(k)$ is the course angle of the $i$-th agent in the multi-agent system, where $i = 1, \ldots, N$ is the index of the agent, the subscript 1 denotes the first subsystem, and $k$ is the discrete time; $\xi_{i,2}(k)$ is the rate of change of the course angle, where the subscript 2 denotes the second subsystem; $u_i(k)$ is the rudder angle input; $y_i(k)$ is the system output; $g_i = K_i/T_i$ is the control gain, where $K_i$ is the ship turning index and $T_i$ is the ship following index; $f_{i,2}(\cdot)$ is an unknown nonlinear function; and $d_i(k)$ is an unknown but bounded external disturbance that satisfies
Figure BDA0002591888780000022
Is an unknown positive number;
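The image of formula (1) is not reproduced in this text, so its exact form cannot be quoted. As a minimal sketch only, the following assumes an Euler-discretized second-order course model with a cubic stand-in for the unknown function $f_{i,2}$; the function name `step_course`, the sample time `h`, the default values of `K` and `T`, and the cubic term are all illustrative assumptions, while the control gain $g_i = K_i/T_i$ follows the definition above.

```python
def step_course(xi1, xi2, u, K=0.5, T=10.0, h=0.1, d=0.0):
    """One step of an ASSUMED discrete course model (not the patent's formula).

    xi1: course angle, xi2: course angle change rate, u: rudder angle input.
    K, T, h and the cubic nonlinearity are hypothetical illustration values.
    """
    g = K / T                          # control gain g_i = K_i / T_i (from the text)
    f = -(xi2 + 0.5 * xi2 ** 3) / T    # stand-in for the unknown nonlinear function
    xi1_next = xi1 + h * xi2           # course angle integrates its change rate
    xi2_next = xi2 + h * (f + g * u + d)
    return xi1_next, xi2_next
```

Calling `step_course` repeatedly with a constant rudder angle gives a slowly growing turn rate, which is the large-inertia behaviour the background section describes.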
further, the specific process of establishing the multi-agent system course tracking transformation system in the step S2 is as follows:
the multi-agent shipborne computer designs a course tracking error variable by utilizing course information:
Figure BDA0002591888780000023
In formula (2), the first error variable (subscript $i,1$) is the dynamic course angle error of the $i$-th agent with respect to the $j$-th agents and the reference signal; the second error variable (subscript $i,2$) is the error between the course angle change rate $\xi_{i,2}(k)$ of the $i$-th agent and the virtual control function $\alpha_{i,1}(k)$; $a_{i,j}$ is the connection weight between the $i$-th agent and the $j$-th agent; $a_{i,0}$ is the connection weight between the $i$-th agent and the virtual leader in the multi-agent system; and $y_d(k)$ is the smooth, bounded reference trajectory of the virtual leader;
To facilitate the course consistency control design of the multi-agent system and to avoid the non-causality problem between subsystems, a system transformation is applied to formula (1) to establish the multi-agent course tracking transformation system:
Figure BDA0002591888780000031
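The image of formula (2) is likewise missing. The description of the first error variable (connection weights $a_{i,j}$ to neighbours and $a_{i,0}$ to the virtual leader) matches the weighted neighbourhood error commonly used in consensus control, so the sketch below assumes that form. The function name, the sign convention and the data layout are assumptions, not the patent's definition.

```python
def neighborhood_error(i, y, y_d, a, a0):
    """ASSUMED consensus-style course error of agent i.

    y:   list of agent course angles y_j(k)
    y_d: reference course angle of the virtual leader
    a:   matrix of connection weights a[i][j] between agents
    a0:  list of connection weights a0[i] between agent i and the leader
    """
    # Weighted disagreement with every other agent.
    err = sum(a[i][j] * (y[i] - y[j]) for j in range(len(y)) if j != i)
    # Weighted disagreement with the virtual leader's reference.
    return err + a0[i] * (y[i] - y_d)
```

An agent with `a0[i] == 0` never sees the reference directly and is steered only through its neighbours, which is why the topology needs a path from the leader to every agent.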
further, the specific establishment process of the fuzzy evaluation module of the multi-agent system in the step S3 is as follows:
Based on the course angle dynamic error (subscript $i,1$) and the tracking performance threshold, the multi-agent shipborne computer designs the utility function $\pi_i(k)$ as
Figure BDA0002591888780000032
where the threshold is greater than 0; $\pi_i(k) = 0$ means that the tracking performance is acceptable, and $\pi_i(k) = 1$ means that it is unacceptable. Using the utility function $\pi_i(k)$, the strategy utility function $M_i(k)$ is designed as
Figure BDA0002591888780000033
where $0 < \gamma_i < 1$ is a design parameter and $L$ is the time horizon; formula (5) can be expressed as
Figure BDA0002591888780000034
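The binary utility and its accumulation over a horizon can be sketched as follows. The text fixes only the two output values (0 acceptable, 1 unacceptable), the positive threshold and the range of $\gamma_i$; whether the comparison is strict, and the discounted-sum form of the strategy utility, are assumptions here because the formula images are not available.

```python
def utility(error, threshold):
    """Binary utility: 0 if tracking is acceptable, 1 otherwise.

    Treating |error| <= threshold as acceptable is an assumption.
    """
    return 0 if abs(error) <= threshold else 1


def strategy_utility(errors, threshold, gamma):
    """ASSUMED discounted sum of utilities over a finite horizon.

    errors: tracking errors at successive time steps (the horizon L)
    gamma:  discount-like design parameter, 0 < gamma < 1
    """
    total = 0.0
    for t, e in enumerate(errors):
        total += gamma ** t * utility(e, threshold)
    return total
```

Under this assumed form, a horizon in which the error never leaves the threshold band yields a strategy utility of zero.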
Using the universal approximation principle of the fuzzy logic system, the strategy utility function $M_i(k)$ can be written as
Figure BDA0002591888780000035
where $\theta_{i,c}$ is the ideal parameter satisfying
Figure BDA0002591888780000036
the subscript $c$ denotes the evaluation module,
Figure BDA0002591888780000037
is an unknown positive number,
Figure BDA0002591888780000038
is the transpose of the weight $\theta_{i,c}(k)$,
Figure BDA0002591888780000039
is a bounded fuzzy basis function and satisfies
Figure BDA00025918887800000310
is the transpose of
Figure BDA00025918887800000311
and $v_{i,c}(k)$ is the approximation error, which satisfies
Figure BDA00025918887800000312
is an unknown positive number;
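A fuzzy logic system of the kind referred to above produces its output as a weight vector times a vector of bounded basis functions. The patent text does not state which membership functions are used, so the sketch below assumes normalized Gaussian memberships, a common choice; `centers` and `width` are hypothetical parameters.

```python
import math


def fuzzy_basis(s, centers, width=1.0):
    """Normalized Gaussian fuzzy basis vector (ASSUMED membership shape).

    s:       input vector
    centers: one centre vector per fuzzy rule (hypothetical values)
    """
    acts = [
        math.exp(-sum((x - c) ** 2 for x, c in zip(s, ctr)) / width ** 2)
        for ctr in centers
    ]
    total = sum(acts)
    # Each entry lies in (0, 1] and the entries sum to 1, so the basis is bounded.
    return [a / total for a in acts]


def fls_output(theta, s, centers, width=1.0):
    """Linear-in-parameters output theta^T * basis(s)."""
    basis = fuzzy_basis(s, centers, width)
    return sum(t * b for t, b in zip(theta, basis))
```

Because the basis entries are normalized, the output is always a convex combination of the weights, which is one way the boundedness assumption on the basis function can be met.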
Further, the Bellman error
Figure BDA00025918887800000313
is defined as
Figure BDA0002591888780000041
where
Figure BDA0002591888780000042
is the estimate of the ideal parameter $\theta_{i,c}$,
Figure BDA0002591888780000043
is the transpose of
Figure BDA0002591888780000044
and
Figure BDA0002591888780000045
is the estimate of $M_i(k)$;
According to equation (7), the cost function is defined as
Figure BDA0002591888780000046
To minimize the cost function $\phi_{i,c}(k)$, the adaptive update rate of the evaluation module
Figure BDA0002591888780000047
is designed as
Figure BDA0002591888780000048
where the design parameter with subscript $i,c$ is greater than 0, $0 < \gamma_i < 1$,
Figure BDA0002591888780000049
is a bounded fuzzy basis function and satisfies
Figure BDA00025918887800000410
$T$ denotes the transpose,
Figure BDA00025918887800000411
is the gradient index, and $p$ is a positive integer denoting the gradient length;
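The update formula itself is not reproduced in this text; only the gradient index and the gradient length $p$ are described. One plausible reading is a gradient step that combines the gradients of the last $p$ recorded samples instead of the current one alone. The sketch below implements that reading for a squared error on a linear-in-parameters estimate. The averaging, the step size `mu` and the function names are assumptions and should not be taken as the patent's update rate.

```python
from collections import deque


def make_multi_gradient_updater(mu, p):
    """Build an update function that averages gradients over the last p samples.

    mu: step size (> 0), p: gradient length (positive integer).
    The averaging rule is an ASSUMED illustration of "multi-gradient".
    """
    history = deque(maxlen=p)  # keeps only the p most recent samples

    def update(theta, basis, target):
        history.append((list(basis), target))
        grad = [0.0] * len(theta)
        for basis_j, target_j in history:
            # Error of the current estimate on a stored sample.
            err = sum(t * b for t, b in zip(theta, basis_j)) - target_j
            for n, b in enumerate(basis_j):
                grad[n] += err * b  # gradient of 0.5 * err**2 w.r.t. theta[n]
        return [t - mu * g / len(history) for t, g in zip(theta, grad)]

    return update
```

With `p = 1` this reduces to ordinary single-sample gradient descent, so the gradient length is the only knob that distinguishes the two in this sketch.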
Further, the virtual controller $\alpha_{i,1}(k)$ of the multi-agent system and the multi-gradient recursive adaptive update rate of the fuzzy execution module
Figure BDA00025918887800000412
in step S4 are established as follows:
The virtual controller is designed as
Figure BDA00025918887800000413
The strategy utility function of the fuzzy execution module of the multi-agent system
Figure BDA00025918887800000414
is defined as
Figure BDA00025918887800000415
where $S_{i,2}(k) = [\xi_{i,1}(k), \xi_{i,2}(k), y_d(k)]^T$,
Figure BDA00025918887800000416
is a bounded fuzzy basis function and satisfies
Figure BDA00025918887800000417
is the estimate of the ideal parameter,
Figure BDA00025918887800000418
is the transpose of
Figure BDA00025918887800000419
According to equation (10), the cost function is defined as
Figure BDA00025918887800000420
To minimize the cost function $\phi_{i,2}(k)$, the multi-gradient recursive adaptive update rate
Figure BDA00025918887800000421
is designed according to the multi-gradient recursive algorithm as
Figure BDA00025918887800000422
where $\mu_{i,2} > 0$,
Figure BDA0002591888780000051
Figure BDA0002591888780000052
is a bounded fuzzy basis function and satisfies the inequality
Figure BDA0002591888780000053
Further, the control input rudder angle in step S5 is solved as follows:
Determine the multi-gradient recursive reinforcement learning controller of the multi-agent system: using the obtained virtual controller $\alpha_{i,1}(k)$ and the evaluation-module and multi-gradient recursive adaptive update rates
Figure BDA0002591888780000054
and
Figure BDA0002591888780000055
the multi-agent shipborne computer obtains the actual control input $u_i(k)$ of the system as:
Figure BDA0002591888780000056
where the parameters satisfy $c_{i,1} > 0$, $c_{i,2} > 0$,
Figure BDA0002591888780000057
Further, the virtual leader reference signal in step S2 and the tracking performance threshold in step S3 are designed according to actual requirements; the threshold is preferably less than 5°.
A multi-gradient recursion reinforcement learning fuzzy control system of a multi-agent system comprises a data acquisition unit, a data transmission unit, a multi-agent shipborne computer and a data feedback unit;
the data acquisition unit is used for acquiring course information in the ship navigation process; the data transmission unit is used for transmitting the collected course information of the multi-agent ship in the sailing process to the onboard computer; the multi-agent shipborne computer is used for processing the collected course information of the multi-agent ship in the sailing process and completing multi-gradient recursive reinforcement learning fuzzy control of the course of the multi-agent ship; the data feedback unit is used for transmitting rudder angle instructions obtained by the multi-agent ship-borne computer to the multi-agent ship steering engine to output multi-agent ship course angles, and realizing course track tracking consistency control of the multi-agent system,
the multi-agent shipborne computer comprises a multi-agent ship course tracking system mathematical model building module, a multi-agent ship course system tracking error module, a fuzzy evaluation module, a virtual controller building module, a multi-gradient recursion reinforcement learning self-adaptive update rate module, a multi-gradient recursion reinforcement learning fuzzy controller building module and a data feedback unit;
the multi-agent ship course tracking system mathematical model building module is used for building a multi-agent ship course discrete nonlinear control system mathematical model between the input and the output of the system based on the multi-agent course information;
the multi-agent ship course system tracking error module is used for constructing a multi-agent course tracking error dynamic model and a transformation system based on the course information of each agent and the virtual leader in the multi-agent system;
the fuzzy evaluation module is used for designing a fuzzy evaluation cost function from the multi-agent course tracking error and a preset tracking performance threshold, and for completing the design of the fuzzy evaluation adaptive update rate;
the virtual controller construction module is used for designing a virtual control function of the multi-agent system by utilizing the error between the output signal and the reference signal, and designing a virtual controller according to the virtual control function;
the multi-gradient recursion reinforcement learning self-adaptive update rate module is used for obtaining a multi-gradient recursion self-adaptive update rate based on the fuzzy evaluation module information and the strategy utility function;
the multi-gradient recursion reinforcement learning fuzzy controller construction module is used for obtaining the course consistency controller of the multi-agent system through the multi-agent course tracking transformation system, the course tracking error dynamic state, the fuzzy evaluation cost function, the fuzzy evaluation self-adaption updating rate, the virtual control function, the strategy utility function and the multi-gradient recursion self-adaption updating rate.
In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:
Compared with the prior art, on the one hand, the invention considers a multi-agent ship course system and uses the fuzzy evaluation signal together with the multi-gradient recursive reinforcement learning controller to solve the multi-agent course tracking consistency control problem. It effectively reduces the energy consumption of the controller and the wear of the steering engine, is better suited to multi-agent ship motion control problems characterized by large time lag, large inertia and nonlinearity, and improves the speed and precision of multi-agent course tracking while achieving the optimized control objective with lower system energy consumption. On the other hand, the invention provides a multi-gradient recursive learning algorithm that addresses the local-extremum problem in learning the weights of the fuzzy logic system, so that the weights converge faster and more accurately and the reliability and stability of the system are improved.
Drawings
FIG. 1 is a flow chart of a control method of the present invention.
FIG. 2 is a communication topology diagram of the multi-agent system of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
FIG. 1 is a flow chart of a control method of the present invention. As shown in FIG. 1, the present invention discloses a multi-gradient recursive reinforcement learning fuzzy control method for a multi-agent system, which specifically comprises the following steps:
Step 1: The course information of the multi-agent system, comprising the rudder angle data measured by the steering engine of each ship and the current course angle data measured by the compass, is transmitted to the shipborne computer. Taking the nonlinear steady-state turning characteristic of the ship into account, the following mathematical model of the multi-agent ship course discrete nonlinear control system is established:
Figure BDA0002591888780000071
In formula (1), $\xi_{i,1}(k)$ is the course angle of the $i$-th agent in the multi-agent system, where $i = 1, \ldots, N$ is the index of the agent, the subscript 1 denotes the first subsystem, and $k$ is the discrete time; $\xi_{i,2}(k)$ is the rate of change of the course angle, where the subscript 2 denotes the second subsystem; $u_i(k)$ is the rudder angle input; $y_i(k)$ is the system output; $g_i = K_i/T_i$ is the control gain, where $K_i$ is the ship turning index and $T_i$ is the ship following index; $f_{i,2}(\cdot)$ is an unknown nonlinear function; and $d_i(k)$ is an unknown but bounded external disturbance that satisfies
Figure BDA0002591888780000072
Is an unknown positive number;
Step 2: Design the ship course transformation system of the multi-agent system. The multi-agent shipborne computer designs the course tracking error variables by using the course information:
Figure BDA0002591888780000073
In formula (2), the first error variable (subscript $i,1$) is the dynamic course angle error of the $i$-th agent with respect to the $j$-th agents and the reference signal; the second error variable (subscript $i,2$) is the error between the state variable $\xi_{i,2}(k)$ of the $i$-th agent and the virtual controller $\alpha_{i,1}(k)$; $a_{i,j}$ is the connection weight between the $i$-th agent and the $j$-th agent; $a_{i,0}$ is the connection weight between the $i$-th agent and the virtual leader in the multi-agent system; $y_d(k)$ is the smooth, bounded reference trajectory of the virtual leader; and $\alpha_{i,1}(k)$ is the virtual controller to be designed;
To facilitate the course consistency control design of the multi-agent system and to avoid the non-causality problem between subsystems, a system transformation is applied to formula (1) to establish the multi-agent course tracking transformation system:
Figure BDA0002591888780000074
Step 3: Design the fuzzy evaluation module of the multi-agent system. Based on the tracking dynamic error (subscript $i,1$) and a preset tracking performance threshold, the multi-agent shipborne computer designs the utility function $\pi_i(k)$ as
Figure BDA0002591888780000075
where the threshold is greater than 0; $\pi_i(k) = 0$ means that the tracking performance is acceptable, and $\pi_i(k) = 1$ means that it is unacceptable. Using the utility function $\pi_i(k)$, the shipborne computer designs the strategy utility function $M_i(k)$ as
Figure BDA0002591888780000081
where $0 < \gamma_i < 1$ is a design parameter and $L$ is the time horizon; formula (5) can be expressed as
Figure BDA0002591888780000082
Using the universal approximation principle of the fuzzy logic system, the strategy utility function $M_i(k)$ can be written as
Figure BDA0002591888780000083
where $\theta_{i,c}$ satisfies
Figure BDA0002591888780000084
and is the ideal adjustable parameter,
Figure BDA0002591888780000085
is an unknown positive number,
Figure BDA0002591888780000086
is the transpose of the weight $\theta_{i,c}(k)$,
Figure BDA0002591888780000087
is a bounded fuzzy basis function and satisfies
Figure BDA0002591888780000088
is the transpose of
Figure BDA0002591888780000089
and $v_{i,c}(k)$ is the approximation error, which satisfies
Figure BDA00025918887800000810
is an unknown positive number;
Further, the Bellman error
Figure BDA00025918887800000811
is defined as
Figure BDA00025918887800000812
where
Figure BDA00025918887800000813
is the estimate of the ideal parameter $\theta_{i,c}$,
Figure BDA00025918887800000814
is the transpose of
Figure BDA00025918887800000815
and
Figure BDA00025918887800000816
is the estimate of $M_i(k)$;
According to equation (7), the cost function is defined as
Figure BDA00025918887800000817
To minimize the cost function $\phi_{i,c}(k)$, the adaptive update rate of the evaluation module
Figure BDA00025918887800000818
is designed as
Figure BDA00025918887800000819
where the parameter with subscript $i,c$ is greater than 0 and $0 < \gamma_i < 1$; both are parameters to be designed,
Figure BDA00025918887800000820
Figure BDA00025918887800000821
is a bounded fuzzy basis function and satisfies
Figure BDA00025918887800000822
is the gradient index, and $p$ is a positive integer denoting the gradient length;
Step 4: Design the virtual controller $\alpha_{i,1}(k)$ of the multi-agent system and the multi-gradient recursive adaptive update rate of the fuzzy execution module
Figure BDA0002591888780000091
The virtual controller is designed as
Figure BDA0002591888780000092
The strategy utility function of the fuzzy execution module of the multi-agent system
Figure BDA0002591888780000093
is defined as
Figure BDA0002591888780000094
where $S_{i,2}(k) = [\xi_{i,1}(k), \xi_{i,2}(k), y_d(k)]^T$,
Figure BDA0002591888780000095
is a bounded fuzzy basis function and satisfies
Figure BDA0002591888780000096
is the estimate of the ideal parameter,
Figure BDA0002591888780000097
is the transpose of
Figure BDA0002591888780000098
According to equation (10), the cost function is defined as
Figure BDA0002591888780000099
To minimize the cost function $\phi_{i,2}(k)$, the multi-gradient recursive adaptive update rate
Figure BDA00025918887800000910
is designed according to the multi-gradient recursive algorithm as
Figure BDA00025918887800000911
where $\mu_{i,2} > 0$ is a parameter to be designed,
Figure BDA00025918887800000912
Figure BDA00025918887800000913
is a bounded fuzzy basis function and satisfies the inequality
Figure BDA00025918887800000914
And 5: determining a multi-gradient recursive reinforcement learning controller for a multi-agent system: the virtual controller alpha will be obtainedi,1(k) Evaluation module and multi-gradient recursive adaptive update rate
Figure BDA00025918887800000915
And
Figure BDA00025918887800000916
obtaining actual control input u of system by using multi-agent shipborne computeri(k) Instructions for:
Figure BDA00025918887800000917
in the formula, parameter ci,1>0,ci,2>0,
Figure BDA00025918887800000918
The invention also provides a multi-gradient recursion reinforcement learning fuzzy control system of the multi-agent system, which comprises the following components:
the data acquisition unit is used for acquiring course information in the ship navigation process, wherein the course information comprises rudder angle data and current course angle data;
the data transmission unit is used for transmitting the collected course information of the multi-agent ship in the sailing process to the onboard computer;
the multi-agent shipborne computer is used for processing the collected course information of the multi-agent ship in the sailing process and finishing multi-gradient recursive reinforcement learning fuzzy control of the course of the multi-agent ship, and specifically comprises the following steps:
the multi-agent ship course tracking system mathematical model building module is used for building a multi-agent ship system mathematical model between the input and the output of the system based on the multi-agent course information;
the multi-agent ship course system tracking error module is used for constructing a multi-agent course tracking error dynamic model and a transformation system based on the course information of each agent and the virtual leader in the multi-agent system;
the fuzzy evaluation module is used for designing a fuzzy evaluation cost function from the multi-agent course tracking error and a preset tracking performance threshold, and for completing the design of the fuzzy evaluation adaptive update rate;
the virtual controller building module is used for designing a virtual control function of the multi-agent system by utilizing the error between the output signal and the reference signal and designing a virtual controller according to the virtual control function;
the multi-gradient recursion reinforcement learning self-adaptive update rate module is used for obtaining a multi-gradient recursion self-adaptive update rate based on the fuzzy evaluation module information and the strategy utility function;
the multi-gradient recursion reinforcement learning fuzzy controller building module is used for obtaining a course consistency controller of the multi-agent system through the multi-agent course tracking transformation system, the course tracking error dynamics, the fuzzy evaluation cost function, the fuzzy evaluation self-adaptive update rate, the virtual control function, the strategy utility function and the multi-gradient recursion self-adaptive update rate;
and the data feedback unit is used for transmitting the rudder angle instruction computed by the multi-agent shipborne computer to the multi-agent ship steering engine module, and the steering engine module outputs the multi-agent ship course angle to realize course consistency control of the multi-agent system.
FIG. 2 is an example of a communication topology of a multi-agent system employed by the present invention. As can be seen in the figure, 0 is the virtual leader, and 1, 2, 3 and 4 are all single agents in the multi-agent system. Information flows only one way between the virtual leader and a single agent, while the information flow between single agents can be one-way or two-way. A single agent can only receive information from its neighbors and cannot obtain the information of all individuals. A spanning tree exists in the communication topology graph, so the necessary condition for realizing consistency control is met.
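The spanning-tree condition described above can be checked mechanically. The following is a minimal sketch, not the patent's procedure: FIG. 2 is not reproduced in the text, so the edge list `EDGES` is a hypothetical topology (leader 0, agents 1 to 4), and the function name `has_spanning_tree_from` is likewise an assumption made for illustration. It tests whether every agent is reachable from the virtual leader along directed information links.

```python
from collections import deque

# Hypothetical directed links (sender, receiver); NOT taken from FIG. 2.
# Leader 0 only sends; links 1<->2 are two-way, the others one-way.
NODES = [0, 1, 2, 3, 4]
EDGES = [(0, 1), (1, 2), (2, 1), (2, 3), (0, 4)]


def has_spanning_tree_from(root, nodes, edges):
    """Return True if every node is reachable from root along directed edges."""
    successors = {n: [] for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)
    seen = {root}
    queue = deque([root])
    while queue:  # breadth-first search from the root
        node = queue.popleft()
        for nxt in successors[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen == set(nodes)


LEADER_ROOTED = has_spanning_tree_from(0, NODES, EDGES)
```

With this assumed edge list every agent is reachable from node 0, so a directed spanning tree rooted at the virtual leader exists; removing the link (0, 4) would isolate agent 4 and the check would fail.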
While the invention has been described with reference to specific embodiments, any feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise; all of the disclosed features, or all of the method or process steps, may be combined in any combination, except mutually exclusive features and/or steps.

Claims (8)

1. The multi-gradient recursion reinforcement learning fuzzy control method of the multi-agent system is characterized by comprising the following steps:
S1, transmitting the collected multi-agent course information to a shipborne computer, wherein the shipborne computer establishes a mathematical model of the multi-agent ship course discrete nonlinear control system in terms of the ship course angle by considering the nonlinear characteristic of ship steady-state turning, and the course information comprises rudder angle information measured from the multi-agent ship steering engine and current course angle information measured by a compass;
S2, the multi-agent shipborne computer obtains a multi-agent course tracking dynamic error and a multi-agent system course tracking transformation system based on the course angle dynamic error between the agents and the virtual leader reference signal, and the dynamic error between the course angle change rate of the agents and the virtual controller;
S3, according to the course tracking dynamic error of the multi-agent system and the tracking performance threshold, designing a utility function based on the tracking performance threshold to obtain the strategy utility function in the fuzzy evaluation module, obtaining a cost function for designing the fuzzy evaluation module by utilizing the universal approximation principle of the fuzzy logic system and the Bellman principle, and designing the self-adaptive update rate of the fuzzy evaluation module based on the multi-gradient recursion method;
S4, designing a virtual controller and a strategy utility function of the multi-agent system in a fuzzy execution module according to the connection weight of each agent in the multi-agent system, and designing the self-adaptive update rate of the fuzzy execution module based on the multi-gradient recursion method;
S5, designing and obtaining a course consistency controller of the multi-agent system through the multi-agent course tracking transformation system, the course tracking dynamic error, the fuzzy evaluation cost function, the fuzzy evaluation adaptive update rate, the virtual control function, the strategy utility function and the multi-gradient recursive adaptive update rate, thereby obtaining the control input rudder angle of the multi-agent system, and transmitting the rudder angle instruction to the multi-agent ship steering engine to output the multi-agent ship course angle, thereby realizing the course consistency control of the multi-agent system.
2. The multi-gradient recursive reinforcement learning fuzzy control method of the multi-agent system as claimed in claim 1, wherein in step S1, the multi-agent shipborne computer uses the collected rudder angle information and course angle information to establish a mathematical model of the multi-agent ship course discrete nonlinear control system by considering the ship steady-state rotation nonlinear characteristic, and the specific formula is as follows:
Figure FDA0002591888770000011
In formula (1), $\xi_{i,1}(k)$ is the course angle of the $i$-th agent in the multi-agent system, where $i = 1, \ldots, N$ is the index of the agent in the multi-agent system, the subscript 1 denotes the 1st subsystem, and $k$ is the discrete time index; $\xi_{i,2}(k)$ is the rate of change of the course angle, where the subscript 2 denotes the 2nd subsystem; $u_i(k)$ is the rudder angle input; $y_i(k)$ is the output of the system; $g_i = K_i/T_i$ is the control gain, where $K_i$ is the ship turning index and $T_i$ is the ship follow-up index; $f_{i,2}(\cdot)$ is an unknown nonlinear function; and $d_i(k)$ is an unknown but bounded external disturbance that satisfies
Figure FDA0002591888770000021
where
Figure FDA0002591888770000022
is an unknown positive number.
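Formula (1) survives only as an image, so its exact form is not reproduced here. As a rough illustration of the kind of model the claim describes, the sketch below takes one forward-Euler step of a Norrbin-type heading model. The cubic nonlinearity, the sampling period `dt`, the coefficient `alpha` and all numeric defaults are assumptions chosen for illustration; only the control gain $g_i = K_i/T_i$ is taken from the text.

```python
def heading_step(xi1, xi2, u, K=0.5, T=10.0, alpha=0.1, dt=0.1, d=0.0):
    """One Euler step of an ASSUMED Norrbin-type discrete heading model.

    xi1: course angle, xi2: course angle change rate, u: rudder angle,
    d: external disturbance. Returns the next (xi1, xi2).
    """
    g = K / T  # control gain g_i = K_i / T_i, as stated in the claim
    # Assumed stand-in for the unknown nonlinear function f_{i,2}.
    f = -(xi2 + alpha * xi2 ** 3) / T
    xi1_next = xi1 + dt * xi2
    xi2_next = xi2 + dt * (f + g * u + d)
    return xi1_next, xi2_next


# Usage: start at rest and apply a constant rudder angle for a few steps.
state = (0.0, 0.0)
for _ in range(3):
    state = heading_step(state[0], state[1], u=1.0)
```

In the patent the nonlinear function is treated as unknown and approximated by a fuzzy logic system; the explicit cubic term here only makes the sketch runnable.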
3. The multi-gradient recursive reinforcement learning fuzzy control method of the multi-agent system as claimed in claim 1, wherein the specific process of establishing the multi-agent system course tracking transformation system in the step S2 is as follows:
the multi-agent shipborne computer designs a course tracking dynamic error by utilizing course information:
Figure FDA0002591888770000023
In formula (2), the first error variable, with subscript $i,1$, is the course angle dynamic error of the $i$-th agent with respect to the $j$-th agents and the reference signal in the multi-agent system; the second error variable, with subscript $i,2$, is the error between the course angle change rate $\xi_{i,2}(k)$ of the $i$-th agent and the virtual control function $\alpha_{i,1}(k)$; $a_{i,j}$ is the connection weight between the $i$-th agent and the $j$-th agent; $a_{i,0}$ is the connection weight between the $i$-th agent and the virtual leader in the multi-agent system; and $y_d(k)$ is the smooth, bounded reference trajectory of the virtual leader;
in order to facilitate the course consistency control design of the multi-agent system and avoid the problem of no correlation of subsystems, the system transformation is carried out on the formula (1) to establish a multi-agent course tracking transformation system:
Figure FDA0002591888770000024
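Because formula (2) is only available as an image, the following sketch uses the neighborhood error form that is common in leader-follower consensus designs, which may differ in detail from the patent's formula: each agent sums the weighted differences between its own output and its neighbors' outputs, plus the weighted difference to the leader reference. The function name `neighborhood_error` and the sample weights are hypothetical.

```python
def neighborhood_error(i, y, y_d, A, a0):
    """Assumed consensus error of agent i.

    y:   list of agent outputs y_j(k)
    y_d: leader reference y_d(k)
    A:   adjacency weights, A[i][j] = a_{i,j}
    a0:  leader weights, a0[i] = a_{i,0}
    """
    neighbor_part = sum(A[i][j] * (y[i] - y[j]) for j in range(len(y)))
    leader_part = a0[i] * (y[i] - y_d)
    return neighbor_part + leader_part


# Usage with three agents; only agent 0 hears the leader directly.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 0, 0]]
a0 = [1, 0, 0]
errors = [neighborhood_error(i, [1.0, 2.0, 4.0], 0.5, A, a0) for i in range(3)]
```

When all agent outputs equal the leader reference, every such error is zero, which is the consistency condition the controller drives toward.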
4. The multi-gradient recursive reinforcement learning fuzzy control method of the multi-agent system as claimed in claim 1, wherein the fuzzy evaluation module of the multi-agent system in the step S3 is specifically established by:
based on the course angle dynamic error (the error variable with subscript $i,1$) and the tracking performance threshold, the multi-agent shipborne computer designs the utility function $\pi_i(k)$ as
Figure FDA0002591888770000025
where the tracking performance threshold is greater than 0, $\pi_i(k) = 0$ means that the tracking performance is acceptable, and $\pi_i(k) = 1$ means that the tracking performance is unacceptable. Using the utility function $\pi_i(k)$, the strategy utility function $M_i(k)$ is designed as
Figure FDA0002591888770000026
where $0 < \gamma_i < 1$ is a design parameter and $L$ is the time range; formula (5) can be expressed as
Figure FDA0002591888770000031
Using the universal approximation principle of the fuzzy logic system, the strategy utility function $M_i(k)$ is obtained as follows:
Figure FDA0002591888770000032
where $\theta_{i,c}$ is a weight vector satisfying
Figure FDA0002591888770000033
the subscript $c$ denotes the evaluation module,
Figure FDA0002591888770000034
is an unknown positive number,
Figure FDA0002591888770000035
is the transpose of $\theta_{i,c}(k)$,
Figure FDA0002591888770000036
is a bounded fuzzy basis function that satisfies
Figure FDA0002591888770000037
Figure FDA0002591888770000038
is the transpose of
Figure FDA0002591888770000039
and $v_{i,c}(k)$ is the approximation error, which satisfies
Figure FDA00025918887700000310
Figure FDA00025918887700000311
is an unknown positive number;
The Bellman error
Figure FDA00025918887700000312
is defined as
Figure FDA00025918887700000313
where
Figure FDA00025918887700000314
Figure FDA00025918887700000315
is the estimate of the ideal parameter $\theta_{i,c}$,
Figure FDA00025918887700000316
is the transpose of
Figure FDA00025918887700000317
and
Figure FDA00025918887700000318
is the estimate of $M_i(k)$;
According to equation (7), the cost function is defined as
Figure FDA00025918887700000319
To minimize the cost function $\phi_{i,c}(k)$, the self-adaptive update rate of the evaluation module
Figure FDA00025918887700000320
is designed as
Figure FDA00025918887700000321
where the design parameter with subscript $i,c$ is positive, $0 < \gamma_i < 1$,
Figure FDA00025918887700000322
Figure FDA00025918887700000323
is a bounded fuzzy basis function that satisfies
Figure FDA00025918887700000324
$T$ denotes the transpose, $l$ is the gradient index, and the positive integer $p$ denotes the gradient length.
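The binary utility signal in this claim can be illustrated with a short sketch. The claim states only that $\pi_i(k) = 0$ marks acceptable tracking and $\pi_i(k) = 1$ unacceptable tracking relative to a positive threshold; the absolute-value comparison below is an assumption about how the threshold is applied. The function `discounted_utility` is a generic discounted sum with factor $\gamma_i$, shown only to convey the idea of a long-run utility, and is not a reconstruction of formula (5). All function and parameter names are hypothetical.

```python
def utility(error, threshold):
    """Assumed binary utility: 0 if |error| is within the threshold, else 1."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return 0 if abs(error) <= threshold else 1


def discounted_utility(pis, gamma):
    """Generic discounted sum of utility values: sum over t of gamma**t * pis[t]."""
    if not 0 < gamma < 1:
        raise ValueError("gamma must lie in (0, 1)")
    return sum((gamma ** t) * p for t, p in enumerate(pis))


# Usage: three consecutive tracking errors against a threshold of 0.1.
pis = [utility(e, 0.1) for e in (0.3, 0.05, -0.2)]
long_run = discounted_utility(pis, 0.5)
```

A smaller discounted sum indicates that tracking stayed within the threshold more often or left it only at distant steps, which is what the evaluation module is meant to estimate.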
5. The multi-gradient recursive reinforcement learning fuzzy control method of the multi-agent system as claimed in claim 1, wherein the virtual controller $\alpha_{i,1}(k)$ of the multi-agent system in step S4 and the multi-gradient recursive adaptive update rate of the fuzzy execution module
Figure FDA00025918887700000325
are specifically established as follows:
The virtual controller is designed as
Figure FDA00025918887700000326
The strategy utility function of the fuzzy execution module of the multi-agent system
Figure FDA0002591888770000041
is defined as
Figure FDA0002591888770000042
where $S_{i,2}(k) = [\xi_{i,1}(k), \xi_{i,2}(k), y_d(k)]^T$,
Figure FDA0002591888770000043
is a bounded fuzzy basis function that satisfies
Figure FDA0002591888770000044
Figure FDA0002591888770000045
is the estimate of the ideal parameter,
Figure FDA0002591888770000046
is the transpose of
Figure FDA0002591888770000047
According to equation (10), the cost function is defined as
Figure FDA0002591888770000048
To minimize the cost function $\phi_{i,2}(k)$, the multi-gradient recursive adaptive update rate
Figure FDA0002591888770000049
is designed according to the multi-gradient recursive algorithm as
Figure FDA00025918887700000410
where $\mu_{i,2} > 0$, $S_{i,2}(k-l+1) = [\xi_{i,1}(k-l+1), \xi_{i,2}(k-l+1), y_d(k-l+1)]^T$,
Figure FDA00025918887700000411
is a bounded fuzzy basis function that satisfies the inequality
Figure FDA00025918887700000412
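The update formulas in this claim are images, so the sketch below shows only the general idea suggested by the surrounding definitions (gradient index $l$, gradient length $p$, regressors evaluated at $k-l+1$): the weight estimate is corrected with the sum of the gradients from the last $p$ samples rather than the current one alone. The quadratic cost, the normalized Gaussian basis, and the names `fuzzy_basis` and `multi_gradient_update` are assumptions for illustration, not the patent's formulas.

```python
import math


def fuzzy_basis(s, centers, width=1.0):
    """Normalized Gaussian fuzzy basis vector for input s (assumed form)."""
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(s, c)) / width ** 2)
        for c in centers
    ]
    total = sum(weights)
    return [w / total for w in weights]  # entries lie in (0, 1], sum to 1


def multi_gradient_update(theta, history, mu):
    """One assumed multi-gradient step on the cost 0.5 * err**2.

    history: the last p pairs (phi, target), i.e. samples k, k-1, ..., k-p+1.
    mu:      positive step size.
    """
    grad = [0.0] * len(theta)
    for phi, target in history:
        err = sum(t * f for t, f in zip(theta, phi)) - target
        for j, f in enumerate(phi):
            grad[j] += err * f  # gradient of 0.5 * err**2 w.r.t. theta[j]
    return [t - mu * g for t, g in zip(theta, grad)]


# Usage: two stored samples (p = 2) with unit basis vectors.
theta_new = multi_gradient_update(
    [0.0, 0.0], [([1.0, 0.0], 1.0), ([0.0, 1.0], 2.0)], mu=0.5
)
```

Because the basis vector is normalized, its norm is bounded by 1, which matches the boundedness assumption stated for the fuzzy basis functions; reusing the last $p$ samples is what distinguishes this from a single-gradient update.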
6. The multi-gradient recursive reinforcement learning fuzzy control method of the multi-agent system as claimed in claim 1, wherein the specific process of solving for the control input rudder angle in step S5 is: determining the multi-gradient recursive reinforcement learning controller of the multi-agent system by combining the obtained virtual controller $\alpha_{i,1}(k)$ with the evaluation module and the multi-gradient recursive adaptive update rates
Figure FDA00025918887700000413
and
Figure FDA00025918887700000414
the multi-agent shipborne computer then calculates the actual control input $u_i(k)$ of the system as
Figure FDA00025918887700000415
where the parameters satisfy $c_{i,1} > 0$, $c_{i,2} > 0$, and $S_{i,2}(k-l+1) = [\xi_{i,1}(k-l+1), \xi_{i,2}(k-l+1), a_{i,0} y_0(k-l+1)]$.
7. The multi-gradient recursive reinforcement learning fuzzy control method of the multi-agent system as claimed in claim 4, wherein the tracking performance threshold in step S3 is designed according to actual requirements.
8. A multi-gradient recursion reinforcement learning fuzzy control device of a multi-agent system comprises a data acquisition unit, a data transmission unit, a multi-agent shipborne computer and a data feedback unit;
the data acquisition unit is used for acquiring course information in the ship navigation process; the data transmission unit is used for transmitting the collected course information of the multi-agent ship in the sailing process to the onboard computer; the multi-agent shipborne computer is used for processing the collected course information of the multi-agent ship in the sailing process and completing multi-gradient recursive reinforcement learning fuzzy control of the course of the multi-agent ship; the data feedback unit is used for transmitting rudder angle instructions obtained by the multi-agent ship-borne computer to the multi-agent ship steering engine to output multi-agent ship course angles, so as to realize course track tracking consistency control of the multi-agent system,
the system is characterized in that the multi-agent shipborne computer comprises a multi-agent ship course tracking system mathematical model building module, a multi-agent ship course system tracking error module, a fuzzy evaluation module, a virtual controller building module, a multi-gradient recursion reinforcement learning self-adaption updating rate module, a multi-gradient recursion reinforcement learning fuzzy controller building module and a data feedback unit;
the multi-agent ship course tracking system mathematical model building module is used for building a multi-agent ship course discrete nonlinear control system mathematical model between the input and the output of the system based on the multi-agent course information;
the multi-agent ship course system tracking error module is used for constructing a multi-agent course tracking error dynamic model and a transformation system based on the course information of each agent and the virtual leader in the multi-agent system;
the fuzzy evaluation module is used for designing a fuzzy evaluation cost function from the multi-agent course tracking error and a preset tracking performance threshold, and for completing the design of the fuzzy evaluation self-adaptive update rate;
the virtual controller construction module is used for designing a virtual control function of the multi-agent system by utilizing the error between the output signal and the reference signal, and designing a virtual controller according to the virtual control function;
the multi-gradient recursion reinforcement learning self-adaptive update rate module is used for obtaining a multi-gradient recursion self-adaptive update rate based on the fuzzy evaluation module information and the strategy utility function;
the multi-gradient recursion reinforcement learning fuzzy controller construction module is used for obtaining the course consistency controller of the multi-agent system through the multi-agent course tracking transformation system, the course tracking error dynamic state, the fuzzy evaluation cost function, the fuzzy evaluation self-adaption updating rate, the virtual control function, the strategy utility function and the multi-gradient recursion self-adaption updating rate.
CN202010697834.8A 2020-07-20 2020-07-20 Multi-gradient recursive reinforcement learning fuzzy control method and system of multi-agent system Active CN111948937B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010697834.8A CN111948937B (en) 2020-07-20 2020-07-20 Multi-gradient recursive reinforcement learning fuzzy control method and system of multi-agent system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010697834.8A CN111948937B (en) 2020-07-20 2020-07-20 Multi-gradient recursive reinforcement learning fuzzy control method and system of multi-agent system

Publications (2)

Publication Number Publication Date
CN111948937A (en) 2020-11-17
CN111948937B (en) 2021-07-06

Family

ID=73340723

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010697834.8A Active CN111948937B (en) 2020-07-20 2020-07-20 Multi-gradient recursive reinforcement learning fuzzy control method and system of multi-agent system

Country Status (1)

Country Link
CN (1) CN111948937B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07228486A (en) * 1994-02-16 1995-08-29 Mitsubishi Heavy Ind Ltd Fuzzy control type mooring device
CN109062058A (en) * 2018-09-26 2018-12-21 大连海事大学 Ship course track following design method based on adaptive fuzzy optimum control
CN109188909A (en) * 2018-09-26 2019-01-11 大连海事大学 Adaptive fuzzy method for optimally controlling and system towards ship course nonlinear discrete systems
CN109857117A (en) * 2019-03-07 2019-06-07 广东华中科技大学工业技术研究院 One kind being based on the matched unmanned boat cluster formation method of distributed mode
CN110262524A (en) * 2019-08-02 2019-09-20 大连海事大学 A kind of optimal aggregation controller of unmanned boat cluster and its design method
CN110658829A (en) * 2019-10-30 2020-01-07 武汉理工大学 Intelligent collision avoidance method for unmanned surface vehicle based on deep reinforcement learning
CN111290387A (en) * 2020-02-21 2020-06-16 大连海事大学 Fuzzy self-adaptive output feedback designated performance control method and system for intelligent ship autopilot system

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112947084A (en) * 2021-02-08 2021-06-11 重庆大学 Model unknown multi-agent consistency control method based on reinforcement learning
CN112947084B (en) * 2021-02-08 2022-09-23 重庆大学 Model unknown multi-agent consistency control method based on reinforcement learning
CN113359474A (en) * 2021-07-06 2021-09-07 杭州电子科技大学 Extensible distributed multi-agent consistency control method based on gradient feedback
CN114200830A (en) * 2021-11-11 2022-03-18 辽宁石油化工大学 Multi-agent consistency reinforcement learning control method
CN114200830B (en) * 2021-11-11 2023-09-22 辽宁石油化工大学 Multi-agent consistency reinforcement learning control method
CN116300949A (en) * 2023-03-29 2023-06-23 大连海事大学 Course tracking control method and system for discrete time reinforcement learning unmanned ship
CN116400691A (en) * 2023-03-29 2023-07-07 大连海事大学 Novel discrete time specified performance reinforcement learning unmanned ship course tracking control method and system
CN116400691B (en) * 2023-03-29 2023-11-21 大连海事大学 Novel discrete time specified performance reinforcement learning unmanned ship course tracking control method and system

Also Published As

Publication number Publication date
CN111948937B (en) 2021-07-06

Similar Documents

Publication Publication Date Title
CN111948937B (en) Multi-gradient recursive reinforcement learning fuzzy control method and system of multi-agent system
CN110687799B (en) Fuzzy self-adaptive output feedback control method and system for intelligent ship autopilot system
CN111273549B (en) Fuzzy self-adaptive output feedback fault-tolerant control method and system for intelligent ship autopilot system
CN112612209B (en) Full-drive ship track tracking control method and system based on instruction filtering neural network controller
CN111308890B (en) Unmanned ship data-driven reinforcement learning control method with designated performance
CN110262494B (en) Collaborative learning and formation control method for isomorphic multi-unmanned ship system
CN111290387B (en) Fuzzy self-adaptive output feedback designated performance control method and system for intelligent ship autopilot system
CN111897225B (en) Fuzzy self-adaptive output feedback control method and system for intelligent ship autopilot system
CN113110511B (en) Intelligent ship course control method based on generalized fuzzy hyperbolic model
CN110647154B (en) Course track tracking design method of intelligent ship autopilot system based on fuzzy state observer
CN110377036A (en) A kind of unmanned water surface ship Track In Track set time control method constrained based on instruction
CN114115262B (en) Multi-AUV actuator saturation cooperative formation control system and method based on azimuth information
CN114442640B (en) Track tracking control method for unmanned surface vehicle
CN109656142B (en) Cascade structure model-free self-adaptive guidance method for unmanned ship
CN110703605A (en) Self-adaptive fuzzy optimal control method and system for intelligent ship autopilot system
CN112782981B (en) Fuzzy self-adaptive output feedback designated performance control method and system for intelligent ship autopilot system
CN111930124A (en) Fuzzy self-adaptive output feedback finite time control method and system for intelligent ship autopilot system
CN111221335A (en) Fuzzy self-adaptive output feedback finite time control method and system for intelligent ship autopilot system
CN114879671A (en) Unmanned ship trajectory tracking control method based on reinforcement learning MPC
Mu et al. Path following for podded propulsion unmanned surface vehicle: Theory, simulation and experiment
CN115268260A (en) Unmanned ship preset time trajectory tracking control method and system considering transient performance
CN114967714A (en) Anti-interference motion control method and system for autonomous underwater robot
CN115248553A (en) Event triggering adaptive PID track tracking fault-tolerant control method for under-actuated ship
CN112698575B (en) Intelligent ship autopilot adaptive fuzzy output feedback control method and system
Shen et al. Prescribed performance LOS guidance-based dynamic surface path following control of unmanned sailboats

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant