CN111948937B - Multi-gradient recursive reinforcement learning fuzzy control method and system of multi-agent system - Google Patents
- Publication number
- CN111948937B
- Authority
- CN
- China
- Prior art keywords
- agent
- course
- fuzzy
- ship
- gradient
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/0265—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
- G05B13/0275—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using fuzzy logic only
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
- G05B13/042—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course or altitude of land, water, air, or space vehicles, e.g. automatic pilot
- G05D1/02—Control of position or course in two dimensions
- G05D1/0206—Control of position or course in two dimensions specially adapted to water vehicles
Abstract
The invention provides a multi-gradient recursive reinforcement learning fuzzy control method and device for a multi-agent system, and belongs to the technical field of ship control for multi-agent systems. The invention mainly targets the discrete course system of multi-agent ships: through multi-gradient recursive reinforcement learning fuzzy control, it achieves the optimized control objective with lower system energy consumption while improving the speed and precision of multi-agent course tracking. In addition, the invention provides a multi-gradient recursive learning algorithm that addresses the local-extremum problem in learning the fuzzy logic system weights, so that the weights converge faster and more accurately, which improves the reliability and stability of the system.
Description
Technical Field
The invention belongs to the technical field of ship control of a multi-agent system, and particularly relates to a multi-gradient recursive reinforcement learning fuzzy control method and system of the multi-agent system.
Background
Intelligent ship motion is characterized by large time lag, large inertia, and nonlinearity. Changes in speed and loading perturb the parameters of the control model, and changes in navigation conditions, environmental disturbances, and measurement inaccuracy introduce uncertainty into the ship's course control system. To address the problems caused by these nonlinear uncertain dynamics, intelligent algorithms have been increasingly applied to intelligent ship course control, including adaptive control, robust control, fuzzy adaptive control, iterative sliding mode control, and minimal-parameter learning methods. Consider a multi-agent system composed of several intelligent ships, each with independent dynamics and able to interact with the environment. The complex multi-ship course control problem is transformed by treating the reference signal as a virtual leader, so that course consistency control of the multi-agent system is completed at low cost. At present, most fuzzy-logic-system-based course consistency design methods for multi-agent ship systems do not consider the problem of the weights converging to local extrema, and because ships have large inertia, course tracking is slow, which leads to high controller energy consumption and serious steering-engine wear. In addition, existing course consistency control results for multi-agent ship systems seldom consider the compromise between control performance and control cost; their cost of use is high, which hinders engineering implementation.
Disclosure of Invention
In view of the problems described in the background art, the present invention provides a multi-gradient recursive reinforcement learning fuzzy control method and apparatus for a multi-agent system. The invention mainly targets the multi-agent ship course discrete system; through multi-gradient recursive reinforcement learning fuzzy control, it can effectively reduce controller energy consumption, reduce steering-engine wear, and improve the speed and precision of multi-agent ship course consistency control.
In order to achieve the purpose, the technical scheme of the invention is as follows:
the multi-gradient recursive reinforcement learning fuzzy control method of the multi-agent system comprises the following steps:
S1, the collected multi-agent course information is transmitted to a shipborne computer, and the shipborne computer, taking the ship's steady-state turning nonlinearity into account, establishes a mathematical model of the multi-agent ship course discrete nonlinear control system in terms of the ship course angle, wherein the course information comprises the rudder angle information measured by the multi-agent ship steering engine and the current course angle information measured by the compass;
S2, based on the dynamic error between the agents' course angles and the virtual leader reference signal, and on the dynamic error between the agents' course-angle change rate and the virtual control function, the multi-agent shipborne computer obtains the multi-agent course tracking dynamic error and the multi-agent system course tracking transformation system;
S3, according to the multi-agent course tracking dynamic error and the tracking performance threshold, a utility function is designed in the fuzzy evaluation module, from which the strategy utility function is obtained; the cost function of the fuzzy evaluation module is obtained using the universal approximation principle of the fuzzy logic system and the Bellman principle; and the fuzzy evaluation adaptive update rate is designed based on the multi-gradient recursion method;
S4, according to the connection weights of the agents in the multi-agent system, the virtual control function and the strategy utility function of the multi-agent system are designed in the fuzzy execution module, and the adaptive update rate of the fuzzy execution module is designed based on the multi-gradient recursion method;
S5, the course consistency controller of the multi-agent system is obtained from the multi-agent course tracking transformation system, the course tracking error dynamics, the fuzzy evaluation cost function, the fuzzy evaluation adaptive update rate, the virtual control function, the strategy utility function, and the multi-gradient recursive adaptive update rate, which yields the control input rudder angle of the multi-agent system; the rudder angle instruction is transmitted to the multi-agent ship steering engine, which outputs the multi-agent ship course angle, thereby realizing course consistency control of the multi-agent system.
Further, in step S1, the specific process of establishing the mathematical model of the multi-agent ship course discrete nonlinear control system is:
the multi-agent shipborne computer uses the collected rudder angle information and course angle information, takes the ship's steady-state turning nonlinearity into account, and establishes the multi-agent nonlinear discrete system mathematical model as follows:
in formula (1), $\xi_{i,1}(k)$ is the course angle of the $i$-th agent in the multi-agent system, where $i = 1, \ldots, N$ is the index of the agents in the multi-agent system, the subscript 1 denotes the first subsystem, and $k$ is the time step; $\xi_{i,2}(k)$ is the rate of change of the course angle, where the subscript 2 denotes the second subsystem; $u_i(k)$ is the rudder angle input; $y_i(k)$ is the system output; $g_i = K_i/T_i$ is the control gain, where $K_i$ is the ship's turning index and $T_i$ is the ship's follow-up index; $f_{i,2}(\xi_{i,2}(k))$ is an unknown nonlinear function; and $d_i(k)$ is an unknown but bounded external disturbance whose bound is an unknown positive number;
further, the specific process of establishing the multi-agent system course tracking transformation system in step S2 is as follows:
the multi-agent shipborne computer designs the course tracking error variables using the course information:
in formula (2), $\delta_{i,1}(k)$ is the course angle dynamic error of the $i$-th agent with respect to the $j$-th agents and the reference signal in the multi-agent system; $\delta_{i,2}(k)$ is the error variable between the course-angle change rate $\xi_{i,2}(k)$ of the $i$-th agent and the virtual control function $\alpha_{i,1}(k)$; $a_{i,j}$ is the connection weight between the $i$-th agent and the $j$-th agent; $a_{i,0}$ is the connection weight between the $i$-th agent and the virtual leader in the multi-agent system; and $y_d(k)$ is the smooth, bounded reference trajectory of the virtual leader;
in order to facilitate the course consistency control design of the multi-agent system and avoid the problem of no correlation of subsystems, the system transformation is carried out on the formula (1) to establish a multi-agent course tracking transformation system:
further, the specific process of establishing the fuzzy evaluation module of the multi-agent system in step S3 is as follows:
based on the course angle dynamic error $\delta_{i,1}(k)$ and the tracking performance threshold $\varepsilon$, the multi-agent shipborne computer designs the utility function $\pi_i(k)$ as
where $\varepsilon > 0$; $\pi_i(k) = 0$ means that the tracking performance is acceptable and $\pi_i(k) = 1$ means that it is unacceptable. Using the utility function $\pi_i(k)$, the strategy utility function $M_i(k)$ is designed as
where $0 < \gamma_i < 1$ is a design parameter and $L$ is the time horizon; formula (5) can be expressed as
Using the universal approximation principle of the fuzzy logic system, the strategy utility function $M_i(k)$ can be obtained as
where $\theta_{i,c}$ is the ideal adjustable parameter, bounded by an unknown positive number, and the subscript $c$ denotes the evaluation module; the weight vector enters as the transpose of $\theta_{i,c}(k)$; the fuzzy basis function is bounded; and $v_{i,c}(k)$ is the approximation error, bounded by an unknown positive number;
where the estimated quantities are the estimate of the ideal parameter $\theta_{i,c}$, its transpose, and the estimate of $M_i(k)$;
according to equation (7), a cost function $\phi_{i,c}(k)$ is defined; to minimize the cost function $\phi_{i,c}(k)$, the adaptive update rate of the evaluation module is designed as
where $\mu_{i,c} > 0$ and $0 < \gamma_i < 1$ are design parameters; the fuzzy basis function is bounded; $T$ denotes transposition; $l$ is the gradient index; and $p$ is a positive integer denoting the gradient length;
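The utility and strategy utility formulas are not reproduced in this text, so the following is only a hedged Python sketch of the idea described above. It assumes a binary utility that is 0 when $|\delta_{i,1}(k)| \le \varepsilon$ and 1 otherwise, and a strategy utility formed as a $\gamma_i$-discounted sum of future utilities over a horizon $L$. The function names, the use of $\le$ at the threshold, and the exact discounting form are assumptions made for illustration, not the patent's formulas.

```python
def utility(delta, eps):
    """Binary utility: 0.0 means acceptable tracking, 1.0 means unacceptable.

    Assumed form: compare |delta_{i,1}(k)| with the threshold eps > 0.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    return 0.0 if abs(delta) <= eps else 1.0


def strategic_utility(errors, k, gamma, horizon, eps):
    """Discounted sum of utilities over steps k+1 .. k+horizon (assumed form).

    errors[n] holds the tracking error delta_{i,1}(n); steps beyond the
    recorded data are ignored.
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie in (0, 1)")
    total = 0.0
    for step in range(1, horizon + 1):
        idx = k + step
        if idx >= len(errors):
            break
        total += gamma ** step * utility(errors[idx], eps)
    return total
```

For example, with errors of 6, 4, 7, and 1 degrees, a threshold of 5 degrees, and a discount of 0.5, only the error of 7 degrees two steps ahead is penalized, so the strategy utility at step 0 is 0.25.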
further, the specific process of establishing the virtual control function $\alpha_{i,1}(k)$ of the multi-agent system and the multi-gradient recursive adaptive update rate of the fuzzy execution module in step S4 is as follows:
the virtual control function is designed as
where $S_{i,2}(k) = [\xi_{i,1}(k), \xi_{i,2}(k), y_d(k)]^T$; the fuzzy basis function is bounded; and the parameter vector is the estimate of the ideal parameter, with $T$ denoting transposition;
according to equation (10), a cost function $\phi_{i,2}(k)$ is defined; to minimize the cost function $\phi_{i,2}(k)$, the multi-gradient recursive adaptive update rate is designed according to the multi-gradient recursive algorithm as
where $\mu_{i,2} > 0$ is a design parameter; $S_{i,2}(k-l+1) = [\xi_{i,1}(k-l+1), \xi_{i,2}(k-l+1), y_d(k-l+1)]^T$; and the fuzzy basis function is bounded and satisfies an inequality.
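The update formulas themselves are not reproduced in this text. As a hedged illustration of what a gradient index $l = 1, \ldots, p$ over the samples $k-l+1$ could mean, the Python sketch below takes one gradient-descent step on a linear-in-parameters approximator, summing the squared-error gradients of the $p$ most recent samples instead of using only the latest one. The function name, the squared-error cost, and the unnormalized sum are assumptions; this is one plausible reading, not the patent's update law.

```python
def multi_gradient_step(theta, regressors, targets, mu):
    """One weight update that accumulates gradients over the last p samples.

    theta      : current weight vector (list of floats)
    regressors : p basis-function vectors, one per recent time step
    targets    : p desired scalar outputs matching the regressors
    mu         : positive learning rate (plays the role of mu_{i,2})
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    grad = [0.0] * len(theta)
    for phi, target in zip(regressors, targets):
        # Output error of the approximator theta^T phi for this sample.
        err = sum(t * f for t, f in zip(theta, phi)) - target
        # Gradient of 0.5 * err**2 with respect to theta is err * phi.
        for n, f in enumerate(phi):
            grad[n] += err * f
    return [t - mu * g for t, g in zip(theta, grad)]
```

With `p = 1` this reduces to an ordinary single-sample gradient step; larger `p` reuses past regressors, which is the property the text credits with faster and more accurate weight convergence.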
Further, the specific process of solving for the control input rudder angle in step S5 is as follows:
determine the multi-gradient recursive reinforcement learning controller of the multi-agent system: using the obtained virtual controller $\alpha_{i,1}(k)$, the evaluation module, and the multi-gradient recursive adaptive update rates, the multi-agent shipborne computer obtains the actual control input $u_i(k)$ of the system as:
where the parameters satisfy $c_{i,1} > 0$ and $c_{i,2} > 0$, and $S_{i,2}(k-l+1) = [\xi_{i,1}(k-l+1), \xi_{i,2}(k-l+1), a_{i,0} y_0(k-l+1)]$.
Further, the virtual leader reference signal in step S2 and the tracking performance threshold $\varepsilon$ in step S3 are designed according to actual requirements; preferably, $\varepsilon$ is less than 5 degrees.
A multi-gradient recursion reinforcement learning fuzzy control system of a multi-agent system comprises a data acquisition unit, a data transmission unit, a multi-agent shipborne computer and a data feedback unit;
the data acquisition unit is used for acquiring course information in the ship navigation process; the data transmission unit is used for transmitting the collected course information of the multi-agent ship in the sailing process to the onboard computer; the multi-agent shipborne computer is used for processing the collected course information of the multi-agent ship in the sailing process and completing multi-gradient recursive reinforcement learning fuzzy control of the course of the multi-agent ship; the data feedback unit is used for transmitting rudder angle instructions obtained by the multi-agent ship-borne computer to the multi-agent ship steering engine to output multi-agent ship course angles, and realizing course track tracking consistency control of the multi-agent system,
the multi-agent shipborne computer comprises a multi-agent ship course tracking system mathematical model building module, a multi-agent ship course system tracking error module, a fuzzy evaluation module, a virtual controller building module, a multi-gradient recursion reinforcement learning self-adaptive update rate module and a multi-gradient recursion reinforcement learning fuzzy controller building module;
the multi-agent ship course tracking system mathematical model building module is used for building a multi-agent ship course discrete nonlinear control system mathematical model between the input and the output of the system based on the multi-agent course information;
the multi-agent ship course system tracking error module is used for constructing a multi-agent course tracking error dynamic model and a transformation system based on the course information of each agent and the virtual leader in the multi-agent system;
the fuzzy evaluation module is used for designing a fuzzy evaluation cost function from the multi-agent course tracking error and a preset tracking performance threshold, and for completing the design of the fuzzy evaluation self-adaptive update rate;
the virtual controller construction module is used for designing a virtual control function of the multi-agent system by utilizing the error between the output signal and the reference signal, and designing a virtual controller according to the virtual control function;
the multi-gradient recursion reinforcement learning self-adaptive update rate module is used for obtaining a multi-gradient recursion self-adaptive update rate based on the fuzzy evaluation module information and the strategy utility function;
the multi-gradient recursion reinforcement learning fuzzy controller construction module is used for obtaining the course consistency controller of the multi-agent system through the multi-agent course tracking transformation system, the course tracking error dynamic state, the fuzzy evaluation cost function, the fuzzy evaluation self-adaption updating rate, the virtual control function, the strategy utility function and the multi-gradient recursion self-adaption updating rate.
In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:
compared with the prior art, on the one hand, the invention addresses the multi-agent ship course system and uses the fuzzy evaluation signal together with the multi-gradient recursive reinforcement learning controller to solve the multi-agent course tracking consistency control problem. It effectively reduces controller energy consumption and steering-engine wear, is better suited to multi-agent ship motion control problems characterized by large time lag, large inertia, and nonlinearity, and improves the speed and precision of multi-agent course tracking while achieving the optimized control objective with lower system energy consumption. On the other hand, the invention provides a multi-gradient recursive learning algorithm that addresses the local-extremum problem in learning the fuzzy logic system weights, so that the weights converge faster and more accurately, which improves the reliability and stability of the system.
Drawings
FIG. 1 is a flow chart of a control method of the present invention.
FIG. 2 is a communication topology diagram of the multi-agent system of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
FIG. 1 is a flow chart of a control method of the present invention. As shown in FIG. 1, the present invention discloses a multi-gradient recursive reinforcement learning fuzzy control method for a multi-agent system, which specifically comprises the following steps:
Step 1: the multi-agent system course information is transmitted to the shipborne computer, wherein the course information comprises the rudder angle data measured by the multi-agent ship steering engine and the current course angle data measured by the compass; taking the ship's steady-state turning nonlinearity into account, the mathematical model of the multi-agent ship course nonlinear discrete system is established:
in formula (1), $\xi_{i,1}(k)$ is the course angle of the $i$-th agent in the multi-agent system, where $i = 1, \ldots, N$ is the index of the agents in the multi-agent system, the subscript 1 denotes the first subsystem, and $k$ is the time step; $\xi_{i,2}(k)$ is the rate of change of the course angle, where the subscript 2 denotes the second subsystem; $u_i(k)$ is the rudder angle input; $y_i(k)$ is the system output; $g_i = K_i/T_i$ is the control gain, where $K_i$ is the ship's turning index and $T_i$ is the ship's follow-up index; $f_{i,2}(\xi_{i,2}(k))$ is an unknown nonlinear function; and $d_i(k)$ is an unknown but bounded external disturbance whose bound is an unknown positive number;
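Formula (1) is not reproduced in this text. To make the roles of the symbols concrete, the Python sketch below simulates one step of a single ship under an assumed model: a forward-Euler discretization of a Norrbin-type steering equation, in which $K/T$ acts as the control gain $g_i$ and $-(r + \alpha r^3)/T$ stands in for the unknown nonlinearity $f_{i,2}$. The nonlinearity coefficient `alpha`, the sampling period `h`, and all names are assumptions for illustration and may differ from the patent's actual formula.

```python
def course_step(psi, r, rudder, K, T, alpha, h, disturbance=0.0):
    """Advance one sampling period of an assumed discrete course model.

    psi    : course angle, xi_{i,1}(k)
    r      : course-angle change rate, xi_{i,2}(k)
    rudder : rudder angle input, u_i(k)
    K, T   : turning index and follow-up index of the ship
    alpha  : coefficient of the cubic turning nonlinearity (assumed)
    h      : sampling period (assumed)
    """
    g = K / T                          # control gain g_i = K_i / T_i
    f = -(r + alpha * r ** 3) / T      # assumed form of f_{i,2}
    psi_next = psi + h * r
    r_next = r + h * (f + g * rudder + disturbance)
    return psi_next, r_next
```

Starting from rest with `K = 0.5`, `T = 10`, and a rudder angle of 0.1, the first step leaves the course angle unchanged and raises the change rate to 0.005, which shows the one-step delay between the rudder command and the course response.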
Step 2: designing the multi-agent system ship course transformation system: the multi-agent shipborne computer designs the course tracking error variables using the course information:
in formula (2), $\delta_{i,1}(k)$ is the course angle dynamic error of the $i$-th agent with respect to the $j$-th agents and the reference signal in the multi-agent system; $\delta_{i,2}(k)$ is the error variable between the state variable $\xi_{i,2}(k)$ of the $i$-th agent and the virtual controller $\alpha_{i,1}(k)$; $a_{i,j}$ is the connection weight between the $i$-th agent and the $j$-th agent; $a_{i,0}$ is the connection weight between the $i$-th agent and the virtual leader in the multi-agent system; $y_d(k)$ is the smooth, bounded reference trajectory of the virtual leader; and $\alpha_{i,1}(k)$ is the virtual controller to be designed;
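Formula (2) is not reproduced in this text. The Python sketch below shows how such a course error could be computed, assuming the neighbourhood-error form that is common in consensus control: $\delta_{i,1}(k) = \sum_j a_{i,j}\,(y_i(k) - y_j(k)) + a_{i,0}\,(y_i(k) - y_d(k))$. The function name and the exact form are assumptions; only the meaning of $a_{i,j}$, $a_{i,0}$, and $y_d(k)$ is taken from the definitions above.

```python
def course_consensus_error(i, outputs, adjacency, leader_weights, reference):
    """Assumed neighbourhood error delta_{i,1}(k) for agent i.

    outputs        : course angles y_j(k) of all agents
    adjacency      : adjacency[i][j] = a_{i,j}, weight of agent j as seen by i
    leader_weights : leader_weights[i] = a_{i,0}, link to the virtual leader
    reference      : virtual leader reference y_d(k)
    """
    y_i = outputs[i]
    neighbour_term = sum(
        adjacency[i][j] * (y_i - outputs[j]) for j in range(len(outputs))
    )
    leader_term = leader_weights[i] * (y_i - reference)
    return neighbour_term + leader_term
```

An agent with no link to the leader (`a_{i,0} = 0`) is still driven toward the reference indirectly, because its error depends on neighbours that do receive the reference.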
in order to facilitate the course consistency control design of the multi-agent system and avoid the problem of no correlation of subsystems, the system transformation is carried out on the formula (1) to establish a multi-agent course tracking transformation system:
Step 3: designing the fuzzy evaluation module of the multi-agent system: based on the tracking dynamic error $\delta_{i,1}(k)$ and a preset tracking performance threshold $\varepsilon$, the multi-agent shipborne computer designs the utility function $\pi_i(k)$ as
where $\varepsilon > 0$; $\pi_i(k) = 0$ means that the tracking performance is acceptable and $\pi_i(k) = 1$ means that it is unacceptable. Using the utility function $\pi_i(k)$, the shipborne computer designs the strategy utility function $M_i(k)$ as
where $0 < \gamma_i < 1$ is a design parameter and $L$ is the time horizon; formula (5) can be expressed as
Using the universal approximation principle of the fuzzy logic system, the strategy utility function $M_i(k)$ can be obtained as
where $\theta_{i,c}$ is the ideal adjustable parameter, bounded by an unknown positive number; the weight vector enters as the transpose of $\theta_{i,c}(k)$; the fuzzy basis function is bounded; and $v_{i,c}(k)$ is the approximation error, bounded by an unknown positive number;
where the estimated quantities are the estimate of the ideal parameter $\theta_{i,c}$, its transpose, and the estimate of $M_i(k)$;
according to equation (7), a cost function $\phi_{i,c}(k)$ is defined; to minimize the cost function $\phi_{i,c}(k)$, the adaptive update rate of the evaluation module is designed as
where $\mu_{i,c} > 0$ and $0 < \gamma_i < 1$ are parameters to be designed; the fuzzy basis function is bounded; $l$ is the gradient index; and $p$ is a positive integer denoting the gradient length;
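The text relies on a fuzzy logic system of the form "weight vector transposed times a bounded fuzzy basis function" but does not state which membership functions are used. As a hedged Python sketch, the code below assumes normalized Gaussian membership functions with hypothetical centres and a common width. Under that assumption each basis entry lies in (0, 1] and the entries sum to 1, which is one way the stated boundedness arises.

```python
import math


def fuzzy_basis(x, centers, width):
    """Normalized Gaussian fuzzy basis vector (assumed membership shape).

    x       : input vector, e.g. [xi_1, xi_2, y_d]
    centers : one centre vector per fuzzy rule (hypothetical values)
    width   : common Gaussian width (hypothetical value)
    """
    activations = [
        math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2.0 * width ** 2))
        for c in centers
    ]
    total = sum(activations)  # positive as long as x is not far from every centre
    return [a / total for a in activations]


def fls_output(theta, x, centers, width):
    """Fuzzy logic system output theta^T phi(x)."""
    phi = fuzzy_basis(x, centers, width)
    return sum(t * p for t, p in zip(theta, phi))
```

Because the basis entries sum to 1, the output is a convex combination of the weights, so equal weights of 2 return 2 for any input near the centres.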
Step 4: designing the virtual control function $\alpha_{i,1}(k)$ of the multi-agent system and the multi-gradient recursive adaptive update rate of the fuzzy execution module: the virtual control function is designed as
where $S_{i,2}(k) = [\xi_{i,1}(k), \xi_{i,2}(k), y_d(k)]^T$; the fuzzy basis function is bounded; and the parameter vector is the estimate of the ideal parameter, with $T$ denoting transposition;
according to equation (10), a cost function $\phi_{i,2}(k)$ is defined; to minimize the cost function $\phi_{i,2}(k)$, the multi-gradient recursive adaptive update rate is designed according to the multi-gradient recursive algorithm as
where $\mu_{i,2} > 0$ is a parameter to be designed; $S_{i,2}(k-l+1) = [\xi_{i,1}(k-l+1), \xi_{i,2}(k-l+1), y_d(k-l+1)]^T$; and the fuzzy basis function is bounded and satisfies an inequality.
Step 5: determining the multi-gradient recursive reinforcement learning controller of the multi-agent system: using the obtained virtual controller $\alpha_{i,1}(k)$, the evaluation module, and the multi-gradient recursive adaptive update rates, the multi-agent shipborne computer obtains the actual control input $u_i(k)$ of the system as:
where the parameters satisfy $c_{i,1} > 0$ and $c_{i,2} > 0$, and $S_{i,2}(k-l+1) = [\xi_{i,1}(k-l+1), \xi_{i,2}(k-l+1), a_{i,0} y_0(k-l+1)]$.
The invention also provides a multi-gradient recursion reinforcement learning fuzzy control system of the multi-agent system, which comprises the following components:
the data acquisition unit is used for acquiring course information in the ship navigation process, wherein the course information comprises rudder angle data and current course angle data;
the data transmission unit is used for transmitting the collected course information of the multi-agent ship in the sailing process to the shipborne computer; the data feedback unit is used for transmitting the rudder angle instruction computed by the multi-agent shipborne computer to the multi-agent ship steering engine module, and the steering engine module outputs the multi-agent ship course angle to realize course consistency control of the multi-agent system;
the multi-agent shipborne computer is used for processing the collected course information of the multi-agent ship in the sailing process and finishing multi-gradient recursive reinforcement learning fuzzy control of the course of the multi-agent ship, and specifically comprises the following steps:
the multi-agent ship course tracking system mathematical model building module is used for building a multi-agent ship system mathematical model between the input and the output of the system based on the multi-agent course information;
the multi-agent ship course system tracking error module is used for constructing a multi-agent course tracking error dynamic model and a transformation system based on the course information of each agent and the virtual leader in the multi-agent system;
the fuzzy evaluation module is used for designing a fuzzy evaluation cost function from the multi-agent course tracking error and a preset tracking performance threshold, and for completing the design of the fuzzy evaluation self-adaptive update rate;
the virtual controller building module is used for designing a virtual control function of the multi-agent system by utilizing the error between the output signal and the reference signal and designing a virtual controller according to the virtual control function;
the multi-gradient recursion reinforcement learning self-adaptive update rate module is used for obtaining a multi-gradient recursion self-adaptive update rate based on the fuzzy evaluation module information and the strategy utility function;
and the multi-gradient recursion reinforcement learning fuzzy controller building module is used for obtaining the course consistency controller of the multi-agent system through the multi-agent course tracking transformation system, the course tracking error dynamic state, the fuzzy evaluation cost function, the fuzzy evaluation self-adaption updating rate, the virtual control function, the strategy utility function and the multi-gradient recursion self-adaption updating rate.
FIG. 2 is an example of a communication topology of the multi-agent system employed by the present invention. As can be seen in the figure, 0 is the virtual leader, and 1, 2, 3, and 4 are individual agents in the multi-agent system. Information flows only one way between the virtual leader and an individual agent, whereas the information flow between individual agents can be one-way or two-way. An individual agent can only receive information from its neighbors and cannot obtain the information of all individuals. A spanning tree exists in the communication topology graph, so the necessary condition for realizing consistency control is met.
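The spanning-tree condition can be checked mechanically. The Python sketch below assumes the tree is rooted at the virtual leader (node 0) and tests whether every agent is reachable from it along directed information-flow edges, using a breadth-first search. The edge list in the usage note is hypothetical, since FIG. 2 is not reproduced in this text.

```python
from collections import deque


def has_spanning_tree(num_nodes, edges, root=0):
    """Return True if every node is reachable from root along directed edges.

    edges : iterable of (source, destination) pairs, where information
            flows from source to destination.
    """
    successors = {node: [] for node in range(num_nodes)}
    for src, dst in edges:
        successors[src].append(dst)

    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in successors[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) == num_nodes
```

For a hypothetical topology with edges `(0, 1)`, `(1, 2)`, `(2, 3)`, `(3, 2)`, and `(1, 4)`, the check succeeds; removing `(1, 4)` isolates agent 4 from the leader's reference and the check fails.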
While the invention has been described with reference to specific embodiments, any feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise; all of the disclosed features, or all of the method or process steps, may be combined in any combination, except mutually exclusive features and/or steps.
Claims (3)
1. The multi-gradient recursion reinforcement learning fuzzy control method of the multi-agent system is characterized by comprising the following steps:
S1, transmitting the collected multi-agent course information to a multi-agent shipborne computer, and establishing, by the multi-agent shipborne computer and taking the ship's steady-state turning nonlinearity into account, a mathematical model of the multi-agent ship course discrete nonlinear control system in terms of the ship course angle, wherein the specific formula is as follows:
in formula (1), $\xi_{i,1}(k)$ is the course angle of the $i$-th agent in the multi-agent system, where $i = 1, \ldots, N$ is the index of the agents in the multi-agent system, the subscript 1 denotes the 1st subsystem, and $k$ is the time step; $\xi_{i,2}(k)$ is the rate of change of the course angle, where the subscript 2 denotes the 2nd subsystem; $u_i(k)$ is the rudder angle input; $y_i(k)$ is the system output; $g_i = K_i/T_i$ is the control gain, where $K_i$ is the ship's turning index and $T_i$ is the ship's follow-up index; $f_{i,2}(\xi_{i,2}(k))$ is an unknown nonlinear function; and $d_i(k)$ is an unknown but bounded external disturbance whose bound is an unknown positive number;
the course information comprises rudder angle information measured according to the multi-agent ship steering engine and current course angle information measured by the compass;
S2, based on the dynamic error between the agents' course angles and the virtual leader reference signal, and on the dynamic error between the agents' course-angle change rate and the virtual control function, the multi-agent shipborne computer obtains the multi-agent course tracking dynamic error and the multi-agent system course tracking transformation system, wherein the specific process is as follows:
the multi-agent shipborne computer designs the course tracking dynamic error using the course information:
in formula (2), $\delta_{i,1}(k)$ is the course angle dynamic error of the $i$-th agent with respect to the $j$-th agents and the reference signal in the multi-agent system; $\delta_{i,2}(k)$ is the error variable between the course-angle change rate $\xi_{i,2}(k)$ of the $i$-th agent and the virtual control function $\alpha_{i,1}(k)$; $a_{i,j}$ is the connection weight between the $i$-th agent and the $j$-th agent; $a_{i,0}$ is the connection weight between the $i$-th agent and the virtual leader in the multi-agent system; and $y_d(k)$ is the smooth, bounded reference trajectory of the virtual leader;
in order to facilitate the course consistency control design of the multi-agent system and avoid the problem of no correlation of subsystems, the system transformation is carried out on the formula (1) to establish a multi-agent course tracking transformation system:
S3: based on the course tracking dynamic errors of the multi-agent system and the tracking performance threshold, a utility function, from which the strategy utility function is obtained, is designed in the fuzzy evaluation module; the cost function of the fuzzy evaluation module is obtained using the universal approximation principle of fuzzy logic systems and the Bellman principle; and the fuzzy evaluation adaptive update rate is designed based on the multi-gradient recursion method. The fuzzy evaluation module is established as follows:
based on the course angle dynamic error $\delta_{i,1}(k)$ and the tracking performance threshold $\epsilon$, the multi-agent shipborne computer designs the utility function $\pi_i(k)$ as:
where $\epsilon > 0$; $\pi_i(k) = 0$ means that the tracking performance is acceptable and $\pi_i(k) = 1$ means that the tracking performance is unacceptable. Using the utility function $\pi_i(k)$, the strategy utility function $M_i(k)$ is designed as:
where $0 < \gamma_i < 1$ is a design parameter and $L$ is the time horizon; formula (5) can be expressed as:
Using the universal approximation principle of fuzzy logic systems, the strategy utility function $M_i(k)$ is obtained as follows:
in the formula, $\theta_{i,c}$ is the ideal weight vector, which satisfies a bound given by an unknown positive number, and the subscript $c$ denotes the evaluation module; the weight vector enters the formula through its transpose; the fuzzy basis function is bounded; and $v_{i,c}(k)$ is the approximation error, which is bounded by an unknown positive number;
in the formula, the estimate of the ideal parameter $\theta_{i,c}$ appears in transposed form and yields the estimate of $M_i(k)$;
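The threshold test behind $\pi_i(k)$ and the fuzzy-logic-system estimate of $M_i(k)$ described above can be sketched in Python as follows. The comparison on the absolute value of $\delta_{i,1}(k)$, the normalized Gaussian membership functions, and the names `centers`, `width` and `theta_hat` are assumptions made for illustration; the text only states that the basis functions are bounded and that the estimate is a weight vector transposed times a basis vector.

```python
import math


def utility(delta_i1: float, eps: float) -> int:
    """pi_i(k): 0 if tracking is acceptable, 1 otherwise (test on |delta| assumed)."""
    return 0 if abs(delta_i1) <= eps else 1


def fuzzy_basis(s, centers, width=1.0):
    """Normalized Gaussian fuzzy basis functions (assumed membership shape)."""
    acts = [
        math.exp(-sum((x - c) ** 2 for x, c in zip(s, center)) / width ** 2)
        for center in centers
    ]
    total = sum(acts)
    # Each entry lies in (0, 1] and the entries sum to 1, so the basis is bounded.
    return [a / total for a in acts]


def estimate(theta_hat, phi):
    """Fuzzy-logic-system output theta_hat^T phi, e.g. the estimate of M_i(k)."""
    return sum(t * p for t, p in zip(theta_hat, phi))


centers = [[-1.0], [0.0], [1.0]]          # hypothetical rule centers
phi = fuzzy_basis([0.0], centers)
m_hat = estimate([0.0, 1.0, 0.0], phi)    # picks out the middle rule's activation
```

Normalizing the activations is one common way to guarantee the boundedness that the text requires of the fuzzy basis functions.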
according to formula (7), the cost function $\phi_{i,c}(k)$ is defined; to minimize the cost function $\phi_{i,c}(k)$, the fuzzy evaluation adaptive update rate is designed as:
in the formula, the design parameter carrying the subscript $i,c$ is positive, $0 < \gamma_i < 1$, the fuzzy basis function is bounded, $T$ denotes transposition, $l$ is the gradient index, and $p$ is a positive integer denoting the gradient length;
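The idea of a multi-gradient recursive update, in which gradients from the last $p$ samples (gradient index $l = 1, \ldots, p$) are accumulated in one weight update, can be sketched in Python as follows. The Bellman-type residual, the quadratic cost, the semi-gradient simplification, and the class and parameter names are assumptions for illustration and are not the update rate of formula (9).

```python
from collections import deque


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class MultiGradientCritic:
    def __init__(self, n_rules, mu=0.1, gamma=0.9, p=3):
        self.theta = [0.0] * n_rules   # estimated weight vector
        self.mu, self.gamma = mu, gamma
        # Keeps the last p samples (phi(k-l+1), phi(k-l), pi(k-l+1)), l = 1..p.
        self.history = deque(maxlen=p)

    def update(self, phi_now, phi_prev, pi_now):
        self.history.append((phi_now, phi_prev, pi_now))
        grad = [0.0] * len(self.theta)
        for phi_l, phi_lm1, pi_l in self.history:
            # Assumed Bellman-type residual, evaluated with the current weights.
            err = (self.gamma * dot(self.theta, phi_l)
                   - dot(self.theta, phi_lm1) + pi_l)
            # Semi-gradient of 0.5 * err**2: the previous-step term is held fixed.
            for j, ph in enumerate(phi_l):
                grad[j] += self.gamma * ph * err
        # One descent step along the sum of the stored gradients.
        self.theta = [t - self.mu * g for t, g in zip(self.theta, grad)]
        return self.theta
```

With `p = 1` the sketch reduces to an ordinary single-gradient update; larger `p` reuses recent samples, at the cost of storing `p` basis vectors per agent.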
S4: according to the connection weights of the agents in the multi-agent system, the virtual control function and the strategy utility function of the multi-agent system are designed in the fuzzy execution module, and the multi-gradient recursive adaptive update rate of the fuzzy execution module is designed based on the multi-gradient recursion method. The specific steps are as follows:
the virtual control function is designed as:
in the formula, $S_{i,2}(k) = [\xi_{i,1}(k), \xi_{i,2}(k), y_d(k)]^T$; the fuzzy basis function is bounded; and the estimate of the ideal parameter appears in transposed form;
according to formula (10), the cost function $\phi_{i,2}(k)$ is defined; to minimize the cost function $\phi_{i,2}(k)$, the multi-gradient recursive adaptive update rate of the fuzzy execution module is designed according to the multi-gradient recursive algorithm as:
in the formula, $\mu_{i,2} > 0$, $S_{i,2}(k-l+1) = [\xi_{i,1}(k-l+1), \xi_{i,2}(k-l+1), y_d(k-l+1)]^T$, and the fuzzy basis function is bounded and satisfies an inequality;
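The way the execution module uses the delayed regressors $S_{i,2}(k-l+1)$ can be sketched in Python as follows. The sketch assumes that the virtual control is purely the fuzzy-system output, that each stored sample carries a scalar error signal `err`, and that the basis functions are normalized Gaussians; these choices and all function names are hypothetical and do not reproduce formulas (10) and (11).

```python
import math


def regressor(xi1, xi2, y_d):
    # S_{i,2} = [xi_{i,1}, xi_{i,2}, y_d]^T, as defined in the text.
    return [xi1, xi2, y_d]


def basis(s, centers, width=1.0):
    # Assumed normalized Gaussian fuzzy basis functions.
    acts = [
        math.exp(-sum((x - c) ** 2 for x, c in zip(s, center)) / width ** 2)
        for center in centers
    ]
    total = sum(acts)
    return [a / total for a in acts]


def virtual_control(theta, s, centers):
    # Assumed form: estimated weights transposed times the basis vector.
    return sum(t * p for t, p in zip(theta, basis(s, centers)))


def actor_update(theta, samples, centers, mu):
    """One multi-gradient step.

    samples: list of (S_{i,2}(k-l+1), err(k-l+1)) for l = 1..p, newest first.
    """
    grad = [0.0] * len(theta)
    for s, err in samples:
        for j, ph in enumerate(basis(s, centers)):
            grad[j] += ph * err
    return [t - mu * g for t, g in zip(theta, grad)]


centers = [[0.0, 0.0, 0.0]]   # a single hypothetical rule
samples = [(regressor(0.1, 0.0, 0.1), 0.5), (regressor(0.0, 0.0, 0.0), 0.25)]
theta_next = actor_update([1.0], samples, centers, mu=0.5)
```

The gain `mu` plays the role of the positive parameter $\mu_{i,2}$, and the length of `samples` corresponds to the gradient length $p$.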
S5: the course consistency controller of the multi-agent system is designed from the multi-agent course tracking transformation system, the course tracking dynamic errors, the fuzzy evaluation cost function, the fuzzy evaluation adaptive update rate, the virtual control function, the strategy utility function and the multi-gradient recursive adaptive update rate of the fuzzy execution module, thereby obtaining the control input rudder angle of the multi-agent system. The rudder angle command is transmitted to the steering engines of the multi-agent ships to output the multi-agent ship course angles, thereby realizing course consistency control of the multi-agent system. The control input rudder angle is solved as follows: the multi-gradient recursive reinforcement learning controller of the multi-agent system is determined; combining the obtained virtual control function $\alpha_{i,1}(k)$ with the multi-gradient recursive adaptive update rates of the fuzzy evaluation module and the fuzzy execution module, the multi-agent shipborne computer calculates the actual control input $u_i(k)$ of the system as:
in the formula, the parameters satisfy $c_{i,1} > 0$ and $c_{i,2} > 0$, $S_{i,2}(k-l+1) = [\xi_{i,1}(k-l+1), \xi_{i,2}(k-l+1), a_{i,0} y_0(k-l+1)]$, and $y_0(k-l+1)$ represents the reference signal at time $(k-l+1)$.
2. The multi-gradient recursive reinforcement learning fuzzy control method of the multi-agent system as claimed in claim 1, wherein the tracking performance threshold $\epsilon$ in step S3 is set according to actual requirements.
3. An apparatus for implementing the multi-gradient recursive reinforcement learning fuzzy control method of the multi-agent system as claimed in claim 1, comprising a data acquisition unit, a data transmission unit, a multi-agent shipborne computer and a data feedback unit;
the data acquisition unit is used for acquiring course information during ship navigation; the data transmission unit is used for transmitting the collected course information of the multi-agent ships during navigation to the multi-agent shipborne computer; the multi-agent shipborne computer is used for processing the collected course information of the multi-agent ships during navigation and completing multi-gradient recursive reinforcement learning fuzzy control of the course of the multi-agent ships; the data feedback unit is used for transmitting the rudder angle commands obtained by the multi-agent shipborne computer to the steering engines of the multi-agent ships to output the multi-agent ship course angles, so as to realize course track tracking consistency control of the multi-agent system;
the apparatus is characterized in that the multi-agent shipborne computer comprises a multi-agent ship course tracking system mathematical model building module, a multi-agent ship course system tracking error module, a fuzzy evaluation module, a virtual controller construction module, a multi-gradient recursive reinforcement learning adaptive update rate module and a multi-gradient recursive reinforcement learning fuzzy controller construction module;
the multi-agent ship course tracking system mathematical model building module is used for building a multi-agent ship course discrete nonlinear control system mathematical model between the input and the output of the system based on the multi-agent course information;
the multi-agent ship course system tracking error module is used for constructing a multi-agent course tracking error dynamic model and a transformation system based on the course information of each agent and the virtual leader in the multi-agent system;
the fuzzy evaluation module is used for designing a fuzzy evaluation cost function based on the multi-agent course tracking error and a preset tracking performance threshold, and for completing the design of the fuzzy evaluation adaptive update rate;
the virtual controller construction module is used for designing a virtual control function of the multi-agent system by using the error between the output signal and the reference signal;
the multi-gradient recursion reinforcement learning self-adaptive update rate module is used for obtaining a multi-gradient recursion self-adaptive update rate based on the fuzzy evaluation module information and the strategy utility function;
the multi-gradient recursive reinforcement learning fuzzy controller construction module is used for obtaining the course consistency controller of the multi-agent system from the multi-agent course tracking transformation system, the course tracking dynamic error, the fuzzy evaluation cost function, the fuzzy evaluation adaptive update rate, the virtual control function, the strategy utility function and the multi-gradient recursive adaptive update rate.
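The data flow between the units of claim 3 can be sketched in Python as a simple control loop. The class names, the callable-based wiring, and the proportional placeholder used for the computer stage are assumptions for illustration; the placeholder stands in for the controller and is not the control law of the claims.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CourseInfo:
    rudder_angle: float   # measured at the steering engine
    heading_angle: float  # measured by the compass


@dataclass
class ControlLoop:
    acquire: Callable[[], List[CourseInfo]]             # data acquisition unit
    compute: Callable[[List[CourseInfo]], List[float]]  # multi-agent shipborne computer
    feedback: Callable[[List[float]], None]             # data feedback unit

    def tick(self) -> List[float]:
        # The data transmission unit is modelled here as a direct function call.
        info = self.acquire()
        commands = self.compute(info)
        # Rudder commands go back to the steering engines.
        self.feedback(commands)
        return commands


sent = []
loop = ControlLoop(
    acquire=lambda: [CourseInfo(0.0, 0.2), CourseInfo(0.0, -0.4)],
    compute=lambda info: [-0.5 * c.heading_angle for c in info],  # placeholder law
    feedback=sent.append,
)
commands = loop.tick()
```

Keeping the three stages as separate callables mirrors the separation of units in the claim and lets each stage be replaced independently in a test bench.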
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010697834.8A CN111948937B (en) | 2020-07-20 | 2020-07-20 | Multi-gradient recursive reinforcement learning fuzzy control method and system of multi-agent system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111948937A (en) | 2020-11-17 |
CN111948937B (en) | 2021-07-06 |
Family
ID=73340723
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202010697834.8A | Active | CN111948937B (en) | 2020-07-20 | 2020-07-20 | Multi-gradient recursive reinforcement learning fuzzy control method and system of multi-agent system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111948937B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112947084B (en) * | 2021-02-08 | 2022-09-23 | 重庆大学 | Model unknown multi-agent consistency control method based on reinforcement learning |
CN113359474B (en) * | 2021-07-06 | 2022-09-16 | 杭州电子科技大学 | Extensible distributed multi-agent consistency control method based on gradient feedback |
CN114200830B (en) * | 2021-11-11 | 2023-09-22 | 辽宁石油化工大学 | Multi-agent consistency reinforcement learning control method |
CN116400691B (en) * | 2023-03-29 | 2023-11-21 | 大连海事大学 | Novel discrete time specified performance reinforcement learning unmanned ship course tracking control method and system |
CN116300949A (en) * | 2023-03-29 | 2023-06-23 | 大连海事大学 | Course tracking control method and system for discrete time reinforcement learning unmanned ship |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP3238272B2 (en) * | 1994-02-16 | 2001-12-10 | 三菱重工業株式会社 | Fuzzy control type mooring equipment |
CN109188909B (en) * | 2018-09-26 | 2021-04-23 | 大连海事大学 | Self-adaptive fuzzy optimal control method and system for ship course nonlinear discrete system |
CN109062058B (en) * | 2018-09-26 | 2021-03-19 | 大连海事大学 | Ship course track tracking design method based on self-adaptive fuzzy optimal control |
CN109857117B (en) * | 2019-03-07 | 2021-10-29 | 广东华中科技大学工业技术研究院 | Unmanned ship cluster formation method based on distributed pattern matching |
CN110262524B (en) * | 2019-08-02 | 2022-04-01 | 大连海事大学 | Design method of unmanned ship cluster optimal aggregation controller |
CN110658829B (en) * | 2019-10-30 | 2021-03-30 | 武汉理工大学 | Intelligent collision avoidance method for unmanned surface vehicle based on deep reinforcement learning |
CN111290387B (en) * | 2020-02-21 | 2022-06-03 | 大连海事大学 | Fuzzy self-adaptive output feedback designated performance control method and system for intelligent ship autopilot system |
Also Published As
Publication number | Publication date |
---|---|
CN111948937A (en) | 2020-11-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111948937B (en) | Multi-gradient recursive reinforcement learning fuzzy control method and system of multi-agent system | |
CN109507885B (en) | Model-free self-adaptive AUV control method based on active disturbance rejection | |
CN110687799B (en) | Fuzzy self-adaptive output feedback control method and system for intelligent ship autopilot system | |
CN111273549B (en) | Fuzzy self-adaptive output feedback fault-tolerant control method and system for intelligent ship autopilot system | |
CN112612209B (en) | Full-drive ship track tracking control method and system based on instruction filtering neural network controller | |
CN111897225B (en) | Fuzzy self-adaptive output feedback control method and system for intelligent ship autopilot system | |
CN111290387B (en) | Fuzzy self-adaptive output feedback designated performance control method and system for intelligent ship autopilot system | |
CN113110511B (en) | Intelligent ship course control method based on generalized fuzzy hyperbolic model | |
CN110703605B (en) | Self-adaptive fuzzy optimal control method and system for intelligent ship autopilot system | |
CN111308890B (en) | Unmanned ship data-driven reinforcement learning control method with designated performance | |
CN111198502B (en) | Unmanned ship track tracking control method based on interference observer and fuzzy system | |
CN114115262B (en) | Multi-AUV actuator saturation cooperative formation control system and method based on azimuth information | |
CN114442640B (en) | Track tracking control method for unmanned surface vehicle | |
CN112782981B (en) | Fuzzy self-adaptive output feedback designated performance control method and system for intelligent ship autopilot system | |
CN109656142B (en) | Cascade structure model-free self-adaptive guidance method for unmanned ship | |
CN111221335A (en) | Fuzzy self-adaptive output feedback finite time control method and system for intelligent ship autopilot system | |
Mu et al. | Path following for podded propulsion unmanned surface vehicle: Theory, simulation and experiment | |
CN114967702A (en) | Unmanned ship control system and path tracking method | |
CN117452827B (en) | Under-actuated unmanned ship track tracking control method | |
CN114967714A (en) | Anti-interference motion control method and system for autonomous underwater robot | |
CN113467231A (en) | Unmanned ship path tracking method based on sideslip compensation ILOS guidance law | |
CN112698575B (en) | Intelligent ship autopilot adaptive fuzzy output feedback control method and system | |
CN116527515A (en) | Remote state estimation method based on polling protocol | |
CN114564029B (en) | Full-drive ship track tracking control method and device based on direct parameterization method | |
CN113985898B (en) | Nonlinear path tracking control method of under-actuated marine aircraft | |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||