CN111624882A - Zero and differential game processing method for supply chain system based on reverse-thrust design method - Google Patents
- Publication number
- CN111624882A
- Authority
- CN
- China
- Prior art keywords
- supply chain
- chain system
- zero
- game
- differential
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
- G05B13/042—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The invention relates to a zero-sum differential game processing method for a supply chain system based on the backstepping design method, comprising the following specific steps: S1: model a nonlinear switching supply chain system with uncertain customer demand as a two-player zero-sum differential game problem, so that the bullwhip effect is suppressed by a game-theoretic method; S2: use a feedforward controller to convert the tracking problem of the strict-feedback system into an equivalent differential game problem of an affine system; S3: study the zero-sum differential game strategy with adaptive dynamic programming, and use the Lyapunov method to prove the convergence of the backstepping-based differential game and the stability of the closed-loop system; S4: verify the effectiveness of the method through simulation results. The method suits real supply chains whose models are not completely known, and it is of greater practical significance for nonlinear supply chain systems because it does not require the state functions to be known in advance.
Description
Technical Field
The invention relates to the technical field of supply chain systems, and in particular to a zero-sum differential game processing method for a supply chain system based on the backstepping design method.
Background
An important problem in supply chain management is weakening the bullwhip effect, in which demand variability is amplified as demand information is transmitted from downstream to upstream. Considerable effort has been devoted over the past decades to countering its adverse effects.
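To make "amplified demand variability" concrete, a common empirical bullwhip measure (not part of this invention) is the ratio of the variance of orders placed upstream to the variance of customer demand; a ratio above 1 indicates amplification. The sketch below assumes that measure, and the function name `bullwhip_ratio` is chosen here for illustration only.

```python
from statistics import pvariance
from typing import Sequence


def bullwhip_ratio(orders: Sequence[float], demand: Sequence[float]) -> float:
    """Return Var(orders) / Var(demand); values above 1 mean amplification."""
    if len(orders) < 2 or len(demand) < 2:
        raise ValueError("need at least two samples of each series")
    demand_var = pvariance(demand)
    if demand_var == 0:
        raise ValueError("customer demand has zero variance")
    return pvariance(orders) / demand_var


# Orders that deviate from their mean twice as much as demand give a ratio of 4.
demand = [10, 12, 8, 11, 9]
orders = [10, 14, 6, 12, 8]
print(bullwhip_ratio(orders, demand))  # 4.0
```

Population variance is used so that the ratio depends only on the spread of the two series, not on their sample sizes.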
In complex dynamic supply chain systems, nonlinearity is ubiquitous in the physical process, yet it is often ignored or attributed to a slowly changing environment. Modeling a dynamic supply chain as a cascade nonlinear system therefore captures its nonlinear characteristics naturally. Structured recursive design methods such as backstepping are well known to be powerful tools for handling uncertain nonlinearities, because they avoid unnecessary cancellations. However, existing research on dynamic supply chain systems often ignores certain nonlinear factors and builds an idealized model only after linearization. Because a nonlinear supply chain system with a triangular structure has an important cascade characteristic, it is difficult to describe with a linear model and should instead be studied with structured methods such as backstepping. Applying backstepping to complex dynamic supply chain systems with uncertain customer demand is nevertheless difficult and challenging, especially when a game-theoretic method is used to suppress the bullwhip effect. To address this, a zero-sum differential game processing method for a supply chain system based on the backstepping design method is proposed.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing a zero-sum differential game processing method for a supply chain system based on the backstepping design method.
To achieve this purpose, the invention provides the following technical solution:
A zero-sum differential game processing method for a supply chain system based on the backstepping design method, wherein the supply chain system is a cascade nonlinear system composed of equipment and distribution entities and driven by uncertain customer demand. Through control of material flow and information flow, the system purchases raw materials, converts them into intermediate and finished products, and distributes the finished products to customers. An important problem in supply chain management is how to weaken the bullwhip effect, i.e., the amplification of demand variability as demand information passes from downstream tiers to upstream tiers. The method comprises the following specific steps:
S1: first, model a nonlinear switching supply chain system with uncertain customer demand as a two-player zero-sum differential game problem, so that the bullwhip effect is suppressed by a game-theoretic method;
S2: second, use a feedforward controller to convert the tracking problem of the strict-feedback system into an equivalent differential game problem of an affine system;
S3: next, because the Hamilton-Jacobi-Isaacs (HJI) equation is difficult to solve analytically, use adaptive dynamic programming (ADP) to obtain the zero-sum differential game strategy: construct an evaluation network that learns the value function of the HJI equation online in real time, together with an execution network and an interference network that construct the control strategy and the interference strategy (this game algorithm is called synchronous zero-sum game policy iteration), and use the Lyapunov method to prove the convergence of the backstepping-based differential game and the stability of the closed-loop system;
S4: finally, verify the effectiveness of the method through simulation results.
Preferably, the bullwhip effect of the supply chain system in step S1 is addressed as an H∞ control problem. From a game-theoretic perspective, H∞ controller design is equivalent to a two-player zero-sum game: the controller minimizes the performance index under the worst-case (maximizing) disturbance, thereby achieving optimal control.
Preferably, in step S2, the feedforward controller is designed by backstepping, so that the tracking problem of the supply chain system in strict-feedback form is converted into an optimal regulation problem in affine form.
Preferably, in step S3, based on ADP policy iteration, three neural networks (an evaluation network, an execution network and an interference network) respectively approximate the value function, the control strategy and the uncertain customer demand strategy during iteration, finally yielding an approximate solution of the HJI equation of the nonlinear supply chain system.
Preferably, the aim in step S4 is to design the control input of the supply chain system with uncertain customer demand by the backstepping-based zero-sum differential game method, so that the system output tracks the reference signal optimally while the bullwhip effect is reduced. The error between the system output and the reference signal is confined to a small compact set, and the output tracks the reference signal under switching conditions, which illustrates the effectiveness of the proposed method. For comparison, a controller is also designed in the presence of general disturbances.
Compared with the prior art, the invention has the following beneficial effects: it models a nonlinear supply chain system with uncertain customer demand as a two-player zero-sum game problem in order to reduce the bullwhip effect by game-theoretic methods; it combines backstepping with ADP to update the weights of the evaluation, execution and interference neural networks synchronously, online and in real time, obtaining the Nash equilibrium solution of the corresponding HJI equation; and it proves the stability of the closed-loop system by the Lyapunov method. Because the model of a real supply chain system is not completely known, the proposed method, which does not require prior knowledge of the state functions, is more practical for nonlinear supply chain systems.
Drawings
FIG. 1 is a schematic diagram of a switching signal of the system of the present invention;
FIG. 2 shows the system output $y(t)$ of the present invention tracking the reference signal $y_d(t)$;
FIG. 3 is a schematic diagram of the tracking error $y(t)-y_d(t)$ of the present invention;
FIG. 4 shows the system output $y(t)$ tracking the reference signal $y_d(t)$ with the controller designed under general disturbances;
FIG. 5 is a schematic diagram of the tracking error $y(t)-y_d(t)$ with the controller designed under general disturbances.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1-5, the present invention provides a technical solution:
A zero-sum differential game processing method for a supply chain system based on the backstepping design method, wherein the supply chain system is a cascade nonlinear system composed of equipment and distribution entities and driven by uncertain customer demand. Through control of material flow and information flow, the system purchases raw materials, converts them into intermediate and finished products, and distributes the finished products to customers. An important problem in supply chain management is how to weaken the bullwhip effect, i.e., the amplification of demand variability as demand information passes from downstream tiers to upstream tiers. The method comprises the following specific steps:
S1: first, model a nonlinear switching supply chain system with uncertain customer demand as a two-player zero-sum differential game problem, so that the bullwhip effect is suppressed by a game-theoretic method;
S2: second, use a feedforward controller to convert the tracking problem of the strict-feedback system into an equivalent differential game problem of an affine system;
S3: next, because the Hamilton-Jacobi-Isaacs (HJI) equation is difficult to solve analytically, use adaptive dynamic programming (ADP) to obtain the zero-sum differential game strategy: construct an evaluation network that learns the value function of the HJI equation online in real time, together with an execution network and an interference network that construct the control strategy and the interference strategy (this game algorithm is called synchronous zero-sum game policy iteration), and use the Lyapunov method to prove the convergence of the backstepping-based differential game and the stability of the closed-loop system;
S4: finally, verify the effectiveness of the method through simulation results.
Example:
Preliminaries:
The supply chain system with uncertain customer demand is modeled as a cascade switching nonlinear system with uncertain disturbances. Assume that the supply chain consists of n devices, each supplying raw material to the next. The inventory level of the k-th device and the uncertain customer demand at time t are denoted by $x_k(t)$ and $d_k(t)$, respectively, and the dynamic model of the k-th device in the supply chain system is
where $\bar{x}_k=[x_1,\dots,x_k]^{T}$ is the inventory vector of devices 1 to k, $1\le k\le n-1$; $\sigma(t):[0,+\infty)\to\mathcal{M}=\{0,1,\dots,M\}$ is the switching signal, and $\sigma(t)=i$ means that the i-th subsystem is active. The quantity of goods that the k-th device receives from the (k+1)-th device is likewise represented in the model;
the model functions are known, continuous and smooth nonlinear functions, and $d_k\in L_2[0,\infty)$ is unknown but bounded. Modeling the n-th device in the same way gives
where u is the control input.
Combining these models, the supply chain with uncertain customer demand is described by the following cascade switching nonlinear system:
where y is the system output.
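The model equations themselves are not reproduced above. As a hedged sketch, assume the usual switched strict-feedback form suggested by Assumption 2: $\dot{x}_k=f_k(\bar{x}_k)+g_k(\bar{x}_k)x_{k+1}+p_k(\bar{x}_k)d_k$ for $k<n$, $\dot{x}_n=f_n(x)+g_n(x)u+p_n(x)d_n$, and $y=x_1$, with each term possibly depending on the active mode. The helper names (`cascade_rhs`, `euler_step`) and the toy terms are hypothetical placeholders, not the patent's functions.

```python
from typing import Callable, List, Sequence

# Hypothetical signature: a model term takes (active mode i, inventory vector x_bar_k).
Term = Callable[[int, Sequence[float]], float]


def cascade_rhs(x: Sequence[float], u: float, d: Sequence[float], mode: int,
                f: List[Term], g: List[Term], p: List[Term]) -> List[float]:
    """Right-hand side of an assumed switched strict-feedback cascade.

    dx_k/dt = f_k(x_bar_k) + g_k(x_bar_k) * x_{k+1} + p_k(x_bar_k) * d_k  (k < n)
    dx_n/dt = f_n(x)       + g_n(x)       * u       + p_n(x)       * d_n
    """
    n = len(x)
    dx = []
    for k in range(n):
        x_bar = x[: k + 1]                    # inventories of devices 1..k+1 (1-based)
        drive = x[k + 1] if k < n - 1 else u  # next device's stock, or the control
        dx.append(f[k](mode, x_bar) + g[k](mode, x_bar) * drive
                  + p[k](mode, x_bar) * d[k])
    return dx


def euler_step(x, u, d, mode, f, g, p, dt: float) -> List[float]:
    """Advance the cascade by one forward-Euler step of length dt."""
    dx = cascade_rhs(x, u, d, mode, f, g, p)
    return [xi + dt * dxi for xi, dxi in zip(x, dx)]


# Toy two-device example with placeholder terms (not the patent's functions).
zero: Term = lambda i, xb: 0.0
one: Term = lambda i, xb: 1.0
x_next = euler_step([0.1, 0.0], u=1.0, d=[0.5, 0.0], mode=1,
                    f=[zero, zero], g=[one, one], p=[one, one], dt=0.1)
```

Passing the mode explicitly keeps the switching signal outside the dynamics, so a caller can generate $\sigma(t)$ however it likes.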
Assumption 1: The maximum inventory of the k-th device in the supply chain is $c_k$, so each device satisfies $0<x_k(t)<c_k$ $(1\le k\le n)$.
Assumption 2: The functions $g_k(\cdot)$ and $p_k(\cdot)$ have upper and lower bounds, $g_{k\min}\le g_k(\cdot)\le g_{k\max}$ and $p_{k\min}\le p_k(\cdot)\le p_{k\max}$, where $g_{k\min}$, $g_{k\max}$, $p_{k\min}$ and $p_{k\max}$ are positive constants. Because $g_k(\cdot)$ represents the transfer rate from the (k+1)-th device to the k-th device, a further assumption is also made here without loss of generality.
Assumption 3: All states of the system are available for measurement.
The control objective is to design the control input of the supply chain system such that its output y tracks the reference trajectory $y_d$ optimally, while the bullwhip effect is suppressed and all signals of the cascade switching nonlinear system formed by the supply chain remain bounded.
Tracking control problem of a strict feedback system:
In this section, to regulate the tracking errors $e_k=x_k-x_{kd}$, a feedforward controller is designed by backstepping, which converts the tracking problem of the supply chain system in strict-feedback form into an optimal regulation problem in affine form. The backstepping design proceeds as follows:
The virtual control input $x_{2d}$ is the sum of a feedback optimal control input, which will be designed in the next section, and a feedforward virtual control input, which is obtained by solving the following equation
The Lyapunov candidate function is defined as
Differentiating it with respect to t gives
Step k ($2\le k\le n-1$): differentiating $e_k=x_k-x_{kd}$ gives
where the virtual control input $x_{(k+1)d}$ is likewise the sum of a feedback optimal control input, designed below, and a feedforward virtual control input that satisfies
The Lyapunov candidate function is defined as
Differentiating $V_k$ gives
Step n: similarly, the derivative of $e_n=x_n-x_{nd}$ is
where the virtual control input $u_d$ is the sum of a feedback optimal control input, which will be designed in the next section, and a feedforward virtual control input that satisfies
The Lyapunov candidate function is now defined as
Differentiating $V_n$ then gives
In summary, the control input consists of the feedforward control input $U_a$, formed by the feedforward virtual control inputs designed above, and the feedback optimal control input; the feedback optimal control input and the uncertain customer demand $d=[d_1,\dots,d_n]^{T}$ are determined through differential game theory.
Note that, as observed from FIG. 2, the feedforward controller $U_a$ alone cannot guarantee the stability of the entire supply chain system; a differential game strategy must therefore be designed to stabilize the resulting affine system.
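The feedforward part of one backstepping step can be sketched under an assumption not reproduced above: device k obeys $\dot{x}_k=f_k+g_kx_{k+1}+p_kd_k$ with $g_k$ bounded away from zero (cf. Assumption 2). A common choice is $x^{a}_{(k+1)d}=(\dot{x}_{kd}-f_k)/g_k$, which cancels the known drift so that $\dot{e}_k=g_k\,(x_{k+1}-x^{a}_{(k+1)d})+p_kd_k$ is affine in the remaining feedback term. The function names and the placeholder drift $f_1=-x_1$, $g_1=2$ are hypothetical.

```python
import math


def feedforward_virtual_input(xkd_dot: float, f_k: float, g_k: float,
                              g_min: float = 1e-9) -> float:
    """Feedforward part of the virtual control for step k (assumed model form)."""
    if abs(g_k) < g_min:
        raise ValueError("g_k must stay away from zero (cf. Assumption 2)")
    return (xkd_dot - f_k) / g_k


def error_rate(f_k: float, g_k: float, p_k: float, x_next: float,
               d_k: float, xkd_dot: float) -> float:
    """de_k/dt = f_k + g_k * x_{k+1} + p_k * d_k - dx_kd/dt."""
    return f_k + g_k * x_next + p_k * d_k - xkd_dot


# Step 1 with y_d = 0.5 sin(t) and a placeholder drift f_1(x_1) = -x_1, g_1 = 2.
t, x1 = 0.0, 0.1
yd_dot = 0.5 * math.cos(t)
f1, g1 = -x1, 2.0
x2a = feedforward_virtual_input(yd_dot, f1, g1)  # (0.5 + 0.1) / 2 = 0.3
v = 0.25                                         # remaining feedback term
# With the feedforward in place, the error rate reduces to g1 * v (no disturbance).
print(error_rate(f1, g1, 0.0, x2a + v, 0.0, yd_dot))
```

The printed value equals $g_1v=0.5$ up to floating-point rounding, which is the affine structure the differential game then has to stabilize.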
Designing a differential game strategy:
The bullwhip effect of a supply chain system is usually treated as an H∞ control problem. From a game-theoretic perspective, designing an H∞ controller is equivalent to a two-player zero-sum game in which the controller minimizes the performance index under the worst-case (maximizing) disturbance, thereby achieving optimal control. The bullwhip effect of a nonlinear supply chain system can therefore be addressed by a game method. During the game, a real-time policy iteration method with three neural networks (evaluation, execution and interference networks) is used to solve online the HJI equation arising from the nonlinear zero-sum differential game.
Zero-sum differential game:
The system is described as follows:
where
$X=[x_1,\dots,x_n]^{T}$
The goal is to design the control input U such that, for a given $\gamma>0$,
where $Q(E)\ge 0$, $R=R^{T}>0$ and $d\in L_2[0,\infty)$.
Assumption 4: For the selected $\gamma>0$, there exists a control input U such that the system is asymptotically stable and has an $L_2$ gain no greater than $\gamma$.
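The $L_2$-gain requirement can be checked numerically on recorded trajectories. The sketch below assumes the standard attenuation form $\int(Q(E)+U^{T}RU)\,dt\le\gamma^{2}\int\|d\|^{2}\,dt$ with zero initial error, evaluated on a finite horizon by the trapezoidal rule; the helper names are hypothetical. A passing check on one trajectory is evidence only, not a proof of the gain bound.

```python
from typing import Sequence


def trapezoid(ts: Sequence[float], vals: Sequence[float]) -> float:
    """Trapezoidal integral of sampled values over the time grid ts."""
    if len(ts) != len(vals) or len(ts) < 2:
        raise ValueError("need matching grids with at least two points")
    return sum((ts[i + 1] - ts[i]) * (vals[i] + vals[i + 1]) / 2.0
               for i in range(len(ts) - 1))


def attenuation_ratio(ts: Sequence[float], cost: Sequence[float],
                      dist_sq: Sequence[float]) -> float:
    """Integrated Q(E) + U^T R U divided by integrated ||d||^2."""
    den = trapezoid(ts, dist_sq)
    if den <= 0.0:
        raise ValueError("disturbance energy must be positive")
    return trapezoid(ts, cost) / den


def meets_l2_gain(ts: Sequence[float], cost: Sequence[float],
                  dist_sq: Sequence[float], gamma: float) -> bool:
    """Sampled check of the gamma-attenuation inequality on one trajectory."""
    return attenuation_ratio(ts, cost, dist_sq) <= gamma ** 2
```

Here `cost` holds samples of $Q(E)+U^{T}RU$ and `dist_sq` holds samples of $\|d\|^{2}$ on the same time grid.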
The performance index is defined as
The H∞ control problem formed by the control of the supply chain system and the uncertain customer demand can be viewed as a two-player zero-sum game problem. The value function of a policy is defined as
subject to the dynamic equation. The goal is to find a Nash equilibrium point $(U^{*},d^{*})$ such that the control input $U^{*}$ of the supply chain system minimizes the performance index while the uncertain customer demand $d^{*}$ maximizes it.
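For reference, the attenuation condition, performance index, value function and game value that this zero-sum formulation relies on are usually written as follows. The original equations are not reproduced in this text, so this is an assumed standard form, chosen to be term-by-term consistent with the Hamiltonian (21):

$$\int_0^\infty \big(Q(E)+U^{T}RU\big)\,dt \le \gamma^{2}\int_0^\infty \|d\|^{2}\,dt \quad (E(0)=0)$$

$$J(E(0),U,d)=\int_0^\infty \big(Q(E)+U^{T}RU-\gamma^{2}\|d\|^{2}\big)\,dt$$

$$V(E(t))=\int_t^\infty \big(Q(E(\tau))+U^{T}(\tau)RU(\tau)-\gamma^{2}\|d(\tau)\|^{2}\big)\,d\tau$$

$$V^{*}(E(0))=\min_{U}\max_{d}J(E(0),U,d)=\max_{d}\min_{U}J(E(0),U,d)$$

Differentiating $V$ along trajectories gives $\dot V=(\nabla V)^{T}(F_i+G_iU+P_id)=-(Q(E)+U^{T}RU-\gamma^{2}\|d\|^{2})$, which is exactly the statement $H(E,U,d)=0$ in (21).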
The Hamiltonian associated with an admissible control input U and an uncertain customer demand input d of the supply chain system is defined as
$H(E,U,d)=Q(E)+U^{T}RU-\gamma^{2}\|d\|^{2}+(\nabla V(E))^{T}\big(F_i(E)+G_i(X)U+P_i(X)d\big)=0$ (21)
If a saddle point of the game exists, the two-player optimal control problem of the supply chain system has a unique solution, i.e., the Nash equilibrium condition holds:
From the stationarity conditions $\partial H/\partial U=0$ and $\partial H/\partial d=0$, the optimal control pair of the supply chain system can be written as
Substituting this pair into the Hamiltonian yields the HJI equation of the supply chain system:
$V^{*}(0)=0$ (25)
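Carrying out the stationarity conditions on (21) explicitly, with $R=R^{T}>0$, gives the standard optimal pair and HJI equation below; this worked derivation should agree with (24) and (25) up to notation:

$$\frac{\partial H}{\partial U}=2RU+G_i^{T}(X)\nabla V(E)=0\;\Rightarrow\;U^{*}=-\tfrac12 R^{-1}G_i^{T}(X)\nabla V^{*}(E)$$

$$\frac{\partial H}{\partial d}=-2\gamma^{2}d+P_i^{T}(X)\nabla V(E)=0\;\Rightarrow\;d^{*}=\tfrac{1}{2\gamma^{2}}P_i^{T}(X)\nabla V^{*}(E)$$

$$Q(E)+(\nabla V^{*})^{T}F_i(E)-\tfrac14(\nabla V^{*})^{T}G_i(X)R^{-1}G_i^{T}(X)\nabla V^{*}+\tfrac{1}{4\gamma^{2}}(\nabla V^{*})^{T}P_i(X)P_i^{T}(X)\nabla V^{*}=0,\qquad V^{*}(0)=0$$

The Hessian of $H$ is $2R>0$ in $U$ and $-2\gamma^{2}I<0$ in $d$, so the stationary pair is a minimizer in $U$ and a maximizer in $d$, i.e., a saddle point.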
To obtain the saddle-point solution of the differential game, the HJI equation of the supply chain system must be solved. For a nonlinear system, the HJI equation is a partial differential equation that is difficult to solve analytically, so the ADP method is adopted.
ADP-based policy iteration:
ADP uses three neural networks (the evaluation, execution and interference networks) to approximate, respectively, the value function, the control strategy and the uncertain customer demand strategy during iteration, finally yielding an approximate solution of the HJI equation of the nonlinear supply chain system. Before ADP is applied to the HJI equation, the following lemma is given. Lemma 1: Consider the error dynamics (17) with the value function (19) and the differential game strategy (24), and let J(E) be a continuously differentiable, radially unbounded Lyapunov candidate function, where $\nabla J_E(E)$ denotes its gradient with respect to E. Let $\Lambda(E)$ be positive definite, i.e., $\Lambda(E)=0$ when $E=0$ and $\Lambda(E)>0$ for any $E\ne 0$; furthermore, $\Lambda(E)$ satisfies two additional conditions.
Then the following relationship holds:
Remark 4: For the error dynamics (17) with the control and disturbance strategies (24), certain terms are assumed to be functions of the system state, from which the required inequality follows. Since
$(\nabla J_E(E))^{T}\big(F_i(E)+G_i(X)U^{*}+P_i(X)d^{*}\big)<0$, Lemma 1 is readily seen to be reasonable; in practice, J(E) can be obtained by choosing a suitable quadratic polynomial.
By the Weierstrass higher-order approximation theorem, there exists a complete, linearly independent basis set such that the value function V(E) and its gradient are uniformly approximated, i.e., there exist coefficients $c_i$ such that
hold; the second terms in (28) and (29) converge uniformly to zero as $N\to\infty$.
To implement the differential game strategy (24), the optimal value function is approximated by a neural network (NN) as
where $W_c\in\mathbb{R}^{L}$ and the corresponding activation vector are the ideal weights and the activation function of the evaluation neural network, and $\varepsilon_c(E)$ and L denote the approximation error and the number of neurons, respectively. The gradient of (30) can be written as
Under a fixed control strategy U and uncertain customer demand strategy d, substituting the neural-network approximation into the Hamiltonian gives (32)
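The evaluation-network approximation $V(E)\approx W_c^{T}\phi(E)$ and its gradient $\nabla\phi(E)^{T}W_c$ can be sketched for a two-dimensional error $E=[e_1,e_2]^{T}$. The quadratic basis $\phi(E)=[e_1^{2},e_1e_2,e_2^{2}]^{T}$ (three neurons) is an assumption made here, since the activation function actually used is not reproduced in this text; the function names are hypothetical.

```python
from typing import List, Sequence


def quad_features(e: Sequence[float]) -> List[float]:
    """Assumed quadratic basis phi(E) = [e1^2, e1*e2, e2^2] for a 2-D error."""
    e1, e2 = e
    return [e1 * e1, e1 * e2, e2 * e2]


def quad_features_jacobian(e: Sequence[float]) -> List[List[float]]:
    """d phi / d E, one row per neuron (3 x 2)."""
    e1, e2 = e
    return [[2 * e1, 0.0],
            [e2, e1],
            [0.0, 2 * e2]]


def critic_value(w: Sequence[float], e: Sequence[float]) -> float:
    """V_hat(E) = W^T phi(E)."""
    return sum(wi * fi for wi, fi in zip(w, quad_features(e)))


def critic_gradient(w: Sequence[float], e: Sequence[float]) -> List[float]:
    """grad V_hat(E) = (d phi / d E)^T W."""
    jac = quad_features_jacobian(e)
    return [sum(w[j] * jac[j][col] for j in range(len(w))) for col in range(2)]


print(critic_value([1.0, 1.0, 1.0], [1.0, 2.0]))     # 7.0
print(critic_gradient([1.0, 1.0, 1.0], [1.0, 2.0]))  # [4.0, 5.0]
```

With unit weights the critic is $e_1^{2}+e_1e_2+e_2^{2}$, whose analytic gradient $[2e_1+e_2,\;e_1+2e_2]$ matches the printed values at $E=[1,2]^{T}$.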
The residual error is
According to (24), the feedback optimal control and the worst-case uncertain customer demand are rewritten as
The HJI equation then becomes
The resulting approximation error of the value function is
However, the ideal weight $W_c$ is unknown, so the differential game strategy (24) cannot be obtained directly. To handle the unknown ideal weight, the value function is approximated with estimated weights as
The Hamiltonian becomes
Clearly, the goal is to tune the estimated weights so that the approximate Hamiltonian is driven toward zero and the estimated weights converge to the ideal weight $W_c$; that is, the update law of the estimated weights is designed to minimize the squared residual error
Based on gradient descent, the tuning law of the neural network is designed as (40)
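A widely used concrete form of such a gradient-descent law, found in synchronous policy iteration for zero-sum games, normalizes the gradient of $\tfrac12 e_H^{2}$, where $e_H=\hat W^{T}\sigma+Q(E)+U^{T}RU-\gamma^{2}\|d\|^{2}$ and $\sigma=\nabla\phi(E)\,(F_i+G_iU+P_id)$. Whether (40) uses exactly this normalization is not stated in this text, so the sketch below, with its hypothetical names, is an assumption. It takes one forward-Euler step of $\dot{\hat W}=-a_c\,\sigma e_H/(\sigma^{T}\sigma+1)^{2}$.

```python
from typing import List, Sequence


def hamiltonian_residual(w: Sequence[float], sigma: Sequence[float], r: float) -> float:
    """e_H = W^T sigma + r, with r = Q(E) + U^T R U - gamma^2 ||d||^2."""
    return sum(wi * si for wi, si in zip(w, sigma)) + r


def critic_step(w: Sequence[float], sigma: Sequence[float], r: float,
                a_c: float, dt: float) -> List[float]:
    """One Euler step of the normalized gradient-descent critic tuning law."""
    e_h = hamiltonian_residual(w, sigma, r)
    norm = (sum(si * si for si in sigma) + 1.0) ** 2
    return [wi - dt * a_c * si * e_h / norm for wi, si in zip(w, sigma)]


# With a fixed regressor, repeated steps drive the residual toward zero.
w = [0.0, 0.0, 0.0]
sigma, r = [1.0, 0.0, 0.0], -2.0
for _ in range(500):
    w = critic_step(w, sigma, r, a_c=2.0, dt=0.1)
print(abs(hamiltonian_residual(w, sigma, r)) < 1e-6)  # True
```

A single fixed regressor only identifies one direction of $\hat W$; the other components stay at their initial values. This is why the tuning law needs a persistently exciting signal.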
The weight estimation error is
Therefore, from (35), (38) and (40), the estimation error dynamics of the evaluation network are obtained as
Following the standard policy iteration algorithm, once the solution of the Hamiltonian equation (32) is given, the execution network and the interference network are updated as shown in (43) and (44)
where the coefficients $c_i$ are unknown.
The solution $W_c$ of (32) is obtained by the least-squares method, and the control strategy and the uncertain demand strategy are defined as in (45) and (46).
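For fixed U and d, the approximate Hamiltonian (32) is linear in the critic weights, so each sample j gives $W^{T}\sigma_j+r_j\approx 0$. A least-squares solution can therefore be sketched by stacking samples and solving the normal equations $(\sum_j\sigma_j\sigma_j^{T})W=-\sum_j\sigma_j r_j$. This batch form and the helper names are illustrative assumptions; a singular normal matrix signals that the samples are not sufficiently exciting.

```python
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular normal matrix: samples lack excitation")
        m[col], m[piv] = m[piv], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - acc) / m[i][i]
    return x


def critic_least_squares(sigmas: Sequence[Sequence[float]],
                         rs: Sequence[float]) -> List[float]:
    """Minimize sum_j (W^T sigma_j + r_j)^2 over W via the normal equations."""
    n = len(sigmas[0])
    a = [[sum(s[i] * s[j] for s in sigmas) for j in range(n)] for i in range(n)]
    b = [-sum(s[i] * r for s, r in zip(sigmas, rs)) for i in range(n)]
    return solve_linear(a, b)


# Recover known weights from noise-free samples.
w_true = [1.0, -2.0, 0.5]
sigmas = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
rs = [-sum(w * s for w, s in zip(w_true, sig)) for sig in sigmas]
print(critic_least_squares(sigmas, rs))  # approximately [1.0, -2.0, 0.5]
```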
It can be shown that, as $N\to\infty$, U and d converge to (43) and (44), respectively. When the control strategy and the uncertain customer demand strategy are computed in neural-network form, the ideal control strategy and uncertain customer demand strategy updated by (45) and (46) are given by (47) and (48), respectively,
where the weight vectors in (47) and (48) are the current estimates of the ideal weight $W_c$ used when updating the control strategy and when applying the uncertain customer demand strategy, respectively. The estimation errors of the execution neural network and the interference neural network are defined in (49) and (50).
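In neural-network form, (47) and (48) evaluate the optimal pair with the value gradient replaced by $\nabla\phi^{T}$ times the respective weight estimate. The sketch assumes the standard pair derived from (21), $U=-\tfrac12R^{-1}G_i^{T}\nabla V$ and $d=\tfrac{1}{2\gamma^{2}}P_i^{T}\nabla V$, and additionally assumes a diagonal R; the caller passes a gradient computed from the execution or interference weight estimate. Function names are hypothetical.

```python
from typing import List, Sequence


def control_policy(grad_v: Sequence[float], g_mat: Sequence[Sequence[float]],
                   r_diag: Sequence[float]) -> List[float]:
    """U = -1/2 R^{-1} G^T grad V, assuming R = diag(r_diag)."""
    n, m = len(grad_v), len(r_diag)
    gt_v = [sum(g_mat[i][j] * grad_v[i] for i in range(n)) for j in range(m)]
    return [-0.5 * gt_v[j] / r_diag[j] for j in range(m)]


def disturbance_policy(grad_v: Sequence[float], p_mat: Sequence[Sequence[float]],
                       gamma: float) -> List[float]:
    """d = 1/(2 gamma^2) P^T grad V, the worst-case customer demand."""
    n, q = len(grad_v), len(p_mat[0])
    pt_v = [sum(p_mat[i][j] * grad_v[i] for i in range(n)) for j in range(q)]
    return [pt_v[j] / (2.0 * gamma ** 2) for j in range(q)]


identity = [[1.0, 0.0], [0.0, 1.0]]
grad = [4.0, 5.0]  # e.g. a critic gradient evaluated at some error E
print(control_policy(grad, identity, [1.0, 1.0]))     # [-2.0, -2.5]
print(disturbance_policy(grad, identity, gamma=2.0))  # [0.5, 0.625]
```

A larger $\gamma$ shrinks the worst-case demand, reflecting a weaker adversary in the zero-sum game.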
Assumption 5: The ideal weight $W_c$ of the evaluation neural network has an upper bound $W_{\max}>0$, i.e., $\|W_c\|\le W_{\max}$; the gradient of the activation function and the gradient of the approximation error are bounded by $\sigma_M>0$ and $\varepsilon_M>0$, respectively. In addition, the residual error $\varepsilon_{HJI}$ is bounded, i.e., there exists $\varepsilon_{HM}>0$ such that $\|\varepsilon_{HJI}\|\le\varepsilon_{HM}$.
Theorem 1 (online zero-sum game tuning laws for the supply chain system):
Consider the supply chain system governed by the dynamic equation (17), and use the evaluation neural network (37), the execution neural network (47) and the interference neural network (48) to approximate the value function, the control input and the uncertain customer demand, respectively. The tuning laws of the evaluation, execution and interference networks given below guarantee the convergence of the weights of the three networks and the stability of the supply chain system.
Let the tuning law of the evaluation network be
where the signal in this law is assumed to satisfy the persistent excitation (PE) condition. The tuning law of the execution network is designed as
The tuning law of the interference network is
where
$F_1>0$, $F_2>0$, $F_3>0$ and $F_4>0$ are tuning parameters specified in the proof, and the remaining coefficients are learning parameters. Then there exists $N_0$ such that, when the number of hidden-layer neurons satisfies $N>N_0$, the error state of the supply chain system and the estimation errors of the evaluation, execution and interference neural networks are uniformly ultimately bounded; moreover, the evaluation weight estimate converges exponentially to the optimal evaluation neural network weight $W_c$ ([25]).
Numerical simulation:
A two-stage nonlinear cascade supply chain system is considered to demonstrate the effectiveness of the method:
where $x=(x_1,x_2)^{T}$ and $\sigma(t):[0,+\infty)\to\mathcal{M}=\{1,2,3,4\}$,
The initial values are $x_1(0)=0.1$ and $x_2(0)=0$, and the reference signal is $y_d=0.5\sin(t)$.
In the design of the feedback differential game, an activation function is selected for the networks. The initial weights of the execution network and the interference network are chosen randomly in (0, 1), the initial weight of the evaluation network is 1, $R=I$, $a_c=a_a=a_d=2$, 4, and the tuning parameters are $F_1=F_3=200\cdot[1,1,1]^{T}$ and $F_2=F_4=20I$, where I is an identity matrix of appropriate dimensions.
The Lyapunov candidate function is as defined in Theorem 1. In addition, a small probing signal $n(t)=0.1\sin^{5}(t)\cos(t)+0.1\sin^{5}(2t)\cos(0.2t)$ is added to the controller during the first 4 seconds to ensure the persistent excitation condition.
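The simulation inputs can be sketched directly. This reads the extracted "sin(t)5" as $\sin^{5}(t)$ (an interpretation of the garbled source) and switches the probe off at $t=4$ s as stated; `excited_control` is a hypothetical helper showing where the probe enters.

```python
import math


def reference(t: float) -> float:
    """Reference trajectory y_d(t) = 0.5 sin(t) from the simulation setup."""
    return 0.5 * math.sin(t)


def probing_signal(t: float, t_off: float = 4.0) -> float:
    """n(t) = 0.1 sin^5(t) cos(t) + 0.1 sin^5(2t) cos(0.2t) for 0 <= t < t_off, else 0."""
    if t < 0.0 or t >= t_off:
        return 0.0
    return (0.1 * math.sin(t) ** 5 * math.cos(t)
            + 0.1 * math.sin(2.0 * t) ** 5 * math.cos(0.2 * t))


def excited_control(u_nominal: float, t: float) -> float:
    """Control actually applied: nominal game-theoretic input plus probing noise."""
    return u_nominal + probing_signal(t)
```

Each term is bounded by 0.1 in magnitude, so the probe never exceeds 0.2 and only perturbs learning during the initial window.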
The goal is to design the control input of the supply chain system with uncertain customer demand by the backstepping-based zero-sum differential game method, so that the system output y tracks $y_d$ optimally while the bullwhip effect is reduced. As described in Remark 1 above, the supply chain system is essentially a switching system; its switching signal is shown in FIG. 1.
The system output and the reference signal are shown in FIG. 2. FIG. 3 shows that the error between the system output and the reference signal is confined to a small compact set, and that the output tracks the reference signal even under switching conditions. This illustrates the effectiveness of the proposed method, i.e., it can reduce the bullwhip effect induced by worst-case demand.
For comparison, a controller is also designed in the presence of general disturbances: the system output and the reference signal are shown in FIG. 4, and the tracking error is shown in FIG. 5. Note that the controller designed under general disturbances cannot guarantee convergence of the system state.
In conclusion, the invention models a nonlinear supply chain system with uncertain customer demand as a two-player zero-sum game problem in order to reduce the bullwhip effect by game-theoretic methods; combines backstepping with ADP to update the weights of the evaluation, execution and interference neural networks synchronously, online and in real time, obtaining the Nash equilibrium solution of the corresponding HJI equation; and proves the stability of the closed-loop system by the Lyapunov method. Because the model of a real supply chain system is not completely known, applying the proposed method, which does not require prior knowledge of the state functions, to nonlinear supply chain systems is more realistic.
Although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that various changes in the embodiments and/or modifications of the invention can be made, and equivalents and modifications of some features of the invention can be made without departing from the spirit and scope of the invention.
Claims (5)
1. The zero and differential game processing method for the supply chain system based on the reverse-thrust design method is characterized by comprising the following steps of: the supply chain system is composed of equipment and distribution entities, is driven by uncertainty of customer demand, completes purchase of raw materials, converts the materials into intermediate and finished products and distributes the finished products to customers by controlling material flow and information flow, and an important problem in supply chain management is how to reduce the bullwhip effect, namely the influence of amplified demand variability in the process of converting demand information from a downstream layer to an upstream layer, and comprises the following specific steps:
s1: firstly, modeling a nonlinear switching supply chain system with uncertain customer requirements into a two-party zero-sum differential game problem, and inhibiting the bullwhip effect by a game theory method;
s2: secondly, converting the tracking problem of a strict feedback system into an equivalent differential countermeasure problem of an affine system by using a feedforward controller;
s3: next, in order to overcome the difficulty that a Hamilton-Jacobi-Isaacs equation is difficult to obtain an analytic solution, a self-adaptive dynamic programming technology is utilized to research zero and differential game strategies, an evaluation network is constructed, the network and an interference network are executed to learn a value function of an HJI equation in real time and online, a control strategy and an interference strategy are constructed, a game algorithm is called as synchronous zero and game strategy iteration, and a Lyapunov method is used for proving the convergence of a differential game based on reverse thrust and the stability of a closed-loop system;
s4: and finally, verifying the effectiveness of the method through a simulation result.
2. The method for zero-sum differential gaming processing for a supply chain system based on a reverse-thrust design method as set forth in claim 1, wherein: the bull penis effect of the supply chain system in the step S1 is usually solved as H ∞ control, and from the perspective of game theory, the design of the H ∞ controller is equivalent to two-person zero-sum game, that is, the controller minimizes the performance index under the maximum disturbance, thereby realizing the optimal control.
3. The zero-sum differential game processing method for a supply chain system based on the backstepping design method according to claim 1, wherein: in step S2, a feedforward controller is designed by backstepping, so that the tracking problem of the supply chain system in strict-feedback form is converted into an optimal regulation problem in affine form.
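The claim does not give the supply chain model, so the following Python sketch uses a hypothetical two-state strict-feedback system x1' = f1(x1) + x2, x2' = f2(x1, x2) + u with known f1, f2 and reference yd(t) = sin t. At each backstepping step the virtual or actual control is split into a feedforward part (alpha1_a, u_a) that cancels known dynamics and reference derivatives, and a feedback part (alpha1_s, u_s) acting on the errors e1, e2. In the patent the feedback parts would come from the learned zero-sum game policies; here they are simple linear gains c1, c2, which is an assumption made only for illustration.

```python
import math


def f1(x1):
    """Hypothetical known drift of the first subsystem."""
    return 0.5 * math.sin(x1)


def df1(x1):
    """Derivative of f1, needed to differentiate the virtual control."""
    return 0.5 * math.cos(x1)


def f2(x1, x2):
    """Hypothetical known drift of the second subsystem."""
    return 0.1 * x1 * x2


def simulate(c1=2.0, c2=2.0, dt=1e-3, t_end=10.0):
    """Euler simulation of backstepping tracking of yd(t) = sin(t); returns the e1 history."""
    x1, x2, t = 1.0, 0.0, 0.0
    e1_history = []
    for _ in range(int(round(t_end / dt))):
        yd, dyd, ddyd = math.sin(t), math.cos(t), -math.sin(t)
        e1 = x1 - yd
        # Step 1: virtual control = feedforward part + feedback part
        alpha1_a = dyd - f1(x1)          # cancels f1 and the reference slope
        alpha1_s = -c1 * e1              # stand-in for the optimal game feedback
        alpha1 = alpha1_a + alpha1_s
        e2 = x2 - alpha1
        # Time derivative of alpha1 along the current trajectory
        dx1 = f1(x1) + x2
        dalpha1 = ddyd - df1(x1) * dx1 - c1 * (dx1 - dyd)
        # Step 2: actual control = feedforward part + feedback part
        u_a = dalpha1 - f2(x1, x2) - e1  # cancels f2 and alpha1'; -e1 removes the e1*e2 cross term
        u_s = -c2 * e2                   # stand-in for the optimal game feedback
        dx2 = f2(x1, x2) + u_a + u_s
        x1, x2, t = x1 + dt * dx1, x2 + dt * dx2, t + dt
        e1_history.append(e1)
    return e1_history


errors = simulate()
print(f"initial |e1| = {abs(errors[0]):.3f}, final |e1| = {abs(errors[-1]):.2e}")
```

In continuous time the errors obey e1' = e2 − c1 e1 and e2' = −e1 − c2 e2, so with V = (e1² + e2²)/2 one gets V' = −c1 e1² − c2 e2². The tracking task has thus become a regulation problem for the error state, which is the affine form in which the zero-sum game is posed.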
4. The zero-sum differential game processing method for a supply chain system based on the backstepping design method according to claim 1, wherein: in step S2, in the ADP-based policy iteration, ADP uses three neural networks. The critic (evaluation) network, the actor (execution) network and the disturbance (interference) network approximate, respectively, the value function, the control policy and the uncertain customer demand policy during the iteration, finally yielding an approximate solution of the HJI equation of the nonlinear supply chain system.
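The three-network approximation generalizes policy iteration on the HJI equation. As a hedged illustration, the following Python sketch runs that iteration on a hypothetical scalar linear game x' = a x + b u + k d, where the critic collapses to a single weight w on the basis φ(x) = x², and the actor and disturbance "networks" collapse to the gains ku and kd. It is an offline, model-based simplification with simultaneous updates of both players, not the patent's online neural-network tuning law.

```python
import math


def zero_sum_policy_iteration(a, b, k, q, r, gamma, p0, iters=50, tol=1e-12):
    """Policy iteration for a hypothetical scalar zero-sum game.

    Critic: V(x) = w * x**2.  Actor: u = -ku * x.  Disturbance: d = kd * x.
    p0 seeds both initial policies and must give a stabilizing closed loop.
    """
    ku = b * p0 / r                # initial control gain
    kd = k * p0 / gamma**2         # initial disturbance gain
    w = p0
    for _ in range(iters):
        acl = a - b * ku + k * kd  # closed-loop coefficient under current policies
        if acl >= 0:
            raise ValueError("current policies are not stabilizing")
        # Policy evaluation: w solves 2*acl*w + q + r*ku**2 - gamma**2*kd**2 = 0
        w_new = -(q + r * ku**2 - gamma**2 * kd**2) / (2.0 * acl)
        # Policy improvement of both players from the same critic weight
        ku = b * w_new / r
        kd = k * w_new / gamma**2
        converged = abs(w_new - w) < tol
        w = w_new
        if converged:
            break
    return w, ku, kd


w, ku, kd = zero_sum_policy_iteration(1.0, 1.0, 0.5, 1.0, 1.0, 1.0, p0=10.0)
s = 1.0 - 0.25  # b^2/r - k^2/gamma^2 for these assumed values
p_star = (1.0 + math.sqrt(1.0 + s * 1.0)) / s
print(f"iterated w={w:.10f}, Riccati root={p_star:.10f}")
```

For this scalar case each iteration is a Newton step on the game Riccati equation, so from a stabilizing p0 above the solution the critic weight decreases monotonically to the stabilizing root.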
5. The zero-sum differential game processing method for a supply chain system based on the backstepping design method according to claim 1, wherein: in step S4, the goal is to design the control inputs of the supply chain system with uncertain customer demand by the backstepping-based zero-sum differential game method, so that the system output tracks the reference signal in an optimal way while the bullwhip effect is reduced. The error between the system output and the reference signal is confined to a small compact set, and under switching conditions the system output can still track the reference signal, which illustrates the effectiveness of the proposed method. For comparison, a controller is also designed in the presence of general disturbances.
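The bullwhip effect that the method aims to reduce is often quantified at one tier as the ratio Var(orders) / Var(demand), where a value above 1 means the tier amplifies demand variability. The Python sketch below computes this ratio for a hypothetical retailer using an order-up-to policy with a p-period moving-average forecast and lead time L; the policy and the demand model are illustrative assumptions, not the patent's switched supply chain model. For i.i.d. demand this policy gives q_t = D_t + (L+1)(D_t − D_{t−p})/p, so the ratio equals (1 + (L+1)/p)² + ((L+1)/p)².

```python
import random
import statistics


def order_up_to_orders(demand, lead_time, window):
    """Orders of a hypothetical order-up-to retailer with a moving-average forecast.

    S_t = (lead_time + 1) * F_t and q_t = S_t - S_{t-1} + D_t, hence
    q_t = D_t + (lead_time + 1) * (F_t - F_{t-1}).
    """
    orders = []
    for t in range(window, len(demand)):
        f_t = sum(demand[t - window + 1 : t + 1]) / window
        f_prev = sum(demand[t - window : t]) / window
        orders.append(demand[t] + (lead_time + 1) * (f_t - f_prev))
    return orders


def bullwhip_ratio(orders, demand):
    """Var(orders) / Var(demand); values above 1 indicate demand amplification."""
    return statistics.pvariance(orders) / statistics.pvariance(demand)


rng = random.Random(0)
demand = [100.0 + rng.gauss(0.0, 10.0) for _ in range(20000)]
L, p = 2, 4
orders = order_up_to_orders(demand, lead_time=L, window=p)
ratio = bullwhip_ratio(orders, demand[p:])
theory = (1 + (L + 1) / p) ** 2 + ((L + 1) / p) ** 2
print(f"simulated ratio={ratio:.3f}, i.i.d. theory={theory:.3f}")
```

A controller that reduces the bullwhip effect should drive this ratio toward 1 for the same demand stream, which makes it a convenient scalar to compare against the general-disturbance baseline.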
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010486432.3A CN111624882B (en) | 2020-06-01 | 2020-06-01 | Zero and differential game processing method for supply chain system based on reverse-thrust design method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010486432.3A CN111624882B (en) | 2020-06-01 | 2020-06-01 | Zero and differential game processing method for supply chain system based on reverse-thrust design method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111624882A true CN111624882A (en) | 2020-09-04 |
CN111624882B CN111624882B (en) | 2023-04-18 |
Family ID: 72272015
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010486432.3A Active CN111624882B (en) | 2020-06-01 | 2020-06-01 | Zero and differential game processing method for supply chain system based on reverse-thrust design method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111624882B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114003050A (en) * | 2021-09-30 | 2022-02-01 | Nanjing University of Aeronautics and Astronautics (南京航空航天大学) | Active defense guidance method of three-body countermeasure strategy based on differential game |
CN114760101A (en) * | 2022-03-18 | 2022-07-15 | Beijing Information Science and Technology University (北京信息科技大学) | Product and supply chain cooperative evolution system compensation method and system under network attack |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110083063A (en) * | 2019-04-29 | 2019-08-02 | Liaoning Petrochemical University (辽宁石油化工大学) | Multi-body optimal control method based on off-policy Q-learning |
2020
- 2020-06-01 CN CN202010486432.3A (CN111624882B), Active
Non-Patent Citations (4)
Title |
---|
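Jingliang Sun et al.: "Distributed zero-sum differential game for multi-agent systems in strict-feedback form with input saturation and output constraint", Neural Networks * |
Zhou Haiying (周海英): "H∞ control of discrete Markov jump systems based on stochastic differential games" (基于随机微分博弈的离散Markov跳变系统H_∞控制), Journal of Guangzhou Maritime University (广州航海学院学报) * |
Gong Zhenyu (弓镇宇) et al.: "H∞ consensus of multi-agent systems based on the zero-sum game method" (基于零和博弈方法的多智能体系统H_∞一致性), Henan Science (河南科学) * |
Yang Xuejing (杨雪静) et al.: "Tracking control of cascaded nonlinear systems based on zero-sum games" (基于零和博弈的级联非线性系统的跟踪控制), Journal of Beijing Information Science and Technology University (Natural Science Edition) (北京信息科技大学学报(自然科学版)) * |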
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114003050A (en) * | 2021-09-30 | 2022-02-01 | Nanjing University of Aeronautics and Astronautics (南京航空航天大学) | Active defense guidance method of three-body countermeasure strategy based on differential game |
CN114003050B (en) * | 2021-09-30 | 2023-10-31 | Nanjing University of Aeronautics and Astronautics (南京航空航天大学) | Active defense guidance method of three-body countermeasure strategy based on differential game |
CN114760101A (en) * | 2022-03-18 | 2022-07-15 | Beijing Information Science and Technology University (北京信息科技大学) | Product and supply chain cooperative evolution system compensation method and system under network attack |
Also Published As
Publication number | Publication date |
---|---|
CN111624882B (en) | 2023-04-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Soriano et al. | PD control compensation based on a cascade neural network applied to a robot manipulator | |
Xu et al. | Reinforcement learning output feedback NN control using deterministic learning technique | |
de Jesús Rubio et al. | Uniformly stable backpropagation algorithm to train a feedforward neural network | |
US5175678A (en) | Method and procedure for neural control of dynamic processes | |
CN104950677A (en) | Mechanical arm system saturation compensation control method based on back-stepping sliding mode control | |
Yang et al. | Adaptive H∞ tracking control for a class of uncertain nonlinear systems using radial-basis-function neural networks | |
CN111624882B (en) | Zero and differential game processing method for supply chain system based on reverse-thrust design method | |
Lian et al. | Online inverse reinforcement learning for nonlinear systems with adversarial attacks | |
Beyhan et al. | Stable modeling based control methods using a new RBF network | |
Kajiwara et al. | Experimental verification of a real-time tuning method of a model-based controller by perturbations to its poles | |
Wang et al. | Adaptive tuning of the fuzzy controller for robots | |
Fan et al. | Adaptive nearly optimal control for a class of continuous-time nonaffine nonlinear systems with inequality constraints | |
Kumar et al. | Lyapunov stability-based control and identification of nonlinear dynamical systems using adaptive dynamic programming | |
Zheng et al. | Prescribed finite-time consensus with severe unknown nonlinearities and mismatched disturbances | |
Tsai et al. | Deadzone compensation based on constrained RBF neural network | |
Guan et al. | Spline adaptive filtering algorithm based on different iterative gradients: Performance analysis and comparison | |
Shahriari-Kahkeshi et al. | Adaptive cooperative control of nonlinear multi-agent systems with uncertain time-varying control directions and dead-zone nonlinearity | |
CN113485099B (en) | Online learning control method of nonlinear discrete time system | |
CN112346342B (en) | Single-network self-adaptive evaluation design method of non-affine dynamic system | |
JPH08152902A (en) | Adaptive processor | |
Wouwer et al. | On the use of simultaneous perturbation stochastic approximation for neural network training | |
JPH04127239A (en) | Automatic control method for fuzzy inference parameter and display method for learning state | |
Hussain et al. | A new neural network and pole placement based adaptive composite controller | |
Petlenkov et al. | Adaptive output feedback linearization for a class of nn-based anarx models | |
JPH0635510A (en) | Model norm adaptive controller using neural network |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
GR01 | Patent grant | |