CN111624882A - Zero-sum differential game processing method for supply chain system based on backstepping design method - Google Patents


Info

Publication number
CN111624882A
Authority
CN
China
Prior art keywords
supply chain
chain system
zero
game
differential
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010486432.3A
Other languages
Chinese (zh)
Other versions
CN111624882B (en)
Inventor
李庆奎
杨雪静
易军凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Information Science and Technology University
Original Assignee
Beijing Information Science and Technology University
Application filed by Beijing Information Science and Technology University
Priority to CN202010486432.3A
Publication of CN111624882A
Application granted
Publication of CN111624882B
Legal status: Active


Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/042 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Abstract

The invention relates to a zero-sum differential game processing method for a supply chain system based on the backstepping design method, comprising the following steps. S1: model a nonlinear switched supply chain system with uncertain customer demand as a two-player zero-sum differential game problem, and suppress the bullwhip effect by a game-theoretic method. S2: use a feedforward controller to convert the tracking problem of a strict-feedback system into an equivalent differential game problem of an affine system. S3: study the zero-sum differential game strategy using adaptive dynamic programming, and prove the convergence of the backstepping-based differential game and the stability of the closed-loop system by the Lyapunov method. S4: verify the effectiveness of the method through simulation results. The method suits real-world cases in which the supply chain model is not completely known, and is therefore of greater practical value for nonlinear supply chain systems whose state functions are not known in advance.

Description

Zero-sum differential game processing method for a supply chain system based on the backstepping design method
Technical Field
The invention relates to the technical field of supply chain systems, and in particular to a zero-sum differential game processing method for a supply chain system based on the backstepping design method.
Background
An important problem in supply chain management is weakening the bullwhip effect, i.e., the amplification of demand variability as demand information is passed from downstream to upstream. Considerable effort has been devoted over the past decades to counteracting its adverse effects.
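A common way to quantify the bullwhip effect (a standard measure, not one defined in this patent) is the bullwhip ratio: the variance of the orders a tier places upstream divided by the variance of the demand it sees. The sketch below computes it with the Python standard library; the demand and order series are hypothetical.

```python
from statistics import pvariance


def bullwhip_ratio(orders: list[float], demand: list[float]) -> float:
    """Variance of orders placed upstream divided by variance of customer demand.

    A ratio above 1 means demand variability is amplified moving upstream.
    """
    demand_var = pvariance(demand)
    if demand_var == 0:
        raise ValueError("customer demand must vary for the ratio to be defined")
    return pvariance(orders) / demand_var


# Hypothetical series: the tier over-reacts to every demand swing.
demand = [10, 12, 8, 12, 8]
orders = [10, 14, 6, 14, 6]
print(bullwhip_ratio(orders, demand))  # 4.0
```

A ratio of 1 means orders simply pass demand variability through; the controller in this patent aims to keep that amplification small.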
In complex dynamic supply chain systems, nonlinearity is ubiquitous in the physical processes, yet it is often ignored or attributed to a slowly changing environment. Modeling a dynamic supply chain system as a cascade nonlinear system therefore captures its nonlinear characteristics naturally. Structured recursive design methods such as backstepping are well known to be powerful tools for handling uncertain nonlinearities, because they avoid unnecessary cancellations. However, existing research on dynamic supply chain systems often ignores certain nonlinear factors and builds an idealized model only after linearizing them. Because a nonlinear supply chain system has an important cascade (triangular) structure, it is difficult to describe with a linear model and should instead be studied with structured methods such as backstepping. For complex dynamic supply chain systems with uncertain customer demand, and especially when the bullwhip effect is to be suppressed by a game method, applying backstepping is difficult and challenging. For this reason, we propose a zero-sum differential game processing method for a supply chain system based on the backstepping design method.
Disclosure of Invention
The purpose of the invention is to overcome the shortcomings of the prior art by providing a zero-sum differential game processing method for a supply chain system based on the backstepping design method.
To achieve this purpose, the invention provides the following technical solution:
a zero-sum differential game processing method for a supply chain system based on the backstepping design method, characterized in that the supply chain system is a cascade nonlinear system composed of equipment and distribution entities and driven by uncertain customer demand. Through control of material flow and information flow, the system purchases raw materials, converts them into intermediate and finished products, and distributes the finished products to customers. An important problem in supply chain management is how to weaken the bullwhip effect, i.e., the amplification of demand variability as demand information passes from downstream tiers to upstream tiers. The method comprises the following steps:
S1: first, model a nonlinear switched supply chain system with uncertain customer demand as a two-player zero-sum differential game problem, and suppress the bullwhip effect by a game-theoretic method;
S2: second, use a feedforward controller to convert the tracking problem of a strict-feedback system into an equivalent differential game problem of an affine system;
S3: next, because the Hamilton-Jacobi-Isaacs (HJI) equation is difficult to solve analytically, study the zero-sum differential game strategy using adaptive dynamic programming (ADP): a critic (evaluation) network learns the value function of the HJI equation online in real time, while an actor (execution) network and a disturbance (interference) network construct the control strategy and the disturbance strategy. This game algorithm is called synchronous zero-sum game policy iteration. The convergence of the backstepping-based differential game and the stability of the closed-loop system are proved by the Lyapunov method;
S4: finally, verify the effectiveness of the method through simulation results.
Preferably, the bullwhip effect of the supply chain system in step S1 is treated as an $H_\infty$ control problem. From a game-theoretic perspective, designing the $H_\infty$ controller is equivalent to a two-player zero-sum game, i.e., the controller minimizes the performance index under the worst-case (maximizing) disturbance, thereby achieving optimal control.
Preferably, in step S2, the feedforward controller is designed by the backstepping method, so that the tracking problem of the supply chain system in strict-feedback form is converted into an optimal regulation problem in affine form.
Preferably, in step S3, in the ADP-based policy iteration, ADP uses three neural networks (a critic network, an actor network and a disturbance network) to approximate the value function, the control strategy and the uncertain customer demand strategy, respectively, during the iteration, finally obtaining an approximate solution of the HJI equation of the nonlinear supply chain system.
Preferably, step S4 designs the control input of the supply chain system with uncertain customer demand by the backstepping-based zero-sum differential game method, so that the system output tracks the reference signal optimally while the bullwhip effect is reduced. The error between the system output and the reference signal is confined to a small compact set, and the output tracks the reference signal under switching conditions, which illustrates the effectiveness of the proposed method. For comparison, a controller is also designed under a general disturbance.
Compared with the prior art, the invention has the following beneficial effects: it models a nonlinear supply chain system with uncertain customer demand as a two-player zero-sum game problem and reduces the bullwhip effect by a game-theoretic method; by combining backstepping with ADP, it updates the weights of the critic, actor and disturbance neural networks synchronously, online and in real time to obtain the Nash equilibrium solution of the corresponding HJI equation; and it proves the stability of the closed-loop system by the Lyapunov method. Because supply chain models are not completely known in practice, the method is of greater practical value for nonlinear supply chain systems whose state functions are not known in advance.
Drawings
FIG. 1 is a schematic diagram of the switching signal of the system of the present invention;
FIG. 2 is a schematic diagram of the system output $y(t)$ tracking the reference signal $y_d(t)$ according to the present invention;
FIG. 3 is a schematic diagram of the tracking error $y(t) - y_d(t)$ according to the present invention;
FIG. 4 is a schematic diagram of the system output $y(t)$ tracking the reference signal $y_d(t)$ for the controller designed under a general disturbance;
FIG. 5 is a schematic diagram of the tracking error $y(t) - y_d(t)$ for the controller designed under a general disturbance.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIGS. 1 to 5, the present invention provides the following technical solution:
a zero-sum differential game processing method for a supply chain system based on the backstepping design method, characterized in that the supply chain system is a cascade nonlinear system composed of equipment and distribution entities and driven by uncertain customer demand. Through control of material flow and information flow, the system purchases raw materials, converts them into intermediate and finished products, and distributes the finished products to customers. An important problem in supply chain management is how to weaken the bullwhip effect, i.e., the amplification of demand variability as demand information passes from downstream tiers to upstream tiers. The method comprises the following steps:
S1: first, model a nonlinear switched supply chain system with uncertain customer demand as a two-player zero-sum differential game problem, and suppress the bullwhip effect by a game-theoretic method;
S2: second, use a feedforward controller to convert the tracking problem of a strict-feedback system into an equivalent differential game problem of an affine system;
S3: next, because the Hamilton-Jacobi-Isaacs (HJI) equation is difficult to solve analytically, study the zero-sum differential game strategy using adaptive dynamic programming (ADP): a critic (evaluation) network learns the value function of the HJI equation online in real time, while an actor (execution) network and a disturbance (interference) network construct the control strategy and the disturbance strategy. This game algorithm is called synchronous zero-sum game policy iteration. The convergence of the backstepping-based differential game and the stability of the closed-loop system are proved by the Lyapunov method;
S4: finally, verify the effectiveness of the method through simulation results.
Embodiment:
Preliminaries:
The supply chain system with uncertain customer demand is modeled as a cascade switched nonlinear system with uncertain disturbances. Assume that the supply chain consists of n devices, each supplying raw material to the next. At time t, the inventory level of the kth device and the uncertain customer demand are denoted by
$x_k(t)$
and
$d_k(t)$,
respectively. The dynamic model of the kth device in the supply chain is
Figure BDA0002519389590000052
where
Figure BDA0002519389590000053
denotes the inventory vector of devices 1 through k, with $1 \le k \le n-1$; $\sigma(t): [0, +\infty) \to \mathcal{M} = \{0, 1, \dots, M\}$ is the switching signal, and $\sigma(t) = i$ means that the ith subsystem is active. The term
Figure BDA0002519389590000054
represents the quantity of goods that the kth device receives from the (k+1)th device;
Figure BDA0002519389590000055
and
Figure BDA0002519389590000056
are known continuous, nonlinear, smooth functions, and $d_k \in L_2[0, \infty)$ is unknown but bounded. In the same way, the nth device is modeled as
Figure BDA0002519389590000057
where u is the control input;
Remark 1: The supply chain system is driven by uncertain customer demand, and the system coefficients
Figure BDA0002519389590000058
Figure BDA0002519389590000059
differ from one subsystem to another. Because of different physical characteristics, the transfer-rate functions representing storage capacity and operating capacity,
Figure BDA00025193895900000510
also differ. Therefore, the supply chain system described by the two models above is essentially a switched system;
The following cascade switched nonlinear system models the supply chain with uncertain customer demand:
Figure BDA00025193895900000511
where y is the system output;
Assumption 1: The maximum inventory of the kth device in the supply chain is $c_k$, so each device satisfies $0 < x_k(t) < c_k$ ($1 \le k \le n$);
Assumption 2: The functions
Figure BDA00025193895900000512
and
Figure BDA00025193895900000513
have upper and lower bounds, i.e.,
Figure BDA00025193895900000514
and
Figure BDA0002519389590000061
where $g_{k\min}$, $g_{k\max}$, $p_{k\min}$ and $p_{k\max}$ are positive constants. Since
Figure BDA0002519389590000062
represents the transfer rate from the (k+1)th device to the kth device, it is further assumed without loss of generality that
Figure BDA0002519389590000063
Assumption 3: All states of the system are observable;
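The model equations above survive only as images. As a sketch, assume each device follows the strict-feedback form $\dot x_k = f_{k,\sigma}(\bar x_k) + g_{k,\sigma}(\bar x_k)\,x_{k+1} + p_{k,\sigma}(\bar x_k)\,d_k$ with $x_{n+1} := u$; this is a form commonly used for cascade supply chain models, not the patent's verbatim model. One explicit-Euler step of the switched chain can then be written as below. The function name `euler_step` and the table-of-callables layout are hypothetical.

```python
from typing import Callable, Sequence

# fs[i][k], gs[i][k], ps[i][k]: hypothetical callables giving f, g, p of device k
# (0-based) while subsystem i is active; each takes the inventories of devices 0..k.
Fn = Callable[[Sequence[float]], float]
Table = Sequence[Sequence[Fn]]


def euler_step(x: list[float], u: float, d: list[float], sigma: int,
               fs: Table, gs: Table, ps: Table, dt: float) -> list[float]:
    """One explicit-Euler step of the assumed switched strict-feedback chain."""
    n = len(x)
    dx = []
    for k in range(n):
        xbar = x[: k + 1]                      # inventories of devices 0..k
        inflow = x[k + 1] if k < n - 1 else u  # the last device is driven by u
        dx.append(fs[sigma][k](xbar) + gs[sigma][k](xbar) * inflow
                  + ps[sigma][k](xbar) * d[k])
    return [xk + dt * dxk for xk, dxk in zip(x, dx)]


# Usage with one subsystem (sigma = 0) and two devices; the functions are made up.
fs = [[lambda xb: -xb[-1], lambda xb: -xb[-1]]]
gs = [[lambda xb: 1.0, lambda xb: 1.0]]
ps = [[lambda xb: 0.5, lambda xb: 0.5]]
x_next = euler_step([0.1, 0.0], 0.0, [0.0, 0.0], 0, fs, gs, ps, dt=0.1)
```

Switching enters only through the index `sigma`, which selects which row of function tables is active at each step.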
The control objective is to design the control input of the supply chain system so that its output $y$ tracks the reference $y_d$ in an optimal manner, while suppressing the bullwhip effect and ensuring that all signals of the cascade switched nonlinear system formed by the supply chain are bounded.
Tracking control problem of the strict-feedback system:
In this section, a feedforward controller is designed by backstepping for the tracking errors $e_k = x_k - x_{kd}$, converting the tracking problem of the supply chain system in strict-feedback form into an optimal regulation problem in affine form. The backstepping design proceeds as follows:
Step 1: Differentiating $e_1 = x_1 - x_{1d}$ gives
Figure BDA0002519389590000064
The virtual control input $x_{2d}$ satisfies
Figure BDA0002519389590000065
The feedback optimal control input
Figure BDA0002519389590000066
will be designed in the next section, and the feedforward virtual control input
Figure BDA0002519389590000067
is obtained by solving
Figure BDA0002519389590000068
The Lyapunov candidate function is defined as
Figure BDA0002519389590000069
and its derivative with respect to t is
Figure BDA00025193895900000610
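To make Step 1 concrete, assume (hypothetically) that the first device follows $\dot x_1 = f_1 + g_1 x_2 + p_1 d_1$ and that the virtual control splits as $x_{2d} = x_{2a} + x_{2s}$ into a feedforward part and a feedback part. The feedforward part then solves $g_1 x_{2a} = \dot x_{1d} - f_1$, which cancels the known drift and the reference derivative, leaving $\dot e_1 = g_1 x_{2s} + p_1 d_1$ when $x_2 = x_{2d}$. The decomposition and all numeric values below are assumptions for illustration.

```python
import math


def feedforward_virtual_control(f1: float, g1: float, x1d_dot: float) -> float:
    """Solve g1 * x2a = x1d_dot - f1 for the feedforward term x2a (assumed form)."""
    if abs(g1) < 1e-12:
        raise ValueError("g1 must stay away from zero (cf. the bounds in Assumption 2)")
    return (x1d_dot - f1) / g1


def e1_dot(f1: float, g1: float, p1: float, x2: float, d1: float, x1d_dot: float) -> float:
    """Tracking-error derivative for the assumed model x1_dot = f1 + g1*x2 + p1*d1."""
    return f1 + g1 * x2 + p1 * d1 - x1d_dot


# Example at t = 1 s with x1d = y_d = 0.5 sin(t); f1, g1, p1, x2s, d1 are hypothetical.
t = 1.0
x1d_dot = 0.5 * math.cos(t)
f1, g1, p1 = -0.2, 1.5, 0.3
x2a = feedforward_virtual_control(f1, g1, x1d_dot)
x2s, d1 = 0.05, 0.1
# Only the feedback part and the demand disturbance remain: g1*x2s + p1*d1.
print(e1_dot(f1, g1, p1, x2a + x2s, d1, x1d_dot))
```

What remains after the feedforward cancellation is an affine system in the feedback input $x_{2s}$ and the demand $d_1$, which is the form the game design works with.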
Step k ($2 \le k \le n-1$): Differentiating $e_k = x_k - x_{kd}$ gives
Figure 1
where the virtual control input $x_{(k+1)d}$ satisfies
Figure BDA0002519389590000072
The feedback optimal control input
Figure BDA0002519389590000073
is designed below, and the feedforward virtual control input
Figure BDA0002519389590000074
satisfies
Figure BDA0002519389590000075
The Lyapunov candidate function is defined as
Figure BDA0002519389590000076
Differentiating $V_k$ gives
Figure BDA0002519389590000077
Step n: Similarly, the derivative of $e_n = x_n - x_{nd}$ is
Figure BDA0002519389590000078
where the virtual control input $u_d$ satisfies
Figure BDA0002519389590000079
The feedback optimal control input
Figure BDA00025193895900000710
will be designed in the next section, and the feedforward virtual control input
Figure BDA00025193895900000711
satisfies
Figure BDA00025193895900000712
The Lyapunov candidate function is now defined as
Figure BDA00025193895900000713
and differentiating $V_n$ gives
Figure BDA00025193895900000714
Further, define
Figure BDA0002519389590000081
Then the above equation can be rewritten as
Figure BDA0002519389590000082
where
Figure BDA0002519389590000083
and $d = [d_1, \dots, d_n]^T$.
From the above, we obtain
Figure BDA0002519389590000084
where the feedforward virtual control inputs are as designed above, while the feedback optimal control input
Figure BDA0002519389590000085
and the uncertain customer demand $d = [d_1, \dots, d_n]^T$ are determined through differential game theory;
Remark 2: As observed in FIG. 2, the feedforward controller $U_a$ alone cannot guarantee the stability of the entire supply chain system, so a differential game strategy must be designed to stabilize the affine system.
Design of the differential game strategy:
The bullwhip effect of a supply chain system is usually treated as an $H_\infty$ control problem. From a game-theoretic perspective, designing an $H_\infty$ controller is equivalent to a two-player zero-sum game, in which the controller minimizes the performance index under the worst-case disturbance, thereby achieving optimal control. The bullwhip effect problem of the nonlinear supply chain system can therefore be solved by a game method. During the game, a real-time policy iteration method using three neural networks (critic, actor and disturbance) solves online the HJI equation arising from the nonlinear zero-sum differential game;
Zero-sum differential game:
The system is described as follows:
Figure BDA0002519389590000086
where
$X = [x_1, \dots, x_n]^T$,
Figure BDA0002519389590000091
Figure BDA0002519389590000092
Figure BDA0002519389590000093
Figure BDA0002519389590000094
The goal is to design the control input U such that, for a given $\gamma > 0$,
Figure BDA0002519389590000095
where $Q(E) \ge 0$, $R = R^T > 0$ and $d \in L_2[0, \infty)$;
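The attenuation condition above is an image. Assuming it is the usual $L_2$-gain inequality $\int_0^T \big(Q(E) + U^T R U\big)\,dt \le \gamma^2 \int_0^T \|d\|^2\,dt$ (zero initial error), the sketch below checks it on sampled trajectory data with the trapezoidal rule. The function names are hypothetical.

```python
def trapezoid(values: list[float], dt: float) -> float:
    """Composite trapezoidal rule for uniformly sampled values (spacing dt)."""
    if len(values) < 2:
        return 0.0
    return dt * (sum(values) - 0.5 * (values[0] + values[-1]))


def attenuation_check(q_vals, u_cost_vals, d_sq_vals, gamma, dt):
    """Compare int (Q(E) + U^T R U) dt against gamma^2 * int ||d||^2 dt.

    q_vals: samples of Q(E(t)); u_cost_vals: samples of U^T R U;
    d_sq_vals: samples of ||d(t)||^2. Returns (holds, lhs, rhs).
    """
    lhs = trapezoid([q + u for q, u in zip(q_vals, u_cost_vals)], dt)
    rhs = gamma ** 2 * trapezoid(d_sq_vals, dt)
    return lhs <= rhs, lhs, rhs


# Hypothetical samples on a 1 s grid.
print(attenuation_check([0, 1, 1, 0], [0, 0, 0, 0], [1, 1, 1, 1], gamma=2.0, dt=1.0))
# (True, 2.0, 12.0)
```

A check on finitely many sampled trajectories can refute the bound but cannot certify it; Assumption 4 below is what guarantees that a suitable U exists.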
Assumption 4: For the selected $\gamma > 0$, there exists a control input U such that the system is asymptotically stable and has $L_2$ gain no greater than $\gamma$;
Remark 3: Assumption 4 guarantees that a solution to the nonlinear $H_\infty$ control problem exists, i.e., the bullwhip effect problem of the supply chain system is solvable;
The performance index is
Figure BDA0002519389590000096
The $H_\infty$ control problem, consisting of the control of the supply chain system and the uncertain customer demand, can be viewed as a two-player zero-sum game problem. The value function of a policy is defined as
Figure BDA0002519389590000097
subject to the dynamic equation. The goal is to find a Nash equilibrium point $(U^*, d^*)$ such that the control input $U^*$ of the supply chain system minimizes the performance index while the uncertain customer demand $d^*$ maximizes it;
The Hamiltonian associated with an admissible control input U and an uncertain customer demand input d of the supply chain system is defined as
$$H(E, U, d) = Q(E) + U^T R U - \gamma^2 \|d\|^2 + (\nabla V(E))^T \big(F_i(E) + G_i(X) U + P_i(X) d\big) = 0 \quad (21)$$
where
Figure BDA0002519389590000101
is the gradient of $V(E)$ with respect to E. The optimal value function $V^*(E)$ is defined as
Figure BDA0002519389590000102
If a saddle point of the game exists, the two-player optimal control problem of the supply chain system has a unique solution, i.e., the Nash equilibrium condition
Figure BDA0002519389590000103
holds. From the stationarity conditions
Figure BDA0002519389590000104
the optimal control pair of the supply chain system is obtained as
Figure BDA0002519389590000105
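Assuming the Hamiltonian has the form (21) with $R = R^T > 0$, the stationarity conditions can be evaluated explicitly. The following is a sketch of that standard computation; the patent's own expressions for the optimal pair and the HJI equation are images, so their notation may differ. The last line follows by substituting $U^*$ and $d^*$ back into (21).

```latex
\begin{aligned}
\frac{\partial H}{\partial U} &= 2RU + G_i^T(X)\,\nabla V^*(E) = 0
  &&\Longrightarrow\quad U^* = -\tfrac{1}{2}R^{-1}G_i^T(X)\,\nabla V^*(E),\\
\frac{\partial H}{\partial d} &= -2\gamma^2 d + P_i^T(X)\,\nabla V^*(E) = 0
  &&\Longrightarrow\quad d^* = \tfrac{1}{2\gamma^2}P_i^T(X)\,\nabla V^*(E),\\
0 &= Q(E) + (\nabla V^*)^T F_i(E)
  - \tfrac{1}{4}(\nabla V^*)^T G_i(X)R^{-1}G_i^T(X)\,\nabla V^*
  + \tfrac{1}{4\gamma^2}(\nabla V^*)^T P_i(X)P_i^T(X)\,\nabla V^* .
\end{aligned}
```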
Substituting (24) into (21) gives the HJI equation of the supply chain system:
Figure BDA0002519389590000106
$$V^*(0) = 0 \quad (25)$$
To obtain the saddle-point solution of the differential game, the HJI equation of the supply chain system must be solved. For nonlinear systems the HJI equation is a partial differential equation whose analytic solution is difficult to obtain, so the ADP method is used to solve it;
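To see why ADP is needed only in the nonlinear case, consider a scalar linear special case (a hypothetical toy, not the patent's model): $\dot E = aE + bU + pd$, $Q(E) = qE^2$, $R = r$. With $V(E) = sE^2$, the HJI equation collapses to the scalar game Riccati equation $q + 2as - s^2(b^2/r - p^2/\gamma^2) = 0$, which has a closed-form positive root. Such a case is useful as ground truth when testing ADP code.

```python
import math


def scalar_game_riccati(a: float, b: float, p: float, q: float, r: float,
                        gamma: float) -> float:
    """Positive root s of  q + 2*a*s - s**2 * (b**2/r - p**2/gamma**2) = 0.

    This is the HJI equation of the scalar linear game
        E_dot = a*E + b*U + p*d,   Q(E) = q*E**2,   V(E) = s*E**2.
    """
    k = b * b / r - p * p / gamma ** 2
    if k <= 0:
        raise ValueError("gamma too small: need b^2/r > p^2/gamma^2")
    return (a + math.sqrt(a * a + k * q)) / k


# Hypothetical toy parameters (not from the patent).
a, b, p, q, r, gamma = -1.0, 1.0, 1.0, 1.0, 1.0, 2.0
s = scalar_game_riccati(a, b, p, q, r, gamma)
u_gain = -b * s / r          # U* = u_gain * E
d_gain = p * s / gamma ** 2  # d* = d_gain * E
print(s)                     # about 0.4305
```

If $\gamma$ is too small the quadratic loses its positive root, mirroring the role of Assumption 4 in guaranteeing solvability.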
ADP-based strategy iteration:
ADP uses three neural networks (critic, actor and disturbance networks) to approximate the value function, the control strategy and the uncertain customer demand strategy, respectively, during the iteration, finally obtaining an approximate solution of the HJI equation of the nonlinear supply chain system. Before ADP is applied to solve the HJI equation, the following lemma is given. Lemma 1: Consider the error dynamics system (17) with the value function (19) and the differential game strategy (24), and let J(E) be a continuously differentiable, radially unbounded Lyapunov candidate function such that
Figure BDA0002519389590000111
where
Figure BDA0002519389590000112
is the gradient of $J_E(E)$ with respect to E. Let $\Lambda(E)$ be a positive definite matrix with $\Lambda(E) = 0$ when $E = 0$, and for any $E \ne 0$,
Figure BDA00025193895900001115
Further, $\Lambda(E)$ satisfies
Figure BDA0002519389590000113
and
Figure BDA0002519389590000114
Then the following relationship holds:
Figure BDA0002519389590000115
Remark 4: For the error dynamics system (17) with the control and disturbance strategies (24), assume that
Figure BDA0002519389590000116
is a function of the system state; in particular, we assume
Figure BDA0002519389590000117
and
Figure BDA0002519389590000118
Thus, the inequality
Figure BDA0002519389590000119
follows from $(\nabla J_E(E))^T \big(F_i(E) + G_i(X) U^* + P_i(X) d^*\big) < 0$, so Lemma 1 is easily seen to be reasonable. In practice, such a function can be obtained by suitably choosing a quadratic polynomial
Figure BDA00025193895900001110
By the Weierstrass higher-order approximation theorem, there exists a complete, independent basis set
Figure BDA00025193895900001111
such that the value function $V(E)$ and its gradient are uniformly approximated, i.e., there exist coefficients $c_i$ such that
Figure BDA00025193895900001112
Figure BDA00025193895900001113
hold, where
Figure BDA00025193895900001114
The second terms in (28) and (29) converge uniformly to zero as $N \to \infty$;
To implement the differential game strategy (24), the optimal value function is approximated by a neural network (NN) as
Figure BDA0002519389590000121
where
Figure BDA0002519389590000122
and
Figure BDA0002519389590000123
denote the ideal weights and the activation functions of the critic (evaluation) neural network, respectively, and $\varepsilon_c(E)$ and L denote the approximation error and the number of neurons, respectively. The gradient of (30) can be written as
Figure BDA0002519389590000124
For a fixed control strategy U and uncertain customer demand strategy d, using the NN approximation gives (32):
Figure BDA0002519389590000125
The residual error is
Figure BDA0002519389590000126
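The residual itself is an image. Assuming it is the Hamiltonian (21) evaluated with the critic approximation, i.e. $e = \hat W^T \nabla\phi(E)\,\dot E + Q(E) + U^T R U - \gamma^2\|d\|^2$, the sketch below computes it for scalar signals and checks it on a toy scalar linear game whose exact solution is known. All names and parameters are hypothetical.

```python
import math


def bellman_residual(w, grad_phi, e_dot, q_val, u, r, d, gamma):
    """e = W^T grad_phi(E) * E_dot + Q(E) + r*U**2 - gamma**2 * d**2 (scalar E, U, d).

    w: critic weights; grad_phi: basis gradients dphi_j/dE at E (same length).
    """
    v_dot = sum(wi * gi for wi, gi in zip(w, grad_phi)) * e_dot
    return v_dot + q_val + r * u * u - gamma ** 2 * d * d


# Toy check (hypothetical): scalar linear game with basis phi(E) = [E**2].
a, b, p, q, r, gamma = -1.0, 1.0, 1.0, 1.0, 1.0, 2.0
s = (math.sqrt(7.0) - 2.0) / 1.5     # exact HJI solution for these parameters
E = 0.7
u = -b * s / r * E                   # U* = -1/2 R^-1 G^T dV/dE with V = s E^2
d = p * s / gamma ** 2 * E           # d* = 1/(2 gamma^2) P^T dV/dE
e_dot = a * E + b * u + p * d
print(bellman_residual([s], [2 * E], e_dot, q * E * E, u, r, d, gamma))  # ~0
```

At the ideal weight the residual vanishes up to rounding; any weight error shows up as a nonzero residual, which is what the critic tuning law drives to zero.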
According to (24), the feedback optimal control and the worst-case uncertain customer demand are rewritten as
Figure BDA0002519389590000127
The HJI equation then becomes
Figure BDA0002519389590000128
and the approximation error arising from the value function approximation is
Figure BDA0002519389590000129
However, the ideal weight $W_c$ is unknown, so the differential game strategy (24) cannot be obtained directly. To handle the unknown ideal weight of the value function, the value function is approximated by
Figure BDA0002519389590000131
so that
Figure BDA0002519389590000132
and the Hamiltonian becomes
Figure BDA0002519389590000133
The goal is to tune the estimated weights
Figure BDA0002519389590000134
to approximate the Hamiltonian
Figure BDA00025193895900001317
so that the estimated weights
Figure BDA0002519389590000135
converge to the ideal weight $W_c$; that is, the update law of
Figure BDA0002519389590000136
is designed as
Figure BDA0002519389590000137
so as to minimize the mean square residual
Figure BDA0002519389590000138
Based on gradient descent, the tuning law of the critic neural network is designed as (40):
Figure BDA0002519389590000139
where
Figure BDA00025193895900001310
$a_c > 0$ is a design parameter, and
Figure BDA00025193895900001311
Figure BDA00025193895900001312
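The tuning law (40) itself is an image. The sketch below assumes the normalized gradient-descent form widely used in online ADP, $\dot{\hat W}_c = -a_c\,\bar\sigma(\bar\sigma^T \hat W_c + r)/(\bar\sigma^T\bar\sigma + 1)^2$, which descends on $\tfrac12 e^2$; the patent's exact normalization may differ. The synthetic test feeds rotating regressors (a persistently exciting signal) generated from hypothetical true weights and checks that the estimate converges.

```python
import math


def critic_step(w, sig, r, a_c, dt):
    """Euler step of  W_dot = -a_c * sig / (sig^T sig + 1)^2 * (sig^T W + r).

    sig: regressor grad_phi(E) * E_dot; r: Q(E) + U^T R U - gamma^2 ||d||^2.
    """
    e = sum(si * wi for si, wi in zip(sig, w)) + r   # Bellman residual
    norm = (sum(si * si for si in sig) + 1.0) ** 2   # normalization term
    return [wi - dt * a_c * si * e / norm for wi, si in zip(w, sig)]


# Synthetic test: rotating (persistently exciting) regressors, and a running cost
# generated from hypothetical "true" weights so the residual vanishes at w_true.
w_true = [0.5, -1.0]
w = [0.0, 0.0]
dt, a_c = 0.01, 10.0
for step in range(10_000):
    t = step * dt
    sig = [math.sin(t), math.cos(t)]
    r = -sum(si * wt for si, wt in zip(sig, w_true))
    w = critic_step(w, sig, r, a_c, dt)
print(w)  # converges toward [0.5, -1.0]
```

If the regressor stopped rotating (no persistent excitation), only the weight component along the fixed regressor direction would be corrected; this is why a probing signal is added during learning.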
The weight estimation error is
Figure BDA00025193895900001313
Therefore, from (35), (38) and (40), the estimation error dynamics of the critic network are
Figure BDA00025193895900001314
Following the standard policy iteration algorithm, given a solution of the Hamiltonian equation (32), the actor (execution) network and the disturbance (interference) network are updated as (43) and (44):
Figure BDA00025193895900001315
Figure BDA00025193895900001316
where $c_i$ is unknown;
The solution $W_c$ of (32) is obtained by the least squares method, and the control strategy and the uncertain demand strategy are defined as (45) and (46):
Figure BDA0002519389590000141
Figure BDA0002519389590000142
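The text states that $W_c$ is obtained from (32) by least squares. As a hedged sketch: for fixed policies the residual is linear in the weights, $e_j = \sigma_j^T W + r_j$ with $\sigma_j = \nabla\phi(E_j)\,\dot E_j$, so stacking samples gives an ordinary least-squares problem. The helper names and the toy scalar game with basis $\phi(E) = [E^2, E^4]$ are hypothetical; for that toy the exact answer is known, which makes the result easy to check.

```python
def solve_linear(a: list[list[float]], b: list[float]) -> list[float]:
    """Gaussian elimination with partial pivoting for a small dense system a x = b."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        if abs(m[piv][col]) < 1e-14:
            raise ValueError("singular normal equations: samples not exciting enough")
        m[col], m[piv] = m[piv], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def least_squares_weights(sigmas: list[list[float]], rs: list[float]) -> list[float]:
    """Minimize sum_j (sigma_j^T W + r_j)^2 via the normal equations S^T S W = -S^T r."""
    n = len(sigmas[0])
    sts = [[sum(s[i] * s[j] for s in sigmas) for j in range(n)] for i in range(n)]
    rhs = [-sum(s[i] * rj for s, rj in zip(sigmas, rs)) for i in range(n)]
    return solve_linear(sts, rhs)


# Policy evaluation for a hypothetical scalar linear game: E_dot = a*E + b*U + p*d,
# fixed policies U = -K*E, d = L*E, basis phi(E) = [E**2, E**4].
a, b, p, q, r, gamma = -1.0, 1.0, 1.0, 1.0, 1.0, 2.0
K, L = 1.0, 0.1
a_cl = a - b * K + p * L
sigmas, rs = [], []
for E in [-1.0, -0.75, -0.5, -0.25, 0.25, 0.5, 0.75, 1.0]:
    e_dot = a_cl * E
    sigmas.append([2 * E * e_dot, 4 * E ** 3 * e_dot])  # grad_phi(E) * E_dot
    u, d = -K * E, L * E
    rs.append(q * E * E + r * u * u - gamma ** 2 * d * d)
w = least_squares_weights(sigmas, rs)
print(w)  # close to [c / (-2 * a_cl), 0] with c = q + r*K**2 - gamma**2 * L**2
```

The samples must excite every basis function; with a single state magnitude the normal equations become singular and the helper raises an error instead of returning meaningless weights.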
It can be shown that as $N \to \infty$, U and d converge to (43) and (44), respectively. When the control and uncertain customer demand strategies are computed in NN form, the ideal control strategy and uncertain customer demand strategy updated by (45) and (46) are implemented as (47) and (48), respectively:
Figure BDA0002519389590000143
Figure BDA0002519389590000144
where
Figure BDA0002519389590000145
denotes the current estimate of the ideal weight $W_c$ used in updating the control strategy, and
Figure BDA0002519389590000146
denotes the current estimate of the ideal weight $W_c$ used in implementing the uncertain customer demand strategy. The estimation errors of the actor network and the disturbance network are defined as (49) and (50):
Figure BDA0002519389590000147
Figure BDA0002519389590000148
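The NN policies (47) and (48) are images. Assuming they follow the stationarity form with the value gradient replaced by $\nabla\phi(E)^T \hat W$, i.e. $U = -\tfrac12 R^{-1} G^T \nabla\phi^T \hat W_a$ and $d = \tfrac{1}{2\gamma^2} P^T \nabla\phi^T \hat W_d$, a scalar version looks like this. The function names and numeric values are illustrative only.

```python
def actor_policy(w_a, grad_phi, g_val, r):
    """U = -1/(2r) * g * grad_phi^T W_a  (scalar E and U; assumed form)."""
    return -0.5 / r * g_val * sum(wi * gi for wi, gi in zip(w_a, grad_phi))


def disturbance_policy(w_d, grad_phi, p_val, gamma):
    """d = 1/(2 gamma^2) * p * grad_phi^T W_d  (scalar E and d; assumed form)."""
    return 0.5 / gamma ** 2 * p_val * sum(wi * gi for wi, gi in zip(w_d, grad_phi))


# Hypothetical values: basis phi(E) = [E**2], so grad_phi = [2E]. The actor and
# disturbance keep separate weight estimates; here they are set equal for simplicity.
E = 1.0
w_a = [0.43]
w_d = [0.43]
U = actor_policy(w_a, [2 * E], g_val=1.0, r=1.0)            # about -0.43
d = disturbance_policy(w_d, [2 * E], p_val=1.0, gamma=2.0)  # about 0.1075
```

Keeping separate estimates for the actor and the disturbance mirrors the separate current estimates described in the text, even though both target the same ideal weight $W_c$.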
Assumption 5: The ideal weight $W_c$ of the critic neural network has an upper bound $W_{\max} > 0$, i.e., $\|W_c\| \le W_{\max}$. The gradient of the activation function
Figure BDA0002519389590000149
and the gradient of the approximation error
Figure BDA00025193895900001410
are bounded, i.e.,
Figure BDA00025193895900001411
and
Figure BDA00025193895900001412
hold, where $\sigma_M > 0$ and $\varepsilon_M > 0$. In addition, the residual error $\varepsilon_{HJI}$ is bounded: there exists $\varepsilon_{HM} > 0$ such that $\|\varepsilon_{HJI}\| \le \varepsilon_{HM}$;
theorem 1 (on-line zero-sum game tuning law of supply chain system)
Considering the supply chain system constrained by the dynamic equation (17), using the evaluation neural network in (37), (47) and (48), the execution neural network and the interference neural network to approximate the value function of the supply chain system, control the input and uncertain customer requirements, and the optimization law of the given evaluation network, the execution neural network and the interference neural network ensures the convergence of the weight functions of the three neural networks and the stability of the supply chain system;
let the tuning law of the evaluation network be
Figure BDA0002519389590000151
Wherein
Figure BDA0002519389590000152
Suppose that
Figure BDA0002519389590000153
The continuous excitation condition is met; the tuning law of the execution network is designed as
Figure BDA0002519389590000154
Optimizing law of interference network
Figure BDA0002519389590000155
Wherein
Figure BDA0002519389590000156
Figure BDA0002519389590000157
Figure BDA0002519389590000158
$F_1 > 0$, $F_2 > 0$, $F_3 > 0$, and $F_4 > 0$ are tuning parameters specified in the proof, and
Figure BDA0002519389590000159
is a learning parameter. Then there exists $N_0$ such that, when the number of hidden-layer neurons satisfies $N > N_0$, the error state of the supply chain system, the evaluation neural network error
Figure BDA00025193895900001510
the execution neural network error
Figure BDA00025193895900001511
and the interference neural network error
Figure BDA00025193895900001512
are uniformly ultimately bounded. Moreover,
Figure BDA00025193895900001513
converges exponentially to the optimal evaluation neural network weight $W_c$ [25].
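The tuning laws in Theorem 1 appear only as images. As a rough illustration of the kind of synchronous update involved, the sketch below implements a normalized gradient-descent critic step on the approximate HJI residual, a common choice in online zero-sum policy iteration, together with the control and worst-case disturbance implied by a weight vector. It assumes an affine model $\dot{x} = f + g u + k d$, a utility $x^\top Q x + u^\top R u - \gamma^2 d^\top d$, and a value approximation $V(x) = W^\top \phi(x)$. The function names are hypothetical, and the execution and interference network laws (with their $F_1$ to $F_4$ terms) are deliberately not reproduced; this is not the patent's exact law.

```python
import numpy as np


def critic_step(W_c, sigma, r, a_c, dt):
    """One Euler step of a normalized gradient-descent critic update (hypothetical helper).

    W_c   : current critic weights, shape (N,)
    sigma : regressor grad_phi(x) @ (f + g u + k d), shape (N,)
    r     : utility x'Qx + u'Ru - gamma^2 d'd at the current sample
    a_c   : critic learning rate
    """
    e = W_c @ sigma + r                       # approximate Hamiltonian, i.e. HJI residual
    m = sigma / (sigma @ sigma + 1.0) ** 2    # normalized regressor keeps the step bounded
    return W_c - dt * a_c * m * e


def policies(W, grad_phi, g, k, R, gamma):
    """Control and worst-case disturbance implied by weights W (hypothetical helper).

    grad_phi : Jacobian of the activation vector phi(x), shape (N, n)
    g, k     : input and disturbance gain matrices, shapes (n, m) and (n, q)
    """
    grad_V = grad_phi.T @ W                   # gradient of V(x) = W' phi(x), shape (n,)
    u = -0.5 * np.linalg.solve(R, g.T @ grad_V)      # minimizing player
    d = (1.0 / (2.0 * gamma**2)) * (k.T @ grad_V)    # maximizing player (uncertain demand)
    return u, d
```

Repeating `critic_step` on a fixed regressor drives the residual toward zero; in an online scheme, `sigma` and `r` are recomputed from the measured state at every step, which is why a persistently exciting signal is needed.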
Numerical simulation:
A two-stage nonlinear cascade supply chain system is used to demonstrate the effectiveness of the method:
Figure BDA0002519389590000161
where $x = (x_1, x_2)^T$ and $\sigma(t): [0, +\infty) \to \mathcal{M} = \{1, 2, 3, 4\}$ is the switching signal,
Figure BDA0002519389590000162
The initial values are $x_1(0) = 0.1$ and $x_2(0) = 0$, and the reference signal is $y_d = 0.5\sin(t)$.
In the feedback differential game design, the activation function is chosen as
Figure BDA0002519389590000163
The initial weights of the execution and interference networks are chosen randomly in (0, 1), and the initial weight of the evaluation network is 1;
Figure BDA0002519389590000164
$R = I$, $a_c = a_a = a_d = 2$, 4; the tuning parameters are $F_1 = F_3 = 200\,[1, 1, 1]^T$ and $F_2 = F_4 = 20I$, where $I$ is an identity matrix of appropriate dimension.
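The stated simulation settings can be collected in one place as below. This is a configuration sketch only: the class and function names are hypothetical, the hidden-layer size of 3 is inferred from $F_1 = F_3 = 200\,[1,1,1]^T$, a scalar control input is assumed for $R = I$, reading $a_c, a_a, a_d$ as the learning rates of the three networks is an assumption, and "initial weight 1" is read as a vector of ones. The unlabeled value 4 in the text is omitted because its symbol is not recoverable.

```python
import math
from dataclasses import dataclass, field

import numpy as np

N_HIDDEN = 3  # assumption: inferred from F1 = F3 = 200*[1, 1, 1]^T


@dataclass
class SimParams:
    """Simulation settings from the text; entries marked 'assumption' are not stated there."""
    x0: np.ndarray = field(default_factory=lambda: np.array([0.1, 0.0]))  # x1(0) = 0.1, x2(0) = 0
    R: np.ndarray = field(default_factory=lambda: np.eye(1))  # R = I; assumption: scalar input
    a_c: float = 2.0  # assumption: evaluation-network learning rate
    a_a: float = 2.0  # assumption: execution-network learning rate
    a_d: float = 2.0  # assumption: interference-network learning rate
    F1: np.ndarray = field(default_factory=lambda: 200.0 * np.ones(N_HIDDEN))
    F3: np.ndarray = field(default_factory=lambda: 200.0 * np.ones(N_HIDDEN))
    F2: np.ndarray = field(default_factory=lambda: 20.0 * np.eye(N_HIDDEN))
    F4: np.ndarray = field(default_factory=lambda: 20.0 * np.eye(N_HIDDEN))
    W_c0: np.ndarray = field(default_factory=lambda: np.ones(N_HIDDEN))  # assumption: all ones

    def random_initial_weights(self, rng: np.random.Generator) -> np.ndarray:
        """Random initial weights for the execution or interference network, drawn from [0, 1)."""
        return rng.uniform(0.0, 1.0, size=N_HIDDEN)


def reference(t: float) -> float:
    """Reference signal y_d(t) = 0.5 sin(t)."""
    return 0.5 * math.sin(t)


def reference_dot(t: float) -> float:
    """Time derivative of y_d, the kind of term a backstepping feedforward controller uses."""
    return 0.5 * math.cos(t)
```

Keeping every tunable in a single dataclass makes it easy to rerun the comparison with a different disturbance model while holding all other settings fixed.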
The Lyapunov candidate function defined in Theorem 1 is
Figure BDA0002519389590000165
In addition, a small probing signal $n(t) = 0.1\sin^5(t)\cos(t) + 0.1\sin^5(2t)\cos(0.2t)$ is added to the controller during the first 4 s to ensure the persistent excitation condition.
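A minimal sketch of the probing signal follows, assuming the "5" that appears in the extracted formula is an exponent (i.e. $0.1\sin^5(t)\cos(t) + 0.1\sin^5(2t)\cos(0.2t)$) and that the signal is switched off at exactly $t = 4$ s. The function name and the `t_off` parameter are hypothetical.

```python
import math


def probing_noise(t: float, t_off: float = 4.0) -> float:
    """Probing signal n(t) injected into the control input during the first t_off seconds.

    Assumes the extracted '5' is an exponent: sin^5(t)cos(t) and sin^5(2t)cos(0.2t).
    """
    if t >= t_off:
        return 0.0  # the probe is only active during the first t_off seconds
    return 0.1 * math.sin(t) ** 5 * math.cos(t) + 0.1 * math.sin(2.0 * t) ** 5 * math.cos(0.2 * t)


# Usage: add the probe to a nominal control value at time t
u_nom, t = 0.3, 1.5
u_applied = u_nom + probing_noise(t)
```

Each term has magnitude at most 0.1, so the probe never exceeds 0.2 in absolute value and only perturbs the control slightly while it is active.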
The goal is to use the backstepping-based zero-sum differential game method to design the control input of a supply chain system with uncertain customer demand so that the system output $y$ tracks $y_d$ optimally while reducing the bullwhip effect. As described in Remark 1 above, the supply chain system is essentially a switched system; its switching signal is shown in Fig. 1.
The system output trajectory and the reference signal are shown in Fig. 2, and Fig. 3 shows that the error between the system output and the reference signal is confined to a small compact set. Thus, even under switching, the system output can track the reference signal, which illustrates the effectiveness of the proposed method: the bullwhip effect induced by worst-case demand can be reduced.
For comparison, a controller is also designed in the presence of a general disturbance. The trajectories of the system output and the reference signal are shown in Fig. 4, and the error between them is shown in Fig. 5. The controller designed in the presence of a general disturbance cannot guarantee convergence of the system state.
In conclusion, the invention models a nonlinear supply chain system with uncertain customer demand as a two-player zero-sum game problem, with the aim of reducing the bullwhip effect by a game-theoretic method. By combining the backstepping technique with the ADP technique, the weights of the evaluation, execution, and interference neural networks are updated synchronously online in real time to obtain the Nash equilibrium solution of the corresponding HJI equation, and the stability of the closed-loop system is proved by the Lyapunov method. In practice, the model of a supply chain system is not completely known; applying the proposed method to nonlinear supply chain systems without requiring prior knowledge of the state function is therefore of greater practical significance.
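For orientation, the two-player zero-sum game and HJI equation referred to above have the following standard form for an affine system, which is the form the backstepping feedforward step is meant to produce. The symbols $f$, $g$, $k$, $Q$, $R$ and the attenuation level $\gamma$ are assumptions of this sketch; the patent's own equations appear only as images and may use different notation or weights.

```latex
\dot{x} = f(x) + g(x)\,u + k(x)\,d, \qquad
V^*(x_0) = \min_{u}\max_{d} \int_{0}^{\infty} \left( x^\top Q x + u^\top R u - \gamma^2 d^\top d \right) dt

u^* = -\tfrac{1}{2} R^{-1} g^\top(x)\, \nabla V^*(x), \qquad
d^* = \tfrac{1}{2\gamma^2}\, k^\top(x)\, \nabla V^*(x)

0 = \nabla V^{*\top} f + x^\top Q x
    - \tfrac{1}{4}\, \nabla V^{*\top} g R^{-1} g^\top \nabla V^*
    + \tfrac{1}{4\gamma^2}\, \nabla V^{*\top} k k^\top \nabla V^*
```

The controller $u$ minimizes and the uncertain demand $d$ maximizes the same cost, which is the H∞ interpretation given in claim 2; substituting $u^*$ and $d^*$ into the Hamiltonian yields the HJI equation on the last line, and the evaluation, execution, and interference networks approximate $V^*$, $u^*$, and $d^*$ respectively.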
Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the described embodiments or make equivalent substitutions for some of their features; such modifications and substitutions do not depart from the spirit and scope of the invention.

Claims (5)

1. A zero-sum differential game processing method for a supply chain system based on a backstepping design method, characterized in that: the supply chain system consists of facilities and distribution entities and is driven by uncertain customer demand; by controlling material and information flows, it purchases raw materials, converts them into intermediate and finished products, and distributes the finished products to customers. An important problem in supply chain management is how to reduce the bullwhip effect, i.e. the amplification of demand variability as demand information propagates from downstream tiers to upstream tiers. The method comprises the following steps:
S1: first, model the nonlinear switched supply chain system with uncertain customer demand as a two-player zero-sum differential game problem, and suppress the bullwhip effect by a game-theoretic method;
S2: second, use a feedforward controller to convert the tracking problem of the strict-feedback system into an equivalent differential game problem for an affine system;
S3: next, to overcome the difficulty of obtaining an analytic solution of the Hamilton-Jacobi-Isaacs (HJI) equation, use adaptive dynamic programming (ADP) to study zero-sum differential game strategies: construct an evaluation network, an execution network, and an interference network that learn the value function of the HJI equation online in real time, and construct the control strategy and the interference strategy; this game algorithm is called synchronous zero-sum game policy iteration, and the Lyapunov method is used to prove the convergence of the backstepping-based differential game and the stability of the closed-loop system;
S4: finally, verify the effectiveness of the method through simulation results.
2. The zero-sum differential game processing method for a supply chain system based on a backstepping design method according to claim 1, characterized in that: the bullwhip effect of the supply chain system in step S1 is usually treated as an H∞ control problem; from a game-theoretic perspective, designing the H∞ controller is equivalent to a two-player zero-sum game, in which the controller minimizes the performance index under the worst-case (maximizing) disturbance, thereby achieving optimal control.
3. The zero-sum differential game processing method for a supply chain system based on a backstepping design method according to claim 1, characterized in that: in step S2, a feedforward controller is designed by the backstepping method so that the tracking problem of the supply chain system in strict-feedback form is converted into an optimal regulation problem in affine form.
4. The zero-sum differential game processing method for a supply chain system based on a backstepping design method according to claim 1, characterized in that: step S3 uses ADP-based policy iteration, in which ADP employs three neural networks: the evaluation network, the execution network, and the interference network approximate, respectively, the value function, the control policy, and the uncertain customer demand policy during the iteration, finally yielding an approximate solution of the HJI equation of the nonlinear supply chain system.
5. The zero-sum differential game processing method for a supply chain system based on a backstepping design method according to claim 1, characterized in that: the goal in step S4 is to design, by the backstepping-based zero-sum differential game method, the control input of the supply chain system with uncertain customer demand so that the system output tracks the reference signal optimally while reducing the bullwhip effect; the error between the system output and the reference signal is confined to a small compact set, which illustrates the effectiveness of the proposed method, and under switching conditions the system output can track the reference signal; for comparison, a controller is also designed in the presence of a general disturbance.
CN202010486432.3A 2020-06-01 2020-06-01 Zero and differential game processing method for supply chain system based on reverse-thrust design method Active CN111624882B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010486432.3A CN111624882B (en) 2020-06-01 2020-06-01 Zero and differential game processing method for supply chain system based on reverse-thrust design method

Publications (2)

Publication Number Publication Date
CN111624882A true CN111624882A (en) 2020-09-04
CN111624882B CN111624882B (en) 2023-04-18

Family

ID=72272015

Country Status (1)

Country Link
CN (1) CN111624882B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110083063A (en) * 2019-04-29 2019-08-02 辽宁石油化工大学 A kind of multiple body optimal control methods based on non-strategy Q study

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
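JINGLIANG SUN et al.: "Distributed zero-sum differential game for multi-agent systems in strict-feedback form with input saturation and output constraint", Neural Networks *
周海英: "H∞ control of discrete-time Markov jump systems based on stochastic differential games", Journal of Guangzhou Maritime University *
弓镇宇 et al.: "H∞ consensus of multi-agent systems based on the zero-sum game method", Henan Science *
杨雪静 et al.: "Tracking control of cascaded nonlinear systems based on zero-sum games", Journal of Beijing Information Science and Technology University (Natural Science Edition) *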

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114003050A (en) * 2021-09-30 2022-02-01 南京航空航天大学 Active defense guidance method of three-body countermeasure strategy based on differential game
CN114003050B (en) * 2021-09-30 2023-10-31 南京航空航天大学 Active defense guidance method of three-body countermeasure strategy based on differential game
CN114760101A (en) * 2022-03-18 2022-07-15 北京信息科技大学 Product and supply chain cooperative evolution system compensation method and system under network attack

Also Published As

Publication number Publication date
CN111624882B (en) 2023-04-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant