CN111624882A

CN111624882A - Zero-sum differential game processing method for supply chain system based on backstepping design method

Info

Publication number: CN111624882A
Application number: CN202010486432.3A
Authority: CN
Inventors: 李庆奎; 杨雪静; 易军凯
Original assignee: Beijing Information Science and Technology University
Current assignee: Beijing Information Science and Technology University
Priority date: 2020-06-01
Filing date: 2020-06-01
Publication date: 2020-09-04
Anticipated expiration: 2040-06-01
Also published as: CN111624882B

Abstract

The invention relates to a zero-sum differential game processing method for a supply chain system based on the backstepping design method, comprising the following steps. S1: a nonlinear switched supply chain system with uncertain customer demand is modeled as a two-player zero-sum differential game, so that the bullwhip effect is suppressed by a game-theoretic method. S2: a feedforward controller is used to convert the tracking problem of the strict-feedback system into an equivalent differential game problem for an affine system. S3: the zero-sum differential game strategy is obtained with adaptive dynamic programming, and the convergence of the backstepping-based differential game and the stability of the closed-loop system are proved by the Lyapunov method. S4: the effectiveness of the method is verified by simulation results. The method suits the practical situation in which the model of the supply chain system is not completely known, and is therefore of practical significance for nonlinear supply chain systems whose state functions are not known in advance.

Description

Zero-sum differential game processing method for supply chain system based on backstepping design method

Technical Field

The invention relates to the technical field of supply chain systems, in particular to a zero-sum differential game processing method for a supply chain system based on the backstepping design method.

Background

An important problem in supply chain management is weakening the bullwhip effect, that is, the amplification of demand variability as demand information is passed from downstream to upstream. Over the past decades, great effort has been devoted to coping with this adverse effect.
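The amplification described above can be illustrated with a small sketch. The ordering rule below (each stage orders the incoming demand plus a fraction `k` of its latest change) and all numbers are assumptions chosen for illustration; they are not taken from the patent.

```python
import random
import statistics


def upstream_orders(demand, k):
    """Orders placed upstream when a stage over-reacts to demand changes.

    Hypothetical ordering rule (not from the patent): order the current
    demand plus k times the latest change in demand.
    """
    orders = [demand[0]]
    for prev, cur in zip(demand, demand[1:]):
        orders.append(cur + k * (cur - prev))
    return orders


random.seed(0)
# Customer demand: independent noise around a mean of 100 units.
customer = [100 + random.gauss(0, 5) for _ in range(2000)]

# Three stages, each reacting to the orders of the stage below it.
retailer = upstream_orders(customer, k=0.5)
wholesaler = upstream_orders(retailer, k=0.5)
factory = upstream_orders(wholesaler, k=0.5)

variances = [statistics.variance(s) for s in (customer, retailer, wholesaler, factory)]
ratios = [v / variances[0] for v in variances]
print([round(r, 2) for r in ratios])  # variance relative to customer demand, per stage
```

Under this rule the order variance grows at every upstream stage, which is the behavior a controller in this setting is meant to damp.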

In a complex dynamic supply chain system, nonlinearity is ubiquitous in the physical process, yet it is often ignored or treated as depending on a slowly changing environment. Modeling the dynamic supply chain system as a cascade nonlinear system captures these nonlinear characteristics naturally. It is well known that structured recursive design methods such as backstepping are powerful tools for handling uncertain nonlinearities, and they avoid unnecessary cancellations. However, current research on dynamic supply chain systems often ignores certain nonlinear factors, so that an idealized model is obtained only after linearization. Because of the cascade (triangular) structure of the nonlinear supply chain system, a linear model can hardly describe it, and a structured method such as backstepping is needed. For complex dynamic supply chain systems with uncertain customer demand, and especially when a game-theoretic method is used to suppress the bullwhip effect, applying backstepping is difficult and challenging. For this reason, a zero-sum differential game processing method for a supply chain system based on the backstepping design method is proposed.

Disclosure of Invention

The invention aims to overcome the deficiencies of the prior art and provides a zero-sum differential game processing method for a supply chain system based on the backstepping design method.

In order to achieve the purpose, the invention provides the following technical scheme:

A zero-sum differential game processing method for a supply chain system based on the backstepping design method, characterized in that the supply chain system is a cascade nonlinear system composed of equipment and distribution entities and driven by the uncertainty of customer demand. Through control of material flow and information flow, the system purchases raw materials, converts them into intermediate and finished products, and distributes the finished products to customers. An important problem in supply chain management is how to weaken the bullwhip effect, namely the amplification of demand variability as demand information passes from the downstream layer to the upstream layer. The method comprises the following specific steps:

S1: first, a nonlinear switched supply chain system with uncertain customer demand is modeled as a two-player zero-sum differential game, and the bullwhip effect is suppressed by a game-theoretic method;

S2: second, a feedforward controller is used to convert the tracking problem of the strict-feedback system into an equivalent differential game problem for an affine system;

S3: next, to overcome the difficulty that the Hamilton-Jacobi-Isaacs (HJI) equation rarely admits an analytic solution, the zero-sum differential game strategy is obtained by the adaptive dynamic programming (ADP) technique: an evaluation network, an execution network, and an interference network are constructed to learn the value function of the HJI equation online in real time and to construct the control strategy and the interference strategy; this game algorithm is called synchronous zero-sum game policy iteration, and the convergence of the backstepping-based differential game and the stability of the closed-loop system are proved by the Lyapunov method;

S4: finally, the effectiveness of the method is verified by simulation results.

Preferably, the bullwhip effect problem of the supply chain system in step S1 is usually treated as an H∞ control problem. From the perspective of game theory, designing the H∞ controller is equivalent to a two-player zero-sum game: the controller minimizes the performance index under the worst-case disturbance, thereby achieving optimal control.

Preferably, in step S2, the feedforward controller is designed by the backstepping method, so that the tracking problem of the supply chain system in strict-feedback form is converted into an optimal regulation problem in affine form.

Preferably, in step S2, based on ADP policy iteration, three neural networks (an evaluation network, an execution network, and an interference network) approximate, respectively, the value function, the control strategy, and the uncertain customer demand strategy during the iteration, and an approximate solution of the HJI equation of the nonlinear supply chain system is finally obtained.

Preferably, the goal in step S4 is to design, by the backstepping-based zero-sum differential game method, the control input of the supply chain system with uncertain customer demand, so that the system output tracks the reference signal in an optimal way while the bullwhip effect is reduced. The error between the system output and the reference signal is confined to a small compact set, and the system output can track the reference signal under switching conditions, which illustrates the effectiveness of the proposed method. For comparison, a controller is also designed in the presence of general disturbances.

Compared with the prior art, the invention has the following beneficial effects: the invention models a nonlinear supply chain system with uncertain customer demand as a two-player zero-sum game and reduces the bullwhip effect by a game-theoretic method; by combining the backstepping technique with the ADP technique, the weights of the evaluation, execution, and interference neural networks are updated synchronously online in real time to obtain the Nash equilibrium solution of the corresponding HJI equation; the stability of the closed-loop system is proved by the Lyapunov method; and, because the model of a real supply chain system is never completely known, the proposed method is of practical value for nonlinear supply chain systems whose state functions are not known in advance.

Drawings

FIG. 1 is a schematic diagram of a switching signal of the system of the present invention;

FIG. 2 shows the system output y(t) of the present invention tracking the reference signal y_d(t);

FIG. 3 is a schematic diagram of the tracking error y(t) - y_d(t) of the present invention;

FIG. 4 shows the system output y(t) of the controller designed under general disturbance tracking the reference signal y_d(t);

FIG. 5 is a schematic diagram of the tracking error y(t) - y_d(t) of the controller designed under general disturbance.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to fig. 1-5, the present invention provides a technical solution:

Example:

Preliminaries:

A supply chain system with uncertain customer demand is modeled as a cascade switched nonlinear system with uncertain disturbances. Assume that the supply chain system consists of n devices, each providing raw material to the next device. At time t, the inventory level of the k-th device and the uncertain customer demand are denoted by x_k(t) and d_k(t), respectively. The dynamic model of the k-th device in the supply chain system is

Wherein

denotes the inventory vector of devices 1 to k, with 1 ≤ k ≤ n-1; σ(t): [0, +∞) → M = {0, 1, …, M} is the switching signal, where σ(t) = i means that the i-th subsystem is active; and

represents the quantity of goods received by the k-th device from the (k+1)-th subsystem;

and

are known continuous, smooth nonlinear functions, and d_k ∈ L₂[0, ∞) is unknown but bounded. In the same way, the n-th device is modeled as

where u is the control input;

Note 1: Because the supply chain system is driven by uncertain customer demand, the system coefficients differ between operating conditions; because of different physical characteristics, the transfer rate functions, which represent storage capacity and operational capacity, also differ. The supply chain system described by the two models above is therefore essentially a switched system;

The cascade switched nonlinear system that models uncertain customer demand is as follows:

where y is the output of the system;
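The model equations themselves are not reproduced in this text. As an orientation aid only, a switched strict-feedback system consistent with the surrounding definitions would typically take the following form; the function names f, g, p, the stacked state notation, and the choice y = x_1 are assumptions and may differ from the patent's own equations.

```latex
\begin{aligned}
\dot{x}_k &= f_{k,\sigma(t)}(\bar{x}_k) + g_{k,\sigma(t)}(\bar{x}_k)\,x_{k+1}
             + p_{k,\sigma(t)}(\bar{x}_k)\,d_k, \qquad 1 \le k \le n-1,\\
\dot{x}_n &= f_{n,\sigma(t)}(\bar{x}_n) + g_{n,\sigma(t)}(\bar{x}_n)\,u
             + p_{n,\sigma(t)}(\bar{x}_n)\,d_n,\\
y &= x_1, \qquad \bar{x}_k = [x_1,\dots,x_k]^{T}.
\end{aligned}
```

In this reading, the term multiplying x_{k+1} plays the role of the transfer rate from device k+1 to device k, and the term multiplying d_k carries the uncertain customer demand into the inventory dynamics.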

Assumption 1: The maximum inventory of the k-th device in the supply chain is c_k, and each device in the supply chain satisfies 0 < x_k(t) < c_k (1 ≤ k ≤ n);

Assumption 2: The functions

and

are bounded above and below, that is,

and

where g_kmin, g_kmax, p_kmin, and p_kmax are positive constants. Because

represents the transfer rate from the (k+1)-th device to the k-th device, it is further assumed, without loss of generality, that

Assumption 3: All states of the system are observable;

The control objective is to design the control input of the supply chain system such that the output y tracks y_d along an optimal trajectory, while the bullwhip effect is suppressed and all signals of the cascade switched nonlinear system formed by the supply chain system remain bounded.

Tracking control problem of a strict feedback system:

In this section, to guarantee that the tracking error e_k = x_k - x_kd is regulated, a feedforward controller is designed by the backstepping method, so that the tracking problem of the supply chain system in strict-feedback form is converted into an optimal regulation problem in affine form. The backstepping design proceeds as follows:

Step 1: Differentiating e₁ = x₁ - x_1d gives

where the virtual control input x_2d satisfies

The feedback optimal control input

will be designed in the next section, and the feedforward virtual control input

is obtained by solving

The Lyapunov candidate function is defined as

Its derivative with respect to t is

Step k (2 ≤ k ≤ n-1): Differentiating e_k = x_k - x_kd gives

where the virtual control input x_(k+1)d satisfies

The feedback optimal control input

will be designed below, and the feedforward virtual control input

satisfies

The Lyapunov candidate function is defined as

Differentiating V_k gives

Step n: Likewise, the derivative of e_n = x_n - x_nd is

where the virtual control input u_d satisfies

The feedback optimal control input

will be designed in the next section, and the feedforward virtual control input

satisfies

The Lyapunov candidate function is now defined as

Differentiating V_n gives

Further, define

Then the formula can be rewritten as

Wherein

d＝[d₁,…,d_n]^T；

As mentioned before, we obtain

where the feedforward virtual control inputs are given by the formulas above, and the feedback optimal control input

and the uncertain customer demand d = [d₁,…,d_n]^T are determined through differential game theory;

Note 2: From the above, the feedforward controller U^a alone cannot guarantee the stability of the entire supply chain system; it is therefore necessary to design a differential game strategy to stabilize the affine system.
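The recursive structure of the steps above can be sketched on a toy problem. The following is a minimal illustration of plain backstepping, assuming a hypothetical two-state plant (x1' = -x1 + x2, x2' = u) and simple linear stabilizing terms with gains k1 and k2 in place of the feedback optimal control that the patent obtains from the differential game. Only the reference signal 0.5 sin(t) and the initial state (0.1, 0) are taken from the simulation section.

```python
import math


def simulate_backstepping(k1=2.0, k2=2.0, dt=1e-3, t_end=10.0):
    """Track y_d(t) = 0.5 sin(t) with a two-step backstepping controller.

    Hypothetical plant (not the patent's model):
        x1' = -x1 + x2,   x2' = u,   y = x1.
    """
    x1, x2, t = 0.1, 0.0, 0.0
    errors = []
    for _ in range(round(t_end / dt)):
        yd = 0.5 * math.sin(t)
        yd_dot = 0.5 * math.cos(t)
        yd_ddot = -0.5 * math.sin(t)

        # Step 1: tracking error e1 and virtual control alpha (the desired x2).
        e1 = x1 - yd
        alpha = x1 + yd_dot - k1 * e1          # yields e1' = -k1*e1 + e2

        # Step 2: error e2 and the actual control input u.
        e2 = x2 - alpha
        e1_dot = -k1 * e1 + e2
        alpha_dot = (-x1 + x2) + yd_ddot - k1 * e1_dot
        u = alpha_dot - e1 - k2 * e2           # yields V' = -k1*e1^2 - k2*e2^2

        # Forward-Euler integration of the plant.
        x1 += dt * (-x1 + x2)
        x2 += dt * u
        t += dt
        errors.append(abs(e1))
    return errors


errors = simulate_backstepping()
print(f"initial |e1| = {errors[0]:.3f}, final |e1| = {errors[-1]:.2e}")
```

With V = (e1² + e2²)/2 as the Lyapunov candidate, the chosen u makes the derivative of V negative definite, which mirrors how each step above adds one error term to the candidate function.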

Designing a differential game strategy:

The bullwhip effect problem of a supply chain system is usually treated as an H∞ control problem. From the perspective of game theory, designing the H∞ controller is equivalent to a two-player zero-sum game: the controller minimizes the performance index under the worst-case disturbance, thereby achieving optimal control. The bullwhip effect problem of a nonlinear supply chain system can therefore be solved by a game method. During the game, an online policy iteration method with three neural networks (evaluation, execution, and interference) is used to solve the HJI equation arising from the nonlinear zero-sum differential game;

Zero-sum differential game:

we describe the system as follows:

wherein

X＝[x₁,…,x_n]^T

The goal of this problem is to design the control input U such that, for a given γ > 0,

where Q(E) ≥ 0, R = R^T > 0, and d ∈ L₂[0,∞);

Assumption 4: For the selected γ > 0, there exists a control input U such that the system is asymptotically stable and has an L₂ gain no greater than γ;

Note 3: Assumption 4 guarantees that a solution of the nonlinear H∞ control problem exists, that is, the bullwhip effect problem of the supply chain system is solvable;

The performance index is given by

The H∞ control problem formed by the control of the supply chain system and the uncertain customer demand can be viewed as a two-player zero-sum game. The value function of a policy is defined as

It is constrained by the dynamic equation. The goal is to find a Nash equilibrium (U^*, d^*) such that the control input U^* of the supply chain system minimizes the performance index while the uncertain customer demand d^* maximizes it;
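The L₂-gain condition, performance index, and saddle-point condition are not reproduced in this text. A reading consistent with the running cost that appears in the Hamiltonian (21) is sketched below; it is the standard H∞ formulation and is offered as an assumption, not as the patent's exact equations.

```latex
\begin{aligned}
&\int_{0}^{\infty}\bigl(Q(E) + U^{T} R\,U\bigr)\,dt \;\le\; \gamma^{2}\int_{0}^{\infty}\lVert d\rVert^{2}\,dt
&&\text{($L_2$ gain no greater than $\gamma$)}\\
&V\bigl(E(t)\bigr) = \int_{t}^{\infty}\bigl(Q(E) + U^{T} R\,U - \gamma^{2}\lVert d\rVert^{2}\bigr)\,d\tau
&&\text{(value of the policy pair $(U,d)$)}\\
&J(U^{*},d) \;\le\; J(U^{*},d^{*}) \;\le\; J(U,d^{*})
&&\text{(saddle point, i.e. Nash equilibrium)}
\end{aligned}
```

The two inequalities in the last line state that neither player can improve its own outcome by deviating alone: the controller cannot lower the index, and the demand cannot raise it.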

The Hamiltonian associated with an admissible control input U and an uncertain customer demand input d of the supply chain system is defined as

H(E,U,d) = Q(E) + U^T R U - γ²‖d‖² + (∇V(E))^T (F_i(E) + G_i(X)U + P_i(X)d) + ε_H = 0 (21)

where ∇V(E) is the gradient of V(E) with respect to E. The optimal value function V^*(E) is defined as

If a saddle point of the game exists, the two-player optimal control problem of the supply chain system has a unique solution, that is, the Nash equilibrium condition holds:

From the stationarity conditions

the optimal control pair of the supply chain system is obtained as

Substituting the optimal pair (24) into the Hamiltonian gives the HJI equation of the supply chain system:

V^*(0)＝0 (25)
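For reference, the stationarity conditions can be worked out directly from the Hamiltonian (21). The derivation below neglects any residual term in (21) and is a sketch under that assumption; the patent's own equations are not reproduced in this text and may be written differently.

```latex
\begin{aligned}
\frac{\partial H}{\partial U} = 2RU + G_i(X)^{T}\nabla V^{*}(E) = 0
  &\;\Rightarrow\; U^{*} = -\tfrac{1}{2}R^{-1}G_i(X)^{T}\nabla V^{*}(E),\\
\frac{\partial H}{\partial d} = -2\gamma^{2}d + P_i(X)^{T}\nabla V^{*}(E) = 0
  &\;\Rightarrow\; d^{*} = \tfrac{1}{2\gamma^{2}}P_i(X)^{T}\nabla V^{*}(E),\\
0 = Q(E) + (\nabla V^{*})^{T}F_i(E)
  &- \tfrac{1}{4}(\nabla V^{*})^{T}G_i R^{-1}G_i^{T}\nabla V^{*}
   + \tfrac{1}{4\gamma^{2}}(\nabla V^{*})^{T}P_i P_i^{T}\nabla V^{*}.
\end{aligned}
```

The last line follows by substituting U^* and d^* back into the Hamiltonian: the control terms combine to -1/4 and the demand terms to +1/(4γ²) of the respective quadratic forms.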

To obtain the saddle point solution of the differential game, the HJI equation of the supply chain system must be solved. For a nonlinear system the HJI equation is a partial differential equation whose analytic solution is difficult to obtain, so the ADP method is adopted to solve it;
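For contrast, a scalar linear case does admit a closed-form solution, which makes the saddle-point property easy to check numerically. The sketch below assumes a hypothetical error system e' = a*e + b*u + p*d with quadratic value V(e) = P*e²; the system and all parameter values are illustrative and are not the patent's model.

```python
import math


def solve_scalar_game(a, b, p, q, r, gamma):
    """Closed-form saddle point for a scalar linear zero-sum game.

    Hypothetical system (not from the patent): e' = a*e + b*u + p*d with
    running cost q*e^2 + r*u^2 - gamma^2*d^2 and value V(e) = P*e^2.
    """
    s = b * b / r - p * p / (gamma * gamma)
    if s <= 0:
        raise ValueError("gamma is too small for this closed-form solution")
    P = (a + math.sqrt(a * a + q * s)) / s   # positive root of the game Riccati equation
    k_u = b * P / r                          # u* = -k_u * e
    k_d = p * P / (gamma * gamma)            # d* = +k_d * e
    return P, k_u, k_d


def hamiltonian(e, u, d, P, a, b, p, q, r, gamma):
    """H = q e^2 + r u^2 - gamma^2 d^2 + V'(e) * (a e + b u + p d), with V' = 2 P e."""
    return q * e * e + r * u * u - gamma**2 * d * d + 2 * P * e * (a * e + b * u + p * d)


params = dict(a=1.0, b=1.0, p=0.5, q=1.0, r=1.0, gamma=2.0)
P, k_u, k_d = solve_scalar_game(**params)
e = 0.7
h_star = hamiltonian(e, -k_u * e, k_d * e, P, **params)
h_other_u = hamiltonian(e, -k_u * e + 0.1, k_d * e, P, **params)   # controller deviates
h_other_d = hamiltonian(e, -k_u * e, k_d * e + 0.1, P, **params)   # demand deviates
print(P, h_star, h_other_u, h_other_d)
```

At the saddle point the Hamiltonian vanishes; a deviation by the controller raises it and a deviation by the demand lowers it. For a nonlinear system no such closed form is available, which is the motivation for the neural-network approximation.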

ADP-based strategy iteration:

ADP uses three neural networks (an evaluation network, an execution network, and an interference network) to approximate, respectively, the value function, the control strategy, and the uncertain customer demand strategy during the iteration, and finally yields an approximate solution of the HJI equation of the nonlinear supply chain system. Before ADP is applied to solve the HJI equation, the following lemma is given. Lemma 1: Consider the error dynamics system (17) with the value function (19) and the differential game strategy (24). Let J(E) be a continuously differentiable and radially unbounded Lyapunov candidate function such that

where ∇J_E(E) is the gradient of J_E(E) with respect to E. Let Λ(E) be a positive definite matrix such that Λ(E) = 0 when E = 0 and, for any E ≠ 0,

Further, Λ(E) satisfies

and

Then the following relationship holds:

Note 4: For the error dynamics system (17) with the control and disturbance strategies (24), assume that

is a function of the system state. In particular, assume that

and

Then the inequality

holds. Since

(∇J_E(E))^T (F_i(E) + G_i(X)U^* + P_i(X)d^*) < 0, Lemma 1 is easily seen to be reasonable. In practice, the function can be obtained by properly selecting a quadratic polynomial

By the Weierstrass higher-order approximation theorem, there exists a complete independent basis set

such that the value function V(E) and its gradient are uniformly approximated, that is, there exist coefficients c_i such that

hold, where

and the second terms in equations (28) and (29) converge uniformly to zero as N → ∞;

To implement the differential game strategy (24), the optimal value function is approximated by a neural network such that

where

and

denote the ideal weight and the activation function of the evaluation neural network, respectively, and ε_c(E) and L denote the approximation error and the number of neurons, respectively. The gradient of equation (30) can be written as

Under a fixed control strategy U and a fixed uncertain customer demand strategy d, substituting the neural network approximation yields (32)

The residual error is

According to (24), the feedback optimal control and the worst uncertain customer demand are rewritten as

The HJI equation at this time is

The approximation error generated by the value function is

However, the ideal weight W_c is unknown, so the differential game strategy (24) cannot be obtained directly. To address this, the value function is approximated with the estimated weight

so that

The Hamiltonian becomes

Clearly, the goal is to adjust the estimated weight

of the approximate Hamiltonian

so that the estimated weight

converges to the ideal weight W_c, that is, to design an update law for

that minimizes the mean square residual

The tuning law of the neural network, designed by the gradient descent method, is given in (40)

where

and a_c > 0 is a design parameter;
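The idea of tuning the evaluation network by gradient descent on the squared Hamiltonian residual can be sketched in discrete time. The example below assumes a one-term value approximation V(e) = w*e², a hypothetical scalar error system under fixed policies, and a normalized step of the kind commonly used in synchronous policy iteration; equation (40) itself is not reproduced in this text, so this is not claimed to be its exact form.

```python
def critic_step(w, e, e_dot, running_cost, a_c):
    """One normalized gradient-descent step on the squared Hamiltonian residual.

    Assumes a one-term value approximation V(e) = w * e^2 (hypothetical basis).
    """
    sigma = 2.0 * e * e_dot                 # time derivative of the basis function e^2
    residual = w * sigma + running_cost     # approximate Hamiltonian
    return w - a_c * sigma / (1.0 + sigma * sigma) ** 2 * residual


# Hypothetical scalar error system e' = a*e + b*u + p*d under fixed policies
# u = -k_u*e (control) and d = k_d*e (demand); none of these numbers are from the patent.
a, b, p = 1.0, 1.0, 0.5
q, r, gamma = 1.0, 1.0, 2.0
k_u, k_d = 3.0, 0.2
a_cl = a - b * k_u + p * k_d

w = 0.0
samples = [0.2, 0.5, 1.0, -0.8, -0.3]       # varied states stand in for persistent excitation
for i in range(2000):
    e = samples[i % len(samples)]
    u, d = -k_u * e, k_d * e
    cost = q * e * e + r * u * u - gamma**2 * d * d
    w = critic_step(w, e, a_cl * e, cost, a_c=2.0)

# Exact weight for these fixed policies, from setting the Hamiltonian to zero.
w_exact = -(q + r * k_u**2 - gamma**2 * k_d**2) / (2.0 * a_cl)
print(w, w_exact)
```

The estimate converges to the weight that zeroes the Hamiltonian for the fixed policies. The need for varied state samples reflects the persistent excitation requirement: with the state at zero the residual carries no information about the weight.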

The weight estimation error is

Therefore, according to (35), (38) and (40), the estimation error dynamics of the evaluation network are

According to the standard policy iteration algorithm, once a solution of the Hamiltonian equation (32) is given, the execution network and the interference network are updated as shown in (43) and (44)

where c_i is unknown;

The solution W_c of (32) is obtained by the least squares method, and the control strategy and the uncertain demand strategy are defined as shown in (45) and (46);

It can be shown that U and d in (45) and (46) converge to (43) and (44), respectively, as N → ∞. When the control and uncertain customer demand strategies are computed in neural network form, the ideal strategies (45) and (46) are replaced by (47) and (48), respectively;

where

denotes the current estimate of the ideal weight W_c when the control strategy is updated, and

denotes the current estimate of the ideal weight W_c when the uncertain customer demand strategy is updated. The estimation errors of the execution neural network and the interference neural network are defined as shown in (49) and (50);
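How the execution and interference networks turn weight estimates into strategies can be sketched for a scalar error state. The structure below follows the stationarity conditions of the Hamiltonian (21); the basis [e², e⁴], the weight values, and the scalar gains g and p are assumptions for illustration, and equations (47) and (48) themselves are not reproduced in this text.

```python
def basis_gradient(e):
    """Gradient of the hypothetical basis [e^2, e^4] with respect to e."""
    return [2.0 * e, 4.0 * e**3]


def control_policy(e, w_a, g, r):
    """u = -(1/2) * r^-1 * g * (grad_phi . w_a) for a scalar error state."""
    grad_v = sum(gi * wi for gi, wi in zip(basis_gradient(e), w_a))
    return -0.5 * g * grad_v / r


def demand_policy(e, w_d, p, gamma):
    """d = 1/(2*gamma^2) * p * (grad_phi . w_d), the worst-case demand."""
    grad_v = sum(gi * wi for gi, wi in zip(basis_gradient(e), w_d))
    return 0.5 * p * grad_v / gamma**2


w_a = [2.0, 0.1]   # execution-network weights (made-up values)
w_d = [2.0, 0.1]   # interference-network weights (made-up values)
u = control_policy(0.5, w_a, g=1.0, r=1.0)
d = demand_policy(0.5, w_d, p=0.5, gamma=2.0)
print(u, d)
```

Keeping separate weight vectors for the two strategies lets each network be tuned by its own law while both track the weight of the evaluation network.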

Assumption 5: The ideal weight W_c of the evaluation neural network has an upper bound W_max > 0 such that ‖W_c‖ ≤ W_max. The gradient of the activation function

and the gradient of the approximation error

are both bounded, so that

and

hold, where σ_M > 0 and ε_M > 0. In addition, the residual error ε_HJI is also bounded: there exists ε_HM > 0 such that ‖ε_HJI‖ ≤ ε_HM;

Theorem 1 (online zero-sum game tuning law for the supply chain system):

Consider the supply chain system constrained by the dynamic equation (17), and use the evaluation, execution, and interference neural networks in (37), (47) and (48) to approximate the value function, the control input, and the uncertain customer demand of the supply chain system. The tuning laws given below for the three networks ensure the convergence of their weights and the stability of the supply chain system;

Let the tuning law of the evaluation network be

where

Assume that

satisfies the persistent excitation condition. The tuning law of the execution network is designed as

and the tuning law of the interference network as

where

F₁ > 0, F₂ > 0, F₃ > 0, and F₄ > 0 are tuning parameters specified in the proof, and

are learning parameters. Then there exists N₀ such that, when the number of hidden-layer neurons satisfies N > N₀, the error state of the supply chain system, the evaluation neural network error

the execution neural network error

and the interference neural network error

are uniformly ultimately bounded. Moreover,

converges exponentially to the optimal evaluation neural network weight W_c ([25]).

Numerical simulation:

A two-stage nonlinear cascade supply chain system is considered to demonstrate the effectiveness of the method;

where x = (x₁, x₂)^T and σ(t): [0,+∞) → M = {1, 2, 3, 4},

The initial values are x₁(0) = 0.1 and x₂(0) = 0, and the reference signal is y_d = 0.5sin(t);

In the feedback differential game design, the activation function is chosen as

The initial weights of the execution network and the interference network are chosen randomly in (0, 1), and the initial weight of the evaluation network is 1

R = I, a_c = a_a = a_d = 2, 4; the tuning parameters are F₁ = F₃ = 200*[1,1,1]^T and F₂ = F₄ = 20I, where I is the identity matrix of appropriate dimensions;

The Lyapunov candidate function defined in Theorem 1 is

In addition, a small probing signal n(t) = 0.1sin⁵(t)cos(t) + 0.1sin⁵(2t)cos(0.2t) is added to the controller during the first 4 seconds to ensure the persistent excitation condition.
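The probing signal can be written as a small helper. The sketch below reads the garbled exponents as fifth powers of the sine terms and treats the 4-second window as half-open; both readings are assumptions, and the function name is hypothetical.

```python
import math


def probing_signal(t, duration=4.0):
    """Small exploration signal added to the controller for the first `duration` seconds."""
    if not 0.0 <= t < duration:
        return 0.0
    return (0.1 * math.sin(t) ** 5 * math.cos(t)
            + 0.1 * math.sin(2.0 * t) ** 5 * math.cos(0.2 * t))


samples = [probing_signal(0.01 * i) for i in range(600)]   # 0 s to 6 s
print(max(abs(s) for s in samples))
```

Because each term is bounded by 0.1 in magnitude, the signal perturbs the control input only slightly while still exciting the state enough for the weights to be identified.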

The goal is to design, by the backstepping-based zero-sum differential game method, the control input of a supply chain system with uncertain customer demand, so that the system output y tracks y_d in an optimal manner while the bullwhip effect is reduced. As stated in Note 1 above, the supply chain system is essentially a switched system, and its switching signal is shown in FIG. 1;

The system output trajectory and the reference signal are shown in FIG. 2. FIG. 3 shows that the error between the system output and the reference signal is confined to a small compact set. The figures also show that the system output can track the reference signal under switching conditions, which illustrates the effectiveness of the proposed method: the bullwhip effect induced by worst-case demand can be reduced;

For comparison, a controller is designed in the presence of general disturbances. The trajectories of the system output and the reference signal are shown in FIG. 4, and the error between them is shown in FIG. 5. The controller designed under general disturbances cannot guarantee convergence of the system state.

In conclusion, the invention models a nonlinear supply chain system with uncertain customer demand as a two-player zero-sum game and reduces the bullwhip effect by a game-theoretic method; by combining the backstepping technique with the ADP technique, the weights of the evaluation, execution, and interference neural networks are updated synchronously online in real time to obtain the Nash equilibrium solution of the corresponding HJI equation; the stability of the closed-loop system is proved by the Lyapunov method; and, because the model of a real supply chain system is never completely known, the proposed method is of practical value for nonlinear supply chain systems whose state functions are not known in advance.

Although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that various changes in the embodiments and/or modifications of the invention can be made, and equivalents and modifications of some features of the invention can be made without departing from the spirit and scope of the invention.

Claims

1. A zero-sum differential game processing method for a supply chain system based on the backstepping design method, characterized in that: the supply chain system is composed of equipment and distribution entities and is driven by the uncertainty of customer demand; by controlling material flow and information flow, it purchases raw materials, converts them into intermediate and finished products, and distributes the finished products to customers; an important problem in supply chain management is how to reduce the bullwhip effect, namely the amplification of demand variability as demand information passes from the downstream layer to the upstream layer; and the method comprises the following specific steps:

S3: next, to overcome the difficulty that the Hamilton-Jacobi-Isaacs (HJI) equation rarely admits an analytic solution, the zero-sum differential game strategy is obtained by the adaptive dynamic programming technique: an evaluation network, an execution network, and an interference network are constructed to learn the value function of the HJI equation online in real time and to construct the control strategy and the interference strategy; this game algorithm is called synchronous zero-sum game policy iteration, and the convergence of the backstepping-based differential game and the stability of the closed-loop system are proved by the Lyapunov method;

2. The zero-sum differential game processing method for a supply chain system based on the backstepping design method as set forth in claim 1, wherein: the bullwhip effect problem of the supply chain system in step S1 is usually treated as an H∞ control problem, and, from the perspective of game theory, designing the H∞ controller is equivalent to a two-player zero-sum game, that is, the controller minimizes the performance index under the worst-case disturbance, thereby achieving optimal control.

3. The zero-sum differential game processing method for a supply chain system based on the backstepping design method as set forth in claim 1, wherein: in step S2, a feedforward controller is designed by the backstepping method, so that the tracking problem of the supply chain system in strict-feedback form is converted into an optimal regulation problem in affine form.

4. The zero-sum differential game processing method for a supply chain system based on the backstepping design method as set forth in claim 1, wherein: in step S2, based on ADP policy iteration, three neural networks, namely the evaluation network, the execution network, and the interference network, approximate, respectively, the value function, the control strategy, and the uncertain customer demand strategy during the iteration, so as to finally obtain an approximate solution of the HJI equation of the nonlinear supply chain system.

5. The zero-sum differential game processing method for a supply chain system based on the backstepping design method as set forth in claim 1, wherein: the goal in step S4 is to design, by the backstepping-based zero-sum differential game method, the control input of the supply chain system with uncertain customer demand, so that the system output tracks the reference signal in an optimal way while the bullwhip effect is reduced; the error between the system output and the reference signal is confined to a small compact set, and the system output can track the reference signal under switching conditions, which illustrates the effectiveness of the proposed method; for comparison, a controller is also designed in the presence of general disturbances.