CN111126905A

CN111126905A - Casting enterprise raw material inventory management control method based on Markov decision theory

Info

Publication number: CN111126905A
Application number: CN201911296380.7A
Authority: CN
Inventors: 唐红涛; 王广森; 陈世义
Original assignee: Wuhan University of Technology WUT
Current assignee: Wuhan University of Technology WUT
Priority date: 2019-12-16
Filing date: 2019-12-16
Publication date: 2020-05-08
Anticipated expiration: 2039-12-16
Also published as: CN111126905B

Abstract

The method uses Markov Decision Process theory to establish a raw material inventory control model for a casting enterprise in a dynamic production environment, namely the SCO-IP model, models it abstractly, and finally describes the basic operating process of the studied object quantitatively. Specifically: (1) the operation flows of orders, inventory and production in a casting enterprise in a dynamic production environment are described, and reasonable assumptions are made about the environment in which the model is to be established; (2) on the basis of the current situation and these assumptions, the key parameters of the model are described using Markov Decision Process theory, and a complete Markov tuple is established; (3) the decision rules arising in the Markov Decision Process are analyzed, and a complete objective cost function is constructed.

Description

Casting enterprise raw material inventory management control method based on Markov decision theory

Technical Field

The invention relates to the field of raw material inventory management methods, in particular to a raw material inventory management control method for a casting enterprise based on a Markov decision theory.

Background

In many casting enterprises, customer orders, casting production and inventory management are the core of supply chain management. Although these three links have developed considerably in casting enterprises, problems arising from the characteristics of such enterprises remain when orders, production and inventory are managed jointly:

① the impact on production of poor coupling between production and inventory management;

② the delayed response of inventory to production plans driven by random orders;

③ the large impact of order randomness and single-piece, small-batch production on production cost.

In existing casting enterprises, raw material inventory management and production management are separated: the two management processes are independent of each other, the link between them is weak, and information transfer is delayed. As a result, the raw material inventory cannot respond in time to the dynamic production process, which creates a risk of production interruption. Meanwhile, because raw material inventory is managed in isolation, unnecessary replenishment easily occurs, leading to large material backlogs, higher inventory cost and lower economic benefit.

Disclosure of Invention

The invention provides a raw material inventory management control method for a casting enterprise based on a Markov decision theory, which aims to solve the problem of unnecessary waste in the existing inventory management process and obtain a better inventory control method.

In order to achieve the purpose, the technical scheme adopted by the invention is as follows:

The method uses Markov Decision Process theory to establish a raw material inventory control model for a casting enterprise in a dynamic production environment, namely the SCO-IP model, models it abstractly, and finally describes the basic operating process of the studied object quantitatively. Specifically, the method comprises the following steps:

(1) describing the operation flows of orders, inventory and production in a casting enterprise in a dynamic production environment, and making reasonable assumptions about the environment in which the model is to be established;

(2) on the basis of the current situation and these assumptions, describing the key parameters of the model using Markov Decision Process theory, and establishing a complete Markov tuple;

(3) analyzing the decision rules arising in the Markov Decision Process, and constructing a complete objective cost function;

(4) analyzing the characteristics of the model to find an algorithm that solves for the model's optimal inventory control strategy.

Further, making reasonable assumptions about the environment of the model includes:

(1) the following properties of production materials are not considered in this model:

① the depreciation of materials, and the fact that part of a material can be reused after use, are not considered;

② material scrapped during production is treated as a necessary production loss;

③ the residual value of scrapped material is not considered for the time being;

(2) in order to quantify the storage cost of the materials, the storage point is regarded as a warehouse with lease expenses;

(3) the warehouse capacity has an upper inventory limit;

(4) the influence of the purchased material quantity on the material supplement speed is not considered;

(5) a certain degree of delayed delivery is allowed, and the resulting penalty cost is accepted;

(6) ignoring production preparation time after confirming acceptance of the order;

(7) the production is regarded as a single-line production mode, namely a plurality of orders are not produced at the same time;

(8) the scheduled production tasks can be completed on time, and task delay caused by unexpected factors can not occur;

(9) the stock can ensure the normal production;

(10) taking the order quantity of the raw materials and the corresponding stock level as discrete variables;

(11) at each system review, either one random order arrives or no order arrives, and the probability distribution considered is that of one order arriving while the others do not; when a random order arrives, the production planner can determine the relevant information of that order.

Further, the decision rule includes:

(1) order admission rules:

order admission rules aim to address how random customer orders are handled, imposing the following constraints on the model:

OP_k/P_max+x-τ≤0；

(2) production scheduling rules:

since the stock level state in the model alone is not sufficient for production scheduling, a production raw material stock level reference matrix is introduced, which takes into account the orders already scheduled and the raw material ordering situation,

t_min − t ≤ τ

after a possibly arriving order has been processed and any accepted order scheduled, the system moves to its state at the next review time.

Further, the objective cost function is:

Further, the Markov Decision Process theory refers to the finite-stage deterministic Markov Decision Process theory.

Further, the algorithm is an improved reverse (backward) induction algorithm based on a dual-processor mechanism, obtained by taking the reverse induction method as the main body and combining it with the characteristics of the model, so that the model is solved more efficiently.

Further, the Markov tuple comprises: decision times and periods, states, actions, transition probabilities and rewards;

(1) decision time and period:

in a Markov decision process, the set of decision time points T can take several forms, and models are classified accordingly:

a) when T is a countably infinite set, i.e. T = {1, 2, 3, ..., n, ...}, the model is a discrete decision time model over an infinite planning stage;

b) when T is a finite enumerable set, i.e. T = {1, 2, 3, ..., n}, the model is a discrete decision time model over a finite planning stage;

c) when T is a bounded continuous set, i.e. T = [0, n], the model is a continuous decision time model over a finite stage;

d) when T is an unbounded continuous set, i.e. T = [0, ∞), the model is a continuous decision time model over an infinite stage;

the present model is a discrete decision time model over a finite stage;

(2) state and action set:

at the beginning of each decision time, the system is in some state; S denotes the set of possible system states; when the decision maker observes at a decision time that the system is in state s, s ∈ S, a reasonable action a, a ∈ A_s, is selected from the set A_s of actions feasible in that state;

(3) Transition probability and reward:

at decision time t, taking action a in state s has two effects on the system:

the decision maker receives an immediate reward (cost) r(s, a);

the system moves to its state at the next decision time according to the probability distribution p_t(·|s, a);

the immediate reward (cost) r(s, a) is a real-valued function defined for s ∈ S, a ∈ A_s, and represents the value of the reward (cost) generated in period t after the decision is made at decision time t; a positive value represents a profit or reward, and otherwise it represents a cost; in a Markov decision process, how the immediate reward (cost) is generated is not of concern, and it suffices to know the value or the expected value of r(s, a) once an action is selected; up to the next review time, the immediate reward r(s, a) includes: a one-time reward (cost); a cumulative reward (cost); a random reward (cost); and a reward (cost) that depends on the state at the next time;

the expected reward value for action a may be expressed as:

r(s, a) = Σ_{s'∈S} r(s, a, s') p(s'|s, a)

in the above equation, p(s'|s, a) denotes the probability that the system moves from state s to state s' at the next decision time; the transition probabilities are usually assumed to satisfy

Σ_{s'∈S} p(s'|s, a) = 1

at this point, a complete Markov decision process can be written as:

{T, S, A(s), p(·|s, a), r(s, a)}.
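The five-element tuple above can be sketched as a small data structure. This is only an illustration: the class name `FiniteMDP`, its field names, and the three-argument reward r(s, a, s') used to form the expected reward are assumptions introduced here, not names taken from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List

State = Hashable
Action = Hashable


@dataclass
class FiniteMDP:
    """Hypothetical container for the tuple {T, S, A(s), p(.|s,a), r(s,a)}."""
    horizon: int                                               # T: number of decision times
    states: List[State]                                        # S
    actions: Callable[[State], List[Action]]                   # A(s): feasible actions in s
    transition: Callable[[State, Action], Dict[State, float]]  # p(.|s, a) as {s': probability}
    reward: Callable[[State, Action, State], float]            # r(s, a, s') (assumed form)

    def expected_reward(self, s: State, a: Action) -> float:
        """r(s, a) = sum over s' of r(s, a, s') * p(s'|s, a)."""
        dist = self.transition(s, a)
        # The transition probabilities out of (s, a) are assumed to sum to 1.
        if abs(sum(dist.values()) - 1.0) > 1e-9:
            raise ValueError("transition probabilities must sum to 1")
        return sum(p * self.reward(s, a, s_next) for s_next, p in dist.items())
```

Costs can be represented as negative rewards, or the sign convention can be flipped throughout, as long as it is applied consistently.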

The beneficial effects of the invention are as follows:

The invention analyzes the separation between raw material inventory management and production plan management in casting enterprises. On this basis, it uses Markov decision process theory to establish a raw material inventory control model for a casting enterprise in a dynamic production environment and solves for the model's optimal inventory control strategy. This strengthens the coupling between the raw material inventory management system and the production system within the casting enterprise, reduces the risk of production interruption and the operating cost, and provides an effective control strategy for inventory management in a dynamic production environment.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

FIG. 1 is an overall flow chart of the present invention;

FIG. 2 is a production state transition diagram;

FIG. 3 is a stock level state transition diagram.

Detailed Description

In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

As shown in FIG. 1, the raw material inventory management control method for a casting enterprise based on Markov decision theory uses Markov Decision Process theory to establish a raw material inventory control model for a casting enterprise in a dynamic production environment, namely the SCO-IP model, models it abstractly, and finally describes the basic operating process of the studied object quantitatively.

When a decision maker implements a strategy π, a series of rewards (costs) is obtained with certain probabilities at the successive decision times, and summing them gives the utility function of the model under strategy π. According to the optimization criterion, the finite-stage total utility function under a strategy π ∈ Π is defined as:

This expression gives the expected total cost incurred by the system from the start up to time N when the state at decision time 0 is s_0 and strategy π is used; r(s_N) denotes the cost incurred at the last decision time of the decision process.

In this model, however, what concerns the decision maker is how to select a suitable strategy at the outset so that the cost incurred by the system is lowest, i.e. to attain the following optimal value function,

For any ε ≥ 0, if there is a strategy π* for which the following expression holds, then π* is an N-stage ε-optimal strategy; when ε = 0, it is called an optimal strategy.
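As a hedged reading of the three expressions referred to above, the standard finite-horizon cost convention gives the following sketch. The symbols V_N^π, V_N^* and Π, and the state-action pair (s_t, a_t) visited at decision time t, are notation assumed here for illustration and may differ from the patent's own formulas.

```latex
\begin{align}
V_N^{\pi}(s_0) &= \mathbb{E}_{s_0}^{\pi}\!\left[\sum_{t=0}^{N-1} r(s_t, a_t) + r(s_N)\right], \\
V_N^{*}(s_0) &= \min_{\pi \in \Pi} V_N^{\pi}(s_0), \\
V_N^{\pi^{*}}(s_0) &\le V_N^{*}(s_0) + \varepsilon .
\end{align}
```

With ε = 0 the last line reduces to V_N^{π*}(s_0) = V_N^*(s_0), which matches the statement that an ε-optimal strategy with ε = 0 is an optimal strategy.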

Although the decision maker would always like to know the optimal action for every possible initial state, in actual production only the optimal strategy for a particular possible initial state needs to be considered. This is the finite-stage deterministic Markov Decision Process theory, on the basis of which the SCO-IP model is described.

In the SCO-IP model, the decision maker reviews the system state periodically and, after each review, takes the relevant actions, such as ordering raw materials or making a production plan for an accepted order. The times at which the system state is reviewed are the review times of the SCO-IP model and are denoted by t, where t = 0, 1, 2, 3, …, T−1, and T denotes the last time of the finite stage. At each review time t, the decision maker observes a system state that consists of several components and is expressed uniformly by the set S_t:

In expression (3-1), the first component is the material inventory level state, which contains the inventory level I_{i,t} of each production raw material of interest at review time t, as shown in expression (3-2). This expression reflects the model's management of a multi-material inventory and is closer to practice, but it makes the state space of the SCO-IP model more complex and the subsequent solution of the model more difficult.

The second state component in expression (3-1) is the workshop production state. It is a look-ahead state component: at the current review time, the decision maker can observe the workshop's production capacity over a future period. It represents the remaining capacity of the workshop at the review times from t to t + τ. (We assume that the system has a limited scheduling duration of τ unit scheduling intervals, i.e. the decision maker can observe the schedule for at most the next τ scheduling intervals.) It is written as the row vector (3-3), in which each element is the remaining capacity of a future period as observed at the current time. When a new customer order is scheduled for production, this component changes accordingly.

[p_t  p_{t+1}  p_{t+2}  …  p_{t+τ−1}  p_{t+τ}]  (3-3)

According to Markov Decision Process theory, after the system state is reviewed, the observed state is analyzed and a corresponding decision is made. In the SCO-IP model we define a decision set A(s), as shown in expression (3-4). Each decision consists of two parts: a row vector containing τ elements and a row vector containing n elements.

A(s_t) = {s_t ∈ S_t | A(s_t)}  (3-5)

The first row vector is the order-scheduling response in the decision matrix: the scheduling plan made for a possibly arriving order on the basis of the workshop production state matrix and the material inventory level reference matrix at review time t. An element equal to 1 indicates that an order is scheduled in that review interval, and 0 indicates that no order is scheduled in it. If the system has not yet made any scheduling plan, for example at review time 0, the row vector observed by the decision maker consists of τ zeros.

Note that as the review time advances, the elements of this vector shift left: each time a review period passes, the elements shift left by one position, the element for the elapsed period drops out, and the last position is filled with zero.
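The left shift with zero fill described above can be sketched in a few lines. The function name `advance_schedule` and the list representation of the 0/1 scheduling vector are assumptions made for this illustration.

```python
from typing import List


def advance_schedule(schedule: List[int], periods: int = 1) -> List[int]:
    """Shift the 0/1 scheduling vector left by `periods` review periods.

    Elements for elapsed periods drop out at the front and the tail is
    filled with zeros, so the vector keeps its original length.
    """
    if periods < 0:
        raise ValueError("periods must be non-negative")
    k = min(periods, len(schedule))
    return schedule[k:] + [0] * k
```

For example, `advance_schedule([0, 1, 1, 0, 1])` gives `[1, 1, 0, 1, 0]`: the slot for the elapsed period is discarded and a free slot appears at the end of the scheduling window.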

The second row vector is the raw material replenishment response in the decision matrix: the ordering plan made on the basis of the stock level of each production raw material at review time t and all possible orders. The essence of the invention is to determine the optimal inventory control strategy from the system state, so that casting production proceeds normally while inventory cost is minimized. It is therefore important to capture the performance differences between control strategies and to derive an optimal raw material inventory control strategy.

In Markov Decision Process theory, this performance difference is expressed by the expected total cost accumulated over the whole finite stage from the immediate costs r(s, a) under each strategy. With S_0 denoting the set of possible initial system states, we need to find the inventory control strategy π* that minimizes this expected total cost.

Markov Decision Process theory holds that how r(s, a) arises during the review period is unimportant; only its value or expected value is needed. The immediate cost r(s, a) therefore has four components of cost (reward): (1) one-time revenue (cost) up to the next review time; (2) cumulative revenue (cost) up to the next review time; (3) random revenue (cost) of the state transition to the next review time; (4) revenue (cost) that depends on the state at the next review time.

In the SCO-IP model, the main goal is to choose the optimal inventory control strategy. Therefore, it is considered that at each decision time, action is taken for a certain state, and the generated instant cost is mainly reflected in the cost associated with raw material inventory management, and the cost generated by other parts is not considered here. After the model and the operating environment are analyzed, a corresponding cost function expression is given:

One-time cost up to the next review time:

In formula (3-6), the first term is the purchase cost of material i, where Q_{i,t} is the order quantity of the i-th raw material at review time t and B_i is the unit purchase price of the i-th raw material. The second term is the fixed cost incurred in ordering, where g_i is the fixed cost of ordering the i-th material and sgn is the sign function, so that the fixed cost is incurred only when an order is actually placed. The third term is the delay cost incurred after an order is accepted: when the material stock level is insufficient, production is scheduled later and delivery cannot be completed within the specified time, so a delayed-delivery penalty is charged as a cost function of the delay duration ε. The fourth term is the revenue lost by rejecting an order. The forms of the third and fourth terms are described in detail later.

In the SCO-IP model, to avoid having to account for the random risk of production interruption, we enforce a production precondition:

An order is accepted and a production plan is arranged for it only when the production raw materials in inventory cover the quantities required by the order. Since production is never carried out in a stock-out state, stock-out costs are not considered. Instead, the production loss incurred when insufficient stock prevents an order from being scheduled within the allowed scheduling time, so that the order is finally rejected, is converted into a stock penalty cost.

Cumulative cost up to the next stage:

In expression (3-7), h_i is the cost of holding material i per unit time. As noted above, the stock level already takes into account the scheduled production plan, i.e. the quantity consumed, so it represents the end-of-period stock level, and expression (3-7) gives the inventory holding cost incurred in this stage.

Random cost of the transition to the next time: for inventory costs, this cost is set to 0,

The immediate cost function r(s, a) of each stage is therefore as shown in expressions (3-9) and (3-10),

In formula (3-9), r_i(s, a) is the inventory management cost of the i-th production raw material, and s' is the state at the next review time, reached with a certain probability after the decision maker takes action a in state s at the current review time. Expression (3-10) is the inventory management cost of all materials in this stage.

Now that the SCO-IP model as a whole has been set out, the delayed-delivery cost and the rejection penalty cost in the objective cost function are explained. The delayed-delivery cost in expression (3-6) falls into two categories in the SCO-IP model:

When an accepted new customer order lacks raw materials for production, it cannot be scheduled at the earliest time, so production is delayed and delivery is late;

Because of the existing production schedule, an accepted new customer order is forced to be scheduled later, so production is delayed and delivery is late.

Although the two kinds of delayed-delivery cost arise from different system conditions, both ultimately take the form of a delayed-delivery penalty cost, as shown in expression (3-11),

As in the product contracts between some enterprises and their customers, it is common practice to tie the delayed-delivery cost to the delay time. Thus, in expression (3-11), the delayed-delivery penalty cost is treated as a function of the delay time ε. Since the specific form of this function is not the focus of the SCO-IP model, the delayed-delivery cost is expressed, as shown in expression (3-12), as a linear function of the delay time ε, where δ is the penalty cost borne per unit of delay time. Although somewhat crude, this remains a simple and effective way to express the delayed-delivery cost.

f(ε) = ε·δ  (3-12)
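The linear penalty of expression (3-12) and the first two terms described for formula (3-6), namely purchase cost and sign-function fixed cost, can be sketched as follows. The function names are hypothetical, and clamping a negative delay to zero is an assumption added here rather than something stated in the text.

```python
def sgn(q: float) -> int:
    """Sign function: 1 for q > 0, 0 for q == 0, -1 for q < 0."""
    return (q > 0) - (q < 0)


def delay_penalty(delay: float, delta: float) -> float:
    """f(eps) = eps * delta, with delta the penalty per unit of delay time.

    Assumption: an order delivered on time or early (delay <= 0) costs nothing.
    """
    return max(delay, 0.0) * delta


def ordering_cost(quantity: float, unit_price: float, fixed_cost: float) -> float:
    """Purchase cost Q * B plus fixed ordering cost g * sgn(Q) for one material.

    The fixed cost is charged only when an order is actually placed (Q > 0).
    """
    return quantity * unit_price + fixed_cost * sgn(quantity)
```

With these helpers, ordering 10 units at a unit price of 12 with a fixed cost of 200 costs 320, while ordering nothing costs 0 because sgn(0) = 0 switches the fixed cost off.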

In the objective cost function (3-9), besides the delayed-delivery penalty cost, the loss of business caused by rejecting orders is the other cost component that the control system must keep under control.

During the admission review of an arriving order, the quantities of the various production raw materials required by the order must be evaluated in addition to the workshop production capacity the order consumes. When both evaluations pass, the order is accepted; otherwise the system rejects it. When the system rejects an order, the revenue from that order is lost as well. The revenue differs from order to order, so, as shown in expression (3-13), the rejection cost also differs from order to order: for example, c_{1,t} is the fixed penalty cost associated with customer order O_{1,t}, while rejecting order O_{2,t} incurs a fixed penalty cost of c_{2,t}.

Once customer order information has been defined and accounted for, how the decision maker should decide in a specific system state with the order information known is an important link in the SCO-IP model: it governs the course of the entire production process and the control of the inventory system. We therefore introduce the following two important decision rules:

(1) order admission rules

Order admission rules address the question of how to handle random customer orders. To this end, the model imposes the following constraint on the state s_t at a review time t:

OP_k/P_max + x − τ ≤ 0  (3-15)

In inequality (3-15), x is the number of elements equal to 1 in the scheduling row vector of expression (3-4). This inequality embodies another hard constraint of the model: an order must not be scheduled for production outside the planning period τ (measured from the current review time t).
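The order admission inequality above can be checked with a short function. The text does not define OP_k and P_max explicitly, so this sketch assumes that OP_k is the production capacity required by order k and that P_max is the maximum capacity of one scheduling period; the function and parameter names are likewise hypothetical.

```python
from typing import Sequence


def admits(required_capacity: float,
           max_capacity_per_period: float,
           schedule: Sequence[int],
           tau: int) -> bool:
    """Check OP_k / P_max + x - tau <= 0.

    required_capacity       : OP_k (assumed: capacity needed by order k)
    max_capacity_per_period : P_max (assumed: full capacity of one period)
    schedule                : 0/1 scheduling vector; x counts its 1-entries
    tau                     : length of the planning window in periods
    """
    if max_capacity_per_period <= 0:
        raise ValueError("max_capacity_per_period must be positive")
    x = sum(1 for flag in schedule if flag == 1)
    return required_capacity / max_capacity_per_period + x - tau <= 0
```

Under these assumptions the rule reads: the periods the new order needs, plus the periods already occupied, must fit inside the τ-period window.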

(2) Rule of production scheduling

Once the handling of arriving orders is settled, how to schedule them becomes the next problem. When a customer order arrives, its earliest production scheduling time t_min must be determined from the material inventory level at that time and the production capacity of the workshop. The inventory level state given in the SCO-IP model alone is not sufficient for this purpose, so a raw material stock level reference matrix is introduced, as shown in expression (3-16). This matrix is used for scheduling reference only.

It takes into account the orders already scheduled and the raw material ordering situation: because production raw materials have a purchase lead time, the stock level of each production raw material at each review time is the sum of the remaining stock quantity and the quantity arriving by that time.

t_min − t ≤ τ  (3-20)

When no t_min satisfies the above inequalities, the system cannot schedule the order properly. The system therefore rejects the order and incurs the resulting loss of business, namely the amount the customer would have paid the manufacturer, which includes the cost and profit of producing the order. In expression (3-17), θ_t is the number of periods required to produce order O_t; because it depends on the production capacity remaining in the schedule at t_min, θ_t may be an integer or a fraction, and its value is determined by t_min. The expression also reflects that the delayed-delivery duration should be kept to a minimum so as to minimize the cost of inventory control. In inequality (3-18), the matrix is the row matrix of remaining capacity formed from the elements of expression (3-3) from review time t_min to review time t_min + θ_t − 1; the inequality states that, when production starts at review time t_min, the remaining capacity of the immediately following review periods must cover the capacity required to produce the new order. Because the production of a single order in a foundry is continuous, the order must be scheduled in adjacent review periods.

In inequality (3-19), the matrix is the row matrix formed from the column of expression (3-16) for review time t_min; the inequality states that, when production is scheduled at review time t_min, the stock levels of the various production raw materials at that time cover the quantities required by the newly arrived order. Inequality (3-20) states that the review time t_min must lie within the planning duration τ.
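The scheduling conditions described above (enough remaining capacity in adjacent periods, enough stock at the start time, and a start time inside the planning window) can be sketched as a search for the earliest feasible start. All names are hypothetical. The sketch also assumes that a period with zero remaining capacity breaks the continuity of production and that the order must finish within the visible capacity vector; neither detail is spelled out in the text.

```python
from typing import Optional, Sequence


def earliest_start(remaining_capacity: Sequence[float],
                   stock_reference: Sequence[Sequence[float]],
                   order_capacity: float,
                   order_materials: Sequence[float],
                   tau: int) -> Optional[int]:
    """Return the smallest offset d = t_min - t with 0 <= d <= tau, or None.

    remaining_capacity[d] : remaining workshop capacity in period t + d
    stock_reference[i][d] : reference stock level of material i at time t + d
    order_capacity        : capacity the new order needs in total
    order_materials[i]    : quantity of material i the new order needs
    """
    last_offset = min(tau, len(remaining_capacity) - 1)
    for d in range(last_offset + 1):
        # Material check: stock at the candidate start must cover the order.
        if any(stock_reference[i][d] < need
               for i, need in enumerate(order_materials)):
            continue
        # Capacity check: accumulate capacity over adjacent periods only.
        accumulated = 0.0
        for j in range(d, len(remaining_capacity)):
            if remaining_capacity[j] <= 0:
                break  # assumed: a fully booked period interrupts production
            accumulated += remaining_capacity[j]
            if accumulated >= order_capacity:
                return d
    return None  # no feasible t_min: the order would be rejected
```

A `None` result corresponds to the rejection case, in which the loss of business for the order is charged.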

After a possibly arriving order has been processed and any accepted order scheduled, the system state at the next review time can be written, as shown in expression (3-21). Since the only random factor in the whole SCO-IP model is the randomness of customer orders, in this control system the probability distribution of the transition to the system state at the next review time is the same as the probability distribution p(·|s, a) of customer order arrivals. The system state at the next review time after the transition can be expressed as follows,

The state transition equation is shown in formula (3-22), and the state transitions are illustrated in FIG. 2 and FIG. 3.

In particular, the stock level reference matrix, which is not a member of the state space, also has a transition relationship between adjacent review times, as shown in expression (3-23),

In expression (3-22), the capacity consumption matrix is obtained after the order has been scheduled; it is a 1 × τ row matrix representing the workshop capacity consumed in the future plan. The production raw materials [OI_{1,k}  OI_{2,k}  …  OI_{N,k}] required by order O_k are treated as a whole, and a row matrix denotes the order quantities of the individual production raw materials. In expression (3-23), one matrix represents the material consumption in the different review periods within the planning time starting from the current review time t;

the other represents the arrivals of each production raw material at the different review times within the same planning time, forming an N × τ matrix of arrival quantities. Note that when the capacity row vector moves to the next review time, its first element drops out, the remaining elements shift left, and the last position is filled with full capacity; the corresponding matrix is shifted in the same way.

In order to solve the SCO-IP model, the system state space of the model is first refined; then, based on the refined state space and the characteristics and mechanisms of the SCO-IP model, the original reverse (backward) induction algorithm is optimized and improved, giving the improved reverse induction algorithm based on the dual-processor mechanism.
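For orientation, the textbook reverse (backward) induction that serves as the starting point can be sketched as below for a cost-minimizing finite-stage model. This is the standard, unimproved algorithm, not the improved dual-processor variant of the invention, and every name in it is an assumption made for illustration.

```python
from typing import Callable, Dict, Hashable, List, Tuple

State = Hashable
Action = Hashable


def backward_induction(
    states: List[State],
    actions: Callable[[State], List[Action]],
    transition: Callable[[State, Action], Dict[State, float]],
    cost: Callable[[State, Action], float],
    terminal_cost: Callable[[State], float],
    horizon: int,
) -> Tuple[List[Dict[State, float]], List[Dict[State, Action]]]:
    """Return (value, policy) for a finite-stage cost-minimizing MDP.

    value[t][s]  : minimal expected cost from time t onward, starting in s
    policy[t][s] : an action attaining that minimum
    """
    value: List[Dict[State, float]] = [dict() for _ in range(horizon + 1)]
    policy: List[Dict[State, Action]] = [dict() for _ in range(horizon)]
    for s in states:
        value[horizon][s] = terminal_cost(s)       # cost at the last time
    for t in range(horizon - 1, -1, -1):           # sweep backwards in time
        for s in states:
            best_action, best_value = None, float("inf")
            for a in actions(s):
                expected_future = sum(p * value[t + 1][s_next]
                                      for s_next, p in transition(s, a).items())
                candidate = cost(s, a) + expected_future
                if candidate < best_value:
                    best_action, best_value = a, candidate
            value[t][s] = best_value
            policy[t][s] = best_action
    return value, policy
```

The work grows with the number of states times the number of actions per stage, which is why pruning the state space beforehand pays off.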

(1) System state space optimization

According to the setting of the SCO-IP model operation environment, the maximum stock level of the raw materials for production is limited. Therefore, in the process of rationality analysis of the state space of the control system, we apply the following constraints to the initial model state space,

I_1,t+I_2,t+I_3,t≤I_max(3-23)

Q_1,t+Q_2,t+Q_3,t≤I_max(3-24)

in inequalities (3-26) and (3-27), the sum of the stock levels of all the production raw materials cannot exceed the set maximum stock capacity at each system inspection time t, which is a basic condition for keeping the control system operating properly. Likewise, for each system decision time t, the sum of the replenishment quantities of all production raw materials likewise cannot exceed the maximum stock capacity.

The following constraints are further imposed on the state space of the system:

I_max - (I_1 + I_2) ≥ I_{3,min}    (3-25)

I_max - (I_1 + I_3) ≥ I_{2,min}    (3-26)

I_max - (I_3 + I_2) ≥ I_{1,min}    (3-27)

Inequalities (3-25) to (3-27) state that, for any stock state, the capacity left for a given raw material after the other raw materials are accounted for must be at least I_{i,min}, where I_{i,min} denotes the smallest consumption of the i-th production raw material by any single order in the order pool and I_i denotes the stock level of the i-th production raw material. When the inventory component of a system state fails to satisfy these inequalities, the SCO-IP model can no longer accept any order: if the range available to some production raw material is smaller than I_{i,min}, that inventory state cannot support the production of any order, even when the material is replenished immediately up to the maximum inventory capacity. A production system that cannot accept orders is clearly unreasonable, and a system state of this kind makes the production process in the SCO-IP model unsustainable.

In conclusion, through the restriction on the inventory system and the production system, the state vectors which do not conform to the actual situation and the basic operation environment in the original system state space can be screened out, so that the system state space is optimized, and the subsequent solution of the model optimal strategy is facilitated.
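The screening of the state space can be sketched as follows. This is a minimal illustration assuming integer stock levels; the function names `feasible_inventory_states` and `within_capacity` and the sample values are hypothetical and are not part of the original description.

```python
from itertools import product
from typing import List, Sequence, Tuple


def within_capacity(quantities: Sequence[int], i_max: int) -> bool:
    """Capacity screen: the summed quantities must not exceed I_max.

    The same check applies to stock levels I and to replenishment quantities Q.
    """
    return sum(quantities) <= i_max


def feasible_inventory_states(i_max: int, i_min: Sequence[int]) -> List[Tuple[int, ...]]:
    """Enumerate inventory vectors (I_1, ..., I_N) that pass both screens.

    Screen 1: sum(I) <= I_max.
    Screen 2: for every material i, I_max - (sum of the other stocks) >= I_{i,min}.
    """
    n = len(i_min)
    states = []
    for inv in product(range(i_max + 1), repeat=n):
        if not within_capacity(inv, i_max):
            continue
        total = sum(inv)
        # total - inv[i] is the stock held by all materials other than i.
        if all(i_max - (total - inv[i]) >= i_min[i] for i in range(n)):
            states.append(inv)
    return states


# Hypothetical example: three raw materials, I_max = 4, each order needs
# at least one unit of every material.
states = feasible_inventory_states(4, (1, 1, 1))
raw_count = (4 + 1) ** 3  # size of the unscreened grid
```

In this small example the state (4, 0, 0) is removed because no room is left for the second or third material, whereas (2, 1, 1) is kept.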

(2) Order admission processing mechanism

As customer demand has become more diversified, the traditional inventory-oriented operating model has struggled to keep up with it. In traditional manufacturing in particular, make-to-stock (MTS) production tends to leave products unsold. A make-to-order (MTO) production mode, by contrast, lets an enterprise respond flexibly to diversified market demand: an MTO enterprise organizes production mainly around incoming customer orders, which ties the production scheduling plan closely to those orders. In actual production, an enterprise usually accepts every order it can when production conditions are sufficient, so as to maximize economic benefit. On the basis of the SCO-IP model, the order admission rules used in actual production are therefore added and combined with the standard reverse induction algorithm to realize a corresponding order admission mechanism.

According to the operating environment assumed for the SCO-IP model, at most one customer order can arrive at each inspection time t, and when an order arrives the decision maker knows its production information. The random customer orders of the SCO-IP model are drawn from a customer order pool built from the analysis of historical foundry order data. Based on these characteristics of the model, the steps of the order admission algorithm are as follows:

1) Determine the order pool Ord_all from historical statistics. The order pool contains N possible orders and, according to the definition of order information, we have

n ≤ N, O_k ∈ Ord_all. The arrival of each order in the order pool follows a given probability distribution;

2) Checking the residual production capacity state P of the system;

3) Select order O_n. If the remaining production capacity P of the current system is greater than or equal to P_n and the following inequality is satisfied,

then customer order O_n is accepted. Otherwise, customer order O_n is rejected and the rejection penalty is incurred;

4) If n = N, the algorithm ends; otherwise, set n = n + 1 and repeat step 3.
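The admission steps above can be sketched as follows. This is a hedged illustration, not the implementation of the invention. The admission inequality of step 3 is not reproduced in the text, so it is represented by a caller-supplied predicate `extra_rule`; the `Order` class, its field names, and the sample numbers are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Order:
    name: str
    capacity_needed: float    # P_n: production capacity the order requires
    rejection_penalty: float  # cost incurred if the order is rejected


def admit_orders(
    pool: Sequence[Order],
    remaining_capacity: float,
    extra_rule: Callable[[Order], bool] = lambda order: True,
) -> List[Tuple[str, bool, float]]:
    """Evaluate every order in the pool against the current system state.

    Returns one (order name, accepted?, penalty) triple per order. Each order
    is tested against the same remaining capacity P, because at most one
    order arrives at an inspection time.
    """
    decisions = []
    for order in pool:  # n = 1, ..., N
        accepted = remaining_capacity >= order.capacity_needed and extra_rule(order)
        penalty = 0.0 if accepted else order.rejection_penalty
        decisions.append((order.name, accepted, penalty))
    return decisions


# Hypothetical pool with two orders and a remaining capacity of 5 units.
pool = [Order("A", 3.0, 10.0), Order("B", 8.0, 25.0)]
decisions = admit_orders(pool, 5.0)
```

Passing the inequality as a predicate keeps the sketch independent of the exact admission formula, which can be substituted once it is known.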

(3) Production scheduling mechanism

In line with actual production, the SCO-IP model adopts the MTO production mode, so the production scheduling plan is based on customer order information. Like the order admission mechanism described above, the production scheduling mechanism follows the way actual production is handled. Since at most one order can arrive at a time in the SCO-IP model, production scheduling is simplified to a first-come, first-scheduled mode, i.e., an order priority mechanism, which is also a relatively common scheduling mode in practice. Based on this mode, the steps of the corresponding production scheduling algorithm are as follows:

1) obtaining the result of the order admission mechanism algorithm at the same examination time t;

2) Examine the remaining production capacity state P of the system and the stock level state I_{t,i} of each raw material;

3) Select order O_k. If the order has been accepted, evaluate the following formula and find the t_min that satisfies the corresponding constraint; otherwise, keep the original production scheduling plan;

4) If a t_min satisfying the condition exists, schedule the order, i.e., arrange production continuously from the calculated t_min onward until the production of order O_k is finished, and update the production scheduling plan; otherwise, keep the original production scheduling plan;

5) If n = N, the algorithm ends; otherwise, set n = n + 1 and return to step 3.
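The scheduling steps above can be sketched as follows. The formula for t_min is not reproduced in the text, so this sketch rests on an assumption: t_min is taken to be the earliest period in the planning window that still has free capacity, and production then consumes the free capacity of the following periods until the order is finished. The function name `schedule_order`, the representation of the plan as a list of free capacities, and the sample numbers are hypothetical; the raw material check of step 2 is omitted for brevity.

```python
from typing import List, Optional, Tuple


def schedule_order(free: List[int], workload: int, t: int) -> Tuple[Optional[int], List[int]]:
    """Try to place an accepted order on the capacity window.

    free[j] is the unused production capacity in period t + j, for
    j = 0 .. tau - 1. Returns (t_min, updated window) if the order can be
    finished inside the window, else (None, unchanged window), which
    corresponds to keeping the original production scheduling plan.
    """
    updated = list(free)
    remaining = workload
    t_min = None
    for j, cap in enumerate(updated):
        if remaining <= 0:
            break
        if cap <= 0:
            continue  # period already fully booked by earlier orders
        if t_min is None:
            t_min = t + j  # first period in which this order is produced
        used = min(cap, remaining)
        updated[j] = cap - used
        remaining -= used
    if t_min is None or remaining > 0:
        return None, list(free)
    return t_min, updated


# Hypothetical window of 4 periods starting at inspection time t = 3.
t_min, plan = schedule_order([0, 2, 5, 5], workload=6, t=3)
```

Because the search never leaves the window of length τ, any t_min it returns is within τ periods of the current inspection time t.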

(4) Dual-processor reverse induction algorithm based on optimization space

After the state space of the SCO-IP model is analyzed reasonably and the order admission processing mechanism and the production scheduling mechanism are realized, the order admission processing mechanism and the production scheduling mechanism need to be combined with a standard reverse induction algorithm so as to find an inventory control optimal strategy of the SCO-IP model. Therefore, an improved reverse induction algorithm is provided, which is based on an optimized system state space and incorporates an order admission mechanism and a production scheduling mechanism. The steps of the algorithm are as follows,

1) establishing a production system, an inventory system, a customer order system and a decision system, and initializing relevant data;

2) establishing an initial system state space S₁ and an initial action space A₁;

3) Determining a finite stage length T;

4) optimizing the initial system state space and the initial action space to form S'₁ and A'₁;

5) At inspection time t = T, for each possible system state in the optimized system state space S'₁:

6) applying an order admission processing mechanism;

7) applying a production scheduling mechanism;

8) record the corresponding t_min；

9) Apply the optimized action space A'₁;

10) Calculating the optimal expected inventory cost of each system state vector when a random customer order is placed, and recording the corresponding optimal action vector and the optimal expected inventory cost of each possible system state;

11) Set t = t - 1. At inspection time t, for each possible system state in the optimized system state space S'₁:

12) Apply the order admission processing mechanism and the production scheduling mechanism, record the corresponding t_min, and then apply the action space;

13) Using the reverse induction algorithm, calculate

and record the corresponding optimal action A^*;

14) If t = 1, the algorithm ends; otherwise, return to step 11).
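The backward pass of the algorithm can be sketched on a toy problem as follows. This is a generic finite-horizon reverse induction for cost minimization, given only as an illustration: the function `backward_induction`, the single-material toy inventory model, and all numbers are assumptions. In the SCO-IP setting, the order admission and production scheduling mechanisms would be evaluated inside the `transition` and `cost` callbacks.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

State = Hashable
Action = Hashable


def backward_induction(
    states: List[State],
    actions: Callable[[State], Iterable[Action]],
    transition: Callable[[State, Action], List[Tuple[float, State]]],
    cost: Callable[[State, Action], float],
    horizon: int,
) -> Tuple[Dict[int, Dict[State, float]], Dict[int, Dict[State, Action]]]:
    """Return (value, policy) for periods t = 1 .. horizon.

    value[t][s] is the minimal expected cost from period t onward and
    policy[t][s] is an action that attains it. Terminal cost is zero.
    """
    value: Dict[int, Dict[State, float]] = {horizon + 1: {s: 0.0 for s in states}}
    policy: Dict[int, Dict[State, Action]] = {}
    for t in range(horizon, 0, -1):  # t = T, T-1, ..., 1
        value[t], policy[t] = {}, {}
        for s in states:
            best_a, best_v = None, float("inf")
            for a in actions(s):
                expected_future = sum(p * value[t + 1][s2] for p, s2 in transition(s, a))
                v = cost(s, a) + expected_future
                if v < best_v:
                    best_a, best_v = a, v
            value[t][s], policy[t][s] = best_v, best_a
    return value, policy


# Toy model (assumed): stock 0..2, demand is 0 or 1 with probability 0.5 each,
# purchase cost 1 per unit, shortage cost 4 when demand meets empty stock.
MAX_STOCK = 2


def toy_actions(s: int) -> range:
    return range(0, MAX_STOCK - s + 1)  # replenishment cannot exceed capacity


def toy_transition(s: int, a: int) -> List[Tuple[float, int]]:
    level = s + a
    return [(0.5, level), (0.5, max(level - 1, 0))]


def toy_cost(s: int, a: int) -> float:
    expected_shortage = 0.5 * 4.0 if s + a == 0 else 0.0
    return 1.0 * a + expected_shortage


value, policy = backward_induction([0, 1, 2], toy_actions, toy_transition, toy_cost, horizon=2)
```

The loop over t mirrors steps 5) to 14): the last period is solved first, and each earlier period reuses the stored costs of the period after it.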

Key parameters of the model

Claims

1. A casting enterprise raw material inventory management control method based on Markov decision theory, characterized by comprising the following steps: using Markov Decision Process theory to establish a raw material inventory control model for a casting enterprise in a dynamic production environment, namely the SCO-IP model; modeling it abstractly; and finally describing quantitatively the basic operating process of the object under study; specifically,

2. The Markov decision theory-based casting enterprise raw material inventory management control method according to claim 1, wherein: making reasonable assumptions about the environment of the model includes:

(3) the warehouse capacity has an upper inventory limit;

(9) the stock can ensure the normal production;

3. The Markov decision theory-based casting enterprise raw material inventory management control method according to claim 1, wherein: the decision rule comprises:

(1) order admission rules:

OP_k/P_max+x-τ≤0；

(2) production scheduling rules:

Which takes into account the orders already placed and the ordering situation,

t_min-t≤τ

4. The Markov decision theory-based casting enterprise raw material inventory management control method according to claim 1, wherein: the objective cost function is:

5. The Markov decision theory-based casting enterprise raw material inventory management control method according to claim 1, wherein: the Markov Decision Process theory refers to finite-stage deterministic Markov Decision Process theory.

6. The Markov decision theory-based casting enterprise raw material inventory management control method according to claim 1, wherein: the algorithm is an improved reverse induction algorithm based on a dual-processor mechanism, which takes the reverse induction method as its main body and incorporates the characteristics of the model, so that the model is solved more efficiently.

7. The Markov decision theory-based casting enterprise raw material inventory management control method according to claim 1, wherein: the Markov tuple comprises: decision times and periods, states, actions, transition probabilities, and rewards;

(1) decision time and period:

b) when the set of decision time points T is an enumerable finite set, i.e. T = {1, 2, 3, ..., n}, the model is regarded as a discrete decision time model under a finite planning stage;

the model is a discrete decision time model under a finite stage;

(2) state and action set:

(3) Transition probability and reward:

the decision maker receives an immediate reward (cost) r (s, a),

the expected reward value for action a may be expressed as:

thus, a complete Markov decision process can be written as:

{T, S, A(s), p(·|s,a), r(s,a)}.