CN113077188A - MTO enterprise order accepting method based on average reward reinforcement learning - Google Patents
MTO enterprise order accepting method based on average reward reinforcement learning
- Publication number
- CN113077188A (application CN202110468897.0A)
- Authority
- CN
- China
- Prior art keywords
- order
- enterprise
- mto
- reinforcement learning
- cost
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
- G06Q10/06316—Sequencing of tasks or work
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/067—Enterprise or organisation modelling
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0206—Price or cost determination based on market factors
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Human Resources & Organizations (AREA)
- Strategic Management (AREA)
- Entrepreneurship & Innovation (AREA)
- Theoretical Computer Science (AREA)
- Economics (AREA)
- Development Economics (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- Finance (AREA)
- Marketing (AREA)
- Accounting & Taxation (AREA)
- Game Theory and Decision Science (AREA)
- Educational Administration (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- Operations Research (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- General Factory Administration (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses an MTO enterprise order accepting method based on average reward reinforcement learning, comprising the following steps: assuming order information, determining the system state set, determining the system action set, determining the immediate return function, constructing the order acceptance model, and solving the order acceptance model. Beyond the factors considered in the traditional MTO enterprise order acceptance problem, the invention adds order inventory cost and customer priority factors, constructs the order acceptance model as a semi-Markov decision process, solves it with the SMART algorithm, and on that basis uses a greedy algorithm to sequence the accepted orders for production, so as to maximize the long-term average profit of the enterprise.
Description
Technical Field
The invention relates to the technical field of enterprise order acceptance selection, and in particular to an MTO enterprise order accepting method based on average reward reinforcement learning.
Background
An MTO (make-to-order) enterprise produces according to customer orders. Different customers place different requirements on order types, and the MTO enterprise organizes production around the requirements its customers submit. Under normal circumstances the enterprise's capacity is limited, and various cost factors prevent it from accepting every customer's order, so the MTO enterprise must formulate a corresponding order acceptance method. The success of an MTO enterprise depends to a great extent on how selectively it accepts orders, and a good order acceptance method contributes greatly to the enterprise's long-term profit.
Existing research on order acceptance decision methods has achieved some results. With the rapid development of electronic commerce, however, consumers' personalized requirements have become increasingly prominent. Traditional production enterprises usually have no direct contact with end customers during production, so diversified customer requirements are hard to satisfy; moreover, existing order acceptance methods do not consider a comprehensive enough set of factors in their modeling, and therefore cannot effectively determine an order acceptance strategy from the enterprise's production capacity and order states.
Disclosure of Invention
In view of these problems, the invention aims to provide an MTO enterprise order accepting method based on average reward reinforcement learning. Beyond the factors considered in the traditional MTO enterprise order acceptance problem, it adds order inventory cost and customer priority factors, constructs the order acceptance model as a semi-Markov decision process, solves it with the SMART (semi-Markov average reward technique) algorithm, and on that basis uses a greedy algorithm to sequence the accepted orders for production, so as to maximize the long-term average profit of the enterprise.
In order to achieve the purpose of the invention, the invention is realized by the following technical scheme: an MTO enterprise order accepting method based on average reward reinforcement learning, comprising the following steps:
Step one: assumption of order information
Suppose an MTO enterprise produces on a single production line and n types of customer orders exist in the market; the order information comprises customer priority μ, price p, quantity Q, unit production cost c, lead time LT and latest delivery time DT;
Step two: determining the system state set
According to step one, if there are n order types in the system, the system state can be represented by the vector S = (μ, p, Q, LT, DT, T), where T denotes the production time still required by the orders accepted before the current decision epoch;
Step three: determining the system action set
According to step one, when a customer order arrives, a decision to accept or reject it must be made; the action set of the model can be represented by the vector A = (a1, a2), where a1 denotes accepting the order and a2 denotes rejecting it;
Step four: determining the immediate return function
After the MTO enterprise decides whether to accept an order, the immediate return function obtained takes the following piecewise form (the three cases are detailed under step four further below):
r(s, a) = I - C - μ × Y - N, if a = a1 and the order can be inserted into the current production plan;
r(s, a) = -(I - C - μ × Y - N), if a = a1 but the order cannot be inserted into the current production plan;
r(s, a) = -μ × J, if a = a2;
where I = p × Q denotes the revenue obtained from the order, C = c × Q the production cost consumed, Y the enterprise's delay penalty cost, N the inventory holding cost incurred, and J the rejection cost of the order;
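By way of illustration, the piecewise immediate return above can be sketched in Python as follows; the function signature, the can_insert flag and the "accept"/"reject" encoding are assumptions made for exposition, not part of the claimed method:

```python
# Illustrative sketch of the immediate return function r(s, a) from step four.
# The signature and the boolean can_insert flag are assumptions made for
# exposition; I, C, Y, N and J follow the definitions given above.
def immediate_return(action: str, can_insert: bool,
                     p: float, Q: float, c: float,
                     mu: float, Y: float, N: float, J: float) -> float:
    I = p * Q                      # revenue obtained from the order
    C = c * Q                      # production cost consumed
    net_profit = I - C - mu * Y - N
    if action == "accept" and can_insert:
        return net_profit          # order fits the current production plan
    if action == "accept":
        return -net_profit         # accepted but not insertable: net profit is lost
    return -mu * J                 # rejection cost weighted by customer priority
```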
Step five: building the order acceptance model
An order acceptance model is constructed as a semi-Markov decision process from the system state set, the system action set and the immediate return function, simulating the real MTO enterprise order acceptance problem under the average reward reinforcement learning framework. According to the Bellman optimality theorem, the optimal policy of the semi-Markov decision process satisfies the optimality equation, which in the standard average-reward form reads:
Q*(s, a) = Σ_{s'} p(s'|s, a) × [ r(s, a, s') - ρ* × t(s, a, s') + max_{a'} Q*(s', a') ]
where ρ* denotes the optimal average reward per unit time and t_m denotes the sojourn time of decision period m in the transition from state s to state s';
Step six: solving the order acceptance model
Taking the reinforcement learning average reward as the evaluation target, the order acceptance model of the semi-Markov decision process is solved by the average reward reinforcement learning SMART algorithm, and within the SMART algorithm a greedy algorithm is used to sequence the accepted orders, yielding the optimal order acceptance decision. The update formula of the average reward reinforcement learning SMART algorithm, in its standard form, is:
Q_{m+1}(s, a) ← (1 - α_m) × Q_m(s, a) + α_m × [ r_m(s, a, s') - ρ_m × t_m(s, a, s') + max_{a'} Q_m(s', a') ]
where α denotes the learning rate, m the current iteration index, r_m(s, a, s') the immediate reward obtained after taking action a in state s, t_m(s, a, s') the sojourn time of the transition from state s to s', R_m the cumulative reward of the m-th decision period, ρ_m = R_m / t_m the average reward of the m-th decision period, and t_m (without arguments) the cumulative time of the m-th decision period.
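As a minimal sketch of this update, assuming a tabular Q stored in a Python dictionary (the names and the storage choice are illustrative):

```python
# One tabular SMART update step; Q is a dict mapping (state, action) -> value.
# alpha_m and rho_m are the current learning rate and average-reward estimate.
def smart_update(Q, s, a, s_next, actions, r, t, alpha_m, rho_m):
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    target = r - rho_m * t + best_next   # reward corrected by avg reward x sojourn time
    Q[(s, a)] = (1.0 - alpha_m) * Q.get((s, a), 0.0) + alpha_m * target
```

The cumulative quantities R_m and t_m, and hence ρ_m = R_m / t_m, are maintained outside this function, as in step 8 of the algorithm listing given later.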
The further improvement lies in that: in step one, customer orders arrive according to a Poisson process with rate parameter λ, and the order price and required quantity are uniformly distributed.
The further improvement lies in that: in step two, since the MTO enterprise's production capacity is limited, T has a finite maximum value, and with n order types the state set S of the system contains n × T states in total.
The further improvement lies in that: in step four, the three cases of r(s, a), from top to bottom, are: when Q(s, a1) > Q(s, a2) and the order can be inserted into the current production plan in the current state, the immediate return equals the net profit obtained by accepting the order; when Q(s, a1) > Q(s, a2) but the order cannot be inserted into the current production plan in the current state, the immediate return equals the lost net profit of the order; when Q(s, a1) < Q(s, a2), the immediate return equals the rejection cost.
The further improvement lies in that: in step four, the delay penalty cost Y = μ × u × {(T + Q/b) - LT}, where u denotes the delay penalty cost per unit time and b denotes the unit production capacity of the enterprise.
The further improvement lies in that: in step four, products completed before the lead time are not picked up by the customer in advance and are temporarily stored in the MTO enterprise's warehouse, incurring an inventory cost N = Q × h × {LT - (T + Q/b)}, where h denotes the storage cost per unit product per unit time.
The further improvement lies in that: in step six, an exploration probability e that decreases as the number of simulation iterations grows is adopted to guarantee convergence of the average reward reinforcement learning SMART algorithm, and α and e decay according to the DCM (Darken-Chang-Moody) scheme, whose standard form is:
α_m = α_0 / (1 + m² / (χ + m)), e_m = e_0 / (1 + m² / (χ + m))
where χ represents an arbitrarily large real number.
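Under the stated DCM scheme, the two decay schedules can be sketched as follows; the initial values α_0 = 0.1 and e_0 = 0.2 match the algorithm listing given later, while χ = 1e6 is an illustrative choice for the arbitrarily large constant:

```python
# Darken-Chang-Moody (DCM) decay for the learning rate alpha and the
# exploration probability e; chi stands in for the arbitrarily large real.
def dcm(value0: float, m: int, chi: float = 1e6) -> float:
    return value0 / (1.0 + m * m / (chi + m))

for m in (0, 10_000, 1_000_000):
    print(m, dcm(0.1, m), dcm(0.2, m))  # both schedules shrink toward 0
```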
The invention has the beneficial effects that: beyond the factors considered in the traditional MTO enterprise order acceptance problem, the invention adds order inventory cost and customer priority factors, constructs the order acceptance model as a semi-Markov decision process, solves it with the SMART algorithm, and on that basis uses a greedy algorithm to sequence the accepted orders for production, so as to maximize the long-term average profit of the enterprise.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of an order acceptance method of the present invention;
FIG. 2 is a diagram of a reinforcement learning order decision interaction of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," "third," "fourth," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
Referring to fig. 1 and 2, the embodiment provides an MTO enterprise order accepting method based on average reward reinforcement learning, comprising the following steps:
Step one: assumption of order information
Suppose an MTO enterprise produces on a single production line and n types of customer orders exist in the market. The order information comprises customer priority μ, price p, quantity Q, unit production cost c, lead time LT and latest delivery time DT. Customer orders arrive according to a Poisson process with rate parameter λ, and the order price and required quantity are uniformly distributed;
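For illustration, the assumed order stream can be simulated as below; all numeric bounds are placeholder assumptions, not values prescribed by the method:

```python
import random

# Illustrative simulation of the assumed order stream: interarrival times are
# exponential (a Poisson arrival process with rate lam); price and required
# quantity are uniformly distributed. All bounds are placeholder assumptions.
def generate_order(lam: float = 0.5, n_types: int = 3) -> dict:
    return {
        "interarrival": random.expovariate(lam),  # time since previous order
        "type": random.randrange(n_types),        # one of n order types
        "mu": random.choice([1, 2, 3]),           # customer priority
        "p": random.uniform(10.0, 20.0),          # unit price
        "Q": random.randint(5, 50),               # required quantity
        "c": 8.0,                                 # unit production cost
        "LT": random.uniform(5.0, 15.0),          # lead time
        "DT": random.uniform(15.0, 30.0),         # latest delivery time
    }
```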
Step two: determining the system state set
According to step one, if there are n order types in the system, the system state can be represented by the vector S = (μ, p, Q, LT, DT, T), where T denotes the production time still required by the orders accepted before the current decision epoch. Since the MTO enterprise's production capacity is limited, T has a finite maximum value, so with n order types the state set S of the system contains n × T states in total;
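To make the n × T count concrete, a hypothetical tabular encoding of the state set, assuming the backlog T is discretized into integer units, could look like:

```python
# Hypothetical tabular encoding of the state set: with n order types and the
# backlog T discretized to integer units 0..t_max-1, the table holds
# n * t_max states, matching the n x T count stated above.
def state_index(order_type: int, t_remaining: int, t_max: int) -> int:
    return order_type * t_max + t_remaining

n_types, t_max = 3, 100
n_states = n_types * t_max   # 300 states in this illustrative configuration
```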
Step three: determining the system action set
According to step one, when a customer order arrives, a decision to accept or reject it must be made; the action set of the model can be represented by the vector A = (a1, a2), where a1 denotes accepting the order and a2 denotes rejecting it;
Step four: determining the immediate return function
After the MTO enterprise decides whether to accept an order, the immediate return function obtained takes the following piecewise form:
r(s, a) = I - C - μ × Y - N, if a = a1 and the order can be inserted into the current production plan;
r(s, a) = -(I - C - μ × Y - N), if a = a1 but the order cannot be inserted into the current production plan;
r(s, a) = -μ × J, if a = a2;
where I = p × Q denotes the revenue obtained from the order, C = c × Q the production cost consumed, Y the enterprise's delay penalty cost, N the inventory holding cost incurred, and J the rejection cost of the order. The three cases of r(s, a), from top to bottom, apply respectively: when Q(s, a1) > Q(s, a2) and the order can be inserted into the current production plan in the current state, the immediate return equals the net profit obtained by accepting the order; when Q(s, a1) > Q(s, a2) but the order cannot be inserted into the current production plan in the current state, the immediate return equals the lost net profit of the order; when Q(s, a1) < Q(s, a2), the immediate return equals the rejection cost. The delay penalty cost is Y = μ × u × {(T + Q/b) - LT}, where u denotes the delay penalty cost per unit time and b denotes the unit production capacity of the enterprise. Products completed before the lead time are not picked up by the customer in advance and are temporarily stored in the MTO enterprise's warehouse, incurring an inventory cost N = Q × h × {LT - (T + Q/b)}, where h denotes the storage cost per unit product per unit time;
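A small worked example with assumed numbers: take μ = 2, u = 1, b = 2, h = 0.5, Q = 10, LT = 6 and current backlog T = 4. The order would complete at T + Q/b = 4 + 10/2 = 9 > LT, so it is late: Y = μ × u × {(T + Q/b) - LT} = 2 × 1 × 3 = 6 and N = 0. If instead T = 0, completion falls at 5 < LT, so Y = 0 and the early output waits in the warehouse: N = Q × h × {LT - (T + Q/b)} = 10 × 0.5 × 1 = 5.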
Step five: building the order acceptance model
An order acceptance model is constructed as a semi-Markov decision process from the system state set, the system action set and the immediate return function, simulating the real MTO enterprise order acceptance problem under the average reward reinforcement learning framework. According to the Bellman optimality theorem, the optimal policy of the semi-Markov decision process satisfies the optimality equation, which in the standard average-reward form reads:
Q*(s, a) = Σ_{s'} p(s'|s, a) × [ r(s, a, s') - ρ* × t(s, a, s') + max_{a'} Q*(s', a') ]
where ρ* denotes the optimal average reward per unit time and t_m denotes the sojourn time of decision period m in the transition from state s to state s';
Step six: solving the order acceptance model
Taking the reinforcement learning average reward as the evaluation target, the order acceptance model of the semi-Markov decision process is solved by the average reward reinforcement learning SMART algorithm, and within the SMART algorithm a greedy algorithm is used to sequence the accepted orders, yielding the optimal order acceptance decision. The update formula of the average reward reinforcement learning SMART algorithm, in its standard form, is:
Q_{m+1}(s, a) ← (1 - α_m) × Q_m(s, a) + α_m × [ r_m(s, a, s') - ρ_m × t_m(s, a, s') + max_{a'} Q_m(s', a') ]
where α denotes the learning rate, m the current iteration index, r_m(s, a, s') the immediate reward obtained after taking action a in state s, t_m(s, a, s') the sojourn time of the transition from state s to s', R_m the cumulative reward of the m-th decision period, ρ_m = R_m / t_m the average reward of the m-th decision period, and t_m (without arguments) the cumulative time of the m-th decision period. Convergence of the average reward reinforcement learning SMART algorithm is guaranteed by an exploration probability e that decreases as the number of simulation iterations grows, with α and e decaying according to the DCM (Darken-Chang-Moody) scheme:
α_m = α_0 / (1 + m² / (χ + m)), e_m = e_0 / (1 + m² / (χ + m))
where χ represents an arbitrarily large real number.
The SMART algorithm flow is as follows:
1. Initialize m, Q_m(s, a), t_m, r_m and ρ_m to 0; set e = 0.2 and α = 0.1, and initialize the pending-production queue order_list = []
2. While m < Maxsteps do
3. Calculate e_m and α_m according to the DCM scheme
4. Generate a random number e_random; if e_m < e_random, select the action a with the largest state-action value Q(s, a); if e_m > e_random, select an action a from the action set at random
5. If a = a1 and Q(s, a1) > Q(s, a2), and the order can be inserted into the current production plan in the current state, then r = I - C - μ × Y - N and the order is added to the pending-production list order_list; if a = a1 and Q(s, a1) > Q(s, a2) but the order cannot be inserted into the current production plan in the current state, then r = -(I - C - μ × Y - N); if a = a2 and Q(s, a1) < Q(s, a2), then r = -μ × J
6. Execute action a to obtain the next state s', r_m(s, a, s') and t_m(s, a, s')
7. Update the state-action value function Q(s, a) by the SMART update formula given above
8. If a non-exploratory (greedy) action was taken, update t_{m+1} ← t_m + t_m(s, a, s'), R_{m+1} ← R_m + r_m(s, a, s') and ρ_{m+1} ← R_{m+1} / t_{m+1}; otherwise t_{m+1} ← t_m, R_{m+1} ← R_m, ρ_{m+1} ← ρ_m
9. When production is scheduled, use the greedy algorithm to select from order_list the order to be produced at the next moment, and delete the selected order from the pending-production queue order_list (see the sketch after this listing)
10. Update the decision stage: m ← m + 1
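The listing above does not specify which criterion the greedy selection in step 9 uses; the sketch below is one plausible reading that always produces next the pending order with the most urgent latest delivery time DT. The selection key is an assumption for illustration only:

```python
# Hedged sketch of step 9: greedily pick the next order to produce from the
# pending queue. The key (earliest DT, ties broken by higher priority mu) is
# an illustrative assumption; the method only states that a greedy algorithm
# sequences the accepted orders.
def pick_next_order(order_list):
    if not order_list:
        return None
    nxt = min(order_list, key=lambda o: (o["DT"], -o["mu"]))
    order_list.remove(nxt)   # delete the chosen order from the pending queue
    return nxt
```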
In summary, the MTO enterprise order accepting method based on average reward reinforcement learning adds order inventory cost and customer priority factors to the factors considered in the traditional MTO enterprise order acceptance problem, constructs the order acceptance model as a semi-Markov decision process, solves it with the SMART algorithm, and on that basis uses a greedy algorithm to sequence the accepted orders for production so as to maximize the long-term average profit of the enterprise. The method therefore has a strong order selection capability and good adaptability to environmental changes, can balance order profits against the various costs to bring the MTO enterprise higher income, and can also satisfy customers' personalized demands and maintain close contact with customers.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (7)
1. An MTO enterprise order accepting method based on average reward reinforcement learning, characterized in that it comprises the following steps:
Step one: assumption of order information
Suppose an MTO enterprise produces on a single production line and n types of customer orders exist in the market; the order information comprises customer priority μ, price p, quantity Q, unit production cost c, lead time LT and latest delivery time DT;
Step two: determining the system state set
According to step one, if there are n order types in the system, the system state can be represented by the vector S = (μ, p, Q, LT, DT, T), where T denotes the production time still required by the orders accepted before the current decision epoch;
Step three: determining the system action set
According to step one, when a customer order arrives, a decision to accept or reject it must be made; the action set of the model can be represented by the vector A = (a1, a2), where a1 denotes accepting the order and a2 denotes rejecting it;
Step four: determining the immediate return function
After the MTO enterprise decides whether to accept an order, the immediate return function obtained takes the following piecewise form:
r(s, a) = I - C - μ × Y - N, if a = a1 and the order can be inserted into the current production plan;
r(s, a) = -(I - C - μ × Y - N), if a = a1 but the order cannot be inserted into the current production plan;
r(s, a) = -μ × J, if a = a2;
where I = p × Q denotes the revenue obtained from the order, C = c × Q the production cost consumed, Y the enterprise's delay penalty cost, N the inventory holding cost incurred, and J the rejection cost of the order;
Step five: building the order acceptance model
An order acceptance model is constructed as a semi-Markov decision process from the system state set, the system action set and the immediate return function, simulating the real MTO enterprise order acceptance problem under the average reward reinforcement learning framework. According to the Bellman optimality theorem, the optimal policy of the semi-Markov decision process satisfies the optimality equation, which in the standard average-reward form reads:
Q*(s, a) = Σ_{s'} p(s'|s, a) × [ r(s, a, s') - ρ* × t(s, a, s') + max_{a'} Q*(s', a') ]
where ρ* denotes the optimal average reward per unit time and t_m denotes the sojourn time of decision period m in the transition from state s to state s';
Step six: solving the order acceptance model
Taking the reinforcement learning average reward as the evaluation target, the order acceptance model of the semi-Markov decision process is solved by the average reward reinforcement learning SMART algorithm, and within the SMART algorithm a greedy algorithm is used to sequence the accepted orders, yielding the optimal order acceptance decision. The update formula of the average reward reinforcement learning SMART algorithm, in its standard form, is:
Q_{m+1}(s, a) ← (1 - α_m) × Q_m(s, a) + α_m × [ r_m(s, a, s') - ρ_m × t_m(s, a, s') + max_{a'} Q_m(s', a') ]
where α denotes the learning rate, m the current iteration index, r_m(s, a, s') the immediate reward obtained after taking action a in state s, t_m(s, a, s') the sojourn time of the transition from state s to s', R_m the cumulative reward of the m-th decision period, ρ_m = R_m / t_m the average reward of the m-th decision period, and t_m (without arguments) the cumulative time of the m-th decision period.
2. The MTO enterprise order accepting method based on average reward reinforcement learning according to claim 1, wherein: in step one, customer orders arrive according to a Poisson process with rate parameter λ, and the order price and required quantity are uniformly distributed.
3. The MTO enterprise order accepting method based on average reward reinforcement learning according to claim 1, wherein: in step two, since the MTO enterprise's production capacity is limited, T has a finite maximum value, and with n order types the state set S of the system contains n × T states in total.
4. The MTO enterprise order accepting method based on average reward reinforcement learning according to claim 1, wherein: in step four, the three cases of r(s, a), from top to bottom, are: when Q(s, a1) > Q(s, a2) and the order can be inserted into the current production plan in the current state, the immediate return equals the net profit obtained by accepting the order; when Q(s, a1) > Q(s, a2) but the order cannot be inserted into the current production plan in the current state, the immediate return equals the lost net profit of the order; when Q(s, a1) < Q(s, a2), the immediate return equals the rejection cost.
5. The MTO enterprise order accepting method based on average reward reinforcement learning according to claim 1, wherein: in step four, the delay penalty cost Y = μ × u × {(T + Q/b) - LT}, where u denotes the delay penalty cost per unit time and b denotes the unit production capacity of the enterprise.
6. The MTO enterprise order accepting method based on average reward reinforcement learning according to claim 1, wherein: in step four, products completed before the lead time are not picked up by the customer in advance and are temporarily stored in the MTO enterprise's warehouse, incurring an inventory cost N = Q × h × {LT - (T + Q/b)}, where h denotes the storage cost per unit product per unit time.
7. The MTO enterprise order accepting method based on average reward reinforcement learning according to claim 1, wherein: in step six, an exploration probability e that decreases as the number of simulation iterations grows is adopted to guarantee convergence of the average reward reinforcement learning SMART algorithm, and α and e decay according to the DCM scheme:
α_m = α_0 / (1 + m² / (χ + m)), e_m = e_0 / (1 + m² / (χ + m))
where χ represents an arbitrarily large real number.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110468897.0A CN113077188B (en) | 2021-04-28 | 2021-04-28 | MTO enterprise order accepting method based on average reward reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110468897.0A CN113077188B (en) | 2021-04-28 | 2021-04-28 | MTO enterprise order accepting method based on average reward reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113077188A | 2021-07-06
CN113077188B | 2022-11-08
Family
ID=76619029
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110468897.0A Active CN113077188B (en) | 2021-04-28 | 2021-04-28 | MTO enterprise order accepting method based on average reward reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113077188B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113935586A (en) * | 2021-09-16 | 2022-01-14 | 杭州电子科技大学 | Cloud order dynamic receiving and scheduling method based on deep reinforcement learning |
CN118278693A (en) * | 2024-04-18 | 2024-07-02 | 暨南大学 | Industrial big data-based batch decision making method, service platform and medium for batch production system economy |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103246950A (en) * | 2012-10-30 | 2013-08-14 | 中国科学院沈阳自动化研究所 | Method for promising order of semiconductor assembly and test enterprise |
CN103927628A (en) * | 2011-08-16 | 2014-07-16 | 上海交通大学 | Order management system and order management method oriented to customer commitments |
CN110517002A (en) * | 2019-08-29 | 2019-11-29 | 烟台大学 | Production control method based on intensified learning |
CN111080408A (en) * | 2019-12-06 | 2020-04-28 | 广东工业大学 | Order information processing method based on deep reinforcement learning |
CN111126905A (en) * | 2019-12-16 | 2020-05-08 | 武汉理工大学 | Casting enterprise raw material inventory management control method based on Markov decision theory |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103927628A (en) * | 2011-08-16 | 2014-07-16 | 上海交通大学 | Order management system and order management method oriented to customer commitments |
CN103246950A (en) * | 2012-10-30 | 2013-08-14 | 中国科学院沈阳自动化研究所 | Method for promising order of semiconductor assembly and test enterprise |
CN110517002A (en) * | 2019-08-29 | 2019-11-29 | 烟台大学 | Production control method based on intensified learning |
CN111080408A (en) * | 2019-12-06 | 2020-04-28 | 广东工业大学 | Order information processing method based on deep reinforcement learning |
CN111126905A (en) * | 2019-12-16 | 2020-05-08 | 武汉理工大学 | Casting enterprise raw material inventory management control method based on Markov decision theory |
Non-Patent Citations (2)
Title |
---|
Wang Xiaohuan et al.: "Order acceptance policy for make-to-order enterprises based on reinforcement learning", Systems Engineering - Theory & Practice *
Hao Juan et al.: "Order acceptance policy of make-to-order enterprises based on average reinforcement learning", Journal of Computer Applications *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113935586A (en) * | 2021-09-16 | 2022-01-14 | 杭州电子科技大学 | Cloud order dynamic receiving and scheduling method based on deep reinforcement learning |
CN118278693A (en) * | 2024-04-18 | 2024-07-02 | 暨南大学 | Industrial big data-based batch decision making method, service platform and medium for batch production system economy |
Also Published As
Publication number | Publication date |
---|---|
CN113077188B (en) | 2022-11-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113077188B (en) | MTO enterprise order accepting method based on average reward reinforcement learning | |
Lee et al. | A multiagent approach to Q-learning for daily stock trading | |
CN109636011A (en) | A kind of multishift operation plan scheduling method based on improved change neighborhood genetic algorithm | |
CN108550090A (en) | A kind of processing method and system of determining source of houses pricing information | |
CN109816315A (en) | Path planning method and device, electronic equipment and readable storage medium | |
WO2018161908A1 (en) | Product object processing method and device, storage medium and electronic device | |
CN110555578B (en) | Sales prediction method and device | |
CN116207739B (en) | Optimal scheduling method and device for power distribution network, computer equipment and storage medium | |
CN110046761A (en) | A kind of ethyl alcohol inventory's Replenishment Policy based on multi-objective particle | |
CN109961198A (en) | Related information generation method and device | |
CN109741083A (en) | A kind of material requirement weight predicting method based on enterprise MRP | |
KR102707077B1 (en) | Demand response management method for discrete industrial manufacturing system based on constrained reinforcement learning | |
CN111507673A (en) | Method and device for managing commodity inventory | |
CN110110226A (en) | A kind of proposed algorithm, recommender system and terminal device | |
CN115965169A (en) | Path planning method, intelligent device and computer readable storage medium | |
CN110888728B (en) | Task scheduling method of button cluster server | |
CN115334106A (en) | Microgrid transaction consensus method and system based on Q method and power grid detection and evaluation | |
CN117236824B (en) | Logistics scheduling method for agricultural product online transaction platform | |
CN113592240A (en) | Order processing method and system for MTO enterprise | |
Shakya et al. | A Deep Reinforcement Learning Approach for Inventory Control under Stochastic Lead Time and Demand | |
CN112950033A (en) | Reservoir dispatching decision method and system based on reservoir dispatching rule synthesis | |
CN115841286A (en) | Takeout delivery path planning method based on deep reinforcement learning | |
CN102542432B (en) | Inventory management system and method | |
CN110210885A (en) | Excavate method, apparatus, equipment and the readable storage medium storing program for executing of potential customers | |
CN110414875A (en) | Capacity data processing method, device, electronic equipment and computer-readable medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||