CN113837628B - Metallurgical industry workshop crown block scheduling method based on deep reinforcement learning - Google Patents

Metallurgical industry workshop crown block scheduling method based on deep reinforcement learning

Info

Publication number
CN113837628B
CN113837628B (application CN202111142373.9A; published as CN113837628A)
Authority
CN
China
Prior art keywords
crown block
task
reinforcement learning
overhead traveling
crown
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111142373.9A
Other languages
Chinese (zh)
Other versions
CN113837628A (en)
Inventor
冯凯
张云贵
马湧
梁青艳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Iron and Steel Research Institute Group
Original Assignee
China Iron and Steel Research Institute Group
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Iron and Steel Research Institute Group filed Critical China Iron and Steel Research Institute Group
Publication of CN113837628A publication Critical patent/CN113837628A/en
Application granted granted Critical
Publication of CN113837628B publication Critical patent/CN113837628B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0631Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06313Resource planning in a project environment
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/08Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
    • G06Q10/083Shipping
    • G06Q10/0838Historical data

Abstract

The invention discloses a deep-reinforcement-learning-based method for scheduling overhead cranes (crown blocks) in metallurgical industry workshops, belonging to the technical field of workshop crane scheduling. The method comprises the following steps: (1) acquiring the spatial layout of the bay (span area) in which the cranes operate in a metallurgical workshop, together with a table of historical crane transport tasks; (2) building a deep reinforcement learning model for crane scheduling from the bay layout, with each crane acting as an agent and the bay space as the environment; (3) optimizing the parameters of the model and training it on the historical transport-task table; (4) periodically collecting the current positions and states of the cranes in the bay and the status of running and pending transport tasks, encoding them as an environment state, feeding this state into the trained model, and generating a crane scheduling scheme. For transport tasks that arise at random or change at short notice in a metallurgical workshop, the invention produces a globally optimized scheduling scheme in time, improves crane scheduling efficiency, and shows strong robustness and effectiveness.

Description

Method for dispatching shop crown blocks in metallurgical industry based on deep reinforcement learning
Technical Field
The invention belongs to the technical field of workshop overhead crane scheduling, and in particular relates to a method that uses deep reinforcement learning to generate crane scheduling schemes in the face of uncertain crane transport tasks.
Background
In metallurgical enterprises, whether in smelting shops or in storage shops, overhead cranes (crown blocks) are the most important means of transport. Crane scheduling is an important part of production management, and its efficiency largely determines the efficiency of production logistics and how well successive process steps connect and match. Reasonable crane scheduling is therefore essential to the overall performance of a metallurgical enterprise. Because of unplanned factors in the metallurgical production process, such as time fluctuations in smelting operations or transport-plan changes caused by equipment failures, crane transport tasks in metallurgical workshops carry a degree of uncertainty. Faced with uncertain transport tasks, crane dispatchers can usually only improvise a scheduling scheme from accumulated experience or a fixed set of rules. Such schemes inevitably suffer from unbalanced crane loads, frequent conflicts along transport routes, and low overall scheduling efficiency.
Deep reinforcement learning is a framework for solving complex sequential decision problems. By learning through trial and error on historical crane tasks and by sensing spatial information in the workshop in real time, it can provide a dynamic, optimized crane scheduling scheme for uncertain transport tasks. How to use deep reinforcement learning to achieve fast and efficient crane scheduling in the actual production process is a problem that requires further research.
Disclosure of Invention
The invention aims to provide a deep-reinforcement-learning-based method for scheduling overhead cranes in metallurgical industry workshops, addressing the problem that an optimized crane scheduling scheme cannot be obtained when transport tasks are uncertain. The method generates crane scheduling schemes with deep reinforcement learning, effectively solves the optimized scheduling problem for uncertain transport tasks, improves crane scheduling efficiency, reduces the probability of conflicts along crane transport paths, and enables transport tasks to be completed efficiently.
The metallurgical industry workshop overhead crane scheduling method based on deep reinforcement learning disclosed by the invention comprises the following steps:
(1) Acquire the spatial layout of the bay (span area) in which the cranes operate in the metallurgical workshop and a table of historical crane transport tasks;
(2) According to the bay layout, create a deep reinforcement learning model with each crane as an agent and the bay space as the environment;
(3) Optimize the parameters of the deep reinforcement learning model and train it on the historical crane transport-task table;
(4) At a fixed time interval, collect the current positions and states of the cranes in the bay and the status of the transport tasks being executed and waiting to be executed, and generate an environment state;
(5) Input the environment state into the trained deep reinforcement learning model, output the operation to be executed by each crane, and generate the current crane scheduling scheme; match cranes to transport tasks and assign their transport paths according to this scheme, then return to step (4) until all transport tasks in the bay are completed.
Further, in step (1), the spatial layout of the bay in which the cranes operate refers to the top view of a particular bay in a steelmaking workshop; the data to be acquired include the length and width of the bay and the transverse and longitudinal distances of every station in the bay from the bay edges.
Further, in step (1), the historical crane transport-task table includes, for each task, a task number, a start station, a target station, a start time and an end time.
Further, in step (2), the deep reinforcement learning model uses the DQN algorithm to build an "action-environment state-reward" feedback framework. Each crane in the bay is abstracted as an individual agent and the bay itself as the environment; an agent's actions are crane operations, and the state it observes consists of the task information and the state information of all cranes in the bay. The environment state comprises the positions of the task start and end stations within the bay and the positions and states of all cranes in the same bay. The reward function defines the reward obtained when a crane performs each operation under empty and loaded conditions, and this reward is fed back to the agent as an immediate reward.
Further, in step (2), the environment state is expressed as a 3 × N matrix, where the number of columns N is a positive integer and the bay space is discretized into N positions. The first, second and third rows of the matrix hold the relative positions within the bay of the task start stations, the cranes and the task end stations, respectively. When a transport task is generated, the cells corresponding to its start and end stations take the task number as their value, and the cell in the second row corresponding to a crane takes the crane number as its value. Cells in the first and third rows with no task, and cells in the second row with no crane, are 0. After a crane hoists a task, the start-station cell becomes 0 and the crane cell becomes a combination of the crane number (integer part) and the task number (decimal part). When the crane sets the task down, the end-station cell becomes 0 and the crane cell reverts to the crane number.
Further, in step (2), the crane operations comprise 5 actions: move left, move right, stay, hoist the ladle and set down the ladle, denoted by 0, 1, 2, 3 and 4, respectively.
Compared with the prior art, the invention has the following advantages and positive effects. A deep reinforcement learning model for crane scheduling is constructed and trained on historical crane transport tasks: each crane in the bay is abstracted as an individual agent, task information and crane positions and states are expressed as the environment state, and the scheduling model is trained with DQN, yielding a real-time crane scheduling strategy. The resulting scheduling model can promptly produce a globally optimized scheduling scheme for transport tasks that arise at random or change at short notice, and meets the efficiency requirements of online application. On top of improving crane scheduling efficiency, the method preserves the robustness and effectiveness of the scheduling model in actual production scenarios.
Drawings
Fig. 1 is a schematic flow diagram of a method for scheduling a crown block in a metallurgical industry workshop based on deep reinforcement learning according to an embodiment of the present invention;
FIG. 2 is the environment-state matrix for the current crane positions and tasks in an embodiment of the present invention;
FIG. 3 is the environment-state matrix after a crane has hoisted a task in an embodiment of the present invention;
Fig. 4 shows the training process for two cranes in an embodiment of the present invention.
Detailed Description
To make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
As shown in fig. 1, the method for scheduling overhead cranes in a steelmaking workshop based on deep reinforcement learning according to an embodiment of the present invention includes the following five steps; the implementation of each step is described below.
Step 1, obtaining the spatial layout of the bay in which the cranes operate in the metallurgical workshop, together with the historical crane transport-task table.
In this embodiment, for the bay in which the cranes actually operate, the length and width of the bay and the transverse and longitudinal distances of every station from the bay edges are collected. The bay layout refers to the top view of a particular bay in a steelmaking workshop, such as the raw-material bay, the molten-steel receiving bay or the refining bay.
The historical crane transport-task table includes the task number, start station, target station, task start time and task end time.
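As an illustration of the data gathered in step 1, the bay layout and the historical task table can be held in simple records. The following is a minimal Python sketch; the class names, field names and numeric values are assumptions for demonstration and are not defined by the patent.

```python
# Illustrative containers for the bay layout and the historical task table.
# All names and values here are assumed for demonstration purposes only.
from dataclasses import dataclass

@dataclass
class BayLayout:
    length_m: float      # bay length in metres
    width_m: float       # bay width in metres
    stations: dict       # station name -> longitudinal distance from the bay edge (m)

@dataclass
class TransportTask:
    task_id: int
    start_station: str
    target_station: str
    start_time: float
    end_time: float

# Example records mirroring Table 1 (hypothetical values)
layout = BayLayout(length_m=210.0, width_m=30.0,
                   stations={"A": 12.0, "B": 35.0, "C": 180.0, "D": 48.0})
history = [
    TransportTask(1, "A", "B", 0.0, 310.0),
    TransportTask(2, "B", "C", 120.0, 540.0),
    TransportTask(3, "C", "D", 300.0, 760.0),
]
```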
Step 2, creating the deep reinforcement learning model framework according to the bay layout, with each crane as an agent and the bay space as the environment.
The deep reinforcement learning model framework established by the invention is an "action-environment state-reward" feedback framework designed with the DQN (Deep Q-Network) algorithm. Specifically, each crane is abstracted as an individual agent and the bay in which it operates as the environment state; an agent's actions are crane operations, and the state it observes consists of the task information and the state information of all cranes in the bay. A reward function is designed to guide the cranes to complete their transport tasks efficiently.
The environment state comprises the positions of the task start and end stations within the bay and the positions and states of all cranes in the same bay, and is represented as a 3 × N matrix. The first, second and third rows hold the relative positions of the task start stations, the cranes and the task end stations, respectively. N is a positive integer: the bay is divided into N positions according to its spatial size and the distribution of stations. Typically every 5 to 10 m of bay length is mapped to one position, and different scaling ratios can be chosen for specific application scenarios.
In the embodiment of the invention, the bay layout is converted into a 3 × 30 matrix by scaling the relative positions proportionally, and this matrix represents the environment state of the deep reinforcement learning model; the bay length equals 30 columns multiplied by the scaling factor. The 3 × 30 environment-state matrix directly reflects the station information, the crane states, and the running and pending tasks in the bay.
As shown in fig. 2, the first, second and third rows of the matrix hold the relative positions of the task start stations, the cranes and the task end stations in the bay. When a task is generated, the cells corresponding to its start and end stations take the task number (greater than 0) as their value, and cells with no task are 0. In fig. 2 there are 3 cranes in the bay: crane 1 at position 3, crane 2 at position 6 and crane 3 at position 26, all idle in the current state. There are two tasks in the current environment state: task 1 starts at position 2 and ends at position 5, and task 2 starts at position 26 and ends at position 7. After a crane hoists a task, the start-station cell becomes 0 and the crane cell becomes a combination of the crane number (integer part) and the task number (decimal part), as shown in fig. 3: crane 1 has moved to position 2 and hoisted task 1, so its cell value becomes 1.1. Setting a task down is the reverse process; for example, when crane 1 sets down task 1, it has moved to position 5, the end-station cell of task 1 becomes 0, and the crane cell reverts to 1.
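The encoding of Figs. 2 and 3 can be reproduced with a short NumPy sketch. The matrix layout and the integer/decimal value convention follow the description above; the hoist helper is an illustrative assumption and, like the example in the text, presumes single-digit task numbers.

```python
# Sketch of the 3 x 30 environment-state matrix: row 0 = task start stations,
# row 1 = crane positions, row 2 = task end stations (positions are 1-based in the text).
import numpy as np

N = 30
state = np.zeros((3, N))

# Idle situation of Fig. 2: cranes 1, 2, 3 at positions 3, 6, 26
state[1, 3 - 1], state[1, 6 - 1], state[1, 26 - 1] = 1, 2, 3
# Task 1 runs from position 2 to 5, task 2 from position 26 to 7
state[0, 2 - 1], state[2, 5 - 1] = 1, 1
state[0, 26 - 1], state[2, 7 - 1] = 2, 2

def hoist(state, crane_no, task_no, crane_pos):
    """Crane picks a task up: the start-station cell is cleared and the crane cell
    becomes crane_no + task_no/10 (e.g. 1.1 for crane 1 carrying task 1).
    The decimal encoding assumes single-digit task numbers."""
    start_col = np.flatnonzero(state[0] == task_no)[0]
    state[0, start_col] = 0
    state[1][state[1] == crane_no] = 0        # clear the crane's previous cell
    state[1, crane_pos - 1] = crane_no + task_no / 10

hoist(state, crane_no=1, task_no=1, crane_pos=2)   # reproduces the 1.1 entry of Fig. 3
```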
At the same time, the invention standardizes the crane operations into 5 types: move left, move right, stay, hoist the ladle and set down the ladle, represented by 0, 1, 2, 3 and 4 in that order.
The parameters of the deep reinforcement learning model comprise the Q-network structure, the model factors and the reward function. The Q-network structure parameters include the number of convolution layers, the number of fully connected layers, the convolution kernel size, the number of neurons in the fully connected layers and the activation function; the model factors include the reward discount factor, the experience-pool size, the exploration rate and the learning rate; the reward function R specifies the reward for hoisting, setting down, staying and moving under empty and loaded conditions.
The reward function R is the environment's evaluation rule for the action the agent currently takes; it is fed back to the agent as an immediate reward and is the key guiding signal from which the agent learns and improves its policy. In general, the scheduling objective of the cranes is to minimize the total time needed to complete all tasks, so each action a crane performs additionally reduces the reward by 0.5. The cranes therefore have to complete all tasks with as few actions as possible, which minimizes the total completion time.
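A hedged sketch of such a reward function is shown below. The per-action cost of 0.5 comes from the text; all other values are illustrative assumptions, since the concrete settings of Table 2 are only available as an image in the original publication.

```python
# Sketch of the reward function R; only the -0.5 per-action cost is taken from the
# text, all other values are assumed for illustration.
LEFT, RIGHT, STAY, HOIST, DROP = 0, 1, 2, 3, 4
STEP_COST = -0.5                    # every executed action reduces the reward by 0.5

def reward(action, loaded, at_start_station=False, at_end_station=False):
    r = STEP_COST
    if action == HOIST and not loaded and at_start_station:
        r += 10.0                   # assumed bonus for a valid pick-up
    elif action == DROP and loaded and at_end_station:
        r += 10.0                   # assumed bonus for a valid delivery
    elif action in (HOIST, DROP):
        r -= 5.0                    # assumed penalty for an invalid hoist/drop
    return r
```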
The action-value network (Q network) in the DQN algorithm of the invention is written as Q^π(s, a), in parameterized form Q^π(s, a | θ), where s is the environment state, a is the agent's action and θ are the Q-network parameters. The optimal policy of the action-value network is solved iteratively as

$$\theta_{k+1} = \theta_k + \alpha\Big[r_t + \gamma \max_{a_{t+1}} Q^{\pi}\big(s_{t+1}, a_{t+1}\,\big|\,\theta_k^-\big) - Q^{\pi}\big(s_t, a_t\,\big|\,\theta_k\big)\Big]\,\nabla_{\theta} Q^{\pi}\big(s_t, a_t\,\big|\,\theta_k\big)$$

$$\pi^{*}(a\mid s) = \arg\max_{a} Q^{\pi}(s, a\mid\theta)$$

wherein θ_{k+1} are the Q-network parameters at iteration k+1; θ_k the Q-network parameters at iteration k; α the learning rate, in the range 0 to 1; r_t the immediate reward at time t, computed from the reward function R; γ the reward discount factor, in the range 0 to 1; Q^π(s_t, a_t | θ_k) the Q network at time t in iteration k, with s_t the environment state and a_t the agent action at time t; Q^π(s_{t+1}, a_{t+1} | θ_k^-) the target Q network at the next time step in iteration k, with s_{t+1}, a_{t+1} the environment state and action at the next time step; θ_k^- the target-Q-network parameters in iteration k; and ∇_θ Q^π(s_t, a_t | θ_k) the gradient of the Q network at time t in iteration k with respect to the parameters θ.
π*(a|s) is the agent action found from the Q-network output that maximizes the total reward of the action sequence.
In every iteration, all cranes in the bay complete all transport tasks. From the start time to the end time of an iteration, the Q network of each crane takes the environment state s_t at the current time, outputs the action a_t to execute, and receives the immediate reward r_t; after the iteration is finished, the total task-completion time and the total reward are obtained.
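The update above is the standard DQN semi-gradient step, with the bracketed temporal-difference error scaling the gradient of the online Q network. A minimal PyTorch sketch is given below; the network, optimizer and batch objects are assumptions, and the batch is taken from the memory pool described later.

```python
# Minimal sketch of one DQN update; q_net, target_net and optimizer are assumed objects.
import torch
import torch.nn.functional as F

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.8):
    s, a, r, s_next, done = batch                            # tensors sampled from the memory pool
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)     # Q(s_t, a_t | theta_k)
    with torch.no_grad():                                    # target-network parameters held fixed
        td_target = r + gamma * target_net(s_next).max(dim=1).values * (1.0 - done)
    loss = F.mse_loss(q_sa, td_target)                       # squared TD error
    optimizer.zero_grad()
    loss.backward()                                          # gradient w.r.t. theta through Q(s_t, a_t)
    optimizer.step()                                         # parameter step theta_k -> theta_{k+1}
    return loss.item()
```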
Step 3, performing parameter optimization and training on the deep reinforcement learning model.
In this embodiment, the historical crane transport-task table includes the task number, start station, target station, start time and end time, as shown in Table 1. Task numbers are generated automatically when transport tasks are assigned in the actual production process, and the historical task list is sorted by task number in ascending order.
TABLE 1 Historical crown block transportation task table

Task number | Starting time | Completion time | Initial station | Destination station
001         | St001         | Et001           | Station A       | Station B
002         | St002         | Et002           | Station B       | Station C
003         | St003         | Et003           | Station C       | Station D
In this embodiment, the deep reinforcement learning model is trained on the historical crane transport-task data. The training process for two cranes is shown in fig. 4. The case of more than two cranes is analogous: each crane is an agent with the same structure, and at each time t the agents are trained alternately in turn. The environment state observed by the next crane reflects the operation just executed by the previous crane. The immediate rewards of all cranes at time t are summed into a composite immediate reward and stored in each crane's memory pool, which is used to update its Q network.
In fig. 4, (1) to (9) show the sequence of the training steps for the two crane Q networks. In step (1), crane 1 observes the environment state s_t^1 at time t. In step (2), the Q network of crane 1 takes s_t^1 as input and outputs the action a_t^1 of crane 1 at time t. In step (3), crane 1 stores its current environment state and action, together with the environment state s_t^2 currently observed by crane 2, in the memory pool of crane 1. In step (4), the Q network of crane 2 obtains the environment state s_t^2 at the current time t. In step (5), the Q network of crane 2 outputs its current action a_t^2. In step (6), crane 2 stores its current environment state and action, together with the environment state s_{t+1}^1 observed by crane 1 at the next time step, in the memory pool of crane 2; s_{t+1}^1 is the environment state observed by crane 1 at time t+1. In step (7), the composite immediate reward of the two Q-network outputs of crane 1 and crane 2 is computed from the environment feedback: r_t^1 is the reward of crane 1 at time t, r_t^2 the reward of crane 2 at time t, and the composite immediate reward of the two cranes at time t is r_t = r_t^1 + r_t^2. In step (8), the computed composite immediate reward is stored in each crane's memory pool. In step (9), the weights over all possible action combinations of the cranes are updated according to the composite immediate reward. During the iterations, experience is periodically sampled from the memory pools to update the Q networks.
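The alternating sequence (1) to (9) can be summarized in a short sketch. The environment interface (observe, apply, immediate_reward) and the agent and memory objects are assumptions used only to make the step order explicit.

```python
# Sketch of one alternating training step for two cranes, following steps (1)-(9) of Fig. 4.
# env, agents and memories are assumed objects; env.observe returns the state matrix seen
# by the given crane, env.apply executes an action, env.immediate_reward evaluates it.
def train_step(env, agents, memories):
    s1 = env.observe(crane=0)                      # (1) crane 1 observes s_t^1
    a1 = agents[0].act(s1)                         # (2) crane 1's Q network outputs a_t^1
    env.apply(crane=0, action=a1)
    s2 = env.observe(crane=1)                      # (4) crane 2 observes s_t^2 after crane 1 acted
    a2 = agents[1].act(s2)                         # (5) crane 2's Q network outputs a_t^2
    env.apply(crane=1, action=a2)
    s1_next = env.observe(crane=0)                 # (6) state seen by crane 1 at t+1
    r_total = env.immediate_reward(crane=0) + env.immediate_reward(crane=1)  # (7) r_t = r_t^1 + r_t^2
    memories[0].append((s1, a1, r_total, s2))      # (3)+(8) crane 1's memory pool
    memories[1].append((s2, a2, r_total, s1_next)) # (6)+(8) crane 2's memory pool
    # (9) each Q network is periodically updated from experience sampled from its memory pool
```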
Step 4, obtaining the optimized deep reinforcement learning model.
In this embodiment, the parameters of the deep reinforcement learning model, including the Q-network structure, the model factors and the reward function, are adjusted and optimized according to the training results on the historical transport-task data. The optimized result is as follows:
the Q network structure is composed of 2 convolutional layers and 2 fully-connected layers, wherein the convolutional cores of the convolutional layers are 3x3, the neurons of the fully-connected layers are 2048, and the ReLU function is adopted as an activation function. In the model factors, the learning rate is 0.0005, the discount factor is 0.8, the size of the experience pool is 100 ten thousand, and the exploration rate is 0.3. The bonus function settings are shown in table 2.
Table 2 Reward function setting table
(The table is provided as an image in the original publication; it lists the reward value for each crane operation, i.e. hoisting, setting down, staying and moving, under no-load and full-load conditions.)
Step 5, at a fixed time interval, acquiring the current positions and states of the cranes in the bay and the status of the transport tasks being executed and waiting to be executed, generating the corresponding environment state, inputting it into the trained deep reinforcement learning model, and outputting the current crane scheduling scheme, namely the action each crane should execute; cranes are then matched to the pending transport tasks according to the scheduling scheme and their transport paths are assigned.
In this embodiment, in practical application the trained and optimized crane-scheduling deep reinforcement learning model is first deployed. Second, the positions and states of the cranes in the bay at the current moment and the start and target stations of the transport tasks being executed and waiting to be executed are collected. Third, the environment state of the model is updated with the information obtained in the second step; after reading the environment state, the deep reinforcement learning model generates a crane scheduling scheme, and by updating the crane positions and the hoist/lower operations in the environment state it indicates which crane is assigned to each pending task and how that crane has to move to carry out the transport. When indicating crane movements, the crane's travel track and possible avoidance or waiting are taken into account.
On the time line, the second and third steps of this process are therefore repeated at the fixed time interval until all transport tasks in the bay are completed, and the deep reinforcement learning model dynamically generates optimized crane scheduling schemes for transport tasks that arise at random or change at short notice.
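A hedged sketch of this online loop is given below; the plant interface, the build_state and dispatch callbacks and the polling interval are assumptions used only to show the shape of the loop.

```python
# Sketch of the online dispatching loop of step 5. `plant` is an assumed interface to the
# bay; `build_state` encodes the 3 x N state matrix and `dispatch` issues crane commands.
import time

def run_dispatcher(model, plant, build_state, dispatch, interval_s=5.0):
    while plant.has_open_tasks():                     # until all tasks in the bay are finished
        snapshot = plant.read_positions_and_tasks()   # crane positions/states, running and pending tasks
        state = build_state(snapshot)                 # 3 x N environment-state matrix
        actions = model.select_actions(state)         # one operation (0..4) per crane
        dispatch(plant, actions)                      # assign cranes to tasks and issue movements
        time.sleep(interval_s)                        # repeat at the fixed time interval
```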
The prior art provides reinforcement-learning-based crane-scheduling simulation methods, which require a large number of steps such as agent modelling and the construction of rules and strategies and are not suited to practical application scenarios. Existing simulation methods can only simulate and compare a number of scheduling schemes for a task set fixed at a given moment and then select the best one. Under actual production conditions tasks are generated continuously, and if a large number of scheduling schemes had to be generated, compared and selected each time, the efficiency requirements of online application clearly could not be met. Compared with the prior art, the crane scheduling method based on a deep reinforcement learning model saves a large number of steps such as agent modelling and the construction of rules and strategies, is generic with respect to the applicable scenarios, and can be used directly to build and operate a practical crane scheduling model. The method trains the crane scheduling model on historical tasks; the trained, optimized model can be transplanted directly into the crane scheduling system and delivers a scheduling scheme in a single run, meeting the requirements of online application without renewed training or simulation-based comparison.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (8)

1. A metallurgical industry workshop overhead traveling crane scheduling method based on deep reinforcement learning is characterized by comprising the following steps:
step 1, acquiring the spatial layout of a cross region where a crown block is located in a metallurgical workshop and a historical crown block transportation task data table;
the transportation task data comprises a task number, a starting station, a target station, a starting time and an ending time;
step 2, according to the cross-region space layout, a crown block is used as an intelligent agent, the cross-region space is used as an environment, and a deep reinforcement learning model is created;
the deep reinforcement learning model adopts a DQN algorithm to design a feedback mechanism frame of 'action-environment state-reward', each crown block in a cross region is abstracted into a single intelligent body, the cross region where the crown block is located is abstracted into an environment state, the action of the intelligent body is the operation action of the crown block, and the states observed by the intelligent body are task information and state information of all crown blocks in the cross region; the environment state comprises the positions of a task starting station and a task finishing station in a bay area, and the positions and the states of all crown blocks in the same bay area; the set reward function is a reward value when the crown block executes different operations under the conditions of no load and full load, and the reward value is fed back to the intelligent agent in an immediate reward mode;
step 3, performing parameter optimization and training on the deep reinforcement learning model according to a historical crown block transportation task data table;
step 4, acquiring the position and the state of the current crown block in the cross-region at regular time, and transportation task data which are being executed and are to be executed, and generating an environment state;
and 5, inputting the environmental state into the trained deep reinforcement learning model, outputting the operation action to be executed by each overhead traveling crane, generating a current overhead traveling crane dispatching scheme, matching the overhead traveling cranes for the transportation tasks according to the current overhead traveling crane dispatching scheme, and designating the transportation paths of the overhead traveling cranes, and continuing to execute the step 4 until all the transportation tasks in the cross-region are completed.
2. The method for dispatching the crown blocks in the metallurgical industry workshop based on the deep reinforcement learning as claimed in claim 1, wherein the spatial layout of the bay where the crown block is located obtained in step 1 refers to obtaining an overhead layout of the bay in the steelmaking workshop, obtaining the length and width of the bay, and obtaining the relative distance between all stations in the bay and the edge of the bay in the transverse direction and the longitudinal direction.
3. The method for scheduling the crown block in the metallurgical industry workshop based on the deep reinforcement learning as claimed in claim 1, wherein in the step 2, the environmental state is expressed as a 3x N matrix; the column number N of the matrix is a positive integer, and the trans-regional space is represented as N positions; the first, second and third rows of the matrix are the relative positions of a task starting station, a crown block and a task end station in a cross area respectively; when a crown block transportation task is generated, the value of the corresponding position of the starting station and the terminal station in the matrix is the serial number of the task; the position value of the crown block corresponding to the second row of the matrix is the number of the crown block; the position values of the first row and the third row of the matrix without tasks are 0, and the position value of the second row of the matrix without overhead travelling cranes is 0; after the overhead traveling crane hoists the task, the position value of the starting station of the task becomes 0, the position value of the overhead traveling crane becomes the combination of the number of the overhead traveling crane as an integer and the number of the task as a decimal; when the crane puts down the task, the position value of the station at the task end point becomes 0, and the position value of the crown block becomes the number of the crown block.
4. The method for dispatching the crown blocks in the metallurgical industry workshop based on the deep reinforcement learning as claimed in claim 1, wherein in the step 2, the operating actions of the crown blocks are specified to be 5, namely, left-moving, right-moving, static, hoisting and ladle-placing, which are sequentially represented by 0,1,2,3,4.
5. The method for dispatching the crown blocks in the metallurgical industry workshop based on the deep reinforcement learning as claimed in claim 1 or 4, wherein the reward function comprises reward values of the crown blocks during carrying out ladle lifting, ladle placing, standing and moving under the conditions of no load and full load.
6. The metallurgical industry workshop overhead traveling crane dispatching method based on deep reinforcement learning of claim 5, wherein in the reward function the reward value is additionally reduced by 0.5 each time a crane performs an action.
7. The method for dispatching overhead traveling cranes in a metallurgical industry workshop based on deep reinforcement learning according to claim 1, wherein in step 2 the optimal policy of the action-value network (Q network) of each crane is solved iteratively as follows:

$$\theta_{k+1} = \theta_k + \alpha\Big[r_t + \gamma \max_{a_{t+1}} Q^{\pi}\big(s_{t+1}, a_{t+1}\,\big|\,\theta_k^-\big) - Q^{\pi}\big(s_t, a_t\,\big|\,\theta_k\big)\Big]\,\nabla_{\theta} Q^{\pi}\big(s_t, a_t\,\big|\,\theta_k\big)$$

$$\pi^{*}(a\mid s) = \arg\max_{a} Q^{\pi}(s, a\mid\theta)$$

wherein Q^π(s, a | θ) denotes the Q network, s the environment state, a the agent's action and θ the Q-network parameters; θ_{k+1} are the Q-network parameters at iteration k+1 and θ_k those at iteration k; α is the learning rate, in the range 0 to 1; γ the reward discount factor, in the range 0 to 1; s_t, s_{t+1} the environment states at times t and t+1; a_t, a_{t+1} the agent actions at times t and t+1; r_t the immediate reward at time t; Q^π(s_t, a_t | θ_k) the Q network at time t in iteration k; Q^π(s_{t+1}, a_{t+1} | θ_k^-) the target Q network at time t+1 in iteration k; θ_k^- the target-Q-network parameters in iteration k; ∇_θ Q^π(s_t, a_t | θ_k) the gradient of the Q network at time t in iteration k with respect to the parameters θ; and π*(a|s) denotes the agent action found from the Q-network output that maximizes the total reward of the action sequence;
all the crown blocks in the cross area complete all the transportation tasks in each iteration process, the Q network of each crown block obtains the environmental state of the current time from the starting time to the ending time in one iteration process, the execution action is output, the immediate reward is obtained, and the total time and the total reward for completing the tasks are obtained after one iteration is completed.
8. The method for dispatching the crown blocks in the metallurgical industry workshop based on the deep reinforcement learning according to claim 1, wherein in the step 3, the number of crown blocks in a cross area is larger than 1, each crown block is modeled as an agent with the same structure, at the moment t, the agents are alternately trained in sequence, the environmental state observed by the next crown block is obtained based on the operation executed by the previous crown block, and the immediate rewards of all crown blocks at the moment t are summed and stored in a memory pool as a comprehensive immediate reward.
CN202111142373.9A 2021-09-16 2021-09-28 Metallurgical industry workshop crown block scheduling method based on deep reinforcement learning Active CN113837628B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111096536 2021-09-16
CN2021110965364 2021-09-16

Publications (2)

Publication Number Publication Date
CN113837628A CN113837628A (en) 2021-12-24
CN113837628B true CN113837628B (en) 2022-12-09

Family

ID=78966970

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111142373.9A Active CN113837628B (en) 2021-09-16 2021-09-28 Metallurgical industry workshop crown block scheduling method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN113837628B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115640986B (en) * 2022-12-13 2023-03-28 北京云迹科技股份有限公司 Robot scheduling method, device, equipment and medium based on rewards


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112633772A (en) * 2021-01-05 2021-04-09 东华大学 Multi-agent deep reinforcement learning and scheduling method for textile fabric dyeing workshop
CN112884239A (en) * 2021-03-12 2021-06-01 重庆大学 Aerospace detonator production scheduling method based on deep reinforcement learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Simulation model for workshop overhead crane scheduling based on an immune genetic algorithm; Zheng Zhong et al.; Systems Engineering - Theory & Practice; 2013-01-15 (No. 01); full text *
Multi-objective modelling and solution of overhead crane scheduling in a continuous casting workshop under spatio-temporal constraints; Gao Xiaoqiang et al.; Systems Engineering - Theory & Practice; 2017-09-25 (No. 09); full text *

Also Published As

Publication number Publication date
CN113837628A (en) 2021-12-24

Similar Documents

Publication Publication Date Title
CN112734172B (en) Hybrid flow shop scheduling method based on time sequence difference
Yu et al. Optimizing task scheduling in human-robot collaboration with deep multi-agent reinforcement learning
CN111985672B (en) Single-piece job shop scheduling method for multi-Agent deep reinforcement learning
CN113837628B (en) Metallurgical industry workshop crown block scheduling method based on deep reinforcement learning
CN112446642A (en) Multi-crown-block scheduling optimization method and system
Liu et al. An improved genetic algorithm with modified critical path-based searching for integrated process planning and scheduling problem considering automated guided vehicle transportation task
CN116500986A (en) Method and system for generating priority scheduling rule of distributed job shop
CN112348314A (en) Distributed flexible workshop scheduling method and system with crane
CN111353646A (en) Steel-making flexible scheduling optimization method with switching time, system, medium and equipment
CN107357267B (en) The method for solving mixed production line scheduling problem based on discrete flower pollination algorithm
Zhao et al. Model and heuristic solutions for the multiple double-load crane scheduling problem in slab yards
CN112732436A (en) Deep reinforcement learning acceleration method of multi-core processor-single graphics processor
Hirashima et al. A Q-learning for group-based plan of container transfer scheduling
Hani et al. Simulation based optimization of a train maintenance facility
CN106019940A (en) UKF (Unscented Kalman Filter) neural network-based converter steelmaking process cost control method and system
CN114237222A (en) Method for planning route of delivery vehicle based on reinforcement learning
Zeng et al. A method integrating simulation and reinforcement learning for operation scheduling in container terminals
CN116957177A (en) Flexible workshop production line planning method, system, equipment and medium
CN106865418A (en) A kind of control method of coil of strip reservoir area loop wheel machine equipment
Li et al. An Efficient 2-opt Operator for the Robotic Task Sequencing Problem
Fattahi et al. A hybrid genetic algorithm and parallel variable neighborhood search for jobshop scheduling with an assembly stage
CN114675647A (en) AGV trolley scheduling and path planning method
JP3347006B2 (en) Planning device and planning method
Yuan et al. Flexible Assembly Shop Scheduling Based on Improved Genetic Algorithm
Roh et al. Optimal scheduling of block lifting in consideration of the minimization of traveling distance while unloaded and wire and shackle replacement of a gantry crane

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant