CN114266518A - Material scheduling method, model training method and device - Google Patents

Material scheduling method, model training method and device

Publication number
CN114266518A
Authority
CN
China
Prior art keywords
sample
state information
material scheduling
parameters
parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111569356.3A
Other languages
Chinese (zh)
Inventor
袁晓敏
解鑫
李飞
刘颖
许铭
刘建林
徐进
金莹
张金义
陈凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111569356.3A
Publication of CN114266518A
Legal status: Pending

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The disclosure provides a material scheduling method, a model training method and a device, and relates to the technical field of artificial intelligence, in particular to the technical fields of simulation control and reinforcement learning. The specific implementation scheme is as follows: acquiring state information corresponding to a target port; determining a material scheduling parameter set matched with the state information based on the state information and a trained port material scheduling model; and executing a port material scheduling operation based on each material scheduling parameter in the material scheduling parameter set. This implementation can improve the efficiency and accuracy of port material scheduling.

Description

Material scheduling method, model training method and device
Technical Field
The disclosure relates to the technical field of artificial intelligence, further relates to the technical field of simulation control and reinforcement learning, and particularly relates to a material scheduling method, a model training method and a device.
Background
At present, material scheduling in a port environment follows this procedure: train arrival, train unloading, temporary material storage, and ship loading.
In the above material scheduling process, each parameter of the port operation needs to be adjusted. However, the current adjusting method for each parameter of port operation depends on manual adjustment, which results in low efficiency and poor accuracy of material scheduling.
Disclosure of Invention
The disclosure provides a material scheduling method, a model training method and a device.
According to an aspect of the present disclosure, there is provided a material scheduling method, including: acquiring state information corresponding to a target port; determining a material scheduling parameter set matched with the state information based on the state information and the trained port material scheduling model; and executing port material scheduling operation based on each material scheduling parameter in the material scheduling parameter set.
According to another aspect of the present disclosure, there is provided a model training method, including: obtaining sample state information; performing the following model training steps on the sample state information: determining a sample material scheduling parameter set matched with the sample state information based on the sample state information and the model to be trained; determining a reward value based on the sample state information, the sample material scheduling parameter set and a preset reward function; and in response to the fact that the reward value meets the preset convergence condition, determining the model to be trained as the trained port material scheduling model.
According to another aspect of the present disclosure, there is provided an apparatus for material scheduling, comprising: the state acquisition unit is configured to acquire state information corresponding to the target port; the parameter determining unit is configured to determine a material scheduling parameter set matched with the state information based on the state information and the trained port material scheduling model; and the material scheduling operation unit is configured to execute the port material scheduling operation based on each material scheduling parameter in the material scheduling parameter set.
According to another aspect of the present disclosure, there is provided a model training apparatus including: a sample state acquisition unit configured to acquire sample state information; a model training unit configured to perform the following model training steps on the sample state information: determining a sample material scheduling parameter set matched with the sample state information based on the sample state information and the model to be trained; determining a reward value based on the sample state information, the sample material scheduling parameter set and a preset reward function; and in response to the fact that the reward value meets the preset convergence condition, determining the model to be trained as the trained port material scheduling model.
According to another aspect of the present disclosure, there is provided an electronic device including: one or more processors; and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement any of the material scheduling methods or model training methods described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform any one of the material scheduling method or the model training method as described above.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements any of the material scheduling methods or model training methods described above.
The technology of the present disclosure provides a material scheduling method that can improve the efficiency and accuracy of port material scheduling.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present disclosure may be applied;
FIG. 2 is a flow diagram of one embodiment of a material scheduling method according to the present disclosure;
FIG. 3 is a schematic diagram of an application scenario of a material scheduling method according to the present disclosure;
FIG. 4 is a flow diagram of one embodiment of a model training method according to the present disclosure;
FIG. 5 is a flow diagram of another embodiment of a model training method according to the present disclosure;
FIG. 6 is a schematic structural diagram of one embodiment of a material scheduling apparatus according to the present disclosure;
FIG. 7 is a schematic block diagram of one embodiment of a model training apparatus according to the present disclosure;
FIG. 8 is a block diagram of an electronic device for implementing a material scheduling method or a model training method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.
The terminal devices 101, 102, 103 interact with the server 105 via the network 104 to receive or send messages and the like. The terminal devices 101, 102, 103 may be used in a material scheduling scenario: they determine a corresponding material scheduling parameter set based on the state information of the target port, and control the corresponding port material scheduling equipment to execute the port material scheduling operation using each material scheduling parameter in the set. In practical application, the terminal devices 101, 102, 103 may send the state information corresponding to the target port to the server 105 through the network 104, so that the server 105 determines a material scheduling parameter set based on the state information and the trained port material scheduling model and returns the set to the terminal devices 101, 102, 103, which then execute the port material scheduling operation based on each material scheduling parameter in the set. Alternatively, in the model training phase, the terminal devices 101, 102, 103 may send the sample state information to the server 105 via the network 104 so that the server 105 performs model training based on the sample state information.
The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices including, but not limited to, a mobile phone, a computer, a tablet, and the like. When the terminal devices 101, 102, 103 are software, they can be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module. This is not specifically limited herein.
The server 105 may be a server providing various services, for example, the server 105 may obtain the status information sent by the terminal devices 101, 102, 103, determine a material scheduling parameter set matching the status information based on the status information and the trained port material scheduling model, and return the material scheduling parameter set to the terminal devices 101, 102, 103 through the network 104. Or, in the model training phase, the server 105 may also receive the sample state information sent by the terminal devices 101, 102, and 103, and train the model to be trained by using the sample state information to obtain the trained port material scheduling model.
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. This is not specifically limited herein.
It should be noted that the material scheduling method or the model training method provided in the embodiment of the present disclosure may be executed by the terminal devices 101, 102, and 103, or may be executed by the server 105, and the device for material scheduling or the model training device may be disposed in the terminal devices 101, 102, and 103, or may be disposed in the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, a flow 200 of one embodiment of a material scheduling method according to the present disclosure is shown. The material scheduling method of the embodiment comprises the following steps:
Step 201, obtaining state information corresponding to a target port.
In this embodiment, the executing entity (such as the terminal devices 101, 102, 103 or the server 105 in fig. 1) may obtain the state information corresponding to the target port from locally stored data or from other electronic devices with which a connection has been established in advance. The target port may be a hub that transfers materials between land and sea transport; during port material scheduling, the various pieces of equipment at the port work together to unload materials from trains and load them onto ships. The materials may include, but are not limited to, coal, steel, minerals, and the like, which is not limited in this embodiment. There may be one target port or more than one, which is not limited in this embodiment. The state information of the target port may include the operating state of the equipment for transporting materials, the storage state of the equipment for storing materials, and the like. The operating state of the equipment for transporting materials may include the operating states of equipment such as a tippler (also called a car dumper), a belt conveyor, a stacker, a discharge trolley, a reclaimer, an activation feeder, and a ship loader, and may cover items such as whether the equipment is idle and when the equipment is available. The storage state of the equipment for storing materials may include information on the materials stored in a storage yard, in a silo, and so on; the material information may include the weight, number of stacks, and category of the materials.
In some optional implementations of this embodiment, the state information includes at least one of: stacking state information, tippler operating state information, belt operating state information, reclaimer operating state information, and ship loader operating state information.
In this implementation, the stacking state information may include the number of material stacks, the weight of the material stacks, the serial numbers of the material stacks, the material category corresponding to each stack, and the like; the tippler operating state information may include the tippler number, the tippler operation start time, whether the tippler has started operating, and the like; the belt operating state information may include the belt number, the belt occupation time, whether the belt is idle, and the like; the reclaimer operating state information may include the reclaimer number, the reclaimer operation start time, whether the reclaimer has started operating, and the like; the ship loader operating state information may include the ship loader number, the ship loader operation start time, whether the ship loader has started operating, and the like.
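As a minimal sketch of how the state information listed above might be held in memory and flattened for a model, consider the following. All class names, field names, and units are hypothetical illustrations and are not taken from the disclosure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StackState:
    """Hypothetical record for one material stack."""
    stack_id: int
    material_category: str
    weight_tonnes: float


@dataclass
class EquipmentState:
    """Hypothetical record for a tippler, reclaimer, or ship loader."""
    equipment_id: int
    start_time_min: int  # operation start time, minutes since midnight
    operating: bool      # whether the machine has started operating


@dataclass
class BeltState:
    """Hypothetical record for one belt."""
    belt_id: int
    occupied_min: int    # belt occupation time in minutes
    idle: bool


@dataclass
class PortState:
    stacks: List[StackState] = field(default_factory=list)
    tipplers: List[EquipmentState] = field(default_factory=list)
    belts: List[BeltState] = field(default_factory=list)
    reclaimers: List[EquipmentState] = field(default_factory=list)
    ship_loaders: List[EquipmentState] = field(default_factory=list)

    def to_vector(self) -> List[float]:
        """Flatten the state into numbers (material category is omitted here)."""
        vec: List[float] = []
        for s in self.stacks:
            vec += [float(s.stack_id), s.weight_tonnes]
        for group in (self.tipplers, self.reclaimers, self.ship_loaders):
            for e in group:
                vec += [float(e.equipment_id), float(e.start_time_min), float(e.operating)]
        for b in self.belts:
            vec += [float(b.belt_id), float(b.occupied_min), float(b.idle)]
        return vec
```

A flat numeric vector is only one possible encoding; the disclosure does not prescribe how the state is represented.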
Step 202, determining a material scheduling parameter set matched with the state information based on the state information and the trained port material scheduling model.
In this embodiment, the executing entity may input the state information into the trained port material scheduling model, so that the model generates a material scheduling parameter set corresponding to the state information; the port material scheduling operation is then executed according to each material scheduling parameter in the set. The trained port material scheduling model is a pre-trained model that controls multiple parameters simultaneously. Specifically, it may be a DQN (Deep Q-Network) reinforcement learning model, which combines a neural network with Q-learning, an actor-critic model (also a reinforcement learning model), or the like, which is not limited in this embodiment.
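To illustrate how one model could choose several scheduling parameters at once, the sketch below keeps one Q-value head per parameter and picks the highest-valued choice in each head. It is an assumption-laden toy: a linear Q-function stands in for the neural network of a DQN, and the action space, parameter names, and `select_parameters` helper are hypothetical.

```python
import random
from typing import Dict, List, Optional

# Hypothetical discrete choices for each scheduling parameter.
ACTION_SPACE: Dict[str, List[int]] = {
    "tippler_id": [1, 2, 3],
    "belt_id": [1, 2],
    "stack_id": [1, 2, 3, 4],
}


def q_values(weights: List[List[float]], state: List[float]) -> List[float]:
    """Linear stand-in for a Q-network head: one weight row per candidate choice."""
    return [sum(w * s for w, s in zip(row, state)) for row in weights]


def select_parameters(
    model: Dict[str, List[List[float]]],
    state: List[float],
    epsilon: float = 0.0,
    rng: Optional[random.Random] = None,
) -> Dict[str, int]:
    """Pick one value per parameter: greedy by Q-value, random with probability epsilon."""
    rng = rng or random.Random()
    chosen: Dict[str, int] = {}
    for name, choices in ACTION_SPACE.items():
        if rng.random() < epsilon:
            chosen[name] = rng.choice(choices)  # exploration
        else:
            qs = q_values(model[name], state)
            best = max(range(len(qs)), key=qs.__getitem__)
            chosen[name] = choices[best]        # exploitation
    return chosen
```

With `epsilon=0.0` the selection is purely greedy, which corresponds to using a trained model at scheduling time; a non-zero `epsilon` would only be relevant during training.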
In some optional implementations of this embodiment, the material scheduling parameters in the material scheduling parameter set include at least one of the following: freight train parameters, train carried material category parameters, belt parameters, tippler parameters, stacking parameters, unloader parameters, tripper parameters, reclaimer parameters, activation feeder parameters, ship loader parameters, and ship entry list parameters.
In this implementation, the freight train parameters may include parameters such as the train number and the vehicle type of the train; the train carried material category parameters may include material category parameters; the belt parameters may include parameters such as the belt number selected when the port material scheduling operation is executed and the belt operation start time; the tippler parameters may include parameters such as the tippler number selected when the port material scheduling operation is executed and the tippler operation start time; the stacking parameters may include parameters such as the stack number selected when the port material scheduling operation is executed; the unloader parameters may include the unloader number selected when the port material scheduling operation is executed; the activation feeder parameters may include the activation feeder number selected when the port material scheduling operation is executed; the ship loader parameters may include the ship loader number selected when the port material scheduling operation is executed; and the ship entry list parameters may include the shipment parameters selected when the port material scheduling operation is executed.
Step 203, executing the port material scheduling operation based on each material scheduling parameter in the material scheduling parameter set.
In this embodiment, after the executing entity has determined the material scheduling parameter set, each step of the port material scheduling operation may be executed according to each material scheduling parameter in the set.
In some optional implementations of this embodiment, executing the port material scheduling operation based on each material scheduling parameter in the material scheduling parameter set may include: selecting designated equipment for transporting the materials based on the material transportation parameters in the material scheduling parameter set; controlling the designated equipment to transport the materials to a designated storage location based on the material storage parameters in the material scheduling parameter set; and selecting designated equipment for pulling and loading the materials based on the material loading parameters in the material scheduling parameter set, and using that equipment to load the materials from the designated storage location onto the corresponding ship.
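The three stages above (transport, storage, loading) can be sketched as a function that turns a parameter set into an ordered list of commands. The grouping of the parameters into `transport`, `storage`, and `loading`, the key names, and the command tuples are all hypothetical; the disclosure does not define a concrete data layout.

```python
from typing import Dict, List, Tuple

Command = Tuple[str, int, int]


def plan_port_job(params: Dict[str, Dict[str, int]]) -> List[Command]:
    """Translate a (hypothetical) material scheduling parameter set into ordered commands."""
    transport = params["transport"]  # material transportation parameters
    storage = params["storage"]      # material storage parameters
    loading = params["loading"]      # material loading parameters
    return [
        # 1. Select the equipment that moves material off the train.
        ("unload", transport["tippler_id"], transport["belt_id"]),
        # 2. Send the material over the selected belt to the designated stack.
        ("store", transport["belt_id"], storage["stack_id"]),
        # 3. Select the equipment that pulls material and loads it onto the ship.
        ("load", loading["reclaimer_id"], loading["ship_loader_id"]),
    ]
```

In a real deployment each command would be sent to the corresponding equipment controller; here the list only makes the ordering of the three stages explicit.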
With continued reference to fig. 3, a schematic diagram of one application scenario of the material scheduling method according to the present disclosure is shown. In the application scenario of fig. 3, the executing entity may identify a target port 301 that needs port material scheduling and obtain state information 302 of the target port 301, such as the tippler operating state, belt operating state, stacking state, reclaimer operating state, and ship loader operating state. The state information 302 may be entered manually, or may be obtained by collecting sensing data of the target port 301 with preset sensing equipment and analyzing that data; the specific way of acquiring the state information 302 is not limited in this embodiment. The executing entity may then input the state information 302 into the port material scheduling model 303 to obtain the material scheduling parameters 304 output by the model. The material scheduling parameters 304 may include freight train parameters, train carried material category parameters, belt parameters, tippler parameters, stacking parameters, unloader parameters, and the like. The executing entity may establish a communication connection with the material scheduling equipment in the target port 301 in advance, and control that equipment to execute the port material scheduling operation 305 according to the material scheduling parameters 304.
The material scheduling method provided by this embodiment of the disclosure considers the entire port material scheduling process as a whole and uses the pre-trained port material scheduling model to generate the corresponding material scheduling parameters, so that the port equipment is scheduled to execute the port material scheduling operation according to those parameters, which improves the efficiency and accuracy of port material scheduling.
With continued reference to FIG. 4, a flow 400 of one embodiment of a model training method according to the present disclosure is shown. As shown in fig. 4, the model training method of the present embodiment may include the following steps:
Step 401, obtaining sample state information.
In this embodiment, the sample state information may simulate the operating state of the equipment for transporting materials and the storage state of the equipment for storing materials in a real port. For a detailed description of the sample state information, refer to the detailed description of the state information corresponding to the target port, which is not repeated here.
Model training requires multiple rounds of iteration. The sample state information obtained here is therefore the initial state information; after each round of model training, the sample state information is updated and the next round is performed, until model training is complete.
Step 402, performing the following model training steps on the sample state information: determining a sample material scheduling parameter set matched with the sample state information based on the sample state information and the model to be trained; determining a reward value based on the sample state information, the sample material scheduling parameter set and a preset reward function; and in response to the fact that the reward value meets the preset convergence condition, determining the model to be trained as the trained port material scheduling model.
In this embodiment, the model to be trained may be a DQN reinforcement learning model, and the executing entity may input the sample state information into the model to be trained to obtain the sample material scheduling parameter set output by the model. For a detailed description of the sample material scheduling parameter set, refer to the detailed description of the material scheduling parameter set, which is not repeated here. The executing entity may then determine the reward value based on the sample state information, the sample material scheduling parameter set, and a preset reward function. If the reward value does not satisfy the preset convergence condition, the sample state information is updated and the model training step is repeated until the reward value satisfies the preset convergence condition, at which point the trained port material scheduling model is obtained. The model to be trained may also be another reinforcement learning model, which is not limited in this embodiment.
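The iteration described above can be sketched as a generic loop. This is an illustration under stated assumptions, not the disclosed implementation: the convergence condition is assumed to be a simple reward threshold, the `learn` callback (where a real DQN would update its network weights) is an assumption, and every function name here is hypothetical.

```python
from typing import Callable, List, Tuple

State = List[float]
Params = List[int]


def train(
    act: Callable[[State], Params],                    # model to be trained: state -> parameter set
    learn: Callable[[State, Params, float], None],     # assumed model-update hook
    reward_fn: Callable[[State, Params], float],       # preset reward function
    update_state: Callable[[State, Params], State],    # e.g. a simulation environment step
    initial_state: State,
    reward_threshold: float,                           # assumed form of the convergence condition
    max_rounds: int = 1000,
) -> Tuple[bool, int]:
    """Return (converged, rounds_used)."""
    state = initial_state
    for round_idx in range(1, max_rounds + 1):
        params = act(state)                  # sample material scheduling parameter set
        reward = reward_fn(state, params)    # reward value for this round
        if reward >= reward_threshold:
            return True, round_idx           # model is considered trained
        learn(state, params, reward)         # adjust the model before the next round
        state = update_state(state, params)  # update the sample state information
    return False, max_rounds
```

The `max_rounds` cap is a practical safeguard added for the sketch so that the loop terminates even if the reward never reaches the threshold.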
According to the model training method provided by the embodiment of the disclosure, the model to be trained can be trained through the sample state information and the preset reward function to obtain the port material scheduling model, so that the comprehensive scheduling of port material scheduling is realized, and the port material scheduling effect is improved.
With continued reference to FIG. 5, a flow 500 of another embodiment of a model training method according to the present disclosure is illustrated. As shown in fig. 5, the model training method of the present embodiment may include the following steps:
Step 501, obtaining sample state information.
In this embodiment, the sample state information includes at least one of the following: stacking sample state information, tippler sample operating state information, belt sample operating state information, reclaimer sample operating state information, and ship loader sample operating state information.
For a detailed description of the sample state information, refer to the detailed description of the state information of the target port; for the stacking sample state information, refer to the description of the stacking state information; for the tippler sample operating state information, refer to the description of the tippler operating state information; for the belt sample operating state information, refer to the description of the belt operating state information; for the reclaimer sample operating state information, refer to the description of the reclaimer operating state information; and for the ship loader sample operating state information, refer to the description of the ship loader operating state information.
Step 502, performing the following model training steps on the sample state information: determining a sample material scheduling parameter set matched with the sample state information based on the sample state information and the model to be trained; determining a reward value based on the sample state information, the sample material scheduling parameter set and a preset reward function; and in response to the fact that the reward value meets the preset convergence condition, determining the model to be trained as the trained port material scheduling model.
In this embodiment, the sample material scheduling parameters in the sample material scheduling parameter set include at least one of the following: freight train sample parameters, train carried material category sample parameters, belt sample parameters, tippler sample parameters, stacking sample parameters, unloader sample parameters, reclaimer sample parameters, activation feeder sample parameters, ship loader sample parameters, and ship entry list sample parameters. For a detailed description of each sample material scheduling parameter in the sample material scheduling parameter set, refer to the detailed description of each material scheduling parameter in the material scheduling parameter set, which is not repeated here.
Step 503, in response to determining that the reward value does not satisfy the preset convergence condition, updating the sample state information based on the simulation environment, and executing the model training step on the updated sample state information until a trained port material scheduling model is obtained.
In this embodiment, if the reward value does not satisfy the preset convergence condition, the sample state information is updated by using the simulation environment, and the model training step is repeatedly performed on the updated sample state information until the trained port material scheduling model is obtained. The simulation environment is used for performing simulation processing on a real scene of port material scheduling.
In some optional implementations of this embodiment, updating the sample state information based on the simulation environment includes: controlling the simulation environment to simulate port material scheduling operation based on the sample material scheduling parameter set to obtain a simulation environment after the simulation port material scheduling; and updating the sample state information based on the simulation environment after the port material dispatching is simulated.
In this implementation, when a round of training ends and the reward value does not satisfy the preset convergence condition, the executing entity may take the sample material scheduling parameter set obtained in that round, control each piece of equipment in the simulation environment to perform the port material scheduling operation according to those parameters, and update the sample state information based on the simulation environment after the simulated port material scheduling. Updating the sample state information through simulation in this way keeps it closer to the real conditions of port material scheduling operations, which can improve the training effect of the model.
In addition, the port material scheduling operation may include an unloading operation and a pull operation. For the unloading operation, the stacking sample state information may be updated based on the following formula:
[Formula image BDA0003423056350000091, not reproduced in this text]
where W_i denotes the stacking sample quantity after the update, W_{i-1} denotes the stacking sample quantity before the update, and the remaining term of the formula (image BDA0003423056350000092, not reproduced in this text) denotes the amount of material unloaded by the n dump trucks.
The tippler sample operating state information may be updated based on the following formula:
st_i = st_{i-1} + t_wt
where st_i denotes the operation start time of the tippler after the update, st_{i-1} denotes the operation start time of the tippler before the update, and t_wt denotes the operation duration of the tippler before the update.
The belt sample start time may be set equal to the updated tippler operation start time.
For the pull operation, the stacking sample state information may be updated based on the following formula:
[Formula image BDA0003423056350000101, not reproduced in this text]
where W_i denotes the stacking sample quantity after the update, W_{i-1} denotes the stacking sample quantity before the update, and the remaining term of the formula (image BDA0003423056350000102, not reproduced in this text) denotes the amount of material pulled by the n pulling trucks.
The reclaimer or activated feeder sample operation state information may be updated based on the following formula:
st_i = st_{i-1} + b_wt
where st_i denotes the updated operation start time of the reclaimer or the activated feeder, st_{i-1} denotes the operation start time of the reclaimer or the activated feeder before the update, and b_wt denotes the operation duration of the reclaimer or the activated feeder before the update.
The belt occupation time may be equal to the updated operation start time of the tippler.
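The update rules above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes that unloading adds the unloaded amounts to the stacked quantity and that pulling subtracts the pulled amounts, and it tracks a single machine start time per state. The class `SampleState`, its fields, and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SampleState:
    """Hypothetical container for part of the sample state information."""
    stack_quantity: float     # W: quantity of material currently stacked
    machine_start_time: int   # st: operation start time of the machine, in minutes
    belt_time: int            # belt start/occupation time, in minutes


def apply_unloading(state: SampleState, unloaded: Sequence[float], t_wt: int) -> SampleState:
    """Assumed rule: W_i = W_{i-1} + sum(unloaded); st_i = st_{i-1} + t_wt."""
    new_start = state.machine_start_time + t_wt
    return SampleState(
        stack_quantity=state.stack_quantity + sum(unloaded),
        machine_start_time=new_start,
        belt_time=new_start,  # belt start time follows the updated start time
    )


def apply_pulling(state: SampleState, pulled: Sequence[float], b_wt: int) -> SampleState:
    """Assumed rule: W_i = W_{i-1} - sum(pulled); st_i = st_{i-1} + b_wt."""
    new_start = state.machine_start_time + b_wt
    return SampleState(
        stack_quantity=state.stack_quantity - sum(pulled),
        machine_start_time=new_start,
        belt_time=new_start,  # simplification: one machine time drives the belt time
    )


initial = SampleState(stack_quantity=100.0, machine_start_time=10, belt_time=10)
after_unloading = apply_unloading(initial, [20.0, 30.0], t_wt=5)
after_pulling = apply_pulling(after_unloading, [50.0], b_wt=7)
```

Returning a new state object instead of mutating the old one keeps each simulation step easy to compare against the previous state.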
For example, when the simulation environment is controlled to simulate an unloading scenario according to the sample material scheduling parameter set, parameters such as the number of trains, the material category corresponding to each train, the arrival time of each train, the type of each train, and the number of trains in the entry list can be determined from the sample material scheduling parameter set, and corresponding resources such as a tippler, a belt, an unloader, and a stacker are selected for the unloading operation of each train. Each belt and each tippler has an operation start time and an operation duration for the corresponding vehicle type (the duration can be determined from a preset table mapping vehicle attributes to operation durations), and the operation end time can be obtained from the operation start time and the operation duration. For example, if the operation end time of a certain belt and tippler is 6:38, the next operation start time of that belt and tippler is 6:39.
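The timing arithmetic in this example can be sketched as below. The lookup table `DURATION_BY_VEHICLE_TYPE`, its keys and values, and the function names are hypothetical; the one-minute step between an end time and the next start time follows the 6:38 and 6:39 example.

```python
from datetime import datetime, timedelta

# Hypothetical table mapping a vehicle type to an operation duration in minutes.
DURATION_BY_VEHICLE_TYPE = {"type_a": 38, "type_b": 30}


def operation_end(start: datetime, duration_minutes: int) -> datetime:
    """Operation end time = operation start time + operation duration."""
    return start + timedelta(minutes=duration_minutes)


def next_operation_start(end: datetime, step_minutes: int = 1) -> datetime:
    """The resource becomes usable again one time step after the end time."""
    return end + timedelta(minutes=step_minutes)


start = datetime(2021, 12, 21, 6, 0)
end = operation_end(start, DURATION_BY_VEHICLE_TYPE["type_a"])
print(end.strftime("%H:%M"))                        # 06:38
print(next_operation_start(end).strftime("%H:%M"))  # 06:39
```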
In other optional implementation manners of this embodiment, based on the sample material scheduling parameter set, the simulation environment is controlled to simulate the port material scheduling operation, and the simulation environment after the port material scheduling is simulated is obtained, which includes: configuring operation parameters of target equipment in a simulation environment based on a sample material scheduling parameter set and preset constraint conditions; and controlling the target equipment to operate according to the operation parameters to obtain the simulation environment after the material scheduling of the simulated port.
In this implementation manner, the target device may be each device that simulates a port material scheduling operation in a simulation environment, and may include, but is not limited to, a train, a belt conveyor, a tipper, an unloader, a dump truck, a cargo-pulling ship, a reclaimer, an activation feeder, and the like, which is not limited in this embodiment. The preset constraint conditions are used for limiting the selection of each device in the simulation environment so as to realize the reasonable scheduling of each device in the simulation environment. The execution main body can select target equipment which needs to be called in the current round of simulated port material scheduling operation in the simulation environment according to preset constraint conditions and a sample material scheduling parameter set, and configures operation parameters of the target equipment, so that the target equipment operates according to the corresponding operation parameters, and the simulation environment after the simulated port material scheduling is obtained.
In other optional implementations of this embodiment, the preset constraint condition at least includes: the target device is an available device; and/or the working time of the target equipment meets a preset time condition; and/or the device type of the target device is matched with the sample material scheduling parameters in the sample material scheduling parameter set.
In this implementation, an available device may be a device in an idle state that can be selected; restricting the target device to available devices makes the device selection more reasonable. The operation time of the target device may include the operation start time, the arrival time, the vehicle alignment time, and the like of the target device, which is not limited in this embodiment. The preset time condition may include that the operation start time of the target device is greater than or equal to the sum of the arrival time and the vehicle alignment time of the target device, or that the sum of the operation start time and the vehicle alignment time of the target device falls within a preset time range; constraining the time range in this way makes it possible to simulate real port material scheduling operations in different time periods. The device type of the target device may include, but is not limited to, a device height limit type, a device attribute type, a material category for device material scheduling, and the like, which is not limited in this embodiment.
In response to determining that the target device is a ten-thousand-ton train, the constraint conditions for the target device may further include: a single ten-thousand-ton train uses the same tippler throughout, so that no vehicle alignment time is incurred; and/or the tippler corresponding to the ten-thousand-ton train is of a type suitable for ten-thousand-ton trains.
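A constraint check of this kind might look like the sketch below. It is an illustration under assumptions: the three conditions are combined with a logical AND (the text allows any and/or combination), only the first form of the time condition is checked, and the classes `Device` and `Request` and all field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Device:
    name: str
    device_type: str
    idle: bool


@dataclass
class Request:
    required_type: str    # device type demanded by the sample scheduling parameters
    arrival_time: int     # minutes
    alignment_time: int   # vehicle alignment time, minutes
    start_time: int       # proposed operation start time, minutes


def satisfies_constraints(device: Device, request: Request) -> bool:
    """Check availability, the time condition, and the device type match."""
    available = device.idle
    time_ok = request.start_time >= request.arrival_time + request.alignment_time
    type_ok = device.device_type == request.required_type
    return available and time_ok and type_ok


def select_target_devices(devices: List[Device], request: Request) -> List[Device]:
    """Keep only the devices that may be configured as target devices."""
    return [d for d in devices if satisfies_constraints(d, request)]


devices = [
    Device("tippler_1", "tippler", idle=True),
    Device("tippler_2", "tippler", idle=False),
    Device("reclaimer_1", "reclaimer", idle=True),
]
request = Request(required_type="tippler", arrival_time=420, alignment_time=10, start_time=430)
selected = select_target_devices(devices, request)
```

Filtering candidates before configuring operation parameters keeps infeasible actions out of the simulation step.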
In other optional implementations of this embodiment, the method further includes: and determining a preset reward function based on the total train material dispatching amount, the total ship unloading amount and the belt operation interval in a preset time period.
In this implementation, the preset reward function may be determined based on the total amount of the transportation materials and the resource waiting duration, and specifically, the desired target may be set to be the maximum total amount of the transportation materials and the minimum resource waiting duration. The preset reward function may be as follows:
Figure BDA0003423056350000111
where reward denotes the expectation function, ST_i denotes the start time of the i-th train, T denotes the preset time,
Figure BDA0003423056350000112
denotes the start times of all trains within a certain time, tw_i denotes the weight of the material scheduled for the i-th train, y_i indicates whether the i-th train is selected for material scheduling (1 if so, 0 otherwise), bw_j denotes the material category of the material scheduled for the j-th ship, k_j indicates whether the j-th ship is selected for material scheduling (1 if so, 0 otherwise), n denotes the total number of trains, and m denotes the total number of ships. For example, if port material scheduling is simulated over the time range from 7:00 to 18:30 with an interval of 1 minute during model training, T may be 660, representing the total time from 7:00 to 18:30 counted in 1-minute intervals. If the total number of trains within this time range is constant, the longer the interval between trains, the higher the reward value.
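As a partial illustration only, the material-total component suggested by the symbols tw_i, y_i, bw_j, and k_j can be sketched as a selection-weighted sum. This is not the preset reward function itself, whose full form is not reproduced here; treating bw_j as a per-ship quantity is an assumption, and the function names are hypothetical.

```python
from typing import Sequence


def selected_total(quantities: Sequence[float], selected: Sequence[int]) -> float:
    """Sum the quantities whose 0/1 selection flag is 1."""
    if len(quantities) != len(selected):
        raise ValueError("quantities and selection flags must have equal length")
    return sum(q * s for q, s in zip(quantities, selected))


def material_total(tw: Sequence[float], y: Sequence[int],
                   bw: Sequence[float], k: Sequence[int]) -> float:
    """Assumed component: sum_i tw_i * y_i (trains) + sum_j bw_j * k_j (ships)."""
    return selected_total(tw, y) + selected_total(bw, k)


# Two trains (only the first selected) and one selected ship.
total = material_total(tw=[100.0, 200.0], y=[1, 0], bw=[50.0], k=[1])
```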
The model training method provided by the embodiments of the present disclosure can also use the simulation environment and the sample material scheduling parameter set to simulate the port material scheduling operation and update the sample state information based on the simulation environment, which makes the updated sample state information more realistic. Configuring the operation parameters of the target device in the simulation environment under the constraint conditions makes the configuration of the operation parameters more reasonable and further improves the accuracy of the sample state information update. In addition, by using the total train material scheduling amount, the total ship unloading amount, and the belt operation interval within the preset time period, a model training objective can be established in terms of both the total material scheduling amount and the waiting duration, which improves the strategy generation effect of the trained model.
With further reference to fig. 6, as an implementation of the methods shown in the above-mentioned figures, the present disclosure provides an embodiment of a material scheduling apparatus, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be specifically applied to electronic devices such as a terminal device and a server.
As shown in fig. 6, the material scheduling apparatus 600 of the present embodiment includes: a state acquisition unit 601, a parameter determination unit 602, and a material scheduling operation unit 603.
The state acquisition unit 601 is configured to acquire state information corresponding to the target port.
The parameter determination unit 602 is configured to determine a material scheduling parameter set matching the state information based on the state information and the trained port material scheduling model.
The material scheduling operation unit 603 is configured to execute the port material scheduling operation based on each material scheduling parameter in the material scheduling parameter set.
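The division of work among units 601 to 603 can be sketched as three pluggable callables. This is a toy illustration: the class name, the dictionary-based state and parameter types, and the stand-in model `toy_model` are all hypothetical and do not represent the trained port material scheduling model.

```python
from typing import Callable, Dict, List

State = Dict[str, float]
Parameter = Dict[str, str]


class MaterialSchedulingApparatus:
    """Hypothetical wiring of the three units described above."""

    def __init__(self,
                 acquire_state: Callable[[], State],
                 determine_parameters: Callable[[State], List[Parameter]],
                 execute: Callable[[Parameter], str]) -> None:
        self.acquire_state = acquire_state                # role of unit 601
        self.determine_parameters = determine_parameters  # role of unit 602
        self.execute = execute                            # role of unit 603

    def run(self) -> List[str]:
        state = self.acquire_state()
        parameter_set = self.determine_parameters(state)
        # Execute the scheduling operation once per parameter in the set.
        return [self.execute(p) for p in parameter_set]


def toy_model(state: State) -> List[Parameter]:
    """Stand-in for the trained model: pick a belt depending on the state."""
    belt = "belt_1" if state["belt_busy"] == 0 else "belt_2"
    return [{"train": "train_1", "belt": belt}]


apparatus = MaterialSchedulingApparatus(
    acquire_state=lambda: {"stack_quantity": 100.0, "belt_busy": 0.0},
    determine_parameters=toy_model,
    execute=lambda p: f"{p['train']} -> {p['belt']}",
)
results = apparatus.run()
```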
In some optional implementations of this embodiment, the status information includes at least one of: stacking state information, tipper operating state information, belt operating state information, reclaimer operating state information and ship loader operating state information.
In some optional implementations of this embodiment, each material scheduling parameter in the material scheduling parameter set includes at least one of the following: freight train parameters, train carried material category parameters, belt parameters, tipper parameters, stacking parameters, unloader parameters, tripper parameters, reclaimer parameters, activation feeder parameters, ship loader parameters, and ship entry list parameters.
It should be understood that the units 601 to 603 described in the apparatus 600 for material scheduling correspond to the respective steps in the method described with reference to fig. 2, respectively. Thus, the operations and features described above for the material scheduling method are also applicable to the apparatus 600 and the units included therein, and are not described herein again.
With further reference to fig. 7, as an implementation of the methods shown in the above-mentioned figures, the present disclosure provides an embodiment of a model training apparatus, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 4, and the apparatus may be specifically applied to electronic devices such as a terminal device and a server.
As shown in fig. 7, the model training apparatus 700 of the present embodiment includes: a sample state acquisition unit 701 and a model training unit 702.
The sample state acquisition unit 701 is configured to acquire sample state information.
The model training unit 702 is configured to perform the following model training steps on the sample state information: determining a sample material scheduling parameter set matched with the sample state information based on the sample state information and the model to be trained; determining a reward value based on the sample state information, the sample material scheduling parameter set and a preset reward function; and in response to determining that the reward value meets the preset convergence condition, determining the model to be trained as the trained port material scheduling model.
In some optional implementations of this embodiment, the model training unit 702 is further configured to: and in response to the fact that the reward value does not meet the preset convergence condition, updating the sample state information based on the simulation environment, and executing a model training step on the updated sample state information until a trained port material scheduling model is obtained.
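The training steps and the loop-back on non-convergence can be sketched with toy stand-ins. Everything here is hypothetical: `ToyModel`, `ToyEnvironment`, the threshold-style convergence condition, and in particular the `update` call, since the text above does not detail how the model's own parameters are adjusted between rounds.

```python
from typing import Callable, List, Tuple


class ToyModel:
    """Hypothetical stand-in for the model to be trained."""

    def __init__(self) -> None:
        self.bias = 0.0

    def act(self, state: float) -> List[float]:
        # Produce a one-element "sample material scheduling parameter set".
        return [state + self.bias]

    def update(self, reward: float) -> None:
        # Assumed learning step; a real model would use the reward signal.
        self.bias += 1.0


class ToyEnvironment:
    """Hypothetical simulation environment."""

    def simulate(self, state: float, params: List[float]) -> float:
        # Pretend that one simulated scheduling round advances the state.
        return state + 1.0


def train(model: ToyModel, env: ToyEnvironment,
          reward_fn: Callable[[float, List[float]], float],
          threshold: float, state: float,
          max_rounds: int = 100) -> Tuple[ToyModel, int]:
    for round_index in range(1, max_rounds + 1):
        params = model.act(state)              # determine the parameter set
        reward = reward_fn(state, params)      # determine the reward value
        if reward >= threshold:                # assumed convergence condition
            return model, round_index
        model.update(reward)
        state = env.simulate(state, params)    # update the sample state
    raise RuntimeError("reward did not converge within max_rounds")


trained_model, rounds = train(ToyModel(), ToyEnvironment(),
                              reward_fn=lambda s, p: sum(p),
                              threshold=5.0, state=0.0)
```

The `max_rounds` guard is a design choice of this sketch so that a reward that never converges ends in an explicit error instead of an endless loop.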
In some optional implementations of this embodiment, the model training unit 702 is further configured to: controlling the simulation environment to simulate port material scheduling operation based on the sample material scheduling parameter set to obtain a simulation environment after the simulation port material scheduling; and updating the sample state information based on the simulation environment after the port material dispatching is simulated.
In some optional implementations of this embodiment, the model training unit 702 is further configured to: configuring operation parameters of target equipment in a simulation environment based on a sample material scheduling parameter set and preset constraint conditions; and controlling the target equipment to operate according to the operation parameters to obtain the simulation environment after the material scheduling of the simulated port.
In some optional implementations of this embodiment, the preset constraint condition at least includes: the target device is an available device; and/or the working time of the target equipment meets a preset time condition; and/or the device type of the target device is matched with the sample material scheduling parameters in the sample material scheduling parameter set.
In some optional implementations of this embodiment, the sample state information includes at least one of: stacking sample state information, tipper sample operation state information, belt sample operation state information, reclaimer sample operation state information, and ship loader sample operation state information.
In some optional implementations of this embodiment, each sample material scheduling parameter in the sample material scheduling parameter set includes at least one of the following: a freight train sample parameter, a train carried material category sample parameter, a belt sample parameter, a tipper sample parameter, a stacking sample parameter, an unloader sample parameter, a reclaimer sample parameter, an activation feeder sample parameter, a ship loader sample parameter, and a ship entry list sample parameter.
In some optional implementations of this embodiment, the model training unit 702 is further configured to: and determining a preset reward function based on the train cargo pulling total amount, the ship unloading total amount and the belt operation interval in a preset time period.
It should be understood that the units 701 and 702 recited in the model training apparatus 700 correspond to the respective steps in the method described with reference to fig. 4. Thus, the operations and features described above with respect to the model training method are equally applicable to the apparatus 700 and the units included therein, and are not described in detail here.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 8 illustrates a schematic block diagram of an example electronic device 800 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
A number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 801 may be any of a variety of general-purpose and/or special-purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 801 performs the respective methods and processes described above, such as the material scheduling method or the model training method. For example, in some embodiments, the material scheduling method or the model training method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed onto the device 800 via the ROM 802 and/or the communication unit 809. When loaded into the RAM 803 and executed by the computing unit 801, the computer program may perform one or more steps of the material scheduling method or the model training method described above. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the material scheduling method or the model training method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SoCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (25)

1. A material scheduling method comprises the following steps:
acquiring state information corresponding to a target port;
determining a material scheduling parameter set matched with the state information based on the state information and a trained port material scheduling model;
and executing port material scheduling operation based on each material scheduling parameter in the material scheduling parameter set.
2. The method of claim 1, wherein the status information comprises at least one of: stacking state information, tipper operating state information, belt operating state information, reclaimer operating state information and ship loader operating state information.
3. The method of claim 1, wherein each material scheduling parameter of the material scheduling parameter set comprises at least one of: freight train parameters, train carried material category parameters, belt parameters, tipper parameters, stacking parameters, unloader parameters, tripper parameters, reclaimer parameters, activation feeder parameters, ship loader parameters, and ship entry list parameters.
4. A model training method, comprising:
obtaining sample state information;
performing the following model training steps on the sample state information: determining a sample material scheduling parameter set matched with the sample state information based on the sample state information and the model to be trained;
determining a reward value based on the sample state information, the sample material scheduling parameter set and a preset reward function;
and in response to the fact that the reward value meets the preset convergence condition, determining the model to be trained as a trained port material scheduling model.
5. The method of claim 4, further comprising:
and in response to the fact that the reward value does not meet the preset convergence condition, updating the sample state information based on the simulation environment, and executing the model training step on the updated sample state information until the trained port material scheduling model is obtained.
6. The method of claim 5, wherein said updating the sample state information based on a simulation environment comprises:
controlling the simulation environment to simulate the port material scheduling operation based on the sample material scheduling parameter set to obtain a simulation environment after the simulated port material scheduling;
and updating the sample state information based on the simulation environment after the simulated port material scheduling.
7. The method of claim 6, wherein the controlling the simulated environment to simulate port material scheduling operations based on the sample material scheduling parameter set to obtain a simulated environment after simulated port material scheduling comprises:
configuring operation parameters of target equipment in the simulation environment based on the sample material scheduling parameter set and preset constraint conditions;
and controlling the target equipment to operate according to the operation parameters to obtain the simulation environment after the material scheduling of the simulated port.
8. The method according to claim 7, wherein the preset constraints comprise at least:
the target device is an available device; and/or
The operation time of the target equipment meets a preset time condition; and/or
The device type of the target device is matched with the sample material scheduling parameters in the sample material scheduling parameter set.
9. The method of claim 4, wherein the sample state information comprises at least one of: stacking sample state information, tipper sample operation state information, belt sample operation state information, reclaimer sample operation state information, and ship loader sample operation state information.
10. The method of claim 4, wherein each sample material scheduling parameter in the sample material scheduling parameter set comprises at least one of: a freight train sample parameter, a train carried material category sample parameter, a belt sample parameter, a tipper sample parameter, a stacking sample parameter, an unloader sample parameter, a reclaimer sample parameter, an activation feeder sample parameter, a ship loader sample parameter, and a ship entry list sample parameter.
11. The method of claim 4, further comprising:
and determining the preset reward function based on the total train freight, the total ship unloading and the belt operation interval in a preset time period.
12. An apparatus for material scheduling, comprising:
the state acquisition unit is configured to acquire state information corresponding to the target port;
a parameter determination unit configured to determine a material scheduling parameter set matched with the state information based on the state information and a trained port material scheduling model;
and the material scheduling operation unit is configured to execute port material scheduling operation based on each material scheduling parameter in the material scheduling parameter set.
13. The apparatus of claim 12, wherein the status information comprises at least one of: stacking state information, tipper operating state information, belt operating state information, reclaimer operating state information and ship loader operating state information.
14. The apparatus of claim 12, wherein each material scheduling parameter of the material scheduling parameter set comprises at least one of: freight train parameters, train carried material category parameters, belt parameters, tipper parameters, stacking parameters, unloader parameters, tripper parameters, reclaimer parameters, activation feeder parameters, ship loader parameters, and ship entry list parameters.
15. A model training apparatus comprising:
a sample state acquisition unit configured to acquire sample state information;
a model training unit configured to perform the following model training steps on the sample state information: determining a sample material scheduling parameter set matched with the sample state information based on the sample state information and the model to be trained; determining a reward value based on the sample state information, the sample material scheduling parameter set and a preset reward function; and in response to the fact that the reward value meets the preset convergence condition, determining the model to be trained as a trained port material scheduling model.
16. The apparatus of claim 15, wherein the model training unit is further configured to:
and in response to the fact that the reward value does not meet the preset convergence condition, updating the sample state information based on the simulation environment, and executing the model training step on the updated sample state information until the trained port material scheduling model is obtained.
17. The apparatus of claim 16, wherein the model training unit is further configured to:
controlling the simulation environment to simulate the port material scheduling operation based on the sample material scheduling parameter set to obtain a simulation environment after the simulated port material scheduling;
and updating the sample state information based on the simulation environment after the simulated port material scheduling.
18. The apparatus of claim 17, wherein the model training unit is further configured to:
configuring operation parameters of target equipment in the simulation environment based on the sample material scheduling parameter set and preset constraint conditions;
and controlling the target equipment to operate according to the operation parameters to obtain the simulation environment after the material scheduling of the simulated port.
19. The apparatus of claim 18, wherein the preset constraints comprise at least:
the target device is an available device; and/or
The operation time of the target equipment meets a preset time condition; and/or
The device type of the target device is matched with the sample material scheduling parameters in the sample material scheduling parameter set.
20. The apparatus of claim 15, wherein the sample state information comprises at least one of: stacking sample state information, tipper sample operation state information, belt sample operation state information, reclaimer sample operation state information, and ship loader sample operation state information.
21. The apparatus of claim 15, wherein each of the sample material scheduling parameters in the sample material scheduling parameter set comprises at least one of: a freight train sample parameter, a train-carried material category sample parameter, a belt sample parameter, a tipper sample parameter, a stacking sample parameter, an unloader sample parameter, a reclaimer sample parameter, an activating feeder sample parameter, a ship loader sample parameter and a ship entry list sample parameter.
22. The apparatus of claim 15, wherein the model training unit is further configured to:
determining the preset reward function based on the total train freight, the total ship unloading and the belt operation interval in a preset time period.
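Claim 22 names the three quantities that the reward function depends on but not how they are combined. One possible form is a weighted sum, sketched below. The function name, the weights, and the choice to penalize the belt operation interval while rewarding throughput are all assumptions made for illustration.

```python
def port_reward(
    train_freight_total: float,
    ship_unloading_total: float,
    belt_interval: float,
    w_train: float = 1.0,  # assumed weight
    w_ship: float = 1.0,   # assumed weight
    w_belt: float = 1.0,   # assumed weight
) -> float:
    """Weighted-sum reward over one preset time period (illustrative only).

    Higher train freight and ship unloading totals raise the reward;
    a longer belt operation interval is assumed to lower it.
    """
    return (
        w_train * train_freight_total
        + w_ship * ship_unloading_total
        - w_belt * belt_interval
    )


print(port_reward(100.0, 50.0, 20.0))
```

The relative weights would need to be tuned for a given port; the claim leaves them open.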
23. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-11.
24. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-11.
25. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-11.
CN202111569356.3A 2021-12-21 2021-12-21 Material scheduling method, model training method and device Pending CN114266518A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111569356.3A CN114266518A (en) 2021-12-21 2021-12-21 Material scheduling method, model training method and device

Publications (1)

Publication Number Publication Date
CN114266518A true CN114266518A (en) 2022-04-01

Family

ID=80828457

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111569356.3A Pending CN114266518A (en) 2021-12-21 2021-12-21 Material scheduling method, model training method and device

Country Status (1)

Country Link
CN (1) CN114266518A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination