CN112712251B

CN112712251B - Ship intelligent scheduling method applied to barge management system

Info

Publication number: CN112712251B
Application number: CN202011583873.1A
Authority: CN
Inventors: 钟振洋; 李小华; 孙球喜; 崔峰赫; 郑东虹
Original assignee: Zhuhai Port Information Technology Co ltd
Current assignee: Zhuhai Port Information Technology Co ltd
Priority date: 2020-12-28
Filing date: 2020-12-28
Publication date: 2023-09-12
Anticipated expiration: 2040-12-28
Also published as: CN112712251A

Abstract

The invention discloses an intelligent ship scheduling algorithm applied to a barge management system, comprising the following operation steps. S1: establish a port channel network graph based on a Neo4J database. Wharf and channel information in the barge management system is stored in the non-relational database Neo4J as the nodes and directed edges of a directed graph. The channel information must include direction, length, channel water depth, maximum ship height and the like; it is stored as attributes of the directed edges and serves as an influencing factor for order matching in subsequent scheduling, so that all geographic information in the barge management system is expressed as a directed graph. The algorithm integrates routes, ships and orders and is matured through continuous training of a model, so that it can be used for scheduling decisions in actual production scenarios, improving scheduling efficiency and decision benefit.

Description

Ship intelligent scheduling method applied to barge management system

Technical Field

The invention relates to the field of barge transportation, in particular to an intelligent ship scheduling algorithm applied to a barge management system.

Background

In the barge transportation field, the traditional scheduling mode depends entirely on the working experience of the scheduling personnel. Ship scheduling is influenced by many factors, and each scheduling decision has many candidate schemes, so there is no standard for judging the quality of a scheduling result. Scheduling relies mainly on personal experience, and the execution plan of every voyage of every ship must be arranged manually.

Disclosure of Invention

The invention mainly aims to provide a ship intelligent scheduling algorithm applied to a barge management system, which can effectively solve the problems in the background technology.

In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:

An intelligent ship scheduling algorithm applied to a barge management system comprises the following operation steps:

S1: establish a port channel network graph based on a Neo4J database. Wharf and channel information in the barge management system is stored in the non-relational database Neo4J as the nodes and directed edges of a directed graph. The channel information must include direction, length, channel water depth, maximum ship height and the like; it is stored as attributes of the directed edges and serves as an influencing factor for order matching in subsequent scheduling. All geographic information in the barge management system is thus expressed as a directed graph in order to match ships, orders and routes;

S2: establish an intelligent ship scheduling model based on hierarchical reinforcement learning. The ship scheduling system is divided into two parts: an offline learning module and an online matching module. The offline learning module extracts information such as ship states, scheduling strategies and the corresponding rewards from the system's historical scheduling trajectories, trains on this information and updates the parameters of the network; the strategy value function V(s) obtained in the learning module is written to real-time storage. The online matching module reads the strategy value function V(s) from the real-time storage, calculates the utility index ρ_ij for each pairing of a ship with the candidate orders, and uses this index as the edge weight in the ship-order matching bipartite graph to carry out ship scheduling;

S3: state definition: historical ship-order matching data is used as the data source for extracting state features. Specifically, when a batch of orders is transported, the factors that affected matching during the most recent period are taken as the state of the ship's current surroundings. Let s denote the state, with s_t = (l_t, μ_t, υ_t), where the context feature vector υ_t comprises two kinds of features, a dynamic feature υ_dt and a static feature υ_st;

S4: reward function setting: once the state space s and the action space a are determined, the state transition function t is also determined. The reward function R in reinforcement learning is then defined by the following rule: when the allocation of an order is established, the reward value equals the ship's single-ship allocation rate for that order; otherwise the reward value is 0. Reward shaping is used to enrich the expression of the reward function, and the reward for selecting action a in state s and transitioning to state s' is defined as:

R(s, o, s') = R_0(s, o, s') + Φ(s)

where R_0(s, o, s') is the originally defined reward function and Φ(s) is the potential function, which serves as a sub-goal in the ship's learning process. The potential function Φ(s) is defined as:

S5: algorithm flow: based on the value function estimation method of reinforcement learning, a deep neural network is used for fitting and simulated scheduling is carried out. The simulation environment is initialized first. Until the task ends, the ship-order matching method is called repeatedly to execute scheduling at random, so that the scheduling approach differs from one scheduling to the next. After each execution a reward value is returned to reflect the quality of the current decision, and the reward values are accumulated to reflect the quality of all matches made during the task.
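The simulated scheduling loop of step S5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: a tabular TD(0) update is swapped in for the deep neural network that the method uses to fit the value function, and the environment `ToyEnv`, its three docks, its random reward values and the parameters `alpha` and `gamma` are all hypothetical.

```python
import random
from collections import defaultdict


class ToyEnv:
    """Hypothetical simulation environment: one ship moving between docks."""

    DOCKS = ["A", "B", "C"]

    def __init__(self, steps=5, seed=0):
        self.rng = random.Random(seed)
        self.steps = steps

    def reset(self):
        self.t = 0
        self.dock = "A"
        return (self.dock, self.t)

    def candidate_orders(self):
        # Each order is (destination dock, allocation rate in [0, 1]).
        return [(d, round(self.rng.random(), 2)) for d in self.DOCKS if d != self.dock]

    def step(self, order):
        dest, rate = order
        self.dock = dest
        self.t += 1
        done = self.t >= self.steps
        return (self.dock, self.t), rate, done


def run_episode(env, V, alpha=0.1, gamma=0.9):
    """Match orders at random until the task ends; return the accumulated reward."""
    state = env.reset()
    total = 0.0
    done = False
    while not done:
        order = env.rng.choice(env.candidate_orders())  # random scheduling decision
        next_state, reward, done = env.step(order)
        target = reward + (0.0 if done else gamma * V[next_state])
        V[state] += alpha * (target - V[state])  # TD(0) update of the value estimate
        total += reward  # accumulate rewards over the whole task
        state = next_state
    return total


V = defaultdict(float)  # tabular stand-in for the fitted value function V(s)
env = ToyEnv()
returns = [run_episode(env, V) for _ in range(200)]
```

Each entry of `returns` is the accumulated reward of one simulated task, which is the quantity the step uses to judge all matches made during that task.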

Preferably, the influencing factors in step S3 include the dock location l_t where the ship is located, the original timestamp μ_t, a given context feature vector υ_t, and the like.

Preferably, the dynamic feature υ_dt in step S3 includes real-time features such as supply and demand at a given time-space point, and the static feature υ_st includes static ship information, the ship's current transportation state, information on the orders being transported, cargo quantity and the like.
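The state s_t = (l_t, μ_t, υ_t) described above can be represented as a small immutable record. The sketch below is illustrative only: the class names, the field names, the choice of a Unix timestamp and the example feature values are assumptions, since the text names the components of the state but not their encoding.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ContextFeatures:
    dynamic: Tuple[float, ...]  # υ_dt: e.g. supply and demand at this time-space point
    static: Tuple[float, ...]   # υ_st: e.g. ship information, transport state, cargo quantity


@dataclass(frozen=True)
class ShipState:
    dock: str                 # l_t: dock where the ship is located
    timestamp: int            # μ_t: original timestamp (Unix seconds assumed here)
    context: ContextFeatures  # υ_t = (υ_dt, υ_st)

    def as_vector(self, dock_index):
        """Flatten the state into a numeric feature vector, e.g. as network input."""
        return (
            float(dock_index[self.dock]),
            float(self.timestamp),
            *self.context.dynamic,
            *self.context.static,
        )


s_t = ShipState(
    dock="dock_A",
    timestamp=1609113600,
    context=ContextFeatures(dynamic=(3.0, 5.0), static=(1200.0, 0.0, 800.0)),
)
vec = s_t.as_vector({"dock_A": 0, "dock_B": 1})
```

Freezing the dataclasses makes each state hashable, so the same record can serve as a dictionary key for a tabular value function or be flattened for a neural network.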

Preferably, the action space a in step S4 is the set of scheduling strategies the ship can select, that is, the order information that can be selected.
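One way to connect the action space of selectable orders to the channel graph of step S1 is to keep only those orders whose route the ship can physically sail. The sketch below makes several assumptions that the text does not state: an in-memory dictionary stands in for the Neo4J graph, the attribute names (`depth_m`, `max_height_m`, `draft_m`, `height_m`) and all values are hypothetical, and the feasibility rule (depth at least the draft, height limit at least the ship height) is an illustrative choice.

```python
from collections import deque

# Directed edges: (from_dock, to_dock) -> channel attributes (all values hypothetical).
CHANNELS = {
    ("A", "B"): {"length_km": 12.0, "depth_m": 4.5, "max_height_m": 18.0},
    ("B", "C"): {"length_km": 8.0, "depth_m": 3.0, "max_height_m": 12.0},
    ("A", "C"): {"length_km": 25.0, "depth_m": 6.0, "max_height_m": 20.0},
    ("C", "A"): {"length_km": 25.0, "depth_m": 6.0, "max_height_m": 20.0},
}


def passable(channel, ship):
    """Assumed rule: the channel must be deep enough and allow the ship's height."""
    return (
        channel["depth_m"] >= ship["draft_m"]
        and channel["max_height_m"] >= ship["height_m"]
    )


def reachable(src, dst, ship):
    """Breadth-first search over the directed channels the ship can use."""
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for (u, v), channel in CHANNELS.items():
            if u == node and v not in seen and passable(channel, ship):
                seen.add(v)
                queue.append(v)
    return False


def action_space(ship, orders):
    """Orders the ship may select: those whose route it can actually sail."""
    return [o for o in orders if reachable(o["origin"], o["destination"], ship)]


ship = {"draft_m": 4.0, "height_m": 15.0}
orders = [
    {"id": "o1", "origin": "A", "destination": "B"},
    {"id": "o2", "origin": "B", "destination": "C"},
    {"id": "o3", "origin": "A", "destination": "C"},
]
selectable = [o["id"] for o in action_space(ship, orders)]
```

Here order `o2` drops out because the only channel from B to C is too shallow for the ship, which shows how edge attributes can act as influencing factors for order matching.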

Compared with the prior art, the invention has the following beneficial effects:

In the invention, a ship scheduling model is established using hierarchical reinforcement learning, integrating routes, ships and orders. The algorithm is matured by continuously training the model, so that it can be used for scheduling decisions in actual production scenarios, improving scheduling efficiency and decision benefit.

Drawings

FIG. 1 is a flow chart of the deployment of the present invention.

Detailed Description

The technical solutions of the embodiments of the present invention will be clearly and completely described below in conjunction with the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The invention relates to a ship intelligent scheduling algorithm applied to a barge management system, which comprises the following operation steps:

S2: establish an intelligent ship scheduling model based on hierarchical reinforcement learning. The ship scheduling system is divided into two parts: an offline learning module and an online matching module. The offline learning module extracts information such as ship states, scheduling strategies and the corresponding rewards from the system's historical scheduling trajectories, trains on this information and updates the parameters of the network; the strategy value function V(s) obtained in the learning module is written to real-time storage. The online matching module reads the strategy value function V(s) from the real-time storage, calculates the utility index ρ_ij for each pairing of a ship with the candidate orders, and uses this index as the edge weight in the ship-order matching bipartite graph to carry out ship scheduling;

S3: state definition: historical ship-order matching data is used as the data source for extracting state features. Specifically, when a batch of orders is transported, the factors that affected matching during the most recent period are taken as the state of the ship's current surroundings. The influencing factors include the dock location l_t where the ship is located, the original timestamp μ_t and a given context feature vector υ_t. Let s denote the state, with s_t = (l_t, μ_t, υ_t), where the context feature vector υ_t comprises two kinds of features, a dynamic feature υ_dt and a static feature υ_st. The dynamic feature υ_dt includes real-time features such as supply and demand at a given time-space point, and the static feature υ_st includes static ship information, the ship's current transportation state, information on the orders being transported, cargo quantity and the like;

S4: reward function setting: once the state space s and the action space a are determined, the state transition function t is also determined. The action space a is the set of scheduling strategies the ship can select, that is, the selectable order information. The reward function R in reinforcement learning is then defined by the following rule: when the allocation of an order is established, the reward value equals the ship's single-ship allocation rate for that order; otherwise the reward value is 0. Reward shaping is used to enrich the expression of the reward function, and the reward for selecting action a in state s and transitioning to state s' is defined as:

R(s, o, s') = R_0(s, o, s') + Φ(s)
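The shaped reward in the formula above can be illustrated with a short sketch. The base reward follows the stated rule (the allocation rate when an order is assigned, otherwise 0). The potential function is a placeholder: the definition of Φ(s) is not reproduced in this text, so the `potential` function, the `demand` and `supply` fields and the bonus value 0.1 are all hypothetical.

```python
def base_reward(order_assigned, allocation_rate):
    """R_0: the allocation rate when an order is assigned, otherwise 0."""
    return allocation_rate if order_assigned else 0.0


def potential(state):
    """Hypothetical Φ(s): a small bonus when local demand exceeds supply.

    Placeholder only; the actual definition of Φ(s) is not given in the text.
    """
    return 0.1 if state["demand"] > state["supply"] else 0.0


def shaped_reward(state, order_assigned, allocation_rate):
    """R(s, o, s') = R_0(s, o, s') + Φ(s), following the formula above."""
    return base_reward(order_assigned, allocation_rate) + potential(state)


busy = {"demand": 5, "supply": 2}
quiet = {"demand": 1, "supply": 4}
r_matched = shaped_reward(busy, True, 0.8)      # assigned order in a busy state
r_unmatched = shaped_reward(quiet, False, 0.8)  # no assignment in a quiet state
```

Keeping R_0 and Φ in separate functions lets the potential be replaced without touching the allocation-rate rule.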

According to the invention, a ship scheduling model is established using hierarchical reinforcement learning, integrating routes, ships and orders. The algorithm is matured by continuously training the model, so that it can be used for scheduling decisions in actual production scenarios, improving scheduling efficiency and decision benefit.
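The online matching of step S2, in which the utility index ρ_ij weights the edges of the ship-order bipartite graph, can be sketched as follows. The text does not give a formula for ρ_ij, so the form used here (allocation rate plus the discounted value at the destination minus the value at the current dock) is an assumption, as are the discount factor, the value table `V`, the ship capacities and the order data. The exhaustive search is a stand-in suitable only for tiny instances.

```python
from itertools import permutations

GAMMA = 0.9  # hypothetical discount factor

# Hypothetical value function V(s), as the online module would read it from storage.
V = {"A": 1.0, "B": 2.0, "C": 0.5}

ships = {
    "ship1": {"dock": "A", "capacity": 1000},
    "ship2": {"dock": "B", "capacity": 500},
}
orders = {
    "o1": {"dest": "C", "cargo": 900},
    "o2": {"dest": "B", "cargo": 450},
    "o3": {"dest": "A", "cargo": 300},
}


def utility(ship_id, order_id):
    """Assumed form of ρ_ij: allocation rate plus discounted value gain."""
    ship, order = ships[ship_id], orders[order_id]
    rate = min(order["cargo"], ship["capacity"]) / ship["capacity"]  # capped at 1
    return rate + GAMMA * V[order["dest"]] - V[ship["dock"]]


def best_matching():
    """Search all one-to-one ship-order assignments for the maximum total weight."""
    ship_ids = list(ships)
    best, best_total = None, float("-inf")
    for chosen in permutations(orders, len(ship_ids)):
        total = sum(utility(s, o) for s, o in zip(ship_ids, chosen))
        if total > best_total:
            best, best_total = dict(zip(ship_ids, chosen)), total
    return best, best_total


matching, total = best_matching()
```

For realistic fleet sizes the exhaustive search would be replaced by a polynomial-time assignment solver, for example SciPy's `linear_sum_assignment` applied to the matrix of ρ_ij values.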

Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. An intelligent ship scheduling method applied to a barge management system is characterized in that: the method comprises the following operation steps:

S1: establish a port channel network graph based on a Neo4J database. Wharf and channel information in the barge management system is stored in the non-relational database Neo4J as the nodes and directed edges of a directed graph. The channel information must include direction, length, channel water depth and maximum ship height; it is stored as attributes of the directed edges and serves as an influencing factor for order matching in subsequent scheduling. All geographic information in the barge management system is thus expressed as a directed graph in order to match ships, orders and routes;

S2: establish an intelligent ship scheduling model based on hierarchical reinforcement learning. The ship scheduling system is divided into two parts: an offline learning module and an online matching module. The offline learning module extracts ship states, scheduling strategies and the corresponding reward information from the system's historical scheduling trajectories, trains on this information and updates the parameters of the network; the strategy value function V(s) obtained in the learning module is written to real-time storage. The online matching module reads the strategy value function V(s) from the real-time storage, calculates the utility index ρ_ij for each pairing of a ship with the candidate orders, and uses this index as the edge weight in the ship-order matching bipartite graph to carry out ship scheduling;

S3: state definition: historical ship-order matching data is used as the data source for extracting state features. Specifically, when a batch of orders is transported, the factors that affected matching during the most recent period are taken as the state of the ship's current surroundings. Let s denote the state, with s_t = (l_t, μ_t, υ_t), where the context feature vector υ_t comprises two kinds of features, a dynamic feature υ_dt and a static feature υ_st. The influencing factors include the dock location l_t where the ship is located, the original timestamp μ_t and the given context feature vector υ_t. The dynamic feature υ_dt includes real-time features of supply and demand at a given time-space point, and the static feature υ_st includes static ship information, the ship's current transportation state, information on the orders being transported and cargo quantity;

s4: after the reward function setting and determining the state space s and the action space a, the state transfer function t is also determined, and the reward function R in reinforcement learning is defined at the moment _/r The rules of (2) are: when the allocation of one order is established, the rewarding value is matched with the single ship allocation rate of the ship for the order, otherwise, the rewarding value is 0, the rewarding plastic method is utilized to enrich the expression of the rewarding function, and the action space a is selected on the state space s and transferred to the state space s ^’ "rewards are defined as:

R(s, o, s') = R_0(s, o, s') + Φ(s)

where R_0(s, o, s') is the originally defined reward function and Φ(s) is the potential function, which serves as a sub-goal in the ship's learning process. The potential function Φ(s) is defined as:

2. The intelligent scheduling method for ships in a barge management system according to claim 1, wherein: the action space a in the step S4 includes a space where the ship can select a scheduling policy, i.e., selectable order information.