CN118096006B - Freight vehicle position prediction method, system and storage medium - Google Patents
Freight vehicle position prediction method, system and storage medium
- Publication number
- CN118096006B (application CN202410505059.XA)
- Authority
- CN
- China
- Prior art keywords
- vehicle
- agent
- network
- node
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/08—Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
- G06Q10/083—Shipping
- G06Q10/0833—Tracking
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
- G06Q10/047—Optimisation of routes or paths, e.g. travelling salesman problem
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
- G06Q10/06316—Sequencing of tasks or work
Abstract
The invention relates to the technical field of logistics and discloses a freight vehicle position prediction method, system and storage medium. The method comprises the following steps: step 101, collecting current position information of a vehicle and information of the cargo loaded on the vehicle, wherein the cargo information includes the position of the cargo within the vehicle's carriage; step 102, generating agents, wherein each agent is bound to one vehicle; step 103, generating an environment comprising all stations, vehicles, cargo and carriages, the carriages being divided into a plurality of loading areas; step 104, generating an initial state from the current position information of the vehicle and the information of the cargo it carries. The invention selects the positions of transfer stations reasonably so that cargo can be transferred in the shortest time, reducing distance and transportation time; it effectively solves the route optimization problem in shipping large cargo, improves transportation efficiency, and ensures that cargo is delivered safely to its destination.
Description
Technical Field
The present invention relates to the field of logistics, and more particularly, to a method, a system and a storage medium for predicting a position of a freight vehicle.
Background
Shipping, a form of logistics, refers to a service in which a shipper entrusts a company with carrier qualification to transport goods to a specified site and hand them over to a specified recipient. Depending on the delivery mode, shipping can be divided into sea, land and air freight.
When large goods are transported by land, after the goods arrive at a fixed large-scale transfer site they must be delivered by workers using freight vehicles, and the delivery priority is judged manually by each worker. As a result, many deliveries fail to meet their time targets, so route optimization is required to ensure that goods arrive at their destination safely and quickly while reducing transportation cost.
Disclosure of Invention
The invention provides a freight vehicle position prediction method, system and storage medium, which solve the technical problem in the prior art that deliveries are prioritized by workers' own manual judgment, so that many deliveries fail to meet their time targets; route optimization is needed to ensure that goods reach their destination safely and quickly and to reduce transportation cost.
The invention provides a freight vehicle position prediction method, which comprises the following steps:
Step 101, collecting current position information of a vehicle and information of cargoes loaded by the vehicle, wherein the information of the cargoes comprises positions of the cargoes in a carriage of the vehicle;
step 102, generating intelligent agents, wherein each intelligent agent is bound with a vehicle;
step 103, generating an environment, wherein the environment comprises all stations, vehicles, cargoes and carriages, and the carriages are divided into a plurality of loading areas;
104, generating an initial state according to the current position information of the vehicle and the information of cargoes loaded on the vehicle;
step 105, inputting an initial state, and selecting behaviors by each agent according to the corresponding strategy network; executing the behaviors of all the agents to obtain the next state;
step 106, initializing the time step to be 2;
Step 107, inputting the next state obtained in the previous time step, and then each agent selecting a behavior according to its corresponding policy network; i.e. for each time step t and for each agent i, a behavior $a_i = \mu_i(o_i)$ is selected according to the policy network $\mu_i$ and the epsilon-greedy policy, wherein $o_i$ is the observation of agent i; the behaviors of all the agents are executed to obtain the next state;
Policy network $\mu_i$: the decision core of the agent; it receives a state input and outputs a behavior decision;
Epsilon-greedy strategy: a trade-off between exploration and exploitation during training; with probability epsilon a random behavior is selected for exploration, and with probability 1−epsilon the currently optimal behavior is selected to exploit learned experience;
Step 108, if all the cargoes reach the destination, entering the next step, otherwise accumulating the time steps by 1, and returning to step 107;
step 109, predicting and planning a moving path of each vehicle and loading and unloading information according to the behavior of each agent, wherein the moving path comprises a station for transferring cargoes, the loading and unloading information comprises loading or unloading operations of cargoes required to be executed by the vehicle at the station, and the loading operations comprise specific positions of cargoes required to be loaded to carriages of the vehicle.
Further, in step 104, for each cargo, recording its current station, destination station, vehicle and position within the car;
For each vehicle, recording the current station, the next target station, the residual capacity and the occupation condition of each position;
For each stop, a list of its current goods waiting to be loaded and unloaded is recorded.
Further, in step 102, each vehicle is an agent that optimizes the overall scheduling process by observing environmental conditions and execution behavior.
Further, in step 103, the environment includes all stations, vehicles, goods and carriages; the environment receives the agents' behaviors, updates the state, and gives reward feedback.
Further, in step 105, the method further includes the following steps:
S201: initializing, for each agent, a policy network $\mu_i$ and a Q network $Q_i$, and the corresponding target networks $\mu_i'$ and $Q_i'$;
S202: initializing an experience replay buffer D;
S203: for each training round:
firstly resetting the environment to obtain an initial state s;
then, for each time step t, each agent i selects a behavior $a_i = \mu_i(o_i)$ according to the policy network $\mu_i$ and the epsilon-greedy policy, wherein $o_i$ is the observation of agent i;
the behavior $a_i$ may be a composite behavior including direction of movement, distance of movement, load/unload decisions, and number of loads/unloads;
executing the behaviors of all the agents to obtain the next state $s'$, the reward $r$ and a termination flag done;
storing the transition $(s, a, r, s', done)$ in the experience replay buffer D, where $a = (a_1, \ldots, a_N)$; resetting the environment if done, otherwise $s \leftarrow s'$;
for each training step:
randomly sampling a batch of transition data from D: $(s, a, r, s', done)$;
for each agent i, a target Q value is calculated:
if done, then $y_i = r_i$; otherwise:
$y_i = r_i + \gamma \, Q_i'(s', a_1', \ldots, a_N')$;
wherein $a_j' = \mu_j'(o_j')$, i.e. the behavior in the next state is calculated from the target policy network, where $o_j'$ is the observation of agent j in the next state;
updating the Q network by minimizing the loss:
$L(\theta_i^Q) = \frac{1}{B} \sum \big( Q_i(s, a_1, \ldots, a_N) - y_i \big)^2$;
updating the policy network to maximize the target:
$J(\theta_i^\mu) = \frac{1}{B} \sum Q_i\big(s, a_1, \ldots, \mu_i(o_i), \ldots, a_N\big)$, wherein $a_i = \mu_i(o_i)$ and the sums are taken over the sampled batch of size $B$;
soft updating the target networks of all agents:
$\theta_i^{Q'} \leftarrow \tau \, \theta_i^Q + (1 - \tau)\, \theta_i^{Q'}$; $\theta_i^{\mu'} \leftarrow \tau \, \theta_i^\mu + (1 - \tau)\, \theta_i^{\mu'}$;
wherein $\theta_i^Q$ and $\theta_i^\mu$ are the parameters of the Q network and the policy network respectively, and $\tau$ is a small soft update coefficient.
S204: at test time, for each agent i, the policy network $\mu_i$ is used directly to select behaviors, without the epsilon-greedy policy.
Wherein:
$N$: the number of agents (vehicles);
$S$: the state space, including the states of all vehicles, goods and stations;
$A_i$: the behavior space of the i-th agent;
$O_i$: the observation space of the i-th agent, typically a subset of the state space;
$\mu_i$: the policy network of the i-th agent; it inputs an observation and outputs a behavior;
$Q_i$: the Q network of the i-th agent; it inputs the state and the behaviors of all agents and outputs a Q value;
D: the experience replay pool, used for storing interaction data between the agents and the environment;
$a_1$: the behavior selected by the 1st agent in the current state $s$;
$a_i$: the behavior selected by the i-th agent in the current state $s$;
$a_N$: the behavior selected by the N-th agent in the current state $s$;
$\gamma$: a discount factor that balances the weights of current and future rewards;
$\mu_i'$, $Q_i'$: the corresponding target networks, used for calculating the sample target value;
$a_1'$: the behavior selected by the 1st agent in the next state;
$a_j'$: the behavior selected by the j-th agent in the next state;
$a_N'$: the behavior selected by the N-th agent in the next state;
$\mu_j'(o_j')$: the behavior of the target policy network of the j-th agent in the next state;
$A_j$: the behavior space of the j-th agent;
$L(\theta_i^Q)$: the minimization target for updating the Q network;
$J(\theta_i^\mu)$: the maximization target for updating the policy network.
Further, using a graph neural network (Graph Neural Network, GNN) to optimize a policy network, each car is considered as a graph, the nodes in the graph represent locations within the car, the edges represent the connection between the locations, each good may also be represented as a node connected to the location node where it is located, and the overall dispatch environment may be considered as a collection of these car graphs.
Further, the calculation formula of the first hidden layer of the graph neural network is as follows:
$z_i = W_1 h_i$;
$e_{ij} = \mathrm{LeakyReLU}\big(\mathbf{a}^{T} [\, z_i \,\|\, z_j \,]\big)$;
$\alpha_{ij} = \dfrac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})}$;
$h_i' = \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij} \, W_2 \, z_j\Big)$;
wherein $h_i'$ is the second node vector of the i-th node, i.e. the updated node vector; $h_i$ and $h_j$ are the first node vectors of the i-th and j-th nodes respectively; $W_1$ and $W_2$ are the first and second weight parameters respectively (trainable parameters); $\mathbf{a}^{T}$ is the transpose of the weight vector of the first hidden layer (a trainable parameter); $\mathcal{N}_i$ is the set of nodes directly connected to the i-th node;
$\alpha_{ij}$ is the attention weight between node $i$ and its neighbor node $j$; $z_i$ is the representation of the first node vector of the i-th node at the current graph attention network layer, and $z_j$ is the representation of the first node vector of the j-th node at the current graph attention network layer.
Further, the graph structure data further comprises a first node vector corresponding to each node;
a first node vector representing a node of the vehicle is obtained from capacity information encoding of each storage location of a cabin of the vehicle;
a first node vector representing a node of the cargo is obtained according to the storage position information of the carriage where the cargo is located and the volume information code of the cargo;
The graph structure data is input into a graph neural network, the graph neural network comprises a first hidden layer and a full connection layer, the first hidden layer outputs a second node vector of each node, the full connection layer inputs the second node vector, and the position ID is output.
A freight vehicle location prediction system for generating a dispatch strategy by a freight vehicle location prediction method, comprising:
And the data acquisition and preprocessing module is used for: collecting GPS position data of a vehicle, cargo loading and unloading records and path planning information, and converting the collected data into a graph structure suitable for model training;
model training module: dividing the preprocessed data into a training set, a verification set and a test set, training the GNN model with the training data, and performing hyper-parameter tuning with the verification data;
And an online prediction module: deploying the trained model online and receiving the current environment information of the vehicle in real time to make predictions;
And a scheduling optimization module: and predicting the agent selection behavior by using a strategy network, and generating an optimal scheduling strategy by combining the vehicle driving distance, the cargo handling workload and the customer satisfaction information.
A storage medium storing non-transitory computer readable instructions for performing one or more steps in a method of predicting a location of a freight vehicle.
The invention has the beneficial effects that:
The freight vehicle position prediction reasonably selects the positions of transfer stations so that goods can be transferred in the shortest time, reducing distance and transportation time; it effectively solves the route optimization problem existing in shipping large goods, improves transportation efficiency, and ensures that the goods reach their destination safely.
Drawings
FIG. 1 is a flow chart of a method of predicting a position of a freight vehicle in accordance with the present invention;
Fig. 2 is a block diagram of a freight vehicle position prediction system according to the present invention.
Detailed Description
The subject matter described herein will now be discussed with reference to example embodiments. It is to be understood that these embodiments are merely discussed so that those skilled in the art may better understand and implement the subject matter described herein and that changes may be made in the function and arrangement of the elements discussed without departing from the scope of the disclosure herein. Various examples may omit, replace, or add various procedures or components as desired. In addition, features described with respect to some examples may be combined in other examples as well.
Referring to fig. 1, a method for predicting a position of a freight vehicle includes the steps of:
Step 101, collecting current position information of a vehicle and information of cargoes loaded by the vehicle, wherein the information of the cargoes comprises positions of the cargoes in a carriage of the vehicle;
step 102, generating intelligent agents, wherein each intelligent agent is bound with a vehicle;
step 103, generating an environment, wherein the environment comprises all stations, vehicles, cargoes and carriages, and the carriages are divided into a plurality of loading areas;
104, generating an initial state according to the current position information of the vehicle and the information of cargoes loaded on the vehicle;
for each cargo, recording its current station, destination station, vehicle (if loaded), and location within the car (if loaded);
For each vehicle, recording the current station, the next target station, the residual capacity and the occupation condition of each position;
For each station, recording its current list of goods waiting for loading and unloading;
step 105, inputting an initial state, and selecting behaviors by each agent according to the corresponding strategy network; executing the behaviors of all the agents to obtain the next state;
step 106, initializing the time step to be 2;
Step 107, inputting the next state obtained in the previous time step, and then selecting a behavior by each agent according to the corresponding strategy network; executing the behaviors of all the agents to obtain the next state;
Step 108, if all the cargoes reach the destination, entering the next step, otherwise accumulating the time steps by 1, and returning to step 107;
step 109, predicting and planning a moving path of each vehicle and loading and unloading information according to the behavior of each agent, wherein the moving path comprises a station for transferring cargoes, the loading and unloading information comprises loading or unloading operations of cargoes required to be executed by the vehicle at the station, and the loading operations comprise specific positions of cargoes required to be loaded to carriages of the vehicle.
According to steps 101-109 above, each vehicle is treated as an agent with the objective of minimizing global vehicle distance and load handling effort while maximizing customer satisfaction, specifically:
defining a representation of an agent:
defining an agent to include agents and environments;
Wherein the agent: each vehicle is an agent that optimizes the overall scheduling process by observing environmental conditions and execution behavior;
Wherein the environment: the environment receives the behaviors of the intelligent agents, updates the state and gives rewarding feedback.
Representation of state space:
for each cargo, recording its current station, destination station, vehicle (if loaded), and location within the car (if loaded);
For each vehicle, recording the current station, the next target station, the residual capacity and the occupation condition of each position;
For each station, recording its current list of goods waiting for loading and unloading;
the entire state can be expressed as one tuple: (cargo state, vehicle state, station state).
The state information at the t-th time step is derived from the information of each vehicle's t-th arrival at a station, and the cargo information can be acquired from the vehicles; the information of a vehicle's first arrival at a station is defined as the information at the moment the vehicle starts.
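For illustration, the state tuple described above can be sketched in Python as follows; all class and field names here are assumptions chosen for this example, not identifiers defined by the method:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class CargoState:
    current_station: str
    destination_station: str
    vehicle: Optional[str] = None        # vehicle ID if loaded, else None
    car_position: Optional[str] = None   # e.g. "front" / "middle" / "rear" if loaded

@dataclass
class VehicleState:
    current_station: str
    next_target_station: Optional[str]
    remaining_capacity: float
    occupancy: dict = field(default_factory=dict)  # car position -> cargo ID or None

@dataclass
class StationState:
    waiting_cargo: list = field(default_factory=list)  # IDs of cargo awaiting load/unload

# The entire state is a tuple: (cargo states, vehicle states, station states),
# each keyed by the corresponding entity ID.
State = tuple[dict, dict, dict]
```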
Definition of behavioral space:
for each agent (vehicle), the following atomic behaviors are defined:
Moving to a certain station;
loading a certain cargo to a certain position of a carriage at a certain station;
unloading a certain cargo from a certain position of a carriage at a certain station;
The behavior of each agent may be any combination of these atomic behaviors.
For example: the vehicle moves to the station S1, loads the cargo P1 to the position L1, loads the cargo P2 to the position L2, moves to the station S2, unloads the cargo P1, and unloads the cargo P2.
Wherein the cargo conditions in loading and unloading cargo are defined by a function for determining whether the cargo satisfies the loading conditions;
behavior: traversing all cargoes, finding out the first cargo meeting the condition and loading the first cargo to a designated position;
for example, the function is used to determine whether the volume of the cargo is equal to or less than a specified maximum volume;
input: cargo objects and specified maximum volumes;
output: if the volume of the goods is less than or equal to the specified maximum volume, returning True, otherwise returning False.
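A minimal sketch of such a condition function and of the loading behavior that uses it, assuming a simple cargo object with `volume` and `car_position` attributes (illustrative names only):

```python
def satisfies_loading_condition(cargo, max_volume):
    """Return True if the cargo volume is less than or equal to the specified maximum."""
    return cargo.volume <= max_volume

def load_first_matching(cargos, max_volume, position):
    """Traverse all cargo and load the first one meeting the condition to the position."""
    for cargo in cargos:
        if satisfies_loading_condition(cargo, max_volume):
            cargo.car_position = position  # record the designated position in the car
            return cargo                   # the loaded cargo
    return None                            # no cargo satisfied the condition
```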
Using these atomic behaviors, a scheduling policy can be expressed.
It should be added that the loading and unloading operations may also be determined using the destination, priority, type, etc. attributes of the goods, rather than using specific goods IDs, so that the same strategy may be applied to different goods combinations.
Reward function
There are three main optimization objectives: minimizing vehicle travel distance, minimizing cargo handling workload, and maximizing customer satisfaction. Each may be quantified as a separate reward term, and the terms are then combined into a comprehensive reward function.
Vehicle travel distance rewards:
The travel distance of the vehicle may be defined as a negative reward, i.e. reward = −α × travel distance.
Where α is a positive scaling factor used to balance the importance of this reward term.
This reward term encourages the agent to minimize the total distance travelled by the vehicle.
Cargo handling workload rewards:
Similarly, the cargo handling workload may be defined as a negative reward, i.e. reward = −β × handling workload.
Where β is another positive scaling factor.
This reward term encourages the agent to minimize the overall handling workload.
The handling workload may be calculated based on the position of the cargo in the car. For example, the car is divided into three areas (front, middle and rear), and each area is assigned a weight according to its distance from the car door:
front part: the loading and unloading are most convenient due to the fact that the loading and unloading are closest to the carriage door, and the workload weight is 1;
middle part: the distance from the carriage door is moderate, the loading and the unloading are relatively convenient, and the workload weight is 1.5;
Rear part: the loading and unloading are the most inconvenient when the loading and unloading are furthest away from the carriage door, and the workload weight is 2;
Next, the handling workload may be calculated using the corresponding workload weights according to the position of the cargo in the vehicle compartment;
The handling workload for each cargo is calculated using the following formula:
handling workload = cargo weight x workload weight;
In this way, the handling workload can be reasonably estimated based on the location of the cargo within the car, and the estimation can help optimize the scheduling strategy to minimize the overall handling workload.
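As a sketch, the zone weights and the per-cargo formula above translate directly into code (the dictionary keys are illustrative):

```python
# Workload weight of each loading area, by distance from the car door.
ZONE_WEIGHTS = {"front": 1.0, "middle": 1.5, "rear": 2.0}

def handling_workload(cargo_weight, zone):
    """Handling workload = cargo weight x workload weight of the cargo's zone."""
    return cargo_weight * ZONE_WEIGHTS[zone]
```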
Customer satisfaction rewards:
Customer satisfaction may be measured by the delivery time of the goods; a desired delivery time may be defined for each cargo.
If the goods are delivered early, a positive reward is given, i.e. reward = γ × (desired delivery time − actual delivery time).
If the delivery is delayed, a negative reward is given, i.e. reward = −δ × (actual delivery time − desired delivery time).
Where γ and δ are positive scaling factors, and typically δ > γ, representing a greater penalty for late delivery than reward for early delivery.
Comprehensive rewarding function:
The three reward terms are combined into a weighted sum, namely:
composite reward = $w_1$ × vehicle travel distance reward + $w_2$ × cargo handling workload reward + $w_3$ × customer satisfaction reward;
wherein $w_1$, $w_2$ and $w_3$ are non-negative weight coefficients reflecting the relative importance of the different optimization objectives.
By adjusting these weights, a bonus function that balances different factors can be obtained.
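Combining the terms, a hedged sketch of the composite reward; the default scaling factors and weights below are placeholders to be tuned, not values prescribed by the method:

```python
def composite_reward(travel_distance, total_workload, expected_time, actual_time,
                     alpha=1.0, beta=1.0, gamma=1.0, delta=2.0,
                     w1=1.0, w2=1.0, w3=1.0):
    """Weighted sum of the three reward terms; delta > gamma penalizes lateness more."""
    distance_reward = -alpha * travel_distance
    workload_reward = -beta * total_workload
    if actual_time <= expected_time:                 # early or on-time delivery
        satisfaction_reward = gamma * (expected_time - actual_time)
    else:                                            # delayed delivery
        satisfaction_reward = -delta * (actual_time - expected_time)
    return w1 * distance_reward + w2 * workload_reward + w3 * satisfaction_reward
```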
Other considerations:
In addition to the three main optimization objectives described above, some other considerations may be added to the bonus function.
For example, a small negative reward may be given to encourage the agent to reduce unnecessary handling operations.
A small positive reward may also be given to encourage the agent to load and unload goods evenly across stations, avoiding the accumulation of goods at certain stations. These additional reward terms may help the agent learn useful heuristics.
The optimal scheduling strategy is learned during training using the MADDPG algorithm, which combines the ideas of centralized training and distributed execution. Training includes initializing each agent's policy network and Q network. Specifically:
the training process of MADDPG algorithm is as follows:
S201: initializing, for each agent, a policy network $\mu_i$ and a Q network $Q_i$, and the corresponding target networks $\mu_i'$ and $Q_i'$;
S202: initializing an experience replay buffer D;
S203: for each training round:
firstly resetting the environment to obtain an initial state s;
then, for each time step t, each agent i selects a behavior $a_i = \mu_i(o_i)$ according to the policy network $\mu_i$ and the epsilon-greedy policy, wherein $o_i$ is the observation of agent i;
the behavior $a_i$ may be a composite behavior including direction of movement, distance of movement, load/unload decisions, and number of loads/unloads;
executing the behaviors of all the agents to obtain the next state $s'$, the reward $r$ and a termination flag done;
storing the transition $(s, a, r, s', done)$ in the experience replay buffer D, where $a = (a_1, \ldots, a_N)$; resetting the environment if done, otherwise $s \leftarrow s'$;
for each training step:
randomly sampling a batch of transition data from D: $(s, a, r, s', done)$;
for each agent i, a target Q value is calculated:
if done, then $y_i = r_i$; otherwise:
$y_i = r_i + \gamma \, Q_i'(s', a_1', \ldots, a_N')$;
wherein $a_j' = \mu_j'(o_j')$, i.e. the behavior in the next state is calculated from the target policy network, where $o_j'$ is the observation of agent j in the next state;
updating the Q network by minimizing the loss:
$L(\theta_i^Q) = \frac{1}{B} \sum \big( Q_i(s, a_1, \ldots, a_N) - y_i \big)^2$;
updating the policy network to maximize the target:
$J(\theta_i^\mu) = \frac{1}{B} \sum Q_i\big(s, a_1, \ldots, \mu_i(o_i), \ldots, a_N\big)$, wherein $a_i = \mu_i(o_i)$ and the sums are taken over the sampled batch of size $B$;
soft updating the target networks of all agents:
$\theta_i^{Q'} \leftarrow \tau \, \theta_i^Q + (1 - \tau)\, \theta_i^{Q'}$; $\theta_i^{\mu'} \leftarrow \tau \, \theta_i^\mu + (1 - \tau)\, \theta_i^{\mu'}$;
wherein $\theta_i^Q$ and $\theta_i^\mu$ are the parameters of the Q network and the policy network respectively, and $\tau$ is a small soft update coefficient.
S204: at test time, for each agent i, the policy network $\mu_i$ is used directly to select behaviors, without the epsilon-greedy policy.
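The update rules of S203 can be condensed into a PyTorch-style sketch of one MADDPG training step. The agent container (`policy`, `q`, their target copies, optimizers, and the `slice` giving each agent's columns in the joint action tensor) and the batch layout are assumptions of this example, not structures fixed by the method:

```python
import torch
import torch.nn.functional as F

def maddpg_update(agents, batch, gamma=0.95, tau=0.01):
    """One MADDPG training step over a sampled batch of transitions."""
    # Joint next-state behavior a' = (mu'_1(o'_1), ..., mu'_N(o'_N)).
    with torch.no_grad():
        a2 = torch.cat([ag.target_policy(batch["obs2"][i])
                        for i, ag in enumerate(agents)], dim=-1)

    for i, ag in enumerate(agents):
        # Target value: y_i = r_i + gamma * Q'_i(s', a') * (1 - done).
        with torch.no_grad():
            y = batch["r"][i] + gamma * (1 - batch["done"]) * ag.target_q(batch["s2"], a2)

        # Q update: minimize the squared error between Q_i(s, a) and y_i.
        q_loss = F.mse_loss(ag.q(batch["s"], batch["a"]), y)
        ag.q_opt.zero_grad(); q_loss.backward(); ag.q_opt.step()

        # Policy update: maximize Q_i with agent i's action replaced by mu_i(o_i).
        a_joint = batch["a"].clone()
        a_joint[:, ag.slice] = ag.policy(batch["obs"][i])
        policy_loss = -ag.q(batch["s"], a_joint).mean()
        ag.policy_opt.zero_grad(); policy_loss.backward(); ag.policy_opt.step()

    # Soft updates: theta' <- tau * theta + (1 - tau) * theta'.
    for ag in agents:
        for net, tgt in ((ag.q, ag.target_q), (ag.policy, ag.target_policy)):
            for p, tp in zip(net.parameters(), tgt.parameters()):
                tp.data.mul_(1 - tau).add_(tau * p.data)
```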
In the epsilon-greedy strategy, epsilon is a hyperparameter representing the probability of exploration. Specifically, when an agent needs to select a behavior:
with probability epsilon, one behavior is selected at random. This means the agent will try different behaviors with a certain probability, even if they currently look suboptimal; this helps discover better behavioral policies that may exist in the environment.
With probability 1−epsilon, the behavior with the highest currently estimated Q value is selected. This means the agent tends to choose what appears to be the best behavior based on current knowledge, to maximize the expected return.
For example, if ε = 0.1, the agent will randomly select a behavior with 10% probability and select the currently estimated optimal behavior with 90% probability.
The epsilon-greedy strategy provides an intuitive trade-off: the larger the epsilon value, the more the agent explores; the smaller the epsilon value, the more it exploits.
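The selection rule itself is a few lines; here `policy(obs)` is assumed to return the currently optimal behavior and `sample_random_behavior()` a uniformly random one:

```python
import random

def select_behavior(policy, obs, sample_random_behavior, epsilon=0.1):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if random.random() < epsilon:
        return sample_random_behavior()  # exploration
    return policy(obs)                   # exploitation of learned experience
```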
Wherein:
$N$: the number of agents (vehicles);
$S$: the state space, including the states of all vehicles, goods and stations;
$A_i$: the behavior space of the i-th agent;
$O_i$: the observation space of the i-th agent, typically a subset of the state space;
$\mu_i$: the policy network of the i-th agent; it inputs an observation and outputs a behavior;
$Q_i$: the Q network of the i-th agent; it inputs the state and the behaviors of all agents and outputs a Q value;
D: the experience replay pool, used for storing interaction data between the agents and the environment;
$a_1$: the behavior selected by the 1st agent in the current state $s$;
$a_i$: the behavior selected by the i-th agent in the current state $s$;
$a_N$: the behavior selected by the N-th agent in the current state $s$;
$\gamma$: a discount factor that balances the weights of current and future rewards;
$\mu_i'$, $Q_i'$: the corresponding target networks, used for calculating the sample target value;
$a_1'$: the behavior selected by the 1st agent in the next state;
$a_j'$: the behavior selected by the j-th agent in the next state;
$a_N'$: the behavior selected by the N-th agent in the next state;
$\mu_j'(o_j')$: the behavior of the target policy network of the j-th agent in the next state;
$A_j$: the behavior space of the j-th agent;
$L(\theta_i^Q)$: the minimization target for updating the Q network;
$J(\theta_i^\mu)$: the maximization target for updating the policy network;
$r$: the reward obtained at the current time step;
$s$: the current environment state;
$s'$: the next state reached after executing the behaviors selected by all agents;
$y_i$: the sample target Q value of the i-th agent, used to guide the training of the $Q_i$ network;
done: a boolean value indicating whether the current round has terminated;
epsilon-greedy strategy: the exploration-exploitation trade-off strategy during training; with probability epsilon a random behavior is selected for exploration, and with probability 1−epsilon the currently optimal behavior is selected to exploit learned experience.
The policy network is the decision core of an agent: it receives a state input and outputs a behavior decision. Considering the flexibility of the internal structure of a carriage and the interactions between cargo, the policy network is optimized using a graph neural network (Graph Neural Network, GNN); a GNN can process graph-structured data well and capture complex relations between entities.
Specifically, each car can be regarded as a graph, the nodes in the graph represent positions in the car, the edges represent connection relations between the positions, each cargo can also be represented as a node which is connected with the position node where the cargo is located, and the whole scheduling environment can be regarded as a collection of the car graphs.
Definition of the figures:
Status: constructing graph structure data based on state information, wherein the graph structure data comprises nodes and edges, one node represents a vehicle or a goods or a station, and the condition that the edges exist between the two nodes is as follows:
both nodes represent cargo, and the cargo represented by both nodes is located on the same vehicle;
Both nodes represent vehicles and both vehicles are located at the same station;
one of the two nodes represents a vehicle, and the other represents a station, where the vehicle is located;
Both nodes represent stations;
the graph structure data also comprises a first node vector corresponding to each node;
a first node vector representing a node of the vehicle is obtained from capacity information encoding of each storage location of a cabin of the vehicle;
For example, the compartment is divided into three areas of a front part, a middle part and a rear part, each area is divided according to the distance from the compartment door, the compartment door is closest to the front part, the compartment door is furthest from the rear part, and the middle part is arranged between the front part and the rear part;
a first node vector representing a node of the cargo is obtained by encoding the storage position information (front, middle and rear) of the carriage in which the cargo is located and the volume information of the cargo;
For example, the position information within a car is encoded with a vector of length 3, where each dimension represents one of the three positions (front, middle, rear); the dimension for the actual position is 1 and the others are 0. For example, the front of the car may be represented as [1, 0, 0], the middle as [0, 1, 0], and the rear as [0, 0, 1];
The actual volume of the goods is 100 cubic meters, and 100 can be directly used as the numerical value of one dimension of the node;
a first node vector representing a node of the station is obtained by encoding according to the position information of the station;
For example, the location information of a station may be encoded with a vector of length 3, where each dimension represents one of start point, intermediate point and end point; the dimension for the actual role is 1 and the others are 0. For example, a start station may be represented as [1, 0, 0], an intermediate station as [0, 1, 0], and an end station as [0, 0, 1]; the first node vector of a start-station node may thus be expressed as [1, 0, 0];
Of course, the foregoing information may be represented by text information, or the corresponding first node vector may be obtained by text encoding.
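A sketch of the first-node-vector encodings described above, with the one-hot layout assumed as position dimensions followed by a raw volume dimension:

```python
import numpy as np

CAR_ZONES = ["front", "middle", "rear"]
STATION_ROLES = ["start", "intermediate", "end"]

def encode_cargo_node(zone, volume):
    """One-hot car zone (length 3) concatenated with the cargo volume."""
    vec = np.zeros(3)
    vec[CAR_ZONES.index(zone)] = 1.0
    return np.concatenate([vec, [volume]])  # e.g. front, 100 m^3 -> [1, 0, 0, 100]

def encode_station_node(role):
    """One-hot station role, e.g. a start station -> [1, 0, 0]."""
    vec = np.zeros(3)
    vec[STATION_ROLES.index(role)] = 1.0
    return vec
```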
Inputting the graph structure data into a graph neural network, wherein the graph neural network comprises a first hidden layer and a full connection layer, the first hidden layer outputs a second node vector of each node, the full connection layer inputs the second node vector, and the position ID is output;
When the second node vector input to the fully connected layer belongs to a node representing a vehicle, the output position ID represents a station, and the generated atomic behavior is that the vehicle next moves to the station represented by that position ID;
For example, station 3 is denoted by ID 03;
an ID of 20 indicates a null position when the vehicle is not moving;
When the second node vector input into the full connection layer belongs to a node representing goods, the output position ID represents the storage position of the carriage of the vehicle, and the generated atomic behavior is that the goods need to be transferred to the storage position of the carriage of the vehicle represented by the output position ID when the current vehicle moves to the next station;
for example, an ID of 10 indicates the front compartment of vehicle No. 13;
an ID of 20 is indicated as empty, when the cargo is not moving.
The calculation formula of the first hidden layer is as follows:
$z_i = W_1 h_i$;
$e_{ij} = \mathrm{LeakyReLU}\big(\mathbf{a}^{T} [\, z_i \,\|\, z_j \,]\big)$;
$\alpha_{ij} = \dfrac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})}$;
$h_i' = \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij} \, W_2 \, z_j\Big)$;
wherein $h_i'$ is the second node vector of the i-th node, i.e. the updated node vector; $h_i$ and $h_j$ are the first node vectors of the i-th and j-th nodes respectively; $W_1$ and $W_2$ are the first and second weight parameters respectively (trainable parameters); $\mathbf{a}^{T}$ is the weight vector of the first hidden layer (a trainable parameter); $\mathcal{N}_i$ is the set of nodes directly connected to the i-th node; $\alpha_{ij}$ is the attention weight between node $i$ and its neighbor node $j$; $z_i$ and $z_j$ are the representations of the first node vectors of the i-th and j-th nodes at the current graph attention network layer;
Graph attention layer:
for each node $v_i$, the attention weight $\alpha_{ij}$ between it and each neighbor node $v_j$ is calculated:
$e_{ij} = a\big(W h_i, W h_j\big)$, $\alpha_{ij} = \mathrm{softmax}_j(e_{ij})$;
wherein $W$ is a shared weight matrix, $a(\cdot,\cdot)$ is an attention function (e.g. dot product, concatenation, etc.), and $e_{ij}$ typically represents the degree of association or relevance score between node $v_i$ and node $v_j$; then, using the attention weights, the new feature representation of node $v_i$ is calculated:
$h_i' = \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij} \, W h_j\Big)$;
wherein $\sigma$ is an activation function (such as ReLU) and $\sum_{j \in \mathcal{N}_i}$ sums over all neighbor nodes; multiple graph attention layers can be stacked to obtain a higher-level representation of the nodes.
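A compact single-head sketch of one such graph attention layer in PyTorch; the dense adjacency-matrix formulation is an assumption of this example, and self-loops are assumed so every node has at least one neighbor:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    """Single-head graph attention: e_ij -> alpha_ij -> updated features h'_i."""

    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)  # shared weight matrix W
        self.a = nn.Linear(2 * out_dim, 1, bias=False)   # attention vector a^T

    def forward(self, h, adj):
        # h: (num_nodes, in_dim); adj: (num_nodes, num_nodes) 0/1, incl. self-loops.
        z = self.W(h)
        n = z.size(0)
        # Attention scores e_ij = LeakyReLU(a^T [z_i || z_j]) for all pairs.
        zi = z.unsqueeze(1).expand(n, n, -1)
        zj = z.unsqueeze(0).expand(n, n, -1)
        e = F.leaky_relu(self.a(torch.cat([zi, zj], dim=-1)).squeeze(-1))
        # Softmax over each node's neighborhood only (non-edges masked out).
        alpha = torch.softmax(e.masked_fill(adj == 0, float("-inf")), dim=1)
        return F.relu(alpha @ z)  # h'_i = sigma(sum_j alpha_ij * z_j)
```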
Global pooling:
after the last graph attention layer, for each graph $G_i$, all node features in the graph are pooled to obtain a graph-level representation:
$g_i = \mathrm{POOL}\big(\{\, h_v : v \in V_i \,\}\big)$;
wherein POOL can be any of various pooling functions (such as max pooling or average pooling), $h_v$ is the feature of node $v$, and $V_i$ is the node set of the i-th graph, i.e. the index set of all nodes contained in that graph.
Inter-graph attention:
to establish associations between different car graphs, another attention mechanism layer may be applied on the graph-level representations:
for each pair of graphs $G_i$ and $G_j$, the attention weight between them is calculated:
$\beta_{ij} = \mathrm{softmax}_j\big(a(W_g\, g_i,\; W_g\, g_j)\big)$;
wherein $W_g$ is a shared weight matrix applied to the graph representations $g_i$ and $g_j$, and $\beta_{ij}$ is the attention weight between graphs $G_i$ and $G_j$; then the representation of each graph is updated using the attention weights:
$g_i' = \sigma\Big(\sum_{j \neq i} \beta_{ij}\, W_g\, g_j\Big)$;
wherein $\sum_{j \neq i}$ sums over all the other graphs.
Decision output:
finally, the updated graph representations $g_i'$ are input into a multi-layer perceptron (MLP) to generate scheduling decisions:
for each car graph $G_i$, the MLP outputs a behavior vector $a_i = \mathrm{MLP}(g_i')$ representing the scheduling decision for all cargo in that car;
$a_i$ may include loading, unloading, moving and other operations on the cargo;
the representations of all car graphs can also be concatenated and input into another MLP to generate global-level decisions, such as the path planning of the vehicles.
During training, behavior can be sampled from these probability distributions using re-parameterized techniques to achieve end-to-end policy gradient optimization.
At the time of testing, the highest probability behavior may be selected or sampled according to some exploration strategy.
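Continuing the sketch, global pooling and the MLP decision head for one car graph might look as follows; the hidden width, average pooling, and the categorical output are illustrative choices, not values fixed by the method:

```python
import torch
import torch.nn as nn

class DecisionHead(nn.Module):
    """Pool node features to a graph-level vector g_i and map it to behavior logits."""

    def __init__(self, node_dim, num_behaviors, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(node_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_behaviors),
        )

    def forward(self, node_feats):
        g = node_feats.mean(dim=0)  # average pooling: g_i = POOL({h_v : v in V_i})
        return self.mlp(g)          # behavior logits a_i for this car graph

# Training: sample from the categorical distribution over the logits (e.g. via a
# re-parameterized Gumbel-softmax); testing: take the argmax behavior.
```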
Interactions between different car structures and cargo can be flexibly handled, and through the attention mechanism and global aggregation, the policy network can dynamically pay attention to important factors to make context-aware decisions.
Referring to fig. 2, in at least one embodiment of the present invention, there is provided a freight vehicle position prediction system including:
And the data acquisition and preprocessing module is used for: collecting GPS position data of a vehicle, cargo loading and unloading records and path planning information, and converting the collected data into a graph structure suitable for model training;
model training module: dividing the preprocessed data into a training set, a verification set and a test set, training the GNN model with the training data, and performing hyper-parameter tuning with the verification data;
And an online prediction module: deploying the trained model online and receiving the current environment information of the vehicle in real time to make predictions;
And a scheduling optimization module: and predicting the agent selection behavior by using a strategy network, and generating an optimal scheduling strategy by combining the vehicle driving distance, the cargo handling workload and the customer satisfaction information.
At least one embodiment of the present disclosure provides a storage medium storing non-transitory computer readable instructions for performing one or more of the steps of the foregoing method of predicting a position of a freight vehicle.
A computer program may be stored/distributed on a suitable medium, such as an optical storage medium or a solid-state medium supplied together with or as part of other hardware, but may also be distributed in other forms, such as via the Internet or other wired or wireless telecommunication systems. Any reference signs in the claims shall not be construed as limiting the scope.
The embodiments have been described above with reference to examples, but the invention is not limited to the specific implementations described, which are merely illustrative and not restrictive. Those of ordinary skill in the art, given the benefit of this disclosure, may derive many other forms without departing from the scope of the embodiments.
Claims (6)
1. A method of predicting a location of a freight vehicle, comprising the steps of:
Step 101, collecting current position information of a vehicle and information of cargoes loaded by the vehicle, wherein the information of the cargoes comprises positions of the cargoes in a carriage of the vehicle;
step 102, generating intelligent agents, wherein each intelligent agent is bound with a vehicle;
In step 102, each vehicle is an agent that optimizes the overall scheduling process by observing environmental conditions and execution behavior;
step 103, generating an environment, wherein the environment comprises all stations, vehicles, cargoes and carriages, and the carriages are divided into a plurality of loading areas;
in step 103, including all stations, vehicles, goods and carriages, the environment receives the actions of the agent, updates the state and gives rewards feedback;
104, generating an initial state according to the current position information of the vehicle and the information of cargoes loaded on the vehicle;
in step 104, for each cargo, recording its current station, destination station, vehicle and position within the car;
For each vehicle, recording the current station, the next target station, the residual capacity and the occupation condition of each position;
For each station, recording its current list of goods waiting for loading and unloading;
step 105, inputting an initial state, and selecting behaviors by each agent according to the corresponding strategy network; executing the behaviors of all the agents to obtain the next state;
wherein in step 105, the method further comprises the steps of:
S201: initializing, for each agent, a policy network $\mu_i$ and a Q network $Q_i$, and the corresponding target networks $\mu_i'$ and $Q_i'$;
S202: initializing an experience replay buffer D;
S203: for each training round:
firstly resetting the environment to obtain an initial state s;
then, for each time step t, each agent i selects a behavior $a_i = \mu_i(o_i)$ according to the policy network $\mu_i$ and the epsilon-greedy policy, wherein $o_i$ is the observation of agent i;
the behavior $a_i$ is a composite behavior including direction of movement, distance of movement, load/unload decisions and number of loads/unloads;
executing the behaviors of all the agents to obtain the next state $s'$, the reward $r$ and a termination flag done;
storing the transition $(s, a, r, s', done)$ in the experience replay buffer D, where $a = (a_1, \ldots, a_N)$; resetting the environment if done, otherwise $s \leftarrow s'$;
for each training step:
randomly sampling a batch of transition data from D: $(s, a, r, s', done)$;
for each agent i, a target Q value is calculated:
if done, then $y_i = r_i$; otherwise:
$y_i = r_i + \gamma \, Q_i'(s', a_1', \ldots, a_N')$;
wherein $a_j' = \mu_j'(o_j')$, i.e. the behavior in the next state is calculated from the target policy network, where $o_j'$ is the observation of agent j in the next state;
updating the Q network by minimizing the loss:
$L(\theta_i^Q) = \frac{1}{B} \sum \big( Q_i(s, a_1, \ldots, a_N) - y_i \big)^2$;
updating the policy network to maximize the target:
$J(\theta_i^\mu) = \frac{1}{B} \sum Q_i\big(s, a_1, \ldots, \mu_i(o_i), \ldots, a_N\big)$, wherein $a_i = \mu_i(o_i)$ and the sums are taken over the sampled batch of size $B$;
soft updating the target networks of all agents:
$\theta_i^{Q'} \leftarrow \tau \, \theta_i^Q + (1 - \tau)\, \theta_i^{Q'}$;
$\theta_i^{\mu'} \leftarrow \tau \, \theta_i^\mu + (1 - \tau)\, \theta_i^{\mu'}$;
wherein $\theta_i^Q$ and $\theta_i^\mu$ are the parameters of the Q network and the policy network respectively, and $\tau$ is a small soft update coefficient;
S204: at test time, for each agent i, the policy network $\mu_i$ is used directly to select behaviors, without the epsilon-greedy policy;
Wherein:
$N$: the number of agents;
$S$: the state space, including the states of all vehicles, goods and stations;
$A_i$: the behavior space of the i-th agent;
$O_i$: the observation space of the i-th agent, typically a subset of the state space;
$\mu_i$: the policy network of the i-th agent; it inputs an observation and outputs a behavior;
$Q_i$: the Q network of the i-th agent; it inputs the state and the behaviors of all agents and outputs a Q value;
step 106, initializing the time step to be 2;
Step 107, inputting the next state obtained in the previous time step, and then selecting a behavior by each agent according to the corresponding strategy network; executing the behaviors of all the agents to obtain the next state;
Step 108, if all the cargoes reach the destination, entering the next step, otherwise accumulating the time steps by 1, and returning to step 107;
step 109, predicting and planning a moving path of each vehicle and loading and unloading information according to the behavior of each agent, wherein the moving path comprises a station for transferring cargoes, the loading and unloading information comprises loading or unloading operations of cargoes required to be executed by the vehicle at the station, and the loading operations comprise specific positions of cargoes required to be loaded to carriages of the vehicle.
2. A method of predicting the position of a freight vehicle according to claim 1, wherein a graph neural network is used to optimize the policy network; each car is regarded as a graph, wherein nodes in the graph represent positions in the car and edges represent connection relations between the positions; each cargo can also be represented as a node connected to the node of the position where it is located, and the entire dispatch environment can be regarded as a collection of these car graphs.
3. The method of claim 2, wherein the calculation formula of the first hidden layer of the graph neural network is as follows:
$z_i = W_1 h_i$;
$e_{ij} = \mathrm{LeakyReLU}\big(\mathbf{a}^{T} [\, z_i \,\|\, z_j \,]\big)$;
$\alpha_{ij} = \dfrac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})}$;
$h_i' = \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij} \, W_2 \, z_j\Big)$;
wherein $h_i'$ is the second node vector of the i-th node, i.e. the updated node vector; $h_i$ and $h_j$ are the first node vectors of the i-th and j-th nodes respectively; $W_1$ and $W_2$ are the first and second weight parameters respectively; $\mathbf{a}^{T}$ is the weight vector of the first hidden layer; $\mathcal{N}_i$ is the set of nodes directly connected to the i-th node.
4. A method of predicting a location of a freight vehicle as defined in claim 3, wherein the graph structure data further includes a first node vector corresponding to each node;
a first node vector representing a node of the vehicle is obtained from capacity information encoding of each storage location of a cabin of the vehicle;
a first node vector representing a node of the cargo is obtained according to the storage position information of the carriage where the cargo is located and the volume information code of the cargo;
The graph structure data is input into a graph neural network, the graph neural network comprises a first hidden layer and a full connection layer, the first hidden layer outputs a second node vector of each node, the full connection layer inputs the second node vector, and the position ID is output.
5. A freight vehicle position prediction system for generating a scheduling policy by the freight vehicle position prediction method of any one of claims 1-4, comprising:
And the data acquisition and preprocessing module is used for: collecting GPS position data of a vehicle, cargo loading and unloading records and path planning information, and converting the collected data into a graph structure suitable for model training;
model training module: dividing the preprocessed data into a training set, a verification set and a test set, training the GNN model with the training data, and performing hyper-parameter tuning with the verification data;
And an online prediction module: deploying the trained model into an environment by receiving current environment information of the vehicle in real time;
And a scheduling optimization module: and receiving the output of the online prediction module, and generating an optimal scheduling strategy by combining the vehicle driving distance, the cargo handling workload and the customer satisfaction information.
6. A storage medium storing non-transitory computer readable instructions for performing one or more of the steps of the method of predicting a position of a freight vehicle as claimed in any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410505059.XA CN118096006B (en) | 2024-04-25 | 2024-04-25 | Freight vehicle position prediction method, system and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410505059.XA CN118096006B (en) | 2024-04-25 | 2024-04-25 | Freight vehicle position prediction method, system and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118096006A (en) | 2024-05-28
CN118096006B (en) | 2024-08-13
Family
ID=91150404
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410505059.XA Active CN118096006B (en) | 2024-04-25 | 2024-04-25 | Freight vehicle position prediction method, system and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118096006B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116456493A (en) * | 2023-04-20 | 2023-07-18 | 无锡学院 | D2D user resource allocation method and storage medium based on deep reinforcement learning algorithm |
CN116739466A (en) * | 2023-05-19 | 2023-09-12 | 福州大学 | Distribution center vehicle path planning method based on multi-agent deep reinforcement learning |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116911480A (en) * | 2023-07-25 | 2023-10-20 | 北京交通大学 | Path prediction method and system based on trust sharing mechanism in Internet of vehicles scene |
CN117236541A (en) * | 2023-09-27 | 2023-12-15 | 杭州电子科技大学 | Distributed logistics distribution path planning method and system based on attention pointer network |
CN117565727B (en) * | 2024-01-15 | 2024-04-02 | 朗峰新材料启东有限公司 | Wireless charging automatic control method and system based on artificial intelligence |
- 2024-04-25: Application CN202410505059.XA filed in China; granted as patent CN118096006B (status: Active)
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116456493A (en) * | 2023-04-20 | 2023-07-18 | 无锡学院 | D2D user resource allocation method and storage medium based on deep reinforcement learning algorithm |
CN116739466A (en) * | 2023-05-19 | 2023-09-12 | 福州大学 | Distribution center vehicle path planning method based on multi-agent deep reinforcement learning |
Also Published As
Publication number | Publication date |
---|---|
CN118096006A (en) | 2024-05-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Yan et al. | Reinforcement learning for logistics and supply chain management: Methodologies, state of the art, and future opportunities | |
CN107977739A (en) | Optimization method, device and the equipment in logistics distribution path | |
Branchini et al. | Adaptive granular local search heuristic for a dynamic vehicle routing problem | |
CN112001064B (en) | Full-autonomous water transport scheduling method and system between container terminals | |
Tirado et al. | Heuristics for dynamic and stochastic routing in industrial shipping | |
CN109034481A (en) | A kind of vehicle routing problem with time windows modeling and optimization method based on constraint planning | |
CN107798423A (en) | Vehicle path planning Simulation Experimental Platform based on multi-intelligence algorithm | |
CN101620694A (en) | Logistics transportation decision-making support system and method, and transportation management system | |
Chen et al. | Deep Q-learning for same-day delivery with a heterogeneous fleet of vehicles and drones | |
Gu et al. | Dynamic truck–drone routing problem for scheduled deliveries and on-demand pickups with time-related constraints | |
CN113848970A (en) | Multi-target collaborative path planning method for vehicle and unmanned aerial vehicle | |
CN115545608A (en) | Green logistics vehicle path optimization method based on uncertain demand and application | |
CN115860613A (en) | Part load and goods matching and vehicle scheduling method considering reservation mechanism | |
CN110942193A (en) | Vehicle scheduling method and storage medium | |
CN114757394B (en) | Logistics vehicle path optimization method, system and medium based on workload balance | |
Li et al. | Parcel consolidation approach and routing algorithm for last-mile delivery by unmanned aerial vehicles | |
CN114239931B (en) | Method and device for realizing logistics storage loading scheduling based on improved ant colony algorithm | |
CN117236541A (en) | Distributed logistics distribution path planning method and system based on attention pointer network | |
CN114663011A (en) | Dispatching method for AGV (automatic guided vehicle) of automatic dock | |
Zou et al. | Delivery network design of a locker-drone delivery system | |
Rahmanifar et al. | An integrated temporal and spatial synchronization for two-echelon vehicle routing problem in waste collection system | |
Kyriakakis et al. | A GRASP/VND algorithm for the energy minimizing drone routing problem with pickups and deliveries | |
Teimoury et al. | The sustainable hybrid truck-drone delivery model with stochastic customer existence | |
CN118096006B (en) | Freight vehicle position prediction method, system and storage medium | |
Singh et al. | Dispatching AGVs with battery constraints using deep reinforcement learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |