CN116205550A - Logistics garden vehicle goods taking scheduling method based on self-adaptive discrete particle swarm algorithm - Google Patents
- Publication number
- CN116205550A (application number CN202310221846.7A)
- Authority
- CN
- China
- Prior art keywords
- vehicle
- algorithm
- steps
- setting
- goods
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/08—Logistics, e.g. warehousing, loading or distribution; Inventory or stock management
- G06Q10/083—Shipping
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0631—Resource planning, allocation, distributing or scheduling for enterprises or organisations
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
A logistics park vehicle pickup scheduling method based on an adaptive discrete particle swarm algorithm. The method comprises the following specific steps: analyze the operation data related to the logistics park vehicles, and perform K-means clustering on the target goods based on each vehicle's park entry time, park exit time and corresponding target pickup goods; encode the yard allocation part; encode the goods pickup part; set up global search, local search and random search to determine the initial population; set the maximum iteration number of the algorithm as iter; initialize the state set and action set in reinforcement learning, select actions with a greedy strategy, and calculate the reward from the state and fitness value of the population chromosomes at the current moment. The beneficial effects of the invention are that it can significantly reduce vehicle waiting time during pickup, increase logistics throughput, and indirectly shorten the maximum yard operation time so as to meet the work-efficiency requirements of the logistics park yard.
Description
Technical Field
The invention relates to the technical field of vehicle dispatching, in particular to a logistics park vehicle pickup scheduling method based on an adaptive discrete particle swarm algorithm.
Background
As logistics parks grow in scale, higher demands are placed on the efficiency of the logistics system. Efficient vehicle dispatching can significantly reduce vehicle waiting time during pickup and increase logistics throughput.
At present, vehicle dispatching relies mainly on manual experience; it is inefficient when many vehicles operate simultaneously across different yards, and it easily causes traffic congestion. How to use the relevant data to optimize the scheduling mechanism so that it meets the needs of both vehicle and yard operations is a significant challenge.
Traditional vehicle queue-entry management mostly uses manual scheduling and arranges the entry order on a first-come, first-served basis; the lack of scientific optimization leads to disordered scheduling, low customer satisfaction and similar problems. The key parameters of traditional meta-heuristic algorithms cannot be adjusted dynamically, so neither the solution quality nor the solution speed reaches the expected effect. The invention solves the vehicle yard scheduling problem with an adaptive discrete particle swarm algorithm whose parameters are optimized by the SARSA and Q-learning algorithms from reinforcement learning.
Disclosure of Invention
The invention aims to provide, against the defects and shortcomings of the prior art, a logistics park vehicle pickup scheduling method based on an adaptive discrete particle swarm algorithm that can significantly reduce vehicle waiting time during pickup, increase logistics throughput, and indirectly shorten the maximum yard operation time so as to meet the work-efficiency requirements of the logistics park.
In order to achieve the above purpose, the invention adopts the following technical scheme. The logistics park vehicle pickup scheduling method based on the adaptive discrete particle swarm algorithm comprises the following specific steps: analyze the operation data related to the logistics park vehicles, and perform K-means clustering on the target goods based on each vehicle's park entry time, park exit time and corresponding target pickup goods; encode the yard allocation part; encode the goods pickup part; set up global search to determine the initial population; set up local search to determine the initial population; set up random search to determine the initial population; set the maximum iteration number of the algorithm as iter; initialize the state set and action set in reinforcement learning, select actions with a greedy strategy, and calculate the reward from the state and fitness value of the population chromosomes at the current moment; update the Q table with the Sarsa algorithm, judge whether the current number of Sarsa updates is greater than the switch count, and if so, start updating the Q table with the Q-learning algorithm, selecting actions with a greedy strategy; at every iteration, judge the quality of the current population and, according to the trained Q table, adaptively update the discrete particle swarm parameter ω from the current chromosome population state, selecting the ω most beneficial to the current population; draw a random number in [0, 1] and, by comparing it with ω, apply pickup mutation and yard-selection mutation; apply POX chromosome crossover to part of the yard chromosomes and the pickup chromosomes respectively; apply two-point crossover to part of the yard chromosomes and the pickup chromosomes respectively; perform the chromosome update operation, evaluate the fitness value of each particle in the current population, and update by comparison each individual best position P_i and the global best position P_g; judge whether the iteration count meets the stopping condition, and if so, stop iterating and output the global best position P_g as the optimal scheduling result.
As a further illustration of the invention: the analysis of the operation data related to the logistics park vehicles is specifically as follows: according to the yard mixed-storage rule, analyze the historical residence time of vehicles in the yards, and give the pickup duration distribution T_i of the vehicles at each yard.
As a further illustration of the invention: the encoding of the yard allocation part is specifically as follows: set the chromosome length of the yard allocation part to the total number of goods types across all vehicles' pickups; each gene position is an integer, the genes are arranged in order of the vehicles and of each vehicle's pickup goods, and each integer is the number of an allocated yard that satisfies the current vehicle's current pickup.
As a further illustration of the invention: the encoding of the goods pickup part is specifically as follows: set the chromosome length of the pickup part to the total number of goods types across all vehicles' pickups; each gene is encoded directly with a vehicle number, and the order of the vehicle numbers gives the order in which the vehicles pick up goods.
As a further illustration of the invention: setting up the global search to determine the initial population is specifically as follows: set up an array whose length equals the number of yards, where each entry corresponds to the operating time at the corresponding yard; randomly select a vehicle i from the pickup vehicle set; starting from vehicle i's first pickup, add the pickup time of each yard that currently satisfies the target pickup to the corresponding entry of the array, select the yard with the shortest resulting time as the current target yard, and update the array; and so on until all vehicles have finished picking up.
As a further illustration of the invention: setting up the local search to determine the initial population is specifically as follows: set up an array whose length equals the number of yards, where each entry corresponds to the operating time at the corresponding yard; randomly select a vehicle i from the pickup vehicle set; starting from vehicle i's first pickup, add the pickup time of each yard that currently satisfies the target pickup to the corresponding entry of the array, and select the yard with the shortest resulting time as the current target yard; when vehicle i's selection is finished, reset the array to 0 and select the next vehicle, until all vehicles have finished picking up.
As a further illustration of the invention: setting up the random search to determine the initial population is specifically as follows: randomly select a vehicle i from the vehicle set, randomly order vehicle i's target pickup sequence, and for each ordered target goods randomly select a yard from the set of yards that satisfy the goods requirement; and so on until all vehicles have finished picking up.
As a further illustration of the invention: the specific steps of initializing the state set and action set in reinforcement learning, selecting actions with a greedy strategy, and calculating the reward from the state and fitness values of the population chromosomes at the current moment are as follows: initialize the state set in reinforcement learning, set its value range to [0.5, 1], and divide it evenly into 10 intervals, state1, state2, ..., state10; initialize the action set in reinforcement learning, set its value range to [0, 1], and divide it evenly into 10 intervals, action1, action2, ..., action10; select an action according to the state with a greedy strategy, and calculate the reward from the state and fitness of the population chromosomes at the current moment.
As a further illustration of the invention: the Sarsa algorithm updates the Q table, whether the current number of Sarsa updates is greater than the switch count is judged, and if so, the Q-learning algorithm starts updating the Q table, selecting actions with a greedy strategy. The specific steps are: set the algorithm switch counter t = 1, with the switch count set to 25; the Sarsa algorithm updates the Q table according to Q(s_t, a_t) = (1 - α)Q(s_t, a_t) + α(r_{t+1} + γQ(s_{t+1}, a_{t+1})); judge whether the current t is greater than the switch count, and if so start the chromosome update operation, otherwise start the POX chromosome crossover; the Q-learning algorithm updates the Q table according to Q(s_t, a_t) = (1 - α)Q(s_t, a_t) + α(r_{t+1} + γ max_a Q(s_{t+1}, a)) and selects actions with a greedy strategy.
As a further illustration of the invention: drawing a random number in [0, 1] and comparing it with ω to apply pickup mutation and yard-selection mutation is specifically as follows: mutate the pickup chromosome part and mutate the vehicle's yard-selection part, and select one of the two mutations as the chromosome mutation operation by roulette-wheel selection.
After the technical scheme above is adopted, the invention has the following beneficial effects:
1. ω in the discrete particle swarm algorithm is updated intelligently with a reinforcement learning algorithm, improving both solution quality and solution speed;
2. the chromosome state set is partitioned, and the average fitness and the diversity of the population are used as the state to reflect the condition of the whole population, improving the quality of the whole population and making excellent individuals easier to obtain;
3. the Sarsa algorithm is combined with the Q-learning algorithm. Sarsa is an on-policy algorithm; its results are more conservative than Q-learning's and it converges faster, but its learning effect is inferior. Q-learning is an off-policy algorithm; its solutions have greater potential than Sarsa's and better global search capability, but it converges more slowly. Combining the two, the method inherits Sarsa's learning effect and convergence speed in the early stage of the algorithm and Q-learning's better optimization capability in the later stage;
4. the vehicle scheduling algorithm provided by the method can significantly reduce vehicle waiting time during pickup, increase logistics throughput, and indirectly shorten the maximum yard operation time, meeting the work-efficiency requirements of the logistics park yard.
Drawings
In order to illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in their description are briefly introduced below. The drawings described below are obviously only some embodiments of the invention; a person skilled in the art could obtain other drawings from them without inventive effort.
Fig. 1 is an algorithmic schematic of the present invention.
Fig. 2 is a schematic flow chart of the algorithm of the present invention.
Fig. 3 is a graph comparing algorithm iteration results for the 10-vehicle case in the present invention.
Fig. 4 is a graph comparing algorithm iteration results for the 40-vehicle case in the present invention.
Fig. 5 shows the scheduling result for the 10-vehicle case in the present invention.
Fig. 6 shows the scheduling result for the 40-vehicle case in the present invention.
Detailed Description
Referring to figs. 1 to 6, the technical scheme adopted in this embodiment comprises the following specific steps:
S1, analyze the operation data related to the logistics park vehicles; perform K-means clustering on the target goods based on each vehicle's park entry and exit times and corresponding target pickup goods; according to the yard mixed-storage rule, analyze the historical residence time of vehicles in the yards and give the pickup duration distribution T_i of the vehicles at each yard.
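To make step S1 concrete, the following is a minimal Python sketch of K-means clustering on (park entry time, park exit time) pairs; the feature choice, the sample data and k = 2 are illustrative assumptions, not taken from the patent:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain K-means on 2-D points, e.g. (entry_time, exit_time) per pickup."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # assign every point to its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2
                                + (p[1] - centers[c][1]) ** 2)
            clusters[j].append(p)
        # move each center to the mean of its cluster (keep it if empty)
        centers = [(sum(p[0] for p in cl) / len(cl),
                    sum(p[1] for p in cl) / len(cl)) if cl else centers[j]
                   for j, cl in enumerate(clusters)]
    return centers, clusters

# assumed (entry_time, exit_time) in hours for each vehicle's target goods
data = [(8.0, 10.0), (8.5, 10.5), (14.0, 16.0), (14.2, 16.5)]
centers, clusters = kmeans(data, k=2)
```

Goods with similar in-park time windows land in the same cluster, which is the grouping the later yard allocation can exploit.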
S2, encode the yard allocation part: set the chromosome length of the yard allocation part to the total number of goods types across all vehicles' pickups; each gene position is an integer, the genes are arranged in order of the vehicles and of each vehicle's pickup goods, and each integer is the number of an allocated yard that satisfies the current vehicle's current pickup.
S3, encode the goods pickup part: set the chromosome length of the pickup part to the total number of goods types across all vehicles' pickups; each gene is encoded directly with a vehicle number, and the order of the vehicle numbers gives the order in which the vehicles pick up goods.
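Steps S2 and S3 can be illustrated with a small two-part chromosome; the vehicles, goods types and feasible-yard table below are assumptions for demonstration only:

```python
import random

# pickup[v] lists the goods types vehicle v must collect;
# feasible[g] lists the yard numbers that can supply goods type g.
pickup = {1: ["A", "B"], 2: ["A", "C"]}
feasible = {"A": [0, 1], "B": [1], "C": [0]}
rng = random.Random(42)

# One gene per (vehicle, goods) pair, ordered by vehicle then by its pickups.
genes = [(v, g) for v in sorted(pickup) for g in pickup[v]]

# Yard-allocation part (S2): each gene is an integer yard number that
# satisfies the corresponding pickup.
yard_chrom = [rng.choice(feasible[g]) for _, g in genes]

# Pickup part (S3): each gene is a vehicle number; the k-th occurrence of v
# schedules v's k-th pickup, so the sequence encodes the pickup order.
order_chrom = [v for v, _ in genes]
rng.shuffle(order_chrom)
```

Both chromosomes have the same length: the total number of goods types over all vehicles' pickups (here 4).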
S4, set up global search to determine the initial population: set up an array whose length equals the number of yards, where each entry corresponds to the operating time at the corresponding yard; randomly select a vehicle i from the vehicle set; starting from vehicle i's first pickup, add the pickup time of each yard that currently satisfies the target pickup to the corresponding entry of the array, select the yard with the shortest resulting time as the current target yard, and update the array; and so on until all vehicles have finished picking up.
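A sketch of this global-search seeding under assumed data; the yard-time array is the `workload` list, and the per-yard pickup durations are invented for illustration:

```python
import random

def global_search_init(pickup, feasible, duration, n_yards, seed=0):
    """Greedy seeding (step S4): every pickup goes to the feasible yard that
    would finish it soonest given the workload accumulated so far."""
    rng = random.Random(seed)
    workload = [0.0] * n_yards            # operating time per yard
    assignment = {}
    vehicles = list(pickup)
    rng.shuffle(vehicles)                 # vehicles processed in random order
    for v in vehicles:
        for g in pickup[v]:
            best = min(feasible[g], key=lambda y: workload[y] + duration[y])
            workload[best] += duration[best]
            assignment[(v, g)] = best
    return assignment, workload

pickup = {1: ["A", "B"], 2: ["A", "C"]}   # assumed vehicle -> goods types
feasible = {"A": [0, 1], "B": [1], "C": [0]}
duration = [2.0, 3.0]                     # assumed pickup time at each yard
assignment, workload = global_search_init(pickup, feasible, duration, n_yards=2)
```

The local-search variant of step S5 differs only in resetting `workload` to zero after each vehicle is finished.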
S5, set up local search to determine the initial population: set up an array whose length equals the number of yards, where each entry corresponds to the operating time at the corresponding yard; randomly select a vehicle i from the vehicle set; starting from vehicle i's first pickup, add the pickup time of each yard that currently satisfies the target pickup to the corresponding entry of the array, and select the yard with the shortest resulting time as the current target yard; when vehicle i's selection is finished, reset the array to 0 and select the next vehicle, until all vehicles have finished picking up.
S6, set up random search to determine the initial population: randomly select a vehicle i from the vehicle set, randomly order vehicle i's target pickup sequence, and for each ordered target goods randomly select a yard from the set of yards that satisfy the goods requirement; and so on until all vehicles have finished picking up.
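The random-search seeding of step S6 can be sketched as follows; the pickup and feasible-yard dictionaries are illustrative assumptions:

```python
import random

def random_init(pickup, feasible, seed=1):
    """Random seeding (step S6): shuffle each vehicle's pickup sequence and
    draw a random feasible yard for every pickup."""
    rng = random.Random(seed)
    order, yards = [], {}
    vehicles = list(pickup)
    rng.shuffle(vehicles)
    for v in vehicles:
        goods = pickup[v][:]
        rng.shuffle(goods)                # random target-goods order
        for g in goods:
            order.append((v, g))
            yards[(v, g)] = rng.choice(feasible[g])
    return order, yards

pickup = {1: ["A", "B"], 2: ["A", "C"]}       # assumed vehicle -> goods types
feasible = {"A": [0, 1], "B": [1], "C": [0]}  # assumed goods -> yards
order, yards = random_init(pickup, feasible)
```

Mixing individuals from S4, S5 and S6 gives the initial population both quality and diversity.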
S7, set the maximum iteration number of the algorithm as iter, here iter = 50.
S8, initialize the state set and action set in reinforcement learning, select actions with a greedy strategy, and calculate the reward from the state and fitness values of the population chromosomes at the current moment:
1) initialize the state set in reinforcement learning, set its value range to [0.5, 1], and divide it evenly into 10 intervals, state1, state2, ..., state10;
2) initialize the action set in reinforcement learning, set its value range to [0, 1], and divide it evenly into 10 intervals, action1, action2, ..., action10;
3) select an action according to the state with a greedy strategy, and calculate the reward from the state and fitness of the population chromosomes at the current moment.
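The discretization and greedy selection of S8 can be sketched as follows; using a normalized population statistic as the state and the midpoint of the chosen action interval as ω are assumptions for illustration:

```python
import random

N_STATES, N_ACTIONS = 10, 10

def state_index(x, lo=0.5, hi=1.0):
    """Map a population statistic in [0.5, 1] to one of 10 state intervals."""
    x = min(max(x, lo), hi)
    return min(int((x - lo) / (hi - lo) * N_STATES), N_STATES - 1)

def select_action(Q, s, epsilon=0.1, rng=random):
    """Epsilon-greedy: explore with probability epsilon, else argmax Q[s]."""
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[s][a])

Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
s = state_index(0.73)            # e.g. the current population statistic
a = select_action(Q, s, epsilon=0.0)
omega = (a + 0.5) / N_ACTIONS    # midpoint of the chosen interval in [0, 1]
```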
S9, update the Q table with the Sarsa algorithm; judge whether the current number of Sarsa updates is greater than the switch count, and if so, start updating the Q table with the Q-learning algorithm, selecting actions with a greedy strategy:
1) set the algorithm switch counter t = 1, with the switch count set to 25;
2) the Sarsa algorithm updates the Q table according to Q(s_t, a_t) = (1 - α)Q(s_t, a_t) + α(r_{t+1} + γQ(s_{t+1}, a_{t+1})), where Q(s_t, a_t) is the value of taking action a_t in state s_t, α is the soft-update weight, r_{t+1} is the reward at time step t+1, γ is the discount rate, and t = t + 1;
3) judge whether the current t is greater than the switch count; if so, go to S14, otherwise go to S12;
4) the Q-learning algorithm updates the Q table according to Q(s_t, a_t) = (1 - α)Q(s_t, a_t) + α(r_{t+1} + γ max_a Q(s_{t+1}, a)) and selects actions with a greedy strategy, where Q(s_t, a_t) is the value of taking action a_t in state s_t, α is the soft-update weight, r_{t+1} is the reward at time step t+1, γ is the discount rate, and t = t + 1.
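The two update rules above translate directly into code; α is the soft-update weight (learning rate) and γ the discount rate. The tiny Q table below is only a numerical illustration:

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy: bootstrap with the action actually chosen next."""
    Q[s][a] = (1 - alpha) * Q[s][a] + alpha * (r + gamma * Q[s_next][a_next])

def qlearning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Off-policy: bootstrap with the greedy (maximal) next action."""
    Q[s][a] = (1 - alpha) * Q[s][a] + alpha * (r + gamma * max(Q[s_next]))

Q = [[0.0, 0.0], [0.0, 2.0]]
sarsa_update(Q, 0, 0, r=1.0, s_next=1, a_next=0)   # bootstraps on Q[1][0]
qlearning_update(Q, 0, 1, r=1.0, s_next=1)         # bootstraps on max(Q[1])
```

Sarsa's target uses the sampled next action a_{t+1}, while Q-learning takes the maximum over actions, which is exactly why the latter has more optimistic solutions but converges more slowly.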
S10, at every iteration, judge the quality of the current population and, according to the trained Q table, adaptively update the discrete particle swarm parameter ω from the current chromosome population state, selecting the ω most beneficial to the current population.
S11, draw a random number in [0, 1] and compare it with ω to apply pickup mutation and yard-selection mutation: mutate the pickup chromosome part and mutate the vehicle's yard-selection part, and select one of the two mutations as the chromosome mutation operation by roulette-wheel selection.
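A sketch of the S11 mutation pair; the equal roulette weights and the sample chromosomes are assumptions:

```python
import random

def mutate(order, yards, feasible, weights=(0.5, 0.5), rng=random):
    """Pick one of two operators by roulette wheel: swap two pickup genes,
    or re-draw the yard of one (vehicle, goods) gene."""
    op = rng.choices(["pickup_swap", "yard_reselect"], weights=weights)[0]
    if op == "pickup_swap":
        i, j = rng.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
    else:
        key = rng.choice(list(yards))
        yards[key] = rng.choice(feasible[key[1]])
    return op

feasible = {"A": [0, 1], "B": [1]}        # assumed goods -> feasible yards
order = [(1, "A"), (1, "B"), (2, "A")]    # pickup-order chromosome
yards = {(1, "A"): 0, (1, "B"): 1, (2, "A"): 1}  # yard-allocation chromosome
op = mutate(order, yards, feasible, rng=random.Random(3))
```

Both operators keep the chromosome feasible: swapping reorders pickups without losing any, and re-selection only draws from yards that stock the goods.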
S12, apply POX chromosome crossover to part of the yard chromosomes and the pickup chromosomes respectively.
S13, apply two-point crossover to part of the yard chromosomes and the pickup chromosomes respectively.
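The two crossover operators of S12 and S13 can be sketched as follows; the parent chromosomes are illustrative. POX keeps a chosen subset of vehicles' genes in place from one parent and fills the remaining positions in the other parent's order, which preserves each vehicle's gene count:

```python
def pox(p1, p2, keep):
    """Precedence-preserving crossover on the pickup part: genes of vehicles
    in `keep` stay where p1 has them; the remaining positions are filled
    with p2's other genes, in p2's order."""
    rest = iter([v for v in p2 if v not in keep])
    return [v if v in keep else next(rest) for v in p1]

def two_point(p1, p2, i, j):
    """Two-point crossover on the yard part: take p2's segment [i, j)."""
    return p1[:i] + p2[i:j] + p1[j:]

child_order = pox([1, 2, 1, 3], [3, 1, 2, 1], keep={1})   # -> [1, 3, 1, 2]
child_yard = two_point([0, 1, 0, 1], [1, 0, 1, 0], 1, 3)  # -> [0, 0, 1, 1]
```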
S14, perform the chromosome update operation, evaluate the fitness value of each particle in the current population, and update by comparison each individual best position P_i and the global best position P_g.
S15, judge whether the iteration count meets the stopping condition; if so, stop iterating and output the global best position P_g as the optimal scheduling result.
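Steps S14 and S15 reduce to the standard personal-best/global-best bookkeeping of particle swarms; the makespan-style fitness below is an assumed stand-in:

```python
def update_bests(particles, fitness, pbest, pfit, gbest, gfit):
    """Update each particle's best position P_i and the global best P_g
    (minimization)."""
    for i, p in enumerate(particles):
        f = fitness(p)
        if f < pfit[i]:                  # better personal position
            pbest[i], pfit[i] = p, f
        if f < gfit[0]:                  # better global position
            gbest[0], gfit[0] = p, f

fitness = lambda p: sum(p)               # stand-in for the yard makespan
particles = [[3, 1], [2, 2], [1, 1]]
pbest = [p[:] for p in particles]
pfit = [float("inf")] * 3
gbest, gfit = [None], [float("inf")]
update_bests(particles, fitness, pbest, pfit, gbest, gfit)
```

After iter iterations, `gbest[0]` (P_g) is returned as the scheduling result.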
The working principle of the invention is as follows: the key parameter ω of the discrete particle swarm algorithm is selected adaptively by combining the Sarsa and Q-learning algorithms from reinforcement learning. First, each vehicle's park entry and exit times and corresponding target pickup goods are analyzed, K-means clustering is performed on the target goods, the historical residence time of vehicles in the yards is analyzed according to the yard mixed-storage rule, and the pickup duration distribution of the vehicles at each yard is given. Second, an initial population of better quality is generated by combining global search, local search and random search. Finally, ω in the discrete particle swarm algorithm is updated intelligently through reinforcement learning; ω determines the diversity of the particle search: when ω is large, the global search capability is strong and the local search capability weak, and when ω is small, the local search capability is strong and the global search capability weak. Selecting the adaptive parameter ω with the reinforcement learning algorithm thus solves the vehicle yard scheduling problem efficiently.
The foregoing is merely illustrative of the invention and not restrictive; modifications and equivalents may occur to those skilled in the art without departing from the spirit and scope of the invention.
Claims (10)
1. A logistics park vehicle pickup scheduling method based on an adaptive discrete particle swarm algorithm, characterized by comprising the following specific steps:
S1, analyze the operation data related to the logistics park vehicles, and perform K-means clustering on the target goods based on each vehicle's park entry time, park exit time and corresponding target pickup goods;
S2, encode the yard allocation part;
S3, encode the goods pickup part;
S4, set up global search to determine the initial population;
S5, set up local search to determine the initial population;
S6, set up random search to determine the initial population;
S7, set the maximum iteration number of the algorithm as iter;
S8, initialize the state set and action set in reinforcement learning, select actions with a greedy strategy, and calculate the reward from the state and fitness values of the population chromosomes at the current moment;
S9, update the Q table with the Sarsa algorithm; judge whether the current number of Sarsa updates is greater than the switch count, and if so, start updating the Q table with the Q-learning algorithm, selecting actions with a greedy strategy;
S10, at every iteration, judge the quality of the current population and, according to the trained Q table, adaptively update the discrete particle swarm parameter ω from the current chromosome population state, selecting the ω most beneficial to the current population;
S11, draw a random number in [0, 1] and compare it with ω to apply pickup mutation and yard-selection mutation;
S12, apply POX chromosome crossover to part of the yard chromosomes and the pickup chromosomes respectively;
S13, apply two-point crossover to part of the yard chromosomes and the pickup chromosomes respectively;
S14, perform the chromosome update operation, evaluate the fitness value of each particle in the current population, and update by comparison each individual best position P_i and the global best position P_g;
S15, judge whether the iteration count meets the stopping condition; if so, stop iterating and output the global best position P_g as the optimal scheduling result.
2. The logistics park vehicle pickup scheduling method based on the adaptive discrete particle swarm algorithm of claim 1, characterized in that S1 is specifically: according to the yard mixed-storage rule, analyze the historical residence time of vehicles in the yards, and give the pickup duration distribution T_i of the vehicles at each yard.
3. The logistics park vehicle pickup scheduling method based on the adaptive discrete particle swarm algorithm of claim 1, characterized in that S2 is specifically: set the chromosome length of the yard allocation part to the total number of goods types across all vehicles' pickups; each gene position is an integer, the genes are arranged in order of the vehicles and of each vehicle's pickup goods, and each integer is the number of an allocated yard that satisfies the current vehicle's current pickup.
4. The logistics park vehicle pickup scheduling method based on the adaptive discrete particle swarm algorithm of claim 1, characterized in that S3 is specifically: set the chromosome length of the pickup part to the total number of goods types across all vehicles' pickups; each gene is encoded directly with a vehicle number, and the order of the vehicle numbers gives the order in which the vehicles pick up goods.
5. The logistics park vehicle pickup scheduling method based on the adaptive discrete particle swarm algorithm of claim 1, characterized in that S4 is specifically: set up an array whose length equals the number of yards, where each entry corresponds to the operating time at the corresponding yard; randomly select a vehicle i from the pickup vehicle set; starting from vehicle i's first pickup, add the pickup time of each yard that currently satisfies the target pickup to the corresponding entry of the array, select the yard with the shortest resulting time as the current target yard, and update the array; and so on until all vehicles have finished picking up.
6. The logistics park vehicle pickup scheduling method based on the adaptive discrete particle swarm algorithm of claim 1, characterized in that S5 is specifically: set up an array whose length equals the number of yards, where each entry corresponds to the operating time at the corresponding yard; randomly select a vehicle i from the pickup vehicle set; starting from vehicle i's first pickup, add the pickup time of each yard that currently satisfies the target pickup to the corresponding entry of the array, and select the yard with the shortest resulting time as the current target yard; when vehicle i's selection is finished, reset the array to 0 and select the next vehicle, until all vehicles have finished picking up.
7. The logistics park vehicle pickup scheduling method based on the adaptive discrete particle swarm algorithm of claim 1, characterized in that S6 is specifically: randomly select a vehicle i from the vehicle set, randomly order vehicle i's target pickup sequence, and for each ordered target goods randomly select a yard from the set of yards that satisfy the goods requirement; and so on until all vehicles have finished picking up.
8. The logistics park vehicle pickup scheduling method based on the adaptive discrete particle swarm algorithm of claim 1, wherein the method comprises the following steps: the specific steps of the S8 are as follows:
1) Initializing a state set in reinforcement learning, setting the value range of the state set to be 0.5 and 1, and dividing the value range into 10 sections as state1 and state2 … … state10 on average;
2) Initializing the action set in reinforcement learning, setting its value range to [0, 1] and dividing it evenly into 10 intervals, denoted action1, action2, ..., action10;
3) Selecting an action for the current state with a greedy policy, and computing the reward from the state and the fitness of the population chromosomes at the current moment.
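The discretization in steps 1)–2) and the greedy selection in step 3) can be sketched as below. The exploration rate `EPSILON` and the use of interval midpoints as action values are assumptions; the patent only specifies the ranges and the 10-way split:

```python
import random

N_STATES, N_ACTIONS = 10, 10
EPSILON = 0.1                                 # exploration rate (assumption)

def state_index(x):
    """Map a value in [0.5, 1] onto one of the 10 states state1..state10."""
    x = min(max(x, 0.5), 1.0)
    return min(int((x - 0.5) / 0.05), N_STATES - 1)

def action_value(a):
    """Centre of the a-th of 10 equal intervals partitioning [0, 1]."""
    return (a + 0.5) / N_ACTIONS

def choose_action(Q, s):
    """Epsilon-greedy selection of an action index for state s."""
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    row = Q[s]
    return row.index(max(row))
```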
9. The logistics park vehicle pickup scheduling method based on the adaptive discrete particle swarm algorithm of claim 1, wherein step S9 specifically comprises the following steps:
1) Setting the iteration counter t = 1, where the algorithm-switching threshold is T = 25;
2) The Sarsa algorithm updates the Q table according to Q(s_t, a_t) = (1 − α)Q(s_t, a_t) + α(r_{t+1} + γQ(s_{t+1}, a_{t+1}));
3) Judging whether the current t is larger than the switching threshold T: if so, proceeding to S14; otherwise, proceeding to S12;
4) The Q-Learning algorithm updates the Q table according to Q(s_t, a_t) = (1 − α)Q(s_t, a_t) + α(r_{t+1} + γ max_a Q(s_{t+1}, a)), and selects actions using a greedy policy.
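The two update rules and the switch at t > T can be sketched as follows. The learning rate `ALPHA` and discount `GAMMA` values are assumptions; only the update formulas and the switching rule come from the claim:

```python
ALPHA, GAMMA = 0.1, 0.9                       # hyper-parameters (assumptions)

def sarsa_update(Q, s, a, r, s_next, a_next):
    """Q(s,a) <- (1-alpha)*Q(s,a) + alpha*(r + gamma*Q(s',a'))"""
    Q[s][a] = (1 - ALPHA) * Q[s][a] + ALPHA * (r + GAMMA * Q[s_next][a_next])

def q_learning_update(Q, s, a, r, s_next):
    """Q(s,a) <- (1-alpha)*Q(s,a) + alpha*(r + gamma*max_a' Q(s',a'))"""
    Q[s][a] = (1 - ALPHA) * Q[s][a] + ALPHA * (r + GAMMA * max(Q[s_next]))

def update(Q, t, T, s, a, r, s_next, a_next):
    """Switch rule of the claim: Sarsa while t <= T (T = 25), Q-Learning after."""
    if t > T:
        q_learning_update(Q, s, a, r, s_next)
    else:
        sarsa_update(Q, s, a, r, s_next, a_next)
```

Sarsa is on-policy (it bootstraps from the action actually taken next), while Q-Learning is off-policy (it bootstraps from the best next action), so the switch moves the search from cautious early exploration toward greedier late-stage exploitation.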
10. The logistics park vehicle pickup scheduling method based on the adaptive discrete particle swarm algorithm of claim 1, wherein step S11 specifically comprises the following steps: defining two mutation operators, one mutating the to-be-picked cargo part of the chromosome and the other mutating the vehicle's storage-yard selection part, and choosing one of the two by roulette-wheel selection as the chromosome mutation operation.
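A sketch of the claim-10 mutation step. The gene encoding (a list of [cargo, yard, candidate_yards] genes) and the equal 0.5/0.5 wheel weights are assumptions for illustration:

```python
import random

def mutate(chromosome, p_swap=0.5):
    """Pick one of two mutation operators by roulette wheel (claim-10 sketch)."""
    chromosome = [g[:] for g in chromosome]           # work on a copy
    if random.random() < p_swap:
        # operator 1: mutate the pickup-order part (swap two genes)
        i, j = random.sample(range(len(chromosome)), 2)
        chromosome[i], chromosome[j] = chromosome[j], chromosome[i]
    else:
        # operator 2: mutate the yard-selection part (re-draw one yard)
        k = random.randrange(len(chromosome))
        chromosome[k][1] = random.choice(chromosome[k][2])
    return chromosome
```

With only two operators and equal weights the roulette wheel degenerates to a fair coin flip; unequal weights would bias the search toward one mutation type.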
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310221846.7A CN116205550A (en) | 2023-03-09 | 2023-03-09 | Logistics garden vehicle goods taking scheduling method based on self-adaptive discrete particle swarm algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116205550A true CN116205550A (en) | 2023-06-02 |
Family
ID=86514568
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310221846.7A Pending CN116205550A (en) | 2023-03-09 | 2023-03-09 | Logistics garden vehicle goods taking scheduling method based on self-adaptive discrete particle swarm algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116205550A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117550273A (en) * | 2024-01-10 | 2024-02-13 | 成都电科星拓科技有限公司 | Multi-transfer robot cooperation method based on bee colony algorithm and transfer robot |
CN117550273B (en) * | 2024-01-10 | 2024-04-05 | 成都电科星拓科技有限公司 | Multi-transfer robot cooperation method based on bee colony algorithm |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111860754B (en) | AGV scheduling method based on ant colony and genetic algorithm | |
CN107808215B (en) | Goods allocation optimization method applied to Flying-V type non-traditional layout warehouse | |
CN111553507B (en) | Multi-commodity-flow-based optimization method for Chinese and European container transportation scheme | |
CN111222665B (en) | Cloud manufacturing service combination optimization selection method based on preference NSGA-III algorithm | |
CN113222463B (en) | Data-driven neural network agent-assisted strip mine unmanned truck scheduling method | |
CN112053002B (en) | Cloud manufacturing multi-task scheduling method based on utility perception | |
CN112378415B (en) | Scheduling planning method, device and equipment for tools and appliances | |
CN116205550A (en) | Logistics garden vehicle goods taking scheduling method based on self-adaptive discrete particle swarm algorithm | |
CN106527381A (en) | Fast evaluation method facing parallel batch processing machine dynamic scheduling | |
CN113204417A (en) | Multi-satellite multi-point target observation task planning method based on improved genetic and firefly combined algorithm | |
CN113627712A (en) | Method for optimizing operation sequence of shuttle vehicle of storage system | |
CN115965169A (en) | Path planning method, intelligent device and computer readable storage medium | |
CN110098964A (en) | A kind of disposition optimization method based on ant group algorithm | |
CN110147885B (en) | Shared bicycle parking point distribution method for improving genetic algorithm | |
CN110956311B (en) | Vehicle path optimization method based on super heuristic algorithm of reinforcement learning | |
CN114626632A (en) | Vehicle scheduling method, system, equipment and medium based on ant lion algorithm | |
JP2555834B2 (en) | Group management elevator control method | |
CN115271130B (en) | Dynamic scheduling method and system for maintenance order of ship main power equipment | |
CN116033016A (en) | Collaborative caching method for parking vehicles by roadside in vehicle-mounted network | |
Yi et al. | Automated design of search algorithms based on reinforcement learning | |
CN112700190B (en) | Improved method for distributing tray materials by scanning method and genetic simulation annealing method | |
CN114707707A (en) | Method and system for scheduling AGV task based on improved genetic algorithm | |
CN115907066A (en) | Cement enterprise vehicle scheduling method based on hybrid sparrow intelligent optimization algorithm | |
CN115358455A (en) | Electric vehicle path optimization method considering battery replacement and hybrid time window constraints | |
CN111639822B (en) | Express distribution method based on 0-1 knapsack problem analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||