CN111815032A

CN111815032A - Automatic guided vehicle scheduling method and related device thereof

Info

Publication number: CN111815032A
Application number: CN202010561452.2A
Authority: CN
Inventors: 周宏宾; 任宇鹏; 金亦东; 卢维
Original assignee: Zhejiang Dahua Technology Co Ltd
Current assignee: Zhejiang Huaray Technology Co Ltd
Priority date: 2020-06-18
Filing date: 2020-06-18
Publication date: 2020-10-23

Abstract

The application provides an automated guided vehicle scheduling method and a related device thereof. The automatic guided vehicle dispatching method comprises the following steps: inputting the current paths and the paths of the next stage of the plurality of automatic guided vehicles into a reinforcement learning network to obtain the priorities of the plurality of automatic guided vehicles; and according to the high-low sequence of the priorities of the automatic guided vehicles, sequentially determining whether to send the path of the next stage of the automatic guided vehicle to the automatic guided vehicle. The method and the device can learn the optimal rule in operation by means of the reinforcement learning network, save the complex work of manually designing the passing rule, and improve the scheduling efficiency.

Description

Automatic guided vehicle scheduling method and related device thereof

Technical Field

The present application relates to the field of automated guided vehicle technology, and in particular, to a method and apparatus for automated guided vehicle scheduling.

Background

An Automated Guided Vehicle (Automated Guided Vehicle) system is an important component of digital production workshops and intelligent storage logistics, and along with the continuous development of related industries, the Automated Guided Vehicle has increasingly wide industrial application, and in the same Automated Guided Vehicle system, the number and the types of the Automated Guided vehicles are also increasing. However, the space available for the automated guided vehicles to travel in the workshop or warehouse is often limited, so that there is a competitive relationship between the automated guided vehicles with respect to the drivable road resources, which requires the automated guided vehicle system to control and schedule the operation of all the automated guided vehicles, so as to ensure the safe and efficient operation of the system.

The automated guided vehicle system generally adopts manually designed traffic rules to control and schedule all automated guided vehicles, such as a straight-ahead priority rule (turning for straight-ahead), a near-ahead priority rule (first come first get), a load priority rule (empty vehicle for load vehicle), and the like. These manually designed rules are problematic, for example, as automated guided vehicle systems become more complex and traffic regulations become more and more concerned, the rules become more and more difficult to design.

Disclosure of Invention

The application provides an automatic guided vehicle dispatching method and a related device thereof, which can learn the optimal rule in operation by means of a reinforcement learning network, thereby saving the complex work of manually designing the traffic rule and improving the dispatching efficiency.

In order to achieve the above object, the present application provides an automated guided vehicle scheduling method, including:

inputting the current paths and the paths of the next stage of the plurality of automatic guided vehicles into a reinforcement learning network to obtain the priorities of the plurality of automatic guided vehicles;

and according to the high-low sequence of the priorities of the automatic guided vehicles, sequentially determining whether to send the path of the next stage of the automatic guided vehicle to the automatic guided vehicle.

The method comprises the following steps of inputting the current paths and the paths of the next stage of the automatic guided vehicles into a reinforcement learning network, wherein the method comprises the following steps: determining a global path for each automated guided vehicle based on the task for each automated guided vehicle; dividing the global path of each automatic guided vehicle into at least one local path;

inputting the current path and the path of the next stage of a plurality of automatic guided vehicles into a reinforcement learning network, wherein the reinforcement learning network comprises the following steps: inputting the current local paths and the local paths of the next stage of the automatic guided vehicles into a reinforcement learning network;

sequentially determining whether to send the path of the next stage of the automated guided vehicle to the automated guided vehicle, including: and sequentially confirming whether the local path of the next stage of the automatic guided vehicle is sent to the automatic guided vehicle.

The method for inputting the reinforcement learning network by taking the current local paths of the automatic guided vehicles as the input of the reinforcement learning network comprises the following steps:

and taking the current local paths and the incomplete global paths of the automatic guided vehicles as the input of the reinforcement learning network, wherein the incomplete global paths of the automatic guided vehicles comprise the local paths of the next stage of the automatic guided vehicles.

obtaining a grid map of each automated guided vehicle based on the size of each automated guided vehicle, the current local path, the incomplete global path and the size of the load of each automated guided vehicle, wherein pixel points of first pixel values of the grid map of the automated guided vehicle are regions corresponding to the current local path of the automated guided vehicle, and pixel points of second pixel values of the grid map are regions corresponding to the incomplete global path of the automated guided vehicle; taking the grid graphs of a plurality of automatic guided vehicles as the input of the reinforcement learning network; or the like, or, alternatively,

and forming a characteristic matrix based on the sizes of the automatic guided vehicles, the current local path, the incomplete global path and the sizes of the loads of the automatic guided vehicles, and taking the characteristic matrix as the input of the reinforcement learning network.

Wherein, the method further comprises:

counting the efficiency of dispatching the automatic guided vehicle;

to optimize the reinforcement learning network with efficiency.

The reinforcement learning network comprises a priority computing network and a value function network, wherein the input of the priority computing network is a grid graph of a plurality of automatic guided vehicles, the output of the priority computing network is the priority of the plurality of automatic guided vehicles, the input of the value function network is the grid graph and the priority of the plurality of automatic guided vehicles, and the output of the value function network is a predicted value of the task completion quantity in unit time;

the efficiency of counting and dispatching the automatic guided vehicle comprises the following steps: counting the real value of the task amount completed in unit time;

optimizing a reinforcement learning network with efficiency, comprising: optimizing parameters of a value function network by taking a square error between a predicted value and a true value of the task amount completed in unit time as a loss;

and taking the gradient of the automatic guided vehicle priority in the value function network to the task completion amount in unit time as the gradient of the automatic guided vehicle priority of the priority calculation network so as to optimize the parameters of the priority calculation network.

Wherein, confirm whether to send the local route of automated guided vehicle next stage to automated guided vehicle in proper order, include:

determining whether nodes in a local path of the next stage of the automatic guided vehicle are locked by other automatic guided vehicles;

if the local path is locked, determining not to send the local path of the next stage to the automatic guided vehicle;

and if the local path is not locked, sending the local path of the next stage to the automatic guided vehicle, and enabling the node of the local path of the next stage to be locked.

Wherein the step of causing the node of the local path of the next phase of transmission to be locked comprises:

when the automatic guided vehicle receives the local path of the next stage, locking the node of the local path of the next stage; or the like, or, alternatively,

and when the local path of the next stage is sent to the automatic guided vehicle, locking the node of the local path of the next stage.

Wherein, the method further comprises:

and acquiring information of locking nodes and unlocking nodes of the automatic guided vehicles from the plurality of automatic guided vehicles.

Wherein, divide into at least one local route the global route of every automated guided vehicle, include:

the global path of each automated guided vehicle is divided into at least one local path according to the length of the global path and/or the position of the turning point.

Wherein determining a global path for each automated guided vehicle based on the task for each automated guided vehicle comprises: and determining a global path with the optimal cost for each automatic guided vehicle by using an A-x algorithm based on the starting point and the end point of the task of each automatic guided vehicle.

In order to achieve the above object, the present application provides an automated guided vehicle scheduling apparatus, which includes a memory and a processor; the memory has stored therein a computer program for execution by the processor to perform the steps of the above method.

To achieve the above object, the present application provides a readable storage medium having a computer program stored thereon, where the computer program is executed by a processor to implement the steps of the above method.

The method comprises the following steps: the current paths and the next-stage paths of the automatic guided vehicles are processed through the reinforcement learning network to obtain the priorities of the automatic guided vehicles, and whether the next-stage paths of the automatic guided vehicles are sent to the automatic guided vehicles or not is determined in sequence according to the sequence of the priorities of the automatic guided vehicles.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.

Fig. 1 is a schematic flow chart of a first embodiment of an automated guided vehicle scheduling method according to the present application;

fig. 2 is a schematic flow chart of a second embodiment of the automated guided vehicle scheduling method according to the present application;

FIG. 3 is a schematic diagram of global path segmentation in the automated guided vehicle scheduling method of the present application;

FIG. 4 is a schematic diagram of a grid image in the automated guided vehicle scheduling method of the present application;

fig. 5 is a schematic diagram of a node for confirming whether an automated guided vehicle passes or not in the automated guided vehicle scheduling method according to the present application;

FIG. 6 is a schematic diagram of a global path in the automated guided vehicle scheduling method of the present application;

FIG. 7 is a schematic diagram of a priority computing network in the automated guided vehicle scheduling method of the present application;

FIG. 8 is a schematic diagram of a median function network for the automated guided vehicle scheduling method of the present application;

FIG. 9 is a schematic structural diagram of the automated guided vehicle dispatching device of the present application;

FIG. 10 is a schematic view of the automated guided vehicle dispatch system of the present application;

FIG. 11 is a schematic structural diagram of an embodiment of a storage medium readable by the present application.

Detailed Description

In order to make those skilled in the art better understand the technical solution of the present application, the automated guided vehicle scheduling method and the related apparatus provided in the present application are further described in detail below with reference to the accompanying drawings and the detailed description.

Referring to fig. 1, fig. 1 is a schematic flow chart of a first embodiment of an automated guided vehicle scheduling method according to the present application. The automatic guided vehicle scheduling method of the embodiment comprises the following steps.

S101: and inputting the current paths and the paths of the next stage of the plurality of automatic guided vehicles into the reinforcement learning network to obtain the priorities of the plurality of automatic guided vehicles.

Inputting the current paths and the next-stage paths of the automatic guided vehicles into a reinforcement learning network, and calculating the current paths and the next-stage paths of the automatic guided vehicles by the reinforcement learning network to obtain the priorities of the automatic guided vehicles.

The current path of the automated guided vehicle may include a path which has been sent to the automated guided vehicle and is being traveled by the automated guided vehicle. It will be appreciated that the current path of the automated guided vehicle may also include paths that have been sent to the automated guided vehicle and that have not been traveled by the automated guided vehicle. The path of the next stage of the automated guided vehicle refers to a path to be sent to the automated guided vehicle at the next moment in sequence.

Further, when the current path of the automated guided vehicle only includes the path which has been sent to the automated guided vehicle and is being traveled by the automated guided vehicle, the path of the next stage of the automated guided vehicle is the path to be sent to the automated guided vehicle and the automated guided vehicle is to be traveled by the automated guided vehicle at the next moment in the sequence.

S102: and sequentially determining whether to send the path of the next stage of each automatic guided vehicle to the corresponding automatic guided vehicle according to the high-low sequence of the priority of the automatic guided vehicles.

For example, if the priority levels of the 5 automated guided vehicles A, B, C, D and E are respectively the second lowest, the middle, the highest, the lowest and the second highest, it is determined whether the route of the next stage of the automated guided vehicle C can be sent to the corresponding automated guided vehicle, it is determined whether the route of the next stage of the automated guided vehicle E can be sent to the corresponding automated guided vehicle, it is determined whether the route of the next stage of the automated guided vehicle B can be sent to the corresponding automated guided vehicle, it is determined whether the route of the next stage of the automated guided vehicle a can be sent to the corresponding automated guided vehicle, and it is determined whether the route of the next stage of the automated guided vehicle D can be sent to the corresponding automated guided vehicle.

In the embodiment, the current paths and the next-stage paths of the automatic guided vehicles are processed through the reinforcement learning network to obtain the priorities of the automatic guided vehicles, and then whether the next-stage paths of the automatic guided vehicles are sent to the automatic guided vehicles is determined in sequence according to the sequence of the priorities of the automatic guided vehicles.

Further, as shown in fig. 2, the present embodiment divides the global route planned based on each task of each automated guided vehicle into at least one local route to improve the road surface utilization efficiency. The automatic guided vehicle scheduling method of the embodiment comprises the following steps.

S201: a global path for each automated guided vehicle is determined based on the task for each automated guided vehicle.

Path planning may be performed based on information about the tasks of each automated guided vehicle to determine a global path for each automated guided vehicle.

The related information of the task comprises a starting point of the task and an end point of the task. In addition, the related information of the task may further include information such as priority of the task. It is understood that the start point of a task may refer to the end point of the last task received by the corresponding automated guided vehicle in chronological order. For example, the vehicle a receives the task a first and then receives the task B, and then the end point of the task a can be used as the starting point of the task B. In other embodiments, due to the path planning algorithm, there may be situations where a task delegated to the automated guided vehicle is cancelled, taking the newly received task as the task to be executed at the next moment in the sequence, and thus the starting point of the newly received task may be the end point of the task being executed by the automated guided vehicle.

In one implementation, any path from the task start point to the task end point can be selected as the global path of each automated guided vehicle.

In another implementation, a global path with the optimal cost may be obtained based on the relevant information of the task of each automated guided vehicle by using a path planning algorithm. The path planning algorithm comprises an A-x algorithm, a D algorithm and the like.

In yet another implementation, a shortest path planning algorithm may be utilized to obtain a global path with the shortest distance based on information about tasks of each automated guided vehicle.

It will be appreciated that the global path is for each task of each automated guided vehicle, i.e. each task of each automated guided vehicle has a respective global path.

S202: the global path of each automated guided vehicle is divided into at least one local path.

In an implementation, a global path may be partitioned into a predetermined number of local paths. For example, each global path is divided into 5 local paths, assuming that the predetermined number is 5. Of course, the predetermined number may be 4, 7, or other values. Further, the lengths of the local paths of the same global path may be substantially equal, or the number of nodes on the local paths of the same global path may be equal.

In another implementation manner, the global path may be segmented according to a set path segmentation length to obtain at least one local path.

In another implementation manner, the global path may be segmented according to the set number of path segmentation nodes to obtain at least one local path. For example, as shown in fig. 3, the global path includes 13 nodes A, B, C, D … … M, and if the set number of path division nodes is 6, the global path may be divided into an a → F local path, an F → K local path, and a K → M local path, where the a → F local path and the F → K local path include 6 nodes, and the K → M local path includes 3 nodes.

In yet another implementation, the global path is partitioned into at least one local path based on the turn point locations of the global path. Specifically, the global path is divided by taking the turning points as the dividing points, for example, as shown in fig. 3, the global path includes 4 turning points of 1, 2, 3, and 4, and the global path is divided by taking the turning points as the dividing points, and the global path may be divided into an a → 1 local path, a 1 → 2 local path, a 2 → 3 local path, and a 3 → 4 local path.

In another implementation manner, the global path may be further segmented based on a set maximum length of the local path to obtain at least one local path.

It is to be understood that the global path may be split into at least two local paths.

Of course, the global path may also be divided into a local path, i.e. the global path is not divided, and the global path may be used as the local path.

S203: and inputting the current local paths and the next-stage local paths of the automatic guided vehicles into the reinforcement learning network to obtain the priorities of the automatic guided vehicles.

In an implementation scenario, the current local path and the local path of the next stage may belong to the same global path.

In another implementation scenario, the current local path and the local path of the next stage may also belong to two different global paths, respectively. For example, the current local path of the automated guided vehicle is the local path of task a that the automated guided vehicle is performing, and the local path of the next stage of the automated guided vehicle is the local path of task B that the automated guided vehicle is to perform next in sequence.

Furthermore, all the local paths which are not sent to the automatic guided vehicle in the global path to which the local path of the next stage belongs can be combined into the global path which is not completed. It is understood that the global path that is not completed includes the local path of the next stage. And then inputting the current local path and the incomplete global path into a reinforcement learning network so that the reinforcement learning network sends the current local path and the incomplete global path from the global direction to optimize the traffic efficiency of the automatic guided vehicle.

When the automated guided vehicle travels along the current local path, the influence range of the automated guided vehicle is not limited to the nodes on the current local path, and may influence the nodes adjacent to the nodes on the current local path, and the influence range is related to the size of the automated guided vehicle, the size of the load, and the interval of the map nodes. Therefore, in order to further improve the scheduling efficiency and the scheduling accuracy of the scheduling method, the sizes of the automated guided vehicles and the sizes of the loads of the automated guided vehicles, as well as the current local path and the incomplete global path, may be input to the reinforcement learning network, so that the reinforcement learning network may determine the priorities of the automated guided vehicles based on the sizes of the automated guided vehicles, the sizes of the loads of the automated guided vehicles, the current local path and the incomplete global path.

It can be understood that the information of the current path and the path of the next stage of each automated guided vehicle can be integrated into the grid image or the feature matrix of each automated guided vehicle, then the grid image or the feature matrix of a plurality of automated guided vehicles is input into the reinforcement learning network, and the grid image or the feature matrix is adopted to express the working conditions of the plurality of automated guided vehicles, so that on one hand, the integrity of the information is ensured, on the other hand, the more mature technology such as the convolutional neural network can be used for processing the input of the complicated working conditions of the plurality of automated guided vehicles, and the operation efficiency can be improved.

For example, a feature matrix is configured based on the sizes of the plurality of automated guided vehicles, the current path and the path of the next stage, and the sizes of the loads of the plurality of automated guided vehicles, and the feature matrix is used as an input of the reinforcement learning network.

Further, the raster image of each automated guided vehicle of the present application represents the scanning surface on which the automated guided vehicle travels, such as the raster image shown in fig. 4, with different pixel values. Specifically, the pixel point of the first pixel value of the raster image (the pixel point with the deepest color shown in fig. 4) is a sweeping surface corresponding to the current local path of the automated guided vehicle, and the pixel point of the second pixel value of the raster image (the pixel point with the middle color tone shown in fig. 4) is a sweeping surface corresponding to the incomplete global path of the automated guided vehicle. Further, the pixel values of the pixel points (the lightest-colored pixel points shown in fig. 4) in the area that is not affected by the automated guided vehicle in the raster image can be unified into the third pixel value.

It can be understood that the information related to one automated guided vehicle may be represented by one single-channel gray-scale grid image, and the grid images of a plurality of automated guided vehicles may be combined to obtain a multi-channel grid image, and the combined multi-channel grid image may be input to the reinforcement learning network. In addition, the raster images of a plurality of automated guided vehicles may be combined in the channel direction to obtain a multichannel raster image.

Further, the algorithm for processing the grid image or the feature matrix in the reinforcement learning network to obtain the priorities of the automatic guided vehicles can be a convolutional neural network, the grid image or the feature matrix is processed by the convolutional neural network to output vehicle priority vectors, and the length of the vectors is the same as the number of the automatic guided vehicles; and sequentially dispatching vehicles to pass according to the priority of the automatic guided vehicles. The forward operation of the convolutional neural network is quick, and the problem of the real-time property of a genetic algorithm does not exist.

S204: and sequentially determining whether the local path of the next stage of each automatic guided vehicle can be sent to the corresponding automatic guided vehicle according to the high-low sequence of the priority of the automatic guided vehicles.

As shown in fig. 5, each road resource node is assigned with a mutex lock, when an automated guided vehicle passes through a certain road node, the state of the mutex lock of the node needs to be queried first, and if the mutex lock is in a locked state, the automated guided vehicle cannot obtain the node and needs to wait; if the mutual exclusion lock is in an open state, the automatic guided vehicle acquires the node and locks the mutual exclusion lock to prevent other automatic guided vehicles from entering the node; when the automatic guided vehicle drives away from the node, the mutual exclusion lock is released, so that the automatic guided vehicle can be used by other automatic guided vehicles. Under the condition that the nodes are distributed with mutual exclusion locks, whether the local path of the next stage of the automatic guided vehicle can be sent to the corresponding automatic guided vehicle can be determined by confirming whether the nodes in the local path of the next stage of the automatic guided vehicle are locked by other automatic guided vehicles. Specifically, if the node in the local path of the next stage of the automated guided vehicle is locked by other automated guided vehicles, determining not to send the local path of the next stage to the automated guided vehicle; and if the nodes in the local path of the next stage of the automatic guided vehicle are not locked by other automatic guided vehicles, the local path of the next stage is sent to the automatic guided vehicle, and the sent local path of the next stage is locked.

Because the global optimal path is often very long, if the global optimal path is uniformly locked and protected, the overall efficiency of the system is greatly limited, but under the condition that the global path is divided into at least two local paths, the scheduling device sends the local path of the next stage to the automatic guided vehicle instead of sending the global path to the automatic guided vehicle, so that the automatic guided vehicle can only lock the received local path and cannot lock all nodes on the global path, the nodes occupied by each automatic guided vehicle are reduced, the use of the road surface is optimized, and the operating efficiency of the automatic guided vehicle is improved.

In one implementation, when the dispatching device sends the local path of the next stage to the automated guided vehicle, the dispatching device locks the node of the local path of the next stage so as to enable the local path of the next stage to be locked.

In another implementation manner, the scheduling device sends the local path of the next stage to the automated guided vehicle, and when the automated guided vehicle receives the local path of the next stage, the automated guided vehicle locks the node of the local path of the next stage, so that the local path of the next stage is locked. Therefore, by means of the mutual exclusion lock and the method for determining the sequence of the sending paths based on the priority, the paths sent to the automatic guided vehicles can be prevented from conflicting, the safety of a dispatching system is guaranteed, and the running efficiency of the automatic guided vehicles can be improved.

In order to further improve the road surface utilization rate, a gradual release strategy can be implemented on the nodes on the path. Furthermore, after the automatic guided vehicle runs through part of the local path, the dispatching device or the automatic guided vehicle can gradually unlock nodes which are passed by the automatic guided vehicle and are not influenced any more, so that the nodes which are passed by and are not influenced any more can be released for other automatic guided vehicles to use, and road surface resources are effectively utilized. Certainly, when the automated guided vehicle locks and unlocks the nodes, the states of locking, unlocking, driving and the like of the automated guided vehicle need to be fed back to the scheduling device, so that the scheduling device can know the states of the road nodes in time, whether a local path at the next stage needs to be sent to the corresponding automated guided vehicle can be determined based on the states of the road nodes, such as locked states or unlocked states, and the scheduling device can effectively schedule the automated guided vehicle.

For example, in fig. 6, the local dynamic path a-B, the automated guided vehicle locks nodes a, B, and B simultaneously when obtaining the path; when the automatic guided vehicle runs to the point a and does not influence the point A any more, the mutual exclusion lock of the point A needs to be released immediately, and the unlocking information is reported to the upper computer.

It is understood that both the local path and the global path may be referred to as a path, that is, the automated guided vehicle scheduling method of the first embodiment may include: inputting the current global path and the next-stage global path of a plurality of automatic guided vehicles into a reinforcement learning network to obtain the priority of the plurality of automatic guided vehicles, and further determining the condition of sending the next-stage global path to the plurality of automatic guided vehicles according to the priority sequence; or inputting the current local paths and the next-stage local paths of the automatic guided vehicles into the reinforcement learning network to obtain the priorities of the automatic guided vehicles, and further determining the condition of sending the next-stage local paths to the automatic guided vehicles according to the priority sequence; or inputting the current global path and the local path of the next stage of the plurality of automatic guided vehicles into the reinforcement learning network to obtain the priority of the plurality of automatic guided vehicles, and further determining the condition of sending the sequence of the local path of the next stage to the plurality of automatic guided vehicles according to the priority sequence; or inputting the current local paths and the next-stage global paths of the automatic guided vehicles into the reinforcement learning network to obtain the priorities of the automatic guided vehicles, and further determining the condition of sending the next-stage global paths to the automatic guided vehicles according to the priority sequence.

It can be understood that the automated guided vehicle scheduling method of the present application may also count the efficiency of scheduling the automated guided vehicle, and optimize the reinforcement learning network based on the counted efficiency, that is, optimize the parameters in the convolutional neural network by taking the traffic efficiency improvement as the objective function.

For example, the task completion amount in unit time is counted, and the reinforcement learning network is optimized according to the task completion amount in unit time. It is understood that the task completion amount per unit time refers to the total amount of tasks completed by all the dispatching trucks per unit time.

Optionally, the reinforcement learning network includes a priority computation network and a value function network.

The input of the priority computing network is the raster images of the automatic guided vehicles, and the output of the priority computing network is the priority of the automatic guided vehicles. Specifically, as shown in fig. 7, the priority calculation network takes raster images of a plurality of automated guided vehicles as input, outputs a one-dimensional vector with the same number as that of an input channel through a plurality of convolution layers, normalizes the value inside the vector to be between [0 and 1] through a sigmoid function, and the size of the value represents the passing priority of the automated guided vehicle corresponding to the channel.

The input of the value function network is the raster images and the priorities of the automatic guided vehicles, and the output of the value function network is the predicted value of the task amount completed in unit time. As shown in fig. 8, the value function network takes a multi-channel gray-scale raster image and a vehicle priority vector as inputs, combines the two inputs in the middle of a plurality of convolution layers, and finally outputs a scalar quantity representing the predicted value of the number of tasks completed in unit time under the current state and strategy to evaluate the quality of the current strategy.

The step of optimizing the reinforcement learning network by the task amount in unit time may include: optimizing parameters of a value function network by taking a square error between a predicted value and a true value of the task amount completed in unit time as a loss; and taking the gradient of the automatic guided vehicle priority in the value function network to the task completion amount in unit time as the gradient of the automatic guided vehicle priority of the priority calculation network so as to optimize the parameters of the priority calculation network.

It is understood that, in the process of calculating the priorities of a plurality of automated guided vehicles using the reinforcement learning network, the value function network is not enabled, and the priorities of the plurality of automated guided vehicles are calculated using only the priority calculation network. When the reinforcement learning network needs to be optimized, the value function network is started to optimize the priority computing network.

The automatic guided vehicle scheduling method is generally realized by an automatic guided vehicle scheduling device, so the application also provides the automatic guided vehicle scheduling device. Referring to fig. 9, fig. 9 is a schematic structural diagram of an embodiment of an automatic guided vehicle dispatching device according to the present application. The automatic guided vehicle dispatching device 10 comprises a processor 12 and a memory 11; memory 11 is used to store program instructions for implementing automated guided vehicle scheduling methods as described above, and processor 12 is used to execute the program instructions stored in memory 11.

Further, as shown in fig. 10, the present application also discloses an automated guided vehicle dispatching system 20, wherein the dispatching system 20 comprises a dispatching device 21 and an AGV vehicle group 22. The AGV car cluster 22 includes at least one automated guided vehicle.

The scheduling device 21 is configured to assign each newly added task to an automated guided vehicle according to a current working set state after receiving a requirement of the newly added task, perform path planning based on each newly added task to obtain a global path of each newly added task, divide the global path into local paths, add the local paths of the new tasks into the working set, and then determine whether the global path or the local paths need to be sent to the corresponding automated guided vehicles.

The automated guided vehicle acquires the global path or the local path from the scheduling device 21 and runs along the current global path or the local path under the control of the motion controller.

Furthermore, under the condition that the path nodes are configured with the mutual exclusion lock, the automatic guided vehicles can also uniformly lock the nodes on the received global path or local path so as to avoid collision with other automatic guided vehicles and ensure the safety. And the automated guided vehicles can also send the relevant information of the locked nodes to the scheduling device 21, so that the scheduling device 21 can know which nodes are locked when determining whether the global path or the local path of the next stage of each automated guided vehicle needs to be sent to the corresponding automated guided vehicle, and further the global path or the local path with the locked nodes is not sent to the corresponding automated guided vehicle, so that the same node can be prevented from being distributed to a plurality of automated guided vehicles, and the safety of the automated guided vehicles is ensured. In addition, when the automatic guided vehicles drive through some nodes, the nodes which pass through the automatic guided vehicles and are not influenced by the automatic guided vehicles can be unlocked, so that the nodes can be used by other automatic guided vehicles, and the utilization efficiency of road surface resources is improved. Likewise, the automated guided vehicle may also send information about the unlock node to the scheduling device 21. The scheduling device 21 may store the acquired information related to the unlocking node, the locking node and the driving state of the automated guided vehicle in the working set.

Of course, in another implementation, the nodes may also be locked and unlocked by the scheduler 21. Of course, the dispatching device 21 needs to know the position of the automated guided vehicle, so that the dispatching device 21 can know which nodes the automated guided vehicle has driven through, and can unlock the corresponding nodes.

The logic process of the automated guided vehicle scheduling method is presented as a computer program, which can be stored in a computer storage medium if it is sold or used as a stand-alone software product, and thus a readable storage medium is proposed in the present application. Referring to fig. 11, fig. 11 is a schematic structural diagram of an embodiment of a readable storage medium 30 of the present application, in which a computer program 31 is stored, and when the computer program 31 is executed by a processor, the steps in the automated guided vehicle scheduling method are implemented.

The readable storage medium 30 may be a medium that can store a computer program, such as a usb disk, a removable hard disk, a Read-only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, or may also be a server that stores the computer program, and the server can send the stored computer program to another device for running or can run the stored computer program by itself. The readable storage medium 30 may be a combination of a plurality of entities from a physical point of view, for example, a plurality of servers, a server plus a memory, or a memory plus a removable hard disk.

The above embodiments are merely examples and are not intended to limit the scope of the present disclosure, and all modifications, equivalents, and flow charts using the contents of the specification and drawings of the present disclosure or those directly or indirectly applied to other related technical fields are intended to be included in the scope of the present disclosure.

Claims

1. An automated guided vehicle scheduling method, the method comprising:

2. The automated guided vehicle scheduling method according to claim 1, wherein the inputting of the current path and the path of the next stage of the plurality of automated guided vehicles into the reinforcement learning network previously comprises: determining a global path for each automated guided vehicle based on the task for each automated guided vehicle; dividing the global path of each automatic guided vehicle into at least one local path;

the inputting of the current path and the path of the next stage of the plurality of automated guided vehicles into the reinforcement learning network comprises: inputting the current local paths and the local paths of the next stage of the automatic guided vehicles into a reinforcement learning network;

the sequentially determining whether to send the path of the next stage of the automated guided vehicle to the automated guided vehicle includes: and sequentially confirming whether the local path of the next stage of the automatic guided vehicle is sent to the automatic guided vehicle.

3. The automated guided vehicle scheduling method according to claim 2, wherein the taking of the current local path of the plurality of automated guided vehicles as an input of the reinforcement learning network comprises:

4. The automated guided vehicle scheduling method according to claim 3, wherein the taking of the current local path of the plurality of automated guided vehicles as an input of the reinforcement learning network comprises:

obtaining a grid map of each automated guided vehicle based on the size of each automated guided vehicle, the current local path, the incomplete global path and the size of the load of each automated guided vehicle, wherein pixel points of a first pixel value of the grid map of the automated guided vehicle are regions corresponding to the current local path of the automated guided vehicle, and pixel points of a second pixel value of the grid map are regions corresponding to the incomplete global path of the automated guided vehicle; taking the grid graphs of a plurality of automatic guided vehicles as the input of the reinforcement learning network; or the like, or, alternatively,

5. The automated guided vehicle scheduling method of claim 4, further comprising:

counting the efficiency of dispatching the automatic guided vehicle;

optimizing the reinforcement learning network with the efficiency.

6. The automated guided vehicle scheduling method according to claim 5, wherein the reinforcement learning network comprises a priority computing network and a value function network, wherein the input of the priority computing network is a grid map of a plurality of automated guided vehicles, the output of the priority computing network is the priorities of the plurality of automated guided vehicles, the input of the value function network is the grid map and the priorities of the plurality of automated guided vehicles, and the output of the value function network is a predicted value of the task volume completed in unit time;

the efficiency of the statistical dispatch automated guided vehicle includes: counting the real value of the task amount completed in unit time;

the optimizing the reinforcement learning network with the efficiency includes: optimizing parameters of a value function network by taking a square error between a predicted value and a true value of the task amount completed in unit time as a loss;

7. The automated guided vehicle scheduling method according to claim 1, wherein the sequentially determining whether to send the local path of the next stage of the automated guided vehicle to the automated guided vehicle comprises:

8. The automated guided vehicle scheduling method of claim 7, wherein said causing a node of a local path of a next phase of transmission to be locked comprises:

9. The automated guided vehicle scheduling method of claim 7, further comprising:

10. The automated guided vehicle scheduling method of claim 1, wherein the dividing the global path of each automated guided vehicle into at least one local path comprises:

11. The automated guided vehicle scheduling method of claim 1, wherein determining the global path for each automated guided vehicle based on the task for each automated guided vehicle comprises: and determining a global path with the optimal cost for each automatic guided vehicle by using an A-x algorithm based on the starting point and the end point of the task of each automatic guided vehicle.

12. An automated guided vehicle scheduling apparatus, comprising a memory and a processor; the memory has stored therein a computer program for execution by the processor to carry out the steps of the method according to any one of claims 1 to 11.

13. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 11.