CN115660386A - Robot scheduling reward method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN115660386A
Authority
CN
China
Prior art keywords
scheduling
robot
nodes
node
reward
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211595942.XA
Other languages
Chinese (zh)
Inventor
龚汉越
支涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yunji Technology Co Ltd
Original Assignee
Beijing Yunji Technology Co Ltd
Application filed by Beijing Yunji Technology Co Ltd
Priority to CN202211595942.XA
Publication of CN115660386A
Legal status: Pending

Landscapes

  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

The application provides a robot scheduling reward method and device, an electronic device and a storage medium. The method includes: acquiring service requests within a preset time interval, generating a scheduling task from the service requests, and determining map information of the scene corresponding to the scheduling task; performing a nodization operation on the map information according to the scheduling task to obtain main nodes corresponding to the scheduling task, and dividing the intervals between main nodes to obtain sub-nodes; calculating scheduling benefit reward values between adjacent sub-nodes and between adjacent main nodes according to a preset scheduling reward calculation mode; generating a scheduling travel path for the robot according to the robot's position information, state information and the main nodes; and determining a total scheduling benefit reward value for each scheduling travel path from the scheduling benefit reward values between adjacent sub-nodes and adjacent main nodes. The application improves the scheduling capability of the robot scheduling system, the scheduling efficiency of the robot, and the robot's level of intelligent service.

Description

Robot scheduling reward method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of robot technologies, and in particular, to a robot scheduling reward method and apparatus, an electronic device, and a storage medium.
Background
As digitization and intelligent technologies mature across industries, intelligent devices are increasingly deployed in hotels, shopping malls, office buildings and similar venues. Intelligent robots are among the most frequently used of these devices and can provide hotel guests with delivery, greeting and guided-tour services. Using hotel robots for item delivery reduces the workload of hotel staff and raises the hotel's level of intelligent service.
In scheduling scenarios for commercial robots, such as robots serving hotels or office buildings, a robot usually relies on vertical transport such as elevators, and may also pass through gates on site while executing a scheduling task. Scheduling a commercial robot is therefore not simply a matter of moving it between two points, and as scheduling demand and the number of robots grow, an optimal scheduling scheme must be found for the robots within a given area. Existing robot scheduling systems cannot schedule robots on a reward basis, so they cannot obtain a globally optimal scheduling scheme, which limits the scheduling capability of the robot scheduling system.
Disclosure of Invention
In view of this, embodiments of the present application provide a robot scheduling reward method and device, an electronic device and a storage medium, to solve the problem in the prior art that existing robot scheduling systems cannot schedule robots on a reward basis and therefore cannot obtain a globally optimal scheduling scheme, which limits the scheduling capability of the robot scheduling system.
In a first aspect of the embodiments of the present application, a robot scheduling reward method is provided, including: acquiring service requests within a preset time interval, generating a scheduling task from the service requests, and determining map information of the scene corresponding to the scheduling task; performing a nodization operation on the map information according to the scheduling task to obtain main nodes corresponding to the scheduling task, and dividing the main nodes to obtain sub-nodes; calculating scheduling benefit reward values between adjacent sub-nodes and between adjacent main nodes according to a preset scheduling reward calculation mode; in response to a scheduling trigger request, acquiring the position information and state information of the robot at the current moment, and generating a scheduling travel path for the robot according to the robot's position information, state information and the main nodes; and determining a total scheduling benefit reward value for each scheduling travel path from the scheduling benefit reward values between adjacent sub-nodes and adjacent main nodes.
In a second aspect of the embodiments of the present application, a robot scheduling reward device is provided, including: an obtaining module configured to acquire service requests within a preset time interval, generate a scheduling task from the service requests, and determine map information of the scene corresponding to the scheduling task; a dividing module configured to perform a nodization operation on the map information according to the scheduling task to obtain main nodes corresponding to the scheduling task, and divide the main nodes to obtain sub-nodes; a calculation module configured to calculate scheduling benefit reward values between adjacent sub-nodes and between adjacent main nodes according to a preset scheduling reward calculation mode; a generating module configured to, in response to a scheduling trigger request, acquire the position information and state information of the robot at the current moment and generate a scheduling travel path for the robot according to the robot's position information, state information and the main nodes; and a determining module configured to determine a total scheduling benefit reward value for each scheduling travel path from the scheduling benefit reward values between adjacent sub-nodes and adjacent main nodes.
In a third aspect of the embodiments of the present application, an electronic device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the steps of the above method are implemented.
In a fourth aspect of the embodiments of the present application, a computer-readable storage medium is provided, storing a computer program that implements the steps of the above method when executed by a processor.
At least one of the technical solutions adopted in the embodiments of the present application achieves the following beneficial effects:
service requests are acquired within a preset time interval, a scheduling task is generated from them, and map information of the scene corresponding to the scheduling task is determined; the map information is nodized according to the scheduling task to obtain main nodes, which are divided to obtain sub-nodes; scheduling benefit reward values between adjacent sub-nodes and between adjacent main nodes are calculated according to a preset scheduling reward calculation mode; in response to a scheduling trigger request, the position information and state information of the robot at the current moment are acquired, and a scheduling travel path is generated for the robot from its position information, state information and the main nodes; and a total scheduling benefit reward value is determined for each scheduling travel path from the scheduling benefit reward values between adjacent sub-nodes and adjacent main nodes. By calculating scheduling benefit rewards, the application can determine the optimal scheduling travel path for a robot executing a scheduling task, which improves the scheduling capability of the robot scheduling system, the scheduling efficiency of the robot, and the robot's level of intelligent service.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application; a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a robot scheduling reward method according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a robot scheduling reward device according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
The embodiments of the present application are described in detail using task scheduling of robots in a hotel as an example. It should be understood, however, that the embodiments are not limited to hotel scenarios or to remote task scheduling of hotel robots; the scheme applies to any scenario in which robots are scheduled remotely, such as robot scheduling in logistics or office-building scenarios.
The technical solution of the present application will be described in detail with reference to specific examples.
Fig. 1 is a flowchart of a robot scheduling reward method according to an embodiment of the present application. The robot scheduling reward method of Fig. 1 may be performed by a robot scheduling system. As shown in Fig. 1, the robot scheduling reward method may specifically include:
S101: acquire service requests within a preset time interval, generate a scheduling task from the service requests, and determine map information of the scene corresponding to the scheduling task;
S102: according to the scheduling task, perform a nodization operation on the map information to obtain main nodes corresponding to the scheduling task, and divide the main nodes to obtain sub-nodes;
S103: calculate scheduling benefit reward values between adjacent sub-nodes and between adjacent main nodes according to a preset scheduling reward calculation mode;
S104: in response to a scheduling trigger request, acquire the position information and state information of the robot at the current moment, and generate a scheduling travel path for the robot according to the robot's position information, state information and the main nodes;
S105: determine a total scheduling benefit reward value for each scheduling travel path from the scheduling benefit reward values between adjacent sub-nodes and adjacent main nodes.
Specifically, the scheduling benefit reward value in the embodiments of the present application is a reward value calculated by the robot scheduling system from the robot's walking path between main nodes and intermediate nodes in a virtual scene map and from the robot's passing speed between adjacent nodes on that path in the same historical time period. The higher the scheduling benefit reward value, the better the result achieved when the robot executes the scheduling task along that walking path. The application uses scheduling benefit reward values to measure the quality of the robots' walking paths, and the optimal scheduling scheme is obtained when the sum of the scheduling benefit reward values of the walking paths of all robots executing scheduling tasks is maximized.
In some embodiments, acquiring service requests within a preset time interval, generating a scheduling task from them, and determining map information of the scene corresponding to the scheduling task includes: receiving, in each time interval, service requests sent from user terminals, generating a scheduling task from the service requests, and acquiring map information of the corresponding scene from a pre-stored map according to the scheduling task. The service requests include delivery requests, each containing item information, payment information, time information and destination position information.
Specifically, to improve scheduling efficiency and reduce the robot's wasted travel, service requests are collected at a preset time interval before the scheduling task is executed, so that each scheduling task covers a period of time. In practice the interval may be set to 5, 10 or 15 minutes, for example; with a 5-minute interval, all service requests in each 5-minute window are collected and the scheduling task for that window is generated from them. The scheduling tasks of the present application are therefore time-ordered.
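The interval batching described above can be sketched in Python as a minimal illustration, not the applicant's implementation. The `DeliveryRequest` and `SchedulingTask` classes, their field names, and the reduction of payment information to a single amount are hypothetical; the 300 s default simply mirrors the 5-minute example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DeliveryRequest:
    # Hypothetical fields mirroring the four kinds of information named in the text.
    item: str
    payment: float       # payment information, reduced to an amount for illustration
    timestamp_s: float   # time information, in seconds
    destination: str     # destination position, e.g. a room doorway


@dataclass
class SchedulingTask:
    window_start_s: float
    requests: list = field(default_factory=list)


def build_scheduling_tasks(requests, interval_s=300.0):
    """Group requests into fixed windows (300 s = 5 min) and emit one task per window."""
    windows = defaultdict(list)
    for req in requests:
        windows[int(req.timestamp_s // interval_s)].append(req)
    tasks = []
    for w in sorted(windows):  # tasks come out in time order
        ordered = sorted(windows[w], key=lambda r: r.timestamp_s)
        tasks.append(SchedulingTask(window_start_s=w * interval_s, requests=ordered))
    return tasks


reqs = [
    DeliveryRequest("towel", 0.0, 10.0, "room 1203"),
    DeliveryRequest("water", 0.0, 200.0, "room 808"),
    DeliveryRequest("charger", 0.0, 310.0, "lobby"),
]
tasks = build_scheduling_tasks(reqs)
print([(t.window_start_s, len(t.requests)) for t in tasks])  # [(0.0, 2), (300.0, 1)]
```

Empty windows produce no task, so a quiet interval does not trigger any scheduling work.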
Further, the map information of the scene corresponding to the scheduling task is retrieved from a database according to the scheduling task. The map information is needed so that the scheduling system can nodize the map using the map information and the task information of the scheduling task, that is, generate a number of nodes in the map from which the robot's walking path is determined.
In some embodiments, performing a nodization operation on the map information according to the scheduling task to obtain main nodes corresponding to the scheduling task, and dividing the main nodes to obtain sub-nodes, includes: nodizing the map information according to the start position information and destination position information in the scheduling task so that the map is divided into a number of main nodes, and dividing the interval between adjacent main nodes by a preset distance or time to obtain sub-nodes between them. The main nodes include a start node, destination nodes and intermediate nodes, and the intermediate nodes include elevator nodes and gate nodes.
Specifically, the scheduling task contains the start position information and destination position information of each task; in practice, different scheduling tasks may share one start position, in which case only one piece of start position information is included. A start node is generated in the map from the start position information, and a destination node is generated from the destination position information of each scheduling task. For example, in a delivery scenario where robot 1 must execute scheduling tasks to points A, B and C, points A, B and C are destination nodes; destination nodes are positions the robot needs to reach, such as a room doorway or a lobby.
Further, several intermediate nodes lie between the start node and the destination nodes. Intermediate nodes may be fixed nodes in the map; for example, gates and elevators located between the start node and the destination nodes serve as intermediate nodes. In this way, at least one start node, multiple destination nodes and multiple intermediate nodes are generated in the map.
Further, to calculate the robot's passing reward more accurately, the area between main nodes is divided into several sub-nodes. In practice, sub-nodes may be placed by distance or by time, for example one sub-node every 1 meter or every 1 second. Sub-nodes are thus the basic unit at which the robot earns passing rewards: the robot receives a passing reward at each sub-node it passes.
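A minimal sketch of the main-node types and distance-based sub-node division follows. It assumes, for illustration only, that main nodes carry planar (x, y) coordinates in metres on one floor and that sub-nodes lie on the straight segment between two adjacent main nodes; a real map would route along corridors. `NodeKind`, `MainNode` and `divide_into_sub_nodes` are hypothetical names.

```python
import math
from dataclasses import dataclass
from enum import Enum


class NodeKind(Enum):
    START = "start"
    DESTINATION = "destination"
    ELEVATOR = "elevator"  # intermediate node
    GATE = "gate"          # intermediate node


@dataclass(frozen=True)
class MainNode:
    name: str
    kind: NodeKind
    x: float  # metres (assumed planar coordinates)
    y: float


def divide_into_sub_nodes(a: MainNode, b: MainNode, step_m: float = 1.0):
    """Place a sub-node every step_m metres on the segment a->b, excluding both main nodes."""
    length = math.hypot(b.x - a.x, b.y - a.y)
    points = []
    for i in range(1, int(length // step_m) + 1):
        d = i * step_m
        if d >= length:  # do not duplicate main node b
            break
        ratio = d / length
        points.append((a.x + ratio * (b.x - a.x), a.y + ratio * (b.y - a.y)))
    return points


start = MainNode("S", NodeKind.START, 0.0, 0.0)
gate = MainNode("G1", NodeKind.GATE, 3.5, 0.0)
print(len(divide_into_sub_nodes(start, gate)))  # 3
```

For the time-based variant (one sub-node per second), `step_m` could be set to the expected speed multiplied by 1 s; that mapping is likewise an assumption.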
In some embodiments, calculating the scheduling benefit reward values between adjacent sub-nodes and between adjacent main nodes according to a preset scheduling reward calculation mode includes: acquiring the historical environment average speed and historical transit time of robots between adjacent main nodes and between adjacent sub-nodes in the same historical time period, and calculating the corresponding scheduling benefit reward values from the historical environment average speed and historical transit time. The historical environment average speed characterizes how congested the areas between adjacent main nodes and adjacent sub-nodes were for robots in the same historical time period.
Specifically, calculating the scheduling reward (that is, the passing reward) can be pictured as the robot eating candies along its walking path: a candy is placed on each sub-node of the path, and when the robot passes a sub-node, that sub-node's candy is added to its scheduling reward.
Further, the scheduling reward value is calculated from the environment average speed and transit time of robots in the historical time period corresponding to the time period of the current scheduling task, so the environment average speed and transit time of a node area in that historical period give a concrete picture of the robot's passing reward. In other words, the size of the candy (the scheduling reward value) is directly proportional to the congestion of the path the robot travels: the lower the environment average speed of a node area, the larger the candy. In practice, the passing reward (candy size) that a robot earns at contended resources on its way to the destination is proportional to the congestion there. Congestion may arise at junctions where the robot switches between small areas of the scene, or at nodes whose measured travel speed is below the scene's average speed; for example, the intermediate node corresponding to an elevator is set as a large candy, and the intermediate node corresponding to a gate as a medium candy.
In some embodiments, the scheduling benefit reward values between adjacent main nodes and between adjacent sub-nodes are calculated using the following formula:
[Formula: image not reproduced]
where the three symbols in the formula denote the scheduling benefit reward value, the historical environment average speed and the historical transit time, respectively.
Further, the minimum unit of the scheduling reward value may be the trace (J): 1 J is the reward the scheduling system grants for 1 s of normal passage through a passable area whose environment average speed is 1 m/s (formula image not reproduced). An area whose passing speed is below the 1 m/s average is congested, and an area whose passing speed is above it is unobstructed; J is inversely proportional to v, the environment average speed, which is the passing speed of the area including pedestrians.
In a specific example, in a 10 m passing area with an environment average speed of 0.5 m/s, the robot passes in 12.5 s at 0.4 m/s. The scheduling benefit reward granted by the scheduling system is then (0.5)^-1 × 10 = 20 J, and the single-robot efficiency is 0.4/0.5 = 80%.
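The arithmetic of this example can be checked with the short sketch below. Because the formula itself survives only as an image, the sketch assumes, as the example's arithmetic (0.5)^-1 × 10 = 20 J suggests, that the reward equals the area length multiplied by the reciprocal of the environment average speed. This assumption also reproduces the unit definition (1 m at 1 m/s gives 1 J) and the stated inverse proportionality between J and v. The function names are hypothetical.

```python
def trace_reward(area_length_m: float, env_avg_speed_mps: float) -> float:
    """Reward in traces (J), ASSUMED to be length * (environment average speed)^-1."""
    if env_avg_speed_mps <= 0:
        raise ValueError("environment average speed must be positive")
    return area_length_m / env_avg_speed_mps


def single_robot_efficiency(robot_speed_mps: float, env_avg_speed_mps: float) -> float:
    """Ratio of the robot's own speed to the environment average speed."""
    return robot_speed_mps / env_avg_speed_mps


print(trace_reward(1.0, 1.0))                      # 1.0, the unit definition of 1 J
print(trace_reward(10.0, 0.5))                     # 20.0
print(f"{single_robot_efficiency(0.4, 0.5):.0%}")  # 80%
```

Halving the environment average speed of an area doubles its reward, which matches the larger candy assigned to congested nodes such as elevators.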
It should be noted that when the scheduling system cannot accurately capture the environment average speed over a short period, the historical average speed of single robots can be used as the reference for the scheduling benefit reward. When calculating the scheduling benefit reward value, the environment average speed used is the historical average speed of robots passing through the area corresponding to the node in the same historical time period; it can be calculated from the distance between nodes and the robot's transit time. For example, when robot 1 travels from point A to point D, the distance from A to D is known, so the historical environment average speed between each pair of nodes can be calculated from the distance and the transit time.
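A minimal sketch of deriving historical environment average speeds from known inter-node distances and recorded transit times follows. The record format and the function name are assumptions. Dividing total distance by total time, rather than averaging per-trip speeds, is an illustrative design choice that the text does not specify; it lets slow traversals count in proportion to their duration.

```python
from collections import defaultdict


def historical_env_avg_speeds(segment_lengths_m, transit_records):
    """
    Estimate the historical environment average speed of each node-to-node segment.

    segment_lengths_m: {(from_node, to_node): known distance in metres}
    transit_records:   iterable of (from_node, to_node, transit_time_s) observed
                       in the same historical time period
    Returns {(from_node, to_node): average speed in m/s}.
    """
    total_dist = defaultdict(float)
    total_time = defaultdict(float)
    for a, b, t in transit_records:
        if t <= 0:
            continue  # skip malformed records
        total_dist[(a, b)] += segment_lengths_m[(a, b)]
        total_time[(a, b)] += t
    return {seg: total_dist[seg] / total_time[seg] for seg in total_time}


lengths = {("A", "B"): 10.0, ("B", "D"): 6.0}
records = [("A", "B", 20.0), ("A", "B", 30.0), ("B", "D", 12.0)]
print(historical_env_avg_speeds(lengths, records))  # {('A', 'B'): 0.4, ('B', 'D'): 0.5}
```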
In some embodiments, acquiring the position information and state information of the robot at the current moment in response to a scheduling trigger request, and generating a scheduling travel path for the robot according to the robot's position information, state information and the main nodes, includes: generating a scheduling trigger request at the end of each time interval; in response to the scheduling trigger request, acquiring the robot's position in the map and its state information at the current moment; and generating, from the robot's position and state information, scheduling travel paths along which the robot travels from the start node to the different destination nodes. The state information includes the working state and the item loading state.
Specifically, at the end of each time interval, the scheduling system simulates all scheduling travel paths of the robot in the map, that is, all travel paths from the robot's start node to the destination nodes, based on the robot's position and state information and the task information of the scheduling task collected in that interval. Because the scheduling benefit reward values between nodes in the same historical time period have already been calculated as described above, the total scheduling benefit reward value of each scheduling travel path can be calculated from them. The total value measures how good a scheduling travel path is: the scheduling scheme (that is, the combination of the scheduling travel paths of all robots) with the highest total scheduling benefit reward value is the optimal scheme.
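The simulation of all travel paths from a robot's node to a destination node can be sketched as a depth-first enumeration of loop-free paths over the main-node graph. The adjacency-dict graph, the `RobotState` fields and the rule that robots currently working receive no candidate paths are assumptions for illustration; the text only says that position and state information are used. The loading state is carried but not used by this hypothetical rule.

```python
from dataclasses import dataclass


@dataclass
class RobotState:
    robot_id: str
    node: str       # current position, snapped to a main node
    working: bool   # working state: True if busy with another task
    loaded: bool    # item loading state (unused by the illustrative rule below)


def all_simple_paths(graph, start, goal):
    """Depth-first enumeration of every loop-free path from start to goal."""
    paths = []
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            paths.append(path)
            continue
        for nxt in graph.get(node, ()):
            if nxt not in path:  # never revisit a node on the same path
                stack.append((nxt, path + [nxt]))
    return paths


def candidate_paths(robots, graph, goal):
    """Hypothetical rule: only robots that are not working get candidate paths."""
    return {r.robot_id: all_simple_paths(graph, r.node, goal)
            for r in robots if not r.working}


graph = {"S": ["G1", "E1"], "G1": ["E1", "E2"], "E1": ["A"], "E2": ["A"]}
robots = [RobotState("r1", "S", working=False, loaded=True),
          RobotState("r2", "S", working=True, loaded=False)]
print(candidate_paths(robots, graph, "A"))
```

Enumerating every simple path grows quickly with map size, so this is only practical on the coarse main-node graph, not on the sub-node level.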
In some embodiments, determining the total scheduling benefit reward value of each scheduling travel path from the scheduling benefit reward values between adjacent sub-nodes and adjacent main nodes includes: for the areas a scheduling travel path passes through, adding up in order the scheduling benefit reward values of adjacent sub-nodes in each area to obtain the scheduling benefit reward value between each pair of consecutively passed main nodes, and then adding up the values between all main nodes to obtain the total scheduling benefit reward value of that scheduling travel path.
Specifically, once all possible scheduling travel paths have been simulated and the scheduling benefit reward value between every pair of nodes is known, the total scheduling benefit reward value of each scheduling travel path (that is, each scheduling scheme) can be calculated. The scheduling travel paths are then ranked by total scheduling benefit reward value, and the combination of scheduling travel paths with the maximum total is selected as the final scheduling scheme.
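The two-level summation and selection can be sketched as follows. The dictionary keyed by (main node, main node) pairs, the function names and the exhaustive search are illustrative assumptions. Because the text states no coupling constraints between robots, the exhaustive search here returns the same result as choosing each robot's best path independently; a real scheduler would add conflict constraints (for example, a shared elevator) that make the joint search meaningful.

```python
from itertools import product


def segment_reward(sub_node_rewards):
    """Reward between two consecutive main nodes: the sub-node rewards added in order."""
    return sum(sub_node_rewards)


def path_total(path, rewards_between):
    """Total reward of a path: sum over consecutive main-node pairs."""
    return sum(rewards_between[(a, b)] for a, b in zip(path, path[1:]))


def best_scheme(candidates, rewards_between):
    """
    candidates: {robot_id: [path, ...]}
    Scores every combination (one path per robot) and returns
    (best_total, {robot_id: path}). Exponential in the number of robots.
    """
    robot_ids = list(candidates)
    best_total, best_combo = float("-inf"), None
    for combo in product(*(candidates[r] for r in robot_ids)):
        total = sum(path_total(p, rewards_between) for p in combo)
        if total > best_total:
            best_total, best_combo = total, dict(zip(robot_ids, combo))
    return best_total, best_combo


rewards_between = {
    ("S", "E1"): segment_reward([1.0, 1.0, 3.0]),  # elevator approach: large candy
    ("E1", "A"): segment_reward([1.0, 2.0]),
    ("S", "G1"): segment_reward([1.0, 1.5]),
    ("G1", "A"): segment_reward([1.0, 1.0]),
}
candidates = {"r1": [["S", "E1", "A"], ["S", "G1", "A"]], "r2": [["S", "G1", "A"]]}
print(best_scheme(candidates, rewards_between))
# (12.5, {'r1': ['S', 'E1', 'A'], 'r2': ['S', 'G1', 'A']})
```

Ranking a single robot's paths is then `sorted(paths, key=lambda p: path_total(p, rewards_between), reverse=True)`.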
In some embodiments, the quantification in traces is independent of the economic return of the capacity itself. A trace quantifies part of the cost of capacity: only part, because it ignores hardware depreciation, site rent and robot maintenance and counts only the opportunity cost of time. As with any single capacity demand, the cost differs between peak and off-peak hours, because the average traffic speed in each area differs by time of day. Because traces measure a cost, they can also guide the pricing of revenue-generating orders. For example, the time opportunity cost of a capacity demand that requires two elevator transfers is greater than that of one requiring a single transfer, and whether a fixed price is appropriate can be decided according to the quantified value; likewise, the opportunity cost of capacity time during peak hours may be greater than during off-peak hours.
The trace quantification is also independent of how efficiently a single task is executed: once the start and destination of a capacity demand and the time point of the demand are confirmed, its traces can be determined. For a capacity system, realizing traces is a process of collecting J along the way, similar to eating beans along a path. The number of traces collected per unit time or within a specified time reflects the system's capacity scheduling capability; for example, the number of traces realized during the 2-hour noon peak reflects the system's scheduling capability at peak time.
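Measuring scheduling capability as the traces realized within a time window, such as the 2-hour noon peak, reduces to a filtered sum. The event-log format (one timestamped entry per collected sub-node reward, in seconds since midnight) and the function names are assumptions for illustration.

```python
def traces_in_window(events, start_s, end_s):
    """Sum the traces (J) collected within [start_s, end_s).

    events: iterable of (timestamp_s, traces_j), one entry per collected sub-node reward.
    """
    return sum(j for ts, j in events if start_s <= ts < end_s)


def traces_per_hour(events, start_s, end_s):
    """Average traces realized per hour over the window."""
    hours = (end_s - start_s) / 3600.0
    if hours <= 0:
        raise ValueError("window must have positive length")
    return traces_in_window(events, start_s, end_s) / hours


# Noon peak 12:00-14:00 expressed as seconds since midnight (hypothetical log).
events = [(43000.0, 5.0), (43300.0, 2.0), (50000.0, 3.0), (50400.0, 4.0)]
print(traces_per_hour(events, 43200.0, 50400.0))  # 2.5
```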
According to the technical solution provided by the embodiments of the present application, service requests are acquired within a preset time interval, a scheduling task is generated from them, and map information of the scene corresponding to the scheduling task is determined; the map information is nodized according to the scheduling task to obtain main nodes, which are divided to obtain sub-nodes; scheduling benefit reward values between adjacent sub-nodes and between adjacent main nodes are calculated according to a preset scheduling reward calculation mode; in response to a scheduling trigger request, the position information and state information of the robot at the current moment are acquired, and a scheduling travel path is generated from them and the main nodes; and a total scheduling benefit reward value is determined for each scheduling travel path from the scheduling benefit reward values between adjacent sub-nodes and adjacent main nodes. By calculating scheduling benefit rewards, the application can determine the optimal scheduling travel path for a robot executing a scheduling task, which improves the scheduling capability of the robot scheduling system, the scheduling efficiency of the robot, and the robot's level of intelligent service.
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Fig. 2 is a schematic structural diagram of a robot scheduling reward device according to an embodiment of the present application. As shown in Fig. 2, the robot scheduling reward device includes:
an obtaining module 201, configured to acquire service requests within a preset time interval, generate a scheduling task from the service requests, and determine map information of the scene corresponding to the scheduling task;
a dividing module 202, configured to perform a nodization operation on the map information according to the scheduling task to obtain main nodes corresponding to the scheduling task, and divide the main nodes to obtain sub-nodes;
a calculation module 203, configured to calculate scheduling benefit reward values between adjacent sub-nodes and between adjacent main nodes according to a preset scheduling reward calculation mode;
a generating module 204, configured to, in response to a scheduling trigger request, acquire the position information and state information of the robot at the current moment, and generate a scheduling travel path for the robot according to the robot's position information, state information and the main nodes;
a determining module 205, configured to determine a total scheduling benefit reward value for each scheduling travel path from the scheduling benefit reward values between adjacent sub-nodes and adjacent main nodes.
In some embodiments, the obtaining module 201 in fig. 2 receives a service request sent by a user end in each time interval, generates a scheduling task according to the service request, and obtains map information of a scene corresponding to the scheduling task from a pre-stored map according to the scheduling task; the service request comprises a delivery request, and the delivery request comprises article information, payment information, time information and destination position information.
In some embodiments, the dividing module 202 in fig. 2 performs a nodalization operation on the map information according to the start position information and destination position information in the scheduling task, converting the map into a plurality of main nodes, and then divides the span between each pair of adjacent main nodes at a preset distance or time interval to obtain the sub-nodes between them; the main nodes include a start node, a destination node, and intermediate nodes, and the intermediate nodes include elevator nodes and gate nodes.
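A minimal sketch of the distance-based sub-node division, assuming that main nodes carry planar coordinates and that the span between two adjacent main nodes is a straight segment (neither assumption comes from the source). `subdivide` and `divide_all` are hypothetical helpers that place sub-nodes evenly so that no piece is longer than the preset distance `step`.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def subdivide(a: Point, b: Point, step: float) -> List[Point]:
    """Sub-node coordinates strictly between main nodes a and b,
    evenly spaced so that no piece is longer than `step`."""
    if step <= 0:
        raise ValueError("step must be positive")
    length = math.dist(a, b)
    pieces = max(1, math.ceil(length / step))
    return [
        (a[0] + (b[0] - a[0]) * i / pieces, a[1] + (b[1] - a[1]) * i / pieces)
        for i in range(1, pieces)
    ]


def divide_all(coords: Dict[str, Point], main_path: Sequence[str], step: float) -> Dict[Tuple[str, str], List[Point]]:
    """Sub-nodes for every pair of adjacent main nodes along main_path."""
    return {
        (a, b): subdivide(coords[a], coords[b], step)
        for a, b in zip(main_path, main_path[1:])
    }
```

Dividing by a preset time instead could, under an additional constant-speed assumption, reuse the same helper with `step` set to speed multiplied by time.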
In some embodiments, the calculation module 203 in fig. 2 obtains the historical environment average speed and historical transit time of robots between adjacent main nodes and between adjacent sub-nodes during the same historical time period, and calculates the scheduling benefit reward values for the adjacent main nodes and adjacent sub-nodes from these two quantities; the historical environment average speed characterizes how congested the area between the adjacent main nodes or sub-nodes was for robots during that historical time period.
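The inputs to the reward calculation can be sketched as follows. The record layout, the use of the hour of day as the "same historical time period", and computing average speed as total distance over total time are all assumptions. Because the patent's formula is available only as an image, `placeholder_reward` is an arbitrary stand-in and NOT the patent's formula; the real formula would be supplied as `reward_fn`.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical record layout: (segment_id, hour_of_day, distance_m, duration_s)
Record = Tuple[str, int, float, float]


def segment_stats(records: Iterable[Record], hour: int) -> Dict[str, Tuple[float, float]]:
    """Per segment: (average speed in m/s, mean transit time in s),
    using only records from the same hour of day."""
    dist: Dict[str, float] = defaultdict(float)
    durations: Dict[str, List[float]] = defaultdict(list)
    for seg, h, d, t in records:
        if h == hour and t > 0:
            dist[seg] += d
            durations[seg].append(t)
    return {
        seg: (dist[seg] / sum(ts), mean(ts))
        for seg, ts in durations.items()
    }


def placeholder_reward(avg_speed: float, transit_time: float) -> float:
    # Arbitrary stand-in (NOT the patent's formula): grows with speed,
    # shrinks with transit time.
    return avg_speed / transit_time


def segment_rewards(
    records: Iterable[Record],
    hour: int,
    reward_fn: Callable[[float, float], float] = placeholder_reward,
) -> Dict[str, float]:
    """Apply reward_fn to the historical statistics of every segment."""
    return {seg: reward_fn(v, t) for seg, (v, t) in segment_stats(records, hour).items()}
```

Keeping the formula as a parameter separates the statistics (which the text describes) from the reward expression (which the text does not reproduce).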
In some embodiments, the calculation module 203 in fig. 2 calculates the scheduling benefit reward values for the adjacent main nodes and adjacent sub-nodes using the following formula:
[Formula image not reproduced in the source text]
wherein the three symbols in the formula (likewise images in the source) respectively represent the scheduling benefit reward value, the historical environment average speed, and the historical transit time.
In some embodiments, the generating module 204 in fig. 2 generates a scheduling trigger request at the end of each time interval and, in response to it, acquires the robot's current position in the map and its state information; it then generates, according to that position and state information, scheduled travel paths along which the robot travels from the start node to different target nodes; the state information includes a working state and an item loading state.
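The path-generation step can be sketched with a standard Dijkstra search over the main-node graph. The source does not say how paths are computed or how the robot's state constrains them; shortest-distance routing and the rule that only idle, unloaded robots receive new paths are hypothetical choices made for illustration, as are the names `RobotState`, `shortest_path`, and `candidate_paths`.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, List, Optional

# Main-node graph: node -> {neighbouring node: travel distance}
Graph = Dict[str, Dict[str, float]]


@dataclass
class RobotState:
    node: str      # main node the robot's current position is snapped to (assumption)
    working: bool  # working state: True while executing a task
    loaded: bool   # item loading state: True while carrying items


def shortest_path(g: Graph, src: str, dst: str) -> Optional[List[str]]:
    """Dijkstra over the main-node graph; returns the node sequence or None."""
    dist = {src: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        if u == dst:
            path = [u]
            while u in prev:
                u = prev[u]
                path.append(u)
            return path[::-1]
        for v, w in g.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return None


def candidate_paths(g: Graph, robot: RobotState, targets: List[str]) -> Dict[str, List[str]]:
    """One shortest path per reachable target node for an eligible robot."""
    # Hypothetical eligibility rule: only idle, unloaded robots get new paths.
    if robot.working or robot.loaded:
        return {}
    paths: Dict[str, List[str]] = {}
    for t in targets:
        p = shortest_path(g, robot.node, t)
        if p is not None:
            paths[t] = p
    return paths
```

Unreachable targets are simply omitted, so each remaining entry is a candidate scheduled travel path whose total reward can then be evaluated.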
In some embodiments, the determining module 205 in fig. 2 follows the areas that each scheduled travel path passes through: within each area it adds, in order, the scheduling benefit reward values of the adjacent sub-nodes traversed, giving the scheduling benefit reward value between each pair of consecutively traversed main nodes; it then adds the values between all such main-node pairs to obtain the total scheduling benefit reward value of that scheduled travel path.
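The summation performed by the determining module 205 can be sketched as below. The data layout (sub-node hop rewards stored per main-node pair, in travel order) and the direction-symmetric lookup are assumptions; the functions add sub-node rewards to get each main-node segment's reward and then add segment rewards along the path.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: rewards of consecutive sub-node hops, in travel order,
# keyed by the pair of main nodes bounding the area they lie in.
SubRewards = Dict[Tuple[str, str], List[float]]


def main_segment_reward(sub: SubRewards, a: str, b: str) -> float:
    """Add, in order, the sub-node hop rewards between main nodes a and b.

    Assumes a segment's reward is the same in both directions, so (b, a)
    is used when (a, b) is not stored.
    """
    hops = sub[(a, b)] if (a, b) in sub else sub[(b, a)]
    return sum(hops)


def path_total_reward(sub: SubRewards, path: Sequence[str]) -> float:
    """Total reward of a path given as its ordered list of main nodes."""
    return sum(main_segment_reward(sub, a, b) for a, b in zip(path, path[1:]))


def totals_per_path(sub: SubRewards, paths: Dict[str, Sequence[str]]) -> Dict[str, float]:
    """Total scheduling benefit reward value for each candidate path."""
    return {key: path_total_reward(sub, p) for key, p in paths.items()}
```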
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic and should not limit the implementation of the embodiments of the present application in any way.
Fig. 3 is a schematic structural diagram of an electronic device 3 according to an embodiment of the present application. As shown in fig. 3, the electronic device 3 of this embodiment includes a processor 301, a memory 302, and a computer program 303 stored in the memory 302 and executable on the processor 301. When the processor 301 executes the computer program 303, it implements the steps in the method embodiments described above; alternatively, it implements the functions of the modules/units in the device embodiments described above.
Illustratively, the computer program 303 may be partitioned into one or more modules/units, which are stored in the memory 302 and executed by the processor 301 to accomplish the present application. One or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 303 in the electronic device 3.
The electronic device 3 may be a desktop computer, a laptop, a handheld computer, a cloud server, or another electronic device. The electronic device 3 may include, but is not limited to, the processor 301 and the memory 302. Those skilled in the art will appreciate that fig. 3 is merely an example of the electronic device 3 and does not limit it; the device may include more or fewer components than shown, combine certain components, or use different components; for example, it may also include input/output devices, network access devices, buses, and the like.
The processor 301 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 302 may be an internal storage unit of the electronic device 3, for example, a hard disk or main memory of the electronic device 3. The memory 302 may also be an external storage device of the electronic device 3, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the electronic device 3. Further, the memory 302 may include both an internal storage unit and an external storage device of the electronic device 3. The memory 302 is used to store the computer program and other programs and data required by the electronic device, and may also be used to temporarily store data that has been or will be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules, so as to perform all or part of the functions described above. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not detailed or described in one embodiment, refer to the related descriptions of the other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/computer device and method may be implemented in other ways. For example, the apparatus/computer device embodiments described above are merely illustrative: the division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation; multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods in the foregoing embodiments may be implemented by a computer program instructing the related hardware; the computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the foregoing method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunication signals, software distribution media, and the like. It should be noted that the content of the computer-readable medium may be appropriately extended or restricted according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, under legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A robot scheduling reward method, comprising:
acquiring service requests within a preset time interval, generating a scheduling task according to the service requests, and determining map information of a scene corresponding to the scheduling task;
performing a nodalization operation on the map information according to the scheduling task to obtain main nodes corresponding to the scheduling task, and dividing between the main nodes to obtain sub-nodes;
calculating scheduling benefit reward values between adjacent sub-nodes and between adjacent main nodes according to a preset scheduling reward calculation method;
in response to a scheduling trigger request, acquiring position information and state information of a robot at the current moment, and generating a scheduled travel path for the robot according to the position information and state information of the robot and the main nodes;
and determining a total scheduling benefit reward value for each scheduled travel path according to the scheduling benefit reward values between adjacent sub-nodes and between adjacent main nodes.
2. The method according to claim 1, wherein the obtaining a service request within a preset time interval, generating a scheduling task according to the service request, and determining map information of a scene corresponding to the scheduling task comprises:
receiving a service request sent by a user side in each time interval, generating a scheduling task according to the service request, and acquiring map information of a scene corresponding to the scheduling task from a pre-stored map according to the scheduling task;
the service request comprises a delivery request, and the delivery request comprises article information, payment information, time information and destination position information.
3. The method according to claim 2, wherein performing a nodalization operation on the map information according to the scheduling task to obtain main nodes corresponding to the scheduling task, and dividing between the main nodes to obtain sub-nodes, comprises:
performing a nodalization operation on the map information according to start position information and destination position information in the scheduling task so as to convert the map into a plurality of main nodes, and dividing the span between adjacent main nodes at a preset distance or time to obtain sub-nodes between the adjacent main nodes;
wherein the main nodes comprise a start node, a destination node, and intermediate nodes, and the intermediate nodes comprise elevator nodes and gate nodes.
4. The method according to claim 1, wherein calculating the scheduling benefit reward values between adjacent sub-nodes and between adjacent main nodes according to a preset scheduling reward calculation method comprises:
acquiring a historical environment average speed and a historical transit time of the robot between adjacent main nodes and between adjacent sub-nodes in the same historical time period, and calculating the scheduling benefit reward values corresponding to the adjacent main nodes and adjacent sub-nodes based on the historical environment average speed and the historical transit time;
wherein the historical environment average speed represents the degree of congestion experienced by the robot in the area between the adjacent main nodes or adjacent sub-nodes during the same historical time period.
5. The method according to claim 4, wherein the scheduling benefit reward value for the adjacent main nodes and adjacent sub-nodes is calculated using the following formula:
[Formula image not reproduced in the source text]
wherein the three symbols in the formula (likewise images in the source) respectively represent the scheduling benefit reward value, the historical environment average speed, and the historical transit time.
6. The method according to claim 1, wherein acquiring position information and state information of the robot at the current moment in response to the scheduling trigger request, and generating a scheduled travel path for the robot according to the position information and state information of the robot and the main nodes, comprises:
generating a scheduling trigger request at the end of each time interval, and, in response to the scheduling trigger request, acquiring the position information of the robot in the map and the state information of the robot at the current moment;
generating, according to the position information and state information of the robot, scheduled travel paths along which the robot travels from a start node to different target nodes;
wherein, the state information comprises a working state and an article loading state.
7. The method according to claim 1, wherein determining the total scheduling benefit reward value for each scheduled travel path according to the scheduling benefit reward values between adjacent sub-nodes and between adjacent main nodes comprises:
according to the areas through which each scheduled travel path passes, adding in sequence the scheduling benefit reward values of the adjacent sub-nodes traversed within those areas to obtain the scheduling benefit reward value between each pair of consecutively traversed main nodes, and adding the scheduling benefit reward values between all such main nodes to obtain the total scheduling benefit reward value for the scheduled travel path.
8. A robot scheduling reward device, comprising:
an acquisition module configured to acquire service requests within a preset time interval, generate a scheduling task according to the service requests, and determine map information of a scene corresponding to the scheduling task;
a dividing module configured to perform a nodalization operation on the map information according to the scheduling task to obtain main nodes corresponding to the scheduling task, and to divide between the main nodes to obtain sub-nodes;
a calculation module configured to calculate scheduling benefit reward values between adjacent sub-nodes and between adjacent main nodes according to a preset scheduling reward calculation method;
a generating module configured to, in response to a scheduling trigger request, acquire position information and state information of a robot at the current moment, and generate a scheduled travel path for the robot according to the position information and state information of the robot and the main nodes;
and a determining module configured to determine a total scheduling benefit reward value for each scheduled travel path according to the scheduling benefit reward values between adjacent sub-nodes and between adjacent main nodes.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method according to any one of claims 1 to 7 when executing the program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211595942.XA CN115660386A (en) 2022-12-13 2022-12-13 Robot scheduling reward method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN115660386A 2023-01-31

Family

ID=85020045


Country Status (1)

Country Link
CN (1) CN115660386A (en)



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230131