CN115660386A

CN115660386A - Robot scheduling reward method and device, electronic equipment and storage medium

Info

Publication number: CN115660386A
Application number: CN202211595942.XA
Authority: CN
Inventors: 龚汉越; 支涛
Original assignee: Beijing Yunji Technology Co Ltd
Current assignee: Beijing Yunji Technology Co Ltd
Priority date: 2022-12-13
Filing date: 2022-12-13
Publication date: 2023-01-31

Abstract

The application provides a robot scheduling reward method, a robot scheduling reward device, an electronic device and a storage medium. The method comprises the following steps:

- acquiring service requests within a preset time interval, generating a scheduling task according to the service requests, and determining map information;
- performing a node operation on the map information according to the scheduling task to obtain main nodes corresponding to the scheduling task, and dividing the regions between the main nodes to obtain sub-nodes;
- calculating scheduling benefit reward values between adjacent sub-nodes and between adjacent main nodes according to a preset scheduling reward calculation mode;
- generating a scheduling travel path for the robot according to the robot's position information, its state information and the main nodes;
- determining a total scheduling benefit reward value for each scheduling travel path from the scheduling benefit reward values between adjacent sub-nodes and adjacent main nodes.

The application improves the scheduling capability of the robot scheduling system, the scheduling efficiency of the robots, and the robots' level of intelligent service.

Description

Robot scheduling reward method and device, electronic equipment and storage medium

Technical Field

The present application relates to the field of robot technologies, and in particular, to a robot scheduling reward method and apparatus, an electronic device, and a storage medium.

Background

As digitization and intelligent technologies take deeper hold in many fields, intelligent devices are widely used in hotels, shopping malls, office buildings and similar venues. Intelligent robots are among the most frequently used of these devices. In a hotel they can provide delivery, greeting and guided-tour services to guests. Using hotel robots for item delivery reduces the workload of hotel staff and raises the hotel's level of intelligent service.

Consider commercial robots that provide services in hotels, office buildings and similar places. During scheduling, such a robot usually relies on vertical transportation such as elevators. While executing a scheduling task it may also pass through gates on site. Scheduling a commercial robot is therefore more than moving it in a straight line between two points. As scheduling requests and the number of robots grow, an optimal scheduling scheme must be found for the robots within a given spatial area. Existing robot scheduling systems cannot schedule robots by means of rewards. As a result they cannot obtain a globally optimal scheduling scheme, which limits the scheduling capability of the robot scheduling system.

Disclosure of Invention

In view of this, embodiments of the present application provide a robot scheduling reward method, a robot scheduling reward device, an electronic device and a storage medium. They address a prior-art problem: existing robot scheduling systems cannot schedule robots by means of rewards, so they cannot obtain a globally optimal scheduling scheme and their scheduling capability is limited.

In a first aspect of the embodiments of the present application, a robot scheduling reward method is provided. The method includes the following steps:

- acquiring service requests within a preset time interval, generating a scheduling task according to the service requests, and determining map information of the scene corresponding to the scheduling task;
- performing a node operation on the map information according to the scheduling task to obtain main nodes corresponding to the scheduling task, and dividing the regions between the main nodes to obtain sub-nodes;
- calculating scheduling benefit reward values between adjacent sub-nodes and between adjacent main nodes according to a preset scheduling reward calculation mode;
- in response to a scheduling trigger request, acquiring the position information and state information of the robot at the current moment, and generating a scheduling travel path for the robot according to its position information, its state information and the main nodes;
- determining a total scheduling benefit reward value for each scheduling travel path according to the scheduling benefit reward values between adjacent sub-nodes and between adjacent main nodes.

In a second aspect of the embodiments of the present application, a robot scheduling reward device is provided. The device includes the following modules:

- an acquisition module, configured to acquire service requests within a preset time interval, generate a scheduling task according to the service requests, and determine map information of the scene corresponding to the scheduling task;
- a dividing module, configured to perform a node operation on the map information according to the scheduling task to obtain main nodes corresponding to the scheduling task, and to divide the regions between the main nodes to obtain sub-nodes;
- a calculation module, configured to calculate scheduling benefit reward values between adjacent sub-nodes and between adjacent main nodes according to a preset scheduling reward calculation mode;
- a generating module, configured to respond to a scheduling trigger request by acquiring the position information and state information of the robot at the current moment, and to generate a scheduling travel path for the robot according to its position information, its state information and the main nodes;
- a determining module, configured to determine a total scheduling benefit reward value for each scheduling travel path according to the scheduling benefit reward values between adjacent sub-nodes and between adjacent main nodes.

In a third aspect of the embodiments of the present application, an electronic device is provided. It includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, it implements the steps of the above method.

In a fourth aspect of the embodiments of the present application, a computer-readable storage medium is provided. It stores a computer program that implements the steps of the above method when executed by a processor.

The embodiments of the present application adopt at least one technical solution that achieves the following beneficial effects.

The solution proceeds as follows:

- service requests within a preset time interval are acquired, a scheduling task is generated from them, and map information of the scene corresponding to the scheduling task is determined;
- a node operation is performed on the map information according to the scheduling task to obtain main nodes, and the regions between the main nodes are divided into sub-nodes;
- scheduling benefit reward values between adjacent sub-nodes and between adjacent main nodes are calculated according to a preset scheduling reward calculation mode;
- in response to a scheduling trigger request, the robot's current position and state information are acquired, and scheduling travel paths are generated from that information and the main nodes;
- the total scheduling benefit reward value of each scheduling travel path is determined from the reward values between adjacent sub-nodes and adjacent main nodes.

In this way, an optimal scheduling travel path can be determined for each robot executing a scheduling task, using the scheduling benefit reward calculation. This improves the scheduling capability of the robot scheduling system, the scheduling efficiency of the robots, and their level of intelligent service.

Drawings

To explain the technical solutions in the embodiments of the present application more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present application. A person skilled in the art could derive other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of a robot scheduling reward method according to an embodiment of the present application;

Fig. 2 is a schematic structural diagram of a robot scheduling reward device according to an embodiment of the present application;

Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.

The embodiments of the application are described in detail using a hotel application scenario as the example, specifically the task scheduling of robots in a hotel. The embodiments are not limited to hotels, nor to remote task scheduling of hotel robots. The solution applies to any other scenario in which robots are scheduled remotely, for example robot scheduling in logistics or in office buildings.

The technical solution of the present application will be described in detail with reference to specific examples.

Fig. 1 is a flowchart of a robot scheduling reward method according to an embodiment of the present application. The method of Fig. 1 may be performed by a robot scheduling system. As shown in Fig. 1, the robot scheduling reward method may specifically include:

S101: acquiring service requests within a preset time interval, generating a scheduling task according to the service requests, and determining map information of the scene corresponding to the scheduling task;

S102: performing a node operation on the map information according to the scheduling task to obtain main nodes corresponding to the scheduling task, and dividing the regions between the main nodes to obtain sub-nodes;

S103: calculating scheduling benefit reward values between adjacent sub-nodes and between adjacent main nodes according to a preset scheduling reward calculation mode;

S104: in response to a scheduling trigger request, acquiring the position information and state information of the robot at the current moment, and generating a scheduling travel path for the robot according to its position information, its state information and the main nodes;

S105: determining a total scheduling benefit reward value for each scheduling travel path according to the scheduling benefit reward values between adjacent sub-nodes and between adjacent main nodes.

Specifically, in the embodiments of the present application the scheduling benefit reward value is a reward value calculated by the robot scheduling system. It is based on two inputs:

- the robot's walking path between main nodes and intermediate nodes in a virtual scene map;
- the robot's passing speed between adjacent nodes on that path during the same historical time period.

The higher the scheduling benefit reward value, the better the result the robot achieves when it executes the scheduling task along that walking path. The present application uses these values to measure how good or bad the robots' walking paths are. The optimal scheduling scheme is obtained when the sum of the scheduling benefit reward values of the walking paths of all robots executing scheduling tasks is largest.

In some embodiments, the step of acquiring service requests within a preset time interval, generating a scheduling task and determining map information includes the following. In each time interval, the service requests sent by clients are received and a scheduling task is generated from them. The map information of the scene corresponding to the scheduling task is then retrieved from a pre-stored map. The service requests include delivery requests. Each delivery request contains article information, payment information, time information and destination position information.

Specifically, the aim is to improve robot scheduling efficiency and reduce ineffective robot travel. To this end, service requests are collected over a preset time interval before scheduling tasks are executed, so that the scheduling tasks for a period of time are generated together. In practice the interval may be set to, for example, 5, 10 or 15 minutes. With a 5-minute interval, all service requests received in each 5-minute period are collected, and the scheduling task for that period is generated from them. The scheduling tasks of the present application are therefore time-ordered.
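The interval batching described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The `DeliveryRequest` fields mirror the delivery-request contents named in the text. The field types, the seconds-based timestamps and the name `batch_by_interval` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DeliveryRequest:
    # Hypothetical record holding article, payment, time and destination info.
    article: str
    payment: float
    timestamp_s: float  # request time in seconds (assumed clock)
    destination: str


def batch_by_interval(requests, interval_s=300.0):
    """Group requests into consecutive windows of interval_s seconds
    (300 s matches the 5-minute example) and return the non-empty
    windows in time order, each sorted by request time."""
    windows = defaultdict(list)
    for req in requests:
        windows[int(req.timestamp_s // interval_s)].append(req)
    return [
        sorted(windows[k], key=lambda r: r.timestamp_s) for k in sorted(windows)
    ]
```

Each returned batch would then become the time-ordered scheduling task for one interval.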

Furthermore, the map information of the scene corresponding to the scheduling task is retrieved from a database according to the scheduling task. The scheduling system needs this map information to nodize the map using both the map information and the task information of the scheduling task. That is, it generates a plurality of nodes in the map and determines the robot's walking path from those nodes.

In some embodiments, the step of performing a nodization operation on the map information to obtain main nodes and dividing them into sub-nodes includes the following. A nodization operation is performed on the map information according to the start position information and the destination position information in the scheduling task, so that the map is nodized into a plurality of main nodes. The region between each pair of adjacent main nodes is then divided by a preset distance or time, giving the sub-nodes between them. The main nodes include a start node, destination nodes and intermediate nodes. The intermediate nodes include elevator nodes and gate nodes.

Specifically, the scheduling tasks contain the start position information and destination position information of each task. In practice different scheduling tasks may share one start position, in which case there is only one item of start position information. A start node is generated in the map from the start position information. Different destination nodes are generated from the destination position information of each scheduling task. For example, suppose that in a delivery scenario robot 1 must execute scheduling tasks to points A, B and C. Points A, B and C are then destination nodes. Destination nodes can be any positions the robot must reach, such as a room doorway or a lobby.

Furthermore, there are several intermediate nodes between the start node and the destination nodes. Intermediate nodes may be fixed nodes in the map. For example, gates and elevators located between the start node and a destination node serve as intermediate nodes. At least one start node, multiple destination nodes and multiple intermediate nodes are thus generated in the map.

Further, to calculate the robot's passing reward more accurately, the region between main nodes is divided into a plurality of sub-nodes. In practice the division may be by distance or by time, for example one sub-node every 1 meter or one sub-node every 1 second. The sub-node is thus the basic unit at which the robot earns passing reward: the robot earns a passing reward at each sub-node it passes.
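One possible data layout for the nodization and distance-based sub-node division is sketched below in Python. It assumes a planar map with metric coordinates and straight-line travel between main nodes. `MainNode`, `divide_into_sub_nodes` and `step_m` are hypothetical names. Only the distance variant (one sub-node per meter) is shown. The time-based variant is not shown.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MainNode:
    name: str
    kind: str  # "start", "destination", "elevator" or "gate"
    x: float   # planar map coordinates in metres (assumed)
    y: float


def divide_into_sub_nodes(a, b, step_m=1.0):
    """Place sub-nodes on the straight segment from main node a to main
    node b, spaced evenly and no more than step_m metres apart.
    The list excludes a and ends exactly at b."""
    length = math.hypot(b.x - a.x, b.y - a.y)
    count = max(1, math.ceil(length / step_m))
    return [
        (a.x + (b.x - a.x) * i / count, a.y + (b.y - a.y) * i / count)
        for i in range(1, count + 1)
    ]


start = MainNode("S", "start", 0.0, 0.0)
lift = MainNode("E1", "elevator", 3.0, 4.0)
subs = divide_into_sub_nodes(start, lift)  # 5 m apart, so 5 sub-nodes
```

Rounding the count up keeps every spacing at or below `step_m`. Each sub-node therefore marks at most one meter of travel.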

In some embodiments, the step of calculating the scheduling benefit reward values according to the preset scheduling reward calculation mode includes the following. The robot's historical environment average speed and historical transit time are acquired for adjacent main nodes and adjacent sub-nodes in the same historical time period. The scheduling benefit reward values for those adjacent main nodes and sub-nodes are then calculated from these two quantities. The historical environment average speed indicates how congested the regions between the adjacent main nodes and adjacent sub-nodes were for the robot in that historical time period.

Specifically, the calculation of the scheduling reward (i.e., the passing reward) can be pictured as a robot eating candies along its walking path. A candy is placed on each sub-node of the path. When the robot passes a sub-node, that sub-node's candy is added to the robot's scheduling reward.

Further, the scheduling reward value is calculated from the robot's environment average speed and transit time in the historical time period that matches the time period of the current scheduling task. The environment average speed and transit time of each node region in that historical period therefore give a concrete picture of the robot's passing reward. In other words, the size of the candy (the scheduling reward value) is directly proportional to the congestion along the robot's path. The lower the environment average speed of a node region, the larger the candy (the larger the scheduling reward value). In practice, the passing reward (candy size) for the resources the robot contends for on its way to the destination is directly proportional to the degree of congestion. Congestion can arise at junctions where the scene switches between small areas. It can also arise at nodes whose measured travel speed is lower than the scene's average speed. For example, the intermediate node corresponding to an elevator is set as a large candy, and the intermediate node corresponding to a gate is set as a medium candy.

In some embodiments, the scheduling benefit reward value for adjacent main nodes and adjacent sub-nodes is calculated using the following formula:

$$J = v^{-1} \cdot t$$

where J represents the scheduling benefit reward value, v represents the historical environment average speed, and t represents the historical transit time.

Further, the minimum unit of the scheduling reward value in the present application may be the trace (J). One trace is the reward the scheduling system grants for 1 s of normal passage through a passable area whose environment average speed is 1 m/s:

$$1\,\mathrm{J} = (1)^{-1} \times 1$$

An area whose passing speed is below the 1 m/s reference average speed is regarded as congested. An area whose passing speed is above it is regarded as unobstructed. J is inversely proportional to v. The environment average speed is the passing speed of the area, including pedestrians.

In one specific example, a 10 m passage area has an average speed of 0.5 m/s, and the robot passes through it in 12.5 s at 0.4 m/s. The scheduling benefit reward of the scheduling system is then (0.5)^-1 × 10 = 20 J. The efficiency of the single-robot system is 0.4/0.5 = 80%.
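The reward rule and the worked example can be checked with a short Python sketch. It assumes the reward takes the form J = t / v, with v the historical environment average speed and t the historical transit time. This assumption is consistent with the unit definition (1 m/s for 1 s gives 1 J) and with the stated inverse proportionality between J and v. The function names are hypothetical. The second call passes the factor 10 exactly as the example's arithmetic (0.5)^-1 × 10 does.

```python
def scheduling_reward(env_avg_speed, transit_time):
    """Scheduling benefit reward in traces (J), assuming J = t / v:
    inversely proportional to the environment average speed v."""
    if env_avg_speed <= 0:
        raise ValueError("environment average speed must be positive")
    return transit_time / env_avg_speed


def single_robot_efficiency(robot_speed, env_avg_speed):
    """Ratio of the robot's own speed to the environment average speed."""
    return robot_speed / env_avg_speed


print(scheduling_reward(1.0, 1.0))        # reference unit: 1.0
print(scheduling_reward(0.5, 10))         # worked example: 20.0
print(single_robot_efficiency(0.4, 0.5))  # 0.8, i.e. 80%
```

Halving the environment speed doubles the reward. This matches the "larger candy in congested areas" picture.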

Note that the scheduling system may be unable to capture the environment average speed accurately over a short period. In that case the historical average speed of single robots can be used as the reference for the scheduling benefit reward. When the present application calculates the scheduling benefit reward value, the environment average speed it uses is the historical average speed of robots passing through the region of the corresponding nodes in the same historical time period. This historical average speed can be calculated from the distance between the nodes and the robot's transit time. For example, when robot 1 travels from point A to point D, the distance from A to D is known. The historical environment average speed between each pair of nodes can then be calculated from the distance and the transit time.
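Estimating the historical environment average speed of one segment from its known length and recorded transit times might look like this in Python. This is a sketch only. `historical_avg_speed` and the sample records are hypothetical. The text does not say how multiple passes are combined, so averaging the per-pass speeds is an assumption.

```python
from statistics import mean


def historical_avg_speed(distance_m, transit_times_s):
    """Historical environment average speed over one node-to-node segment:
    the segment length divided by each recorded transit time from the same
    historical time period, averaged over all valid records."""
    speeds = [distance_m / t for t in transit_times_s if t > 0]
    if not speeds:
        raise ValueError("no valid transit records for this segment")
    return mean(speeds)


# Hypothetical records for a 10 m segment in one time slot.
v_hist = historical_avg_speed(10.0, [20.0, 25.0])  # speeds 0.5 and 0.4 m/s
```

An alternative is to divide total distance by total time. That gives the harmonic mean, which weights slow passes more heavily.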

In some embodiments, the step of responding to the scheduling trigger request and generating the scheduling travel path includes the following. A scheduling trigger request is generated at the end of each time interval. In response to it, the robot's current position in the map and its current state information are acquired. Scheduling travel paths along which the robot travels from the start node to the different destination nodes are then generated from the robot's position and state information. The state information includes a working state and an article loading state.

Specifically, at the end of each time interval the scheduling system simulates all scheduling travel paths of the robot in the map. The simulation uses the robot's position and state information and the task information of the scheduling tasks acquired in that interval. The result is every travel path from the robot's start node to its destination nodes. The scheduling benefit reward values between nodes in the same historical time period have already been calculated as described above. The total scheduling benefit reward value of each scheduling travel path can therefore be computed from them. This total measures how good a scheduling travel path is. The scheduling scheme (i.e., the combination of the scheduling travel paths of all robots) with the highest total scheduling benefit reward value is the optimal scheme.
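Simulating "all travel paths from the start node to the destination node" can be done with a depth-first search over the main-node graph. This Python sketch assumes the main nodes and their connections are available as an adjacency list, which is a hypothetical representation. It omits the filtering by robot state (working state, article loading state) that the text also mentions.

```python
def enumerate_paths(graph, start, goal):
    """Depth-first enumeration of every simple path from start to goal in
    an adjacency-list graph of main nodes (no node is visited twice)."""
    paths = []
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            paths.append(path)
            continue
        for nxt in graph.get(node, ()):
            if nxt not in path:
                stack.append((nxt, path + [nxt]))
    return paths


# Hypothetical hotel graph: start S, elevators E1/E2, gate G1, room A.
hotel = {"S": ["E1", "E2"], "E1": ["G1"], "E2": ["G1"], "G1": ["A"]}
candidates = enumerate_paths(hotel, "S", "A")
```

The number of simple paths grows exponentially with graph size. Exhaustive enumeration is therefore only reasonable for small main-node graphs such as a single building.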

In some embodiments, determining the total scheduling benefit reward value of each scheduling travel path from the reward values between adjacent sub-nodes and adjacent main nodes works as follows. For each region the path passes through, the scheduling benefit reward values of the adjacent sub-nodes in that region are added in order. This gives the scheduling benefit reward value between each pair of consecutively passed main nodes. The values between all such main nodes are then added to obtain the total scheduling benefit reward value of the scheduling travel path.
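The two-level summation (sub-node rewards into a main-node segment, then segments into a path total) can be sketched in Python as below. The `sub_rewards` mapping keyed by ordered main-node pairs is a hypothetical data layout. The reward numbers are illustrative only.

```python
def path_total_reward(path, sub_rewards):
    """Total scheduling benefit reward of a path of main nodes.

    sub_rewards maps an ordered main-node pair (a, b) to the rewards of the
    consecutive sub-node steps between a and b, in travel order."""
    total = 0.0
    for a, b in zip(path, path[1:]):
        between_main_nodes = sum(sub_rewards[(a, b)])  # add sub-node rewards
        total += between_main_nodes
    return total


rewards = {("S", "E1"): [1.0, 1.0, 2.0], ("E1", "A"): [0.5, 0.5]}
print(path_total_reward(["S", "E1", "A"], rewards))  # 5.0
```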

Specifically, once all possible scheduling travel paths have been simulated, the scheduling benefit reward value between every pair of nodes is known. The total scheduling benefit reward value of each scheduling travel path (i.e., each scheduling scheme) can therefore be calculated. The scheduling travel paths are ranked by their totals. The combination of scheduling travel paths with the largest total is selected as the final scheduling scheme.
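Selecting "the combination of scheduling travel paths with the largest total" can be sketched as an exhaustive search over one path per robot. This is one possible technique, not necessarily the patent's. The `is_feasible` hook is a hypothetical addition. Without a joint constraint between robots, the search simply picks each robot's individually best path.

```python
from itertools import product


def best_schedule(candidates, is_feasible=lambda combo: True):
    """Pick one path per robot so that the summed total reward is largest.

    candidates maps robot id -> list of (path, total_reward) pairs.
    is_feasible is a hypothetical hook for joint constraints between robots;
    without it the search reduces to each robot's individual best path."""
    robots = sorted(candidates)
    best, best_value = None, float("-inf")
    for combo in product(*(candidates[r] for r in robots)):
        if not is_feasible(combo):
            continue
        value = sum(reward for _, reward in combo)
        if value > best_value:
            best = {r: path for r, (path, _) in zip(robots, combo)}
            best_value = value
    return best, best_value
```

For example, a constraint such as "at most one robot uses elevator E1 in this interval" can be passed as `is_feasible`. The search then trades one robot's reward against another's. Exhaustive search scales poorly with fleet size.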

In some embodiments, the quantification of "traces" is independent of the economic return of the transport capacity itself. A "trace" quantifies part of the cost of transport capacity. It is only part of the cost because it ignores robot hardware depreciation, site rental and maintenance, and counts only the opportunity cost of time. As with a single capacity demand, costs differ between peak and off-peak hours, because the average traffic speed in each area differs by time of day. Because "traces" measure a cost, their quantification can also guide the pricing of revenue items. For example, the time opportunity cost of a delivery that requires two elevator transfers is greater than that of a delivery requiring only one. Whether a price is fixed or adjusted can then be decided according to the quantified value. Likewise, the opportunity cost of capacity time during peak hours may be greater than during off-peak hours.

The quantification of a "trace" is also independent of how efficiently a single task is executed. Once the start point, destination and time of a capacity demand are confirmed, its "traces" can be determined. For a capacity system, realizing "traces" is a process of collecting traces J, much like eating beans along the way. The number of "traces" realized per unit time, or within a designated time, reflects the system's capacity-scheduling capability. For example, the number of "traces" realized during the 2-hour noon peak reflects the system's scheduling capability during peak hours.
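Counting realized traces within a window, as in the noon-peak example, could be sketched as follows in Python. The event log format (one `(timestamp_s, traces)` pair per sub-node passage), the function name and the 12:00 to 14:00 window bounds are all hypothetical.

```python
def traces_in_window(events, start_s, end_s):
    """Traces J realized by the fleet in the window [start_s, end_s).

    events holds (timestamp_s, traces) pairs, one per sub-node passage,
    i.e. one 'bean eaten along the way'."""
    return sum(j for ts, j in events if start_s <= ts < end_s)


NOON, TWO_HOURS = 12 * 3600, 2 * 3600  # hypothetical noon-peak window
log = [(43_000, 1.0), (43_500, 2.0), (47_000, 0.5), (50_400, 3.0)]
peak_traces = traces_in_window(log, NOON, NOON + TWO_HOURS)
```

Dividing `peak_traces` by the window length would give a per-hour rate for comparing peak and off-peak scheduling capability.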

The technical solution provided by the embodiments of the present application proceeds as follows:

- service requests within a preset time interval are acquired, a scheduling task is generated from them, and map information of the scene corresponding to the scheduling task is determined;
- a node operation is performed on the map information according to the scheduling task to obtain main nodes, and the regions between the main nodes are divided into sub-nodes;
- scheduling benefit reward values between adjacent sub-nodes and between adjacent main nodes are calculated according to a preset scheduling reward calculation mode;
- in response to a scheduling trigger request, the robot's current position and state information are acquired, and a scheduling travel path is generated from that information and the main nodes;
- the total scheduling benefit reward value of each scheduling travel path is determined from the reward values between adjacent sub-nodes and adjacent main nodes.

The optimal scheduling travel path can thus be determined for each robot executing a scheduling task, using the scheduling benefit reward calculation. This improves the scheduling capability of the robot scheduling system, the scheduling efficiency of the robots, and their level of intelligent service.

The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.

Fig. 2 is a schematic structural diagram of a robot scheduling reward device according to an embodiment of the present application. As shown in Fig. 2, the robot scheduling reward device includes:

the obtaining module 201, configured to obtain service requests within a preset time interval, generate a scheduling task according to the service requests, and determine map information of the scene corresponding to the scheduling task;

the dividing module 202, configured to perform a nodization operation on the map information according to the scheduling task to obtain the main nodes corresponding to the scheduling task, and to divide the regions between the main nodes to obtain sub-nodes;

the calculation module 203, configured to calculate scheduling benefit reward values between adjacent sub-nodes and between adjacent main nodes according to a preset scheduling reward calculation mode;

the generating module 204, configured to respond to a scheduling trigger request by acquiring the position information and state information of the robot at the current moment, and to generate a scheduling travel path for the robot according to its position information, its state information and the main nodes;

the determining module 205, configured to determine the total scheduling benefit reward value of each scheduling travel path according to the scheduling benefit reward values between adjacent sub-nodes and between adjacent main nodes.

In some embodiments, the obtaining module 201 in Fig. 2 receives the service requests sent by clients in each time interval and generates a scheduling task from them. It then retrieves the map information of the scene corresponding to the scheduling task from a pre-stored map. The service requests include delivery requests. Each delivery request contains article information, payment information, time information and destination position information.

In some embodiments, the dividing module 202 in Fig. 2 performs a nodization operation on the map information according to the start position information and the destination position information in the scheduling task, so that the map is nodized into a plurality of main nodes. It then divides the region between each pair of adjacent main nodes by a preset distance or time to obtain the sub-nodes between them. The main nodes include a start node, destination nodes and intermediate nodes. The intermediate nodes include elevator nodes and gate nodes.

In some embodiments, the calculation module 203 of Fig. 2 obtains the robot's historical environment average speed and historical transit time between adjacent main nodes and between adjacent sub-nodes in the same historical time period. From these it calculates the scheduling benefit reward values for the adjacent main nodes and adjacent sub-nodes. The historical environment average speed indicates how congested the regions between those adjacent nodes were for the robot in that historical time period.

In some embodiments, the calculation module 203 in fig. 2 calculates the scheduling benefit reward values for adjacent main nodes and adjacent sub-nodes using the following formula:

[Formula image not available in this text]

wherein the symbols in the formula represent, respectively, the scheduling benefit reward value, the historical environment average speed, and the historical transit time.

In some embodiments, the generation module 204 in fig. 2 generates a scheduling trigger request at the end of each time interval and, in response to the scheduling trigger request, acquires the position information of the robot in the map and the state information of the robot at the current moment; it then generates, according to the position information and state information, scheduling driving paths by which the robot travels from the start node to different target nodes; the state information comprises a working state and an article loading state.
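The text does not tie path generation to a particular algorithm. One possible sketch, assuming the main nodes form an adjacency graph and the robot's map position has already been resolved to a main node, enumerates every cycle-free path from that node to each target node by depth-first search. `RobotState`, `simple_paths`, `scheduling_paths`, and the rule that a robot still in the working state receives no new paths are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RobotState:
    node: str        # main node resolved from the robot's current map position (assumed)
    working: bool    # working state: True while executing another task
    loaded: bool     # article loading state


def simple_paths(graph, start, goal):
    """Depth-first enumeration of all cycle-free paths from start to goal."""
    stack = [(start, [start])]
    paths = []
    while stack:
        node, path = stack.pop()
        if node == goal:
            paths.append(path)
            continue
        for nxt in graph.get(node, ()):
            if nxt not in path:              # skip nodes already on this path (no cycles)
                stack.append((nxt, path + [nxt]))
    return paths


def scheduling_paths(graph, robot, targets):
    """Candidate paths from the robot's node to each target node."""
    # Hypothetical rule: a robot that is still working receives no new paths.
    if robot.working:
        return {}
    return {target: simple_paths(graph, robot.node, target) for target in targets}
```

Enumerating every simple path is only practical for small main-node graphs such as a single building; a larger graph would call for a bounded search.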

In some embodiments, the determining module 205 in fig. 2 follows the areas that each scheduling driving path passes through: it adds, in order, the scheduling benefit reward values between the adjacent sub-nodes in each area to obtain the scheduling benefit reward value between each pair of main nodes passed in sequence, and then adds the scheduling benefit reward values between all of these main nodes to obtain the total scheduling benefit reward value of that scheduling driving path.
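The two-level summation can be written directly. The sketch assumes each main-node segment of a path is given as its chain of nodes (first main node, sub-nodes in travel order, second main node) and that the rewards between adjacent sub-nodes have already been computed; any reward numbers used with it are placeholders, and `segment_reward` and `total_path_reward` are hypothetical names.

```python
def segment_reward(chain, sub_rewards):
    """Sum the rewards of consecutive node pairs along one main-node segment."""
    return sum(sub_rewards[(a, b)] for a, b in zip(chain, chain[1:]))


def total_path_reward(segments, sub_rewards):
    """Add the per-segment rewards between consecutive main nodes to get the path total.

    segments: one node chain per pair of consecutive main nodes, each running from the
    first main node through its sub-nodes to the second main node.
    """
    return sum(segment_reward(chain, sub_rewards) for chain in segments)
```

Keeping the per-segment value as an intermediate result matches the text, which first obtains the reward between each pair of main nodes and only then adds those values together.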

It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.

Fig. 3 is a schematic structural diagram of an electronic device 3 provided in an embodiment of the present application. As shown in fig. 3, the electronic device 3 of this embodiment includes a processor 301, a memory 302, and a computer program 303 stored in the memory 302 and executable on the processor 301. When executing the computer program 303, the processor 301 implements the steps of the method embodiments described above or, alternatively, the functions of the modules/units of the apparatus embodiments described above.

Illustratively, the computer program 303 may be partitioned into one or more modules/units, which are stored in the memory 302 and executed by the processor 301 to accomplish the present application. One or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 303 in the electronic device 3.

The electronic device 3 may be a desktop computer, a notebook computer, a palmtop computer, a cloud server or another electronic device. The electronic device 3 may include, but is not limited to, the processor 301 and the memory 302. Those skilled in the art will appreciate that fig. 3 is merely an example of the electronic device 3 and does not limit it; the electronic device 3 may include more or fewer components than shown, combine certain components, or use different components; for example, it may also include input/output devices, network access devices, buses, etc.

The processor 301 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.

The memory 302 may be an internal storage unit of the electronic device 3, for example, a hard disk or internal memory of the electronic device 3. The memory 302 may also be an external storage device of the electronic device 3, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card provided on the electronic device 3. Further, the memory 302 may include both an internal storage unit and an external storage device of the electronic device 3. The memory 302 is used to store the computer program and other programs and data required by the electronic device, and may also be used to temporarily store data that has been output or is to be output.

Those skilled in the art will clearly understand that, for convenience and brevity of description, only the above division into functional units and modules is used as an example; in practical applications, the above functions may be allocated to different functional units and modules as needed, that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules serve only to distinguish them from one another and do not limit the protection scope of the present application. For the specific working processes of the units and modules in the system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.

In the above embodiments, each embodiment is described with its own emphasis; for parts not detailed or illustrated in one embodiment, reference may be made to the related descriptions of other embodiments.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.

In the embodiments provided in the present application, it should be understood that the disclosed apparatus/computer device and method may be implemented in other ways. For example, the apparatus/computer device embodiments described above are merely illustrative: the division into modules or units is only a division by logical function, and other divisions may be used in actual implementation; multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in another form.

Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated modules/units, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods in the foregoing embodiments may be implemented by a computer program instructing related hardware; the computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the foregoing method embodiments. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, etc. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately expanded or restricted according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.

The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims

1. A method for robot scheduling rewards, comprising:

acquiring a service request in a preset time interval, generating a scheduling task according to the service request, and determining map information of a scene corresponding to the scheduling task;

according to the scheduling task, performing a nodification operation on the map information to obtain main nodes corresponding to the scheduling task, and dividing the main nodes to obtain sub-nodes;

calculating a scheduling benefit reward value between adjacent child nodes and adjacent main nodes according to a preset scheduling reward calculation mode;

responding to a scheduling trigger request, acquiring position information and state information of the robot at the current moment, and generating a scheduling driving path corresponding to the robot according to the position information and state information of the robot and the master node;

and determining a total scheduling benefit reward value corresponding to each scheduling driving path according to the scheduling benefit reward values between the adjacent child nodes and the adjacent main nodes.

2. The method according to claim 1, wherein the obtaining a service request within a preset time interval, generating a scheduling task according to the service request, and determining map information of a scene corresponding to the scheduling task comprises:

receiving a service request sent by a user side in each time interval, generating a scheduling task according to the service request, and acquiring map information of a scene corresponding to the scheduling task from a pre-stored map according to the scheduling task;

the service request comprises a delivery request, and the delivery request comprises article information, payment information, time information and destination position information.

3. The method according to claim 2, wherein the performing a nodularization operation on the map information according to the scheduling task to obtain a main node corresponding to the scheduling task, and dividing the main node to obtain sub-nodes comprises:

performing a nodification operation on the map information according to the start position information and the destination position information in the scheduling task so as to convert the map into a plurality of main nodes, and dividing the span between adjacent main nodes according to a preset distance or time to obtain sub-nodes between the adjacent main nodes;

wherein the master node comprises an origin node, a destination node and an intermediate node, and the intermediate node comprises an elevator node and a gate node.

4. The method according to claim 1, wherein calculating the scheduling benefit reward values between adjacent child nodes and adjacent main nodes according to a preset scheduling reward calculation mode comprises:

acquiring the historical environment average speed and historical transit time of the robot between the adjacent main nodes and between the adjacent sub-nodes in the same historical time period, and calculating the scheduling benefit reward values corresponding to the adjacent main nodes and adjacent sub-nodes based on the historical environment average speed and the historical transit time;

wherein the historical environment average speed represents the degree of congestion of the robot in the areas of the adjacent main nodes and the adjacent sub-nodes in the same historical time period.

5. The method according to claim 4, wherein the scheduling benefit reward values for the adjacent main nodes and the adjacent sub-nodes are calculated using the following formula:

[Formula image not available in this text]

wherein the symbols in the formula represent, respectively, the scheduling benefit reward value, the historical environment average speed, and the historical transit time.

6. The method according to claim 1, wherein acquiring the position information and state information of the robot at the current moment in response to the scheduling trigger request, and generating the scheduling driving path corresponding to the robot according to the position information and state information of the robot and the main nodes, comprises:

generating a scheduling trigger request at the end of each time interval, and in response to the scheduling trigger request, acquiring the position information of the robot in the map and the state information of the robot at the current moment;

generating a scheduling driving path for enabling the robot to reach different target nodes from an initial node according to the position information and the state information of the robot;

wherein, the state information comprises a working state and an article loading state.

7. The method according to claim 1, wherein determining the total scheduling benefit reward value for each scheduling driving path according to the scheduling benefit reward values between the adjacent child nodes and the adjacent main nodes comprises:

according to the areas that each scheduling driving path passes through, adding in order the scheduling benefit reward values between the adjacent sub-nodes in those areas to obtain the scheduling benefit reward value between each pair of main nodes passed in sequence, and adding the scheduling benefit reward values between all of these main nodes to obtain the total scheduling benefit reward value corresponding to each scheduling driving path.

8. A robotic dispatch reward device, comprising:

an acquisition module configured to acquire a service request within a preset time interval, generate a scheduling task according to the service request, and determine map information of a scene corresponding to the scheduling task;

the dividing module is configured to execute node operation on the map information according to the scheduling task to obtain a main node corresponding to the scheduling task, and divide the main node to obtain sub-nodes;

the calculation module is configured to calculate a scheduling benefit reward value between adjacent child nodes and adjacent main nodes according to a preset scheduling reward calculation mode;

the generating module is configured to respond to a scheduling trigger request, acquire position information and state information of a robot at the current moment, and generate a scheduling driving path corresponding to the robot according to the position information and the state information of the robot and the master node;

and the determining module is configured to determine a total scheduling benefit reward value corresponding to each scheduling driving path according to the scheduling benefit reward values between the adjacent child nodes and the adjacent main nodes.

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method according to any one of claims 1 to 7 when executing the program.

10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.