WO2021140577A1 - Robot control system - Google Patents

Robot control system

Info

Publication number
WO2021140577A1
WO2021140577A1 (PCT/JP2020/000203)
Authority
WO
WIPO (PCT)
Prior art keywords
robot
destination
time
layer
work
Prior art date
Application number
PCT/JP2020/000203
Other languages
French (fr)
Japanese (ja)
Inventor
Toshiyuki Tarui (樽井 俊行)
Original Assignee
Wellvill Inc. (ウェルヴィル株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wellvill Inc. (ウェルヴィル株式会社)
Priority to PCT/JP2020/000203
Publication of WO2021140577A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling

Definitions

  • The present invention relates to a robot control system.
  • However, the system described in Patent Document 1 does not improve efficiency when a plurality of robots operate.
  • The present invention has been made in view of such a background, and an object of the present invention is to provide a technique capable of effectively controlling a plurality of robots.
  • The main invention of the present invention for solving the above problems is a system for controlling a plurality of robots, comprising: a work storage unit that stores a plurality of tasks to be performed by the robots; an allocation processing unit that assigns each of the tasks to a robot; a transmission unit that transmits the assigned work to the control device of the robot; and a status acquisition unit that acquires the operation status of the robot, wherein the allocation processing unit changes the allocation destination of the work according to the operation status.
  • According to the present invention, efficient reinforcement learning can be performed for robot control.
  • The present invention includes, for example, the following configurations.
  • [Item 1] A system for controlling a plurality of robots, comprising: a work storage unit that stores a plurality of tasks to be performed by the robots; an allocation processing unit that assigns each of the tasks to a robot; a transmission unit that transmits the assigned work to the control device of the robot; and a status acquisition unit that acquires the operating status of the robot, wherein the allocation processing unit changes the allocation destination of the work according to the operating status.
  • A robot control system characterized by the above.
  • The allocation processing unit allocates one task to one or a plurality of robots according to a first work amount required for the task and a second work amount that each robot can perform.
  • A robot control system characterized by the above.
  • The allocation processing unit assigns the tasks to the robots so that the cumulative amount of work assigned to each of the plurality of robots over a predetermined period is smoothed.
  • A robot control system characterized by the above.
  • The status acquisition unit acquires information indicating the operation status from the control device of the robot and from a sensor independent of the robot.
  • The present invention can also have the following configurations.
  • [Item 1] A system that controls robots.
  • [Item 2] The robot control system according to Item 1, wherein the robot comprises one or more sensors, the control unit transmits a control signal related to the operation of the robot to the simulator, the simulator simulates the operation of a virtual robot in response to the control signal, simulates measurement by a virtual sensor, and transmits the measurement information from the virtual sensor to the control unit, and the control unit performs the reinforcement learning according to the measurement information.
  • The robot control system according to Item 1, wherein the control unit comprises a request reception layer that accepts instructions to the robot, a work pooling layer that supplies the instructions as input values for the reinforcement learning, and an AI layer that performs the reinforcement learning.
  • FIG. 1 is a diagram showing an overall image of a system configuration according to the robot control system of the present embodiment.
  • The robot control system of this embodiment is configured in five layers.
  • The first layer makes the external connection.
  • The first layer can receive instructions from the user by, for example, natural language processing.
  • The second layer is the management layer, which manages multiple robots together.
  • The second layer can act as a scheduler for overall optimization.
  • The third layer is the control layer of the robot and controls each robot.
  • The third layer can perform individual optimization for one robot, such as route search.
  • The fourth layer is the execution layer, the layer on which the robot operates.
  • The fourth layer can virtually operate the robot by simulation.
  • The fifth layer is an IoT layer.
  • The fifth layer manages measurement data from the various sensors and the like required by the autonomous robots.
  • FIG. 2 is a diagram showing a system configuration example of the robot control system of the present embodiment.
  • the second layer robot scheduler includes an MDM server, an AP server, a DB server, and an ESB server.
  • the robot session control of the third layer includes an ESB server, a DB server, an AP server, a Cache server, and a robot control AI process.
  • the fourth layer robot simulator includes synchronous control, a communication adapter, map information, an ML agent, and Unity (registered trademark).
  • the real robot operating environment of the fifth layer includes the API of the real robot and the SDK of the communication adapter.
  • FIG. 3 is a diagram illustrating a functional outline of the second layer in the robot control system of the present embodiment.
  • The second layer is configured as an independent domain.
  • Request data can be input as text from the second-layer ESB.
  • A robot NO need not be specified in the request; the robot is automatically assigned and determined in the second layer.
  • The content of the request is expanded (queued).
  • The content of the request is to go from the waiting area to the start position and then return to the waiting area via a plurality of destinations. While executing this request, the second layer receives an arrival result report from the third layer each time the robot passes the start position or a destination.
  • For each destination, the second layer issues an instruction to go to the next destination (or back to the waiting area). The robot can be released when it finally arrives at the waiting area.
  • FIG. 4 is a diagram illustrating work queue management.
  • Transport robots are used in warehouses, factories, or on the premises of various buildings. In this embodiment, it is assumed that 100 or more robots act automatically.
  • The premise is that a transfer instruction from an external business system triggers the robot's action. Assuming a business scene, the requirement is that luggage exists on a shelf or the like, and an accumulated amount of luggage is to be moved from that shelf to an appropriate place elsewhere on the premises.
  • Collected luggage is transported from the place of occurrence (start position) to the destination, and the destination may be a single location or multiple locations. Therefore, the second layer is required to satisfy the following requirements in its role of accurately conveying the contents of the work to the robot and smoothly achieving the purpose.
  • The robot master is automatically generated by specifying the number-of-robots parameter. The number of robots should match the number of robots defined in the fourth layer. The robot NO is generated by combining a fixed name part and a variable numeric part. - Generation of the standby position, start position, and destination masters: the coordinate axes of each work location are taken from the map information of the entire building and are used when calculating Euclidean distances. - Registration of robot type settings (allowable weight, volume): robots have different sizes and are divided into several types.
  • One type of robot has a size of 1 m in length and 0.8 m in width, but it will be possible to handle several types of robots with different sizes.
  • FIG. 5 is a diagram illustrating resource allocation to the robot. As shown in FIG. 5, the relationship between the robot assigned in the plan and the order is transferred when an actual result is generated and a robot that has already been released becomes available. By repeating such allocation changes, differences arise between the planned allocation and the actual allocation. The transfer journal in that case will be described later.
  • (1) Robots that can be assigned in the same time zone are assigned at a future time. (2) If the requested luggage is regarded as one continuous unit, it must not be divided and delivered to the destination separately. There is no concept of "unallocated": allocation is always made in a time zone in which it is possible. (3) If, while multiple allocations are being made at robot-allocation time, the same start position or the same destination exists in the work under the most recently allocated order and the transit times are close (within a certain time difference), the destination order must be exchanged. This is calculated from the relationship between the distance and time of the destinations. (4) When robot allocation is performed, it is averaged so that a robot that has just been used is not used continuously, because of charging considerations.
  • The type of robot is selected so that the load can be carried by one robot as far as possible, according to the amount of luggage to be loaded. If a single robot cannot be selected, as in (2), the load can be divided and assigned to a plurality of robots. (For the time being there is only one type, 1 m in length and 0.8 m in width, but allocation will still work even if the number of types increases.)
  • The allocation algorithm selects the robot type with the smallest robot tolerance (weight, volume) that fits the load. If one unit cannot accommodate it, the rank is raised and judged again; see the sketch below.
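  • A minimal Python sketch of this selection rule, assuming hypothetical RobotType records with weight and volume tolerances (the names select_robots, max_weight, and max_volume are illustrative, not from the specification):

```python
import math
from dataclasses import dataclass

# Hypothetical robot-type records; names and tolerances are illustrative.
@dataclass
class RobotType:
    name: str
    max_weight: float  # allowable weight (kg)
    max_volume: float  # allowable volume (m^3)

def select_robots(types, total_weight, total_volume):
    """Pick the robot type with the smallest tolerance that still fits the
    whole load; if none fits, raise the rank and split the load across
    several robots of the largest type."""
    ranked = sorted(types, key=lambda t: (t.max_weight, t.max_volume))
    for t in ranked:
        if total_weight <= t.max_weight and total_volume <= t.max_volume:
            return [t]  # one robot of the smallest sufficient type
    largest = ranked[-1]
    count = max(math.ceil(total_weight / largest.max_weight),
                math.ceil(total_volume / largest.max_volume))
    return [largest] * count
```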
  • When a robot breaks down, it is put into a faulty state from the management screen, and an order is generated to move another standby robot to the failure position by transferring the work. After arriving at the failure position, the order is continued from the middle. (1) The failed robot NO is notified from the session control. (2) After the failed robot's work is transferred to a normal robot, the new robot NO notifies the session control of the destination together with the failed robot NO. (3) Session control moves the new robot up to the coordinate axes of the failed robot.
  • The notification of arrival is returned from the third layer to the second layer. Since the robot control AI recognizes destinations by name, a new function is added for the request to move to a robot NO. (4) During that time, the robot is stopped for a certain period (10 seconds) because there is no next instruction. (5) The second layer then instructs the robot control AI of the original destination. From here, the robot returns to the normal course.
  • Occupancy and release of robots are scheduled by the planned expansion of performance requests.
  • (1) Robot occupancy: a robot is not in an occupied state merely by being assigned. It becomes occupied (active) when the work is actually taken out of the queue and sent to the third layer.
  • (2) Unit of instruction to the third layer: instructions to the third layer are given in units of one section, such as from the standby position to the start position, from the start position to destination 1, and from destination 1 to destination 2.
  • The next instruction is issued at the timing when the robot arrives at a destination, the third layer notifies the second layer, and the second layer sends the next request.
  • Allocation within work is at the planning stage: when a plan request is received, robot allocation is performed at that point. For robots allocated at this stage, it must be calculated and predicted at what point in the future they will be occupied and at what point they will be released. The calculation method is shown below.
  • FIG. 7 is a diagram for explaining the distance from the start point to the arrival point.
  • The distance between the two destinations is C + B, not A. Therefore, the occupancy time is the cumulative total of the travel time according to the number of destinations and the work time at each destination, including the waiting area.
  • N: number of intervals between destinations
  • W: average working time (a master value determined by robot type; the standard is 60 seconds, adjusted by the size ratio)
  • M: speed in m/sec (can be set as a parameter; the default can be 2 m/sec)
  • F: next occupancy interval time in seconds (the interval from the release of a robot to its next occupancy; can be set as a parameter, default 1.2 seconds)
  • h: margin ratio (can be set as a parameter, default 1.2)
  • The occupancy time T is obtained from these parameters; the equation itself appears only as a figure in the original publication.
  • Resource occupancy start time: the occupancy start time is the previous release time of the same robot + F seconds.
  • Resource release time: release time = occupancy start time + occupancy time T; a sketch of an assumed form of T follows below.
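  • Since the exact equation for T is reproduced only as a figure, the sketch below assumes a plausible form consistent with the parameter definitions above: cumulative travel time plus per-destination work time, scaled by the margin h. All function names are illustrative, and the formula is a reconstruction, not a quotation.

```python
def occupancy_time(distances_m, n_intervals, avg_work_s=60.0,
                   speed_mps=2.0, h=1.2):
    """Assumed form of the occupancy time T: cumulative travel time over
    the destination intervals plus the work time at each destination,
    scaled by the margin h."""
    travel_s = sum(d / speed_mps for d in distances_m)
    return h * (travel_s + n_intervals * avg_work_s)

def occupancy_window(prev_release_s, distances_m, n_intervals, f_s=1.2):
    """Occupancy start = previous release of the same robot + F seconds;
    release = start + occupancy time T."""
    start = prev_release_s + f_s
    return start, start + occupancy_time(distances_m, n_intervals)
```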
  • FIG. 8 is a diagram showing an example of a schedule in the robot occupied state and the open state.
  • Considerations: equalization of the robot utilization rate; appropriate allocation of robots based on the development of plans and results; grasp of robot status (occupied, released, queued); the relationship between the amount of work and the number of robots; consideration of charging time; recognition of robot failure status.
  • The premise is that the content of a request from the WMS is first expanded as a plan for picking work and the like, and that transport work arises once results come in. Since the request has already been issued at the planning stage, the allocation is easier to set up in advance.
  • the loadable volume and weight of the robot differ depending on the robot size, but for the robot assigned this time, the loading method is calculated from the total amount of luggage.
  • Within the weight limit, the following are given: 1. the volume (length, width, height) and weight of the individual packages, and the arrangement of their destinations; 2. the total number of packages; 3. the overall volume; 4. the total weight.
  • The result is set in the order. At the same time, it is also registered in the session-control robot object.
  • The order contents of the second layer are saved, but the contents of the robot object are cleared when the robot is released. How the luggage is loaded into each robot, and what is loaded, can be obtained by inquiring of the robot.
  • The inquiry result can be displayed on the Web screen by searching the order contents in the second layer.
  • FIG. 9 is a diagram illustrating the generation of timing for retrieving work from the queue.
  • The timing of removing work from the queue can be controlled at 10-second intervals.
  • The work in the queue is work that has already been expanded. Inside the queue there are two states: one in which a robot cannot yet be assigned, and one in which the assigned work is ready.
  • FIG. 10 is a diagram showing a conceptual model of the robot scheduler.
  • FIG. 11 is a diagram showing a conceptual model of the robot scheduler.
  • The following table explains the account journals used when dealing with failures.
  • The first half is posted under the original order number. If the robot is on its way back, the latter half is not needed. In the latter half, the original order is reused and the order is reassigned to Robot002; thereafter the flow belongs to Robot002. The request number remains the same. After associating the robot with the order, the state is set to active or in transit, and the work continues under the current order NO. Note: as the alternate robot, select a robot that is on standby and has no queue.
  • The robot scheduler of the second layer is provided with a ledger for managing the above accounts, and can manage the queue, the operation of the robots, and the standby status through this ledger.
  • The robot that was operating under the transport order broke down, so an alternative robot is assigned and takes over. The contents of the transport order are copied to a transfer order and a journal is generated. It is assumed that the robot had progressed partway through its destinations and broke down and stopped in the middle of the road. In that case, the journals for the From/To sections already completed are not necessary for the alternative robot; even in the middle of the road, the journal starts from that point. Since this From is the starting point, all the necessary journals are generated from there.
  • The starting point is the position of the failed robot. Therefore, the instruction from the standby position to the start position becomes an instruction from the standby position to the position of the failed robot.
  • FIG. 12 is a diagram illustrating an operation when the robot fails.
  • In failure pattern 1, the robot failed while destination 2 still remained, so the work is transferred to an alternative robot. Robot 2 is therefore moved to the position where Robot 1 failed.
  • In failure pattern 2, although the failure occurred while returning to the standby area, all the work at the destinations had been completed, so no transfer to an alternative robot is performed.
  • FIG. 13 is a diagram illustrating the time-series operation related to a transfer order. As shown in FIG. 13, when an actual result is generated, the plan is changed and the work is transferred to a robot that can be assigned earlier. From the usable times of all robots obtained by the getBalance function, a robot that has no next plan can be selected, and the earliest usable robot among them determined.
  • For example, for Robot1, the completion of order 1 has been delayed, so the work of order 2, planned next, must be transferred to another robot.
  • Robot2 finishes order 3 earlier than planned, but since order 4 is already assigned to Robot2, order 2 of Robot1 cannot be inserted.
  • Robot3 has no order after the end of order 5. Therefore, order 2 can be assigned after the completion of order 5 on Robot3.
  • The following table explains an example of account journal entries between the order status and the robot status.
  • Accounts such as Robot001 are journalized by +1 only for the time zone existing in the queue at allocation time; once occupied, the entry disappears from the queue. After that, processing proceeds only by transfers in the state account. The balance of a robot account being +1 or more can be read as the number of queued items; it becomes zero when the robot's queue runs out. In other words, the balance of the robot account is the number of waiting cases. Everything else is managed by the state account.
  • For a plan request, a robot is allocated in the state at that time, the plan is expanded, and it enters the queue. Plan expansion is performed in the same way as normal processing, but it is not executed; it simply enters the queue. The plan request is left as it is and used only as a provisional display. Provisional means that when a performance request arrives, the plan request is no longer used even if it is displayed.
  • The contents of the order must be able to form an interface to the third layer.
  • The weight and volume of the luggage, the start position, and the multiple destinations are composed as an array.
  • When an order is generated, it is assigned to a robot and a queue entry is generated. As the processing content in that case, an appropriate robot is selected.
  • (1) The size is determined from the robot types based on the weight and volume of the luggage, and a robot suitable for that size must be assigned.
  • The getBalance function narrows the candidates down (condition: balance > zero, and the robot whose last end time is closest to the present).
  • (2) Any of these robots will do, so one is selected; to choose the robot with the least operating time used today in the actual results, a list of waiting robots can be obtained with the getBalance function, aggregated by amount of time used, and a lightly used robot chosen.
  • (3) The one robot whose last end time is closest to the present is selected.
  • (4) The robot is thereby uniquely determined. Finally, the robot and the order are linked to the queue; a minimal sketch of this selection is given below.
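  • A minimal sketch of the narrowing described above, assuming getBalance returns per-robot records with a balance, a last end time, and the operating time used today (all field names are illustrative, not from the specification):

```python
import time

def pick_robot(robots, now=None):
    """robots: a hypothetical getBalance() result, one dict per robot,
    e.g. {"id": "Robot001", "balance": 1, "last_end": 1700000000.0,
    "used_today_s": 1200.0}. Narrow to balance > zero, prefer the robot
    whose last end time is closest to the present, and break ties by the
    least operating time used today (utilization smoothing)."""
    now = time.time() if now is None else now
    candidates = [r for r in robots if r["balance"] > 0]
    if not candidates:
        return None
    return min(candidates,
               key=lambda r: (abs(now - r["last_end"]), r["used_today_s"]))
```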
  • A queue is an order queue; robot allocation has already been completed when the order is generated.
  • Retrieving an order from the queue is the act of issuing an instruction for it to a robot via the third layer. The getBalance function (condition: balance > zero and standby state) is used to search for robots, and instructions are issued to the third layer in order. At that time, the journal entry for the next work place (destination) must not be forgotten, even though it is the first one. The robot account is determined, the order number arranged at the head is taken out from the entry elements, and the order number is confirmed. Once the order is confirmed, an interface from the order details to the third layer can be generated. Finally, the robot is activated, the order is put into the transport state, and the process is complete.
  • A plan request is expanded but is not subject to queue management. In other words, since it will never be taken out, it is left as it is; when the performance request arrives and a new plan expansion for it is performed, the display switches to that side and the plan request is no longer displayed. It is used only for grasping the amount of work.
  • The queue is in a state in which orders hang off each robot. Whenever an order is placed, it is tied to a specific robot (this is called allocation). Robots have already been assigned to all orders in the queue (if there is a shortage of robots, the queue is simply extended). If a robot is currently active, its queue is not retrieved (not applicable). There is no parallel processing. For the waiting state, getBalance() is used to get the order list, and the serial numbers of the orders are in chronological order. Orders are taken out one by one from the order list; at that time, the order number, robot ID, and order elements are expanded on the list. The robot ID is obtained from the entry under the fetched order, and if the robot is in the standby state, the order is taken out of the queue and the instruction processing to the third layer is executed.
  • The retrieval process from the queue is realized by an external batch process, executed at regular intervals; a minimal sketch follows below.
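  • A minimal sketch of such a periodic batch dispatcher, assuming hypothetical queue and robot interfaces (none of these names are defined by the specification):

```python
import sched
import time

def dispatch_cycle(queue, robots, send_to_layer3):
    """One batch pass: for each standby robot with queued orders, take
    the order at the head of its queue and issue it to the third layer.
    `queue`, `robots`, and `send_to_layer3` are assumed interfaces."""
    for robot in robots:
        if robot.state == "standby" and queue.has_orders(robot.id):
            send_to_layer3(robot.id, queue.pop_head(robot.id))
            robot.state = "active"  # occupied once the work is sent out

def run_dispatcher(queue, robots, send_to_layer3, interval_s=10.0):
    """External batch process repeating the cycle at a fixed interval
    (10 s in the embodiment above)."""
    s = sched.scheduler(time.time, time.sleep)
    def tick():
        dispatch_cycle(queue, robots, send_to_layer3)
        s.enter(interval_s, 1, tick)
    s.enter(interval_s, 1, tick)
    s.run()
```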
  • FIG. 14 is a diagram illustrating a queue management function.
  • The waiting state of an order is called a queue.
  • The following five orders can be obtained at one time.
  • The order in which the taken-out orders are transmitted to the third layer is as shown in FIG. In this way, only the head of the queue of each robot in the standby state is acquired in the list.
  • The rest of the queue is not used in the second layer.
  • The next destination is taken out of the order contents and instructed to the third layer.
  • The robot ID is the key to the processing content.
  • The active state of the robot is searched for with the getBalance function, and the latest order number in the active state is captured.
  • The previous destination exists in the entry contents.
  • The next destination and the end judgment can be determined from the order NO.
  • The order number can also be extracted from the request number.
  • The destination following the previous destination is acquired from the contents of the order, and an instruction to move from the current destination to the next destination is issued to the third layer. If the previous destination was the waiting area, the processing of the order is completed with this arrival report.
  • The robot is transferred from occupied to released, and the processing for the robot is completed.
  • This scenario is the failure response process; the processing content differs depending on the location and timing of the stop.
  • If the failure occurred while travelling between transport destinations, it is necessary to transfer to an alternative robot and instruct it to move to the failure point. Since the returned robot ID is out of order, the latest order number in the active state is captured with the getBalance function, as in normal processing. The previous destination also exists in the entry contents. Here, the movement instruction is given to the alternative robot using the position information of the failure point instead of the next destination.
  • Otherwise, the order processing is completed by transferring to the failure state without using an alternative robot.
  • The robot ID of the order contents is set to the robot ID after the transfer, and the contents are copied and generated. For work and movement, only the journal of the previous destination needs to be generated and the next destination determined.
  • This process is a batch process: the queue is periodically monitored and reorganized.
  • Queue reorganization: the first queuing is performed when an order is generated.
  • The processing time of an order (number of destinations and estimated processing time) is at that point a plan; as actual results occur, a gradual deviation arises, and it is conceivable that work is waiting in the queue of one robot while other robots stand idle and time is wasted. By reorganizing this state on a regular basis and applying appropriate compression, a queue with maximum efficiency can be generated.
  • FIG. 16 is a diagram for explaining the overlap of the passing times of the robots.
  • 12-A overlaps; the times of arrival are estimated, and if the error between ROBOT001 and ROBOT002 is within 60 seconds (parameterized), it is determined that they overlap, and the order of 12-A is changed for one of them. If there is only one destination, the order is returned to the queue and delayed.
  • The passing time is not an exact time but a relative relationship between the robots.
  • The overlap of passing times is determined by calculating the relative difference in the straight-line distances from destination to destination.
  • The distance error becomes a time error, so the difference is used for the judgment.
  • The time at a passing point is determined by accumulating the time from position to position. The working time at each destination is taken to be constant and added individually.
  • i and j are relative positional relationships (for example, wait and 15-A can be expressed as 1 and 2), and K is the cumulative total (m).
  • The distance d(i,j) is expressed by an equation that appears only as a figure in the original publication.
  • The distance between two points along the X coordinate axis can be calculated by a further equation, likewise reproduced only as a figure; both are reconstructed below in the standard Euclidean form.
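  • Read as the standard straight-line (Euclidean) distance on the map's coordinate axes, the two relations can be written as follows; this is a reconstruction, since the equations appear only as figures:

```latex
d(i,j) = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2},
\qquad
d_x(i,j) = \lvert x_i - x_j \rvert
```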
  • <Change of destination order> In the third layer, suppose a robot has arrived at a destination, but a preceding robot for the same destination has already arrived, so the robot would have to wait, and another destination remains. The third layer then compares the waiting time with the travel time to the other destination and, if the other destination is not crowded and unloading can start immediately, requests the second layer to change the order of the destinations. The second layer changes the order of the work expansion results, sets the current destination to an unprocessed state, and instructs the third layer of the next destination. The third layer instructs the fourth layer to move from the current position to the new next destination.
  • FIG. 17 is a diagram illustrating a change in the order of destinations.
  • The third layer indicates the name of the destination to the fourth layer.
  • The fourth layer begins to move the robot from its current position to the newly indicated destination.
  • The following table shows the API specification for session control from the second layer to the third layer (2nd layer → 3rd layer: business instruction data structure to the robot control AI).
  • The business instruction data structure from the second layer to the third-layer robot control AI can be expressed as JSON-format data as follows (placeholder values appear as in the original, and the structure is truncated in the published text):

    {
      "robotId": "robot001",
      "requestNo": "REQNO00001",
      "fromDestination": "waitGate",
      "toDestination": "A",
      "destinationOrderNo": 1,
      "destinationOperation": 1,
      "quantity": 10,
      "weight": 50.0,
      "goodsIdList": ["BOOK001", "ORANGE001", "BEEF001"],
      "actionType": 1,
      "plannedWorkTime": "00000000001500000",
      "destinationList": ["A", "G"],
      "luggageList": [
        {
          "LuggageNO": (luggage NO),
          "seqNo": (order),
          "A": {"x": (coordinate), "y": (coordinate), "z": (coordinate)},
          "b": {"x": (coordinate), "y": (coordinate), "z": …
  • The following table explains the structure of the data returned from the third layer to the second layer.
  • <Reception data of the robot scheduler> The format of the request data that the second-layer robot scheduler receives from the outside is defined as follows. 1. It is instruction data for a robot to transport a specific load from a designated place to a designated place. 2. The specified luggage is assumed to consist of boxes; the number of boxes, the total weight, and the total volume are specified. 3. Robots are assigned with one designation, but it may be necessary to divide the load among multiple robots because of weight and volume restrictions. 4. The destination is specified for each box; there are therefore multiple transport destinations, and the boxes are stacked from the bottom in order of distance. Multiple robots may be required, per requirement 3.
  • One robot corresponds to one order, and work is generated for each destination. Work expansion: (1) movement from the standby position to the start position (loading place); (2) loading work (initially by a human) — the completion reports for (1) and (2) come from the third layer to the second layer at the same time; (3) movement to a destination; (4) unloading at the destination — the completion reports for (3) and (4) come from the third layer at the same time, and if there are multiple destinations, (3) and (4) are repeated; (5) return to the waiting area when the luggage is empty — when (4) is complete, the third layer is requested, and when the robot arrives at the waiting area a response returns from the third layer and the robot is released. For (3) and (4), the order is determined by calculation according to distance.
  • Request data format: 1. start position name (example: 1-A) and loading amount (number of boxes, weight, volume); 2. …
  • The order may be undefined in the data (it is determined by the system).
  • The instruction number is a unique ID specified by the requesting party (WMS, etc.).
  • FIG. 18 is a diagram showing an overall image of the system configuration of the robot control system of the present embodiment. In this embodiment, it is assumed that a general-purpose AI application platform for controlling robots operating inside a building is developed.
  • A human 1 asks robot 2 to do work.
  • Robot 2 receives the packages at a designated place, as a bundle of packages for multiple destinations. Luggage collected at one time is delivered to the multiple destinations in the most efficient order.
  • The human 1 removes only the luggage required at each destination, and when the work at that destination is completed, robot 2 moves to the next location.
  • As for the method of instruction, the human 1 gives the instruction by conversing in Japanese, and the robot understands the content of the instruction and acts on it.
  • Required are: a request reception layer 31 that accepts instructions; a control layer (robot control AI 3) for understanding the instruction contents and acting on them; and
  • an execution layer (simulator 4) that, for example, stops to avoid obstacles on the premises roads, or when a road is closed, and acts in response to instructions from the control layer.
  • The request reception layer 31 may be provided by the robot 2 or by the robot control AI 3; in the present embodiment, it is provided by the robot control AI.
  • The control layer 3 does not need to be aware of the shapes and restrictions of individual robots, and only needs to talk with the robot through a general-purpose instruction interface. Therefore, even if no actual robot exists, demonstration experiments can be performed with a virtual environment and a virtual robot.
  • The physical robot entity itself should be prepared for each purpose and place, and its operation confirmed.
  • The control layer 3 requires reinforcement learning of various abilities, such as understanding the business story, the ability to give travel instructions to the robot, the ability to efficiently derive the shortest distance, and the selection of a detour route when an unexpected obstacle occurs.
  • This robot control AI 3 is not premised on a specific robot.
  • The target functions are premised on movement-related restrictions, such as carrying an object when moving from place to place or guiding a person to a specific place, but various usage scenes are assumed.
  • When picking a package from a specific shelf in a warehouse and moving it to another target shelf, the package is automatically and efficiently transported to the specified location.
  • A tourist who wants to go somewhere but does not know the way is guided to the destination.
  • At a long-term care facility, the robot carries the finished meal to the target room, or goes to the room after meals to pick up the tableware and bring it back.
  • Various usage scenes are possible in warehouses, factories, public facilities, and elsewhere.
  • Robots 2 have different shapes and functions depending on the purpose, but the premise is that each function is controlled autonomously, and the final action is left to the functions of the robot.
  • The robot control AI 3 aims to move from place to place based on map information and to carry objects efficiently by selecting the optimum route; by fusing this with the autonomous functions on the robot 2 side, practical problems can be solved.
  • The greatest feature of this function is that the robot control AI 3 assumes that a plurality of robots 2 operate at the same time in a given usage scene. It therefore plays a role like a control center, able to manage the positions of the robots 2 and the states of the roads for multiple robots based on a map of the entire premises.
  • While a robot performs work such as loading and unloading luggage, the place is occupied from the viewpoint of that robot 2, and the place is locked against the other robots 2.
  • When the robot control AI 3 detects this state, it guides the other robot to another place so that it can perform another task first and then do the work at the original place after the occupation is released.
  • FIG. 19 is a diagram showing an outline of the robot control AI 3 and the simulator 4.
  • The control layer 3 includes the request reception layer.
  • Robot control AI 3 is composed of three layers, each with its own role. In the third layer, the route to the destination is optimized by making full use of artificial intelligence, and the result is given as instructions to the robot simulator 41.
  • The robot simulator 41 executes the traveling operation from route to route and from route to destination while receiving instructions from the third layer.
  • The robot simulator 41 periodically (every second) notifies the third layer of position information and speed indicating where the robot 2 now is. It has learned to perform the actions of right, left, forward, backward, and stop, and can judge these for itself.
  • The sensor is a dedicated function for generating a virtual state, and generates events randomly in chronological order. For example, when it receives an action element, it returns position information (x, y) and velocity to the simulator in chronological order.
  • FIG. 20 is a diagram illustrating a state in which the robot control AI3 shares position information of a robot other than the robot corresponding to itself.
  • The robot control AI 3 needs to share the state of the entire map information at each point in time.
  • The robot simulator 41 requires the same condition; in that case, the sensor 42 collects the individual information of the local sensors and shares it with each simulator 4.
  • The global sensor is deliberately kept invisible to the simulators, so that no changes are needed even if states from actual robots are accepted in the future.
  • The robot control AI shares the states coming up from each robot simulator 41 through a global cache.
  • Each robot control AI 3 operates in a closed world in an independent session, but information other than its own can be acquired as other-session information and shared; the global cache is the means for this.
  • The shared information makes it possible to always recognize the entire state at regular time intervals (1 second), at the same time as its own information.
  • FIG. 21 is a diagram illustrating an example of the behavior of the robot controlled by the robot control AI3.
  • The main contents are (1) the start position, (2) the plurality of destinations to which the luggage is moved, and (3) the number and weight of the pieces of luggage for each destination.
  • The reward is evaluated by how closely the expected result is obtained, and the evaluation content is weighted to calculate the score.
  • FIG. 22 is a diagram illustrating an overall configuration of the robot control system of the present embodiment.
  • Robot control AI: 1. When the package information is received as a request document, the document content is analyzed and converted into standard request information. 2. The request information is understood and the story is dynamically generated: the order of transit through the destinations is determined, along with the number, weight, distance, and time of the luggage for each destination. 3. The next action is instructed from the current state.
  • FIG. 23 is a diagram illustrating a control hierarchy of the robot control AI3 in the robot control system of the present embodiment.
  • The robot control AI 3 layer is the core of this process and performs the actual control.
  • The work of the robot 2 here is to load the cargo at the start place, reach the designated place in the shortest time, and unload the cargo. The loading and unloading work is done by humans at this stage.
  • The feature of the processing here is that the work pooling layer 31 gives an independent instruction for one destination at a time to the input layer of the neural network, so that the work for a plurality of destinations is serialized and input in order.
  • The role of the robot control AI layer is to instruct the autonomous robot so as to optimally execute the story corresponding to one processing purpose. This part is reinforcement learning with DDQN.
  • FIG. 24 is a diagram illustrating map information of the present embodiment.
  • The map information is assumed to comprise the numerical ranges of the places, the composition of the destinations, and the route composition. On this map, multiple robots are assumed to be passing at the same time.
  • The number of passing routes is obtained from the nearest routes of points A and B, and the value is obtained by adding the total distance between the routes to the nearest distances from both routes.
  • FIG. 25 is a diagram illustrating a learning path of the robot control AI3 of the present embodiment.
  • Routes are connected by calculation, but when heading for the next route it is necessary to calculate whether to go straight, up or down, or left or right.
  • The entire destination-occupancy status can be recognized within the independent session of each robot control AI 3. The destination-occupancy states of all robots 2, one's own and others', are therefore obtained every second and shared by all robots 2.
  • Occupancy means the time from when a robot 2 arrives at a destination until it departs for the next destination or the waiting area. The timing for changing the destination sequence numbers is judged immediately before one destination is completed and movement to the next destination starts. However, the sequence numbers are not swapped if the remaining time until release (occupancy release time − current time) is less than 20 seconds.
  • Since the number of angles is even, the total route length is calculated for all candidate destination orders, swapping the destination order to find the most appropriate one.
  • Obstacles are shared so that each robot can recognize where an obstacle is currently occurring while it is running. The maximum number of obstacles, with their positions, that each robot control AI 3 can recognize at one time is 20. During learning, a route that includes an obstacle is made unselectable at the selection stage. Obstacles disappear 10 seconds after they occur; a sketch of such an obstacle registry follows below.
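  • A minimal sketch of an obstacle registry with this behaviour; the class and method names are illustrative, not from the specification:

```python
import time

class ObstacleMap:
    """Shared obstacle registry matching the behaviour described above:
    at most 20 obstacles recognized at a time, each disappearing 10 s
    after it occurs."""
    MAX_OBSTACLES = 20
    TTL_S = 10.0

    def __init__(self):
        self._obstacles = {}  # (x, y) -> time of occurrence

    def report(self, x, y, now=None):
        now = time.time() if now is None else now
        self._prune(now)
        if len(self._obstacles) < self.MAX_OBSTACLES:
            self._obstacles[(x, y)] = now

    def blocks(self, route_cells, now=None):
        """True if any cell of a candidate route overlaps a live
        obstacle, in which case the shortest path is recalculated."""
        now = time.time() if now is None else now
        self._prune(now)
        return any(cell in self._obstacles for cell in route_cells)

    def _prune(self, now):
        self._obstacles = {p: t for p, t in self._obstacles.items()
                           if now - t < self.TTL_S}
```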
  • The other party may be another robot 2: they may cross paths without colliding, or one may be stopped and be overtaken. In such cases the autonomous robot 2 passes by avoiding the obstacle by itself, so the robot control AI 3 does not need to know.
  • An obstacle here means case (2) mentioned above.
  • When selecting the shortest path, if a positional relationship corresponding to the path is detected in the obstacle array, the shortest path is calculated again, so as to find a route that does not contain obstacles.
  • The robot is included in this obstacle array, and is removed at the timing when the commonly held obstacle-array entry (x, y) disappears.
  • FIG. 26 is a diagram illustrating a hierarchical structure of the robot control AI3.
  • FIG. 27 is a diagram illustrating a robot simulator adapter.
  • The stub is a program for testing the robot simulator 41 before it directly interfaces with the actual robot control AI 3 application. It reads text data, generates instructions to the autonomous robot 41, and enables simulation.
  • This stub part corresponds to part of the environment; in operation it plays the role of interfacing to the autonomous robot in response to requests from the environment, without reading text data.
  • The robot simulator 41 executes instructions from the upper hierarchy, having acquired the actual knowledge of the robot 2 by reinforcement learning.
  • Its main role is to derive the next action based on the information acquired from the sensor 42 and to generate actions based on the state. It operates according to instructions from the robot control AI 3.
  • The instruction content is added to the input layer of the robot simulator.
  • Each robot 41 reports its state, and control can therefore be performed by sharing states; the simulator 4 must do the same.
  • The learning method of the simulator 4 is to learn to react to dynamic, sudden events generated periodically on the sensor 42 side.
  • Since the sensor 42 has only its own sensor information, the information given to the simulator 4 is closed within itself. For example, if five robots 2 are moving at the same time, each of the five travels in its own closed state.
  • What unfolds in the entire map information is that the five robots 2 are instructed from the upper level and are running without any dependence on each other.
  • The robot simulator 41 must be able to constantly acquire the states of all running robot simulators 41 other than itself.
  • The local sensor of sensor 42 only needs to generate the state peculiar to each robot; it is the global sensor that collects all the local sensor information. The hierarchical relationship is shown below.
  • FIG. 28 is a diagram illustrating an interface between the robot simulator 41 and Unity (registered trademark) 5 that displays an image.
  • The interface requirement is an image in which the whole of the current moment is taken as a unit and made continuously into a moving image, so the position information of all robots 41 and of all obstacles is handed over as a list.
  • The delivery interval is 100 ms.
  • The position information on the coordinate axes is expressed as a point (x, y).
  • The robots 41 and obstacles are displayed with volume, and the point on the coordinate axes indicates their position.
  • The position of the robot is the black dot at the head. The obstacle is fixed at the front on the left.
  • FIG. 29 is a diagram showing an example of a transfer robot and an obstacle.
  • The robot 41 can pass obstacles 1 and 2, but cannot pass obstacle 3 and must stop. After that, it turns back along the circuit.
  • The sensor 42 has the role of generating its own state and transmitting it to the simulator 4. Originally it would be built into the physical robot 2, but since actual robots 2 have various functions depending on the purpose, it was decided to prepare a simulator 41 that does not depend on those functions.
  • The simulator 41 has the same knowledge as the robot 2 and performs reinforcement learning in advance so that it can act autonomously, while the sensor 42 is dedicated to the role of generating the physical state.
  • The sensor 42 plays the role of calculating, in time series, the current state that can arise from its own running, so reinforcement learning is not necessary for it.
  • The simulator 4 decides what action to take in response to the result, and for this reinforcement learning is required.
  • The sensor 42 must generate its own state at regular intervals (100 ms).
  • The state lives in the two-dimensional coordinate space of the statically defined map information, and the sensor repeatedly calculates how its situation changes in time series according to the actions received from the simulator 4.
  • Its role is to return the result to the simulator 4 at any time.
  • There is also a group sensor that gathers the sensors 42 of all robots; control is organized in two layers, local sensors and a global sensor. In addition to the state of each robot, the global sensor plays the role of scripting the creation and disappearance of obstacles and causing random events.
  • FIG. 30 is a diagram illustrating a global sensor and a local sensor.
  • When passing sensor information to the robot simulators 41, the global sensor acquires the status of each local sensor, gathers all these local-sensor states into an overall state list, and hands this collected state list to each robot simulator 41; a sketch follows below.
  • Each robot simulator 41 grasps all the other events occurring at the present time as state and decides its action.
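  • A minimal sketch of this aggregation step, assuming hypothetical local-sensor and simulator interfaces (none of these names are defined by the specification):

```python
def collect_and_share(local_sensors):
    """Global-sensor pass: gather every local sensor's state into one
    overall state list and hand the same list to each robot simulator.
    `local_sensors`, `state()`, and `simulator.update()` are assumed
    interfaces."""
    state_list = [s.state() for s in local_sensors]
    for s in local_sensors:
        s.simulator.update(state_list)  # every simulator sees all states
    return state_list
```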
  • FIG. 31 is a diagram illustrating learning inside the robot simulator 41.
  • Role of the sensor: 1. Based on the map information, the sensor periodically obtains its own current position while traveling and reports it to the simulator 4 (calculating and handing over its position, mileage, obstacles, and distance to the wall each time).
  • A new state is generated in response to the content of the action a from the agent of the simulator 4.
  • The sensor 42 receives the overall state list and recognizes the position of its own robot 41 and the positions of obstacles relative to the others.
  • FIG. 32 is a diagram illustrating communication performed between the robot simulator 41 and the sensor 42.
  • FIG. 33 is a diagram illustrating the sensor 42.
  • The 32 sensors 42 yield 64 units in total, with (1) distance and (2) angle as the elements for each sensor.
  • FIG. 34 is a diagram illustrating an example of the arrangement of the sensor 42 and the state arrangement for holding the information from the sensor 42.
  • The distance to a wall, an obstacle, or a white line is acquired by a total of 32 sensors 42, 16 at the front and 16 at the rear.
  • The individual sensors 42 independently return their values as states.
  • The state must be recognized from these independent values; a sketch of the resulting state vector follows below.
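  • A minimal sketch of how the 32 readings could be flattened into the 64-unit state suggested above; the interleaving order is an assumption:

```python
NUM_SENSORS = 32  # 16 forward-facing, 16 rearward-facing

def build_state(distances, angles):
    """Flatten the 32 independent sensor readings into one 64-element
    state vector of (distance, angle) pairs."""
    assert len(distances) == NUM_SENSORS and len(angles) == NUM_SENSORS
    state = []
    for d, a in zip(distances, angles):
        state.extend([d, a])
    return state  # 64 values in total
```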
  • FIG. 35 is a diagram for explaining the distance measured by the sensor 42 included in the robots 2 and 41.
  • The dashed arrow is the straight-line distance to the wall.
  • The sensor IDs are fixed; the directions change when the vehicle turns the steering wheel, but straight ahead always corresponds to a predetermined sensor ID. This is used by the virtual robots 2 and 41 when calculating their own position, independently of the DQN.
  • The distance is calculated by trigonometric functions each time.
  • The calculation method is shown separately.
  • FIG. 36 is a diagram illustrating a road model assumed in the present embodiment.
  • The current position is calculated on the two-dimensional coordinate axes of latitude and longitude (x-axis, y-axis), with positional relationships computed within the coordinate axes shown on the left.
  • FIG. 37 is a diagram illustrating the relationship between the moving distance and the position information (latitude / longitude).
  • One accelerator stage is 0.2 m/sec.
  • The speed is determined by the current number of accelerator stages.
  • The elapsed time is calculated as the difference between the previous time and the current time.
  • The moving distance is obtained by multiplying the elapsed time by the speed corresponding to the number of accelerator stages; a sketch follows below.
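  • A minimal dead-reckoning sketch of these relations; the heading parameter is an assumption, as the text states only the speed and elapsed-time relations:

```python
import math

STAGE_MPS = 0.2  # one accelerator stage = 0.2 m/sec (from the text)

def update_position(x, y, heading_rad, stages, t_prev_s, t_now_s):
    """Speed from the current number of accelerator stages, moving
    distance from speed x elapsed time, new position from distance
    and heading."""
    speed = stages * STAGE_MPS            # current speed (m/s)
    dist = speed * (t_now_s - t_prev_s)   # moving distance (m)
    return (x + dist * math.cos(heading_rad),
            y + dist * math.sin(heading_rad))
```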
  • FIG. 38 is a diagram showing a hardware configuration example of a computer used in the robot control system according to the present embodiment.
  • the computer may be a general-purpose computer such as a workstation or a personal computer, or may be logically realized by cloud computing.
  • the illustrated configuration is an example, and may have other configurations.
  • the computer includes at least a processor 20, a memory 21, a storage 22, a transmission / reception unit 23, an input / output unit 24, and the like.
  • the processor 20 is an arithmetic unit that controls the operation of the entire computer, controls the transmission and reception of data between each element, and performs information processing necessary for application execution and authentication processing.
  • the processor 20 is a CPU (Central Processing Unit), and executes each information processing by executing a program or the like stored in the storage 22 and expanded in the memory 21.
  • the memory 21 includes a main memory composed of a volatile storage device such as a DRAM (Dynamic Random Access Memory) and an auxiliary storage composed of a non-volatile storage device such as a flash memory or an HDD (Hard Disc Drive).
  • the memory 21 is used as a work area or the like of the processor 20, and also stores a BIOS (Basic Input / Output System) executed when the computer is started, various setting information, and the like.
  • the storage 22 stores various programs such as application programs.
  • a database storing data used for each process may be built in the storage 22.
  • the transmission / reception unit 23 connects the computer to the network and the blockchain network.
  • the transmission / reception unit 23 may be provided with a short-range communication interface of Bluetooth (registered trademark) and BLE (Bluetooth Low Energy).
  • the input / output unit 24 is an information input device such as a keyboard and a mouse, and an output device such as a display.
  • Each function is realized by the processor 20 included in the computer reading the program stored in the storage 22 into the memory 21 and executing it. The various storage units in the second layer, as well as the learning results (models) of the robot control AI in the third layer, the map information, the route information, and the like, can be stored in, for example, a storage area provided by the memory 21 or the storage 22.

Abstract

[Problem] To provide robot control such that effective reinforcement learning is possible. [Solution] This system for controlling multiple robots is characterized by comprising: a work memory unit for storing multiple tasks to be performed by robots; an assignment unit for assigning individual tasks to the robots; a transmission unit for transmitting assigned tasks to a robot control device; and a status acquisition unit for acquiring robot operating status, and is also characterized in that the assignment unit changes the task assignment priority in accordance with operating conditions.

Description

Robot control system
 The present invention relates to a robot control system.
 Reinforcement learning related to robot control has been conducted (see Patent Document 1).
Japanese Unexamined Patent Publication No. 2019-7891
 However, the system described in Patent Document 1 does not improve efficiency when a plurality of robots operate.
 The present invention has been made in view of such a background, and an object of the present invention is to provide a technique capable of effectively controlling a plurality of robots.
 The main invention of the present invention for solving the above problems is a system for controlling a plurality of robots, comprising: a work storage unit that stores a plurality of tasks to be performed by the robots; an allocation processing unit that assigns each of the tasks to a robot; a transmission unit that transmits the assigned work to the control device of the robot; and a status acquisition unit that acquires the operation status of the robot, wherein the allocation processing unit changes the allocation destination of the work according to the operation status.
 According to the present invention, efficient reinforcement learning can be performed for robot control.
A diagram showing the overall system configuration of the robot control system of this embodiment.
A diagram showing a system configuration example of the robot control system of this embodiment.
A diagram explaining the functional outline of the second layer in the robot control system of this embodiment.
A diagram explaining work queue management.
A diagram explaining resource allocation to robots.
A diagram explaining the allocation of 45 packages to robots for transport.
A diagram explaining the distance from a start point to an arrival point.
A diagram showing an example schedule of robot occupied and released states.
A diagram explaining the generation of timing for taking work out of a queue.
A diagram showing a conceptual model of the robot scheduler.
A diagram showing a conceptual model of the robot scheduler.
A diagram explaining the operation at the time of a robot failure.
A diagram explaining the time-series operation of a transfer order.
A diagram explaining the queue management function.
A diagram showing a method of finding out whether any robot is in the standby state.
A diagram explaining the overlap of robot passing times.
A diagram explaining a change in destination order.
A diagram showing the overall system configuration of the robot control system of this embodiment.
A diagram showing an outline of the robot control AI 3 and the simulator 4.
A diagram explaining a state in which a robot control AI 3 shares the position information of robots other than its own robot.
A diagram explaining an example of the behavior of a robot controlled by the robot control AI 3.
A diagram explaining the overall structure of the robot control system of this embodiment.
A diagram explaining the control hierarchy of the robot control AI 3 in the robot control system of this embodiment.
A diagram explaining the map information of this embodiment.
A diagram explaining the routes learned by the robot control AI 3 of this embodiment.
A diagram explaining the hierarchical structure of the robot control AI 3.
A diagram explaining the robot simulator adapter.
A diagram explaining the interface between the robot simulator 41 and Unity (registered trademark) 5, which displays images.
A diagram showing an example of a transport robot and an obstacle.
A diagram explaining global sensors and local sensors.
A diagram explaining learning inside the robot simulator 41.
A diagram explaining the communication performed between the robot simulator 41 and the sensors 42.
A diagram explaining the sensors 42.
A diagram explaining an example of the arrangement of the sensors 42 and the state array that holds information from the sensors 42.
A diagram explaining the distances measured by the sensors 42 included in the robots 2 and 41.
A diagram explaining the road model assumed in this embodiment.
A diagram explaining the relationship between movement distance and position information (latitude and longitude).
A diagram showing a hardware configuration example of the computer used in the robot control system according to this embodiment.
<Outline of the invention>
The contents of the embodiments of the present invention are listed and described below. The present invention includes, for example, the following configurations.
[Item 1]
A robot control system for controlling a plurality of robots, comprising:
a work storage unit that stores a plurality of tasks to be performed by the robots;
an allocation processing unit that assigns each of the tasks to a robot;
a transmission unit that transmits each assigned task to the control device of its robot; and
a status acquisition unit that acquires the operation status of the robots,
wherein the allocation processing unit changes the allocation destination of a task according to the operation status.
[Item 2]
The robot control system according to Item 1, wherein the allocation processing unit allocates one task to one or more robots according to a first amount of work required for the task and a second amount of work that a robot can perform.
[Item 3]
The robot control system according to Item 1, wherein the allocation processing unit assigns tasks to the robots so that the cumulative amount of work assigned to each of the plurality of robots over a predetermined period is smoothed.
[Item 4]
The robot control system according to Item 1, wherein the status acquisition unit acquires information indicating the operation status from the control device of the robot and from a sensor independent of the robot.
[Item 5]
The robot control system according to Item 1, further comprising a ledger that records as debits and credits at least the occupied time for which a robot is occupied by a task, with at least each robot and the operation as a whole as account items.
The present invention can also have the following configurations.
[Item 1]
A robot control system for controlling a robot, comprising:
a control unit that controls the robot; and
a simulator that simulates the operation of the robot,
wherein the control unit performs reinforcement learning related to the control of the robot according to the operation of a virtual robot simulated by the simulator.
[Item 2]
The robot control system according to Item 1, wherein the robot includes one or more sensors; the control unit transmits to the simulator a control signal related to the operation of the robot; the simulator simulates the operation of the virtual robot in response to the control signal, simulates measurement by virtual sensors, and transmits the measurement information from the virtual sensors to the control unit; and the control unit performs the reinforcement learning according to the measurement information.
[Item 3]
The robot control system according to Item 1, wherein the control unit comprises: a request reception layer that accepts instructions for the robot; a work pooling layer that gives the instructions as input values for the reinforcement learning; and an AI layer that performs the reinforcement learning.
<Purpose>
Hereinafter, the robot control system according to the embodiment of the present invention will be described.
FIG. 1 is a diagram showing the overall system configuration of the robot control system of the present embodiment. The robot control system of this embodiment is configured in five layers.
 The first layer handles external connections. It can, for example, accept instructions from the user through natural language processing.
 The second layer is the management layer, which manages multiple robots collectively. It can serve as a scheduler for overall optimization.
 The third layer is the robot control layer, which controls the robots. It can perform individual optimization, such as route searching, for a single robot.
 The fourth layer is the execution layer, the layer in which the robots operate. In the present embodiment, the fourth layer can operate the robots virtually by simulation.
 The fifth layer is the IoT layer. It manages measurement data from the various sensors and other devices required by autonomous robots.
<System configuration example>
FIG. 2 is a diagram showing a system configuration example of the robot control system of the present embodiment. The second layer robot scheduler includes an MDM server, an AP server, a DB server, and an ESB server. The robot session control of the third layer includes an ESB server, a DB server, an AP server, a Cache server, and a robot control AI process. The fourth layer robot simulator includes synchronous control, a communication adapter, map information, an ML agent, and Unity (registered trademark). The real robot operating environment of the fifth layer includes the API of the real robot and the SDK of the communication adapter.
<Function overview>
FIG. 3 is a diagram illustrating the functional outline of the second layer in the robot control system of the present embodiment. The second layer is configured as an independent domain. Request data can be input as text from the second-layer ESB. The request does not need to specify a robot number; the second layer automatically allocates and determines the robot. The content of the request is expanded into work (queued). In the example of FIG. 3, the request is to go from the waiting area to the start position and then return to the waiting area via several destinations. While this request is being carried out, the second layer receives an arrival result report from the third layer each time the robot passes the start position or a destination. For each destination, the second layer issues an instruction to go to the next destination (or the evacuation area). When the robot finally arrives at the evacuation area, it can be released.
<Queue>
FIG. 4 is a diagram illustrating work queue management.
<Main functions of scheduler>
The main functions of the robot scheduler will be explained.
 Transport robots are used in warehouses, factories, and the premises of various buildings. This embodiment assumes 100 or more robots acting automatically. The premise for a robot to act is that a transport instruction from an external business system triggers the action. In a typical business scene, luggage sits on a shelf or the like, and a requirement arises to move a consolidated amount of that luggage to an appropriate place elsewhere on the premises.
 A consolidated batch of luggage is transported from its place of origin (start position) to its destination; the batch may be bound for a single destination or for multiple destinations. The second layer is therefore required to satisfy the following requirements in its role of accurately conveying the contents of the work to the robots and smoothly achieving the objective.
== 1. Master-related generation ==
・Generation of robot numbers: robot master keys are generated automatically according to a robot-count parameter. The number of robots must match the number defined in the fourth layer. A robot number is generated by combining a fixed name part with a variable numeric part.
・Generation of the standby-position, start-position, and destination masters: the coordinates of each work location are mastered from the map information of the entire building and are used when calculating Euclidean distances.
・Registration of robot type settings (allowable weight, volume): robots have sizes and are divided into several types. At present there is a single robot type, 1 m long and 0.8 m wide, but several types of robots of different sizes should be supported.
・Definition of robot accounts: there are a queue account, activity accounts (normal and emergency), and a failure account.
== 2. Queue ==
A function that receives transport requirements from the outside and queues them. Queuing means placing a request into a queue and preparing to assign a robot. Regardless of whether the robot balance is zero, every order first enters the queue in the second layer and waits for the preceding orders to finish; it is then transferred to the order's waiting account. If a robot breaks down partway through, that robot is transferred to the failure account and another robot is assigned; thereafter, the failed robot is excluded from allocation.
<Robot resource allocation>
 FIG. 5 is a diagram explaining resource allocation to robots. As shown in FIG. 5, the planned relationship between robots and orders is transferred as actual results occur and as robots that have already been released appear. Repeated allocation changes create differences between the planned allocation and the actual allocation; the transfer journal entries for such cases are described later.
== 3. Resource allocation to standby robots ==
3. When a request is received, it is first allocated to waiting robot resources. The queue is created after the robot resources have been allocated.
 (1) When a request (task) is received, robot resources are allocated first. Requests have no priority at this stage. Robots must be kept on standby for emergencies: robots not normally used are reserved for emergency use so that a replacement is reliably available when a robot breaks down.
 (2) Depending on the amount of luggage (volume, weight) that a robot can carry, a request may have to be divided among multiple robots. Since one order = one robot, the request is divided into multiple orders depending on the amount of luggage. If some of the robots corresponding to the divided orders cannot be allocated, robots that can be allocated in the same time slot are allocated at a future time. Since the requested luggage should be regarded as one continuous unit, it is undesirable for it to arrive at the destination split up. There is no concept of "unallocated": allocation is always made in a time slot in which it is possible.
 (3) If, while multiple allocations are in progress, work under the most recently allocated order has the same start position or the same destination, and the passing times are close within a certain time difference, the destination order must be exchanged. This is calculated from the relationship between destination distance and time.
 (4) When allocating robots, because of charging considerations, allocation is averaged so that a robot that has just been used is not used continuously. To this end (see the sketch after this section):
  1) The robot type is selected according to the total requested amount of luggage (within the allowable volume and weight). If the available types are limited and the allowable amount is exceeded, there is no choice, so proceed to (5). Calculation: among robots satisfying luggage volume <= allowable volume and luggage weight <= allowable weight, the smallest such robot is selected; if none exists, proceed to (5).
  2) Robots are allocated in ascending order of accumulated usage time. A robot used for a long time has a recent release time, so its allocation priority falls.
  3) The robot with the oldest release time is allocated preferentially. Interpreting a robot as charging while it is released and waiting in the standby area, the longer it has been waiting, the higher its allocation priority.
 (5) When robot types of several sizes are registered in the master, the robot type is selected, according to the amount to be loaded, so that the luggage can be carried by one robot wherever possible. If it cannot be handled by a single specific robot, as in (2), the load can be divided among multiple robots. (For now there is only one type, 1 m long and 0.8 m wide, but allocation should remain possible as types increase.)
 The allocation algorithm selects the robot type with the smallest tolerances (weight, volume). If one robot cannot accommodate the load, the rank is raised and re-examined; if no single robot can be applied, the load is divided among robots.
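
As a minimal sketch of the type-selection and splitting rules above — assuming hypothetical robot-type records with allowable volume and weight (all names here are illustrative, not part of the system) — the logic could look like this in Python:

from dataclasses import dataclass

@dataclass
class RobotType:
    name: str
    allowable_volume: float   # loadable volume
    allowable_weight: float   # loadable weight

def select_robot_type(types, load_volume, load_weight):
    # Rule 1): the "most minimal" robot satisfying both
    # load_volume <= allowable_volume and load_weight <= allowable_weight.
    fitting = [t for t in types
               if load_volume <= t.allowable_volume and load_weight <= t.allowable_weight]
    return min(fitting, key=lambda t: (t.allowable_volume, t.allowable_weight)) if fitting else None

def plan_allocation(types, load_volume, load_weight):
    # Rule (5): if no single robot fits, divide the load across robots of the
    # largest type, one order per robot.
    chosen = select_robot_type(types, load_volume, load_weight)
    if chosen is not None:
        return [(chosen, load_volume, load_weight)]
    largest = max(types, key=lambda t: (t.allowable_volume, t.allowable_weight))
    parts = []
    while load_volume > 0 or load_weight > 0:
        v = min(load_volume, largest.allowable_volume)
        w = min(load_weight, largest.allowable_weight)
        parts.append((largest, v, w))
        load_volume, load_weight = load_volume - v, load_weight - w
    return parts

Tie-breaking among robots of the chosen type would then follow rules 2) and 3), i.e., least accumulated usage time and oldest release time first.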
== Assignment example ==
FIG. 6 is a diagram illustrating the allocation of 45 packages to robots for transport.
== 4. Response in case of failure ==
4. When a robot breaks down, it is put into the failure state from the management screen, the work is transferred to another standby robot, and an order is generated to move that robot to the failure position. After arriving at the failure position, the interrupted order is continued.
 (1) Session control notifies the second layer of the failed robot number.
 (2) After the failed robot's work has been transferred to a normal robot, session control is notified, using the new robot number, of the failed robot number as the destination.
 (3) Session control instructs the robot control AI to move the new robot to the failed robot's coordinates; on arrival, a notification is returned from the third layer to the second layer. Since the robot control AI recognizes destinations by name, the ability to move to a robot number is added as a new function.
 (4) In the meantime, the robot is stopped for a fixed time (10 seconds) because there is no next instruction.
 (5) The second layer then instructs the robot control AI with the original destination, and from there operation returns to the normal track.
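
A rough sketch of steps (1) to (5), with the second-layer and session-control interfaces stubbed out as hypothetical callables (the key names follow the JSON example given later):

def handle_robot_failure(failed_robot_no, order, pick_standby_robot, send_instruction):
    # (1) session control has already notified us of failed_robot_no.
    # (2) Transfer the order to a normal standby robot ...
    new_robot = pick_standby_robot()          # a standby robot with no queue
    order["robotId"] = new_robot
    # ... and tell session control to move it to the failed robot's position.
    send_instruction(robot=new_robot, to_destination=failed_robot_no)
    # (3)-(4) The third layer later reports arrival at the failed robot's
    # coordinates; meanwhile the robot idles (about 10 s) with no next
    # instruction.
    # (5) The second layer then instructs the original destination, and the
    # order returns to its normal track.
    send_instruction(robot=new_robot, to_destination=order["toDestination"])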
<Occupancy and release of robots>
 Occupancy and release are scheduled by the planned expansion of actual requests (this is not done for plan requests).
(1) Robot occupancy
 A robot is not in the occupied state merely by being allocated. It becomes occupied (= active) when the order is actually taken out of the queue and sent to the third layer.
(2) Unit of instruction to the third layer
 An instruction to the third layer covers a single section. That is, instructions are issued section by section: from the standby position to the start position, from the start position to destination 1, from destination 1 to destination 2, and so on. The next instruction is issued at the timing when the third layer arrives at a destination and notifies the second layer of the next request, and the second layer receives that request.
(3) Releasing an active robot
 When the work at all destinations is complete and the robot arrives at the waiting area, the robot is released. The robot assigned to the order is released at the timing when the final request (the return to the standby position) is received from the third layer. (If the order continues, the third layer requests the next destination; once the robot reaches the waiting area, no further destination comes.)
 When a robot is released and an earlier request for the same robot ID is waiting, queue management fetches the next waiting request; in that case, the next waiting request of that same released robot is the one taken out. However, this processing is not performed immediately upon release but is triggered by the time-monitoring event.
<Calculation of estimated robot occupancy time>
 Allocating a robot does not occupy the robot itself; rather, it is a future occupancy of the time resource by time slot, with the usage time allocated. Occupancy is the timing at which an instruction is actually sent to the third layer, and it stands in an actual-versus-planned relationship to the allocation.
 Allocation of work takes place at the planning stage: when a plan request is received, robot allocation (reservation) is performed at that point. For a robot reserved at this stage, it must be calculated and predicted at what future point the robot will become occupied and at what point it will be released. The calculation method is shown below.
 FIG. 7 is a diagram explaining the distance from the start point to the arrival point.
 The distance between the two destinations is C + B, not A. The occupancy time is therefore the cumulative total of the travel time over the destination intervals (including the waiting area) and the work time at each destination. Let:
 N = number of destination intervals
 W = average work time (a master value determined by the robot type; the standard is 60 seconds, and the value is scaled by size)
 M = speed per second (settable as a parameter; the default can be 2 m per second)
 F = next occupancy interval time in seconds (the interval from the release of a robot to its next occupancy; settable as a parameter, default 1.2 seconds)
 h = magnification factor (settable as a parameter, default 1.2)
 Then the occupancy time T can be obtained by the following equation.
 (Equation for the occupancy time T: the original equation image is not reproduced.)
 Next, the occupancy start time and the release time are obtained.
 Resource occupancy start time: the previous release time of the same robot + F seconds.
 Resource release time: release time = occupancy start time + occupancy time T.
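
Since the original equation image is not reproduced above, the following Python sketch uses only one consistent reading of the stated definitions — travel time over the N intervals plus N work times, scaled by h — together with the start and release times just given; the exact original formula may differ:

import math

def occupancy_time(points, W=60.0, M=2.0, h=1.2):
    # points: ordered (x, y) coordinates of the waiting area, start position,
    # and destinations; N = number of destination intervals.
    n = len(points) - 1
    travel = sum(math.dist(points[i], points[i + 1]) for i in range(n)) / M
    return (travel + n * W) * h          # estimated occupancy time T

def occupancy_schedule(points, previous_release, F=1.2, **kwargs):
    # Occupancy start = previous release time of the same robot + F seconds;
    # release time = occupancy start time + occupancy time T.
    start = previous_release + F
    return start, start + occupancy_time(points, **kwargs)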
<Schedule image>
 FIG. 8 is a diagram showing an example schedule of robot occupied and released states.
・Equalization of robot utilization
・Appropriate robot allocation from the expansion of plans and actual results (robot and task allocation status)
・Grasp of robot states (occupied, released, queued)
・Relationship between the amount of work and the number of robots
・Consideration of charging time
・Recognition of robot failure status
 The premise is a flow in which a request from the WMS is first expanded as a plan for picking work and the like, actual results then come in, and transport work arises. The request is considered to be issued already at the planning stage, which makes advance preparation easier.
<Calculation of loading order on robot>
The loading order and loading position (spatial coordinate axes) are calculated for each robot NO, and the results are returned as an array.
 The loadable volume and weight differ by robot size; for the robots allocated this time, the loading method is calculated from the total amount of luggage.
 For example, the input values are:
 0. Robot size (loadable volume = length, width, height) and weight limit
 1. An array, in destination order, of each package's volume (length, width, height), weight, and destination
 2. Total number of packages
 3. Total volume
 4. Total weight
 The scheduler then outputs, as result data:
 1. Luggage number
 2. Loading order
 3. Spatial coordinates of the luggage (8 points)
 These are output as an array with one entry per package, which makes it possible to draw the result with Plottree or the like.
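
As an illustration of this result data, the sketch below stacks more distant destinations lower and returns the eight corner points of each box; the corner lettering a to h and the simple bottom-up stacking are assumptions for illustration, not the system's actual packing algorithm:

def box_corners(origin, size):
    # Eight corner coordinates of an axis-aligned box; origin is one corner
    # (x, y, z), size is (length, width, height).
    x, y, z = origin
    l, w, h = size
    offsets = [(0, 0, 0), (l, 0, 0), (0, w, 0), (l, w, 0),
               (0, 0, h), (l, 0, h), (0, w, h), (l, w, h)]
    return {name: (x + dx, y + dy, z + dz)
            for name, (dx, dy, dz) in zip("abcdefgh", offsets)}

def loading_plan(packages, distance_to):
    # Sort so the most distant destination is loaded first (bottom of the
    # stack); heights simply accumulate in this toy model.
    ordered = sorted(packages, key=lambda p: distance_to[p["destination"]], reverse=True)
    plan, z = [], 0.0
    for seq, p in enumerate(ordered, start=1):
        plan.append((p["luggageNO"], seq, box_corners((0.0, 0.0, z), p["size"])))
        z += p["size"][2]
    return plan                           # one entry per package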
 The result is set in the order and, at the same time, registered in the session-control robot object. The order contents in the second layer are saved, but the contents of the robot object are cleared when the robot is released. How the luggage is loaded on each robot, and what it contains, can be obtained by querying the robot; the query can be displayed on a Web screen by searching the order contents in the second layer.
 1. Request number
 2. An appropriate ID identifying the package
 3. Assigned robot ID
 4. Start position, destinations, ...
 A loading image of the boxes is displayed for these items.
<Timing generation to retrieve from queue>
FIG. 9 is a diagram illustrating the generation of timing for retrieving work from the queue.
 The timing for taking work out of the queue can be controlled at 10-second intervals. The work in the queue is the first task of an expanded request. Within the queue there are two states: work for which no robot can yet be allocated, and allocated work that is ready.
(1) When executable work is taken out of the queue, only one item is taken out at a time, with priority applied.
(2) All unallocated entries are taken out of the queue at once, and only the work that can now be allocated is returned to the queue. A queue-monitoring process is prepared, which generates and executes a trigger in the ESB at 10-second intervals (the queue fetch trigger), as sketched below.
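
A minimal sketch of the 10-second queue-fetch trigger, with the ESB client stubbed as a hypothetical object:

import time

def queue_monitor(esb, interval=10.0):
    # Fire the queue fetch trigger on the ESB at fixed intervals; per the
    # rules above, each trigger takes out at most one executable item at a
    # time and re-examines unallocated items in bulk.
    while True:
        esb.trigger("queue_fetch")
        time.sleep(interval)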
<Conceptual model>
FIG. 10 is a diagram showing a conceptual model of the robot scheduler. FIG. 11 is a diagram showing a conceptual model of the robot scheduler.
<Scenario type>
The following scenarios exist.
(Tables of scenario types: original images not reproduced.)
<Account journals for order status and robot status>
The following table describes the account journals for order status and robot status.
(Table of account journal entries for order states and robot states: original image not reproduced.)
 The following table explains the account journal entries when dealing with failures.
(Table: original image not reproduced.)
 The first half is attached under the original order number. If the robot is on its way back, the second half is unnecessary. The second half uses the original order, which becomes Robot002's order; from then on the flow belongs to Robot002.
 The request number stays the same. After the robot is linked to the order, the state is set to active or in transit.
 Work continues from the current sequence number within the order.
 Note: the alternate robot is chosen from robots that are on standby and have no queue.
 In the case of a failure, if no standby robot exists at that timing, processing proceeds automatically through the first half and then stops there; the second half is performed manually.
 The second-layer robot scheduler is provided with a ledger managing the accounts described above, and uses it to manage the queue and the operating and standby status of the robots.
 When the robot operating on a transport order breaks down, an alternative robot is allocated to take over. The contents of the transport order are copied to a transfer order and journal entries are generated. It is assumed that the robot had progressed partway through its destinations and then broke down, stopping in the middle of the road. In that case, the From/To journal entries already completed are not needed for the alternative robot; even in the middle of the road, processing starts from the From at that time. Since this From is the starting point, all necessary journal entries are generated.
 In the instruction to the third layer, the start point becomes the failed robot's position. The instruction that would have been "from the standby position to the start position" therefore becomes "from the standby position to the failed robot's position."
<Operation when the robot breaks down>
FIG. 12 is a diagram illustrating the operation at the time of a robot failure. In failure pattern 1, the robot failed while destination 2 still remained, so the work is transferred to an alternative robot: Robot 2 is moved to the position where Robot 1 failed. In failure pattern 2, the failure occurred while returning to the standby area, but all the destination work had already been completed, so no transfer to an alternative robot is performed.
<Account journals for order status and robot status>
The following table describes the account journals for transfer orders.
(Table: original image not reproduced.)
 FIG. 13 is a diagram explaining the time-series operation of a transfer order. As shown in FIG. 13, when actual results occur, the plan is changed and the order is transferred to a robot that can take it earlier. From the available times of all robots, obtained with the getBalance function, the robots that have no next plan are selected, and the earliest available robot among them is chosen.
 For example, for Robot1, completion of order 1 has been delayed, so the work of order 2, planned next, must be transferred to another robot. Robot2 finishes order 3 earlier than planned, but since order 4 is already assigned to Robot2, order 2 cannot be inserted there. Robot3 has no order after order 5 ends, so order 2 can be assigned to Robot3 after order 5 completes.
The following table explains an example of the account journal entries between order states and robot states.
(Table: original image not reproduced.)

 Accounts such as Robot001 are journalized at +1 only for the period the order sits in the queue at allocation time; once occupied, the entry disappears from the queue, and thereafter processing proceeds only by transfers between the state accounts. A robot account balance of +1 or more can be understood as the number of queued orders; it becomes zero when the robot's queue is exhausted. In other words, a robot's balance is its number of waiting orders. Everything else is managed by the state accounts, as illustrated below.
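
A toy illustration of this bookkeeping, in which a robot account's balance equals the number of orders waiting in its queue (the state accounts themselves are not modeled here):

from collections import defaultdict

class RobotLedger:
    # A robot account is journalized +1 while an order waits in its queue and
    # cleared when the order becomes occupied; later progress is tracked by
    # the state accounts, outside this sketch.
    def __init__(self):
        self.balances = defaultdict(int)

    def allocate(self, robot_no):     # order enters the robot's queue
        self.balances[robot_no] += 1

    def occupy(self, robot_no):       # order leaves the queue on occupancy
        self.balances[robot_no] -= 1

    def waiting_count(self, robot_no):
        return self.balances[robot_no]   # balance = number of waiting orders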
<Order queue management function>
Queue management of orders will be described.
1. An external request occurs, the order is expanded, and the order details are confirmed.
 Acceptance of requests at the planning stage (these may or may not exist) applies in the case of plan requests.
 At the acceptance stage, normal work expansion is performed, but the result is generated as a plan. For a plan request, a robot is allocated in its state at that time, the plan is expanded, and the order enters the queue. Plan expansion proceeds in the same way as normal processing, but it is not executed; it simply waits in the queue. The plan request is then left as it is and used only for temporary display; "temporary" means that once the actual request arrives, the plan request is no longer used even for display.
・The contents of an order must be able to form the interface to the third layer: the weight and volume of the luggage, the start location, and an array of multiple destinations.
・When an order is generated, a robot is assigned and a queue entry is created. The processing selects an appropriate robot:
 (1) The robot size is determined from the robot type based on the weight and volume of the luggage, and a robot matching that size must be assigned.
 (2) The getBalance function is called with the condition (balance = zero AND standby state) to narrow down the robots.
 (3) If no robot satisfies (2), getBalance is called with (balance = zero AND active state) to narrow down.
 (4) If no robot satisfies (3), getBalance is narrowed down with (balance > zero AND the robot whose last end time is closest to the present).
 As a result, condition (1) is combined by AND with each of (2) through (4). In case (2) any robot would do, so one is selected; to choose the robot with the least usage time today, the list of waiting orders for standby robots is obtained with the getBalance function, the time amounts are aggregated, and the robot with the smallest total is chosen. In case (3), the robot whose last end time is closest to the present is selected. Case (4) is uniquely determined.
・Finally, the robot and the order are linked in the queue, as sketched below.
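
The narrowing cascade (2) to (4) could be sketched as follows, with getBalance stubbed as a hypothetical query function and today_usage as an assumed map from robot ID to time used today:

def choose_robot(get_balance, today_usage):
    # (2) balance == 0 and standby: among these, pick the robot with the
    # least usage time today.
    candidates = get_balance(balance=0, state="standby")
    if candidates:
        return min(candidates, key=lambda r: today_usage.get(r.id, 0.0))
    # (3) balance == 0 and active: the robot whose last end time is closest
    # to the present.
    candidates = get_balance(balance=0, state="active")
    if candidates:
        return max(candidates, key=lambda r: r.last_end_time)
    # (4) balance > 0: uniquely determined by the most recent last end time.
    candidates = get_balance(balance_gt=0)
    return max(candidates, key=lambda r: r.last_end_time)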
2. Creating the queue and taking orders out of it: the queue is a queue of orders, and robot allocation is already complete when an order is generated.
・Taking an order out of the queue is the act of issuing an instruction to the third layer for a given robot.
・The getBalance function is called with (balance > zero AND standby state) to search for robots, and instructions are issued to the third layer in order. At that time, the journal entry for the next work place (destination) must not be forgotten, even on the first iteration.
・Once the robot account is determined, the order number at the head of the Entry elements is taken out and confirmed. Once the order is confirmed, the interface to the third layer can be generated from the order contents.
・Finally, the robot is put into the active state and the order into the transport state, completing the process.
 (1) A plan request is expanded into work but is not subject to queue management: it is never taken out, so it is left as is. When the actual request arrives and is newly expanded, the display switches to the actual side and the plan request is no longer displayed; it is used only to grasp the amount of work.
(2) When the actual request arrives, robot allocation is executed and the order is registered in the actual queue. Two parameters are prepared for taking orders out of the queue:
・the number of orders processed consecutively at one time
・the interval (in seconds) between taking out individual orders
(3) Extraction conditions
・The queue holds orders hung per robot. Whenever an order is generated, it is always tied to a specific robot (this is called allocation).
・Every order in the queue already has a robot assigned (a shortage of robots merely lengthens the queue).
・If a robot is currently in the active state, its queue is not retrieved (out of scope).
・No parallel processing is performed.
・For the waiting state, getBalance() is used to obtain the order list; since the serial numbers of the orders are chronological, one order is taken out of the list in turn. At that point the order number, robot ID, and sequence elements are expanded on the list.
・The robot ID is obtained from the entry under the fetched order; if that robot is in the standby state, the order is taken out of the queue and the instruction processing to the third layer is executed.
 (4) Retrieval from the queue is realized by an external batch process, executed at regular intervals, as sketched below.
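
Putting the extraction conditions together, one batch pass over the queue might look like the following sketch (getBalance, the robot table, and the third-layer call are all hypothetical stand-ins for the real interfaces):

def fetch_and_dispatch(get_balance, robots, send_to_layer3):
    # One batch pass: waiting orders are visited in serial-number
    # (chronological) order; an order is dispatched only if its assigned
    # robot is on standby, since active robots' queues are not touched.
    for order in sorted(get_balance(state="waiting"), key=lambda o: o.serial_no):
        robot = robots[order.robot_id]    # every queued order already has a robot
        if robot.state != "standby":
            continue
        robot.state = "active"            # robot becomes active ...
        order.state = "in_transit"        # ... and the order enters transport
        send_to_layer3(order)             # issue the instruction to the third layer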
 FIG. 14 is a diagram explaining the queue management function.
 The waiting state of orders is called the queue. In the figure, the following five orders can currently be obtained in one retrieval, and the order in which the taken-out orders are sent to the third layer is as shown in FIG. 14. In this way, only the head of the queue of each standby robot can be acquired into the list.
3. Interface processing between the second and third layers
 ◆ Receiving a next-destination request (return type = normal) from the third layer.
 For destination chains (from the waiting area to the start point, from the start point to a destination, from one destination to the next), the second layer does not use the queue: the next destination is taken from the order contents and instructed to the third layer.
 The robot ID is the key to this processing. Based on the robot ID returned from the third layer, the robot's active state is searched with the getBalance function and the most recent active order number is captured. The previous destination is present in the entry contents; the next destination and the end condition can be judged from the sequence number, and the order number can also be extracted from the request number. The destination following the previous one is obtained from the order contents, and an instruction to move over the range from the current destination to the next destination is issued to the third layer.
 If the previous destination was the waiting area, processing of the order is completed with this arrival report: the robot is transferred from occupied to released, and processing for that robot ends, as sketched below.
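
A sketch of this normal-return handler, with the helpers stubbed out as hypothetical callables (the waiting-area name waitGate follows the JSON example later in this document):

WAITING_AREA = "waitGate"

def on_arrival(robot_id, get_balance, send_to_layer3, release_robot):
    # Handle a next-destination request (return type = normal) from layer 3.
    order = get_balance(robot_id=robot_id, state="active")[0]  # latest active order
    prev = order.current_destination()     # judged from the sequence number
    if prev == WAITING_AREA:
        release_robot(robot_id)            # occupied -> released; order complete
        return
    send_to_layer3(robot_id,
                   from_destination=prev,
                   to_destination=order.next_destination())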
 ◆ Receiving a next-destination request from the third layer (return type = abnormal).
 This is the processing performed when the third layer notifies that the robot currently working has failed. The scenario is failure handling, and the processing differs depending on where and when the robot stopped.
 If destinations still remain when the failure occurs, the work must be transferred to an alternative robot, which is instructed to move to the failure point. Since the returned robot ID is the failed one, the most recent active order number is captured with the getBalance function, as in normal processing, and the previous destination is present in the entry contents. Here, the movement instruction to the alternative robot uses the position information of the failure point instead of the next destination.
 On the other hand, if the only remaining destination is the standby area, the order is completed by transferring the robot to the failure account, without using an alternative robot. When transferring to an alternative robot, the order contents are copied and regenerated with the robot ID replaced by the alternative robot's ID. For work and movement, only the journal entry for the previous destination needs to be generated, and the next destination determined.
4. This processing is a batch process that periodically monitors and reorganizes the queue.
 Queue reorganization works as follows. The initial queuing is performed when an order is generated, and the order's required processing time (number of destinations and predicted processing time) is then only a plan. As actual results come in, deviations gradually accumulate, and an order may be left waiting in one robot's queue while other robots sit idle in the standby state, wasting time. By periodically reorganizing this state and applying appropriate compression, a queue of maximum efficiency can be generated.
◆ Reorganization details
・Find out whether any robot is in the standby state.
・If no robot is in the standby state, move orders so that the final completion times of the queues of the individual robots become even.
 This method is shown in FIG. 15. In the example of FIG. 15, the average order time is (20 + 30 + 10) ÷ 3 = 20.
(1) If a robot's current queue is at or below the average time, it is left as is.
(2) Robot002 is above the average, so other robots that are at or below the average are searched for; the result is Robot001 and Robot003.
(3) One order is taken so that Robot002 falls to the average or below: order 5 is taken and moved to Robot003. If the result for Robot003 would exceed the average time, the reorganization is cancelled. A sketch of this rule follows.
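
The averaging rule (1) to (3) could be sketched as follows, where each queue is a list of (order_id, minutes) entries; generalizing slightly, this version tries the next receiver when one receiver would be pushed over the average:

def rebalance(queues):
    # Move one order from an above-average robot to a robot at or below the
    # average, cancelling a move that would push the receiver over the average.
    total = sum(m for q in queues.values() for _, m in q)
    average = total / len(queues)
    for donor, q in queues.items():
        if sum(m for _, m in q) <= average:        # (1) keep as is
            continue
        for receiver, rq in queues.items():        # (2) candidates at/below average
            if receiver == donor or sum(m for _, m in rq) > average:
                continue
            _, minutes = q[-1]                     # (3) try to move one order
            if sum(m for _, m in rq) + minutes > average:
                continue                           # receiver would exceed: cancel
            rq.append(q.pop())
            break

With the example above (queues of 20, 30, and 10 minutes, average 20), and assuming order 5 takes 10 minutes, this moves order 5 from Robot002 to Robot003, matching FIG. 15.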
<Overlapping transit times>
 FIG. 16 is a diagram explaining the overlap of robot passing times. In the example of FIG. 16, destination 12-A overlaps: the arrival times there are estimated, and if the difference between ROBOT001 and ROBOT0012 is within 60 seconds (parameterized), they are judged to overlap and the order of 12-A is swapped for one of the robots. If there is only one destination, the order is put back in the queue and delayed.
 The passing time is not an exact time but expresses the relative relationship between robots. Overlap of passing times is determined by calculating the relative difference in the straight-line distances from destination to destination. In the above example, the Euclidean distances between the coordinates, from wait to the next position, are calculated and weighted at 1 second per 2 m; a distance difference then becomes a time difference, and the judgment is made on that difference. The time at a passing point is determined by accumulating the position-to-position times, with a constant work time added individually at each destination.
 With i and j denoting relative positions (for example, wait and 15-A can be expressed as 1 and 2) and K the number of accumulated segments, the distance d(ij) can be expressed as the following sum of segment distances (reconstructed from the description; the original equation image is not reproduced):
 d(ij) = Σ_{k=1}^{K} xy_k
 where the distance xy between two points on the coordinate axes is:
 xy = √((x_i − x_j)² + (y_i − y_j)²)
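
Combining the accumulated distances with the 60-second threshold, the overlap check might be sketched as follows (the 1 s per 2 m weighting and the constant work time follow the text; all names are illustrative):

import math

def passing_times(route, coords, speed=2.0, work_time=60.0):
    # Relative passing time at each point of `route` (a list of place names):
    # Euclidean distances accumulated position to position at `speed` m/s,
    # plus a constant work time added at each destination reached.
    t, times = 0.0, {}
    for prev, cur in zip(route, route[1:]):
        t += math.dist(coords[prev], coords[cur]) / speed + work_time
        times[cur] = t
    return times

def overlap(times_a, times_b, place, threshold=60.0):
    # Two robots are judged to overlap at `place` when their estimated passing
    # times there differ by at most the (parameterised) 60-second threshold.
    return abs(times_a[place] - times_b[place]) <= threshold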
<Changing the destination order>
 In the third layer, when a robot arrives at a destination but a preceding robot has already arrived at the same destination, so that the robot must wait behind it, and other destinations remain, the third layer compares the waiting time with the travel time to the other destinations. If another destination is not crowded and unloading is possible there immediately, the third layer requests the second layer to change the destination order. The second layer changes the order of the work expansion results, leaves the current destination in the unprocessed state, and instructs the third layer with the next destination. The third layer then instructs the fourth layer to move from the current position toward the new destination.
 FIG. 17 is a diagram explaining the change of destination order. In the current third-layer robot behavior, when the current destination held inside the robot is swapped, the third layer instructs the fourth layer with the name of the new destination. The fourth layer then begins moving from the current position toward the newly indicated destination.
 On receiving an order-change request from the third layer, the second layer can determine whether the next destination is occupied by another robot that has arrived there (and is unloading). The second layer can therefore issue a refusal even when the third layer requests a change. If the destination is not in such a state, the second layer exchanges it with the next destination; in this example, 9-A is changed to 12-A.
 The following table shows the API specification for the session control performed from the second layer to the third layer.

Work instruction data structure from the second layer to the third-layer robot control AI
Figure JPOXMLDOC01-appb-I000010
 Expressed as JSON data, the work instruction data structure from the second layer to the third-layer robot control AI is as follows (placeholders in parentheses stand for the actual values).
{
 "robotId":"robot001", "requestNo":"REQNO00001",
 "fromDestination":"waitGate", "toDestination":"A",
 "destinationOrderNo":1, "destinationOperation":1,
 "quantity":10, "weight":50.0,
 "goodsIdList":[ "BOOK001", "ORANGE001", "BEEF001" ],
 "actionType":1,
 "plannedWorkTime":"00000000001500000",
 "destinationList":[ "A", "G" ],
 "luggageList":[
  {
   "luggageNO":(luggage number), "seqNo":(order),
   "a":{"x":(coordinate),"y":(coordinate),"z":(coordinate)}, "b":{"x":(coordinate),"y":(coordinate),"z":(coordinate)},
   "c":{"x":(coordinate),"y":(coordinate),"z":(coordinate)}, "d":{"x":(coordinate),"y":(coordinate),"z":(coordinate)},
   "e":{"x":(coordinate),"y":(coordinate),"z":(coordinate)}, "f":{"x":(coordinate),"y":(coordinate),"z":(coordinate)},
   "g":{"x":(coordinate),"y":(coordinate),"z":(coordinate)}, "h":{"x":(coordinate),"y":(coordinate),"z":(coordinate)}
  }, …
 ]
}
 The following table explains the structure of the data returned from the third layer to the second layer.
Figure JPOXMLDOC01-appb-I000011
 Expressed in JSON, the data structure returned from the third layer to the second layer is as follows.
{
  "robotId":"robot001",
  "requestNo":"REQNO00001",
  "returnType":1,
  "destinationOrderNo":1,
  "robotPosition":"xxxxx-yyyyy"
}
<Reception data of the robot scheduler>
 The format of the externally issued request data accepted by the second-layer robot scheduler is defined as follows.
 1. It is instruction data for a robot to transport specific luggage from a designated place to a designated place.
 2. The specified luggage is assumed to be boxes; the number of boxes, the total weight, and the total volume are specified.
 3. A robot is assigned per request, but the request may have to be split across several robots because of weight and volume limits.
 4. A delivery destination is specified for each box. There can therefore be several destinations, and the boxes are stacked from the bottom in order of decreasing distance.
 In that case, several robots may be required under requirement 3.
 In order expansion, one robot corresponds to one order, and a work item is generated for each destination.
 The work expansion is as follows (a sketch of this sequence is shown after the list):
 (1) Move from the standby position to the start position (loading point).
 (2) Loading work (initially performed by a human). The third layer reports completion of (1) and (2) to the second layer at the same time.
 (3) Move to the destination. The third layer reports completion of (3) and (4) to the second layer at the same time. If there are several destinations, (3) and (4) are repeated.
 (4) Unload at the destination.
 (5) When the luggage is empty, return to the standby area; once (4) is complete, this is requested of the third layer.
 When the robot arrives at the standby area, a return comes from the third layer and the robot is released.
 For (3) and (4), the order is determined by calculation according to the distance.
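 As an illustrative sketch only (the step names and data shapes are assumptions, not part of the specification), the work expansion above can be generated as a flat list of work items per order:

def expand_order(start, destinations):
    # Expand one order into the work sequence (1)-(5) described above.
    # start: loading point name; destinations: destination names, already
    # ordered by the distance-based calculation for steps (3)/(4).
    work = [
        ("move", "standby", start),   # (1) standby -> loading point
        ("load", start),              # (2) loading (initially by a human)
    ]
    prev = start
    for dest in destinations:         # (3)/(4) repeated per destination
        work.append(("move", prev, dest))
        work.append(("unload", dest))
        prev = dest
    work.append(("move", prev, "standby"))  # (5) return; robot is released
    return work

print(expand_order("1-A", ["10-A", "12-A"]))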
 Request data format
 1. Start position name (example: 1-A) and loading amount (number of boxes, weight, volume)
 2. First destination (example: 10-A) and unloading amount (number of boxes, weight, volume)
 [Subsequent destinations (example: 12-A) and unloading amounts (number of boxes, weight, volume)] …]
 Note: When there are several destinations, their order may be left undefined in the data (the system determines it).
Example:
 Instruction number: XXXXXX, plan flag: x, (place 1: 1-A, number of boxes: value, weight: value, volume: value),
         (place 2: 10-A, number of boxes: value, weight: value, volume: value),
         (place 3: 12-A, number of boxes: value, weight: value, volume: value)
 Note: The instruction number is a unique ID defined by the requesting side (a WMS, etc.).
 The following tables describe the main record items.
"Request"
Figure JPOXMLDOC01-appb-I000012
"Request, order"
Figure JPOXMLDOC01-appb-I000013
"Work"
Figure JPOXMLDOC01-appb-I000014
"Movement"
Figure JPOXMLDOC01-appb-I000015
<Third layer>
 The third layer will now be described. FIG. 18 shows an overall image of the system configuration of the robot control system of the present embodiment. This embodiment assumes the development of a general-purpose AI application platform that controls robots operating inside a building.
 First, the human 1 asks the robot 2 to do a job. The robot 2 receives luggage at a designated place: a consolidated load addressed to several delivery destinations. The load collected at one time is delivered to the multiple destinations in the most efficient order. At each place, as the "work" there, the human 1 receives only the luggage needed at that destination, and when the work at that destination is complete the robot 2 moves to the next place. In this way, when the human 1 gives an instruction, the robot 2 carries things efficiently based on previously learned knowledge. As for the method of instruction, the human 1 converses in Japanese, and the robot understands the content of the instruction and acts on it.
 Realizing this requires a request reception layer 31 that accepts instructions, a control layer (robot control AI 3) that understands the instructions and decides how to act, and an execution layer (simulator 4) that actually travels the roads on the premises, avoiding obstacles or, if a road is closed, stopping and acting on instructions from the control layer. The request reception layer 31 may be provided in the robot 2 or in the robot control AI 3; in this embodiment it is provided in the robot control AI. By training the knowledge of each layer with artificial intelligence, independent behavior can be realized. The control layer 3 then need not be aware of the shapes or constraints of individual robots and can converse with a robot through a general-purpose instruction interface alone. Consequently, demonstration experiments become possible with a virtual environment and virtual robots even when no actual robot exists; a physical robot entity need only be prepared for the purpose of each site and its operation confirmed.
 Here, the robots and the structure of the premises in which they operate are realized in a virtual simulation environment (simulator 4).
 However, just like a real robot, this virtual simulation actually performs reinforcement learning for traveling the roads on the premises. The control layer 3 likewise requires reinforcement learning to acquire various capabilities: understanding the business story, the driving capability needed to give instructions to robots, efficiently deriving the shortest distance, and selecting detour routes when unexpected obstacles occur.
 In this platform framework, the AI capability of the control layer 3 and the AI capability of the execution layer 4 aim to accomplish the objective while conversing with each other.
<Usage scenes of the robot control AI>
 As described above, the robot control AI 3 does not presuppose a specific robot. The target functions are, by premise, limited to functions related to movement, such as carrying objects from place to place or guiding a person to a specific place, but a variety of usage scenes can be assumed. For example:
 ・In a warehouse, when luggage on a specific shelf is picked and moved to another target shelf, the consolidated luggage is automatically and efficiently carried to the designated place.
 ・At an airport or similar, a traveler who wants to go somewhere but does not know the way is guided to the intended place.
 ・At a nursing-care facility, finished meals are carried to the target rooms, and the tableware is collected from the rooms after the meal and brought back.
 ・Besides warehouses, there are also factories, public facilities, and so on.
Various usage scenes are conceivable.
 The shape and functions of the robot 2 differ depending on the purpose, but each function is assumed to be controlled autonomously, and the terminal actions are left to the robot's own functions.
 The robot control AI 3 aims to carry objects efficiently by selecting the optimal route for movement from place to place based on map information; fused with the autonomous functions on the robot 2 side, it can solve practical problems.
 The greatest feature of this function is that the robot control AI 3 assumes that many robots 2 operate simultaneously in a given usage scene. It is therefore expected to play a role like that of a control center, capable of performing control that handles multiple states, such as the positions of the robots 2 and the state of the roads, based on a map of the entire premises.
 While work such as loading and unloading luggage is in progress, that place is, as seen from the robots 2, occupied, and the other robots 2 are locked out of it. When the robot control AI 3 detects this state, it guides a robot elsewhere so that, after performing other work, it can perform the work at that place once the occupancy has been released.
 The artificial intelligence must be trained on these business stories as well. As a result, the applicability of the usage scenes broadens, and the system is expected to prove its value in settings that pursue efficiency and in sites where the overall time is constrained.
<Robot control AI and robot simulator>
= Explanation of the overall picture =
 FIG. 19 shows an outline of the robot control AI 3 and the simulator 4. In the example of FIG. 19, the control layer 3 includes the request reception layer.
 The robot control AI 3 consists of three layers, each with its own role. Across these three layers, artificial intelligence is used to optimize the route to the destination, and the result is indicated to the robot simulator 41.
 The robot simulator 41 executes the traveling operations from route to route and from route to destination while receiving instructions from layer 3.
 The robot simulator 41 periodically (every second) notifies layer 3 of information such as where the robot 2 currently is and its speed. It has been trained to execute the actions of turning right, turning left, moving forward, moving backward, and stopping, and can make these judgments by itself.
 The sensor is a dedicated function for generating virtual states; it generates events at random in time series. For example, upon receiving an action element, it returns position information (x, y) and speed to the simulator in time series.
<Sharing position information other than one's own>
 FIG. 20 illustrates a state in which a robot control AI 3 shares the position information of robots other than the one it controls.
 Each robot control AI 3 needs to share the state of the entire map at each point in time. The robot simulators 41 require the same condition; in their case, this would take the form of the sensor 42 compiling the individual information of the local sensors and sharing it among the simulators 4. Here, however, no global sensor is assumed, so that no changes arise even if states from actual robots come to be accepted in the future.
 As an alternative, the states coming up from the robot simulators 41 are shared by the robot control AIs through a global cache. Each robot control AI 3 runs in a closed world within an independent session, but if it can obtain information other than its own as other sessions' information, the states have effectively been shared; the global cache is used as the means for this.
 As a result, like a control center, each AI knows the position information of every robot 2, can always grasp the elements obstructing it, and controls the traveling route based on this information.
 The shared information is updated at a fixed time interval (one second) so that the overall information can always be recognized together with the AI's own information.
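 As a minimal sketch of this sharing mechanism (the cache structure and field names are assumptions for illustration), each session writes its own robot state into a shared store every second and reads back the others':

import time

global_cache = {}  # robotId -> latest reported state, shared across sessions

def publish_state(robot_id, x, y, speed):
    # Each robot control AI session writes its own state once per second.
    global_cache[robot_id] = {"x": x, "y": y, "speed": speed, "t": time.time()}

def other_states(robot_id):
    # ...and reads the states of all robots other than itself.
    return {rid: s for rid, s in global_cache.items() if rid != robot_id}

publish_state("robot001", 10.0, 4.0, 0.4)
publish_state("robot002", 22.0, 8.0, 0.2)
print(other_states("robot001"))  # only robot002's state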
<Image of the behavior of the robot control AI>
 FIG. 21 illustrates an example of the behavior of a robot controlled by the robot control AI 3.
 A request is received from the second layer. Its main contents are (1) the start position, (2) the multiple destinations to which the luggage is to be moved, and (3) the number and weight of the pieces of luggage for each destination.
 An advance plan, covering the assignment of waiting robots and the calculation of the optimal order of destinations, is created as a schedule.
 However, once movement actually begins, it is not always possible to proceed as initially planned; unexpected events such as obstacles or occupied destinations arise suddenly. What is learned is therefore the ability to respond to these constantly changing events.
= What does the robot control AI learn? =
 The learning needed to instruct the robot in fixed behavior produces the next action from the external instructions and the state sent from the robot; this is the content that is learned. (An action is produced from the state and the input, and the reward is computed from the result, i.e. the previous state.)
 The reward is evaluated by how well the expected result was obtained, and a score is computed by weighting the evaluation items.
 By the optimal action-value function (the Bellman equation), the next action is learned so as to obtain the maximum value.
<Overall system configuration image>
 FIG. 22 illustrates the overall configuration of the robot control system of the present embodiment.
= Robot control AI =
1. When luggage information is received as free-form text, the content of the text is analyzed and converted into standardized request information.
2. The request information is understood and a story is generated dynamically:
 ・determining the order in which the destinations are visited;
 ・the number of pieces of luggage, weight, distance, and time for each destination.
3. The next action is instructed from the current state.
<Control hierarchy>
 FIG. 23 illustrates the control hierarchy of the robot control AI 3 in the robot control system of the present embodiment.
 Each layer has a role and performs independent processing. The robot control AI layer is the core of this processing and performs the actual control. The robot 2's job here is to load luggage at the start place, reach the designated place in the shortest time, and unload the luggage. The loading and unloading work is, at this stage, performed by humans.
 A feature of the processing here is that the work pooling layer 31 gives an independent instruction for each single destination to the input layer of the neural network, so that the work toward multiple destinations is serialized and fed in, in order.
 The robot control AI layer instructs the autonomous robot so that the story corresponding to one processing objective is carried out optimally. This part is trained by reinforcement learning with DDQN.
<What is map information?>
 FIG. 24 illustrates the map information of the present embodiment.
 In the present embodiment, the map information is a numerical representation of the extent of the site, the arrangement of the destinations, and the route structure. It is assumed that multiple robots travel on this map at the same time.
<What does the robot control AI 3 learn?>
= Overview of route optimization learning =
 If the robot 2 is instructed to receive luggage at the starting point and deliver it to several of the destinations A to L, it learns through which order of those destinations the deliveries can be made most efficiently.
= Agent judgment elements =
 Input values: based on the request content, the agent learns to receive the luggage at a predetermined place and deliver it to multiple destinations accurately, within the time limit.
= Elements to learn =
 ・Moving from the waiting area toward the start position, learning to select the optimal route along the way (details are described later).
 ・After loading the luggage, moving toward the first destination, again learning to select the optimal route (details are described later).
 ・Learning to detour around routes that cannot be passed.
  When several other robots are present at the currently occupied (x, y) positions, the robot does not want to take that route to the destination.
  When passage is blocked by obstacles, other robots, or the route state, as with candidates 2 and 3, the optimal feasible route is selected from the multiple route candidates, lowering the candidate number each time while calculating the number of corners and the distance.
  Premise at each point in time: the robot can recognize, from the state, that robots other than itself are scattered somewhere on the roads (routes) of the map.
 ・Learning that it can move from the first destination to the next destination.
 ・When approaching a destination, if it is occupied by another robot, selecting another optimal destination and diverting there. (In this embodiment, loading and unloading of the luggage is assumed to be performed by humans.)
= Positions are determined as position information on the road's central reference line =
 The travel distance reported from the robot 2 is acquired as an actual distance, but it is assumed to deviate from the reference distance within a certain range.
(4) When an obstacle blocking the passage, including another robot, is detected:
 ・The robot first stops for 3 seconds; if the obstacle has disappeared by then, it simply continues.
 ・If the obstacle still exists after 3 seconds, the robot judges that it cannot proceed and calculates the travel time to the destination via a detour route; call this p. When travel is judged impossible, it is assumed by rule that the obstacle will be removed after a 10-second wait; call this q. (The reason for 10 seconds is that the robot cannot know when an obstacle will disappear; after 10 seconds it checks again whether the obstacle is still present.)
 ・If p > q, the robot waits here without detouring; otherwise it detours (see the sketch after this list).
 ・However, since the standard travel time between adjacent routes is 1 m/min, the decision to wait or detour is made by converting from the distance between routes. The distance is calculated from the map information, since the straight-line distance from route to route is known from the x, y coordinates.
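 A minimal sketch of this wait-or-detour decision (the detour time p would in practice come from the route calculation; the example values are illustrative):

WAIT_TIME_Q = 10.0  # by rule, an obstacle is assumed removed after 10 s

def wait_or_detour(detour_time_p):
    # Chosen behavior after an obstacle persists past the 3-second stop.
    # detour_time_p: estimated travel time (s) to the destination via the
    # detour route, converted from the inter-route distance at 1 m/min.
    if detour_time_p > WAIT_TIME_Q:
        return "wait"    # the detour would take longer than waiting it out
    return "detour"

print(wait_or_detour(25.0))  # wait
print(wait_or_detour(6.0))   # detour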
= Distance calculation =
 The distance from point A(x, y) to point B(x, y) is obtained as follows. For a straight-line distance in the horizontal direction, y is fixed and ||A(x) − B(x)|| = m; in the vertical direction, x is fixed and ||A(y) − B(y)|| = m.
 In practice, the number of routes passed through is obtained from the routes nearest to points A and B, and the value is the sum of the inter-route distances plus the nearest distances from both routes:
 ||route 1(y) − A(y)|| + (number of routes) × (inter-route distance, uniform) + ||route 2(y) − B(y)|| = the effective distance (m) between A and B.
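 A small sketch of this effective-distance computation, under the stated assumption of uniformly spaced routes (the spacing value and the example coordinates are illustrative):

ROUTE_SPACING = 2.0  # assumed uniform distance between adjacent routes (m)

def effective_distance(a_y, b_y, route1_y, route2_y, n_routes):
    # ||route1(y) - A(y)|| + n_routes * spacing + ||route2(y) - B(y)||:
    # the nearest-route offsets plus the uniformly spaced routes crossed.
    return abs(route1_y - a_y) + n_routes * ROUTE_SPACING + abs(route2_y - b_y)

# A is 0.5 m from its nearest route, B is 0.3 m from its nearest route,
# and 3 uniformly spaced routes lie between them: 0.5 + 6.0 + 0.3 = 6.8 m
print(effective_distance(1.5, 9.7, 2.0, 10.0, 3))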
(5) For the selection of a route set from the start position to the destination, the algorithm presents four candidates to choose from. At this stage, only the route-to-route relationships are determined; since several intermediate routes must be traversed to reach the destination, the shortest path among them must be learned by the agent itself.
(6) Reward evaluation is performed in two stages (a sketch follows below).
 ・The number of routes from end (start position) to end (destination) is known from the precomputed shortest-path matrix.
  Difference = shortest-path matrix − execution result
  If the difference < 0, reward = difference × 10 (the difference is negative, so multiplying by 10 gives a negative reward).
  If the difference >= 0, reward = difference × 10 (the difference is positive, so multiplying by 10 gives a positive reward).
 ・Note: the difference calculation here uses only the shortest route count and does not take the number of corners into account.
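 A one-function sketch of this reward rule (since both branches multiply the difference by 10, they collapse to a single expression):

def route_reward(shortest_route_count, executed_route_count):
    # Reward = (shortest-path matrix value - execution result) x 10:
    # negative when the agent used more routes than the known shortest path,
    # positive (or zero) when it matched or beat the matrix value.
    return (shortest_route_count - executed_route_count) * 10

print(route_reward(5, 7))  # -20: two routes worse than the shortest path
print(route_reward(5, 5))  #   0: matched the shortest path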
= How to judge, on a route, whether to go straight, up/down, or left/right =
 FIG. 25 illustrates the routes learned by the robot control AI 3 of the present embodiment.
 Connecting route to route is obtained by calculation, but when heading for the next route, whether to go straight on or to turn in the up/down or left/right direction must also be calculated.
= Rule 2 (optimizing the order of the destination array) =
 The sequence numbers in the destination array are changed along the way.
 If a destination is vacant at the stage when the distance to its coordinates comes within a 2 m range, the robot is considered to have occupied it.
  1. Vacant (this robot has come within a 2 m range of the destination)
  2. Occupied (this robot recognizes the occupied state within 3 m of the destination and is stopped)
  3. Occupancy being released (when the other robot moves 2 m or more away from the destination, the work is in the completed state)
  4. Occupancy starting (another robot's distance to the destination has come within the 2 m range)
 With the above as the destination occupancy states, the overall destination occupancy status is made recognizable within the independent session of each robot control AI 3. To that end, the destination occupancy states of all robots 2, one's own and others', are obtained every second and shared among all robots 2. The estimated time until another robot 2 is released from the occupied state is calculated in proportion to the number of pieces of luggage that robot holds. For example, with destination F = 5 pieces, destination H = 7 pieces, and destination J = 3 pieces, having just arrived near destination H:
  Setting of the base occupancy time (default: 5 seconds per piece of luggage), treated as a parameter:
  occupancy start time + (base occupancy time × 7) = scheduled occupancy release time.
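 A sketch of this release-time estimate as just described (the timestamps are illustrative; the per-piece time is the parameter named above):

BASE_OCCUPANCY_TIME = 5.0  # parameter: seconds per piece of luggage (default)

def scheduled_release_time(occupancy_start, luggage_count):
    # Occupancy start time + base occupancy time x luggage count.
    return occupancy_start + BASE_OCCUPANCY_TIME * luggage_count

# Destination H holds 7 pieces; occupancy started at t = 100 s
print(scheduled_release_time(100.0, 7))  # 135.0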
(1) In the destination array, excluding the initial start position, a destination occupied by another robot 2 is first moved as close to the end as possible; this is synonymous with increasing its sequence number.
 Occupancy means the time from when a robot 2 arrives at a destination until it departs for the next destination or the waiting area. The timing for changing the destination sequence numbers is judged immediately before one destination is completed and movement toward the next destination begins. However, the sequence numbers are not swapped if the remaining time until release (scheduled occupancy release time − current time) is less than 20 seconds.
 Otherwise, the sequence numbers of the remaining destinations, excluding the completed ones, are rearranged based on the occupancy status, and processing proceeds to the route determination of (2). If the destination nevertheless turns out to be occupied by another robot upon arrival, the destination order is rearranged here as well.
(2) Route optimization is computed for one section, from a destination (or the start position) to the next destination. Whenever the section changes, the optimal route is recalculated at that stage. → DDQN (optimal action-value function)
 ・On what basis is a destination array judged appropriate? Judging from the example above, between (A->D->J) and (A->J->E), the comparison is (A->D->J) : (A->J->E) = 7 : 5, so the latter is optimal.
 In the number of corners they are 2 : 2, i.e. even. While swapping the destination order, the most appropriate total route count over all destinations is calculated.
= Rule 3 =
 Obstacles are shared so that a robot can recognize, while traveling, at which positions obstacles currently exist. The number and positions of obstacles each robot control AI 3 can recognize at one time are limited to 20 at most. During learning, routes containing obstacles are made unselectable at the selection stage. An obstacle disappears 10 seconds after it occurs.
1. Recognizing that an obstacle exists
 Obstacles can be broadly classified into two types.
 (1) Obstacles that can be passed even though present at a passing point. In this case the other party may be another robot 2: the two may cross and pass each other so as not to collide, or one may be stopped and be overtaken. Since the autonomous robot 2 avoids and passes such obstacles by itself, the robot control AI 3 does not need to know about them.
 (2) Cases where the road is completely impassable and the autonomous robot 2 has no option but to turn back and select another route. In this case, an obstacle comes into being at the moment another autonomous robot 2 recognizes it with its sensor 42 and judges the road impassable. This information is shared across all robot sessions so that all of them recognize it at the same time; that is, the impassability information is recorded as a state with its coordinates (x, y) attached.
 As above, "obstacle" is used here in the sense of (2).
 When selecting the shortest path, if a positional relationship corresponding to the path is detected in the obstacle array, the shortest-path calculation is redone; as a result, a route containing no obstacles is found.
 Therefore, in the learning process, position information must be deliberately inserted at random into the obstacle array so that shortest-path selection is exercised.
2. Recognizing that a previously existing obstacle has disappeared
 An obstacle entry (x, y), held in common in the state s, disappears when a robot passes through the positional relationship (x, y) contained in the obstacle array: if the passing position information and the obstacle position information match within a certain error (a 2 m range), the obstacle is removed.
 Conversely, while an entry remains in the obstacle array, robots deliberately avoid passing that place, and the obstacle would otherwise persist forever. Here, by rule, an obstacle disappears 10 seconds after it occurs (the 10 seconds is a parameter).
<System structure of the robot control AI layer>
 FIG. 26 illustrates the hierarchical structure of the robot control AI 3.
<Creation of the robot simulator adapter>
 FIG. 27 illustrates the robot simulator adapter.
 The stub is a program for testing the robot simulator 41 before interfacing directly with the actual robot control AI 3 application. It reads text data and generates instructions to the autonomous robot 41 so that simulations can be run.
 In the actual configuration, this stub part corresponds to a part of the environment; instead of reading text data, its role is to interface with the autonomous robot in response to requests from the environment.
Note the following:
・The interface format strictly matches the real one.
・The instruction data format is free, but matters such as the issue sequence number and the interface format (JSON) are to be decided.
<Role of the robot simulator>
 The robot simulator 41 carries out instructions from the upper layer by reinforcement-learning the knowledge of the actual robot 2. Its main role is to derive the next action based on the information acquired from the sensor 42, generating actions from states. It operates on instructions from the robot control AI 3; the instruction content is fed to the input layer of the robot simulator.
 1. Instruction data from the robot control AI 3
 2. The current position information list of all robots 41 and obstacles
 3. The state from the sensor 42
 Reinforcement learning by DDQN is performed based on these three states.
 Of these three, consider the feasibility of the second. Originally, an actual physical robot 2 moves while reacting to a dynamically changing outside world based on its own sensors, so learning relies only on its own sensor information at each moment.
 The simulator 41, however, needs to capture the movement events of multiple robots at the same time. On the robot control AI 3 side, the upper layer, each robot 41 reports its state, so control is possible by sharing those states; the simulator 4 must do the same.
 In practice, the simulator 4's learning method is to learn to react to dynamic and sudden events that the sensor 42 side generates periodically. However, since each sensor 42 holds only its own sensor information, the information given to a simulator 4 is closed within itself. Supposing, for example, that five robots 2 are moving at the same time, each of the five travels in its own closed world; yet what unfolds across the whole map is five robots 2 running on instructions from above, with no dependence on one another.
 Seen from a simulator 4, the others each running on their own looks like random, arbitrary movement. By sharing the states from the five sensors 42, the distances to the other robots and to the obstacles, which periodically appear and disappear, can be acquired. Collision judgment also becomes possible and is used in the reward judgment; a collision ends the episode.
 In summary, each robot simulator 41 must be able to constantly acquire the states of all running robot simulators 41, including its own. However, a sensor 42 (local sensor) need only generate the state specific to its own robot; it is the global sensor that compiles all the local sensor information. The hierarchical relationship is shown below.
<Interface with Unity (registered trademark)>
 FIG. 28 illustrates the interface between the robot simulator 41 and Unity (registered trademark) 5, which performs image display.
 The interface between the robot simulator 41 and Unity 5, which performs real-time image display, is defined as follows. Each whole instant of the current image is treated as one unit, and these units are strung together into a moving image, so the position information of all robots 41 and of all obstacles is handed over as a list. The handover interval is 100 ms.
= Preconditions for the position information =
 The position information on the coordinate axes is expressed as (x, y), which is a point. Since the robots 41 and the obstacles are displayed with volume, the question is which position this coordinate point designates. For a robot, the position is the black dot at its front; for an obstacle, it is fixed at the left front.
 FIG. 29 shows an example of a transport robot and obstacles.
 Since the road width is 2 m, the robot 41 can pass obstacles 1 and 2, but obstacle 3 cannot be passed; the robot must stop, then back up and head for a detour route.
 Table 1
Figure JPOXMLDOC01-appb-I000016
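 As an illustrative sketch of this 100 ms handover (the record layout is an assumption; the actual field set is given in Table 1), one snapshot per tick bundles every robot and obstacle position into a single list:

def build_snapshot(robots, obstacles):
    # One display frame for Unity: the positions of all robots and all
    # obstacles at the current instant, handed over every 100 ms.
    return {
        "robots": [{"id": r["id"], "x": r["x"], "y": r["y"]} for r in robots],
        "obstacles": [{"x": o["x"], "y": o["y"]} for o in obstacles],
    }

robots = [{"id": "robot001", "x": 10.0, "y": 4.0}]
obstacles = [{"x": 6.5, "y": 4.0}]
print(build_snapshot(robots, obstacles))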
<Role of the sensor>
 The sensor 42 has the role of generating its own state and sending it to the simulator 4 side. It would normally be built into the physical robot 2, but since actual robots 2 differ in use by purpose and have a variety of functions, a simulator 41 that does not depend on those functions was prepared.
 The simulator 41 has the same knowledge as the robot 2 and is trained in advance by reinforcement learning to a state in which it can act autonomously, while the sensor 42 is dedicated to the role of generating physical states.
 The sensor 42 plays the role of computing, in time series, the current states that can arise as its robot travels, so it requires no reinforcement learning. The simulator 4 decides what action to take in response to those results, and that is where reinforcement learning is required.
 The sensor 42 must generate its own state periodically (every 100 ms). That state lives in the two-dimensional coordinate space of the statically defined map information; the sensor repeatedly performs the calculations that evolve the robot's situation in time series according to the actions received from the simulator 4, and its role is to return the results to the simulator 4 as needed.
 However, this covers only the robot's own world and does not include the events of the other simulators 4 occurring across the whole map; that is, the robot cannot recognize another robot 41 approaching it or an obstacle that has suddenly appeared. Since the robot control AI 3 knows, among the states received from the multiple robot simulators 41, which states must be shared, these states need to be incorporated into each robot's own state periodically (e.g., every 100 ms). None of this is necessary when a physical robot acts in a physical place, but it is necessary in the simulation space, and this part is the key to the simulation.
 As the method, a group sensor that compiles the sensors 42 of all robots is prepared, with control in two tiers: local sensors and a global sensor. Besides the state of each robot, the global sensor also plays the role of scripting the creation and disappearance of obstacles and raising events at random.
<Global sensor and local sensors>
 FIG. 30 illustrates the global sensor and the local sensors.
 When handing sensor information to the robot simulators 41, the global sensor acquires the state of each local sensor, compiles all of these local sensor states into an overall state list, and hands this compiled state list to the robot simulators 41.
 When learning, a robot simulator 41 grasps all the other events occurring at the current moment as state and decides its action.
<Autonomous robot simulator>
 FIG. 31 illustrates the learning inside the robot simulator 41.
= Role of the sensor =
 1. Based on the map information, periodically determine the robot's own current position while traveling and report it to the simulator 4 (the robot's own position, travel distance, obstacles, and distances to walls are calculated and handed over each time).
 2. Periodically inspect and report the distances to the walls and obstacles ahead and behind.
 3. Generate a new state in reaction to the content of the action a from the simulator 4's agent.
 4. Internally, between the sensor 42 and the simulator 4, the state changes periodically (every 100 ms), but the simulator 4 returns to the robot control AI 3 only once per second.
 5. The robot's own position, travel distance, obstacles, and distances to walls are calculated and handed over each time.
 6. The sensor 42 receives the overall state list and recognizes the positions of the other robots 41 and of the obstacles.
<Communication between the robot simulator and the sensor>
 FIG. 32 illustrates the communication performed between the robot simulator 41 and the sensor 42.
<Sensor recognition (synchronization with the robot side)>
 FIG. 33 illustrates the sensors 42.
 ・The 32 sensors 42, each with the two elements (1) distance and (2) angle, make 64 units. Since each of the 64 units (neurons) further holds the four values (1) wall, (2) obstacle, (3) white line (nearest), and (4) white line (second nearest), this gives 64 × 4 = 256 neurons.
= Recognition of the sensor elements =
 FIG. 34 illustrates an example of the arrangement of the sensors 42 and of the state array that holds the information from the sensors 42.
 The distances to walls, obstacles, and white lines are acquired by a total of 32 sensors 42, 16 at the front and 16 at the rear. Each individual sensor 42 independently returns a value as its state, and the overall state must be recognized from these independent values (a sketch of such a state vector follows below).
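 A sketch of how such a 256-value state vector could be flattened (the ordering is an illustrative assumption; the actual arrangement is the one shown in FIG. 34):

N_SENSORS = 32                          # 16 front + 16 rear
CHANNELS = ["wall", "obstacle", "white_line_1", "white_line_2"]

def flatten_state(readings):
    # readings[sensor_id][channel] -> (distance, angle).
    # 32 sensors x 4 channels x (distance, angle) = 256 state values.
    state = []
    for sensor_id in range(N_SENSORS):
        for channel in CHANNELS:
            distance, angle = readings[sensor_id][channel]
            state.extend([distance, angle])
    return state  # len(state) == 256

# Dummy readings: every channel 5.0 m away at the sensor's fixed angle
dummy = {s: {c: (5.0, 11.25 * s) for c in CHANNELS} for s in range(N_SENSORS)}
assert len(flatten_state(dummy)) == 256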
<States generated by the sensor>
 Table 2
Figure JPOXMLDOC01-appb-I000017
<Robot sensors>
 FIG. 35 illustrates the distances measured by the sensors 42 of the robots 2 and 41.
 The dashed arrows are the straight-ahead distances to the wall. The sensor IDs are fixed: when the vehicle turns its steering, its orientation changes, but straight ahead is always a given sensor ID. This is unrelated to the DQN; the virtual robots 2 and 41 use it when calculating their own position.
<Accumulation of robot and sensor states>
= Capturing changes in actions =
1. Record which of the current speed stages (1, 2, 3, 4, 5) applies.
 ・Speed per stage: 0.55 m/1000 ms.
 ・Calculated as the cumulative speed over accelerator presses (speed = current stage count × 0.2 m/1000 ms).
 ・One brake press is calculated as a decrement (speed = (current stage count − 1) × 0.2 m/1000 ms).
2. Record which of the current right-steering stages (0, +1, +2, +3 … +15) applies: 16 stages to the right of center.
 ・11.25 degrees per stage, up to a maximum of 180 degrees.
 ・Right steering accumulates with each press (heading angle = (current stage count + 1) × 11.25 degrees).
3. Record which of the current left-steering stages (0, −1, −2, −3 … −15) applies: 16 stages to the left of center.
 ・11.25 degrees per stage, up to a maximum of 180 degrees.
 ・Left steering accumulates with each press (heading angle = (current stage count − 1) × 11.25 degrees).
4. Backing is valid only when speed = 0 (stationary); the robot moves 0.4 m backward with the current steering stage unchanged. (A sketch of this action accumulation follows below.)
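 A minimal sketch of this action accumulation (the class shape is an assumption; for simplicity the heading is taken as the signed stage count times 11.25 degrees, a slight simplification of the (stage ± 1) formulas in items 2 and 3):

STEP_SPEED = 0.2     # m per 1000 ms added per accelerator stage
STEER_STEP = 11.25   # degrees per steering stage

class VehicleState:
    def __init__(self):
        self.stage = 0   # speed stage (0..5)
        self.steer = 0   # steering stage (-15..+15, right positive)

    def accelerate(self):
        self.stage = min(self.stage + 1, 5)

    def brake(self):
        self.stage = max(self.stage - 1, 0)

    def steer_right(self):
        self.steer = min(self.steer + 1, 15)

    def steer_left(self):
        self.steer = max(self.steer - 1, -15)

    @property
    def speed(self):     # cumulative speed (m / 1000 ms)
        return self.stage * STEP_SPEED

    @property
    def heading(self):   # heading angle in degrees from center
        return self.steer * STEER_STEP

v = VehicleState()
v.accelerate(); v.accelerate(); v.steer_right()
print(v.speed, v.heading)  # 0.4 11.25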
= Recording the state of the sensors =
 1. The angle is held uniquely per sensor ID.
 2. The distance is held in four kinds: to the wall, to an obstacle, to the nearest line (right or left), and to the second-nearest line (right or left).
 The distance calculation is performed with trigonometric functions each time; the calculation method is shown separately.
= Calculating the vehicle's current position (GPS) information =
 FIG. 36 illustrates the road model assumed in the present embodiment.
 The current position is calculated on the two-dimensional coordinate axes of latitude and longitude (x axis, y axis), computing the positional relationships within those coordinate axes.
<Calculation of position information>
= Calculating the latitude and longitude after the vehicle has traveled some distance in a given direction =
 FIG. 37 illustrates the relationship between the travel distance and the position information (latitude and longitude).
(1) One accelerator press advances the vehicle 0.2 m/s; the speed is determined by the current accelerator stage count.
(2) The elapsed time is calculated as the difference between the previous time and the current time.
(3) Once the elapsed time is determined, multiplying it by the speed for the accelerator stage count gives the travel distance.
Example:
 Current stage count: 2
 Current speed: 0.2 × 2 = 0.4 m/1000 ms
 Elapsed time: previous: 11:35:20.234; this time: 11:35:20.350; difference: 116 ms
 Angle: one steering press: 11.25 degrees
 Travel distance: 116 ms ÷ 1000 ms × 0.4 m = 0.0464 m
 C = 0.0464 m
(1) Calculation of the x axis of latitude and longitude
 cos θ (11.25 degrees) = A ÷ C
 A = 0.0464 m × cos θ (11.25 degrees)
 A = 0.0464 × 0.98078528 = 0.04550844 m
(2) Calculation of the y axis of latitude and longitude
 sin θ (11.25 degrees) = B ÷ C
 B = sin θ (11.25 degrees) × C
 B = 0.195090 × 0.0464 m = 0.00905218 m
(3) Position of the vehicle
 Position (X, Y) = (0.04550844 m, 0.00905218 m) + (P, Q)
        = (0.04550844 m + P, 0.00905218 m + Q)
 That is, this is the distance moved in about 0.1 seconds at a speed of 0.4 m/s.
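 A sketch of this position update as a single function, reproducing the worked example above (the previous position (P, Q) is whatever was computed at the prior tick):

import math

STEP_SPEED = 0.2  # m per 1000 ms per accelerator stage

def update_position(p, q, stage, elapsed_ms, heading_deg):
    # Advance the previous position (P, Q) by the distance traveled during
    # elapsed_ms at the speed for the current stage, along heading_deg.
    speed = stage * STEP_SPEED           # m per 1000 ms
    c = elapsed_ms / 1000.0 * speed      # distance traveled (hypotenuse C)
    theta = math.radians(heading_deg)
    return p + c * math.cos(theta), q + c * math.sin(theta)

# Worked example from the text: stage 2, 116 ms elapsed, heading 11.25 degrees
x, y = update_position(0.0, 0.0, 2, 116, 11.25)
print(round(x, 8), round(y, 8))  # approx. 0.04550844 0.00905219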
<ハードウェア>
 図38は、本実施形態に係るロボット制御システムに用いられるコンピュータのハードウェア構成例を示す図である。コンピュータは、例えばワークステーションやパーソナルコンピュータのような汎用コンピュータとしてもよいし、あるいはクラウド・コンピューティングによって論理的に実現されてもよい。なお、図示された構成は一例であり、これ以外の構成を有していてもよい。
<Hardware>
FIG. 38 is a diagram showing a hardware configuration example of a computer used in the robot control system according to the present embodiment. The computer may be a general-purpose computer such as a workstation or a personal computer, or may be logically realized by cloud computing. The illustrated configuration is an example, and may have other configurations.
 コンピュータは、少なくとも、プロセッサ20、メモリ21、ストレージ22、送受信部23、入出力部24等を備える。プロセッサ20は、コンピュータ全体の動作を制御し、各要素間におけるデータの送受信の制御、ならびにアプリケーションの実行および認証処理に必要な情報処理等を行う演算装置である。たとえばプロセッサ20はCPU(Central Processing Unit)であり、ストレージ22に格納されメモリ21に展開されたプログラム等を実行して各情報処理を実施する。メモリ21は、DRAM(Dynamic Random Access Memory)等の揮発性記憶装置で構成される主記憶と、フラッシュメモリやHDD(Hard Disc Drive)等の不揮発性記憶装置で構成される補助記憶とを含む。メモリ21は、プロセッサ20のワークエリア等として使用され、また、コンピュータの起動時に実行されるBIOS(Basic Input / Output System)、及び各種設定情報等を格納する。ストレージ22は、アプリケーション・プログラム等の各種プログラムを格納する。各処理に用いられるデータを格納したデータベースがストレージ22に構築されていてもよい。送受信部23は、コンピュータをネットワークおよびブロックチェーンネットワークに接続する。なお、送受信部23は、Bluetooth(登録商標)及びBLE(Bluetooth Low Energy)の近距離通信インタフェースを備えていてもよい。入出力部24は、キーボード・マウス類等の情報入力機器、及びディスプレイ等の出力機器である。 The computer includes at least a processor 20, a memory 21, a storage 22, a transmission / reception unit 23, an input / output unit 24, and the like. The processor 20 is an arithmetic unit that controls the operation of the entire computer, controls the transmission and reception of data between each element, and performs information processing necessary for application execution and authentication processing. For example, the processor 20 is a CPU (Central Processing Unit), and executes each information processing by executing a program or the like stored in the storage 22 and expanded in the memory 21. The memory 21 includes a main memory composed of a volatile storage device such as a DRAM (Dynamic Random Access Memory) and an auxiliary storage composed of a non-volatile storage device such as a flash memory or an HDD (Hard Disc Drive). The memory 21 is used as a work area or the like of the processor 20, and also stores a BIOS (Basic Input / Output System) executed when the computer is started, various setting information, and the like. The storage 22 stores various programs such as application programs. A database storing data used for each process may be built in the storage 22. The transmission / reception unit 23 connects the computer to the network and the blockchain network. The transmission / reception unit 23 may be provided with a short-range communication interface of Bluetooth (registered trademark) and BLE (Bluetooth Low Energy). The input / output unit 24 is an information input device such as a keyboard and a mouse, and an output device such as a display.
Each of the request reception layer 31, the control layer (robot control AI 3), and the execution layer (simulator 4) of the robot control system according to the present embodiment is realized by the processor 20 of the computer reading a program stored in the storage 22 into the memory 21 and executing it. The learning results (models) produced by the robot control AI, as well as map information, route information, and the like, can be stored in, for example, a storage area provided by the memory 21 or the storage 22.
<Hardware>
FIG. 99 is a diagram showing a hardware configuration example of a computer used in the robot control system according to the present embodiment. The computer may be a general-purpose computer such as a workstation or a personal computer, or may be logically realized by cloud computing. The illustrated configuration is an example, and other configurations are possible.
The computer includes at least a processor 20, a memory 21, a storage 22, a transmission/reception unit 23, and an input/output unit 24; these components are the same as those described above with reference to FIG. 38.
Each of the scheduler of the second layer and the request reception layer 31, the control layer (robot control AI 3), and the execution layer (simulator 4) of the third-layer robot control system according to the present embodiment is realized by the processor 20 of the computer reading a program stored in the storage 22 into the memory 21 and executing it. The various storage units in the second layer, as well as the learning results (models) produced by the robot control AI in the third layer, map information, route information, and the like, can be stored in, for example, a storage area provided by the memory 21 or the storage 22.
Although the present embodiment has been described above, the above embodiment is intended to facilitate understanding of the present invention, not to limit its interpretation. The present invention may be modified and improved without departing from its spirit, and equivalents thereof are also included in the present invention.
2  Robot
3  Robot control AI
31 Request reception layer
32 Work pooling layer
4  Simulator
41 Robot simulator
42 Sensor
43 Adapter

Claims (5)

  1.  A system for controlling a plurality of robots, comprising:
     a work storage unit that stores a plurality of tasks to be performed by the robots;
     an allocation processing unit that assigns each of the tasks to a robot;
     a transmission unit that transmits the assigned task to a control device of the robot; and
     a status acquisition unit that acquires an operating status of the robot,
     wherein the allocation processing unit changes the allocation destination of a task according to the operating status.
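Claim 1 amounts to a feedback loop: assign a task, transmit it to the robot's controller, observe the robot's status, and reassign when the status calls for it. Below is a minimal Python sketch of such a loop, offered only as an illustration; the class names, status values, and the reassign-on-error policy are assumptions, not taken from the specification.

```python
from dataclasses import dataclass, field

@dataclass
class Robot:
    robot_id: str
    status: str = "idle"   # assumed states: "idle", "busy", "error"

@dataclass
class Allocator:
    robots: list[Robot]
    pending: list[str] = field(default_factory=list)         # work storage unit
    assigned: dict[str, str] = field(default_factory=dict)   # task -> robot_id

    def assign(self, task: str) -> None:
        # Allocation processing unit: pick an idle robot, or queue the task.
        robot = next((r for r in self.robots if r.status == "idle"), None)
        if robot is None:
            self.pending.append(task)
            return
        self.assigned[task] = robot.robot_id
        self.send(robot, task)

    def send(self, robot: Robot, task: str) -> None:
        # Transmission unit: forward the task to the robot's control device.
        print(f"send {task} to controller of {robot.robot_id}")

    def on_status(self, robot: Robot, status: str) -> None:
        # Status acquisition unit: on failure, change the allocation destination.
        robot.status = status
        if status == "error":
            for task, rid in list(self.assigned.items()):
                if rid == robot.robot_id:
                    del self.assigned[task]
                    self.assign(task)
```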
  2.  The robot control system according to claim 1, wherein the allocation processing unit allocates one task to one or a plurality of the robots according to a first amount of work required for the task and a second amount of work that each robot can perform.
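One plain reading of claim 2 is that a task whose required work exceeds any single robot's capacity gets divided among several robots. A minimal sketch of one such division, assuming a simple greedy split (the split strategy and the function name are illustrative, not from the specification):

```python
def split_task(required: float, capacities: dict[str, float]) -> dict[str, float]:
    """Divide a task's first work amount across robots whose second work
    amount (remaining capacity) may each be smaller than the task."""
    shares: dict[str, float] = {}
    remaining = required
    for robot_id, capacity in sorted(capacities.items(), key=lambda kv: -kv[1]):
        if remaining <= 0:
            break
        share = min(capacity, remaining)
        if share > 0:
            shares[robot_id] = share
            remaining -= share
    if remaining > 0:
        raise ValueError("insufficient total capacity")
    return shares

# A 10-unit task split over robots able to take 6 and 5 units respectively.
print(split_task(10.0, {"r1": 6.0, "r2": 5.0}))  # {'r1': 6.0, 'r2': 4.0}
```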
  3.  The robot control system according to claim 1, wherein the allocation processing unit assigns the tasks to the robots such that the cumulative amount of work assigned to each of the plurality of robots over a predetermined period is smoothed.
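One way to smooth cumulative load, sketched below, is to hand each new task to the robot with the smallest running total and reset the totals at each period boundary. This is only an assumed policy consistent with the claim wording; the class name and reset behavior are illustrative.

```python
class SmoothingAllocator:
    """Assign each task to the robot with the smallest cumulative work in the
    current period, so the per-robot totals stay level."""

    def __init__(self, robot_ids: list[str]) -> None:
        self.cumulative: dict[str, float] = {r: 0.0 for r in robot_ids}

    def assign(self, amount: float) -> str:
        # Pick the least-loaded robot and add the task's work amount to it.
        robot_id = min(self.cumulative, key=lambda r: self.cumulative[r])
        self.cumulative[robot_id] += amount
        return robot_id

    def new_period(self) -> None:
        # Reset the totals at the boundary of the predetermined period.
        self.cumulative = {r: 0.0 for r in self.cumulative}

alloc = SmoothingAllocator(["r1", "r2", "r3"])
for amount in [5.0, 3.0, 4.0, 2.0, 6.0]:
    print(alloc.assign(amount), alloc.cumulative)
```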
  4.  The robot control system according to claim 1, wherein the status acquisition unit acquires information indicating the operating status from the control device of the robot and from a sensor independent of the robot.
  5.  The robot control system according to claim 1, further comprising a ledger that records, as debits and credits, at least the occupied time during which each robot is occupied by the tasks, with at least each robot and the operation as a whole as account items.
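Claim 5 describes bookkeeping of robot-occupied time in double-entry style. The sketch below records each occupation as a debit against the robot's account and a matching credit against a fleet-wide operation account; the account naming scheme and the posting convention are assumptions for illustration only.

```python
from dataclasses import dataclass, field

@dataclass
class OccupancyLedger:
    """Record robot-occupied time double-entry style: debit the robot's
    account, credit the operation-as-a-whole account."""
    entries: list[tuple[str, str, float]] = field(default_factory=list)

    def post(self, robot_id: str, seconds: float) -> None:
        # One posting per task occupation, balanced across the two accounts.
        self.entries.append(("debit", f"robot:{robot_id}", seconds))
        self.entries.append(("credit", "operation:total", seconds))

    def balance(self, account: str) -> float:
        debits = sum(s for side, a, s in self.entries if a == account and side == "debit")
        credits = sum(s for side, a, s in self.entries if a == account and side == "credit")
        return debits - credits

ledger = OccupancyLedger()
ledger.post("r1", 120.0)   # r1 occupied for 120 s by one task
ledger.post("r2", 45.0)
print(ledger.balance("robot:r1"), ledger.balance("operation:total"))  # 120.0 -165.0
```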
PCT/JP2020/000203 2020-01-07 2020-01-07 Robot control system WO2021140577A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/000203 WO2021140577A1 (en) 2020-01-07 2020-01-07 Robot control system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/000203 WO2021140577A1 (en) 2020-01-07 2020-01-07 Robot control system

Publications (1)

Publication Number Publication Date
WO2021140577A1 true WO2021140577A1 (en) 2021-07-15

Family

ID=76788484

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/000203 WO2021140577A1 (en) 2020-01-07 2020-01-07 Robot control system

Country Status (1)

Country Link
WO (1) WO2021140577A1 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005324278A (en) * 2004-05-13 2005-11-24 Honda Motor Co Ltd Robot control device
JP2009028831A (en) * 2007-07-26 2009-02-12 Panasonic Electric Works Co Ltd Work robot system
JP2009136932A (en) * 2007-12-04 2009-06-25 Honda Motor Co Ltd Robot and task execution system
JP2018047536A (en) * 2016-09-23 2018-03-29 カシオ計算機株式会社 Robot, fault diagnosis system, fault diagnosis method, and program

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114290329A (en) * 2021-12-13 2022-04-08 库卡机器人制造(上海)有限公司 Calibration control method and system for robot, storage medium and robot assembly
CN114290329B (en) * 2021-12-13 2023-09-05 库卡机器人制造(上海)有限公司 Calibration control method and system for robot, storage medium and robot assembly
CN114418461A (en) * 2022-03-28 2022-04-29 浙江凯乐士科技集团股份有限公司 Task allocation method and device for shuttle vehicle and electronic equipment
WO2023233977A1 (en) * 2022-05-31 2023-12-07 オムロン株式会社 Information processing device, information processing method, and information processing program
CN117035587A (en) * 2023-10-09 2023-11-10 山东省智能机器人应用技术研究院 Multiple-robot cooperative work management system based on cargo information
CN117035587B (en) * 2023-10-09 2024-01-16 山东省智能机器人应用技术研究院 Multiple-robot cooperative work management system based on cargo information

Similar Documents

Publication Publication Date Title
WO2021140577A1 (en) Robot control system
JP7282850B2 (en) Methods, systems and apparatus for controlling movement of transport devices
Lee et al. Smart robotic mobile fulfillment system with dynamic conflict-free strategies considering cyber-physical integration
US20210078175A1 (en) Method, server and storage medium for robot routing
Vis Survey of research in the design and control of automated guided vehicle systems
US20190152057A1 (en) Robotic load handler coordination system, cell grid system and method of coordinating a robotic load handler
Srivastava et al. Development of an intelligent agent-based AGV controller for a flexible manufacturing system
JP2022533784A (en) Warehousing task processing method and apparatus, warehousing system and storage medium
US20200247611A1 (en) Object handling coordination system and method of relocating a transporting vessel
CN106228302A (en) A kind of method and apparatus for carrying out task scheduling in target area
Gambardella et al. Agent-based planning and simulation of combined rail/road transport
Hartmann Scheduling reefer mechanics at container terminals
López et al. A simulation and control framework for AGV based transport systems
Basile et al. An auction-based approach to control automated warehouses using smart vehicles
Xu et al. Dynamic spare point application based coordination strategy for multi-AGV systems in a WIP warehouse environment
Branisso et al. A multi-agent system using fuzzy logic to increase AGV fleet performance in warehouses
Mahdavi et al. Optimal trajectory and schedule planning for autonomous guided vehicles in flexible manufacturing system
AU2022331927A1 (en) A hybrid method for controlling a railway system and an apparatus therefor
JP2023024414A (en) Method, device, and facility for arranging delivery of articles and recording medium
Singgih et al. Architectural design of terminal operating system for a container terminal based on a new concept
WO2020003988A1 (en) Information processing device, moving device, information processing system, method, and program
Iida et al. Negotiation Algorithm for Multi-agent Pickup and Delivery Tasks
Sin Development of task distribution algorithm for multi-robot coordination system
Evers Real-time hiring of vehicles for container transport
Yu et al. The Research on the Container Truck Scheduling Based on Fuzzy Control and Ant Colony Algorithm

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20912222

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12/10/2022)

NENP Non-entry into the national phase

Ref country code: JP

122 Ep: pct application non-entry in european phase

Ref document number: 20912222

Country of ref document: EP

Kind code of ref document: A1