CN116166027B - Intelligent robot control method and system for warehouse logistics

Intelligent robot control method and system for warehouse logistics

Info

Publication number
CN116166027B
CN116166027B (application CN202310201498.7A)
Authority
CN
China
Prior art keywords
arm robot
acquiring
arm
time
robot
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310201498.7A
Other languages
Chinese (zh)
Other versions
CN116166027A (en)
Inventor
袁飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhanjiang Chengtong Logistics Co ltd
Original Assignee
Zhanjiang Chengtong Logistics Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhanjiang Chengtong Logistics Co ltd filed Critical Zhanjiang Chengtong Logistics Co ltd
Priority to CN202310201498.7A priority Critical patent/CN116166027B/en
Publication of CN116166027A publication Critical patent/CN116166027A/en
Application granted granted Critical
Publication of CN116166027B publication Critical patent/CN116166027B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G05D1/0214 Control of position or course in two dimensions, specially adapted to land vehicles, with means for defining a desired trajectory in accordance with safety or protection criteria, e.g. avoiding hazardous areas
    • G05D1/0221 Control of position or course in two dimensions, specially adapted to land vehicles, with means for defining a desired trajectory involving a learning process
    • G05D1/0223 Control of position or course in two dimensions, specially adapted to land vehicles, with means for defining a desired trajectory involving speed control of the vehicle
    • G05D1/0257 Control of position or course in two dimensions, specially adapted to land vehicles, using a radar
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Manipulator (AREA)

Abstract

The invention relates to the technical field of intelligent control, and in particular to an intelligent robot control method and system for warehouse logistics. The method comprises the following steps: acquiring a task path for each ARM robot; acquiring the intersection point coordinates, speed, acceleration and priority of the ARM robots at each intersection point to obtain a state vector; constructing a reward function; training a deep learning network; and obtaining the optimal route of each ARM robot whose route is to be planned.

Description

Intelligent robot control method and system for warehouse logistics
Technical Field
The invention relates to the technical field of intelligent control, in particular to an intelligent robot control method and system for warehouse logistics.
Background
With the continuous growth of e-commerce order volumes, the warehouse logistics industry keeps upgrading toward automation. Year by year, the logistics industry adds automated guided vehicles (Automated Guided Vehicle, AGV), unmanned warehouse systems, intelligent automatic sorting robots for logistics, and the like. However, fully unmanned warehouses are difficult to put into practice, and even a large enterprise is unlikely to convert all of the warehouses under its umbrella into unmanned warehouses.
Therefore, an autonomous mobile robot (Automated Mobile Robot) with automatic driving capability, referred to herein as the ARM robot, has emerged. Compared with an AGV, the ARM robot adapts better to complex human-machine collaboration scenarios. Relying on powerful technologies such as AI perception, SLAM, deep learning and control decision-making, it requires no modification of the environment (no laying of magnetic stripes or two-dimensional codes), can be put into service quickly, and can be integrated directly into the operating scene as an intelligent carrier, which makes it more practical.
In the prior art, when an ARM robot operates in a human-machine environment, it must continuously recognise its surroundings so as to plan its route dynamically. However, more than one ARM robot is at work at the same time, so the running route of one ARM robot may have conflicting nodes with other routes, i.e. ARM robots may collide at the same place at the same time. Route planning for ARM robots in the prior art is therefore unreasonable, and package transportation efficiency is low.
Disclosure of Invention
The invention provides an intelligent robot control method and system for warehouse logistics, which are used for solving the problem of low package transportation efficiency caused by unreasonable route planning of ARM robots of the existing logistics warehouse.
The intelligent robot control method for warehouse logistics adopts the following technical scheme:
acquiring a task path of the ARM robot in a storage two-dimensional operation area map;
acquiring a path curve of the ARM robot in a three-dimensional space according to the coordinates and time of each point on a task path; acquiring an intersection point and an intersection point coordinate of a path curve of each ARM robot and path curves of all other ARM robots;
acquiring inertia of each ARM robot when delivering packages, and acquiring emergency degree of each package; acquiring the priority of the ARM robot when passing through each intersection point according to the emergency degree and inertia corresponding to the package transported by the ARM robot;
acquiring a state vector of each ARM robot according to the intersection point coordinates, the speed, the acceleration and the priority of the ARM robot at each intersection point;
acquiring the avoiding time of the ARM robot when avoiding at each intersection point according to the speed of the ARM robot and the preset safety diameter, and constructing a reward function according to the avoiding time of the ARM robot when avoiding at each intersection point and the actual avoiding time;
constructing a deep learning network based on a reward function, taking a state vector corresponding to each ARM robot as input of the deep learning network, marking a route for each ARM robot manually and taking the optimal route corresponding to each ARM robot as output of the deep learning network, training the deep learning network, and obtaining a trained deep learning network model when the reward function converges;
and inputting a state vector corresponding to the ARM robot of each route to be planned in the two-dimensional operation area map into a trained deep learning network model to obtain an optimal route of the ARM robot of each route to be planned, and controlling the ARM robot to operate according to the optimal route.
Preferably, constructing the reward function includes:
acquiring total avoiding time spent by the ARM robot when avoiding other ARM robots from a first conflict point to a current conflict point of a path curve according to the avoiding time of the ARM robot for avoiding at each intersection point;
acquiring the actual avoiding total time of the ARM robot from the first conflict point of the path curve to the current conflict point when the ARM robot avoids other ARM robots according to the actual avoiding time of the ARM robot for avoiding at each intersection point;
obtaining a discount factor of a reward value corresponding to the current conflict point according to the number from the first conflict point to the current conflict point on the path curve and the total number of the conflict points on the path curve;
taking the difference value of the total avoiding time and the actual avoiding time as the rewarding value of the current conflict point;
and obtaining the reward function according to the reward value corresponding to each conflict point on the path curve and the discount factor of the reward value.
Preferably, the obtaining the urgency of each package includes:
acquiring the order receiving time of each package, the departure time of the ARM robot for delivering the package and the current time;
acquiring a first time difference value between the departure time of the ARM robot for delivering the package and the current time;
acquiring a second time difference value between the departure time of the ARM robot for conveying the package and the order receiving time of the package;
and taking the absolute value of the ratio of the first time difference value to the second time difference value as the emergency degree of the corresponding package.
Preferably, acquiring the priority of the ARM robot when passing each intersection point includes:
and taking the sum of the emergency degree corresponding to the package transported by the ARM robot and the inertia of the ARM robot when transporting the package as the priority of the ARM robot when passing through each intersection point.
Preferably, acquiring the state vector of each ARM robot includes:
the intersection points of the path curve of the ARM robot and all other path curves are marked as conflict points;
acquiring the distance between the current coordinate of the ARM robot and each conflict point, and obtaining a distance sequence according to the arrangement from small to large;
taking intersection point coordinates corresponding to intersection points of the path curves of the ARM robot and other path curves, and speed, acceleration and priority of the ARM robot at the intersection points as element values of a state vector;
arranging the element values according to the sequence of the intersection points corresponding to the distance sequences to obtain a state vector of each ARM robot;
the number of the element values in the state vector is the total number of ARM robots, and when the number of the element values in the state vector is smaller than the total number of the ARM robots, the element values corresponding to the insufficient positions are supplemented by 0.
Preferably, the ratio of the preset safety diameter to the speed of the ARM robot is used as the avoiding time when the ARM robot avoids at each intersection point.
Preferably, the initial time of the time axis in the three-dimensional space is the moment when each ARM robot is started every day.
Preferably, acquiring inertia of each ARM robot when delivering packages includes:
acquiring the total weight of the ARM robot and the package transported by the ARM robot, and the maximum load weight of the ARM robot;
acquiring the product of the total weight of the ARM robot and the package transported by the ARM robot and the speed of the ARM robot;
the ratio of the product of the total weight and the speed to the maximum load weight is taken as the inertia of the ARM robot when the package is transported.
An intelligent robotic control system for warehouse logistics, comprising:
the information acquisition module is used for acquiring a task path of the ARM robot in the storage two-dimensional operation area map;
the feature acquisition module is used for acquiring a path curve of the ARM robot in a three-dimensional space according to the coordinates and time of each point on the task path; acquiring an intersection point and an intersection point coordinate of a path curve of each ARM robot and path curves of all other ARM robots; the method comprises the steps of acquiring inertia of each ARM robot when delivering packages, and acquiring emergency degree of each package; acquiring the priority of the ARM robot when passing through each intersection point according to the emergency degree and inertia corresponding to the package transported by the ARM robot; the method comprises the steps of acquiring a state vector of each ARM robot according to the intersection point coordinates, the speed, the acceleration and the priority of the ARM robot at each intersection point;
the network construction module is used for acquiring the avoiding time of the ARM robot when avoiding at each intersection point according to the speed of the ARM robot and the preset safety diameter, and constructing a reward function according to the avoiding time of the ARM robot when avoiding at each intersection point and the actual avoiding time; the method comprises the steps of constructing a deep learning network based on a reward function, taking a state vector corresponding to each ARM robot as input of the deep learning network, marking a route for each ARM robot manually and taking the optimal route corresponding to each ARM robot as output of the deep learning network, training the deep learning network, and obtaining a trained deep learning network model when the reward function converges;
the route planning module is used for inputting the state vector corresponding to the ARM robot of each route to be planned in the two-dimensional operation area map into the trained deep learning network model, obtaining the optimal route of the ARM robot of each route to be planned, and controlling the ARM robot to operate according to the optimal route.
The intelligent robot control method and system for warehouse logistics have the beneficial effects that:
By analysing the intersection conflict problem, it can be decomposed into two aspects: spatial overlap and time conflict. Spatial overlap alone does not necessarily cause an intersection conflict; only when both spatial overlap and time conflict are satisfied does an intersection conflict occur. The nodes with both spatial and time conflict are therefore obtained as the intersection points of the path curve of an ARM robot with the path curves of all other ARM robots, so that conflict positions are located accurately. Secondly, the speed, position, acceleration and priority of each ARM robot are used to comprehensively describe its characteristics when passing a conflict point, and the state vector of each ARM robot on its path curve is obtained from these quantities. The state vector is then used to construct the reward function of a deep learning network, so that when the optimal route of an ARM robot is obtained with the deep learning network, the network has been trained on the state vectors formed from the speed, position, acceleration and priority of the ARM robots. This ensures that the optimal route obtained for each ARM robot is accurate, i.e. the route of each intelligent robot is planned reasonably, thereby improving the package transportation efficiency.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of an embodiment of an intelligent robotic control method for warehouse logistics of the present invention;
fig. 2 is a task path of an ARM robot in an embodiment of a method and system for controlling an intelligent robot for warehouse logistics according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
An embodiment of an intelligent robot control method for warehouse logistics of the present invention is shown in fig. 1, and the method includes:
s1, acquiring a task path of an ARM robot;
Specifically, the ARM robot uses SLAM laser navigation and exchanges data with the main server in real time. It can autonomously recognise surrounding environment information through a laser radar without relying on external auxiliary positioning facilities, and it can flexibly receive dispatched task lists through an order-dispatching system; during these tasks it automatically goes to a charging pile when its battery level reaches a critical point. Because the ARM robot travels in a straight line by default and follows a curve only when turning, a task path consists only of straight segments of variable length and curves with a limited turning radius. The task path of the ARM robot is acquired in the two-dimensional operation area map of the warehouse according to the current position coordinate and the target position coordinate of the ARM robot. As shown in fig. 2, the solid dot in the map is the current position coordinate, and the rounded polyline is the task path.
S2, acquiring intersection point coordinates, speed, acceleration and priority of the ARM robot at each intersection point, and obtaining a state vector;
Specifically, the path curve of the ARM robot in three-dimensional space is obtained according to the coordinates and time of each point on the task path. The three-dimensional coordinate system is constructed as follows: the length direction of the two-dimensional operation area map is taken as the abscissa axis of each point on the task path, the width direction of the two-dimensional operation area map as the ordinate axis, and the direction perpendicular to the two-dimensional operation area map as the time axis. The coordinates of each ARM robot at each moment are then obtained to form the path curve of the ARM robot in three-dimensional space. The origin of the time axis is set to the moment at which the ARM robots start operating each day. The coordinates of each point of the path curve in three-dimensional space are (x, y, t), where (x, y) are the coordinates in the two-dimensional operation area map of the ARM robot's actual operating environment, and (x, y, t) represents the coordinates of the ARM robot at time t.
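For illustration only, the following minimal Python sketch shows how timestamped task-path samples could be assembled into such a space-time path curve; the waypoint format and helper names are assumptions, not part of the patent.

```python
# Minimal sketch (illustrative, not the patent's implementation): build the
# space-time path curve (x, y, t) of an ARM robot from timestamped 2D waypoints.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class PathPoint:
    x: float  # abscissa in the 2D operation-area map (length direction)
    y: float  # ordinate in the 2D operation-area map (width direction)
    t: float  # seconds since the robot's daily start-up (origin of the time axis)

def build_path_curve(waypoints: List[Tuple[float, float, float]]) -> List[PathPoint]:
    """Turn (x, y, t) samples of a task path into a path curve in 3D space,
    ordered along the time axis."""
    return [PathPoint(x, y, t) for (x, y, t) in sorted(waypoints, key=lambda w: w[2])]
```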
Specifically, the intersection points and intersection point coordinates of the path curve of each ARM robot with the path curves of all other ARM robots are obtained. The intersection conflict problem can be decomposed into spatial overlap and time conflict: spatial overlap alone does not necessarily cause an intersection conflict, and neither does a time conflict alone; only when both spatial overlap and time conflict are satisfied does an intersection conflict occur. Nodes with both spatial and time conflict are therefore taken as conflict points. Specifically, when obtaining the conflict points, each ARM robot is taken in turn as the target ARM robot. The intersection points of the path curve of the target ARM robot in the three-dimensional coordinate system with the path curves of all other ARM robots are obtained; an intersection point indicates that the path of the target ARM robot conflicts with the path of another ARM robot, so the intersection point is taken as a conflict point, and the intersection point coordinates of the path curve of the target ARM robot with the path curves of all other ARM robots are then obtained in the three-dimensional coordinate system.
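The conflict-point test (spatial overlap and time conflict occurring together) could be sketched as follows; the sampling-based comparison and the tolerances eps_xy and eps_t are illustrative assumptions, and the PathPoint samples from the previous sketch are reused.

```python
# Minimal sketch, assuming both curves are given as sampled PathPoint lists and
# that the spatial/temporal tolerances below are illustrative values.
import math

def conflict_points(curve_a, curve_b, eps_xy=1.0, eps_t=1.0):
    """Return (x, y, t) points where two path curves overlap in space AND time."""
    conflicts = []
    for p in curve_a:
        for q in curve_b:
            same_place = math.hypot(p.x - q.x, p.y - q.y) <= eps_xy
            same_time = abs(p.t - q.t) <= eps_t
            # only spatial overlap together with time conflict counts as a conflict point
            if same_place and same_time:
                conflicts.append((p.x, p.y, p.t))
    return conflicts
```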
Specifically, the inertia of each ARM robot when delivering packages and the emergency degree of each package are obtained. In this embodiment, the ARM robots all operate at the set maximum speed. Since avoidance is achieved by decelerating, the faster the speed, the greater the loss that deceleration causes to the task; likewise, the greater the load, the greater the inertia I and the harder it is to decelerate.
Therefore, the inertia I of an ARM robot when delivering a package is calculated as follows: acquire the total weight of the ARM robot and the package it transports, and the maximum load weight of the ARM robot; acquire the product of this total weight and the speed of the ARM robot; and take the ratio of this product to the maximum load weight as the inertia of the ARM robot when delivering the package. Dividing by the maximum load weight serves to normalise the inertia.
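A minimal sketch of this inertia calculation; the unit conventions in the argument names are assumptions made for illustration.

```python
def inertia(total_weight_kg: float, speed_m_s: float, max_load_kg: float) -> float:
    """Inertia I of an ARM robot while carrying a package, as described above:
    the product of (robot + package) weight and speed, divided by the maximum
    load weight for normalisation."""
    return (total_weight_kg * speed_m_s) / max_load_kg
```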
The method for calculating the emergency degree of each package comprises the following steps: acquiring the order receiving time of each package, the departure time of the ARM robot for delivering the package and the current time; acquiring a first time difference value between the departure time of the ARM robot for delivering the package and the current time; acquiring a second time difference value between the departure time of the ARM robot for conveying the package and the order receiving time of the package; taking the absolute value of the ratio of the first time difference value to the second time difference value as the emergency degree of the corresponding package, wherein the calculation formula of the emergency degree Z of the package is as follows:
Z = |(T_now - T_send) / (T_send - T_get)|
wherein Z represents the emergency degree of the package transported by the ARM robot; T_now represents the current time at which the ARM robot is delivering the package; T_get represents the order receiving time at which the package to be delivered was received; T_send represents the departure time at which the ARM robot set out to deliver the package; and | | denotes the absolute value.
It should be noted that the current time, the order receiving time and the departure time are all expressed in hours on a 24-hour clock, with minutes converted into fractional hours; for example, 14:30 is converted into 14.5 hours. This embodiment considers that the longer the interval between the departure time and the current time, i.e. the larger the absolute value of their difference, the more urgent the package; conversely, the longer the interval between the order receiving time and the departure time, i.e. the larger the absolute value of their difference, the longer the package has been resident and the less urgent it is considered to be.
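A small sketch of the emergency-degree calculation under these time conventions; the function and argument names are illustrative.

```python
def urgency(t_now_h: float, t_get_h: float, t_send_h: float) -> float:
    """Emergency degree Z of a package: |T_now - T_send| / |T_send - T_get|.
    All times are fractional 24-hour clock hours, e.g. 14:30 -> 14.5.
    Assumes the departure time differs from the order receiving time."""
    return abs((t_now_h - t_send_h) / (t_send_h - t_get_h))
```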
Specifically, the priority of an ARM robot when passing each intersection point is obtained according to the emergency degree and the inertia corresponding to the package it transports. For a package, the greater its emergency degree, the more it should be allowed to pass first; for an ARM robot with large inertia I, braking is more likely to make the carried package slip, so in general the ARM robot with the larger inertia should pass first when an intersection conflict occurs. In this embodiment, the sum of the emergency degree of the package carried by the ARM robot and the inertia of the ARM robot when transporting the package is therefore taken as the priority of the ARM robot when passing each intersection point. Encounters between an ARM robot and human staff require special handling: the priority of human staff is the highest and their motion state varies more, so a larger safety distance should be set for them. Human staff are detected with the infrared sensor on the ARM robot, which confirms whether a person is ahead by receiving the fixed pattern of the infrared spectrum emitted by the human body. When an ARM robot encounters a human worker, it directly performs new path planning, i.e. it gives way to the worker, whose priority is the highest, and re-plans its route.
Specifically, a state vector of each ARM robot is obtained according to the intersection point coordinates, the speed, the acceleration and the priority of the ARM robot at each intersection point, wherein the specific process of obtaining the state vector is as follows: the intersection points of the path curve of the ARM robot and all other path curves are marked as conflict points; acquiring the distance between the current coordinate of the ARM robot and each conflict point, and obtaining a distance sequence according to the arrangement from small to large; taking coordinates corresponding to the intersection points of the path curves of the ARM robot and other path curves, and the speed, acceleration and priority of the ARM robot at the intersection points as element values of a state vector; arranging the element values according to the sequence of the intersection points corresponding to the distance sequences to obtain a state vector of each ARM robot; the number of the element values in the state vector is the total number of ARM robots, and when the number of the element values in the state vector is smaller than the total number of the ARM robots, the element values corresponding to the insufficient positions are supplemented by 0.
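A hedged sketch of assembling one robot's state vector follows. The per-conflict-point element layout (x, y, v, a, priority) and the zero-padding length are assumptions made for illustration, since the text only fixes the ordering by distance and the padding rule.

```python
# Minimal sketch; conflict_infos is assumed to be a list of dicts with keys
# x, y, v, a, priority for each conflict point of this robot's path curve.
import math

def state_vector(robot_xy, conflict_infos, total_robots):
    """Build the state vector of one ARM robot, ordered by distance to each
    conflict point and zero-padded to a fixed length."""
    ordered = sorted(
        conflict_infos,
        key=lambda c: math.hypot(c["x"] - robot_xy[0], c["y"] - robot_xy[1]),
    )
    vec = []
    for c in ordered:
        vec.extend([c["x"], c["y"], c["v"], c["a"], c["priority"]])
    # pad with zeros so every robot's state vector has the same length
    elems_per_point = 5
    target_len = total_robots * elems_per_point
    vec.extend([0.0] * max(0, target_len - len(vec)))
    return vec
```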
S3, constructing a reward function and acquiring a deep learning network;
specifically, a deep learning network is built based on a reward function, a state vector corresponding to each ARM robot is used as input of the deep learning network, a route is marked on each ARM robot manually and used as an optimal route of the ARM robot, the optimal route corresponding to each ARM robot is used as output of the deep learning network, the deep learning network is trained, and a trained deep learning network model is obtained when the reward function converges.
The core idea of the deep learning network adopted in this embodiment is to let the ARM robot keep trying through trial and error to find the strategy that yields the optimal route. Mathematically, this amounts to fitting a deep learning network: the input of the network is the state of the ARM robot at each point on its path curve, i.e. the state vector corresponding to the path curve of each ARM robot, and the network outputs the optimal route of the ARM robot, the route that obtains the highest reward.
The avoidance time of the ARM robot when avoiding at each intersection point is acquired according to the speed of the ARM robot and the preset safety diameter, and the reward function is constructed according to this avoidance time and the actual avoidance time at each intersection point. First, the safety distance of the ARM robot is set: the safety region is the circular area centred at the centre of the ARM robot with the preset radius, and in this embodiment the safety diameter is 2 metres. The avoidance time of the ARM robot when avoiding at each intersection point is then obtained from the speed of the ARM robot and the preset safety diameter, and the specific calculation formula of the avoidance time T is as follows:
T = 2r / v
wherein 2r represents the safety diameter of the ARM robot, and v represents the real-time speed of the ARM robot at the corresponding intersection point.
It should be noted that the avoidance time is directly proportional to the safety diameter and inversely proportional to the speed. When one ARM robot avoids another, the robots cannot in practice pass arbitrarily close to each other, because physical errors could otherwise cause a collision; each ARM robot can be regarded as a spherical object, so in this embodiment the safety diameter is set to 2 metres. Since what matters is the intersection occupation time of the avoided ARM robot, the deceleration process is ignored in this embodiment; calculating the avoidance time at each intersection point in this simplified way also saves computation during the actual training of the deep learning network.
Specifically, the reward function is constructed according to the avoidance time of the ARM robot at each intersection point and the actual avoidance time, since the actual avoidance time is available. The reward function describes the transition from the element values of the state vector at one moment to the element values of the state vector at the next moment, where the element values of the state vector are: the coordinates of the intersection points of the ARM robot's path curve with the other path curves, and the speed, acceleration and priority of the ARM robot at those intersection points. The reward function indicates the optimisation direction for the whole deep learning network. Specifically, constructing the reward function comprises: acquiring, from the avoidance time at each intersection point, the total avoidance time spent by the ARM robot in avoiding other ARM robots from the first conflict point of its path curve to the current conflict point; acquiring, from the actual avoidance time at each intersection point, the actual total avoidance time of the ARM robot in avoiding other ARM robots from the first conflict point of its path curve to the current conflict point; obtaining the discount factor of the reward value corresponding to the current conflict point from the number of conflict points from the first conflict point to the current conflict point on the path curve and the total number of conflict points on the path curve; taking the difference between the total avoidance time and the actual total avoidance time as the reward value of the current conflict point; and obtaining the reward function from the reward value corresponding to each conflict point on the path curve and its discount factor. The method of obtaining the reward function from the reward values and discount factors is a temporal-difference algorithm, which is prior art and is not described further in this embodiment. In order to minimise the time spent passing the conflict points, the reward value is calculated as:
R_t = T_tb - T_t1
wherein R_t represents the reward value when the ARM robot avoids other ARM robots at conflict point t; T_tb represents the total avoidance time spent by the ARM robot in avoiding other ARM robots from the first conflict point of the path curve to conflict point t; and T_t1 represents the actual total avoidance time of the ARM robot in avoiding other ARM robots from the first conflict point of the path curve to conflict point t.
the avoidance time T is used to control the distribution of the prize values corresponding to the prize function R to be around 0, and to make the prize values at different times positive or negative, so that the speed of the avoidance gradient drop is too slow.
Specifically, the discount factor of the reward value corresponding to a conflict point is obtained as follows. The discount factor θ accounts for the reward brought by the future movement trend: the larger the value of θ, the more the decision emphasises long-term rewards; otherwise short-term rewards dominate. The discount factor is calculated from Nm, the number of conflict points from the first conflict point to the current conflict point on the path curve, and N, the total number of conflict points on the path curve of the ARM robot. The larger the number Nm of conflict points already accumulated on the path curve at the current moment, the more uncertainty there is in the future environment: the current path may appear to bring immediate benefit, but the longer-term benefit becomes lower as random factors in the environment interfere, so the corresponding discount factor θ is smaller; conversely, the smaller Nm is, the larger θ is.
Specifically, the reward function is:
Q = R_t + θ_1 R_(t+1) + θ_2 R_(t+2) + ... + θ_l R_(t+l)
wherein Q represents the total reward value; θ_1 represents the discount factor corresponding to the 1st conflict point, θ_2 the discount factor corresponding to the 2nd conflict point, and θ_l the discount factor corresponding to conflict point t+l, the discount factor accounting for the reward brought by the future movement trend (the larger θ is, the more the decision emphasises long-term rewards, otherwise short-term rewards dominate); R_t represents the reward value corresponding to conflict point t, R_(t+1) the reward value corresponding to conflict point t+1, R_(t+2) the reward value corresponding to conflict point t+2, and R_(t+l) the reward value corresponding to conflict point t+l. It should be noted that the reward function is the state value function of the deep learning network, which is a prior-art function; only the discount factor and the reward value are defined in the present invention, so the reward function itself is not described in further detail in this embodiment.
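A minimal sketch tying together the avoidance time T = 2r/v, the reward value R_t and the discounted total reward Q described above; since the text gives no explicit expression for the discount factors, they are passed in as a list here rather than computed.

```python
def avoid_time(safety_diameter_m: float, speed_m_s: float) -> float:
    """Planned avoidance time at an intersection point: T = 2r / v."""
    return safety_diameter_m / speed_m_s

def reward(total_avoid_time: float, actual_avoid_time: float) -> float:
    """Reward value at a conflict point: R_t = T_tb - T_t1."""
    return total_avoid_time - actual_avoid_time

def total_reward(rewards, thetas):
    """Discounted total reward Q = R_t + theta_1*R_{t+1} + ... + theta_l*R_{t+l}.
    thetas must have one fewer entry than rewards."""
    q = rewards[0]
    for theta, r in zip(thetas, rewards[1:]):
        q += theta * r
    return q
```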
The constructed reward function is used as the objective during training of the deep neural network: the state vector corresponding to each ARM robot is taken as the input of the deep learning network, a route is manually labelled for each ARM robot as its optimal route, the optimal route corresponding to each ARM robot is taken as the output of the deep learning network, and the deep learning network is trained. The trained deep learning network model is obtained when the reward function converges, i.e. when the total reward value of the reward function is highest. In order to plan optimal routes for a plurality of ARM robots simultaneously, the core idea of the deep learning network is to let the agent keep trying through trial and error to find the optimal coping strategy, which mathematically amounts to fitting a neural network; the deep learning network itself is prior art and is not described further in this embodiment. To let the ARM robots adapt to most human-machine collaboration scenarios, and because the cost of collecting data from actual operation is too high, the deep learning network is trained in simulation: a map of the working area is loaded into each ARM robot, warehouse logistics order data are collected to obtain the distribution of package weights, departure places and destinations, order tasks are randomly generated according to this distribution, and the ARM robots operate virtually. A number of randomly walking obstacles are placed in the scene to simulate the behaviour of human operators, and the number of collisions during operation is recorded. The deep learning network is trained with the reward function for more than 100 rounds, until the total reward value no longer increases, at which point the deep learning network is considered well trained. The training environment is also varied, for example by changing the map of the working area, so that the trained deep learning network generalises more accurately.
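A hedged sketch of such a training loop follows, assuming PyTorch (the patent names no framework) and a fixed set of candidate routes per robot; the network shape, loss and stopping test are illustrative choices, not the patent's specification.

```python
# Hedged sketch: a small policy network maps a state vector to scores over
# candidate routes, is fit to the manually labelled optimal routes, and training
# stops once the total reward Q of the resulting plans no longer improves.
import torch
import torch.nn as nn

class RoutePolicy(nn.Module):
    def __init__(self, state_dim: int, num_routes: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, num_routes),   # one score per candidate route
        )

    def forward(self, state_vec: torch.Tensor) -> torch.Tensor:
        return self.net(state_vec)

def train(model, states, labels, evaluate_total_reward, epochs=100, lr=1e-3):
    """states: (N, state_dim) float tensor; labels: (N,) long tensor of
    optimal-route indices; evaluate_total_reward: callable returning Q."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    best_q = float("-inf")
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(states), labels)
        loss.backward()
        opt.step()
        q = evaluate_total_reward(model)   # total reward Q of the planned routes
        if q <= best_q:                    # stop once the reward no longer rises
            break
        best_q = q
    return model
```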
Because an ARM robot avoiding another ARM robot with a higher priority does not necessarily improve global efficiency, and a single avoidance may create more subsequent conflict areas, in this embodiment a central coordination network is combined with the trained deep learning network to obtain the deep learning network model: the top layer of the model is the central coordination network, and the bottom layer is the deep learning network corresponding to each ARM robot, so that the subsequent optimal routes of all ARM robots are planned together. The central coordination network collects and distributes the information of the ARM robots in a centralised way, which effectively controls the number and complexity of communications and better alleviates the problem of uncoordinated multi-agent strategies caused by a non-stationary environment. By communicating the current state of each ARM robot and the action generated by its strategy, the problems caused by the Markov decision process in a non-stationary environment can be alleviated, and the routes of all ARM robots become better coordinated.
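A hedged sketch of this two-layer idea, a central coordinator over per-robot policy networks, is given below; class and method names are illustrative assumptions, and the final conflict-resolution step is only indicated by a comment.

```python
# Hedged sketch: the central layer gathers every robot's state vector, queries
# each robot's trained policy (bottom layer), and returns the chosen routes so
# the robots' plans can be kept mutually consistent.
import torch

class CentralCoordinator:
    def __init__(self, policies):
        # policies: dict mapping robot id -> trained RoutePolicy (bottom layer)
        self.policies = policies

    def plan(self, state_vectors):
        """state_vectors: dict robot id -> 1D state tensor.  Returns the chosen
        route index per robot, decided with knowledge of everyone's state."""
        decisions = {}
        with torch.no_grad():
            for rid, state in state_vectors.items():
                scores = self.policies[rid](state.unsqueeze(0))
                decisions[rid] = int(scores.argmax(dim=1).item())
        # a full coordinator would also resolve remaining conflicts here,
        # e.g. letting the higher-priority robot keep its preferred route
        return decisions
```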
S4, acquiring an optimal route of the ARM robot of the route to be planned;
and inputting a state vector corresponding to the ARM robot of each route to be planned in the two-dimensional operation area map into a trained deep learning network model to obtain an optimal route of the ARM robot of each route to be planned, and controlling the ARM robot to operate according to the optimal route.
Specifically, the state vector corresponding to the ARM robot of each route to be planned in the two-dimensional operation area map obtained in the step S3 is input into a trained deep learning network model, the optimal routes of the ARM robots of all routes to be planned are obtained, and the ARM robots are controlled to operate according to the corresponding optimal routes.
The invention discloses an intelligent robot control system for warehouse logistics, which comprises the following components:
the information acquisition module, used for acquiring a task path of the ARM robot in the storage two-dimensional operation area map;
the feature acquisition module, used for acquiring a path curve of the ARM robot in three-dimensional space according to the coordinates and time of each point on the task path; acquiring the intersection points and intersection point coordinates of the path curve of each ARM robot with the path curves of all other ARM robots; acquiring the inertia of each ARM robot when delivering packages and the emergency degree of each package; acquiring the priority of the ARM robot when passing each intersection point according to the emergency degree and inertia corresponding to the package it transports; and acquiring the state vector of each ARM robot according to the intersection point coordinates, speed, acceleration and priority of the ARM robot at each intersection point;
the network construction module, used for acquiring the avoidance time of the ARM robot when avoiding at each intersection point according to the speed of the ARM robot and the preset safety diameter, and constructing a reward function according to this avoidance time and the actual avoidance time; and for constructing a deep learning network based on the reward function, taking the state vector corresponding to each ARM robot as the input of the deep learning network, manually labelling a route for each ARM robot as its optimal route, taking the optimal route corresponding to each ARM robot as the output of the deep learning network, training the deep learning network, and obtaining the trained deep learning network model when the reward function converges;
the route planning module, used for inputting the priority, state vector and path curve corresponding to the ARM robot of each route to be planned in the two-dimensional operation area map into the trained deep learning network model, obtaining the optimal route of the ARM robot of each route to be planned, and controlling the ARM robot to operate according to the optimal route.
According to the intelligent robot control method and system for warehouse logistics, the intersection conflict problem is analysed and decomposed into two aspects, spatial overlap and time conflict. Spatial overlap alone does not necessarily cause an intersection conflict; only when both spatial overlap and time conflict are satisfied does an intersection conflict occur, so nodes with both spatial and time conflict are taken as conflict points and conflict positions are located accurately. The speed, position, acceleration and priority of each ARM robot are used to comprehensively describe its characteristics when passing a conflict point, the state vector of each ARM robot on its path curve is obtained from these quantities, and the state vector is used to construct the reward function of the deep learning network. When the optimal route of an ARM robot is obtained with the deep learning network, the network has thus been trained on the state vectors formed from the speed, position, acceleration and priority of the ARM robots, i.e. the optimal route of each ARM robot is obtained intelligently and is guaranteed to be reasonable, thereby improving the transportation and obstacle-avoidance efficiency of the intelligent robots.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims (7)

1. An intelligent robot control method for warehouse logistics is characterized by comprising the following steps:
acquiring a task path of the ARM robot in a storage two-dimensional operation area map;
acquiring a path curve of the ARM robot in a three-dimensional space according to the coordinates and time of each point on a task path; acquiring an intersection point and an intersection point coordinate of a path curve of each ARM robot and path curves of all other ARM robots;
acquiring inertia of each ARM robot when delivering packages, and acquiring emergency degree of each package; acquiring the priority of the ARM robot when passing through each intersection point according to the emergency degree and inertia corresponding to the package transported by the ARM robot;
acquiring a state vector of each ARM robot according to the intersection point coordinates, the speed, the acceleration and the priority of the ARM robot at each intersection point;
acquiring the avoiding time of the ARM robot when avoiding at each intersection point according to the speed of the ARM robot and the preset safety diameter, and constructing a reward function according to the avoiding time of the ARM robot when avoiding at each intersection point and the actual avoiding time;
constructing a deep learning network based on a reward function, taking a state vector corresponding to each ARM robot as input of the deep learning network, marking a route for each ARM robot manually and taking the optimal route corresponding to each ARM robot as output of the deep learning network, training the deep learning network, and obtaining a trained deep learning network model when the reward function converges;
inputting a state vector corresponding to the ARM robot of each route to be planned in the two-dimensional operation area map into a trained deep learning network model to obtain an optimal route of the ARM robot of each route to be planned, and controlling the ARM robot to operate according to the optimal route;
acquiring inertia of each ARM robot when delivering packages comprises the following steps:
acquiring the total weight of the ARM robot and the package transported by the ARM robot, and the maximum load weight of the ARM robot;
acquiring the product of the total weight of the ARM robot and the package transported by the ARM robot and the speed of the ARM robot;
taking the ratio of the product of the total weight and the speed to the maximum load weight as inertia when the ARM robot conveys the package;
the acquiring the priority of the ARM robot when passing through each intersection point comprises the following steps:
and taking the sum of the emergency degree corresponding to the package transported by the ARM robot and the inertia of the ARM robot when transporting the package as the priority of the ARM robot when passing through each intersection point.
2. The intelligent robotic control method for warehouse logistics of claim 1, wherein constructing the reward function comprises:
acquiring total avoiding time spent by the ARM robot when avoiding other ARM robots from a first conflict point to a current conflict point of a path curve according to the avoiding time of the ARM robot for avoiding at each intersection point;
acquiring the actual avoiding total time of the ARM robot from the first conflict point of the path curve to the current conflict point when the ARM robot avoids other ARM robots according to the actual avoiding time of the ARM robot for avoiding at each intersection point;
obtaining a discount factor of a reward value corresponding to the current conflict point according to the number from the first conflict point to the current conflict point on the path curve and the total number of the conflict points on the path curve;
taking the difference value of the total avoiding time and the actual avoiding time as the rewarding value of the current conflict point;
and obtaining the reward function according to the reward value corresponding to each conflict point on the path curve and the discount factor of the reward value.
3. The intelligent robotic control method for warehouse logistics of claim 1, wherein the acquiring the urgency of each package comprises:
acquiring the order receiving time of each package, the departure time of the ARM robot for delivering the package and the current time;
acquiring a first time difference value between the departure time of the ARM robot for delivering the package and the current time;
acquiring a second time difference value between the departure time of the ARM robot for conveying the package and the order receiving time of the package;
and taking the absolute value of the ratio of the first time difference value to the second time difference value as the emergency degree of the corresponding package.
4. The intelligent robot control method for warehouse logistics of claim 1, wherein the acquiring the state vector of each ARM robot comprises:
the intersection points of the path curve of the ARM robot and all other path curves are marked as conflict points;
acquiring the distance between the current coordinate of the ARM robot and each conflict point, and obtaining a distance sequence according to the arrangement from small to large;
taking intersection point coordinates corresponding to intersection points of the path curves of the ARM robot and other path curves, and speed, acceleration and priority of the ARM robot at the intersection points as element values of a state vector;
arranging the element values according to the sequence of the intersection points corresponding to the distance sequences to obtain a state vector of each ARM robot;
the number of the element values in the state vector is the total number of ARM robots, and when the number of the element values in the state vector is smaller than the total number of the ARM robots, the element values corresponding to the insufficient positions are supplemented by 0.
5. The intelligent robot control method for warehouse logistics according to claim 1, wherein a ratio of a preset safety diameter to a speed of the ARM robot is used as an avoidance time when the ARM robot performs avoidance at each intersection point.
6. The intelligent robot control method for warehouse logistics according to claim 1, wherein the initial time of the time axis in the three-dimensional space is the time when each ARM robot is started every day.
7. An intelligent robotic control system for warehouse logistics, comprising:
the information acquisition module is used for acquiring a task path of the ARM robot in the storage two-dimensional operation area map;
the feature acquisition module is used for acquiring a path curve of the ARM robot in a three-dimensional space according to the coordinates and time of each point on the task path; acquiring an intersection point and an intersection point coordinate of a path curve of each ARM robot and path curves of all other ARM robots; the method comprises the steps of acquiring inertia of each ARM robot when delivering packages, and acquiring emergency degree of each package; acquiring the priority of the ARM robot when passing through each intersection point according to the emergency degree and inertia corresponding to the package transported by the ARM robot; the method comprises the steps of acquiring a state vector of each ARM robot according to the intersection point coordinates, the speed, the acceleration and the priority of the ARM robot at each intersection point;
the network construction module is used for acquiring the avoiding time of the ARM robot when avoiding at each intersection point according to the speed of the ARM robot and the preset safety diameter, and constructing a reward function according to the avoiding time of the ARM robot when avoiding at each intersection point and the actual avoiding time; the method comprises the steps of constructing a deep learning network based on a reward function, taking a state vector corresponding to each ARM robot as input of the deep learning network, marking a route for each ARM robot manually and taking the optimal route corresponding to each ARM robot as output of the deep learning network, training the deep learning network, and obtaining a trained deep learning network model when the reward function converges;
the route planning module is used for inputting a state vector corresponding to the ARM robot of each route to be planned in the two-dimensional operation area map into the trained deep learning network model, obtaining an optimal route of the ARM robot of each route to be planned, and controlling the ARM robot to operate according to the optimal route;
acquiring inertia of each ARM robot when delivering packages comprises the following steps:
acquiring the total weight of the ARM robot and the package transported by the ARM robot, and the maximum load weight of the ARM robot;
acquiring the product of the total weight of the ARM robot and the package transported by the ARM robot and the speed of the ARM robot;
taking the ratio of the product of the total weight and the speed to the maximum load weight as inertia when the ARM robot conveys the package;
the acquiring the priority of the ARM robot when passing through each intersection point comprises the following steps:
and taking the sum of the emergency degree corresponding to the package transported by the ARM robot and the inertia of the ARM robot when transporting the package as the priority of the ARM robot when passing through each intersection point.
CN202310201498.7A 2023-02-28 2023-02-28 Intelligent robot control method and system for warehouse logistics Active CN116166027B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310201498.7A CN116166027B (en) 2023-02-28 2023-02-28 Intelligent robot control method and system for warehouse logistics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310201498.7A CN116166027B (en) 2023-02-28 2023-02-28 Intelligent robot control method and system for warehouse logistics

Publications (2)

Publication Number Publication Date
CN116166027A CN116166027A (en) 2023-05-26
CN116166027B (en) 2023-12-26

Family

ID=86419950

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310201498.7A Active CN116166027B (en) 2023-02-28 2023-02-28 Intelligent robot control method and system for warehouse logistics

Country Status (1)

Country Link
CN (1) CN116166027B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20100070145A (en) * 2008-12-17 2010-06-25 한국철도기술연구원 Adhesiveness estimator and the method considering resistence characteristic of vehicle
CN103648733A (en) * 2011-07-01 2014-03-19 库卡实验仪器有限公司 Method and control means for controlling a robot
CN112000113A (en) * 2020-06-19 2020-11-27 南京理工大学 Multi-AGV storage management system and method based on traditional Chinese medicine pharmacy
CN112835333A (en) * 2020-12-31 2021-05-25 北京工商大学 Multi-AGV obstacle avoidance and path planning method and system based on deep reinforcement learning
CN113093756A (en) * 2021-04-07 2021-07-09 福州大学 Indoor navigation robot based on laser SLAM under raspberry group platform
CN113848871A (en) * 2020-06-28 2021-12-28 北京极智嘉科技股份有限公司 Method, device and system for controlling robot to avoid obstacle
CN114185354A (en) * 2022-02-15 2022-03-15 中国科学院微电子研究所 DQN-based AGV global path planning method and system
CN115454062A (en) * 2022-08-31 2022-12-09 安徽机电职业技术学院 Robot dynamic path planning method and system based on Betz curve

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017054964A1 (en) * 2015-09-29 2017-04-06 Bayerische Motoren Werke Aktiengesellschaft Method for the automatic configuration of an external control system for the open-loop and/or closed-loop control of a robot system


Also Published As

Publication number Publication date
CN116166027A (en) 2023-05-26


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20231206

Address after: 524000, No.1 Xianfeng Road, Dongjian Street, Economic and Technological Development Zone, Zhanjiang City, Guangdong Province

Applicant after: Zhanjiang Chengtong Logistics Co.,Ltd.

Address before: Room 1803, Landmark Innovation Center, No. 2809, Innovation Avenue, High tech Zone, Hefei City, Anhui Province, 230000

Applicant before: Anhui Changyun Technology Service Co.,Ltd.

GR01 Patent grant