CN113062601A - Q learning-based concrete distributing robot trajectory planning method

Info

Publication number: CN113062601A
Application number: CN202110284547.9A
Authority: CN
Inventors: 范思文; 纪金帅; 王昊天; 李万莉
Original assignee: Tongji University
Current assignee: Tongji University
Priority date: 2021-03-17
Filing date: 2021-03-17
Publication date: 2021-07-02
Anticipated expiration: 2041-03-17
Also published as: CN113062601B

Abstract

The invention relates to a novel trajectory planning scheme for an intelligent concrete distributing robot. It is suitable for autonomous pouring control of the concrete distributing robot, avoids complex inverse-kinematics interpolation calculation, and belongs to the field of intelligent manufacturing. The invention designs a general trajectory planning framework covering two processes: a rapid movement process, in which the distributing robot moves from its initial state to the path starting point and returns from the path end point to the initial state; and a continuous concrete pouring process, in which the distributing robot pours from the starting point of the pouring path to its end point. In the rapid movement process, a simple interior point method is used for time-optimal inverse-solution optimization, and a cubic polynomial is used to fit the trajectory. In the continuous pouring process, an error band of a certain area is formed around the path to be poured by the end of the distributing robot; using a Q-learning algorithm, the error band is divided into regions, reward values are assigned to the regions according to the pouring targets and constraints, and Q values are trained on the resulting grid. The training finally yields the action sequence of each robot joint, so the robot actions are obtained directly and the complex trajectory planning process based on inverse-kinematics optimization is avoided.

Description

Q learning-based concrete distributing robot trajectory planning method

Technical Field

The invention relates to a novel trajectory planning method for an intelligent concrete distributing robot. It is suitable for autonomous pouring control of the concrete distributing robot, avoids complex interpolation calculation, and belongs to the field of intelligent construction.

Background

The concrete distributing robot is a construction industrial robot that conveys concrete to the pouring location on a construction site, and it plays a very important role in modern urban construction. As engineering construction demands ever higher efficiency from distributing robots, research on their intelligent control has gradually developed. Intelligent control is inseparable from path planning and trajectory planning of the mechanical arm. For an industrial robot, path planning mainly concerns the route traced by the end of the arm, while trajectory planning concerns the displacement, velocity and acceleration curves of the boom sections as they move together. The specific task of the distributing robot is to displace the pumping port at the end of the boom, and this displacement can be analyzed using the mechanism theory of industrial robots. Pouring surfaces are generally divided into non-rotary horizontal planes, vertical cylindrical surfaces, and non-rotary spatial planes or curved surfaces. The target pouring route differs with the pouring surface, and to simplify route planning the route is generally composed of straight lines.

In recent years, intelligent real-time dynamic autonomous planning has played an extremely important role in the operation of industrial robots and similar machines. For concrete distribution, combining intelligent real-time dynamic path planning with the automatic concrete distributing robot can further improve the efficiency and quality of pouring tasks. At present, large distribution boom systems work under relatively harsh construction conditions. Conventional trajectory planning methods are computationally expensive and their optimal performance index is hard to determine, so they cannot cope with the real-time variations of a construction site. Faced with construction surfaces of different shapes, the pouring path of a working section is often planned and judged from the limited viewpoint of on-site workers, so pouring quality depends excessively on the workers' experience and skill, the degree of automation is low, and the trajectory requirements of the pouring end of a large distribution boom system cannot be met. Reinforcement learning is a branch of machine learning whose basic principle is to imitate the learning process of an organism: an agent gains experience through continuous interaction with the external environment and gradually develops its capacity for autonomous planning. Technically, reinforcement learning resembles the way workers learn to operate the machine, and it is an effective way to meet the autonomous path planning requirement of the working end of the distributing robot.

Disclosure of Invention

Aiming at the autonomous planning and control precision requirements of an intelligent redundant distributing robot, the invention designs a trajectory planner based on Q learning. The path along which the distributing robot must plan its trajectory is a known condition. First, the joint trajectory that takes the distributing robot from its initial position to the path starting point is planned by a time-optimal interior point method. Then, when planning the trajectory along the known straight path, an error band whose width is twice the pouring precision is established, centered on the known pouring path. Based on the Q-learning algorithm of reinforcement learning, the error band is divided into a grid, and each grid cell is given a corresponding scalar reward value according to the working conditions of the distributing robot and general performance indexes. Training is then carried out with maximization of the reward as the objective, yielding the optimal observation-action-reward sequence in the error band environment, and the planned joint trajectory of the distributing robot is formed from the action sequence.

The intelligent concrete distributing robot is redundant, and during distribution it pours continuously along a straight or curved path. The trajectory planning scheme for the distributing robot is therefore designed in Cartesian space and must address both continuous-path pouring and the redundant degree of freedom. In the existing planning method, a forward kinematic model of the robot is first established by the DH coordinate transformation method. Given a known pouring path in Cartesian space, the path is discretized at a chosen interval into a number of segments, and an inverse kinematics solution is computed at the start and end point of each segment; that is, the joint angle changes of the mechanical arm are derived backward from the pouring position of the end. Because of the redundant degree of freedom, the robot has more movable joints than degrees of freedom in the motion space, so the inverse kinematics solution is not unique. An optimization algorithm is generally used to select the best inverse solution, with time or energy as the optimization objective. Once the inverse kinematics is solved, a series of joint angle values is obtained and fitted, most commonly by cubic polynomial, quintic polynomial or B-spline curve fitting, to give the motion trajectory of each joint of the mechanical arm and complete the trajectory planning task.
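
As a minimal illustration of the forward kinematic model and of the multi-solution situation described above, the following Python sketch treats the arm as a planar chain of three rotary joints. The unit link lengths, the function name and the three joint configurations are hypothetical values chosen for the example, not parameters of the invention.

```python
import math

def forward_kinematics(joint_angles, link_lengths):
    """Tip position (x, y) of a planar serial arm with rotary joints.

    Each joint angle is measured relative to the previous link, so the
    absolute direction of a link is the running sum of the angles.
    """
    x = y = direction = 0.0
    for theta, length in zip(joint_angles, link_lengths):
        direction += theta
        x += length * math.cos(direction)
        y += length * math.sin(direction)
    return x, y

LINKS = (1.0, 1.0, 1.0)  # hypothetical link lengths

# Three different joint vectors (assumed for illustration) that place the
# tip at the same point, showing why the inverse solution is not unique.
CONFIGS = [
    (math.pi / 3, -math.pi / 3, -math.pi / 3),
    (-math.pi / 3, math.pi / 3, math.pi / 3),
    (0.0, math.pi / 3, -2 * math.pi / 3),
]
TIPS = [forward_kinematics(q, LINKS) for q in CONFIGS]
```

All three entries of `TIPS` coincide at (2, 0) up to rounding, which is the situation in which an optimization criterion has to pick one inverse solution.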

The conventional method has many problems for a mechanical arm that is redundant and performs continuous linear motion. First, if forward kinematics is applied to the planned trajectory after planning is complete, that is, if the path in Cartesian space is derived from the joint-space trajectories, the result shows that although the discrete path points are reached, the section between them deviates substantially from the intended path. For a very long mechanical arm such as a distributing robot, the trajectory error of the conventional method can reach about 1 m. Conventional concrete distributing robots therefore rely on manual operation of the pouring end to keep the actual path within range, so their autonomy is poor and the goal of intelligent manufacturing is hard to achieve. To address this, existing research focuses on fitting higher-order polynomials or on piecewise fitting between path points, for example 3-5-3 polynomial interpolation, which demands considerable mathematical skill, is computationally difficult, and is hard to implement in the real-time planning of an industrial robot.

Second, because of the redundancy, the inverse kinematics must be solved with an optimization algorithm. Such algorithms mostly target time or energy optimality, and weights and constraints can be coupled into the objective function to meet multi-objective requirements. The resulting algorithm is complex, computationally heavy, and strongly dependent on correctly set performance indexes. The working conditions of the distributing robot are complicated, and the performance indexes and constraints change with the construction environment at any time, so the algorithm is difficult to design, its internal structure is difficult to modify, and no fixed algorithm framework that generalizes can be found.
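
The deviation between path points described above can be reproduced with a small Python sketch. It assumes a planar three-joint arm with hypothetical link lengths and, as a simplification of the polynomial fitting used in practice, interpolates linearly in joint space between two configurations. It then measures how far the arm tip strays from the straight line joining the two end positions. The deliberately extreme quarter-turn case is an assumption for illustration only.

```python
import math

LINKS = (10.0, 8.0, 6.0)  # hypothetical link lengths in metres

def tip_position(joint_angles, link_lengths=LINKS):
    """Forward kinematics of a planar arm with relative joint angles."""
    x = y = direction = 0.0
    for theta, length in zip(joint_angles, link_lengths):
        direction += theta
        x += length * math.cos(direction)
        y += length * math.sin(direction)
    return x, y

def distance_to_line(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    cross = (b[0] - a[0]) * (a[1] - p[1]) - (b[1] - a[1]) * (a[0] - p[0])
    return abs(cross) / math.hypot(b[0] - a[0], b[1] - a[1])

def max_midsegment_deviation(q_start, q_end, samples=101):
    """Largest tip deviation from the straight chord while the joints are
    interpolated linearly between two path-point configurations."""
    a, b = tip_position(q_start), tip_position(q_end)
    worst = 0.0
    for i in range(samples):
        s = i / (samples - 1)
        q = [qs + s * (qe - qs) for qs, qe in zip(q_start, q_end)]
        worst = max(worst, distance_to_line(tip_position(q), a, b))
    return worst

# Fully stretched arm swept through a quarter turn (assumed test case).
max_deviation = max_midsegment_deviation((0.0, 0.0, 0.0), (math.pi / 2, 0.0, 0.0))
```

Both end points lie exactly on the intended line, yet the tip follows an arc between them, so `max_deviation` is far from zero. Real segments are much shorter than this exaggerated case, but the same effect grows with arm length.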

The invention aims to overcome the defects of the prior art and discloses a concrete distributing robot trajectory planning method based on Q learning.

The technical scheme adopted by the invention for solving the technical problem is as follows:

Aiming at the characteristics of a redundant concrete distributing robot, a general trajectory planning framework is designed, which divides the trajectory planning of the distributing robot into two parts. One part is the rapid movement process, in which the distributing robot moves from its initial state to the path starting point and from the path end point back to the initial state. The other part is the continuous concrete pouring process from the starting point of the pouring path to its end point. In the rapid movement process, the intermediate path need not be considered, so a simple interior point method is used for time-optimal inverse-solution optimization and a cubic polynomial is used to fit the trajectory. In the continuous pouring process, an error band of a certain area is formed around the path to be poured by the end of the distributing robot, with the band width set according to the given pouring precision. The error band is divided into regions using a Q-learning algorithm, reward values are assigned to the regions according to the pouring targets and constraints, and the resulting grid is trained. This finally yields the action sequence of each robot joint, so the robot actions are obtained directly and a complex trajectory planning process is avoided.

The invention has the following beneficial effects. The trajectory planner is designed for the working characteristics of the distributing robot and has good universality and generalization; its internal parameters are mainly the configured scalar reward values, which are easy to change and test. Trajectory planning uses an error band designed around the pouring path, and the error size can be set freely according to the required working precision, which keeps the actual path deviation within that requirement and solves the problem of excessive error between path points in conventional interpolation planning. Training with the Q-learning algorithm yields the action values of every robot joint directly, avoiding complex multi-objective inverse-solution optimization and data fitting. Planning by online learning improves the autonomy of the concrete distributing machine and makes it easier to achieve unmanned engineering machinery in intelligent construction.

Drawings

FIG. 1 is an overall structure diagram of an intelligent redundant concrete distribution robot (a three-joint rotary concrete distribution robot, previously filed patent 2020111625562);

FIG. 2 is a framework diagram of the Q-learning-based concrete distributing robot trajectory planner;

FIG. 3 is a flow chart of trajectory planning for a continuous straight pouring process;

FIG. 4 is an exemplary diagram of a straight-line trajectory planning using a trajectory planner.

Detailed Description

Fig. 1 shows the overall structure of a designed intelligent redundant concrete distribution robot, which mainly comprises five modules, including a column assembly (1), a pipeline assembly (2), a pipe clamp (3), a cantilever support (4) and a turntable assembly (5). The design adopts three rotary joints, aims at two-degree-of-freedom plane pouring, and has one redundant degree of freedom.

Fig. 2 shows the overall structure of the trajectory planner and summarizes the planning methods of the rapid movement part and the continuous concrete pouring part.

Fig. 3 details the idea layer and technical layer of trajectory planning for the continuous concrete pouring process, which uses the Q-learning method; it is a block diagram summarizing how the method is applied and the steps involved.

Fig. 2 shows the overall design framework for Q-learning-based trajectory planning of the concrete distributing robot. The framework is organized in layers: a structural layer, an idea layer, a technical layer and an output layer. In the structural layer, the Q-learning-based trajectory planner is divided into two parts: one is the rapid movement process of the distributing robot from the initial state to the path starting point and from the path end point back to the initial state; the other is the continuous concrete pouring process from the starting point of the pouring path to its end point.

Based on this structural division, the following ideas are adopted. In the rapid movement process, the intermediate path generally need not be considered, so the conventional inverse-kinematics trajectory planning idea is used, with time-optimal inverse-solution optimization. Technically, a simple interior point method is used, followed by cubic polynomial fitting, and the trajectory curve of each joint angle is generated. In the continuous pouring process, the idea is to form an error band of a certain area around the path to be poured by the end of the distributing robot; the band width is set according to the given pouring precision and is usually twice the precision. Technically, a Q-learning algorithm is used: the error band is divided into regions, reward values are assigned to the regions according to the pouring targets and constraints, the resulting grid is trained, and the action sequence of each robot joint is generated. The robot actions are thus obtained directly, and a complex trajectory planning process is avoided.
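
The cubic polynomial fitting used in the rapid movement process can be sketched as follows. This Python example assumes that a joint starts and ends at rest (zero velocity at both ends), which the text does not state; the function names, joint angles and duration are likewise hypothetical.

```python
def cubic_coefficients(q_start, q_end, duration):
    """Coefficients (a0, a1, a2, a3) of q(t) = a0 + a1 t + a2 t^2 + a3 t^3
    for a joint that moves from q_start to q_end in `duration` seconds,
    assuming zero velocity at both ends."""
    delta = q_end - q_start
    return (
        q_start,
        0.0,
        3.0 * delta / duration ** 2,
        -2.0 * delta / duration ** 3,
    )

def evaluate(coefficients, t):
    """Return (position, velocity) of the cubic trajectory at time t."""
    a0, a1, a2, a3 = coefficients
    position = a0 + a1 * t + a2 * t ** 2 + a3 * t ** 3
    velocity = a1 + 2.0 * a2 * t + 3.0 * a3 * t ** 2
    return position, velocity

# Hypothetical joint move: 0.2 rad to 1.0 rad in 4 s.
coeffs = cubic_coefficients(0.2, 1.0, 4.0)
start_state = evaluate(coeffs, 0.0)
end_state = evaluate(coeffs, 4.0)
```

One such polynomial per joint gives the joint angle curves of the rapid movement. The move duration would come from the time-optimal interior point optimization, which is not reproduced here.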

Trajectory planning therefore comprises two processes: the first is the rapid movement process and the second is the continuous concrete pouring process. Rapid movement is needed because, when the distributing robot starts work, its initial position is not necessarily at the starting point of the set pouring trajectory; the rapid movement process brings the robot from its initial position to that starting point.

In fig. 3, a path planning process of the material distributing robot in the continuous straight pouring process of the concrete is shown.

First, the path is planned according to the construction environment, the pouring trajectory generally being set as a straight line. The construction precision requirement is then determined, and a trajectory planning error band is established with the path as its centerline and twice the precision as its width. Path optimization here does not need to find the shortest path; it needs a suboptimal solution that balances efficiency against route quality. The error band sacrifices part of the precision, but for a given band width efficiency and route quality are balanced and the error is controllable, so the trajectory planning requirement can be met.
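
The error band of this first step can be expressed as a simple membership test. The Python sketch below assumes a straight path in the plane; the function names, coordinates and the 0.05 m precision are hypothetical. A point is inside the band when its lateral offset from the centerline is at most the precision (so the full width is twice the precision) and its projection falls between the two path ends.

```python
import math

def lateral_and_along(point, start, end):
    """Signed lateral offset from the path line and distance travelled
    along it, plus the path length, for a straight path start -> end."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    length = math.hypot(dx, dy)
    px, py = point[0] - start[0], point[1] - start[1]
    along = (px * dx + py * dy) / length
    lateral = (dx * py - dy * px) / length
    return lateral, along, length

def in_error_band(point, start, end, precision):
    """True if the point lies inside the band of width 2 * precision."""
    lateral, along, length = lateral_and_along(point, start, end)
    return abs(lateral) <= precision and 0.0 <= along <= length

# Hypothetical 10 m straight pour with 0.05 m precision.
START, END, PRECISION = (0.0, 0.0), (10.0, 0.0), 0.05
inside = in_error_band((5.0, 0.04), START, END, PRECISION)
outside = in_error_band((5.0, 0.06), START, END, PRECISION)
```

The lateral offset returned here is also the trajectory error that the later reward design penalizes, and the along-path distance is the completed distance it rewards.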

Second, once the error band is determined, it is divided into a grid and a dynamic reward model R is established over the grid, with the reward value of each cell stored in matrix form. Movement of the pouring end toward the target end point of the path is given a positive reward and movement away from it a negative reward, and the reward of regions outside the error band is set to negative infinity so that the distributing robot always stays inside the error band during planning.
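
A minimal sketch of the grid layout of R, under assumed dimensions, is given below in Python. Rows run across the path and columns along it; a margin of rows outside the band is included only to show the negative-infinity marking. The function name, the grid size and the placeholder value 0.0 for cells inside the band are assumptions, and the direction-dependent positive and negative rewards are left to the quantization step.

```python
import math

def build_reward_grid(n_cols, band_half_rows, margin_rows=1):
    """Build a reward matrix as a list of rows.

    band_half_rows: number of cells covering one precision on each side
                    of the centerline (assumed discretization).
    margin_rows:    extra rows beyond the band on each side, which are
                    filled with -inf so the planner never enters them.
    Returns the grid and the index of the centerline row.
    """
    centre = band_half_rows + margin_rows
    n_rows = 2 * centre + 1
    grid = []
    for row in range(n_rows):
        inside_band = abs(row - centre) <= band_half_rows
        grid.append([0.0 if inside_band else -math.inf] * n_cols)
    return grid, centre

# Hypothetical grid: 8 cells along the path, 2 cells per precision.
reward_grid, centre_row = build_reward_grid(n_cols=8, band_half_rows=2)
```

Because any action sequence that touches a negative-infinity cell has unbounded negative return, a greedy policy over the trained values cannot leave the band.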

Third, the reward values in the R matrix are quantized so that the planner can learn from them.

1. One unit is subtracted from the reward each time the robot acts, to meet the robot's energy-optimal requirement, that is, to satisfy the pouring trajectory with the fewest actions.

2. A scalar reward for deviation from the path centerline and a scalar reward for distance completed are added to the R matrix. That is, the trajectory error of the distributing robot in each state is stored in the R matrix and used directly as an equal-valued negative scalar reward, while the distance between the robot's pouring position in each state and the path starting point is used as an equal-valued positive scalar reward, so that the distributing robot moves toward the target point within the error range.

3. The grid cell at the path end point is given the maximum reward value, so that the planner always moves toward the target end point during planning.
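
The three quantization rules can be combined into one reward function, sketched here in Python. The function name, the argument names and the value 100 for the end-point reward are assumptions made for illustration; the text only states that the end-point cell receives the maximum reward. Deviation and completed distance are used directly as reward magnitudes, as rule 2 describes.

```python
import math

def step_reward(deviation, distance_from_start, at_end_point,
                inside_band, end_point_reward=100.0):
    """Scalar reward for the state reached after one robot action.

    deviation:           lateral trajectory error in the new state
    distance_from_start: distance of the pouring position from the
                         path starting point
    end_point_reward:    assumed maximum reward for the end-point cell
    """
    if not inside_band:
        return -math.inf          # outside the error band
    if at_end_point:
        return end_point_reward   # rule 3: maximum reward at the end point
    reward = -1.0                 # rule 1: one unit per action
    reward -= abs(deviation)      # rule 2: error as negative reward
    reward += distance_from_start # rule 2: progress as positive reward
    return reward

example = step_reward(deviation=0.02, distance_from_start=0.5,
                      at_end_point=False, inside_band=True)
```

Keeping the terms as plain scalars matches the stated benefit that the planner's internal parameters are mainly reward values that are easy to change and test.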

Fourth, an initial R matrix is established according to the above requirements, and an action matrix a of the distributing robot is established. Each joint has 3 actions (forward, stationary and backward), so the action matrix is 3 × 3 and represents the 3^3 = 27 combined actions available to the distributing robot. To prevent a dead zone in which the robot stays stationary indefinitely, the (0,0,0) action, in which all joints are stationary, is removed, so the action matrix a contains 26 feasible actions.
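
The count of 26 actions can be checked by enumerating the joint moves. In this Python sketch the encoding of forward, stationary and backward as +1, 0 and -1 is an assumption, as are the variable names.

```python
from itertools import product

# Assumed encoding of the three per-joint moves.
BACKWARD, STATIONARY, FORWARD = -1, 0, 1
JOINT_MOVES = (BACKWARD, STATIONARY, FORWARD)
N_JOINTS = 3

# Every combination of one move per joint, minus the all-stationary case.
ACTIONS = [
    action
    for action in product(JOINT_MOVES, repeat=N_JOINTS)
    if action != (STATIONARY,) * N_JOINTS
]
```

Each tuple in `ACTIONS` names one simultaneous move of the three joints, for example `(1, 0, -1)` for joint 1 forward, joint 2 stationary and joint 3 backward.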

Fifth, a dynamic Q matrix is established from the specified reward matrix R and action matrix a. The Q matrix is equivalent to an action-value function: given the robot's current state and the next action to be taken, it returns the total reward for reaching the target point. The Q values of all grid cells are initialized to 0, and the Q value of the cell marked as the end point is then set to 100 units.

Following these steps, the dynamic Q matrix is trained iteratively with the Q-learning update formula $Q(s,a) = E\left[R + \gamma \max_{a'} Q(s',a') \mid s,a\right]$, where $\max_{a'} Q(s',a')$ is the maximum Q value obtainable in the next state within one training step and $\gamma$ is the learning rate, set to 0.9; in this formula it acts as the discount applied to future reward. Updating stops when the set number of iterations reaches its upper limit or the Q matrix converges, and the resulting Q matrix is taken to be the optimal matrix. This gives a series of optimal mappings between action values and reward values, and hence the sequence of action values of the 3 joints of the distributing robot for a trajectory planned inside the set error band. The trajectory planning of straight-line pouring is thus completed without complex inverse-kinematics optimization, and the optimal path points of the target pouring trajectory and the state-action sequence of the distributing robot are obtained.
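
The training loop can be illustrated on a deliberately reduced problem. The Python sketch below is not the planner of the application: it replaces the robot's joint states with cells of a small error-band grid, replaces the 26 joint actions with 8 moves between neighboring cells, and sweeps every state-action pair instead of sampling episodes so that the result is deterministic. The grid size, the reward of 100 on entering the end-point cell (treated as terminal) and all names are assumptions; only the update rule and the value 0.9 come from the text.

```python
from itertools import product

N_ROWS, N_COLS = 3, 6            # hypothetical grid: across band x along path
CENTRE = 1                       # centerline row
START, GOAL = (CENTRE, 0), (CENTRE, N_COLS - 1)
GAMMA = 0.9                      # value given in the text
GOAL_REWARD = 100.0              # assumed reward for reaching the end point
ACTIONS = [a for a in product((-1, 0, 1), repeat=2) if a != (0, 0)]

def step(state, action):
    """Return (next_state, reward), or None if the move leaves the band."""
    row, col = state[0] + action[0], state[1] + action[1]
    if not (0 <= row < N_ROWS and 0 <= col < N_COLS):
        return None
    if (row, col) == GOAL:
        return (row, col), GOAL_REWARD
    # One unit per action plus the deviation from the centerline.
    return (row, col), -1.0 - abs(row - CENTRE)

def train(max_sweeps=200, tolerance=1e-9):
    """Apply Q(s,a) <- R + gamma * max Q(s',a') until the table converges."""
    states = [s for s in product(range(N_ROWS), range(N_COLS)) if s != GOAL]
    q = {(s, a): 0.0 for s in states for a in ACTIONS if step(s, a)}
    for _ in range(max_sweeps):
        change = 0.0
        for state, action in q:
            nxt, reward = step(state, action)
            future = 0.0 if nxt == GOAL else max(
                q[(nxt, b)] for b in ACTIONS if (nxt, b) in q)
            updated = reward + GAMMA * future
            change = max(change, abs(updated - q[(state, action)]))
            q[(state, action)] = updated
        if change < tolerance:
            break
    return q

def greedy_path(q, limit=50):
    """Follow the highest-valued action from START until GOAL is reached."""
    path, state = [START], START
    while state != GOAL and len(path) <= limit:
        best = max((b for b in ACTIONS if (state, b) in q),
                   key=lambda b: q[(state, b)])
        state = step(state, best)[0]
        path.append(state)
    return path

q_table = train()
planned_path = greedy_path(q_table)
```

In this reduced setting the greedy path stays on the centerline row and reaches the end point in five moves. In the planner described above the states would be robot configurations and each action a joint move, with the same update rule.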

As shown in fig. 4, the Q-learning trajectory planner of the present application yields the optimal continuous path planning points, and the optimal continuous trajectory can be obtained by fitting the sections between path points with a conventional curve fitting method. The continuous pouring trajectory obtained by the method of the present application consists of a series of discrete points, each representing one motion state of the distributing robot.

In fig. 4, every path point of the straight path produced by the Q-learning-based trajectory planner is a state reached by the distributing robot and lies within the set error band, which meets the autonomous planning and error control requirements of intelligent construction machinery.

Claims

1. A Q-learning-based trajectory planning method for a concrete distributing robot, directed at the characteristics of a redundant concrete distributing robot, characterized in that a general trajectory planning framework is designed and the trajectory planning of the distributing robot is divided into two parts: one part is a rapid movement process of the distributing robot from an initial state to a path starting point and from a path end point to the initial state; the other part is a continuous concrete pouring process of the distributing robot from the starting point of the pouring path to the end point of the pouring path;

furthermore, in the rapid movement process, the intermediate path is not considered, an interior point method is adopted to perform inverse-solution optimization with time optimality as the objective, and a cubic polynomial is adopted to fit the trajectory; in the continuous concrete pouring process, an error band with a certain area is formed around the path to be poured by the end of the distributing robot, the error band width being set according to a given pouring precision condition;

furthermore, the formed path error band is divided into regions using a Q-learning algorithm, reward values are given to the divided regions according to the pouring targets and constraints, the given grid is trained, and the action sequence of each joint of the robot is finally formed, so that the robot actions are obtained directly and a complex trajectory planning process is avoided.