CN114200936B - AGV real-time path planning method based on optimal control and width learning - Google Patents

AGV real-time path planning method based on optimal control and width learning

Info

Publication number
CN114200936B
CN114200936B
Authority
CN
China
Prior art keywords
agv
optimal
width learning
state
optimal control
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111482549.5A
Other languages
Chinese (zh)
Other versions
CN114200936A (en)
Inventor
吴宗泽
赖家伦
李嘉俊
任志刚
曾德宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202111482549.5A
Publication of CN114200936A
Application granted
Publication of CN114200936B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02Control of position or course in two dimensions
    • G05D1/021Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0221Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention provides an AGV real-time path planning method based on optimal control and width learning, relating to the technical field of AGV path planning.

Description

AGV real-time path planning method based on optimal control and width learning
Technical Field
The invention relates to the technical field of AGV path planning, in particular to an AGV real-time path planning method based on optimal control and width learning.
Background
An automated guided vehicle (AGV) is a transport vehicle equipped with an electromagnetic or optical automatic guidance device that can travel along a prescribed guide path and provides safety protection and various transfer functions. It is a key device of flexible production systems and plays an important role in material handling automation and intelligent warehousing.
Currently, dynamic and flexible manufacturing environments pose many challenges to shop-floor AGV path planning and real-time control. According to control mode and degree of autonomy, AGVs can be roughly divided into remote-controlled, semi-autonomous and autonomous types. Magnetic-track navigation is the earliest path planning method adopted by AGVs and remains the method used in most current AGV path planning: the AGV determines its travel route by identifying a magnetic track laid on the ground, but this method is limited by the inflexibility of the magnetic track, and extending a path is relatively complex. Vision and two-dimensional-code navigation is another mode applied in the current AGV field: the AGV determines its position and travel route by identifying unique two-dimensional codes pasted on the ground at intervals. Compared with magnetic-track navigation, this mode is more flexible and easier to schedule, but suffers from easily worn markers, high requirements on ambient light, and similar problems. Laser SLAM navigation determines position by having the AGV emit laser signals that are reflected by reflectors mounted on walls or columns; it can overcome the defects of the two preceding modes, but has problems such as long mapping time and high cost, and sees little market application.
With the development of deep learning technology, methods applying deep learning to AGV path planning have emerged. For example, the prior art discloses an AGV path planning method and system based on reinforcement learning. In that method, an AGV dynamics model is first constructed; then, taking the AGV as an agent and the environmental information perceived by the AGV while running as state information, a state space is designed considering destination positions and obstacle positions, a continuous action space and a multiple-reward mechanism are designed, and the Markov decision process modeling of path planning is completed. In that scheme, any different starting points, target points and obstacles at arbitrary positions can be given in the state space, so generalizability is high. An Actor-Critic framework is subsequently introduced for policy learning and training; online running avoids the problem of a large amount of calculation, the computing-power requirement is low, and real-time decision control of the AGV for arbitrary targets and obstacles is realized. However, because that technical scheme involves a long trial-and-error learning process between the AGV and the environment, convergence is slow and relatively time-consuming, and the training process involves suitable reward-function design, neural network structure design and the like.
Disclosure of Invention
In order to solve the problems that, in existing AGV path planning methods based on deep learning, parameter tuning of the deep neural network is complex and the training process is slow, the invention provides an AGV real-time path planning method based on optimal control and width learning. It involves no parameter tuning that depends on strong human priors, the offline training of the width learning network is efficient and low in time consumption, and it offers room for extension to future large-scale shop-floor AGV formation and obstacle avoidance applications.
In order to achieve the technical effects, the technical scheme of the invention is as follows:
an AGV real-time path planning method based on optimal control and width learning, comprising the following steps:
s1, constructing an AGV dynamics model;
s2, taking an AGV dynamic model as dynamic constraint and taking time-burnup optimization as an objective function to establish an optimal control model;
s3, randomly generating an AGV initial point, taking the initial point as an initial condition for solving an optimal control model, and solving the optimal control model in an offline forward direction to generate a plurality of optimal control tracks with different initial points, wherein the optimal control tracks comprise 'optimal state-control rate' pairs;
s4, constructing a width learning network, integrating and classifying the optimal control track into different training data sets according to target positions based on an optimal state-control rate pair, and performing incremental offline training on the width learning network;
s5, after the offline training is completed, weight parameters of the width learning network are determined, and the trained width learning network is used as a real-time controller for real-time planning of the AGV path.
In this technical scheme, an AGV dynamics model is first constructed; with the AGV dynamics model as the dynamic constraint and time-burnup optimization as the objective function, an optimal control model is established and solved. The numerical solution of the optimal control model is optimal, but its computational complexity is high and it is unsuitable for real-time calculation, so the optimal control model is solved forward offline to generate a plurality of optimal control trajectories with different starting points. Since offline optimization can hardly achieve real-time optimal control toward the goal of optimal path planning, and to avoid the hysteresis caused by offline optimization, a width learning network is introduced: with different target points as the classification basis, the optimal control trajectories are integrated and classified into training data sets of different navigation tasks, and the width learning network is trained incrementally, yielding the width learning network finally used for AGV real-time path planning and realizing real-time optimal point control of the AGV from any starting point within a certain range.
Preferably, the AGV dynamics model described in step S1 is:

    ẋ(t) = v(t)·cos θ(t)
    ẏ(t) = v(t)·sin θ(t)
    v̇(t) = α(t)
    φ̇(t) = ω(t)
    θ̇(t) = v(t)·tan φ(t) / L_w

wherein t is the time variable, t ∈ [0, t_f], and t_f designates the moment corresponding to the end state; x(t) and y(t) represent the abscissa and ordinate of the AGV's center position at time t, and P = (x, y) represents the position coordinate of the center of the AGV; θ(t) represents the azimuth angle between the AGV and the target position at time t, φ(t) represents the steering angle at time t, and α(t) represents the acceleration in the azimuth direction at time t; ω(t) represents the angular velocity, and L_w indicates the wheelbase (track length) of the AGV.
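As an illustration, the dynamics above can be evaluated numerically with the following minimal Python sketch; the function name, wheelbase value and state ordering are assumptions for the example, not taken from the patent.

import numpy as np

def agv_dynamics(s, c, L_w=0.5):
    # Right-hand side of the AGV kinematic model (assumed two-degree-of-freedom bicycle form).
    # s = [x, y, v, phi, theta]; c = [alpha, omega]; L_w: wheelbase (illustrative value).
    x, y, v, phi, theta = s
    alpha, omega = c
    return np.array([
        v * np.cos(theta),      # x_dot
        v * np.sin(theta),      # y_dot
        alpha,                  # v_dot: acceleration along the azimuth direction
        omega,                  # phi_dot: steering angular velocity
        v * np.tan(phi) / L_w,  # theta_dot
    ])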
Preferably, in step S2, taking the AGV dynamics model as the dynamic constraint and time-burnup optimization as the objective function, the established optimal control model is expressed as:

Objective function:

    min J = t_f + γ·J_1,  J_1 = ∫₀^{t_f} (α²(t) + ω²(t)) dt

Constraint conditions:

    ṡ(t) = f(s(t), c(t))
    ê_i(t) ≥ −ε, i = 1, ..., N
    bound(s(t), c(t))
    s(t_0), s(t_f) given

wherein J_1 indicates the burnup; γ represents the degree of importance balancing time optimization and fuel-consumption optimization in the optimization target; e_i represents a path constraint designed from the known obstacle locations, satisfying:

    e_i(t) = (x(t) − x_i^o(t))² + (y(t) − y_i^o(t))² − (r_i + k)² ≥ 0

where i = 1, ..., N; (x_i^o(t), y_i^o(t)) represents the position coordinate of the i-th obstacle at time t, r_i represents the corresponding obstacle radius, and k represents a collision early-warning threshold; ê_i represents the transformed (smoothed) path constraint, and ε is a positive number approaching 0; f(s(t), c(t)) represents the AGV dynamics model, with s(t) = [x(t), y(t), v(t), φ(t), θ(t)] as the state variable and c(t) = [α(t), ω(t)] as the control rate; bound(s(t), c(t)) represents the boundary value constraints of the AGV; s(t_0) and s(t_f) indicate the given initial and final states of the AGV.
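A brief numerical sketch of this objective and the obstacle constraint follows; the weight gamma, the threshold k and the trapezoidal quadrature are assumptions for the example, not values from the patent.

import numpy as np

def burnup(alpha, omega, dt):
    # J_1: integral of alpha^2 + omega^2 along the trajectory (trapezoidal rule).
    return np.trapz(alpha**2 + omega**2, dx=dt)

def objective(t_f, alpha, omega, dt, gamma=0.1):
    # Time-burnup objective J = t_f + gamma * J_1.
    return t_f + gamma * burnup(alpha, omega, dt)

def path_constraint(x, y, obs_x, obs_y, r_i, k=0.2):
    # e_i(t) >= 0 keeps the AGV at least r_i + k away from obstacle i.
    return (x - obs_x)**2 + (y - obs_y)**2 - (r_i + k)**2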
Preferably, the randomly generated AGV initial point in step S3 is expressed as:

    s(t_0) = [x_random, y_random, 0, 0, 0]

Taking the initial point as the initial condition for solving the optimal control model, the optimal control model is solved forward offline; the solving method includes, but is not limited to, the pseudospectral method and the shooting method among the direct methods of optimal control. The state variable S(τ) and the control rate C(τ) are first interpolated; the interpolation, which only obtains values at intermediate points for the solution calculation, is not limited to the Lagrange interpolation method. Finally, a plurality of optimal control trajectories with different starting points is generated, expressed as:

    Γ = {T_1, T_2, ..., T_K},  T_k = {(s_t, c_t)}, t ∈ [t_0, t_f]

wherein Γ represents the set of optimal control trajectories, and each (s_t, c_t) forms an "optimal state-control rate" pair (s, c).
Preferably, integrating by different target positions and based on the "optimal state-control rate" pairs (s, c) corresponding to each target position, the optimal control trajectories are integrated and classified into different training data sets; the correspondence is characterized as:

    Γ → {D_A, D_B, D_C, ...}

wherein D_A represents an independent data set comprising the "optimal state-control rate" pairs obtained from all optimal control trajectories taking position A as the target state; D_B, D_C, ... are defined in the same way for the other target positions.
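As a sketch of this classification step (assuming each offline solve is stored together with its target position; the data layout and names are illustrative):

from collections import defaultdict

def group_by_target(trajectories):
    # trajectories: list of dicts {'target': (gx, gy), 'S': [s_t, ...], 'C': [c_t, ...]}
    # returns: {target: (list of optimal states, list of control rates)}
    datasets = defaultdict(lambda: ([], []))
    for traj in trajectories:
        S_list, C_list = datasets[tuple(traj['target'])]
        S_list.extend(traj['S'])  # optimal states s_t
        C_list.extend(traj['C'])  # control rates c_t
    return dict(datasets)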
Preferably, the constructed width learning network comprises an input layer, a hidden layer and an output layer, wherein the hidden layer comprises feature nodes, enhancement nodes and incremental enhancement nodes.

Let S denote the optimal states in the training data set taking a certain position as the target state, and C denote the control rates in that training data set. S is input to the input layer of the width learning network and then forms n groups of feature node matrices through n groups of feature mappings. Let Z_i represent the i-th group of feature nodes; the n groups of feature node matrices are spliced into Z^n = [Z_1, Z_2, ..., Z_n], where the i-th group of feature nodes is represented as:

    Z_i = Q(S·W_ei + β_ei), i = 1, 2, ..., n

wherein Q represents a linear or nonlinear activation function, and W_ei and β_ei are randomly initialized weights and biases, respectively. The mapped features feed enhancement nodes with randomly generated weights: on the basis of the feature node matrix, m groups of enhancement node matrices H^m = [H_1, H_2, ..., H_m] are formed by nonlinear transformation, where H_j, representing the j-th group of enhancement nodes, is expressed as:

    H_j = ξ(Z^n·W_hj + β_hj), j = 1, 2, ..., m

wherein ξ represents a nonlinear activation function, and W_hj and β_hj are random weights and biases, respectively. The hidden-layer node matrix is spliced into A^m = [Z^n | H^m], and the output of the width learning network is:

    Ŷ = A^m·W^m

After new enhancement nodes are added while the feature nodes remain unchanged, the hidden layer becomes A^{m+1} = [A^m | ξ(Z^n·W_{m+1} + β_{m+1})], where W_{m+1} and β_{m+1} are the new random weights and biases, randomly generated and kept unchanged during training. Newly adding W_{m+1} enhances the representation capability of the width learning network, and the finally fixed network structure is fitted to the target output control rate C. The weights between the hidden layer and the output layer are solved by means of a pseudo-inverse matrix and approximated by the ridge regression method; the expression of the pseudo-inverse matrix is:

    (A^m)⁺ = lim_{λ→0} (λI + (A^m)ᵀA^m)⁻¹(A^m)ᵀ

then W^m = (A^m)⁺·C.
Preferably, the width learning network is incrementally trained by adding enhancement nodes. The newly added group of enhancement nodes is expressed as:

    H_{m+1} = ξ(Z^n·W_{h,m+1} + β_{h,m+1})

The hidden layer is then expressed as A^{m+1} = [A^m | H_{m+1}], and the pseudo-inverse matrix, changed by the newly added enhancement nodes, is expressed as (with [ · ; · ] denoting vertical stacking):

    (A^{m+1})⁺ = [ (A^m)⁺ − D·Bᵀ ; Bᵀ ]

wherein

    D = (A^m)⁺·H_{m+1},  R = H_{m+1} − A^m·D
    Bᵀ = R⁺ if R ≠ 0;  Bᵀ = (I + DᵀD)⁻¹Dᵀ(A^m)⁺ if R = 0

After the enhancement nodes are added, the weight matrix of the mapping relation from optimal state to optimal control is expressed as:

    W^{m+1} = [ W^m − D·Bᵀ·C ; Bᵀ·C ]

At this time, the actual output of the width learning network output layer is:

    Ŷ = A^{m+1}·W^{m+1}

The error between the actual output Ŷ and the control rate C in the training data set (s, c) is calculated as:

    E = ‖Ŷ − C‖_F

wherein ‖·‖_F denotes the Frobenius norm. If the error does not meet the threshold, incremental training of the width learning network continues by adding enhancement nodes; when the error meets the threshold, adding enhancement nodes stops, and the width learning network model at that moment is saved.
The training of the width learning network involves no parameter tuning that depends on strong human priors and no slow gradient optimization until an objective function converges; the network parameter matrix is obtained merely by solving the pseudo-inverse matrix with an iterative incremental learning method, so the time consumption is low.
Preferably, the training data set is normalized before being used for width learning network training.
Preferably, after the optimal state S and the control rate C in the training data set taking a certain position as the target state are input into the width learning network to complete training, the result output by the width learning network requires inverse normalization.
Preferably, after training is completed with the optimal state S and the control rate C in the training data set taking a certain position as the target state, the weights of the hidden layer of the current width learning network are saved, and the optimal states and control rates in the training data sets of the other target positions are extracted for training, until the training data sets corresponding to all target positions have been traversed and the training of width learning networks corresponding one-to-one to the target positions is completed.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
compared with the traditional method of obtaining the path track by only directly solving the optimal control model offline, the AGV implementation path planning method based on the optimal control and the width learning provided by the invention has the advantages that the defect that the real-time optimal control cannot be realized by offline optimization to obtain the optimal track is emphasized, in order to avoid the hysteresis caused by offline optimization solution, the width learning network is introduced, different target points are used as classification basis, the optimal control track is classified into training data sets of different navigation tasks, the width learning network is incrementally trained, the width learning network which is finally used for AGV real-time path planning is obtained, the real-time optimal control of the AGV at any initial point within a certain range is realized, and when the width learning network is used for online operation, the problem of large calculation amount does not exist because the control rate prediction only relates to simple matrix operation, and the real-time performance of path planning is ensured. In addition, the method provided by the invention has no problems of abrasion of the mark, difficult path expansion, high environmental requirement and long drawing time, compared with the offline training of the deep neural network, the training of the width learning network has no artificial priori strong parameter adjustment work, has no slow gradient optimization until the objective function converges, and can complete the training by only obtaining the weight parameter matrix through an iterative incremental learning method, thereby having low time consumption.
Drawings
FIG. 1 is a schematic flow chart of the AGV real-time path planning method based on optimal control and width learning according to embodiment 1 of the present invention;
FIG. 2 is a physical schematic diagram corresponding to the AGV dynamics model proposed in embodiment 1 of the present invention;
fig. 3 is a diagram showing the overall framework of the width learning network proposed in embodiment 1 of the present invention;
FIG. 4 is a schematic diagram showing the AGV path output at different target positions as proposed in embodiment 3 of the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
it will be appreciated by those skilled in the art that some well known descriptions in the figures may be omitted.
The technical scheme of the invention is further described below with reference to the accompanying drawings and examples.
The positional relationship depicted in the drawings is for illustrative purposes only and is not to be construed as limiting the present patent;
example 1
The path planning problem of an AGV can be regarded as one in which the starting point of the AGV is not fixed while the end point (i.e., the coordinate point of the delivery task) is fixed. Conventional path searching methods are usually based on a grid map, and the searched path does not necessarily conform to the dynamic constraints of the vehicle (the vehicle cannot, for example, move sideways), which brings uncertainty in optimization time and optimization quality to later path optimization. Therefore, in this practical implementation, to ensure the validity of the path planning result, the dynamics model of the AGV is considered in the planning as a constraint from the beginning stage; the path planning problem of the AGV is thereby converted into a path planning problem with fixed starting point and end point, and the optimal control method for solving the two-point boundary value problem is used to solve it forward.
Specifically, referring to fig. 1, the present embodiment proposes an AGV real-time path planning method based on optimal control and width learning, where the method includes the following steps:
s1, constructing an AGV dynamics model;
s2, taking an AGV dynamic model as dynamic constraint and taking time-burnup optimization as an objective function to establish an optimal control model;
s3, randomly generating an AGV initial point, taking the initial point as an initial condition for solving an optimal control model, and solving the optimal control model in an offline forward direction to generate a plurality of optimal control tracks with different initial points, wherein the optimal control tracks comprise 'optimal state-control rate' pairs;
s4, constructing a width learning network, integrating and classifying the optimal control track into different training data sets according to target positions based on an optimal state-control rate pair, and performing incremental offline training on the width learning network;
s5, after the offline training is completed, weight parameters of the width learning network are determined, and the trained width learning network is used as a real-time controller for real-time planning of the AGV path.
In this embodiment, based on Newtonian classical mechanics and according to the general characteristics of actual AGVs in service, a two-degree-of-freedom vehicle model is used to model the kinematics of the AGV. With reference to fig. 2, the dynamics model of the AGV is expressed as:

    ẋ(t) = v(t)·cos θ(t)
    ẏ(t) = v(t)·sin θ(t)
    v̇(t) = α(t)
    φ̇(t) = ω(t)
    θ̇(t) = v(t)·tan φ(t) / L_w

wherein t is the time variable, t ∈ [0, t_f], and t_f designates the moment corresponding to the end state; x(t) and y(t) represent the abscissa and ordinate of the AGV's center position at time t, and P = (x, y) represents the position coordinate of the center of the AGV; θ(t) represents the azimuth angle between the AGV and the target position at time t, φ(t) represents the steering angle at time t, and α(t) represents the acceleration in the azimuth direction at time t; ω(t) represents the angular velocity, and L_w indicates the wheelbase (track length) of the AGV.
Taking the AGV dynamics model as the dynamic constraint, smoothing the collision constraint, and taking time-burnup optimization as the objective function, the trajectory planning of the AGV is converted into solving an optimal control problem with the corresponding constraints. The established optimal control model is expressed as:

Objective function (an optimization objective with time-burnup optimization):

    min J = t_f + γ·J_1,  J_1 = ∫₀^{t_f} (α²(t) + ω²(t)) dt

Constraint conditions:

    ṡ(t) = f(s(t), c(t))
    ê_i(t) ≥ −ε, i = 1, ..., N
    bound(s(t), c(t))
    s(t_0), s(t_f) given

wherein J_1 indicates the burnup; γ represents the degree of importance balancing time optimization and fuel-consumption optimization in the optimization target; e_i represents a path constraint designed from the known obstacle locations, satisfying:

    e_i(t) = (x(t) − x_i^o(t))² + (y(t) − y_i^o(t))² − (r_i + k)² ≥ 0

where i = 1, ..., N; (x_i^o(t), y_i^o(t)) represents the position coordinate of the i-th obstacle at time t, r_i represents the corresponding obstacle radius, and k represents a collision early-warning threshold; ê_i represents the transformed (smoothed) path constraint, and ε is a positive number approaching 0; f(s(t), c(t)) represents the AGV dynamics model, with s(t) = [x(t), y(t), v(t), φ(t), θ(t)] as the state variable and c(t) = [α(t), ω(t)] as the control rate; bound(s(t), c(t)) represents the boundary value constraints of the AGV; s(t_0) and s(t_f) indicate the given initial and final states of the AGV.
Because the AGV operates in material handling scenarios, each required handling destination is taken independently as the end state and computed iteratively, so this is an optimal control problem with a fixed end state; the method of solving the optimal control model is not limited to any single one of the direct methods of optimal control. Specifically, let the randomly generated AGV initial point be:

    s(t_0) = [x_random, y_random, 0, 0, 0]

Taking the initial point as the initial condition for solving the optimal control model, the optimal control model is solved forward offline; the solving method includes, but is not limited to, the pseudospectral method and the shooting method among the direct methods of optimal control. The state variable S(τ) and the control rate C(τ) are first interpolated for the solution calculation; the interpolation method, not limited to Lagrange interpolation, proceeds as follows:

    S(τ) ≈ Σ_{i=0}^{N} L_i(τ)·S(τ_i)
    C(τ) ≈ Σ_{i=0}^{N} L_i(τ)·C(τ_i)
    L_i(τ) = Π_{j=0, j≠i}^{N} (τ − τ_j)/(τ_i − τ_j)
    Ṡ(τ_k) ≈ Σ_{i=0}^{N} D_{ki}·S(τ_i), with D_{ki} = L̇_i(τ_k)

Finally, a plurality of optimal control trajectories with different starting points is generated, expressed as:

    Γ = {T_1, T_2, ..., T_K},  T_k = {(s_t, c_t)}, t ∈ [t_0, t_f]

wherein Γ represents the set of optimal control trajectories, and each (s_t, c_t) forms an "optimal state-control rate" pair (s, c).
Many trajectory optimization problems in actual engineering, such as spacecraft trajectory optimization and unmanned-vehicle trajectory optimization, are two-point boundary value problems and can be expressed as optimal control problems to solve. Because of the high-order nonlinearity of the model and the complex path constraint conditions, solving them directly is relatively complex, and generally only numerical solutions can be obtained. Traditional optimal control theory, based on the calculus of variations and Pontryagin's extremum principle, converts the optimal trajectory design problem into a Hamilton-Jacobi-Bellman equation and a two-point boundary value problem, and the controlled object then performs trajectory tracking along the designed trajectory. This approach is commonly used for optimal control problems whose performance metrics are defined as maximizing the search area, minimizing time consumption, minimizing the burnup plan, minimizing end-state errors, and the like. The main idea of the traditional optimal control method is to track a pre-designed optimal trajectory according to the real state; however, an important problem of this strategy is that, because the model is a nonlinear differential equation, differential equations must be solved during the optimization process, and for scenes where the model is complex, an analytical solution is difficult to obtain.
On the premise of offline solution, it is difficult for offline optimization to realize real-time optimal control toward the goal of an optimal trajectory. To avoid the hysteresis caused by offline optimization, a width learning network is introduced; before being formally used for the width learning network, the optimal control trajectories are integrated and classified into training data sets of different navigation tasks, with different target points as the classification basis. Specifically:
according to different target position integration, based on the 'optimal state-control rate' pair (s, c) corresponding to each target position, classifying the optimal control track integration into different training data sets, and correspondingly characterizing as:
Figure BDA0003395413930000097
wherein ,
Figure BDA0003395413930000101
representing an independent dataset comprising 'optimal state-control rate' pairs obtained for all optimal control trajectories in the target state for the A-position +.>
Figure BDA0003395413930000102
The same applies.
In this embodiment, referring to fig. 3, the constructed width learning network comprises an input layer, a hidden layer and an output layer, wherein the hidden layer comprises feature nodes, enhancement nodes and incremental enhancement nodes. This structure is formed by introducing the idea of incremental learning on the basis of the width learning system; the new structure can iteratively improve the feature extraction capability of the model and increase its expressive capability, so the fitting performance of the model improves, and with these updated weights the width learning network can quickly learn to approach the actual law. The specific process is as follows:

Let S denote the optimal states in the training data set taking a certain position as the target state, and C denote the control rates in that training data set. S is input to the input layer of the width learning network and then forms n groups of feature node matrices through n groups of feature mappings. Let Z_i represent the i-th group of feature nodes; the n groups of feature node matrices are spliced into Z^n = [Z_1, Z_2, ..., Z_n]. In order to obtain a sparse representation of the input data during feature mapping, the weights between the input layer and the hidden layer can be adjusted through a sparse autoencoding technique, with the optimal weights selected automatically in the decoding process. The i-th group of feature nodes is represented as:

    Z_i = Q(S·W_ei + β_ei), i = 1, 2, ..., n

wherein Q represents a linear or nonlinear activation function, and W_ei and β_ei are randomly initialized weights and biases, respectively. The mapped features feed enhancement nodes with randomly generated weights: on the basis of the feature node matrix, m groups of enhancement node matrices H^m = [H_1, H_2, ..., H_m] are formed by nonlinear transformation, where H_j, representing the j-th group of enhancement nodes, is expressed as:

    H_j = ξ(Z^n·W_hj + β_hj), j = 1, 2, ..., m

wherein ξ represents a nonlinear activation function, and W_hj and β_hj are random weights and biases, respectively. The hidden-layer node matrix is spliced into A^m = [Z^n | H^m], and the output of the width learning network is:

    Ŷ = A^m·W^m

After new enhancement nodes are added while the feature nodes remain unchanged, the hidden layer becomes A^{m+1} = [A^m | ξ(Z^n·W_{m+1} + β_{m+1})], where W_{m+1} and β_{m+1} are the new random weights and biases, randomly generated and kept unchanged during training. Newly adding W_{m+1} enhances the representation capability of the width learning network, and the finally fixed network structure is fitted to the target output control rate C. The weights between the hidden layer and the output layer are solved by means of a pseudo-inverse matrix and approximated by the ridge regression method; the expression of the pseudo-inverse matrix is:

    (A^m)⁺ = lim_{λ→0} (λI + (A^m)ᵀA^m)⁻¹(A^m)ᵀ

then W^m = (A^m)⁺·C.
In order to give the width network better fitting performance, the width learning network is incrementally trained by adding enhancement nodes. The newly added group of enhancement nodes is expressed as:

    H_{m+1} = ξ(Z^n·W_{h,m+1} + β_{h,m+1})

The hidden layer is then expressed as A^{m+1} = [A^m | H_{m+1}], and the pseudo-inverse matrix, changed by the newly added enhancement nodes, is expressed as (with [ · ; · ] denoting vertical stacking):

    (A^{m+1})⁺ = [ (A^m)⁺ − D·Bᵀ ; Bᵀ ]

wherein

    D = (A^m)⁺·H_{m+1},  R = H_{m+1} − A^m·D
    Bᵀ = R⁺ if R ≠ 0;  Bᵀ = (I + DᵀD)⁻¹Dᵀ(A^m)⁺ if R = 0

After the enhancement nodes are added, the weight matrix of the mapping relation from optimal state to optimal control is expressed as:

    W^{m+1} = [ W^m − D·Bᵀ·C ; Bᵀ·C ]

At this time, the actual output of the width learning network output layer is:

    Ŷ = A^{m+1}·W^{m+1}

The error between the actual output Ŷ and the control rate C in the training data set (s, c) is calculated as:

    E = ‖Ŷ − C‖_F

wherein ‖·‖_F denotes the Frobenius norm. If the error does not meet the threshold, incremental training of the width learning network continues by adding enhancement nodes; when the error meets the threshold, adding enhancement nodes stops, and the width learning network model at that moment is saved. The training of the width learning network involves no parameter tuning that depends on strong human priors and no slow gradient optimization until the objective function converges; the network parameter matrix is obtained merely by solving the pseudo-inverse matrix with an iterative incremental learning method, so the time consumption is low.
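This incremental update can be sketched as follows; it follows the standard broad-learning pseudo-inverse update reconstructed above, with the residual named R to avoid clashing with the control matrix C, and the tolerance value is an assumption.

import numpy as np

def add_enhancement_nodes(Am, A_pinv, Wm, H_new, C, tol=1e-10):
    # Absorb a new group of enhancement nodes H_new without retraining from scratch.
    D = A_pinv @ H_new                              # D = (A^m)^+ H_{m+1}
    R = H_new - Am @ D                              # residual of the new columns
    if np.linalg.norm(R) > tol:
        BT = np.linalg.pinv(R)                      # B^T = R^+
    else:
        BT = np.linalg.solve(np.eye(D.shape[1]) + D.T @ D, D.T @ A_pinv)
    A_pinv_new = np.vstack([A_pinv - D @ BT, BT])   # (A^{m+1})^+
    W_new = np.vstack([Wm - D @ (BT @ C), BT @ C])  # W^{m+1}
    A_new = np.hstack([Am, H_new])                  # A^{m+1} = [A^m | H_{m+1}]
    return A_new, A_pinv_new, W_new

Groups of nodes are added this way until the Frobenius-norm error ‖A·W − C‖_F meets the threshold, as in the procedure above.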
After the optimal state S and the control rate C in the training data set taking a certain position as the target state are input into the width learning network to complete training, the weights of the hidden layer of the current width learning network are saved, and the optimal states and control rates in the training data sets of the other target positions are extracted for training, until the training data sets corresponding to all target positions have been traversed and the training of the width learning networks corresponding one-to-one to the plurality of target positions is completed.
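A sketch of this traversal, training and storing one network per target position (dictionary-based storage and the reuse of the earlier build_bls and group_by_target sketches are assumptions):

import numpy as np

def train_all_targets(datasets):
    # datasets: {target: (S, C)}, e.g. as produced by the group_by_target sketch.
    controllers = {}
    for target, (S, C) in datasets.items():
        S, C = np.asarray(S), np.asarray(C)
        feat_params, enh_params, Am, A_pinv, Wm = build_bls(S, C)  # earlier sketch
        controllers[target] = (feat_params, enh_params, Wm)        # saved per target
    return controllers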
Example 2
In this example, in addition to the training described in example 1, the training data set is normalized before being used for width learning network training, since the magnitudes in the data set are not within one order of magnitude; the normalization includes, but is not limited to, max-min normalization, Z-score normalization, and function transformation.

After the optimal state S and the control rate C in the training data set taking a certain position as the target state (for example, the data set D_A) are input into the width learning network and training is completed, the result output by the width learning network requires inverse normalization, and is finally used as the control rate conforming to its physical meaning.
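Max-min normalization and its inverse, one of the options named above, can be sketched as follows (the feature-wise ranges and the epsilon guard are assumptions for the example):

import numpy as np

def fit_minmax(X):
    # Record feature-wise minima and maxima of the training data.
    return X.min(axis=0), X.max(axis=0)

def normalize(X, lo, hi):
    return (X - lo) / np.maximum(hi - lo, 1e-12)

def denormalize(Y_norm, lo, hi):
    # Inverse normalization: map network outputs back to physical control rates.
    return Y_norm * (hi - lo) + lo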
Example 3
In this embodiment, on the basis of embodiment 1 and embodiment 2, the discussion focuses on extending the AGV's path planning from a fixed initial point to any initial point and from one destination to different end states. With different initial points obtained for a certain fixed transport end point, a width learning network is independently trained on the "optimal state-control rate" pairs (s, c); combined with the incremental pseudo-inverse solution, the weight matrix W conforming to the mapping relation from optimal state to optimal control can be learned quickly. That is, the optimal control problem with a changed initial point can be solved, and the initial point in online optimal control is generalized to any point in the set area.

The number of enhancement nodes m and the number of newly added enhancement nodes can be chosen by balancing the computing power and the prediction accuracy in the specific scene.

For different handling target points, i.e., end states, in an actual handling scene, it is only necessary to solve the optimal control problem forward repeatedly for each end state; the width learning network weight parameters corresponding to the different end states are learned separately and then stored. Referring to fig. 4, real-time optimal path planning control for different end states (such as target A and target Z, different from target A, in fig. 4) is realized by calling each trained width learning network.
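At run time, predicting a control rate is only a few matrix products, which is what keeps the online cost low. A sketch of the forward pass, reusing the per-target parameters saved by the training sketches above (with the linear Q and tanh ξ assumed there):

import numpy as np

def bls_control(s, controller):
    # Real-time controller: map the current state s to a control rate c = [alpha, omega].
    feat_params, enh_params, Wm = controller
    s = np.atleast_2d(s)
    Zn = np.hstack([s @ We + be for (We, be) in feat_params])
    Hm = np.hstack([np.tanh(Zn @ Wh + bh) for (Wh, bh) in enh_params])
    A = np.hstack([Zn, Hm])  # hidden layer for this single state
    return (A @ Wm)[0]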
It is to be understood that the above examples of the present invention are provided by way of illustration only and are not intended to limit the scope of the invention. Other variations or modifications will be apparent to those of ordinary skill in the art from the above teachings; it is neither necessary nor possible to exhaustively list all embodiments here. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the invention shall be included in the protection scope of the claims of the invention.

Claims (8)

1. An AGV real-time path planning method based on optimal control and width learning is characterized by comprising the following steps:
s1, constructing an AGV dynamics model;
the AGV dynamics model described in step S1 is:

    ẋ(t) = v(t)·cos θ(t)
    ẏ(t) = v(t)·sin θ(t)
    v̇(t) = α(t)
    φ̇(t) = ω(t)
    θ̇(t) = v(t)·tan φ(t) / L_w

wherein t is the time variable, t ∈ [0, t_f], and t_f designates the moment corresponding to the end state; x(t) and y(t) represent the abscissa and ordinate of the AGV's center position at time t, and P = (x, y) represents the position coordinate of the center of the AGV; θ(t) represents the azimuth angle between the AGV and the target position at time t, φ(t) represents the steering angle at time t, and α(t) represents the acceleration in the azimuth direction at time t; ω(t) represents the angular velocity, and L_w indicates the wheelbase (track length) of the AGV;
s2, taking an AGV dynamic model as dynamic constraint and taking time-burnup optimization as an objective function to establish an optimal control model;
in step S2, taking the AGV dynamics model as the dynamic constraint and time-burnup optimization as the objective function, the established optimal control model is expressed as:

objective function:

    min J = t_f + γ·J_1,  J_1 = ∫₀^{t_f} (α²(t) + ω²(t)) dt

constraint conditions:

    ṡ(t) = f(s(t), c(t))
    ê_i(t) ≥ −ε, i = 1, ..., N
    bound(s(t), c(t))
    s(t_0), s(t_f) given

wherein J_1 indicates the burnup; γ represents the degree of importance balancing time optimization and fuel-consumption optimization in the optimization target; e_i represents a path constraint designed from the known obstacle locations, satisfying:

    e_i(t) = (x(t) − x_i^o(t))² + (y(t) − y_i^o(t))² − (r_i + k)² ≥ 0

where i = 1, ..., N; (x_i^o(t), y_i^o(t)) represents the position coordinate of the i-th obstacle at time t, r_i represents the corresponding obstacle radius, and k represents a collision early-warning threshold; ê_i represents the transformed (smoothed) path constraint, and ε is a positive number approaching 0; f(s(t), c(t)) represents the AGV dynamics model, with s(t) = [x(t), y(t), v(t), φ(t), θ(t)] as the state variable and c(t) = [α(t), ω(t)] as the control rate; bound(s(t), c(t)) represents the boundary value constraints of the AGV; s(t_0) and s(t_f) indicate the given initial and final states of the AGV;
s3, randomly generating an AGV initial point, taking the initial point as an initial condition for solving an optimal control model, and solving the optimal control model in an offline forward direction to generate a plurality of optimal control tracks with different initial points, wherein the optimal control tracks comprise 'optimal state-control rate' pairs;
s4, constructing a width learning network, integrating and classifying the optimal control track into different training data sets according to target positions based on an optimal state-control rate pair, and performing incremental offline training on the width learning network;
s5, after the offline training is completed, weight parameters of the width learning network are determined, and the trained width learning network is used as a real-time controller for real-time planning of the AGV path.
2. The method for planning a real-time path of an AGV based on optimal control and width learning according to claim 1, wherein the randomly generated initial point of the AGV in step S3 is expressed as:
    s(t_0) = [x_random, y_random, 0, 0, 0]

taking the initial point as the initial condition for solving the optimal control model, the optimal control model is solved forward offline; the state variables and control rates are first interpolated for the solution calculation, and finally a plurality of optimal control trajectories with different starting points is generated, expressed as:

    Γ = {T_1, T_2, ..., T_K},  T_k = {(s_t, c_t)}, t ∈ [t_0, t_f]

wherein Γ represents the set of optimal control trajectories, and each (s_t, c_t) forms an "optimal state-control rate" pair (s, c).
3. The AGV real-time path planning method according to claim 2, wherein, integrating by different target positions and based on the "optimal state-control rate" pairs (s, c) corresponding to each target position, the optimal control trajectories are integrated and classified into different training data sets; the correspondence is characterized as:

    Γ → {D_A, D_B, D_C, ...}

wherein D_A represents an independent data set comprising the "optimal state-control rate" pairs obtained from all optimal control trajectories taking position A as the target state, and D_B and D_C respectively represent the independent data sets summarized from the "optimal state-control rate" pairs obtained from all optimal control trajectories taking position B and position C as the target state.
4. The AGV real-time path planning method based on optimal control and width learning according to claim 3, wherein the constructed width learning network comprises an input layer, a hidden layer and an output layer, wherein the hidden layer comprises feature nodes, enhancement nodes and incremental enhancement nodes;
let S denote the optimal states in the training data set taking a certain position as the target state, and C denote the control rates in that training data set; S is input to the input layer of the width learning network, and n groups of feature node matrices are formed through n groups of feature mappings; let Z_i represent the i-th group of feature nodes, with the n groups of feature node matrices spliced into Z^n = [Z_1, Z_2, ..., Z_n], wherein the i-th group of feature nodes is represented as:

    Z_i = Q(S·W_ei + β_ei), i = 1, 2, ..., n

wherein Q represents a linear or nonlinear activation function, and W_ei and β_ei are randomly initialized weights and biases, respectively; the mapped features feed enhancement nodes with randomly generated weights, and on the basis of the feature node matrix, m groups of enhancement node matrices H^m = [H_1, H_2, ..., H_m] are formed by nonlinear transformation, wherein H_j, representing the j-th group of enhancement nodes, is expressed as:

    H_j = ξ(Z^n·W_hj + β_hj), j = 1, 2, ..., m

wherein ξ represents a nonlinear activation function, and W_hj and β_hj are random weights and biases, respectively; the hidden-layer node matrix is spliced into A^m = [Z^n | H^m], and the output of the width learning network is:

    Ŷ = A^m·W^m

after new enhancement nodes are added while the feature nodes remain unchanged, the hidden layer becomes A^{m+1} = [A^m | ξ(Z^n·W_{m+1} + β_{m+1})], wherein W_{m+1} and β_{m+1} are the new random weights and biases, randomly generated and kept unchanged during training; newly adding W_{m+1} enhances the representation capability of the width learning network, and the finally fixed network structure is fitted to the target output control rate C; the weights between the hidden layer and the output layer are solved by means of a pseudo-inverse matrix and approximated by the ridge regression method, the expression of the pseudo-inverse matrix being:

    (A^m)⁺ = lim_{λ→0} (λI + (A^m)ᵀA^m)⁻¹(A^m)ᵀ

then W^m = (A^m)⁺·C.
5. The method for planning a real-time path of an AGV based on optimal control and width learning according to claim 4, wherein the width learning network is incrementally trained by adding enhancement nodes, the newly added group of enhancement nodes being expressed as:

    H_{m+1} = ξ(Z^n·W_{h,m+1} + β_{h,m+1})

wherein W_{h,m+1} represents the (m+1)-th group of random weights and β_{h,m+1} represents the (m+1)-th group of biases; the hidden layer is then expressed as A^{m+1} = [A^m | H_{m+1}], and the pseudo-inverse matrix, changed by the newly added enhancement nodes, is expressed as (with [ · ; · ] denoting vertical stacking):

    (A^{m+1})⁺ = [ (A^m)⁺ − D·Bᵀ ; Bᵀ ]

wherein

    D = (A^m)⁺·H_{m+1},  R = H_{m+1} − A^m·D
    Bᵀ = R⁺ if R ≠ 0;  Bᵀ = (I + DᵀD)⁻¹Dᵀ(A^m)⁺ if R = 0

after the enhancement nodes are added, the weight matrix of the mapping relation from optimal state to optimal control is expressed as:

    W^{m+1} = [ W^m − D·Bᵀ·C ; Bᵀ·C ]

at this time, the actual output of the width learning network output layer is:

    Ŷ = A^{m+1}·W^{m+1}

the error between the actual output Ŷ and the control rate C in the training data set (s, C) is calculated as:

    E = ‖Ŷ − C‖_F

wherein ‖·‖_F denotes the Frobenius norm; if the error does not meet the threshold, incremental training of the width learning network continues by adding enhancement nodes; when the error meets the threshold, adding enhancement nodes stops, and the width learning network model at that moment is saved.
6. The optimal control and width learning based AGV real time path planning method according to claim 5 wherein the training dataset is normalized before being used for the width learning network training.
7. The method for planning a real-time path of an AGV based on optimal control and width learning according to claim 6, wherein after the optimal state S and the control rate C in the training data set taking a certain position as the target state are input into the width learning network and training is completed, the result output by the width learning network is subjected to inverse normalization.
8. The method for planning the real-time path of the AGV based on optimal control and width learning according to claim 7, wherein after the optimal state S and the control rate C in the training data set taking a certain position as the target state are input into the width learning network to complete training, the weight information of the hidden layer of the current width learning network is stored, and the optimal states and control rates in the training data sets of the other target positions are extracted for training, until the training data sets corresponding to all target positions have been traversed and the training of the width learning networks corresponding one-to-one to the plurality of target positions is completed.
CN202111482549.5A 2021-12-06 2021-12-06 AGV real-time path planning method based on optimal control and width learning Active CN114200936B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111482549.5A CN114200936B (en) 2021-12-06 2021-12-06 AGV real-time path planning method based on optimal control and width learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111482549.5A CN114200936B (en) 2021-12-06 2021-12-06 AGV real-time path planning method based on optimal control and width learning

Publications (2)

Publication Number Publication Date
CN114200936A CN114200936A (en) 2022-03-18
CN114200936B (en) 2023-06-13

Family

ID=80650869

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111482549.5A Active CN114200936B (en) 2021-12-06 2021-12-06 AGV real-time path planning method based on optimal control and width learning

Country Status (1)

Country Link
CN (1) CN114200936B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117391575B (en) * 2023-12-08 2024-03-22 青岛盈智科技有限公司 Truck transportation route planning method based on path analysis

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635245A (en) * 2018-09-29 2019-04-16 中国矿业大学 A kind of robust width learning system
CN109884886B (en) * 2019-03-29 2021-09-28 大连海事大学 Ship motion model-free adaptive optimal control method based on width learning
CN111880405B (en) * 2020-07-03 2022-06-14 广东工业大学 AGV self-adaptive path planning real-time control method in flexible manufacturing workshop system

Also Published As

Publication number Publication date
CN114200936A (en) 2022-03-18

Similar Documents

Publication Publication Date Title
CN111780777B (en) Unmanned vehicle route planning method based on improved A-star algorithm and deep reinforcement learning
Li et al. Prescribed performance concurrent control of connected vehicles with nonlinear third-order dynamics
Zhao et al. A novel direct trajectory planning approach based on generative adversarial networks and rapidly-exploring random tree
Orozco-Rosas et al. Mobile robot path planning using a QAPF learning algorithm for known and unknown environments
CN109491389A (en) A kind of robot trace tracking method with constraint of velocity
CN115683145A (en) Automatic driving safety obstacle avoidance method based on track prediction
CN111880405B (en) AGV self-adaptive path planning real-time control method in flexible manufacturing workshop system
Li et al. Navigation of mobile robots based on deep reinforcement learning: Reward function optimization and knowledge transfer
Han Automatic parking path planning based on ant colony optimization and the grid method
CN114200936B (en) AGV real-time path planning method based on optimal control and width learning
CN113391633A (en) Urban environment-oriented mobile robot fusion path planning method
Xu et al. Model predictive control-based path tracking control for automatic guided vehicles
Zhang et al. Hybrid path planning model for multiple robots considering obstacle avoidance
CN113959446B (en) Autonomous logistics transportation navigation method for robot based on neural network
CN115903894A (en) Unmanned aerial vehicle trajectory planning and tracking control method based on improved AAPF-IRRT algorithm
Xu et al. Path Planning for Autonomous Articulated Vehicle Based on Improved Goal‐Directed Rapid‐Exploring Random Tree
Yu et al. Hierarchical reinforcement learning combined with motion primitives for automated overtaking
Li et al. Path planning for intelligent vehicles based on improved D* Lite
Chen et al. Path tracking controller design of automated parking systems via NMPC with an instructible solution
Xie et al. A distributed multi-agent formation control method based on deep Q learning
CN115061470A (en) Unmanned vehicle improved TEB navigation method suitable for narrow space
CN114559439A (en) Intelligent obstacle avoidance control method and device for mobile robot and electronic equipment
Lu et al. A Two-layer MPC Approach for Consistent Planning and Control of Autonomous Vehicles
Yang et al. Automatic control method of driving direction of unmanned ground vehicle based on association rules
Lian et al. A Full Transfer Learning LSTM-Based Fractional Order Optimization Method of GM ($ r $, 2) for Inferring Driving Intention

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant