CN114200936B - AGV real-time path planning method based on optimal control and width learning - Google Patents
AGV real-time path planning method based on optimal control and width learning
- Publication number
- CN114200936B CN202111482549.5A
- Authority
- CN
- China
- Prior art keywords
- agv
- optimal
- width learning
- state
- optimal control
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05D—SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
- G05D1/00—Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
- G05D1/02—Control of position or course in two dimensions
- G05D1/021—Control of position or course in two dimensions specially adapted to land vehicles
- G05D1/0212—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
- G05D1/0221—Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention provides an AGV real-time path planning method based on optimal control and width learning, which relates to the technical field of AGV path planning.
Description
Technical Field
The invention relates to the technical field of AGV path planning, in particular to an AGV real-time path planning method based on optimal control and width learning.
Background
An Automated Guided Vehicle (AGV) is a transport vehicle equipped with an electromagnetic or optical automatic guidance device. It can travel along a prescribed guide path, provides safety protection and various load-transfer functions, and is a key piece of equipment in flexible production systems, playing an important role in automated material handling and intelligent warehousing.
Currently, dynamic and flexible manufacturing environments pose many challenges to shop-floor AGV path planning and real-time control. By control mode and degree of autonomy, AGVs can be roughly divided into remote-controlled, semi-autonomous and autonomous types. Magnetic-track navigation was the earliest path planning method adopted by AGVs and is still used in most current AGV path planning: the AGV determines its travel route by identifying a magnetic track laid on the ground, but this method is limited by the inflexibility of the track, and extending a path is relatively complex. Vision and two-dimensional-code navigation is another mode applied in the AGV field: the AGV determines its position and travel route by reading unique two-dimensional codes pasted on the ground at intervals. Compared with magnetic-track navigation, this mode is more flexible and easier to schedule, but the markers wear easily and the requirements on ambient light are high. In laser SLAM navigation, the AGV emits laser signals and determines its position from the signals reflected by reflectors mounted on walls or columns; this overcomes the defects of the two preceding modes, but suffers from long mapping time and high cost, and sees little market application.
With the development of deep learning, methods applying deep learning to AGV path planning have emerged. One prior-art scheme discloses an AGV path planning method and system based on reinforcement learning: an AGV dynamics model is first constructed; the AGV is then treated as an agent, with the environmental information it perceives while driving taken as state information; a state space accounting for destination positions and obstacle positions is designed, together with a continuous action space and a multiple-reward mechanism, completing the Markov decision process modeling of path planning. In that scheme, any starting point, target point and obstacle position can be given within the state space, so generalizability is high; an Actor-Critic framework is then introduced for policy learning and training, avoiding heavy computation during online operation, so the computing-power requirement is low and real-time decision control of the AGV with respect to arbitrary targets and obstacles is achieved. However, because that scheme involves a long trial-and-error learning process between the AGV and the environment, convergence is slow and relatively time-consuming, and the training process involves careful reward-function design, neural network architecture design and the like.
Disclosure of Invention
To solve the problems that, in existing deep-learning-based AGV path planning methods, tuning the parameters of a deep neural network is complex and the training process is slow, the invention provides an AGV real-time path planning method based on optimal control and width learning. It requires no parameter-tuning work dependent on strong human priors, the offline training of the width learning network is efficient and low in time consumption, and it offers a basis for further extension to future large-scale workshop AGV formation and obstacle-avoidance applications.
In order to achieve the above technical effects, the technical scheme of the invention is as follows:
an AGV real-time path planning method based on optimal control and width learning, comprising the following steps:
s1, constructing an AGV dynamics model;
s2, taking an AGV dynamic model as dynamic constraint and taking time-burnup optimization as an objective function to establish an optimal control model;
s3, randomly generating an AGV initial point, taking the initial point as an initial condition for solving an optimal control model, and solving the optimal control model in an offline forward direction to generate a plurality of optimal control tracks with different initial points, wherein the optimal control tracks comprise 'optimal state-control rate' pairs;
s4, constructing a width learning network, integrating and classifying the optimal control track into different training data sets according to target positions based on an optimal state-control rate pair, and performing incremental offline training on the width learning network;
s5, after the offline training is completed, weight parameters of the width learning network are determined, and the trained width learning network is used as a real-time controller for real-time planning of the AGV path.
In this technical scheme, an AGV dynamics model is first constructed; with this model as the dynamic constraint and time-fuel consumption optimization as the objective function, an optimal control model is established and solved. The numerical solution of the optimal control model is optimal but computationally complex, so it is unsuitable for real-time computation; the model is therefore solved forward offline to generate a number of optimal control trajectories with different starting points. Since offline optimization alone can hardly achieve real-time optimal control for path planning, and to avoid the lag caused by solving the optimization offline, a width learning network is introduced: taking the different target points as the classification basis, the optimal control trajectories are integrated and classified into training data sets for the different navigation tasks, and the width learning network is trained incrementally to obtain the network finally used for AGV real-time path planning, realizing real-time optimal point-to-point control of the AGV from an arbitrary starting point within a certain range.
Preferably, the AGV dynamics model described in step S1 is:
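The model itself is rendered as an image in the published text. A plausible reconstruction, assuming the standard two-degree-of-freedom kinematic bicycle model implied by the state variables s(t) = [x(t), y(t), v(t), φ(t), θ(t)] and the control rate c = [α(t), ω(t)] defined below, is:

$$
\begin{aligned}
\dot{x}(t) &= v(t)\cos\theta(t), & \dot{y}(t) &= v(t)\sin\theta(t),\\
\dot{v}(t) &= \alpha(t), & \dot{\varphi}(t) &= \omega(t),\\
\dot{\theta}(t) &= \frac{v(t)\tan\varphi(t)}{L_w} &&
\end{aligned}
$$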
wherein t is a time variable, t ∈ [0, t_f], with t_f the moment corresponding to the end state; x(t) and y(t) denote the abscissa and ordinate of the position of the AGV's center at time t, and P = (x, y) denotes the position coordinate of the AGV center; θ(t) denotes the azimuth angle between the AGV and the target position at time t, φ(t) denotes the steering angle at time t, and α(t) denotes the acceleration in the azimuth direction at time t; ω(t) denotes the angular velocity, and L_w denotes the track length of the AGV.
Preferably, in step S2, the expression of the optimal control model, established with the AGV dynamics model as the dynamic constraint and time-fuel consumption optimization as the objective function, is:
objective function:
constraint conditions:
wherein J_1 denotes the fuel consumption; the weighting coefficient balances the relative importance of time optimization and fuel-consumption optimization in the optimization objective; e_i denotes a path constraint designed from the known obstacle locations, satisfying:

where i = 1, ..., N, with each constraint defined from the position coordinate of the i-th obstacle at time t, the corresponding obstacle radius r_i, and a collision early-warning threshold k; the transformed path constraint uses ε, a positive number approaching 0; the state equation denotes the AGV dynamics model, in which s(t) = [x(t), y(t), v(t), φ(t), θ(t)] are the state variables and c = [α(t), ω(t)] is the control rate; Bound(s(t), c(t)) denotes the boundary-value constraints of the AGV; s(t_0) and s(t_f) denote the given initial state and final state of the AGV.
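The objective function and constraint set are likewise rendered as images. One plausible weighted time-fuel form consistent with the surrounding definitions (the weight w and the quadratic fuel term are assumptions, not taken from the patent) is:

$$
\min_{c(\cdot)}\ J = w\,t_f + (1-w)\,J_1,\qquad J_1=\int_{t_0}^{t_f}\big(\alpha(t)^2+\omega(t)^2\big)\,\mathrm{d}t,
$$

subject to the dynamics constraint, the obstacle constraints e_i(t) ≥ 0 for i = 1, ..., N, the boundary-value constraints, and the given s(t_0) and s(t_f).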
Preferably, the randomly generated initial point of the AGV in step S3 is expressed as:
s(t_0) = [x_random, y_random, 0, 0, 0]
The initial point is taken as the initial condition for solving the optimal control model, which is solved forward offline; the solution method is not limited to the pseudo-spectral method or the shooting (targeting) method among the direct methods of optimal control. The state variable S(τ) and the control rate C(τ) are first interpolated; the interpolation, used only to obtain intermediate-point values for the solution computation, is not limited to Lagrange interpolation. Finally, a number of optimal control trajectories with different starting points are generated, expressed as:
wherein the resulting set of optimal control trajectories consists of samples (s_t, c_t), each of which forms an 'optimal state-control rate' pair (s, c).
Preferably, the optimal control trajectories are integrated and classified into different training data sets according to the target positions, based on the 'optimal state-control rate' pairs (s, c) corresponding to each target position; the correspondence is characterized as:

wherein each independent data set comprises the 'optimal state-control rate' pairs obtained from all optimal control trajectories whose target state is a given position (for instance the A position), and the data sets of the other target positions are defined in the same way.
Preferably, the constructed width learning network comprises an input layer, a hidden layer and an output layer, wherein the hidden layer comprises characteristic nodes, enhancement nodes and incremental enhancement nodes;
Let S denote the optimal states in the training data set whose target state is a certain position, and C the control rates in that training data set. S is input to the input layer of the width learning network and passes through n groups of feature mappings to form n groups of feature-node matrices. Let Z_i denote the i-th group of feature nodes; the n groups of feature-node matrices are concatenated as Z^n = [Z_1, Z_2, ..., Z_n], wherein the i-th group of feature nodes is expressed as:
Z_i = Q(S W_ei + β_ei), i = 1, 2, ..., n
wherein Q denotes a linear or nonlinear activation function, and W_ei and β_ei are randomly initialized weights and biases, respectively; the mapped features feed enhancement nodes with randomly generated weights, and on the basis of the feature-node matrix, m groups of enhancement-node matrices H^m = [H_1, H_2, ..., H_m] are formed by nonlinear transformation, wherein H_j denotes the j-th group of enhancement nodes, expressed as:
H_j = ξ(Z^n W_hj + β_hj), j = 1, 2, ..., m
wherein ξ denotes a nonlinear activation function, and W_hj and β_hj are random weights and biases, respectively; the hidden-layer node matrix is concatenated as A^m = [Z^n | H^m], and the output of the width learning network is:
After new enhancement nodes are added with the feature nodes unchanged, the hidden layer becomes A^{m+1} = [A^m | ξ(Z^n W_{m+1} + β_{m+1})], where W_{m+1} and β_{m+1} are the new random weights and biases, randomly generated and kept unchanged during training. Newly adding W_{m+1} enhances the representation capacity of the width learning network. The finally fixed network structure is fitted to the target output control rate C, and the weights between the hidden layer and the output layer are solved by means of the pseudo-inverse matrix, approximated by a ridge regression method; the expression of the pseudo-inverse matrix is as follows:
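The pseudo-inverse expression is an image in the source; the standard ridge-regression approximation used in broad learning systems, which the surrounding text appears to describe, is:

$$
(A^m)^{+}=\lim_{\lambda\to 0}\left(\lambda I+(A^m)^{\mathsf{T}}A^m\right)^{-1}(A^m)^{\mathsf{T}}
$$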
Then W^m = (A^m)^+ C.
Preferably, the width learning network is incrementally trained by adding enhancement nodes, the newly added group of enhancement nodes being denoted H_{m+1} and expressed as:

the hidden layer is then expressed as: A^{m+1} = [A^m | H_{m+1}], and the pseudo-inverse changed by the newly added enhancement nodes is expressed as:
wherein, after the enhancement nodes are added, the weight matrix of the mapping relation from optimal state to optimal control is expressed as:
at this time, the actual output of the width learning network output layer is:
wherein ‖·‖_F denotes the Frobenius norm used to measure the training error between the actual output and the target output; if the error does not meet the threshold, incremental training of the width learning network continues by adding enhancement nodes; when the error meets the threshold, the addition of enhancement nodes stops and the width learning network model at that moment is saved.
The training of the width learning network involves no parameter-tuning work dependent on strong human priors and no slow gradient-based optimization until an objective function converges; the network parameter matrix is obtained simply by solving the pseudo-inverse matrix with an iterative incremental learning method, so the time consumption is low.
Preferably, the training data set is normalized before being used for width learning network training.
Preferably, after the optimal state S and control rate C of the training data set whose target state is a certain position are input into the width learning network to complete training, the result output by the width learning network needs inverse normalization.
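A minimal sketch of the normalization and inverse normalization described here, in Z-score form (function names are illustrative, not from the patent):

```python
import numpy as np

def zscore_fit(data):
    """Per-feature mean and standard deviation for Z-score normalization."""
    return data.mean(axis=0), data.std(axis=0) + 1e-12

def zscore(data, mu, sigma):
    """Normalize training data before it enters the width learning network."""
    return (data - mu) / sigma

def zscore_inverse(output, mu, sigma):
    """Map the network output back to a physically meaningful control rate."""
    return output * sigma + mu
```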
Preferably, after training is completed with the optimal state S and control rate C of the training data set targeting a certain position, the hidden-layer weights of the current width learning network are saved, and the optimal states and control rates of the training data sets targeting the other positions are extracted for training in turn, until the training data sets corresponding to all target positions have been traversed and the training of the width learning networks corresponding one-to-one to the target positions is completed.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
compared with the traditional method of obtaining the path track by only directly solving the optimal control model offline, the AGV implementation path planning method based on the optimal control and the width learning provided by the invention has the advantages that the defect that the real-time optimal control cannot be realized by offline optimization to obtain the optimal track is emphasized, in order to avoid the hysteresis caused by offline optimization solution, the width learning network is introduced, different target points are used as classification basis, the optimal control track is classified into training data sets of different navigation tasks, the width learning network is incrementally trained, the width learning network which is finally used for AGV real-time path planning is obtained, the real-time optimal control of the AGV at any initial point within a certain range is realized, and when the width learning network is used for online operation, the problem of large calculation amount does not exist because the control rate prediction only relates to simple matrix operation, and the real-time performance of path planning is ensured. In addition, the method provided by the invention has no problems of abrasion of the mark, difficult path expansion, high environmental requirement and long drawing time, compared with the offline training of the deep neural network, the training of the width learning network has no artificial priori strong parameter adjustment work, has no slow gradient optimization until the objective function converges, and can complete the training by only obtaining the weight parameter matrix through an iterative incremental learning method, thereby having low time consumption.
Drawings
FIG. 1 is a schematic flow chart of an AGV adaptive path planning method based on optimal control and width learning according to embodiment 1 of the present invention;
FIG. 2 is a physical schematic diagram corresponding to the AGV dynamics model proposed in embodiment 1 of the present invention;
fig. 3 is a diagram showing the overall framework of the width learning network proposed in embodiment 1 of the present invention;
FIG. 4 is a schematic diagram showing the AGV path output at different target positions as proposed in embodiment 3 of the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
it will be appreciated by those skilled in the art that some well known descriptions in the figures may be omitted.
The technical scheme of the invention is further described below with reference to the accompanying drawings and examples.
The positional relationship depicted in the drawings is for illustrative purposes only and is not to be construed as limiting the present patent;
example 1
The path planning problem of an AGV can be regarded as one in which the starting point of the AGV is not fixed while the end point (i.e., the coordinate point of the delivery task) is fixed. Conventional path-search methods are usually based on a grid map, and the searched path does not necessarily satisfy the dynamic constraints of the vehicle (the vehicle cannot, for example, move sideways), which brings uncertainty in optimization time and optimization quality to the later path-optimization stage. Therefore, in practical implementation, to guarantee the validity of the path planning result, the dynamics model of the AGV is considered as a constraint in the planning from the beginning; the AGV path planning problem is thereby converted into a path planning problem with fixed start and end points, so that the optimal control method for solving the two-point boundary value problem can be used for forward solution.
Specifically, referring to fig. 1, the present embodiment proposes an AGV real-time path planning method based on optimal control and width learning, where the method includes the following steps:
s1, constructing an AGV dynamics model;
s2, taking an AGV dynamic model as dynamic constraint and taking time-burnup optimization as an objective function to establish an optimal control model;
s3, randomly generating an AGV initial point, taking the initial point as an initial condition for solving an optimal control model, and solving the optimal control model in an offline forward direction to generate a plurality of optimal control tracks with different initial points, wherein the optimal control tracks comprise 'optimal state-control rate' pairs;
s4, constructing a width learning network, integrating and classifying the optimal control track into different training data sets according to target positions based on an optimal state-control rate pair, and performing incremental offline training on the width learning network;
s5, after the offline training is completed, weight parameters of the width learning network are determined, and the trained width learning network is used as a real-time controller for real-time planning of the AGV path.
In this embodiment, based on Newtonian classical mechanics and according to the general properties of actual, already-deployed AGVs, a two-degree-of-freedom vehicle model is used to model the kinematics of the AGV; with reference to fig. 2, the dynamics model of the AGV is expressed as:
wherein t is a time variable, t ∈ [0, t_f], with t_f the moment corresponding to the end state; x(t) and y(t) denote the abscissa and ordinate of the position of the AGV's center at time t, and P = (x, y) denotes the position coordinate of the AGV center; θ(t) denotes the azimuth angle between the AGV and the target position at time t, φ(t) denotes the steering angle at time t, and α(t) denotes the acceleration in the azimuth direction at time t; ω(t) denotes the angular velocity, and L_w denotes the track length of the AGV.
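For illustration, a minimal forward-simulation sketch of such a two-degree-of-freedom model (assuming the standard kinematic bicycle form, since the patent's equations are rendered as images; the wheelbase value is arbitrary):

```python
import numpy as np

L_W = 1.0  # track/wheelbase length L_w of the AGV (metres, illustrative)

def agv_dynamics(s, c):
    """Time derivative of the state s = [x, y, v, phi, theta] under control c = [alpha, omega]."""
    x, y, v, phi, theta = s
    alpha, omega = c
    return np.array([
        v * np.cos(theta),      # x-velocity
        v * np.sin(theta),      # y-velocity
        alpha,                  # longitudinal acceleration
        omega,                  # steering-angle rate
        v * np.tan(phi) / L_W,  # heading (azimuth) rate
    ])

def step(s, c, dt=0.01):
    """One forward-Euler integration step."""
    return s + dt * agv_dynamics(s, c)
```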
Taking the AGV dynamics model as the dynamic constraint, smoothing the collision constraint, and taking time-fuel consumption optimization as the objective function, the trajectory planning of the AGV is converted into the solution of an optimal control problem with the corresponding constraints; the established optimal control model expression is:
objective function (an optimization objective with time-fuel consumption optimization):
constraint conditions:
wherein J_1 denotes the fuel consumption; the weighting coefficient balances the relative importance of time optimization and fuel-consumption optimization in the optimization objective; e_i denotes a path constraint designed from the known obstacle locations, satisfying:

where i = 1, ..., N, with each constraint defined from the position coordinate of the i-th obstacle at time t, the corresponding obstacle radius r_i, and a collision early-warning threshold k; the transformed path constraint uses ε, a positive number approaching 0; the state equation denotes the AGV dynamics model, in which s(t) = [x(t), y(t), v(t), φ(t), θ(t)] are the state variables and c = [α(t), ω(t)] is the control rate; Bound(s(t), c(t)) denotes the boundary-value constraints of the AGV; s(t_0) and s(t_f) denote the given initial state and final state of the AGV.
Because the AGV operates in a material-handling scene, this is an optimal control problem with a fixed end state: each required handling destination is taken independently as the end state for iterative computation, and the method for solving the optimal control model is not limited to any one of the direct methods of optimal control. Specifically, let the randomly generated AGV initial point be:
s(t_0) = [x_random, y_random, 0, 0, 0]
The initial point is taken as the initial condition for solving the optimal control model, which is solved forward offline; the solution method is not limited to the pseudo-spectral method or the shooting (targeting) method among the direct methods of optimal control. First, the state variable S(τ) and the control rate C(τ) are interpolated for the solution computation; the interpolation method is not limited to Lagrange interpolation. The process is as follows:
Finally, a number of optimal control trajectories with different starting points are generated, expressed as:
wherein the resulting set of optimal control trajectories consists of samples (s_t, c_t), each of which forms an 'optimal state-control rate' pair (s, c).
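A sketch of this offline data-generation loop (the `solve_ocp` callable is a placeholder for whichever direct method is used, e.g. a pseudo-spectral solver; it is not an API defined by the patent):

```python
import numpy as np

def generate_dataset(target, n_trajectories, solve_ocp, rng=np.random.default_rng(0)):
    """Solve the optimal control problem forward from random initial points toward one
    fixed target, collecting 'optimal state-control rate' pairs (s, c) for that target."""
    pairs = []
    for _ in range(n_trajectories):
        # random initial position; zero velocity, steering angle and azimuth
        s0 = np.array([rng.uniform(0.0, 10.0), rng.uniform(0.0, 10.0), 0.0, 0.0, 0.0])
        states, controls = solve_ocp(s0, target)  # offline forward solution of one trajectory
        pairs.extend(zip(states, controls))       # interpolated (s_t, c_t) samples
    S = np.array([p[0] for p in pairs])
    C = np.array([p[1] for p in pairs])
    return S, C  # one independent training data set per target position
```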
Many trajectory optimization problems in actual engineering, such as spacecraft trajectory optimization and unmanned-vehicle trajectory optimization, are two-point boundary value problems and can be formulated as optimal control problems. Because of the high-order nonlinearity of the model and the complex path constraints, solving them directly is relatively involved, and generally only numerical solutions can be obtained. Classical optimal control theory, based on the calculus of variations and Pontryagin's extremum principle, converts the optimal trajectory design problem into a Hamilton-Jacobi-Bellman equation and a two-point boundary value problem, after which the controlled object tracks the designed trajectory. This approach is commonly used for optimal control problems whose performance metrics are defined as maximizing search area, minimizing time consumption, minimizing fuel consumption, minimizing end-state error, and so on. The main idea of the classical optimal control method is to track a pre-designed optimal trajectory according to the real state; however, an important problem with this strategy is that, since the model is a nonlinear differential equation, the differential equation must be solved within the optimization process, and for scenes where the model is complex, an analytical solution is difficult to obtain.
On the premise of offline solution, and considering that offline optimization can hardly achieve the goal of an optimal trajectory through real-time optimal control, a width learning network is introduced to avoid the lag caused by solving the optimization offline. Before being formally used for the width learning network, the optimal control trajectories are integrated and classified into training data sets for different navigation tasks, with the different target points as the classification basis. Specifically:
The optimal control trajectories are integrated according to the different target positions and, based on the 'optimal state-control rate' pairs (s, c) corresponding to each target position, classified into different training data sets; the correspondence is characterized as:

wherein each independent data set comprises the 'optimal state-control rate' pairs obtained from all optimal control trajectories whose target state is a given position (for instance the A position), and the data sets of the other target positions are defined in the same way.
In this embodiment, referring to fig. 3, the constructed width learning network comprises an input layer, a hidden layer and an output layer, where the hidden layer comprises feature nodes, enhancement nodes and incremental enhancement nodes. The structure is formed by introducing the idea of incremental learning on the basis of a broad learning system; the new structure can iteratively improve the feature-extraction capability of the model and increase its expressive capacity, so the fitting performance of the model improves, and by using these updated weights the width learning network can quickly learn to approach the actual law. The specific process is as follows:
Let S denote the optimal states in the training data set whose target state is a certain position, and C the control rates in that training data set. S is input to the input layer of the width learning network and passes through n groups of feature mappings to form n groups of feature-node matrices. Let Z_i denote the i-th group of feature nodes; the n groups of feature-node matrices are concatenated as Z^n = [Z_1, Z_2, ..., Z_n]. To obtain a sparse representation of the input data during feature mapping, the weights between the input layer and the hidden layer can be adjusted by a sparse auto-encoding technique, with the optimal weights selected automatically in the decoding process. The i-th group of feature nodes is expressed as:
Z_i = Q(S W_ei + β_ei), i = 1, 2, ..., n
wherein Q denotes a linear or nonlinear activation function, and W_ei and β_ei are randomly initialized weights and biases, respectively; the mapped features feed enhancement nodes with randomly generated weights: on the basis of the feature-node matrix, m groups of enhancement-node matrices H^m = [H_1, H_2, ..., H_m] are formed by nonlinear transformation, wherein H_j denotes the j-th group of enhancement nodes, expressed as:
H_j = ξ(Z^n W_hj + β_hj), j = 1, 2, ..., m
wherein ξ denotes a nonlinear activation function, and W_hj and β_hj are random weights and biases, respectively; the hidden-layer node matrix is concatenated as A^m = [Z^n | H^m], and the output of the width learning network is:
After new enhancement nodes are added with the feature nodes unchanged, the hidden layer becomes A^{m+1} = [A^m | ξ(Z^n W_{m+1} + β_{m+1})], where W_{m+1} and β_{m+1} are the new random weights and biases, randomly generated and kept unchanged during training. Newly adding W_{m+1} enhances the representation capacity of the width learning network. The finally fixed network structure is fitted to the target output control rate C, and the weights between the hidden layer and the output layer are solved by means of the pseudo-inverse matrix, approximated by a ridge regression method; the expression of the pseudo-inverse matrix is as follows:
Then W^m = (A^m)^+ C.
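A concrete sketch of this construction and of solving the output weights (illustrative only: tanh activations, Gaussian random weights and a small fixed ridge parameter are assumptions, not specified by the patent):

```python
import numpy as np

rng = np.random.default_rng(42)

def build_hidden(S, n_feat_groups, n_enh_groups, k=10):
    """Map input S to feature nodes Z and enhancement nodes H; return A = [Z | H]."""
    Zs, feat_w = [], []
    for _ in range(n_feat_groups):
        We, be = rng.normal(size=(S.shape[1], k)), rng.normal(size=k)
        Zs.append(np.tanh(S @ We + be))   # Z_i = Q(S W_ei + beta_ei)
        feat_w.append((We, be))
    Z = np.hstack(Zs)                     # Z^n = [Z_1, ..., Z_n]
    Hs, enh_w = [], []
    for _ in range(n_enh_groups):
        Wh, bh = rng.normal(size=(Z.shape[1], k)), rng.normal(size=k)
        Hs.append(np.tanh(Z @ Wh + bh))   # H_j = xi(Z^n W_hj + beta_hj)
        enh_w.append((Wh, bh))
    A = np.hstack([Z] + Hs)               # hidden layer A^m = [Z^n | H^m]
    return A, Z, feat_w, enh_w

def fit_output_weights(A, C, lam=1e-6):
    """Ridge-regression approximation of W^m = (A^m)^+ C."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ C)
```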
To give the width network better fitting performance, the width learning network is incrementally trained by adding enhancement nodes, the newly added group of enhancement nodes being denoted H_{m+1} and expressed as:

The hidden layer is then expressed as A^{m+1} = [A^m | H_{m+1}]; because of the newly added enhancement nodes, the changed pseudo-inverse matrix is expressed as:
wherein, after the enhancement nodes are added, the weight matrix of the mapping relation from optimal state to optimal control is expressed as:
at this time, the actual output of the width learning network output layer is:
wherein ‖·‖_F denotes the Frobenius norm used to measure the training error; if the error does not meet the threshold, incremental training of the width learning network continues by adding enhancement nodes; when the error meets the threshold, the addition of enhancement nodes stops and the width learning network model at that moment is saved. The training of the width learning network involves no parameter-tuning work dependent on strong human priors and no slow gradient optimization until an objective function converges; the network parameter matrix is obtained by solving the pseudo-inverse matrix with an iterative incremental learning method, so the time consumption is low.
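A sketch of this incremental step, using the standard broad-learning pseudo-inverse update for appended enhancement-node columns (a reconstruction under that assumption; the patent's own update formulas are rendered as images):

```python
import numpy as np

def add_enhancement_nodes(A, A_pinv, W, C, Z, k=10, rng=np.random.default_rng(7)):
    """Append one group of enhancement nodes H_{m+1} and update the pseudo-inverse and
    output weights incrementally, without retraining the network from scratch."""
    Wh, bh = rng.normal(size=(Z.shape[1], k)), rng.normal(size=k)
    H_new = np.tanh(Z @ Wh + bh)                 # H_{m+1}
    D = A_pinv @ H_new
    R = H_new - A @ D                            # residual of the new columns
    if np.linalg.norm(R) > 1e-10:
        B = np.linalg.pinv(R)
    else:
        B = np.linalg.solve(np.eye(D.shape[1]) + D.T @ D, D.T) @ A_pinv
    A_pinv_new = np.vstack([A_pinv - D @ B, B])  # (A^{m+1})^+
    W_new = np.vstack([W - D @ (B @ C), B @ C])  # updated output weight matrix
    return np.hstack([A, H_new]), A_pinv_new, W_new
```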
After the optimal state S and control rate C of the training data set whose target state is a certain position are input into the width learning network and training is completed, the hidden-layer weights of the current width learning network are saved, and the optimal states and control rates of the training data sets targeting the other positions are extracted for training in turn, until the training data sets corresponding to all target positions have been traversed and the training of the width learning networks corresponding one-to-one to the several target positions is completed.
Example 2
In this example, in addition to the training described in example 1, the training data set is normalized before being used for width learning network training, because the magnitudes of the data are not within one order of magnitude; the normalization includes but is not limited to max-min normalization, Z-score normalization and function transformation.
After the optimal state S and the control rate C of the training data set with a certain position as the target state are input into the width learning network and training is completed, the result output by the width learning network must be inverse-normalized, and is finally used as the control rate conforming to the physical meaning.
Example 3
In this embodiment, on the basis of embodiments 1 and 2, the focus is on extending the AGV path planning to an arbitrary initial point and to destinations with different end states. With different initial points obtained for a certain fixed transport end point, a width learning network is trained independently on the 'optimal state-control rate' pairs (s, c); combined with the pseudo-inverse solution of the incremental method, the weight matrix W conforming to the mapping relation from optimal state to optimal control can be learned quickly, i.e., the optimal control problem with a varying initial point is solved, and the initial point in online optimal control is generalized to any point in the set area.
The number of enhancement nodes M and the number of newly added enhancement nodes can be selected by trading off computing power against prediction accuracy in the specific scene.
For the different carrying target points, i.e., end states, in an actual handling scene, the optimal control problem only needs to be solved forward repeatedly with each end state; the width learning network weight parameters corresponding to the different end states are learned separately and then stored. Referring to fig. 4, real-time optimal path planning control for different end states (such as target A and a different target Z in fig. 4) is realized by calling each trained width learning network.
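At run time, predicting a control rate involves only the simple matrix operations noted above; a minimal sketch of such an online controller (class and parameter names are illustrative):

```python
import numpy as np

class WidthLearningController:
    """Real-time controller: feeds the current AGV state through the trained width
    learning network of the active target and returns a control rate [alpha, omega]."""
    def __init__(self, feat_w, enh_w, W, s_stats, c_stats):
        self.feat_w, self.enh_w, self.W = feat_w, enh_w, W
        self.s_mu, self.s_sigma = s_stats   # state normalization statistics
        self.c_mu, self.c_sigma = c_stats   # control normalization statistics

    def control(self, s):
        s = (np.asarray(s) - self.s_mu) / self.s_sigma                   # normalize state
        Z = np.hstack([np.tanh(s @ We + be) for We, be in self.feat_w])  # feature nodes
        H = np.hstack([np.tanh(Z @ Wh + bh) for Wh, bh in self.enh_w])   # enhancement nodes
        c = np.hstack([Z, H]) @ self.W                                   # matrix ops only
        return c * self.c_sigma + self.c_mu                              # inverse normalize
```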
It is to be understood that the above examples of the present invention are provided by way of illustration only and are not intended to limit the scope of the invention. Other variations or modifications of the above teachings will be apparent to those of ordinary skill in the art; it is neither necessary nor possible to exhaustively list all embodiments here. Any modification, equivalent replacement, improvement, etc. within the spirit and principles of the invention is intended to be protected by the following claims.
Claims (8)
1. An AGV real-time path planning method based on optimal control and width learning is characterized by comprising the following steps:
s1, constructing an AGV dynamics model;
the AGV dynamics model described in the step S1 is as follows:
wherein t is a time variable, t ∈ [0, t_f], with t_f the moment corresponding to the end state; x(t) and y(t) denote the abscissa and ordinate of the position of the AGV's center at time t, and P = (x, y) denotes the position coordinate of the AGV center; θ(t) denotes the azimuth angle between the AGV and the target position at time t, φ(t) denotes the steering angle at time t, and α(t) denotes the acceleration in the azimuth direction at time t; ω(t) denotes the angular velocity, and L_w denotes the track length of the AGV;
s2, taking an AGV dynamic model as dynamic constraint and taking time-burnup optimization as an objective function to establish an optimal control model;
in the step S2, the AGV dynamics model is taken as the dynamic constraint and time-fuel consumption optimization as the objective function, and the established optimal control model expression is as follows:
objective function:
constraint conditions:
wherein J_1 denotes the fuel consumption; the weighting coefficient balances the relative importance of time optimization and fuel-consumption optimization in the optimization objective; e_i denotes a path constraint designed from the known obstacle locations, satisfying:

where i = 1, ..., N, with each constraint defined from the position coordinate of the i-th obstacle at time t, the corresponding obstacle radius r_i, and a collision early-warning threshold k; the transformed path constraint uses ε, a positive number approaching 0; the state equation denotes the AGV dynamics model, in which s(t) = [x(t), y(t), v(t), φ(t), θ(t)] are the state variables and c = [α(t), ω(t)] is the control rate; Bound(s(t), c(t)) denotes the boundary-value constraints of the AGV; s(t_0) and s(t_f) denote the given initial state and final state of the AGV;
s3, randomly generating an AGV initial point, taking the initial point as an initial condition for solving an optimal control model, and solving the optimal control model in an offline forward direction to generate a plurality of optimal control tracks with different initial points, wherein the optimal control tracks comprise 'optimal state-control rate' pairs;
s4, constructing a width learning network, integrating and classifying the optimal control track into different training data sets according to target positions based on an optimal state-control rate pair, and performing incremental offline training on the width learning network;
s5, after the offline training is completed, weight parameters of the width learning network are determined, and the trained width learning network is used as a real-time controller for real-time planning of the AGV path.
2. The method for planning a real-time path of an AGV based on optimal control and width learning according to claim 1, wherein the randomly generated initial point of the AGV in step S3 is expressed as:
s(t_0) = [x_random, y_random, 0, 0, 0]
taking the initial point as the initial condition for solving the optimal control model, and solving the optimal control model forward offline; firstly interpolating the state variables and control rates for the solution computation, and finally generating a plurality of optimal control trajectories with different starting points, expressed as:
3. The AGV real-time path planning method according to claim 2, wherein the optimal control trajectories are integrated and classified into different training data sets according to the different target positions, based on the 'optimal state-control rate' pair (s, c) corresponding to each target position, the correspondence being characterized as:
wherein the first independent data set comprises the 'optimal state-control rate' pairs obtained from all optimal control trajectories taking the A position as the target state; the independent data sets summarized from the 'optimal state-control rate' pairs obtained from all optimal control trajectories taking the B position and the C position as target states, respectively, are represented in the same way.
4. The AGV real-time path planning method based on optimal control and width learning according to claim 3, wherein the constructed width learning network comprises an input layer, a hidden layer and an output layer, wherein the hidden layer comprises feature nodes, enhancement nodes and incremental enhancement nodes;
let S denote the optimal states in the training data set whose target state is a certain position, and C the control rates in that training data set; S is input to the input layer of the width learning network and then passes through n groups of feature mappings to form n groups of feature-node matrices; let Z_i denote the i-th group of feature nodes, the n groups of feature-node matrices being concatenated as Z^n = [Z_1, Z_2, ..., Z_n], wherein the i-th group of feature nodes is expressed as:
Z_i = Q(S W_ei + β_ei), i = 1, 2, ..., n
wherein Q denotes a linear or nonlinear activation function, and W_ei and β_ei are randomly initialized weights and biases, respectively; the mapped features feed enhancement nodes with randomly generated weights, and on the basis of the feature-node matrix, m groups of enhancement-node matrices H^m = [H_1, H_2, ..., H_m] are formed by nonlinear transformation, wherein H_j denotes the j-th group of enhancement nodes, expressed as:
H_j = ξ(Z^n W_hj + β_hj), j = 1, 2, ..., m
wherein ξ denotes a nonlinear activation function, and W_hj and β_hj are random weights and biases, respectively; the hidden-layer node matrix is concatenated as A^m = [Z^n | H^m], and the output of the width learning network is:
after new enhancement nodes are added with the feature nodes unchanged, the hidden layer becomes A^{m+1} = [A^m | ξ(Z^n W_{m+1} + β_{m+1})], where W_{m+1} and β_{m+1} are the new random weights and biases, randomly generated and kept unchanged during training; newly adding W_{m+1} enhances the representation capacity of the width learning network; the finally fixed network structure is fitted to the target output control rate C, and the weights between the hidden layer and the output layer are solved by means of the pseudo-inverse matrix, approximated by a ridge regression method, with the expression of the pseudo-inverse matrix as follows:
then W^m = (A^m)^+ C.
5. The method for planning a real-time path of an AGV based on optimal control and width learning according to claim 4, wherein the width learning network is incrementally trained by adding enhancement nodes, the newly added enhancement nodes being expressed as:

wherein Z_{m+1} represents the (m+1)-th group of feature nodes, W_{h_{m+1}} represents the h_{m+1}-th group of random weights and β_{h_{m+1}} represents the h_{m+1}-th group of biases; the hidden layer is then expressed as: A^{m+1} = [A^m | H_{m+1}], and the pseudo-inverse changed by the newly added enhancement nodes is expressed as:
wherein, after the enhancement nodes are added, the weight matrix of the mapping relation from optimal state to optimal control is expressed as:
at this time, the actual output of the width learning network output layer is:
wherein ‖·‖_F denotes the Frobenius norm used to measure the training error; if the error does not meet the threshold, incremental training of the width learning network continues by adding enhancement nodes; when the error meets the threshold, the addition of enhancement nodes stops and the width learning network model at that moment is saved.
6. The optimal control and width learning based AGV real time path planning method according to claim 5 wherein the training dataset is normalized before being used for the width learning network training.
7. The method for planning a real-time path of an AGV based on optimal control and width learning according to claim 6, wherein after the training is completed by the width learning network for inputting the optimal state S and the control rate C in the training data set in a state of targeting a certain position, the result outputted by the width learning network is subjected to inverse normalization.
8. The method for planning the real-time path of the AGV based on optimal control and width learning according to claim 7, wherein after the optimal state S and the control rate C of the training data set taking a certain position as the target state are input into the width learning network to complete training, the weight information of the hidden layer of the current width learning network is stored, and the optimal states and control rates of the training data sets taking the other positions as target states are extracted for training, until the training data sets corresponding to all the target positions are traversed and the training of the width learning networks corresponding one-to-one to the plurality of target positions is completed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111482549.5A CN114200936B (en) | 2021-12-06 | 2021-12-06 | AGV real-time path planning method based on optimal control and width learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114200936A CN114200936A (en) | 2022-03-18 |
CN114200936B true CN114200936B (en) | 2023-06-13 |
Family
ID=80650869
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111482549.5A Active CN114200936B (en) | 2021-12-06 | 2021-12-06 | AGV real-time path planning method based on optimal control and width learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114200936B (en) |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109635245A (en) * | 2018-09-29 | 2019-04-16 | 中国矿业大学 | A kind of robust width learning system |
CN109884886B (en) * | 2019-03-29 | 2021-09-28 | 大连海事大学 | Ship motion model-free adaptive optimal control method based on width learning |
CN111880405B (en) * | 2020-07-03 | 2022-06-14 | 广东工业大学 | AGV self-adaptive path planning real-time control method in flexible manufacturing workshop system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |