CN114200936A - AGV real-time path planning method based on optimal control and width learning - Google Patents


Info

Publication number
CN114200936A
Authority
CN
China
Prior art keywords
agv
width learning
optimal control
time
optimal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111482549.5A
Other languages
Chinese (zh)
Other versions
CN114200936B (en)
Inventor
吴宗泽
赖家伦
李嘉俊
任志刚
曾德宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202111482549.5A priority Critical patent/CN114200936B/en
Publication of CN114200936A publication Critical patent/CN114200936A/en
Application granted granted Critical
Publication of CN114200936B publication Critical patent/CN114200936B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05D - SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 - Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 - Control of position or course in two dimensions
    • G05D1/021 - Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212 - Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0221 - Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention provides an AGV real-time path planning method based on optimal control and width learning, in the technical field of AGV path planning. First, an AGV dynamics model is constructed. With the dynamics model as a dynamic constraint and time-fuel consumption optimization as the objective function, an optimal control model is established and solved forward offline, generating optimal control trajectories for a number of different starting points. Because offline optimization cannot deliver real-time optimal control of the trajectory, and to avoid the lag introduced by the offline optimization solution, a width learning network is introduced: taking the different target points as the classification basis, the optimal control trajectories are integrated and classified into training data sets for the different navigation tasks, and the width learning network is trained incrementally. The resulting network is finally used for AGV real-time path planning, achieving real-time optimal control of AGV path planning from any starting point within a certain range.

Description

AGV real-time path planning method based on optimal control and width learning
Technical Field
The invention relates to the technical field of AGV path planning, in particular to an AGV real-time path planning method based on optimal control and width learning.
Background
An Automated Guided Vehicle (AGV) is a transport vehicle equipped with an electromagnetic or optical automatic guidance device. It can travel along a predetermined guide path, carries safety protection and various transfer functions, is a key device of flexible production systems, and plays an important role in automated material transport and intelligent warehousing.
Currently, dynamic and flexible manufacturing environments present many challenges to workshop AGV path planning and real-time control. By control mode and degree of autonomy, AGVs can be roughly divided into remote-controlled, semi-autonomous and autonomous types. Magnetic-track navigation was the first path planning method adopted for AGVs and is still the most widely used: the AGV determines its travel route by sensing magnetic tracks laid on the ground, but the method is limited by the inflexibility of the tracks, and extending a path is relatively complicated. Vision and two-dimensional-code navigation is also commonly applied in the AGV field: the AGV determines its position and route by reading unique two-dimensional codes pasted on the ground at intervals. Compared with magnetic tracks this is more flexible and easier to schedule, but the code labels wear easily and the method places high demands on ambient light. Laser SLAM navigation determines position by having the AGV emit a laser signal that is reflected back by reflectors mounted on walls or columns; it overcomes the defects of the previous two methods, but mapping takes long, the cost is high, and market adoption is still limited.
With the development of deep learning, methods applying deep learning to AGV path planning have emerged. The prior art discloses an AGV path planning method and system based on reinforcement learning. In that method, an AGV dynamics model is first established; the AGV is then treated as an agent, and the environmental information it senses while travelling as state information. A state space accounting for the destination position and obstacle positions is designed, together with a continuous action space and a multi-reward mechanism, completing the Markov process modelling of path planning. The state space admits arbitrary starting points, target points and obstacle positions, so generalization is high; an Actor-Critic framework is then introduced for policy training, the online operation avoids heavy computation, the compute requirement is low, and real-time decision control of the AGV with respect to arbitrary targets and obstacles is achieved. However, because that scheme involves a relatively long trial-and-error learning process between the AGV and the environment, convergence is slow and time-consuming, and the training process depends on artificial experience, such as the design of the reward function and of the neural network structure.
Disclosure of Invention
In order to solve the problems of the existing deep-learning-based AGV path planning methods, namely the complicated tuning of deep neural network parameters and the slow training process, the invention provides an AGV real-time path planning method based on optimal control and width learning. It involves no manual parameter tuning with strong priors, trains the width learning network efficiently offline with low time consumption, and provides a basis for further extension to future large-scale workshop AGV formation and obstacle avoidance applications.
In order to achieve the technical effects, the technical scheme of the invention is as follows:
an AGV real-time path planning method based on optimal control and width learning comprises the following steps:
S1, constructing an AGV dynamics model;
S2, establishing an optimal control model by taking the AGV dynamics model as a dynamic constraint and time-fuel consumption optimization as the objective function;
S3, randomly generating initial points of the AGV, taking the initial points as initial conditions for solving the optimal control model, solving the optimal control model forward offline, and generating optimal control trajectories for a plurality of different initial points, the optimal control trajectories comprising optimal state-control rate pairs;
S4, constructing a width learning network, integrating and classifying the optimal control trajectories into different training data sets by target position on the basis of the optimal state-control rate pairs, and performing incremental offline training of the width learning network;
and S5, after the off-line training is finished, determining the weight parameters of the width learning network, and using the trained width learning network as a real-time controller for the real-time planning of the AGV path.
In this technical scheme, an AGV dynamics model is first constructed; with it as a dynamic constraint and time-fuel consumption optimization as the objective function, an optimal control model is established. The numerical solution of this model is optimal but computationally expensive, so, on the premise that it is unsuitable for real-time computation, the model is solved forward offline to generate optimal control trajectories for a number of different starting points. Since offline optimization cannot achieve real-time optimal control of the trajectory, and to avoid the lag caused by the offline optimization solution, a width learning network is introduced. Taking the different target points as the classification basis, the optimal control trajectories are integrated into training data sets for the different navigation tasks and the width learning network is trained incrementally, yielding the network finally used for AGV real-time path planning. This realizes real-time optimal control of the AGV from any initial point within a certain range; when the width learning network runs online, control-rate prediction involves only simple matrix operations, so there is no heavy computation and real-time performance is guaranteed.
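The offline data-generation part of this scheme (random starting points, per-target data sets) can be sketched as follows. The trajectory solver is injected as a stub, since the text leaves the direct method open; all function names are illustrative, not the patent's.

```python
import random

def random_initial_state(x_range, y_range, rng=random):
    """Step S3: a random start s(t0) = [x_random, y_random, 0, 0, 0]."""
    return [rng.uniform(*x_range), rng.uniform(*y_range), 0.0, 0.0, 0.0]

def build_training_sets(solve_ocp, targets, n_starts, x_range, y_range):
    """Steps S3-S4: solve the optimal control model forward offline from many
    random starts and sort the resulting optimal state-control rate pairs by
    navigation target. `solve_ocp(s0, target)` stands in for the offline
    trajectory solver and must return a list of (state, control) pairs."""
    datasets = {target: [] for target in targets}
    for target in targets:
        for _ in range(n_starts):
            s0 = random_initial_state(x_range, y_range)
            datasets[target].extend(solve_ocp(s0, target))
    return datasets

# Toy stand-in solver: one pair per trajectory, for illustration only.
def dummy_solver(s0, target):
    return [(tuple(s0), (0.0, 0.0))]

sets = build_training_sets(dummy_solver, ["A", "B"], 3, (0.0, 5.0), (0.0, 5.0))
```

A real `solve_ocp` would return the whole discretized trajectory; the per-target dictionary mirrors the one-network-per-target training described later.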
Preferably, the AGV dynamics model in step S1 is:

dx(t)/dt = v(t)·cos θ(t)
dy(t)/dt = v(t)·sin θ(t)
dv(t)/dt = α(t)
dφ(t)/dt = ω(t)
dθ(t)/dt = v(t)·tan φ(t)/Lw

wherein t is the time variable, t ∈ [0, tf], and tf is the time corresponding to the terminal state; x(t) and y(t) represent the abscissa and ordinate of the position of the centre point of the AGV at time t, the position of the centre of the AGV being P = (x, y); v(t) represents the speed; θ(t) represents the azimuth angle between the AGV and the target position at time t; φ(t) represents the steering angle at time t; α(t) represents the acceleration in the azimuth direction at time t; ω(t) represents the angular velocity; Lw indicates the track length of the AGV.
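The dynamics model can be exercised numerically; below is a minimal sketch assuming the state ordering s = [x, y, v, φ, θ], control c = [α, ω], a tan-steering bicycle form inferred from the state and control definitions, and an illustrative wheelbase value (all names are ours, not the patent's).

```python
import math

L_W = 1.0  # assumed wheelbase ("track length") in metres

def agv_dynamics(s, c):
    """Right-hand side of the AGV kinematic model.
    s = [x, y, v, phi, theta], c = [alpha, omega]."""
    x, y, v, phi, theta = s
    alpha, omega = c
    return [v * math.cos(theta),        # dx/dt
            v * math.sin(theta),        # dy/dt
            alpha,                      # dv/dt
            omega,                      # dphi/dt
            v * math.tan(phi) / L_W]    # dtheta/dt

def rk4_step(s, c, dt):
    """One fixed-step Runge-Kutta 4 integration step with constant control."""
    k1 = agv_dynamics(s, c)
    k2 = agv_dynamics([si + dt / 2 * ki for si, ki in zip(s, k1)], c)
    k3 = agv_dynamics([si + dt / 2 * ki for si, ki in zip(s, k2)], c)
    k4 = agv_dynamics([si + dt * ki for si, ki in zip(s, k3)], c)
    return [si + dt / 6 * (a + 2 * b + 2 * c_ + d)
            for si, a, b, c_, d in zip(s, k1, k2, k3, k4)]

s = [0.0, 0.0, 1.0, 0.0, 0.0]   # driving along +x at 1 m/s
for _ in range(100):            # 1 s of straight driving
    s = rk4_step(s, [0.0, 0.0], 0.01)
```

With zero steering and zero control the sketch reduces to straight-line motion, which gives a quick sanity check on the signs of the model.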
Preferably, in step S2, with the AGV dynamics model as a dynamic constraint and time-fuel consumption optimization as the objective function, the optimal control model is established as follows:

Objective function:

min J = β·tf + (1 − β)·J1

Constraints:

ds(t)/dt = f(s(t), c(t))
êi(t) ≤ ε, i = 1, ..., N
Bound(s(t), c(t))
s(t0) = s0, s(tf) = sf

wherein J1 represents the fuel consumption; the weight β ∈ [0, 1] expresses the relative attention paid to time optimization and fuel optimization in the optimization target; ei represents a path constraint designed from the known obstacle positions, satisfying:

ei(t) = (x(t) − xio)² + (y(t) − yio)² − (ri + k)² ≥ 0

wherein i = 1, ..., N; (xio, yio) represents the position coordinate of the i-th obstacle at time t; ri represents the corresponding obstacle radius; and k represents a collision warning threshold;

êi(t) ≤ ε

represents the transformed (smoothed) path constraint, ε being a positive number approaching 0;

ds(t)/dt = f(s(t), c(t))

denotes the AGV dynamics model, where s(t) = [x(t), y(t), v(t), φ(t), θ(t)] is the state variable and c(t) = [α(t), ω(t)] is the control rate; Bound(s(t), c(t)) represents the boundary value constraints of the AGV; s(t0) and s(tf) indicate the given initial and terminal states of the AGV.
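The obstacle clearance term and the time-fuel objective can be evaluated on a sampled trajectory. This is a hedged sketch: the quadratic fuel proxy J1 = ∫(α² + ω²)dt and the weight `beta` are our illustrative assumptions, not fixed by the text.

```python
def clearance(x, y, obs_x, obs_y, r, k):
    """Path-constraint value e_i(t): positive while the AGV centre stays more
    than the warning threshold k outside obstacle i of radius r."""
    return (x - obs_x) ** 2 + (y - obs_y) ** 2 - (r + k) ** 2

def time_fuel_objective(t, alpha, omega, beta=0.5):
    """Trapezoidal approximation of J = beta*t_f + (1 - beta)*J_1 on a
    sampled trajectory, with the assumed fuel term J_1 = integral of
    (alpha^2 + omega^2) dt."""
    f = [a * a + w * w for a, w in zip(alpha, omega)]
    j1 = sum((f[i] + f[i + 1]) / 2 * (t[i + 1] - t[i])
             for i in range(len(t) - 1))
    return beta * t[-1] + (1 - beta) * j1

e = clearance(3.0, 4.0, 0.0, 0.0, 2.0, 1.0)   # 25 - 9 = 16: safely clear
t = [i / 10 for i in range(11)]
J = time_fuel_objective(t, [1.0] * 11, [0.0] * 11, beta=0.5)
```

A direct-method solver minimizes such a discretized objective subject to the discretized dynamics and the smoothed constraints ê_i ≤ ε.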
Preferably, the randomly generated AGV initial points of step S3 are expressed as:

s(t0) = [x_random, y_random, 0, 0, 0]

The initial points are taken as the initial conditions for solving the optimal control model, and the model is solved forward offline; the solution method is not limited to the pseudospectral method or the shooting method among the direct methods of optimal control. First, the state variable S(τ) and the control rate C(τ) are interpolated, the interpolation serving only to obtain values at intermediate points for the solution computation, and the interpolation method is not limited to the Lagrange interpolation method. Finally, optimal control trajectories for a plurality of different starting points are generated, expressed as:

Ξ = {(st, ct) | t = t0, ..., tf}

wherein Ξ represents the set of optimal control trajectories, and each (st, ct) constitutes an "optimal state-control rate" pair (s, c).
Preferably, grouped by the different target positions and on the basis of the "optimal state-control rate" pairs (s, c) corresponding to each target position, the optimal control trajectories are integrated and classified into different training data sets, the correspondence being characterized as:

{(s, c)} → {DA, DB, ...}

wherein DA represents the independent data set collecting all optimal control trajectories whose target is position A, and DB and the others are defined in the same way.
Preferably, the constructed width learning network comprises an input layer, a hidden layer and an output layer, the hidden layer comprising feature nodes, enhancement nodes and incremental enhancement nodes.

Let S represent the optimal states in the training data set whose target is a certain position, and let C represent the control rates in that training data set. After S is input to the input layer of the width learning network, n groups of feature mappings form n groups of feature node matrices; with Zi representing the i-th group of feature nodes, the n groups of feature node matrices are spliced into Zn = [Z1, Z2, ..., Zn], where the i-th group of feature nodes is expressed as:

Zi = Q(S·Wei + βei), i = 1, 2, ..., n

wherein Q represents a linear or nonlinear activation function, and Wei and βei are randomly initialized weights and biases, respectively. The mapped features feed enhancement nodes with randomly generated weights: on the basis of the feature node matrices, nonlinear transformation forms m groups of enhancement node matrices Hm = [H1, H2, ..., Hm], with Hj representing the j-th group of enhancement nodes, expressed as:

Hj = ξ(Zn·Whj + βhj), j = 1, 2, ..., m

where ξ denotes a nonlinear activation function, and Whj and βhj are random weights and biases, respectively. The hidden layer node matrix is spliced into Am = [Zn | Hm], and the output of the width learning network is:

Ŷ = Am·Wm

The feature nodes are left unchanged; after enhancement nodes are newly added, the hidden layer becomes Am+1 = [Am | ξ(Zn·Wm+1 + βm+1)], where Wm+1 and βm+1 are the new random weights and biases, respectively, generated randomly and kept unchanged during training. Adding Wm+1 strengthens the representational capacity of the width learning network, so that the final fixed network structure achieves the fitting of the target output control rate C. The weights between the hidden layer and the output layer are solved by means of a pseudo-inverse matrix, approximated by the ridge regression method, the pseudo-inverse matrix being expressed as:

(Am)+ = lim(λ→0) (λI + (Am)ᵀAm)⁻¹(Am)ᵀ

and then Wm = (Am)+·C.
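The feature/enhancement construction and the ridge pseudo-inverse solve can be sketched numerically as follows, assuming tanh activations and illustrative group counts and widths; `build_hidden` and `output_weights` are our names, not the patent's.

```python
import numpy as np

rng = np.random.default_rng(42)

def build_hidden(S, n_feat_groups=3, n_enh_groups=2, width=8):
    """Build A^m = [Z^n | H^m]: random-weight feature groups
    Z_i = tanh(S W_ei + b_ei) followed by enhancement groups
    H_j = tanh(Z^n W_hj + b_hj)."""
    Z_groups = []
    for _ in range(n_feat_groups):
        We = rng.standard_normal((S.shape[1], width))
        be = rng.standard_normal(width)
        Z_groups.append(np.tanh(S @ We + be))
    Zn = np.hstack(Z_groups)
    H_groups = []
    for _ in range(n_enh_groups):
        Wh = rng.standard_normal((Zn.shape[1], width))
        bh = rng.standard_normal(width)
        H_groups.append(np.tanh(Zn @ Wh + bh))
    return np.hstack([Zn] + H_groups)

def output_weights(A, C, lam=1e-6):
    """W^m = A^+ C with the ridge approximation (lam I + A^T A)^-1 A^T C."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ C)

S = rng.standard_normal((200, 5))               # optimal states
C = np.tanh(S @ rng.standard_normal((5, 2)))    # control rates to fit
A = build_hidden(S)
W = output_weights(A, C)
pred = A @ W
```

The only trained quantities are the output weights W, obtained in one linear solve; all hidden weights stay at their random initial values, which is what makes training fast.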
Preferably, the width learning network is trained incrementally by adding enhancement nodes, the newly added group of enhancement nodes being expressed as:

Hm+1 = ξ(Zn·Wm+1 + βm+1)

The hidden layer is then represented as Am+1 = [Am | Hm+1], and the pseudo-inverse matrix changed by the newly added enhancement nodes is expressed as:

(Am+1)+ = [(Am)+ − D·Bᵀ ; Bᵀ]

wherein

D = (Am)+·Hm+1, C′ = Hm+1 − Am·D

Bᵀ = (C′)+ if C′ ≠ 0; Bᵀ = (1 + DᵀD)⁻¹·Dᵀ·(Am)+ if C′ = 0

After the enhancement nodes are added, the weight matrix of the mapping from optimal state to optimal control is expressed as:

Wm+1 = [Wm − D·Bᵀ·C ; Bᵀ·C]

At this point, the actual output of the output layer of the width learning network is:

Ŷ = Am+1·Wm+1

The error between the actual output Ŷ and the control rate c in the training data set (s, c) is computed as:

E = ‖Ŷ − C‖F

wherein ‖·‖F denotes the Frobenius norm. If the error does not meet the threshold, the width learning network continues to be trained incrementally by adding enhancement nodes; when the error meets the threshold, the addition of enhancement nodes stops and the width learning network model at that moment is saved.
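The stop-when-the-error-meets-the-threshold loop can be sketched as below. For clarity the output weights are re-solved from scratch by ridge pseudo-inverse at every step; the incremental pseudo-inverse update described above computes the same weights more cheaply, so this is a simplification, not the patent's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_incremental(Z, C, err_tol=1e-3, max_groups=50, n_enh=20, lam=1e-8):
    """Grow enhancement-node groups until the Frobenius training error
    ||A W - C||_F meets err_tol, then stop and keep the model."""
    A = Z.copy()
    W = err = None
    for _ in range(max_groups):
        W = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ C)
        err = np.linalg.norm(A @ W - C, ord='fro')
        if err <= err_tol:
            break
        Wh = rng.standard_normal((Z.shape[1], n_enh))
        bh = rng.standard_normal(n_enh)
        A = np.hstack([A, np.tanh(Z @ Wh + bh)])  # incremental enhancement nodes
    return A, W, err

Z = rng.standard_normal((100, 5))                 # feature-node matrix
C = Z @ np.arange(1.0, 11.0).reshape(5, 2)        # exactly linear targets
A, W, err = train_incremental(Z, C)
```

On the exactly linear toy targets the first solve already meets the threshold, so no nodes are added; on real control-rate data the loop widens the hidden layer until the fit is good enough.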
In the training of the width learning network there is no manual parameter tuning with strong priors, and no slow gradient optimization until the objective function converges: the network parameter matrix is obtained solely through the iterative incremental learning method and the solution of a pseudo-inverse matrix, so time consumption is low.
Preferably, the training data set is normalized before being used for width learning network training.
Preferably, after the optimal states S and control rates C of the training data set whose target is a certain position have been input to the width learning network and training is complete, the results output by the width learning network are subjected to inverse normalization.
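A sketch of the normalization and inverse normalization steps; min-max scaling is our assumption, since the text does not fix the scheme.

```python
import numpy as np

def fit_minmax(X):
    """Column-wise min-max statistics of a training matrix."""
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span = np.where(span > 0, span, 1.0)   # guard constant columns
    return lo, span

def normalize(X, lo, span):
    return (X - lo) / span

def denormalize(Y, lo, span):
    """Inverse normalization applied to the network's raw output."""
    return Y * span + lo

S = np.array([[0.0, 10.0], [5.0, 20.0], [2.5, 15.0]])
lo, span = fit_minmax(S)
Sn = normalize(S, lo, span)
```

The same statistics fitted on the training data must be reused online, so that the network's predicted control rates can be mapped back to physical units.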
Preferably, after the optimal states S and control rates C of the training data set whose target is a certain position have been input to the width learning network and training is complete, the hidden-layer weights of the current width learning network are saved, and the optimal states and control rates of the training data set of another target position are extracted for training, until the training data sets corresponding to all target positions have been traversed and the training of the width learning networks in one-to-one correspondence with the target positions is complete.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
compared with the traditional method of obtaining the path track by only directly solving the optimal control model offline, the method for implementing the path planning based on the optimal control and the width learning emphasizes the defect that the offline optimization cannot realize the real-time optimal control to obtain the optimal track, introduces the width learning network in order to avoid the hysteresis caused by the offline optimization solution, takes different target points as classification bases, classifies the optimal control track into training data sets of different navigation tasks, trains the width learning network incrementally to obtain the width learning network finally used for AGV real-time path planning, realizes the AGV real-time optimal control of any initial point in a certain range, and ensures the real-time performance of the path planning because the control rate prediction only relates to simple matrix operation when the width learning network is used for online operation. In addition, the method provided by the invention has the advantages that the problems of identification abrasion, difficult path expansion, high environmental requirement and long drawing time do not exist, in comparison with the offline training of a deep neural network, the training of a width learning network does not have the parameter adjusting work with strong artificial priority, the process from slow gradient optimization to target function convergence does not exist, the training can be completed only by obtaining a weight parameter matrix through an iterative incremental learning method, and the time consumption is low.
Drawings
Fig. 1 is a schematic flowchart illustrating an AGV adaptive path planning method based on optimal control and width learning according to embodiment 1 of the present invention;
FIG. 2 is a physical schematic diagram of an AGV dynamics model according to embodiment 1 of the present invention;
fig. 3 is an overall frame diagram of a width learning network proposed in embodiment 1 of the present invention;
FIG. 4 is a schematic diagram of AGV path output at different target positions as proposed in embodiment 3 of the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
it will be understood by those skilled in the art that certain well-known descriptions of the figures may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
The positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
example 1
The AGV path planning problem can be regarded as a trajectory planning problem in which the starting point of the AGV is not fixed while the end point (i.e., the coordinate point of the delivery task) is fixed. Conventional trajectory methods divide into path search followed by trajectory optimization, but traditional path search is usually based on a grid map, and a searched route does not necessarily satisfy the vehicle dynamics constraints (a vehicle cannot move sideways), which brings uncertainty in optimization time and optimization quality to the later trajectory optimization. Therefore, in actual implementation, to guarantee the validity of the path planning result, the dynamics model of the AGV must be considered as a constraint from the initial stage of planning. The AGV path planning and navigation problem is thus converted into a trajectory planning problem with fixed start and end points, solved forward by an optimal control method for a two-point boundary value problem.
Specifically, referring to fig. 1, the present embodiment provides an AGV real-time path planning method based on optimal control and width learning, including the following steps:
S1, constructing an AGV dynamics model;
S2, establishing an optimal control model by taking the AGV dynamics model as a dynamic constraint and time-fuel consumption optimization as the objective function;
S3, randomly generating initial points of the AGV, taking the initial points as initial conditions for solving the optimal control model, solving the optimal control model forward offline, and generating optimal control trajectories for a plurality of different initial points, the optimal control trajectories comprising optimal state-control rate pairs;
S4, constructing a width learning network, integrating and classifying the optimal control trajectories into different training data sets by target position on the basis of the optimal state-control rate pairs, and performing incremental offline training of the width learning network;
and S5, after the off-line training is finished, determining the weight parameters of the width learning network, and using the trained width learning network as a real-time controller for the real-time planning of the AGV path.
In this embodiment, based on Newtonian classical mechanics and according to the general properties of AGVs actually in service, a two-degree-of-freedom vehicle model is adopted for kinematic modelling of the AGV; with reference to fig. 2, the AGV dynamics model is expressed as:

dx(t)/dt = v(t)·cos θ(t)
dy(t)/dt = v(t)·sin θ(t)
dv(t)/dt = α(t)
dφ(t)/dt = ω(t)
dθ(t)/dt = v(t)·tan φ(t)/Lw

wherein t is the time variable, t ∈ [0, tf], and tf is the time corresponding to the terminal state; x(t) and y(t) represent the abscissa and ordinate of the position of the centre point of the AGV at time t, the position of the centre of the AGV being P = (x, y); v(t) represents the speed; θ(t) represents the azimuth angle between the AGV and the target position at time t; φ(t) represents the steering angle at time t; α(t) represents the acceleration in the azimuth direction at time t; ω(t) represents the angular velocity; Lw indicates the track length of the AGV.
With the AGV dynamics model as a dynamic constraint, the collision constraints smoothed, and time-fuel consumption optimization as the objective function, the trajectory planning of the AGV is converted into an optimal control problem with the corresponding constraints, and the established optimal control model is as follows:

Objective function (with the time-fuel optimum as the optimization objective):

min J = β·tf + (1 − β)·J1

Constraints:

ds(t)/dt = f(s(t), c(t))
êi(t) ≤ ε, i = 1, ..., N
Bound(s(t), c(t))
s(t0) = s0, s(tf) = sf

wherein J1 represents the fuel consumption; the weight β ∈ [0, 1] expresses the relative attention paid to time optimization and fuel optimization in the optimization target; ei represents a path constraint designed from the known obstacle positions, satisfying:

ei(t) = (x(t) − xio)² + (y(t) − yio)² − (ri + k)² ≥ 0

wherein i = 1, ..., N; (xio, yio) represents the position coordinate of the i-th obstacle at time t; ri represents the corresponding obstacle radius; and k represents a collision warning threshold;

êi(t) ≤ ε

represents the transformed (smoothed) path constraint, ε being a positive number approaching 0;

ds(t)/dt = f(s(t), c(t))

denotes the AGV dynamics model, where s(t) = [x(t), y(t), v(t), φ(t), θ(t)] is the state variable and c(t) = [α(t), ω(t)] is the control rate; Bound(s(t), c(t)) represents the boundary value constraints of the AGV; s(t0) and s(tf) indicate the given initial and terminal states of the AGV.
Since the AGV operates in a material handling scenario, this is an optimal control problem with a fixed terminal state: each required handling destination is taken separately as the terminal state and the computation is iterated, and the method for solving the optimal control model is not limited to any one of the direct methods of optimal control. Specifically, let the randomly generated AGV initial points be expressed as:

s(t0) = [x_random, y_random, 0, 0, 0]

The initial points are taken as the initial conditions for solving the optimal control model, and the model is solved forward offline; the solution method is not limited to the pseudospectral method or the shooting method among the direct methods of optimal control. For the solution computation, the state variable S(τ) and the control rate C(τ) are first interpolated; the interpolation method is not limited to the Lagrange interpolation method, under which the interpolation of S(τ) and C(τ) satisfies:

S(τ) ≈ Σi Li(τ)·S(τi)
C(τ) ≈ Σi Li(τ)·C(τi)
Li(τ) = Πj≠i (τ − τj)/(τi − τj)
dS(τk)/dτ ≈ Σi Dki·S(τi)

wherein τ0, ..., τN are the interpolation (collocation) nodes and Dki is the corresponding differentiation matrix. Optimal control trajectories for a plurality of different starting points are finally generated, expressed as:

Ξ = {(st, ct) | t = t0, ..., tf}

wherein Ξ represents the set of optimal control trajectories, and each (st, ct) constitutes an "optimal state-control rate" pair (s, c).
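Lagrange interpolation of a sampled trajectory, as used for obtaining intermediate-point values in the solution computation, can be sketched as:

```python
def lagrange_basis(taus, tau):
    """Values L_i(tau) of the Lagrange basis polynomials on the nodes `taus`."""
    basis = []
    for i, ti in enumerate(taus):
        li = 1.0
        for j, tj in enumerate(taus):
            if j != i:
                li *= (tau - tj) / (ti - tj)
        basis.append(li)
    return basis

def interpolate(taus, values, tau):
    """S(tau) ≈ sum_i L_i(tau) * S(tau_i): evaluate a trajectory sampled at
    the nodes `taus` at an intermediate time `tau`."""
    return sum(l * v for l, v in zip(lagrange_basis(taus, tau), values))

taus = [0.0, 1.0, 2.0]
values = [0.0, 1.0, 4.0]      # samples of tau**2, recovered exactly
mid = interpolate(taus, values, 1.5)
```

Because three nodes determine a quadratic exactly, the sketch reproduces tau² at the midpoint, which checks the basis computation.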
Many trajectory optimization problems in practical engineering, such as spacecraft trajectory optimization and unmanned vehicle trajectory optimization, are two-point boundary value problems and can be expressed and solved as optimal control problems. The traditional optimal control theory, based on the calculus of variations and the Pontryagin maximum principle, converts the optimal trajectory design problem into a Hamilton-Jacobi-Bellman equation and a two-point boundary value problem, and then controls the object to track along the designed trajectory. This approach is often used for optimal control problems whose performance metrics are defined as maximizing the search area, minimizing time consumption, minimizing fuel consumption, or minimizing the terminal-state error. Its main idea is to track a pre-designed optimal trajectory from the real state; an important difficulty of this strategy, however, is that since the model is a nonlinear differential equation, the optimization process amounts to solving that differential equation, and for scenarios with complex models an analytic solution is hard to obtain.
On the premise of offline solution, and considering that offline optimization cannot achieve optimal control of the trajectory in real time, a width learning network is introduced in order to avoid the lag caused by the offline optimization solution. Before the offline optimization solutions are formally used for the width learning network, the different target points are taken as the classification basis and the optimal control trajectories are integrated and classified into the training data sets of the different navigation tasks, specifically:
Grouped by the different target positions and on the basis of the "optimal state-control rate" pairs (s, c) corresponding to each target position, the optimal control trajectories are integrated and classified into different training data sets, the correspondence being characterized as:

{(s, c)} → {DA, DB, ...}

wherein DA represents the independent data set collecting all optimal control trajectories whose target is position A, and DB and the others are defined in the same way.
In this embodiment, referring to fig. 3, the constructed width learning network includes an input layer, a hidden layer, and an output layer, where the hidden layer includes feature nodes, enhancement nodes, and incremental enhancement nodes, and the structure is formed by introducing an incremental learning idea on the basis of the width learning system, and the new structure can iteratively improve the feature extraction capability of the model, increase the expression capability of the model, and improve the fitting performance of the model, and the width learning network can quickly learn a rule closer to the reality by using these updated weights, and the specific process is as follows:
Let S denote the optimal states in the training data set whose target state is a certain position, and let C denote the corresponding control rates. After S is fed into the input layer of the width learning network, n groups of feature mappings produce n groups of feature-node matrices. Let Z_i denote the i-th group of feature nodes; the n groups of feature-node matrices are concatenated as Z^n = [Z_1, Z_2, …, Z_n]. To obtain a sparse representation of the input data during feature mapping, the weights between the input layer and the hidden layer can be adjusted by the sparse auto-encoding technique, which automatically selects the optimal weights during decoding. The i-th group of feature nodes is expressed as:

Z_i = Q(S·W_ei + β_ei), i = 1, 2, …, n

wherein Q represents a linear or nonlinear activation function, and W_ei and β_ei are randomly initialized weights and biases, respectively. The mapped features are fed to enhancement nodes with randomly generated weights: on the basis of the feature-node matrix, m groups of enhancement-node matrices H^m = [H_1, H_2, …, H_m] are formed through nonlinear transformation, H_j representing the j-th group of enhancement nodes, expressed as:

H_j = ξ(Z^n·W_hj + β_hj), j = 1, 2, …, m

wherein ξ denotes a nonlinear activation function, and W_hj and β_hj are random weights and biases, respectively. The hidden-layer node matrix is concatenated as A^m = [Z^n | H^m], and the output of the width learning network is:

Y = A^m·W^m

The feature nodes are kept unchanged; after enhancement nodes are newly added, the hidden layer becomes A^{m+1} = [A^m | ξ(Z^n·W_{m+1} + β_{m+1})], where W_{m+1} and β_{m+1} are the new random weights and biases, respectively, generated randomly and held fixed during training. Adding W_{m+1} enhances the representational ability of the width learning network and enables the finally fixed network structure to fit the target output control rate C. The weights between the hidden layer and the output layer are solved by means of the pseudo-inverse matrix and approximated by the ridge regression method; the pseudo-inverse matrix is expressed as:

(A^m)^+ = lim_{λ→0} (λI + (A^m)^T·A^m)^{-1}·(A^m)^T

and then W^m = (A^m)^+·C.
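A minimal sketch of this construction, assuming a linear feature map Q, tanh enhancement activations, and arbitrarily chosen node counts (none of these choices is fixed by the patent; the function names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_hidden(S, n=4, m=6, k=10):
    """Hidden layer A^m = [Z^n | H^m]: n groups of feature nodes
    Z_i = Q(S W_ei + beta_ei) (Q linear here) and m groups of
    enhancement nodes H_j = xi(Z^n W_hj + beta_hj), xi = tanh.
    All weights and biases are random and then held fixed."""
    d = S.shape[1]
    feats = [S @ rng.standard_normal((d, k)) + rng.standard_normal(k)
             for _ in range(n)]
    Zn = np.hstack(feats)
    enh = [np.tanh(Zn @ rng.standard_normal((Zn.shape[1], k))
                   + rng.standard_normal(k)) for _ in range(m)]
    return np.hstack([Zn] + enh)

def output_weights(A, C, lam=1e-6):
    """Ridge-regression approximation of W^m = (A^m)^+ C,
    i.e. (A^T A + lam I)^{-1} A^T C with a small lam."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ C)

# usage with dummy data standing in for optimal states / control rates
S = rng.standard_normal((200, 5))   # states s = [x, y, v, phi, theta]
C = rng.standard_normal((200, 2))   # control rates c = [alpha, omega]
A = make_hidden(S)
W = output_weights(A, C)
Y = A @ W                           # network output approximating C
```

Only the output weights W are learned; the whole solve is one regularized linear system, which is why no gradient descent is needed.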
In order to give the width network better fitting performance, the width learning network is trained incrementally by adding enhancement nodes. The added group of enhancement nodes is expressed as:

H_{m+1} = ξ(Z^n·W_{m+1} + β_{m+1})

The hidden layer is then expressed as A^{m+1} = [A^m | H_{m+1}], and the pseudo-inverse matrix changed by the newly added enhancement nodes is expressed as:

(A^{m+1})^+ = [(A^m)^+ − D·B^T ; B^T]

wherein

D = (A^m)^+·H_{m+1}

B^T = (C')^+ with C' = H_{m+1} − A^m·D when C' ≠ 0, and B^T = (1 + D^T·D)^{-1}·D^T·(A^m)^+ when C' = 0.

After the enhancement nodes are added, the weight matrix of the mapping from the optimal state to the optimal control is expressed as:

W^{m+1} = [W^m − D·B^T·C ; B^T·C]

At this time, the actual output of the output layer of the width learning network is:

Ŷ = A^{m+1}·W^{m+1}

The error between the actual output Ŷ and the control rate c in the training data set (s, c) is calculated:

E = ‖Ŷ − C‖_F

wherein ‖·‖_F denotes the Frobenius norm. If the error does not meet the threshold, the width learning network continues to be trained incrementally by adding enhancement nodes; when the error meets the threshold, adding enhancement nodes stops, and the width learning network model at that moment is saved. The training of the width learning network involves neither heavily experience-dependent manual parameter tuning nor a slow gradient-descent process until the objective function converges; the network parameter matrix is obtained only by the iterative incremental learning method and the pseudo-inverse solution, so the time consumption is low.
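The incremental step can be sketched as follows, a minimal illustration of the standard broad-learning pseudo-inverse update (only the generic pseudo-inverse branch for B^T is implemented; function names, node counts and the stopping threshold are assumptions):

```python
import numpy as np

def add_enhancement_group(A, A_pinv, W, Zn, C, k=10, rng=None):
    """Append one group of k enhancement nodes and update the
    pseudo-inverse and output weights incrementally (no retraining)."""
    rng = rng if rng is not None else np.random.default_rng()
    # new nodes H_{m+1} = xi(Z^n W_{m+1} + beta_{m+1}); weights held fixed
    H = np.tanh(Zn @ rng.standard_normal((Zn.shape[1], k))
                + rng.standard_normal(k))
    D = A_pinv @ H                     # D = (A^m)^+ H_{m+1}
    B = np.linalg.pinv(H - A @ D)      # generic branch: B^T = (H - A D)^+
    A_new = np.hstack([A, H])
    A_pinv_new = np.vstack([A_pinv - D @ B, B])
    W_new = np.vstack([W - D @ (B @ C), B @ C])
    return A_new, A_pinv_new, W_new

def train_incrementally(Zn, A, C, tol=1e-3, max_steps=50, rng=None):
    """Grow the network until the Frobenius error meets the threshold."""
    A_pinv = np.linalg.pinv(A)
    W = A_pinv @ C
    while max_steps > 0 and np.linalg.norm(C - A @ W, 'fro') > tol:
        A, A_pinv, W = add_enhancement_group(A, A_pinv, W, Zn, C, rng=rng)
        max_steps -= 1
    return A, W
```

Because each step reuses the previous pseudo-inverse, the cost of adding a group of nodes is far below re-solving the whole network.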
After the optimal states S and control rates C in the training data set whose target state is a certain position have been input into the width learning network and training is completed, the hidden-layer weights of the current width learning network are saved. The training data set of another target position is then extracted to supply the optimal states and control rates for training, until the training data sets corresponding to all target positions have been traversed and one width learning network has been trained for each target position.
Example 2
In this embodiment, on the basis of the width learning network training described in embodiment 1, since the quantities in the data set are not of the same order of magnitude, the training data set is normalized before being used for width learning network training, using methods including but not limited to max-min normalization, Z-score normalization and function transformation.
After the optimal states S and control rates C in a training data set with a certain position as the target state are normalized (denoted, e.g., S̃ and C̃) and input into the width learning network to complete training, the result output by the width learning network must undergo inverse normalization processing before it is finally used as a control rate with physical meaning.
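A minimal sketch of max-min normalization with its matching inverse transform (helper names are assumptions; constant columns are guarded so the division stays defined):

```python
import numpy as np

def fit_minmax(X):
    """Column-wise min and span of the training matrix X."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard constant columns
    return lo, span

def normalize(X, lo, span):
    """Map each column into [0, 1] using the fitted statistics."""
    return (X - lo) / span

def denormalize(Xn, lo, span):
    """Inverse transform: recover physically meaningful units."""
    return Xn * span + lo
```

The same (lo, span) statistics fitted on the training set must be reused at prediction time, once on the input state and once, inverted, on the network's output.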
Example 3
In this embodiment, on the basis of embodiments 1 and 2, extending the initial point to an arbitrary point and extending the destination to different final states of AGV path planning is mainly considered. For a fixed transport end point, the "optimal state-control rate" pairs (s, c) train an independent width learning network; combined with the incremental pseudo-inverse solution, a weight matrix W conforming to the mapping from the optimal state to the optimal control can be learned quickly. That is, the optimal control problem with a varying initial point can be solved, and the initial point in online optimal control can be generalized to any point in the set area.
The number N of feature nodes per group, the number M of enhancement nodes, and the number of enhancement nodes added at each increment of the width learning network can be selected as a trade-off between the computing power and the prediction accuracy in the specific scene.
For the different transport target points, i.e. final states, in an actual transport scene, the optimal control problem only needs to be solved forward offline repeatedly for each final state. After the weight parameters of the width learning network corresponding to each final state are learned, the networks are saved; referring to fig. 4, real-time optimal path-planning control for different final states (such as target a and a different target Z in fig. 4) is realized by calling each trained width learning network.
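The runtime dispatch, one saved network per final state, might be sketched as follows (the storage layout and names are hypothetical; the point is that the forward pass is a single matrix product, which is what makes the controller real-time):

```python
import numpy as np

class RealTimeController:
    """Select the width-learning network saved for the requested final
    state and map the current AGV state to a control rate in one pass."""

    def __init__(self, nets):
        # nets: {target: (hidden_fn, W, (lo_s, span_s, lo_c, span_c))}
        # saved after offline training, one entry per final state
        self.nets = nets

    def act(self, target, state):
        hidden_fn, W, (lo_s, span_s, lo_c, span_c) = self.nets[target]
        s = (np.asarray(state) - lo_s) / span_s   # normalize as in training
        c = hidden_fn(s[None, :]) @ W             # width-network forward pass
        return c[0] * span_c + lo_c               # back to physical units
```

Switching transport tasks is then just a dictionary lookup; no re-planning or re-optimization happens online.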
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to exhaustively list all embodiments here. Any modification, equivalent replacement and improvement made within the spirit and principle of the present invention shall be included in the protection scope of the claims of the present invention.

Claims (10)

1. An AGV real-time path planning method based on optimal control and width learning is characterized by comprising the following steps:
s1, constructing an AGV dynamics model;
s2, establishing an optimal control model by taking an AGV dynamic model as dynamic constraint and taking time-fuel consumption optimization as a target function;
s3, randomly generating initial points of the AGV, taking the initial points as initial conditions for solving the optimal control model, solving the optimal control model in an offline forward direction, and generating optimal control tracks of a plurality of different initial points, wherein the optimal control tracks comprise optimal state-control rate pairs;
s4, constructing a width learning network, integrating and classifying the optimal control tracks into different training data sets according to target positions based on the optimal state-control rate pair, and performing incremental offline training on the width learning network;
and S5, after the off-line training is finished, determining the weight parameters of the width learning network, and using the trained width learning network as a real-time controller for the real-time planning of the AGV path.
2. The AGV real-time path planning method based on optimal control and width learning of claim 1, wherein the AGV dynamics model in step S1 is:

dx(t)/dt = v(t)·cos θ(t)
dy(t)/dt = v(t)·sin θ(t)
dv(t)/dt = α(t)
dθ(t)/dt = v(t)·tan φ(t)/L_w
dφ(t)/dt = ω(t)

wherein t is a time variable, t ∈ [0, t_f], t_f being the time corresponding to the final state; x(t) and y(t) represent the abscissa and ordinate of the position coordinate of the AGV midpoint at time t, the position coordinate of the AGV center being P = (x, y); v(t) represents the speed at time t; θ(t) represents the azimuth angle between the AGV and the target position at time t; φ(t) represents the steering angle at time t; α(t) represents the acceleration in the azimuth direction at time t; ω(t) represents the angular velocity; and L_w represents the track length of the AGV.
3. The AGV real-time path planning method based on optimal control and width learning of claim 2, wherein in step S2, taking the AGV dynamics model as the dynamic constraint and time-fuel-consumption optimization as the objective function, the optimal control model expression is established as:

an objective function:

min J = ρ·t_f + (1 − ρ)·J_1, with J_1 = ∫₀^{t_f} (α²(t) + ω²(t)) dt

constraint conditions:

ds(t)/dt = f(s(t), c(t)), e_i(t) ≥ 0 (i = 1, …, N), Bound(s(t), c(t)), s(t_0), s(t_f)

wherein J_1 represents the fuel consumption; ρ expresses the relative attention paid to time optimization and to fuel-consumption optimization in the optimization objective; e_i represents a path constraint designed from the known obstacle positions, satisfying:

e_i(t) = (x(t) − x_i^o(t))² + (y(t) − y_i^o(t))² − (r_i + k)² ≥ 0, i = 1, …, N

wherein (x_i^o(t), y_i^o(t)) represents the position coordinate of the i-th obstacle at time t, r_i represents the corresponding obstacle radius, and k represents a collision warning threshold; ê_i(t) ≤ ε represents the transformed path constraint, ε being a positive number approaching 0; ds(t)/dt = f(s(t), c(t)) denotes the AGV dynamics model, s(t) = [x(t), y(t), v(t), φ(t), θ(t)] being the state variable and c = [α(t), ω(t)] the control rate; Bound(s(t), c(t)) represents the boundary value constraints of the AGV; and s(t_0) and s(t_f) denote the given initial and final states of the AGV.
4. The AGV real-time path planning method based on optimal control and width learning of claim 3, wherein the randomly generated AGV initial points of step S3 are expressed as:

s(t_0) = [x_random, y_random, 0, 0, 0]

the initial point is taken as the initial condition for solving the optimal control model, and the optimal control model is solved forward offline by methods not limited to the pseudo-spectral method and the shooting method among the direct methods of optimal control; first the state variable S(τ) and the control rate C(τ) are interpolated for the solution calculation, the interpolation method not being limited to Lagrange interpolation; finally the optimal control trajectories of a plurality of different starting points are generated, expressed as:

T* = {(s_t, c_t) | t ∈ [t_0, t_f]}

wherein T* represents the set of optimal control trajectories, and each (s_t, c_t) constitutes an "optimal state-control rate" pair (s, c).
5. The AGV real-time path planning method based on optimal control and width learning of claim 4, wherein, according to the integration of the different target positions, based on the "optimal state-control rate" pair (s, c) corresponding to each target position, the optimal control trajectories are integrated and classified into different training data sets, characterized correspondingly as:

D = {D_A, D_B, …, D_Z}

wherein D_A represents an independent data set collecting all the optimal control trajectories whose target state is position A, and D_B, …, D_Z are defined in the same way.
6. The AGV real-time path planning method based on optimal control and width learning of claim 5, wherein the constructed width learning network comprises an input layer, a hidden layer and an output layer, wherein the hidden layer comprises feature nodes, enhancement nodes and incremental enhancement nodes;
S denotes the optimal states in the training data set whose target state is a certain position, and C denotes the corresponding control rates; after S is input into the input layer of the width learning network, n groups of feature mappings form n groups of feature-node matrices; Z_i denotes the i-th group of feature nodes, and the n groups of feature-node matrices are concatenated as Z^n = [Z_1, Z_2, …, Z_n], wherein the i-th group of feature nodes is expressed as:

Z_i = Q(S·W_ei + β_ei), i = 1, 2, …, n

wherein Q represents a linear or nonlinear activation function, and W_ei and β_ei are randomly initialized weights and biases, respectively; the mapped features are fed to enhancement nodes with randomly generated weights, and m groups of enhancement-node matrices H^m = [H_1, H_2, …, H_m] are formed through nonlinear transformation on the basis of the feature-node matrix, H_j representing the j-th group of enhancement nodes, expressed as:

H_j = ξ(Z^n·W_hj + β_hj), j = 1, 2, …, m

wherein ξ denotes a nonlinear activation function, and W_hj and β_hj are random weights and biases, respectively; the hidden-layer node matrix is concatenated as A^m = [Z^n | H^m], and the output of the width learning network is:

Y = A^m·W^m

the feature nodes are kept unchanged; after enhancement nodes are newly added, the hidden layer becomes A^{m+1} = [A^m | ξ(Z^n·W_{m+1} + β_{m+1})], wherein W_{m+1} and β_{m+1} are the new random weights and biases, respectively, generated randomly and held fixed during training; adding W_{m+1} enhances the representational ability of the width learning network and enables the finally fixed network structure to fit the target output control rate C; the weights between the hidden layer and the output layer are solved by means of the pseudo-inverse matrix and approximated by the ridge regression method, the pseudo-inverse matrix being expressed as:

(A^m)^+ = lim_{λ→0} (λI + (A^m)^T·A^m)^{-1}·(A^m)^T

and then W^m = (A^m)^+·C.
7. The AGV real-time path planning method based on optimal control and width learning of claim 6, wherein the width learning network is incrementally trained by adding enhancement nodes, the added enhancement nodes being expressed as:

H_{m+1} = ξ(Z^n·W_{m+1} + β_{m+1})

the hidden layer is then expressed as A^{m+1} = [A^m | H_{m+1}], and the pseudo-inverse matrix changed by the newly added enhancement nodes is expressed as:

(A^{m+1})^+ = [(A^m)^+ − D·B^T ; B^T]

wherein

D = (A^m)^+·H_{m+1}

B^T = (C')^+ with C' = H_{m+1} − A^m·D when C' ≠ 0, and B^T = (1 + D^T·D)^{-1}·D^T·(A^m)^+ when C' = 0;

after the enhancement nodes are added, the weight matrix of the mapping from the optimal state to the optimal control is expressed as:

W^{m+1} = [W^m − D·B^T·C ; B^T·C]

at this time, the actual output of the output layer of the width learning network is:

Ŷ = A^{m+1}·W^{m+1}

the error between the actual output Ŷ and the control rate c in the training data set (s, c) is calculated:

E = ‖Ŷ − C‖_F

wherein ‖·‖_F denotes the Frobenius norm; if the error does not meet the threshold, the width learning network continues to be trained incrementally by adding enhancement nodes; when the error meets the threshold, adding enhancement nodes stops, and the width learning network model at that moment is saved.
8. The AGV real-time path planning method based on optimal control and width learning of claim 7, wherein the training data set is normalized before being used for width learning network training.
9. The AGV real-time path planning method based on optimal control and width learning of claim 8, wherein after the optimal state S and the control rate C in the training data set with a certain position as a target state are input into the width learning network to complete training, the result output by the width learning network needs to be subjected to inverse normalization processing.
10. The AGV real-time path planning method based on optimal control and width learning of claim 7, wherein after the optimal states S and control rates C in the training data set with a certain position as the target state are input into the width learning network and training is completed, the hidden-layer weight information of the current width learning network is saved, and the positions other than that position are extracted in turn as target states whose training data sets supply the optimal states and control rates for training, until the training data sets corresponding to all target positions have been traversed and the training of the width learning networks corresponding one-to-one to the plurality of target positions is completed.
CN202111482549.5A 2021-12-06 2021-12-06 AGV real-time path planning method based on optimal control and width learning Active CN114200936B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111482549.5A CN114200936B (en) 2021-12-06 2021-12-06 AGV real-time path planning method based on optimal control and width learning


Publications (2)

Publication Number Publication Date
CN114200936A true CN114200936A (en) 2022-03-18
CN114200936B CN114200936B (en) 2023-06-13

Family

ID=80650869

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111482549.5A Active CN114200936B (en) 2021-12-06 2021-12-06 AGV real-time path planning method based on optimal control and width learning

Country Status (1)

Country Link
CN (1) CN114200936B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117391575A (en) * 2023-12-08 2024-01-12 青岛盈智科技有限公司 Truck transportation route planning method based on path analysis

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635245A (en) * 2018-09-29 2019-04-16 中国矿业大学 A kind of robust width learning system
CN109884886A (en) * 2019-03-29 2019-06-14 大连海事大学 A kind of ship movement model-free adaption method for optimally controlling based on width study
CN111880405A (en) * 2020-07-03 2020-11-03 广东工业大学 AGV self-adaptive path planning real-time control method in flexible manufacturing workshop system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635245A (en) * 2018-09-29 2019-04-16 中国矿业大学 A kind of robust width learning system
CN109884886A (en) * 2019-03-29 2019-06-14 大连海事大学 A kind of ship movement model-free adaption method for optimally controlling based on width study
CN111880405A (en) * 2020-07-03 2020-11-03 广东工业大学 AGV self-adaptive path planning real-time control method in flexible manufacturing workshop system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117391575A (en) * 2023-12-08 2024-01-12 青岛盈智科技有限公司 Truck transportation route planning method based on path analysis
CN117391575B (en) * 2023-12-08 2024-03-22 青岛盈智科技有限公司 Truck transportation route planning method based on path analysis

Also Published As

Publication number Publication date
CN114200936B (en) 2023-06-13

Similar Documents

Publication Publication Date Title
CN111780777B (en) Unmanned vehicle route planning method based on improved A-star algorithm and deep reinforcement learning
CN108803321B (en) Autonomous underwater vehicle track tracking control method based on deep reinforcement learning
Li et al. Prescribed performance concurrent control of connected vehicles with nonlinear third-order dynamics
Zhao et al. A novel direct trajectory planning approach based on generative adversarial networks and rapidly-exploring random tree
Orozco-Rosas et al. Mobile robot path planning using a QAPF learning algorithm for known and unknown environments
CN109491389A (en) A kind of robot trace tracking method with constraint of velocity
CN112857385B (en) Rapid unmanned vehicle local path planning method based on non-uniform grid model
CN112506194B (en) Distributed safety learning control method for mobile robot cluster
CN115683145A (en) Automatic driving safety obstacle avoidance method based on track prediction
Mu et al. A time-varying lookahead distance of ILOS path following for unmanned surface vehicle
Sierra‐Garcia et al. Combining reinforcement learning and conventional control to improve automatic guided vehicles tracking of complex trajectories
CN114200936B (en) AGV real-time path planning method based on optimal control and width learning
Elhaki et al. Robust amplitude-limited interval type-3 neuro-fuzzy controller for robot manipulators with prescribed performance by output feedback
Tian et al. Personalized lane change planning and control by imitation learning from drivers
Lin et al. Research on UUV obstacle avoiding method based on recurrent neural networks
Angulo et al. Policy optimization to learn adaptive motion primitives in path planning with dynamic obstacles
Fridovich-Keil et al. Approximate solutions to a class of reachability games
CN117093009A (en) Logistics AGV trolley navigation control method and system based on machine vision
CN114559439B (en) Mobile robot intelligent obstacle avoidance control method and device and electronic equipment
CN115061470B (en) Unmanned vehicle improved TEB navigation method suitable for narrow space
Zang et al. Formation Trajectory Tracking Control of UTVs: A Coupling Multi-Objective Iterative Distributed Model Predictive Control Approach
CN113959446B (en) Autonomous logistics transportation navigation method for robot based on neural network
Chen et al. Framework of active obstacle avoidance for autonomous vehicle based on hybrid soft actor-critic algorithm
Zhang et al. Fuzzy Goal Programming Algorithm for Multi-objective Trajectory Optimal Parking of Autonomous Vehicles
Tang et al. Actively learning Gaussian process dynamical systems through global and local explorations

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant