CN114200936A - AGV real-time path planning method based on optimal control and width learning - Google Patents


Info

Publication number
CN114200936A
Authority
CN
China
Prior art keywords
agv
width learning
optimal control
time
optimal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111482549.5A
Other languages
Chinese (zh)
Other versions
CN114200936B (en)
Inventor
吴宗泽
赖家伦
李嘉俊
任志刚
曾德宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202111482549.5A priority Critical patent/CN114200936B/en
Publication of CN114200936A publication Critical patent/CN114200936A/en
Application granted granted Critical
Publication of CN114200936B publication Critical patent/CN114200936B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05D - SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 - Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 - Control of position or course in two dimensions
    • G05D1/021 - Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212 - Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G05D1/0221 - Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory involving a learning process
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention provides an AGV real-time path planning method based on optimal control and width learning, in the technical field of AGV path planning. First, an AGV dynamics model is constructed. With the dynamics model as a dynamic constraint and time-fuel consumption optimization as the objective function, an optimal control model is established and solved forward offline, generating optimal control trajectories for a number of different starting points. Because offline optimization cannot deliver real-time optimal control of the trajectory, and to avoid the lag introduced by the offline optimization solution, a width learning network is introduced: taking the different target points as the classification basis, the optimal control trajectories are integrated and classified into training data sets for the different navigation tasks, and the width learning network is trained incrementally. The resulting network is finally used for AGV real-time path planning, achieving real-time optimal control of AGV path planning from any starting point within a certain range.

Description

AGV real-time path planning method based on optimal control and width learning
Technical Field
The invention relates to the technical field of AGV path planning, in particular to an AGV real-time path planning method based on optimal control and width learning.
Background
An Automated Guided Vehicle (AGV) is a transport vehicle equipped with an electromagnetic or optical automatic guidance device. It can travel along a predetermined guide path, carries safety protection and various transfer functions, is a key device of flexible production systems, and plays an important role in automated material transport and intelligent warehousing.
Currently, dynamic and flexible manufacturing environments present many challenges to workshop AGV path planning and real-time control. By control mode and degree of autonomy, AGVs can be roughly divided into remote-controlled, semi-autonomous and autonomous types. Magnetic-track navigation was the first path planning method adopted for AGVs and is still the most widely used: the AGV determines its travel route by sensing magnetic tracks laid on the ground, but the method is limited by the inflexibility of the tracks, and extending a path is relatively complicated. Vision and two-dimensional-code navigation is also commonly applied in the AGV field: the AGV determines its position and route by reading unique two-dimensional codes pasted on the ground at intervals. Compared with magnetic tracks this is more flexible and easier to schedule, but the code labels wear easily and the method places high demands on ambient light. Laser SLAM navigation determines position by having the AGV emit a laser signal that is reflected back by reflectors mounted on walls or columns; it overcomes the defects of the previous two methods, but mapping takes long, the cost is high, and market adoption is still limited.
With the development of deep learning, methods applying deep learning to AGV path planning have emerged. The prior art discloses an AGV path planning method and system based on reinforcement learning. In that method, an AGV dynamics model is first established; the AGV is then treated as an agent, and the environmental information it senses while travelling as state information. A state space accounting for the destination position and obstacle positions is designed, together with a continuous action space and a multi-reward mechanism, completing the Markov process modelling of path planning. The state space admits arbitrary starting points, target points and obstacle positions, so generalization is high; an Actor-Critic framework is then introduced for policy training, the online operation avoids heavy computation, the compute requirement is low, and real-time decision control of the AGV with respect to arbitrary targets and obstacles is achieved. However, because that scheme involves a relatively long trial-and-error learning process between the AGV and the environment, convergence is slow and time-consuming, and the training process depends on artificial experience, such as the design of the reward function and of the neural network structure.
Disclosure of Invention
In order to solve the problems of the existing deep-learning-based AGV path planning methods, namely the complicated tuning of deep neural network parameters and the slow training process, the invention provides an AGV real-time path planning method based on optimal control and width learning. It involves no manual parameter tuning with strong priors, trains the width learning network efficiently offline with low time consumption, and provides a basis for further extension to future large-scale workshop AGV formation and obstacle avoidance applications.
In order to achieve the technical effects, the technical scheme of the invention is as follows:
an AGV real-time path planning method based on optimal control and width learning comprises the following steps:
S1, constructing an AGV dynamics model;
S2, establishing an optimal control model by taking the AGV dynamics model as a dynamic constraint and time-fuel consumption optimization as the objective function;
S3, randomly generating initial points of the AGV, taking the initial points as initial conditions for solving the optimal control model, solving the optimal control model forward offline, and generating optimal control trajectories for a plurality of different initial points, the optimal control trajectories comprising optimal state-control rate pairs;
S4, constructing a width learning network, integrating and classifying the optimal control trajectories into different training data sets by target position on the basis of the optimal state-control rate pairs, and performing incremental offline training of the width learning network;
and S5, after the off-line training is finished, determining the weight parameters of the width learning network, and using the trained width learning network as a real-time controller for the real-time planning of the AGV path.
In this technical scheme, an AGV dynamics model is first constructed; with it as a dynamic constraint and time-fuel consumption optimization as the objective function, an optimal control model is established. The numerical solution of this model is optimal but computationally expensive, so, on the premise that it is unsuitable for real-time computation, the model is solved forward offline to generate optimal control trajectories for a number of different starting points. Since offline optimization cannot achieve real-time optimal control of the trajectory, and to avoid the lag caused by the offline optimization solution, a width learning network is introduced. Taking the different target points as the classification basis, the optimal control trajectories are integrated into training data sets for the different navigation tasks and the width learning network is trained incrementally, yielding the network finally used for AGV real-time path planning. This realizes real-time optimal control of the AGV from any initial point within a certain range; when the width learning network runs online, control-rate prediction involves only simple matrix operations, so there is no heavy computation and real-time performance is guaranteed.
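The offline data-generation part of this scheme (random starting points, per-target data sets) can be sketched as follows. The trajectory solver is injected as a stub, since the text leaves the direct method open; all function names are illustrative, not the patent's.

```python
import random

def random_initial_state(x_range, y_range, rng=random):
    """Step S3: a random start s(t0) = [x_random, y_random, 0, 0, 0]."""
    return [rng.uniform(*x_range), rng.uniform(*y_range), 0.0, 0.0, 0.0]

def build_training_sets(solve_ocp, targets, n_starts, x_range, y_range):
    """Steps S3-S4: solve the optimal control model forward offline from many
    random starts and sort the resulting optimal state-control rate pairs by
    navigation target. `solve_ocp(s0, target)` stands in for the offline
    trajectory solver and must return a list of (state, control) pairs."""
    datasets = {target: [] for target in targets}
    for target in targets:
        for _ in range(n_starts):
            s0 = random_initial_state(x_range, y_range)
            datasets[target].extend(solve_ocp(s0, target))
    return datasets

# Toy stand-in solver: one pair per trajectory, for illustration only.
def dummy_solver(s0, target):
    return [(tuple(s0), (0.0, 0.0))]

sets = build_training_sets(dummy_solver, ["A", "B"], 3, (0.0, 5.0), (0.0, 5.0))
```

A real `solve_ocp` would return the whole discretized trajectory; the per-target dictionary mirrors the one-network-per-target training described later.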
Preferably, the AGV dynamics model in step S1 is:

dx(t)/dt = v(t)·cos θ(t)
dy(t)/dt = v(t)·sin θ(t)
dv(t)/dt = α(t)
dφ(t)/dt = ω(t)
dθ(t)/dt = v(t)·tan φ(t)/Lw

wherein t is the time variable, t ∈ [0, tf], and tf is the time corresponding to the terminal state; x(t) and y(t) represent the abscissa and ordinate of the position of the centre point of the AGV at time t, the position of the centre of the AGV being P = (x, y); v(t) represents the speed; θ(t) represents the azimuth angle between the AGV and the target position at time t; φ(t) represents the steering angle at time t; α(t) represents the acceleration in the azimuth direction at time t; ω(t) represents the angular velocity; Lw indicates the track length of the AGV.
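The dynamics model can be exercised numerically; below is a minimal sketch assuming the state ordering s = [x, y, v, φ, θ], control c = [α, ω], a tan-steering bicycle form inferred from the state and control definitions, and an illustrative wheelbase value (all names are ours, not the patent's).

```python
import math

L_W = 1.0  # assumed wheelbase ("track length") in metres

def agv_dynamics(s, c):
    """Right-hand side of the AGV kinematic model.
    s = [x, y, v, phi, theta], c = [alpha, omega]."""
    x, y, v, phi, theta = s
    alpha, omega = c
    return [v * math.cos(theta),        # dx/dt
            v * math.sin(theta),        # dy/dt
            alpha,                      # dv/dt
            omega,                      # dphi/dt
            v * math.tan(phi) / L_W]    # dtheta/dt

def rk4_step(s, c, dt):
    """One fixed-step Runge-Kutta 4 integration step with constant control."""
    k1 = agv_dynamics(s, c)
    k2 = agv_dynamics([si + dt / 2 * ki for si, ki in zip(s, k1)], c)
    k3 = agv_dynamics([si + dt / 2 * ki for si, ki in zip(s, k2)], c)
    k4 = agv_dynamics([si + dt * ki for si, ki in zip(s, k3)], c)
    return [si + dt / 6 * (a + 2 * b + 2 * c_ + d)
            for si, a, b, c_, d in zip(s, k1, k2, k3, k4)]

s = [0.0, 0.0, 1.0, 0.0, 0.0]   # driving along +x at 1 m/s
for _ in range(100):            # 1 s of straight driving
    s = rk4_step(s, [0.0, 0.0], 0.01)
```

With zero steering and zero control the sketch reduces to straight-line motion, which gives a quick sanity check on the signs of the model.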
Preferably, in step S2, with the AGV dynamics model as a dynamic constraint and time-fuel consumption optimization as the objective function, the optimal control model is established as follows:

Objective function:

min J = β·tf + (1 − β)·J1

Constraints:

ds(t)/dt = f(s(t), c(t))
êi(t) ≤ ε, i = 1, ..., N
Bound(s(t), c(t))
s(t0) = s0, s(tf) = sf

wherein J1 represents the fuel consumption; the weight β ∈ [0, 1] expresses the relative attention paid to time optimization and fuel optimization in the optimization target; ei represents a path constraint designed from the known obstacle positions, satisfying:

ei(t) = (x(t) − xio)² + (y(t) − yio)² − (ri + k)² ≥ 0

wherein i = 1, ..., N; (xio, yio) represents the position coordinate of the i-th obstacle at time t; ri represents the corresponding obstacle radius; and k represents a collision warning threshold;

êi(t) ≤ ε

represents the transformed (smoothed) path constraint, ε being a positive number approaching 0;

ds(t)/dt = f(s(t), c(t))

denotes the AGV dynamics model, where s(t) = [x(t), y(t), v(t), φ(t), θ(t)] is the state variable and c(t) = [α(t), ω(t)] is the control rate; Bound(s(t), c(t)) represents the boundary value constraints of the AGV; s(t0) and s(tf) indicate the given initial and terminal states of the AGV.
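The obstacle clearance term and the time-fuel objective can be evaluated on a sampled trajectory. This is a hedged sketch: the quadratic fuel proxy J1 = ∫(α² + ω²)dt and the weight `beta` are our illustrative assumptions, not fixed by the text.

```python
def clearance(x, y, obs_x, obs_y, r, k):
    """Path-constraint value e_i(t): positive while the AGV centre stays more
    than the warning threshold k outside obstacle i of radius r."""
    return (x - obs_x) ** 2 + (y - obs_y) ** 2 - (r + k) ** 2

def time_fuel_objective(t, alpha, omega, beta=0.5):
    """Trapezoidal approximation of J = beta*t_f + (1 - beta)*J_1 on a
    sampled trajectory, with the assumed fuel term J_1 = integral of
    (alpha^2 + omega^2) dt."""
    f = [a * a + w * w for a, w in zip(alpha, omega)]
    j1 = sum((f[i] + f[i + 1]) / 2 * (t[i + 1] - t[i])
             for i in range(len(t) - 1))
    return beta * t[-1] + (1 - beta) * j1

e = clearance(3.0, 4.0, 0.0, 0.0, 2.0, 1.0)   # 25 - 9 = 16: safely clear
t = [i / 10 for i in range(11)]
J = time_fuel_objective(t, [1.0] * 11, [0.0] * 11, beta=0.5)
```

A direct-method solver minimizes such a discretized objective subject to the discretized dynamics and the smoothed constraints ê_i ≤ ε.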
Preferably, the randomly generated AGV initial points of step S3 are expressed as:

s(t0) = [x_random, y_random, 0, 0, 0]

The initial points are taken as the initial conditions for solving the optimal control model, and the model is solved forward offline; the solution method is not limited to the pseudospectral method or the shooting method among the direct methods of optimal control. First, the state variable S(τ) and the control rate C(τ) are interpolated, the interpolation serving only to obtain values at intermediate points for the solution computation, and the interpolation method is not limited to the Lagrange interpolation method. Finally, optimal control trajectories for a plurality of different starting points are generated, expressed as:

Ξ = {(st, ct) | t = t0, ..., tf}

wherein Ξ represents the set of optimal control trajectories, and each (st, ct) constitutes an "optimal state-control rate" pair (s, c).
Preferably, grouped by the different target positions and on the basis of the "optimal state-control rate" pairs (s, c) corresponding to each target position, the optimal control trajectories are integrated and classified into different training data sets, the correspondence being characterized as:

{(s, c)} → {DA, DB, ...}

wherein DA represents the independent data set collecting all optimal control trajectories whose target is position A, and DB and the others are defined in the same way.
Preferably, the constructed width learning network comprises an input layer, a hidden layer and an output layer, the hidden layer comprising feature nodes, enhancement nodes and incremental enhancement nodes.

Let S represent the optimal states in the training data set whose target is a certain position, and let C represent the control rates in that training data set. After S is input to the input layer of the width learning network, n groups of feature mappings form n groups of feature node matrices; with Zi representing the i-th group of feature nodes, the n groups of feature node matrices are spliced into Zn = [Z1, Z2, ..., Zn], where the i-th group of feature nodes is expressed as:

Zi = Q(S·Wei + βei), i = 1, 2, ..., n

wherein Q represents a linear or nonlinear activation function, and Wei and βei are randomly initialized weights and biases, respectively. The mapped features feed enhancement nodes with randomly generated weights: on the basis of the feature node matrices, nonlinear transformation forms m groups of enhancement node matrices Hm = [H1, H2, ..., Hm], with Hj representing the j-th group of enhancement nodes, expressed as:

Hj = ξ(Zn·Whj + βhj), j = 1, 2, ..., m

where ξ denotes a nonlinear activation function, and Whj and βhj are random weights and biases, respectively. The hidden layer node matrix is spliced into Am = [Zn | Hm], and the output of the width learning network is:

Ŷ = Am·Wm

The feature nodes are left unchanged; after enhancement nodes are newly added, the hidden layer becomes Am+1 = [Am | ξ(Zn·Wm+1 + βm+1)], where Wm+1 and βm+1 are the new random weights and biases, respectively, generated randomly and kept unchanged during training. Adding Wm+1 strengthens the representational capacity of the width learning network, so that the final fixed network structure achieves the fitting of the target output control rate C. The weights between the hidden layer and the output layer are solved by means of a pseudo-inverse matrix, approximated by the ridge regression method, the pseudo-inverse matrix being expressed as:

(Am)+ = lim(λ→0) (λI + (Am)ᵀAm)⁻¹(Am)ᵀ

and then Wm = (Am)+·C.
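The feature/enhancement construction and the ridge pseudo-inverse solve can be sketched numerically as follows, assuming tanh activations and illustrative group counts and widths; `build_hidden` and `output_weights` are our names, not the patent's.

```python
import numpy as np

rng = np.random.default_rng(42)

def build_hidden(S, n_feat_groups=3, n_enh_groups=2, width=8):
    """Build A^m = [Z^n | H^m]: random-weight feature groups
    Z_i = tanh(S W_ei + b_ei) followed by enhancement groups
    H_j = tanh(Z^n W_hj + b_hj)."""
    Z_groups = []
    for _ in range(n_feat_groups):
        We = rng.standard_normal((S.shape[1], width))
        be = rng.standard_normal(width)
        Z_groups.append(np.tanh(S @ We + be))
    Zn = np.hstack(Z_groups)
    H_groups = []
    for _ in range(n_enh_groups):
        Wh = rng.standard_normal((Zn.shape[1], width))
        bh = rng.standard_normal(width)
        H_groups.append(np.tanh(Zn @ Wh + bh))
    return np.hstack([Zn] + H_groups)

def output_weights(A, C, lam=1e-6):
    """W^m = A^+ C with the ridge approximation (lam I + A^T A)^-1 A^T C."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ C)

S = rng.standard_normal((200, 5))               # optimal states
C = np.tanh(S @ rng.standard_normal((5, 2)))    # control rates to fit
A = build_hidden(S)
W = output_weights(A, C)
pred = A @ W
```

The only trained quantities are the output weights W, obtained in one linear solve; all hidden weights stay at their random initial values, which is what makes training fast.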
Preferably, the width learning network is trained incrementally by adding enhancement nodes, the newly added group of enhancement nodes being expressed as:

Hm+1 = ξ(Zn·Wm+1 + βm+1)

The hidden layer is then represented as Am+1 = [Am | Hm+1], and the pseudo-inverse matrix changed by the newly added enhancement nodes is expressed as:

(Am+1)+ = [(Am)+ − D·Bᵀ ; Bᵀ]

wherein

D = (Am)+·Hm+1, C′ = Hm+1 − Am·D

Bᵀ = (C′)+ if C′ ≠ 0; Bᵀ = (1 + DᵀD)⁻¹·Dᵀ·(Am)+ if C′ = 0

After the enhancement nodes are added, the weight matrix of the mapping from optimal state to optimal control is expressed as:

Wm+1 = [Wm − D·Bᵀ·C ; Bᵀ·C]

At this point, the actual output of the output layer of the width learning network is:

Ŷ = Am+1·Wm+1

The error between the actual output Ŷ and the control rate c in the training data set (s, c) is computed as:

E = ‖Ŷ − C‖F

wherein ‖·‖F denotes the Frobenius norm. If the error does not meet the threshold, the width learning network continues to be trained incrementally by adding enhancement nodes; when the error meets the threshold, the addition of enhancement nodes stops and the width learning network model at that moment is saved.
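The stop-when-the-error-meets-the-threshold loop can be sketched as below. For clarity the output weights are re-solved from scratch by ridge pseudo-inverse at every step; the incremental pseudo-inverse update described above computes the same weights more cheaply, so this is a simplification, not the patent's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_incremental(Z, C, err_tol=1e-3, max_groups=50, n_enh=20, lam=1e-8):
    """Grow enhancement-node groups until the Frobenius training error
    ||A W - C||_F meets err_tol, then stop and keep the model."""
    A = Z.copy()
    W = err = None
    for _ in range(max_groups):
        W = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ C)
        err = np.linalg.norm(A @ W - C, ord='fro')
        if err <= err_tol:
            break
        Wh = rng.standard_normal((Z.shape[1], n_enh))
        bh = rng.standard_normal(n_enh)
        A = np.hstack([A, np.tanh(Z @ Wh + bh)])  # incremental enhancement nodes
    return A, W, err

Z = rng.standard_normal((100, 5))                 # feature-node matrix
C = Z @ np.arange(1.0, 11.0).reshape(5, 2)        # exactly linear targets
A, W, err = train_incremental(Z, C)
```

On the exactly linear toy targets the first solve already meets the threshold, so no nodes are added; on real control-rate data the loop widens the hidden layer until the fit is good enough.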
In the training of the width learning network there is no manual parameter tuning with strong priors, and no slow gradient optimization until the objective function converges: the network parameter matrix is obtained solely through the iterative incremental learning method and the solution of a pseudo-inverse matrix, so time consumption is low.
Preferably, the training data set is normalized before being used for width learning network training.
Preferably, after the optimal states S and control rates C of the training data set whose target is a certain position have been input to the width learning network and training is complete, the results output by the width learning network are subjected to inverse normalization.
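A sketch of the normalization and inverse normalization steps; min-max scaling is our assumption, since the text does not fix the scheme.

```python
import numpy as np

def fit_minmax(X):
    """Column-wise min-max statistics of a training matrix."""
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span = np.where(span > 0, span, 1.0)   # guard constant columns
    return lo, span

def normalize(X, lo, span):
    return (X - lo) / span

def denormalize(Y, lo, span):
    """Inverse normalization applied to the network's raw output."""
    return Y * span + lo

S = np.array([[0.0, 10.0], [5.0, 20.0], [2.5, 15.0]])
lo, span = fit_minmax(S)
Sn = normalize(S, lo, span)
```

The same statistics fitted on the training data must be reused online, so that the network's predicted control rates can be mapped back to physical units.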
Preferably, after the optimal states S and control rates C of the training data set whose target is a certain position have been input to the width learning network and training is complete, the hidden-layer weights of the current width learning network are saved, and the optimal states and control rates of the training data set of another target position are extracted for training, until the training data sets corresponding to all target positions have been traversed and the training of the width learning networks in one-to-one correspondence with the target positions is complete.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
compared with the traditional method of obtaining the path track by only directly solving the optimal control model offline, the method for implementing the path planning based on the optimal control and the width learning emphasizes the defect that the offline optimization cannot realize the real-time optimal control to obtain the optimal track, introduces the width learning network in order to avoid the hysteresis caused by the offline optimization solution, takes different target points as classification bases, classifies the optimal control track into training data sets of different navigation tasks, trains the width learning network incrementally to obtain the width learning network finally used for AGV real-time path planning, realizes the AGV real-time optimal control of any initial point in a certain range, and ensures the real-time performance of the path planning because the control rate prediction only relates to simple matrix operation when the width learning network is used for online operation. In addition, the method provided by the invention has the advantages that the problems of identification abrasion, difficult path expansion, high environmental requirement and long drawing time do not exist, in comparison with the offline training of a deep neural network, the training of a width learning network does not have the parameter adjusting work with strong artificial priority, the process from slow gradient optimization to target function convergence does not exist, the training can be completed only by obtaining a weight parameter matrix through an iterative incremental learning method, and the time consumption is low.
Drawings
Fig. 1 is a schematic flowchart illustrating an AGV adaptive path planning method based on optimal control and width learning according to embodiment 1 of the present invention;
FIG. 2 is a physical schematic diagram of an AGV dynamics model according to embodiment 1 of the present invention;
fig. 3 is an overall frame diagram of a width learning network proposed in embodiment 1 of the present invention;
FIG. 4 is a schematic diagram of AGV path output at different target positions as proposed in embodiment 3 of the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
it will be understood by those skilled in the art that certain well-known descriptions of the figures may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
The positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
example 1
The AGV path planning problem can be regarded as a trajectory planning problem in which the starting point of the AGV is not fixed while the end point (i.e., the coordinate point of the delivery task) is fixed. Conventional trajectory methods divide into path search followed by trajectory optimization, but traditional path search is usually based on a grid map, and a searched route does not necessarily satisfy the vehicle dynamics constraints (a vehicle cannot move sideways), which brings uncertainty in optimization time and optimization quality to the later trajectory optimization. Therefore, in actual implementation, to guarantee the validity of the path planning result, the dynamics model of the AGV must be considered as a constraint from the initial stage of planning. The AGV path planning and navigation problem is thus converted into a trajectory planning problem with fixed start and end points, solved forward by an optimal control method for a two-point boundary value problem.
Specifically, referring to fig. 1, the present embodiment provides an AGV real-time path planning method based on optimal control and width learning, including the following steps:
S1, constructing an AGV dynamics model;
S2, establishing an optimal control model by taking the AGV dynamics model as a dynamic constraint and time-fuel consumption optimization as the objective function;
S3, randomly generating initial points of the AGV, taking the initial points as initial conditions for solving the optimal control model, solving the optimal control model forward offline, and generating optimal control trajectories for a plurality of different initial points, the optimal control trajectories comprising optimal state-control rate pairs;
S4, constructing a width learning network, integrating and classifying the optimal control trajectories into different training data sets by target position on the basis of the optimal state-control rate pairs, and performing incremental offline training of the width learning network;
and S5, after the off-line training is finished, determining the weight parameters of the width learning network, and using the trained width learning network as a real-time controller for the real-time planning of the AGV path.
In this embodiment, based on Newtonian classical mechanics and according to the general properties of AGVs actually in service, a two-degree-of-freedom vehicle model is adopted for kinematic modelling of the AGV; with reference to fig. 2, the AGV dynamics model is expressed as:

dx(t)/dt = v(t)·cos θ(t)
dy(t)/dt = v(t)·sin θ(t)
dv(t)/dt = α(t)
dφ(t)/dt = ω(t)
dθ(t)/dt = v(t)·tan φ(t)/Lw

wherein t is the time variable, t ∈ [0, tf], and tf is the time corresponding to the terminal state; x(t) and y(t) represent the abscissa and ordinate of the position of the centre point of the AGV at time t, the position of the centre of the AGV being P = (x, y); v(t) represents the speed; θ(t) represents the azimuth angle between the AGV and the target position at time t; φ(t) represents the steering angle at time t; α(t) represents the acceleration in the azimuth direction at time t; ω(t) represents the angular velocity; Lw indicates the track length of the AGV.
With the AGV dynamics model as a dynamic constraint, the collision constraints smoothed, and time-fuel consumption optimization as the objective function, the trajectory planning of the AGV is converted into an optimal control problem with the corresponding constraints, and the established optimal control model is as follows:

Objective function (with the time-fuel optimum as the optimization objective):

min J = β·tf + (1 − β)·J1

Constraints:

ds(t)/dt = f(s(t), c(t))
êi(t) ≤ ε, i = 1, ..., N
Bound(s(t), c(t))
s(t0) = s0, s(tf) = sf

wherein J1 represents the fuel consumption; the weight β ∈ [0, 1] expresses the relative attention paid to time optimization and fuel optimization in the optimization target; ei represents a path constraint designed from the known obstacle positions, satisfying:

ei(t) = (x(t) − xio)² + (y(t) − yio)² − (ri + k)² ≥ 0

wherein i = 1, ..., N; (xio, yio) represents the position coordinate of the i-th obstacle at time t; ri represents the corresponding obstacle radius; and k represents a collision warning threshold;

êi(t) ≤ ε

represents the transformed (smoothed) path constraint, ε being a positive number approaching 0;

ds(t)/dt = f(s(t), c(t))

denotes the AGV dynamics model, where s(t) = [x(t), y(t), v(t), φ(t), θ(t)] is the state variable and c(t) = [α(t), ω(t)] is the control rate; Bound(s(t), c(t)) represents the boundary value constraints of the AGV; s(t0) and s(tf) indicate the given initial and terminal states of the AGV.
Since the AGV operates in a material handling scenario, this is an optimal control problem with a fixed terminal state: each required handling destination is taken separately as the terminal state and the computation is iterated, and the method for solving the optimal control model is not limited to any one of the direct methods of optimal control. Specifically, let the randomly generated AGV initial points be expressed as:

s(t0) = [x_random, y_random, 0, 0, 0]

The initial points are taken as the initial conditions for solving the optimal control model, and the model is solved forward offline; the solution method is not limited to the pseudospectral method or the shooting method among the direct methods of optimal control. For the solution computation, the state variable S(τ) and the control rate C(τ) are first interpolated; the interpolation method is not limited to the Lagrange interpolation method, under which the interpolation of S(τ) and C(τ) satisfies:

S(τ) ≈ Σi Li(τ)·S(τi)
C(τ) ≈ Σi Li(τ)·C(τi)
Li(τ) = Πj≠i (τ − τj)/(τi − τj)
dS(τk)/dτ ≈ Σi Dki·S(τi)

wherein τ0, ..., τN are the interpolation (collocation) nodes and Dki is the corresponding differentiation matrix. Optimal control trajectories for a plurality of different starting points are finally generated, expressed as:

Ξ = {(st, ct) | t = t0, ..., tf}

wherein Ξ represents the set of optimal control trajectories, and each (st, ct) constitutes an "optimal state-control rate" pair (s, c).
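Lagrange interpolation of a sampled trajectory, as used for obtaining intermediate-point values in the solution computation, can be sketched as:

```python
def lagrange_basis(taus, tau):
    """Values L_i(tau) of the Lagrange basis polynomials on the nodes `taus`."""
    basis = []
    for i, ti in enumerate(taus):
        li = 1.0
        for j, tj in enumerate(taus):
            if j != i:
                li *= (tau - tj) / (ti - tj)
        basis.append(li)
    return basis

def interpolate(taus, values, tau):
    """S(tau) ≈ sum_i L_i(tau) * S(tau_i): evaluate a trajectory sampled at
    the nodes `taus` at an intermediate time `tau`."""
    return sum(l * v for l, v in zip(lagrange_basis(taus, tau), values))

taus = [0.0, 1.0, 2.0]
values = [0.0, 1.0, 4.0]      # samples of tau**2, recovered exactly
mid = interpolate(taus, values, 1.5)
```

Because three nodes determine a quadratic exactly, the sketch reproduces tau² at the midpoint, which checks the basis computation.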
Many trajectory optimization problems in practical engineering, such as spacecraft trajectory optimization and unmanned vehicle trajectory optimization, are two-point boundary value problems and can be expressed and solved as optimal control problems. The traditional optimal control theory, based on the calculus of variations and the Pontryagin maximum principle, converts the optimal trajectory design problem into a Hamilton-Jacobi-Bellman equation and a two-point boundary value problem, and then controls the object to track along the designed trajectory. This approach is often used for optimal control problems whose performance metrics are defined as maximizing the search area, minimizing time consumption, minimizing fuel consumption, or minimizing the terminal-state error. Its main idea is to track a pre-designed optimal trajectory from the real state; an important difficulty of this strategy, however, is that since the model is a nonlinear differential equation, the optimization process amounts to solving that differential equation, and for scenarios with complex models an analytic solution is hard to obtain.
On the premise of offline solution, and considering that offline optimization cannot achieve optimal control of the trajectory in real time, a width learning network is introduced in order to avoid the lag caused by the offline optimization solution. Before the offline optimization solutions are formally used for the width learning network, the different target points are taken as the classification basis and the optimal control trajectories are integrated and classified into the training data sets of the different navigation tasks, specifically:
Grouped by the different target positions and on the basis of the "optimal state-control rate" pairs (s, c) corresponding to each target position, the optimal control trajectories are integrated and classified into different training data sets, the correspondence being characterized as:

{(s, c)} → {DA, DB, ...}

wherein DA represents the independent data set collecting all optimal control trajectories whose target is position A, and DB and the others are defined in the same way.
In this embodiment, referring to fig. 3, the constructed width learning network includes an input layer, a hidden layer, and an output layer, where the hidden layer includes feature nodes, enhancement nodes, and incremental enhancement nodes, and the structure is formed by introducing an incremental learning idea on the basis of the width learning system, and the new structure can iteratively improve the feature extraction capability of the model, increase the expression capability of the model, and improve the fitting performance of the model, and the width learning network can quickly learn a rule closer to the reality by using these updated weights, and the specific process is as follows:
Let S denote the optimal states in the training data set whose target state is a certain position, and let C denote the corresponding control rates. After S is fed into the input layer of the width learning network, n groups of feature mappings produce n groups of feature-node matrices. Let Z_i denote the i-th group of feature nodes; the n groups of feature-node matrices are concatenated as Z^n = [Z_1, Z_2, …, Z_n]. To obtain a sparse representation of the input data during feature mapping, the weights between the input layer and the hidden layer can be adjusted by the sparse auto-encoding technique, which automatically selects the optimal weights during decoding. The i-th group of feature nodes is expressed as:

Z_i = Q(S·W_ei + β_ei), i = 1, 2, …, n

wherein Q represents a linear or nonlinear activation function, and W_ei and β_ei are randomly initialized weights and biases, respectively. The mapped features are fed to enhancement nodes with randomly generated weights: on the basis of the feature-node matrix, m groups of enhancement-node matrices H^m = [H_1, H_2, …, H_m] are formed through nonlinear transformation, H_j representing the j-th group of enhancement nodes, expressed as:

H_j = ξ(Z^n·W_hj + β_hj), j = 1, 2, …, m

wherein ξ denotes a nonlinear activation function, and W_hj and β_hj are random weights and biases, respectively. The hidden-layer node matrix is concatenated as A^m = [Z^n | H^m], and the output of the width learning network is:

Y = A^m·W^m

The feature nodes are kept unchanged; after enhancement nodes are newly added, the hidden layer becomes A^{m+1} = [A^m | ξ(Z^n·W_{m+1} + β_{m+1})], where W_{m+1} and β_{m+1} are the new random weights and biases, respectively, generated randomly and held fixed during training. Adding W_{m+1} enhances the representational ability of the width learning network and enables the finally fixed network structure to fit the target output control rate C. The weights between the hidden layer and the output layer are solved by means of the pseudo-inverse matrix and approximated by the ridge regression method; the pseudo-inverse matrix is expressed as:

(A^m)^+ = lim_{λ→0} (λI + (A^m)^T·A^m)^{-1}·(A^m)^T

and then W^m = (A^m)^+·C.
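A minimal sketch of this construction, assuming a linear feature map Q, tanh enhancement activations, and arbitrarily chosen node counts (none of these choices is fixed by the patent; the function names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_hidden(S, n=4, m=6, k=10):
    """Hidden layer A^m = [Z^n | H^m]: n groups of feature nodes
    Z_i = Q(S W_ei + beta_ei) (Q linear here) and m groups of
    enhancement nodes H_j = xi(Z^n W_hj + beta_hj), xi = tanh.
    All weights and biases are random and then held fixed."""
    d = S.shape[1]
    feats = [S @ rng.standard_normal((d, k)) + rng.standard_normal(k)
             for _ in range(n)]
    Zn = np.hstack(feats)
    enh = [np.tanh(Zn @ rng.standard_normal((Zn.shape[1], k))
                   + rng.standard_normal(k)) for _ in range(m)]
    return np.hstack([Zn] + enh)

def output_weights(A, C, lam=1e-6):
    """Ridge-regression approximation of W^m = (A^m)^+ C,
    i.e. (A^T A + lam I)^{-1} A^T C with a small lam."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ C)

# usage with dummy data standing in for optimal states / control rates
S = rng.standard_normal((200, 5))   # states s = [x, y, v, phi, theta]
C = rng.standard_normal((200, 2))   # control rates c = [alpha, omega]
A = make_hidden(S)
W = output_weights(A, C)
Y = A @ W                           # network output approximating C
```

Only the output weights W are learned; the whole solve is one regularized linear system, which is why no gradient descent is needed.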
In order to give the width network better fitting performance, the width learning network is trained incrementally by adding enhancement nodes. The added group of enhancement nodes is expressed as:

H_{m+1} = ξ(Z^n·W_{m+1} + β_{m+1})

The hidden layer is then expressed as A^{m+1} = [A^m | H_{m+1}], and the pseudo-inverse matrix changed by the newly added enhancement nodes is expressed as:

(A^{m+1})^+ = [(A^m)^+ − D·B^T ; B^T]

wherein

D = (A^m)^+·H_{m+1}

B^T = (C')^+ with C' = H_{m+1} − A^m·D when C' ≠ 0, and B^T = (1 + D^T·D)^{-1}·D^T·(A^m)^+ when C' = 0.

After the enhancement nodes are added, the weight matrix of the mapping from the optimal state to the optimal control is expressed as:

W^{m+1} = [W^m − D·B^T·C ; B^T·C]

At this time, the actual output of the output layer of the width learning network is:

Ŷ = A^{m+1}·W^{m+1}

The error between the actual output Ŷ and the control rate c in the training data set (s, c) is calculated:

E = ‖Ŷ − C‖_F

wherein ‖·‖_F denotes the Frobenius norm. If the error does not meet the threshold, the width learning network continues to be trained incrementally by adding enhancement nodes; when the error meets the threshold, adding enhancement nodes stops, and the width learning network model at that moment is saved. The training of the width learning network involves neither heavily experience-dependent manual parameter tuning nor a slow gradient-descent process until the objective function converges; the network parameter matrix is obtained only by the iterative incremental learning method and the pseudo-inverse solution, so the time consumption is low.
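The incremental step can be sketched as follows, a minimal illustration of the standard broad-learning pseudo-inverse update (only the generic pseudo-inverse branch for B^T is implemented; function names, node counts and the stopping threshold are assumptions):

```python
import numpy as np

def add_enhancement_group(A, A_pinv, W, Zn, C, k=10, rng=None):
    """Append one group of k enhancement nodes and update the
    pseudo-inverse and output weights incrementally (no retraining)."""
    rng = rng if rng is not None else np.random.default_rng()
    # new nodes H_{m+1} = xi(Z^n W_{m+1} + beta_{m+1}); weights held fixed
    H = np.tanh(Zn @ rng.standard_normal((Zn.shape[1], k))
                + rng.standard_normal(k))
    D = A_pinv @ H                     # D = (A^m)^+ H_{m+1}
    B = np.linalg.pinv(H - A @ D)      # generic branch: B^T = (H - A D)^+
    A_new = np.hstack([A, H])
    A_pinv_new = np.vstack([A_pinv - D @ B, B])
    W_new = np.vstack([W - D @ (B @ C), B @ C])
    return A_new, A_pinv_new, W_new

def train_incrementally(Zn, A, C, tol=1e-3, max_steps=50, rng=None):
    """Grow the network until the Frobenius error meets the threshold."""
    A_pinv = np.linalg.pinv(A)
    W = A_pinv @ C
    while max_steps > 0 and np.linalg.norm(C - A @ W, 'fro') > tol:
        A, A_pinv, W = add_enhancement_group(A, A_pinv, W, Zn, C, rng=rng)
        max_steps -= 1
    return A, W
```

Because each step reuses the previous pseudo-inverse, the cost of adding a group of nodes is far below re-solving the whole network.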
After the optimal states S and control rates C in the training data set whose target state is a certain position have been input into the width learning network and training is completed, the hidden-layer weights of the current width learning network are saved. The training data set of another target position is then extracted to supply the optimal states and control rates for training, until the training data sets corresponding to all target positions have been traversed and one width learning network has been trained for each target position.
Example 2
In this embodiment, on the basis of the width learning network training described in embodiment 1, since the quantities in the data set are not of the same order of magnitude, the training data set is normalized before being used for width learning network training, using methods including but not limited to max-min normalization, Z-score normalization and function transformation.
After the optimal states S and control rates C in a training data set with a certain position as the target state are normalized (denoted, e.g., S̃ and C̃) and input into the width learning network to complete training, the result output by the width learning network must undergo inverse normalization processing before it is finally used as a control rate with physical meaning.
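A minimal sketch of max-min normalization with its matching inverse transform (helper names are assumptions; constant columns are guarded so the division stays defined):

```python
import numpy as np

def fit_minmax(X):
    """Column-wise min and span of the training matrix X."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard constant columns
    return lo, span

def normalize(X, lo, span):
    """Map each column into [0, 1] using the fitted statistics."""
    return (X - lo) / span

def denormalize(Xn, lo, span):
    """Inverse transform: recover physically meaningful units."""
    return Xn * span + lo
```

The same (lo, span) statistics fitted on the training set must be reused at prediction time, once on the input state and once, inverted, on the network's output.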
Example 3
In this embodiment, on the basis of embodiments 1 and 2, extending the initial point to an arbitrary point and extending the destination to different final states of AGV path planning is mainly considered. For a fixed transport end point, the "optimal state-control rate" pairs (s, c) train an independent width learning network; combined with the incremental pseudo-inverse solution, a weight matrix W conforming to the mapping from the optimal state to the optimal control can be learned quickly. That is, the optimal control problem with a varying initial point can be solved, and the initial point in online optimal control can be generalized to any point in the set area.
The number N of feature nodes per group, the number M of enhancement nodes, and the number of enhancement nodes added at each increment of the width learning network can be selected as a trade-off between the computing power and the prediction accuracy in the specific scene.
For the different transport target points, i.e. final states, in an actual transport scene, the optimal control problem only needs to be solved forward offline repeatedly for each final state. After the weight parameters of the width learning network corresponding to each final state are learned, the networks are saved; referring to fig. 4, real-time optimal path-planning control for different final states (such as target a and a different target Z in fig. 4) is realized by calling each trained width learning network.
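The runtime dispatch, one saved network per final state, might be sketched as follows (the storage layout and names are hypothetical; the point is that the forward pass is a single matrix product, which is what makes the controller real-time):

```python
import numpy as np

class RealTimeController:
    """Select the width-learning network saved for the requested final
    state and map the current AGV state to a control rate in one pass."""

    def __init__(self, nets):
        # nets: {target: (hidden_fn, W, (lo_s, span_s, lo_c, span_c))}
        # saved after offline training, one entry per final state
        self.nets = nets

    def act(self, target, state):
        hidden_fn, W, (lo_s, span_s, lo_c, span_c) = self.nets[target]
        s = (np.asarray(state) - lo_s) / span_s   # normalize as in training
        c = hidden_fn(s[None, :]) @ W             # width-network forward pass
        return c[0] * span_c + lo_c               # back to physical units
```

Switching transport tasks is then just a dictionary lookup; no re-planning or re-optimization happens online.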
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to exhaustively list all embodiments here. Any modification, equivalent replacement and improvement made within the spirit and principle of the present invention shall be included in the protection scope of the claims of the present invention.

Claims (10)

1. An AGV real-time path planning method based on optimal control and width learning is characterized by comprising the following steps:
s1, constructing an AGV dynamics model;
s2, establishing an optimal control model by taking an AGV dynamic model as dynamic constraint and taking time-fuel consumption optimization as a target function;
s3, randomly generating initial points of the AGV, taking the initial points as initial conditions for solving the optimal control model, solving the optimal control model in an offline forward direction, and generating optimal control tracks of a plurality of different initial points, wherein the optimal control tracks comprise optimal state-control rate pairs;
s4, constructing a width learning network, integrating and classifying the optimal control tracks into different training data sets according to target positions based on the optimal state-control rate pair, and performing incremental offline training on the width learning network;
and S5, after the off-line training is finished, determining the weight parameters of the width learning network, and using the trained width learning network as a real-time controller for the real-time planning of the AGV path.
2. The AGV real-time path planning method based on optimal control and width learning of claim 1, wherein the AGV dynamics model in step S1 is:

dx(t)/dt = v(t)·cos θ(t)
dy(t)/dt = v(t)·sin θ(t)
dv(t)/dt = α(t)
dθ(t)/dt = v(t)·tan φ(t)/L_w
dφ(t)/dt = ω(t)

wherein t is a time variable, t ∈ [0, t_f], t_f being the time corresponding to the final state; x(t) and y(t) represent the abscissa and ordinate of the position coordinate of the AGV midpoint at time t, the position coordinate of the AGV center being P = (x, y); v(t) represents the speed at time t; θ(t) represents the azimuth angle between the AGV and the target position at time t; φ(t) represents the steering angle at time t; α(t) represents the acceleration in the azimuth direction at time t; ω(t) represents the angular velocity; and L_w represents the track length of the AGV.
3. The AGV real-time path planning method based on optimal control and width learning of claim 2, wherein in step S2, taking the AGV dynamics model as the dynamic constraint and time-fuel-consumption optimization as the objective function, the optimal control model expression is established as:

an objective function:

min J = ρ·t_f + (1 − ρ)·J_1, with J_1 = ∫₀^{t_f} (α²(t) + ω²(t)) dt

constraint conditions:

ds(t)/dt = f(s(t), c(t)), e_i(t) ≥ 0 (i = 1, …, N), Bound(s(t), c(t)), s(t_0), s(t_f)

wherein J_1 represents the fuel consumption; ρ expresses the relative attention paid to time optimization and to fuel-consumption optimization in the optimization objective; e_i represents a path constraint designed from the known obstacle positions, satisfying:

e_i(t) = (x(t) − x_i^o(t))² + (y(t) − y_i^o(t))² − (r_i + k)² ≥ 0, i = 1, …, N

wherein (x_i^o(t), y_i^o(t)) represents the position coordinate of the i-th obstacle at time t, r_i represents the corresponding obstacle radius, and k represents a collision warning threshold; ê_i(t) ≤ ε represents the transformed path constraint, ε being a positive number approaching 0; ds(t)/dt = f(s(t), c(t)) denotes the AGV dynamics model, s(t) = [x(t), y(t), v(t), φ(t), θ(t)] being the state variable and c = [α(t), ω(t)] the control rate; Bound(s(t), c(t)) represents the boundary value constraints of the AGV; and s(t_0) and s(t_f) denote the given initial and final states of the AGV.
4. The AGV real-time path planning method based on optimal control and width learning of claim 3, wherein the randomly generated AGV initial points of step S3 are expressed as:

s(t_0) = [x_random, y_random, 0, 0, 0]

the initial point is taken as the initial condition for solving the optimal control model, and the optimal control model is solved forward offline by methods not limited to the pseudo-spectral method and the shooting method among the direct methods of optimal control; first the state variable S(τ) and the control rate C(τ) are interpolated for the solution calculation, the interpolation method not being limited to Lagrange interpolation; finally the optimal control trajectories of a plurality of different starting points are generated, expressed as:

T* = {(s_t, c_t) | t ∈ [t_0, t_f]}

wherein T* represents the set of optimal control trajectories, and each (s_t, c_t) constitutes an "optimal state-control rate" pair (s, c).
5. The AGV real-time path planning method based on optimal control and width learning of claim 4, wherein, according to the integration of the different target positions, based on the "optimal state-control rate" pair (s, c) corresponding to each target position, the optimal control trajectories are integrated and classified into different training data sets, characterized correspondingly as:

D = {D_A, D_B, …, D_Z}

wherein D_A represents an independent data set collecting all the optimal control trajectories whose target state is position A, and D_B, …, D_Z are defined in the same way.
6. The AGV real-time path planning method based on optimal control and width learning of claim 5, wherein the constructed width learning network comprises an input layer, a hidden layer and an output layer, wherein the hidden layer comprises feature nodes, enhancement nodes and incremental enhancement nodes;
S denotes the optimal states in the training data set whose target state is a certain position, and C denotes the corresponding control rates; after S is input into the input layer of the width learning network, n groups of feature mappings form n groups of feature-node matrices; Z_i denotes the i-th group of feature nodes, and the n groups of feature-node matrices are concatenated as Z^n = [Z_1, Z_2, …, Z_n], wherein the i-th group of feature nodes is expressed as:

Z_i = Q(S·W_ei + β_ei), i = 1, 2, …, n

wherein Q represents a linear or nonlinear activation function, and W_ei and β_ei are randomly initialized weights and biases, respectively; the mapped features are fed to enhancement nodes with randomly generated weights, and m groups of enhancement-node matrices H^m = [H_1, H_2, …, H_m] are formed through nonlinear transformation on the basis of the feature-node matrix, H_j representing the j-th group of enhancement nodes, expressed as:

H_j = ξ(Z^n·W_hj + β_hj), j = 1, 2, …, m

wherein ξ denotes a nonlinear activation function, and W_hj and β_hj are random weights and biases, respectively; the hidden-layer node matrix is concatenated as A^m = [Z^n | H^m], and the output of the width learning network is:

Y = A^m·W^m

the feature nodes are kept unchanged; after enhancement nodes are newly added, the hidden layer becomes A^{m+1} = [A^m | ξ(Z^n·W_{m+1} + β_{m+1})], wherein W_{m+1} and β_{m+1} are the new random weights and biases, respectively, generated randomly and held fixed during training; adding W_{m+1} enhances the representational ability of the width learning network and enables the finally fixed network structure to fit the target output control rate C; the weights between the hidden layer and the output layer are solved by means of the pseudo-inverse matrix and approximated by the ridge regression method, the pseudo-inverse matrix being expressed as:

(A^m)^+ = lim_{λ→0} (λI + (A^m)^T·A^m)^{-1}·(A^m)^T

and then W^m = (A^m)^+·C.
7. The AGV real-time path planning method based on optimal control and width learning of claim 6, wherein the width learning network is incrementally trained by adding enhancement nodes, the added enhancement nodes being expressed as:

H_{m+1} = ξ(Z^n·W_{m+1} + β_{m+1})

the hidden layer is then expressed as A^{m+1} = [A^m | H_{m+1}], and the pseudo-inverse matrix changed by the newly added enhancement nodes is expressed as:

(A^{m+1})^+ = [(A^m)^+ − D·B^T ; B^T]

wherein

D = (A^m)^+·H_{m+1}

B^T = (C')^+ with C' = H_{m+1} − A^m·D when C' ≠ 0, and B^T = (1 + D^T·D)^{-1}·D^T·(A^m)^+ when C' = 0;

after the enhancement nodes are added, the weight matrix of the mapping from the optimal state to the optimal control is expressed as:

W^{m+1} = [W^m − D·B^T·C ; B^T·C]

at this time, the actual output of the output layer of the width learning network is:

Ŷ = A^{m+1}·W^{m+1}

the error between the actual output Ŷ and the control rate c in the training data set (s, c) is calculated:

E = ‖Ŷ − C‖_F

wherein ‖·‖_F denotes the Frobenius norm; if the error does not meet the threshold, the width learning network continues to be trained incrementally by adding enhancement nodes; when the error meets the threshold, adding enhancement nodes stops, and the width learning network model at that moment is saved.
8. The AGV real-time path planning method based on optimal control and width learning of claim 7, wherein the training data set is normalized before being used for width learning network training.
9. The AGV real-time path planning method based on optimal control and width learning of claim 8, wherein after the optimal state S and the control rate C in the training data set with a certain position as a target state are input into the width learning network to complete training, the result output by the width learning network needs to be subjected to inverse normalization processing.
10. The AGV real-time path planning method based on optimal control and width learning of claim 7, wherein after the optimal states S and control rates C in the training data set with a certain position as the target state are input into the width learning network and training is completed, the hidden-layer weight information of the current width learning network is saved, and the positions other than that position are extracted in turn as target states whose training data sets supply the optimal states and control rates for training, until the training data sets corresponding to all target positions have been traversed and the training of the width learning networks corresponding one-to-one to the plurality of target positions is completed.
CN202111482549.5A 2021-12-06 2021-12-06 AGV real-time path planning method based on optimal control and width learning Active CN114200936B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111482549.5A CN114200936B (en) 2021-12-06 2021-12-06 AGV real-time path planning method based on optimal control and width learning


Publications (2)

Publication Number Publication Date
CN114200936A true CN114200936A (en) 2022-03-18
CN114200936B CN114200936B (en) 2023-06-13

Family

ID=80650869

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111482549.5A Active CN114200936B (en) 2021-12-06 2021-12-06 AGV real-time path planning method based on optimal control and width learning

Country Status (1)

Country Link
CN (1) CN114200936B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117391575A (en) * 2023-12-08 2024-01-12 青岛盈智科技有限公司 Truck transportation route planning method based on path analysis

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635245A (en) * 2018-09-29 2019-04-16 中国矿业大学 A kind of robust width learning system
CN109884886A (en) * 2019-03-29 2019-06-14 大连海事大学 A kind of ship movement model-free adaption method for optimally controlling based on width study
CN111880405A (en) * 2020-07-03 2020-11-03 广东工业大学 AGV self-adaptive path planning real-time control method in flexible manufacturing workshop system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635245A (en) * 2018-09-29 2019-04-16 中国矿业大学 A kind of robust width learning system
CN109884886A (en) * 2019-03-29 2019-06-14 大连海事大学 A kind of ship movement model-free adaption method for optimally controlling based on width study
CN111880405A (en) * 2020-07-03 2020-11-03 广东工业大学 AGV self-adaptive path planning real-time control method in flexible manufacturing workshop system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117391575A (en) * 2023-12-08 2024-01-12 青岛盈智科技有限公司 Truck transportation route planning method based on path analysis
CN117391575B (en) * 2023-12-08 2024-03-22 青岛盈智科技有限公司 Truck transportation route planning method based on path analysis

Also Published As

Publication number Publication date
CN114200936B (en) 2023-06-13

Similar Documents

Publication Publication Date Title
CN111780777B (en) Unmanned vehicle route planning method based on improved A-star algorithm and deep reinforcement learning
CN108803321B (en) Autonomous underwater vehicle track tracking control method based on deep reinforcement learning
Li et al. Prescribed performance concurrent control of connected vehicles with nonlinear third-order dynamics
Zhao et al. A novel direct trajectory planning approach based on generative adversarial networks and rapidly-exploring random tree
Orozco-Rosas et al. Mobile robot path planning using a QAPF learning algorithm for known and unknown environments
CN109491389A (en) A kind of robot trace tracking method with constraint of velocity
CN112857385B (en) Rapid unmanned vehicle local path planning method based on non-uniform grid model
CN112506194B (en) Distributed safety learning control method for mobile robot cluster
CN115683145A (en) Automatic driving safety obstacle avoidance method based on track prediction
Mu et al. A time-varying lookahead distance of ILOS path following for unmanned surface vehicle
Sierra‐Garcia et al. Combining reinforcement learning and conventional control to improve automatic guided vehicles tracking of complex trajectories
CN114200936B (en) AGV real-time path planning method based on optimal control and width learning
Elhaki et al. Robust amplitude-limited interval type-3 neuro-fuzzy controller for robot manipulators with prescribed performance by output feedback
Tian et al. Personalized lane change planning and control by imitation learning from drivers
Lin et al. Research on UUV obstacle avoiding method based on recurrent neural networks
Angulo et al. Policy optimization to learn adaptive motion primitives in path planning with dynamic obstacles
Fridovich-Keil et al. Approximate solutions to a class of reachability games
CN117093009A (en) Logistics AGV trolley navigation control method and system based on machine vision
CN114559439B (en) Mobile robot intelligent obstacle avoidance control method and device and electronic equipment
CN115061470B (en) Unmanned vehicle improved TEB navigation method suitable for narrow space
Zang et al. Formation Trajectory Tracking Control of UTVs: A Coupling Multi-Objective Iterative Distributed Model Predictive Control Approach
CN113959446B (en) Autonomous logistics transportation navigation method for robot based on neural network
Chen et al. Framework of active obstacle avoidance for autonomous vehicle based on hybrid soft actor-critic algorithm
Zhang et al. Fuzzy Goal Programming Algorithm for Multi-objective Trajectory Optimal Parking of Autonomous Vehicles
Tang et al. Actively learning Gaussian process dynamical systems through global and local explorations

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant