CN114371626B - Discrete control fence function improvement optimization method, optimization system, terminal and medium - Google Patents



Publication number
CN114371626B
Authority
CN
China
Prior art keywords
optimization
fence function
constraint
function
control fence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210026694.0A
Other languages
Chinese (zh)
Other versions
CN114371626A (en)
Inventor
李石磊
袁志民
李猛
杨智超
叶清
王甲生
何涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Naval University of Engineering PLA
Original Assignee
Naval University of Engineering PLA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Naval University of Engineering PLA filed Critical Naval University of Engineering PLA
Priority to CN202210026694.0A priority Critical patent/CN114371626B/en
Publication of CN114371626A publication Critical patent/CN114371626A/en
Application granted granted Critical
Publication of CN114371626B publication Critical patent/CN114371626B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02: Adaptive control systems, electric
    • G05B13/04: Adaptive control systems, electric, involving the use of models or simulators
    • G05B13/042: Adaptive control systems in which a parameter or coefficient is automatically adjusted to optimise the performance

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the technical field of control fence functions (control barrier functions, CBF), and discloses an improved optimization method, optimization system, terminal and storage medium for a discrete control fence function. A feasible state set is defined according to the constraint requirement of the discrete control fence function; the coefficient γ_k of the control fence function is dynamically adjusted according to the constraint optimization solving conditions of multiple tasks; γ_k is taken as an additional optimization variable, yielding an optimizable control fence function; the constraint requirement of the discrete-form control fence function is then changed so that the satisfaction degree of each task constraint is optimized directly; finally, a preference target related to S_k is introduced into the optimization objective, so that the motion path obtained by simulation optimization conforms to the given preference. The invention significantly enlarges the feasible-region space of the optimization algorithm and yields more flexible optimization solutions. Meanwhile, the constraint requirements of the invention have a clearer physical meaning and are more intuitive and easier to set.

Description

Discrete control fence function improvement optimization method, optimization system, terminal and medium
Technical Field
The invention belongs to the technical field of control fence functions, and particularly relates to an improved optimization method, an optimization system, a terminal and a storage medium for a discrete control fence function.
Background
Currently, the definition of the control fence function (control barrier function, CBF) is closely related to the concept of the control invariant set. If there exists a control law π: R^n → R^m such that, for any starting condition x(0) ∈ IS, x(t) ∈ IS holds for all t ≥ 0, the set IS is referred to as a control invariant set. Given a closed set C ⊆ R^n, assume that the set C satisfies:

C = {x ∈ R^n : h(x) ≥ 0}
∂C = {x ∈ R^n : h(x) = 0}
Int(C) = {x ∈ R^n : h(x) > 0}

where Int(C) and ∂C respectively represent the interior and the boundary of the set C, and h: R^n → R is a continuously differentiable function. If for all x ∈ C there exists a control input u satisfying:

ḣ(x, u) ≥ -γ(h(x))

then h(x) is called a control fence function. Here γ(·) is a class-K function, i.e. γ(·) is strictly monotonically increasing and satisfies γ(0) = 0; in practical applications γ(·) is often taken as a linear function with a constant coefficient, i.e. γ(h(x)) = γh(x). As can be seen from the definition, if the initial value h(x(0)) ≥ 0, then since h(x) always decays no faster than an exponential, h(x) ≥ 0 is guaranteed for all time, that is, forward invariance holds and the set {x | h(x) ≥ 0} is a control invariant set of the system.
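To make the exponential-decay argument above concrete, the following minimal sketch (illustrative only; the function name and values are assumptions, not part of the patent) iterates the worst case allowed by the discrete condition h_{t+1} ≥ (1 - γ)h_t and checks that h never becomes negative:

```python
def barrier_trajectory(h0, gamma, steps):
    """Worst-case h sequence: each step takes the smallest admissible value
    h_{t+1} = (1 - gamma) * h_t allowed by the discrete CBF condition."""
    hs = [h0]
    for _ in range(steps):
        hs.append((1.0 - gamma) * hs[-1])
    return hs

hs = barrier_trajectory(h0=1.0, gamma=0.2, steps=50)
assert all(h >= 0.0 for h in hs)   # forward invariance: h never goes negative
```

Since 0 < γ ≤ 1 makes each factor (1 - γ) nonnegative, h can shrink toward the boundary h = 0 but never cross it, which is exactly the forward-invariance claim.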
In addition, the control fence function is closely related to the control Lyapunov function. In nonlinear control, if the stability of the system is to be ensured, i.e. x(t) → 0, then instead of judging stability by directly solving for the system state, a control Lyapunov function is constructed: if a controller can be built that makes a positive definite function V(x) approach zero, the stability of the system can be determined indirectly. If the controller satisfies:

V̇(x, u) ≤ -λV(x)

it can be shown that the system converges stably, with V(x*) = 0, i.e. x* = 0. By means of the function V and by constructing such a controller, the stability of the system can be indirectly ensured. Similar to the idea of the control Lyapunov function, the control fence function indirectly ensures the forward invariance of the system by constructing a controller: in use, a task constraint is described in the form h(x) ≥ 0, and by optimizing the control quantity with the help of the forward-invariance concept of the control fence function, satisfaction of the task constraint is guaranteed indirectly. It is worth noting here that the control Lyapunov function requires V̇ ≤ -λV and not merely V̇ ≤ 0, so as to ensure the convergence speed of the system; similarly, the control fence function requires ḣ ≥ -γ(h(x)). This constraint requirement is only a conservative subset of the forward invariance of the system, which may leave the optimization algorithm unable to find a feasible solution when multiple task constraints expressed by control fence functions are applied simultaneously. Thus, there is a need for a new improved optimization method based on the discrete-time control fence function.
Through the above analysis, the problems and defects of the prior art are as follows: (1) The constraint requirement ḣ ≥ -γ(h(x)) of the existing control fence function is only a conservative subset of the forward invariance of the system, which may leave the optimization algorithm unable to find a feasible solution when multiple task constraints expressed by control fence functions are applied simultaneously.
(2) In the prior art, in multi-agent motion trajectory planning for complex dynamic scenes, the dimension of the problem grows sharply, so the traditional methods can hardly meet the requirements on computational efficiency and motion detail, and the accuracy of the obtained multi-agent motion trajectory data is low.
Disclosure of Invention
Aiming at the problems existing in the prior art, the invention provides an improved optimizing method, an optimizing system, a terminal and a storage medium for a discrete control fence function, and particularly designs an improved optimizing method for the discrete control fence function based on an optimizable discrete time control fence function.
The invention is realized in such a way that a discrete control fence function improvement optimization method based on a discrete time control fence function comprises the following steps:
Step one, defining a feasible state set according to the control fence function constraint requirement in discretized form;
Step two, dynamically adjusting the coefficient γ_k of the control fence function according to the constraint optimization solving conditions of multiple tasks;
Step three, taking γ_k of the control fence function as an additional optimization variable to obtain an optimizable control fence function;
Step four, changing the constraint requirement of the discrete-form control fence function so that the satisfaction degree of each task constraint, h(x_{t+k|t}), is optimized directly;
Step five, introducing a preference target related to S_k into the optimization objective, so that the multi-agent motion path data of the complex dynamic scene obtained by simulation optimization conforms to the preference.
Further, in step one, the discretized form of ḣ ≥ -γh is:

h_{t+Δt} - h_t ≥ -γh_t

According to the control fence function constraint requirement in discretized form, the feasible state set corresponding to a simulation time point t+k is defined as:

S_{CBF,k} = {x ∈ X : h(x_{t+k+1}) ≥ (1 - γ_k)h(x_{t+k})};

where the range of the feasible state set is jointly determined by h(x_{t+k}) and γ_k. Obviously, the larger γ_k is, the lower the requirement on the next-step value h(x_{t+k+1}); conversely, the smaller γ_k is, the higher the requirement on h(x_{t+k+1}).
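A one-line membership test for this feasible set, written here as an assumed helper for illustration (not code from the patent), makes the role of γ_k explicit:

```python
def in_feasible_set(h_next, h_curr, gamma_k):
    """Membership test for S_CBF,k: h(x_{t+k+1}) >= (1 - gamma_k) * h(x_{t+k})."""
    return h_next >= (1.0 - gamma_k) * h_curr

# A larger gamma_k weakens the requirement on the next-step barrier value:
assert not in_feasible_set(h_next=0.5, h_curr=1.0, gamma_k=0.2)  # needs >= 0.8
assert in_feasible_set(h_next=0.5, h_curr=1.0, gamma_k=0.6)      # needs >= 0.4
```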
Further, in step two: in the existing control fence function definition, γ_k is a predetermined hyper-parameter; if γ_k is set unreasonably, there may be no feasible solution. Therefore, γ_k in the control fence function is dynamically adjusted according to the actual conditions of the constraint optimization solution of the tasks.
Further, in step three, γ_k in the control fence function is taken as an additional optimization variable; the result is called an optimizable control fence function.
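The effect of treating γ_k as an optimization variable can be sketched numerically. In this illustrative helper (an assumption, not the patent's solver), h_next_max is the best next-step barrier value the controller can reach, and the smallest γ_k keeping the CBF constraint feasible is computed in closed form:

```python
def minimal_feasible_gamma(h_curr, h_next_max):
    """Smallest gamma_k in [0, 1] with h_next_max >= (1 - gamma_k) * h_curr."""
    if h_curr <= 0.0:
        return 0.0  # degenerate case: the bound (1 - gamma_k) * h_curr is non-positive
    return max(0.0, min(1.0, 1.0 - h_next_max / h_curr))

# With h_curr = 1.0 but h_next_max = 0.5, a fixed gamma = 0.2 (requiring
# h_next >= 0.8) is infeasible, while any gamma_k >= 0.5 restores feasibility:
assert minimal_feasible_gamma(1.0, 0.5) == 0.5
```

This is the repair mechanism the optimizable CBF enables: rather than failing when a fixed γ leaves no feasible solution, the optimizer can relax γ_k just enough.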
Further, in step four, according to the description formula of the feasible state set, once γ_k is fixed, h(x_{t+k}) also imposes a constraint on h(x_{t+k+1}): if h(x_{t+k}) is large, then h(x_{t+k+1}) must also take a large value to meet the CBF constraint requirement, whereas the essential requirement of forward invariance is in fact only h(x_{t+k+1}) ≥ 0. The constraint requirement of the control fence function in discrete form is therefore changed to:

h(x_{t+k+1|t}) ≥ S_k, S_k ≥ 0

where S_k is a newly introduced optimization variable substituted for (1 - γ)h(x_t), so that the satisfaction degree of each task constraint h(x_{t+k|t}) is optimized directly. This is called GOCBF; it adjusts multiple task constraints more directly and further enlarges the feasible solution space.
Further, in step five, a preference target related to S_k is introduced into the optimization objective, so that the multi-agent motion path data of the complex dynamic scene obtained by simulation optimization conforms to the preference. Specifically, the optimization objective includes a preference term such as Φ(S) = (S - S_0)^T W_S (S - S_0), so that the motion path obtained by simulation optimization conforms to the preference.
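The quadratic preference term can be sketched as follows; the helper name, the weight matrix W_S, and the preferred profile S_0 are illustrative assumptions, not values from the patent:

```python
def preference_cost(S, S0, W):
    """Phi(S) = (S - S0)^T * W * (S - S0), written without external libraries."""
    d = [a - b for a, b in zip(S, S0)]
    n = len(d)
    return sum(d[i] * W[i][j] * d[j] for i in range(n) for j in range(n))

W_S = [[1.0, 0.0],
       [0.0, 1.0]]                     # identity weights for the illustration
phi = preference_cost([0.4, 0.6], [0.5, 0.5], W_S)
assert abs(phi - 0.02) < 1e-12         # (-0.1)^2 + (0.1)^2
```

Penalizing the deviation of S from a preferred profile S_0 is what lets the optimizer trade off safety margin against the user's preferred behavior (conservative vs. risk-taking paths).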
Another object of the present invention is to provide a multi-agent motion trajectory optimization system based on the discrete-time control fence function, applying the above improved optimization method, the system comprising:
the feasible state set definition module, configured to define a feasible state set according to the control fence function constraint requirement in discretized form;
the variable dynamic adjustment module, configured to dynamically adjust the coefficient γ_k of the control fence function according to the constraint optimization solving conditions of multiple tasks;
the optimizable control fence function acquisition module, configured to take γ_k of the control fence function as an additional optimization variable and obtain an optimizable control fence function;
the constraint requirement changing module, configured to change the constraint requirement of the control fence function in discrete form so that the satisfaction degree of each task constraint h(x_{t+k|t}) is optimized directly;
the motion path preference acquisition module, configured to introduce a preference target related to S_k into the optimization objective, so that the motion path obtained by simulation optimization conforms to the preference.
It is a further object of the present invention to provide a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of:
defining a feasible state set according to the control fence function constraint requirement in discretized form; dynamically adjusting the coefficient γ_k of the control fence function according to the constraint optimization solving conditions of multiple tasks; taking γ_k of the control fence function as an additional optimization variable to obtain an optimizable control fence function; changing the constraint requirement of the control fence function in discrete form so that the satisfaction degree of each task constraint h(x_{t+k|t}) is optimized directly; and introducing a preference target related to S_k into the optimization objective, so that the motion path obtained by simulation optimization conforms to the preference.
Another object of the present invention is to provide a computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
Defining a feasible state set according to the control fence function constraint requirement in discretized form; dynamically adjusting the coefficient γ_k of the control fence function according to the constraint optimization solving conditions of multiple tasks; taking γ_k of the control fence function as an additional optimization variable to obtain an optimizable control fence function; changing the constraint requirement of the control fence function in discrete form so that the satisfaction degree of each task constraint h(x_{t+k|t}) is optimized directly; and introducing a preference target related to S_k into the optimization objective, so that the motion path obtained by simulation optimization conforms to the preference.
The invention further aims to provide an information data processing terminal which is used for realizing the multi-agent motion trail optimization system based on the discrete time control fence function.
Combining all the above technical schemes, the advantages and positive effects of the invention are as follows: the optimizable discrete-time control fence function provided by the invention is more universal and flexible; when multiple task constraints are added, it significantly enlarges the feasible-region space of the optimization algorithm and yields more flexible optimization solutions; its constraint requirement has a clearer physical meaning and is more intuitive and easier to set.
Aiming at the problems that task constraints are limited to geometric space, that the dynamic characteristics of the agents are not considered, and that overall coordination among different task constraints is lacking, the invention proposes the optimizable GOCBF function and combines dynamic physical simulation with motion planning based on MPC-GOCBF. Based on the idea of augmented physical simulation, the improved CBF function provides a unified description of the multi-task constraints of motion planning, so that the explicit analytic solving (or searching) of constraint relations is converted into implicit simulation calculation. A physical simulation environment for multi-agent motion planning is constructed, and the algorithm is tested and verified in it; experimental results demonstrate the flexibility and dynamic adaptability of the algorithm, and the accuracy of the obtained multi-agent motion trajectory data is high.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments of the present invention will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a discrete control fence function improvement optimization method based on a discrete time control fence function according to an embodiment of the present invention.
FIG. 2 is a block diagram of a multi-agent motion trajectory optimization system based on discrete time control fence functions provided by an embodiment of the present invention;
In the figure: 1. feasible state set definition module; 2. variable dynamic adjustment module; 3. optimizable control fence function acquisition module; 4. constraint requirement changing module; 5. motion path preference acquisition module.
Fig. 3 is a schematic diagram of an optimization solving process of a model predictive control algorithm according to an embodiment of the invention.
Fig. 4 is a schematic diagram of generating an optimized movement path of an agent based on augmented physics simulation according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of conservative behavior movement path optimization generation provided by an embodiment of the present invention.
Fig. 5 (a) is a schematic diagram of the agent motion trajectory provided by an embodiment of the present invention.
Fig. 5 (b) is a schematic diagram of the control fence function coefficient γ_k provided by an embodiment of the present invention.
FIG. 6 is a schematic diagram of the generation of an adventure behavior movement path optimization provided by an embodiment of the present invention.
Fig. 6 (a) is a schematic diagram of the agent motion trajectory provided by an embodiment of the present invention.
Fig. 6 (b) is a schematic diagram of the control fence function coefficient γ_k provided by an embodiment of the present invention.
FIG. 7 is a schematic diagram of user-defined behavioral movement path optimization generation provided by an embodiment of the present invention.
Fig. 7 (a) is a schematic diagram of the agent motion trajectory provided by an embodiment of the present invention.
Fig. 7 (b) is a schematic diagram of the control fence function coefficient γ_k provided by an embodiment of the present invention.
FIG. 8 is a schematic diagram of the MPC-GOCBF motion path optimization generation provided by the embodiment of the invention.
Fig. 8 (a) is a schematic diagram of the agent motion trajectory provided by an embodiment of the present invention.
Fig. 8 (b) is a schematic diagram of the h value of the control fence function according to the embodiment of the present invention.
Fig. 9 is a schematic diagram of an adaptive cruise simulation test result (s=25) provided by an embodiment of the present invention.
Fig. 9 (a) is a schematic diagram of a movement speed of a cruising agent according to an embodiment of the present invention.
Fig. 9 (b) is a schematic diagram of a control fence function h according to an embodiment of the present invention.
Fig. 9 (c) is a schematic diagram of an optimized control amount provided by an embodiment of the present invention.
Fig. 9 (d) is a schematic diagram of S values corresponding to the GOCBF provided in the embodiment of the present invention.
Fig. 10 is a schematic diagram of an adaptive cruise simulation test result (s=40) provided by an embodiment of the present invention.
Fig. 10 (a) is a schematic diagram of a movement speed of a cruising agent according to an embodiment of the present invention.
Fig. 10 (b) is a schematic diagram of a control fence function h according to an embodiment of the present invention.
Fig. 10 (c) is a schematic diagram of an optimized control amount provided by an embodiment of the present invention.
Fig. 10 (d) is a schematic diagram of S values corresponding to the GOCBF provided in the embodiment of the present invention.
Fig. 11 is a schematic diagram of a multi-agent collision avoidance experiment according to an embodiment of the present invention.
FIG. 11 (a) is a screenshot of a simulation process provided by an embodiment of the invention.
Fig. 11 (b) is a schematic diagram of a motion track of an obstacle avoidance agent according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following examples in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Aiming at the problems existing in the prior art, the invention provides an improved optimization method of a discrete control fence function based on a discrete time control fence function, and the invention is described in detail below with reference to the accompanying drawings.
As shown in fig. 1, the discrete control fence function improvement optimization method based on the discrete time control fence function provided by the embodiment of the invention comprises the following steps:
S101, defining a feasible state set according to control fence function constraint requirements in a discretization form;
S102, dynamically adjusting the coefficient γ_k of the control fence function according to the constraint optimization solving conditions of multiple tasks;
S103, taking γ_k of the control fence function as an additional optimization variable to obtain an optimizable control fence function;
S104, changing the constraint requirement of the discrete-form control fence function so that the satisfaction degree of each task constraint h(x_{t+k|t}) is optimized directly;
S105, introducing a preference target related to S_k into the optimization objective, so that the motion path obtained by simulation optimization conforms to the preference.
In step S105, the multi-agent motion path data of the complex dynamic scene obtained by the simulation optimization is thereby made to conform to the preference.
As shown in fig. 2, the multi-agent motion trajectory optimization system based on discrete time control fence function provided by the embodiment of the invention includes:
the feasible state set definition module 1, used for defining a feasible state set according to the control fence function constraint requirement in discretized form;
the variable dynamic adjustment module 2, used for dynamically adjusting the coefficient γ_k of the control fence function according to the constraint optimization solving conditions of multiple tasks;
the optimizable control fence function acquisition module 3, used for taking γ_k of the control fence function as an additional optimization variable and obtaining an optimizable control fence function;
the constraint requirement changing module 4, used for changing the constraint requirement of the control fence function in discrete form so that the satisfaction degree of each task constraint h(x_{t+k|t}) is optimized directly;
the motion path preference acquisition module 5, used for introducing a preference target related to S_k into the optimization objective, so that the motion path obtained by simulation optimization conforms to the preference.
The technical scheme of the invention is further described below with reference to specific embodiments.
Examples: optimized discrete time control fence function design
Figure SMS_23
In the form of discretization of:
h t+Δt -h t ≥-γh t (1)
according to the control fence function constraint requirement of a discretization form, the invention defines a feasible state set corresponding to a certain simulation time point t+k as follows:
S CBF,k ={x∈X:h(x t+k+1 )≥(1-γ k )h(x t+k )} (2)
according to equation (2), the range of the feasible state set is defined by h (x t+k ) And gamma k Co-determination, obviously gamma k The larger the time for the next time h (x t+k+1 ) The lower the demand on (2), and conversely, gamma k The smaller the time for the next time h (x t+k+1 ) The higher the demand for (2). In the existing control fence function definition, gamma k For a pre-determined hyper-parameter. If gamma is k The arrangement is not reasonable enough, which results in the situation of no feasible solution. The invention therefore proposes to control gamma in the fence function k Dynamic adjustment is carried out according to the actual condition of the constraint optimization solution of a plurality of tasks. According to the above thought, the invention further controls the gamma in the fence k As a variable for the generation of optimization, it is called an Optimizable control fence function (Optimizable CBF).
According to the description formula (1) of the feasible state set, at gamma k After fixing, h (x t+k ) Will also be applied to h (x t+k+1 ) Applying a constraint if h (x t+k ) Larger, then h (x t+k+1 ) It must also take a larger value to meet the CBF constraint requirement, whereas the essential requirement for forward invariance is in fact h (x t+k+1 ) 0, therefore the invention further proposes to change the control fence function constraint requirements in discrete form to:
Figure SMS_24
in the formula (3), S k The newly introduced optimization variables are used for replacing (1-gamma) h (x t ) Thus, the invention can directly process h (x t+k|t ) The method and the device optimize the satisfaction degree of the task constraints, namely GOCBF (General and Optimizable CBF), so that the plurality of task constraints are more directly adjusted, and the range of a feasible solution space is effectively improved. Also, the present invention introduces S in the optimization objective k Related preference targets, e.g. phi (S k )=(S T -S 0 )W S (S-S 0 ) Thereby enabling the motion path obtained by simulation optimization to accord with the preference.
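As a toy illustration of one GOCBF optimization step (an assumed 1-D example, solved by brute-force grid search for clarity rather than the QP solver a real implementation would use; all names and values are hypothetical):

```python
def gocbf_step(x, S0, w_pref=1.0, grid=201):
    """One GOCBF step on x_{t+1} = x + u with barrier h(x) = x (keep x >= 0).
    Jointly picks control u and bound S subject to h(x + u) >= S, S >= 0,
    minimizing control effort plus the preference term (S - S0)^2."""
    best = None
    for i in range(grid):
        u = -1.0 + 2.0 * i / (grid - 1)       # candidate u in [-1, 1]
        h_next = x + u
        for j in range(grid):
            S = 2.0 * j / (grid - 1)          # candidate S in [0, 2]
            if h_next < S:
                continue                      # GOCBF constraint h_next >= S
            cost = u * u + w_pref * (S - S0) ** 2
            if best is None or cost < best[0]:
                best = (cost, u, S)
    return best[1], best[2]

u, S = gocbf_step(x=0.5, S0=0.5)
assert 0.5 + u >= S >= 0.0                    # constraint holds at the optimum
```

Because S is a free variable bounded only below by zero, the optimizer can place the safety margin wherever the preference term and control effort dictate, rather than being tied to (1 - γ)h(x_t).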
The technical scheme of the invention is further described below in connection with simulation experiments.
Simulation experiment: motion planning multitask constraint unified description based on augmented physics simulation
The basic goal of motion planning is to find a feasible path from a starting state to a target state in a specific state space. In essence, a search algorithm is guided toward the target point by means of partial a priori heuristic knowledge (specific constraints, or even completely random sampling). This is a typical high-dimensional, ill-posed, redundant problem: many feasible solutions exist at a given space-time point, and a better solution must be selected from them based on the existing task constraint requirements. At this point, a proper description of the various task constraints becomes the key to solving the problem, and directly determines the solving efficiency of the path planning algorithm and the quality of the final planned path.
Conventional motion planning algorithms limit task constraints to geometric space and generate paths directly by judging the satisfaction of geometric constraint relationships. The physical limitations of the agent (such as speed and acceleration constraints) are often not considered; lacking consideration of the agent's dynamic characteristics, the physical feasibility of the motion paths cannot be guaranteed. In addition, geometric constraint relationships such as obstacle avoidance and reaching a specific target point are often described directly by inequalities, lacking overall coordination among different task constraints. At present, facing the multi-agent motion planning problem of complex dynamic scenes, the dimension of the problem grows sharply, and the traditional methods can hardly meet the requirements on computational efficiency and motion detail.
The method specifically comprises the following steps:
1. In constraint-based motion planning path optimization generation algorithms, the overall problem is generally described as an optimization problem. The various task constraint requirements are often described directly by geometric and physical inequality relationships, such as keeping the distance to the obstacle d > 0, or limiting the speed to a certain range v_min ≤ v ≤ v_max, and the path result is then obtained directly by optimization. On the solving side, both gradient-free intelligent optimization algorithms with heuristic characteristics and gradient-based optimization methods have been applied. Owing to their inherent random exploration, heuristic intelligent optimization algorithms cannot guarantee a good result every time; gradient-based optimization methods easily fall into local optima and depend strongly on the Initial Guess, so an inadequately set initial value makes a good planning result hard to obtain. To ensure that a gradient-based optimization algorithm always obtains a good solution, one approach first generates a rough guess path with the A* algorithm, takes this rough path as the initial value of the subsequent optimization, sets the optimization target to minimize the deviation between the final optimized value and the initial estimated path, and finally generates the final path by optimization under the specific task constraints. This approach resolves the strong dependence of gradient-based optimization on the initial estimate, but since a global initial path must be generated first, it is only suitable for off-line global path optimization and cannot generate planned paths online in real time in complex dynamic environments.
To enable online dynamic generation of planned paths, many researchers have proposed using model predictive control (Model Predictive Control, also called Receding Horizon Control) to realize dynamic, constraint-optimization-based path generation. The optimization process of a model predictive control algorithm with prediction horizon N can be described as:

min over u_{t:t+N-1|t} of Σ_{k=0}^{N-1} c(x_{t+k|t}, u_{t+k|t})    (a)
s.t. x_{t+k+1|t} = f(x_{t+k|t}, u_{t+k|t}), k = 0, …, N-1    (b)
x_{t+k|t} ∈ X, u_{t+k|t} ∈ U, k = 0, …, N-1    (c)
x_{t|t} = x_t, x_{t+N|t} ∈ X_f    (d)    (1.1)

In equation (1.1), (a) is the control objective to be optimized; (b) is the discretized dynamic model of the agent; (c) collects the state and control-input constraints the agent must satisfy at each discrete time point; and (d) specifies the initial and terminal states of the agent over the optimization horizon. In the model predictive control optimization, the decision variable is the control sequence, i.e., the control inputs u_{t|t}, …, u_{t+N-1|t} at the N time points of the prediction horizon. In actual execution, however, only the control at the first time point, u_{t|t}, is applied to the real system, which yields the system state x_{t+1} at the next instant; that state is then taken as the new initial state and the optimization is repeated. The whole process is shown in fig. 3.
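The receding-horizon loop described above (solve over N steps, apply only the first control, re-solve from the new state) can be sketched with a deliberately tiny brute-force solver. The 1-D integrator dynamics, the candidate control grid, and the quadratic cost weights below are illustrative assumptions, not the patent's model:

```python
import itertools

def simulate(x, u_seq):
    """Roll the 1-D integrator x_{k+1} = x_k + u_k through a control sequence."""
    xs = [x]
    for u in u_seq:
        xs.append(xs[-1] + u)
    return xs

def mpc_step(x, horizon, u_candidates):
    """Brute-force MPC: score every control sequence of length `horizon`
    with a quadratic cost and return only the FIRST control of the best
    sequence (the receding-horizon principle)."""
    best_cost, best_seq = float("inf"), None
    for seq in itertools.product(u_candidates, repeat=horizon):
        xs = simulate(x, seq)
        cost = sum(xi ** 2 for xi in xs) + 0.1 * sum(ui ** 2 for ui in seq)
        if cost < best_cost:
            best_cost, best_seq = cost, seq
    return best_seq[0]

# Closed loop: only u_{t|t} is applied, then the problem is re-solved.
x = 3.0
for _ in range(10):
    u = mpc_step(x, horizon=4, u_candidates=(-1.0, -0.5, 0.0, 0.5, 1.0))
    x = x + u
```

The loop drives the state to the origin; the exhaustive search stands in for the QP/NLP solver an actual MPC implementation would use.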
Based on the model predictive control method, the various task requirements arising in motion planning can conveniently be added to the optimization as constraint conditions. Moreover, by considering the agent's possible motion trajectory over a future period when solving, model predictive control effectively improves the quality of the planned path and avoids short-sighted behavior. However, further improving planning quality requires increasing the prediction horizon N, so as the number of agents and task-constraint requirements grows, the solution efficiency of the system becomes hard to guarantee. Under the model predictive control framework, therefore, how to describe the various task constraints appropriately becomes the key to solving the problem quickly.
With the rapid growth of computer processing speed, physical simulation has been widely applied to reproducing natural phenomena: once the various physical constraints are given, the system automatically generates physically detailed effects satisfying those constraints through dynamic simulation calculation, which is flexible and convenient. Pure physical simulation, however, lacks an effective control mechanism; a goal controller must be constructed to ensure that the simulation result evolves toward the intended target. Motion-planning algorithms, by contrast, have a natural advantage in long-range goal control, but for multi-agent navigation behavior generation in complex dynamic scenes the problem dimension grows sharply, and generating all the configuration-space (Configuration Space) state information of the agents during motion cannot be done efficiently by a planning algorithm alone. This complementarity makes a proper combination of the two ideas possible. The invention therefore combines dynamic physical simulation with motion planning: motion planning is responsible for goal control; the various index constraints are described as real or virtual physical constraints in the workspace (Workspace); traditional real-dynamics physical simulation is extended and augmented; a unified description of the multi-task constraints of motion planning is realized on the basis of this augmented physical simulation; and the explicit analytic solution (or search) of constraint relations is converted into implicit simulation calculation, thereby fusing the advantages of both.
The recently proposed control fence function (Control Barrier Function, CBF), borrowing the idea of the control Lyapunov function (Control Lyapunov Function), converts the explicit solution of state-based task constraints into the optimal generation of control inputs; satisfaction of the task constraints is guaranteed indirectly by the optimized control inputs, which greatly improves computational efficiency. Prior work has compared the control fence function with the artificial potential field method, showing that the artificial potential field method is a special case of the control fence function and that the overall performance of the control fence function is superior to that of the artificial potential field method.

From the viewpoint of dynamic physical simulation, the control fence function converts specific task constraints into the control inputs of the dynamic simulation process, providing a well-suited tool for the augmented physical simulation proposed by the invention. On this basis, the invention analyzes the unified description of multi-task constraints and the augmented-physical-simulation generation of motion paths under the conceptual framework combining model predictive control and the control fence function (MPC-CBF).
2. Description of the problem
The invention first assumes that all agents move on flat terrain and that each agent is simplified to a cylinder, so the global motion path of each agent reduces to a two-dimensional path-planning problem. Each agent A_i is represented by a circle of radius r_i (i = 1, …, N). At any time t, the position of agent A_i is denoted x_i(t), its velocity v_i(t), and its acceleration a_i(t); velocity and acceleration are bounded by maximum values, i.e., ‖v_i(t)‖ ≤ v_i^max and ‖a_i(t)‖ ≤ a_i^max, where v_i^max denotes the maximum speed the agent can reach and a_i^max the maximum acceleration it can reach.
The aim of the invention is to enable an agent to generate a dynamic path in real time, according to the environmental constraint requirements, while moving from its start position to a designated target area, without colliding with obstacles in the environment or with the other agents. The whole process can be described by the following optimization framework:

min over u of J(u, x) = ∫ c(x(t), u(t)) dt
s.t. ẋ(t) = f(x(t), u(t))
x(0) = x_0, x_e = x_g
x(t) ∈ X, u(t) ∈ U_adm(x(t))    (2.1)

In equation (2.1), ẋ = f(x, u) is the dynamic model of the agent; x(0) = x_0 and x_e = x_g are the start and goal states of the agent, respectively; x(t) ∈ X and u(t) ∈ U_adm(x(t)) express the state and control-input constraint requirements on the agent; and J(u, x) is the optimization objective, i.e., the performance-index measurement function of the planned path, where c(x, u) is the cost function, generally expressed as:
c(x, u) = Q(x) + uᵀRu    (2.2)
In equation (2.2), Q(x) and uᵀRu represent costs on the state and the control input of the agent, respectively, where R = Rᵀ > 0 is a symmetric positive-definite matrix. In the path-optimization generation process, the invention expects the total cost to be as small as possible under the precondition that the task constraints are satisfied.
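A direct transcription of the stage cost (2.2) for a 2-D state and control; taking Q(x) as the squared state norm and R as the identity are illustrative choices, not values fixed by the patent:

```python
import numpy as np

R = np.eye(2)  # R = R^T > 0: symmetric positive-definite control weight

def stage_cost(x, u, Q=lambda x: float(x @ x)):
    """c(x, u) = Q(x) + u^T R u, as in equation (2.2)."""
    return Q(x) + float(u @ R @ u)

c = stage_cost(np.array([1.0, 2.0]), np.array([0.5, 0.0]))
```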
3. Control fence function background knowledge
The definition of the control fence function is closely related to the concept of a control invariant set. If there exists a control law π: X → U such that for any initial condition x(0) ∈ IS the state satisfies x(t) ∈ IS for all t ≥ 0, the set IS is called a control invariant set. Given a closed set C defined by a continuously differentiable function h: ℝⁿ → ℝ, assume that the set C satisfies:

C = {x : h(x) ≥ 0}, ∂C = {x : h(x) = 0}, Int(C) = {x : h(x) > 0}    (3.1)

In equation (3.1), Int(C) and ∂C denote the interior and the boundary of the set C, respectively. If for all x ∈ C there exists u ∈ U satisfying:

sup over u ∈ U of [ ḣ(x, u) + γ(h(x)) ] ≥ 0    (3.2)
then h(x) is called a control fence function. In equation (3.2), γ(·) is a class-K function, i.e., γ(·) is strictly monotonically increasing with γ(0) = 0; in practical applications γ(·) is often taken as a linear function with a constant coefficient, i.e., γ(h(x)) = γh(x). From definition (3.2), if the initial value satisfies h(x(0)) ≥ 0, then since ḣ ≥ −γh(x), h decays no faster than an exponential, so h(x) ≥ 0 is always guaranteed by forward invariance; that is, the set {x : h(x) ≥ 0} is a control invariant set of the system.
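The forward-invariance argument can be checked numerically in discrete time: if every step enforces the decay condition h_{t+1} ≥ (1 − γ)h_t with 0 < γ ≤ 1, then starting from h ≥ 0 the value can only decay geometrically and never becomes negative. A minimal sketch (γ = 0.3 and the initial h are arbitrary illustrative choices):

```python
# Worst case allowed by the constraint: h decays exactly by the factor (1 - gamma)
# at every step. It approaches zero but never crosses it.
gamma = 0.3
h = 5.0
trace = [h]
for _ in range(50):
    h = (1 - gamma) * h
    trace.append(h)
```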
In addition, the control fence function is closely related to the control Lyapunov function. In nonlinear control, if the stability of the system is to be guaranteed, i.e., x(t) → 0, then rather than judging stability by solving the system state directly, a controller is constructed via a control Lyapunov function: if the controller can make a positive-definite function V(x) > 0 approach zero, the stability of the system is established indirectly. If the controller satisfies dV/dt ≤ −α(V(x)), it can be shown that the system converges stably, with V(x*) = 0, i.e., x* = 0. By means of the function V, and by constructing a controller that makes V decrease, the stability of the system is thus indirectly guaranteed. Following the same idea, the control fence function indirectly guarantees the forward invariance of the system by constructing a controller: when a control fence function is used, a task constraint is described as h(x) ≥ 0, and satisfaction of the task constraint is ensured indirectly by optimizing the control input, relying on the forward invariance conferred by the control fence function. It is worth noting that the control Lyapunov function requires dV/dt ≤ −α(V(x)) rather than merely dV/dt ≤ 0, so as to guarantee the convergence rate of the system; similarly, the control fence function requirement ḣ(x, u) ≥ −γ(h(x)) is only a conservative subset of the system's forward invariance, which may leave the optimization algorithm without a feasible solution when multiple task constraints expressed as control fence functions are applied simultaneously.
4. Task constraint unified description and dynamics physical simulation optimization generation based on optimizable control fence function
Based on the control fence function, an explicit task-constraint satisfaction test can be converted into an implicit dynamic-simulation calculation; however, the description of the multiple task constraints must remain feasible, so that the optimization does not fail to find a feasible solution. The invention therefore proposes an optimizable control fence function based on the concept of the control fence function and, building on the model predictive control optimization framework, an improved optimization solution under discrete-time conditions according to the dynamics-simulation characteristics of the optimizable control fence function.
4.1 unified description of task constraints based on control fence functions
Based on the definition of the control fence function, the invention first needs to construct the invariant safe set, through which task constraints are expressed indirectly.
(1) Description of hard constraints
Hard constraints, which the agent must satisfy at all times during its motion, can be described directly with control fence functions.
For example, for a collision-avoidance task constraint, the control fence function may be set as:
h1(x) = d(x_i, o_i) − (r_i + r_o) ≥ 0    (4.1)
In equation (4.1), d(x_i, o_i) is the distance between the agent and the obstacle o, and r_i, r_o are the radii of the agent and the obstacle, respectively. Thus, when the motion path is optimally generated, the invention ensures by optimization that the control input u of the agent satisfies ḣ1(x, u) ≥ −γ(h1(x)); by forward invariance, h1(x) ≥ 0 then always holds, and the agent does not collide with the various obstacles in the scene.
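A minimal sketch of how a distance-type fence function like (4.1) filters control inputs in discrete time; the candidate-velocity grid, γ, and step size below are illustrative assumptions, and the grid search stands in for a proper optimization over u:

```python
import math

def h_collision(x_agent, x_obs, r_agent, r_obs):
    """h1(x) = d(x_i, o_i) - (r_i + r_o); h1 >= 0 means the discs do not overlap."""
    return math.dist(x_agent, x_obs) - (r_agent + r_obs)

def admissible(x_agent, v_candidates, x_obs, r_agent, r_obs, gamma=0.5, dt=1.0):
    """Discrete CBF filter: keep only velocities whose next state satisfies
    h(x_{k+1}) >= (1 - gamma) * h(x_k)."""
    h_now = h_collision(x_agent, x_obs, r_agent, r_obs)
    ok = []
    for v in v_candidates:
        x_next = (x_agent[0] + v[0] * dt, x_agent[1] + v[1] * dt)
        if h_collision(x_next, x_obs, r_agent, r_obs) >= (1 - gamma) * h_now:
            ok.append(v)
    return ok
```

A velocity that would close the gap to the obstacle faster than the allowed decay rate is rejected even before an actual collision occurs.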
For two-sided range constraints, such as the bounds on the agent's own speed, acceleration, and control input, the description can be achieved by adding control fence functions in two forms. For example, the speed range constraint ‖v‖ ≤ v_max, i.e., −v_max ≤ v ≤ v_max, corresponds to the pair of control fence functions:

h2(x) = v_max − v ≥ 0, h3(x) = v + v_max ≥ 0    (4.2)

Using equation (4.2), a two-sided range task constraint is described with two control fence functions.
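The two-sided bound can be checked as a pair of one-sided fence functions (the forms v_max − v ≥ 0 and v + v_max ≥ 0 are assumed here, matching the symmetric-range case; v_max is an arbitrary illustrative value):

```python
v_max = 2.0  # assumed speed bound, for illustration only

def h_upper(v):
    """h2 = v_max - v >= 0 enforces v <= v_max."""
    return v_max - v

def h_lower(v):
    """h3 = v + v_max >= 0 enforces v >= -v_max."""
    return v + v_max
```

Requiring both h2 ≥ 0 and h3 ≥ 0 is exactly the two-sided constraint |v| ≤ v_max.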
(2) Description of Soft constraints
A soft constraint is a target that should be achieved as far as possible, or gradually, during the agent's motion. On one hand, when a soft constraint can be fully satisfied, its constraint requirement should be met to the greatest possible extent; on the other hand, when it conflicts with other constraints, its satisfaction degree must be reduced appropriately so that the agent as a whole satisfies as many task constraints as possible. When soft constraints conflict with each other, the priority relation among the different constraints must be respected: task constraints with high priority should be satisfied to a higher degree, while soft task constraints with low priority may appropriately relax their requirements. In the invention, soft constraints are described in two ways within the MPC-CBF optimization framework:
The first way is to describe the soft constraint in the cost function to be optimized. On the global level, soft constraints the agent should satisfy are converted into target quantities in the cost function, so that during the optimized generation of the agent's motion path the optimization calculation ensures the corresponding soft constraints are satisfied as far as possible.
For example, the task requirement that the control input applied during the agent's motion be as small as possible (energy optimality) can be described by adding a uᵀRu term to the cost function; the task constraint of reaching a particular target point can be expressed by adding a term of the form (x_i − x_g)², guiding the agent gradually toward the target state.
The second way is to describe the soft constraint directly in control fence function form. For more general soft task constraints, on the local level, a relaxation (slack) variable is added directly to the corresponding control fence function constraint requirement; the constraint requirement (3.2) then becomes:

ḣ(x, u) ≥ −γ(h(x)) − ε    (4.3)

In equation (4.3), ε > 0 is the relaxation variable; adding ε further relaxes the requirement ḣ(x, u) ≥ −γ(h(x)). When several soft and hard constraints are optimized and solved together, the satisfaction degree of each soft constraint is adjusted automatically through the value of ε: the smaller ε, the higher the satisfaction of the corresponding soft constraint; conversely, the larger the ε obtained by the optimization, the lower the satisfaction of the corresponding soft constraint.
When multiple soft task constraints must be applied simultaneously during motion planning, a priority order among the different soft constraints needs to be set. To model the priority order of the different soft constraints, the invention adds optimization terms for the corresponding relaxation variables to the cost function:

J′ = J + εᵀWε, W = diag(w_1, …, w_n)    (4.4)

In equation (4.4), the larger w_i, the smaller the corresponding ε_i in the optimization solution process, and the higher the satisfaction of the corresponding soft constraint. The relative sizes of the w_i therefore intuitively express the importance of the different soft task constraints, and the priorities of the soft constraints are described directly through the coefficients w_i of the diagonal matrix W: a high-priority soft task constraint is given a larger w_i, while a low-priority one is given a smaller w_i.
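The effect of the weights w_i can be seen on a toy conflict between two soft constraints that must share a fixed violation budget c. The budget constraint ε_1 + ε_2 = c is an illustrative assumption, not part of the patent's formulation; the closed-form split follows from the Lagrange condition w_1·ε_1 = w_2·ε_2 for minimizing w_1·ε_1² + w_2·ε_2²:

```python
def split_slack(w1, w2, c):
    """Minimize w1*e1^2 + w2*e2^2 subject to e1 + e2 = c.
    The optimal slack is inversely proportional to the weight."""
    e1 = c * w2 / (w1 + w2)
    e2 = c * w1 / (w1 + w2)
    return e1, e2

# The high-priority constraint (w1 = 10) keeps the smaller slack.
e1, e2 = split_slack(w1=10.0, w2=1.0, c=1.1)
```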
4.2 agent motion Path optimization Generation under augmented physical simulation framework
Conventional motion-planning algorithms confine task constraints to geometric space and generate paths directly by testing the satisfaction of geometric constraint relations, so the planning results satisfy the specific tasks insufficiently. The invention instead builds a physical modeling expression of the various task constraints, based on the control fence function, and realizes implicit solution of the task constraints through physical simulation calculation; the specific idea is shown in fig. 4.
First, following the idea of chapter three of describing temporal-logic requirements and spatial constraints separately, the temporal-logic relations of the various spatio-temporal task constraints are described with a DFA; the spatial constraints are then converted, via CBFs, into the optimal solution of the control inputs required by the corresponding dynamic physical simulation; the physical limitations of the agent itself are likewise described through CBFs; and the specific task targets are described through the cost function. The whole augmented-physical-simulation framework can finally be converted into the following optimization framework:
min over u of J(u, x)
s.t. ẋ = f(x, u), x(0) = x_0, x ∈ X, u ∈ U_adm(x)
ḣ_i(x, u) ≥ −γ(h_i(x)), i = 1, …, m
ḣ_j(x, u) ≥ −γ(h_j(x)) − ε_j, j = m + 1, …, n    (4.5)

In equation (4.5) there are n task constraints in total, where h_i (i = 1, …, m) are hard constraints described via CBFs and h_j (j = m + 1, …, n) are soft constraints described via CBFs. In the optimization solution process, equation (4.5) is converted from continuous to discretized form and solved numerically under the MPC framework; according to equation (1.1), the discrete form of equation (4.5) under the MPC optimization framework is:

min over u_{t:t+N-1|t} of Σ_{k=0}^{N-1} c(x_{t+k|t}, u_{t+k|t})
s.t. x_{t+k+1|t} = f(x_{t+k|t}, u_{t+k|t}), k = 0, …, N-1
x_{t+k|t} ∈ X, u_{t+k|t} ∈ U, k = 0, …, N-1
x_{t|t} = x_t
Δh(x_{t+k|t}, u_{t+k|t}) ≥ −γh(x_{t+k|t}), k = 0, …, N-1    (4.6)
In equation (4.6), Δh(x_{t+k|t}, u_{t+k|t}) := h(x_{t+k+1|t}) − h(x_{t+k|t}), and Δh(x_{t+k|t}, u_{t+k|t}) ≥ −γh(x_{t+k|t}) is the discretized version of ḣ(x, u) ≥ −γh(x). Indeed, approximating ḣ ≈ (h_{t+Δt} − h_t)/Δt gives:

(h_{t+Δt} − h_t)/Δt ≥ −γ·h_t    (4.7)
h_{t+Δt} − h_t ≥ −γ·Δt·h_t    (4.8)

Since the simulation calculation step Δt is a fixed value, equation (4.8) can be simplified to h_{t+Δt} − h_t ≥ −γh_t, which amounts to replacing the original γ by the new value γ·Δt; the new value of γ satisfies 0 < γ ≤ 1.
According to equation (4.5), planning a motion path that satisfies the specific task constraints is converted into a dynamic simulation calculation, with the required control inputs generated by optimization under the MPC framework. Owing to the inherent conservatism of the control fence function (it satisfies only a conservative subset of the system's forward invariance), situations may arise in which no feasible control input exists when n task constraints are applied simultaneously. To improve the feasibility of the system, the invention further proposes an optimizable control fence function.
According to the discrete-form control fence function constraint requirement, the invention defines the feasible state set at simulation time point t + k as:

S_CBF,k = {x ∈ X : h(x_{t+k+1}) ≥ (1 − γ_k) h(x_{t+k})}    (4.9)

According to equation (4.9), the range of the feasible state set is determined jointly by h(x_{t+k}) and γ_k: the larger γ_k, the weaker the requirement on h(x_{t+k+1}) at the next step; conversely, the smaller γ_k, the stronger that requirement. In the existing control fence function definition, γ_k is a predetermined hyperparameter, and an unreasonable setting of γ_k can leave the problem without a feasible solution. The invention therefore proposes to adjust γ_k in the control fence function dynamically, according to the actual situation of the multi-task constrained optimization. Following this idea, the invention takes γ_k as an optimization variable, yielding what is here called the Optimizable control fence function (Optimizable CBF); equation (4.6) then becomes:

min over u, γ of Σ_{k=0}^{N-1} c(x_{t+k|t}, u_{t+k|t}) + Φ(γ_t)
s.t. x_{t+k+1|t} = f(x_{t+k|t}, u_{t+k|t}), k = 0, …, N-1
x_{t+k|t} ∈ X, u_{t+k|t} ∈ U, x_{t|t} = x_t
Δh(x_{t+k|t}, u_{t+k|t}) ≥ −γ_k h(x_{t+k|t}), 0 < γ_k ≤ 1    (4.10)

Here Φ(γ_t) is a further introduced optimization objective that can take different forms according to specific needs. The invention develops and defines the following three forms:
(1) Preference for conservative behavior

If the agent's motion trajectory is required to be conservative, satisfying the constraint requirements to as high a degree as possible, the γ_k at each time step should be as small as possible; the invention therefore defines the optimization objective:

Φ(γ_t) = γᵀ W_γ γ    (4.11)

With this objective, the optimization tends to make each γ_k as small as possible, thereby generating conservative behavior.
(2) Preference for risk-taking behavior

If the agent's motion trajectory is required to take risks, completing the task while only minimally satisfying the constraint requirements, the γ_k at each time step should be as large as possible; the invention therefore defines the optimization objective:

Φ(γ_t) = (γ − 1)ᵀ W_γ (γ − 1)    (4.12)

With this objective, the optimization tends to make each γ_k as large as possible, thereby generating risk-taking behavior.
(3) User-defined behavior

If the user has a specific requirement on γ_k, namely γ_t = γ_0, the invention defines the optimization objective:

Φ(γ_t) = (γ − γ_0)ᵀ W_γ (γ − γ_0)    (4.13)

With this objective, the optimization tends to keep each γ_k as close to γ_0 as possible, thereby realizing user-defined behavior generation.
According to the feasible-state-set description in equation (4.9), once γ_k is fixed, h(x_{t+k}) also imposes a constraint on h(x_{t+k+1}): if h(x_{t+k}) is large, then h(x_{t+k+1}) must also take a large value to satisfy the CBF constraint requirement, whereas the essential requirement of forward invariance is in fact only h(x_{t+k+1}) ≥ 0. The invention therefore further proposes to change the discrete-form control fence function constraint requirement to:

h(x_{t+k+1|t}) ≥ S_k, S_k ≥ 0    (4.14)

In equation (4.14), S_k is a newly introduced optimization variable replacing (1 − γ)h(x_{t+k|t}). The invention can thus optimize the satisfaction degree h(x_{t+k|t}) of the task constraints directly; this variant, called GOCBF (General and Optimizable CBF), adjusts the multiple task constraints more directly and effectively enlarges the feasible solution space. Likewise, the invention introduces an S_k-related preference term in the optimization objective, e.g., Φ(S) = (S − S_0)ᵀ W_S (S − S_0), so that the motion path obtained by simulation optimization conforms to the desired preference.
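A single-step, 1-D sketch of the GOCBF idea: the lower bound S on h(x_{k+1}) is itself a decision variable, penalized for deviating from a preferred S_0. The scalar world, the grid search (standing in for the QP solver used in the patent), and the cost weights are illustrative assumptions:

```python
u_grid = [i / 10 for i in range(-10, 11)]  # candidate controls in [-1, 1]

def gocbf_step(x, goal, obstacle, r, S0, w_S):
    """One GOCBF step in a 1-D world: pick (u, S) with S <= h(x_next),
    minimizing tracking cost plus the preference penalty w_S * (S - S0)^2."""
    best = None
    for u in u_grid:
        x_next = x + u
        h_next = abs(x_next - obstacle) - r  # distance-type fence function
        if h_next < 0:                       # hard floor: never enter the obstacle
            continue
        S = min(S0, h_next)                  # best admissible S given S <= h_next
        cost = (x_next - goal) ** 2 + w_S * (S - S0) ** 2
        if best is None or cost < best[0]:
            best = (cost, u, S)
    return best[1], best[2]

# Small w_S: the agent presses toward the goal and lets S collapse toward 0.
u_greedy, S_greedy = gocbf_step(x=0.0, goal=5.0, obstacle=2.0, r=1.0, S0=2.0, w_S=0.01)
# Large w_S: honoring the preferred clearance S0 dominates, so the agent backs off.
u_safe, S_safe = gocbf_step(x=0.0, goal=5.0, obstacle=2.0, r=1.0, S0=2.0, w_S=100.0)
```

The same hard floor h ≥ 0 holds in both cases; only the automatically chosen S, and hence the clearance preference, changes.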
5. Simulation results and analysis
In order to test and verify the MPC-OCBF and MPC-GOCBF augmented-physical-simulation frameworks proposed by the project, the invention carries out simulation experiments on the algorithms in MATLAB, using the YALMIP optimization modeling language and the IPOPT solver.
5.1 Agent motion path optimization calculation to generate diverse behaviors
Consider a point agent with second-order dynamics:

X_{k+1} = A X_k + B U_k    (5.1)

where the state variable X_k = [x, y, v_x, v_y]ᵀ comprises the position and velocity of the agent in the two-dimensional plane, U_k = [u_x, u_y]ᵀ is the control input required by the system, and the state-transition matrices A, B take the standard discrete double-integrator form:

A = [ I₂  Δt·I₂ ; 0₂  I₂ ], B = [ (Δt²/2)·I₂ ; Δt·I₂ ]    (5.2)

with Δt the discretization step.
Due to the physical limitations of the agent, the physical constraints to be satisfied are:

x_min ≤ X_k ≤ x_max, u_min ≤ U_k ≤ u_max    (5.3)

In the simulation experiment, x_max, x_min = ±5·I_{4×1} and u_max, u_min = ±1·I_{2×1}, where I_{n×1} is an n × 1 vector of ones. During its motion the agent must avoid an obstacle of radius r_obs = 1.5; the corresponding control fence function is defined as:

h(X_k) = (x − x_obs)² + (y − y_obs)² − r_obs²    (5.4)
The initial and target positions of the agent are (−5, −5) and (0, 0), and the obstacle is located at (x_obs, y_obs) = (−2, −2.25). The cost function in the optimization process is defined as:

c(x, u) = x_kᵀ Q x_k + u_kᵀ R u_k + x_Nᵀ P x_N    (5.5)

In equation (5.5), Q = 10·I_{4×4}, R = I_{2×2}, P = 100·I_{4×4}, and the prediction window length (MPC horizon) is N = 8.
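Equation (5.1) can be instantiated as the standard discrete double integrator; the sampling time Δt = 0.1 s below is an assumed value, not one stated in this excerpt of the patent:

```python
import numpy as np

dt = 0.1  # assumed sampling time
# Second-order (double-integrator) agent: state [x, y, vx, vy], input [ux, uy].
A = np.block([[np.eye(2), dt * np.eye(2)],
              [np.zeros((2, 2)), np.eye(2)]])
B = np.block([[0.5 * dt ** 2 * np.eye(2)],
              [dt * np.eye(2)]])

X0 = np.array([-5.0, -5.0, 0.0, 0.0])  # initial position used in the experiment
U = np.array([1.0, 1.0])               # one step of maximal control input
X1 = A @ X0 + B @ U                    # velocity integrates the input, position the velocity
```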
(1) Conservative behavior movement path optimization generation
During the simulation experiments, the invention uses fixed coefficients γ_k (0.01 and 0.1) and, separately, the method of making γ_k in the optimization objective as small as possible, to carry out dynamic physical-simulation optimization generation of the motion path; the simulation results are shown in fig. 5.

As can be seen from fig. 5(b), with the MPC-OCBF algorithm framework proposed by the invention, the coefficient γ_k adjusts automatically to the actual course of the optimization calculation: when reaching the target as soon as possible conflicts with the collision-avoidance objective, the framework automatically raises γ_k to relax the constraint requirement, satisfying the other constraints (traveling as short a path as possible) while still guaranteeing that the collision-avoidance constraint is met; at other times the coefficient γ_k stays near zero, so the collision-avoidance constraint is satisfied to the greatest extent. As can be seen from fig. 5(a), as the invention gradually increases W_γ in the optimization objective, the optimized motion path becomes more and more conservative and deviates further from the obstacle.
(2) Risk behavior movement path optimization generation
During the simulation experiments, the invention uses fixed coefficients γ_k (0.9 and 1.0) and, separately, the method of making γ_k in the optimization objective as large as possible, to carry out dynamic physical-simulation optimization generation of the motion path; the simulation results are shown in fig. 6.

As can be seen from fig. 6(a), different W_γ yield essentially the same agent trajectory, because the risk-taking objective is consistent with the task-constraint requirement of reaching the target point as soon as possible (a straight line); since the two do not conflict, the optimized γ_k always remains close to 1 (as shown in fig. 6(b)). From figs. 5 and 6, the MPC-OCBF augmented-physical-simulation optimization framework proposed by the invention automatically makes appropriate optimization adjustments for different task constraints, thereby enlarging the solution space and obtaining diversified trajectory paths.
(3) User-defined behavioral path optimization generation
During the simulation experiments, the invention sets the user-defined coefficients γ_0 = [0.3, 0.5, 0.8] and W_γ = 10²·I_N; the simulation experiment results are shown in fig. 7.

As can be seen from fig. 7(a), compared with the MPC-CBF method with a directly fixed coefficient γ_k, the motion trajectories generated by the proposed MPC-OCBF method differ little, because when the targets conflict, W_γ = 10²·I_N is set relatively small and the optimization framework automatically raises γ_k, as shown in fig. 7(b), ensuring that the generated path reaches the target position as soon as possible while satisfying the collision-avoidance constraint.
(4) MPC-GOCBF algorithm behavior path optimization generation
Finally, the invention uses the MPC-GOCBF algorithm proposed by the project (equation (4.14)) and sets S_k directly for the motion-path simulation optimization calculation. In the experiments the invention sets S_0 = [0, 1, 2] and W_S = 10²·I_N; the final results are shown in fig. 8.

As can be seen from fig. 8(a), by setting different S_0 the invention obtains motion paths with different behavioral characteristics. S_k has a clear physical and practical meaning: the larger S_k, the farther the agent stays from the obstacle, so in actual operation different S_0 can be set directly to obtain a motion path meeting specific requirements. Notably, during the simulation optimization calculation the control fence function does not necessarily strictly satisfy h ≥ S_0; instead, the constraint requirement is adjusted dynamically according to the actual situation of the task constraints. As shown in fig. 8(b), when the collision-avoidance constraint conflicts with the task requirement of reaching the target point as soon as possible (a straight line), the optimization algorithm automatically reduces S_k, realizing automatic adjustment among the multi-task constraints, so the whole algorithm framework has strong adaptability.
5.2 Adaptive cruise motion path optimization calculation in autonomous driving
To further test the performance of the MPC-GOCBF algorithm framework proposed by the project, the invention compares its performance on the adaptive cruise control problem of an agent in autonomous driving. The dynamic model of the agent is:

ẋ_1 = x_2
m·ẋ_2 = −F_r(x_2) + u
ẋ_3 = v_l − x_2    (5.6)

where x_1 and x_2 are the position and speed of the cruising agent, m is the mass of the agent, x_3 is the distance between the cruising agent and the leading agent, and the leading agent travels at a fixed speed v_l. F_r(x_2) = f_0 + f_1·x_2 + f_2·x_2² is the resistance term of the agent's dynamics, where f_0, f_1 and f_2 are empirically determined constants. The range of control inputs the agent can exert is −c_d·m·g ≤ u ≤ c_a·m·g, where c_d, c_a are constant coefficients; the parameter values are fully consistent with the prior art. The task constraint the cruising agent needs to satisfy is represented by the control Lyapunov function:
V = (x_2 − v_d)²    (5.7)
In equation (5.7), v_d denotes the desired cruising speed of the cruising agent.
Defining a control fence function as:
h = x_3 − 1.8·x_2    (5.8)
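The fence function of (5.8) encodes a 1.8-second time headway: the following distance x_3 must exceed 1.8 times the ego speed x_2 for the state to count as safe. A quick check (the numeric speeds and gap are illustrative):

```python
def h_acc(x2, x3):
    """Safe-headway fence function from (5.8): gap minus 1.8 s times ego speed."""
    return x3 - 1.8 * x2

gap_safe = h_acc(20.0, 50.0)    # 50 m gap at 20 m/s: 50 - 36 > 0, safe
gap_unsafe = h_acc(30.0, 50.0)  # same gap at 30 m/s: 50 - 54 < 0, headway violated
```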
in order to test the limit performance of MPC-GOCBF algorithm framework, the invention takes the optimization time window length N=1 of MPC, so that the algorithm is degenerated into a single-step QP optimization problem, and the algorithm result and the fixed coefficient gamma are combined k Compared with the optimal-decay CBF-QP method with the optimized parameters additionally added in the CBF-QP by using the CBF-QP method with the optimized parameters of the CBF-QP of which the number is equal to 0.5, the invention takes the value of S=25 and W S =10 4 The simulation results are shown in fig. 9.
In the simulation experiment, the adaptability of the algorithm framework is tested by setting different initial speeds of the cruising agent: 26, 28, 30 and 32. As the simulation results in FIG. 9(a) show, when the initial speed is 30 or 32, CBF-QP (Nominal) cannot obtain a feasible solution, whereas the GOCBF algorithm of the invention guarantees that the system obtains a feasible solution, achieves the same performance as the optimal-decay CBF-QP method, and ensures that the safety constraint h ≥ 0 is satisfied throughout, as shown in FIG. 9(b). Meanwhile, S is automatically and dynamically adjusted over the whole process, guaranteeing the feasibility of the optimization.
The invention further increases S to S = 40, raising the safety-distance constraint between the vehicle and the vehicle ahead; the simulation result is shown in FIG. 10.
As can be seen from FIG. 10, after the safety-distance constraint with respect to the leading vehicle is strengthened, the GOCBF algorithm proposed by the project still guarantees the feasibility of the optimization. Compared with FIG. 9, the final value of h increases from 0 to 9.7, enlarging the safety distance to the leading vehicle and generating cruise behavior that meets the requirement.
6. Multi-agent motion planning simulation demonstration software design development
In order to conveniently test and verify the augmented physics simulation algorithm based on the MPC-GOCBF idea, the project further designs and develops a multi-agent motion planning simulation demonstration software environment, which is used to test and verify multi-agent motion planning algorithms.
6.1 Demonstration software design and development
The demonstration software is implemented in Python, and the optimization is solved with CasADi (https://web.casadi.org/), an efficient open-source toolkit for nonlinear optimization (nonlinear programming) and automatic differentiation (algorithmic differentiation). Compared with other optimization libraries, it provides standard C/C++ and MATLAB support as well as a Python API, so nonlinear optimization problems can be solved quickly and conveniently from Python. Using CasADi essentially comprises three steps: constructing variables, constructing the objective function, and setting the solver; the whole process is intuitive and friendly.
First, a CasADi optimization package is introduced:
import casadi as ca
opti = ca.Opti()
When constructing the variables to be optimized, optimization variables are defined with the opti.variable() function, and related parameters can be defined intuitively with opti.parameter(), for example:
opt_control = opti.variable(N)      # control quantities to be optimized at the N time points of the MPC horizon
opt_states = opti.variable(N + 1)   # state quantities to be optimized at the N+1 time points of the MPC horizon
Subsequently, the constraints need to be defined with the opti.subject_to() function and the optimization objective opt_cost needs to be set, for example:
opti.subject_to(self.opti.bounded(-self.v_max, v, self.v_max))
opt_cost = self.opt_cost + ca.mtimes([self.opt_controls[i, :], R, self.opt_controls[i, :].T])
For the optimization solution, the method uses the IPOPT solver package; the specific code is as follows:
# Optimizer configuration
opti.minimize(opt_cost)
opts_setting = {'ipopt.max_iter': 200, 'ipopt.print_level': 1, 'print_time': 0,
                'ipopt.acceptable_tol': 1e-5, 'ipopt.acceptable_obj_change_tol': 1e-5}
opti.solver('ipopt', opts_setting)
for the dynamics simulation of multiple intelligent agents, the project adopts a Robotarium-open-source multi-robot simulation platformhttps://github.com/robotarium/robotarium_python_simulator) The simulation modeling of multiple agents can be conveniently realized based on the platform.
In order to improve the generality of the demonstration software, the project adopts an object-oriented programming approach and implements the MPC-GOCBF algorithm as a class, clf_cbf_nmpc(), which completes the specific solution calculations associated with the MPC-GOCBF algorithm. The main methods designed are:
# __add_system_constraints: define physical-property constraints
# __add_dynamics_constraints: define the agent dynamics model
# __add_safe_constraints: implement the GOCBF-based task-constraint description
# __set_cost_func: define the cost function
# solve: perform the specific MPC-GOCBF optimization solution calculations
6.2 Multi-agent motion path generation test
In order to test the multi-agent motion planning simulation demonstration software and verify the performance of the proposed algorithm, the project uses this software environment to test MPC-GOCBF-based dynamic obstacle avoidance of multiple agents. The obstacle-avoidance agent needs to pass through several moving agents to reach a target state; its dynamic simulation model is:

(dynamics model equation, rendered only as an image in the source)
Physical characteristics, task requirements and dynamic obstacle-avoidance constraints are added in the simulation software environment, and a simulation experiment is carried out; a screenshot of a single simulation run and the agent motion paths are shown in FIG. 11.
During the simulation tests, the initial positions of the other agents are generated randomly, the simulation is run 100 times, and the number of obstacle agents is varied. The MPC-GOCBF algorithm is tested and evaluated on three metrics, collision-avoidance success rate, average path length and average time, and compared with the optimal-decay CBF method; the results are shown in Table 1.
TABLE 1 Multi-agent collision avoidance performance statistics
(table data rendered only as an image in the source)
As can be seen from Table 1, the success rate of the proposed MPC-GOCBF algorithm decreases as the number of obstacles increases, but the overall performance remains stable, and the multi-agent collision-avoidance problem is solved well.
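The three evaluation metrics can be computed from a batch of simulation runs as sketched below (the data and field names are hypothetical; success-conditioned averages follow the usual convention of excluding collided runs):

```python
# Hypothetical post-processing of simulation runs into the Table 1 metrics.
def summarize(runs):
    """runs: list of dicts with keys 'collided', 'path_len', 'time'."""
    ok = [r for r in runs if not r['collided']]
    return {
        'success_rate': len(ok) / len(runs),
        'avg_path_len': sum(r['path_len'] for r in ok) / len(ok),
        'avg_time': sum(r['time'] for r in ok) / len(ok),
    }

runs = [{'collided': False, 'path_len': 3.2, 'time': 10.0},
        {'collided': False, 'path_len': 3.0, 'time': 9.0},
        {'collided': True,  'path_len': 1.1, 'time': 2.5}]
print(summarize(runs))
```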
7. Aiming at the problems that task constraints are confined to the geometric space, that the dynamic characteristics of the agent are not considered, and that different task constraints are not weighed against one another, the invention proposes an optimizable GOCBF function. Based on MPC-GOCBF, dynamic physical simulation and motion planning are combined, and the improved CBF function, following the augmented physics simulation idea, provides a unified description of the multi-task constraints of motion planning, converting the explicit analytical solving (or searching) over constraint relations into implicit simulation computation. A multi-agent motion planning physical simulation environment is constructed and the algorithm is tested and verified on it; the experimental results demonstrate the flexibility and dynamic adaptability of the algorithm.
The foregoing is merely illustrative of specific embodiments of the present invention, and the scope of the invention is not limited thereto; any modifications, equivalents, improvements and alternatives that fall within the spirit and principles of the present invention and are apparent to those skilled in the art are within the scope of the present invention.

Claims (8)

1. A discrete control fence function improvement optimization method based on a discrete-time control fence function, characterized by comprising the following steps:
step one, obtaining a feasible state set according to the constraint requirement of the control fence function in discretized form;
step two, dynamically adjusting γ_k of the control fence function according to the constraint optimization solving conditions of a plurality of tasks;
step three, taking γ_k of the control fence as a variable to be optimized, obtaining an optimizable control fence function;
step four, changing the constraint requirement of the control fence function in discrete form, and directly optimizing the degree to which h(x_{t+k|t}) satisfies the task constraint;
step five, introducing a preference objective related to S_k into the optimization objective, so that the multi-agent motion path data of complex dynamic scenes obtained by simulation optimization conforms to the preference;
in the fourth step, the constraint requirement of the control fence function in discrete form is changed to:

h(x_{t+k+1|t}) ≥ S_k

wherein S_k is the newly introduced optimization variable, which replaces (1-γ)h(x_t) so that the degree to which h(x_{t+k|t}) satisfies the task constraint is optimized directly;
the introduction of the preference objective related to S_k into the optimization objective in the fifth step, making the multi-agent motion path data of complex dynamic scenes obtained by simulation optimization conform to the preference, comprises:
introducing into the optimization objective a preference term Φ(S) = (S - S_0)^T W_S (S - S_0), so that the motion path obtained by simulation optimization conforms to the preference.
2. The discrete control fence function improvement optimization method based on the discrete-time control fence function as set forth in claim 1, wherein in said step one the continuous constraint

dh/dt ≥ -γh

is discretized in the form:

h_{t+Δt} - h_t ≥ -γ·h_t;

according to the constraint requirement of the control fence function in discretized form, the feasible state set corresponding to a simulation time point t+k is defined as:

S_CBF,k = {x ∈ X : h(x_{t+k+1}) ≥ (1-γ_k)·h(x_{t+k})};

where the range of the feasible state set is jointly determined by h(x_{t+k}) and γ_k.
3. The discrete control fence function improvement optimization method based on the discrete-time control fence function as set forth in claim 1, wherein in said step two, γ_k, which is a preset hyper-parameter in the existing control fence function definition, is dynamically adjusted according to the actual conditions of the constraint optimization solving of the plurality of tasks.
4. The discrete control fence function improvement optimization method based on the discrete-time control fence function as set forth in claim 1, wherein in said step three, γ_k of the control fence is taken as an optimizable variable to obtain the optimizable control fence function.
5. A multi-agent motion trajectory optimization system based on a discrete time control fence function applying the discrete time control fence function based discrete control fence function improvement optimization method of any one of claims 1 to 4, characterized in that the multi-agent motion trajectory optimization system based on a discrete time control fence function comprises:
the feasible state set definition module is used for defining a feasible state set according to the control fence function constraint requirement in a discretization form;
the variable dynamic adjustment module is used for dynamically adjusting and controlling gamma of the fence function according to the constraint optimization solving conditions of a plurality of tasks k
An optimized control fence function acquisition module for acquiring gamma of the control fence k As a first generation optimized variable, obtaining an optimized control fence function;
a constraint requirement changing module for changing the constraint requirement of the control fence function in a discrete form and directly changing the constraint requirement of h (x t+kt ) Optimizing the satisfaction degree of task constraint;
a motion path conforming preference acquisition module for introducing S in the optimization objective k And the related preference targets are adopted, so that the motion path obtained by simulation optimization accords with the preference.
6. A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the discrete-time-control-fence-function-based discrete control fence function improvement optimization method of any one of claims 1 to 4.
7. A computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the discrete-time control fence function based discrete-control fence function improvement optimization method of any one of claims 1 to 4.
8. An information data processing terminal, wherein the information data processing terminal is used for realizing the function of the multi-agent motion trail optimization system based on the discrete time control fence function according to claim 5.
CN202210026694.0A 2022-01-11 2022-01-11 Discrete control fence function improvement optimization method, optimization system, terminal and medium Active CN114371626B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210026694.0A CN114371626B (en) 2022-01-11 2022-01-11 Discrete control fence function improvement optimization method, optimization system, terminal and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210026694.0A CN114371626B (en) 2022-01-11 2022-01-11 Discrete control fence function improvement optimization method, optimization system, terminal and medium

Publications (2)

Publication Number Publication Date
CN114371626A CN114371626A (en) 2022-04-19
CN114371626B true CN114371626B (en) 2023-07-14

Family

ID=81144332

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210026694.0A Active CN114371626B (en) 2022-01-11 2022-01-11 Discrete control fence function improvement optimization method, optimization system, terminal and medium

Country Status (1)

Country Link
CN (1) CN114371626B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110471408A (en) * 2019-07-03 2019-11-19 天津大学 Automatic driving vehicle paths planning method based on decision process
CN112116830A (en) * 2020-09-02 2020-12-22 南京航空航天大学 Unmanned aerial vehicle dynamic geo-fence planning method based on airspace meshing
CN113190613A (en) * 2021-07-02 2021-07-30 禾多科技(北京)有限公司 Vehicle route information display method and device, electronic equipment and readable medium
CN113238563A (en) * 2021-06-04 2021-08-10 重庆大学 High-real-time automatic driving motion planning method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10733332B2 (en) * 2017-06-08 2020-08-04 Bigwood Technology, Inc. Systems for solving general and user preference-based constrained multi-objective optimization problems
US10996639B2 (en) * 2019-03-11 2021-05-04 Mitsubishi Electric Research Laboratories, Inc. Model predictive control of systems with continuous and discrete elements of operations

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110471408A (en) * 2019-07-03 2019-11-19 天津大学 Automatic driving vehicle paths planning method based on decision process
CN112116830A (en) * 2020-09-02 2020-12-22 南京航空航天大学 Unmanned aerial vehicle dynamic geo-fence planning method based on airspace meshing
CN113238563A (en) * 2021-06-04 2021-08-10 重庆大学 High-real-time automatic driving motion planning method
CN113190613A (en) * 2021-07-02 2021-07-30 禾多科技(北京)有限公司 Vehicle route information display method and device, electronic equipment and readable medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Enhancing Feasibility and Safety of Nonlinear Model Predictive Control with Discrete-Time Control Barrier Functions; Jun Zeng; 《2021 60th IEEE Conference on Decision and Control (CDC)》; full text *
High-quality trajectory planning for heterogeneous individuals; LI Shi-lei; 《J. Cent. South Univ》; full text *
Optimizable control barrier functions to improve feasibility and add behavior diversity while ensuring safety; LI Shi-lei; 《Electronics》; full text *

Also Published As

Publication number Publication date
CN114371626A (en) 2022-04-19

Similar Documents

Publication Publication Date Title
Bhattacharyya et al. Simulating emergent properties of human driving behavior using multi-agent reward augmented imitation learning
Amarjyoti Deep reinforcement learning for robotic manipulation-the state of the art
Leottau et al. Decentralized reinforcement learning of robot behaviors
Kwak et al. Introduction to quantum reinforcement learning: Theory and pennylane-based implementation
Rückert et al. Learned graphical models for probabilistic planning provide a new class of movement primitives
Rubies-Royo et al. A classification-based approach for approximate reachability
Zhu et al. An overview of the action space for deep reinforcement learning
Sefati et al. Towards tactical behaviour planning under uncertainties for automated vehicles in urban scenarios
Hesse et al. A reinforcement learning strategy for the swing-up of the double pendulum on a cart
Wu et al. Human-guided reinforcement learning with sim-to-real transfer for autonomous navigation
Bouton et al. Utility decomposition with deep corrections for scalable planning under uncertainty
Zucker et al. Reinforcement planning: RL for optimal planners
Meng et al. Sympocnet: Solving optimal control problems with applications to high-dimensional multiagent path planning problems
Look et al. Differentiable implicit layers
Uchibe Cooperative and competitive reinforcement and imitation learning for a mixture of heterogeneous learning modules
Elsisi et al. Improved bald eagle search algorithm with dimension learning-based hunting for autonomous vehicle including vision dynamics
Oliveira et al. Learning to race through coordinate descent bayesian optimisation
Banerjee et al. A survey on physics informed reinforcement learning: Review and open problems
CN114371626B (en) Discrete control fence function improvement optimization method, optimization system, terminal and medium
Zhang et al. An efficient planning method based on deep reinforcement learning with hybrid actions for autonomous driving on highway
CN115488881A (en) Man-machine sharing autonomous teleoperation method and system based on multi-motor skill prior
Lan et al. Cooperative Guidance of Multiple Missiles: A Hybrid Coevolutionary Approach
Cao et al. Robot motion planning based on improved RRT algorithm and RBF neural network sliding
Yang et al. Reinforcement Learning with Reward Shaping and Hybrid Exploration in Sparse Reward Scenes
García et al. Incremental reinforcement learning for multi-objective robotic tasks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant