CN111882184A

CN111882184A - Multi-agent system null space behavior control dynamic task priority planning method

Info

Publication number: CN111882184A
Application number: CN202010677791.7A
Authority: CN
Inventors: 黄捷; 张祯毅; 陈宇韬; 蔡逢煌; 齐义文; 王武; 柴琴琴; 林琼斌; 李卓敏
Original assignee: Fuzhou University
Current assignee: Fuzhou University
Priority date: 2020-07-14
Filing date: 2020-07-14
Publication date: 2020-11-03
Anticipated expiration: 2040-07-14
Also published as: CN111882184B

Abstract

The invention relates to a zero-space behavior control dynamic task priority planning method for a multi-agent system, which comprises the following steps of firstly, establishing a composite task by a zero-space projection method; secondly, converting the dynamic task priority planning problem into a switching mode optimal control problem; and finally, solving the optimal control problem by using a mixed integer optimization method to obtain an optimal composite task track. The method solves the problem of dynamic task priority planning of multi-agent system zero-space behavior control, does not need artificial switching conditions for setting task priorities, reduces the workload of researchers, has good expandability, and can be applied to large-quantity task priority dynamic planning which is difficult to process by a traditional logic method. In addition, the method considers the prediction information of the future state of the agent in the process of switching the task priority, so that the method has more ideal switching effect compared with the traditional logic method.

Description

Multi-agent system null space behavior control dynamic task priority planning method

Technical Field

The invention relates to the technical field of intelligent robots, in particular to a zero-space behavior control dynamic task priority planning method for a multi-agent system.

Background

The multi-agent system has the characteristics of autonomy, fault tolerance, flexibility, expandability, cooperative capability and the like, and is widely applied to the fields of military affairs and civil affairs. Multiple autonomous mobile robots, a typical multi-agent system, are generally required to work in unstructured environments and to perform multiple tasks including moving along preset trajectories, avoiding static and dynamic obstacles, maintaining formation and aggregation, etc. However, these tasks may collide during execution, for example, an autonomous mobile robot cannot move along a preset trajectory while avoiding obstacles appearing on the preset trajectory.

Behavior control has been extensively studied in the last decade as one of the ways to resolve task conflicts. The task priority is set and a collaborative scheme with the priority task is designed to solve the conflict between the tasks. The traditional behavior control method mainly includes a hierarchical control method (the layered control approach) and a motion mode control method (the motor schema control approach), wherein the hierarchical control method is a competitive method, that is, in the process of task execution, only a task with high priority is executed, and a task with low priority can be executed only after the task with high priority is completed; the motion pattern control method is a collaborative method in which the velocity outputs of all tasks are weighted and executed as the final task output, so that none of the tasks is completely completed. Then, the literature (g.antonelliand s.chiaveri, "Kinematic control of planes of autonomus events," ieee transactions on Robotics, vol.22, No.6, pp.1285-1292,2006.) proposes a null-space based behavior control that projects low priority tasks to the null-space of high priority tasks while ensuring that high priority tasks are fully executed, so as to utilize the redundancy of the system to execute sub-priority tasks, which would not be executed if they used the full freedom of the system. Therefore, the null-space behavior control has a good potential in handling task conflicts. However, the priority of a task is determined by the supervision of the task and is usually set in advance and cannot be changed during the execution of the task. This may degrade the control performance of the null-space behavior control method, limiting its application in dynamic and unknown environments. For example, when an obstacle does not exist, setting the obstacle avoidance task to the highest priority level may affect the execution of other tasks.

In this regard, researchers have conducted certain research. A finite state machine is a state transition mechanism in which the state of a system automatically transitions from one state to another state based on a state transition trigger condition. Under the framework of zero-space behavior control, different task priority sequences are hidden in the state of each finite-state machine, and the task priority is planned by designing a trigger condition of state transition. In addition, the fuzzy logic method is to design some fuzzy rules to realize the task priority planning. However, both of these methods require a manual switching rule of task priorities designed in advance, which is very difficult when the number of tasks is very large or in a dynamic and unknown environment, and both of these methods use only current state information to the autonomous robot without predicting a future state, thereby degrading performance of priority switching.

Disclosure of Invention

In view of the above, the invention provides a multi-agent system zero-space behavior control dynamic task priority planning method, which converts a task priority dynamic planning problem into an optimization problem to be processed, and obtains an optimal composite task track by solving the optimization problem, aiming at the defects that the existing zero-space behavior control priority dynamic planning needs a person to set priority switching rules in advance and is difficult to process a large number of task priority dynamic plans.

The invention is realized by adopting the following scheme: a multi-agent system null space behavior control dynamic task priority planning method comprises the following steps:

step S1: designing a basic task rho of an intelligent agent by using a zero-space-based behavior control method, enumerating the possibility of the priorities of all tasks, and combining the designed basic behaviors into a composite task in different priority sequences by using a zero-space projection method;

step S2: converting the dynamic task priority planning problem into a switching mode optimal control problem at each sampling moment by using a model predictive control technology; step S3: and solving the optimal control problem by using a mixed integer optimization method to obtain an optimal composite task track.

Further, the step S1 specifically includes the following steps:

step S11: design of basic tasks:

a basic task is formed by a task variable rho epsilon R^mCoding is carried out, which is a code matching with the system configuration x epsilon RⁿA function of correlation, and can be expressed as:

ρ＝g(x) (1)

differentiation thereof gives:

wherein J (x) e R^m×nIs a task Jacobian matrix related to system configuration, v belongs toRⁿIs the speed of the system; by locally linear inverse mapping, the velocity v of the reference_dExpressed as:

where ρ is_dA function of the desired task is represented,

representing a pseudo-inverse of a jacobian matrix; in practical applications, the integral of the reference velocity will result in a numerical drift of the position of the agent, and therefore the following form of closed-loop inverse kinematics (closed-loop inverse kinematics) is used to compensate for this numerical drift:

where a is a suitably constant positive definite gain matrix,

is a task error;

step S12: and (3) constructing a composite task:

a compound task is a combination of a plurality of basic behaviors in a certain priority order; setting up

Is a task function where i ∈ N_r，N_r＝{1,...,r}，m_iA dimension representing a task space; defining a time-dependent priority function y_i(i,t):N_r×[0,∞]→N_r(ii) a At the same time, a task hierarchy is defined with the following rules:

a task i with a priority of y (i) cannot interfere with a task j with a priority of y (j), if

The mapping relationship from the speed to the task speed is formed by the taskOf the jacobian matrix

Represents;

the dimension of task i with the lowest priority may be larger than

Thus ensuring that dimension n is greater than the total dimension of all tasks;

the value of y (i) is distributed by the task supervisor according to the requirements of the task and the sensor information;

by assigning a given priority to the basic tasks, the speed of the compound task at time t is expressed as:

wherein v is_i,tIs the speed of task j with priority y (j, t) ═ i; equation (5) is the composite task formed after all the basic tasks are subjected to zero-space projection.

Further, the specific content of step S2 is:

consider a set of n (n ≧ 2) autonomous robots and a series of m (m ≧ 2) tasks for each autonomous robot; defining a state vector

Is composed of

Wherein each element represents a state of a robot; introducing a binary variable

Wherein each element w_i(t)∈{0,1}^M(ii) a The dynamics of the robot are written as convex combinations of dynamics in different modes:

wherein

Is w_iThe (j) th element of (a),

a velocity in the form of formula (5) for the jth task of the ith robot; the total number of the task speeds is M ═ M! (ii) a

One constraint of (2) is expressed as:

this constraint is called a class of special-ordered-set-of-type-one, which guarantees that at least and only one mode is activated at any time;

defining the tracking error of the ith robot as:

wherein

Is the desired task function of the ith robot. The cost function is defined as follows:

wherein u is_iDenotes the relaxation variable, K_iAnd P_iIs a normal number; thus, one optimal control problem is described as:

wherein

In the initial state, the state of the device is as follows,

the position of an obstacle detected by the ith robot at time T is shown, and T is a prediction range; the relaxation vector is

Constraint (10d) ensures that all relaxation variables are non-negative, making (10c) a soft-beam of the local minimum problem.

Further, the specific content of step S3 is:

different initial states at each sampling time in the task execution process

Such that the problem (10) (10a) (10b) (10c) (10d) must be solved repeatedly; however, problem (10) contains integer variables that are difficult to solve; the problem (10) can be effectively solved on line by adopting a real-time model predictive control method, and dynamic task priority planning is provided; the algorithm adopts a direct method of first dispersion and then optimization and external convection of an integer relaxation technology;

will time domain [0, T]Divided into N intervals characterised by equidistant grid points, i.e. 0-t₀＜t₁＜···＜t_N-1＜t_N＝T，

At each interval, a binary variable is assumed

Is constant and its value can only be changed at grid points; the binary variable is relaxed to the real variable and satisfied

Using the multi-objective targeting method, the problem (10) is transformed into a nonlinear programming problem and is represented as:

wherein

s is the current sample and the current sample,

predicting at sample s, x for k step state_i,s|kIs composed of

The ith element of (1); in the formula (11), the reaction mixture,

is the number of passesThe function resulting from the value-integral solution of the nonlinear dynamical equation (6) and the constraint (11c) ensures that the state trajectory is continuous over the entire prediction range after the solution of the problem (11), i.e. the equation 11(a)11(b)11(c)11(d)11(e), converges; the nonlinear programming problem shown in equation (11) can be solved efficiently by a standard nonlinear programming problem solver using a sequential quadratic programming or interior point method without using an integer optimization algorithm;

after solving equations 11(a), 11(b), 11(c), 11(d), 11(e), a summary rounding method is applied from

In-taking binary variables

The steps are as follows:

wherein j is 1, 0, M, k is 0, N-1, and the optimal composite task trajectory is formula (12a) and formula (12b), and the formula (12a) and formula (12b) have the following propositions;

if it is not

Is measurable and satisfies

Then function

Is converted from the formulas (12a) (12b) by using the zero order hold, and satisfies

Wherein

Satisfy the requirement of

Compared with the prior art, the invention has the following beneficial effects:

(1) the invention takes task supervision controlled by zero-space behavior as a research object, and researches aiming at the problem of dynamic task priority planning; the invention can effectively solve the problem of dynamic task priority planning of multi-agent system null space behavior control, avoids the switching rule of artificial predesigned tasks and reduces the workload of researchers. Meanwhile, the proposed method has good expandability and can be used for processing the problem of dynamic task priority planning in a large number of tasks and a dynamic unknown environment, and the traditional method needs a design rule in advance, so that great difficulty can be brought to designers under the conditions of a large number of tasks and an unknown environment, and the designers are difficult to consider all possible complex conditions, thereby influencing the performance of task priority switching. The method does not need to artificially set the switching condition of the task priority, thereby reducing the workload of researchers, having good expandability and being applied to the dynamic planning of a large number of task priorities which are difficult to process by the traditional logic method. In addition, the method not only considers the current state information of the autonomous robot but also considers the future state information in the process of switching the task priorities, and further improves the task priority switching effect.

(2) The method introduces a model predictive control technology into the dynamic task priority planning of the zero-space behavior control, converts the planning problem into an optimization problem at each moment for processing, and obtains the optimal task priority track by solving the designed optimization problem. The method does not need to artificially set task priority switching conditions, and can be applied to large-quantity task priority dynamic planning which is difficult to process by the existing method.

Drawings

FIG. 1 is a schematic block diagram of a method of an embodiment of the invention.

FIG. 2 shows an embodiment of the present invention with N

N-1 initial guesses of

Example graph of the direct discretization method of initial guessing.

FIG. 3 is a continuous trace plot after problem (11) solution convergence according to an embodiment of the present invention.

Fig. 4 is a robot trajectory diagram of the model predictive control method and the conventional method according to the embodiment of the present invention.

Fig. 5 is a diagram illustrating a distance between the robot and an obstacle according to the embodiment of the present invention.

Fig. 6 is a tracking error map of the robot according to the embodiment of the present invention.

FIG. 7 is a track diagram of a compound task according to an embodiment of the present invention.

Detailed Description

The invention is further explained below with reference to the drawings and the embodiments.

It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.

As shown in fig. 1, the present embodiment provides a method for zero-space behavior control dynamic task priority planning for a multi-agent system, which includes the following steps:

step S1: designing a basic task rho of an intelligent agent by using a zero-space-based behavior control method, enumerating the possibility of the priorities of all tasks, and combining the designed basic behaviors into a composite task in different priority sequences by using a zero-space projection method; if there are m basic tasks, there are theoretically m! A composite task;

step S2: converting the dynamic task priority planning problem into a switching mode optimal control problem at each sampling moment by using a model predictive control technology; the method mainly comprises the following steps: introducing a binary variable, performing evagination processing, and constructing a target function and a constraint condition, so as to convert the dynamic task priority planning problem into a switching mode optimal control problem;

step S3: and solving the optimal control problem by using a mixed integer optimization method to obtain an optimal composite task track. The main contents comprise: discretization, relaxation and projection to integer sets.

In this embodiment, the step S1 specifically includes the following steps:

step S11: design of basic tasks:

ρ＝g(x) (1)

differentiation thereof gives:

wherein J (x) e R^m×nIs a task Jacobian matrix related to system configuration, v belongs to RⁿIs the speed of the system; by locally linear inverse mapping, the velocity v of the reference_dExpressed as:

where ρ is_dExpressing the expected task function (which is a position function related to time, namely the track of the intelligent agent and is the expected task function of behavior control)

where a is a suitably constant positive definite gain matrix,

is a task error;

step S12: and (3) constructing a composite task:

The mapping relation from the speed to the task speed is determined by the Jacobian matrix of the task

Represents;

the dimension of task i with the lowest priority may be larger than

wherein v is_i,tIs the speed of task j with priority y (j, t) ═ i. Formula (5) is a composite task formed after all basic tasks are subjected to zero-space projection, and is different from the conventional method in that the priority of the tasks is time-varying, for example, the basic task with the highest priority is not fixed and is determined according to the requirements of the tasks, namely, the subsequent optimization problem and the solution thereof. The null space projection method is the part I-J + J in the formula, wherein J + is a Jacobian pseudo inverse matrix of a high-priority task, J is a Jacobian matrix of a low priority, and the physical meaning of the part is to project a task with a suboptimal priority onto the null space of the high-priority task, wherein I-J + J is the so-called null space projection.

In this embodiment, the specific content of step S2 is:

Is composed of

wherein

Is w_iThe (j) th element of (a),

One constraint of (2) is expressed as:

defining the tracking error of the ith robot as:

wherein

Is the expected task function of the ith robot; the cost function is defined as follows:

wherein

In the initial state, the state of the device is as follows,

In this embodiment, the specific content of step S3 is:

different initial states at each sampling time in the task execution process

Such that the problem (10) (10a) (10b) (10c) (10d) must be solved repeatedly; however, problem (10) contains integer variables that are difficult to solve; the real-time model prediction control method can be effectively used for on-lineSolving the problem (10), providing a dynamic task priority plan; the algorithm adopts a direct method of first dispersion and then optimization and external convection of an integer relaxation technology; problem 10 is the complete formula of (10a) (10b) (10c) (10d), which constitutes an optimization problem, where 10(a) is a cost function and 10b) (10c) (10d) is its constraint, and the so-called solution optimization problem is to solve 10(a) on the basis of the constraints (10b) (10c) (10d), so that problem 10 refers to the complete formula.

At each interval, a binary variable is assumed

wherein

s is the current sample and the current sample,

predicting at sample s, x for k step state_i,s|kIs composed of

The ith element of (1); an example of such discretization in the initial guess is shown in fig. 2. In the formula (11), the reaction mixture,

is a function obtained by solving the nonlinear dynamical equation (6) by numerical integration, and the constraint (11c) ensures that the state trajectory is continuous in the whole prediction range after the solution of the problem (11), i.e. the equation 11(a)11(b)11(c)11(d)11(e), reaches convergence; a diagram of the state of the solution and the pattern trajectory is shown in fig. 3. The nonlinear programming problem shown in equation (11) can be solved efficiently by a standard nonlinear programming problem solver using a sequential quadratic programming or interior point method without using an integer optimization algorithm;

In-taking binary variables

The steps are as follows:

where j is 1, 0, M, k is 0, N-1, and the optimal compound task trajectory is equation (12a) and equation (12b),

the optimal composite task track is embodied in a formula (12a) and a formula (12b), because the weight of one composite task is 1 and is activated, and the other composite tasks are 0, which are obtained by solving the optimization problem at the moment, the solution is the optimal composite task at the current moment, and the basic tasks of the composite task are composed in a certain priority order, the priority order at the moment is the optimal priority order, and the optimal composite tasks at different moments can be obtained in the whole time field, so the optimal composite task track is obtained.

The propositions for the formulas (12a) and (12b) are as follows;

if it is not

Is measurable and satisfies

Then function

Wherein

Satisfy the requirement of

Preferably, in this embodiment, the basic task ρ of the agent is designed by using a zero-space-based behavior control method, and the basic behaviors are combined into the composite task in different priority orders by using a zero-space projection method.

Preferably, in this embodiment, first, a binary variable is introduced

Secondly, carrying out convex treatment; finally, an objective function is constructed

And constraint conditions, and further converting the dynamic task priority planning problem into a switching mode optimal control problem at each sampling moment.

Preferably, in this embodiment, in order to process an optimization problem including integer variables, a real-time model prediction control method is provided, and first, a multi-target shooting method (Multiple shooting) is used to discretize the optimization problem of continuous time, then, a relaxation process is performed, and finally, a sum-up-rounding method is used to project a solution to an integer set, so as to obtain an optimal composite task trajectory.

Preferably, in the present embodiment, an embodiment is given to show the effectiveness and superiority of the proposed method for dynamic task priority planning for null-space behavior control of multi-agent system based on model predictive control.

1. Design of basic tasks

a. Moving tasks

In the movement task, the robot moves to a target point along a predetermined trajectory. The corresponding task function may be defined as:

ρ_B＝p∈R²(14)

wherein p ═ p_x,p_y]^TIs the position of the robot. The speed of the corresponding desired movement task may be expressed as:

wherein J_B＝I₂Jacobian matrix for moving tasks, I₂The unit matrix is represented by a matrix of units,

to move the pseudo-inverse of the task jacobian,ρ_B,dfor the desired task function, Λ_BThe gain matrix is determined positively for a constant of the movement task,

is the error of the movement task.

b. Obstacle avoidance task

In the obstacle avoidance task, the robot must avoid the obstacle detected on the preset trajectory by the sensor. The corresponding task function may be defined as:

ρ_A＝||p-p_O||∈R (16)

wherein p is_O＝[p_xO,p_yO]^TIs the location of the obstacle. The speed of the corresponding desired obstacle avoidance task may be expressed as:

wherein

For the jacobian matrix of the obstacle avoidance task,

pseudo-inverse of jacobian matrix for obstacle avoidance tasks, Λ_AA constant positive definite gain matrix for the obstacle avoidance task,

d represents a safe distance for avoiding the task error of the obstacle avoidance task.

2. Construction of composite tasks

The first composite task takes the obstacle avoidance task as the highest priority task, and the final composite task speed obtained according to the zero space projection method is as follows:

and (4) compound task A:

wherein

Is a zero space of the obstacle avoidance task.

The second compound task takes the mobile task as the highest priority task, and the final compound task speed obtained according to the zero space projection method is as follows:

and (4) compound task B:

wherein

Is a zero space of the obstacle avoidance task.

3. Simulation comparison and analysis

In the given simulation case, the model predictive control method is compared with a traditional logic method, wherein the traditional logic method switches task priorities by judging the distance between the obstacle and the robot and the size of the safe distance, namely, the obstacle avoidance task is the highest priority when the distance between the obstacle and the robot is smaller than the safe distance, and the mobile task is the highest priority when the distance between the obstacle and the intelligent body is larger than the safe distance. The preset tracks of the three robots are respectively

The obstacle avoidance task gain is 15, the moving task gain is 13, the safety distance is 1m, and the positions of the three obstacles are P_O1＝[10；10.5]，P_O2＝[8；12.5]，P_O3＝[16；11.5]The sampling frequency of the model predictive control method and the sampling frequency of the traditional logic method are both 20HZ, and the prediction time range of the model predictive control method is 3 s. The robot trajectory of the proposed model predictive control method and the conventional method is shown in fig. 4, the distance between the robot and the obstacle is shown in fig. 5, and the tracking error of the robot is shown in fig. 6. The result shows that the proposed model predictive control method is obviously superior to the traditional method under the same sampling frequency. The trajectory of the compound task is shown in fig. 7. The conventional method is as followsWhen an obstacle is encountered, switching is frequently performed in order to avoid the obstacle, resulting in oscillation of the trajectory and violation of the safety distance. On the other hand, the proposed model predictive control method can smoothly complete the task by only using two priority switches under the condition of not violating the safety distance constraint. This can be explained by the fact that the model predictive control method is able to predict obstacles in advance, thereby preparing priority switching in advance. This demonstrates the effectiveness of the proposed model predictive control method and its advantages over traditional logic-based methods.

The above description is only a preferred embodiment of the present invention, and all equivalent changes and modifications made in accordance with the claims of the present invention should be covered by the present invention.

Claims

1. A method for planning the priority of dynamic tasks of a multi-agent system for controlling zero-space behavior is characterized in that: the method comprises the following steps:

step S2: converting the dynamic task priority planning problem into a switching mode optimal control problem at each sampling moment by using a model predictive control technology;

step S3: and solving the optimal control problem by using a mixed integer optimization method to obtain an optimal composite task track.

2. The multi-agent system null-space behavior control dynamic task priority planning method according to claim 1, characterized in that: the step S1 specifically includes the following steps:

step S11: design of basic tasks:

ρ＝g(x) (1)

differentiation thereof gives:

where ρ is_dA function of the desired task is represented,

representing a pseudo-inverse of a jacobian matrix; in practical applications, the integration of the reference velocity will result in a numerical drift of the position of the agent, and therefore the following form of closed loop inverse kinematics is used to compensate for this numerical drift:

where a is a suitably constant positive definite gain matrix,

is a task error;

step S12: and (3) constructing a composite task:

a task i with a priority of y (i) cannot interfere with a task j with a priority of y (j), if y (i) ≧ y (j),

i≠j；

i∈N_rRepresents;

the dimension of task i with the lowest priority may be larger than

3. The multi-agent system null-space behavior control dynamic task priority planning method according to claim 2, characterized in that:

Is composed of

wherein

Is w_iThe (j) th element of (a),

One constraint of (2) is expressed as:

this constraint is called a class of special ordered sets, which guarantees that at least and only one mode is activated at any time;

defining the tracking error of the ith robot as:

wherein

Is a desired task of the ith robotA function; the cost function is defined as follows:

wherein

In the initial state, the state of the device is as follows,

Constraint (10d) ensures that all slack variables are non-negative, making (10c) a soft constraint for the local minimum problem.

4. The multi-agent system null-space behavior control dynamic task priority planning method according to claim 1, characterized in that:

during task executionAt each sampling instant, different initial states

Such that the problem (10) (10a) (10b) (10c) (10d) must be solved repeatedly; however, problem 10 contains integer variables that are difficult to solve; the problem (10) can be effectively solved on line by adopting a real-time model predictive control method, and dynamic task priority planning is provided; the algorithm adopts a direct method of first dispersion and then optimization and external convection of an integer relaxation technology;

At each interval, a binary variable is assumed

wherein

s is the current sample and the current sample,

predicting at sample s, x for k step state_i,s|kIs composed of

The ith element of (1); in the formula (11), the reaction mixture,

is a function obtained by solving the nonlinear dynamical equation (6) by numerical integration, and the constraint (11c) ensures that the state trajectory is continuous in the whole prediction range after the solution of the problem (11), i.e. the equation 11(a)11(b)11(c)11(d)11(e), reaches convergence; the nonlinear programming problem shown in equations 11(a)11(b)11(c)11(d)11(e) can be solved efficiently by a standard nonlinear programming problem solver using a sequential quadratic programming or interior point method without using an integer optimization algorithm;

In-taking binary variables

The steps are as follows:

if it is not

Is measurable and satisfies

Then function

Wherein

Satisfy the requirement of