CN110470306A

CN110470306A - Deep reinforcement learning based multi-robot formation navigation method with guaranteed connectivity constraints

Info

Publication number: CN110470306A
Application number: CN201910795982.0A
Authority: CN
Inventors: 林俊潼; 成慧; 杨旭韵; 郑培炜
Original assignee: National Sun Yat Sen University
Current assignee: National Sun Yat Sen University
Priority date: 2019-08-27
Filing date: 2019-08-27
Publication date: 2019-11-19
Anticipated expiration: 2039-08-27
Also published as: CN110470306B

Abstract

The present invention relates to the technical field of mobile robots, and more particularly to a deep reinforcement learning based multi-robot formation navigation method with guaranteed connectivity constraints. The method efficiently navigates the geometric center of a multi-robot formation to a target point without collisions and guarantees the connectivity of the formation throughout navigation. The invention represents the navigation policy with a parameterized function that can satisfy constraints (a constraint-satisfying parameterized function), which guarantees the connectivity of the formation during navigation. In addition, the invention implements the constraint-satisfying parameterized function within a virtual policy-extended environment framework, so that it is compatible with deep reinforcement learning algorithms that require the parameterized function to be differentiable.

Description

Deep reinforcement learning based multi-robot formation navigation method with guaranteed connectivity constraints

Technical field

The present invention relates to the technical field of mobile robots, and more particularly to a deep reinforcement learning based multi-robot formation navigation method with guaranteed connectivity constraints.

Background art

Multi-robot formations have broad application prospects, such as rescue, search, exploration, agricultural spraying, and cooperative transport. While carrying out a task, a multi-robot formation may have to operate in an unknown, complex scene, where the formation navigation policy is critical to the safety and efficiency of the formation. The communication range of a robot is usually limited, so to keep the robots in communication with one another the navigation policy must take the connectivity of the formation into account. Multi-robot formation navigation methods include rule-based methods and methods based on deep reinforcement learning. Rule-based methods depend on a constructed obstacle map, and building such a map is sometimes difficult and can consume considerable computing resources. By contrast, methods based on deep reinforcement learning map raw perception data directly to robot control inputs without building an obstacle map, and have therefore attracted wide attention. However, in these methods the final control input is usually produced by an unconstrained parameterized function, so it may break the connectivity of the formation and interrupt communication among the robots.

Several approaches add constraints to a policy obtained by deep reinforcement learning so that it avoids control inputs that violate them. Reward shaping adds constraints to the policy by modifying the reward function. Because the final control input is still produced by an unconstrained parameterized function, the added constraint is a soft constraint: it can only raise the probability that the constraint is satisfied and lower the probability that it is violated, and cannot guarantee satisfaction. Normalization methods append a normalizing function (such as Sigmoid, Tanh, or clipping) to the end of the unconstrained parameterized function so that its output falls within a fixed interval, and then multiply by a coefficient to obtain the final control input. This handles interval constraints well but cannot handle connectivity constraints. Methods based on control theory add constraints using tools such as control barrier functions and Lyapunov functions, and have a strong theoretical foundation. However, they rely on additional assumptions that may not hold in multi-robot formation navigation. Hierarchical methods follow a divide-and-conquer idea and split the decision process into high-level and low-level decisions. The high-level decision is learned by deep reinforcement learning, and the low-level decision is made by a low-level decision-maker that can guarantee the constraints. However, designing the hierarchy (that is, deciding where the boundary between high-level and low-level decisions lies) is not easy, and sometimes no low-level decision-maker that can guarantee the constraints is available.
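The limitation of normalization can be made concrete with a small sketch. It is illustrative only: the function name `normalized_control`, the speed limit, and the positions are hypothetical. Each robot's control is bounded with Tanh and a scale factor, so every control stays inside its interval, yet the distance between the two robots after one step still exceeds the communication range.

```python
import math

def normalized_control(raw_output, a_max):
    """Squash an unconstrained output into [-a_max, a_max] with tanh."""
    return a_max * math.tanh(raw_output)

# Hypothetical 1-D scenario: two robots on a line.
a_max, dt, d = 1.0, 1.0, 3.5          # speed limit, time step, communication range
p1, p2 = 0.0, 3.0                     # current positions (3.0 apart, connected)
a1 = normalized_control(-5.0, a_max)  # robot 1 drives left
a2 = normalized_control(+5.0, a_max)  # robot 2 drives right

# Each control respects its own interval constraint ...
interval_ok = abs(a1) <= a_max and abs(a2) <= a_max
# ... but the joint (connectivity) constraint is not enforced.
next_distance = abs((p1 + a1 * dt) - (p2 + a2 * dt))
connected = next_distance <= d
```

The interval constraint acts on each control separately, whereas connectivity couples the controls of different robots, which is why a per-output squashing function cannot enforce it.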

Summary of the invention

To overcome the above defects of the prior art, the present invention provides a deep reinforcement learning based multi-robot formation navigation method with guaranteed connectivity constraints, which effectively guarantees the connectivity of the multi-robot formation during navigation.

To solve the above technical problem, the invention adopts the following technical solution: a deep reinforcement learning based multi-robot formation navigation method with guaranteed connectivity constraints, whose key is to represent the multi-robot formation navigation policy with a constraint-satisfying parameterized function, thereby guaranteeing the connectivity of the formation during navigation. In addition, the invention implements the constraint-satisfying parameterized function within a virtual policy-extended environment framework, so that it is compatible with deep reinforcement learning algorithms that require the parameterized function to be differentiable.

Further, the constraint-satisfying parameterized function consists of two parts: a general unconstrained parameterized function (such as a neural network) and a constrained optimization module. Because the final control input is produced by the constrained optimization module rather than directly by the unconstrained parameterized function, the constraints are guaranteed to be satisfied.

In the constraint-satisfying parameterized function, the output z_θ(o), computed from the observation o by the unconstrained parameterized function with parameters θ, is no longer the final control input; it is passed as an input to the constrained optimization module. The constrained optimization module solves a constrained optimization problem based on z_θ(o) and obtains a control input a that guarantees connectivity. Specifically, the objective function f(z_θ(o), a) of the constrained optimization problem depends on the optimization variable a and on the output z_θ(o) of the unconstrained parameterized function, and the constraint of the problem is the connectivity constraint.

For a given observation o, different parameters θ₁ and θ₂ produce different outputs z_θ₁(o) and z_θ₂(o), and hence different objective functions f(z_θ₁(o), a) and f(z_θ₂(o), a). After passing through the constrained optimization module, these different objective functions yield different final control inputs a₁ and a₂. Because the final control input is always produced by the constrained optimization module, it always satisfies the connectivity constraint.

Further, the constraint-satisfying parameterized function is a function that takes the observation o as input, outputs the control input a, and has parameters θ. Its mathematical form is as follows:
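The formula itself is not reproduced in this text. Based on the symbol definitions given for it (z_θ, f, g_i, h_i), and on the later statement that the resulting problem is convex, a plausible reconstruction is shown below. The name π_θ and the choice of minimization rather than maximization are assumptions.

```latex
a = \pi_\theta(o) = \arg\min_{a}\; f\bigl(z_\theta(o),\, a\bigr)
\quad \text{s.t.} \quad
g_i\bigl(z_\theta(o),\, a\bigr) \le 0, \qquad
h_i\bigl(z_\theta(o),\, a\bigr) = 0 \tag{1}
```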

In the formula, z_θ(o) is the unconstrained parameterized function and f(z_θ(o), a) is the objective function of the constrained optimization problem; g_i(z_θ(o), a) and h_i(z_θ(o), a) are the inequality constraint functions and equality constraint functions of the constrained optimization problem, respectively. With f(z, a) as the medium, the constraint-satisfying parameterized function combines the expressive power of the unconstrained parameterized function with the ability of the constrained optimization problem to restrict the final control input.

Further, regarding the derivation of the connectivity constraint, assume that the kinematic model of the i-th robot is:
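The kinematic model is not reproduced in this text. Since the accompanying definitions mention only a position, a control input, and a time interval Δt, a single-integrator model is a natural reading. The form below is an assumption, with p_t^i denoting the position and a_t^i the control input (treated as a velocity) of the i-th robot at time t:

```latex
p_{t+1}^{i} = p_{t}^{i} + a_{t}^{i}\,\Delta t \tag{2}
```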

where p_t^i and a_t^i are the position and control input of the i-th robot at time t, respectively, and Δt is the time interval. Taking the concrete form of the objective function f(z, a) to be a^T a + z^T a, the mathematical form of the parameterized function satisfying the connectivity constraint is as follows:
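Formula (3) is also missing from this text. One convex form consistent with the stated objective a^T a + z^T a and with the symbols N, d, and Δt is sketched below. It assumes a single-integrator model (next position equals current position plus control times Δt) and assumes that connectivity is enforced as an upper bound d on every pairwise distance after one step; the published formula may state the constraint differently.

```latex
a_t = \arg\min_{a_t}\; a_t^{\top} a_t + z_\theta(o_t)^{\top} a_t
\quad \text{s.t.} \quad
\bigl\| (p_t^{i} + a_t^{i}\,\Delta t) - (p_t^{j} + a_t^{j}\,\Delta t) \bigr\| \le d,
\qquad \forall\, i, j \in \{1,\dots,N\},\; i \ne j \tag{3}
```

Under these assumptions the objective is a convex quadratic in a_t and each constraint bounds the norm of an affine function of a_t, which agrees with the document's statement that the problem is convex.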

where N is the number of robots in the formation, d is the communication distance, and a_t and o_t are the concatenations of the control inputs and observations of the N robots of the whole formation at time t, i.e., a_t = [a_t^1, ..., a_t^N] and o_t = [o_t^1, ..., o_t^N]. The observation o_t^i of the i-th robot at time t comprises its perception of the environment, its own current velocity, the positions of the other robots, and the position of the target point.

Further, for each robot i, the teammate-information component of its observation can be further defined as:

Substituting formula (4) into formula (3), the above parameterized function satisfying the connectivity constraint, and taking robot 1 as an example, the resulting parameterized function satisfying the connectivity constraint has the following mathematical form:

The above constrained optimization problem is a convex optimization problem and can therefore be solved efficiently. In summary, each robot uses the shared unconstrained parameterized function z_θ to compute its own output z_θ(o_t^i) from its own observation o_t^i. All robots then exchange information so that each obtains all of the outputs z_θ(o_t^i), and each solves an equivalent constrained optimization problem to obtain an a_t that satisfies the connectivity constraint. Finally, each robot takes its own a_t^i out of a_t according to its index and executes it as its control input.

In the present invention, because the constraint-satisfying parameterized function contains a constrained optimization module, the function as a whole may be non-differentiable even if the unconstrained parameterized function inside it is differentiable. If the multi-robot formation navigation policy were represented directly by the constraint-satisfying parameterized function, deep reinforcement learning methods that require the parameterized function to be differentiable could not be used.

To make the constraint-satisfying parameterized function compatible with deep reinforcement learning methods that require a differentiable parameterized function, the invention implements it by means of a virtual policy-extended environment. Under this framework, the original reinforcement learning problem (composed of the possibly non-differentiable constraint-satisfying parameterized function and the original environment) is converted into an equivalent reinforcement learning problem (composed of a differentiable virtual policy and an extended environment), which can be solved by deep reinforcement learning methods that require a differentiable parameterized function. Substituting the constraint-satisfying parameterized function above, together with its implementation under the virtual policy-extended environment framework, into a deep reinforcement learning based multi-robot formation navigation method then yields the final navigation policy.

Compared with the prior art, the beneficial effects are as follows. The invention represents the multi-robot formation navigation policy with a constraint-satisfying parameterized function, thereby guaranteeing the connectivity of the formation during navigation. Compared with hierarchical methods, the method of the invention guarantees the connectivity constraint while being more plug-and-play: it neither requires an explicitly designed hierarchy nor depends on a low-level decision-maker that can guarantee the constraints. In addition, the invention implements the constraint-satisfying parameterized function within a virtual policy-extended environment framework, so that it is compatible with deep reinforcement learning algorithms that require the parameterized function to be differentiable.

Brief description of the drawings

Fig. 1 shows the constraint-satisfying parameterized function.

Fig. 2 illustrates the constraint-satisfying parameterized function with an example.

Fig. 3 is a schematic diagram of the policy-environment framework.

Fig. 4 is a schematic diagram of the virtual policy-extended environment framework.

Fig. 5 is the decision flow chart.

Detailed description of the embodiments

The drawings are for illustration only and shall not be construed as limiting the invention. To better illustrate the embodiment, some components in the drawings may be omitted, enlarged, or reduced, and do not represent the dimensions of the actual product. Those skilled in the art will understand that some well-known structures and their descriptions may be omitted from the drawings. The positional relationships depicted in the drawings are for illustration only and shall not be construed as limiting the invention.

Embodiment 1

The key of the deep reinforcement learning based multi-robot formation navigation method with guaranteed connectivity constraints proposed by the invention is to represent the multi-robot formation navigation policy with a constraint-satisfying parameterized function, thereby guaranteeing the connectivity of the formation during navigation. In addition, the invention implements the constraint-satisfying parameterized function within a virtual policy-extended environment framework, so that it is compatible with deep reinforcement learning algorithms that require the parameterized function to be differentiable.

As shown in Fig. 1, the constraint-satisfying parameterized function consists of two parts: a general unconstrained parameterized function (such as a neural network) and a constrained optimization module. Because the final control input is produced by the constrained optimization module rather than directly by the unconstrained parameterized function, the constraints are guaranteed to be satisfied.

In the constraint-satisfying parameterized function, the output z_θ(o), computed from the observation o by the unconstrained parameterized function with parameters θ, is no longer the final control input; it is passed as an input to the constrained optimization module. The constrained optimization module solves a constrained optimization problem based on z_θ(o) and obtains a control input a that guarantees connectivity. Specifically, the objective function f(z_θ(o), a) of the constrained optimization problem depends on the optimization variable a and on the output z_θ(o) of the unconstrained parameterized function, and the constraint of the problem is the connectivity constraint.

The input/output process of the constraint-satisfying parameterized function is illustrated below with the example of Fig. 2. For a given observation o, different parameters θ₁ and θ₂ produce different outputs z_θ₁(o) and z_θ₂(o), and hence different objective functions f(z_θ₁(o), a) and f(z_θ₂(o), a). After passing through the constrained optimization module, these different objective functions yield different final control inputs a₁ and a₂. Because the final control input is always produced by the constrained optimization module, it always satisfies the connectivity constraint.
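The same behaviour can be reproduced in a toy setting. The sketch below is not the patented implementation; it assumes two robots on a line, a linear stand-in for z_θ(o), a single-integrator model, and the objective a^T a + z^T a, for which the constrained minimizer is the Euclidean projection of -z/2 onto the feasible band. All function names and numeric values are hypothetical.

```python
def unconstrained_fn(theta, obs):
    """Stand-in for z_theta(o): a linear map applied per robot (hypothetical)."""
    return [theta * o for o in obs]

def constrained_module(z, p, d, dt):
    """Minimise a1^2 + a2^2 + z1*a1 + z2*a2
    subject to |(p1 + a1*dt) - (p2 + a2*dt)| <= d, in closed form."""
    a1, a2 = -z[0] / 2.0, -z[1] / 2.0            # unconstrained minimiser
    gap = p[0] - p[1]
    lo, hi = (-d - gap) / dt, (d - gap) / dt     # feasible band for u = a1 - a2
    u = a1 - a2
    excess = max(u - hi, 0.0) + min(u - lo, 0.0)  # signed violation of the band
    # Project onto the band by splitting the correction between both robots.
    return [a1 - excess / 2.0, a2 + excess / 2.0]

obs = [1.0, 0.5]                  # one scalar observation per robot (hypothetical)
p, d, dt = [0.0, 3.0], 3.5, 1.0   # positions, communication distance, time step

def policy(theta):
    return constrained_module(unconstrained_fn(theta, obs), p, d, dt)

def next_distance(a):
    return abs((p[0] + a[0] * dt) - (p[1] + a[1] * dt))

a_theta1 = policy(1.0)   # constraint inactive: output equals -z/2
a_theta2 = policy(4.0)   # constraint active: output is pulled back to the band
```

The two parameter values give different controls, and in both cases the distance after one step stays within d, which is the property the paragraph above describes.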

In summary, the constraint-satisfying parameterized function is a function that takes the observation o as input, outputs the control input a, and has parameters θ. Its mathematical form is as follows:

With f(z, a) as the medium, the constraint-satisfying parameterized function combines the expressive power of the unconstrained parameterized function with the ability of the constrained optimization problem to restrict the final control input.

Next, the derivation of the connectivity constraint is supplemented. Assume that the kinematic model of the i-th robot is:

where p_t^i and a_t^i are the position and control input of the i-th robot at time t, respectively, and Δt is the time interval. Taking the concrete form of the objective function f(z, a) to be a^T a + z^T a, the mathematical form of the parameterized function satisfying the connectivity constraint is as follows:

where N = 3 is the number of robots in the formation, d = 3.5 is the communication distance, and a_t and o_t are the concatenations of the control inputs and observations of the N robots of the whole formation at time t, i.e., a_t = [a_t^1, ..., a_t^N] and o_t = [o_t^1, ..., o_t^N]. The observation o_t^i of the i-th robot at time t comprises its perception of the environment (the point cloud data of a two-dimensional lidar), its own current velocity, the positions of the other robots, and the position of the target point.

For each robot i, the teammate-information component of its observation can be further defined as:

Substituting formula (4) into formula (3), the above parameterized function satisfying the connectivity constraint, and taking robot 1 as an example, the resulting parameterized function satisfying the connectivity constraint has the following mathematical form:

The above constrained optimization problem is a convex optimization problem and can therefore be solved efficiently. In summary, the overall process of deep reinforcement learning based multi-robot formation navigation with guaranteed connectivity is shown in Fig. 5. Each robot uses the shared unconstrained parameterized function z_θ to compute its own output z_θ(o_t^i) from its own observation o_t^i. All robots then exchange information so that each obtains all of the outputs z_θ(o_t^i), and each solves an equivalent constrained optimization problem to obtain an a_t that satisfies the connectivity constraint. Finally, each robot takes its own a_t^i out of a_t according to its index and executes it as its control input.
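The decision flow can be sketched for the simplest case of two robots in the plane, where the problem has a closed form. This is an illustration under stated assumptions, not the patented implementation: the single-integrator model, the pairwise-distance form of the connectivity constraint, the linear stand-in `shared_fn`, and all numeric values are hypothetical. For larger formations a general convex solver would be needed.

```python
import math

D, DT = 3.5, 1.0   # communication distance and time step (illustrative values)

def shared_fn(theta, obs):
    """Stand-in for the shared unconstrained function: scales a 2-D observation."""
    return (theta * obs[0], theta * obs[1])

def solve_two_robot(z1, z2, rel_p12):
    """Closed-form minimiser of a^T a + z^T a for two robots, subject to
    ||rel_p12 + (a1 - a2) * DT|| <= D, where rel_p12 = p1 - p2."""
    # The unconstrained minimiser is -z/2; split it into sum s and difference u.
    s = (-(z1[0] + z2[0]) / 2.0, -(z1[1] + z2[1]) / 2.0)
    u = (-(z1[0] - z2[0]) / 2.0, -(z1[1] - z2[1]) / 2.0)
    # The constraint only restricts u: it must lie in a ball around -rel_p12/DT.
    c = (-rel_p12[0] / DT, -rel_p12[1] / DT)
    r = D / DT
    off = (u[0] - c[0], u[1] - c[1])
    norm = math.hypot(off[0], off[1])
    if norm > r:                                   # project u onto the ball
        u = (c[0] + off[0] * r / norm, c[1] + off[1] * r / norm)
    a1 = ((s[0] + u[0]) / 2.0, (s[1] + u[1]) / 2.0)
    a2 = ((s[0] - u[0]) / 2.0, (s[1] - u[1]) / 2.0)
    return [a1, a2]

def decide(robot_index, own_z, received_z, rel_p12):
    """One robot's step: combine its own z with the received one, solve the
    shared problem, and keep only its own control."""
    z = [own_z, received_z] if robot_index == 0 else [received_z, own_z]
    return solve_two_robot(z[0], z[1], rel_p12)[robot_index]

p = [(0.0, 0.0), (3.0, 0.0)]               # current positions
obs = [(4.0, 0.0), (-4.0, 2.0)]            # per-robot observations (hypothetical)
z = [shared_fn(1.0, o) for o in obs]       # each robot computes its own z
rel = (p[0][0] - p[1][0], p[0][1] - p[1][1])
a = [decide(0, z[0], z[1], rel), decide(1, z[1], z[0], rel)]
new_p = [(p[i][0] + a[i][0] * DT, p[i][1] + a[i][1] * DT) for i in range(2)]
distance = math.hypot(new_p[0][0] - new_p[1][0], new_p[0][1] - new_p[1][1])
```

Because both robots solve the same problem from the same exchanged outputs, the controls they extract belong to one consistent joint solution, and the distance after the step does not exceed D.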

Because the constraint-satisfying parameterized function contains a constrained optimization module, the function as a whole may be non-differentiable even if the unconstrained parameterized function inside it is differentiable. Therefore, if the multi-robot formation navigation policy were represented directly by the constraint-satisfying parameterized function, as shown in Fig. 3, deep reinforcement learning methods that require the parameterized function to be differentiable could not be used.
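A one-dimensional illustration shows how differentiability can be lost. With the objective a² + z·a and a hypothetical interval constraint on a (used here only as a simple stand-in for the connectivity constraint), the constrained minimizer is a clipped version of -z/2, and its one-sided slopes disagree at the point where the constraint becomes active.

```python
def constrained_argmin(z, lo, hi):
    """argmin over a in [lo, hi] of a*a + z*a (closed form: clip of -z/2)."""
    return min(max(-z / 2.0, lo), hi)

lo, hi = -1.0, 1.0
z_kink = -2.0 * hi     # value of z at which the upper bound becomes active
eps = 1e-6

# One-sided finite-difference slopes of a(z) at the kink.
slope_left = (constrained_argmin(z_kink, lo, hi)
              - constrained_argmin(z_kink - eps, lo, hi)) / eps
slope_right = (constrained_argmin(z_kink + eps, lo, hi)
               - constrained_argmin(z_kink, lo, hi)) / eps
```

The left slope is 0 (the bound is active) and the right slope is about -0.5 (the bound is inactive), so the map from z to a has no derivative at that point even though -z/2 itself is smooth.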

To make the constraint-satisfying parameterized function compatible with deep reinforcement learning methods that require a differentiable parameterized function, the invention implements it by means of a virtual policy-extended environment. As shown in Fig. 4, under the virtual policy-extended environment framework the boundary between agent and environment no longer lies, as in Fig. 3, between the constraint-satisfying parameterized function and the environment, but between the virtual policy and the extended environment. The virtual policy is the unconstrained parameterized function within the constraint-satisfying parameterized function, and the extended environment consists of the constrained optimization module of the constraint-satisfying parameterized function together with the original environment.

Under the virtual policy-extended environment framework, the original reinforcement learning problem (composed of the possibly non-differentiable constraint-satisfying parameterized function and the original environment) is converted into an equivalent reinforcement learning problem (composed of a differentiable virtual policy and an extended environment), which can be solved by deep reinforcement learning methods that require a differentiable parameterized function. Substituting the constraint-satisfying parameterized function above, together with its implementation under the virtual policy-extended environment framework, into a deep reinforcement learning based multi-robot formation navigation method then yields the final navigation policy.
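A minimal sketch of this arrangement follows, assuming a toy two-robot environment on a line and a linear virtual policy. The class names `OriginalEnv` and `ExtendedEnv`, the function `virtual_policy`, and all numeric values are hypothetical and do not come from the source. The learning algorithm would treat z as the action, and the constrained optimization module runs inside the environment's `step` before the original environment is advanced.

```python
class OriginalEnv:
    """Toy 1-D environment with two robots (hypothetical stand-in)."""
    def __init__(self):
        self.p = [0.0, 3.0]
        self.dt = 1.0

    def step(self, a):
        self.p = [self.p[0] + a[0] * self.dt, self.p[1] + a[1] * self.dt]
        return list(self.p)

class ExtendedEnv:
    """Original environment with the constrained optimization module in front."""
    def __init__(self, env, d):
        self.env = env
        self.d = d

    def _constrained_module(self, z):
        # Minimise a^T a + z^T a subject to the next-step distance being <= d.
        a1, a2 = -z[0] / 2.0, -z[1] / 2.0
        gap = self.env.p[0] - self.env.p[1]
        lo = (-self.d - gap) / self.env.dt
        hi = (self.d - gap) / self.env.dt
        u = a1 - a2
        excess = max(u - hi, 0.0) + min(u - lo, 0.0)
        return [a1 - excess / 2.0, a2 + excess / 2.0]

    def step(self, z):
        # The action seen by the learner is z; the executed control is a.
        return self.env.step(self._constrained_module(z))

def virtual_policy(theta, obs):
    """Differentiable stand-in for the unconstrained parameterized function."""
    return [theta * o for o in obs]

env = ExtendedEnv(OriginalEnv(), d=3.5)
obs, distances = [0.0, 3.0], []
for _ in range(3):
    z = virtual_policy(-1.0, obs)
    obs = env.step(z)
    distances.append(abs(obs[0] - obs[1]))
```

With this split, gradients are only ever taken through `virtual_policy`, while every control that reaches the original environment has already passed through the constrained module.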

In this embodiment, the multi-robot formation navigation policy is represented by a constraint-satisfying parameterized function, thereby guaranteeing the connectivity of the formation during navigation. The constraint-satisfying parameterized function is implemented within a virtual policy-extended environment framework, so that it is compatible with deep reinforcement learning algorithms that require the parameterized function to be differentiable.

Obviously, the above embodiment is merely an example given to illustrate the invention clearly and does not limit its embodiments. Those of ordinary skill in the art can make other variations or modifications in different forms on the basis of the above description. It is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principle of the invention shall fall within the protection scope of the claims of the invention.

Claims

1. A deep reinforcement learning based multi-robot formation navigation method with guaranteed connectivity constraints, characterized in that the multi-robot formation navigation policy is represented by a constraint-satisfying parameterized function; the constraint-satisfying parameterized function comprises two parts, namely an unconstrained parameterized function and a constrained optimization module; the constraint-satisfying parameterized function is a function that takes the observation o as input, outputs the control input a, and has parameters θ, and its mathematical form is as follows:

In the formula, z_θ(o) is the unconstrained parameterized function, and f(z_θ(o), a) is the objective function of the constrained optimization problem; g_i(z_θ(o), a) and h_i(z_θ(o), a) are the inequality constraint functions and equality constraint functions of the constrained optimization problem, respectively.

2. The deep reinforcement learning based multi-robot formation navigation method with guaranteed connectivity constraints according to claim 1, characterized in that the kinematic model of the i-th robot is assumed to be:

where p_t^i and a_t^i are the position and control input of the i-th robot at time t, respectively, and Δt is the time interval; taking the concrete form of the objective function f(z, a) to be a^T a + z^T a, the mathematical form of the parameterized function satisfying the connectivity constraint is as follows:

where N is the number of robots in the formation, d is the communication distance, and a_t and o_t are the concatenations of the control inputs and observations of the N robots of the whole formation at time t, i.e., a_t = [a_t^1, ..., a_t^N] and o_t = [o_t^1, ..., o_t^N]; the observation o_t^i of the i-th robot at time t comprises its perception of the environment, its own current velocity, the positions of the other robots, and the position of the target point.

3. The deep reinforcement learning based multi-robot formation navigation method with guaranteed connectivity constraints according to claim 2, characterized in that, for each robot i, the teammate-information component of its observation can be further defined as:

Substituting formula (4) into formula (3), the above parameterized function satisfying the connectivity constraint, and taking robot 1 as an example, the resulting parameterized function satisfying the connectivity constraint has the following mathematical form:

4. The deep reinforcement learning based multi-robot formation navigation method with guaranteed connectivity constraints according to claim 3, characterized in that each robot uses the shared unconstrained parameterized function z_θ to compute its own output z_θ(o_t^i) from its own observation o_t^i; all robots then exchange information so that each obtains all of the outputs z_θ(o_t^i), and each solves an equivalent constrained optimization problem to obtain an a_t that satisfies the connectivity constraint; finally, each robot takes its own a_t^i out of a_t according to its index and executes it as its control input.

5. The deep reinforcement learning based multi-robot formation navigation method with guaranteed connectivity constraints according to any one of claims 1 to 4, characterized in that the constraint-satisfying parameterized function is implemented by means of a virtual policy-extended environment; under the virtual policy-extended environment framework, the original reinforcement learning problem is converted into an equivalent reinforcement learning problem, so that it can be solved by deep reinforcement learning methods that require a differentiable parameterized function; the original reinforcement learning problem is composed of the possibly non-differentiable constraint-satisfying parameterized function and the original environment, and the equivalent reinforcement learning problem is composed of a differentiable virtual policy and an extended environment.

6. The deep reinforcement learning based multi-robot formation navigation method with guaranteed connectivity constraints according to claim 5, characterized in that, letting a be the agent, e the environment, and f the constraint-satisfying parameterized function, f consists of two parts b and c, where b is the unconstrained parameterized function and c is the constrained optimization module; in the code implementation, following the virtual policy-extended environment approach, the constrained optimization module c is implemented in the data preprocessing part of the environment e, that is, the constrained optimization module c originally located at the end of the agent a is moved to the front end of the environment e.