CN117806158A - Stability control method and device for indoor article flight transfer robot - Google Patents

Stability control method and device for indoor article flight transfer robot

Info

Publication number
CN117806158A
Authority
CN
China
Prior art keywords
model
flying
control
transfer robot
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311866657.1A
Other languages
Chinese (zh)
Inventor
林必毅
贺振中
吴福胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Huasairuifei Intelligent Technology Co ltd
Original Assignee
Shenzhen Huasairuifei Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Huasairuifei Intelligent Technology Co ltd filed Critical Shenzhen Huasairuifei Intelligent Technology Co ltd
Priority to CN202311866657.1A priority Critical patent/CN117806158A/en
Publication of CN117806158A publication Critical patent/CN117806158A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05B - CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric
    • G05B13/0205 - Adaptive control systems, electric, not using a model or a simulator of the controlled system
    • G05B13/024 - Adaptive control systems, electric, not using a model or a simulator of the controlled system, in which a parameter or coefficient is automatically adjusted to optimise the performance

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Manipulator (AREA)

Abstract

A stability control method and device for an indoor article flight transfer robot. The method comprises: constructing a carrying motion model of the flight transfer robot in an article carrying scene, and generating a control prediction result from a control input pair of the flight transfer robot and the carrying motion model, the control input pair comprising a control action pair and a control state pair; constructing a dynamics model corresponding to the flight transfer robot based on the control prediction result; and, when a cost function corresponding to the dynamics model meets a preset cost threshold, performing stability control on the control input pair of the flight transfer robot with a model-free optimizer, the model-free optimizer comprising a deep reinforcement learning algorithm improved by hybrid reinforcement learning. This solves the problem that, at the instants when an article is placed or taken away, the forces generated between the article and the flight transfer robot make the robot unstable during flight.

Description

Stability control method and device for indoor article flight transfer robot
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a stability control method and device for an indoor article flying and carrying robot.
Background
In recent years, flying robots have attracted increasing attention as flexible platforms for performing various tasks. With improvements in computer technology, positioning, obstacle avoidance and motion planning, together with new sensors and controllers, a new generation of flying robots has emerged. Their small fuselages and agile speed allow them to fly at low altitude in confined, complex indoor environments, which opens up civilian applications such as carrying indoor articles. A flying robot can take off and land vertically, complete lifting tasks in complex indoor environments, and is extremely agile; it can realize different attitude controls and is applied to trajectory tracking, acrobatic flight, formation control and the like in indoor environments.
However, while the indoor article flying and carrying robot is carrying articles, forces arise between the article and the robot at the instants when the article is placed or taken away. These forces can make the flying robot unstable during flight and, in severe cases, cause it to crash, so analysis and research of the dynamic system model and stability control algorithm of the indoor article flying and carrying robot are necessary. Moreover, the indoor article flying and carrying robot is a highly nonlinear and unstable dynamic system, and the instants of placing and taking away articles aggravate this instability, which places a great burden on the design of the stability controller.
Disclosure of Invention
The invention mainly solves the technical problem that the forces generated between an article and an indoor article flying and carrying robot, at the instants when the article is placed or taken away, make the robot unstable during flight.
According to a first aspect, in one embodiment, there is provided a method for controlling stability of an indoor article flight transfer robot, including:
constructing a carrying action model of the flying carrying robot in a preset article carrying scene, and generating a control prediction result according to a control input pair of the pre-constructed flying carrying robot and the carrying action model; the carrying action model comprises a first model corresponding to the article placing action and a second model corresponding to the article taking action; the control input pairs of the flying transfer robot comprise control action pairs and control state pairs;
constructing a dynamics model corresponding to the flying transfer robot based on the control prediction result;
constructing a cost function corresponding to the dynamic model; the cost function includes a function that causes the flying transfer robot to reach a steady state;
when the cost function meets a preset cost threshold, performing stability control on a control input pair of the flying transfer robot by using a model-free optimizer; the model-free optimizer includes an improved deep reinforcement learning algorithm based on hybrid reinforcement learning.
In one embodiment, the model-free optimizer includes an improved deep reinforcement learning algorithm based on hybrid reinforcement learning, comprising:
carrying out improvement optimization on a preset depth reinforcement learning algorithm by using a preset load reduction method, a preset parameter integration method and a preset mixed reinforcement loss function; the load reduction method comprises the step of reducing the load calculated in the preset depth reinforcement learning algorithm to a preset load threshold.
In one embodiment, the preset parameter integration method includes integrating the preset parameter and the depth reinforcement learning algorithm to generate an integrated parameter, and updating and storing the preset parameter in a preset replay memory.
In one embodiment, the preset hybrid reinforcement loss function includes:

L(θ) = E[(R_j^λ - Q(s_j, a_j | θ))^2]

wherein L(θ) represents the preset hybrid reinforcement loss function, R_j^λ represents the time-difference (λ) return of step j, Q(s_j, a_j | θ) represents the value output by the Q network for the state-action pair, s_j represents a motion state matrix, a_j represents an acceleration matrix, and θ represents a quaternion matrix.
In an embodiment, the constructing a cost function corresponding to the dynamics model includes:
wherein cost[c(s_{t+1})] represents the cost function corresponding to the dynamics model, I represents an identity matrix, Λ^{-1} represents a diagonal precision matrix, s_{t+1} represents the state of the flying transfer robot at time t+1, s_t represents the state of the flying transfer robot at time t, T represents the matrix transpose, Ω^{-1} represents the state the flying transfer robot is expected to reach at the instants of article placement and removal, and σ represents the standard deviation.
In an embodiment, the dynamic model corresponding to the flying carrier robot includes:
wherein the variables of the dynamics model denote, respectively: the state of the flying transfer robot at different moments; the input value of the training sample; the test input corresponding to the control prediction result; the estimated variance; the probability distribution function; the input function; i, the number of control state pairs; j, the number of control action pairs; and the training object.
In an embodiment, before the stability control of the control input pair of the flying carrier robot by the model-free optimizer, the method further comprises:
expanding the state space of the flying transfer robot to obtain an expanded state space; the expanded state space comprises S = [X, Y, q_l, q̇_l, q_r, q̇_r]^T, wherein S represents the expanded state space, X and Y represent the current position of the flying carrier robot, q_l represents the state at the instant of article placement, q̇_l represents the first derivative of the state at the instant of article placement, q_r represents the state at the instant of article removal, q̇_r represents the first derivative of the state at the instant of article removal, and T represents the matrix transpose.
According to a second aspect, in one embodiment, there is provided a stability control device for an indoor article flight transfer robot, comprising:
the carrying motion model construction module is used for constructing a carrying motion model of the flying carrying robot in a preset object carrying scene and generating a control prediction result according to a control input pair of the pre-constructed flying carrying robot and the carrying motion model; the carrying action model comprises a first model corresponding to the article placing action and a second model corresponding to the article taking action; the control input pairs of the flying transfer robot comprise control action pairs and control state pairs;
the dynamics model construction module is used for constructing a dynamics model corresponding to the flight transfer robot based on the control prediction result;
the cost function construction module is used for constructing a cost function corresponding to the dynamic model; the cost function includes a function that causes the flying transfer robot to reach a steady state;
the stability control module is used for controlling the stability of the control input pair of the flying transfer robot by using the model-free optimizer when the cost function meets a preset cost threshold; the model-free optimizer includes an improved deep reinforcement learning algorithm based on hybrid reinforcement learning.
According to a third aspect, there is provided in one embodiment a stability control apparatus of an indoor article flight transfer robot, comprising:
a memory for storing a program;
a processor configured to implement a method as described in any of the embodiments herein by executing a program stored in the memory.
According to a fourth aspect, an embodiment provides a computer readable storage medium having stored thereon a program executable by a processor to implement a method as described in any of the embodiments herein.
According to the stability control method, device, apparatus and computer-readable storage medium for an indoor article flying and carrying robot provided herein, the control prediction result is generated from the control input pair and the carrying action model of the flying and carrying robot, and the dynamics model corresponding to the robot is then constructed from that control prediction result. This reduces the training data needed to construct the dynamics model: only the control input pair has to be obtained as training data when the control prediction result is generated, which simplifies the estimation time and reduces the amount of training data. A cost function corresponding to the dynamics model is constructed, and by controlling the cost function the convergence time and the action space with which the flight transfer robot reaches a steady state are controlled; when the cost function meets the cost threshold, the control input pair of the flight transfer robot is stability-controlled by the model-free optimizer improved through hybrid reinforcement learning.
Drawings
FIG. 1 is a flow chart of stability control of an indoor item flight transfer robot according to an embodiment of the present application;
FIG. 2 is a block diagram of a stability control device of the indoor article flight transfer robot according to an embodiment of the present application.
Detailed Description
The invention will be described in further detail below with reference to the drawings by means of specific embodiments. Like elements in different embodiments are given associated, similar numbering. In the following embodiments, numerous specific details are set forth in order to provide a better understanding of the present application. However, one skilled in the art will readily recognize that some of the features may be omitted, or replaced by other elements, materials, or methods, in different situations. In some instances, certain operations related to the present application are not shown or described in the specification in order not to obscure the core of the present application; a detailed description of these operations is unnecessary, since a person skilled in the art can understand them from the description herein together with general knowledge in the art.
Furthermore, the described features, operations, or characteristics of the description may be combined in any suitable manner in various embodiments. Also, various steps or acts in the method descriptions may be interchanged or modified in a manner apparent to those of ordinary skill in the art. Thus, the various orders in the description and drawings are for clarity of description of only certain embodiments, and are not meant to be required orders unless otherwise indicated.
The numbering of the components itself, e.g. "first", "second", etc., is used herein merely to distinguish between the described objects and does not have any sequential or technical meaning. The terms "coupled" and "connected," as used herein, are intended to encompass both direct and indirect coupling (coupling), unless otherwise indicated.
In some embodiments, since the control prediction result is generated from the control input pair and the carrying action model of the flight transfer robot, and the dynamics model corresponding to the robot is constructed from that control prediction result, the training data needed to construct the dynamics model are reduced: only the control input pair has to be obtained as training data when the control prediction result is generated, which simplifies the estimation time and reduces the amount of training data. A cost function corresponding to the dynamics model is constructed, and by controlling the cost function the convergence time and the action space with which the flight transfer robot reaches a steady state are controlled; when the cost function meets the cost threshold, the control input pair of the flight transfer robot is stability-controlled by the model-free optimizer improved through hybrid reinforcement learning.
Referring to fig. 1, some embodiments of the present invention provide a method for controlling stability of an indoor article flying and transporting robot, which includes steps S10 to S40, and is specifically described below.
Step S10: constructing a carrying action model of the flying carrying robot in a preset object carrying scene, and generating a control prediction result according to a control input pair of the pre-constructed flying carrying robot and the carrying action model; the carrying action model comprises a first model corresponding to the article placing action and a second model corresponding to the article taking action; the control input pairs of the flying carrier robot include a control action pair and a control state pair.
In some embodiments, constructing the carrying action model of the flying carrying robot in the preset article carrying scene includes constructing a first model corresponding to the article placement action and a second model corresponding to the article removal action. The constructed carrying action model belongs to the first-layer Gaussian process of a layered (hierarchical) Gaussian process, and the prediction result obtained in the first-layer Gaussian process can be used as input data for generating the dynamics model in the second-layer Gaussian process.
In some embodiments, the control input pair of the flying carrier robot includes a control action pair and a control state pair. The control action pair is [a_1, a_2, …, a_N], where N is the total number of action pairs used by the flight transfer robot for stability control. The control state pair refers to the set of training pairs of the first-layer Gaussian process, in which each input value (the first input value, the second input value, …, the n-th input value) is paired with the corresponding training sample (the first training sample, the second training sample, …, the n-th training sample).
Step S20: and constructing a dynamics model corresponding to the flight transfer robot based on the control prediction result.
In some embodiments, constructing the dynamics model corresponding to the flying robot based on the control prediction result may be regarded as the second-layer Gaussian process of the layered Gaussian process, and the input data of the second-layer Gaussian process is the control prediction result output by the first-layer Gaussian process.
In some embodiments, the corresponding dynamics model of the flying carrier robot comprises:
wherein the variables of the dynamics model denote, respectively: the state of the flying transfer robot at different moments; the input value of the training sample; the test input corresponding to the control prediction result; the estimated variance; the probability distribution function; the input function; i, the number of control state pairs; j, the number of control action pairs; and the training object.
In some embodiments, given the test input corresponding to a control prediction result, the estimated variance corresponding to that test input can be expressed in terms of the kernel evaluated at the test input and the training samples,
wherein K_f represents the squared exponential kernel function, K represents the deviation number, σ represents the standard deviation, y represents the dependent variable of the test input, and the remaining terms denote the input function of the test, the mean value of the independent variable, the training samples, and the argument of the test input.
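A minimal sketch of how such an estimated variance can be computed under a squared exponential kernel, assuming the standard Gaussian-process posterior form; the function names, length-scale, noise level and random training inputs are illustrative assumptions, not the patent's actual quantities.

```python
import numpy as np

def squared_exponential_kernel(A, B, lengthscale=1.0, signal_var=1.0):
    # K_f(a, b) = signal_var * exp(-||a - b||^2 / (2 * lengthscale^2))
    sq_dists = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return signal_var * np.exp(-0.5 * sq_dists / lengthscale**2)

def gp_posterior_variance(X_train, x_test, noise_std=0.1, lengthscale=1.0, signal_var=1.0):
    # Posterior predictive variance at the test input x_test:
    # var(x*) = K_f(x*, x*) - k*^T (K + sigma^2 I)^{-1} k*
    K = squared_exponential_kernel(X_train, X_train, lengthscale, signal_var)
    K += (noise_std ** 2) * np.eye(len(X_train))
    k_star = squared_exponential_kernel(X_train, x_test[None, :], lengthscale, signal_var)
    k_ss = squared_exponential_kernel(x_test[None, :], x_test[None, :], lengthscale, signal_var)
    return (k_ss - k_star.T @ np.linalg.solve(K, k_star)).item()

# Example with hypothetical control-state training inputs and one test input.
X_train = np.random.rand(20, 3)
x_test = np.random.rand(3)
print(gp_posterior_variance(X_train, x_test))
```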
In some embodiments, the layered Gaussian process is used to generate the control prediction result and the dynamics model corresponding to the flight transfer robot respectively, which reduces the amount of training data required to construct the dynamics model: training data only need to be collected once, in the first-layer Gaussian process, while the training data of the second-layer Gaussian process come from the output of the first layer, so no additional training samples are needed. This greatly simplifies the estimation time and improves the efficiency of the stability control algorithm.
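A minimal sketch of the layered (two-layer) Gaussian process described above, assuming scikit-learn's GaussianProcessRegressor and randomly generated placeholder data; the array shapes, variable names and kernel choice are illustrative assumptions rather than the patent's concrete implementation.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# First layer: the carrying action model maps control input pairs to a control
# prediction result; training data are collected only once, at this layer.
X1 = rng.random((50, 4))          # placeholder control input pairs (state + action)
y1 = rng.random(50)               # placeholder observed responses
gp_layer1 = GaussianProcessRegressor(kernel=RBF() + WhiteKernel()).fit(X1, y1)

# Second layer: the dynamics model is trained on the first layer's output,
# so no additional samples have to be collected from the robot.
control_prediction = gp_layer1.predict(X1).reshape(-1, 1)
y2 = rng.random(50)               # placeholder next-state targets
gp_layer2 = GaussianProcessRegressor(kernel=RBF() + WhiteKernel()).fit(control_prediction, y2)

# Predicting a next state for a new test input chains the two layers.
x_test = rng.random((1, 4))
layer1_out = gp_layer1.predict(x_test).reshape(-1, 1)
next_state_mean, next_state_std = gp_layer2.predict(layer1_out, return_std=True)
```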
Step S30: constructing a cost function corresponding to the dynamic model; the cost function includes a function for bringing the flying carrier robot to a steady state.
In some embodiments, constructing a cost function corresponding to the kinetic model includes:
wherein cost[c(s_{t+1})] represents the cost function corresponding to the dynamics model, I represents an identity matrix, Λ^{-1} represents a diagonal precision matrix, s_{t+1} represents the state of the flying transfer robot at time t+1, s_t represents the state of the flying transfer robot at time t, T represents the matrix transpose, Ω^{-1} represents the state the flying transfer robot is expected to reach at the instants of article placement and removal, and σ represents the standard deviation.
In some embodiments, once the dynamics model is constructed, the action space of the flying robot is significantly reduced, so the convergence time for reaching a steady state is also greatly reduced. Because the cost function of the dynamics model includes a function for bringing the flying transfer robot to a stable state, minimizing the cost function of the required stable state markedly improves the computational efficiency of the stability control algorithm.
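A minimal sketch of a saturating state cost of the kind described above (low near the desired state, approaching 1 far from it), assuming a PILCO-style exponentiated quadratic form with a diagonal precision matrix; the threshold value, state dimension and target state are illustrative assumptions.

```python
import numpy as np

def saturating_cost(s_next, s_target, precision_diag, sigma=1.0):
    # 1 - exp(-0.5 * d^T Lambda^{-1} d), with d the (scaled) deviation from the target state.
    d = (np.asarray(s_next) - np.asarray(s_target)) / sigma
    quad = d @ np.diag(precision_diag) @ d
    return 1.0 - np.exp(-0.5 * quad)

# Stability control is handed to the model-free optimizer once the cost drops
# below a preset cost threshold (0.05 here is an illustrative value).
s_next = np.array([0.10, -0.05, 0.02])
s_target = np.zeros(3)
if saturating_cost(s_next, s_target, precision_diag=np.ones(3)) < 0.05:
    pass  # trigger stability control of the control input pair
```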
Step S40: when the cost function meets a preset cost threshold, performing stability control on a control input pair of the flying transfer robot by using the model-free optimizer; the model-free optimizer includes an improved deep reinforcement learning algorithm based on hybrid reinforcement learning.
In some embodiments, the deep Q network (DQN) is a successful embodiment of deep reinforcement learning: like a deep neural network, it learns a parameterized deep Q-network function that generalizes over state-action pairs. The deep Q network uses a replay memory to store state transitions and uses gradient descent to find parameters that minimize the loss function. Its disadvantage, however, is that only the current reward is used to determine the loss function, i.e. only the first-order temporal difference is used to update the function. The deep reinforcement learning algorithm, i.e. the deep Q network, is therefore improved by hybrid reinforcement learning, resulting in an improved model-free optimizer.
In some embodiments, the model-free optimizer includes an improved deep reinforcement learning algorithm based on hybrid reinforcement learning, comprising:
improving and optimizing a preset deep reinforcement learning algorithm by using a preset load reduction method, a preset parameter integration method and a preset hybrid reinforcement loss function; the load reduction method comprises reducing the computational load of the preset deep reinforcement learning algorithm to a preset load threshold.
In some embodiments, the preset load threshold is O(N). By reducing the computational load of the preset deep reinforcement learning algorithm to the preset load threshold O(N), the efficiency of computing returns over long reward trajectories is improved, and the return is formed as the sum of the discounted rewards and the final action value, which is then maximized.
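A minimal sketch of computing λ-returns for a whole trajectory in a single backward pass, so the per-trajectory computational load stays O(N); the reward sequence, bootstrap Q values and discount factors are illustrative assumptions.

```python
import numpy as np

def lambda_returns(rewards, bootstrap_q, gamma=0.99, lam=0.9):
    # One backward recursion keeps the load at O(N):
    # R_j^lambda = r_j + gamma * ((1 - lambda) * Q_{j+1} + lambda * R_{j+1}^lambda)
    rewards = np.asarray(rewards, dtype=float)
    bootstrap_q = np.asarray(bootstrap_q, dtype=float)   # action value estimated after step j
    returns = np.empty_like(rewards)
    next_return = bootstrap_q[-1]
    for j in reversed(range(len(rewards))):
        next_return = rewards[j] + gamma * ((1.0 - lam) * bootstrap_q[j] + lam * next_return)
        returns[j] = next_return
    return returns

# Example with a hypothetical 5-step trajectory.
print(lambda_returns([0.0, 0.0, 0.1, 0.2, 1.0], [0.5, 0.5, 0.6, 0.7, 0.0]))
```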
In some embodiments, the preset parameter integration method includes integrating the preset parameter and the deep reinforcement learning algorithm to generate an integrated parameter, and updating and storing the preset parameter in a preset replay memory.
In some embodiments, the preset parameter λ is integrated with the deep reinforcement learning algorithm, resulting in the integrated algorithm DQN(λ), and the preset parameter λ is periodically updated and stored in the corresponding replay memory.
In some embodiments, the preset hybrid reinforcement loss function includes:

L(θ) = E[(R_j^λ - Q(s_j, a_j | θ))^2]

wherein L(θ) represents the preset hybrid reinforcement loss function, R_j^λ represents the time-difference (λ) return of step j, Q(s_j, a_j | θ) represents the value output by the Q network for the state-action pair, s_j represents a motion state matrix, a_j represents an acceleration matrix, and θ represents a quaternion matrix.
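A minimal sketch of the hybrid reinforcement loss under the assumption that it is the mean squared difference between the λ-return targets and the Q-network outputs over a batch; the function and variable names are illustrative.

```python
import numpy as np

def hybrid_reinforcement_loss(lambda_returns, q_values):
    # Mean squared difference between the lambda-return targets R_j^lambda
    # and the Q-network outputs Q(s_j, a_j | theta) over a replay batch.
    lambda_returns = np.asarray(lambda_returns, dtype=float)
    q_values = np.asarray(q_values, dtype=float)
    return float(np.mean((lambda_returns - q_values) ** 2))

# Gradient descent on this loss with respect to the network parameters theta
# replaces the one-step temporal-difference loss of a plain deep Q network.
print(hybrid_reinforcement_loss([1.0, 0.8, 0.5], [0.9, 0.7, 0.6]))
```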
In some embodiments, before the control input pair of the flying transfer robot is stability-controlled by the model-free optimizer, the method further comprises:
expanding the state space of the flying transfer robot to obtain an expanded state space; the expanded state space comprises S = [X, Y, q_l, q̇_l, q_r, q̇_r]^T, wherein S represents the expanded state space, X and Y represent the current position of the flying carrier robot, q_l represents the state at the instant of article placement, q̇_l represents the first derivative of the state at the instant of article placement, q_r represents the state at the instant of article removal, q̇_r represents the first derivative of the state at the instant of article removal, and T represents the matrix transpose.
In some embodiments, the state q_l at the instant of article placement and the state q_r at the instant of article removal are matrices constructed from the velocity and acceleration of the flying carrier robot's position.
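A minimal sketch of assembling the expanded state vector described above, assuming q_l and q_r are flattened into vectors built from the robot's positional velocity and acceleration; the block sizes and example values are illustrative assumptions.

```python
import numpy as np

def extended_state(x, y, q_place, q_place_dot, q_remove, q_remove_dot):
    # S = [X, Y, q_l, q_l_dot, q_r, q_r_dot]^T as a single flat vector.
    return np.concatenate((
        [x, y],
        np.ravel(q_place), np.ravel(q_place_dot),
        np.ravel(q_remove), np.ravel(q_remove_dot),
    ))

# Example with hypothetical two-component velocity/acceleration blocks.
S = extended_state(1.2, 0.8,
                   q_place=[0.30, 0.10], q_place_dot=[0.020, 0.010],
                   q_remove=[0.40, 0.20], q_remove_dot=[0.030, 0.015])
print(S.shape)   # (10,)
```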
In some embodiments, in order to obtain an accurate minimum-variance policy when performing stability control with the model-free optimizer, not only the joint information used as the state space but also the current position of the flying carrier robot must be considered, and the state space of the flying carrier robot therefore needs to be expanded.
In some embodiments, compared with other hybrid and model-based reinforcement learning algorithms that predict the transition model directly from state-action pairs, the improved deep reinforcement learning algorithm based on hybrid reinforcement learning builds a model-free optimizer that uses two layers of prediction to estimate the transition model twice, which allows the system model to be estimated from very few training samples and greatly reduces training time. In addition, after a rough model of the system is obtained, the model-based estimator is not updated during online learning, which greatly reduces the computational load of online learning and greatly improves its sample efficiency; the model-free optimizer only needs to evaluate a small number of possible operations for each state pair, and the proposed hybrid reinforcement learning structure also avoids the distribution mismatch problem.
Referring to fig. 2, in some embodiments, a stability control apparatus for an indoor article flight transfer robot is provided, which includes a carrying motion model construction module 10, a dynamics model construction module 20, a cost function construction module 30, and a stability control module 40, each described in detail below.
The carrying motion model construction module 10 is used for constructing a carrying motion model of the flying carrying robot in a preset object carrying scene, and generating a control prediction result according to a control input pair of the pre-constructed flying carrying robot and the carrying motion model; the carrying action model comprises a first model corresponding to the article placing action and a second model corresponding to the article taking action; the control input pairs of the flying carrier robot include a control action pair and a control state pair.
In some embodiments, constructing the carrying motion model of the flying transfer robot in the preset article carrying scene includes constructing a first model corresponding to the article placement action and a second model corresponding to the article removal action. The constructed carrying motion model belongs to the first-layer Gaussian process of a layered (hierarchical) Gaussian process, and the prediction result obtained in the first-layer Gaussian process can be used as input data for generating the dynamics model in the second-layer Gaussian process.
In some embodiments, the control input pair of the flying carrier robot includes a control action pair and a control state pair. The control action pair is [a_1, a_2, …, a_N], where N is the total number of action pairs used by the flight transfer robot for stability control. The control state pair refers to the set of training pairs of the first-layer Gaussian process, in which each input value (the first input value, the second input value, …, the n-th input value) is paired with the corresponding training sample (the first training sample, the second training sample, …, the n-th training sample).
The dynamics model construction module 20 is configured to construct a dynamics model corresponding to the flight transfer robot based on the control prediction result.
In some embodiments, the dynamics model construction module 20 constructs the dynamics model corresponding to the flight transfer robot based on the control prediction result, which may be regarded as the second-layer Gaussian process of the layered Gaussian process, and the input data of the second-layer Gaussian process is the control prediction result output by the first-layer Gaussian process.
In some embodiments, the corresponding dynamics model of the flying carrier robot comprises:
wherein the variables of the dynamics model denote, respectively: the state of the flying transfer robot at different moments; the input value of the training sample; the test input corresponding to the control prediction result; the estimated variance; the probability distribution function; the input function; i, the number of control state pairs; j, the number of control action pairs; and the training object.
In some embodiments, given the test input corresponding to a control prediction result, the estimated variance corresponding to that test input can be expressed in terms of the kernel evaluated at the test input and the training samples,
wherein K_f represents the squared exponential kernel function, K represents the deviation number, σ represents the standard deviation, y represents the dependent variable of the test input, and the remaining terms denote the input function of the test, the mean value of the independent variable, the training samples, and the argument of the test input.
In some embodiments, the layered Gaussian process is used to generate the control prediction result and the dynamics model corresponding to the flight transfer robot respectively, which reduces the amount of training data required to construct the dynamics model: training data only need to be collected once, in the first-layer Gaussian process, while the training data of the second-layer Gaussian process come from the output of the first layer, so no additional training samples are needed. This greatly simplifies the estimation time and improves the efficiency of the stability control algorithm.
The cost function construction module 30 is configured to construct a cost function corresponding to the dynamics model; the cost function includes a function for bringing the flying carrier robot to a steady state.
In some embodiments, cost function construction module 30 constructs a cost function corresponding to the kinetic model, including:
wherein cost[c(s_{t+1})] represents the cost function corresponding to the dynamics model, I represents an identity matrix, Λ^{-1} represents a diagonal precision matrix, s_{t+1} represents the state of the flying transfer robot at time t+1, s_t represents the state of the flying transfer robot at time t, T represents the matrix transpose, Ω^{-1} represents the state the flying transfer robot is expected to reach at the instants of article placement and removal, and σ represents the standard deviation.
In some embodiments, once the dynamics model is constructed, the action space of the flying robot is significantly reduced, so the convergence time for reaching a steady state is also greatly reduced. Because the cost function of the dynamics model includes a function for bringing the flying transfer robot to a stable state, minimizing the cost function of the required stable state markedly improves the computational efficiency of the stability control algorithm.
The stability control module 40 is configured to perform stability control on a control input pair of the flying carrier robot by using the model-free optimizer when the cost function meets a preset cost threshold; the model-free optimizer includes an improved deep reinforcement learning algorithm based on hybrid reinforcement learning.
In some embodiments, the deep Q network (DQN) is a successful embodiment of deep reinforcement learning: like a deep neural network, it learns a parameterized deep Q-network function that generalizes over state-action pairs. The deep Q network uses a replay memory to store state transitions and uses gradient descent to find parameters that minimize the loss function. Its disadvantage, however, is that only the current reward is used to determine the loss function, i.e. only the first-order temporal difference is used to update the function. The deep reinforcement learning algorithm, i.e. the deep Q network, is therefore improved by hybrid reinforcement learning, resulting in an improved model-free optimizer.
In some embodiments, the model-free optimizer in the stability control module 40 includes an improved deep reinforcement learning algorithm based on hybrid reinforcement learning, comprising:
improving and optimizing a preset deep reinforcement learning algorithm by using a preset load reduction method, a preset parameter integration method and a preset hybrid reinforcement loss function; the load reduction method comprises reducing the computational load of the preset deep reinforcement learning algorithm to a preset load threshold.
In some embodiments, the preset load threshold is O(N). By reducing the computational load of the preset deep reinforcement learning algorithm to the preset load threshold O(N), the efficiency of computing returns over long reward trajectories is improved, and the return is formed as the sum of the discounted rewards and the final action value, which is then maximized.
In some embodiments, the preset parameter integration method includes integrating the preset parameter and the deep reinforcement learning algorithm to generate an integrated parameter, and updating and storing the preset parameter in a preset replay memory.
In some embodiments, the preset parameter λ is integrated with the deep reinforcement learning algorithm, resulting in the integrated algorithm DQN(λ), and the preset parameter λ is periodically updated and stored in the corresponding replay memory.
In some embodiments, the preset hybrid reinforcement loss function includes:

L(θ) = E[(R_j^λ - Q(s_j, a_j | θ))^2]

wherein L(θ) represents the preset hybrid reinforcement loss function, R_j^λ represents the time-difference (λ) return of step j, Q(s_j, a_j | θ) represents the value output by the Q network for the state-action pair, s_j represents a motion state matrix, a_j represents an acceleration matrix, and θ represents a quaternion matrix.
In some embodiments, before the control input pair of the flying transfer robot is stability-controlled by the model-free optimizer, the method further comprises:
expanding the state space of the flying transfer robot to obtain an expanded state space; the expanded state space comprises S = [X, Y, q_l, q̇_l, q_r, q̇_r]^T, wherein S represents the expanded state space, X and Y represent the current position of the flying carrier robot, q_l represents the state at the instant of article placement, q̇_l represents the first derivative of the state at the instant of article placement, q_r represents the state at the instant of article removal, q̇_r represents the first derivative of the state at the instant of article removal, and T represents the matrix transpose.
In some embodiments, the state q_l at the instant of article placement and the state q_r at the instant of article removal are matrices constructed from the velocity and acceleration of the flying carrier robot's position.
In some embodiments, in order to obtain an accurate minimum-variance policy when performing stability control with the model-free optimizer, not only the joint information used as the state space but also the current position of the flying carrier robot must be considered, and the state space of the flying carrier robot therefore needs to be expanded.
In some embodiments, compared with other hybrid and model-based reinforcement learning algorithms that predict the transition model directly from state-action pairs, the improved deep reinforcement learning algorithm based on hybrid reinforcement learning builds a model-free optimizer that uses two layers of prediction to estimate the transition model twice, which allows the system model to be estimated from very few training samples and greatly reduces training time. In addition, after a rough model of the system is obtained, the model-based estimator is not updated during online learning, which greatly reduces the computational load of online learning and greatly improves its sample efficiency; the model-free optimizer only needs to evaluate a small number of possible operations for each state pair, and the proposed hybrid reinforcement learning structure also avoids the distribution mismatch problem.
Those skilled in the art will appreciate that all or part of the functions of the methods in the above embodiments may be implemented by hardware or by a computer program. When all or part of the functions are implemented by a computer program, the program may be stored in a computer-readable storage medium, which may include read-only memory, random-access memory, a magnetic disk, an optical disk, a hard disk, and the like, and the functions are realized when the program is executed by a computer. For example, the program may be stored in the memory of a device, and all or part of the above functions are realized when the program in the memory is executed by a processor. The program may also be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk, or a removable hard disk, and downloaded or copied into the memory of a local device, or used to update the version of the local device's system; the above functions are then realized when the program in the memory is executed by a processor.
The foregoing description of the invention has been presented for purposes of illustration and description, and is not intended to be limiting. Several simple deductions, modifications or substitutions may also be made by a person skilled in the art to which the invention pertains, based on the idea of the invention.

Claims (10)

1. A stability control method of an indoor article flying and carrying robot is characterized by comprising the following steps:
constructing a carrying action model of the flying carrying robot in a preset article carrying scene, and generating a control prediction result according to a control input pair of the pre-constructed flying carrying robot and the carrying action model; the carrying action model comprises a first model corresponding to the article placing action and a second model corresponding to the article taking action; the control input pairs of the flying transfer robot comprise control action pairs and control state pairs;
constructing a dynamics model corresponding to the flying transfer robot based on the control prediction result;
constructing a cost function corresponding to the dynamic model; the cost function includes a function that causes the flying transfer robot to reach a steady state;
when the cost function meets a preset cost threshold, performing stability control on a control input pair of the flying transfer robot by using a model-free optimizer; the model-free optimizer includes an improved deep reinforcement learning algorithm based on hybrid reinforcement learning.
2. The method of claim 1, wherein the model-less optimizer comprises a hybrid reinforcement learning-based improved deep reinforcement learning algorithm comprising:
improving and optimizing a preset deep reinforcement learning algorithm by using a preset load reduction method, a preset parameter integration method and a preset hybrid reinforcement loss function; the load reduction method comprises reducing the computational load of the preset deep reinforcement learning algorithm to a preset load threshold.
3. The method of claim 2, wherein the preset parameter integration method comprises integrating a preset parameter with the deep reinforcement learning algorithm, generating an integrated parameter, and updating and storing the preset parameter in a preset replay memory.
4. The method of claim 2, wherein the preset hybrid reinforcement loss function comprises:

L(θ) = E[(R_j^λ - Q(s_j, a_j | θ))^2]

wherein L(θ) represents the preset hybrid reinforcement loss function, R_j^λ represents the time-difference (λ) return of step j, Q(s_j, a_j | θ) represents the value output by the Q network for the state-action pair, s_j represents a motion state matrix, a_j represents an acceleration matrix, and θ represents a quaternion matrix.
5. The method of claim 1, wherein said constructing a cost function corresponding to said kinetic model comprises:
wherein cost[c(s_{t+1})] represents the cost function corresponding to the dynamics model, I represents an identity matrix, Λ^{-1} represents a diagonal precision matrix, s_{t+1} represents the state of the flying transfer robot at time t+1, s_t represents the state of the flying transfer robot at time t, T represents the matrix transpose, Ω^{-1} represents the state the flying transfer robot is expected to reach at the instants of article placement and removal, and σ represents the standard deviation.
6. The method of claim 1, wherein the corresponding kinetic model of the flying carrier robot comprises:
wherein the variables of the dynamics model denote, respectively: the state of the flying transfer robot at different moments; the input value of the training sample; the test input corresponding to the control prediction result; the estimated variance; the probability distribution function; the input function; i, the number of control state pairs; j, the number of control action pairs; and the training object.
7. The method of claim 1, wherein prior to stability control of the pair of control inputs to the flying carrier robot with a model-free optimizer, the method further comprises:
expanding the state space of the flying transfer robot to obtain an expanded state space; the expanded state space comprises S = [X, Y, q_l, q̇_l, q_r, q̇_r]^T, wherein S represents the expanded state space, X and Y represent the current position of the flying carrier robot, q_l represents the state at the instant of article placement, q̇_l represents the first derivative of the state at the instant of article placement, q_r represents the state at the instant of article removal, q̇_r represents the first derivative of the state at the instant of article removal, and T represents the matrix transpose.
8. A stability control device for an indoor article flying and handling robot, comprising:
the carrying motion model construction module is used for constructing a carrying motion model of the flying carrying robot in a preset object carrying scene and generating a control prediction result according to a control input pair of the pre-constructed flying carrying robot and the carrying motion model; the carrying action model comprises a first model corresponding to the article placing action and a second model corresponding to the article taking action; the control input pairs of the flying transfer robot comprise control action pairs and control state pairs;
the dynamics model construction module is used for constructing a dynamics model corresponding to the flight transfer robot based on the control prediction result;
the cost function construction module is used for constructing a cost function corresponding to the dynamic model; the cost function includes a function that causes the flying transfer robot to reach a steady state;
the stability control module is used for controlling the stability of the control input pair of the flying transfer robot by using the model-free optimizer when the cost function meets a preset cost threshold; the model-free optimizer includes an improved deep reinforcement learning algorithm based on hybrid reinforcement learning.
9. A stability control device for an indoor article flight transfer robot, comprising:
a memory for storing a program;
a processor for implementing the method of any of claims 1-7 by executing a program stored in the memory.
10. A computer readable storage medium, characterized in that the medium has stored thereon a program executable by a processor to implement the method of any of claims 1-7.
CN202311866657.1A 2023-12-29 2023-12-29 Stability control method and device for indoor article flight transfer robot Pending CN117806158A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311866657.1A CN117806158A (en) 2023-12-29 2023-12-29 Stability control method and device for indoor article flight transfer robot

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311866657.1A CN117806158A (en) 2023-12-29 2023-12-29 Stability control method and device for indoor article flight transfer robot

Publications (1)

Publication Number Publication Date
CN117806158A 2024-04-02

Family

ID=90421697

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311866657.1A Pending CN117806158A (en) 2023-12-29 2023-12-29 Stability control method and device for indoor article flight transfer robot

Country Status (1)

Country Link
CN (1) CN117806158A (en)

Similar Documents

Publication Publication Date Title
Choi et al. Reinforcement learning for safety-critical control under model uncertainty, using control lyapunov functions and control barrier functions
CN112465151A (en) Multi-agent federal cooperation method based on deep reinforcement learning
Li et al. Intelligent control
Mohd Basri et al. Intelligent adaptive backstepping control for MIMO uncertain non-linear quadrotor helicopter systems
Saviolo et al. Learning quadrotor dynamics for precise, safe, and agile flight control
Yin et al. Adaptive neural network sliding mode control for quad tilt rotor aircraft
Amato Decision-Making Under Uncertainty in Multi-Agent and Multi-Robot Systems: Planning and Learning.
Behjat et al. Learning reciprocal actions for cooperative collision avoidance in quadrotor unmanned aerial vehicles
Krishnakumar Intelligent systems for aerospace engineering: An overview
Jiao et al. Anti-disturbance attitude control for quadrotor unmanned aerial vehicle manipulator via fuzzy adaptive sigmoid generalized super-twisting sliding mode observer
Gün Attitude control of a quadrotor using PID controller based on differential evolution algorithm
Zhang et al. Model‐Free Attitude Control of Spacecraft Based on PID‐Guide TD3 Algorithm
Mohanty et al. Application of deep Q-learning for wheel mobile robot navigation
El Houm et al. Optimal new sliding mode controller combined with modified supertwisting algorithm for a perturbed quadrotor UAV
Goecks Human-in-the-loop methods for data-driven and reinforcement learning systems
Scheper et al. Abstraction, sensory-motor coordination, and the reality gap in evolutionary robotics
Liang et al. Multi-UAV autonomous collision avoidance based on PPO-GIC algorithm with CNN–LSTM fusion network
Nahrendra et al. Retro-RL: Reinforcing nominal controller with deep reinforcement learning for tilting-rotor drones
Naveed et al. Adaptive trajectory tracking of wheeled mobile robot with uncertain parameters
Ansarian et al. Multi-objective optimal design of a fuzzy adaptive robust fractional-order PID controller for a nonlinear unmanned flying system
Yu et al. Adaptive simplified surge-heading tracking control for underwater vehicles with thruster’s dead-zone compensation
Ngo et al. A novel self-organizing fuzzy cerebellar model articulation controller based overlapping Gaussian membership function for controlling robotic system
Fridovich-Keil et al. Approximate solutions to a class of reachability games
CN113959446B (en) Autonomous logistics transportation navigation method for robot based on neural network
Abdollahi et al. Adaptive‐optimal control under time‐varying stochastic uncertainty using past learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination