WO2023165177A1 - Method for constructing controller of robot, motion control method and apparatus for robot, and robot - Google Patents

Method for constructing controller of robot, motion control method and apparatus for robot, and robot Download PDF

Info

Publication number
WO2023165177A1
WO2023165177A1 · PCT/CN2022/134041 · CN2022134041W
Authority
WO
WIPO (PCT)
Prior art keywords
robot
controller
control
data
motion
Prior art date
Application number
PCT/CN2022/134041
Other languages
English (en)
French (fr)
Inventor
王帅
张竞帆
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Co., Ltd.)
Priority to US18/203,910 (published as US20230305563A1)
Publication of WO2023165177A1

Links

Images

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B19/00 Programme-control systems
    • G05B19/02 Programme-control systems electric
    • G05B19/04 Programme control other than numerical control, i.e. in sequence controllers or logic controllers
    • G05B19/042 Programme control other than numerical control, i.e. in sequence controllers or logic controllers using digital processors
    • G05B19/0423 Input/output
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/02 Control of position or course in two dimensions
    • G05D1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • G05D1/0212 Control of position or course in two dimensions specially adapted to land vehicles with means for defining a desired trajectory
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B19/00 Programme-control systems
    • G05B19/02 Programme-control systems electric
    • G05B19/04 Programme control other than numerical control, i.e. in sequence controllers or logic controllers
    • G05B19/042 Programme control other than numerical control, i.e. in sequence controllers or logic controllers using digital processors
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00 Program-control systems
    • G05B2219/20 Pc systems
    • G05B2219/25 Pc structure of the system
    • G05B2219/25257 Microcontroller

Definitions

  • the present invention relates to the field of artificial intelligence and robotics, and more specifically relates to a method for constructing a robot controller, a robot motion control method, a device, a robot, a computer-readable storage medium, and a computer program product.
  • robots based on artificial intelligence and robotics are playing an increasingly important role in the fields of intelligent transportation and smart home, and are also facing higher requirements.
  • the present disclosure provides a method for constructing a robot controller, a robot motion control method, a device, a robot, a computer-readable storage medium, and a computer program product.
  • the present disclosure provides a method for constructing a controller of a robot, which is executed by a processor, and the method includes: using a first controller to control the motion of the robot, and acquiring motion state data and control data during the motion of the robot; updating the linear balance parameter matrix of the first controller by means of policy iteration according to the motion state data and the control data; and constructing, based on the updated linear balance parameter matrix, a second controller corresponding to the dynamic characteristics of the robot.
  • the present disclosure provides a method for controlling motion of a robot, executed by a processor, where the robot moves by driving a driving wheel, and the method includes: receiving a movement instruction,
  • the movement instruction indicating the movement trajectory of the robot;
  • controlling, by the first controller according to the movement instruction, the driving force applied to the driving wheel, so that the robot moves along the movement trajectory; and
  • controlling, by the second controller, the driving force applied to the driving wheel, so that the robot moves smoothly.
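The control flow described above (receive an instruction, drive with the first controller while logging data, then build and switch to the second controller) can be sketched as follows. Every name here (`MotionController`, `build_second_controller`, the toy update) is a hypothetical illustration, not from the disclosure, and the policy-iteration routine is stubbed out:

```python
import random

# Sketch of the flow: drive with a first controller while logging
# (state, control) pairs, then hand the log to a policy-iteration routine
# and swap in the resulting second controller.

class MotionController:
    def __init__(self, K_first):
        self.K = K_first        # linear balance parameter matrix (as a row)
        self.log = []           # motion state data and control data

    def control(self, state):
        u = -sum(k * x for k, x in zip(self.K, state))   # u = -K x
        self.log.append((tuple(state), u))
        return u                # driving force applied to the driving wheel

    def build_second_controller(self, policy_iteration):
        states = [s for s, _ in self.log]
        controls = [u for _, u in self.log]
        self.K = policy_iteration(self.K, states, controls)

def toy_policy_iteration(K, states, controls):
    # stand-in only; a real implementation updates K from the logged data
    return [k * 0.5 for k in K]

ctrl = MotionController(K_first=[1.0, 2.0, 4.0])
random.seed(0)
for _ in range(100):            # follow the commanded movement trajectory
    state = [random.uniform(-1.0, 1.0) for _ in range(3)]
    ctrl.control(state)
ctrl.build_second_controller(toy_policy_iteration)
```
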
  • the present disclosure provides a robot, the robot including: a data acquisition device configured to acquire motion state data of the robot when the first controller controls the motion of the robot; and a data processing device configured to: acquire control data corresponding to the motion state data; update the linear balance parameter matrix of the first controller in a policy iteration manner based on the motion state data and the control data; and construct, based on the updated linear balance parameter matrix, a second controller corresponding to the dynamic characteristics of the robot.
  • a data acquisition device configured to: acquire motion state data of the robot when the first controller controls the motion of the robot
  • a data processing device configured to: acquire control data corresponding to the motion state data; update the linear balance parameter matrix of the first controller in a policy iteration manner based on the motion state data and the control data; and construct, based on the updated linear balance parameter matrix, a second controller corresponding to the dynamic characteristics of the robot.
  • the present disclosure provides an apparatus for constructing a controller of a robot, the apparatus comprising:
  • a motion control module configured to use the first controller to control the motion of the robot, and obtain motion state data and control data of the robot during motion;
  • a strategy iteration module configured to update the linear balance parameter matrix of the first controller in a strategy iteration manner according to the motion state data and the control data;
  • a second controller construction module configured to construct a second controller corresponding to the dynamic characteristics of the robot based on the updated linear balance parameter matrix.
  • the present disclosure provides a robot motion control device, the robot moves by driving a driving wheel, the device includes:
  • An instruction receiving module configured to receive a movement instruction, the movement instruction indicating the movement trajectory of the robot
  • An instruction execution module configured to control the driving force applied to the driving wheel through the first controller according to a movement instruction, so that the robot moves according to the movement trajectory;
  • a data acquisition module configured to acquire motion state data and control data of the robot during motion
  • a strategy iteration module configured to use strategy iteration to construct a second controller corresponding to the dynamic characteristics of the robot based on the motion state data and the control data;
  • a driving force control module configured to use the second controller to control the driving force applied to the driving wheels, so as to make the robot move smoothly.
  • the present disclosure provides a computer-readable storage medium on which computer-readable instructions are stored; when the computer-readable instructions are executed by one or more processors, the steps of any one of the methods described above are implemented.
  • the present disclosure provides a computer program product, including computer readable instructions, which, when executed by one or more processors, implement the steps of any one of the methods described above.
  • Fig. 1 shows a schematic structural diagram of a robot having a left wheel leg and a right wheel leg in a single wheel leg configuration according to an embodiment of the present disclosure.
  • Fig. 2 shows an exemplary flowchart of a method for constructing a robot controller according to an embodiment of the present disclosure.
  • Fig. 3 shows a schematic diagram of the labeling corresponding to a robot according to an embodiment of the present disclosure.
  • Fig. 4 shows a control architecture diagram corresponding to a robot according to an embodiment of the present disclosure.
  • Fig. 5 shows an exemplary flowchart of a method for constructing a controller of a robot according to an embodiment of the present disclosure.
  • Fig. 6 shows another structural view of the robot according to the embodiment of the present disclosure.
  • Fig. 7A shows motion state data and control data collected during motion while the first controller controls the motion of the robot, according to an embodiment of the present disclosure.
  • FIG. 7B shows the convergence process of the linear equilibrium parameter matrix in the process of constructing the second controller according to an embodiment of the present disclosure, where the base heights of the robot are 0.5 m and 0.6 m, respectively.
  • FIG. 7C shows the motion state data of the robot using the first controller and the second controller to control the robot to walk straight when the height of the base part is 0.6 meters according to an embodiment of the present disclosure.
  • FIG. 8 shows an exemplary flow chart of constructing a first controller according to an embodiment of the present disclosure.
  • Fig. 9 shows an exemplary schematic diagram of a method for constructing a robot controller according to an embodiment of the present disclosure.
  • Fig. 10 shows a schematic diagram of a process of collecting motion state data and control data of a robot according to an embodiment of the present disclosure.
  • Fig. 11 shows a graph of collecting motion state data and control data of a robot according to an embodiment of the present disclosure.
  • Fig. 12 shows an iterative schematic diagram of a linear balance parameter matrix in the process of acquiring a robot according to an embodiment of the present disclosure.
  • Fig. 13 shows an experimental schematic diagram of a controller of a test robot according to an embodiment of the present disclosure.
  • FIG. 14 shows a graph of experimental data for testing a controller of a robot according to an embodiment of the present disclosure.
  • the disclosed technical solution mainly relates to robot technology within artificial intelligence technology, and in particular to intelligent robot control.
  • a robot is a kind of mechanical and electronic equipment that can imitate a certain skill of a person by using mechanical transmission and modern microelectronic technology.
  • the robot is developed on the basis of electronics, machinery and information technology.
  • a robot does not necessarily have to look like a human; as long as it can independently complete the tasks and orders given to it by humans, it is a member of the robot family.
  • a robot is an automated machine that has some intelligent abilities similar to humans or creatures, such as perception, planning, action, and coordination. It is an automated machine with high flexibility. With the development of computer technology and artificial intelligence technology, the function and technical level of robots have been greatly improved. Mobile robots and robot vision and tactile technologies are typical representatives.
  • the present disclosure relates to an application of artificial intelligence in robot control. Specifically, the present disclosure proposes a method for constructing a robot controller based on artificial intelligence.
  • starting from an arbitrary controller to be optimized, a controller corresponding to the dynamics of the robot can be constructed. A robot under the control of the controller corresponding to the dynamics of the robot has a motion state closer to the equilibrium point than a robot under the control of the arbitrary controller to be optimized.
  • the robot involved in the present disclosure may be an underactuated robot, and an underactuated robot is a type of nonlinear control object in which the number of independent control variables is less than the number of degrees of freedom.
  • the underactuated robot can be a wheel-legged robot as shown in Fig. 1 .
  • FIG. 1 shows a schematic structural diagram of a robot having a left wheel leg and a right wheel leg in a single wheel leg configuration according to an embodiment of the present disclosure.
  • the wheel-leg robot 100 may include: a wheel leg portion 103; the wheel leg portion 103 includes a driving wheel 104 (also called a foot).
  • the wheel-legged robot 100 can also include a base part 101; the base part 101 refers to the main body of the robot, for example the torso of the robot, and the base part can be, for example, a planar plate-shaped or cuboid-shaped component connected to the wheel leg portion of the robot.
  • one end of the wheel leg portion 103 is connected to the base portion 101 , and the other end is connected to the driving wheel 104 .
  • the base part 101 is equipped with a power output device (for example, a motor), which can be used to provide power for the driving wheel of the wheel leg portion 103.
  • a power output device for example, a motor
  • the robot may also include a base part connected to the wheel leg part or an additional component arranged on the base part. It should be understood that the above only gives a structural example of a robot, and embodiments of the present disclosure are not limited by the specific components and connection methods of the robot.
  • the driving wheel 104 in the wheel leg portion 103 enables the wheel-legged robot 100 to walk and also to perform wheeled movement.
  • the wheel-legged robot 100 may also include a controllable additional component (eg, a tail), which can be used to balance the wheel-legged robot, or assist the wheel-legged robot to move.
  • the tail can assist wheel-legged robots to maintain balance during movement.
  • the wheel-legged robot can also include a controllable mechanical arm, which can be used to perform operational tasks such as carrying and picking.
  • Wheel-legged robots may include multi-legged wheel-legged robots, such as bipedal wheel-legged robots, quadruped wheel-legged robots, and the like.
  • the wheel leg portion 103 is a parallel structure leg (the balance point is located between the two legs of the biped wheel-leg robot 100 ).
  • the wheel leg portion 103 of the robot 100 includes a left wheel leg portion and a right wheel leg portion; each of the left wheel leg portion and the right wheel leg portion includes a driving wheel and two parallel leg parts, and the two parallel leg parts are connected to the central shaft of the driving wheel and are used to realize motion control of the driving wheel.
  • for example, the left wheel leg includes a left driving wheel, a first left wheel leg and a second left wheel leg, with the first left wheel leg and the second left wheel leg connected in parallel; and the right wheel leg 112 includes a right driving wheel, a first right wheel leg and a second right wheel leg, with the first right wheel leg and the second right wheel leg connected in parallel.
  • the left wheel leg and the right wheel leg are mirror images.
  • the driving wheel can be a single wheel, two wheels, four wheels or other numbers of driving wheels, and the movement of each driving wheel can be controlled by two legs connected in parallel or multiple legs connected in series. It should be understood that the embodiments of the present disclosure are not limited by the specific composition type of the left and right wheel legs and the number of driving wheels. In some embodiments, both the left wheel leg and the right wheel leg are of a single wheel leg configuration. Single wheel leg configuration means that the wheel leg portion only includes a single drive wheel.
  • the left wheel leg and the right wheel leg may include the same number of joints and have the same joint configuration, or, according to actual needs, the left wheel leg and the right wheel leg may have, for example, different numbers of joints, or different joint configurations, or both different numbers of joints and different joint configurations.
  • the embodiments of the present disclosure are not limited by the specific number of joints and the joint configurations of the left wheel leg and the right wheel leg. Taking the example shown in Figure 1, each of the left and right wheel legs contains 5 joints and has 2 rotational degrees of freedom in total; the centroid height of the wheel leg/base part and the inclination angle of the base portion can be varied by adjusting each joint of the wheel leg portion 103.
  • the legs of the robot can be a series structure leg or a parallel structure leg. Compared with the series structure leg, the parallel structure leg can have stronger rigidity and can withstand the impact that may be caused by complex movements.
  • the driving wheels 104 can provide the wheel-legged robot 100 with the ability to slide.
  • the biped wheel-legged robot 100 may also include an additional component 102 connected to the base part 101 .
  • a driven wheel 105 can be installed on the additional part 102 .
  • the attachment part 102 includes 1 rotational degree of freedom. The movement of the additional part 102 will also affect the changes of the base part 101 and the leg part 103, for example, the position change of the additional part can drive the base part to have a certain rotation speed.
  • the balance and posture of the robot 100 can be adjusted by adjusting the position of the attachment part 102 .
  • the wheel-legged robot 100 has both the mobility of a wheeled robot and the flexibility of a legged robot, so it can move quickly on flat ground and can also cross rough roads.
  • some wheel-legged robots, like the wheel-legged robot 100 shown in FIG. 1, are non-minimum phase systems, so it is still difficult to balance the wheel-legged robot 100 in practical applications.
  • due to the complex mechanical structure of the (wheel-legged) robot, it is difficult to determine the dynamics of the (wheel-legged) robot. Since traditional balance control methods need to know the dynamic characteristics of the robot, it is difficult for them to perform balance control on such a robot when its dynamic characteristics are unknown.
  • the method for constructing the controller of the robot involves using the adaptive dynamic programming (ADP) method and/or the whole-body dynamics method to design a controller for the robot in the case where the dynamic characteristics of the robot are unknown.
  • ADP essentially solves the infinite-horizon LQR problem in which the parameters of the system model are completely unknown; therefore, the well-known algebraic Riccati equation cannot be solved analytically.
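For context, the LQR fixed point that ADP approaches can be illustrated with Kleinman's classical policy iteration, which alternates policy evaluation (a Lyapunov equation) and policy improvement. This is a model-based sketch on a toy double-integrator system with known A and B, included only as a reference point; the disclosure's data-driven method reaches the same fixed point without knowing the model:

```python
import numpy as np

# Kleinman's policy iteration for continuous-time LQR (model-based version,
# for illustration only). The system here is a double integrator, not the
# robot's actual dynamics.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

def solve_lyapunov(M, C):
    # solve M^T P + P M + C = 0 for P via Kronecker vectorization
    n = M.shape[0]
    L = np.kron(np.eye(n), M.T) + np.kron(M.T, np.eye(n))
    p = np.linalg.solve(L, -C.flatten(order="F"))
    return p.reshape((n, n), order="F")

K = np.array([[1.0, 1.0]])                   # any stabilizing initial gain
for _ in range(10):
    M = A - B @ K                            # closed loop under current policy
    P = solve_lyapunov(M, Q + K.T @ R @ K)   # policy evaluation
    K = np.linalg.solve(R, B.T @ P)          # policy improvement
```

For this system the algebraic Riccati equation has the closed-form solution P = [[√3, 1], [1, √3]], so the iteration converges to the optimal gain K = [1, √3].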
  • the embodiments of the present disclosure realize that the solution to the LQR problem can still be obtained through an artificial intelligence solution when the system model cannot be used to solve the LQR problem.
  • the adaptive dynamic programming method can be based on a data-driven (data-driven) policy iteration (Policy Iteration, PI) scheme.
  • policy iteration (Policy Iteration, PI)
  • the embodiments of the present disclosure may optionally combine optimal control technology to propose a policy iteration method based on adaptive dynamic programming.
  • the strategy iteration method based on adaptive dynamic programming can dynamically iterate the controller when the dynamic parameters of the robot are changed or the dynamic characteristics are unknown, so as to obtain the controller corresponding to the dynamic characteristics of the robot after the parameter changes.
  • the controller enables the robot to travel along the target trajectory with the optimal control effect in a balanced state even if the dynamic parameters of the robot change.
  • the numerical iteration method based on adaptive dynamic programming may not require any initial controller, but it requires a relatively large amount of data and is more suitable for iterating the controller offline.
  • although the policy iteration method based on adaptive dynamic programming needs an initial controller, the amount of data it requires is much smaller than that of the numerical iteration method based on adaptive dynamic programming.
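One common way to realize data-driven policy iteration for LQR, sketched here under simplifying assumptions (a small illustrative discrete-time system with deterministic dynamics; none of the matrices below are the robot's), is to fit a quadratic Q-function to sampled transitions by least squares and read the improved gain off its blocks. Note that an initial stabilizing gain is required, matching the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# System used only to GENERATE data; the learner never reads A or B directly.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

def features(z1, z2, z3):
    # quadratic features of z = [x1, x2, u] for a symmetric 3x3 matrix H
    return np.array([z1*z1, 2*z1*z2, 2*z1*z3, z2*z2, 2*z2*z3, z3*z3])

K = np.array([[1.0, 1.0]])       # stabilizing initial gain ("first controller")
for _ in range(10):
    Phi, c = [], []
    for _ in range(200):
        x = rng.standard_normal(2)
        u = (-K @ x).item() + rng.standard_normal()   # exploration noise
        xn = A @ x + B[:, 0] * u
        un = (-K @ xn).item()                         # current policy's action
        # Bellman equation of the current policy's Q-function:
        #   Q(x, u) - Q(x', u') = x^T Q x + u^T R u
        Phi.append(features(x[0], x[1], u) - features(xn[0], xn[1], un))
        c.append(x @ Q @ x + u * R[0, 0] * u)
    h, *_ = np.linalg.lstsq(np.array(Phi), np.array(c), rcond=None)
    # policy improvement: u = -H_uu^{-1} H_ux x, H_ux = [h[2], h[4]], H_uu = h[5]
    K = np.array([[h[2], h[4]]]) / h[5]
```

Because the transitions are deterministic, the Bellman residual is exactly zero for the true Q-function of each intermediate policy, so this is the classical Hewer iteration in disguise and converges to the optimal discrete-time LQR gain.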
  • the embodiments of the present disclosure are based on artificial intelligence technology, such as reinforcement learning and ADP technology, and utilize policy iteration, numerical iteration or whole-body dynamics control technology to solve the problem of optimal balance control by the robot controller in the case of unknown dynamic characteristics of the robot.
  • the process of building the controller in the embodiment of the present disclosure only requires the wheel-legged robot to travel for a period of time or along a trajectory under the control of a non-optimal controller or an arbitrary controller, and to collect the motion state data and control data corresponding to that time period or trajectory as training data. Therefore, the amount of training data in the embodiments of the present disclosure is much smaller than the amount of data required by traditional reinforcement learning algorithms.
  • the controller trained in the embodiments of the present disclosure gradually converges to the controller corresponding to the optimal solution of the linear quadratic regulation problem as the learning step increases, so that the stability of the closed-loop system can be guaranteed; the training process is greatly simplified, and there is no need to impose additional restrictions on the training data, thereby simplifying the design process of the controller of the wheel-legged robot.
  • in the various embodiments of the present disclosure, the data are all collected from real robots, and the control strategies obtained based on these real-robot data are applied directly to the robots, so there is no need to consider the difference between simulated control and real control, which improves controller performance on real robots.
  • Fig. 2 shows an exemplary flowchart of a method for constructing a robot controller according to an embodiment of the present disclosure.
  • the method for constructing a robot controller may include steps S201 to S203.
  • steps S201 to S203 can be performed online or offline, and the present disclosure is not limited thereto.
  • the method of constructing a controller for a robot may optionally be applied to any robot that includes wheel legs including driving wheels.
  • the method of constructing the controller of the robot will be further described by taking the robot 100 shown in FIG. 1 as an example.
  • the robot 100 in FIG. 1 is further marked with reference to FIG. 3 .
  • the complex robot 100 shown in FIG. 1 can be marked in the robot's generalized coordinate system.
  • centers P 1 and P 2 of the drive wheel are shown as two separate points, and those skilled in the art should understand that P 1 and P 2 are substantially the same point.
  • q_{·,·} and τ_{·,·} are used to identify the parameters of each joint of the wheel legs, where q_{·,·} identifies the joint rotation angle and τ_{·,·} identifies the torque of the joint.
  • q_{1,2} identifies the joint rotation angle between the first link of the left wheel leg of the robot and the base;
  • τ_{1,2} identifies the joint torque between the first link of the left wheel leg of the robot and the base;
  • other joint rotation angles and joint torques can be labeled correspondingly.
  • Embodiments of the present disclosure optionally combine whole-body dynamics technology to propose a whole-body dynamics control method based on adaptive dynamic programming.
  • the output of the robot controller calculated based on adaptive dynamic programming is used as a reference parameter of the whole-body dynamics control, thereby improving the overall flexibility of the robot's movement.
  • the embodiments of the present disclosure optionally combine the optimal control technology, and propose a strategy iteration method based on adaptive dynamic programming.
  • when the dynamic parameters of the robot change, the policy iteration method based on adaptive dynamic programming can dynamically iterate the linear balance parameter matrix in the controller to obtain a controller corresponding to the dynamic characteristics of the robot after the parameter change.
  • the controller enables the robot to travel along the target trajectory with the optimal control effect in a balanced state even if the dynamic parameters of the robot change.
  • in step S201, the first controller is used to control the movement of the robot, and motion state data and control data of the robot during movement are acquired.
  • the exact dynamics of the robot are unknown, or only part of the dynamics of the robot can be roughly determined.
  • the dynamics of the robot may involve some variable parameters. Taking the robot 100 as an example, as the robot 100 moves, the height of the center of gravity of the robot 100 may change. In addition, if the robot 100 is used to carry objects, the mass of the robot 100 may also change accordingly. Whether it is the change of the height of the center of gravity or the change of the mass, it may lead to the change of the kinetic energy, potential energy, momentum and other characteristics of the robot during the movement process, thus resulting in the change of the corresponding dynamic model of the robot 100 . Although the present disclosure has only been described by taking the height of the center of gravity and the mass as examples of variable parameters, it is not limited thereto.
  • the first controller can be constructed based on historical motion data of the robot.
  • the processor can obtain historical motion state data and historical control data from the historical motion data of the robot, where the diversity metrics of the historical motion state data and the historical control data are higher than a predetermined threshold; calculate a linear balance parameter matrix in a numerical iteration manner according to the historical motion state data and the historical control data; and construct, based on the linear balance parameter matrix, a first controller for controlling the motion of the robot.
  • the processor can control the robot to move along a predetermined trajectory, and obtain motion state data and control data during the movement.
  • the predetermined trajectory can be roughly estimated based on the structural characteristics, motion characteristics, and dynamic characteristics of the robot to collect motion data of the robot in various motion situations (scenes) so that the diversity of motion state data and control data is high enough.
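The text does not define the diversity metric it mentions. One plausible proxy, shown here purely as a hypothetical illustration, is a persistency-of-excitation check: the smallest singular value of the stacked state/control data matrix, which is near zero when some direction of the state-control space was never excited:

```python
import numpy as np

rng = np.random.default_rng(1)

def diversity_metric(states, controls):
    # Hypothetical stand-in for the "diversity metric" mentioned above:
    # the smallest singular value of the stacked [state | control] data
    # matrix. A value near zero means some direction was never excited,
    # and the least-squares step of policy iteration would be singular.
    D = np.hstack([states, controls])
    return np.linalg.svd(D, compute_uv=False).min()

# rich data: varied states and exploratory control inputs
rich_x = rng.standard_normal((100, 3))
rich_u = rng.standard_normal((100, 1))
rich_score = diversity_metric(rich_x, rich_u)

# poor data: the robot held a single operating point the whole time
poor_x = np.tile([0.1, 0.0, 0.2], (100, 1))
poor_u = np.full((100, 1), 0.5)
poor_score = diversity_metric(poor_x, poor_u)
```

The rich data set scores well above zero while the single-operating-point data set is numerically rank deficient, so a threshold on such a score can gate whether the collected data are usable for training.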
  • Controlling robot motion can be achieved by determining the control torques used to control each joint of the robot's wheel legs.
  • the processor adaptively determines, based on historical motion information of the robot, control information for controlling the rotation of the driving wheel, and determines, based on that control information, first control information for controlling the multiple joints; the first control information keeps the robot in balance.
  • second control information for controlling the multiple joints is also determined; the second control information makes the robot move along the target trajectory.
  • then, based on the motion constraints of the robot, the first control information and the second control information determine the control torque of each joint in the wheel leg of the robot, so that each joint can be driven based on the control torque to control the movement of the robot.
  • the linear balance parameter matrix of the first controller may be a stable initial gain matrix.
  • the control force provided by the controller at a given moment is the negative of the product of the linear balance parameter matrix and the motion state data of the robot at that moment (that is, a control law of the form u = -Kx).
  • the robot 100 at least includes: a wheel leg portion including a plurality of joints, a base portion connected to the wheel leg portion, and a driving wheel for controlling the wheel leg portion.
  • the motion state data includes: the pitch angle of the base part, the pitch angular velocity of the base part, and the linear velocity of the driving wheel.
  • the control data includes: the output torque of the drive motor.
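Putting the state and control definitions above together, a minimal sketch of a linear balance control law of the form u = -Kx over this three-dimensional state follows; the gain values are invented for illustration and are not the robot's:

```python
# u = -K x, where the state follows the text (base pitch angle, base pitch
# angular velocity, driving-wheel linear velocity) and u is the output
# torque of the drive motor. The gain values in K are hypothetical.

K = [-40.0, -6.0, -2.5]   # hypothetical linear balance parameter matrix (1 x 3)

def wheel_torque(pitch, pitch_rate, wheel_velocity):
    x = (pitch, pitch_rate, wheel_velocity)
    return -sum(k * xi for k, xi in zip(K, x))   # u = -K x

# base leaning forward: the controller drives the wheel under the base
tau = wheel_torque(pitch=0.05, pitch_rate=0.1, wheel_velocity=0.0)
```

With these illustrative gains, a forward lean produces a positive wheel torque that moves the wheel back under the center of gravity, which is the basic balancing action of the wheel-legged robot.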
  • both the motion state data and the control data can be collected discretely by relevant measuring instruments, and both correspond to multiple discrete moments or multiple consecutive time intervals.
  • the first controller may be a non-optimal controller.
  • a non-optimal controller is, for example, a controller that can only make the robot 100 move haltingly along a target trajectory.
  • the first controller that is a non-optimal controller may be a controller corresponding to simplified dynamic characteristics.
  • for example, the accurate dynamic model corresponding to a complex wheel-legged robot can be simplified to an equivalent dynamic model consisting only of the driving wheels and a base, and so on.
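The "driving wheels plus base" equivalent model is essentially a wheeled inverted pendulum. A linearized sketch with illustrative parameters (not the robot's; the input coupling is simplified to a unit coefficient) shows why active balance control is needed at all: the upright equilibrium is open-loop unstable:

```python
import numpy as np

# Linearized "driving wheel + base" equivalent model about the upright
# equilibrium (a standard wheeled-inverted-pendulum simplification).
# All parameter values are illustrative.
g, l = 9.81, 0.6              # gravity, base height (cf. the 0.6 m experiments)

# state x = [pitch, pitch rate], input u = wheel torque (unit coupling)
A = np.array([[0.0, 1.0],
              [g / l, 0.0]])  # pitch dynamics: theta_dd = (g / l) * theta + u
B = np.array([[0.0], [1.0]])

eigs = np.linalg.eigvals(A)
unstable = eigs.real.max() > 0   # positive pole: upright pose falls over
```

The unstable pole sits at +sqrt(g/l), roughly 4 rad/s for these numbers, so any controller (first or second) must actively stabilize the pitch dynamics.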
  • the first controller may be used to control the robot to move in a quasi-balanced state. For example, under the control of some first controllers, the robot will swing left and right about the balance point within a certain range. If a robot with unknown dynamic characteristics is controlled by the first controller to move in a quasi-balanced state, the output of the first controller can be used as the control data. If an experimenter uses a remote controller to control the robot to move, the control data can be obtained by collecting the output of the controller on the real robot (for example, by detecting the driving force of the driving wheel). The present disclosure does not limit the manner of obtaining the control data.
  • the first controller may also be a PID controller or the like.
  • the linear balance parameter matrix of the first controller can even be an arbitrary stabilizing control gain matrix.
  • the robot can even be directly controlled to travel a certain distance with arbitrary control data, and the control data and motion state data recorded before the robot completely loses balance (for example, falls) can be taken as the motion state data and control data obtained in step S201.
  • the present disclosure does not limit the specific design of the first controller, as long as it can control the robot without completely losing balance.
  • one or more first controllers obtained using a numerical iteration solution may also be used to control the robot to move, and the specific implementation of this solution will be described in detail later.
  • a numerical iteration scheme may be used to determine the first controller.
  • using a numerical iteration scheme to determine the first controller can be realized offline. For example, let the variable parameter be the height of the robot, let the first value of the variable parameter be 0.38 meters, and let the second value be 0.5 meters. When the height of the robot 100 is 0.38 meters, the optimal controller for controlling the robot to walk in a straight line can be determined using the numerical iteration scheme, and this controller can be used as the first controller.
• the height of the robot 100 is then adjusted to 0.5 meters. The first controller continues to be used to control the height-adjusted robot to travel for a period of time or a certain distance, and the corresponding motion state data and control data are collected. Subsequently, the motion state data and control data are used as the training data of the second controller, so as to obtain the optimal controller when the variable parameter takes the second value.
• the first value and the second value are just examples, and the present disclosure is not limited thereto.
• although the numerical iteration scheme can determine a first controller suitable for a robot with a height of 0.38 meters, the second controller suitable for the robot after a height change has to be recalculated whenever the height of the robot changes. Offline calculation can be time-consuming and can interrupt the robot's motion.
• in steps S202 to S203, a second controller can be constructed by adopting a policy iteration scheme.
• in step S202, the linear balance parameter matrix of the first controller is updated in a policy iteration manner according to the motion state data and the control data.
• in step S203, based on the updated linear balance parameter matrix, a second controller corresponding to the dynamic characteristics of the robot is constructed.
  • the robot under the control of the second controller may have a better control effect during the motion process than the robot under the control of the first controller.
  • the swing range of the robot around the equilibrium point under the second controller corresponding to the dynamic characteristics of the robot may be smaller than that under the first controller.
• compared with the robot under the control of the first controller, the robot under the control of the second controller can converge to the vicinity of the balance point faster during movement, vibrate less, respond faster, or have a smaller overshoot or a smaller steady-state error, and so on.
• alternatively, the second controller and another controller may have equivalent control effects while the control input of the second controller is smaller. The present disclosure is not limited in this regard.
  • a traveling robot 100 is taken as an example for illustration.
  • the robot 100 in a balanced state may be in a stable balanced state in the linear motion dimension and the rotational motion dimension.
  • the robot 100 in a balanced state can maintain the same or a very similar state to the state defined by the balance point during the movement, or can return to the state of balance with the fastest speed or the minimum energy consumption during the movement.
  • the state defined by the balance point may be such that the robot 100 is in a state where the pitch angle is zero, the angular velocity corresponding to the pitch angle is zero, and the linear velocity is at the target velocity.
  • the posture of the robot 100 is in a vertical upward state, and the robot 100 does not have a speed in the rotational motion dimension but only has a target speed in the linear motion dimension, that is, the robot 100 is in a state defined by the balance point.
• the robot 100 in the quasi-balanced state is, during movement, in a state near the state defined by the balance point.
  • the robot 100 in the quasi-equilibrium state may be in an intermediate state from a stable equilibrium state to an unstable equilibrium state in the linear motion dimension and the rotational motion dimension.
  • the robot 100 in a quasi-balanced state may require a large force and torque provided by the driving wheels to ensure that it does not fall down during the movement process.
  • the robot 100 can tilt left and right, and the robot 100 has a speed in the linear motion dimension and also has a speed in the rotational motion dimension, that is, the robot 100 is in a state defined by a quasi-balance point.
• the robot 100 in the quasi-balanced state may also, at certain moments during motion, be in a nearly unstable balanced state in the linear motion dimension or the rotational motion dimension, as long as it can return to a state of normal travel through the driving force of the driving wheels 104.
• the robot 100 in the balanced state can always maintain a vertically upward posture and move in a straight line at a uniform speed; that is, even though this upright posture is an unstable equilibrium, the central axis of the base of the robot 100 can remain perpendicular to the horizontal plane at all times and has no velocity or acceleration in the rotational motion dimension.
  • the base of the robot 100 in a quasi-balanced state may have an inclination angle (pitch angle), and at least one of velocity or acceleration in the rotational motion dimension.
• the robot may first be made to travel for a period of time or along a certain trajectory under the control of the first controller, and the motion state data and control data corresponding to that time period or trajectory are collected as training data.
• through policy iteration, the embodiments of the present disclosure can determine the second controller as the optimal controller.
• embodiments of the present disclosure utilize a data-driven policy iteration scheme to calculate the linear balance parameter matrix and then construct the second controller. The control effect of the second controller will be better than that of the first controller.
  • the constructed second controller is able to converge to a controller corresponding to the optimal solution of the linear quadratic regulation problem.
  • the controller corresponding to the optimal solution of the linear quadratic regulation problem is also the controller corresponding to the exact dynamics of the robot.
  • the controller corresponding to the optimal solution of the linear quadratic regulation problem can minimize the cost functional of the robot in the process of motion, so that the robot can travel along the target trajectory with the optimal control effect in the balanced state.
  • the policy iteration scheme and the calculation scheme of the linear balance parameter matrix will be further described later on.
  • the amount of training data in the embodiments of the present disclosure is much smaller than the amount of data required by traditional reinforcement learning algorithms.
• the controller trained in the embodiments of the present disclosure gradually converges to the controller corresponding to the optimal solution of the linear quadratic regulation problem as the learning step increases, so that the stability of the closed-loop system can be guaranteed. The training process is greatly simplified and does not require additional constraints on the training data, thus simplifying the design process of the robot's controller.
  • the embodiments of the present disclosure can directly collect data on a real robot, and the controller obtained through training can be directly applied to the real robot.
  • the embodiments of the present disclosure do not need to collect data in a simulator based on a physics engine, and also avoid some problems caused by the migration of data in the virtual world to the real world.
• q can be used to characterize the generalized coordinates of the robot.
• the generalized coordinate parameters of the robot include the pose of the base and the n_j joint angles.
• from q, the generalized velocity set q̇ and the generalized acceleration set q̈ of the robot joints can be determined. Those skilled in the art should understand that q̇ and q̈ represent the instantaneous angular velocity and instantaneous angular acceleration of the robot body, respectively.
• M(q) is used to represent the mass matrix of the robot; C(q, q̇) is used to represent the gravitational, centrifugal, and Coriolis terms of the robot.
• the matrix S is used to select the active joints from all the joints: if an element of S is 0, the corresponding joint is a non-driven joint; if the element is not 0, the corresponding joint is an active joint.
  • f is the generalized force provided by the ground at the point of contact when the robot is in contact with the ground.
• J_f is the concatenated contact Jacobian matrix corresponding to f.
• λ is the closed-loop force of the front leg acting on the rear leg.
• J_λ is the concatenated contact Jacobian matrix corresponding to λ.
• n_c is the number of contact points between the driving wheels and the ground.
• n_λ is the number of contact points between the open-loop links.
  • the wheel legs of the robot are five-bar linkages.
• the number of contact points between the open-loop links under the closed-loop constraint of the five-bar linkage mechanism is two.
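Putting the symbols defined above together, formula (1) plausibly takes the standard form of a constrained rigid-body dynamic model. This is a sketch consistent with the terms listed above; writing the gravitational, centrifugal, and Coriolis term as C(q, q̇) is a notational assumption.

```latex
M(q)\,\ddot{q} + C(q,\dot{q}) = S^{\top}\tau + J_f^{\top} f + J_{\lambda}^{\top} \lambda
```

Here M(q) is the mass matrix, S selects the active joints, f is the generalized ground contact force with concatenated contact Jacobian J_f, and λ is the closed-loop force with concatenated Jacobian J_λ, matching the definitions given for formula (1).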
• FIG. 4 shows a control architecture diagram of a robot according to an embodiment of the present disclosure. Specifically, taking the robot marked in FIG. 3 as an example, FIG. 4 shows a plurality of exemplary control tasks of the robot and the associations between these control tasks. The combination and association of these exemplary control tasks is also referred to as the whole-body dynamics control corresponding to the robot.
• FIG. 4 also shows another example, in which measured values are used to estimate the motion state of the robot, and the state estimate is then input to the data processing module for adaptively determining the control information for controlling the rotation of the driving wheels, so that the data processing module can learn from the measured values at each moment more quickly and thus more efficiently calculate the optimal controller for controlling the rotation of the driving wheels.
  • control information for controlling the rotation of the driving wheel can be either the acceleration of the driving wheel or the torque of the driving wheel.
• the whole-body dynamics control corresponding to the robot can be described as controlling the joints with the aim of minimizing the total input energy of each joint and minimizing the error between the actual trajectory and the target trajectory while ensuring the balance of the robot.
• the whole-body dynamics control objective argmin_z for the robot marked in FIG. 3 can be expressed by formula (2).
• τ_des is a vector formed by combining the torques set for each joint in the target trajectory.
• τ is a vector formed by combining the torques of each joint during the actual motion process.
  • f is the generalized force provided by the ground at the contact point when the robot is actually in contact with the ground.
• λ is the closed-loop force of the front leg acting on the rear leg during the robot's movement.
• W_q, W_τ, W_f, and W_λ identify the weight coefficient matrices by which q̈, τ, f, and λ are respectively multiplied when calculating the norms of formula (2).
• the controller determined by adaptive dynamic programming will be used to control the driving wheels shown in FIG. 1 and FIG. 3.
• the motion state and power state of the driving wheels will correspondingly provide an input reference or input limit for each control task, so as to change the posture and balance state of the robot.
  • the active joints e.g., q ⁇ 1,2 ⁇ and q ⁇ 7,8 ⁇
  • the active joints e.g., q ⁇ 3,4 ⁇ and q ⁇ 9,10 ⁇
  • joint moments e.g., ⁇ ⁇ 1,2 ⁇ and ⁇ ⁇ 5,6 ⁇
• the rotation of the driving wheel under the control of the adaptive dynamic programming controller will provide an input reference (Ref) to at least one of the wheel balance control task and the wheel travel and rotation control task.
  • the target trajectory will provide input references for wheel movement and rotation control tasks, base attitude control tasks, and tail control tasks.
• although the driving wheels and the target trajectory do not directly provide input references to the other control tasks (e.g., the torque control task and the external force control task), considering that the control tasks often involve the same robot components (e.g., driving wheels, connecting rods, joint hinges, etc.), the control effects of these control tasks are often limited by the driving wheels and the target trajectory.
  • the movement of the robot is also limited by various constraints, for example, the maximum torque that each joint can provide, the limitation of the mechanical configuration, and so on.
• the constraints include, for example, dynamic constraints, closed-loop linkage constraints, nonholonomic constraints, and friction constraints.
• the dynamic model shown in formula (1) can be used as an example of a dynamic constraint to limit the energy variation range of the robot during motion. Those skilled in the art should understand that the constraints of the dynamic model are not limited thereto.
  • a simplified dynamic model can be established for the robot, so as to simplify the constraints of the dynamic model corresponding to formula (1) in dynamic whole-body dynamic control.
  • formula (3) shows an example of a closed-loop linkage constraint for the robot in FIG. 3 .
  • the closed-loop linkage constraint can also be shown in other ways. The present disclosure is not limited thereto.
• J_{λ,l} and J_{λ,r} identify the left and right wheel legs of the robot, respectively.
• formula (4) shows an example of a nonholonomic constraint for the robot in FIG. 3.
  • non-holonomic constraints can also be shown in other ways.
• the friction constraint can be set based on the assumption that the friction cone at the contact point between the ground and the robot during actual motion is approximated as a pyramidal friction cone.
• the friction constraint can then be expressed, for example, as |f_{i,x}| ≤ μf_{i,z} and |f_{i,y}| ≤ μf_{i,z}, where μ is the friction coefficient.
  • one-sided constraints can also be set correspondingly.
• an example of a one-sided (unilateral) constraint could be f_{i,z} > 0.
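The pyramidal friction-cone approximation and the unilateral constraint above can be sketched as a simple feasibility check. The function name and the friction coefficient value are illustrative assumptions, not part of the patent's scheme.

```python
# Sketch: checking the pyramidal friction-cone approximation and the
# unilateral constraint f_z > 0 for a single contact point.  The bound
# |f_x|, |f_y| <= mu * f_z is the pyramid approximation of the cone.

def contact_force_feasible(fx, fy, fz, mu):
    """Return True if the contact force lies inside the friction pyramid."""
    if fz <= 0.0:          # unilateral constraint: the ground can only push
        return False
    # each tangential component is bounded separately by mu * f_z
    return abs(fx) <= mu * fz and abs(fy) <= mu * fz

# a mostly vertical force is feasible, a strongly tangential one is not
print(contact_force_feasible(1.0, 0.5, 10.0, 0.6))   # True
print(contact_force_feasible(9.0, 0.0, 10.0, 0.6))   # False
```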
  • control models of various control tasks can be determined correspondingly.
  • the rotation of the active wheel under the control of the adaptive dynamic programming controller will provide the input reference to the wheel balance control task, while the target trajectory will provide the input reference to the other control tasks.
  • the rotation speed of the driving wheel will affect the posture and speed of the base part, and the posture and speed of the base part will affect the balance state of the robot.
• the desired acceleration of the base part can be calculated by a PD (proportional-derivative) control law.
  • at least part of the PD control law is derived based on an input reference for attitude and an input reference for velocity.
• the input reference for the pose is also called the reference pose, which indicates the change in the pose of each joint other than joint q_{5,6} caused by the rotation of the driving wheel under the control of the adaptive dynamic programming controller.
  • the input reference for velocity is also referred to as reference velocity, which indicates the change in velocity of each joint except joint q ⁇ 5,6 ⁇ due to the rotation of the driving wheel under the control of the adaptive dynamic programming controller.
• formula (5) can be used as an approximate expression of formula (2).
• formula (6) can be used to further refine the approximate expression of formula (2), under the assumption that in formula (6) the moments of the joints other than joint q_{5,6} are approximately zero.
• the input reference for attitude includes: the distance from the projection on the ground of the robot's center of gravity to the center of the driving wheel line (for example, identified as state_com_p).
• the input reference for velocity includes: the velocity obtained from the difference of the distance from the robot's center of gravity to the center of the driving wheel line projected on the ground (for example, identified as state_com_v), and the linear velocity of the driving wheels (identified as wheel_x_v).
  • the aforementioned PD control law can use state_com_p, state_com_v, and wheel_x_v as input states to obtain at least one of the reference acceleration or reference torque of the driving wheel.
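A minimal sketch of how such a PD law might map the input states state_com_p, state_com_v, and wheel_x_v to a reference acceleration of the driving wheel. The gains kp, kd, kv, the target values, and the way wheel_x_v enters the law are illustrative assumptions, not the patent's actual control law.

```python
# Sketch of a PD control law on the centre-of-gravity offset, with an
# additional damping term on the wheel's own linear velocity.  All gain
# values and the target states are assumptions for illustration.

def wheel_reference_acceleration(state_com_p, state_com_v, wheel_x_v,
                                 target_p=0.0, target_v=0.0,
                                 kp=40.0, kd=8.0, kv=2.0):
    """Return a reference acceleration for the driving wheel."""
    acc = kp * (target_p - state_com_p)   # proportional term on position
    acc += kd * (target_v - state_com_v)  # derivative term on velocity
    acc -= kv * wheel_x_v                 # damp the wheel's linear velocity
    return acc

# centre of gravity leaning forward -> decelerating reference
print(wheel_reference_acceleration(0.05, 0.1, 0.2))
```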
  • Embodiments of the present disclosure optionally combine whole-body dynamics technology to propose a whole-body dynamics control method based on adaptive dynamic programming.
• the whole-body dynamics control method uses the output of the controller of a certain joint of the robot, calculated based on adaptive dynamic programming, as a reference parameter for the whole-body dynamics control, so that the controller of this joint can cooperate with the controllers of other joints, thereby improving the overall flexibility of the robot's movement.
  • embodiments of the present disclosure also correspondingly disclose a method for robot motion control using a controller corresponding to the dynamic characteristics of the robot.
  • the robot includes a wheel leg portion and a base portion connected to the wheel leg portion, the wheel leg portion including driving wheels and at least one joint.
• the method includes: receiving a motion command from the second controller, the motion command indicating the motion trajectory of the robot; and controlling, according to the motion command, the driving force of the driving wheels using the controller corresponding to the dynamic characteristics of the robot, so that the robot moves smoothly along the target trajectory.
  • the robot under the control of the controller corresponding to the dynamics of the robot is closer to the equilibrium point during the movement than the robot under the control of the first controller.
  • the embodiments of the present disclosure also correspondingly disclose a method for controlling a robot.
• the method includes: receiving a motion command from the first controller, the motion command indicating the motion trajectory of the robot; controlling the driving force of the driving wheels according to the motion command, so that the robot moves under the control of the first controller; acquiring the motion state data and control data during the motion process; based on the motion state data and control data, using policy iteration to build a second controller corresponding to the dynamic characteristics of the robot; and using the second controller to control the driving force of the driving wheels to make the robot move smoothly.
• the robot under the control of the second controller has a better control effect during motion, for example, it is closer to the balance point than the robot under the control of the first controller.
• the method for controlling a robot in the embodiments of the present disclosure enables a robot with unknown dynamic characteristics to learn from data during movement, gradually improve/generate a controller corresponding to its dynamic characteristics, and finally achieve smooth movement. Since the control input of the first controller can be used to control the robot to move for a period of time to obtain training data, the embodiments of the present disclosure realize the refinement of a non-optimal controller, generating a second controller corresponding to the (exact) dynamic characteristics of the robot. That is, the embodiments of the present disclosure make it possible to flexibly control the robot without an accurate dynamic model.
  • the motion process of the robot 100 can be regarded as a continuous time linear system in mathematics. Assume that there is a controller corresponding to the optimal solution of the linear quadratic regulation problem for the robot 100, which can minimize the cost functional corresponding to the motion process of the robot. For example, a controller corresponding to the optimal solution of a linear quadratic regulation problem can minimize the cost of the robot being near the equilibrium point and can travel along the target trajectory with minimum energy consumption.
• the linear quadratic regulation problem can be defined by formula (7), which indicates that, subject to the system dynamics constraint, the controller that minimizes the cost functional J of the continuous-time linear system is to be solved for.
• J is the cost functional of the continuous-time linear system.
• Q is a real symmetric and positive semidefinite matrix, chosen so that the corresponding system pair is observable, and R > 0.
• x is related to the robot configuration and the wheel balancing task. For example, referring to the example in FIG. 4, if the controller needs to be determined for the driving wheels, then x optionally includes the pitch angle, the pitch angular velocity, and the linear velocity of the robot, and u is the sum of the input torques of the two wheels.
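The linear quadratic regulation problem described above takes, in standard form, the following shape. This is a reconstruction consistent with the description of formula (7); the symbols follow the text (x is the state, u the input, Q ⪰ 0, R > 0).

```latex
\min_{u}\; J = \int_{0}^{\infty} \left( x^{\top} Q x + u^{\top} R u \right)\,\mathrm{d}t
\quad \text{s.t.} \quad \dot{x} = A x + B u,
```

with the optimal controller given by $u^{*} = -K^{*}x$, $K^{*} = R^{-1}B^{\top}P^{*}$, where $P^{*}$ solves the algebraic Riccati equation $A^{\top}P + PA - PBR^{-1}B^{\top}P + Q = 0$. The data-driven scheme of the present disclosure approximates $K^{*}$ without knowing A and B.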
• the above-mentioned step S202 may further include: performing a nonlinear combination of the motion state data and control data corresponding to multiple time intervals to determine a training data set; determining an iterative relation function based on the training data set; and, according to the iterative relation function, performing multiple policy iterations on the iterative target item to approximate the linear balance parameter matrix corresponding to the dynamic characteristics of the robot.
  • step S202 is described by using the examples described in FIGS. 1 to 4 .
  • the first controller u 0 may be used to control the robot to move, and collect motion state data and control data corresponding to multiple time intervals.
  • the closed-loop system can be expressed as formula (11).
  • the motion state data is collected by the sensor at certain time intervals during a period of time, which respectively correspond to the motion state of the robot at each discrete moment within a period of time. Therefore, the motion state data and the control data of the first controller may correspond to multiple time intervals in [t 0 ,t r ]. Any one of the multiple time intervals from t to t+ ⁇ t can be recorded as [t,t+ ⁇ t], and its duration ⁇ t can be determined according to the data collection time interval that the robot sensor can achieve.
  • the motion state data and control data corresponding to multiple time intervals can be combined nonlinearly to construct an iterative relationship function.
• the motion state data and control data after the integral calculation will be used as training data and participate in the policy iteration on the iterative target item in step S202, so as to obtain the linear balance parameter matrix corresponding to the dynamic characteristics of the robot. It should be noted that what is described below is only an exemplary integral operation, and the present disclosure is not limited thereto.
  • an exemplary formula (13) can be determined.
  • formula (13) can be iterated over multiple time intervals.
• the time integral of the motion state data between any two adjacent moments t and t+Δt may be related to at least one of the following items: the quadratic term of the motion state data at time t, the quadratic term of the motion state data at time t+Δt, the product of the motion state data at time t and the motion state data at time t+Δt, the product of the control data at time t and the motion state data at time t, and the product of the control data at time t+Δt and the motion state data at time t+Δt, and so on.
  • the control data at time t is the control data for using the first controller to control the movement of the robot.
• the embodiment of the present disclosure optionally defines, with formula (14), the following three matrices as example elements in the training data set: the first matrix Δ_xx, the second matrix Γ_xx, and the third matrix Γ_xu.
• each of the first matrix Δ_xx, the second matrix Γ_xx, and the third matrix Γ_xu corresponds to a nonlinear combination of the motion state data and control data over multiple time intervals, for example involving integral operations and product calculations.
• any element in the first matrix Δ_xx is the difference, between time t_i and time t_i+Δt, of the product of any two of the pitch angle of the base, the pitch angular velocity of the base, and the linear velocity of the driving wheel, or of the quadratic term of any one of them.
• any element in the second matrix Γ_xx is the integral, between time t_i and time t_i+Δt, of the product of any two of the pitch angle of the base, the pitch angular velocity of the base, and the linear velocity of the driving wheel, or of the quadratic term of any one of them.
• any element in the third matrix Γ_xu is the integral, between time t_i and time t_i+Δt, of the product of any one of the pitch angle of the base, the pitch angular velocity of the base, and the linear velocity of the driving wheel with the driving force controlled by the first controller.
  • the configurations of different robots will correspond to different matrices, and the above is only shown as an example, and the present disclosure is not limited thereto.
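The nonlinear combinations described above (differences of quadratic terms, integrals of quadratic terms, and integrals of state-input products) can be sketched for one sampling interval as follows. The exact element layout of the three matrices is an assumption for illustration; the trapezoidal rule matches the integration scheme mentioned later in the text.

```python
# Sketch: assembling one row of the training-data matrices from sampled
# motion-state data x(t) = [pitch angle, pitch angular velocity, wheel
# linear velocity] and scalar control data u(t) over one interval
# [t, t+dt].  Row layout and names are illustrative assumptions.

def kron_sym(x):
    """All products x_i * x_j (i <= j) of a state vector."""
    return [x[i] * x[j] for i in range(len(x)) for j in range(i, len(x))]

def trapezoid(f0, f1, dt):
    """Trapezoidal rule applied elementwise over one sampling interval."""
    return [0.5 * (a + b) * dt for a, b in zip(f0, f1)]

def interval_rows(x0, x1, u0, u1, dt):
    # first matrix: difference of the quadratic terms of x over the interval
    delta_xx = [b - a for a, b in zip(kron_sym(x0), kron_sym(x1))]
    # second matrix: integral of the quadratic terms of x
    gamma_xx = trapezoid(kron_sym(x0), kron_sym(x1), dt)
    # third matrix: integral of the products x_i * u
    gamma_xu = trapezoid([xi * u0 for xi in x0], [xi * u1 for xi in x1], dt)
    return delta_xx, gamma_xx, gamma_xu

d, gxx, gxu = interval_rows([0.1, 0.0, 0.5], [0.12, 0.01, 0.5],
                            0.2, 0.25, 0.01)
print(len(d), len(gxx), len(gxu))   # 6 6 3
```

Stacking one such row per sampling interval yields the data matrices used by the policy iteration.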
• vec(·) indicates that the content inside the brackets is vectorized; the remaining terms of the iterative relation can be defined as shown in formula (16).
• k indicates the number of policy iterations.
• P_k is the solution of the Lyapunov equation in the k-th policy iteration.
• K_k is the linear balance parameter matrix used in the k-th policy iteration.
• K_{k+1} is the linear balance parameter matrix in the (k+1)-th policy iteration.
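The iteration just described (solve a Lyapunov equation for P_k under the current gain K_k, then update the gain) can be illustrated on a scalar system. The document's scheme estimates these quantities from collected data; here a known scalar model dx/dt = a·x + b·u with running cost q·x² + r·u² is assumed purely for illustration.

```python
# Scalar illustration of Kleinman-style policy iteration: each step
# solves the Lyapunov equation for p_k, then updates the gain via
# K_{k+1} = R^{-1} B^T P_k.  Model knowledge (a, b) is assumed here,
# unlike the data-driven scheme of the text.
import math

def policy_iteration_scalar(a, b, q, r, k0, tol=1e-10, max_iter=100):
    k = k0                                # initial stabilizing gain
    for _ in range(max_iter):
        a_cl = a - b * k                  # closed-loop dynamics
        assert a_cl < 0, "the current gain must be stabilizing"
        # scalar Lyapunov equation: 2*a_cl*p + q + r*k^2 = 0
        p = (q + r * k * k) / (-2.0 * a_cl)
        k_next = b * p / r                # gain update
        if abs(k_next - k) < tol:
            return k_next
        k = k_next
    return k

k_star = policy_iteration_scalar(a=1.0, b=1.0, q=1.0, r=1.0, k0=2.0)
# for this system the analytic LQR gain is 1 + sqrt(2)
print(abs(k_star - (1.0 + math.sqrt(2.0))) < 1e-8)   # True
```

The fast, monotone convergence of this iteration (given a stabilizing initial gain) is what the text relies on when it notes that only tens of iterations are needed.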
  • FIG. 6 shows another structural view of the robot 100 .
  • Fig. 7A shows motion state data and control data in the process of controlling the motion of the robot by using the first controller.
  • FIG. 7B shows the convergence process of the linear balance parameter matrix in the process of constructing the controller corresponding to the dynamic characteristics of the robot, where the height of the base portion of the robot is 0.5 m and 0.6 m, respectively.
  • FIG. 7C shows the motion state data of the robot using the first controller and the second controller to control the robot to walk straight when the height of the base part is 0.6 meters.
  • the robot 100 includes a data acquisition device, a data processing device and a driving motor in addition to the wheel legs and the base described in FIGS. 1 to 4 .
  • the data collection device may be configured to: acquire motion state data and control data during the motion process when the first controller is used to control the motion of the robot.
• the data collection device may include: a first sensor for measuring the pitch angle θ of the base part and the corresponding angular velocity;
• and a second sensor for measuring the rotational angular velocities of the left and right driving wheels.
  • the first sensor may be an inertial measurement unit (IMU for short), which may include a three-axis gyroscope, a three-axis accelerometer, or a three-axis magnetometer.
  • the second sensor may be a motor encoder with a sampling frequency of 200Hz.
• the data processing device is configured to: update the linear balance parameter matrix of the first controller in a policy iteration manner according to the motion state data and the control data; and construct, based on the updated linear balance parameter matrix, a second controller corresponding to the dynamic characteristics of the robot.
• the data processing device may include a microprocessor, a digital signal processor ("DSP"), an application-specific integrated circuit ("ASIC"), a field-programmable gate array, a state machine, or another processing device for processing electrical signals received from the sensors.
• the processing device may include programmable electronic devices such as PLCs, programmable interrupt controllers ("PICs"), programmable logic devices ("PLDs"), programmable read-only memories ("PROMs"), electronically programmable read-only memories, and the like.
  • the present disclosure only gives an example of using the first controller or the second controller to control the driving wheel 104 , and those skilled in the art should understand that the solutions of the present disclosure can also be used to control other components of the robot.
• since the driving wheels are only used to control the forward and backward motion of the robot, for curved target trajectories a controller for the yaw angle is also required to steer the robot.
• the controller that controls the yaw angle can be set as a feedback law on the yaw rate, where the target yaw rate is given; the torques of the left and right wheels are then calculated from the balance torque and the yaw torque τ_δ. Because τ_δ does not change the force along the longitudinal direction of the robot, the yaw motion does not affect the balance of the robot. Hereafter, angular units are converted to degrees (deg) for readability.
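A sketch of this torque distribution: a proportional law on the yaw-rate tracking error produces a differential torque that is split with opposite signs between the two wheels, so the sum (the longitudinal, balance-relevant part) is unchanged. The gain k_yaw and the sign convention are illustrative assumptions.

```python
# Sketch: splitting the balance torque and the yaw torque between the
# left and right driving wheels.  Because tau_delta enters the two
# wheels with opposite signs, the net longitudinal torque (and hence
# the balance) is unaffected by the yaw motion.

def wheel_torques(tau_balance, yaw_rate, yaw_rate_des, k_yaw=0.5):
    tau_delta = k_yaw * (yaw_rate_des - yaw_rate)   # differential torque
    tau_left = 0.5 * tau_balance - 0.5 * tau_delta
    tau_right = 0.5 * tau_balance + 0.5 * tau_delta
    return tau_left, tau_right

tl, tr = wheel_torques(2.0, 0.0, 0.4)
print(tl + tr)   # the sum equals the balance torque
print(tr - tl)   # the difference is the steering part
```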
  • the data processing device calculates the control data of the first controller based on the given target trajectory.
• the first controller corresponds to the optimal controller, obtained by numerical iteration, that can control the robot 100 to walk upright when the height of the robot is the lowest. Specifically, the minimum height of the robot is 0.33 meters.
  • the control frequency of the data processing device is optionally 1000 Hz.
• the motion state data and control data will be used to calculate the first matrix Δ_xx, the second matrix Γ_xx, and the third matrix Γ_xu.
• these matrices require integrals of the continuous signals x and u, so the data processing device can further use trapezoidal integration to approximate the integrals while the first controller (or the controller corresponding to the dynamic characteristics of the robot) controls the movement of the robot 100.
  • the step size of trapezoidal integration is 0.01s, which is the same as the sampling period.
• exploration noise is often used in the fields of learning and system identification. Exploration noise can excite various system behaviors and avoid collecting repetitive data.
• the exploration noise is ε(t) = sin(10πt) + 0.4cos(6πt).
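The exploration noise given above can be sampled directly; the 0.01 s period below follows the trapezoidal-integration step mentioned in the text.

```python
# The exploration noise from the text: e(t) = sin(10*pi*t) + 0.4*cos(6*pi*t),
# a sum of two sinusoids rich enough to excite the system's behaviors.
import math

def exploration_noise(t):
    return math.sin(10.0 * math.pi * t) + 0.4 * math.cos(6.0 * math.pi * t)

# sample two seconds at the 0.01 s sampling period
samples = [exploration_noise(0.01 * i) for i in range(200)]
print(round(exploration_noise(0.0), 3))   # 0.4 at t = 0
```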
  • the data processing device may be further configured with the following instructions to perform the calculation of the control data of the first controller and the construction of the second controller.
• the instructions are shown in the form of pseudocode, and those skilled in the art can use any programming language to calculate the control data of the first controller and construct the second controller based on that pseudocode.
• as shown in the left graph of FIG. 7B, only 37 iterations are needed to make the variation of the linear balance parameter matrix between successive iterations smaller than 10^-5. Obviously, the convergence speed of policy iteration is very fast, so the embodiments of the present disclosure can be applied to online computing.
• the same sinusoidal noise is added to both the first controller u_0(t) and the second controller u(t) to simulate an external disturbance acting on the wheels.
  • both controllers are robust to noise and have similar control performance.
• the updated gain K achieves a better control effect in regulation, so that the robot's traveling state is more stable.
  • the embodiments of the present disclosure combine the optimal control technology and propose a numerical iterative method based on adaptive dynamic programming.
• the numerical iterative method based on adaptive dynamic programming can calculate and converge to a controller corresponding to the dynamic characteristics of the robot.
  • the controller corresponding to the precise dynamic characteristics of the robot is also the controller corresponding to the optimal solution of the linear quadratic regulation problem, which can make the robot travel along the target trajectory with the optimal control effect in a balanced state.
  • FIG. 8 shows a flow chart of building the first controller based on the historical motion data of the robot.
  • the motion process of the robot 100 can be regarded as a continuous time linear system in mathematics. It is assumed that there is a controller corresponding to the optimal solution of the linear quadratic regulation problem for the robot 100, which can minimize the cost functional corresponding to the motion process of the robot. For example, a controller corresponding to the optimal solution of a linear quadratic regulation problem can minimize the cost of the robot being near the equilibrium point and can travel along the target trajectory with minimum energy consumption.
  • an embodiment of the present disclosure shows a data processing process for constructing a first controller.
  • in step S801, historical motion state data and historical control data are acquired from the historical motion data of the robot, where the diversity metrics of the historical motion state data and historical control data are higher than a predetermined threshold.
  • the robot can be controlled to move along a predetermined trajectory, and historical motion state data and historical control data during historical motion can be obtained.
  • the predetermined trajectory can be roughly estimated based on the robot's structural characteristics, motion characteristics, and dynamic characteristics to collect historical motion data of the robot in various motion situations (scenes) so that the diversity metrics of historical motion state data and historical control data are high enough (eg, at least above a predetermined threshold).
  • the diversity measure can be characterized by information entropy, which indicates that there are enough unique/dissimilar values in both the historical motion state data and the historical control data.
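As a sketch of how an entropy-based diversity measure might be computed, the snippet below scores a signal by the Shannon entropy of its histogram. The bin count and the sample signals are illustrative assumptions; the disclosure does not fix a formula.

```python
import numpy as np

# Illustrative diversity metric: Shannon entropy of a state histogram.
# Bin count and sample data are assumptions, not values from the disclosure.
def histogram_entropy(samples, bins=20):
    counts, _ = np.histogram(samples, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))

rng = np.random.default_rng(0)
h_diverse = histogram_entropy(rng.uniform(-1.0, 1.0, 5000))  # wide state coverage
h_repetitive = histogram_entropy(np.full(5000, 0.3))         # stuck near one state
```

A trajectory that covers many distinct states yields high entropy, while a robot held near a single state yields entropy close to zero, matching the "enough unique/dissimilar values" criterion above.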
  • the diversity measure can also be characterized by data feature quantities.
  • any controller can be used to control the robot to move along a predetermined trajectory.
  • the robot can be manually controlled to move along a straight line with different accelerations, regardless of whether the robot is in a state of balanced and stable motion.
  • if the drive wheel 104 provides too much acceleration, the robot 100 will quickly fall backwards; if the drive wheel 104 provides too little acceleration, the destination cannot be reached quickly and the robot may tip forward.
  • historical motion state data and historical control data satisfying the diversity measure may be collected in the following manner.
  • the drive motor may be controlled to output the first torque, so that the robot loses balance due to low-speed motion.
  • the first torque can be a small value, so that when the drive motor is controlled to output the first torque, the center of mass of the base portion of the robot first rises and then lowers, and when the robot loses balance, the front end of the base portion is in contact with the ground. That is, the robot rushes forward from the head-down state (the state where the center of mass of the base portion is low), but because the force of the rush is not large enough, the head is half raised and then lowered.
  • the drive motor can also be controlled to output the second torque, so that the robot loses balance due to high-speed motion.
  • the second torque can be a relatively large value.
  • when the drive motor is controlled to output the second torque, the center of mass of the base portion of the robot first rises and then decreases, and when the robot loses balance, the rear end of the base portion is in contact with the ground. That is, the robot rushes forward from the head-down state (the state where the center of mass of the base portion is low), but due to the excessive force of the rush, it passes the balance point (the highest point of the center of mass of the base portion) and falls backward.
  • the drive motor can also be controlled to output a third torque, so that the robot maintains a balanced state for a period of time.
  • when the drive motor is controlled to output the third torque, the center of mass of the base portion of the robot remains at a constant height while the robot maintains a balanced state.
  • control the drive motor to output the fourth torque so that the robot maintains a quasi-balanced state for a period of time, and the robot in the quasi-balanced state is near the balance point during motion.
  • when the driving motor is controlled to output the fourth torque, the base portion of the robot sways back and forth while the robot maintains a quasi-balanced state.
  • commands can be entered manually at the remote controller and sent to the robot.
  • the remote controller can determine its corresponding control data.
  • the movement of the robot can be controlled, and the movement state data during the movement can be obtained.
  • the remote controller is not a kind of balance controller, which often causes the robot to lose its balance.
  • in step S802, according to the historical motion state data and the historical control data, the linear balance parameter matrix is calculated by means of numerical iteration.
  • in step S803, a first controller for controlling the motion of the robot is constructed based on the linear balance parameter matrix. Here, the robot under the control of the controller corresponding to the dynamic characteristics of the robot has a better control effect during the movement process than the robot under the control of the remote controller.
  • the controller corresponding to the dynamic characteristics of the robot is a linear controller, and for each moment in the motion process, the control torque provided by the controller corresponding to the dynamic characteristics of the robot is the negative of the product between the linear balance parameter matrix and the motion state data of the robot.
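The linear control law just described can be written in a couple of lines. The state layout and gain values below are illustrative assumptions, not parameters identified in the disclosure:

```python
import numpy as np

# Minimal sketch of the linear control law u(t) = -K x(t) described above.
# State layout [pitch, pitch rate, wheel speed] and gains are illustrative.
K = np.array([[-10.0, -1.5, -2.0]])

def control_torque(x):
    """Drive-motor torque: negative of the product of gain matrix and state."""
    return -K @ x

x = np.array([0.05, 0.1, 0.0])  # small forward pitch, slow pitch rate
u = control_torque(x)
```

At each control tick the current motion state is measured and the torque is recomputed from the same fixed gain matrix, which is what makes updating K (rather than the control law's structure) sufficient to adapt the controller.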
  • step S802 in FIG. 8 may further include: performing integration operations on historical motion state data and historical control data over multiple time intervals to construct an iterative relational function; and performing numerical iterations on the iterative relational function to obtain a linear balance parameter matrix corresponding to the dynamic characteristics of the robot.
  • the linear balance parameter matrix K is obtained in the limit as s tends to negative infinity.
  • the historical exercise state data and the historical control data may be the exercise state data for training and the control data for training.
  • the motion state data used for training and the control data used for training are the historical motion state data and control data at moments when the robot has not overturned (for example, when the front end/rear end of the base portion or the tail portion is not in contact with the ground). That is, at least during this segment of motion, based on formula (18), the following formula (19) holds true.
  • the historical motion state data is collected by the sensor at certain time intervals during a period of time, which respectively correspond to the historical motion state of the robot at each discrete moment within a period of time. Therefore, the historical motion state data and the historical control data of the first controller may correspond to multiple time intervals in [t 0 ,t r ]. Any one of the multiple time intervals t i to t i+1 can be recorded as [t, t+ ⁇ t], and its duration ⁇ t can be determined according to the data collection time interval that the robot sensor can achieve.
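The per-interval integral operations over [t, t+Δt] just described can be sketched as follows. The names delta_xx, I_xx, I_xu follow common ADP notation and are assumptions, as are the sampled signals; the integrals are approximated by the trapezoidal rule.

```python
import numpy as np

# Sketch of the per-interval quantities built from sampled x(t) and u(t):
# the change in x⊗x across one interval and trapezoidal integrals of
# x⊗x and x⊗u over it. Names and signals are illustrative assumptions.
def interval_integrals(t, x, u):
    xx = np.array([np.kron(xi, xi) for xi in x])
    xu = np.array([np.kron(xi, ui) for xi, ui in zip(x, u)])
    dt = np.diff(t)[:, None]
    delta_xx = xx[-1] - xx[0]
    I_xx = (0.5 * (xx[1:] + xx[:-1]) * dt).sum(axis=0)
    I_xu = (0.5 * (xu[1:] + xu[:-1]) * dt).sum(axis=0)
    return delta_xx, I_xx, I_xu

t = np.linspace(0.0, 0.01, 11)            # one 10 ms interval sampled at 1 kHz
x = np.stack([np.cos(t), np.sin(t)], 1)   # fake 2-dim state samples
u = 0.1 * np.ones((11, 1))                # fake 1-dim control samples
d_xx, I_xx, I_xu = interval_integrals(t, x, u)
```

One such triple is produced per interval [tᵢ, tᵢ₊₁]; stacking the triples over all intervals yields the matrices used as training data for the iteration.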
  • the embodiments of the present disclosure construct the first to third matrices as training data only by collecting historical motion state data and historical control data of a robot with unknown dynamic characteristics before it loses balance (falls), and performing an integral operation on these historical data. Therefore, the amount of training data in the embodiments of the present disclosure is much smaller than the amount of data required by traditional reinforcement learning algorithms.
  • embodiments of the present disclosure also correspondingly construct an iterative relational function (for example, formula (20)), so that the target iteration items (for example, P(s), K(s) and H(s)) gradually converge as the learning step increases.
  • from the converged target iteration term, a controller can be obtained that converges to the controller corresponding to the optimal solution of the linear quadratic regulation problem, so that the stability of the closed-loop system can be guaranteed, and the training process is greatly simplified.
  • the whole process does not require additional restrictions on the training data, which simplifies the design process of the robot's controller.
  • the processor can further process the data collected by the data collection device.
  • the present disclosure only gives an example of controlling the driving wheel 104 , and those skilled in the art should understand that the solutions of the present disclosure can also be used to control other components of the robot.
  • the data processing device sets control data for training based on the given target trajectory.
  • the present disclosure is not limited to the specific control law of the controller used for training.
  • an experimenter manually controls the movement of the robot to extract motion state data and control data as an example for illustration.
  • the control frequency of the data processing device is 1000 Hz.
  • the motion state data and control data will be used to calculate δxx, Ixx, and Ixu. These quantities require continuous signals of x and u.
  • the motion data is collected in a manner similar to that of FIG. 7A.
  • the remote controller can be used to input commands manually to determine the data of the robot motion controlled by the remote controller.
  • the experimenters cannot accurately know the dynamic characteristics of the robot 100, and the manual control of the robot often fails to adjust the controller of the robot accurately and in time, causing the robot to fall down.
  • the collected motion state data can also be further processed to obtain a controller corresponding to the dynamic characteristics of the robot as soon as possible.
  • An example experiment using the data iteration scheme to compute the controller corresponding to the robot dynamics is shown below.
  • the height of the robot is the minimum height of 0.33m.
  • the motion command is directly given manually by the remote controller to indicate the torque of the driving wheel.
  • the robot starts from an initial state (shown in state A), moves using the driving wheel (shown in state B and state C), and finally falls down (state D). Since the robot eventually loses its balance, the remote controller in this case is not a balance-type controller.
  • a controller corresponding to the dynamic characteristics of the robot is constructed.
  • the controller is used to control the real robot to travel in the path shown in Figure 13, and the test data of the inclination angle (which is roughly within plus or minus 2 degrees), linear velocity, and yaw velocity data shown in Figure 14 are collected.
  • the scheme of data iteration can obtain a controller with sufficient robustness and stability.
  • controller can also be used to control other motions, and the present disclosure is not limited thereto.
  • robustness of the controller is much higher than that of the PID controller, that is, when the robot 100 is disturbed from the outside, the robot under the control of the controller can quickly restore its balance.
  • the embodiments of the present disclosure are based on reinforcement learning and ADP technology in artificial intelligence, and use a numerical iteration scheme to solve the optimal balance control problem of the robot when the dynamic characteristics of the robot are unknown.
  • the process of building the controller in the embodiment of the present disclosure only requires the wheel-legged robot to travel for a period of time/a trajectory under the control of a non-optimal controller/arbitrary controller, and collect the motion state corresponding to the time period/trajectory data and control data as training data. Therefore, the amount of training data in the embodiments of the present disclosure is much smaller than the amount of data required by traditional reinforcement learning algorithms.
  • the controller trained in the embodiments of the present disclosure gradually converges, as the learning step increases, to the controller corresponding to the optimal solution of the linear quadratic regulation problem, thereby ensuring the stability of the closed-loop system; the training process is greatly simplified, and there is no need to impose additional restrictions on the training data, thereby simplifying the design process of the controller of the wheel-legged robot.
  • the application provides a device for constructing a controller of a robot, the device comprising:
  • a motion control module configured to use the first controller to control the motion of the robot, and obtain motion state data and control data of the robot during motion;
  • a strategy iteration module configured to update the linear balance parameter matrix of the first controller in a strategy iteration manner according to the motion state data and the control data;
  • the second controller construction module is configured to construct a second controller corresponding to the dynamic characteristics of the robot based on the updated linear equilibrium parameter matrix.
  • the dynamics of the robot are associated with at least one variable parameter; the first controller corresponds to the dynamic characteristics with the variable parameter at a first value; the second controller corresponds to the dynamic characteristics with the variable parameter at a second value.
  • the first controller controls the robot to move in a quasi-balanced state, and the robot in the quasi-balanced state is near the balance point during the movement; the robot under the control of the second controller has a better control effect in the motion process than the robot under the control of the first controller.
  • both the first controller and the second controller are linear controllers; at each moment in the motion process, the control torque provided by the linear controller is the negative of the product between the linear balance parameter matrix and the motion state data of the robot.
  • the motion control module is further configured to use the first controller to determine an initial control instruction according to the current motion state of the robot; apply disturbance to the control data indicated by the initial control instruction, obtaining control data of the first controller; and controlling the movement of the robot according to the control data of the first controller, and collecting movement state data during the movement.
  • the motion state data and the control data correspond to a plurality of time intervals; the policy iteration module is further configured to non-linearly combine the motion state data and the control data corresponding to the multiple time intervals to determine a training data set; determine an iteration target item, and determine an iterative relational function based on the training data set; and perform multiple strategy iterations on the iteration target item according to the iterative relational function, to approximate a linear balance parameter matrix corresponding to the dynamics of the robot.
  • the strategy iteration module is also used to determine whether the iteration target item converges in each strategy iteration, and stop the strategy iteration if the iteration target item converges; and according to the convergence The iteration target item updates the linear balance parameter matrix.
  • the iterative relational function conforms to the form of a Lyapunov equation; the iteration target item includes the linear balance parameter matrix to be iterated and the solution of the Lyapunov equation; the iterative relational function is used to calculate the linear balance parameter matrix for the next strategy iteration from the linear balance parameter matrix in the current strategy iteration and the solution of the Lyapunov equation corresponding to the current strategy iteration.
  • the convergence of the iteration target item includes: the difference between the solutions of the Lyapunov equation corresponding to two adjacent strategy iterations is smaller than a preset value.
  • the device for constructing the controller of the robot further includes a first controller building module, which is used to obtain historical motion state data and historical control data from the historical motion data of the robot, where the diversity measures of the historical motion state data and the historical control data are higher than a predetermined threshold; calculate a linear balance parameter matrix in a numerical iterative manner according to the historical motion state data and the historical control data; and construct, based on the linear balance parameter matrix, a first controller for controlling the movement of the robot.
  • the historical motion data is the motion data obtained when driving each joint in the wheel legs of the robot based on the control torque to drive the robot to move along the target trajectory;
  • the device for constructing a robot controller also includes a control torque acquisition module, which is used to adaptively determine, based on the existing motion information of the robot, control information for controlling the rotation of the driving wheel of the robot; determine, based on the control information for controlling the rotation of the driving wheel, first control information for controlling the multiple joints of the robot, the first control information making the robot maintain balance; determine, based on the target trajectory of the robot, second control information for controlling the multiple joints, the second control information making the robot move along the target trajectory; and determine, based on the motion constraints of the robot, the first control information, and the second control information, the control torque for each joint in the wheel legs.
  • the present application provides a robot motion control device, the robot moves by driving a driving wheel, and the device includes:
  • An instruction receiving module configured to receive a movement instruction, the movement instruction indicating the movement trajectory of the robot
  • An instruction execution module configured to control the driving force applied to the driving wheel through the first controller according to a movement instruction, so that the robot moves according to the movement trajectory;
  • a data acquisition module configured to acquire motion state data and control data of the robot during motion
  • a strategy iteration module configured to use strategy iteration to construct a second controller corresponding to the dynamic characteristics of the robot based on the motion state data and the control data;
  • a driving force control module configured to use the second controller to control the driving force applied to the driving wheels, so as to make the robot move smoothly.
  • the present application also provides a computer-readable storage medium, on which computer-readable instructions are stored, and when the computer-readable instructions are executed by one or more processors, the steps of the methods described in the above embodiments are implemented.
  • the present application also provides a computer program product, including computer readable instructions, and when the computer readable instructions are executed by one or more processors, the steps of the methods described in the above embodiments are implemented.
  • the robot may also include, for example, a bus, a memory, a sensor component, a communication module, an input and output device, and the like.
  • Embodiments of the present disclosure are not limited by the specific components of the robot.
  • a bus may be an electrical circuit that interconnects the various components of the robot and conveys communication information (e.g., control messages or data) among the various components.
  • Sensor components can be used to perceive the physical world, such as cameras, infrared sensors, ultrasonic sensors, etc.
  • the sensor component may also include a device for measuring the current operation and motion state of the robot, such as a Hall sensor, a laser position sensor, or a strain sensor.
  • the communication module may, for example, be wired or network-connected to facilitate communication with the physical world (eg, a server).
  • the communication module may be wireless and may include a wireless interface such as IEEE 802.11, Bluetooth, a wireless local area network ("WLAN") transceiver, or a radio interface for accessing a cellular telephone network (e.g., a transceiver/antenna for accessing CDMA, GSM, UMTS, or other mobile communication networks).
  • the communication module may be wired and may include an interface such as Ethernet, USB, or IEEE 1394.
  • the input-output device may transmit commands or data input, for example, from a user or any other external device to one or more other parts of the robot, or may output commands or data received from one or more other parts of the robot to the user or other external devices.
  • a robotic system can be composed of a plurality of robots that are communicatively connected to a server and that receive collaborative robot instructions from the server to cooperatively complete a task.
  • Tangible, permanent storage media may include the internal memory or storage used by any computer, processor, or similar device or related modules. For example, various semiconductor memories, tape drives, disk drives, or any similar device that provides storage for software.
  • all or portions of the software may from time to time be communicated over a network, such as the Internet or another communication network. Such communications may load the software from one computer device or processor to another. Thus, another medium capable of carrying software elements, such as light waves, radio waves, or electromagnetic waves transmitted through cables, optical cables, or air, can also serve as a physical connection between local devices.
  • the physical medium used for carrier waves, such as electrical cables, wireless connections, or fiber optic cables, can also be considered a medium for carrying software.
  • tangible "storage” media other terms referring to computer or machine "readable media” mean media that participate in the execution of any instructions by a processor.
  • reference to "an embodiment" means a certain feature, structure, or characteristic related to at least one embodiment of the present disclosure. Therefore, it should be emphasized and noted that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in different places in this specification do not necessarily refer to the same embodiment.
  • certain features, structures or characteristics of one or more embodiments of the present disclosure may be properly combined.
  • aspects of the present disclosure may be illustrated and described in several patentable categories or circumstances, including any new and useful process, machine, product, or composition of matter, or any new and useful improvement thereof. Accordingly, various aspects of the present disclosure may be executed entirely by hardware, entirely by software (including firmware, resident software, microcode, etc.), or by a combination of hardware and software.
  • the above hardware or software may be referred to as “block”, “module”, “engine”, “unit”, “component” or “system”.
  • aspects of the present disclosure may be embodied as a computer product comprising computer readable program code on one or more computer readable media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Manipulator (AREA)

Abstract

A method for constructing a controller of a robot, a robot motion control method, an apparatus, and a robot. The method includes: controlling a robot to move by using a first controller, and acquiring motion state data and control data of the robot during the motion (S201); updating a linear balance parameter matrix of the first controller in a strategy iteration manner according to the motion state data and the control data (S202); and constructing, based on the updated linear balance parameter matrix, a second controller corresponding to the dynamic characteristics of the robot (S203).

Description

Method for constructing a controller of a robot, robot motion control method, apparatus, and robot
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. 202210194306X, filed with the Chinese Patent Office on March 1, 2022 and entitled "Method for constructing a controller of a robot, and robot", the entire contents of which are incorporated herein by reference.
Technical field
The present invention relates to the fields of artificial intelligence and robotics, and more specifically to a method for constructing a controller of a robot, a robot motion control method, an apparatus, a robot, a computer-readable storage medium, and a computer program product.
Background
With the wide application of artificial intelligence and robot technology in civilian and commercial fields, robots based on artificial intelligence and robot technology play an increasingly important role in fields such as intelligent transportation and smart homes, and also face higher requirements.
At present, when performing motion control on a robot, especially an underactuated robot, it is usually necessary to design a dynamic model that precisely corresponds to the robot's mechanical structure, and then determine the control forces at each joint of the robot based on the changes of this dynamic model during travel, so as to keep the robot balanced during motion. However, because the mechanical structure of a robot is complex, especially for some wheel-legged robots, it is difficult to derive an accurate dynamic model even when the mechanical structure of the robot is known. Moreover, even if the dynamic model is known, in some cases it is difficult to accurately identify the parameters in the dynamic model. If the parameters of the dynamic model are known but inaccurate, the robot's controller will also perform poorly. Therefore, a solution that can control a robot flexibly is needed.
Summary
The present disclosure provides a method for constructing a controller of a robot, a robot motion control method, an apparatus, a robot, a computer-readable storage medium, and a computer program product.
In one aspect, the present disclosure provides a method for constructing a controller of a robot, executed by a processor, the method including: controlling a robot to move by using a first controller, and acquiring motion state data and control data during the robot's motion; updating a linear balance parameter matrix of the first controller in a strategy iteration manner according to the motion state data and the control data; and constructing, based on the updated linear balance parameter matrix, a second controller corresponding to the dynamic characteristics of the robot.
In another aspect, the present disclosure provides a robot motion control method, executed by a processor, the robot moving by driving a driving wheel, the method including:
receiving a motion instruction, the motion instruction indicating a motion trajectory of the robot;
controlling, according to the motion instruction, the driving force applied to the driving wheel through the first controller, so that the robot moves according to the motion trajectory;
acquiring motion state data and control data of the robot during motion;
constructing, based on the motion state data and the control data, a second controller corresponding to the dynamic characteristics of the robot in a strategy iteration manner; and
controlling, by using the second controller, the driving force applied to the driving wheel, so that the robot moves smoothly.
In yet another aspect, the present disclosure provides a robot, including: a data acquisition device configured to acquire motion state data of the robot while a first controller controls the robot's motion; and a data processing device configured to: acquire control data corresponding to the motion state data; update a linear balance parameter matrix of the first controller in a strategy iteration manner based on the motion state data and the control data; and construct, based on the updated linear balance parameter matrix, a second controller corresponding to the dynamic characteristics of the robot.
In yet another aspect, the present disclosure provides an apparatus for constructing a controller of a robot, the apparatus including:
a motion control module, configured to control a robot's motion by using a first controller, and to acquire motion state data and control data of the robot during motion;
a strategy iteration module, configured to update a linear balance parameter matrix of the first controller in a strategy iteration manner according to the motion state data and the control data; and
a second controller construction module, configured to construct, based on the updated linear balance parameter matrix, a second controller corresponding to the dynamic characteristics of the robot.
In yet another aspect, the present disclosure provides a robot motion control apparatus, the robot moving by driving a driving wheel, the apparatus including:
an instruction receiving module, configured to receive a motion instruction, the motion instruction indicating a motion trajectory of the robot;
an instruction execution module, configured to control, according to the motion instruction, the driving force applied to the driving wheel through the first controller, so that the robot moves according to the motion trajectory;
a data acquisition module, configured to acquire motion state data and control data of the robot during motion;
a strategy iteration module, configured to construct, based on the motion state data and the control data, a second controller corresponding to the dynamic characteristics of the robot in a strategy iteration manner; and
a driving force control module, configured to control, by using the second controller, the driving force applied to the driving wheel, so that the robot moves smoothly.
In yet another aspect, the present disclosure provides a computer-readable storage medium storing computer-readable instructions, which, when executed by one or more processors, implement the steps of any of the methods described above.
In yet another aspect, the present disclosure provides a computer program product, including computer-readable instructions, which, when executed by one or more processors, implement the steps of any of the methods described above.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort. The following drawings are not deliberately drawn to scale; the emphasis is on illustrating the gist of the present invention.
FIG. 1 shows a schematic structural diagram of a robot having a left wheel leg and a right wheel leg of a single-wheel-leg configuration according to an embodiment of the present disclosure.
FIG. 2 shows an exemplary flowchart of a method for constructing a controller of a robot according to an embodiment of the present disclosure.
FIG. 3 shows a labeled schematic diagram of the robot according to an embodiment of the disclosure.
FIG. 4 shows a control architecture diagram of the robot according to an embodiment of the disclosure.
FIG. 5 shows an exemplary flowchart of a method for constructing a controller of a robot according to an embodiment of the present disclosure.
FIG. 6 shows another structural view of the robot according to an embodiment of the present disclosure.
FIG. 7A shows motion state data and control data during motion of the robot controlled by the first controller according to an embodiment of the present disclosure.
FIG. 7B shows the convergence process of the linear balance parameter matrix during construction of the second controller according to an embodiment of the present disclosure, where the base heights of the robot are 0.5 m and 0.6 m, respectively.
FIG. 7C shows motion state data of the robot walking in a straight line with a base height of 0.6 m under the control of the first controller and the second controller, respectively, according to an embodiment of the present disclosure.
FIG. 8 shows an exemplary flowchart of constructing the first controller according to an embodiment of the present disclosure.
FIG. 9 shows an example schematic diagram of a method for constructing a controller of a robot according to an embodiment of the present disclosure.
FIG. 10 shows a schematic diagram of the process of collecting motion state data and control data of the robot according to an embodiment of the present disclosure.
FIG. 11 shows graphs of collected motion state data and control data of the robot according to an embodiment of the present disclosure.
FIG. 12 shows a schematic diagram of the iteration of the linear balance parameter matrix during data collection of the robot according to an embodiment of the present disclosure.
FIG. 13 shows an experimental schematic diagram of testing the controller of the robot according to an embodiment of the present disclosure.
FIG. 14 shows experimental data graphs of testing the controller of the robot according to an embodiment of the present disclosure.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort also fall within the protection scope of the present invention.
As used in the present disclosure and the claims, unless the context clearly indicates otherwise, the words "a", "an", "one", and/or "the" do not specifically refer to the singular and may also include the plural. In general, the terms "include" and "comprise" only indicate that the explicitly identified steps and elements are included; these steps and elements do not constitute an exclusive list, and a method or device may also include other steps or elements.
Although the present disclosure makes various references to certain modules in a system according to embodiments of the present disclosure, any number of different modules may be used and run on a user terminal and/or server. The modules are merely illustrative, and different aspects of the system and method may use different modules.
Flowcharts are used in the present disclosure to illustrate the operations performed by a system according to embodiments of the present disclosure. It should be understood that the preceding or following operations are not necessarily performed precisely in order. Instead, the various steps may be processed in reverse order or simultaneously as needed. Meanwhile, other operations may be added to these processes, or one or more steps may be removed from them.
The technical solutions of the present disclosure mainly relate to robot technology within artificial intelligence, in particular intelligent robot control. A robot is a mechatronic device combining mechanical transmission and modern microelectronics that can imitate certain human skills, developed on the basis of electronics, machinery, and information technology. A robot does not necessarily have to look like a human; as long as it can autonomously complete the tasks and commands assigned to it by humans, it belongs to the robot family. A robot is an automated machine with intelligent capabilities similar to those of humans or other living beings, such as perception, planning, action, and coordination, and has a high degree of flexibility. With the development of computer technology and artificial intelligence technology, robots have greatly improved in function and technical level; mobile robots and robot vision and tactile sensing are typical representatives.
The present disclosure relates to an application of artificial intelligence to robot control. Specifically, the present disclosure proposes a method for constructing a controller of a robot based on artificial intelligence, which uses an arbitrary, to-be-optimized controller to construct a controller corresponding to the dynamic characteristics of the robot. Under the control of the controller corresponding to the robot's dynamic characteristics, the robot has a motion state closer to the balance point than under the control of the arbitrary to-be-optimized controller.
The robot involved in the present disclosure may be an underactuated robot. Underactuation refers to a class of nonlinear control objects in which the number of independent control variables is smaller than the number of degrees of freedom. For example, the underactuated robot may be the wheel-legged robot shown in FIG. 1. Specifically, FIG. 1 shows a schematic structural diagram of a robot having a left wheel leg and a right wheel leg of a single-wheel-leg configuration according to an embodiment of the present disclosure.
FIG. 1 exemplarily shows a schematic structural diagram of a robot 100. The wheel-legged robot 100 may include a wheel leg portion 103, and the wheel leg portion 103 includes a driving wheel 104 (also referred to as a foot). The wheel-legged robot 100 may further include a base portion 101, which refers to the main body of the robot, for example the torso; the base portion may be, for example, a planar plate-shaped or cuboid member connected to the wheel leg portion of the robot. As an example, one end of the wheel leg portion 103 is connected to the base portion 101, and the other end is connected to the driving wheel 104. The base portion 101 is equipped with a power output device (for example, a motor) that can provide power for driving the driving wheel of the wheel leg portion 103. It should be understood that, according to actual needs, the robot may further include a base portion connected to the wheel leg portion or additional components provided on the base portion. It should also be understood that the above merely gives one structural example of a robot; embodiments of the present disclosure are not limited by the specific components of the robot or their connections.
The driving wheel 104 in the wheel leg portion 103 allows the wheel-legged robot 100 both to walk and to perform wheeled motion. Optionally, the wheel-legged robot 100 may further include a controllable additional component (for example, a tail), which can be used to balance the wheel-legged robot and can also assist its motion. For example, the tail can help the wheel-legged robot keep its balance during motion. Optionally, the wheel-legged robot may further include a controllable robotic arm, which can be used to perform operational tasks such as carrying and picking. The wheel-legged robot may include multi-legged wheel-legged robots, for example biped wheel-legged robots and quadruped wheel-legged robots.
For example, the wheel leg portion 103 is a parallel-structure leg (the balance point of the biped wheel-legged robot 100 lies between its two legs). Referring to FIG. 1, the wheel leg portion 102 of the robot 100 includes a left wheel leg and a right wheel leg, each of which includes a driving wheel and two parallel legs connected to the central shaft of the driving wheel and used to control the motion of the driving wheel. For example, the left wheel leg includes a left driving wheel, a first left wheel leg, and a second left wheel leg, the first left wheel leg and the second left wheel leg being in parallel; and the right wheel leg 112 includes, for example, a right driving wheel, a first right wheel leg, and a second right wheel leg, the first right wheel leg and the second right wheel leg being in parallel. As shown in FIG. 1, the left wheel leg and the right wheel leg are mirror-symmetric.
For example, the driving wheel may have a single-wheel, two-wheel, four-wheel, or other driving-wheel configuration, and the motion of each driving wheel may be controlled by two legs in parallel or multiple legs in series. It should be understood that embodiments of the present disclosure are not limited by the specific composition of the left and right wheel legs or the number of driving wheels. In some embodiments, both the left wheel leg and the right wheel leg are of a single-wheel-leg configuration, meaning that the wheel leg includes only a single driving wheel.
For example, the left wheel leg and the right wheel leg may include the same number of joints and have the same joint configuration, or, according to actual needs, they may have different numbers of joints, different joint configurations, or both. Embodiments of the present disclosure are not limited by the specific number of joints or the joint configurations of the left and right wheel legs. Taking the example shown in FIG. 1, each of the left and right wheel legs contains five joints with two rotational degrees of freedom in total, and the height of the center of mass of the wheel legs/base portion and the tilt angle of the base portion can be adjusted by adjusting the joints of the wheel leg portion 103.
The legs of the robot may be of serial or parallel structure. Compared with serial legs, parallel legs have higher stiffness and can withstand impacts that may arise in complex motion. When in contact with the ground, the driving wheel 104 provides the wheel-legged robot 100 with the ability to glide. Optionally, the biped wheel-legged robot 100 may further include an additional component 102 connected to the base portion 101. A driven wheel 105 may be mounted on the additional component 102, which has one rotational degree of freedom. The motion of the additional component 102 also affects the base portion 101 and the wheel leg portion 103; for example, a change in the position of the additional component can drive the base portion to rotate at a certain speed. Thus, the balance and posture of the robot 100 can be adjusted by adjusting the position of the additional component 102.
The wheel-legged robot 100 has both the flexibility of a wheeled robot and the flexibility of a legged robot, so it can move quickly on flat ground and can also traverse rough roads. However, for some wheel-legged robots (similar to the wheel-legged robot 100 shown in FIG. 1), there are only two contact points between the robot and the ground. Moreover, the wheel-legged robot 100 is a non-minimum-phase system, so in practice it is still difficult to balance it. In addition, due to the complex mechanical structure of a (wheel-legged) robot, it is difficult to determine its dynamic characteristics. Since traditional balance control methods require knowledge of the robot's dynamic characteristics, they can hardly balance such a robot when its dynamics are unknown.
The method for constructing a controller of a robot provided by the embodiments of the present disclosure involves, when the dynamic characteristics of the robot are unknown, using adaptive dynamic programming (ADP) and/or whole-body dynamics methods to design a controller that keeps the robot balanced during motion. ADP essentially solves the infinite-horizon LQR problem, but with the parameters of the system model completely unknown, so the well-known algebraic Riccati equation cannot be solved analytically. The embodiments of the present disclosure obtain the solution of the LQR problem through an artificial-intelligence scheme even when it cannot be solved based on a system model.
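For reference, the infinite-horizon LQR problem mentioned above has the standard textbook form below; the symbols A, B, Q, R are assumed here, since the system matrices are not spelled out at this point:

```latex
% Infinite-horizon LQR: minimize a quadratic cost subject to linear dynamics.
\dot{x} = A x + B u, \qquad
J = \int_{0}^{\infty} \bigl( x^{T} Q x + u^{T} R u \bigr)\, \mathrm{d}t .
% The optimal gain K^{*} = R^{-1} B^{T} P^{*} requires the solution P^{*} of the
% algebraic Riccati equation, which ADP must approximate without knowing A, B:
A^{T} P + P A - P B R^{-1} B^{T} P + Q = 0 .
```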
Optionally, the adaptive dynamic programming method may use a data-driven policy iteration (PI) scheme.
For example, embodiments of the present disclosure optionally combine optimal control techniques and propose a policy iteration method based on adaptive dynamic programming. This method can dynamically iterate the controller when the robot's dynamic parameters change or its dynamic characteristics are unknown, so as to obtain a controller corresponding to the robot's dynamics after the parameter change. The controller enables the robot, even when its dynamic parameters change, to travel along the target trajectory with optimal control performance in a balanced state.
As an example, the numerical iteration method based on adaptive dynamic programming does not require any initial controller, but requires a relatively large amount of data and is more suitable for iterating the controller offline. The policy iteration method based on adaptive dynamic programming requires an initial controller, but the amount of data it needs is far smaller than that of the numerical iteration method.
Based on artificial intelligence techniques such as reinforcement learning and ADP, the embodiments of the present disclosure use policy iteration, numerical iteration, or whole-body dynamics control to solve the optimal balance control problem of a robot controller when the robot's dynamic characteristics are unknown. The controller construction process of the embodiments of the present disclosure only requires the wheel-legged robot to travel for a period of time or along a trajectory under the control of a non-optimal or arbitrary controller, and to collect the motion state data and control data corresponding to that period or trajectory as training data. Therefore, the amount of training data required by the embodiments of the present disclosure is far smaller than that required by traditional reinforcement learning algorithms.
Furthermore, the controller trained in the embodiments of the present disclosure gradually converges, as the learning step increases, to the controller corresponding to the optimal solution of the linear quadratic regulation problem, so the stability of the closed-loop system can be guaranteed; the training process is greatly simplified, and no additional restrictions on the training data are needed, which simplifies the design of the wheel-legged robot's controller. Furthermore, since each embodiment of the present disclosure uses data collected from a real robot and applies the control strategy obtained from such real-robot data directly to the robot, there is no need to consider the gap between simulated control and real control, which improves the controller's performance on the real robot.
To facilitate the further description of the present disclosure, the meanings of the various operators and sets used below are briefly explained here.
In the present disclosure, ℝ denotes the set of real numbers, |·| denotes the Euclidean norm of a vector, and ⊗ denotes the Kronecker product. Bold letters denote vectors or matrices; italic letters denote scalars.
For any matrix A=[a_1,…,a_n], vec(A)=[a_1^T,…,a_n^T]^T. For any symmetric matrix S=[s_{i,j}], vecs(S)=[s_{1,1}, 2s_{1,2}, …, 2s_{1,n}, s_{2,2}, 2s_{2,3}, …, 2s_{n-1,n}, s_{n,n}]^T. For any vector, a corresponding vectorization is defined analogously.
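The vec and vecs operators above can be rendered directly in code; the sketch below is an illustrative implementation of those definitions:

```python
import numpy as np

# Illustrative implementations of the vec and vecs operators defined above.
def vec(A):
    """Stack the columns of A into a single long vector."""
    return A.reshape(-1, order="F")

def vecs(S):
    """Vectorize the upper triangle of a symmetric S, doubling off-diagonals:
    [s11, 2*s12, ..., 2*s1n, s22, 2*s23, ..., snn]."""
    n = S.shape[0]
    out = []
    for i in range(n):
        out.append(S[i, i])
        out.extend(2.0 * S[i, j] for j in range(i + 1, n))
    return np.array(out)

A = np.array([[1.0, 2.0], [3.0, 4.0]])
S = np.array([[1.0, 2.0], [2.0, 3.0]])
```

With this column-stacking convention, the Kronecker identity vec(AXB) = (Bᵀ ⊗ A) vec(X) holds, which is what makes the operators useful for turning matrix equations into linear systems.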
FIG. 2 shows an exemplary flowchart of a method for constructing a controller of a robot according to an embodiment of the present disclosure.
As shown in FIG. 2, the method for constructing a controller of a robot according to at least one embodiment of the present disclosure may include steps S201 to S203. Optionally, steps S201 to S203 may be executed either online or offline, and the present disclosure is not limited in this regard.
As described above, the method for constructing a controller of a robot may optionally be applied to any robot that includes wheel legs with driving wheels. For ease of description, the robot 100 shown in FIG. 1 is taken as an example below to further describe the method. To facilitate describing the various quantities involved in the method, the robot 100 of FIG. 1 is further labeled with reference to FIG. 3.
For example, as shown in FIG. 3, the complex robot 100 shown in FIG. 1 can be labeled in the robot's generalized coordinate system. For ease of labeling, in FIG. 3 the centers P1 and P2 of the driving wheel are shown as two separate points; those skilled in the art should understand that P1 and P2 are essentially the same point.
Specifically, in FIG. 3, q_{·,·} and τ_{·,·} identify the parameters of each joint of the wheel leg, where q_{·,·} identifies the joint rotation angle and τ_{·,·} identifies the joint torque. For example, q_{1,2} identifies the rotation angle of the joint between the first link of the robot's left wheel leg and the base portion, and τ_{1,2} identifies the torque of that joint. Although not shown in FIG. 3, the angle and torque of the tail joint can be labeled correspondingly.
Embodiments of the present disclosure optionally combine whole-body dynamics techniques and propose a whole-body dynamics control method based on adaptive dynamic programming. This method uses the output of the robot controller computed based on adaptive dynamic programming as a reference parameter for whole-body dynamics control, thereby improving the overall flexibility of the robot's motion.
According to one aspect of the present disclosure, embodiments optionally combine optimal control techniques and propose a policy iteration method based on adaptive dynamic programming. This method can, when the robot's dynamic parameters change, dynamically iterate the linear balance parameter matrix in the controller to obtain a controller corresponding to the robot's dynamics after the parameter change. The controller enables the robot, even when its dynamic parameters change, to travel along the target trajectory with optimal control performance in a balanced state.
在步骤S201中,利用第一控制器控制机器人运动,并获取机器人在运动过程中的运动状态数据和控制数据。
可选地,机器人的精确动力学特性未知,或者仅能粗略的确定机器人的部分动力学特性。此外,机器人的动力学特性可能还涉及部分可变参数。以机器人100为例,随着机器人100的运动,机器人100的重心的高度可能改变。此外,如果利用机器人100搬运物体,机器人100的质量也可能对应地改变。不论是重心的高度的改变还是质量的改变,都可能导致机器人在运动过程中的动能、势能、动量等特性的改变,从而导致机器人100对应的动力学模型的变化。本公开虽然仅以重心高度和质量作为可变参数的示例进行了描述,但并不以此为限。
其中,第一控制器可以基于机器人的历史运动数据来构建。具体来说,处理器可以从机器人的历史运动数据中获取历史运动状态数据和历史控制数据,历史运动状态数据和历史控制数据的多样性度量高于预定阈值;根据历史运动状态数据和历史控制数据,使用数值迭代的方式计算线性平衡参数矩阵;以及基于线性平衡参数矩阵,构建用于控制机器人运动的第一控制器。
具体来说，处理器可以控制机器人沿预定轨迹运动，并获取运动过程中的运动状态数据和控制数据。预定轨迹可以是基于机器人的结构特性、运动特性、动力特性粗略估计得到的，以便采集机器人在各种运动情形（场景）下的运动数据，从而使得运动状态数据和控制数据的多样性度量足够高。
控制机器人运动可以通过确定用于控制机器人轮腿部的每个关节的控制力矩来实现。
示例性的，处理器基于机器人的历史运动信息，自适应地确定控制主动轮转动的控制信息；基于控制主动轮转动的控制信息，确定用于控制多个关节的第一控制信息，第一控制信息使得机器人保持平衡；基于机器人的目标轨迹，确定用于控制多个关节的第二控制信息，第二控制信息使得机器人沿目标轨迹运动；然后基于机器人的运动约束条件、第一控制信息以及第二控制信息，确定机器人的轮腿部中每个关节的控制力矩，以使得能够基于控制力矩驱动各关节，以控制机器人运动。
可选地，第一控制器的线性平衡参数矩阵可以为稳定初始增益矩阵。可选地，控制器在一时刻提供的控制力负相关于线性平衡参数矩阵和机器人在该时刻的运动状态数据之间的乘积。例如，控制器可以具有 $u(t)=-Kx(t)$ 的形式，其中K是对应于机器人的线性平衡参数矩阵，u(t)对应于控制器在时刻t输出的控制力或力矩中的至少一种，x(t)对应于机器人在时刻t的运动状态数据。以图1至图4描述的机器人100为例，机器人100至少包括：包括多个关节的轮腿部、连接至轮腿部的基座部、以及用于控制轮腿部中的主动轮的驱动电机。对应地，运动状态数据包括：基座部俯仰角、基座部俯仰角速度、以及主动轮的线性速度。控制数据包括：驱动电机的输出力矩。对应地，由于运动状态数据和控制数据均可以通过相关测量仪器离散地进行采集，因此，运动状态数据和控制数据均对应于多个离散的时刻或对应于多个连续的时间区间。
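上述 $u(t)=-Kx(t)$ 形式的线性控制器可以用如下片段示意（增益取自后文实验示例 $K_0=[-81.99,-34.96,-16.38]$，状态取值仅为说明用的假设值）：

```python
import numpy as np

K = np.array([[-81.99, -34.96, -16.38]])  # 线性平衡参数矩阵（取自后文示例）
x = np.array([0.05, -0.1, 0.2])           # [基座部俯仰角, 俯仰角速度, 主动轮线速度]（假设值）
u = -K @ x                                # u(t) = -K x(t)：负相关于 K 与状态的乘积
```

由此可见，控制输出与线性平衡参数矩阵和运动状态数据的乘积互为相反数，即"负相关"。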
可选地,第一控制器可以是非最优控制器。非最优控制器例如是仅能使得机器人100沿目标轨迹跌跌撞撞地运动的控制器。例如,作为非最优控制器的第一控制器可以是与简化动力学特性对应的控制器。例如,对于复杂轮腿式机器人对应的精确的动力学模型,可以将其简化成仅由主动轮和基座部组成的等效动力学模型等等。
作为一个示例,可以使用第一控制器来控制机器人在类平衡状态下运动,例如,在一些第一控制器的控制下,机器人会以一定幅度在平衡点左右摆动。例如,如果在第一控制器控制下控制动力学特性未知的机器人在类平衡状态下运动,那么可以以第一控制器的输出作为控制数据。而如果由实验人员利用遥控器来控制机器人进行运动,那么可以通过在真实机器人上采集控制器的输出(例如,检测主动轮的驱动力等)来获取控制数据。本公开不对控制数据的获取方式进行限制。
又例如，第一控制器也可以是PID控制器等等。在一些情况下，第一控制器的线性平衡参数矩阵甚至可以是任意的稳定控制增益矩阵。甚至，可以直接随机地以任意控制数据来控制机器人行进一段距离，并截取机器人在彻底失去平衡（例如摔倒）前的控制数据和运动状态数据作为步骤S201中得到的运动状态数据和控制数据即可。本公开不对第一控制器的具体设计方案进行限制，只要其能够控制机器人不彻底失去平衡即可。
作为一个示例,还可以使用利用数值迭代方案获取的一个或多个第一控制器来控制机器人进行运动,这一方案的具体实现将在后续详细描述。例如,在可变参数为第一值时,可以利用数值迭代方案确定第一控制器。其中,利用数值迭代方案确定第一控制器可以离线地实现。例如,令可变参数为机器人的身高,并令可变参数的第一值为0.38米,可变参数的第二值为0.5米。在机器人100的身高为0.38米时,可以利用数值迭代方案确定控制该机器人直线行走的最优控制器,并将该控制器作为第一控制器。然后,通过改变关节角度,将机器人100的身高调至0.5米。然后,继续利用第一控制器控制身高调高后的机器人行进一段时间或一段距离,并对应地采集运动状态数据和控制数据。后续将利用该运动状态数据和控制数据作为第二控制器的训练数据,以获取在可变参数为第二值时的最优控制器。
本领域技术人员应当理解上述的第一值和第二值均仅为示例,本公开并不限于此。虽然数值迭代方案能够确定与身高为0.38米的机器人适配的第一控制器,但是,在机器人的身高改变的情况下,需要离线地重新计算与身高变更后的机器人适配的第二控制器。离线计算可能是耗时的,并可能导致机器人的运动中断。
为降低计算量,可以利用步骤S202至S203,采用策略迭代的方案,构建第二控制器。具体地,在步骤S202中,根据运动状态数据和控制数据,使用策略迭代的方式对第一控制器的线性平衡参数矩阵进行更新。在步骤S203中,基于更新后的线性平衡参数矩阵,构建对应于机器人的动力学特性的第二控制器。
例如，在第二控制器的控制下的机器人，相对于在第一控制器的控制下的机器人，在运动过程中可以具有更优的控制效果。比如，在对应于机器人的动力学特性的第二控制器的控制下的机器人在平衡点左右的摆动幅度可以比在第一控制器下的更小。又例如，在第二控制器的控制下的机器人，相对于在第一控制器的控制下的机器人，在运动过程中可以较快地收敛于平衡点附近、或机器人的震荡更小、或控制速度更快、或超调量更小、或稳态误差更小等等。或者，在一些情况下，第一控制器和第二控制器具有同等的控制效果，但是第二控制器的控制输入更小。本公开对此不进行限制。
示例性的，以行进的机器人100为例进行说明。处于平衡状态下的机器人100在线性运动维度和旋转运动维度上可以是处于稳定平衡状态的。例如，处于平衡状态下的机器人100能够在运动过程中保持与平衡点定义的状态相同或是非常近似的状态，或者能够在运动过程中以最快的速度或是最小的能耗恢复到与平衡点定义的状态。平衡点定义的状态可以是使得机器人100处于俯仰角为零、俯仰角对应的角速度为零、且线性速度处于目标速度的状态。例如，机器人100的姿态为竖直向上的状态，并且机器人100不具备旋转运动维度上的速度仅具备线性运动维度上的目标速度，即为机器人100处于平衡点定义的状态。
而处于类平衡状态下的机器人100,则是在运动过程中处于平衡点附近定义的状态。例如,类平衡状态下的机器人100在线性运动维度和旋转运动维度上可能处于由稳定平衡状态过渡到不稳定平衡状态的中间状态。类平衡状态下的机器人100在运动过程可能需要主动轮提供较大的力和力矩才能保证其不跌倒。例如,机器人100可以左右倾斜,并且机器人100具备线性运动维度上的速度的同时还具备旋转运动维度上的速度,即为机器人100处于类平衡点定义的状态。值得注意的是,本文中处于类平衡状态下的机器人100在运动中的某些时刻也可能在线性运动维度或旋转运动维度上处于接近不稳定平衡状态,只要其能够通过主动轮104的驱动力恢复到能够正常行进的状态即可。
作为一个示例,如果机器人100仅在主动轮104的控制下沿直线运动,平衡状态下的机器人100能够始终保持竖直向上的姿态以匀速直线运动,也即,不稳定平衡状态的机器人100的基座部的中轴线能够时刻垂直于水平线并且不具备旋转运动维度上的速度或加速度。而类平衡状态下的机器人100的基座部则可能具备倾斜角(俯仰角),并具备旋转运动维度上的速度或加速度中的至少一种。
在本公开的实施例中,可以首先使得机器人在第一控制器的控制下行进一段时间或是一段轨迹,并收集与该时间段或是该轨迹相对应的运动状态数据和控制数据作为训练数据。即使机器人100的动力学特性未知或是不准确或动力学特性在运动过程中改变,并且第一控制器为非最优控制器,本公开的实施例也能通过策略迭代确定作为最优控制器的第二控制器。本公开的实施例利用数值驱动的策略迭代方案来计算线性平衡参数矩阵,进而构建第二控制器。第二控制器的控制效果将优于第一控制器的控制效果。
所构建的第二控制器能够收敛于对应于线性二次调节问题的最优解的控制器。对应于线性二次调节问题的最优解的控制器也即是与该机器人的精确动力学特性对应的控制器。且对应于线性二次调节问题的最优解的控制器能够最小化机器人在运动过程中的成本泛函,以使得机器人在平衡状态下以最优的控制效果沿目标轨迹行进。之后将在下文中进一步描述策略迭代方案和线性平衡参数矩阵的计算方案。
由此，本公开的实施例的训练数据的数量远远小于传统的强化学习算法所需的数据量。更进一步地，本公开的实施例的训练的控制器随着学习步长的增加而逐渐收敛到对应于线性二次调节问题的最优解的控制器，从而可以保证闭环系统的稳定性，其训练过程被大大地简化，并且不需要对训练数据进行额外的限制，从而简化了机器人的控制器的设计过程。此外，本公开的实施例可以在真实机器人上直接进行数据采集，训练得到的控制器直接应用于真实机器人。本公开的实施例不需要在基于物理引擎的仿真器中进行数据采集，也省去了虚拟世界中的数据向现实世界中迁移带来的一些问题。具体地，参见图1至图4，针对任意带基座部的机器人，可以以 $q=[q_b^T,\,q_j^T]^T$ 来表征该机器人的广义坐标参数（generalized coordinates）。其中，机器人的广义坐标参数包括基座部的姿态 $q_b$ 以及 $n_j$ 个关节角度 $q_j=[q_1,q_2,\dots,q_{n_j}]^T$。
针对图1和图3中示出的机器人,也可以类似地得到该机器人的广义坐标参数q,其中,n j=12,q i可以为图3中以q {·,·}标识的任意关节之一。
基于机器人的广义坐标参数q，可以确定该机器人关节的广义速度集合 $\dot{q}=[\dot{q}_b^T,\,\dot{q}_j^T]^T$ 以及广义加速度集合 $\ddot{q}$。本领域技术人员应当理解，$\dot{q}_b$ 和 $\ddot{q}_b$ 分别表示机器人本体（body）的瞬时角速度和瞬时角加速度。类似地，关节扭矩还可以使用 $\tau=[\tau_1,\tau_2,\dots,\tau_8]^T$ 来标识。
由此，可以构建如下式(1)所示的通用的动力学模型。

$M(q)\ddot{q}+h(q,\dot{q})=S^T\tau+J_f^Tf+J_\lambda^T\lambda$      (1)

其中，$M(q)$ 用于表示机器人的质量矩阵。$h(q,\dot{q})$ 用于表示机器人的重力项、离心力项和科里奥利力项。矩阵 $S$ 用于从所有关节中选择主动关节，其中，如果S中的某个元素的元素值为0，则代表其为无驱动关节，如果元素值不为0，则标识其为主动关节。$f$ 为机器人在与地面接触的时候，地面在接触点提供的广义力。$J_f$ 是针对 $f$ 串联的接触雅可比矩阵（concatenated contact Jacobian matrix）。$\lambda$ 是前腿部作用于后腿部的闭环力。$J_\lambda$ 是针对 $\lambda$ 的串联的接触雅可比矩阵。$n_c$ 是主动轮和地面之间的接触点数量。考虑到闭环约束（也即，在真实机器人上，机器人的各个关节应当是固定连接的），$n_\lambda$ 是开环环节之间的接触点的数量。针对图1和图3中示出的机器人，$n_c=2$ 并且 $n_\lambda=2$。具体地，机器人的轮腿部是五连杆机构。五连杆机构的闭环约束的开环环节之间的（例如，图3中 $P_1$ 和 $P_2$ 点之间的）接触点的数量为2。
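作为式(1)的数值示意，下述 Python 片段用随机生成的占位矩阵（并非真实机器人的 $M$、$h$、$S$、$J_f$、$J_\lambda$ 等，各维度亦为假设）演示在各项已知时如何求解广义加速度 $\ddot{q}$：

```python
import numpy as np

rng = np.random.default_rng(1)
nq, nj, nc, nl = 6, 4, 2, 2          # 示意用的维度，并非真实机器人的维度
L = rng.normal(size=(nq, nq))
Mq = L @ L.T + nq * np.eye(nq)       # 占位的对称正定质量矩阵 M(q)
h = rng.normal(size=nq)              # 占位的重力/离心力/科里奥利力项 h(q, q̇)
S = rng.normal(size=(nj, nq))        # 占位的主动关节选择矩阵
Jf = rng.normal(size=(3 * nc, nq))   # 针对 f 的接触雅可比矩阵
Jl = rng.normal(size=(3 * nl, nq))   # 针对 λ 的闭环雅可比矩阵
tau = rng.normal(size=nj)
f = rng.normal(size=3 * nc)
lam = rng.normal(size=3 * nl)

# 按式(1)求解广义加速度：M(q) q̈ = S^T τ + J_f^T f + J_λ^T λ - h
qdd = np.linalg.solve(Mq, S.T @ tau + Jf.T @ f + Jl.T @ lam - h)
residual = Mq @ qdd + h - (S.T @ tau + Jf.T @ f + Jl.T @ lam)
```

残差接近于零表明 $\ddot{q}$ 确为式(1)的解；真实机器人上各项需由动力学辨识或测量得到。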
在获得机器人构型的基础上，可以考虑到机器人的行进过程，为机器人设置对应的控制架构和控制任务，并通过数学语言对控制架构和控制任务进行描述。以下参考图4来进一步描述上述的控制架构和控制任务。图4示出了根据本公开实施例的机器人对应的控制架构图。具体地，以图3中标注的机器人为例，图4示出了对该机器人的多个示例性控制任务以及控制任务之间的关联。这些示例性控制任务的组合和关联又称为该机器人对应的动态全身动力学控制。
图4还示出了另一种示例，其利用测量值进行了机器人的运动状态估计，然后再将状态估计后的值输入至用于自适应地确定控制主动轮转动的控制信息的数据处理模块，以便于该数据处理模块能够更快速地学习各个时刻对应的测量值，以更高效地计算用于控制主动轮转动的最优控制器。
可选地,控制主动轮转动的控制信息既可以是主动轮的加速度,也可以是主动轮的力矩。虽然从数学意义上说,这两个物理量作为控制主动轮转动的控制信息并没有太大的区别,但是实际物理系统当中,并不是这两个物理量都可以被准确测量。因此,本领域技术人员在实验中,可以根据具体情况,选择数据测试效果较好,比较符合模型的物理量进行后续计算和迭代。
例如，机器人对应的动态全身动力学控制可以被描述为在保证机器人平衡的情况下，以最小化对于各个关节的总输入能量、并最小化与目标轨迹之间的误差为目标，对于机器人的各个关节进行控制。例如，可以以公式(2)来表示对于图3中标注的机器人的动态全身动力学控制目标。

$\min_{z}\ \|\ddot{q}_{des}-\ddot{q}\|^2_{W_q}+\|\tau_{des}-\tau\|^2_{W_\tau}+\|f\|^2_{W_f}+\|\lambda\|^2_{W_\lambda}$      (2)

其中，$\ddot{q}_{des}$ 为目标轨迹针对各个关节设置的加速度的集合组合而成的向量。$\ddot{q}$ 为各个关节在运动过程中的加速度的集合组合而成的向量。$\tau_{des}$ 为目标轨迹针对各个关节设置的力矩的集合组合而成的向量。$\tau$ 为各个关节在实际运动过程中的力矩的集合组合而成的向量。$f$ 为机器人在与地面实际接触的时候，地面在接触点提供的广义力。$\lambda$ 是机器人在运动过程中前腿部作用于后腿部的闭环力。$W_q$、$W_\tau$、$W_f$、$W_\lambda$ 分别标识 $\ddot{q}$、$\tau$、$f$ 和 $\lambda$ 在计算公式(2)的范数时需要乘以的权重系数矩阵。
如图4所示,自适应动态规划确定的控制器将用于控制图1和图3中示出的主动轮。而主动轮的运动状态和动力状态将对应地向各个控制任务提供输入参考或输入限制,从而对机器人的姿势和平衡状态进行改变。对应地,为避免机器人失去平衡,图3中的主动关节(例如,q {1,2}和q {7,8})将在主动轮(例如,q {5,6})、无驱动关节(例如,q {3,4}和q {9,10})以及关节力矩(例如,τ {1,2}和τ {5,6})的共同作用下转动,以调整机器人的位姿,使得机器人保持平衡。
如图4所示,主动轮在自适应动态规划控制器的控制下的转动将向轮平衡控制任务、轮行进和旋转控制任务中的至少一种提供输入参考Ref。目标轨迹将向轮移动和旋转控制任务、基座部姿态控制任务、尾部控制任务提供输入参考。虽然主动轮和目标轨迹并未直接向其他的控制任务(例如,扭矩控制任务和外力控制任务)提供输入参考,但是考虑到各个控制任务往往需要对相同的机器人组件(例如,主轮、连杆组件、关节铰链等等)进行控制,这些控制任务的控制效果也往往受到主动轮和目标轨迹的限制。
进一步参考图4,机器人的运动还受到各种约束的限制,例如,各个关节能够提供的最大扭矩、机械构型的限制等等。图4中给出了四种示例约束,动力学(dynamic)约束、闭环联动(close-loop linkage)约束、非完整约束(nonholonomic)和摩擦力(friction)约束。
作为一个示例,公式(1)示出的动力学模型可以作为动力学约束的一个示例,以限定该机器人在运动过程中能量的变化范围。本领域技术人员应当理解动力学模型的限制不限于此。例如,为便于分析机器人的能量变化,可以对机器人建立简化的动力学模型,以简化公式(1)在动态全身动力学控制中对应的动力学模型限制。
作为又一个示例，公式(3)示出了针对图3中的机器人的一种闭环联动约束的示例。本领域技术人员应当理解，闭环联动约束还可以以其它方式示出。本公开不限于此。

$\left(J_{P_1}-J_{P_2}\right)\dot{q}=0$      (3)

其中，$J_{P_1}$ 和 $J_{P_2}$ 分别是点 $P_1$ 和 $P_2$ 对应的雅可比矩阵。下标 $J_{\cdot,l}$ 和 $J_{\cdot,r}$ 分别标识机器人的左轮腿部和右轮腿部。由于 $P_1$ 和 $P_2$ 实质上是相同的点，该约束要求两点具有相同的速度。
作为又一个示例，假设轮是纯滚动并与地面接触，在轮的径向和轴向不存在滑移和滑动，公式(4)示出了针对图3中的机器人的一种非完整约束的示例。本领域技术人员应当理解，非完整约束还可以以其它方式示出。

$J_{c,x}\dot{q}=0,\qquad J_{c,z}\dot{q}=0$      (4)

其中，$J_{c,x}$ 和 $J_{c,z}$ 是主动轮-地面接触点相对于基座部的雅可比矩阵的x轴和z轴分量。
继续图4中的示例，摩擦力约束的设置可以基于这样的假设：实际运动过程中地面与机器人之间的接触点处的摩擦锥（friction cone）被近似为金字塔形的摩擦锥（friction pyramid）。在每个接触点对应的接触力 $f_i$ 的局部坐标系下，给定摩擦系数 $\mu$，摩擦力约束可以被表达为 $|f_{i,x}|\le\mu f_{i,z}$ 和 $|f_{i,y}|\le\mu f_{i,z}$。

除了图4中示出的四种约束以外，还可以对应地设置单边约束。单边约束的示例可以是 $f_{i,z}>0$。
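上述金字塔形摩擦锥约束与单边约束可以用如下简短的检查函数示意（函数名为本示例的假设）：

```python
def in_friction_pyramid(f, mu):
    """检查接触力 f=(fx, fy, fz) 是否满足
    单边约束 f_z > 0 以及金字塔摩擦约束 |f_x| <= mu*f_z、|f_y| <= mu*f_z。"""
    fx, fy, fz = f
    return fz > 0.0 and abs(fx) <= mu * fz and abs(fy) <= mu * fz
```

在全身动力学优化中，这类不等式通常作为线性约束加入公式(2)的求解器中，而非事后检查；此处仅用于说明约束的几何含义。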
在受到上述的各种约束的情况下,可以对应地确定各种控制任务的控制模型。具体地,主动轮在自适应动态规划控制器的控制下的转动将向轮平衡控制任务提供了输入参考,而目标轨迹将向其它控制任务提供输入参考。例如,主动轮的转动速度将对基座部的姿态和速度造成影响,而基座部的姿态和速度将对机器人的平衡状态造成影响。
作为一个轮平衡控制任务的示例，为控制基座部的运动，可以通过PD控制律（比例微分控制器）来计算基座部的期望加速度 $\ddot{q}_{des}$。在一个示例中，该PD控制律中的至少部分是基于针对姿态的输入参考和针对速度的输入参考得到的。

具体地，针对姿态的输入参考又称为参考姿态，其指示：由于主动轮在自适应动态规划控制器的控制下的转动，导致的除了关节q {5,6}以外的其他各个关节的姿态的变化。针对速度的输入参考又称为参考速度，其指示：由于主动轮在自适应动态规划控制器的控制下的转动，导致的除了关节q {5,6}以外的各个关节的速度的变化。

也即，可以以公式(5)来对公式(2)中的 $\ddot{q}_{des}$ 进行近似的表达。

$\ddot{q}_{des}\approx K_p\left(q_{ref}-q\right)+K_d\left(\dot{q}_{ref}-\dot{q}\right)$      (5)

此外，还可以以公式(6)来进一步地对公式(2)中的 $\tau_{des}$ 进行近似的表达。在公式(6)中假定除关节q {5,6}以外的其他关节的力矩 $\tau_i$（$i\neq 3,4$）近似为零。

$\tau_{des,i}\approx 0,\quad i\neq 3,4$      (6)
又例如,以图1至图3描述的机器人为例,针对姿态的输入参考包括:机器人的重心到主动轮连线中心在地面上投影的距离(例如,以state_com_p标识)。针对速度的输入参考包括:基于机器人的重心到主动轮连线中心在地面上投影的距离的差分的得到的速度(例如,以state_com_v标识)、以及主动轮的线速度(以wheel_x_v标识)。上述的PD控制律可以以state_com_p、state_com_v、wheel_x_v作为输入状态解算得到主动轮的参考加速度或参考扭矩中的至少一种。
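上述以 state_com_p、state_com_v、wheel_x_v 作为输入状态的PD控制律可以示意如下（增益 kp、kd、kv 以及符号约定均为本示例的假设值，并非本公开限定的参数）：

```python
def wheel_reference_acc(state_com_p, state_com_v, wheel_x_v, target_v,
                        kp=120.0, kd=20.0, kv=5.0):
    """由重心投影距离（state_com_p）、其差分速度（state_com_v）以及
    主动轮线速度（wheel_x_v）解算主动轮的参考加速度；增益为示意值。"""
    return kp * state_com_p + kd * state_com_v + kv * (target_v - wheel_x_v)
```

在平衡点处（state_com_p=0、state_com_v=0 且轮速等于目标速度），该参考加速度为零；重心前倾时输出正的参考加速度以"追回"重心。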
本公开的实施例可选地结合全身动力学技术,提出一种基于自适应动态规划的全身动力学控制方法。该全身动力学控制方法将基于自适应动态规划而计算的机器人某个关节的控制器的输出作为全身动力学控制的参考参数,从而使得该关节的控制器能够与其它关节的控制器相配合,从而提高该机器人运动的整体的灵活性。
可选地,本公开的实施例还对应地公开了一种利用对应于机器人的动力学特性的控制器的机器人运动控制的方法。机器人包括轮腿部和连接至轮腿部的基座部,轮腿部包括主动轮和至少一个关节。具体地,该方法包括:接收第二控制器的运动指令,运动指令指示机器人的运动轨迹;根据运动指令,利用对应于机器人的动力学特性的控制器控制主动轮的驱动力,以使得机器人沿着目标轨迹平稳运动。在对应于机器人的动力学特性的控制器的控制下的机器人,相对于在第一控制器的控制下的机器人,在运动过程中更靠近于平衡点。
可选地,本公开的实施例还对应地公开了一种控制机器人的方法。具体地,该方法包括:接收第一控制器的运动指令,运动指令指示机器人的运动轨迹;根据运动指令,控制主动轮的驱动力,以使得机器人在第一控制器的控制下运动并获取运动过程中的运动状态数据和控制数据;基于运动状态数据和控制数据,使用策略迭代的方式构建对应于机器人的动力学特性的第二控制器,利用第二控制器控制主动轮的驱动力,以使得机器人平稳运动。在第二控制器的控制下的机器人,相对于在任意其它控制器的控制下的机器人,在运动过程中具有更优的控制效果,例如,更靠近于平衡点。
由此,本公开的实施例的控制机器人的方法能够使得动力学特性未知的机器人学习运动过程中的数据,并逐步改进/生成对应于机器人的动力学特性的控制器,最终能够实现平稳运动。由于可以使用第一控制器的控制输入来控制机器人运动一段时间以获得训练数据,在这样的情况下,本公开的实施例实现了在动力学特性未知或动力学特性改变的情况下对非最优控制器的改进,生成了对应于机器人的(精确)动力学特性的第二控制器。也即本公开的实施例可以使得机器人在没有精确的动力学模型的情况下,也能够对机器人进行灵活控制。
例如，进一步地参考图5的示例，机器人100的运动过程在数学上可被看作一个连续时间线性系统。假设对于机器人100存在对应于线性二次调节问题的最优解的控制器，其能够使得机器人的运动过程对应的成本泛函最小。例如，对应于线性二次调节问题的最优解的控制器能够最小化机器人处于平衡点附近的成本并能以最小能耗沿目标轨迹行进。
作为一个示例，线性二次调节问题可以由公式(7)定义，其指示在 $\dot{x}=Ax+Bu$ 的情况下，求解能够最小化连续时间线性系统的成本泛函J的控制器。其中，类似地，$A\in\mathbb{R}^{n\times n}$ 并且 $B\in\mathbb{R}^{n\times m}$。

$J=\int_0^{\infty}\left(x^TQx+u^TRu\right)dt$      (7)

其中，J是该连续时间线性系统的成本泛函，Q是一个实对称且正半定的矩阵，$(A,Q^{1/2})$ 是可观测的，并且 $R>0$。x与机器人构型和轮平衡任务相关。例如，参考图4中的示例，如果需要对主动轮确定控制器，那么x可选地包括俯仰角、俯仰角角速度以及机器人的线速度，u则是两个轮的输入扭矩之和。

根据最优控制理论，数学上，代数黎卡提（Algebraic Riccati）方程（公式(8)）可给出由公式(7)定义的线性二次调节（LQR）问题的解。

$A^TP^*+P^*A-P^*BR^{-1}B^TP^*+Q=0$      (8)

其中，$u^*(t)$ 为对应于线性二次调节问题的最优解的控制器，$u^*(t)=-K^*x(t)$，其中，$K^*=R^{-1}B^TP^*$，$P^*$ 为满足公式(8)的矩阵。
如果机器人100的精确动力学特性已知,那么公式(7)和(8)中的矩阵A和B就已知。在已知公式(7)和(8)中的矩阵A和B的情况下,能够对应的求解出u *(t)。
然而，如上所述，在机器人100的精确动力学特性未知，或仅能够确定机器人100的部分动力学特性的情况下，在实际应用中无法确定上述的最优控制器 $u^*(t)=-K^*x(t)$。更进一步地，公式(8)关于P并非线性，从而导致了难以求解出准确的 $P^*$。
在本公开的实施例的各个方面中，在机器人100的精确动力学特性未知或动力学特性中的可变参数改变的情况下，或仅能够确定机器人100的部分动力学特性的情况下，通过策略迭代的方案来确定上述的最优控制器 $u^*(t)=-K^*x(t)$。具体地，根据策略迭代的相关理论，可以对应地确定：
假设存在 $K_0\in\mathbb{R}^{m\times n}$，$K_0$ 是一个稳定控制增益矩阵。也即，$A-BK_0$ 是Hurwitz的。那么，如果通过公式(9)不断地更新 $K_k$，那么在k趋向于正无穷时，$K_k$ 将趋近于 $K^*$，也即 $\lim_{k\to\infty}K_k=K^*$。

$K_{k+1}=R^{-1}B^TP_k$      (9)

在公式(9)中，$P_k>0$，并且 $P_k$ 是李雅普诺夫（Lyapunov）方程的解。李雅普诺夫方程的示例参见公式(10)。

$A_k^TP_k+P_kA_k+Q+K_k^TRK_k=0$      (10)

在公式(9)和公式(10)中，$k=0,1,2,\dots$，$A_k=A-BK_k$。与 $K_k$ 类似地，$\lim_{k\to\infty}P_k=P^*$。由此，公式(9)和公式(10)描述了 $K_{k+1}$、$K_k$ 和 $P_k$ 三者之间的关系。
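在模型信息(A,B)已知时，公式(9)与公式(10)即为经典的 Kleinman 策略迭代。以下 Python 草图在一个假设的二阶系统上验证 $K_k\to K^*$（系统矩阵仅为示例，并非机器人的真实动力学）：

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

A = np.array([[0.0, 1.0], [2.0, -1.0]])  # 假设的开环不稳定系统
B = np.array([[0.0], [1.0]])
Q = np.diag([10.0, 1.0])
R = np.array([[1.0]])
K = np.array([[10.0, 5.0]])              # 任一稳定的初始增益 K_0

for _ in range(20):
    Ak = A - B @ K
    # 公式(10)：A_k^T P + P A_k + Q + K^T R K = 0
    P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
    K_new = np.linalg.solve(R, B.T @ P)  # 公式(9)：K_{k+1} = R^{-1} B^T P_k
    if np.abs(K_new - K).max() < 1e-10:
        K = K_new
        break
    K = K_new

# 与代数黎卡提方程（公式(8)）的解对比
P_star = solve_continuous_are(A, B, Q, R)
K_star = np.linalg.solve(R, B.T @ P_star)
```

该迭代通常在数次内即二次收敛；后文的数据驱动策略迭代正是在不知道(A,B)的情况下实现同一更新。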
由此,至少部分地基于上述公式(9)和公式(10),可以示例性地确定利用策略迭代方式对第一控制器的线性平衡参数矩阵进行更新的方案。
例如,上述的步骤S202可以进一步地包括:对对应于多个时间区间中的运动状态数据和控制数据进行非线性组合以确定训练数据集合,基于训练数据集合,确定迭代关 系函数;根据迭代关系函数,对迭代目标项进行多次策略迭代,逼近得到对应于机器人的动力学特性的线性平衡参数矩阵。
接下来，以图1至图4中描述的示例来说明步骤S202。根据图1至图4的描述，可以使用第一控制器 $u_0$ 来控制机器人进行运动，并采集对应于多个时间区间中的运动状态数据和控制数据。具体地，例如，对于控制策略 $u=u_0$，闭环系统可以如公式(11)所示。

$\dot{x}=Ax+Bu_0$      (11)

然后，通过公式(9)和公式(10)，沿公式(11)定义的轨迹，$x(t)^TP_kx(t)$ 对于时间的导数可示例性地通过公式(12)示出。

$\frac{d}{dt}\left[x^TP_kx\right]=-x^T\bar{Q}_kx+2\left(u_0+K_kx\right)^TRK_{k+1}x$      (12)

其中，$\bar{Q}_k=Q+K_k^TRK_k$。
进一步地,运动状态数据是通过传感器在一段时间中以一定的时间间隔采集的,其分别对应于一段时间内的各个离散的时刻的机器人的运动状态。因此运动状态数据和第一控制器的控制数据可以对应于[t 0,t r]中的多个时间区间。多个时间区间中的任意一个时间区间t至t+δt可以被记为[t,t+δt],其时长δt可以根据机器人传感器能够达到的数据收集时间间隔来确定。
参考图5,可以对对应于多个时间区间中的运动状态数据和控制数据分别进行非线性组合,以用于构建迭代关系函数。积分运算后的运动状态数据和控制数据将作为训练数据,参与步骤S202中的对迭代目标项进行策略迭代的过程,以逼近得到对应于机器人的动力学特性的线性平衡参数矩阵。值得注意的是以下描述的仅是一种示例性的积分运算,本公开并不以此为限。
例如，可以取公式(12)两边在时间区间 $[t,t+\delta t]$ 上的积分，并重新排列，可以确定示例性的公式(13)。

$x^TP_kx\Big|_t^{t+\delta t}=-\int_t^{t+\delta t}x^T\bar{Q}_kx\,d\tau+2\int_t^{t+\delta t}\left(u_0+K_kx\right)^TRK_{k+1}x\,d\tau$      (13)
为了确定变量 $P_k$ 和 $K_{k+1}$，可以在多个时间区间对公式(13)进行迭代。例如，可以将r指定为一个足够大的整数，并使得对于所有 $i=0,1,\dots,r-1$，$\delta t\le t_{i+1}-t_i$。
根据公式(13)可知,任意两个相邻时刻t和t+δt之间的运动状态数据在时间上的积分可能与以下各项中的至少一项相关:时刻t的运动状态数据的二次项、时刻t+δt的运动状态数据的二次项、时刻t的运动状态数据与时刻t+δt的运动状态数据的乘积、时刻t的控制数据与时刻t的运动状态数据的乘积、时刻t+δt的控制数据与时刻t+δt的运动状态数据的乘积等等。可选地,时刻t的控制数据为使用第一控制器控制机器人行进的控制数据。
为便于进一步描述策略迭代的过程，对于给定的整数r，本公开的实施例可选地以公式(14)定义了以下三个矩阵作为训练数据集合中的示例元素：第一矩阵 $\Delta_{xx}$、第二矩阵 $\Sigma_{xx}$、和第三矩阵 $\Sigma_{xu}$。每个矩阵都对应于多个时间区间中的运动状态数据和控制数据的非线性组合，例如涉及积分运算和乘积计算等等。

$\Delta_{xx}=\left[\bar{x}(t_1)-\bar{x}(t_0),\ \bar{x}(t_2)-\bar{x}(t_1),\ \dots,\ \bar{x}(t_r)-\bar{x}(t_{r-1})\right]^T$

$\Sigma_{xx}=\left[\int_{t_0}^{t_1}x\otimes x\,d\tau,\ \int_{t_1}^{t_2}x\otimes x\,d\tau,\ \dots,\ \int_{t_{r-1}}^{t_r}x\otimes x\,d\tau\right]^T$

$\Sigma_{xu}=\left[\int_{t_0}^{t_1}x\otimes u_0\,d\tau,\ \int_{t_1}^{t_2}x\otimes u_0\,d\tau,\ \dots,\ \int_{t_{r-1}}^{t_r}x\otimes u_0\,d\tau\right]^T$      (14)

其中，$0\le t_0<t_1<\dots<t_r$。运算符 $\otimes$ 表示克罗内克积（Kronecker product），$\bar{x}$ 为前文定义的向量运算。
例如，对于图1至图4描述的机器人100，第一矩阵 $\Delta_{xx}$ 中的任意元素对应于时刻 $t_i$ 和时刻 $t_{i+1}$ 的基座部俯仰角、基座部俯仰角速度、主动轮的线性速度中任意两项的乘积或任意一项的二次项之差。第二矩阵 $\Sigma_{xx}$ 中的任意元素对应于时间区间 $[t_i,t_{i+1}]$ 内基座部俯仰角、基座部俯仰角速度、主动轮的线性速度中任意两项的乘积或任意一项的二次项的积分。第三矩阵 $\Sigma_{xu}$ 中的任意元素对应于时间区间 $[t_i,t_{i+1}]$ 内基座部俯仰角、基座部俯仰角速度、主动轮的线性速度中任意一项与由第一控制器控制的驱动力的乘积的积分。不同机器人的构型将对应于不同的矩阵，以上仅作为示例示出，本公开并不以此为限。
接下来，针对不同的t，例如，$t=t_0,t_1,\dots,t_r$，公式(13)的方程组可以示例性地写成公式(15)的形式。本领域技术人员应当理解，不同的训练数据的线性组合方式将对应地影响所构建的迭代关系函数的形式。以下仅是示例性地给出基于公式(13)而得到的迭代关系函数（例如，公式(15)），其中，迭代目标项包括待迭代的线性平衡参数矩阵，以及以待迭代的线性平衡参数矩阵为参数的李雅普诺夫方程的解。当然，本公开并不以此为限。

$\Theta_k\begin{bmatrix}\mathrm{vecs}(P_k)\\[2pt]\mathrm{vec}(K_{k+1})\end{bmatrix}=\Xi_k$      (15)

其中，vec(·)标识对括号内的内容进行矢量化。此外，$\Theta_k$ 并且 $\Xi_k$ 可被定义成公式(16)中所示的形式。其中，如上所述，k指示策略迭代的次数，$P_k$ 为第k次策略迭代中的李雅普诺夫方程的解，$K_k$ 为第k次策略迭代中使用的线性平衡参数矩阵，$K_{k+1}$ 为第k+1次策略迭代中的线性平衡参数矩阵。

$\Theta_k=\left[\Delta_{xx},\ \ -2\Sigma_{xx}\left(I_n\otimes K_k^TR\right)-2\Sigma_{xu}\left(I_n\otimes R\right)\right],\qquad \Xi_k=-\Sigma_{xx}\,\mathrm{vec}(\bar{Q}_k)$      (16)

在上述的从公式(13)到公式(15)之间的转换过程中，为了简化计算，可令 $\bar{Q}_k=Q+K_k^TRK_k$。

由此，通过将公式(16)中的 $K_k$ 更新为公式(15)中的 $K_{k+1}$，策略迭代方案使得最优控制器的生成不再依赖于模型信息(A,B)。此外，公式(16)还可以收集在线采集的数据，并利用公式(15)将控制策略从 $K_k$ 更新为 $K_{k+1}$。因此，在公式(16)中收集的数据还可以被重复使用，以应用公式(15)针对 $k=0,1,\dots,l$ 更新 $K_k$，并且更新过程可以是在线的或离线的。因此，这样的策略迭代过程还可以被称为脱机策略迭代（off-policy iteration）。
此外,为了确保唯一的一对(P k,K k+1)存在以满足公式(15)的要求,还需要满足公式(17)定义的秩条件。
$\mathrm{rank}\left(\left[\Sigma_{xx}\ \ \Sigma_{xu}\right]\right)=n(n+3)/2$      (17)
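秩条件(17)可以用数值方式检验。下述草图用随机生成的样本近似构造 $\Sigma_{xx}$ 与 $\Sigma_{xu}$（数据仅为示意，用以说明在 n=2、m=1 时满秩为 n(n+3)/2=5：$x\otimes x$ 的对称冗余使 $\Sigma_{xx}$ 的秩至多为 n(n+1)/2）：

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 2, 1, 30                         # 状态维数、输入维数、时间区间数
Sxx = np.zeros((r, n * n))
Sxu = np.zeros((r, n * m))
tau = np.linspace(0.0, 0.05, 6)            # 单个时间区间内的采样时刻（示意）
for i in range(r):
    x = rng.normal(size=(len(tau), n))     # 占位的状态样本
    u = rng.normal(size=(len(tau), m))     # 占位的控制样本
    xx = np.einsum('ti,tj->tij', x, x).reshape(len(tau), -1)   # x ⊗ x
    xu = np.einsum('ti,tj->tij', x, u).reshape(len(tau), -1)   # x ⊗ u
    w = np.diff(tau)                       # 梯形积分权重
    Sxx[i] = np.tensordot(w, 0.5 * (xx[1:] + xx[:-1]), axes=(0, 0))
    Sxu[i] = np.tensordot(w, 0.5 * (xu[1:] + xu[:-1]), axes=(0, 0))

rank = np.linalg.matrix_rank(np.hstack([Sxx, Sxu]))
```

在真实系统中，数据来自受探索噪声激励的轨迹；若秩不足，说明数据多样性不够，需要继续采集。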
具体地,根据lim k→∞P k=P *可知,如果相邻两次策略迭代对应的李雅普诺夫方程的解P k和P k+1之差小于预设值(例如一个非常小的值),那么迭代目标项收敛,并且策略迭代结束。
接下来参考图6至图7B进一步描述对图1至图4所示的机器人进行控制的方法。图6示出了机器人100的又一结构视图。图7A示出了利用第一控制器控制机器人的运动过程中的运动状态数据和控制数据。图7B示出了构建对应于机器人的动力学特性的控制器的过程中线性平衡参数矩阵的收敛过程,其中机器人的基座部高度分别为0.5米和0.6米。图7C示出了机器人利用第一控制器和第二控制器分别控制机器人在基座部高度为0.6米的情况下进行直线行走的运动状态数据。
如图6所示,机器人100除了图1至图4中所描述的轮腿部和基座部外,还包括数据采集装置、数据处理装置和驱动电机。
其中，数据采集装置可以被配置为：在利用第一控制器控制机器人运动的情况下，获取运动过程中的运动状态数据和控制数据。例如，数据采集装置可以包括：第一传感器，用于测量基座部的俯仰角 $\theta$ 及其角速度 $\dot{\theta}$；第二传感器，用于测量左右主动轮的旋转角速度 $\dot{\phi}_l$ 和 $\dot{\phi}_r$。其中，第一传感器可以是惯性测量单元（Inertial measurement unit，简称IMU），其可以包括三轴陀螺仪、三轴加速度计、或三轴磁力计。第二传感器可以是电机编码器，采样频率为200Hz。
数据处理装置,被配置为:根据运动状态数据和控制数据,使用策略迭代的方式对第一控制器的线性平衡参数矩阵进行更新;以及基于更新后的线性平衡参数矩阵,构建对应于机器人的动力学特性的第二控制器。
数据处理装置可以包括微处理器、数字信号处理器(“DSP”)、专用集成电路(“ASIC”)、现场可编程门阵列、状态机或用于处理从传感器线接收的电信号的其他处理器件。这种处理器件可以包括可编程电子设备,例如PLC,可编程中断控制器(“PIC”)、可编程逻辑器件(“PLD”)、可编程只读存储器(“PROM”)、电子可编程只读存储器等。
例如，数据处理装置还可以进一步地配置为对数据采集装置采集的数据进行进一步处理。例如，基于左右主动轮的旋转角速度 $\dot{\phi}_l$ 和 $\dot{\phi}_r$，数据处理装置可以计算机器人100的线速度 $v=r_w\left(\dot{\phi}_l+\dot{\phi}_r\right)/2$，其中 $r_w$ 为主动轮半径。可选地，$r_w=0.1\,\mathrm{m}$。可选地，数据处理装置还可以计算机器人的偏航角速度 $\dot{\psi}=r_w\left(\dot{\phi}_r-\dot{\phi}_l\right)/w_d$，其中 $w_d$ 为机器人宽度，可选地，$w_d=0.47\,\mathrm{m}$。
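上述由左右主动轮角速度计算线速度与偏航角速度的差速运算可以示意如下（左右轮与正负号的约定为本示例的假设）：

```python
r_w, w_d = 0.1, 0.47   # 主动轮半径与机器人宽度（取自上文的可选值）

def base_twist(phi_dot_l, phi_dot_r):
    """由左右主动轮角速度计算机器人的线速度 v 与偏航角速度 psi_dot。"""
    v = r_w * (phi_dot_l + phi_dot_r) / 2.0
    psi_dot = r_w * (phi_dot_r - phi_dot_l) / w_d
    return v, psi_dot
```

两轮等速时偏航角速度为零，仅产生前进速度；两轮反向等速时仅产生原地旋转。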
为便于说明，本公开仅给出利用第一控制器或第二控制器控制主动轮104的示例，本领域技术人员应当理解，本公开的方案也可以用于控制机器人的其它组件。由于主动轮仅用于控制机器人的向前和向后运动，对于弯曲的目标轨迹，还需要一个用于控制偏航角的控制器来控制机器人转向。为简化描述，将该控制偏航角的控制器设置为 $\tau_\psi=k_\psi\left(\dot{\psi}_{des}-\dot{\psi}\right)$，其中 $\dot{\psi}_{des}$ 是目标偏航角速度。然后通过 $\tau_l=\left(\tau-\tau_\psi\right)/2$ 和 $\tau_r=\left(\tau+\tau_\psi\right)/2$ 计算左右轮的扭矩。由于 $\tau_l+\tau_r=\tau$，$\tau_\psi$ 不改变沿机器人纵向的力。因此，偏航运动不会影响机器人的平衡。此后，角度单位被转换为"度（deg）"，以便阅读。
接着，数据处理装置基于给定的目标轨迹，计算第一控制器的控制数据。为便于说明，后续以线性平衡参数矩阵 $K=K_0=[-81.99,\,-34.96,\,-16.38]$ 的控制器 $u_0$ 作为第一控制器进行说明。该第一控制器对应于机器人的身高最低时、采用数值迭代的方式获得的、能够控制机器人100直立行走的最优控制器。具体地，机器人的最低身高为0.33米。更进一步地，数据处理装置的控制频率可选地为1000Hz。
如上所述,运动状态数据和控制数据将用于计算第一矩阵Δ xx、第二矩阵∑ xx、和第三矩阵∑ xu。这些数据需要x和u的连续信号,因此在第一控制器和对应于机器人的动力学特性的控制器控制机器人100运动的情况下,数据处理装置进一步地还可以使用梯形积分来计算积分。梯形积分的步长为0.01s,与采样周期相同。
如图7A所示,可以将机器人的身高升至0.5米,使用第一控制器(其线性平衡参数矩阵K=K 0=[-81.99,-34.96,-16.38])叠加探索噪声β(t)控制机器人行走5秒,并对应地采集偏航角、偏航角速度、线速度和主动轮的扭矩。具体地,探索噪声通常用于学习和系统识别领域。探索噪声可以触发各种系统行为,以避免重复收集数据。作为一个示例,探索噪声β(t)=sin(10πt)+0.4cos(6πt)。
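上述探索噪声 $\beta(t)=\sin(10\pi t)+0.4\cos(6\pi t)$ 可以实现为如下函数（该信号由5Hz与3Hz两个分量组成，周期为1秒）：

```python
import numpy as np

def exploration_noise(t):
    """探索噪声 β(t) = sin(10πt) + 0.4 cos(6πt)，用于触发多样的系统行为。"""
    return np.sin(10 * np.pi * t) + 0.4 * np.cos(6 * np.pi * t)
```

采集数据时，将该噪声叠加到第一控制器的输出上，即 $u_0(t)=-K_0x(t)+\beta(t)$。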
作为一个示例,数据处理装置还可以进一步被配置有如下指令来执行第一控制器的控制数据的计算以及第二控制器的构建。为便于表述,指令以伪代码的形式示出,本领域技术人员可以基于以下伪代码利用任何编程语言计算第一控制器的控制数据和构建第二控制器。
1: 选择一个稳定的初始增益矩阵 $K_0$，并使得 $t_0=0$。
2: 对机器人施加 $u_0(t)=-K_0x(t)+\beta(t)$，其中 $\beta(t)$ 为探索噪声，并利用数据采集装置采集数据，以计算第一矩阵至第三矩阵，直到公式(17)的秩条件满足。
3: 利用公式(15)求解 $P_k$ 和 $K_{k+1}$，并将 $k+1$ 赋值给 $k$，重复本步骤。
4: 如果 $|P_{k+1}-P_k|<\varepsilon$ 就停止迭代。$\varepsilon$ 可以是一个很小的预设阈值。
5: 使用 $u=-K_kx$ 作为第二控制器。
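上述伪代码的一个可运行的 Python/NumPy 草图如下。其中的二阶系统 (A,B) 仅用于生成数据以代替未知的真实机器人动力学，学习过程本身不使用 (A,B)；积分采用梯形法，最后与基于模型的 LQR 解对比以验证收敛（系统矩阵、增益、时长等数值均为示例假设）：

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def trapz(y, tt):
    """沿第 0 轴的梯形积分（避免依赖 np.trapz 的版本差异）。"""
    w = np.diff(tt)
    return np.tensordot(w, 0.5 * (y[1:] + y[:-1]), axes=(0, 0))

# ---- 数据生成：(A, B) 仅用于仿真"未知"的被控对象，学习时不可见 ----
A = np.array([[0.0, 1.0], [2.0, -1.0]])
B = np.array([[0.0], [1.0]])
Q = np.diag([10.0, 1.0]); R = np.array([[1.0]])
K0 = np.array([[10.0, 5.0]])                 # 步骤1：稳定的初始增益矩阵
dt, T = 1e-3, 6.0
t = np.arange(0.0, T, dt)
x = np.zeros((len(t), 2)); x[0] = [0.5, -0.2]
u = np.zeros((len(t), 1))
for i in range(len(t) - 1):                  # 步骤2：施加 u0 = -K0 x + β(t)
    beta = 0.5 * (np.sin(3*t[i]) + np.sin(7*t[i]) + np.sin(11*t[i]) + np.sin(13*t[i]))
    u[i] = -K0 @ x[i] + beta
    x[i + 1] = x[i] + dt * (A @ x[i] + B @ u[i])
u[-1] = u[-2]

# ---- 步骤3-4：基于公式(13)逐区间构造线性方程并做策略迭代 ----
n, m = 2, 1
bounds = np.arange(0, len(t), 50)            # 区间端点 t_0 < t_1 < ... < t_r
K, P_prev = K0.copy(), None
for _ in range(30):
    a = (u + x @ K.T) @ R                    # a(τ) = R(u0 + K_k x)，R 对称
    Qk = Q + K.T @ R @ K
    rows, rhs = [], []
    for s, e in zip(bounds[:-1], bounds[1:]):
        seg = slice(s, e + 1)
        dP = np.outer(x[e], x[e]) - np.outer(x[s], x[s])          # x^T P x 的区间差
        Iax = trapz(np.einsum('ti,tj->tij', a[seg], x[seg]), t[seg])
        qx = np.einsum('ti,ij,tj->t', x[seg], Qk, x[seg])
        rows.append(np.concatenate([dP.ravel(), -2.0 * Iax.ravel()]))
        rhs.append(-trapz(qx, t[seg]))
    theta, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    P = theta[:n * n].reshape(n, n); P = 0.5 * (P + P.T)
    K = theta[n * n:].reshape(m, n)
    if P_prev is not None and np.abs(P - P_prev).max() < 1e-8:    # 步骤4
        break
    P_prev = P

# ---- 与基于模型的最优解对比（仅用于验证，学习过程未使用 A、B）----
P_star = solve_continuous_are(A, B, Q, R)
K_star = np.linalg.solve(R, B.T @ P_star)
```

此处为简化起见直接以 $\mathrm{vec}(P)$（而非 vecs）参数化，最小二乘的最小范数解会自动给出对称的 P；步骤5即取迭代收敛后的 $u=-Kx$ 作为第二控制器。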
在数据处理装置被配置有上述的伪代码的情况下，如图7B的上图所示，线性平衡参数矩阵逐渐收敛至 $K=[-99.58,\,-35.87,\,-5.18]$。如图7B的上图所示，仅使用37次迭代，就可以使得 $|P_{k+1}-P_k|<\varepsilon=10^{-5}$。
类似地，本公开的实施例还以 $u=-Kx$ 作为第一控制器（其中，$K=[-99.58,\,-35.87,\,-5.18]$），然后将机器人的身高升至0.6米进行了一组类似的实验，如图7B的下图所示，线性平衡参数矩阵收敛至 $K=[-109.64,\,-34.08,\,-11.58]$。仅使用47次迭代，就可以使得 $|P_{k+1}-P_k|<\varepsilon=10^{-5}$。很明显，策略迭代的收敛速度非常快，因此本公开的实施例能够适用于在线计算。
图7C进一步示出了第一控制器 $u_0(t)=-K_0x(t)$（其中，$K_0=[-81.99,\,-34.96,\,-16.38]$）与第二控制器 $u(t)=-K_1x(t)$（其中，$K_1=[-109.64,\,-34.08,\,-11.58]$）的控制效果之间的比较。在该比较实验中，对第一控制器 $u_0(t)$ 和第二控制器 $u(t)$ 均加入相同的正弦噪声来模拟外界作用在轮上的扰动。如图7C的上图和下图所示，两个控制器对噪声都是鲁棒的，并且控制性能相似。然而，如图7C的下图中由第二控制器的较小幅度所指示的，更新的增益 $K_1$ 在俯仰角速度 $\dot{\theta}$ 的调节中控制效果更好，从而机器人的行进状态更为稳定。
本公开的实施例结合最优控制技术,提出了一种基于自适应动态规划的数值迭代方法,该基于自适应动态规划的数值迭代方法能够在机器人动力学特性未知的情况下计算收敛至对应于机器人的动力学特性的控制器。与该机器人的精确动力学特性对应的控制器也即对应于线性二次调节问题的最优解的控制器,其能够使得机器人在平衡状态下以最优的控制效果沿目标轨迹行进。
接下来参考图8进一步描述构建对应于机器人的动力学特性的第一控制器的示例。图8示出了基于机器人的历史运动数据来构建第一控制器的流程图。
首先，机器人100的运动过程在数学上可被看作一个连续时间线性系统。假设对于机器人100存在对应于线性二次调节问题的最优解的控制器，其能够使得机器人的运动过程对应的成本泛函最小。例如，对应于线性二次调节问题的最优解的控制器能够最小化机器人处于平衡点附近的成本并能以最小能耗沿目标轨迹行进。
作为一个示例,如上所述,已经阐述了公式(7)和公式(8)。如果机器人100的精确动力学特性已知,那么公式(7)和(8)中的矩阵A和B就已知。在已知公式(7)和(8)中的矩阵A和B的情况下,能够对应的求解出u *(t)。
然而，如上所述，在机器人100的精确动力学特性未知，或仅能够确定机器人100的部分动力学特性的情况下，在实际应用中无法确定上述的最优控制器 $u^*(t)=-K^*x(t)$。更进一步地，公式(8)关于P并非线性，从而导致了难以求解出准确的 $P^*$。
如上所述，根据LQR相关理论以及数值迭代相关理论，针对公式(7)和公式(8)，如果(A,B)是可稳定的且 $(A,Q^{1/2})$ 是可观测的，那么对于任何 $S\ge0$，$\lim_{s\to-\infty}P(s)=P^*$，其中，$P(s)$ 是下列微分黎卡提方程（公式(18)）的解，而 $P^*$ 是公式(8)的解。

$-\frac{dP(s)}{ds}=A^TP(s)+P(s)A-P(s)BR^{-1}B^TP(s)+Q,\qquad P(0)=S$      (18)

也即，在s趋向负无穷时，P(s)将收敛于 $P^*$。
基于以上理论,如图8所示,本公开的实施例示出了构建第一控制器的数据处理过程。
在步骤S801中,从机器人的历史运动数据中获取历史运动状态数据和历史控制数据,历史运动状态数据和历史控制数据的多样性度量高于预定阈值。
例如，可以控制机器人沿预定轨迹运动，并获取历史运动过程中的历史运动状态数据和历史控制数据。预定轨迹可以是基于机器人的结构特性、运动特性、动力特性粗略估计得到的，以便采集机器人在各种运动情形（场景）下的历史运动数据，从而使得历史运动状态数据和历史控制数据的多样性度量足够高（例如，至少高于预定阈值）。在一个示例中，多样性度量可以以信息熵来进行表征，其表征历史运动状态数据和历史控制数据均存在足够多的不重复/不相近的值。在又一个示例中，多样性度量还可以以数据特征量来表征。
此时,可以以任意控制器控制机器人沿预定轨迹运动。例如,可以手动控制机器人以不同的加速度沿直线运动,而不论机器人是否处于平衡稳定运动的状态。如图1至图4所示的机器人,如果主动轮104提供过大的加速度,机器人100很快就会向后倾倒。如果主动轮104提供过小的加速度,则无法很快到达目的地并可能向前倾倒。
因此,在本公开的一个示例中,可以采用以下方式收集满足多样性度量的历史运动状态数据和历史控制数据。
如图9所示,例如,可以先控制驱动电机输出第一扭矩,以使得机器人由于低速运动而失去平衡。例如,第一扭矩可以为较小值,从而在控制驱动电机输出第一扭矩的情况下,机器人的基座部的质心先升高后降低,并且机器人在失去平衡时基座部的前端与地面接触。也即,机器人从低头状态(基座部的质心较低的状态)向前冲,但由于冲的力不够大,抬头抬了一半又低下去。
然后，还可以控制驱动电机输出第二扭矩，以使得机器人由于高速运动而失去平衡。例如，第二扭矩可以为较大值，在控制驱动电机输出第二扭矩的情况下，机器人的基座部的质心先升高后降低，并且机器人在失去平衡时基座部的后端与地面接触。也即，机器人从低头状态（基座部的质心较低的状态）向前冲，但由于冲的力过大，过了平衡点（基座部的质心最高点）向后侧倒下去。
然后,还可以控制驱动电机输出第三扭矩,以使得机器人维持平衡态一段时间。其中,在控制驱动电机输出第三扭矩的情况下,机器人的基座部的质心在机器人维持平衡态的情况下保持高度不变。或者控制驱动电机输出第四扭矩,以使得机器人维持类平衡态一段时间,类平衡状态下的机器人在运动过程中处于平衡点附近。在控制驱动电机输出第四扭矩的情况下,机器人的基座部在机器人维持类平衡态的情况下前后晃动。
如上所述,可以通过手动在远程遥控器输入的指令,并将这些指令发送给遥控控制器。遥控控制器接收到该指令后,可以确定其对应的控制数据。然后可以根据遥控控制器的控制数据,控制机器人运动,并获取运动过程中的运动状态数据。在一些实施例中,由于人眼和人手的反应速度难以满足机器人的控制要求,机器人的平衡很难通过手动控制。也即,遥控控制器并非是类平衡控制器,其往往导致机器人失去平衡。
接着,在步骤S802中,根据历史运动状态数据和历史控制数据,使用数值迭代的方式计算线性平衡参数矩阵。在步骤S803中,基于线性平衡参数矩阵,构建用于控制机器人运动的第一控制器。其中,在对应于机器人的动力学特性的控制器的控制下的机器人,相对于在遥控控制器的控制下的机器人,在运动过程中具有更优的控制效果。
可选地,对应于机器人的动力学特性的控制器为线性控制器,针对运动过程中的各个时刻,对应于机器人的动力学特性的控制器提供的控制力矩负相关于线性平衡参数矩阵和机器人的运动状态数据之间的乘积。
例如，针对图8中的步骤S802，其可以进一步包括：对多个时间区间中的历史运动状态数据和历史控制数据分别进行积分运算，构建迭代关系函数；以及根据迭代关系函数，对迭代目标项进行数值迭代，逼近得到对应于机器人的动力学特性的线性平衡参数矩阵。可选地，线性平衡参数矩阵K即为 $K(s)=R^{-1}B^TP(s)$，其中，s趋向负无穷。
接下来,以图1至图4中描述的示例来分别说明步骤S802。根据图1至图4的描述,历史运动状态数据和历史控制数据可以是用于训练的运动状态数据和用于训练的控制数据。可选地,用于训练的运动状态数据和用于训练的控制数据是机器人并未倾倒(例如,基座部的前端/后端或尾部未与地面接触)的时刻的历史运动状态数据和控制数据。也即,至少在该段运动过程中,基于公式(18),以下公式(19)成立。
$\frac{d}{dt}\left[x(t)^TP(s)x(t)\right]=x(t)^TH(s)x(t)+2u(t)^TRK(s)x(t)$      (19)

其中，$H(s)=A^TP(s)+P(s)A$ 并且 $K(s)=R^{-1}B^TP(s)$。
进一步地，历史运动状态数据是通过传感器在一段时间中按照一定的时间间隔采集的，其分别对应于一段时间内的各个离散的时刻的机器人的历史运动状态。因此，历史运动状态数据和第一控制器的历史控制数据可以对应于 $[t_0,t_r]$ 中的多个时间区间。多个时间区间中的任意一个时间区间 $t_i$ 至 $t_{i+1}$ 可以被记为 $[t,t+\delta t]$，其时长 $\delta t$ 可以根据机器人传感器能够达到的数据收集时间间隔来确定。
例如，取公式(19)两边在时间区间 $[t,t+\delta t]$ 上的积分，针对不同的t，例如，$t=t_0,t_1,\dots,t_r$，可以得到公式(20)。

$\Delta_{xx}\,\mathrm{vecs}(P(s))=\Sigma_{xx}\,\mathrm{vec}(H(s))+2\Sigma_{xu}\left(I_n\otimes R\right)\mathrm{vec}(K(s))$      (20)

其中，$\Delta_{xx}$、$\Sigma_{xx}$、$\Sigma_{xu}$ 的表达式在公式(14)中已经给出。由此，通过不断地迭代地求解公式(20)并更新公式(18)，在秩条件满足且存在唯一的一对(H(s),P(s))的情况下，线性平衡参数矩阵 $K^*$ 能够通过数值迭代生成，并且整个过程不再依赖于模型信息(A,B)。也即，在迭代目标项在数值迭代过程中收敛的情况下，可以停止数值迭代；然后根据收敛的迭代目标项，重建对应于机器人的动力学特性的线性平衡参数矩阵。
如上所述,本公开的实施例仅通过采集动力学特性未知的机器人在失去平衡(摔倒)前的历史运动状态数据和历史控制数据,并通过对这些历史数据进行积分运算来构建第一矩阵至第三矩阵以作为训练数据。由此,本公开实施例的训练数据的数量远远小于传统的强化学习算法所需的数据量。本公开的实施例还对应地构建了迭代关系函数(例如,公式(20)),以使得目标迭代项(例如,P(s)、K(s)和H(s))随着学习步长的增加而逐渐收敛。并且,收敛的目标迭代项可以得到一个控制器,该控制器收敛于对应于线性二次调节问题的最优解的控制器,从而可以保证闭环系统的稳定性,其训练过程被大大的简化。整个过程不需要对训练数据进行额外的限制,从而简化了机器人的控制器的设计过程。
进一步地，处理器还可以对数据采集装置采集的数据进行进一步处理。为便于说明，本公开仅给出控制主动轮104的示例，本领域技术人员应当理解，本公开的方案也可以用于控制机器人的其它组件。接着，数据处理装置基于给定的目标轨迹，设定用于训练的控制数据。正如参考图3所描述的，本公开并不对用于训练的控制器的具体控制律进行限制。为便于说明本公开对于用于训练的控制器的非限制性，后续以实验人员手动控制机器人运动来提取运动状态数据和控制数据作为示例进行说明。更进一步地，数据处理装置的控制频率为1000Hz。
如上所述，运动状态数据和控制数据将用于计算 $\Delta_{xx}$、$\Sigma_{xx}$、$\Sigma_{xu}$。这些数据需要x和u的连续信号。采用与图7A类似的方式搜集运动数据，例如可以收集在机器人100的基座部的高度 $l=0.33\,\mathrm{m}$ 的情况下，手动地利用远程遥控器输入指令，以确定遥控控制器控制机器人运动的数据。具体地，由于实验人员并不能准确知晓机器人100的动力学特性，并且人工手动控制往往不能准确及时地调节机器人的控制量，因此常常导致机器人摔倒。
具体地，还可以对采集的运动状态数据进行进一步处理以尽快地获得对应于机器人的动力学特性的控制器。以下示出一个采用数值迭代方案计算对应于机器人动力学特性的控制器的示例实验。如图10所示，机器人的身高为最小高度0.33m。并且由远程遥控器手动直接给出运动指令，以指示主动轮的扭矩。在该实验中，随着主动轮的扭矩增加，机器人从初始状态（以状态A示出）开始，并且使用主动轮移动（以状态B和状态C示出），并且最终摔倒（状态D）。由于最终机器人失去了平衡，因此这样情况下的遥控控制器不是类平衡控制器。
类似的过程重复三次，三次采集的数据绘制在图11中，其中，扭矩是两个主动轮电机的总扭矩。特别地，当系统被假设为线性时，使用接近简化模型的线性区域的数据，即-20度<倾斜角<20度。如图11所示，三次数据采集的过程的持续时间分别为0.515秒、0.155秒、0.586秒，总共为1.256秒。任何非专业人员都可以通过遥控器手动输入扭矩来轻松地收集这些短时数据。此外，由于数值迭代的方案可以离线进行，从而可以容易地调节各项参数以使迭代项收敛。
针对图11中的运动状态数据和控制数据,设置Q=diag[20000,8000,3000],R=20,t i+1-t i=0.1s,可以得到图12所示出的P、K的迭代示意图。根据实验人员的测试,在第3275次数值迭代后,可以得到收敛的K=[-81.99,-34.96,-16.38]。
基于 $K=[-81.99,\,-34.96,\,-16.38]$，构建了对应于机器人的动力学特性的控制器。利用该控制器控制真实机器人在图13所示的路径中行进，采集到了图14所示的倾斜角（其大致在正负2度内）、线性速度、偏航速度数据的测试数据，可见采用数值迭代的方案能够得到鲁棒性和稳定性都足够强的控制器。
本领域技术人员应当理解，控制器还可以用于控制其他运动，本公开并不以此为限。此外，经测试，控制器的鲁棒性远远高于PID控制器，也即在外部对机器人100进行干扰时，控制器控制下的机器人能很快恢复平衡。
由此，本公开的实施例基于人工智能中的强化学习和ADP技术，利用数值迭代方案，在未知机器人的动力学特性的情况下解决了机器人的最优平衡控制问题。本公开的实施例的构建控制器的过程仅需要轮腿式机器人在非最优控制器/任意控制器的控制下行进一段时间/一段轨迹，并收集与该时间段/轨迹相对应的运动状态数据和控制数据作为训练数据。由此，本公开的实施例的训练数据的数量远远小于传统的强化学习算法所需的数据量。更进一步地，本公开的实施例的训练的控制器随着学习步长的增加而逐渐收敛到对应于线性二次调节问题的最优解的控制器，从而可以保证闭环系统的稳定性，其训练过程被大大地简化，并且不需要对训练数据进行额外的限制，从而简化了轮腿式机器人的控制器的设计过程。
本申请提供了一种构建机器人的控制器的装置,所述装置包括:
运动控制模块,用于利用第一控制器控制机器人运动,并获取所述机器人在运动过程中的运动状态数据和控制数据;
策略迭代模块,用于根据所述运动状态数据和所述控制数据,使用策略迭代的方式对所述第一控制器的线性平衡参数矩阵进行更新;以及
第二控制器构建模块,用于基于更新后的线性平衡参数矩阵,构建对应于所述机器人的动力学特性的第二控制器。
在一些实施例中,所述机器人的动力学特性关联于至少一个可变参数;所述第一控制器对应于所述可变参数为第一值的动力学特性;所述第二控制器对应于所述可变参数为第二值的动力学特性。
在一些实施例中,所述第一控制器控制所述机器人在类平衡运动状态下运动,所述类平衡状态下的机器人在运动过程中处于平衡点附近;在所述第二控制器控制下的机器人,相对于在所述第一控制器控制下的机器人,在运动过程中具有更优的控制效果。
在一些实施例中,所述第一控制器和所述第二控制器均为线性控制器;在运动过程中的各个时刻,所述线性控制器提供的控制力矩,负相关于所述线性平衡参数矩阵和所述机器人的运动状态数据之间的乘积。
在一些实施例中,所述运动控制模块,还用于由所述第一控制器,根据所述机器人的当前运动状态,确定初始控制指令;对所述初始控制指令指示的控制数据施加扰动,得到所述第一控制器的控制数据;以及根据所述第一控制器的控制数据,控制所述机器人运动,并采集所述运动过程中的运动状态数据。
在一些实施例中,所述运动状态数据和所述控制数据对应于多个时间区间,所述策略迭代模块,还用于对对应于所述多个时间区间中的所述运动状态数据和所述控制数据进行非线性组合以确定训练数据集合;确定迭代目标项,并基于所述训练数据集合,确定迭代关系函数;以及根据所述迭代关系函数,对所述迭代目标项进行多次策略迭代,逼近得到对应于所述机器人的动力学特性的线性平衡参数矩阵。
在一些实施例中,所述策略迭代模块,还用于在各次策略迭代中,确定所述迭代目标项是否收敛,在所述迭代目标项收敛的情况下,停止策略迭代;以及根据收敛的所述迭代目标项,更新所述线性平衡参数矩阵。
在一些实施例中,所述迭代关系函数符合李雅普诺夫方程的形式,所述迭代目标项包括待迭代的线性平衡参数矩阵、以及以所述待迭代的线性平衡参数矩阵为参数的李雅普诺夫方程的解;所述迭代关系函数,用于根据本次策略迭代中的线性平衡参数矩阵以及本次策略迭代对应的李雅普诺夫方程的解,计算下次策略迭代对应的线性平衡参数矩阵。
在一些实施例中,所述迭代目标项收敛包括:相邻两次策略迭代对应的李雅普诺夫 方程的解之差小于预设值。
在一些实施例中,所述构建机器人的控制器的装置,还包括第一控制器构建模块,用于从机器人的历史运动数据中获取历史运动状态数据和历史控制数据,所述历史运动状态数据和所述历史控制数据的多样性度量高于预定阈值;根据所述历史运动状态数据和所述历史控制数据,使用数值迭代的方式计算线性平衡参数矩阵;以及基于所述线性平衡参数矩阵,构建用于控制所述机器人运动的第一控制器。
在一些实施例中,所述历史运动数据,是基于控制力矩驱动所述机器人的轮腿部中每个关节,以带动所述机器人沿目标轨迹运动时获取的运动数据;
所述构建机器人的控制器的装置,还包括控制力矩获取模块,用于基于机器人的已有运动信息,自适应地确定控制所述机器人的主动轮转动的控制信息;基于控制所述主动轮转动的控制信息,确定用于控制所述机器人的多个关节的第一控制信息,所述第一控制信息使得所述机器人保持平衡;基于所述机器人的目标轨迹,确定用于控制所述多个关节的第二控制信息,所述第二控制信息使得所述机器人沿目标轨迹运动;基于所述机器人的运动约束条件、所述第一控制信息以及所述第二控制信息,确定所述机器人的轮腿部中每个关节的控制力矩。
本申请提供了一种机器人运动控制装置,所述机器人通过驱动主动轮运动,所述装置包括:
指令接收模块,用于接收运动指令,所述运动指令指示所述机器人的运动轨迹;
指令执行模块,用于根据运动指令,通过所述第一控制器控制施加给所述主动轮的驱动力,以使得所述机器人按照所述运动轨迹运动;
数据获取模块,用于获取所述机器人在运动过程中的运动状态数据和控制数据;
策略迭代模块,用于基于所述运动状态数据和所述控制数据,使用策略迭代的方式构建对应于所述机器人的动力学特性的第二控制器;以及
驱动力控制模块,用于利用所述第二控制器控制施加给所述主动轮的驱动力,以使得所述机器人平稳运动。
本申请还提供了一种计算机可读存储介质,其上存储有计算机可读指令,所述计算机可读指令被一个或多个处理器执行时实现以上各实施例所述的方法的步骤。
本申请还提供了一种计算机程序产品,包括计算机可读指令,所述计算机可读指令被一个或多个处理器执行时实现以上各实施例所述的方法的步骤。
根据实际需要,该机器人例如还可以包括总线、存储器、传感器组件、通信模块和输入输出装置等。本公开的实施例不受该机器人的具体组成部分的限制。
总线可以是将该机器人的各部件互连并在各部件之中传递通信信息(例如,控制消息或数据)的电路。
传感器组件可以用于对物理世界进行感知,例如包括摄像头、红外传感器超声波传感器等。此外,传感器组件还可以包括用于测量机器人当前运行及运动状态的装置,例如霍尔传感器、激光位置传感器、或应变力传感器等。
通信模块例如可以通过有线或无线方式与网络连接，以便于与物理世界（例如，服务器）通信。通信模块可以是无线的并且可以包括无线接口，例如IEEE 802.11、蓝牙、无线局域网（"WLAN"）收发器、或用于接入蜂窝电话网络的无线电接口（例如，用于接入CDMA、GSM、UMTS或其他移动通信网络的收发器/天线）。在另一示例中，通信模块可以是有线的并且可以包括诸如以太网、USB或IEEE 1394之类的接口。
输入输出装置可以将例如从用户或任何其他外部设备输入的命令或数据传送到机器人的一个或多个其他部件,或者可以将从机器人的一个或多个其他部件接收的命令或数据输出到用户或其他外部设备。
多个机器人可以组成机器人系统以协同地完成一项任务，该多个机器人通信地连接到服务器，并且从服务器接收协同机器人指令。
上述技术中的程序部分可以被认为是以可执行的代码和/或相关数据的形式而存在的“产品”或“制品”,通过计算机可读的介质所参与或实现的。有形的、永久的储存介质可以包括任何计算机、处理器、或类似设备或相关的模块所用到的内存或存储器。例如,各种半导体存储器、磁带驱动器、磁盘驱动器或者类似任何能够为软件提供存储功能的设备。
所有软件或其中的一部分有时可能会通过网络进行通信,如互联网或其他通信网络。此类通信可以将软件从一个计算机设备或处理器加载到另一个。因此,另一种能够传递软件元素的介质也可以被用作局部设备之间的物理连接,例如光波、电波、电磁波等,通过电缆、光缆或者空气等实现传播。用来载波的物理介质如电缆、无线连接或光缆等类似设备,也可以被认为是承载软件的介质。在这里的用法除非限制了有形的“储存”介质,其他表示计算机或机器“可读介质”的术语都表示在处理器执行任何指令的过程中参与的介质。
本公开使用了特定词语来描述本公开的实施例。如“第一/第二实施例”、“一实施例”、和/或“一些实施例”意指与本公开至少一个实施例相关的某一特征、结构或特点。因此,应强调并注意的是,本说明书中在不同位置两次或多次提及的“一实施例”或“一个实施例”或“一替代性实施例”并不一定是指同一实施例。此外,本公开的一个或多个实施例中的某些特征、结构或特点可以进行适当的组合。
此外,本领域技术人员可以理解,本公开的各方面可以通过若干具有可专利性的种类或情况进行说明和描述,包括任何新的和有用的工序、机器、产品或物质的组合,或对他们的任何新的和有用的改进。相应地,本公开的各个方面可以完全由硬件执行、可以完全由软件(包括固件、常驻软件、微码等)执行、也可以由硬件和软件组合执行。以上硬件或软件均可被称为“数据块”、“模块”、“引擎”、“单元”、“组件”或“系统”。此外,本公开的各方面可能表现为位于一个或多个计算机可读介质中的计算机产品,该产品包括计算机可读程序编码。
除非另有定义,这里使用的所有术语(包括技术和科学术语)具有与本发明所属领域的普通技术人员共同理解的相同含义。还应当理解,诸如在通常字典里定义的那些术语应当被解释为具有与它们在相关技术的上下文中的含义相一致的含义,而不应用理想化或极度形式化的意义来解释,除非这里明确地这样定义。
上面是对本发明的说明,而不应被认为是对其的限制。尽管描述了本发明的若干示例性实施例,但本领域技术人员将容易地理解,在不背离本发明的新颖教学和优点的前提下可以对示例性实施例进行许多修改。因此,所有这些修改都意图包含在权利要求书所限定的本发明范围内。应当理解,上面是对本发明的说明,而不应被认为是限于所公开的特定实施例,并且对所公开的实施例以及其他实施例的修改意图包含在所附权利要求书的范围内。本发明由权利要求书及其等效物限定。

Claims (20)

  1. 一种构建机器人的控制器的方法,由处理器执行,所述方法包括:
    利用第一控制器控制机器人运动,并获取所述机器人在运动过程中的运动状态数据和控制数据;
    根据所述运动状态数据和所述控制数据,使用策略迭代的方式对所述第一控制器的线性平衡参数矩阵进行更新;以及
    基于更新后的线性平衡参数矩阵,构建对应于所述机器人的动力学特性的第二控制器。
  2. 如权利要求1所述的方法,其中,所述机器人的动力学特性关联于至少一个可变参数;
    所述第一控制器对应于所述可变参数为第一值的动力学特性;所述第二控制器对应于所述可变参数为第二值的动力学特性。
  3. 如权利要求1所述的方法,其中,所述第一控制器控制所述机器人在类平衡运动状态下运动,所述类平衡状态下的机器人在运动过程中处于平衡点附近;
    在所述第二控制器控制下的机器人,相对于在所述第一控制器控制下的机器人,在运动过程中具有更优的控制效果。
  4. 如权利要求1所述的方法,其中,所述第一控制器和所述第二控制器均为线性控制器;
    在运动过程中的各个时刻,所述线性控制器提供的控制力矩,负相关于所述线性平衡参数矩阵和所述机器人的运动状态数据之间的乘积。
  5. 如权利要求1所述的方法,其中,所述利用第一控制器控制机器人运动,并获取机器人在运动过程中的运动状态数据和控制数据,包括:
    由所述第一控制器,根据所述机器人的当前运动状态,确定初始控制指令;
    对所述初始控制指令指示的控制数据施加扰动,得到所述第一控制器的控制数据;以及
    根据所述第一控制器的控制数据,控制所述机器人运动,并采集所述运动过程中的运动状态数据。
  6. 如权利要求5所述的方法,其中,所述运动状态数据和所述控制数据对应于多个时间区间,所述根据所述运动状态数据和所述控制数据,使用策略迭代的方式对所述第一控制器的线性平衡参数矩阵进行更新,包括:
    对对应于所述多个时间区间中的所述运动状态数据和所述控制数据进行非线性组合以确定训练数据集合,
    确定迭代目标项,并基于所述训练数据集合,确定迭代关系函数;以及
    根据所述迭代关系函数,对所述迭代目标项进行多次策略迭代,逼近得到对应于所述机器人的动力学特性的线性平衡参数矩阵。
  7. 如权利要求6所述的方法,其中,所述根据所述迭代关系函数,对所述迭代目标项进行多次策略迭代,逼近得到对应于所述机器人的动力学特性的线性平衡参数矩阵,包括:
    在各次策略迭代中,确定所述迭代目标项是否收敛;
    在所述迭代目标项收敛的情况下,停止策略迭代;以及
    根据收敛的所述迭代目标项,更新所述线性平衡参数矩阵。
  8. 如权利要求7所述的方法,其中,所述迭代关系函数符合李雅普诺夫方程的形式,所述迭代目标项包括待迭代的线性平衡参数矩阵、以及以所述待迭代的线性平衡参数矩阵为参数的李雅普诺夫方程的解,
    所述迭代关系函数,用于根据本次策略迭代中的线性平衡参数矩阵以及本次策略迭代对应的李雅普诺夫方程的解,计算下次策略迭代对应的线性平衡参数矩阵。
  9. 如权利要求7所述的方法,其中,所述迭代目标项收敛包括:相邻两次策略迭代对 应的李雅普诺夫方程的解之差小于预设值。
  10. 如权利要求1至9中任一项所述的方法,其中,所述第一控制器的构建过程包括:
    从机器人的历史运动数据中获取历史运动状态数据和历史控制数据,所述历史运动状态数据和所述历史控制数据的多样性度量高于预定阈值;
    根据所述历史运动状态数据和所述历史控制数据,使用数值迭代的方式计算线性平衡参数矩阵;以及
    基于所述线性平衡参数矩阵,构建用于控制所述机器人运动的第一控制器。
  11. 如权利要求10所述的方法,所述历史运动数据,是基于控制力矩驱动所述机器人的轮腿部中每个关节,以带动所述机器人沿目标轨迹运动时获取的运动数据;
    所述控制力矩的获取过程,包括:
    基于机器人的已有运动信息,自适应地确定控制所述机器人的主动轮转动的控制信息;
    基于控制所述主动轮转动的控制信息,确定用于控制所述机器人的多个关节的第一控制信息,所述第一控制信息使得所述机器人保持平衡;
    基于所述机器人的目标轨迹,确定用于控制所述多个关节的第二控制信息,所述第二控制信息使得所述机器人沿目标轨迹运动;
    基于所述机器人的运动约束条件、所述第一控制信息以及所述第二控制信息,确定所述机器人的轮腿部中每个关节的控制力矩。
  12. 一种机器人运动控制方法,由处理器执行,所述机器人通过驱动主动轮运动,所述方法包括:
    接收运动指令,所述运动指令指示所述机器人的运动轨迹;
    根据运动指令,通过所述第一控制器控制施加给所述主动轮的驱动力,以使得所述机器人按照所述运动轨迹运动;
    获取所述机器人在运动过程中的运动状态数据和控制数据;
    基于所述运动状态数据和所述控制数据,使用策略迭代的方式构建对应于所述机器人的动力学特性的第二控制器;以及
    利用所述第二控制器控制施加给所述主动轮的驱动力,以使得所述机器人平稳运动。
  13. 一种机器人,所述机器人包括:
    数据采集装置,被配置为:在第一控制器控制机器人运动的情况下,获取所述机器人的运动状态数据;
    数据处理装置,被配置为:
    获取与所述运动状态数据对应的控制数据;
    基于所述运动状态数据和所述控制数据,使用策略迭代的方式对第一控制器的线性平衡参数矩阵进行更新;以及
    基于更新后的线性平衡参数矩阵,构建对应于所述机器人的动力学特性的第二控制器。
  14. 如权利要求13所述的机器人,所述机器人包括轮腿部和设置在所述机器人上的驱动电机;
    所述驱动电机,被配置为基于所述第一控制器或所述第二控制器,驱动所述轮腿部中的主动轮,以带动所述机器人运动。
  15. 一种构建机器人的控制器的装置,所述装置包括:
    运动控制模块,用于利用第一控制器控制机器人运动,并获取所述机器人在运动过程中的运动状态数据和控制数据;
    策略迭代模块,用于根据所述运动状态数据和所述控制数据,使用策略迭代的方式对所述第一控制器的线性平衡参数矩阵进行更新;以及
    第二控制器构建模块,用于基于更新后的线性平衡参数矩阵,构建对应于所述机器人的动力学特性的第二控制器。
  16. 如权利要求15所述的装置,其中,所述机器人的动力学特性关联于至少一个可变参数;
    所述第一控制器对应于所述可变参数为第一值的动力学特性;所述第二控制器对应于所述可变参数为第二值的动力学特性。
  17. 如权利要求15所述的装置,其中,所述第一控制器控制所述机器人在类平衡运动状态下运动,所述类平衡状态下的机器人在运动过程中处于平衡点附近;
    在所述第二控制器控制下的机器人,相对于在所述第一控制器控制下的机器人,在运动过程中具有更优的控制效果。
  18. 一种机器人运动控制装置,所述机器人通过驱动主动轮运动,所述装置包括:
    指令接收模块,用于接收运动指令,所述运动指令指示所述机器人的运动轨迹;
    指令执行模块,用于根据运动指令,通过所述第一控制器控制施加给所述主动轮的驱动力,以使得所述机器人按照所述运动轨迹运动;
    数据获取模块,用于获取所述机器人在运动过程中的运动状态数据和控制数据;
    策略迭代模块,用于基于所述运动状态数据和所述控制数据,使用策略迭代的方式构建对应于所述机器人的动力学特性的第二控制器;以及
    驱动力控制模块,用于利用所述第二控制器控制施加给所述主动轮的驱动力,以使得所述机器人平稳运动。
  19. 一种计算机可读存储介质,其上存储有计算机可读指令,所述计算机可读指令被一个或多个处理器执行时实现权利要求1至12中任一项所述的方法的步骤。
  20. 一种计算机程序产品,包括计算机可读指令,所述计算机可读指令被一个或多个处理器执行时实现权利要求1至12中任一项所述的方法的步骤。
PCT/CN2022/134041 2022-03-01 2022-11-24 构建机器人的控制器的方法、机器人的运动控制方法、装置以及机器人 WO2023165177A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/203,910 US20230305563A1 (en) 2022-03-01 2023-05-31 Method for building controller for robot, method, device for controlling motion of robot, and robot

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210194306.XA CN116736749A (zh) 2022-03-01 2022-03-01 构建机器人的控制器的方法和机器人
CN202210194306.X 2022-03-01

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/203,910 Continuation US20230305563A1 (en) 2022-03-01 2023-05-31 Method for building controller for robot, method, device for controlling motion of robot, and robot

Publications (1)

Publication Number Publication Date
WO2023165177A1 true WO2023165177A1 (zh) 2023-09-07

Family

ID=87882882

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/134041 WO2023165177A1 (zh) 2022-03-01 2022-11-24 构建机器人的控制器的方法、机器人的运动控制方法、装置以及机器人

Country Status (3)

Country Link
US (1) US20230305563A1 (zh)
CN (1) CN116736749A (zh)
WO (1) WO2023165177A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103381826A (zh) * 2013-07-31 2013-11-06 中国人民解放军国防科学技术大学 基于近似策略迭代的自适应巡航控制方法
CN113290559A (zh) * 2021-05-26 2021-08-24 深圳市优必选科技股份有限公司 机器人平衡控制方法、装置、机器人控制设备及存储介质
CN113498523A (zh) * 2019-03-06 2021-10-12 三菱电机株式会社 用于控制机器对象的操作的装置和方法以及存储介质
CN113753150A (zh) * 2021-05-31 2021-12-07 腾讯科技(深圳)有限公司 轮腿式机器人的控制方法、装置、设备及可读存储介质
US20220026871A1 (en) * 2020-07-23 2022-01-27 Mitsubishi Electric Research Laboratories, Inc. System and Method for Feasibly Positioning Servomotors with Unmodeled Dynamics

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DING, LIJIA: "Research on Balance Control and Gait Planning of Biped Robot", Master's Thesis, University of Electronic Science and Technology of China, CN, 20 April 2017, pages 1-87, XP009548528 *

Also Published As

Publication number Publication date
US20230305563A1 (en) 2023-09-28
CN116736749A (zh) 2023-09-12

Similar Documents

Publication Publication Date Title
Kashyap et al. Particle Swarm Optimization aided PID gait controller design for a humanoid robot
Pi et al. Low-level autonomous control and tracking of quadrotor using reinforcement learning
WO2022252863A1 Control method and apparatus for wheel-legged robot, wheel-legged robot, and device
Wiedebach et al. Walking on partial footholds including line contacts with the humanoid robot atlas
CN109352656B Control method for multi-joint manipulator with time-varying output constraints
CN113821045B Reinforcement learning action generation system for legged robot
JP6781101B2 Control method for nonlinear system, control device for biped walking robot, control method for biped walking robot, and program therefor
Park et al. Pose and posture estimation of aerial skeleton systems for outdoor flying
WO2023165174A1 Method for building controller for robot, method and device for controlling motion of robot, and robot
Goher A two-wheeled machine with a handling mechanism in two different directions
CN109062039A Adaptive robust control method for three-degree-of-freedom Delta parallel robot
WO2023165177A1 Method for building controller for robot, method and device for controlling motion of robot, and robot
Madhumitha et al. Physical modeling and control of self-balancing platform on a cart
WO2023165192A1 Robot control method and apparatus, robot, and computer-readable storage medium
Zadeh et al. LQR motion control and analysis of a prototype spherical robot
Askari et al. Dynamic modeling and gait analysis for miniature robots in the absence of foot placement control
CN115781658A Method for building controller for robot, and robot
Majczak et al. Comparison of two efficient control strategies for two-wheeled balancing robot
Zhong et al. Theoretical and experimental study on remote dynamic balance control for a suspended wheeled mobile manipulator
CN105467841B Neural-like control method for upper-limb motion of humanoid robot
CN108227493A Robot trajectory tracking method
Zeini et al. Design and construction of a unicycle robot controlled by its center of gravity
Liu et al. Learning control of quadruped robot galloping
WO2024021767A1 Method, apparatus, and device for controlling legged robot, legged robot, computer-readable storage medium, and computer program product
Mayr et al. Static inertial parameter identification for humanoid robots using a torque-free support

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22929605

Country of ref document: EP

Kind code of ref document: A1