WO2021103392A1 - Bionic robotic fish motion control method and system based on adversarial structured control - Google Patents

Bionic robotic fish motion control method and system based on adversarial structured control

Info

Publication number
WO2021103392A1
Authority
WO
WIPO (PCT)
Prior art keywords
control
robotic fish
steering gear
model
bionic robotic
Prior art date
Application number
PCT/CN2020/085045
Other languages
English (en)
French (fr)
Inventor
吴正兴
喻俊志
闫帅铮
王健
谭民
Original Assignee
中国科学院自动化研究所
Application filed by 中国科学院自动化研究所
Priority to US17/094,820 (patent US10962976B1)
Publication of WO2021103392A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/004: Artificial life, i.e. computing arrangements simulating life
    • G06N3/008: Artificial life, i.e. computing arrangements simulating life based on physical entities controlled by simulated intelligence so as to replicate intelligent life forms, e.g. based on robots replicating pets or humans in their appearance or behaviour
    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B63: SHIPS OR OTHER WATERBORNE VESSELS; RELATED EQUIPMENT
    • B63C: LAUNCHING, HAULING-OUT, OR DRY-DOCKING OF VESSELS; LIFE-SAVING IN WATER; EQUIPMENT FOR DWELLING OR WORKING UNDER WATER; MEANS FOR SALVAGING OR SEARCHING FOR UNDERWATER OBJECTS
    • B63C11/00: Equipment for dwelling or working underwater; Means for searching for underwater objects
    • B63C11/52: Tools specially adapted for working underwater, not otherwise provided for

Definitions

  • The invention belongs to the field of bionic robot control, and specifically relates to a bionic robotic fish motion control method and system based on adversarial structured control.
  • The bionic robotic fish plays an increasingly important role in fields such as science education, hydrological monitoring, and biological motion analysis. Good motion control helps the robotic fish swim underwater quickly, stably and with low energy consumption, and so complete complex tasks better. In recent years, therefore, many research results on motion optimization methods for the bionic robotic fish have appeared.
  • Existing technology directly uses Deep Reinforcement Learning (DRL) methods to learn the nonlinear control law of the bionic robotic fish. Because of the lack of data, the low real-time performance of visual feedback and the limitation of computing resources, training is difficult. The motion control or simple intelligent control adopted by traditional methods, in turn, gives the bionic fish low motion efficiency and poor robustness.
  • To solve these problems, the present invention provides a bionic robotic fish motion control method based on adversarial structured control.
  • the motion control method of the bionic robotic fish includes:
  • Step S10: obtain the swimming path of the bionic robotic fish, and divide the swimming path into a set of sequentially connected basic sub-paths;
  • Step S20: for each sub-path in the sub-path set in sequence, based on its starting point and end point, obtain the global control value of each steering gear (servo) of the bionic robotic fish at time t through the trained steering gear global control model;
  • Step S30: based on the acquired pose information of the bionic robotic fish at time t and the global control value of each steering gear at time t, obtain the compensation control value of each steering gear of the bionic robotic fish at time t through the trained steering gear compensation control model;
  • Step S40: sum the global control values and the compensation control values of the steering gears of the bionic robotic fish at time t to obtain the control value of each steering gear at time t+1, and perform motion control of the bionic robotic fish at time t+1 using these control values;
  • the steering gear global control model and the steering gear compensation control model consist of one-to-one corresponding pairs of steering gear global control sub-models and steering gear compensation control sub-models, with one pair constructed for each type of sub-path.
  • within each pair, the steering gear global control sub-model is constructed based on the CPG (central pattern generator) model and the steering gear compensation control sub-model is constructed based on the DDPG (deep deterministic policy gradient) network, and the pair is trained through an iterative adversarial method.
  • the training method is:
  • Step B10 constructing the optimization objective function of the pair of the steering gear global control sub-model and the steering gear compensation control sub-model;
  • Step B20: using the ES algorithm, optimize the parameters of the steering gear global control sub-model along the gradient direction given by the preset first gradient function, until the value of the optimization objective function no longer increases or its increase is lower than the set first threshold, and obtain the first steering gear global control sub-model;
  • Step B30: based on the parameters of the first steering gear global control sub-model, optimize the parameters of the action strategy network and the action value network in the steering gear compensation control sub-model along the gradient direction given by the preset second gradient function, until the value of the optimization objective function no longer increases or its increase is lower than the set first threshold, and obtain the first steering gear compensation control sub-model;
  • Step B40: based on the parameters of the first steering gear compensation control sub-model, return to step B20 and iteratively optimize the parameters of the steering gear global control sub-model and the steering gear compensation control sub-model, until the value of the optimization objective function no longer increases or its increase is lower than the set first threshold, and obtain the trained steering gear global control sub-model and steering gear compensation control sub-model.
  • the objective function is defined in terms of the following quantities:
  • the object optimized by the objective function, that is, the CPG model parameters and the DDPG network parameters;
  • θ_e, the yaw angle of the bionic robotic fish relative to the target point, with θ_e ∈ (-π, π];
  • the velocity vector of the bionic robotic fish in the world reference frame, together with its modulus;
  • v_0, the preset upper limit of the speed, which ensures the effect of energy consumption optimization;
  • a positive value that represents the correlation coefficient between reward and loss.
  • the first gradient function is defined in terms of the following quantities:
  • F(·), the optimization objective function;
  • the CPG model parameters;
  • the step size of the parameter perturbation;
  • the gradient direction of the parameter perturbation;
  • the mathematical expectation of the optimization objective function obtained when the bionic robotic fish moves under CPG parameters perturbed along a gradient direction sampled from the standard normal distribution.
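  • The formula itself is not reproduced in this text. A sketch that is consistent with the quantities listed above, assuming the standard evolution-strategy gradient estimator and assuming the symbols θ (CPG model parameters), σ (perturbation step size) and ε (perturbation direction), none of which is legible in the source, would be:

```latex
\nabla_{\theta}\,\mathbb{E}_{\epsilon\sim\mathcal{N}(0,I)}\,F(\theta+\sigma\epsilon)
  \;=\; \frac{1}{\sigma}\,\mathbb{E}_{\epsilon\sim\mathcal{N}(0,I)}\bigl\{F(\theta+\sigma\epsilon)\,\epsilon\bigr\}
```

  • Under this reading, the expectation is estimated by evaluating the objective at a finite number of perturbed parameter vectors and averaging.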
  • the second gradient function is defined in terms of the following quantities:
  • Q(s, a | θ^Q), the action state value function;
  • μ(s | θ^μ), the action strategy function;
  • N, the number of samples in the batch update method;
  • i, the index of the i-th sample drawn from the experience pool;
  • a, the control value;
  • s_i, the state of the i-th sample;
  • J, the objective function of the action strategy network.
  • in step S40, "summing the global control values of the steering gears of the bionic robotic fish at time t and the compensation control values of the steering gears of the bionic robotic fish at time t" is performed as follows:
  • a_t represents the servo control signal of the robotic fish; s_t and its expected counterpart represent, respectively, the state and the expected state of the bionic robotic fish at time t; the two summed terms represent, respectively, the global control value of the steering gear and the compensation control value of the steering gear, which depends on the state of the bionic robotic fish.
  • Another aspect of the present invention is a bionic robotic fish motion control system based on adversarial structured control.
  • the bionic robotic fish motion control system includes a path acquisition module, a steering gear global control module, a steering gear compensation control module, a steering gear acquisition module, and a motion control module.
  • the path acquisition module is configured to obtain the swimming path of the bionic robotic fish, and divide the swimming path into a set of sequentially connected basic sub-paths;
  • the steering gear global control module is configured to obtain, through the trained steering gear global control model, the global control value of each steering gear of the bionic robotic fish at time t, based on the starting point and the end point of each sub-path in the sub-path set in sequence;
  • the steering gear compensation control module is configured to obtain, through the trained steering gear compensation control model, the compensation control value of each steering gear of the bionic robotic fish at time t, based on the acquired pose information of the bionic robotic fish at time t and the global control value of each steering gear at time t;
  • the steering gear acquisition module is configured to sum the global control values and the compensation control values of the steering gears of the bionic robotic fish at time t to obtain the control value of each steering gear of the bionic robotic fish at time t+1;
  • the motion control module is configured to perform motion control of the bionic robotic fish at time t+1 using the control value of each steering gear at time t+1.
  • A further aspect is a storage device in which a plurality of programs are stored, the programs being suitable for being loaded and executed by a processor to realize the above-mentioned bionic robotic fish motion control method based on adversarial structured control.
  • A further aspect is a processing device including a processor and a storage device; the processor is suitable for executing each program; the storage device is suitable for storing multiple programs; the programs are suitable for being loaded and executed by the processor to realize the above-mentioned bionic robotic fish motion control method based on adversarial structured control.
  • The bionic robotic fish motion control method based on adversarial structured control draws on prior knowledge of the periodic motion of fish: the rhythmic signal generated by the CPG model is optimized through an evolution strategy (ES) and serves as the reference control signal of the robotic fish, and a deep reinforcement learning algorithm learns a compensation control signal around this reference signal, so that the two jointly control the bionic robotic fish. The resulting control law conforms to the sinusoidal form of the fish body wave, which keeps the swimming efficiency of the robotic fish high. Compared with directly using deep reinforcement learning to learn a complex nonlinear control law, training and optimizing the CPG model involves fewer parameters, which reduces the difficulty of optimization and training.
  • The method also proposes an objective function for the energy-saving motion optimization task, so that the robotic fish meets the compound requirement of reaching the motion target while reducing motion loss. The adversarial training method mitigates the tendency of traditional heuristic optimization algorithms to fall into local optima, and further improves the motion efficiency and robustness of the robotic fish.
  • FIG. 1 is a schematic flow chart of the bionic robotic fish motion control method based on adversarial structured control according to the present invention;
  • FIG. 2 is a schematic diagram of the algorithm structure of an embodiment of the bionic robotic fish motion control method based on adversarial structured control according to the present invention;
  • FIG. 3 is a schematic diagram of Mujoco robotic fish simulation training in an embodiment of the bionic robotic fish motion control method based on adversarial structured control according to the present invention;
  • FIG. 4 is a schematic diagram of numerical simulation training of a real robotic fish in an embodiment of the bionic robotic fish motion control method based on adversarial structured control according to the present invention;
  • FIG. 5 is an example diagram of a real four-link bionic robotic fish after optimization from a set of poor initial states, in an embodiment of the bionic robotic fish motion control method based on adversarial structured control according to the present invention.
  • A bionic robotic fish motion control method based on adversarial structured control according to the present invention includes:
  • Step S10 Obtain the swimming path of the bionic robotic fish, and divide the swimming path into a set of sequentially connected basic subpaths;
  • Step S20 based on the starting point and the end point of each sub-path in the sub-path set in sequence, through the trained global control model of the steering gear, obtain the global control value of each steering gear of the biomimetic robotic fish at time t;
  • Step S30 based on the acquired pose information of the bionic robotic fish at time t and the global control values of each steering gear of the bionic robotic fish at time t, through the trained steering gear compensation control model, obtain the compensation control values of each steering gear of the bionic robotic fish at time t;
  • Step S40: sum the global control values and the compensation control values of the steering gears of the bionic robotic fish at time t to obtain the control value of each steering gear at time t+1, and perform motion control of the bionic robotic fish at time t+1 using these control values;
  • the bionic robotic fish motion control method based on adversarial structured control includes steps S10 to S40, and each step is described in detail as follows:
  • Step S10: obtain the swimming path of the bionic robotic fish, and divide the swimming path into a set of sequentially connected basic sub-paths.
  • FIG. 2 is a schematic diagram of the algorithm structure of an embodiment of the bionic robotic fish motion control method based on adversarial structured control according to the present invention.
  • The final control signal of the bionic robotic fish is generated jointly by the global reference control and the local compensation control.
  • The global reference control is a parameter-optimized CPG model, responsible for generating rhythmic signals as global reference control signals;
  • the local compensation control is a real-time system obtained through DDPG training. Its input is the real-time pose information of the bionic robotic fish, and its output is a set of compensation control values, one for each position-controlled servo.
  • Under the reference signal, the bionic robotic fish produces a global motion trend; the compensation signal helps the robotic fish fine-tune its current state around the reference signal, thereby calibrating the path, improving the motion accuracy and reducing the motion loss.
  • The entire swimming task of the bionic robotic fish can be divided into relatively simple subtasks, each corresponding to a simple swimming path such as turning left, turning right or going straight. Between adjacent subtasks, the end point of the previous path and the starting point of the next path are the same point. By combining simple swimming paths in various ways, motion control for complex swimming tasks of the bionic robotic fish is finally realized.
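  • The path division described above can be sketched as follows. This is a minimal illustration rather than the patented procedure: the waypoint representation, the function names `split_into_subpaths` and `classify_subpath`, the 10-degree tolerance and the convention that a positive turn angle means "left" are all assumptions introduced here.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def split_into_subpaths(waypoints: List[Point]) -> List[Tuple[Point, Point]]:
    """Split ordered waypoints into (start, end) sub-paths.

    Consecutive sub-paths share a point: the end of one is the start of the next.
    """
    if len(waypoints) < 2:
        raise ValueError("need at least two waypoints")
    return [(waypoints[i], waypoints[i + 1]) for i in range(len(waypoints) - 1)]


def classify_subpath(start: Point, end: Point, heading: float,
                     tol: float = math.radians(10.0)) -> str:
    """Label a sub-path as 'straight', 'left' or 'right' (hypothetical labels).

    heading is the current heading in radians in an x-right, y-up frame.
    """
    bearing = math.atan2(end[1] - start[1], end[0] - start[0])
    diff = bearing - heading
    turn = math.atan2(math.sin(diff), math.cos(diff))  # wrapped to [-pi, pi]
    if abs(turn) <= tol:
        return "straight"
    return "left" if turn > 0 else "right"
```

  • The label returned for each sub-path could then be used to select the matching pair of global and compensation sub-models.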
  • step S20 based on the starting point and the end point of each sub-path in the sub-path set in sequence, through the trained global control model of the steering gear, the global control amount of each steering gear of the biomimetic robotic fish at time t is obtained.
  • Step S30 based on the acquired position and attitude information of the bionic robotic fish at time t and the global control values of each steering gear of the bionic robotic fish at time t, through the trained steering gear compensation control model, obtain the compensation control values of each steering gear of the bionic robotic fish at time t.
  • the steering gear global control model and the steering gear compensation control model respectively include one-to-one corresponding sets of steering gear global control sub-models and steering gear compensation control sub-model pairs constructed for different types of sub-paths.
  • the steering gear global control sub-model and the steering gear compensation control sub-model pair are respectively constructed based on the CPG model and the DDPG network, and are trained through an iterative confrontation method.
  • the training method is:
  • Step B10: construct the optimization objective function of the pair of the steering gear global control sub-model and the steering gear compensation control sub-model, as shown in formula (1), whose terms include:
  • the object optimized by the objective function, that is, the CPG model parameters and the DDPG network parameters;
  • θ_e, the yaw angle of the bionic robotic fish relative to the target point, with θ_e ∈ (-π, π];
  • a positive value that represents the correlation coefficient between reward and loss.
  • The motion optimization method proposed by the present invention targets two different models.
  • The proposed optimization objective function is general and applies to both.
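  • Formula (1) is not reproduced in this text. The sketch below only illustrates the structure that the abstract describes, namely reward terms for accuracy and speed towards the target point and a loss term given by the sum of steering gear power. The concrete shaping (cosine of the yaw error, speed capped at v_0, absolute power), the function name `step_objective` and the default values of `v0` and `rho` are assumptions, not the patented formula.

```python
import math
from typing import Sequence


def step_objective(yaw_error: float, speed: float,
                   servo_powers: Sequence[float],
                   v0: float = 0.5, rho: float = 0.1) -> float:
    """Hypothetical one-step score: reward minus weighted power loss.

    yaw_error    yaw angle to the target point, expected in (-pi, pi]
    speed        modulus of the velocity vector in the world frame
    servo_powers instantaneous power of each steering gear
    v0           preset speed upper limit (extra speed earns no extra reward)
    rho          positive coefficient relating loss to reward
    """
    if not (-math.pi < yaw_error <= math.pi):
        raise ValueError("yaw_error must lie in (-pi, pi]")
    accuracy_reward = math.cos(yaw_error)      # assumed shaping: 1 when aligned
    speed_reward = min(speed, v0) / v0         # assumed shaping: capped at v0
    power_loss = sum(abs(p) for p in servo_powers)
    return accuracy_reward + speed_reward - rho * power_loss
```

  • Capping the speed term at `v0` reflects the stated purpose of the speed limit: beyond it, further speed would only be bought with additional energy.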
  • Step B20: using the ES algorithm, optimize the parameters of the steering gear global control sub-model along the gradient direction given by the preset first gradient function, until the value of the optimization objective function no longer increases or its increase is lower than the set first threshold, and obtain the first steering gear global control sub-model.
  • A biological CPG is a dedicated neural network located in the spinal cord that can produce coordinated rhythmic activity patterns, such as breathing, chewing, or leg movements during walking.
  • The CPG model can generate rhythmic signals without any input from feedback or higher control centers.
  • CPG model-based control is widely used to generate swimming strategies for various robotic fish.
  • Used as an online gait generator, the CPG model changes the characteristics of its output signal simply by changing its parameters, and the output remains smooth and continuous even if the parameters change suddenly. Therefore, the global reference control of the present invention also uses a steering gear global control model constructed on the CPG model to generate the global control signal of the robotic fish.
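  • The smoothness property can be illustrated with a small oscillator. The source does not specify which CPG equations are used, so this sketch assumes a single amplitude-controlled phase oscillator with critically damped second-order amplitude dynamics; the class name, the gain and the numerical values are illustrative only.

```python
import math


class CPGOscillator:
    """Single-joint rhythm generator (assumed form, not the patented model)."""

    def __init__(self, freq: float = 1.0, amp: float = 0.3, gain: float = 20.0):
        self.freq = freq          # oscillation frequency in Hz
        self.amp_target = amp     # commanded amplitude in rad
        self.gain = gain          # convergence rate of the amplitude
        self.phase = 0.0
        self.r = 0.0              # current amplitude
        self.dr = 0.0             # rate of change of the amplitude

    def step(self, dt: float) -> float:
        # The amplitude tracks its target smoothly instead of jumping.
        ddr = self.gain * (self.gain / 4.0 * (self.amp_target - self.r) - self.dr)
        self.dr += ddr * dt
        self.r += self.dr * dt
        self.phase += 2.0 * math.pi * self.freq * dt
        return self.r * math.sin(self.phase)


def simulate(steps_before: int = 125, steps_after: int = 200, dt: float = 0.01):
    """Run the oscillator and double the commanded amplitude mid-run."""
    cpg = CPGOscillator(freq=1.0, amp=0.3)
    out = [cpg.step(dt) for _ in range(steps_before)]
    cpg.amp_target = 0.6          # sudden parameter change
    out += [cpg.step(dt) for _ in range(steps_after)]
    return cpg, out
```

  • In `simulate`, the amplitude command doubles near a peak of the output; an instantaneous switch would create a step in the servo command, whereas the oscillator output changes only slightly between consecutive samples.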
  • The present invention takes the global reference control as the initial optimization object, and uses the ES algorithm to optimize the parameters of the CPG model.
  • The ES algorithm from reinforcement learning perturbs the parameters of the CPG model by generating mirrored random gradients, controls the robotic fish to move in the environment and obtains reward feedback of different sizes, and finally updates the CPG model parameters with different weights according to the reward ranking.
  • The gradient function is shown in formula (2), whose terms are:
  • F(·), the optimization objective function;
  • the CPG model parameters;
  • the step size of the parameter perturbation;
  • the gradient direction of the parameter perturbation;
  • the mathematical expectation of the optimization objective function obtained when the bionic robotic fish moves under CPG parameters perturbed along a gradient direction sampled from the standard normal distribution.
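  • The update described above (mirrored perturbations, reward ranking, weighted parameter update) can be sketched in a few lines. This is a generic evolution-strategy step under assumed hyperparameters, with a toy objective standing in for an episode of robotic fish swimming; `es_step`, `sigma`, `lr` and `pairs` are names chosen for this illustration.

```python
import random
from typing import Callable, List


def es_step(theta: List[float], objective: Callable[[List[float]], float],
            rng: random.Random, sigma: float = 0.1, lr: float = 0.05,
            pairs: int = 16) -> List[float]:
    """One ES update with mirrored sampling and rank-based weights."""
    dim = len(theta)
    half = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(pairs)]
    eps = half + [[-e for e in row] for row in half]        # mirrored directions
    scores = [objective([t + sigma * e for t, e in zip(theta, row)])
              for row in eps]
    n = len(eps)
    order = sorted(range(n), key=lambda k: scores[k])       # worst to best
    weights = [0.0] * n
    for rank, k in enumerate(order):
        weights[k] = rank / (n - 1) - 0.5                   # centered ranks
    grad = [sum(weights[k] * eps[k][d] for k in range(n)) / (n * sigma)
            for d in range(dim)]
    return [t + lr * g for t, g in zip(theta, grad)]        # gradient ascent
```

  • Ranking the scores instead of using them directly makes the step insensitive to the scale of the reward, which is one common motivation for weighting by rank.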
  • FIG. 3 is a schematic diagram of Mujoco robotic fish simulation training in an embodiment of the bionic robotic fish motion control method based on adversarial structured control according to the present invention.
  • The continuous curve Speed (×1000) represents the change in the head linear velocity of the robotic fish in the straight-swimming task over different training rounds;
  • the abscissa (round) represents the number of training rounds;
  • the ordinate (value) represents the optimization objective function value.
  • Step B30: based on the parameters of the first steering gear global control sub-model, optimize the parameters of the action strategy network and the action value network in the steering gear compensation control sub-model along the gradient direction given by the preset second gradient function, until the value of the optimization objective function no longer increases or its increase is lower than the set first threshold, and obtain the first steering gear compensation control sub-model.
  • The method of the present invention locks the global reference control signal output by the module, that is, fixes the CPG model parameters, and then switches the training object to the parameters of the action strategy network and the action value network in DDPG.
  • The second gradient function is shown in formula (3), whose terms are:
  • Q(s, a | θ^Q), the action state value function;
  • μ(s | θ^μ), the action strategy function;
  • N, the number of samples in the batch update method;
  • i, the index of the i-th sample drawn from the experience pool;
  • a, the control value;
  • s_i, the state of the i-th sample;
  • J, the objective function of the action strategy network.
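  • Formula (3) is not reproduced in this text. Assuming it follows the standard DDPG policy gradient, which matches the quantities listed above, it would read as follows; the parameter symbols θ^Q and θ^μ are inferred from the garbled source and should be treated as assumptions.

```latex
\nabla_{\theta^{\mu}} J \;\approx\; \frac{1}{N}\sum_{i=1}^{N}
  \nabla_{a} Q\bigl(s,a \mid \theta^{Q}\bigr)\Big|_{s=s_i,\;a=\mu(s_i)}\;
  \nabla_{\theta^{\mu}}\,\mu\bigl(s \mid \theta^{\mu}\bigr)\Big|_{s=s_i}
```

  • The first factor says how the value estimate changes with the control value, and the second how the control value changes with the policy parameters; their product is averaged over the N samples drawn from the experience pool.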
  • The present invention uses the DDPG algorithm to generate the real-time local compensation control signal, an approach derived mainly from the core idea of residual neural networks: a residual compensation control signal is trained on top of a control signal that already gives good results, so the worst case is that the residual control network outputs zero, which is equivalent to controlling the motion of the robotic fish through the global reference control signal alone. The present invention therefore sets the weights and biases of the DDPG action strategy network to 0. In addition, according to the maximum rotation angle per unit time of the steering gear carried by the robotic fish, the method sets the output of the action strategy network as shown in equation (4):
  • a_t represents the local compensation control signal output at each time step, computed from the output of the output layer of the action strategy network.
  • The nonlinear activation function tanh limits the output range to [-1, 1], and K represents the upper limit of the compensation signal fine-tuning amount, set according to the maximum rotation angle per unit time.
  • The action strategy network designed by the method of the present invention includes two hidden layers, each with 64 nodes. The dimension of the input state depends on the actual number of servos of the multi-link robotic fish, and its components have the following physical meanings: the distance from the current position to the target point P_i, the angular deviation between the current position and the target point P_i, the current heading angle, the rotation angle of each servo, and the rotation angular velocity of each servo.
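  • A forward pass of such a network can be sketched without any deep learning library. The sketch assumes tanh hidden activations and an output of the form K·tanh(o), and it zero-initializes only the output layer, which is enough to make the initial compensation exactly zero (the source states that the network's weights and biases are set to 0). The names `make_layer`, `dense` and `CompensationActor` are hypothetical.

```python
import math
import random


def make_layer(n_in: int, n_out: int, rng: random.Random, zero: bool = False):
    """Return (weights, biases); all zeros when zero=True."""
    scale = 0.0 if zero else 1.0 / math.sqrt(n_in)
    w = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


def dense(layer, x):
    """Affine map y = W x + b."""
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


class CompensationActor:
    """Two hidden layers of 64 nodes; output bounded by k_limit."""

    def __init__(self, n_servos: int, k_limit: float, seed: int = 0):
        rng = random.Random(seed)
        # distance, angular deviation, heading, plus angle and angular velocity per servo
        self.state_dim = 3 + 2 * n_servos
        self.h1 = make_layer(self.state_dim, 64, rng)
        self.h2 = make_layer(64, 64, rng)
        self.out = make_layer(64, n_servos, rng, zero=True)
        self.k = k_limit

    def act(self, state):
        if len(state) != self.state_dim:
            raise ValueError("unexpected state dimension")
        h = [math.tanh(v) for v in dense(self.h1, state)]
        h = [math.tanh(v) for v in dense(self.h2, h)]
        return [self.k * math.tanh(o) for o in dense(self.out, h)]
```

  • Before any training, `act` returns zeros, so the fish is driven by the global reference signal alone; after training, every compensation value stays within ±`k_limit`.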
  • The action value network also has two hidden layers, each containing 64 nodes. The state and action vectors are concatenated to form the input of the value network, and the output of the value network is the action state value function Q(s, a).
  • DDPG and ES use the same optimization objective function, but ES uses the Monte Carlo method, taking the total reward of an episode as the feedback score, whereas DDPG uses the temporal-difference method and updates the network parameters at every step of the movement. DDPG training stops when the objective function score finally converges.
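  • The difference between the two feedback signals can be made concrete with two small helper functions. The discount factor `gamma` and the function names are assumptions made for this illustration; the source does not give these details.

```python
from typing import Sequence


def episode_return(rewards: Sequence[float]) -> float:
    """Monte Carlo feedback as used by ES: one score for the whole episode."""
    return sum(rewards)


def td_target(reward: float, next_q: float, gamma: float = 0.99,
              done: bool = False) -> float:
    """Temporal-difference target as used by DDPG: one target per step.

    next_q is the target value network's estimate for the next state and the
    action chosen there by the target strategy network.
    """
    return reward if done else reward + gamma * next_q
```

  • ES therefore receives one number per rollout, while DDPG can update after every control step because each target bootstraps from the current value estimate.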
  • Step B40: based on the parameters of the first steering gear compensation control sub-model, return to step B20 and iteratively optimize the parameters of the steering gear global control sub-model and the steering gear compensation control sub-model, until the value of the optimization objective function no longer increases or its increase is lower than the set first threshold, and obtain the trained steering gear global control sub-model and steering gear compensation control sub-model.
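  • The alternation of steps B20 to B40 can be sketched as a loop over two optimizers that share one score. The toy objective and the two "improve" functions below are stand-ins for the ES update of the CPG parameters and the DDPG update of the compensation network; they, the threshold and the round limit are assumptions for illustration.

```python
from typing import Callable, List


def alternate_training(optimize_global: Callable[[], None],
                       optimize_comp: Callable[[], None],
                       evaluate: Callable[[], float],
                       threshold: float = 1e-3,
                       max_rounds: int = 50) -> List[float]:
    """Alternate the two optimizers until the score gain drops below threshold."""
    history = [evaluate()]
    for _ in range(max_rounds):
        optimize_global()   # step B20: compensation model fixed
        optimize_comp()     # step B30: global model fixed
        history.append(evaluate())
        if history[-1] - history[-2] < threshold:
            break           # step B40 stopping rule
    return history


# Toy stand-in: two coupled scalar parameters and a concave score.
params = {"g": 0.0, "c": 1.0}


def score() -> float:
    return -(params["g"] + params["c"] - 3.0) ** 2 - 0.1 * params["c"] ** 2


def improve_g() -> None:
    # Move halfway to the best g for the current c.
    params["g"] += 0.5 * ((3.0 - params["c"]) - params["g"])


def improve_c() -> None:
    # Move halfway to the best c for the current g.
    params["c"] += 0.5 * ((3.0 - params["g"]) / 1.1 - params["c"])


history = alternate_training(improve_g, improve_c, score)
```

  • In the toy problem each optimizer only improves its own parameter while the other is held fixed, yet the shared score rises round after round until the gain falls below the threshold.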
  • FIG. 4 is a schematic diagram of numerical simulation training of a real robotic fish in an embodiment of the bionic robotic fish motion control method based on adversarial structured control according to the present invention.
  • The gray curve (Cost Curve) represents the change of the loss term of the optimization objective function over different adversarial training rounds.
  • The black curve (Reward Curve) represents the change of the reward term of the optimization objective function over different adversarial training rounds.
  • 1st ES, 2nd ES and 3rd ES mark the first, second and third updates of the CPG model parameters through the evolution strategy algorithm; 1st RL, 2nd RL and 3rd RL mark the first, second and third updates of the DDPG model parameters.
  • The abscissa (round) represents the number of adversarial training rounds, and the ordinate (value) represents the optimization objective function value.
  • The score of the objective function is improved again.
  • The objective function score has stabilized and is no longer improved.
  • The method of the present invention can bring obvious motion optimization effects to the bionic robotic fish and achieve a higher degree of task completion.
  • Step S40: sum the global control values and the compensation control values of the steering gears of the bionic robotic fish at time t to obtain the control value of each steering gear at time t+1, and perform motion control of the bionic robotic fish at time t+1 using these control values.
  • Control algorithms based on traditional control theory, such as PID and Active Disturbance Rejection Control (ADRC), usually focus only on the single purpose of reducing tracking error when solving the path tracking problem of the bionic robotic fish. It is very difficult to derive theoretically a control law that combines high performance with low power consumption. Therefore, the method of the present invention transforms the problem of solving the control law into an objective optimization problem, thereby meeting the task requirements of both high tracking accuracy and low power consumption.
  • The reference control signal is set to an optimized rhythmic signal.
  • The global reference control uses ES to optimize the parameters of the CPG model, and the compensation control uses the DDPG algorithm to further optimize and stabilize the local motion around the reference control.
  • The linear combination of the two signals is the final control law, as shown in equation (5):
  • a_t represents the servo control signal of the robotic fish; s_t and its expected counterpart represent, respectively, the state and the expected state of the bionic robotic fish at time t; the two summed terms represent, respectively, the global control value of the steering gear and the compensation control value of the steering gear, which depends on the state of the bionic robotic fish.
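  • Equation (5) is not reproduced in this text; the description states only that the final control law is the sum of the two signals. A minimal sketch of that combination follows, in which the optional clamp to a servo limit is an added assumption and `combine_control` is a hypothetical name.

```python
from typing import List, Optional, Sequence


def combine_control(global_cmd: Sequence[float], comp_cmd: Sequence[float],
                    limit: Optional[float] = None) -> List[float]:
    """Sum the global and compensation control values, one entry per servo."""
    if len(global_cmd) != len(comp_cmd):
        raise ValueError("both signals need one value per servo")
    total = [g + c for g, c in zip(global_cmd, comp_cmd)]
    if limit is not None:
        # Assumed safety clamp; not stated in the source.
        total = [max(-limit, min(limit, a)) for a in total]
    return total
```

  • With a zero compensation signal the result equals the CPG output, which is the fallback behavior the residual design relies on.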
  • FIG. 5 shows an example of straight swimming by a real four-link bionic robotic fish after optimization from a set of poor initial states, in an embodiment of the bionic robotic fish motion control method based on adversarial structured control.
  • FIG. 5(a) shows that at the beginning of the experiment the robotic fish was stationary in the water, and then carried out the straight-swimming task with a poor swimming posture.
  • FIG. 5(b) shows that, although the robotic fish completes the straight-swimming target under the control of the CPG model alone, its swing amplitude is very large, and the path recorded by the global vision system shows jagged fluctuations.
  • FIGS. 5(c) and 5(d) show the straight-swimming motion path of the robotic fish after optimization by the method of the present invention. The path recorded by the global vision system is almost a straight line with minimal fluctuation, and energy is saved while the speed is maintained or even increased.
  • The second embodiment of the present invention is a bionic robotic fish motion control system based on adversarial structured control.
  • The bionic robotic fish motion control system includes a path acquisition module, a steering gear global control module, a steering gear compensation control module, a steering gear acquisition module, and a motion control module;
  • the path acquisition module is configured to obtain the swimming path of the bionic robotic fish, and divide the swimming path into a set of sequentially connected basic sub-paths;
  • the steering gear global control module is configured to obtain, through the trained steering gear global control model, the global control value of each steering gear of the bionic robotic fish at time t, based on the starting point and the end point of each sub-path in the sub-path set in sequence;
  • the steering gear compensation control module is configured to obtain, through the trained steering gear compensation control model, the compensation control value of each steering gear of the bionic robotic fish at time t, based on the acquired pose information of the bionic robotic fish at time t and the global control value of each steering gear at time t;
  • the steering gear acquisition module is configured to sum the global control values and the compensation control values of the steering gears of the bionic robotic fish at time t to obtain the control value of each steering gear of the bionic robotic fish at time t+1;
  • the motion control module is configured to perform motion control of the bionic robotic fish at time t+1 using the control value of each steering gear at time t+1.
  • The bionic robotic fish motion control system based on adversarial structured control provided by the above embodiment is illustrated only by the above division of functional modules.
  • The above functions can be assigned to different functional modules as needed; that is, the modules or steps in the embodiments of the present invention can be further decomposed or combined.
  • The modules of the above embodiment can be combined into one module, or further divided into multiple sub-modules, to complete all or part of the functions described above.
  • The names of the modules and steps involved in the embodiments of the present invention serve only to distinguish each module or step, and are not to be regarded as improper limitations on the present invention.
  • A storage device according to a third embodiment of the present invention stores a plurality of programs, and the programs are adapted to be loaded and executed by a processor to implement the above-mentioned bionic robotic fish motion control method based on adversarial structured control.
  • A processing device according to a fourth embodiment of the present invention includes a processor and a storage device; the processor is suitable for executing each program; the storage device is suitable for storing multiple programs; the programs are suitable for being loaded and executed by the processor to realize the above-mentioned bionic robotic fish motion control method based on adversarial structured control.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Ocean & Marine Engineering (AREA)
  • Feedback Control In General (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)

Abstract

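A motion control method and system for a bionic robotic fish based on adversarial structured control, belonging to the field of bionic robot control and aimed at solving the high training difficulty, low motion efficiency, and poor robustness of existing robotic fish control methods. The method comprises: constructing an optimization objective function with the accuracy and speed of motion toward a target point as the reward term and the sum of servo powers as the loss term; optimizing the parameters of the central pattern generator model that produces the global servo control values, then fixing those parameters and optimizing the parameters of the servo compensation control model; iterating this parameter optimization; and obtaining the global control signal and the compensation control signal of the bionic robotic fish from the trained models, with the linear combination of the two output signals used as the servo control signal to realize motion control of the robotic fish. By combining a global control signal with a local compensation control signal and training the models adversarially, the method and system achieve low training difficulty, accurate motion, and low energy consumption.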

Description

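Motion Control Method and System for Bionic Robotic Fish Based on Adversarial Structured Control
Technical Field
The present invention belongs to the field of bionic robot control, and specifically relates to a motion control method and system for a bionic robotic fish based on adversarial structured control.
Background
As a typical underwater robot, the bionic robotic fish plays an increasingly important role in fields such as science education, hydrological monitoring, and the analysis of biological locomotion. Good motion control helps a robotic fish swim fast, stably, and energy-efficiently underwater, and so complete complex tasks better. Research on motion optimization for bionic robotic fish has therefore been prolific in recent years.
Early research typically built different dynamic models according to the biological prototype of the robotic fish in order to improve swimming performance. Examples include dynamic modeling of undulatory swimming based on the Kane method [1], which provides important theoretical support for robotic fish motion control; identification of the strongly nonlinear relationships in swimming control with a generalized regression neural network (GRNN), followed by motion optimization of steady straight swimming based on the identified relationship [2]; and optimization of central pattern generator (CPG) parameters with particle swarm optimization to improve the speed and stability of forward and backward swimming [3].
In addition, the increasingly popular deep reinforcement learning (DRL) methods offer good algorithmic solutions for multi-objective optimization under high-dimensional continuous control, but their feasibility and accuracy on real-world robots remain in question and need further study. Among practical applications of DRL, Levine et al. built a large-scale data collection setup for training hand-eye robots [4]; Ebert et al. used a self-supervised model-based method to teach a robotic arm new skills [5]; Pong et al. combined model-based and model-free training to propose temporal difference models with high learning efficiency and stable performance [6]; and Srouji et al. studied the use of structured control nets to strengthen inductive bias and so improve sampling efficiency in training real robots [7]. For bionic robotic fish, however, where research focuses on highly maneuverable locomotion mechanisms, the lack of data, the low real-time performance of visual feedback, and limited computing resources prevent these large-scale data collection methods from realizing their advantages. Moreover, training a nonlinear control system with a neural network is difficult, and robustness in practice is poor. As a result, practical motion control of bionic robotic fish mostly relies on traditional control or simple intelligent control, such as proportional-integral-derivative (PID) control, backstepping sliding-mode control, and fuzzy control.
In summary, directly applying deep reinforcement learning to learn the nonlinear control law of a bionic robotic fish is hard to train because of the lack of data, the low real-time performance of visual feedback, and limited computing resources, while the traditional control or simple intelligent control used in conventional methods yields low motion efficiency and poor robustness.
The following documents are technical background related to the present invention: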
[1] Xia Dan, Chen Weishan, Liu Junkao, Han Luhui. Dynamic modeling of the undulatory swimming of a fish-like robot based on the Kane method. Journal of Mechanical Engineering (机械工程学报), 2009-06-15.
[2] Guo Shunli, Zhu Qixin, Xie Guangming. GRNN-based modeling of the steady-state straight-swimming speed of robotic fish. Ordnance Industry Automation (兵工自动化), 2010-11-15.
[3] Wang Ming, Yu Junzhi, Tan Min. CPG control and implementation of a pectoral-fin-propelled robotic fish. Robot (机器人), 2010-03-15.
[4] Levine S, Pastor P, Krizhevsky A, Ibarz J, Quillen D. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection [J]. The International Journal of Robotics Research, 2018, 37(4-5): 421-436.
[5] Ebert F, Finn C, Lee A X, Levine S. Self-supervised visual planning with temporal skip connections [J]. arXiv preprint arXiv:1710.05268, 2017.
[6] Pong V, Gu S, Dalal M, Levine S. Temporal difference models: Model-free deep RL for model-based control [J]. arXiv preprint arXiv:1802.09081, 2018.
[7] Srouji M, Zhang J, Salakhutdinov R. Structured control nets for deep reinforcement learning [J]. arXiv preprint arXiv:1802.08311, 2018.
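Summary of the Invention
To solve the above problems in the prior art, namely the high training difficulty, low motion efficiency, and poor robustness of existing robotic fish control methods, the present invention provides a motion control method for a bionic robotic fish based on adversarial structured control. The method comprises:
Step S10: obtain the swimming path of the bionic robotic fish and divide the swimming path into a set of sequentially connected basic subpaths;
Step S20: taking the subpaths in the set in order, obtain the global control value of each servo of the bionic robotic fish at time t through the trained servo global control model, based on the start point and end point of each subpath;
Step S30: based on the acquired pose information of the bionic robotic fish at time t and the global control value of each servo at time t, obtain the compensation control value of each servo at time t through the trained servo compensation control model;
Step S40: sum the global control value and the compensation control value of each servo at time t to obtain the control value of each servo at time t+1, and control the motion of the bionic robotic fish at time t+1 with these control values;
Step S50: set t=t+1 and return to step S20 until the bionic robotic fish reaches the end of the swimming path.
In some preferred embodiments, the servo global control model and the servo compensation control model respectively comprise a set of one-to-one corresponding pairs of servo global control submodels and servo compensation control submodels, each pair constructed for a different type of subpath.
In some preferred embodiments, each pair of servo global control submodel and servo compensation control submodel is built on a CPG model and a DDPG network respectively, and is trained by an iterative adversarial method as follows:
Step B10: construct the optimization objective function for the pair of servo global control submodel and servo compensation control submodel;
Step B20: optimize the parameters of the servo global control submodel with the ES algorithm in the gradient descent direction of a preset first gradient function, until the value of the optimization objective function no longer increases or its increase falls below a set first threshold, obtaining a first servo global control submodel;
Step B30: based on the parameters of the first servo global control submodel, optimize the parameters of the action policy network and the action value network in the servo compensation control submodel in the gradient descent direction of a preset second gradient function, until the value of the optimization objective function no longer increases or its increase falls below the set first threshold, obtaining a first servo compensation control submodel;
Step B40: based on the parameters of the first servo compensation control submodel, return to step B20 and iteratively optimize the parameters of the servo global control submodel and the servo compensation control submodel, until the value of the optimization objective function no longer increases or its increase falls below the set first threshold, obtaining the trained servo global control submodel and servo compensation control submodel.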
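In some preferred embodiments, the objective function is:
Figure PCTCN2020085045-appb-000001
where ψ denotes the object optimized through the objective function, namely the CPG model parameters and the DDPG network parameters; θ_e denotes the yaw angle between the bionic robotic fish and the target point, with θ_e ∈ (-π, π] as its range;
Figure PCTCN2020085045-appb-000002
denotes the velocity vector of the bionic robotic fish in the world reference frame;
Figure PCTCN2020085045-appb-000003
denotes the modulus of that velocity vector; v_0 is a speed upper limit preset to ensure the energy optimization effect;
Figure PCTCN2020085045-appb-000004
denote the torque vector and the angular velocity vector of the servos of the bionic robotic fish, respectively; β is a positive value representing the correlation coefficient between reward and loss.
In some preferred embodiments, the first gradient function is:
Figure PCTCN2020085045-appb-000005
where F(·) denotes the optimization objective function, θ denotes the CPG model parameters, σ denotes the step size of the parameter perturbation, ε denotes the gradient direction of the parameter perturbation, and
Figure PCTCN2020085045-appb-000006
denotes the mathematical expectation of the optimization objective function obtained when the bionic robotic fish moves under the control of θ updated along n gradient directions sampled from the standard normal distribution.
In some preferred embodiments, the second gradient function is:
Figure PCTCN2020085045-appb-000007
where Q(s,a|θ^Q) denotes the action-state value function, μ(s|θ^μ) denotes the action policy function, N is the number of samples in the batch update, i indexes the i-th sample drawn from the experience pool, a denotes the control value, s_i denotes the state of the i-th sample, J denotes the objective function of the action policy network, and
Figure PCTCN2020085045-appb-000008
denotes the gradient of the action policy network with respect to its parameters.
In some preferred embodiments, "summing the global control value of each servo of the bionic robotic fish at time t and the adjustment to the control value of each servo at time t" in step S40 is performed as:
Figure PCTCN2020085045-appb-000009
where a_t denotes the servo control signal of the bionic robotic fish, s_t and
Figure PCTCN2020085045-appb-000010
denote the state and the desired state of the bionic robotic fish at time t, respectively, and
Figure PCTCN2020085045-appb-000011
denote the global servo control value and the servo compensation control value related to the state of the bionic robotic fish, respectively.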
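In another aspect, the present invention provides a motion control system for a bionic robotic fish based on adversarial structured control. The system comprises a path acquisition module, a servo global control module, a servo compensation control module, a servo control value acquisition module, and a motion control module;
the path acquisition module is configured to obtain the swimming path of the bionic robotic fish and divide the swimming path into a set of sequentially connected basic subpaths;
the servo global control module is configured to take the subpaths in the set in order and, based on the start point and end point of each subpath, obtain the global control value of each servo of the bionic robotic fish at time t through the trained servo global control model;
the servo compensation control module is configured to obtain the compensation control value of each servo at time t through the trained servo compensation control model, based on the acquired pose information of the bionic robotic fish at time t and the global control value of each servo at time t;
the servo control value acquisition module is configured to sum the global control value and the compensation control value of each servo at time t to obtain the control value of each servo at time t+1;
the motion control module is configured to control the motion of the bionic robotic fish at time t+1 with the control value of each servo at time t+1.
In a third aspect, the present invention provides a storage device storing a plurality of programs, the programs being adapted to be loaded and executed by a processor to implement the above motion control method for a bionic robotic fish based on adversarial structured control.
In a fourth aspect, the present invention provides a processing device comprising a processor and a storage device; the processor is adapted to execute programs; the storage device is adapted to store a plurality of programs; the programs are adapted to be loaded and executed by the processor to implement the above motion control method for a bionic robotic fish based on adversarial structured control.
Beneficial effects of the present invention:
(1) The motion control method of the present invention draws on prior knowledge of the periodic motion of fish. The rhythmic signal generated by a CPG model optimized with an evolution strategy (ES) serves as the reference control signal of the robotic fish, and a deep reinforcement learning algorithm learns a compensation control signal around that reference signal, so that the two jointly control the fish. The resulting control law conforms to the sine-like signal of the fish body wave, which ensures efficient swimming. Compared with learning a complex nonlinear control law directly with deep reinforcement learning, training and optimizing the CPG model involves fewer parameters, which lowers the difficulty of training.
(2) For the energy-saving motion optimization task, the motion control method of the present invention proposes a corresponding objective function that meets the compound requirement of reaching the motion goal while reducing motion loss. It also uses adversarial training to mitigate the tendency of traditional heuristic optimization algorithms to fall into local optima, further improving the motion efficiency and robustness of the robotic fish.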
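Brief Description of the Drawings
Other features, objects, and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is a flowchart of the motion control method for a bionic robotic fish based on adversarial structured control according to the present invention;
Fig. 2 is a schematic diagram of the algorithm structure of one embodiment of the method;
Fig. 3 illustrates MuJoCo simulation training of the robotic fish in one embodiment of the method;
Fig. 4 illustrates numerical simulation training of the real robotic fish in one embodiment of the method;
Fig. 5 shows an example of straight swimming by a real four-link bionic robotic fish after optimization from a poor set of initial states, in one embodiment of the method.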
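Detailed Description
The present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the relevant invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the invention.
It should be noted that, provided there is no conflict, the embodiments of the present application and the features in them can be combined with one another. The application is described in detail below with reference to the drawings and in combination with the embodiments.
A motion control method for a bionic robotic fish based on adversarial structured control according to the present invention comprises:
Step S10: obtain the swimming path of the bionic robotic fish and divide the swimming path into a set of sequentially connected basic subpaths;
Step S20: taking the subpaths in the set in order, obtain the global control value of each servo of the bionic robotic fish at time t through the trained servo global control model, based on the start point and end point of each subpath;
Step S30: based on the acquired pose information of the bionic robotic fish at time t and the global control value of each servo at time t, obtain the compensation control value of each servo at time t through the trained servo compensation control model;
Step S40: sum the global control value and the compensation control value of each servo at time t to obtain the control value of each servo at time t+1, and control the motion of the bionic robotic fish at time t+1 with these control values;
Step S50: set t=t+1 and return to step S20 until the bionic robotic fish reaches the end of the swimming path.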
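The loop formed by steps S20 to S50 can be sketched for a single subpath as follows. This is only a minimal illustration: the function name `run_subpath`, the callback signatures, and the `max_steps` guard are assumptions introduced here, and the two models are passed in as plain callables rather than as the trained CPG and DDPG models.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def run_subpath(
    start: Point,
    goal: Point,
    global_model: Callable[[int, Point, Point], List[float]],
    compensation_model: Callable[[Sequence[float], Sequence[float]], List[float]],
    read_pose: Callable[[], Sequence[float]],
    apply_servos: Callable[[Sequence[float]], None],
    reached: Callable[[Sequence[float], Point], bool],
    max_steps: int = 1000,
) -> int:
    """Run steps S20-S50 on one subpath and return the number of control steps."""
    t = 0
    while t < max_steps:
        pose = read_pose()
        if reached(pose, goal):
            break
        u_global = global_model(t, start, goal)               # S20: global control values
        u_comp = compensation_model(pose, u_global)           # S30: compensation values
        command = [g + c for g, c in zip(u_global, u_comp)]   # S40: per-servo sum
        apply_servos(command)                                 # command takes effect at t+1
        t += 1                                                # S50: advance time
    return t
```

A full path would call this function once per subpath, passing the end point of one subpath as the start point of the next.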
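To explain the motion control method more clearly, each step of the method embodiment is described in detail below with reference to Fig. 1.
The motion control method for a bionic robotic fish based on adversarial structured control according to one embodiment of the present invention comprises steps S10 to S50, described in detail as follows:
Step S10: obtain the swimming path of the bionic robotic fish and divide the swimming path into a set of sequentially connected basic subpaths.
Fig. 2 shows the algorithm structure of one embodiment of the method. The final control signal of the bionic robotic fish is produced by two parts, global reference control and local compensation control. The global reference control is a CPG model with optimized parameters that generates a rhythmic signal as the global reference control signal. The local compensation control is a real-time system trained with DDPG; its input is the real-time pose information of the fish, and its output is a set of compensation control values, one for each position-controlled servo. Under the global reference signal the fish develops an overall motion trend, while the compensation signal makes fine adjustments on top of the reference signal according to the current state, which corrects the path, improves motion accuracy, and reduces motion loss.
The overall swimming task of the bionic robotic fish can be divided into relatively simple subtasks, each corresponding to one simple swimming path such as a left turn, a right turn, or straight swimming. Between adjacent subtasks, the end point of one path coincides with the start point of the next. By combining these simple paths in various ways, motion control for complex swimming tasks is achieved.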
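The decomposition into connected subpaths can be illustrated with a short sketch. It assumes the path is given as a list of 2D waypoints, which the source does not specify; `split_into_subpaths` and `classify` are hypothetical helper names, and labeling a subpath by the sign of a cross product is just one simple way to tell left turns, right turns, and straight segments apart.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def split_into_subpaths(waypoints: List[Point]) -> List[Tuple[Point, Point]]:
    """Pair consecutive waypoints so each subpath ends where the next begins."""
    if len(waypoints) < 2:
        raise ValueError("need at least two waypoints")
    return list(zip(waypoints[:-1], waypoints[1:]))


def classify(prev: Point, start: Point, end: Point, tol: float = 1e-6) -> str:
    """Label a subpath relative to the heading of the previous segment."""
    hx, hy = start[0] - prev[0], start[1] - prev[1]   # incoming heading
    dx, dy = end[0] - start[0], end[1] - start[1]     # subpath direction
    cross = hx * dy - hy * dx                         # > 0 means counterclockwise
    if cross > tol:
        return "left"
    if cross < -tol:
        return "right"
    return "straight"
```

The label could then be used to select the submodel pair trained for that type of subpath.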
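Step S20: taking the subpaths in the set in order, obtain the global control value of each servo of the bionic robotic fish at time t through the trained servo global control model, based on the start point and end point of each subpath.
Step S30: based on the acquired pose information of the bionic robotic fish at time t and the global control value of each servo at time t, obtain the compensation control value of each servo at time t through the trained servo compensation control model.
The servo global control model and the servo compensation control model respectively comprise a set of one-to-one corresponding pairs of servo global control submodels and servo compensation control submodels, each pair constructed for a different type of subpath.
Each pair of servo global control submodel and servo compensation control submodel is built on a CPG model and a DDPG network respectively, and is trained by an iterative adversarial method as follows:
Step B10: construct the optimization objective function for the pair of servo global control submodel and servo compensation control submodel, as shown in equation (1):
Figure PCTCN2020085045-appb-000012
where ψ denotes the object optimized through the objective function, namely the CPG model parameters and the DDPG network parameters; θ_e denotes the yaw angle between the bionic robotic fish and the target point, with θ_e ∈ (-π, π] as its range;
Figure PCTCN2020085045-appb-000013
denotes the velocity vector of the bionic robotic fish in the world reference frame;
Figure PCTCN2020085045-appb-000014
denotes the modulus of that velocity vector; v_0 is a speed upper limit preset to ensure the energy optimization effect;
Figure PCTCN2020085045-appb-000015
denote the torque vector and the angular velocity vector of the servos of the bionic robotic fish, respectively; β is a positive value representing the correlation coefficient between reward and loss.
The motion optimization method proposed by the present invention targets two different models; to keep the optimization effect consistent across them, the proposed optimization objective function is designed to be general.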
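The expression of equation (1) is not reproduced in this text, so the sketch below is not the patent's formula. It only illustrates one reward-minus-loss form that is consistent with the symbol definitions above: a reward that grows with speed toward the target, capped at v_0, and a loss equal to β times the summed servo power. The function name `step_objective`, the cosine weighting of the heading error, and the default values are all assumptions.

```python
import math
from typing import Sequence


def step_objective(
    theta_e: float,
    velocity: Sequence[float],
    torques: Sequence[float],
    omegas: Sequence[float],
    v0: float = 0.5,
    beta: float = 0.1,
) -> float:
    """Illustrative per-step score: heading-weighted capped speed minus power cost."""
    if not -math.pi < theta_e <= math.pi:
        raise ValueError("theta_e must lie in (-pi, pi]")
    speed = math.sqrt(sum(c * c for c in velocity))        # modulus of the velocity
    reward = math.cos(theta_e) * min(speed, v0)            # assumed reward shape
    power = sum(abs(tau * w) for tau, w in zip(torques, omegas))
    return reward - beta * power                           # beta trades reward vs loss
```

Because the same scalar score can be summed over an episode or used step by step, a function of this kind can serve both the ES and the DDPG training stages.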
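Step B20: optimize the parameters of the servo global control submodel with the ES algorithm in the gradient descent direction of a preset first gradient function, until the value of the optimization objective function no longer increases or its increase falls below a set first threshold, obtaining a first servo global control submodel.
In general, a biological CPG is a dedicated neural network located in the spinal cord that can generate coordinated patterns of rhythmic activity, such as breathing, chewing, or leg movements during walking. In particular, a CPG model can generate rhythmic signals without any input from feedback or from higher control centers. CPG-based control is widely used to generate swimming strategies for various robotic fish. Compared with the traditional fish body wave fitting method, a CPG model acts as an online gait generator: the characteristics of its output signal can be changed simply, and the output stays smooth and continuous even when parameters change abruptly. The global reference control of the present invention therefore also uses a servo global control model built on a CPG model to generate the global control signal of the robotic fish.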
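The smoothness property described above can be seen in a toy oscillator. The source does not give the equations of its CPG model, so the sketch below uses a generic amplitude-controlled phase oscillator as a stand-in; the class name `SimpleCPG` and all parameters (`freq_hz`, `phase_lag`, `gain`) are hypothetical.

```python
import math
from typing import List


class SimpleCPG:
    """Chain of phase-lagged oscillators whose amplitude tracks a target smoothly."""

    def __init__(self, n_joints: int, freq_hz: float, amplitude: float,
                 phase_lag: float, gain: float = 5.0) -> None:
        self.n = n_joints
        self.freq = freq_hz
        self.target_amp = amplitude   # may be changed abruptly at run time
        self.lag = phase_lag          # phase offset between neighbouring joints
        self.gain = gain              # how fast the amplitude converges
        self.phase = 0.0
        self.amp = [0.0] * n_joints

    def step(self, dt: float) -> List[float]:
        """Advance by dt (explicit Euler) and return one angle command per joint."""
        self.phase += 2.0 * math.pi * self.freq * dt
        out = []
        for k in range(self.n):
            # First-order amplitude dynamics keep the output continuous.
            self.amp[k] += self.gain * (self.target_amp - self.amp[k]) * dt
            out.append(self.amp[k] * math.sin(self.phase - k * self.lag))
        return out
```

Setting `target_amp` to a new value in the middle of a run changes the swing amplitude gradually rather than producing a jump in the servo commands.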
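In the training phase, the present invention takes the global reference control as the initial optimization object and uses the ES algorithm to optimize the parameters of the CPG model. The ES algorithm from reinforcement learning perturbs the CPG parameters by generating mirrored random gradients, lets the robotic fish move in the environment under each perturbed parameter set to obtain reward feedback of different magnitudes, and finally updates the CPG parameters with weights assigned according to the ranking of the rewards. The first gradient function is shown in equation (2):
Figure PCTCN2020085045-appb-000016
where F(·) denotes the optimization objective function, θ denotes the CPG model parameters, σ denotes the step size of the parameter perturbation, ε denotes the gradient direction of the parameter perturbation, and
Figure PCTCN2020085045-appb-000017
denotes the mathematical expectation of the optimization objective function obtained when the bionic robotic fish moves under the control of θ updated along n gradient directions sampled from the standard normal distribution.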
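One update of an evolution strategy with mirrored sampling and rank-based weights can be sketched as below. This follows the commonly used estimator in which the gradient of E[F(θ+σε)] is approximated by an average of F(θ+σε)·ε/σ over Gaussian samples; whether equation (2) has exactly this form cannot be confirmed from the text. The function `es_step`, the centered-rank weighting, and the hyperparameter defaults are assumptions, and `objective` stands in for an episode rollout of the robotic fish.

```python
import random
from typing import Callable, List, Optional


def es_step(
    theta: List[float],
    objective: Callable[[List[float]], float],
    sigma: float = 0.1,
    lr: float = 0.05,
    n_pairs: int = 20,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return theta after one ascent step on the objective."""
    rng = rng or random.Random(0)
    dim = len(theta)
    trials = []  # (score, sign, eps) for every perturbed parameter set
    for _ in range(n_pairs):
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for sign in (1.0, -1.0):  # mirrored pair: theta + sigma*eps, theta - sigma*eps
            cand = [t + sign * sigma * e for t, e in zip(theta, eps)]
            trials.append((objective(cand), sign, eps))
    # Replace raw scores by centered ranks in [-0.5, 0.5] (worst to best).
    order = sorted(range(len(trials)), key=lambda i: trials[i][0])
    m = len(trials)
    grad = [0.0] * dim
    for rank, i in enumerate(order):
        weight = rank / (m - 1) - 0.5
        _, sign, eps = trials[i]
        for d in range(dim):
            grad[d] += weight * sign * eps[d]
    return [t + lr / (m * sigma) * g for t, g in zip(theta, grad)]
```

Rank-based weights make the update insensitive to the scale of the score, which helps when episode rewards vary widely between good and poor initial parameters.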
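For each subpath L_i, a set of initial training values is chosen empirically according to the relationship between the initial pose of the robotic fish
Figure PCTCN2020085045-appb-000018
and the target point P_i. Training runs until the objective function score converges, and the CPG model parameters corresponding to the best training result are recorded together with the final pose of the robotic fish
Figure PCTCN2020085045-appb-000019
Fig. 3 illustrates MuJoCo simulation training of the robotic fish in one embodiment of the method. The left and right panels of Fig. 3 are numerical simulation plots of ES optimization starting from a poor and a good set of initial CPG parameters, respectively. The dashed curve Train score shows the optimization objective value over training rounds; the dotted curve Joint power (W) shows the total power per unit time within each episode over training rounds; the solid curve Speed (×1000) shows the linear speed of the fish head in the straight-swimming task over training rounds. The horizontal axis, round, is the number of training rounds, and the vertical axis, value, is the optimization objective value. Whether the initial CPG parameters are good or poor, the ES algorithm optimizes them effectively, and the energy loss during swimming drops markedly as the swimming path and posture are optimized.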
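Step B30: based on the parameters of the first servo global control submodel, optimize the parameters of the action policy network and the action value network in the servo compensation control submodel in the gradient descent direction of a preset second gradient function, until the value of the optimization objective function no longer increases or its increase falls below the set first threshold, obtaining a first servo compensation control submodel.
After the first round of optimization of the global reference control, the method locks the global reference control signal output by that module, that is, it fixes the CPG model parameters, and then switches the training object to update the parameters of the action policy network and the action value network in DDPG. The second gradient function is shown in equation (3):
Figure PCTCN2020085045-appb-000020
where Q(s,a|θ^Q) denotes the action-state value function, μ(s|θ^μ) denotes the action policy function, N is the number of samples in the batch update, i indexes the i-th sample drawn from the experience pool, a denotes the control value, s_i denotes the state of the i-th sample, J denotes the objective function of the action policy network, and
Figure PCTCN2020085045-appb-000021
denotes the gradient of the action policy network with respect to its parameters.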
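The present invention uses the DDPG algorithm to produce a real-time local compensation control signal. The idea comes mainly from the core principle of residual neural networks: a residual compensation control signal is trained on top of a control signal that already performs well, so that in the worst case the residual control network outputs zero, which is equivalent to controlling the fish with the global reference control signal alone. Accordingly, the weights and biases of the DDPG action policy network are all initialized to 0. In addition, given the limit θ_max on the maximum rotation angle per unit time of the servos carried by the fish, the output of the action policy network is set as shown in equation (4):
Figure PCTCN2020085045-appb-000022
where a_t denotes the output action signal of the local compensation control at each time step,
Figure PCTCN2020085045-appb-000023
denotes the output of the output layer of the action policy network, the nonlinear activation function tanh limits the output range to [-1,1], and K is the upper limit of the fine adjustment of the compensation signal, set according to θ_max.
The action policy network designed in this method has two hidden layers with 64 nodes each. The dimension of the input state depends on the actual number of servos of the multi-link robotic fish; its main physical components are the distance from the current position to the target point P_i, the deviation angle between the current position and P_i, the current heading angle, the rotation angle of each servo, and the angular velocity of each servo. The action value network likewise has two hidden layers with 64 nodes each. The state and the action are concatenated as vectors to form the input of the value network, whose output is the action-state value function Q^π(s,a).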
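The effect of zero initialization and the K·tanh output bound can be checked with a small forward-pass sketch. Only the two hidden layers of 64 nodes, the zero initialization, and the tanh scaling by K come from the text; the class name `CompensationActor`, the ReLU hidden activation, and the state dimension 3 + 2n (distance, deviation angle, heading, plus an angle and an angular velocity per servo) are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence


def make_layer(n_in: int, n_out: int) -> Dict[str, list]:
    """Dense layer with all weights and biases set to zero."""
    return {"w": [[0.0] * n_in for _ in range(n_out)], "b": [0.0] * n_out}


def dense(layer: Dict[str, list], x: Sequence[float]) -> List[float]:
    return [sum(wi * xi for wi, xi in zip(row, x)) + b
            for row, b in zip(layer["w"], layer["b"])]


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, a) for a in v]


class CompensationActor:
    """Forward pass of a policy network whose output is bounded by K."""

    def __init__(self, n_servos: int, k_limit: float, hidden: int = 64) -> None:
        self.state_dim = 3 + 2 * n_servos   # assumed layout of the state vector
        self.k = k_limit
        self.layers = [make_layer(self.state_dim, hidden),
                       make_layer(hidden, hidden),
                       make_layer(hidden, n_servos)]

    def act(self, state: Sequence[float]) -> List[float]:
        if len(state) != self.state_dim:
            raise ValueError("unexpected state dimension")
        h = relu(dense(self.layers[0], state))
        h = relu(dense(self.layers[1], h))
        return [self.k * math.tanh(o) for o in dense(self.layers[2], h)]
```

With every parameter at zero the actor returns a zero compensation for any state, so the initial behavior equals that of the global reference control alone, and no trained parameter set can push a command beyond ±K.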
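In the training phase, DDPG and ES use the same optimization objective function, but ES uses the Monte Carlo method and takes the total reward of an episode as the feedback score, whereas DDPG uses temporal-difference learning and updates the network parameters at every motion step. DDPG training stops when the final objective function score converges.
Step B40: based on the parameters of the first servo compensation control submodel, return to step B20 and iteratively optimize the parameters of the servo global control submodel and the servo compensation control submodel, until the value of the optimization objective function no longer increases or its increase falls below the set first threshold, obtaining the trained servo global control submodel and servo compensation control submodel.
In the training of adversarial structured control, apart from the training that yields the initial global reference control, each subsequent ES update of the CPG model parameters likewise requires the DDPG network parameters to be fixed before optimization. This adversarial training scheme effectively prevents the CPG model parameters and the DDPG network parameters from falling into local optima during optimization. Fig. 4 illustrates numerical simulation training of the real robotic fish in one embodiment of the method. The gray curve Cost Curve shows the loss term of the optimization objective over adversarial training rounds, and the black curve Reward Curve shows the reward term. 1st ES, 2nd ES, and 3rd ES mark the first, second, and third updates of the CPG model parameters by the evolution strategy algorithm; 1st RL, 2nd RL, and 3rd RL mark the first, second, and third updates of the DDPG model parameters. The horizontal axis, round, is the number of adversarial training rounds, and the vertical axis, value, is the optimization objective value. In each round, once the reference control optimized to convergence by ES is combined with the compensation control, the objective function score rises again. In one embodiment of the present invention, the score stabilizes after three rounds of adversarial training and no longer improves. As the shaded regions in Fig. 4 show, the method delivers a clear motion optimization effect and a high degree of task completion for the bionic robotic fish under different initial conditions.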
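The alternation in steps B20 to B40 can be sketched on a toy problem in which each model is reduced to a single scalar parameter. The names `alternate` and `hill_climb` are hypothetical, and the simple hill climber only stands in for the ES and DDPG optimizers; what the sketch shows is the freeze-one, optimize-the-other schedule and the stopping rule based on the gain of the shared objective.

```python
from typing import Callable, List, Tuple


def hill_climb(value: Callable[[float], float], x: float,
               step: float = 0.1, tol: float = 1e-4, max_iter: int = 1000) -> float:
    """Stand-in optimizer: move x by +/- step while the gain is at least tol."""
    best = value(x)
    for _ in range(max_iter):
        cand = max((x - step, x + step), key=value)
        gain = value(cand) - best
        if gain < tol:
            break
        x, best = cand, best + gain
    return x


def alternate(objective: Callable[[float, float], float], cpg: float, ddpg: float,
              threshold: float = 1e-3, max_rounds: int = 10
              ) -> Tuple[float, float, List[float]]:
    """Alternately optimize each parameter block with the other held fixed."""
    score = objective(cpg, ddpg)
    history = [score]
    for _ in range(max_rounds):
        cpg = hill_climb(lambda c: objective(c, ddpg), cpg)    # B20: DDPG frozen
        ddpg = hill_climb(lambda d: objective(cpg, d), ddpg)   # B30: CPG frozen
        new_score = objective(cpg, ddpg)
        history.append(new_score)
        if new_score - score < threshold:                      # B40: stop on small gain
            break
        score = new_score
    return cpg, ddpg, history
```

On a coupled objective, the second round typically improves the first block further because the best value of one parameter depends on the current value of the other, which mirrors the score increases reported after each ES and RL stage.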
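Step S40: sum the global control value and the compensation control value of each servo at time t to obtain the control value of each servo at time t+1, and control the motion of the bionic robotic fish at time t+1 with these control values.
Control algorithms based on traditional control theory, such as PID and active disturbance rejection control (ADRC), can usually pursue only the single goal of reducing tracking error when solving the path tracking problem of a bionic robotic fish. Deriving a control law that combines high performance with low power consumption by theoretical means is very difficult. The method therefore converts the problem of solving for a control law into an objective optimization problem, which makes it possible to meet the requirements of high tracking accuracy and low power consumption together. Intuitively, based on prior knowledge of the rhythmic motion of the fish, the reference control signal is set to an optimized rhythmic signal. The global reference control is thus designed as a CPG model whose parameters are optimized by ES, the compensation control is designed to further optimize and stabilize the local motion of the reference control through the DDPG algorithm, and the two signals are linearly combined into the final control law, as shown in equation (5):
Figure PCTCN2020085045-appb-000024
where a_t denotes the servo control signal of the bionic robotic fish, s_t and
Figure PCTCN2020085045-appb-000025
denote the state and the desired state of the bionic robotic fish at time t, respectively, and
Figure PCTCN2020085045-appb-000026
denote the global servo control value and the servo compensation control value related to the state of the bionic robotic fish, respectively.
Step S50: set t=t+1 and return to step S20 until the bionic robotic fish reaches the end of the swimming path.
Fig. 5 shows an example of straight swimming by a real four-link bionic robotic fish after optimization from a poor set of initial states, in one embodiment of the method. As Fig. 5(a) shows, at the start of the experiment the fish is at rest in the water and then performs the straight-swimming task with a poor swimming posture. Fig. 5(b) shows that although the fish reaches the straight-swimming goal under CPG-based control alone, its body swings with very large amplitude and the path recorded by the global vision system fluctuates in a sawtooth pattern. This posture produces very high water resistance and low motion efficiency; the swimming speed is low and the energy loss is very high. Figs. 5(c) and 5(d) show the straight-swimming path after optimization with the method of the present invention: the path recorded by the global vision system is nearly a straight line with minimal fluctuation. Energy is also well conserved while the speed is maintained or even increased.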
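A motion control system for a bionic robotic fish based on adversarial structured control according to a second embodiment of the present invention comprises a path acquisition module, a servo global control module, a servo compensation control module, a servo control value acquisition module, and a motion control module;
the path acquisition module is configured to obtain the swimming path of the bionic robotic fish and divide the swimming path into a set of sequentially connected basic subpaths;
the servo global control module is configured to take the subpaths in the set in order and, based on the start point and end point of each subpath, obtain the global control value of each servo of the bionic robotic fish at time t through the trained servo global control model;
the servo compensation control module is configured to obtain the compensation control value of each servo at time t through the trained servo compensation control model, based on the acquired pose information of the bionic robotic fish at time t and the global control value of each servo at time t;
the servo control value acquisition module is configured to sum the global control value and the compensation control value of each servo at time t to obtain the control value of each servo at time t+1;
the motion control module is configured to control the motion of the bionic robotic fish at time t+1 with the control value of each servo at time t+1.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working process of the system described above and the related explanations can be found in the corresponding process of the foregoing method embodiment and are not repeated here.
It should be noted that the motion control system provided by the above embodiment is illustrated only by the division of the above functional modules. In practical applications, the above functions can be assigned to different functional modules as needed; that is, the modules or steps in the embodiments of the present invention can be further decomposed or combined. For example, the modules of the above embodiment can be combined into one module, or further divided into multiple sub-modules, to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention serve only to distinguish the modules or steps and are not to be regarded as improper limitations on the present invention.
A storage device according to a third embodiment of the present invention stores a plurality of programs, the programs being adapted to be loaded and executed by a processor to implement the above motion control method for a bionic robotic fish based on adversarial structured control.
A processing device according to a fourth embodiment of the present invention comprises a processor and a storage device; the processor is adapted to execute programs; the storage device is adapted to store a plurality of programs; the programs are adapted to be loaded and executed by the processor to implement the above motion control method for a bionic robotic fish based on adversarial structured control.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the storage device and processing device described above and the related explanations can be found in the corresponding process of the foregoing method embodiment and are not repeated here.
Those skilled in the art should appreciate that the modules and method steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. Programs corresponding to software modules and method steps can be placed in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field. To illustrate clearly the interchangeability of electronic hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are executed in electronic hardware or software depends on the particular application and the design constraints of the technical solution. Those skilled in the art may implement the described functions in different ways for each particular application, but such implementations should not be considered beyond the scope of the present invention.
The terms "first", "second", and the like are used to distinguish similar objects and not to describe or indicate a particular order or sequence.
The term "comprising" and any similar term are intended to cover a non-exclusive inclusion, so that a process, method, article, or device/apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device/apparatus.
The technical solution of the present invention has now been described with reference to the preferred embodiments shown in the drawings. Those skilled in the art will readily understand, however, that the scope of protection of the present invention is clearly not limited to these specific embodiments. Without departing from the principle of the present invention, those skilled in the art can make equivalent changes or substitutions to the relevant technical features, and the technical solutions after such changes or substitutions all fall within the scope of protection of the present invention.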

Claims (10)

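  1. A motion control method for a bionic robotic fish based on adversarial structured control, characterized in that the method comprises:
    Step S10: obtaining the swimming path of the bionic robotic fish and dividing the swimming path into a set of sequentially connected basic subpaths;
    Step S20: taking the subpaths in the set in order, obtaining the global control value of each servo of the bionic robotic fish at time t through the trained servo global control model, based on the start point and end point of each subpath;
    Step S30: based on the acquired pose information of the bionic robotic fish at time t and the global control value of each servo at time t, obtaining the compensation control value of each servo at time t through the trained servo compensation control model;
    Step S40: summing the global control value and the compensation control value of each servo at time t to obtain the control value of each servo at time t+1, and controlling the motion of the bionic robotic fish at time t+1 with these control values;
    Step S50: setting t=t+1 and returning to step S20 until the bionic robotic fish reaches the end of the swimming path.
  2. The motion control method according to claim 1, characterized in that the servo global control model and the servo compensation control model respectively comprise a set of one-to-one corresponding pairs of servo global control submodels and servo compensation control submodels, each pair constructed for a different type of subpath.
  3. The motion control method according to claim 2, characterized in that each pair of servo global control submodel and servo compensation control submodel is built on a CPG model and a DDPG network respectively, and is trained by an iterative adversarial method as follows:
    Step B10: constructing the optimization objective function for the pair of servo global control submodel and servo compensation control submodel;
    Step B20: optimizing the parameters of the servo global control submodel with the ES algorithm in the gradient descent direction of a preset first gradient function, until the value of the optimization objective function no longer increases or its increase falls below a set first threshold, obtaining a first servo global control submodel;
    Step B30: based on the parameters of the first servo global control submodel, optimizing the parameters of the action policy network and the action value network in the servo compensation control submodel in the gradient descent direction of a preset second gradient function, until the value of the optimization objective function no longer increases or its increase falls below the set first threshold, obtaining a first servo compensation control submodel;
    Step B40: based on the parameters of the first servo compensation control submodel, returning to step B20 and iteratively optimizing the parameters of the servo global control submodel and the servo compensation control submodel, until the value of the optimization objective function no longer increases or its increase falls below the set first threshold, obtaining the trained servo global control submodel and servo compensation control submodel.
  4. The motion control method according to claim 3, characterized in that the objective function is:
    Figure PCTCN2020085045-appb-100001
    where ψ denotes the object optimized through the objective function, namely the CPG model parameters and the DDPG network parameters; θ_e denotes the yaw angle between the bionic robotic fish and the target point, with θ_e ∈ (-π, π] as its range;
    Figure PCTCN2020085045-appb-100002
    denotes the velocity vector of the bionic robotic fish in the world reference frame;
    Figure PCTCN2020085045-appb-100003
    denotes the modulus of that velocity vector; v_0 is a speed upper limit preset to ensure the energy optimization effect;
    Figure PCTCN2020085045-appb-100004
    denote the torque vector and the angular velocity vector of the servos of the bionic robotic fish, respectively; β is a positive value representing the correlation coefficient between reward and loss.
  5. The motion control method according to claim 3, characterized in that the first gradient function is:
    Figure PCTCN2020085045-appb-100005
    where F(·) denotes the optimization objective function, θ denotes the CPG model parameters, σ denotes the step size of the parameter perturbation, ε denotes the gradient direction of the parameter perturbation, and
    Figure PCTCN2020085045-appb-100006
    denotes the mathematical expectation of the optimization objective function obtained when the bionic robotic fish moves under the control of θ updated along n gradient directions sampled from the standard normal distribution.
  6. The motion control method according to claim 3, characterized in that the second gradient function is:
    Figure PCTCN2020085045-appb-100007
    where Q(s,a|θ^Q) denotes the action-state value function, μ(s|θ^μ) denotes the action policy function, N is the number of samples in the batch update, i indexes the i-th sample drawn from the experience pool, a denotes the control value, s_i denotes the state of the i-th sample, J denotes the objective function of the action policy network, and
    Figure PCTCN2020085045-appb-100008
    denotes the gradient of the action policy network with respect to its parameters.
  7. The motion control method according to claim 3, characterized in that "summing the global control value of each servo of the bionic robotic fish at time t and the adjustment to the control value of each servo at time t" in step S40 is performed as:
    Figure PCTCN2020085045-appb-100009
    where a_t denotes the servo control signal of the bionic robotic fish, s_t and
    Figure PCTCN2020085045-appb-100010
    denote the state and the desired state of the bionic robotic fish at time t, respectively, and
    Figure PCTCN2020085045-appb-100011
    denote the global servo control value and the servo compensation control value related to the state of the bionic robotic fish, respectively.
  8. A motion control system for a bionic robotic fish based on adversarial structured control, characterized in that the system comprises a path acquisition module, a servo global control module, a servo compensation control module, a servo control value acquisition module, and a motion control module;
    the path acquisition module is configured to obtain the swimming path of the bionic robotic fish and divide the swimming path into a set of sequentially connected basic subpaths;
    the servo global control module is configured to take the subpaths in the set in order and, based on the start point and end point of each subpath, obtain the global control value of each servo of the bionic robotic fish at time t through the trained servo global control model;
    the servo compensation control module is configured to obtain the compensation control value of each servo at time t through the trained servo compensation control model, based on the acquired pose information of the bionic robotic fish at time t and the global control value of each servo at time t;
    the servo control value acquisition module is configured to sum the global control value and the compensation control value of each servo at time t to obtain the control value of each servo at time t+1;
    the motion control module is configured to control the motion of the bionic robotic fish at time t+1 with the control value of each servo at time t+1.
  9. A storage device storing a plurality of programs, characterized in that the programs are adapted to be loaded and executed by a processor to implement the motion control method for a bionic robotic fish based on adversarial structured control according to any one of claims 1-7.
  10. A processing device, comprising
    a processor adapted to execute programs; and
    a storage device adapted to store a plurality of programs;
    characterized in that the programs are adapted to be loaded and executed by the processor to implement:
    the motion control method for a bionic robotic fish based on adversarial structured control according to any one of claims 1-7.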
PCT/CN2020/085045 2019-11-29 2020-04-16 基于对抗结构化控制的仿生机器鱼运动控制方法、系统 WO2021103392A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/094,820 US10962976B1 (en) 2019-11-29 2020-11-11 Motion control method and system for biomimetic robotic fish based on adversarial structured control

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911199839.1A CN110909859B (zh) 2019-11-29 2019-11-29 基于对抗结构化控制的仿生机器鱼运动控制方法、系统
CN201911199839.1 2019-11-29

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/094,820 Continuation US10962976B1 (en) 2019-11-29 2020-11-11 Motion control method and system for biomimetic robotic fish based on adversarial structured control

Publications (1)

Publication Number Publication Date
WO2021103392A1 true WO2021103392A1 (zh) 2021-06-03

Family

ID=69820684

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/085045 WO2021103392A1 (zh) 2019-11-29 2020-04-16 基于对抗结构化控制的仿生机器鱼运动控制方法、系统

Country Status (2)

Country Link
CN (1) CN110909859B (zh)
WO (1) WO2021103392A1 (zh)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113868115A (zh) * 2021-08-30 2021-12-31 天津大学 基于多目标优化与深度强化学习的游戏软件自动测试方法
CN114065663A (zh) * 2021-11-15 2022-02-18 中国海洋大学 一种基于cfd和mlp的仿生鱼水动力预测方法
CN114216466A (zh) * 2021-12-09 2022-03-22 中国电子科技集团公司第五十四研究所 一种基于动态信任机制的群体智能仿生导航方法
CN114779645A (zh) * 2022-04-29 2022-07-22 北京航空航天大学 一种有向固定通信拓扑下胸鳍拍动式机器鱼编队控制方法
CN116050304A (zh) * 2023-03-15 2023-05-02 重庆交通大学 一种智能鱼流场模拟控制方法、系统、设备及存储介质
CN116300473A (zh) * 2023-04-14 2023-06-23 清华大学深圳国际研究生院 一种基于cpg模型的软体仿生机器鱼游动优化方法
CN116700015A (zh) * 2023-07-28 2023-09-05 中国科学院自动化研究所 水下航行器主动增稳控制方法及装置

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110909859B (zh) * 2019-11-29 2023-03-24 中国科学院自动化研究所 基于对抗结构化控制的仿生机器鱼运动控制方法、系统
US10962976B1 (en) 2019-11-29 2021-03-30 Institute Of Automation, Chinese Academy Of Sciences Motion control method and system for biomimetic robotic fish based on adversarial structured control
CN111443605B (zh) * 2020-04-01 2021-03-23 西安交通大学 构建仿生波动鳍推进运动控制方程及其参数整定优化方法
CN111666846B (zh) * 2020-05-27 2023-05-30 厦门大学 一种人脸属性识别方法和装置
CN111830832B (zh) * 2020-07-27 2021-08-31 中国科学院自动化研究所 仿生滑翔机器海豚平面路径跟踪方法及系统
CN112904873B (zh) * 2021-01-26 2022-08-26 西湖大学 基于深度强化学习的仿生机器鱼控制方法及装置
CN113095463A (zh) * 2021-03-31 2021-07-09 南开大学 一种基于进化强化学习的机器人对抗方法
CN113753209B (zh) * 2021-08-18 2022-09-06 中国科学院自动化研究所 基于仿生机器鱼的仿生波动控制方法及系统
CN113561187B (zh) * 2021-09-24 2022-01-11 中国科学院自动化研究所 机器人控制方法、装置、电子设备及存储介质
CN113867156A (zh) * 2021-12-02 2021-12-31 湖南工商大学 融合bp-rbf神经网络的机器鱼路径跟踪方法及装置
CN114800487B (zh) * 2022-03-14 2024-02-02 中国科学院自动化研究所 基于扰动观测技术的水下机器人作业控制方法
CN115808931B (zh) * 2023-02-07 2023-06-02 中国科学院自动化研究所 水下机器人运动控制方法、装置、系统、设备和存储介质
CN117452806B (zh) * 2023-12-18 2024-03-19 广东海洋大学 水下仿生鱼机器人的航向控制方法

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090125285A1 (en) * 2007-11-13 2009-05-14 Gugaliya Jinendra K Decomposition of nonlinear dynamics using multiple model approach and gap metric analysis
CN104002948A (zh) * 2014-06-06 2014-08-27 西北工业大学 二自由度仿生机器鱼携带目标的控制方法
CN104881045A (zh) * 2015-06-17 2015-09-02 中国科学院自动化研究所 嵌入式视觉引导下仿生机器鱼三维追踪控制方法
CN110286592A (zh) * 2019-06-28 2019-09-27 山东建筑大学 一种基于bp神经网络的机器鱼多模态运动方法及系统
CN110488611A (zh) * 2019-09-02 2019-11-22 山东建筑大学 一种仿生机器鱼运动控制方法、控制器及仿生机器鱼
CN110909859A (zh) * 2019-11-29 2020-03-24 中国科学院自动化研究所 基于对抗结构化控制的仿生机器鱼运动控制方法、系统

Family Cites Families (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100520049B1 (ko) * 2003-09-05 2005-10-10 학교법인 인하학원 자율이동로봇을 위한 경로계획방법
JP5052013B2 (ja) * 2005-03-17 2012-10-17 ソニー株式会社 ロボット装置及びその制御方法
CN101916071B (zh) * 2010-08-04 2012-05-02 中国科学院自动化研究所 仿生机器鱼运动的cpg反馈控制方法
CN102320223B (zh) * 2011-05-10 2013-05-08 中国科学院自动化研究所 基于液位传感反馈的两栖仿生机器人运动控制装置
KR20120138295A (ko) * 2011-06-14 2012-12-26 한국과학기술원 로봇물고기의 유영을 위한 제어 방법
KR101317761B1 (ko) * 2011-12-13 2013-10-11 한국과학기술원 안정적인 이족 보행을 위한 수직 방향과 로봇 정면, 측면 방향의 허리 중심과 발의 움직임 생성 방법
CN103558856A (zh) * 2013-11-21 2014-02-05 东南大学 动态环境下服务动机器人导航方法
CN104142688B (zh) * 2014-08-06 2017-02-15 深圳乐智机器人有限公司 一种水下机器人平台
CN105437232B (zh) * 2016-01-11 2017-07-04 湖南拓视觉信息技术有限公司 一种控制多关节移动机器人避障的方法及装置
WO2018053187A1 (en) * 2016-09-15 2018-03-22 Google Inc. Deep reinforcement learning for robotic manipulation
CN110869174B (zh) * 2017-07-10 2023-12-05 海别得公司 用于生成材料处理机器人工具路径的计算机实现的方法和系统
CN107918391A (zh) * 2017-11-17 2018-04-17 上海斐讯数据通信技术有限公司 一种移动机器人导航纠偏方法及装置
CN108052004B (zh) * 2017-12-06 2020-11-10 湖北工业大学 基于深度增强学习的工业机械臂自动控制方法
CN108549237B (zh) * 2018-05-16 2020-04-28 华南理工大学 基于深度增强学习的预观控制仿人机器人步态规划方法
CN108958241B (zh) * 2018-06-21 2020-09-04 北京极智嘉科技有限公司 机器人路径的控制方法、装置、服务器和存储介质
CN108931988B (zh) * 2018-08-14 2021-04-23 清华大学深圳研究生院 一种基于中枢模式发生器的四足机器人的步态规划方法、中枢模式发生器及机器人
CN109405843B (zh) * 2018-09-21 2020-01-03 北京三快在线科技有限公司 一种路径规划方法及装置和移动设备
CN109605377B (zh) * 2019-01-21 2020-05-22 厦门大学 一种基于强化学习的机器人关节运动控制方法及系统
CN109816315B (zh) * 2019-02-22 2023-07-21 拉扎斯网络科技(上海)有限公司 路径规划方法、装置、电子设备及可读存储介质
CN110374804B (zh) * 2019-07-03 2020-06-19 西安交通大学 一种基于深度确定性策略梯度补偿的变桨距控制方法
CN110333739B (zh) * 2019-08-21 2020-07-31 哈尔滨工程大学 一种基于强化学习的auv行为规划及动作控制方法

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113868115A (zh) * 2021-08-30 2021-12-31 天津大学 基于多目标优化与深度强化学习的游戏软件自动测试方法
CN113868115B (zh) * 2021-08-30 2024-04-16 天津大学 基于多目标优化与深度强化学习的游戏软件自动测试方法
CN114065663A (zh) * 2021-11-15 2022-02-18 中国海洋大学 一种基于cfd和mlp的仿生鱼水动力预测方法
CN114065663B (zh) * 2021-11-15 2024-04-19 中国海洋大学 一种基于cfd和mlp的仿生鱼水动力预测方法
CN114216466B (zh) * 2021-12-09 2023-12-29 中国电子科技集团公司第五十四研究所 一种基于动态信任机制的群体智能仿生导航方法
CN114216466A (zh) * 2021-12-09 2022-03-22 中国电子科技集团公司第五十四研究所 一种基于动态信任机制的群体智能仿生导航方法
CN114779645A (zh) * 2022-04-29 2022-07-22 北京航空航天大学 一种有向固定通信拓扑下胸鳍拍动式机器鱼编队控制方法
CN114779645B (zh) * 2022-04-29 2024-04-26 北京航空航天大学 一种有向固定通信拓扑下胸鳍拍动式机器鱼编队控制方法
CN116050304B (zh) * 2023-03-15 2024-03-26 重庆交通大学 一种智能鱼流场模拟控制方法、系统、设备及存储介质
CN116050304A (zh) * 2023-03-15 2023-05-02 重庆交通大学 一种智能鱼流场模拟控制方法、系统、设备及存储介质
CN116300473B (zh) * 2023-04-14 2023-09-22 清华大学深圳国际研究生院 一种基于cpg模型的软体仿生机器鱼游动优化方法
CN116300473A (zh) * 2023-04-14 2023-06-23 清华大学深圳国际研究生院 一种基于cpg模型的软体仿生机器鱼游动优化方法
CN116700015B (zh) * 2023-07-28 2023-10-31 中国科学院自动化研究所 水下航行器主动增稳控制方法及装置
CN116700015A (zh) * 2023-07-28 2023-09-05 中国科学院自动化研究所 水下航行器主动增稳控制方法及装置

Also Published As

Publication number Publication date
CN110909859B (zh) 2023-03-24
CN110909859A (zh) 2020-03-24

Similar Documents

Publication Publication Date Title
WO2021103392A1 (zh) 基于对抗结构化控制的仿生机器鱼运动控制方法、系统
US10962976B1 (en) Motion control method and system for biomimetic robotic fish based on adversarial structured control
CN108115681B (zh) 机器人的模仿学习方法、装置、机器人及存储介质
Abreu et al. Learning low level skills from scratch for humanoid robot soccer using deep reinforcement learning
US11529733B2 (en) Method and system for robot action imitation learning in three-dimensional space
Lin et al. Evolutionary digital twin: A new approach for intelligent industrial product development
Yan et al. Efficient cooperative structured control for a multijoint biomimetic robotic fish
Liu et al. Modeling and control of robotic manipulators based on artificial neural networks: a review
CN112405542A (zh) 基于脑启发多任务学习的肌肉骨骼机器人控制方法及系统
Zhang et al. A cerebellum-inspired prediction and correction model for motion control of a musculoskeletal robot
CN114326722B (zh) 六足机器人自适应步态规划方法、系统、装置及介质
Yan et al. Hierarchical policy learning with demonstration learning for robotic multiple peg-in-hole assembly tasks
CN111531543B (zh) 基于生物启发式神经网络的机器人自适应阻抗控制方法
Meng et al. Reinforcement learning based variable impedance control for high precision human-robot collaboration tasks
Wang et al. Deep reinforcement learning of cooperative control with four robotic agents by MADDPG
CN116841303A (zh) 一种针对水下机器人的智能择优高阶迭代自学习控制方法
Rajendran et al. Learning based speed control of soft robotic fish
CN116149179A (zh) 针对机器鱼的非一致轨迹长度差分进化迭代学习控制方法
Zhou et al. Intelligent Control of Manipulator Based on Deep Reinforcement Learning
Wang et al. Target-following control of a biomimetic autonomous system based on predictive reinforcement learning
Jalaeian et al. A dynamic-growing fuzzy-neuro controller, application to a 3PSP parallel robot
Gale et al. RBF network pruning techniques for adaptive learning controllers
Yan et al. Motion Optimization for a Robotic Fish Based on Adversarial Structured Control
Man et al. Intelligent Motion Control Method Based on Directional Drive for 3-DOF Robotic Arm
Vaandrager et al. Imitation learning with non-parametric regression

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20894007

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20894007

Country of ref document: EP

Kind code of ref document: A1