CN114089633A - Multi-motor coupling drive control device and method for underwater robot - Google Patents

Multi-motor coupling drive control device and method for underwater robot

Info

Publication number
CN114089633A
Authority
CN
China
Prior art keywords
network
motor
strategy
error
evaluation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111381879.5A
Other languages
Chinese (zh)
Other versions
CN114089633B (en)
Inventor
王伟然
姚杰
葛慧林
智鹏飞
朱志宇
邱海洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University of Science and Technology
Original Assignee
Jiangsu University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University of Science and Technology filed Critical Jiangsu University of Science and Technology
Priority to CN202111381879.5A priority Critical patent/CN114089633B/en
Publication of CN114089633A publication Critical patent/CN114089633A/en
Application granted granted Critical
Publication of CN114089633B publication Critical patent/CN114089633B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/042Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance

Abstract

The invention relates to the technical field of underwater robot control, and in particular to a multi-motor coupling drive control device and method for an underwater robot. In this scheme, a multi-vortex decision-driven algorithm (ML-PDDA) controller is designed to control the rotating speed of each motor, and weight factors are assigned online to the synchronization errors among the motors.

Description

Multi-motor coupling drive control device and method for underwater robot
Technical Field
The invention relates to the technical field of underwater robot control, in particular to a multi-motor coupling driving control device and method for an underwater robot.
Background
Underwater robots are widely used in military applications, underwater resource exploration, underwater search and rescue, and other fields, and as technology advances they can take over more and more difficult tasks from humans. Each of these tasks requires the motion trajectory and attitude of the underwater robot to be controlled with high accuracy. The underwater environment is complex, however, and different environments disturb the robot's motion and operation in different ways. A suitable control structure is therefore needed that makes multiple motors work together cooperatively, so that the underwater robot can resist the various underwater disturbances and move along an accurate trajectory to complete its subsequent operation tasks.
At present, multi-motor cooperative control mainly uses the following algorithms:
(1) Parallel control
All motors in the control system are given the same speed, and synchronization can be achieved only when the loads of all motors are strictly identical. Each motor can only feed back its own tracking error; the synchronization errors between motors are not considered, and the motor control units are independent and uncoupled. When one motor unit is disturbed from outside, the other motors receive no disturbance information, so coordinated multi-motor control cannot be achieved and disturbance rejection is poor. Clearly, this method cannot cope with the complex underwater environment.
(2) Master-slave control
The motors are in a master-slave relationship: the output of the master motor is used as the speed reference of the slave motors, and the slave motors track the speed of the master. However, a master-slave control system has no feedback from slave to master. If a motor at a certain stage is disturbed, the motors upstream of it receive no information about the disturbance, while all downstream control units make corresponding speed adjustments and pass them further downstream in the same way. This introduces large delays and poor disturbance rejection, which limits the use of the method.
(3) Virtual spindle control
The virtual main shaft control system imitates the synchronization characteristics of a mechanical main shaft. After the motor speed command has passed through the virtual main shaft, the output signal is used as the reference for each drive unit, and the drive units track this reference. Because the reference is obtained after the main shaft dynamics and filtering have acted on it, there may be a deviation between the main reference value and the actual rotational speed of the motors.
(4) Cross coupling control
The speed or position signals of two adjacent motors are compared, and the difference is used as a system feedback signal that the system then tracks. The system can react to a load change on either motor. However, this strategy is not suitable for more than two motors, because computing the feedback compensation for more than two motors becomes cumbersome.
(5) Deviation coupling control
Deviation coupling control feeds back, for each motor, the sum of its errors with respect to all the other motors as a compensation signal, thereby achieving multi-motor synchronous control. However, the amount of computation increases greatly, and the method suffers from problems such as controller saturation during start-up.
When the underwater robot performs underwater operations, it must not only overcome the disturbance of the surrounding environment to stay in a stable state, but also run along the expected trajectory. The multi-motor cooperative control algorithms above all aim to keep the rotational speeds of the motors absolutely synchronized. This ensures that the underwater robot always operates in a fixed attitude, but it cannot guarantee that the robot changes course smoothly and operates as expected.
Disclosure of Invention
To solve the above technical problems, the invention discloses a multi-motor coupling drive control device and method for an underwater robot, which ensure that the underwater robot can resist the disturbance of the surrounding environment in a complex underwater setting and is driven accurately and stably. The device allows the underwater robot to run stably in a complex underwater environment, provides a strong guarantee for its underwater operations, improves its working efficiency, and also reduces the risk to workers of carrying out underwater operations themselves.
In the invention, a multi-vortex decision-driven algorithm (ML-PDDA) assigns weight factors to the synchronization errors among the motors, and a multi-motor mutual coupling control structure is designed to work with motors running at different speeds. For each single motor control unit, a controller is designed with the multi-vortex decision-driven algorithm (ML-PDDA); together with the multi-motor mutual coupling control system, it realizes control of the underwater robot drive device.
The invention adopts the following specific technical scheme:
a multi-motor coupling driving control device of an underwater robot is composed of a multi-motor mutual coupling algorithm and a depth certainty strategy gradient algorithm controller, and specifically comprises the following three modules: the system comprises a single motor control module, a synchronous error weight distribution module and a multi-motor mutual coupling control module.
In this technical scheme, the single motor control module is composed of an ML-PDDA algorithm controller and a permanent magnet synchronous motor. The motor speed error is used as the controller input and, combined with the vector control model of the permanent magnet synchronous motor, is processed by the ML-PDDA algorithm policy network to obtain the motor model control quantity, the q-axis current, and a synchronization error weight factor α. This realizes motor speed control and, together with the multi-motor mutual coupling control module, cooperative control of the underwater robot drive.
As a further improvement of the invention, the synchronization error weight distribution module uses the evaluation-reward mechanism of the ML-PDDA algorithm to set the weight factor of the synchronization error: when the reward generated by the output weight factor is maximum, the optimal weight factor is obtained, and the resulting synchronization error is fed into the controller as a state quantity, which better reflects the coordination among the motors. The main propulsion motor of the underwater robot, i.e. the 1st motor, has the largest power, so its expected speed is defined as the reference speed. The actual speed of the ith motor is denoted n_i, and the synchronization error between the ith motor and the remaining motors is e_i'. Taking the 1st motor as an example, the synchronization error e_1' is calculated as in equation (1).
e′_1 = α_1×|n_1 − n_2| + α_2×|n_1 − n_3|   (1)
In equation (1): α_1 and α_2 are error weight factors set by the ML-PDDA algorithm, and n_1, n_2, n_3 are the actual rotational speeds of the respective motors.
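To make the weighted synchronization error concrete, the following minimal Python sketch evaluates equation (1); the variable names and the example speeds are illustrative only and do not come from the patent.

```python
def weighted_sync_error(n, alpha):
    """Weighted synchronization error of motor 1 against motors 2 and 3, equation (1).

    n     -- actual speeds [n1, n2, n3] of the three motors
    alpha -- weight factors [alpha1, alpha2] tuned online by the ML-PDDA controller
    """
    return alpha[0] * abs(n[0] - n[1]) + alpha[1] * abs(n[0] - n[2])


# Example: main propulsion motor at 1500 rpm, side-thrust and pitch motors slower on purpose
e1_prime = weighted_sync_error([1500.0, 1200.0, 900.0], [0.6, 0.4])
```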
The invention also discloses a multi-motor coupling driving control method of the underwater robot, which specifically comprises the following steps:
step 1: designing a strategy network and an evaluation network;
step 2: constructing a value function;
step 3: finding an optimal strategy;
step 4: updating the evaluation network.
In step 1, designing the policy network and the evaluation network: the policy network consists of an input layer, two fully connected layers and an output layer. The state input consists of 6 quantities — the motor's tracking error and synchronization error together with the backward difference and accumulation of each — so 6 input nodes are set; the fully connected layers have 200 and 200 nodes respectively; and the output layer produces the three control quantities i_q and [α_1, α_2], so 3 nodes are set. Both the input layer and the output layer use the ReLU function as the activation function. The evaluation network structure is similar to the policy network: the 6 error state quantities and the 3 output control quantities of the motor are used together as its inputs, fused through the neural convolution network, and the 9 state quantities are fed into the fully connected layers; the final output is the evaluation value Q of the control quantities i_q and [α_1, α_2]. The number of input nodes is set the same as in the policy network, the output layer has only the single evaluation value Q, so its node number is set to 1, and the Sigmoid function is used as the activation function.
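As a concrete illustration of the two network structures just described, the sketch below uses PyTorch as an assumed implementation framework (the patent does not name one); the hidden-layer ReLU activations and the plain concatenation standing in for the "neural convolution network" fusion are likewise assumptions.

```python
import torch
import torch.nn as nn


class PolicyNetwork(nn.Module):
    """Policy (actor): 6 error states -> the 3 control quantities i_q and [alpha1, alpha2]."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(6, 200), nn.ReLU(),
            nn.Linear(200, 200), nn.ReLU(),
            nn.Linear(200, 3),
        )

    def forward(self, e):
        return self.net(e)


class EvaluationNetwork(nn.Module):
    """Evaluation (critic): 6 error states fused with 3 control quantities -> one Q value."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(9, 200), nn.ReLU(),
            nn.Linear(200, 200), nn.ReLU(),
            nn.Linear(200, 1), nn.Sigmoid(),   # single evaluation value Q, Sigmoid activation
        )

    def forward(self, e, a):
        return self.net(torch.cat([e, a], dim=-1))   # simple concatenation as the fusion step
```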
In step 2, a value function Q(e, a) is constructed to evaluate the motor control quantity output by the policy network — the q-axis current i_q and the error weight vector α = [α_1, α_2] — and to train the policy network and the evaluation network. The value function of policy μ is given by equation (2).
Q^μ(e_t, a_t) = E[ Σ_{k=0}^{∞} γ^k · r_{t+k} ]   (2)
In equation (2): e_t is the controller input at time t, comprising the motor tracking error vector and the synchronization error vector; a_t is the control quantity output by the controller from the input motor speed error at time t, comprising i_q and α = [α_1, α_2]; γ^k is the k-step discount factor, with γ taken here as 0.99; and r_{t+k} is the reward at step k obtained when the controller outputs a_t under errors e and e', as given by equation (3).
[Equation (3): reward r_t, formed from each motor's tracking error e_i(t) and synchronization error e_i(t)′, each offset by 0.1 — formula image not reproduced]
In equation (3): n_i(t) is the actual speed of the ith motor at time t; e_i(t) is the tracking error of the ith motor at time t, to which 0.1 is added to prevent the tracking error from being 0 and the reward from tending to infinity; e_i(t)′ is the synchronization error of the ith motor with the other motors.
The reward increases only when the tracking error and the synchronization error decrease, i.e. when the motor speeds approach the desired values while running in coordination; if the desired values are fully reached, the reward is maximal, otherwise it decreases. The maximum reward is obtained when the tracking error and the synchronization error are minimal; the controller output at that moment is considered the optimal control quantity, and i_q and [α_1, α_2] are then best suited to the working requirements of the multiple motors.
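Because the image of equation (3) is not reproduced above, the sketch below only illustrates the kind of reciprocal reward the text describes — larger when tracking and synchronization errors shrink, kept finite by the 0.1 offset; the exact form used in the patent may differ.

```python
def reward(tracking_errors, sync_errors, offset=0.1):
    """Illustrative reward: grows as tracking and synchronization errors shrink.

    The 0.1 offset prevents division by zero when an error reaches 0.
    (Assumed form -- the patent's exact equation (3) is not reproduced here.)
    """
    r = 0.0
    for e_i, e_sync_i in zip(tracking_errors, sync_errors):
        r += 1.0 / (abs(e_i) + offset) + 1.0 / (abs(e_sync_i) + offset)
    return r


# Example: three motors with small tracking and synchronization errors give a large reward
r_t = reward([0.5, 0.2, 0.1], [0.3, 0.4, 0.2])
```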
In step 3, finding the optimal strategy: because the deep deterministic policy gradient algorithm uses a deterministic policy, each i_q and α output by the controller can be computed from the policy μ. An evaluation function J^π(μ) is defined to evaluate the new strategy learned by the current ML-PDDA algorithm, as in equation (4).
J^π(μ) = E[ Q^μ(e, μ(e)) ]   (4)
In equation (4): qμUnder the condition that different motor rotating speed errors are input by a controller, a value function outputs i according to a mu strategyqAnd the Q value calculated by α, i.e. the cumulative prize earned by the μ strategy, is calculated according to the formula (2).
The optimal strategy, i.e. the policy μ that achieves the maximum cumulative reward, is found by maximizing equation (4), as in equation (5).
μ = argmax_μ J^π(μ)   (5)
Taking the partial derivative of equation (4) with respect to the policy network parameters θ^μ gives the policy gradient, as in equation (6):
∇_{θ^μ} J^π(μ) = E[ ∇_a Q^μ(e, a)|_{a=μ(e)} · ∇_{θ^μ} μ(e|θ^μ) ]   (6)
The policy network parameters are updated by a gradient method, as in equation (7):
θ^μ ← θ^μ + η_μ · ∇_{θ^μ} J^π(μ)   (7)
In equation (7): θ^μ is the policy network parameter vector and η_μ is its learning rate.
Updating the policy network with the policy μ that yields the maximum cumulative reward drives the policy network to update in the direction of generating the i_q and [α_1, α_2] that obtain that maximum cumulative reward.
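Under the same PyTorch assumption as the earlier network sketch, the policy update of equations (4)-(7) can be realized with a stock optimizer by minimizing -Q, which is equivalent to ascending the gradient of J^π(μ); the learning rate is an assumed value.

```python
policy = PolicyNetwork()
critic = EvaluationNetwork()
policy_opt = torch.optim.Adam(policy.parameters(), lr=1e-4)   # assumed learning rate


def update_policy(e_batch):
    """One gradient step on J^pi(mu) = E[Q(e, mu(e))], equations (4)-(7)."""
    a = policy(e_batch)                       # deterministic action mu(e)
    policy_loss = -critic(e_batch, a).mean()  # maximizing Q is minimizing -Q
    policy_opt.zero_grad()
    policy_loss.backward()                    # policy gradient of equation (6)
    policy_opt.step()                         # parameter update of equation (7)
```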
In step 4, updating the evaluation network: an experience pool is established. The motor speed errors e and e' input to the controller, its outputs i_q and α, the corresponding reward r_t obtained, and the motor speed error at the next moment are stored in the experience pool as one group of experience data; the target networks draw experience groups from the pool to update the evaluation network parameters θ^Q.
The motor speed error e_{t+1} at the next moment is fed into the target policy network to obtain a deterministic output i_q and α, denoted a_{t+1}; a_{t+1} and e_{t+1} are then fused through the neural convolution network and used together as the input of the target value network, which yields the target network's evaluation Q′ of a_{t+1}. The actual evaluation y_t of the target network is then computed as in equation (8).
y_t = r_t + γ·Q′(e_{t+1}, μ′(e_{t+1}|θ^{μ′}) | θ^{Q′})   (8)
In equation (8): μ′(e_{t+1}|θ^{μ′}) is the i_q and α output by the target policy μ′; Q′(e_{t+1}, μ′(e_{t+1}|θ^{μ′}) | θ^{Q′}) is the target evaluation network's evaluation of i_q and α; θ^{μ′} and θ^{Q′} are, respectively, the target policy and target evaluation network parameters.
Meanwhile, an error function L is established to compute the error of the online evaluation network, and the online evaluation network is updated by minimizing this error, as in equation (9):
L = (1/N)·Σ_t ( y_t − Q(e_t, a_t|θ^Q) )²   (9)
where N is the mini-batch length.
Taking the derivative of the loss function L with respect to the evaluation network parameters θ^Q gives equation (10):
∇_{θ^Q} L = −(2/N)·Σ_t ( y_t − Q(e_t, a_t|θ^Q) )·∇_{θ^Q} Q(e_t, a_t|θ^Q)   (10)
The evaluation network parameters are then updated as in equation (11):
θ^Q ← θ^Q − η_Q · ∇_{θ^Q} L   (11)
where η_Q is the evaluation network learning rate.
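A sketch of the evaluation-network update of equations (8)-(11), continuing the PyTorch assumption; the target networks are copies of the online networks whose parameters are only touched by the soft update of equation (12), and the learning rate is an assumed value.

```python
import copy

import torch.nn.functional as F

target_policy = copy.deepcopy(policy)
target_critic = copy.deepcopy(critic)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)   # assumed learning rate
gamma = 0.99                                                  # discount factor from the text


def update_critic(e, a, r, e_next):
    """Minimize the loss L of equation (9) against the target y_t of equation (8)."""
    with torch.no_grad():
        a_next = target_policy(e_next)                        # mu'(e_{t+1})
        y = r + gamma * target_critic(e_next, a_next)         # equation (8)
    q = critic(e, a)
    critic_loss = F.mse_loss(q, y)                            # equation (9)
    critic_opt.zero_grad()
    critic_loss.backward()                                    # gradient of equation (10)
    critic_opt.step()                                         # update of equation (11)
```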
Updating the evaluation network through the loss function L lets it compute more accurately the reward obtained by the control quantities output by the policy network, so that the ML-PDDA controller outputs the i_q and [α_1, α_2] that best meet the actual operating requirements of the multiple motors. The online policy network and the online evaluation network continuously update their parameters through the policy gradient and the loss function, while the target policy network and the target evaluation network are updated through equation (12) during mini-batch training. This reduces the correlation between the cumulative reward Q computed by the online evaluation network and the cumulative reward Q′ computed by the target evaluation network, and improves the effectiveness of the i_q and [α_1, α_2] output by the online policy network.
θ^{μ′} ← k·θ^μ + (1−k)·θ^{μ′},  θ^{Q′} ← k·θ^Q + (1−k)·θ^{Q′}   (12)
In equation (12): θ^{μ′} and θ^{Q′} are, respectively, the action parameters with which the target policy network outputs i_q and [α_1, α_2] and the value-function parameters with which the target evaluation network evaluates them; θ^μ and θ^Q are, respectively, the online policy and online evaluation network parameters; k is the learning rate, taken as 0.001.
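The small-batch target-network update of equation (12) amounts to the soft update sketched below (k = 0.001 as stated above).

```python
def soft_update(online_net, target_net, k=0.001):
    """theta_target <- k * theta_online + (1 - k) * theta_target, equation (12)."""
    for p_online, p_target in zip(online_net.parameters(), target_net.parameters()):
        p_target.data.copy_(k * p_online.data + (1.0 - k) * p_target.data)
```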
The data sets stored in the experience pool are used to train and update the policy and evaluation network parameters, so that the controller outputs the control quantity i_q and the error weight α that act on the controlled motor, while the motor speed and error outputs are fed back to the controller, completing the iterative training. When the underwater robot enters an unfamiliar water area, the experience pool serves to accumulate experience data. Through this accumulation, when the underwater robot needs to change course and resist external disturbance, the controller can, following the upper computer's instruction, quickly output the i_q and α that generate the maximum reward, so that the tracking errors and synchronization errors of the motors are minimized and the motors run at their expected speeds; at the same time, the weighted synchronization-error coupling keeps a constant speed difference between the motors, giving the controller fast dynamic response and strong robustness.
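Tying the previous sketches together, an experience pool and one training iteration might look as follows; the pool capacity and batch size are assumed values, not taken from the patent.

```python
import random
from collections import deque

replay = deque(maxlen=100_000)   # experience pool (assumed capacity)


def store(e, a, r, e_next):
    """Store one group of experience data (errors, outputs, reward, next errors)."""
    replay.append((e, a, r, e_next))


def train_step(batch_size=64):   # assumed mini-batch size
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)
    e, a, r, e_next = (
        torch.stack([torch.as_tensor(x[i], dtype=torch.float32) for x in batch])
        for i in range(4)
    )
    update_critic(e, a, r.unsqueeze(-1), e_next)
    update_policy(e)
    soft_update(policy, target_policy)   # equation (12) for the policy network
    soft_update(critic, target_critic)   # equation (12) for the evaluation network
```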
The invention has the beneficial effects that:
1. In existing multi-motor control schemes — virtual main shaft, cross coupling, and deviation coupling — every motor is given the same expected speed, and the controller eliminates the tracking error and synchronization error so that all motor speeds stay synchronized. This patent instead first sets a speed proportionality coefficient for each motor, so that each motor obtains a different expected speed when the underwater robot needs to change course; at the same time, synchronization error weights are assigned to match the different speeds, keeping the motors mutually coupled and strengthening the disturbance rejection of the multi-motor system. Conventional multi-motor cooperative control schemes assign weights to the speed synchronization errors only to keep all motor speeds identical and cannot tune them online according to the actual operating state of the motors, so the underwater robot cannot adjust its course in time or resist environmental disturbance. The multi-motor mutual coupling control device designed in this patent is therefore better suited to the actual working environment of the underwater robot and more targeted;
2. The multi-vortex decision-driven algorithm (ML-PDDA) uses the perception capability of deep learning to solve sequential decision problems in a high-dimensional state space effectively. Existing control methods applied to multi-motor cooperative control, such as fuzzy logic, neural networks, and model predictive control, require large amounts of past empirical data and complex mathematical models and converge slowly; the disturbance of the underwater environment on the underwater robot is nonlinear, and changes in the system's internal parameters make it difficult to build a suitable mathematical model and obtain a good control effect. The ML-PDDA algorithm introduces water-flow disturbance on top of the PDDA algorithm, simulating the flow with several superposed Lamb vortexes to train the policy network, exploring strategies better suited to the underwater environment, improving training efficiency and stability, and letting the underwater robot adapt better to water-flow disturbance while moving. The ML-PDDA algorithm has good online learning capability, can learn a mathematical model of the motor from its input and output data, and uses online and target networks, which makes the learning process more stable and the model converge faster;
3. The multi-motor mutual coupling control works together with the multi-vortex decision-driven algorithm (ML-PDDA), so that the speed of each motor can be changed independently while the drive device of the underwater robot is cooperatively controlled through the mutual coupling of the redefined synchronization errors. The underwater robot can thus change course flexibly according to the control system's instructions and resist surrounding disturbance.
Drawings
FIG. 1 is a schematic diagram of a multi-motor mutually coupled ML-PDDA control algorithm control device.
Fig. 2 is a structural diagram of synchronization error calculation in the present invention.
FIG. 3 is a flow chart of the synchronous error weight factor setting in the present invention.
FIG. 4 is a diagram of a multi-vortex decision-driven algorithm (ML-PDDA) network architecture in accordance with the present invention.
Fig. 5 is a diagram of a policy network architecture in the present invention.
Fig. 6 is a diagram illustrating an evaluation network structure in the present invention.
Detailed Description
For the purpose of enhancing understanding of the present invention, the present invention will be described in further detail with reference to the accompanying drawings and examples, which are provided for illustration only and are not intended to limit the scope of the present invention.
This embodiment provides a multi-motor coupling drive control device for an underwater robot comprising three modules: a single motor control module, a synchronization error weight distribution module, and a multi-motor mutual coupling control module. The single motor control module is composed of an ML-PDDA algorithm controller and a permanent magnet synchronous motor. The motor speed error is used as the controller input and, combined with the vector control model of the permanent magnet synchronous motor, is processed by the ML-PDDA algorithm policy network to obtain the motor model control quantity, the q-axis current, and the synchronization error weight factor α; this realizes motor speed control and, together with the multi-motor mutual coupling control module, cooperative control of the underwater robot drive, as shown in Fig. 1.
In the mutually coupled multi-motor drive control device designed in this embodiment, the upper computer, based on the actual running route of the underwater robot and taking the expected speed of the main propulsion motor as the reference, adjusts the speed ratios of the remaining side-thrust and pitch propulsion motors, so that the motors cooperatively drive the underwater robot at different speeds. In the actual power configuration of an underwater robot there may be several power propulsion motors and attitude-control motors (side-thrust and pitch-control motors); these can be decomposed and recombined in the robot's coordinate system and classified by the direction of their thrust into the main propulsion motor, the side-thrust motor and the pitch motor. For convenience in the remainder of this patent, the main propulsion motor after thrust normalization is defined as the 1st motor, the side-thrust motor after thrust normalization as the 2nd motor, and the pitch motor after thrust normalization as the 3rd motor.
The desired speed n_ref in Fig. 1 takes the constant speed of the 1st motor as the reference. Based on the motion path of the underwater robot, the upper computer adjusts the speed ratios R_2 and R_3 of motors 2 and 3 to obtain the actual expected speed n_ref1,2,3 of each motor, which is then subtracted from the actual motor speed n_1,2,3 to obtain the tracking error e. Since the actual expected speeds of the motors differ, the synchronization error between them cannot be calculated directly; therefore, in Fig. 1 a weight factor α is assigned to the synchronization error between different motors and the synchronization error e' is calculated. Three states each of e and e' are selected as controller inputs: the error, its backward difference, and its accumulation. The learning capability of the ML-PDDA algorithm is used to process these six input state quantities and output the motor control quantity, the q-axis current i_q, and the error weight α, completing accurate motor control and realizing different-speed cooperative driving of the underwater robot.
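The six controller inputs described above (error, backward difference, and accumulation, for both the tracking error and the weighted synchronization error) could be assembled each sampling period as in the sketch below; the class and variable names are illustrative.

```python
class ErrorState:
    """Tracks one error signal and returns the three state quantities used by the controller:
    the error itself, its backward difference, and its running accumulation."""

    def __init__(self):
        self.prev = 0.0
        self.total = 0.0

    def step(self, err):
        diff = err - self.prev   # backward difference
        self.prev = err
        self.total += err        # accumulation
        return [err, diff, self.total]


tracking_state = ErrorState()
sync_state = ErrorState()


def controller_input(n_ref, n_actual, e_sync):
    """Build the 6 state quantities fed to the ML-PDDA controller for one motor."""
    e = n_ref - n_actual                                       # tracking error
    return tracking_state.step(e) + sync_state.step(e_sync)    # 3 + 3 state quantities
```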
The underwater robot travels from the water surface to the underwater work site according to instructions from the central control system, and must dive, move forward, surface, and so on, adjusting its direction of motion many times in the process. Because the underwater robot has no rudder, the speed of each motor must be changed so that speed differences between the motors produce the steering and attitude-adjusting thrust that moves the robot along the specified trajectory. Traditional multi-motor cooperative control devices are generally used in the chemical industry, where every motor in the system must keep the same speed, so they cannot meet the underwater robot's requirements for time-varying, fast dynamic response and different-speed adjustment.
When the underwater robot turns or adjusts its attitude, the expected speeds of the three motors differ. The method therefore designs a synchronization error weight distribution module that uses the evaluation-reward mechanism of the ML-PDDA algorithm to set the synchronization error weight factors: the optimal weight factor is obtained when the reward generated by the output weight factor is maximum, and the resulting synchronization error is fed into the controller as a state quantity, which better reflects the coordination among the motors.
The main propulsion motor of the underwater robot, i.e. the 1st motor, has the largest power, so its expected speed is defined as the reference speed. The actual speed of the ith motor is denoted n_i, and the synchronization error between the ith motor and the remaining motors is e_i'. Taking the 1st motor as an example, the synchronization error e_1' is calculated as in equation (1).
e′_1 = α_1×|n_1 − n_2| + α_2×|n_1 − n_3|   (1)
In equation (1): α_1 and α_2 are error weight factors set by the ML-PDDA algorithm, and n_1, n_2, n_3 are the actual rotational speeds of the respective motors.
In equation (1), the error weight factors α_1 and α_2 are set by the policy network of the ML-PDDA controller, and the rewards under different weight factors are computed through the reward mechanism of equation (3); the maximum reward is obtained only when the tracking error and the synchronization error decrease. When the expected speeds of the three motors are identical, the synchronization error weights between the motors need not be changed; when the expected speeds differ, setting the synchronization error weight factors allows the motors to keep a constant speed difference while operating cooperatively.
The synchronization error calculation module is shown in fig. 2.
In Fig. 2, the actual synchronization errors between motor 1 and motors 2 and 3 are calculated and recombined using the error weight factors obtained from ML-PDDA tuning to give a new synchronization error e'; the backward difference Δe' and accumulation Σe' of e' are then computed, and these three state quantities are used both as inputs to the ML-PDDA controller and as feedback.
The error weight factors are tuned using the learning capability of the ML-PDDA algorithm, maximizing the cumulative reward of the value function through the constructed value function and reward mechanism, as shown in Fig. 3. The reward r_t is used as the index for evaluating the optimal motor control effect of the ML-PDDA controller. When the controller trains on data, a mini-batch training mode is adopted; the length of each mini-batch is the total training time T_f divided by the controller sampling time T_s, rounded upwards, i.e. ⌈T_f/T_s⌉. Each training batch receives a corresponding reward r_t; the weight factor with the maximum reward r_tmax is taken as optimal, and the synchronization error weight factor α at that moment is output, completing the control of the underwater robot drive by the multi-motor mutual coupling control device.
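The weight-setting loop of Fig. 3 — run each candidate weight factor for one mini-batch of ⌈T_f/T_s⌉ samples and keep the one whose reward is largest — might be sketched as below; the candidate set and the run_batch callable are assumptions for illustration.

```python
import math


def tune_weight_factor(candidate_alphas, run_batch, T_f=10.0, T_s=0.01):
    """Pick the synchronization-error weight vector with the largest mini-batch reward.

    candidate_alphas -- weight vectors proposed by the policy network
    run_batch        -- callable (alpha, batch_len) -> reward accumulated over the batch
    T_f, T_s         -- total training time and controller sampling time (assumed values)
    """
    batch_len = math.ceil(T_f / T_s)
    best_alpha, r_max = None, float("-inf")
    for alpha in candidate_alphas:
        r = run_batch(alpha, batch_len)
        if r > r_max:
            best_alpha, r_max = alpha, r
    return best_alpha
```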
Fig. 4 shows the structure of the ML-PDDA controller in Fig. 1. The strong learning capability of ML-PDDA effectively solves the sequential decision problem in a high-dimensional state space: the synchronization error and tracking error of the motor are selected as state quantities, and the ML-PDDA algorithm policy network outputs the motor control quantity, the q-axis current i_q, and the error weight vector α = [α_1, α_2]. When the policy network is trained, several Lamb vortexes are superposed to simulate water-flow disturbance; training the policy network under this disturbance guides the controller to explore strategies suited to the underwater robot's working environment, improves training effectiveness, and lets the ML-PDDA controller adapt better to the underwater environment. A mini-batch training mode is used, with each mini-batch of length ⌈T_f/T_s⌉, where T_f is the total training time and T_s the controller sampling time. Each training batch receives a reward r_t, and the maximum reward r_tmax over T_f determines whether the control quantity output by the controller is optimal. According to equation (3), the maximum reward is obtained when the tracking error and synchronization error are smallest, the actual speed is closest to the expected speed, and multi-motor synchronization is best; the control quantities i_q and α output at that moment are considered to give the optimal control effect and are stored in the experience pool.
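The water-flow disturbance used during policy-network training can be emulated by superposing several Lamb (Lamb-Oseen) vortex velocity fields, as sketched below; the specific vortex model, circulation and core-radius parameters are assumptions, since the patent only states that several Lamb vortexes are superposed.

```python
import numpy as np


def lamb_vortex_velocity(x, y, x0, y0, circulation, core_radius):
    """Velocity induced at (x, y) by one Lamb-Oseen vortex centred at (x0, y0)."""
    dx, dy = x - x0, y - y0
    r2 = dx * dx + dy * dy + 1e-12                      # avoid division by zero at the centre
    r = np.sqrt(r2)
    v_theta = circulation / (2.0 * np.pi * r) * (1.0 - np.exp(-r2 / core_radius ** 2))
    return -v_theta * dy / r, v_theta * dx / r          # tangential speed as x/y components


def flow_disturbance(x, y, vortices):
    """Superpose several Lamb vortexes (list of (x0, y0, circulation, core_radius))."""
    u = sum(lamb_vortex_velocity(x, y, *v)[0] for v in vortices)
    v = sum(lamb_vortex_velocity(x, y, *v)[1] for v in vortices)
    return u, v
```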
Because the working environment of the underwater robot is complex, when its course must be changed it is not enough for the upper computer merely to adjust the expected speed input of each motor and for the ML-PDDA controller to output the motor control quantity i_q; it is still difficult for the underwater robot to resist the surrounding nonlinear disturbance, so the motors must cooperate with one another. When the course changes, the controller assigns weight factors to the motor synchronization errors to obtain new synchronization errors, which are fed into the controller as state quantities; the controller then outputs i_q from the combined tracking and synchronization errors, so that a constant speed difference is maintained between the motors, the motors are coupled to one another through the newly computed synchronization errors, and multi-motor cooperative control is achieved. When the underwater robot enters its next navigation state, the upper computer changes the expected motor speeds and the ML-PDDA controller assigns new weights to the synchronization errors, so that the underwater robot moves along the set route.
Step 1: design policy network and evaluation network
The policy network consists of an input layer, two fully connected layers and an output layer. The state input consists of 6 quantities — the motor's tracking error and synchronization error together with the backward difference and accumulation of each — so 6 input nodes are set; the fully connected layers have 200 and 200 nodes respectively; and the output layer produces the three control quantities i_q and [α_1, α_2], so 3 nodes are set. Both the input layer and the output layer use the ReLU function as the activation function.
The evaluation network structure is similar to the policy network: the 6 error state quantities and the 3 output control quantities of the motor are used together as its inputs, fused through the neural convolution network, and the 9 state quantities are fed into the fully connected layers; the final output is the evaluation value Q of the control quantities i_q and [α_1, α_2]. The number of input nodes is set the same as in the policy network; the output layer has only the single evaluation value Q, so its node number is set to 1, and the Sigmoid function is used as the activation function.
Step 2: constructing a value function
A value function Q(e, a) is constructed to evaluate the motor control quantity output by the policy network — the q-axis current i_q and the error weight vector α = [α_1, α_2] — and to train the policy network and the evaluation network; the value function of policy μ is given by equation (2).
In equation (2): e_t is the controller input at time t, comprising the motor tracking error vector and the synchronization error vector; a_t is the control quantity output by the controller from the input motor speed error at time t, comprising i_q and α = [α_1, α_2]; γ^k is the k-step discount factor, with γ taken here as 0.99; and r_{t+k} is the reward at step k obtained when the controller outputs a_t under errors e and e', as given by equation (3).
In equation (3): n_i(t) is the actual speed of the ith motor at time t; e_i(t) is the tracking error of the ith motor at time t, to which 0.1 is added to prevent the tracking error from being 0 and the reward from tending to infinity; e_i(t)′ is the synchronization error of the ith motor with the other motors.
The reward increases only when the tracking error and the synchronization error decrease, i.e. when the motor speeds approach the desired values while running in coordination; if the desired values are fully reached, the reward is maximal, otherwise it decreases. The maximum reward is obtained when the tracking error and the synchronization error are minimal; the controller output at that moment is considered the optimal control quantity, and i_q and [α_1, α_2] are then best suited to the working requirements of the multiple motors.
And 3, step 3: finding optimal strategies
Since the deep deterministic policy gradient algorithm uses a deterministic policy, each i_q and α output by the controller can be computed from the policy μ. An evaluation function J^π(μ) is defined to evaluate the new strategy learned by the current ML-PDDA algorithm, as in equation (4).
J^π(μ) = E[ Q^μ(e, μ(e)) ]   (4)
In equation (4): Q^μ is the Q value computed by the value function, under different motor speed error inputs to the controller, for the i_q and α output according to policy μ — i.e. the cumulative reward earned by policy μ — and it is calculated according to equation (2).
The optimal strategy, i.e. the policy μ that achieves the maximum cumulative reward, is found by maximizing equation (4), as in equation (5).
μ = argmax_μ J^π(μ)   (5)
Taking the partial derivative of equation (4) with respect to the policy network parameters θ^μ gives the policy gradient, as in equation (6):
∇_{θ^μ} J^π(μ) = E[ ∇_a Q^μ(e, a)|_{a=μ(e)} · ∇_{θ^μ} μ(e|θ^μ) ]   (6)
The policy network parameters are updated by a gradient method, as in equation (7):
θ^μ ← θ^μ + η_μ · ∇_{θ^μ} J^π(μ)   (7)
In equation (7): θ^μ is the policy network parameter vector and η_μ is its learning rate.
Updating the policy network with the policy μ that yields the maximum cumulative reward drives the policy network to update in the direction of generating the i_q and [α_1, α_2] that obtain that maximum cumulative reward.
And 4, step 4: updating an evaluation network
An experience pool is established. The motor speed errors e and e' input to the controller, its outputs i_q and α, the corresponding reward r_t obtained, and the motor speed error at the next moment are stored in the experience pool as one group of experience data; the target networks draw experience groups from the pool to update the evaluation network parameters θ^Q.
The motor speed error e_{t+1} at the next moment is fed into the target policy network to obtain a deterministic output i_q and α, denoted a_{t+1}; a_{t+1} and e_{t+1} are then fused through the neural convolution network and used together as the input of the target value network, which yields the target network's evaluation Q′ of a_{t+1}. The actual evaluation y_t of the target network is then computed as in equation (8).
y_t = r_t + γ·Q′(e_{t+1}, μ′(e_{t+1}|θ^{μ′}) | θ^{Q′})   (8)
In equation (8): μ′(e_{t+1}|θ^{μ′}) is the i_q and α output by the target policy μ′; Q′(e_{t+1}, μ′(e_{t+1}|θ^{μ′}) | θ^{Q′}) is the target evaluation network's evaluation of i_q and α; θ^{μ′} and θ^{Q′} are, respectively, the target policy and target evaluation network parameters.
Meanwhile, an error function L is established to compute the error of the online evaluation network, and the online evaluation network is updated by minimizing this error, as in equation (9):
L = (1/N)·Σ_t ( y_t − Q(e_t, a_t|θ^Q) )²   (9)
where N is the mini-batch length.
Taking the derivative of the loss function L with respect to the evaluation network parameters θ^Q gives equation (10):
∇_{θ^Q} L = −(2/N)·Σ_t ( y_t − Q(e_t, a_t|θ^Q) )·∇_{θ^Q} Q(e_t, a_t|θ^Q)   (10)
The evaluation network parameters are then updated as in equation (11):
θ^Q ← θ^Q − η_Q · ∇_{θ^Q} L   (11)
where η_Q is the evaluation network learning rate.
Updating the evaluation network through the loss function L lets it compute more accurately the reward obtained by the control quantities output by the policy network, so that the ML-PDDA controller outputs the i_q and [α_1, α_2] that best meet the actual operating requirements of the multiple motors. The online policy network and the online evaluation network continuously update their parameters through the policy gradient and the loss function, while the target policy network and the target evaluation network are updated through equation (12) during mini-batch training. This reduces the correlation between the cumulative reward Q computed by the online evaluation network and the cumulative reward Q′ computed by the target evaluation network, and improves the effectiveness of the i_q and [α_1, α_2] output by the online policy network.
θ^{μ′} ← k·θ^μ + (1−k)·θ^{μ′},  θ^{Q′} ← k·θ^Q + (1−k)·θ^{Q′}   (12)
In equation (12): θ^{μ′} and θ^{Q′} are, respectively, the action parameters with which the target policy network outputs i_q and [α_1, α_2] and the value-function parameters with which the target evaluation network evaluates them; θ^μ and θ^Q are, respectively, the online policy and online evaluation network parameters; k is the learning rate, taken as 0.001.
The data sets stored in the experience pool are used to train and update the policy and evaluation network parameters, so that the controller outputs the control quantity i_q and the error weight α that act on the controlled motor, while the motor speed and error outputs are fed back to the controller, completing the iterative training. When the underwater robot enters an unfamiliar water area, the experience pool serves to accumulate experience data. Through this accumulation, when the underwater robot needs to change course and resist external disturbance, the controller can, following the upper computer's instruction, quickly output the i_q and α that generate the maximum reward, so that the tracking errors and synchronization errors of the motors are minimized and the motors run at their expected speeds; at the same time, the weighted synchronization-error coupling keeps a constant speed difference between the motors, giving the controller fast dynamic response and strong robustness.
The foregoing illustrates and describes the principles, general features, and advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above; the embodiments and the description only illustrate the principle of the invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, all of which fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and their equivalents.

Claims (8)

1. A multi-motor coupling drive control device for an underwater robot, composed of a multi-motor mutual coupling algorithm and a deep deterministic policy gradient algorithm controller, characterized by comprising three modules: a single motor control module, a synchronization error weight distribution module, and a multi-motor mutual coupling control module.
2. The underwater robot multi-motor coupling drive control device as claimed in claim 1, wherein the single motor control module is composed of an ML-PDDA algorithm controller and a permanent magnet synchronous motor.
3. The underwater robot multi-motor coupling drive control device according to claim 2, wherein the synchronization error weight distribution module adjusts the weight factor of the synchronization error using the evaluation-reward mechanism of the ML-PDDA algorithm, obtains the optimal weight factor when the reward generated by the output weight factor is maximum, and inputs the adjusted synchronization error into the controller as a state quantity, which better reflects the coordination among the motors; the main propulsion motor of the underwater robot, i.e. the 1st motor, has the maximum power, so its desired speed is defined as the reference speed; the actual speed of the ith motor is denoted n_i, and the synchronization error between the ith motor and the remaining motors is e_i'; taking the 1st motor as an example, the synchronization error e_1' is calculated as in equation (1):
e′_1 = α_1×|n_1 − n_2| + α_2×|n_1 − n_3|   (1)
in equation (1): α_1 and α_2 are error weight factors set by the ML-PDDA algorithm, and n_1, n_2, n_3 are the actual rotational speeds of the respective motors.
4. A multi-motor coupling drive control method for an underwater robot, characterized in that it uses the multi-motor coupling drive control device for an underwater robot and specifically comprises the following steps:
step 1: designing a strategy network and an evaluation network;
step 2: constructing a value function;
step 3: finding an optimal strategy;
step 4: updating the evaluation network.
5. The underwater robot multi-motor coupling drive control method according to claim 4, wherein in step 1 of designing the policy network and the evaluation network, the policy network is composed of an input layer, two fully connected layers and an output layer; the state input layer is provided with 6 nodes, covering the motor's tracking error and synchronization error together with the backward difference and accumulation of each (6 states); the fully connected layers are provided with 200 and 200 nodes respectively; the output layer is provided with 3 nodes, covering i_q and [α_1, α_2]; the input layer and output layer use the ReLU function as the activation function; the evaluation network structure takes the 6 error state quantities and 3 output control quantities of the motor as its inputs, fuses them through the neural convolution network, feeds the 9 state quantities into the fully connected layers, and finally outputs the evaluation value Q of the control quantities i_q and [α_1, α_2]; the number of input nodes is set the same as in the policy network, the output layer has only the single evaluation value Q, so its node number is set to 1, and the activation function uses the Sigmoid function.
6. The underwater robot multi-motor coupling drive control method according to claim 5, wherein in step 2 a value function Q(e, a) is constructed to evaluate the motor control quantity output by the policy network — the q-axis current i_q and the error weight vector α = [α_1, α_2] — and to train the policy network and the evaluation network, the value function of policy μ being as in equation (2):
Q^μ(e_t, a_t) = E[ Σ_{k=0}^{∞} γ^k · r_{t+k} ]   (2)
in equation (2): e_t is the controller input at time t, comprising the motor tracking error vector and the synchronization error vector; a_t is the control quantity output by the controller from the input motor speed error at time t, comprising i_q and α = [α_1, α_2]; γ^k is the k-step discount factor, with γ taken here as 0.99; and r_{t+k} is the reward at step k obtained when the controller outputs a_t under errors e and e', as in equation (3):
[Equation (3): reward r_t, formed from each motor's tracking error e_i(t) and synchronization error e_i(t)′, each offset by 0.1 — formula image not reproduced]
in equation (3): n_i(t) is the actual speed of the ith motor at time t; e_i(t) is the tracking error of the ith motor at time t, to which 0.1 is added to prevent the tracking error from being 0 and the reward from tending to infinity; e_i(t)′ is the synchronization error of the ith motor with the other motors.
7. The underwater robot multi-motor coupling drive control method according to claim 6, wherein in step 3 of finding the optimal strategy, an evaluation function J^π(μ) is defined to evaluate the new strategy learned by the current ML-PDDA algorithm, as in equation (4):
J^π(μ) = E[ Q^μ(e, μ(e)) ]   (4)
in equation (4): Q^μ is the Q value computed by the value function, under different motor speed error inputs to the controller, for the i_q and α output according to policy μ, i.e. the cumulative reward earned by policy μ, calculated according to equation (2);
the optimal strategy, i.e. the policy μ that achieves the maximum cumulative reward, is found by maximizing equation (4), as shown in equation (5):
μ = argmax_μ J^π(μ)   (5)
taking the partial derivative of equation (4) with respect to the policy network parameters θ^μ gives the policy gradient, as in equation (6):
∇_{θ^μ} J^π(μ) = E[ ∇_a Q^μ(e, a)|_{a=μ(e)} · ∇_{θ^μ} μ(e|θ^μ) ]   (6)
the policy network parameters are updated by a gradient method, as in equation (7):
θ^μ ← θ^μ + η_μ · ∇_{θ^μ} J^π(μ)   (7)
in equation (7): θ^μ is the policy network parameter vector and η_μ is its learning rate;
updating the policy network with the policy μ that yields the maximum cumulative reward drives the policy network to update in the direction of generating the i_q and [α_1, α_2] that obtain that maximum cumulative reward.
8. The underwater robot multi-motor coupling drive control method according to claim 7, wherein in step 4 of updating the evaluation network, an experience pool is established; the motor speed errors e and e' input to the controller, its outputs i_q and α, the corresponding reward r_t obtained, and the motor speed error at the next moment are stored in the experience pool as one group of experience data, and the target networks draw experience groups from the pool to update the evaluation network parameters θ^Q;
the motor speed error e_{t+1} at the next moment is fed into the target policy network to obtain a deterministic output i_q and α, denoted a_{t+1}; a_{t+1} and e_{t+1} are then fused through the neural convolution network and used together as the input of the target value network, which yields the target network's evaluation Q′ of a_{t+1}; the actual evaluation y_t of the target network is then computed as in equation (8):
y_t = r_t + γ·Q′(e_{t+1}, μ′(e_{t+1}|θ^{μ′}) | θ^{Q′})   (8)
in equation (8): μ′(e_{t+1}|θ^{μ′}) is the i_q and α output by the target policy μ′; Q′(e_{t+1}, μ′(e_{t+1}|θ^{μ′}) | θ^{Q′}) is the target evaluation network's evaluation of i_q and α; θ^{μ′} and θ^{Q′} are, respectively, the target policy and target evaluation network parameters;
meanwhile, an error function L is established to compute the error of the online evaluation network, and the online evaluation network is updated by minimizing this error, as in equation (9):
L = (1/N)·Σ_t ( y_t − Q(e_t, a_t|θ^Q) )²   (9)
where N is the mini-batch length;
taking the derivative of the loss function L with respect to the evaluation network parameters θ^Q gives equation (10):
∇_{θ^Q} L = −(2/N)·Σ_t ( y_t − Q(e_t, a_t|θ^Q) )·∇_{θ^Q} Q(e_t, a_t|θ^Q)   (10)
the evaluation network parameters are updated as in equation (11):
θ^Q ← θ^Q − η_Q · ∇_{θ^Q} L   (11)
where η_Q is the evaluation network learning rate;
updating the evaluation network through the loss function L lets it compute more accurately the reward obtained by the control quantities output by the policy network, so that the ML-PDDA controller outputs the i_q and [α_1, α_2] that best meet the actual operating requirements of the multiple motors; the online policy network and the online evaluation network continuously update their parameters through the policy gradient and the loss function, while the target policy network and the target evaluation network are updated through equation (12) during mini-batch training, which reduces the correlation between the cumulative reward Q computed by the online evaluation network and the cumulative reward Q′ computed by the target evaluation network and improves the effectiveness of the i_q and [α_1, α_2] output by the online policy network:
θ^{μ′} ← k·θ^μ + (1−k)·θ^{μ′},  θ^{Q′} ← k·θ^Q + (1−k)·θ^{Q′}   (12)
in equation (12): θ^{μ′} and θ^{Q′} are, respectively, the action parameters with which the target policy network outputs i_q and [α_1, α_2] and the value-function parameters with which the target evaluation network evaluates them; θ^μ and θ^Q are, respectively, the online policy and online evaluation network parameters; k is the learning rate, taken as 0.001.
CN202111381879.5A 2021-11-19 2021-11-19 Multi-motor coupling driving control device and method for underwater robot Active CN114089633B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111381879.5A CN114089633B (en) 2021-11-19 2021-11-19 Multi-motor coupling driving control device and method for underwater robot

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111381879.5A CN114089633B (en) 2021-11-19 2021-11-19 Multi-motor coupling driving control device and method for underwater robot

Publications (2)

Publication Number Publication Date
CN114089633A true CN114089633A (en) 2022-02-25
CN114089633B CN114089633B (en) 2024-04-26

Family

ID=80302617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111381879.5A Active CN114089633B (en) 2021-11-19 2021-11-19 Multi-motor coupling driving control device and method for underwater robot

Country Status (1)

Country Link
CN (1) CN114089633B (en)

Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104935217A (en) * 2015-05-29 2015-09-23 天津大学 Improved deviation coupling control method suitable for multi-motor system
CN107748566A (en) * 2017-09-20 2018-03-02 清华大学 A kind of underwater autonomous robot constant depth control method based on intensified learning
WO2018053187A1 (en) * 2016-09-15 2018-03-22 Google Inc. Deep reinforcement learning for robotic manipulation
CN108803321A (en) * 2018-05-30 2018-11-13 清华大学 Autonomous Underwater Vehicle Trajectory Tracking Control method based on deeply study
CN110323981A (en) * 2019-05-14 2019-10-11 广东省智能制造研究所 A kind of method and system controlling permanent magnetic linear synchronous motor
CN110406526A (en) * 2019-08-05 2019-11-05 合肥工业大学 Parallel hybrid electric energy management method based on adaptive Dynamic Programming
CN110597058A (en) * 2019-08-28 2019-12-20 浙江工业大学 Three-degree-of-freedom autonomous underwater vehicle control method based on reinforcement learning
CN110936824A (en) * 2019-12-09 2020-03-31 江西理工大学 Electric automobile double-motor control method based on self-adaptive dynamic planning
US20200327411A1 (en) * 2019-04-14 2020-10-15 Di Shi Systems and Method on Deriving Real-time Coordinated Voltage Control Strategies Using Deep Reinforcement Learning
CN111966118A (en) * 2020-08-14 2020-11-20 哈尔滨工程大学 ROV thrust distribution and reinforcement learning-based motion control method
CN112383248A (en) * 2020-10-29 2021-02-19 浙江大学 Model prediction current control method for dual-motor torque synchronization system
CN112388636A (en) * 2020-11-06 2021-02-23 广州大学 DDPG multi-target genetic self-optimization triaxial delta machine platform and method
JP2021034050A (en) * 2019-08-21 2021-03-01 哈爾浜工程大学 Auv action plan and operation control method based on reinforcement learning
US10962976B1 (en) * 2019-11-29 2021-03-30 Institute Of Automation, Chinese Academy Of Sciences Motion control method and system for biomimetic robotic fish based on adversarial structured control
CN112631315A (en) * 2020-12-08 2021-04-09 江苏科技大学 Multi-motor cooperative propulsion underwater robot path tracking method
CN113031528A (en) * 2021-02-25 2021-06-25 电子科技大学 Multi-legged robot motion control method based on depth certainty strategy gradient
CN113140104A (en) * 2021-04-14 2021-07-20 武汉理工大学 Vehicle queue tracking control method and device and computer readable storage medium

Patent Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104935217A (en) * 2015-05-29 2015-09-23 天津大学 Improved deviation coupling control method suitable for multi-motor system
WO2018053187A1 (en) * 2016-09-15 2018-03-22 Google Inc. Deep reinforcement learning for robotic manipulation
CN107748566A (en) * 2017-09-20 2018-03-02 清华大学 A kind of underwater autonomous robot constant depth control method based on intensified learning
CN108803321A (en) * 2018-05-30 2018-11-13 清华大学 Autonomous Underwater Vehicle Trajectory Tracking Control method based on deeply study
US20200327411A1 (en) * 2019-04-14 2020-10-15 Di Shi Systems and Method on Deriving Real-time Coordinated Voltage Control Strategies Using Deep Reinforcement Learning
CN110323981A (en) * 2019-05-14 2019-10-11 广东省智能制造研究所 A kind of method and system controlling permanent magnetic linear synchronous motor
CN110406526A (en) * 2019-08-05 2019-11-05 合肥工业大学 Parallel hybrid electric energy management method based on adaptive Dynamic Programming
JP2021034050A (en) * 2019-08-21 2021-03-01 哈爾浜工程大学 Auv action plan and operation control method based on reinforcement learning
CN110597058A (en) * 2019-08-28 2019-12-20 浙江工业大学 Three-degree-of-freedom autonomous underwater vehicle control method based on reinforcement learning
US10962976B1 (en) * 2019-11-29 2021-03-30 Institute Of Automation, Chinese Academy Of Sciences Motion control method and system for biomimetic robotic fish based on adversarial structured control
CN110936824A (en) * 2019-12-09 2020-03-31 江西理工大学 Electric automobile double-motor control method based on self-adaptive dynamic planning
US20210170883A1 (en) * 2019-12-09 2021-06-10 Jiangxi University Of Science And Technology Method for dual-motor control on electric vehicle based on adaptive dynamic programming
CN111966118A (en) * 2020-08-14 2020-11-20 哈尔滨工程大学 ROV thrust distribution and reinforcement learning-based motion control method
CN112383248A (en) * 2020-10-29 2021-02-19 浙江大学 Model prediction current control method for dual-motor torque synchronization system
CN112388636A (en) * 2020-11-06 2021-02-23 广州大学 DDPG multi-target genetic self-optimization triaxial delta machine platform and method
CN112631315A (en) * 2020-12-08 2021-04-09 江苏科技大学 Multi-motor cooperative propulsion underwater robot path tracking method
CN113031528A (en) * 2021-02-25 2021-06-25 电子科技大学 Multi-legged robot motion control method based on depth certainty strategy gradient
CN113140104A (en) * 2021-04-14 2021-07-20 武汉理工大学 Vehicle queue tracking control method and device and computer readable storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
张振宇; 张昱; 陈丽; 张东波: "DDPG adaptive control of permanent magnet synchronous linear motors", Micromotors, no. 04
漆星; 郑常宝; 张倩: "Parameter calibration method for electric vehicle asynchronous motors based on deep deterministic policy gradient", Transactions of China Electrotechnical Society, no. 20, 25 October 2020 (2020-10-25)
赵文涛; 俞建成; 张艾群; 李岩: "Detection of dynamic characteristics of ocean mesoscale eddies based on satellite altimetry data", Journal of Marine Sciences, no. 03

Also Published As

Publication number Publication date
CN114089633B (en) 2024-04-26

Similar Documents

Publication Publication Date Title
CN110989576B (en) Target following and dynamic obstacle avoidance control method for differential slip steering vehicle
CN108803321B (en) Autonomous underwater vehicle track tracking control method based on deep reinforcement learning
CN104865829B (en) Multi-robot system distributed self-adaption neutral net continuous tracking control method of electro
Zhang et al. Distributed control of coordinated path tracking for networked nonholonomic mobile vehicles
Wang et al. Dynamic tanker steering control using generalized ellipsoidal-basis-function-based fuzzy neural networks
CN108161934A (en) A kind of method for learning to realize robot multi peg-in-hole using deeply
CN111176116B (en) Closed-loop feedback control method for robot fish based on CPG model
Batra et al. Decentralized control of quadrotor swarms with end-to-end deep reinforcement learning
CN111240345A (en) Underwater robot trajectory tracking method based on double BP network reinforcement learning framework
CN114741886B (en) Unmanned aerial vehicle cluster multi-task training method and system based on contribution degree evaluation
CN111176122B (en) Underwater robot parameter self-adaptive backstepping control method based on double BP neural network Q learning technology
CN111240344B (en) Autonomous underwater robot model-free control method based on reinforcement learning technology
Song et al. Guidance and control of autonomous surface underwater vehicles for target tracking in ocean environment by deep reinforcement learning
CN112947505B (en) Multi-AUV formation distributed control method based on reinforcement learning algorithm and unknown disturbance observer
CN112462792A (en) Underwater robot motion control method based on Actor-Critic algorithm
Fang et al. Autonomous underwater vehicle formation control and obstacle avoidance using multi-agent generative adversarial imitation learning
CN109145451A (en) A kind of the motor behavior identification and track estimation method of high speed glide vehicle
CN112405542B (en) Musculoskeletal robot control method and system based on brain inspiring multitask learning
CN111273677A (en) Autonomous underwater robot speed and heading control method based on reinforcement learning technology
CN107450311A (en) Inversion model modeling method and device and adaptive inverse control and device
Zhao et al. Global path planning and waypoint following for heterogeneous unmanned surface vehicles assisting inland water monitoring
CN109946972A (en) Underwater robot Predictive Control System and method based on on-line study modelling technique
CN113485323A (en) Flexible formation method for cascaded multiple mobile robots
CN112835368A (en) Multi-unmanned-boat collaborative formation control method and system
CN114089633A (en) Multi-motor coupling drive control device and method for underwater robot

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant