CN114089633A - Multi-motor coupling drive control device and method for underwater robot - Google Patents

Multi-motor coupling drive control device and method for underwater robot

Info

Publication number
CN114089633A
Authority
CN
China
Prior art keywords
network
motor
strategy
error
evaluation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111381879.5A
Other languages
Chinese (zh)
Other versions
CN114089633B (en)
Inventor
王伟然
姚杰
葛慧林
智鹏飞
朱志宇
邱海洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University of Science and Technology
Original Assignee
Jiangsu University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University of Science and Technology filed Critical Jiangsu University of Science and Technology
Priority to CN202111381879.5A priority Critical patent/CN114089633B/en
Publication of CN114089633A publication Critical patent/CN114089633A/en
Application granted granted Critical
Publication of CN114089633B publication Critical patent/CN114089633B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05BCONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/042Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance

Abstract

The invention relates to the technical field of underwater robot control, and in particular to a multi-motor coupling drive control device and method for an underwater robot. In this scheme, a multi-vortex decision-driven algorithm (ML-PDDA) controller is designed to control the rotating speed of each motor, and weight factors are assigned online to the synchronization errors among the motors.

Description

Multi-motor coupling drive control device and method for underwater robot
Technical Field
The invention relates to the technical field of underwater robot control, in particular to a multi-motor coupling driving control device and method for an underwater robot.
Background
Underwater robots are widely used in military applications, underwater resource exploration, underwater search and rescue, and other fields, and as technology advances they can take over more and more difficult tasks from humans. Each of these tasks requires the motion trajectory and attitude of the underwater robot to be controlled with high accuracy. The underwater environment is complex, however, and different environments disturb the robot's motion and operation in different ways. A suitable control structure is therefore needed that makes multiple motors work together cooperatively, so that the underwater robot can resist the various underwater disturbances and move along an accurate trajectory to complete its subsequent operation tasks.
At present, multi-motor cooperative control mainly uses the following algorithms:
(1) Parallel control
All motors in the control system are given the same speed, and synchronization can be achieved only when the loads of all motors are strictly identical. Each motor can only feed back its own tracking error; the synchronization errors between motors are not considered, and the motor control units are independent and uncoupled. When one motor unit is disturbed from outside, the other motors receive no disturbance information, so coordinated multi-motor control cannot be achieved and disturbance rejection is poor. Clearly, this method cannot cope with the complex underwater environment.
(2) Master-slave control
The motors are in a master-slave relationship: the output of the master motor is used as the speed reference of the slave motors, and the slave motors track the speed of the master. However, a master-slave control system has no feedback from slave to master. If a motor at a certain stage is disturbed, the motors upstream of it receive no information about the disturbance, while all downstream control units make corresponding speed adjustments and pass them further downstream in the same way. This introduces large delays and poor disturbance rejection, which limits the use of the method.
(3) Virtual spindle control
The virtual main shaft control system imitates the synchronization characteristics of a mechanical main shaft. After the motor speed command has passed through the virtual main shaft, the output signal is used as the reference for each drive unit, and the drive units track this reference. Because the reference is obtained after the main shaft dynamics and filtering have acted on it, there may be a deviation between the main reference value and the actual rotational speed of the motors.
(4) Cross coupling control
The speed or position signals of two adjacent motors are compared, and the difference is used as a system feedback signal that the system then tracks. The system can react to a load change on either motor. However, this strategy is not suitable for more than two motors, because computing the feedback compensation for more than two motors becomes cumbersome.
(5) Deviation coupling control
Deviation coupling control feeds back, for each motor, the sum of its errors with respect to all the other motors as a compensation signal, thereby achieving multi-motor synchronous control. However, the amount of computation increases greatly, and the method suffers from problems such as controller saturation during start-up.
When the underwater robot performs underwater operations, it must not only overcome the disturbance of the surrounding environment to stay in a stable state, but also run along the expected trajectory. The multi-motor cooperative control algorithms above all aim to keep the rotational speeds of the motors absolutely synchronized. This ensures that the underwater robot always operates in a fixed attitude, but it cannot guarantee that the robot changes course smoothly and operates as expected.
Disclosure of Invention
To solve the above technical problems, the invention discloses a multi-motor coupling drive control device and method for an underwater robot, which ensure that the underwater robot can resist the disturbance of the surrounding environment in a complex underwater setting and is driven accurately and stably. The device allows the underwater robot to run stably in a complex underwater environment, provides a strong guarantee for its underwater operations, improves its working efficiency, and also reduces the risk to workers of carrying out underwater operations themselves.
In the invention, a multi-vortex decision-driven algorithm (ML-PDDA) assigns weight factors to the synchronization errors among the motors, and a multi-motor mutual coupling control structure is designed to work with motors running at different speeds. For each single motor control unit, a controller is designed with the multi-vortex decision-driven algorithm (ML-PDDA); together with the multi-motor mutual coupling control system, it realizes control of the underwater robot drive device.
The invention adopts the following specific technical scheme:
a multi-motor coupling driving control device of an underwater robot is composed of a multi-motor mutual coupling algorithm and a depth certainty strategy gradient algorithm controller, and specifically comprises the following three modules: the system comprises a single motor control module, a synchronous error weight distribution module and a multi-motor mutual coupling control module.
In this technical scheme, the single motor control module is composed of an ML-PDDA algorithm controller and a permanent magnet synchronous motor. The motor speed error is used as the controller input and, combined with the vector control model of the permanent magnet synchronous motor, is processed by the ML-PDDA algorithm policy network to obtain the motor model control quantity, the q-axis current, and a synchronization error weight factor α. This realizes motor speed control and, together with the multi-motor mutual coupling control module, cooperative control of the underwater robot drive.
As a further improvement of the invention, the synchronization error weight distribution module uses the evaluation-reward mechanism of the ML-PDDA algorithm to set the weight factor of the synchronization error: when the reward generated by the output weight factor is maximum, the optimal weight factor is obtained, and the resulting synchronization error is fed into the controller as a state quantity, which better reflects the coordination among the motors. The main propulsion motor of the underwater robot, i.e. the 1st motor, has the largest power, so its expected speed is defined as the reference speed. The actual speed of the ith motor is denoted n_i, and the synchronization error between the ith motor and the remaining motors is e_i'. Taking the 1st motor as an example, the synchronization error e_1' is calculated as in equation (1).
e′_1 = α_1×|n_1 − n_2| + α_2×|n_1 − n_3|   (1)
In equation (1): α_1 and α_2 are error weight factors set by the ML-PDDA algorithm, and n_1, n_2, n_3 are the actual rotational speeds of the respective motors.
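To make the weighted synchronization error concrete, the following minimal Python sketch evaluates equation (1); the variable names and the example speeds are illustrative only and do not come from the patent.

```python
def weighted_sync_error(n, alpha):
    """Weighted synchronization error of motor 1 against motors 2 and 3, equation (1).

    n     -- actual speeds [n1, n2, n3] of the three motors
    alpha -- weight factors [alpha1, alpha2] tuned online by the ML-PDDA controller
    """
    return alpha[0] * abs(n[0] - n[1]) + alpha[1] * abs(n[0] - n[2])


# Example: main propulsion motor at 1500 rpm, side-thrust and pitch motors slower on purpose
e1_prime = weighted_sync_error([1500.0, 1200.0, 900.0], [0.6, 0.4])
```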
The invention also discloses a multi-motor coupling driving control method of the underwater robot, which specifically comprises the following steps:
step 1: designing a strategy network and an evaluation network;
step 2: constructing a value function;
step 3: finding an optimal strategy;
step 4: updating the evaluation network.
In step 1, designing the policy network and the evaluation network: the policy network consists of an input layer, two fully connected layers and an output layer. The state input consists of 6 quantities — the motor's tracking error and synchronization error together with the backward difference and accumulation of each — so 6 input nodes are set; the fully connected layers have 200 and 200 nodes respectively; and the output layer produces the three control quantities i_q and [α_1, α_2], so 3 nodes are set. Both the input layer and the output layer use the ReLU function as the activation function. The evaluation network structure is similar to the policy network: the 6 error state quantities and the 3 output control quantities of the motor are used together as its inputs, fused through the neural convolution network, and the 9 state quantities are fed into the fully connected layers; the final output is the evaluation value Q of the control quantities i_q and [α_1, α_2]. The number of input nodes is set the same as in the policy network, the output layer has only the single evaluation value Q, so its node number is set to 1, and the Sigmoid function is used as the activation function.
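As a concrete illustration of the two network structures just described, the sketch below uses PyTorch as an assumed implementation framework (the patent does not name one); the hidden-layer ReLU activations and the plain concatenation standing in for the "neural convolution network" fusion are likewise assumptions.

```python
import torch
import torch.nn as nn


class PolicyNetwork(nn.Module):
    """Policy (actor): 6 error states -> the 3 control quantities i_q and [alpha1, alpha2]."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(6, 200), nn.ReLU(),
            nn.Linear(200, 200), nn.ReLU(),
            nn.Linear(200, 3),
        )

    def forward(self, e):
        return self.net(e)


class EvaluationNetwork(nn.Module):
    """Evaluation (critic): 6 error states fused with 3 control quantities -> one Q value."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(9, 200), nn.ReLU(),
            nn.Linear(200, 200), nn.ReLU(),
            nn.Linear(200, 1), nn.Sigmoid(),   # single evaluation value Q, Sigmoid activation
        )

    def forward(self, e, a):
        return self.net(torch.cat([e, a], dim=-1))   # simple concatenation as the fusion step
```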
In step 2, a value function Q(e, a) is constructed to evaluate the motor control quantity output by the policy network — the q-axis current i_q and the error weight vector α = [α_1, α_2] — and to train the policy network and the evaluation network. The value function of policy μ is given by equation (2).
Q^μ(e_t, a_t) = E[ Σ_{k=0}^{∞} γ^k · r_{t+k} ]   (2)
In equation (2): e_t is the controller input at time t, comprising the motor tracking error vector and the synchronization error vector; a_t is the control quantity output by the controller from the input motor speed error at time t, comprising i_q and α = [α_1, α_2]; γ^k is the k-step discount factor, with γ taken here as 0.99; and r_{t+k} is the reward at step k obtained when the controller outputs a_t under errors e and e', as given by equation (3).
[Equation (3): reward r_t, formed from each motor's tracking error e_i(t) and synchronization error e_i(t)′, each offset by 0.1 — formula image not reproduced]
In equation (3): n_i(t) is the actual speed of the ith motor at time t; e_i(t) is the tracking error of the ith motor at time t, to which 0.1 is added to prevent the tracking error from being 0 and the reward from tending to infinity; e_i(t)′ is the synchronization error of the ith motor with the other motors.
The reward increases only when the tracking error and the synchronization error decrease, i.e. when the motor speeds approach the desired values while running in coordination; if the desired values are fully reached, the reward is maximal, otherwise it decreases. The maximum reward is obtained when the tracking error and the synchronization error are minimal; the controller output at that moment is considered the optimal control quantity, and i_q and [α_1, α_2] are then best suited to the working requirements of the multiple motors.
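Because the image of equation (3) is not reproduced above, the sketch below only illustrates the kind of reciprocal reward the text describes — larger when tracking and synchronization errors shrink, kept finite by the 0.1 offset; the exact form used in the patent may differ.

```python
def reward(tracking_errors, sync_errors, offset=0.1):
    """Illustrative reward: grows as tracking and synchronization errors shrink.

    The 0.1 offset prevents division by zero when an error reaches 0.
    (Assumed form -- the patent's exact equation (3) is not reproduced here.)
    """
    r = 0.0
    for e_i, e_sync_i in zip(tracking_errors, sync_errors):
        r += 1.0 / (abs(e_i) + offset) + 1.0 / (abs(e_sync_i) + offset)
    return r


# Example: three motors with small tracking and synchronization errors give a large reward
r_t = reward([0.5, 0.2, 0.1], [0.3, 0.4, 0.2])
```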
In step 3, finding the optimal strategy: because the deep deterministic policy gradient algorithm uses a deterministic policy, each i_q and α output by the controller can be computed from the policy μ. An evaluation function J^π(μ) is defined to evaluate the new strategy learned by the current ML-PDDA algorithm, as in equation (4).
J^π(μ) = E[ Q^μ(e, μ(e)) ]   (4)
In equation (4): qμUnder the condition that different motor rotating speed errors are input by a controller, a value function outputs i according to a mu strategyqAnd the Q value calculated by α, i.e. the cumulative prize earned by the μ strategy, is calculated according to the formula (2).
The optimal strategy, i.e. the policy μ that achieves the maximum cumulative reward, is found by maximizing equation (4), as in equation (5).
μ = argmax_μ J^π(μ)   (5)
Taking the partial derivative of equation (4) with respect to the policy network parameters θ^μ gives the policy gradient, as in equation (6):
∇_{θ^μ} J^π(μ) = E[ ∇_a Q^μ(e, a)|_{a=μ(e)} · ∇_{θ^μ} μ(e|θ^μ) ]   (6)
The policy network parameters are updated by a gradient method, as in equation (7):
θ^μ ← θ^μ + η_μ · ∇_{θ^μ} J^π(μ)   (7)
In equation (7): θ^μ is the policy network parameter vector and η_μ is its learning rate.
Updating the policy network with the policy μ that yields the maximum cumulative reward drives the policy network to update in the direction of generating the i_q and [α_1, α_2] that obtain that maximum cumulative reward.
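Under the same PyTorch assumption as the earlier network sketch, the policy update of equations (4)-(7) can be realized with a stock optimizer by minimizing -Q, which is equivalent to ascending the gradient of J^π(μ); the learning rate is an assumed value.

```python
policy = PolicyNetwork()
critic = EvaluationNetwork()
policy_opt = torch.optim.Adam(policy.parameters(), lr=1e-4)   # assumed learning rate


def update_policy(e_batch):
    """One gradient step on J^pi(mu) = E[Q(e, mu(e))], equations (4)-(7)."""
    a = policy(e_batch)                       # deterministic action mu(e)
    policy_loss = -critic(e_batch, a).mean()  # maximizing Q is minimizing -Q
    policy_opt.zero_grad()
    policy_loss.backward()                    # policy gradient of equation (6)
    policy_opt.step()                         # parameter update of equation (7)
```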
In step 4, updating the evaluation network: an experience pool is established. The motor speed errors e and e' input to the controller, its outputs i_q and α, the corresponding reward r_t obtained, and the motor speed error at the next moment are stored in the experience pool as one group of experience data; the target networks draw experience groups from the pool to update the evaluation network parameters θ^Q.
The motor speed error e_{t+1} at the next moment is fed into the target policy network to obtain a deterministic output i_q and α, denoted a_{t+1}; a_{t+1} and e_{t+1} are then fused through the neural convolution network and used together as the input of the target value network, which yields the target network's evaluation Q′ of a_{t+1}. The actual evaluation y_t of the target network is then computed as in equation (8).
y_t = r_t + γ·Q′(e_{t+1}, μ′(e_{t+1}|θ^{μ′}) | θ^{Q′})   (8)
In equation (8): μ′(e_{t+1}|θ^{μ′}) is the i_q and α output by the target policy μ′; Q′(e_{t+1}, μ′(e_{t+1}|θ^{μ′}) | θ^{Q′}) is the target evaluation network's evaluation of i_q and α; θ^{μ′} and θ^{Q′} are, respectively, the target policy and target evaluation network parameters.
Meanwhile, an error function L is established to compute the error of the online evaluation network, and the online evaluation network is updated by minimizing this error, as in equation (9):
L = (1/N)·Σ_t ( y_t − Q(e_t, a_t|θ^Q) )²   (9)
where N is the mini-batch length.
Taking the derivative of the loss function L with respect to the evaluation network parameters θ^Q gives equation (10):
∇_{θ^Q} L = −(2/N)·Σ_t ( y_t − Q(e_t, a_t|θ^Q) )·∇_{θ^Q} Q(e_t, a_t|θ^Q)   (10)
The evaluation network parameters are then updated as in equation (11):
θ^Q ← θ^Q − η_Q · ∇_{θ^Q} L   (11)
where η_Q is the evaluation network learning rate.
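A sketch of the evaluation-network update of equations (8)-(11), continuing the PyTorch assumption; the target networks are copies of the online networks whose parameters are only touched by the soft update of equation (12), and the learning rate is an assumed value.

```python
import copy

import torch.nn.functional as F

target_policy = copy.deepcopy(policy)
target_critic = copy.deepcopy(critic)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)   # assumed learning rate
gamma = 0.99                                                  # discount factor from the text


def update_critic(e, a, r, e_next):
    """Minimize the loss L of equation (9) against the target y_t of equation (8)."""
    with torch.no_grad():
        a_next = target_policy(e_next)                        # mu'(e_{t+1})
        y = r + gamma * target_critic(e_next, a_next)         # equation (8)
    q = critic(e, a)
    critic_loss = F.mse_loss(q, y)                            # equation (9)
    critic_opt.zero_grad()
    critic_loss.backward()                                    # gradient of equation (10)
    critic_opt.step()                                         # update of equation (11)
```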
Updating the evaluation network through the loss function L lets it compute more accurately the reward obtained by the control quantities output by the policy network, so that the ML-PDDA controller outputs the i_q and [α_1, α_2] that best meet the actual operating requirements of the multiple motors. The online policy network and the online evaluation network continuously update their parameters through the policy gradient and the loss function, while the target policy network and the target evaluation network are updated through equation (12) during mini-batch training. This reduces the correlation between the cumulative reward Q computed by the online evaluation network and the cumulative reward Q′ computed by the target evaluation network, and improves the effectiveness of the i_q and [α_1, α_2] output by the online policy network.
θ^{μ′} ← k·θ^μ + (1−k)·θ^{μ′},  θ^{Q′} ← k·θ^Q + (1−k)·θ^{Q′}   (12)
In equation (12): θ^{μ′} and θ^{Q′} are, respectively, the action parameters with which the target policy network outputs i_q and [α_1, α_2] and the value-function parameters with which the target evaluation network evaluates them; θ^μ and θ^Q are, respectively, the online policy and online evaluation network parameters; k is the learning rate, taken as 0.001.
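The small-batch target-network update of equation (12) amounts to the soft update sketched below (k = 0.001 as stated above).

```python
def soft_update(online_net, target_net, k=0.001):
    """theta_target <- k * theta_online + (1 - k) * theta_target, equation (12)."""
    for p_online, p_target in zip(online_net.parameters(), target_net.parameters()):
        p_target.data.copy_(k * p_online.data + (1.0 - k) * p_target.data)
```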
The data sets stored in the experience pool are used to train and update the policy and evaluation network parameters, so that the controller outputs the control quantity i_q and the error weight α that act on the controlled motor, while the motor speed and error outputs are fed back to the controller, completing the iterative training. When the underwater robot enters an unfamiliar water area, the experience pool serves to accumulate experience data. Through this accumulation, when the underwater robot needs to change course and resist external disturbance, the controller can, following the upper computer's instruction, quickly output the i_q and α that generate the maximum reward, so that the tracking errors and synchronization errors of the motors are minimized and the motors run at their expected speeds; at the same time, the weighted synchronization-error coupling keeps a constant speed difference between the motors, giving the controller fast dynamic response and strong robustness.
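Tying the previous sketches together, an experience pool and one training iteration might look as follows; the pool capacity and batch size are assumed values, not taken from the patent.

```python
import random
from collections import deque

replay = deque(maxlen=100_000)   # experience pool (assumed capacity)


def store(e, a, r, e_next):
    """Store one group of experience data (errors, outputs, reward, next errors)."""
    replay.append((e, a, r, e_next))


def train_step(batch_size=64):   # assumed mini-batch size
    if len(replay) < batch_size:
        return
    batch = random.sample(replay, batch_size)
    e, a, r, e_next = (
        torch.stack([torch.as_tensor(x[i], dtype=torch.float32) for x in batch])
        for i in range(4)
    )
    update_critic(e, a, r.unsqueeze(-1), e_next)
    update_policy(e)
    soft_update(policy, target_policy)   # equation (12) for the policy network
    soft_update(critic, target_critic)   # equation (12) for the evaluation network
```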
The invention has the beneficial effects that:
1. In existing multi-motor control schemes — virtual main shaft, cross coupling, and deviation coupling — every motor is given the same expected speed, and the controller eliminates the tracking error and synchronization error so that all motor speeds stay synchronized. This patent instead first sets a speed proportionality coefficient for each motor, so that each motor obtains a different expected speed when the underwater robot needs to change course; at the same time, synchronization error weights are assigned to match the different speeds, keeping the motors mutually coupled and strengthening the disturbance rejection of the multi-motor system. Conventional multi-motor cooperative control schemes assign weights to the speed synchronization errors only to keep all motor speeds identical and cannot tune them online according to the actual operating state of the motors, so the underwater robot cannot adjust its course in time or resist environmental disturbance. The multi-motor mutual coupling control device designed in this patent is therefore better suited to the actual working environment of the underwater robot and more targeted;
2. The multi-vortex decision-driven algorithm (ML-PDDA) uses the perception capability of deep learning to solve sequential decision problems in a high-dimensional state space effectively. Existing control methods applied to multi-motor cooperative control, such as fuzzy logic, neural networks, and model predictive control, require large amounts of past empirical data and complex mathematical models and converge slowly; the disturbance of the underwater environment on the underwater robot is nonlinear, and changes in the system's internal parameters make it difficult to build a suitable mathematical model and obtain a good control effect. The ML-PDDA algorithm introduces water-flow disturbance on top of the PDDA algorithm, simulating the flow with several superposed Lamb vortexes to train the policy network, exploring strategies better suited to the underwater environment, improving training efficiency and stability, and letting the underwater robot adapt better to water-flow disturbance while moving. The ML-PDDA algorithm has good online learning capability, can learn a mathematical model of the motor from its input and output data, and uses online and target networks, which makes the learning process more stable and the model converge faster;
3. The multi-motor mutual coupling control works together with the multi-vortex decision-driven algorithm (ML-PDDA), so that the speed of each motor can be changed independently while the drive device of the underwater robot is cooperatively controlled through the mutual coupling of the redefined synchronization errors. The underwater robot can thus change course flexibly according to the control system's instructions and resist surrounding disturbance.
Drawings
FIG. 1 is a schematic diagram of a multi-motor mutually coupled ML-PDDA control algorithm control device.
Fig. 2 is a structural diagram of synchronization error calculation in the present invention.
FIG. 3 is a flow chart of the synchronous error weight factor setting in the present invention.
FIG. 4 is a diagram of a multi-vortex decision-driven algorithm (ML-PDDA) network architecture in accordance with the present invention.
Fig. 5 is a diagram of a policy network architecture in the present invention.
Fig. 6 is a diagram illustrating an evaluation network structure in the present invention.
Detailed Description
For the purpose of enhancing understanding of the present invention, the present invention will be described in further detail with reference to the accompanying drawings and examples, which are provided for illustration only and are not intended to limit the scope of the present invention.
This embodiment provides a multi-motor coupling drive control device for an underwater robot comprising three modules: a single motor control module, a synchronization error weight distribution module, and a multi-motor mutual coupling control module. The single motor control module is composed of an ML-PDDA algorithm controller and a permanent magnet synchronous motor. The motor speed error is used as the controller input and, combined with the vector control model of the permanent magnet synchronous motor, is processed by the ML-PDDA algorithm policy network to obtain the motor model control quantity, the q-axis current, and the synchronization error weight factor α; this realizes motor speed control and, together with the multi-motor mutual coupling control module, cooperative control of the underwater robot drive, as shown in Fig. 1.
In the mutually coupled multi-motor drive control device designed in this embodiment, the upper computer, based on the actual running route of the underwater robot and taking the expected speed of the main propulsion motor as the reference, adjusts the speed ratios of the remaining side-thrust and pitch propulsion motors, so that the motors cooperatively drive the underwater robot at different speeds. In the actual power configuration of an underwater robot there may be several power propulsion motors and attitude-control motors (side-thrust and pitch-control motors); these can be decomposed and recombined in the robot's coordinate system and classified by the direction of their thrust into the main propulsion motor, the side-thrust motor and the pitch motor. For convenience in the remainder of this patent, the main propulsion motor after thrust normalization is defined as the 1st motor, the side-thrust motor after thrust normalization as the 2nd motor, and the pitch motor after thrust normalization as the 3rd motor.
The desired speed n_ref in Fig. 1 takes the constant speed of the 1st motor as the reference. Based on the motion path of the underwater robot, the upper computer adjusts the speed ratios R_2 and R_3 of motors 2 and 3 to obtain the actual expected speed n_ref1,2,3 of each motor, which is then subtracted from the actual motor speed n_1,2,3 to obtain the tracking error e. Since the actual expected speeds of the motors differ, the synchronization error between them cannot be calculated directly; therefore, in Fig. 1 a weight factor α is assigned to the synchronization error between different motors and the synchronization error e' is calculated. Three states each of e and e' are selected as controller inputs: the error, its backward difference, and its accumulation. The learning capability of the ML-PDDA algorithm is used to process these six input state quantities and output the motor control quantity, the q-axis current i_q, and the error weight α, completing accurate motor control and realizing different-speed cooperative driving of the underwater robot.
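The six controller inputs described above (error, backward difference, and accumulation, for both the tracking error and the weighted synchronization error) could be assembled each sampling period as in the sketch below; the class and variable names are illustrative.

```python
class ErrorState:
    """Tracks one error signal and returns the three state quantities used by the controller:
    the error itself, its backward difference, and its running accumulation."""

    def __init__(self):
        self.prev = 0.0
        self.total = 0.0

    def step(self, err):
        diff = err - self.prev   # backward difference
        self.prev = err
        self.total += err        # accumulation
        return [err, diff, self.total]


tracking_state = ErrorState()
sync_state = ErrorState()


def controller_input(n_ref, n_actual, e_sync):
    """Build the 6 state quantities fed to the ML-PDDA controller for one motor."""
    e = n_ref - n_actual                                       # tracking error
    return tracking_state.step(e) + sync_state.step(e_sync)    # 3 + 3 state quantities
```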
The underwater robot travels from the water surface to the underwater work site according to instructions from the central control system, and must dive, move forward, surface, and so on, adjusting its direction of motion many times in the process. Because the underwater robot has no rudder, the speed of each motor must be changed so that speed differences between the motors produce the steering and attitude-adjusting thrust that moves the robot along the specified trajectory. Traditional multi-motor cooperative control devices are generally used in the chemical industry, where every motor in the system must keep the same speed, so they cannot meet the underwater robot's requirements for time-varying, fast dynamic response and different-speed adjustment.
When the underwater robot turns or adjusts its attitude, the expected speeds of the three motors differ. The method therefore designs a synchronization error weight distribution module that uses the evaluation-reward mechanism of the ML-PDDA algorithm to set the synchronization error weight factors: the optimal weight factor is obtained when the reward generated by the output weight factor is maximum, and the resulting synchronization error is fed into the controller as a state quantity, which better reflects the coordination among the motors.
The main propulsion motor of the underwater robot, i.e. the 1st motor, has the largest power, so its expected speed is defined as the reference speed. The actual speed of the ith motor is denoted n_i, and the synchronization error between the ith motor and the remaining motors is e_i'. Taking the 1st motor as an example, the synchronization error e_1' is calculated as in equation (1).
e′_1 = α_1×|n_1 − n_2| + α_2×|n_1 − n_3|   (1)
In equation (1): α_1 and α_2 are error weight factors set by the ML-PDDA algorithm, and n_1, n_2, n_3 are the actual rotational speeds of the respective motors.
In equation (1), the error weight factors α_1 and α_2 are set by the policy network of the ML-PDDA controller, and the rewards under different weight factors are computed through the reward mechanism of equation (3); the maximum reward is obtained only when the tracking error and the synchronization error decrease. When the expected speeds of the three motors are identical, the synchronization error weights between the motors need not be changed; when the expected speeds differ, setting the synchronization error weight factors allows the motors to keep a constant speed difference while operating cooperatively.
The synchronization error calculation module is shown in fig. 2.
In Fig. 2, the actual synchronization errors between motor 1 and motors 2 and 3 are calculated and recombined using the error weight factors obtained from ML-PDDA tuning to give a new synchronization error e'; the backward difference Δe' and accumulation Σe' of e' are then computed, and these three state quantities are used both as inputs to the ML-PDDA controller and as feedback.
The error weight factors are tuned using the learning capability of the ML-PDDA algorithm, maximizing the cumulative reward of the value function through the constructed value function and reward mechanism, as shown in Fig. 3. The reward r_t is used as the index for evaluating the optimal motor control effect of the ML-PDDA controller. When the controller trains on data, a mini-batch training mode is adopted; the length of each mini-batch is the total training time T_f divided by the controller sampling time T_s, rounded upwards, i.e. ⌈T_f/T_s⌉. Each training batch receives a corresponding reward r_t; the weight factor with the maximum reward r_tmax is taken as optimal, and the synchronization error weight factor α at that moment is output, completing the control of the underwater robot drive by the multi-motor mutual coupling control device.
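The weight-setting loop of Fig. 3 — run each candidate weight factor for one mini-batch of ⌈T_f/T_s⌉ samples and keep the one whose reward is largest — might be sketched as below; the candidate set and the run_batch callable are assumptions for illustration.

```python
import math


def tune_weight_factor(candidate_alphas, run_batch, T_f=10.0, T_s=0.01):
    """Pick the synchronization-error weight vector with the largest mini-batch reward.

    candidate_alphas -- weight vectors proposed by the policy network
    run_batch        -- callable (alpha, batch_len) -> reward accumulated over the batch
    T_f, T_s         -- total training time and controller sampling time (assumed values)
    """
    batch_len = math.ceil(T_f / T_s)
    best_alpha, r_max = None, float("-inf")
    for alpha in candidate_alphas:
        r = run_batch(alpha, batch_len)
        if r > r_max:
            best_alpha, r_max = alpha, r
    return best_alpha
```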
Fig. 4 shows the structure of the ML-PDDA controller in Fig. 1. The strong learning capability of ML-PDDA effectively solves the sequential decision problem in a high-dimensional state space: the synchronization error and tracking error of the motor are selected as state quantities, and the ML-PDDA algorithm policy network outputs the motor control quantity, the q-axis current i_q, and the error weight vector α = [α_1, α_2]. When the policy network is trained, several Lamb vortexes are superposed to simulate water-flow disturbance; training the policy network under this disturbance guides the controller to explore strategies suited to the underwater robot's working environment, improves training effectiveness, and lets the ML-PDDA controller adapt better to the underwater environment. A mini-batch training mode is used, with each mini-batch of length ⌈T_f/T_s⌉, where T_f is the total training time and T_s the controller sampling time. Each training batch receives a reward r_t, and the maximum reward r_tmax over T_f determines whether the control quantity output by the controller is optimal. According to equation (3), the maximum reward is obtained when the tracking error and synchronization error are smallest, the actual speed is closest to the expected speed, and multi-motor synchronization is best; the control quantities i_q and α output at that moment are considered to give the optimal control effect and are stored in the experience pool.
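The water-flow disturbance used during policy-network training can be emulated by superposing several Lamb (Lamb-Oseen) vortex velocity fields, as sketched below; the specific vortex model, circulation and core-radius parameters are assumptions, since the patent only states that several Lamb vortexes are superposed.

```python
import numpy as np


def lamb_vortex_velocity(x, y, x0, y0, circulation, core_radius):
    """Velocity induced at (x, y) by one Lamb-Oseen vortex centred at (x0, y0)."""
    dx, dy = x - x0, y - y0
    r2 = dx * dx + dy * dy + 1e-12                      # avoid division by zero at the centre
    r = np.sqrt(r2)
    v_theta = circulation / (2.0 * np.pi * r) * (1.0 - np.exp(-r2 / core_radius ** 2))
    return -v_theta * dy / r, v_theta * dx / r          # tangential speed as x/y components


def flow_disturbance(x, y, vortices):
    """Superpose several Lamb vortexes (list of (x0, y0, circulation, core_radius))."""
    u = sum(lamb_vortex_velocity(x, y, *v)[0] for v in vortices)
    v = sum(lamb_vortex_velocity(x, y, *v)[1] for v in vortices)
    return u, v
```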
Because the working environment of the underwater robot is complex, when its course must be changed it is not enough for the upper computer merely to adjust the expected speed input of each motor and for the ML-PDDA controller to output the motor control quantity i_q; it is still difficult for the underwater robot to resist the surrounding nonlinear disturbance, so the motors must cooperate with one another. When the course changes, the controller assigns weight factors to the motor synchronization errors to obtain new synchronization errors, which are fed into the controller as state quantities; the controller then outputs i_q from the combined tracking and synchronization errors, so that a constant speed difference is maintained between the motors, the motors are coupled to one another through the newly computed synchronization errors, and multi-motor cooperative control is achieved. When the underwater robot enters its next navigation state, the upper computer changes the expected motor speeds and the ML-PDDA controller assigns new weights to the synchronization errors, so that the underwater robot moves along the set route.
Step 1: design policy network and evaluation network
The policy network consists of an input layer, two fully connected layers and an output layer. The state input consists of 6 quantities — the motor's tracking error and synchronization error together with the backward difference and accumulation of each — so 6 input nodes are set; the fully connected layers have 200 and 200 nodes respectively; and the output layer produces the three control quantities i_q and [α_1, α_2], so 3 nodes are set. Both the input layer and the output layer use the ReLU function as the activation function.
The evaluation network structure is similar to the policy network: the 6 error state quantities and the 3 output control quantities of the motor are used together as its inputs, fused through the neural convolution network, and the 9 state quantities are fed into the fully connected layers; the final output is the evaluation value Q of the control quantities i_q and [α_1, α_2]. The number of input nodes is set the same as in the policy network; the output layer has only the single evaluation value Q, so its node number is set to 1, and the Sigmoid function is used as the activation function.
Step 2: constructing a value function
A value function Q(e, a) is constructed to evaluate the motor control quantity output by the policy network — the q-axis current i_q and the error weight vector α = [α_1, α_2] — and to train the policy network and the evaluation network; the value function of policy μ is given by equation (2).
In equation (2): e_t is the controller input at time t, comprising the motor tracking error vector and the synchronization error vector; a_t is the control quantity output by the controller from the input motor speed error at time t, comprising i_q and α = [α_1, α_2]; γ^k is the k-step discount factor, with γ taken here as 0.99; and r_{t+k} is the reward at step k obtained when the controller outputs a_t under errors e and e', as given by equation (3).
In equation (3): n_i(t) is the actual speed of the ith motor at time t; e_i(t) is the tracking error of the ith motor at time t, to which 0.1 is added to prevent the tracking error from being 0 and the reward from tending to infinity; e_i(t)′ is the synchronization error of the ith motor with the other motors.
The reward increases only when the tracking error and the synchronization error decrease, i.e. when the motor speeds approach the desired values while running in coordination; if the desired values are fully reached, the reward is maximal, otherwise it decreases. The maximum reward is obtained when the tracking error and the synchronization error are minimal; the controller output at that moment is considered the optimal control quantity, and i_q and [α_1, α_2] are then best suited to the working requirements of the multiple motors.
And 3, step 3: finding optimal strategies
Since the deep deterministic policy gradient algorithm uses a deterministic policy, each i_q and α output by the controller can be computed from the policy μ. An evaluation function J^π(μ) is defined to evaluate the new strategy learned by the current ML-PDDA algorithm, as in equation (4).
J^π(μ) = E[ Q^μ(e, μ(e)) ]   (4)
In equation (4): Q^μ is the Q value computed by the value function, under different motor speed error inputs to the controller, for the i_q and α output according to policy μ — i.e. the cumulative reward earned by policy μ — and it is calculated according to equation (2).
The optimal strategy, i.e. the policy μ that achieves the maximum cumulative reward, is found by maximizing equation (4), as in equation (5).
μ = argmax_μ J^π(μ)   (5)
Taking the partial derivative of equation (4) with respect to the policy network parameters θ^μ gives the policy gradient, as in equation (6):
∇_{θ^μ} J^π(μ) = E[ ∇_a Q^μ(e, a)|_{a=μ(e)} · ∇_{θ^μ} μ(e|θ^μ) ]   (6)
The policy network parameters are updated by a gradient method, as in equation (7):
θ^μ ← θ^μ + η_μ · ∇_{θ^μ} J^π(μ)   (7)
In equation (7): θ^μ is the policy network parameter vector and η_μ is its learning rate.
Updating the policy network with the policy μ that yields the maximum cumulative reward drives the policy network to update in the direction of generating the i_q and [α_1, α_2] that obtain that maximum cumulative reward.
And 4, step 4: updating an evaluation network
An experience pool is established. The motor speed errors e and e' input to the controller, its outputs i_q and α, the corresponding reward r_t obtained, and the motor speed error at the next moment are stored in the experience pool as one group of experience data; the target networks draw experience groups from the pool to update the evaluation network parameters θ^Q.
The motor speed error e_{t+1} at the next moment is fed into the target policy network to obtain a deterministic output i_q and α, denoted a_{t+1}; a_{t+1} and e_{t+1} are then fused through the neural convolution network and used together as the input of the target value network, which yields the target network's evaluation Q′ of a_{t+1}. The actual evaluation y_t of the target network is then computed as in equation (8).
y_t = r_t + γ·Q′(e_{t+1}, μ′(e_{t+1}|θ^{μ′}) | θ^{Q′})   (8)
In equation (8): μ′(e_{t+1}|θ^{μ′}) is the i_q and α output by the target policy μ′; Q′(e_{t+1}, μ′(e_{t+1}|θ^{μ′}) | θ^{Q′}) is the target evaluation network's evaluation of i_q and α; θ^{μ′} and θ^{Q′} are, respectively, the target policy and target evaluation network parameters.
Meanwhile, an error function L is established to compute the error of the online evaluation network, and the online evaluation network is updated by minimizing this error, as in equation (9):
L = (1/N)·Σ_t ( y_t − Q(e_t, a_t|θ^Q) )²   (9)
where N is the mini-batch length.
Taking the derivative of the loss function L with respect to the evaluation network parameters θ^Q gives equation (10):
∇_{θ^Q} L = −(2/N)·Σ_t ( y_t − Q(e_t, a_t|θ^Q) )·∇_{θ^Q} Q(e_t, a_t|θ^Q)   (10)
The evaluation network parameters are then updated as in equation (11):
θ^Q ← θ^Q − η_Q · ∇_{θ^Q} L   (11)
where η_Q is the evaluation network learning rate.
Updating the evaluation network through the loss function L lets it compute more accurately the reward obtained by the control quantities output by the policy network, so that the ML-PDDA controller outputs the i_q and [α_1, α_2] that best meet the actual operating requirements of the multiple motors. The online policy network and the online evaluation network continuously update their parameters through the policy gradient and the loss function, while the target policy network and the target evaluation network are updated through equation (12) during mini-batch training. This reduces the correlation between the cumulative reward Q computed by the online evaluation network and the cumulative reward Q′ computed by the target evaluation network, and improves the effectiveness of the i_q and [α_1, α_2] output by the online policy network.
θ^{μ′} ← k·θ^μ + (1−k)·θ^{μ′},  θ^{Q′} ← k·θ^Q + (1−k)·θ^{Q′}   (12)
In equation (12): θ^{μ′} and θ^{Q′} are, respectively, the action parameters with which the target policy network outputs i_q and [α_1, α_2] and the value-function parameters with which the target evaluation network evaluates them; θ^μ and θ^Q are, respectively, the online policy and online evaluation network parameters; k is the learning rate, taken as 0.001.
The data sets stored in the experience pool are used to train and update the policy and evaluation network parameters, so that the controller outputs the control quantity i_q and the error weight α that act on the controlled motor, while the motor speed and error outputs are fed back to the controller, completing the iterative training. When the underwater robot enters an unfamiliar water area, the experience pool serves to accumulate experience data. Through this accumulation, when the underwater robot needs to change course and resist external disturbance, the controller can, following the upper computer's instruction, quickly output the i_q and α that generate the maximum reward, so that the tracking errors and synchronization errors of the motors are minimized and the motors run at their expected speeds; at the same time, the weighted synchronization-error coupling keeps a constant speed difference between the motors, giving the controller fast dynamic response and strong robustness.
The foregoing illustrates and describes the principles, general features, and advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above; the embodiments and the description only illustrate the principle of the invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, all of which fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and their equivalents.

Claims (8)

1. A multi-motor coupling drive control device for an underwater robot, composed of a multi-motor mutual coupling algorithm and a deep deterministic policy gradient algorithm controller, characterized by comprising three modules: a single motor control module, a synchronization error weight distribution module, and a multi-motor mutual coupling control module.
2. The underwater robot multi-motor coupling drive control device as claimed in claim 1, wherein the single motor control module is composed of an ML-PDDA algorithm controller and a permanent magnet synchronous motor.
3. The underwater robot multi-motor coupling drive control device according to claim 2, wherein the synchronization error weight distribution module adjusts the weight factor of the synchronization error using the evaluation-reward mechanism of the ML-PDDA algorithm, obtains the optimal weight factor when the reward generated by the output weight factor is maximum, and inputs the adjusted synchronization error into the controller as a state quantity, which better reflects the coordination among the motors; the main propulsion motor of the underwater robot, i.e. the 1st motor, has the maximum power, so its desired speed is defined as the reference speed; the actual speed of the ith motor is denoted n_i, and the synchronization error between the ith motor and the remaining motors is e_i'; taking the 1st motor as an example, the synchronization error e_1' is calculated as in equation (1):
e′_1 = α_1×|n_1 − n_2| + α_2×|n_1 − n_3|   (1)
in equation (1): α_1 and α_2 are error weight factors set by the ML-PDDA algorithm, and n_1, n_2, n_3 are the actual rotational speeds of the respective motors.
4. A multi-motor coupling drive control method for an underwater robot, characterized in that it uses the multi-motor coupling drive control device for an underwater robot and specifically comprises the following steps:
step 1: designing a strategy network and an evaluation network;
step 2: constructing a value function;
step 3: finding an optimal strategy;
step 4: updating the evaluation network.
5. The underwater robot multi-motor coupling drive control method according to claim 4, wherein in step 1 of designing the policy network and the evaluation network, the policy network is composed of an input layer, two fully connected layers and an output layer; the state input layer is provided with 6 nodes, covering the motor's tracking error and synchronization error together with the backward difference and accumulation of each (6 states); the fully connected layers are provided with 200 and 200 nodes respectively; the output layer is provided with 3 nodes, covering i_q and [α_1, α_2]; the input layer and output layer use the ReLU function as the activation function; the evaluation network structure takes the 6 error state quantities and 3 output control quantities of the motor as its inputs, fuses them through the neural convolution network, feeds the 9 state quantities into the fully connected layers, and finally outputs the evaluation value Q of the control quantities i_q and [α_1, α_2]; the number of input nodes is set the same as in the policy network, the output layer has only the single evaluation value Q, so its node number is set to 1, and the activation function uses the Sigmoid function.
6. The underwater robot multi-motor coupling drive control method according to claim 5, wherein in step 2 a value function Q(e, a) is constructed to evaluate the motor control quantity output by the policy network — the q-axis current i_q and the error weight vector α = [α_1, α_2] — and to train the policy network and the evaluation network, the value function of policy μ being as in equation (2):
Q^μ(e_t, a_t) = E[ Σ_{k=0}^{∞} γ^k · r_{t+k} ]   (2)
in equation (2): e_t is the controller input at time t, comprising the motor tracking error vector and the synchronization error vector; a_t is the control quantity output by the controller from the input motor speed error at time t, comprising i_q and α = [α_1, α_2]; γ^k is the k-step discount factor, with γ taken here as 0.99; and r_{t+k} is the reward at step k obtained when the controller outputs a_t under errors e and e', as in equation (3):
[Equation (3): reward r_t, formed from each motor's tracking error e_i(t) and synchronization error e_i(t)′, each offset by 0.1 — formula image not reproduced]
in equation (3): n_i(t) is the actual speed of the ith motor at time t; e_i(t) is the tracking error of the ith motor at time t, to which 0.1 is added to prevent the tracking error from being 0 and the reward from tending to infinity; e_i(t)′ is the synchronization error of the ith motor with the other motors.
7. The underwater robot multi-motor coupling drive control method according to claim 6, wherein in step 3 of finding the optimal strategy, an evaluation function J^π(μ) is defined to evaluate the new strategy learned by the current ML-PDDA algorithm, as in equation (4):
J^π(μ) = E[ Q^μ(e, μ(e)) ]   (4)
in equation (4): Q^μ is the Q value computed by the value function, under different motor speed error inputs to the controller, for the i_q and α output according to policy μ, i.e. the cumulative reward earned by policy μ, calculated according to equation (2);
the optimal strategy, i.e. the policy μ that achieves the maximum cumulative reward, is found by maximizing equation (4), as shown in equation (5):
μ = argmax_μ J^π(μ)   (5)
taking the partial derivative of equation (4) with respect to the policy network parameters θ^μ gives the policy gradient, as in equation (6):
∇_{θ^μ} J^π(μ) = E[ ∇_a Q^μ(e, a)|_{a=μ(e)} · ∇_{θ^μ} μ(e|θ^μ) ]   (6)
the policy network parameters are updated by a gradient method, as in equation (7):
θ^μ ← θ^μ + η_μ · ∇_{θ^μ} J^π(μ)   (7)
in equation (7): θ^μ is the policy network parameter vector and η_μ is its learning rate;
updating the policy network with the policy μ that yields the maximum cumulative reward drives the policy network to update in the direction of generating the i_q and [α_1, α_2] that obtain that maximum cumulative reward.
8. The underwater robot multi-motor coupling drive control method according to claim 7, wherein in step 4 of updating the evaluation network, an experience pool is established; the motor speed errors e and e' input to the controller, its outputs i_q and α, the corresponding reward r_t obtained, and the motor speed error at the next moment are stored in the experience pool as one group of experience data, and the target networks draw experience groups from the pool to update the evaluation network parameters θ^Q;
the motor speed error e_{t+1} at the next moment is fed into the target policy network to obtain a deterministic output i_q and α, denoted a_{t+1}; a_{t+1} and e_{t+1} are then fused through the neural convolution network and used together as the input of the target value network, which yields the target network's evaluation Q′ of a_{t+1}; the actual evaluation y_t of the target network is then computed as in equation (8):
y_t = r_t + γ·Q′(e_{t+1}, μ′(e_{t+1}|θ^{μ′}) | θ^{Q′})   (8)
in equation (8): μ′(e_{t+1}|θ^{μ′}) is the i_q and α output by the target policy μ′; Q′(e_{t+1}, μ′(e_{t+1}|θ^{μ′}) | θ^{Q′}) is the target evaluation network's evaluation of i_q and α; θ^{μ′} and θ^{Q′} are, respectively, the target policy and target evaluation network parameters;
meanwhile, an error function L is established to compute the error of the online evaluation network, and the online evaluation network is updated by minimizing this error, as in equation (9):
L = (1/N)·Σ_t ( y_t − Q(e_t, a_t|θ^Q) )²   (9)
where N is the mini-batch length;
taking the derivative of the loss function L with respect to the evaluation network parameters θ^Q gives equation (10):
∇_{θ^Q} L = −(2/N)·Σ_t ( y_t − Q(e_t, a_t|θ^Q) )·∇_{θ^Q} Q(e_t, a_t|θ^Q)   (10)
the evaluation network parameters are updated as in equation (11):
θ^Q ← θ^Q − η_Q · ∇_{θ^Q} L   (11)
where η_Q is the evaluation network learning rate;
updating the evaluation network through the loss function L lets it compute more accurately the reward obtained by the control quantities output by the policy network, so that the ML-PDDA controller outputs the i_q and [α_1, α_2] that best meet the actual operating requirements of the multiple motors; the online policy network and the online evaluation network continuously update their parameters through the policy gradient and the loss function, while the target policy network and the target evaluation network are updated through equation (12) during mini-batch training, which reduces the correlation between the cumulative reward Q computed by the online evaluation network and the cumulative reward Q′ computed by the target evaluation network and improves the effectiveness of the i_q and [α_1, α_2] output by the online policy network:
θ^{μ′} ← k·θ^μ + (1−k)·θ^{μ′},  θ^{Q′} ← k·θ^Q + (1−k)·θ^{Q′}   (12)
in equation (12): θ^{μ′} and θ^{Q′} are, respectively, the action parameters with which the target policy network outputs i_q and [α_1, α_2] and the value-function parameters with which the target evaluation network evaluates them; θ^μ and θ^Q are, respectively, the online policy and online evaluation network parameters; k is the learning rate, taken as 0.001.
CN202111381879.5A 2021-11-19 2021-11-19 Multi-motor coupling driving control device and method for underwater robot Active CN114089633B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111381879.5A CN114089633B (en) 2021-11-19 2021-11-19 Multi-motor coupling driving control device and method for underwater robot

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111381879.5A CN114089633B (en) 2021-11-19 2021-11-19 Multi-motor coupling driving control device and method for underwater robot

Publications (2)

Publication Number Publication Date
CN114089633A true CN114089633A (en) 2022-02-25
CN114089633B CN114089633B (en) 2024-04-26

Family

ID=80302617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111381879.5A Active CN114089633B (en) 2021-11-19 2021-11-19 Multi-motor coupling driving control device and method for underwater robot

Country Status (1)

Country Link
CN (1) CN114089633B (en)

Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104935217A (en) * 2015-05-29 2015-09-23 天津大学 Improved deviation coupling control method suitable for multi-motor system
CN107748566A (en) * 2017-09-20 2018-03-02 清华大学 A kind of underwater autonomous robot constant depth control method based on intensified learning
WO2018053187A1 (en) * 2016-09-15 2018-03-22 Google Inc. Deep reinforcement learning for robotic manipulation
CN108803321A (en) * 2018-05-30 2018-11-13 清华大学 Autonomous Underwater Vehicle Trajectory Tracking Control method based on deeply study
CN110323981A (en) * 2019-05-14 2019-10-11 广东省智能制造研究所 A kind of method and system controlling permanent magnetic linear synchronous motor
CN110406526A (en) * 2019-08-05 2019-11-05 合肥工业大学 Parallel hybrid electric energy management method based on adaptive Dynamic Programming
CN110597058A (en) * 2019-08-28 2019-12-20 浙江工业大学 Three-degree-of-freedom autonomous underwater vehicle control method based on reinforcement learning
CN110936824A (en) * 2019-12-09 2020-03-31 江西理工大学 Electric automobile double-motor control method based on self-adaptive dynamic planning
US20200327411A1 (en) * 2019-04-14 2020-10-15 Di Shi Systems and Method on Deriving Real-time Coordinated Voltage Control Strategies Using Deep Reinforcement Learning
CN111966118A (en) * 2020-08-14 2020-11-20 哈尔滨工程大学 ROV thrust distribution and reinforcement learning-based motion control method
CN112383248A (en) * 2020-10-29 2021-02-19 浙江大学 Model prediction current control method for dual-motor torque synchronization system
CN112388636A (en) * 2020-11-06 2021-02-23 广州大学 DDPG multi-target genetic self-optimization triaxial delta machine platform and method
JP2021034050A (en) * 2019-08-21 2021-03-01 哈爾浜工程大学 Auv action plan and operation control method based on reinforcement learning
US10962976B1 (en) * 2019-11-29 2021-03-30 Institute Of Automation, Chinese Academy Of Sciences Motion control method and system for biomimetic robotic fish based on adversarial structured control
CN112631315A (en) * 2020-12-08 2021-04-09 江苏科技大学 Multi-motor cooperative propulsion underwater robot path tracking method
CN113031528A (en) * 2021-02-25 2021-06-25 电子科技大学 Multi-legged robot motion control method based on depth certainty strategy gradient
CN113140104A (en) * 2021-04-14 2021-07-20 武汉理工大学 Vehicle queue tracking control method and device and computer readable storage medium

Patent Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104935217A (en) * 2015-05-29 2015-09-23 天津大学 Improved deviation coupling control method suitable for multi-motor system
WO2018053187A1 (en) * 2016-09-15 2018-03-22 Google Inc. Deep reinforcement learning for robotic manipulation
CN107748566A (en) * 2017-09-20 2018-03-02 清华大学 A kind of underwater autonomous robot constant depth control method based on intensified learning
CN108803321A (en) * 2018-05-30 2018-11-13 清华大学 Autonomous Underwater Vehicle Trajectory Tracking Control method based on deeply study
US20200327411A1 (en) * 2019-04-14 2020-10-15 Di Shi Systems and Method on Deriving Real-time Coordinated Voltage Control Strategies Using Deep Reinforcement Learning
CN110323981A (en) * 2019-05-14 2019-10-11 广东省智能制造研究所 A kind of method and system controlling permanent magnetic linear synchronous motor
CN110406526A (en) * 2019-08-05 2019-11-05 合肥工业大学 Parallel hybrid electric energy management method based on adaptive Dynamic Programming
JP2021034050A (en) * 2019-08-21 2021-03-01 哈爾浜工程大学 Auv action plan and operation control method based on reinforcement learning
CN110597058A (en) * 2019-08-28 2019-12-20 浙江工业大学 Three-degree-of-freedom autonomous underwater vehicle control method based on reinforcement learning
US10962976B1 (en) * 2019-11-29 2021-03-30 Institute Of Automation, Chinese Academy Of Sciences Motion control method and system for biomimetic robotic fish based on adversarial structured control
CN110936824A (en) * 2019-12-09 2020-03-31 江西理工大学 Electric automobile double-motor control method based on self-adaptive dynamic planning
US20210170883A1 (en) * 2019-12-09 2021-06-10 Jiangxi University Of Science And Technology Method for dual-motor control on electric vehicle based on adaptive dynamic programming
CN111966118A (en) * 2020-08-14 2020-11-20 哈尔滨工程大学 ROV thrust distribution and reinforcement learning-based motion control method
CN112383248A (en) * 2020-10-29 2021-02-19 浙江大学 Model prediction current control method for dual-motor torque synchronization system
CN112388636A (en) * 2020-11-06 2021-02-23 广州大学 DDPG multi-target genetic self-optimization triaxial delta machine platform and method
CN112631315A (en) * 2020-12-08 2021-04-09 江苏科技大学 Multi-motor cooperative propulsion underwater robot path tracking method
CN113031528A (en) * 2021-02-25 2021-06-25 电子科技大学 Multi-legged robot motion control method based on depth certainty strategy gradient
CN113140104A (en) * 2021-04-14 2021-07-20 武汉理工大学 Vehicle queue tracking control method and device and computer readable storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
张振宇; 张昱; 陈丽; 张东波: "DDPG adaptive control of permanent magnet synchronous linear motors", Micromotors, no. 04
漆星; 郑常宝; 张倩: "Parameter calibration method for electric vehicle asynchronous motors based on deep deterministic policy gradient", Transactions of China Electrotechnical Society, no. 20, 25 October 2020 (2020-10-25)
赵文涛; 俞建成; 张艾群; 李岩: "Detection of dynamic characteristics of ocean mesoscale eddies based on satellite altimetry data", Journal of Marine Sciences, no. 03

Also Published As

Publication number Publication date
CN114089633B (en) 2024-04-26

Similar Documents

Publication Publication Date Title
CN110989576B (en) Target following and dynamic obstacle avoidance control method for differential slip steering vehicle
CN108803321B (en) Autonomous underwater vehicle track tracking control method based on deep reinforcement learning
CN104865829B (en) Multi-robot system distributed self-adaption neutral net continuous tracking control method of electro
Zhang et al. Distributed control of coordinated path tracking for networked nonholonomic mobile vehicles
Wang et al. Dynamic tanker steering control using generalized ellipsoidal-basis-function-based fuzzy neural networks
CN108161934A (en) A kind of method for learning to realize robot multi peg-in-hole using deeply
CN111176116B (en) Closed-loop feedback control method for robot fish based on CPG model
Batra et al. Decentralized control of quadrotor swarms with end-to-end deep reinforcement learning
CN111240345A (en) Underwater robot trajectory tracking method based on double BP network reinforcement learning framework
CN114741886B (en) Unmanned aerial vehicle cluster multi-task training method and system based on contribution degree evaluation
CN111176122B (en) Underwater robot parameter self-adaptive backstepping control method based on double BP neural network Q learning technology
CN111240344B (en) Autonomous underwater robot model-free control method based on reinforcement learning technology
Song et al. Guidance and control of autonomous surface underwater vehicles for target tracking in ocean environment by deep reinforcement learning
CN112947505B (en) Multi-AUV formation distributed control method based on reinforcement learning algorithm and unknown disturbance observer
CN112462792A (en) Underwater robot motion control method based on Actor-Critic algorithm
Fang et al. Autonomous underwater vehicle formation control and obstacle avoidance using multi-agent generative adversarial imitation learning
CN109145451A (en) A kind of the motor behavior identification and track estimation method of high speed glide vehicle
CN112405542B (en) Musculoskeletal robot control method and system based on brain inspiring multitask learning
CN111273677A (en) Autonomous underwater robot speed and heading control method based on reinforcement learning technology
CN107450311A (en) Inversion model modeling method and device and adaptive inverse control and device
Zhao et al. Global path planning and waypoint following for heterogeneous unmanned surface vehicles assisting inland water monitoring
CN109946972A (en) Underwater robot Predictive Control System and method based on on-line study modelling technique
CN113485323A (en) Flexible formation method for cascaded multiple mobile robots
CN112835368A (en) Multi-unmanned-boat collaborative formation control method and system
CN114089633A (en) Multi-motor coupling drive control device and method for underwater robot

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant