CN114089633B - Multi-motor coupling driving control device and method for underwater robot - Google Patents
Multi-motor coupling driving control device and method for underwater robot
- Publication number
- CN114089633B · CN202111381879.5A · CN202111381879A
- Authority
- CN
- China
- Prior art keywords
- motor
- network
- strategy
- evaluation
- error
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
- G05B13/042—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Automation & Control Theory (AREA)
- Feedback Control In General (AREA)
Abstract
The invention relates to the technical field of underwater robot control, in particular to a multi-motor coupling drive control device and method for an underwater robot. The invention designs a Multi-Lamb-vortex Policy Decision Driven Algorithm (ML-PDDA) controller to control the rotating speed of each motor and assigns weight factors online to the synchronization errors between the motors.
Description
Technical Field
The invention relates to the technical field of underwater robot control, in particular to a multi-motor coupling driving control device and method for an underwater robot.
Background
Underwater robots are widely used in many fields such as the military, underwater resource exploration and underwater search and rescue, and with the development of technology they can replace human beings in ever more difficult tasks. When performing these tasks, the motion trajectory, attitude and so on of the underwater robot must be controlled with high precision. The underwater environment, however, is complex, and different environments disturb the motion and operation of the robot in different ways, so a suitable control structure must be designed to make several motors work together, ensuring that the underwater robot can resist various underwater disturbances, move along a precise trajectory and complete its subsequent tasks.
At present, the cooperative control of multiple motors mainly comprises the following algorithms:
(1) Parallel control
The given rotation speed of every motor in the control system is the same, so synchronization can be achieved only when the motor loads are strictly identical. Each motor feeds back only its own tracking error; synchronization errors between motors are not considered, and each motor control unit is independent and uncoupled. When one motor unit is disturbed, the other motors receive no disturbance information, multi-motor coordinated control cannot be achieved, and disturbance rejection is poor. This method clearly cannot cope with the complex underwater environment.
(2) Master-slave control
The motors are in a master-slave relationship: the output of the master motor serves as the speed reference input of the slave motors, which track the speed of the master. However, a master-slave system has no feedback from the slaves to the master. If a motor at some level is disturbed, the motors above it receive no disturbance information; only the control units at the next level make the corresponding speed adjustment, which is then passed down level by level in the same way, causing large delays and poor disturbance rejection. This defect also limits the use of the method.
(3) Virtual spindle control
The virtual spindle control system imitates the synchronization characteristics of a mechanical line shaft. The input speed signal, after being acted on by the virtual shaft, is output as the given signal for each drive unit, which then tracks it. Because this signal is obtained through the action of the shaft and through filtering, the main reference value may deviate from the actual rotation speed of the motors.
(4) Cross-coupling control
The speed or position signals of two adjacent motors are compared and their difference is used as a feedback signal for the system to track. Such a system can reflect the load change of either motor, but the strategy does not extend well beyond two motors, because calculating the feedback compensation values for more than two motors becomes very cumbersome.
(5) Bias coupling control
Bias (deviation) coupling control feeds back, for each motor, the sum of its errors with respect to all the other motors as a compensation signal, thereby achieving synchronous control of multiple motors. However, the amount of computation increases greatly, and the method suffers from problems such as controller saturation failure during start-up.
When the underwater robot performs underwater work, it must overcome the disturbance of the surrounding environment to remain stable while also moving along the desired trajectory. The multi-motor cooperative control algorithms above aim at keeping the rotation speeds of the motors absolutely synchronized; they can keep the underwater robot operating in a fixed attitude, but they cannot guarantee that it changes course smoothly as desired.
Disclosure of Invention
In order to solve the above technical problems, the invention discloses a multi-motor coupling drive control device and method for an underwater robot, which ensure that the underwater robot can resist the disturbances of the surrounding environment in a complex underwater setting and be driven accurately and stably. The device enables the underwater robot to run stably in a complex underwater environment, provides a strong guarantee for its underwater operations, improves its working efficiency and reduces the risk of underwater work for personnel.
The invention assigns weight factors to the synchronization errors between the motors through a Multi-Lamb-vortex Policy Decision Driven Algorithm (ML-PDDA) and designs a multi-motor mutual coupling control structure matched to the different-speed operation of the motors. For a single motor control unit, a controller is designed with the ML-PDDA algorithm and works together with the multi-motor mutual coupling control system to control the drive of the underwater robot.
The invention adopts the following specific technical scheme:
The multi-motor coupling drive control device for an underwater robot consists of a multi-motor mutual coupling algorithm and a deep deterministic policy gradient algorithm controller, and specifically comprises three modules: a single motor control module, a synchronization error weight distribution module and a multi-motor mutual coupling control module.
In this technical scheme, the single motor control module consists of the ML-PDDA algorithm controller and a permanent magnet synchronous motor. The speed error of the motor is the controller input; combined with the vector control model of the permanent magnet synchronous motor, the ML-PDDA strategy network produces the control quantity q-axis current i_q of the motor model and the synchronization error weight factors α, realising speed control of the motor and, together with the multi-motor mutual coupling control module, cooperative control of the underwater robot drive.
According to the invention, the synchronization error weight distribution module uses the evaluation-reward mechanism of the ML-PDDA algorithm to tune the weight factors of the synchronization errors: the optimal weight factors are those whose output produces the maximum reward, and the weighted synchronization errors are input to the controller as state quantities, which better reflects the cooperation between the motors. The main propulsion motor of the underwater robot, i.e. the 1st motor, has the largest power, so its desired rotation speed is defined as the reference speed. The actual speed of the ith motor is denoted n_i and its synchronization error with the remaining motors e_i'. Taking the 1st motor as an example, the synchronization error e_1' is calculated by formula (1):
e_1' = α_1·|n_1 − n_2| + α_2·|n_1 − n_3|   (1)
In formula (1): α_1 and α_2 are the error weight factors set by the ML-PDDA algorithm, and n_1, n_2, n_3 are the actual rotation speeds of the motors.
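As a minimal illustration of formula (1), the following Python sketch computes the weighted synchronization error of the 1st motor; the function name, weight values and speed values are hypothetical examples, not values prescribed by the invention (in practice α_1 and α_2 would come from the ML-PDDA strategy network and the speeds from motor feedback).

```python
def sync_error_motor1(n1, n2, n3, alpha1, alpha2):
    """Formula (1): e1' = alpha1*|n1 - n2| + alpha2*|n1 - n3|."""
    return alpha1 * abs(n1 - n2) + alpha2 * abs(n1 - n3)

if __name__ == "__main__":
    # Illustrative values: main motor 1500 rpm, side-thrust 1200 rpm, pitch 900 rpm.
    print(sync_error_motor1(1500.0, 1200.0, 900.0, alpha1=0.6, alpha2=0.4))
```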
The invention also discloses a multi-motor coupling driving control method of the underwater robot, which comprises the following steps:
step 1: designing a strategy network and an evaluation network;
Step 2: constructing a value function;
Step 3: searching an optimal strategy;
step 4: updating the evaluation network.
In the step 1 design of the strategy network and the evaluation network, the strategy network consists of an input layer, two fully connected layers and an output layer. The state input layer takes 6 states of each motor: the tracking error and the synchronization error together with their backward differences and accumulations, so 6 nodes are arranged; the two fully connected layers have 200 nodes each; the output layer produces the three control quantities i_q and [α_1, α_2], so 3 nodes are arranged; and the input layer and the output layer both use the ReLU function as the activation function. The evaluation network has a similar structure to the strategy network: the 6 error state quantities and the 3 output control quantities of the motor are taken together as its inputs, fused through a neural convolution network, and the resulting 9 state quantities are fed into the fully connected layers; the output is the evaluation value Q of the control quantities i_q and [α_1, α_2]. Its input layer has the same number of nodes as the strategy network, and since the output layer produces only the single evaluation value Q, its node number is set to 1 and the Sigmoid function is used as the activation function.
In step 2, a value function Q(e, a) is constructed to evaluate the q-axis current i_q and the error weight vector α = [α_1, α_2] output by the strategy network and to train the strategy network and the evaluation network. The value function of the strategy μ is given by formula (2):
Q_μ(e_t, a_t) = E_μ[ Σ_{k=0}^{∞} γ^k·r_{t+k} ]   (2)
In formula (2): e_t is the controller input at time t, comprising the motor tracking error vector and the synchronization error vector; a_t is the control quantity output by the controller at time t according to the input motor speed errors, comprising i_q and α = [α_1, α_2]; γ^k is the discount factor after k steps, with γ taken as 0.99; and r_{t+k} is the reward obtained by the controller k steps after taking a_t in the error state (e, e'), as given by formula (3), in which: n_i(t) is the actual rotation speed of the ith motor at time t; e_i(t) is the tracking error of the ith motor at time t, with the constant 0.1 preventing the tracking error from reaching 0 and the reward from tending to infinity; e_i(t)' is the synchronization error between the ith motor and the other motors.
The reward increases only when the tracking error and the synchronization error decrease, i.e. when the motor speeds approach the desired values while the motors keep running in coordination; the reward is maximal when the desired values are reached exactly and decreases otherwise. When the tracking error and the synchronization error are minimal the reward is maximal, and the control quantities output by the controller are then considered optimal: i_q and [α_1, α_2] best match the operating requirements of the motors at that moment.
In step 3, the optimal strategy is searched for. Since the deep deterministic policy gradient algorithm uses a deterministic strategy, the i_q and α output by the controller at each step can be obtained by evaluating the strategy μ, and an evaluation function J_π(μ) is defined to assess the new strategy learned by the current ML-PDDA algorithm, as in formula (4):
J_π(μ) = E[ Q_μ(e, μ(e)) ]   (4)
In formula (4): Q_μ is the Q value calculated by the value function for the i_q and α output by the strategy μ under the different motor speed error inputs, i.e. the cumulative reward obtained by the strategy μ, computed as in formula (2).
The optimal strategy is found by maximising the value of formula (4), i.e. the strategy μ that achieves the maximum cumulative reward, as in formula (5):
μ = argmax_μ J_π(μ)   (5)
Taking the partial derivative of formula (4) with respect to the parameters θ^μ of the strategy μ gives the strategy gradient, as in formula (6):
∇_{θ^μ} J_π(μ) = E[ ∇_a Q_μ(e, a | θ^Q) |_{a=μ(e)} · ∇_{θ^μ} μ(e | θ^μ) ]   (6)
The strategy network parameters are then updated with the gradient descent method, as in formula (7):
θ^μ ← θ^μ + η·∇_{θ^μ} J_π(μ)   (7)
In formula (7): θ^μ are the strategy network parameters and η is the learning rate of the strategy network.
The strategy network is thus updated by solving for the strategy μ that yields the maximum cumulative reward, so that it is updated in the direction of the i_q and [α_1, α_2] that achieve the maximum reward.
In the step 4 update of the evaluation network, an experience pool is established; the motor speed errors e and e' input to the controller, the outputs i_q and α, the corresponding reward r_t and the motor speed errors at the next moment are stored in the experience pool as one group of experience data, and the target networks draw experience groups from the pool to update the evaluation network parameters θ^Q.
The motor speed error e_{t+1} at the next moment is fed into the target strategy network to obtain the deterministic outputs i_q and α, denoted a_{t+1}; a_{t+1} and e_{t+1} are fused through the neural convolution network and taken together as the input of the target value network, giving the evaluation value Q' of the target network for a_{t+1}; the actual target evaluation y_t is then calculated as in formula (8):
y_t = r_t + γ·Q'(e_{t+1}, μ'(e_{t+1} | θ^{μ'}) | θ^{Q'})   (8)
In formula (8): μ'(e_{t+1} | θ^{μ'}) is the i_q and α output by the target strategy μ'; Q'(e_{t+1}, μ'(e_{t+1} | θ^{μ'})) is the evaluation of i_q and α by the target evaluation network; θ^{μ'} and θ^{Q'} are respectively the target strategy and target evaluation network parameters.
An error function L is also established to calculate the error of the online evaluation network, which is updated by minimising this error, as in formula (9):
L = (1/N)·Σ_t ( y_t − Q(e_t, a_t | θ^Q) )²   (9)
where N is the length of the small training batch. Differentiating the loss function L with respect to the evaluation network parameters θ^Q gives formula (10):
∇_{θ^Q} L = −(2/N)·Σ_t ( y_t − Q(e_t, a_t | θ^Q) )·∇_{θ^Q} Q(e_t, a_t | θ^Q)   (10)
The evaluation network parameters are updated as in formula (11):
θ^Q ← θ^Q − η_Q·∇_{θ^Q} L   (11)
where η_Q is the learning rate of the evaluation network.
The evaluation network is updated through the loss function L so that it can calculate more accurately the rewards obtained by the control quantities output by the strategy network, and the ML-PDDA controller outputs the i_q and [α_1, α_2] that best match the actual running requirements of the motors. The online strategy network and the online evaluation network continuously update their parameters through the strategy gradient and the loss function, while the target strategy network and the target evaluation network are updated during small-batch training through formula (12); this reduces the correlation between the cumulative reward Q calculated by the online evaluation network and the cumulative reward Q' calculated by the target evaluation network and improves the effectiveness of the i_q and [α_1, α_2] output by the online strategy network:
θ^{μ'} ← k·θ^μ + (1 − k)·θ^{μ'},  θ^{Q'} ← k·θ^Q + (1 − k)·θ^{Q'}   (12)
In formula (12): θ^{μ'} and θ^{Q'} are respectively the action parameters with which the target strategy network outputs i_q and [α_1, α_2] and the value function parameters with which the target evaluation network evaluates i_q and [α_1, α_2]; θ^μ and θ^Q are respectively the online strategy and online evaluation network parameters; k is the learning rate, taken as 0.001.
The data sets stored in the experience pool are used to train and update the evaluation and strategy network parameters, so that the controller outputs the control quantity i_q and the error weights α to act on the controlled motors, and the motor speeds and errors are fed back to the controller, completing the iterative training. When the underwater robot enters unfamiliar waters, the experience pool serves to accumulate experience data. Through this accumulation, when the underwater robot needs to change course and resist external disturbances, the controller can, following the instructions of the upper computer, quickly output the i_q and α that produce the maximum reward, so that the multi-motor tracking errors and synchronization errors are minimal and the motors run at the desired speeds; at the same time, coupling the weighted synchronization errors between the motors maintains a constant speed difference, giving the controller fast dynamic response and strong robustness.
The invention has the beneficial effects that:
1. In existing multi-motor control schemes such as virtual main shaft, cross coupling and deviation coupling, every motor is given the same desired speed, and the controller eliminates the tracking and synchronization errors so that the motor speeds remain synchronous. This patent first sets a speed proportionality coefficient for each motor, so that each motor obtains a different desired speed according to the heading change of the underwater robot; at the same time, synchronization error weights are assigned to match the different speeds, ensuring mutual coupling of the motors and strengthening the disturbance rejection of the multi-motor system. Existing cooperative control schemes that do assign weights to the speed synchronization errors still aim at keeping all motor speeds identical, cannot tune the weights online according to the actual running state of the motors, and make it difficult for the underwater robot to adjust its heading in time and resist environmental disturbances. The multi-motor mutual coupling control device designed in this patent is therefore better suited to the actual working environment of the underwater robot and more targeted;
2. The Multi-Lamb-vortex Policy Decision Driven Algorithm (ML-PDDA) exploits the perception capability of deep learning to solve sequential decision problems in a high-dimensional state space. The control methods currently applied to multi-motor cooperative control, such as fuzzy logic, neural networks and model predictive control, require large amounts of past empirical data and complex mathematical models and converge slowly; the disturbance of the underwater environment on the robot is nonlinear and the system parameters vary, so a suitable mathematical model is difficult to establish and a good control effect hard to obtain. The ML-PDDA algorithm introduces water flow disturbance on the basis of the PDDA algorithm: it superimposes several Lamb vortices to simulate the water flow while training the strategy network, explores strategies better suited to the underwater environment, improves training efficiency and stability, and allows the underwater robot to adapt better to flow disturbances while moving. The ML-PDDA algorithm also has good online learning capability; it can learn a mathematical model of the motors from their input and output data and uses both online and target networks, which makes the learning process more stable and the model converge faster;
3. Combining the multi-motor mutual coupling control with the Multi-Lamb-vortex Policy Decision Driven Algorithm (ML-PDDA) allows the speed of each motor to be changed independently, while the drive of the underwater robot is cooperatively controlled through mutual coupling of the improved (weighted) synchronization errors. The underwater robot can thus flexibly change course according to the instructions of the control system and resist surrounding disturbances.
Drawings
FIG. 1 is a schematic diagram of a control device of a multi-motor mutual coupling ML-PDDA control algorithm.
Fig. 2 is a diagram showing a synchronization error calculation structure in the present invention.
FIG. 3 is a flow chart of the synchronization error weight factor adjustment in the present invention.
FIG. 4 is a diagram of a multi-vortex decision-driven algorithm (ML-PDDA) network in accordance with the present invention.
Fig. 5 is a schematic diagram of a network structure according to the present invention.
Fig. 6 is a diagram showing an evaluation network configuration in the present invention.
Detailed Description
The present invention will be further described in detail with reference to the drawings and examples, which are only for the purpose of illustrating the invention and are not to be construed as limiting the scope of the invention.
In one embodiment, the multi-motor coupling drive control device for an underwater robot consists of three modules: a single motor control module, a synchronization error weight distribution module and a multi-motor mutual coupling control module. The single motor control module consists of the ML-PDDA algorithm controller and a permanent magnet synchronous motor. The speed error of the motor is the controller input; combined with the vector control model of the permanent magnet synchronous motor, the ML-PDDA strategy network produces the control quantity q-axis current i_q of the motor model and the synchronization error weight factors α, realising speed control of the motor and, together with the multi-motor mutual coupling control module, cooperative drive control of the underwater robot, as shown in FIG. 1.
This embodiment designs a drive control device for an underwater robot with mutually coupled motors. Taking the desired speed of the main propulsion motor as the reference, the upper computer adjusts the speed ratios of the side-thrust motor and the pitch motor according to the actual route of the underwater robot, so that the motors cooperatively drive the robot at different speeds. In the actual power configuration of an underwater robot there may be several propulsion motors and attitude control motors (side-thrust and pitch control motors); these can be decomposed and combined in the robot's own coordinate system and are classified by direction of force as main propulsion motor, side-thrust motor and pitch motor. For convenience in the following description, after thrust normalization the main propulsion motor is defined as the 1st motor, the side-thrust motor as the 2nd motor, and the pitch motor as the 3rd motor.
In FIG. 1, the desired speed n_ref is constant and the speed of the 1st motor is taken as the reference. The upper computer adjusts the speed ratios R_2 and R_3 of motors 2 and 3 according to the motion route of the underwater robot, giving the actual desired speed n_ref1,2,3 of each motor; the tracking error e is obtained by differencing with the actual motor speeds n_1,2,3. Because the actual desired speed of each motor differs, the synchronization error cannot be calculated directly; in FIG. 1 the weight factors α are assigned to the synchronization errors between the different motors and the synchronization error e' is calculated. Three states of e and of e' are selected as controller inputs: the error itself, its backward difference and its accumulation. The six input state quantities are processed using the learning capability of the ML-PDDA algorithm, which outputs the control quantity q-axis current i_q and the error weights α of the motor, completing precise motor control and realising different-speed cooperative driving of the underwater robot.
To reach the underwater working position from the surface according to the instructions of the central control system, the underwater robot must go through diving, advancing, surfacing and similar phases, during which it must adjust its direction of motion many times. Since the robot has no rudder, the speed of each motor must be changed so that speed differences between the motors create steering/attitude-adjusting thrust and the robot follows the specified trajectory. Traditional multi-motor cooperative control devices, generally used in fields such as the chemical industry, require every motor in the system to keep the same speed and therefore cannot meet the underwater robot's requirements of time-varying operation, high dynamic response and different-speed adjustment.
When the steering/attitude of the underwater robot is adjusted, the desired speeds of the three motors differ, so this patent designs a synchronization error weight distribution module. The weight factors of the synchronization errors are tuned using the evaluation-reward mechanism of the ML-PDDA algorithm: the optimal weight factors are those whose output produces the maximum reward, and the weighted synchronization errors are input to the controller as state quantities, which better reflects the cooperation between the motors.
The main propulsion motor of the underwater robot, i.e. the 1st motor, has the largest power, so its desired rotation speed is defined as the reference speed. The actual speed of the ith motor is denoted n_i and its synchronization error with the remaining motors e_i'. Taking the 1st motor as an example, the synchronization error e_1' is calculated by formula (1):
e_1' = α_1·|n_1 − n_2| + α_2·|n_1 − n_3|   (1)
In formula (1): α_1 and α_2 are the error weight factors set by the ML-PDDA algorithm, and n_1, n_2, n_3 are the actual rotation speeds of the motors.
In formula (1) the error weight factors α_1 and α_2 are set by the strategy network of the ML-PDDA controller, and the rewards under different weight factors are calculated by the reward mechanism of formula (3); the maximum reward is obtained only when the tracking error and the synchronization error decrease. When the given desired speeds of the three motors are identical, the weights of the synchronization errors need not change; when they differ, setting the synchronization error weight factors lets the motors run cooperatively while maintaining a constant speed difference.
The synchronization error calculation module is shown in fig. 2.
In FIG. 2, the actual synchronization errors between motor 1 and motors 2 and 3 are calculated first; the error weight factors obtained by the ML-PDDA algorithm are applied and the actual synchronization errors are recombined into a new synchronization error e'; the backward difference Δe' and accumulation Σe' of e' are then calculated, and together they serve both as inputs to the ML-PDDA controller and as feedback.
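The six-element controller state described here can be illustrated with the following Python sketch, assuming discrete sampling: for both the tracking error e and the weighted synchronization error e' it keeps the error itself, its backward difference and its running accumulation. The class and variable names are illustrative assumptions, not part of the patented implementation.

```python
class ErrorState:
    """Builds the 6 state quantities (e, Δe, Σe, e', Δe', Σe') fed to the controller."""

    def __init__(self):
        self.prev_e = 0.0    # previous tracking error
        self.prev_es = 0.0   # previous synchronization error
        self.sum_e = 0.0     # accumulation of tracking error
        self.sum_es = 0.0    # accumulation of synchronization error

    def update(self, e, e_sync):
        d_e = e - self.prev_e          # backward difference of tracking error
        d_es = e_sync - self.prev_es   # backward difference of sync error
        self.sum_e += e
        self.sum_es += e_sync
        self.prev_e, self.prev_es = e, e_sync
        return [e, d_e, self.sum_e, e_sync, d_es, self.sum_es]
```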
The learning ability of the ML-PDDA algorithm is used to tune the error weight factors, and the cumulative reward of the value function is maximised through the constructed value function and reward mechanism, as shown in FIG. 3. The reward r_t is taken as the index for evaluating the control effect of the ML-PDDA controller on the motors. The controller trains on the data in small batches, the length of each small batch being the ratio of the total training time T_f to the controller sampling time T_s, i.e. T_f/T_s. Each training batch yields a corresponding reward r_t; the maximum reward r_tmax is taken as the optimum, and the synchronization error weight factors α at that moment are output, completing the control of the underwater robot drive by the multi-motor mutual coupling control device.
FIG. 4 shows the structure of the ML-PDDA controller of FIG. 1. Using the strong learning capability of the ML-PDDA algorithm, it effectively solves sequential decision problems in a high-dimensional state space; the synchronization error and tracking error of a motor are selected as state quantities, and the ML-PDDA strategy network outputs the control quantity q-axis current i_q and the error weight vector α = [α_1, α_2] of the motor. When the strategy network is trained, water flow disturbance simulated by superimposing several Lamb vortices is introduced; training with this disturbance guides the controller to explore strategies aimed at the working environment of the underwater robot, improves training effectiveness, and adapts the ML-PDDA controller better to the underwater environment. Small-batch training is adopted, the length of each small batch being the ratio of the total training time T_f to the controller sampling time T_s rounded up, i.e. ⌈T_f/T_s⌉. Each training batch yields a reward r_t, and the maximum reward r_tmax within time T_f is used to judge whether the control quantities output by the controller are optimal. According to formula (3), the reward is maximal when the speed tracking error and synchronization error of the motors are minimal, i.e. the actual speeds are closest to the desired speeds and multi-motor synchronization is best; the control quantities i_q and α output at that moment are considered to give the best control effect in that state, and the result is stored in the experience pool.
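A small Python sketch of the small-batch bookkeeping just described is given below: the batch length is ⌈T_f/T_s⌉, and the synchronization error weights α recorded at the step with maximum reward r_t within the batch are taken as the optimum. All names and the way rewards are stored are illustrative assumptions.

```python
import math

def best_alpha_in_batch(rewards, alphas, T_f, T_s):
    """Return (r_tmax, alpha at r_tmax) over one small training batch."""
    batch_len = math.ceil(T_f / T_s)          # batch length = ceil(T_f / T_s)
    batch_r = rewards[:batch_len]
    batch_a = alphas[:batch_len]
    idx = max(range(len(batch_r)), key=lambda i: batch_r[i])
    return batch_r[idx], batch_a[idx]
```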
Because the working environment of the underwater robot is complex, when a course change is needed it is not enough for the upper computer merely to adjust the desired input speed of each motor and for the ML-PDDA controller to output the control quantity i_q; the robot would struggle to resist the surrounding nonlinear disturbances, so the motors must also cooperate with one another. When the course changes, the controller assigns weight factors to the synchronization errors of the motors to obtain new synchronization errors, which are input to the controller as state quantities; the controller combines the tracking errors and synchronization errors to output i_q, so that a constant speed difference can be maintained between the motors, which are mutually coupled through the calculation of the new synchronization errors, realising multi-motor cooperative control. The ML-PDDA controller keeps assigning new weights to the synchronization errors until the underwater robot enters the next navigation state and the upper computer changes the desired motor speeds, so that the robot moves along the set route.
Step 1: design strategy network and evaluation network
The strategy network consists of an input layer, two fully connected layers and an output layer. The state input layer takes 6 states of each motor: the tracking error and the synchronization error together with their backward differences and accumulations, so 6 nodes are arranged; the two fully connected layers have 200 nodes each; the output layer produces the three control quantities i_q and [α_1, α_2], so 3 nodes are arranged; and the input layer and the output layer both use the ReLU function as the activation function.
The evaluation network has a similar structure to the strategy network: the 6 error state quantities and the 3 output control quantities of the motor are taken together as its inputs, fused through a neural convolution network, and the resulting 9 state quantities are fed into the fully connected layers; the output is the evaluation value Q of the control quantities i_q and [α_1, α_2]. Its input layer has the same number of nodes as the strategy network, and since the output layer produces only the single evaluation value Q, its node number is set to 1 and the Sigmoid function is used as the activation function.
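A PyTorch sketch of the two networks described in step 1 is given below: the strategy (actor) network maps the 6 error states to the 3 control quantities (i_q, α_1, α_2) through two 200-node fully connected layers, and the evaluation (critic) network maps the fused 9 quantities (6 states plus 3 controls) to the single value Q. The layer sizes and activations follow the text; everything else (class names, the simple concatenation used for fusion, any scaling of outputs to physical ranges) is an assumption for illustration only.

```python
import torch
import torch.nn as nn

class StrategyNetwork(nn.Module):
    def __init__(self, n_states=6, n_actions=3, hidden=200):
        super().__init__()
        self.fc1 = nn.Linear(n_states, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, n_actions)   # i_q, alpha1, alpha2

    def forward(self, e):
        x = torch.relu(self.fc1(e))     # ReLU activations as stated in the text
        x = torch.relu(self.fc2(x))
        return torch.relu(self.out(x))

class EvaluationNetwork(nn.Module):
    def __init__(self, n_states=6, n_actions=3, hidden=200):
        super().__init__()
        self.fc1 = nn.Linear(n_states + n_actions, hidden)  # fused 9 inputs
        self.fc2 = nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, 1)                      # single value Q

    def forward(self, e, a):
        x = torch.cat([e, a], dim=-1)   # simple fusion of states and actions
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return torch.sigmoid(self.out(x))  # Sigmoid output as stated in the text
```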
Step 2: building a value function
A value function Q(e, a) is constructed to evaluate the q-axis current i_q and the error weight vector α = [α_1, α_2] output by the strategy network and to train the strategy network and the evaluation network. The value function of the strategy μ is given by formula (2):
Q_μ(e_t, a_t) = E_μ[ Σ_{k=0}^{∞} γ^k·r_{t+k} ]   (2)
In formula (2): e_t is the controller input at time t, comprising the motor tracking error vector and the synchronization error vector; a_t is the control quantity output by the controller at time t according to the input motor speed errors, comprising i_q and α = [α_1, α_2]; γ^k is the discount factor after k steps, with γ taken as 0.99; and r_{t+k} is the reward obtained by the controller k steps after taking a_t in the error state (e, e'), as given by formula (3), in which: n_i(t) is the actual rotation speed of the ith motor at time t; e_i(t) is the tracking error of the ith motor at time t, with the constant 0.1 preventing the tracking error from reaching 0 and the reward from tending to infinity; e_i(t)' is the synchronization error between the ith motor and the other motors.
The reward increases only when the tracking error and the synchronization error decrease, i.e. when the motor speeds approach the desired values while the motors keep running in coordination; the reward is maximal when the desired values are reached exactly and decreases otherwise. When the tracking error and the synchronization error are minimal the reward is maximal, and the control quantities output by the controller are then considered optimal: i_q and [α_1, α_2] best match the operating requirements of the motors at that moment.
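The discounted accumulation in formula (2) can be illustrated with the short Python sketch below: given a sequence of rewards r_t produced by the reward mechanism of formula (3), the return is the γ-discounted sum of the rewards that follow, with γ = 0.99 as stated in the text. The example reward values are arbitrary.

```python
def discounted_return(rewards, gamma=0.99):
    """Discounted cumulative reward as in formula (2), accumulated backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Example with an arbitrary reward sequence.
print(discounted_return([0.2, 0.4, 0.9, 1.0]))
```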
Step 3: finding optimal strategies
Since the deep deterministic policy gradient algorithm uses a deterministic strategy, the i_q and α output by the controller at each step can be obtained by evaluating the strategy μ, and an evaluation function J_π(μ) is defined to assess the new strategy learned by the current ML-PDDA algorithm, as in formula (4):
J_π(μ) = E[ Q_μ(e, μ(e)) ]   (4)
In formula (4): Q_μ is the Q value calculated by the value function for the i_q and α output by the strategy μ under the different motor speed error inputs, i.e. the cumulative reward obtained by the strategy μ, computed as in formula (2).
The optimal strategy is found by maximising the value of formula (4), i.e. the strategy μ that achieves the maximum cumulative reward, as in formula (5):
μ = argmax_μ J_π(μ)   (5)
Taking the partial derivative of formula (4) with respect to the parameters θ^μ of the strategy μ gives the strategy gradient, as in formula (6):
∇_{θ^μ} J_π(μ) = E[ ∇_a Q_μ(e, a | θ^Q) |_{a=μ(e)} · ∇_{θ^μ} μ(e | θ^μ) ]   (6)
The strategy network parameters are then updated with the gradient descent method, as in formula (7):
θ^μ ← θ^μ + η·∇_{θ^μ} J_π(μ)   (7)
In formula (7): θ^μ are the strategy network parameters and η is the learning rate of the strategy network.
The strategy network is thus updated by solving for the strategy μ that yields the maximum cumulative reward, so that it is updated in the direction of the i_q and [α_1, α_2] that achieve the maximum reward.
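A minimal PyTorch sketch of this step-3 update is shown below: J_π(μ) = E[Q(e, μ(e))] is estimated on a sampled batch of error states and the strategy network parameters are moved along its gradient, which corresponds to formulas (4), (6) and (7). It assumes the StrategyNetwork and EvaluationNetwork sketches from step 1; the optimiser and its learning rate are assumptions.

```python
def update_strategy(strategy_net, evaluation_net, strategy_opt, e_batch):
    """One actor update: gradient ascent on J = E[Q(e, mu(e))]."""
    a = strategy_net(e_batch)                 # a = mu(e | theta_mu)
    j = evaluation_net(e_batch, a).mean()     # batch estimate of J_pi(mu)
    strategy_opt.zero_grad()
    (-j).backward()                           # minimise -J, i.e. ascend J
    strategy_opt.step()
    return j.item()
```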
Step 4: updating an evaluation network
An experience pool is established; the motor speed errors e and e' input to the controller, the outputs i_q and α, the corresponding reward r_t and the motor speed errors at the next moment are stored in the experience pool as one group of experience data, and the target networks draw experience groups from the pool to update the evaluation network parameters θ^Q.
The motor speed error e_{t+1} at the next moment is fed into the target strategy network to obtain the deterministic outputs i_q and α, denoted a_{t+1}; a_{t+1} and e_{t+1} are fused through the neural convolution network and taken together as the input of the target value network, giving the evaluation value Q' of the target network for a_{t+1}; the actual target evaluation y_t is then calculated as in formula (8):
y_t = r_t + γ·Q'(e_{t+1}, μ'(e_{t+1} | θ^{μ'}) | θ^{Q'})   (8)
In formula (8): μ'(e_{t+1} | θ^{μ'}) is the i_q and α output by the target strategy μ'; Q'(e_{t+1}, μ'(e_{t+1} | θ^{μ'})) is the evaluation of i_q and α by the target evaluation network; θ^{μ'} and θ^{Q'} are respectively the target strategy and target evaluation network parameters.
An error function L is also established to calculate the error of the online evaluation network, which is updated by minimising this error, as in formula (9):
L = (1/N)·Σ_t ( y_t − Q(e_t, a_t | θ^Q) )²   (9)
where N is the length of the small training batch. Differentiating the loss function L with respect to the evaluation network parameters θ^Q gives formula (10):
∇_{θ^Q} L = −(2/N)·Σ_t ( y_t − Q(e_t, a_t | θ^Q) )·∇_{θ^Q} Q(e_t, a_t | θ^Q)   (10)
The evaluation network parameters are updated as in formula (11):
θ^Q ← θ^Q − η_Q·∇_{θ^Q} L   (11)
where η_Q is the learning rate of the evaluation network.
The evaluation network is updated through the loss function L so that it can calculate more accurately the rewards obtained by the control quantities output by the strategy network, and the ML-PDDA controller outputs the i_q and [α_1, α_2] that best match the actual running requirements of the motors. The online strategy network and the online evaluation network continuously update their parameters through the strategy gradient and the loss function, while the target strategy network and the target evaluation network are updated during small-batch training through formula (12); this reduces the correlation between the cumulative reward Q calculated by the online evaluation network and the cumulative reward Q' calculated by the target evaluation network and improves the effectiveness of the i_q and [α_1, α_2] output by the online strategy network:
θ^{μ'} ← k·θ^μ + (1 − k)·θ^{μ'},  θ^{Q'} ← k·θ^Q + (1 − k)·θ^{Q'}   (12)
In formula (12): θ^{μ'} and θ^{Q'} are respectively the action parameters with which the target strategy network outputs i_q and [α_1, α_2] and the value function parameters with which the target evaluation network evaluates i_q and [α_1, α_2]; θ^μ and θ^Q are respectively the online strategy and online evaluation network parameters; k is the learning rate, taken as 0.001.
The data sets stored in the experience pool are used to train and update the evaluation and strategy network parameters, so that the controller outputs the control quantity i_q and the error weights α to act on the controlled motors, and the motor speeds and errors are fed back to the controller, completing the iterative training. When the underwater robot enters unfamiliar waters, the experience pool serves to accumulate experience data. Through this accumulation, when the underwater robot needs to change course and resist external disturbances, the controller can, following the instructions of the upper computer, quickly output the i_q and α that produce the maximum reward, so that the multi-motor tracking errors and synchronization errors are minimal and the motors run at the desired speeds; at the same time, coupling the weighted synchronization errors between the motors maintains a constant speed difference, giving the controller fast dynamic response and strong robustness.
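The step-4 update can be illustrated with the following PyTorch sketch: a group of experience (e_t, a_t, r_t, e_{t+1}) sampled from the pool is used to form the target y_t of formula (8), the squared error L of formulas (9)-(11) is minimised for the online evaluation network, and the target networks are softly updated with rate k = 0.001 as in formula (12). The replay-buffer layout, tensor shapes and optimiser choice are assumptions for illustration; the networks are the step-1 sketches.

```python
import torch
import torch.nn.functional as F

def update_evaluation(batch, actor, critic, actor_tgt, critic_tgt,
                      critic_opt, gamma=0.99, k=0.001):
    # batch: tensors sampled from the experience pool; r_t shaped (batch, 1).
    e_t, a_t, r_t, e_next = batch
    with torch.no_grad():
        # Target evaluation y_t = r_t + gamma * Q'(e_{t+1}, mu'(e_{t+1})), formula (8).
        y_t = r_t + gamma * critic_tgt(e_next, actor_tgt(e_next))
    loss = F.mse_loss(critic(e_t, a_t), y_t)   # error function L, formula (9)
    critic_opt.zero_grad()
    loss.backward()                            # gradient of L, formulas (10)-(11)
    critic_opt.step()
    # Soft update of the target networks, formula (12), with k = 0.001.
    for tgt, src in ((actor_tgt, actor), (critic_tgt, critic)):
        for p_tgt, p in zip(tgt.parameters(), src.parameters()):
            p_tgt.data.copy_(k * p.data + (1.0 - k) * p_tgt.data)
    return loss.item()
```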
The foregoing has outlined and described the basic principles, features, and advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that the above embodiments and descriptions are merely illustrative of the principles of the present invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, which is defined in the appended claims. The scope of the invention is defined by the appended claims and equivalents thereof.
Claims (6)
1. A multi-motor coupling drive control device for an underwater robot, consisting of a multi-motor mutual coupling algorithm and a deep deterministic policy gradient algorithm controller, characterised by comprising three modules: a single motor control module, a synchronization error weight distribution module and a multi-motor mutual coupling control module, wherein the single motor control module consists of an ML-PDDA algorithm controller and a permanent magnet synchronous motor; the synchronization error weight distribution module uses the evaluation-reward mechanism of the ML-PDDA algorithm to tune the weight factors of the synchronization errors, the optimal weight factors being obtained when the reward produced by the output weight factors is maximal, and the weighted synchronization errors are input to the controller as state quantities, better reflecting the cooperation between the motors; the main propulsion motor of the underwater robot, i.e. the 1st motor, has the largest power, so its desired rotation speed is defined as the reference speed; the actual speed of the ith motor is denoted n_i and its synchronization error with the remaining motors e_i'; taking the 1st motor as an example, the synchronization error e_1' is calculated by formula (1):
e_1' = α_1·|n_1 − n_2| + α_2·|n_1 − n_3|   (1)
In formula (1): α_1 and α_2 are the error weight factors set by the ML-PDDA algorithm, and n_1, n_2, n_3 are the actual rotation speeds of the motors.
2. The multi-motor coupling driving control method for an underwater robot, which is characterized by using the multi-motor coupling driving control device for an underwater robot according to claim 1, specifically comprising the following steps:
step 1: designing a strategy network and an evaluation network;
Step 2: constructing a value function;
Step 3: searching an optimal strategy;
step 4: updating the evaluation network.
3. The multi-motor coupling drive control method for an underwater robot according to claim 2, wherein in the step 1 design of the strategy network and the evaluation network, the strategy network consists of an input layer, two fully connected layers and an output layer; the state input layer is set to 6 nodes, corresponding to the 6 states formed by the tracking error and the synchronization error of each motor together with their backward differences and accumulations; the two fully connected layers have 200 nodes each; the output layer is set to 3 nodes, corresponding to the three control quantities i_q and [α_1, α_2]; the input layer and the output layer both use the ReLU function as the activation function; the evaluation network takes the 6 error state quantities and the 3 output control quantities of the motor together as its inputs, fuses them through a neural convolution network, feeds the 9 state quantities into the fully connected layers and finally outputs the evaluation value Q of the control quantities i_q and [α_1, α_2]; its input layer has the same number of nodes as the strategy network, and since the output layer produces only the single evaluation value Q, its node number is set to 1 and the Sigmoid function is used as the activation function.
4. The multi-motor coupling drive control method for an underwater robot according to claim 3, wherein in the step 2 construction of the value function, a value function Q(e, a) is constructed to evaluate the motor control quantity q-axis current i_q and the error weight vector α = [α_1, α_2] output by the strategy network and to train the strategy network and the evaluation network, the value function of the strategy μ being as in formula (2):
Q_μ(e_t, a_t) = E_μ[ Σ_{k=0}^{∞} γ^k·r_{t+k} ]   (2)
In formula (2): e_t is the controller input at time t, comprising the motor tracking error vector and the synchronization error vector; a_t is the control quantity output by the controller at time t according to the input motor speed errors, comprising i_q and α = [α_1, α_2]; γ^k is the discount factor after k steps, with γ taken as 0.99; r_{t+k} is the reward obtained by the controller k steps after taking a_t in the error state (e, e'), as given by formula (3), in which: n_i(t) is the actual rotation speed of the ith motor at time t; e_i(t) is the tracking error of the ith motor at time t, with the constant 0.1 preventing the tracking error from reaching 0 and the reward from tending to infinity; e_i(t)' is the synchronization error between the ith motor and the other motors.
5. The multi-motor coupling drive control method for an underwater robot according to claim 4, wherein in the step 3 search for the optimal strategy, an evaluation function J_π(μ) is defined to assess the new strategy learned by the current ML-PDDA algorithm, as in formula (4):
J_π(μ) = E[ Q_μ(e, μ(e)) ]   (4)
In formula (4): Q_μ is the Q value calculated by the value function for the i_q and α output by the strategy μ under the different motor speed error inputs, i.e. the cumulative reward obtained by the strategy μ, computed as in formula (2);
the optimal strategy, i.e. the strategy μ achieving the largest cumulative reward, is found by maximising the value of formula (4), as in formula (5):
μ = argmax_μ J_π(μ)   (5)
the partial derivative of formula (4) with respect to the parameters θ^μ of the strategy μ gives the strategy gradient, as in formula (6):
∇_{θ^μ} J_π(μ) = E[ ∇_a Q_μ(e, a | θ^Q) |_{a=μ(e)} · ∇_{θ^μ} μ(e | θ^μ) ]   (6)
the strategy network parameters are updated with the gradient descent method, as in formula (7):
θ^μ ← θ^μ + η·∇_{θ^μ} J_π(μ)   (7)
In formula (7): θ^μ are the strategy network parameters and η is the learning rate;
the strategy network is updated by solving for the strategy μ at the maximum cumulative reward, so that it is updated in the direction of the i_q and [α_1, α_2] that achieve the maximum reward.
6. The multi-motor coupling drive control method for an underwater robot according to claim 5, wherein in the step 4 update of the evaluation network, an experience pool is established; the motor speed errors e and e' input to the controller, the outputs i_q and α, the corresponding reward r_t and the motor speed errors at the next moment are stored in the experience pool as one group of experience data, and the target networks acquire experience data groups from the pool to update the evaluation network parameters θ^Q; the motor speed error e_{t+1} at the next moment is put into the target strategy network to obtain the deterministic outputs i_q and α, denoted a_{t+1}; a_{t+1} and e_{t+1} are fused through the neural convolution network and input together into the target value network to obtain the evaluation value Q' of the target network for a_{t+1}, and the actual target evaluation y_t is calculated as in formula (8):
y_t = r_t + γ·Q'(e_{t+1}, μ'(e_{t+1} | θ^{μ'}) | θ^{Q'})   (8)
In formula (8): μ'(e_{t+1} | θ^{μ'}) is the i_q and α output by the target strategy μ'; Q'(e_{t+1}, μ'(e_{t+1} | θ^{μ'})) is the evaluation of i_q and α by the target evaluation network; θ^{μ'} and θ^{Q'} are respectively the target strategy and target evaluation network parameters;
an error function L is also established to calculate the error of the online evaluation network, which is updated by minimising this error, as in formula (9):
L = (1/N)·Σ_t ( y_t − Q(e_t, a_t | θ^Q) )²   (9)
where N is the length of the small training batch; the loss function L is differentiated with respect to the evaluation network parameters θ^Q, as in formula (10):
∇_{θ^Q} L = −(2/N)·Σ_t ( y_t − Q(e_t, a_t | θ^Q) )·∇_{θ^Q} Q(e_t, a_t | θ^Q)   (10)
the evaluation network parameters are updated as in formula (11):
θ^Q ← θ^Q − η_Q·∇_{θ^Q} L   (11)
where η_Q is the learning rate of the evaluation network;
the evaluation network is updated through the loss function L so that it can more accurately calculate the rewards obtained by the control quantities output by the strategy network, and the ML-PDDA controller outputs the i_q and [α_1, α_2] that best match the actual running requirements of the motors; the online strategy network and the online evaluation network continuously update their parameters through the strategy gradient and the loss function, and the target strategy network and the target evaluation network are updated during small-batch training through formula (12), which reduces the correlation between the cumulative reward Q calculated by the online evaluation network and the cumulative reward Q' calculated by the target evaluation network and improves the effectiveness of the i_q and [α_1, α_2] output by the online strategy network:
θ^{μ'} ← k·θ^μ + (1 − k)·θ^{μ'},  θ^{Q'} ← k·θ^Q + (1 − k)·θ^{Q'}   (12)
In formula (12): θ^{μ'} and θ^{Q'} are respectively the action parameters with which the target strategy network outputs i_q and [α_1, α_2] and the value function parameters with which the target evaluation network evaluates i_q and [α_1, α_2]; θ^μ and θ^Q are respectively the online strategy and online evaluation network parameters; k is the learning rate, taken as 0.001.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111381879.5A CN114089633B (en) | 2021-11-19 | 2021-11-19 | Multi-motor coupling driving control device and method for underwater robot |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111381879.5A CN114089633B (en) | 2021-11-19 | 2021-11-19 | Multi-motor coupling driving control device and method for underwater robot |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114089633A CN114089633A (en) | 2022-02-25 |
CN114089633B true CN114089633B (en) | 2024-04-26 |
Family
ID=80302617
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111381879.5A Active CN114089633B (en) | 2021-11-19 | 2021-11-19 | Multi-motor coupling driving control device and method for underwater robot |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114089633B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118399799B (en) * | 2024-07-01 | 2024-08-20 | 深圳市浩瀚卓越科技有限公司 | Control method, device and equipment of pan-tilt motor and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200327411A1 (en) * | 2019-04-14 | 2020-10-15 | Di Shi | Systems and Method on Deriving Real-time Coordinated Voltage Control Strategies Using Deep Reinforcement Learning |
Patent Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104935217A (en) * | 2015-05-29 | 2015-09-23 | 天津大学 | Improved deviation coupling control method suitable for multi-motor system |
WO2018053187A1 (en) * | 2016-09-15 | 2018-03-22 | Google Inc. | Deep reinforcement learning for robotic manipulation |
CN107748566A (en) * | 2017-09-20 | 2018-03-02 | 清华大学 | A kind of underwater autonomous robot constant depth control method based on intensified learning |
CN108803321A (en) * | 2018-05-30 | 2018-11-13 | 清华大学 | Autonomous Underwater Vehicle Trajectory Tracking Control method based on deeply study |
CN110323981A (en) * | 2019-05-14 | 2019-10-11 | 广东省智能制造研究所 | A kind of method and system controlling permanent magnetic linear synchronous motor |
CN110406526A (en) * | 2019-08-05 | 2019-11-05 | 合肥工业大学 | Parallel hybrid electric energy management method based on adaptive Dynamic Programming |
JP2021034050A (en) * | 2019-08-21 | 2021-03-01 | 哈爾浜工程大学 | Auv action plan and operation control method based on reinforcement learning |
CN110597058A (en) * | 2019-08-28 | 2019-12-20 | 浙江工业大学 | Three-degree-of-freedom autonomous underwater vehicle control method based on reinforcement learning |
US10962976B1 (en) * | 2019-11-29 | 2021-03-30 | Institute Of Automation, Chinese Academy Of Sciences | Motion control method and system for biomimetic robotic fish based on adversarial structured control |
CN110936824A (en) * | 2019-12-09 | 2020-03-31 | 江西理工大学 | Electric automobile double-motor control method based on self-adaptive dynamic planning |
CN111966118A (en) * | 2020-08-14 | 2020-11-20 | 哈尔滨工程大学 | ROV thrust distribution and reinforcement learning-based motion control method |
CN112383248A (en) * | 2020-10-29 | 2021-02-19 | 浙江大学 | Model prediction current control method for dual-motor torque synchronization system |
CN112388636A (en) * | 2020-11-06 | 2021-02-23 | 广州大学 | DDPG multi-target genetic self-optimization triaxial delta machine platform and method |
CN112631315A (en) * | 2020-12-08 | 2021-04-09 | 江苏科技大学 | Multi-motor cooperative propulsion underwater robot path tracking method |
CN113031528A (en) * | 2021-02-25 | 2021-06-25 | 电子科技大学 | Multi-legged robot motion control method based on depth certainty strategy gradient |
CN113140104A (en) * | 2021-04-14 | 2021-07-20 | 武汉理工大学 | Vehicle queue tracking control method and device and computer readable storage medium |
Non-Patent Citations (3)
Title |
---|
Detection of dynamic features of ocean mesoscale eddies based on satellite altimetry data; Zhao Wentao; Yu Jiancheng; Zhang Aiqun; Li Yan; Journal of Marine Sciences (No. 03); full text *
Parameter calibration method for electric vehicle asynchronous motors based on deep deterministic policy gradient; Qi Xing; Zheng Changbao; Zhang Qian; Transactions of China Electrotechnical Society; 2020-10-25 (No. 20); full text *
DDPG adaptive control of a permanent magnet synchronous linear motor; Zhang Zhenyu; Zhang Yu; Chen Li; Zhang Dongbo; Micromotors (No. 04); full text *
Also Published As
Publication number | Publication date |
---|---|
CN114089633A (en) | 2022-02-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wang et al. | Reinforcement learning-based optimal tracking control of an unknown unmanned surface vehicle | |
CN108161934A (en) | A kind of method for learning to realize robot multi peg-in-hole using deeply | |
CN111176116B (en) | Closed-loop feedback control method for robot fish based on CPG model | |
CN111240345A (en) | Underwater robot trajectory tracking method based on double BP network reinforcement learning framework | |
CN111240344B (en) | Autonomous underwater robot model-free control method based on reinforcement learning technology | |
CN111273677B (en) | Autonomous underwater robot speed and heading control method based on reinforcement learning technology | |
Fang et al. | Autonomous underwater vehicle formation control and obstacle avoidance using multi-agent generative adversarial imitation learning | |
Song et al. | Guidance and control of autonomous surface underwater vehicles for target tracking in ocean environment by deep reinforcement learning | |
CN113485323B (en) | Flexible formation method for cascading multiple mobile robots | |
Kamanditya et al. | Elman recurrent neural networks based direct inverse control for quadrotor attitude and altitude control | |
CN114089633B (en) | Multi-motor coupling driving control device and method for underwater robot | |
CN117227758A (en) | Multi-level human intelligent enhanced automatic driving vehicle decision control method and system | |
CN111176122A (en) | Underwater robot parameter self-adaptive backstepping control method based on double BP neural network Q learning technology | |
Zhao et al. | Global path planning and waypoint following for heterogeneous unmanned surface vehicles assisting inland water monitoring | |
Wang et al. | A modified ALOS method of path tracking for AUVs with reinforcement learning accelerated by dynamic data-driven AUV model | |
CN117215197A (en) | Four-rotor aircraft online track planning method, four-rotor aircraft online track planning system, electronic equipment and medium | |
CN107450311A (en) | Inversion model modeling method and device and adaptive inverse control and device | |
Li et al. | Position errors and interference prediction-based trajectory tracking for snake robots | |
Wang et al. | Path Following Control for Unmanned Surface Vehicles: A Reinforcement Learning-Based Method With Experimental Validation | |
Wang et al. | Parameters Optimization‐Based Tracking Control for Unmanned Surface Vehicles | |
Sun et al. | Improved adaptive fuzzy control for unmanned surface vehicles with uncertain dynamics using high-power functions | |
Yao et al. | Research and comparison of automatic control algorithm for unmanned ship | |
Wen et al. | USV Trajectory Tracking Control Based on Receding Horizon Reinforcement Learning | |
Sun et al. | Unmanned aerial vehicles control study using deep deterministic policy gradient | |
Sibona et al. | EValueAction: a proposal for policy evaluation in simulation to support interactive imitation learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |