CN114089633B - Multi-motor coupling driving control device and method for underwater robot - Google Patents


Info

Publication number
CN114089633B
Authority
CN
China
Prior art keywords
motor
network
strategy
evaluation
error
Prior art date
Legal status
Active
Application number
CN202111381879.5A
Other languages
Chinese (zh)
Other versions
CN114089633A (en)
Inventor
王伟然
姚杰
葛慧林
智鹏飞
朱志宇
邱海洋
Current Assignee
Jiangsu University of Science and Technology
Original Assignee
Jiangsu University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Jiangsu University of Science and Technology
Priority to CN202111381879.5A
Publication of CN114089633A
Application granted
Publication of CN114089633B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric
    • G05B13/04: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric, involving the use of models or simulators
    • G05B13/042: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric, involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention relates to the technical field of underwater robot control, in particular to a multi-motor coupling driving control device and method for an underwater robot. To this end, the invention designs a multiple Lamb-vortex policy decision driven algorithm (ML-PDDA) controller to control the rotating speed of each motor and to assign weight factors online to the synchronization errors among the motors.

Description

Multi-motor coupling driving control device and method for underwater robot
Technical Field
The invention relates to the technical field of underwater robot control, in particular to a multi-motor coupling driving control device and method for an underwater robot.
Background
Underwater robots are widely used in many fields such as military applications, underwater resource exploration and underwater search and rescue, and with the development of technology they can replace humans in ever more difficult tasks. When performing such tasks, the motion trajectory, attitude keeping and similar aspects of the underwater robot must be controlled with high precision. However, the underwater environment is complex, and different environments impose different disturbances on the robot's motion and operation. A suitable control structure therefore needs to be designed so that several motors work together cooperatively, ensuring that the underwater robot can resist various underwater disturbances, move along a precise trajectory and complete subsequent operating tasks.
At present, the cooperative control of multiple motors mainly comprises the following algorithms:
(1) Parallel control
Each motor in the control system is given the same reference speed, and synchronization can be achieved only when the motor loads are strictly identical. Each motor can only feed back its own tracking error; synchronization errors between motors are not considered, and the motor control units are independent and uncoupled. When one motor unit is disturbed by the outside, the other motors receive no disturbance information, so coordinated multi-motor control cannot be achieved and disturbance rejection is poor. Clearly, this method cannot cope with the complex underwater environment.
(2) Master-slave control
The motors are arranged in a master-slave relationship: the output of the master motor is used as the speed reference input of the slave motor, so that the slave motor tracks the speed of the master. However, the master-slave control system has no feedback mechanism from the slaves to the master. If a motor at some level is disturbed, the motors above it receive no disturbance information, while every control unit at the next level must make a corresponding speed adjustment and pass it on downward in the same way; this causes large delays and poor disturbance rejection, a defect that also limits the use of the method.
(3) Virtual spindle control
The virtual spindle control system imitates the synchronization characteristics of a mechanical line shaft. After the input speed signal has been acted on by the virtual shaft, the output signal is used as the given signal of each drive unit, and the drive units track this given signal. Because the signal is obtained through the action of the virtual shaft and through filtering, the main reference value may deviate from the actual rotating speed of the motors.
(4) Cross-coupling control
The speed or position signals of two adjacent motors are compared and their difference is taken as a feedback signal which the system then tracks. The system can thus reflect the load change of either motor. However, this strategy is not applicable to more than two motors, because calculating the feedback quantities for more than two motors becomes very cumbersome.
(5) Deviation coupling control
Deviation coupling control feeds back, for each motor, the sum of its errors relative to all the other motors as a compensation signal, thereby realizing synchronous control of the multiple motors. However, the amount of computation increases greatly, and the method suffers from problems such as controller saturation failure during start-up.
When the underwater robot performs underwater operations, it must overcome the disturbances of the surrounding environment to remain in a stable state while also moving along the desired trajectory. The above multi-motor cooperative control algorithms aim to keep the rotating speeds of the motors absolutely synchronized; they can keep the underwater robot operating in a fixed attitude, but they cannot guarantee that it changes course smoothly as desired.
Disclosure of Invention
In order to solve the above technical problems, the invention discloses a multi-motor coupling driving control device and method for an underwater robot, which ensure that the underwater robot can resist the disturbances of the surrounding environment in a complex underwater setting and drive it accurately and stably. The device enables the underwater robot to run stably in a complex underwater environment, provides a strong guarantee for its underwater operations, improves its working efficiency, and reduces the risk of underwater work for personnel.
The invention assigns weight factors to the synchronization errors among the motors through a multiple Lamb-vortex policy decision driven algorithm (ML-PDDA) and, in combination with the different-speed operation of the motors, designs a multi-motor mutual coupling control structure. For a single-motor control unit, a controller is designed with the ML-PDDA algorithm; together with the multi-motor mutual coupling control system, it realizes control of the underwater robot driving device.
The invention adopts the following specific technical scheme:
The multi-motor coupling driving control device for an underwater robot consists of a multi-motor mutual coupling algorithm and a deep deterministic policy gradient algorithm controller, and specifically comprises the following three modules: a single-motor control module, a synchronization error weight distribution module and a multi-motor mutual coupling control module.
In this technical scheme, the single-motor control module consists of the ML-PDDA algorithm controller and a permanent magnet synchronous motor. The motor speed error is used as the controller input; combined with the vector control model of the permanent magnet synchronous motor, the ML-PDDA policy network produces the q-axis current control quantity of the motor model and the synchronization error weight factor α, realizing speed control of the motor and, together with the multi-motor mutual coupling control module, cooperative drive control of the underwater robot.
According to the invention, the synchronization error weight distribution module uses the evaluation-reward mechanism of the ML-PDDA algorithm to tune the weight factors of the synchronization errors. When the reward produced by the output weight factors is maximal, the optimal weight factors are obtained; the resulting synchronization error is fed to the controller as a state quantity and better reflects the degree of coordination among the motors. The main propulsion motor of the underwater robot, i.e. the 1st motor, has the largest power, so its desired rotating speed is defined as the reference speed. The actual rotating speed of the i-th motor is denoted n_i and its synchronization error with the other motors e_i'. Taking the 1st motor as an example, the synchronization error e_1' is calculated as in formula (1).
e'_1 = α_1 × |n_1 − n_2| + α_2 × |n_1 − n_3|   (1)
In formula (1): α_1 and α_2 are the error weight factors set by the ML-PDDA algorithm, and n_1, n_2, n_3 are the actual rotating speeds of the respective motors.
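By way of illustration only, the weighted synchronization error of formula (1) can be evaluated as in the following Python sketch; the function name and the three-motor layout are illustrative assumptions, and in practice the weight factors would come from the ML-PDDA policy network.

def synchronization_error(n, alpha):
    """Weighted synchronization error of motor 1 against motors 2 and 3, per formula (1).
    n     : [n1, n2, n3] actual rotating speeds of the three motors
    alpha : [alpha1, alpha2] error weight factors set by the ML-PDDA policy network
    """
    return alpha[0] * abs(n[0] - n[1]) + alpha[1] * abs(n[0] - n[2])

# Example: main propulsion motor at 1500 rpm, side-thrust and pitch motors slower
e1_sync = synchronization_error([1500.0, 1200.0, 900.0], [0.6, 0.4])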
The invention also discloses a multi-motor coupling driving control method of the underwater robot, which comprises the following steps:
step 1: designing a strategy network and an evaluation network;
Step 2: constructing a value function;
Step 3: searching an optimal strategy;
step 4: updating the evaluation network.
In step 1, the policy network and the evaluation network are designed. The policy network consists of an input layer, two fully connected layers and an output layer. The state input layer receives 6 states, namely the tracking error and the synchronization error of the motor together with their backward differences and accumulations, and is therefore given 6 nodes; the two fully connected layers have 200 and 200 nodes respectively; the output layer produces the three control quantities i_q and [α_1, α_2] and is therefore given 3 nodes; both the input layer and the output layer use the ReLU function as the activation function. The evaluation network structure is similar to the policy network: the 6 error state quantities and the 3 output control quantities of the motor are used together as its input, fused through a neural convolution network, and the 9 state quantities are fed into the fully connected layers; finally the evaluation value Q of the control quantities i_q and [α_1, α_2] is output. The number of nodes of its input layer is the same as that of the policy network; since the output layer holds only the single evaluation value Q, its node number is set to 1, and the Sigmoid function is used as the activation function.
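For concreteness, a minimal sketch of the two networks described above is given below in PyTorch; the layer sizes follow the text (a 6-200-200-3 policy network and a 9-200-200-1 evaluation network), while the use of PyTorch, the class names and the exact placement of the activation functions are assumptions made only for illustration.

import torch
import torch.nn as nn

class PolicyNetwork(nn.Module):
    """Policy (strategy) network: 6 error states -> i_q and [alpha_1, alpha_2]."""
    def __init__(self, state_dim=6, action_dim=3, hidden=200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),   # outputs i_q, alpha_1, alpha_2
        )
    def forward(self, e):
        return self.net(e)

class EvaluationNetwork(nn.Module):
    """Evaluation network: 6 error states + 3 control quantities -> single value Q."""
    def __init__(self, state_dim=6, action_dim=3, hidden=200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.Sigmoid(),
            nn.Linear(hidden, hidden), nn.Sigmoid(),
            nn.Linear(hidden, 1),            # single evaluation value Q
        )
    def forward(self, e, a):
        return self.net(torch.cat([e, a], dim=-1))   # fuse error states and control quantities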
In step 2, a value function Q(e, a) is constructed to evaluate the q-axis current i_q and the error weight vector α = [α_1, α_2] output by the policy network, and to train the policy network and the evaluation network. The value function of policy μ is as in formula (2):

Q^μ(e_t, a_t) = E_μ[ Σ_{k=0}^{∞} γ^k · r_{t+k} ]   (2)

In formula (2): e_t is the controller input at time t, comprising the motor tracking error vector and the synchronization error vector; a_t is the control quantity output by the controller at time t according to the input motor speed errors, comprising i_q and α = [α_1, α_2]; γ^k is the discount factor over k steps, with γ taken here as 0.99; r_{t+k} is the reward obtained by the controller k steps after taking a_t in the error state (e, e'), as shown in formula (3).
In formula (3): n_i(t) is the actual rotation speed of the i-th motor at time t; e_i(t) is the tracking error of the i-th motor at time t, the constant 0.1 preventing the reward from tending to infinity when the tracking error approaches 0; e_i(t)' is the synchronization error of the i-th motor with the other motors.
Only when the tracking error and the synchronization error decrease, that is, when the motor speeds approach the desired values while remaining coordinated, does the reward increase; if the desired values are reached exactly the reward is maximal, otherwise it decreases. The reward is largest when the tracking error and the synchronization error are smallest, and the control quantities output by the controller are then considered optimal: the i_q and [α_1, α_2] produced at that moment best meet the operating requirements of the motors.
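The exact expression of formula (3) is not reproduced here; purely as an illustration of the behaviour described above (the reward growing as both errors shrink, with the constant 0.1 keeping it finite), a reciprocal-form reward might be sketched as follows. The summation over motors and the equal weighting of tracking and synchronization terms are assumptions, not the patented formula.

def reward(tracking_errors, sync_errors):
    """Illustrative reciprocal-form reward: grows as tracking and synchronization
    errors shrink; the 0.1 offset prevents the reward from tending to infinity.
    This is an assumed form, not necessarily the exact formula (3)."""
    r = 0.0
    for e_i, e_sync_i in zip(tracking_errors, sync_errors):
        r += 1.0 / (abs(e_i) + 0.1) + 1.0 / (abs(e_sync_i) + 0.1)
    return r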
In step 3, the optimal policy is sought. Since the deep deterministic policy gradient algorithm adopts a deterministic policy, the i_q and α output by the controller at each step are obtained by evaluating the policy μ; an evaluation function J_π(μ) is defined to evaluate the new policy learned by the current ML-PDDA algorithm, as in formula (4).

J_π(μ) = E[Q^μ(e, μ(e))]   (4)

In formula (4): Q^μ is the Q value computed by the value function from the i_q and α output under policy μ, i.e. the cumulative reward obtained by policy μ for the different motor speed errors supplied to the controller, calculated as in formula (2).
The optimal policy is found by maximizing the value of formula (4), i.e. it is the policy μ that attains the maximum cumulative reward, as in formula (5).

μ = arg max_μ J_π(μ)   (5)

Taking the partial derivative of formula (4) with respect to the parameters θ^μ of the policy μ gives the policy gradient, as in formula (6):

∇_{θ^μ} J_π(μ) = E[ ∇_a Q(e, a | θ^Q) |_{a=μ(e)} · ∇_{θ^μ} μ(e | θ^μ) ]   (6)

The policy network parameters are then updated by the gradient descent method, as shown in formula (7).
In formula (7): θ^μ are the policy network parameters.
The policy network is updated by solving for the policy μ that attains the maximum cumulative reward, so that the network is updated in the direction of the i_q and [α_1, α_2] that yield the maximum reward.
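A compact sketch of this policy update, in the usual deep deterministic policy gradient form, is given below; the PyTorch optimizer interface and the batch handling are assumptions for illustration only.

def update_policy(policy_net, eval_net, policy_optimizer, e_batch):
    """Update the online policy network in the direction that maximizes the
    evaluation network's Q value (gradient ascent on J_pi(mu), formulas (4)-(7))."""
    a_batch = policy_net(e_batch)                      # i_q and [alpha_1, alpha_2]
    policy_loss = -eval_net(e_batch, a_batch).mean()   # minimizing -Q is maximizing Q
    policy_optimizer.zero_grad()
    policy_loss.backward()
    policy_optimizer.step()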
In step 4, the evaluation network is updated. An experience pool is established; the motor speed errors e and e' input to the controller, the output i_q and α, the corresponding reward r_t, and the motor speed errors at the next time step are stored in the experience pool as one group of experience data, and the target network draws experience data groups from the pool to update the evaluation network parameters θ^Q.
The motor speed error e_{t+1} at the next time step is fed into the target policy network to obtain the deterministic outputs i_q and α, denoted a_{t+1}; a_{t+1} and e_{t+1} are fused together through the neural convolution network and used jointly as the input of the target value network to obtain the evaluation value Q' of the target network for a_{t+1}; the actual evaluation y_t of the target network is then calculated as in formula (8).

y_t = r_t + γ·Q'(e_{t+1}, μ'(e_{t+1} | θ^{μ'}) | θ^{Q'})   (8)

In formula (8): μ'(e_{t+1} | θ^{μ'}) is the i_q and α output by the target policy μ'; Q'(e_{t+1}, μ'(e_{t+1} | θ^{μ'})) is the target evaluation network's evaluation of i_q and α; θ^{μ'} and θ^{Q'} are the target policy and target evaluation network parameters respectively.
At the same time an error function L is established to calculate the error of the online evaluation network, and the online evaluation network is updated by minimizing this error, as in formula (9):

L = (1/N) · Σ_t ( y_t − Q(e_t, a_t | θ^Q) )²   (9)

where N is the number of samples in a small training batch. The loss function L is differentiated with respect to the evaluation network parameters θ^Q, as in formula (10). The evaluation network parameters are updated as in formula (11).
The evaluation network is updated through the loss function L so that it computes more accurately the reward obtained by the control quantities output by the policy network, and the ML-PDDA controller therefore outputs the i_q and [α_1, α_2] that best match the actual operating requirements of the motors. The online policy network and the online evaluation network continually update their parameters through the policy gradient and the loss function, while the target policy network and the target evaluation network are updated during small-batch training through formula (12); this reduces the correlation between the cumulative reward Q calculated by the online evaluation network and the cumulative reward Q' calculated by the target evaluation network and improves the effectiveness of the i_q and [α_1, α_2] output by the online policy network.

θ^{μ'} ← k·θ^μ + (1 − k)·θ^{μ'},   θ^{Q'} ← k·θ^Q + (1 − k)·θ^{Q'}   (12)

In formula (12): θ^{μ'} and θ^{Q'} are respectively the action parameters with which the target policy network outputs i_q and [α_1, α_2] and the value-function parameters with which the target evaluation network evaluates i_q and [α_1, α_2]; θ^μ and θ^Q are respectively the online policy and online evaluation network parameters; k is the learning rate, taken as 0.001.
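A sketch of the evaluation-network update and the soft target update described by formulas (8) to (12) is given below; the mean-squared-error form of L and the PyTorch interfaces are assumptions consistent with common deep deterministic policy gradient practice rather than a literal transcription of the patent.

import torch
import torch.nn.functional as F

def update_evaluation(eval_net, target_policy, target_eval, eval_optimizer,
                      e_t, a_t, r_t, e_next, gamma=0.99):
    """One update of the online evaluation network against the target networks."""
    with torch.no_grad():
        a_next = target_policy(e_next)                   # mu'(e_{t+1} | theta_mu')
        y_t = r_t + gamma * target_eval(e_next, a_next)  # actual evaluation, formula (8)
    loss = F.mse_loss(eval_net(e_t, a_t), y_t)           # error function L, formula (9)
    eval_optimizer.zero_grad()
    loss.backward()                                      # gradient of L w.r.t. theta_Q, formula (10)
    eval_optimizer.step()                                # parameter update, formula (11)
    return loss.item()

def soft_update(target_net, online_net, k=0.001):
    """Soft target update of formula (12): theta' <- k*theta + (1 - k)*theta'."""
    for t_param, o_param in zip(target_net.parameters(), online_net.parameters()):
        t_param.data.copy_(k * o_param.data + (1.0 - k) * t_param.data)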
The data groups stored in the experience pool are used to train and update the policy and evaluation network parameters, so that the controller outputs the control quantity i_q and the error weight α to act on the controlled motor, and the motor speed and errors are fed back to the controller, completing the iterative training. When the underwater robot enters unfamiliar waters, the experience pool serves to accumulate experience data. Through this accumulation of experience data, when the underwater robot needs to change course and resist external disturbances, the controller can, according to the instructions of the host computer, quickly output the i_q and α that produce the maximum reward, so that the multi-motor tracking and synchronization errors are minimal and the motors run at the desired speeds; at the same time, through the coupling of the weighted synchronization errors among the motors, a constant speed difference can be maintained, giving the controller fast dynamic response and strong robustness.
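The experience pool described above behaves like a simple replay buffer; one possible sketch is shown below, where the capacity and the random small-batch sampling are assumptions.

import random
from collections import deque

class ExperiencePool:
    """Stores (e_t, a_t, r_t, e_{t+1}) groups of experience data for small-batch training."""
    def __init__(self, capacity=100000):
        self.buffer = deque(maxlen=capacity)

    def store(self, e_t, a_t, r_t, e_next):
        self.buffer.append((e_t, a_t, r_t, e_next))

    def sample(self, batch_size=64):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))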
The invention has the beneficial effects that:
1. In the existing multi-motor control schemes of virtual spindle, cross coupling and deviation coupling, each motor is given the same desired speed, and the controller eliminates the tracking and synchronization errors so that the motor speeds remain synchronized. This patent first sets a speed proportionality coefficient for each motor, so that each motor obtains a different desired speed according to the course change of the underwater robot; at the same time, synchronization error weights are assigned to match the different speeds, ensuring mutual coupling of the motors and strengthening the disturbance rejection of the multi-motor system. Existing multi-motor cooperative control schemes that assign a weight to the motor speed synchronization error still aim to keep the speeds of all motors identical; the weight cannot be tuned online according to the actual operating condition of the motors, which makes it difficult for the underwater robot to adjust its course in time and to resist environmental disturbances. The multi-motor mutual coupling control device designed in this patent is therefore better suited to the actual working environment of the underwater robot and more targeted;
2. The multiple Lamb-vortex policy decision driven algorithm (ML-PDDA) uses the perception capability of deep learning to solve the sequential decision problem in a high-dimensional state space effectively. The control methods currently applied in the field of multi-motor cooperative control, such as fuzzy logic, neural networks and model predictive control, require large amounts of past empirical data and complex mathematical models and converge slowly; the disturbance of the underwater environment on the underwater robot is nonlinear, the parameter variations inside the system are difficult to capture in a suitable mathematical model, and a good control effect is hard to obtain. The ML-PDDA algorithm introduces a water-flow disturbance on the basis of the PDDA algorithm, simulating the flow with several Lamb vortices to train the policy network, exploring a policy better suited to the underwater environment, improving training efficiency and stability, and allowing the underwater robot to adapt better to flow disturbances during motion. The ML-PDDA algorithm has good online learning capability: it can learn a mathematical model of the motor from the motor's input and output data and use it in the online network and the target network, making the learning process more stable and the model converge faster;
3. The mutual coupling control of the motors, combined with the multiple Lamb-vortex policy decision driven algorithm (ML-PDDA), allows the speed of each motor to be changed independently, while the driving device of the underwater robot is cooperatively controlled through the mutual coupling of the improved synchronization errors. The underwater robot can thus flexibly change course according to the instructions of the control system and resist surrounding disturbances.
Drawings
FIG. 1 is a schematic diagram of a control device of a multi-motor mutual coupling ML-PDDA control algorithm.
Fig. 2 is a diagram showing a synchronization error calculation structure in the present invention.
FIG. 3 is a flow chart of the synchronization error weight factor adjustment in the present invention.
FIG. 4 is a network structure diagram of the multiple Lamb-vortex policy decision driven algorithm (ML-PDDA) in the present invention.
Fig. 5 is a schematic diagram of a network structure according to the present invention.
Fig. 6 is a diagram showing an evaluation network configuration in the present invention.
Detailed Description
The present invention will be further described in detail with reference to the drawings and examples, which are only for the purpose of illustrating the invention and are not to be construed as limiting the scope of the invention.
In one embodiment, a multi-motor coupling driving control device for an underwater robot is composed of three modules: a single-motor control module, a synchronization error weight distribution module and a multi-motor mutual coupling control module. The single-motor control module consists of an ML-PDDA algorithm controller and a permanent magnet synchronous motor; the motor speed error is used as the controller input and, combined with the vector control model of the permanent magnet synchronous motor, the ML-PDDA policy network produces the q-axis current control quantity of the motor model and the synchronization error weight factor α, realizing speed control of the motor and, together with the multi-motor mutual coupling control module, cooperative drive control of the underwater robot, as shown in FIG. 1.
In the mutually coupled multi-motor driving control device of this embodiment, the host computer uses the desired speed of the main propulsion motor as a reference and adjusts the speed ratios of the side-thrust motor and the pitch motor according to the actual route of the underwater robot, so that the motors can cooperatively drive the underwater robot at different speeds. In the actual power configuration of an underwater robot there may be several propulsion motors and attitude-control motors (side-thrust and pitch-control motors); these can be decomposed and combined in the robot's own coordinate system and are classified by the direction of their acting force as main propulsion motor, side-thrust motor and pitch motor. For convenience in the following description, the main propulsion motor after thrust normalization is defined as the 1st motor, the side-thrust motor after thrust normalization as the 2nd motor, and the pitch motor after thrust normalization as the 3rd motor.
In FIG. 1, the desired speed n_ref is constant and the speed of the 1st motor is taken as the reference. The host computer adjusts the speed ratios R_2 and R_3 of motors 2 and 3 according to the motion route of the underwater robot, giving the actual desired speed n_ref1,2,3 of each motor; subtracting the actual motor speed n_1,2,3 yields the tracking error e. Because the actual desired speeds of the motors differ, the synchronization errors between the motors cannot be calculated directly; in FIG. 1 the weight factors α are assigned to the synchronization errors between the different motors and the synchronization error e' is calculated. Three states of e and of e', namely the error itself, its backward difference and its accumulation, are selected as the controller inputs. The six input state quantities are processed using the learning capability of the ML-PDDA algorithm, and the q-axis current control quantity i_q of the motor and the error weight α are output, completing precise control of the motors and realizing different-speed cooperative driving of the underwater robot.
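The six controller input states described above (error, backward difference and accumulation for both the tracking error and the synchronization error) can be assembled as in the following Python sketch; the class name and the bookkeeping of previous samples are assumptions for illustration.

class ErrorState:
    """Builds the 6-element controller input from tracking error e and synchronization error e_sync."""
    def __init__(self):
        self.prev_e = 0.0
        self.prev_e_sync = 0.0
        self.sum_e = 0.0
        self.sum_e_sync = 0.0

    def build(self, e, e_sync):
        d_e = e - self.prev_e                 # backward difference of tracking error
        d_e_sync = e_sync - self.prev_e_sync  # backward difference of synchronization error
        self.sum_e += e                       # accumulation of tracking error
        self.sum_e_sync += e_sync             # accumulation of synchronization error
        self.prev_e, self.prev_e_sync = e, e_sync
        return [e, d_e, self.sum_e, e_sync, d_e_sync, self.sum_e_sync]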
To reach the underwater work site from the surface according to the instructions of the central control system, the underwater robot must go through phases such as diving, advancing and surfacing, during which it has to adjust its direction of motion many times. Since the underwater robot has no rudder, the speed of each motor must be changed so that speed differences between the motors create steering/attitude-adjusting thrust and the robot moves along the specified trajectory. Conventional multi-motor cooperative control devices are generally applied in the chemical industry, where every motor in the system must keep the same speed; they therefore cannot meet the underwater robot's requirements of time-varying operation, high dynamic response and different-speed adjustment.
When the steering/attitude of the underwater robot is adjusted, the desired speeds of the three motors differ. This patent therefore designs a synchronization error weight distribution module: the weight factors of the synchronization errors are tuned using the evaluation-reward mechanism of the ML-PDDA algorithm; when the reward produced by the output weight factors is maximal, the optimal weight factors are obtained, and the resulting synchronization error is fed to the controller as a state quantity, better reflecting the degree of coordination among the motors.
The main propulsion motor of the underwater robot, i.e. the 1st motor, has the largest power, and its desired speed is therefore defined as the reference speed. The actual speed of the i-th motor is denoted n_i and its synchronization error with the other motors e_i'. Taking the 1st motor as an example, the synchronization error e_1' is calculated as in formula (1).
e'_1 = α_1 × |n_1 − n_2| + α_2 × |n_1 − n_3|   (1)
In formula (1): α_1 and α_2 are the error weight factors set by the ML-PDDA algorithm, and n_1, n_2, n_3 are the actual rotating speeds of the respective motors.
In formula (1), the error weight factors α_1 and α_2 are set by the policy network of the ML-PDDA controller, and the rewards under different weight factors are calculated by the reward mechanism of formula (3); the maximum reward is obtained only when both the tracking error and the synchronization error decrease. When the desired speeds given to the three motors are identical, the synchronization errors between the motors need no weight change; when the desired speeds are not identical, setting the synchronization error weight factors allows the motors to run cooperatively while maintaining a constant speed difference.
The synchronization error calculation module is shown in fig. 2.
In FIG. 2, the actual synchronization errors between motor 1 and motors 2 and 3 are calculated first; the error weight factors obtained by the ML-PDDA algorithm are then applied and the actual synchronization errors are recombined into the new synchronization error e'; the backward difference Δe' and the accumulation Σe' of e' are then calculated and used, together with e', as input quantities of the ML-PDDA controller and as feedback.
The learning capability of the ML-PDDA algorithm is used to tune the error weight factors, and the cumulative reward of the value function is maximized through the constructed value function and reward mechanism, as shown in FIG. 3. The reward r_t is taken as the index for evaluating the optimal control effect of the ML-PDDA controller on the motors. When the controller trains on the data, a small-batch training mode is adopted, the length of each small batch being the ratio of the total training time T_f to the controller sampling time T_s, i.e. T_f/T_s. The corresponding reward r_t is obtained in each training batch, the maximum reward r_tmax is taken as the optimum, and the synchronization error weight factor α at that moment is output, completing the control of the underwater robot drive by the multi-motor mutual coupling control device.
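The weight-factor selection of FIG. 3 can be summarized as in the following sketch: within one training batch of length T_f/T_s, the weight factor that produced the maximum reward r_tmax is retained. The list-of-pairs data structure is an assumption for illustration.

import math

def select_weight_factor(rewards_and_alphas, T_f, T_s):
    """Within one small batch of length ceil(T_f / T_s), return the synchronization
    error weight factor alpha that produced the maximum reward r_tmax."""
    batch_len = math.ceil(T_f / T_s)
    batch = rewards_and_alphas[-batch_len:]   # most recent batch of (reward, alpha) pairs
    r_tmax, best_alpha = max(batch, key=lambda pair: pair[0])
    return best_alpha, r_tmax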
FIG. 4 shows the structure of the ML-PDDA controller in FIG. 1. Using the strong learning capability of the ML-PDDA algorithm, the high-dimensional state-space sequential decision problem can be solved effectively: the synchronization error and the tracking error of the motor are selected as state quantities, and the ML-PDDA policy network outputs the q-axis current control quantity i_q of the motor and the error weight vector α = [α_1, α_2]. When the policy network is trained, a water-flow disturbance formed by superposing several Lamb vortices is introduced; training the policy network under this disturbance guides the controller to explore policies suited to the working environment of the underwater robot, improves the effectiveness of training and allows the ML-PDDA controller to adapt better to the underwater environment. Small-batch training is adopted, the length of each small batch being the ratio of the total training time T_f to the controller sampling time T_s rounded up, i.e. ⌈T_f/T_s⌉. A reward r_t is obtained in each training batch, and the maximum reward r_tmax within the time T_f is used to judge whether the control quantities output by the controller are optimal. According to formula (3), the maximum reward is obtained when the motor speed tracking error and synchronization error are smallest; the actual speed is then closest to the desired speed and the multi-motor synchronization effect is best. The control quantities i_q and α output at that moment are considered to give the best control effect in that state, and the result is stored in the experience pool.
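The water-flow disturbance used during training is described as a superposition of several Lamb vortices; one way such a flow field could be simulated is with the Lamb-Oseen velocity profile sketched below, where the two-dimensional simplification, the vortex parameters and the function names are assumptions for illustration.

import math

def lamb_oseen_velocity(x, y, x0, y0, gamma=1.0, r_c=0.5):
    """Velocity (u, v) at (x, y) induced by a Lamb-Oseen vortex centred at (x0, y0);
    gamma is the circulation and r_c the core radius."""
    dx, dy = x - x0, y - y0
    r2 = dx * dx + dy * dy + 1e-9            # avoid division by zero at the centre
    r = math.sqrt(r2)
    v_theta = gamma / (2.0 * math.pi * r) * (1.0 - math.exp(-r2 / (r_c * r_c)))
    return -v_theta * dy / r, v_theta * dx / r   # tangential direction (-dy, dx)/r

def superposed_disturbance(x, y, vortices):
    """Sum the contributions of several Lamb vortices ((x0, y0, gamma, r_c) tuples)
    to emulate the water-flow disturbance injected during policy-network training."""
    u = v = 0.0
    for vortex in vortices:
        du, dv = lamb_oseen_velocity(x, y, *vortex)
        u += du
        v += dv
    return u, v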
Because the working environment of the underwater robot is complex, when the robot needs to change course it is not sufficient for the host computer merely to adjust the desired input speed of each motor and for the ML-PDDA controller to output the motor control quantity i_q; the surrounding nonlinear disturbances are then difficult to resist, and the motors need to cooperate with one another. When the course is changed, the controller assigns weight factors to the synchronization errors of the motors to obtain new synchronization errors, which are fed to the controller as state quantities; the controller combines the tracking errors and synchronization errors to output i_q, so that a constant speed difference can be maintained between the motors; through the calculation of the new synchronization errors the motors are mutually coupled and multi-motor cooperative control is realized. The ML-PDDA controller keeps assigning new weights to the synchronization errors until the underwater robot enters the next navigation state and the host computer changes the desired motor speeds, so that the underwater robot moves along the set route.
Step 1: design strategy network and evaluation network
The policy network consists of an input layer, two fully connected layers and an output layer. The state input layer receives 6 states, namely the tracking error and the synchronization error of the motor together with their backward differences and accumulations, and is therefore given 6 nodes; the fully connected layers have 200 and 200 nodes respectively; the output layer produces the three control quantities i_q and [α_1, α_2] and is therefore given 3 nodes; both the input layer and the output layer use the ReLU function as the activation function.
The evaluation network structure is similar to the policy network: the 6 error state quantities and the 3 output control quantities of the motor are used together as the input quantities of the evaluation network, fused through a neural convolution network, and the 9 state quantities are fed into the fully connected layers; finally the evaluation value Q of the control quantities i_q and [α_1, α_2] is output. The number of nodes of the input layer is the same as that of the policy network; since the output layer holds only the single evaluation value Q, its node number is set to 1, and the Sigmoid function is used as the activation function.
Step 2: building a value function
A value function Q(e, a) is constructed to evaluate the q-axis current i_q and the error weight vector α = [α_1, α_2] output by the policy network, and to train the policy network and the evaluation network; the value function of policy μ is as in formula (2).
In formula (2): e_t is the controller input at time t, comprising the motor tracking error vector and the synchronization error vector; a_t is the control quantity output by the controller at time t according to the input motor speed errors, comprising i_q and α = [α_1, α_2]; γ^k is the discount factor over k steps, with γ taken here as 0.99; r_{t+k} is the reward obtained by the controller k steps after taking a_t in the error state (e, e'), as shown in formula (3).
In formula (3): n_i(t) is the actual rotation speed of the i-th motor at time t; e_i(t) is the tracking error of the i-th motor at time t, the constant 0.1 preventing the reward from tending to infinity when the tracking error approaches 0; e_i(t)' is the synchronization error of the i-th motor with the other motors.
Only when the tracking error and the synchronization error decrease, that is, when the motor speeds approach the desired values while remaining coordinated, does the reward increase; if the desired values are reached exactly the reward is maximal, otherwise it decreases. The reward is largest when the tracking error and the synchronization error are smallest, and the control quantities output by the controller are then considered optimal: the i_q and [α_1, α_2] produced at that moment best meet the operating requirements of the motors.
Step 3: finding optimal strategies
Since the deep deterministic policy gradient algorithm adopts a deterministic policy, the i_q and α output by the controller at each step are obtained by evaluating the policy μ; an evaluation function J_π(μ) is defined to evaluate the new policy learned by the current ML-PDDA algorithm, as in formula (4).

J_π(μ) = E[Q^μ(e, μ(e))]   (4)

In formula (4): Q^μ is the Q value computed by the value function from the i_q and α output under policy μ, i.e. the cumulative reward obtained by policy μ for the different motor speed errors supplied to the controller, calculated as in formula (2).
The optimal policy is found by maximizing the value of formula (4), i.e. it is the policy μ that attains the maximum cumulative reward, as in formula (5).

μ = arg max_μ J_π(μ)   (5)

Taking the partial derivative of formula (4) with respect to the parameters θ^μ of the policy μ gives the policy gradient, as in formula (6).
The policy network parameters are then updated by the gradient descent method, as shown in formula (7).
In formula (7): θ^μ are the policy network parameters.
The policy network is updated by solving for the policy μ that attains the maximum cumulative reward, so that the network is updated in the direction of the i_q and [α_1, α_2] that yield the maximum reward.
Step 4: updating an evaluation network
An experience pool is established; the motor speed errors e and e' input to the controller, the output i_q and α, the corresponding reward r_t, and the motor speed errors at the next time step are stored in the experience pool as one group of experience data, and the target network draws experience data groups from the pool to update the evaluation network parameters θ^Q.
The motor speed error e_{t+1} at the next time step is fed into the target policy network to obtain the deterministic outputs i_q and α, denoted a_{t+1}; a_{t+1} and e_{t+1} are fused together through the neural convolution network and used jointly as the input of the target value network to obtain the evaluation value Q' of the target network for a_{t+1}; the actual evaluation y_t of the target network is then calculated as in formula (8).

y_t = r_t + γ·Q'(e_{t+1}, μ'(e_{t+1} | θ^{μ'}) | θ^{Q'})   (8)

In formula (8): μ'(e_{t+1} | θ^{μ'}) is the i_q and α output by the target policy μ'; Q'(e_{t+1}, μ'(e_{t+1} | θ^{μ'})) is the target evaluation network's evaluation of i_q and α; θ^{μ'} and θ^{Q'} are the target policy and target evaluation network parameters respectively.
At the same time an error function L is established to calculate the error of the online evaluation network, and the online evaluation network is updated by minimizing this error, as in formula (9).
The loss function L is differentiated with respect to the evaluation network parameters θ^Q, as in formula (10).
The evaluation network parameter is updated as in equation (11).
The evaluation network is updated through the loss function L so that it computes more accurately the reward obtained by the control quantities output by the policy network, and the ML-PDDA controller therefore outputs the i_q and [α_1, α_2] that best match the actual operating requirements of the motors. The online policy network and the online evaluation network continually update their parameters through the policy gradient and the loss function, while the target policy network and the target evaluation network are updated during small-batch training through formula (12); this reduces the correlation between the cumulative reward Q calculated by the online evaluation network and the cumulative reward Q' calculated by the target evaluation network and improves the effectiveness of the i_q and [α_1, α_2] output by the online policy network.
In formula (12): θ^{μ'} and θ^{Q'} are respectively the action parameters with which the target policy network outputs i_q and [α_1, α_2] and the value-function parameters with which the target evaluation network evaluates i_q and [α_1, α_2]; θ^μ and θ^Q are respectively the online policy and online evaluation network parameters; k is the learning rate, taken as 0.001.
The data groups stored in the experience pool are used to train and update the policy and evaluation network parameters, so that the controller outputs the control quantity i_q and the error weight α to act on the controlled motor, and the motor speed and errors are fed back to the controller, completing the iterative training. When the underwater robot enters unfamiliar waters, the experience pool serves to accumulate experience data. Through this accumulation of experience data, when the underwater robot needs to change course and resist external disturbances, the controller can, according to the instructions of the host computer, quickly output the i_q and α that produce the maximum reward, so that the multi-motor tracking and synchronization errors are minimal and the motors run at the desired speeds; at the same time, through the coupling of the weighted synchronization errors among the motors, a constant speed difference can be maintained, giving the controller fast dynamic response and strong robustness.
The foregoing has outlined and described the basic principles, features, and advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that the above embodiments and descriptions are merely illustrative of the principles of the present invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, which is defined in the appended claims. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (6)

1. A multi-motor coupling driving control device for an underwater robot, consisting of a multi-motor mutual coupling algorithm and a deep deterministic policy gradient algorithm controller, characterized by comprising three modules: a single-motor control module, a synchronization error weight distribution module, and a multi-motor mutual coupling control module; the single-motor control module consists of an ML-PDDA algorithm controller and a permanent magnet synchronous motor; the synchronization error weight distribution module uses the evaluation-reward mechanism of the ML-PDDA algorithm to tune the weight factors of the synchronization errors, the optimal weight factors being obtained when the reward produced by the output weight factors is maximal, and the resulting synchronization error being fed to the controller as a state quantity, better reflecting the coordination among the motors; the main propulsion motor of the underwater robot, i.e. the 1st motor, has the largest power, so its desired rotating speed is defined as the reference speed; the actual rotating speed of the i-th motor is denoted n_i and its synchronization error with the other motors e_i'; taking the 1st motor as an example, the synchronization error e_1' is calculated as in formula (1):
e'_1 = α_1 × |n_1 − n_2| + α_2 × |n_1 − n_3|   (1)
In formula (1): α_1 and α_2 are the error weight factors set by the ML-PDDA algorithm, and n_1, n_2, n_3 are the actual rotating speeds of the respective motors.
2. A multi-motor coupling driving control method for an underwater robot, characterized by using the multi-motor coupling driving control device for an underwater robot according to claim 1, and specifically comprising the following steps:
step 1: designing a strategy network and an evaluation network;
Step 2: constructing a value function;
Step 3: searching an optimal strategy;
step 4: updating the evaluation network.
3. The multi-motor coupling driving control method for an underwater robot according to claim 2, wherein in step 1 of designing the policy network and the evaluation network, the policy network consists of an input layer, two fully connected layers and an output layer; the state input layer is provided with 6 nodes for the 6 states consisting of the tracking error and the synchronization error of the motor together with their backward differences and accumulations; the fully connected layers are provided with 200 and 200 nodes respectively; the output layer is provided with 3 nodes for the three control quantities i_q and [α_1, α_2]; the ReLU function is adopted as the activation function for the input layer and the output layer; the evaluation network structure takes the 6 error state quantities and the 3 output control quantities of the motor together as the input quantities of the evaluation network, fuses them through a neural convolution network, feeds the 9 state quantities into the fully connected layers, and finally outputs the evaluation value Q of the control quantities i_q and [α_1, α_2]; the number of nodes of its input layer is the same as that of the policy network, the output layer holds only the single evaluation value Q so its node number is set to 1, and the Sigmoid function is adopted as the activation function.
4. The multi-motor coupling driving control method for an underwater robot according to claim 3, wherein in step 2 of constructing the value function, a value function Q(e, a) is constructed to evaluate the q-axis current i_q and the error weight vector α = [α_1, α_2] output by the policy network and to train the policy network and the evaluation network, the value function of policy μ being as in formula (2):
In formula (2): e_t is the controller input at time t, comprising the motor tracking error vector and the synchronization error vector; a_t is the control quantity output by the controller at time t according to the input motor speed errors, comprising i_q and α = [α_1, α_2]; γ^k is the discount factor over k steps, with γ taken here as 0.99; r_{t+k} is the reward obtained by the controller k steps after taking a_t in the error state (e, e'), as shown in formula (3):
In formula (3): n_i(t) is the actual rotation speed of the i-th motor at time t; e_i(t) is the tracking error of the i-th motor at time t, the constant 0.1 preventing the reward from tending to infinity when the tracking error approaches 0; e_i(t)' is the synchronization error of the i-th motor with the other motors.
5. The multi-motor coupling driving control method for an underwater robot according to claim 4, wherein in step 3 of finding the optimal policy, an evaluation function J_π(μ) is defined to evaluate the new policy learned by the current ML-PDDA algorithm, as in formula (4):
J_π(μ) = E[Q^μ(e, μ(e))]   (4)
In formula (4): Q^μ is the Q value computed by the value function from the i_q and α output under policy μ, i.e. the cumulative reward obtained by policy μ for the different motor speed errors supplied to the controller, calculated as in formula (2);
the optimal policy, i.e. the policy μ that attains the maximum cumulative reward, is found by maximizing formula (4), as in formula (5):
μ = arg max_μ J_π(μ)   (5)
taking the partial derivative of formula (4) with respect to the parameters θ^μ of the policy μ gives the policy gradient, as in formula (6):
the policy network parameters are updated by the gradient descent method, as in formula (7):
in formula (7): θ^μ are the policy network parameters;
the policy network is updated by solving for the policy μ that attains the maximum cumulative reward, so that the network is updated in the direction of the i_q and [α_1, α_2] that yield the maximum reward.
6. The multi-motor coupling driving control method for an underwater robot according to claim 5, wherein in step 4 of updating the evaluation network, an experience pool is established; the motor speed errors e and e' input to the controller, the output i_q and α, the corresponding reward r_t, and the motor speed errors at the next time step are stored in the experience pool as one group of experience data; the target network draws an experience data group from the experience pool to update the evaluation network parameters θ^Q; the motor speed error e_{t+1} at the next time step is fed into the target policy network to obtain the deterministic outputs i_q and α, denoted a_{t+1}; a_{t+1} and e_{t+1} are fused together through the neural convolution network and used jointly as the input of the target value network to obtain the evaluation value Q' of the target network for a_{t+1}; the actual evaluation y_t of the target network is then calculated as in formula (8):
y_t = r_t + γ·Q'(e_{t+1}, μ'(e_{t+1} | θ^{μ'}) | θ^{Q'})   (8)
In formula (8): μ'(e_{t+1} | θ^{μ'}) is the i_q and α output by the target policy μ'; Q'(e_{t+1}, μ'(e_{t+1} | θ^{μ'})) is the target evaluation network's evaluation of i_q and α; θ^{μ'} and θ^{Q'} are the target policy and target evaluation network parameters respectively;
meanwhile, an error function L is established, the error of the online evaluation network is calculated, and the online evaluation network is updated by minimizing this error, as in formula (9):
the loss function L is differentiated with respect to the evaluation network parameters θ^Q, as in formula (10):
the evaluation network parameters are updated as in formula (11):
the evaluation network is updated through the loss function L so that it computes more accurately the reward obtained by the control quantities output by the policy network, and the ML-PDDA controller outputs the i_q and [α_1, α_2] that best match the actual operating requirements of the motors; the online policy network and the online evaluation network continually update their parameters through the policy gradient and the loss function, while the target policy network and the target evaluation network are updated during small-batch training through formula (12), which reduces the correlation between the cumulative reward Q calculated by the online evaluation network and the cumulative reward Q' calculated by the target evaluation network and improves the effectiveness of the i_q and [α_1, α_2] output by the online policy network:
In formula (12): θ^{μ'} and θ^{Q'} are respectively the action parameters with which the target policy network outputs i_q and [α_1, α_2] and the value-function parameters with which the target evaluation network evaluates i_q and [α_1, α_2]; θ^μ and θ^Q are respectively the online policy and online evaluation network parameters; k is the learning rate, taken as 0.001.
CN202111381879.5A 2021-11-19 2021-11-19 Multi-motor coupling driving control device and method for underwater robot Active CN114089633B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111381879.5A CN114089633B (en) 2021-11-19 2021-11-19 Multi-motor coupling driving control device and method for underwater robot

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111381879.5A CN114089633B (en) 2021-11-19 2021-11-19 Multi-motor coupling driving control device and method for underwater robot

Publications (2)

Publication Number Publication Date
CN114089633A CN114089633A (en) 2022-02-25
CN114089633B (en) 2024-04-26

Family

ID=80302617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111381879.5A Active CN114089633B (en) 2021-11-19 2021-11-19 Multi-motor coupling driving control device and method for underwater robot

Country Status (1)

Country Link
CN (1) CN114089633B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118399799B (en) * 2024-07-01 2024-08-20 深圳市浩瀚卓越科技有限公司 Control method, device and equipment of pan-tilt motor and storage medium


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200327411A1 (en) * 2019-04-14 2020-10-15 Di Shi Systems and Method on Deriving Real-time Coordinated Voltage Control Strategies Using Deep Reinforcement Learning

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104935217A (en) * 2015-05-29 2015-09-23 天津大学 Improved deviation coupling control method suitable for multi-motor system
WO2018053187A1 (en) * 2016-09-15 2018-03-22 Google Inc. Deep reinforcement learning for robotic manipulation
CN107748566A (en) * 2017-09-20 2018-03-02 清华大学 A kind of underwater autonomous robot constant depth control method based on intensified learning
CN108803321A (en) * 2018-05-30 2018-11-13 清华大学 Autonomous Underwater Vehicle Trajectory Tracking Control method based on deeply study
CN110323981A (en) * 2019-05-14 2019-10-11 广东省智能制造研究所 A kind of method and system controlling permanent magnetic linear synchronous motor
CN110406526A (en) * 2019-08-05 2019-11-05 合肥工业大学 Parallel hybrid electric energy management method based on adaptive Dynamic Programming
JP2021034050A (en) * 2019-08-21 2021-03-01 哈爾浜工程大学 Auv action plan and operation control method based on reinforcement learning
CN110597058A (en) * 2019-08-28 2019-12-20 浙江工业大学 Three-degree-of-freedom autonomous underwater vehicle control method based on reinforcement learning
US10962976B1 (en) * 2019-11-29 2021-03-30 Institute Of Automation, Chinese Academy Of Sciences Motion control method and system for biomimetic robotic fish based on adversarial structured control
CN110936824A (en) * 2019-12-09 2020-03-31 江西理工大学 Electric automobile double-motor control method based on self-adaptive dynamic planning
CN111966118A (en) * 2020-08-14 2020-11-20 哈尔滨工程大学 ROV thrust distribution and reinforcement learning-based motion control method
CN112383248A (en) * 2020-10-29 2021-02-19 浙江大学 Model prediction current control method for dual-motor torque synchronization system
CN112388636A (en) * 2020-11-06 2021-02-23 广州大学 DDPG multi-target genetic self-optimization triaxial delta machine platform and method
CN112631315A (en) * 2020-12-08 2021-04-09 江苏科技大学 Multi-motor cooperative propulsion underwater robot path tracking method
CN113031528A (en) * 2021-02-25 2021-06-25 电子科技大学 Multi-legged robot motion control method based on depth certainty strategy gradient
CN113140104A (en) * 2021-04-14 2021-07-20 武汉理工大学 Vehicle queue tracking control method and device and computer readable storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Detection of dynamic characteristics of ocean mesoscale eddies based on satellite altimetry data; Zhao Wentao; Yu Jiancheng; Zhang Aiqun; Li Yan; Journal of Marine Sciences (Issue 03); full text *
Parameter calibration method for electric-vehicle asynchronous motors based on deep deterministic policy gradient; Qi Xing; Zheng Changbao; Zhang Qian; Transactions of China Electrotechnical Society; 2020-10-25 (Issue 20); full text *
DDPG adaptive control of permanent magnet synchronous linear motors; Zhang Zhenyu; Zhang Yu; Chen Li; Zhang Dongbo; Micromotors (Issue 04); full text *

Also Published As

Publication number Publication date
CN114089633A (en) 2022-02-25

Similar Documents

Publication Publication Date Title
Wang et al. Reinforcement learning-based optimal tracking control of an unknown unmanned surface vehicle
CN108161934A (en) A kind of method for learning to realize robot multi peg-in-hole using deeply
CN111176116B (en) Closed-loop feedback control method for robot fish based on CPG model
CN111240345A (en) Underwater robot trajectory tracking method based on double BP network reinforcement learning framework
CN111240344B (en) Autonomous underwater robot model-free control method based on reinforcement learning technology
CN111273677B (en) Autonomous underwater robot speed and heading control method based on reinforcement learning technology
Fang et al. Autonomous underwater vehicle formation control and obstacle avoidance using multi-agent generative adversarial imitation learning
Song et al. Guidance and control of autonomous surface underwater vehicles for target tracking in ocean environment by deep reinforcement learning
CN113485323B (en) Flexible formation method for cascading multiple mobile robots
Kamanditya et al. Elman recurrent neural networks based direct inverse control for quadrotor attitude and altitude control
CN114089633B (en) Multi-motor coupling driving control device and method for underwater robot
CN117227758A (en) Multi-level human intelligent enhanced automatic driving vehicle decision control method and system
CN111176122A (en) Underwater robot parameter self-adaptive backstepping control method based on double BP neural network Q learning technology
Zhao et al. Global path planning and waypoint following for heterogeneous unmanned surface vehicles assisting inland water monitoring
Wang et al. A modified ALOS method of path tracking for AUVs with reinforcement learning accelerated by dynamic data-driven AUV model
CN117215197A (en) Four-rotor aircraft online track planning method, four-rotor aircraft online track planning system, electronic equipment and medium
CN107450311A (en) Inversion model modeling method and device and adaptive inverse control and device
Li et al. Position errors and interference prediction-based trajectory tracking for snake robots
Wang et al. Path Following Control for Unmanned Surface Vehicles: A Reinforcement Learning-Based Method With Experimental Validation
Wang et al. Parameters Optimization‐Based Tracking Control for Unmanned Surface Vehicles
Sun et al. Improved adaptive fuzzy control for unmanned surface vehicles with uncertain dynamics using high-power functions
Yao et al. Research and comparison of automatic control algorithm for unmanned ship
Wen et al. USV Trajectory Tracking Control Based on Receding Horizon Reinforcement Learning
Sun et al. Unmanned aerial vehicles control study using deep deterministic policy gradient
Sibona et al. EValueAction: a proposal for policy evaluation in simulation to support interactive imitation learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant