CN112904873A - Bionic robot fish control method and device based on deep reinforcement learning and storage medium


Info

Publication number
CN112904873A
Authority
CN
China
Prior art keywords
network
reinforcement learning
deep reinforcement
deep
bionic robot
Prior art date
Legal status
Granted
Application number
CN202110110948.2A
Other languages
Chinese (zh)
Other versions
CN112904873B
Inventor
李伟琨
陈浩
崔维成
宋长会
陈林柯
Current Assignee
Westlake University
Original Assignee
Westlake University
Priority date
Filing date
Publication date
Application filed by Westlake University filed Critical Westlake University
Priority to CN202210507310.7A (CN115390442A)
Priority to CN202110110948.2A (CN112904873B)
Publication of CN112904873A
Application granted
Publication of CN112904873B
Legal status: Active (current)

Classifications

    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric
    • G05B13/04: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric, involving the use of models or simulators
    • G05B13/042: Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric, involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00: Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/80: Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in fisheries management

Abstract

The invention provides a bionic robot fish control method and device based on deep reinforcement learning, and a storage medium, belonging to the technical field of bionic robot control. The method solves the problem that the prior art lacks a bionic robot fish joint motion control method, based on a deep reinforcement learning CPG network, aimed at the joint swimming control of the bionic robot fish. The invention comprises: S1, constructing an outer-layer bionic robot fish information network through deep learning and giving a preliminary instruction through interaction with the environment; S2, constructing an inner-layer CPG network for the preliminary instruction and giving specific joint movement instructions by constructing a movement model based on a central pattern generator. The invention has advantages such as being able to regulate the bionic fish in a complex underwater environment.

Description

Bionic robot fish control method and device based on deep reinforcement learning and storage medium
Technical Field
The invention belongs to the technical field of bionic robot control, and particularly relates to a bionic robot fish control method and device based on deep reinforcement learning and a storage medium thereof.
Background
Deep reinforcement learning mainly combines deep learning (Deep Learning) and reinforcement learning (Reinforcement Learning). The concept of deep learning originated from artificial neural networks (ANN). A deep learning model is usually formed by stacking multiple layers of nonlinear operation units, with the output of a lower layer serving as the input of a higher layer, so that abstract feature representations are learned from a large amount of training data and the distributed features of the data are discovered. Deep learning theory can effectively mine the deep features of data, and graph neural networks, an important branch of it, can effectively overcome the limitations of traditional neural networks in processing image data, making them one of the most important current research directions. A CPG (central pattern generator) is a neural network that can generate coordinated patterns of rhythmic activity without any rhythmic input from sensory feedback or a higher control center. Because of this good performance, CPG-based control has been widely used to generate various swimming modes, such as forward swimming, backward swimming and turning. Although many CPG model methods have been proposed, they are simple and have difficulty coping with complex underwater environments, and the degree of intelligence of such control methods is low. Research on motion control of bionic robot fish that integrates deep reinforcement learning with a CPG is still at an early stage; a bionic robot fish joint motion control method based on a deep reinforcement learning CPG network, aimed at the joint swimming control of the bionic robot fish, is lacking, and many studies lack concrete implementation schemes.
Disclosure of Invention
The invention aims to solve the problems in the prior art and provides a bionic robot fish control method and device based on deep reinforcement learning and a storage medium thereof.
The first object of the present invention can be achieved by the following technical solutions: a bionic robot fish control method based on deep reinforcement learning is characterized by comprising the following steps:
s1: constructing an outer-layer bionic robot fish information network through deep learning, and giving a preliminary instruction through interaction with the environment;
s2: constructing an inner CPG network aiming at the preliminary instruction, and giving a specific joint movement instruction by constructing a movement model based on a central pattern generator;
the working principle of the invention is as follows: the method has good applicability to the joint motion control of the bionic robot fish with multiple joints or multiple degrees of freedom in a complex underwater environment, is combined with a deep reinforcement learning network, and provides a joint motion common control method of the bionic robot fish, which integrates an outer bionic robot fish information network and an inner CPG network model, and can realize the intelligent autonomous high-efficiency swimming control of the bionic robot fish.
In the above method for controlling a bionic robot fish based on deep reinforcement learning, the outer-layer bionic robot fish information network comprises input information processed by a cooperative conversion method, a deep reinforcement learning network that uses the input information to generate the preliminary instruction, and a transmission interface to the inner-layer CPG network.
In the above method for controlling a bionic robot fish based on deep reinforcement learning, the cooperative conversion method comprises associating and labeling four consecutive image frames acquired by an external sensor of the bionic robot fish with two or more kinds of data such as depth and distance, and packing this multivariate data into structured data that the deep network can process directly, which serves as the input of the subsequent deep reinforcement learning network.
In the above method for controlling a bionic robot fish based on deep reinforcement learning, the deep reinforcement learning network adopts a deep Q-learning network to construct the deep network; the preliminary instruction for the movement of the bionic robot fish is generated through the good processing mechanism of the deep Q-learning network and its good capacity for interaction with the external environment, and after the deep network generates the preliminary instruction, it is input to the inner-layer CPG network interface.
In the above method for controlling a bionic robot fish based on deep reinforcement learning, the deep reinforcement learning network uses the DQN algorithm to construct a deep reinforcement learning framework, the multivariate data is input into this framework, and the framework generates the Q value corresponding to the input multivariate data by setting a target reward value; the Q value is generated as shown in formula (1):
Q*(s, a) = Σ_{s'} P_a(s, s')·(R_a(s, s') + γ·max_{a'} Q*(s', a'))   (1)
where P_a(s, s') represents the probability of transitioning from the current state s to the next state s', R(s, s') represents the reward obtained after performing the action in the current state, γ is a decay coefficient, and max_{a'} Q*(s', a') represents the operation of selecting the current maximum Q value. An estimate of Q is generated through the deep network, and the parameter update of the deep network is completed through the difference between the Q value and the estimate of Q, as shown in formula (2):
L(θ) = E[(R + γ·max_{a'} Q(s', a', θ) - Q(s, a, θ))²]   (2)
where L(θ) represents the loss function and E represents the expectation operator.
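As an illustrative, non-authoritative sketch of how the Q target of formula (1) and the loss of formula (2) could be computed in practice, the following Python/PyTorch fragment may be considered; the QNet architecture, the layer sizes and the discount value are assumptions made for illustration and are not specified by the invention.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Hypothetical Q-network: maps a packed state vector to one Q value per action."""
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.layers(s)

def dqn_loss(q_net: QNet, target_net: QNet, batch, gamma: float = 0.99) -> torch.Tensor:
    """Formula (2): L(theta) = E[(R + gamma * max_a' Q(s', a') - Q(s, a))^2]."""
    s, a, r, s_next, done = batch                          # a: LongTensor of action indices
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a; theta)
    with torch.no_grad():                                  # target term of formula (1)
        q_next = target_net(s_next).max(dim=1).values
        target = r + gamma * (1.0 - done) * q_next
    return nn.functional.mse_loss(q_sa, target)
```

Here the target term corresponds to R + γ·max_{a'} Q(s', a') of formula (1), and the mean squared error corresponds to the loss of formula (2).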
In the above method for controlling a bionic robot fish based on deep reinforcement learning, the inner-layer CPG network interface converts the preliminary instruction and transmits it into the inner-layer CPG network to realize the specific joint motion of the bionic robot fish; the generation formula of the specific joint motion model of the bionic robot fish is shown in formula (3):
(Formula (3): the coupled-oscillator model of the inner-layer CPG; it is given only as an image in the original publication.)
where u_i, v_i represent the different phase states of the i-th neuron, t is a direction control parameter, θ is the phase difference between neurons, ε_i and ω denote the amplitude and frequency of the neuron, and P_u, P_v are perturbation terms, where P_v = c_2·u_{i+1}·sinθ + c_1·u_{i+1}·cosθ, P_u = c_1·u_{i-1}·cosθ - c_2·v_{i-1}·sinθ, and c_1, c_2 are the neuron coupling coefficients. After the CPG model is constructed, the phase output is converted and then input to each joint of the bionic robot fish, as shown in formula (4):
Γ_i = ζ_i·v_i + θ_i   (4)
where Γ_i indicates the input of the i-th joint, ζ_i is the transformation coefficient of the corresponding joint, determined by its motor, and θ_i is the preliminary instruction coefficient generated by the upper-layer network. Finally, through the cooperation of the outer-layer deep reinforcement learning network and the inner-layer CPG network, the interaction and the intelligent, highly efficient swimming control of the bionic robot fish in a complex environment are completed.
In the above method for controlling a biomimetic robotic fish based on deep reinforcement learning, the inner layer CPG network interface includes a mechanism for decomposing, calibrating and transmitting a preliminary instruction.
In the above method for controlling a biomimetic robotic fish based on deep reinforcement learning, the inner CPG network includes a motion model based on a central pattern generator and a specific joint motion instruction conversion transmission mechanism.
The second object of the present invention can be achieved by the following technical solutions: a bionic robot fish joint motion control device based on a deep reinforcement learning CPG network comprises:
a computer terminal;
a controller;
one or more processors;
a memory;
and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing a method for deep reinforcement learning CPG network based biomimetic robotic fish joint motion control as described above.
The third object of the present invention can be achieved by the following technical solutions: a storage medium storing a computer program for use with a computer and a display, wherein the computer program is executable by a processor to perform the method for controlling joint motion of a biomimetic robotic fish based on a deep reinforcement learning CPG network as described above.
Compared with the prior art, the invention has the advantages of being able to regulate the bionic fish in a complex underwater environment, responsive adjustment, and the like.
Drawings
FIG. 1 is a schematic diagram of the bionic robot fish joint motion control based on deep reinforcement learning CPG.
FIG. 2 is a schematic diagram of the joint angle input of the bionic robot fish based on the inner-layer CPG network.
Detailed Description
The following are specific embodiments of the present invention and are further described with reference to the drawings, but the present invention is not limited to these embodiments.
As shown in fig. 1-2, the control method of the biomimetic robotic fish based on deep reinforcement learning is characterized by comprising the following steps:
s1: constructing an outer-layer bionic robot fish information network through deep learning, and giving a preliminary instruction through interaction with the environment;
s2: constructing an inner CPG network aiming at the preliminary instruction, and giving a specific joint movement instruction by constructing a movement model based on a central pattern generator;
the outer-layer bionic robot fish information network generates a bionic robot fish joint motion preliminary instruction through constructing a bionic robot fish information network based on deep reinforcement learning and environment interaction, and the inner-layer CPG network converts the preliminary instruction into a motion angle of a specific joint of the bionic robot fish through constructing a rhythm motion network based on CPG, so that joint motion control of the bionic robot fish is realized.
In further detail, the outer-layer bionic robot fish information network comprises input information processed by a cooperative conversion method, a deep reinforcement learning network that generates the preliminary instruction, and a transmission interface to the inner-layer CPG network. The input information is processed by the cooperative conversion method and then serves as the input; the preliminary movement instruction of the bionic robot fish is produced using a DQN algorithm framework and is input to the lower-layer interface; the motion angles of the specific joints of the bionic robot fish are then realized by constructing a CPG-based rhythmic motion network.
In further detail, the cooperative conversion method comprises associating and labeling four consecutive image frames acquired by an external sensor of the bionic robot fish with two or more kinds of data such as depth and distance, and packing this multivariate data into structured data that the deep network can process directly, which serves as the input of the subsequent deep reinforcement learning network. In other words, the image information is associated with the collected depth, distance and similar data and packed into structured data that the deep network can process, and this is then handled as the input information of the deep reinforcement learning network.
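Purely as a hedged illustration of the cooperative conversion described above (associating four consecutive frames with depth and distance readings and packing them into structured data), a minimal sketch follows; the array shapes, channel layout and normalisation are assumptions, not details given by the invention.

```python
import numpy as np

def pack_state(frames, depth: float, distance: float) -> np.ndarray:
    """Stack 4 consecutive grayscale frames and append depth/distance channels.

    frames: list of 4 arrays of shape (H, W), pixel values in [0, 255] (assumed).
    Returns a (6, H, W) array that a convolutional deep network can consume directly.
    """
    assert len(frames) == 4, "the method associates 4 consecutive frames"
    img = np.stack([f.astype(np.float32) / 255.0 for f in frames], axis=0)  # (4, H, W)
    h, w = img.shape[1:]
    # Broadcast scalar sensor readings into full planes so all channels share one layout.
    depth_plane = np.full((1, h, w), depth, dtype=np.float32)
    dist_plane = np.full((1, h, w), distance, dtype=np.float32)
    return np.concatenate([img, depth_plane, dist_plane], axis=0)           # (6, H, W)
```

Broadcasting the scalar sensor readings into full image planes is only one possible packing; the invention does not prescribe a particular structure for the labeled, associated data.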
In further detail, the deep reinforcement learning network adopts a deep Q-learning network to construct the deep network; the preliminary instruction for the movement of the bionic robot fish is generated through the good processing mechanism of the deep Q-learning network and its good capacity for interaction with the external environment, and after the deep network generates the preliminary instruction, it is input to the inner-layer CPG network interface.
In further detail, the deep reinforcement learning network uses the DQN algorithm to construct a deep reinforcement learning framework, the multivariate data is input into this framework, and the framework generates the Q value corresponding to the input multivariate data by setting a target reward value; the Q value is generated as shown in formula (1):
Q*(s, a) = Σ_{s'} P_a(s, s')·(R_a(s, s') + γ·max_{a'} Q*(s', a'))   (1)
where P_a(s, s') represents the probability of transitioning from the current state s to the next state s', R(s, s') represents the reward obtained after performing the action in the current state, γ is a decay coefficient, and max_{a'} Q*(s', a') represents the operation of selecting the current maximum Q value. An estimate of Q is generated through the deep network, and the parameter update of the deep network is completed through the difference between the Q value and the estimate of Q, as shown in formula (2):
L(θ) = E[(R + γ·max_{a'} Q(s', a', θ) - Q(s, a, θ))²]   (2)
where L(θ) represents the loss function and E represents the expectation operator; the network architecture also adopts mechanisms such as experience replay (memory playback) and a target network.
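The embodiment mentions memory playback and a target network without detailing them; the sketch below shows one conventional way these two mechanisms are realised, with the buffer capacity and the synchronisation helper chosen arbitrarily for illustration rather than taken from the invention.

```python
import random
from collections import deque

import torch

class ReplayMemory:
    """Fixed-size buffer of (s, a, r, s', done) transitions for memory playback."""
    def __init__(self, capacity: int = 10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        # transition: (state tensor, action int, reward float, next-state tensor, done flag)
        self.buffer.append(transition)

    def sample(self, batch_size: int):
        batch = random.sample(self.buffer, batch_size)
        s, a, r, s2, d = zip(*batch)
        return (torch.stack(s), torch.tensor(a), torch.tensor(r, dtype=torch.float32),
                torch.stack(s2), torch.tensor(d, dtype=torch.float32))

def sync_target(q_net, target_net):
    """Periodically copy online-network weights into the target network."""
    target_net.load_state_dict(q_net.state_dict())
```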
In further detail, the inner-layer CPG network interface converts the preliminary instruction and transmits it into the inner-layer CPG network to realize the specific joint motion of the bionic robot fish; the generation formula of the specific joint motion model of the bionic robot fish is shown in formula (3):
(Formula (3): the coupled-oscillator model of the inner-layer CPG; it is given only as an image in the original publication.)
where u_i, v_i represent the different phase states of the i-th neuron, t is a direction control parameter, θ is the phase difference between neurons, ε_i and ω denote the amplitude and frequency of the neuron, and P_u, P_v are perturbation terms:
P_v = c_2·u_{i+1}·sinθ + c_1·u_{i+1}·cosθ,   P_u = c_1·u_{i-1}·cosθ - c_2·v_{i-1}·sinθ
where c_1, c_2 are the neuron coupling coefficients. After the CPG model is constructed, the phase output is converted and then input to each joint of the bionic robot fish, as shown in formula (4):
Γ_i = ζ_i·v_i + θ_i   (4)
where Γ_i indicates the input of the i-th joint, ζ_i is the transformation coefficient of the corresponding joint, determined by its motor, and θ_i is the preliminary instruction coefficient generated by the upper-layer network. Finally, through the cooperation of the outer-layer deep reinforcement learning network and the inner-layer CPG network, the interaction and the intelligent, highly efficient swimming control of the bionic robot fish in a complex environment are completed.
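Since formula (3) is available only as an image, the oscillator dynamics in the following sketch are a commonly used Hopf-style stand-in and should not be read as the invention's exact model; only the perturbation terms P_u, P_v and the joint mapping of formula (4) follow the definitions given above, and the integration step, parameter values and boundary handling are assumptions.

```python
import numpy as np

def cpg_step(u, v, eps, omega, theta, c1, c2, dt=0.01):
    """One Euler step of a chain of coupled oscillators (Hopf-style dynamics assumed).

    u, v : arrays holding the two phase states of each neuron.
    eps, omega : amplitude and frequency parameters; theta: inter-neuron phase difference.
    """
    n = len(u)
    du = np.zeros(n)
    dv = np.zeros(n)
    for i in range(n):
        r2 = u[i] ** 2 + v[i] ** 2
        # Perturbation terms from the text: couple each neuron to its neighbours.
        p_u = c1 * u[i - 1] * np.cos(theta) - c2 * v[i - 1] * np.sin(theta) if i > 0 else 0.0
        p_v = c2 * u[i + 1] * np.sin(theta) + c1 * u[i + 1] * np.cos(theta) if i < n - 1 else 0.0
        du[i] = (eps[i] - r2) * u[i] - omega * v[i] + p_u   # assumed Hopf-style dynamics
        dv[i] = (eps[i] - r2) * v[i] + omega * u[i] + p_v
    return u + dt * du, v + dt * dv

def joint_inputs(v, zeta, theta_bias):
    """Formula (4): Gamma_i = zeta_i * v_i + theta_i (theta_i supplied by the outer network)."""
    return zeta * v + theta_bias
```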
In further detail, the inner-layer CPG network interface includes a mechanism for decomposing, calibrating and transmitting the preliminary instruction; for example, "fast right turn" is decomposed (e.g., into "right turn" with a speed calibration) and transmitted into the inner-layer CPG network.
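As a hypothetical illustration of such a decomposition-calibration-transmission mechanism, the lookup table, parameter names and values below are invented for the example and are not taken from the invention.

```python
# Hypothetical lookup: each preliminary instruction is decomposed into a motion
# primitive and a speed calibration that scales the CPG frequency and phase bias.
COMMAND_TABLE = {
    "fast right turn": {"primitive": "right turn", "omega_scale": 1.5, "theta_bias": 0.4},
    "right turn":      {"primitive": "right turn", "omega_scale": 1.0, "theta_bias": 0.4},
    "forward":         {"primitive": "forward",    "omega_scale": 1.0, "theta_bias": 0.0},
}

def decompose(command: str, base_omega: float = 2.0):
    """Return (primitive, omega, theta_bias) to hand to the inner-layer CPG network (illustrative only)."""
    entry = COMMAND_TABLE[command]
    return entry["primitive"], base_omega * entry["omega_scale"], entry["theta_bias"]
```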
In further detail, the inner CPG network comprises a movement model based on a central pattern generator and a specific joint movement instruction conversion transmission mechanism.
The second object of the present invention can be achieved by the following technical solutions: a bionic robot fish joint motion control device based on a deep reinforcement learning CPG network comprises:
a computer terminal;
a controller;
one or more processors;
a memory;
and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing a method of deep reinforcement learning CPG network based biomimetic robotic fish joint motion control as described above.
The third object of the present invention can be achieved by the following technical solutions: a storage medium stores a computer program used in combination with a computer terminal and a display, and the computer program can be executed by a processor to implement the method for controlling the joint motion of the biomimetic robotic fish based on the deep reinforcement learning CPG network.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Those skilled in the art may make various modifications or additions to the described embodiments, or substitute them in similar ways, without departing from the spirit of the invention or the scope defined by the appended claims.
Although many terms are used here, the possibility of using other terms is not excluded. These terms are used merely to describe and explain the essence of the invention more conveniently; construing them as imposing any additional limitation would be contrary to the spirit of the invention.

Claims (10)

1. A bionic robot fish control method based on deep reinforcement learning is characterized by comprising the following steps:
s1: constructing an outer-layer bionic robot fish information network through deep learning, and giving a preliminary instruction through interaction with the environment;
s2: and constructing an inner CPG network aiming at the preliminary instruction, and giving a specific joint movement instruction by constructing a movement model based on a central pattern generator.
2. The bionic robotic fish control method based on deep reinforcement learning of claim 1, characterized in that: the outer-layer bionic robot fish information network comprises input information processed by a cooperative conversion method, a deep reinforcement learning network that uses the input information to generate the preliminary instruction, and a transmission interface to the inner-layer CPG network.
3. The bionic robotic fish control method based on deep reinforcement learning of claim 1, characterized in that: the cooperative conversion method comprises associating and labeling four consecutive image frames acquired by an external sensor of the bionic robot fish with two or more kinds of data such as depth and distance, and packing this multivariate data into structured data that the deep network can process directly, which serves as the input of the subsequent deep reinforcement learning network.
4. The bionic robotic fish control method based on deep reinforcement learning of claim 1, characterized in that: the deep reinforcement learning network adopts a deep Q-learning network to construct the deep network; the preliminary instruction for the movement of the bionic robot fish is generated through the good processing mechanism of the deep Q-learning network and its good capacity for interaction with the external environment, and after the deep network generates the preliminary instruction, it is input to the inner-layer CPG network interface.
5. The bionic robotic fish control method based on deep reinforcement learning of claim 1, characterized in that: the deep reinforcement learning network uses the DQN algorithm to construct a deep reinforcement learning framework, the multivariate data is input into this framework, and the framework generates the Q value corresponding to the input multivariate data by setting a target reward value; the Q value is generated as shown in formula (1):
Q*(s, a) = Σ_{s'} P_a(s, s')·(R_a(s, s') + γ·max_{a'} Q*(s', a'))   (1)
where P_a(s, s') represents the probability of transitioning from the current state s to the next state s', R(s, s') represents the reward obtained after performing the action in the current state, γ is a decay coefficient, and max_{a'} Q*(s', a') represents the operation of selecting the current maximum Q value. An estimate of Q is generated through the deep network, and the parameter update of the deep network is completed through the difference between the Q value and the estimate of Q, as shown in formula (2):
L(θ) = E[(R + γ·max_{a'} Q(s', a', θ) - Q(s, a, θ))²]   (2)
where L(θ) represents the loss function and E represents the expectation operator.
6. The bionic robotic fish control method based on deep reinforcement learning of claim 1, characterized in that: the inner-layer CPG network interface converts the preliminary instruction and transmits it into the inner-layer CPG network to realize the specific joint motion of the bionic robot fish, and the generation formula of the specific joint motion model of the bionic robot fish is shown in formula (3):
(Formula (3): the coupled-oscillator model of the inner-layer CPG; it is given only as an image in the original publication.)
where u_i, v_i represent the different phase states of the i-th neuron, t is a direction control parameter, θ is the phase difference between neurons, ε_i and ω denote the amplitude and frequency of the neuron, and P_u, P_v are perturbation terms, where P_v = c_2·u_{i+1}·sinθ + c_1·u_{i+1}·cosθ, P_u = c_1·u_{i-1}·cosθ - c_2·v_{i-1}·sinθ, and c_1, c_2 are the neuron coupling coefficients. After the CPG model is constructed, the phase output is converted and then input to each joint of the bionic robot fish, as shown in formula (4):
Γ_i = ζ_i·v_i + θ_i   (4)
where Γ_i indicates the input of the i-th joint, ζ_i is the transformation coefficient of the corresponding joint, determined by its motor, and θ_i is the preliminary instruction coefficient generated by the upper-layer network. Finally, through the cooperation of the outer-layer deep reinforcement learning network and the inner-layer CPG network, the interaction and the intelligent, highly efficient swimming control of the bionic robot fish in a complex environment are completed.
7. The bionic robotic fish control method based on deep reinforcement learning of claim 1, characterized in that: the inner CPG network interface comprises a decomposition, calibration and transmission mechanism of a preliminary instruction.
8. The bionic robotic fish control method based on deep reinforcement learning of claim 1, characterized in that: the inner CPG network comprises a movement model based on a central pattern generator and a specific joint movement instruction conversion transmission mechanism.
9. A bionic robot fish control device based on deep reinforcement learning, implementing the method as described in any one of claims 1 to 8, characterized by comprising:
a computer terminal;
a controller;
one or more processors;
a memory;
and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing a method for deep reinforcement learning CPG network based biomimetic robotic fish joint motion control as described above.
10. A storage medium, characterized in that: it stores a computer program used in combination with a computer and a display, and the computer program can be executed by a processor to implement the method for controlling the joint motion of the bionic robot fish based on a deep reinforcement learning CPG network according to any one of claims 1 to 8.
CN202110110948.2A 2021-01-26 2021-01-26 Bionic robot fish control method and device based on deep reinforcement learning Active CN112904873B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210507310.7A CN115390442A (en) 2021-01-26 2021-01-26 Bionic robot fish control method and device for deep reinforcement learning and storage medium
CN202110110948.2A CN112904873B (en) 2021-01-26 2021-01-26 Bionic robot fish control method and device based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110110948.2A CN112904873B (en) 2021-01-26 2021-01-26 Bionic robot fish control method and device based on deep reinforcement learning

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202210507310.7A Division CN115390442A (en) 2021-01-26 2021-01-26 Bionic robot fish control method and device for deep reinforcement learning and storage medium

Publications (2)

Publication Number Publication Date
CN112904873A true CN112904873A (en) 2021-06-04
CN112904873B CN112904873B (en) 2022-08-26

Family

ID=76118857

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202210507310.7A Pending CN115390442A (en) 2021-01-26 2021-01-26 Bionic robot fish control method and device for deep reinforcement learning and storage medium
CN202110110948.2A Active CN112904873B (en) 2021-01-26 2021-01-26 Bionic robot fish control method and device based on deep reinforcement learning

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202210507310.7A Pending CN115390442A (en) 2021-01-26 2021-01-26 Bionic robot fish control method and device for deep reinforcement learning and storage medium

Country Status (1)

Country Link
CN (2) CN115390442A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114609925A (en) * 2022-01-14 2022-06-10 中国科学院自动化研究所 Training method of underwater exploration strategy model and underwater exploration method of bionic machine fish

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101916071A (en) * 2010-08-04 2010-12-15 中国科学院自动化研究所 CPG feedback control method of biomimetic robot fish movement
CN110286592A (en) * 2019-06-28 2019-09-27 山东建筑大学 A kind of multi-modal movement technique of machine fish based on BP neural network and system
CN110488611A (en) * 2019-09-02 2019-11-22 山东建筑大学 A kind of biomimetic robot fish movement control method, controller and bionic machine fish
CN110909859A (en) * 2019-11-29 2020-03-24 中国科学院自动化研究所 Bionic robot fish motion control method and system based on antagonistic structured control
CN110989399A (en) * 2019-12-16 2020-04-10 山东建筑大学 Robot fish bionic control method and system fusing Spiking neural network and CPG
CN111158385A (en) * 2020-01-10 2020-05-15 南京工程学院 Motion control method, device and equipment of bionic robot fish and readable storage medium
CN111176116A (en) * 2020-01-02 2020-05-19 西安交通大学 Closed-loop feedback control method for robot fish based on CPG model

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101916071A (en) * 2010-08-04 2010-12-15 中国科学院自动化研究所 CPG feedback control method of biomimetic robot fish movement
CN110286592A (en) * 2019-06-28 2019-09-27 山东建筑大学 A kind of multi-modal movement technique of machine fish based on BP neural network and system
CN110488611A (en) * 2019-09-02 2019-11-22 山东建筑大学 A kind of biomimetic robot fish movement control method, controller and bionic machine fish
CN110909859A (en) * 2019-11-29 2020-03-24 中国科学院自动化研究所 Bionic robot fish motion control method and system based on antagonistic structured control
CN110989399A (en) * 2019-12-16 2020-04-10 山东建筑大学 Robot fish bionic control method and system fusing Spiking neural network and CPG
CN111176116A (en) * 2020-01-02 2020-05-19 西安交通大学 Closed-loop feedback control method for robot fish based on CPG model
CN111158385A (en) * 2020-01-10 2020-05-15 南京工程学院 Motion control method, device and equipment of bionic robot fish and readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CAFER BAL et al.: "CPG-based autonomous swimming control for multi-tasks of a biomimetic robotic fish", Ocean Engineering *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114609925A (en) * 2022-01-14 2022-06-10 中国科学院自动化研究所 Training method of underwater exploration strategy model and underwater exploration method of bionic machine fish
CN114609925B (en) * 2022-01-14 2022-12-06 中国科学院自动化研究所 Training method of underwater exploration strategy model and underwater exploration method of bionic machine fish

Also Published As

Publication number Publication date
CN112904873B (en) 2022-08-26
CN115390442A (en) 2022-11-25

Similar Documents

Publication Publication Date Title
US20220212342A1 (en) Predictive robotic controller apparatus and methods
Metta et al. A developmental approach to visually-guided reaching in artificial systems
Qiang et al. Reinforcement learning model, algorithms and its application
US20190184556A1 (en) Apparatus and methods for online training of robots
US20180260685A1 (en) Hierarchical robotic controller apparatus and methods
US10962976B1 (en) Motion control method and system for biomimetic robotic fish based on adversarial structured control
Liu et al. A hybrid control architecture for autonomous robotic fish
CN112904873B (en) Bionic robot fish control method and device based on deep reinforcement learning
Ghadirzadeh et al. Bayesian meta-learning for few-shot policy adaptation across robotic platforms
Lan et al. Learning locomotion skills in evolvable robots
Chao et al. Learning robotic hand-eye coordination through a developmental constraint driven approach
Kim et al. Learning and generalization of dynamic movement primitives by hierarchical deep reinforcement learning from demonstration
Hauser et al. Leveraging morphological computation for controlling soft robots: Learning from nature to control soft robots
Arie et al. Creating novel goal-directed actions at criticality: A neuro-robotic experiment
Mohan et al. How past experience, imitation and practice can be combined to swiftly learn to use novel “tools”: Insights from skill learning experiments with baby humanoids
Zhang et al. Robot path planning method based on deep reinforcement learning
CN105467841B (en) A kind of class nerve control method of humanoid robot upper extremity exercise
CN113967909A (en) Mechanical arm intelligent control method based on direction reward
JP2669626B2 (en) Robot control system
Crawford et al. Learning controllers for complex behavioral systems
Luo et al. Diffusion-based learning theory for organizing visuo-motor coordination
Qiu et al. Reinforcement Learning of Serpentine Locomotion for a Snake Robot
Chen et al. A Cerebellum-Inspired Control Scheme for Kinematic Control of Redundant Manipulators
JP3236361B2 (en) Motion control device
Yu et al. CPG-Based Swimming Control

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant