CN116974187A - Magnetic suspension dynamic control method and system based on deep reinforcement learning and disturbance observation - Google Patents


Info

Publication number: CN116974187A
Application number: CN202210429956.8A
Authority: CN (China)
Prior art keywords: electromagnet, levitation, gap, disturbance, dynamic
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 诸锜 (Zhu Qi), 王素梅 (Wang Sumei), 倪一清 (Ni Yiqing)
Current Assignee: Hong Kong Polytechnic University (HKPU)
Original Assignee: Hong Kong Polytechnic University (HKPU)
Application filed by Hong Kong Polytechnic University (HKPU); priority to CN202210429956.8A; published as CN116974187A
Classifications

    • G — PHYSICS
    • G05 — CONTROLLING; REGULATING
    • G05B — CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B 13/00 — Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B 13/02 — Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion: electric
    • G05B 13/04 — Adaptive control systems, electric, involving the use of models or simulators
    • G05B 13/042 — Adaptive control systems, electric, involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
    • B — PERFORMING OPERATIONS; TRANSPORTING
    • B60 — VEHICLES IN GENERAL
    • B60L — PROPULSION OF ELECTRICALLY-PROPELLED VEHICLES; SUPPLYING ELECTRIC POWER FOR AUXILIARY EQUIPMENT OF ELECTRICALLY-PROPELLED VEHICLES; ELECTRODYNAMIC BRAKE SYSTEMS FOR VEHICLES IN GENERAL; MAGNETIC SUSPENSION OR LEVITATION FOR VEHICLES; MONITORING OPERATING VARIABLES OF ELECTRICALLY-PROPELLED VEHICLES; ELECTRIC SAFETY DEVICES FOR ELECTRICALLY-PROPELLED VEHICLES
    • B60L 13/00 — Electric propulsion for monorail vehicles, suspension vehicles or rack railways; Magnetic suspension or levitation for vehicles
    • B60L 13/04 — Magnetic suspension or levitation for vehicles

Abstract

The invention provides a magnetic levitation dynamic control method and system based on deep reinforcement learning and disturbance observation. The method comprises the following steps. S1: constructing a nonlinear dynamics model of the levitation frame of a maglev train based on the levitation mechanism of the maglev train. S2: in a simulated interaction environment, acquiring the gap signal between the track and the vehicle body of the maglev train and the current signal of the vehicle-body electromagnet as training data, and training a dynamic controller. S3: acquiring the gap signal in real time and feeding the real-time gap signal to the trained dynamic controller to obtain the output control signal of the levitation system of the maglev train for controlling the electromagnet. The system of the invention enables dynamic real-time interaction and automatic learning between the controller and the levitation system, with high robustness and strong disturbance-rejection capability.

Description

Magnetic suspension dynamic control method and system based on deep reinforcement learning and disturbance observation
Technical Field
The invention relates to the field of transportation, in particular to a magnetic suspension dynamic control method and system based on deep reinforcement learning and disturbance observation.
Background
A maglev train is a modern means of transportation with non-contact electromagnetic levitation, guidance and drive systems: the train is levitated by electromagnetic attraction or repulsion, avoiding mechanical contact between train and track, and is driven by a linear motor. Maglev trains offer high speed, low energy consumption, a comfortable ride and low noise, making them an attractive means of transportation. According to the levitation principle, maglev trains can be classified into two types: electrodynamic suspension (EDS) and electromagnetic suspension (EMS). An electrodynamic system uses electromagnetic repulsion to levitate the train above the track, whereas an electromagnetic system uses the attraction force generated by electromagnets located below the track to levitate the train.
In terms of levitation control, unlike electrodynamic systems, the electromagnetic maglev trains currently in commercial operation require active control to achieve stable levitation; stable levitation is therefore the key and core of safe maglev operation. An electromagnetic levitation system is strongly nonlinear and open-loop unstable, and is affected by external disturbance factors during operation, so its parameters carry great uncertainty. However, most current control strategies, such as PID control, optimal control and fuzzy control, require a manually coded control strategy, which makes controller tuning cumbersome and leaves no self-learning capability; such controllers are also complicated to operate and show poor disturbance rejection under real-world conditions such as wind load and track irregularity. Achieving an adaptive dynamic controller design is therefore a major trend in electromagnetic levitation control.
Disclosure of Invention
To solve these problems, the invention provides a magnetic levitation dynamic control method and system based on deep reinforcement learning and disturbance observation.
The invention provides a magnetic suspension dynamic control method based on deep reinforcement learning and disturbance observation, which comprises the following steps:
S1: constructing a nonlinear dynamics model of the levitation frame of the maglev train based on the levitation mechanism of the maglev train, wherein the nonlinear dynamics model is used to simulate the interaction environment of the levitation-system controller;
S2: acquiring, in the simulated interaction environment, the gap signal between the track and the vehicle body of the maglev train and the current signal of the vehicle-body electromagnet as training data, and training a dynamic controller, wherein the gap signal serves as the input of the dynamic controller; the output of the deep reinforcement learning algorithm in the dynamic controller, combined with the disturbance observation of the disturbance observer, serves as the control signal of the levitation system; the control signal serves as the input of the nonlinear dynamics model of step S1, so that a feedback gap signal is obtained in the nonlinear dynamics model from the control signal; and the feedback gap signal is fed back as the input of the dynamic controller, the training process being cycled many times to obtain a trained dynamic controller;
S3: acquiring the gap signal in real time, and feeding the real-time gap signal to the trained dynamic controller to obtain the output control signal of the levitation system of the maglev train for controlling the electromagnet.
In one aspect, the nonlinear dynamics model is expressed as:

m·z̈(t) = mg − (μ₀AN²/4)·(i(t)/z(t))² + f_d
u(t) = R·i(t) + (μ₀N²A/(2z(t)))·di(t)/dt − (μ₀N²A·i(t)/(2z²(t)))·dz(t)/dt

where t is time, z(t) is the gap between the track and the vehicle body, i(t) is the current of the electromagnet coil, u(t) is the excitation voltage across the electromagnet, m is the load plus the weight of the electromagnet itself, A is the cross-sectional area of the electromagnet, N is the number of turns of the electromagnet coil, R is the resistance of the coil winding, μ₀ is the vacuum permeability, and f_d is the external disturbance of the levitation system.
In one aspect, the square of the electromagnet coil current is used as the control variable, u(t) = i²(t) with i²(t) ≥ 0, whereby the nonlinear dynamics model is expressed as:

m·z̈(t) = mg − (μ₀AN²/4)·u(t)/z²(t) + f_d

where t is time, m is the load plus the weight of the electromagnet itself, A is the cross-sectional area of the electromagnet, N is the number of turns of the electromagnet coil, μ₀ is the vacuum permeability, and f_d is the external disturbance of the levitation system.

Written as a nonlinear dynamic system with state x = (z, ż)ᵀ:

ẋ₁ = x₂,  m·ẋ₂ = mg − (μ₀AN²/4)·u(t)/x₁² + d(t)

where the disturbance term d(t) = f_d.
In one aspect, in step S2 the deep reinforcement learning algorithm employs the deep deterministic policy gradient (DDPG) algorithm.
In one aspect, in step S2, the reward function in the deep reinforcement learning algorithm is

r(t) = −(z(t) − z₀)² − 0.01·i²(t)

where z(t) is the real-time levitation gap value, z₀ is the target levitation gap value, and i²(t) is the output control variable of the deep reinforcement learning algorithm, i²(t) being computed in combination with the disturbance observation when interacting with the environment.
In one aspect, the disturbance observer is a nonlinear disturbance observer used to estimate the disturbance so that its effect on the control variable inside the system can be cancelled. For a system ẋ = f(x) + g₁(x)u + g₂(x)d, the disturbance observer is

ż = −l(x)·[g₂(x)·(z + p(x)) + f(x) + g₁(x)u],  d̂ = z + p(x)

where z is the internal state of the disturbance observer (not to be confused with the gap z(t)), d̂ is the disturbance estimate, p(x) is a nonlinear function to be designed, and l(x) is the disturbance observer gain, related to p(x) by

l(x) = ∂p(x)/∂x.
in one aspect, the levitation system includes the following components:
a gap sensor for sensing a gap between the rail and the vehicle body;
a chopper coupled to the gap sensor through a gap processing board, a control board, and an interface conversion board, the chopper configured to vary a current signal supplied to the levitation electromagnet according to a control signal;
a magnetic levitation controller coupled with the chopper, the magnetic levitation controller configured to process a current signal of an output of the chopper to achieve control of the levitation electromagnet; and
and the levitation electromagnet is configured to adjust the magnetic force of the levitation electromagnet on the track according to the signal output by the magnetic levitation controller.
The invention also provides a magnetic suspension dynamic control system based on deep reinforcement learning and disturbance observation, which comprises:
a data storage device for storing one or more programs;
the system comprises a construction module, a control module and a control module, wherein the construction module is used for constructing a nonlinear dynamics model of a suspension frame of the magnetic suspension train, wherein the nonlinear dynamics model of the suspension frame of the magnetic suspension train is constructed based on a suspension mechanism of the magnetic suspension train, and the nonlinear dynamics model is used for simulating an interaction environment of a controller of a suspension system;
the acquisition module is used for acquiring a gap signal of a gap between the magnetic levitation track and a vehicle body of the magnetic levitation train;
the analysis module is used for analyzing and controlling by using the dynamic controller through the gap signal which is input in the simulation environment and the control signal which is output, taking the gap signal which is obtained in real time as the input of the trained dynamic controller to obtain the control signal of the suspension system of the output magnetic suspension train, wherein the gap signal and the current signal of the electromagnet which is used for the train body are collected to be used as training data, training the dynamic controller, taking the gap signal as the input of the dynamic controller, taking the output of the deep reinforcement learning algorithm in the dynamic controller and the disturbance observation value of the disturbance observer as the control signal of the suspension system, taking the control signal as the input of the nonlinear dynamics model, obtaining the feedback gap signal according to the control signal in the nonlinear dynamics model, and inputting the feedback gap signal as the input of the dynamic controller again to perform the training process of multiple cycles to obtain the trained dynamic controller;
and the output module is used for outputting the actual control signal.
In one aspect, the nonlinear dynamics model is expressed as:

m·z̈(t) = mg − (μ₀AN²/4)·(i(t)/z(t))² + f_d
u(t) = R·i(t) + (μ₀N²A/(2z(t)))·di(t)/dt − (μ₀N²A·i(t)/(2z²(t)))·dz(t)/dt

where t is time, z(t) is the gap between the track and the vehicle body, i(t) is the current of the electromagnet coil, u(t) is the excitation voltage across the electromagnet, m is the load plus the weight of the electromagnet itself, A is the cross-sectional area of the electromagnet, N is the number of turns of the electromagnet coil, R is the resistance of the coil winding, μ₀ is the vacuum permeability, and f_d is the external disturbance of the levitation system.
In one aspect, the square of the electromagnet coil current is taken as the control variable, u(t) = i²(t) with i²(t) ≥ 0, whereby the nonlinear dynamics model is expressed as:

m·z̈(t) = mg − (μ₀AN²/4)·u(t)/z²(t) + f_d

where t is time, m is the load plus the weight of the electromagnet itself, A is the cross-sectional area of the electromagnet, N is the number of turns of the electromagnet coil, μ₀ is the vacuum permeability, and f_d is the external disturbance of the levitation system.

Written as a nonlinear dynamic system with state x = (z, ż)ᵀ:

ẋ₁ = x₂,  m·ẋ₂ = mg − (μ₀AN²/4)·u(t)/x₁² + d(t)

where the disturbance term d(t) = f_d.
In one aspect, the deep reinforcement learning algorithm employs the deep deterministic policy gradient (DDPG) algorithm.
In one aspect, the reward function in the deep reinforcement learning algorithm is

r(t) = −(z(t) − z₀)² − 0.01·i²(t)

where z(t) is the real-time levitation gap value, z₀ is the target levitation gap value, and i²(t) is the output control variable of the deep reinforcement learning algorithm, i²(t) being computed in combination with the disturbance observation when interacting with the environment.
In one aspect, the disturbance observer is a nonlinear disturbance observer used to estimate the disturbance so that its effect on the control variable inside the system can be cancelled. For a system ẋ = f(x) + g₁(x)u + g₂(x)d, the disturbance observer is

ż = −l(x)·[g₂(x)·(z + p(x)) + f(x) + g₁(x)u],  d̂ = z + p(x)

where z is the internal state of the disturbance observer (not to be confused with the gap z(t)), d̂ is the disturbance estimate, p(x) is a nonlinear function to be designed, and l(x) is the disturbance observer gain, related to p(x) by

l(x) = ∂p(x)/∂x.
in one aspect, the levitation system includes the following components:
a gap sensor for sensing a gap between the rail and the vehicle body;
a chopper coupled to the gap sensor through a gap processing board, a control board, and an interface conversion board, the chopper configured to vary a current signal supplied to the levitation electromagnet according to a control signal;
a magnetic levitation controller coupled with the chopper, the magnetic levitation controller configured to process a current signal of an output of the chopper to achieve control of the levitation electromagnet; and
and the levitation electromagnet is configured to adjust the magnetic force of the levitation electromagnet on the track according to the signal output by the magnetic levitation controller.
The invention also provides a magnetic levitation dynamic control system based on deep reinforcement learning and disturbance observation, comprising a server, the server comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer instructions, implements the magnetic levitation dynamic control method based on deep reinforcement learning and disturbance observation.
The invention also provides a computer storage medium on which a computer program is stored which, when executed by a processor, implements the magnetic levitation dynamic control method based on deep reinforcement learning and disturbance observation.
With the magnetic levitation dynamic control method and system based on deep reinforcement learning and disturbance observation, the levitation system is controlled by a dynamic controller, enabling dynamic real-time interaction and automatic learning between the controller and the levitation system, with high robustness and strong disturbance-rejection capability.
Drawings
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings. In which is shown:
fig. 1 is a schematic structural diagram of a magnetic levitation dynamic control system based on deep reinforcement learning and disturbance observation according to a preferred embodiment of the present invention.
Fig. 2 is a flowchart showing the steps of a magnetic levitation dynamic control method based on deep reinforcement learning and disturbance observation according to a preferred embodiment of the present invention.
Fig. 3 is a schematic diagram of a magnetic levitation control dynamics model based on deep reinforcement learning and disturbance observation according to a preferred embodiment of the present invention.
Fig. 4 is a schematic structural view showing a magnetic levitation dynamic controller according to a preferred embodiment of the present invention.
FIG. 5 is a block diagram of a deep reinforcement learning algorithm according to a preferred embodiment of the present invention.
Detailed Description
The detailed description set forth below is intended to describe various configurations of the subject technology and is not intended to represent only configurations in which the subject technology may be employed. The accompanying drawings are incorporated in and constitute a part of this detailed description. The detailed description includes specific details for the purpose of providing a thorough understanding of the subject technology. It will be apparent, however, to one skilled in the art that the subject technology is not limited to the specific details described herein, and may be practiced using one or more embodiments. In one or more instances, structures and components are shown in block diagram form in order to avoid obscuring the concepts of the subject technology. One or more embodiments of the present disclosure are illustrated by and/or described in connection with one or more diagrams.
The levitation system of an electromagnetic maglev train levitates the vehicle through the attraction force between the electromagnets mounted on the train and those located below the track. In an electromagnetic levitation system the levitation force decreases as the gap between track and vehicle body increases and increases as the gap decreases, so active control must be applied to keep the gap stable.
The levitation system of a maglev train according to a preferred embodiment of the present invention is shown in fig. 1. It comprises a magnetic levitation track 1 and a gap sensor 2 coupled with a levitation electromagnet 6, the levitation electromagnet 6 being connected with or located in the vehicle body 5, wherein the levitation electromagnet 6 interacts with the electromagnet on the track to maintain the levitated state. The gap sensor 2 senses the gap between the magnetic levitation track 1 and the vehicle body of the train. The gap sensor 2 is coupled to a gap processing board 231, and the gap processing board 231 is coupled to a control board 232 to transfer the gap sensed by the gap sensor 2 to the control board 232. The control board 232 processes the gap using the dynamic controller and outputs the corresponding control quantity as a current control signal. The control board 232 is coupled to the interface conversion board 233, which forwards the control signal. The interface conversion board 233 is coupled to the chopper 3; the chopper receives the control signal and changes the current of the levitation electromagnet accordingly, thereby adjusting the electromagnetic force between the levitation electromagnet and the rail and stabilizing the levitation of the train. The chopper is coupled to the magnetic levitation controller 4 to supply current to it. The magnetic levitation controller 4 provides control signals to the levitation electromagnet 6 to control the interaction between the levitation electromagnet 6 and the track, and thereby the gap between the track 1 and the vehicle body.
The invention discloses a magnetic suspension dynamic control method based on deep reinforcement learning and disturbance observation, which is shown in fig. 2.
Firstly, a nonlinear dynamics model of the levitation frame of the maglev train is constructed based on the levitation mechanism of the maglev train. The nonlinear dynamics model is used to simulate the interaction environment of the controller (step S1). Through continuous interaction with this environment, the dynamic controller learns the network parameters of its deep reinforcement learning component and approximates the actual disturbance value. Using a simulated interaction environment shortens the overall exploration and learning process and avoids the difficulty of training the dynamic controller directly on an actual magnetic levitation control system.
Then, in a simulated environment, a gap signal and an applied current signal of a gap between a train track and a train body of the magnetic levitation train are acquired. The gap signal and the current signal are used as training data, and are introduced into the dynamic controller as input to train the dynamic controller (step S2). Specifically, the gap signal is used as an input of a dynamic controller, the output of a deep reinforcement learning algorithm in the dynamic controller is combined with a disturbance observation value to be used as a control signal of a magnetic suspension system, then the control signal is transmitted back to a nonlinear dynamics model in the step S1, the nonlinear dynamics model is utilized to process the input control signal, a feedback gap signal is obtained, and the feedback gap signal is used as an input of the dynamic controller. Therefore, the dynamic controller is trained through multiple cycles from the gap signal to the control signal to the feedback gap signal, and the trained dynamic controller is obtained. The output of the dynamic controller is a current signal at two ends of the electromagnet.
Then, the trained dynamic controller is used, and the actual gap signal acquired in real time is used as input, and the dynamic controller outputs the current control signal of the magnetic suspension system, so that the magnetic suspension system is controlled (step S3).
After the peripheral hardware of the magnetic levitation system receives the control signal, the electromagnet is controlled to drive the electromagnet to move to the target position (step S4).
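As an illustration of steps S3 and S4, the real-time loop can be sketched as follows. `read_gap`, `apply_current` and `policy` are hypothetical callables standing in for the gap sensor, the chopper/electromagnet path and the trained controller; the closed-loop harness at the bottom is a toy stand-in, not the patent's plant model:

```python
def control_loop(read_gap, apply_current, policy, n_steps):
    """Trained-controller inference loop (steps S3-S4): read the gap in real
    time, feed it to the trained dynamic controller, and send the resulting
    control signal to the electromagnet hardware."""
    for _ in range(n_steps):
        z = read_gap()                 # real-time gap signal (m)
        u = max(policy(z), 0.0)        # control variable u = i^2 must stay >= 0
        apply_current(u)

# Toy stand-ins: a first-order "plant" whose gap shrinks with applied current,
# and a proportional "policy" pulling the gap toward an assumed 8 mm target.
gap = [0.010]
applied = []

def fake_apply(u):
    applied.append(u)
    gap[0] -= 2e-6 * u                 # toy actuator response, not the EMS model

control_loop(lambda: gap[0], fake_apply, lambda z: 5e4 * (z - 0.008), n_steps=50)
```

In the toy loop the gap error contracts by a factor of 0.9 per step, so the gap settles near the assumed 8 mm target after a few dozen iterations.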
In a preferred embodiment, the structure of the suspension control dynamics model of the magnetic levitation train is shown in fig. 3. As shown in fig. 3, the distance between the track 301 and the vehicle body is z (t), and the coil 302 and the electromagnet 303 are located in or connected to the vehicle body, wherein the coil 302 of a plurality of turns is wound around the electromagnet 303.
The nonlinear dynamics model in step S1 shown above is constructed using the following method.
First, for the structure shown in fig. 3, according to the characteristics of the levitation frame of the maglev train, the following electromagnetic and mechanical equations are established using Newton's laws and Maxwell's equations, constructing a nonlinear dynamics model of the levitation frame for simulating the interaction environment of the controller:

m·z̈(t) = mg − (μ₀AN²/4)·(i(t)/z(t))² + f_d
u(t) = R·i(t) + (μ₀N²A/(2z(t)))·di(t)/dt − (μ₀N²A·i(t)/(2z²(t)))·dz(t)/dt

where t is time, z(t) is the gap between the track and the vehicle body, i(t) is the current of the electromagnet coil, u(t) is the excitation voltage across the electromagnet, m is the load plus the weight of the electromagnet itself, A is the cross-sectional area of the electromagnet, N is the number of turns of the electromagnet coil, R is the resistance of the coil winding, μ₀ is the vacuum permeability, and f_d is the external disturbance of the levitation system. In engineering applications, the square of the coil current is generally used as the control variable, u(t) = i²(t) ≥ 0, so the nonlinear dynamics model can be simplified to

m·z̈(t) = mg − (μ₀AN²/4)·u(t)/z²(t) + f_d

and written as a nonlinear dynamic system with state x = (z, ż)ᵀ:

ẋ₁ = x₂,  m·ẋ₂ = mg − (μ₀AN²/4)·u(t)/x₁² + d(t)

where the control variable u(t) = i²(t) and the disturbance term d(t) = f_d.
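The simplified gap dynamics can be exercised numerically. The sketch below uses explicit Euler integration with illustrative parameter values (m, A, N and the 8 mm gap are assumptions for the example, not figures from the patent); at the equilibrium input u = i² that exactly balances gravity, the gap stays constant:

```python
import math

# Illustrative parameters (assumed for this sketch, not from the patent):
MU0 = 4 * math.pi * 1e-7   # vacuum permeability (H/m)
A   = 0.01                 # electromagnet pole area (m^2)
N   = 450                  # coil turns
M   = 500.0                # levitated mass (kg)
G   = 9.81                 # gravity (m/s^2)
K   = MU0 * A * N * N / 4  # force coefficient in F = K * (i/z)^2

def ems_step(z, dz, u, f_d=0.0, dt=1e-4):
    """One explicit-Euler step of the simplified gap dynamics
    m*z'' = m*g - K*u/z^2 + f_d, with control variable u = i^2 >= 0."""
    ddz = G - K * u / (M * z * z) + f_d / M
    return z + dz * dt, dz + ddz * dt

def equilibrium_u(z0):
    """Control value u = i^2 that balances gravity at gap z0."""
    return M * G * z0 * z0 / K

z, dz = 0.008, 0.0           # 8 mm gap, at rest
u_eq = equilibrium_u(z)
for _ in range(1000):        # hold the equilibrium input: the gap stays fixed
    z, dz = ems_step(z, dz, u_eq)
```

Perturbing u away from `u_eq` (or adding a nonzero `f_d`) makes the gap diverge, which reflects the open-loop instability discussed in the background section.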
Second, the performance of the control method can be further improved by taking real-world external disturbance factors into account. External disturbance factors include wind force, track deformation, and signal transmission and feedback delay. For example, track deformation can be simulated by a power-spectral-density method and used in training the dynamic controller to further improve performance, with the track irregularity power spectral density S(Ω) given by

S(Ω) = A_v·Ω_c² / [(Ω² + Ω_r²)(Ω² + Ω_c²)]

where Ω is the spatial frequency and the related parameters are A_v = 1.5e−7 m, Ω_r = 2.06e−6 rad/m, Ω_c = 0.825 rad/m. Track irregularity changes the size of the levitation gap and thereby affects the electromagnetic force; for simplicity, the invention treats the disturbance caused by track irregularity as a gap variation derived from this spectrum.
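A disturbance profile with such a spectral density can be synthesized by the spectral-representation (sum-of-cosines) method. The sketch below assumes the standard rational PSD form S(Ω) = A_vΩ_c²/((Ω²+Ω_r²)(Ω²+Ω_c²)) together with the parameter values quoted above; the spatial-frequency band and sample spacing are illustrative choices, not values from the patent:

```python
import math, random

A_V, OMEGA_R, OMEGA_C = 1.5e-7, 2.06e-6, 0.825   # parameters quoted in the text

def track_psd(omega):
    """Track irregularity PSD S(Omega), standard rational form (assumed)."""
    return A_V * OMEGA_C**2 / ((omega**2 + OMEGA_R**2) * (omega**2 + OMEGA_C**2))

def irregularity_profile(length_m=1000, n_waves=200, w_lo=0.01, w_hi=3.0, seed=0):
    """Synthesize z_irr(x) = sum_k sqrt(2*S(w_k)*dw) * cos(w_k*x + phi_k)
    with independent uniform random phases (spectral representation method)."""
    rng = random.Random(seed)
    dw = (w_hi - w_lo) / n_waves
    waves = []
    for k in range(n_waves):
        w = w_lo + (k + 0.5) * dw
        waves.append((math.sqrt(2.0 * track_psd(w) * dw), w,
                      rng.uniform(0.0, 2.0 * math.pi)))
    return [sum(a * math.cos(w * x + p) for a, w, p in waves)
            for x in range(length_m)]          # one sample per metre of track

profile = irregularity_profile()
```

The resulting profile has millimetre-scale amplitude, dominated by long-wavelength components, and can be fed to the simulated environment as a gap disturbance during training.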
the disturbance observer is expressed as:
wherein z is the internal state of the disturbance observer, p (x) is a nonlinear equation to be designed, l (x) is the disturbance observer gain, and the relation between p (x) and the disturbance observer gain is:
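The observer equations can be exercised on the simplified gap dynamics. In the sketch below the disturbance enters the acceleration channel (g₂ = 1), the disturbance is expressed as an acceleration (f_d/m), and p(x) is chosen linear in the gap velocity, p(x) = l₀·ż, so the gain l(x) reduces to the constant l₀; all numeric values are illustrative assumptions:

```python
import math

MU0, A, N, M, G = 4 * math.pi * 1e-7, 0.01, 450, 500.0, 9.81  # assumed values
K = MU0 * A * N * N / 4

def accel_nominal(z, u):
    """Modelled acceleration without disturbance: g - (mu0*A*N^2/4m) * u / z^2."""
    return G - K * u / (M * z * z)

def estimate_disturbance(d_true, steps=5000, dt=1e-4, l0=50.0):
    """Nonlinear disturbance observer with p(x) = l0 * dz (so l(x) = l0):
       zeta' = -l0 * (accel_nominal(z, u) + d_hat),   d_hat = zeta + l0 * dz.
    The estimation error e = d - d_hat then obeys e' = -l0 * e, i.e. it
    converges exponentially, as the convergence proof below shows."""
    z, dz = 0.008, 0.0
    u = M * G * z * z / K          # hold the equilibrium control input
    zeta = -l0 * dz                # initialise so that d_hat(0) = 0
    for _ in range(steps):
        d_hat = zeta + l0 * dz
        zeta += -l0 * (accel_nominal(z, u) + d_hat) * dt
        dz += (accel_nominal(z, u) + d_true) * dt   # true plant acceleration
        z += dz * dt
    return zeta + l0 * dz          # final disturbance estimate

d_hat = estimate_disturbance(d_true=0.5)   # constant 0.5 m/s^2 disturbance
```

With l₀ = 50 the estimation error has a 20 ms time constant, so after half a second of simulated time the estimate has converged to the applied disturbance.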
in a preferred embodiment, the training dynamic controller in step S2 specifically includes the following steps.
First, in step S21, a dynamic controller based on deep reinforcement learning and disturbance observation is constructed, as shown in fig. 4, which is a schematic structural diagram of a magnetic levitation dynamic controller according to a preferred embodiment. And establishing the interaction of the dynamic controller and the environment simulation of the nonlinear dynamic model, wherein the interaction data of the dynamic controller and the simulation environment are the same as the data of the actual magnetic levitation train, so as to be used for environment migration in a further step, and determining the input state of the dynamic controller as a gap signal.
And step S22, taking the gap signals and the current signals collected under the simulated environment as input data of a dynamic controller, and training the dynamic controller, wherein the output of a deep reinforcement learning algorithm in the dynamic controller is combined with a disturbance observation value to be used as a control signal of the magnetic suspension system and used for controlling the gaps of a train rail and a train body in the magnetic suspension system. After receiving the control signal, the magnetic levitation system in the simulation environment obtains a new state of the magnetic levitation system, including a gap signal of a new gap. The new gap signal is provided as a new input to the dynamic controller, a new control signal is obtained by the dynamic controller, and then the new control signal is further output to the magnetic suspension system in the simulation environment. In this way, the dynamic controller is trained cyclically.
To enhance the stability of the deep reinforcement learning algorithm used for control in the dynamic controller, the deep deterministic policy gradient (DDPG) algorithm is adopted, which comprises a policy target network and a value target network. As shown in fig. 5, the structure of the deep reinforcement learning algorithm comprises two corresponding neural networks: a policy network (Actor) and a value network (Critic). Sampled transitions are stored in an experience replay memory. Data from this experience pool are fed to the neural networks to approximate the Q function, so that results can be predicted accurately. Let the environmental state at time t be s_t, let the action of the deep reinforcement learning agent be a_t = u(t), and let r_t be the single-step reward returned after the environment executes action a_t in state s_t. The online policy network is denoted μ(s|θ^μ) (with network parameters θ^μ) and the target policy network μ′(s|θ^μ′); among the value networks, the online value network is denoted Q(s, a|θ^Q) and the target value network Q′(s, a|θ^Q′). First, the action a_t is selected by the policy network (combined with the disturbance observation d̂(t) when interacting with the environment), and the simulated interaction environment executes the action:

a_t = μ(s_t|θ^μ).

After executing the action, the environment returns the single-step reward r_t and a new state s_{t+1}; the policy network then stores the state transition (s_t, a_t, r_t, s_{t+1}) in the experience replay memory as the dataset for training the online policy network and the online value network.
And after the experience playback memory storage reaches a certain storage amount, training the DDPG network by randomly sampling N groups of state transition processes from the experience playback memory storage. Wherein the loss value (calculated by using a time difference method) of the online value network is,
the parameters of the online value network can then be updated based on the gradient descent method (Adam Optimizer) according to the loss value. The gradient algorithm of the Online policy network is that,
the parameters of the online policy network are then updated likewise in accordance with the gradient descent method (Adam Optimizer). After training a batch, carrying out parameter update on the target value network and the target strategy network according to a soft update method,
θ^Q' ← τθ^Q + (1−τ)θ^Q',  θ^μ' ← τθ^μ + (1−τ)θ^μ'.
After many training iterations, an optimal policy is obtained.
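The temporal-difference target and the soft target update used in the training procedure above can be written compactly; the values of γ and τ are assumed hyperparameters, since the text does not give numbers:

```python
# Sketch of the two core update formulas of DDPG training:
# the TD target for the value loss and the soft target-network update.
import numpy as np

def soft_update(theta_target, theta_online, tau=0.005):
    """theta' <- tau*theta + (1 - tau)*theta', applied element-wise."""
    return tau * theta_online + (1.0 - tau) * theta_target

def td_target(r, q_next, gamma=0.99):
    """y = r + gamma * Q'(s_{t+1}, mu'(s_{t+1}))."""
    return r + gamma * q_next
```
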
In the deep reinforcement learning algorithm, how to realize control is learned automatically through the reward value (reward) provided by a reward function, which is
r(t) = -(z(t) - z_0)^2 - 0.01·i^2(t)
where z(t) is the real-time levitation gap value, z_0 is the target levitation gap value, and i^2(t) is the output control variable of the deep reinforcement learning algorithm (the disturbance observation is combined with it when interacting with the environment).
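The reward function can be transcribed directly; the function name and argument names are illustrative:

```python
# r(t) = -(z(t) - z0)^2 - 0.01 * i^2(t): penalize gap error and
# control effort. Here u stands for the control variable i^2(t).
def reward(z, z0, u):
    return -(z - z0) ** 2 - 0.01 * u
```

The reward is zero only at the target gap with zero control effort, and strictly negative otherwise, so maximizing cumulative reward drives the gap toward z_0 with minimal current.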
The convergence of the disturbance observer is demonstrated as follows:
The observer error is defined as e_d = d − d̂, where d̂ is the disturbance estimate. Substituting the observer and the magnetic levitation nonlinear dynamic system into this definition, the error dynamics of the observer are obtained as

ė_d = −l(x)·e_d.
From this equation it can be concluded that the disturbance estimate converges exponentially to the actual disturbance, provided that a suitable disturbance observer gain l(x) > 0 is chosen.
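The exponential convergence claim can be illustrated numerically; the constant gain, step size, step count, and initial error below are arbitrary illustrative values, not parameters from the patent:

```python
# Forward-Euler integration of the observer error dynamics
# e_dot = -l * e with a constant gain l > 0: the error magnitude
# shrinks by a fixed factor each step, i.e. decays exponentially.
def observer_error_trajectory(e0=1.0, l=50.0, dt=1e-3, steps=200):
    e, traj = e0, [e0]
    for _ in range(steps):
        e += dt * (-l * e)   # Euler step of e_dot = -l * e
        traj.append(e)
    return traj
```
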
Based on the dynamic controller, the nonlinear dynamics model of the magnetic levitation train is evaluated in real time as feedback information: by rewarding or punishing the current control action, the evaluation mechanism feeds the reward function value back to the model as a reward evaluation, and the model iteratively updates and optimizes its policy in combination with the running state. The dynamic controller and the nonlinear dynamics model learn through continuous interaction, improving the control and evaluation mechanism. Multiple simulations are run in the simulation environment to judge the gap, and the control and gap signals are guided by the fed-back reward evaluation; that is, the deep reinforcement learning algorithm in the dynamic controller is continuously driven to update and optimize its policy. After many iterations a converged, optimized policy is obtained, and at the same time the disturbance value observed by the disturbance observer converges to the real disturbance, thereby improving the robustness of the controller.
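The interaction cycle described above can be sketched as follows; the one-dimensional toy gap dynamics, the proportional stand-in policy, and all numeric constants are illustrative assumptions, not the patent's actual model or trained policy:

```python
# Hedged sketch of one episode of controller-environment interaction:
# observe the gap, act, receive the reward, record the transition.
def run_episode(z0=0.008, z_init=0.010, steps=50, k=0.5):
    z = z_init
    transitions = []
    for _ in range(steps):
        err = z - z0
        a = k * err                          # stand-in for mu(s_t | theta_mu)
        z_next = z - 0.8 * a                 # toy linear gap response
        r = -(z - z0) ** 2 - 0.01 * abs(a)   # reward shaped as in the text
        transitions.append((z, a, r, z_next))
        z = z_next
    return z, transitions
```

In the actual scheme, the recorded transitions would fill the experience replay buffer and drive the DDPG updates, and the action would additionally be compensated by the disturbance observation d(t).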
According to a preferred embodiment, in step S3, the trained dynamic controller is transferred to the actual levitation system of the magnetic levitation train for control. Because the environment of the actual system differs from the simulation environment, the trained dynamic controller is further optimized through the interaction of the actual gap signal and the control signal. The optimized dynamic controller then maps the gap signal of the actual magnetic levitation train to a control signal.
According to a preferred embodiment, in step S4, after receiving the control signal, the peripheral hardware of the magnetic levitation system drives the electromagnet to move to the target position. A magnetic levitation dynamic control system based on deep reinforcement learning and disturbance observation is also provided accordingly.
According to a preferred embodiment of the invention, a magnetic levitation dynamic control system based on deep reinforcement learning and disturbance observation comprises: a data storage device for storing one or more programs; a construction module for constructing a nonlinear dynamics model of a levitation frame of the magnetic levitation train; an acquisition module for acquiring the gap between the magnetic levitation track and the train body; an analysis module for analyzing and controlling the gap in the simulation environment using the dynamic controller, and for inputting the gap in practical application into the pre-trained dynamic controller to obtain a control signal; and an output module for outputting the control signal.
The magnetic levitation dynamic control system based on deep reinforcement learning and disturbance observation according to the preferred embodiment of the invention comprises a server, wherein the server comprises a memory, a processor, and computer instructions stored in the memory and executable on the processor, and the processor implements the magnetic levitation dynamic control method based on deep reinforcement learning and disturbance observation when executing the computer instructions.
There is also provided in accordance with a preferred embodiment of the present invention a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the aforementioned method for dynamically controlling a levitation system of a magnetic levitation train based on deep reinforcement learning and disturbance observation.
The term "comprising" as used in this specification means "at least partially comprising". In interpreting each statement in this specification that includes the term "comprising", features other than that or those prefaced by the term may also be present. Related terms such as "comprise" and "comprises" should be interpreted in the same manner.
Many variations in construction and widely differing embodiments and applications of the invention will become apparent to those skilled in the art to which the invention pertains without departing from its scope as defined in the appended claims. The disclosures and descriptions herein are purely illustrative and are not intended to be in any sense limiting. Where specific integers are mentioned herein having known equivalents in the art to which this invention relates, such known equivalents are deemed to be incorporated herein as if individually set forth.
As used herein, the term "and/or" means "and" or both.
In the description of the present specification, reference may be made to subject matter which is not within the scope of the appended claims. The subject matter should be readily identifiable by one skilled in the art and may be helpful in practicing the invention as defined in the appended claims.
Although the invention is generally defined as above, it will be understood by those skilled in the art that the invention is not limited thereto and that the invention also includes the following examples which give illustrative implementations.
The foregoing description of the invention includes preferred forms thereof. Modifications may be made thereto without departing from the scope of the invention.

Claims (16)

1. The magnetic suspension dynamic control method based on deep reinforcement learning and disturbance observation is characterized by comprising the following steps of:
s1: constructing a nonlinear dynamics model of a suspension frame of the magnetic suspension train based on a suspension mechanism of the magnetic suspension train, wherein the nonlinear dynamics model is used for simulating an interaction environment of a controller of the suspension system;
s2: acquiring gap signals of a track and a train body of a magnetic suspension train and current signals of an electromagnet for the train body under a simulated interaction environment as training data, and training a dynamic controller, wherein the gap signals are used as the input of the dynamic controller, the output of a deep reinforcement learning algorithm in the dynamic controller is combined with a disturbance observation value of a disturbance observer to be used as a control signal of a suspension system, the control signal is used as the input of a nonlinear dynamics model in the step S1, so that a feedback gap signal is obtained in the nonlinear dynamics model according to the control signal, and the feedback gap signal is input again to be used as the input of the dynamic controller to perform a training process of multiple cycles to obtain a trained dynamic controller;
s3: and acquiring the clearance signal in real time, and taking the clearance signal acquired in real time as the input of a trained dynamic controller to obtain the control signal of the output suspension system of the magnetic suspension train for controlling the electromagnet.
2. The magnetic levitation dynamic control method according to claim 1, wherein the nonlinear dynamics model is expressed as:
m·(d²z(t)/dt²) = m·g + f_d − (μ_0·A·N²/4)·(i(t)/z(t))²,
u(t) = R·i(t) + (μ_0·A·N²/2)·(d/dt)(i(t)/z(t)),
wherein t represents time, z(t) is the gap between the train track and the train body, i(t) is the current of the electromagnet coil, u(t) is the excitation voltage across the electromagnet, m is the load and self weight of the electromagnet, g is the gravitational acceleration, A is the cross-sectional area of the electromagnet, N is the number of windings of the electromagnet coil, R is the resistance of the electromagnet coil windings, μ_0 is the vacuum permeability, and f_d is an external disturbance of the levitation system.
3. A magnetic levitation dynamic control method according to claim 2, characterized in that the square of the electromagnet coil current is used as the control variable, u(t) = i²(t), with i²(t) > 0, whereby the nonlinear dynamics model is expressed as:
m·(d²z(t)/dt²) = m·g + f_d − (μ_0·A·N²/4)·u(t)/z²(t),
wherein t represents time, m is the load and self weight of the electromagnet, g is the gravitational acceleration, A is the cross-sectional area of the electromagnet, N is the number of windings of the electromagnet coil, μ_0 is the vacuum permeability, and f_d is an external disturbance of the levitation system;
according to the nonlinear dynamics model,
d²z(t)/dt² = g − (μ_0·A·N²/(4m))·u(t)/z²(t) + d(t)/m,
wherein the disturbance term d(t) = f_d.
4. The method according to claim 1 or 2, wherein in step S2, the deep reinforcement learning algorithm employs a deep deterministic policy gradient (DDPG) algorithm.
5. A magnetic levitation dynamic control method according to claim 1 or 2, wherein in step S2, the reward function in the deep reinforcement learning algorithm is
r(t) = -(z(t) - z_0)^2 - 0.01·i^2(t)
wherein z(t) is the real-time levitation gap value, z_0 is the target levitation gap value, and i²(t) is the output control variable of the deep reinforcement learning algorithm, i²(t) being calculated in combination with the disturbance observation when interacting with the environment.
6. The method according to claim 1 or 2, wherein in step S2, the disturbance observer is a nonlinear disturbance observer for eliminating the disturbance value of the control variable inside the system,
the disturbance observer being
ż = −l(x)·g_2(x)·z − l(x)·[g_2(x)·p(x) + f(x) + g_1(x)·u],
d̂ = z + p(x),
the levitation system with disturbance being written as ẋ = f(x) + g_1(x)·u + g_2(x)·d,
wherein z is the internal state of the disturbance observer, d̂ is the disturbance estimate, p(x) is a nonlinear function to be designed, and l(x) is the disturbance observer gain, the relation between p(x) and the disturbance observer gain being:
l(x) = ∂p(x)/∂x.
7. a magnetic levitation dynamic control method according to claim 1 or 2, characterized in that the levitation system comprises the following components:
a gap sensor for sensing a gap between the rail and the vehicle body;
a chopper coupled to the gap sensor through a gap processing board, a control board, and an interface conversion board, the chopper configured to vary a current signal supplied to the levitation electromagnet according to a control signal;
a magnetic levitation controller coupled with the chopper, the magnetic levitation controller configured to process a current signal of an output of the chopper to achieve control of the levitation electromagnet; and
and the levitation electromagnet is configured to adjust the magnetic force of the levitation electromagnet on the track according to the signal output by the magnetic levitation controller.
8. A magnetic levitation dynamic control system based on deep reinforcement learning and disturbance observation, characterized in that the magnetic levitation dynamic control system comprises:
a data storage device for storing one or more programs;
the system comprises a construction module, a control module and a control module, wherein the construction module is used for constructing a nonlinear dynamics model of a suspension frame of the magnetic suspension train, wherein the nonlinear dynamics model of the suspension frame of the magnetic suspension train is constructed based on a suspension mechanism of the magnetic suspension train, and the nonlinear dynamics model is used for simulating an interaction environment of a controller of a suspension system;
the acquisition module is used for acquiring a gap signal of a gap between the magnetic levitation track and a vehicle body of the magnetic levitation train;
the analysis module is used for analyzing and controlling by using the dynamic controller through the gap signal which is input in the simulation environment and the control signal which is output, taking the gap signal which is obtained in real time as the input of the trained dynamic controller to obtain the control signal of the suspension system of the output magnetic suspension train, wherein the gap signal and the current signal of the electromagnet which is used for the train body are collected to be used as training data, training the dynamic controller, taking the gap signal as the input of the dynamic controller, taking the output of the deep reinforcement learning algorithm in the dynamic controller and the disturbance observation value of the disturbance observer as the control signal of the suspension system, taking the control signal as the input of the nonlinear dynamics model, obtaining the feedback gap signal according to the control signal in the nonlinear dynamics model, and inputting the feedback gap signal as the input of the dynamic controller again to perform the training process of multiple cycles to obtain the trained dynamic controller;
and the output module is used for outputting the actual control signal.
9. The magnetic levitation dynamic control system of claim 8, wherein the nonlinear dynamics model is expressed as:
m·(d²z(t)/dt²) = m·g + f_d − (μ_0·A·N²/4)·(i(t)/z(t))²,
u(t) = R·i(t) + (μ_0·A·N²/2)·(d/dt)(i(t)/z(t)),
wherein t represents time, z(t) is the gap between the train track and the train body, i(t) is the current of the electromagnet coil, u(t) is the excitation voltage across the electromagnet, m is the load and self weight of the electromagnet, g is the gravitational acceleration, A is the cross-sectional area of the electromagnet, N is the number of windings of the electromagnet coil, R is the resistance of the electromagnet coil windings, μ_0 is the vacuum permeability, and f_d is an external disturbance of the levitation system.
10. The magnetic levitation dynamic control system of claim 9, wherein the square of the electromagnet coil current is used as the control variable, u(t) = i²(t), with i²(t) > 0, whereby the nonlinear dynamics model is expressed as:
m·(d²z(t)/dt²) = m·g + f_d − (μ_0·A·N²/4)·u(t)/z²(t),
wherein t represents time, m is the load and self weight of the electromagnet, g is the gravitational acceleration, A is the cross-sectional area of the electromagnet, N is the number of windings of the electromagnet coil, μ_0 is the vacuum permeability, and f_d is an external disturbance of the levitation system;
according to the nonlinear dynamic system,
d²z(t)/dt² = g − (μ_0·A·N²/(4m))·u(t)/z²(t) + d(t)/m,
wherein the disturbance term d(t) = f_d.
11. A magnetic levitation dynamic control system according to claim 8 or 9, characterized in that the deep reinforcement learning algorithm employs a deep deterministic policy gradient (DDPG) algorithm.
12. A magnetic levitation dynamic control system according to claim 8 or 9, wherein the reward function in the deep reinforcement learning algorithm is
r(t) = -(z(t) - z_0)^2 - 0.01·i^2(t)
wherein z(t) is the real-time levitation gap value, z_0 is the target levitation gap value, and i²(t) is the output control variable of the deep reinforcement learning algorithm, i²(t) being calculated in combination with the disturbance observation when interacting with the environment.
13. A magnetic levitation dynamic control system according to claim 8 or 9, wherein the disturbance observer is a nonlinear disturbance observer for eliminating the disturbance value of the control variable inside the system,
the disturbance observer being
ż = −l(x)·g_2(x)·z − l(x)·[g_2(x)·p(x) + f(x) + g_1(x)·u],
d̂ = z + p(x),
the levitation system with disturbance being written as ẋ = f(x) + g_1(x)·u + g_2(x)·d,
wherein z is the internal state of the disturbance observer, d̂ is the disturbance estimate, p(x) is a nonlinear function to be designed, and l(x) is the disturbance observer gain, the relation between p(x) and the disturbance observer gain being:
l(x) = ∂p(x)/∂x.
14. a magnetically levitated dynamic control system according to claim 8 or 9, characterized in that the levitation system comprises the following components:
a gap sensor for sensing a gap between the rail and the vehicle body;
a chopper coupled to the gap sensor through a gap processing board, a control board, and an interface conversion board, the chopper configured to vary a current signal supplied to the levitation electromagnet according to a control signal;
a magnetic levitation controller coupled with the chopper, the magnetic levitation controller configured to process a current signal of an output of the chopper to achieve control of the levitation electromagnet; and
and the levitation electromagnet is configured to adjust the magnetic force of the levitation electromagnet on the track according to the signal output by the magnetic levitation controller.
15. A magnetically levitated dynamic control system based on deep reinforcement learning and disturbance observation, comprising a server comprising a memory, a processor and computer instructions stored on the memory and executable on the processor, the processor implementing the magnetically levitated dynamic control method based on deep reinforcement learning and disturbance observation according to any one of claims 1-7 when executing the computer instructions.
16. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the magnetic levitation dynamic control method based on deep reinforcement learning and disturbance observation as defined in any one of claims 1 to 7.
CN202210429956.8A 2022-04-22 2022-04-22 Magnetic suspension dynamic control method and system based on deep reinforcement learning and disturbance observation Pending CN116974187A (en)

Publications (1)

Publication Number Publication Date
CN116974187A true CN116974187A (en) 2023-10-31

Family

ID=88473566



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination