CN111310384B - Wind field cooperative control method, terminal and computer readable storage medium - Google Patents

Wind field cooperative control method, terminal and computer readable storage medium

Info

Publication number
CN111310384B
CN111310384B
Authority
CN
China
Prior art keywords
network
strategy
target
wind
networks
Prior art date
Legal status
Active
Application number
CN202010056867.4A
Other languages
Chinese (zh)
Other versions
CN111310384A (en)
Inventor
赵俊华
赵焕
梁高琪
Current Assignee
Chinese University of Hong Kong Shenzhen
Original Assignee
Chinese University of Hong Kong Shenzhen
Priority date
Filing date
Publication date
Application filed by Chinese University of Hong Kong Shenzhen
Priority application: CN202010056867.4A
Publication of CN111310384A
Application granted
Publication of CN111310384B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods


Abstract

The invention provides a wind farm cooperative control method, a terminal and a computer-readable storage medium. The wind farm cooperative control method comprises the following steps: constructing a wind farm model; training in the wind farm model with an ensemble deep deterministic policy gradient method to obtain K pre-trained policy networks and K pre-trained Q networks, where K is an integer greater than 1; taking the pre-trained policy networks and Q networks as initial networks and learning the actual optimal policy in the real environment with the ensemble deep deterministic policy gradient method; and cooperatively controlling the wind farm with the optimal policy. Compared with the traditional approach of obtaining the optimal control policy purely by trial and error, the pre-training stage effectively reduces the average learning cost of the subsequent real-environment learning, and the policy-ensemble stage reduces the randomness of the policy during learning, thereby reducing the randomness of the learning cost.

Description

Wind field cooperative control method, terminal and computer readable storage medium
Technical Field
The application relates to the technical field of wind power generation, and in particular to a wind farm cooperative control method, a terminal and a computer-readable storage medium.
Background
With growing attention to environmental problems, promoting the development of clean energy has become a main goal of current energy policy. Wind energy is an important component of clean energy, and one of the major problems it faces is how to reduce wake effects and maximize the total output of a wind farm through coordinated control of all turbines in the farm.
Existing wind farm cooperative control methods that use reinforcement learning without constructing a wake model obtain the optimal control strategy by trial and error; however, the learning process is random and the learning cost is high.
The prior art therefore still needs to be improved.
Disclosure of Invention
The application aims to solve the problems of randomness and high learning cost in the learning process of existing wind farm cooperative control methods.
To solve this technical problem, the invention discloses a wind farm cooperative control method comprising the following steps:
constructing a wind farm model;
training in the wind farm model by an ensemble deep deterministic policy gradient method to obtain K pre-trained policy networks and K pre-trained Q networks, wherein K is an integer greater than 1;
taking the pre-trained policy networks and Q networks as initial networks, and learning the actual optimal policy in the real environment by the ensemble deep deterministic policy gradient method;
and cooperatively controlling the wind farm with the optimal policy.
Further, the wind farm model comprises a wind turbine output model and a wake model.
The wind turbine output model gives the power output P of a turbine as a function of the deflection (yaw) angle γ, the free wind speed V, the rotor swept area A and the axial induction factor a. The axial induction factor is the control variable and is expressed as a = (V − V′)/V, where V′ is the wind speed after the free wind has passed through the turbine.
The wake model gives the ratio of wind-speed reduction behind a turbine as a function of the turbine position x, the rotor diameter D and the roughness coefficient k.
Further, the training method of a single policy network and a single Q network comprises the following steps:
initializing a policy network μ(s|θ^μ) and a Q network Q(s, a|θ^Q), where θ^Q denotes the Q-network parameters, θ^μ denotes the policy-network parameters and s denotes a state;
initializing a target Q network Q′ and a target policy network μ′ with the same weights;
initializing a replay buffer;
training the policy network and the Q network in the wind farm model by the deep deterministic policy gradient method.
Further, the step of training the policy network and the Q network in the wind farm model by the deep deterministic policy gradient method includes:
receiving a random model state s_t;
obtaining a behaviour from the policy network: a_t = μ(s_t|θ^μ) + N_t, where N_t is Gaussian noise;
executing the behaviour a_t in the wind farm model and obtaining a reward r_t;
storing the transition (s_t, a_t, r_t, s_{t+1}) in the replay buffer and randomly sampling n transitions from the replay buffer to form a batch, n being an integer greater than 1;
updating the policy network, the Q network, the target policy network and the target Q network;
iterating the above steps until convergence to complete the pre-training of the policy network and the Q network.
Further, the step of updating the policy network, the Q network, the target policy network and the target Q network includes:
updating the Q network by minimizing the loss L = (1/n) Σ_i (y_i − Q(s_i, a_i|θ^Q))², where i denotes the i-th sample in the mini-batch, r_i is the reward of the i-th sample, y_i = r_i + γ·Q′(s_{i+1}, μ′(s_{i+1}|θ^μ′)|θ^Q′) is the target value of the i-th sample with discount factor γ, s_i is the model state of the i-th sample and a_i is the behaviour executed in the model for the i-th sample;
updating the policy network with the policy gradient ∇_θ^μ J ≈ (1/n) Σ_i ∇_a Q(s, a|θ^Q)|_{s=s_i, a=μ(s_i)} · ∇_θ^μ μ(s|θ^μ)|_{s=s_i}, where J denotes the cumulative discounted reward;
updating the target policy network and the target Q network by soft update: θ^Q′ ← τ·θ^Q + (1−τ)·θ^Q′ and θ^μ′ ← τ·θ^μ + (1−τ)·θ^μ′, where τ is the update parameter.
Further, the step of taking the pre-trained policy networks and Q networks as initial networks and learning the actual optimal policy in the real environment by the ensemble deep deterministic policy gradient method comprises the following steps:
initializing K actual policy networks from the K pre-trained policy networks;
selecting the last pre-trained Q network to initialize the actual Q network, and initializing K target policy networks and one target Q network with the same weights;
initializing a replay buffer;
learning the actual optimal policy in the real environment by the ensemble deep deterministic policy gradient method.
Further, the step of learning the actual optimal policy in the real environment by the ensemble deep deterministic policy gradient method includes:
receiving a random real-environment state s_t;
obtaining a behaviour a_t by integrating the outputs of the K policy networks and adding Gaussian noise N_t;
the agent executing the behaviour a_t and obtaining a reward r_t;
storing the transition (s_t, a_t, r_t, s_{t+1}) in the replay buffer and randomly sampling N transitions from the replay buffer to form a batch, N being an integer greater than 1;
updating the policy networks, the Q network, the target policy networks and the target Q network;
iterating the above steps until convergence to complete the actual training of the policy networks and the Q network.
Further, the step of updating the policy networks, the Q network, the target policy networks and the target Q network includes:
updating the Q network by minimizing the loss L = (1/N) Σ_i (y_i − Q(s_i, a_i|θ^Q))², where i denotes the i-th sample in the mini-batch, r_i is the reward of the i-th sample, y_i is the target value of the i-th sample, s_i is the real-environment state of the i-th sample and a_i is the behaviour executed in the real environment for the i-th sample;
then updating each policy network separately with its policy gradient ∇_θ^μ J ≈ (1/N) Σ_i ∇_a Q(s, a|θ^Q)|_{s=s_i, a=μ(s_i)} · ∇_θ^μ μ(s|θ^μ)|_{s=s_i}, where J denotes the cumulative discounted reward;
finally updating the target policy networks and the target Q network by soft update: θ^Q′ ← τ·θ^Q + (1−τ)·θ^Q′ and θ^μ′ ← τ·θ^μ + (1−τ)·θ^μ′, where τ is the update parameter.
A terminal comprising a processor and a memory, wherein the processor is configured to execute a computer program stored in the memory to perform a wind farm cooperative control method as described above.
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a wind farm cooperative control method as described above.
Compared with the prior art, the application has the following beneficial effects: pre-trained policy networks and Q networks are obtained by training in a wind farm model with the ensemble deep deterministic policy gradient method, and these pre-trained networks are then used as initial networks for learning the actual optimal policy in the real environment with the same method. Compared with the traditional approach of obtaining the optimal control policy purely by trial and error, the pre-training stage effectively reduces the average learning cost of the real-environment learning, and the policy-ensemble stage reduces the randomness of the policy during learning, thereby reducing the randomness of the learning cost.
Drawings
FIG. 1 is a schematic diagram of the training method of the present invention.
Fig. 2 (a), (b), (c) and (d) are graphs comparing training results of the present invention with those of the conventional method in four scenarios shown in table 2.
Fig. 3 is a schematic diagram of a terminal structure provided by the present invention.
Detailed Description
In order to make the objects, features and advantages of the present application more comprehensible, the technical solutions in the embodiments of the present application are described below in conjunction with the accompanying drawings. The described embodiments are only some, not all, embodiments of the application; all other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the application.
The embodiment of the application provides a wind farm cooperative control method. The principle is shown in Fig. 1: networks are first pre-trained in a model, then copied, and training continues in the real environment, where the optimal policy is learned. The method mainly comprises the following steps S1-S3:
S1, constructing a wind farm model, for example a low-fidelity wind farm model.
Specifically, the wind farm model may include a wind turbine output model and a wake model. The wind turbine output model gives the power output P of a turbine as a function of the deflection (yaw) angle γ, the free wind speed V, the rotor swept area A and the axial induction factor a, which is the control variable.
The control variable may be expressed as a = (V − V′)/V, where V′ is the wind speed after the free wind has passed through the turbine.
The wake model gives the ratio of wind-speed reduction behind a turbine as a function of the turbine position x, the rotor diameter D and the roughness coefficient k.
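The analytical expressions of the two models appear only as formula images in the original filing and are not reproduced in this text. The sketch below therefore illustrates the kind of low-fidelity model the description refers to, using the classical actuator-disc power relation and the Jensen/PARK wake-deficit formula as stand-ins; the air density rho, the cosine yaw-loss factor and the root-sum-square wake combination are assumptions of this sketch, not equations taken from the patent.

```python
import numpy as np

def turbine_power(a, gamma, v, area, rho=1.225):
    """Actuator-disc power of one turbine (illustrative stand-in).

    a     : axial induction factor (the control variable)
    gamma : yaw/deflection angle in radians
    v     : effective wind speed at the rotor [m/s]
    area  : rotor swept area [m^2]
    """
    # Classic actuator-disc relation with a simple cosine yaw loss.
    return 2.0 * rho * area * a * (1.0 - a) ** 2 * v ** 3 * np.cos(gamma)

def park_deficit(a, x, diameter, k=0.075):
    """Jensen/PARK fractional wind-speed deficit at downstream distance x.

    a        : axial induction factor of the upstream turbine
    x        : downstream distance from the upstream rotor [m]
    diameter : rotor diameter [m]
    k        : wake-expansion (roughness) coefficient
    """
    return 2.0 * a / (1.0 + 2.0 * k * x / diameter) ** 2

def waked_speed(v_free, deficits):
    """Combine overlapping wakes by root-sum-square of the individual deficits."""
    total = np.sqrt(np.sum(np.square(deficits))) if len(deficits) else 0.0
    return v_free * (1.0 - total)
```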
The analytical model uses an actuator (brake) disc model and the PARK wake model. The parameters of the wind farm simulator and the PARK model are listed in Table 1.
Four wind farm topologies were tested. With the turbine diameter denoted D, wind tunnel structures are used, 5D and 7D are chosen as the turbine-spacing parameters, and 5 and 10 as the numbers of turbines. The scenarios are numbered 1 to 4 and their parameters are given in Table 2.
Wind speeds of 8 m/s to 16 m/s are randomly generated according to a Weibull distribution, and the wind angle is assumed to be 0.
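As one possible realisation of the random inflow described above, the snippet below draws free wind speeds from a Weibull distribution and keeps only samples inside the 8-16 m/s band; the shape and scale parameters are illustrative assumptions, since the patent does not state them.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_wind_speed(shape=2.0, scale=12.0, low=8.0, high=16.0):
    """Rejection-sample a free wind speed [m/s] from a Weibull distribution,
    truncated to the 8-16 m/s band used in the experiments."""
    while True:
        v = scale * rng.weibull(shape)   # numpy's weibull draw has unit scale
        if low <= v <= high:
            return v
```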
S2, training in the wind farm model by the ensemble deep deterministic policy gradient method to obtain K pre-trained policy networks and K pre-trained Q networks, wherein K is an integer greater than 1.
Specifically, this comprises the following steps S21-S24:
S21, initializing a policy network μ(s|θ^μ) and a Q network Q(s, a|θ^Q), where θ^Q denotes the Q-network parameters, θ^μ denotes the policy-network parameters and s denotes the state, i.e. the wind speeds at the turbines of the wind farm, and at the same time initializing a reward function r. Regarding the deep reinforcement learning setup: the policy network is a six-layer fully connected neural network and the Q network is a seven-layer fully connected neural network. Both networks use a linear activation function in the last hidden layer and rectified linear units in the remaining layers. The other parameters are listed in Table 1.
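A minimal sketch of the two networks described in step S21, assuming PyTorch, is given below: a six-layer fully connected policy (actor) network and a seven-layer fully connected Q (critic) network, with rectified linear units in the hidden layers and a linear last hidden layer. The layer widths and the tanh squashing of the action are assumptions made only to give a concrete example.

```python
import torch
import torch.nn as nn

def mlp(sizes):
    """Fully connected stack; ReLU between layers, the last hidden layer left
    linear, and no activation on the output, as described in the text."""
    layers = []
    for i in range(len(sizes) - 1):
        layers.append(nn.Linear(sizes[i], sizes[i + 1]))
        is_last_hidden = (i == len(sizes) - 3)
        is_output = (i == len(sizes) - 2)
        if not is_output and not is_last_hidden:
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)

class PolicyNet(nn.Module):
    """Six-layer policy (actor) network mu(s | theta_mu); widths are illustrative."""
    def __init__(self, state_dim, action_dim, width=256):
        super().__init__()
        self.net = mlp([state_dim] + [width] * 5 + [action_dim])
    def forward(self, s):
        return torch.tanh(self.net(s))   # bounded control output (assumption)

class QNet(nn.Module):
    """Seven-layer Q (critic) network Q(s, a | theta_Q); widths are illustrative."""
    def __init__(self, state_dim, action_dim, width=256):
        super().__init__()
        self.net = mlp([state_dim + action_dim] + [width] * 6 + [1])
    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))
```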
The reward function is defined as the total output power minus the unsafe-behaviour loss. The unsafe-behaviour loss of a turbine is calculated by multiplying the distance of its command outside the safety range by a coefficient.
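The reward of step S21 could, for example, be computed as in the sketch below: total output power minus an unsafe-behaviour loss obtained by multiplying the distance of each turbine's command outside its safe range by a coefficient. The particular safe range and coefficient value are assumptions, not values from the patent.

```python
import numpy as np

def reward(powers, actions, safe_low=0.0, safe_high=1.0 / 3.0, coeff=10.0):
    """Reward = total output power minus unsafe-behaviour loss.

    powers  : array of per-turbine power outputs
    actions : array of per-turbine axial-induction commands
    The safe interval [safe_low, safe_high] and the coefficient are
    illustrative assumptions.
    """
    total_power = float(np.sum(powers))
    outside = np.maximum(actions - safe_high, 0.0) + np.maximum(safe_low - actions, 0.0)
    penalty = coeff * float(np.sum(outside))
    return total_power - penalty
```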
S22, initializing a target Q network Q′ and a target policy network μ′ with the same weights.
S23, initializing a replay buffer (RB).
S24, training the policy network and the Q network in the wind farm model by the deep deterministic policy gradient method.
The training process includes steps S241-S246:
S241, receiving a model state s_t, for example a wind speed of 11 m/s.
S242, obtaining a behaviour from the policy network: a_t = μ(s_t|θ^μ) + N_t, where N_t is Gaussian noise.
S243, executing the behaviour a_t in the wind farm model and obtaining a reward r_t.
S244, storing the transition (s_t, a_t, r_t, s_{t+1}) in the replay buffer and randomly sampling n transitions from the replay buffer to form a batch, n being an integer greater than 1.
S245, updating the policy network, the Q network, the target policy network and the target Q network. The update comprises the following steps S2451-S2453 (a sketch of one such update appears after step S246 below):
S2451, updating the Q network by minimizing the loss L = (1/n) Σ_i (y_i − Q(s_i, a_i|θ^Q))², where i denotes the i-th sample in the mini-batch, r_i is the reward of the i-th sample, y_i = r_i + γ·Q′(s_{i+1}, μ′(s_{i+1}|θ^μ′)|θ^Q′) is the target value of the i-th sample with discount factor γ, s_i is the model state of the i-th sample and a_i is the behaviour executed in the model for the i-th sample.
S2452, updating the policy network with the policy gradient ∇_θ^μ J ≈ (1/n) Σ_i ∇_a Q(s, a|θ^Q)|_{s=s_i, a=μ(s_i)} · ∇_θ^μ μ(s|θ^μ)|_{s=s_i}, where J denotes the cumulative discounted reward.
S2453, updating the target policy network and the target Q network by soft update: θ^Q′ ← τ·θ^Q + (1−τ)·θ^Q′ and θ^μ′ ← τ·θ^μ + (1−τ)·θ^μ′, where τ is the update parameter.
S246, iterating steps S241-S245 until convergence to complete the pre-training of the policy network and the Q network. With the above method, 10 policy networks and 10 Q networks are trained.
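Steps S241-S246 follow the standard deep deterministic policy gradient recipe. The sketch below shows one possible implementation of a single update (step S245): the critic is updated by minimizing the mean-squared Bellman error, the actor by the deterministic policy gradient, and the target networks by a soft update. The discount factor, soft-update rate and exploration-noise scale are common DDPG defaults assumed here, not values taken from the patent.

```python
import torch
import torch.nn.functional as F

def ddpg_update(batch, actor, critic, actor_t, critic_t,
                actor_opt, critic_opt, gamma=0.99, tau=0.005):
    """One DDPG update from a sampled batch of transitions (s, a, r, s2)."""
    s, a, r, s2 = batch                       # tensors; r assumed shaped [n, 1]

    # Critic: minimise (y_i - Q(s_i, a_i))^2 with y_i = r_i + gamma * Q'(s2, mu'(s2)).
    with torch.no_grad():
        y = r + gamma * critic_t(s2, actor_t(s2))
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: deterministic policy gradient, i.e. maximise Q(s, mu(s)).
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Soft target updates: theta' <- tau * theta + (1 - tau) * theta'.
    for net, net_t in ((actor, actor_t), (critic, critic_t)):
        for p, p_t in zip(net.parameters(), net_t.parameters()):
            p_t.data.mul_(1.0 - tau).add_(tau * p.data)

def act(actor, state, noise_std=0.1):
    """Behaviour a_t = mu(s_t) + Gaussian exploration noise (steps S242/S243)."""
    with torch.no_grad():
        a = actor(state)
        return a + noise_std * torch.randn_like(a)
```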
S3, taking the pre-trained policy networks and Q networks as initial networks and learning the actual optimal policy in the real environment by the ensemble deep deterministic policy gradient method.
This phase includes steps S31-S34:
S31, initializing 10 actual policy networks from the 10 pre-trained policy networks.
S32, selecting the last pre-trained Q network to initialize the actual Q network, and initializing 10 target policy networks and 1 target Q network with the same weights.
S33, initializing a replay buffer (RB).
S34, learning the actual optimal policy in the real environment by the ensemble deep deterministic policy gradient method.
The training process includes steps S341-S346 (a sketch of the ensemble behaviour selection appears after step S346 below): S341, receiving a random real-environment state s_t.
S342, obtaining a behaviour a_t by integrating the outputs of the K policy networks and adding Gaussian noise N_t.
S343, the agent executing the behaviour a_t and obtaining a reward r_t.
S344, storing the transition (s_t, a_t, r_t, s_{t+1}) in the replay buffer and randomly sampling N transitions from the replay buffer to form a batch, N being an integer greater than 1.
S345, updating the policy networks, the Q network, the target policy networks and the target Q network; the update comprises steps S3451-S3453:
S3451, updating the Q network by minimizing the loss L = (1/N) Σ_i (y_i − Q(s_i, a_i|θ^Q))², where i denotes the i-th sample in the mini-batch, r_i is the reward of the i-th sample, y_i is the target value of the i-th sample, s_i is the real-environment state of the i-th sample and a_i is the behaviour executed in the real environment for the i-th sample.
S3452, updating each policy network separately with its policy gradient ∇_θ^μ J ≈ (1/N) Σ_i ∇_a Q(s, a|θ^Q)|_{s=s_i, a=μ(s_i)} · ∇_θ^μ μ(s|θ^μ)|_{s=s_i}, where J denotes the cumulative discounted reward.
S3453, finally updating the target policy networks and the target Q network by soft update: θ^Q′ ← τ·θ^Q + (1−τ)·θ^Q′ and θ^μ′ ← τ·θ^μ + (1−τ)·θ^μ′, where τ is the update parameter.
S346, iterating the above steps until convergence completes the actual training of the policy networks and the Q network; the actual optimal policy is thus learned in the real environment by the ensemble deep deterministic policy gradient method. The final training results are shown in Fig. 2. The results show that in most cases the power output of the turbines in the farm under this method (ABE) is greater than under conventional methods, including the optimal greedy algorithm and the Park method.
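In the real-environment phase, the pre-trained networks are copied to initialize the actual networks (steps S31-S32) and the behaviour is obtained by integrating the K policy networks (step S342). The sketch below shows one way to perform the copy and to form the ensemble behaviour; treating the integration as a simple mean of the K policy outputs is an assumption of this sketch, since the exact integration formula appears only as a formula image in the original filing.

```python
import copy
import torch

def init_actual_networks(pretrained_actors, pretrained_critics):
    """Copy the K pre-trained policy networks and the last pre-trained Q network
    to initialise the actual networks and their targets (steps S31-S32)."""
    actors = [copy.deepcopy(a) for a in pretrained_actors]
    critic = copy.deepcopy(pretrained_critics[-1])
    actor_targets = [copy.deepcopy(a) for a in actors]
    critic_target = copy.deepcopy(critic)
    return actors, critic, actor_targets, critic_target

def ensemble_act(actors, state, noise_std=0.1):
    """Behaviour obtained by integrating the K policy networks
    (here: their mean) plus Gaussian exploration noise (step S342)."""
    with torch.no_grad():
        a = torch.stack([actor(state) for actor in actors], dim=0).mean(dim=0)
        return a + noise_std * torch.randn_like(a)
```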
Pre-trained policy networks and Q networks are obtained by training in a wind farm model with the ensemble deep deterministic policy gradient method, and these pre-trained networks are then used as initial networks for learning the actual optimal policy in the real environment with the same method. Compared with the traditional approach of obtaining the optimal control policy purely by trial and error, the pre-training stage effectively reduces the average learning cost of the real-environment learning, and the policy-ensemble stage reduces the randomness of the policy during learning, thereby reducing the randomness of the learning cost.
TABLE 3 comparison of learning costs for different scenarios
The invention also provides a terminal, as shown in Fig. 3, comprising: a processor 10, a memory 20, a communications interface 30 and a bus 40; wherein,
The processor 10, memory 20, and communication interface 30 communicate with each other via the bus 40.
The communication interface 30 is used for information transfer between communication devices of the mobile terminal.
The processor 10 is configured to invoke the computer program in the memory 20 to perform the methods provided in the above method embodiments, for example including: constructing a wind farm model when the system is started; training in the wind farm model by the ensemble deep deterministic policy gradient method to obtain K pre-trained policy networks and K pre-trained Q networks, wherein K is an integer greater than 1; taking the pre-trained policy networks and Q networks as initial networks and learning the actual optimal policy in the real environment by the ensemble deep deterministic policy gradient method; and cooperatively controlling the wind farm with the optimal policy.
The invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements one or a combination of the steps of the wind farm cooperative control method. The storage medium may be a read-only memory, a magnetic disk, an optical disk or the like.
While embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the application; those skilled in the art may make variations, modifications, substitutions and alterations to the above embodiments within the scope of the application, and such changes are intended to be included within the scope of the application.

Claims (8)

1. A wind farm cooperative control method, characterized by comprising the following steps:
constructing a wind farm model;
training in the wind farm model by an ensemble deep deterministic policy gradient method to obtain K pre-trained policy networks and K pre-trained Q networks, wherein K is an integer greater than 1; the training method of a single policy network and a single Q network comprises the following steps:
initializing a policy network μ(s|θ^μ) and a Q network Q(s, a|θ^Q), wherein θ^Q denotes the Q-network parameters, θ^μ denotes the policy-network parameters and s denotes a state;
initializing a target Q network Q′ and a target policy network μ′ with the same weights;
initializing a replay buffer;
training the policy network and the Q network in the wind farm model by the deep deterministic policy gradient method; wherein,
step A, receiving a random model state s_t;
step B, obtaining a behaviour from the policy network: a_t = μ(s_t|θ^μ) + N_t, wherein N_t is Gaussian noise;
step C, executing the behaviour a_t in the wind farm model and obtaining a reward r_t;
step D, storing the transition (s_t, a_t, r_t, s_{t+1}) in the replay buffer and randomly sampling n transitions from the replay buffer to form a batch, n being an integer greater than 1;
step E, updating the policy network, the Q network, the target policy network and the target Q network;
step F, iterating steps A-E until convergence to complete the pre-training of the policy network and the Q network;
taking the pre-trained policy networks and Q networks as initial networks, and learning the actual optimal policy in the real environment by the ensemble deep deterministic policy gradient method;
and cooperatively controlling the wind farm with the optimal policy.
2. The wind farm cooperative control method of claim 1, wherein the wind farm model comprises a wind turbine output model and a wake model;
the wind turbine output model gives the power output P of a turbine as a function of the deflection (yaw) angle γ, the free wind speed V, the rotor swept area A and the axial induction factor a, wherein a is the control variable expressed as a = (V − V′)/V, V′ being the wind speed after the free wind has passed through the turbine;
the wake model gives the ratio of wind-speed reduction behind a turbine as a function of the turbine position x, the rotor diameter D and the roughness coefficient k.
3. The wind farm cooperative control method according to claim 1, wherein the step of updating the policy network, the Q network, the target policy network and the target Q network comprises:
updating the Q network by minimizing the loss L = (1/n) Σ_i (y_i − Q(s_i, a_i|θ^Q))², wherein i denotes the i-th sample in the mini-batch, r_i is the reward of the i-th sample, y_i is the target value of the i-th sample, s_i is the model state of the i-th sample and a_i is the behaviour executed in the model for the i-th sample;
updating the policy network with the policy gradient ∇_θ^μ J ≈ (1/n) Σ_i ∇_a Q(s, a|θ^Q)|_{s=s_i, a=μ(s_i)} · ∇_θ^μ μ(s|θ^μ)|_{s=s_i}, wherein J denotes the cumulative discounted reward;
updating the target policy network and the target Q network: θ^Q′ ← τ·θ^Q + (1−τ)·θ^Q′ and θ^μ′ ← τ·θ^μ + (1−τ)·θ^μ′, wherein τ is the update parameter.
4. The wind farm cooperative control method according to claim 1, wherein the step of taking the pre-trained policy networks and Q networks as initial networks and learning the actual optimal policy in the real environment by the ensemble deep deterministic policy gradient method comprises:
initializing K actual policy networks from the K pre-trained policy networks;
selecting the last pre-trained Q network to initialize the actual Q network, and initializing K target policy networks and one target Q network with the same weights;
initializing a replay buffer;
learning the actual optimal policy in the real environment by the ensemble deep deterministic policy gradient method.
5. The wind farm cooperative control method according to claim 4, wherein the step of learning the actual optimal policy in the real environment by the ensemble deep deterministic policy gradient method comprises:
receiving a random real-environment state s_t;
obtaining a behaviour a_t by integrating the outputs of the K policy networks and adding Gaussian noise N_t;
the agent executing the behaviour a_t and obtaining a reward r_t;
storing the transition (s_t, a_t, r_t, s_{t+1}) in the replay buffer and randomly sampling N transitions from the replay buffer to form a batch, N being an integer greater than 1;
updating the policy networks, the Q network, the target policy networks and the target Q network;
iterating the above steps until convergence to complete the actual training of the policy networks and the Q network.
6. The wind farm cooperative control method according to claim 4, wherein the step of updating the policy networks, the Q network, the target policy networks and the target Q network comprises:
updating the Q network by minimizing the loss L = (1/N) Σ_i (y_i − Q(s_i, a_i|θ^Q))², wherein i denotes the i-th sample in the mini-batch, r_i is the reward of the i-th sample, y_i is the target value of the i-th sample, s_i is the real-environment state of the i-th sample and a_i is the behaviour executed in the real environment for the i-th sample;
then updating each policy network separately with its policy gradient ∇_θ^μ J ≈ (1/N) Σ_i ∇_a Q(s, a|θ^Q)|_{s=s_i, a=μ(s_i)} · ∇_θ^μ μ(s|θ^μ)|_{s=s_i}, wherein J denotes the cumulative discounted reward;
finally updating the target policy networks and the target Q network: θ^Q′ ← τ·θ^Q + (1−τ)·θ^Q′ and θ^μ′ ← τ·θ^μ + (1−τ)·θ^μ′, wherein τ is the update parameter.
7. A terminal comprising a processor and a memory, wherein the processor is configured to execute a computer program stored in the memory to perform the wind farm cooperative control method according to any of claims 1 to 6.
8. A computer readable storage medium having stored thereon a computer program, which when executed by a processor, implements the wind farm cooperative control method according to any of claims 1 to 6.
CN202010056867.4A 2020-01-16 2020-01-16 Wind field cooperative control method, terminal and computer readable storage medium Active CN111310384B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010056867.4A CN111310384B (en) 2020-01-16 2020-01-16 Wind field cooperative control method, terminal and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010056867.4A CN111310384B (en) 2020-01-16 2020-01-16 Wind field cooperative control method, terminal and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111310384A CN111310384A (en) 2020-06-19
CN111310384B (en) 2024-05-21

Family

ID=71156346

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010056867.4A Active CN111310384B (en) 2020-01-16 2020-01-16 Wind field cooperative control method, terminal and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111310384B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112460741B (en) * 2020-11-23 2021-11-26 香港中文大学(深圳) Control method of building heating, ventilation and air conditioning system
CN114017904B (en) * 2021-11-04 2023-01-20 广东电网有限责任公司 Operation control method and device for building HVAC system

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105098840A (en) * 2015-09-16 2015-11-25 国电联合动力技术有限公司 Coordinated control method for power of wind power plant and system employing coordinated control method
CN107895960A (en) * 2017-11-01 2018-04-10 北京交通大学长三角研究院 City rail traffic ground type super capacitor energy storage system energy management method based on intensified learning
CN108321795A (en) * 2018-01-19 2018-07-24 上海交通大学 Start-stop of generator set configuration method based on depth deterministic policy algorithm and system
CN108803321A (en) * 2018-05-30 2018-11-13 清华大学 Autonomous Underwater Vehicle Trajectory Tracking Control method based on deeply study
CN109523029A (en) * 2018-09-28 2019-03-26 清华大学深圳研究生院 For the adaptive double from driving depth deterministic policy Gradient Reinforcement Learning method of training smart body
CN110027553A (en) * 2019-04-10 2019-07-19 湖南大学 A kind of anti-collision control method based on deeply study
CN110597058A (en) * 2019-08-28 2019-12-20 浙江工业大学 Three-degree-of-freedom autonomous underwater vehicle control method based on reinforcement learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Jinkyoo Park, Kincho H. Law. Cooperative wind turbine control for maximizing wind farm power using sequential convex programming. Energy Conversion and Management, 2015, Vol. 101, pp. 295-316. *

Also Published As

Publication number Publication date
CN111310384A (en) 2020-06-19


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant