CN111310384B - Wind field cooperative control method, terminal and computer readable storage medium - Google Patents
- Publication number
- CN111310384B (application CN202010056867.4A)
- Authority
- CN
- China
- Prior art keywords
- network
- strategy
- target
- wind
- networks
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS; G06—COMPUTING, CALCULATING OR COUNTING; G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS; G06N3/00—Computing arrangements based on biological models; G06N3/02—Neural networks
- G06N3/045—Architecture, e.g. interconnection topology; combinations of networks
- G06N3/08—Learning methods
Abstract
The invention provides a wind field cooperative control method, a terminal and a computer readable storage medium. The wind field cooperative control method comprises the following steps: constructing a wind field model; training in the wind field model by an integrated depth deterministic strategy gradient descent method to obtain K pre-trained strategy networks and K pre-trained Q networks, wherein K is an integer greater than 1; taking the pre-trained strategy networks and Q networks as initial networks, and learning an actual optimal strategy in an actual environment through the integrated depth deterministic strategy gradient descent method; and adopting the optimal strategy to cooperatively control the wind field. Compared with the traditional approach of obtaining the optimal control strategy purely by trial and error, the pre-training stage effectively reduces the average learning cost of the subsequent learning in the actual environment, and the strategy-integration stage reduces the randomness of the strategy during learning, thereby reducing the randomness of the learning cost.
Description
Technical Field
The application relates to the technical field of wind power generation, in particular to a wind field cooperative control method, a terminal and a computer readable storage medium.
Background
With growing attention to environmental problems, promoting the development of clean energy has become a primary goal of current energy policy. For wind energy, an important component of clean energy, a major problem is how to reduce wake effects and thereby maximize the overall wind farm output through coordinated control of all fans in the wind farm.
Existing wind field cooperative control methods that apply reinforcement learning without constructing a wake model obtain an optimal control strategy by trial and error; however, the learning process of such methods is random and the learning cost is high.
Thus, the prior art has yet to be developed.
Disclosure of Invention
The application aims to solve the problems of randomness and high learning cost in the learning process of the traditional wind field cooperative control method.
In order to solve the technical problems, the invention discloses a wind field cooperative control method, which comprises the following steps:
Constructing a wind field model;
Training in the wind field model by an integrated depth deterministic strategy gradient descent method to obtain K pre-trained strategy networks and K pre-trained Q networks, wherein K is an integer greater than 1;
taking a pre-trained strategy network and a Q network as initial networks, and learning an actual optimal strategy in an actual environment through an integrated depth deterministic strategy gradient descent method;
and adopting an optimal strategy to cooperatively control the wind field.
Further, the wind field model comprises a wind driven generator output model and a wake model;
The output model of the wind driven generator is:
$$P = \tfrac{1}{2}\rho A\, C_P(a)\, v^{3}\cos\gamma, \qquad C_P(a) = 4a(1-a)^{2},$$
wherein $P$ is the output of the wind driven generator, $\gamma$ is the deflection angle, $v$ is the free wind speed, $A$ is the area of the blade surface of the fan, $\rho$ is the air density, and $a$ is the axial induction factor, which is the control variable, expressed as $a = (v - v_x)/v$, wherein $v_x$ is the wind speed after the free wind passes through the wind driven generator;
The wake model is:
$$\delta v(x) = \frac{2a}{\bigl(1 + 2kx/D\bigr)^{2}},$$
wherein $x$ represents the downstream position relative to the fan, $\delta v$ represents the ratio of wind speed reduction, $D$ is the diameter of the blade surface of the fan, and $k$ is the roughness coefficient.
Further, the training method of a single strategy network and a single Q network comprises the following steps:
initializing a policy network $\mu(s\mid\theta^{\mu})$ and a Q network $Q(s,a\mid\theta^{Q})$, wherein $\theta^{Q}$ represents the Q network parameters, $\theta^{\mu}$ represents the policy network parameters, and $s$ represents a state;
initializing a target Q network $Q'(s,a\mid\theta^{Q'})$ and a target policy network $\mu'(s\mid\theta^{\mu'})$ with identical weights;
initializing a replay buffer;
training the strategy network and the Q network in the wind field model by a depth deterministic strategy gradient descent method.
Further, the step of training the strategy network and the Q network in the wind field model by a depth deterministic strategy gradient descent method includes:
accepting a random model state $s_t$;
obtaining a behavior from the strategy network: $a_t = \mu(s_t\mid\theta^{\mu}) + \mathcal{N}_t$, wherein $\mathcal{N}_t$ is Gaussian noise;
executing the behavior $a_t$ in the wind field model and obtaining a reward $r_t$;
storing the transition $(s_t, a_t, r_t, s_{t+1})$ in the replay buffer, and randomly sampling n transitions from the replay buffer to form a batch, n being an integer greater than 1;
updating the strategy network, the Q network, the target strategy network and the target Q network;
iterating the above steps until convergence, completing the pre-training of the strategy network and the Q network.
Further, the step of updating the strategy network, the Q network, the target strategy network, and the target Q network includes:
updating the Q network by minimizing the loss
$$L = \frac{1}{n}\sum_{j}\bigl(y_j - Q(s_j, a_j \mid \theta^{Q})\bigr)^{2}, \qquad y_j = r_j + \gamma\, Q'\bigl(s'_j,\ \mu'(s'_j\mid\theta^{\mu'}) \mid \theta^{Q'}\bigr),$$
wherein $j$ represents the $j$-th sample in the mini-batch, $r_j$ is the reward of the $j$-th data, $y_j$ is the target value of the $j$-th data, $s_j$ represents the model state of the $j$-th data, $s'_j$ the next model state, and $a_j$ the execution behavior of the $j$-th data in the model;
updating the strategy network by using the policy gradient
$$\nabla_{\theta^{\mu}} J \approx \frac{1}{n}\sum_{j}\nabla_{a}Q(s,a\mid\theta^{Q})\big|_{s=s_j,\,a=\mu(s_j)}\;\nabla_{\theta^{\mu}}\mu(s\mid\theta^{\mu})\big|_{s=s_j},$$
wherein $J$ represents the cumulative discount reward;
updating the target strategy network and the target Q network: $\theta^{Q'} \leftarrow \tau\theta^{Q} + (1-\tau)\theta^{Q'}$, $\theta^{\mu'} \leftarrow \tau\theta^{\mu} + (1-\tau)\theta^{\mu'}$, wherein $\tau$ is the update parameter.
Further, the step of learning the actual optimal strategy in the actual environment through the integrated depth deterministic strategy gradient descent method, taking the pre-trained strategy networks and Q network as initial networks, comprises the following steps:
initializing K actual strategy networks with the K strategy networks $\mu_1,\dots,\mu_K$ obtained by pre-training;
selecting the last pre-trained Q network to initialize the actual Q network, and initializing K target strategy networks and one target Q network with the same weights;
initializing a replay buffer;
learning the actual optimal strategy in the real environment through the integrated depth deterministic strategy gradient descent method.
Further, the step of learning the actual optimal strategy through the integrated depth deterministic strategy gradient descent method in the real environment comprises the following steps:
accepting a random real environment state $s_t$;
integrating the K strategy networks to obtain a behavior: $a_t = \frac{1}{K}\sum_{k=1}^{K}\mu_k(s_t\mid\theta^{\mu_k}) + \mathcal{N}_t$, wherein $\mathcal{N}_t$ is Gaussian noise;
the agent executing the behavior $a_t$ and obtaining a reward $r_t$;
storing the transition $(s_t, a_t, r_t, s_{t+1})$ in the replay buffer, and randomly sampling N transitions from the replay buffer to form a batch, N being an integer greater than 1;
updating the strategy networks, the Q network, the target strategy networks and the target Q network;
iterating the above steps until convergence, completing the actual training of the strategy networks and the Q network.
Further, the step of updating the strategy networks, the Q network, the target strategy networks, and the target Q network includes:
updating the Q network by minimizing the loss
$$L = \frac{1}{N}\sum_{j}\bigl(y_j - Q(s_j, a_j \mid \theta^{Q})\bigr)^{2}, \qquad y_j = r_j + \gamma\, Q'\bigl(s'_j,\ \mu'(s'_j\mid\theta^{\mu'}) \mid \theta^{Q'}\bigr),$$
wherein $j$ represents the $j$-th sample in the mini-batch, $r_j$ is the reward of the $j$-th data, $y_j$ is the target value of the $j$-th data, $s_j$ represents the state of the $j$-th data in the real environment, $s'_j$ the next state, and $a_j$ the execution behavior of the $j$-th data in the real environment;
then updating each strategy network by using its policy gradient
$$\nabla_{\theta^{\mu_k}} J \approx \frac{1}{N}\sum_{j}\nabla_{a}Q(s,a\mid\theta^{Q})\big|_{s=s_j,\,a=\mu_k(s_j)}\;\nabla_{\theta^{\mu_k}}\mu_k(s\mid\theta^{\mu_k})\big|_{s=s_j},$$
wherein $J$ represents the cumulative discount reward;
finally updating the target strategy networks and the target Q network: $\theta^{Q'} \leftarrow \tau\theta^{Q} + (1-\tau)\theta^{Q'}$, $\theta^{\mu_k'} \leftarrow \tau\theta^{\mu_k} + (1-\tau)\theta^{\mu_k'}$, wherein $\tau$ is the update parameter.
A terminal comprising a processor and a memory, wherein the processor is configured to execute a computer program stored in the memory to perform a wind farm cooperative control method as described above.
A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a wind farm cooperative control method as described above.
Compared with the prior art, the application has the following beneficial effects: training is carried out in a wind field model through an integrated depth deterministic strategy gradient descent method to obtain pre-trained strategy networks and Q networks; the pre-trained networks are then used as initial networks, and an actual optimal strategy is learned in an actual environment through the integrated depth deterministic strategy gradient descent method. Compared with the traditional approach of obtaining the optimal control strategy purely by trial and error, the pre-training stage effectively reduces the average learning cost of the actual learning process, and the strategy-integration stage reduces the randomness of the strategy during learning, thereby reducing the randomness of the learning cost.
Drawings
FIG. 1 is a schematic diagram of the training method of the present invention.
Fig. 2 (a), (b), (c) and (d) are graphs comparing training results of the present invention with those of the conventional method in four scenarios shown in table 2.
Fig. 3 is a schematic diagram of a terminal structure provided by the present invention.
Detailed Description
In order to make the objects, features and advantages of the present application more comprehensible, the technical solutions in the embodiments of the present application will be clearly described in conjunction with the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The embodiment of the application provides a wind field cooperative control method. The principle is shown in FIG. 1: pre-training is first carried out in a model, the networks are then copied, and training continues in the real environment, where the optimal strategy is learned. The method mainly comprises the following steps S1-S3:
S1, constructing a wind field model, for example a low-fidelity wind field model.
Specifically, the wind field model may include a wind driven generator output model and a wake model. The output model of the wind driven generator is:
$$P = \tfrac{1}{2}\rho A\, C_P(a)\, v^{3}\cos\gamma, \qquad C_P(a) = 4a(1-a)^{2},$$
wherein $P$ is the output of the wind driven generator, $\gamma$ is the deflection angle, $v$ is the free wind speed, $A$ is the area of the blade surface of the fan, $\rho$ is the air density, and $a$ is the axial induction factor, which is the control variable.
The control variable may be expressed as
$$a = \frac{v - v_x}{v},$$
wherein $v_x$ is the wind speed after the free wind passes through the wind driven generator.
The wake model is:
$$\delta v(x) = \frac{2a}{\bigl(1 + 2kx/D\bigr)^{2}},$$
wherein $x$ represents the downstream position relative to the fan, $\delta v$ represents the ratio of wind speed reduction, $D$ is the diameter of the blade surface of the fan, and $k$ is the roughness coefficient.
The analytical model uses an actuator disc model and a PARK model. The parameters of the wind farm simulator and the PARK model are shown in Table 1.
Four wind farm topologies were tested. The turbine diameter is denoted D; a wind-tunnel-style layout is used, with 5D and 7D selected as the fan spacing parameters and 5 and 10 as the fan quantity parameters. The scenes are numbered 1 through 4, and the scene parameters are shown in Table 2.
Wind speeds of 8 m/s to 16 m/s are randomly generated according to the Weibull distribution. The wind angle is assumed to be 0.
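As a concrete illustration of the set-up above, the following sketch implements the actuator-disc power model, the PARK/Jensen wake deficit, and the Weibull wind sampling. It is a minimal illustration, not the patent's simulator: the air density `RHO`, the default roughness coefficient `k=0.075`, and the Weibull shape parameter are assumed values not given in the text.

```python
import numpy as np

RHO = 1.225  # air density in kg/m^3 (assumed; not specified in the description)

def power_output(v_free, a, rotor_area, yaw=0.0):
    """Actuator-disc turbine output P = 0.5*rho*A*Cp(a)*v^3*cos(yaw),
    with power coefficient Cp(a) = 4a(1-a)^2 from the axial induction factor a."""
    cp = 4.0 * a * (1.0 - a) ** 2
    return 0.5 * RHO * rotor_area * cp * v_free ** 3 * np.cos(yaw)

def wake_deficit(a, diameter, x, k=0.075):
    """PARK/Jensen fractional wind-speed reduction a distance x downstream."""
    return 2.0 * a / (1.0 + 2.0 * k * x / diameter) ** 2

def downstream_speed(v_free, a, diameter, x, k=0.075):
    """Wind speed seen by a turbine standing x metres behind an upstream one."""
    return v_free * (1.0 - wake_deficit(a, diameter, x, k))

# Free wind speeds in the 8-16 m/s band, drawn from a clipped Weibull
# distribution as in the scenario description; the shape parameter is assumed.
rng = np.random.default_rng(seed=0)
wind_speeds = np.clip(8.0 + 8.0 * rng.weibull(2.0, size=10), 8.0, 16.0)
```

For example, with $a = 1/3$ (the Betz-optimal induction factor) the power coefficient is $16/27$, and the deficit directly behind the rotor ($x = 0$) is $2a$.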
S2, training in the wind field model through an integrated depth deterministic strategy gradient descent method to obtain K pre-trained strategy networks and K pre-trained Q networks, wherein K is an integer greater than 1.
Specifically, the method comprises the following steps S21-S24:
S21, initializing a policy network $\mu(s\mid\theta^{\mu})$ and a Q network $Q(s,a\mid\theta^{Q})$, wherein $\theta^{Q}$ represents the Q network parameters, $\theta^{\mu}$ represents the policy network parameters, and $s$ represents the state, i.e. the wind speeds at the wind farm fans, while initializing a reward function r. Regarding the deep reinforcement learning settings: the policy network is a six-layer fully connected neural network, and the Q network is a seven-layer fully connected neural network. Both networks use a linear activation function in the last hidden layer and rectified linear units in the remaining layers. Other parameters are shown in Table 1.
The reward function is defined as the total output power $P_{\mathrm{total}}$ minus the unsafe-behavior loss $L_{\mathrm{unsafe}}$. The unsafe-behavior loss of a turbine is calculated by multiplying the distance of its behavior outside the safety range by a coefficient $c$.
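The reward just described can be sketched as follows; the safety-range bounds and the coefficient `c` are illustrative assumptions, since the description gives neither:

```python
def unsafe_loss(actions, low, high, c):
    """Sum over turbines of c times the distance each behavior lies
    outside the safety range [low, high]."""
    return c * sum(max(low - a, 0.0) + max(a - high, 0.0) for a in actions)

def reward(total_power, actions, low=0.0, high=0.5, c=1.0):
    """r = P_total - L_unsafe: total farm output minus unsafe-behavior loss."""
    return total_power - unsafe_loss(actions, low, high, c)
```

An action set entirely inside the safe range incurs no penalty, so the reward then equals the total output power.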
S22, initializing a target Q network $Q'(s,a\mid\theta^{Q'})$ and a target policy network $\mu'(s\mid\theta^{\mu'})$ with the same weights.
S23, initializing a replay buffer (RB).
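A replay buffer of the kind initialized in S23 can be sketched as below; the capacity value is an assumption:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity buffer of (s, a, r, s') transitions with uniform sampling."""
    def __init__(self, capacity=100_000):
        self.transitions = deque(maxlen=capacity)  # oldest entries drop out first

    def store(self, state, action, reward_value, next_state):
        self.transitions.append((state, action, reward_value, next_state))

    def sample(self, n):
        # Uniform random mini-batch, as used later in step S244.
        return random.sample(self.transitions, n)

    def __len__(self):
        return len(self.transitions)
```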
S24, training a strategy network and a Q network in the wind field model through a depth deterministic strategy gradient descent method.
The training process includes steps S241-S246:
S241, accepting a random model state $s_t$, for example a wind speed of 11 m/s.
S242, obtaining a behavior from the policy network: $a_t = \mu(s_t\mid\theta^{\mu}) + \mathcal{N}_t$, wherein $\mathcal{N}_t$ is Gaussian noise.
S243, executing the behavior $a_t$ in the wind field model and obtaining the reward $r_t$.
S244, storing the transition $(s_t, a_t, r_t, s_{t+1})$ in the replay buffer, and randomly sampling n transitions from the replay buffer to form a batch, n being an integer greater than 1.
S245, updating the strategy network, the Q network, the target strategy network and the target Q network. The updating method comprises the following steps S2451-S2453:
S2451, updating the Q network by minimizing the loss
$$L = \frac{1}{n}\sum_{j}\bigl(y_j - Q(s_j, a_j \mid \theta^{Q})\bigr)^{2}, \qquad y_j = r_j + \gamma\, Q'\bigl(s'_j,\ \mu'(s'_j\mid\theta^{\mu'}) \mid \theta^{Q'}\bigr),$$
wherein $j$ represents the $j$-th sample in the mini-batch, $r_j$ is the reward of the $j$-th data, $y_j$ is the target value of the $j$-th data, $s_j$ represents the model state of the $j$-th data, $s'_j$ the next model state, and $a_j$ the execution behavior of the $j$-th data in the model.
S2452, updating the strategy network with the policy gradient
$$\nabla_{\theta^{\mu}} J \approx \frac{1}{n}\sum_{j}\nabla_{a}Q(s,a\mid\theta^{Q})\big|_{s=s_j,\,a=\mu(s_j)}\;\nabla_{\theta^{\mu}}\mu(s\mid\theta^{\mu})\big|_{s=s_j},$$
wherein $J$ represents the cumulative discount reward.
S2453, updating the target strategy network and the target Q network by the soft update $\theta^{Q'} \leftarrow \tau\theta^{Q} + (1-\tau)\theta^{Q'}$ and $\theta^{\mu'} \leftarrow \tau\theta^{\mu} + (1-\tau)\theta^{\mu'}$, wherein $\tau$ is the update parameter.
S246, iterating steps S241-S245 until convergence completes the pre-training of the strategy network and the Q network. By the above method, 10 strategy networks and 10 Q networks are trained.
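The TD target of S2451 and the soft target-network update of S2453 can be sketched with plain numpy, treating the networks abstractly as callables; `GAMMA` and `TAU` are assumed hyperparameter values not stated in the text:

```python
import numpy as np

GAMMA = 0.99  # discount factor (assumed)
TAU = 0.005   # soft-update rate tau (assumed)

def td_targets(rewards, next_states, target_policy, target_q, gamma=GAMMA):
    """y_j = r_j + gamma * Q'(s'_j, mu'(s'_j)) for a sampled mini-batch."""
    next_actions = target_policy(next_states)
    return rewards + gamma * target_q(next_states, next_actions)

def soft_update(online_params, target_params, tau=TAU):
    """theta' <- tau*theta + (1 - tau)*theta', applied per parameter array."""
    return [tau * w + (1.0 - tau) * wt
            for w, wt in zip(online_params, target_params)]
```

The Q network is then regressed toward `td_targets(...)` on each mini-batch, and `soft_update` is applied to both target networks after every gradient step, so the targets track the online networks slowly and keep learning stable.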
S3, taking the pre-trained strategy network and the Q network as initial networks, and learning an actual optimal strategy in an actual environment through an integrated depth deterministic strategy gradient descent method.
The training process includes steps S31-S34:
S31, initializing the 10 strategy networks obtained through pre-training into 10 actual strategy networks.
S32, selecting the last pre-trained Q network to initialize to an actual Q network, and initializing 10 target strategy networks and 1 target Q network with the same weight.
S33, initializing a replay buffer (RB).
S34, learning an actual optimal strategy in a real environment through an integrated depth deterministic strategy gradient descent method.
The training process includes steps S341-S346:
S341, accepting a random real environment state $s_t$.
S342, obtaining a behavior by integrating the K strategy networks: $a_t = \frac{1}{K}\sum_{k=1}^{K}\mu_k(s_t\mid\theta^{\mu_k}) + \mathcal{N}_t$, wherein $\mathcal{N}_t$ is Gaussian noise.
S343, the agent executing the behavior $a_t$ and obtaining the reward $r_t$.
S344, storing the transition $(s_t, a_t, r_t, s_{t+1})$ in the replay buffer, and randomly sampling N transitions from the replay buffer to form a batch, N being an integer greater than 1.
S345, updating the strategy networks, the Q network, the target strategy networks and the target Q network; the update process includes steps S3451-S3453:
S3451, updating the Q network by minimizing the loss
$$L = \frac{1}{N}\sum_{j}\bigl(y_j - Q(s_j, a_j \mid \theta^{Q})\bigr)^{2}, \qquad y_j = r_j + \gamma\, Q'\bigl(s'_j,\ \mu'(s'_j\mid\theta^{\mu'}) \mid \theta^{Q'}\bigr),$$
wherein $j$ represents the $j$-th sample in the mini-batch, $r_j$ is the reward of the $j$-th data, $y_j$ is the target value of the $j$-th data, $s_j$ represents the state of the $j$-th data in the real environment, $s'_j$ the next state, and $a_j$ the execution behavior of the $j$-th data in the real environment.
S3452, updating each strategy network with its policy gradient
$$\nabla_{\theta^{\mu_k}} J \approx \frac{1}{N}\sum_{j}\nabla_{a}Q(s,a\mid\theta^{Q})\big|_{s=s_j,\,a=\mu_k(s_j)}\;\nabla_{\theta^{\mu_k}}\mu_k(s\mid\theta^{\mu_k})\big|_{s=s_j},$$
wherein $J$ represents the cumulative discount reward.
S3453, finally updating the target strategy networks and the target Q network by the soft update $\theta^{Q'} \leftarrow \tau\theta^{Q} + (1-\tau)\theta^{Q'}$ and $\theta^{\mu_k'} \leftarrow \tau\theta^{\mu_k} + (1-\tau)\theta^{\mu_k'}$, wherein $\tau$ is the update parameter.
S346, iterating the above steps until convergence completes the actual training of the strategy networks and the Q network; the actual optimal strategy is thus learned in the actual environment through the integrated depth deterministic strategy gradient descent method. The final training results are shown in FIG. 2. The results show that in most cases the fan output of the wind farm under this method (ABE) is greater than under conventional methods, including the optimal greedy algorithm and the Park method.
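The behavior-integration step S342 can be sketched as a simple average of the K strategy-network outputs plus exploration noise; the averaging form and the noise scale are assumptions consistent with the description of the integration step:

```python
import numpy as np

def ensemble_action(state, policies, noise_std=0.1, rng=None):
    """a_t = (1/K) * sum_k mu_k(s_t) + N_t, with Gaussian exploration noise N_t."""
    rng = rng if rng is not None else np.random.default_rng()
    mean_action = np.mean([policy(state) for policy in policies], axis=0)
    noise = rng.normal(0.0, noise_std, size=np.shape(mean_action))
    return mean_action + noise

# Example: K = 3 illustrative linear "policies" acting on a 2-dimensional state.
policies = [lambda s, w=w: w * np.asarray(s) for w in (0.5, 1.0, 1.5)]
action = ensemble_action([1.0, 2.0], policies, noise_std=0.0)
```

With zero noise the result is exactly the mean policy output; during learning the Gaussian term drives exploration, while averaging over the ensemble damps the randomness of any single strategy network.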
Training is carried out in a wind field model through the integrated depth deterministic strategy gradient descent method to obtain pre-trained strategy networks and Q networks; the pre-trained networks are then used as initial networks, and the actual optimal strategy is learned in the actual environment through the integrated depth deterministic strategy gradient descent method. Compared with the traditional approach of obtaining the optimal control strategy purely by trial and error, the pre-training stage effectively reduces the average learning cost of the actual learning process, and the strategy-integration stage reduces the randomness of the strategy during learning, thereby reducing the randomness of the learning cost.
TABLE 3 comparison of learning costs for different scenarios
The invention also provides a terminal, as shown in fig. 3, comprising: a processor (processor) 10, a memory (memory) 20, a communication interface (CommunicationsInterface) 30, and a bus 40; wherein,
The processor 10, memory 20, and communication interface 30 communicate with each other via the bus 40.
The communication interface 30 is used for information transfer between communication devices of the mobile terminal.
The processor 10 is configured to invoke the computer program in the memory 20 to perform the methods provided in the above method embodiments, for example including: constructing a wind field model when the system is started; training in the wind field model by an integrated depth deterministic strategy gradient descent method to obtain K pre-trained strategy networks and K pre-trained Q networks, wherein K is an integer greater than 1; taking the pre-trained strategy networks and Q network as initial networks, and learning an actual optimal strategy in the actual environment through the integrated depth deterministic strategy gradient descent method; and adopting the optimal strategy to cooperatively control the wind field.
The invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements one or a combination of the steps of the wind farm cooperative control method. The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.
Although embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the application; variations, modifications, substitutions and alterations may be made to the above embodiments by those skilled in the art within the scope of the application, and these are intended to be included within the scope of the application.
Claims (8)
1. A wind field cooperative control method, characterized by comprising the following steps:
Constructing a wind field model;
Training in the wind field model by an integrated depth deterministic strategy gradient descent method to obtain K pre-trained strategy networks and K pre-trained Q networks, wherein K is an integer greater than 1; the training method of the single strategy network and the single Q network comprises the following steps:
initializing a policy network $\mu(s\mid\theta^{\mu})$ and a Q network $Q(s,a\mid\theta^{Q})$, wherein $\theta^{Q}$ represents the Q network parameters, $\theta^{\mu}$ represents the policy network parameters, and $s$ represents a state;
initializing a target Q network $Q'(s,a\mid\theta^{Q'})$ and a target policy network $\mu'(s\mid\theta^{\mu'})$ with identical weights;
initializing a replay buffer;
training the strategy network and the Q network in the wind field model by a depth deterministic strategy gradient descent method; wherein,
step A, accepting a random model state $s_t$;
step B, obtaining a behavior from the strategy network: $a_t = \mu(s_t\mid\theta^{\mu}) + \mathcal{N}_t$, wherein $\mathcal{N}_t$ is Gaussian noise;
step C, executing the behavior $a_t$ in the wind field model and obtaining a reward $r_t$;
step D, storing the transition $(s_t, a_t, r_t, s_{t+1})$ in the replay buffer, and randomly sampling n transitions from the replay buffer to form a batch, n being an integer greater than 1;
step E, updating the strategy network, the Q network, the target strategy network and the target Q network;
step F, iterating steps A-E until convergence, completing the pre-training of the strategy network and the Q network;
taking a pre-trained strategy network and a Q network as initial networks, and learning an actual optimal strategy in an actual environment through an integrated depth deterministic strategy gradient descent method;
and adopting an optimal strategy to cooperatively control the wind field.
2. The wind farm cooperative control method of claim 1, wherein the wind farm model comprises a wind generator output model and a wake model;
The output model of the wind driven generator is:
$$P = \tfrac{1}{2}\rho A\, C_P(a)\, v^{3}\cos\gamma, \qquad C_P(a) = 4a(1-a)^{2},$$
wherein $P$ is the output of the wind driven generator, $\gamma$ is the deflection angle, $v$ is the free wind speed, $A$ is the area of the blade surface of the fan, $\rho$ is the air density, and $a$ is the axial induction factor, which is the control variable, expressed as $a = (v - v_x)/v$, wherein $v_x$ is the wind speed after the free wind passes through the wind driven generator;
the wake model is:
$$\delta v(x) = \frac{2a}{\bigl(1 + 2kx/D\bigr)^{2}},$$
wherein $x$ represents the downstream position relative to the fan, $\delta v$ represents the ratio of wind speed reduction, $D$ is the diameter of the blade surface of the fan, and $k$ is the roughness coefficient.
3. The wind farm cooperative control method according to claim 1, wherein the step of updating the policy network, the Q network, the target policy network, and the target Q network includes:
updating the Q network by minimizing the loss
$$L = \frac{1}{n}\sum_{j}\bigl(y_j - Q(s_j, a_j \mid \theta^{Q})\bigr)^{2}, \qquad y_j = r_j + \gamma\, Q'\bigl(s'_j,\ \mu'(s'_j\mid\theta^{\mu'}) \mid \theta^{Q'}\bigr),$$
wherein $j$ represents the $j$-th sample in the mini-batch, $r_j$ is the reward of the $j$-th data, $y_j$ is the target value of the $j$-th data, $s_j$ represents the model state of the $j$-th data, $s'_j$ the next model state, and $a_j$ the execution behavior of the $j$-th data in the model;
updating the strategy network by using the policy gradient
$$\nabla_{\theta^{\mu}} J \approx \frac{1}{n}\sum_{j}\nabla_{a}Q(s,a\mid\theta^{Q})\big|_{s=s_j,\,a=\mu(s_j)}\;\nabla_{\theta^{\mu}}\mu(s\mid\theta^{\mu})\big|_{s=s_j},$$
wherein $J$ represents the cumulative discount reward;
updating the target policy network and the target Q network: $\theta^{Q'} \leftarrow \tau\theta^{Q} + (1-\tau)\theta^{Q'}$, $\theta^{\mu'} \leftarrow \tau\theta^{\mu} + (1-\tau)\theta^{\mu'}$, wherein $\tau$ is the update parameter.
4. The wind farm cooperative control method according to claim 1, wherein the step of learning the actual optimal strategy by the integrated depth deterministic strategy gradient descent method in the actual environment using the pre-trained strategy network and the Q network as initial networks comprises:
initializing K actual strategy networks with the K strategy networks $\mu_1,\dots,\mu_K$ obtained by pre-training;
selecting the last pre-trained Q network to initialize the actual Q network, and initializing K target strategy networks and one target Q network with the same weights;
initializing a replay buffer;
and learning an actual optimal strategy in a real environment through an integrated depth deterministic strategy gradient descent method.
5. The wind farm cooperative control method according to claim 4, wherein the step of learning the actual optimal strategy by the integrated depth deterministic strategy gradient descent method in the real environment comprises:
accepting a random real environment state $s_t$;
integrating the K policy networks to obtain a behavior:
$$a_t = \frac{1}{K}\sum_{k=1}^{K}\mu_k(s_t\mid\theta^{\mu_k}) + \mathcal{N}_t,$$
wherein $\mathcal{N}_t$ is Gaussian noise;
the agent executing the behavior $a_t$ and obtaining a reward $r_t$;
storing the transition $(s_t, a_t, r_t, s_{t+1})$ in the replay buffer, and randomly sampling N transitions from the replay buffer to form a batch, N being an integer greater than 1;
Updating the policy network, the Q network, the target policy network and the target Q network;
and iterating the steps until convergence, and completing the actual training of the strategy network and the Q network.
6. The wind farm cooperative control method according to claim 4, wherein the step of updating the policy network, the Q network, the target policy network, and the target Q network comprises:
updating the Q network by minimizing the loss
$$L = \frac{1}{N}\sum_{j}\bigl(y_j - Q(s_j, a_j \mid \theta^{Q})\bigr)^{2}, \qquad y_j = r_j + \gamma\, Q'\bigl(s'_j,\ \mu'(s'_j\mid\theta^{\mu'}) \mid \theta^{Q'}\bigr),$$
wherein $j$ represents the $j$-th sample in the mini-batch, $r_j$ is the reward of the $j$-th data, $y_j$ is the target value of the $j$-th data, $s_j$ represents the state of the $j$-th data in the real environment, $s'_j$ the next state, and $a_j$ the execution behavior of the $j$-th data in the real environment;
then updating each strategy network by using its policy gradient
$$\nabla_{\theta^{\mu_k}} J \approx \frac{1}{N}\sum_{j}\nabla_{a}Q(s,a\mid\theta^{Q})\big|_{s=s_j,\,a=\mu_k(s_j)}\;\nabla_{\theta^{\mu_k}}\mu_k(s\mid\theta^{\mu_k})\big|_{s=s_j},$$
wherein $J$ represents the cumulative discount reward;
finally updating the target strategy networks and the target Q network: $\theta^{Q'} \leftarrow \tau\theta^{Q} + (1-\tau)\theta^{Q'}$, $\theta^{\mu_k'} \leftarrow \tau\theta^{\mu_k} + (1-\tau)\theta^{\mu_k'}$, wherein $\tau$ is the update parameter.
7. A terminal comprising a processor and a memory, wherein the processor is configured to execute a computer program stored in the memory to perform the wind farm cooperative control method according to any of claims 1 to 6.
8. A computer readable storage medium having stored thereon a computer program, which when executed by a processor, implements the wind farm cooperative control method according to any of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010056867.4A CN111310384B (en) | 2020-01-16 | 2020-01-16 | Wind field cooperative control method, terminal and computer readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010056867.4A CN111310384B (en) | 2020-01-16 | 2020-01-16 | Wind field cooperative control method, terminal and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111310384A CN111310384A (en) | 2020-06-19 |
CN111310384B true CN111310384B (en) | 2024-05-21 |
Family
ID=71156346
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010056867.4A Active CN111310384B (en) | 2020-01-16 | 2020-01-16 | Wind field cooperative control method, terminal and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111310384B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112460741B (en) * | 2020-11-23 | 2021-11-26 | 香港中文大学(深圳) | Control method of building heating, ventilation and air conditioning system |
CN114017904B (en) * | 2021-11-04 | 2023-01-20 | 广东电网有限责任公司 | Operation control method and device for building HVAC system |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105098840A (en) * | 2015-09-16 | 2015-11-25 | Guodian United Power Technology Co., Ltd. | Coordinated control method for wind farm power and system employing the same |
CN107895960A (en) * | 2017-11-01 | 2018-04-10 | Yangtze River Delta Research Institute of Beijing Jiaotong University | Reinforcement-learning-based energy management method for ground-type supercapacitor energy storage systems in urban rail transit |
CN108321795A (en) * | 2018-01-19 | 2018-07-24 | Shanghai Jiao Tong University | Generator set start-stop configuration method and system based on a deep deterministic policy algorithm |
CN108803321A (en) * | 2018-05-30 | 2018-11-13 | Tsinghua University | Autonomous underwater vehicle trajectory tracking control method based on deep reinforcement learning |
CN109523029A (en) * | 2018-09-28 | 2019-03-26 | Graduate School at Shenzhen, Tsinghua University | Adaptive dual self-driven deep deterministic policy gradient reinforcement learning method for training agents |
CN110027553A (en) * | 2019-04-10 | 2019-07-19 | Hunan University | Anti-collision control method based on deep reinforcement learning |
CN110597058A (en) * | 2019-08-28 | 2019-12-20 | Zhejiang University of Technology | Three-degree-of-freedom autonomous underwater vehicle control method based on reinforcement learning |
Application Events
- 2020-01-16: CN application CN202010056867.4A filed in China; patent CN111310384B, status Active
Non-Patent Citations (1)
Title |
---|
Jinkyoo Park, Kincho H. Cooperative wind turbine control for maximizing wind farm power using sequential convex programming. Energy Conversion and Management. 2015, Vol. 101, pp. 295-316. *
Also Published As
Publication number | Publication date |
---|---|
CN111310384A (en) | 2020-06-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Liang et al. | A hybrid bat algorithm for economic dispatch with random wind power | |
CN112862281A (en) | Method, device, medium and electronic equipment for constructing scheduling model of comprehensive energy system | |
CN111310384B (en) | Wind field cooperative control method, terminal and computer readable storage medium | |
CN112818588B (en) | Optimal power flow calculation method, device and storage medium of power system | |
CN112613608A (en) | Reinforced learning method and related device | |
CN115972211A (en) | Control strategy offline training method based on model uncertainty and behavior prior | |
CN116388232B (en) | Wind power frequency modulation integrated inertia control method, system, electronic equipment and storage medium | |
CN111245008B (en) | Wind field cooperative control method and device | |
CN111310975A (en) | Multi-task message propagation prediction method based on a deep model | |
CN116131288A (en) | Comprehensive energy system frequency control method and system considering wind and light fluctuation | |
Angel et al. | Hardware in the loop experimental validation of PID controllers tuned by genetic algorithms | |
CN114265674A (en) | Task planning method based on reinforcement learning under time sequence logic constraint and related device | |
Shi et al. | Multi-agent reinforcement learning in Cournot games | |
CN113112092A (en) | Short-term probability density load prediction method, device, equipment and storage medium | |
CN116859745B (en) | Design method of jump system model-free game control based on deviation evaluation mechanism | |
CN111723941A (en) | Rule generation method and device, electronic equipment and storage medium | |
Glass et al. | Rapid prototyping of cooperative caching in a vanet: A case study | |
Angel et al. | Metaheuristic Tuning and Practical Implementation of a PID Controller Employing Genetic Algorithms | |
CN117175585B (en) | Wind power prediction method, device, equipment and storage medium | |
Veith et al. | Imitation Game: A Model-based and Imitation Learning Deep Reinforcement Learning Hybrid | |
CN113705067B (en) | Microgrid optimization operation strategy generation method, system, equipment and storage medium | |
CN117648585B (en) | Intelligent decision model generalization method and device based on task similarity | |
CN116633599B (en) | Method, system and medium for verifying secure login of a mobile game client | |
CN116362415B (en) | Airport ground staff oriented shift scheme generation method and device | |
US20200410366A1 (en) | Automatic determination of the run parameters for a software application on an information processing platform by genetic algorithm and enhanced noise management |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |