CN114909706B - Two-level network balance regulation and control method based on reinforcement learning algorithm and differential pressure control


Info

Publication number
CN114909706B
CN114909706B (application number CN202210432777.XA)
Authority
CN
China
Prior art keywords
unit building
value
network
data
algorithm
Prior art date
Legal status
Active
Application number
CN202210432777.XA
Other languages
Chinese (zh)
Other versions
CN114909706A (en)
Inventor
刘定杰
穆佩红
金鹤峰
谢金芳
朱浩强
Current Assignee
Changzhou Engipower Technology Co ltd
Original Assignee
Changzhou Engipower Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Changzhou Engipower Technology Co ltd filed Critical Changzhou Engipower Technology Co ltd
Priority to CN202210432777.XA
Publication of CN114909706A
Application granted
Publication of CN114909706B


Classifications

    • FMECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F24HEATING; RANGES; VENTILATING
    • F24DDOMESTIC- OR SPACE-HEATING SYSTEMS, e.g. CENTRAL HEATING SYSTEMS; DOMESTIC HOT-WATER SUPPLY SYSTEMS; ELEMENTS OR COMPONENTS THEREFOR
    • F24D19/00Details
    • F24D19/10Arrangement or mounting of control or safety devices
    • F24D19/1006Arrangement or mounting of control or safety devices for water heating systems
    • F24D19/1009Arrangement or mounting of control or safety devices for water heating systems for central heating
    • F24D19/1012Arrangement or mounting of control or safety devices for water heating systems for central heating by regulating the speed of a pump
    • FMECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F24HEATING; RANGES; VENTILATING
    • F24DDOMESTIC- OR SPACE-HEATING SYSTEMS, e.g. CENTRAL HEATING SYSTEMS; DOMESTIC HOT-WATER SUPPLY SYSTEMS; ELEMENTS OR COMPONENTS THEREFOR
    • F24D19/00Details
    • F24D19/10Arrangement or mounting of control or safety devices
    • F24D19/1006Arrangement or mounting of control or safety devices for water heating systems
    • F24D19/1009Arrangement or mounting of control or safety devices for water heating systems for central heating
    • F24D19/1015Arrangement or mounting of control or safety devices for water heating systems for central heating using a valve or valves
    • FMECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F24HEATING; RANGES; VENTILATING
    • F24DDOMESTIC- OR SPACE-HEATING SYSTEMS, e.g. CENTRAL HEATING SYSTEMS; DOMESTIC HOT-WATER SUPPLY SYSTEMS; ELEMENTS OR COMPONENTS THEREFOR
    • F24D19/00Details
    • F24D19/10Arrangement or mounting of control or safety devices
    • F24D19/1006Arrangement or mounting of control or safety devices for water heating systems
    • F24D19/1009Arrangement or mounting of control or safety devices for water heating systems for central heating
    • F24D19/1048Counting of energy consumption
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • FMECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F24HEATING; RANGES; VENTILATING
    • F24DDOMESTIC- OR SPACE-HEATING SYSTEMS, e.g. CENTRAL HEATING SYSTEMS; DOMESTIC HOT-WATER SUPPLY SYSTEMS; ELEMENTS OR COMPONENTS THEREFOR
    • F24D2220/00Components of central heating installations excluding heat sources
    • F24D2220/04Sensors
    • F24D2220/042Temperature sensors
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02BCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO BUILDINGS, e.g. HOUSING, HOUSE APPLIANCES OR RELATED END-USER APPLICATIONS
    • Y02B30/00Energy efficient heating, ventilation or air conditioning [HVAC]
    • Y02B30/70Efficient control or regulation technologies, e.g. for control of refrigerant flow, motor or heating

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Thermal Sciences (AREA)
  • Mechanical Engineering (AREA)
  • Combustion & Propulsion (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Chemical & Material Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Air Conditioning Control Device (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention discloses a two-level network balance regulation and control method based on a reinforcement learning algorithm and differential pressure control, which comprises the following steps: establishing a digital twin model of the heat supply secondary network unit building by combining mechanism modeling with data identification; installing the heat supply secondary network unit building equipment, which at least comprises installing a variable frequency pump on the water supply pipe of the unit building with unfavorable working conditions, installing electric regulating valves at the inlets of the other unit buildings, installing a heat meter on the water supply main of each unit building, installing a differential pressure transmitter at the unit building, and installing room temperature collectors in households of the unit building; dynamically predicting the unit building heat load for the next time period through a deep reinforcement learning algorithm; when the predicted heat load for the next time period is inconsistent with the current actual heat load, adjusting the frequency of the variable frequency pump with a reinforcement learning algorithm combined with a PID algorithm, based on the measured value and the set value of the supply-return water differential pressure; feeding the collected change in water supply flow demand back to the digital twin model of the secondary network unit building, and searching for the differential pressure set value required at the new differential pressure control point of the changed unit building; and performing simulation verification of the differential pressure regulation with the digital twin model of the secondary network unit building.

Description

Two-level network balance regulation and control method based on reinforcement learning algorithm and differential pressure control
Technical Field
The invention belongs to the technical field of intelligent heat supply, and particularly relates to a secondary network balance regulation and control method based on reinforcement learning algorithm and differential pressure control.
Background
Urban central heating is an important livelihood project that has long received attention from governments at all levels and from society; it is a key research subject of the heating industry and a major focus of China's infrastructure investment, aimed at improving heat supply quality, reducing heat supply cost and reducing pollutant emissions. For a long time, because the hydraulic balance of the primary heat supply network concerns the safe operation of the whole network, most heating enterprises have paid great attention to it and have invested considerable funds and effort in research and retrofitting, with remarkable results: the heat loss rate and water loss rate of the pipe network have been significantly reduced. By contrast, management of the secondary network mostly remains at the stage of manual regulation, whose fineness and flexibility cannot meet current requirements.
In a heat-metering heating system, autonomous regulation by users changes the system flow and causes hydraulic imbalance. Analyzing the hydraulic characteristics of a user-regulated variable flow heating system and studying its control method therefore has important guiding significance for the operation and regulation of heat-metering variable flow heating systems.
In heat supply regulation, quantity regulation can be divided into controlling the inlet flow of heat users, controlling the flow of the secondary pipe network at the heat-exchange station or heat source, and controlling the supply-return water differential pressure of the most unfavorable loop. Differential pressure control is a common variable flow control scheme in metering heating systems, where the system flow may change at any time. It is a main method of central heating control: every heating system has a most unfavorable loop, whose supply-return differential pressure is determined by calculation; the differential pressure or pressure at a certain point of the system is taken as the control parameter, and when the hydraulic condition of the system changes, the pump frequency is varied to change the flow so that the pressure or differential pressure at the control point remains unchanged. However, the existing differential pressure control has poor autonomous regulation and energy-saving performance; regulating it reasonably, so that the operating state of the whole network is optimal and the heat supply quality is best, is a primary problem to be solved in the heating industry.
Based on the above technical problems, a new two-level network balance regulation and control method based on a reinforcement learning algorithm and differential pressure control needs to be designed.
Disclosure of Invention
The invention aims to solve the technical problem of overcoming the defects of the prior art and providing a two-stage network balance regulation and control method based on reinforcement learning algorithm and differential pressure control.
In order to solve the technical problems, the technical scheme of the invention is as follows:
The invention provides a two-level network balance regulation and control method based on reinforcement learning algorithm and differential pressure control, which comprises the following steps:
S1, establishing a digital twin model of a heat supply secondary network unit building by adopting a mechanism modeling and data identification method;
Step S2, installing the heat supply secondary network unit building equipment, at least comprising: installing a variable frequency pump on the water supply pipe of the unit building with unfavorable working conditions, installing electric regulating valves at the inlets of the other unit buildings, installing a heat meter on the water supply main of each unit building, installing a differential pressure transmitter at the unit building, and installing room temperature collectors in households of the unit building;
s3, dynamically predicting the unit building through a deep reinforcement learning algorithm to obtain a predicted value of the unit building heat load in the next time period;
s4, when the predicted value of the heat load of the unit building in the next time period is inconsistent with the current actual heat load, adjusting the frequency of the variable frequency pump by adopting a reinforcement learning algorithm and a PID algorithm based on the actual measured value and the set value of the pressure difference of the water supply and return;
S5, feeding back the collected water supply flow demand change to a digital twin model of the secondary network unit building, and searching a pressure difference set value required by a new pressure difference control point of the changed unit building; and performing simulation verification on the differential pressure regulation according to the digital twin model of the secondary network unit building.
Further, in the step S1, a mechanism modeling and data identification method is adopted to build a digital twin model of the heating secondary network unit building, which specifically includes:
Establishing a digital twin model comprising a physical entity, a virtual entity, a twin data service and connecting elements among the components of the two-level network unit building;
The physical entity is the basis of a digital twin model and is a data source driven by the whole digital twin model; the virtual entity and the physical entity are mapped one by one and interacted in real time, elements of the physical space are described from multiple dimensions and multiple scales, the actual process of the physical entity is simulated, and element data are analyzed, evaluated, predicted and controlled; the twin data service integrates the physical space information and the virtual space information, ensures the real-time performance of data transmission, provides knowledge base data comprising intelligent algorithms, models, rule standards and expert experiences, and forms a twin database by fusing the physical information, the multi-time space associated information and the knowledge base data; the connection between the components realizes the interconnection of the components, and the real-time acquisition and feedback of data are realized between the physical entity and the twin data service through the sensor and the protocol transmission specification; the physical entity and the virtual entity carry out data transmission through a protocol, physical information is transmitted to the virtual space in real time to update the correction model, and the virtual entity carries out real-time control on the physical entity through an executor; the information transfer between the virtual entity and the twin data service is realized through a database interface;
And identifying the digital twin model, accessing the multi-working-condition real-time operation data of the secondary network unit building into the established digital twin model, and adopting a reverse identification method to carry out self-adaptive identification correction on the simulation result of the digital twin model to obtain the digital twin model of the identified and corrected secondary network unit building.
Further, in the step S3, the predicted value of the thermal load of the unit building in the next time period is obtained by dynamically predicting the unit building through a deep reinforcement learning algorithm, which specifically includes:
acquiring historical heat supply data of a unit building, preprocessing the historical heat supply data to obtain a sample set of a load prediction model, wherein the historical heat supply data of the unit building at least comprises indoor temperature, weather data, unit building water supply and return temperature, unit building water supply flow and unit building instantaneous heat supply;
modeling the unit building thermal load prediction problem as a Markov decision process model, and defining states, actions and rewarding functions therein;
establishing a unit building thermal load prediction model by adopting a deep reinforcement learning algorithm, inputting historical heat supply data into the unit building thermal load prediction model, and training the unit building thermal load prediction model;
and outputting the unit building thermal load demand value through the unit building thermal load prediction model.
Further, modeling the unit building thermal load prediction problem as a Markov decision process model and defining states, actions and rewarding functions therein, specifically comprising:
the unit building heat load data form a time series; taking the hourly load as the unit, k training samples, each consisting of the heat loads of i consecutive time steps, are constructed and expressed as: X = {(q_1, q_2, …, q_i), (q_2, q_3, …, q_(i+1)), …, (q_k, q_(k+1), …, q_(k+i))};
the initial state of the unit building heat load is defined as s_0 = [q_1, q_2, …, q_k]; the action taken is denoted by a, and after the unit building heat load at the next moment is predicted, the process transfers to the next state s_1 = [q_1, q_2, …, q_(k+1)]; the constructed action space set is A = {a_1, a_2, …, a_k};
a reward set R = {r_1, r_2, …, r_k} is constructed, with r_k = -|a_k - q_(k+i)|; the reward value is the negative of the absolute difference between the action value taken in each state and the true load value at the next moment, and the sample set contains k reward values corresponding one-to-one to the training samples in the training sample set;
the optimal action is obtained by maximizing the cumulative reward Q(s, a); over successive iterations, the Q-learning process is continually updated with the reward obtained after each action, while a good policy is learned to maximize the target reward value.
Further, a deep reinforcement learning algorithm is adopted to establish a unit building thermal load prediction model, historical heat supply data is input into the prediction model, and the model is trained, specifically comprising:
Adding an experience playback mechanism into the DQN algorithm, and initializing a playback memory unit;
Taking a deep neural network as a Q value network, and updating parameters of the deep neural network by using a gradient descent algorithm;
Q(s, a) for any state s of the unit building heating data is obtained through the current value network; after the value function is calculated by the current value network, an ε-greedy strategy is used to select an action a; each state transition is recorded as a time step t, and the data obtained in each time step are added to the replay memory unit;
during training, the current value function is represented by the current value network and a target value network is used to generate the target Q value; Q(s, a|θ_i) denotes the output action value function of the current network and is used to evaluate the action in the current state; Q̂(s, a|θ_i^-) denotes the output of the target value network, and Y_i = r + γ·max_a′ Q̂(s′, a′|θ_i^-) is used to calculate the approximate action value function of the target value network;
the mean square error between the current Q value and the target Q value is adopted as the error function, and the parameters of the current value network are updated; the error function is expressed as: L(θ_i) = E_(s,a,r,s′)[(Y_i − Q(s, a|θ_i))²];
a tuple (s, a, r, s′) is randomly selected from the replay memory unit; (s, a), s′ and r are fed to the current value network, the target value network and the error function respectively, and L(θ_i) is updated with respect to θ_i by the gradient method to obtain the predicted value; the DQN algorithm updates the value function through the target Y_i = r + γ·max_a′ Q̂(s′, a′|θ_i^-), where γ is the discount factor; during the iterations, only the parameter θ of the current action value function is updated in real time, and every N iterations the parameter θ of the current value network is copied to the target value network.
Further, the step S3 further includes: generating virtual samples by simulation with a GAN algorithm based on the current historical sample data, wherein the actual historical sample data are stored in a real sample pool and are used to train the GAN model, and the virtual samples generated by the GAN algorithm are stored in a virtual sample pool; the historical sample data and the virtual sample data are used together as input information of the deep reinforcement learning DQN model for training and learning, which is carried out by interacting with the environment through a trial-and-error mechanism, and load prediction of the unit building is realized by maximizing the cumulative reward.
Further, in the step S4, based on the actual measurement value and the set value of the pressure difference of the water supply and return, the frequency of the variable frequency pump is adjusted by adopting a reinforcement learning algorithm and a PID algorithm, which specifically includes:
Designing a self-adaptive PID control algorithm based on an Actor-Critic structure and an RBF network;
Based on the actual measurement value and the set value of the water supply and return pressure difference, adopting a self-adaptive PID control algorithm to adaptively adjust PID parameters, acting on the controlled object variable frequency pump, adjusting the frequency of the variable frequency pump, and changing the water supply and return pressure difference;
The control principle of the adaptive PID control algorithm based on the Actor-Critic structure and the RBF network is designed as follows: the deviation between the measured value and the set value of the supply-return water differential pressure is defined as the error e(t), which is converted by a state converter into the state vector x(t) = [e(t), Δe(t), Δ²e(t)]^T required for RBF network learning; the state vector x(t) is taken as the input of the RBF network and passed through the hidden layer and the output layer; the Actor outputs a preliminary PID parameter vector K′(t) = [k′_I, k′_P, k′_D], and the Critic outputs the value function V(t); the stochastic action corrector corrects K′(t) according to the value function V(t) to obtain the final PID parameters K(t) = [k_I, k_P, k_D].
Further, the output of the adaptive PID control algorithm is Δu(t) = k_P·Δe(t) + k_I·e(t) + k_D·Δ²e(t);
the RBF network comprises an input layer, a hidden layer and an output layer; the input layer has three input nodes, which receive e(t), Δe(t) and Δ²e(t) respectively; the hidden layer has h nodes whose activation function is the Gaussian kernel, from which the node outputs Φ_j(t) are calculated; the output layer consists of the Actor and the Critic, which share the input layer and hidden layer resources of the RBF network, and has four output nodes, the first three being the three components of K′(t) output by the Actor and the fourth being the value function V(t) output by the Critic; they are respectively expressed as the weighted sums K′_m(t) = Σ_(j=1)^(h) w_jm·Φ_j(t), m = 1, 2, 3, and V(t) = Σ_(j=1)^(h) w_j4·Φ_j(t);
wherein j = 1, 2, 3, 4, 5 is the hidden layer node index; m = 1, 2, 3 is the output layer node index; and w_jm is the weight between the j-th node of the hidden layer and the m-th node of the output layer Actor.
Further, the Actor is used to learn the policy, and its parameters are corrected by superimposing a Gaussian disturbance K_η on K′(t); the Critic is used to evaluate the value function and is learned with the TD algorithm; the TD error is defined from the value function and the reward function r(t) as δ_TD = r(t) + γ·V(t+1) − V(t), and the Actor and Critic weights and the RBF network parameters are updated according to this error.
The beneficial effects of the invention are as follows:
In the method, the unit building is dynamically predicted through a deep reinforcement learning algorithm to obtain the predicted value of the unit building heat load in the next time period; when the predicted heat load for the next time period is inconsistent with the current actual heat load, the frequency of the variable frequency pump is adjusted with a reinforcement learning algorithm combined with a PID algorithm, based on the measured value and the set value of the supply-return water differential pressure; the collected change in water supply flow demand is fed back to the digital twin model of the secondary network unit building, and the differential pressure set value required at the new differential pressure control point of the changed unit building is searched for; the differential pressure regulation is verified by simulation with the digital twin model of the secondary network unit building. The pump frequency can thus be controlled reasonably according to the differential pressure, so that the operating state of the whole network is optimal and the heat supply quality is best, the hydraulic imbalance phenomenon is effectively resolved, and balanced and stable operation of the secondary network is ensured.
Additional features and advantages will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.
In order to make the above objects, features and advantages of the present invention more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a two-level network balance control method based on reinforcement learning algorithm and differential pressure control;
FIG. 2 is a schematic view of the DQN model structure of the present invention;
FIG. 3 is a diagram of the DQN model training process of the present invention;
FIG. 4 is a block diagram of an adaptive PID controller based on an Actor-Critic architecture and RBF network according to the present invention;
FIG. 5 is a schematic diagram of the RBF-based Actor-Critic learning architecture of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
FIG. 1 is a flow chart of a two-level network balance control method based on reinforcement learning algorithm and differential pressure control according to the present invention.
As shown in fig. 1, the present embodiment provides a two-stage network balance adjustment and control method based on reinforcement learning algorithm and differential pressure control, which includes:
S1, establishing a digital twin model of a heat supply secondary network unit building by adopting a mechanism modeling and data identification method;
Step S2, installing the heat supply secondary network unit building equipment, at least comprising: installing a variable frequency pump on the water supply pipe of the unit building with unfavorable working conditions, installing electric regulating valves at the inlets of the other unit buildings, installing a heat meter on the water supply main of each unit building, installing a differential pressure transmitter at the unit building, and installing room temperature collectors in households of the unit building;
s3, dynamically predicting the unit building through a deep reinforcement learning algorithm to obtain a predicted value of the unit building heat load in the next time period;
s4, when the predicted value of the heat load of the unit building in the next time period is inconsistent with the current actual heat load, adjusting the frequency of the variable frequency pump by adopting a reinforcement learning algorithm and a PID algorithm based on the actual measured value and the set value of the pressure difference of the water supply and return;
S5, feeding back the collected water supply flow demand change to a digital twin model of the secondary network unit building, and searching a pressure difference set value required by a new pressure difference control point of the changed unit building; and performing simulation verification on the differential pressure regulation according to the digital twin model of the secondary network unit building.
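Before describing each step in detail, the following minimal sketch illustrates how steps S1 to S5 could be orchestrated as one regulation cycle in software; every name in it (for example regulation_cycle, load_predictor, twin_model and the tolerance value) is a hypothetical placeholder introduced only for illustration and is not part of the disclosed system.
```python
# Hypothetical sketch of one S1-S5 regulation cycle; all interfaces are assumptions.
LOAD_TOLERANCE = 0.05  # assumed relative tolerance for the S4 comparison of predicted vs. actual load

def regulation_cycle(twin_model, load_predictor, pid_controller, pump, sensors):
    """One regulation cycle for a single unit building of the secondary network."""
    # S3: predict the heat load of the next time period from recent operating data.
    predicted_load = load_predictor.predict(sensors.recent_history())

    # S4: if the prediction deviates from the current actual load, adjust the
    # variable-frequency pump through the RL-tuned PID loop acting on the
    # supply-return water differential pressure.
    actual_load = sensors.current_heat_load()
    if abs(predicted_load - actual_load) > LOAD_TOLERANCE * actual_load:
        dp_error = sensors.dp_setpoint() - sensors.dp_measured()
        pump.set_frequency(pump.frequency() + pid_controller.step(dp_error))

    # S5: feed the observed change in water-supply flow demand back to the digital
    # twin, search for the differential-pressure set value of the new control point
    # and verify the adjustment in simulation before adopting it.
    twin_model.update(sensors.flow_demand())
    new_setpoint = twin_model.search_dp_setpoint()
    if twin_model.simulate(new_setpoint).is_balanced():
        sensors.set_dp_setpoint(new_setpoint)
```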
In practical application, electric regulating valves are installed at the inlets of most unit buildings, while building-level distributed pumps are selected for the unit buildings with unfavorable working conditions; the opening of the electric regulating valves is controlled with the existing regulation strategy, including predictive control of the opening with deep learning, reinforcement learning and machine learning algorithms, while the frequency of the building distributed pump is controlled based on the reinforcement learning algorithm and differential pressure control; in addition, the differential pressure transmitter is typically installed at the unit building with the most unfavorable differential pressure.
In this embodiment, in step S1, a mechanism modeling and data identification method is adopted to build a digital twin model of a heating secondary network unit building, which specifically includes:
Establishing a digital twin model comprising a physical entity, a virtual entity, a twin data service and connecting elements among the components of the two-level network unit building;
The physical entity is the basis of a digital twin model and is a data source driven by the whole digital twin model; the virtual entity and the physical entity are mapped one by one and interacted in real time, elements of the physical space are described from multiple dimensions and multiple scales, the actual process of the physical entity is simulated, and element data are analyzed, evaluated, predicted and controlled; the twin data service integrates the physical space information and the virtual space information, ensures the real-time performance of data transmission, provides knowledge base data comprising intelligent algorithms, models, rule standards and expert experiences, and forms a twin database by fusing the physical information, the multi-time space associated information and the knowledge base data; the connection between the components realizes the interconnection of the components, and the real-time acquisition and feedback of data are realized between the physical entity and the twin data service through the sensor and the protocol transmission specification; the physical entity and the virtual entity carry out data transmission through a protocol, physical information is transmitted to the virtual space in real time to update the correction model, and the virtual entity carries out real-time control on the physical entity through an executor; the information transfer between the virtual entity and the twin data service is realized through a database interface;
And identifying the digital twin model, accessing the multi-working-condition real-time operation data of the secondary network unit building into the established digital twin model, and adopting a reverse identification method to carry out self-adaptive identification correction on the simulation result of the digital twin model to obtain the digital twin model of the identified and corrected secondary network unit building.
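As an illustration of the reverse identification step, the sketch below fits the resistance coefficient S of a pipe section from multi-condition flow and pressure-drop measurements so that the simulated pressure drop ΔP = S·G² matches the observed data; the quadratic resistance law, the least-squares fit and all numbers are assumptions made for the example, not requirements of the method.
```python
import numpy as np

def identify_resistance(flows, pressure_drops):
    """Least-squares estimate of S in dP = S * G**2 from measured operating points."""
    G = np.asarray(flows, dtype=float)
    dP = np.asarray(pressure_drops, dtype=float)
    return float(np.sum(G**2 * dP) / np.sum(G**4))   # closed-form least-squares solution

# Example correction step using (assumed) multi-condition measurements of one branch.
measured_G = [10.2, 12.5, 8.7, 11.0]     # flow, t/h
measured_dP = [21.0, 31.4, 15.3, 24.2]   # pressure drop, kPa
S = identify_resistance(measured_G, measured_dP)
print(f"identified resistance coefficient S = {S:.4f} kPa/(t/h)^2")
```
In the same spirit, other twin-model parameters such as heat transfer coefficients or valve characteristics could be re-estimated whenever simulated and measured values diverge.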
In a heating system, autonomous adjustment by users introduces uncertainty and causes large changes in the hydraulic conditions of the pipe network system. Stabilizing the hydraulic conditions means ensuring that, when some users autonomously reduce their flow, the other users can still be kept at their set flow conditions and maintain their indoor temperature; in essence, the users' autonomous adjustment is a process of changing the impedance of the pipe network or of the user systems.
Basic principle of graph-theoretic hydraulic condition analysis: any fluid network is a geometric figure formed by a number of nodes connected by pipe sections, and since the water flow has a definite direction, it is a directed graph. The hydraulic model of the heat supply pipe network is established from the flow balance equations and the pressure balance equations.
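The following short sketch shows one way this graph-theoretic description could be represented in code: the topology is held in a node-branch incidence matrix, nodal flow balance is A·G = q, and branch pressure drops follow an assumed quadratic resistance law; the three-node example network and all numbers are illustrative assumptions.
```python
import numpy as np

# Node-branch incidence matrix of a small illustrative network (3 nodes, 3 branches).
A = np.array([
    [ 1,  0,  1],   # node 1: branches 1 and 3 leave this node
    [-1,  1,  0],   # node 2: branch 1 enters, branch 2 leaves
    [ 0, -1, -1],   # node 3: branches 2 and 3 enter
])

G = np.array([6.0, 6.0, 4.0])        # branch flows, t/h (assumed)
S = np.array([0.20, 0.35, 0.15])     # branch resistance coefficients (assumed)

q = A @ G                            # flow balance: external inflow/outflow at each node
dP = S * G**2                        # branch pressure drops entering the pressure balance

loop = np.array([1, 1, -1])          # signed branches of the single closed loop
print("nodal injections:", q)
# residual of the loop pressure-balance equation (zero when flows and resistances are consistent)
print("loop pressure residual:", loop @ dP)
```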
To ensure that the system has sufficient circulating power and that every user in the pipe network can obtain the required water flow under the design condition, the loop whose resistance is largest relative to the other loops is generally selected, and the rated head of the circulating water pump is determined from the available pressure head required by the users on this loop under the design condition. This loop with the greatest resistance is usually called the most unfavorable loop; in most cases it is the loop of the user farthest from the circulating water pump. At present, in the operation regulation stage of the system, the most unfavorable hydraulic loop is usually introduced as the reference object in the design of the control strategy; for example, the reference differential pressure for pump regulation is taken as the differential pressure of the user on the most unfavorable hydraulic loop, and the differential pressure set value is usually selected with reference to the service differential pressure of that user under the design condition or to the set value required to supply the flow of the users on the most unfavorable hydraulic loop.
The most unfavorable thermal loop is identified as follows: either there is only one most unfavorable thermal loop in the system and its branch coincides with the most unfavorable hydraulic loop of the pipe network; or there is still only one most unfavorable thermal loop, but its branch differs from the most unfavorable hydraulic loop and is some intermediate branch of the system; or there are several most unfavorable thermal loops in the system. In the latter case, the degree of unfavorability of these loops should be compared over the period, and the loop with the greatest degree of unfavorability is selected as the reference loop for the pump differential pressure control, so that the requirements of all users can be met.
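The selection of the reference loop described above can be illustrated in a few lines: among the candidate most-unfavorable loops, the one requiring the largest supply-return differential pressure in the current period is taken as the reference for pump differential-pressure control; the loop names and numbers below are assumptions for the example only.
```python
# Required differential pressure (kPa) of each candidate loop in the current period (assumed values).
candidate_loops = {
    "building_03": 35.9,
    "building_07": 42.1,
    "building_12": 38.5,
}

reference_loop = max(candidate_loops, key=candidate_loops.get)
dp_setpoint = candidate_loops[reference_loop]
print(f"reference loop: {reference_loop}, differential pressure set value: {dp_setpoint} kPa")
```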
In this embodiment, in step S3, a predicted value of the thermal load of the unit building in the next time period is obtained by dynamically predicting the unit building through a deep reinforcement learning algorithm, which specifically includes:
acquiring historical heat supply data of a unit building, preprocessing the historical heat supply data to obtain a sample set of a load prediction model, wherein the historical heat supply data of the unit building at least comprises indoor temperature, weather data, unit building water supply and return temperature, unit building water supply flow and unit building instantaneous heat supply;
modeling the unit building thermal load prediction problem as a Markov decision process model, and defining states, actions and rewarding functions therein;
establishing a unit building thermal load prediction model by adopting a deep reinforcement learning algorithm, inputting historical heat supply data into the unit building thermal load prediction model, and training the unit building thermal load prediction model;
and outputting the unit building thermal load demand value through the unit building thermal load prediction model.
In this embodiment, the unit building thermal load prediction problem is modeled as a Markov decision process model, and the states, actions and reward functions therein are defined, specifically including:
the unit building heat load data form a time series; taking the hourly load as the unit, k training samples, each consisting of the heat loads of i consecutive time steps, are constructed and expressed as: X = {(q_1, q_2, …, q_i), (q_2, q_3, …, q_(i+1)), …, (q_k, q_(k+1), …, q_(k+i))};
the initial state of the unit building heat load is defined as s_0 = [q_1, q_2, …, q_k]; the action taken is denoted by a, and after the unit building heat load at the next moment is predicted, the process transfers to the next state s_1 = [q_1, q_2, …, q_(k+1)]; the constructed action space set is A = {a_1, a_2, …, a_k};
a reward set R = {r_1, r_2, …, r_k} is constructed, with r_k = -|a_k - q_(k+i)|; the reward value is the negative of the absolute difference between the action value taken in each state and the true load value at the next moment, and the sample set contains k reward values corresponding one-to-one to the training samples in the training sample set;
the optimal action is obtained by maximizing the cumulative reward Q(s, a); over successive iterations, the Q-learning process is continually updated with the reward obtained after each action, while a good policy is learned to maximize the target reward value.
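A small sketch of this Markov decision process formulation is given below: states are sliding windows of past hourly loads, the action is the predicted load of the next hour, and the reward is the negative absolute prediction error; the synthetic load series, the window length and the naive action used here are assumptions for illustration only.
```python
import numpy as np

rng = np.random.default_rng(0)
q = rng.uniform(200.0, 400.0, size=60)   # hourly unit-building heat loads, kW (synthetic)
i = 24                                   # assumed window length of one training sample

# Training sample set X: sliding windows of i consecutive hourly loads.
X = np.array([q[t:t + i] for t in range(len(q) - i)])

def reward(action, next_true_load):
    """r = -|a - q_next|: negative absolute error between the action and the next true load."""
    return -abs(action - next_true_load)

state = X[0]                      # initial state s_0: the first window of loads
action = float(state.mean())      # naive action: predict the mean of the window
r = reward(action, q[i])          # q[i] is the true load of the following hour
print(f"predicted {action:.1f} kW, true {q[i]:.1f} kW, reward {r:.2f}")
```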
Fig. 2 is a schematic diagram of the DQN model structure according to the invention.
Fig. 3 is a diagram of the DQN model training process according to the invention.
As shown in fig. 2-3, in this embodiment, a deep reinforcement learning algorithm is used to build a unit building thermal load prediction model, input historical heating data into the prediction model, and train the model, and specifically includes:
Adding an experience playback mechanism into the DQN algorithm, and initializing a playback memory unit;
Taking a deep neural network as a Q value network, and updating parameters of the deep neural network by using a gradient descent algorithm;
Q(s, a) for any state s of the unit building heating data is obtained through the current value network; after the value function is calculated by the current value network, an ε-greedy strategy is used to select an action a; each state transition is recorded as a time step t, and the data obtained in each time step are added to the replay memory unit;
during training, the current value function is represented by the current value network and a target value network is used to generate the target Q value; Q(s, a|θ_i) denotes the output action value function of the current network and is used to evaluate the action in the current state; Q̂(s, a|θ_i^-) denotes the output of the target value network, and Y_i = r + γ·max_a′ Q̂(s′, a′|θ_i^-) is used to calculate the approximate action value function of the target value network;
the mean square error between the current Q value and the target Q value is adopted as the error function, and the parameters of the current value network are updated; the error function is expressed as: L(θ_i) = E_(s,a,r,s′)[(Y_i − Q(s, a|θ_i))²];
a tuple (s, a, r, s′) is randomly selected from the replay memory unit; (s, a), s′ and r are fed to the current value network, the target value network and the error function respectively, and L(θ_i) is updated with respect to θ_i by the gradient method to obtain the predicted value; the DQN algorithm updates the value function through the target Y_i = r + γ·max_a′ Q̂(s′, a′|θ_i^-), where γ is the discount factor; during the iterations, only the parameter θ of the current action value function is updated in real time, and every N iterations the parameter θ of the current value network is copied to the target value network.
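The training mechanics just described (experience replay, a current value network, a target value network copied every N steps, ε-greedy action selection and an MSE loss against the target Y) can be sketched as follows; PyTorch, the network sizes, the discretized action grid and all hyper-parameters are assumptions of this illustration rather than details given in the text.
```python
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 24, 51              # 24-hour load window, 51 discretized load levels
GAMMA, EPS, COPY_EVERY, BATCH = 0.9, 0.1, 100, 32

def make_net():
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))

q_net, target_net = make_net(), make_net()
target_net.load_state_dict(q_net.state_dict())     # target network starts as a copy
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)                      # replay memory unit of (s, a, r, s') tuples

def select_action(state):
    """epsilon-greedy selection on the current value network."""
    if random.random() < EPS:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return int(q_net(state).argmax())

def train_step(step):
    if len(replay) < BATCH:
        return
    batch = random.sample(replay, BATCH)           # random minibatch from the replay memory
    states = torch.stack([b[0] for b in batch])
    actions = torch.tensor([b[1] for b in batch])
    rewards = torch.tensor([b[2] for b in batch])
    next_states = torch.stack([b[3] for b in batch])

    with torch.no_grad():                          # target Y = r + gamma * max_a' Q_target(s', a')
        y = rewards + GAMMA * target_net(next_states).max(dim=1).values
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q_sa, y)         # MSE between current Q and target Q
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if step % COPY_EVERY == 0:                     # copy current parameters to the target network
        target_net.load_state_dict(q_net.state_dict())
```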
In this embodiment, the step S3 further includes: generating virtual samples by simulation with a GAN algorithm based on the current historical sample data, wherein the actual historical sample data are stored in a real sample pool and are used to train the GAN model, and the virtual samples generated by the GAN algorithm are stored in a virtual sample pool; the historical sample data and the virtual sample data are used together as input information of the deep reinforcement learning DQN model for training and learning, which is carried out by interacting with the environment through a trial-and-error mechanism, and load prediction of the unit building is realized by maximizing the cumulative reward.
In practical applications, in the GAN model structure, the generator model G and the discriminator model D are represented by differentiable functions, and their respective inputs are random noise z and real data x. G(z) denotes a sample generated by the generator model G that follows the real data distribution as closely as possible; the goal of the discriminator model D is to discriminate the data source, outputting the label 1 if the input is judged to come from real data and 0 if it comes from the generator model G. During the continuous optimization, the goal of the generator model G is to make the label D(G(z)) that the discriminator assigns to the generated fake data G(z) coincide with the label D(x) it assigns to the real data x. Through mutual adversarial and iterative optimization in the learning process, the performance of the generator model G improves continuously; when the discrimination capability of the discriminator model D has improved to the point where it can no longer judge the data source correctly, the generator model can be considered to have learned the distribution of the real data.
It should be noted that a reinforcement learning algorithm based on a generative adversarial network is proposed. In the initial stage of training, the algorithm collects experience samples through a random policy and adds them to the real sample pool; the generative adversarial network is trained with the samples in the real sample pool, new samples generated by it are added to the virtual sample pool, and finally training samples are selected in batches from the combination of the real and virtual sample pools. The proposed algorithm effectively alleviates the shortage of samples in the initial stage of reinforcement learning training and accelerates learning and convergence. Aiming at the low performance of the Q-learning algorithm when applied to nonlinear load prediction, a deep Q-learning load prediction algorithm based on a generative adversarial network is provided: it introduces a deep neural network and constructs a deep Q network as a nonlinear function approximator to approximate the action value function, and this value function approximation solves the problem that the Q-learning algorithm performs poorly, or even fails to converge, in large state spaces.
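The sample-pool arrangement described above can be sketched as follows: a GAN trained on the real sample pool produces virtual samples, and DQN training batches are drawn from the union of the two pools; PyTorch, the network sizes, the noise dimension and the 50/50 mixing ratio are assumptions of the illustration.
```python
import random

import torch
import torch.nn as nn

SAMPLE_DIM, NOISE_DIM = 24, 8

generator = nn.Sequential(nn.Linear(NOISE_DIM, 32), nn.ReLU(), nn.Linear(32, SAMPLE_DIM))
discriminator = nn.Sequential(nn.Linear(SAMPLE_DIM, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCELoss()

def gan_step(real_batch):
    """One adversarial update: D is pushed towards real=1 / generated=0, G towards fooling D."""
    n = real_batch.size(0)
    fake = generator(torch.randn(n, NOISE_DIM))
    d_loss = bce(discriminator(real_batch), torch.ones(n, 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(n, 1))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()
    g_loss = bce(discriminator(fake), torch.ones(n, 1))   # G wants D(G(z)) labelled as real
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()

def virtual_samples(count):
    """Generate virtual samples for the virtual sample pool."""
    with torch.no_grad():
        return list(generator(torch.randn(count, NOISE_DIM)))

def mixed_batch(real_pool, virtual_pool, size=32):
    """Draw a DQN training batch from the combined real and virtual sample pools."""
    half = size // 2
    return random.sample(real_pool, half) + random.sample(virtual_pool, size - half)
```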
FIG. 4 is a block diagram of an adaptive PID controller based on an Actor-Critic architecture and RBF network in accordance with the present invention.
FIG. 5 is a schematic diagram of RBF-based Actor-Critic learning architecture in accordance with the present invention.
As shown in fig. 4 to 5, in the embodiment, in the step S4, based on the actual measurement value and the set value of the pressure difference of the water supply and return, the frequency of the variable frequency pump is adjusted by adopting a reinforcement learning algorithm and a PID algorithm, which specifically includes:
Designing a self-adaptive PID control algorithm based on an Actor-Critic structure and an RBF network;
based on the actual measured value and the set value of the pressure difference of the water supply and return, an adaptive PID control algorithm is adopted to adaptively adjust PID parameters, the PID parameters act on the variable frequency pump of the controlled object, the frequency of the variable frequency pump is adjusted, and the pressure difference of the water supply and return is changed.
The control principle of the adaptive PID control algorithm based on the Actor-Critic structure and the RBF network is designed as follows: the deviation between the measured value and the set value of the supply-return water differential pressure is defined as the error e(t), which is converted by a state converter into the state vector x(t) = [e(t), Δe(t), Δ²e(t)]^T required for RBF network learning; the state vector x(t) is taken as the input of the RBF network and passed through the hidden layer and the output layer; the Actor outputs a preliminary PID parameter vector K′(t) = [k′_I, k′_P, k′_D], and the Critic outputs the value function V(t); the stochastic action corrector corrects K′(t) according to the value function V(t) to obtain the final PID parameters K(t) = [k_I, k_P, k_D].
In this embodiment, the output of the adaptive PID control algorithm is Δu(t) = k_P·Δe(t) + k_I·e(t) + k_D·Δ²e(t);
the RBF network comprises an input layer, a hidden layer and an output layer; the input layer has three input nodes, which receive e(t), Δe(t) and Δ²e(t) respectively; the hidden layer has h nodes whose activation function is the Gaussian kernel, from which the node outputs Φ_j(t) are calculated; the output layer consists of the Actor and the Critic, which share the input layer and hidden layer resources of the RBF network, and has four output nodes, the first three being the three components of K′(t) output by the Actor and the fourth being the value function V(t) output by the Critic; they are respectively expressed as the weighted sums K′_m(t) = Σ_(j=1)^(h) w_jm·Φ_j(t), m = 1, 2, 3, and V(t) = Σ_(j=1)^(h) w_j4·Φ_j(t);
wherein j = 1, 2, 3, 4, 5 is the hidden layer node index; m = 1, 2, 3 is the output layer node index; and w_jm is the weight between the j-th node of the hidden layer and the m-th node of the output layer Actor.
In this embodiment, the Actor is used to learn the policy, and its parameters are corrected by superimposing a Gaussian disturbance K_η on K′(t); the Critic is used to evaluate the value function and is learned with the TD algorithm; the TD error is defined from the value function and the reward function r(t) as δ_TD = r(t) + γ·V(t+1) − V(t), and the Actor and Critic weights and the RBF network parameters are updated according to this error.
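One control step of this Actor-Critic/RBF adaptive PID scheme could look like the sketch below; the number of hidden nodes, the kernel centres and width, the learning rates, the reward definition and the particular weight-update rules are all assumptions chosen for the illustration, not values given in the text.
```python
import numpy as np

H = 5                                          # hidden RBF nodes
rng = np.random.default_rng(0)
centers = rng.uniform(-1.0, 1.0, (H, 3))       # Gaussian kernel centres (assumed)
width = 1.0
w_actor = np.zeros((H, 3))                     # hidden -> [k_I', k_P', k_D']
w_critic = np.zeros(H)                         # hidden -> V(t)
ALPHA_A, ALPHA_C, GAMMA, SIGMA = 0.01, 0.05, 0.9, 0.1

def rbf(x):
    """Gaussian hidden-layer outputs Phi_j for the state x = [e, de, d2e]."""
    return np.exp(-np.sum((x - centers) ** 2, axis=1) / (2.0 * width ** 2))

def control_step(e, e_prev, e_prev2, u_prev, prev=None):
    """Return the new pump command u and the data needed for the next TD update."""
    global w_actor, w_critic
    x = np.array([e, e - e_prev, e - 2.0 * e_prev + e_prev2])   # state converter output
    phi = rbf(x)
    k_prime = phi @ w_actor                                     # Actor: preliminary PID gains
    k = k_prime + rng.normal(0.0, SIGMA, 3)                     # stochastic action correction
    v = float(phi @ w_critic)                                   # Critic: value function V(t)

    k_i, k_p, k_d = k
    u = u_prev + k_p * x[1] + k_i * x[0] + k_d * x[2]           # incremental PID output

    if prev is not None:                                        # TD learning from the previous step
        phi_prev, v_prev, k_prev = prev
        r = -abs(e)                                             # assumed reward: small dp error is good
        delta = r + GAMMA * v - v_prev                          # delta_TD = r + gamma*V(t+1) - V(t)
        w_critic = w_critic + ALPHA_C * delta * phi_prev
        w_actor = w_actor + ALPHA_A * delta * np.outer(phi_prev, k_prev - phi_prev @ w_actor)
    return u, (phi, v, k)
```
In a real deployment, u would be the variable-frequency pump command and e(t) the supply-return differential pressure error of the controlled unit building.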
It should be noted that the RBF network has the characteristics of strong mapping capability and simple learning rule, and is combined with the Actor-Critic structure for approximation of the Actor-Critic value function and the strategy function. A new control algorithm of a self-adaptive PID control algorithm based on an Actor-Critic structure and an RBF network is designed, PID parameters can be quickly adjusted, tracking of input signals is achieved, and compared with the control effect of a traditional PID and other algorithms, the new controller is faster in response and smaller in overshoot.
In the several embodiments provided in the present application, it should be understood that the disclosed system and method may be implemented in other manners as well. The system embodiments described above are merely illustrative, for example, of the flowcharts and block diagrams in the figures that illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present invention may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored on a computer readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method of the embodiments of the present invention. And the aforementioned storage medium includes: a usb disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
With the above-described preferred embodiments according to the present invention as an illustration, the above-described descriptions can be used by persons skilled in the relevant art to make various changes and modifications without departing from the scope of the technical idea of the present invention. The technical scope of the present invention is not limited to the description, but must be determined according to the scope of claims.

Claims (9)

1. A two-stage network balance regulation and control method based on reinforcement learning algorithm and differential pressure control is characterized by comprising the following steps:
S1, establishing a digital twin model of a heat supply secondary network unit building by adopting a mechanism modeling and data identification method;
Step S2, installing the heat supply secondary network unit building equipment, at least comprising: installing a variable frequency pump on the water supply pipe of the unit building with unfavorable working conditions, installing electric regulating valves at the inlets of the other unit buildings, installing a heat meter on the water supply main of each unit building, installing a differential pressure transmitter at the unit building, and installing room temperature collectors in households of the unit building;
s3, dynamically predicting the unit building through a deep reinforcement learning algorithm to obtain a predicted value of the unit building heat load in the next time period;
s4, when the predicted value of the heat load of the unit building in the next time period is inconsistent with the current actual heat load, adjusting the frequency of the variable frequency pump by adopting a reinforcement learning algorithm and a PID algorithm based on the actual measured value and the set value of the pressure difference of the water supply and return;
S5, feeding back the collected water supply flow demand change to a digital twin model of the secondary network unit building, and searching a pressure difference set value required by a new pressure difference control point of the changed unit building; and performing simulation verification on the differential pressure regulation according to the digital twin model of the secondary network unit building.
2. The method for regulating and controlling the balance of the secondary network according to claim 1, wherein in the step S1, a mechanism modeling and data identification method is adopted to build a digital twin model of the unit building of the heat supply secondary network, and the method specifically comprises the following steps:
Establishing a digital twin model comprising a physical entity, a virtual entity, a twin data service and connecting elements among the components of the two-level network unit building;
The physical entity is a data source of the whole digital twin model;
the virtual entity performs simulation on the actual process of the physical entity, and performs analysis data, evaluation, prediction and control on the element data;
the twin data service integrates the physical space information and the virtual space information, provides knowledge base data comprising intelligent algorithms, models, rule standards and expert experience, and forms a twin database by fusing the physical information, the multi-time space associated information and the knowledge base data;
The connection between the components is used for realizing interconnection of the components, and the real-time acquisition and feedback of data are realized between the physical entity and the twin data service through the sensor and the protocol transmission standard;
The physical entity and the virtual entity perform data transmission through a protocol, physical information is transmitted to the virtual space in real time to update the correction model, and the virtual entity performs real-time control on the physical entity through an actuator;
The virtual entity and the twin data service are subjected to information transfer through a database interface;
And identifying the digital twin model, accessing the multi-working-condition real-time operation data of the secondary network unit building into the established digital twin model, and adopting a reverse identification method to carry out self-adaptive identification correction on the simulation result of the digital twin model to obtain the digital twin model of the identified and corrected secondary network unit building.
3. The method of claim 1, wherein in step S3, the predicted value of the heat load of the unit building in the next time period is obtained by dynamically predicting the unit building through a deep reinforcement learning algorithm, and the method specifically comprises the following steps:
Acquiring historical heat supply data of a unit building and preprocessing the historical heat supply data to obtain a sample set of a load prediction model, wherein the historical heat supply data of the unit building at least comprises indoor temperature, weather data, unit building water supply and return temperature, unit building water supply flow and unit building instantaneous heat supply;
modeling the unit building thermal load prediction problem as a Markov decision process model, and defining states, actions and rewarding functions therein;
establishing a unit building thermal load prediction model by adopting a deep reinforcement learning algorithm, inputting historical heat supply data into the unit building thermal load prediction model, and training the unit building thermal load prediction model;
and outputting the unit building thermal load demand value through the unit building thermal load prediction model.
4. The method of claim 3, wherein modeling the unit building thermal load prediction problem as a Markov decision process model and defining the states, actions and reward functions therein comprises:
The unit building thermal load data form a time series; taking the hourly load as the unit, k training samples of the unit building thermal load, each covering i consecutive moments, are constructed and expressed as:
X = {(q_1, q_2, …, q_i), (q_2, q_3, …, q_{i+1}), …, (q_k, q_{k+1}, …, q_{k+i})};
Defining the initial state of the unit building heat load as s_0 = [q_1, q_2, …, q_k], wherein the action taken is denoted by a, and predicting the unit building heat load at the next moment transfers the system to the next state s_1 = [q_1, q_2, …, q_{k+1}]; the constructed action space set is A = {a_1, a_2, …, a_k};
Constructing a reward set R = {r_1, r_2, …, r_k} with r_k = -|a_k - q_{k+i}|; the reward value is the negative of the absolute difference between the action value taken in each state and the true load value at the next moment, and the set comprises k reward values in one-to-one correspondence with the training samples in the training sample set;
the optimal action is obtained by maximizing the cumulative reward Q(s, a); over successive iterations, the Q-learning process is continually updated with the reward obtained after each action is completed, while learning a good policy that maximizes the target reward value.
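A short sketch of the sample, state and reward construction in this claim, assuming the hourly loads are held in a NumPy array; variable names are illustrative:

```python
import numpy as np

def build_samples(loads, i):
    """Sliding windows of length i over the hourly load series:
    X = {(q1..qi), (q2..qi+1), ...}; one window per sample, leaving the
    next hour's load available as the prediction target."""
    return np.array([loads[t:t + i] for t in range(len(loads) - i)])

def reward(action_value, true_next_load):
    """r = -|a - q_next|: negative absolute error between the action
    (predicted load) and the true load at the next moment."""
    return -abs(action_value - true_next_load)

loads = np.array([310.0, 298.5, 305.2, 320.1, 333.7, 341.0])  # q1..q6
X = build_samples(loads, i=3)       # training windows
s0 = loads[:3]                      # initial state [q1, q2, q3]
r0 = reward(action_value=318.0, true_next_load=loads[3])
```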
5. The method for controlling the balance of a secondary network according to claim 3, wherein a deep reinforcement learning algorithm is adopted to build the unit building thermal load prediction model, historical heating data is input into the unit building thermal load prediction model, and the unit building thermal load prediction model is trained, and the method specifically comprises the following steps:
Adding an experience playback mechanism into the DQN algorithm, and initializing a playback memory unit;
Taking a deep neural network as a Q value network, and updating parameters of the deep neural network by using a gradient descent algorithm;
Obtaining Q(s, a) for any state s through the current value network from the unit building heat supply data; after the value function is calculated by the current value network, an ε-greedy strategy is used to select an action a; each state transition is recorded as a time step t, and the data obtained in each time step are added into the playback memory unit;
in the training process, the current value function is represented by the current value network, and the target Q value is generated by the target value network; Q(s, a|θ_i) denotes the action value function output by the current network and is used to evaluate the current state-action pair; Q(s′, a′|θ_i^-) denotes the output of the target value network, and Y_i = r + γ max_{a′} Q(s′, a′|θ_i^-) is used to calculate the approximate action value function of the target value network;
adopting the mean square error between the current Q value and the target Q value as the error function and updating the parameters of the current value network; the error function is expressed as: L(θ_i) = E_{s,a,r,s′}[(Y_i - Q(s, a|θ_i))²];
Randomly selecting a transition (s, a, r, s′) from the playback memory unit, feeding (s, a), s′ and r to the current value network, the target value network and the error function respectively, and updating L(θ_i) with respect to θ_i by a gradient method to obtain the predicted value, wherein the DQN algorithm updates the value function toward the target Y_i = r + γ max_{a′} Q(s′, a′|θ_i^-);
wherein γ is the discount factor; during the iteration process, only the parameter θ of the current action value function is updated in real time, and every N iterations the parameters of the current value network are copied to the target value network.
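A condensed sketch of the training procedure in this claim: an experience playback memory, a current value network and a target value network, ε-greedy action selection, the mean squared error loss L(θ_i) between the current and target Q values, and a copy of the current parameters to the target network every N steps. The network sizes, the discretised action set and the env interface (reset/step returning state, reward, done) are assumptions, not part of the claim:

```python
import random
from collections import deque
import torch
import torch.nn as nn
import torch.nn.functional as F

class QNet(nn.Module):
    """Current / target value network: state (k recent loads) -> Q value per discrete action."""
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))
    def forward(self, x):
        return self.net(x)

def train_dqn(env, state_dim, n_actions, episodes=50, gamma=0.9,
              eps=0.1, batch=32, copy_every=100):
    q_net, target_net = QNet(state_dim, n_actions), QNet(state_dim, n_actions)
    target_net.load_state_dict(q_net.state_dict())
    opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
    memory = deque(maxlen=10000)            # experience playback memory unit
    step = 0
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # ε-greedy action selection on the current value network
            if random.random() < eps:
                a = random.randrange(n_actions)
            else:
                a = q_net(torch.tensor(s, dtype=torch.float32)).argmax().item()
            s2, r, done = env.step(a)
            memory.append((s, a, r, s2))
            s = s2
            step += 1
            if len(memory) >= batch:
                samples = random.sample(memory, batch)
                ss, aa, rr, ss2 = map(lambda x: torch.tensor(x, dtype=torch.float32),
                                      zip(*samples))
                with torch.no_grad():        # target Y = r + γ max_a' Q(s', a'|θ-)
                    y = rr + gamma * target_net(ss2).max(dim=1).values
                q = q_net(ss).gather(1, aa.long().unsqueeze(1)).squeeze(1)
                loss = F.mse_loss(q, y)      # L(θ) = E[(Y - Q(s, a|θ))²]
                opt.zero_grad(); loss.backward(); opt.step()
            if step % copy_every == 0:       # copy θ to the target network every N steps
                target_net.load_state_dict(q_net.state_dict())
    return q_net
```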
6. The method for controlling balance of a secondary network according to claim 3, wherein the step S3 further comprises:
simulating and generating a virtual sample based on current historical sample data by adopting a GAN algorithm, and storing real historical sample data in a real sample pool for training a GAN algorithm model;
The virtual sample generated by the GAN algorithm is stored in a virtual sample pool;
The historical sample data and the virtual sample data are jointly used as input to the deep reinforcement learning DQN model for training; training proceeds by trial-and-error interaction with the environment, and the unit building load prediction is realized by maximizing the accumulated reward.
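A minimal GAN sketch for the virtual sample pool described in this claim, assuming each real sample is a fixed-length window of hourly loads stored as a float tensor; layer sizes and hyper-parameters are illustrative only:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Maps random noise to a virtual load window (virtual sample)."""
    def __init__(self, noise_dim, sample_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(noise_dim, 64), nn.ReLU(),
                                 nn.Linear(64, sample_dim))
    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):
    """Scores whether a load window comes from the real sample pool."""
    def __init__(self, sample_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(sample_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1), nn.Sigmoid())
    def forward(self, x):
        return self.net(x)

def train_gan(real_pool, noise_dim=16, epochs=200):
    sample_dim = real_pool.shape[1]
    G, D = Generator(noise_dim, sample_dim), Discriminator(sample_dim)
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
    bce = nn.BCELoss()
    for _ in range(epochs):
        z = torch.randn(real_pool.shape[0], noise_dim)
        fake = G(z)
        # discriminator: real pool -> 1, virtual samples -> 0
        d_loss = bce(D(real_pool), torch.ones(len(real_pool), 1)) + \
                 bce(D(fake.detach()), torch.zeros(len(fake), 1))
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()
        # generator: fool the discriminator
        g_loss = bce(D(fake), torch.ones(len(fake), 1))
        opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return G   # virtual samples G(torch.randn(n, noise_dim)) go to the virtual pool
```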
7. The method according to claim 1, wherein in the step S4, the variable frequency pump frequency is adjusted by adopting a reinforcement learning algorithm and a PID algorithm based on the actual measurement value and the set value of the supply-return water pressure difference, and the method specifically comprises:
Designing a self-adaptive PID control algorithm based on an Actor-Critic structure and an RBF network;
Based on the actual measurement value and the set value of the water supply and return pressure difference, adopting a self-adaptive PID control algorithm to adaptively adjust PID parameters, acting on the controlled object variable frequency pump, adjusting the frequency of the variable frequency pump, and changing the water supply and return pressure difference;
The control principle of the adaptive PID control algorithm based on the Actor-Critic structure and the RBF network is designed as follows: the deviation between the measured value and the set value of the supply and return water pressure difference is defined as the error e(t), which is converted by a state converter into the state vector x(t) = [e(t), Δe(t), Δ²e(t)]^T required for RBF network learning; the state vector x(t) serves as the input of the RBF network and is propagated through the hidden layer and the output layer, the Actor outputs a preliminary PID parameter value K′(t) = [k′_I, k′_P, k′_D], and the Critic outputs a value function V(t); the random action corrector corrects K′(t) according to the value function V(t) to obtain the final PID parameters K(t) = [k_I, k_P, k_D].
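A small sketch of this control loop: the state converter builds x(t) = [e(t), Δe(t), Δ²e(t)]^T from the set and measured differential pressure, and the incremental PID output adjusts the variable frequency pump. The Actor-Critic RBF part that supplies the gains is sketched after claims 8 and 9 and is represented here by a caller-supplied function; all names are illustrative:

```python
class StateConverter:
    """Turns the pressure-difference error e(t) into x(t) = [e, Δe, Δ²e]^T."""
    def __init__(self):
        self.e1 = 0.0   # e(t-1)
        self.e2 = 0.0   # e(t-2)
    def convert(self, e):
        x = [e, e - self.e1, e - 2 * self.e1 + self.e2]
        self.e2, self.e1 = self.e1, e
        return x

def control_step(dp_setpoint, dp_measured, pump_freq, converter, pid_gains):
    """One differential-pressure control step acting on the variable-frequency pump."""
    e = dp_setpoint - dp_measured
    x = converter.convert(e)                   # [e, Δe, Δ²e]
    k_i, k_p, k_d = pid_gains(x)               # supplied by the Actor-Critic RBF part
    du = k_p * x[1] + k_i * x[0] + k_d * x[2]  # Δu(t), incremental PID output
    return pump_freq + du                      # new pump frequency command
```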
8. The method according to claim 7, wherein the output of the adaptive PID control algorithm is Δu(t) = k_P Δe(t) + k_I e(t) + k_D Δ²e(t);
The RBF network comprises an input layer, a hidden layer and an output layer; the input layer comprises three input nodes that respectively receive e(t), Δe(t) and Δ²e(t); the hidden layer comprises h nodes whose activation function is a Gaussian kernel function, and the hidden node outputs Φ_j(t) are calculated accordingly; the output layer is composed of the Actor and the Critic, shares the input layer and hidden layer resources of the RBF network, and comprises four output nodes, the first three being the three components of K′(t) output by the Actor and the fourth being the value function V(t) of the Critic, respectively expressed as K′_m(t) = Σ_j w_jm Φ_j(t) and V(t) = Σ_j w_j4 Φ_j(t);
wherein j = 1, 2, 3, 4, 5 is the hidden layer node number, m = 1, 2, 3 is the output layer node number, and w_jm is the weight between the j-th node of the hidden layer and the m-th output node of the Actor (w_j4 being the weight to the Critic output node).
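A sketch of the shared RBF forward pass implied by claim 8: Gaussian hidden units feed four output nodes, the first three being the Actor's preliminary gains K′(t) and the fourth the Critic's value V(t); the centres, widths and initial weights are illustrative assumptions:

```python
import numpy as np

class ActorCriticRBF:
    def __init__(self, n_hidden=5):
        self.centers = np.random.uniform(-1, 1, (n_hidden, 3))   # hidden-unit centres
        self.sigma = np.ones(n_hidden)                            # Gaussian widths
        self.w = np.random.uniform(0, 0.1, (n_hidden, 4))         # w_jm: 3 Actor + 1 Critic outputs

    def hidden(self, x):
        """Gaussian kernel outputs Φ_j(t) of the hidden layer."""
        d = np.linalg.norm(self.centers - np.asarray(x), axis=1)
        return np.exp(-(d ** 2) / (2 * self.sigma ** 2))

    def forward(self, x):
        phi = self.hidden(x)
        out = phi @ self.w          # weighted sums Σ_j w_jm Φ_j(t)
        k_prime = out[:3]           # Actor: preliminary [k'_I, k'_P, k'_D]
        v = out[3]                  # Critic: value function V(t)
        return k_prime, v, phi
```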
9. The method for regulating and controlling the balance of the secondary network according to claim 8, wherein the Actor is used for learning the policy, and the parameter correction method is to superimpose a Gaussian perturbation K_η on K′(t); the Critic is used for evaluating the value function and learns by a TD algorithm, with the TD error defined from the value function and the return function r(t) as δ_TD = r(t) + γV(t+1) - V(t); the Actor and Critic weights and the RBF network parameters are updated according to this error.
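A sketch of the learning step of claim 9, continuing the ActorCriticRBF class above: a Gaussian perturbation K_η is superimposed on K′(t), the TD error δ_TD = r(t) + γV(t+1) - V(t) is computed from the Critic, and the output weights are adjusted along the TD error; the learning rates and the exact weight-update form are assumptions, not taken from the claim:

```python
import numpy as np

def learning_step(net, x_t, x_next, r_t, gamma=0.95,
                  alpha_actor=0.01, alpha_critic=0.05, noise_std=0.05):
    """One Actor-Critic TD(0) update for the RBF-based adaptive PID."""
    k_prime, v_t, phi_t = net.forward(x_t)
    _, v_next, _ = net.forward(x_next)

    k_eta = np.random.normal(0.0, noise_std, size=3)   # Gaussian perturbation K_η
    k_t = k_prime + k_eta                              # final PID gains K(t)

    delta_td = r_t + gamma * v_next - v_t              # δ_TD = r(t) + γV(t+1) - V(t)

    # Critic: move V(t) toward the TD target; Actor: reinforce the perturbation direction
    net.w[:, 3] += alpha_critic * delta_td * phi_t
    net.w[:, :3] += alpha_actor * delta_td * np.outer(phi_t, k_eta)
    return k_t, delta_td
```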
CN202210432777.XA 2022-04-24 2022-04-24 Two-level network balance regulation and control method based on reinforcement learning algorithm and differential pressure control Active CN114909706B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210432777.XA CN114909706B (en) 2022-04-24 2022-04-24 Two-level network balance regulation and control method based on reinforcement learning algorithm and differential pressure control

Publications (2)

Publication Number Publication Date
CN114909706A CN114909706A (en) 2022-08-16
CN114909706B true CN114909706B (en) 2024-05-07

Family

ID=82764249

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210432777.XA Active CN114909706B (en) 2022-04-24 2022-04-24 Two-level network balance regulation and control method based on reinforcement learning algorithm and differential pressure control

Country Status (1)

Country Link
CN (1) CN114909706B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115129430B (en) * 2022-09-01 2022-11-22 山东德晟机器人股份有限公司 Robot remote control instruction issuing method and system based on 5g network
CN117830033B (en) * 2024-03-06 2024-06-04 深圳市前海能源科技发展有限公司 Regional cooling and heating system regulation and control method and device, electronic equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2182296A2 (en) * 2008-10-28 2010-05-05 Oy Scancool Ab District heating arrangement and method
CN108916986A (en) * 2018-09-10 2018-11-30 常州英集动力科技有限公司 The secondary network flow-changing water dynamic balance of information physical fusion regulates and controls method and system
CN113091123A (en) * 2021-05-11 2021-07-09 杭州英集动力科技有限公司 Building unit heat supply system regulation and control method based on digital twin model
CN113446661A (en) * 2021-07-30 2021-09-28 西安热工研究院有限公司 Intelligent and efficient heat supply network operation adjusting method
CN113657031A (en) * 2021-08-12 2021-11-16 杭州英集动力科技有限公司 Digital twin-based heat supply scheduling automation realization method, system and platform
CN113757788A (en) * 2021-09-15 2021-12-07 河北工大科雅能源科技股份有限公司 Station-load linked two-network balance online dynamic intelligent regulation and control method and system
CN114183796A (en) * 2021-11-26 2022-03-15 杭州英集动力科技有限公司 Optimal scheduling method and device based on electric heating and central heating multi-energy complementary system

Also Published As

Publication number Publication date
CN114909706A (en) 2022-08-16

Similar Documents

Publication Publication Date Title
CN112232980B (en) Regulation and control method for heat pump unit of regional energy heat supply system
CN114909706B (en) Two-level network balance regulation and control method based on reinforcement learning algorithm and differential pressure control
CN109270842B (en) Bayesian network-based regional heat supply model prediction control system and method
CN113657031A (en) Digital twin-based heat supply scheduling automation realization method, system and platform
CN105512745A (en) Wind power section prediction method based on particle swarm-BP neural network
CN114777192B (en) Secondary network heat supply autonomous optimization regulation and control method based on data association and deep learning
CN106026084B (en) A kind of AGC power dynamic allocation methods based on virtual power generation clan
CN110866640A (en) Power load prediction method based on deep neural network
CN114811713B (en) Two-level network inter-user balanced heat supply regulation and control method based on mixed deep learning
CN114370698A (en) Indoor thermal environment learning efficiency improvement optimization control method based on reinforcement learning
CN116544934B (en) Power scheduling method and system based on power load prediction
CN111461466A (en) 2020-07-28 Heating household valve adjusting method, system and equipment based on LSTM time sequence
CN108376294A (en) A kind of heat load prediction method of energy supply feedback and meteorologic factor
CN118246344B (en) On-line optimization method of heating ventilation air conditioning system based on data driving
CN114777193B (en) Method and system for switching household regulation and control modes of secondary network of heating system
CN115751441A (en) Heat supply system heating station heat regulation method and system based on secondary side flow
Li et al. Data-oriented distributed overall optimization for large-scale HVAC systems with dynamic supply capability and distributed demand response
CN117389132A (en) Heating system multi-loop PID intelligent setting system based on cloud edge end cooperation
CN117557019A (en) Heat supply system scheduling method of heat pump-containing cluster based on model predictive control
CN114909707B (en) Heat supply secondary network regulation and control method based on intelligent balance device and reinforcement learning
CN117035549A (en) Method for evaluating cost algorithm of urban water supply network scheme
CN115327890B (en) Method for optimizing main steam pressure of PID control thermal power depth peak shaving unit by improved crowd searching algorithm
CN116300755A (en) Double-layer optimal scheduling method and device for heat storage-containing heating system based on MPC
Wu et al. Neural Network Based Feasible Region Approximation Model for Optimal Operation of Integrated Electricity and Heating System
CN114611823A (en) Optimized dispatching method and system for electricity-cold-heat-gas multi-energy-demand typical park

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant