CN114484822B - Ocean platform ventilation system control method based on temperature and hydrogen sulfide concentration control

Info

Publication number
CN114484822B
Authority
CN
China
Prior art keywords
agent
cabin
hydrogen sulfide
temperature
function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210124691.0A
Other languages
Chinese (zh)
Other versions
CN114484822A (en
Inventor
崔璨
薛璟
黎明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ocean University of China
Original Assignee
Ocean University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ocean University of China filed Critical Ocean University of China
Priority to CN202210124691.0A priority Critical patent/CN114484822B/en
Publication of CN114484822A publication Critical patent/CN114484822A/en
Application granted granted Critical
Publication of CN114484822B publication Critical patent/CN114484822B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • F MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F24 HEATING; RANGES; VENTILATING
    • F24F AIR-CONDITIONING; AIR-HUMIDIFICATION; VENTILATION; USE OF AIR CURRENTS FOR SCREENING
    • F24F11/00 Control or safety arrangements
    • F24F11/70 Control systems characterised by their outputs; Constructional details thereof
    • F24F11/72 Control systems characterised by their outputs; Constructional details thereof for controlling the supply of treated air, e.g. its pressure
    • F24F11/74 Control systems characterised by their outputs; Constructional details thereof for controlling the supply of treated air, e.g. its pressure for controlling air flow rate or air velocity
    • F MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F24 HEATING; RANGES; VENTILATING
    • F24F AIR-CONDITIONING; AIR-HUMIDIFICATION; VENTILATION; USE OF AIR CURRENTS FOR SCREENING
    • F24F11/00 Control or safety arrangements
    • F24F11/62 Control or safety arrangements characterised by the type of control or by internal processing, e.g. using fuzzy logic, adaptive control or estimation of values
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16C COMPUTATIONAL CHEMISTRY; CHEMOINFORMATICS; COMPUTATIONAL MATERIALS SCIENCE
    • G16C20/00 Chemoinformatics, i.e. ICT specially adapted for the handling of physicochemical or structural data of chemical particles, elements, compounds or mixtures
    • G16C20/20 Identification of molecular entities, parts thereof or of chemical compositions
    • F MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F24 HEATING; RANGES; VENTILATING
    • F24F AIR-CONDITIONING; AIR-HUMIDIFICATION; VENTILATION; USE OF AIR CURRENTS FOR SCREENING
    • F24F2110/00 Control inputs relating to air properties
    • F24F2110/10 Temperature
    • F MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F24 HEATING; RANGES; VENTILATING
    • F24F AIR-CONDITIONING; AIR-HUMIDIFICATION; VENTILATION; USE OF AIR CURRENTS FOR SCREENING
    • F24F2110/00 Control inputs relating to air properties
    • F24F2110/50 Air quality properties
    • F24F2110/65 Concentration of specific substances or contaminants
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02B CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO BUILDINGS, e.g. HOUSING, HOUSE APPLIANCES OR RELATED END-USER APPLICATIONS
    • Y02B30/00 Energy efficient heating, ventilation or air conditioning [HVAC]
    • Y02B30/70 Efficient control or regulation technologies, e.g. for control of refrigerant flow, motor or heating

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Combustion & Propulsion (AREA)
  • Mechanical Engineering (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Fuzzy Systems (AREA)
  • Signal Processing (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Fluid Mechanics (AREA)
  • Feedback Control In General (AREA)
  • Ventilation (AREA)

Abstract

The invention provides a control method for an ocean platform ventilation system based on temperature and hydrogen sulfide concentration control, comprising the following steps: establishing a differential equation for the change in hydrogen sulfide gas concentration in the ocean platform cabins; defining the variable air volume box in each cabin and the air handling unit each as an agent, obtaining N+1 agents in total, where N is the number of ocean platform cabins; fitting the agents with neural networks, each agent comprising an actor network responsible for generating the policy and a critic network responsible for evaluating the policy in real time; defining the states and actions of the N+1 agents at time t; defining over-limit penalty functions for the temperature and the hydrogen sulfide concentration of the cabin agents; and training the agents. The control method can bring over-limit temperature and hydrogen sulfide concentration in the ocean platform cabins under control.

Description

Ocean platform ventilation system control method based on temperature and hydrogen sulfide concentration control
Technical Field
The invention relates to the technical field of intelligent control, in particular to a control method of an ocean platform ventilation system based on temperature and hydrogen sulfide concentration control.
Background
The ventilation system is an important utility system that supports normal production and daily life on an ocean platform. The living and production areas of the platform are ventilated to remove the heat dissipated in electronic equipment rooms, maintain the design room temperature, and keep the electronic equipment running normally. The ventilation mode and ventilation rate differ from area to area. Whether an offshore platform is well ventilated is an important basis for classifying hazardous areas and an important factor in platform safety. From a safety perspective, ventilation prevents the accumulation and spread of flammable, explosive, toxic, and harmful gases, maintains the positive or negative pressure required in a room, supplies enough fresh air for workers, and ensures a comfortable and sanitary living environment.
On an oil production platform, some well products contain toxic hydrogen sulfide gas, and inhaling even a small amount of high-concentration hydrogen sulfide can be fatal within a short time. Hydrogen sulfide gas diffuses naturally into the platform cabins. Ventilation is widely used on ocean platforms to promote rapid diffusion and dilution of leaked hydrogen sulfide. For leakage accidents with a low initial concentration, ventilation is the available and effective mitigation measure, and it keeps the accumulation and spread of hydrogen sulfide from affecting the health and safety of platform operators.
An ocean platform usually has a multi-cabin structure with more than one area that needs ventilation. The personnel, function, and hydrogen sulfide concentration differ from cabin to cabin, so the ventilation requirements differ as well. It is therefore a very important problem to meet the different ventilation needs of each area and of the platform operators, to dilute leaked hydrogen sulfide and prevent its spread as far as possible while guaranteeing reasonable ventilation and cooling, and to provide a safe and comfortable working environment on the offshore platform.
At present, ventilation control methods applied on ocean platforms mainly automate the control of flow, pressure difference, and similar quantities by having the air conditioning system monitor parameters such as air flow, temperature, and pressure in the cabin. These methods depend on hardware such as fans, differential pressure transmitters, and PLCs (programmable logic controllers), and they also require manual operation, which makes platform ventilation inconvenient. Another approach selects a ventilation mode and fan type according to the type and specific requirements of each cabin. It requires considering each cabin's particular situation and building a specific thermodynamic model of the cabin, so it is complex to implement and not general.
Disclosure of Invention
The invention aims to solve one of the above technical problems by providing a control method suitable for a multi-cabin ventilation system of an ocean platform that achieves the control targets for temperature and hydrogen sulfide concentration in a scientific and reasonable way. When the ocean platform operates normally, the ventilation system meets the different temperature requirements of each cabin by adjusting the air supply volume of the different cabins. When a hydrogen sulfide leakage accident occurs, toxic gas such as hydrogen sulfide can permeate into the working and living areas of the platform, and ventilation is an effective measure against its accumulation and spread; the ventilation system therefore detects, while controlling the temperature, when the hydrogen sulfide concentration in a room exceeds the safe range, and adopts a suitable ventilation strategy to reduce the hydrogen sulfide concentration in each cabin as far as possible, improve the indoor air quality, and ensure the safety of workers on the platform.
In order to solve the above problems, the present invention provides the following technical solutions:
A control method of an ocean platform ventilation system based on temperature and hydrogen sulfide concentration control, the ocean platform comprising a plurality of cabins, the control method comprising the following steps:
S1: establishing a differential equation for the change in hydrogen sulfide gas concentration in the ocean platform cabins;
assuming that hydrogen sulfide gas permeates uniformly into the cabin and that the supply and exhaust air flows in the cabin are isothermal;
the differential equation for the change in hydrogen sulfide gas concentration in an ocean platform cabin is established as follows:
$m_{i,t}\, y_{i,t}\, dt + x_{i,t}\, dt - k_{i,t}\, S_{i,t}\, dt = J_i\, ds$;
where $S_{i,t}$ is the initial concentration of hydrogen sulfide gas in the air of cabin $i$, whose volume is $J_i$; $dt$ is a very small time interval; $m_{i,t}$ is the supply air volume of the ventilation system during $dt$; $y_{i,t}$ is the hydrogen sulfide concentration in the supply air; $x_{i,t}$ is the amount of hydrogen sulfide gas that permeates into cabin $i$ during $dt$; $k_{i,t}$ is the exhaust air volume leaving the cabin during $dt$; and $ds$ is the increment of the hydrogen sulfide concentration in the cabin during $dt$;
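As an illustration only, the concentration balance of step S1 can be integrated numerically. The sketch below assumes a forward-Euler step and treats the supply, permeation, and exhaust terms as constant rates per unit time; the function name `simulate_h2s`, its parameter names, and the numerical values are hypothetical and do not come from the patent.

```python
def simulate_h2s(s0, volume, supply, supply_conc, ingress, exhaust, dt, steps):
    """Forward-Euler integration of J * ds = (m*y + x - k*S) * dt.

    s0          initial concentration S in the cabin
    volume      cabin volume J
    supply      supply air rate m (assumed constant here)
    supply_conc hydrogen sulfide concentration y in the supply air
    ingress     permeation rate x of hydrogen sulfide into the cabin
    exhaust     exhaust air rate k (assumed constant here)
    """
    s = s0
    history = [s]
    for _ in range(steps):
        # Increment of concentration over one small time step.
        ds = (supply * supply_conc + ingress - exhaust * s) * dt / volume
        s = max(0.0, s + ds)  # a concentration cannot be negative
        history.append(s)
    return history


# Clean supply air and no further permeation: the concentration decays.
trace = simulate_h2s(s0=20.0, volume=100.0, supply=2.0, supply_conc=0.0,
                     ingress=0.0, exhaust=2.0, dt=1.0, steps=60)
```

Under these assumptions the balance has a steady state at S = (m y + x) / k, which shows why a larger exhaust rate lowers the concentration the cabin settles at.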
S2: defining each cabin and the air handling unit as an agent, where agents 1 to N are cabin agents and agent N+1 is the air handling unit agent, giving N+1 agents in total, N being the number of ocean platform cabins; fitting the agents with neural networks, each agent comprising an actor network responsible for generating the policy and a critic network responsible for evaluating the policy in real time;
S3: defining the observation set of the N+1 agents at time t:
$s_t = o_t = (o_{1,t}, \ldots, o_{N+1,t})$;
where $o_{1,t}$ represents the observation of agent 1 at time t, and $o_{N+1,t}$ represents the observation of agent N+1 at time t;
defining the agent actions $a_t$, namely the actions of the cabin agents and the air handling unit agent:
$a_t = (m_{1,t}, m_{2,t}, \ldots, m_{N,t}, \sigma_t)$;
where $m_{1,t}$ is the action of agent 1 at time t, $m_{2,t}$ is the action of agent 2 at time t, and $\sigma_t$ is the action of agent N+1 at time t;
S4: defining the temperature overrun penalty function of a cabin agent:
Figure GDA0003934252880000041
where $r_{i,1,t}(s_t)$ is the temperature overrun penalty function of the cabin agent, $T_{i,t}$ is the temperature of cabin i at time t,
Figure GDA0003934252880000042
is the lowest permissible value of the cabin temperature, and
Figure GDA0003934252880000043
is the highest permissible value of the cabin temperature; $[\cdot]^+$ means that the bracketed value is kept if it is greater than 0 and is otherwise taken as 0; when the indoor temperature is above the highest temperature of the limit range,
Figure GDA0003934252880000044
when the indoor temperature is below the lowest temperature of the limit range,
Figure GDA0003934252880000045
and when the indoor temperature stays within the limit range, $r_{i,1,t}(s_t) = 0$;
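The exact penalty expression is contained in the equation images and is not reproduced in this text. As a minimal sketch, assuming the penalty takes the common hinge form built from the $[\cdot]^+$ operator described above and is negative outside the limit range, it could look like the following. The function names and the sign convention are assumptions.

```python
def positive_part(value):
    """The [.]^+ operator: keep the value if it is greater than 0, else 0."""
    return value if value > 0 else 0.0


def temperature_penalty(temp, temp_min, temp_max):
    """Assumed hinge-form penalty: 0 inside [temp_min, temp_max], negative outside."""
    over = positive_part(temp - temp_max)    # amount above the highest permissible value
    under = positive_part(temp_min - temp)   # amount below the lowest permissible value
    return -(over + under)
```

With limits of 19 and 24, a cabin at 22 incurs no penalty, while cabins at 26 or 17 are each penalized in proportion to their 2 degree overrun.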
defining the temperature overrun penalty function of the air handling unit agent:
$r_{N+1,1,t} = 0$;
defining the hydrogen sulfide concentration overrun penalty function of a cabin agent:
Figure GDA0003934252880000046
where $r_{i,2,t}(s_t)$ is the hydrogen sulfide concentration overrun penalty function of the cabin agent, and
Figure GDA0003934252880000047
is the maximum allowable concentration of hydrogen sulfide gas in the cabin area;
defining the hydrogen sulfide concentration overrun penalty function $r_{N+1,2,t}$ of the air handling unit agent:
Figure GDA0003934252880000048
defining the reward function of the N+1 agents:
$r_t = r_{i,1,t}(s_t) + b\, r_{i,2,t}(s_t)$;
where $r_t$ is the reward of agent i at time t, and b is a coupling factor with a positive value;
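To show how the two penalties combine into one reward, the sketch below implements the stated sum with the coupling factor b for a cabin agent. The hinge form of the hydrogen sulfide penalty is an assumption, since the patent's own expression is in an equation image, and the function names are hypothetical.

```python
def positive_part(value):
    """The [.]^+ operator: keep the value if it is greater than 0, else 0."""
    return value if value > 0 else 0.0


def h2s_penalty(conc, conc_max):
    """Assumed hinge-form penalty: 0 at or below conc_max, negative above it."""
    return -positive_part(conc - conc_max)


def cabin_reward(temp_pen, h2s_pen, b):
    """Reward r = r_1 + b * r_2, with b a positive coupling factor."""
    if b <= 0:
        raise ValueError("coupling factor b must be positive")
    return temp_pen + b * h2s_pen
```

A larger b makes a concentration overrun weigh more heavily against a temperature overrun of the same size.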
S5: training the agents;
defining the action value function $Q_y(s_t, a_t)$: the expected return obtained by taking action $a_t$ in state $s_t$, where y denotes the weight parameters obtained by training in the critic network;
defining the policy function $\pi_q(a \mid s)$, where q denotes the weight parameters in the actor network;
defining the action value function of agent i:
Figure GDA0003934252880000051
Figure GDA0003934252880000052
where $f_i$ is a two-layer multi-layer perceptron, $q_i$ is a one-layer multi-layer perceptron embedding function, $o_i$ represents the observation of the ith agent, and $x_i$ represents all the information the ith agent obtains by communicating with the other agents:
$x_i = \sum_{j \neq i} w_j (W_v e_j)$;
where $W_v$ is a covariance matrix and $e_j$ is the embedding function $e_j = q_j(o_j, a_j)$; $h(x)$ is a non-linear activation function,
Figure GDA0003934252880000053
Figure GDA0003934252880000054
represents the degree of attention agent i pays to the information provided by each agent j other than agent i, and $W_k$ and $W_q$ are both covariance matrices;
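The aggregation $x_i$ can be sketched in plain Python. The weight formula itself is in an equation image, so the sketch assumes the usual attention form in which $w_j$ is a softmax over the bilinear scores $(W_k e_j) \cdot (W_q e_i)$; this is an assumption, not the patent's confirmed expression. The sketch follows the printed sum for $x_i$ and therefore applies no activation $h$. All function names are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention_message(i, embeddings, w_q, w_k, w_v):
    """Return (weights, x_i) for agent i given every agent's embedding e_j."""
    query = matvec(w_q, embeddings[i])
    others = [j for j in range(len(embeddings)) if j != i]
    # Assumed score: similarity between agent j's key and agent i's query.
    scores = [dot(matvec(w_k, embeddings[j]), query) for j in others]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    values = [matvec(w_v, embeddings[j]) for j in others]
    # x_i = sum over j != i of w_j * (W_v e_j)
    message = [sum(w * v[d] for w, v in zip(weights, values))
               for d in range(len(values[0]))]
    return weights, message
```

Under this assumption, an agent whose embedding resembles that of agent i receives a larger weight, so its information contributes more to $x_i$.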
the covariance matrices $W_v$, $W_k$ and $W_q$ are passed through the actor-critic networks, and the N+1 critic networks are trained and updated continuously to minimize a joint regression loss function:
Figure GDA0003934252880000055
Figure GDA0003934252880000056
where $L_Q(y)$ represents the loss function;
Figure GDA0003934252880000057
denotes the expectation of the computed result over all data in the experience pool;
Figure GDA0003934252880000058
represents the action value function of agent i when the weight parameter is y;
Figure GDA0003934252880000059
denotes the target policy function at time t when the weight parameter is
Figure GDA00039342528800000510
; $\gamma$ represents the discount rate of the return;
Figure GDA00039342528800000511
represents the temperature parameter, which determines the balance between maximizing entropy and maximizing return;
Figure GDA00039342528800000512
represents the target action value function of agent i; $d_i$ represents the target reward value of agent i; $r_i(o_i, a_i)$ represents the return the agent receives after taking action a when the observation is o;
Figure GDA0003934252880000061
denotes the expectation of the computed result over all data in the experience pool;
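The loss and target expressions are in the equation images. The sketch below assumes the standard soft actor-critic form, in which the target value $d_i$ adds the reward to the discounted target action value minus the temperature-weighted log-probability of the next action, and the loss is the mean squared difference between the critic output and that target. It covers one agent only, whereas the joint loss sums over all N+1 agents; the function names and this exact form are assumptions.

```python
def soft_td_target(reward, gamma, alpha, next_q, next_log_prob):
    """Assumed target d = r + gamma * (Q_target(o', a') - alpha * log pi_target(a' | o'))."""
    return reward + gamma * (next_q - alpha * next_log_prob)


def critic_loss(batch, gamma, alpha):
    """Mean squared error between critic outputs and soft targets.

    Each batch item is (q_value, reward, next_q, next_log_prob).
    """
    errors = [
        (q - soft_td_target(r, gamma, alpha, nq, nlp)) ** 2
        for (q, r, nq, nlp) in batch
    ]
    return sum(errors) / len(errors)
```

Because the log-probability of an unlikely action is strongly negative, the entropy term raises the target for less predictable policies, and the temperature parameter sets how much weight that carries.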
thus, the stochastic gradient can be defined as:
Figure GDA0003934252880000062
where
Figure GDA0003934252880000063
where:
Figure GDA0003934252880000064
is the stochastic gradient computed for the ith agent; $J(q)$ represents the corresponding loss function; $E_{o \sim D,\, a \sim p}$ denotes the expectation over all possible outcomes;
Figure GDA0003934252880000065
denotes the target policy function with weight parameter $q_i$ at time t; U represents the set of all agents except agent i; $b(o_i, a_U)$ is a state-dependent baseline, of the kind generally used in policy gradient methods to reduce variance without changing the expectation of the policy gradient:
Figure GDA0003934252880000066
training the agents until the loss function and the stochastic gradient satisfy the training condition, and applying the trained agents to the online control of the ocean platform ventilation system.
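The baseline expression is in an equation image. A minimal sketch, assuming discrete actions and a baseline equal to the policy-weighted average of agent i's action values with the other agents' actions held fixed, is given below. This form, the function names, and the omission of the entropy term are assumptions made for illustration.

```python
def baseline(policy_probs, q_values):
    """Assumed baseline: average of Q over agent i's own actions, weighted by its policy."""
    return sum(p * q for p, q in zip(policy_probs, q_values))


def advantage(action, policy_probs, q_values):
    """Action value of the chosen action relative to the baseline."""
    return q_values[action] - baseline(policy_probs, q_values)
```

Under this assumption the advantage averages to zero over the policy, which is why subtracting the baseline reduces variance without shifting the expected gradient.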
In some embodiments of the present invention, in step S5, after the loss function and the stochastic gradient are defined, the step of training the agents further comprises:
S51: an initialization step: initializing the capacity of the experience pool D and the state environment of the N+1 agents in the ocean platform ventilation system; the state environment here includes the initial values of the outdoor temperature, the number of persons in each cabin, the weight q, and the weight y; initializing the weights of the target networks
Figure GDA0003934252880000067
and the policy function
Figure GDA0003934252880000068
where
Figure GDA0003934252880000069
S52: defining Y episodes; for the jth episode ($1 \le j \le Y$), first resetting the environment of all agents to obtain the initial observation $o_{i,1}$ of each agent i;
S53: defining P time steps; at the tth time step ($1 \le t \le P$), each agent i selects an action according to the policy function
Figure GDA0003934252880000071
and at the same time passes the action $a_{i,t}$ to the other agents in the platform ventilation system and, by interacting with the other agents in the system environment through the attention mechanism, obtains the observation $o_{i,t+1}$ and the reward $r_{i,t+1}$ at the next time step; the transition tuple $(o_t, a_t, o_{t+1}, r_{t+1})$ is stored in the experience pool D;
S54: training the actor networks and the critic networks with the data in the experience pool; computing the approximate action value function for each agent i
Figure GDA0003934252880000072
where $1 \le l \le B$, and
Figure GDA0003934252880000073
and
Figure GDA0003934252880000074
denote the lth $a_i$ and $o_i$ in the mini-batch; computing, for every agent i and the lth sample in the mini-batch, the approximate policy function
Figure GDA0003934252880000075
and the approximate action value function
Figure GDA0003934252880000076
then updating the weight parameters in the critic networks by minimizing the loss function; at the same time computing, for all agents i, the target policy function
Figure GDA0003934252880000077
and the actual action value function
Figure GDA0003934252880000078
and updating the policies of all agents and the parameters of the target networks,
Figure GDA0003934252880000079
S55: repeating the above steps until t = P and j = Y, at which point training is finished.
In some embodiments of the invention:
if the amount of data in the experience pool D is larger than the mini-batch size, a transition tuple dataset of size B is randomly selected from the experience pool
Figure GDA00039342528800000710
to train the actor networks and the critic networks.
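The experience pool and the mini-batch condition described above can be sketched as a bounded buffer with uniform random sampling. The class name `ExperiencePool`, its method names, and the choice to discard the oldest transitions when the capacity is reached are assumptions for illustration.

```python
import random
from collections import deque


class ExperiencePool:
    """Bounded store of transition tuples (o_t, a_t, o_{t+1}, r_{t+1})."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)

    def store(self, obs, action, next_obs, reward):
        self.buffer.append((obs, action, next_obs, reward))

    def sample(self, batch_size, rng=random):
        """Return B random transitions, or None until the pool holds more than B."""
        if len(self.buffer) <= batch_size:
            return None
        return rng.sample(list(self.buffer), batch_size)
```

A training loop would call `store` once per time step and skip the network update whenever `sample` returns `None`.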
In some embodiments of the present invention, $m_{i,t}$ takes C discrete values, each corresponding to a damper opening of the variable air volume box in a cabin:
Figure GDA00039342528800000711
the total damper angle $\sigma_t$ in the ocean platform air handling unit takes Z discrete values, each corresponding to an opening of the total damper:
Figure GDA00039342528800000712
In some embodiments of the invention, the observation of cabin agent i at the current time t is defined as:
Figure GDA0003934252880000081
where
Figure GDA0003934252880000082
represents the ambient temperature outside the ocean platform at time t; the set U represents the set of cabin areas in the ocean platform other than cabin i; $T_{j,t}$ represents the room temperature of cabin j in that set at time t; $K_{i,t}$ represents the number of persons in cabin i at time t; $S_{i,t}$ represents the hydrogen sulfide gas concentration in cabin i at time t; and $t'$ represents the index of the time interval within one day, calculated by dividing the total time of one day by the time interval;
the observation of the air handling unit agent is defined as:
$o_{N+1,t} = (t', K_{1,t}, \ldots, K_{N,t}, S_{1,t}, \ldots, S_{N,t})$;
the observation set of the N+1 agents at time t is obtained from the observations of the cabin agents and the observation of the air handling unit agent.
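The observation of the air handling unit agent, $o_{N+1,t}$, can be assembled as in the sketch below. The way the time interval index $t'$ is derived from a running step counter is an assumption, as are the function names and the choice of minutes as the unit.

```python
def time_slot_index(step, interval_minutes):
    """Assumed index t': position of the current step within one day."""
    slots_per_day = (24 * 60) // interval_minutes  # total day length / interval
    return step % slots_per_day


def ahu_observation(step, interval_minutes, occupancy, h2s):
    """Build o_{N+1,t} = (t', K_1..K_N, S_1..S_N) for the air handling unit agent."""
    if len(occupancy) != len(h2s):
        raise ValueError("one occupancy and one concentration value per cabin")
    return (time_slot_index(step, interval_minutes), *occupancy, *h2s)
```

For two cabins sampled every 15 minutes, step 100 falls in slot 4 of the day, giving an observation of length 1 + 2N = 5.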
The method provided by the invention has the following beneficial effects:
1. The control method of the ocean platform ventilation system can bring over-limit temperature and hydrogen sulfide concentration in the ocean platform cabins under control.
2. The method can control the ventilation of several cabins at the same time and meet their different ventilation requirements. No ventilation model of the cabins needs to be established, which avoids errors caused by inaccurate models. When a trained agent selects an action, it uses only the current observation and needs no prior knowledge of the uncertain parameters in the system, so the air volume control method is more widely applicable and improves ventilation efficiency.
3. The method needs no manual adjustment, achieves automatic and rapid ventilation, is highly general, and can reduce the control cost of the ocean platform ventilation system. Given any initial values, the trained agents can quickly adjust the ventilation volume, bring the controlled parameters into a reasonable range, meet each cabin's individual fresh air control requirements, and eliminate the safety hazard caused by leakage of toxic hydrogen sulfide gas on the offshore platform. The method improves the ventilation efficiency of the ocean platform and also diversifies the ventilation methods available to it.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings required to be used in the embodiments or the prior art description will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings may be obtained according to these drawings without inventive labor.
FIG. 1 is a schematic diagram of interaction of various agents of an ocean platform.
FIG. 2 is a flow chart of ocean platform agent training.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects to be solved by the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides a control method for an ocean platform ventilation system. Based on the working environment of the ocean platform, it takes both temperature and hydrogen sulfide concentration into account, giving a ventilation control method with a high safety factor that builds on basic temperature control and adds hydrogen sulfide concentration control.
First, the structure of the ocean platform ventilation system referred to in the background is described.
The ocean platform air conditioning system comprises an air handling unit serving the whole platform and a series of variable air volume boxes installed in the cabin areas. The air handling unit consists of a total damper, a cooling/heating coil, and a variable frequency fan. The variable air volume box of each cabin has a vent, and the ventilation rate through the vent determines the temperature and the hydrogen sulfide concentration in the cabin. The ventilation rate through the vent of each cabin's variable air volume box is determined by the total damper (which controls the supply air volume), the opening of the vent damper (which controls the outlet air volume), and so on.
The overall idea of the control method provided by the invention is as follows. The invention treats the variable air volume box of each cabin area and the total damper in the air handling unit as agents and achieves the ventilation targets through the cooperation of multiple agents. A suitable neural network is designed, the action, state, and reward of each agent are defined, and a multi-agent deep reinforcement learning model is established. After the agents are trained, the air conditioning system on the platform can control the ventilation volume of each cabin given the outdoor temperature and the number of people in each cabin. When the ocean platform operates normally, the ventilation system meets the different temperature requirements of each cabin by adjusting the air supply volume of the different cabins, which ensures reasonable ventilation and heat dissipation, keeps working and living conditions comfortable for the personnel in the cabins, and as far as possible protects equipment service life and saves energy. When a hydrogen sulfide leakage accident occurs, toxic gas such as hydrogen sulfide can permeate into the working and living areas of the platform, and ventilation is an effective measure against its accumulation and spread; the ventilation system therefore detects, while controlling the temperature, when the hydrogen sulfide concentration in a room exceeds the safe range, and adopts a suitable ventilation strategy to reduce the hydrogen sulfide concentration in each cabin as far as possible, improve the indoor air quality, and ensure the safety of workers on the platform.
The control targets of the ventilation system include an indoor temperature control target and a hydrogen sulfide gas concentration control target.
(1) Temperature control of the ocean platform cabins.
Figure GDA0003934252880000101
where $T_{i,t}$ is the temperature in cabin i at time t, detected by a temperature sensor installed in the cabin;
Figure GDA0003934252880000111
is the lowest permissible value of the cabin temperature, and
Figure GDA0003934252880000112
is the highest permissible value of the cabin temperature. The range between the two values is the limit range of the cabin temperature. The lowest and highest permissible values are set according to the specific working environment of each cabin, taking into account the needs of the operators, the storage requirements in the cabin, and so on.
For example, the temperature control ranges of cabins on a typical ocean platform differ between manned and unmanned cabins. When operators are present, their needs are the main consideration and their health is taken as the reference. For an unmanned cabin, the aim is generally to keep the electrical equipment operating normally, provide reasonable ventilation and heat dissipation, and maintain positive or negative pressure in the room. According to the standard on design methods for heating, ventilating and air conditioning systems of offshore platforms issued by the general oil company in China, the indoor temperature of manned working areas and some living areas of an offshore platform, such as control rooms, communication rooms, living quarters and dining rooms, is generally kept between 19 and 24 °C; the indoor temperature of other living areas, such as toilets and storerooms, is generally between 16 and 25 °C; the indoor temperature of unmanned working areas, such as battery rooms, must be kept between 15 and 35 °C, that of transformer rooms between 5 and 45 °C, and that of other generator rooms, fire pump rooms and the like between 5 and 35 °C.
In practice, the indoor temperature of each cabin of the ocean platform is influenced by many factors. Because of coupling effects, the indoor temperature of other cabins affects the indoor temperature of a given cabin, and the outdoor temperature and the ventilation volume also play a part. In the invention, the ventilation volume of the room is the quantity that is changed in order to ventilate the room and dissipate heat effectively.
(2) Hydrogen sulfide gas concentration control on the ocean platform.
When an offshore platform exploits a sour gas field, loss of well control (a blowout) generally occurs in an open area of the platform, so the leaked natural gas and hydrogen sulfide gas easily accumulate on the platform. A high-concentration natural gas explosion zone and a high-concentration hydrogen sulfide poisoning zone can form, affecting the health and productivity of the platform workers and causing serious casualties and property loss. The hydrogen sulfide gas also diffuses gradually and permeates into the safety areas and working areas of the platform. In cabins that some hydrogen sulfide has entered but where the indoor concentration is still low, the ventilation system of the platform can dilute the indoor hydrogen sulfide gas and reduce its concentration. Therefore, the hydrogen sulfide concentration in each cabin of the platform is chosen to represent the air quality, and to avoid accidents the hydrogen sulfide gas concentration should be kept within a safe range:
$$S_{i,t} \le S_i^{\max}$$
where $S_{i,t}$ is the hydrogen sulfide gas concentration in the $i$-th cabin at time $t$ and $S_i^{\max}$ is the maximum allowable hydrogen sulfide gas concentration in the cabin area; a concentration above this value means that a hazardous condition has been reached in the cabin. The value can be set according to the relevant safety protection regulations.
The relevant provisions of the safety regulations for hydrogen sulfide protection in shallow sea petroleum operations (SY 6504-2010) state: "A hydrogen sulfide alarm device should be installed on shallow sea petroleum facilities equipped with fixed hydrogen sulfide probes; when the concentration of hydrogen sulfide in the air reaches 15 mg/m³ (10 ppm), the system should give an audible and visual alarm." "When the concentration of hydrogen sulfide gas reaches 150 mg/m³ (100 ppm), cannot be controlled, and endangers the safety of personnel and facilities, emergency evacuation is carried out according to the requirements of the emergency plan." Furthermore, the threshold mass concentration of hydrogen sulfide is 15 mg/m³, the safety critical mass concentration is 30 mg/m³, and the dangerous critical mass concentration is 150 mg/m³. These standards provide the basis for the safe range of hydrogen sulfide concentration on the ocean platform, and give a scientific and reasonable index for penalizing an agent when the hydrogen sulfide concentration exceeds the limit.
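The three mass concentrations quoted above can be turned into a simple lookup, for example to label sensor readings before they feed a penalty term. The sketch below is only an illustration: the function name `classify_h2s` and the four labels are hypothetical, and only the numeric thresholds (15, 30 and 150 mg/m³) come from the text.

```python
# Thresholds in mg/m^3, taken from the values quoted from SY 6504-2010 above.
H2S_THRESHOLD = 15.0          # threshold concentration (alarm level, 10 ppm)
H2S_SAFETY_CRITICAL = 30.0    # safety critical concentration
H2S_DANGER_CRITICAL = 150.0   # dangerous critical concentration (100 ppm)


def classify_h2s(concentration_mg_m3: float) -> str:
    """Return an illustrative severity label for an H2S reading in mg/m^3.

    The labels are hypothetical names chosen for this sketch.
    """
    if concentration_mg_m3 < 0:
        raise ValueError("concentration must be non-negative")
    if concentration_mg_m3 >= H2S_DANGER_CRITICAL:
        return "danger"
    if concentration_mg_m3 >= H2S_SAFETY_CRITICAL:
        return "safety_critical"
    if concentration_mg_m3 >= H2S_THRESHOLD:
        return "alarm"
    return "normal"
```

A reading of 20 mg/m³, for instance, falls between the threshold and the safety critical value and is labelled `"alarm"`.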
Specifically, the control method of the ocean platform ventilation system provided by the invention comprises the following steps.
S1: and establishing a hydrogen sulfide gas concentration change differential equation in the cabin of the ocean platform.
In an ideal case, the following assumptions are: the hydrogen sulfide gas uniformly permeates into the room (the concentration distribution of harmful substances in the indoor air is uniform), and the air flow of the air supply and the air exhaust is isothermal.
On the basis, the change of the concentration of the hydrogen sulfide gas in the chamber can establish a differential equation according to the principle of 'material balance', and for a continuous and stable ideal ventilation process, the differential equation can be listed as follows:
$$m_{i,t}\, y_{i,t}\, dt + x_{i,t}\, dt - k_{i,t}\, S_{i,t}\, dt = J_i\, ds;$$
where $m_{i,t}$ is the supply air volume of the $i$-th cabin at time $t$, determined by the opening angle of the damper of the variable air volume box in the cabin; $k_{i,t}$ is the exhaust air volume of the $i$-th cabin at time $t$; $y_{i,t}$ is the hydrogen sulfide gas concentration in the supply air of the $i$-th cabin at time $t$; $dt$ is an assumed very small time interval; $x_{i,t}$ is the amount of hydrogen sulfide gas permeating into the $i$-th cabin at time $t$; $J_i$ is the volume of cabin $i$; and $ds$ is the increase of the hydrogen sulfide gas concentration in the cabin during $dt$;
where $m_{i,t}$ takes one of C discrete values, each corresponding to one damper opening:
Figure GDA0003934252880000131
Similarly, the angle of the main damper in the air handling unit of the ocean platform is selected from the following Z discrete values:
Figure GDA0003934252880000132
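The material balance of step S1 can be integrated numerically to see how a cabin's concentration responds to a given ventilation setting. The following is a minimal sketch using an explicit Euler step; the function names and the numeric values are illustrative assumptions, and the time step is assumed small enough for the explicit scheme to be stable.

```python
def h2s_step(S, m, y, x, k, J, dt):
    """One explicit Euler step of  J * dS = (m*y + x - k*S) * dt.

    S: current H2S concentration in the cabin
    m: supply air volume flow, y: H2S concentration in the supply air
    x: H2S amount permeating into the cabin per unit time
    k: exhaust air volume flow, J: cabin volume, dt: time step
    """
    dS = (m * y + x - k * S) * dt / J
    return S + dS


def simulate(S0, m, y, x, k, J, dt, steps):
    """Return the concentration history [S_0, S_1, ..., S_steps]."""
    S = S0
    history = [S]
    for _ in range(steps):
        S = h2s_step(S, m, y, x, k, J, dt)
        history.append(S)
    return history
```

With clean supply air ($y = 0$) and no further permeation ($x = 0$), the concentration decays geometrically by the factor $1 - k\,dt/J$ per step; with constant inputs the balance settles at $S^* = (m y + x)/k$, which follows from setting $ds = 0$ in the equation.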
s2: the variable air volume air box and the air handling unit in each cabin are respectively defined as an intelligent agent, namely a cabin intelligent agent and an air handling unit intelligent agent.
The intelligent agents adopt neural network fitting, can learn to update neural network parameters, and each intelligent agent comprises a strategy generation network and a strategy evaluation network.
The air conditioning system ventilation control method provided by the invention adopts an operator-critic algorithm in reinforcement learning, combines the characteristics of two types of reinforcement learning methods based on values and strategies, and comprises an operator network responsible for generating the strategies and a critic network responsible for evaluating the strategies in real time.
All the operator networks have the same structure and comprise an input layer and a plurality of hidden layers, wherein each hidden layer is provided with an activation function of a Leaky ReLU, and the operator networks also comprise an output layer which takes a softmax function as the activation function. Similarly, all the critic networks have the same structure and also comprise an input layer, a plurality of hidden layers with the Leaky ReLU function as the activation function and an output layer with the softmax function as the activation function.
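The layer structure described for the policy-generating (actor) network can be sketched as a plain forward pass. This is an illustration only: the layer sizes, the random initialisation and the helper names are assumptions, and no training is shown.

```python
import math
import random


def leaky_relu(v, slope=0.01):
    """Leaky ReLU applied element-wise to a list of floats."""
    return [x if x > 0 else slope * x for x in v]


def softmax(v):
    """Numerically stable softmax over a list of floats."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def linear(W, b, v):
    """Affine map W v + b, with W stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialisation)."""
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def actor_forward(layers, obs):
    """Hidden layers use Leaky ReLU; the output layer uses softmax."""
    h = list(obs)
    for W, b in layers[:-1]:
        h = leaky_relu(linear(W, b, h))
    W, b = layers[-1]
    return softmax(linear(W, b, h))


rng = random.Random(0)
sizes = [6, 16, 16, 4]  # 6 observation features, 4 damper openings (assumed)
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
probs = actor_forward(layers, [0.5, 0.2, 0.1, 0.3, 0.0, 0.4])
```

The softmax output is a probability distribution over the discrete damper openings, which is what lets an agent pick one of its C (or Z) discrete actions.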
Assuming that the offshore platform includes N cabins in total, N +1 agents are defined in total.
Define the state s of the N cabin agents. The damper of a cabin agent adjusts the air supply according to the current environment so as to keep the indoor temperature and the hydrogen sulfide gas concentration within limits.
Define the observation of cabin agent i at the current time t:
Figure GDA0003934252880000141
where
Figure GDA0003934252880000142
represents the ambient temperature outside the platform at time t; the set U is the set of cabin zones in the platform other than cabin i, and $T_{j,t}$ is the indoor temperature at time t of cabin j in that set (both indoor and outdoor temperatures are obtained from temperature sensors); $K_{i,t}$ is the number of persons in cabin i at time t (obtained from an electronic counting sensor at the door of each cabin); and $S_{i,t}$ is the hydrogen sulfide gas concentration in the $i$-th cabin at time t (obtained from a dedicated hydrogen sulfide gas sensor). $t'$ is the index of the time interval within one day. It depends on the chosen interval length: the number of intervals is the total time of one day divided by the interval length (for example, with an interval of τ = 15 minutes there are 24 × 60 / 15 = 96 intervals per day, so $t'$ takes one of 96 values).
Define the observation of the air handling unit agent:
$$o_{N+1,t} = (t', K_{1,t}, \dots, K_{N,t}, S_{1,t}, \dots, S_{N,t});$$
The observation set of the N+1 agents at time t is then:
$$s_t = o_t = (o_{1,t}, \dots, o_{N+1,t}).$$
where $o_{1,t}$ is the observation of the first agent (the first cabin agent) at time t and $o_{N+1,t}$ is the observation of the (N+1)-th agent (the air handling unit agent) at time t;
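The observations defined in step S2 can be assembled from raw sensor readings as plain tuples. The sketch below is a hedged illustration: the 1-based interval index, the element order inside the cabin observation, the inclusion of the cabin's own temperature, and all function names are assumptions; only the listed quantities and the air handling unit tuple $(t', K_{1,t}, \dots, K_{N,t}, S_{1,t}, \dots, S_{N,t})$ come from the text.

```python
def interval_index(minutes_since_midnight, tau_minutes=15):
    """Index t' of the interval containing the given time of day.

    Assumes a 1-based index, so tau = 15 min gives values 1..96.
    """
    intervals_per_day = 24 * 60 // tau_minutes
    return (minutes_since_midnight // tau_minutes) % intervals_per_day + 1


def cabin_observation(i, t_prime, outdoor_temp, temps, occupancy, h2s):
    """Observation of cabin agent i; the element order is an assumption."""
    others = [temps[j] for j in range(len(temps)) if j != i]  # cabins in set U
    return (t_prime, outdoor_temp, temps[i], *others, occupancy[i], h2s[i])


def ahu_observation(t_prime, occupancy, h2s):
    """Observation of the air handling unit agent, as given in the text."""
    return (t_prime, *occupancy, *h2s)


def joint_observation(t_prime, outdoor_temp, temps, occupancy, h2s):
    """Observation set s_t = (o_1, ..., o_{N+1}) for N cabins."""
    cabins = [
        cabin_observation(i, t_prime, outdoor_temp, temps, occupancy, h2s)
        for i in range(len(temps))
    ]
    return (*cabins, ahu_observation(t_prime, occupancy, h2s))
```

For three cabins, `joint_observation` returns four tuples: one per cabin agent plus one for the air handling unit agent.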
s3: defining agent action steps.
The air treatment unit comprises a cabin intelligent body, an air valve, an air inlet valve, an air outlet valve and an air inlet valve, wherein the action of the cabin intelligent body corresponds to the angle of the air valve of the variable air volume air box in the cabin and is used for controlling the air inlet volume in each cabin, and the action of the ocean platform air treatment unit intelligent body corresponds to the angle of the total air valve and is used for controlling the total air inlet volume.
The action a of each intelligent agent is the intake m of the ocean platform cabin i at the moment t for the cabin intelligent agent i (i is more than or equal to 1 and less than or equal to N) i,t For the N +1 th agent (the air handler is regarded as the agent), the action is σ of the total air valve angle in the unit at the time t t Thus, it is possible toFor these N +1 agents, the set of action values can be expressed as:
a t =(m 1,t ,m 2,t ,...,m N,t ,σ t )。
s4: an agent penalty function is defined.
The penalty comprises the penalty of exceeding the limit area by the temperature in the cabin:
defining a temperature overrun penalty function for the cabin agent:
Figure GDA0003934252880000151
where $r_{i,1,t}(s_t)$ is the temperature overrun penalty function of cabin agent i, $T_{i,t}$ is the temperature in the $i$-th cabin at time t,
Figure GDA0003934252880000152
represents the lowest permissible value of the temperature in the cabin,
Figure GDA0003934252880000153
represents the maximum allowable value of the temperature in the cabin; $[\cdot]^+$ takes the value inside the brackets if that value is greater than 0 and takes 0 otherwise. Therefore, when the indoor temperature exceeds the maximum temperature of the limit range,
Figure GDA0003934252880000154
when the indoor temperature is lower than the lowest temperature of the limit range,
Figure GDA0003934252880000155
when the indoor temperature stays within the limit range, $r_{i,1,t}(s_t) = 0$.
The indoor temperature of each cabin is only weakly correlated with the angle of the unit's main damper, so the invention defines $r_{N+1,1,t} = 0$, where $r_{N+1,1,t}$ is the temperature overrun penalty function of the air handling unit agent.
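The positive-part operator $[\cdot]^+$ makes the temperature penalty easy to compute. The sketch below assumes the penalty is the negative of the overshoot or undershoot in °C, so that it is zero inside the limit range and grows in magnitude with the violation; the sign convention and the function names are assumptions, since the exact formula is not reproduced here.

```python
def positive_part(v):
    """[v]^+ : v if v > 0, else 0."""
    return v if v > 0 else 0.0


def temperature_penalty(T, T_min, T_max):
    """Temperature overrun penalty of a cabin agent (assumed negative sign).

    Zero when T_min <= T <= T_max, otherwise minus the distance in degrees C
    by which the cabin temperature leaves the limit range.
    """
    return -(positive_part(T - T_max) + positive_part(T_min - T))


def ahu_temperature_penalty():
    """The air handling unit agent carries no temperature penalty."""
    return 0.0
```

For a control room limited to 19–24 °C, a reading of 26 °C gives a penalty of −2, and so does a reading of 17 °C.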
The penalty also includes a penalty for hydrogen sulfide concentration exceeding a safe range.
Define the hydrogen sulfide concentration overrun penalty function of a cabin agent:
Figure GDA0003934252880000161
where $r_{i,2,t}(s_t)$ is the hydrogen sulfide concentration overrun penalty function of cabin agent i. Whether platform operators are present in the cabin at that moment affects the reasonable ranges chosen for the temperature and the hydrogen sulfide concentration. $S_{i,t}$ is the hydrogen sulfide gas concentration in the $i$-th cabin at time t,
Figure GDA0003934252880000162
indicating the maximum allowable concentration of hydrogen sulfide gas in the chamber region.
Define the hydrogen sulfide concentration overrun penalty function $r_{N+1,2,t}$ of the air handling unit agent:
Figure GDA0003934252880000163
Define the reward function of the N+1 agents:
$$r_t = r_{i,1,t}(s_t) + b\, r_{i,2,t}(s_t);$$
where $r_t$ is the reward of agent i at time t, and b is a positive coupling factor in °C/ppm.
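The coupling factor b converts a concentration violation in ppm into the same unit (°C) as the temperature penalty, so the two terms can be added. The sketch below assumes the hydrogen sulfide penalty mirrors the temperature penalty, that is, the negative of the amount by which the concentration exceeds its maximum; that form, the function names and the numeric values are assumptions for illustration.

```python
def h2s_penalty(S, S_max):
    """Assumed form: minus the excess of the H2S concentration over S_max (ppm)."""
    return -max(S - S_max, 0.0)


def agent_reward(r_temperature, r_h2s, b):
    """Reward r = r_1 + b * r_2, with b > 0 in degrees C per ppm."""
    if b <= 0:
        raise ValueError("the coupling factor b must be positive")
    return r_temperature + b * r_h2s
```

With a temperature penalty of −2 °C, a concentration 2 ppm above the limit and b = 0.5 °C/ppm, the reward is −2 + 0.5 × (−2) = −3.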
To obtain information about the states and penalties in the ventilation system, the agents must exchange information with one another. Referring to fig. 1, once the agents obtain the state information, they determine the corresponding action $a_t = (m_{1,t}, m_{2,t}, \dots, m_{N,t}, \sigma_t)$ from the current state; the agents then observe the new state information at time t+1 and calculate the reward $r_{i,t}$ that each agent receives after selecting the action.
S5: and carrying out intelligent training.
And (4) training the intelligent agent by adopting an operator-critical neural network.
Defining an action cost function Q y (s t ,a t ): is shown in state s t Lower adoption action a t And obtaining expected revenue, wherein y represents weight parameters obtained by training in the critic network, and the parameters can be obtained by minimizing a loss function L in the discrete strategy time sequence difference learning Q (y) learned.
Defining a policy cost function pi q (a | s): can be obtained by training a policy gradient function, where q is a weight parameter in the actor network.
Defining an action cost function for agent i
Figure GDA0003934252880000171
Figure GDA0003934252880000172
where $f_i$ is a two-layer multi-layer perceptron, $q_i$ is a one-layer multi-layer perceptron embedding function, $o_i$ is the observation of the $i$-th agent, and $x_i$ represents all the information from the other agents (when the neural network computes the action value function of agent i,
Figure GDA0003934252880000173
the information of the other agents is taken into account according to their respective weights):
$$x_i = \sum_{j \ne i} w_j\,(W_v e_j);$$
where $W_v$ is a covariance matrix that converts the embedding $e_j = q_j(o_j, a_j)$ into a "value", and $h(x)$ is a non-linear activation function,
Figure GDA0003934252880000174
indicates how much attention agent i pays to the information provided by another agent j; here $W_k$ and $W_q$ are both covariance matrices, which convert the embeddings into a "key" and a "query" respectively.
The matrices $W_v$, $W_k$ and $W_q$ are shared among the actor-critic networks, and all critic networks are trained and updated together to minimize the joint regression loss function:
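The key/query/value description matches the usual attention pattern, which can be sketched as follows. The sketch assumes that the attention weight $w_j$ is a softmax over the bilinear scores $(W_k e_j)^\top (W_q e_i)$ and that $h$ is a Leaky ReLU applied to the value $W_v e_j$; both choices, and all function names, are assumptions made for illustration.

```python
import math


def matvec(M, v):
    """Matrix-vector product with M stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def leaky_relu(v, slope=0.01):
    return [x if x > 0 else slope * x for x in v]


def attention_context(i, embeddings, W_k, W_q, W_v):
    """Return (weights, x_i) for agent i given the embeddings e_j of all agents."""
    query = matvec(W_q, embeddings[i])
    others = [j for j in range(len(embeddings)) if j != i]
    # Score each other agent by comparing its "key" with agent i's "query".
    scores = [dot(matvec(W_k, embeddings[j]), query) for j in others]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # "Values" of the other agents, passed through the non-linearity h.
    values = [leaky_relu(matvec(W_v, embeddings[j])) for j in others]
    dim = len(values[0])
    x_i = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, x_i
```

The weights sum to one, so $x_i$ is a weighted average of the other agents' values, dominated by the agents whose keys align best with agent i's query.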
Figure GDA0003934252880000175
Figure GDA0003934252880000176
where $L_Q(y)$ is the loss function;
Figure GDA0003934252880000177
expressing the expectation of the calculation results of all the data in the experience pool;
Figure GDA0003934252880000178
representing the action value function of the agent i when the weight parameter is y;
Figure GDA0003934252880000181
indicates that at time t the weighting parameter is
Figure GDA0003934252880000182
A target policy cost function of time; γ represents the discount rate of the profit;
Figure GDA0003934252880000183
is a temperature parameter that determines the balance between entropy and return;
Figure GDA0003934252880000184
represents the target action value function of agent i; $d_i$ is the target value of agent i; $r_i(o_i, a_i)$ is the return the agent receives after taking action a when its observation is o;
Figure GDA0003934252880000185
expressing the expectation of the calculation results of all the data in the experience pool;
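The quantities listed above (return, discount rate, temperature parameter, target action value function and target policy function) are the ingredients of an entropy-regularised temporal-difference target. The sketch below assumes the common soft actor-critic form $d_i = r_i + \gamma\,(\bar{Q}_i - \alpha \log \bar{\pi}_i)$ and a squared-error joint loss; this form is an assumption, not a reproduction of the patent's formula, and the function names are hypothetical.

```python
def critic_target(reward, gamma, alpha, next_q, next_log_prob):
    """Assumed target d_i = r_i + gamma * (Q_target - alpha * log pi_target).

    next_q:        target action value at the next observation/action
    next_log_prob: log-probability of the next action under the target policy
    alpha:         temperature parameter weighting the entropy term
    """
    return reward + gamma * (next_q - alpha * next_log_prob)


def joint_critic_loss(q_values, targets):
    """Sum over agents of the squared error between Q_i and its target d_i."""
    return sum((q - d) ** 2 for q, d in zip(q_values, targets))
```

In practice the loss would be averaged over a mini-batch from the experience pool and minimised with respect to the critic weights y.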
Thus, the stochastic gradient can be defined as:
Figure GDA0003934252880000186
Figure GDA0003934252880000187
wherein:
Figure GDA0003934252880000188
represents the stochastic gradient; $J(q)$ is the corresponding loss function; $E_{o \sim D,\, a \sim p}$ denotes the expectation over observations drawn from the experience pool D and actions drawn from the policy p;
Figure GDA0003934252880000189
denotes the target policy function with weight parameter $q_i$ at time t; U is the set of all agents except agent i.
Figure GDA00039342528800001810
The advantage function of each agent in the multi-agent system indicates whether the agent's current action increases the expected return. Here $b(o_i, a_U)$ is a state-dependent baseline, commonly used in policy gradient methods to reduce variance without changing the expectation of the policy gradient. In the present invention all agents use the same baseline, namely:
Figure GDA00039342528800001811
wherein:
Figure GDA00039342528800001812
indicates that at time t the weighting parameter is
Figure GDA00039342528800001813
the policy function at that time,
Figure GDA00039342528800001814
represents the action value function of agent i when the observation is o.
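For discrete actions, a baseline of this kind can be computed exactly by averaging the agent's action values under its own policy while the other agents' actions stay fixed. The sketch below assumes that counterfactual form, $b = \sum_{a_i} p(a_i \mid o_i)\, Q_i(o, (a_i, a_U))$; the form and the function names are assumptions for illustration.

```python
def baseline(policy_probs, q_per_action):
    """Expected action value of agent i under its own policy.

    policy_probs[a]: probability of agent i choosing action a
    q_per_action[a]: Q_i when agent i takes a and the other agents' actions
                     are held fixed
    """
    return sum(p * q for p, q in zip(policy_probs, q_per_action))


def advantage(action, policy_probs, q_per_action):
    """Advantage of the chosen action relative to the baseline."""
    return q_per_action[action] - baseline(policy_probs, q_per_action)
```

A positive advantage means the chosen action is better than the policy's average for that observation; by construction the policy-weighted mean of the advantages is zero, which is why subtracting the baseline leaves the expected gradient unchanged.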
Referring to fig. 2, a specific flow of the control method provided by the present invention is as follows.
S51: and (5) initializing.
Initializing the capacity of an experience pool D and the state environment of N +1 intelligent agents in an ocean platform ventilation system; the state environment described here includes the initial values of the outdoor temperature, the number of persons in the cabin, the weight q, and the weight y;
the actions of the agent described herein are generally the adjustment of the state of the air handler damper, and the adjustment of the state of the variable air volume bellows outlet damper. The action of the intelligent agent has great influence on the concentration of hydrogen sulfide gas in each cabin of the ocean platform.
Initializing weights of a target network
Figure GDA0003934252880000191
And a policy function
Figure GDA0003934252880000192
Wherein
Figure GDA0003934252880000193
S52: defining Y epsilon; for the j (j is more than or equal to 1 and less than or equal to Y) th epimode, firstly resetting the environment of all the agents to obtain the initial observed quantity o of each agent i i,1
S53: defining P moments; at the t-th moment (t is more than or equal to 1 and less than or equal to P), selecting proper action for each agent i according to the strategy function
Figure GDA0003934252880000194
At the same time, the action $a_{i,t}$ is transmitted to the other agents in the platform ventilation system. By interacting with the other agents and the system environment on the basis of an attention mechanism, each agent obtains the observation $o_{i,t+1}$ and the reward $r_{i,t+1}$ at the next moment, and the transition tuple $(o_t, a_t, o_{t+1}, r_{t+1})$ is stored in the experience pool D;
During training, the agents communicate with one another through an attention mechanism. It helps the model assign different weights to the input information from the other agents and extract the most critical information, so the model can judge more accurately without adding much computation or storage overhead. Training also covers changes of the ventilation volume under different initial values, so that the dampers can be adjusted promptly when a change of initial values alters the target air volume, avoiding unnecessary air volume adjustments.
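The experience pool used in steps S53 and S54 behaves like a fixed-capacity replay buffer. The following is a minimal sketch under stated assumptions: the class name `ExperiencePool`, the eviction of the oldest transition when the pool is full, and returning `None` when the pool is not yet larger than the mini-batch are all illustrative choices.

```python
import random
from collections import deque


class ExperiencePool:
    """Fixed-capacity store of transitions (o_t, a_t, o_{t+1}, r_{t+1})."""

    def __init__(self, capacity, seed=None):
        # A bounded deque drops the oldest transition once capacity is reached.
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def store(self, obs, actions, next_obs, rewards):
        self._buffer.append((obs, actions, next_obs, rewards))

    def __len__(self):
        return len(self._buffer)

    def sample(self, batch_size):
        """Return B random transitions, or None if the pool is not yet
        larger than the mini-batch size."""
        if len(self._buffer) <= batch_size:
            return None
        return self._rng.sample(list(self._buffer), batch_size)
```

Sampling uniformly at random from stored transitions breaks the temporal correlation between consecutive moments, which is the usual reason for training from a pool rather than from the latest transition alone.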
S54: and (5) training.
And training the operator network and the critic network by using the data in the experience pool.
If the experience pool is larger in size than the mini-batch, the mini-batch will randomly select a transfer matrix dataset of size B in the experience pool
Figure GDA0003934252880000195
and this data set is used to train the actor network and the critic network; at the same time, compute for each agent i the approximate action value function
Figure GDA0003934252880000201
where 1 ≤ l ≤ B,
Figure GDA0003934252880000202
and
Figure GDA0003934252880000203
denote the l-th $a_i$ and $o_i$ in the mini-batch; for every agent i and the l-th sample in the mini-batch, compute the approximate policy function
Figure GDA0003934252880000204
and the approximate action value function
Figure GDA0003934252880000205
then update the weight parameters in the critic network by minimizing the loss function; at the same time, compute for all agents i the target policy function
Figure GDA0003934252880000206
and the actual action value function
Figure GDA0003934252880000207
and update the policies of all agents and the parameters of the target network:
Figure GDA0003934252880000208
Figure GDA0003934252880000209
s55: the above steps are repeated until t = P, j = Y. And finishing the training.
S6: and testing the trained intelligent agent in the following process.
S61: initializing observations of N +1 agents: o 1 =(o 1,1 ,...,o N+1 );
S62: defining the testing time length as H moments;
s63: for the t (t is more than or equal to 1 and less than or equal to H) time, each agent obtains the strategy function p at the t time according to the learning q (·|o i,t ) To select the corresponding action a i,t (ii) a Simultaneously executing the selected action in the platform air conditioning system by all the intelligent agents; the system environment will give the observed quantities o of all agents at the next moment after the action moment is over i,t+1
S64: and repeating the step S62 until all the H moments are finished.
S7: and controlling the ventilation system of the ocean platform on line by the trained and qualified intelligent agent.
The ventilation control method provided by the invention is used for controlling the multi-cabin ocean platform ventilation system, can solve the problem that the online debugging in a large-scale solution space cannot be carried out in the traditional method, and can quickly adjust the temperature and the concentration of hydrogen sulfide to a reasonable range without wasting a large amount of time to calculate all possible solutions when the target air volume is changed, thereby achieving the effect of quick ventilation.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (5)

1. A method for controlling a ventilation system of an offshore platform based on temperature and hydrogen sulfide concentration control, the offshore platform comprising a plurality of compartments, the method comprising the steps of:
S1: establishing a differential equation for the change of hydrogen sulfide gas concentration in the cabins of the ocean platform;
assuming that hydrogen sulfide gas permeates uniformly into the cabin and that the supply and exhaust air flows in the cabin are isothermal;
the differential equation for the change of hydrogen sulfide gas concentration in the cabins of the ocean platform being established as:
$$m_{i,t}\, y_{i,t}\, dt + x_{i,t}\, dt - k_{i,t}\, S_{i,t}\, dt = J_i\, ds;$$
where $S_{i,t}$ is the initial concentration of hydrogen sulfide gas in the air of cabin i, whose volume is $J_i$; $dt$ is a very small time interval; $m_{i,t}$ is the supply air volume of the ventilation system during $dt$; $y_{i,t}$ is the hydrogen sulfide concentration in the supply air; $x_{i,t}$ is the amount of hydrogen sulfide gas permeating into cabin i during $dt$; $k_{i,t}$ is the exhaust air volume of the cabin during $dt$; and $ds$ is the increment of the hydrogen sulfide concentration in the cabin during $dt$;
S2: defining each cabin and the air handling unit as an agent, agents 1 to N being cabin agents and agent N+1 being the air handling unit agent, so as to obtain N+1 agents, where N is the number of cabins of the ocean platform; fitting the agents with neural networks, each agent comprising an actor network responsible for generating the policy and a critic network responsible for evaluating the policy in real time;
S3: defining the observation set of the N+1 agents at time t:
$$s_t = o_t = (o_{1,t}, \dots, o_{N+1,t});$$
where $o_{1,t}$ is the observation of the 1st agent at time t and $o_{N+1,t}$ is the observation of the (N+1)-th agent at time t;
defining the action a of each agent, namely the actions of the cabin agents and of the air handling unit agent:
$$a_t = (m_{1,t}, m_{2,t}, \dots, m_{N,t}, \sigma_t);$$
where $m_{1,t}$ is the action of the 1st agent at time t, $m_{2,t}$ is the action of the 2nd agent at time t, and $\sigma_t$ is the action of the (N+1)-th agent at time t;
S4: defining the temperature overrun penalty function of a cabin agent:
Figure FDA0003949125120000021
where $r_{i,1,t}(s_t)$ is the temperature overrun penalty function of cabin agent i, $T_{i,t}$ is the cabin temperature of the $i$-th cabin at time t,
Figure FDA0003949125120000022
represents the lowest permissible value of the temperature in the cabin,
Figure FDA0003949125120000023
represents the maximum allowable value of the temperature in the cabin; $[\cdot]^+$ takes the value inside the brackets when that value is greater than 0 and takes 0 otherwise; when the indoor temperature exceeds the maximum temperature of the limit range,
Figure FDA0003949125120000024
when the indoor temperature is lower than the lowest temperature of the limit range,
Figure FDA0003949125120000025
when the indoor temperature stays within the limit range, $r_{i,1,t}(s_t) = 0$;
defining the temperature overrun penalty function of the air handling unit agent:
$$r_{N+1,1,t} = 0;$$
defining the hydrogen sulfide concentration overrun penalty function of a cabin agent:
Figure FDA0003949125120000026
where $r_{i,2,t}(s_t)$ is the hydrogen sulfide concentration overrun penalty function of cabin agent i;
Figure FDA0003949125120000027
indicating the maximum allowable concentration of hydrogen sulfide gas in the chamber region;
defining the hydrogen sulfide concentration overrun penalty function $r_{N+1,2,t}$ of the air handling unit agent:
Figure FDA0003949125120000028
defining the reward function of the N+1 agents:
$$r_t = r_{i,1,t}(s_t) + b\, r_{i,2,t}(s_t);$$
where $r_t$ is the reward of agent i at time t, and b is a coupling factor with a positive value;
S5: training the agents;
defining the action value function $Q_y(s_t, a_t)$: the expected return obtained by taking action $a_t$ in state $s_t$, where y denotes the weight parameters obtained by training in the critic network;
defining the policy function $p_q(a \mid s)$, where q denotes the weight parameters in the actor network;
defining the action value function of agent i:
Figure FDA0003949125120000031
Figure FDA0003949125120000032
where $f_i$ is a two-layer multi-layer perceptron, $q_i$ is a one-layer multi-layer perceptron embedding function, $o_i$ is the observation of the $i$-th agent, and $x_i$ represents all the information the $i$-th agent obtains by communicating with the other agents;
where $x_i = \sum_{j \ne i} w_j\,(W_v e_j)$;
where $W_v$ is a covariance matrix and $e_j$ is the embedding function: $e_j = q_j(o_j, a_j)$;
Figure FDA0003949125120000033
$W_k$ and $W_q$ are both covariance matrices;
the matrices $W_v$, $W_k$ and $W_q$ being shared among the actor-critic networks, and the N+1 critic networks being trained and updated together to minimize the joint regression loss function:
Figure FDA0003949125120000034
Figure FDA0003949125120000035
where $L_Q(y)$ is the loss function;
Figure FDA0003949125120000036
expressing the expectation of the calculation result of all the data in the experience pool;
Figure FDA0003949125120000037
represents the action value function of agent i when the weight parameter is y; $d_i$ is the target value of agent i;
Figure FDA0003949125120000038
denotes that at time t the weight parameter is
Figure FDA0003949125120000039
A target policy cost function of time;
$r_i(o_i, a_i)$ is the return the agent receives after taking action a when its observation is o; γ is the discount rate of the return;
Figure FDA0003949125120000041
is a temperature parameter that determines the balance between entropy and return;
Figure FDA0003949125120000042
represents the approximate action value function of agent i;
Figure FDA0003949125120000043
expressing the expectation of the calculation result of all the data in the experience pool;
thus, the stochastic gradient can be defined as:
Figure FDA0003949125120000044
where
Figure FDA0003949125120000045
wherein:
Figure FDA0003949125120000046
is the stochastic gradient computed for the $i$-th agent; $J(q)$ is the corresponding loss function; $E_{o \sim D,\, a \sim p}$ denotes the expectation over observations drawn from the experience pool D and actions drawn from the policy p;
Figure FDA0003949125120000047
denotes the target policy function with weight parameter $q_i$ at time t; U is the set of all agents except agent i; $b(o_i, a_U)$ is a state-dependent baseline, commonly used in policy gradient methods to reduce variance without changing the expectation of the policy gradient:
Figure FDA0003949125120000048
wherein:
Figure FDA0003949125120000049
denotes that at time t the weight parameter is
Figure FDA00039491251200000410
the policy function at that time,
Figure FDA00039491251200000411
represents the action value function of agent i when the observation is o;
and training the agents until the loss function and the stochastic gradient meet the training condition, and applying the agents that have passed training to the online control of the ventilation system of the ocean platform.
2. The method of claim 1, wherein training the agents after the loss function and the stochastic gradient have been defined in step S5 further comprises:
S51: an initialization step: initializing the capacity of the experience pool D and the state environment of the N+1 agents in the ocean platform ventilation system; the state environment here including the outdoor temperature, the number of persons in each cabin, and the initial values of the weights q and y; initializing the approximate action value function of the target network
Figure FDA00039491251200000412
And a policy function
Figure FDA0003949125120000051
Wherein
Figure FDA0003949125120000052
S52: defining Y epsilon; for the j (j is more than or equal to 1 and less than or equal to Y) th epimode, firstly resetting the environment of all the agents to obtain the initial observed quantity o of each agent i i,1
S53: defining P moments; at the tth moment (t is more than or equal to 1 and less than or equal to P), each agent i selects a proper action according to the strategy function
Figure FDA0003949125120000053
At the same time will act a i,t The observation value o of the next moment is obtained by transmitting the observation value o to other agents in the platform ventilation system and interacting with other agents in the system environment based on an attention mechanism i,t+1 And a prize r i,t+1 (ii) a And stores the transition matrix (o) in the experience pool D t ,a t ,o t+1 ,r t+1 );
S54: training the actor network and the critic network using data in the experience pool; computing the approximate action-value function of each agent i
Figure FDA0003949125120000054
wherein 1 ≤ l ≤ B, and
Figure FDA0003949125120000055
and
Figure FDA0003949125120000056
denote the l-th a_i and o_i in the mini-batch; computing, for every agent i and the l-th sample in the mini-batch, the approximate policy function
Figure FDA0003949125120000057
and the approximate action-value function
Figure FDA0003949125120000058
then updating the weight parameters of the critic network by minimizing the loss function; at the same time computing, for every agent i, the target policy function
Figure FDA0003949125120000059
and the approximate action-value function
Figure FDA00039491251200000510
and updating the policies of all agents and the parameters of the target network:
Figure FDA00039491251200000511
S55: repeating the above steps until t = P and j = Y, whereupon training is finished.
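The control flow of steps S51 to S55 can be sketched as two nested loops over episodes and time steps. This is a hedged skeleton only: `ToyVentilationEnv`, `select_action` and `update` are hypothetical placeholders for the platform simulator, the policy function and the actor/critic update, none of which are specified in this text, and the attention-based interaction is not modelled.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

# One stored transition: (o_t, a_t, o_{t+1}, r_{t+1}) for all agents.
Transition = Tuple[List[float], List[int], List[float], List[float]]


class ToyVentilationEnv:
    """Hypothetical stand-in for the platform simulator (not from the patent)."""

    def __init__(self, n_agents: int) -> None:
        self.n_agents = n_agents

    def reset(self) -> List[float]:
        # Initial observation o_{i,1} for every agent.
        return [0.0] * self.n_agents

    def step(self, actions: List[int]) -> Tuple[List[float], List[float]]:
        # Dummy dynamics: the next observation echoes the action.
        next_obs = [float(a) for a in actions]
        rewards = [-float(abs(a - 1)) for a in actions]
        return next_obs, rewards


def run_training(
    env: ToyVentilationEnv,
    n_episodes: int,
    n_steps: int,
    capacity: int,
    select_action: Callable[[int, float], int],
    update: Callable[[Deque[Transition]], None],
) -> Deque[Transition]:
    pool: Deque[Transition] = deque(maxlen=capacity)  # S51: experience pool D
    for _episode in range(n_episodes):                # S52: Y episodes
        obs = env.reset()
        for _t in range(n_steps):                     # S53: P time steps
            actions = [select_action(i, o) for i, o in enumerate(obs)]
            next_obs, rewards = env.step(actions)
            pool.append((obs, actions, next_obs, rewards))
            update(pool)                              # S54: actor/critic update
            obs = next_obs
    return pool                                       # S55: loops exhausted


update_calls: List[int] = []
pool = run_training(
    env=ToyVentilationEnv(n_agents=3),
    n_episodes=2,
    n_steps=5,
    capacity=100,
    select_action=lambda i, o: 1,
    update=lambda p: update_calls.append(len(p)),
)
```

Passing the policy and the update rule in as callables keeps the loop structure separate from the learning method, so the same skeleton would serve whatever network update is actually used.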
3. The method of claim 2 for controlling an ocean platform ventilation system based on temperature and hydrogen sulfide concentration control,
wherein, if the size of the experience pool D is larger than the mini-batch size, a set of B transitions is randomly selected from the experience pool
Figure FDA00039491251200000512
to train the actor network and the critic network.
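The sampling rule in this claim can be illustrated with a short sketch. It assumes uniform sampling without replacement, which is the usual reading of "randomly selected" but is not stated in the text; the helper name `sample_minibatch` is hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

# A transition (o_t, a_t, o_{t+1}, r_{t+1}); the contents are opaque here.
Transition = Tuple[tuple, tuple, tuple, tuple]


def sample_minibatch(
    pool: Sequence[Transition],
    batch_size: int,
    rng: random.Random,
) -> Optional[List[Transition]]:
    """Return B transitions drawn uniformly without replacement.

    Returns None while the pool is not yet larger than the mini-batch
    size, in which case no training step should be taken.
    """
    if len(pool) <= batch_size:
        return None
    return rng.sample(list(pool), batch_size)


rng = random.Random(0)
toy_pool = [((k,), (k,), (k + 1,), (0.0,)) for k in range(10)]
batch = sample_minibatch(toy_pool, batch_size=4, rng=rng)
too_small = sample_minibatch(toy_pool[:4], batch_size=4, rng=rng)
```

Sampling at random rather than taking the most recent transitions breaks the temporal correlation between consecutive samples, which is the usual motivation for an experience pool.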
4. The method of claim 1 for ocean platform ventilation system control based on temperature and hydrogen sulfide concentration control, wherein:
m_{i,t} has c discrete values, each corresponding to an opening of the variable-air-volume (VAV) box damper in a cabin:
Figure FDA0003949125120000061
the total damper angle σ_t in the air handling unit of the ocean platform has Z discrete values, each corresponding to a total damper opening:
Figure FDA0003949125120000062
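The actual discrete opening values are given in figures that are not reproduced here. As an illustration only, the sketch below assumes the levels are evenly spaced fractions of full opening between 0 (closed) and 1 (fully open); the functions `opening_levels` and `decode_action` and the level counts 5 and 3 are hypothetical.

```python
from typing import List


def opening_levels(n_levels: int) -> List[float]:
    """Evenly spaced damper openings in [0, 1] (an assumed discretization)."""
    if n_levels < 2:
        raise ValueError("need at least two levels (closed and fully open)")
    return [k / (n_levels - 1) for k in range(n_levels)]


def decode_action(index: int, levels: List[float]) -> float:
    """Map a discrete action index chosen by an agent to a damper opening."""
    if not 0 <= index < len(levels):
        raise IndexError("action index outside the discrete action set")
    return levels[index]


cabin_levels = opening_levels(5)   # stands in for the c cabin damper levels
total_levels = opening_levels(3)   # stands in for the Z total damper levels
cabin_opening = decode_action(3, cabin_levels)
total_opening = decode_action(1, total_levels)
```

Keeping the action sets small and discrete is what allows each agent's policy to be a probability distribution over a finite list of damper positions.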
5. The method of claim 1 for ocean platform ventilation system control based on temperature and hydrogen sulfide concentration control,
wherein the observation of cabin agent i at the current time t is defined as:
Figure FDA0003949125120000063
wherein
Figure FDA0003949125120000064
denotes the ambient temperature outside the ocean platform at time t; the set U denotes the set of cabin zones in the ocean platform other than cabin i; T_{j,t} denotes the room temperature of cabin j in the set U at time t; K_{i,t} denotes the number of persons in cabin i at time t; S_{i,t} denotes the hydrogen sulfide concentration in cabin i at time t; and t′ denotes the index of the time interval within a day, calculated by dividing the total time of one day by the time interval;
the observation of the air handling unit agent is defined as:
o_{N+1,t} = (t′, K_{1,t}, ..., K_{N,t}, S_{1,t}, ..., S_{N,t});
and the observation group of the N+1 agents at time t is obtained from the observations of the cabin agents and the observation of the air handling unit agent.
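A short sketch of how the N+1 observations might be assembled. The cabin observation formula is in a figure that is not reproduced here, so its layout below is an assumption built only from the components listed in the text (outdoor temperature, temperatures of the other cabins, occupancy, hydrogen sulfide concentration, and t′); the air handling unit observation follows the formula for o_{N+1,t} given above. All function names and sample values are hypothetical.

```python
from typing import List, Tuple


def interval_index(seconds_since_midnight: int, interval_seconds: int) -> int:
    """Index t' of the current interval within the day (assumed zero-based)."""
    return seconds_since_midnight // interval_seconds


def cabin_observation(
    i: int,
    t_out: float,
    temps: List[float],
    persons: List[int],
    h2s: List[float],
    t_prime: int,
) -> Tuple[float, ...]:
    """Assumed layout: (T_out, T_j for j in U, K_i, S_i, t')."""
    others = [temps[j] for j in range(len(temps)) if j != i]
    return (t_out, *others, persons[i], h2s[i], t_prime)


def ahu_observation(
    persons: List[int], h2s: List[float], t_prime: int
) -> Tuple[float, ...]:
    """o_{N+1,t} = (t', K_1..K_N, S_1..S_N), as given in the text."""
    return (t_prime, *persons, *h2s)


# Illustrative values for N = 3 cabins at 01:00 with 15-minute intervals.
temps = [24.0, 25.5, 23.0]
persons = [2, 0, 5]
h2s = [0.1, 0.0, 0.3]
t_prime = interval_index(3600, 900)
observations = [
    cabin_observation(i, 30.0, temps, persons, h2s, t_prime) for i in range(3)
]
observations.append(ahu_observation(persons, h2s, t_prime))
```

The resulting list has N+1 entries, one per agent, matching the observation group described in the claim.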
CN202210124691.0A 2022-02-10 2022-02-10 Ocean platform ventilation system control method based on temperature and hydrogen sulfide concentration control Active CN114484822B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210124691.0A CN114484822B (en) 2022-02-10 2022-02-10 Ocean platform ventilation system control method based on temperature and hydrogen sulfide concentration control

Publications (2)

Publication Number Publication Date
CN114484822A CN114484822A (en) 2022-05-13
CN114484822B true CN114484822B (en) 2023-01-31

Family

ID=81478288

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210124691.0A Active CN114484822B (en) 2022-02-10 2022-02-10 Ocean platform ventilation system control method based on temperature and hydrogen sulfide concentration control

Country Status (1)

Country Link
CN (1) CN114484822B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115356919B (en) * 2022-10-19 2023-01-24 吉林省百皓科技有限公司 Self-adaptive adjusting method for PID controller of chlorine dioxide sterilizer
CN116610037B (en) * 2023-07-17 2023-09-29 中国海洋大学 Comprehensive optimization control method for air quantity of ocean platform ventilation system
CN117022633B (en) * 2023-10-08 2024-02-20 中国海洋大学 Ventilation control method of prefabricated cabin ventilation system for ship or ocean platform

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106777528B (en) * 2016-11-25 2017-11-21 山东蓝光软件有限公司 The holographic forecast method of mine air-required volume
CN208224875U (en) * 2018-05-29 2018-12-11 厦门大学 A kind of container laboratory environment total management system
JP7014860B2 (en) * 2019-12-10 2022-02-01 環境リサーチ株式会社 Air sampling device, remote air measuring device, air sampling system, remote air measuring system, air sampling program, remote air measuring program and recording medium
CN114000907A (en) * 2021-12-10 2022-02-01 重庆邮电大学 Mine ventilation equipment intelligent regulation and control system based on digital twin technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant