CN114484822B

CN114484822B - Ocean platform ventilation system control method based on temperature and hydrogen sulfide concentration control

Info

Publication number: CN114484822B
Application number: CN202210124691.0A
Authority: CN
Inventors: 崔璨; 薛璟; 黎明
Original assignee: Ocean University of China
Current assignee: Ocean University of China
Priority date: 2022-02-10
Filing date: 2022-02-10
Publication date: 2023-01-31
Anticipated expiration: 2042-02-10
Also published as: CN114484822A

Abstract

The invention provides a control method for an ocean platform ventilation system based on temperature and hydrogen sulfide concentration control, comprising the following steps: establishing a differential equation for the change of hydrogen sulfide gas concentration in the ocean platform cabins; defining the variable air volume box in each cabin and the air handling unit each as an agent, giving N+1 agents in total, where N is the number of ocean platform cabins; fitting the agents with neural networks, each agent comprising an actor network responsible for generating the policy and a critic network responsible for evaluating the policy in real time; defining the states and actions of the N+1 agents at time t, and defining over-limit penalty functions for the temperature and the hydrogen sulfide concentration of the cabin agents; and carrying out agent training. The control method can control over-limit temperature and hydrogen sulfide concentration in the cabins of the ocean platform.

Description

Ocean platform ventilation system control method based on temperature and hydrogen sulfide concentration control

Technical Field

The invention relates to the technical field of intelligent control, in particular to a control method of an ocean platform ventilation system based on temperature and hydrogen sulfide concentration control.

Background

The ventilation system is an important utility system for ensuring normal production and daily life on an ocean platform. The living and production areas of the platform are ventilated to remove the heat dissipated in electronic equipment rooms, maintain the design room temperature and keep the electronic equipment operating normally. The ventilation mode and ventilation rate differ from area to area, and whether an offshore platform is well ventilated is an important basis for classifying hazardous areas and an important factor in platform safety. From the safety perspective, ventilation prevents the accumulation and spread of flammable, explosive, toxic and harmful gases, maintains the positive or negative pressure required in a room, supplies enough fresh air for workers, and ensures a comfortable and sanitary living environment.

On an oil production platform, some oil well products contain toxic hydrogen sulfide gas, and inhaling even a small amount of high-concentration hydrogen sulfide can be fatal within a short time. Hydrogen sulfide gas diffuses naturally into the platform cabins. Ventilation is now widely used on ocean platforms to promote rapid diffusion and dilution of leaked hydrogen sulfide. For leakage accidents with a small initial concentration, ventilation is the available and effective mitigation measure, preventing the accumulation and spread of hydrogen sulfide from affecting the health and safety of platform operators.

An ocean platform usually has a multi-cabin structure with more than one area needing ventilation. The personnel, functions and hydrogen sulfide concentration differ from cabin to cabin, so the ventilation requirements differ as well. How to meet the different ventilation demands of each area and of the platform operators, and, while guaranteeing reasonable ventilation and cooling, dilute leaked hydrogen sulfide gas and prevent its spread to the greatest extent so as to provide a safe and comfortable working environment, is therefore an important problem.

At present, the ventilation control methods applied to ocean platforms mainly automate the control of flow, pressure difference and the like by having the air conditioning system monitor parameters such as air flow, temperature and pressure in the cabin. These methods depend on hardware such as fans, differential pressure transmitters and PLCs (programmable logic controllers) and still require manual operation by an operator, which makes platform ventilation inconvenient. Another method selects the ventilation mode and fan type according to the type and specific requirements of each cabin; it must consider the specific situation of each cabin and build a dedicated thermodynamic model for it, so it is complex to implement and not general.

Disclosure of Invention

The invention aims to solve at least one of the above technical problems and provides a control method suitable for a multi-cabin ventilation system of an ocean platform, so that the control targets for temperature and hydrogen sulfide concentration are met in a scientific and reasonable way. When the ocean platform operates normally, the ventilation system meets the different temperature requirements of each cabin by adjusting the air supply volume of the different cabins. When a hydrogen sulfide leakage accident occurs, toxic gas such as hydrogen sulfide can permeate into the working and living areas of the platform, and ventilation is an effective measure against the accumulation and spread of hydrogen sulfide; while controlling the temperature, the ventilation system therefore detects when the indoor hydrogen sulfide concentration exceeds the safety range and adopts a suitable ventilation strategy to reduce the hydrogen sulfide concentration in each cabin as much as possible, improving indoor air quality and ensuring the safety of the workers on the platform.

In order to solve the above problems, the present invention provides the following technical solutions:

A control method for an ocean platform ventilation system based on temperature and hydrogen sulfide concentration control, the ocean platform comprising a plurality of cabins, the control method comprising the following steps:

S1: establishing a differential equation for the change of hydrogen sulfide gas concentration in the cabins of the ocean platform;

assuming that hydrogen sulfide gas permeates uniformly into the cabin and that the supply and exhaust air flows in the cabin are isothermal;

the differential equation for the change of hydrogen sulfide gas concentration in a cabin of the ocean platform is established as follows:

$m_{i,t}\, y_{i,t}\, dt + x_{i,t}\, dt - k_{i,t}\, S_{i,t}\, dt = J_i\, ds$;

wherein $S_{i,t}$ is the initial concentration of hydrogen sulfide gas in the air of cabin i, whose volume is $J_i$; $dt$ is a very small time slot; $m_{i,t}$ is the supply air volume of the ventilation system during $dt$; $y_{i,t}$ is the concentration of hydrogen sulfide in the supply air; $x_{i,t}$ is the amount of hydrogen sulfide gas that permeates into cabin i during $dt$; $k_{i,t}$ is the exhaust air volume discharged from the cabin during $dt$; and $ds$ is the increment of the hydrogen sulfide concentration in the cabin during $dt$;

S2: defining each cabin and the air handling unit each as an agent, namely agents 1 to N are cabin agents and agent N+1 is the air handling unit agent, giving N+1 agents in total, where N is the number of ocean platform cabins; fitting the agents with neural networks, each agent comprising an actor network responsible for generating the policy and a critic network responsible for evaluating the policy in real time;

S3: defining the observation set of the N+1 agents at time t:

$s_t = o_t = (o_{1,t}, \ldots, o_{N+1,t})$;

wherein $o_{1,t}$ represents the observed quantity of the 1st agent at time t, and $o_{N+1,t}$ represents the observed quantity of the (N+1)th agent at time t;

defining the agent actions a, namely the actions of the cabin agents and of the air handling unit agent:

$a_t = (m_{1,t}, m_{2,t}, \ldots, m_{N,t}, \sigma_t)$;

wherein $m_{1,t}$ is the action of agent 1 at time t, $m_{2,t}$ is the action of agent 2 at time t, and $\sigma_t$ is the action of the (N+1)th agent at time t;

S4: defining a temperature overrun penalty function $r_{i,1,t}(s_t)$ for each cabin agent;

wherein $T_{i,t}$ denotes the temperature of the ith cabin at time t, and the lowest and highest permissible cabin temperatures bound the limit range; $[\cdot]^+$ denotes that the bracketed value is kept when it is greater than 0 and is taken as 0 otherwise; the penalty applies when the indoor temperature rises above the highest temperature of the limit range or falls below its lowest temperature, and when the indoor temperature stays within the limit range, $r_{i,1,t}(s_t) = 0$;

defining the temperature overrun penalty function of the air handling unit agent:

$r_{N+1,1,t} = 0$;

defining a hydrogen sulfide concentration overrun penalty function $r_{i,2,t}(s_t)$ for each cabin agent, defined with respect to the maximum allowable concentration of hydrogen sulfide gas in the cabin area;

defining a hydrogen sulfide concentration overrun penalty function $r_{N+1,2,t}$ for the air handling unit agent;

defining the reward functions of the N+1 agents:

$r_{i,t} = r_{i,1,t}(s_t) + b\, r_{i,2,t}(s_t)$;

wherein $r_{i,t}$ is the reward of the ith agent at time t, and b is a coupling factor with a positive value;
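As an illustration only, the penalty terms and the combined reward can be sketched in Python. The exact penalty expressions are not reproduced in this text, so the hinge forms below (the negative of the excess beyond the limits, built from the [·]⁺ operator) are an assumption, as are the function names and the numeric values in the usage lines.

```python
def pos(x: float) -> float:
    """[x]+ operator: keep x if it is greater than 0, otherwise return 0."""
    return x if x > 0 else 0.0


def temperature_penalty(temp: float, t_min: float, t_max: float) -> float:
    """Assumed hinge form: zero inside [t_min, t_max], negative outside."""
    return -(pos(temp - t_max) + pos(t_min - temp))


def h2s_penalty(conc: float, conc_max: float) -> float:
    """Assumed hinge form: zero at or below conc_max, negative above it."""
    return -pos(conc - conc_max)


def cabin_reward(temp, t_min, t_max, conc, conc_max, b=1.0):
    """Combined reward r_1 + b * r_2, with b a positive coupling factor."""
    if b <= 0:
        raise ValueError("coupling factor b must be positive")
    return temperature_penalty(temp, t_min, t_max) + b * h2s_penalty(conc, conc_max)


if __name__ == "__main__":
    # Hypothetical cabin: 19-24 degC limit range, 15 mg/m^3 H2S limit.
    print(cabin_reward(22.0, 19.0, 24.0, 5.0, 15.0))          # inside both limits
    print(cabin_reward(26.0, 19.0, 24.0, 20.0, 15.0, b=2.0))  # both limits exceeded
```

The coupling factor b sets how strongly a concentration violation weighs against a temperature violation; a larger b makes the agents prioritize hydrogen sulfide dilution.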

S5: carrying out agent training;

defining an action value function $Q_y(s_t, a_t)$, which represents the expected return obtained by taking action $a_t$ in state $s_t$, where y denotes the weight parameters obtained by training the critic network;

defining a policy function $\pi_q(a \mid s)$, where q denotes the weight parameters of the actor network;

defining an action value function for agent i, wherein $f_i$ is a two-layer multi-layer perceptron, $q_i$ is a one-layer multi-layer perceptron embedding function, $o_i$ represents the observed quantity of the ith agent, and $x_i$ represents all the information the ith agent obtains by communicating with the other agents;

wherein: $x_i = \sum_{j \neq i} w_j (W_v e_j)$;

wherein $W_v$ is a covariance matrix and $e_j$ is the embedding function $e_j = q_j(o_j, a_j)$; h is a non-linear activation function; $w_j$ represents the degree of attention that agent i pays to the information provided by each agent j other than agent i; $W_k$ and $W_q$ are both covariance matrices;
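The aggregation of information from the other agents can be illustrated with a small pure-Python sketch. The expression for the attention weights is not reproduced in this text, so the softmax over the scores $(W_k e_j) \cdot (W_q e_i)$ used below is an assumption borrowed from standard attention mechanisms; the helper names are hypothetical, and the activation h is left out so that the sum matches the expression for $x_i$ as written above.

```python
import math


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attention_message(i, embeddings, w_k, w_q, w_v):
    """Return (weights, x_i) with x_i = sum over j != i of w_j * (W_v e_j).

    Assumption: w_j is a softmax over the scores (W_k e_j) . (W_q e_i).
    """
    query = matvec(w_q, embeddings[i])
    others = [j for j in range(len(embeddings)) if j != i]
    scores = [dot(matvec(w_k, embeddings[j]), query) for j in others]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = {j: e / total for j, e in zip(others, exps)}
    values = {j: matvec(w_v, embeddings[j]) for j in others}
    x_i = [sum(weights[j] * values[j][d] for j in others) for d in range(len(w_v))]
    return weights, x_i


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    e = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]  # hypothetical embeddings e_j
    w, x0 = attention_message(0, e, identity, identity, identity)
    print(w, x0)
```

In this toy case agent 0 attends more to agent 1, whose embedding matches its own, than to agent 2.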

passing the covariance matrices $W_v$, $W_k$ and $W_q$ through the actor-critic networks, and continuously training and updating the N+1 critic networks so as to minimize a joint regression loss function:

wherein $L_Q(y)$ represents the loss function, the terms of which are: the expectation over the results computed from all the data in the experience pool; the action value function of agent i when the weight parameter is y; the target policy function under the target weight parameter at time t; the discount rate γ of the return; a temperature parameter that determines the balance between entropy and return; the target action value function of agent i; the target reward value $d_i$ of agent i; and the reward $r_i(o_i, a_i)$ obtained by agent i after taking action $a_i$ under observation $o_i$;

thus, the stochastic gradient may be defined as:

wherein: the stochastic gradient is computed for the ith agent; $J(q)$ represents the corresponding loss function; $E_{o \sim D,\, a \sim \pi}$ denotes the expectation over all possible outcomes; the target policy function is evaluated with the weight parameter $q_i$ at time t; U represents the set of all agents except agent i; and $b(o_i, a_U)$ is a state-dependent baseline of the kind generally used in policy gradient methods to reduce variance without changing the expectation of the policy gradient;

training the agents until the loss function and the stochastic gradient meet the training condition, and applying the trained agents to the online control of the ocean platform ventilation system.

In some embodiments of the present invention, in step S5, after the loss function and the stochastic gradient are defined, the agent training further comprises:

S51: an initialization step: initializing the capacity of the experience pool D and the state environment of the N+1 agents in the ocean platform ventilation system, the state environment including the initial values of the outdoor temperature, the number of persons in each cabin, the weight q and the weight y; and initializing the weights of the target networks and the policy functions;

S52: defining Y episodes; for the jth episode (1 ≤ j ≤ Y), first resetting the environment of all agents to obtain the initial observed quantity $o_{i,1}$ of each agent i;

S53: defining P time steps; at the tth time step (1 ≤ t ≤ P), each agent i selects an action $a_{i,t}$ according to its policy function, transmits $a_{i,t}$ to the other agents in the platform ventilation system, and interacts with the other agents and the system environment based on an attention mechanism to obtain the observation $o_{i,t+1}$ and the reward $r_{i,t+1}$ at the next time step; the transition tuple $(o_t, a_t, o_{t+1}, r_{t+1})$ is stored in the experience pool D;

S54: training the actor networks and the critic networks with the data in the experience pool; computing an approximate action value function for each agent i over the mini-batch, where 1 ≤ l ≤ B and the lth $a_i$ and $o_i$ in the mini-batch are used; computing the approximate policy function and the approximate action value function for every agent i and the lth sample in the mini-batch, and then updating the weight parameters of the critic networks by minimizing the loss function; at the same time computing the target policy function and the actual action value function for every agent i, and updating the policies of all agents and the parameters of the target networks;

S55: repeating the above steps until t = P and j = Y, at which point training is complete.

In some embodiments of the invention, if the experience pool D holds more data than the mini-batch size, a transition dataset of size B is randomly selected from the experience pool to train the actor networks and the critic networks.
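A minimal sketch of such an experience pool follows, assuming a fixed-capacity buffer that discards its oldest entries when full. The class and method names are hypothetical, and the eviction rule is an assumption, since the text only specifies a capacity and random selection of B stored tuples $(o_t, a_t, o_{t+1}, r_{t+1})$.

```python
import random
from collections import deque


class ExperiencePool:
    """Fixed-capacity store of (o_t, a_t, o_{t+1}, r_{t+1}) tuples."""

    def __init__(self, capacity):
        # Assumption: the oldest tuples are dropped once capacity is reached.
        self._data = deque(maxlen=capacity)

    def store(self, obs, action, next_obs, reward):
        self._data.append((obs, action, next_obs, reward))

    def __len__(self):
        return len(self._data)

    def sample(self, batch_size, rng=random):
        """Return batch_size random tuples, or None while the pool is too small."""
        if len(self._data) <= batch_size:
            return None  # train only once the pool exceeds the mini-batch size
        return rng.sample(list(self._data), batch_size)


if __name__ == "__main__":
    pool = ExperiencePool(capacity=5)
    for step in range(7):
        pool.store(step, 0, step + 1, -1.0)
    print(len(pool), pool.sample(3))
```

Sampling uniformly from past experience breaks the correlation between consecutive time steps, which is the usual motivation for training from a pool rather than from the latest transition alone.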

In some embodiments of the present invention, $m_{i,t}$ takes one of C discrete values, each corresponding to an opening of the damper of the variable air volume box in a cabin;

the main damper angle $\sigma_t$ in the air handling unit of the ocean platform takes one of Z discrete values, each corresponding to an opening of the main damper;

In some embodiments of the invention, the observed quantity $o_{i,t}$ of cabin agent i at the current time t is defined from the following quantities: the ambient temperature outside the ocean platform at time t; the room temperature $T_{j,t}$ at time t of each cabin j in the set U of cabins of the ocean platform other than cabin i; the number of persons $K_{i,t}$ in cabin i at time t; the concentration $S_{i,t}$ of hydrogen sulfide gas in the ith cabin at time t; and the time interval index t′ within one day, whose range is obtained by dividing the total time of one day by the time interval;

defining the observed quantity of the air handling unit agent:

$o_{N+1,t} = (t', K_{1,t}, \ldots, K_{N,t}, S_{1,t}, \ldots, S_{N,t})$;

and obtaining the observed quantity group of the N+1 agents at time t from the observed quantities of the cabin agents and of the air handling unit agent.
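How the joint observation could be assembled from sensor readings is sketched below. The ordering of the entries in a cabin observation, and the inclusion of the cabin's own temperature, are assumptions because the cabin expression is not reproduced in this text; the air handling unit observation follows the expression for $o_{N+1,t}$ given above. Function names are hypothetical, and cabins are indexed from 0.

```python
def cabin_observation(i, t_out, temps, slot, people, h2s):
    """Observation of cabin i: outdoor temperature, own temperature,
    other cabins' temperatures, time slot, head count, H2S concentration.
    The ordering is an assumption made for this sketch."""
    others = tuple(temp for j, temp in enumerate(temps) if j != i)
    return (t_out, temps[i]) + others + (slot, people[i], h2s[i])


def ahu_observation(slot, people, h2s):
    """o_{N+1,t} = (t', K_1..K_N, S_1..S_N) for the air handling unit agent."""
    return (slot,) + tuple(people) + tuple(h2s)


def joint_observation(t_out, temps, slot, people, h2s):
    """s_t = o_t = (o_1, ..., o_N, o_{N+1})."""
    cabins = tuple(
        cabin_observation(i, t_out, temps, slot, people, h2s)
        for i in range(len(temps))
    )
    return cabins + (ahu_observation(slot, people, h2s),)


if __name__ == "__main__":
    # Hypothetical readings for a two-cabin platform.
    print(joint_observation(30.0, [22.0, 25.0], 40, [3, 0], [1.0, 12.0]))
```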

The system provided by the invention has the beneficial effects that:

1. The control method of the ocean platform ventilation system can control over-limit temperature and hydrogen sulfide concentration in the ocean platform cabins.

2. The method can ventilate a plurality of cabins simultaneously and meet the ventilation requirements of the different cabins without building a ventilation model for the cabins, which avoids errors caused by inaccurate models; when a trained agent selects an action it uses only the current observed quantity and needs no prior knowledge of the uncertain parameters in the system, so the air volume control method is more widely applicable and ventilates more efficiently.

3. The method needs no manual adjustment, achieves automatic and rapid ventilation, is highly general, and can reduce the control cost of the ocean platform ventilation system; given any initial value, the trained agents can quickly adjust the ventilation rate, bring the controlled parameters into a reasonable range, meet the individual fresh air requirements of each cabin, and eliminate the safety hazard caused by leakage of toxic hydrogen sulfide gas on the offshore platform. The method improves ventilation efficiency while diversifying the ventilation methods available to ocean platforms.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings required to be used in the embodiments or the prior art description will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings may be obtained according to these drawings without inventive labor.

FIG. 1 is a schematic diagram of interaction of various agents of an ocean platform.

FIG. 2 is a flow chart of ocean platform agent training.

Detailed Description

In order to make the technical problems, technical solutions and advantageous effects to be solved by the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The invention provides a control method for an ocean platform ventilation system that is based on the working environment of the ocean platform and takes both temperature and hydrogen sulfide concentration into account, giving a ventilation control method with a high safety factor that combines basic temperature control with hydrogen sulfide concentration control.

First, the structure of the ventilation system of the ocean platform in the background art is described.

The ocean platform air conditioning system comprises an air handling unit serving the whole platform and a series of variable air volume boxes arranged in the cabin areas. The air handling unit consists of a main damper, a cooling/heating coil and a variable-frequency fan. The variable air volume box of each cabin has a vent, and the ventilation rate through the vent determines the temperature and the hydrogen sulfide concentration in the cabin. That ventilation rate is in turn determined by the main damper (which controls the air supply volume), the opening of the vent damper (which controls the air output volume), and the like.

The overall idea of the control method provided by the invention is as follows. The variable air volume box of each cabin area of the offshore platform and the main damper in the air handling unit are regarded as agents, and the ventilation target is achieved through the cooperation of these agents. Suitable neural networks are designed, the action, state and reward of each agent are defined, and a multi-agent deep reinforcement learning model is established. By training the agents, the air conditioning system on the platform can control the ventilation rate of each cabin given the outdoor temperature of the platform and the number of people in each cabin. When the ocean platform operates normally, the ventilation system thus meets the different temperature requirements of each cabin by adjusting the air supply volume of the different cabins, which ensures reasonable ventilation and heat dissipation, keeps the workers in the cabins comfortable at work and at rest, and protects equipment service life while saving as much energy as possible. When a hydrogen sulfide leakage accident occurs, toxic gas such as hydrogen sulfide can permeate into the working and living areas of the platform, and ventilation is an effective measure against the accumulation and spread of hydrogen sulfide; while controlling the temperature, the ventilation system therefore detects when the indoor hydrogen sulfide concentration exceeds the safety range and adopts a suitable ventilation strategy to reduce the hydrogen sulfide concentration in each cabin as much as possible, improving indoor air quality and ensuring the safety of the workers on the platform.

The control targets of the ventilation system include an indoor temperature control target and a hydrogen sulfide gas concentration control target.

(1) Temperature control target for the cabins of the ocean platform.

The temperature $T_{i,t}$ in the ith cabin at time t, which is detected by a temperature sensor provided in the cabin, should stay between the lowest and the highest permissible value of the cabin temperature. The range between these two values is the limit range of the cabin temperature. The lowest and highest permissible values are set according to the specific working environment of each cabin, taking into account the needs of the operators, the storage requirements of the cabin, and the like.

For example, the temperature control range of a cabin on a typical ocean platform differs depending on whether the cabin is manned. When operators are in the cabin, their needs are the main consideration and their health is taken as the reference; for an unmanned cabin, the aim is generally to keep the electrical equipment operating normally, to ventilate and dissipate heat reasonably, and to maintain positive or negative pressure in the room. According to the standard for the design method of heating, ventilating and air conditioning systems of offshore platforms established by the general oil company in China, the indoor temperature of manned working areas and some living areas on an offshore platform, such as control rooms, communication rooms, living quarters and dining rooms, is generally controlled between 19 and 24 °C; the indoor temperature of other living areas, such as toilets and storerooms, is generally between 16 and 25 °C; the indoor temperature of unmanned working areas, such as battery rooms, needs to be controlled between 15 and 35 °C, the temperature of a transformer between 5 and 45 °C, and that of other rooms such as generator rooms and fire pump rooms between 5 and 35 °C.

In practice, the indoor temperature of each cabin is influenced by many factors: because of coupling effects, the indoor temperatures of the other cabins affect it, as do the outdoor temperature and the ventilation rate. The invention changes the ventilation rate of the room in order to ventilate it and dissipate heat effectively.

(2) Hydrogen sulfide gas concentration control target for the ocean platform.

When an offshore platform exploits a sour gas field, an uncontrolled blowout generally occurs in an open area of the platform, so the leaked natural gas and hydrogen sulfide gas readily accumulate on the platform and can form a high-concentration natural gas explosion zone and a high-concentration hydrogen sulfide poisoning zone, which affects the health and productivity of the platform workers and can cause serious casualties and property loss. The hydrogen sulfide gas also diffuses gradually and permeates into the safe areas and working areas of the platform. In cabin areas that some hydrogen sulfide has reached but where the indoor concentration is still low, the ventilation system can dilute the indoor hydrogen sulfide gas and reduce its concentration. The hydrogen sulfide concentration in each cabin is therefore chosen to represent air quality, and to avoid accidents it should be kept within a safe range, that is, the concentration $S_{i,t}$ of hydrogen sulfide gas in the ith cabin at time t should not exceed the maximum allowable concentration of hydrogen sulfide gas in the cabin area. A concentration above this maximum indicates that a hazardous condition has been reached in the cabin. The maximum can be set according to the relevant safety protection regulations.

The safety regulations for hydrogen sulfide protection in shallow sea petroleum operations (SY 6504-2010) state: "A hydrogen sulfide alarm device shall be installed on shallow sea petroleum facilities equipped with fixed hydrogen sulfide probes and shall be able to raise an alarm to the safety facilities; when the concentration of hydrogen sulfide in the air reaches 15 mg/m³ (10 ppm), the system shall give an audible and visual alarm." and "When the concentration of hydrogen sulfide gas reaches 150 mg/m³ (100 ppm) and cannot be controlled, endangering the safety of personnel and facilities, emergency evacuation shall be carried out according to the requirements of the emergency plan." Furthermore, the threshold mass concentration of hydrogen sulfide is 15 mg/m³, the safety critical mass concentration is 30 mg/m³, and the dangerous critical mass concentration is 150 mg/m³. These standards provide the basis for the safe range of hydrogen sulfide concentration on the ocean platform and give scientific and reasonable thresholds for penalizing an agent when the hydrogen sulfide concentration exceeds the limit.
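The three mass concentrations quoted above can be turned into a simple lookup, sketched here in Python. The level labels and the function name are hypothetical, and treating each threshold as inclusive ("reaches") is an assumption based on the wording of the quoted regulation.

```python
# Mass concentrations quoted above, in mg/m^3.
THRESHOLD = 15.0           # audible and visual alarm level (10 ppm)
SAFETY_CRITICAL = 30.0     # safety critical mass concentration
DANGEROUS_CRITICAL = 150.0 # dangerous critical mass concentration (100 ppm)


def classify_h2s(conc_mg_m3: float) -> str:
    """Map an H2S mass concentration to a hypothetical level label."""
    if conc_mg_m3 < 0:
        raise ValueError("concentration cannot be negative")
    if conc_mg_m3 >= DANGEROUS_CRITICAL:
        return "dangerous"
    if conc_mg_m3 >= SAFETY_CRITICAL:
        return "safety_critical"
    if conc_mg_m3 >= THRESHOLD:
        return "alarm"
    return "normal"


if __name__ == "__main__":
    for reading in (5.0, 15.0, 42.0, 150.0):
        print(reading, classify_h2s(reading))
```

A maximum allowable cabin concentration for the penalty function could be chosen from these levels, for example the alarm threshold for manned cabins; that choice is left to the designer.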

Specifically, the control method of the ocean platform ventilation system provided by the invention comprises the following steps.

S1: establishing a differential equation for the change of hydrogen sulfide gas concentration in the cabins of the ocean platform.

In the ideal case, the following assumptions are made: the hydrogen sulfide gas permeates uniformly into the room (the concentration of harmful substances in the indoor air is uniformly distributed), and the supply and exhaust air flows are isothermal.

On this basis, a differential equation for the change of hydrogen sulfide gas concentration in the cabin can be established from the principle of material balance. For a continuous and steady ideal ventilation process, it reads:

$m_{i,t}\, y_{i,t}\, dt + x_{i,t}\, dt - k_{i,t}\, S_{i,t}\, dt = J_i\, ds$;

wherein $m_{i,t}$ is the supply air volume of the ith cabin at time t, determined by the opening angle of the damper of the variable air volume box in the cabin; $k_{i,t}$ is the exhaust air volume of the ith cabin at time t; $y_{i,t}$ is the concentration of hydrogen sulfide gas in the supply air of the ith cabin at time t; $dt$ is an assumed very small time slot; $x_{i,t}$ is the amount of hydrogen sulfide gas permeating into the ith cabin at time t; $S_{i,t}$ is the concentration of hydrogen sulfide gas in the ith cabin at time t; $J_i$ is the volume of cabin i; and $ds$ is the increase in the hydrogen sulfide concentration in the cabin over the time $dt$;
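Dividing the material balance by $J_i\, dt$ gives $ds/dt = (m\,y + x - k\,S)/J$, which can be stepped forward numerically. The sketch below uses forward Euler integration with constant inputs; the function name, the constant-input simplification and the units in the comments are assumptions made for illustration.

```python
def simulate_h2s(s0, volume, supply, supply_conc, leak, exhaust, dt, steps):
    """Forward-Euler integration of J * ds = (m*y + x - k*S) * dt.

    s0          initial cabin concentration S (e.g. mg/m^3)
    volume      cabin volume J (e.g. m^3)
    supply      supply air flow m (e.g. m^3/s)
    supply_conc H2S concentration y in the supply air
    leak        H2S permeation rate x into the cabin (e.g. mg/s)
    exhaust     exhaust air flow k (e.g. m^3/s)
    """
    s = s0
    history = [s]
    for _ in range(steps):
        ds = (supply * supply_conc + leak - exhaust * s) * dt / volume
        s += ds
        history.append(s)
    return history


if __name__ == "__main__":
    # Hypothetical cabin: 100 m^3, clean supply air, constant leak.
    trace = simulate_h2s(50.0, 100.0, 10.0, 0.0, 20.0, 10.0, 0.1, 5000)
    print(trace[1], trace[-1])
```

With constant inputs the concentration settles at $(m\,y + x)/k$, so raising the exhaust flow lowers the steady-state concentration, which is the lever the cabin agents act on.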

wherein $m_{i,t}$ takes one of C discrete values, each corresponding to one damper opening;

similarly, the main damper angle in the air handling unit of the ocean platform is selected from Z discrete values;

S2: defining the variable air volume box in each cabin and the air handling unit each as an agent, namely the cabin agents and the air handling unit agent.

The agents are fitted with neural networks whose parameters are updated through learning; each agent comprises a policy generation network and a policy evaluation network.

The ventilation control method provided by the invention adopts the actor-critic algorithm of reinforcement learning, which combines the characteristics of value-based and policy-based reinforcement learning methods and comprises an actor network responsible for generating the policy and a critic network responsible for evaluating the policy in real time.

All actor networks have the same structure: an input layer, several hidden layers that each use the Leaky ReLU activation function, and an output layer that uses the softmax function as its activation function. Likewise, all critic networks have the same structure: an input layer, several hidden layers with Leaky ReLU as the activation function, and an output layer with softmax as the activation function.
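The forward pass of a policy-generating network with this layer structure can be sketched in pure Python. The layer sizes, weights and the Leaky ReLU slope of 0.01 are assumptions chosen for illustration; the output is read here as a probability distribution over discrete damper openings, which is an interpretation rather than something the text states.

```python
import math


def leaky_relu(x, slope=0.01):
    """Leaky ReLU: identity for positive inputs, small slope otherwise."""
    return x if x > 0 else slope * x


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def dense(weights, bias, x):
    """Affine layer: weights is a list of rows, one per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def policy_forward(obs, hidden_layers, output_layer):
    """Input -> Leaky ReLU hidden layers -> softmax output."""
    h = list(obs)
    for weights, bias in hidden_layers:
        h = [leaky_relu(v) for v in dense(weights, bias, h)]
    weights, bias = output_layer
    return softmax(dense(weights, bias, h))


if __name__ == "__main__":
    # Hypothetical 2-input network with one hidden layer and 3 discrete actions.
    hidden = [([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0])]
    output = ([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], [0.0, 0.0, 0.0])
    print(policy_forward([1.0, 2.0], hidden, output))
```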

Assuming the offshore platform contains N cabins, a total of N+1 agents are defined.

Define the state s of the N cabin agents. Based on the current environment, each cabin agent adjusts its damper, and hence its supply air volume, to keep the indoor temperature and the hydrogen sulfide concentration within limits.

Define the observation of cabin agent i at the current time t:

where the outdoor temperature term represents the ambient temperature outside the platform at time t; the set U is the set of cabins in the platform other than cabin i, and T_{j,t} is the indoor temperature at time t of cabin j in that set (both indoor and outdoor temperatures can be obtained from temperature sensors); K_{i,t} is the number of persons in cabin i at time t (obtainable from an electronic counting sensor at the door of each cabin); and S_{i,t} is the hydrogen sulfide concentration in the i-th cabin at time t (obtainable from a dedicated hydrogen sulfide gas sensor). Here t' is the index of the time interval within one day; its range depends on the chosen interval length and equals the total time in one day divided by that length (for example, with an interval of τ = 15 minutes, t' ranges over 24 × 60 / 15 = 96 intervals).
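The time interval index t' can be computed directly from the clock time and the interval length τ. The sketch below assumes 0-based indexing and an interval length that divides a day evenly; both are illustrative choices, not requirements stated in the text.

```python
MINUTES_PER_DAY = 24 * 60


def intervals_per_day(interval_minutes):
    """Number of time intervals in one day for a given interval length."""
    if MINUTES_PER_DAY % interval_minutes != 0:
        raise ValueError("interval length must divide one day evenly")
    return MINUTES_PER_DAY // interval_minutes


def interval_index(hour, minute, interval_minutes):
    """0-based index t' of the interval that contains the time hour:minute."""
    return (hour * 60 + minute) // interval_minutes
```

For τ = 15 minutes this yields 96 intervals per day, matching the example in the text, and 06:30 falls into interval 26.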

Define the observation of the air handling unit agent:

o_{N+1,t} = (t', K_{1,t}, ..., K_{N,t}, S_{1,t}, ..., S_{N,t});

The observations of the N+1 agents at time t form the joint observation:

s_t = o_t = (o_{1,t}, ..., o_{N+1,t}),

where o_{1,t} is the observation of the first agent (the first cabin agent) at time t, and o_{N+1,t} is the observation of the (N+1)-th agent (the air handling unit agent) at time t;

S3: Define the agent actions.

The action of a cabin agent corresponds to the damper angle of the variable air volume box in its cabin and controls the supply air volume of that cabin; the action of the air handling unit agent corresponds to the main damper angle and controls the total supply air volume.

For cabin agent i (1 ≤ i ≤ N), the action is the supply air volume m_{i,t} of cabin i at time t; for the (N+1)-th agent (the air handling unit), the action is the main damper angle σ_t in the unit at time t. The joint action of the N+1 agents can therefore be expressed as:

a_t = (m_{1,t}, m_{2,t}, ..., m_{N,t}, σ_t).

S4: Define the agent penalty functions.

The penalty includes a penalty for the cabin temperature leaving its permitted range.

Define the temperature overrun penalty function of the cabin agent:

where r_{i,1,t}(s_t) is the temperature overrun penalty function of the cabin agent, T_{i,t} is the temperature in the i-th cabin at time t,

represents the lowest permissible value of the temperature in the cabin,

represents the maximum permissible value of the temperature in the cabin; [·]^+ denotes that the value inside the brackets is kept when it is greater than 0 and is replaced by 0 otherwise. Therefore, when the indoor temperature exceeds the maximum temperature of the permitted range,

and when the indoor temperature stays within the permitted range, r_{i,1,t}(s_t) = 0.

The indoor temperature of each cabin is only weakly correlated with the angle of the main damper of the air handling unit, so the invention defines r_{N+1,1,t} = 0, where r_{N+1,1,t} is the temperature overrun penalty function of the air handling unit agent.
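The temperature overrun penalty built from the [·]^+ operator can be sketched as follows. The exact formula is not reproduced in this text, so the form below, a negative penalty equal to the distance outside the permitted band, is an assumption consistent with the description: zero inside the band and non-zero above the maximum or below the minimum. The function and argument names are hypothetical.

```python
def positive_part(v):
    """[v]^+ : keep v when it is greater than 0, otherwise return 0."""
    return v if v > 0 else 0.0


def temperature_penalty(temp, temp_min, temp_max):
    """Assumed form: -([T - T_max]^+ + [T_min - T]^+), in degrees Celsius."""
    overshoot = positive_part(temp - temp_max) + positive_part(temp_min - temp)
    return -overshoot if overshoot > 0 else 0.0


def air_handling_unit_temperature_penalty():
    """The text defines r_{N+1,1,t} = 0 for the air handling unit agent."""
    return 0.0
```

For a permitted band of 20 to 26 degrees, a cabin at 24 degrees incurs no penalty, while a cabin at 28 degrees incurs a penalty of -2 under this assumed sign convention.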

The penalty also includes a penalty for hydrogen sulfide concentration exceeding a safe range.

where r_{i,2,t}(s_t) is the hydrogen sulfide concentration overrun penalty function of the cabin agent; whether platform operators are present in the cabin at that time affects the permitted ranges of the temperature and of the hydrogen sulfide concentration. S_{i,t} is the hydrogen sulfide concentration in the i-th cabin at time t,

indicates the maximum allowable concentration of hydrogen sulfide gas in the cabin.

Define the reward function of the N+1 agents:

r_t = r_{i,1,t}(s_t) + b r_{i,2,t}(s_t);

where r_t is the reward of agent i at time t, and b is a positive coupling coefficient in °C/ppm.
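The combination r_t = r_{i,1,t}(s_t) + b r_{i,2,t}(s_t) can be sketched as below. The hydrogen sulfide penalty formula is not reproduced in this text, so the one-sided form used here (penalize only the excess over the maximum allowable concentration) is an assumption, as are the function names and the value of b in the usage note.

```python
def h2s_penalty(concentration, concentration_max):
    """Assumed form: -[S - S_max]^+ in ppm; zero while within the safe range."""
    excess = concentration - concentration_max
    return -excess if excess > 0 else 0.0


def cabin_reward(r_temperature, r_h2s, b):
    """r = r_1 + b * r_2, where b > 0 converts ppm into degrees Celsius."""
    if b <= 0:
        raise ValueError("the coupling coefficient b must be positive")
    return r_temperature + b * r_h2s
```

For example, a temperature penalty of -2 °C together with a concentration excess of 2.5 ppm and b = 0.4 °C/ppm gives a total reward of -3. A larger b makes the agents prioritize hydrogen sulfide removal over temperature comfort.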

To obtain the state and penalty information of the ventilation system, the agents must exchange information with one another. Referring to Fig. 1, once the agents obtain the state information, they determine the corresponding joint action a_t = (m_{1,t}, m_{2,t}, ..., m_{N,t}, σ_t) from the current state; each agent then observes the new state information at time t+1 and computes the reward r_{i,t} it receives for the selected action.

S5: Train the agents.

The agents are trained with actor-critic neural networks.

Define the action-value function Q_y(s_t, a_t): the expected return obtained by taking action a_t in state s_t, where y denotes the weight parameters learned in the critic network; these parameters can be learned by minimizing the loss function L_Q(y) in off-policy temporal-difference learning.

Define the policy function π_q(a | s): it can be obtained by policy gradient training, where q denotes the weight parameters of the actor network.

Define the action-value function of agent i:

where f_i is a two-layer multi-layer perceptron, q_i is a one-layer multi-layer perceptron embedding function, o_i is the observation of the i-th agent, and x_i is the information contributed by all the other agents (when the neural network computes the action-value function of agent i, the information of the other agents is taken into account according to their respective weights):

x_i = ∑_{j≠i} w_j (W_v e_j);

where W_v is a covariance matrix that converts the embedding e_j = q_j(o_j, a_j) into a "value", and h(x) is a non-linear activation function;

indicates how much attention agent i pays to the information provided by another agent j, where W_k and W_q are also covariance matrices that convert the embeddings into a "key" and a "query", respectively.
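The aggregation x_i = ∑_{j≠i} w_j (W_v e_j) can be sketched with a standard key/query attention. The formula for the weights w_j is not reproduced in this text, so the softmax over dot products between keys W_k e_j and the query W_q e_i is an assumption based on the usual attention construction; no scaling factor or non-linearity is applied here, and all names are hypothetical.

```python
import math


def matvec(matrix, vector):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_summary(i, embeddings, w_q, w_k, w_v):
    """Return (weights, x_i) for agent i given the embeddings e_j of all agents."""
    query = matvec(w_q, embeddings[i])
    others = [j for j in range(len(embeddings)) if j != i]
    # Assumed scoring rule: similarity between each key and the query.
    scores = [dot(matvec(w_k, embeddings[j]), query) for j in others]
    weights = softmax(scores)
    values = [matvec(w_v, embeddings[j]) for j in others]
    x_i = [sum(w * v[d] for w, v in zip(weights, values))
           for d in range(len(values[0]))]
    return dict(zip(others, weights)), x_i


# Toy example with identity matrices and three agents.
identity = [[1.0, 0.0], [0.0, 1.0]]
embeddings = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
weights, x_0 = attention_summary(0, embeddings, identity, identity, identity)
```

In the toy example, agent 1 has the same embedding as agent 0 and therefore receives a larger weight than agent 2, so its information dominates x_0.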

The matrices W_v, W_k, and W_q are shared across the actor-critic networks, and all critic networks are trained and updated together to minimize the joint regression loss function:

where: L_Q(y) is the loss function;

denotes the expectation over all the data in the experience pool;

indicates that at time t the weighting parameter is

is the temperature parameter, which determines the balance between maximizing entropy and maximizing reward;

is the target action-value function of agent i; d_i is the target reward value of agent i; r_i(o_i, a_i) is the reward the agent receives after taking action a_i under observation o_i;
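The quantities listed above (reward, discount rate γ, temperature parameter, target action-value function, and target policy) are the ingredients of an entropy-regularized temporal-difference target. Since the loss formula itself is not reproduced in this text, the sketch below assumes the common soft actor-critic form d_i = r_i + γ (Q_target(o', a') - α log π_target(a' | o'_i)) and a mean squared error over the sampled batch; the names and the averaging are assumptions.

```python
import math


def soft_td_target(reward, gamma, alpha, next_q, next_action_prob):
    """Assumed target: r + gamma * (Q_target(o', a') - alpha * log pi(a' | o'))."""
    return reward + gamma * (next_q - alpha * math.log(next_action_prob))


def critic_loss(q_values, targets):
    """Mean squared error between Q_i(o, a) and the targets d_i over a batch."""
    return sum((q - d) ** 2 for q, d in zip(q_values, targets)) / len(q_values)
```

With α = 0 the entropy term vanishes and the target reduces to the ordinary one-step temporal-difference target; a larger α rewards policies that keep their action distribution spread out.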

Thus, the stochastic gradient can be defined as:

wherein:

is the stochastic gradient; J(q) is the corresponding loss function; E_{o~D, a~π} denotes the expectation over all possible outcomes;

denotes the target policy function with weight parameter q_i at time t; U denotes the set of all agents except agent i.

The advantage function of each agent in the multi-agent system indicates whether the agent's current action leads to an increase in the expected return. Here b(o_i, a_U) is a state-dependent baseline, commonly used in policy gradient methods to reduce variance without changing the expectation of the policy gradient; in the present invention all agents use the same baseline, namely:

wherein:

denotes the policy function at time t, evaluated with the weight parameter

and

denotes the action-value function of agent i when the observation is o.
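A baseline built from the policy function and the action-value function, as described above, is commonly computed as the policy-weighted average of Q over the agent's own actions while the other agents' actions a_U are held fixed. The formula is not reproduced in this text, so this counterfactual form and the names below are assumptions.

```python
def counterfactual_baseline(action_probs, q_per_action):
    """Assumed form: b(o, a_U) = sum over a_i of pi(a_i | o_i) * Q_i(o, (a_i, a_U))."""
    return sum(p * q for p, q in zip(action_probs, q_per_action))


def advantage(action_index, action_probs, q_per_action):
    """A_i = Q_i(o, a) - b(o, a_U); positive when the action beats the average."""
    return q_per_action[action_index] - counterfactual_baseline(action_probs, q_per_action)
```

Because the baseline is the expectation of Q under the agent's own policy, the expected advantage is zero, which is why subtracting it reduces variance without biasing the gradient.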

Referring to fig. 2, a specific flow of the control method provided by the present invention is as follows.

S51: Initialization.

Initialize the capacity of the experience pool D and the state environment of the N+1 agents in the ocean platform ventilation system; the state environment here includes the initial values of the outdoor temperature, the number of persons in each cabin, the weights q, and the weights y.

The agent actions here are adjustments of the air handling unit damper and of the outlet dampers of the variable air volume boxes. These actions strongly influence the hydrogen sulfide concentration in each cabin of the ocean platform.

Initialize the weights of the target network

and the policy function

where

S52: Define Y episodes; for the j-th episode (1 ≤ j ≤ Y), first reset the environment of all agents to obtain the initial observation o_{i,1} of each agent i;

S53: Define P time steps; at the t-th time step (1 ≤ t ≤ P), select an appropriate action for each agent i according to the policy function

During training, an attention mechanism is used when the agents communicate with one another. It helps the model assign different weights to the information received from the other agents and extract the most important information, so that the model makes more accurate decisions without adding significant computation or storage overhead. In addition, training covers changes in ventilation volume under different initial values, so that the trained agents can adjust the dampers promptly when the target air volume changes with the initial values, avoiding unnecessary air volume adjustments.

S54: Training.

Train the actor networks and the critic networks with the data in the experience pool.

If the experience pool holds more transitions than the mini-batch size, a mini-batch of B transitions is selected at random from the experience pool

and used to train the actor networks and the critic networks; at the same time, the approximate action-value function of each agent i is computed

where 1 ≤ l ≤ B,

and

denote the l-th a_i and o_i in the mini-batch; for every agent i and the l-th sample in the mini-batch, compute the approximate policy function

and the approximate action-value function

then update the weight parameters of the critic networks by minimizing the loss function; at the same time, compute for every agent i the target policy function

and the actual action-value function

and update the policies of all agents and the parameters of the target networks.

S55: Repeat the above steps until t = P and j = Y; training is then complete.
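The experience pool D and the mini-batch sampling used in steps S51 to S54 can be sketched with a bounded buffer. The class and method names are hypothetical; the sketch stores transitions (o_t, a_t, o_{t+1}, r_{t+1}) and returns a mini-batch only once the pool holds more transitions than the mini-batch size B, as the text requires.

```python
import random
from collections import deque


class ExperiencePool:
    """Bounded pool of transitions; the oldest entries are dropped when full."""

    def __init__(self, capacity, seed=None):
        self._buffer = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def __len__(self):
        return len(self._buffer)

    def store(self, obs, actions, next_obs, rewards):
        """Store one joint transition (o_t, a_t, o_{t+1}, r_{t+1})."""
        self._buffer.append((obs, actions, next_obs, rewards))

    def sample(self, batch_size):
        """Return B random transitions, or None while the pool is too small."""
        if len(self._buffer) <= batch_size:
            return None
        return self._rng.sample(list(self._buffer), batch_size)
```

In a training loop, `store` would be called once per time step and `sample` before each network update, with the update skipped whenever `sample` returns `None`.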

S6: Test the trained agents as follows.

S61: Initialize the observations of the N+1 agents: o_1 = (o_{1,1}, ..., o_{N+1,1});

S62: Define the test duration as H time steps;

S63: At the t-th time step (1 ≤ t ≤ H), each agent i selects its action a_{i,t} according to the learned policy function π_q(·|o_{i,t}); all agents execute their selected actions simultaneously in the platform air conditioning system; after the actions have been executed, the system environment returns the observations o_{i,t+1} of all agents for the next time step.

S64: Repeat step S63 until all H time steps are completed.
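The test procedure in S61 to S64 amounts to a rollout loop over H time steps. The sketch below assumes that each policy is a callable returning action probabilities and that actions are sampled from those probabilities; the text does not say whether actions are sampled or chosen greedily, and `env_step` is a hypothetical stand-in for the ventilation system environment.

```python
import random


def select_action(probs, rng):
    """Sample an action index from the policy's probability vector."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def run_test(policies, env_step, initial_obs, horizon, seed=0):
    """Roll the trained policies forward for `horizon` time steps."""
    rng = random.Random(seed)
    obs = list(initial_obs)
    trajectory = []
    for _ in range(horizon):
        # Every agent picks its action from its own observation.
        actions = [select_action(policy(o), rng) for policy, o in zip(policies, obs)]
        # All actions are applied together; the environment returns new observations.
        obs = env_step(obs, actions)
        trajectory.append((actions, obs))
    return trajectory
```

The recorded trajectory can then be checked against the temperature and hydrogen sulfide limits to decide whether the agents qualify for online control.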

S7: The agents that pass testing are used to control the ventilation system of the ocean platform online.

The ventilation control method provided by the invention controls the ventilation system of a multi-cabin ocean platform. It overcomes the inability of traditional methods to tune online in a large solution space: when the target air volume changes, it quickly brings the temperature and the hydrogen sulfide concentration back into a reasonable range without spending a large amount of time evaluating all possible solutions, thereby achieving rapid ventilation.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. A method for controlling a ventilation system of an offshore platform based on temperature and hydrogen sulfide concentration control, the offshore platform comprising a plurality of compartments, the method comprising the steps of:

m_{i,t} y_{i,t} dt + x_{i,t} dt - k_{i,t} S_{i,t} dt = J_i ds;

wherein S_{i,t} is the initial hydrogen sulfide concentration in the air of cabin i, whose volume is J_i; dt is a very small time interval; m_{i,t} is the supply air volume of the ventilation system during dt; y_{i,t} is the hydrogen sulfide concentration in the supply air; x_{i,t} is the amount of hydrogen sulfide gas permeating into cabin i during dt; k_{i,t} is the exhaust air volume discharged from the cabin during dt; and ds is the increment of the hydrogen sulfide concentration in the cabin during dt;

S3: defining the observation set of the N+1 agents at time t:

s_t = o_t = (o_{1,t}, ..., o_{N+1,t});

wherein: o_{1,t} is the observation of the 1st agent at time t, and o_{N+1,t} is the observation of the (N+1)-th agent at time t;

defining the actions of the agents, namely of the cabin agents and the air handling unit agent:

a_t = (m_{1,t}, m_{2,t}, ..., m_{N,t}, σ_t);

wherein: m_{1,t} is the action of the 1st agent at time t, m_{2,t} is the action of the 2nd agent at time t, and σ_t is the action of the (N+1)-th agent at time t;

S4: defining a temperature overrun penalty function of the cabin agent:

wherein: r_{i,1,t}(s_t) is the temperature overrun penalty function of the cabin agent, T_{i,t} is the temperature of the i-th cabin at time t,

represents the lowest permissible value of the temperature in the cabin,

represents the maximum permissible value of the temperature in the cabin; [·]^+ denotes that the value inside the brackets is kept when it is greater than 0 and is replaced by 0 otherwise; when the indoor temperature exceeds the maximum temperature of the permitted range,

and when the indoor temperature stays within the permitted range, r_{i,1,t}(s_t) = 0;

r_{N+1,1,t} = 0;

defining a hydrogen sulfide concentration overrun penalty function for the cabin agent:

wherein: r_{i,2,t}(s_t) is the hydrogen sulfide concentration overrun penalty function of the cabin agent;

defining the hydrogen sulfide concentration overrun penalty function r_{N+1,2,t} of the air handling unit agent:

defining the reward function of the N+1 agents:

r_t = r_{i,1,t}(s_t) + b r_{i,2,t}(s_t);

S5: training the agents;

defining an action-value function Q_y(s_t, a_t): the expected return obtained by taking action a_t in state s_t, wherein y denotes the weight parameters learned in the critic network;

defining a policy function π_q(a | s), wherein q denotes the weight parameters of the actor network;

defining an action-value function of agent i:

wherein f_i is a two-layer multi-layer perceptron, q_i is a one-layer multi-layer perceptron embedding function, o_i is the observation of the i-th agent, and x_i is all the information that the i-th agent obtains by communicating with the other agents;

wherein: x_i = ∑_{j≠i} w_j (W_v e_j);

wherein W_v is a covariance matrix and e_j is the embedding function: e_j = q_j(o_j, a_j);

W_k and W_q are both covariance matrices;

the matrices W_v, W_k, and W_q are shared across the actor-critic networks, and the N+1 critic networks are trained and updated together to minimize the joint regression loss function:

wherein: L_Q(y) is the loss function;

is the action-value function of agent i when the weight parameter is y; d_i is the target reward value of agent i;

denotes, at time t and with the weight parameter

the target policy function;

r_i(o_i, a_i) is the reward the agent receives after taking action a_i under observation o_i; γ is the discount rate of the reward;

is the approximate action-value function of agent i;

thus, the stochastic gradient can be defined as:

wherein

wherein:

is the stochastic gradient corresponding to the i-th agent; J(q) is the corresponding loss function; E_{o~D, a~π} denotes the expectation over all possible outcomes;

denotes the target policy function with weight parameter q_i at time t; U is the set of all agents except agent i; b(o_i, a_U) is a state-dependent baseline, commonly used in policy gradient methods to reduce variance without changing the expectation of the policy gradient:

wherein:

denotes the policy function at time t, evaluated with the weight parameter

and

denotes the action-value function of agent i when the observation is o;

2. The method of claim 1, wherein the step of performing intelligent agent training after defining the loss function and the stochastic gradient function in step S5 further comprises:

S51: an initialization step: initializing the capacity of the experience pool D and the state environment of the N+1 agents in the ocean platform ventilation system, the state environment including the initial values of the outdoor temperature, the number of persons in each cabin, the weights q, and the weights y; initializing the approximate action-value function of the target network

and the policy function

wherein

S52: defining Y episodes; for the j-th episode (1 ≤ j ≤ Y), first resetting the environment of all agents to obtain the initial observation o_{i,1} of each agent i;

at the same time, transmitting the action a_{i,t} to the other agents in the platform ventilation system and, based on an attention mechanism, interacting with the other agents in the system environment to obtain the observation o_{i,t+1} and the reward r_{i,t+1} of the next time step; and storing the transition (o_t, a_t, o_{t+1}, r_{t+1}) in the experience pool D;

S54: training the actor networks and the critic networks with the data in the experience pool; computing the approximate action-value function of each agent i

Wherein l is more than or equal to 1 and less than or equal to B,

and

and the approximate action-value function

and the approximate action-value function

and updating the policies of all agents and the parameters of the target networks;

S55: repeating the above steps until t = P and j = Y, whereupon the training is finished.

3. The method of claim 2 for controlling an ocean platform ventilation system based on temperature and hydrogen sulfide concentration control, wherein:

if the experience pool D holds more transitions than the mini-batch size, a data set of B transitions is randomly selected from the experience pool

to train the actor networks and the critic networks.

4. The method of claim 1 for ocean platform ventilation system control based on temperature and hydrogen sulfide concentration control, wherein:

m_{i,t} takes C discrete values, each corresponding to an opening of the damper of the variable air volume box in a cabin:

the main damper angle σ_t in the air handling unit of the ocean platform takes Z discrete values, each corresponding to an opening of the main damper:

5. The method of claim 1 for ocean platform ventilation system control based on temperature and hydrogen sulfide concentration control, wherein:

defining the observation of cabin agent i at the current time t:

wherein the outdoor temperature term represents the ambient temperature outside the ocean platform at time t; the set U is the set of cabins in the ocean platform other than cabin i; T_{j,t} is the indoor temperature at time t of cabin j in the set; K_{i,t} is the number of persons in cabin i at time t; S_{i,t} is the hydrogen sulfide concentration in the i-th cabin at time t; and t' is the index of the time interval within one day, whose range is obtained by dividing the total time in one day by the interval length;

defining the observation of the air handling unit agent:

o_{N+1,t} = (t', K_{1,t}, ..., K_{N,t}, S_{1,t}, ..., S_{N,t});