CN117450637A

CN117450637A - Layered optimization control method for ocean platform ventilation system

Info

Publication number: CN117450637A
Application number: CN202311790669.0A
Authority: CN
Inventors: 崔璨; 薛佳慧; 付艺聪; 吴森源
Original assignee: Ocean University of China
Current assignee: Ocean University of China
Priority date: 2023-12-25
Filing date: 2023-12-25
Publication date: 2024-01-26
Anticipated expiration: 2043-12-25
Also published as: CN117450637B

Abstract

The invention belongs to the technical field of ocean engineering and provides a layered optimization control method for an ocean platform ventilation system. The method comprises: defining each cabin air valve as an agent; determining the control targets of the ocean platform ventilation system and designing the reinforcement learning elements of the agents; training the cabin air valve agents with the SAC algorithm, according to the set control targets and reinforcement learning elements, to obtain the required air volume of each cabin; performing proportional balance control on the air volume of the ventilation system based on the cabin ventilation control target, so that the proportion error between the actual and required air volumes of all cabins is minimized, and solving for the optimal cabin air valve angles at which the air volumes match the required air volume proportions; and performing proportional recovery on the result of the proportional balance control to obtain the optimized fan power.

Description

Layered optimization control method for ocean platform ventilation system

Technical Field

The invention relates to the technical field of ocean engineering, in particular to a layered optimization control method for an ocean platform ventilation system.

Background

Ocean platforms are important structures that provide production and living facilities for offshore drilling, oil recovery, oil storage, and other activities. Insufficient ventilation leads to poor air circulation in the cabins, raises the concentration of pollutants, reduces personnel comfort, and increases the risk of respiratory illness (such as sick building syndrome). Conversely, excessive ventilation delivers unnecessary air volume to a cabin and wastes energy. The ventilation system is therefore an essential component of the ocean platform, ensuring normal production and normal living conditions for personnel.

Ocean platforms usually have multiple cabins with different functions, so their control requirements can differ. Establishing a control strategy that optimally balances maintaining the indoor air quality (IAQ) of each cabin against reducing ventilation energy consumption is one of the major challenges in ocean platform ventilation system control.

Currently, Demand-Controlled Ventilation (DCV) strategies are widely used for ocean platform IAQ control. The basic idea of DCV is to use pre-designed ventilation schemes to introduce outside fresh air that replaces or dilutes contaminants (e.g. carbon dioxide) in the cabin, thereby maintaining the IAQ of each cabin and protecting the health and safety of platform personnel. Practice shows that a reasonable and effective ventilation scheme can achieve the IAQ control targets of all cabins while saving 25%-40% of energy. However, DCV has drawbacks: an improper ventilation scheme may ventilate a cabin too little or too much, resulting in poor IAQ or wasted energy. In addition, most DCV research assumes by default that the ocean platform ventilation system can track the required air volume exactly, and ignores the error between the actual and the required air volume. From a control perspective, the inherent nonlinearity and strong coupling of the platform ventilation system have also not been fully studied; for example, because of temporal coupling, the air quality in a cabin responds with a time delay after the ventilation volume is adjusted. How to achieve rapid, accurate and stable air volume supply in ocean platform IAQ control is therefore a problem that most ocean platform ventilation control methods ignore, and it is one of the problems addressed by the invention.

Disclosure of Invention

The invention aims to solve the technical problems and provides a layered optimization control method for an ocean platform ventilation system.

In order to achieve the above purpose, the invention adopts the following technical scheme:

A layered optimization control method for an ocean platform ventilation system, wherein the ventilation system comprises a fan, a main ventilation pipeline and a plurality of cabins; the fan is connected to the main ventilation pipeline, a main air valve is arranged on the main ventilation pipeline, and the main ventilation pipeline is connected to each cabin; each cabin is provided with a variable air volume box, and each variable air volume box is provided with a cabin air valve;

the control method comprises the following steps:

S1: defining the cabin air valve of each cabin as an agent; determining the control targets of the ocean platform ventilation system, and designing the reinforcement learning elements of the agents; the control targets comprise a cabin carbon dioxide concentration control target; a fan static pressure, cabin air valve opening and cabin ventilation volume control target; an actual air volume proportion error control target; a cabin air valve control limit target; and an air valve and fan coordination control target; the reinforcement learning elements comprise a cabin air valve agent state element, a cabin air valve agent action element and a cabin air valve agent reward element;

S2: training the cabin air valve agents with a policy-value network based on the SAC algorithm, according to the set control targets and reinforcement learning elements of the ocean platform ventilation system, to obtain the required air volume of each cabin;

S3: based on the cabin ventilation control target, performing proportional balance control on the air volume of the ocean platform ventilation system, so that the proportion error between the actual and required air volumes of all cabins is minimized, and solving for the optimal cabin air valve angles at which the air volumes match the required air volume proportions;

S4: performing proportional recovery on the result of the proportional balance control to obtain the optimized fan power.

In some embodiments of the present invention, the control objective of the platform ventilation system includes one or a combination of the following objectives:

target 1: cabin carbon dioxide concentration control target:

$$C_k^{\min} \le C_k(t) \le C_k^{\max}, \qquad k = 1, \dots, N$$

where $C_k(t)$ denotes the real-time CO₂ concentration of cabin $k$ at time $t$; $C_k^{\max}$ denotes the upper CO₂ concentration limit of cabin $k$; $C_k^{\min}$ denotes the energy-saving lower CO₂ concentration limit of cabin $k$; and $N$ denotes the number of cabins;

target 2: fan static pressure, cabin air valve opening, cabin ventilation control target:

$$P^{\min} \le P(t) \le P^{\max}, \qquad \theta_k^{\min} \le \theta_k(t) \le \theta_k^{\max}, \qquad Q_k^{r,\min} \le Q_k^{r}(t) \le Q_k^{r,\max}$$

where $P(t)$ denotes the fan static pressure at time $t$, and $P^{\min}$ and $P^{\max}$ bound its specified operating range, being the minimum and maximum fan static pressure respectively; $\theta_k(t)$ denotes the cabin air valve angle of cabin $k$ at time $t$, and $\theta_k^{\min}$ and $\theta_k^{\max}$ bound its action range, being the minimum and maximum air valve angle of the cabin air valve of cabin $k$ respectively; $Q_k^{r}(t)$ denotes the required air volume of cabin $k$ at time $t$, and $Q_k^{r,\min}$ and $Q_k^{r,\max}$ bound it, being the minimum and maximum required air volume of cabin $k$ respectively;

target 3: actual air volume proportion error control target:

where $Q_k^{a}(t)$ denotes the actual air volume of cabin $k$ at time $t$; $\varepsilon$ denotes the relative error of the air flow rate; and $N$ denotes the number of cabins;

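target 4: cabin air valve control limit target, which should ensure that at least one air valve is fully open:

when $\theta_k(t) = \theta_k^{\max}$, that is, when the air valve angle of cabin $k$ at time $t$ equals its maximum angle, the cabin air valve of cabin $k$ is fully open;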

target 5: and the air valve and the fan coordinate control target:

where $P^{\mathrm{es}}$ denotes the maximum fan static pressure under the energy-saving control target, and $\theta^{\mathrm{es}}$ denotes the maximum cabin air valve angle under the energy-saving control target of the ocean platform ventilation system.

In some embodiments of the invention, the cabin air valve agent state $s_t$ is defined by:

$$s_t = \big[\, C_k(t),\ C^{\mathrm{out}}(t),\ Q_k^{r}(t),\ n_k(t),\ M(t) \,\big]$$

where $C_k(t)$ denotes the CO₂ concentration of cabin $k$ at time $t$; $C^{\mathrm{out}}(t)$ denotes the CO₂ concentration outside the cabin at time $t$; $Q_k^{r}(t)$ denotes the required air volume of cabin $k$ at time $t$; $n_k(t)$ denotes the number of people in cabin $k$ at time $t$; and $M(t)$ denotes the human metabolic rate at time $t$;

the cabin air valve agent action $a_t$ is defined by:

$$a_t = \Delta Q_k(t)$$

where $\Delta Q_k(t)$ denotes the change in air volume of cabin $k$ at time $t$;

the cabin air valve agent training reward element includes four sub-reward functions:

where $r_1$ is the threshold shaping reward function, in which $C_k^{\mathrm{s}}$ denotes the lower reward-shaping limit of the CO₂ concentration that guides the learning of agent $k$, $c_1$ denotes the reward value when the CO₂ concentration limits of all cabins and the ventilation energy-saving requirement are satisfied simultaneously, and $c_2$ denotes the threshold shaping reward value for the cabin CO₂ concentration; $r_2$ is the convergence acceleration reward function, in which $c_3$ denotes the convergence acceleration reward value; $r_3$ is the boundary limit reward function, in which $E(t)$ denotes the set of relative CO₂ concentration errors of all cabins at time $t$, $e_k(t)$ denotes the relative CO₂ concentration error of cabin $k$ at time $t$, and $N$ denotes the number of cabins; and $r_4$ is the air volume standardizing reward function;

the reinforcement learning reward function is designed as:

$$r = w_1 r_1 + w_2 r_2 + w_3 r_3 + w_4 r_4$$

where $w_1$, $w_2$, $w_3$ and $w_4$ are the positive weight coefficients of the four sub-reward functions.
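A minimal sketch of how the sub-rewards might be combined, assuming the total reward is a weighted sum with positive weights. The function names, the `shaping_lower` value and the exact shape of the threshold-shaping term are illustrative assumptions rather than the patent's formulas, which are not reproduced in this text; the 700-800 ppm band and the default values of `c1` and `c2` follow the preferred values given in the detailed description.

```python
from typing import Sequence


def threshold_shaping_reward(co2_ppm: Sequence[float],
                             lower: float = 700.0,
                             upper: float = 800.0,
                             shaping_lower: float = 600.0,  # hypothetical value
                             c1: float = 20.0,
                             c2: float = 1.0) -> float:
    """Assumed shape: c1 if every cabin is inside [lower, upper],
    c2 if every cabin is inside the wider band [shaping_lower, upper],
    otherwise 0."""
    if all(lower <= c <= upper for c in co2_ppm):
        return c1
    if all(shaping_lower <= c <= upper for c in co2_ppm):
        return c2
    return 0.0


def total_reward(sub_rewards: Sequence[float],
                 weights: Sequence[float]) -> float:
    """Weighted sum of sub-rewards; every weight must be positive."""
    if len(sub_rewards) != len(weights):
        raise ValueError("one weight per sub-reward is required")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    return sum(w * r for w, r in zip(weights, sub_rewards))


# Example: the shaping term plus three placeholder sub-reward values.
r1 = threshold_shaping_reward([720.0, 790.0, 760.0])
reward = total_reward([r1, 0.5, -1.0, -2.0], [1.0, 1.0, 1.0, 1.0])
```

Keeping the weights outside the sub-reward functions makes it easy to rebalance air quality against energy saving without touching the individual terms.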

In some embodiments of the present invention, the action $\Delta Q_k(t)$ is valid only when the following two formulas are satisfied:

where $\Delta Q_k^{\min}$ denotes the lower limit of the air volume change of cabin $k$, and $\Delta Q_k^{\max}$ denotes the upper limit of the air volume change of cabin $k$.

In some embodiments of the present invention, the step of training the cabin air valve agent using the policy-value network comprises:

S21: setting the agent training objective function based on maximum entropy:

$$J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi} \Big[ r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big]$$

where $\alpha$ denotes the temperature coefficient; $(s_t, a_t) \sim \rho_\pi$ denotes the state-action pairs admitted by the reinforcement learning policy $\pi$; $\mathbb{E}$ denotes the expectation; $r(s_t, a_t)$ denotes the reward obtained when the cabin air valve agent selects action $a_t$ in state $s_t$; $\pi$ denotes the reinforcement learning policy; and $\mathcal{H}$ is the policy entropy, which measures the uncertainty of the action probability distribution and is defined as:

$$\mathcal{H}(X) = -\sum_{i} x_i \log x_i$$

where $x_i$ denotes a value of distribution $X$, and here $X = \pi(\cdot \mid s_t)$;

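S22: computing the optimal policy $\pi^{*}$ of the SAC algorithm:

$$\pi^{*} = \arg\max_{\pi} \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi} \Big[ r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big]$$

S23: defining the soft $Q$-function $Q(s_t, a_t)$ and performing soft policy evaluation, that is, fixing the policy $\pi$ and updating the $Q$ value until it converges:

$$Q(s_t, a_t) = r(s_t, a_t) + \gamma\, \mathbb{E}_{s_{t+1},\, a_{t+1} \sim \pi} \big[ Q(s_{t+1}, a_{t+1}) - \alpha \log \pi(a_{t+1} \mid s_{t+1}) \big]$$

where $Q(s_t, a_t)$ is the soft $Q$-function and $\gamma$ denotes the discount factor;

S24: improving the reinforcement learning policy $\pi$ with the converged $Q$ value; the improved policy is substituted into the soft $Q$-function, the policy is updated, and the exponential of the new $Q$-function is obtained:

$$\pi_{\mathrm{new}} = \arg\min_{\pi' \in \Pi} D_{\mathrm{KL}} \left( \pi'(\cdot \mid s_t) \,\Big\|\, \frac{\exp\big(\tfrac{1}{\alpha} Q^{\pi_{\mathrm{old}}}(s_t, \cdot)\big)}{Z^{\pi_{\mathrm{old}}}(s_t)} \right)$$

where $\pi_{\mathrm{new}}$ denotes the updated policy; $\pi_{\mathrm{old}}$ denotes the policy before the update; $\Pi$ denotes the set of feasible policies; $Q^{\pi_{\mathrm{old}}}$ denotes the $Q$-function of the policy before the update; and $Z^{\pi_{\mathrm{old}}}(s_t)$ denotes the partition function that normalizes the policy distribution.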

In some embodiments of the present invention, in step S24, the range of the policy update is constrained by the KL divergence:

$$D_{\mathrm{KL}}(X \,\|\, Y) = \sum_{i} x_i \log \frac{x_i}{y_i}$$

where $D_{\mathrm{KL}}(X \,\|\, Y)$ denotes the KL divergence measuring the difference between distributions $X$ and $Y$; $x_i$ denotes a value of distribution $X$; and $y_i$ denotes a value of distribution $Y$. When $D_{\mathrm{KL}}$ is used in S24 to compute the exponential of the new $Q$-function, $X = \pi'(\cdot \mid s_t)$ and $Y = \exp\big(\tfrac{1}{\alpha} Q^{\pi_{\mathrm{old}}}(s_t, \cdot)\big) / Z^{\pi_{\mathrm{old}}}(s_t)$.

In some embodiments of the present invention, in step S2, the step of obtaining the required air volume of each cabin at the current time with the trained reinforcement learning agent includes:

using the trained reinforcement learning agent, the set $Q^{r}(t)$ of required air volumes of all cabins of the ocean platform at time $t$ is obtained from the agent state at time $t$:

$$Q^{r}(t) = \big\{ Q_1^{r}(t), \dots, Q_N^{r}(t) \big\}$$

where $Q_k^{r}(t)$ denotes the required air volume of cabin $k$ at time $t$.

In some embodiments of the present invention, step S3 includes:

S31: converting the accurate air volume control problem into the following proportional equation:

where $Q^{a}(t)$ is the set of actual air volumes at time $t$; $Q^{r}(t)$ is the set of required air volumes given by the upper layer; and $\mathbf{1}$ denotes the all-ones row vector used for normalization;
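To illustrate the idea of proportional balance, the sketch below normalizes the actual and required air volumes by their totals and compares the resulting shares. This is only an assumed reading of the proportional equation (whose exact form is not reproduced in this text), and the function names are hypothetical.

```python
from typing import List, Sequence


def proportions(flows: Sequence[float]) -> List[float]:
    """Share of the total air volume delivered to each cabin."""
    total = sum(flows)
    if total <= 0:
        raise ValueError("total air volume must be positive")
    return [f / total for f in flows]


def max_proportion_error(actual: Sequence[float],
                         required: Sequence[float]) -> float:
    """Largest absolute gap between actual and required shares."""
    pa, pr = proportions(actual), proportions(required)
    return max(abs(a - r) for a, r in zip(pa, pr))


# Actual flows are half of the required flows in every cabin: the
# proportions already match, so only a common scale factor is missing.
balanced = max_proportion_error([50.0, 100.0, 150.0], [100.0, 200.0, 300.0])
# One cabin is over-supplied relative to the others.
unbalanced = max_proportion_error([150.0, 100.0, 150.0], [100.0, 200.0, 300.0])
```

A zero proportion error means the remaining mismatch is a single common factor, which is what the later proportional recovery step corrects through the fan.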

S32: converting the accurate air volume control problem into a constrained optimization problem, with the first objective function designed as follows:

where $\theta_k(t)$ denotes the opening of the cabin air valve of cabin $k$ at time $t$;

S33: according to the energy-saving control target, the second objective function is designed as follows:

where $\theta_k^{\max}$ denotes the maximum opening of the cabin air valve of cabin $k$;

S34: the third objective function is designed as follows:

where $P(t)$ denotes the fan static pressure at time $t$;

S35: combining the three objective functions to obtain the total objective function:

where the three coefficients are balance weight coefficients;

S36: based on the total objective function, the GA-fmincon HSO method is used to solve for the optimal air valve angles, yielding the air volume $Q_k^{*}(t)$ of cabin $k$ at time $t$ at the optimal air valve angle.

In some embodiments of the present invention, step S4 includes:

calculating the actual air volume based on $Q_k^{*}(t)$:

where $Q_k^{*}(t)$ is the air volume of cabin $k$ at time $t$ at the optimal air valve angle obtained by the GA-fmincon HSO method; the air volume recovery ratio is a positive number smaller than 1; and $Q_k^{a}(t)$ is the actual air volume of cabin $k$ at time $t$;

the fan power is controlled with a proportional control scheme: the fan power is adjusted according to the terminal air volume of one cabin, chosen as the median of the per-cabin proportions, and the matching fan power adjustment ratio is:

where the left-hand quantity is the actual fan power after proportional recovery; the solved fan power is the one obtained by the GA-fmincon HSO optimization; and the remaining quantity is the median of the set of recovery proportions of all cabins.

The layered optimization control method for the ocean platform ventilation system has the beneficial effects that:

The invention provides a layered optimization control method for an ocean platform ventilation system that integrates the Soft Actor-Critic (SAC) algorithm with a hybrid search optimization (HSO) method, so as to maintain the indoor air quality (IAQ) of the cabins and reduce the energy consumption of the ventilation system.

The layered optimization control method provided by the invention is divided into an upper control layer and a lower optimization layer, and adopts an active control scheme that adjusts the air volume according to the changing environmental conditions of the different cabins, so as to achieve the IAQ control target of each cabin. In the upper control layer, the invention establishes a virtual multi-cabin ventilation environment based on reinforcement learning (RL) and trains SAC-based agents that keep the carbon dioxide concentration within the energy-saving threshold interval (700 ppm-800 ppm) while minimizing the required air volume, thereby minimizing energy consumption. In the lower optimization layer, the invention designs a "proportional balance and proportional recovery" strategy for tracking the required air volume: the air valve positions are optimized directly with an HSO method based on a genetic algorithm and fmincon (GA-fmincon HSO), and the fan static pressure is optimized implicitly in combination with fan power adjustment.

The static pressure of the fan is implicitly optimized through the position of the air valve, so that the efficiency of the layered control method of the ocean platform ventilation system is further improved.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a general block diagram of a hierarchical optimization control method of an ocean platform ventilation system according to the present invention.

FIG. 2 is a block diagram of the SAC algorithm parameter update of the present invention.

FIG. 3 is a flow chart of the GA-fmincon HSO method of the present invention.

Detailed Description

In order to make the technical problems, technical schemes and beneficial effects to be solved more clear, the invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

It is apparent that the drawings in the following description are only some examples or embodiments of the present application, and those of ordinary skill in the art can apply the present application to other similar situations according to these drawings without inventive effort. Moreover, although such a development effort might be complex and lengthy, it would nevertheless be a routine undertaking of design, fabrication, or manufacture for those of ordinary skill having the benefit of this disclosure, and this should not be construed as meaning that the disclosure is insufficient.

The invention provides a layered optimization control method for an ocean platform ventilation system, which can be used to control the ventilation of each cabin of an ocean-going ship or an ocean platform. The invention aims to overcome the shortcomings of existing ocean platform ventilation control methods. Addressing the requirements of cabin air quality and multi-cabin ventilation energy saving, it provides a layered optimization control method that integrates the Soft Actor-Critic (SAC) algorithm with a hybrid search optimization (HSO) method; the overall block diagram of the method is shown in FIG. 1.

The ocean platform comprises a main ventilation pipeline and a plurality of cabins. A main air valve is arranged on the main ventilation pipeline, and the main ventilation pipeline is connected to each cabin; each cabin is provided with a variable air volume box, and each variable air volume box is provided with a cabin air valve. The main air valve controls the total ventilation volume of the whole main ventilation pipeline, and the cabin air valves control the ventilation volume of the individual cabins. The opening of each cabin air valve can be controlled as required by the specific use of the cabin, so as to control the ventilation volume of that cabin.

First, the control targets of the ocean platform ventilation control system of the invention are described. The invention is a multi-target control algorithm; according to the cabin air quality and multi-cabin ventilation energy-saving requirements, the specific control targets are as follows.

Target 1: cabin carbon dioxide concentration control target.

The requirements of cabin air quality and ventilation energy conservation are considered, and the concentration range of carbon dioxide meets the following requirements:

$$C_k^{\min} \le C_k(t) \le C_k^{\max}, \qquad k = 1, \dots, N$$

where $C_k(t)$ denotes the CO₂ concentration of cabin $k$ at time $t$; $C_k^{\max}$ denotes the upper CO₂ concentration limit of cabin $k$ (unit: ppm), set to 800 ppm in the invention; $C_k^{\min}$ denotes the energy-saving lower CO₂ concentration limit of cabin $k$ (unit: ppm), set to 700 ppm in the invention; and $N$ denotes the number of cabins.

Target 2: the fan static pressure, the opening degree of a cabin air valve and the cabin ventilation control target.

Based on inherent physical constraints of the ocean platform ventilation system, the static pressure of a fan, the position of a cabin air valve and the ventilation quantity of a cabin are in a specified operation range:

$$P^{\min} \le P(t) \le P^{\max}, \qquad \theta_k^{\min} \le \theta_k(t) \le \theta_k^{\max}, \qquad Q_k^{r,\min} \le Q_k^{r}(t) \le Q_k^{r,\max}$$

where $P(t)$ denotes the fan static pressure at time $t$, and $P^{\min}$ and $P^{\max}$ bound its specified operating range, being the minimum and maximum fan static pressure respectively; $\theta_k(t)$ denotes the cabin air valve angle of cabin $k$ at time $t$, and $\theta_k^{\min}$ and $\theta_k^{\max}$ bound its action range, being the minimum and maximum air valve angle of the cabin air valve of cabin $k$ respectively; $Q_k^{r}(t)$ denotes the required air volume of cabin $k$ at time $t$, and $Q_k^{r,\min}$ and $Q_k^{r,\max}$ bound it, being the minimum and maximum required air volume of cabin $k$ respectively.

Target 3: and controlling an actual air quantity proportion error.

In order to ensure the control accuracy of the actual air quantity, the actual air quantity proportion error should be kept within the allowable range:

where $Q_k^{a}(t)$ denotes the actual air volume of cabin $k$ at time $t$ (unit: m³/h); $\varepsilon$ denotes the relative error of the air flow rate (the industry standard is less than 10%); and $N$ denotes the number of cabins.

Target 4: the cabin air valve controls the limiting index.

In order to minimize energy consumption, it should be ensured that at least one damper is fully open, the formula is as follows:

when (when)At the time, the cabin is representedChamber->Is fully opened.

Target 5: and the air valve and the fan coordinate control targets.

Minimizing resistance of the damper to correspondingly reduce fan static pressure and further reduce energy consumption:

where $P^{\mathrm{es}}$ denotes the maximum fan static pressure (unit: Pa) under the energy-saving control target, and $\theta^{\mathrm{es}}$ denotes the maximum cabin air valve angle (unit: °) under the energy-saving control target of the ocean platform ventilation system.
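The range constraints of Targets 1 and 2 and the "at least one air valve fully open" condition of Target 4 can be checked with a few comparisons. The sketch below is an illustration only: the class and function names are hypothetical, one set of limits is shared by all cabins for brevity, and every numeric limit except the 700-800 ppm CO₂ band is an assumed placeholder.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass(frozen=True)
class CabinLimits:
    co2_min: float = 700.0   # ppm, energy-saving lower limit (from the text)
    co2_max: float = 800.0   # ppm, upper limit (from the text)
    angle_min: float = 0.0   # degrees, assumed
    angle_max: float = 90.0  # degrees, assumed "fully open" angle
    flow_min: float = 100.0  # m^3/h, assumed
    flow_max: float = 600.0  # m^3/h, assumed


def check_targets(co2: Sequence[float],
                  angles: Sequence[float],
                  flows: Sequence[float],
                  fan_pressure: float,
                  limits: CabinLimits,
                  pressure_range: Tuple[float, float] = (200.0, 800.0),
                  tol: float = 1e-6) -> Dict[str, bool]:
    """Report which of Targets 1, 2 and 4 hold at one time step."""
    p_min, p_max = pressure_range  # Pa, assumed operating range
    return {
        "target1_co2": all(limits.co2_min <= c <= limits.co2_max for c in co2),
        "target2_ranges": (
            p_min <= fan_pressure <= p_max
            and all(limits.angle_min <= a <= limits.angle_max for a in angles)
            and all(limits.flow_min <= q <= limits.flow_max for q in flows)
        ),
        "target4_one_fully_open": any(
            abs(a - limits.angle_max) <= tol for a in angles
        ),
    }


status = check_targets(co2=[720.0, 790.0, 760.0],
                       angles=[90.0, 45.0, 60.0],
                       flows=[300.0, 250.0, 400.0],
                       fan_pressure=500.0,
                       limits=CabinLimits())
```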

The method of the invention defines the cabin air valve of each cabin of the ocean platform as a cabin air valve agent ($N$ agents in total), and the training and control method specifically comprises the following steps.

S1: defining the cabin air valve of each cabin as an agent; determining the control targets of the ocean platform ventilation system, and designing the reinforcement learning elements of the agents. The control targets are Targets 1 to 5 described above. The reinforcement learning elements comprise a cabin air valve agent state element, a cabin air valve agent action element, and a cabin air valve agent reward element. The state, action and reward of the cabin air valve agent are initialized and defined as follows.

S11: definition of the cabin air valve agent state.

Since the air quality in a cabin is related to the CO₂ concentration, the ventilation volume, the number of people in the cabin and their metabolic rate, the invention defines the state $s_t$ at time $t$ as:

$$s_t = \big[\, C_k(t),\ C^{\mathrm{out}}(t),\ Q_k^{r}(t),\ n_k(t),\ M(t) \,\big]$$

where $C_k(t)$ denotes the CO₂ concentration of cabin $k$ at time $t$ (unit: ppm); $C^{\mathrm{out}}(t)$ denotes the CO₂ concentration outside the cabin at time $t$ (unit: ppm); $Q_k^{r}(t)$ denotes the required air volume of cabin $k$ at time $t$ (unit: m³/h); $n_k(t)$ denotes the number of people in cabin $k$ at time $t$; and $M(t)$ denotes the human metabolic rate at time $t$ (unit: met).
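As a small illustration, the five state quantities listed above can be packed into a typed record and flattened into the numeric vector a policy network would consume. The class name, field names and field order are assumptions made for this sketch.

```python
from typing import List, NamedTuple


class CabinState(NamedTuple):
    co2_ppm: float             # CO2 concentration in the cabin
    outdoor_co2_ppm: float     # CO2 concentration outside the cabin
    required_flow_m3h: float   # required air volume
    occupants: int             # number of people in the cabin
    metabolic_rate_met: float  # human metabolic rate


def to_vector(state: CabinState) -> List[float]:
    """Flatten the state into a list of floats in field order."""
    return [float(x) for x in state]


state = CabinState(co2_ppm=750.0, outdoor_co2_ppm=420.0,
                   required_flow_m3h=300.0, occupants=4,
                   metabolic_rate_met=1.2)
vector = to_vector(state)
```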

S12: definition of cabin air valve agent actions.

In order to effectively reduce the draft sensation and limit the amplitude of required air volume adjustments, the action of the air valve agent is set as the change in required air volume:

$$a_t = \Delta Q_k(t)$$

where $\Delta Q_k(t)$ denotes the air volume change of cabin $k$ at time $t$ (unit: m³/h). Considering the constraints on the variables in the actual physical model, the action $\Delta Q_k(t)$ is valid only when the following two formulas are satisfied.

where $\Delta Q_k^{\min}$ denotes the lower limit of the air volume change of cabin $k$ (unit: m³/h), and $\Delta Q_k^{\max}$ denotes the upper limit of the air volume change of cabin $k$ (unit: m³/h).
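One common way to enforce such limits is to clip the proposed change before applying it. The sketch below assumes clipping (the text only states that the action must satisfy its bounds), assumes the new required air volume is the old one plus the change, and uses hypothetical names and numbers.

```python
def clip(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return min(max(value, low), high)


def apply_action(required_flow: float, delta: float,
                 delta_min: float, delta_max: float,
                 flow_min: float, flow_max: float) -> float:
    """Apply an air volume change, respecting both the per-step change
    limits and the overall required air volume limits."""
    bounded_delta = clip(delta, delta_min, delta_max)
    return clip(required_flow + bounded_delta, flow_min, flow_max)


# A request of +80 m^3/h is limited to the assumed +50 m^3/h step bound.
new_flow = apply_action(300.0, 80.0, -50.0, 50.0, 100.0, 400.0)
```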

S13: definition of cabin damper agent rewards.

In order to achieve the control targets and guide the agent to learn the optimal policy faster and more smoothly, the invention introduces the following four sub-reward functions:

where $r_1$ is the threshold shaping reward function, in which $C_k^{\mathrm{s}}$ denotes the lower reward-shaping limit of the CO₂ concentration that guides the learning of agent $k$; $c_1$ denotes the reward value when the CO₂ concentration limits of all cabins and the ventilation energy-saving requirement are satisfied simultaneously, that is, when the upper and lower limits in Target 1 are met, and is preferably 20; and $c_2$ denotes the threshold shaping reward value for the cabin CO₂ concentration, used to guide the CO₂ concentrations of all cabins to converge toward the threshold over a wider range, and is preferably 1. $r_2$ is the convergence acceleration reward function, used to guide the agent to converge quickly during training, in which $c_3$ denotes the convergence acceleration reward value and is preferably 0.5. $r_3$ is the boundary limit reward function, used to guide the agent to choose correct actions under boundary conditions, in which $E(t)$ denotes the set of relative CO₂ concentration errors of all cabins at time $t$, $e_k(t)$ denotes the relative CO₂ concentration error of cabin $k$ at time $t$, and $N$ denotes the number of cabins. $r_4$ is the air volume standardizing reward function, used to optimize the required air volume so as to ensure the energy-saving effect. Considering the four sub-reward functions together, the final reinforcement learning reward function is designed as follows:

$$r = w_1 r_1 + w_2 r_2 + w_3 r_3 + w_4 r_4$$

where $w_1$, $w_2$, $w_3$ and $w_4$ are the positive weight coefficients of the four sub-reward functions.

S2: training the agents.

The agents are trained in a pre-constructed virtual environment using the reinforcement learning elements designed in S1, to realize the final optimal control flow. The SAC algorithm parameter update block diagram is shown in FIG. 2. The air valve agents are trained with a policy-value network, finally yielding the required air volume of each cabin of the ocean platform ventilation system. The specific training steps are as follows.

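S21: setting the agent training objective function based on maximum entropy:

$$J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi} \Big[ r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big], \qquad \mathcal{H}(X) = -\sum_{i} x_i \log x_i$$

where $\alpha$ denotes the temperature coefficient, $r(s_t, a_t)$ the reward, $\pi$ the reinforcement learning policy, $\mathcal{H}$ the policy entropy, and $x_i$ a value of distribution $X$. In the invention, the policy entropy $\mathcal{H}$ is used to calculate the uncertainty of the action probability distribution in S22 and the following steps, in which case $X = \pi(\cdot \mid s_t)$.

By introducing maximum entropy, the SAC algorithm increases the randomness of the policy, so that the probabilities of the actions are distributed as evenly as possible and the policy does not concentrate excessively on specific actions. This improves the exploration capability, the transfer learning capability and the stability of the algorithm.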
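The effect of the entropy term can be seen on a discrete action distribution: a uniform distribution has the largest entropy and a deterministic one has zero. The sketch below is a generic illustration of Shannon entropy, not code from the invention.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy -sum(p * log p), skipping zero-probability entries."""
    return -sum(p * math.log(p) for p in probs if p > 0)


uniform = entropy([0.25, 0.25, 0.25, 0.25])  # equals log(4)
peaked = entropy([0.97, 0.01, 0.01, 0.01])   # close to deterministic
deterministic = entropy([1.0, 0.0, 0.0, 0.0])
```

Adding $\alpha$ times this entropy to the reward therefore favors policies that keep several actions plausible, which is the exploration benefit described above.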

S22: computing the optimal policy $\pi^{*}$ of the SAC algorithm, specifically:

$$\pi^{*} = \arg\max_{\pi} \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi} \Big[ r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big]$$

S23: defining the soft $Q$-function $Q(s_t, a_t)$ and performing soft policy evaluation. The policy $\pi$ is fixed and the $Q$ value is updated with the soft Bellman equation until it converges:

$$Q(s_t, a_t) = r(s_t, a_t) + \gamma\, \mathbb{E}_{s_{t+1},\, a_{t+1} \sim \pi} \big[ Q(s_{t+1}, a_{t+1}) - \alpha \log \pi(a_{t+1} \mid s_{t+1}) \big]$$

where $Q(s_t, a_t)$ is the soft $Q$-function and $\gamma$ denotes the discount factor.

S24: improving the policy $\pi$ with the converged $Q$ value. The improved policy is substituted into the soft $Q$-function, the policy $\pi$ is updated, the exponential of the new $Q$-function is obtained, and the KL divergence is used to constrain the range of the policy update. The policy update formula is as follows:

$$\pi_{\mathrm{new}} = \arg\min_{\pi' \in \Pi} D_{\mathrm{KL}} \left( \pi'(\cdot \mid s_t) \,\Big\|\, \frac{\exp\big(\tfrac{1}{\alpha} Q^{\pi_{\mathrm{old}}}(s_t, \cdot)\big)}{Z^{\pi_{\mathrm{old}}}(s_t)} \right)$$

where $\pi_{\mathrm{new}}$ denotes the updated policy; $\pi_{\mathrm{old}}$ denotes the policy before the update; $\Pi$ denotes the set of feasible policies; $Q^{\pi_{\mathrm{old}}}$ denotes the $Q$-function of the policy before the update; $Z^{\pi_{\mathrm{old}}}(s_t)$ denotes the partition function that normalizes the policy distribution; and $D_{\mathrm{KL}}(X \,\|\, Y)$ denotes the KL divergence measuring the difference between distributions $X$ and $Y$.

S23 and S24 together form the iteration of the policy function $\pi$: S23 computes the $Q$ value, which requires the current policy function $\pi$; S24 updates the policy function $\pi$, which requires the $Q$ value computed in S23; the updated policy function $\pi$ is then used again in S23 to compute a new $Q$ value. These steps are repeated until the requirements are met.
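The alternation between S23 and S24 can be sketched on a tiny tabular problem. The example below is an illustration under stated assumptions: a deterministic two-state, two-action toy environment invented for this sketch, the standard soft Bellman backup for evaluation, and a softmax of $Q/\alpha$ for improvement (the distribution that minimizes the KL divergence when every policy is feasible). It is not the cabin ventilation environment of the invention.

```python
import math
from typing import List, Tuple

Table = List[List[float]]


def soft_policy_iteration(rewards: Table, next_state: List[List[int]],
                          alpha: float = 0.5, gamma: float = 0.9,
                          rounds: int = 30,
                          sweeps: int = 100) -> Tuple[Table, Table]:
    n_states, n_actions = len(rewards), len(rewards[0])
    policy = [[1.0 / n_actions] * n_actions for _ in range(n_states)]
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(rounds):
        # S23: soft policy evaluation with the policy held fixed.
        for _ in range(sweeps):
            v = [sum(p * (q[s][a] - alpha * math.log(p))
                     for a, p in enumerate(policy[s]) if p > 0)
                 for s in range(n_states)]
            q = [[rewards[s][a] + gamma * v[next_state[s][a]]
                  for a in range(n_actions)] for s in range(n_states)]
        # S24: soft policy improvement, policy proportional to exp(Q / alpha).
        new_policy = []
        for s in range(n_states):
            q_max = max(q[s])  # subtract the maximum for numerical stability
            weights = [math.exp((x - q_max) / alpha) for x in q[s]]
            z = sum(weights)   # partition function for state s
            new_policy.append([w / z for w in weights])
        policy = new_policy
    return q, policy


# Toy problem: action 1 in state 1 pays 2 and stays there; action 0 in
# state 0 pays 1 and stays; the other actions pay 0 and switch state.
rewards = [[1.0, 0.0], [0.0, 2.0]]
next_state = [[0, 1], [0, 1]]
q_values, policy = soft_policy_iteration(rewards, next_state)
```

In this toy problem the learned policy stays stochastic: it prefers moving to and remaining in the better-paying state, but keeps some probability on the alternative because of the entropy bonus.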

The range of the policy update is constrained by the KL divergence, with the following formula:

$$D_{\mathrm{KL}}(X \,\|\, Y) = \sum_{i} x_i \log \frac{x_i}{y_i}$$

where $D_{\mathrm{KL}}(X \,\|\, Y)$ denotes the KL divergence measuring the difference between distributions $X$ and $Y$; $x_i$ denotes a value of distribution $X$; and $y_i$ denotes a value of distribution $Y$. In the invention, $D_{\mathrm{KL}}$ is used in S24 to compute the exponential of the new $Q$-function, in which case $X = \pi'(\cdot \mid s_t)$ and $Y = \exp\big(\tfrac{1}{\alpha} Q^{\pi_{\mathrm{old}}}(s_t, \cdot)\big) / Z^{\pi_{\mathrm{old}}}(s_t)$.
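For discrete distributions the KL divergence is a short sum. The sketch below is a generic illustration, assuming both inputs are probability vectors of equal length and that the second is positive wherever the first is.

```python
import math
from typing import Sequence


def kl_divergence(x: Sequence[float], y: Sequence[float]) -> float:
    """D_KL(X || Y) = sum_i x_i * log(x_i / y_i) for discrete distributions."""
    return sum(xi * math.log(xi / yi) for xi, yi in zip(x, y) if xi > 0)


same = kl_divergence([0.5, 0.5], [0.5, 0.5])
different = kl_divergence([0.5, 0.5], [0.9, 0.1])
```

The divergence is zero only when the two distributions coincide, which is why minimizing it pulls the new policy toward the normalized exponential of the $Q$-function.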

Through the above steps, the soft policy is guaranteed to converge iteratively to the optimal policy under the maximum entropy objective in discrete, tabular domains. In continuous domains, the soft policy iteration can be implemented with function approximators, typically neural networks.

S25: from playback bufferExtracting small batches of historical interaction data, and performing flexibilityNetwork parameters->And policy network parameters->The updating of (2) is specifically divided into three steps.

S251: value network parameter update

The loss function for training the value network is designed to make the soft Bellman error as small as possible, and is specifically as follows:

wherein $\theta$ is the parameter of the soft Q network $Q_\theta$; $\bar{\theta}$ is the parameter of the soft state value function $V_{\bar{\theta}}$, which can be calculated by the following formula:

wherein $\phi$ is the parameter of the policy network $\pi_\phi$; $\bar{\theta}$ is the parameter of the target Q network $Q_{\bar{\theta}}$.

The network parameters are updated by gradient descent. In the SAC algorithm, the gradient update of the soft Q network parameter $\theta$ is as follows:

$$\theta \leftarrow \theta - \lambda_Q \hat{\nabla}_\theta J_Q(\theta)$$

wherein $\lambda_Q$ is the update step size of the soft Q network; $\hat{\nabla}_\theta J_Q(\theta)$ is the gradient of the loss function $J_Q(\theta)$.

A soft update is applied to the network weights, so that the parameters of the target Q network are updated from the parameters of the soft Q network, specifically:

$$\bar{\theta} \leftarrow \tau \theta + (1 - \tau)\,\bar{\theta}$$

wherein $\tau$ is the update step size of the target Q network.
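By way of non-limiting illustration, the value network update of S251 can be sketched with scalar quantities. The sketch assumes the standard SAC soft Bellman target r + γ(Q_target − α log π) and a half squared error loss; the function names and all numeric values are hypothetical, and a practical implementation would operate on neural network tensors rather than plain floats.

```python
def soft_q_target(reward, next_q_target, next_log_prob, gamma, alpha):
    """Soft Bellman target: r + gamma * (Q_target(s', a') - alpha * log pi(a'|s'))."""
    soft_value = next_q_target - alpha * next_log_prob
    return reward + gamma * soft_value


def soft_q_loss(q_pred, target):
    """Half squared soft Bellman error for a single transition."""
    return 0.5 * (q_pred - target) ** 2


def gradient_step(param, grad, step_size):
    """Plain gradient-descent update: param <- param - step_size * grad."""
    return param - step_size * grad


def soft_update(target_params, online_params, tau):
    """Soft update: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


# Hypothetical numbers for one transition drawn from the replay buffer.
y = soft_q_target(reward=1.0, next_q_target=2.0, next_log_prob=-1.0,
                  gamma=0.9, alpha=0.5)
print(y, soft_q_loss(q_pred=3.0, target=y))
print(soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1))
```

A small `tau` makes the target network trail the online network slowly, which is the usual reason for using a soft rather than a hard copy.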

S252: policy network parameter update

The loss function for training the policy network is designed so that the network is trained by minimizing the expected KL divergence; the loss function is specifically as follows:

For the policy network, the reparameterization trick is used, exploiting its differentiability to reduce the computational workload; the final output policy can be expressed by a Gaussian distribution $\mathcal{N}$:

wherein $\mu_\phi(s_t)$ and $\sigma_\phi(s_t)$ denote the mean and the standard deviation of the Gaussian distribution at time $t$, respectively. On this basis, the loss function can be further rewritten as follows:

The policy network is likewise trained by gradient updates, and the parameter update process is as follows:

$$\phi \leftarrow \phi - \lambda_\pi \hat{\nabla}_\phi J_\pi(\phi)$$

wherein $\lambda_\pi$ is the update step size of the policy network; $\hat{\nabla}_\phi J_\pi(\phi)$ is the gradient of the loss function $J_\pi(\phi)$.
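By way of non-limiting illustration, the reparameterization used in S252 can be sketched for a one-dimensional Gaussian policy. The sketch assumes the common form a = μ + σ·ε with ε drawn from a standard normal distribution, and a single-sample loss estimate α·log π(a|s) − Q(s, a). The function names and numeric values are hypothetical, and the mean, standard deviation and Q value would in practice come from the policy and Q networks.

```python
import math
import random


def sample_action(mean, std, rng):
    """Reparameterized sample: a = mean + std * eps, with eps ~ N(0, 1)."""
    eps = rng.gauss(0.0, 1.0)
    return mean + std * eps, eps


def gaussian_log_prob(action, mean, std):
    """Log density of N(mean, std^2) evaluated at the given action."""
    z = (action - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def policy_loss_sample(alpha, log_prob, q_value):
    """Single-sample estimate of the policy loss: alpha * log pi(a|s) - Q(s, a)."""
    return alpha * log_prob - q_value


rng = random.Random(0)  # fixed seed so the sketch is repeatable
action, eps = sample_action(mean=0.2, std=0.1, rng=rng)
log_prob = gaussian_log_prob(action, mean=0.2, std=0.1)
print(action, policy_loss_sample(alpha=0.2, log_prob=log_prob, q_value=1.0))
```

Because the randomness enters only through `eps`, the sampled action is a differentiable function of the mean and standard deviation, which is what allows the gradient of the loss to flow back to the policy parameters.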

S253: temperature coefficient adaptive update

In the latest SAC algorithm, the optimal temperature coefficient is obtained adaptively through gradient updates under the maximum-entropy objective, with the loss function designed as follows:

wherein $\bar{\mathcal{H}}$ is the minimum expected entropy, typically set to the negative of the dimension of the action space; $a_t$ denotes the action of the policy network at time $t$.

Using the gradient descent method, the temperature coefficient is updated as follows:

$$\alpha \leftarrow \alpha - \lambda_\alpha \hat{\nabla}_\alpha J(\alpha)$$

wherein $\lambda_\alpha$ is the update step size of the temperature coefficient; $\hat{\nabla}_\alpha J(\alpha)$ is the gradient of the loss function $J(\alpha)$.
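By way of non-limiting illustration, the adaptive temperature update of S253 can be sketched as follows. The sketch assumes the standard SAC temperature loss J(α) = E[−α (log π(a|s) + H̄)], whose gradient with respect to α is −E[log π(a|s) + H̄]. The function names, the lower clamp that keeps α positive and all numeric values are assumptions made for the example.

```python
def temperature_gradient(log_probs, target_entropy):
    """Gradient of J(alpha) = mean(-alpha * (log_pi + target_entropy)) w.r.t. alpha."""
    return -sum(lp + target_entropy for lp in log_probs) / len(log_probs)


def update_temperature(alpha, log_probs, target_entropy, step_size):
    """One gradient-descent step on alpha, clamped to stay positive (assumed safeguard)."""
    grad = temperature_gradient(log_probs, target_entropy)
    return max(alpha - step_size * grad, 1e-8)


# One-dimensional action space, so the target entropy is taken as -1.
target_entropy = -1.0

# Log-probabilities of 2.0 mean an entropy estimate of -2.0, below the target,
# so the temperature is raised to encourage more exploration.
print(update_temperature(0.2, [2.0, 2.0], target_entropy, step_size=0.01))

# An entropy estimate equal to the target leaves the temperature unchanged.
print(update_temperature(0.2, [0.5, 1.5], target_entropy, step_size=0.01))
```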

Based on the trained reinforcement learning agent, reinforcement learning is applied to the agent state at time $t$ to obtain the set $Q_d(t)$ of the required air volumes of all cabins at time $t$:

$$Q_d(t) = \{Q_{d,1}(t), Q_{d,2}(t), \ldots, Q_{d,n}(t)\}$$

wherein $Q_{d,i}(t)$ denotes the required air volume of cabin $i$ at time $t$, and $n$ is the number of cabins.

S3: Based on the cabin ventilation control target, proportional balance control is applied to the air volume of the ocean platform ventilation system so that the proportional error between the actual and required air volumes of all cabins is minimized, and the optimal cabin air valve angles consistent with the required air volume proportions are obtained by solving.

In the layered optimization control, the purpose of the lower-layer optimization control is to ensure that the actual air volume in the ocean platform ventilation system matches the required air volume obtained by the upper layer, thereby achieving accurate tracking. According to target 3, proportional balance control is first applied to the air volume of the ocean platform ventilation system; its core is to minimize the error in the ratio of actual to required air volume across all cabins. The invention designs a proportional balance objective function to be optimized, takes the air valve angles as the optimization variables, and optimizes with the GA-fmincon HSO method to obtain the globally optimal air valve angles that minimize the energy consumption of the ocean platform ventilation system, thereby achieving accurate air volume tracking and energy-saving optimization. The specific steps are as follows.

S31: Proportional balance accurately tracks and controls the air volume of the ocean platform ventilation system; its core is the proportion of the air volumes rather than their absolute values. By normalizing the air volumes, the accurate air volume control problem is converted into the following proportional equation:

wherein $Q_a(t)$ is the set of actual air volumes at time $t$; $Q_d(t)$ is the set of required air volumes from the upper layer; $\mathbf{1}$ denotes the unit row vector used for normalization.

S32: The accurate air volume control problem is converted into a constrained optimization problem, and the objective function is designed as follows:

wherein $\delta_i(t)$ denotes the opening of the cabin air valve of cabin $i$ at time $t$. The above objective function admits several feasible solutions, so further optimization is required to obtain the most energy-efficient air valve angles.
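By way of non-limiting illustration, the idea of matching air volume proportions rather than absolute values can be sketched as follows. The sketch assumes that each set of air volumes is normalized by its own total and that the mismatch is measured as a sum of squared differences; this error measure, the function names and the numbers are assumptions, and the mapping from air valve angles to actual air volumes (a duct network model) is outside the sketch.

```python
def normalize(flows):
    """Divide each air volume by the total so that the shares sum to one."""
    total = sum(flows)
    return [f / total for f in flows]


def proportion_error(actual, demand):
    """Sum of squared differences between normalized actual and required air volumes."""
    return sum((a - d) ** 2
               for a, d in zip(normalize(actual), normalize(demand)))


demand = [200.0, 300.0, 500.0]       # hypothetical required air volumes
balanced = [250.0, 375.0, 625.0]     # same proportions, 25 % more total air
unbalanced = [300.0, 300.0, 400.0]   # different proportions

print(proportion_error(balanced, demand))
print(proportion_error(unbalanced, demand))
```

Any uniformly scaled copy of the required air volumes gives a zero proportional error, which illustrates why the proportional objective alone admits several solutions and why further energy-related objectives are needed to select among them.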

S33: According to target 4 and target 5, at least one air valve should be kept fully open and all air valve angles should be as small as possible, so as to maximize energy saving. The invention achieves this energy-saving target by minimizing the sum of the current air valve angles and the maximum air valve angles, and the objective function is recorded as:

wherein $\delta_i^{\max}$ denotes the maximum opening of the cabin air valve of cabin $i$.

S34: When the actual air volume is controlled on the basis of proportional balance, the influence of the fan static pressure can also be taken into account. The fan static pressure varies linearly under power control by the PID controller and cannot be raised directly by adjusting the air valve angles. However, the proportional-balance ratio can be made as large as possible so that the total air volume is maximized, which enlarges the subsequent power reduction and meets the energy-saving requirement. The third objective function is therefore recorded as follows:

wherein $P_s(t)$ denotes the fan static pressure at time $t$.

S35: The optimization targets are integrated to obtain the overall objective function of the lower-layer optimization:

$$J = \omega_1 J_1 + \omega_2 J_2 + \omega_3 J_3$$

wherein $J_1$, $J_2$ and $J_3$ are the objective functions of S32, S33 and S34, respectively; $\omega_1$, $\omega_2$ and $\omega_3$ are balance weight coefficients, whose function is to keep the magnitudes of the sub-objective functions consistent while assigning them different optimization weights.

S36: Based on the overall objective function, the GA-fmincon HSO method is used to solve for the optimal air valve angles and obtain the actual air volumes.

After the optimization objective function of the lower-layer ocean platform ventilation system has been designed, the GA-fmincon HSO method is used to solve for the optimal air valve angles. The genetic algorithm (GA) is a meta-heuristic optimization algorithm that originates from Darwin's theory of evolution and is widely used to solve complex nonlinear and multidimensional optimization search problems. The algorithm simulates the natural process of selection and adaptation by encoding a solution of the problem as a chromosome. In each iteration it generates a new, improved chromosome population through selection, crossover and mutation operations. The algorithm then decodes the best chromosome into a solution of the problem. The basic steps of the standard genetic algorithm are as follows:

S361: Initialization:

Set the population size $N$, the crossover probability $p_c$, the mutation probability $p_m$ and the termination criterion. Randomly generate $N$ individuals as the initial population $P(0)$. Set the generation counter $g$ to 0.

S362: Evaluation:

Compute the fitness of each individual in the population $P(g)$.

S363: Evolution:

a) Selection: use the roulette-wheel selection operator to select pairs of parents from the population $P(g)$, where an individual may be selected repeatedly.

b) Crossover: with probability $p_c$, perform the crossover operation on the selected pairs of parents to generate intermediate individuals.

c) Mutation: with probability $p_m$, independently apply the mutation operator to the intermediate individuals to generate candidate individuals.

S364: Selection:

Based on fitness and the elitist strategy, select $N$ individuals to form the next-generation population $P(g+1)$.

S365: Termination:

If the termination criterion is met, output the individual with the highest fitness and record it as the optimal solution. Otherwise, increase $g$ by 1 and return to step S362.

Because the standard genetic algorithm is prone to falling into local optima in the optimal control of the ocean platform ventilation system, the fmincon algorithm is introduced to compensate for the weakness of the genetic algorithm in local search. The genetic algorithm provides a good initial point for the fmincon algorithm, and the method switches to the fmincon algorithm once the optimal-error switching threshold is reached, so that the global optimal solution is found with higher probability. The flow chart of the GA-fmincon HSO method is shown in FIG. 3.
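By way of non-limiting illustration, the two-stage idea of a genetic algorithm followed by a local solver can be sketched in plain Python. Several assumptions are made: the switching rule is taken to be "best cost at or below `switch_threshold`", the GA uses arithmetic crossover and uniform mutation, and a bounded coordinate pattern search stands in for fmincon (a MATLAB constrained solver that is not reproduced here). All function names, parameter values and the two-valve test cost are hypothetical.

```python
import random


def ga_stage(cost, bounds, switch_threshold, pop_size=30, p_cross=0.8,
             p_mut=0.1, max_gen=200, seed=0):
    """Global stage: a small real-coded GA that stops at the switching threshold."""
    rng = random.Random(seed)
    # Initialization: random population inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(max_gen):
        pop.sort(key=cost)
        if cost(pop[0]) <= switch_threshold:
            break  # good enough starting point: hand over to the local stage
        # Evaluation: lower cost gives higher, strictly positive fitness.
        costs = [cost(ind) for ind in pop]
        worst = max(costs)
        fitness = [worst - c + 1e-12 for c in costs]
        children = []
        for _ in range(pop_size // 2):
            # Roulette-wheel selection; an individual may be picked repeatedly.
            p1, p2 = rng.choices(pop, weights=fitness, k=2)
            c1, c2 = list(p1), list(p2)
            if rng.random() < p_cross:  # arithmetic crossover
                w = rng.random()
                c1 = [w * a + (1 - w) * b for a, b in zip(p1, p2)]
                c2 = [w * b + (1 - w) * a for a, b in zip(p1, p2)]
            for child in (c1, c2):      # uniform mutation, gene by gene
                for j, (lo, hi) in enumerate(bounds):
                    if rng.random() < p_mut:
                        child[j] = rng.uniform(lo, hi)
            children += [c1, c2]
        # Elitist survivor selection over parents and children.
        pop = sorted(pop + children, key=cost)[:pop_size]
    return min(pop, key=cost)


def local_stage(cost, x0, bounds, step=1.0, tol=1e-6):
    """Local stage (stand-in for fmincon): bounded coordinate pattern search."""
    x, fx = list(x0), cost(x0)
    while step > tol:
        improved = False
        for j, (lo, hi) in enumerate(bounds):
            for delta in (step, -step):
                trial = list(x)
                trial[j] = min(max(x[j] + delta, lo), hi)
                ft = cost(trial)
                if ft < fx:
                    x, fx, improved = trial, ft, True
        if not improved:
            step *= 0.5  # no better neighbour at this scale: refine the step
    return x, fx


def hybrid_minimize(cost, bounds, switch_threshold):
    """Run the GA until the threshold is reached, then refine locally."""
    start = ga_stage(cost, bounds, switch_threshold)
    return local_stage(cost, start, bounds)


# Hypothetical two-valve cost with its minimum at angles (30, 60) degrees.
cost = lambda x: (x[0] - 30.0) ** 2 + (x[1] - 60.0) ** 2
angles, value = hybrid_minimize(cost, [(0.0, 90.0), (0.0, 90.0)], switch_threshold=1.0)
print(angles, value)
```

The division of labour is the point of the sketch: the population-based stage only needs to land in the right basin, and the local stage supplies the final precision that the genetic operators reach slowly.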

S4: Proportional recovery.

Proportional recovery is carried out according to the solution of the proportional balance control to obtain the optimized fan power. Specifically, the proportional balance optimization yields the optimal cabin air valve angles that satisfy the required air volume proportions; the fan power must then be adjusted so that the proportionally balanced air volumes are restored to match the required air volumes. A proportional recovery step is therefore introduced.

S41: According to the result of S3, after proportional balance the air volumes of the different cabins of the ocean platform ventilation system have the following proportional relation with the required air volumes:

wherein $Q_i^{opt}(t)$ is the air volume of cabin $i$ at time $t$ under the optimal air valve angle obtained by the GA-fmincon HSO optimization; $\beta$ is the air volume recovery ratio, a positive number smaller than 1; $Q_{a,i}(t)$ is the actual air volume of cabin $i$ at time $t$.

Existing research has shown that when the air volume recovery ratio of one cabin is matched by the fan power adjustment ratio, the other cabins are strictly matched as well. The relation is as follows:

wherein $k_i$ ($i$ is an integer, $i = 1, 2, \ldots$) is a coefficient determined by the inherent characteristics of the air valve angles of the ocean platform ventilation system; when the angle of each cabin air valve is fixed, $k_i$ remains unchanged. Therefore, when the fan power is adjusted so that the air volume of one cabin is scaled by the appropriate recovery ratio, the air volumes of the other cabins are scaled by the same ratio. In this way the air volume of each cabin can be accurately matched to its required air volume.

S42: The fan power is controlled by a proportional control scheme. Because the fan power is adjusted according to the air volume of a single cabin terminal, the median of the recovery ratios is selected, and the matching fan power adjustment ratio is as follows:

wherein $P_a$ is the actual fan power after proportional recovery (unit: W); $P^{opt}$ is the fan power obtained by the GA-fmincon HSO optimization (unit: W); $\beta_{med}$ denotes the median of the set of recovery ratios of all cabins.
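By way of non-limiting illustration, the choice of the median recovery ratio can be sketched as follows. The sketch assumes that the recovery ratio of a cabin is its required air volume divided by its air volume at the optimal air valve angles, and that scaling the fan output scales every cabin by the same factor. The function names and numbers are hypothetical, and the relation between the ratio and the fan power is not modelled here.

```python
import statistics


def recovery_ratios(optimized_flows, demand_flows):
    """Per-cabin ratio of required air volume to air volume at the optimal angles."""
    return [d / q for q, d in zip(optimized_flows, demand_flows)]


def recover(optimized_flows, demand_flows):
    """Scale every cabin by the median ratio and report the relative errors."""
    ratio = statistics.median(recovery_ratios(optimized_flows, demand_flows))
    actual = [ratio * q for q in optimized_flows]
    errors = [abs(a - d) / d for a, d in zip(actual, demand_flows)]
    return ratio, actual, errors


# Hypothetical air volumes after proportional balance and the upper-layer demand.
optimized = [250.0, 380.0, 620.0]
demand = [200.0, 300.0, 500.0]

ratio, actual, errors = recover(optimized, demand)
print(ratio)
print(actual)
print(errors)
```

Using the median rather than the ratio of an arbitrary cabin keeps the residual errors spread on both sides of zero when the proportional balance is not perfect, so no single cabin dominates the adjustment.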

Compared with the existing Demand Control Ventilation (DCV) method, the proposed ocean platform ventilation system layering optimization control method has the following advantages:

a) The required air quantity can be adjusted according to the real-time environment, so that the indoor air quality of multiple cabins is effectively maintained;

b) The relative error between the actual and required air volumes of the ocean platform ventilation system is kept within the ASHRAE standard (less than 10%);

c) Experiments show that, compared with the DDPG-based method, the upper-layer SAC-based method increases the percentage of time during which the carbon dioxide concentration stays within the energy-saving threshold interval by 42.77%; compared with the proportional method and the P2S-DVC method, the lower-layer GA-fmincon HSO method increases the average energy-saving rate by 48.99% and 38.97%, respectively.

The foregoing is merely a description of preferred embodiments of the invention and is not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims

1. A layered optimization control method for an ocean platform ventilation system, characterized in that the ocean platform ventilation system comprises a fan, a main ventilation pipeline and a plurality of cabins, wherein the fan communicates with the main ventilation pipeline, a total air valve is arranged on the main ventilation pipeline, and the main ventilation pipeline communicates with each cabin; each cabin is provided with a variable air box, and each variable air box is provided with a cabin air valve;

the control method comprises the following steps:

2. The layered optimization control method for an ocean platform ventilation system according to claim 1, wherein the control target of the ocean platform ventilation system comprises one or a combination of the following targets:

target 1: cabin carbon dioxide concentration control target:

wherein $P_s(t)$ denotes the fan static pressure at time $t$, and $P_s^{\min}$ and $P_s^{\max}$ bound the operating range of the fan static pressure, $P_s^{\min}$ being the minimum and $P_s^{\max}$ the maximum fan static pressure; $\delta_i(t)$ denotes the cabin air valve angle of cabin $i$ at time $t$, and $\delta_i^{\min}$ and $\delta_i^{\max}$ bound the action range of the cabin air valve of cabin $i$, $\delta_i^{\min}$ being the minimum and $\delta_i^{\max}$ the maximum air valve angle of that cabin air valve; $Q_{d,i}(t)$ denotes the required air volume of cabin $i$ at time $t$, and $Q_{d,i}^{\min}$ and $Q_{d,i}^{\max}$ bound the range of the required air volume of cabin $i$, $Q_{d,i}^{\min}$ being the minimum and $Q_{d,i}^{\max}$ the maximum required air volume of cabin $i$;

target 3: actual air volume proportion error control target:

wherein $Q_{a,i}(t)$ denotes the actual air volume of cabin $i$ at time $t$; $e$ denotes the relative error of the air flow rate; $n$ denotes the number of cabins;

when $\delta_i(t) = \delta_i^{\max}$, the cabin air valve of cabin $i$ is fully open;

target 5: air valve and fan coordinated control target:

3. The layered optimization control method for an ocean platform ventilation system according to claim 1, wherein:

the state $s_t$ of the cabin air valve agent is defined as:

wherein $C_i(t)$ denotes the CO₂ concentration of cabin $i$ at time $t$; $C_{out}(t)$ denotes the CO₂ concentration outside the cabins at time $t$; $Q_{d,i}(t)$ denotes the required air volume of cabin $i$ at time $t$; $N_i(t)$ denotes the number of people in cabin $i$ at time $t$; $M(t)$ denotes the human metabolic rate at time $t$;

the action $a_t$ of the cabin air valve agent is defined as:

wherein $\Delta Q_{d,i}(t)$ denotes the air volume change of cabin $i$ at time $t$;

the reward elements for training the air valve agent include:

wherein $r_1$ is the threshold-shaping reward function; $k$ denotes the lower limit of the CO₂ concentration reward shaping that guides the learning of the agent; $c_1$ denotes the reward value when the CO₂ concentration limits of all cabins and the ventilation energy-saving requirement are satisfied simultaneously; $c_2$ denotes the threshold-shaping reward value for the cabin CO₂ concentration; $r_2$ is the convergence-acceleration reward function; $c_3$ denotes the convergence-acceleration reward value; $r_3$ is the boundary-limiting reward function; $E(t)$ denotes the set of relative errors of the CO₂ concentration of all cabins at time $t$; $e_i(t)$ denotes the relative error of the CO₂ concentration of cabin $i$ at time $t$; $n$ denotes the number of cabins; $r_4$ is the air volume normalization reward function;

the reinforcement learning reward function $r_t$ is designed as:

4. The layered optimization control method for an ocean platform ventilation system according to claim 3, wherein the reward value $c_1$ is granted only when the following two formulas are satisfied:

5. The layered optimization control method for an ocean platform ventilation system according to claim 3 or 4, wherein the step of training the cabin air valve agent using a policy-value network comprises:

S21: setting the agent training loss function based on maximum entropy:

wherein $\mathcal{H}(\pi(\cdot \mid s_t))$ denotes the entropy of the distribution $\pi(\cdot \mid s_t)$:

；

S22: computing the optimal policy $\pi^*$ of the SAC algorithm:

wherein $Q(s_t, a_t)$ is the soft action-value function; $\gamma$ denotes the discount factor;

6. The layered optimization control method for an ocean platform ventilation system according to claim 5, wherein in step S24, the range of the policy update is constrained by the KL divergence:

；

$D_{KL}(X \,\|\, Y)$ denotes the KL divergence, which measures the difference between distributions $X$ and $Y$; $x$ denotes a value of distribution $X$; $y$ denotes a value of distribution $Y$; the KL divergence is the index used in S24 to compute the new policy function $\pi_{new}$, where distribution $X$ is the candidate policy distribution and distribution $Y$ is the target distribution derived from the soft Q function.

7. The layered optimization control method for an ocean platform ventilation system according to claim 5, wherein in step S2, obtaining the required air volume of each cabin at the current time using the trained reinforcement learning agent comprises:

wherein $Q_{d,i}(t)$ denotes the required air volume of cabin $i$ at time $t$.

8. The layered optimization control method for an ocean platform ventilation system according to claim 6, wherein step S3 comprises:

；

$\delta_i(t)$ denotes the opening of the cabin air valve of cabin $i$ at time $t$;

wherein $\delta_i^{\max}$ denotes the maximum opening of the cabin air valve of cabin $i$;

S34: the third objective function is designed as:

wherein $P_s(t)$ denotes the fan static pressure at time $t$;

S35: integrating to obtain the total objective function $J$:

wherein $\omega_1$, $\omega_2$ and $\omega_3$ are the balance weight coefficients;

9. The layered optimization control method for an ocean platform ventilation system according to claim 8, wherein step S4 comprises:

calculating the actual air volume based on the following relation:

wherein $Q_i^{opt}(t)$ is the air volume of cabin $i$ at time $t$ under the optimal air valve angle obtained by the GA-fmincon HSO optimization; $\beta$ is the air volume recovery ratio, a positive number smaller than 1; $Q_{a,i}(t)$ is the actual air volume of cabin $i$ at time $t$;