CN113379093A

CN113379093A - Energy consumption analysis and optimization method for oil gas gathering and transportation system

Info

Publication number: CN113379093A
Application number: CN202010159512.8A
Authority: CN
Inventors: 李振泉; 张丁涌; 周长敬; 王兴武; 安学先; 高华; 孙东; 刘文聪; 闫恩祥; 李红强
Original assignee: China Petroleum and Chemical Corp; Sinopec Shengli Oilfield Co Xianhe Oil Production Plant
Current assignee: China Petroleum and Chemical Corp; Sinopec Shengli Oilfield Co Xianhe Oil Production Plant
Priority date: 2020-03-09
Filing date: 2020-03-09
Publication date: 2021-09-10

Abstract

The invention provides an energy consumption analysis and optimization method for an oil-gas gathering and transportation system, comprising the following steps: step 1, collecting operation data of the oilfield gathering and transportation system; step 2, preprocessing the collected data; step 3, performing correlation analysis on the data to evaluate their correctness and validity; step 4, performing variance analysis on the parameters of the oilfield gathering and transportation equipment, and performing feature importance analysis through a preliminary fit with the random forest machine learning method; step 5, establishing a machine learning random forest regression model; and step 6, evaluating each candidate scheme with the quantitative energy consumption prediction model and ranking the schemes to select the optimal one. The method can greatly reduce modeling difficulty, shorten the design cycle and reduce the workload, thereby providing a reasonable reference basis for adjusting and optimizing the oil-gas gathering and transportation scheme.

Description

Energy consumption analysis and optimization method for oil gas gathering and transportation system

Technical Field

The invention relates to the technical field of big data application and oil-gas gathering and transportation energy conservation, in particular to an energy consumption analysis and optimization method for an oil-gas gathering and transportation system.

Background

In an oil-gas gathering and transportation system, many nodes influence energy consumption, such as the outlet water temperature of the heating furnace, the water content at the oil outlet of the three-phase separator, and the temperatures at the upper, middle and lower levels of the settling tank. The conventional energy-saving approach establishes an energy conservation model of the gathering and transportation system by combining the energy consumption analysis rules of the system with an energy balance test, and then solves the model with an optimization algorithm whose objective is to minimize the power and heat consumption of the system.

The data accumulated in the oil gas gathering and transportation production has the following characteristics:

1. Multi-event excitation, high dimensionality and strong coupling. Data are acquired frequently and at high density, so repeated and redundant data exist; many system parameters influence one another and jointly determine the behavior of the system;

2. Instability and noise. The oil-gas gathering and transportation production system is unstable, and the collected data are easily contaminated by industrial noise;

3. Dynamics and diversity of data types. Parameters such as pressure, temperature, flow and equipment state change constantly over time and include several data types, such as logical and numerical values;

4. Multiple time scales and incompleteness. Different parameters are sampled at different frequencies, and data may be lost because recording is asynchronous;

5. Multiple modes. The gathering and transportation system has both normal operating states and fault conditions.

With traditional simulation technology, establishing an energy consumption analysis and optimization model is laborious and takes a long research cycle. The characteristics of the large volume of complex oil-gas gathering and transportation data listed above must also be considered, which further increases the modeling difficulty.

In recent years, big-data-driven machine learning techniques have advanced significantly and have been applied very successfully in language and image processing. With the development of parallel computing architectures, machine learning can also run online, and its strong real-time performance and low complexity make it possible to couple the analysis and the optimization of oilfield gathering and transportation tightly. Big data analysis has already been applied in other energy industries such as electric power and coal, and its results have been particularly evident in electronic information applications.

In the field of oilfield big data, Duan et al. analyzed the production and operation patterns of oilfields from big data to solve oilfield production business problems, and addressed problems and bottlenecks in management by analyzing oilfield management data. Zhang applied big data analysis to the automatic identification of abnormal wells, the matching technology for sealing drive devices, and the optimization of screw pump oil extraction process systems. Li et al. discussed the types of big data analysis technologies and summarized the construction of oilfield big data analysis platforms and the application of big data analysis in oilfield production.

Among domestic patents, few published patents apply machine learning to oil-gas gathering and transportation optimization. The patent "Machine learning-based optimization and energy-saving method for a three-segment continuous stepping heating furnace" (application number: CN201910526161.7) introduces the time dimension into operating experience and abstracts the rule that heating furnace thermal efficiency decays over time, which addresses the problem of deviations in employee experience. Machine learning is applied more widely in other fields.

The patent "Optimization method, device, terminal equipment and storage medium for a machine learning model" (application number: CN201810687251.X) discloses a method for optimizing a machine learning model, comprising: acquiring a plurality of data to be processed; inputting the data into a machine learning model, and using the model to screen out the data that meet preset conditions as annotation data, wherein the annotation data comprise training data and verification data; training the machine learning model with the training data to determine a trained model; and updating the machine learning model based at least on the trained model and the verification data.

The patent "Method and device for identifying negative financial information based on a machine learning algorithm" (application number: CN201910789700.6) analyzes financial information text written in natural language with a machine learning algorithm to judge whether the sentiment reflected by the text is negative. Large volumes of information can thus be processed by computer, and whether a text is negative can be judged more accurately with a pre-built algorithm model.

The patent with application number CN201610715882.9 discloses an ore visible-light image sorting method based on Adaboost machine learning, which learns from, trains on and predicts ore visible-light images. The method can closely simulate experienced mineral processing technicians in sorting ore, gives every machine the same accurate sorting experience, and avoids the subjectivity and individual differences of manual sorting. It can replace people in this work or complete work that people cannot, which reduces labor intensity and improves product quality and labor productivity.

However, no one has yet studied fine-grained optimization of oil-gas gathering and transportation using big data machine learning technology. A new energy consumption analysis and optimization method for the oil-gas gathering and transportation system, based on a big data machine learning algorithm, is therefore proposed to solve the above technical problems.

Disclosure of Invention

The invention aims to provide an energy consumption analysis and optimization method for an oil gas gathering and transportation system, which combines big data application with actual requirements of an oil field, reduces gathering and transportation energy consumption and provides more accurate technical support for oil gas gathering and transportation.

The object of the invention can be achieved by the following technical measures. The energy consumption analysis and optimization method of the oil-gas gathering and transportation system comprises the following steps: step 1, collecting operation data of the oilfield gathering and transportation system; step 2, preprocessing the collected data; step 3, performing correlation analysis on the data to evaluate their correctness and validity; step 4, performing variance analysis on the parameters of the oilfield gathering and transportation equipment, and performing feature importance analysis through a preliminary fit with the random forest machine learning method; step 5, establishing a machine learning random forest regression model; and step 6, evaluating each candidate scheme with the quantitative energy consumption prediction model and ranking the schemes to select the optimal one.

The object of the invention can also be achieved by the following technical measures:

In step 1, the operation data of the oilfield gathering and transportation system include the process parameters and energy consumption data of each component of the system during actual operation, specifically the produced liquid volume handled by the system and the temperature, pressure and water content at each component.

In step 1, the parameters of the different devices are collected to form a data set containing all energy-consuming device parameters, and all structured, unstructured and semi-structured data on the oil-gas gathering and transportation energy-consuming device parameters relevant to the analysis are extracted by full sampling.

In step 2, the data preprocessing performed includes: missing value processing, outlier processing, and data distribution processing.

In step 2, when missing values are processed, each missing value in a data column of the data table is filled with the column mean, according to the formula:

$$v_{i,j} = \frac{1}{m}\sum_{k=1}^{m} v_{k,j}, \quad \text{if } v_{i,j} = \text{null}$$

where null denotes a missing value, i is the row index, j is the column index, m is the number of samples, and $v_{i,j}$ is the value in row i and column j of the data table.

In step 2, when outliers are processed, abnormal data in the data table, that is, values outside the normal range, are identified by visual inspection and deleted; when data distribution processing is performed, the distribution of each parameter is transformed into a normal distribution.

In step 3, the gathering and transportation parameters are visualized and analyzed in pairs.

In step 4, the variance analysis calculates the variance of each feature and then deletes the features whose variance is 0:

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

where $\sigma^2$ is the population variance, X is the variable, $\mu$ is the population mean, and N is the number of instances in the population.

In practice, when the population mean is difficult to obtain, sample statistics are used in place of the population parameters; after correction, the sample variance is calculated as:

$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

where $S^2$ is the sample variance, X is the variable, $\bar{X}$ is the sample mean, and n is the number of samples.

In step 4, the feature importance analysis is to calculate the contribution of each feature to the final predicted energy consumption, and the greater the contribution, the higher the weight.

In step 5, regression averaging is used when constructing the model. The sampled data set is first divided into a training set, a test set and a verification set in proportions of 80%, 10% and 10%; the features are the parameters of each device and the target is energy consumption. The step specifically comprises:

first, training the model on the training set using cross-validation and grid search to determine the random forest model parameters;

then, evaluating the fit of the model on the test set, using the root mean square error (RMSE) as the evaluation criterion; with real energy consumption E and model-predicted energy consumption E', the RMSE is:

$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(E_i - E'_i\right)^2}$$

where m is the number of predicted samples; the smaller the RMSE, the better the fit;

finally, adjusting the model hyper-parameters with the verification set: the verification set is fed into the trained model and a learning curve is drawn; the best hyper-parameters are those at which the upper curve and the lower curve are close together and both relatively high.

In step 6, using the machine learning model trained in step 5, candidate values of each adjustable parameter are enumerated, schemes are generated by exhaustive enumeration or by orthogonal experimental design, each scheme is evaluated with the quantitative energy consumption prediction model, and the schemes are ranked to select the best one.

The energy consumption analysis and optimization method of the invention differs from the conventional approach, which establishes an energy conservation model of the gathering and transportation system by combining the system's energy consumption analysis rules with an energy balance test and then solves the model with an optimization algorithm. The invention gives energy-saving managers a fast and efficient management tool, so that the oil-gas gathering and transportation system operates efficiently and energy is ultimately saved.

The method cleans the collected parameter data of the energy-consuming devices (three-phase separator, settling tank, heating furnace, dewatering pump, stabilizing tower, tower bottom pump and external delivery pump) and then, following the machine learning modeling process with a random forest, divides the data set into a training set, a test set and a verification set. The model is trained on the training set with cross-validation and grid search to determine its hyper-parameters. The fit of the model is evaluated on the test set using the mean square error. Finally, the model parameters are adjusted with the verification set, yielding the best-performing random forest regression model for predicting energy consumption. Candidate values of each adjustable parameter are then enumerated, schemes are generated by exhaustive enumeration or by orthogonal experimental design, each scheme is evaluated with the quantitative energy consumption prediction model, and the schemes are ranked to select the optimal one, thereby optimizing the oil-gas gathering and transportation process and reducing energy consumption. Unlike simulation technology, which must build a model of each energy-consuming device from a large number of complex formulas, the invention avoids constructing such formulas, and its modeling time is shorter than that of simulation.

The invention has the following advantageous effects. Traditional numerical simulation of the whole process is difficult to apply, has a long scheme design cycle and requires a heavy workload. To address this, the invention provides a method that fits the parameters of the oil-gas gathering and transportation equipment with machine learning and establishes an energy consumption prediction model. With this method, the modeling difficulty can be greatly reduced, the design cycle shortened and the workload reduced, providing a reasonable reference basis for adjusting and optimizing the oil-gas gathering and transportation scheme.

Drawings

FIG. 1 is a graph of data correlation analysis in accordance with an embodiment of the present invention;

FIG. 2 is a graph of feature importance analysis in an embodiment of the present invention;

FIG. 3 is a schematic diagram of a random forest in accordance with an embodiment of the present invention;

FIG. 4 is a schematic cross-validation in accordance with an embodiment of the present invention;

FIG. 5 is a diagram illustrating a learning curve according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of model optimization in an embodiment of the present invention;

FIG. 7 is a flowchart of an embodiment of the energy consumption analysis and optimization method of the oil and gas gathering and transportation system according to the present invention.

Detailed Description

In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.

Fig. 7 is a flowchart of the energy consumption analysis and optimization method of the oil and gas gathering and transportation system of the present invention, which includes the following steps:

In step 101, operation data of the oilfield gathering and transportation system are collected from the oilfield monitoring equipment. The operation data include the process parameters and energy consumption data of each component of the system during actual operation, for example the produced liquid volume handled by the system and the temperature, pressure and water content at each component. The following equipment and parameters are considered in this embodiment:

Three-phase separator: inlet flow, level, gun level, outlet oil pressure

Settling tank: liquid level, oil-water interface, volume, quality of pure oil

Heating furnace: inlet pressure, inlet temperature, outlet temperature

Dewatering pump: outlet pressure, variable frequency/power frequency

Stabilizing tower: pressure, temperature

Tower bottom pump: outlet pressure, variable frequency/power frequency

External delivery pump: outlet pressure, variable frequency/power frequency, temperature, instantaneous flow, accumulated flow, power consumption

Other factors: ambient temperature

The parameters of each device are recorded in a separate table. The parameter tables of the different devices are then spliced together to form a data set containing all energy-consuming device parameters. All structured, unstructured and semi-structured data on the energy-consuming device parameters relevant to the analysis are extracted by full sampling. The resulting table has m rows and n columns, where m is the number of data samples, n is the number of parameters, and m is much larger than n. Because machine learning cannot work with too little data, samples are collected in the tens of thousands. The n parameters include the inlet flow, level, gun level, outlet oil pressure, inlet temperature, outlet pressure, variable frequency/power frequency, and so on. Since machine learning places high demands on data quality, the data merged in step 101 are processed further in step 102.
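The table splicing described above can be sketched as follows. This is a minimal illustration that assumes every device table shares a common sampling-time index; the column names, timestamps and values are hypothetical and do not come from the patent.

```python
import pandas as pd

# Hypothetical sampling times shared by all device tables.
timestamps = pd.to_datetime([
    "2020-01-01 00:00", "2020-01-01 01:00",
    "2020-01-01 02:00", "2020-01-01 03:00",
])

# One table per device (illustrative columns and values only).
separator = pd.DataFrame(
    {"separator_inlet_flow": [120.0, 118.5, 121.2, 119.8],
     "separator_level": [1.8, 1.9, 1.7, 1.8]},
    index=timestamps,
)
furnace = pd.DataFrame(
    {"furnace_inlet_temp": [42.0, 41.5, 43.1, 42.4],
     "furnace_outlet_temp": [68.0, 67.2, 69.5, 68.3]},
    index=timestamps,
)
delivery_pump = pd.DataFrame(
    {"pump_outlet_pressure": [1.20, 1.30, 1.20, 1.25],
     "power_consumption": [310.0, 305.0, 322.0, 314.0]},
    index=timestamps,
)

# Splice the tables column-wise on the shared time index.
# The result has m rows (samples) and n columns (parameters).
dataset = pd.concat([separator, furnace, delivery_pump], axis=1, join="outer")
m, n = dataset.shape
print(m, n)
```

An outer join keeps every timestamp even when one device did not record a value, which leaves gaps to be handled by the missing-value step that follows.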

In step 102, the data merged in step 101 are preprocessed. The preprocessing consists mainly of data cleaning and aims to improve data quality. The data are processed as follows in this embodiment:

Missing value processing: each missing value in a data column of the data table is filled with the column mean, according to the formula:

$$v_{i,j} = \frac{1}{m}\sum_{k=1}^{m} v_{k,j}, \quad \text{if } v_{i,j} = \text{null}$$

where null denotes a missing value, i is the row index, j is the column index, m is the number of samples, and $v_{i,j}$ is the value in row i and column j of the data table;
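A minimal sketch of mean-value filling is given below. It assumes that missing entries are encoded as NaN and that the column mean is taken over the observed (non-missing) values only; both choices are assumptions made for illustration, and the function name is hypothetical.

```python
import numpy as np

def fill_missing_with_column_mean(table):
    """Replace NaN entries with the mean of the observed values in the same column."""
    filled = np.asarray(table, dtype=float).copy()
    column_means = np.nanmean(filled, axis=0)   # means ignore the NaN entries
    rows, cols = np.where(np.isnan(filled))     # positions of missing values
    filled[rows, cols] = column_means[cols]
    return filled

# Illustrative 3 x 2 table with one missing value per column.
table = np.array([[1.0, 10.0],
                  [np.nan, 20.0],
                  [3.0, np.nan]])
filled = fill_missing_with_column_mean(table)
print(filled)
```

Here the first column is filled with 2.0 and the second with 15.0, the means of the values that were actually recorded.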

Outlier processing: abnormal data in the data table (values outside the normal range) are identified by visual inspection and deleted;

Data distribution processing: the distribution of each parameter is transformed into a normal distribution.

After the data preprocessing is completed, the data analysis can proceed to step 103.

In step 103, correlation analysis is performed on the data to evaluate their correctness and validity. In this embodiment the correlation analysis is carried out visually, as shown in FIG. 1: the gathering and transportation parameters are plotted and analyzed in pairs. As can be seen from FIG. 1, power consumption is positively correlated with ambient temperature, whereas output temperature has no obvious linear relationship with ambient temperature. Power consumption is therefore strongly correlated with ambient temperature.
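The pairwise analysis can also be checked numerically. The sketch below computes Pearson correlation coefficients for every pair of parameters; the patent describes a visual analysis, so the use of a correlation matrix is an assumption, and the parameter names and values are made-up illustrations rather than measured data.

```python
import numpy as np

# Illustrative (made-up) samples: one row per sample, one column per parameter.
names = ["ambient_temp", "output_temp", "power_consumption"]
data = np.array([
    [5.0, 61.0, 300.0],
    [10.0, 60.0, 310.0],
    [15.0, 62.0, 321.0],
    [20.0, 60.5, 329.0],
    [25.0, 61.5, 341.0],
])

# Pearson correlation matrix; rowvar=False treats columns as variables.
corr = np.corrcoef(data, rowvar=False)

# Report every pair of parameters once.
for a in range(len(names)):
    for b in range(a + 1, len(names)):
        print(f"{names[a]} vs {names[b]}: r = {corr[a, b]:.3f}")
```

A coefficient near 1 indicates a strong positive linear relationship, while a value near 0 indicates no obvious linear relationship, which mirrors the two cases read off FIG. 1.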

At step 104, analysis of variance of the oilfield gathering equipment parameters and feature importance analysis by preliminary fitting through a random forest machine learning method are performed.

The variance analysis calculates the variance of each feature and then removes the features whose variance is 0.

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$

where $\sigma^2$ is the population variance, X is the variable, $\mu$ is the population mean, and N is the number of instances in the population.

$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

where $S^2$ is the sample variance, X is the variable, $\bar{X}$ is the sample mean, and n is the number of samples.
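As a rough illustration of the variance filter, the sketch below computes the sample variance of each feature (divisor n - 1) and drops the features whose variance is 0. The function name, feature names and values are hypothetical.

```python
import numpy as np

def drop_zero_variance(X, names):
    """Return the columns of X with non-zero sample variance, their names, and all variances."""
    variances = X.var(axis=0, ddof=1)   # ddof=1 gives the sample variance (n - 1 divisor)
    keep = variances > 0
    kept_names = [name for name, flag in zip(names, keep) if flag]
    return X[:, keep], kept_names, variances

# Illustrative data: the first feature never changes, so it carries no information.
names = ["valve_state", "furnace_outlet_temp", "pump_outlet_pressure"]
X = np.array([
    [1.0, 68.0, 1.20],
    [1.0, 67.0, 1.30],
    [1.0, 69.5, 1.10],
    [1.0, 68.5, 1.40],
])
X_kept, kept_names, variances = drop_zero_variance(X, names)
print(kept_names)
```

A constant feature cannot help the model distinguish samples, which is why it is safe to remove before fitting.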

The feature importance analysis is to calculate the contribution of each feature to the final predicted energy consumption, and the greater the contribution, the higher the weight.

The main controlling factors (such as the water content of the incoming liquid, the output temperature and the power consumption) are identified through the two analyses above, as shown in FIG. 2. Irrelevant factors are eliminated in order to improve model accuracy and reduce model running time. Machine learning modeling is then performed in step 105.
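The feature importance analysis by preliminary fitting could look like the sketch below, which uses scikit-learn's `RandomForestRegressor` and its `feature_importances_` attribute. The choice of library is an assumption, and the data are synthetic: the target is constructed to depend strongly on the first feature, weakly on the second, and not at all on the others.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n_samples = 500
feature_names = ["water_content", "output_temp", "ambient_temp", "unrelated"]

# Synthetic features in [0, 1); names are illustrative only.
X = rng.uniform(0.0, 1.0, size=(n_samples, len(feature_names)))
# Synthetic energy consumption: strong, weak, and no dependence respectively.
y = 10.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0.0, 0.1, n_samples)

# Preliminary fit, then read off the contribution of each feature.
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X, y)

ranking = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

The importances sum to 1, so each value can be read as the share of the model's predictive contribution attributed to that feature; low-ranked features are candidates for removal.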

In step 105, a machine-learned random forest regression model is established.

The random forest principle is shown in fig. 3:

A random forest is a forest built in a random manner. It consists of many decision trees, and the trees are independent of one another. When a new sample arrives, each decision tree in the forest independently judges which class the sample belongs to, and the class receiving the most votes is taken as the final classification result. In a regression problem, the random forest outputs the average of the outputs of all decision trees.

In a random forest, each decision tree is "planted" and "grown" in four steps:

(1) assuming the training set contains N samples, N samples are drawn from it by repeated sampling with replacement (bootstrap sampling), and the result is used as the training set for growing that decision tree;

(2) if there are M input variables, m (m < M) variables are randomly selected at each node, and these m variables are used to determine the best split point. The value of m is kept constant while the decision tree is grown, and the tree is split in the direction of maximum information gain, as given by formula (4):

$$\mathrm{Gain}(D, a) = \mathrm{Ent}(D) - \sum_{v=1}^{V} \frac{|D^v|}{|D|}\,\mathrm{Ent}(D^v) \qquad (4)$$

where D is the data set, a is a feature of the data set (such as the temperature of the settling tank), Gain is the information gain, Ent(D) is the entropy of D, $\sum_{v=1}^{V} \frac{|D^v|}{|D|}\,\mathrm{Ent}(D^v)$ is the conditional entropy of D given a, and V is the number of categories.

(3) Each decision tree is grown to the maximum possible without pruning;

(4) new data are predicted by aggregating the outputs of all decision trees (majority voting in classification, averaging in regression).
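The information gain used as the splitting criterion in step (2) can be worked through on a toy example. The sketch below assumes discrete feature values and discrete class labels; the function names and data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy Ent(D) of a list of class labels, in bits."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

def information_gain(feature_values, labels):
    """Gain(D, a) = Ent(D) minus the conditional entropy of D given feature a."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)   # subset D^v for each value v
    conditional = sum(len(group) / total * entropy(group)
                      for group in groups.values())
    return entropy(labels) - conditional

# Toy data: tank temperature separates the classes perfectly, shift does not.
energy_class = ["low", "low", "high", "high"]
tank_temp = ["cold", "cold", "hot", "hot"]
shift = ["day", "night", "day", "night"]

gain_informative = information_gain(tank_temp, energy_class)    # 1.0
gain_uninformative = information_gain(shift, energy_class)      # 0.0
print(gain_informative, gain_uninformative)
```

The tree would split on the feature with the larger gain. For regression targets such as energy consumption, common library implementations typically choose splits by variance (squared error) reduction instead of entropy, so this example illustrates the formula rather than any particular library's behavior.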

Since energy consumption prediction is a regression problem, regression averaging is used when building the model. First, the sampled data set is divided into a training set, a test set and a verification set in proportions of 80%, 10% and 10%; the features are the parameters of each device and the target is energy consumption.
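A minimal sketch of the 80% / 10% / 10% division is shown below. It assumes the samples are shuffled randomly before splitting, which the text does not specify; the function name and seed are hypothetical.

```python
import numpy as np

def split_80_10_10(n_samples, seed=0):
    """Return shuffled index arrays for the training, test and verification sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)      # random order of all sample indices
    n_train = int(0.8 * n_samples)
    n_test = int(0.1 * n_samples)
    train_idx = order[:n_train]
    test_idx = order[n_train:n_train + n_test]
    verify_idx = order[n_train + n_test:]   # remainder, about 10%
    return train_idx, test_idx, verify_idx

train_idx, test_idx, verify_idx = split_80_10_10(1000)
print(len(train_idx), len(test_idx), len(verify_idx))
```

Working with index arrays keeps the features and the energy consumption target aligned, since both can be sliced with the same indices.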

Next, the model is trained on the training set using cross-validation and grid search to determine the parameters of the random forest model. Cross-validation is illustrated in FIG. 4.
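Cross-validated grid search can be sketched with scikit-learn's `GridSearchCV`. The library, the 5-fold setting, the parameter grid and the synthetic training data are all assumptions chosen for illustration; the patent does not state which values were searched.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the training set (features and energy consumption).
rng = np.random.default_rng(0)
X_train = rng.uniform(size=(200, 3))
y_train = 5.0 * X_train[:, 0] + X_train[:, 1] ** 2 + rng.normal(0.0, 0.05, 200)

# Candidate hyper-parameter values (illustrative grid).
param_grid = {"n_estimators": [20, 50], "max_depth": [3, None]}

# Every grid combination is scored by 5-fold cross-validation on the training set.
search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    param_grid,
    cv=5,
    scoring="neg_mean_squared_error",
)
search.fit(X_train, y_train)

best_params = search.best_params_
print(best_params, search.best_score_)
```

After fitting, `search.best_estimator_` holds a model refit on the whole training set with the selected parameters, ready to be evaluated on the test set.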

Then, the fit of the model is evaluated on the test set, using the root mean square error (RMSE) as the evaluation criterion. With real energy consumption E and model-predicted energy consumption E', the RMSE is:

$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(E_i - E'_i\right)^2}$$

where m is the number of predicted samples; a smaller RMSE indicates a better fit.
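A small worked example of the RMSE criterion follows. The function name and the energy values are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between real and predicted energy consumption."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    m = len(actual)
    squared_errors = ((e - p) ** 2 for e, p in zip(actual, predicted))
    return math.sqrt(sum(squared_errors) / m)

# Errors are 2, -3 and 0, so RMSE = sqrt((4 + 9 + 0) / 3), about 2.08.
real_energy = [100.0, 110.0, 120.0]
predicted_energy = [98.0, 113.0, 120.0]
error = rmse(real_energy, predicted_energy)
print(round(error, 2))
```

Because the result is expressed in the same unit as energy consumption, it can be compared directly against the typical magnitude of the target.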

Finally, the hyper-parameters of the model are adjusted with the verification set. Specifically, the verification set is fed into the trained model and a learning curve is drawn, as shown in FIG. 5. The best hyper-parameters are those at which the upper and lower curves are close together and both relatively high. The hyper-parameters include the number of trees, the depth of the trees, the number of leaf nodes per tree, and so on. Once the hyper-parameters are determined and the model is trained in step 105, optimal parameter selection can be performed in step 106.
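The curve-based tuning can be approximated with scikit-learn's `validation_curve`, which scores a model on training and held-out folds for each value of one hyper-parameter. This sketch substitutes cross-validation folds for the patent's separate verification set, tunes only tree depth, and uses synthetic data; all three are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import validation_curve

# Synthetic stand-in for the features and energy consumption.
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 3))
y = 5.0 * X[:, 0] + X[:, 1] ** 2 + rng.normal(0.0, 0.05, 200)

depths = [1, 2, 4, 8]
train_scores, val_scores = validation_curve(
    RandomForestRegressor(n_estimators=30, random_state=0),
    X, y,
    param_name="max_depth",
    param_range=depths,
    cv=5,
    scoring="r2",
)

# Average over folds: one point per depth on each curve.
train_mean = train_scores.mean(axis=1)   # upper curve (training score)
val_mean = val_scores.mean(axis=1)       # lower curve (held-out score)
gap = train_mean - val_mean              # a small gap suggests little over-fitting

best_depth = depths[int(np.argmax(val_mean))]
print(best_depth, gap)
```

Picking the depth with the highest held-out score is one simple reading of "both curves high"; inspecting `gap` alongside it covers the "curves close together" part of the criterion.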

In step 106, the overall flow is as shown in FIG. 6. Using the machine learning model trained in step 105, candidate values of each adjustable parameter are enumerated, schemes are generated by exhaustive enumeration or by orthogonal experimental design, each scheme is evaluated with the quantitative energy consumption prediction model, and the schemes are ranked to select the optimal one.
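The exhaustive variant of scheme generation and ranking can be sketched as follows. The function `predict_energy` is a hypothetical stand-in for the trained random forest model, and the adjustable parameters and their candidate values are invented for illustration.

```python
from itertools import product

def predict_energy(scheme):
    """Hypothetical stand-in for the trained energy consumption prediction model."""
    return (0.8 * (scheme["furnace_outlet_temp"] - 60) ** 2
            + 5.0 * abs(scheme["pump_frequency"] - 45)
            + 300.0)

# Candidate values of each adjustable parameter (illustrative).
adjustable = {
    "furnace_outlet_temp": [55, 60, 65, 70],
    "pump_frequency": [40, 45, 50],
}

# Exhaustive enumeration: one scheme per combination of candidate values.
names = list(adjustable)
schemes = [dict(zip(names, values)) for values in product(*adjustable.values())]

# Rank the schemes by predicted energy consumption, lowest first.
ranked = sorted(schemes, key=predict_energy)
best = ranked[0]
print(len(schemes), best, predict_energy(best))
```

The number of schemes grows as the product of the candidate counts, which is why an orthogonal experimental design that evaluates only a balanced subset becomes attractive when many parameters are adjustable.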

The energy consumption analysis and optimization method for the oil-gas gathering and transportation system can greatly reduce modeling difficulty, shorten the design cycle and reduce the workload, thereby providing a reasonable reference basis for adjusting and optimizing the oil-gas gathering and transportation scheme. However, the model used has low interpretability and places high demands on data quality, and it must be trained well to achieve good results.

Claims

1. The energy consumption analysis and optimization method of the oil gas gathering and transportation system is characterized by comprising the following steps:

step 1, collecting operation data of an oil field gathering and transportation system;

step 2, preprocessing the collected data;

step 3, carrying out correlation analysis on the data to evaluate the correctness and the validity of the data;

step 4, carrying out variance analysis on the parameters of the oil field gathering and transportation equipment and carrying out feature importance analysis through preliminary fitting of a random forest machine learning method;

step 5, establishing a machine learning random forest regression model;

step 6, evaluating each candidate scheme with the quantitative energy consumption prediction model, and ranking the schemes to select the optimal one.

2. The energy consumption analysis and optimization method for an oil and gas gathering and transportation system according to claim 1, wherein in step 1, the operation data of the oil field gathering and transportation system comprises process parameters and energy consumption data of each component in the gathering and transportation system during actual operation, and specifically comprises the liquid production amount processed by the gathering and transportation system and the temperature, pressure and water content of each component.

3. The method for analyzing and optimizing energy consumption of an oil and gas gathering and transportation system according to claim 2, wherein in step 1, parameters of different devices are collected to form a data set including all energy consumption device parameters, and all structured, unstructured and semi-structured data of the oil and gas gathering and transportation energy consumption device parameters related to analysis are extracted in an all-sampling manner.

4. The method for analyzing and optimizing energy consumption of an oil and gas gathering and transportation system as claimed in claim 1, wherein in step 2, the data preprocessing performed comprises: missing value processing, outlier processing, and data distribution processing.

5. The energy consumption analysis and optimization method of an oil and gas gathering and transportation system according to claim 4, wherein in step 2, when missing value processing is performed, each missing value in a data column of the data table is filled with the column mean, according to the formula:

$$v_{i,j} = \frac{1}{m}\sum_{k=1}^{m} v_{k,j}, \quad \text{if } v_{i,j} = \text{null}$$

where null denotes a missing value, i is the row index, j is the column index, m is the number of samples, and $v_{i,j}$ is the value in row i and column j of the data table.

6. The energy consumption analysis and optimization method of an oil and gas gathering and transportation system according to claim 4, characterized in that in step 2, when abnormal values are processed, abnormal data exist in the data table, namely, the numerical value exceeds the normal range, and abnormal points are identified and deleted by a visual means; when data distribution processing is carried out, the distribution of each parameter is modified into normal distribution.

7. The method for analyzing and optimizing energy consumption of an oil and gas gathering and transportation system according to claim 1, wherein in step 3, gathering and transportation parameters are analyzed in a pairwise visualization manner.

8. The method for analyzing and optimizing energy consumption of an oil and gas gathering and transportation system according to claim 1, wherein in step 4, the variance analysis calculates the variance of each feature and then deletes the features whose variance is 0:

$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$

where $S^2$ is the sample variance, X is the variable, $\bar{X}$ is the sample mean, and n is the number of samples.

9. The method for analyzing and optimizing energy consumption of an oil and gas gathering and transportation system according to claim 1, wherein in step 4, the feature importance analysis is to calculate the contribution of each feature to the final predicted energy consumption, and the greater the contribution, the higher the weight.

10. The method for analyzing and optimizing energy consumption of an oil and gas gathering and transportation system according to claim 1, wherein in step 5, regression averaging is used in constructing the model; the sampled data set is first divided into a training set, a test set and a verification set; the division proportions of the three data sets are set to 80%, 10% and 10%, the features are the parameters of each device, the target is energy consumption, and the method specifically comprises the following steps:

$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(E_i - E'_i\right)^2}$$

where m is the number of predicted samples; the smaller the RMSE, the better the fit;

11. The energy consumption analysis and optimization method for an oil and gas gathering and transportation system according to claim 1, wherein in step 6, using the machine learning model trained in step 5, candidate values of each adjustable parameter are enumerated, schemes are generated by exhaustive enumeration or by orthogonal experimental design, each scheme is evaluated with the quantitative energy consumption prediction model, and the schemes are ranked to select the optimal one.