CN113379093A - Energy consumption analysis and optimization method for oil gas gathering and transportation system - Google Patents
- Publication number
- CN113379093A (application CN202010159512.8A)
- Authority
- CN
- China
- Prior art keywords
- energy consumption
- data
- oil
- transportation system
- analysis
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides an energy consumption analysis and optimization method for an oil-gas gathering and transportation system, comprising the following steps: step 1, collecting operation data of an oil field gathering and transportation system; step 2, preprocessing the collected data; step 3, carrying out correlation analysis on the data to evaluate their correctness and validity; step 4, carrying out variance analysis on the parameters of the oil field gathering and transportation equipment, and carrying out feature importance analysis through a preliminary fit with the random forest machine learning method; step 5, establishing a machine learning random forest regression model; and step 6, evaluating each candidate scheme with the big data model for quantitative energy consumption prediction and ranking the schemes to select the optimal one. The method can greatly reduce modeling difficulty, shorten the design period and reduce the workload, thereby providing a reasonable reference basis for adjusting and optimizing the oil-gas gathering and transportation scheme.
Description
Technical Field
The invention relates to the technical field of big data application and oil-gas gathering and transportation energy conservation, in particular to an energy consumption analysis and optimization method for an oil-gas gathering and transportation system.
Background
In an oil-gas gathering and transportation system, many nodes influence energy consumption, such as the outlet water temperature of the heating furnace, the water content at the oil outlet of the three-phase separator, and the temperatures at the upper, middle and lower levels of the settling tank. The conventional energy-saving method establishes an energy consumption conservation model of the gathering and transportation system by combining the energy consumption analysis rules of the system with an energy balance test, and then solves the model with an optimization algorithm whose objective is to minimize the electric power and heat consumption of the system.
The data accumulated in the oil gas gathering and transportation production has the following characteristics:
1. Multi-event excitation, high dimensionality and strong coupling. Data are acquired frequently and at high density, so repeated and redundant data exist; many system parameters influence one another and jointly determine the behavior state of the system;
2. Instability. The oil-gas gathering and transportation production system is unstable, and the collected data are easily contaminated by industrial noise;
3. Dynamics and diversity of data types. Parameters such as pressure, temperature, flow and equipment state change constantly over time and include several data types, such as logical and numerical;
4. Multiple time scales and incompleteness. Different parameters are sampled at different frequencies, and data may be lost because recording is asynchronous;
5. Multiple modes. The gathering and transportation system has both normal operating states and fault conditions.
When establishing an energy consumption analysis and optimization model, traditional simulation technology involves complex work and a long research period, and the characteristics of the large volume of complex oil-gas gathering and transportation data must also be considered, which further increases the modeling difficulty.
In recent years, big-data-driven AI machine learning techniques have advanced significantly and have been applied very successfully in language and image processing. With the development of parallel computing architectures, machine learning can also run online, and its high real-time performance and low complexity make it possible to combine the analysis and the optimization of oil field gathering and transportation closely. Big data analysis technology has been applied in other energy industries such as electric power and coal, and its application effect is particularly evident in electronic information.
In the field of oilfield big data, the production operation rule of the oilfield is analyzed according to the big data by the sectional Zealand and the like so as to solve the problem of oilfield production business; the problems and bottlenecks in management are solved through analysis of oilfield management data. Zhang uses big data analysis technology to carry out application analysis in the aspects of abnormal well automatic identification, sealing driving device matching technology and screw pump oil extraction process system optimization. The types of big data analysis technologies are discussed in the plum garden and the like, and the construction method of the oil field big data analysis platform and the application of the big data analysis technology in oil field production are summarized.
Among domestic patents, few published patents apply machine learning to oil-gas gathering and transportation optimization. The patent "machine learning-based three-segment continuous stepping heating furnace optimization energy-saving method" (application number: CN201910526161.7) introduces the time dimension into operating experience and abstracts the rule that heating furnace thermal efficiency decays over time, thereby addressing the deviation in employee experience. Machine learning is applied more widely in other fields. The patent "optimization method, device, terminal equipment and storage medium of the machine learning model" (application number: CN201810687251.X) discloses a method for optimizing a machine learning model, comprising: acquiring a plurality of data to be processed; inputting the data to be processed into a machine learning model, and using the model to screen the data that meet preset conditions as annotation data, wherein the annotation data comprise training data and verification data; training the machine learning model with the training data to determine a trained model; and updating the machine learning model based at least on the trained model and the verification data. The patent "method and device for identifying negative financial information based on a machine learning algorithm" (application number: CN201910789700.6) discloses a method that analyzes financial information text written in natural language with a machine learning algorithm to judge whether the sentiment reflected by the text is negative; large volumes of information can thus be processed by computer, and a pre-built algorithm model can judge more accurately whether an information text is negative. The patent with application number CN201610715882.9 discloses an ore visible-light image sorting method based on Adaboost machine learning, which learns from, trains on and predicts visible-light images of ore. It can closely imitate experienced mineral separation technicians, so that every machine has the same accurate ore sorting experience; this avoids the subjectivity and individual differences of manual sorting, allows machines to replace people or to complete work that people cannot, reduces labor intensity, and improves product quality and labor productivity.
However, no one has yet studied fine-grained oil-gas gathering and transportation optimization using big data machine learning technology. An energy consumption analysis and optimization method for the oil field gathering and transportation system based on a big data machine learning algorithm is therefore proposed. To this end, a new energy consumption analysis and optimization method for the oil-gas gathering and transportation system has been invented, which solves the above technical problems.
Disclosure of Invention
The invention aims to provide an energy consumption analysis and optimization method for an oil gas gathering and transportation system, which combines big data application with actual requirements of an oil field, reduces gathering and transportation energy consumption and provides more accurate technical support for oil gas gathering and transportation.
The object of the invention can be achieved by the following technical measures: the energy consumption analysis and optimization method of the oil-gas gathering and transportation system comprises the following steps: step 1, collecting operation data of an oil field gathering and transportation system; step 2, preprocessing the collected data; step 3, carrying out correlation analysis on the data to evaluate their correctness and validity; step 4, carrying out variance analysis on the parameters of the oil field gathering and transportation equipment, and carrying out feature importance analysis through a preliminary fit with the random forest machine learning method; step 5, establishing a machine learning random forest regression model; and step 6, evaluating each candidate scheme with the big data model for quantitative energy consumption prediction and ranking the schemes to select the optimal one.
The object of the invention can also be achieved by the following technical measures:
in step 1, the operation data of the oil field gathering and transportation system includes process parameters and energy consumption data of each component in the gathering and transportation system during actual operation, and specifically includes the extracted liquid amount processed by the gathering and transportation system and the temperature, pressure and water content of each component.
In step 1, parameters of different devices are collected to form a data set including all energy consumption device parameters, and all structured, unstructured and semi-structured data of the oil gas gathering and transportation energy consumption device parameters related to analysis are extracted in an all-sampling mode.
In step 2, the data preprocessing performed includes: missing value processing, outlier processing, and data distribution processing.
In step 2, when missing values are processed, each missing value in a data column of the data table is filled by the mean value method, with the formula:
$$v_{i,j} = \frac{1}{m}\sum_{k=1}^{m} v_{k,j}, \quad \text{if } v_{i,j} = \text{null}$$
where null is a missing value, i is the row index, j is the column index, m is the number of samples, and $v_{i,j}$ is the value in row i, column j of the data table.
In step 2, when abnormal values are processed, abnormal data exist in the data table, namely, the numerical values exceed the normal range, and abnormal points are identified and deleted by adopting a visual means; when data distribution processing is carried out, the distribution of each parameter is modified into normal distribution.
In step 3, each pair of gathering and transportation parameters is visualized and analyzed.
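In step 4, the analysis of variance calculates the variance of each feature and then deletes every feature whose variance is 0:
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
where $\sigma^2$ is the population variance, X is the variable, $\mu$ is the population mean, and N is the number of instances in the population;
in actual work, when the population mean is difficult to obtain, sample statistics are used in place of the population parameters, and after correction the sample variance is calculated as:
$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
where $S^2$ is the sample variance, $\bar{X}$ is the sample mean, and n is the number of samples.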
In step 4, the feature importance analysis is to calculate the contribution of each feature to the final predicted energy consumption, and the greater the contribution, the higher the weight.
In step 5, regression averaging is used in constructing the model; the sampled data set is first divided into a training set, a test set and a verification set in proportions of 80%, 10% and 10%; the features are the parameters of each device and the target is energy consumption; the step specifically comprises:
firstly, adopting cross validation and grid search means, training a model by using a training set, and determining random forest model parameters;
evaluating the fitting effect of the model by using the test set, with the root mean square error (RMSE) as the evaluation standard; assuming the real energy consumption is E and the energy consumption predicted by the model is E', the RMSE is:
$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(E_i - E'_i\right)^2}$$
where m is the number of predicted samples; the smaller the RMSE, the better the fit;
finally, adjusting the model hyper-parameters by using the verification set; the specific mode is that a verification set is brought into a trained model, and a learning curve is drawn; the best hyper-parameter is when the upper curve is close to the lower curve and both are relatively high.
In step 6, using the machine learning model trained in step 5, candidate values of each adjustable parameter are enumerated, schemes are generated by exhaustive enumeration or an orthogonal experiment, each scheme is evaluated with the big data model for quantitative energy consumption prediction, and the schemes are ranked to select the optimal one.
The energy consumption analysis and optimization method of the oil and gas gathering and transportation system is different from the conventional method of establishing an energy consumption conservation model of the gathering and transportation system by combining an energy consumption analysis rule of the oil and gas gathering and transportation system and an energy balance test and analyzing the model by using an optimization algorithm. A quick and efficient management means is provided for an energy-saving manager, the oil-gas gathering and transportation system is operated efficiently, and energy is saved finally.
The method adopts random forest machine learning. It first cleans the collected parameter data of the energy consumption devices (three-phase separator, settling tank, heating furnace, dewatering pump, stabilizing tower, tower bottom pump and external delivery pump) and then, following the machine learning modeling process, divides the data set into a training set, a test set and a verification set. The model is trained on the training set with cross-validation and grid search to determine its hyper-parameters. The fitting effect of the model is evaluated on the test set, with the mean square error as the evaluation standard. Finally, the model parameters are adjusted with the verification set, giving the best-performing random forest regression model for predicting energy consumption. Candidate values of each adjustable parameter are then enumerated through the machine learning model, schemes are generated by exhaustive enumeration or an orthogonal experiment, each scheme is evaluated with the big data model for quantitative energy consumption prediction, and the schemes are ranked to select the optimal one, thereby optimizing the oil-gas gathering and transportation process and reducing energy consumption. Simulation technology must build a model of each energy-consuming device from a large number of complex formulas; the invention avoids constructing these formulas, and its modeling time is shorter than that of simulation.
The invention has the following excellent effects: aiming at the phenomena of high application difficulty, long scheme design period and high workload of the traditional numerical simulation means based on the whole flow, the invention provides a processing method for fitting various equipment parameters of oil and gas transportation by using a machine learning method and establishing an energy consumption prediction model. By adopting the improved method, the modeling difficulty can be greatly reduced, the design period is shortened, the workload is reduced, and a reasonable reference basis is provided for the adjustment and optimization of the oil-gas gathering and transportation scheme.
Drawings
FIG. 1 is a graph of data correlation analysis in accordance with an embodiment of the present invention;
FIG. 2 is a graph of feature importance analysis in an embodiment of the present invention;
FIG. 3 is a schematic diagram of a random forest in accordance with an embodiment of the present invention;
FIG. 4 is a schematic cross-validation in accordance with an embodiment of the present invention;
FIG. 5 is a diagram illustrating a learning curve according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of model optimization in an embodiment of the present invention;
FIG. 7 is a flowchart of an embodiment of the energy consumption analysis and optimization method of the oil and gas gathering and transportation system according to the present invention.
Detailed Description
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Fig. 7 is a flowchart of the energy consumption analysis and optimization method of the oil and gas gathering and transportation system of the present invention, which includes the following steps:
In step 101, operation data of the oilfield gathering and transportation system are collected from the oilfield monitoring equipment. The operation data comprise the process parameters and energy consumption data of each component of the gathering and transportation system during actual operation, specifically the amount of produced liquid processed by the system and the temperature, pressure, water content and the like of each component. The following equipment and parameters are considered in this embodiment:
Three-phase separator: inlet flow, level, gun level, outlet oil pressure
Settling tank: liquid level, oil-water interface, volume, quality of pure oil
Heating furnace: inlet pressure, inlet temperature, outlet temperature
Dewatering pump: outlet pressure, variable frequency power frequency
Stabilizing tower: pressure, temperature
Tower bottom pump: outlet pressure, variable frequency power frequency
External delivery pump: outlet pressure, variable frequency power frequency, temperature, instantaneous flow, accumulated flow, power consumption
Other factors: ambient temperature
Each device is recorded in its own table. The parameter tables of the different devices are spliced to form a data set containing all energy consumption equipment parameters. All structured, unstructured and semi-structured data on the oil-gas gathering and transportation energy consumption equipment parameters relevant to the analysis are extracted by full sampling. The sampled table has m rows and n columns, where m is the number of data samples, n is the number of parameters, and m is much larger than n. Because machine learning cannot work with too little data, samples are collected in the tens of thousands. The n parameters include the inlet flow, level, gun level, outlet oil pressure, inlet temperature, outlet pressure, variable frequency power frequency, etc. Since machine learning places high demands on data quality, the data merged in step 101 are further processed in step 102.
In step 102, data preprocessing is performed on the data merged in step 101. The data preprocessing mainly carries out data cleaning work and aims to improve the data quality. The following processing is performed on the data in this example:
missing value processing: each missing value in a data column of the data table is filled by the mean value method, with the formula:
$$v_{i,j} = \frac{1}{m}\sum_{k=1}^{m} v_{k,j}, \quad \text{if } v_{i,j} = \text{null}$$
where null is the missing value, i is the row index, j is the column index, m is the number of samples, and $v_{i,j}$ is the value in row i, column j of the data table;
abnormal value processing: abnormal data (numerical values exceed a normal range) exist in the data table, and abnormal points are identified and deleted by adopting a visual means;
data distribution processing: the distribution of each parameter is modified to be normal.
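The mean value filling described above can be sketched in a few lines. This is a minimal illustration only, assuming the data table is held as a list of rows with `None` marking a missing value; the function name `fill_missing_with_column_mean` and the sample readings are hypothetical and do not come from the embodiment.

```python
from statistics import mean


def fill_missing_with_column_mean(table):
    """Return a copy of the table in which each None is replaced by the
    mean of the observed (non-missing) values in the same column."""
    n_cols = len(table[0])
    filled = [list(row) for row in table]
    for j in range(n_cols):
        observed = [row[j] for row in table if row[j] is not None]
        if not observed:
            continue  # a column with no observed values is left unchanged
        col_mean = mean(observed)
        for row in filled:
            if row[j] is None:
                row[j] = col_mean
    return filled


# Hypothetical readings: outlet pressure (MPa) and outlet temperature (deg C).
readings = [
    [0.50, 60.0],
    [None, 62.0],
    [0.70, None],
]
cleaned = fill_missing_with_column_mean(readings)
```

The sketch averages only the observed entries of a column, which is one reasonable reading of the mean value method; the original table is left untouched so that the raw data remain available for the abnormal value check.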
After the data preprocessing is completed, the data analysis can proceed to step 103.
In step 103, correlation analysis is first performed on the data to evaluate their correctness and validity. In this example the correlation analysis is carried out by visualization, as shown in FIG. 1: each pair of gathering and transportation parameters is plotted and analyzed. As can be seen from FIG. 1, power consumption is positively correlated with ambient temperature, whereas output temperature has no obvious linear relationship with ambient temperature. Power consumption is therefore strongly correlated with ambient temperature.
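The pairwise analysis in this step is visual, but a numerical companion can help when many parameters are involved. The following sketch computes the Pearson correlation coefficient for every pair of columns; the choice of the Pearson coefficient, the column names and the sample values are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences.
    Assumes neither sequence is constant (non-zero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def pairwise_correlations(columns):
    """Map each pair of column names to its Pearson coefficient."""
    return {
        (a, b): pearson(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }


# Hypothetical, hand-made values chosen only to show the two cases in the text.
columns = {
    "ambient_temperature": [5.0, 10.0, 15.0, 20.0],
    "power_consumption": [110.0, 120.0, 130.0, 140.0],
    "output_temperature": [60.0, 62.0, 62.0, 60.0],
}
correlations = pairwise_correlations(columns)
```

A coefficient near 1 or -1 indicates a strong linear relationship and a coefficient near 0 indicates none, so the table complements, rather than replaces, the scatter plots of FIG. 1.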
At step 104, analysis of variance of the oilfield gathering equipment parameters and feature importance analysis by preliminary fitting through a random forest machine learning method are performed.
The analysis of variance calculates the variance of each feature and then removes every feature whose variance is 0:
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
where $\sigma^2$ is the population variance, X is the variable, $\mu$ is the population mean, and N is the number of instances in the population.
In actual work, when the population mean is difficult to obtain, sample statistics are used in place of the population parameters, and after correction the sample variance is calculated as:
$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
where $S^2$ is the sample variance, $\bar{X}$ is the sample mean, and n is the number of samples.
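The variance screening can be illustrated as follows. The sketch implements both the population variance and the corrected sample variance and drops columns whose sample variance is 0; the column names and values are hypothetical.

```python
def population_variance(values):
    """Population variance: squared deviations from the mean, divided by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Corrected sample variance: squared deviations divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def drop_zero_variance(columns):
    """Keep only the columns whose sample variance is greater than 0."""
    return {
        name: values
        for name, values in columns.items()
        if sample_variance(values) > 0.0
    }


# Hypothetical columns: a varying pressure and a valve state that never changes.
columns = {
    "inlet_pressure": [0.4, 0.5, 0.6],
    "valve_state": [1.0, 1.0, 1.0],
}
kept = drop_zero_variance(columns)
```

A constant column carries no information that could explain changes in energy consumption, which is why it can be removed before modeling without loss.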
The feature importance analysis is to calculate the contribution of each feature to the final predicted energy consumption, and the greater the contribution, the higher the weight.
The first two steps identify the main controlling factors (such as the water content of the incoming liquid, the output temperature and the power consumption), as shown in FIG. 2. Irrelevant factors are eliminated in order to improve model accuracy and reduce model running time. Next, machine learning modeling is performed in step 105.
In step 105, a machine-learned random forest regression model is established.
The random forest principle is shown in fig. 3:
the random forest is a forest established in a random mode, the forest is composed of a plurality of decision trees, and each decision tree has no relation with each other. When a new sample exists, each decision tree of the forest is judged respectively, which class the sample belongs to is judged, and then the most classes are selected in a voting mode to serve as a final classification result. In the regression problem, the random forest outputs the average of all decision tree outputs.
In a random forest, each decision tree is "planted" and "grown" in four steps:
(1) assuming the training set contains N samples, N samples are drawn by sampling with replacement (bootstrap sampling), and the result is used as the training set of the decision tree being generated;
(2) if there are M input variables, each node randomly selects m (m < M) of them and then uses these m variables to determine the best split point. The value of m is kept unchanged while the decision tree is generated, and each split is made in the direction of maximum information gain, as given by formula (4):
$$\mathrm{Gain}(D, a) = \mathrm{Ent}(D) - \sum_{v=1}^{V} \frac{|D^v|}{|D|}\,\mathrm{Ent}(D^v) \qquad (4)$$
where D is the data set, a is a feature in the data set (such as the temperature of the settling tank), Gain is the information gain, Ent(D) is the entropy of D, $\sum_{v=1}^{V} \frac{|D^v|}{|D|}\mathrm{Ent}(D^v)$ is the conditional entropy of D given a, $D^v$ is the subset of D in which a takes its v-th value, and V is the number of categories.
(3) Each decision tree is grown to the maximum possible without pruning;
(4) new data is predicted by summing all decision trees (majority voting in classification, averaging in regression).
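The building blocks of the four steps above can be sketched as follows: bootstrap sampling for step (1), random selection of m out of M variables for step (2), and the information gain of formula (4). The sketch assumes discrete feature values and class labels (for example, energy consumption binned into "high" and "low"); all function names and data are hypothetical.

```python
import random
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the conditional entropy given the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, as in step (1)."""
    return [rng.choice(rows) for _ in rows]


def random_feature_subset(feature_names, m, rng):
    """Pick m distinct features out of the M available, as in step (2)."""
    return rng.sample(feature_names, m)


# Hypothetical example: tank temperature level versus binned energy use.
labels = ["high", "high", "low", "low"]
informative = ["hot", "hot", "cold", "cold"]    # separates the labels fully
uninformative = ["a", "b", "a", "b"]            # tells nothing about the labels
gain_informative = information_gain(informative, labels)
gain_uninformative = information_gain(uninformative, labels)
```

A feature that separates the labels perfectly attains the full entropy of the labels as its gain, while a feature unrelated to the labels attains a gain of 0, so a node prefers the former when choosing its split.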
Since energy consumption prediction is a regression problem, regression averaging is used when building the model. First, the sampled data set is divided into a training set, a test set and a verification set in proportions of 80%, 10% and 10%; the features are the parameters of each device, and the target is energy consumption.
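The 80%, 10% and 10% division can be sketched as below. Shuffling with a fixed seed before splitting is an assumption of this illustration, as is the function name `split_dataset`; for data ordered in time, a chronological split may be more appropriate.

```python
import random


def split_dataset(rows, train_frac=0.8, test_frac=0.1, seed=0):
    """Shuffle a copy of the rows and split it into training, test and
    verification sets; the verification set takes whatever remains."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_test = round(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    verification = shuffled[n_train + n_test:]
    return train, test, verification


# Hypothetical data set of 100 sample indices.
rows = list(range(100))
train, test, verification = split_dataset(rows)
```

Because the three sets are disjoint slices of one shuffled list, no sample can appear in more than one set.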
Next, the model is trained on the training set using cross-validation and grid search to determine the parameters of the random forest model. Cross-validation is illustrated in FIG. 4.
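The combination of cross-validation and grid search can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: the helper names are hypothetical, the folds are contiguous (data is assumed to be shuffled beforehand), and a trivial "scaled mean" predictor stands in for the random forest so that the example runs with the standard library alone.

```python
import itertools
import math


def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds over n rows."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size


def grid_search_cv(param_grid, fit, predict, X, y, k=5):
    """Return (best_params, best_rmse) by mean validation RMSE over k folds."""
    names = sorted(param_grid)
    best_params, best_score = None, math.inf
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for train_idx, val_idx in k_fold_indices(len(X), k):
            model = fit([X[i] for i in train_idx], [y[i] for i in train_idx], **params)
            errors = [(predict(model, X[i]) - y[i]) ** 2 for i in val_idx]
            fold_scores.append(math.sqrt(sum(errors) / len(errors)))
        mean_score = sum(fold_scores) / len(fold_scores)
        if mean_score < best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


# Hypothetical stand-in model: predicts the training mean scaled by `shrink`.
def fit_scaled_mean(X_train, y_train, shrink):
    return shrink * sum(y_train) / len(y_train)


def predict_scaled_mean(model, x):
    return model


X = [[float(i)] for i in range(20)]
y = [10.0] * 20
best_params, best_rmse = grid_search_cv(
    {"shrink": [0.5, 1.0, 1.5]}, fit_scaled_mean, predict_scaled_mean, X, y, k=5
)
```

With a real random forest, the grid would instead range over values such as the number of trees and the tree depth, and `fit` / `predict` would wrap the forest's training and prediction routines.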
Then the fitting effect of the model is evaluated on the test set, using the root mean square error (RMSE) as the evaluation criterion. Let the real energy consumption be E and the energy consumption predicted by the model be E'; the root mean square error is:
$$\mathrm{RMSE}=\sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(E_{i}-E'_{i}\right)^{2}}$$
where m is the number of predicted samples; a smaller RMSE indicates a better fit.
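The RMSE criterion can be computed directly from the real and predicted energy consumption values. The sketch below assumes two equal-length sequences; the function name is hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between real (E) and predicted (E') energy values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    m = len(actual)  # number of predicted samples
    return math.sqrt(sum((e - p) ** 2 for e, p in zip(actual, predicted)) / m)


perfect_fit = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
imperfect_fit = rmse([0.0, 0.0], [3.0, 4.0])
```

A perfect fit gives an RMSE of zero, and the value grows as predictions move away from the measured energy consumption.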
Finally, the hyper-parameters of the model are adjusted using the verification set. Specifically, the verification set is fed into the trained model and a learning curve is drawn, as shown in FIG. 5. The best hyper-parameters are those at which the upper and lower curves are close to each other and both are relatively high. The hyper-parameters include the number of trees, the depth of the trees, the number of leaf nodes of the trees, and so on. After the model hyper-parameters have been determined and the model has been trained in step 105, optimal parameter selection is performed in step 106.
The overall flow of step 106 is shown in FIG. 6. Using the machine learning model trained in step 105, each adjustable parameter is enumerated, candidate schemes are generated by the exhaustive method or by orthogonal experiments, each scheme is evaluated with the quantitative energy consumption prediction big data model, and the schemes are ranked to select the optimal one.
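The exhaustive variant of step 106 can be sketched as follows. The parameter names, candidate values and the toy energy function are all assumptions made for illustration; in the method itself the prediction would come from the random forest model trained in step 105.

```python
import itertools


def enumerate_schemes(adjustable):
    """Yield every combination of the candidate values (exhaustive method)."""
    names = list(adjustable)
    for values in itertools.product(*(adjustable[name] for name in names)):
        yield dict(zip(names, values))


def rank_schemes(adjustable, predict_energy):
    """Return all schemes sorted by predicted energy consumption, lowest first."""
    return sorted(enumerate_schemes(adjustable), key=predict_energy)


# Hypothetical adjustable parameters and candidate values.
adjustable = {
    "settling_tank_temp_c": [55, 60, 65],
    "pump_pressure_mpa": [0.4, 0.6],
}


def toy_energy_model(scheme):
    """Stand-in for the trained model's energy prediction (assumed form)."""
    return 2.0 * scheme["settling_tank_temp_c"] + 50.0 * scheme["pump_pressure_mpa"]


ranked = rank_schemes(adjustable, toy_energy_model)
best_scheme = ranked[0]
```

The number of schemes grows multiplicatively with the number of parameters and candidate values, which is why the text also allows orthogonal experiments as a cheaper way to generate schemes.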
The energy consumption analysis and optimization method for the oil-gas gathering and transportation system can greatly reduce the modeling difficulty, shorten the design period and reduce the workload, thereby providing a reasonable reference basis for adjusting and optimizing the oil-gas gathering and transportation scheme. However, the model adopted by the method has low interpretability, places high demands on data quality, and must be trained before it achieves good results.
Claims (11)
1. The energy consumption analysis and optimization method of the oil gas gathering and transportation system is characterized by comprising the following steps:
step 1, collecting operation data of an oil field gathering and transportation system;
step 2, preprocessing the collected data;
step 3, carrying out correlation analysis on the data to evaluate the correctness and the validity of the data;
step 4, carrying out variance analysis on the parameters of the oil field gathering and transportation equipment and carrying out feature importance analysis through preliminary fitting of a random forest machine learning method;
step 5, establishing a machine learning random forest regression model;
step 6, evaluating each scheme with a quantitative energy consumption prediction big data model, and ranking the schemes to select the optimal one.
2. The energy consumption analysis and optimization method for an oil and gas gathering and transportation system according to claim 1, wherein in step 1, the operation data of the oil field gathering and transportation system comprises process parameters and energy consumption data of each component in the gathering and transportation system during actual operation, and specifically comprises the liquid production amount processed by the gathering and transportation system and the temperature, pressure and water content of each component.
3. The method for analyzing and optimizing energy consumption of an oil and gas gathering and transportation system according to claim 2, wherein in step 1, parameters of different devices are collected to form a data set including all energy consumption device parameters, and all structured, unstructured and semi-structured data of the oil and gas gathering and transportation energy consumption device parameters related to analysis are extracted in an all-sampling manner.
4. The method for analyzing and optimizing energy consumption of an oil and gas gathering and transportation system as claimed in claim 1, wherein in step 2, the data preprocessing performed comprises: missing value processing, outlier processing, and data distribution processing.
5. The energy consumption analysis and optimization method of an oil and gas gathering and transportation system according to claim 4, wherein in step 2, when missing value processing is performed, for data columns with missing values in a data table, a mean value method is used to fill up the missing values, and the formula is as follows:
$$v_{\mathrm{null},j}=\frac{1}{m}\sum_{i=1}^{m}v_{i,j}$$
where null is the missing value, i is the row index, j is the column index, m is the number of samples, and $v_{i,j}$ is the value in row i, column j of the data table.
6. The energy consumption analysis and optimization method of an oil and gas gathering and transportation system according to claim 4, characterized in that in step 2, when abnormal values are processed, abnormal data in the data table, namely values outside the normal range, are identified by visualization and deleted; when data distribution processing is carried out, the distribution of each parameter is transformed into a normal distribution.
7. The method for analyzing and optimizing energy consumption of an oil and gas gathering and transportation system according to claim 1, wherein in step 3, gathering and transportation parameters are analyzed in a pairwise visualization manner.
8. The method for analyzing and optimizing energy consumption of an oil and gas gathering and transportation system according to claim 1, wherein in step 4, the analysis of variance calculates the variance of each feature and then deletes the features whose variance is 0:
$$\sigma^{2}=\frac{\sum (X-\mu)^{2}}{N}$$
where $\sigma^{2}$ is the population variance, X is the variable, $\mu$ is the population mean, and N is the number of instances in the population;
in practice, when the population mean is difficult to obtain, sample statistics are used in place of the population parameters, and after correction the sample variance is calculated as follows:
$$S^{2}=\frac{\sum (X-\bar{X})^{2}}{n-1}$$
where $S^{2}$ is the sample variance, $\bar{X}$ is the sample mean, and n is the number of samples.
9. The method for analyzing and optimizing energy consumption of an oil and gas gathering and transportation system according to claim 1, wherein in step 4, the feature importance analysis is to calculate the contribution of each feature to the final predicted energy consumption, and the greater the contribution, the higher the weight.
10. The method for analyzing and optimizing energy consumption of an oil and gas gathering and transportation system according to claim 1, wherein in step 5, regression averaging is used in constructing the model; firstly, the sample data set is divided into a training set, a test set and a verification set; the division proportions of the three data sets are set as 80%, 10% and 10%, the features are the parameters of each device, the target is energy consumption, and the method specifically comprises the following steps:
next, training the model with the training set using cross-validation and grid search, and determining the random forest model parameters;
then evaluating the fitting effect of the model with the test set, wherein the evaluation criterion is the root mean square error (RMSE); assuming that the real energy consumption is E and the model predicts the energy consumption as E', the root mean square error is:
$$\mathrm{RMSE}=\sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(E_{i}-E'_{i}\right)^{2}}$$
where m is the number of predicted samples; the smaller the RMSE, the better the fit;
finally, adjusting the model hyper-parameters with the verification set; specifically, the verification set is fed into the trained model and a learning curve is drawn; the best hyper-parameters are those at which the upper curve and the lower curve are close to each other and both are relatively high.
11. The energy consumption analysis and optimization method for an oil and gas gathering and transportation system according to claim 1, characterized in that in step 6, each adjustable parameter is enumerated using the machine learning model trained in step 5, schemes are generated by the exhaustive method or by orthogonal experiments, each scheme is evaluated with the quantitative energy consumption prediction big data model, and the schemes are ranked to select the optimal one.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010159512.8A CN113379093A (en) | 2020-03-09 | 2020-03-09 | Energy consumption analysis and optimization method for oil gas gathering and transportation system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113379093A (en) | 2021-09-10 |
Family
ID=77568592
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202010159512.8A | Pending | CN113379093A (en) | 2020-03-09 | 2020-03-09 | Energy consumption analysis and optimization method for oil gas gathering and transportation system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113379093A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106355208A (en) * | 2016-08-31 | 2017-01-25 | 广州精点计算机科技有限公司 | Data prediction analysis method based on COX model and random survival forest |
US20180285788A1 (en) * | 2015-10-13 | 2018-10-04 | British Gas Trading Limited | System for energy consumption prediction |
CN109063313A (en) * | 2018-07-26 | 2018-12-21 | 北京交通大学 | Calculation Method of Energy Consumption in Train Traction based on machine learning |
CN109543203A (en) * | 2017-09-22 | 2019-03-29 | 山东建筑大学 | A kind of Building Cooling load forecasting method based on random forest |
CN110705774A (en) * | 2019-09-26 | 2020-01-17 | 汉纳森(厦门)数据股份有限公司 | Vehicle energy consumption analysis prediction method and system |
Non-Patent Citations (3)
Title |
---|
文雯;刘文哲;肖祥武;向春波;谢小鹏;姜鑫;: "基于大数据和并行随机森林算法火电机组供电煤耗计算模型", 热力发电, no. 09, pages 13 - 18 * |
肖祥武 等: "基于大数据平台和并行随机森林算法的能耗预测模型优化", 华电技术, vol. 40, no. 7, pages 1 - 4 * |
黄铠;冯运凯;刘建武;程浩;张影;: "基于大数据挖掘的油气田企业全产业链精准管理", 物流技术, no. 02, pages 108 - 114 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116528270A (en) * | 2023-06-27 | 2023-08-01 | 杭州电瓦特科技有限公司 | Base station energy saving potential evaluation method, device, equipment and storage medium |
CN116528270B (en) * | 2023-06-27 | 2023-10-03 | 杭州电瓦特科技有限公司 | Base station energy saving potential evaluation method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||