CN117391227A

CN117391227A - Oil pumping unit system efficiency prediction method and system based on ensemble learning algorithm

Info

Publication number: CN117391227A
Application number: CN202210776243.9A
Authority: CN
Inventors: 王才; 张喜顺; 赵瑞东; 师俊峰; 熊春明; 孙艺真; 邓峰; 刘猛; 陈诗雯; 陈冠宏
Original assignee: Petrochina Co Ltd
Current assignee: Petrochina Co Ltd
Priority date: 2022-06-29
Filing date: 2022-06-29
Publication date: 2024-01-12

Abstract

The invention belongs to the field of oil extraction engineering, and discloses an oil pumping unit system efficiency prediction method and system based on an ensemble learning algorithm. A construction unit builds, from wellbore data, pumping unit data and production data, an oil extraction big data resource pool representing blocks with different properties; a screening unit screens the big data resource pool for the main control factors affecting system efficiency through a machine learning model; and a prediction unit predicts the system efficiency with a plurality of ensemble learning models, using the screened main control factors as main characteristic parameters. The method does not rely on solving the complicated pumping unit system efficiency equations. It applies an ensemble learning algorithm to pumping unit production big data, takes into account the sensitivity of system efficiency to each production parameter, achieves high prediction accuracy, and meets the development needs of the oilfield Internet of Things. It is of value for promoting low-cost Internet of Things construction, tapping production potential, reducing cost and improving efficiency.

Description

Oil pumping unit system efficiency prediction method and system based on ensemble learning algorithm

Technical Field

The invention belongs to the field of oil extraction engineering, and particularly relates to an oil pumping unit system efficiency prediction method and system based on an ensemble learning algorithm.

Background

Pumping wells accumulate massive amounts of data, which implicitly contain models for energy consumption evaluation and working system optimization. Traditional working system optimization is mostly based on physical modeling, and it is difficult to build a physical model that comprehensively accounts for all influencing factors when there are many of them. System efficiency is an intuitive index for analyzing the energy consumption of a pumping unit, and the common approach at present is to approximate it by regression on the system efficiency calculation formula.

The working process of a sucker rod pumping system is a process of continuously transmitting and converting energy, and a certain amount of energy is lost at each transmission step. The effective energy with which the system lifts liquid is what remains after the various system losses are subtracted from the input energy of the ground motor. Let the hydraulic power P_e denote the work done per unit time by the pumping unit in lifting a given amount of liquid to the ground, and let P_in denote the motor input power. Their ratio is the conventional system efficiency:

η = P_e / P_in

According to the working characteristics of the pumping unit system, the system efficiency can be divided into ground efficiency and downhole (underground) efficiency. Defining the polished rod power P_pr as the power consumed by the polished rod to lift the liquid and overcome the various downhole resistances, the two parts are:

η_ground = P_pr / P_in,  η_downhole = P_e / P_pr,  η = η_ground × η_downhole

The energy loss in the ground part occurs in the motor, the belt and reduction gearbox, and the four-bar linkage, therefore:

η_ground = K × η₁ × η₂ × η₃

where K is the payload factor; η₁ is the motor efficiency; η₂ is the belt and reduction gearbox efficiency; and η₃ is the four-bar linkage efficiency.

The energy loss in the downhole part occurs in the packing box, the sucker rod, the pump and the tubing string, therefore:

η_downhole = η₄ × η₅ × η₆ × η₇

where η₄ is the packing box efficiency; η₅ is the sucker rod efficiency; η₆ is the oil pump efficiency; and η₇ is the tubing string efficiency. The system efficiency of the sucker rod pumping system is therefore:

η = K × η₁ × η₂ × η₃ × η₄ × η₅ × η₆ × η₇
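As a rough illustration of the two forms of the efficiency definition, the sketch below computes the ratio of hydraulic power to motor input power and the product of K with the seven component efficiencies. The function names and all numeric values are hypothetical and chosen only for illustration; they are not taken from the patent.

```python
from math import prod


def system_efficiency(p_effective_kw: float, p_input_kw: float) -> float:
    """Overall efficiency as hydraulic power divided by motor input power."""
    if p_input_kw <= 0:
        raise ValueError("motor input power must be positive")
    return p_effective_kw / p_input_kw


def efficiency_from_components(k: float, etas: list[float]) -> float:
    """Product form: K times the seven component efficiencies (eta_1 .. eta_7)."""
    if len(etas) != 7:
        raise ValueError("expected seven component efficiencies")
    return k * prod(etas)


# Hypothetical component efficiencies, in the order motor, belt and gearbox,
# four-bar linkage, packing box, sucker rod, pump, tubing string.
etas = [0.90, 0.92, 0.95, 0.98, 0.93, 0.70, 0.97]
eta = efficiency_from_components(1.0, etas)
print(f"system efficiency = {eta:.3f}")
```

Because the overall value is a product, a single weak component (here the hypothetical pump efficiency of 0.70) dominates the result, which is one reason the closed-form model is sensitive to parameters that are hard to measure in the field.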

From the conventional system efficiency concept, it can be seen that many factors affect system efficiency, such as the liquid supply capacity of the reservoir, the fluid properties, the well trajectory, and the equipment and operating parameters of the well. Because system efficiency has many influencing factors and the model is complex, it is difficult to establish an accurate physical model. Methods that predict pumping unit system efficiency with an ensemble learning algorithm have rarely been reported. Establishing a system efficiency prediction model based on big data and machine learning is therefore of value for promoting low-cost Internet of Things construction, tapping production potential, reducing cost and improving efficiency.

Disclosure of Invention

In view of the above problems, in one aspect, the present invention discloses a method for predicting efficiency of a pumping unit system based on an ensemble learning algorithm, where the method includes:

constructing, from wellbore data, pumping unit data and production data, an oil extraction big data resource pool representing blocks with different properties;

screening the big data resource pool for main control factors affecting system efficiency through a machine learning model;

and predicting the system efficiency with a plurality of ensemble learning models, using the screened main control factors as main characteristic parameters.

Further, the wellbore data includes pump diameter, pump depth, pump efficiency, and actual lift;

the pumping unit data comprise the pumping unit model, motor model, stroke frequency, consumed power, maximum load, minimum load, daily power consumption, torque and maximum torque;

the production data comprise daily oil production, cumulative oil and liquid production, water content, working fluid level, sinking degree, oil pressure, casing pressure and hundred-meter ton-liquid power consumption (the power consumed per ton of liquid per hundred meters of lift).

Further, the data are analyzed and preprocessed before the big data resource pool is established, including analyzing the share of data contributed by each month, the number of records of each data type, and the overall quality of the data.

Further, screening the main control factors affecting the system efficiency through the machine learning model comprises the following steps:

calculating the correlation between the system efficiency and each data item in the big data resource pool by using a machine learning model;

drawing a correlation heat map of the system efficiency and each characteristic parameter according to the correlation calculation result;

and, with reference to the correlation heat map, ranking the characteristic parameters by the strength of their correlation with the system efficiency, wherein the ranking comprises a ranking of the characteristic parameters positively correlated with the system efficiency and a ranking of the characteristic parameters negatively correlated with the system efficiency.

Further, the machine learning model includes a Pearson correlation coefficient model.

Further, the ensemble learning model includes a random forest model, an AdaBoost model, a GradientBoosting model, and/or a Bagging model.

Further, the prediction method further includes: comparing the prediction results of the ensemble learning models, and selecting the prediction result of the ensemble learning model with the highest fitting accuracy as the final prediction result.

On the other hand, the invention also discloses a pumping unit system efficiency prediction system based on the ensemble learning algorithm, the prediction system comprises:

the construction unit is used for constructing, from the wellbore data, the pumping unit data and the production data, an oil extraction big data resource pool representing blocks with different properties;

the screening unit is used for screening the big data resource pool for main control factors affecting the system efficiency through a machine learning model;

and the prediction unit is used for predicting the system efficiency with a plurality of ensemble learning models, using the screened main control factors as main characteristic parameters.

Further, the system further comprises:

the preprocessing unit, which is used for analyzing and preprocessing data before the big data resource pool is built, including analyzing the share of data contributed by each month, the number of records of each data type, and the overall quality of the data;

and the comparison unit, which is used for comparing the prediction results of the ensemble learning models and selecting the prediction result of the ensemble learning model with the highest fitting accuracy as the final prediction result.

Further, the screening unit performs the steps of:

Further, the machine learning model includes a Pearson correlation coefficient model; the ensemble learning model comprises a random forest model, an AdaBoost model, a GradientBoosting model and/or a Bagging model.

The invention has the beneficial effects that:

the method does not rely on solving the complicated pumping unit system efficiency equations; it applies an ensemble learning algorithm to pumping unit production big data, takes into account the sensitivity of system efficiency to each production parameter, achieves high prediction accuracy, meets the development needs of the oilfield Internet of Things, and is of value for promoting low-cost Internet of Things construction, tapping production potential, reducing cost and improving efficiency;

the invention analyzes and preprocesses the data before establishing the big data resource pool, so that the overall quality and approximate distribution of the data can be understood quickly, which provides a basis for the subsequent screening of main control factors;

after the machine learning model is used to calculate the correlation between the system efficiency and each data item in the big data resource pool, the correlation heat map and the correlation ranking present the correlation of each characteristic parameter visually, so that the main control factors can be screened out more clearly and intuitively.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 shows a schematic diagram of a pumping unit power curve simulation flow in an embodiment of the invention;

FIG. 2 shows a pie chart of data for each month in data analysis in an embodiment of the invention;

FIG. 3 shows a histogram of model number statistics in an embodiment of the invention;

FIG. 4 shows a box plot of feature attribute statistics (features with values < 100) in an embodiment of the invention;

FIG. 5 shows a correlation heat map of the factors affecting system efficiency in an embodiment of the invention;

FIG. 6 shows the correlation ranking of the main control factors of system efficiency in an embodiment of the invention;

FIG. 7 shows the prediction results of the random forest model in an embodiment of the invention;

FIG. 8 shows the prediction results of the AdaBoost model in an embodiment of the invention;

FIG. 9 shows the prediction results of the GradientBoosting model in the embodiment of the present invention;

FIG. 10 shows the prediction result of the Bagging model in the embodiment of the invention;

FIG. 11 shows the prediction results of a support vector machine model using a radial basis function as the kernel function in an embodiment of the present invention;

FIG. 12 shows the prediction results of a support vector machine model using a polynomial as a kernel function in an embodiment of the present invention;

FIG. 13 shows the prediction result of the minimum K nearest neighbor model in the embodiment of the invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

In machine learning, a single learner such as a support vector machine or a decision tree can in theory achieve good performance, but in practice, for various reasons, it is common to reuse several ordinary learners and combine them into a relatively strong learner. A learning algorithm formed by reusing and combining weak learners in this way is called ensemble learning. Therefore, to improve prediction accuracy, the invention provides a method that predicts pumping unit system efficiency by analyzing production data with an ensemble learning algorithm, and that performs sensitivity analysis on the factors influencing system efficiency, thereby providing theoretical support for evaluating the working state of the pumping unit and optimizing its working system.

The invention provides a pumping unit system efficiency prediction method based on an ensemble learning algorithm, which, as shown in FIG. 1, comprises the following steps:

acquiring and preprocessing wellbore data, pumping unit data and production data;

forming, from the wellbore data, the pumping unit data and the production data, an oil extraction big data resource pool representing blocks with different properties, carrying out statistical analysis of data quality, and establishing an effective data analysis set;

screening main control factors influencing the system efficiency through a machine learning model and evaluating their correlation;

and predicting the system efficiency with a plurality of ensemble learning models, using the main control factors affecting the system efficiency as main characteristic parameters, and selecting the best predicted value.

Based on the prediction method, the invention also constructs a pumping unit system efficiency prediction system based on an ensemble learning algorithm, and the prediction system comprises:

The preprocessing unit, which is used for analyzing and preprocessing the acquired data, including analyzing the share of data contributed by each month, the number of parameters of each type, and the overall quality of the data; the data include wellbore data, pumping unit data and production data.

The construction unit, which is used for constructing, from the wellbore data, the pumping unit data and the production data, an oil extraction big data resource pool representing blocks with different properties.

The screening unit, which screens the big data resource pool for main control factors affecting the system efficiency through a machine learning model.

The comparison unit, which is used for comparing the prediction results of the ensemble learning models and selecting the prediction result of the ensemble learning model with the highest fitting accuracy as the final prediction result.

The wellbore data comprise pump diameter, pump depth, pump efficiency and actual lift; the pumping unit data comprise the pumping unit model, motor model, stroke frequency, consumed power, maximum load, minimum load, daily power consumption, torque and maximum torque; the production data comprise daily oil production, cumulative oil and liquid production, water content, working fluid level, sinking degree, oil pressure, casing pressure and hundred-meter ton-liquid power consumption.

Wherein the screening unit performs the steps of:

The machine learning model comprises a Pearson correlation coefficient model; the ensemble learning model comprises a random forest model, an AdaBoost model, a GradientBoosting model and/or a Bagging model.

The above method is described in detail with reference to specific examples.

This embodiment describes the above process in detail, taking as an example the system efficiency statistics and related production data of 1,850 production wells in the D oil field from October 2019 to February 2020.

(1) Wellbore data, pumping unit data and production data are collected, analyzed and preprocessed.

The collected characteristic parameters comprise daily data on pumping unit model, motor model, pump diameter, pump depth, stroke frequency, daily liquid production, daily oil production, water content, working fluid level, sinking degree, oil pressure, casing pressure, consumed power, actual lift, hundred-meter ton-liquid power consumption, maximum load, minimum load, daily power consumption, pump efficiency, torque, maximum torque, load utilization rate, cumulative liquid production and cumulative oil production. The share of characteristic parameter data from each month is shown in FIG. 2. As can be seen from FIG. 2, the monthly data shares between October 2019 and February 2020 are: 22.13%, 23.25%, 24.56%, 15.03%.
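The monthly share and per-category counts described above could be computed along the following lines. This is a minimal sketch: the record layout (a `date` string and a `unit_model` field) and the sample records are assumptions made for illustration, not the patent's actual data format.

```python
from collections import Counter


def monthly_share(records: list[dict]) -> dict[str, float]:
    """Percentage of records per month, keyed by 'YYYY-MM'."""
    counts = Counter(r["date"][:7] for r in records)
    total = sum(counts.values())
    return {month: 100.0 * n / total for month, n in sorted(counts.items())}


# Made-up daily records; real data would hold one row per well per day.
records = [
    {"date": "2019-10-03", "unit_model": "CYJS8-3-37HB"},
    {"date": "2019-10-04", "unit_model": "CYJS8-3-37HB"},
    {"date": "2019-11-01", "unit_model": "PCYJY8-3-37HF"},
    {"date": "2020-02-15", "unit_model": "CYJS8-3-37HB"},
]

shares = monthly_share(records)                            # data for a pie chart
model_counts = Counter(r["unit_model"] for r in records)   # data for a histogram

for month, pct in shares.items():
    print(f"{month}: {pct:.2f}%")
```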

The distribution of the characteristic parameters is counted programmatically. For example, FIG. 3 shows the statistics of pumping unit models, which are mainly CYJS8-3-37HB, PCYJY8-3-37HF and DCYJY8-3-37HB. FIG. 4 is a box plot of the features in the data set whose theoretical values are smaller than 100; the circles in the plot represent outliers. The plot shows that the data set contains many outliers and that the overall data quality is poor.
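The patent does not state which rule marks a point as an outlier in the box plot. A common convention, assumed here, is to flag values lying more than 1.5 times the interquartile range beyond the quartiles. The sketch below applies that rule to a made-up list of pump efficiency values.

```python
from statistics import quantiles


def iqr_outliers(values: list[float], whisker: float = 1.5) -> list[float]:
    """Return values outside [Q1 - whisker*IQR, Q3 + whisker*IQR]."""
    q1, _, q3 = quantiles(values, n=4)   # quartiles of the sample
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical pump efficiency readings (%), with one suspicious value.
pump_efficiency = [41.0, 43.5, 44.0, 45.2, 46.1, 47.3, 48.0, 99.0]
print(iqr_outliers(pump_efficiency))
```

Whether flagged records are dropped, capped or kept is a separate decision that the source leaves open.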

(2) System efficiency master control factor analysis based on big data analysis

In this embodiment, the Pearson correlation coefficient model is selected to characterize the correlation between system efficiency and the production parameters. The Pearson correlation coefficient is expressed by the following formula:

r = Σ(xᵢ − x̄)(yᵢ − ȳ) / ( √(Σ(xᵢ − x̄)²) × √(Σ(yᵢ − ȳ)²) )

where xᵢ is the value of a characteristic parameter in the i-th piece of data, x̄ is the average of that characteristic parameter over all pieces of data, and yᵢ and ȳ are the corresponding quantities for the second variable. The correlation coefficient has the following properties:

1) When |r|=1, the x and y variables are perfectly linear, and there is a definite functional relationship between x and y.

2) When 0 < |r| < 1, the general criteria are: 0 < |r| ≤ 0.2 is called low correlation; 0.2 < |r| ≤ 0.6 is called moderate correlation; 0.6 < |r| ≤ 1 is called high correlation.

3) When r > 0, x and y are positive correlations, and when r < 0, x and y are negative correlations.
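The Pearson coefficient and the strength bands listed above can be sketched in a few lines. The function names are illustrative, and the sample values are invented; the thresholds 0.2 and 0.6 are the ones given in the text.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


def correlation_strength(r: float) -> str:
    """Classify |r| with the bands from the text: low, moderate, high."""
    a = abs(r)
    if a == 0:
        return "none"
    if a <= 0.2:
        return "low"
    if a <= 0.6:
        return "moderate"
    return "high"


# Invented example: daily liquid production vs. system efficiency.
liquid = [10.0, 14.0, 19.0, 25.0, 31.0]
efficiency = [18.0, 21.0, 24.5, 27.0, 33.0]
r = pearson_r(liquid, efficiency)
print(f"r = {r:.3f} ({correlation_strength(r)}, {'positive' if r > 0 else 'negative'})")
```

Note that Pearson's r only measures linear association, so a parameter with a strong nonlinear effect on efficiency can still receive a low score.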

The correlation heat map of the system efficiency and each characteristic parameter is drawn based on the Pearson correlation coefficient model, as shown in FIG. 5. As can be seen from FIG. 5, the characteristic parameter with the highest heat is the pumping unit model. The characteristics correlated with system efficiency are then ranked, and the result is shown in FIG. 6. As can be seen from FIG. 6, the characteristics positively correlated with system efficiency, in order of correlation strength, are: daily liquid production, pump diameter, cumulative liquid production, cumulative oil production, stroke frequency, water content, torque, oil pressure, daily oil production, stroke, casing pressure, sinking degree, maximum torque and pump efficiency. The characteristics negatively correlated with system efficiency, in order of correlation strength, are: hundred-meter ton-liquid power consumption, pump depth, working fluid level, actual lift, minimum load, motor model, load utilization rate, daily power consumption, pumping unit model, consumed power and maximum load. This ranking can improve the prediction efficiency and effect of the subsequent models.
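The two rankings can be produced by splitting the coefficients by sign and sorting each group by strength. In the sketch below the coefficient values are hypothetical, since the source reports only the ordering and not the numbers; they were chosen to be consistent with that ordering.

```python
def rank_by_correlation(correlations: dict[str, float]) -> tuple[list[str], list[str]]:
    """Split features by sign of r and sort each group from strongest to weakest."""
    positive = sorted(
        ((name, r) for name, r in correlations.items() if r > 0),
        key=lambda item: item[1],
        reverse=True,
    )
    negative = sorted(
        ((name, r) for name, r in correlations.items() if r < 0),
        key=lambda item: item[1],   # most negative first
    )
    return [name for name, _ in positive], [name for name, _ in negative]


# Hypothetical coefficients against system efficiency.
corr = {
    "daily liquid production": 0.62,
    "pump efficiency": 0.05,
    "pump depth": -0.41,
    "pump diameter": 0.48,
    "hundred-meter ton-liquid power consumption": -0.77,
}

positive_rank, negative_rank = rank_by_correlation(corr)
print("positive:", positive_rank)
print("negative:", negative_rank)
```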

(3) Model building and prediction

This embodiment uses four ensemble learning models, namely a random forest model, an AdaBoost model, a GradientBoosting model and a Bagging model, to predict the pumping unit system efficiency. The prediction result of the random forest model is shown in FIG. 7, that of the AdaBoost model in FIG. 8, that of the GradientBoosting model in FIG. 9, and that of the Bagging model in FIG. 10. As can be seen from FIGS. 7-10, all four ensemble learning models have good prediction capability; the AdaBoost model has the lowest fitting accuracy (r² = 0.9616) and the Bagging model the highest (r² = 0.9896). By contrast, a single machine learning prediction model performs only moderately. For example, the prediction result of a support vector machine model with a radial basis function kernel is shown in FIG. 11, and the results of a support vector machine model with a polynomial kernel and of the minimum K-nearest neighbor model are shown in FIGS. 12 and 13, respectively. Comparing FIGS. 7-10 with FIGS. 11-13 shows that the fitting effect and prediction ability of each ensemble learning method are far better than those of a single machine learning prediction model.
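Selecting the model with the highest fitting accuracy can be sketched as below, assuming that the reported r² is the usual coefficient of determination (the source does not define it). The model names and prediction values are placeholders, not results from the patent.

```python
def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def best_model(y_true: list[float], predictions: dict[str, list[float]]):
    """Score every model on the same targets and return the best name and all scores."""
    scores = {name: r_squared(y_true, pred) for name, pred in predictions.items()}
    winner = max(scores, key=scores.get)
    return winner, scores


# Placeholder measured efficiencies (%) and predictions from two models.
measured = [20.0, 25.0, 30.0, 35.0, 40.0]
predictions = {
    "model_a": [21.0, 24.0, 31.0, 34.0, 41.0],
    "model_b": [20.5, 25.0, 29.5, 35.0, 40.5],
}

winner, scores = best_model(measured, predictions)
print(winner, {name: round(s, 4) for name, s in scores.items()})
```

For a fair comparison the scores should be computed on data held out from training, a point the source does not spell out.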

The model prediction results show that the ensemble learning models predict better than a single machine learning model. In supervised machine learning, sometimes only several biased models are available (weak supervised models, each of which performs well only in certain respects). Ensemble learning combines several such weak supervised models to obtain a better and more comprehensive strong supervised model; even if one weak classifier makes a wrong prediction, the other weak classifiers can correct the error. Ensemble learning is thus a meta-algorithm that combines several machine learning techniques into one prediction model in order to reduce variance (bagging), reduce bias (boosting) or improve predictions (stacking). Ensemble learning has suitable strategies for data sets of every scale. If the data set is large, it can be divided into several smaller data sets, a model can be learned on each, and the models can then be combined. If the data set is small, the Bootstrap method can be used for sampling to obtain several data sets, on which several models are trained separately and then combined.
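The Bootstrap-then-combine strategy for small data sets can be illustrated with a tiny bagging regressor. This is a sketch under stated assumptions: the base learner is a one-split regression stump on a single feature, chosen here for brevity, and the data are invented. It is not the Bagging model used in the embodiment.

```python
import random
from statistics import mean


def fit_stump(xs: list[float], ys: list[float]):
    """Fit a one-split regression stump; returns a function x -> prediction."""
    best = None
    for threshold in sorted(set(xs))[:-1]:   # keep both sides non-empty
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    if best is None:                          # resample had a single x value
        const = mean(ys)
        return lambda x: const
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def bagging_fit(xs: list[float], ys: list[float], n_estimators: int = 25, seed: int = 0):
    """Train stumps on bootstrap resamples and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(m(x) for m in models)


# Invented data: one feature against system efficiency (%).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
ys = [10.0, 11.0, 12.0, 11.0, 13.0, 30.0, 31.0, 29.0, 32.0, 30.0]

predict = bagging_fit(xs, ys)
print(round(predict(2.0), 2), round(predict(9.0), 2))
```

Averaging over resamples smooths the hard step of any single stump, which is the variance reduction attributed to bagging in the text.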

Ensemble methods can be divided into two categories:

(1) Sequential ensemble methods, in which the base learners that participate in training are generated sequentially (e.g., AdaBoost). The principle of the sequential methods is to exploit the dependency between base learners: by assigning a higher weight to the samples that were mislabeled in the previous round of training, the overall prediction effect can be improved.

(2) Parallel ensemble methods, in which the base learners that participate in training are generated in parallel (e.g., Random Forest). The principle of the parallel methods is to exploit the independence between base learners; errors can be significantly reduced by averaging.
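The reweighting idea behind the sequential methods can be shown with one round of the classic discrete AdaBoost update for labels in {-1, +1}. This classification form is used only to illustrate the principle; the embodiment predicts a continuous efficiency value, for which regression variants of AdaBoost are used, and the source does not give the update it applies. The sample labels are invented.

```python
from math import exp, log


def adaboost_reweight(weights: list[float], y_true: list[int], y_pred: list[int]):
    """One discrete AdaBoost round: learner weight alpha and renormalized sample weights.

    Assumes the incoming weights sum to 1 and labels are -1 or +1.
    """
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    if not 0.0 < err < 0.5:
        raise ValueError("weak learner needs weighted error strictly between 0 and 0.5")
    alpha = 0.5 * log((1.0 - err) / err)
    # t * p is +1 for a correct prediction (weight shrinks) and -1 for a mistake (weight grows).
    raw = [w * exp(-alpha * t * p) for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(raw)
    return alpha, [w / total for w in raw]


weights = [0.25, 0.25, 0.25, 0.25]
y_true = [1, 1, -1, -1]
y_pred = [1, 1, -1, 1]          # the last sample is misclassified

alpha, new_weights = adaboost_reweight(weights, y_true, y_pred)
print(round(alpha, 4), [round(w, 4) for w in new_weights])
```

After the update the misclassified sample carries half of the total weight, so the next base learner is pushed to get it right.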

The above example shows that predicting pumping unit system efficiency by analyzing production data with ensemble learning models gives more accurate results, and can provide theoretical support for evaluating the working state of the pumping unit and optimizing its working system.

Although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. An oil pumping unit system efficiency prediction method based on an ensemble learning algorithm, wherein the prediction method comprises the following steps:

2. The pumping unit system efficiency prediction method of claim 1, wherein the wellbore data includes pump diameter, pump depth, pump efficiency, and actual lift;

3. The pumping unit system efficiency prediction method according to claim 1 or 2, wherein,

the data are analyzed and preprocessed before the big data resource pool is established, including analyzing the share of data contributed by each month, the number of records of each data type, and the overall quality of the data.

4. The pumping unit system efficiency prediction method according to claim 1 or 2, wherein,

screening main control factors affecting system efficiency through a machine learning model comprises the following steps:

5. The pumping unit system efficiency prediction method of claim 4, wherein,

the machine learning model includes a Pearson correlation coefficient model.

6. The pumping unit system efficiency prediction method according to claim 1 or 2, wherein,

the ensemble learning model comprises a random forest model, an AdaBoost model, a GradientBoosting model and/or a Bagging model.

7. The pumping unit system efficiency prediction method according to claim 1 or 2, wherein the prediction method further comprises: comparing the prediction results of the ensemble learning models, and selecting the prediction result of the ensemble learning model with the highest fitting accuracy as the final prediction result.

8. An ensemble learning algorithm-based pumping unit system efficiency prediction system, wherein the prediction system comprises:

9. The pumping unit system efficiency prediction system of claim 8, wherein the wellbore data comprises pump diameter, pump depth, pump efficiency, and actual lift;

10. The pumping unit system efficiency prediction system of claim 8 or 9, wherein the system further comprises:

11. The pumping unit system efficiency prediction system according to claim 8 or 9, wherein,

the screening unit performs the following steps:

12. The pumping unit system efficiency prediction system of claim 11, wherein,