CN112966023B - Integrity prejudging method for shaft - Google Patents
- Publication number
- CN112966023B (application CN202110269788.6A)
- Authority
- CN
- China
- Prior art keywords
- well
- attribute
- data
- integrity
- calculating
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/02—Agriculture; Fishing; Forestry; Mining
Abstract
The invention relates to a method for pre-judging wellbore integrity, comprising the following steps: S1, establishing a big data sample from existing oil and gas well data; S2, determining the characteristics of the well to be pre-judged; S3, randomly generating k data subsets from the big data sample; S4, calculating the information gain rate of the discrete attributes and the continuous attributes of the k data subsets; S5, splitting the subsets by information gain rate to build the 1st to k-th decision trees; S6, feeding the characteristics of the well to be pre-judged into the k decision trees and voting to decide the wellbore integrity of that well in the n-th year. The method overcomes the shortcomings of existing methods: low accuracy, susceptibility to abnormal data, and inaccurate pre-judgment because factors such as production time and reservoir temperature and pressure are not considered.
Description
Technical Field
The invention relates to the technical field of oil and gas wells, and in particular to a method for pre-judging wellbore integrity.
Background
Wellbore integrity failure is one of the main risks during production from high-temperature, high-pressure oil and gas wells. It manifests as pressure appearing in an annulus whose original pressure was zero; once oil or gas leaks, the safety of personnel and equipment is seriously threatened. The problem is increasingly prominent in high-temperature, high-pressure wells: for example, of the 16000 production wells in the Outer Continental Shelf (OCS) area of the Gulf of Mexico, 43% exhibit annular pressure.
Existing studies on wellbore integrity focus mainly on risk assessment, that is, on quantitatively calculating the integrity risk of a target well. The wellbore is decomposed into several evaluation units, such as the tubing string, the cement sheath, and the wellhead equipment, and a risk factor is calculated for each unit. The risk factor of a unit is the product of its failure frequency and the severity of the failure consequence; the unit risk factors are then weighted and combined into the risk degree of the well. However, the existing method does not consider the influence of production factors, the severity value is too subjective, and the accuracy of the evaluation result is low. In addition, the contribution of each unit to the integrity failure of the whole well is difficult to define. Finally, the conventional evaluation offers little practical guidance because it cannot predict the year of a future failure.
Accordingly, there is a need for a more comprehensive, objective, and scientific method of predicting wellbore integrity.
Disclosure of Invention
In view of one or more of the above deficiencies of the prior art, the present invention provides a wellbore integrity pre-judging method for predicting whether a given oil or gas well will develop a wellbore integrity problem within a specified time.
The invention provides a wellbore integrity pre-judging method, comprising the following steps:
S1, establishing a big data sample from existing oil and gas well data;
S2, determining the characteristics of the well to be pre-judged;
S3, randomly generating k data subsets from the big data sample;
S4, calculating the information gain rate of the discrete attributes and the continuous attributes of the k data subsets;
S5, splitting the subsets by information gain rate to build the 1st to k-th decision trees;
S6, feeding the characteristics of the well to be pre-judged into the k decision trees, and voting to decide the wellbore integrity of the well to be pre-judged in the n-th year.
According to one embodiment of the invention, the big data sample comprises {X1, X2, …, X6, X7, C}, wherein: X1 is the well type (vertical, directional, or horizontal), X2 is the well category (oil well or gas well), X3 is whether the production casing is fully sealed, X4 is the production time, X5 is the reservoir temperature, X6 is the reservoir pressure, and X7 is the CO2 content; C indicates whether the wellbore of the well is intact.
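The sample layout {X1, …, X7, C} can be pictured as a simple record type. The sketch below is only an illustration: the class name, the field names, and the choice of units (years, °C, MPa, CO2 as a fraction, taken from the example table later in the description) are assumptions and do not appear in the patent text.

```python
from dataclasses import dataclass, fields

# Hypothetical record type for one sample {X1, ..., X7, C}.
# All names here are illustrative assumptions.
@dataclass(frozen=True)
class WellSample:
    well_type: str                 # X1: "vertical", "directional" or "horizontal"
    well_category: str             # X2: "oil" or "gas"
    fully_sealed: bool             # X3: production casing fully sealed?
    production_years: float        # X4: production time, years
    reservoir_temp_c: float        # X5: reservoir temperature, deg C
    reservoir_pressure_mpa: float  # X6: reservoir pressure, MPa
    co2_fraction: float            # X7: CO2 content as a fraction (0.05 = 5%)
    intact: bool                   # C: wellbore intact ("yes") or not ("no")

# Attribute groups used later when computing information gain rates.
DISCRETE = ("well_type", "well_category", "fully_sealed")
CONTINUOUS = ("production_years", "reservoir_temp_c",
              "reservoir_pressure_mpa", "co2_fraction")

# Row No. 1 of the example table in the Examples section.
sample_1 = WellSample("vertical", "oil", True, 10, 140, 80, 0.05, False)
```

Separating the discrete and continuous attribute names up front mirrors the two different gain-rate procedures described for them.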
According to one embodiment of the invention, determining the characteristics of the well to be pre-judged comprises determining its well type, its well category, whether its production casing is fully sealed, its reservoir temperature, its reservoir pressure, and its CO2 content.
According to one embodiment of the invention, step S3 comprises randomly selecting N samples with replacement from all N samples, randomly selecting m of the attributes of the samples to form a new data subset, and repeating this k times to obtain k data subsets, wherein m is less than the total number of attributes.
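Step S3 can be sketched as follows. This is a minimal illustration, assuming each sample is a dict keyed by attribute name plus a label key "C"; the function name `make_subsets`, the `seed` parameter, and the toy data are hypothetical. Merging of repeated draws follows step 3) of the detailed description.

```python
import random

def make_subsets(samples, attributes, k, m, seed=0):
    """Return k (rows, attrs) pairs: bootstrap rows plus m random attributes."""
    if not 0 < m < len(attributes):
        raise ValueError("m must be positive and smaller than the attribute count")
    rng = random.Random(seed)
    n_total = len(samples)
    subsets = []
    for _ in range(k):
        # Draw N row indices with replacement, then merge repeated draws.
        drawn = [rng.randrange(n_total) for _ in range(n_total)]
        unique_rows = sorted(set(drawn))
        # Pick m of the attributes without replacement.
        attrs = rng.sample(list(attributes), m)
        rows = [{a: samples[i][a] for a in attrs + ["C"]} for i in unique_rows]
        subsets.append((rows, attrs))
    return subsets

# Toy data only; the values carry no physical meaning.
ATTRS = ["X1", "X2", "X3", "X4", "X5", "X6", "X7"]
toy = [{**{a: i for a in ATTRS}, "C": i % 2 == 0} for i in range(50)]
subsets = make_subsets(toy, ATTRS, k=9, m=4)
```

Each subset keeps the label column C alongside its m attributes, so a tree can be trained on it directly.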
According to one embodiment of the invention, the 1st to k-th decision trees are built based on the C4.5 algorithm.
According to one embodiment of the invention, the information gain rate of a discrete attribute is calculated as follows: first calculate the class entropy, then the attribute entropy of each attribute; subtract the attribute entropy from the class entropy to obtain the information gain of each attribute; calculate the split information metric of each attribute; and finally calculate the information gain rate of each attribute.
According to one embodiment of the invention, for a continuous attribute, the values are sorted in ascending order, the midpoint of each pair of adjacent values is taken as a candidate split point that divides the data into two subsets, and the information gain rate is then calculated by the method used for discrete attributes.
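The midpoint rule for continuous attributes can be illustrated with a short sketch. The function names are hypothetical, and treating repeated values as one (via `set`) is an assumption made so that no zero-width interval produces a split point.

```python
def candidate_split_points(values):
    """Midpoints between adjacent distinct values, in ascending order."""
    ordered = sorted(set(values))
    return [(lo + hi) / 2 for lo, hi in zip(ordered, ordered[1:])]

def split_by_threshold(rows, attribute, threshold):
    """Partition rows into (attribute <= threshold, attribute > threshold)."""
    left = [r for r in rows if r[attribute] <= threshold]
    right = [r for r in rows if r[attribute] > threshold]
    return left, right

# Production times of 5, 6, 8 and 10 years give midpoints 5.5, 7 and 9 years.
points = candidate_split_points([10, 5, 8, 6, 8])
print(points)  # [5.5, 7.0, 9.0]
```

Each candidate threshold turns the continuous attribute into a two-valued discrete one, which is then scored like any other discrete attribute.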
According to one embodiment of the invention, when the i-th decision tree is built, the gain rates of all attributes are calculated on the i-th data subset and ranked from largest to smallest; the attribute with the largest gain rate is selected as the splitting attribute to split the i-th data subset, and a lower-level tree is built in the same way within each resulting subset until all child nodes are leaf nodes, at which point the i-th decision tree is complete.
According to one embodiment of the invention, step S6 comprises: feeding the characteristics of the well to be pre-judged, including its well type, well category, and production casing sealing condition, into the k decision trees to pre-judge the wellbore integrity at the n-th year of production.
According to one embodiment of the invention, the decision yields k "yes" or "no" results, and the result with the most votes is the final result.
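The voting step can be sketched as below. The text does not say how a tie is resolved, so raising an error on a tie is an assumption of this sketch (the worked example in the description uses k = 9, an odd number, for which a two-class tie cannot occur); the function name is hypothetical.

```python
from collections import Counter

def majority_vote(votes):
    """Return the most common label among the k tree outputs."""
    if not votes:
        raise ValueError("at least one vote is required")
    (top, top_n), *rest = Counter(votes).most_common()
    # Assumption: a tie is treated as an error rather than broken arbitrarily.
    if rest and rest[0][1] == top_n:
        raise ValueError("tie: choose an odd k or define a tie-break rule")
    return top

result = majority_vote(["yes"] * 6 + ["no"] * 3)
```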
The random forest algorithm on which the invention is based is an ensemble algorithm in data mining; it combines the results of multiple decision trees and therefore has high accuracy. The introduced randomness keeps the result from overfitting, avoids the situation in which no leaf node can be found, and gives the model strong generalization ability; the model is insensitive to outliers and not easily disturbed by singular points. The factors considered by the invention include well type, sealing type, production time, reservoir temperature, pressure, and CO2 content, giving broad coverage. The method overcomes the shortcomings of existing methods: low accuracy, susceptibility to abnormal data, and inaccurate pre-judgment because production time and reservoir temperature and pressure are not considered.
Drawings
FIG. 1 is a flow chart of a method for integrity pre-determination of a wellbore according to an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in detail below with reference to the attached drawings, so that the objects, features and advantages of the present invention will be more clearly understood. It should be understood that the embodiments shown in the drawings are not intended to limit the scope of the invention, but rather are merely illustrative of the true spirit of the invention.
The method comprehensively considers big data from historical wells, including well type, sealing type, production time, reservoir temperature, pressure, and CO2 content; no subjectively assigned value enters the calculation at any point. Based on the random forest data mining algorithm, it combines the results of multiple decision trees, which improves the pre-judgment accuracy. Because production time is included, the result indicates when a failure may occur in the future and can therefore guide practice.
The embodiment of the invention provides a wellbore integrity pre-judging method based on the random forest algorithm. It is suitable for oil and gas fields that have a sizable number of wells and a history of wellbore integrity problems, and is mainly used to pre-judge whether a given oil or gas well will develop a wellbore integrity problem within a specified time. Being based on the random forest algorithm, the method has high accuracy; the introduced randomness gives the model strong generalization ability and makes it insensitive to outliers; the factors considered include well type, sealing type, production time, reservoir temperature, pressure, and CO2 content. The method overcomes the shortcomings of existing methods: low accuracy, the high cost of identifying the failed unit, the failure to consider factors such as production time, and weak practical guidance.
To achieve the above object, the embodiment of the invention adopts the following technical solution: a wellbore integrity pre-judging method based on the random forest algorithm, comprising the following steps:
1) Establishing a big data sample: compile the existing oil and gas well data to obtain N training samples {X1, X2, …, X6, X7, C}. The seven attribute values are: X1 is the well type, X2 is the well category, X3 is whether the production casing is fully sealed, X4 is the production time, X5 is the reservoir temperature, X6 is the reservoir pressure, and X7 is the CO2 content. The conclusion value C indicates whether the wellbore is intact and takes the value "yes" or "no".
2) Determining the characteristics of the well to be pre-judged: establish its well type, well category, whether its casing is fully sealed, its reservoir temperature, pressure, and CO2 content; the aim is to judge whether the wellbore will remain intact up to the n-th year of production.
3) From all N samples, randomly select N samples with replacement (duplicates arise; after the duplicates are merged, n distinct samples remain), and randomly select m (m < 7) of the seven attributes to form a new data subset of dimension n × m. Repeat this step k times to obtain k data subsets.
4) Using the 1st data subset, build the 1st decision tree based on the C4.5 algorithm. For discrete attributes such as well type, well category, and sealing type, first calculate the class entropy Info(integrity), then the attribute entropy Info(attribute) of each attribute; subtract the attribute entropy from the class entropy to obtain the information gain Gain(attribute) of each attribute; calculate the split information metric SplitInfo(attribute) of each attribute; and finally calculate the information gain rate GainRate(attribute) of each attribute.
5) For continuous attributes such as production time, reservoir temperature, pressure, and CO2 content, sort the values in ascending order, take the midpoint of each pair of adjacent values as a candidate split point that divides the data into two subsets, and then calculate the information gain rate by the discrete-attribute method.
6) Using the 1st data subset, rank the gain rates of all attributes from largest to smallest, select the attribute with the largest gain rate as the splitting attribute to split the data subset, and apply steps 4 and 5 again within each resulting subset until all child nodes are leaf nodes (that is, wellbore integrity is uniformly "yes" or uniformly "no"); the 1st decision tree is then fully grown.
7) Repeat steps 4 to 6 with the 2nd to k-th data subsets to build the 2nd to k-th decision trees.
8) Feed the well type, well category, production casing sealing condition, production time up to the n-th year, reservoir temperature, pressure, and CO2 content of the well to be pre-judged into the k decision trees to obtain k "yes" or "no" results; the result with the most votes is the final result.
The C4.5 algorithm belongs to a family of algorithms used for classification problems in machine learning and data mining. It is a supervised learning method: given a data set in which each tuple is described by a set of attribute values and belongs to exactly one of a set of mutually exclusive classes, C4.5 learns a mapping from attribute values to classes, and this mapping can then be used to classify new entities whose class is unknown. Such algorithms belong to the prior art and are not described in detail here.
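In the above method, in step 4), the total class entropy Info(integrity) and the attribute entropy Info(X_i) are calculated as follows:
$$\mathrm{Info}(\text{integrity}) = -\frac{N_{\text{intact}}}{N}\log_2\frac{N_{\text{intact}}}{N} - \frac{N_{\text{failed}}}{N}\log_2\frac{N_{\text{failed}}}{N}$$
$$\mathrm{Info}(X_i) = \sum_j \frac{|X_i^j|}{N}\left(-\frac{|X_i^{j,\text{intact}}|}{|X_i^j|}\log_2\frac{|X_i^{j,\text{intact}}|}{|X_i^j|} - \frac{|X_i^{j,\text{failed}}|}{|X_i^j|}\log_2\frac{|X_i^{j,\text{failed}}|}{|X_i^j|}\right)$$
where N is the total number of wells; N_intact and N_failed are the numbers of wells whose wellbore is intact and not intact, respectively; |X_i^j| is the number of samples for which attribute X_i takes the value j; |X_i^{j,intact}| is the number of those samples whose wellbore is intact; and |X_i^{j,failed}| is the number of those samples whose wellbore is not intact.
Taking X2 (well category) as an example, with j ranging over oil wells and gas wells, Info(X2) is:
$$\mathrm{Info}(X_2) = \sum_{j \in \{\text{oil},\,\text{gas}\}} \frac{|X_2^j|}{N}\left(-\frac{|X_2^{j,\text{intact}}|}{|X_2^j|}\log_2\frac{|X_2^{j,\text{intact}}|}{|X_2^j|} - \frac{|X_2^{j,\text{failed}}|}{|X_2^j|}\log_2\frac{|X_2^{j,\text{failed}}|}{|X_2^j|}\right)$$
The information gain Gain(X_i) of each attribute is calculated as:
$$\mathrm{Gain}(X_i) = \mathrm{Info}(\text{integrity}) - \mathrm{Info}(X_i)$$
The split information metric SplitInfo(X_i) of each attribute is calculated as:
$$\mathrm{SplitInfo}(X_i) = -\sum_j \frac{|X_i^j|}{N}\log_2\frac{|X_i^j|}{N}$$
where N is the total number of wells and |X_i^j| is the number of samples for which attribute X_i takes the value j.
Taking X2 (well category) as an example, SplitInfo(X2) is:
$$\mathrm{SplitInfo}(X_2) = -\frac{|X_2^{\text{oil}}|}{N}\log_2\frac{|X_2^{\text{oil}}|}{N} - \frac{|X_2^{\text{gas}}|}{N}\log_2\frac{|X_2^{\text{gas}}|}{N}$$
Finally, the information gain rate GainRate(X_i) of each attribute is calculated as:
$$\mathrm{GainRate}(X_i) = \frac{\mathrm{Gain}(X_i)}{\mathrm{SplitInfo}(X_i)}$$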
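The four quantities defined above (attribute entropy, information gain, split information, and gain rate) can be computed from per-value counts with a few lines of code. This is a sketch using the standard C4.5 definitions; the function names and the toy counts are illustrative assumptions, and returning a gain rate of 0 when the split information is 0 is a convention chosen here, not something stated in the text.

```python
from math import log2

def entropy(counts):
    """Shannon entropy in bits of a list of counts (zero counts are skipped)."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def gain_ratio(groups):
    """groups: one (intact, failed) count pair per value of the attribute.

    Returns (Info(X), Gain(X), SplitInfo(X), GainRate(X)).
    """
    n = sum(a + b for a, b in groups)
    class_info = entropy([sum(a for a, _ in groups), sum(b for _, b in groups)])
    attr_info = sum((a + b) / n * entropy([a, b]) for a, b in groups)
    gain = class_info - attr_info
    split_info = entropy([a + b for a, b in groups])
    # Assumed convention: an attribute with a single value gets gain rate 0.
    rate = gain / split_info if split_info > 0 else 0.0
    return attr_info, gain, split_info, rate

# Toy counts: value A has 3 intact / 1 failed, value B has 1 intact / 3 failed.
info, gain, split, rate = gain_ratio([(3, 1), (1, 3)])
```

Working from counts rather than raw rows keeps the functions directly comparable with the tabulated counts used in the worked example.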
In the above method, in step 5), the values of each continuous attribute (production time, reservoir temperature, pressure, and CO2 content) are sorted in ascending order, and the midpoint of each pair of adjacent values is taken as a candidate split point that divides the data into two sets. The continuous attribute is thereby converted into a discrete one, and its gain rate is then calculated by the method of step 4).
In the above method, step 6) sets out the criterion for a decision tree being fully grown: from the calculated gain rates, the attribute with the largest gain rate is selected as the splitting attribute to split the data subset, and the C4.5 algorithm is applied again within each resulting subset until all child nodes are leaf nodes (that is, wellbore integrity is uniformly "yes" or uniformly "no").
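The recursive growth described in step 6) can be sketched for discrete attributes as follows. This is a simplified illustration, not the full C4.5 algorithm: continuous attributes and pruning are omitted, and falling back to the majority label when no attribute remains or no attribute has positive gain is an assumption added so that the recursion always terminates. All function names and the toy rows are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain_rate(rows, attr, target="C"):
    """Gain(attr) / SplitInfo(attr) on the given rows (0 if SplitInfo is 0)."""
    n = len(rows)
    parts = {}
    for r in rows:
        parts.setdefault(r[attr], []).append(r[target])
    info_attr = sum(len(p) / n * entropy(p) for p in parts.values())
    split_info = entropy([r[attr] for r in rows])
    if split_info == 0:
        return 0.0
    return (entropy([r[target] for r in rows]) - info_attr) / split_info

def grow(rows, attrs, target="C"):
    """Return a leaf label or a dict node {"attr", "children", "default"}."""
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Leaf: all samples agree, or (assumption) no attribute is left.
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: gain_rate(rows, a, target))
    if gain_rate(rows, best, target) <= 0:  # assumption: stop on zero gain
        return majority
    rest = [a for a in attrs if a != best]
    children = {}
    for value in sorted({r[best] for r in rows}, key=str):
        subset = [r for r in rows if r[best] == value]
        children[value] = grow(subset, rest, target)
    return {"attr": best, "children": children, "default": majority}

def predict(tree, row):
    """Walk the tree; unseen attribute values fall back to the node default."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["attr"]], tree["default"])
    return tree

# Toy rows only; they are not taken from the patent's data.
rows = [
    {"seal": "full", "type": "directional", "C": "yes"},
    {"seal": "full", "type": "vertical", "C": "yes"},
    {"seal": "partial", "type": "directional", "C": "no"},
    {"seal": "partial", "type": "vertical", "C": "no"},
]
tree = grow(rows, ["seal", "type"])
```

On the toy rows the sealing attribute separates the classes perfectly, so the tree has a single split and two leaves.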
In the above method, step 7) indicates that the other k-1 decision trees in the random forest are built in the same way as the 1st decision tree.
In the above method, in step 8), once the characteristics of the well to be pre-judged (well type, well category, whether fully sealed, production time, reservoir temperature, pressure, and CO2 content) are determined, the k decision trees produce k results, and a final vote determines whether the well keeps its wellbore intact over the production time.
The random forest algorithm on which the invention is based is an ensemble algorithm in data mining; it combines the results of multiple decision trees and therefore has high accuracy. The introduced randomness keeps the result from overfitting, avoids the situation in which no leaf node can be found, and gives the model strong generalization ability; the model is insensitive to outliers and not easily disturbed by singular points. The factors considered by the invention include well type, sealing type, production time, reservoir temperature, pressure, and CO2 content, giving broad coverage. The method overcomes the shortcomings of existing methods: low accuracy, susceptibility to abnormal data, and inaccurate pre-judgment because production time and reservoir temperature and pressure are not considered.
Examples
As shown in FIG. 1, the wellbore integrity pre-judging method based on the random forest algorithm provided by the invention comprises the following steps:
1) Establishing a big data sample: the existing oil and gas well data are compiled to obtain 1500 training samples {X1, X2, …, X6, X7, C}, as shown below. The seven attribute values are: X1 is the well type, X2 is the well category, X3 is whether the production casing is fully sealed, X4 is the production time, X5 is the reservoir temperature, X6 is the reservoir pressure, and X7 is the CO2 content. The conclusion value C indicates whether the wellbore is intact and takes the value "yes" or "no".
No. | X1 well type | X2 well category | X3 fully sealed | X4 production time (years) | X5 reservoir temperature (°C) | X6 reservoir pressure (MPa) | X7 CO2 content | C wellbore intact |
1 | Vertical well | Oil well | Yes | 10 | 140 | 80 | 5% | No |
2 | Directional well | Oil well | Yes | 12 | 120 | 64 | 8% | Yes |
3 | Directional well | Gas well | Yes | 15 | 100 | 90 | 6% | No |
4 | Directional well | Gas well | Yes | 6 | 80 | 86 | 12% | Yes |
5 | Horizontal well | Gas well | No | 5 | 110 | 82 | 9% | Yes |
… | … | … | … | … | … | … | … | … |
1500 | Vertical well | Oil well | No | 8 | 120 | 70 | 5% | Yes |
2) Determining the characteristics of the well to be pre-judged: the well is a directional oil production well with fully sealed production casing, a reservoir temperature of 110 °C, a reservoir pressure of 70 MPa, and a CO2 content of 10%; the question is whether its wellbore will remain intact up to the 10th year of production.
3) From all 1500 samples, randomly select 1500 samples with replacement (duplicates arise; 1000 samples remain after the duplicates are merged), and randomly select four of the seven attributes to form a new data subset of dimension 1000 × 4. Repeat this step 9 times to obtain 9 data subsets. Assume that in the 1st data subset wellbore integrity is "yes" in 800 rows and "no" in 200 rows, and that the four randomly selected attributes are well type, well category, whether the casing is fully sealed, and production time.
4) Using the 1st data subset, build the 1st decision tree based on the C4.5 algorithm. For discrete attributes such as well type, well category, and sealing type, the specific steps are as follows: first calculate the class entropy Info(integrity), then the attribute entropy Info(attribute) of each attribute; subtract the attribute entropy from the class entropy to obtain the information gain Gain(attribute) of each attribute; calculate the split information metric SplitInfo(attribute) of each attribute; and finally calculate the information gain rate GainRate(attribute) of each attribute.
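Assume that the ratio of well types in data subset 1 is vertical : directional : horizontal = 100 : 700 : 200, with integrity as follows.
Well type | Intact (wells) | Not intact (wells) |
Vertical wells (100) | 50 | 50 |
Directional wells (700) | 600 | 100 |
Horizontal wells (200) | 150 | 50 |
Assume that the ratio of well categories in data subset 1 is oil well : gas well = 700 : 300, with integrity as follows.
Well category | Intact (wells) | Not intact (wells) |
Oil production wells (700) | 600 | 100 |
Gas production wells (300) | 200 | 100 |
Assume that the sealing condition in data subset 1 is fully sealed : not fully sealed = 800 : 200, with integrity as follows.
Sealing condition | Intact (wells) | Not intact (wells) |
Fully sealed (800) | 750 | 50 |
Not fully sealed (200) | 50 | 150 |
The total class entropy Info(integrity) and the entropy of each attribute Info(attribute) are calculated separately, where H(a, b) denotes the entropy of a group containing a intact and b not-intact wells:
$$H(a, b) = -\frac{a}{a+b}\log_2\frac{a}{a+b} - \frac{b}{a+b}\log_2\frac{b}{a+b}$$
$$\mathrm{Info}(\text{integrity}) = H(800, 200) \approx 0.722$$
$$\mathrm{Info}(\text{well type}) = \tfrac{100}{1000}H(50, 50) + \tfrac{700}{1000}H(600, 100) + \tfrac{200}{1000}H(150, 50) \approx 0.676$$
$$\mathrm{Info}(\text{well category}) = \tfrac{700}{1000}H(600, 100) + \tfrac{300}{1000}H(200, 100) \approx 0.690$$
$$\mathrm{Info}(\text{sealing}) = \tfrac{800}{1000}H(750, 50) + \tfrac{200}{1000}H(50, 150) \approx 0.432$$
The information gain of each attribute, Gain(attribute), is calculated:
Gain(well type) = Info(integrity) - Info(well type) ≈ 0.722 - 0.676 = 0.046
Gain(well category) = Info(integrity) - Info(well category) ≈ 0.722 - 0.690 = 0.032
Gain(sealing) = Info(integrity) - Info(sealing) ≈ 0.722 - 0.432 = 0.290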
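The split information metric of each attribute, SplitInfo(attribute), is calculated from the group sizes given above:
$$\mathrm{SplitInfo}(\text{well type}) = -\tfrac{100}{1000}\log_2\tfrac{100}{1000} - \tfrac{700}{1000}\log_2\tfrac{700}{1000} - \tfrac{200}{1000}\log_2\tfrac{200}{1000} \approx 1.157$$
$$\mathrm{SplitInfo}(\text{well category}) = -\tfrac{700}{1000}\log_2\tfrac{700}{1000} - \tfrac{300}{1000}\log_2\tfrac{300}{1000} \approx 0.881$$
$$\mathrm{SplitInfo}(\text{sealing}) = -\tfrac{800}{1000}\log_2\tfrac{800}{1000} - \tfrac{200}{1000}\log_2\tfrac{200}{1000} \approx 0.722$$
Finally, the information gain rate of each attribute, GainRate(attribute) = Gain(attribute) / SplitInfo(attribute), is calculated: approximately 0.04 for well type, 0.04 for well category, and 0.40 for sealing condition, so sealing condition has by far the largest gain rate of the three discrete attributes.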
5) Continuing with the 1st data subset: for a continuous attribute such as production time (reservoir temperature, pressure, and CO2 content are handled in the same way), the values are sorted in ascending order, the midpoint of each pair of adjacent values is taken as a candidate split point that divides the data into two subsets, and the information gain rate is then calculated by the discrete-attribute method.
For example, the production times are first sorted in ascending order, i.e. 5 years, 6 years, 8 years, 10 years, and so on. The midpoint of each adjacent pair is then taken: the midpoint of 5 years and 6 years is 5.5 years, and the other midpoints, such as 7 years and 9 years, are obtained in the same way. Each midpoint discretizes data subset 1 into two smaller subsets; for example, the midpoint "9 years" divides it into the wells with production time <= 9 years and the wells with production time > 9 years.
Assume that in data subset 1 the ratio of wells with production time <= 9 years to wells with production time > 9 years is 100 : 900, with integrity as follows:
Production time | Intact (wells) | Not intact (wells) |
Production time <= 9 years (100) | 95 | 5 |
Production time > 9 years (900) | 705 | 195 |
The information gain of the production time attribute at the split point of 9 years is calculated as follows:
Gain(time = 9) = Info(integrity) - Info(time = 9) ≈ 0.722 - 0.707 = 0.015
Dividing this gain by the corresponding split information gives the gain rate at this split point. The other midpoints are evaluated in turn to obtain the information gain rates of all candidate split points of production time.
6) Rank the gain rates from largest to smallest, select the attribute with the largest gain rate as the splitting attribute for the first split of data subset 1, and apply the C4.5 algorithm again within each resulting subset until every subset has been split into leaf nodes (that is, wellbore integrity is uniformly "yes" or uniformly "no"); the 1st decision tree is then fully grown.
7) Repeat steps 4 to 6 with the randomly generated 2nd to 9th data subsets to build the 2nd to 9th decision trees.
8) The characteristics of the well to be pre-judged (a directional oil production well with fully sealed production casing, a reservoir temperature of 110 °C, a pressure of 70 MPa, a CO2 content of 10%, and production up to the 10th year) are fed into the 9 decision trees above, yielding 9 "yes" or "no" conclusions; the conclusion with the most votes is the final result.
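The figures quoted in steps 4) and 5) of the example can be checked numerically with a short script. This is a verification sketch under one stated assumption: the "not fully sealed" group is taken as 50 intact and 150 not-intact wells, which is the only split of 200 wells consistent with 800 intact wells overall and with Info(sealing) = 0.432. The helper names `H` and `stats` are hypothetical.

```python
from math import log2

def H(*counts):
    """Shannon entropy in bits of the given counts (zero counts are skipped)."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

def stats(groups):
    """groups: (intact, not intact) counts per attribute value.

    Returns (Info, Gain, SplitInfo, GainRate) for the attribute.
    """
    n = sum(a + b for a, b in groups)
    info = sum((a + b) / n * H(a, b) for a, b in groups)
    gain = H(sum(a for a, _ in groups), sum(b for _, b in groups)) - info
    split = H(*(a + b for a, b in groups))
    return info, gain, split, gain / split

tables = {
    "well type": [(50, 50), (600, 100), (150, 50)],
    "well category": [(600, 100), (200, 100)],
    # Assumption: 50 intact / 150 not intact in the not-fully-sealed group.
    "sealing": [(750, 50), (50, 150)],
    "time <= 9 y": [(95, 5), (705, 195)],
}
results = {name: stats(groups) for name, groups in tables.items()}
for name, (info, gain, split, rate) in results.items():
    print(f"{name:14s} Info={info:.3f} Gain={gain:.3f} "
          f"SplitInfo={split:.3f} GainRate={rate:.3f}")

# Attribute (or split point) with the largest gain rate among those tabulated.
best = max(results, key=lambda name: results[name][3])
```

Among the four tabulated candidates, sealing condition has the largest gain rate, so under these assumed counts it would be chosen for the first split of data subset 1.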
Comparison with the prior art: the existing method for judging wellbore integrity decomposes the wellbore into several evaluation units, such as the tubing string, the cement sheath, and the wellhead equipment, and evaluates each separately. The risk factor of a unit is the product of its failure frequency P and the severity S of the failure consequence; the unit risk factors are then weighted and combined into the risk degree of the well. The frequency P is obtained from a failure fault tree and lies between 0 and 1; the severity S is generally assigned from subjective experience and has no fixed range.
The existing method has the following shortcomings: ① the effect of production time is not considered; ② the influence of reservoir temperature and pressure conditions is not considered; ③ the value of the severity S is too subjective; ④ the contributions of the evaluation units to an integrity failure are difficult to define: for example, when pressure is observed at the wellhead of a well, the cause may be casing corrosion and perforation, cement sheath cracking, or both, and determining which unit (or units) is responsible requires shutting in the well, pulling the production string, and checking the integrity of each unit one by one, which is very costly and may still fail to identify the cause; ⑤ the existing evaluation method offers little practical guidance, since it only quantifies the risk of failure and cannot indicate when a failure will occur.
The factors considered by the invention include well type, sealing type, production time, reservoir temperature, pressure, and CO2 content; the coverage is broad and there is no subjectively assigned item. Because the method is based on the random forest algorithm and combines the conclusions of multiple decision trees, the accuracy of the pre-judgment result is high. The introduced randomness keeps the result from overfitting and avoids the situation in which no leaf node can be found, so the model generalizes well; it is insensitive to outliers and not easily disturbed by singular points. The method does not require stopping production to inspect each evaluation unit: the big data sample can be built simply by observing whether the wellhead shows pressure, so the economic cost is zero. Moreover, because the influence of production time is included, the pre-judgment result indicates when the well may fail in the future and can therefore guide practice.
The foregoing description of the exemplary embodiments is intended to illustrate the invention and not to limit its scope. Moreover, the components of the invention are not limited to the overall application described above; each technical feature described in the specification may be used alone or in combination with others according to actual needs, and such other combinations and specific applications are likewise covered by the invention.
Claims (8)
1. A method for prejudging the integrity of a wellbore, the method comprising the following steps:
S1, establishing a big data sample from existing oil and gas well data;
S2, determining the characteristics of a well to be prejudged;
S3, randomly generating k data subsets from the big data sample;
S4, calculating the information gain rates of the discrete attributes and the continuous attributes of the k data subsets;
S5, splitting the subsets by using the information gain rates to establish the 1st to k-th decision trees;
S6, placing the characteristics of the well to be prejudged into the k decision trees, and deciding by vote the wellbore integrity of the well to be prejudged in the n-th year;
wherein the big data sample includes {X1, X2, …, X6, X7, C}, in which X1 is the well category, X2 is the well type, X3 is whether the production casing is fully sealed, X4 is the production time, X5 is the reservoir temperature, X6 is the reservoir pressure, and X7 is the CO2 content, and C represents whether the wellbore of the well is intact;
and step S6 comprises: placing the well category, the well type, and the production casing sealing status of the well to be prejudged into the k decision trees to make a decision, thereby prejudging the wellbore integrity when production reaches the n-th year.
2. The method of claim 1, wherein determining the characteristics of the well to be prejudged comprises: determining the well category, the well type, whether the production casing is fully sealed, the reservoir temperature, the reservoir pressure, and the CO2 content of the well to be prejudged.
3. The method of claim 1, wherein step S3 comprises randomly selecting N samples from all N samples, randomly selecting m attributes from all M attributes of the samples to form a new data subset, and repeating this step k times to obtain k data subsets, wherein M is the number of all attributes.
4. The method of claim 1, wherein the 1st to k-th decision trees are established based on the C4.5 algorithm.
5. The method of claim 1, wherein the information gain rate of the discrete attributes is calculated by calculating the class entropy, calculating the attribute entropy of each attribute, subtracting the attribute entropy from the class entropy to obtain the information gain of each attribute, calculating the split information metric of each attribute, and finally calculating the information gain rate of each attribute.
6. The method of claim 1, wherein the information gain rate of a continuous attribute is calculated by sorting the values of the attribute from small to large, taking the midpoint of each pair of adjacent values as a split point that divides the data into two small subsets, and then calculating the information gain rate in the same way as for a discrete attribute.
7. The method of claim 1, wherein, when the i-th decision tree is established, the gain rates of all attributes calculated from the i-th data subset are arranged from large to small, the attribute with the largest gain rate is selected as the splitting attribute to split the i-th data subset, and a lower-level decision tree is established in the same way in each small subset obtained by the split, until all child nodes are leaf nodes, whereupon the i-th decision tree is complete.
8. The method of claim 1, wherein k "yes" or "no" results are obtained when deciding, and the result with the most votes is the final result.
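The information gain rate calculations of claims 5 and 6 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation and not a full C4.5: the function names and the toy data are hypothetical. The sketch uses the standard C4.5 definitions, in which the information gain is the class entropy minus the attribute entropy, and the gain rate is the information gain divided by the split information.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (base 2) of the value distribution in a list."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total) for c in Counter(items).values())


def gain_rate_discrete(values, labels):
    """Information gain rate of one discrete attribute with respect to the class labels."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Attribute entropy: class entropy inside each value group, weighted by group size.
    attribute_entropy = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - attribute_entropy
    # Split information: entropy of the partition produced by the attribute itself.
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


def best_split_continuous(values, labels):
    """Try the midpoint of each pair of adjacent sorted values; return (threshold, gain rate)."""
    distinct = sorted(set(values))
    best = (None, 0.0)
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        # The two-way split is scored exactly like a discrete attribute with two values.
        sides = ["<=" if v <= threshold else ">" for v in values]
        rate = gain_rate_discrete(sides, labels)
        if rate > best[1]:
            best = (threshold, rate)
    return best


# Toy data (invented): production time in years versus wellbore intact "yes"/"no".
years = [1.0, 2.0, 8.0, 9.0]
intact = ["yes", "yes", "no", "no"]
threshold, rate = best_split_continuous(years, intact)
```

On this toy data the midpoint 5.0 separates the two classes completely, so it is the split that the tree construction of claim 7 would select for this attribute.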
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110269788.6A CN112966023B (en) | 2021-03-12 | 2021-03-12 | Integrity prejudging method for shaft |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112966023A (en) | 2021-06-15 |
CN112966023B (en) | 2024-06-14 |
Family
ID=76277612
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110269788.6A Active CN112966023B (en) | 2021-03-12 | 2021-03-12 | Integrity prejudging method for shaft |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112966023B (en) |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2009029135A1 (en) * | 2007-08-24 | 2009-03-05 | Exxonmobil Upstream Research Company | Method for predicting well reliability by computer simulation |
EA025004B1 (en) * | 2010-06-18 | 2016-11-30 | Лэндмарк Грэфикс Корпорейшн | Computer-implemented method for wellbore optimization and program carrier device having computer executable instructions for optimization of a wellbore |
CN108733966A (en) * | 2017-04-14 | 2018-11-02 | 国网重庆市电力公司 | A kind of multidimensional electric energy meter field thermodynamic state verification method based on decision woodlot |
CN109751038A (en) * | 2017-11-01 | 2019-05-14 | 中国石油化工股份有限公司 | A kind of method of quantitative assessment oil/gas well wellbore integrity |
CN108846259B (en) * | 2018-04-26 | 2020-10-23 | 河南师范大学 | Gene classification method and system based on clustering and random forest algorithm |
CN110717524B (en) * | 2019-09-20 | 2021-04-06 | 浙江工业大学 | Method for predicting thermal comfort of old people |
CN112329862A (en) * | 2020-11-09 | 2021-02-05 | 杭州安恒信息技术股份有限公司 | Decision tree-based anti-money laundering method and system |
Non-Patent Citations (1)
Title |
---|
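Research on wellbore integrity failure prejudgment based on random forest machine learning; Yuan Junliang et al.; Bulletin of Science and Technology (《科技通报》); 20220228; 55-60 * |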
Also Published As
Publication number | Publication date |
---|---|
CN112966023A (en) | 2021-06-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112901137B (en) | Deep well drilling mechanical drilling speed prediction method based on deep neural network Sequential model | |
CN112529341B (en) | Drilling well leakage probability prediction method based on naive Bayesian algorithm | |
CN108733632B (en) | Well selection evaluation method for repeated fracturing of medium-low permeability high water-containing oil reservoir | |
Jianxing et al. | Risk assessment of submarine pipelines using modified FMEA approach based on cloud model and extended VIKOR method | |
CN112610903A (en) | Water supply pipe network leakage positioning method based on deep neural network model | |
CN107387051B (en) | Repeated fracturing well selection method for multi-stage fractured horizontal well with low-permeability heterogeneous oil reservoir | |
CN105205329A (en) | Comprehensive evaluation method for dam safety | |
CN109522962B (en) | Chemical plant safety quantitative evaluation method | |
CN107862324B (en) | MWSPCA-based CBR prediction model intelligent early warning method | |
CN114372693B (en) | Transformer fault diagnosis method based on cloud model and improved DS evidence theory | |
CN106228190A (en) | Decision tree method of discrimination for resident's exception water | |
CN111414692B (en) | Pressure gauge verification table reliability assessment method based on Bayesian correction model | |
CN115471097A (en) | Data-driven underground local area safety state evaluation method | |
Tripathy et al. | Explaining Anomalies in Industrial Multivariate Time-series Data with the help of eXplainable AI | |
CN101771584B (en) | Network abnormal flow detection method | |
CN116934262A (en) | Construction safety supervision system and method based on artificial intelligence | |
Qin et al. | Evaluation of goaf stability based on transfer learning theory of artificial intelligence | |
CN112966023B (en) | Integrity prejudging method for shaft | |
Su et al. | Prediction of drilling leakage locations based on optimized neural networks and the standard random forest method | |
CN116825253B (en) | Method for establishing hot rolled strip steel mechanical property prediction model based on feature selection | |
CN109656904A (en) | A kind of case risk checking method and system | |
CN112818557A (en) | Well control system safety assessment method and system based on fuzzy comprehensive analysis | |
CN110927478B (en) | Method and system for determining state of transformer equipment of power system | |
CN116384780A (en) | Fire-fighting system safety degree judging method | |
CN114818927B (en) | Data-driven equipment corrosion prediction method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||