CN112966023B - Integrity prejudging method for shaft - Google Patents

Publication number
CN112966023B
Authority
CN
China
Prior art keywords
well
attribute
data
integrity
calculating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110269788.6A
Other languages
Chinese (zh)
Other versions
CN112966023A (en
Inventor
袁俊亮
殷志明
范白涛
李中
幸雪松
谢仁军
周长所
罗洪斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Research Center of CNOOC China Ltd
CNOOC China Ltd
Original Assignee
Beijing Research Center of CNOOC China Ltd
CNOOC China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Research Center of CNOOC China Ltd, CNOOC China Ltd
Priority to CN202110269788.6A
Publication of CN112966023A
Application granted
Publication of CN112966023B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/02 Agriculture; Fishing; Forestry; Mining

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Mining & Mineral Resources (AREA)
  • Marketing (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Agronomy & Crop Science (AREA)
  • Animal Husbandry (AREA)
  • Marine Sciences & Fisheries (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Evolutionary Computation (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Testing Of Devices, Machine Parts, Or Other Structures Thereof (AREA)
  • Testing Or Calibration Of Command Recording Devices (AREA)

Abstract

The invention relates to a method for prejudging the integrity of a wellbore, comprising the following steps: S1, establishing a big data sample from existing oil and gas well data; S2, determining the characteristics of the well to be prejudged; S3, randomly generating k data subsets from the big data sample; S4, calculating the information gain rate of the discrete attributes and the continuous attributes of the k data subsets; S5, splitting the subsets according to the information gain rate to build the 1st to kth decision trees; S6, inputting the characteristics of the well to be prejudged into the k decision trees and deciding by vote the wellbore integrity of that well in its nth year of production. The method overcomes the shortcomings of existing methods, namely low accuracy, susceptibility to abnormal data, and inaccurate prejudgment caused by ignoring factors such as production time, reservoir temperature, and reservoir pressure.

Description

Integrity prejudging method for shaft
Technical Field
The invention relates to the technical field of oil and gas wells, and in particular to a method for prejudging the integrity of a wellbore.
Background
Wellbore integrity failure is one of the main risks in the production of high-temperature, high-pressure oil and gas wells. It manifests as pressure appearing in an annulus whose original pressure was zero, and once oil or gas leaks, the safety of personnel and equipment is seriously threatened. The problem is increasingly prominent in high-temperature, high-pressure oil and gas wells; for example, of the 16000 production wells in the Outer Continental Shelf (OCS) area of the Gulf of Mexico, 43% exhibit annular pressure.
Existing studies of wellbore integrity focus mainly on risk assessment, that is, on quantitatively calculating the integrity risk of a target well. The wellbore is decomposed into several evaluation units, such as the tubular string, the cement sheath, and the wellhead device, and a risk factor is calculated for each. The risk factor of a unit is the product of its failure frequency and the severity of the failure consequence, and the risk factors of all units are then weighted together to give the risk degree of the well. This approach has several drawbacks. It does not consider the influence of production factors, the severity value is assigned too subjectively, and the accuracy of the evaluation result is low. The contribution of each unit to the integrity failure of the whole well is also difficult to define. In addition, the conventional evaluation offers little practical guidance, because it cannot predict in which future year a failure will occur.
Accordingly, a more comprehensive, objective, and scientific method of prejudging wellbore integrity is needed.
Disclosure of Invention
In view of one or more of the above deficiencies of the prior art, the present invention provides a wellbore integrity prejudging method for predicting whether a given oil or gas well will develop a wellbore integrity problem within a specific period of time.
The invention provides a method for prejudging the integrity of a wellbore, which comprises the following steps:
S1, establishing a big data sample from existing oil and gas well data;
S2, determining the characteristics of the well to be prejudged;
S3, randomly generating k data subsets from the big data sample;
S4, calculating the information gain rate of the discrete attributes and the continuous attributes of the k data subsets;
S5, splitting the subsets according to the information gain rate to build the 1st to kth decision trees;
S6, inputting the characteristics of the well to be prejudged into the k decision trees, and deciding by vote the wellbore integrity of the well to be prejudged in the nth year.
According to one embodiment of the invention, the big data sample comprises {X1, X2, …, X6, X7, C}, wherein: X1 is the well type (vertical, directional, or horizontal), X2 is the well category (oil well or gas well), X3 is whether the production casing is fully sealed, X4 is the production time, X5 is the reservoir temperature, X6 is the reservoir pressure, and X7 is the CO2 content; C indicates whether the wellbore of the well is complete.
According to one embodiment of the invention, determining the characteristics of the well to be prejudged comprises: determining the well type, the well category, whether the production casing is fully sealed, the reservoir temperature, the reservoir pressure, and the CO2 content of the well to be prejudged.
According to one embodiment of the invention, step S3 comprises drawing N samples at random, with replacement, from all N samples, randomly selecting m attributes from all attributes of the samples to form a new data subset, and repeating this step k times to obtain k data subsets, wherein m is less than the number of all attributes.
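The subset generation of step S3 can be sketched as follows. This is a minimal illustration, not the patented implementation: the attribute names, the dictionary row layout, and the function `make_subsets` are assumptions introduced here, and merging of repeated draws follows the detailed description of the step.

```python
import random

# Hypothetical attribute names standing in for X1..X7 of the big data sample.
ATTRIBUTES = ["well_type", "well_category", "fully_sealed", "production_time",
              "reservoir_temp", "reservoir_pressure", "co2_content"]

def make_subsets(samples, k, m, seed=0):
    """Return k (attrs, rows) pairs built by bootstrap sampling plus attribute selection."""
    if not 0 < m < len(ATTRIBUTES):
        raise ValueError("m must be positive and smaller than the number of attributes")
    rng = random.Random(seed)
    n_total = len(samples)
    subsets = []
    for _ in range(k):
        # Draw N row indices with replacement, then merge repeated draws.
        drawn = [rng.randrange(n_total) for _ in range(n_total)]
        unique = sorted(set(drawn))
        # Pick m distinct attributes at random.
        attrs = rng.sample(ATTRIBUTES, m)
        # Keep the chosen attributes plus the class label C for every retained row.
        rows = [{**{a: samples[i][a] for a in attrs}, "C": samples[i]["C"]} for i in unique]
        subsets.append((attrs, rows))
    return subsets
```

For large N, drawing N rows with replacement leaves on average about 63% (1 - 1/e) of the rows distinct, so each subset is noticeably smaller than the full sample after duplicates are merged.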
According to one embodiment of the invention, the 1st to kth decision trees are built using the C4.5 algorithm.
According to one embodiment of the invention, the information gain rate of a discrete attribute is calculated by first calculating the class entropy, then calculating the attribute entropy of each attribute, subtracting the attribute entropy from the class entropy to obtain the information gain of each attribute, calculating the split information metric of each attribute, and finally calculating the information gain rate of each attribute.
According to one embodiment of the invention, for a continuous attribute, the values of the attribute are sorted from small to large, the midpoint of each pair of adjacent values is taken as a bifurcation point that divides the data into two small subsets, and the information gain rate of the continuous attribute is then calculated by the same method as for a discrete attribute.
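The treatment of a continuous attribute can be illustrated with a short sketch. The function names and the row layout (dictionaries keyed by attribute name) are assumptions made for this example only.

```python
def candidate_thresholds(values):
    """Midpoints between adjacent distinct values, sorted from small to large."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]

def split_by_threshold(rows, attr, threshold):
    """Divide rows into the two small subsets defined by one bifurcation point."""
    left = [r for r in rows if r[attr] <= threshold]
    right = [r for r in rows if r[attr] > threshold]
    return left, right
```

Each candidate threshold turns the continuous attribute into a two-valued discrete attribute, so the discrete gain rate calculation applies unchanged.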
According to one embodiment of the invention, the ith decision tree is built from the ith data subset: the calculated gain rates of all attributes are arranged from large to small, the attribute with the largest gain rate is selected as the splitting attribute to split the ith data subset, and a lower-level decision tree is built in the same way in each small subset after splitting, until all child nodes are leaf nodes, at which point the ith decision tree is complete.
According to one embodiment of the invention, step S6 comprises: inputting the characteristics of the well to be prejudged (well type, well category, whether the production casing is fully sealed, production time up to the nth year, reservoir temperature, reservoir pressure, and CO2 content) into the k decision trees, and deciding whether the wellbore remains intact up to the nth year of production.
According to one embodiment of the invention, the decision yields k "yes" or "no" results, and the result with the most votes is the final result.
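The voting step can be sketched in a few lines. The function name `majority_vote` and the nine sample votes are hypothetical and serve only to illustrate the idea.

```python
from collections import Counter

def majority_vote(votes):
    """Return the label that most of the k trees voted for."""
    if not votes:
        raise ValueError("at least one vote is required")
    return Counter(votes).most_common(1)[0][0]

# Nine hypothetical tree outputs for one well.
votes = ["yes", "no", "yes", "yes", "no", "yes", "no", "yes", "yes"]
result = majority_vote(votes)
```

Choosing an odd k, as in a forest of nine trees, rules out ties between "yes" and "no".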
The random forest algorithm on which the invention is based is an ensemble algorithm in data mining. It combines the results of multiple decision trees and therefore has high accuracy; the randomness it introduces prevents overfitting and avoids the situation in which no leaf node can be found, so the model generalizes well; and it is insensitive to outliers and not easily disturbed by singular points. The factors considered by the invention include well type, sealing type, production time, reservoir temperature, reservoir pressure, and CO2 content, so coverage is broad. The invention thereby overcomes the shortcomings of existing methods, namely low accuracy, susceptibility to abnormal data, and inaccurate prejudgment caused by ignoring production time, reservoir temperature, and reservoir pressure.
Drawings
FIG. 1 is a flow chart of a method for integrity pre-determination of a wellbore according to an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention are described in detail below with reference to the attached drawings, so that the objects, features, and advantages of the invention can be understood more clearly. The embodiments shown in the drawings are not intended to limit the scope of the invention, but merely illustrate its essential spirit.
The method comprehensively considers big data from historical wells, such as well type, sealing type, production time, reservoir temperature, reservoir pressure, and CO2 content, with no subjectively assigned item taking part in the calculation at any point. Based on the random forest data mining algorithm, it integrates the results of multiple decision trees and thereby improves prejudgment accuracy. Because production time is introduced, the result can indicate when a failure may occur in the future and thus guide practice.
The embodiment of the invention provides a wellbore integrity prejudging method based on the random forest algorithm. It is suitable for oil and gas fields that have a sizeable number of wells and have experienced wellbore integrity problems, and it is mainly used to prejudge whether a given oil or gas well will develop a wellbore integrity problem within a specific period of time. Being based on the random forest algorithm, the method has high accuracy; the randomness it introduces gives the model strong generalization capability and makes it insensitive to outliers; and the factors considered include well type, sealing type, production time, reservoir temperature, reservoir pressure, and CO2 content. The method overcomes the shortcomings of existing methods: low accuracy, the high cost of identifying the failed unit, the neglect of factors such as production time, and limited practical guidance.
To achieve the above object, the embodiments of the invention adopt the following technical solution: a wellbore integrity prejudging method based on the random forest algorithm, comprising the following steps:
1) Establishing a big data sample: compile existing oil and gas well data to obtain N training samples {X1, X2, …, X6, X7, C}, with seven attribute values: X1 is the well type (vertical, directional, or horizontal), X2 is the well category (oil well or gas well), X3 is whether the production casing is fully sealed, X4 is the production time, X5 is the reservoir temperature, X6 is the reservoir pressure, and X7 is the CO2 content; and one conclusion value C, which indicates whether the wellbore is complete and takes the value "yes" or "no".
2) Determining the characteristics of the well to be prejudged: establish the well type, well category, whether the production casing is fully sealed, reservoir temperature, reservoir pressure, and CO2 content of the well to be prejudged. The aim is to judge whether its wellbore will remain intact up to the nth year of production.
3) Draw N samples at random, with replacement, from all N samples (duplicates will occur; merging them leaves n distinct samples), and randomly select m (m < 7) of the seven attributes to form a new data subset of dimension n × m. Repeat this step k times to obtain k data subsets.
4) Using the 1st data subset, build the 1st decision tree with the C4.5 algorithm. For discrete attributes such as well type, well category, and sealing type, first calculate the class entropy Info(integrity), then the attribute entropy Info(attribute) of each attribute; subtract the attribute entropy from the class entropy to obtain the information gain Gain(attribute) of each attribute; calculate the split information metric SplitInfo(attribute) of each attribute; and finally calculate the information gain rate GainRate(attribute) of each attribute.
5) Still using the 1st data subset, handle continuous attributes such as production time, reservoir temperature, reservoir pressure, and CO2 content as follows: sort the values from small to large, take the midpoint of each pair of adjacent values as a bifurcation point that divides the data into two small subsets, and then calculate the information gain rate by the discrete attribute method.
6) Using the 1st data subset, arrange the calculated gain rates of all attributes from large to small, select the attribute with the largest gain rate as the splitting attribute to split the data subset, and apply steps 4 and 5 again in each small subset after splitting, until all child nodes are leaf nodes (that is, wellbore integrity is entirely "yes" or entirely "no"). The 1st decision tree is then fully grown.
7) Repeat steps 4 to 6 with the 2nd to kth data subsets to build the 2nd to kth decision trees.
8) Input the characteristics of the well to be prejudged (well type, well category, whether the production casing is fully sealed, production up to the nth year, reservoir temperature, reservoir pressure, and CO2 content) into the k decision trees to obtain k "yes" or "no" results; the result with the most votes is the final result.
C4.5 is a family of algorithms used for classification problems in machine learning and data mining. It performs supervised learning: given a data set in which each tuple is described by a set of attribute values and belongs to exactly one of several mutually exclusive classes, C4.5 learns a mapping from attribute values to classes, and this mapping can then be used to classify new entities whose class is unknown. Such algorithms belong to the prior art and are not described in detail here.
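In the above method, in step 4), the total class entropy Info(integrity) and the attribute entropy Info(X_i) of each attribute are calculated as follows:

$$\mathrm{Info}(\text{integrity}) = -\frac{N_{\text{complete}}}{N}\log_2\frac{N_{\text{complete}}}{N} - \frac{N_{\text{incomplete}}}{N}\log_2\frac{N_{\text{incomplete}}}{N}$$

$$\mathrm{Info}(X_i) = \sum_j \frac{X_i^j}{N}\left(-\frac{X_{i,\text{complete}}^j}{X_i^j}\log_2\frac{X_{i,\text{complete}}^j}{X_i^j} - \frac{X_{i,\text{incomplete}}^j}{X_i^j}\log_2\frac{X_{i,\text{incomplete}}^j}{X_i^j}\right)$$

where N is the total number of wells; N_complete and N_incomplete are the numbers of samples with a complete and an incomplete wellbore; X_i^j is the number of samples for which attribute X_i takes the value j; X_{i,complete}^j is the number of those samples with a complete wellbore; and X_{i,incomplete}^j is the number of those samples with an incomplete wellbore.
Taking the well category X_2 (oil well or gas well) as an example, Info(X_2) is:

$$\mathrm{Info}(X_2) = \sum_{j \in \{\text{oil},\,\text{gas}\}} \frac{X_2^j}{N}\left(-\frac{X_{2,\text{complete}}^j}{X_2^j}\log_2\frac{X_{2,\text{complete}}^j}{X_2^j} - \frac{X_{2,\text{incomplete}}^j}{X_2^j}\log_2\frac{X_{2,\text{incomplete}}^j}{X_2^j}\right)$$

The information gain Gain(X_i) of each attribute is calculated:

$$\mathrm{Gain}(X_i) = \mathrm{Info}(\text{integrity}) - \mathrm{Info}(X_i)$$

The split information metric SplitInfo(X_i) of each attribute is calculated:

$$\mathrm{SplitInfo}(X_i) = -\sum_j \frac{X_i^j}{N}\log_2\frac{X_i^j}{N}$$

where N is the total number of wells and X_i^j is the number of samples for which attribute X_i takes the value j.
Taking the well category X_2 as an example, SplitInfo(X_2) is:

$$\mathrm{SplitInfo}(X_2) = -\frac{X_2^{\text{oil}}}{N}\log_2\frac{X_2^{\text{oil}}}{N} - \frac{X_2^{\text{gas}}}{N}\log_2\frac{X_2^{\text{gas}}}{N}$$

Finally, the information gain rate GainRate(X_i) of each attribute is calculated:

$$\mathrm{GainRate}(X_i) = \frac{\mathrm{Gain}(X_i)}{\mathrm{SplitInfo}(X_i)}$$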
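The chain from class entropy to gain rate can be computed directly from labelled rows. The following is a minimal sketch under stated assumptions: rows are dictionaries, the class label is stored under the hypothetical key `C`, and the function names are chosen for this example.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    """Shannon entropy (base 2) of a list of labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, attr, label="C"):
    """GainRate(attr) = (Info(class) - Info(attr)) / SplitInfo(attr)."""
    n = len(rows)
    # Group the class labels by the value the attribute takes.
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr]].append(r[label])
    info_class = entropy([r[label] for r in rows])
    info_attr = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups.values())
    gain = info_class - info_attr
    # An attribute with a single value cannot split the data.
    return gain / split_info if split_info > 0 else 0.0
```

An attribute that separates complete from incomplete wells perfectly has a gain equal to the class entropy, while an attribute unrelated to integrity has a gain, and hence a gain rate, of zero.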
In the above method, in step 5), the continuous attributes (production time, reservoir temperature, reservoir pressure, and CO2 content) are each sorted from small to large, and the midpoint of each pair of adjacent values is taken as a bifurcation point that divides the data into two sets. In this way the continuous attribute is converted into a discrete attribute, and its gain rate is then calculated by the method of step 4).
In the above method, step 6) sets out the criterion for the completion of one decision tree: among the calculated gain rates, the attribute with the largest gain rate is selected as the splitting attribute to split the data subset, and the C4.5 algorithm is applied again to each small subset after splitting, until all child nodes are leaf nodes (that is, wellbore integrity is entirely "yes" or entirely "no").
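The recursive growth described in step 6) can be sketched as below. This is a simplified illustration rather than a full C4.5 implementation: it handles discrete attributes only, omits pruning, and adds two stopping rules that are assumptions of this sketch (no attributes left, or no positive gain rate). The row layout and the names `grow_tree` and `classify` are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, attr):
    n = len(rows)
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r["C"])
    info_attr = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy([r["C"] for r in rows]) - info_attr
    # SplitInfo is the entropy of the attribute's own value distribution.
    split = entropy([r[attr] for r in rows])
    return gain / split if split > 0 else 0.0

def grow_tree(rows, attrs):
    labels = [r["C"] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Leaf: integrity is all "yes" or all "no", or nothing is left to split on.
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: gain_ratio(rows, a))
    if gain_ratio(rows, best) <= 0:
        return majority
    remaining = [a for a in attrs if a != best]
    children = {}
    for value in {r[best] for r in rows}:
        part = [r for r in rows if r[best] == value]
        children[value] = grow_tree(part, remaining)
    return {"attr": best, "children": children, "default": majority}

def classify(tree, well):
    # Walk down until a leaf label is reached; unseen values fall back to the majority.
    while isinstance(tree, dict):
        tree = tree["children"].get(well[tree["attr"]], tree["default"])
    return tree
```

Building a forest then amounts to calling `grow_tree` once per data subset and passing the well to be prejudged through `classify` for every tree.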
In the above method, step 7) indicates that the other k - 1 decision trees of the random forest are built in the same way as the 1st decision tree.
In the above method, in step 8), once the characteristics of the well to be prejudged (well type, well category, whether fully sealed, production time, reservoir temperature, reservoir pressure, and CO2 content) have been defined, the k decision trees yield k results, and a vote finally determines whether the well keeps its wellbore intact over the production time.
The random forest algorithm on which the invention is based is an ensemble algorithm in data mining. It combines the results of multiple decision trees and therefore has high accuracy; the randomness it introduces prevents overfitting and avoids the situation in which no leaf node can be found, so the model generalizes well; and it is insensitive to outliers and not easily disturbed by singular points. The factors considered by the invention include well type, sealing type, production time, reservoir temperature, reservoir pressure, and CO2 content, so coverage is broad. The invention thereby overcomes the shortcomings of existing methods, namely low accuracy, susceptibility to abnormal data, and inaccurate prejudgment caused by not considering production time, reservoir temperature, and reservoir pressure.
Examples
As shown in FIG. 1, the wellbore integrity prejudging method based on the random forest algorithm provided by the invention comprises the following steps:
1) Establishing a big data sample: existing oil and gas well data were compiled to obtain 1500 training samples {X1, X2, …, X6, X7, C}, as listed below. The seven attribute values are: X1 is the well type, X2 is the well category, X3 is whether the production casing is fully sealed, X4 is the production time, X5 is the reservoir temperature, X6 is the reservoir pressure, and X7 is the CO2 content; the conclusion value C indicates whether the wellbore is complete and takes the value "yes" or "no".
| No. | X1 well type | X2 well category | X3 fully sealed | X4 production time (years) | X5 reservoir temperature (°C) | X6 reservoir pressure (MPa) | X7 CO2 content | C wellbore complete |
|---|---|---|---|---|---|---|---|---|
| 1 | Vertical well | Oil well | Yes | 10 | 140 | 80 | 5% | No |
| 2 | Directional well | Oil well | Yes | 12 | 120 | 64 | 8% | Yes |
| 3 | Directional well | Gas well | Yes | 15 | 100 | 90 | 6% | No |
| 4 | Directional well | Gas well | Yes | 6 | 80 | 86 | 12% | Yes |
| 5 | Horizontal well | Gas well | No | 5 | 110 | 82 | 9% | Yes |
| … | … | … | … | … | … | … | … | … |
| 1500 | Vertical well | Oil well | No | 8 | 120 | 70 | 5% | Yes |
2) Determining the characteristics of the well to be prejudged: the task is to judge whether a directional oil production well with a fully sealed production casing, a reservoir temperature of 110 °C, a reservoir pressure of 70 MPa, and a CO2 content of 10% will still have a complete wellbore after 10 years of production.
3) From all 1500 samples, 1500 samples are drawn at random with replacement (duplicates occur; 1000 samples remain after the duplicates are merged), and four of the seven attributes are randomly selected to form a new data subset of dimension 1000 × 4. This step is repeated 9 times to obtain 9 data subsets. Assume that in the 1st data subset wellbore integrity is "yes" in 800 rows and "no" in 200 rows, and that the four randomly selected attributes are well type, well category, whether the casing is fully sealed, and production time.
4) Using the 1st data subset, build the 1st decision tree with the C4.5 algorithm. For discrete attributes such as well type, well category, and sealing type, the specific steps are as follows: first calculate the class entropy Info(integrity), then the attribute entropy Info(attribute) of each attribute; subtract the attribute entropy from the class entropy to obtain the information gain Gain(attribute) of each attribute; calculate the split information metric SplitInfo(attribute) of each attribute; and finally calculate the information gain rate GainRate(attribute) of each attribute.
Assume that the well-type ratio in data subset 1 is vertical well : directional well : horizontal well = 100 : 700 : 200, with integrity as follows.

| Well type | Number of wells | Complete | Incomplete |
|---|---|---|---|
| Vertical well | 100 | 50 | 50 |
| Directional well | 700 | 600 | 100 |
| Horizontal well | 200 | 150 | 50 |

Assume that the well-category ratio in data subset 1 is oil well : gas well = 700 : 300, with integrity as follows.

| Well category | Number of wells | Complete | Incomplete |
|---|---|---|---|
| Oil production well | 700 | 600 | 100 |
| Gas production well | 300 | 200 | 100 |

Assume that the sealing condition in data subset 1 is fully sealed cementing : non-fully sealed cementing = 800 : 200, with integrity as follows.

| Sealing condition | Number of wells | Complete | Incomplete |
|---|---|---|---|
| Fully sealed | 800 | 750 | 50 |
| Not fully sealed | 200 | 50 | 150 |
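The total class entropy Info(integrity) and the entropy Info(attribute) of each attribute are calculated separately. With 800 complete and 200 incomplete wells, Info(integrity) = -0.8 log2 0.8 - 0.2 log2 0.2 = 0.722; applying the attribute entropy formula to the three tables above gives Info(well type) = 0.676, Info(well category) = 0.690, and Info(sealing) = 0.432.
The information gain Gain(attribute) of each attribute is calculated:
Gain(well type) = Info(integrity) - Info(well type) = 0.722 - 0.676 = 0.046
Gain(well category) = Info(integrity) - Info(well category) = 0.722 - 0.690 = 0.032
Gain(sealing) = Info(integrity) - Info(sealing) = 0.722 - 0.432 = 0.290
The split information metric SplitInfo(attribute) of each attribute is calculated:
SplitInfo(well type) = -0.1 log2 0.1 - 0.7 log2 0.7 - 0.2 log2 0.2 = 1.157
SplitInfo(well category) = -0.7 log2 0.7 - 0.3 log2 0.3 = 0.881
SplitInfo(sealing) = -0.8 log2 0.8 - 0.2 log2 0.2 = 0.722
Finally, the information gain rate GainRate(attribute) = Gain(attribute) / SplitInfo(attribute) of each attribute is calculated, giving approximately 0.04 for well type, 0.04 for well category, and 0.40 for sealing.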
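The figures in this step can be checked numerically from the three count tables. The sketch below is an illustration only; the dictionary layout and the function `metrics` are assumptions of this example, and the counts for the not fully sealed wells are taken as 50 complete and 150 incomplete, which is what the totals of 800 complete and 200 incomplete wells require.

```python
from math import log2

def entropy(counts):
    """Shannon entropy (base 2) of a sequence of counts; zero counts are skipped."""
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c)

def metrics(partition):
    """partition maps attribute value -> (complete, incomplete) well counts."""
    pairs = list(partition.values())
    sizes = [c + i for c, i in pairs]
    n = sum(sizes)
    info_class = entropy([sum(c for c, _ in pairs), sum(i for _, i in pairs)])
    info_attr = sum(s / n * entropy(pair) for s, pair in zip(sizes, pairs))
    gain = info_class - info_attr
    split_info = entropy(sizes)
    return info_attr, gain, split_info, gain / split_info

tables = {
    "well type": {"vertical": (50, 50), "directional": (600, 100), "horizontal": (150, 50)},
    "well category": {"oil": (600, 100), "gas": (200, 100)},
    "sealing": {"fully sealed": (750, 50), "not fully sealed": (50, 150)},
}

results = {name: metrics(p) for name, p in tables.items()}
for name, (info, gain, split, rate) in results.items():
    print(f"{name}: Info={info:.3f} Gain={gain:.3f} SplitInfo={split:.3f} GainRate={rate:.3f}")
```

On these counts the sealing attribute has by far the largest gain rate, so among the three discrete attributes it would be the one selected for the first split.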
5) Still using the 1st data subset, handle continuous attributes such as production time (reservoir temperature, reservoir pressure, and CO2 content are treated in the same way) as follows: sort the values from small to large, take the midpoint of each pair of adjacent values as a bifurcation point that divides the data into two small subsets, and then calculate the information gain rate by the discrete attribute method.
For example, for the production time attribute, the production times are first sorted in increasing order (5 years, 6 years, 8 years, 10 years, and so on), and the midpoint of each pair of adjacent values is taken; the midpoint of 5 years and 6 years is 5.5 years, and the other midpoints, such as 7 years and 9 years, are obtained in the same way. Each midpoint discretizes data subset 1 into two small subsets; the midpoint "9 years", for example, divides it into a subset with production time <= 9 years and a subset with production time > 9 years.
Assume that in data subset 1 the number of wells with production time <= 9 years to the number with production time > 9 years is 100 : 900, with integrity as follows:

| Production time | Number of wells | Complete | Incomplete |
|---|---|---|---|
| <= 9 years | 100 | 95 | 5 |
| > 9 years | 900 | 705 | 195 |
The information gain of the production time attribute at the 9-year bifurcation point, from which its information gain rate follows, is calculated as follows:
Gain(time = 9) = Info(integrity) - Info(time = 9) = 0.722 - 0.707 = 0.015
The other midpoints are calculated in turn to obtain the information gain rates for all bifurcation points of production time.
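The scan over bifurcation points can be sketched as follows. The function names and the `(production_time, is_complete)` pair layout are assumptions of this example; the 9-year counts are those of the table above.

```python
from math import log2

def entropy(counts):
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c)

def gain_and_rate(left, right):
    """left/right: (complete, incomplete) counts on each side of a bifurcation point."""
    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    info_class = entropy([left[0] + right[0], left[1] + right[1]])
    info_split = n_left / n * entropy(left) + n_right / n * entropy(right)
    gain = info_class - info_split
    return gain, gain / entropy([n_left, n_right])

def count_pair(flags):
    """Turn a list of is_complete flags into (complete, incomplete) counts."""
    complete = sum(1 for f in flags if f)
    return complete, len(flags) - complete

def best_threshold(wells):
    """wells: list of (production_time_years, is_complete); returns (midpoint, gain rate)."""
    times = sorted({t for t, _ in wells})
    best = None
    for low, high in zip(times, times[1:]):
        mid = (low + high) / 2
        left = count_pair([ok for t, ok in wells if t <= mid])
        right = count_pair([ok for t, ok in wells if t > mid])
        _, rate = gain_and_rate(left, right)
        if best is None or rate > best[1]:
            best = (mid, rate)
    return best

# The 9-year bifurcation point from the table: 95/5 versus 705/195.
gain_9, rate_9 = gain_and_rate((95, 5), (705, 195))
```

The best bifurcation point of a continuous attribute then competes with the discrete attributes on gain rate when the splitting attribute is chosen.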
6) Arrange the gain rates from large to small, select the attribute with the largest gain rate as the splitting attribute to split data subset 1 for the first time, and apply the C4.5 algorithm again in each small subset after splitting, until every small subset has been split into leaf nodes (that is, wellbore integrity is entirely "yes" or entirely "no"). The 1st decision tree is then fully grown.
7) Repeat steps 4 to 6 with the randomly generated 2nd to 9th data subsets to build the 2nd to 9th decision trees.
8) Input the characteristics of the well to be prejudged (directional oil production well, production casing fully sealed, reservoir temperature 110 °C, reservoir pressure 70 MPa, CO2 content 10%, production up to the 10th year) into the 9 decision trees above to obtain 9 "yes" or "no" conclusions; the conclusion with the most votes is the final result.
Comparison with the prior art: the existing method for judging wellbore integrity decomposes the wellbore into several evaluation units, such as the tubular string, the cement sheath, and the wellhead device, and evaluates each separately. The risk factor of each unit is the product of its failure frequency P and its failure consequence severity S, and the risk factors of all units are then weighted together to give the risk degree of the well. The frequency P is obtained from a failure fault tree and takes a value between 0 and 1; the severity S is generally assigned from subjective experience and has no fixed range.
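For contrast, the prior-art risk degree can be written as a short sketch. The text says only that the unit risk factors are "comprehensively weighted"; the weighted sum, the weights, and all numeric values below are assumptions chosen purely for illustration.

```python
def well_risk(units, weights):
    """Weighted sum of P * S over evaluation units.

    units:   {unit name: (failure frequency P in [0, 1], severity S)}
    weights: {unit name: weight}  (hypothetical; the source gives no weighting scheme)
    """
    for name, (p, _) in units.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"failure frequency of {name} must lie in [0, 1]")
    return sum(weights[name] * p * s for name, (p, s) in units.items())

# Hypothetical values: S has no fixed range, so it is set by judgment.
units = {"tubular string": (0.2, 5.0), "cement sheath": (0.1, 8.0)}
weights = {"tubular string": 0.5, "cement sheath": 0.5}
risk = well_risk(units, weights)
```

Because S and the weights are chosen by the evaluator, two evaluators can obtain different risk degrees for the same well, which is the subjectivity the comparison points to.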
The existing method has the following shortcomings: ① the effect of production time is not considered; ② the influence of reservoir temperature and pressure conditions is not considered; ③ the value of the severity S is too subjective; ④ the contribution of each evaluation unit to an integrity failure is difficult to define. For example, when pressure is found at the wellhead of a well, the cause may be casing corrosion perforation, cement sheath cracking, or both; determining which unit (or units) is responsible requires shutting in the well, pulling the production string, and checking the integrity of each unit one by one, which is very costly and may still fail to find the cause; ⑤ the existing evaluation method offers little practical guidance, because it only quantifies the risk of failure of the well and cannot indicate when a failure will occur in the future.
The factors considered by the invention include well type, sealing type, production time, reservoir temperature, reservoir pressure, and CO2 content, so coverage is broad and there is no subjectively assigned item. Because the method is based on the random forest algorithm and integrates the conclusions of multiple decision trees, the accuracy of the prejudgment result is high; the randomness it introduces prevents overfitting and avoids the situation in which no leaf node can be found, so the model generalizes well; and it is insensitive to outliers and not easily disturbed by singular points. The method does not require production to be stopped in order to check each evaluation unit; the big data sample can be built simply by observing whether the wellhead is under pressure, so the economic cost is zero. Moreover, because the influence of production time is introduced, the prejudgment result can indicate when the well may fail in the future and thus guide practice.
The foregoing describes only exemplary embodiments of the invention and is not intended to limit its scope. Moreover, the components of the invention are not limited to the overall application described above; each technical feature described in the specification may be used alone or in combination according to actual needs, so other combinations and specific applications related to the invention are naturally covered by the invention.

Claims (8)

1. A method for prejudging the integrity of a wellbore, the method comprising the following steps:
S1, establishing a big data sample from existing oil and gas well data;
S2, determining the characteristics of a well to be prejudged;
S3, randomly generating k data subsets from the big data sample;
S4, calculating the information gain ratio of the discrete attributes and the continuous attributes of the k data subsets;
S5, splitting the subsets using the information gain ratio to establish the 1st to k-th decision trees;
S6, placing the characteristics of the well to be prejudged into the 1st to k-th decision trees, and voting to decide the wellbore integrity of the well to be prejudged in the n-th year;
wherein the big data sample includes {X1, X2, …, X6, X7, C}, in which X1 is the well category, X2 is the well type, X3 is whether the production casing is fully sealed, X4 is the production time, X5 is the reservoir temperature, X6 is the reservoir pressure, and X7 is the CO2 content; C represents whether the wellbore of the well is intact;
and step S6 comprises: placing the well category, the well type, and whether the production casing is fully sealed of the well to be prejudged into the 1st to k-th decision trees to make a decision, so as to prejudge the integrity of the wellbore when production reaches the n-th year.
2. The method of claim 1, wherein determining the characteristics of the well to be prejudged comprises: determining the well category, the well type, whether the casing is fully sealed, the reservoir temperature, the reservoir pressure, and the CO2 content of the well to be prejudged.
3. The method of claim 1, wherein step S3 comprises: randomly selecting N samples from all N samples, and randomly selecting m attributes from all attributes of the samples to form a new data subset; and repeating this step k times to obtain k data subsets, wherein m is the number of all attributes.
4. The method of claim 1, wherein the 1st to k-th decision trees are built based on the C4.5 algorithm.
5. The method of claim 1, wherein the information gain ratio of the discrete attributes is calculated by: calculating the class entropy; calculating the attribute entropy of each attribute; subtracting the attribute entropy from the class entropy to obtain the information gain of each attribute; calculating the split information metric of each attribute; and finally calculating the information gain ratio of each attribute.
6. The method of claim 1, wherein the information gain ratio of a continuous attribute is calculated by: sorting the attribute values in ascending order; taking the midpoint of each two adjacent values of the attribute as a bifurcation point to obtain two small subsets; and then calculating the information gain ratio in the same way as for a discrete attribute.
7. The method according to claim 1, wherein, when the i-th decision tree is built, the gain ratios of all attributes calculated on the i-th data subset are sorted in descending order, the attribute with the largest gain ratio is selected as the splitting attribute to split the i-th data subset, and lower-level decision trees are built in the same way in each small subset after splitting until all child nodes are leaf nodes, at which point the i-th decision tree is complete.
8. The method of claim 1, wherein k "yes" or "no" results are obtained when deciding, and the result that receives the most votes is the final result.
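The gain ratio calculations recited in claims 5 and 6 can be illustrated with a short Python sketch. It follows the usual C4.5 definitions (information gain = class entropy minus attribute entropy; gain ratio = information gain divided by split information). The function names `entropy`, `gain_ratio`, and `best_threshold` are hypothetical, and evaluating midpoints only between distinct sorted values is an assumption made here for simplicity, not something the claims state.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Gain ratio of a discrete attribute with respect to the class labels."""
    n = len(labels)
    # Group the class labels by attribute value.
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Attribute entropy: size-weighted entropy of each group.
    attribute_entropy = sum(len(g) / n * entropy(g) for g in groups.values())
    # Information gain: class entropy minus attribute entropy.
    gain = entropy(labels) - attribute_entropy
    # Split information: entropy of the partition induced by the attribute.
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


def best_threshold(values, labels):
    """Best bifurcation point of a continuous attribute.

    Candidate thresholds are the midpoints of adjacent sorted values;
    each candidate turns the attribute into a two-valued discrete one.
    Returns (gain ratio, threshold), or (0.0, None) if no split helps.
    """
    distinct = sorted(set(values))
    best = (0.0, None)
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        ratio = gain_ratio([v <= threshold for v in values], labels)
        if ratio > best[0]:
            best = (ratio, threshold)
    return best
```

In a tree builder along the lines of claim 7, the attribute with the largest value returned by `gain_ratio` (or by `best_threshold` for continuous attributes such as reservoir temperature) would be chosen as the splitting attribute at each node.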
CN202110269788.6A 2021-03-12 2021-03-12 Integrity prejudging method for shaft Active CN112966023B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110269788.6A CN112966023B (en) 2021-03-12 2021-03-12 Integrity prejudging method for shaft

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110269788.6A CN112966023B (en) 2021-03-12 2021-03-12 Integrity prejudging method for shaft

Publications (2)

Publication Number Publication Date
CN112966023A CN112966023A (en) 2021-06-15
CN112966023B true CN112966023B (en) 2024-06-14

Family

ID=76277612

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110269788.6A Active CN112966023B (en) 2021-03-12 2021-03-12 Integrity prejudging method for shaft

Country Status (1)

Country Link
CN (1) CN112966023B (en)

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009029135A1 (en) * 2007-08-24 2009-03-05 Exxonmobil Upstream Research Company Method for predicting well reliability by computer simulation
EA025004B1 (en) * 2010-06-18 2016-11-30 Лэндмарк Грэфикс Корпорейшн Computer-implemented method for wellbore optimization and program carrier device having computer executable instructions for optimization of a wellbore
CN108733966A (en) * 2017-04-14 2018-11-02 国网重庆市电力公司 A kind of multidimensional electric energy meter field thermodynamic state verification method based on decision woodlot
CN109751038A (en) * 2017-11-01 2019-05-14 中国石油化工股份有限公司 A kind of method of quantitative assessment oil/gas well wellbore integrity
CN108846259B (en) * 2018-04-26 2020-10-23 河南师范大学 Gene classification method and system based on clustering and random forest algorithm
CN110717524B (en) * 2019-09-20 2021-04-06 浙江工业大学 Method for predicting thermal comfort of old people
CN112329862A (en) * 2020-11-09 2021-02-05 杭州安恒信息技术股份有限公司 Decision tree-based anti-money laundering method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
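Research on Prejudgment of Wellbore Integrity Failure Based on Random Forest Machine Learning; Yuan Junliang et al.; Bulletin of Science and Technology (《科技通报》); 20220228; 55-60 *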

Also Published As

Publication number Publication date
CN112966023A (en) 2021-06-15

Similar Documents

Publication Publication Date Title
CN112901137B (en) Deep well drilling mechanical drilling speed prediction method based on deep neural network Sequential model
CN112529341B (en) Drilling well leakage probability prediction method based on naive Bayesian algorithm
CN108733632B (en) Well selection evaluation method for repeated fracturing of medium-low permeability high water-containing oil reservoir
Jianxing et al. Risk assessment of submarine pipelines using modified FMEA approach based on cloud model and extended VIKOR method
CN112610903A (en) Water supply pipe network leakage positioning method based on deep neural network model
CN107387051B (en) Repeated fracturing well selection method for multi-stage fractured horizontal well with low-permeability heterogeneous oil reservoir
CN105205329A (en) Comprehensive evaluation method for dam safety
CN109522962B (en) Chemical plant safety quantitative evaluation method
CN107862324B (en) MWSPCA-based CBR prediction model intelligent early warning method
CN114372693B (en) Transformer fault diagnosis method based on cloud model and improved DS evidence theory
CN106228190A (en) Decision tree method of discrimination for resident's exception water
CN111414692B (en) Pressure gauge verification table reliability assessment method based on Bayesian correction model
CN115471097A (en) Data-driven underground local area safety state evaluation method
Tripathy et al. Explaining Anomalies in Industrial Multivariate Time-series Data with the help of eXplainable AI
CN101771584B (en) Network abnormal flow detection method
CN116934262A (en) Construction safety supervision system and method based on artificial intelligence
Qin et al. Evaluation of goaf stability based on transfer learning theory of artificial intelligence
CN112966023B (en) Integrity prejudging method for shaft
Su et al. Prediction of drilling leakage locations based on optimized neural networks and the standard random forest method
CN116825253B (en) Method for establishing hot rolled strip steel mechanical property prediction model based on feature selection
CN109656904A (en) A kind of case risk checking method and system
CN112818557A (en) Well control system safety assessment method and system based on fuzzy comprehensive analysis
CN110927478B (en) Method and system for determining state of transformer equipment of power system
CN116384780A (en) Fire-fighting system safety degree judging method
CN114818927B (en) Data-driven equipment corrosion prediction method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant