CN112966023B

CN112966023B - Integrity prejudging method for shaft

Info

Publication number: CN112966023B
Application number: CN202110269788.6A
Authority: CN
Inventors: 袁俊亮; 殷志明; 范白涛; 李中; 幸雪松; 谢仁军; 周长所; 罗洪斌
Original assignee: Beijing Research Center of CNOOC China Ltd; CNOOC China Ltd
Current assignee: Beijing Research Center of CNOOC China Ltd; CNOOC China Ltd
Priority date: 2021-03-12
Filing date: 2021-03-12
Publication date: 2024-06-14
Anticipated expiration: 2041-03-12
Also published as: CN112966023A

Abstract

The invention relates to a method for prejudging the integrity of a wellbore, comprising the following steps: S1, establishing a big data sample from existing oil and gas well data; S2, determining the characteristics of the well to be prejudged; S3, randomly generating k data subsets from the big data sample; S4, calculating the information gain ratio of the discrete attributes and the continuous attributes of the k data subsets; S5, splitting the subsets by information gain ratio to build the 1st to kth decision trees; S6, inputting the characteristics of the well to be prejudged into the k decision trees and voting to decide the wellbore integrity of the well at the nth year. The method overcomes the defects of existing methods: low accuracy, susceptibility to abnormal data, and inaccurate prejudgment because factors such as production time and reservoir temperature and pressure are not considered.

Description

Integrity prejudging method for shaft

Technical Field

The invention relates to the technical field of oil and gas wells, and in particular to a method for prejudging the integrity of a wellbore.

Background

Wellbore integrity failure is one of the main risks in producing high-temperature, high-pressure oil and gas wells. It shows up as pressure appearing in an annulus whose pressure was originally zero; once oil or gas leaks, the safety of personnel and equipment is seriously threatened. The problem is increasingly prominent in high-temperature, high-pressure wells: for example, of the 16,000 production wells in the Outer Continental Shelf (OCS) area of the Gulf of Mexico, 43% show annular pressure.

Existing studies on wellbore integrity focus mainly on risk assessment, that is, quantitatively calculating the integrity risk of a target well. The wellbore is decomposed into several evaluation units, such as the tubular string, cement sheath and wellhead equipment, and a risk factor is calculated for each; the risk factor of a unit is the product of its failure frequency and the severity of the failure consequence, and the risk factors of all units are combined by weighting to give the risk degree of the well. However, the existing method does not consider the influence of the various production factors, and the severity value is assigned too subjectively, so the evaluation result is not very accurate; the contribution of each unit to the overall integrity failure is difficult to define; and the conventional evaluation gives little practical guidance, since it cannot predict the year of a future failure.

Accordingly, there is a need for a more comprehensive, objective, and scientific method of predicting wellbore integrity.

Disclosure of Invention

In view of one or more of the above-mentioned deficiencies of the prior art, the present invention is directed to a method of wellbore integrity pre-determination for predicting whether a wellbore integrity problem will occur in a particular time period for a certain hydrocarbon well.

The invention provides a method for prejudging the integrity of a wellbore, comprising the following steps:

S1, establishing a big data sample from existing oil and gas well data;

S2, determining the characteristics of the well to be prejudged;

S3, randomly generating k data subsets from the big data sample;

S4, calculating the information gain ratio of the discrete attributes and the continuous attributes of the k data subsets;

S5, splitting the subsets by information gain ratio to build the 1st to kth decision trees;

S6, inputting the characteristics of the well to be prejudged into the k decision trees, and voting to decide the wellbore integrity of the well at the nth year.

According to one embodiment of the invention, the big data sample comprises {X₁, X₂, …, X₆, X₇, C}, wherein: X₁ is the well type (vertical, directional or horizontal), X₂ is the well category (oil well or gas well), X₃ is whether the production casing is fully sealed, X₄ is the production time, X₅ is the reservoir temperature, X₆ is the reservoir pressure, and X₇ is the CO₂ content; C indicates whether the wellbore of the well is complete.

According to one embodiment of the invention, determining the characteristics of the well to be prejudged comprises: determining the well type, the well category, whether the casing is fully sealed, the reservoir temperature, the reservoir pressure and the CO₂ content of the well to be prejudged.

According to one embodiment of the invention, step S3 comprises: randomly drawing N samples with replacement from all N samples, randomly selecting m attributes from all attributes of the samples to form a new data subset, and repeating this k times to obtain k data subsets, wherein m is less than the number of all attributes.
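The sampling in step S3 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `make_data_subsets`, the dictionary layout of a sample and the `seed` parameter are hypothetical and do not come from the patent. Repeated draws are merged, as the detailed description indicates, by keeping each drawn row once.

```python
import random

def make_data_subsets(samples, attributes, m, k, label="C", seed=None):
    """Return k (chosen_attributes, rows) pairs built as described for step S3.

    samples    -- list of dicts, one per well, keyed by attribute name plus the label
    attributes -- list of attribute names (seven in the patent)
    """
    if not 0 < m < len(attributes):
        raise ValueError("m must be positive and less than the number of attributes")
    rng = random.Random(seed)
    n_total = len(samples)
    subsets = []
    for _ in range(k):
        # Draw N row indices with replacement; the set merges repeated draws.
        distinct = sorted({rng.randrange(n_total) for _ in range(n_total)})
        # Pick m distinct attributes for this subset.
        chosen = rng.sample(attributes, m)
        # Keep only the chosen attributes and the class label of each drawn row.
        rows = [
            {**{a: samples[i][a] for a in chosen}, label: samples[i][label]}
            for i in distinct
        ]
        subsets.append((chosen, rows))
    return subsets
```

Each returned subset has n rows (n no larger than N) and m attribute columns, which corresponds to the n×m dimension mentioned in the detailed description.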

According to one embodiment of the invention, the 1st to kth decision trees are built based on the C4.5 algorithm.

According to one embodiment of the invention, the information gain ratio of a discrete attribute is calculated as follows: the class entropy is calculated first, then the attribute entropy of each attribute; the attribute entropy is subtracted from the class entropy to obtain the information gain of each attribute; the split information metric of each attribute is calculated; and finally the information gain ratio of each attribute is obtained as the information gain divided by the split information metric.

According to one embodiment of the invention, for a continuous attribute, the values are sorted in ascending order, the midpoint of each pair of adjacent values is taken as a split point dividing the data into two smaller subsets, and the information gain ratio is then calculated in the same way as for a discrete attribute.

According to one embodiment of the invention, when the ith decision tree is built, the gain ratios of all attributes calculated on the ith data subset are sorted in descending order, the attribute with the largest gain ratio is selected as the splitting attribute to split the ith data subset, and the lower levels of the tree are built in the same way within each smaller subset until all child nodes are leaf nodes, at which point the ith decision tree is complete.

According to one embodiment of the invention, step S6 comprises: inputting the well type, the well category and the production-casing sealing condition of the well to be prejudged into the k decision trees to make a decision, thereby prejudging the integrity of the wellbore at the nth year of production.

According to one embodiment of the invention, k "yes" or "no" results are obtained from the decision, and the result with the most votes is the final result.

The random forest algorithm on which the invention is based is an ensemble algorithm in data mining. It combines the results of multiple decision trees and is therefore highly accurate; the randomness it introduces keeps the result from overfitting and avoids the case in which no leaf node can be found, giving the model strong generalization capability; and it is insensitive to outliers and not easily disturbed by singular points. The factors considered by the invention include well type, sealing type, production time, reservoir temperature, pressure and CO₂ content, giving broad coverage. The method overcomes the defects of existing methods: low accuracy, susceptibility to abnormal data, and inaccurate prejudgment because production time and reservoir temperature and pressure are not considered.

Drawings

FIG. 1 is a flow chart of a method for integrity pre-determination of a wellbore according to an embodiment of the present invention.

Detailed Description

The preferred embodiments of the present invention will be described in detail below with reference to the attached drawings, so that the objects, features and advantages of the present invention will be more clearly understood. It should be understood that the embodiments shown in the drawings are not intended to limit the scope of the invention, but rather are merely illustrative of the true spirit of the invention.

The method comprehensively considers big data from historical wells, such as well type, sealing type, production time, reservoir temperature, pressure and CO₂ content, with no subjectively assigned item taking part in the calculation at any stage, and it combines the results of multiple decision trees using a random forest data mining algorithm, thereby improving prejudgment accuracy. Because production time is introduced, the calculation result can indicate when a failure will occur in the future and so guide practice.

The embodiment of the invention provides a wellbore integrity prejudging method based on a random forest algorithm. It is suitable for oil and gas fields that have a sizeable number of wells and have experienced wellbore integrity problems, and is mainly used to prejudge whether a given oil or gas well will develop a wellbore integrity problem within a specific time. The method is based on a random forest algorithm and is highly accurate; the introduced randomness gives the model strong generalization capability and makes it insensitive to outliers; the factors considered include well type, sealing type, production time, reservoir temperature, pressure and CO₂ content. The method can overcome the defects of the existing method: low accuracy, the high cost of identifying the failed unit, no consideration of factors such as production time, and little practical guidance.

To achieve the above object, the embodiments of the invention adopt the following technical solution: a wellbore integrity prejudging method based on a random forest algorithm, comprising the following steps:

1) Establishing a big data sample: existing oil and gas well data are compiled to obtain N training samples {X₁, X₂, …, X₆, X₇, C}, with seven attribute values: X₁ is the well type, X₂ is the well category, X₃ is whether the production casing is fully sealed, X₄ is the production time, X₅ is the reservoir temperature, X₆ is the reservoir pressure, and X₇ is the CO₂ content; the conclusion value C indicates whether the wellbore is complete and takes the value "yes" or "no".

2) Determining the characteristics of the well to be prejudged: the well type, well category, whether the casing is fully sealed, reservoir temperature, pressure and CO₂ content of the well are investigated and established; the aim is to judge whether the wellbore of this well will remain intact up to the nth year of production.

3) From all N samples, N samples are drawn at random with replacement (duplicates occur; after the duplicates are merged, n samples remain), and m (m < 7) of the seven attributes are selected at random, forming a new data subset of dimension n×m. This step is repeated k times to obtain k data subsets.

4) With the 1st data subset, the 1st decision tree is built based on the C4.5 algorithm. For discrete attributes such as well type, well category and sealing type, the class entropy Info(integrity) is calculated first, then the attribute entropy Info(attribute) of each attribute; subtracting the attribute entropy from the class entropy gives the information gain Gain(attribute) of each attribute; the split information metric SplitInfo(attribute) of each attribute is calculated; and finally the information gain ratio GainRate(attribute) of each attribute is calculated.

5) With the 1st data subset, the 1st decision tree is built based on the C4.5 algorithm. For continuous attributes such as production time, reservoir temperature, pressure and CO₂ content, the values are sorted in ascending order, the midpoint of two adjacent values is taken as a split point dividing the data into two smaller subsets, and the information gain ratio is then calculated by the method used for discrete attributes.

6) With the 1st data subset, the gain ratios of all calculated attributes are sorted in descending order, the attribute with the largest gain ratio is selected as the splitting attribute to split the data subset, and steps 4) and 5) are applied again within each smaller subset until all child nodes are leaf nodes (that is, wellbore integrity is entirely "yes" or entirely "no"), at which point the 1st decision tree is fully grown.

7) Steps 4) to 6) are repeated with the 2nd to kth data subsets to build the 2nd to kth decision trees.

8) The well type, well category, whether the production casing is fully sealed, production to the nth year, reservoir temperature, pressure and CO₂ content of the well to be prejudged are input into the k decision trees for decision, yielding k "yes" or "no" results; the result with the most votes is the final result.

Here, C4.5 is a family of algorithms used for classification problems in machine learning and data mining. It performs supervised learning: given a data set in which each tuple is described by a set of attribute values and belongs to exactly one of a set of mutually exclusive classes, C4.5 learns a mapping from attribute values to classes, and this mapping can be used to classify new entities of unknown class. Such algorithms belong to the prior art and are not described in detail here.

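In the above method, in step 4), the total class entropy Info(integrity) and the attribute entropy Info(X_i) of each attribute are calculated as follows:

$$\mathrm{Info}(\text{integrity}) = -\frac{N^{\text{complete}}}{N}\log_2\frac{N^{\text{complete}}}{N} - \frac{N^{\text{incomplete}}}{N}\log_2\frac{N^{\text{incomplete}}}{N}$$

$$\mathrm{Info}(X_i) = \sum_{j}\frac{|X_i^j|}{N}\left(-\frac{|X_i^{j,\text{complete}}|}{|X_i^j|}\log_2\frac{|X_i^{j,\text{complete}}|}{|X_i^j|} - \frac{|X_i^{j,\text{incomplete}}|}{|X_i^j|}\log_2\frac{|X_i^{j,\text{incomplete}}|}{|X_i^j|}\right)$$

where N is the total number of wells; N^complete and N^incomplete are the numbers of samples with a complete and an incomplete wellbore; |X_i^j| is the number of samples for which attribute X_i takes the value j; |X_i^{j,complete}| is the number of those samples with a complete wellbore; and |X_i^{j,incomplete}| is the number of those samples with an incomplete wellbore.

Taking the well category X₂ as an example, with j ranging over oil well and gas well, Info(X₂) is:

$$\mathrm{Info}(X_2) = \sum_{j\in\{\text{oil},\,\text{gas}\}}\frac{|X_2^j|}{N}\left(-\frac{|X_2^{j,\text{complete}}|}{|X_2^j|}\log_2\frac{|X_2^{j,\text{complete}}|}{|X_2^j|} - \frac{|X_2^{j,\text{incomplete}}|}{|X_2^j|}\log_2\frac{|X_2^{j,\text{incomplete}}|}{|X_2^j|}\right)$$

The information gain Gain(X_i) of each attribute is calculated:

$$\mathrm{Gain}(X_i) = \mathrm{Info}(\text{integrity}) - \mathrm{Info}(X_i)$$

The split information metric SplitInfo(X_i) of each attribute is calculated:

$$\mathrm{SplitInfo}(X_i) = -\sum_{j}\frac{|X_i^j|}{N}\log_2\frac{|X_i^j|}{N}$$

where N is the total number of wells and |X_i^j| is the number of samples for which attribute X_i takes the value j.

Taking the well category X₂ as an example, SplitInfo(X₂) is:

$$\mathrm{SplitInfo}(X_2) = -\frac{|X_2^{\text{oil}}|}{N}\log_2\frac{|X_2^{\text{oil}}|}{N} - \frac{|X_2^{\text{gas}}|}{N}\log_2\frac{|X_2^{\text{gas}}|}{N}$$

Finally, the information gain ratio GainRate(X_i) of each attribute is calculated:

$$\mathrm{GainRate}(X_i) = \frac{\mathrm{Gain}(X_i)}{\mathrm{SplitInfo}(X_i)}$$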

In the above method, in step 5), continuous attributes such as production time, reservoir temperature, pressure and CO₂ content are each sorted in ascending order, and the midpoint of two adjacent values is taken as a split point that divides the data into two sets. The continuous attribute is thereby converted into a discrete one, and its gain ratio is then calculated by the method of step 4).

In the above method, step 6) states the criterion for one decision tree being fully grown: the attribute with the largest of the calculated gain ratios is selected as the splitting attribute to split the data subset, and the C4.5 algorithm is applied again to each smaller subset until all child nodes are leaf nodes (that is, wellbore integrity is entirely "yes" or entirely "no").
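The growth rule of step 6) can be sketched as a short recursive routine. This is a simplified illustration rather than a full C4.5 implementation: it handles discrete attributes only, omits pruning, and returns the majority label when no attribute is left or no split has a positive gain ratio, which is an assumption the patent does not state. The names `build_tree`, `predict` and the dictionary layout of a node are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, attr, label="C"):
    """Gain(attr) / SplitInfo(attr) for a discrete attribute."""
    n = len(rows)
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[label])
    info_attr = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    if split_info == 0:  # attribute has a single value here, so it cannot split
        return 0.0
    return (entropy([r[label] for r in rows]) - info_attr) / split_info

def build_tree(rows, attrs, label="C"):
    """Split on the attribute with the largest gain ratio until nodes are pure."""
    labels = [r[label] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority  # leaf: all "yes", all "no", or nothing left to split on
    best = max(attrs, key=lambda a: gain_ratio(rows, a, label))
    if gain_ratio(rows, best, label) <= 0:
        return majority  # assumed fallback when no split is informative
    remaining = [a for a in attrs if a != best]
    children = {
        value: build_tree([r for r in rows if r[best] == value], remaining, label)
        for value in {r[best] for r in rows}
    }
    return {"attr": best, "children": children, "default": majority}

def predict(tree, features):
    """Walk from the root to a leaf; unseen values fall back to the node majority."""
    while isinstance(tree, dict):
        tree = tree["children"].get(features[tree["attr"]], tree["default"])
    return tree
```

A continuous attribute would first be turned into a two-valued attribute at a chosen midpoint, as described in step 5), before being passed to a routine of this kind.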

In the above method, step 7) indicates that the other k−1 decision trees of the random forest are built in the same way as the 1st decision tree.

In the above method, in step 8), once the characteristics of the well to be prejudged (well type, well category, whether fully sealed, production time, reservoir temperature, pressure and CO₂ content) are established, the k decision trees yield k results, and a vote finally decides whether the well keeps its wellbore intact over that production time.

The random forest algorithm on which the invention is based is an ensemble algorithm in data mining. It combines the results of multiple decision trees and is therefore highly accurate; the randomness it introduces keeps the result from overfitting and avoids the case in which no leaf node can be found, giving the model strong generalization capability; and it is insensitive to outliers and not easily disturbed by singular points. The factors considered by the invention include well type, sealing type, production time, reservoir temperature, pressure and CO₂ content, giving broad coverage. The method overcomes the defects of existing methods: low accuracy, susceptibility to abnormal data, and inaccurate prejudgment because production time and reservoir temperature and pressure are not considered.

Examples

As shown in FIG. 1, the wellbore integrity prejudging method based on the random forest algorithm provided by the invention comprises the following steps:

1) Establishing a big data sample: existing oil and gas well data are compiled to obtain 1500 training samples {X₁, X₂, …, X₆, X₇, C}, as listed below. The seven attribute values are: X₁ is the well type, X₂ is the well category, X₃ is whether the production casing is fully sealed, X₄ is the production time, X₅ is the reservoir temperature, X₆ is the reservoir pressure, and X₇ is the CO₂ content; the conclusion value C indicates whether the wellbore is complete and takes the value "yes" or "no".

No.	X₁ well type	X₂ well category	X₃ fully sealed	X₄ production time (years)	X₅ reservoir temperature (℃)	X₆ reservoir pressure (MPa)	X₇ CO₂ content	C wellbore complete
1	Vertical well	Oil well	Yes	10	140	80	5%	No
2	Directional well	Oil well	Yes	12	120	64	8%	Yes
3	Directional well	Gas well	Yes	15	100	90	6%	No
4	Directional well	Gas well	Yes	6	80	86	12%	Yes
5	Horizontal well	Gas well	No	5	110	82	9%	Yes
…	…	…	…	…	…	…	…	…
1500	Vertical well	Oil well	No	8	120	70	5%	Yes
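One possible in-memory representation of a training sample is sketched below. The class name `WellSample` and its field names are hypothetical, the yes/no cells of the table are mapped to booleans, and the units (years, ℃, MPa, percent) are assumed from the example well described in step 2). Only the first five rows shown in the table are reproduced.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WellSample:
    well_type: str                 # X1: "vertical", "directional" or "horizontal"
    well_category: str             # X2: "oil" or "gas"
    fully_sealed: bool             # X3: production casing fully sealed?
    production_years: float        # X4: production time, assumed in years
    reservoir_temp_c: float        # X5: reservoir temperature, assumed in degrees Celsius
    reservoir_pressure_mpa: float  # X6: reservoir pressure, assumed in MPa
    co2_percent: float             # X7: CO2 content in percent
    complete: bool                 # C: wellbore complete ("yes") or not ("no")

# Rows 1 to 5 as shown in the table.
ROWS_SHOWN = [
    WellSample("vertical", "oil", True, 10, 140, 80, 5, False),
    WellSample("directional", "oil", True, 12, 120, 64, 8, True),
    WellSample("directional", "gas", True, 15, 100, 90, 6, False),
    WellSample("directional", "gas", True, 6, 80, 86, 12, True),
    WellSample("horizontal", "gas", False, 5, 110, 82, 9, True),
]
```

Keeping the three discrete attributes as strings or booleans and the four continuous attributes as numbers mirrors the two ways the attributes are treated in steps 4) and 5).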

2) Determining the characteristics of the well to be prejudged: the task is to judge whether a directional oil production well with fully sealed production casing, a reservoir temperature of 110 ℃, a reservoir pressure of 70 MPa and a CO₂ content of 10% will still keep its wellbore intact at the 10th year of production.

3) From all 1500 samples, 1500 samples are drawn at random with replacement (duplicates occur; after the duplicates are merged, 1000 samples remain), and four of the seven attributes are selected at random, forming a new data subset of dimension 1000×4. This step is repeated 9 times to obtain 9 data subsets. Assume that in the 1st data subset wellbore integrity is "yes" in 800 rows and "no" in 200 rows, and that the four randomly selected attributes are well type, well category, whether the casing is fully sealed, and production time.

4) With the 1st data subset, the 1st decision tree is built based on the C4.5 algorithm. For discrete attributes such as well type, well category and sealing type, the specific steps are as follows: the class entropy Info(integrity) is calculated first, then the attribute entropy Info(attribute) of each attribute; subtracting the attribute entropy from the class entropy gives the information gain Gain(attribute) of each attribute; the split information metric SplitInfo(attribute) of each attribute is calculated; and finally the information gain ratio GainRate(attribute) of each attribute is calculated.

Assume that in data subset 1 the ratio of vertical wells to directional wells to horizontal wells is 100:700:200, with integrity as follows.

Well type	Complete (wells)	Incomplete (wells)
Vertical wells (100)	50	50
Directional wells (700)	600	100
Horizontal wells (200)	150	50

Assume that in data subset 1 the ratio of oil wells to gas wells is 700:300, with integrity as follows.

Well category	Complete (wells)	Incomplete (wells)
Oil production wells (700)	600	100
Gas production wells (300)	200	100

Assume that in data subset 1 the ratio of fully sealed to not fully sealed cementing is 800:200, with integrity as follows.

Sealing condition	Complete (wells)	Incomplete (wells)
Fully sealed (800)	750	50
Not fully sealed (200)	50	150

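The total class entropy Info(integrity) and the entropy Info(attribute) of each attribute are calculated separately:

$$\mathrm{Info}(\text{integrity}) = -\frac{800}{1000}\log_2\frac{800}{1000} - \frac{200}{1000}\log_2\frac{200}{1000} = 0.722$$

$$\mathrm{Info}(\text{well type}) = \frac{100}{1000}\left(-\frac{50}{100}\log_2\frac{50}{100} - \frac{50}{100}\log_2\frac{50}{100}\right) + \frac{700}{1000}\left(-\frac{600}{700}\log_2\frac{600}{700} - \frac{100}{700}\log_2\frac{100}{700}\right) + \frac{200}{1000}\left(-\frac{150}{200}\log_2\frac{150}{200} - \frac{50}{200}\log_2\frac{50}{200}\right) = 0.676$$

$$\mathrm{Info}(\text{well category}) = \frac{700}{1000}\left(-\frac{600}{700}\log_2\frac{600}{700} - \frac{100}{700}\log_2\frac{100}{700}\right) + \frac{300}{1000}\left(-\frac{200}{300}\log_2\frac{200}{300} - \frac{100}{300}\log_2\frac{100}{300}\right) = 0.690$$

$$\mathrm{Info}(\text{sealing}) = \frac{800}{1000}\left(-\frac{750}{800}\log_2\frac{750}{800} - \frac{50}{800}\log_2\frac{50}{800}\right) + \frac{200}{1000}\left(-\frac{50}{200}\log_2\frac{50}{200} - \frac{150}{200}\log_2\frac{150}{200}\right) = 0.432$$

The information gain Gain(attribute) of each attribute is calculated:

Gain(well type) = Info(integrity) − Info(well type) = 0.722 − 0.676 = 0.046

Gain(well category) = Info(integrity) − Info(well category) = 0.722 − 0.690 = 0.032

Gain(sealing) = Info(integrity) − Info(sealing) = 0.722 − 0.432 = 0.290

The split information metric SplitInfo(attribute) of each attribute is calculated:

$$\mathrm{SplitInfo}(\text{well type}) = -\frac{100}{1000}\log_2\frac{100}{1000} - \frac{700}{1000}\log_2\frac{700}{1000} - \frac{200}{1000}\log_2\frac{200}{1000} \approx 1.157$$

$$\mathrm{SplitInfo}(\text{well category}) = -\frac{700}{1000}\log_2\frac{700}{1000} - \frac{300}{1000}\log_2\frac{300}{1000} \approx 0.881$$

$$\mathrm{SplitInfo}(\text{sealing}) = -\frac{800}{1000}\log_2\frac{800}{1000} - \frac{200}{1000}\log_2\frac{200}{1000} \approx 0.722$$

Finally, the information gain ratio GainRate(attribute) of each attribute is calculated:

$$\mathrm{GainRate}(\text{well type}) = \frac{0.046}{1.157} \approx 0.040,\qquad \mathrm{GainRate}(\text{well category}) = \frac{0.032}{0.881} \approx 0.036,\qquad \mathrm{GainRate}(\text{sealing}) = \frac{0.290}{0.722} \approx 0.402$$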
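The arithmetic of this step can be checked with a few lines of Python working directly from the three count tables. The function and variable names are hypothetical. The not-fully-sealed row is entered as 50 complete and 150 incomplete wells, an assumption made because that is the only split of 200 wells consistent with 800 complete wells overall and with Info(sealing) = 0.432.

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a list of non-negative counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def gain_ratio(groups):
    """groups: one (complete, incomplete) count pair per attribute value.

    Returns (Info(integrity), Info(attribute), Gain, SplitInfo, GainRate).
    """
    n = sum(a + b for a, b in groups)
    info_class = entropy([sum(a for a, _ in groups), sum(b for _, b in groups)])
    info_attr = sum((a + b) / n * entropy([a, b]) for a, b in groups)
    split_info = entropy([a + b for a, b in groups])
    gain = info_class - info_attr
    return info_class, info_attr, gain, split_info, gain / split_info

# Count tables of data subset 1 (complete, incomplete).
TABLES = {
    "well type": [(50, 50), (600, 100), (150, 50)],
    "well category": [(600, 100), (200, 100)],
    "sealing": [(750, 50), (50, 150)],  # second pair is the assumed 50/150 split
}

for name, groups in TABLES.items():
    info_c, info_a, gain, split, ratio = gain_ratio(groups)
    print(f"{name}: Info={info_a:.3f} Gain={gain:.3f} "
          f"SplitInfo={split:.3f} GainRate={ratio:.3f}")
```

The printed Info and Gain values agree with those in the text to within rounding, and sealing has by far the largest gain ratio, so under these assumed counts it would be chosen as the first splitting attribute among the three discrete attributes.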

5) With the 1st data subset, the 1st decision tree is built based on the C4.5 algorithm. For continuous attributes such as production time (reservoir temperature, pressure and CO₂ content are handled in the same way), the values are sorted in ascending order, the midpoint of two adjacent values is taken as a split point dividing the data into two smaller subsets, and the information gain ratio is then calculated by the method used for discrete attributes.

For example, for the production time attribute, the production times are first sorted in ascending order, that is, 5 years, 6 years, 8 years, 10 years, and so on. The midpoint of two adjacent values is then taken: the midpoint of 5 years and 6 years is 5.5 years, and the other midpoints, such as 7 years and 9 years, are obtained in the same way. Each midpoint discretizes data subset 1 into two smaller subsets; for example, the midpoint "9 years" divides it into a subset with production time ≤ 9 years and a subset with production time > 9 years.

Assume that in data subset 1 the ratio of wells with production time ≤ 9 years to wells with production time > 9 years is 100:900, with integrity as follows:

Production time	Complete (wells)	Incomplete (wells)
≤ 9 years (100 wells)	95	5
> 9 years (900 wells)	705	195

The information gain ratio of the production time attribute is calculated as follows:

$$\mathrm{Info}(\text{time}=9) = \frac{100}{1000}\left(-\frac{95}{100}\log_2\frac{95}{100} - \frac{5}{100}\log_2\frac{5}{100}\right) + \frac{900}{1000}\left(-\frac{705}{900}\log_2\frac{705}{900} - \frac{195}{900}\log_2\frac{195}{900}\right) = 0.707$$

Gain(time = 9) = Info(integrity) − Info(time = 9) = 0.722 − 0.707 = 0.015

$$\mathrm{SplitInfo}(\text{time}=9) = -\frac{100}{1000}\log_2\frac{100}{1000} - \frac{900}{1000}\log_2\frac{900}{1000} \approx 0.469$$

$$\mathrm{GainRate}(\text{time}=9) = \frac{0.015}{0.469} \approx 0.032$$

The other midpoints are calculated in turn to obtain the information gain ratio of every candidate split of production time.
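Scanning the midpoints of a continuous attribute can be sketched as follows. The names `midpoints`, `threshold_gain_ratio` and `best_threshold` are hypothetical, rows are assumed to be (value, complete) pairs with a boolean label, and picking the midpoint with the largest gain ratio to represent the attribute is the usual C4.5 convention, stated here as an assumption.

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a list of non-negative counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def midpoints(values):
    """Midpoints of adjacent distinct values after sorting in ascending order."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]

def threshold_gain_ratio(rows, threshold):
    """Gain ratio of splitting (value, complete) rows at value <= threshold."""
    low = [complete for v, complete in rows if v <= threshold]
    high = [complete for v, complete in rows if v > threshold]

    def counts(part):
        return [sum(part), len(part) - sum(part)]  # [complete, incomplete]

    n = len(rows)
    info_class = entropy(counts(low + high))
    info_split = sum(len(p) / n * entropy(counts(p)) for p in (low, high))
    split_info = entropy([len(low), len(high)])
    return (info_class - info_split) / split_info

def best_threshold(rows):
    """Midpoint with the largest gain ratio, or None if all values are equal."""
    candidates = midpoints(v for v, _ in rows)
    return max(candidates, key=lambda t: threshold_gain_ratio(rows, t), default=None)

print(midpoints([5, 6, 8, 10]))  # midpoints of the production times in the text
```

Because every candidate is a midpoint between two observed values, both sides of the split are non-empty and the split information is never zero.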

6) The gain ratios are sorted in descending order, the attribute with the largest gain ratio is selected as the splitting attribute for the first split of data subset 1, and the C4.5 algorithm is applied again within each smaller subset until every subset has been split into leaf nodes (that is, wellbore integrity is entirely "yes" or entirely "no"), at which point the 1st decision tree is fully grown.

7) Steps 4) to 6) are repeated with the randomly generated 2nd to 9th data subsets to build the 2nd to 9th decision trees.

8) The characteristics of the well to be prejudged (a directional oil production well with fully sealed production casing, a reservoir temperature of 110 ℃, a pressure of 70 MPa, a CO₂ content of 10%, produced to the 10th year) are input into the 9 decision trees above for decision, yielding 9 "yes" or "no" conclusions; the conclusion with the most votes is the final result.
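The vote in step 8) amounts to taking the most frequent answer among the trees. A minimal sketch, in which the function name `majority_vote` and the nine votes are purely hypothetical and not results from the patent:

```python
from collections import Counter

def majority_vote(votes):
    """Return the answer given by the largest number of trees."""
    return Counter(votes).most_common(1)[0][0]

# Hypothetical outputs of the nine trees for the well to be prejudged.
votes = ["yes", "yes", "no", "yes", "no", "yes", "yes", "no", "yes"]
print(majority_vote(votes))  # six "yes" against three "no"
```

With two classes, an odd number of trees such as the 9 used in this example cannot produce a tie.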

Comparison with the prior art: the existing method for judging wellbore integrity decomposes the wellbore into several evaluation units, such as the tubular string, cement sheath and wellhead equipment, and evaluates each separately. The risk factor of each unit is expressed as the product of its failure frequency P and the severity S of the failure consequence, and the risk factors of all units are then combined by weighting to give the risk degree of the well. The frequency P is obtained from a fault tree and takes a value between 0 and 1; the severity S is generally assigned from subjective experience and has no fixed range.
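One way to write the prior-art risk degree described above is as a weighted sum over the evaluation units. The weights w_u, the unit index u and the unit count U are assumed notation; the text gives only the product of P and S and a comprehensive weighting, without specifying the weighting scheme.

$$R = \sum_{u=1}^{U} w_u\,P_u\,S_u,\qquad 0 \le P_u \le 1$$

Because each S_u is assigned subjectively and has no fixed range, R inherits that subjectivity.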

The existing method has the following defects: ① the effect of production time is not considered; ② the influence of reservoir temperature and pressure conditions is not considered; ③ the value of the severity S is too subjective; ④ the contributions of the several evaluation units to an integrity failure are difficult to separate: for example, when pressure is found at the wellhead of a well, the cause may be corrosion perforation of the casing, cracking of the cement sheath, or both, and determining which unit (or units) contributed requires shutting in the well, pulling the production string and inspecting each unit one by one, which is very costly and may still fail to find the cause; ⑤ the existing evaluation method gives little practical guidance, since it only quantifies the failure risk of the well and cannot say when a failure will occur.

The factors considered by the invention include well type, sealing type, production time, reservoir temperature, pressure and CO₂ content, giving broad coverage with no subjectively assigned item, and the conclusions of multiple decision trees are combined using a random forest algorithm, so the prejudgment result is highly accurate; the introduced randomness keeps the result from overfitting and avoids the case in which no leaf node can be found, so the model generalizes well; and the method is insensitive to outliers and not easily disturbed by singular points. The method does not require stopping production to inspect each evaluation unit: the big data sample can be built merely by observing whether pressure appears at the wellhead, so the economic cost is zero. Moreover, because the influence of production time is introduced, the prejudgment result can indicate when the well will fail in the future and so guide practice.

The foregoing describes only exemplary embodiments of the invention and is not intended to limit the scope of the invention. Moreover, it should be noted that the components of the invention are not limited to the overall application described above; each technical feature described in the specification may be used alone or in combination according to actual needs, so other combinations and specific applications related to the invention are naturally covered by the invention.

Claims

1. A method for prejudging the integrity of a wellbore, the method comprising the following steps:

S1, establishing a big data sample from existing oil and gas well data;

S2, determining the characteristics of the well to be prejudged;

S3, randomly generating k data subsets from the big data sample;

S6, inputting the characteristics of the well to be prejudged into the k decision trees, and voting to decide the wellbore integrity of the well at the nth year;

the big data sample comprises {X1, X2, …, X6, X7, C}, wherein: X1 is the well type, X2 is the well category, X3 is whether the production casing is fully sealed, X4 is the production time, X5 is the reservoir temperature, X6 is the reservoir pressure, and X7 is the CO2 content; C indicates whether the wellbore of the well is complete;

step S6 comprises: inputting the well type, the well category and the production-casing sealing condition of the well to be prejudged into the k decision trees to make a decision, thereby prejudging the integrity of the wellbore at the nth year of production.

2. The method of claim 1, wherein determining the characteristics of the well to be prejudged comprises: determining the well type, the well category, whether the casing is fully sealed, the reservoir temperature, the pressure and the CO2 content of the well to be prejudged.

3. The method of claim 1, wherein step S3 comprises randomly drawing N samples with replacement from all N samples, randomly selecting m attributes from all attributes of the samples to form a new data subset, and repeating this k times to obtain k data subsets, wherein m is less than the number of all attributes.

4. The method of claim 1, wherein the 1st to kth decision trees are built based on the C4.5 algorithm.

5. The method of claim 1, wherein the information gain ratio of a discrete attribute is calculated by calculating the class entropy, calculating the attribute entropy of each attribute, subtracting the attribute entropy from the class entropy to obtain the information gain of each attribute, calculating the split information metric of each attribute, and finally calculating the information gain ratio of each attribute.

6. The method of claim 1, wherein for a continuous attribute, the values are sorted in ascending order, the midpoint of two adjacent values of the attribute is taken as a split point to obtain two smaller subsets, and the information gain ratio of the continuous attribute is then calculated by the method used for discrete attributes.

7. The method of claim 1, wherein when the ith decision tree is built, the gain ratios of all attributes calculated on the ith data subset are sorted in descending order, the attribute with the largest gain ratio is selected as the splitting attribute to split the ith data subset, and the lower levels of the tree are built in the same way within each smaller subset until all child nodes are leaf nodes, at which point the ith decision tree is complete.

8. The method of claim 1, wherein k "yes" or "no" results are obtained from the decision, and the result with the most votes is the final result.