CN107133670A - Complex lithology identification method and system based on a decision tree data mining algorithm

Info

Publication number: CN107133670A
Application number: CN201710266077.7A
Authority: CN
Inventors: 谭锋奇
Original assignee: University of Chinese Academy of Sciences
Current assignee: University of Chinese Academy of Sciences
Priority date: 2017-04-21
Filing date: 2017-04-21
Publication date: 2017-09-05

Abstract

The present invention relates to a complex lithology identification method and system based on a decision tree data mining algorithm, and belongs to the technical field of reservoir evaluation in petroleum exploration. The method includes: establishing lithological profiles of sealed coring well sections and forming a lithology identification data set from the lithological profiles; preprocessing the lithology identification data set; associating the different lithologies in the lithology identification data set with different logging parameters; and performing data mining on the lithologies in the preprocessed lithology identification data set to form a tree-shaped identification model, so that complex lithology is identified according to the tree-shaped identification model and the lithologies corresponding to the logging parameters. By forming the lithology identification data set from lithological profiles, the invention identifies lithology in a unified way and assigns logging parameters according to the characteristics of each lithology. After data mining of the data set, a clear tree-shaped identification model is obtained that can identify lithology accurately under complex lithological conditions.

Description

A complex lithology identification method and system based on a decision tree data mining algorithm

Technical field

The present invention relates to the technical field of reservoir evaluation in petroleum exploration, and in particular to a complex lithology identification method and system based on a decision tree data mining algorithm.

Background art

As petroleum exploration and development expand in breadth and depth, petroleum data are accumulating rapidly. If this mass of data were fully exploited as a carrier of wealth, it could bring additional economic benefits to the petroleum industry. However, traditional geophysical exploration methods, which are grounded in petrophysics, mathematics and statistics, are somewhat powerless to make in-depth use of such data volumes. At the same time, various unconventional reservoirs are becoming the main exploration targets, and for these complex reservoirs conventional methods such as cross plots, linear regression and multivariate discriminant analysis cannot effectively solve problems such as lithology identification and reservoir parameter calculation. It is therefore necessary to introduce into petroleum exploration and development new methods from other research fields, such as artificial intelligence, machine learning, pattern recognition and data mining. As a "data-driven" solution, these methods can address many of the problems encountered in complex reservoir evaluation.

Data mining is the process of extracting implicit, previously unknown but potentially useful information and knowledge from large amounts of incomplete, noisy, fuzzy and random data. Data mining tasks fall into two broad classes, description and prediction: the former derives patterns that summarize the underlying relationships in the data, while the latter performs inference on current data in order to make predictions. The main methods include neural networks, support vector machines, Bayesian networks and decision trees. Neural networks have already been widely applied to lithology identification, sedimentary facies division, permeability prediction and the identification of oil, gas and water layers. Practice has shown that, for complex petroleum geological problems influenced by many factors, nonlinear neural network methods outperform linear statistical analysis techniques. The greatest shortcoming of neural networks, however, is that they easily "over-learn" (overfit) the training samples: the method has too many adjustable parameters, and given enough time the algorithm can "memorize" almost anything, so the resulting model departs from the geological background and has no practical value. In addition, neural networks, support vector machines and Bayesian networks share a common disadvantage: their prediction models are "black boxes", and it is not possible to see how the relationships between the sample data and the attributes are fitted.

Summary of the invention

To solve the above technical problems, the present invention provides a complex lithology identification method and system based on a decision tree data mining algorithm.

The technical solution by which the present invention solves the above technical problems is as follows: a complex lithology identification method based on a decision tree data mining algorithm, the method including:

establishing lithological profiles of the sealed coring well sections, and forming a lithology identification data set from the lithological profiles;

preprocessing the lithology identification data set;

associating the different lithologies in the lithology identification data set with different logging parameters;

performing data mining on the lithologies in the preprocessed lithology identification data set to form a tree-shaped identification model, so that complex lithology is identified according to the tree-shaped identification model and the lithologies corresponding to the logging parameters.

The beneficial effects of the invention are: by forming the lithology identification data set from lithological profiles, the invention identifies lithology in a unified way and assigns logging parameters according to the characteristics of each lithology; after data mining of the lithology identification data set, a clear tree-shaped identification model is obtained that can identify lithology accurately under complex lithological conditions.

On the basis of the above technical solution, the present invention can also be improved as follows.

Further, after the lithological profiles of the sealed coring well sections are established, the log response values of the different lithological intervals are read according to the log reading principles for thick layers, thin layers and lithological transition zones respectively, the correspondence between conglomerate lithology and logging parameters is established, and the lithology identification data set is formed.

Further, preprocessing the lithology identification data set includes filling in missing values, overall standardization and elimination of abnormal points.

The beneficial effect of this further scheme is: the presence of thin layers and the influence on log response within lithological transition zones both cause data anomalies, so the abnormal points need to be eliminated to allow precise identification.

Further, a decision tree algorithm is used to perform data mining on the lithologies in the preprocessed lithology identification data set.

The beneficial effect of this further scheme is: the decision tree algorithm is a "white-box" model, so it is clear how the classifier works and how important each logging parameter is. For oil and gas reservoirs with severe heterogeneity and complex, variable lithology, traditional mathematical statistics struggle to capture accurately the nonlinear mapping between log curves and lithology, whereas the decision tree method, with its self-organizing, self-learning, reasoning and nonlinear modelling capabilities, handles this problem well. It provides high-precision lithology identification results for reservoir evaluation and supports the rational and efficient development of complex reservoirs.

Further, the process of data mining on the lithologies in the preprocessed lithology identification data set is: the weight of each logging parameter in lithology identification is calculated, the sensitive parameters are obtained and used for modelling, and finally the tree-shaped identification model is built in a top-down manner, wherein each branch of the tree represents the recognition rule for one class of lithology, and the leaf nodes represent the logging parameters that make up the recognition rule and the numerical interval of each logging parameter.

The beneficial effect of this further scheme is: the lithology-sensitive parameters are selected automatically by analysing the weights of the different logging parameters, and the resulting complex lithology identification model balances identification accuracy against generalization to new samples, providing an important geological basis for comprehensive reservoir evaluation.

To solve the above technical problems, the present invention also proposes a complex lithology identification system based on a decision tree data mining algorithm, the system including:

a data set building module, for establishing lithological profiles of the sealed coring well sections and forming a lithology identification data set from the lithological profiles;

a preprocessing module, for preprocessing the lithology identification data set;

a marking module, for associating the different lithologies in the lithology identification data set with different logging parameters;

a data mining module, for performing data mining on the lithologies in the preprocessed lithology identification data set to form a tree-shaped identification model, so that complex lithology is identified according to the tree-shaped identification model and the lithologies corresponding to the logging parameters.
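
The four modules listed above can be read as stages of a simple pipeline. The following is a minimal sketch of that arrangement, not the patented implementation: the class name `LithologyPipeline`, the stage signatures and the toy majority-class "miner" used in `demo` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Sample = Dict[str, Optional[float]]  # logging parameter name -> reading


@dataclass
class LabeledSample:
    logs: Sample      # e.g. {"Rt": 35.0, "GR": 62.0}
    lithology: str    # label taken from the lithological profile


@dataclass
class LithologyPipeline:
    # Each stage is a plain callable so the four modules stay swappable.
    build_dataset: Callable[[], List[LabeledSample]]                       # data set building module
    preprocess: Callable[[List[LabeledSample]], List[LabeledSample]]       # preprocessing module
    mark: Callable[[List[LabeledSample]], List[str]]                       # marking module: choose logging parameters
    mine: Callable[[List[LabeledSample], List[str]], Callable[[Sample], str]]  # data mining module

    def fit(self) -> Callable[[Sample], str]:
        """Run the stages in order and return a classifier for new log readings."""
        data = self.preprocess(self.build_dataset())
        fields = self.mark(data)
        return self.mine(data, fields)


def demo() -> str:
    """Wire the pipeline with trivial stand-in stages (hypothetical data)."""
    data = [LabeledSample({"Rt": 40.0}, "conglomerate"),
            LabeledSample({"Rt": 8.0}, "mudstone"),
            LabeledSample({"Rt": 45.0}, "conglomerate")]

    def majority(samples: List[LabeledSample], fields: List[str]) -> Callable[[Sample], str]:
        # Stand-in for the decision tree: always predicts the most common label.
        labels = [s.lithology for s in samples]
        top = max(set(labels), key=labels.count)
        return lambda logs: top

    pipe = LithologyPipeline(lambda: data, lambda d: d, lambda d: ["Rt"], majority)
    return pipe.fit()({"Rt": 30.0})
```

Keeping each module behind a callable means a real decision tree learner could replace the stand-in miner without touching the other stages.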

Further, the data set building module is also used, after the lithological profiles of the sealed coring well sections are established, to read the log response values of the different lithological intervals according to the log reading principles for thick layers, thin layers and lithological transition zones respectively, to establish the correspondence between conglomerate lithology and logging parameters, and to form the lithology identification data set.

Further, the preprocessing module is specifically used to preprocess the lithology identification data set by filling in missing values, overall standardization and elimination of abnormal points.

Further, the data mining module is specifically used to perform data mining on the lithologies in the preprocessed lithology identification data set using a decision tree algorithm.

Further, the data mining module is also used to calculate the weight of each logging parameter in lithology identification, obtain the sensitive parameters for modelling, and finally build the tree-shaped identification model in a top-down manner, wherein each branch of the tree represents the recognition rule for one class of lithology, and the leaf nodes represent the logging parameters that make up the recognition rule and the numerical interval of each logging parameter.

Brief description of the drawings

Fig. 1 is a flow chart of the complex lithology identification method according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the tree-shaped identification model according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of the complex lithology identification system according to an embodiment of the present invention.

Detailed description of the embodiments

The principles and features of the present invention are described below with reference to the accompanying drawings. The examples given serve only to explain the present invention and are not intended to limit its scope.

As shown in Fig. 1, in order to determine the response relationship between different lithologies and log curves, establish a high-precision nonlinear lithology identification model and improve the identification accuracy for complex lithology, this embodiment proposes a complex lithology identification method based on a decision tree data mining algorithm, the method including:

preprocessing the lithology identification data set;

Preferably, before the lithology identification data set is formed, the lithology names of the study area are determined from two sources, the core sample descriptions and the cast thin sections of the sealed coring wells, taking into account the geological background and reservoir characteristics of the study area and the overall requirements of reservoir evaluation. Because complex reservoirs are strongly heterogeneous and their lithology varies sharply, a cast thin section taken from one part of a core cannot accurately reflect the true lithology of the formation, while core description lacks a quantitative measure of the different mineral components. Combining the two methods effectively improves the accuracy of complex lithology naming.

Take the conglomerate reservoir of the Lower Karamay Formation (Kexia Formation) in the Liuzhong (Six-Middle) area of the Karamay oilfield as an example. Because of a near-source, multi-river and rapidly changing depositional environment, the conglomerate reservoir is strongly heterogeneous and its lithology is complex and variable, so accurate lithology identification is a key difficulty in the adjustment and development of this type of reservoir. Based on the core sample descriptions and cast thin section identification results from 8 sealed coring wells in the study area, the lithologies of the conglomerate reservoir were found to include more than ten types, mainly conglomerate, glutenite, sandy conglomerate, sand-bearing conglomerate, pebbly coarse sandstone, pebbly sandstone, medium sandstone, fine sandstone, pebbly mudstone, silty mudstone and mudstone.

Specifically, the lithological profiles of the 8 sealed coring well sections are established on the basis of sequence stratigraphy, the log response values of the different lithological intervals are read, and the correspondence between conglomerate lithology and logging data is established for a total of 327 intervals, forming the sample data set for conglomerate lithology identification.

The principles for reading log values are as follows: (1) for thick layers, the average value is read as the basic data for lithology identification; (2) for thin layers such as mudstone, glutenite, volcanic rock and metamorphic rock, the maximum or minimum value of the log curve is read as the basic data for lithology identification; (3) for transition zones between different lithologies, the average value is read, or the point is treated as an abnormal point and rejected during data preprocessing.
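
The three reading principles can be sketched as a small helper. This is only an illustration under stated assumptions: the function name `layer_reading`, the `kind` labels and the `thin_pick` and `drop_transition` switches are hypothetical, and the text does not say how a layer is classified as thick, thin or transitional, so that decision is left to the caller.

```python
from statistics import mean
from typing import List, Optional


def layer_reading(values: List[float], kind: str, thin_pick: str = "max",
                  drop_transition: bool = False) -> Optional[float]:
    """Return one representative log value for a layer, or None to flag it for rejection."""
    if not values:
        raise ValueError("layer has no log samples")
    if kind == "thick":
        # Principle (1): thick layers use the average value.
        return mean(values)
    if kind == "thin":
        # Principle (2): thin layers use the curve maximum or minimum.
        return max(values) if thin_pick == "max" else min(values)
    if kind == "transition":
        # Principle (3): average value, or reject as an abnormal point.
        return None if drop_transition else mean(values)
    raise ValueError(f"unknown layer kind: {kind}")
```

Whether a thin layer should take the maximum or the minimum depends on the curve and the lithology, which is why the sketch exposes it as a parameter rather than guessing.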

The established lithology identification sample data set is then preprocessed, mainly by filling in missing values, overall standardization and elimination of abnormal points, of which the most important step is the deletion of abnormal points. For reservoirs with complex lithology there are two main causes of abnormal data samples. The first is the presence of thin layers: owing to the limited vertical resolution of logging tools, the log response values cannot reflect the true formation lithology. The second is that log responses within lithological transition zones are affected by the surrounding rock on both sides, so the log values can hardly calibrate the true lithology accurately. On the basis of these causes of log distortion, a total of 18 abnormal data points were deleted, caused mainly by thin layers and lithological transition zones.
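
A minimal sketch of the three preprocessing steps follows. The text names the steps but not the formulas, so the choices here are assumptions: missing values are filled with the column mean, standardization is a z-score, and abnormal points are supplied by the caller as flags (reflecting that the text deletes them for geological reasons rather than by a statistical test). The function name `preprocess` is hypothetical. Flagged points are dropped first so they do not distort the mean and standard deviation.

```python
from statistics import mean, pstdev
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def preprocess(rows: List[Row], fields: List[str], flagged: List[bool]) -> List[Dict[str, float]]:
    # Step 1: eliminate abnormal points flagged during reading
    # (thin layers, lithological transition zones).
    kept = [r for r, bad in zip(rows, flagged) if not bad]
    out: List[Dict[str, float]] = [{} for _ in kept]
    for f in fields:
        known = [r[f] for r in kept if r.get(f) is not None]
        mu = mean(known)
        sigma = pstdev(known) or 1.0  # guard against a constant column
        for r, o in zip(kept, out):
            v = r.get(f)
            v = mu if v is None else v   # Step 2: fill missing values (assumed: column mean)
            o[f] = (v - mu) / sigma      # Step 3: standardize (assumed: z-score)
    return out
```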

The complex lithology classes to be identified are determined on the basis of the lithology naming: according to the requirements of reservoir evaluation, some similar lithologies can be merged and some important lithologies can be subdivided. In addition, the mining fields of the decision tree algorithm are determined, that is, which log curves are selected to indicate lithological information.

Specifically, according to the overall requirements of conglomerate reservoir evaluation, the conglomerate lithology of the Lower Karamay Formation in the Liuzhong area was finally classified into 8 types: conglomerate, sandy conglomerate, glutenite, pebbly coarse sandstone, fine sandstone, pebbly mudstone, silty mudstone and mudstone. For the mining fields, 7 logging parameters were selected to indicate lithology: undisturbed formation resistivity (Rt), natural gamma ray (GR), spontaneous potential (SP), caliper (CAL), neutron porosity (CNL), acoustic interval transit time (AC) and compensated density (DEN).
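
As a small illustration, the seven mining fields can be held in one mapping and used to validate sample records before mining. The curve mnemonics come from the text; the names `MINING_FIELDS` and `missing_fields` are hypothetical.

```python
from typing import Dict, List

# Curve mnemonic -> description, in the order listed in the text.
MINING_FIELDS: Dict[str, str] = {
    "Rt": "undisturbed formation resistivity",
    "GR": "natural gamma ray",
    "SP": "spontaneous potential",
    "CAL": "caliper",
    "CNL": "neutron porosity",
    "AC": "acoustic interval transit time",
    "DEN": "compensated density",
}


def missing_fields(sample: Dict[str, float]) -> List[str]:
    """List the mining fields that are absent from one sample record."""
    return [name for name in MINING_FIELDS if name not in sample]
```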

In this embodiment, a decision tree algorithm is used to perform data mining on the lithologies in the preprocessed lithology identification data set.

Specifically, the decision tree algorithm used to mine the conglomerate lithology is C5.0. The algorithm uses the information gain ratio as its measure for splitting nodes, and the attribute with the largest information gain ratio is selected for the split. C5.0 uses pessimistic error pruning, that is, the continuity correction of the binomial distribution is applied to the resubstitution error in order to obtain an error rate closer to the actual one.
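
The information gain ratio for a binary split on a continuous logging parameter can be sketched as follows. This is the textbook definition (information gain divided by split information), not code from C5.0 itself, and the function names are hypothetical.

```python
from collections import Counter
from math import log2
from typing import List, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a list of lithology labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(values: List[float], labels: List[str], threshold: float) -> float:
    """Information gain ratio of the binary split `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    n = len(labels)
    weights = [len(left) / n, len(right) / n]
    gain = entropy(labels) - weights[0] * entropy(left) - weights[1] * entropy(right)
    split_info = -sum(w * log2(w) for w in weights)
    return gain / split_info
```

Dividing by the split information penalizes splits that carve off very small subsets, which is the usual motivation for preferring gain ratio over plain information gain.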

Selecting the split attribute, that is, choosing one optimal split attribute from many candidates, is the core of the algorithm. Selection criteria fall roughly into two classes: (1) strategies that treat the attributes as mutually independent; (2) strategies that treat the attributes as interrelated. Each has its advantages and disadvantages, and the choice is made on the basis of the actual sample data. Pruning strategies comprise pre-pruning and post-pruning: the former stops the growth of the tree early, whereas the latter first generates an initial decision tree at maximum size and then prunes it. Practice has shown that post-pruning is more successful in the field of oil and gas reservoir evaluation.
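
As an illustration of post-pruning, the pessimistic error test in its classic textbook form (attributed to Quinlan) can be sketched as below. It adds a continuity correction of 0.5 per leaf to the resubstitution errors and collapses a subtree when the corrected error of the single replacement leaf is within one standard error of the corrected subtree error. This is an assumption about the general technique, not the internal logic of C5.0, and `should_prune` is a hypothetical name.

```python
from math import sqrt
from typing import List


def should_prune(node_errors: int, leaf_errors: List[int], n_samples: int) -> bool:
    """Pessimistic error pruning test for one internal node.

    node_errors: misclassified samples if the node were collapsed to a leaf
    leaf_errors: misclassified samples at each leaf of the current subtree
    n_samples:   training samples reaching the node
    """
    as_leaf = node_errors + 0.5                               # corrected error of the collapsed node
    as_subtree = sum(leaf_errors) + 0.5 * len(leaf_errors)    # corrected error of the subtree
    std_err = sqrt(as_subtree * (n_samples - as_subtree) / n_samples)
    return as_leaf <= as_subtree + std_err
```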

Once the split attribute criterion and the pruning strategy have been determined, the decision tree algorithm is used to mine the established lithology identification data set. The algorithm gives the weight of each logging parameter in lithology identification, from which the sensitive parameters are selected for modelling; parameters with a small weight and low sensitivity take no part in building the decision-tree model. After the sensitive parameters for modelling are determined, the decision tree algorithm builds the tree-shaped identification model top-down by "divide and conquer". Each branch of the tree represents the recognition rule for one class of lithology, and the leaf nodes represent the attribute parameters that make up the recognition rule and the numerical interval of each parameter.
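
The top-down "divide and conquer" construction and the reading of each branch as a rule can be sketched as follows. This is a simplified illustration, not C5.0: it uses binary threshold splits chosen by gain ratio, omits parameter weighting and pruning, and all names (`build`, `rules`, the tuple layout of a tree node) are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Dict, List, Tuple, Union

Row = Dict[str, float]
# A tree is either a leaf label or (field, threshold, left subtree, right subtree).
Tree = Union[str, Tuple[str, float, "Tree", "Tree"]]


def _entropy(labels: List[str]) -> float:
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def _best_split(rows: List[Row], labels: List[str], fields: List[str]):
    best = (0.0, None, None)  # (gain ratio, field, threshold)
    for f in fields:
        xs = sorted({r[f] for r in rows})
        for lo, hi in zip(xs, xs[1:]):       # candidate thresholds: midpoints of neighbours
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            w = len(left) / len(labels)
            gain = _entropy(labels) - w * _entropy(left) - (1 - w) * _entropy(right)
            ratio = gain / -(w * log2(w) + (1 - w) * log2(1 - w))
            if ratio > best[0]:
                best = (ratio, f, t)
    return best


def build(rows: List[Row], labels: List[str], fields: List[str]) -> Tree:
    """Recursively split the samples until no split improves purity."""
    _, f, t = _best_split(rows, labels, fields)
    if f is None:  # pure node or no useful split: leaf with the majority lithology
        return Counter(labels).most_common(1)[0][0]
    left = [i for i, r in enumerate(rows) if r[f] <= t]
    right = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build([rows[i] for i in left], [labels[i] for i in left], fields),
            build([rows[i] for i in right], [labels[i] for i in right], fields))


def rules(tree: Tree, path: Tuple[str, ...] = ()) -> List[str]:
    """Flatten the tree into one readable rule per root-to-leaf branch."""
    if isinstance(tree, str):
        return [" and ".join(path) + " -> " + tree]
    f, t, left, right = tree
    return rules(left, path + (f"{f} <= {t}",)) + rules(right, path + (f"{f} > {t}",))
```

Because every root-to-leaf path is a conjunction of interval tests on logging parameters, the fitted model can be read directly as a rule list, which is the "white-box" property the text emphasizes.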

Specifically, 3 conglomerate-lithology sensitive parameters were selected from the 7 mining fields. Undisturbed formation resistivity has the largest weight and the highest sensitivity, followed by acoustic interval transit time and natural gamma ray. The other 4 parameters carry little weight in conglomerate lithology identification and have low sensitivity, so they take no part in building the decision-tree model. With the lithology-sensitive parameters selected, the decision tree algorithm builds the tree-shaped identification model. Overall, the lithology identification model starts testing the data samples at the root node, undisturbed formation resistivity, and proceeds top-down by "divide and conquer" through a total of four levels; each branch of the tree represents the recognition rule for one class of lithology, as shown in Fig. 2.

The "branch rules" established by the decision tree can accurately identify the lithologies in the sample data set. The conglomerate lithology sample data total 309 samples, of which the decision-tree model identifies 296 correctly, for an overall identification accuracy of 95.79%. Because the model is a high-precision nonlinear identification model, its identification accuracy is substantially higher than that of the conventional cross-plot method. In addition, the model generalizes fairly well to other data sets and can be applied to complex lithology identification in other wells of the study area.

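The sample counts and accuracy reported in this embodiment are mutually consistent, as a short check shows. The variable names are illustrative only; the numbers are those given in the text.

```python
total_intervals = 327    # intervals read from the 8 sealed coring wells
removed_outliers = 18    # abnormal points deleted during preprocessing
correct = 296            # samples the decision-tree model identifies correctly

samples = total_intervals - removed_outliers
accuracy = correct / samples
print(samples, f"{accuracy:.2%}")  # prints: 309 95.79%
```
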
From the given mining fields, the decision tree data mining algorithm automatically selects the sensitive parameters for conglomerate lithology identification and then builds a tree-shaped identification model with a nonlinear mapping, whose overall identification accuracy is high. To provide geophysicists with a more intuitive and more readily applied identification model, the product of acoustic interval transit time and natural gamma ray (AC*GR) is first constructed from the three sensitive parameters selected by the algorithm, and a conglomerate lithology cross-plot chart is then made in combination with undisturbed formation resistivity (Rt). The changes shown in the cross plot are used to analyse how the reservoir varies, providing a geological basis for adjusting the conglomerate reservoir development plan and for avoiding perforation of strongly water-flooded layers.
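
Constructing the composite parameter for the cross plot is a one-line transformation. In this sketch the function name `crossplot_points` and the choice of which quantity goes on which axis are assumptions; the text only states that AC*GR is combined with Rt.

```python
from typing import Dict, List, Tuple


def crossplot_points(samples: List[Dict[str, float]]) -> List[Tuple[float, float]]:
    """Return (AC*GR, Rt) pairs, one per sample, ready to scatter-plot by lithology."""
    return [(s["AC"] * s["GR"], s["Rt"]) for s in samples]
```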

Using the lithology identification model established in this example to evaluate the complex lithology of the conglomerate reservoirs of the Karamay oilfield achieved good results in geological application, and provides a new approach and scheme for the evaluation of complex reservoirs. The conglomerate reservoir example shows that data mining can automatically find lithology-sensitive parameters in a large amount of unknown data and build a nonlinear decision tree identification model by analysing parameter weights, and that the mining process and its results in turn provide ideas and methods for building lithology charts with geophysical knowledge. The two approaches are interdependent and mutually reinforcing. At an early stage, data mining can provide an analysis of the regularities in mass data and help geophysical research extract useful information. Geophysicists, using their background knowledge, can select important information and construct new parameters, which helps the data mining method determine more reasonable and effective mining fields and gives the resulting geophysical model more accurate, quantitative and finely resolved results.

As shown in Fig. 3, correspondingly, this embodiment also proposes a complex lithology identification system based on a decision tree data mining algorithm, the system including:

Before the lithology identification data set is formed, the lithology names of the study area are determined from two sources, the core sample descriptions and the cast thin sections of the sealed coring wells, taking into account the geological background and reservoir characteristics of the study area and the overall requirements of reservoir evaluation. Because complex reservoirs are strongly heterogeneous and their lithology varies sharply, a cast thin section taken from one part of a core cannot accurately reflect the true lithology of the formation, while core description lacks a quantitative measure of the different mineral components. Combining the two methods effectively improves the accuracy of complex lithology naming.

On the basis of the lithology naming, the lithological profiles of the sealed coring well sections are established on the basis of sequence stratigraphy, and the log response values of the different lithological intervals are read. The principles for reading log values are as follows: (1) for thick layers, the average value is read as the basic data for lithology identification; (2) for thin layers such as mudstone, glutenite, volcanic rock and metamorphic rock, the maximum or minimum value of the log curve is read as the basic data for lithology identification; (3) for transition zones between different lithologies, the average value is read, or the point is treated as an abnormal point and rejected during data preprocessing.

The established lithology identification sample data set is then preprocessed, mainly by filling in missing values, overall standardization and elimination of abnormal points, of which the most important step is the deletion of abnormal points. For reservoirs with complex lithology there are two main causes of abnormal data samples. The first is the presence of thin layers: owing to the limited vertical resolution of logging tools, the log response values cannot reflect the true formation lithology. The second is that log responses within lithological transition zones are affected by the surrounding rock on both sides, so the log values can hardly calibrate the true lithology accurately.

The core technology of the complex lithology identification method based on a decision tree data mining algorithm according to the present invention is the selection of the split attribute values and the pruning strategy. Through a comprehensive analysis of the different lithologies and their log responses, the algorithm determines the weight of each log curve in lithology identification, then selects the lithology-sensitive parameters and builds the tree-shaped model, whose accuracy and generalization ability are both good. Compared with cross plots and other data mining algorithms, the decision tree lithology identification method provides a high-precision nonlinear identification model, and for the evaluation of reservoirs with complex lithology its identification accuracy is higher than that of the common cross-plot method. Moreover, the decision tree algorithm is a "white-box" model: it is clear how the classifier works and what the relative importance of each parameter is, which also gives good guidance for geophysical research. The present invention therefore has important application value and good market prospects in the evaluation of reservoirs with complex lithology.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A complex lithology identification method based on a decision tree data mining algorithm, characterized in that the method includes:

preprocessing the lithology identification data set;

performing data mining on the lithologies in the preprocessed lithology identification data set to form a tree-shaped identification model, so that complex lithology is identified according to the tree-shaped identification model and the lithologies corresponding to the logging parameters.

2. The complex lithology identification method based on a decision tree data mining algorithm according to claim 1, characterized in that, after the lithological profiles of the sealed coring well sections are established, the log response values of the different lithological intervals are read according to the log reading principles for thick layers, thin layers and lithological transition zones respectively, the correspondence between conglomerate lithology and logging parameters is established, and the lithology identification data set is formed.

3. The complex lithology identification method based on a decision tree data mining algorithm according to claim 2, characterized in that preprocessing the lithology identification data set includes filling in missing values, overall standardization and elimination of abnormal points.

4. The complex lithology identification method based on a decision tree data mining algorithm according to claim 3, characterized in that a decision tree algorithm is used to perform data mining on the lithologies in the preprocessed lithology identification data set.

5. The complex lithology identification method based on a decision tree data mining algorithm according to any one of claims 1 to 4, characterized in that the process of data mining on the lithologies in the preprocessed lithology identification data set is: the weight of each logging parameter in lithology identification is calculated, the sensitive parameters are obtained and used for modelling, and finally the tree-shaped identification model is built in a top-down manner, wherein each branch of the tree represents the recognition rule for one class of lithology, and the leaf nodes represent the logging parameters that make up the recognition rule and the numerical interval of each logging parameter.

6. A complex lithology identification system based on a decision tree data mining algorithm, characterized in that the system includes:

a data set building module, for establishing lithological profiles of the sealed coring well sections and forming a lithology identification data set from the lithological profiles;

a data mining module, for performing data mining on the lithologies in the preprocessed lithology identification data set to form a tree-shaped identification model, so that complex lithology is identified according to the tree-shaped identification model and the lithologies corresponding to the logging parameters.

7. The complex lithology identification system based on a decision tree data mining algorithm according to claim 6, characterized in that the data set building module is also used, after the lithological profiles of the sealed coring well sections are established, to read the log response values of the different lithological intervals according to the log reading principles for thick layers, thin layers and lithological transition zones respectively, to establish the correspondence between conglomerate lithology and logging parameters, and to form the lithology identification data set.

8. The complex lithology identification system based on a decision tree data mining algorithm according to claim 7, characterized in that the preprocessing module is specifically used to preprocess the lithology identification data set by filling in missing values, overall standardization and elimination of abnormal points.

9. The complex lithology identification system based on a decision tree data mining algorithm according to claim 8, characterized in that the data mining module is specifically used to perform data mining on the lithologies in the preprocessed lithology identification data set using a decision tree algorithm.

10. The complex lithology identification system based on a decision tree data mining algorithm according to any one of claims 6 to 9, characterized in that the data mining module is also used to calculate the weight of each logging parameter in lithology identification, obtain the sensitive parameters for modelling, and finally build the tree-shaped identification model in a top-down manner, wherein each branch of the tree represents the recognition rule for one class of lithology, and the leaf nodes represent the logging parameters that make up the recognition rule and the numerical interval of each logging parameter.