CN107133670A

CN107133670A - Complex lithology identification method and system based on a decision tree data mining algorithm

Info

Publication number: CN107133670A
Application number: CN201710266077.7A
Authority: CN
Inventors: 谭锋奇
Original assignee: University of Chinese Academy of Sciences
Current assignee: University of Chinese Academy of Sciences
Priority date: 2017-04-21
Filing date: 2017-04-21
Publication date: 2017-09-05

Abstract

The invention relates to a complex lithology identification method and system based on a decision tree data mining algorithm, in the technical field of reservoir evaluation for petroleum exploration. The method includes: establishing a lithology profile of a sealed-coring well section and forming a lithology identification data set from the profile; preprocessing the data set; mapping different logging parameters to the different lithologies in the data set; and performing data mining on the lithologies in the preprocessed data set to form a tree-shaped identification model with which complex lithology is identified. Because the data set is built from the lithology profile, lithology is identified on a unified basis and matched to logging parameters according to its characteristics; after data mining, the result is a clear tree-shaped identification model that can identify lithology accurately under complex lithology conditions.

Description

A complex lithology identification method and system based on a decision tree data mining algorithm

Technical field

The invention relates to the technical field of reservoir evaluation for petroleum exploration, and in particular to a complex lithology identification method and system based on a decision tree data mining algorithm.

Background art

As petroleum exploration and development expand in breadth and depth, two trends have emerged. On the one hand, petroleum data are accumulating rapidly. This mass of data is a store of value that, if fully exploited, can bring additional benefits to exploration and development, yet traditional geophysical exploration methods based on petrophysics, mathematics, statistics, and petroleum exploration theory are poorly suited to making deeper use of it. On the other hand, unconventional oil and gas reservoirs of various kinds are becoming the main exploration targets, and for these complex reservoirs conventional methods such as crossplots, linear regression, and multivariate discriminant analysis cannot effectively solve problems such as lithology identification and reservoir parameter calculation. It is therefore necessary to introduce into petroleum exploration and development new methods from other research fields, such as artificial intelligence, machine learning, and pattern recognition. Data mining, as a "data-driven" approach, is well suited to the problems encountered in evaluating complex oil and gas reservoirs.

Data mining is the process of extracting implicit, previously unknown, but potentially useful information and knowledge from large volumes of incomplete, noisy, fuzzy, and random data. Data mining tasks fall into two categories, description and prediction: the former derives patterns that summarize the underlying relationships in the data, and the latter makes inferences from current data in order to predict. The main methods include neural networks, support vector machines, Bayesian networks, and decision trees. Neural networks have been widely applied to lithology identification, sedimentary facies division, permeability prediction, and the identification of oil, gas, and water layers, and practice has shown that for complex petroleum geology problems influenced by many factors, nonlinear neural network methods outperform linear statistical analysis techniques. The greatest drawback of neural networks, however, is that they readily overfit ("over-learn") the training samples: the method has too many adjustable parameters, and given enough time it can "memorize" almost anything, so the resulting model becomes detached from the geological background and has no practical value. In addition, neural networks, support vector machines, and Bayesian networks share a common drawback: their predictive models are "black boxes", and the way in which the sample data are fitted to the attributes cannot be inspected.

Summary of the invention

To solve the above technical problems, the present invention provides a complex lithology identification method and system based on a decision tree data mining algorithm.

The technical solution of the present invention is as follows: a complex lithology identification method based on a decision tree data mining algorithm, the method comprising:

establishing a lithology profile of a sealed-coring well section, and forming a lithology identification data set from the lithology profile;

performing data preprocessing on the lithology identification data set;

mapping different logging parameters to the different lithologies in the lithology identification data set;

performing data mining on the lithologies in the preprocessed lithology identification data set to form a tree-shaped identification model, so that complex lithology is identified according to the tree-shaped identification model and the lithologies corresponding to the logging parameters.

The beneficial effects of the present invention are as follows: the lithology identification data set is formed from the lithology profile, so lithology is identified on a unified basis and matched to different logging parameters according to its characteristics; after data mining on the data set, a clear tree-shaped identification model is formed that can identify lithology accurately under complex lithology conditions.

On the basis of the above technical solution, the present invention can be further improved as follows.

Further, after the lithology profile of the sealed-coring well section is established, the logging curve response values of the different lithology sections are read according to the log-reading principles for thick layers, thin interbeds, and lithology transition zones, the correspondence between conglomerate lithology and logging parameters is established, and the lithology identification data set is formed.

Further, the data preprocessing of the lithology identification data set includes filling in missing values, overall standardization, and eliminating anomalous points.

The beneficial effect of this further scheme is that both thin interbeds and the log response in lithology transition zones cause data anomalies, so the anomalous points must be eliminated for accurate identification.

Further, a decision tree algorithm is used to perform data mining on the lithologies in the preprocessed lithology identification data set.

The beneficial effect of this further scheme is that the decision tree algorithm is a "white-box" model: how the classifier works and how important each logging parameter is can be seen clearly. For oil and gas reservoirs with severe heterogeneity and complex, variable lithology, traditional mathematical-statistical methods struggle to capture the nonlinear mapping between logging curves and lithology, whereas the decision tree method, with its self-organizing, self-learning, and reasoning capabilities and its nonlinear modeling, handles this problem well, provides high-precision lithology identification results for reservoir evaluation, and supports the rational and efficient development of complex oil and gas reservoirs.

Further, the data mining on the lithologies in the preprocessed lithology identification data set proceeds as follows: the weight of each logging parameter in lithology identification is calculated, the sensitive parameters are selected for modeling, and finally the tree-shaped identification model is built from the top down in representational form, where each branch of the tree represents the identification rule for one lithology class, and the leaf nodes represent the logging parameters that make up the rule together with the value interval of each logging parameter.

The beneficial effect of this further scheme is that the lithology-sensitive parameters are selected automatically by analyzing the weights of the different logging parameters, and a complex lithology identification model that balances identification accuracy against generalization ability is ultimately established, providing an important geological basis for comprehensive reservoir evaluation.

To solve the above technical problems, the present invention also proposes a complex lithology identification system based on a decision tree data mining algorithm, the system comprising:

a data set establishment module, configured to establish a lithology profile of a sealed-coring well section and form a lithology identification data set from the lithology profile;

a preprocessing module, configured to perform data preprocessing on the lithology identification data set;

an identification module, configured to map different logging parameters to the different lithologies in the lithology identification data set;

a data mining module, configured to perform data mining on the lithologies in the preprocessed lithology identification data set to form a tree-shaped identification model, so that complex lithology is identified according to the tree-shaped identification model and the lithologies corresponding to the logging parameters.

Further, the data set establishment module is also configured, after the lithology profile of the sealed-coring well section is established, to read the logging curve response values of the different lithology sections according to the log-reading principles for thick layers, thin interbeds, and lithology transition zones, to establish the correspondence between conglomerate lithology and logging parameters, and to form the lithology identification data set.

Further, the preprocessing module is specifically configured to preprocess the lithology identification data set by filling in missing values, overall standardization, and eliminating anomalous points.

Further, the data mining module is specifically configured to use a decision tree algorithm to perform data mining on the lithologies in the preprocessed lithology identification data set.

Further, the data mining module is also configured to calculate the weight of each logging parameter in lithology identification, select the sensitive parameters for modeling, and finally build the tree-shaped identification model from the top down in representational form, where each branch of the tree represents the identification rule for one lithology class, and the leaf nodes represent the logging parameters that make up the rule together with the value interval of each logging parameter.

Description of drawings

Fig. 1 is a flow chart of the complex lithology identification method according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the tree-shaped identification model according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of the complex lithology identification system according to an embodiment of the present invention.

Detailed description

The principles and features of the present invention are described below with reference to the accompanying drawings. The examples given serve only to explain the invention and are not intended to limit its scope.

As shown in Fig. 1, in order to determine the response relationship between different lithologies and logging curves, establish a high-precision nonlinear lithology identification model, and improve the identification accuracy for complex lithology, this embodiment proposes a complex lithology identification method based on a decision tree data mining algorithm, the method comprising:

Preferably, before the lithology identification data set is formed, the lithology names of the study area are determined from two sources, the core sample descriptions of the sealed-coring wells and the naming of cast thin sections, taking into account the geological background and reservoir characteristics of the study area and the overall requirements of reservoir evaluation. Complex reservoirs are strongly heterogeneous and their lithology changes sharply, so a cast thin section taken from one part of a core cannot accurately reflect the true lithology of the formation, while core description lacks quantitative calibration of the different mineral components. Combining the two methods improves the accuracy of naming complex lithologies.

Take the Kexia Formation conglomerate reservoir in the Liuzhong area of the Karamay Oilfield as an example. Because of its proximal sediment source, multiple water systems, and rapidly changing depositional environment, the conglomerate reservoir is strongly heterogeneous and its lithology is complex and variable, so accurate lithology identification is a difficulty in adjusting the development of this type of reservoir. From the core sample descriptions and cast thin section identification results of 8 sealed-coring wells in the study area, the lithologies of the conglomerate reservoir were determined to comprise more than ten types, mainly conglomerate, glutenite, sandy conglomerate, sand-bearing conglomerate, pebbly coarse sandstone, pebbly sandstone, medium sandstone, fine sandstone, pebbly mudstone, silty mudstone, and mudstone.

Specifically, lithology profiles of the 8 sealed-coring well sections were established on the basis of sequence stratigraphy, and the logging curve response values of the different lithology sections were read. In total, correspondences between conglomerate lithology and logging data were established for 327 intervals, forming the sample data set for conglomerate lithology identification.

The log-reading principles are as follows: ① for thick layers, the average value is read as the basic data for lithology identification; ② for thin interbeds such as mudstone, glutenite, volcanic rock, and metamorphic rock, the maximum or minimum of the logging curve is read as the basic data for lithology identification; ③ for transition zones between different lithologies, the average value is read, or the interval is set aside as an anomalous point to be handled during data preprocessing.
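The three reading principles above can be sketched as a small helper. This is only an illustration: the interval labels `"thick"`, `"thin"`, and `"transition"` and the `thin_bed_pick` argument are hypothetical names, and whether a thin interbed takes the maximum or the minimum is assumed to be decided per curve by the interpreter.

```python
from statistics import mean


def read_log_value(samples, interval_type, thin_bed_pick="max"):
    """Return one representative log value for a lithology interval.

    samples       -- curve readings inside the interval
    interval_type -- "thick", "thin", or "transition" (hypothetical labels)
    thin_bed_pick -- "max" or "min"; assumed to be chosen per curve
    """
    if not samples:
        raise ValueError("interval has no samples")
    if interval_type == "thick":
        # Principle 1: thick layers use the average reading.
        return mean(samples)
    if interval_type == "thin":
        # Principle 2: thin interbeds use the curve extreme.
        return max(samples) if thin_bed_pick == "max" else min(samples)
    if interval_type == "transition":
        # Principle 3: transition zones use the average (or are flagged
        # as anomalous elsewhere and handled in preprocessing).
        return mean(samples)
    raise ValueError(f"unknown interval type: {interval_type}")
```

For example, `read_log_value([10.0, 20.0, 30.0], "thin", "min")` picks the lowest reading of the interval.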

The established lithology identification sample data set is then preprocessed, which mainly includes filling in missing values, overall standardization, and eliminating anomalous points. The most important of these steps is the deletion of anomalous points. For reservoirs with complex lithology, data samples are anomalous for two main reasons. The first is the presence of thin interbeds: because of the limited vertical resolution of logging tools, the log response cannot reflect the true lithology of the formation. The second is that the log response in a lithology transition zone is affected by the surrounding rock on both sides, so the log values cannot accurately calibrate the true lithology. On the basis of these causes of log-value distortion, a total of 18 anomalous data points were deleted, mainly those caused by thin interbeds and lithology transition zones.
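A minimal sketch of the three preprocessing steps follows, using only the Python standard library. Several things here are assumptions rather than details from the patent: the `anomalous` flag is a hypothetical field set by the interpreter (the text identifies anomalies by geological cause, not by a statistical test), missing values are filled with the column mean, and standardization is taken to be a z-score. Anomalous intervals are dropped first so that they do not distort the mean and standard deviation.

```python
from statistics import mean, pstdev


def preprocess(rows, curves):
    """Drop flagged anomalies, fill missing values, and z-score each curve.

    rows   -- list of dicts; a curve value of None means "missing", and the
              optional boolean key "anomalous" is a hypothetical flag
    curves -- names of the logging curves to process, e.g. ["Rt", "GR"]
    """
    # Step 1: remove intervals flagged as anomalous (thin interbeds,
    # transition zones). Copy the rows so the input is left untouched.
    kept = [dict(r) for r in rows if not r.get("anomalous", False)]

    for c in curves:
        present = [r[c] for r in kept if r[c] is not None]
        mu = mean(present)
        # Step 2: fill missing values with the column mean (assumption).
        for r in kept:
            if r[c] is None:
                r[c] = mu
        # Step 3: standardize to zero mean and unit variance.
        sigma = pstdev([r[c] for r in kept])
        for r in kept:
            r[c] = (r[c] - mu) / sigma if sigma > 0 else 0.0
    return kept
```

Calling `preprocess(rows, ["Rt", "GR", "AC"])` returns new row dictionaries and leaves the input list unchanged.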

The complex lithology classes to be identified are determined on the basis of the lithology naming. Depending on the requirements of reservoir evaluation, some similar lithologies can be merged and some important lithologies can be subdivided. In addition, the mining fields of the decision tree algorithm are determined, that is, which logging curves are selected to indicate lithology.

Specifically, according to the overall requirements of conglomerate reservoir evaluation, the conglomerate lithologies of the Kexia Formation in the Liuzhong area were finally set at 8 types: conglomerate, sandy conglomerate, glutenite, pebbly coarse sandstone, fine sandstone, pebbly mudstone, silty mudstone, and mudstone. The mining fields comprise 7 logging parameters chosen to indicate lithology: undisturbed formation resistivity (Rt), natural gamma ray (GR), spontaneous potential (SP), caliper (CAL), neutron porosity (CNL), acoustic transit time (AC), and compensated density (DEN).

In this embodiment, a decision tree algorithm is used to perform data mining on the lithologies in the preprocessed lithology identification data set.

Specifically, the C5.0 decision tree algorithm is used for data mining of the conglomerate lithology. The algorithm uses the information gain ratio as its splitting measure and selects the attribute with the largest information gain ratio as the split node. C5.0 uses pessimistic error pruning, that is, it applies the continuity correction for the binomial distribution to the resubstitution error in order to obtain a more realistic error rate.
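To make the splitting measure concrete, the sketch below computes the information gain ratio of a binary threshold split on one logging curve. It follows the textbook C4.5-style definition (information gain divided by split information); C5.0 is a commercial product, so this should be read as an illustration of the measure and not as its internal implementation. The function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # the split separates nothing
    n = len(labels)
    w_left, w_right = len(left) / n, len(right) / n
    # Information gain: entropy before the split minus weighted entropy after.
    gain = entropy(labels) - w_left * entropy(left) - w_right * entropy(right)
    # Split information penalizes very unbalanced partitions.
    split_info = -(w_left * log2(w_left) + w_right * log2(w_right))
    return gain / split_info
```

Evaluating `gain_ratio` over candidate thresholds for each curve and keeping the largest value mirrors the "select the attribute with the largest gain ratio" rule described above.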

Selecting the split attribute, that is, choosing the best split attribute from among many attributes, is the core of the algorithm. Selection criteria fall broadly into two classes: ① strategies that treat the attributes as mutually independent; ② strategies that treat the attributes as interrelated. Each has advantages and disadvantages, and the choice is made on the basis of the actual sample data. Pruning strategies comprise pre-pruning and post-pruning: the former stops tree growth early, while the latter first grows the initial decision tree to its maximum size and then prunes it. Practice has shown that post-pruning is the more successful approach in the field of oil and gas reservoir evaluation.
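As an illustration of post-pruning with a continuity correction, the sketch below applies the pessimistic error pruning test in the form usually attributed to Quinlan (1987): each leaf's resubstitution error count is increased by 0.5, and a subtree is replaced by a single leaf when the leaf's corrected error is within one standard error of the subtree's corrected error. This is an assumed reading of the pruning step; the patent does not give the formula, and the function and argument names are hypothetical.

```python
from math import sqrt


def should_prune(n_samples, leaf_errors, subtree_errors, subtree_leaves):
    """Pessimistic error pruning decision for one internal node.

    n_samples      -- training samples reaching the node
    leaf_errors    -- misclassified samples if the node became a leaf
    subtree_errors -- misclassified samples summed over the subtree's leaves
    subtree_leaves -- number of leaves in the subtree
    """
    # Continuity correction: add 0.5 per leaf to the resubstitution errors.
    corrected_leaf = leaf_errors + 0.5
    corrected_subtree = subtree_errors + 0.5 * subtree_leaves
    # Standard error of the corrected subtree error count (binomial).
    std_err = sqrt(corrected_subtree * (n_samples - corrected_subtree) / n_samples)
    return corrected_leaf <= corrected_subtree + std_err
```

With 100 samples, a node that would make 10 errors as a leaf against 8 errors spread over 4 leaves is pruned, because 10.5 does not exceed 10 + 3.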

Once the split attribute and pruning strategy have been determined, the decision tree algorithm is used to mine the established lithology identification data set. The algorithm gives the weight of each logging parameter in lithology identification, from which the sensitive parameters are selected for modeling; parameters with small weights and low sensitivity do not take part in building the decision tree model. After the sensitive modeling parameters have been determined, the decision tree algorithm builds the tree-shaped identification model from the top down by "divide and conquer". Each branch of the tree represents the identification rule for one lithology class, and the leaf nodes represent the attribute parameters that make up the rule together with the value interval of each parameter.

Specifically, 3 conglomerate lithology-sensitive parameters were selected from the 7 mining fields. Undisturbed formation resistivity has the largest parameter weight and the highest sensitivity, followed by acoustic transit time and natural gamma ray. The other 4 parameters carry little weight in conglomerate lithology identification and have low sensitivity, so they do not take part in building the decision tree model. With the lithology-sensitive parameters selected, the decision tree algorithm builds the tree-shaped identification model. The model as a whole begins testing the data samples at the root node, undisturbed formation resistivity, and proceeds from the top down by "divide and conquer" through four levels in total. Each branch of the tree represents the identification rule for one lithology class, as shown in Fig. 2.
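The structure described above, a root test on Rt followed by further threshold tests, with each root-to-leaf path acting as one identification rule, can be sketched as a small data structure. The thresholds and the leaf labels in `EXAMPLE_TREE` are placeholders chosen only so the code runs; they are not taken from Fig. 2, and the example tree is shallower than the four-level model the text describes.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Node:
    curve: str                  # logging parameter tested at this node
    threshold: float            # split value
    low: Union["Node", str]     # branch taken when value <= threshold
    high: Union["Node", str]    # branch taken when value > threshold


def classify(node, sample):
    """Walk from the root to a leaf and return the lithology label."""
    while isinstance(node, Node):
        node = node.low if sample[node.curve] <= node.threshold else node.high
    return node


def rules(node, path=()):
    """List every root-to-leaf path as (conditions, lithology)."""
    if isinstance(node, str):
        return [(path, node)]
    return (rules(node.low, path + (f"{node.curve} <= {node.threshold}",))
            + rules(node.high, path + (f"{node.curve} > {node.threshold}",)))


# PLACEHOLDER thresholds and labels, for illustration only (not from Fig. 2).
EXAMPLE_TREE = Node(
    "Rt", 20.0,
    Node("GR", 90.0, "fine sandstone", "mudstone"),
    Node("AC", 80.0, "conglomerate", "sandy conglomerate"),
)
```

Printing `rules(EXAMPLE_TREE)` lists the branch rules in readable form, which is the "white-box" property the description emphasizes.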

The "branch rules" built by the decision tree identify the lithologies in the sample data set accurately: of the 309 conglomerate lithology samples, the decision tree model correctly identifies 296, for an overall identification accuracy of 95.79%. Because the model is high-precision and nonlinear, its identification accuracy is substantially higher than that of the conventional crossplot method. The model also generalizes reasonably well to other data sets and can be applied to complex lithology identification in other wells of the study area.
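The reported figures are mutually consistent, as this short check shows: the 327 intervals read from the wells minus the 18 deleted anomalous points leave the 309 samples, and 296 correct identifications out of 309 give the stated accuracy. The variable names are illustrative.

```python
total_intervals = 327      # intervals read from the 8 sealed-coring wells
removed_anomalies = 18     # anomalous points deleted in preprocessing
samples = total_intervals - removed_anomalies
correct = 296              # samples identified correctly by the tree

accuracy = correct / samples
print(f"{samples} samples, accuracy = {accuracy:.2%}")
# prints "309 samples, accuracy = 95.79%"
```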

Given the mining fields, the decision tree data mining algorithm automatically selects the sensitive parameters for conglomerate lithology identification and then builds a tree-shaped identification model with a nonlinear mapping, whose overall identification accuracy is relatively high. To give geophysicists a more intuitive and easier-to-apply identification model, the three conglomerate lithology-sensitive parameters selected by the algorithm are used as follows: first the product of acoustic transit time and natural gamma ray (AC*GR) is constructed, and then it is combined with undisturbed formation resistivity (Rt) to produce a conglomerate lithology crossplot chart. The variation shown in the crossplot chart is used to analyze how the reservoir changes, providing a geological basis for adjusting the development plan of the conglomerate reservoir and for avoiding perforation of strongly water-flooded layers.
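The crossplot construction can be sketched as follows: each sample contributes one point whose first coordinate is the constructed parameter AC*GR and whose second is Rt, grouped by lithology so that each class can be drawn as its own series. The dictionary keys and the choice of which quantity goes on which axis are assumptions; the patent does not specify the chart layout.

```python
from collections import defaultdict


def crossplot_series(samples):
    """Group (AC*GR, Rt) points by lithology label for plotting.

    samples -- list of dicts with keys "AC", "GR", "Rt", and "lithology"
               (hypothetical field names)
    """
    series = defaultdict(list)
    for s in samples:
        # Constructed parameter: product of acoustic transit time and gamma ray.
        series[s["lithology"]].append((s["AC"] * s["GR"], s["Rt"]))
    return dict(series)
```

The returned mapping can be handed to any plotting tool, one scatter series per lithology.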

Using the lithology identification model established in this example to evaluate the complex lithology of the conglomerate reservoirs in the Karamay Oilfield has produced relatively good results in geological application and offers a new approach to reservoir evaluation for complex oil and gas reservoirs. The conglomerate reservoir example shows that data mining can automatically find lithology-sensitive parameters in large volumes of unexplored data and build a nonlinear decision tree identification model by analyzing the parameter weights, while the process and results of data mining provide ideas and methods for building lithology charts from geophysical knowledge. The two approaches depend on and reinforce each other. Early on, data mining can reveal patterns in massive data and help geophysical research extract useful information; geophysicists, in turn, use their background knowledge to select important information and construct new parameters, which helps the data mining method settle on more reasonable and effective mining fields and yields more accurate, quantitative, and finely subdivided results for building geophysical models.

As shown in Fig. 3, this embodiment correspondingly also proposes a complex lithology identification system based on a decision tree data mining algorithm, the system comprising:

Before the lithology identification data set is formed, the lithology names of the study area are determined from two sources, the core sample descriptions of the sealed-coring wells and the naming of cast thin sections, taking into account the geological background and reservoir characteristics of the study area and the overall requirements of reservoir evaluation. Complex reservoirs are strongly heterogeneous and their lithology changes sharply, so a cast thin section taken from one part of a core cannot accurately reflect the true lithology of the formation, while core description lacks quantitative calibration of the different mineral components. Combining the two methods improves the accuracy of naming complex lithologies.

On the basis of the lithology naming, the lithology profile of the sealed-coring well section is established according to sequence stratigraphy, and the logging curve response values of the different lithology sections are read. The log-reading principles are as follows: ① for thick layers, the average value is read as the basic data for lithology identification; ② for thin interbeds such as mudstone, glutenite, volcanic rock, and metamorphic rock, the maximum or minimum of the logging curve is read as the basic data for lithology identification; ③ for transition zones between different lithologies, the average value is read, or the interval is set aside as an anomalous point to be handled during data preprocessing.

The established lithology identification sample data set is then preprocessed, which mainly includes filling in missing values, overall standardization, and eliminating anomalous points. The most important of these steps is the deletion of anomalous points. For reservoirs with complex lithology, data samples are anomalous for two main reasons. The first is the presence of thin interbeds: because of the limited vertical resolution of logging tools, the log response cannot reflect the true lithology of the formation. The second is that the log response in a lithology transition zone is affected by the surrounding rock on both sides, so the log values cannot accurately calibrate the true lithology.

本发明所述的基于决策树数据挖掘算法的复杂岩性识别方法，其核心技术是分裂属性值及剪枝策略的优选，算法通过对不同岩性与测井曲线响应值的综合分析确定每种测井曲线在岩性识别中的权重，进而优选出岩性敏感参数建立树状模型，模型的精度及泛化能力都比较好。相对于交会图和其他数据挖掘算法，决策树岩性识别方法可以提供高精度、非线性的识别模型，对于具有复杂岩性的储层评价，其识别精度高于普通的交会图法；另一方面，决策树算法属于“白盒”封存模型，可以清楚地了解到分类器是如何工作以及各种参数的相对重要性，对地球物理学的研究也有很好的指导作用。因此，该发明在复杂岩性的储层评价中具有重要的应用价值及较好的市场需求。The complex lithology identification method based on the decision tree data mining algorithm described in the present invention, its core technology is the optimization of splitting attribute values and pruning strategies, the algorithm determines each The weight of logging curves in lithology identification, and then optimize the lithology sensitive parameters to build a tree model, the accuracy and generalization ability of the model are relatively good. Compared with crossplot and other data mining algorithms, the decision tree lithology identification method can provide a high-precision, non-linear identification model. For the evaluation of reservoirs with complex lithology, its identification accuracy is higher than that of ordinary crossplot methods; another On the one hand, the decision tree algorithm belongs to the "white box" storage model, which can clearly understand how the classifier works and the relative importance of various parameters, and it also has a good guiding role in the research of geophysics. Therefore, the invention has important application value and good market demand in reservoir evaluation of complex lithology.

The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A complex lithology identification method based on a decision tree data mining algorithm, characterized in that the method comprises:

establishing a lithology profile of a sealed-coring well section, and forming a lithology identification data set according to the lithology profile;

performing data preprocessing on the lithology identification data set;

associating different logging parameters with the respective different lithologies in the lithology identification data set;

performing data mining on the lithologies in the preprocessed lithology identification data set to form a tree-shaped identification model, so that complex lithology is identified according to the lithologies and logging parameters corresponding to the tree-shaped identification model.

2. The complex lithology identification method based on a decision tree data mining algorithm according to claim 1, characterized in that, after the lithology profile of the sealed-coring well section is established, the log responses of the different lithology sections are read according to the log-reading principles for thick layers, thin interbeds and lithology transition zones respectively, and the correspondence between conglomerate lithology and logging parameters is established to form the lithology identification data set.

3. The complex lithology identification method based on a decision tree data mining algorithm according to claim 2, characterized in that the data preprocessing of the lithology identification data set comprises filling in missing values, overall standardization, and eliminating abnormal points.

4. The complex lithology identification method based on a decision tree data mining algorithm according to claim 3, characterized in that a decision tree algorithm is used to perform data mining on the lithologies in the preprocessed lithology identification data set.

5. The complex lithology identification method based on a decision tree data mining algorithm according to any one of claims 1 to 4, characterized in that the data mining of the lithologies in the preprocessed lithology identification data set comprises: calculating the weight of each logging parameter in lithology identification, obtaining the sensitive parameters for modeling, and finally building the tree-shaped identification model from top to bottom in a feature-based manner, wherein each branch of the tree represents the identification rule for one class of lithology, and the leaf nodes represent the logging parameters constituting the identification rule and the numerical interval of each logging parameter.

6. A complex lithology identification system based on a decision tree data mining algorithm, characterized in that the system comprises:

a data set establishing module, configured to establish a lithology profile of a sealed-coring well section and form a lithology identification data set according to the lithology profile;

a preprocessing module, configured to perform data preprocessing on the lithology identification data set;

a marking module, configured to associate different logging parameters with the respective different lithologies in the lithology identification data set;

a data mining module, configured to perform data mining on the lithologies in the preprocessed lithology identification data set to form a tree-shaped identification model, so that complex lithology is identified according to the lithologies and logging parameters corresponding to the tree-shaped identification model.

7. The complex lithology identification system based on a decision tree data mining algorithm according to claim 6, characterized in that the data set establishing module is further configured, after the lithology profile of the sealed-coring well section is established, to read the log responses of the different lithology sections according to the log-reading principles for thick layers, thin interbeds and lithology transition zones respectively, and to establish the correspondence between conglomerate lithology and logging parameters to form the lithology identification data set.

8. The complex lithology identification system based on a decision tree data mining algorithm according to claim 7, characterized in that the preprocessing module is specifically configured to preprocess the lithology identification data set by filling in missing values, performing overall standardization, and eliminating abnormal points.

9. The complex lithology identification system based on a decision tree data mining algorithm according to claim 8, characterized in that the data mining module is specifically configured to use a decision tree algorithm to perform data mining on the lithologies in the preprocessed lithology identification data set.

10. The complex lithology identification system based on a decision tree data mining algorithm according to any one of claims 6 to 9, characterized in that the data mining module is further configured to calculate the weight of each logging parameter in lithology identification, obtain the sensitive parameters for modeling, and finally build the tree-shaped identification model from top to bottom in a feature-based manner, wherein each branch of the tree represents the identification rule for one class of lithology, and the leaf nodes represent the logging parameters constituting the identification rule and the numerical interval of each logging parameter.