CN111090680A

CN111090680A - Shared logging data mining method

Info

Publication number: CN111090680A
Application number: CN201911086499.1A
Authority: CN
Inventors: 邓志勇; 丁磊; 胡向阳; 张恒荣; 刘土亮
Original assignee: China National Offshore Oil Corp CNOOC; CNOOC China Ltd Zhanjiang Branch
Current assignee: China National Offshore Oil Corp CNOOC; CNOOC China Ltd Zhanjiang Branch
Priority date: 2019-11-08
Filing date: 2019-11-08
Publication date: 2020-05-01

Abstract

The invention discloses a shared logging data mining method comprising the following steps: inputting learning samples formed by all logging curves related to a target learning task together with the target curves; converting the learning-sample data into a two-dimensional matrix and normalizing it; obtaining valuable learning samples through intelligent curve selection; obtaining data mining methods through intelligent algorithm selection; optimizing the data mining model through intelligent parameter selection; and storing the learned knowledge in a petrophysical data mining knowledge base for sharing and recall. The beneficial effects of the invention are that the method selects curves, algorithms and parameters quickly and intelligently, supports model sharing, and quickly and accurately extracts the combination of logging curves, data mining methods and method parameters required to complete the target learning task, which helps to carry out logging data mining tasks such as logging curve reconstruction and petrophysical facies classification accurately.

Description

Shared logging data mining method

Technical Field

The invention relates to the technical field of geophysical logging data mining, and in particular to a shared logging data mining method.

Background

In geophysical logging data mining, the construction of learning samples, the selection of the learning method and the setting of method parameters are the key factors affecting the mining result. At present, the industry still handles all three manually. In particular, the learning method and its parameters are chosen according to the quality of previous learning results and adjusted by trial and error based on the interpreter's understanding; this is inefficient, and it is difficult to obtain the optimal combination of method and parameters.

Disclosure of Invention

The invention aims to provide a shared logging data mining method that selects curves, algorithms and parameters quickly and intelligently, supports model sharing, and quickly and accurately extracts the combination of logging curves, data mining methods and method parameters required to complete a target learning task.

To achieve this purpose, the invention adopts the following technical scheme:

A shared logging data mining method comprises the following steps:

step one: inputting learning samples consisting of all logging curves related to a target learning task and the target curves, filling missing values in the learning samples with the mean value, deleting repeated depths from the logging curve values, converting qualitative text data into numeric values and arranging the learning samples as a two-dimensional matrix, and finally normalizing (standardizing) the two-dimensional matrix;

step two: intelligent curve selection: taking each curve in the learning sample and calculating its divergence and its correlation, where the divergence of a curve is calculated by the following formula:

the correlation of the curves is calculated by the formula:

combining the formulas to obtain a value score for each curve, and selecting the several curves with the highest value scores as valuable learning curves to form a new learning sample;

step three: intelligent algorithm selection: retrieving the empirical parameters of various methods from the petrophysical data mining knowledge base, splitting the learning samples in a given proportion, obtaining the overall accuracy of each data mining method by m rounds of cross-validation, and selecting the n methods with the highest accuracy as the data mining methods;

step four: intelligent parameter selection: allocating s threads according to the number of data mining methods so that the parameters of each method are selected separately, and performing automatic hyperparameter optimization: specifying the variation ranges of all parameters and iterating, where in iteration i the ith parameter combination is taken and the accuracy of the method is calculated; if this accuracy is greater than the maximum accuracy of the previous i-1 combinations, the ith combination becomes the optimal parameter combination, otherwise the optimum is left unchanged; the next iteration is then performed until all possible parameter combinations have been traversed, which determines the optimal parameter combination of each method; this combination is used for the final data mining to obtain the data mining model;

step five: storing, sharing and recalling learning knowledge: storing the learning knowledge related to data mining in the petrophysical data mining knowledge base and recalling it from the knowledge base when required, determining the input curves of the prediction sample by curve redirection, and obtaining the prediction curve with the data mining model.
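Step five does not specify how the knowledge base is stored or how curve redirection works. The following is a minimal sketch under two assumptions that are not in the text: the knowledge base is a JSON file keyed by learning task, and curve redirection means mapping each curve name the stored model expects onto a curve actually present in the prediction well, using a table of alternative mnemonics. The function names and the alias table are hypothetical.

```python
import json
from pathlib import Path


def save_knowledge(kb_path, task, entry):
    """Store (or overwrite) the learning knowledge for one task in a JSON file.

    Assumption: the knowledge base is a single JSON object keyed by task name.
    """
    path = Path(kb_path)
    kb = json.loads(path.read_text()) if path.exists() else {}
    kb[task] = entry
    path.write_text(json.dumps(kb, indent=2))


def load_knowledge(kb_path, task):
    """Recall the stored learning knowledge for one task."""
    return json.loads(Path(kb_path).read_text())[task]


def redirect_curves(available, required, aliases):
    """Map each curve name the model expects onto a curve present in the new well.

    `aliases` lists alternative mnemonics per required curve (hypothetical table).
    The exact name is tried first, then each alias in order.
    """
    mapping = {}
    for name in required:
        candidates = [name] + aliases.get(name, [])
        match = next((c for c in candidates if c in available), None)
        if match is None:
            raise KeyError(f"no input curve found for {name!r}")
        mapping[name] = match
    return mapping
```

Only the learned configuration (selected curves, method and parameters) is stored in this sketch; sharing a trained model object itself would need a separate serialization format.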

Preferably, in step one, the logging curves include, but are not limited to, a natural gamma curve, a spontaneous potential curve, a neutron density curve and a resistivity curve, and the target curve is the curve to be predicted, including, but not limited to, a porosity curve and a permeability curve.
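For curves like those listed above, steps one and two can be sketched in plain Python. The divergence and correlation formulas are missing from this text, so the sketch rests on assumptions that are not in the source: divergence is taken as the variance of a curve, correlation as the absolute Pearson coefficient between a curve and the target curve, and the value score as a weighted sum of the two (variance scaled by its maximum). Min-max scaling stands in for the normalization, integer codes for the text conversion, and missing values are represented as `None`. The function names, the `DEPTH` key and the `weight` parameter are hypothetical.

```python
import statistics


def preprocess(rows, numeric_cols, text_cols, depth_key="DEPTH"):
    """Step one: mean-fill, drop repeated depths, encode text, normalize."""
    rows = [dict(r) for r in rows]  # work on copies; leave the input intact
    for col in numeric_cols:  # fill missing values (None) with the column mean
        mean = statistics.fmean([r[col] for r in rows if r[col] is not None])
        for r in rows:
            if r[col] is None:
                r[col] = mean
    seen, unique = set(), []
    for r in rows:  # keep only the first sample recorded at each depth
        if r[depth_key] not in seen:
            seen.add(r[depth_key])
            unique.append(r)
    for col in text_cols:  # assumed encoding: text -> integer code (sorted order)
        codes = {v: i for i, v in enumerate(sorted({r[col] for r in unique}))}
        for r in unique:
            r[col] = codes[r[col]]
    cols = numeric_cols + text_cols
    matrix = [[float(r[c]) for c in cols] for r in unique]  # rows = depths
    for j in range(len(cols)):  # assumed normalization: min-max to [0, 1]
        lo = min(row[j] for row in matrix)
        hi = max(row[j] for row in matrix)
        for row in matrix:
            row[j] = (row[j] - lo) / (hi - lo) if hi > lo else 0.0
    return matrix


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 when either series is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5 if sxx > 0 and syy > 0 else 0.0


def select_curves(curves, target, k, weight=0.5):
    """Step two (assumed scoring): weight * scaled variance + (1 - weight) * |r|."""
    variances = {n: statistics.pvariance(v) for n, v in curves.items()}
    max_var = max(variances.values()) or 1.0  # avoid dividing by zero
    scores = {
        n: weight * variances[n] / max_var + (1 - weight) * abs(pearson(v, target))
        for n, v in curves.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k], scores
```

A constant curve scores zero on both terms and is never selected ahead of an informative one, which is the behaviour a divergence filter is meant to provide.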

Preferably, in step three, the methods retrieved from the petrophysical data mining knowledge base include, but are not limited to, support vector machines, Bayesian methods and gradient boosting trees.
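The algorithm ranking of step three can be sketched as follows. The support vector machine, Bayesian and gradient boosting methods named above are replaced by two toy classifiers so that the example runs with the standard library alone, and "m rounds of cross-validation" is interpreted here, as an assumption, as m repeated random splits in the given proportion, with every method scored on the same splits. The class and function names are hypothetical.

```python
import random
import statistics


class MajorityClass:
    """Toy stand-in: always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label = statistics.mode(y)
        return self

    def predict(self, X):
        return [self.label for _ in X]


class NearestCentroid:
    """Toy stand-in: assigns each sample to the class with the closest mean."""

    def fit(self, X, y):
        groups = {}
        for x, label in zip(X, y):
            groups.setdefault(label, []).append(x)
        self.centroids = {
            label: [statistics.fmean(col) for col in zip(*xs)]
            for label, xs in groups.items()
        }
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids, key=lambda c: sq_dist(x, self.centroids[c])) for x in X]


def rank_methods(methods, X, y, n, m=10, train_ratio=0.7, seed=0):
    """Score each method on the same m random train/test splits; return the top n."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    cut = round(len(X) * train_ratio)
    splits = []
    for _ in range(m):  # build the splits once so all methods see the same data
        rng.shuffle(idx)
        splits.append((idx[:cut], idx[cut:]))
    scores = {}
    for name, factory in methods.items():
        accs = []
        for train, test in splits:
            model = factory().fit([X[i] for i in train], [y[i] for i in train])
            pred = model.predict([X[i] for i in test])
            accs.append(sum(p == y[i] for p, i in zip(pred, test)) / len(test))
        scores[name] = statistics.fmean(accs)  # overall accuracy over m rounds
    return sorted(scores, key=scores.get, reverse=True)[:n], scores
```

Scoring every candidate on identical splits keeps the comparison between methods fair; with real learners, each factory would build a model initialized from the empirical parameters retrieved from the knowledge base.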

The beneficial effects of the invention are as follows: the method selects curves, algorithms and parameters quickly and intelligently and supports model sharing; by using multithreading, cross-validation, hyperparameter search and similar techniques, it quickly and accurately extracts the combination of logging curves, data mining methods and method parameters required to complete a target learning task, which helps to carry out logging data mining tasks such as logging curve reconstruction and petrophysical classification accurately.
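The multithreaded hyperparameter search of step four can be sketched as below. `evaluate` is a hypothetical callable that returns the accuracy of a method for one parameter combination (for example its cross-validated accuracy). The update rule follows the text: a combination replaces the current optimum only if its accuracy is strictly greater than the best accuracy of all earlier combinations.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


def grid_search(evaluate, grid):
    """Traverse every parameter combination, keeping the most accurate one.

    Combination i replaces the current optimum only when its accuracy is
    strictly greater than the maximum accuracy of combinations 1..i-1.
    """
    names = list(grid)
    best_params, best_acc = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        acc = evaluate(params)
        if acc > best_acc:
            best_params, best_acc = params, acc
    return best_params, best_acc


def tune_all(methods, s):
    """Tune every method concurrently on a pool of s worker threads.

    `methods` maps a method name to an (evaluate, grid) pair.
    """
    with ThreadPoolExecutor(max_workers=s) as pool:
        futures = {
            name: pool.submit(grid_search, evaluate, grid)
            for name, (evaluate, grid) in methods.items()
        }
        return {name: future.result() for name, future in futures.items()}
```

If s is smaller than the number of methods, the remaining searches wait for a free thread. CPython threads share one interpreter lock, so pure-Python scoring functions will not actually run in parallel; real concurrency needs scoring code that releases the lock in native code, or a `ProcessPoolExecutor` with picklable functions.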

Detailed Description

The technical solutions in the embodiments of the present invention will be described below.

A shared logging data mining method comprises the following steps:

step one: inputting learning samples consisting of all logging curves related to the target learning task and the target curve, where the logging curves are a natural gamma curve, a spontaneous potential curve, a neutron density curve and a resistivity curve, and the target curve is a porosity curve; then filling missing values in the learning samples with the mean value, deleting repeated depths from the logging curve values, converting qualitative text data into numeric values and arranging the samples as a two-dimensional matrix, and finally normalizing (standardizing) the two-dimensional matrix;

the correlation of the curves is calculated by the formula:

step three: intelligent algorithm selection: retrieving the empirical parameters of a support vector machine, a Bayesian method and a gradient boosting tree from the petrophysical data mining knowledge base, splitting the learning samples in the proportion 0.7/0.3, obtaining the overall accuracy of each data mining method by 10 rounds of cross-validation, and selecting the single method with the highest accuracy as the data mining method;

step four: intelligent parameter selection: allocating 1 thread, matching the number of data mining methods, to select the parameters of the method, and performing automatic hyperparameter optimization over the following variation ranges: (kernel: 'linear', 'rbf', 'sigmoid'), (C: 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 2, 4), (gamma: 0.125, 0.25, 0.5, 1, 2, 4, 6), (class_weight: 'balanced', None) and (decision_function_shape: 'ovo', 'ovr'); iterating, where in iteration i the ith parameter combination is taken and the accuracy of the method is calculated; if this accuracy is higher than the maximum accuracy of the previous i-1 combinations, the ith combination becomes the optimal parameter combination, otherwise the optimum is left unchanged, and the next iteration is performed until all possible parameter combinations have been traversed; the optimal parameter combination finally determined, (kernel: 'linear'), (C: 4), (gamma: 0.125), (class_weight: None) and (decision_function_shape: 'ovo'), is used for the final data mining to obtain the data mining model;
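Writing the embodiment's parameter ranges as a Python grid shows how large the exhaustive traversal is. The parameter names match the keyword arguments of scikit-learn's `SVC`, which suggests, but the text does not state, that this toolkit was used; the decision_function_shape values are taken as 'ovo' and 'ovr', the two options `SVC` accepts. The sketch only enumerates the combinations; the 10-round scoring count is an assumption carried over from step three.

```python
import itertools
import math

# Parameter ranges from the embodiment, keyed by scikit-learn SVC argument names.
SVM_GRID = {
    "kernel": ["linear", "rbf", "sigmoid"],
    "C": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 2, 4],
    "gamma": [0.125, 0.25, 0.5, 1, 2, 4, 6],
    "class_weight": ["balanced", None],
    "decision_function_shape": ["ovo", "ovr"],
}

# Size of the exhaustive search: 3 * 12 * 7 * 2 * 2 combinations.
n_combinations = math.prod(len(values) for values in SVM_GRID.values())

# Model fits needed if every combination were scored with 10 cross-validation rounds.
n_fits = n_combinations * 10

# All combinations in the order an exhaustive search would visit them.
combinations = [
    dict(zip(SVM_GRID, values)) for values in itertools.product(*SVM_GRID.values())
]

print(n_combinations, n_fits)  # 1008 10080
```

The reported optimum is one of these 1008 combinations. With scikit-learn installed, the same dictionary could be passed as `param_grid` to `sklearn.model_selection.GridSearchCV` with an `SVC` estimator and `cv=10`; that is an assumed equivalent, not the procedure described in the text.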

The method selects curves, algorithms and parameters quickly and intelligently and supports model sharing; by using multithreading, cross-validation, hyperparameter search and similar techniques, it quickly and accurately extracts the combination of logging curves, data mining methods and method parameters required to complete a target learning task, which helps to carry out logging data mining tasks such as logging curve reconstruction and petrophysical classification accurately.

Claims

1. A shared logging data mining method, characterized by comprising the following steps:

the correlation of the curves is calculated by the formula:

2. The method of claim 1, wherein in step one, the well logging curves include, but are not limited to, a natural gamma curve, a spontaneous potential curve, a neutron density curve and a resistivity curve, and the target curve is a curve to be predicted, including, but not limited to, a porosity curve and a permeability curve.

3. The method of claim 1, wherein in step three, the methods retrieved from the petrophysical data mining knowledge base include, but are not limited to, support vector machines, Bayesian methods and gradient boosting trees.