CN111090680A - Shared logging data mining method - Google Patents
- Publication number
- CN111090680A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Fuzzy Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a shared logging data mining method comprising the following steps: inputting a learning sample formed by all logging curves related to a target learning task together with the target curves; converting the learning sample data into a two-dimensional matrix and normalizing it; obtaining valuable learning samples through intelligent curve selection; obtaining a data mining method through intelligent algorithm selection; optimizing the data mining model through intelligent parameter selection; and storing the learned knowledge in a petrophysical data mining knowledge base, from which it can be shared and called. The beneficial effects of the invention are that curves, algorithms and parameters can be selected quickly and intelligently, model sharing is supported, and the combination of logging curves, data mining method and method parameters required to complete the target learning task is extracted quickly and accurately, which facilitates the accurate implementation of logging data mining tasks such as logging curve reconstruction and petrophysical facies classification.
Description
Technical Field
The invention relates to the technical field of geophysical logging data mining, in particular to a shared logging data mining method.
Background
In geophysical logging data mining, the construction of the learning sample, the selection of the learning method and the setting of the method parameters are the key factors that determine the data mining result. At present, the industry still handles all three purely by hand. In particular, the learning method and its parameters are adjusted by trial and error, based on the quality of the previous learning result and the judgment of the interpreter. This approach is inefficient, and the optimal method and parameter combination are difficult to obtain.
Disclosure of Invention
The invention aims to provide a shared logging data mining method, which can quickly and intelligently select curves, algorithms and parameters, supports model sharing, and can quickly and accurately extract the combination of logging curves, data mining methods and method parameters required by completing a target learning task.
To achieve this purpose, the invention adopts the following technical solution:
a shared logging data mining method comprises the following steps:
step one: inputting a learning sample consisting of all logging curves related to a target learning task and the target curves, filling missing values in the learning sample with the mean value, deleting logging curve values at repeated depths, converting the data, including qualitative text data, into a two-dimensional matrix, and finally normalizing the two-dimensional matrix;
step two: intelligent curve selection: taking each curve in the learning sample and calculating its divergence and its correlation, wherein the divergence of a curve is calculated by the formula:
the correlation of a curve is calculated by the formula:
combining the two formulas to obtain a value score for each curve, and selecting the several curves with the highest value scores as valuable learning curves to form a new learning sample;
step three: intelligent algorithm selection: calling the empirical parameters of the various methods from the petrophysical data mining knowledge base, dividing the learning sample in proportion, obtaining the overall accuracy of each data mining method from m rounds of cross validation, and taking the n methods with the highest accuracy as the data mining methods;
step four: intelligent parameter selection: allocating s threads according to the number of data mining methods to select the parameters of each method, and optimizing the hyper-parameters automatically as follows: specifying the variation range of every parameter; iterating over the parameter combinations, taking the i-th combination and calculating the accuracy of the method; if the accuracy of the i-th combination is greater than the maximum accuracy of the previous i-1 combinations, updating the optimal parameter combination to the i-th combination, and otherwise leaving it unchanged; continuing with the next iteration until all possible parameter combinations have been traversed; the optimal parameter combination finally determined for each method is used for the final data mining to obtain a data mining model;
step five: storing, sharing and calling learning knowledge: storing the learning knowledge related to data mining in the petrophysical data mining knowledge base, calling it from the knowledge base when it is needed, determining the input curves of the prediction sample by curve redirection, and obtaining the predicted curve with the data mining model.
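Step one can be sketched in plain Python as follows. This is a minimal illustration rather than the patented implementation: the `preprocess` function, the `DEPTH` key, the sample curve names, the integer coding of text values and the use of min-max scaling are all assumptions made for the example.

```python
from statistics import mean

def preprocess(rows, curve_names):
    """Turn raw log rows into a normalized 2-D matrix (hypothetical helper).

    rows: list of dicts holding 'DEPTH' plus one value per curve;
    None marks a missing value, strings mark qualitative text data.
    """
    # Delete values at repeated depths (keep the first occurrence).
    seen, unique = set(), []
    for row in rows:
        if row["DEPTH"] not in seen:
            seen.add(row["DEPTH"])
            unique.append(dict(row))

    for name in curve_names:
        # Assumption: qualitative text is encoded as integer codes.
        labels = sorted({r[name] for r in unique if isinstance(r[name], str)})
        codes = {label: float(i) for i, label in enumerate(labels)}
        for r in unique:
            if isinstance(r[name], str):
                r[name] = codes[r[name]]
        # Fill missing values with the mean of the values that are present.
        fill = mean(r[name] for r in unique if r[name] is not None)
        for r in unique:
            if r[name] is None:
                r[name] = fill

    # Build the two-dimensional matrix: one row per depth, one column per curve.
    matrix = [[r[name] for name in curve_names] for r in unique]

    # Assumption: normalization is min-max scaling of each column to [0, 1].
    for j in range(len(curve_names)):
        column = [row[j] for row in matrix]
        low, high = min(column), max(column)
        span = (high - low) or 1.0  # avoid dividing by zero on constant curves
        for row in matrix:
            row[j] = (row[j] - low) / span
    return matrix

rows = [
    {"DEPTH": 1000.0, "GR": 40.0, "LITH": "sand"},
    {"DEPTH": 1000.0, "GR": 99.0, "LITH": "shale"},  # repeated depth, dropped
    {"DEPTH": 1000.5, "GR": None, "LITH": "shale"},  # missing GR, mean-filled
    {"DEPTH": 1001.0, "GR": 80.0, "LITH": "sand"},
]
matrix = preprocess(rows, ["GR", "LITH"])
```

A curve with no valid values at all would make `mean` raise an error; a production version would need to handle that case explicitly.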
Preferably, in step one, the logging curves include, but are not limited to, a natural gamma curve, a natural potential curve, a neutron density curve and a resistivity curve, and the target curve is the curve to be predicted, including, but not limited to, a porosity curve and a permeability curve.
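The curve selection of step two scores each input curve against the target curve. The divergence and correlation formulas are not reproduced in this text, so the sketch below substitutes two common choices: population variance for divergence and the absolute Pearson coefficient for correlation, combined by a weighted sum. These choices, the `weight` parameter, the function names and the sample values are assumptions for illustration only.

```python
from math import sqrt
from statistics import mean, pvariance

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def select_curves(curves, target, k, weight=0.5):
    """Rank curves by an assumed value score and keep the top k.

    score = weight * variance + (1 - weight) * |Pearson correlation with target|
    (a stand-in for the formulas that are missing from the source text).
    """
    scores = {
        name: weight * pvariance(values)
        + (1 - weight) * abs(pearson(values, target))
        for name, values in curves.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores

# Hypothetical normalized curves: natural gamma, natural potential, resistivity.
curves = {
    "GR": [0.0, 0.25, 0.5, 0.75, 1.0],
    "SP": [0.5, 0.5, 0.5, 0.5, 0.5],
    "RT": [0.0, 1.0, 0.0, 1.0, 0.0],
}
porosity = [1.0, 0.75, 0.5, 0.25, 0.0]  # hypothetical target curve
selected, scores = select_curves(curves, porosity, k=2)
```

In this toy data the constant `SP` curve carries no information and scores zero, so it is the one dropped from the new learning sample.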
Preferably, in step three, the methods called from the petrophysical data mining knowledge base include, but are not limited to, support vector machines, Bayes and gradient boosting trees.
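The algorithm selection of step three amounts to evaluating every candidate method on repeated splits of the learning sample and keeping the n most accurate. The sketch below is one possible reading, in which "m rounds of cross validation" is taken to mean m random train/test splits at a fixed proportion. To stay dependency-free, two toy classifiers (a majority-class baseline and a nearest-centroid rule) stand in for the support vector machine, Bayes and gradient boosting tree methods; all names and data here are hypothetical.

```python
import random
from collections import Counter
from statistics import mean

def fit_majority(X, y):
    """Toy stand-in method: always predicts the most common training label."""
    label = Counter(y).most_common(1)[0][0]
    return lambda row: label

def fit_nearest_centroid(X, y):
    """Toy stand-in method: predicts the class with the closest mean vector."""
    centroids = {}
    for label in set(y):
        members = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*members)]

    def predict(row):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
        )
    return predict

def rank_methods(methods, X, y, m=10, n=1, train_ratio=0.7, seed=0):
    """Average each method's accuracy over m random splits; return the top n."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    totals = {name: 0.0 for name in methods}
    for _ in range(m):
        rng.shuffle(indices)
        cut = round(train_ratio * len(indices))
        train, test = indices[:cut], indices[cut:]
        for name, fit in methods.items():
            predict = fit([X[i] for i in train], [y[i] for i in train])
            hits = sum(predict(X[i]) == y[i] for i in test)
            totals[name] += hits / len(test)
    accuracy = {name: total / m for name, total in totals.items()}
    best = sorted(accuracy, key=accuracy.get, reverse=True)[:n]
    return best, accuracy

# Hypothetical two-class sample with well separated clusters.
X = [[0.01 * i, 0.0] for i in range(10)] + [[1.0 + 0.01 * i, 1.0] for i in range(10)]
y = [0] * 10 + [1] * 10
methods = {"majority": fit_majority, "nearest_centroid": fit_nearest_centroid}
best, accuracy = rank_methods(methods, X, y, m=10, n=1)
```

Any object that maps a training set to a prediction function can be passed in `methods`, so real implementations of the methods named above could be plugged in the same way.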
The invention has the following beneficial effects: curves, algorithms and parameters are selected quickly and intelligently, and model sharing is supported; by using multithreading, cross validation and hyper-parameter search, the combination of logging curves, data mining method and method parameters required to complete a target learning task is extracted quickly and accurately, which facilitates the accurate implementation of logging data mining tasks such as logging curve reconstruction and petrophysical facies classification.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below.
A shared logging data mining method comprises the following steps:
step one: inputting a learning sample consisting of all logging curves related to a target learning task and the target curve, wherein the logging curves comprise a natural gamma curve, a natural potential curve, a neutron density curve and a resistivity curve, and the target curve is a porosity curve; then filling missing values in the learning sample with the mean value, deleting logging curve values at repeated depths, converting the data, including qualitative text data, into a two-dimensional matrix, and finally normalizing the two-dimensional matrix;
step two: intelligent curve selection: taking each curve in the learning sample and calculating its divergence and its correlation, wherein the divergence of a curve is calculated by the formula:
the correlation of a curve is calculated by the formula:
combining the two formulas to obtain a value score for each curve, and selecting the several curves with the highest value scores as valuable learning curves to form a new learning sample;
step three: intelligent algorithm selection: calling the empirical parameters of the support vector machine, Bayes and gradient boosting tree methods from the petrophysical data mining knowledge base, dividing the learning sample in a 0.7/0.3 proportion, obtaining the overall accuracy of each data mining method from 10 rounds of cross validation, and taking the method with the highest accuracy as the data mining method;
step four: intelligent parameter selection: allocating 1 thread according to the number of data mining methods to select the parameters of the method, and optimizing the hyper-parameters automatically as follows: specifying the variation range of every parameter, here (kernel: 'linear', 'rbf', 'sigmoid'), (C: 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 2, 4), (gamma: 0.125, 0.25, 0.5, 1, 2, 4, 6), (class_weight: 'balanced', None) and (decision_function_shape: 'ovo', 'ovr'); iterating over the parameter combinations, taking the i-th combination and calculating the accuracy of the method; if the accuracy of the i-th combination is greater than the maximum accuracy of the previous i-1 combinations, updating the optimal parameter combination to the i-th combination, and otherwise leaving it unchanged; continuing with the next iteration until all possible parameter combinations have been traversed; the optimal parameter combination finally determined, namely (kernel: 'linear'), (C: 4), (gamma: 0.125), (class_weight: None) and (decision_function_shape: 'ovo'), is used for the final data mining to obtain a data mining model;
step five: storing, sharing and calling learning knowledge: storing the learning knowledge related to data mining in the petrophysical data mining knowledge base, calling it from the knowledge base when it is needed, determining the input curves of the prediction sample by curve redirection, and obtaining the predicted curve with the data mining model.
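The exhaustive traversal of step four, with a running best that is replaced only when a combination is strictly more accurate, can be sketched as follows. The grid reproduces the parameter ranges listed in this embodiment. The functions `search_best`, `search_all` and `toy_accuracy` are hypothetical names, and `toy_accuracy` is a contrived scoring function that merely stands in for a real cross-validated accuracy so that the example runs without any modelling library.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Parameter ranges taken from the embodiment described above.
SVM_GRID = {
    "kernel": ["linear", "rbf", "sigmoid"],
    "C": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 2, 4],
    "gamma": [0.125, 0.25, 0.5, 1, 2, 4, 6],
    "class_weight": ["balanced", None],
    "decision_function_shape": ["ovo", "ovr"],
}

def search_best(grid, evaluate):
    """Traverse every parameter combination and keep the most accurate one."""
    names = list(grid)
    best_params, best_accuracy = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        accuracy = evaluate(params)
        # Update only if the i-th combination beats all previous ones.
        if accuracy > best_accuracy:
            best_params, best_accuracy = params, accuracy
    return best_params, best_accuracy

def search_all(grids, evaluators, threads):
    """Run one search per data mining method on a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = {
            name: pool.submit(search_best, grids[name], evaluators[name])
            for name in grids
        }
        return {name: future.result() for name, future in futures.items()}

def toy_accuracy(params):
    """Contrived score (assumption): NOT a real model evaluation."""
    return (
        0.5
        + 0.1 * (params["kernel"] == "linear")
        + 0.05 * params["C"]
        - 0.01 * params["gamma"]
    )

results = search_all({"svm": SVM_GRID}, {"svm": toy_accuracy}, threads=1)
best_params, best_accuracy = results["svm"]
n_combinations = sum(1 for _ in product(*SVM_GRID.values()))
```

The grid above contains 3 × 12 × 7 × 2 × 2 = 1008 combinations, so every extra parameter value multiplies the cost of the traversal; this is why the method assigns the search for each data mining method to its own thread.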
The method selects curves, algorithms and parameters quickly and intelligently and supports model sharing; by using multithreading, cross validation and hyper-parameter search, the combination of logging curves, data mining method and method parameters required to complete a target learning task is extracted quickly and accurately, which facilitates the accurate implementation of logging data mining tasks such as logging curve reconstruction and petrophysical facies classification.
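Step five, storing learned knowledge and calling it for a new prediction, could look like the sketch below. The text does not specify how the knowledge base is stored or what curve redirection involves, so two assumptions are made here: the knowledge base is a JSON file keyed by learning task, and curve redirection means mapping each curve name the model expects onto an equivalent curve name available in the prediction sample. All function names, keys and curve mnemonics are hypothetical.

```python
import json
import tempfile
from pathlib import Path

def save_knowledge(path, task, curves, method, params, accuracy):
    """Add or overwrite the entry for one learning task in the knowledge base."""
    knowledge = json.loads(path.read_text()) if path.exists() else {}
    knowledge[task] = {
        "curves": curves,
        "method": method,
        "params": params,
        "accuracy": accuracy,
    }
    path.write_text(json.dumps(knowledge, indent=2))

def load_knowledge(path, task):
    """Call the stored knowledge for one learning task."""
    return json.loads(path.read_text())[task]

def redirect_curves(required, available, aliases):
    """Assumed meaning of curve redirection: map each expected curve name
    to a name that exists in the prediction sample, trying aliases in order."""
    mapping = {}
    for name in required:
        candidates = [name] + aliases.get(name, [])
        match = next((c for c in candidates if c in available), None)
        if match is None:
            raise KeyError(f"no input curve available for {name!r}")
        mapping[name] = match
    return mapping

with tempfile.TemporaryDirectory() as tmp:
    kb_path = Path(tmp) / "knowledge_base.json"
    save_knowledge(
        kb_path,
        task="porosity",
        curves=["GR", "RT"],
        method="svm",
        params={"kernel": "linear", "C": 4, "gamma": 0.125},
        accuracy=0.9,  # placeholder value, not a reported result
    )
    entry = load_knowledge(kb_path, "porosity")

# The prediction well names its resistivity curve differently.
mapping = redirect_curves(
    entry["curves"],
    available={"DEPTH", "GR", "RESD"},
    aliases={"RT": ["RESD", "ILD"]},
)
```

Keeping the selected curves, method and parameters together in one entry is what lets a later task reuse the model without repeating steps two to four.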
Claims (3)
1. A shared logging data mining method is characterized by comprising the following steps:
step one: inputting a learning sample consisting of all logging curves related to a target learning task and the target curves, filling missing values in the learning sample with the mean value, deleting logging curve values at repeated depths, converting the data, including qualitative text data, into a two-dimensional matrix, and finally normalizing the two-dimensional matrix;
step two: intelligent curve selection: taking each curve in the learning sample and calculating its divergence and its correlation, wherein the divergence of a curve is calculated by the formula:
the correlation of a curve is calculated by the formula:
combining the two formulas to obtain a value score for each curve, and selecting the several curves with the highest value scores as valuable learning curves to form a new learning sample;
step three: intelligent algorithm selection: calling the empirical parameters of the various methods from the petrophysical data mining knowledge base, dividing the learning sample in proportion, obtaining the overall accuracy of each data mining method from m rounds of cross validation, and taking the n methods with the highest accuracy as the data mining methods;
step four: intelligent parameter selection: allocating s threads according to the number of data mining methods to select the parameters of each method, and optimizing the hyper-parameters automatically as follows: specifying the variation range of every parameter; iterating over the parameter combinations, taking the i-th combination and calculating the accuracy of the method; if the accuracy of the i-th combination is greater than the maximum accuracy of the previous i-1 combinations, updating the optimal parameter combination to the i-th combination, and otherwise leaving it unchanged; continuing with the next iteration until all possible parameter combinations have been traversed; the optimal parameter combination finally determined for each method is used for the final data mining to obtain a data mining model;
step five: storing, sharing and calling learning knowledge: storing the learning knowledge related to data mining in the petrophysical data mining knowledge base, calling it from the knowledge base when it is needed, determining the input curves of the prediction sample by curve redirection, and obtaining the predicted curve with the data mining model.
2. The method of claim 1, wherein in step one the logging curves include, but are not limited to, a natural gamma curve, a natural potential curve, a neutron density curve and a resistivity curve, and the target curve is the curve to be predicted, including, but not limited to, a porosity curve and a permeability curve.
3. The method of claim 1, wherein in step three the methods called from the petrophysical data mining knowledge base include, but are not limited to, support vector machines, Bayes and gradient boosting trees.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911086499.1A CN111090680A (en) | 2019-11-08 | 2019-11-08 | Shared logging data mining method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911086499.1A CN111090680A (en) | 2019-11-08 | 2019-11-08 | Shared logging data mining method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111090680A (en) | 2020-05-01 |
Family
ID=70393121
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911086499.1A Pending CN111090680A (en) | 2019-11-08 | 2019-11-08 | Shared logging data mining method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111090680A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108268460A (en) * | 2016-12-30 | 2018-07-10 | 广东精点数据科技股份有限公司 | A kind of method for automatically selecting optimal models based on big data |
CN108763460A (en) * | 2018-05-28 | 2018-11-06 | 成都优易数据有限公司 | A kind of machine learning method and system based on SQL |
CN110223156A (en) * | 2019-05-16 | 2019-09-10 | 杭州排列科技有限公司 | Automation model evolutionary algorithm based on gradually optimal feature selection |
Non-Patent Citations (3)
Title |
---|
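Li Hongqi et al.: "Well logging evaluation method based on data mining technology", Well Logging Technology * |
Li Hongqi et al.: "Research on data mining methods for well logging evaluation of complex reservoirs", Acta Petrolei Sinica * |
Fan Guangling et al.: "Research on general modeling for data mining model selection", Science Technology and Engineering * |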
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200501 |