CN111090680A - Shared logging data mining method - Google Patents

Info

Publication number
CN111090680A
Authority
CN
China
Prior art keywords
data mining
curve
curves
learning
logging
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911086499.1A
Other languages
Chinese (zh)
Inventor
邓志勇
丁磊
胡向阳
张恒荣
刘土亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China National Offshore Oil Corp CNOOC
CNOOC China Ltd Zhanjiang Branch
Original Assignee
China National Offshore Oil Corp CNOOC
CNOOC China Ltd Zhanjiang Branch
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-11-08
Filing date
2019-11-08
Publication date
2020-05-01
Application filed by China National Offshore Oil Corp CNOOC and CNOOC China Ltd Zhanjiang Branch
Priority to CN201911086499.1A
Publication of CN111090680A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Fuzzy Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a shared logging data mining method comprising the following steps: inputting learning samples formed by all logging curves related to a target learning task together with the target curves; converting the learning-sample data into a two-dimensional matrix and normalizing the matrix; obtaining valuable learning samples through intelligent curve selection; obtaining a data mining method through intelligent algorithm selection; optimizing the data mining model through intelligent parameter selection; and storing the learning knowledge in a petrophysical data mining knowledge base for sharing and calling. The beneficial effects of the invention are that the method quickly and intelligently selects curves, algorithms and parameters, supports model sharing, and quickly and accurately extracts the combination of logging curves, data mining method and method parameters required to complete the target learning task, which helps logging data mining tasks such as logging curve reconstruction and petrophysical facies classification to be carried out accurately.

Description

Shared logging data mining method
Technical Field
The invention relates to the technical field of geophysical well-logging data mining, and in particular to a shared logging data mining method.
Background
In geophysical well-logging data mining, the construction of the learning sample, the selection of the learning method, and the setting of the method parameters are the key factors affecting the data mining result. At present, the industry still handles all three purely by hand. In particular, the learning method and its parameters are chosen according to the quality of the previous learning result and adjusted by trial and error based on the interpreter's understanding. This approach is inefficient, and the optimal method and parameter combination are difficult to obtain.
Disclosure of Invention
The invention aims to provide a shared logging data mining method that quickly and intelligently selects curves, algorithms and parameters, supports model sharing, and quickly and accurately extracts the combination of logging curves, data mining methods and method parameters required to complete a target learning task.
In order to achieve the purpose, the invention adopts the following technical scheme:
a shared logging data mining method comprises the following steps:
step one: inputting learning samples consisting of all logging curves related to a target learning task and the target curves, filling missing values of the learning samples with the mean value, deleting repeated depths of the logging curve values, converting qualitative text data into a two-dimensional matrix, and finally normalizing (standardizing) the two-dimensional matrix;
step two: intelligent curve selection: taking each curve in the learning sample and calculating its divergence and its correlation, wherein the divergence of a curve is calculated by the formula:
Figure BDA0002265576120000021
the correlation of the curves is calculated by the formula:
Figure BDA0002265576120000022
obtaining a comprehensive value score for each curve from the above formulas, and selecting the several curves with the highest value scores as valuable learning curves to form a new learning sample;
step three: intelligent algorithm selection: calling the empirical parameters of the various methods from the petrophysical data mining knowledge base, then dividing the learning sample in proportion, obtaining the overall accuracy of each data mining method by m-fold cross-validation, and selecting the n methods with the highest accuracy as the data mining methods;
step four: intelligent parameter selection: allocating s threads according to the number of data mining methods to intelligently select the parameters of each method; using automatic hyperparameter optimization, selecting the variation ranges of all parameters and adopting an iterative method: taking the i-th parameter combination and calculating the accuracy of the method; if the accuracy of the i-th parameter combination is greater than the maximum accuracy of the previous i-1 parameter combinations, updating the optimal parameter combination to the i-th parameter combination, and otherwise not updating; then performing the next iteration until all possible combinations of the parameters have been traversed; finally determining the optimal parameter combination of each method, the optimal parameter combination obtained being used for final data mining to obtain a data mining model;
step five: storing, sharing and calling learning knowledge: storing the learning knowledge related to data mining in the petrophysical data mining knowledge base, calling it from the petrophysical data mining knowledge base when it is needed, determining the input curves of the prediction sample according to curve redirection, and obtaining the predicted curve by using the data mining model.
Preferably, in step one, the logging curves include, but are not limited to, a natural gamma curve, a spontaneous potential curve, a neutron density curve and a resistivity curve, and the target curve is the curve to be predicted, including, but not limited to, a porosity curve and a permeability curve.
Preferably, in step three, the methods called from the petrophysical data mining knowledge base include, but are not limited to, support vector machine, Bayes and gradient boosting tree.
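Steps one and two can be illustrated with a minimal Python sketch. The divergence and correlation formulas are reproduced only as figures in the source, so the sketch assumes, purely for illustration, that divergence is the population variance of a normalized curve, that correlation is the absolute Pearson coefficient against the target curve, and that the value score is their product. Min-max scaling is likewise an assumed form of the normalization, and the names `preprocess`, `pearson` and `select_curves` are hypothetical.

```python
from math import sqrt
from statistics import fmean, pvariance


def preprocess(rows):
    """Turn (depth, values) records into a normalized two-dimensional matrix.

    Missing values are given as None. Assumes every curve has at least one
    measured value.
    """
    # Delete repeated depths, keeping the first record seen at each depth.
    seen, unique = set(), []
    for depth, values in rows:
        if depth not in seen:
            seen.add(depth)
            unique.append(list(values))
    columns = [list(col) for col in zip(*unique)]  # one column per curve
    for col in columns:
        mean = fmean([v for v in col if v is not None])
        filled = [mean if v is None else v for v in col]  # mean filling
        low, high = min(filled), max(filled)
        span = high - low
        # Assumed min-max normalization to [0, 1]; a constant curve maps to 0.
        col[:] = [(v - low) / span if span else 0.0 for v in filled]
    return [list(row) for row in zip(*columns)]


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 for a constant input."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def select_curves(curves, target, k):
    """Return the k curve names with the highest (assumed) value score."""
    scores = {
        name: pvariance(values) * abs(pearson(values, target))
        for name, values in curves.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Illustrative input: depth 1000.0 is repeated and one value is missing.
rows = [
    (1000.0, [60.0, 2.0]),
    (1000.0, [61.0, 2.1]),
    (1000.5, [None, 4.0]),
    (1001.0, [80.0, 6.0]),
]
matrix = preprocess(rows)
```

Any other way of combining divergence and correlation into a value score can be substituted in `select_curves` without changing the rest of the sketch.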
The beneficial effects of the invention are as follows: the method quickly and intelligently selects curves, algorithms and parameters and supports model sharing; by using techniques such as multithreading, cross-validation and hyperparameter search, it quickly and accurately extracts the combination of logging curves, data mining methods and method parameters required to complete a target learning task, which helps logging data mining tasks such as logging curve reconstruction and petrophysical classification to be carried out accurately.
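The cross-validation ranking of step three can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the two toy classifiers `fit_majority` and `fit_nearest` are hypothetical stand-ins for the methods held in the knowledge base, folds are taken as contiguous blocks, and m must not exceed the number of samples.

```python
def k_fold_indices(n_samples, m):
    """Split range(n_samples) into m contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(m):
        size = n_samples // m + (1 if i < n_samples % m else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_accuracy(fit, X, y, m):
    """Mean accuracy over m folds; fit(X, y) must return a predict(x) callable."""
    accuracies = []
    for fold in k_fold_indices(len(X), m):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(X[i]) == y[i] for i in fold)
        accuracies.append(hits / len(fold))
    return sum(accuracies) / m


def top_n_methods(methods, X, y, m, n):
    """Rank methods by cross-validated accuracy and keep the best n."""
    scores = {name: cross_val_accuracy(fit, X, y, m) for name, fit in methods.items()}
    return sorted(scores, key=scores.get, reverse=True)[:n], scores


def fit_majority(X, y):
    """Hypothetical baseline: always predict the most frequent training label."""
    label = max(set(y), key=y.count)
    return lambda x: label


def fit_nearest(X, y):
    """Hypothetical 1-nearest-neighbour classifier on squared Euclidean distance."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, row)) for row in X]
        return y[dists.index(min(dists))]
    return predict


# Illustrative data only: one normalized curve and a binary class label.
X = [[0.0], [0.1], [0.9], [1.0], [0.2], [0.8]]
y = [0, 0, 1, 1, 0, 1]
best, scores = top_n_methods({"majority": fit_majority, "nearest": fit_nearest}, X, y, m=3, n=1)
```

Because each method is scored independently, the calls to `cross_val_accuracy` could also be distributed across threads.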
Detailed Description
The technical solutions in the embodiments of the present invention will be described below.
A shared logging data mining method comprises the following steps:
step one: inputting learning samples consisting of all logging curves related to a target learning task and the target curves, wherein the logging curves comprise a natural gamma curve, a spontaneous potential curve, a neutron density curve and a resistivity curve, and the target curve is a porosity curve; then filling missing values of the learning samples with the mean value, deleting repeated depths of the logging curve values, converting qualitative text data into a two-dimensional matrix, and finally normalizing (standardizing) the two-dimensional matrix;
step two: intelligent curve selection: taking each curve in the learning sample and calculating its divergence and its correlation, wherein the divergence of a curve is calculated by the formula:
Figure BDA0002265576120000031
the correlation of the curves is calculated by the formula:
Figure BDA0002265576120000041
obtaining a comprehensive value score for each curve from the above formulas, and selecting the several curves with the highest value scores as valuable learning curves to form a new learning sample;
step three: intelligent algorithm selection: calling the empirical parameters of the support vector machine, Bayes and gradient boosting tree methods from the petrophysical data mining knowledge base, then dividing the learning sample in a proportion of 0.7/0.3, obtaining the overall accuracy of each data mining method by 10-fold cross-validation, and selecting the method with the highest accuracy as the data mining method;
step four: intelligent parameter selection: allocating 1 thread according to the number of data mining methods to intelligently select the parameters of each method; using automatic hyperparameter optimization and selecting the variation ranges of all parameters, the ranges being (kernel: 'linear', 'rbf', 'sigmoid'), (C: 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 2, 4), (gamma: 0.125, 0.25, 0.5, 1, 2, 4, 6), (class_weight: 'balanced', None) and (decision_function_shape: 'ovo', 'ovr'); adopting an iterative method: taking the i-th parameter combination and calculating the accuracy of the method; if the accuracy of the i-th parameter combination is greater than the maximum accuracy of the previous i-1 parameter combinations, updating the optimal parameter combination to the i-th parameter combination, and otherwise not updating; then performing the next iteration until all possible combinations of the parameters have been traversed; finally determining the optimal parameter combination of each method, the optimal parameter combination obtained being (kernel: 'linear'), (C: 4), (gamma: 0.125), (class_weight: None) and (decision_function_shape: 'ovo'), which is used for final data mining to obtain a data mining model;
step five: storing, sharing and calling learning knowledge: storing the learning knowledge related to data mining in the petrophysical data mining knowledge base, calling it from the petrophysical data mining knowledge base when it is needed, determining the input curves of the prediction sample according to curve redirection, and obtaining the predicted curve by using the data mining model.
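The exhaustive traversal of step four can be sketched in Python using the parameter ranges of this embodiment. The parameter names match those of scikit-learn's `SVC`, although the source does not name a library, so `decision_function_shape` is given the values 'ovo' and 'ovr' that `SVC` accepts. The functions `exhaustive_search` and `search_each_method` and the `accuracy_of` callback are hypothetical; in practice the callback would train the method with the given parameters and return its cross-validated accuracy.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Variation ranges of the embodiment: 3 x 12 x 7 x 2 x 2 = 1008 combinations.
PARAM_RANGES = {
    "kernel": ["linear", "rbf", "sigmoid"],
    "C": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 2, 4],
    "gamma": [0.125, 0.25, 0.5, 1, 2, 4, 6],
    "class_weight": ["balanced", None],
    "decision_function_shape": ["ovo", "ovr"],
}


def exhaustive_search(param_ranges, accuracy_of):
    """Traverse every parameter combination and keep the most accurate one."""
    names = list(param_ranges)
    best_params, best_accuracy = None, float("-inf")
    for values in product(*(param_ranges[name] for name in names)):
        params = dict(zip(names, values))
        accuracy = accuracy_of(params)
        # Update only when the i-th combination beats the best of the
        # previous i-1 combinations; on a tie the earlier one is kept.
        if accuracy > best_accuracy:
            best_params, best_accuracy = params, accuracy
    return best_params, best_accuracy


def search_each_method(method_ranges, accuracy_fns):
    """Run one exhaustive search per data mining method, one thread each."""
    workers = max(1, len(method_ranges))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            name: pool.submit(exhaustive_search, ranges, accuracy_fns[name])
            for name, ranges in method_ranges.items()
        }
        return {name: future.result() for name, future in futures.items()}


# Toy scoring function on a small grid, for illustration only.
toy_ranges = {"a": [1, 2, 3], "b": [10, 20]}
toy_best, toy_accuracy = exhaustive_search(
    toy_ranges, lambda p: -(p["a"] - 2) ** 2 - abs(p["b"] - 20)
)
n_combinations = sum(1 for _ in product(*PARAM_RANGES.values()))
```

The cost of the traversal grows with the product of the range sizes, which is why the per-method searches are good candidates for separate threads.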
The method quickly and intelligently selects curves, algorithms and parameters and supports model sharing; by using techniques such as multithreading, cross-validation and hyperparameter search, it quickly and accurately extracts the combination of logging curves, data mining methods and method parameters required to complete a target learning task, which helps logging data mining tasks such as logging curve reconstruction and petrophysical classification to be carried out accurately.
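The storing, sharing and calling of learning knowledge in step five can be sketched as a small file-backed store. The source does not describe the format of the knowledge base, so the JSON layout, the `KnowledgeBase` class and the `redirect_curves` helper are all hypothetical. In particular, `redirect_curves` reflects only one possible reading of "curve redirection", namely mapping the curve names a stored model expects onto the names available in a new well. The accuracy value in the usage lines is a placeholder, not a result from the source.

```python
import json
import tempfile
from pathlib import Path


class KnowledgeBase:
    """File-backed store of learning knowledge, keyed by learning task."""

    def __init__(self, path):
        self.path = Path(path)

    def _load(self):
        if self.path.exists():
            return json.loads(self.path.read_text(encoding="utf-8"))
        return {}

    def store(self, task, curves, method, params, accuracy):
        """Save the selected curves, method and parameters for a task."""
        records = self._load()
        records[task] = {
            "curves": curves,
            "method": method,
            "params": params,
            "accuracy": accuracy,
        }
        self.path.write_text(json.dumps(records, indent=2), encoding="utf-8")

    def call(self, task):
        """Return the stored knowledge for a task, or None if absent."""
        return self._load().get(task)


def redirect_curves(well_curves, required, aliases):
    """Map each required curve name to a curve present in the new well."""
    mapping = {}
    for name in required:
        candidates = [name] + aliases.get(name, [])
        match = next((c for c in candidates if c in well_curves), None)
        if match is None:
            raise KeyError(f"no curve available for {name!r}")
        mapping[name] = match
    return mapping


# Usage: store knowledge from one task, then call it from a fresh instance.
with tempfile.TemporaryDirectory() as folder:
    kb_path = Path(folder) / "knowledge_base.json"
    KnowledgeBase(kb_path).store(
        "porosity", ["GR", "RT"], "support vector machine",
        {"kernel": "linear", "C": 4, "class_weight": None},
        0.9,  # placeholder accuracy for illustration
    )
    record = KnowledgeBase(kb_path).call("porosity")

mapping = redirect_curves(["GR_EDIT", "RT"], record["curves"], {"GR": ["GR_EDIT"]})
```

Keeping the store in a plain file means that a second interpreter or project can call the same knowledge simply by pointing at the same path.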

Claims (3)

1. A shared logging data mining method is characterized by comprising the following steps:
step one: inputting learning samples consisting of all logging curves related to a target learning task and the target curves, filling missing values of the learning samples with the mean value, deleting repeated depths of the logging curve values, converting qualitative text data into a two-dimensional matrix, and finally normalizing (standardizing) the two-dimensional matrix;
step two: intelligent curve selection: taking each curve in the learning sample and calculating its divergence and its correlation, wherein the divergence of a curve is calculated by the formula:
Figure FDA0002265576110000011
the correlation of the curves is calculated by the formula:
Figure FDA0002265576110000012
obtaining a comprehensive value score for each curve from the above formulas, and selecting the several curves with the highest value scores as valuable learning curves to form a new learning sample;
step three: intelligent algorithm selection: calling the empirical parameters of the various methods from the petrophysical data mining knowledge base, then dividing the learning sample in proportion, obtaining the overall accuracy of each data mining method by m-fold cross-validation, and selecting the n methods with the highest accuracy as the data mining methods;
step four: intelligent parameter selection: allocating s threads according to the number of data mining methods to intelligently select the parameters of each method; using automatic hyperparameter optimization, selecting the variation ranges of all parameters and adopting an iterative method: taking the i-th parameter combination and calculating the accuracy of the method; if the accuracy of the i-th parameter combination is greater than the maximum accuracy of the previous i-1 parameter combinations, updating the optimal parameter combination to the i-th parameter combination, and otherwise not updating; then performing the next iteration until all possible combinations of the parameters have been traversed; finally determining the optimal parameter combination of each method, the optimal parameter combination obtained being used for final data mining to obtain a data mining model;
step five: storing, sharing and calling learning knowledge: storing the learning knowledge related to data mining in the petrophysical data mining knowledge base, calling it from the petrophysical data mining knowledge base when it is needed, determining the input curves of the prediction sample according to curve redirection, and obtaining the predicted curve by using the data mining model.
2. The method of claim 1, wherein in step one, the logging curves include, but are not limited to, a natural gamma curve, a spontaneous potential curve, a neutron density curve and a resistivity curve, and the target curve is the curve to be predicted, including, but not limited to, a porosity curve and a permeability curve.
3. The method of claim 1, wherein in step three, the methods called from the petrophysical data mining knowledge base include, but are not limited to, support vector machine, Bayes and gradient boosting tree.

Priority Applications (1)

Application Number    Priority Date    Filing Date    Title
CN201911086499.1A     2019-11-08       2019-11-08     Shared logging data mining method

Publications (1)

Publication Number    Publication Date
CN111090680A          2020-05-01

Family ID: 70393121

Country Status (1)

Country    Link
CN (1)     CN111090680A (en)
Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108268460A (en) * 2016-12-30 2018-07-10 广东精点数据科技股份有限公司 Method for automatically selecting optimal models based on big data
CN108763460A (en) * 2018-05-28 2018-11-06 成都优易数据有限公司 Machine learning method and system based on SQL
CN110223156A (en) * 2019-05-16 2019-09-10 杭州排列科技有限公司 Automation model evolutionary algorithm based on gradually optimal feature selection

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
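李洪奇 et al.: "Well-logging evaluation method based on data mining technology" (基于数据挖掘技术的测井评价方法), 《测井技术》 *
李洪奇 et al.: "Research on data mining methods for well-logging evaluation of complex reservoirs" (复杂储层测井评价数据挖掘方法研究), 《石油学报》 *
范广玲 et al.: "Research on general modeling for data mining model selection" (数据挖掘模型选择的通用建模研究), 《科学技术与工程》 *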

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200501)