CN115270959B - Shale lithology recognition method and device based on recursion feature elimination fusion random forest - Google Patents

Shale lithology recognition method and device based on recursion feature elimination fusion random forest

Info

Publication number
CN115270959B
Authority
CN
China
Prior art keywords
lithology
feature
training
feature quantity
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210894158.2A
Other languages
Chinese (zh)
Other versions
CN115270959A (en)
Inventor
冯程
钟云滔
毛锐
王先虎
祁利祺
冯梓岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Petroleum Beijing
Original Assignee
China University of Petroleum Beijing
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Petroleum Beijing
Priority to CN202210894158.2A
Publication of CN115270959A
Application granted
Publication of CN115270959B
Legal status: Active
Anticipated expiration


Classifications

    • E: FIXED CONSTRUCTIONS
    • E21: EARTH OR ROCK DRILLING; MINING
    • E21B: EARTH OR ROCK DRILLING; OBTAINING OIL, GAS, WATER, SOLUBLE OR MELTABLE MATERIALS OR A SLURRY OF MINERALS FROM WELLS
    • E21B49/00: Testing the nature of borehole walls; Formation testing; Methods or apparatus for obtaining samples of soil or well fluids, specially adapted to earth drilling or wells
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/30: Assessment of water resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Mining & Mineral Resources (AREA)
  • Geology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Environmental & Geological Engineering (AREA)
  • General Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Fluid Mechanics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Life Sciences & Earth Sciences (AREA)
  • Geochemistry & Mineralogy (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a shale lithology recognition method and device based on recursive feature elimination fused with a random forest. The method comprises: acquiring feature data of an original lithology feature set from original logging data and nuclear magnetic resonance (NMR) logging data; performing recursive feature elimination on the feature data of the original lithology feature set to obtain a training lithology feature set that contains fewer features than the original lithology feature set; and training a lithology recognition model, which is a random forest model, on the feature data corresponding to the training lithology feature set to obtain a model for recognizing shale lithology. By combining logging data, feature elimination and a random forest model, the method recognizes lithology with high accuracy and a short running time.

Description

Shale lithology recognition method and device based on recursion feature elimination fusion random forest
Technical Field
The application relates to the field of well logging, in particular to a shale lithology recognition method and device based on recursive feature elimination fused with a random forest.
Background
Lithology identification is fundamental to reservoir evaluation and is important for subsequent oil and gas exploration and evaluation. However, the cost of drill coring and the time and effort needed to interpret core data manually often make continuous and accurate lithology identification impossible. In practice, therefore, logging data are one of the main means of identifying reservoir lithology. As computer science becomes increasingly integrated with well logging, using machine learning algorithms to explore the correlation between lithology and logging responses is a promising way to recognize complex lithologies.
In machine learning, selecting the variables that are related to the predicted value is particularly important; this step is called feature selection and is part of feature engineering. Feature selection in lithology recognition generally follows one of two approaches. In the first, variables related to lithology are selected manually on the basis of expert experience or the physical meaning of the log curves, which makes the selection strongly subjective. In the second, statistical methods are used when building the data set to determine the correlation between logging features and lithology, or among the logging features themselves; examples include the cross-plot method, dimension-reduction methods such as principal component analysis and factor analysis, grey relational analysis, fuzzy ranking, kernel density estimation and Pearson correlation analysis.
However, existing feature-selection approaches in lithology recognition yield low recognition accuracy and long model running times.
Disclosure of Invention
The application provides a shale lithology recognition method and device based on recursive feature elimination fused with a random forest, to address the long running time and low recognition accuracy of existing recognition models.
In a first aspect, the present application provides a method for training a lithology recognition model, including:
acquiring feature data of an original lithology feature set from original logging data and nuclear magnetic resonance logging data;
performing recursive feature elimination on the feature data of the original lithology feature set to obtain a training lithology feature set, wherein the training lithology feature set contains fewer features than the original lithology feature set;
and training the lithology recognition model on the feature data corresponding to the training lithology feature set to obtain a lithology recognition model for recognizing shale lithology, wherein the lithology recognition model is a random forest model.
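The three steps above can be sketched with scikit-learn as a pipeline of standardization, recursive feature elimination and a random forest. This is a minimal illustration under stated assumptions, not the patented implementation: the data come from `make_classification` as a synthetic stand-in for 12 logging features and 5 lithology classes, the number of retained features (6) is fixed by hand rather than chosen by the cross-validation and violin-plot procedure, and the RFE estimator settings are arbitrary. Only the 91 trees and maximum depth of 8 for the final forest are taken from the embodiment.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for 12 logging features and 5 lithology classes (not real well data).
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6, n_redundant=2,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

pipeline = Pipeline([
    ("scale", StandardScaler()),  # z-score each feature column
    # Recursive feature elimination driven by a random forest's feature importances;
    # 6 of 12 features are kept (an assumed count for illustration).
    ("rfe", RFE(RandomForestClassifier(n_estimators=50, random_state=0),
                n_features_to_select=6, step=1)),
    # Final lithology classifier on the retained features.
    ("rf", RandomForestClassifier(n_estimators=91, max_depth=8, random_state=0)),
])
pipeline.fit(X_train, y_train)

selected = pipeline.named_steps["rfe"].support_  # boolean mask over the 12 inputs
accuracy = pipeline.score(X_test, y_test)        # mean accuracy on held-out samples
print("selected feature mask:", selected)
print("test accuracy:", accuracy)
```

Calling `pipeline.predict(new_samples)` on new 12-feature rows would then apply the same scaling and feature mask before the forest votes, which corresponds to the recognition step of the second aspect.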
In one possible implementation, performing recursive feature elimination on the feature data of the original lithology feature set to obtain the training lithology feature set comprises:
obtaining an importance ranking of the original lithology features in the original lithology feature set using the classification model of the recursive feature elimination;
obtaining a target feature quantity (number of features) for the training lithology feature set by cross-validation over the number of features;
and obtaining the training lithology feature set from the target feature quantity and the importance ranking.
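Once an importance ranking and a target feature quantity are available, forming the training lithology feature set reduces to keeping the top-ranked features. The helper below is a hypothetical sketch; it assumes a strict ranking in which 1 is the most important feature, and the ranks in the usage example are made up for illustration, not results from the patent.

```python
def select_training_features(feature_names, importance_rank, target_count):
    """Keep the target_count most important features.

    Assumes importance_rank[i] is a strict rank for feature_names[i],
    where 1 means most important (hypothetical convention).
    """
    if len(feature_names) != len(importance_rank):
        raise ValueError("one rank is needed per feature")
    if not 1 <= target_count <= len(feature_names):
        raise ValueError("target_count must be between 1 and the number of features")
    # Sort features by rank (ascending) and take the first target_count.
    ordered = sorted(zip(importance_rank, feature_names))
    return [name for _, name in ordered[:target_count]]


# Illustrative ranks only (not taken from the patent).
names = ["GR", "SP", "CALI", "CNL", "DEN", "AC", "RI", "RXO", "RT", "T2lm", "Ij", "Pm"]
ranks = [1, 9, 12, 4, 2, 5, 10, 11, 7, 3, 6, 8]
print(select_training_features(names, ranks, 6))
```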
In one possible implementation, obtaining the target feature quantity for the training lithology feature set by cross-validation over the number of features comprises:
obtaining, for each feature, violin plots of the different shales, wherein the violin plots reflect the distribution of the feature data;
and obtaining the target feature quantity from the violin plots and the cross-validation over the number of features.
In one possible implementation, obtaining the target feature quantity from the violin plots and the cross-validation over the number of features comprises:
obtaining, by cross-validation over the number of features, a cross-validation score for each feature quantity and a first feature quantity, wherein the difference between the cross-validation score for the first feature quantity and that for a second feature quantity is smaller than a preset difference, the second feature quantity being one less than the first feature quantity;
and selecting the target feature quantity from the first feature quantity or a third feature quantity, which is larger than the first feature quantity, according to the violin plots of each feature for the different shales.
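The rule for the first feature quantity can be read as a plateau test on the cross-validation curve: it is the smallest feature count whose score differs from the score with one fewer feature by less than the preset difference. A sketch, assuming the scores are supplied as a mapping from feature count to mean cross-validation score; the tolerance and scores below are hypothetical, and the fallback when the curve never levels off is an assumption.

```python
def first_feature_quantity(scores_by_count, tolerance):
    """Return the smallest k such that |score(k) - score(k - 1)| < tolerance.

    scores_by_count maps a feature count k to its mean cross-validation score.
    If the curve never levels off, the largest evaluated count is returned
    (an assumed fallback; the patent does not specify one).
    """
    counts = sorted(scores_by_count)
    for prev, k in zip(counts, counts[1:]):
        # Only compare neighbouring counts, i.e. the "second" quantity is k - 1.
        if k == prev + 1 and abs(scores_by_count[k] - scores_by_count[prev]) < tolerance:
            return k
    return counts[-1]


# Hypothetical cross-validation scores for 1..8 features.
scores = {1: 0.55, 2: 0.68, 3: 0.79, 4: 0.795, 5: 0.80, 6: 0.81, 7: 0.81, 8: 0.808}
print(first_feature_quantity(scores, tolerance=0.01))
```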
In one possible implementation, selecting the target feature quantity from the first feature quantity or the third feature quantity according to the violin plots of each feature for the different shales comprises:
incrementing the first feature quantity by 1 at a time to obtain the third feature quantity;
for the violin plots of the feature added at each increment, if the transverse width or longitudinal length of the violin plot of at least one shale differs from those of the other shales by more than a preset width difference or length difference, continuing to increment by 1 until the difference falls below the preset width or length difference, and taking as the target feature quantity the last third feature quantity for which the difference exceeded the preset width or length difference;
and, for the violin plots of the feature added at the first increment, if no shale's violin plot differs from those of the other shales in transverse width or longitudinal length by more than the preset width difference or length difference, taking the first feature quantity as the target feature quantity.
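The increment rule can be sketched as a loop that keeps adding the next-ranked feature while its violin plots still separate at least one lithology from the others. This is a hypothetical sketch: it assumes the per-feature violin comparison has already been reduced to a boolean flag per feature, ordered by importance rank, and the flags in the example are invented so that a first feature quantity of 4 grows to 6, mirroring the 4-versus-6 example given in the description.

```python
def target_feature_quantity(first_count, discriminative):
    """Grow the feature count one step at a time while the added feature discriminates.

    discriminative[i] is True if the violin plots of the (i + 1)-th most important
    feature show at least one lithology whose width or length differs from the
    others by more than the preset tolerance.
    """
    k = first_count
    # discriminative[k] describes the feature added when the count goes from k to k + 1.
    while k < len(discriminative) and discriminative[k]:
        k += 1
    # If the very first added feature fails the test, k is still first_count.
    return k


# Hypothetical flags for 12 ranked features: features 5 and 6 discriminate, feature 7 does not.
flags = [True, True, True, True, True, True, False, True, False, False, False, False]
print(target_feature_quantity(4, flags))
```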
In one possible implementation, the original lithology features in the original lithology feature set that correspond to the NMR logging data include a structural index, a skeleton density index and the T2 geometric mean.
In one possible implementation, training the lithology recognition model on the feature data corresponding to the training lithology feature set to obtain a lithology recognition model for recognizing shale lithology comprises:
obtaining a target number of decision trees by cross-validation over the number of decision trees;
and training the lithology recognition model based on the target number of decision trees and the feature data corresponding to the training lithology feature set to obtain a lithology recognition model for recognizing shale lithology.
In a second aspect, the present application provides a lithology recognition method, comprising:
acquiring lithology feature data;
and inputting the lithology feature data into a lithology recognition model to obtain the lithology recognition result output by the model, wherein the lithology recognition model is trained by the training method described above.
In a third aspect, the present application provides an electronic device comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored by the memory, causing the at least one processor to perform the method as described above.
In a fourth aspect, the application provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method as described above.
The application provides a shale lithology recognition method and device based on recursive feature elimination fused with a random forest. The method acquires feature data of an original lithology feature set from original logging data and NMR logging data; performs recursive feature elimination on that feature data to obtain a training lithology feature set containing fewer features than the original lithology feature set; and trains a lithology recognition model, which is a random forest model, on the feature data corresponding to the training lithology feature set to obtain a model for recognizing shale lithology. Building the initial lithology feature data from both original logging data and NMR logging data enriches the original database and enlarges the feature differences between lithologies, which makes good recognition easier to achieve later. The enriched data are then screened to find the most suitable features; the selected feature data are assembled into a data set and used to train a model that identifies lithology more accurately, and shale lithology is then identified from measurements of the screened features.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the application, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an application of shale lithology recognition according to an embodiment of the present application;
FIG. 2 is a first flowchart of a training method of a lithology recognition model according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a random forest algorithm in a training method of a lithology recognition model according to an embodiment of the present application;
FIG. 4 is a second flowchart of a training method of a lithology recognition model according to an embodiment of the present application;
FIG. 5a is a first violin plot of a training method of a lithology recognition model according to an embodiment of the present application;
FIG. 5b is a second violin plot of a training method of a lithology recognition model according to an embodiment of the present application;
FIG. 5c is a third violin plot of a training method of a lithology recognition model according to an embodiment of the present application;
FIG. 5d is a fourth violin plot of a training method of a lithology recognition model according to an embodiment of the present application;
FIG. 5e is a fifth violin plot of a training method of a lithology recognition model according to an embodiment of the present application;
FIG. 5f is a sixth violin plot of a training method of a lithology recognition model according to an embodiment of the present application;
FIG. 5g is a seventh violin plot of a training method of a lithology recognition model according to an embodiment of the present application;
FIG. 5h is an eighth violin plot of a training method of a lithology recognition model according to an embodiment of the present application;
FIG. 5i is a ninth violin plot of a training method of a lithology recognition model according to an embodiment of the present application;
FIG. 5j is a tenth violin plot of a training method of a lithology recognition model according to an embodiment of the present application;
FIG. 5k is an eleventh violin plot of a training method of a lithology recognition model according to an embodiment of the present application;
FIG. 5l is a twelfth violin plot of a training method of a lithology recognition model according to an embodiment of the present application;
FIG. 6 is a third flowchart of a training method of a lithology recognition model according to an embodiment of the present application;
FIG. 7 is a cross-validation score plot of a training method of a lithology recognition model according to an embodiment of the present application;
FIG. 8 is a flowchart of a lithology recognition method according to an embodiment of the present application;
FIG. 9 is a schematic hardware diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
With the growing integration of computer science and well logging, the prior art generally performs lithology recognition in one of two ways. The first manually selects lithology-related variables on the basis of expert experience or the physical characteristics of the log curves; it relies too heavily on human experience, so the recognition result is somewhat subjective. The second relies on various mathematical and statistical methods. For example, the cross-plot method involves considerable blindness and uncertainty, and dimension-reduction methods such as factor analysis reduce the number of feature variables as far as possible without losing the original information; as a result, original feature variables are retained whether or not they are related to lithology, the retained features are poorly targeted, and lithology recognition accuracy suffers considerably.
The embodiment of the application provides a training method of a lithology recognition model and a lithology recognition method, and the following description is made with reference to fig. 1 on how the method can be implemented.
FIG. 1 is an application schematic diagram of shale lithology recognition according to an embodiment of the present application. As shown in FIG. 1, the scheme involves shale, logging features and a training model; the finally trained model is used to identify the lithology of the shale.
Shale comprises several lithologies, which can generally be classified as micritic dolomite, mudstone, sandy dolomite, dolomitic silty fine sandstone and feldspathic lithic sandstone. These lithologies can be characterized by different logging features. For example, suppose the natural gamma (GR) value measured in a shale interval falls within a certain range and the lithologies within that range include mudstone and dolomitic siltstone. A recognition based on GR alone may then conclude that the interval contains both mudstone and dolomitic siltstone, whereas it may in fact contain only one of them, or other lithologies that the GR value does not reveal. Selecting suitable logging features therefore provides strong support for the subsequent recognition work.
The application combines original logging data and NMR logging data to select logging features and then applies recursive feature elimination to them, so as to retain the features with better expressiveness and discriminating power and to reduce the subjectivity introduced by manual selection. More than one number of features may yield stable data with good expressiveness; for example, selecting 4 or more of the 12 candidate features may already be sufficiently stable.
To make the feature data input to the training model richer and more accurate, the violin plots of each feature for the different shales can be used to judge whether a logging feature discriminates between lithologies. If it does, that feature can be retained instead of simply choosing the smallest number of features that gives stable data and good expressiveness. For example, the smallest such number might be 4, whereas 6 features might be chosen once the violin plots are taken into account, retaining richer data characteristics while still reducing the model burden.
After the logging features are selected, a data set is built and fed to a random forest model for training, and the trained random forest model is then used in subsequent lithology recognition to achieve accurate results.
The following specifically describes how the process of the training method for lithology recognition model according to the embodiment of the present application may be implemented.
Fig. 2 is a flowchart of a training method of a lithology recognition model according to an embodiment of the present application.
As shown in fig. 2, the method includes:
S201, acquiring feature data of an original lithology feature set from original logging data and nuclear magnetic resonance logging data;
the original logging data are selected from conventional logging data, i.e. the logging data commonly used by conventional lithology identification methods; the conventional logs that can distinguish lithology are selected to construct the feature data of the original lithology feature set. In addition, to enrich the feature data, NMR logging data are used, because the NMR data respond differently to different lithologies.
Optionally, the original logging data selected in the embodiments of the present application for constructing the feature data of the original lithology feature set include: natural gamma (GR), spontaneous potential (SP), borehole diameter (caliper, CALI), compensated neutron (CNL), compensated density (DEN), acoustic interval transit time (AC), transition zone resistivity (RI), flushed zone resistivity (RXO) and true formation resistivity (RT) values.
By way of example, the original lithology features in the original lithology feature set that correspond to the NMR logging data selected in the embodiments of the present application comprise a structural index, a skeleton density index and the T2 geometric mean.
S202, performing recursive feature elimination processing based on feature data of the original lithology feature set to obtain a training lithology feature set, wherein the feature quantity in the training lithology feature set is smaller than that in the original lithology feature set.
In the above step, 9 kinds of original logging data and 3 kinds of NMR logging data give 12 kinds of feature data in total, i.e. the original lithology feature set contains 12 features. The training lithology feature set is obtained by screening these 12 features, so it contains fewer than 12 features, for example 6. Recursive feature elimination is used for the actual screening.
Recursive feature elimination is a greedy algorithm that seeks the best-performing subset of features to serve as the training lithology feature set. It repeatedly builds a model, retaining the best feature or eliminating the worst feature at each iteration, until the training lithology feature set is finally selected. This embodiment does not specifically limit the recursive feature elimination procedure.
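The greedy loop can be illustrated with a plain-Python sketch of backward elimination. Here `importance_fn` is a hypothetical callback standing in for refitting the classification model and reading its feature importances; the ranking convention (1 for kept features, larger numbers for features eliminated earlier) follows the one used by scikit-learn's `RFE.ranking_`. The fixed importances in the example are made up, and because they do not change between rounds the example reduces to a single sort; a real RFE refits the model at every step, so importances can change as features are removed.

```python
def recursive_feature_elimination(features, importance_fn, n_keep):
    """Greedy backward elimination: drop the least important feature each round.

    importance_fn(current_features) -> {feature: score}; higher means more
    important. It is a hypothetical stand-in for refitting the classifier.
    Returns (kept_features, ranking) with ranking[f] == 1 for kept features and
    larger ranks for features eliminated earlier.
    """
    if not 1 <= n_keep <= len(features):
        raise ValueError("n_keep must be between 1 and the number of features")
    remaining = list(features)
    ranking = {}
    next_rank = len(features) - n_keep + 1  # rank given to the first eliminated feature
    while len(remaining) > n_keep:
        scores = importance_fn(remaining)  # "refit" on the surviving features
        worst = min(remaining, key=lambda f: scores[f])
        remaining.remove(worst)
        ranking[worst] = next_rank
        next_rank -= 1
    for f in remaining:
        ranking[f] = 1
    return remaining, ranking


# Made-up, fixed importances; a real classifier would be refitted each round.
fixed = {"GR": 0.30, "SP": 0.05, "CALI": 0.02, "DEN": 0.25, "AC": 0.10}
kept, ranking = recursive_feature_elimination(
    list(fixed), lambda cur: {f: fixed[f] for f in cur}, n_keep=2
)
print(kept, ranking)
```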
S203, training the lithology recognition model according to the feature data corresponding to the training lithology feature set to obtain a lithology recognition model for recognizing the lithology of shale, wherein the lithology recognition model is a random forest model.
The training lithology feature set established by feature screening serves as the input of the lithology recognition model. Because the corresponding feature data contain only the screened features, the running time of the algorithm is reduced;
the lithology recognition model selected in this embodiment is a random forest model, and its recognition accuracy is improved during training by optimizing the hyperparameter that sets the number of decision trees.
The following specifically describes how a random forest model in a lithology recognition model training method can be implemented according to an embodiment of the present application.
Fig. 3 is a schematic diagram of a random forest algorithm in a training method of a lithology recognition model according to an embodiment of the present application. As shown in fig. 3, the algorithm includes:
(1) T training sets S1, S2, …, ST are drawn at random from the initial data set by bootstrap sampling.
(2) Each training set produces its own decision tree D1, D2, …, DT. Assuming each sample has M attributes, at each non-leaf node of each decision tree m attributes are drawn at random from the M attributes (m is an integer with 0 < m < M), and the best of these attributes according to the Gini index is selected to split the node.
(3) Each tree is grown fully, without pruning.
(4) A test sample x is predicted by each generated decision tree, giving the classes D1(x), D2(x), …, DT(x). By voting, the class predicted most often among the T decision trees is taken as the final classification result for the test sample.
The advantage of random forests is that the training data are sampled randomly and each decision tree is built from only part of the samples and only a randomly drawn subset of the attributes. This greatly improves the generalization ability of the model and makes it less prone to overfitting.
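Steps (1), (2) and (4) rely on a few small building blocks: bootstrap sampling, drawing m of the M attributes at a node, scoring candidate splits with the Gini index, and majority voting. The sketch below shows these pieces in plain Python, assuming labels are hashable class names; it is not a complete random forest, since tree growing (step (3)) is omitted for brevity.

```python
import random
from collections import Counter


def bootstrap_indices(n_samples, rng):
    """Step (1): draw n_samples row indices with replacement for one training set."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def random_attribute_subset(n_attributes, m, rng):
    """Step (2): choose m of the M = n_attributes attributes for one node (0 < m < M)."""
    if not 0 < m < n_attributes:
        raise ValueError("m must satisfy 0 < m < M")
    return rng.sample(range(n_attributes), m)


def gini(labels):
    """Gini impurity 1 - sum_k p_k**2 of the class labels reaching a node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(left_labels, right_labels):
    """Size-weighted Gini index of a candidate split; lower is better."""
    n = len(left_labels) + len(right_labels)
    return (len(left_labels) / n) * gini(left_labels) + (len(right_labels) / n) * gini(right_labels)


def majority_vote(tree_predictions):
    """Step (4): class predicted by the most trees (ties go to the first seen)."""
    return Counter(tree_predictions).most_common(1)[0][0]


rng = random.Random(0)
print(bootstrap_indices(8, rng))
print(random_attribute_subset(12, 3, rng))
print(gini(["mudstone", "mudstone", "sandy dolomite", "sandy dolomite"]))
print(majority_vote(["mudstone", "sandy dolomite", "mudstone"]))
```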
For example, the target number of decision trees is obtained by cross-validation over the number of decision trees;
and the lithology recognition model is trained based on the target number of decision trees and the feature data corresponding to the training lithology feature set to obtain a lithology recognition model for recognizing shale lithology.
The number of decision trees can be selected during training by cross-validation combined with grid search. In cross-validation, the score rises and gradually levels off as the number of decision trees increases, i.e. the classification accuracy of the model increases until it stabilizes. In this embodiment, the number of decision trees selected overall in the experiments is 91 and the maximum depth is 8; model training is complete once the model parameters have been tuned.
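A cross-validation sweep over the number of trees can be sketched with scikit-learn's `cross_val_score`. The data set is a synthetic stand-in produced by `make_classification`, the grid of tree counts (1 to 101 in steps of 10) and the 5-fold split are assumptions, picking the single highest score is a simplification of the overall selection described above, and `max_depth=8` is taken from the embodiment; the resulting scores say nothing about the patent's own data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in: 6 screened features, 5 lithology classes.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_redundant=0,
    n_classes=5, n_clusters_per_class=1, random_state=1,
)


def cv_score_for_trees(n_trees, max_depth=8, folds=5):
    """Mean k-fold accuracy of a random forest with n_trees trees."""
    model = RandomForestClassifier(n_estimators=n_trees, max_depth=max_depth, random_state=0)
    return cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean()


tree_counts = list(range(1, 102, 10))  # 1, 11, 21, ..., 101
scores = {n: cv_score_for_trees(n) for n in tree_counts}
best_n = max(scores, key=scores.get)  # simplification: highest mean score wins
for n in tree_counts:
    print(f"{n:3d} trees: {scores[n]:.3f}")
print("best number of trees:", best_n)
```

`GridSearchCV` with a parameter grid over `n_estimators` and `max_depth` performs the same kind of search over both hyperparameters jointly.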
In the embodiment of the application, feature data of an original lithology feature set are acquired from original logging data and NMR logging data; recursive feature elimination is performed on the feature data of the original lithology feature set to obtain a training lithology feature set containing fewer features than the original lithology feature set; and a lithology recognition model, which is a random forest model, is trained on the feature data corresponding to the training lithology feature set to obtain a model for recognizing shale lithology. Building the original lithology feature data from both original logging data and NMR logging data enriches the original database and enlarges the feature differences between lithologies, which makes good recognition easier to achieve later. After the database is enriched, the feature data are screened rather than all being used, so that the most suitable features are found; the purpose of this careful selection is to reduce the computational burden of the model while keeping sufficient feature data. Finally, the selected feature data are assembled into a data set and used to train a model that identifies lithology more accurately, and shale lithology is identified from measurements of the screened features.
The following describes, with reference to FIG. 4 and FIGS. 5a to 5l, how the training lithology feature set is acquired in the training method of the lithology recognition model according to the embodiment of the present application.
FIG. 4 is a second flowchart of a training method of a lithology recognition model according to an embodiment of the present application.
As shown in fig. 4, the method includes:
S401, obtaining an importance ranking of the original lithology features in the original lithology feature set using the classification model of recursive feature elimination.
The recursive feature elimination algorithm includes a classification model. This may be a random forest model or any other classification model that meets the requirements of this embodiment. The classification model computes the importance ranking of the different lithology features. Its inputs are the values of the lithology response parameters, which are standardized before being input.
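Below is a minimal sketch of S401. It assumes that scikit-learn's `RFE`, wrapped around a `RandomForestClassifier`, stands in for the classification model of recursive feature elimination. The function name `rank_features_rfe` and the hyperparameters are illustrative assumptions, not values from this application: 100 trees, with one feature removed per iteration. The inputs are assumed to be already standardized.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE


def rank_features_rfe(X, y, feature_names, n_estimators=100, random_state=0):
    """Return feature names ordered from most to least important.

    RFE refits the random forest repeatedly, dropping the feature with the
    lowest impurity-based importance each round (step=1) until one remains,
    so every feature receives a distinct rank.
    """
    forest = RandomForestClassifier(n_estimators=n_estimators, random_state=random_state)
    rfe = RFE(estimator=forest, n_features_to_select=1, step=1)
    rfe.fit(X, y)
    # ranking_ is 1 for the surviving feature; larger ranks were eliminated earlier.
    order = np.argsort(rfe.ranking_)
    return [feature_names[j] for j in order]
```

Setting `n_features_to_select=1` forces RFE to eliminate features one at a time until a single one is left. This yields a complete ordering of the kind listed in Table 1.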
Because the 12 logging features have different units, the original logging data must first be standardized as follows:

$$y_i = \frac{x_i - \bar{x}}{s}, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

The symbols are as follows:
- $y_i$ is the standardized value.
- $x_i$ is the original logging datum.
- $\bar{x}$ is the mean of the original logging data.
- $s$ is the standard deviation of the original logging data.
- $n$ is the number of logging data values being standardized.
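Here is a short numpy sketch of the standardization above, applied column by column so that each logging curve is scaled by its own mean and standard deviation. The sample standard deviation (`ddof=1`) is an assumption; the application does not say which denominator is used for $s$.

```python
import numpy as np


def standardize(x):
    """Z-score standardization y_i = (x_i - mean) / s, applied column-wise.

    Each column is one logging curve (e.g. GR, DEN, RT); the mean and the
    standard deviation are computed over that column's values.
    """
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=0)
    s = x.std(axis=0, ddof=1)  # sample standard deviation (assumed n - 1 denominator)
    return (x - mean) / s
```

After this step, curves measured in API units, g/cm3 or ohm·m all have zero mean and unit spread, so no single curve dominates because of its units.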
S402, obtaining, for each feature, violin diagrams of the different shale lithologies, where the violin diagrams reflect the distribution of the feature data.
After the 12 kinds of feature data have been ranked by importance, violin diagrams are obtained for each feature across the different shale lithologies.
Figs. 5a to 5l show the distributions of five lithologies for the following parameters, which may also be referred to as lithology response parameters:
- GR, SP, CALI, CNL, DEN, AC, RI, RXO and RT;
- the T2 geometric mean (T2lm value);
- the structural index (Ij value);
- the skeleton density index (Pm value).

In the figures, a is micritic dolomite, b is mudstone, c is sandy dolomite, d is dolomitic fine sandstone, and e is feldspathic lithic sandstone.
The wider a violin is horizontally at a given value, the more concentrated the data are there; the longer it is vertically, the more scattered the data are. Take GR as an example. The widest part of the micritic dolomite violin lies at GR values of 70-120 API, whereas the widest part of the feldspathic lithic sandstone violin lies at 36-84 API. The GR values of micritic dolomite are therefore concentrated in a clearly higher range than those of feldspathic lithic sandstone. In this embodiment, the values at the widest part of each violin are mainly used as the inputs to the classification model in S401.
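The violin reading above can also be summarized numerically. The sketch below uses numpy only. It assumes that the violin width is a Gaussian kernel density estimate with Scott's-rule bandwidth and that the vertical length is the data range. Plotting libraries may use other bandwidths and may extend the violin beyond the data, so `violin_metrics` is a hypothetical helper rather than part of the published method.

```python
import numpy as np


def violin_metrics(values, grid_size=512):
    """Numerical summary of one violin for a single lithology and feature.

    Returns (value_at_widest_point, peak_density, vertical_length), where the
    width profile is a Gaussian kernel density estimate (Scott's-rule
    bandwidth, an assumption) and the length is taken as the data range.
    """
    v = np.asarray(values, dtype=float)
    n = v.size
    bw = v.std(ddof=1) * n ** (-1 / 5)  # Scott's rule for one-dimensional data
    if not bw > 0:
        raise ValueError("need at least two distinct values")
    grid = np.linspace(v.min(), v.max(), grid_size)
    # Sum of Gaussian kernels centred on every sample, evaluated on the grid.
    z = (grid[:, None] - v[None, :]) / bw
    density = np.exp(-0.5 * z**2).sum(axis=1) / (n * bw * np.sqrt(2 * np.pi))
    i = int(np.argmax(density))
    return float(grid[i]), float(density[i]), float(v.max() - v.min())
```

Comparing the first and third returned values across lithologies gives a rough, repeatable counterpart to reading the width and length of the violins by eye.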
For example, in this embodiment the importance ranking of the 12 features is given in Table 1:
Table 1 Importance ranking of the feature data
S403, obtaining the target feature quantity from the violin diagrams and the feature-quantity-based cross-validation method.
Once the feature data have been ranked by importance, the next question is how many features to keep. Cross-validation is run for every feature quantity up to all 12. It shows that, as the number of features increases, the cross-validation score gradually rises and then stabilizes. The target feature quantity is determined mainly from the feature quantity at which the score stabilizes, with the violin diagrams used as a supplementary criterion.
S404, obtaining the training lithology feature set from the target feature quantity and the importance ranking.
Once the importance ranking and the final feature quantity m are fixed, the top m features in the ranking are selected, where m is smaller than the total number of features. The selected features form the training lithology feature set.
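The cross-validation sweep in S403 can be sketched as follows. The sketch assumes scikit-learn, k-fold cross-validation with accuracy as the score, and a random forest as the model; the helper name and these settings are assumptions. scikit-learn's `RFECV` performs a related sweep internally. An explicit loop, however, makes the score-versus-feature-count curve (as in fig. 7) easy to inspect.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def cv_scores_by_feature_count(X, y, ranked_idx, cv=5, n_estimators=100, random_state=0):
    """Mean cross-validation accuracy when only the top-k ranked features are kept.

    ranked_idx lists column indices of X from most to least important
    (for example, the RFE ranking from S401). Returns {k: mean score}
    for k = 1 .. len(ranked_idx).
    """
    scores = {}
    for k in range(1, len(ranked_idx) + 1):
        cols = list(ranked_idx[:k])  # top-k features by importance
        model = RandomForestClassifier(n_estimators=n_estimators, random_state=random_state)
        scores[k] = float(cross_val_score(model, X[:, cols], y, cv=cv).mean())
    return scores
```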
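Selecting the top m features in S404 is a slice of the ranked list. `build_training_feature_set` is a hypothetical helper. The feature order in the usage lines is illustrative only, not the ranking in Table 1.

```python
def build_training_feature_set(ranked_features, m):
    """Return the first m entries of an importance-ranked feature list.

    Raises ValueError unless 0 < m < len(ranked_features), mirroring the
    requirement that the training set be smaller than the original set.
    """
    if not 0 < m < len(ranked_features):
        raise ValueError("m must satisfy 0 < m < number of features")
    return list(ranked_features[:m])


# Hypothetical ordering, for illustration only.
ranking = ["DEN", "RT", "T2lm", "GR"]
print(build_training_feature_set(ranking, 3))  # ['DEN', 'RT', 'T2lm']
```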
This embodiment of the application proceeds in four steps:
1. The importance ranking of the original lithology features is obtained using the classification model of recursive feature elimination.
2. Violin diagrams of the different shales are obtained for each feature to reflect the distribution of the feature data.
3. The target feature quantity is obtained from the violin diagrams and the feature-quantity-based cross-validation method.
4. The training lithology feature set is obtained from the target feature quantity and the importance ranking.

Feature elimination produces the importance ranking, cross-validation mainly determines how many features to select, and the violin diagrams serve as an auxiliary criterion for the final target feature quantity. Screening the features step by step in this way avoids the subjectivity of selecting features from charts or plots. It also selects suitable features accurately for the training lithology feature set, which ensures the accuracy of subsequent lithology classification and recognition.
The following describes in detail, with reference to fig. 6, how the target feature quantity is determined in the training method of the lithology recognition model according to the embodiment of the present application.
Fig. 6 is a third flowchart of the training method of the lithology recognition model according to an embodiment of the present application.
As shown in fig. 6, the method includes:
S601, obtaining, by the feature-quantity-based cross-validation method, a first feature quantity and the cross-validation score corresponding to each feature quantity;
where the difference between the cross-validation score for the first feature quantity and that for a second feature quantity is smaller than a preset difference, and the second feature quantity is 1 less than the first feature quantity.
Fig. 7 is a cross-validation score plot for the training method of the lithology recognition model provided by an embodiment of the present application. It shows the cross-validation scores calculated for different numbers of features. Once the number of features reaches 3, the gain in cross-validation score levels off: from 3 features to 4 features onward, the difference between adjacent cross-validation scores is smaller than the preset difference. Here, 3 features may be understood as the first feature quantity and 4 features as the second feature quantity. Alternatively, a first feature quantity of 4 and a second feature quantity of 5 also satisfy the condition that the difference is smaller than the preset difference.
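The plateau test of S601 can be written as a small search over the score curve. This sketch follows the definition in S601 and claim 2, in which the second feature quantity is the first minus 1. The fallback of keeping all features when no plateau is found is an assumption not stated in the application, and the scores in the usage lines are toy values, not those of fig. 7.

```python
def first_feature_quantity(scores, preset_diff):
    """Smallest feature count k whose score differs from that of k - 1 by < preset_diff.

    scores maps feature count -> cross-validation score (e.g. the output of a
    sweep like the one in S403). If no such k exists, all features are kept
    (an assumed fallback).
    """
    for k in sorted(scores):
        if k - 1 in scores and abs(scores[k] - scores[k - 1]) < preset_diff:
            return k
    return max(scores)


# Toy scores for illustration only.
example = {1: 0.60, 2: 0.72, 3: 0.80, 4: 0.805, 5: 0.81}
print(first_feature_quantity(example, 0.01))  # 4
```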
S602, incrementing the first feature quantity by 1 at a time to obtain the third feature quantity.
Following S601, any feature quantity greater than 3 may be understood as a third feature quantity.
S603, for the violin diagrams of the feature added at each increment of 1, determining whether the transverse width or longitudinal length of the violin diagram of at least one shale differs from those of the other shales by more than a preset width difference or length difference; if so, S604 is executed; if not, S605 is executed.
As the violin diagrams in figs. 5a to 5l show, violin diagrams can also reveal feature differences among lithologies. Starting from 3 features and adding one at a time, the violin diagrams are checked to see whether each added feature clearly distinguishes the lithologies, and features are selected accordingly.
S604, continuing to increment by 1 until the difference is smaller than the preset width difference or length difference, and taking the last third feature quantity whose difference exceeded the preset width difference or length difference as the target feature quantity.
The violin diagrams of the features that satisfy the cross-validation criterion are checked in order of importance. Several features clearly distinguish the different lithologies, for example the DEN feature (fig. 5e), the T2lm feature (fig. 5j) and the RT feature (fig. 5i). With 6 features retained, the validation score is also high and stable, so these features can be kept. Compared with the original 12 features, screening halves the number of features, which noticeably reduces the subsequent computational load of the model. Although this embodiment retains 6 features, this number is not fixed and may vary with accuracy requirements and model computation requirements.
S605, taking the first feature quantity as the target feature quantity.
If none of the violin diagrams for feature quantities greater than 3 clearly distinguishes the lithologies (as with the CALI value in fig. 5c), the first feature quantity is retained directly as the target feature quantity.
This embodiment of the application determines the target feature quantity as follows:
1. A first feature quantity and the cross-validation score for each feature quantity are obtained by the feature-quantity-based cross-validation method. The difference between the scores for the first feature quantity and for a second feature quantity (1 less than the first) is smaller than a preset difference.
2. The first feature quantity is incremented by 1 at a time to obtain a third feature quantity.
3. For the violin diagrams of the feature added at each increment, it is determined whether the transverse width or longitudinal length of at least one shale's violin diagram differs from those of the other shales by more than a preset width difference or length difference.
4. If so, incrementing continues until the difference falls below the preset width difference or length difference, and the last third feature quantity whose difference exceeded it is taken as the target feature quantity. If not, the first feature quantity is taken as the target feature quantity.

After feature importance ranking, cross-validation therefore gives a preliminary number and range of features to select. The violin diagrams then fix the final target feature quantity within that range, so that effective features are selected accurately, recognition accuracy is improved, and the model burden is reduced.
The following describes in detail, with reference to fig. 8, how the lithology recognition method according to the embodiment of the present application is implemented.
Fig. 8 is a flowchart of a lithology recognition method according to an embodiment of the present application. As shown in fig. 8, the method includes:
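Steps S602 to S605 reduce to one loop, sketched below under stated assumptions. `is_distinct` is a hypothetical callback standing in for the violin-diagram comparison of S603. It could, for example, be built on width and length measures such as a kernel density peak and the data range. The preset width and length differences it would use are not given numerically in this application.

```python
def target_feature_quantity(first_k, total, is_distinct):
    """Extend the first feature quantity while violin diagrams stay discriminative.

    first_k: first feature quantity from S601.
    total: number of ranked features available.
    is_distinct(k): hypothetical callback; True if the violin diagrams of the
        k-th ranked feature (1-based) show at least one lithology whose width
        or length differs from the others by more than the preset threshold.
    """
    k = first_k
    # S602/S603: test feature k + 1; S604: keep going while it separates lithologies.
    while k < total and is_distinct(k + 1):
        k += 1
    # If the very first increment fails, k is still first_k, which is S605.
    return k


# Illustration: features 4 to 6 are discriminative, feature 7 is not.
print(target_feature_quantity(3, 12, lambda k: k <= 6))  # 6
```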
S801, acquiring lithology feature data.
The target feature quantity and the corresponding feature types are determined during model training. Lithology feature data of the shale are then acquired for those feature types and used to detect the shale lithology in practice.
S802, inputting the lithology feature data into the lithology recognition model to obtain the lithology recognition result output by the model.
The acquired data are input into the trained lithology recognition model to obtain the recognition result.
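Below is a minimal sketch of training on the selected features and then recognizing new samples (S801 and S802). It assumes scikit-learn and assumes that new samples are standardized with the same statistics as the training data. `train_and_recognize`, the column indices and the lithology labels in the test are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def train_and_recognize(X_train, y_train, X_new, selected_idx, n_estimators=100, random_state=0):
    """Fit a random forest on the selected columns and return predicted lithologies.

    X_train and X_new are assumed to be standardized with the same statistics;
    selected_idx are the column indices of the training lithology feature set.
    """
    cols = list(selected_idx)
    model = RandomForestClassifier(n_estimators=n_estimators, random_state=random_state)
    model.fit(X_train[:, cols], y_train)  # train on the selected features only
    return model.predict(X_new[:, cols])  # S802: recognize lithology of new samples
```

Only the selected columns are used at prediction time, so new logs need to supply just those feature types.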
Table 2 compares the performance of the method of the present application with other recognition models. RF denotes the random forest algorithm, RFE recursive feature elimination, and SVM the support vector machine.

Accuracy:
- The RF-RFE model achieves the highest lithology recognition accuracy, 83.20%.
- RF and SVM achieve similar accuracies of 80.68% and 80.30%, respectively.
- More importantly, RF-RFE achieves this with only 6 parameters. This indicates that the other 6 parameters contain redundant information that interferes with lithology recognition, a problem the RF-RFE model avoids.
- The accuracy gain is nevertheless limited, which is determined by the characteristics of the data volume itself.

Run time:
- RF-RFE is the fastest at only 0.1825 s, while RF without RFE is the slowest at 0.2134 s.
- The main reason is that RF-RFE uses only 6 parameters, which halves the data volume and improves computational efficiency.
- Because the available data volume is small, all three models run quickly. When the volume and dimensionality of the logging data are large enough, the advantage of RF-RFE, which optimizes the feature parameters first, would become more evident.
Table 2 Performance comparison
Method                       Accuracy (%)   Run time (s)
RF-RFE (this application)    83.20          0.1825
RF                           80.68          0.2134
SVM                          80.30          0.1935
In this embodiment of the application, lithology feature data are acquired and input into the lithology recognition model to obtain the lithology recognition result output by the model. Because the method uses the lithology recognition model trained as described in the embodiments of this application, lithology can be recognized accurately and quickly.
The present application also provides an electronic device including: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored by the memory, causing the at least one processor to perform the method described above.
Fig. 9 is a schematic hardware diagram of an electronic device according to an embodiment of the present application. As shown in fig. 9, the device 90 provided in this embodiment includes at least one processor 901 and a memory 902. The device 90 further comprises a communication unit 903. The processor 901, the memory 902 and the communication unit 903 are connected via a bus 904.
In a specific implementation, at least one processor 901 executes computer-executable instructions stored in the memory 902, such that the at least one processor 901 performs a method as described above.
The specific implementation process of the processor 901 may refer to the above-mentioned method embodiment, and its implementation principle and technical effects are similar, and this embodiment will not be described herein again.
In the embodiment shown in fig. 9, the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), or the like. A general-purpose processor may be a microprocessor or any conventional processor. The steps of a method disclosed in connection with the present application may be executed directly by a hardware processor, or by a combination of hardware and software modules in a processor.
The memory may comprise high-speed random access memory (RAM), and may further comprise non-volatile memory (NVM), such as at least one magnetic disk memory.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. Buses may be divided into address buses, data buses, control buses, and so on. For ease of illustration, the bus in the drawings of the present application is not limited to only one bus or one type of bus.
The present application also provides a computer-readable storage medium having stored therein computer-executable instructions that, when executed by a processor, implement the method as described above.
The computer readable storage medium described above may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk. A readable storage medium can be any available medium that can be accessed by a general purpose or special purpose computer.
An exemplary readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. Alternatively, the readable storage medium may be integral to the processor. The processor and the readable storage medium may reside in an application-specific integrated circuit (ASIC), or may reside as discrete components in a device.
The division of the units is merely a logic function division, and there may be another division manner when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented as software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present application may be embodied as a software product stored in a storage medium. This may cover the solution in essence, the part of it that contributes to the prior art, or all or part of it. The software product comprises several instructions that cause a computer device (such as a personal computer, a server or a network device) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the method embodiments described above may be performed by hardware associated with program instructions. The foregoing program may be stored in a computer readable storage medium. The program, when executed, performs steps including the method embodiments described above; and the aforementioned storage medium includes: various media that can store program code, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses or adaptations of the application that follow its general principles, including such departures from the present disclosure as come within known or customary practice in the art to which the application pertains. The application is not limited to the precise construction described above and shown in the drawings, and its scope is limited only by the appended claims.

Claims (8)

1. A training method for a shale lithology recognition model based on recursive feature elimination fused with random forest, characterized by comprising the following steps:
acquiring feature data of an original lithology feature set according to the original logging data and the nuclear magnetic resonance logging data;
performing recursive feature elimination processing based on the feature data of the original lithology feature set to obtain a training lithology feature set, wherein the feature quantity in the training lithology feature set is smaller than that in the original lithology feature set;
training a lithology recognition model according to the feature data corresponding to the training lithology feature set to obtain a lithology recognition model for recognizing the lithology of shale, wherein the lithology recognition model is a random forest model;
wherein performing recursive feature elimination processing based on the feature data of the original lithology feature set to obtain the training lithology feature set comprises:
acquiring an importance ranking of the original lithology features in the original lithology feature set using the classification model of recursive feature elimination;
acquiring violin diagrams of the different shales for each feature, wherein the violin diagrams reflect the distribution of the feature data;
acquiring target feature quantity according to the violin diagram and a cross verification method based on the feature quantity;
and acquiring the training lithology characteristic set according to the target characteristic quantity and the importance sequence.
2. The method of claim 1, wherein the obtaining the target feature quantity according to the violin map and the feature quantity-based cross-validation method comprises:
acquiring, by a cross-validation method based on the feature quantity, a first feature quantity and the cross-validation score corresponding to each feature quantity, wherein the difference between the cross-validation score corresponding to the first feature quantity and that corresponding to a second feature quantity is smaller than a preset difference, and the second feature quantity is 1 less than the first feature quantity;
and selecting the target feature quantity from the first feature quantity or a third feature quantity according to the violin diagrams of the different shales for each feature, wherein the third feature quantity is larger than the first feature quantity.
3. The method of claim 2, wherein selecting the target feature quantity from the first feature quantity or the third feature quantity according to the violin diagrams of the different shales for each feature comprises:
incrementing the first feature quantity by 1 at a time to obtain the third feature quantity;
for the violin diagrams of the feature data corresponding to each increment of 1, if the difference between the transverse width or longitudinal length of the violin diagram of at least one shale and those of the other shales is larger than the preset width difference or length difference, continuing to increment by 1 until the difference is smaller than the preset width difference or length difference, and taking the last third feature quantity whose difference was larger than the preset width difference or length difference as the target feature quantity;
and for the violin diagrams of the feature data corresponding to the increment of 1, if the difference between the transverse width or longitudinal length of the violin diagram of at least one shale and those of the other shales is not larger than the preset width difference or length difference, taking the first feature quantity as the target feature quantity.
4. The method of claim 1, wherein the original lithology features in the original lithology feature set corresponding to the NMR logging data comprise a structural index, a skeleton density index and a T2 geometric mean.
5. The method of claim 1, wherein training the lithology recognition model according to the feature data corresponding to the training lithology feature set to obtain a lithology recognition model for recognizing lithology of shale comprises:
acquiring a target number of decision trees by a cross-validation method over the number of decision trees;
and training the lithology recognition model based on the target number of decision trees and the feature data corresponding to the training lithology feature set to obtain a lithology recognition model for recognizing the lithology of shale.
6. A shale lithology recognition method based on recursive feature elimination fusion random forest, comprising:
acquiring lithology characteristic data;
inputting the lithology characteristic data into a lithology recognition model to obtain a lithology recognition result output by the lithology recognition model, wherein the lithology recognition model is obtained through training by the training method according to any one of claims 1 to 5.
7. An electronic device, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executing computer-executable instructions stored in the memory causes the at least one processor to perform the method of any one of claims 1-6.
8. A computer readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, implements the method according to any of claims 1-6.
CN202210894158.2A 2022-07-27 2022-07-27 Shale lithology recognition method and device based on recursion feature elimination fusion random forest Active CN115270959B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210894158.2A CN115270959B (en) 2022-07-27 2022-07-27 Shale lithology recognition method and device based on recursion feature elimination fusion random forest

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210894158.2A CN115270959B (en) 2022-07-27 2022-07-27 Shale lithology recognition method and device based on recursion feature elimination fusion random forest

Publications (2)

Publication Number Publication Date
CN115270959A CN115270959A (en) 2022-11-01
CN115270959B true CN115270959B (en) 2023-08-22

Family

ID=83770983

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210894158.2A Active CN115270959B (en) 2022-07-27 2022-07-27 Shale lithology recognition method and device based on recursion feature elimination fusion random forest

Country Status (1)

Country Link
CN (1) CN115270959B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117079059B (en) * 2023-10-13 2023-12-19 Yunnan Normal University Tree species automatic classification method based on multi-source satellite image

Citations (3)

Publication number Priority date Publication date Assignee Title
CN109919184A * 2019-01-28 2019-06-21 China University of Petroleum (Beijing) Multi-well complex lithology intelligent identification method and system based on logging data
CN110992200A * 2019-12-11 2020-04-10 Yangtze University Shale gas well staged fracturing effect evaluation and yield prediction method based on random forest
CN113361638A * 2021-07-01 2021-09-07 China University of Petroleum (Beijing) Complex reservoir lithology identification method, device, equipment and storage medium

Non-Patent Citations (1)

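Title
Li Hengkai et al. Random forest classification of land use in southern hilly and mountainous areas based on multi-source data. Transactions of the Chinese Society of Agricultural Engineering, 2021, vol. 37, main text sections 1-2. *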

Also Published As

Publication number Publication date
CN115270959A (en) 2022-11-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant