CN113344359A - Method for quantitatively evaluating quality master control factors of tight sandstone gas reservoir based on random forest - Google Patents

Info

Publication number
CN113344359A
Authority
CN
China
Prior art keywords
factors
influence
data
main control
random forest
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110599248.4A
Other languages
Chinese (zh)
Inventor
甄艳
康锦涛
赵晓明
葛家旺
周雪松
代茂林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest Petroleum University
Original Assignee
Southwest Petroleum University
Application filed by Southwest Petroleum University
Priority to CN202110599248.4A
Publication of CN113344359A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395 Quality analysis or management
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01V GEOPHYSICS; GRAVITATIONAL MEASUREMENTS; DETECTING MASSES OR OBJECTS; TAGS
    • G01V9/00 Prospecting or detecting by methods not provided for in groups G01V1/00 - G01V8/00
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Tourism & Hospitality (AREA)
  • Health & Medical Sciences (AREA)
  • General Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Medical Informatics (AREA)
  • General Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Operations Research (AREA)

Abstract

The invention discloses a random-forest-based method for quantitatively evaluating the main controlling factors of tight sandstone gas reservoir quality, comprising the following steps: S1: collecting the influencing factors that affect reservoir quality in the study area, and processing the parameters according to the parameter type of each influencing factor; S2: saving the processed parameters as a comma-separated value file; S3: taking the characterization parameter of reservoir quality as the dependent variable and the influencing factors as the independent variables; S4: based on the dependent and independent variables, drawing training data by random sampling with replacement using the random forest algorithm, and constructing decision trees and a random forest; S5: calculating, for each influencing factor, the out-of-bag data error of the decision trees, and selecting the main controlling factors according to the out-of-bag data error. By using the out-of-bag data error, the method quantitatively evaluates the main controlling factors of tight sandstone gas reservoir quality and provides a geological basis for subsequent gas reservoir description and for improving field development performance.

Description

Method for quantitatively evaluating the main controlling factors of tight sandstone gas reservoir quality based on random forest
Technical Field
The invention relates to the technical field of unconventional oil and gas reservoir development, in particular to a random-forest-based method for quantitatively evaluating the main controlling factors of tight sandstone gas reservoir quality.
Background
At present, the world has entered a stage in which conventional oil and gas production is stable or rising and unconventional oil and gas are developing rapidly, and tight sandstone gas has become a key direction of unconventional natural gas development. Extensive development practice shows that effective evaluation of the main controlling factors of tight sandstone gas reservoir quality is a key basic problem for large-scale, profitable gas reservoir development.
The current methods for evaluating the main controlling factors of reservoir quality fall into four categories: first, evaluating tight sandstone reservoir quality using sedimentology and well-log petrophysics; second, studying and discriminating reservoir quality by core observation, thin-section identification, physical property analysis, and similar means; third, combining well logging, core observation, laboratory analysis, well testing, and production test data to establish a reservoir evaluation index system and comprehensively analyze the factors influencing reservoir quality; and fourth, establishing reservoir evaluation methods and criteria by numerical simulation based on reservoir characteristics, in combination with logging data.
These methods are mainly single-factor qualitative or semi-quantitative evaluations constrained by human experience. Development practice, however, shows that tight sandstone reservoir quality is jointly controlled by multiple factors such as sedimentation, diagenesis, and tectonics. Because the existing methods focus on qualitative or semi-quantitative analysis of single-factor parameters, their results are subjective and not generally applicable, so the quality of a tight sandstone gas reservoir is defined unclearly and inaccurately, which affects development plan design and well placement. In strongly heterogeneous reservoirs in particular, permeability cannot be calculated with high precision from pore structure and pore-throat radius. A quantitative evaluation method that accounts for multiple factors jointly is therefore urgently needed.
Disclosure of Invention
In order to solve the above problems, the invention aims to provide a random-forest-based method for quantitatively evaluating the main controlling factors of tight sandstone gas reservoir quality.
The technical scheme of the invention is as follows:
A method for quantitatively evaluating the main controlling factors of tight sandstone gas reservoir quality based on a random forest comprises the following steps:
S1: collecting the influencing factors that affect reservoir quality in the study area, and processing the parameters according to the parameter type of each influencing factor:
if the influencing factor is a numerical parameter, checking whether any data are missing; if a data parameter is missing, removing all influencing factors of the same batch; if nothing is missing, retaining the data;
if the influencing factor is a text parameter, assigning numerical values to the text parameter;
S2: saving the processed parameters as a comma-separated value file;
S3: taking the characterization parameter of reservoir quality as the dependent variable of the reservoir quality analysis, and the influencing factors as the independent variables;
S4: based on the dependent and independent variables, drawing training data by random sampling with replacement using the random forest algorithm, and constructing decision trees and a random forest;
S5: calculating, for each influencing factor, the out-of-bag data error of the decision trees, and selecting the main controlling factors according to the out-of-bag data error.
Preferably, in step S1, the influencing factors include sedimentary, diagenetic, and tectonic influencing factors.
Preferably, the sedimentary influencing factors comprise grain-size lithology type, mineral lithology type, sorting, roundness, grain-size parameters, contents of different types of mineral grains, matrix content, and primary pore content; the diagenetic influencing factors comprise cementation type, contents of different types of cement, contents of different types of dissolution pores, contents of different types of replacement minerals, and compaction strength; the tectonic influencing factors comprise fracture type and contents of different types of fractures.
Preferably, in step S3, the characterization parameter of reservoir quality is the flow zone indicator (FZI).
Preferably, step S4 specifically includes the following sub-steps:
S41: using the random forest algorithm, taking the processed data set as the input sample data set and drawing from it at random with replacement to form subset data, the number of draws being equal to the number of samples; the subset data obtained by sampling are used to construct a decision tree;
S42: randomly drawing some of the influencing factors from the subset data to form a candidate split set, selecting one influencing factor from the candidate split set as a split point of the decision tree according to the minimum node impurity principle, and continuing to split by the same principle until all samples of the node reach leaf nodes, at which point splitting ends;
S43: repeating steps S41 and S42 to build a plurality of decision trees, which together form the random forest.
Preferably, step S5 specifically includes the following sub-steps:
S51: for a given decision tree, feeding all influencing factors of the out-of-bag data into the constructed random forest and calculating the predicted value of each out-of-bag sample;
S52: calculating mean square error I between the predicted value and the true value of each out-of-bag sample;
S53: selecting one influencing factor in the out-of-bag data, randomly adding noise to it, feeding the data into the random forest again, and calculating the predicted values after the noise is added; calculating mean square error II between these predicted values and the true values;
S54: judging the importance of the influencing factor from its mean square error I and mean square error II:
if, after random noise is added, mean square error II is larger than mean square error I and the difference between them exceeds a threshold, the influencing factor is important; otherwise it is unimportant;
S55: repeating steps S53 and S54 to judge all remaining influencing factors in the out-of-bag data;
S56: repeating steps S51 to S55 for each remaining decision tree in the random forest, calculating the out-of-bag data error of each influencing factor, taking the mean of the out-of-bag data errors as the importance value of each influencing factor, and sorting the importance values in descending order; the top-ranked influencing factors are the main controlling factors affecting reservoir quality.
Preferably, the method further comprises the following steps:
S6: rebuilding the decision trees and the random forest using the main controlling factors selected by out-of-bag data error;
S7: calculating the out-of-bag data errors of the selected main controlling factors, and taking the mean of the out-of-bag data errors of each main controlling factor as its importance value;
S8: calculating the percentage of each selected main controlling factor's importance value in the sum of the importance values of all selected main controlling factors, thereby quantitatively characterizing the degree of influence of each main controlling factor on reservoir quality.
The invention has the beneficial effects that:
The method uses the random forest algorithm to quantitatively evaluate the influencing factors against the characterization parameter reflecting reservoir quality, thereby extracting the key factors that control reservoir quality, clarifying the sedimentation and diagenesis of the reservoir, and effectively identifying the formation and distribution of high-quality tight sandstone reservoirs. Traditional methods determine the main controlling factors mainly through human experience, numerical simulation, scanning electron microscope analysis, core test analysis, and similar means, and cannot quantitatively evaluate the importance of each influencing factor. Compared with them, this method applies the random forest algorithm to the evaluation of the main controlling factors of tight sandstone gas reservoir quality, and can both screen out the main controlling factors automatically and evaluate them quantitatively. The results are general, objective, and accurate, and provide a geological basis for subsequent gas reservoir description and for improving field development performance.
Drawings
In order to illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the invention; a person skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of the method for quantitatively evaluating the main controlling factors of tight sandstone gas reservoir quality according to the invention;
FIG. 2 is a schematic flow chart of data processing according to the invention;
FIG. 3 is a schematic flow chart of selecting the dependent and independent variables according to the invention;
FIG. 4 is a schematic flow chart of constructing the decision trees and the random forest according to the invention;
FIG. 5 is a schematic flow chart of quantitatively evaluating the main controlling factors of reservoir quality according to the invention.
Detailed Description
The invention is further illustrated with reference to the following figures and examples. It should be noted that, in the present application, the embodiments and the technical features of the embodiments may be combined with each other without conflict. It is noted that, unless otherwise indicated, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The use of the terms "comprising" or "including" and the like in the present disclosure is intended to mean that the elements or items listed before the term cover the elements or items listed after the term and their equivalents, but not to exclude other elements or items.
As shown in FIGS. 1 to 5, the invention provides a random-forest-based method for quantitatively evaluating the main controlling factors of tight sandstone gas reservoir quality, comprising the following steps:
S1: collecting the influencing factors that affect reservoir quality in the study area, including sedimentary, diagenetic, and tectonic influencing factors.
In a specific embodiment, the sedimentary influencing factors comprise grain-size lithology type, mineral lithology type, sorting, roundness, grain-size parameters, contents of different types of mineral grains, matrix content, and primary pore content; the diagenetic influencing factors comprise cementation type, contents of different types of cement, contents of different types of dissolution pores, contents of different types of replacement minerals, and compaction strength; the tectonic influencing factors comprise fracture type and contents of different types of fractures. It should be noted that the influencing factors may differ between study areas: in another study area some of the factors listed above may not apply, and other factors may be present.
S2: processing the parameters according to the parameter type of each influencing factor, and saving the processed parameters as a comma-separated value file (.csv) suitable for reading in the R language; the specific processing method is as follows:
If the influencing factor is a numerical parameter, check whether any data are missing. If a data parameter is missing, remove all influencing factors of the same batch; if nothing is missing, retain the data. For example, suppose there are X influencing factors and Y batches of tests, giving X·Y data values; if the i-th batch yields data for only X-j (j ≥ 1) influencing factors, all data of the i-th batch are removed, leaving X·(Y-1) data values. This ensures the completeness of the data and improves the calculation accuracy of the subsequent steps.
If the influencing factor is a text parameter, assign numerical values to it. For example, the sorting grades "poor", "medium", and "good" are assigned 0, 1, and 2 respectively, and the support types "grain-supported" and "matrix-supported" are assigned 0 and 1, so that the model can process the text parameters. It should be noted that the magnitude of the values assigned to text parameters has no influence on the result.
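The processing in step S2 can be sketched as follows. This is a minimal illustration using only the Python standard library, not the R workflow the description refers to; the record layout, the field names (`sorting`, `support_type`, `quartz_pct`, `primary_pore_pct`) and the sample values are hypothetical.

```python
import csv
import io

# Hypothetical raw records: one dict per test batch; None marks a missing value.
raw_records = [
    {"sorting": "good", "support_type": "grain", "quartz_pct": 62.0, "primary_pore_pct": 3.1},
    {"sorting": "medium", "support_type": "matrix", "quartz_pct": None, "primary_pore_pct": 1.2},
    {"sorting": "poor", "support_type": "matrix", "quartz_pct": 55.5, "primary_pore_pct": 0.4},
]

# Assumed code tables for text parameters, following the example values in the text.
TEXT_CODES = {
    "sorting": {"poor": 0, "medium": 1, "good": 2},
    "support_type": {"grain": 0, "matrix": 1},
}


def preprocess(records):
    """Drop every batch with a missing value, then encode text parameters as integers."""
    cleaned = []
    for rec in records:
        if any(value is None for value in rec.values()):
            continue  # one missing factor removes the whole batch
        cleaned.append(
            {key: TEXT_CODES[key][value] if key in TEXT_CODES else value
             for key, value in rec.items()}
        )
    return cleaned


def to_csv_text(records):
    """Serialize the processed records as comma-separated values."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


processed = preprocess(raw_records)
csv_text = to_csv_text(processed)
```

Writing `csv_text` to a file with a `.csv` extension would give a table that R (or any other tool) can read directly.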
S3: taking the characterization parameter of reservoir quality as the dependent variable of the reservoir quality analysis, and the influencing factors as the independent variables; the characterization parameter of reservoir quality is determined by the following steps:
(1) collecting and organizing production test data, including oil production, gas production, water production, and liquid production;
(2) collecting and organizing (or calculating) parameter data reflecting reservoir quality, including porosity, permeability, flow zone indicator (FZI), and pore-throat structure; the flow zone indicator (FZI) is calculated by the following formulas:
[Three equation images (BDA0003092318270000051 to BDA0003092318270000053) are not reproduced in this text.]
in the formulas: FZI is the flow zone indicator, dimensionless; RQI is the reservoir quality index, dimensionless; K is the permeability, D; φ is the porosity, %; φz is the ratio of pore volume to grain volume.
(3) taking the production test data as the standard for evaluating reservoir quality, analyzing the correlation between the production test data and the porosity, permeability, flow zone indicator (FZI), and pore-throat structure parameters, and determining the parameter that best reflects reservoir quality as the characterization parameter of reservoir quality.
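The equations for FZI in item (2) appear only as images in this text. A commonly used formulation, assumed here rather than confirmed by the source, is RQI = 0.0314·sqrt(K/φ), φz = φ/(1-φ), and FZI = RQI/φz, with K in millidarcies and φ as a fraction. The sketch below follows that assumption; the constant 0.0314, the units, and the function name are not taken from the patent.

```python
import math


def flow_zone_indicator(permeability_md, porosity_fraction):
    """Return (RQI, phi_z, FZI) under the assumed formulation.

    Assumptions (not from the source): permeability in millidarcies,
    porosity as a fraction between 0 and 1, and the constant 0.0314.
    """
    if not 0.0 < porosity_fraction < 1.0:
        raise ValueError("porosity must be a fraction strictly between 0 and 1")
    if permeability_md < 0.0:
        raise ValueError("permeability must be non-negative")
    rqi = 0.0314 * math.sqrt(permeability_md / porosity_fraction)  # reservoir quality index
    phi_z = porosity_fraction / (1.0 - porosity_fraction)  # pore volume / grain volume
    return rqi, phi_z, rqi / phi_z


# Illustrative input values only, not measurements from the study area.
rqi, phi_z, fzi = flow_zone_indicator(permeability_md=0.4, porosity_fraction=0.10)
```

If the permeability and porosity in a data set use other units (for example, porosity in percent), they would need converting before calling the function.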
In a specific embodiment, the above method is used to determine the characterization parameter of reservoir quality, and the flow zone indicator (FZI) is finally selected as the characterization parameter.
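The selection described above can be sketched as a correlation ranking. The source does not name the correlation measure, so the Pearson coefficient is an assumption, and all numbers below are hypothetical toy values chosen only to show the mechanics.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical production test data and candidate reservoir-quality parameters.
gas_rate = [1.0, 2.0, 3.0, 4.0, 5.0]
candidates = {
    "porosity": [8.0, 9.5, 8.5, 10.0, 9.0],
    "permeability": [0.1, 0.3, 0.2, 0.4, 0.35],
    "FZI": [0.2, 0.4, 0.6, 0.8, 1.1],
}

# Rank candidates by the strength of their correlation with production.
scores = {name: abs(pearson(values, gas_rate)) for name, values in candidates.items()}
best = max(scores, key=scores.get)
```

With these toy values the strongest correlation belongs to `FZI`; real data may of course rank the candidates differently.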
S4: based on the dependent and independent variables, drawing training data by random sampling with replacement using the random forest algorithm, and constructing decision trees and a random forest; this specifically includes the following sub-steps:
S41: using the random forest algorithm, taking the processed data set as the input sample data set and drawing from it at random with replacement to form subset data, the number of draws being equal to the number of samples; the subset data obtained by sampling are used to construct a decision tree.
S42: randomly drawing some of the influencing factors from the subset data to form a candidate split set, selecting one influencing factor from the candidate split set as a split point of the decision tree according to the minimum node impurity principle, and continuing to split by the same principle until all samples of the node reach leaf nodes, at which point splitting ends.
The minimum node impurity principle is the minimum Gini coefficient principle: node impurity is characterized by the Gini coefficient calculated at each candidate split point, and the candidate with the smallest Gini coefficient is selected as the split node, which gives the lowest node impurity after splitting.
In building the decision tree, every node is split in this way. For example, if in some subset the randomly drawn influencing factors are primary intergranular pores, main grain size, and moldic pores, their Gini coefficients are calculated and primary intergranular pores is selected as the split node; splitting ends when all samples of the node reach leaf nodes.
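The split selection described above can be sketched as follows. The Gini coefficient is defined on class labels, so this sketch assumes the dependent variable has been grouped into hypothetical quality classes ("low", "high"); for a continuous target such as FZI, implementations commonly use a variance-based impurity instead. The factor names and values are illustrative only.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure or empty node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def best_split(rows, labels, candidate_features):
    """Return (weighted_gini, feature, threshold) with the lowest weighted Gini."""
    best = None
    for feature in candidate_features:
        values = sorted({row[feature] for row in rows})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2  # midpoint between neighboring values
            left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best


# Hypothetical samples; the three factors mirror the example in the text.
rows = [
    {"primary_pore_pct": 0.5, "grain_size_mm": 0.20, "moldic_pore_pct": 0.1},
    {"primary_pore_pct": 0.8, "grain_size_mm": 0.35, "moldic_pore_pct": 0.3},
    {"primary_pore_pct": 2.9, "grain_size_mm": 0.22, "moldic_pore_pct": 0.2},
    {"primary_pore_pct": 3.4, "grain_size_mm": 0.30, "moldic_pore_pct": 0.4},
]
labels = ["low", "low", "high", "high"]  # hypothetical reservoir-quality classes

# Stands in for the randomly drawn candidate split set.
candidate_features = ["primary_pore_pct", "grain_size_mm", "moldic_pore_pct"]
score, feature, threshold = best_split(rows, labels, candidate_features)
```

In this toy data only the primary pore content separates the two classes cleanly, so it is chosen as the split node, in line with the example in the text.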
S43: repeating steps S41 and S42 to build a plurality of decision trees, which together form the random forest.
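The sampling in step S41, repeated once per tree as in step S43, can be sketched as below. Samples that are never drawn for a given tree form that tree's out-of-bag set; for n draws with replacement the chance that a given sample is never drawn is (1 - 1/n)^n, which approaches about 0.368 for large n. The sample count of 340 matches the embodiment described later; the number of trees and the seed are arbitrary choices for illustration.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Draw n_samples indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


n_samples = 340
rng = random.Random(0)  # fixed seed only to make the sketch repeatable

# One bootstrap sample per decision tree (three trees here for brevity).
samples = [bootstrap_indices(n_samples, rng) for _ in range(3)]
in_bag, out_of_bag = samples[0]

oob_fraction = len(out_of_bag) / n_samples
expected_fraction = (1 - 1 / n_samples) ** n_samples  # chance a sample is never drawn
```

Each tree would then be grown on the rows indexed by its `in_bag` list and evaluated on the rows in its `out_of_bag` list.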
S5: calculating, for each influencing factor, the out-of-bag data error of the decision trees, and selecting the main controlling factors according to the out-of-bag data error.
Because the decision trees are built by random sampling with replacement, part of the sample data is never drawn for a given tree; these samples are called out-of-bag data. The error between the value a decision tree predicts for the out-of-bag data and their true value is the out-of-bag data error. Step S5 specifically includes the following sub-steps:
S51: for a given decision tree, feeding all influencing factors of the out-of-bag data into the constructed random forest and calculating the predicted value of each out-of-bag sample;
S52: calculating mean square error I between the predicted value and the true value of each out-of-bag sample;
S53: selecting one influencing factor in the out-of-bag data, randomly adding noise to it, feeding the data into the random forest again, and calculating the predicted values after the noise is added; calculating mean square error II between these predicted values and the true values;
S54: judging the importance of the influencing factor from its mean square error I and mean square error II:
if, after random noise is added, mean square error II is larger than mean square error I and the difference between them exceeds a threshold, the influencing factor is important; otherwise it is unimportant;
S55: repeating steps S53 and S54 to judge all remaining influencing factors in the out-of-bag data;
S56: repeating steps S51 to S55 for each remaining decision tree in the random forest, calculating the out-of-bag data error of each influencing factor, taking the mean of the out-of-bag data errors as the importance value of each influencing factor, and sorting the importance values in descending order; the top-ranked influencing factors are the main controlling factors affecting reservoir quality.
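Steps S51 to S56 can be sketched as below. The source only says that noise is added at random; this sketch injects it by shuffling the factor's values among the out-of-bag samples, which is the usual permutation approach and is an assumption here. The toy "trees", their out-of-bag index lists, the threshold value, and all data are hypothetical stand-ins for a fitted forest.

```python
import random


def mse(predictions, targets):
    """Mean square error between predictions and true values."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def oob_importance(trees, rows, targets, features, rng):
    """Average increase in out-of-bag error after perturbing each factor.

    trees: list of (predict, oob_indices); predict maps a row dict to a number.
    """
    totals = {name: 0.0 for name in features}
    for predict, oob in trees:
        oob_rows = [rows[i] for i in oob]
        oob_targets = [targets[i] for i in oob]
        error_1 = mse([predict(r) for r in oob_rows], oob_targets)  # mean square error I
        for name in features:
            shuffled = [r[name] for r in oob_rows]
            rng.shuffle(shuffled)  # noise: permute this factor among OOB samples
            noisy_rows = [dict(r, **{name: v}) for r, v in zip(oob_rows, shuffled)]
            error_2 = mse([predict(r) for r in noisy_rows], oob_targets)  # mean square error II
            totals[name] += error_2 - error_1
    return {name: total / len(trees) for name, total in totals.items()}


# Hypothetical data: the target depends only on primary pore content.
rows = [{"primary_pore_pct": float(i), "quartz_pct": float((i * 7) % 5)} for i in range(10)]
targets = [2.0 * r["primary_pore_pct"] for r in rows]


def toy_tree(row):
    """Stand-in for a fitted decision tree; it ignores quartz_pct entirely."""
    return 2.0 * row["primary_pore_pct"]


trees = [(toy_tree, [0, 3, 5, 8]), (toy_tree, [1, 2, 6, 9])]
features = ["primary_pore_pct", "quartz_pct"]
importance = oob_importance(trees, rows, targets, features, random.Random(0))

ranking = sorted(importance, key=importance.get, reverse=True)  # descending order
threshold = 0.0  # hypothetical threshold for step S54
main_factors = [name for name in ranking if importance[name] > threshold]
```

A factor the trees never use, such as `quartz_pct` here, shows no change in error when shuffled, so its importance value is zero and it is not selected.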
In a specific embodiment, the method for quantitatively evaluating the main controlling factors of reservoir quality further comprises the following steps:
S6: rebuilding the decision trees and the random forest using the main controlling factors selected by out-of-bag data error;
S7: calculating the out-of-bag data errors of the selected main controlling factors, and taking the mean of the out-of-bag data errors of each main controlling factor as its importance value;
S8: calculating the percentage of each selected main controlling factor's importance value in the sum of the importance values of all selected main controlling factors, thereby quantitatively characterizing the degree of influence of each main controlling factor on reservoir quality.
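The normalization in step S8 can be sketched as follows; the factor names and importance values are hypothetical and serve only to show the arithmetic.

```python
def importance_percentages(importance):
    """Convert importance values to percentages of their total."""
    total = sum(importance.values())
    if total <= 0:
        raise ValueError("importance values must sum to a positive number")
    return {name: 100.0 * value / total for name, value in importance.items()}


# Hypothetical importance values of three selected main controlling factors.
importance = {"primary_pore_pct": 6.0, "grain_size_mm": 3.0, "moldic_pore_pct": 1.0}
shares = importance_percentages(importance)
```

The resulting percentages sum to 100 and can be read directly as the relative degree of influence of each selected factor.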
In a specific embodiment, a tight sandstone gas reservoir study area is taken as an example. The study area is the Huangyan structure in the Xihu Sag of the East China Sea Basin, and the target interval is the lower member of the Oligocene Huagang Formation. Structurally, the area lies in the central-southern part of the central inversion tectonic belt of the Xihu Sag, East China Sea Shelf Basin. It is an NE-SW-trending anticlinal structure with relatively gentle strata; Huangyan 1-1 mainly develops anticlinal traps, and Huangyan 2-2 mainly develops low-amplitude anticlines and faulted anticline groups on a secondary compressional belt. The Huagang Formation was deposited during the Oligocene fault-depression transition stage and mainly develops a shallow lake-delta depositional system in a continental setting. Its thickness is between 1000 and 2000 m, and the lower member is thinner overall than the upper member. The main lithologies are subarkose, sublitharenite, feldspathic litharenite, and lithic arkose. Porosity ranges from 2.1% to 12.5% and is mainly concentrated between 8% and 10%. Permeability ranges from 0.02 to 22.72 mD; apart from one fractured interval with higher permeability, it is below 2 mD and mainly concentrated between 0.1 and 0.4 mD. The reservoir is a typical low-porosity, low-permeability tight sandstone reservoir.
Early development practice shows that although the sand bodies in the study area are vertically thick and laterally continuous, reservoir heterogeneity is extremely strong, and productivity differs greatly between intervals of the same development well and between adjacent development wells: some intervals produce tens of thousands of cubic meters of gas, while others have no productivity at all. The key to the production problem is therefore the difference in reservoir quality, and understanding the main controlling factors of reservoir quality is the core basic problem for achieving efficient development of the tight sandstone gas reservoir in the study area.
The method for quantitatively evaluating the main controlling factors of reservoir quality in the study area comprises the following steps:
The first step: collecting the sedimentary, diagenetic, and tectonic parameters that control reservoir quality.
(1) collecting core description and laboratory analysis data, comprising: 636 records of sedimentary parameters, such as grain-size lithology type, mineral lithology type, sorting, roundness, contents of different types of mineral grains, matrix content, grain-size parameters (mean, standard deviation, skewness, and kurtosis), percentage contents of different types of minerals, and primary pore content; 345 records of diagenetic parameters, such as cementation type, contents of different types of cement (siliceous cement, calcareous cement, argillaceous cement, and iron ore contents), contents of different types of dissolution pores (intergranular dissolution pores, intragranular dissolution pores, and moldic pores), contents of different types of clay minerals, contents of different types of replacement minerals, and compaction strength; and 640 records of tectonic parameters, such as fracture type and contents of different types of fractures;
(2) for text parameters, classifying the text data and assigning numerical values; for example, the sorting grades "poor", "medium", and "good" are assigned 0, 1, and 2 respectively, and the support types "grain-supported" and "matrix-supported" are assigned 0 and 1;
(3) following the above steps, samples with many missing parameters are removed, leaving 340 samples; the data set is denoted Q and saved as a comma-separated value file (.csv) suitable for reading in the R language.
The second step: determine the characterization parameter for evaluating reservoir quality.
(1) Collect and organize production test data, including oil, gas, water and total liquid production;
(2) Collect, organize or calculate parameters that reflect reservoir quality, including porosity, permeability, flow zone indicator (FZI) and pore-throat structure;
(3) Taking the production data as the reservoir quality standard, analyze the correlation between the production data and porosity, permeability, FZI, pore-throat structure and similar parameters. FZI is found to reflect reservoir quality best and is selected as the reservoir quality characterization parameter;
(4) Use FZI as the dependent variable of the tight sandstone gas reservoir quality analysis, and use the other factors that influence reservoir quality, such as intergranular dissolution pore content, main grain size, moldic pore content and primary intergranular pore content, as independent variables.
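The selection in the second step can be illustrated with a short Python sketch. The description does not state how FZI is calculated or which correlation measure is used; the sketch assumes the commonly used definition FZI = RQI / phi_z with RQI = 0.0314 * sqrt(k / phi) (k in millidarcies, phi as a fraction) and a Pearson correlation coefficient. All numerical values are made up for illustration.

```python
import math


def flow_zone_indicator(porosity, permeability_md):
    """FZI from fractional porosity and permeability in millidarcies (assumed definition)."""
    rqi = 0.0314 * math.sqrt(permeability_md / porosity)  # reservoir quality index
    phi_z = porosity / (1.0 - porosity)                    # pore volume to grain volume ratio
    return rqi / phi_z


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical interval data: porosity (fraction), permeability (mD), gas rate (arbitrary units).
porosity = [0.06, 0.08, 0.10, 0.07, 0.12]
permeability = [0.05, 0.20, 0.90, 0.08, 2.50]
gas_rate = [0.3, 1.1, 3.9, 0.5, 8.2]

candidates = {
    "porosity": porosity,
    "permeability": permeability,
    "FZI": [flow_zone_indicator(p, k) for p, k in zip(porosity, permeability)],
}
correlations = {name: pearson(values, gas_rate) for name, values in candidates.items()}
# The candidate with the strongest absolute correlation becomes the characterization parameter.
best = max(correlations, key=lambda name: abs(correlations[name]))
print(correlations, best)
```

Which parameter wins depends entirely on the data; in the embodiment the comparison on the study-area data led to FZI.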
The third step: construct the decision trees and the random forest.
(1) Using the random forest algorithm, take the processed data set Q of 340 samples as the input sample set and draw 340 samples from it at random with replacement to form the subset data M, which is used to construct one decision tree.
(2) Each sample in M contains 34 feature values, i.e. 34 influence factors, such as primary intergranular pore content, argillaceous matrix content and quartz content. Randomly draw 1/3 of the influence factors from M to form the candidate split-point set of the decision tree, denoted C_1. For example, the 11 randomly drawn influence factors, including potassium feldspar content, plagioclase content, quartz content, roundness, sorting, support type, primary intergranular pore content and intergranular dissolution pore content, make up C_1. Calculate the Gini coefficient of each influence factor in C_1, select the factor with the smallest Gini coefficient, intergranular dissolution pore content, as the split point of the decision tree, and divide M into a left set and a right set, denoted M_L and M_R respectively.
The Gini coefficient is calculated as follows. The random forest uses CART decision trees. Because CART builds binary trees, if the sample set M is divided into M_L and M_R according to whether feature A takes a certain possible value b, then under the condition of feature A the Gini coefficient of the set M is:
Gini(M, A) = (|M_L| / |M|) · Gini(M_L) + (|M_R| / |M|) · Gini(M_R)
where Gini(M) represents the uncertainty of the set M, and Gini(M, A) represents the uncertainty of the set M after it is split on A = b. The larger the Gini coefficient, the greater the uncertainty of the sample set; the smaller the Gini coefficient, the higher the node purity after the split.
(3) For the set M_L, use the same method: randomly draw 11 influence factors to form the candidate split-point set C_2 of the decision tree, for example argillaceous content, iron mineral content, roundness and quartz content. Calculate the Gini coefficient of each influence factor in C_2, select the factor with the smallest Gini coefficient, iron mineral content, as the split node, and divide M_L further into a left set and a right set. Repeat this process until every sample in M_L reaches a leaf node, at which point splitting ends.
(4) Process M_R in the same way as M_L until every sample in M_R reaches a leaf node, at which point splitting ends. The decision tree for the subset data M is then complete.
(5) Repeat steps (1) to (4) multiple times to build a plurality of decision trees, which together form the random forest.
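The sampling and splitting of the third step can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the embodiment's implementation: the target is assumed to have been binned into class labels so that a Gini coefficient applies, splits are taken on "feature <= threshold" rather than on equality with a value b, only a single split is chosen rather than a full tree, and the feature names and values are hypothetical.

```python
import random
from collections import Counter


def bootstrap(samples, rng):
    """Draw len(samples) rows with replacement; also return the rows never drawn."""
    n = len(samples)
    picked = [rng.randrange(n) for _ in range(n)]
    chosen = set(picked)
    in_bag = [samples[i] for i in picked]
    out_of_bag = [samples[i] for i in range(n) if i not in chosen]
    return in_bag, out_of_bag


def gini(labels):
    """Gini(M) = 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(rows, feature, threshold):
    """Size-weighted Gini of the left and right sets produced by one split."""
    left = [r["label"] for r in rows if r[feature] <= threshold]
    right = [r["label"] for r in rows if r[feature] > threshold]
    n = len(rows)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_split(rows, features, rng):
    """Draw one third of the features at random and return the lowest-Gini split among them."""
    k = max(1, len(features) // 3)  # 34 features would give 11 candidates
    best = None
    for feature in rng.sample(features, k):
        for threshold in sorted({r[feature] for r in rows}):
            score = split_gini(rows, feature, threshold)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best


# Hypothetical samples with a binned reservoir quality label.
rows = [
    {"dissolution_pore": 0.5, "quartz": 61.0, "roundness": 1, "label": "low"},
    {"dissolution_pore": 0.8, "quartz": 58.0, "roundness": 2, "label": "low"},
    {"dissolution_pore": 2.4, "quartz": 60.0, "roundness": 1, "label": "high"},
    {"dissolution_pore": 3.1, "quartz": 59.0, "roundness": 2, "label": "high"},
]
features = ["dissolution_pore", "quartz", "roundness"]
rng = random.Random(42)
M, out_of_bag = bootstrap(rows, rng)
score, feature, threshold = best_split(M, features, rng)
print(score, feature, threshold)
```

A full tree would apply best_split recursively to the left and right sets, and a forest would repeat the whole procedure with a fresh bootstrap sample for each tree.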
The fourth step: determine the main control factors influencing reservoir quality and express them quantitatively.
(1) Let O_1 be the out-of-bag data corresponding to decision tree T_1. Feed O_1 into the constructed random forest to obtain the predicted value e_1, then compute the mean square error between e_1 and the true value E, denoted error_1:
error_1 = mean((E - e_1)^2) (5)
(2) For every sample in the out-of-bag data O_1, add random noise to the feature intergranular dissolution pore content (denoted x_1) while keeping all other feature values unchanged. Denote the noisy out-of-bag data O_2, feed it into the random forest, and denote the resulting predicted value e_2. Compute the mean square error between the true value E and e_2, denoted error_2:
error_2 = mean((E - e_2)^2) (6)
(3) For intergranular dissolution pore content (x_1), compute the difference between the two mean square errors, denoted S_x1:
S_x1 = error_2 - error_1 (7)
(4) Repeat the above steps to compute the corresponding mean square error differences for the remaining 33 features in the out-of-bag data, such as argillaceous content, primary intergranular pore content and moldic pore content, denoted S_xi (i = 2, ..., 34);
(5) Repeat steps (1) to (4) for every decision tree in the random forest to compute the out-of-bag data error of each influence factor, and take the average of these errors as the importance value of each influence factor, denoted W_xi (i = 1, 2, ..., 34):
W_xi = (1/k) · Σ_{j=1..k} S_xi^(j)
where k is the number of decision trees in the random forest and S_xi^(j) is the value of S_xi obtained from the j-th decision tree.
(6) Sort W_xi in descending order and eliminate the lowest-ranked influence factors. For example, remove 20 influence factors, including plagioclase content, siliceous cement content, potassium feldspar content, sorting, quartz content and intragranular dissolution pore content, and preliminarily select 14 factors, including intergranular dissolution pore content, moldic pore content, graphic mean grain size, primary intergranular pore content and main grain size, as the main control factors influencing reservoir quality.
(7) Using the preliminarily selected main control factors, rebuild the decision trees and the random forest by the above steps and compute the out-of-bag data errors. Repeat the experiment several times, take the average error as the importance value of each influence factor, and convert the importance values to percentages. The results are shown in Table 1:
TABLE 1 Quantitative evaluation results of reservoir quality main control factors
Serial number | Influence factor | Mean square error | Importance percentage
1 | Intergranular dissolution pore content | 23.78726498 | 44.56%
2 | Moldic pore content | 5.978337387 | 11.20%
3 | Graphic mean grain size | 5.916006387 | 11.08%
4 | Primary intergranular pore content | 3.495146425 | 6.55%
5 | Kaolinite content | 3.256391689 | 6.10%
6 | Main grain size | 2.980834705 | 5.58%
7 | Calcareous cement content (calcite, dolomite) | 2.246414229 | 4.21%
8 | Illite content | 1.420160649 | 2.66%
9 | Argillaceous content | 1.340817622 | 2.51%
10 | Maximum grain size | 1.33884933 | 2.51%
11 | X.S | 0.616936967 | 1.16%
12 | Illite-smectite mixed-layer content | 0.508265171 | 0.95%
13 | Grain-size lithology | 0.406021903 | 0.76%
14 | Cementation type | 0.097017516 | 0.18%
The importance percentage of each influence factor in Table 1 describes its degree of influence on reservoir quality, which realizes the quantitative evaluation of the main control factors of tight sandstone gas reservoir quality. In this example, intergranular dissolution pore content has the greatest influence on the tight sandstone reservoir, with a quantified importance of 44.56%, whereas cementation type, grain-size lithology and illite-smectite mixed-layer content have the least influence, each below 1% after quantification. Compared with primary deposition (mean grain size, primary intergranular pores, main grain size and so on) and cementation (kaolinite, calcareous and illite cementation and so on), the cumulative influence of intergranular dissolution pores and moldic pores on reservoir quality reaches 55.8%. Dissolution is therefore the key factor controlling reservoir quality and determines the formation and distribution of high-quality tight sandstone reservoirs.
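The percentage column of Table 1 can be checked with a few lines of Python. The sketch assumes that each percentage is the factor's mean square error divided by the sum over all 14 factors, which is consistent with the tabulated values; the list below copies the mean square errors in serial-number order.

```python
# Mean square errors from Table 1, in serial-number order (1 to 14).
mse_by_rank = [
    23.78726498, 5.978337387, 5.916006387, 3.495146425, 3.256391689,
    2.980834705, 2.246414229, 1.420160649, 1.340817622, 1.33884933,
    0.616936967, 0.508265171, 0.406021903, 0.097017516,
]

total = sum(mse_by_rank)
# Importance percentage of each factor: its share of the summed importance values.
percent = [100.0 * value / total for value in mse_by_rank]

# Cumulative share of the two dissolution-related factors (serial numbers 1 and 2).
dissolution_share = percent[0] + percent[1]

for rank, share in enumerate(percent, start=1):
    print(rank, f"{share:.2f}%")
print("factors 1 and 2 combined:", f"{dissolution_share:.1f}%")
```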
The method quantitatively evaluates the main control factors influencing reservoir quality and is a marked improvement over the prior art, which relies on manual experience, numerical simulation, scanning electron microscope analysis, core test analysis and similar methods.
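The out-of-bag noise test of the fourth step, equations (5) to (7) and the averaging over trees, can be sketched in Python as follows. The predictor toy_predict is a hypothetical stand-in for a trained random forest, Gaussian noise is an assumed choice of "random noise", and the five repetitions stand in for five decision trees that would each have their own out-of-bag data.

```python
import random


def mean_squared_error(truth, predicted):
    """mean((E - e)^2) over all out-of-bag samples."""
    return sum((t - p) ** 2 for t, p in zip(truth, predicted)) / len(truth)


def noise_importance(predict, oob_rows, truth, feature, rng, scale=1.0):
    """S = error_2 - error_1 for one feature on one set of out-of-bag data."""
    error_1 = mean_squared_error(truth, [predict(row) for row in oob_rows])
    noisy_rows = []
    for row in oob_rows:
        noisy = dict(row)
        noisy[feature] = row[feature] + rng.gauss(0.0, scale)  # perturb this feature only
        noisy_rows.append(noisy)
    error_2 = mean_squared_error(truth, [predict(row) for row in noisy_rows])
    return error_2 - error_1


def forest_importance(per_tree_scores):
    """W = average of the per-tree S values."""
    return sum(per_tree_scores) / len(per_tree_scores)


def toy_predict(row):
    """Hypothetical model: the prediction depends on x1 and ignores x2."""
    return 2.0 * row["x1"]


oob = [{"x1": float(i), "x2": float(10 - i)} for i in range(10)]
E = [toy_predict(row) for row in oob]  # true values, chosen so that error_1 is zero
rng = random.Random(0)
w_x1 = forest_importance([noise_importance(toy_predict, oob, E, "x1", rng) for _ in range(5)])
w_x2 = forest_importance([noise_importance(toy_predict, oob, E, "x2", rng) for _ in range(5)])
print(w_x1, w_x2)
```

A feature the model relies on (x1) shows a positive importance value because noise degrades the prediction, while a feature the model ignores (x2) shows none; ranking the W values in descending order then gives the candidate main control factors.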
Although the present invention has been described with reference to a preferred embodiment, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (7)

1. A method for quantitatively evaluating the main control factors of the quality of a tight sandstone gas reservoir based on a random forest is characterized by comprising the following steps:
S1: collecting the relevant influence factors affecting reservoir quality in the study area, and processing the parameters according to the parameter type of each influence factor:
if the influence factor is a data parameter, checking whether the data parameter is missing; if a data parameter is missing, removing all influence factors of the same batch; if nothing is missing, retaining them;
if the influence factor is a text parameter, assigning values to the text parameter;
S2: storing the processed parameters as a comma-separated values file;
S3: taking the characterization parameter of reservoir quality as the dependent variable of the reservoir quality analysis, and taking the influence factors as the independent variables of the reservoir quality analysis;
S4: according to the dependent variable and the independent variables, using a random forest algorithm to extract training data by random sampling with replacement, and constructing decision trees and a random forest;
S5: for each influence factor, calculating the out-of-bag data error of the decision trees, and selecting the main control influence factors according to the out-of-bag data errors.
2. The method for quantitatively evaluating the main control factors of the quality of the tight sandstone gas reservoir based on the random forest as claimed in claim 1, wherein in the step S1, the relevant influence factors comprise sedimentary influence factors, diagenetic influence factors and structural influence factors.
3. The method for quantitatively evaluating the main control factors of the quality of the tight sandstone gas reservoir based on the random forest as claimed in claim 2, wherein the sedimentary influence factors comprise grain-size lithology type, mineral lithology type, sorting, roundness, grain-size parameters, contents of the different types of mineral grains, matrix content and primary pore content; the diagenetic influence factors comprise cementation type, contents of the different cements, contents of the different dissolution pores, contents of the different replacement minerals and compaction strength; the structural influence factors comprise fracture type and contents of the different fracture types.
4. The method for quantitatively evaluating the main control factors of the quality of the tight sandstone gas reservoir based on the random forest as claimed in claim 1, wherein in the step S3, the flow zone indicator (FZI) is used as the characterization parameter of the reservoir quality.
5. The method for quantitatively evaluating the main control factors of the quality of the tight sandstone gas reservoir based on the random forest as claimed in claim 1, wherein the step S4 specifically comprises the following substeps:
S41: using a random forest algorithm, taking the processed data set as the input sample data set and randomly sampling it multiple times with replacement to form subset data, wherein the number of draws equals the number of samples, and the subset data obtained by sampling is used for constructing a decision tree;
S42: randomly extracting part of the influence factors from the subset data to form a candidate split set, selecting one influence factor from the candidate split set as the split point of the decision tree according to the principle of minimum node impurity, and continuing to split by the same principle until all samples of the node reach leaf nodes, at which point splitting ends;
S43: repeating the step S41 and the step S42 to establish a plurality of decision trees, the plurality of decision trees forming the random forest.
6. The method for quantitatively evaluating the main control factors of the quality of the tight sandstone gas reservoir based on the random forest as claimed in claim 1, wherein the step S5 specifically comprises the following substeps:
S51: feeding all influence factors of the out-of-bag data into the constructed random forest and, for a given decision tree, calculating the predicted value of each out-of-bag sample through the random forest;
S52: calculating a first mean square error between the predicted value and the true value of each out-of-bag sample;
S53: selecting one influence factor in the out-of-bag data, adding random noise to it, feeding the data into the random forest, and calculating the predicted value after the noise is added; calculating a second mean square error between the predicted value with noise and the true value;
S54: judging the importance of the influence factor from the first mean square error and the second mean square error:
if, after random noise is added, the second mean square error is larger than the first mean square error and the difference between them exceeds a threshold, the influence factor is important; otherwise, the influence factor is unimportant;
S55: repeating the step S53 and the step S54 to judge all remaining influence factors in the out-of-bag data;
S56: repeating the steps S51 to S55 for each remaining decision tree in the random forest, calculating the out-of-bag data error of each influence factor, taking the average of the out-of-bag data errors as the importance value of each influence factor, and sorting the importance values in descending order, wherein the top-ranked influence factors are the main control factors influencing the reservoir quality.
7. The method for quantitatively evaluating the main control factors of the quality of the tight sandstone gas reservoir based on the random forest according to any one of claims 1 to 6, characterized by further comprising the following steps of:
S6: rebuilding the decision trees and the random forest from the main control influence factors selected according to the out-of-bag data errors;
S7: calculating the out-of-bag data errors of the selected main control influence factors, and taking the average of the out-of-bag data errors of each main control influence factor as its importance value;
S8: calculating the percentage of the importance value of each selected main control influence factor relative to the sum of the importance values of all selected main control influence factors, thereby quantitatively representing the degree of influence of each main control influence factor on the reservoir quality.
CN202110599248.4A 2021-05-31 2021-05-31 Method for quantitatively evaluating quality master control factors of tight sandstone gas reservoir based on random forest Pending CN113344359A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110599248.4A CN113344359A (en) 2021-05-31 2021-05-31 Method for quantitatively evaluating quality master control factors of tight sandstone gas reservoir based on random forest

Publications (1)

Publication Number Publication Date
CN113344359A true CN113344359A (en) 2021-09-03

Family

ID=77472448

Country Status (1)

Country Link
CN (1) CN113344359A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210903
