CN113344359A

CN113344359A - Method for quantitatively evaluating quality master control factors of tight sandstone gas reservoir based on random forest

Info

Publication number: CN113344359A
Application number: CN202110599248.4A
Authority: CN
Inventors: 甄艳; 康锦涛; 赵晓明; 葛家旺; 周雪松; 代茂林
Original assignee: Southwest Petroleum University
Current assignee: Southwest Petroleum University
Priority date: 2021-05-31
Filing date: 2021-05-31
Publication date: 2021-09-03

Abstract

The invention discloses a random-forest-based method for quantitatively evaluating the main control factors of tight sandstone gas reservoir quality, comprising the following steps: S1: collecting the relevant influencing factors that affect reservoir quality in a study area, and processing the parameters according to the parameter type of each influencing factor; S2: saving the processed parameters as a comma-separated value file; S3: taking the parameter that characterizes reservoir quality as the dependent variable and the influencing factors as the independent variables; S4: based on the dependent and independent variables, using the random forest algorithm to draw training data by random sampling with replacement, and constructing decision trees and a random forest; S5: for each influencing factor, calculating the out-of-bag data error of the decision trees, and selecting the main control influencing factors according to the out-of-bag data error. By using the out-of-bag data error, the invention achieves quantitative evaluation of the main control factors of tight sandstone gas reservoir quality, and provides a geological reference for subsequent gas reservoir description and for improving oilfield development results.

Description

Method for quantitatively evaluating quality master control factors of tight sandstone gas reservoir based on random forest

Technical Field

The invention relates to the technical field of unconventional oil and gas reservoir development, in particular to a method for quantitatively evaluating main control factors of the quality of a tight sandstone gas reservoir based on a random forest.

Background

At present, the world is in a stage of stable-to-rising conventional oil and gas production and rapid unconventional oil and gas development, and tight sandstone gas has become a key direction of unconventional natural gas development. Extensive development practice shows that effectively evaluating the main control factors of tight sandstone gas reservoir quality is a key basic problem in achieving large-scale, profitable development of such gas reservoirs.

Current methods for evaluating the main control factors of reservoir quality fall into the following categories: first, evaluating tight sandstone reservoir quality using sedimentology and well-logging petrophysics; second, studying and discriminating reservoir quality by core observation, thin-section identification, physical property analysis and the like; third, combining well logging, core observation, laboratory analysis, well testing and trial production data to establish a reservoir evaluation index system and comprehensively analyze the factors influencing reservoir quality; and fourth, establishing reservoir evaluation methods and criteria based on reservoir characteristics, combining logging data with numerical simulation.

These methods mainly rely on single-factor qualitative or semi-quantitative evaluation constrained by human experience. Development practice, however, shows that tight sandstone reservoir quality is jointly controlled by multiple factors such as deposition, diagenesis and tectonics. Because existing methods focus on qualitative or semi-quantitative analysis of single-factor parameters, their evaluation results are subjective and not generally applicable, so the quality of a tight sandstone gas reservoir is defined unclearly and inaccurately, which affects development plan design and well placement. In strongly heterogeneous reservoirs in particular, permeability cannot be calculated with high precision from pore structure and pore-throat radius. A quantitative evaluation method that accounts for multiple interacting factors is therefore needed.

Disclosure of Invention

To solve the above problems, the invention aims to provide a random-forest-based method for quantitatively evaluating the main control factors of tight sandstone gas reservoir quality.

The technical scheme of the invention is as follows:

A method for quantitatively evaluating the main control factors of tight sandstone gas reservoir quality based on a random forest comprises the following steps:

S1: collecting the relevant influence factors affecting reservoir quality in a research area, and processing the parameters according to the parameter type of each influence factor:

if the influence factor is a numerical parameter, checking whether any value is missing; if a value is missing, removing all influence factors of the same batch; if no value is missing, retaining them;

if the influence factor is a text parameter, assigning numerical values to the text parameter;

S2: saving the processed parameters as a comma-separated value file;

S3: taking the characterization parameter of reservoir quality as the dependent variable of the reservoir quality analysis, and taking the influence factors as the independent variables of the reservoir quality analysis;

S4: based on the dependent variable and the independent variables, using the random forest algorithm to draw training data by random sampling with replacement, and constructing decision trees and a random forest;

S5: for each influence factor, calculating the out-of-bag data error of the decision trees, and selecting the main control influence factors according to the out-of-bag data error.

Preferably, in step S1, the relevant influence factors include depositional influence factors, diagenetic influence factors and tectonic influence factors.

Preferably, the depositional influence factors comprise grain-size lithology type, mineral lithology type, sorting, roundness, various grain-size parameters, contents of different types of mineral grains, matrix content and primary pore content; the diagenetic influence factors comprise cementation type, contents of different types of cement, contents of different types of dissolution pores, contents of different types of replacement minerals and compaction intensity; the tectonic influence factors comprise fracture type and contents of different types of fractures.

Preferably, in step S3, the characterization parameter of reservoir quality is the flow unit index (FZI).

Preferably, step S4 specifically includes the following sub-steps:

S41: using the random forest algorithm, taking the processed data set as the input sample data set, and randomly sampling it with replacement to form subset data, wherein the number of draws equals the number of samples, and the sampled subset data is used to construct a decision tree;

S42: randomly extracting part of the influence factors from the subset data to form a candidate split set, selecting one influence factor from the candidate split set as the split point of the decision tree according to the principle of minimum node impurity, and continuing to split by the same principle until all samples of the node reach leaf nodes, at which point splitting ends;

S43: repeating steps S41 and S42 to build a plurality of decision trees, which together form the random forest.

Preferably, step S5 specifically includes the following sub-steps:

S51: for a given decision tree, putting all influence factors of its out-of-bag data into the constructed random forest, and calculating the predicted value of each out-of-bag sample;

S52: calculating the first mean square error between the predicted values and the true values of the out-of-bag data;

S53: selecting one influence factor in the out-of-bag data, randomly adding noise to it, putting the noisy data into the random forest, and calculating the predicted values after the noise is added; calculating the second mean square error between these predicted values and the true values;

S54: judging the importance of the influence factor from its first and second mean square errors:

if, after random noise is added, the second mean square error is larger than the first mean square error and the difference between them exceeds a threshold, the influence factor is important; otherwise it is unimportant;

S55: repeating steps S53 and S54 for all remaining influence factors in the out-of-bag data;

S56: repeating steps S51 to S55 for each remaining decision tree in the random forest, calculating the out-of-bag data error of each influence factor, taking the mean of the out-of-bag data errors as the importance value of each influence factor, and sorting the importance values in descending order; the top-ranked influence factors are the main control factors affecting reservoir quality.
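The out-of-bag importance calculation of steps S51 to S56 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the "trees" are hypothetical stand-in predictor functions, the feature names are invented, and shuffling a factor's values among the out-of-bag samples is assumed as one common way of "adding noise" (the text does not specify the noise model).

```python
import random


def mse(truth, predicted):
    """Mean square error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(truth, predicted)) / len(truth)


def perturb(rows, feature, rng):
    """Return copies of rows with one feature's values shuffled among the rows."""
    values = [row[feature] for row in rows]
    rng.shuffle(values)
    return [dict(row, **{feature: v}) for row, v in zip(rows, values)]


def oob_importance(trees, features, rng):
    """Average, over trees, the rise in out-of-bag MSE when a feature is perturbed."""
    totals = {f: 0.0 for f in features}
    for predict, oob_rows, oob_truth in trees:
        error_1 = mse(oob_truth, [predict(r) for r in oob_rows])   # S52
        for f in features:
            noisy = perturb(oob_rows, f, rng)                       # S53
            error_2 = mse(oob_truth, [predict(r) for r in noisy])
            totals[f] += error_2 - error_1                          # S54 difference
    return {f: total / len(trees) for f, total in totals.items()}  # S56 average


def make_tree(slope):
    """Hypothetical stand-in for a fitted tree; it uses only `dissolution_pore`."""
    return lambda row: slope * row["dissolution_pore"]


rng = random.Random(1)
features = ["dissolution_pore", "quartz"]
trees = []
for slope in (1.9, 2.0, 2.1):
    oob_rows = [
        {"dissolution_pore": rng.uniform(0, 6), "quartz": rng.uniform(40, 70)}
        for _ in range(30)
    ]
    oob_truth = [2.0 * r["dissolution_pore"] for r in oob_rows]
    trees.append((make_tree(slope), oob_rows, oob_truth))

importance = oob_importance(trees, features, rng)
ranked = sorted(importance, key=importance.get, reverse=True)
print(ranked, importance)
```

In this toy setup the predictions ignore `quartz`, so perturbing it leaves the error unchanged and its importance is zero, while perturbing `dissolution_pore` raises the error and places it first in the ranking.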

Preferably, the method further comprises the following steps:

S6: reconstructing the decision trees and the random forest using the main control influence factors selected according to the out-of-bag data error;

S7: calculating the out-of-bag data error of each selected main control influence factor, and taking the mean of its out-of-bag data errors as its importance value;

S8: calculating the importance value of each selected main control influence factor as a percentage of the sum of the importance values of all selected main control influence factors, thereby quantitatively characterizing the degree to which each main control influence factor affects reservoir quality.
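The screening that precedes steps S6 to S8 (ranking the factors and keeping those whose error increase exceeds a threshold) might look like the sketch below. The function name `select_master_factors`, the threshold value and the importance values are hypothetical and chosen only for illustration; the text does not state a threshold.

```python
def select_master_factors(importance, threshold=0.0, top_n=None):
    """Rank factors by importance (descending) and keep those above a threshold."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    kept = [(name, value) for name, value in ranked if value > threshold]
    return kept if top_n is None else kept[:top_n]


# Hypothetical importance values (mean increase in out-of-bag MSE).
importance = {
    "dissolution_pore": 4.10,
    "moldic_pore": 1.25,
    "quartz": 0.02,
    "sorting": -0.01,  # perturbing this factor did not increase the error
}

masters = select_master_factors(importance, threshold=0.05)
print(masters)
```

The retained factors would then be used to rebuild the forest (S6) before their importance values are recomputed and converted to percentages (S7, S8).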

The invention has the beneficial effects that:

The method uses the random forest algorithm to quantitatively evaluate how the relevant influence factors affect the characterization parameter of reservoir quality. It thereby extracts the key factors controlling reservoir quality, clarifies the depositional and diagenetic controls on the reservoir, and supports effective assessment of the formation and distribution of high-quality tight sandstone reservoirs. Traditional methods determine the main control factors mainly through human experience, numerical simulation, scanning electron microscope analysis, core test analysis and the like, and cannot quantitatively evaluate the importance of each influence factor. By applying the random forest algorithm to the evaluation of the main control factors of tight sandstone gas reservoir quality, the method can both screen out the main control factors automatically and evaluate them quantitatively. The results are general, objective and accurate, and can provide a geological reference for subsequent gas reservoir description and for improving oilfield development results.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a schematic flow chart of the quantitative evaluation method for the quality control factors of the tight sandstone gas reservoir according to the invention;

FIG. 2 is a flow chart illustrating data processing according to the present invention;

FIG. 3 is a schematic flow chart of dependent variable and independent variable selection according to the present invention;

FIG. 4 is a schematic flow chart of decision tree and random forest construction according to the present invention;

FIG. 5 is a schematic flow chart of the present invention for quantitatively evaluating reservoir dominating factors.

Detailed Description

The invention is further illustrated with reference to the following figures and examples. It should be noted that, in the present application, the embodiments and the technical features of the embodiments may be combined with each other without conflict. It is noted that, unless otherwise indicated, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The use of the terms "comprising" or "including" and the like in the present disclosure is intended to mean that the elements or items listed before the term cover the elements or items listed after the term and their equivalents, but not to exclude other elements or items.

As shown in FIGS. 1-5, the invention provides a method for quantitatively evaluating the main control factors of tight sandstone gas reservoir quality based on a random forest, comprising the following steps:

S1: collecting the relevant influence factors affecting reservoir quality in the research area, wherein the relevant influence factors comprise depositional, diagenetic and tectonic influence factors.

In a specific embodiment, the depositional influence factors comprise grain-size lithology type, mineral lithology type, sorting, roundness, various grain-size parameters, contents of different types of mineral grains, matrix content and primary pore content; the diagenetic influence factors comprise cementation type, contents of different types of cement, contents of different types of dissolution pores, contents of different types of replacement minerals and compaction intensity; the tectonic influence factors comprise fracture type and contents of different types of fractures. It should be noted that the relevant influence factors may differ between research areas; in other research areas some of the above factors may not apply, or additional factors may be present.

S2: processing the parameters according to the parameter type of each influence factor, and saving the processed parameters as a comma-separated value (.csv) file suitable for reading in the R language; the specific processing is as follows:

If the influence factor is a numerical parameter, check whether any value is missing. If a value is missing, remove all influence factors of the same batch; if no value is missing, retain them. For example, suppose there are X influence factors and Y batches of tests, giving X × Y influence factor values in total; if the i-th batch yields data for only X − j (j ≥ 1) influence factors, all influence factor data of the i-th batch are removed, leaving X × (Y − 1) values. This ensures data integrity and improves the calculation precision of the subsequent steps.

If the influence factor is a text parameter, assign numerical values to it. For example, "poor", "medium" and "good" sorting are assigned 0, 1 and 2 respectively, and grain-supported and matrix-supported support types are assigned 0 and 1, so that the model can process text parameters. It should be noted that the magnitude of the assigned values has no influence on the result.
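The parameter processing of step S2 can be illustrated with a short sketch. The records, field names and category labels below are hypothetical; only the rule "drop the whole sample if any value is missing" and the 0/1/2 and 0/1 assignments follow the text.

```python
import csv
import io

# Hypothetical raw records: one dict per sample ("batch"); None marks a missing value.
RAW = [
    {"sorting": "good", "support_type": "grain", "quartz_pct": 62.0, "fzi": 1.8},
    {"sorting": "poor", "support_type": "matrix", "quartz_pct": None, "fzi": 0.4},
    {"sorting": "medium", "support_type": "grain", "quartz_pct": 55.5, "fzi": 1.1},
]

# Assignment tables for text parameters, following the 0/1/2 and 0/1 examples in the text.
SORTING_CODE = {"poor": 0, "medium": 1, "good": 2}
SUPPORT_CODE = {"grain": 0, "matrix": 1}


def preprocess(records):
    """Drop every sample with a missing value, then encode text parameters."""
    cleaned = []
    for rec in records:
        if any(value is None for value in rec.values()):
            continue  # one missing factor removes the whole sample
        row = dict(rec)
        row["sorting"] = SORTING_CODE[rec["sorting"]]
        row["support_type"] = SUPPORT_CODE[rec["support_type"]]
        cleaned.append(row)
    return cleaned


def to_csv_text(rows):
    """Serialize the processed rows as comma-separated values."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = preprocess(RAW)
csv_text = to_csv_text(rows)
print(csv_text)
```

Writing to an in-memory buffer keeps the sketch self-contained; in practice the same text would be written to a `.csv` file for the modelling step.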

S3: taking the characterization parameters of the reservoir quality as dependent variables of the reservoir quality analysis, and taking the influencing factors as independent variables of the reservoir quality analysis; the characterization parameter of the reservoir quality is determined by the following steps:

(1) collecting and arranging production test data, wherein the production test data comprises oil production, gas production, water production and liquid production;

(2) collecting and sorting (or calculating) parameter data reflecting reservoir quality, wherein the parameter data comprise porosity, permeability, flow unit index (FZI) and pore-throat structure; the flow unit index (FZI) is calculated by the following formula:

$$\mathrm{FZI} = \frac{\mathrm{RQI}}{\phi_z}$$

in the formula: FZI is the flow unit index, dimensionless; RQI is the reservoir quality index, dimensionless, which is computed from K and φ; K is the permeability, D; φ is the porosity, %; φ_z is the ratio of pore volume to grain volume.

(3) taking the production test data as the standard for evaluating reservoir quality, analyzing the correlation between the production test data and the porosity, permeability, flow unit index (FZI) and pore-throat structure parameters, and taking the parameter that best reflects reservoir quality as the characterization parameter of reservoir quality.

In a specific embodiment, the above method is used to determine the characterization parameter of reservoir quality, and the flow unit index (FZI) is finally selected as the characterization parameter.
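For orientation, the FZI calculation can be sketched with the conventional definition FZI = RQI / φ_z, RQI = 0.0314·sqrt(K/φ), φ_z = φ/(1 − φ). The constant 0.0314 and the units used here (permeability in mD, porosity as a fraction) are assumptions taken from that conventional definition and are not confirmed by this text, which states permeability in D and porosity in %; the function name is likewise illustrative.

```python
import math


def flow_zone_indicator(permeability_md, porosity_fraction):
    """Conventional FZI; assumes permeability in mD and porosity as a fraction."""
    if not 0.0 < porosity_fraction < 1.0:
        raise ValueError("porosity must be a fraction in (0, 1)")
    # Reservoir quality index (assumed conventional constant 0.0314).
    rqi = 0.0314 * math.sqrt(permeability_md / porosity_fraction)
    # Ratio of pore volume to grain volume.
    phi_z = porosity_fraction / (1.0 - porosity_fraction)
    return rqi / phi_z


# Values inside the ranges reported for the study area (0.4 mD, 10% porosity).
fzi = flow_zone_indicator(permeability_md=0.4, porosity_fraction=0.10)
print(round(fzi, 4))
```

If the source formula uses different units or a different constant, only the `rqi` line would need to change.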

S4: based on the dependent variable and the independent variables, using the random forest algorithm to draw training data by random sampling with replacement, and constructing decision trees and a random forest; this specifically comprises the following sub-steps:

S41: using the random forest algorithm, taking the processed data set as the input sample data set, and randomly sampling it with replacement to form subset data, wherein the number of draws equals the number of samples, and the sampled subset data is used to construct a decision tree.

S42: randomly extracting part of the influence factors from the subset data to form a candidate split set, selecting one influence factor from the candidate split set as the split point of the decision tree according to the principle of minimum node impurity, and continuing to split by the same principle until all samples of the node reach leaf nodes, at which point splitting ends.

The principle of minimum node impurity, i.e. the principle of minimum Gini coefficient, is applied by calculating the Gini coefficient of each candidate split point; selecting the split with the smallest Gini coefficient gives the lowest node impurity after splitting.

During formation of the decision tree, every node is split in this way. For example, for any subset of data, if the randomly extracted influence factors are primary intergranular pores, main grain size and moldic pores, their Gini coefficients are calculated and primary intergranular pores is selected as the split node; splitting continues until all samples of the node reach leaf nodes.
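The split selection described above (random candidate factors, then the split with the smallest Gini coefficient) can be sketched as follows. The samples, factor names and the "low"/"high" labels are hypothetical; in particular, the labels assume the dependent variable has been discretized into classes, whereas regression forests on a continuous target such as FZI commonly use variance reduction instead.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def split_gini(rows, labels, feature, threshold):
    """Weighted Gini of the two subsets produced by `row[feature] <= threshold`."""
    left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
    right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
    if not left or not right:
        return None  # not a real split
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_split(rows, labels, candidate_features):
    """Return (gini, feature, threshold) with the smallest weighted Gini."""
    best = None
    for feature in candidate_features:
        for threshold in sorted({row[feature] for row in rows}):
            score = split_gini(rows, labels, feature, threshold)
            if score is not None and (best is None or score < best[0]):
                best = (score, feature, threshold)
    return best


# Hypothetical samples; labels are an assumed discretization of reservoir quality.
rows = [
    {"dissolution_pore": 1.0, "quartz": 60.0, "roundness": 1},
    {"dissolution_pore": 1.5, "quartz": 52.0, "roundness": 0},
    {"dissolution_pore": 4.0, "quartz": 61.0, "roundness": 1},
    {"dissolution_pore": 5.2, "quartz": 50.0, "roundness": 0},
]
labels = ["low", "low", "high", "high"]

rng = random.Random(0)
all_features = list(rows[0])
k = max(1, len(all_features) // 3)        # about one third of the factors
candidates = rng.sample(all_features, k)  # random candidate split set
print(candidates, best_split(rows, labels, candidates))

# With every factor as a candidate, the perfectly separating one wins.
overall = best_split(rows, labels, all_features)
print(overall)
```

Restricting each node to a random subset of factors is what decorrelates the trees; the full-candidate call at the end is only there to show which split the Gini criterion prefers when nothing is withheld.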

During construction of a decision tree, because random sampling with replacement is used, part of the sample data is never drawn; this part is called out-of-bag data. The error between the predicted values of the out-of-bag data in the decision tree and their true values is the out-of-bag data error. Step S5 specifically comprises sub-steps S51 to S56 as set out in the Disclosure of Invention above.

In a specific embodiment, the method for quantitatively evaluating the main control factors of reservoir quality further comprises steps S6 to S8 as set out in the Disclosure of Invention above.

In a specific embodiment, a tight sandstone gas reservoir research area is taken as an example. The research area is located in the Huangyan structure of the Xihu Sag, East China Sea Basin, and the target layer is the lower member of the Oligocene Huagang Formation. The structure lies in the central-southern part of the central inversion tectonic belt of the Xihu Sag, East China Sea Shelf Basin; it is an NE-SW-trending anticline with relatively gentle strata. Huangyan 1-1 mainly develops anticlinal traps, and Huangyan 2-2 mainly develops a group of low-amplitude anticlines and faulted anticlines on a secondary compressional belt. The Huagang Formation was deposited and filled during the initial stage of the rift-to-depression transition and mainly develops a shallow lake-delta depositional system in a continental setting, with a sedimentary thickness of 1000 to 2000 m; the lower Huagang member is thinner overall than the upper Huagang member. The main lithologies are subarkose, sublitharenite, feldspathic litharenite and lithic arkose. Porosity ranges from 2.1% to 12.5% and is mainly concentrated between 8% and 10%. Permeability ranges from 0.02 to 22.72 mD; apart from one fractured interval with higher permeability, it is below 2 mD and is mainly concentrated between 0.1 mD and 0.4 mD. The reservoir is a typical low-porosity, low-permeability tight sandstone reservoir.

Early development practice shows that although the sand bodies in the research area are vertically thick and laterally continuous, reservoir heterogeneity is extremely strong: productivity differs greatly between intervals of the same development well and between adjacent development wells. Gas production of some intervals reaches tens of thousands of cubic meters, while other intervals have no productivity at all, so the crux of the production problem is the difference in reservoir quality. Identifying the main control factors of reservoir quality is therefore the core basic problem to be solved for efficient development of the tight sandstone gas reservoir in the research area.

The reservoir quality main control factor quantitative evaluation method of the research area comprises the following steps:

The first step: collecting the depositional, diagenetic and tectonic parameters that control reservoir quality.

(1) Collecting core description and experimental analysis data, comprising: 636 depositional parameter records, such as grain-size lithology type, mineral lithology type, sorting, roundness, contents of different types of mineral grains, matrix content, various grain-size parameters (mean, standard deviation, skewness and kurtosis), percentage contents of different types of minerals and primary pore content; 345 diagenetic parameter records, such as cementation type, contents of different types of cement (siliceous cement, calcareous cement, argillaceous cement and iron ore contents), contents of different types of dissolution pores (intergranular dissolution pores, intragranular dissolution pores and moldic pores), contents of different types of clay minerals, contents of different types of replacement minerals and compaction intensity; and 640 tectonic parameter records, such as fracture type and contents of different types of fractures;

(2) for text parameter data, classifying the text data and assigning values; for example, "poor", "medium" and "good" sorting are assigned 0, 1 and 2 respectively, and grain-supported and matrix-supported support types are assigned 0 and 1;

(3) removing samples with missing parameters as described above, giving 340 sample records; the data set is denoted Q and saved as a comma-separated value (.csv) file suitable for reading in the R language.

The second step: determining the characterization parameter for evaluating reservoir quality.

(1) Collecting and sorting production test data, including oil production, gas production, water production and liquid production;

(2) collecting and sorting (or calculating) parameter data reflecting reservoir quality, including porosity, permeability, flow unit index (FZI) and pore-throat structure;

(3) taking the production data as the reservoir quality standard, analyzing the correlation between the production data and parameters such as porosity, permeability, flow unit index (FZI) and pore-throat structure; FZI is found to best reflect reservoir quality and is taken as the reservoir quality characterization parameter;

(4) taking the flow unit index (FZI) as the dependent variable for the controls on tight sandstone gas reservoir quality, and taking the other relevant influence factors of reservoir quality, such as intergranular dissolution pore content, main grain size, moldic pore content and primary intergranular pore content, as the independent variables.

The third step: constructing decision trees and a random forest.

(1) Using the random forest algorithm, taking the processed data set Q of 340 samples as the input sample data set, and randomly drawing 340 times from it with replacement to form subset data M, which is used to construct a decision tree.

(2) Each sample in M contains 34 feature values, i.e. 34 influence factors, such as primary intergranular pore content, argillaceous matrix content and quartz content. Randomly extracting 1/3 of the influence factors from M to form the candidate split point set of the decision tree, denoted C₁. For example, 11 influence factors are randomly extracted, such as potassium feldspar content, plagioclase content, quartz content, roundness, sorting, support type, primary intergranular pore content and intergranular dissolution pore content, which constitute C₁. Calculating the Gini coefficient of each influence factor in C₁, selecting the factor with the smallest Gini coefficient, here the intergranular dissolution pore content, as the split point of the decision tree, and dividing M into a left set and a right set, denoted M_L and M_R respectively;

The Gini coefficient is calculated as follows. The random forest uses CART decision trees. Because CART builds binary trees, if the sample set M is divided into M_L and M_R according to whether feature A takes a certain possible value b, then, conditional on feature A, the Gini coefficient of the set M is:

$$\mathrm{Gini}(M, A) = \frac{|M_L|}{|M|}\,\mathrm{Gini}(M_L) + \frac{|M_R|}{|M|}\,\mathrm{Gini}(M_R)$$

in the formula: Gini(M) represents the uncertainty of the set M; Gini(M, A) represents the uncertainty of the set M after the split A = b. The larger the Gini coefficient, the greater the uncertainty of the samples and the lower the node purity after the split.

(3) For set M_L, in the same way, 11 influence factors are randomly extracted to form the candidate split point set C₂ of the decision tree; for example, influence factors such as argillaceous content, iron ore content, roundness and quartz content are randomly extracted. Calculating the Gini coefficient of each influence factor in C₂, selecting the factor with the smallest Gini coefficient, here the iron ore content, as the split node, and further dividing M_L into a left set and a right set; this process is repeated until every sample in M_L reaches a leaf node, at which point splitting ends.

(4) M_R is processed in the same way as M_L until all samples in M_R reach leaf nodes, at which point splitting ends; the decision tree for the subset data M is then complete.

(5) Repeating steps (1) to (4) multiple times to build a plurality of decision trees, which together form a random forest.
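The sampling in step (1), 340 draws with replacement from 340 samples, leaves some samples undrawn for each tree; these form that tree's out-of-bag data. The sketch below illustrates this with the standard library only. The forest size of 100 trees and the seed are assumptions for the illustration; the text only says that a plurality of trees is built.

```python
import random


def bootstrap_indices(n_samples, rng):
    """Draw n_samples indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


rng = random.Random(42)  # fixed seed so the sketch is repeatable
N_SAMPLES = 340          # size of data set Q in the embodiment
N_TREES = 100            # assumed forest size

oob_fractions = []
for _ in range(N_TREES):
    in_bag, oob = bootstrap_indices(N_SAMPLES, rng)
    oob_fractions.append(len(oob) / N_SAMPLES)

mean_oob = sum(oob_fractions) / N_TREES
print(f"mean out-of-bag fraction: {mean_oob:.3f}")
```

For a sample of this size the expected out-of-bag share is (1 − 1/340)^340, a little over one third, which is why each tree has a usable held-out set without any separate validation split.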

The fourth step: determining the main control factors affecting reservoir quality and expressing them quantitatively.

(1) Let the out-of-bag data corresponding to decision tree T₁ be O₁; putting O₁ into the constructed random forest to obtain the predicted value e₁; calculating the mean square error between the predicted value e₁ of the out-of-bag data and the true value E, denoted error₁:

$$\mathrm{error}_1 = \mathrm{mean}\left[(E - e_1)^2\right] \tag{5}$$

(2) In the out-of-bag data O₁, for each sample, adding random noise to the intergranular dissolution pore content feature (denoted x₁) while keeping the other feature values unchanged; the out-of-bag data after adding noise is denoted O₂. Putting O₂ into the random forest, the predicted value obtained is denoted e₂; calculating the mean square error between the true value E and the predicted value e₂, denoted error₂:

$$\mathrm{error}_2 = \mathrm{mean}\left[(E - e_2)^2\right] \tag{6}$$

(3) For the intergranular dissolution pore content (x₁), calculating the difference between the two mean square errors, denoted S_x1:

$$S_{x_1} = \mathrm{error}_2 - \mathrm{error}_1 \tag{7}$$

(4) Repeating the above steps to calculate the mean square error differences of the remaining 33 features in the out-of-bag data, such as argillaceous content, primary intergranular pore content and moldic pore content, denoted S_xi (i = 2, ..., 34);

(5) Repeating steps (1) to (4) for each decision tree in the random forest, calculating the out-of-bag data error of each influence factor, and taking the mean of these errors as the importance value of each influence factor, denoted W_xi (i = 1, 2, ..., 34):

$$W_{x_i} = \frac{1}{K}\sum_{k=1}^{K} S_{x_i}^{(k)}$$

In the formula: K represents the number of decision trees in the random forest, and S_xi^(k) is the value of S_xi obtained for the k-th decision tree.

(6) Sorting W_xi in descending order and eliminating the lower-ranked influence factors; for example, 20 influence factors, including plagioclase content, siliceous cement content, potassium feldspar content, sorting, quartz content and intragranular dissolution pore content, are removed, and 14 factors, including intergranular dissolution pore content, moldic pore content, graphic mean grain size, primary intergranular pore content and main grain size, are preliminarily selected as the main control factors affecting reservoir quality.

(7) Based on the preliminarily selected main control influence factors, reconstructing the decision trees and the random forest by the above steps and calculating the out-of-bag data error; repeating the experiment several times, taking the mean error as the importance value of each influence factor, and converting the importance values to percentages. The results are shown in Table 1:

Table 1 Quantitative evaluation results of the main control factors of reservoir quality

Serial number	Influencing factor	Mean square error	Percentage of importance
1	Intergranular dissolved pore content	23.78726498	44.56%
2	Moldic pore content	5.978337387	11.20%
3	Mean grain size (graphic method)	5.916006387	11.08%
4	Primary intergranular pore content	3.495146425	6.55%
5	Kaolinite content	3.256391689	6.10%
6	Main grain size	2.980834705	5.58%
7	Calcareous cement content (calcite, dolomite)	2.246414229	4.21%
8	Illite content	1.420160649	2.66%
9	Argillaceous content	1.340817622	2.51%
10	Maximum grain size	1.33884933	2.51%
11	X.S	0.616936967	1.16%
12	Illite-smectite mixed-layer content	0.508265171	0.95%
13	Granularity lithology type	0.406021903	0.76%
14	Cementation type	0.097017516	0.18%

The importance percentages in Table 1 describe the degree of influence of each influencing factor on reservoir quality, thereby realizing quantitative evaluation of the main control factors of tight sandstone gas reservoir quality. In this example, the intergranular dissolved pore content has the greatest influence on the tight sandstone reservoir, with a quantified importance of 44.56%, whereas the cementation type, the granularity lithology type, and the illite-smectite mixed-layer content have the least influence, each below 1% after quantification. Compared with primary deposition (mean grain size, primary intergranular pores, main grain size, etc.) and cementation (kaolinite cementation, calcareous cementation, illite cementation, etc.), the cumulative influence of intergranular dissolved pores and moldic pores on reservoir quality reaches 55.8%; that is, dissolution is the key factor controlling reservoir quality and determines the formation and distribution of high-quality tight sandstone reservoirs.
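As a non-limiting illustration, the conversion of the importance values into percentages can be sketched as follows. The sketch assumes that each percentage is the factor's importance value divided by the sum of the 14 importance values in Table 1; the dictionary labels and the helper name `to_percentages` are illustrative only and are not part of the described method.

```python
# Importance values (mean out-of-bag error increase) copied from Table 1.
# The short labels are paraphrases used only for this illustration.
importance = {
    "intergranular dissolved pore content": 23.78726498,
    "moldic pore content": 5.978337387,
    "mean grain size (graphic method)": 5.916006387,
    "primary intergranular pore content": 3.495146425,
    "kaolinite content": 3.256391689,
    "main grain size": 2.980834705,
    "calcareous cement content": 2.246414229,
    "illite content": 1.420160649,
    "argillaceous content": 1.340817622,
    "maximum grain size": 1.33884933,
    "X.S": 0.616936967,
    "illite-smectite mixed-layer content": 0.508265171,
    "granularity lithology type": 0.406021903,
    "cementation type": 0.097017516,
}


def to_percentages(values):
    """Express each importance value as a percentage of the total."""
    total = sum(values.values())
    return {name: 100.0 * value / total for name, value in values.items()}


percent = to_percentages(importance)

# Cumulative share of the two dissolution-related factors.
dissolution_share = (
    percent["intergranular dissolved pore content"]
    + percent["moldic pore content"]
)

for name, value in percent.items():
    print(f"{name:40s}{value:6.2f}%")
print(f"dissolution-related share: {dissolution_share:.1f}%")
```

Under this assumption the computed values agree with the percentage column of Table 1 to two decimal places, and the two dissolution-related factors together account for roughly 55.8%.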

The method can quantitatively evaluate the main control factors influencing reservoir quality, and represents a notable advance over the prior art, which relies on methods such as human experience, numerical simulation, scanning electron microscope analysis, and core test analysis.

Although the present invention has been described with reference to a preferred embodiment, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A method for quantitative evaluation of the main control factors of tight sandstone gas reservoir quality based on random forest, characterized by comprising the following steps:

S1: Collect relevant influencing factors affecting reservoir quality in the study area, and perform parameter processing according to the parameter types of the influencing factors:

If the influencing factor is a data class parameter, check whether the data class parameter has a missing value; if a value is missing, remove all the influencing factors in the same batch; if no value is missing, keep it;

If the influencing factor is a text parameter, perform assignment processing on the text parameter;

S2: Save the processed parameters as a comma-separated value file;

S3: The characterization parameter of the reservoir quality is used as the dependent variable of the reservoir quality analysis, and the influencing factor is used as the independent variable of the reservoir quality analysis;

S4: According to the dependent variable and the independent variable, use the random forest algorithm to extract training data by using a random sampling method with replacement to construct a decision tree and a random forest;

S5: For each influencing factor, calculate the out-of-bag data error of the decision tree, and select the main control influence factor according to the out-of-bag data error.
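By way of a hedged illustration only, the parameter processing of S1 and the comma-separated output of S2 might be sketched as below. The sketch assumes that "removing all the influencing factors in the same batch" means dropping the whole sample row, and that text parameters are assigned integer codes in order of first appearance; the column names (`porosity_pct`, `cement_type`, `flow_unit_index`) and the helpers `preprocess` and `to_csv` are hypothetical.

```python
import csv
import io


def preprocess(records, numeric_keys, text_keys):
    """Drop samples with missing numeric values and encode text parameters."""
    # Assumption: a sample with any missing numeric value is removed entirely.
    kept = [
        r for r in records
        if all(r.get(k) not in (None, "") for k in numeric_keys)
    ]
    # Assumption: each text category gets an integer code (1, 2, ...) in
    # order of first appearance.
    codes = {k: {} for k in text_keys}
    rows = []
    for r in kept:
        row = dict(r)
        for k in text_keys:
            row[k] = codes[k].setdefault(r[k], len(codes[k]) + 1)
        rows.append(row)
    return rows, codes


def to_csv(rows):
    """Serialize the processed samples as comma-separated text (step S2)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical samples; the second one has a missing numeric value.
records = [
    {"porosity_pct": "8.2", "cement_type": "pore", "flow_unit_index": "1.4"},
    {"porosity_pct": "", "cement_type": "contact", "flow_unit_index": "0.9"},
    {"porosity_pct": "6.5", "cement_type": "contact", "flow_unit_index": "1.1"},
]
rows, codes = preprocess(
    records,
    numeric_keys=["porosity_pct", "flow_unit_index"],
    text_keys=["cement_type"],
)
csv_text = to_csv(rows)
print(csv_text)
```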

2. The method for quantitative evaluation of main control factors of tight sandstone gas reservoir quality based on random forest according to claim 1, wherein in step S1, the relevant influencing factors include sedimentary influencing factors, diagenetic influencing factors, and structural influencing factors.

3. The method for quantitative evaluation of main control factors of tight sandstone gas reservoir quality based on random forest according to claim 2, wherein the sedimentary influencing factors include granularity lithology type, mineral lithology type, sorting, roundness, various grain size parameters, content of different types of mineral particles, matrix content, and content of primary pores; the diagenetic influencing factors include the type of cementation, the content of different types of cement, the content of different types of dissolved pores, the content of different types of metasomatic minerals, and compaction strength; the structural influencing factors include fracture type and content of different types of fractures.

4. The method for quantitative evaluation of main control factors of tight sandstone gas reservoir quality based on random forest according to claim 1, wherein, in step S3, a flow unit index is used as the characterization parameter of the reservoir quality.

5. The method for quantitative evaluation of tight sandstone gas reservoir quality main control factors based on random forest according to claim 1, wherein step S4 specifically comprises the following sub-steps:

S41: Using the random forest algorithm, the processed data set is taken as the input sample data set, and samples are randomly drawn from it multiple times with replacement to form subset data; the number of draws equals the number of samples, and the subset data obtained by sampling is used to build a decision tree;

S42: Randomly extract some influencing factors from the subset data to form a candidate split set, select an influencing factor from the candidate split set as the split point of the decision tree according to the principle of minimum node impurity, and continue splitting by this principle until all samples of the node have reached leaf nodes, at which point splitting ends;

S43: Repeat steps S41 and S42 to establish a plurality of decision trees, and the random forest is constituted by the plurality of decision trees.
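Sub-steps S41 and S42 can be illustrated, without limitation, by the following minimal sketch of the sampling part only (the impurity-based splitting itself is not shown). The sample count of 100 and the candidate set size of 11 are assumptions made for the example; the 34 influencing factors follow the embodiment. The helper names `bootstrap_sample` and `candidate_features` are hypothetical.

```python
import random


def bootstrap_sample(n_samples, rng):
    """Draw n_samples indices with replacement (S41).

    Returns the in-bag indices and the out-of-bag indices, i.e. the
    samples that were never drawn for this tree.
    """
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


def candidate_features(n_features, n_candidates, rng):
    """Randomly pick a subset of influencing factors as split candidates (S42)."""
    return sorted(rng.sample(range(n_features), n_candidates))


rng = random.Random(0)                      # fixed seed for repeatability
in_bag, oob = bootstrap_sample(100, rng)    # 100 samples: assumed for the demo
cands = candidate_features(34, 11, rng)     # 34 factors; 11 candidates assumed

print(len(in_bag), len(oob), cands)
```

Because the draws are made with replacement, some samples are never selected for a given tree; these form the out-of-bag data used later to estimate the error of that tree.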

6. The method for quantitative evaluation of tight sandstone gas reservoir quality main control factors based on random forest according to claim 1, wherein step S5 specifically comprises the following sub-steps:

S51: Put each influencing factor in the out-of-bag data into the constructed random forest, and for a certain decision tree, calculate the predicted value of each out-of-bag data through the random forest;

S52: Calculate the mean square error 1 between the predicted value of each out-of-bag data and the actual value;

S53: In the out-of-bag data, select a certain influencing factor, add random noise to it, and then put the data into the random forest to calculate the predicted value after adding noise; calculate the mean square error 2 between the predicted value obtained with the noise-added influencing factor and the true value;

S54: Judge the importance of the influencing factor according to the size of the mean square error 1 and the mean square error 2 of the influencing factor:

If after adding random noise, the mean square error 2 is greater than the mean square error 1, and the difference between the two is greater than the threshold, then the influencing factor is important, otherwise it is not important;

S55: Repeat steps S53 and S54 to judge all remaining influencing factors in the out-of-bag data;

S56: For each remaining decision tree in the random forest, repeat steps S51 to S55, calculate the out-of-bag data error of each influencing factor, and take the average value as the importance value of each influencing factor; the influencing factors are arranged in descending order of the average value, and the top-ranked influencing factors are the main control factors affecting the reservoir quality.
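Sub-steps S51 to S56 can be sketched, purely as an assumed illustration, with plain Python callables standing in for trained decision trees. The Gaussian form of the noise, the `noise_scale` parameter, the two stand-in trees, and the shared out-of-bag data are all assumptions of this example; in the method each tree would use its own out-of-bag data.

```python
import random


def mse(y_true, y_pred):
    """Mean square error between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def noise_importance(predict, X_oob, y_oob, feature, rng, noise_scale=1.0):
    """Return error2 - error1 for one tree and one influencing factor."""
    error1 = mse(y_oob, [predict(row) for row in X_oob])
    noisy = []
    for row in X_oob:
        row = list(row)                              # copy, keep others unchanged
        row[feature] += rng.gauss(0.0, noise_scale)  # assumed Gaussian noise
        noisy.append(row)
    error2 = mse(y_oob, [predict(row) for row in noisy])
    return error2 - error1


def forest_importance(trees, n_features, rng):
    """Average the per-tree error increase over all k trees (S56).

    trees is a list of (predict, X_oob, y_oob) tuples.
    """
    k = len(trees)
    return [
        sum(noise_importance(p, X, y, f, rng) for p, X, y in trees) / k
        for f in range(n_features)
    ]


# Stand-in "trees": hypothetical predictors, not trained models.
def tree_a(row):
    return 3.0 * row[0]


def tree_b(row):
    return 3.0 * row[0] + 0.1 * row[1]


rng = random.Random(42)
X = [[rng.random(), rng.random(), rng.random()] for _ in range(50)]
y = [3.0 * row[0] for row in X]          # target depends on factor 0 only
trees = [(tree_a, X, y), (tree_b, X, y)]

W = forest_importance(trees, 3, rng)
ranking = sorted(range(3), key=lambda f: W[f], reverse=True)
print(W, ranking)
```

In this toy setting the factor that drives the target ranks first, and a factor that no tree uses receives an importance of zero, since adding noise to it leaves every prediction unchanged.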

7. The method for quantitative evaluation of the main control factors of tight sandstone gas reservoir quality based on random forest according to any one of claims 1-6, characterized by further comprising the following steps:

S6: Rebuild the decision tree and random forest according to the main control influencing factors selected by the error of the out-of-bag data;

S7: Calculate the out-of-bag data error of each preferred main control influence factor, and take the average value of out-of-bag data error of each main control influence factor as the importance value of each main control influence factor;

S8: Calculate the percentage of the importance value of each preferred main control influencing factor to the importance value of all preferred main control factors, and use it to quantitatively characterize the influence degree of each main control influencing factor on the reservoir quality.