CN117851762A - Machine learning reservoir permeability determination method and device based on PI and DeepFM


Info

Publication number
CN117851762A
Authority
CN
China
Prior art keywords: target, attribute, core, permeability, sample
Legal status: Pending
Application number
CN202311524013.4A
Other languages
Chinese (zh)
Inventor
姜福杰
曹流
霍利娜
陈迪
陈伟业
郭婧
齐振国
庞雄奇
陈冬霞
陈君青
Current Assignee
China University of Petroleum Beijing
Original Assignee
China University of Petroleum Beijing
Application filed by China University of Petroleum Beijing filed Critical China University of Petroleum Beijing
Priority to CN202311524013.4A priority Critical patent/CN117851762A/en
Publication of CN117851762A publication Critical patent/CN117851762A/en

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A 10/00: Technologies for adaptation to climate change at coastal zones; at river basins
    • Y02A 10/40: Controlling or monitoring, e.g. of flood or hurricane; Forecasting, e.g. risk assessment or mapping


Abstract

The application provides a machine learning reservoir permeability determination method and device based on PI and DeepFM. The method comprises the following steps: obtaining a target core and a sample core; processing the sample core with a permutation importance algorithm to obtain the target attributes of the sample core; taking the target attributes of the sample core as the target attributes of the target core; determining the attribute values corresponding to the target attributes of the target core; and inputting the attribute values corresponding to the target attributes of the target core into a target prediction model to obtain the target permeability of the target core. The method gives a credible, interpretable quantitative ranking of the importance of the full set of attributes in advance, so that the target attributes with a large influence on permeability are obtained; the permeability is then calculated from the attribute values corresponding to the target attributes, the linear, low-order and high-order feature interaction information among the attribute values is mined automatically, and the accuracy of the permeability calculation is improved.

Description

Machine learning reservoir permeability determination method and device based on PI and DeepFM
Technical Field
The specification belongs to the technical field of petroleum and natural gas exploration, and particularly relates to a machine learning reservoir permeability determination method and device based on PI and DeepFM.
Background
In the prior art, a permeability prediction model is built in advance on the basis of a machine learning algorithm; target influence factors with a large influence on permeability prediction are first screened out from a plurality of permeability influence factors through a feature selection method, and the target influence factors are then input into the permeability prediction model to obtain the permeability. Existing feature selection methods include the Pearson algorithm, the Spearman algorithm, principal component analysis and the like. The Pearson algorithm and principal component analysis are better suited to linear relations and offer no clear advantage for nonlinear relations, while the Spearman algorithm applies only to monotonic relations between features and is insensitive to outliers. Existing feature selection methods therefore cannot accurately and interpretably screen out the target influence factors that most affect permeability prediction, which impairs the accuracy of the subsequently predicted permeability. Meanwhile, the machine learning algorithms commonly used for permeability prediction and the classical models in the geological field cannot simultaneously mine the linear, low-order and high-order feature interaction information among the target influence factors, which also affects the accuracy of subsequent permeability prediction.
In view of the above technical problems, no effective solution has been proposed at present.
Disclosure of Invention
The application provides a machine learning reservoir permeability determination method and device based on PI and DeepFM, which can accurately calculate reservoir permeability.
An object of an embodiment of the present application is to provide a machine learning reservoir permeability determination method based on PI and DeepFM, including:
obtaining a target core and a sample core;
processing the sample core based on a permutation importance algorithm to obtain a target attribute of the sample core; taking the target attribute of the sample core as the target attribute of the target core;
determining an attribute value corresponding to a target attribute of the target core;
and inputting an attribute value corresponding to the target attribute of the target core into a target prediction model to obtain the target permeability of the target core.
Further, in another embodiment of the method, the processing the sample core based on the permutation importance algorithm to obtain the target attribute of the sample core includes:
acquiring a sample data set of a sample core; the sample data set comprises full-scale attributes, attribute values corresponding to the full-scale attributes, and a sample permeability;
respectively processing the sample data set by using a plurality of verification models to obtain a plurality of first permeabilities of the sample core;
based on a permutation importance algorithm, determining a plurality of groups of importance parameters corresponding to the full-scale attributes according to the sample permeability and the first permeability;
according to the plurality of groups of importance parameters, arranging the full-scale attributes in descending order to obtain a plurality of sorting results;
screening out a target sorting result with higher accuracy from the plurality of sorting results;
and screening, from the target sorting result, a plurality of full-scale attributes with an importance parameter larger than an importance threshold value to serve as target attributes.
Further, in another embodiment of the method, the plurality of verification models includes: a decision tree model, a random forest model, an extreme gradient boosting model, and a lightweight gradient boosting model.
Further, in another embodiment of the method, the processing the sample data sets with a plurality of verification models, respectively, to obtain a plurality of first permeabilities of the sample core includes:
randomly arranging attribute values corresponding to the full attributes to obtain a plurality of random data sets;
and respectively inputting the plurality of random data sets into a plurality of verification models to obtain a plurality of first permeabilities of the sample core.
Further, in another embodiment of the method, the determining, based on the permutation importance algorithm, a plurality of sets of importance parameters corresponding to the full-scale attribute according to the sample permeability and the first permeability includes:
calculating a plurality of groups of performance parameters of a verification model aiming at a random data set according to the sample permeability and the first permeability;
and calculating a plurality of groups of importance parameters corresponding to the full-quantity attribute according to the plurality of groups of performance parameters.
Further, in another embodiment of the method, the screening the target sorting result with higher accuracy from the plurality of sorting results includes:
extracting, from each sorting result, the full-scale attributes with importance parameters larger than zero to form a plurality of attribute sets to be detected;
detecting whether the attribute set to be detected has a reference attribute or not to obtain a detection result;
according to the detection result, taking the attribute set to be detected with the determined reference attribute as a target attribute set;
and taking the sorting result corresponding to the target attribute set as a target sorting result.
Further, in another embodiment of the method, the reference attribute includes: fractal dimension, porosity, clay content.
Further, in another embodiment of the method, the target prediction model is a deep factorization machine (DeepFM) model.
An object of an embodiment of the present application is to provide a PI and DeepFM-based machine learning reservoir permeability determination apparatus, including:
the acquisition module is used for acquiring a target core and a sample core;
the processing module is used for processing the sample core based on a permutation importance algorithm to obtain target attributes of the sample core; taking the target attribute of the sample core as the target attribute of the target core;
the determining module is used for determining an attribute value corresponding to the target attribute of the target core;
and the calculation module is used for inputting the attribute value corresponding to the target attribute of the target core into a target prediction model to obtain the target permeability of the target core.
An object of an embodiment of the present application is to provide a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the above-mentioned PI and DeepFM based machine learning reservoir permeability determination method.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure, the drawings required for the embodiments are briefly described below. The drawings described below are only some of the embodiments described in the present disclosure, and other drawings may be obtained from them by a person of ordinary skill in the art without inventive effort.
FIG. 1 is a schematic diagram of one embodiment of the PI and DeepFM based machine learning reservoir permeability determination method provided in embodiments of the present disclosure;
FIG. 2 is a schematic structural diagram of the DeepFM model according to an embodiment of the present disclosure;
FIG. 3 is a plot of MSE and RMSE before and after feature screening provided in an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of the PI and DeepFM based machine learning reservoir permeability determination apparatus according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of an embodiment of a structure of a server according to an embodiment of the present disclosure.
Detailed Description
In order to make the technical solutions in the present specification better understood by those skilled in the art, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only some embodiments of the present specification, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.
In the prior art, a permeability prediction model is built on the basis of a machine learning algorithm in advance, target influence factors with large influence on permeability prediction are screened out from a plurality of permeability influence factors in advance through a feature selection method, and then the target influence factors are input into the permeability prediction model to obtain the permeability.
Existing feature selection methods include the Pearson algorithm, the Spearman algorithm, principal component analysis, the analytic hierarchy process, tree-family algorithms and the like. The Pearson algorithm and principal component analysis are better suited to linear relations and offer no clear advantage for nonlinear relations; the Spearman algorithm applies only to monotonic relations between features and is insensitive to outliers; the analytic hierarchy process is overly subjective; tree-family algorithms are based on various mathematical calculation methods, have a certain preference for particular features and carry a certain error. Therefore, the existing feature selection methods cannot accurately and interpretably screen out the target influence factors that most affect permeability prediction, which impairs the accuracy of the subsequently predicted permeability.
Further, it is also considered that, because of the complex pore structure and strong heterogeneity of unconventional reservoirs such as tight sandstone and shale, the interactions between the factors influencing reservoir permeability are very complex, and the interaction information between low-order features (a feature being an influence factor) and high-order features is very important. Conventional classical models in the geological field and machine learning prediction models have shortcomings: most feature interactions are difficult to fully identify and model a priori, so the interaction of low-order and high-order features cannot be fully considered, a permeability prediction model with high accuracy cannot be constructed, and the accuracy of the subsequently predicted permeability is affected. For example, the classical geological models referred to in Table 5 were constructed based on multiple linear regression, considering only the influence of each individual feature on permeability and not the interaction information between features.
Aiming at the above problems and their specific causes in the prior art, the present application proposes a machine learning reservoir permeability determination method and device based on PI and DeepFM, so as to improve the accuracy of calculating reservoir permeability.
Based on the above thought, the present specification proposes a machine learning reservoir permeability determination method based on PI and DeepFM, the method comprising: obtaining a target core and a sample core; processing the sample core based on a permutation importance algorithm to obtain a target attribute of the sample core; taking the target attribute of the sample core as the target attribute of the target core; determining an attribute value corresponding to the target attribute of the target core; and inputting the attribute value corresponding to the target attribute of the target core into a target prediction model to obtain the target permeability of the target core.
Referring to FIG. 1, the present disclosure proposes a machine learning reservoir permeability determination method based on PI and DeepFM. In a particular implementation, the method may include the following steps.
S101: and obtaining a target core and a sample core.
In some embodiments, the target core is derived from a target reservoir, and there are typically a plurality of target cores; for example, sampling at different locations of a target reservoir yields a plurality of target cores. The sample core is derived from a sample reservoir whose geological conditions are relatively close to those of the target reservoir, so that the physical properties of the target core are relatively close to those of the sample core; the number of sample cores is also typically plural and is typically smaller than the number of target cores.
S102: processing the sample core based on an arrangement importance algorithm to obtain a target attribute of the sample core; and taking the target attribute of the sample core as the target attribute of the target core.
In some embodiments, the processing the sample core based on the ranking importance algorithm to obtain the target attribute of the sample core specifically includes:
s1: acquiring a sample data set of a sample core; the sample data set comprises a full attribute, an attribute value corresponding to the full attribute and a sample permeability;
s2: respectively processing the sample data set by using a plurality of verification models to obtain a plurality of first permeabilities of the sample core;
s3: based on an arrangement importance algorithm, determining a plurality of groups of importance parameters corresponding to the full-scale attribute according to the sample permeability and the first permeability;
s4: according to a plurality of groups of importance parameters, the full-quantity attributes are respectively arranged in a descending order to obtain a plurality of sequencing results;
s5: screening out a target sorting result with higher accuracy from a plurality of sorting results;
s6: and screening a plurality of full-quantity attributes with the importance parameter larger than an importance threshold value from the target sorting result to serve as target attributes.
In some embodiments, the full-scale attributes refer to all attributes that may have an impact on permeability prediction. Among the full-scale attributes, some attributes have a great influence on permeability prediction and need to be extracted, while other attributes have little impact on permeability prediction and need to be culled.
In some embodiments, the full-scale attributes specifically include: pore structure microscopic characterization parameters, mineral composition parameters, and porosity.
In some embodiments, the pore structure microscopic characterization parameters specifically include: a sorting coefficient (Sp), a displacement pressure (Pcd), the pore radius corresponding to a cumulative mercury intrusion saturation of 25% (R25), the pore radius corresponding to a cumulative mercury intrusion saturation of 20% (R20), the maximum pore throat radius (Rmax), the average pore throat radius (Rave), the pore throat distribution peak position (PSD-P), the characteristic structure coefficient (P1), the relative sorting coefficient (Dr), the skewness (Skp), the final residual saturation (Smin), the fractal dimension (D6), and the inflection point radius (Rapex).
In some embodiments, the mineral composition parameters specifically include: feldspar content (Feldspar), quartz content (Quartz), carbonate content (Carbonate), clay content (Clay).
In some embodiments, the attribute values corresponding to the full-scale attributes may be obtained as follows:
S1: performing a high-pressure mercury intrusion experiment on the sample core to obtain the attribute values corresponding to the sorting coefficient (Sp), the displacement pressure (Pcd), the pore radius corresponding to a cumulative mercury intrusion saturation of 25% (R25), the pore radius corresponding to a cumulative mercury intrusion saturation of 20% (R20), the maximum pore throat radius (Rmax), the average pore throat radius (Rave), the pore throat distribution peak position (PSD-P), the characteristic structure coefficient (P1), the relative sorting coefficient (Dr), the skewness (Skp), the final residual saturation (Smin) and the inflection point radius (Rapex);
S2: obtaining the mercury saturation distribution from the high-pressure mercury intrusion experiment, and calculating the attribute value corresponding to the fractal dimension (D6);
S3: performing an X-ray diffraction experiment on the sample core to obtain the attribute values corresponding to the feldspar content (Feldspar), the quartz content (Quartz), the carbonate content (Carbonate) and the clay content (Clay);
S4: performing a porosity measurement experiment on the sample core to obtain the attribute value corresponding to the porosity.
In some embodiments, the inflection point radius (Rapex) is determined from the mercury intrusion curve; in the calculation, Rapex denotes the inflection point radius, S_Hg,m+1 denotes the mercury saturation corresponding to the (m+1)-th data point, S_Hg,m denotes the mercury saturation corresponding to the m-th data point, and P_c denotes the mercury intrusion pressure.
In some embodiments, the fractal dimension is used to describe the pore surface morphology of the sample core and is calculated from the mercury intrusion data; in the calculation, r denotes the pore radius, N(>r) denotes the number of pores with a pore radius greater than r, N(r) denotes the number of pores with a pore radius equal to r, r_max denotes the maximum pore radius, P(r) denotes the pore radius density distribution function, D6 denotes the fractal dimension, V_Hg(r) denotes the cumulative mercury intrusion volume corresponding to the pore radius r, and l denotes the sample core length (or capillary length).
In some embodiments, a permeability experiment may be performed on the sample core to obtain the permeability of the sample core (i.e. the sample permeability). The sample permeability is the true value of the permeability of the sample core. Measuring permeability experimentally is operationally complex and costly, and is therefore impractical to extend to a large number of target cores. In the present application, the target attributes of the sample core that matter most for predicting reservoir permeability are first screened out from the sample data set of the sample core with the permutation importance algorithm; the target attributes of the sample core are then taken as the target attributes of the target core; the attribute values corresponding to the target attributes of the target core are obtained; and finally, using these attribute values and a deep factorization machine model (DeepFM model), the linear, low-order and high-order feature interaction information among the target attributes is mined simultaneously to predict the target permeability of the target core. The approach described in the present application reduces the experimental cost of obtaining permeability while maintaining the accuracy of the calculated permeability.
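For illustration only, the overall workflow described above can be sketched in a few lines of Python; the function names, the argument layout and the externally supplied feature-selection routine and DeepFM model are assumptions of this sketch rather than part of the claimed method:

```python
# Illustrative outline of the workflow; all names are assumptions of this sketch.
def determine_target_permeability(sample_X, sample_y, target_X, attribute_names,
                                  select_target_attributes, deepfm_model):
    """sample_X: attribute values of the sample cores (rows: cores, columns: full-scale
    attributes); sample_y: measured sample permeabilities; target_X: attribute values
    of the target cores for the same full-scale attributes, as NumPy arrays."""
    # S102: screen the target attributes on the sample cores with the PI algorithm.
    target_attrs = select_target_attributes(sample_X, sample_y, attribute_names)
    columns = [attribute_names.index(name) for name in target_attrs]
    # S103: keep only the attribute values corresponding to the target attributes.
    target_values = target_X[:, columns]
    # S104: predict the target permeability with the trained DeepFM model.
    return deepfm_model.predict(target_values)
```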
Based on the above embodiments, the attribute values corresponding to the full-scale attributes of a plurality of sample cores and the corresponding sample permeabilities can be obtained, and a sample data set is constructed; a correspondence exists between the attribute values corresponding to the full-scale attributes of a sample core and its sample permeability.
In some embodiments, the sample data set is processed by a plurality of verification models respectively to obtain a plurality of first permeabilities of the sample core, which specifically includes:
s1: randomly arranging attribute values corresponding to the full attributes to obtain a plurality of random data sets;
s2: and respectively inputting the plurality of random data sets into a plurality of verification models to obtain a plurality of first permeabilities of the sample core.
In some embodiments, the plurality of verification models includes: a decision tree model (DT model), a random forest model (RF model), an extreme gradient boosting model (XGBoost model) and a lightweight gradient boosting model (LightGBM model).
In some embodiments, each verification model is trained in advance by a machine learning algorithm. A verification model is itself a permeability prediction model: the attribute values corresponding to the full-scale attributes are input, and the predicted permeability is output. In the present method, the importance parameters of all the full-scale attributes are obtained based on the permeability (namely the first permeability) output by the verification models, so that the target attributes with the higher degree of importance are selected from the full-scale attributes.
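As one possible realization (the libraries and hyperparameters below are assumptions of this sketch; the patent does not prescribe an implementation), the four verification models can be instantiated and trained on the sample data set with common Python packages:

```python
# Possible instantiation of the four verification models; hyperparameters are
# illustrative defaults, not values specified by the patent.
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor      # extreme gradient boosting
from lightgbm import LGBMRegressor    # lightweight gradient boosting

def build_verification_models(random_state=0):
    return {
        "DT": DecisionTreeRegressor(random_state=random_state),
        "RF": RandomForestRegressor(n_estimators=200, random_state=random_state),
        "XGBoost": XGBRegressor(n_estimators=200, random_state=random_state),
        "LightGBM": LGBMRegressor(n_estimators=200, random_state=random_state),
    }

def fit_verification_models(models, sample_X, sample_y):
    # Each model learns to predict permeability from the full-scale attribute values.
    for model in models.values():
        model.fit(sample_X, sample_y)
    return models
```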
In some embodiments, a correspondence exists between the attribute values corresponding to the full-scale attributes of the sample core and the sample permeability. A first full-scale attribute is selected, and the correspondence between its attribute values and the sample permeability is disturbed multiple times (i.e. the attribute values corresponding to this full-scale attribute are randomly rearranged) while the correspondences between the attribute values of all other full-scale attributes and the sample permeability are kept unchanged; each disturbance (random rearrangement) yields one first random data set.
A second full-scale attribute is then selected and treated in the same way, yielding a plurality of second random data sets.
Repeating the above steps, the attribute values corresponding to each of the remaining full-scale attributes are randomly rearranged in turn, yielding pluralities of third through eighteenth random data sets (one group of random data sets for each of the eighteen full-scale attributes).
In some embodiments, the plurality of first random data sets are respectively input into the decision tree model, the random forest model, the extreme gradient boosting model and the lightweight gradient boosting model, and each model outputs a plurality of first permeabilities; the first permeabilities are the permeabilities predicted from the randomly rearranged data sets.
The plurality of second random data sets are likewise input into the decision tree model, the random forest model, the extreme gradient boosting model and the lightweight gradient boosting model, and each model again outputs a plurality of first permeabilities.
Repeating the above steps, the pluralities of third through eighteenth random data sets are respectively input into the decision tree model, the random forest model, the extreme gradient boosting model and the lightweight gradient boosting model to obtain further pluralities of first permeabilities.
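A minimal sketch of this disturbance step, assuming the sample data set is held as a NumPy array with one column per full-scale attribute (the function and parameter names are illustrative):

```python
import numpy as np

def first_permeabilities_for_attribute(model, sample_X, attr_index, K=10, seed=0):
    """Shuffle only the column of one full-scale attribute K times, leaving every other
    column's correspondence with the sample permeability unchanged, and collect the
    first permeabilities predicted by a trained verification model."""
    rng = np.random.default_rng(seed)
    predictions = []
    for _ in range(K):
        random_data_set = sample_X.copy()
        rng.shuffle(random_data_set[:, attr_index])          # disturb this attribute only
        predictions.append(model.predict(random_data_set))   # first permeabilities
    return predictions  # K arrays of predicted (first) permeabilities
```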
According to the permutation importance (PI) algorithm, if a certain full-scale attribute has an important influence on the predicted permeability, then after a random data set is constructed by disturbing the attribute values corresponding to that attribute and the disturbed random data set is input into a verification model to obtain the first permeability, the difference between the first permeability and the sample permeability is very large; the importance of that full-scale attribute is high, and its attribute values strongly affect the prediction of reservoir permeability. If a certain full-scale attribute has little or no influence on the predicted permeability, then after the corresponding disturbed random data set is input into the verification model, the difference between the first permeability and the sample permeability is small; the importance of that full-scale attribute is low, and its attribute values have little or no effect on the predicted reservoir permeability. Therefore, based on the PI algorithm, the target attributes that have an important influence on the prediction of reservoir permeability can be screened out from the full-scale attributes accurately and interpretably, and the reservoir permeability can then be predicted using only the attribute values corresponding to the target attributes.
In some embodiments, the importance parameter represents the degree of degradation of the verification model. If a certain full-scale attribute has an important influence on the predicted permeability, the difference between the first permeability obtained from the disturbed random data set and the sample permeability is very large, the importance parameter corresponding to that attribute is large and positive, and the performance of the verification model degrades severely. If a certain full-scale attribute has little or no influence on the predicted permeability, the difference between the first permeability and the sample permeability is small, and the importance parameter corresponding to that attribute is small, zero or even negative; the verification model hardly degrades or may even improve (when the importance parameter is 0 the performance of the verification model hardly degrades, and when it is negative the performance even improves).
In some embodiments, based on a permutation importance algorithm, determining a plurality of sets of importance parameters corresponding to the full-scale attribute according to the sample permeability and the first permeability specifically includes:
S1: calculating a plurality of groups of performance parameters of a verification model aiming at a random data set according to the sample permeability and the first permeability;
s2: and calculating a plurality of groups of importance parameters corresponding to the full-quantity attribute according to the plurality of groups of performance parameters.
In some embodiments, the importance parameter may be calculated according to the following formula:
M_q = s - (1/K) Σ_{u=1}^{K} s_{u,q}
where q denotes the index of the full-scale attribute, M_q denotes the importance parameter of the q-th full-scale attribute, s denotes the performance parameter of the verification model on the original sample data set, K denotes the total number of random data sets obtained for that full-scale attribute, and s_{u,q} denotes the performance parameter of the verification model on the u-th random data set of the q-th full-scale attribute.
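A direct translation of this formula into Python (for illustration; s is the performance parameter on the original sample data set and s_uq the performance parameters on the K random data sets of attribute q):

```python
import numpy as np

def importance_parameter(s, s_uq):
    """M_q = s - (1/K) * sum_{u=1..K} s_{u,q}."""
    s_uq = np.asarray(s_uq, dtype=float)
    return s - s_uq.mean()
```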
In some embodiments, the performance parameter characterizes the degree of agreement between the sample permeability and the first permeability: the closer the first permeability is to the sample permeability, the greater the performance parameter.
In some embodiments, the performance parameter is denoted R^2, a statistical indicator for evaluating the goodness of fit of a regression model (here, a verification model). It represents the proportion of the variability of the dependent variable that can be explained by the model, i.e. the degree to which the model fits the data. R^2 ranges from 0 to 1, and a higher R^2 value means the model explains more of the variability of the dependent variable, i.e. the fit is better.
R^2 is interpreted as follows:
R^2 = 0: the model cannot explain the variability of the dependent variable, i.e. the predicted values of the model are unrelated to the actual observed values;
R^2 = 1: the model fully explains the variability of the dependent variable, i.e. the predicted values of the model are fully consistent with the actual observed values.
R^2 is calculated from the total sum of squares (Total Sum of Squares, TSS), the regression sum of squares (Regression Sum of Squares, RSS) and the residual sum of squares (Residual Sum of Squares, ESS). The specific calculation steps are as follows (a small sketch follows the steps):
1. Calculate the total sum of squares (TSS), representing the total variability of the dependent variable; TSS equals the sum of squared differences between the sample permeability and the mean of the sample permeability.
2. Calculate the regression sum of squares (RSS), representing the variability explained by the model; RSS equals the sum of squared differences between the first permeability and the mean of the sample permeability.
3. Calculate the residual sum of squares (ESS), representing the residual variability that the model cannot explain; ESS equals the sum of squared differences between the sample permeability and the first permeability.
4. Calculate R^2: R^2 = 1 - (ESS/TSS) = RSS/TSS.
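The sketch below computes R^2 exactly as in the steps above (NumPy is an implementation choice of this sketch; the naming follows the description, with ESS as the residual sum of squares and RSS as the regression sum of squares):

```python
import numpy as np

def r_squared(sample_permeability, first_permeability):
    y = np.asarray(sample_permeability, dtype=float)      # true values
    y_hat = np.asarray(first_permeability, dtype=float)   # model predictions
    tss = np.sum((y - y.mean()) ** 2)        # total sum of squares
    rss = np.sum((y_hat - y.mean()) ** 2)    # regression sum of squares
    ess = np.sum((y - y_hat) ** 2)           # residual sum of squares
    return 1.0 - ess / tss                   # equals rss / tss for a least-squares fit
```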
In some embodiments, a set of importance parameters corresponding to the decision tree model may be determined from the first permeabilities calculated by the decision tree model combined with the sample permeability; a set of importance parameters corresponding to the random forest model may be determined from the first permeabilities calculated by the random forest model combined with the sample permeability; a set of importance parameters corresponding to the extreme gradient boosting model may be determined from the first permeabilities calculated by the extreme gradient boosting model combined with the sample permeability; and a set of importance parameters corresponding to the lightweight gradient boosting model may be determined from the first permeabilities calculated by the lightweight gradient boosting model combined with the sample permeability.
In some embodiments, when the importance parameter is positive, the larger its value, the more important the corresponding full-scale attribute and the greater its influence on the predicted permeability. An importance parameter equal to 0 means that the corresponding full-scale attribute has essentially no influence on the predicted permeability and can be discarded. A negative importance parameter means that the first permeability obtained after shuffling the corresponding full-scale attribute is more accurate than the permeability predicted before shuffling, indicating that the attribute has essentially no influence, or even a negative influence, on the predicted permeability and can be discarded.
TABLE 1
TABLE 2
TABLE 3
TABLE 4
In some embodiments, Table 1 shows the ranking result for the full-scale attributes based on the decision tree model, Table 2 the ranking result based on the random forest model, Table 3 the ranking result based on the extreme gradient boosting model, and Table 4 the ranking result based on the lightweight gradient boosting model. The full-scale attributes in Tables 1 to 4 are: the sorting coefficient (Sp), displacement pressure (Pcd), pore radius corresponding to a cumulative mercury intrusion saturation of 25% (R25), pore radius corresponding to a cumulative mercury intrusion saturation of 20% (R20), maximum pore throat radius (Rmax), average pore throat radius (Rave), pore throat distribution peak position (PSD-P), characteristic structure coefficient (P1), relative sorting coefficient (Dr), skewness (Skp), final residual saturation (Smin), fractal dimension (D6), inflection point radius (Rapex), porosity, feldspar content (Feldspar), quartz content (Quartz), carbonate content (Carbonate) and clay content (Clay).
Based on the above embodiments, the feature screening method provided in the present application (the PI algorithm) is intuitive, has no bias toward particular features, is consistent with the properties expected of a feature importance measure, and is model-agnostic. The PI algorithm is therefore interpretable.
In some embodiments, screening a target sorting result with higher accuracy from a plurality of sorting results specifically includes:
s1: extracting, from each sorting result, the full-scale attributes with importance parameters larger than zero to form a plurality of attribute sets to be detected;
s2: detecting whether the attribute set to be detected has a reference attribute or not to obtain a detection result;
s3: according to the detection result, taking the attribute set to be detected with the determined reference attribute as a target attribute set;
s4: and taking the sorting result corresponding to the target attribute set as a target sorting result.
In some embodiments, the reference attributes include: fractal dimension, porosity, clay content.
In some embodiments, a reference attribute is an attribute that has already been determined to have a large influence on permeability prediction. When the reference attributes are all present in an attribute set to be detected, the sorting result corresponding to that attribute set is accurate.
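For illustration, the screening of target sorting results can be sketched as follows; the labels used for the reference attributes and the data layout of the sorting results are assumptions of this sketch:

```python
# Assumed labels for the reference attributes: fractal dimension, porosity, clay content.
REFERENCE_ATTRIBUTES = {"D6", "Porosity", "Clay"}

def select_target_sorting_results(sorting_results):
    """sorting_results: dict mapping a verification-model name to its sorting result,
    i.e. a list of (attribute_name, importance_parameter) pairs in descending order."""
    target_results = {}
    for model_name, ranking in sorting_results.items():
        attribute_set_to_detect = {attr for attr, importance in ranking if importance > 0}
        if REFERENCE_ATTRIBUTES.issubset(attribute_set_to_detect):
            target_results[model_name] = ranking   # keep as a target sorting result
    return target_results
```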
In some embodiments, the attribute set to be detected corresponding to Table 1 is {fractal dimension, carbonate content, relative sorting coefficient}; porosity and clay content are absent, so the sorting result corresponding to Table 1 has low accuracy and is not used.
In some embodiments, the set of attributes to be detected corresponding to table 2 is { fractal dimension, characteristic structural coefficient, porosity, relative sorting coefficient, carbonate content, clay content, feldspar content, displacement pressure, pore structure microscopic characterization parameter sorting coefficient, pore radius corresponding to cumulative mercury saturation of 25%, inflection point radius, pore radius corresponding to cumulative mercury saturation of 20%, average pore throat radius, quartz content }, where fractal dimension, porosity, clay content are present, so the sorting result corresponding to table 2 is highly accurate and remains as the target sorting result.
In some embodiments, the attribute set to be detected corresponding to Table 3 is {fractal dimension, porosity}; clay content is absent, so the sorting result corresponding to Table 3 has low accuracy and is not used.
In some embodiments, the set of attributes to be detected corresponding to table 4 is { fractal dimension, skewness, clay content, porosity, carbonate content, pore structure microscopic characterization parameter sorting coefficient }, where fractal dimension, porosity, clay content are present, so the sorting result corresponding to table 4 is highly accurate and remains as the target sorting result.
In some embodiments, the importance threshold may be set to 0.0200. For table 2, the target attributes obtained are: fractal dimension, characteristic structure coefficient, porosity, relative sorting coefficient, carbonate content, clay content, feldspar content, displacement pressure, pore structure microscopic characterization parameter sorting coefficient, and pore radius corresponding to accumulated mercury saturation of 25%. For table 4, the target attributes obtained are: fractal dimension, skewness, clay content, porosity, carbonate content, and microscopic characterization parameter sorting coefficient of the pore structure.
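The final thresholding step can then be sketched as follows (the threshold value 0.02 follows the example above; the data layout is the same assumption as in the previous sketch):

```python
def select_target_attributes_from_ranking(target_sorting_result, importance_threshold=0.02):
    """target_sorting_result: list of (attribute_name, importance_parameter) pairs
    in descending order of importance; returns the target attributes."""
    return [attr for attr, importance in target_sorting_result
            if importance > importance_threshold]
```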
In some embodiments, the target property of the sample core may be taken as the target property of the target core because the physical properties of the target core and the physical properties of the sample core are relatively close.
S103: and determining an attribute value corresponding to the target attribute of the target core.
In some embodiments, the method for determining the attribute value corresponding to the target attribute of the target core may refer to an embodiment for determining the attribute value corresponding to the total attribute of the sample core, which is not described herein.
S104: and inputting an attribute value corresponding to the target attribute of the target core into a target prediction model to obtain the target permeability of the target core.
In some embodiments, the target prediction model is a deep factorization machine model, also referred to as the DeepFM model, which is based on the DeepFM algorithm. Feature combination is a problem encountered in many machine learning modeling processes: if features are modeled directly, the correlation information between features (i.e. attribute values) is very likely to be ignored. The DeepFM algorithm improves the output accuracy of the model by constructing cross features, a new way of combining features. The DeepFM algorithm mines linear, low-order and high-order feature interactions simultaneously and does not require manual feature engineering.
In some embodiments, FIG. 2 shows a schematic structural diagram of the DeepFM model, in which Sparse Features denotes the sparse feature input layer; Dense Embeddings denotes the fully connected embedding layers, and Sparse Features together with Dense Embeddings form the Feature Embedding module; FM Layer denotes the FM (Factorization Machine) module; Hidden Layers denotes the Deep module; and Output Units denotes the output module, which outputs the target permeability. A number of attribute values are input through Sparse Features: Field i, Field j, ..., Field m (i, j, m denote the indices of the attribute values, and Field denotes an attribute value). Addition denotes the operation of adding all inputs, and Inner Product denotes a unit whose output is the inner product of two input vectors (Addition corresponds to the first-order part and Inner Product to the second-order part of the two-dimensional FM module equation); Sigmoid Function denotes the sigmoid function, Activation Function denotes an activation function, Weight-1 Connection denotes a connection with a fixed weight of 1, Normal Connection denotes a connection whose weight is to be learned, and Embedding denotes the embedding operation. The Deep module is responsible for extracting high-order features, the FM module is responsible for extracting low-order features, and the two modules share the same input (the Feature Embedding module) and work in parallel. The FM module extracts the inner products of the hidden vectors of the features in each dimension and therefore has strong interpretability; however, because of its computational complexity, usually only second-order feature combinations are used. The Deep module is a feed-forward neural network for learning high-order feature interactions. Because the FM module and the Deep module share the Feature Embedding module, the DeepFM model can learn the interactions of low-order and high-order features from the input raw features (i.e. the attribute values) at the same time.
In some embodiments, the prediction function of the DeepFM algorithm is:
y = sigmoid(y_FM + y_DNN)
where y denotes the output of the DeepFM model, i.e. the target permeability; y_FM denotes the output of the FM module; y_DNN denotes the output of the Deep module; and sigmoid denotes the sigmoid function.
The two-dimensional FM module equation is defined as follows:
y_FM = w_0 + Σ_{i=1}^{n} w_i x_i + Σ_{i=1}^{n} Σ_{j=i+1}^{n} <v_i, v_j> x_i x_j
where i and j denote the indices of attribute values and n denotes the total number of attribute values; x_i denotes the i-th attribute value; w_0 denotes the global bias term; w_i denotes the weight parameter corresponding to x_i; x_j denotes the j-th attribute value; <v_i, v_j> denotes a cross term, of which there are n(n-1)/2 in total; v_i denotes the hidden vector corresponding to the i-th attribute value x_i, and v_j denotes the hidden vector corresponding to the j-th attribute value x_j.
Here <·, ·> denotes the inner product of two k-dimensional vectors:
<v_i, v_j> = Σ_{f=1}^{k} v_{i,f} v_{j,f}
For each x_i, a k-dimensional auxiliary vector v_i = {v_{i,1}, v_{i,2}, v_{i,3}, …, v_{i,k}} is introduced (k is much smaller than n), and the combination parameter <v_i, v_j> is represented by the result of the vector inner product; k is a hyperparameter defining the decomposition dimension and is a positive integer. The pairwise relationship between features is represented by the inner product of the hidden vectors, which are essentially embedded representations of the features.
In the two-dimensional FM module equation, w_0 + Σ_{i=1}^{n} w_i x_i is a traditional linear model, and the FM module adds the cross terms <v_i, v_j> x_i x_j, i.e. combinations of features, to it. When the parameters in the cross terms are all 0, the two-dimensional FM module equation degenerates into an ordinary linear model. It can be seen from the two-dimensional FM module equation that the FM module mines the linear relationship and the second-order feature combination information simultaneously.
In some embodiments, the value of w_0 satisfies w_0 ∈ R (R denotes the set of real numbers), the value of w_i satisfies w_i ∈ R, and the hidden vector v_i satisfies v_i ∈ R^k (k denotes the dimension of v_i), the hidden-vector matrix thus lying in R^{n×k}.
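A numerical sketch of the two-dimensional FM module equation above, with random placeholder parameters (the vectorised cross-term identity used below is a standard equivalent form, not a separate formula of the patent):

```python
import numpy as np

def fm_output(x, w0, w, V):
    """x: attribute values (n,); w0: global bias; w: linear weights (n,);
    V: hidden vectors, one k-dimensional row per attribute value (n, k)."""
    linear = w0 + np.dot(w, x)
    xv = V * x[:, None]                       # rows are x_i * v_i
    # sum_{i<j} <v_i, v_j> x_i x_j, computed with the usual O(n*k) identity
    cross = 0.5 * np.sum(np.sum(xv, axis=0) ** 2 - np.sum(xv ** 2, axis=0))
    return linear + cross

n, k = 6, 4
rng = np.random.default_rng(0)
y_fm = fm_output(rng.normal(size=n), 0.1, rng.normal(size=n), rng.normal(size=(n, k)))
```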
The deep neural network of the Deep module learns high-order feature combination information through multiple hidden layers and nonlinear transformations, and its input is the horizontal concatenation of all dense vectors:
a^(0) = [e_1, e_2, e_3, …, e_m]
where a^(0) denotes the input of the Deep module, e_p (p = 1, 2, 3, …, m) denotes the p-th dense vector, and m denotes the total number of dense vectors.
In some embodiments, the Feature Embedding module compresses the data dimensions of the original features, converting the original features into dense vectors.
The hidden layer equation is:
a^(l+1) = σ(W^(l) a^(l) + b^(l))
where l denotes the index of the hidden layer, a^(l+1) denotes the output of the (l+1)-th hidden layer, a^(l) denotes the output of the l-th hidden layer, σ denotes the activation function, W^(l) denotes the weight parameters of the l-th hidden layer, and b^(l) denotes the bias of the l-th hidden layer.
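A matching sketch of the Deep module and of the final combination y = sigmoid(y_FM + y_DNN); the ReLU activation and the layer layout are assumptions of this sketch:

```python
import numpy as np

def deep_output(dense_vectors, weights, biases):
    """dense_vectors: the m embedding vectors e_1..e_m; weights/biases: parameters of
    the hidden layers followed by a final scalar output layer."""
    a = np.concatenate(dense_vectors)               # a^(0) = [e_1, e_2, ..., e_m]
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.maximum(W @ a + b, 0.0)              # a^(l+1) = sigma(W^(l) a^(l) + b^(l))
    return float(weights[-1] @ a + biases[-1])      # scalar y_DNN

def deepfm_predict(y_fm, y_dnn):
    return 1.0 / (1.0 + np.exp(-(y_fm + y_dnn)))    # y = sigmoid(y_FM + y_DNN)
```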
Based on the above embodiments, the PI algorithm is fast to compute, easy to understand, applicable to structured data and model-agnostic; it has no bias toward particular features and is consistent with the properties expected of a feature importance measure. The DeepFM algorithm can simultaneously mine the linear, low-order and high-order interaction information among features (a feature being an attribute value). The present application combines an interpretable artificial intelligence algorithm with the feature selection of permeability influence factors, so that, compared with the feature selection algorithms in the prior art, it has higher interpretability and a wider range of application and is not affected by model defects. Further, from the two-dimensional FM module equation and the neural network of the Deep module, the ability of the DeepFM algorithm to automatically mine the linear, low-order and high-order interaction information among geological features is itself interpretable.
In a specific example scenario, a permeability experiment is also performed on the target cores, and the measured permeability of each target core is taken as the true value to verify the accuracy of the method described in the present application. For the target cores, various classical permeability models are used to predict the permeability, and the errors of the prediction results are shown in Table 5.
TABLE 5
In Table 5, K_0 denotes permeability, R10 denotes the pore radius corresponding to a cumulative mercury intrusion saturation of 10%, R15 that corresponding to 15%, R20 that corresponding to 20%, R25 that corresponding to 25%, R30 that corresponding to 30%, R35 that corresponding to 35%, R40 that corresponding to 40%, φ denotes the porosity, Rapex denotes the inflection point radius, MSE denotes the mean square error, and RMSE denotes the root mean square error.
The mean square error is determined by the following formula:
MSE = (1/n_0) Σ_{i=1}^{n_0} (K_exp,i - K_pred,i)^2
where K_exp denotes the true value of the permeability, K_pred denotes the predicted value of the permeability (obtained by a model in Table 5), and n_0 denotes the number of true permeability values (i.e. the number of target cores).
The root mean square error is determined by the following formula:
RMSE = √MSE
As can be seen from Table 5, Winland (R35) has the highest calculation accuracy among all the classical permeability models, and the calculation results of Winland (R35) are therefore retained for subsequent comparison.
The attribute values corresponding to the 18 full-scale attributes are input into the DeepFM model to predict permeability, and the result is recorded as DeepFM (18 Features). The attribute values corresponding to the 10 target attributes extracted in Table 2 (fractal dimension, characteristic structure coefficient, porosity, relative sorting coefficient, carbonate content, clay content, feldspar content, displacement pressure, sorting coefficient, and the pore radius corresponding to a cumulative mercury intrusion saturation of 25%) are input into the DeepFM model to predict permeability, and the result is recorded as PI-DeepFM (10 Features). The attribute values corresponding to the 6 target attributes extracted in Table 4 (fractal dimension, skewness, clay content, porosity, carbonate content and sorting coefficient) are input into the DeepFM model to predict permeability, and the result is recorded as PI-DeepFM (6 Features). The prediction results of the various models are shown in Table 6.
TABLE 6
As can be seen from Table 6, the RMSE of DeepFM (18 Features) reaches 0.542, which is 3.04% lower than that of the Winland (R35) model, the best-performing classical permeability model. The RMSE of PI-DeepFM (10 Features) is 0.475, 15.03% lower than that of Winland (R35). The RMSE of PI-DeepFM (6 Features) is 0.415, 25.76% lower than that of the Winland (R35) model. Therefore, with the method provided in the present application, inputting the target attributes screened out with the verification models into the DeepFM model for prediction effectively improves the calculation accuracy of the permeability.
From the MSE and RMSE values in Table 6, the distributions of MSE and RMSE before and after feature screening are obtained. Referring to FIG. 3, MSE1 denotes the mean square error of DeepFM (18 Features), MSE2 the mean square error of PI-DeepFM (10 Features), MSE3 the mean square error of PI-DeepFM (6 Features), RMSE1 the root mean square error of DeepFM (18 Features), RMSE2 the root mean square error of PI-DeepFM (10 Features), RMSE3 the root mean square error of PI-DeepFM (6 Features), and the vertical axis Value denotes the values of MSE and RMSE. FIG. 3 specifically shows the mean square error and root mean square error distributions of DeepFM (18 Features), PI-DeepFM (10 Features) and PI-DeepFM (6 Features) obtained by running the target prediction model 50 times. Based on these experimental results, the method combining feature screening with the DeepFM model is clearly superior to the classical permeability models. The full-scale attribute importance analysis and permeability prediction method of the present application is interpretable, can be applied at large scale, is suitable for structured data, saves time and reduces human intervention, and provides a new artificial intelligence method and line of thought for permeability prediction in rock porous media with complex pore structures. The method can effectively mine the potential interaction information among the attributes, greatly reduce the error of permeability prediction, and can be used to guide tight sandstone oil and gas exploration and development and the study of tight sandstone gas migration and accumulation mechanisms, so it has good application prospects in the geological field.
Based on the above machine learning reservoir permeability determination method based on PI and DeepFM, the present disclosure further provides an embodiment of a machine learning reservoir permeability determination apparatus based on PI and DeepFM. As shown in FIG. 4, the apparatus specifically includes the following modules: an acquisition module 401, a processing module 402, a determination module 403 and a calculation module 404.
An acquisition module 401, configured to acquire a target core and a sample core;
a processing module 402, configured to process the sample core based on a permutation importance algorithm to obtain a target attribute of the sample core; taking the target attribute of the sample core as the target attribute of the target core;
a determining module 403, configured to determine an attribute value corresponding to a target attribute of the target core;
and the calculating module 404 is configured to input an attribute value corresponding to the target attribute of the target core into a target prediction model, so as to obtain a target permeability of the target core.
In some embodiments, the processing module 402 is specifically configured to: obtain a sample data set of the sample core, the sample data set comprising the full attributes, the attribute values corresponding to the full attributes, and the sample permeability; process the sample data set with a plurality of verification models, respectively, to obtain a plurality of first permeabilities of the sample core; determine, based on the permutation importance algorithm, a plurality of groups of importance parameters corresponding to the full attributes according to the sample permeability and the first permeabilities; sort the full attributes in descending order according to each group of importance parameters to obtain a plurality of sorting results; screen out a target sorting result with higher accuracy from the plurality of sorting results; and screen out, from the target sorting result, the full attributes whose importance parameters are greater than an importance threshold to serve as the target attributes.
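A minimal sketch of this attribute-screening step is given below, assuming scikit-learn. The two verification models, the attribute names, the random data, and the importance threshold are illustrative placeholders; the application itself uses the plurality of verification models described above (decision tree, random forest, extreme gradient boosting, and light gradient boosting models) and its own accuracy-based selection of the target sorting result.

```python
# Minimal sketch of permutation-importance attribute screening.
# Assumes scikit-learn; attribute names, data, and threshold are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.inspection import permutation_importance

X = np.random.rand(200, 18)                  # 200 sample cores, 18 full attributes
y = np.random.rand(200)                      # sample permeability (placeholder)
attribute_names = [f"attr_{i}" for i in range(18)]

verification_models = {
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
}

rankings = {}
for name, model in verification_models.items():
    model.fit(X, y)
    # Shuffle each attribute column in turn and measure the drop in model score.
    result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
    order = np.argsort(result.importances_mean)[::-1]        # descending order
    rankings[name] = [(attribute_names[i], result.importances_mean[i]) for i in order]

# Here the "target sorting result" is simply taken from the better-fitting model;
# keep only attributes whose importance exceeds the threshold as target attributes.
best = max(verification_models, key=lambda n: verification_models[n].score(X, y))
importance_threshold = 0.01
target_attributes = [a for a, imp in rankings[best] if imp > importance_threshold]
```

In the application itself, the target sorting result is instead selected by accuracy and checked for the presence of reference attributes (fractal dimension, porosity, clay content), as set out in the claims.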
It should be noted that the units, devices, or modules described in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function. For convenience of description, the above device is described as being functionally divided into various modules. Of course, when the present description is implemented, the functions of the modules may be implemented in the same piece or pieces of software and/or hardware, or a module implementing a given function may be realized by a combination of multiple sub-modules or sub-units. The above apparatus embodiments are merely illustrative; for example, the division of the units is merely a logical division of functions, and other divisions are possible in actual implementation, such as combining or integrating multiple units or components into another system, or omitting or not performing certain features. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections via interfaces, devices, or units, and may be electrical, mechanical, or of other forms.
The present description embodiments also provide a computer storage medium storing computer program instructions for the machine learning reservoir permeability determination method based on PI and deep FM, which, when executed by a processor, implement: obtaining a target core and a sample core; processing the sample core based on a permutation importance algorithm to obtain a target attribute of the sample core; taking the target attribute of the sample core as the target attribute of the target core; determining an attribute value corresponding to the target attribute of the target core; and inputting the attribute value corresponding to the target attribute of the target core into a target prediction model to obtain the target permeability of the target core.
In the present embodiment, the storage medium includes, but is not limited to, a random access memory (Random Access Memory, RAM), a read-only memory (Read-Only Memory, ROM), a cache (Cache), a hard disk drive (Hard Disk Drive, HDD), or a memory card (Memory Card). The memory may be used to store the computer program instructions. The network communication unit may be an interface for network connection communication, configured in accordance with a standard prescribed by the communication protocol.
In this embodiment, the functions and effects of the program instructions stored in the computer storage medium can be understood by reference to the explanations in other embodiments, and are not repeated here.
The present disclosure also provides a server comprising a processor and a memory for storing processor-executable instructions, wherein the processor, when executing the instructions, performs the following steps: obtaining a target core and a sample core; processing the sample core based on a permutation importance algorithm to obtain a target attribute of the sample core; taking the target attribute of the sample core as the target attribute of the target core; determining an attribute value corresponding to the target attribute of the target core; and inputting the attribute value corresponding to the target attribute of the target core into a target prediction model to obtain the target permeability of the target core.
In order to execute the above instructions more accurately, referring to FIG. 5, another specific server is provided in this embodiment of the present disclosure. The server includes a network communication port 501, a processor 502, and a memory 503, which are connected by internal cables so that these structures can exchange specific data with one another.
The network communication port 501 may be specifically used to obtain a target core and a sample core.
The processor 502 may be specifically configured to: process the sample core based on a permutation importance algorithm to obtain a target attribute of the sample core; take the target attribute of the sample core as the target attribute of the target core; determine an attribute value corresponding to the target attribute of the target core; and input the attribute value corresponding to the target attribute of the target core into a target prediction model to obtain the target permeability of the target core.
The memory 503 may be used to store a corresponding program of instructions.
In this embodiment, the network communication port 501 may be a virtual port bound to different communication protocols, so that different data can be sent or received. For example, the network communication port may be a port responsible for web data communication, FTP data communication, or mail data communication. The network communication port may also be a physical communication interface or a communication chip, for example a wireless mobile network communication chip such as a GSM or CDMA chip, a Wi-Fi chip, or a Bluetooth chip.
In this embodiment, the processor 502 may be implemented in any suitable manner. For example, the processor may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a programmable logic controller, an embedded microcontroller, and the like. The description is not intended to be limiting.
In this embodiment, the memory 503 may include multiple levels. In a digital system, anything capable of holding binary data may serve as a memory; in an integrated circuit, a circuit with a storage function but without a physical form is also called a memory, such as a RAM or a FIFO; in a system, a storage device in physical form is also called a memory, such as a memory module or a TF card.
Although the present description provides the method operational steps described in the embodiments or flowcharts, more or fewer operational steps may be included based on conventional or non-inventive means. The order of steps recited in the embodiments is merely one of many possible execution orders and does not represent the only execution order. When implemented in an actual apparatus or client product, the methods illustrated in the embodiments or figures may be executed sequentially or in parallel (for example, in a parallel-processor or multi-threaded environment, or even in a distributed data processing environment). The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element so defined does not exclude the presence of additional identical or equivalent elements in the process, method, article, or apparatus that comprises the element. The terms first, second, and the like are used to denote names and do not denote any particular order.
Those skilled in the art will also appreciate that, in addition to implementing the controller purely in computer-readable program code, it is entirely possible to achieve the same functionality by logically programming the method steps so that the controller takes the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller can therefore be regarded as a hardware component, and the means included in it for implementing various functions can also be regarded as structures within the hardware component. The means for implementing the various functions may even be regarded as both software modules implementing the method and structures within the hardware component.
The description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, classes, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
From the above description of embodiments, it will be apparent to those skilled in the art that the present description may be implemented in software plus a necessary general hardware platform. Based on such understanding, the technical solutions of the present specification may be embodied essentially in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and include several instructions to cause a computer device (which may be a personal computer, a mobile terminal, a server, or a network device, etc.) to perform the methods described in the various embodiments or portions of the embodiments of the present specification.
The various embodiments in this specification are described in a progressive manner; for identical or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. The specification is operational with numerous general-purpose or special-purpose computer system environments or configurations, for example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable electronic devices, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.
Although the present specification has been described by way of example, it will be appreciated by those skilled in the art that there are many variations and modifications to the specification without departing from the spirit of the specification, and it is intended that the appended claims encompass such variations and modifications as do not depart from the spirit of the specification.

Claims (10)

1. A machine learning reservoir permeability determination method based on PI and deep FM, comprising:
obtaining a target core and a sample core;
processing the sample core based on a permutation importance algorithm to obtain a target attribute of the sample core; taking the target attribute of the sample core as the target attribute of the target core;
determining an attribute value corresponding to a target attribute of the target core;
and inputting an attribute value corresponding to the target attribute of the target core into a target prediction model to obtain the target permeability of the target core.
2. The method of claim 1, wherein processing the sample core based on a permutation importance algorithm to obtain a target attribute of the sample core comprises:
acquiring a sample data set of the sample core; the sample data set comprises full attributes, attribute values corresponding to the full attributes, and a sample permeability;
respectively processing the sample data set by using a plurality of verification models to obtain a plurality of first permeabilities of the sample core;
determining, based on the permutation importance algorithm, a plurality of groups of importance parameters corresponding to the full attributes according to the sample permeability and the first permeabilities;
sorting the full attributes in descending order according to the plurality of groups of importance parameters, respectively, to obtain a plurality of sorting results;
screening out a target sorting result with higher accuracy from a plurality of sorting results;
and screening out, from the target sorting result, a plurality of full attributes whose importance parameters are greater than an importance threshold value to serve as the target attributes.
3. The method of claim 2, wherein the plurality of verification models comprises: a decision tree model, a random forest model, an extreme gradient boosting model, and a light gradient boosting machine model.
4. The method of claim 2, wherein processing the sample data sets with a plurality of validation models, respectively, results in a plurality of first permeabilities of the sample core, comprising:
randomly permuting the attribute values corresponding to the full attributes to obtain a plurality of random data sets;
and respectively inputting the plurality of random data sets into a plurality of verification models to obtain a plurality of first permeabilities of the sample core.
5. The method of claim 2, wherein determining, based on the permutation importance algorithm, the plurality of groups of importance parameters corresponding to the full attributes according to the sample permeability and the first permeabilities comprises:
calculating a plurality of groups of performance parameters of a verification model aiming at a random data set according to the sample permeability and the first permeability;
and calculating a plurality of groups of importance parameters corresponding to the full-quantity attribute according to the plurality of groups of performance parameters.
6. The method of claim 2, wherein screening out a higher accuracy target ranking result from the plurality of ranking results comprises:
extracting, from each of the sorting results, the full attributes whose importance parameters are greater than zero to form a plurality of attribute sets to be detected;
detecting whether the attribute set to be detected has a reference attribute or not to obtain a detection result;
according to the detection result, taking an attribute set to be detected in which the reference attribute is determined to be present as a target attribute set;
and taking the sorting result corresponding to the target attribute set as a target sorting result.
7. The method of claim 6, wherein the reference attribute comprises: fractal dimension, porosity, clay content.
8. The method of claim 1, wherein the target prediction model is a deep factorization machine (deep FM) model.
9. A PI and deep FM based machine learning reservoir permeability determination apparatus, comprising:
the acquisition module is used for acquiring a target core and a sample core;
the processing module is used for processing the sample core based on a permutation importance algorithm to obtain a target attribute of the sample core, and taking the target attribute of the sample core as the target attribute of the target core;
the determining module is used for determining an attribute value corresponding to the target attribute of the target core;
and the calculation module is used for inputting the attribute value corresponding to the target attribute of the target core into a target prediction model to obtain the target permeability of the target core.
10. A computer readable storage medium, having stored thereon computer instructions which, when executed by a processor, implement the steps of the method of any of claims 1 to 8.