CN117093922A - Improved SVM-based complex fluid identification method for unbalanced sample oil reservoir - Google Patents

Publication number
CN117093922A
Authority
CN
China
Prior art keywords
data
sample
samples
class
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311068108.XA
Other languages
Chinese (zh)
Inventor
毛敏
刘娟霞
徐长敏
何理鹏
杨毅
印森林
罗思雨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China France Bohai Geoservices Co Ltd
Original Assignee
China France Bohai Geoservices Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China France Bohai Geoservices Co Ltd
Priority to CN202311068108.XA
Publication of CN117093922A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/285 Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A10/00 TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE at coastal zones; at river basins
    • Y02A10/40 Controlling or monitoring, e.g. of flood or hurricane; Forecasting, e.g. risk assessment or mapping

Abstract

The application relates to a method, based on an improved SVM, for identifying complex fluids in oil reservoirs with unbalanced samples. The method first expands the data with an ADASYN model, which reduces the likelihood of overfitting, and then combines a plurality of weak SVM classifiers through AdaBoost.M2-SVM, which improves the generalization capability of the model. This addresses the poor adaptability and non-unique interpretations of traditional methods in unconventional reservoir fluid identification. In addition, by applying an artificial intelligence algorithm to reservoir fluid identification that incorporates logging data, the method enables intelligent identification of the oil, gas and water layers of a reservoir. The method comprises data processing, algorithm design, model training and model deployment, and forms an intelligent reservoir oil-gas-water layer identification system that incorporates logging data.

Description

Improved SVM-based complex fluid identification method for unbalanced sample oil reservoir
Technical Field
The application relates to a method, based on an improved SVM, for identifying complex fluids in oil reservoirs with unbalanced samples, and belongs to the technical field of complex reservoir fluid identification.
Background
In recent years, multiple sets of low-resistivity reservoirs have been encountered while drilling in offshore oilfield exploration. Identifying the fluid properties of low-resistivity reservoirs is an important part of oil and gas exploration and development research, and its accuracy bears directly on the discovery and efficient development of oil and gas reservoirs. Currently, reservoir fluid property identification relies primarily on mud logging and well logging data. Mud logging data are first-hand exploration data that reflect the original formation properties intuitively and clearly. They are among the most important data for helping specialists find and evaluate reservoirs, and through various techniques (such as cuttings logging, core logging and X-ray element logging) they can effectively identify the lithology, physical properties and fluid properties of a reservoir. Because mud logging data are obtained directly from subsurface oil and gas, they are intuitive and accurate, but they also have shortcomings. First, the sampling interval is large, which may make the information discontinuous. Second, the layering accuracy is low, so the interpreted reservoir thickness is not accurate enough.
Well logging techniques mainly distinguish hydrocarbon layers from water layers through differences in how logging curves, such as resistivity logs and sonic logs, respond to fluid properties. Resistivity logging distinguishes hydrocarbon layers from water layers by the different resistance that different media present to electric current. Sonic logging distinguishes media by the different propagation velocities of acoustic waves in them, from which the presence of hydrocarbons can be judged. Auxiliary curves such as natural gamma and neutron logs may also be used to identify reservoir fluid properties. However, in a low-resistivity reservoir the log responses of the oil layer and the water layer within the same reservoir differ little, and there is no clear resistivity boundary between oil and water layers. Such reservoirs are difficult to identify with conventional logging methods and are often misinterpreted as water layers and overlooked.
Well logging interpretation methods mainly comprise crossplot methods based on mathematical statistics and methods based on volume models and theoretical formulas. However, for unconventional reservoirs such as low-resistivity reservoirs, simplified volume models adapt poorly in practice, and the established empirical formulas have poor accuracy and generalize poorly.
With the development of artificial intelligence, researchers have begun to try data-driven reservoir fluid identification methods, for example algorithms based on convolutional neural networks and long short-term memory networks. However, logging data are unevenly distributed across classes, so deep networks overfit easily, and the resulting models generalize poorly and are difficult to apply in actual development. It is therefore necessary to develop a complex fluid identification method for unbalanced-sample oil reservoirs based on an improved SVM (Support Vector Machine), so as to solve the above problems of existing reservoir complex fluid identification methods.
Disclosure of Invention
The purpose of the application is to provide an improved-SVM-based complex fluid identification method for unbalanced-sample oil reservoirs, so as to solve the poor accuracy and poor generalization of existing identification methods.
The technical scheme of the application is as follows:
An improved-SVM-based complex fluid identification method for unbalanced-sample oil reservoirs, characterized in that the identification method comprises the following steps:
step 1, collecting and organizing the logging data of the study area, and establishing a data layer;
step 2, cleaning the data in the data layer to obtain a final data set, wherein the cleaning comprises data sorting, deletion of abnormal data, filling of empty data, data interpolation and data dimension reduction;
step 3, because the lithology data and the gas measurement data differ in sampling resolution from the other data in step 2.1, interpolating the lithology data and the gas measurement data;
step 4, for the 15 factors affecting reservoir fluid properties compiled in step 1.2, reducing the data dimension by principal component analysis, which converts the many correlated indicators into a smaller number of mutually uncorrelated indicators;
step 5, in the data processed in step 4, the proportions of oil layers, oil-water layers and non-oil layers differ greatly and oil layers account for a relatively small share; because such imbalance easily degrades model performance during training, an improved Adaptive Synthetic Sampling (ADASYN) algorithm is used for data enhancement; step 5.1, for the samples obtained in step 4, using the improved ADASYN to randomly oversample the minority-class samples;
step 6, constructing an AdaBoost.M2-SVM model;
step 7, inputting the data processed in step 5 into the model for training, and identifying the oil, gas and water layers from the logging data with the trained AdaBoost.M2-SVM algorithm.
The application has the advantages that:
The method first expands the data with an ADASYN model, which reduces the likelihood of overfitting, and then combines a plurality of weak SVM classifiers through AdaBoost.M2-SVM, which improves the generalization capability of the model. This addresses the poor adaptability and non-unique interpretations of traditional methods in unconventional reservoir fluid identification. In addition, by applying an artificial intelligence algorithm to reservoir fluid identification that incorporates logging data, the method enables intelligent identification of the oil, gas and water layers of a reservoir. The method comprises data processing, algorithm design, model training and model deployment, and forms an intelligent reservoir oil-gas-water layer identification system that incorporates logging data.
Drawings
FIG. 1 is a flow chart of the present application;
FIG. 2 is a sample graph obtained when the present application is applied;
FIG. 3 is a graph showing the results of inputting a training set and a validation set into an AdaBoost.M2-SVM model when the present application is applied.
Detailed Description
The improved-SVM-based complex fluid identification method for unbalanced-sample oil reservoirs is characterized in that the identification method comprises the following steps:
step 1, collecting and organizing the logging data of the study area, and establishing a data layer;
step 1.1, selecting logging data from a plurality of wells in the work area to form a data set, and taking one group of data every 0.1 m to obtain the final required samples;
step 1.2, because the samples collected by different downhole sensors are at inconsistent depths and therefore cannot meet the training requirements of the model, the data are matched and spliced according to depth so that samples at different depths are brought onto a unified depth scale. Lithology data, fluorescence level, gas measurement data (C1, C2, C3, iC4, nC4, iC5, nC5, CO2 and H2S) and sensor data (resistivity, natural gamma, neutron and density) are selected as feature data; gas layer, oil-water layer, water layer and dry layer are used as label data; and the data layer is established from these data.
Step 2, cleaning the data in the data layer to obtain a final data set, wherein the cleaning comprises data sorting, deletion of abnormal data, filling of empty data, data interpolation and data dimension reduction;
step 2.1, because the "fluorescence level" and "logging comprehensive interpretation" columns in the original Excel table of the data layer contain blank entries, the blanks need to be filled. For "fluorescence level", the value is so low in some non-reservoir intervals that staff did not record it in the original Excel table, so these blanks should be filled with 0; likewise, blank entries in "logging comprehensive interpretation" are generally filled with "water layer". For values that are genuinely missing, the entire record is deleted. For the gas measurement data, the invalid value in the original data is -999.25, and any record containing a negative value is deleted in its entirety.
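The filling and deletion rules of step 2.1 can be illustrated with a minimal Python sketch. The row layout (a list of dictionaries), the key names `fluorescence`, `interpretation` and `C1`, and the helper `clean_rows` are hypothetical and chosen only for illustration; the sketch is not the applicant's implementation.

```python
INVALID = -999.25  # invalid-value marker named in step 2.1


def clean_rows(rows, gas_keys):
    """Apply the step 2.1 rules to a list of dict rows (hypothetical layout)."""
    cleaned = []
    for row in rows:
        row = dict(row)  # work on a copy so the input is left untouched
        # Blank fluorescence level -> 0; blank interpretation -> "water layer".
        if row.get("fluorescence") is None:
            row["fluorescence"] = 0
        if row.get("interpretation") is None:
            row["interpretation"] = "water layer"
        # Any other genuinely missing value: drop the whole record.
        if any(value is None for value in row.values()):
            continue
        # Negative gas readings (including -999.25): drop the whole record.
        if any(row[key] < 0 for key in gas_keys):
            continue
        cleaned.append(row)
    return cleaned


rows = [
    {"depth": 700.0, "fluorescence": None, "interpretation": None, "C1": 0.12},
    {"depth": 700.1, "fluorescence": 3, "interpretation": "oil layer", "C1": INVALID},
    {"depth": 700.2, "fluorescence": 2, "interpretation": "oil layer", "C1": None},
]
result = clean_rows(rows, gas_keys=["C1"])
```

Only the first row survives in this example: its blanks are filled, while the second row is dropped for the invalid gas value and the third for a genuinely missing value.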
Step 3, because the lithology data and the gas measurement data differ in sampling resolution from the other data in step 2.1, the lithology data and the gas measurement data need to be interpolated;
step 3.1, because the gas measurement data have one sampling point per 1 m while the sensor data have a resolution of 0.1 m, the gas measurement data are interpolated to a finer resolution using classical interpolation algorithms, namely nearest neighbor, bilinear and bicubic interpolation;
the principle of the bilinear interpolation algorithm is as follows:
let P be the point of the gas measurement data to be solved, with neighboring points Q11, Q12, Q21 and Q22; linear interpolation between Q11 and Q21 and between Q12 and Q22 gives R1 and R2 respectively, and linear interpolation between R1 and R2 then gives the value at point P;
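The bilinear principle described above can be written as a short Python sketch. The function names `lerp` and `bilinear` and the coordinate convention (Q11 at (x1, y1), Q21 at (x2, y1), Q12 at (x1, y2), Q22 at (x2, y2)) are assumptions made for illustration.

```python
def lerp(x, x0, x1, v0, v1):
    """Linear interpolation of the value at x between (x0, v0) and (x1, v1)."""
    return v0 + (v1 - v0) * (x - x0) / (x1 - x0)


def bilinear(x, y, x1, x2, y1, y2, q11, q21, q12, q22):
    """Value at P = (x, y) from the four neighbouring points Q11, Q21, Q12, Q22."""
    r1 = lerp(x, x1, x2, q11, q21)  # R1: between Q11 and Q21
    r2 = lerp(x, x1, x2, q12, q22)  # R2: between Q12 and Q22
    return lerp(y, y1, y2, r1, r2)  # P: between R1 and R2


# Centre of a unit cell with corner values 1, 2, 3, 4.
p = bilinear(0.5, 0.5, 0, 1, 0, 1, q11=1.0, q21=2.0, q12=3.0, q22=4.0)

# Along depth only, the same idea reduces to linear interpolation,
# e.g. refining a 1 m gas reading to a 0.1 m depth point.
c1_at_703_4 = lerp(703.4, 703.0, 704.0, 1.0, 2.0)
```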
step 3.2, the lithology data are interval data that reflect the lithology of a depth interval.
Step 4, for the 15 factors affecting reservoir fluid properties compiled in step 1.2, the data dimension is reduced by principal component analysis, which converts the many correlated indicators into a smaller number of mutually uncorrelated indicators; the principal component analysis is calculated as follows:
step 4.1, suppose there are n factors affecting reservoir fluid properties and m samples in total; the sample matrix can be expressed as:
$X = (x_{ij})_{m \times n} = \begin{pmatrix} x_{11} & \cdots & x_{1n} \\ \vdots & \ddots & \vdots \\ x_{m1} & \cdots & x_{mn} \end{pmatrix}$
where $x_{ij}$ is the value of the j-th variable in the i-th group of sample data;
step 4.2, many factors affect the fluid properties of the low-resistivity reservoir in the study block, and these factors have different dimensions, which affects both the analysis of the main controlling factors of reservoir fluid properties and the prediction results of the subsequent workflow; to eliminate the dimensional influence among factors, the matrix X is standardized to obtain the matrix Z, the data being normalized by subtracting the mean and dividing by the standard deviation:
$z_{ij} = \dfrac{x_{ij} - \bar{x}_j}{s_j}$, where $\bar{x}_j = \dfrac{1}{m}\sum_{i=1}^{m} x_{ij}$ and $s_j^2 = \dfrac{1}{m-1}\sum_{i=1}^{m}\left(x_{ij} - \bar{x}_j\right)^2$
Step 4.3, computing the correlation coefficient matrix of the standardized matrix Z: $R = \dfrac{Z^{T} Z}{m-1}$
Solving the characteristic equation $|R - \lambda I_n| = 0$ of the sample correlation matrix R gives n eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \ge 0$
Step 4.4, determining the value of k such that the cumulative contribution rate of the information reaches 85% or more: $\dfrac{\sum_{j=1}^{k} \lambda_j}{\sum_{j=1}^{n} \lambda_j} \ge 0.85$
Finally, 12 different influencing factors P1, P2, …, P12 are obtained.
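Steps 4.2 to 4.4 can be sketched in plain Python as follows. The sketch assumes that standardization divides by the sample standard deviation (with an m-1 denominator) and that the eigenvalues of R have already been obtained from a numerical routine (for example `numpy.linalg.eigh`); the helper names `standardize`, `correlation_matrix` and `choose_k` are hypothetical.

```python
import math


def standardize(X):
    """Column-wise z-score: subtract the mean, divide by the sample std (m-1)."""
    m, n = len(X), len(X[0])
    means = [sum(row[j] for row in X) / m for j in range(n)]
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (m - 1))
        for j in range(n)
    ]
    return [[(row[j] - means[j]) / stds[j] for j in range(n)] for row in X]


def correlation_matrix(Z):
    """R = Z^T Z / (m - 1) for an already standardized matrix Z."""
    m, n = len(Z), len(Z[0])
    return [
        [sum(Z[i][a] * Z[i][b] for i in range(m)) / (m - 1) for b in range(n)]
        for a in range(n)
    ]


def choose_k(eigenvalues, threshold=0.85):
    """Smallest k whose cumulative contribution rate reaches the threshold."""
    lam = sorted(eigenvalues, reverse=True)
    total = sum(lam)
    running = 0.0
    for k, value in enumerate(lam, start=1):
        running += value
        if running / total >= threshold:
            return k
    return len(lam)


X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]  # two perfectly correlated toy factors
R = correlation_matrix(standardize(X))
k = choose_k([3.0, 1.0, 0.5, 0.5])  # illustrative eigenvalues, not patent data
```

With the illustrative eigenvalues the cumulative contribution rates are 0.6, 0.8 and 0.9, so three components are kept.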
Step 5, in the data processed in step 4, the proportions of oil layers, oil-water layers and non-oil layers differ greatly and oil layers account for a relatively small share; because such imbalance easily degrades model performance during training, an improved Adaptive Synthetic Sampling (ADASYN) algorithm is used for data enhancement;
step 5.1, for the samples obtained in step 4, the improved ADASYN is used to randomly oversample the minority-class samples; the algorithm comprises the following calculation steps:
(1) training a classifier on the training set of samples, and testing the identification results on the verification set;
(2) calculating the misclassification rate $\sigma_i$ of each class from the resulting confusion matrix, where i, j ∈ {a, b, c, …}; $TP_i$ is the number of samples whose true class is i and which are also predicted as class i; $FN_j$ is the number of samples whose true class is i but which are predicted as class j. When $\sigma_i$ is greater than the threshold, data enhancement is performed between class i and class j; otherwise no data enhancement is performed. Of class i and class j, the class with more samples is denoted $s_m$ and the class with fewer samples is denoted $s_l$;
(3) calculating the number G of samples to be synthesized between $s_m$ and $s_l$: $G = (s_m - s_l) \times \beta$, where $s_m$ and $s_l$ are the numbers of samples in the larger and the smaller class respectively, and β is a random number between 0 and 1;
(4) for each sample $x_i$ of the smaller class, calculating the proportion $r_i$ of majority-class samples among its k nearest neighbors by Euclidean distance: $r_i = \Delta_i / k$, where $\Delta_i$ is the number of majority-class samples among the k nearest neighbors, i = 1, 2, 3, …, $s_l$;
(5) normalizing $r_i$: $\hat{r}_i = r_i \big/ \sum_{i=1}^{s_l} r_i$;
(6) calculating the number of samples $g_i$ to be generated for each minority-class sample: $g_i = \hat{r}_i \times G$;
(7) generating new samples with the traditional SMOTE algorithm according to the number $g_i$ to be generated for each minority-class sample;
(8) adding the newly generated samples $N_i$ to the training set to obtain a new training set, and repeating steps (1) and (2) until the misclassification rate σ is smaller than the threshold.
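Steps (3) to (7) follow the standard ADASYN allocation, which can be sketched in Python as below. This is an illustrative sketch under stated assumptions: `beta` is passed in rather than drawn at random, the per-sample counts are rounded to integers (so their sum may differ slightly from G), and the names `adasyn_counts` and `smote_point` are hypothetical. The classifier-driven outer loop of steps (1), (2) and (8) is not shown.

```python
import math
import random


def adasyn_counts(minority, majority, k=5, beta=1.0):
    """Number of synthetic samples g_i to create around each minority sample."""
    G = (len(majority) - len(minority)) * beta  # step (3)
    labelled = [(p, 0) for p in minority] + [(p, 1) for p in majority]
    ratios = []
    for idx, x in enumerate(minority):
        # k nearest neighbours of x by Euclidean distance, excluding x itself
        others = [
            (math.dist(x, p), label)
            for j, (p, label) in enumerate(labelled)
            if j != idx
        ]
        others.sort(key=lambda pair: pair[0])
        delta = sum(label for _, label in others[:k])  # majority neighbours
        ratios.append(delta / k)  # step (4): r_i = delta_i / k
    total = sum(ratios)
    if total == 0:
        return [0] * len(minority)
    # steps (5) and (6): normalise r_i, then g_i = r_hat_i * G
    return [round(r / total * G) for r in ratios]


def smote_point(x, neighbour, rng):
    """Step (7): a point on the segment between x and a minority neighbour."""
    gap = rng.random()
    return [a + gap * (b - a) for a, b in zip(x, neighbour)]


minority = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
majority = [(5.0, 6.0), (6.0, 5.0), (6.0, 6.0), (5.0, 4.0),
            (4.0, 5.0), (9.0, 9.0), (9.0, 8.0)]
counts = adasyn_counts(minority, majority, k=3, beta=1.0)
new_point = smote_point(minority[0], minority[1], random.Random(0))
```

In this toy example the minority sample at (5, 5) is surrounded by majority samples and therefore receives the largest share of the synthetic samples.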
Step 6, constructing an AdaBoost.M2-SVM model;
AdaBoost.M2, a generalization of the AdaBoost algorithm, converts a K-class classification problem into K-1 binary classification problems, so that the AdaBoost algorithm can be applied to multi-class problems. The AdaBoost.M2 algorithm maintains a weight distribution over the training data in the training set, uses it to compute classifier weights, and combines multiple linear-kernel SVM classifiers with different weights;
the AdaBoost.M2 algorithm is as follows:
input: training set $S = \{(X_i, y_i)\}$, i = 1, …, N, where N is the number of samples, the vector $X_i$ is the i-th training sample, and the label $y_i \in Y = \{1, \dots, K\}$, with i the sample index and K the number of classes; number of iterations T;
at the t-th iteration, the weight distribution over the samples $(x_i, y_i)$ is $D_t(i)$; initially the sample weights are equal; in each iteration the weights of misclassified samples increase, so that these samples receive more training. For a sample $X_i$, the correct class is $y_i$ and the incorrect classes are y (the K-1 classes other than $y_i$). Suppose training yields a weak classifier $h_t$ whose output takes values in [0, 1]. For a sample $(x_i, y_i)$, the classifier $h_t$ makes K-1 judgments, and each judgment has three possible outcomes: the classification is correct, the classification is wrong, or one of y and $y_i$ is chosen at random. The probability of error in each judgment is then: $\frac{1}{2}\left(1 - h_t(x_i, y_i) + h_t(x_i, y)\right)$
for the K-class problem there are K-1 different incorrect labels y, whose importance differs from case to case, so each y is given a weight $q_t(i, y)$; the pseudo-loss of AdaBoost.M2 is then $\varepsilon_t = \frac{1}{2}\sum_{i=1}^{N} D_t(i)\left(1 - h_t(x_i, y_i) + \sum_{y \ne y_i} q_t(i, y)\, h_t(x_i, y)\right)$
(1) Initializing the sample weights to $D_1(i) = 1/N$; the weight of an incorrect label y of sample i in the first iteration is: $w^{1}_{i,y} = D_1(i)/(K-1)$
(2) Loop over iterations t = 1, …, T:
(a) the sum of the weights of the incorrect labels of sample i in the t-th iteration is $W^{t}_{i} = \sum_{y \ne y_i} w^{t}_{i,y}$; for $y \ne y_i$, $q_t(i, y) = w^{t}_{i,y} / W^{t}_{i}$; the sample distribution is $D_t(i) = W^{t}_{i} \big/ \sum_{i=1}^{N} W^{t}_{i}$
(b) resampling according to the sample distribution $D_t(i)$ and calling the SVM to train on the samples to obtain a sub-classifier $h_t$
(c) calculating the pseudo-loss $\varepsilon_t$ of $h_t$
(d) letting $\beta_t = \varepsilon_t / (1 - \varepsilon_t)$, updating the weights: $w^{t+1}_{i,y} = w^{t}_{i,y}\, \beta_t^{\frac{1}{2}\left(1 + h_t(x_i, y_i) - h_t(x_i, y)\right)}$, where i = 1, …, N and $y \ne y_i$
(3) Obtaining the final combined classifier: $h_{fin}(x) = \arg\max_{y \in Y} \sum_{t=1}^{T} \left(\log \frac{1}{\beta_t}\right) h_t(x, y)$
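The weight bookkeeping of the AdaBoost.M2 loop can be sketched in Python as follows, using the standard AdaBoost.M2 formulation (pseudo-loss, β_t = ε_t / (1 - ε_t), log(1/β_t) voting weights). Everything named here is hypothetical: `adaboost_m2` and `centroid_learner` are illustrative helpers, and the weak learner is a weighted nearest-centroid rule on 1-D toy data standing in for the SVM sub-classifier, which uses the distribution D directly instead of resampling.

```python
import math


def adaboost_m2(X, y, labels, train_weak, T):
    """Sketch of AdaBoost.M2; train_weak(X, y, D, q) returns h(x, label) in [0, 1]."""
    N = len(X)
    wrong = [[lab for lab in labels if lab != y[i]] for i in range(N)]
    # w[i][lab]: weight of wrong label lab for sample i; D_1(i) = 1/N spread evenly
    w = [{lab: 1.0 / (N * len(wrong[i])) for lab in wrong[i]} for i in range(N)]
    ensemble = []
    for _ in range(T):
        W = [sum(w[i].values()) for i in range(N)]
        q = [{lab: w[i][lab] / W[i] for lab in wrong[i]} for i in range(N)]
        total = sum(W)
        D = [W[i] / total for i in range(N)]
        h = train_weak(X, y, D, q)
        eps = 0.5 * sum(
            D[i] * (1 - h(X[i], y[i]) + sum(q[i][lab] * h(X[i], lab) for lab in wrong[i]))
            for i in range(N)
        )
        eps = min(max(eps, 1e-10), 1 - 1e-10)  # guard against division by zero
        beta = eps / (1 - eps)
        for i in range(N):
            for lab in wrong[i]:
                w[i][lab] *= beta ** (0.5 * (1 + h(X[i], y[i]) - h(X[i], lab)))
        ensemble.append((math.log(1 / beta), h))

    def predict(x):
        return max(labels, key=lambda lab: sum(a * hyp(x, lab) for a, hyp in ensemble))

    return predict


def centroid_learner(X, y, D, q):
    """Hypothetical stand-in for the SVM: D-weighted class centroids in 1-D."""
    sums, mass = {}, {}
    for xi, yi, di in zip(X, y, D):
        sums[yi] = sums.get(yi, 0.0) + di * xi
        mass[yi] = mass.get(yi, 0.0) + di
    centroids = {lab: sums[lab] / mass[lab] for lab in sums}

    def h(x, label):
        nearest = min(centroids, key=lambda lab: abs(x - centroids[lab]))
        return 1.0 if label == nearest else 0.0

    return h


X = [0.0, 0.2, 5.0, 5.2, 10.0, 10.2]
y = [0, 0, 1, 1, 2, 2]
predict = adaboost_m2(X, y, labels=[0, 1, 2], train_weak=centroid_learner, T=3)
```

In practice the stand-in learner would be replaced by an SVM trained on samples drawn according to D_t, as described in the loop above.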
Step 7, inputting the data processed in step 5 into the model for training, and identifying the oil, gas and water layers from the logging data with the trained AdaBoost.M2-SVM algorithm.
To verify the correctness of the application, the applicant applied it to 32 wells of a certain oilfield. The application process is as follows. The improved-SVM-based complex reservoir fluid identification method comprises the following steps:
step 1, collecting and organizing the logging data of the study area, and establishing a data layer;
step 1.1, selecting logging data from 32 wells in the work area to form a data set, and taking one group of data every 0.1 m to obtain the final required samples (see FIG. 2 of the accompanying drawings);
step 1.2, because the samples collected by different sensors are at inconsistent depths and therefore cannot meet the training requirements of the model, the data are matched and spliced according to depth so that samples at different depths are brought onto a unified depth scale. Lithology, fluorescence level, gas measurement data (C1, C2, C3, iC4, nC4, iC5, nC5, CO2 and H2S) and sensor data (resistivity, natural gamma, neutron and density) are selected as feature data; gas layer, oil-water layer, water layer and dry layer are used as label data; and the data layer is established from these data.
Step 2, cleaning the data in the data layer to obtain a final data set, wherein the cleaning comprises data sorting, deletion of abnormal data, filling of empty data, data interpolation and data dimension reduction;
step 2.1, because the "fluorescence level" and "logging comprehensive interpretation" columns in the original Excel table contain blank entries, the blanks need to be filled. For "fluorescence level", the value is so low in some non-reservoir intervals that staff did not record it in the original Excel table, so these blanks should be filled with 0; likewise, blank entries in "logging comprehensive interpretation" are generally filled with "water layer". For values that are genuinely missing, the entire record is deleted. For the gas measurement values, the invalid value in the original data is -999.25, and any record containing a negative value is deleted in its entirety.
Step 3, because the lithology data and the gas measurement data differ in sampling resolution from the other data in step 2.1, the lithology data and the gas measurement data need to be interpolated;
step 3.1, in this application the gas measurement data have one sampling point per 1 m while the sensor data have a resolution of 0.1 m, so the gas measurement data are interpolated to a finer resolution using classical interpolation algorithms, namely nearest neighbor, bilinear and bicubic interpolation. The principle of the bilinear interpolation algorithm is as follows: let P be the point of the gas measurement data to be solved, with neighboring points Q11, Q12, Q21 and Q22; linear interpolation between Q11 and Q21 and between Q12 and Q22 gives R1 and R2 respectively, and linear interpolation between R1 and R2 then gives the value at point P;
step 3.2, the lithology data are interval data that reflect the lithology of a depth interval. For example, in well EP10-2-2 the lithology at a depth of 703-720 m is pebbly coarse sandstone; the interval data are converted into data at 0.1 m spacing, so that the lithology at 703.1 m, 703.2 m, …, 719.9 m is pebbly coarse sandstone.
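The conversion of interval lithology into 0.1 m records can be sketched as below. The function name `expand_interval` is hypothetical, and including both end depths of the interval is an assumption; depths are handled as integer decimetres to avoid accumulating floating-point error.

```python
def expand_interval(top_m, bottom_m, lithology):
    """Expand one interval record into one (depth, lithology) record per 0.1 m."""
    top_dm, bottom_dm = round(top_m * 10), round(bottom_m * 10)
    # Both end depths are included here, which is an assumption of this sketch.
    return [(dm / 10, lithology) for dm in range(top_dm, bottom_dm + 1)]


records = expand_interval(703, 720, "pebbly coarse sandstone")
```

For the 703-720 m example this produces records at 703.0 m, 703.1 m, 703.2 m and so on up to 720.0 m, all carrying the same lithology.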
Step 4, for the 15 factors affecting reservoir fluid properties compiled in step 1.2, the data dimension is reduced by principal component analysis, which converts the many correlated indicators into a smaller number of mutually uncorrelated indicators. The principal component analysis is calculated as follows:
step 4.1, suppose there are n factors affecting reservoir fluid properties and m samples in total; the sample matrix can be expressed as:
$X = (x_{ij})_{m \times n} = \begin{pmatrix} x_{11} & \cdots & x_{1n} \\ \vdots & \ddots & \vdots \\ x_{m1} & \cdots & x_{mn} \end{pmatrix}$
where $x_{ij}$ is the value of the j-th variable in the i-th group of sample data;
step 4.2, many factors affect the fluid properties of the low-resistivity reservoir in the study block, and these factors have different dimensions, which affects both the analysis of the main controlling factors of reservoir fluid properties and the prediction results of the subsequent workflow. To eliminate the dimensional influence among factors, the matrix X is standardized to obtain the matrix Z, the data being normalized by subtracting the mean and dividing by the standard deviation:
$z_{ij} = \dfrac{x_{ij} - \bar{x}_j}{s_j}$, where $\bar{x}_j = \dfrac{1}{m}\sum_{i=1}^{m} x_{ij}$ and $s_j^2 = \dfrac{1}{m-1}\sum_{i=1}^{m}\left(x_{ij} - \bar{x}_j\right)^2$
Step 4.3, computing the correlation coefficient matrix of the standardized matrix Z: $R = \dfrac{Z^{T} Z}{m-1}$
Solving the characteristic equation $|R - \lambda I_n| = 0$ of the sample correlation matrix R gives n eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \ge 0$
Step 4.4, determining the value of k such that the cumulative contribution rate of the information reaches 85% or more: $\dfrac{\sum_{j=1}^{k} \lambda_j}{\sum_{j=1}^{n} \lambda_j} \ge 0.85$
Finally, 12 different influencing factors P1, P2, …, P12 are obtained.
Step 5, in the data processed in step 4, the proportions of oil layers, oil-water layers and non-oil layers differ greatly and oil layers account for a relatively small share; because such imbalance easily degrades model performance during training, an improved Adaptive Synthetic Sampling (ADASYN) algorithm is used for data enhancement;
step 5.1, for the samples obtained in step 4, the improved ADASYN is used to randomly oversample the minority-class samples; the algorithm comprises the following calculation steps:
(1) training a classifier on the training set of samples, and testing the identification results on the verification set;
(2) calculating the misclassification rate $\sigma_i$ of each class from the resulting confusion matrix, where i, j ∈ {a, b, c, …}; $TP_i$ is the number of samples whose true class is i and which are also predicted as class i; $FN_j$ is the number of samples whose true class is i but which are predicted as class j. When $\sigma_i$ is greater than the threshold, data enhancement is performed between class i and class j; otherwise no data enhancement is performed. Of class i and class j, the class with more samples is denoted $s_m$ and the class with fewer samples is denoted $s_l$;
(3) calculating the number G of samples to be synthesized between $s_m$ and $s_l$: $G = (s_m - s_l) \times \beta$, where $s_m$ and $s_l$ are the numbers of samples in the larger and the smaller class respectively, and β is a random number between 0 and 1;
(4) for each sample $x_i$ of the smaller class, calculating the proportion $r_i$ of majority-class samples among its k nearest neighbors by Euclidean distance: $r_i = \Delta_i / k$, where $\Delta_i$ is the number of majority-class samples among the k nearest neighbors, i = 1, 2, 3, …, $s_l$;
(5) normalizing $r_i$: $\hat{r}_i = r_i \big/ \sum_{i=1}^{s_l} r_i$;
(6) calculating the number of samples $g_i$ to be generated for each minority-class sample: $g_i = \hat{r}_i \times G$;
(7) generating new samples with the traditional SMOTE algorithm according to the number $g_i$ to be generated for each minority-class sample;
(8) adding the newly generated samples $N_i$ to the training set to obtain a new training set, and repeating steps (1) and (2) until the misclassification rate σ is smaller than the threshold.
Step 6, constructing an AdaBoost.M2-SVM model;
as the popularization of the AdaBoost.M2 algorithm, the multi-classification problem of K classes is converted into K-1 classification problems, so that the AdaBoost algorithm can be applied to the multi-classification problems. The adaboost.m2 algorithm computes classifier weights by combining multiple linear kernel SVM classifiers with different weights by maintaining a set of weight distributions for training data in a training set;
the adaboost.m2 algorithm is as follows:
Input: a training set $S = \{(x_i, y_i)\},\ i = 1, \dots, N$, where $N$ is the number of samples, the vector $x_i$ is the $i$-th training sample, the label $y_i \in \{1, \dots, K\}$, $K$ is the number of classes, and $T$ is the number of iterations.
In the $t$-th iteration, the weight distribution of sample $(x_i, y_i)$ is $D_t(i)$. Initially the sample weights are equal. In each iteration the weights of misclassified samples increase, so that these samples receive more training. For a sample $x_i$, the correct label is $y_i$ and the incorrect labels are $y$ (the K-1 classes other than $y_i$). Assume that training yields a weak classifier $h_t$ whose output takes values in $[0,1]$. For a sample $(x_i, y_i)$, the classifier $h_t$ makes K-1 discriminations, and each has three possible outcomes: a correct classification, a wrong classification, or a random choice between $y$ and $y_i$. The probability of an error in each discrimination is then
$$\frac{1}{2}\left(1 - h_t(x_i, y_i) + h_t(x_i, y)\right).$$
For the K-class problem, the K-1 incorrect labels $y$ differ in importance from case to case, so each $y$ is given a weight $q_t(i, y)$. The pseudo-loss of AdaBoost.M2 is then
$$\varepsilon_t = \frac{1}{2}\sum_{i=1}^{N} D_t(i)\left(1 - h_t(x_i, y_i) + \sum_{y \neq y_i} q_t(i, y)\, h_t(x_i, y)\right).$$
(1) Initialize the sample weights to $D_1(i) = 1/N$. The weight of an incorrect label $y$ of sample $i$ in the first iteration is $w^{1}_{i,y} = D_1(i)/(K-1)$.
(2) Iterate for $t = 1, \dots, T$:
(2.1) Compute the sum of the incorrect-label weights of sample $i$ in the $t$-th iteration, $W_i^{t} = \sum_{y \neq y_i} w^{t}_{i,y}$. For $y \neq y_i$, set $q_t(i, y) = w^{t}_{i,y} / W_i^{t}$. The sample distribution is $D_t(i) = W_i^{t} / \sum_{i=1}^{N} W_i^{t}$.
(2.2) Resample according to the sample distribution $D_t(i)$ and call the SVM to train on the selected samples, obtaining the sub-classifier $h_t$.
(2.3) Compute the pseudo-loss $\varepsilon_t$ of $h_t$.
(2.4) Let $\beta_t = \varepsilon_t / (1 - \varepsilon_t)$ and update the weights: $w^{t+1}_{i,y} = w^{t}_{i,y}\, \beta_t^{\frac{1}{2}\left(1 + h_t(x_i, y_i) - h_t(x_i, y)\right)}$, where $i = 1, \dots, N$ and $y \neq y_i$.
(3) Obtain the final combined classifier $h_f(x) = \arg\max_{y} \sum_{t=1}^{T} \left(\ln \frac{1}{\beta_t}\right) h_t(x, y)$.
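The weight bookkeeping of one boosting round can be sketched as follows. This is a minimal illustration that assumes the standard AdaBoost.M2 update rules (mislabel weights, pseudo-loss, $\beta_t = \varepsilon_t/(1-\varepsilon_t)$); the SVM training itself is left out, and the weak-classifier outputs are passed in as a table `h`. The names `init_weights` and `m2_round` are hypothetical.

```python
def init_weights(n, k, y):
    """Initial mislabel weights: D_1(i) / (K - 1) for every wrong label."""
    return [[0.0 if c == y[i] else 1.0 / (n * (k - 1)) for c in range(k)]
            for i in range(n)]

def m2_round(w, h, y):
    """One AdaBoost.M2 round on the mislabel weights.

    w[i][c]: weight of wrong label c for sample i (entry c == y[i] is unused).
    h[i][c]: weak-classifier plausibility in [0, 1] that sample i has label c.
    y[i]:    true label index of sample i.
    Returns (D, epsilon, beta, new_w). Assumes 0 <= epsilon < 1.
    """
    n, k = len(w), len(w[0])
    wrong = [[c for c in range(k) if c != y[i]] for i in range(n)]
    W = [sum(w[i][c] for c in wrong[i]) for i in range(n)]
    total = sum(W)
    D = [W_i / total for W_i in W]                       # sample distribution
    q = [{c: w[i][c] / W[i] for c in wrong[i]} for i in range(n)]
    eps = 0.5 * sum(                                     # pseudo-loss
        D[i] * (1 - h[i][y[i]] + sum(q[i][c] * h[i][c] for c in wrong[i]))
        for i in range(n)
    )
    beta = eps / (1 - eps)
    new_w = [row[:] for row in w]
    for i in range(n):
        for c in wrong[i]:
            new_w[i][c] = w[i][c] * beta ** (0.5 * (1 + h[i][y[i]] - h[i][c]))
    return D, eps, beta, new_w
```

In this sketch a sample that is confidently misclassified keeps its mislabel weight, while well-classified samples are scaled down by $\beta_t$, so the next distribution $D_{t+1}$ concentrates on the hard samples.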
Step 6.1: input the data processed in step 5 into the model for training, and identify the oil, gas and water layers in the logging data with the trained AdaBoost.M2-SVM algorithm.
The AdaBoost.M2-SVM classification model interprets the logged oil, gas and water layers in the following steps:
(1) Divide the data expanded in step 5 into a training set, a verification set and a test set in a 7:2:1 ratio.
(2) Input the training set and the verification set into the AdaBoost.M2-SVM model, and compute Accuracy, Precision, Recall and F1-score from the confusion matrix as the evaluation indexes of the model (see Figure 3 of the specification). Comparing the accuracy of the traditional SVM model with that of AdaBoost.M2-SVM under the same data enhancement algorithm shows that AdaBoost.M2-SVM is more accurate. AdaBoost.M2-SVM is also more accurate on different logging databases, which indicates better generalization ability.
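The 7:2:1 split and the confusion-matrix metrics can be sketched as below. This is an illustrative example only: the shuffling seed, the function names `split_7_2_1` and `metrics_from_confusion`, and the row-is-true-class layout of the confusion matrix are assumptions, not details taken from the specification.

```python
import random

def split_7_2_1(samples, seed=0):
    """Shuffle and split samples into training, verification and test sets."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 7 // 10
    n_val = len(items) * 2 // 10
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

def metrics_from_confusion(cm):
    """cm[i][j] = number of samples of true class i predicted as class j.

    Returns (accuracy, [(precision, recall, f1) for each class]).
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    accuracy = sum(cm[i][i] for i in range(k)) / total
    per_class = []
    for i in range(k):
        tp = cm[i][i]
        predicted = sum(cm[r][i] for r in range(k))  # column sum
        actual = sum(cm[i])                          # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class.append((precision, recall, f1))
    return accuracy, per_class
```

Per-class values are reported separately here because, with unbalanced fluid classes, overall accuracy alone can hide poor recall on the rare oil-layer class.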
According to the improved SVM-based method for identifying complex fluids in unbalanced-sample oil reservoirs, the data are first expanded with the ADASYN model, which reduces the likelihood of overfitting; multiple weak SVM classifiers are then combined by AdaBoost.M2-SVM, which improves the generalization ability of the model and addresses the poor adaptability and non-unique solutions of traditional methods in unconventional reservoir fluid identification. In addition, based on artificial intelligence algorithms, the method achieves intelligent identification of reservoir oil, gas and water layers in reservoir fluid identification combined with logging. The method covers data processing, algorithm design, model training and model deployment, forming an intelligent reservoir oil-gas-water layer identification system that uses logging data.

Claims (5)

1. A complex fluid identification method for unbalanced-sample oil reservoirs based on an improved SVM, characterized in that the identification method comprises the following steps:
step 1, collecting and sorting logging data of the research area, and establishing a data layer;
step 2, cleaning the data in the data layer to obtain a final data set, the cleaning comprising data sorting, deletion of abnormal data, filling of empty data, data interpolation and data dimension reduction;
step 2.1, because the "fluorescence level" and "logging comprehensive interpretation" columns in the original Excel table of the data layer contain missing values, the missing values need to be filled; for "fluorescence level", the value is too low in some non-reservoir intervals, so the staff did not record it in the original Excel table, and it should actually be filled with 0; likewise, for "logging comprehensive interpretation", blank entries are generally filled with "water layer"; for values that are genuinely missing, the entire record is deleted; for the gas measurement data, the invalid value in the original data is -999.25, and every record containing a negative value is deleted;
step 3, because the sampling resolution of the lithology data and the gas measurement data is inconsistent with that of the other data in step 2.1, the lithology data and the gas measurement data need to be interpolated;
step 3.1, because the gas measurement data have one sampling point per 1 m, whereas the sensor data have a resolution of 0.1 m, the gas measurement data are interpolated with different classical interpolation algorithms, namely nearest-neighbor interpolation, bilinear interpolation and bicubic interpolation, so that the gas measurement data become more refined;
step 3.2, the lithology data are interval data, reflecting the lithology of a depth interval;
step 4, step 1.2 yields 15 factors influencing the reservoir fluid properties; principal component analysis is used for data dimension reduction, converting a number of interrelated indexes into fewer indexes that are mutually uncorrelated;
step 5, in the data processed in step 4, the proportions of the oil layer and the oil-water layer differ too much from that of the non-oil layers, and the oil layer accounts for a relatively small share; because such unbalanced data easily degrade model performance during training, an improved Adaptive Synthetic Sampling (ADASYN) algorithm is used for data enhancement;
step 5.1, for the samples obtained in step 4, the improved ADASYN is used to randomly oversample the minority-class samples, the algorithm comprising the following calculation steps:
(1) training a classifier on the training set of the samples, and testing the identification result on the verification set;
(2) calculating the misclassification rate $\sigma_i$ of each class from the resulting confusion matrix, where $i, j \in \{a, b, c, \dots\}$, $TP_i$ is the number of samples whose true class is $i$ and that are also predicted as class $i$, and $FN_j$ is the number of samples whose true class is $i$ but that are predicted as class $j$; when $\sigma_i$ is greater than the threshold, data enhancement is performed between class $i$ and class $j$, otherwise no data enhancement is performed; of class $i$ and class $j$, the class with more samples is denoted $s_m$ and the class with fewer samples is denoted $s_l$;
(3) calculating the number $G$ of samples to be synthesized between $s_m$ and $s_l$: $G = (s_m - s_l) \times \beta$, where $s_m$ and $s_l$ denote the numbers of samples of the larger and the smaller class respectively, and $\beta$ is a random number between 0 and 1;
(4) for each sample $x_i$ of the smaller class, calculating the proportion $r_i$ of majority-class samples among its $k$ nearest samples by Euclidean distance: $r_i = \Delta_i / k$, where $\Delta_i$ is the number of majority-class samples among the $k$ nearest samples and $i = 1, 2, 3, \dots, s_l$;
(5) normalizing $r_i$: $\hat{r}_i = r_i / \sum_{i=1}^{s_l} r_i$;
(6) calculating the number of samples $g_i$ to be generated for each minority-class sample: $g_i = \hat{r}_i \times G$;
(7) generating new samples with the traditional SMOTE algorithm according to the number $g_i$ to be generated for each minority-class sample;
(8) adding the newly generated samples $N_i$ to the training set to obtain a new training set, and repeating steps (1) and (2) until the misclassification rate $\sigma$ is smaller than the threshold;
step 6, constructing an AdaBoost.M2-SVM model;
the AdaBoost.M2 algorithm is a generalization of the AdaBoost algorithm and converts a K-class classification problem into K-1 binary classification problems, so that the AdaBoost algorithm can be applied to multi-class problems; the AdaBoost.M2 algorithm maintains a weight distribution over the training data in the training set, uses it to compute the classifier weights, and combines multiple linear-kernel SVM classifiers with different weights;
step 7, inputting the data processed in step 5 into the model for training, and identifying the oil, gas and water layers in the logging data with the trained AdaBoost.M2-SVM algorithm.
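The allocation part of step 5.1, sub-steps (3) to (6), can be sketched as follows. This is a minimal illustration assuming the usual ADASYN definitions ($r_i = \Delta_i / k$, normalization over the minority class, $g_i = \hat{r}_i \times G$); rounding $g_i$ to an integer and the function name `adasyn_allocation` are assumptions made for the example.

```python
import math

def euclid(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def adasyn_allocation(minority, majority, k, beta):
    """Number of synthetic samples to generate for each minority sample."""
    G = (len(majority) - len(minority)) * beta          # sub-step (3)
    labelled = [(x, 0) for x in minority] + [(x, 1) for x in majority]
    r = []
    for x in minority:
        # k nearest samples by Euclidean distance, excluding x itself
        others = [(euclid(x, z), lab) for z, lab in labelled if z is not x]
        others.sort(key=lambda item: item[0])
        delta = sum(lab for _, lab in others[:k])       # majority neighbors
        r.append(delta / k)                             # sub-step (4)
    total = sum(r)
    if total == 0:
        return [0] * len(minority)
    r_hat = [ri / total for ri in r]                    # sub-step (5)
    return [round(rh * G) for rh in r_hat]              # sub-step (6)
```

Minority samples surrounded mostly by majority samples receive the largest share of $G$, which is what makes the sampling adaptive.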
2. The improved SVM based complex fluid identification method for unbalanced sample reservoirs of claim 1, wherein: the method for collecting and sorting logging data of a research area and establishing a data layer comprises the following steps:
step 1.1, selecting logging data of a plurality of wells in the work area to form a data set, and taking one group of data every 0.1 m to obtain the final required samples;
step 1.2, because the depths of the samples acquired by different sensors in the well are inconsistent and cannot meet the training requirements of the model, the data are matched and spliced according to their depth, and samples at different depths are brought to a uniform depth; lithology data, fluorescence level, gas measurement data comprising C1, C2, C3, iC4, nC4, iC5, nC5, CO2 and H2S, and sensor data comprising resistivity, natural gamma, neutron and density are selected as feature data, and gas layer, oil-water layer, water layer and dry layer are used as label data; the data layer is established from the above data.
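Depth matching and splicing as described in step 1.2 can be illustrated with a small join on a common 0.1 m depth grid. This is only a sketch: representing each source as a `{depth: {feature: value}}` dictionary, keeping only depths present in every source, and the names `depth_key` and `merge_by_depth` are assumptions for the example.

```python
def depth_key(depth_m, step=0.1):
    """Map a depth in metres to an integer index on a grid of `step` metres."""
    return round(depth_m / step)

def merge_by_depth(*sources, step=0.1):
    """Join several {depth: {feature: value}} tables on a common depth grid.

    Only depths present in every source are kept (an assumption of this
    sketch). Returns {depth: merged feature dict}, ordered by depth.
    """
    keyed = [{depth_key(d, step): feats for d, feats in src.items()}
             for src in sources]
    common = set(keyed[0]).intersection(*keyed[1:])
    merged = {}
    for key in sorted(common):
        row = {}
        for table in keyed:
            row.update(table[key])
        merged[round(key * step, 3)] = row
    return merged
```

Using an integer grid index as the join key avoids comparing floating-point depths such as 1000.1 directly.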
3. The improved SVM based complex fluid identification method for unbalanced sample reservoirs of claim 1, wherein: the principle of the bilinear interpolation algorithm is as follows:
assuming that P is the point of the gas measurement data to be solved and that its adjacent points are Q11, Q12, Q21 and Q22, linear interpolation is performed between Q11 and Q21 and between Q12 and Q22 to obtain R1 and R2 respectively, and linear interpolation is then performed between R1 and R2 to obtain the value at point P.
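The two-stage interpolation described above can be written directly in terms of a one-dimensional linear interpolation. The sketch below is illustrative: the coordinate convention for Q11, Q21, Q12 and Q22 and the helper `resample_linear`, which refines a 1 m curve to a 0.1 m grid along depth only, are assumptions rather than details given in the claim.

```python
def lerp(x, x0, x1, v0, v1):
    """Linear interpolation of the value at x between (x0, v0) and (x1, v1)."""
    return v0 + (v1 - v0) * (x - x0) / (x1 - x0)

def bilinear(x, y, x1, x2, y1, y2, q11, q21, q12, q22):
    """Value at P=(x, y) from Q11=(x1,y1), Q21=(x2,y1), Q12=(x1,y2), Q22=(x2,y2)."""
    r1 = lerp(x, x1, x2, q11, q21)   # R1: between Q11 and Q21
    r2 = lerp(x, x1, x2, q12, q22)   # R2: between Q12 and Q22
    return lerp(y, y1, y2, r1, r2)   # P: between R1 and R2

def resample_linear(depths, values, step=0.1):
    """Refine a curve sampled at increasing `depths` to a grid of `step` metres."""
    out = []
    n_steps = round((depths[-1] - depths[0]) / step)
    j = 0
    for s in range(n_steps + 1):
        d = depths[0] + s * step
        # advance to the segment [depths[j], depths[j + 1]] containing d
        while j < len(depths) - 2 and d > depths[j + 1]:
            j += 1
        value = lerp(d, depths[j], depths[j + 1], values[j], values[j + 1])
        out.append((round(d, 3), value))
    return out
```

For a single gas curve indexed only by depth, the bilinear scheme reduces to the linear case handled by `resample_linear`.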
4. The improved SVM based complex fluid identification method for unbalanced sample reservoirs of claim 1, wherein: the principal component analysis comprises the following calculation steps:
step 4.1, there are n factors affecting the reservoir fluid properties and m samples in total, and the sample matrix can be expressed as:
$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{pmatrix}$$
where $x_{ij}$ is the value of the j-th variable in the i-th group of sample data;
step 4.2, because many factors influence the fluid properties of the low-resistivity oil reservoirs of the research block and these factors have different dimensions, the analysis of the main controlling factors of the reservoir fluid properties and the subsequent prediction results are affected; to eliminate the dimensional influence among the different factors, the matrix X is standardized to obtain the matrix Z, the data being normalized by subtracting the mean and scaling to unit variance:
$$z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \quad i = 1, \dots, m;\ j = 1, \dots, n,$$
where $\bar{x}_j = \frac{1}{m}\sum_{i=1}^{m} x_{ij}$ and $s_j^2 = \frac{1}{m-1}\sum_{i=1}^{m} (x_{ij} - \bar{x}_j)^2$;
step 4.3, computing the correlation coefficient matrix of the standardized matrix Z: $R = \frac{Z^{T} Z}{m-1}$;
solving the characteristic equation $|R - \lambda I_n| = 0$ of the sample correlation matrix R to obtain n eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \ge 0$;
step 4.4, determining the value of k such that the cumulative contribution rate of the information exceeds 85 percent: $\sum_{j=1}^{k} \lambda_j \big/ \sum_{j=1}^{n} \lambda_j \ge 0.85$;
finally, 12 different influencing factors P1, P2, …, P12 are obtained.
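Steps 4.2 to 4.4 can be sketched as below. The example is a hedged illustration: it uses the sample standard deviation with an $m-1$ denominator, leaves the eigenvalue computation to whichever numerical library is available, and the names `standardize`, `correlation_matrix` and `choose_k` are hypothetical.

```python
import math

def standardize(X):
    """Column-wise z-score of an m x n sample matrix (list of rows)."""
    m, n = len(X), len(X[0])
    means = [sum(row[j] for row in X) / m for j in range(n)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (m - 1))
            for j in range(n)]
    return [[(row[j] - means[j]) / stds[j] for j in range(n)] for row in X]

def correlation_matrix(Z):
    """R = Z^T Z / (m - 1) for a standardized matrix Z."""
    m, n = len(Z), len(Z[0])
    return [[sum(Z[i][a] * Z[i][b] for i in range(m)) / (m - 1)
             for b in range(n)] for a in range(n)]

def choose_k(eigenvalues, threshold=0.85):
    """Smallest k whose cumulative contribution rate reaches the threshold."""
    lam = sorted(eigenvalues, reverse=True)
    total = sum(lam)
    running = 0.0
    for k, value in enumerate(lam, start=1):
        running += value
        if running / total >= threshold:
            return k
    return len(lam)
```

With the 15 input factors of step 4, `choose_k` applied to the eigenvalues of R would return the number of retained components, which the text reports as 12.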
5. The improved SVM based complex fluid identification method for unbalanced sample reservoirs of claim 1, wherein: the AdaBoost.M2 algorithm comprises the following steps:
input: a training set $S = \{(x_i, y_i)\},\ i = 1, \dots, N$, where $N$ is the number of samples, the vector $x_i$ is the $i$-th training sample, the label $y_i \in \{1, \dots, K\}$ and $K$ is the number of classes; and the number of iterations $T$;
in the $t$-th iteration, the weight distribution of sample $(x_i, y_i)$ is $D_t(i)$; the sample weights are equal at the beginning; in each iteration the weights of misclassified samples increase, so that these samples receive more training; for a sample $x_i$, the correct label is $y_i$ and the incorrect labels are $y$ (the K-1 classes other than $y_i$); assume that training yields a weak classifier $h_t$ whose output takes values in $[0,1]$; for a sample $(x_i, y_i)$, the classifier $h_t$ makes K-1 discriminations, and each has three possible outcomes: a correct classification, a wrong classification, or a random choice between $y$ and $y_i$; the probability of an error in each discrimination is then
$$\frac{1}{2}\left(1 - h_t(x_i, y_i) + h_t(x_i, y)\right);$$
for the K-class problem, the K-1 incorrect labels $y$ differ in importance from case to case, so each $y$ is given a weight $q_t(i, y)$; the pseudo-loss of AdaBoost.M2 is then
$$\varepsilon_t = \frac{1}{2}\sum_{i=1}^{N} D_t(i)\left(1 - h_t(x_i, y_i) + \sum_{y \neq y_i} q_t(i, y)\, h_t(x_i, y)\right);$$
(1) initializing the sample weights to $D_1(i) = 1/N$; the weight of an incorrect label $y$ of sample $i$ in the first iteration is $w^{1}_{i,y} = D_1(i)/(K-1)$;
(2) iterating for $t = 1, \dots, T$:
(2.1) computing the sum of the incorrect-label weights of sample $i$ in the $t$-th iteration, $W_i^{t} = \sum_{y \neq y_i} w^{t}_{i,y}$; for $y \neq y_i$, $q_t(i, y) = w^{t}_{i,y} / W_i^{t}$; the sample distribution is $D_t(i) = W_i^{t} / \sum_{i=1}^{N} W_i^{t}$;
(2.2) resampling according to the sample distribution $D_t(i)$ and calling the SVM to train on the selected samples, obtaining the sub-classifier $h_t$;
(2.3) computing the pseudo-loss $\varepsilon_t$ of $h_t$;
(2.4) letting $\beta_t = \varepsilon_t / (1 - \varepsilon_t)$ and updating the weights: $w^{t+1}_{i,y} = w^{t}_{i,y}\, \beta_t^{\frac{1}{2}\left(1 + h_t(x_i, y_i) - h_t(x_i, y)\right)}$, where $i = 1, \dots, N$ and $y \neq y_i$;
(3) obtaining the final combined classifier: $h_f(x) = \arg\max_{y} \sum_{t=1}^{T} \left(\ln \frac{1}{\beta_t}\right) h_t(x, y)$.
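The final combination step can be sketched as a weighted vote. This assumes the standard AdaBoost.M2 rule in which each sub-classifier votes with weight $\ln(1/\beta_t)$; the function name `combined_predict` and the representation of each $h_t$ as a Python callable `h(x, label)` are illustrative assumptions.

```python
import math

def combined_predict(hypotheses, betas, x, n_classes):
    """Return the label with the largest weighted plausibility.

    hypotheses: callables h_t(x, label) -> plausibility in [0, 1]
    betas:      the beta_t of each round, each assumed to lie in (0, 1]
    """
    scores = [
        sum(math.log(1.0 / b) * h(x, c) for h, b in zip(hypotheses, betas))
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=scores.__getitem__)
```

A sub-classifier with a small pseudo-loss has a small $\beta_t$ and therefore a large vote, so accurate rounds dominate the final decision.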
CN202311068108.XA 2023-08-23 2023-08-23 Improved SVM-based complex fluid identification method for unbalanced sample oil reservoir Pending CN117093922A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311068108.XA CN117093922A (en) 2023-08-23 2023-08-23 Improved SVM-based complex fluid identification method for unbalanced sample oil reservoir

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311068108.XA CN117093922A (en) 2023-08-23 2023-08-23 Improved SVM-based complex fluid identification method for unbalanced sample oil reservoir

Publications (1)

Publication Number Publication Date
CN117093922A true CN117093922A (en) 2023-11-21

Family

ID=88772914

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311068108.XA Pending CN117093922A (en) 2023-08-23 2023-08-23 Improved SVM-based complex fluid identification method for unbalanced sample oil reservoir

Country Status (1)

Country Link
CN (1) CN117093922A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117436011A (en) * 2023-12-15 2024-01-23 四川泓宝润业工程技术有限公司 Machine pump equipment fault prediction method, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN109635461B (en) Method and system for automatically identifying surrounding rock grade by using while-drilling parameters
CN110674841B (en) Logging curve identification method based on clustering algorithm
Chang et al. Lithofacies identification using multiple adaptive resonance theory neural networks and group decision expert system
CN110346831B (en) Intelligent seismic fluid identification method based on random forest algorithm
US8090538B2 (en) System and method for interpretation of well data
CN107356958A (en) A kind of fluvial depositional reservoir substep seismic facies Forecasting Methodology based on geological information constraint
CN105760673A (en) Fluvial facies reservoir earthquake sensitive parameter template analysis method
CN115758212A (en) Mechanical equipment fault diagnosis method based on parallel network and transfer learning
Zhu et al. Rapid identification of high-quality marine shale gas reservoirs based on the oversampling method and random forest algorithm
CN117093922A (en) Improved SVM-based complex fluid identification method for unbalanced sample oil reservoir
Ye et al. Drilling formation perception by supervised learning: Model evaluation and parameter analysis
CN115964667A (en) River-lake lithofacies well logging identification method based on deep learning and resampling
CN110552693A (en) layer interface identification method of induction logging curve based on deep neural network
CN116304941A (en) Ocean data quality control method and device based on multi-model combination
Chikhi et al. Probabilistic neural method combined with radial-bias functions applied to reservoir characterization in the Algerian Triassic province
CN113592028A (en) Method and system for identifying logging fluid by using multi-expert classification committee machine
CN114707597A (en) River facies tight sandstone reservoir complex lithofacies intelligent identification method and system
CN112987091A (en) Reservoir detection method and device, electronic equipment and storage medium
CN112257789A (en) Method for identifying surrounding rock grade
CN112084553A (en) Surveying method for tunnel planning
Gao et al. A novel automated machine-learning model for lithofacies recognition
CN117272841B (en) Shale gas dessert prediction method based on hybrid neural network
CN117574269B (en) Intelligent identification method and system for natural cracks of land shale reservoir
CN117407841B (en) Shale layer seam prediction method based on optimization integration algorithm
CN114021663B (en) Industrial process off-line data segmentation method based on sequence local discrimination information mining network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination