CN114970676A - Data mining-based fractured leakage particle-based plugging formula recommendation method - Google Patents

Publication number
CN114970676A
CN114970676A CN202210441433.5A CN202210441433A CN114970676A CN 114970676 A CN114970676 A CN 114970676A CN 202210441433 A CN202210441433 A CN 202210441433A CN 114970676 A CN114970676 A CN 114970676A
Authority
CN
China
Prior art keywords
data
formula
lost circulation
particle size
plugging
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210441433.5A
Other languages
Chinese (zh)
Other versions
CN114970676B (en)
Inventor
王贵
何杰
徐生江
曹成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southwest Petroleum University
Original Assignee
Southwest Petroleum University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southwest Petroleum University
Priority to CN202210441433.5A
Publication of CN114970676A
Application granted
Publication of CN114970676B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G06F18/232: Non-hierarchical techniques
    • G06F18/2321: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Pure & Applied Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Optimization (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a data mining-based method for recommending particle-based plugging formulas for fractured lost circulation, comprising the following steps: S1: acquiring lost circulation data and preprocessing it; S2: performing similarity analysis on the preprocessed lost circulation data to obtain candidate lost circulation data; S3: performing particle size analysis and clustering on the plugging formula data in the candidate lost circulation data to determine a recommended formula set. The invention addresses the heavy reliance of plugging operations on engineers' experience-based judgment of the plugging method. It mines lost circulation data through similarity analysis and a clustering algorithm to recommend particle-based plugging formulas for field operations in real time, which supports rapid decisions on field lost circulation treatment, selection of a scientifically sound drilling fluid plugging formula, safer drilling, and a higher first-attempt success rate for plugging operations.

Description

Data mining-based fractured leakage particle-based plugging formula recommendation method
Technical Field
The invention belongs to the technical field of lost circulation control in petroleum drilling engineering, and particularly relates to a data mining-based method for recommending particle-based plugging formulas for fractured lost circulation.
Background
Lost circulation is a complex downhole condition in which working fluid (drilling fluid, workover fluid, cement slurry, and the like) leaks into the formation during drilling, completion, and other operations in oil and gas exploration and development. Fractured lost circulation is the most common type and a long-standing, difficult technical problem. Lost circulation not only consumes drilling time and wastes mud, but can also trigger a series of complications such as stuck pipe, blowout, and wellbore collapse, and may even force well abandonment with heavy economic losses, so it must be controlled effectively.
Plugging is the remedial treatment applied after lost circulation occurs, and is one of the most important and indispensable parts of drilling operations. Previous research on plugging has focused mainly on three areas: 1. Formation loss models: this work studies the factors that influence drilling fluid loss and inverts the formation conditions around the wellbore, providing basic data for selecting plugging materials. 2. Wellbore strengthening theory: to explain how plugging materials seal fracture loss channels and why the pressure-bearing capacity of the wellbore wall improves after the fractures are sealed, several theories have been proposed, such as the "Tip Screening" theory, the "Stress Cage" theory, the "Fracture Closure Stress" theory, and the "Fracture Propagation Resistance" theory. 3. Size selection criteria for plugging materials: particle size and particle size distribution are the decisive factors in the bridging and plugging behavior of particles in fractures, and only a particle system with a suitable size and size distribution can bridge and plug effectively in pore throats or fractures. Particle size distribution selection criteria based on ideal packing theory have therefore been proposed, such as the "d1/2 rule" and the "1/3 bridging rule". Although previous theoretical and experimental research is extensive, it does not address similarity analysis of lost circulation data or data mining of the key parameters of plugging formulas.
Disclosure of Invention
To solve the above problems, the invention provides a data mining-based method for recommending particle-based plugging formulas for fractured lost circulation.
The technical solution of the invention is a data mining-based method for recommending particle-based plugging formulas for fractured lost circulation, comprising the following steps:
S1: acquiring lost circulation data and preprocessing it;
S2: performing similarity analysis on the preprocessed lost circulation data to obtain candidate lost circulation data;
S3: performing particle size analysis and clustering on the plugging formula data in the candidate lost circulation data to determine a recommended formula set.
Further, step S1 includes the following sub-steps:
S11: performing data cleaning on the lost circulation data to obtain optimized lost circulation data;
S12: performing feature encoding on the optimized lost circulation data to obtain encoded lost circulation data;
S13: performing data reduction on the encoded lost circulation data to complete the preprocessing of the lost circulation data.
Further, in step S11, data cleaning is performed as follows: missing lost circulation data are filled in, abnormal lost circulation data are detected using a binning method, and the abnormal values are then replaced by filling;
in step S12, One-Hot encoding is used to encode the character-type lost circulation data in the optimized lost circulation data;
step S13 includes the following sub-steps:
S131: performing mean normalization on the encoded lost circulation data and updating its lost circulation feature values, where the normalized j-th lost circulation feature value $x_j^{(i)}$ of the i-th encoded lost circulation record is calculated as:
$$x_j^{(i)} \leftarrow \frac{x_j^{(i)} - \mu_j}{s_j}, \quad 1 \le i \le M,\ 1 \le j \le N$$
where $x_j^{(i)}$ on the right-hand side denotes the original lost circulation feature value, M denotes the total number of lost circulation data samples, N denotes the total number of lost circulation data features, $\mu_j$ denotes the mean of the j-th lost circulation feature, and $s_j$ denotes the standard deviation of that feature;
S132: calculating the covariance matrix $\Sigma$ of the encoded lost circulation data after the feature values are updated, and performing singular value decomposition on $\Sigma$ to obtain its eigenvector matrix U, where $\Sigma$ is calculated as:
$$\Sigma = \frac{1}{M} \sum_{i=1}^{M} x^{(i)} \left(x^{(i)}\right)^{T}$$
where $x^{(i)}$ denotes the feature vector of the i-th lost circulation data sample;
S133: performing singular value decomposition on the covariance matrix $\Sigma$ to obtain its eigenvector matrix U and square matrix S, and determining the reduced dimension of the lost circulation data from the square matrix S;
S134: performing dimension reduction with the eigenvector matrix U of the covariance matrix, determining the lost circulation feature dimension of the encoded data after dimension reduction, and calculating the reduced lost circulation feature vectors to complete data reduction, where the reduced feature vector $z^{(i)}$ of the i-th encoded lost circulation record is calculated as:
$$z^{(i)} = U_{reduce}^{T}\, x^{(i)}$$
where $U_{reduce}$ denotes the dimension reduction matrix for the lost circulation data;
in step S134, the lost circulation feature dimension k of the encoded lost circulation data after dimension reduction satisfies
Figure BDA0003614117100000027
where $S_{ii}$ denotes the entries on the diagonal of the square matrix S.
Further, step S2 includes the following sub-steps:
S21: calculating the similarity of the numerical data and the similarity of the character-type data in the preprocessed lost circulation data;
S22: calculating the overall similarity of the lost circulation data from the similarity of the numerical data and the similarity of the character-type data;
S23: constructing the $(d, cd, fp_1, fp_2)$-sensitive LSH function family for the numerical data and the $(r, cr, sp_1, sp_2)$-sensitive LSH function family for the character-type data, where c denotes the approximation factor of the lost circulation data, d denotes the Euclidean distance sensitivity range of the numerical data, $fp_1$ denotes the lower limit of the similarity probability of the numerical data, $fp_2$ denotes the upper limit of the similarity probability of the numerical data, r denotes the Jaccard distance sensitivity range of the character-type data, $sp_1$ denotes the lower limit of the similarity probability of the character-type data, and $sp_2$ denotes the upper limit of the similarity probability of the character-type data;
S24: constructing a binary hybrid index of the lost circulation data from the overall similarity of the lost circulation data, the LSH function family of the numerical data, and the LSH function family of the character-type data, to obtain the candidate lost circulation data.
Further, in step S21, the similarity distE of the numerical data is calculated as:
$$distE = \frac{EuclideanDist(o_1.f\text{-}type,\ o_2.f\text{-}type)}{d_{max}}$$
where EuclideanDist(·) denotes the function that computes the Euclidean distance between lost circulation data objects $o_1$ and $o_2$, $o_1.f\text{-}type$ denotes the numerical data of $o_1$, $o_2.f\text{-}type$ denotes the numerical data of $o_2$, and $d_{max}$ denotes the maximum distance between the numerical data features of $o_1$ and $o_2$;
in step S21, the similarity distJ of the character-type data is calculated as:
$$distJ = 1 - \frac{\left|o_1.s\text{-}type \cap o_2.s\text{-}type\right|}{\left|o_1.s\text{-}type \cup o_2.s\text{-}type\right|}$$
where $o_1.s\text{-}type$ denotes the character-type data of lost circulation data object $o_1$, and $o_2.s\text{-}type$ denotes the character-type data of lost circulation data object $o_2$;
in step S22, the overall similarity dist of the lost circulation data is calculated as:
$$dist = \alpha \times distE + (1 - \alpha) \times distJ$$
where $\alpha$ denotes the weighting parameter between the two types of lost circulation data;
in step S23, the $(d, cd, fp_1, fp_2)$-sensitive LSH function family $h(o.f\text{-}type)$ of the numerical data is expressed as:
Figure BDA0003614117100000033
Figure BDA0003614117100000034
Figure BDA0003614117100000035
where a denotes a randomly generated d-dimensional vector, b denotes a randomly generated real number in (0, W), c denotes the approximation factor of the lost circulation data, d denotes the Euclidean distance sensitivity range of the numerical data, W denotes a constant, $o.f\text{-}type$ denotes the numerical data of lost circulation data object o, t denotes the integration variable, $fp_1$ denotes the lower limit of the similarity probability of the numerical data, $fp_2$ denotes the upper limit of the similarity probability of the numerical data, and $f_2(\cdot)$ denotes the standard normal probability density function;
in step S23, the $(r, cr, sp_1, sp_2)$-sensitive LSH function family $h(O.s\text{-}type)$ of the character-type data is expressed as:
$$h(O.s\text{-}type) = \arg\min_{q \in O.s\text{-}type} g(q)$$
where $O.s\text{-}type$ denotes the lost circulation character-type data set, q denotes character-type data in a lost circulation data object, g(·) denotes a random number generation function, r denotes the Jaccard distance sensitivity range of the character-type data, $sp_1$ denotes the lower limit of the similarity probability of the character-type data, and $sp_2$ denotes the upper limit of the similarity probability of the character-type data.
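The similarity measures of steps S21 and S22 can be illustrated with a short Python sketch. It assumes that distE is the Euclidean distance scaled by dmax and that distJ is the Jaccard distance between the sets of character-type values; both formulas are images in the source, so these forms are inferred from the variable definitions. The record layout (a dict with keys `"f"` and `"s"`) and the default `alpha` are hypothetical.

```python
import math

def dist_e(f1, f2, dmax):
    """Euclidean distance between numerical features, scaled by dmax."""
    return math.dist(f1, f2) / dmax

def dist_j(s1, s2):
    """Jaccard distance between two sets of character-type values."""
    s1, s2 = set(s1), set(s2)
    union = s1 | s2
    if not union:
        return 0.0  # Assumption: two empty sets are treated as identical.
    return 1.0 - len(s1 & s2) / len(union)

def overall_dist(o1, o2, dmax, alpha=0.5):
    """dist = alpha * distE + (1 - alpha) * distJ."""
    return (alpha * dist_e(o1["f"], o2["f"], dmax)
            + (1.0 - alpha) * dist_j(o1["s"], o2["s"]))
```

A smaller value means the two lost circulation records are more alike, which matches the later use of upper thresholds on distE, distJ, and dist.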
Further, step S24 includes the following sub-steps:
S241: randomly selecting $k_1$ hash functions from the LSH function family of the numerical data and $k_2$ hash functions from the LSH function family of the character-type data to form the binary hybrid index LSH function family G of the lost circulation data, expressed as:
$$G = g(o)$$
$$g(o) = \left(h_1(o.f\text{-}type), \ldots, h_{k_1}(o.f\text{-}type),\ h_1(O.s\text{-}type), \ldots, h_{k_2}(O.s\text{-}type)\right)$$
where $h_1(\cdot), \ldots, h_{k_1}(\cdot)$ denote the set of LSH functions for the numerical data, $h_1(\cdot), \ldots, h_{k_2}(\cdot)$ denote the set of LSH functions for the character-type data, g(o) denotes the binary hybrid index LSH function set of the lost circulation data, $o.f\text{-}type$ denotes the numerical data of lost circulation data object o, and $O.s\text{-}type$ denotes the lost circulation character-type data set;
S242: randomly selecting p lost circulation hash functions $g_1, \ldots, g_p$ from the binary hybrid index LSH function family G, and storing the hash values produced by the p hash functions in the corresponding hash buckets, where the number p of selected hash functions is calculated as:
Figure BDA0003614117100000044
where M denotes the total number of lost circulation data samples, $fp_1$ denotes the lower limit of the similarity probability of the numerical data, $fp_2$ denotes the upper limit of the similarity probability of the numerical data, $sp_1$ denotes the lower limit of the similarity probability of the character-type data, and $sp_2$ denotes the upper limit of the similarity probability of the character-type data;
S243: within each hash bucket, taking the lost circulation data that satisfy $distE(o_i, q) < cd$, $distJ(o_i, q) < cr$, and $dist(o_i, q) < \varepsilon$ as the candidate lost circulation data, where $o_i$ denotes any lost circulation record, q denotes the query object from the new lost circulation site, distE denotes the similarity of the numerical data, distJ denotes the similarity of the character-type data, c denotes the approximation factor of the lost circulation data, d denotes the Euclidean distance sensitivity range of the numerical data, r denotes the Jaccard distance sensitivity range of the character-type data, and $\varepsilon$ denotes the overall similarity threshold of the lost circulation data.
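The hybrid index of steps S241 and S242 can be sketched as below. This is an illustration under stated assumptions rather than the patented construction: the numerical hash is assumed to be the usual p-stable form floor((a·v + b) / W), which is consistent with the definitions of a, b, and W but whose exact formula is an image in the source; the character-type hash follows h = argmin g(q); and the values of `w`, `k1`, `k2`, and `p` are placeholders, since the formula for p is not recoverable. All function names and the record layout are hypothetical.

```python
import math
import random
from collections import defaultdict

def make_numeric_hash(dim, w, rng):
    """Assumed p-stable hash: floor((a . v + b) / W), a ~ N(0, 1), b ~ U(0, W)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, w)
    return lambda v: math.floor((sum(x * y for x, y in zip(a, v)) + b) / w)

def make_minhash(seed):
    """h(S) = argmin over q in S of g(q), with g a seeded random number function."""
    def g(q):
        return random.Random(f"{seed}:{q}").random()
    return lambda items: min(items, key=g)  # items must be non-empty

def make_composite(dim, w, k1, k2, rng):
    """g(o): k1 numerical hashes followed by k2 character-type hashes."""
    hs = [make_numeric_hash(dim, w, rng) for _ in range(k1)]
    ms = [make_minhash(rng.random()) for _ in range(k2)]
    return lambda o: tuple(h(o["f"]) for h in hs) + tuple(m(o["s"]) for m in ms)

def build_index(records, dim, w=4.0, k1=2, k2=1, p=8, seed=0):
    """Build p hash tables; each maps a composite hash value to record indices."""
    rng = random.Random(seed)
    tables = []
    for _ in range(p):
        g = make_composite(dim, w, k1, k2, rng)
        buckets = defaultdict(list)
        for idx, rec in enumerate(records):
            buckets[g(rec)].append(idx)
        tables.append((g, buckets))
    return tables

def candidates(tables, query):
    """Indices of records that share at least one bucket with the query."""
    found = set()
    for g, buckets in tables:
        found.update(buckets.get(g(query), []))
    return found
```

The indices returned by `candidates` would then be filtered with the three inequalities of step S243 before being accepted as candidate lost circulation data.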
Further, step S3 includes the following sub-steps:
S31: calculating the discrete cumulative particle size distribution of the plugging formula data in the candidate lost circulation data;
S32: calculating the continuous cumulative particle size distribution of the plugging formula data by interpolation from its discrete cumulative particle size distribution;
S33: based on the continuous cumulative particle size distribution of the plugging formula data, starting from the minimum particle size in the plugging formula data and iterating until the maximum particle size is reached, calculating the cumulative particle size of the plugging formula data at the current particle size and determining the key particle sizes of the plugging formula from it;
S34: performing K-means clustering analysis on the plugging formula parameter sample set, which consists of the key particle sizes of the plugging formulas and the corresponding formula concentrations, to obtain the cluster center point of the cluster to which each formula parameter sample is assigned;
S35: determining the recommended formula set from the cluster center points of the clusters to which the formula parameter samples are assigned.
Further, in step S31, the discrete cumulative particle size $y_{l+1}$ corresponding to the (l+1)-th particle size value in the plugging formula data is calculated as:
$$y_{l+1} = y_l + \eta_{l+1}, \quad 0 \le l \le Q$$
where $y_l$ denotes the cumulative particle size corresponding to the l-th particle size value, $\eta_{l+1}$ denotes the composition particle size corresponding to the (l+1)-th particle size value, and Q denotes the total number of composition particle size distribution intervals;
in step S32, the continuous cumulative particle size distribution $H_3(x)$ of the plugging formula data is calculated as:
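$$H_3(x) = \left(1 + 2\frac{x - x_l}{x_{l+1} - x_l}\right)\left(\frac{x - x_{l+1}}{x_l - x_{l+1}}\right)^{2} y_l + \left(1 + 2\frac{x - x_{l+1}}{x_l - x_{l+1}}\right)\left(\frac{x - x_l}{x_{l+1} - x_l}\right)^{2} y_{l+1} + (x - x_l)\left(\frac{x - x_{l+1}}{x_l - x_{l+1}}\right)^{2} y_l' + (x - x_{l+1})\left(\frac{x - x_l}{x_{l+1} - x_l}\right)^{2} y_{l+1}'$$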
where x denotes the particle size at the point to be interpolated, $x_l$ denotes the particle size value at the left endpoint of the interval containing that point, $x_{l+1}$ denotes the particle size value at the right endpoint, $y_l$ denotes the cumulative distribution at the left endpoint, $y_{l+1}$ denotes the cumulative distribution at the right endpoint, $y_l'$ denotes the derivative of the cumulative distribution at the left endpoint, and $y_{l+1}'$ denotes the derivative of the cumulative distribution at the right endpoint;
in step S33, the cumulative particle size gr of the plugging formula data at the current particle size is calculated as:
Figure BDA0003614117100000052
where n denotes the total number of materials in the plugging formula, $\varepsilon_i$ denotes the concentration of the i-th material, $y_{i,l-1}$ denotes the cumulative particle size corresponding to the (l-1)-th particle size value of the i-th material, $y_{il}$ denotes the cumulative particle size corresponding to the l-th particle size value of the i-th material, $\rho_i$ denotes the density of the i-th material, and V denotes the total volume of the formula;
in step S33, the key particle sizes of the plugging formula are determined as follows: when the cumulative particle size gr of the plugging formula data at the current particle size reaches 10%, the current particle size is the D10 of the plugging formula; when gr reaches 50%, the current particle size is the D50 of the plugging formula; when gr reaches 90%, the current particle size is the D90 of the plugging formula.
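Steps S31 to S33 can be sketched in Python as follows. The sketch makes several assumptions that are not in the source: it works on a single, already combined size distribution (the multi-material formula for gr is an image and is not reproduced), it estimates the endpoint derivatives by finite differences, and it scans the size range in a fixed number of steps. All function names are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate

def cumulative(eta):
    """Discrete cumulative distribution: y[l+1] = y[l] + eta[l+1] (S31)."""
    return list(accumulate(eta))

def slopes(x, y):
    """Finite-difference derivative estimates at the nodes (an assumption)."""
    n = len(x)
    out = []
    for l in range(n):
        lo, hi = max(l - 1, 0), min(l + 1, n - 1)
        out.append((y[hi] - y[lo]) / (x[hi] - x[lo]))
    return out

def hermite(x, y, d, xq):
    """Two-point cubic Hermite interpolation on the interval containing xq (S32)."""
    l = min(max(bisect_right(x, xq) - 1, 0), len(x) - 2)
    h = x[l + 1] - x[l]
    t = (xq - x[l]) / h
    return ((1 + 2 * t) * (1 - t) ** 2 * y[l]
            + t * t * (3 - 2 * t) * y[l + 1]
            + h * t * (1 - t) ** 2 * d[l]
            - h * t * t * (1 - t) * d[l + 1])

def key_sizes(x, eta, targets=(10.0, 50.0, 90.0), steps=10000):
    """Scan from the smallest to the largest size and record D10/D50/D90 (S33)."""
    y = cumulative(eta)
    d = slopes(x, y)
    found = {}
    for i in range(steps + 1):
        xq = x[0] + (x[-1] - x[0]) * i / steps
        gr = hermite(x, y, d, xq)
        for tgt in targets:
            if tgt not in found and gr >= tgt:
                found[tgt] = xq
    return found
```

Here `x` holds the particle size values in ascending order and `eta` the percentage of the formula falling at or below each successive size step, so the cumulative values end at 100. The returned dict maps each target percentage to the first scanned size at which the interpolated cumulative value reaches it.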
Further, step S34 includes the following sub-steps:
S341: in the plugging formula parameter sample set E, calculating the distance between each formula parameter sample and each initial formula parameter mean vector and determining the nearest distance, where the distance $d_{ji}$ between a formula parameter sample and a formula parameter mean vector is calculated as:
$$d_{ji} = \left\| e_j - \mu_i \right\|_2$$
where $e_j$ denotes a formula parameter sample and $\mu_i$ denotes an initial formula parameter mean vector;
S342: determining the cluster label of each formula parameter sample from the nearest distance and assigning each sample to the corresponding cluster according to its label, where the cluster label i of a formula parameter sample and the updated cluster $C_{new,i}$ are calculated respectively as:
$$i = \arg\min_{i} d_{ji}$$
$$C_{new,i} = C_i \cup \{e_j\}$$
where $C_i$ denotes the original cluster of formula parameter samples;
S343: calculating the formula parameter mean vector of each cluster after clustering; when the mean vectors before and after clustering differ, updating the mean vectors and repeating the cluster assignment; otherwise, taking the mean vectors as the cluster center points of the clusters to which the formula parameter samples are assigned, where the formula parameter mean vector $\mu_i'$ after clustering is calculated as:
$$\mu_i' = \frac{1}{\left|C_i\right|} \sum_{e \in C_i} e$$
where e denotes a sample in the formula parameter sample cluster $C_i$.
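The clustering loop of steps S341 to S343 can be written as a compact Python sketch. It is the standard K-means iteration with caller-supplied initial mean vectors; the function name, the `max_iter` cap, and the handling of empty clusters are assumptions added for the illustration.

```python
import math

def kmeans(samples, init_means, max_iter=100):
    """K-means on formula parameter samples, e.g. (key particle size, concentration).

    samples: list of equal-length numeric tuples.
    init_means: initial formula parameter mean vectors, one per cluster.
    Returns the final mean vectors (cluster centers) and the clusters.
    """
    means = [list(m) for m in init_means]
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in means]
        for e in samples:
            # S341/S342: label each sample with its nearest mean vector.
            i = min(range(len(means)), key=lambda c: math.dist(e, means[c]))
            clusters[i].append(e)
        # S343: recompute each mean; an empty cluster keeps its previous mean.
        new_means = [
            [sum(col) / len(c) for col in zip(*c)] if c else means[i]
            for i, c in enumerate(clusters)
        ]
        if new_means == means:
            break  # mean vectors unchanged, so the assignment has converged
        means = new_means
    return means, clusters
```

The result depends on the initial mean vectors, so in practice several initializations may be worth comparing.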
Further, step S35 includes the following sub-steps:
S351: setting the number of random sampling iterations for the recommended formula;
S352: randomly adding the plugging materials available on site to obtain a plugging formula;
S353: determining whether the current plugging formula has already been recommended; if so, returning to step S352, otherwise proceeding to step S354;
S354: determining, from the cluster center points of the clusters to which the formula parameter samples are assigned, whether the particle size of the current plugging formula meets the requirement of the recommended formula set; if so, adding the current plugging formula to the recommended formula set, otherwise returning to step S352, until the set number of random sampling iterations is reached and the final recommended formula set is determined;
in step S351, the random addition amount of each formula parameter sample
Figure BDA0003614117100000066
and the total concentration S of the plugging formula satisfy
Figure BDA0003614117100000062
where m denotes the number of plugging materials randomly selected from those available on site, and
Figure BDA0003614117100000067
denotes the addition amount of the i-th plugging material;
in step S354, if the particle size of the current plugging formula is D10, the requirement of the recommended formula set is
Figure BDA0003614117100000063
if the particle size of the current plugging formula is D50, the requirement of the recommended formula set is
Figure BDA0003614117100000064
if the particle size of the current plugging formula is D90, the requirement of the recommended formula set is
Figure BDA0003614117100000065
where $D10_{re}$ denotes the recommended parameter value of the plugging formula particle size D10, $D10_{acc}$ denotes the parameter value of the plugging formula particle size D10 at the cluster center point, $D50_{re}$ denotes the recommended parameter value of the plugging formula particle size D50, $D50_{acc}$ denotes the parameter value of the plugging formula particle size D50 at the cluster center point, $D90_{re}$ denotes the recommended parameter value of the plugging formula particle size D90, and $D90_{acc}$ denotes the parameter value of the plugging formula particle size D90 at the cluster center point.
The beneficial effects of the invention are as follows: the invention addresses the heavy reliance of plugging operations on engineers' experience-based judgment of the plugging method. It mines lost circulation data through similarity analysis and a clustering algorithm to recommend particle-based plugging formulas for field operations in real time, which supports rapid decisions on field lost circulation treatment, selection of a scientifically sound drilling fluid plugging formula, safer drilling, and a higher first-attempt success rate for plugging operations.
Drawings
Fig. 1 is a flow chart of the data mining-based method for recommending particle-based plugging formulas for fractured lost circulation.
Detailed Description
The embodiments of the present invention will be further described with reference to the accompanying drawings.
Before describing specific embodiments of the present invention, in order to make the solution of the present invention more clear and complete, the definitions of the abbreviations and key terms appearing in the present invention will be explained first:
Binning method: smooths stored data values by examining their "neighbors" (surrounding values). Equal-depth binning places the same number of data points in each bin, while equal-width binning gives each bin the same value interval.
One-Hot encoding: an N-bit status register is used to encode N states; each state has its own register bit, and only one bit is active at any time.
D10: the particle size at which the cumulative particle size distribution of a sample reaches 10%. Physically, 10% of the particles are smaller than this size.
D50: the particle size at which the cumulative particle size distribution of a sample reaches 50%. Physically, 50% of the particles are larger than this size and 50% are smaller. D50 is also called the median particle size and is commonly used to represent the average particle size.
D90: the particle size at which the cumulative particle size distribution of a sample reaches 90%. Physically, 90% of the particles are smaller than this size.
As shown in Fig. 1, the invention provides a data mining-based method for recommending particle-based plugging formulas for fractured lost circulation, comprising the following steps:
S1: acquiring lost circulation data and preprocessing it;
S2: performing similarity analysis on the preprocessed lost circulation data to obtain candidate lost circulation data;
S3: performing particle size analysis and clustering on the plugging formula data in the candidate lost circulation data to determine a recommended formula set.
In the embodiment of the invention, the required sample data features are determined as follows:
1) formation parameters: structure type, lithology, horizon, top depth, bottom depth;
2) well section parameters: hole diameter, well depth, well inclination;
3) drilling fluid parameters: system type, inlet and outlet flow rates, density, rheological parameters, mud pit volume, and solids content;
4) engineering parameters: bit diameter, hook load, drilling rate, weight on bit, torque;
5) loss parameters: loss rate, loss volume, loss time, loss severity (minor loss, major loss, and total loss of returns), operating condition at the time of loss, and bit position;
6) plugging parameters: type of plugging slurry (while drilling or with drilling stopped), volume of plugging slurry, plugging formula (materials, addition amounts, and concentrations), and plugging result (success, failure, or reduced loss rate);
7) plugging material parameters: type (e.g. flake), specification (e.g. 1-3 mm), manufacturer, density, and composition particle size distribution.
In the embodiment of the invention, step S1 includes the following sub-steps:
S11: performing data cleaning on the lost circulation data to obtain optimized lost circulation data;
S12: performing feature encoding on the optimized lost circulation data to obtain encoded lost circulation data;
S13: performing data reduction on the encoded lost circulation data to complete the preprocessing of the lost circulation data.
In the embodiment of the invention, in step S11, data cleaning is performed as follows: missing lost circulation data are filled in, abnormal lost circulation data are detected using a binning method, and the abnormal values are then replaced by filling;
in step S12, One-Hot encoding is used to encode the character-type lost circulation data in the optimized lost circulation data. For example, the lithology values sandstone, mudstone, carbonate rock, conglomerate, and igneous rock are encoded as 00001, 00010, 00100, 01000, and 10000, respectively;
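The lithology example of step S12 can be reproduced with a few lines of Python. The function name and the category list are hypothetical; the bit order is chosen so that the first category sets the rightmost bit, which matches the codes quoted above.

```python
# Category order follows the lithology example in the text.
LITHOLOGIES = ["sandstone", "mudstone", "carbonate rock", "conglomerate", "igneous rock"]

def one_hot(value, categories=LITHOLOGIES):
    """Return the One-Hot code of `value` as a bit string.

    The first category maps to the rightmost bit, so "sandstone" gives "00001".
    Raises ValueError if `value` is not a known category.
    """
    idx = categories.index(value)
    bits = ["0"] * len(categories)
    bits[len(categories) - 1 - idx] = "1"
    return "".join(bits)
```

Each character-type feature (lithology, loss severity, plugging slurry type, and so on) would get its own category list and its own block of bits in the encoded record.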
in step S13, the encoded lost circulation data comprise M records, and the entire data set is represented as $x^{(1)}, x^{(2)}, \ldots, x^{(M)}$. Each record contains N features, and each lost circulation feature value is denoted $x_j^{(i)}$, where $1 \le i \le M$ and $1 \le j \le N$. To eliminate irrelevant or redundant features and useless noise, the dimensionality of the lost circulation features is reduced by principal component analysis (PCA), which comprises the following sub-steps:
S131: performing mean normalization on the encoded lost circulation data and updating their feature values. The normalized j-th feature value $x_j^{(i)}$ of the i-th encoded lost circulation record is calculated as:
$$x_j^{(i)} \leftarrow \frac{x_j^{(i)} - \mu_j}{s_j}$$
where $x_j^{(i)}$ on the right-hand side represents the original lost circulation feature value, M represents the total number of lost circulation data samples, N represents the total number of lost circulation data features, $\mu_j$ represents the mean of the j-th lost circulation feature, and $s_j$ represents the standard deviation of the j-th lost circulation feature;
S132: calculating the covariance matrix $\Sigma$ of the encoded lost circulation data after the feature values have been updated, according to:
$$\Sigma = \frac{1}{M}\sum_{i=1}^{M} x^{(i)} \left(x^{(i)}\right)^{T}$$
where $x^{(i)}$ represents the feature vector of the i-th lost circulation data sample;
S133: performing singular value decomposition on the covariance matrix $\Sigma$ to obtain the eigenvector matrix U and the square matrix S of $\Sigma$, and determining the reduced dimension of the lost circulation data from the square matrix S;
the singular value decomposition of the covariance matrix $\Sigma$ is $\Sigma = U S V^{T}$, where S represents the diagonal matrix used to determine the reduced dimension of the lost circulation data, U represents the lost circulation data deviation matrix, and V represents the lost circulation data variance matrix;
S134: reducing the dimension using the eigenvector matrix U of the covariance matrix: the feature dimension of the encoded lost circulation data after dimension reduction is determined and the reduced feature vectors are calculated, which completes the data reduction. The reduced feature vector $z^{(i)}$ of the i-th encoded lost circulation record is calculated as:
$$z^{(i)} = U_{reduce}^{T}\, x^{(i)}$$
where $U_{reduce}$ represents the lost circulation data dimension reduction matrix, which consists of the first k column vectors of the matrix U;
in step S134, the lost circulation feature dimension k of the encoded lost circulation data after dimension reduction satisfies
Figure BDA0003614117100000093
where $S_{ii}$ represents the entries on the diagonal of the square matrix S.
In step S133, the lost circulation feature dimension k of the encoded lost circulation data after dimension reduction satisfies
Figure BDA0003614117100000094
where $S_{ii}$ represents the entries on the diagonal of the matrix S and is used to calculate the dimension-reduction deviation of the lost circulation data; a deviation of less than 5% means that 95% of the variance of the lost circulation data is retained.
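The sub-steps S131 to S134 can be illustrated with a minimal NumPy sketch. It assumes that the retained-variance criterion is evaluated as the cumulative sum of the diagonal entries of S divided by their total, with a 5% loss threshold as stated in the text; the function name `pca_reduce` and the parameter `max_loss` are hypothetical and do not come from the patent.

```python
import numpy as np


def pca_reduce(X, max_loss=0.05):
    """Reduce X (M samples x N features), keeping at least 1 - max_loss of the variance.

    Hypothetical helper: the name and the max_loss parameter are illustrative only.
    """
    X = np.asarray(X, dtype=float)
    M = X.shape[0]

    # S131: mean normalization of every feature column.
    mu = X.mean(axis=0)
    s = X.std(axis=0)
    s[s == 0] = 1.0  # assumption: constant columns are only centred
    Xn = (X - mu) / s

    # S132: covariance matrix of the normalized data (N x N).
    sigma = Xn.T @ Xn / M

    # S133: singular value decomposition, sigma = U S V^T.
    U, S, _ = np.linalg.svd(sigma)

    # Smallest k whose cumulative share of S reaches 1 - max_loss.
    retained = np.cumsum(S) / S.sum()
    k = int(np.searchsorted(retained, 1.0 - max_loss) + 1)

    # S134: project onto the first k columns of U; row i is z^(i).
    U_reduce = U[:, :k]
    Z = Xn @ U_reduce
    return Z, k
```

For strongly correlated features the sketch keeps a single component, while independent features are all retained. A data set whose features are all constant would need separate handling, because the total of S is then zero.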
In the embodiment of the present invention, step S2 includes the following sub-steps:
S21: calculating the similarity of the numerical data and the similarity of the character-type data in the preprocessed lost circulation data;
S22: calculating the overall similarity of the lost circulation data from the similarity of the numerical data and the similarity of the character-type data;
S23: constructing a $(d, cd, fp_1, fp_2)$-sensitive LSH function family for the numerical data and an $(r, cr, sp_1, sp_2)$-sensitive LSH function family for the character-type data, where c represents the approximation factor of the lost circulation data, d represents the Euclidean distance sensitivity range of the numerical data, $fp_1$ represents the lower limit of the numerical data similarity probability, $fp_2$ represents the upper limit of the numerical data similarity probability, r represents the Jaccard distance sensitivity range of the character-type data, $sp_1$ represents the lower limit of the character-type data similarity probability, and $sp_2$ represents the upper limit of the character-type data similarity probability;
S24: constructing a binary mixed index of the lost circulation data from the overall similarity of the lost circulation data, the LSH function family of the numerical data, and the LSH function family of the character-type data, to obtain the candidate lost circulation data.
The feature vector of the preprocessed lost circulation data is $z^{(i)}$, which includes numerical features and character-type features. Similarity is judged by the Euclidean distance for numerical features and by the Jaccard distance for character-type features. With the binary mixed LSH algorithm, the lost circulation data most similar to the on-site lost circulation query instance can be found quickly, which provides a basis for the subsequent rapid plugging decision.
In the embodiment of the present invention, in step S21, the similarity distE of the numerical data is calculated as:
$$distE = \frac{EuclideanDist(o_1.f\text{-}type,\ o_2.f\text{-}type)}{d_{max}}$$
where EuclideanDist(·) represents the function that calculates the Euclidean distance between lost circulation data objects $o_1$ and $o_2$, $o_1.f\text{-}type$ represents the numerical data of lost circulation data object $o_1$, $o_2.f\text{-}type$ represents the numerical data of lost circulation data object $o_2$, and $d_{max}$ represents the maximum distance between the numerical data features of $o_1$ and $o_2$;
in step S21, the similarity distJ of the character-type data is calculated as:
$$distJ = 1 - \frac{\left|o_1.s\text{-}type \cap o_2.s\text{-}type\right|}{\left|o_1.s\text{-}type \cup o_2.s\text{-}type\right|}$$
where $o_1.s\text{-}type$ represents the character-type data of lost circulation data object $o_1$, and $o_2.s\text{-}type$ represents the character-type data of lost circulation data object $o_2$;
in step S22, for any two lost circulation data objects $o_1$ and $o_2$, a linear weighted sum is used, and the overall similarity dist of the lost circulation data is calculated as:
$$dist = \alpha \times distE + (1-\alpha) \times distJ$$
where $\alpha$ represents the weighting parameter between the two types of lost circulation data;
in step S23, the $(d, cd, fp_1, fp_2)$-sensitive LSH function family $h(o.f\text{-}type)$ of the numerical data is expressed as:
Figure BDA0003614117100000103
Figure BDA0003614117100000104
Figure BDA0003614117100000105
where a represents a randomly generated d-dimensional vector, b represents a randomly generated real number in (0, W), c represents the approximation factor of the lost circulation data, d represents the Euclidean distance sensitivity range of the numerical data, W represents a constant, $o.f\text{-}type$ represents the numerical data of lost circulation data object o, t represents the integration variable, $fp_1$ represents the lower limit of the numerical data similarity probability, $fp_2$ represents the upper limit of the numerical data similarity probability, and $f_2(\cdot)$ represents the probability density function of the standard normal distribution; $fp_1$ and $fp_2$ are used subsequently to construct the binary mixed index of the lost circulation data;
in step S23, the $(r, cr, sp_1, sp_2)$-sensitive LSH function family $h(O.s\text{-}type)$ of the character-type data is expressed as:
$$h(O.s\text{-}type) = \arg\min g(q),\quad q \in O.s\text{-}type$$
where $O.s\text{-}type$ represents the character-type data set of the lost circulation data, q represents the character-type data in a lost circulation data object, g(·) represents the random number generation function, r represents the Jaccard distance sensitivity range of the character-type data, $sp_1$ represents the lower limit of the character-type data similarity probability, $sp_2$ represents the upper limit of the character-type data similarity probability, $sp_1 = 1 - r$, and $sp_2 = 1 - cr$; $sp_1$ and $sp_2$ are used subsequently to construct the binary mixed index of the lost circulation data.
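The similarity measures of steps S21 and S22 and the two kinds of hash function of step S23 can be sketched as follows. The sketch assumes that distE is the Euclidean distance divided by dmax, that distJ is the standard Jaccard distance, that the numerical hash is the usual Gaussian random-projection hash with bucket width W, and that g(·) can be imitated by a seeded SHA-256 digest. The dictionary keys `"f"` and `"s"` and all function names are hypothetical.

```python
import hashlib

import numpy as np


def dist_e(f1, f2, dmax):
    """Euclidean distance between numerical feature vectors, scaled by dmax."""
    diff = np.asarray(f1, dtype=float) - np.asarray(f2, dtype=float)
    return float(np.linalg.norm(diff)) / dmax


def dist_j(s1, s2):
    """Jaccard distance between two sets of character-type features."""
    s1, s2 = set(s1), set(s2)
    if not (s1 | s2):
        return 0.0  # assumption: two empty sets are treated as identical
    return 1.0 - len(s1 & s2) / len(s1 | s2)


def dist(o1, o2, alpha, dmax):
    """Overall similarity: alpha * distE + (1 - alpha) * distJ."""
    return alpha * dist_e(o1["f"], o2["f"], dmax) + (1 - alpha) * dist_j(o1["s"], o2["s"])


def make_numeric_hash(dim, W, rng):
    """One hash function for numerical data: floor((a . f + b) / W)."""
    a = rng.normal(size=dim)   # random Gaussian projection vector
    b = rng.uniform(0.0, W)    # random offset in (0, W)
    return lambda f: int(np.floor((np.dot(a, f) + b) / W))


def make_minhash(seed):
    """One hash function for character data: the element q that minimizes g(q)."""
    def g(token):
        # Deterministic stand-in for the random number generation function g(.)
        return hashlib.sha256(f"{seed}|{token}".encode()).hexdigest()
    return lambda s: min(s, key=g)
```

Two objects with the same character-type set always receive the same MinHash value, and for a well-mixed g(·) the collision probability of two different sets equals their Jaccard similarity, which is consistent with $sp_1 = 1 - r$ in the text. The character-type set passed to the hash must be non-empty.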
In the embodiment of the present invention, step S24 includes the following sub-steps:
S241: randomly selecting $k_1$ and $k_2$ hash functions from the LSH function family of the numerical data and the LSH function family of the character-type data, respectively, to form the binary mixed index LSH function family G of the lost circulation data, expressed as:
G=g(o)
Figure BDA0003614117100000111
the values of $k_1$ and $k_2$ are:
Figure BDA0003614117100000112
Figure BDA0003614117100000113
where $h_1(\cdot), \ldots, h_{k_1}(\cdot)$ represents the set of LSH functions for the numerical data, $h_1(\cdot), \ldots, h_{k_2}(\cdot)$ represents the set of LSH functions for the character-type data, g(o) represents the binary mixed index LSH function set of the lost circulation data, $o.f\text{-}type$ represents the numerical data of lost circulation data object o, and $O.s\text{-}type$ represents the character-type data set of the lost circulation data;
S242: randomly selecting p lost circulation data hash functions $g_1, \ldots, g_p$ from the binary mixed index LSH function family G, and storing the hash values produced by the p hash functions in the corresponding hash buckets, where the number p of selected hash functions is calculated as:
Figure BDA0003614117100000116
where M represents the total number of lost circulation data samples, $fp_1$ represents the lower limit of the numerical data similarity probability, $fp_2$ represents the upper limit of the numerical data similarity probability, $sp_1$ represents the lower limit of the character-type data similarity probability, and $sp_2$ represents the upper limit of the character-type data similarity probability;
S243: within the hash buckets, taking the lost circulation data that satisfy $distE(o_i, q) < cd$, $distJ(o_i, q) < cr$, and $dist(o_i, q) < \varepsilon$ as the candidate lost circulation data, where $o_i$ represents any lost circulation record, q represents the query object from the new lost circulation site, distE represents the similarity of the numerical data, distJ represents the similarity of the character-type data, c represents the approximation factor of the lost circulation data, d represents the Euclidean distance sensitivity range of the numerical data, r represents the Jaccard distance sensitivity range of the character-type data, and $\varepsilon$ represents the overall similarity threshold of the lost circulation data.
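Steps S241 to S243 can be sketched as a small index class. This is an illustration under stated assumptions rather than the patented implementation: each of the p tables concatenates k1 Gaussian projection hashes and k2 MinHash values into one bucket key, and the values of k1, k2, and p are passed in directly because their formulas are not reproduced here. The class name `HybridLSHIndex`, the record keys `"f"` and `"s"`, and the seeded SHA-256 stand-in for g(·) are all hypothetical.

```python
import hashlib
from collections import defaultdict

import numpy as np


def dist_e(f1, f2, dmax):
    diff = np.asarray(f1, dtype=float) - np.asarray(f2, dtype=float)
    return float(np.linalg.norm(diff)) / dmax


def dist_j(s1, s2):
    s1, s2 = set(s1), set(s2)
    return 1.0 - len(s1 & s2) / len(s1 | s2) if (s1 | s2) else 0.0


def _g(seed, token):
    # Deterministic stand-in for the random number generation function g(.)
    return hashlib.sha256(f"{seed}|{token}".encode()).hexdigest()


class HybridLSHIndex:
    def __init__(self, dim, k1, k2, p, W, seed=0):
        rng = np.random.default_rng(seed)
        self.W = W
        self.items = []
        self.tables = []
        for t in range(p):  # one table per composite hash function g_1..g_p
            A = rng.normal(size=(k1, dim))       # k1 numerical hash functions
            B = rng.uniform(0.0, W, size=k1)
            seeds = [f"{t}-{j}" for j in range(k2)]  # k2 character hash functions
            self.tables.append((A, B, seeds, defaultdict(list)))

    def _key(self, A, B, seeds, obj):
        num = np.floor((A @ np.asarray(obj["f"], dtype=float) + B) / self.W).astype(int)
        chars = [min(obj["s"], key=lambda q: _g(s, q)) for s in seeds]
        return tuple(num.tolist()) + tuple(chars)

    def add(self, obj):
        idx = len(self.items)
        self.items.append(obj)
        for A, B, seeds, buckets in self.tables:
            buckets[self._key(A, B, seeds, obj)].append(idx)

    def query(self, q, cd, cr, eps, alpha, dmax):
        # Collect everything that shares a bucket with q in any table.
        hits = set()
        for A, B, seeds, buckets in self.tables:
            hits.update(buckets.get(self._key(A, B, seeds, q), []))
        # S243: keep only records that pass all three distance checks.
        out = []
        for i in sorted(hits):
            o = self.items[i]
            de, dj = dist_e(o["f"], q["f"], dmax), dist_j(o["s"], q["s"])
            if de < cd and dj < cr and alpha * de + (1 - alpha) * dj < eps:
                out.append(i)
        return out
```

A record identical to the query always lands in the same bucket in every table, so it is always returned. Records that collide by chance are still removed by the three distance checks.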
In the embodiment of the present invention, step S3 includes the following sub-steps:
S31: calculating the discrete cumulative particle size distribution of the plugging formula data in the candidate lost circulation data;
S32: calculating the continuous cumulative particle size distribution of the plugging formula data by interpolation from the discrete cumulative particle size distribution;
S33: using the continuous cumulative particle size distribution of the plugging formula data and taking the minimum particle size in the plugging formula data as the initial value, iteratively calculating the cumulative particle size of the plugging formula data at the current particle size and determining the key particle sizes of the plugging formula from it, until the maximum particle size is reached;
S34: performing K-means cluster analysis on a plugging formula parameter sample set consisting of the key particle sizes of the plugging formulas and the corresponding formula concentrations, to obtain the cluster center point of the cluster into which each formula parameter sample is divided;
S35: determining the recommended formula set from the cluster center points of the clusters into which the formula parameter samples are divided.
The lost circulation data returned by the similarity analysis model include plugging formula data. Particle size analysis is performed on the plugging formula data to obtain four parameters: D10, D50, D90, and the formula concentration. The resulting formula parameter set is then clustered with the K-means clustering algorithm, and the cluster center points are output.
In the embodiment of the present invention, in step S31, the discrete cumulative particle size $y_{l+1}$ corresponding to the (l+1)-th particle size value in the plugging formula data is calculated as:
$$y_{l+1} = y_l + \eta_{l+1},\quad 0 \le l \le Q$$
$$y_0 = \eta_0$$
where $y_l$ represents the cumulative particle size corresponding to the l-th particle size value, $\eta_{l+1}$ represents the component particle size corresponding to the (l+1)-th particle size value, and Q represents the total number of component particle size distribution intervals; $\eta_0$ represents the component particle size at the starting point, and $y_0$ represents the cumulative particle size at the starting point;
in step S32, the continuous cumulative particle size distribution $H_3(x)$ of the plugging formula data is calculated as:
$$H_3(x) = \left(1 + 2\frac{x - x_l}{x_{l+1} - x_l}\right)\left(\frac{x - x_{l+1}}{x_l - x_{l+1}}\right)^{2} y_l + \left(1 + 2\frac{x - x_{l+1}}{x_l - x_{l+1}}\right)\left(\frac{x - x_l}{x_{l+1} - x_l}\right)^{2} y_{l+1} + (x - x_l)\left(\frac{x - x_{l+1}}{x_l - x_{l+1}}\right)^{2} y_l' + (x - x_{l+1})\left(\frac{x - x_l}{x_{l+1} - x_l}\right)^{2} y_{l+1}'$$
where x represents the particle size at the point to be interpolated, $x_l$ represents the particle size value at the left end point of the interval containing the point to be interpolated, $x_{l+1}$ represents the particle size value at the right end point, $y_l$ represents the cumulative distribution corresponding to the left end point particle size value, $y_{l+1}$ represents the cumulative distribution corresponding to the right end point particle size value, $y_l'$ represents the derivative of the cumulative distribution at the left end point, and $y_{l+1}'$ represents the derivative of the cumulative distribution at the right end point;
in step S33, the calculation formula of the cumulative particle size gr of the plugging formula data at the current particle size is:
Figure BDA0003614117100000122
where the materials are indexed by i up to the total number of plugging formula materials, $\varepsilon_i$ represents the concentration of the i-th material, $y_{i,l-1}$ represents the cumulative particle size corresponding to the (l-1)-th particle size value of the i-th material, $y_{i,l}$ represents the cumulative particle size corresponding to the l-th particle size value of the i-th material, $\rho_i$ represents the density of the i-th material, and V represents the total volume of the formula;
in step S33, the key particle sizes of the plugging formula are determined as follows: the particle size at which the cumulative particle size gr of the plugging formula data reaches 10% is the key particle size D10; the particle size at which gr reaches 50% is the key particle size D50; and the particle size at which gr reaches 90% is the key particle size D90.
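Steps S31 to S33 can be sketched for a single cumulative curve as follows. The sketch assumes two-point cubic Hermite interpolation for $H_3(x)$, with the end-point derivatives estimated by finite differences through `numpy.gradient` (the text does not say how the derivatives are obtained), and it scans a regular grid from the minimum to the maximum particle size. Blending several materials by concentration, density, and volume is not shown. All function names and the `n_grid` parameter are hypothetical.

```python
import numpy as np


def cumulative(eta):
    """S31: y_0 = eta_0 and y_{l+1} = y_l + eta_{l+1}."""
    return np.cumsum(np.asarray(eta, dtype=float))


def hermite_h3(x, xl, xr, yl, yr, dyl, dyr):
    """Two-point cubic Hermite interpolant on [xl, xr]."""
    h = xr - xl
    t = (x - xl) / h
    return (yl * (1 + 2 * t) * (1 - t) ** 2
            + yr * t ** 2 * (3 - 2 * t)
            + h * dyl * t * (1 - t) ** 2
            + h * dyr * t ** 2 * (t - 1))


def continuous_cdf(x, xs, ys):
    """S32: continuous cumulative distribution through the points (xs, ys)."""
    xs, ys = np.asarray(xs, dtype=float), np.asarray(ys, dtype=float)
    dys = np.gradient(ys, xs)  # assumption: finite-difference derivatives
    x = np.clip(np.asarray(x, dtype=float), xs[0], xs[-1])
    l = np.clip(np.searchsorted(xs, x, side="right") - 1, 0, len(xs) - 2)
    return hermite_h3(x, xs[l], xs[l + 1], ys[l], ys[l + 1], dys[l], dys[l + 1])


def key_sizes(xs, ys, levels=(10.0, 50.0, 90.0), n_grid=4001):
    """S33: scan from the minimum to the maximum size and record D10, D50, D90."""
    grid = np.linspace(xs[0], xs[-1], n_grid)
    cdf = continuous_cdf(grid, xs, ys)
    result = []
    for level in levels:
        reached = cdf >= level
        # First grid point at which the cumulative value reaches the level.
        result.append(float(grid[np.argmax(reached)]) if reached.any() else float("nan"))
    return tuple(result)


# Illustrative data only: sizes in mm and interval fractions in percent.
sizes = [1.0, 2.0, 3.0, 4.0, 5.0]
y = cumulative([10.0, 20.0, 40.0, 20.0, 10.0])
d10, d50, d90 = key_sizes(sizes, y)
```

Cubic Hermite interpolation does not guarantee a monotone curve, so the sketch takes the first crossing of each level; a monotone interpolant could be substituted if overshoot is a concern.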
In the embodiment of the present invention, step S34 includes the following sub-steps:
S341: in the plugging formula parameter sample set E, calculating the distance between each formula parameter sample and each initial formula parameter mean vector and determining the nearest distance, where the distance $d_{ji}$ between a formula parameter sample and a formula parameter mean vector is calculated as:
$$d_{ji} = \left\| e_j - \mu_i \right\|_2$$
where $e_j$ represents a formula parameter sample and $\mu_i$ represents an initial formula parameter mean vector;
S342: determining the cluster label of each formula parameter sample from the nearest distance and assigning each sample to the corresponding cluster according to its label, where the cluster label i of a formula parameter sample and the updated cluster $C_{new,i}$ are calculated, respectively, as:
$$i = \arg\min_{i} d_{ji}$$
$$C_{new,i} = C_i \cup \{e_j\}$$
where $C_i$ represents the original cluster of formula parameter samples;
S343: calculating the formula parameter mean vector of each cluster after clustering; when the mean vectors before and after clustering differ, updating the mean vector and repeating the cluster division; otherwise, taking the mean vector as the cluster center point of the cluster. The formula parameter mean vector $\mu'_i$ after clustering is calculated as:
$$\mu'_i = \frac{1}{|C_i|}\sum_{e \in C_i} e$$
where e represents a sample in the formula parameter sample cluster $C_i$.
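The loop formed by S341 to S343 is the standard K-means iteration and can be sketched in NumPy as follows. The sketch assumes that the initial mean vectors are drawn at random from the sample set and that an empty cluster keeps its previous mean; the function name `kmeans` and its parameters are hypothetical, and the sample values are illustrative only.

```python
import numpy as np


def kmeans(E, k, n_iter=100, seed=0):
    """Cluster formula parameter samples (rows of E: D10, D50, D90, concentration)."""
    E = np.asarray(E, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumption: initial mean vectors are k distinct samples chosen at random.
    mu = E[rng.choice(len(E), size=k, replace=False)]
    for _ in range(n_iter):
        # S341: distance d[j, i] between sample e_j and mean vector mu_i.
        d = np.linalg.norm(E[:, None, :] - mu[None, :, :], axis=2)
        # S342: cluster label of each sample is the index of the nearest mean.
        labels = d.argmin(axis=1)
        # S343: recompute each mean; an empty cluster keeps its old mean.
        new_mu = np.array([
            E[labels == i].mean(axis=0) if np.any(labels == i) else mu[i]
            for i in range(k)
        ])
        if np.allclose(new_mu, mu):
            break  # means unchanged, so the cluster centers are final
        mu = new_mu
    return mu, labels


samples = [
    [1.0, 5.0, 10.0, 3.0],
    [1.2, 5.2, 10.2, 3.1],
    [20.0, 50.0, 90.0, 8.0],
    [21.0, 51.0, 91.0, 8.2],
]
centers, labels = kmeans(samples, k=2)
```

Because the four parameters have different scales, it may be worth standardizing the columns before clustering; the text does not state whether this is done.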
In the embodiment of the present invention, step S35 includes the following sub-steps:
S351: setting the number of random samplings for the recommended formula;
S352: randomly adding on-site plugging materials to obtain a plugging formula;
S353: judging whether the current plugging formula has already been recommended; if so, returning to step S352; otherwise, proceeding to step S354;
S354: judging, from the cluster center point of the cluster into which each formula parameter sample is divided, whether the particle sizes of the current plugging formula meet the requirements of the recommended formula set; if so, adding the current plugging formula to the recommended formula set; otherwise, returning to step S352; repeating until the set number of random samplings is reached, and thereby determining the final recommended formula set;
the cluster center point $\mu'_i$ output by the clustering algorithm, that is, the four parameters D10, D50, D90, and total formula concentration, is input into the plugging formula recommendation algorithm to obtain the recommended formulas, which provides rapid decision support for on-site plugging.
In step S351, the random addition amount of each formula parameter sample
Figure BDA0003614117100000145
and the total concentration S of the plugging formula satisfy
Figure BDA0003614117100000141
where m represents the number of on-site plugging materials selected at random, and
Figure BDA0003614117100000146
represents the addition amount of the i-th plugging material;
in step S354, for the particle size D10 of the current plugging formula, the requirement of the recommended formula set is
Figure BDA0003614117100000142
for the particle size D50 of the current plugging formula, the requirement of the recommended formula set is
Figure BDA0003614117100000143
and for the particle size D90 of the current plugging formula, the requirement of the recommended formula set is
Figure BDA0003614117100000144
where $D10_{re}$ represents the D10 value of the recommended plugging formula, $D10_{acc}$ represents the D10 value of the cluster center point plugging formula, $D50_{re}$ represents the D50 value of the recommended plugging formula, $D50_{acc}$ represents the D50 value of the cluster center point plugging formula, $D90_{re}$ represents the D90 value of the recommended plugging formula, and $D90_{acc}$ represents the D90 value of the cluster center point plugging formula.
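The sampling loop of S351 to S354 can be sketched as follows. Several points are assumptions, since the acceptance conditions are given only as figures: a candidate is accepted when each of its D10, D50, and D90 values lies within a relative tolerance `tol` of the cluster center value; the blended cumulative curve is the addition-weighted average of the material curves; key sizes are read off by linear interpolation; and addition amounts are drawn in multiples of `step` so that they sum exactly to the total concentration. All names and numbers are hypothetical.

```python
import numpy as np


def blend_key_sizes(grid, cdfs, amounts):
    """D10, D50, D90 of a blend whose cumulative curve is the weighted mean."""
    w = np.asarray(amounts, dtype=float)
    blend = (w[:, None] * np.asarray(cdfs, dtype=float)).sum(axis=0) / w.sum()
    # Invert the non-decreasing blend curve by linear interpolation.
    return tuple(float(np.interp(p, blend, grid)) for p in (10.0, 50.0, 90.0))


def recommend(grid, cdfs, center, total, n_samples=300, m=2, tol=0.2, step=1.0, seed=0):
    """Random search for formulas whose key sizes match a cluster center."""
    rng = np.random.default_rng(seed)
    cdfs = np.asarray(cdfs, dtype=float)
    target = np.asarray(center[:3], dtype=float)  # D10, D50, D90 of the center
    n_units = int(round(total / step))
    recommended = set()
    for _ in range(n_samples):                      # S351: fixed sampling budget
        chosen = np.sort(rng.choice(len(cdfs), size=m, replace=False))
        units = rng.multinomial(n_units, np.ones(m) / m)
        if np.any(units == 0):
            continue                                # every chosen material is used
        amounts = units * step                      # S352: amounts sum to total
        recipe = tuple(zip(chosen.tolist(), amounts.tolist()))
        if recipe in recommended:                   # S353: already recommended
            continue
        sizes = np.array(blend_key_sizes(grid, cdfs[chosen], amounts))
        if np.all(np.abs(sizes - target) <= tol * target):  # S354 (assumed test)
            recommended.add(recipe)
    return sorted(recommended)


# Illustrative data only: two materials on a common particle size grid (mm).
grid = [0.1, 0.5, 1.0, 2.0, 3.0, 5.0]
cdfs = [[20.0, 60.0, 90.0, 100.0, 100.0, 100.0],   # fine material
        [0.0, 5.0, 20.0, 50.0, 80.0, 100.0]]       # coarse material
center = blend_key_sizes(grid, cdfs, [5.0, 5.0]) + (10.0,)
recipes = recommend(grid, cdfs, center, total=10.0)
```

Each returned recipe is a tuple of (material index, addition amount) pairs. In practice the tolerance would be replaced by the acceptance conditions of the method, and the material curves by measured particle size distributions.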
The working principle and process of the invention are as follows: first, a lost circulation database is established from the lost circulation data, and the lost circulation data are preprocessed by data cleaning, feature encoding, and data reduction; second, similarity analysis is performed on the preprocessed lost circulation data using the binary mixed LSH algorithm, and historical lost circulation data whose similarity to the new lost circulation data exceeds a set threshold are retrieved; third, the characteristic particle sizes of the retrieved historical plugging formulas are calculated, and a K-means clustering model is trained with the D10, D50, D90, and concentration of the plugging formulas as input features to output the cluster center points; finally, a particle-based plugging formula is recommended using the cluster center points.
The beneficial effects of the invention are as follows: the invention overcomes the heavy reliance of plugging operations on the empirical judgment of engineering personnel. By mining the lost circulation data through similarity analysis and a clustering algorithm, it recommends particle-based plugging formulas for field operations in real time. This is of practical value for quickly deciding on an on-site loss treatment scheme, selecting a scientifically sound drilling fluid plugging formula, improving the safety of drilling operations, and raising the first-attempt success rate of plugging operations.
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to assist the reader in understanding the principles of the invention and are to be construed as being without limitation to such specifically recited embodiments and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from the spirit of the invention, and these changes and combinations are within the scope of the invention.

Claims (10)

1. A method for recommending a fractured lost particle-based plugging formula based on data mining is characterized by comprising the following steps of:
S1: acquiring lost circulation data and preprocessing the lost circulation data;
S2: performing similarity analysis on the preprocessed lost circulation data to obtain candidate lost circulation data;
S3: performing plugging formula particle size analysis and clustering on the plugging formula data in the candidate lost circulation data to determine a recommended formula set.
2. The data mining-based fractured leakage particle-based plugging formulation recommendation method according to claim 1, wherein the step S1 comprises the following sub-steps:
S11: performing data cleaning on the lost circulation data to obtain optimized lost circulation data;
S12: performing feature encoding on the optimized lost circulation data to obtain encoded lost circulation data;
S13: performing data reduction on the encoded lost circulation data to complete the preprocessing of the lost circulation data.
3. The recommendation method for a fractured lost particle-based plugging formulation based on data mining as claimed in claim 2, wherein in the step S11, data cleaning is performed as follows: missing lost circulation data are filled in, abnormal lost circulation data are detected using a binning method, and the abnormal values are replaced by filling;
in the step S12, the character-type lost circulation data in the optimized lost circulation data are feature-encoded using One-Hot encoding;
the step S13 includes the following sub-steps:
S131: performing mean normalization on the encoded lost circulation data and updating their feature values, where the normalized j-th feature value $x_j^{(i)}$ of the i-th encoded lost circulation record is calculated as:
$$x_j^{(i)} \leftarrow \frac{x_j^{(i)} - \mu_j}{s_j}$$
where $x_j^{(i)}$ on the right-hand side represents the original lost circulation feature value, M represents the total number of lost circulation data samples, N represents the total number of lost circulation data features, $\mu_j$ represents the mean of the j-th lost circulation feature, and $s_j$ represents the standard deviation of the j-th lost circulation feature;
S132: calculating the covariance matrix $\Sigma$ of the encoded lost circulation data after the feature values have been updated, according to:
$$\Sigma = \frac{1}{M}\sum_{i=1}^{M} x^{(i)} \left(x^{(i)}\right)^{T}$$
where $x^{(i)}$ represents the feature vector of the i-th lost circulation data sample;
S133: performing singular value decomposition on the covariance matrix $\Sigma$ to obtain the eigenvector matrix U and the square matrix S of $\Sigma$, and determining the reduced dimension of the lost circulation data from the square matrix S;
S134: reducing the dimension using the eigenvector matrix U of the covariance matrix: the feature dimension of the encoded lost circulation data after dimension reduction is determined and the reduced feature vectors are calculated, which completes the data reduction, where the reduced feature vector $z^{(i)}$ of the i-th encoded lost circulation record is calculated as:
$$z^{(i)} = U_{reduce}^{T}\, x^{(i)}$$
where $U_{reduce}$ represents the lost circulation data dimension reduction matrix;
in the step S134, the lost circulation feature dimension k of the encoded lost circulation data after dimension reduction satisfies
Figure FDA0003614117090000022
where $S_{ii}$ represents the entries on the diagonal of the square matrix S.
4. The data mining-based fractured leakage particle-based plugging formulation recommendation method according to claim 1, wherein the step S2 comprises the following sub-steps:
S21: calculating the similarity of the numerical data and the similarity of the character-type data in the preprocessed lost circulation data;
S22: calculating the overall similarity of the lost circulation data from the similarity of the numerical data and the similarity of the character-type data;
S23: constructing a $(d, cd, fp_1, fp_2)$-sensitive LSH function family for the numerical data and an $(r, cr, sp_1, sp_2)$-sensitive LSH function family for the character-type data, where c represents the approximation factor of the lost circulation data, d represents the Euclidean distance sensitivity range of the numerical data, $fp_1$ represents the lower limit of the numerical data similarity probability, $fp_2$ represents the upper limit of the numerical data similarity probability, r represents the Jaccard distance sensitivity range of the character-type data, $sp_1$ represents the lower limit of the character-type data similarity probability, and $sp_2$ represents the upper limit of the character-type data similarity probability;
S24: constructing a binary mixed index of the lost circulation data from the overall similarity of the lost circulation data, the LSH function family of the numerical data, and the LSH function family of the character-type data, to obtain the candidate lost circulation data.
5. The recommendation method for a fracture leakage particle-based plugging formula based on data mining as claimed in claim 4, wherein in said step S21, the similarity distE of the numerical data is calculated as:
$$distE = \frac{EuclideanDist(o_1.f\text{-}type,\ o_2.f\text{-}type)}{d_{max}}$$
where EuclideanDist(·) represents the function that calculates the Euclidean distance between lost circulation data objects $o_1$ and $o_2$, $o_1.f\text{-}type$ represents the numerical data of lost circulation data object $o_1$, $o_2.f\text{-}type$ represents the numerical data of lost circulation data object $o_2$, and $d_{max}$ represents the maximum distance between the numerical data features of $o_1$ and $o_2$;
in step S21, the similarity distJ of the character-type data is calculated as:
$$distJ = 1 - \frac{\left|o_1.s\text{-}type \cap o_2.s\text{-}type\right|}{\left|o_1.s\text{-}type \cup o_2.s\text{-}type\right|}$$
where $o_1.s\text{-}type$ represents the character-type data of lost circulation data object $o_1$, and $o_2.s\text{-}type$ represents the character-type data of lost circulation data object $o_2$;
in step S22, the overall similarity dist of the lost circulation data is calculated as:
$$dist = \alpha \times distE + (1-\alpha) \times distJ$$
where $\alpha$ represents the weighting parameter between the two types of lost circulation data;
in the step S23, the $(d, cd, fp_1, fp_2)$-sensitive LSH function family $h(o.f\text{-}type)$ of the numerical data is expressed as:
Figure FDA0003614117090000031
Figure FDA0003614117090000032
Figure FDA0003614117090000033
where a represents a randomly generated d-dimensional vector, b represents a randomly generated real number in (0, W), c represents the approximation factor of the lost circulation data, d represents the Euclidean distance sensitivity range of the numerical data, W represents a constant, $o.f\text{-}type$ represents the numerical data of lost circulation data object o, t represents the integration variable, $fp_1$ represents the lower limit of the numerical data similarity probability, $fp_2$ represents the upper limit of the numerical data similarity probability, and $f_2(\cdot)$ represents the probability density function of the standard normal distribution;
in the step S23, the $(r, cr, sp_1, sp_2)$-sensitive LSH function family $h(O.s\text{-}type)$ of the character-type data is expressed as:
$$h(O.s\text{-}type) = \arg\min g(q),\quad q \in O.s\text{-}type$$
where $O.s\text{-}type$ represents the character-type data set of the lost circulation data, q represents the character-type data in a lost circulation data object, g(·) represents the random number generation function, r represents the Jaccard distance sensitivity range of the character-type data, $sp_1$ represents the lower limit of the character-type data similarity probability, and $sp_2$ represents the upper limit of the character-type data similarity probability.
6. The data mining-based fractured leakage particle-based plugging formulation recommendation method according to claim 4, wherein the step S24 comprises the following sub-steps:
S241: randomly selecting $k_1$ and $k_2$ hash functions from the LSH function family of the numerical data and the LSH function family of the character-type data, respectively, to form the binary mixed index LSH function family G of the lost circulation data, expressed as:
G=g(o)
Figure FDA0003614117090000034
where $h_1(\cdot), \ldots, h_{k_1}(\cdot)$ represents the set of LSH functions for the numerical data, $h_1(\cdot), \ldots, h_{k_2}(\cdot)$ represents the set of LSH functions for the character-type data, g(o) represents the binary mixed index LSH function set of the lost circulation data, $o.f\text{-}type$ represents the numerical data of lost circulation data object o, and $O.s\text{-}type$ represents the character-type data set of the lost circulation data;
S242: randomly selecting p lost circulation data hash functions $g_1, \ldots, g_p$ from the binary mixed index LSH function family G, and storing the hash values produced by the p hash functions in the corresponding hash buckets, where the number p of selected hash functions is calculated as:
Figure FDA0003614117090000041
where M represents the total number of lost circulation data samples, $fp_1$ represents the lower limit of the numerical data similarity probability, $fp_2$ represents the upper limit of the numerical data similarity probability, $sp_1$ represents the lower limit of the character-type data similarity probability, and $sp_2$ represents the upper limit of the character-type data similarity probability;
S243: in a hash bucket, taking the lost circulation data that satisfy $distE(o_i, q) < cd$, $distJ(o_i, q) < cr$ and $dist(o_i, q) < \varepsilon$ as the candidate lost circulation data, wherein $o_i$ represents arbitrary lost circulation data, q represents a new lost circulation field query object, distE represents the similarity of the numerical data, distJ represents the similarity of the character-type data, dist represents the overall similarity of the lost circulation data, c represents the approximation factor of the lost circulation data, d represents the Euclidean distance sensitivity range of the numerical data, r represents the Jaccard distance sensitivity range of the character-type data, and $\varepsilon$ represents the overall similarity threshold of the lost circulation data.
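The filtering in step S243 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the record layout (`num`, `cat`), the function names, the sample values, and in particular the overall measure `dist` (taken here as a weighted sum with a hypothetical weight `w`) are assumptions, since the claim does not define `dist` in this passage.

```python
import math

def dist_e(a, b):
    """Euclidean distance between two numerical feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dist_j(a, b):
    """Jaccard distance between two sets of character-type attributes."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def filter_candidates(bucket, query, c, d, r, eps, w=0.5):
    """Keep the records of one hash bucket that pass all three S243 tests.

    ASSUMPTION: the overall measure is a weighted sum of the two distances;
    the weight w is hypothetical and not taken from the claims.
    """
    kept = []
    for rec in bucket:
        de = dist_e(rec["num"], query["num"])
        dj = dist_j(rec["cat"], query["cat"])
        overall = w * de + (1.0 - w) * dj
        if de < c * d and dj < c * r and overall < eps:
            kept.append(rec)
    return kept

# Hypothetical bucket contents and query object.
bucket = [
    {"id": "A", "num": [1.0, 2.0], "cat": {"limestone", "fractured"}},
    {"id": "B", "num": [5.0, 9.0], "cat": {"sandstone", "porous"}},
]
query = {"num": [1.0, 2.5], "cat": {"limestone", "fractured"}}
candidates = filter_candidates(bucket, query, c=2.0, d=1.0, r=0.3, eps=1.0)
```

Record A lies within all three thresholds and is kept, while record B fails the Euclidean test and is dropped.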
7. The data mining-based fractured leakage particle-based plugging formulation recommendation method according to claim 1, wherein the step S3 comprises the following sub-steps:
S31: calculating the discrete cumulative particle size distribution of the plugging formula data in the candidate lost circulation data;
S32: calculating the continuous cumulative particle size distribution of the plugging formula data by interpolation from the discrete cumulative particle size distribution of the plugging formula data;
S33: according to the continuous cumulative particle size distribution of the plugging formula data, taking the minimum particle size in the plugging formula data as the initial value, iteratively calculating the cumulative particle size of the plugging formula data at the current particle size until the maximum particle size is reached, and determining the key particle sizes of the plugging formula from the cumulative particle size at each current particle size;
S34: performing K-means clustering analysis on a plugging formula parameter sample set consisting of the key particle sizes of the plugging formulas and the corresponding formula concentrations, to obtain the cluster center point of the cluster into which each formula parameter sample is divided;
S35: determining a recommended formula set according to the cluster center points of the clusters into which the formula parameter samples are divided.
8. The data mining-based fractured leakage particle-based plugging formulation recommendation method according to claim 7, wherein in the step S31, the discrete cumulative particle size $y_{l+1}$ corresponding to the (l+1)-th particle size value in the plugging formula data is calculated as:
$y_{l+1} = y_l + \eta_{l+1}, \quad 0 \le l \le Q$
wherein $y_l$ represents the cumulative particle size corresponding to the l-th particle size value, $\eta_{l+1}$ represents the composition granularity corresponding to the (l+1)-th particle size value, and Q represents the total number of composition granularity distribution intervals;
in the step S32, the continuous cumulative particle size distribution $H_3(x)$ of the plugging formula data is calculated as:
Figure FDA0003614117090000042
wherein x represents the particle size of the point to be interpolated, $x_l$ represents the particle size value at the left end point of the point to be interpolated, $x_{l+1}$ represents the particle size value at the right end point of the point to be interpolated, $y_l$ represents the cumulative distribution corresponding to the left end point particle size value, $y_{l+1}$ represents the cumulative distribution corresponding to the right end point particle size value, $y_l'$ represents the cumulative distribution derivative corresponding to the left end point particle size value, and $y_{l+1}'$ represents the cumulative distribution derivative corresponding to the right end point particle size value;
in step S33, the calculation formula of the cumulative particle size gr of the plugging formula data at the current particle size is:
Figure FDA0003614117090000051
wherein n represents the total number of plugging formula materials, $\varepsilon_i$ represents the concentration of the i-th material, $y_{i,l-1}$ represents the cumulative particle size corresponding to the (l-1)-th particle size value of the i-th material, $y_{il}$ represents the cumulative particle size corresponding to the l-th particle size value of the i-th material, $\rho_i$ represents the density of the i-th material, and V represents the total volume of the formula;
in the step S33, the key particle sizes of the plugging formula are determined as follows: if the cumulative particle size gr of the plugging formula data at the current particle size reaches 10%, the current particle size is the key particle size D10 of the plugging formula; if gr reaches 50%, the current particle size is the key particle size D50 of the plugging formula; and if gr reaches 90%, the current particle size is the key particle size D90 of the plugging formula.
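Steps S31 to S33 can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed formulas: the interpolant is the standard two-point cubic Hermite form, which matches the symbols listed for $H_3(x)$ but is not copied from the figure; the derivatives are estimated by finite differences (the claim does not say how they are obtained); a single cumulative curve stands in for the multi-material blend gr; and all names and sample values are hypothetical.

```python
def cumulative(fractions):
    """S31: discrete cumulative distribution, y_{l+1} = y_l + eta_{l+1}."""
    ys, total = [], 0.0
    for eta in fractions:
        total += eta
        ys.append(total)
    return ys

def hermite(x, x0, x1, y0, y1, dy0, dy1):
    """S32: standard two-point cubic Hermite interpolant on [x0, x1]."""
    h = x1 - x0
    t = (x - x0) / h
    h00 = (1 + 2 * t) * (1 - t) ** 2
    h10 = t * (1 - t) ** 2
    h01 = t * t * (3 - 2 * t)
    h11 = t * t * (t - 1)
    return h00 * y0 + h10 * h * dy0 + h01 * y1 + h11 * h * dy1

def slopes(xs, ys):
    """ASSUMPTION: node derivatives estimated by finite differences."""
    n = len(xs)
    out = []
    for l in range(n):
        lo, hi = max(l - 1, 0), min(l + 1, n - 1)
        out.append((ys[hi] - ys[lo]) / (xs[hi] - xs[lo]))
    return out

def cdf(x, xs, ys, dys):
    """Continuous cumulative distribution evaluated at particle size x."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for l in range(len(xs) - 1):
        if xs[l] <= x <= xs[l + 1]:
            return hermite(x, xs[l], xs[l + 1], ys[l], ys[l + 1], dys[l], dys[l + 1])

def key_sizes(xs, ys, step=1.0, targets=(0.10, 0.50, 0.90)):
    """S33: scan from the minimum to the maximum size and record the first
    size at which the cumulative value reaches each target fraction."""
    dys = slopes(xs, ys)
    found = {}
    x = xs[0]
    while x <= xs[-1]:
        g = cdf(x, xs, ys, dys)
        for t in targets:
            if t not in found and g + 1e-9 >= t:  # small tolerance for rounding
                found[t] = x
        x += step
    return found

sizes = [0.0, 100.0, 200.0, 300.0, 400.0]   # hypothetical particle sizes
fractions = [0.0, 0.25, 0.25, 0.25, 0.25]   # hypothetical fraction per size class
ys = cumulative(fractions)
found = key_sizes(sizes, ys)                # maps 0.10/0.50/0.90 to D10/D50/D90
```

The scan step controls the resolution of the key particle sizes; a finer step gives closer estimates at the cost of more evaluations.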
9. The data mining-based fractured leakage particle-based plugging formulation recommendation method according to claim 7, wherein the step S34 comprises the following sub-steps:
S341: in the plugging formula parameter sample set E, calculating the distance between each formula parameter sample and each initial formula parameter mean vector, and determining the nearest distance, wherein the distance $d_{ji}$ between a formula parameter sample and a formula parameter mean vector is calculated as:
$d_{ji} = \lVert e_j - \mu_i \rVert_2$
wherein $e_j$ represents a formula parameter sample, and $\mu_i$ represents an initial formula parameter mean vector;
S342: determining the cluster mark of each formula parameter sample according to the nearest distance, and dividing each formula parameter sample into the corresponding cluster according to its cluster mark, wherein the cluster mark i of a formula parameter sample and the updated cluster $C_{new\,i}$ are calculated respectively as:
$i = \arg\min d_{ji}$
$C_{new\,i} = C_i \cup \{e_j\}$;
wherein $C_i$ represents the original formula parameter sample cluster;
S343: calculating the formula parameter mean vector of each cluster after clustering; when the formula parameter mean vectors before and after clustering are inconsistent, updating the formula parameter mean vector and performing the cluster division again; and determining the cluster center point of the cluster into which each formula parameter sample is divided, wherein the formula parameter mean vector $\mu'_i$ after clustering is calculated as:
Figure FDA0003614117090000061
where e represents a sample in the formula parameter sample cluster.
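The loop of steps S341 to S343 corresponds to the usual K-means iteration, which can be sketched as below. The mean update uses the standard arithmetic mean of each cluster, which is assumed to be what the figure for $\mu'_i$ shows; the sample values (a key particle size paired with a concentration) and the initial mean vectors are hypothetical.

```python
import math

def kmeans(samples, init_means, max_iter=100):
    """Assign each sample to its nearest mean vector, recompute the means,
    and repeat until the means stop changing."""
    means = [list(m) for m in init_means]
    clusters = [[] for _ in means]
    for _ in range(max_iter):
        clusters = [[] for _ in means]
        for e in samples:
            # S341: d_ji = ||e_j - mu_i||_2 for every mean vector
            dists = [math.dist(e, mu) for mu in means]
            # S342: cluster mark i = arg min d_ji, then C_i gains e_j
            i = dists.index(min(dists))
            clusters[i].append(e)
        # S343: new mean of each cluster (an empty cluster keeps its mean)
        new_means = []
        for mu, members in zip(means, clusters):
            if members:
                new_means.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_means.append(mu)
        if new_means == means:
            break
        means = new_means
    return means, clusters

# Hypothetical samples: (key particle size, formula concentration)
samples = [(100.0, 5.0), (110.0, 5.0), (500.0, 10.0), (520.0, 10.0)]
centers, clusters = kmeans(samples, init_means=[(100.0, 5.0), (500.0, 10.0)])
```

Because particle sizes and concentrations have different scales, a practical use would probably normalize the features first; the claims do not mention scaling, so the sketch leaves the values as given.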
10. The data mining-based fractured leakage particle-based plugging formulation recommendation method according to claim 1, wherein the step S35 comprises the following sub-steps:
S351: setting the number of random sampling times for the recommended formula;
S352: randomly adding the on-site plugging materials to obtain a plugging formula;
S353: judging whether the current plugging formula has already been recommended; if so, returning to the step S352; otherwise, entering the step S354;
S354: judging, according to the cluster center points of the clusters into which the formula parameter samples are divided, whether the particle size of the current plugging formula meets the requirement of the recommended formula set; if so, adding the current plugging formula to the recommended formula set; otherwise, returning to the step S352; and repeating until the set number of random sampling times is reached, thereby determining the final recommended formula set;
in the step S351, the random addition amount of each formula parameter sample
Figure FDA0003614117090000062
and the total concentration S of the plugging formula satisfy
Figure FDA0003614117090000063
wherein m represents the number of randomly selected plugging materials available on site, and
Figure FDA0003614117090000064
represents the addition amount of the i-th plugging material;
in the step S354, for the particle size D10 of the current plugging formula, the requirement of the recommended formula set is
Figure FDA0003614117090000065
for the particle size D50 of the current plugging formula, the requirement of the recommended formula set is
Figure FDA0003614117090000066
and for the particle size D90 of the current plugging formula, the requirement of the recommended formula set is
Figure FDA0003614117090000067
wherein $D10_{re}$ represents the particle size D10 parameter value of the recommended plugging formula, $D10_{acc}$ represents the particle size D10 parameter value of the plugging formula at the cluster center point, $D50_{re}$ represents the particle size D50 parameter value of the recommended plugging formula, $D50_{acc}$ represents the particle size D50 parameter value of the plugging formula at the cluster center point, $D90_{re}$ represents the particle size D90 parameter value of the recommended plugging formula, and $D90_{acc}$ represents the particle size D90 parameter value of the plugging formula at the cluster center point.
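The sampling loop of steps S351 to S354 might be sketched as follows. Several parts are assumptions, because the corresponding formulas appear only as figures: the additions are scaled so that they sum to the total concentration, the acceptance test is a relative tolerance `tol` around a cluster center, and `key_sizes_of` is a crude concentration-weighted stand-in for the particle size calculation. The material names and their sizes are hypothetical.

```python
import random

# Hypothetical on-site materials with nominal (D10, D50, D90) values.
MATERIALS = {
    "walnut_shell": (200.0, 600.0, 1200.0),
    "calcite": (50.0, 150.0, 400.0),
    "fiber": (100.0, 300.0, 800.0),
}

def key_sizes_of(formula):
    """ASSUMPTION: concentration-weighted average of nominal sizes, used only
    as a placeholder for the blend particle size calculation."""
    total = sum(conc for _, conc in formula)
    return tuple(
        sum(conc * MATERIALS[name][k] for name, conc in formula) / total
        for k in range(3)
    )

def recommend(materials, centers, n_samples, total_conc, tol=0.2, seed=0):
    rng = random.Random(seed)
    recommended = []
    for _ in range(n_samples):                       # S351: sampling budget
        m = rng.randint(1, len(materials))           # S352: pick m materials
        chosen = rng.sample(materials, m)
        weights = [rng.uniform(0.1, 1.0) for _ in chosen]
        scale = total_conc / sum(weights)            # assumed: additions sum to S
        formula = tuple(sorted((n, w * scale) for n, w in zip(chosen, weights)))
        if formula in recommended:                   # S353: already recommended
            continue
        d = key_sizes_of(formula)
        # S354 (assumed criterion): every key size within tol of a cluster center
        if any(all(abs(a - b) <= tol * b for a, b in zip(d, c)) for c in centers):
            recommended.append(formula)
    return recommended

centers = [(100.0, 300.0, 800.0)]                    # hypothetical cluster center
result = recommend(list(MATERIALS), centers, n_samples=200, total_conc=10.0)
```

A fixed seed makes the sampling repeatable, which helps when comparing the recommended set across different tolerance values.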
CN202210441433.5A 2022-04-25 2022-04-25 Data mining-based fractured leakage particle-based plugging formula recommendation method Active CN114970676B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210441433.5A CN114970676B (en) 2022-04-25 2022-04-25 Data mining-based fractured leakage particle-based plugging formula recommendation method

Publications (2)

Publication Number Publication Date
CN114970676A true CN114970676A (en) 2022-08-30
CN114970676B CN114970676B (en) 2023-02-24

Family

ID=82979010

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210441433.5A Active CN114970676B (en) 2022-04-25 2022-04-25 Data mining-based fractured leakage particle-based plugging formula recommendation method

Country Status (1)

Country Link
CN (1) CN114970676B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116146190A (en) * 2023-02-24 2023-05-23 西南石油大学 Underground leakage or overflow early warning device and method based on bidirectional flow measurement

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102061151A (en) * 2009-11-13 2011-05-18 中国石油化工股份有限公司 Plugging bearing material for petroleum drilling, and preparation method and application thereof
CN104121014A (en) * 2014-06-16 2014-10-29 西南石油大学 Method for diagnosing type of leakage of drilled well based on neural network fusion technique
US20160333247A1 (en) * 2014-02-18 2016-11-17 Halliburton Energy Services, Inc. Multi-modal particle size distribution lost circulation material
CN107506480A (en) * 2017-09-13 2017-12-22 浙江工业大学 A kind of excavated based on comment recommends method with the double-deck graph structure of Density Clustering
CN108868687A (en) * 2017-05-15 2018-11-23 中国石油化工股份有限公司 A kind of method of leak-proof leak-stopping
CN111738620A (en) * 2020-07-17 2020-10-02 西南石油大学 Well leakage risk prediction and leakage stoppage decision system and method based on association rules
CN112500042A (en) * 2020-12-02 2021-03-16 中联煤层气有限责任公司 Elastic-toughness well cementation cement slurry suitable for coal bed gas and preparation method thereof
CN113051469A (en) * 2021-03-05 2021-06-29 广东工业大学 Subject selection recommendation method based on K-clustering algorithm
CN113111586A (en) * 2021-04-19 2021-07-13 西南石油大学 Drilling well plugging formula prediction method based on neural network
CN113936750A (en) * 2021-11-02 2022-01-14 天津渤海中联石油科技有限公司 Method and equipment for optimizing proportion of plugging material

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
GUI WANG,ET AL: "Discrete Element Simulation of Granular Lost Circulation Material Plugging a Fracture", 《PARTICULATE SCIENCE AND TECHNOLOGY: AN INTERNATIONAL JOURNAL》 *
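Yuan Jinbiao et al.: "Lost circulation control technology for shale gas oil-based drilling fluid and its application in the Changning block", 《Drilling & Production Technology》 *
Deng Zhengqiang et al.: "Research and application of an intelligent decision-support platform for lost circulation prevention and control in the Sichuan-Chongqing area", 《Oil Drilling & Production Technology》 *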

Also Published As

Publication number Publication date
CN114970676B (en) 2023-02-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant