CN114970676A

CN114970676A - Data mining-based fractured leakage particle-based plugging formula recommendation method

Info

Publication number: CN114970676A
Application number: CN202210441433.5A
Authority: CN
Inventors: 王贵; 何杰; 徐生江; 曹成
Original assignee: Southwest Petroleum University
Current assignee: Southwest Petroleum University
Priority date: 2022-04-25
Filing date: 2022-04-25
Publication date: 2022-08-30
Anticipated expiration: 2042-04-25
Also published as: CN114970676B

Abstract

The invention discloses a data mining-based method for recommending particle-based plugging formulas for fractured lost circulation, comprising the following steps: S1: acquiring lost circulation data and preprocessing it; S2: performing similarity analysis on the preprocessed lost circulation data to obtain candidate lost circulation data; S3: performing particle size analysis and clustering on the plugging formula data in the candidate lost circulation data to determine a recommended formula set. The invention overcomes the heavy dependence of plugging operations on the experience-based judgment of engineering technicians. It mines lost circulation data through similarity analysis and a clustering algorithm in order to recommend particle-based plugging formulas for field operations in real time. This has practical significance for rapid decision-making on field lost circulation treatment, for selecting a scientific and reasonable drilling fluid plugging formula, and for improving drilling safety and the first-attempt success rate of plugging operations.

Description

Data mining-based fractured leakage particle-based plugging formula recommendation method

Technical Field

The invention belongs to the technical field of lost circulation control in petroleum drilling engineering, and in particular relates to a data mining-based method for recommending particle-based plugging formulas for fractured lost circulation.

Background

Lost circulation is a complex downhole condition in which working fluid (drilling fluid, workover fluid, cement slurry, and the like) leaks into the formation during drilling, completion, and other operations in oil and gas exploration and development. Fractured lost circulation is the most common type and a long-standing, difficult technical problem. Lost circulation not only consumes drilling time and drilling mud, but can also trigger a series of complications such as stuck pipe, blowout, and wellbore collapse, and may even lead to well abandonment and heavy economic loss, so it must be controlled effectively.

Plugging is the remedial treatment applied after lost circulation occurs and is an indispensable part of the drilling operation. Previous research on plugging has mainly focused on three areas: 1. Formation loss models: this research concentrates on the factors that influence drilling fluid loss and on the inversion of near-wellbore formation conditions, providing basic data for the selection of plugging materials. 2. Wellbore strengthening theory: to explain how plugging materials seal fracture loss channels and why the pressure-bearing capacity of the wellbore wall improves after the fractures are sealed, several theories have been proposed, such as the "Tip Screening" theory, the "Stress Cage" theory, the "Fracture Closure Stress" theory, and the "Fracture Propagation Resistance" theory. 3. Size selection criteria for plugging materials: particle size and particle size distribution are the decisive factors in the bridging and plugging behavior of particles in fractures, and only a particle system with suitable size and size distribution can bridge and plug effectively in pore throats or fractures. Particle size distribution selection criteria such as the "d1/2 rule" and the "1/3 bridging rule", based on ideal packing theory, have therefore been proposed. Although previous theoretical and experimental research is rich, it does not address similarity analysis of lost circulation data or data mining of the key parameters of plugging formulas.

Disclosure of Invention

To solve the above problems, the invention provides a data mining-based method for recommending particle-based plugging formulas for fractured lost circulation.

The technical scheme of the invention is as follows: a data mining-based fractured leakage particle-based plugging formula recommendation method comprises the following steps:

S1: acquiring lost circulation data and preprocessing it;

S2: performing similarity analysis on the preprocessed lost circulation data to obtain candidate lost circulation data;

S3: performing particle size analysis and clustering on the plugging formula data in the candidate lost circulation data to determine a recommended formula set.

Further, step S1 includes the following sub-steps:

S11: performing data cleaning on the lost circulation data to obtain optimized lost circulation data;

S12: performing feature encoding on the optimized lost circulation data to obtain encoded lost circulation data;

S13: performing data reduction on the encoded lost circulation data to complete the preprocessing of the lost circulation data.

Further, in step S11, the specific method for data cleaning is: filling in missing lost circulation data, detecting abnormal lost circulation data by a binning method, and replacing the abnormal lost circulation data;

in step S12, the character-type lost circulation data in the optimized lost circulation data is feature-encoded using One-Hot encoding;

step S13 includes the following substeps:

S131: performing mean normalization on the encoded lost circulation data and updating its lost circulation feature values, wherein the normalized j-th lost circulation feature value $x_j^{(i)}$ of the i-th encoded lost circulation data sample is calculated as:

$$x_j^{(i)} \leftarrow \frac{x_j^{(i)} - \mu_j}{s_j}, \quad 1 \le i \le M,\ 1 \le j \le N$$

wherein $x_j^{(i)}$ on the right-hand side represents the original lost circulation feature value, M represents the total number of lost circulation data samples, N represents the total number of lost circulation data features, $\mu_j$ represents the mean of the j-th lost circulation feature, and $s_j$ represents the standard deviation of the j-th lost circulation feature;

S132: calculating the covariance matrix $\Sigma$ of the encoded lost circulation data after the feature values are updated, and performing singular value decomposition on $\Sigma$ to obtain the eigenvector matrix U of the covariance matrix, wherein the covariance matrix is calculated as:

$$\Sigma = \frac{1}{M}\sum_{i=1}^{M} x^{(i)} \left(x^{(i)}\right)^{T}$$

wherein $x^{(i)}$ represents the feature vector of the i-th lost circulation data sample;

S133: performing singular value decomposition on the covariance matrix $\Sigma$ to obtain its eigenvector matrix U and a square matrix S, and determining the reduced dimension of the lost circulation data according to the square matrix S;

S134: performing dimension reduction according to the eigenvector matrix U of the covariance matrix, determining the lost circulation feature dimension of the encoded lost circulation data after dimension reduction, and calculating the reduced lost circulation feature vectors to complete the data reduction, wherein the reduced lost circulation feature vector $z^{(i)}$ of the i-th encoded lost circulation data sample is calculated as:

$$z^{(i)} = U_{reduce}^{T}\, x^{(i)}$$

wherein $U_{reduce}$ represents the lost circulation data dimension reduction matrix;

in step S134, the lost circulation feature dimension k of the encoded lost circulation data after dimension reduction satisfies

$$1 - \frac{\sum_{i=1}^{k} S_{ii}}{\sum_{i=1}^{N} S_{ii}} \le 0.05$$

wherein $S_{ii}$ represents the entries on the diagonal of the square matrix S.

Further, step S2 includes the following sub-steps:

S21: calculating the similarity of the numerical data and the similarity of the character-type data in the preprocessed lost circulation data;

S22: calculating the overall similarity of the lost circulation data from the similarity of the numerical data and the similarity of the character-type data;

S23: constructing a $(d, cd, fp_1, fp_2)$-sensitive locality-sensitive hashing (LSH) function family for the numerical data and an $(r, cr, sp_1, sp_2)$-sensitive LSH function family for the character-type data, wherein c represents the approximation factor of the lost circulation data, d represents the Euclidean distance sensitivity range of the numerical data, $fp_1$ represents the lower limit of the numerical data similarity probability, $fp_2$ represents the upper limit of the numerical data similarity probability, r represents the Jaccard distance sensitivity range of the character-type data, $sp_1$ represents the lower limit of the character-type data similarity probability, and $sp_2$ represents the upper limit of the character-type data similarity probability;

S24: constructing a binary hybrid index of the lost circulation data from the overall similarity of the lost circulation data, the LSH function family of the numerical data, and the LSH function family of the character-type data, to obtain the candidate lost circulation data.

Further, in step S21, the similarity distE of the numerical data is calculated as:

$$distE(o_1, o_2) = \frac{EuclideanDist(o_1.f\text{-}type,\ o_2.f\text{-}type)}{d_{max}}$$

wherein EuclideanDist(·) represents the function that calculates the Euclidean distance between lost circulation data objects $o_1$ and $o_2$, $o_1.f\text{-}type$ represents the numerical data of lost circulation data object $o_1$, $o_2.f\text{-}type$ represents the numerical data of lost circulation data object $o_2$, and $d_{max}$ represents the maximum distance between the numerical data features of $o_1$ and $o_2$;

in step S21, the similarity distJ of the character-type data is calculated as:

$$distJ(o_1, o_2) = 1 - \frac{\left|o_1.s\text{-}type \cap o_2.s\text{-}type\right|}{\left|o_1.s\text{-}type \cup o_2.s\text{-}type\right|}$$

wherein $o_1.s\text{-}type$ represents the character-type data of lost circulation data object $o_1$, and $o_2.s\text{-}type$ represents the character-type data of lost circulation data object $o_2$;

in step S22, the calculation formula of the overall similarity dist of the lost circulation data is:

dist＝α×distE+(1-α)×distJ

wherein α represents a weighting parameter for both types of lost circulation data;

in step S23, the $(d, cd, fp_1, fp_2)$-sensitive LSH function family h(o.f-type) of the numerical data is expressed as:

$$h(o.f\text{-}type) = \left\lfloor \frac{a \cdot o.f\text{-}type + b}{W} \right\rfloor$$

$$fp_1 = \int_0^{W} \frac{1}{d}\, f_2\!\left(\frac{t}{d}\right)\left(1 - \frac{t}{W}\right) dt, \qquad fp_2 = \int_0^{W} \frac{1}{cd}\, f_2\!\left(\frac{t}{cd}\right)\left(1 - \frac{t}{W}\right) dt$$

wherein a represents a randomly generated d-dimensional vector, b represents a randomly generated real number in (0, W), c represents the approximation factor of the lost circulation data, d represents the Euclidean distance sensitivity range of the numerical data, W represents a constant, o.f-type represents the numerical data of lost circulation data object o, t represents the integration variable, $fp_1$ represents the lower limit of the numerical data similarity probability, $fp_2$ represents the upper limit of the numerical data similarity probability, and $f_2(\cdot)$ represents the standard normal probability density function;

in step S23, the $(r, cr, sp_1, sp_2)$-sensitive LSH function family h(O.s-type) of the character-type data is expressed as:

$$h(O.s\text{-}type) = \arg\min_{q \in O.s\text{-}type} g(q)$$

wherein O.s-type represents the lost circulation character-type data set, q represents character-type data in a lost circulation data object, g(·) represents a random number generation function, r represents the Jaccard distance sensitivity range of the character-type data, $sp_1$ represents the lower limit of the character-type data similarity probability, and $sp_2$ represents the upper limit of the character-type data similarity probability.

Further, step S24 includes the following sub-steps:

S241: randomly selecting $k_1$ hash functions from the LSH function family of the numerical data and $k_2$ hash functions from the LSH function family of the character-type data to form the binary hybrid index LSH function family G of the lost circulation data, expressed as:

$$G = g(o) = \left(h_1(o.f\text{-}type), \ldots, h_{k_1}(o.f\text{-}type),\ h_1(O.s\text{-}type), \ldots, h_{k_2}(O.s\text{-}type)\right)$$

wherein $h_1(\cdot), \ldots, h_{k_1}(\cdot)$ represents the set of LSH functions for the numerical data, $h_1(\cdot), \ldots, h_{k_2}(\cdot)$ represents the set of LSH functions for the character-type data, g(o) represents the binary hybrid index LSH function set of the lost circulation data, o.f-type represents the numerical data of lost circulation data object o, and O.s-type represents the lost circulation character-type data set;

S242: randomly selecting p lost circulation data hash functions $g_1, \ldots, g_p$ from the binary hybrid index LSH function family G, and storing the hash values produced by the p hash functions in the corresponding hash buckets, wherein the number p of selected hash functions is determined from the following quantities: M represents the total number of lost circulation data samples, $fp_1$ represents the lower limit of the numerical data similarity probability, $fp_2$ represents the upper limit of the numerical data similarity probability, $sp_1$ represents the lower limit of the character-type data similarity probability, and $sp_2$ represents the upper limit of the character-type data similarity probability;

S243: within a hash bucket, taking the lost circulation data that satisfies $distE(o_i, q) < cd$, $distJ(o_i, q) < cr$, and $dist(o_i, q) < \varepsilon$ as the candidate lost circulation data, wherein $o_i$ represents any lost circulation data record, q represents a new field lost circulation query object, distE represents the similarity of the numerical data, distJ represents the similarity of the character-type data, c represents the approximation factor of the lost circulation data, d represents the Euclidean distance sensitivity range of the numerical data, r represents the Jaccard distance sensitivity range of the character-type data, and $\varepsilon$ represents the overall similarity threshold of the lost circulation data.
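Steps S241 to S243 can be illustrated with a minimal Python sketch. It is not the patented implementation: the record layout (`"f"` for numerical features, `"s"` for a non-empty set of character-type tokens), the class name `HybridIndex`, the default values of `k1`, `k2`, `p`, and `W`, and the use of a salted BLAKE2 digest as the random function g(·) are all assumptions made for illustration. Each composite function g concatenates `k1` numerical hashes and `k2` character-type hashes, p such functions define p hash tables, and a query is filtered with the three thresholds of step S243.

```python
import hashlib
import math
import random
from collections import defaultdict


def dist_e(f1, f2, d_max):
    """Euclidean distance between numerical features, scaled by d_max."""
    return math.dist(f1, f2) / d_max


def dist_j(s1, s2):
    """Jaccard distance between two non-empty token sets."""
    s1, s2 = set(s1), set(s2)
    return 1.0 - len(s1 & s2) / len(s1 | s2)


def make_g(dim, k1, k2, W, rng):
    """Build one composite hash g(o) from k1 numerical and k2 categorical hashes."""
    projections = [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, W))
                   for _ in range(k1)]
    salts = [rng.getrandbits(64) for _ in range(k2)]

    def rank(salt, token):
        # Assumed stand-in for the random function g(q): a salted digest.
        return hashlib.blake2b(f"{salt}:{token}".encode(), digest_size=8).digest()

    def g(record):
        numeric = tuple(
            math.floor((sum(a_i * x_i for a_i, x_i in zip(a, record["f"])) + b) / W)
            for a, b in projections
        )
        categorical = tuple(
            min(record["s"], key=lambda token: rank(salt, token)) for salt in salts
        )
        return numeric + categorical

    return g


class HybridIndex:
    """p hash tables, each keyed by one composite hash function (S241, S242)."""

    def __init__(self, records, dim, k1=2, k2=1, p=4, W=4.0, seed=0):
        rng = random.Random(seed)
        self.records = records
        self.hashers = [make_g(dim, k1, k2, W, rng) for _ in range(p)]
        self.tables = [defaultdict(list) for _ in range(p)]
        for idx, record in enumerate(records):
            for g, table in zip(self.hashers, self.tables):
                table[g(record)].append(idx)

    def candidates(self, q, c, d, r, eps, alpha, d_max):
        """Collect bucket hits for q, then apply the three S243 thresholds."""
        bucket_hits = set()
        for g, table in zip(self.hashers, self.tables):
            bucket_hits.update(table.get(g(q), []))
        kept = []
        for idx in sorted(bucket_hits):
            o = self.records[idx]
            de = dist_e(o["f"], q["f"], d_max)
            dj = dist_j(o["s"], q["s"])
            if de < c * d and dj < c * r and alpha * de + (1 - alpha) * dj < eps:
                kept.append(idx)
        return kept
```

A record identical to the query always lands in the same buckets and passes all three thresholds, whereas a distant record is rejected by the distance checks even if it happens to collide in a bucket.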

Further, step S3 includes the following sub-steps:

S31: calculating the discrete cumulative particle size distribution of the plugging formula data in the candidate lost circulation data;

S32: calculating the continuous cumulative particle size distribution of the plugging formula data by interpolation from the discrete cumulative particle size distribution;

S33: based on the continuous cumulative particle size distribution and starting from the minimum particle size in the plugging formula data, iteratively calculating the cumulative particle size of the plugging formula data at the current particle size and determining the key particle sizes of the plugging formula from it, until the maximum particle size is reached;

S34: performing K-means cluster analysis on the plugging formula parameter sample set, which consists of the key particle sizes of each plugging formula and the corresponding formula concentration, to obtain the cluster center point of the cluster into which each formula parameter sample is divided;

S35: determining the recommended formula set from the cluster center points of the clusters into which the formula parameter samples are divided.

Further, in step S31, the discrete cumulative particle size $y_{l+1}$ corresponding to the (l+1)-th particle size value in the plugging formula data is calculated as:

$$y_{l+1} = y_l + \eta_{l+1}, \quad 0 \le l \le Q$$

wherein $y_l$ represents the cumulative particle size corresponding to the l-th particle size value, $\eta_{l+1}$ represents the composition particle size fraction corresponding to the (l+1)-th particle size value, and Q represents the total number of composition particle size distribution intervals;

in step S32, the continuous cumulative particle size distribution $H_3(x)$ of the plugging formula data is calculated as:

$$H_3(x) = y_l\left(1 + 2\,\frac{x - x_l}{x_{l+1} - x_l}\right)\left(\frac{x - x_{l+1}}{x_l - x_{l+1}}\right)^{2} + y_{l+1}\left(1 + 2\,\frac{x - x_{l+1}}{x_l - x_{l+1}}\right)\left(\frac{x - x_l}{x_{l+1} - x_l}\right)^{2} + y_l'\,(x - x_l)\left(\frac{x - x_{l+1}}{x_l - x_{l+1}}\right)^{2} + y_{l+1}'\,(x - x_{l+1})\left(\frac{x - x_l}{x_{l+1} - x_l}\right)^{2}$$

wherein x represents the particle size of the point to be interpolated, $x_l$ represents the particle size value at the left end point of the interval containing x, $x_{l+1}$ represents the particle size value at the right end point, $y_l$ represents the cumulative distribution corresponding to the left end point particle size value, $y_{l+1}$ represents the cumulative distribution corresponding to the right end point particle size value, $y_l'$ represents the derivative of the cumulative distribution at the left end point, and $y_{l+1}'$ represents the derivative of the cumulative distribution at the right end point;

in step S33, the cumulative particle size gr of the plugging formula data at the current particle size is calculated from the following quantities: n represents the total number of materials in the plugging formula, $\varepsilon_i$ represents the concentration of the i-th material, $y_{i,l-1}$ represents the cumulative particle size corresponding to the (l-1)-th particle size value of the i-th material, $y_{il}$ represents the cumulative particle size corresponding to the l-th particle size value of the i-th material, $\rho_i$ represents the density of the i-th material, and V represents the total volume of the formula;

in step S33, the key particle sizes of the plugging formula are determined as follows: if the cumulative particle size gr at the current particle size reaches 10%, the current particle size is the D10 of the plugging formula; if gr reaches 50%, the current particle size is the D50; if gr reaches 90%, the current particle size is the D90.
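Steps S31 to S33 can be sketched for a single material as follows. This is an illustration under stated assumptions, not the patented procedure: the multi-material blending formula for gr is not reproduced here, the endpoint derivatives are estimated with `numpy.gradient` (the source does not say how they are obtained), and the function names and the scan resolution `n_steps` are hypothetical.

```python
import numpy as np


def cumulative_distribution(fractions):
    """S31: y_0 = 0 and y_{l+1} = y_l + eta_{l+1}."""
    return np.concatenate([[0.0], np.cumsum(fractions)])


def hermite(x, xs, ys, dys):
    """S32: two-point cubic Hermite interpolation on the interval containing x."""
    l = int(np.clip(np.searchsorted(xs, x, side="right") - 1, 0, len(xs) - 2))
    x0, x1 = xs[l], xs[l + 1]
    t0 = (x - x0) / (x1 - x0)   # (x - x_l) / (x_{l+1} - x_l)
    t1 = (x - x1) / (x0 - x1)   # (x - x_{l+1}) / (x_l - x_{l+1})
    return (ys[l] * (1 + 2 * t0) * t1 ** 2
            + ys[l + 1] * (1 + 2 * t1) * t0 ** 2
            + dys[l] * (x - x0) * t1 ** 2
            + dys[l + 1] * (x - x1) * t0 ** 2)


def key_sizes(edges, fractions, targets=(0.10, 0.50, 0.90), n_steps=2000):
    """S33: scan from the smallest to the largest size and record D10/D50/D90."""
    xs = np.asarray(edges, dtype=float)          # Q + 1 particle size values
    ys = cumulative_distribution(fractions)      # Q + 1 cumulative values
    dys = np.gradient(ys, xs)                    # assumed derivative estimate
    found = {}
    for x in np.linspace(xs[0], xs[-1], n_steps + 1):
        gr = hermite(x, xs, ys, dys)
        for target in targets:
            name = f"D{round(target * 100)}"
            if name not in found and gr >= target:
                found[name] = float(x)
    return found


# Hypothetical material: four equal fractions between 0 and 4 mm.
print(key_sizes([0.0, 1.0, 2.0, 3.0, 4.0], [0.25, 0.25, 0.25, 0.25]))
```

For this uniform example the cumulative curve is a straight line, so the scan should report values close to 0.4 mm, 2.0 mm, and 3.6 mm, accurate to about one scan step.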

Further, step S34 includes the following sub-steps:

S341: in the plugging formula parameter sample set E, calculating the distance between each formula parameter sample and each initial formula parameter mean vector and determining the nearest distance, wherein the distance $d_{ji}$ between a formula parameter sample and a formula parameter mean vector is calculated as:

$$d_{ji} = \lVert e_j - \mu_i \rVert_2$$

wherein $e_j$ represents a formula parameter sample and $\mu_i$ represents an initial formula parameter mean vector;

S342: determining the cluster label of each formula parameter sample according to the nearest distance and assigning each formula parameter sample to the corresponding cluster according to its cluster label, wherein the cluster label i of a formula parameter sample and the updated cluster $C_{new,i}$ are calculated respectively as:

$$i = \arg\min_{i} d_{ji}$$

$$C_{new,i} = C_i \cup \{e_j\}$$

wherein $C_i$ represents the original cluster division of the formula parameter samples;

S343: calculating the formula parameter mean vector of each cluster after clustering; when the mean vectors before and after clustering are inconsistent, updating the mean vectors and performing the cluster division again; otherwise determining the cluster center point of the cluster into which each formula parameter sample is divided, wherein the formula parameter mean vector $\mu'_i$ after clustering is calculated as:

$$\mu'_i = \frac{1}{\left|C_i\right|}\sum_{e \in C_i} e$$

where e represents a sample in the formula parameter sample cluster.
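The loop of steps S341 to S343 is the standard K-means iteration and can be sketched as below. The function name `k_means`, the random initialization from the samples, the iteration cap, and the example values are assumptions for illustration; each sample is taken to be a vector (D10, D50, D90, concentration).

```python
import numpy as np


def k_means(E, k, n_iter=100, seed=0):
    """Cluster formula parameter samples; returns (centers, labels)."""
    E = np.asarray(E, dtype=float)
    rng = np.random.default_rng(seed)
    # Initial mean vectors: k distinct samples chosen at random (assumption).
    mu = E[rng.choice(len(E), size=k, replace=False)]
    for _ in range(n_iter):
        # S341: d_ji = ||e_j - mu_i||_2 for every sample j and center i.
        d = np.linalg.norm(E[:, None, :] - mu[None, :, :], axis=2)
        # S342: the cluster label is the index of the nearest mean vector.
        labels = d.argmin(axis=1)
        # S343: recompute each mean vector; keep the old one for an empty cluster.
        new_mu = np.array([
            E[labels == i].mean(axis=0) if np.any(labels == i) else mu[i]
            for i in range(k)
        ])
        if np.allclose(new_mu, mu):
            break
        mu = new_mu
    return mu, labels


# Hypothetical samples: (D10 mm, D50 mm, D90 mm, concentration %).
samples = [
    [0.10, 0.50, 1.00, 5.0],
    [0.12, 0.52, 1.10, 5.2],
    [1.00, 2.00, 4.00, 15.0],
    [1.10, 2.10, 4.20, 15.5],
]
centers, labels = k_means(samples, k=2)
print(centers)
```

Production code would more likely rely on a library implementation with several restarts; the explicit loop is kept here only to mirror the three sub-steps.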

Further, step S35 includes the following sub-steps:

S351: setting the number of random samplings for the recommended formula;

S352: randomly combining the plugging materials available on site to obtain a plugging formula;

S353: judging whether the current plugging formula has already been recommended; if so, returning to step S352, otherwise proceeding to step S354;

S354: judging, according to the cluster center points of the clusters into which the formula parameter samples are divided, whether the particle sizes of the current plugging formula meet the requirement of the recommended formula set; if so, adding the current plugging formula to the recommended formula set, otherwise returning to step S352; repeating until the set number of random samplings is reached, and determining the final recommended formula set;

in step S351, the random addition amount of each plugging material and the total concentration S of the plugging formula satisfy a constraint relation expressed in terms of m, the number of plugging materials randomly selected from those available on site, and the addition amount of the i-th plugging material;

in step S354, the requirement of the recommended formula set relates the particle size parameter of the current plugging formula to that of the cluster center point, and is stated separately for the particle sizes D10, D50, and D90, wherein $D10_{re}$ represents the D10 particle size parameter value of the recommended plugging formula, $D10_{acc}$ represents the D10 particle size parameter value of the plugging formula at the cluster center point, $D50_{re}$ represents the D50 particle size parameter value of the recommended plugging formula, $D50_{acc}$ represents the D50 particle size parameter value of the plugging formula at the cluster center point, $D90_{re}$ represents the D90 particle size parameter value of the recommended plugging formula, and $D90_{acc}$ represents the D90 particle size parameter value of the plugging formula at the cluster center point.
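The sampling loop of steps S351 to S354 can be sketched as follows. Several parts are assumptions rather than details taken from the text: the total-concentration constraint is modeled as a simple upper bound `max_total`, the acceptance test is modeled as a relative tolerance `rel_tol` around a cluster center, and `key_sizes` is a hypothetical caller-supplied function that returns the D10, D50, and D90 of a candidate formula.

```python
import random


def recommend_formulas(materials, centers, key_sizes, n_samples, max_total,
                       rel_tol=0.10, seed=0):
    """Randomly sample formulas and keep those close to a cluster center."""
    rng = random.Random(seed)
    recommended = []
    for _ in range(n_samples):                       # S351: sampling budget
        # S352: pick m on-site materials and random addition amounts.
        m = rng.randint(1, len(materials))
        chosen = rng.sample(materials, m)
        total = rng.uniform(0.0, max_total)          # assumed constraint on S
        weights = [rng.random() + 1e-9 for _ in chosen]
        scale = total / sum(weights)
        formula = tuple(sorted(
            (name, round(w * scale, 2)) for name, w in zip(chosen, weights)
        ))
        # S353: skip formulas that were already recommended.
        if formula in recommended:
            continue
        # S354: compare key particle sizes with every cluster center.
        sizes = key_sizes(formula)                   # hypothetical callable
        close = any(
            all(abs(sizes[key] - center[key]) <= rel_tol * center[key]
                for key in ("D10", "D50", "D90"))
            for center in centers
        )
        if close:
            recommended.append(formula)
    return recommended
```

In practice `key_sizes` would be derived from the particle size analysis of the blended materials, and the tolerance would follow the requirement stated for the recommended formula set.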

The beneficial effects of the invention are as follows: the invention overcomes the heavy dependence of plugging operations on the experience-based judgment of engineering technicians. It mines lost circulation data through similarity analysis and a clustering algorithm in order to recommend particle-based plugging formulas for field operations in real time. This has practical significance for rapid decision-making on field lost circulation treatment, for selecting a scientific and reasonable drilling fluid plugging formula, and for improving drilling safety and the first-attempt success rate of plugging operations.

Drawings

Fig. 1 is a flow chart of the data mining-based method for recommending particle-based plugging formulas for fractured lost circulation.

Detailed Description

The embodiments of the present invention will be further described with reference to the accompanying drawings.

Before describing specific embodiments of the present invention, in order to make the solution of the present invention more clear and complete, the definitions of the abbreviations and key terms appearing in the present invention will be explained first:

Binning method: smooths the values of stored data by examining their "neighbors" (surrounding values). With equal-depth binning, every bin holds the same number of data points; with equal-width binning, every bin spans the same value interval.

One-Hot encoding: an N-bit status register is used to encode N states, each having its own independent register bit and only one of which is active at any one time.

D10: the particle size at which the cumulative particle size distribution of a sample reaches 10%. Physically, 10% of the particles are smaller than this size.

D50: the particle size at which the cumulative particle size distribution of a sample reaches 50%. Physically, 50% of the particles are larger than this size and 50% are smaller. D50 is also called the median particle size and is commonly used to represent the average particle size of the particles.

D90: the particle size at which the cumulative particle size distribution of a sample reaches 90%. Physically, 90% of the particles are smaller than this size.
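The binning method and One-Hot encoding defined above can be illustrated with a short Python sketch. It shows one common reading of binning, equal-depth bins smoothed by bin means; the function names, the sample loss-rate values, and the category order are assumptions chosen for illustration.

```python
import numpy as np


def smooth_by_bin_means(values, n_bins):
    """Equal-depth binning: sort the values, split them into n_bins bins of
    (nearly) equal count, and replace each value by the mean of its bin."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="stable")
    smoothed = np.empty_like(values)
    for bin_indices in np.array_split(order, n_bins):
        if bin_indices.size:
            smoothed[bin_indices] = values[bin_indices].mean()
    return smoothed


def one_hot(labels, vocabulary):
    """Return an (n_samples, n_categories) 0/1 matrix with one active bit per row."""
    position = {name: i for i, name in enumerate(vocabulary)}
    encoded = np.zeros((len(labels), len(vocabulary)), dtype=int)
    for row, name in enumerate(labels):
        encoded[row, position[name]] = 1
    return encoded


# Hypothetical loss-rate readings, smoothed with three equal-depth bins.
loss_rate = [4.0, 8.0, 15.0, 21.0, 21.0, 24.0, 25.0, 28.0, 34.0]
print(smooth_by_bin_means(loss_rate, 3))

# Category order chosen so that the last bit is active for sandstone.
LITHOLOGY = ["igneous rock", "conglomerate", "carbonate rock", "mudstone", "sandstone"]
print(one_hot(["sandstone", "mudstone"], LITHOLOGY))
```

Which bit corresponds to which category is only a convention; what matters is that exactly one bit is active per record.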

As shown in fig. 1, the invention provides a data mining-based method for recommending a fractured leakage particle-based leakage stoppage formula, which comprises the following steps:

In the embodiment of the invention, the required sample data characteristics are determined:

1) formation parameters: structure type, lithology, horizon, top depth, bottom depth;

2) well section parameters: diameter, well depth, well inclination;

3) drilling fluid parameters: system, inlet and outlet flow rates, density, rheological parameters, drilling fluid pit volume, solids content;

4) engineering parameters: bit diameter, hook load, rate of penetration, weight on bit, torque;

5) loss parameters: loss rate, loss volume, loss time, loss severity (minor loss, major loss, total loss of returns), operating condition at the time of loss, bit position;

6) plugging parameters: type of plugging slurry (while drilling or with drilling stopped), plugging slurry volume, plugging formula (materials, addition amounts, concentration), plugging result (success, failure, reduced loss rate);

7) plugging material parameters: type (such as flake), specification (such as 1-3 mm), manufacturer, density, composition particle size distribution.

In the embodiment of the present invention, step S1 includes the following sub-steps:

In the embodiment of the present invention, in step S11, the specific method for data cleaning is: filling in missing lost circulation data, detecting abnormal lost circulation data by a binning method, and replacing the abnormal lost circulation data;

in step S12, the character-type lost circulation data in the optimized lost circulation data is feature-encoded using One-Hot encoding; for example, the lithology categories sandstone, mudstone, carbonate rock, conglomerate, and igneous rock are encoded as 00001, 00010, 00100, 01000, and 10000, respectively;

in step S13, the encoded lost circulation data contains M records, and the whole data set is expressed as $x^{(1)}, x^{(2)}, \ldots, x^{(M)}$; each record contains N features, and each lost circulation feature value can be expressed as $x_j^{(i)}$, where $1 \le i \le M$ and $1 \le j \le N$. To eliminate irrelevant, redundant features and useless noise, the dimensionality of the lost circulation features is reduced by principal component analysis (PCA), which comprises the following sub-steps:

S131: performing mean normalization on the encoded lost circulation data and updating its lost circulation feature values, wherein the normalized j-th lost circulation feature value $x_j^{(i)}$ of the i-th encoded lost circulation data sample is calculated as:

$$x_j^{(i)} \leftarrow \frac{x_j^{(i)} - \mu_j}{s_j}, \quad 1 \le i \le M,\ 1 \le j \le N$$

wherein $x_j^{(i)}$ on the right-hand side represents the original lost circulation feature value, M represents the total number of lost circulation data samples, N represents the total number of lost circulation data features, $\mu_j$ represents the mean of the j-th lost circulation feature, and $s_j$ represents the standard deviation of the j-th lost circulation feature;

S133: performing singular value decomposition on the covariance matrix $\Sigma$ to obtain its eigenvector matrix U and a square matrix S, and determining the reduced dimension of the lost circulation data according to the square matrix S;

the singular value decomposition of the covariance matrix $\Sigma$ is: $\Sigma = U S V^{T}$, wherein S represents the diagonal matrix used to determine the reduced dimension of the lost circulation data, U represents the lost circulation data deviation matrix, and V represents the lost circulation data variance matrix;

S134: performing dimension reduction according to the eigenvector matrix U of the covariance matrix, determining the lost circulation feature dimension of the encoded lost circulation data after dimension reduction, and calculating the reduced lost circulation feature vectors to complete the data reduction, wherein the reduced lost circulation feature vector $z^{(i)}$ of the i-th encoded lost circulation data sample is calculated as:

$$z^{(i)} = U_{reduce}^{T}\, x^{(i)}$$

wherein $U_{reduce}$ represents the lost circulation data dimension reduction matrix, which consists of the first k column vectors of the matrix U;

in step S134, the lost circulation feature dimension k of the encoded lost circulation data after dimension reduction satisfies

$$1 - \frac{\sum_{i=1}^{k} S_{ii}}{\sum_{i=1}^{N} S_{ii}} \le 0.05$$

wherein $S_{ii}$ represents the entries on the diagonal of the matrix S, which are used to calculate the deviation lost through dimension reduction; a value below 5% means that 95% of the variance of the lost circulation data is retained.
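The PCA sub-steps S131 to S134 can be sketched with NumPy as follows. This is a minimal illustration, not the patented implementation: the function name `pca_reduce`, the guard for constant columns, and the synthetic example data are assumptions.

```python
import numpy as np


def pca_reduce(X, retained=0.95):
    """Reduce an (M, N) feature matrix, keeping at least `retained` variance."""
    X = np.asarray(X, dtype=float)
    # S131: mean normalization of every feature column.
    mu = X.mean(axis=0)
    s = X.std(axis=0)
    s[s == 0] = 1.0                    # assumption: leave constant columns unscaled
    Xn = (X - mu) / s
    # S132: covariance matrix Sigma = (1/M) * sum_i x_i x_i^T.
    M = Xn.shape[0]
    Sigma = Xn.T @ Xn / M
    # S133: singular value decomposition Sigma = U S V^T.
    U, S, _ = np.linalg.svd(Sigma)
    # Smallest k whose leading diagonal entries keep the requested share.
    ratio = np.cumsum(S) / S.sum()
    k = min(int(np.searchsorted(ratio, retained)) + 1, len(S))
    # S134: z_i = U_reduce^T x_i, computed for all rows at once.
    U_reduce = U[:, :k]
    Z = Xn @ U_reduce
    return Z, k, U_reduce


# Synthetic example: three perfectly correlated columns collapse to one component.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
X = np.hstack([base, 2 * base, -base])
Z, k, _ = pca_reduce(X)
print(k, Z.shape)
```

Because `numpy.linalg.svd` returns the singular values in descending order, taking the first k columns of U matches the definition of $U_{reduce}$.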

In the embodiment of the present invention, step S2 includes the following sub-steps:

The preprocessed lost circulation feature vector $z^{(i)}$ includes numerical and character-type features. Euclidean distance is used to judge the similarity of numerical features, and Jaccard distance is used to judge the similarity of character-type features. With the binary hybrid LSH algorithm, the lost circulation records most similar to a field lost circulation query can be found quickly, which provides a basis for subsequent rapid plugging decisions.

In the embodiment of the present invention, in step S21, the similarity distE of the numerical data is calculated as:

$$distE(o_1, o_2) = \frac{EuclideanDist(o_1.f\text{-}type,\ o_2.f\text{-}type)}{d_{max}}$$

wherein EuclideanDist(·) represents the function that calculates the Euclidean distance between lost circulation data objects $o_1$ and $o_2$, $o_1.f\text{-}type$ represents the numerical data of lost circulation data object $o_1$, $o_2.f\text{-}type$ represents the numerical data of lost circulation data object $o_2$, and $d_{max}$ represents the maximum distance between the numerical data features of $o_1$ and $o_2$;

in step S21, the similarity distJ of the character-type data is calculated as:

$$distJ(o_1, o_2) = 1 - \frac{\left|o_1.s\text{-}type \cap o_2.s\text{-}type\right|}{\left|o_1.s\text{-}type \cup o_2.s\text{-}type\right|}$$

wherein $o_1.s\text{-}type$ represents the character-type data of lost circulation data object $o_1$, and $o_2.s\text{-}type$ represents the character-type data of lost circulation data object $o_2$;

in step S22, for any two lost circulation data objects o ₁ ,o ₂ Linear weighted summation is adopted, and the calculation formula of the overall similarity dist of the lost circulation data is as follows:

dist＝α×distE+(1-α)×distJ
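The weighted combination above can be checked with a small Python sketch. The record layout (`"f"` for numerical features, `"s"` for a set of character-type tokens), the function names, and the example values are assumptions for illustration.

```python
import math


def dist_e(f1, f2, d_max):
    """Euclidean distance between numerical features, scaled by d_max."""
    return math.dist(f1, f2) / d_max


def dist_j(s1, s2):
    """Jaccard distance between two sets of character-type values."""
    s1, s2 = set(s1), set(s2)
    union = s1 | s2
    if not union:
        return 0.0          # assumption: two empty sets count as identical
    return 1.0 - len(s1 & s2) / len(union)


def overall_dist(o1, o2, alpha, d_max):
    """dist = alpha * distE + (1 - alpha) * distJ."""
    return (alpha * dist_e(o1["f"], o2["f"], d_max)
            + (1 - alpha) * dist_j(o1["s"], o2["s"]))


o1 = {"f": (0.0, 0.0), "s": {"sandstone", "minor loss"}}
o2 = {"f": (3.0, 4.0), "s": {"sandstone", "total loss"}}
# distE = 5 / 10 = 0.5, distJ = 1 - 1/3, so dist = 0.25 + 1/3
print(overall_dist(o1, o2, alpha=0.5, d_max=10.0))
```

A larger α makes the numerical features dominate the overall measure, while a smaller α gives more weight to the character-type features.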

where a represents a randomly generated d-dimensional vector, b represents a randomly generated real number between (0, W), c represents an approximate factor of the lost circulation data, d represents the Euclidean distance sensitivity range of the numerical data, W represents a constant, o.f-type represents the numerical data of the lost circulation data object o, t represents an integral variable, fp represents ₁ Represents the lower limit of the numerical data similarity probability fp ₂ Representing the upper limit of the probability of similarity of numerical data, f ₂ (. cndot.) represents a standard regular probability density function; fp ₁ And fp ₂ The binary mixed index is used for subsequently constructing the lost circulation data;

h(O.s-type) = arg min g(q), q ∈ O.s-type

where O.s-type denotes the lost circulation character-type data set, q denotes the character-type data in the lost circulation data object, g(·) denotes the random number generation function, r denotes the Jaccard distance sensitivity range of the character-type data, sp₁ denotes the lower limit of the character-type data similarity probability, and sp₂ denotes the upper limit of the character-type data similarity probability, with sp₁ = 1 - r and sp₂ = 1 - cr; sp₁ and sp₂ are used subsequently to construct the binary mixed index of the lost circulation data.
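The arg min construction above is the usual MinHash scheme. A minimal Python sketch follows, assuming g(·) is realised as a seeded pseudo-random value per label; the function name make_minhash and the sample labels are hypothetical.

```python
import random

def make_minhash(seed):
    """Build one hash function h(O) = arg min over q in O of g(q)."""

    def g(token):
        # Assumed realisation of g: a pseudo-random number fixed by (seed, token).
        return random.Random(f"{seed}:{token}").random()

    def h(tokens):
        return min(tokens, key=g)  # the label with the smallest random value

    return h

a = {"limestone", "fracture", "water-based"}
b = {"limestone", "fracture", "oil-based"}
hashes = [make_minhash(seed) for seed in range(200)]
# Fraction of hash functions on which the two sets collide.
agreement = sum(h(a) == h(b) for h in hashes) / len(hashes)
```

For a well-mixed g, the collision fraction approximates the Jaccard similarity of the two sets, which is why collision probabilities of the form 1 - r and 1 - cr arise for Jaccard distances r and cr.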

In the embodiment of the present invention, step S24 includes the following sub-steps:

S241: randomly selecting k₁ and k₂ hash functions from the LSH function family of numerical data and the LSH function family of character-type data, respectively, to form the lost circulation data binary mixed index LSH function family G, expressed as:

G＝g(o)

The values of k₁ and k₂ are as follows:

where h₁(·), …

denotes the set of LSH functions for numerical data, h₁(·), …,

S242: randomly selecting p lost circulation data hash functions g₁, …, g_p from the lost circulation data binary mixed index LSH function family G, and storing the hash values corresponding to the p hash functions into the corresponding hash buckets, where the number p of selected lost circulation data hash functions is calculated as:

where M denotes the total number of lost circulation data samples, fp₁ denotes the lower limit of the numerical data similarity probability, fp₂ denotes the upper limit of the numerical data similarity probability, sp₁ denotes the lower limit of the character-type data similarity probability, and sp₂ denotes the upper limit of the character-type data similarity probability;
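Steps S241 and S242 can be sketched as an index of p hash tables whose keys concatenate k₁ numerical hashes and k₂ character-type hashes. The expressions for k₁, k₂ and p are not reproduced in this text, so the sketch takes them as plain arguments; the numerical hash form floor((a·o + b) / W), all function names and the sample records are assumptions.

```python
import math
import random
from collections import defaultdict

def numeric_hash(dim, W, rng):
    """Assumed numerical LSH: floor((a . o.f + b) / W)."""
    a = [rng.gauss(0, 1) for _ in range(dim)]
    b = rng.uniform(0, W)
    return lambda o: math.floor((sum(u * v for u, v in zip(a, o["f"])) + b) / W)

def char_hash(seed):
    """Character-type LSH: arg min of a seeded random value over the labels."""
    g = lambda tok: random.Random(f"{seed}:{tok}").random()
    return lambda o: min(o["s"], key=g)

def composite_hash(k1, k2, dim, W, rng):
    """One g(o): a tuple of k1 numerical hashes followed by k2 character hashes."""
    parts = [numeric_hash(dim, W, rng) for _ in range(k1)]
    parts += [char_hash(rng.getrandbits(32)) for _ in range(k2)]
    return lambda o: tuple(h(o) for h in parts)

def build_index(data, p, k1, k2, dim, W, seed=0):
    """Store every record in one bucket per composite hash function g_1..g_p."""
    rng = random.Random(seed)
    tables = []
    for _ in range(p):
        g = composite_hash(k1, k2, dim, W, rng)
        buckets = defaultdict(list)
        for idx, o in enumerate(data):
            buckets[g(o)].append(idx)
        tables.append((g, buckets))
    return tables

def query(tables, q):
    """Collect the indices of all records sharing a bucket with q."""
    hits = set()
    for g, buckets in tables:
        hits.update(buckets.get(g(q), []))
    return hits

data = [
    {"f": [0.1, 0.2], "s": {"limestone", "fracture"}},
    {"f": [0.9, 0.8], "s": {"sandstone", "pore"}},
]
tables = build_index(data, p=4, k1=2, k2=1, dim=2, W=1.0)
hits = query(tables, data[0])
```

The records returned by query are only bucket collisions; they would still need to be checked against the distance thresholds before being kept as candidates.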

In the embodiment of the present invention, step S3 includes the following sub-steps:

The lost circulation data returned by the similarity analysis model include plugging formula data. Particle size analysis of the plugging formula data yields four main parameters: D10, D50, D90 and the formula concentration. The formula parameter set is then clustered with the K-means clustering algorithm, and the cluster center points are output.

In the embodiment of the present invention, in step S31, the discrete cumulative particle size y_{l+1} corresponding to the (l+1)-th particle size value in the plugging formula data is calculated as:

y_{l+1} = y_l + η_{l+1}, 0 ≤ l ≤ Q

y₀ = η₀

where y_l denotes the cumulative particle size corresponding to the l-th particle size value, η_{l+1} denotes the component particle size corresponding to the (l+1)-th particle size value, and Q denotes the total number of component particle size distribution intervals; η₀ denotes the starting-point component particle size, and y₀ denotes the starting-point cumulative particle size;
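The recurrence y₀ = η₀, y_{l+1} = y_l + η_{l+1} is a running sum, which can be illustrated in a few lines of Python. The percentages in eta are hypothetical sample values, not data from the invention.

```python
from itertools import accumulate

def cumulative_particle_size(eta):
    """Return y with y[0] = eta[0] and y[l + 1] = y[l] + eta[l + 1]."""
    return list(accumulate(eta))

# Hypothetical component particle size distribution, in percent per interval.
eta = [5.0, 15.0, 30.0, 30.0, 20.0]
y = cumulative_particle_size(eta)
```

With these sample values the cumulative curve ends at 100 percent, as expected when the component fractions cover the whole distribution.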

where, in addition to the total number of plugging formula materials, ε_i denotes the concentration of the i-th material, y_{i,l-1} denotes the cumulative particle size corresponding to the (l-1)-th particle size value of the i-th material, y_{i,l} denotes the cumulative particle size corresponding to the l-th particle size value of the i-th material, ρ_i denotes the density of the i-th material, and V denotes the total volume of the formula;

in step S33, the key particle sizes of the plugging formula are determined as follows: if the cumulative particle size gr of the plugging formula data at the current particle size reaches 10%, that particle size is the key particle size D10 of the plugging formula; if gr reaches 50%, that particle size is the key particle size D50; if gr reaches 90%, that particle size is the key particle size D90.
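Reading D10, D50 and D90 off a cumulative curve can be sketched as follows. The continuous cumulative distribution of step S32 is not reproduced in this text, so this sketch substitutes straight-line interpolation between tabulated points; the function name and the sample sizes are hypothetical.

```python
def key_particle_size(sizes, cumulative, target):
    """Particle size at which the cumulative curve first reaches `target` percent.

    Uses linear interpolation between neighbouring points, which is a stand-in
    for the continuous distribution built in step S32.
    """
    if target <= cumulative[0]:
        return sizes[0]
    for l in range(1, len(sizes)):
        y0, y1 = cumulative[l - 1], cumulative[l]
        if y0 < target <= y1:
            frac = (target - y0) / (y1 - y0)
            return sizes[l - 1] + frac * (sizes[l] - sizes[l - 1])
    raise ValueError("cumulative curve never reaches the target")

sizes = [100.0, 200.0, 400.0, 800.0, 1600.0]   # hypothetical sizes in micrometres
cumulative = [5.0, 20.0, 50.0, 80.0, 100.0]    # cumulative percent at each size
d10, d50, d90 = (key_particle_size(sizes, cumulative, t) for t in (10.0, 50.0, 90.0))
```

Together with the formula concentration, the three values form one four-parameter sample for the clustering step.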

In the embodiment of the present invention, step S34 includes the following sub-steps:

S341: in the plugging formula parameter sample set E, calculating the distance between each formula parameter sample and each initial formula parameter mean vector, and determining the nearest one, where the distance d_{ji} between a formula parameter sample and a formula parameter mean vector is calculated as:

d_{ji} = ||e_j - μ_i||₂

i = arg min d_{ji}

C_{new,i} = C_i ∪ {e_j};

S343: calculating the formula parameter mean vector of each cluster after clustering; when the mean vectors before and after clustering are inconsistent, updating the mean vectors and performing the cluster division again; and determining the cluster center point of each cluster, where the formula parameter mean vector μ'_i after clustering is calculated as:

where e represents a sample in the recipe parameter sample cluster.
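The assignment and update steps above can be sketched as a plain K-means loop. The mean update is assumed to be the arithmetic mean of the samples in each cluster, which is the standard K-means rule; the function name, the initial means and the sample values are hypothetical.

```python
import math

def kmeans(samples, means, max_iter=100):
    """Assign each sample to its nearest mean, then recompute the means."""
    for _ in range(max_iter):
        clusters = [[] for _ in means]
        for e in samples:
            # i = arg min over clusters of ||e - mu_i||_2
            i = min(range(len(means)), key=lambda c: math.dist(e, means[c]))
            clusters[i].append(e)
        # Assumed update: arithmetic mean of each cluster (empty clusters keep their mean).
        new_means = [
            [sum(col) / len(c) for col in zip(*c)] if c else list(m)
            for c, m in zip(clusters, means)
        ]
        if new_means == means:  # mean vectors unchanged, so clustering has converged
            break
        means = new_means
    return means, clusters

# Hypothetical samples: (D10, D50, D90, formula concentration).
samples = [
    [100.0, 400.0, 1200.0, 10.0],
    [120.0, 420.0, 1180.0, 12.0],
    [500.0, 1500.0, 3000.0, 20.0],
    [520.0, 1480.0, 3100.0, 22.0],
]
centers, clusters = kmeans(samples, [list(samples[0]), list(samples[2])])
```

The sketch works on raw values; whether the four parameters are rescaled before clustering is not stated in this text.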

In the embodiment of the present invention, step S35 includes the following sub-steps:

S351: setting the number of random sampling iterations for the recommended formula;

The four parameters of the cluster center point μ'_i output by the clustering algorithm (D10, D50, D90 and the total formula concentration) are input into the plugging formula recommendation algorithm to obtain a recommended formula, which provides rapid decision support for field plugging.

In step S351, the random addition amount of each formula parameter sample

and the total concentration S of the plugging formula satisfy

which involves the addition amount of the i-th plugging material;

where D10_re denotes the D10 parameter value of the recommended plugging formula, D10_acc denotes the D10 parameter value of the cluster-center plugging formula, D50_re denotes the D50 parameter value of the recommended plugging formula, D50_acc denotes the D50 parameter value of the cluster-center plugging formula, D90_re denotes the D90 parameter value of the recommended plugging formula, and D90_acc denotes the D90 parameter value of the cluster-center plugging formula.
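The random sampling recommendation can be sketched as follows. The constraints on the addition amounts and the acceptance conditions relating D10_re, D50_re, D90_re to D10_acc, D50_acc, D90_acc are not reproduced in this text, so the sketch assumes that the amounts sum to the total concentration S and that a sample is accepted when each relative error is within a tolerance. The function names, the tolerance and toy_mix_key_sizes (a crude stand-in for the particle size analysis of a blend) are all hypothetical.

```python
import random

def recommend(target, mix_key_sizes, n_materials, total, n_samples, tol, seed=0):
    """Randomly sample addition amounts and keep those close to the target sizes."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_samples):
        raw = [rng.random() for _ in range(n_materials)]
        scale = total / sum(raw)
        amounts = [r * scale for r in raw]   # assumed constraint: amounts sum to S
        sizes = mix_key_sizes(amounts)       # (D10_re, D50_re, D90_re) of the blend
        # Assumed acceptance rule: relative error to the cluster center within tol.
        if all(abs(s - t) / t <= tol for s, t in zip(sizes, target)):
            accepted.append(amounts)
    return accepted

FINE = (50.0, 100.0, 200.0)        # hypothetical (D10, D50, D90) of a fine material
COARSE = (500.0, 1000.0, 2000.0)   # hypothetical (D10, D50, D90) of a coarse material

def toy_mix_key_sizes(amounts):
    """Toy blend model: amount-weighted average of the materials' key sizes."""
    total = sum(amounts)
    return tuple((amounts[0] * f + amounts[1] * c) / total for f, c in zip(FINE, COARSE))

target = (275.0, 550.0, 1100.0)    # hypothetical cluster center (D10_acc, D50_acc, D90_acc)
accepted = recommend(target, toy_mix_key_sizes, n_materials=2, total=20.0,
                     n_samples=2000, tol=0.1)
```

In practice the toy blend model would be replaced by the cumulative particle size calculation for the mixed materials, and the accepted samples would form the recommended formula set.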

The working principle and process of the invention are as follows. First, a lost circulation database is established from lost circulation data, and the data are preprocessed by data cleaning, feature encoding and data reduction. Next, similarity analysis is performed on the preprocessed lost circulation data with the binary mixed LSH algorithm, and the historical lost circulation data whose similarity to the new lost circulation data exceeds a set threshold are retrieved. Then, the characteristic particle sizes of the retrieved historical plugging formulas are calculated, a K-means clustering model is trained with the D10, D50, D90 and concentration of the plugging formulas as input features, and the cluster center points are output. Finally, a particle-based plugging formula is recommended using the cluster center points.

It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to assist the reader in understanding the principles of the invention and are to be construed as being without limitation to such specifically recited embodiments and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from the spirit of the invention, and these changes and combinations are within the scope of the invention.

Claims

1. A method for recommending a fractured lost particle-based plugging formula based on data mining is characterized by comprising the following steps of:

2. The data mining-based fractured leakage particle-based plugging formulation recommendation method according to claim 1, wherein the step S1 comprises the following sub-steps:

3. The recommendation method for a fractured lost particle-based plugging formulation based on data mining as claimed in claim 2, wherein in the step S11, the specific method for performing data cleaning is as follows: filling in missing lost circulation data, detecting abnormal lost circulation data by using a box separation method, and filling in the abnormal lost circulation data;

the step S13 includes the following sub-steps:

The calculation formula is as follows:

where

denotes the original lost circulation feature value, M denotes the total number of lost circulation data samples, N denotes the total number of lost circulation data features, μ_j denotes the mean of the lost circulation feature, and

denotes the standard deviation of the lost circulation feature;

S132: calculating the covariance matrix Σ of the encoded lost circulation data after the lost circulation feature values are updated, using the following formula:

S134: reducing the dimension according to the eigenvector matrix U of the covariance matrix, determining the lost circulation feature dimension of the dimension-reduced encoded lost circulation data, and calculating the lost circulation feature vector of the dimension-reduced encoded lost circulation data to complete the data reduction, where the lost circulation feature vector z⁽ⁱ⁾ of the i-th encoded lost circulation data after dimension reduction is calculated as:

in the step S134, the lost circulation characteristic dimension k of the encoded lost circulation data after dimension reduction satisfies

4. The data mining-based fractured leakage particle-based plugging formulation recommendation method according to claim 1, wherein the step S2 comprises the following sub-steps:

S23: constructing the (d, cd, fp₁, fp₂)-sensitive LSH function family for numerical data and the (r, cr, sp₁, sp₂)-sensitive LSH function family for character-type data, where c denotes the approximation factor of the lost circulation data, d denotes the Euclidean distance sensitivity range of the numerical data, fp₁ denotes the lower limit of the numerical data similarity probability, fp₂ denotes the upper limit of the numerical data similarity probability, r denotes the Jaccard distance sensitivity range of the character-type data, sp₁ denotes the lower limit of the character-type data similarity probability, and sp₂ denotes the upper limit of the character-type data similarity probability;

5. The recommendation method for a fracture leakage particle-based plugging formula based on data mining as claimed in claim 4, wherein in said step S21, the calculation formula of the similarity distE of numerical data is as follows:

where EuclideanDist(·) denotes the function that calculates the Euclidean distance between lost circulation data objects o₁ and o₂, o₁.f-type denotes the numerical data of lost circulation data object o₁, o₂.f-type denotes the numerical data of lost circulation data object o₂, and dmax denotes the maximum distance between the numerical data features of o₁ and o₂;

in step S21, the calculation formula of the similarity distJ of the character-type data is:

in step S22, the calculation formula of the overall similarity dist of the lost circulation data is as follows:

dist＝α×distE+(1-α)×distJ

in the step S23, the (d, cd, fp₁, fp₂)-sensitive LSH function family h(o.f-type) of numerical data is expressed as:

in the step S23, the (r, cr, sp₁, sp₂)-sensitive LSH function family h(O.s-type) of character-type data is expressed as:

h(O.s-type) = arg min g(q), q ∈ O.s-type

where O.s-type denotes the lost circulation character-type data set, q denotes the character-type data in the lost circulation data object, g(·) denotes the random number generation function, r denotes the Jaccard distance sensitivity range of the character-type data, sp₁ denotes the lower limit of the character-type data similarity probability, and sp₂ denotes the upper limit of the character-type data similarity probability.

6. The data mining-based fractured leakage particle-based plugging formulation recommendation method according to claim 4, wherein the step S24 comprises the following sub-steps:

S241: randomly selecting k₁ and k₂ hash functions from the LSH function family of numerical data and the LSH function family of character-type data, respectively, to form the lost circulation data binary mixed index LSH function family G, expressed as:

G＝g(o)

where

denotes the set of LSH functions for numerical data,

S242: randomly selecting p lost circulation data hash functions g₁, …, g_p from the lost circulation data binary mixed index LSH function family G, and storing the hash values corresponding to the p hash functions into the corresponding hash buckets, where the number p of selected lost circulation data hash functions is calculated as:

S243: within a hash bucket, taking the lost circulation data that satisfy distE(o_i, q) < cd, distJ(o_i, q) < cr and dist(o_i, q) < ε as candidate lost circulation data, where o_i denotes any lost circulation data, q denotes the new field lost circulation query object, distE denotes the similarity of numerical data, distJ denotes the similarity of character-type data, c denotes the approximation factor of the lost circulation data, d denotes the Euclidean distance sensitivity range of the numerical data, r denotes the Jaccard distance sensitivity range of the character-type data, and ε denotes the overall similarity threshold of the lost circulation data.

7. The data mining-based fractured leakage particle-based plugging formulation recommendation method according to claim 1, wherein the step S3 comprises the following sub-steps:

s34: performing K-means clustering analysis on a plugging formula parameter sample set consisting of key particle sizes of plugging formula and corresponding formula concentrations to obtain clustering center points of clusters divided by each formula parameter sample;

8. The data mining-based recommendation method for a fractured leakage particle-based plugging formula according to claim 7, wherein in the step S31, the discrete cumulative particle size y_{l+1} corresponding to the (l+1)-th particle size value in the plugging formula data is calculated as:

y_{l+1} = y_l + η_{l+1}, 0 ≤ l ≤ Q

where y_l denotes the cumulative particle size corresponding to the l-th particle size value, η_{l+1} denotes the component particle size corresponding to the (l+1)-th particle size value, and Q denotes the total number of component particle size distribution intervals;

in the step S32, the continuous cumulative particle size distribution H₃(x) of the plugging formula data is calculated as:

in step S33, the specific method for determining the key particle size of the plugging formula is as follows: if the cumulative particle size gr of the plugging formula data under the current particle size reaches 10%, the particle size is the critical particle size D10 of the plugging formula; if the cumulative particle size gr of the plugging formula data under the current particle size reaches 50%, the particle size is the critical particle size D50 of the plugging formula; if the cumulative particle size gr of the plugging formula data under the current particle size reaches 90%, the particle size is the critical particle size D90 of the plugging formula.

9. The data mining-based fractured leakage particle-based plugging formulation recommendation method according to claim 7, wherein the step S34 comprises the following sub-steps:

S341: in the plugging formula parameter sample set E, calculating the distance between each formula parameter sample and each initial formula parameter mean vector, and determining the nearest one, where the distance d_{ji} between a formula parameter sample and a formula parameter mean vector is calculated as:

d_{ji} = ||e_j - μ_i||₂

i = arg min d_{ji}

C_{new,i} = C_i ∪ {e_j};

where e represents a sample in the recipe parameter sample cluster.

10. The data mining based fractured leakage particle-based lost circulation formulation recommendation method according to claim 1, wherein the step S35 comprises the following sub steps:

s351: setting the random sampling times of the recommended formula;

in the step S351, the random addition amount of each formula parameter sample

which involves the addition amount of the i-th plugging material;

in the step S354, a formula set is recommended, and if the particle size of the current plugging formula is D10, the requirement of the recommended formula set is

where D10_re denotes the D10 parameter value of the recommended plugging formula, D10_acc denotes the D10 parameter value of the cluster-center plugging formula, D50_re denotes the D50 parameter value of the recommended plugging formula, D50_acc denotes the D50 parameter value of the cluster-center plugging formula, D90_re denotes the D90 parameter value of the recommended plugging formula, and D90_acc denotes the D90 parameter value of the cluster-center plugging formula.