CN117668584A - Rapid preparation method and system of oil well working condition diagnosis data set - Google Patents


Publication number
CN117668584A
Authority
CN
China
Prior art keywords
data
indicator diagram
data set
clustering
diagram
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311388141.0A
Other languages
Chinese (zh)
Inventor
王相
邵志伟
芮诚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changzhou University
Original Assignee
Changzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changzhou University
Priority to CN202311388141.0A
Publication of CN117668584A

Landscapes

  • Testing Of Devices, Machine Parts, Or Other Structures Thereof (AREA)

Abstract

The invention discloses a rapid preparation method and system for an oil well working condition diagnosis data set, relating to the technical field of petroleum exploitation. The method comprises: collecting data from an oilfield data center; preprocessing the raw data and drawing indicator diagrams; converting each indicator diagram into a numerical feature vector through feature conversion; and performing cluster analysis on the feature vectors, outputting the clustering result, and checking it. The method clusters the feature vectors of the indicator diagrams so that subtle features of the diagrams are not ignored, which improves the accuracy of data analysis; it checks the displacement and load data so that records with missing or erroneous values are excluded, which improves data quality; and it allows the clustering to be evaluated quickly by inspecting the line graph of the sum of squared errors (SSE). The invention achieves better results in terms of execution cost, data set quality, and completion time.

Description

Rapid preparation method and system of oil well working condition diagnosis data set
Technical Field
The invention relates to the technical field of oil exploitation, in particular to a method and a system for rapidly preparing an oil well working condition diagnosis data set.
Background
The pumping unit is one of the most important pieces of equipment in petroleum exploitation. Oil wells are numerous and widely distributed, and equipment such as the sucker rod and the oil pump often operates thousands of meters down the well, so the fault type is difficult to judge visually and faults are difficult to find accurately and in time, which causes serious economic losses. As the intelligent oilfield is built out, large numbers of sensors are being installed in oil well production systems to collect production data in real time.
The existing oil well production monitoring technology has the following defects:
Limited data accuracy: the accuracy of the sensors and monitoring devices limits the accuracy of the data collected. Existing monitoring systems may not be able to detect and capture subtle changes or anomalies accurately.
Delay of data acquisition and transmission: there may be delays in the acquisition, transmission, and processing of the monitoring data. The monitoring system may therefore be unable to acquire and feed back critical data in real time, which hinders timely understanding of, and decisions about, well production conditions.
Insufficient integration of multiple parameters: well production involves monitoring many parameters, including pressure, temperature, flow, and fluid level. Existing monitoring systems may integrate these parameters poorly and so cannot evaluate and analyze the production conditions of the oil well comprehensively.
Cost and complexity: some advanced monitoring techniques require expensive equipment and complex installation procedures, which increase the cost and difficulty of monitoring. For small production units or wells in remote areas, these techniques may not be practical.
Difficulty coping with complex geological and process environments: the geology and process environment differ from well to well, and some environments are complex. Existing monitoring techniques may adapt poorly to complex and changing environments, resulting in inaccurate or incomplete monitoring results.
Challenges of data analysis and interpretation: monitoring systems typically generate large amounts of data that require efficient analysis and interpretation to extract useful information. However, existing data analysis tools and methods may not make full use of such data, or may require interpretation and analysis by a skilled person.
The indicator diagram is a key basis for diagnosing oil well production faults. With the rapid development of machine learning, methods such as the support vector machine, the BP neural network, and the convolutional neural network have been tried for oil well fault diagnosis. However, these methods all rely on high-quality indicator diagram data sets, and preparing such a data set requires significant effort and time to classify the diagrams one by one according to fault type.
Disclosure of Invention
The present invention has been made in view of the above-described problems.
Therefore, the technical problems solved by the invention are as follows: the data sets used by existing fault diagnosis methods are of low quality and take a long time and high labor cost to prepare.
In order to solve the technical problems, the invention provides the following technical scheme: a method for rapidly preparing an oil well condition diagnostic dataset, comprising: collecting data of an oilfield data center; drawing an indicator diagram based on preprocessing the original data; outputting a numerical vector of the indicator diagram through characteristic conversion of the indicator diagram; and carrying out cluster analysis on the feature vectors, outputting a clustering result and checking.
As a preferable scheme of the rapid preparation method of the oil well working condition diagnosis data set, the invention comprises the following steps: collecting the data of the oilfield data center comprises collecting data from the working condition diagnosis big data system of the oilfield data center and using it as the raw data of the rapid preparation method of the oil well working condition diagnosis data set;
wherein the data from the working condition diagnosis big data system comprise displacement and load.
As a preferable scheme of the rapid preparation method of the oil well working condition diagnosis data set, the invention comprises the following steps: drawing the indicator diagram based on preprocessing the raw data comprises: preprocessing the raw data so that every record has the same length, and drawing the indicator diagrams;
for each record, checking whether the displacement and load data meet the requirements;
if the numbers of displacement and load values are inconsistent, discarding the record;
if the displacement or the load contains null values, discarding the record;
if the numbers of displacement and load values are consistent and no null value exists, drawing an indicator diagram;
providing an interactive interface that displays detailed information about abnormal data and allows a user to view the data content and select an operation;
the preprocessed displacement and load each comprise N sample points, and an indicator diagram is drawn with the displacement as the abscissa and the load as the ordinate.
As a preferable scheme of the rapid preparation method of the oil well working condition diagnosis data set, the invention comprises the following steps: the feature conversion of the indicator diagram comprises reading the indicator diagram with Python, converting it into matrix form, and converting the feature matrix into a feature vector based on the reshape method, expressed as:
G=read_image(filepath)
M(G)=MatrixTransform(G)
Φ=RandomProjection(M(G))
C(G)=M(G)+i×Φ
Ω=SVD(Ψ)
wherein G is the indicator diagram, M(G) is the matrix form corresponding to the indicator diagram G, Φ is a random projection of the indicator diagram matrix, C(G) is the complex-domain representation of the indicator diagram, Ψ is a nonlinear function for processing high-dimensional data, Ω is the matrix obtained after singular value decomposition, i is the imaginary unit, ρ and τ are nonlinear functions or matrices, and Ψ* is the complex conjugate of Ψ.
As a preferable scheme of the rapid preparation method of the oil well working condition diagnosis data set, the invention comprises the following steps: the cluster analysis of the feature vectors comprises clustering with each classification number in [m, n] based on the AgglomerativeClustering algorithm, and outputting the corresponding cluster sum of squared errors (SSE) for each K in [m, n], expressed as:
$$SSE = \sum_{i=1}^{K} \sum_{x \in C_i} \lVert x - c_i \rVert^2$$
where K is the number of clusters, C_i is the i-th cluster, x is a point in C_i, c_i is the centroid of the i-th cluster, m is the preset lower limit of K, and n is the preset upper limit of K.
As a preferable scheme of the rapid preparation method of the oil well working condition diagnosis data set, the invention comprises the following steps: outputting and checking the clustering result comprises plotting the sum of squared errors against the classification number as a line graph, and observing the inflection point of the line graph;
based on the AgglomerativeClustering algorithm, each sample is initially treated as its own cluster, so that the number of clusters in the initial state equals the number of samples, and the two clusters with the smallest inter-cluster distance are repeatedly merged into a new cluster until the termination condition of the algorithm is met;
to minimize the total SSE, the value of K at which the SSE falls fastest is sought, with K increased in steps of 1;
the sum of the SSEs of all clusters is expressed as:
$$SSE(T_K) = \sum_{i=1}^{K} SSE_i$$
where K is the number of clusters and SSE_i is the sum of squared errors within the i-th cluster;
the SSE descent speed function is expressed as:
$$V(K) = \frac{SSE(T_{K-1}) - SSE(T_K)}{R}$$
wherein SSE(T_{K-1}) and SSE(T_K) are the SSE with K-1 and K clusters respectively, and R is the step length;
introducing a penalty term, and outputting an optimization target expressed as:
$$\max_{K \in [m,n]} \left( V(K) - \lambda \cdot K \right)$$
where λ is a regularization parameter;
based on the elbow rule, the acceleration of the SSE descent is output, expressed as:
A(K)=V(K+1)-V(K)
inflection points occur within [ m, n ] when A (K) becomes positive or near zero;
if the classification number in [ m, n ] enables the elbow method schematic diagram to have obvious inflection points, the corresponding classification number is the optimal classification number, k is set as the optimal classification number confirmed by the elbow method, and a clustering result is stored;
if the classification number in [ m, n ] can not make the elbow schematic diagram have obvious inflection points, the upper limit n of the classification number is increased, clustering is continued, and a line diagram is drawn for observation until the obvious inflection points appear.
As a preferable scheme of the rapid preparation method of the oil well working condition diagnosis data set, the invention comprises the following steps: after the clustering result has been output, the division of the data set is quickly checked;
the clustering result is displayed, including the indicator diagrams of each class and the distribution of pictures across classes;
the subset of each indicator diagram class is quickly inspected to find pictures that are inconsistent with the other pictures of that class;
if a subset contains pictures that are inconsistent with the other indicator diagrams of its class, those pictures are removed from the current class, and the data set is then finalized.
Another object of the present invention is to provide a rapid preparation system for an oil well working condition diagnosis data set, which can convert the image characteristics of an indicator diagram into the numerical characteristics of vectors, and then combine a machine learning method to rapidly and automatically divide the indicator diagram data of fault diagnosis, so that the quality of the data set for fault diagnosis is effectively improved, and the time and labor costs are greatly reduced.
As a preferred solution for the rapid preparation of an oil well condition diagnostic dataset according to the invention, wherein: the system comprises a data acquisition module, an indicator diagram drawing module, an indicator diagram characteristic conversion module and a characteristic vector cluster analysis and inspection module; the data acquisition module is used for acquiring a big data system for working condition diagnosis in the oilfield data center; the indicator diagram drawing module is used for drawing an indicator diagram by judging whether the displacement and load data meet the requirements or not based on preprocessing the collected working condition diagnosis data; the indicator diagram feature conversion module is used for reading an indicator diagram through python, converting the indicator diagram into a matrix form, and converting feature matrix into feature vectors based on a reshape method; the feature vector cluster analysis and inspection module is used for carrying out cluster analysis on the feature vectors and checking the clustering result.
A computer device comprising a memory and a processor, said memory storing a computer program, characterized in that said processor, when executing said computer program, implements the steps of the method for rapidly preparing an oil well working condition diagnosis data set.
A computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor implements the steps of a method for fast preparing a diagnostic data set for well conditions.
The invention has the following beneficial effects: the rapid preparation method of the oil well working condition diagnosis data set uses the AgglomerativeClustering algorithm to perform cluster analysis on the feature vectors of the indicator diagrams, so that subtle features of the diagrams are not ignored and the accuracy of data analysis is improved; it checks the displacement and load data so that records with missing or erroneous values are excluded, which improves data quality; and it allows the clustering to be evaluated quickly by inspecting the SSE line graph. Applying the elbow rule also lets the method adapt to different data sets and scenarios, which improves its adaptability and flexibility. The invention achieves better results in terms of execution cost, data set quality, and completion time.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. Wherein:
Fig. 1 is an overall flowchart of a method for rapidly preparing an oil well condition diagnosis data set according to a first embodiment of the present invention.
Fig. 2 is a preparation flow chart of a method for rapidly preparing an oil well condition diagnosis data set according to a first embodiment of the present invention.
Fig. 3 is an elbow method schematic diagram of a method for rapidly preparing an oil well condition diagnosis data set according to a first embodiment of the present invention.
FIG. 4 is a feature matrix of an indicator diagram of a method for rapidly preparing a diagnostic dataset for well conditions according to a second embodiment of the present invention.
FIG. 5 is an overall flow chart of a system for rapid preparation of a diagnostic data set for well conditions according to a third embodiment of the present invention.
Detailed Description
So that the manner in which the above recited objects, features and advantages of the present invention can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to the embodiments, some of which are illustrated in the appended drawings. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
Example 1
Referring to fig. 1-3, for one embodiment of the present invention, a method for rapidly preparing a diagnostic data set for well conditions is provided, comprising:
S1: Collecting data of an oilfield data center.
Furthermore, data are collected from the working condition diagnosis big data system of the oilfield data center and used as the raw data of the rapid preparation method of the oil well working condition diagnosis data set.
It should be noted that the data required from the working condition diagnosis big data system include displacement and load.
S2: Drawing an indicator diagram based on preprocessing the raw data.
Furthermore, the raw data are preprocessed so that every record has the same length, and the indicator diagrams are drawn.
It should be noted that, for each record, it is checked whether the displacement and load data meet the requirements;
if the numbers of displacement and load values are inconsistent, the record is discarded;
if the displacement or the load contains null values, the record is discarded;
if the numbers of displacement and load values are consistent and no null value exists, an indicator diagram is drawn.
It should also be noted that such anomalies may arise when errors occur while sensor data are transmitted into the data center or while data are exported from it.
Further, the preprocessed displacement and load each comprise N sample points, and the indicator diagram is drawn with the displacement as the abscissa and the load as the ordinate.
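The two discard rules of step S2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `is_valid_record` and `filter_records` are hypothetical, and a "null value" is taken here to mean either `None` or NaN, which the source does not specify.

```python
def is_valid_record(displacement, load):
    """Return True when a record can be drawn as an indicator diagram."""
    # Rule 1: displacement and load must contain the same number of points.
    if len(displacement) != len(load):
        return False
    # Rule 2: neither series may contain a null value (assumed: None or NaN).
    for value in list(displacement) + list(load):
        if value is None or value != value:  # NaN is the only value != itself
            return False
    return True


def filter_records(records):
    """Keep valid records only, each as a list of (displacement, load) points."""
    diagrams = []
    for displacement, load in records:
        if is_valid_record(displacement, load):
            diagrams.append(list(zip(displacement, load)))
    return diagrams


# Hypothetical toy records with 3 sample points instead of N.
records = [
    ([0.0, 1.5, 3.0], [20.0, 45.0, 30.0]),   # valid
    ([0.0, 1.5], [20.0, 45.0, 30.0]),        # count mismatch: discarded
    ([0.0, None, 3.0], [20.0, 45.0, 30.0]),  # null value: discarded
]
diagrams = filter_records(records)
```

Each surviving entry is a list of (displacement, load) pairs, ready to be plotted with displacement on the abscissa and load on the ordinate.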
S3: Outputting a numerical vector of the indicator diagram through feature conversion of the indicator diagram.
Further, feature conversion is performed on the drawn indicator diagram to obtain its numerical vector.
It should be noted that the indicator diagram is read with Python and converted into matrix form, and the feature matrix is converted into a feature vector based on the reshape method, expressed as:
G=read_image(filepath)
M(G)=MatrixTransform(G)
Φ=RandomProjection(M(G))
C(G)=M(G)+i×Φ
Ω=SVD(Ψ)
wherein G is the indicator diagram, M(G) is the matrix form corresponding to the indicator diagram G, Φ is a random projection of the indicator diagram matrix, C(G) is the complex-domain representation of the indicator diagram, Ψ is a nonlinear function for processing high-dimensional data, Ω is the matrix obtained after singular value decomposition, i is the imaginary unit, ρ and τ are nonlinear functions or matrices, and Ψ* is the complex conjugate of Ψ.
It should also be noted that after reading in a 3-channel indicator diagram with a resolution of 32×16 and converting it into a feature vector, the length of the feature vector is 1536, i.e. 1536 digital features.
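To make the 1536 figure concrete, the sketch below builds a 32×16, 3-channel image array and flattens it with NumPy's `reshape`. It is a minimal illustration, not the patented conversion: the helper `rasterize` is hypothetical and only marks sample points on a white canvas (the source reads a previously drawn image file instead), and the random projection, complex-domain, and SVD steps are not reproduced because the source does not specify them fully.

```python
import numpy as np


def rasterize(displacement, load, width=32, height=16):
    """Mark the sample points of one indicator diagram on a white RGB canvas."""
    x = np.asarray(displacement, dtype=float)
    y = np.asarray(load, dtype=float)
    # Scale both axes to pixel indices; max() guards against a zero range.
    col = ((x - x.min()) / max(x.max() - x.min(), 1e-12) * (width - 1)).round().astype(int)
    row = ((y - y.min()) / max(y.max() - y.min(), 1e-12) * (height - 1)).round().astype(int)
    image = np.full((height, width, 3), 255, dtype=np.uint8)
    image[height - 1 - row, col] = 0  # flip rows so larger loads appear higher
    return image


# Synthetic closed loop with 200 sample points standing in for real data.
t = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
image = rasterize(np.cos(t), np.sin(t))

# Flatten the 16 x 32 x 3 feature matrix into one feature vector.
feature_vector = image.reshape(-1)
```

The array has shape (16, 32, 3) because image arrays are conventionally stored as height × width × channels, so `reshape(-1)` yields 16 × 32 × 3 = 1536 values.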
S4: Performing cluster analysis on the feature vectors, outputting the clustering result, and checking it.
Furthermore, the rapid preparation method of the oil well working condition diagnosis data set uses Python to process the oil well indicator diagrams and obtain their feature vectors, and uses the AgglomerativeClustering algorithm to perform cluster analysis on them, so that the indicator diagrams are automatically divided into different subsets based on the differences between their feature vectors.
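Agglomerative clustering works bottom-up: every sample starts as its own cluster, and the two closest clusters are merged repeatedly until the requested number remains. The pure-Python sketch below illustrates that idea only. The function `agglomerate` is hypothetical, and measuring the distance between clusters by their centroids is an assumption made for brevity; library implementations offer other linkage criteria and are far more efficient.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def agglomerate(samples, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    # Initial state: as many clusters as samples, one sample index per cluster.
    clusters = [[i] for i in range(len(samples))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = centroid([samples[i] for i in clusters[a]])
                cb = centroid([samples[i] for i in clusters[b]])
                distance = math.dist(ca, cb)
                if best is None or distance < best[0]:
                    best = (distance, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]
    return clusters


# Hypothetical 2-D samples; real feature vectors would have 1536 components.
samples = [[0, 0], [0, 1], [10, 10], [10, 11], [5, 50]]
result = agglomerate(samples, 3)
```

Here the termination condition is simply reaching the requested number of clusters.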
It should be noted that performing cluster analysis on the feature vectors includes:
setting a lower limit m and an upper limit n for the number of clusters K according to expert experience;
clustering with each classification number in [m, n] based on the AgglomerativeClustering algorithm, and outputting the corresponding cluster sum of squared errors (SSE) for each K in [m, n], expressed as:
$$SSE = \sum_{i=1}^{K} \sum_{x \in C_i} \lVert x - c_i \rVert^2$$
where K is the number of clusters, C_i is the i-th cluster, x is a point in C_i, c_i is the centroid of the i-th cluster, m is the preset lower limit of K, and n is the preset upper limit of K.
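The sum of squared errors described above, i.e. the squared distance from every point to the centroid of its own cluster, summed over all clusters, can be computed as in the following sketch. The function name `cluster_sse` and the toy points are illustrative assumptions, not part of the source.

```python
from collections import defaultdict


def cluster_sse(points, labels):
    """Sum over clusters of squared distances from each point to its centroid."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)

    total = 0.0
    for members in groups.values():
        dim = len(members[0])
        center = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - center[d]) ** 2 for p in members for d in range(dim))
    return total


# Hypothetical example: two points share cluster 0, one point is alone in cluster 1.
points = [(0.0, 0.0), (2.0, 0.0), (5.0, 5.0)]
labels = [0, 0, 1]
sse = cluster_sse(points, labels)
```

In this example the centroid of cluster 0 is (1, 0), each of its two points lies at squared distance 1 from it, and the singleton cluster contributes nothing, so the total is 2.0.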
It should also be noted that each K and the SSE of its corresponding classification result are plotted as a line graph, as shown in Fig. 3.
Further, the line graph is observed according to the elbow rule, and the optimal number of clusters K is selected.
It should be noted that, based on the AgglomerativeClustering algorithm, each sample is initially treated as its own cluster, so that the number of clusters in the initial state equals the number of samples, and the two clusters with the smallest inter-cluster distance are repeatedly merged into a new cluster until the termination condition of the algorithm is satisfied;
to minimize the total SSE, the value of K at which the SSE falls fastest is sought, with K increased in steps of 1;
the sum of the SSEs of all clusters is expressed as:
$$SSE(T_K) = \sum_{i=1}^{K} SSE_i$$
where K is the number of clusters and SSE_i is the sum of squared errors within the i-th cluster;
the SSE descent speed function is expressed as:
$$V(K) = \frac{SSE(T_{K-1}) - SSE(T_K)}{R}$$
wherein SSE(T_{K-1}) and SSE(T_K) are the SSE with K-1 and K clusters respectively, and R is the step length;
introducing a penalty term, and outputting an optimization target expressed as:
$$\max_{K \in [m,n]} \left( V(K) - \lambda \cdot K \right)$$
where λ is a regularization parameter;
to apply the elbow rule, the acceleration of the SSE descent is considered, expressed as:
A(K)=V(K+1)-V(K)
inflection points occur within [ m, n ] when A (K) becomes positive or near zero;
if the classification number in [ m, n ] enables the elbow method schematic diagram to have obvious inflection points, the corresponding classification number is the optimal classification number, k is set as the optimal classification number confirmed by the elbow method, and a clustering result is stored;
if the classification number in [ m, n ] can not make the elbow schematic diagram have obvious inflection points, the upper limit n of the classification number is increased, clustering is continued, and a line diagram is drawn for observation until the obvious inflection points appear.
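The quantities used in the elbow analysis can be sketched as follows. Two things here are assumptions: the descent speed is taken to be V(K) = (SSE(K-1) - SSE(K)) / R, which is our reading of the description above, and the SSE values and the choice λ = 1 are invented for illustration only.

```python
def descent_speed(sse, k, step=1):
    """V(K): SSE reduction per unit step when going from K - step to K clusters."""
    return (sse[k - step] - sse[k]) / step


def acceleration(sse, k):
    """A(K) = V(K + 1) - V(K), for a K step of 1."""
    return descent_speed(sse, k + 1) - descent_speed(sse, k)


def best_k(sse, lam):
    """Return the K that maximizes the penalized objective V(K) - lam * K."""
    candidates = [k for k in sse if k - 1 in sse]  # V(K) needs SSE at K - 1
    return max(candidates, key=lambda k: descent_speed(sse, k) - lam * k)


# Hypothetical SSE values for K in [5, 12]; not measured data.
sse = {5: 900.0, 6: 700.0, 7: 520.0, 8: 300.0,
       9: 280.0, 10: 265.0, 11: 255.0, 12: 248.0}

chosen = best_k(sse, lam=1.0)
speeds = {k: descent_speed(sse, k) for k in range(6, 13)}
accelerations = {k: acceleration(sse, k) for k in range(6, 12)}
```

With these invented values the largest penalized drop occurs at K = 8, after which V(K) stays small and A(K) stays close to zero, which is the flattening that the line graph shows as an elbow. In practice λ would need tuning, and the visual check of the line graph described above remains the deciding step.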
It should also be noted that the problem can be viewed in terms of gradient descent: to make the total SSE as small as possible, each unit increase in K should reduce the SSE as much as possible, i.e. with a K step of 1, the K at which the SSE falls fastest is sought. If no classification number in [m, n] produces an obvious inflection point in Fig. 3, the upper limit n of the classification number must be raised, clustering continued, and the line graph redrawn for observation.
Further, K is set to the optimal classification number confirmed by the elbow method, and the clustering result is saved.
It should be noted that verifying the clustering result includes:
after the clustering is finished, the division of the data set is quickly checked; if a subset contains pictures that are inconsistent with the other indicator diagrams of its category, those pictures are removed from the current category, and the data set is then finalized.
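The final clean-up step can be sketched as a small helper that removes the pictures a reviewer has flagged from each category. The function name `finalize_dataset`, the dictionary layout, and the file names are hypothetical; the source only states that inconsistent pictures are removed from their category.

```python
def finalize_dataset(clusters, rejected):
    """Drop reviewer-rejected pictures from each category; skip emptied categories."""
    final = {}
    for label, image_ids in clusters.items():
        flagged = rejected.get(label, set())
        kept = [image_id for image_id in image_ids if image_id not in flagged]
        if kept:
            final[label] = kept
    return final


# Hypothetical clustering result and manual review outcome.
clusters = {0: ["a.png", "b.png", "c.png"], 1: ["d.png", "e.png"]}
rejected = {0: {"b.png"}}
dataset = finalize_dataset(clusters, rejected)
```

Keeping the review outcome separate from the clustering output means the original assignment can be re-inspected later without re-running the clustering.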
Example 2
Referring to Fig. 4, for one embodiment of the present invention, a method for rapidly preparing an oil well working condition diagnosis data set is provided. To verify the beneficial effects of the invention, it is evaluated through economic benefit calculation and simulation experiments.
The experiments were performed in two groups: one group was processed using the prior art, and the other was processed in the same way using the present invention.
The experiment using the present invention proceeds as follows. The required data are obtained from the working condition diagnosis big data system: 1000 displacement records and 1000 load records, each containing 200 sample points. The 200 displacement values and 200 load values of a record (400 values in total) form one indicator diagram, with displacement as the abscissa and load as the ordinate, giving 1000 indicator diagrams in all. One record of displacement and load is shown in Table 1:
Table 1. Displacement and load of one indicator diagram
The data are preprocessed to check whether the numbers of displacement and load values are consistent and whether any null value exists. Each displacement record and its corresponding load record should contain 200 sample points; if the counts differ or a null value is present, the record is discarded. After preprocessing, the displacement and the load each comprise 200 sample points, and an indicator diagram is drawn with the displacement as the abscissa and the load as the ordinate. 1000 indicator diagrams are finally obtained, each drawn at a size of 32×16×3; an indicator diagram is shown in Fig. 4.
The indicator diagram is read with Python and converted into matrix form; the feature matrix is shown in Fig. 4, and it is converted into a feature vector with the reshape method. After a 3-channel indicator diagram with a resolution of 32×16 is read in and converted into a feature vector, the length of the feature vector is 1536, i.e. 1536 numerical features.
The rapid preparation method of the oil well working condition diagnosis data set uses an image-processing method to obtain feature vectors of length 1536, performs cluster analysis on the feature vectors with the AgglomerativeClustering algorithm, and groups the 1000 samples into K clusters. A lower limit of 5 and an upper limit of 12 are set for the number of clusters K according to expert experience; clustering is performed for each classification number in [5, 12] with the AgglomerativeClustering algorithm; and the corresponding cluster sum of squared errors (SSE) is calculated for each K in [5, 12]. The sum of squared errors is expressed as:
$$SSE = \sum_{i=1}^{K} \sum_{x \in C_i} \lVert x - c_i \rVert^2$$
Each K and the SSE of its corresponding classification result are plotted as a line graph, as shown in Fig. 3. The line graph is observed according to the elbow rule, and the optimal number of clusters K=8 is selected. K is then set to 8, the clustering is performed, and the clustering result of the indicator diagrams is saved.
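The sweep over K in [5, 12] can be sketched with scikit-learn's `AgglomerativeClustering` class. Several points are assumptions here: the source names the algorithm but not the library or linkage, so Ward linkage is a choice made for this sketch; the data are synthetic (200 vectors drawn around 8 centers) rather than the 1000 real feature vectors; and the helper `sse_of` is hypothetical.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering


def sse_of(X, labels):
    """Sum of squared distances from every sample to its cluster centroid."""
    total = 0.0
    for label in np.unique(labels):
        members = X[labels == label]
        total += float(((members - members.mean(axis=0)) ** 2).sum())
    return total


# Synthetic stand-in for the feature matrix: 8 well-separated groups of
# 25 vectors each, every vector having 1536 components.
rng = np.random.default_rng(0)
centers = rng.normal(scale=10.0, size=(8, 1536))
X = np.vstack([center + rng.normal(size=(25, 1536)) for center in centers])

# Cluster once for every K in [5, 12] and record the SSE of each result.
sse_by_k = {}
for k in range(5, 13):
    labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X)
    sse_by_k[k] = sse_of(X, labels)
```

Plotting `sse_by_k` as a line graph gives the elbow diagram. On this synthetic data the SSE falls steeply up to K = 8 and then flattens, because the data were generated with 8 groups; real indicator diagrams may show a less pronounced elbow.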
The problem can be viewed in terms of gradient descent: to make the total SSE as small as possible, each unit increase in K should reduce the SSE as much as possible, i.e. with a K step of 1, the K at which the SSE falls fastest is sought.
After the clustering is finished, the division of the data set is quickly checked; if a subset contains pictures that are inconsistent with the other indicator diagrams of its category, those pictures are removed from the current category, and the sample set is then finalized.
In the other group of experiments, the prior art was used to process the fault diagnosis indicator diagram data. The results are shown in Table 2. The run time is the time required to output the sample set, and a smaller value indicates a better method. The data accuracy rate of fault diagnosis measures how closely the fault labels of the data agree with the actual faults, and a higher value indicates a better method.
Table 2. Comparison of experimental results
Metric                                   Prior art    Present invention
Run time                                 48 s         22 s
Data accuracy rate of fault diagnosis    97.2%        99.8%
In terms of run time, the present invention takes only 22 s, far less than the 48 s of the prior art. This shows that the invention processes the data more efficiently, is better suited to practical work, and greatly reduces time and labor costs. In terms of the data accuracy rate of fault diagnosis, the present invention reaches 99.8%, higher than the 97.2% of the prior art, which shows that the invention produces a fault diagnosis data set of higher quality.
In summary, the present invention is the better solution in terms of both time consumption and fault diagnosis accuracy.
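As a quick check of the figures reported in Table 2 (only the reported values are used; no new data are introduced), the relative reduction in run time, the corresponding speed-up, and the change in accuracy work out as:

```latex
\frac{48\,\mathrm{s} - 22\,\mathrm{s}}{48\,\mathrm{s}} \approx 54.2\%, \qquad
\frac{48\,\mathrm{s}}{22\,\mathrm{s}} \approx 2.18, \qquad
99.8\% - 97.2\% = 2.6 \text{ percentage points}
```

That is, the run time is roughly halved, and the share of incorrectly labelled data falls from 2.8% to 0.2%.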
Example 3
Referring to Fig. 5, for one embodiment of the present invention, a system for rapidly preparing an oil well working condition diagnosis data set is provided, comprising: a data acquisition module, an indicator diagram drawing module, an indicator diagram feature conversion module, and a feature vector cluster analysis and inspection module.
The data acquisition module is used for acquiring data from the working condition diagnosis big data system in the oilfield data center. The indicator diagram drawing module is used for preprocessing the raw data and drawing the indicator diagrams. The indicator diagram feature conversion module is used for performing feature conversion on the drawn indicator diagrams and outputting their feature vectors. The feature vector cluster analysis and inspection module is used for performing cluster analysis on the feature vectors and checking the clustering result.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, or the part of it that contributes to the prior art, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method of the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium may even be paper or other suitable medium upon which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques, which are well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
It should be noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that the technical solution of the present invention may be modified or substituted without departing from the spirit and scope of the technical solution of the present invention, which is intended to be covered in the scope of the claims of the present invention.

Claims (10)

1. A method for rapidly preparing an oil well condition diagnostic data set, comprising:
collecting data of an oilfield data center;
drawing an indicator diagram based on preprocessing the original data;
outputting a numerical vector of the indicator diagram through characteristic conversion of the indicator diagram;
and carrying out cluster analysis on the feature vectors, outputting a clustering result and checking.
2. A method of rapidly preparing a diagnostic data set for well conditions as claimed in claim 1, wherein: collecting the data of the oilfield data center comprises obtaining data from the big data system for working condition diagnosis in the oilfield data center and using it as the raw data of the rapid preparation method of the oil well working condition diagnosis data set;
wherein the data from the big data system for oil well working condition diagnosis include displacement and load.
3. A method of rapidly preparing a diagnostic data set for well conditions as claimed in claim 1, wherein: drawing the indicator diagram based on preprocessing the original data comprises: preprocessing the original data so that the data lengths are the same, and drawing the indicator diagram;
for each record, checking whether the displacement and load data meet the requirements;
if the numbers of displacement and load sample points are inconsistent, discarding the record;
if the displacement or the load contains null values, discarding the record;
when the numbers of displacement and load sample points are consistent and no null value exists, drawing the indicator diagram;
providing an interactive interface that displays detailed information on abnormal data and allows a user to view the data content and perform selection operations;
the preprocessed displacement and load each comprise N sample points, and the indicator diagram is drawn with the displacement as the abscissa and the load as the ordinate.
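The record checks in claim 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `is_valid_record` and `split_records` are hypothetical, a record is assumed to be a pair of Python sequences, and a "null value" is assumed to mean `None` or NaN.

```python
import math
from typing import Optional, Sequence


def is_valid_record(displacement: Sequence[Optional[float]],
                    load: Sequence[Optional[float]]) -> bool:
    """Return True when a record passes the checks described in claim 3."""
    # Discard when the displacement and load sample counts differ.
    if len(displacement) != len(load):
        return False
    # Discard when either series contains a null (None or NaN) value.
    for value in list(displacement) + list(load):
        if value is None or (isinstance(value, float) and math.isnan(value)):
            return False
    return True


def split_records(records):
    """Partition (displacement, load) pairs into kept and discarded lists."""
    kept, discarded = [], []
    for disp, load in records:
        (kept if is_valid_record(disp, load) else discarded).append((disp, load))
    return kept, discarded


records = [
    ([0.0, 1.0, 2.0], [10.0, 30.0, 20.0]),   # valid
    ([0.0, 1.0], [10.0, 30.0, 20.0]),        # sample counts differ
    ([0.0, None, 2.0], [10.0, 30.0, 20.0]),  # null value
]
kept, discarded = split_records(records)
print(len(kept), len(discarded))  # prints: 1 2
```

The discarded list is kept rather than dropped so that it could feed the interactive interface for abnormal data; the kept records would then be plotted with displacement on the abscissa and load on the ordinate using any plotting library.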
4. A method of rapidly preparing a diagnostic data set for well conditions as claimed in claim 1, wherein: the feature conversion of the indicator diagram comprises reading the indicator diagram with Python, converting it into matrix form, and converting the feature matrix into a feature vector by a reshape method, expressed as:
G=read_image(filepath)
M(G)=MatrixTransform(G)
Φ=RandomProjection(M(G))
C(G)=M(G)+i×Φ
Ω=SVD(Ψ)
wherein G is the indicator diagram, M(G) is the matrix form corresponding to the indicator diagram G, Φ is a random projection of the indicator diagram matrix, C(G) is the complex-domain representation of the indicator diagram, Ψ is a nonlinear function that processes high-dimensional data, Ω is the matrix after singular value decomposition, i is the imaginary unit, ρ and τ are nonlinear functions or matrices, and Ψ* is the complex conjugate of Ψ.
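The matrix-to-vector step of claim 4 can be sketched as below. To keep the example self-contained, a hypothetical `rasterize` helper stands in for reading a saved indicator diagram image: it marks the sample points on a small 0/1 grid. The `flatten` helper is a plain-Python equivalent of reshaping a 2-D array into one row (with NumPy, `array.reshape(-1)` would do the same). The random projection, complex-domain and SVD steps are not sketched, because the claim does not specify them fully.

```python
def rasterize(displacement, load, size=8):
    """Mark the (displacement, load) points on a size x size grid of 0/1."""
    d_min, d_max = min(displacement), max(displacement)
    l_min, l_max = min(load), max(load)
    grid = [[0] * size for _ in range(size)]
    for d, l in zip(displacement, load):
        # Scale each coordinate to a pixel index; guard against a zero range.
        col = round((d - d_min) / ((d_max - d_min) or 1.0) * (size - 1))
        row = round((l - l_min) / ((l_max - l_min) or 1.0) * (size - 1))
        grid[size - 1 - row][col] = 1  # row 0 is the top of the picture
    return grid


def flatten(matrix):
    """Reshape a 2-D matrix into a 1-D feature vector, row by row."""
    return [value for row in matrix for value in row]


matrix = rasterize([0, 1, 2, 3], [0, 3, 3, 0], size=4)
vector = flatten(matrix)
print(len(vector))  # prints: 16
```

Every diagram rasterized at the same size yields a vector of the same length, which is what the later clustering step needs.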
5. A method of rapidly preparing a diagnostic data set for well conditions as claimed in claim 1, wherein: the feature vector cluster analysis comprises clustering with the number of classes in [m, n] based on an agglomerative clustering algorithm, and outputting the sum of squared errors (SSE) of the clusters for each K in [m, n], expressed as:
SSE=Σ_{i=1}^{K} Σ_{x∈C_i} ‖x−c_i‖^2
where K is the number of clusters, C_i is the i-th cluster, x is a point in C_i, c_i is the centroid of the i-th cluster, m is the set lower limit of the number of clusters, and n is the set upper limit of the number of clusters.
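The SSE sweep over K in [m, n] can be sketched as below, taking SSE to be the sum of squared distances from each point to the centroid of its cluster, as the symbol definitions above indicate. This is a naive plain-Python illustration, not the patented implementation: the function names are hypothetical, and merging the two clusters with the closest centroids is an assumed linkage rule, since the claims do not name one.

```python
from itertools import combinations


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[j] for p in points) / len(points) for j in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def agglomerate(points, k):
    """Merge the two clusters with the closest centroids until k remain."""
    clusters = [[p] for p in points]  # every sample starts as its own cluster
    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: sq_dist(centroid(clusters[ij[0]]),
                                          centroid(clusters[ij[1]])))
        clusters[i].extend(clusters[j])  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


def sse(clusters):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(sq_dist(x, centroid(c)) for c in clusters for x in c)


def sse_curve(points, m, n):
    """SSE for every cluster count K in [m, n]."""
    return {k: sse(agglomerate(points, k)) for k in range(m, n + 1)}


points = [[0, 0], [0, 1], [10, 10], [10, 11]]
curve = sse_curve(points, 1, 4)
print(curve[2])  # prints: 1.0
```

The sketch re-runs the merging for every K for clarity; a practical implementation would record the SSE at each merge of a single run, or use a library clustering routine.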
6. A method of rapidly preparing a diagnostic data set for well conditions as claimed in claim 1, wherein: the step of outputting the clustering result and checking comprises plotting the number of classes against the sum of squared errors as a line graph, and observing the inflection points of the line graph;
clustering the samples based on an agglomerative clustering algorithm, wherein in the initial state each sample is its own cluster, so that the number of clusters equals the number of samples, and the two clusters with the smallest inter-cluster distance are merged into a new cluster until the termination condition of the algorithm is met;
outputting the minimum SSE, and outputting the gradient with the fastest SSE falling speed under the condition that the given K value step length is 1;
the sum of SSEs for all clusters is expressed as:
SSE(T_K)=Σ_{i=1}^{K} SSE(C_i)
where K is the number of clusters and SSE(C_i) is the sum of squared errors of the i-th cluster;
the SSE descent speed function is expressed as:
V(K)=(SSE(T_{K-1})-SSE(T_K))/R
wherein SSE(T_{K-1}) and SSE(T_K) are the SSE with K-1 and K clusters respectively, and R is the step length;
introducing a penalty term, and outputting an optimization target expressed as:
max K∈[m,n] (V(K)-λ·K)
where λ is a regularization parameter;
based on the elbow rule, the acceleration of the SSE descent is output, expressed as:
A(K)=V(K+1)-V(K)
inflection points occur within [ m, n ] when A (K) becomes positive or near zero;
if the classification number in [ m, n ] enables the elbow method schematic diagram to have obvious inflection points, the corresponding classification number is the optimal classification number, k is set as the optimal classification number confirmed by the elbow method, and a clustering result is stored;
if the classification number in [ m, n ] can not make the elbow schematic diagram have obvious inflection points, the upper limit n of the classification number is increased, clustering is continued, and a line diagram is drawn for observation until the obvious inflection points appear.
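The elbow quantities of claim 6 can be sketched as below. This is a hedged illustration: it assumes the descent speed V(K) is the drop in SSE from K-R to K clusters divided by the step R, uses A(K)=V(K+1)-V(K) as stated, and picks the K that maximises V(K)-λ·K. The function names and the sample SSE values are hypothetical.

```python
def descent_speed(sse_by_k, step=1):
    """V(K): drop in SSE when going from K - step to K clusters, per unit step."""
    return {k: (sse_by_k[k - step] - sse_by_k[k]) / step
            for k in sse_by_k if k - step in sse_by_k}


def acceleration(v):
    """A(K) = V(K+1) - V(K), defined where both speeds exist."""
    return {k: v[k + 1] - v[k] for k in v if k + 1 in v}


def best_k(sse_by_k, lam=0.0):
    """K maximising V(K) - lam * K over the K values where V is defined."""
    v = descent_speed(sse_by_k)
    return max(v, key=lambda k: v[k] - lam * k)


sse_by_k = {1: 201.0, 2: 1.0, 3: 0.5, 4: 0.0}  # illustrative values only
v = descent_speed(sse_by_k)  # {2: 200.0, 3: 0.5, 4: 0.5}
a = acceleration(v)          # {2: -199.5, 3: 0.0}
print(best_k(sse_by_k, lam=1.0))  # prints: 2
```

V(K) is undefined for the smallest K in the range because it needs the SSE at K-R, so the range passed in should start one step below the smallest candidate. If no K shows a clear elbow, the claim's remedy of raising the upper limit n and clustering again still applies.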
7. A method of rapidly preparing a diagnostic data set for well conditions as claimed in claim 1, wherein: after the clustering result is output and checked, the partitioning of the data set is quickly inspected;
displaying the clustering results, including the indicator diagrams of the different classes and the distribution of the pictures;
quickly checking the subset of each indicator diagram class to find pictures that do not match the rest of the class;
if the pictures inconsistent with the classes of the indicator diagrams appear in the subset, the subset is removed from the current class, and the data set is determined.
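The manual check in claim 7 can be supported by a small bookkeeping step, sketched below. It reflects one reading of the claim, in which a reviewer flags individual pictures that do not belong to their class and those pictures are dropped from it. The function names and file names are hypothetical.

```python
from collections import defaultdict


def group_by_cluster(names, labels):
    """Map each cluster label to the list of picture names assigned to it."""
    groups = defaultdict(list)
    for name, label in zip(names, labels):
        groups[label].append(name)
    return dict(groups)


def remove_flagged(groups, flagged):
    """Drop reviewer-flagged pictures from their classes; return the cleaned set."""
    flagged = set(flagged)
    return {label: [n for n in names if n not in flagged]
            for label, names in groups.items()}


names = ["well_a.png", "well_b.png", "well_c.png", "well_d.png"]
labels = [0, 0, 1, 1]
groups = group_by_cluster(names, labels)
dataset = remove_flagged(groups, flagged=["well_b.png"])
print(dataset)  # prints: {0: ['well_a.png'], 1: ['well_c.png', 'well_d.png']}
```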
8. A system employing a method of rapid preparation of an oil well condition diagnostic data set according to any one of claims 1 to 7, characterized in that: the system comprises a data acquisition module, an indicator diagram drawing module, an indicator diagram characteristic conversion module and a characteristic vector cluster analysis and inspection module;
the data acquisition module is used for acquiring data from the big data system for working condition diagnosis in the oilfield data center;
the indicator diagram drawing module is used for preprocessing the collected working condition diagnosis data, judging whether the displacement and load data meet the requirements, and drawing the indicator diagrams;
the indicator diagram feature conversion module is used for reading an indicator diagram with Python, converting it into matrix form, and converting the feature matrix into a feature vector by a reshape method;
the feature vector cluster analysis and inspection module is used for carrying out cluster analysis on the feature vectors and checking the clustering result.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, carries out the steps of a method for fast preparation of a well condition diagnostic dataset according to any of claims 1 to 7.
10. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor carries out the steps of a method for the rapid preparation of an oil well condition diagnostic dataset according to any of claims 1 to 7.
CN202311388141.0A 2023-10-24 2023-10-24 Rapid preparation method and system of oil well working condition diagnosis data set Pending CN117668584A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311388141.0A CN117668584A (en) 2023-10-24 2023-10-24 Rapid preparation method and system of oil well working condition diagnosis data set

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311388141.0A CN117668584A (en) 2023-10-24 2023-10-24 Rapid preparation method and system of oil well working condition diagnosis data set

Publications (1)

Publication Number Publication Date
CN117668584A true CN117668584A (en) 2024-03-08

Family

ID=90079729

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311388141.0A Pending CN117668584A (en) 2023-10-24 2023-10-24 Rapid preparation method and system of oil well working condition diagnosis data set

Country Status (1)

Country Link
CN (1) CN117668584A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination