CN116682045A - Beam pumping unit fault detection method based on intelligent video analysis - Google Patents


Info

Publication number: CN116682045A
Application number: CN202310712166.5A
Authority: CN (China)
Prior art keywords: mask, pumping unit, horsehead, image, feature
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventors: 张岩, 张志祥, 肖坤, 韩非, 田枫, 董宏丽
Current and original assignee: Northeast Petroleum University (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Application filed by Northeast Petroleum University
Priority to CN202310712166.5A
Publication of CN116682045A

Classifications

    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/08 Learning methods
    • G06V10/26 Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V10/40 Extraction of image or video features
    • G06V10/764 Recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V10/774 Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting


Abstract

The invention relates to a fault detection method for a beam pumping unit based on intelligent video analysis. The method comprises: acquiring pumping unit images, establishing an instance segmentation dataset for the pumping unit horsehead, and dividing it into a training set, a validation set and a test set; establishing an instance segmentation deep learning network model; training the model with a fixed network structure, adjusting network parameters through forward propagation and back-propagation of the deep convolutional neural network over the instance segmentation dataset; establishing a dataset of horsehead movement time-series data; training a support vector machine to detect faults; and performing beam pumping unit fault detection based on intelligent video analysis on a web page, combined with the actual oilfield application scenario and an edge-cloud collaborative architecture. The invention uses no sensor in direct contact with the pumping unit, is not affected by forces in the pumping unit's motion, and does not interfere with its normal operation; being video-based, it can reproduce the scene and trace history, intuitively and clearly.

Description

Beam pumping unit fault detection method based on intelligent video analysis
Technical field:
The invention relates to the intersection of petroleum development and artificial intelligence, in particular to a fault detection method for a beam pumping unit based on intelligent video analysis.
Background art:
The beam pumping unit is the most widely used type of pumping unit in China, and its working environment and conditions are very harsh. Apart from necessary maintenance and repair, a pumping unit must work continuously 24 hours a day all year round, essentially unattended. Pumping units are mainly distributed in remote outdoor areas such as deserts, gobi and offshore fields, and must withstand extreme heat and cold, wind, frost, rain and snow. In addition, oil wells in China generally have complex geology, dispersed locations, and low crude quality with high heavy-oil and wax content. The probability of pumping unit failure during operation is therefore high. If a failure is not discovered and handled in time, a minor fault reduces crude output efficiency and affects oilfield production and economic benefit, while a severe fault can stop well production or even cause casualties.
At present, pumping unit faults are discovered mainly by maintenance personnel performing periodic field inspections around the wells, or by attendants directly reviewing surveillance video. This consumes a great deal of manpower while remaining costly, inefficient and untimely. Moreover, remote well locations, poor transportation, harsh natural environments and frequently changing working hours place a heavy burden on staff. Automatic pumping unit fault detection has therefore always been a key point in building intelligent surveillance systems for smart oilfield development.
Existing automatic fault detection methods for pumping units mainly use the indicator diagram and proceed in two steps, acquisition and analysis; the data source is single and the analysis is difficult. Indicator diagrams are acquired by direct or indirect measurement. Direct measurement uses sensors to measure the suspension point load and displacement; load-measuring equipment has developed from early surface polished rod load sensors to later downhole pump load sensors, with increasingly accurate data. Mounting a load-displacement sensor on the pumping unit yields the indicator diagram directly, but deployment is complex and the sensor is easily disturbed by external forces. Data drift is hard to avoid, which degrades the accuracy of the indicator diagram data and the subsequent analysis; continuous service life is short, and the pumping unit must be stopped for disassembly and recalibration, reducing well production efficiency and failing the persistence requirement of the digital Internet of things. Indirect measurement derives the indicator diagram from other parameters of the pumping unit system, for example tubing diagram conversion or electrical parameter inversion. Electrical parameter inversion is currently the most common indirect method, computing the indicator diagram by inversion from electrical parameters, but differences in pumping unit type, counterweight and so on strongly affect the result.
Indicator diagram analysis currently relies mainly on neural networks: the indicator diagram feature data are fed into a network with trained parameters, which automatically classifies the diagram and detects faults. However, this requires large amounts of data, and training cost and difficulty are high.
In recent years, with accelerating digital oilfield construction, camera coverage at oilfield well sites has kept increasing to prevent incidents such as oil theft, animal intrusion and oil leakage. Yet the level of video-based pumping unit fault detection remains low: traditional image processing algorithms and deep-learning object detection models extract the key components of the pumping unit with low accuracy, and vision-based analysis of pumping unit faults is lacking.
Summary of the invention:
The invention aims to provide a fault detection method for a beam pumping unit based on intelligent video analysis, to solve the problems that the indicator diagram adopted by prior-art automatic fault detection methods has a single data source and is difficult to analyze.
The technical scheme adopted to solve the technical problem is as follows: the fault detection method for the beam pumping unit based on intelligent video analysis comprises the following steps:
step one, acquiring pumping unit images and processing them to form an original dataset; augmenting the samples, establishing an instance segmentation dataset for the pumping unit horsehead, and dividing it into a training set, a validation set and a test set;
step two, establishing an instance segmentation deep learning network model G: an instance segmentation dataset image is input and features are extracted through a fully convolutional backbone network; feature maps I at layers of different size proportions are obtained through a feature pyramid; each layer's feature map I is followed by two parallel branches, a category branch and a mask branch; the category branch yields instance category predictions, the mask branch yields mask convolution kernels G and a mask feature map F, and instance masks are obtained by convolving the mask convolution kernels G with the mask feature map F; the instance categories and instance masks pass through Matrix NMS to obtain the initial instance segmentation deep learning network model G; a loss function evaluates the degree of difference between the network's predicted values and the ground truth, and minimizing this difference yields the instance segmentation deep learning network model G with a fixed network structure;
step three, training the instance segmentation deep learning network model G with fixed network structure: using the instance segmentation dataset obtained in step one, network parameters are adjusted through forward propagation and back-propagation of the deep convolutional neural network, and the adjusted parameters are saved to obtain the trained model G;
step four, establishing a dataset of horsehead movement time-series data: the trained model G from step three processes running videos of different pumping units to obtain the horsehead mask in each frame; the vertical coordinate of the center of the mask's minimum circumscribed circle forms time-series data representing horsehead movement, which is then labelled against pumping unit maintenance records to obtain the dataset of horsehead movement time-series data;
step five, training a support vector machine to detect faults;
and step six, performing beam pumping unit fault detection based on intelligent video analysis on a web page, combined with the actual oilfield application scenario and an edge-cloud collaborative architecture.
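Step four reduces each frame's horsehead mask to one scalar, the vertical coordinate of the center of the mask's minimum circumscribed circle. A minimal sketch of that reduction follows; the brute-force circle search and helper names are our own illustration (in practice a library routine such as OpenCV's cv2.minEnclosingCircle on the mask contour would be used):

```python
import numpy as np

def min_enclosing_circle(pts):
    """Brute-force minimum enclosing circle of 2-D points; adequate for
    the few pixels of one demo mask, not for production use."""
    pts = [tuple(map(float, p)) for p in pts]

    def covers(c, r):
        return all((p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 <= r * r + 1e-7
                   for p in pts)

    best, n = None, len(pts)
    # Case 1: two points span a diameter of the enclosing circle.
    for i in range(n):
        for j in range(i + 1, n):
            c = ((pts[i][0] + pts[j][0]) / 2, (pts[i][1] + pts[j][1]) / 2)
            r = (((pts[i][0] - pts[j][0]) ** 2 +
                  (pts[i][1] - pts[j][1]) ** 2) ** 0.5) / 2
            if covers(c, r) and (best is None or r < best[1]):
                best = (c, r)
    if best:
        return best
    # Case 2: the circle is the circumcircle of three boundary points.
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                (ax, ay), (bx, by), (cx, cy) = pts[i], pts[j], pts[k]
                d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
                if abs(d) < 1e-12:
                    continue  # collinear points define no circumcircle
                ux = ((ax * ax + ay * ay) * (by - cy) + (bx * bx + by * by) * (cy - ay)
                      + (cx * cx + cy * cy) * (ay - by)) / d
                uy = ((ax * ax + ay * ay) * (cx - bx) + (bx * bx + by * by) * (ax - cx)
                      + (cx * cx + cy * cy) * (bx - ax)) / d
                r = ((ax - ux) ** 2 + (ay - uy) ** 2) ** 0.5
                if covers((ux, uy), r) and (best is None or r < best[1]):
                    best = ((ux, uy), r)
    return best

def horsehead_ordinate(mask):
    """Vertical (row) coordinate of the minimum circumscribed circle's
    center: one sample of the horsehead motion time series."""
    pts = np.argwhere(mask)  # (row, col) coordinates of mask pixels
    (cy, cx), _ = min_enclosing_circle(pts)
    return cy
```

Running this over every frame of an operating video yields the time series that step four then labels against the maintenance records.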
The specific method of step one in the scheme is as follows:
After the pumping unit images are obtained, poor-quality data are cleaned out: images full of cluttered light spots, with obvious horsehead motion ghosting, or with too low a resolution. The images are then normalized, unifying their resolution by bilinear interpolation while preserving the original aspect ratio, with the deficit padded with black borders. The horsehead mask on each image is annotated with the polygon tool of Python's labelme library. After the horsehead mask label files are obtained, they are converted together with the image files into MS-COCO format to form the original dataset. Samples are augmented by mirror transformation, random rotation, random erasing and brightness transformation, and divided into training, validation and test sets in the ratio 8:1:1.
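The aspect-preserving normalization and the 8:1:1 split can be sketched as follows; the target resolution of 512 is a hypothetical choice (the text does not state one), and the real pipeline would additionally write the MS-COCO annotation files:

```python
import numpy as np

def bilinear_resize(img, new_h, new_w):
    """Plain-numpy bilinear interpolation for a grayscale image (H, W)."""
    h, w = img.shape
    ys = np.linspace(0, h - 1, new_h)
    xs = np.linspace(0, w - 1, new_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

def letterbox(img, size=512):
    """Resize preserving aspect ratio; pad the deficit with black borders."""
    h, w = img.shape
    scale = min(size / h, size / w)
    nh, nw = int(round(h * scale)), int(round(w * scale))
    canvas = np.zeros((size, size), dtype=np.float64)
    top, left = (size - nh) // 2, (size - nw) // 2
    canvas[top:top + nh, left:left + nw] = bilinear_resize(img, nh, nw)
    return canvas

def split_8_1_1(n_samples, seed=0):
    """Shuffle sample indices and split them 8:1:1 into train/val/test."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    n_tr, n_va = int(0.8 * n_samples), int(0.1 * n_samples)
    return idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]
```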
The specific method of step two in the scheme is as follows:
(2.1) an instance segmentation dataset image is input, and features are extracted through the fully convolutional backbone network: the original feature map with c channels is split into s groups of c/s channels each, and the first group's channels are reserved until the final concatenation; from the second group onward, after each group's features are extracted with a 3×3 convolution, half are kept for the final concatenation and the other half are concatenated with the next group before features are again extracted by convolution; at the last group, all extracted features are concatenated with the features set aside earlier;
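The channel bookkeeping of this split-and-fuse scheme can be illustrated with a stand-in for the 3×3 convolution (identity here): a minimal numpy sketch that only shows how channels flow between groups, not a trainable layer:

```python
import numpy as np

def split_fuse(x, s=4, conv=lambda t: t):
    """x: feature map (C, H, W) with C divisible by s; `conv` stands in
    for the per-group 3x3 convolution of the backbone."""
    groups = np.split(x, s, axis=0)
    kept = [groups[0]]                    # group 1 goes straight to the output
    carry = np.empty((0,) + x.shape[1:])  # half-features passed to next group
    for i, g in enumerate(groups[1:], start=2):
        feat = conv(np.concatenate([carry, g], axis=0))
        if i < s:                         # keep half, pass half onward
            half = feat.shape[0] // 2
            kept.append(feat[:half])
            carry = feat[half:]
        else:                             # last group: keep everything
            kept.append(feat)
    return np.concatenate(kept, axis=0)
```

With the identity stand-in the channels pass through unchanged, which makes the plumbing easy to verify; a real backbone would make `conv` a learned layer that also fixes the output channel count.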
(2.2) feature maps I at layers of different size proportions are obtained through the feature pyramid, and each layer's feature map I is followed by two parallel branches, a category branch and a mask branch. The category branch divides I into an S×S grid of C-dimensional cells, C being the number of semantic categories; if the center of an instance target falls in a cell, that cell predicts the target's semantic category.
The mask branch first uses CoordConv to add coordinate positioning information to the feature map I. In the kernel branch of the mask branch, at most S×S convolution kernels G are dynamically predicted, D being the number of parameters of G: D = E if G is a 1×1 convolution, and D = 9E if G is a 3×3 convolution. In the feature branch of the mask branch, one mask feature map F is predicted uniformly for all levels; F contains information from feature maps at different levels, its shape is H×W×E, where H and W are obtained by scaling down the length and width of the input image by a fixed ratio and E is the number of feature map channels. The mask convolution kernels G are then convolved with the mask feature map F to obtain at most S×S instance masks;
(2.3) Matrix NMS computes the decay factors of all generated masks in parallel on the GPU. The decay factor of the j-th mask m_j is:

decay_j = min over all i with s_i > s_j of [ f(iou_{i,j}) / f(iou_{·,i}) ]   (1)

In formula (1), s_j is the confidence of mask m_j; f(iou_{i,j}) is the penalty on m_j computed from the intersection-over-union iou_{i,j} of m_i and m_j, in the linear case equal to 1 - iou_{i,j}, which multiplied by s_j suppresses the confidence that m_j points to the instance; and f(iou_{·,i}) is the probability that m_i is itself suppressed, computed as:

f(iou_{·,i}) = min over all k with s_k > s_i of f(iou_{k,i})   (2)

Formula (2) is the minimum suppression factor f computed from the IoU of m_i with the higher-confidence masks. Because f is a decreasing function, a Gaussian function may also be used:

f(iou_{i,j}) = exp(-iou_{i,j}² / σ)   (3)

In formula (3), σ takes the default value 0.5.
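The Gaussian variant of formulas (1) to (3) can be sketched in numpy, vectorized over a batch of masks; as is standard for this technique, the suppressed-probability term f(iou_{·,i}) is approximated by the single most-overlapping higher-confidence mask:

```python
import numpy as np

def matrix_nms(masks, scores, sigma=0.5):
    """masks: (N, H, W) binary arrays; scores: (N,) confidences.
    Returns the Matrix-NMS-decayed scores in the input order."""
    order = np.argsort(scores)[::-1]           # descending confidence
    flat = masks[order].reshape(len(scores), -1).astype(float)
    inter = flat @ flat.T
    areas = flat.sum(axis=1)
    union = areas[:, None] + areas[None, :] - inter
    # keep only pairs (higher-scored i, lower-scored j)
    iou = np.triu(inter / np.maximum(union, 1e-9), k=1)
    comp = iou.max(axis=0)                     # f's argument iou_{.,i}
    decay = np.exp(-(iou ** 2) / sigma) / np.exp(-(comp[:, None] ** 2) / sigma)
    factor = np.minimum(decay.min(axis=0), 1.0)  # decay_j = min over i
    out = np.empty_like(scores, dtype=float)
    out[order] = scores[order] * factor
    return out
```

A duplicate of a higher-confidence mask is decayed almost to zero, while non-overlapping masks keep their scores, so a single threshold afterwards keeps one mask per instance.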
(2.4) the loss function evaluates the degree of difference between the network's predictions of semantic category and mask and the ground truth:

L = L_cate + λ·L_mask   (4)

In formula (4), λ is a weight parameter, set to 3. L_cate, the loss for semantic category prediction, uses Focal Loss to handle the extreme imbalance between positive and negative samples during training; L_mask is used for mask prediction:

L_mask = (1 / N_pos) · Σ_k 1{p*_k > 0} · d_mask(m_k, m*_k)   (5)

In formula (5), k indexes the k-th cell of the S×S grid in which the mask center falls, counted from left to right and top to bottom; N_pos is the number of positive samples; p*_k and m*_k are the ground-truth semantic category confidence of the mask in the k-th cell and the ground-truth mask whose center falls in the k-th cell; 1{·} is the indicator function, taking 1 when p*_k > 0 and 0 otherwise; d_mask is the Dice Loss measuring the difference between the generated mask and the ground-truth mask:

d_mask = 1 - D(p, q),  D(p, q) = 2·Σ_{x,y}(p_{x,y}·q_{x,y}) / (Σ_{x,y} p_{x,y}² + Σ_{x,y} q_{x,y}²)   (6)

In formula (6), p_{x,y} and q_{x,y} are the grayscale values of the generated mask and the ground-truth mask at the grid position of row x, column y.
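Formula (6)'s Dice distance d_mask is near zero when the predicted and ground-truth masks agree and approaches one when they are disjoint; a minimal numpy version:

```python
import numpy as np

def dice_loss(p, q, eps=1e-9):
    """d_mask = 1 - Dice(p, q) for soft masks p, q in [0, 1], equal shape."""
    inter = float((p * q).sum())
    denom = float((p * p).sum() + (q * q).sum())
    return 1.0 - 2.0 * inter / (denom + eps)
```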
The specific method of step five in the scheme is as follows:
First, frequency-domain features are extracted from the variable-length time-series samples by Fourier transform, and the amplitudes of the first 4 frequency components are used to construct classification feature vectors of uniform dimension. Then a support vector machine based on a Gaussian kernel function is tuned by grid search, and the optimal classifier is trained to detect pumping unit faults. The parameters to be optimized are mainly the penalty factor C and the kernel function parameter g. The parameter C comes from the soft-margin objective:

min over ω, b, ξ of (1/2)·||ω||² + C·Σ_{i=1}^{N} ξ_i, subject to y_i(ω^T x_i + b) ≥ 1 - ξ_i, ξ_i ≥ 0   (7)

Formula (7) contains the hinge-loss-style slack term quantifying the degree to which samples are misclassified; the larger C is, the more heavily misclassification is penalized and the more tightly the model fits the training data. Here ω^T x + b = 0 describes the classification hyperplane of the support vector machine, ω^T is the transpose of the weight vector ω, ||ω|| is the norm of ω, N is the number of samples, and ξ_i is the slack variable introduced for each sample. The parameter g comes from the Gaussian radial basis function:

K(x_i, x_j) = exp(-g·||x_i - x_j||²)   (8)

Formula (8) obtains the inner product of high-dimensional vectors by computation in the low-dimensional space, greatly reducing the computational load. The Gaussian radial basis function is introduced because the support vector machine maps samples into a high-dimensional space to find the optimal classification hyperplane, where direct dot products between sample points would be too expensive. Here ||x_i - x_j||² is the squared Euclidean distance between two feature vectors, and g is a hyperparameter requiring optimization.
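The fixed-dimension frequency feature of step five can be sketched as follows; removing the DC component and normalizing by series length are our assumptions (the text states only that the amplitudes of the first 4 frequency components are used):

```python
import numpy as np

def fft_features(ts, k=4):
    """Amplitudes of the first k non-DC frequency components of a 1-D
    time series, scaled by series length so that samples of different
    lengths are comparable."""
    ts = np.asarray(ts, dtype=float)
    spec = np.abs(np.fft.rfft(ts - ts.mean()))  # mean removal drops DC
    return spec[1:k + 1] / len(ts)
```

The resulting 4-vectors would then feed a Gaussian-kernel SVM, for example scikit-learn's SVC with GridSearchCV over C and gamma, matching the C and g of formulas (7) and (8).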
The beneficial effects are that:
1. The invention uses oilfield well-site cameras to perform pumping unit fault detection based on intelligent video analysis. First, a pumping unit horsehead dataset is prepared and an image instance segmentation network model is trained; the model yields the horsehead mask, and a fixed point coordinate on the mask forms time-series data reflecting the pumping unit's working condition. Next, the signatures of different working conditions in the time-series data are analyzed to form a pumping unit fault sample set based on intelligent video analysis. A time-series classification algorithm then detects pumping unit faults. Finally, fault detection is carried out in combination with the actual oilfield application scenario. The invention raises the level of video-based pumping unit fault detection and strengthens the reliability of the pumping unit fault detection function in the oilfield Internet of things.
2. The invention uses no sensor in direct contact with the pumping unit, is not affected by forces in the pumping unit's motion, and does not interfere with its normal operation.
3. The instance segmentation network segments horsehead masks well, and its inference speed meets real-time requirements; the support vector machine achieves high detection accuracy and good practical value; the invention is implemented in combination with actual oilfield needs, its functions cover all aspects of pumping unit fault management, and its performance meets practical use requirements.
4. From a visual standpoint, the invention uses artificial intelligence to deeply mine the value of surveillance video, turning passive monitoring into intelligent active monitoring. Deployment and maintenance require no pumping unit shutdown, do not affect production, cost little, and save a large amount of manpower and material resources. Based on video recording, the scene can be reproduced and history traced, intuitively and clearly.
Description of the drawings:
FIG. 1 is a flow chart of the solution of the invention;
FIG. 2 is a block diagram of an example split network model set forth in the present invention;
FIG. 3 is a backbone network block diagram of an example split network;
FIG. 4 is a spectrum diagram obtained by Fourier transform;
FIG. 5 is a system main interface diagram of the present invention;
FIG. 6 shows six typical horsehead shapes contained in the pumping unit horsehead instance segmentation dataset;
FIG. 7 is a graph of horsehead movement timing data for a pumping unit in a normal state;
FIG. 8 is a graph of horsehead movement timing data for a pumping unit belt slip;
FIG. 9 is a graph of horsehead movement timing data for a pumping unit at a stop;
fig. 10 is a graph of horsehead movement timing data when the pumping unit horsehead is tipped.
The specific embodiments are as follows:
The invention is further described below with reference to the accompanying drawings:
Referring to FIGS. 1-4, the fault detection method for the beam pumping unit based on intelligent video analysis comprises: preparing a pumping unit horsehead dataset, training an image instance segmentation network model, using the model to obtain the horsehead mask, and fixing a point coordinate on the mask to form time-series data reflecting the pumping unit's working condition; then analyzing the signatures of different working conditions in the time-series data to form a pumping unit fault sample set based on intelligent video analysis; then detecting pumping unit faults with a time-series classification algorithm; and finally carrying out pumping unit fault detection in combination with the actual oilfield application scenario. The method comprises the following steps:
step one: preparing the pumping unit horsehead instance segmentation dataset:
At present there is no publicly available instance segmentation dataset of the pumping unit horsehead, so a complete instance segmentation dataset must first be built for the horsehead before model training begins.
After pumping unit images are collected by web crawling and field photography, poor-quality data are first cleaned out, such as images full of cluttered light spots, with obvious horsehead motion ghosting, or with too low a resolution. The remaining clear images are normalized, unifying their resolution by bilinear interpolation while preserving the original aspect ratio, with the deficit padded with black borders. The horsehead mask on each image is then annotated with the polygon tool of Python's labelme library. After the horsehead mask label files are obtained, they are converted together with the image files into MS-COCO format to form the original dataset. Samples are augmented by operations such as mirror transformation, random rotation, random erasing and brightness transformation, and divided into training, validation and test sets in the ratio 8:1:1.
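The augmentation operations listed above can be illustrated on a grayscale array; 90-degree rotation here stands in for arbitrary-angle rotation (which needs interpolation, e.g. scipy.ndimage.rotate), and in the real pipeline the same geometric transforms must also be applied to the mask annotations:

```python
import numpy as np

def mirror(img):
    """Horizontal mirror transformation."""
    return img[:, ::-1].copy()

def rotate(img, k=1):
    """Rotation in 90-degree steps (stand-in for random-angle rotation)."""
    return np.rot90(img, k).copy()

def brightness(img, factor):
    """Brightness transformation, clipped to the 8-bit range."""
    return np.clip(img * factor, 0, 255)

def random_erase(img, rng, frac=0.2):
    """Zero out a random rectangle covering `frac` of each side length."""
    h, w = img.shape
    eh, ew = max(1, int(h * frac)), max(1, int(w * frac))
    y = rng.integers(0, h - eh + 1)
    x = rng.integers(0, w - ew + 1)
    out = img.copy()
    out[y:y + eh, x:x + ew] = 0
    return out
```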
Step two: example segmentation deep learning network model G structure setup:
the input image is first characterized by a full convolutional network (fully convolutional network, FCN), i.e., a backbone network. Because the camera head placed on the oil extraction site has a plurality of tasks, the distance between the placement point and the pumping unit is difficult to be fixed and consistent, so that the horseheads of the pumping unit needing to extract masks are different in size, and the characteristic extraction capacity of the backbone network in the example segmentation network is required to be improved, so that the multi-channel fused backbone network is used. The original feature map with c channels is split into s groups of c/s channels each, and then the channels of the first group are reserved for final splicing. And starting from the second group, after each group uses a 3x3 convolution to extract features, half is left to the final splice, and the other half is spliced with the next group to extract features using the convolution. Half of the obtained features are left to be spliced finally, and the other half of the obtained features are sent to the next group of extracted features. Until the last group, all the extracted features are stitched with the features previously separated.
Then feature maps I at layers of different size proportions are obtained through the feature pyramid network (FPN), and each layer's I is followed by two parallel branches: a category branch and a mask branch. The category branch divides I into an S×S grid of C-dimensional cells, C being the number of semantic categories; if an instance target's center falls in a cell, that cell predicts the target's semantic category. The parameter S determining the number of grid cells differs between levels, and small-size instances are subdivided by more cells at the larger levels. In the mask branch, CoordConv is first used to add coordinate positioning information to the feature map. In the kernel branch of the mask branch, at most S×S convolution kernels G are dynamically predicted, D being the number of parameters of G: D = E if G is a 1×1 convolution, and D = 9E if G is a 3×3 convolution. In the feature branch of the mask branch, one mask feature map F is predicted uniformly for all levels; F contains information from feature maps at different levels, its shape is H×W×E, where H and W are obtained by scaling down the length and width of the input image by a fixed ratio and E is the number of feature map channels. Convolving the mask convolution kernels G with the mask feature map F then yields at most S×S instance masks. Several generated masks may point to the same instance, and only the most reliable one should be kept: the NMS method takes mask confidence as the criterion and removes (suppresses) masks whose intersection-over-union (IoU) is too high, leaving the mask most likely to point to the instance.
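For the 1×1 case (D = E), convolving a dynamically predicted kernel with F reduces to a per-kernel dot product over the E channels; a numpy sketch of this dynamic mask head:

```python
import numpy as np

def dynamic_mask_head(F, G):
    """F: mask feature map (H, W, E); G: K dynamically predicted 1x1
    kernels, one row of E weights each (D = E parameters per kernel).
    Returns K instance mask logits of shape (K, H, W)."""
    return np.einsum('hwe,ke->khw', F, G)
```

A 3×3 kernel (D = 9E) would instead slide over a 3×3 neighbourhood; the logits are then turned into masks by a sigmoid and threshold.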
The Matrix NMS calculates the decay factors of all generated masks in parallel on the GPU. The decay factor of the j-th mask $m_j$ is

$$\mathrm{decay}_j = \min_{\forall s_i > s_j} \frac{f(iou_{i,j})}{f(iou_{\cdot,i})}$$

where $s_i$ is the confidence of the i-th mask, and $f(iou_{i,j})$ is the penalty on $m_j$ computed from the IoU $iou_{i,j}$ of $m_i$ and $m_j$; in the linear form $f(iou_{i,j}) = 1 - iou_{i,j}$, which, multiplied into $s_j$, suppresses the confidence that $m_j$ points to the instance. The term $f(iou_{\cdot,i})$ is the most probable penalty that $m_i$ itself has received, calculated as

$$f(iou_{\cdot,i}) = \min_{\forall s_k > s_i} f(iou_{k,i})$$

This is the numerically smallest suppression factor f computed from the IoU of $m_i$ with higher-confidence masks. Since f is a decreasing function, a Gaussian function can also be used, calculated as

$$f(iou_{i,j}) = \exp\!\left(-\frac{iou_{i,j}^{2}}{\sigma}\right)$$

where σ takes the default value 0.5. In this way, the Matrix NMS computes all decay factors in parallel on the GPU, greatly improving NMS speed while maintaining high accuracy.
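The decay computation above can be sketched as one vectorized pass (Gaussian form, σ = 0.5). This is a dependency-free NumPy sketch under assumed shapes, not the patented GPU implementation:

```python
import numpy as np

def matrix_nms(masks, scores, sigma=0.5):
    """Gaussian Matrix NMS: decay all mask confidences in one vectorized pass.

    masks:  (N, H, W) binary masks, assumed sorted by descending confidence
    scores: (N,) confidences s_i
    returns scores multiplied by their decay factors
    """
    n = len(scores)
    flat = masks.reshape(n, -1).astype(float)
    inter = flat @ flat.T
    areas = flat.sum(axis=1)
    union = areas[:, None] + areas[None, :] - inter
    iou = np.triu(inter / np.maximum(union, 1e-9), k=1)  # iou[i, j] with s_i > s_j
    cmax = iou.max(axis=0)          # max IoU each mask has with higher-scored masks
    f = lambda x: np.exp(-(x ** 2) / sigma)              # Gaussian decay function
    decay = (f(iou) / f(cmax)[:, None]).min(axis=0)      # min_i f(iou_ij)/f(iou_.,i)
    return scores * decay
```

A duplicate of a higher-scoring mask is decayed by roughly exp(-1/σ), while a non-overlapping mask keeps its score.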
In order to optimize the network parameters through backpropagation during deep neural network training, a loss function is also needed to evaluate the difference between the network's predicted values and the true values of the semantic category and the mask. The calculation formula is
$$L = L_{cate} + \lambda L_{mask}$$
In the formula, λ is a weight parameter, set to 3. The semantic-category loss $L_{cate}$ uses Focal Loss to address the extreme imbalance between positive and negative samples during training; $L_{mask}$ is the mask-prediction loss, calculated as
$$L_{mask} = \frac{1}{N_{pos}} \sum_{k} \mathbb{1}\{p^{*}_{i,j} > 0\}\, d_{mask}(m_k, m^{*}_k)$$

where k = i·S + j refers to the k-th cell of the S×S grid in which the mask center falls, counted left to right and top to bottom; $N_{pos}$ is the number of positive samples; $p^{*}_{i,j}$ and $m^{*}_k$ are the ground-truth semantic category confidence of the mask in the k-th cell and the ground-truth mask whose center falls in the k-th cell; $\mathbb{1}$ is the indicator function, taking the value 1 if $p^{*}_{i,j} > 0$ and 0 otherwise; $d_{mask}$ is the Dice Loss used to calculate the difference between the generated mask and the ground-truth mask, with the formula

$$d_{mask} = 1 - \frac{2\sum_{x,y} p_{x,y}\, q_{x,y}}{\sum_{x,y} p_{x,y}^{2} + \sum_{x,y} q_{x,y}^{2}}$$

where $p_{x,y}$ and $q_{x,y}$ are the grayscale values of the generated mask and the ground-truth mask at row x, column y of the grid.
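The Dice Loss term above can be written directly from its formula. A minimal sketch (the small `eps` is an added safeguard against empty masks, not part of the formula):

```python
import numpy as np

def dice_loss(p, q, eps=1e-6):
    """Dice Loss d_mask between generated mask p and ground-truth mask q.

    p, q: 2-D arrays of grayscale/probability values in [0, 1].
    """
    num = 2.0 * (p * q).sum()
    den = (p ** 2).sum() + (q ** 2).sum() + eps  # eps avoids division by zero
    return 1.0 - num / den
```

Identical masks give a loss near 0; disjoint masks give a loss of 1.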
Step three: training a network model:
Train the optimized instance segmentation network model using the training and validation sets of the horsehead instance segmentation dataset obtained in step one, adjust the network parameters through forward propagation and backward feedback of the deep convolutional neural network, and save the adjusted network parameters.
Before training, the model's training parameters are initialized. Stochastic gradient descent is selected as the optimization strategy, with the initial learning rate set to 0.0025, the momentum parameter set to 0.9, and the regularization weight-decay factor set to 0.0001. The batch size is 3; 64 epochs are trained, with training information recorded for each epoch. A linear learning-rate warm-up is used: starting from a warm-up ratio of 0.01, the learning rate is increased gradually over the first 500 iterations, and it is then reduced at the 27th and 56th epochs to accelerate convergence.
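The schedule above — linear warm-up over 500 iterations, then step decay at epochs 27 and 56 — can be sketched as a small function. The decay factor 0.1 is an assumption (the text does not state it); everything else follows the stated hyperparameters:

```python
def lr_at(iteration, epoch, base_lr=0.0025, warmup_iters=500,
          warmup_ratio=0.01, decay_epochs=(27, 56), gamma=0.1):
    """Learning rate at a given global iteration and epoch.

    Linear warm-up from warmup_ratio * base_lr over the first
    warmup_iters iterations, then step decay by gamma (assumed 0.1)
    at each epoch in decay_epochs.
    """
    lr = base_lr
    for e in decay_epochs:
        if epoch >= e:
            lr *= gamma
    if iteration < warmup_iters:
        alpha = iteration / warmup_iters          # 0 -> 1 across the warm-up
        lr *= warmup_ratio * (1 - alpha) + alpha  # linear ramp
    return lr
```

A training loop would call `lr_at` once per iteration before the optimizer step.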
Step four: preparation of data set of horsehead movement time series data
Process the operation videos of different pumping units with the model trained in step three and obtain the horsehead mask in each frame; form time-series data representing horsehead motion from the vertical coordinate of the center of the mask's minimum circumscribed circle. Then, in combination with the pumping-unit maintenance records, label the time-series data to obtain a dataset of horsehead motion time-series data, which is divided into a training set and a test set at a ratio of 8:2.
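Production code would typically obtain the minimum circumscribed circle of a mask contour with OpenCV's `cv2.minEnclosingCircle`; as a dependency-free stand-in, a brute-force minimum enclosing circle over a small point set illustrates the per-frame computation whose center's vertical coordinate forms the time series:

```python
import numpy as np
from itertools import combinations

def _circle2(a, b):
    """Circle with segment ab as diameter."""
    c = (a + b) / 2.0
    return c, np.linalg.norm(a - b) / 2.0

def _circle3(a, b, c):
    """Circumcircle of three points (None if collinear)."""
    d = 2.0 * (a[0] * (b[1] - c[1]) + b[0] * (c[1] - a[1]) + c[0] * (a[1] - b[1]))
    if abs(d) < 1e-12:
        return None
    ux = ((a @ a) * (b[1] - c[1]) + (b @ b) * (c[1] - a[1]) + (c @ c) * (a[1] - b[1])) / d
    uy = ((a @ a) * (c[0] - b[0]) + (b @ b) * (a[0] - c[0]) + (c @ c) * (b[0] - a[0])) / d
    center = np.array([ux, uy])
    return center, np.linalg.norm(a - center)

def min_enclosing_circle(points):
    """Brute-force minimum enclosing circle of a small 2-D point set.

    Checks every circle defined by a pair (diameter) or triple
    (circumcircle) of points and keeps the smallest one that
    contains all points. O(n^4): fine for short contours only.
    """
    pts = [np.asarray(p, float) for p in points]
    cands = [_circle2(p, q) for p, q in combinations(pts, 2)]
    cands += [c for c in (_circle3(p, q, r)
                          for p, q, r in combinations(pts, 3)) if c]
    best = None
    for center, r in cands:
        if all(np.linalg.norm(p - center) <= r + 1e-9 for p in pts):
            if best is None or r < best[1]:
                best = (center, r)
    return best
```

Per frame, the y-coordinate of the returned center is one sample of the horsehead-motion series.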
Step five: training support vector machine to detect faults
First, frequency-domain features are extracted from time-series data samples of different lengths using the Fourier transform; the frequency-domain features of each sample are analyzed statistically, and the first 4 parameters are taken to construct feature vectors of uniform dimension for classification. Then, a grid search is used to optimize the parameters of a support vector machine based on a Gaussian kernel function, and the optimal classifier is trained to detect pumping-unit faults. The parameters to be optimized are mainly the penalty factor C and the kernel parameter g; C comes from the formula:
$$\min_{\omega,\, b,\, \xi}\ \frac{1}{2}\|\omega\|^{2} + C\sum_{i=1}^{N}\xi_i \qquad \text{s.t.}\ \ y_i(\omega^{T} x_i + b) \ge 1 - \xi_i,\ \ i = 1, 2, \ldots, N,\ \ \xi_i \ge 0$$
This formula quantifies the degree to which samples are misclassified: the larger C is, the less misclassification is tolerated, and the harder the model is to train. Here $\omega^{T} x + b = 0$ describes the classification hyperplane of the support vector machine, $\omega^{T}$ is the transpose of the parameter vector ω, and $\|\omega\|$ is the norm of ω. N is the number of samples, and $\xi_i$ is a slack variable introduced for each sample. g comes from the Gaussian radial basis function, whose formula is:
the formula obtains the inner product value of the high-dimensional vector by calculation in a low-dimensional space, thereby greatly reducing the calculation amount. This gaussian radial basis function is introduced because the support vector machine maps the samples into a high dimensional space where the optimal classification hyperplane is found, but in a high dimensional space where the computation of the sample point multiplication is too large. Wherein, the liquid crystal display device comprises a liquid crystal display device, ||x i -x j || 2 For the squared euclidean distance between two feature vectors, g is a super parameter, requiring optimization.
Step six: and (3) performing fault detection on the beam pumping unit:
Based on the methods completed in steps one through five, and in combination with the actual oilfield application scenario, the fault detection system of the beam pumping unit based on intelligent video analysis is implemented. Built on an edge-cloud collaborative architecture and developed with C, Python, Java and NodeJS, it runs on the edge devices and the server, and pumping-unit faults are managed through a web page.
Example 1:
referring to fig. 5-10, the fault detection method for the beam pumping unit based on intelligent video analysis comprises six parts of preparation of a pumping unit horsehead example segmentation data set, example segmentation deep learning network model structure setting, network model training, preparation of a horsehead movement time sequence data set, and fault detection by training a support vector machine, so as to realize a fault detection system for the beam pumping unit based on intelligent video analysis. Firstly, preparing a pumping unit horsehead example segmentation data set to obtain a pumping unit horsehead example segmentation data set in an MS-COCO format; secondly, setting the structure of an instance segmentation deep learning network model; then training a network model; then preparing a data set of horsehead movement time sequence data; and then training the support vector machine to detect faults by using the time sequence data set obtained in the previous step. Finally, based on the steps, the fault detection system of the beam pumping unit based on intelligent video analysis is realized by combining the actual application scene of the oil field.
The method comprises the following steps:
1. preparation of a pumping unit horsehead example segmentation data set:
1.1 Beam-pumping unit image preprocessing
A program written with the Python crawler library Scrapy was used to crawl the search results for "pumping unit" on Baidu Images and download the pictures from the search pages. In addition, a variety of pumping unit images were taken at the oilfield with a mobile phone.
Then, the crawled images are cleaned. First, images not showing beam pumping units, cartoon images, model images, hand-drawn images, images without a horsehead and some advertisement images are removed; then images of poor quality are cleaned out, such as images full of chaotic light spots, images with obvious horsehead motion ghosting, and images of excessively low resolution. The final images contain 6 typical horsehead shapes of pumping units.
The images are then normalized. A bilinear interpolation function from Python's OpenCV library is used to unify the image resolution while preserving the original aspect ratio, and black borders are filled in where the image falls short.
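This aspect-ratio-preserving resize with black padding (a "letterbox") can be sketched without OpenCV; the bilinear interpolation here is a plain-NumPy stand-in for OpenCV's, shown for a grayscale image:

```python
import numpy as np

def bilinear_resize(img, nh, nw):
    """Bilinear resize of a 2-D (grayscale) image with NumPy."""
    h, w = img.shape
    ys = np.linspace(0, h - 1, nh)
    xs = np.linspace(0, w - 1, nw)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    a = img[np.ix_(y0, x0)]; b = img[np.ix_(y0, x1)]
    c = img[np.ix_(y1, x0)]; d = img[np.ix_(y1, x1)]
    return (a * (1 - wy) * (1 - wx) + b * (1 - wy) * wx
            + c * wy * (1 - wx) + d * wy * wx)

def letterbox(img, size):
    """Resize keeping the aspect ratio, pad the deficit with black borders."""
    h, w = img.shape
    scale = min(size / h, size / w)
    nh, nw = max(1, round(h * scale)), max(1, round(w * scale))
    resized = bilinear_resize(img, nh, nw)
    canvas = np.zeros((size, size), dtype=resized.dtype)  # black background
    top, left = (size - nh) // 2, (size - nw) // 2        # center the content
    canvas[top:top + nh, left:left + nw] = resized
    return canvas
```

With OpenCV available, `cv2.resize(..., interpolation=cv2.INTER_LINEAR)` plus `cv2.copyMakeBorder` performs the same normalization.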
1.2 sample labeling and Format conversion
Each horsehead mask on each beam pumping unit image was labeled with the polygon tool of Python's labelme library. After the horsehead mask label files are obtained, they are converted, together with the image files, into MS-COCO format to form the original dataset.
1.3 sample amplification and partitioning
The samples were augmented using Python's imgaug library with mirror transformations, random rotations, random holes and brightness transformations, and then divided into a training set, a validation set and a test set at a ratio of 8:1:1.
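The listed augmentations can be sketched in plain NumPy (imgaug supports arbitrary-angle rotation; 90-degree steps are used here only to keep the sketch dependency-free). Geometric transforms are applied to the mask as well, so the labels stay aligned:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img, mask):
    """Mirror, rotation, random hole (cutout) and brightness jitter.

    img, mask: 2-D arrays of the same square shape; the mask follows
    every geometric transform but keeps its original values.
    """
    if rng.random() < 0.5:                        # horizontal mirror
        img, mask = img[:, ::-1], mask[:, ::-1]
    k = int(rng.integers(0, 4))                   # random 90-degree rotation
    img, mask = np.rot90(img, k), np.rot90(mask, k)
    h, w = img.shape
    y, x = rng.integers(0, h - 4), rng.integers(0, w - 4)
    img = img.copy()
    img[y:y + 4, x:x + 4] = 0                     # random hole, image only
    img = np.clip(img * rng.uniform(0.8, 1.2), 0, 255)  # brightness jitter
    return img, mask.copy()
```

Each call yields one additional training sample from an original image/mask pair.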
2. Example segmentation deep learning network model G structure setup:
2.1 statistical analysis of MS COCO data sets
Statistical analysis of the center distances and size ratios of target pairs in the MS COCO dataset shows that, in most cases, the instances in an image differ either in center position or in size, so the instance segmentation deep learning network distinguishes instances directly by center position and size. Instance sizes are distinguished by the feature pyramid network (FPN); instance center positions are distinguished by dividing each feature map output by the FPN into an S×S grid. A center sampling method is used: the center region is a rectangle around the centroid of the ground-truth mask, with width and height 0.2 times those of the mask, so each ground-truth mask has on average 3 positive samples. If an instance's center falls into a grid cell, that cell infers the instance's semantic category and mask in parallel through the category branch and the mask branch.
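The center-sampling assignment above can be sketched as a small function: it maps the 0.2× center region of a ground-truth mask onto the S×S grid and returns the cells treated as positives. An illustrative sketch, not the exact assignment rule of the patent:

```python
import numpy as np

def positive_cells(mask, S):
    """Grid cells assigned as positives for one ground-truth binary mask.

    The center region is a rectangle around the mask centroid with
    0.2x the mask's width and height; every SxS cell overlapping
    that region is a positive sample.
    """
    ys, xs = np.nonzero(mask)
    cy, cx = ys.mean(), xs.mean()              # mask centroid
    h = ys.max() - ys.min() + 1                # mask extent
    w = xs.max() - xs.min() + 1
    H, W = mask.shape
    top    = int((cy - 0.1 * h) / H * S)       # center region -> grid coords
    bottom = int((cy + 0.1 * h) / H * S)
    left   = int((cx - 0.1 * w) / W * S)
    right  = int((cx + 0.1 * w) / W * S)
    return [(i, j)
            for i in range(max(0, top), min(S - 1, bottom) + 1)
            for j in range(max(0, left), min(S - 1, right) + 1)]
```

A small, centered mask maps to a single grid cell; a large mask's center region spans several cells, giving the multiple positive samples described above.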
2.2 backbone network architecture
Because the cameras placed at the oil extraction site serve multiple tasks, it is difficult to keep the distance between the camera and the pumping unit fixed and consistent, so the pumping-unit horseheads whose masks must be extracted vary in size. The feature extraction capability of the backbone network in the instance segmentation network therefore needs to be improved, and a multi-channel-fusion backbone is used. The original feature map with c channels is split into s groups of c/s channels each, and the channels of the first group are reserved for the final concatenation. Starting from the second group, after each group's features are extracted with a 3×3 convolution, half are kept for the final concatenation and the other half are concatenated with the next group before its convolutional feature extraction. This repeats until the last group, after which all the extracted features are concatenated with the features set aside earlier.
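The channel-split cascade just described (a Res2Net-style block) can be sketched structurally in NumPy. The learned 3×3 convolutions are replaced here by a box filter placeholder, so only the split/keep-half/feed-forward wiring is demonstrated, not real feature extraction; c divisible by s is assumed:

```python
import numpy as np

def conv3x3(x):
    """Placeholder 3x3 'convolution': a box filter with edge padding."""
    p = np.pad(x, ((0, 0), (1, 1), (1, 1)), mode="edge")
    return sum(p[:, i:i + x.shape[1], j:j + x.shape[2]]
               for i in range(3) for j in range(3)) / 9.0

def multi_scale_block(x, s=4):
    """Channel-split cascade: x is (c, H, W), split into s groups.

    Group 1 passes straight to the final concat; each later group is
    convolved, half its output kept, half fed into the next group;
    the last group contributes all of its output.
    """
    groups = np.array_split(x, s, axis=0)
    outs = [groups[0]]
    carry = None
    for idx, g in enumerate(groups[1:], start=2):
        inp = g if carry is None else np.concatenate([g, carry], axis=0)
        feat = conv3x3(inp)
        if idx == s:                 # last group: keep everything
            outs.append(feat)
        else:
            half = feat.shape[0] // 2
            outs.append(feat[:half])
            carry = feat[half:]      # forwarded to the next group
    return np.concatenate(outs, axis=0)
```

Each group thus sees receptive fields grown by the convolutions of all previous groups, which is what improves multi-scale feature extraction.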
2.3 improving the inference speed
By using the Matrix NMS, the overhead of mask NMS during inference is significantly reduced through parallel computation, thereby improving inference speed.
3. Training a network model:
Train the optimized instance segmentation network model using the training and validation sets of the pumping unit horsehead instance segmentation dataset obtained in step one. Before training, the model's training parameters are initialized. Stochastic gradient descent is selected as the optimization strategy, with the initial learning rate set to 0.0025, the momentum parameter set to 0.9, and the regularization weight-decay factor set to 0.0001. The batch size is 3; 64 epochs are trained, with training information recorded for each epoch. A linear learning-rate warm-up is used: starting from a warm-up ratio of 0.01, the learning rate is increased gradually over the first 500 iterations, and it is then reduced at the 27th and 56th epochs to accelerate convergence.
4. Data set preparation of horsehead movement time sequence data:
Process the operation videos of different pumping units with the model trained in step three and obtain the horsehead mask in each frame; form time-series data representing horsehead motion from the vertical coordinate of the center of the mask's minimum circumscribed circle. Then, in combination with the pumping-unit maintenance records, analyze the pumping-unit states in different time periods of the videos to obtain the time-series characteristics corresponding to the different states, such as normal operation, belt slipping, well shutdown and horsehead falling, and label the horsehead motion time-series data of the normal state and belt-slipping faults to obtain the dataset.
5. Training a support vector machine to detect faults:
First, frequency-domain features are extracted from time-series data samples of different lengths using the Fourier transform; the frequency-domain features are analyzed statistically, and the first 4 parameters are taken to construct feature vectors of uniform dimension for classification. Then, a grid search is used to optimize the parameters of a support vector machine based on a Gaussian kernel function, and the optimal classifier is trained to detect pumping-unit faults. The parameters to be optimized are mainly the penalty factor C and the kernel parameter g; C comes from the formula:
$$\min_{\omega,\, b,\, \xi}\ \frac{1}{2}\|\omega\|^{2} + C\sum_{i=1}^{N}\xi_i \qquad \text{s.t.}\ \ y_i(\omega^{T} x_i + b) \ge 1 - \xi_i,\ \ i = 1, 2, \ldots, N,\ \ \xi_i \ge 0$$
This formula quantifies the degree to which samples are misclassified: the larger C is, the less misclassification is tolerated, and the harder the model is to train. Here $\omega^{T} x + b = 0$ describes the classification hyperplane of the support vector machine, $\omega^{T}$ is the transpose of the parameter vector ω, and $\|\omega\|$ is the norm of ω. N is the number of samples, and $\xi_i$ is a slack variable introduced for each sample. g comes from the Gaussian radial basis function, whose formula is:
$$K(x_i, x_j) = \exp\!\left(-g\,\|x_i - x_j\|^{2}\right)$$

This formula obtains the inner product of high-dimensional vectors through a calculation in the low-dimensional space, greatly reducing the amount of computation. The Gaussian radial basis function is introduced because the support vector machine maps samples into a high-dimensional space in which the optimal classification hyperplane is sought, but computing the sample dot products directly in that high-dimensional space is too expensive. Here $\|x_i - x_j\|^{2}$ is the squared Euclidean distance between two feature vectors, and g is a hyperparameter that requires optimization.
6. Performing beam pumping unit fault detection:
Based on the methods completed in steps one through five, and in combination with the actual oilfield application scenario, the fault detection system of the beam pumping unit based on intelligent video analysis is implemented. Built on an edge-cloud collaborative architecture and developed with C, Python, Java and NodeJS, the system uses the video-stream intelligent analysis framework NVIDIA DeepStream, the deep learning inference optimization library TensorRT, the RabbitMQ message queue, the MySQL database and MinIO object storage, among other components; it runs on an edge-side Jetson NX and a deep learning inference server, and pumping-unit faults are managed through a web page.
The implementation effect is as follows:
the main interface diagram of the fault detection system of the beam pumping unit based on intelligent video analysis, which is realized by the invention, is shown in fig. 5.
The 6 more typical horsehead shapes contained in the pumping unit horsehead example segmentation dataset are shown in fig. 6.
Figs. 7, 8, 9 and 10 show the horsehead motion time-series data of the pumping unit in the normal state and under belt slipping, well shutdown and horsehead falling, respectively.

Claims (4)

1. The fault detection method of the beam pumping unit based on intelligent video analysis is characterized by comprising the following steps of:
step one, acquiring an oil pumping unit image, and processing to form an original data set; amplifying the sample, establishing an example segmentation data set for the horsehead of the pumping unit, and dividing a training set, a verification set and a test set;
step two, an instance segmentation deep learning network model G is established: instance segmentation dataset images are input, and features are extracted through a fully convolutional backbone network; feature maps I of different size-scale levels are obtained through a feature pyramid; the feature map I of each level is followed by two parallel branches: a category branch and a mask branch; the category branch yields instance categories, the mask branch yields mask convolution kernels G and mask feature maps F, and instance masks are obtained by convolving the mask convolution kernels G with the mask feature maps F; the instance categories and instance masks pass through a Matrix NMS to obtain an initial instance segmentation deep learning network model G; a loss function is used to evaluate the difference between the network model's predicted values and the true values, and minimizing this difference yields the instance segmentation deep learning network model G;
step three, training an instance segmentation deep learning network model G with a network structure fixed; using the example segmentation data set obtained in the first step, adjusting network parameters through forward transmission and backward feedback of a deep convolutional neural network, and storing the adjusted network parameters to obtain a trained example segmentation deep learning network model G;
step four, establishing a data set of horsehead movement time sequence data; processing operation monitoring videos of different pumping units by using the well-trained example segmentation deep learning network model G in the step three, obtaining a horsehead mask in each frame, forming time sequence data representing horsehead movement by using the vertical coordinates of the circle center of the minimum circumscribed circle of the mask, and then marking the time sequence data by combining with the pumping unit maintenance record to obtain a data set of the horsehead movement time sequence data;
step five, training a support vector machine to detect faults;
and step six, performing fault detection of the beam pumping unit based on intelligent video analysis: based on an edge-cloud collaborative architecture and in combination with the actual oilfield application scenario, performing fault detection of the beam pumping unit on a web page.
2. The intelligent video analysis-based fault detection method for beam-pumping units, as claimed in claim 1, is characterized in that: the specific method of the first step is as follows:
after the pumping unit images are obtained, poor-quality image data are cleaned out, namely images full of chaotic light spots, with obvious horsehead motion ghosting, or of low resolution; the images are then normalized, unifying the resolution by bilinear interpolation while keeping the original aspect ratio, with deficits filled by black borders; the horsehead masks on the images are labeled with the polygon tool of Python's labelme library; after the horsehead mask label files are obtained, they are converted together with the image files into MS-COCO format to form the original dataset; the samples are augmented, including mirror transformation, random rotation, random holes and brightness transformation, and divided into a training set, a validation set and a test set at a ratio of 8:1:1.
3. The intelligent video analysis-based fault detection method for beam-pumping units, as claimed in claim 2, is characterized in that: the specific method of the second step is as follows:
(2.1) inputting an instance segmentation dataset image, and extracting features through a full convolution backbone network: splitting an original feature map with c channels into s groups, wherein each group has c/s channels, and reserving the channels of the first group until final splicing; starting from the second group, after each group uses 3x3 convolution to extract the features, half is left to the final splice, and the other half is spliced with the next group and then uses convolution to extract the features; until the last group, all the extracted features are spliced with the features separated before;
(2.2) obtaining feature maps I of different size-scale levels through the feature pyramid, with two parallel branches attached to the feature map I of each level: a category branch and a mask branch, wherein the category branch divides I into an S×S grid with C channels, C being the number of semantic categories; if the center of an instance target falls in a grid cell, that cell predicts the target's semantic category;
the mask branch uses CoordConv to append coordinate positioning information to the feature map I; in the kernel branch of the mask branch, at most S×S convolution kernels G are dynamically predicted, D being the number of parameters of G: D=E if G is a 1×1 convolution, and D=9E if G is a 3×3 convolution; in the feature branch of the mask branch, a mask feature map F is predicted jointly for all levels, where F contains information from the feature maps of different levels and has shape H×W×E, H and W being obtained by reducing the height and width of the input image by a fixed ratio and E being the number of feature map channels; the mask convolution kernels G are then convolved with the mask feature map F to obtain at most S×S instance masks;
(2.3) the Matrix NMS obtains the decay factors of all generated masks through parallel computation on the GPU; the decay factor of the j-th mask $m_j$ is:

$$\mathrm{decay}_j = \min_{\forall s_i > s_j} \frac{f(iou_{i,j})}{f(iou_{\cdot,i})} \tag{1}$$

in formula (1), $s_i$ is the confidence of the i-th mask, and $f(iou_{i,j})$ is the penalty on $m_j$ computed from the IoU $iou_{i,j}$ of $m_i$ and $m_j$; in the linear form $f(iou_{i,j}) = 1 - iou_{i,j}$, which, multiplied into $s_j$, suppresses the confidence that $m_j$ points to the instance; $f(iou_{\cdot,i})$ is the most probable penalty that $m_i$ itself has received, calculated as:

$$f(iou_{\cdot,i}) = \min_{\forall s_k > s_i} f(iou_{k,i}) \tag{2}$$

formula (2) is the numerically smallest suppression factor f of $m_i$ computed from its IoU with higher-confidence masks; since f is a decreasing function, a Gaussian function can be used, with the formula:

$$f(iou_{i,j}) = \exp\!\left(-\frac{iou_{i,j}^{2}}{\sigma}\right) \tag{3}$$

in formula (3), σ takes the default value 0.5;
(2.4) a loss function is used to evaluate the difference between the network's predicted values and the true values of the semantic category and the mask; the calculation formula is:

$$L = L_{cate} + \lambda L_{mask} \tag{4}$$

in formula (4), λ is a weight parameter, set to 3; the semantic-category loss $L_{cate}$ uses Focal Loss to address the extreme imbalance between positive and negative samples during training; $L_{mask}$ is used for mask prediction, with the calculation formula:

$$L_{mask} = \frac{1}{N_{pos}} \sum_{k} \mathbb{1}\{p^{*}_{i,j} > 0\}\, d_{mask}(m_k, m^{*}_k) \tag{5}$$

in formula (5), k = i·S + j refers to the k-th cell of the S×S grid in which the mask center falls, counted left to right and top to bottom; $N_{pos}$ is the number of positive samples; $p^{*}_{i,j}$ and $m^{*}_k$ are the ground-truth semantic category confidence of the mask in the k-th cell and the ground-truth mask whose center falls in the k-th cell; $\mathbb{1}$ is the indicator function, taking 1 if $p^{*}_{i,j} > 0$ and 0 otherwise; $d_{mask}$ is used to calculate the difference between the generated mask and the ground-truth mask, with the formula:

$$d_{mask} = 1 - \frac{2\sum_{x,y} p_{x,y}\, q_{x,y}}{\sum_{x,y} p_{x,y}^{2} + \sum_{x,y} q_{x,y}^{2}} \tag{6}$$

in formula (6), $p_{x,y}$ and $q_{x,y}$ are the grayscale values of the generated mask and the ground-truth mask at row x, column y of the grid.
4. The intelligent video analysis-based fault detection method for beam-pumping unit, as set forth in claim 3, wherein: the specific method of the fifth step is as follows:
first, frequency-domain features are extracted from time-series data samples of different lengths using the Fourier transform; feature vectors of uniform dimension are constructed for classification from the amplitudes of the first 4 frequency components; then a grid search is used to optimize the parameters of a support vector machine based on a Gaussian kernel function, and the optimal classifier is trained to detect pumping-unit faults, the parameters to be optimized being the penalty factor C and the kernel parameter g, where C comes from the formula:

$$\min_{\omega,\, b,\, \xi}\ \frac{1}{2}\|\omega\|^{2} + C\sum_{i=1}^{N}\xi_i \qquad \text{s.t.}\ \ y_i(\omega^{T} x_i + b) \ge 1 - \xi_i,\ \ i = 1, 2, \ldots, N,\ \ \xi_i \ge 0 \tag{7}$$

formula (7) contains a hinge-loss term quantifying the degree to which samples are misclassified: the larger C is, the less misclassification is tolerated, and the harder the model is to train; here $\omega^{T} x + b = 0$ describes the classification hyperplane of the support vector machine, $\omega^{T}$ is the transpose of the parameter vector ω, $\|\omega\|$ is the norm of ω, N is the number of samples, and $\xi_i$ is a slack variable introduced for each sample; g comes from the Gaussian radial basis function, whose formula is:

$$K(x_i, x_j) = \exp\!\left(-g\,\|x_i - x_j\|^{2}\right) \tag{8}$$

formula (8) obtains the inner product of high-dimensional vectors through a calculation in the low-dimensional space, where $\|x_i - x_j\|^{2}$ is the squared Euclidean distance between two feature vectors and g is a hyperparameter requiring optimization.
CN202310712166.5A 2023-06-15 2023-06-15 Beam pumping unit fault detection method based on intelligent video analysis Pending CN116682045A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310712166.5A CN116682045A (en) 2023-06-15 2023-06-15 Beam pumping unit fault detection method based on intelligent video analysis

Publications (1)

Publication Number Publication Date
CN116682045A true CN116682045A (en) 2023-09-01

Family

ID=87788832

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117722173A (en) * 2024-02-06 2024-03-19 灵知科技(大庆)有限公司 Intelligent diagnosis measurement and control system and device for monitoring dynamic parameters of multiple scenes
CN117722173B (en) * 2024-02-06 2024-04-30 灵知科技(大庆)有限公司 Intelligent diagnosis measurement and control system and device for monitoring dynamic parameters of multiple scenes


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination