CN113392823A - Oil level meter reading method based on deep network regression - Google Patents

Oil level meter reading method based on deep network regression

Info

Publication number
CN113392823A
Authority
CN
China
Prior art keywords
oil level
loss
network
level meter
image
Prior art date
Legal status
Granted
Application number
CN202110478518.6A
Other languages
Chinese (zh)
Other versions
CN113392823B (en)
Inventor
郑晓隆
赵如彬
Current Assignee
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202110478518.6A priority Critical patent/CN113392823B/en
Publication of CN113392823A publication Critical patent/CN113392823A/en
Application granted granted Critical
Publication of CN113392823B publication Critical patent/CN113392823B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods


Abstract

The invention discloses an oil level meter reading method based on deep network regression, which comprises the following steps. Step S1: acquire image data of an oil level gauge as sample images. Step S2: establish a deep neural network model. Step S3: train the model; when training samples are input, two types of samples are mixed, one part being full-image samples used to predict the position and size of the oil level gauge, and the other part being crops generated from the gauge annotations used to predict the height position of the liquid level surface; a cross-training scheme is then used in which images of the two scales are mixed as input: if a large-scale image is input, the outputs of the first and second branches of the network are taken as the prediction results and the gauge position is used as the true value to compute the loss; if a small-scale image is input, the output of the third branch is taken as the prediction result and the liquid level height is used to compute the loss. Step S4: input images acquired by the industrial oil level gauge monitoring camera into the trained model for prediction.

Description

Oil level meter reading method based on deep network regression
Technical Field
The invention belongs to the field of object detection and small-target localization, and relates to a method based on a deep neural network for object detection and accurate localization of the liquid level line of an oil level gauge.
Background
Object Detection is one of the core problems of computer vision; its main task is to find all objects of interest in an image and determine their categories and positions. At present there are four major categories of tasks in computer vision: classification, localization, detection and segmentation. In the application scenario of automatic reading of industrial oil level gauges, accurately obtaining the liquid level inside the gauge is the key technology; however, the liquid level is actually formed by a liquid column whose length and width in the image are less than one percent of the image size, so accurately locating it is a difficult problem.
The existing methods for locating the liquid level surface of an oil level gauge mainly rely on traditional image processing: a template matching method is first used to locate the gauge in the original image, and line detection techniques such as the Hough transform are then used to locate the liquid level line. Template matching is affected by the image quality of both the template and the current gauge image, so it is often difficult to locate the gauge accurately, and the liquid level line, being a fine object, is also easily disturbed by imaging quality; as a result, the positioning accuracy of such methods is not high.
Based on the hierarchical relationship between the oil level gauge and the liquid level surface, the invention uses a single deep neural network model to first locate the oil level gauge in the large-scale original image, then crops a small-scale image according to the gauge position, and finally performs regression prediction to locate the liquid level surface in the small-scale image.
Disclosure of Invention
In order to accurately position the liquid level surface in a large-scale image, the invention provides an oil level meter reading method based on deep network regression. Based on the relative position relationship between the oil level gauge and the liquid level surface, the method first locates the relatively large target, namely the oil level gauge, in the large-scale image, and then crops the image according to the obtained gauge position to obtain a small-scale image containing the gauge; the small-scale image is input into the model again, and the accurate liquid level position is obtained by deep convolutional network regression.
In order to overcome the defects of the prior art, the invention adopts a multi-task learning strategy when training the model: the oil level gauge is located in the large-scale image and the liquid level position is located in the small-scale image. The technical scheme is as follows:
a deep network regression-based oil level meter reading method comprises the following steps:
step S1: data generation and preprocessing, this step further including:
step S11: collecting distant-view images of industrial oil level gauges shot in the field, where each image may contain one or several oil level gauges;
step S12: annotating the industrial oil level gauge images to obtain two groups of labels, namely the oil level gauge position and the liquid level surface position;
step S13: establishing ground truth data; since the information of the two labels is to be predicted, the occurrence probability and position information of the target must be encoded, and the final prediction targets are a matrix of size 7×7×5 and a vector of size 1×1.
Step S2: creating a model and training, this step further comprising:
step S21: establishing a deep neural network, with a standard GoogLeNet as the backbone; the network has three output branches: the probability of the oil level gauge position, the regression parameters of the gauge position and size, and the regression of the liquid level height position.
step S22: the original high-resolution image is scaled to 448×448 and input to the network; after convolutional, pooling and activation layers, the oil level gauge feature maps are output. The first feature map has size 7×7 and is used to predict the probability that the gauge falls into a given block, the second feature map has size 7×7×4 and is used to predict the position and size of the gauge, and the third output is the height position of the liquid level surface.
Step S23: and continuously inputting the training samples and the target true value into the network, adjusting the parameters of each layer of the network by using error back propagation, continuously carrying out iterative training, and finally realizing convergence to obtain a network model.
Step S24: the deep neural network designed by the invention is used for realizing the prediction of two types of samples, and the loss corresponding to each type of sample is different, so that different types of losses are fused at last, then the network is driven to update, and finally the total loss tends to be minimum until the loss is stable and motionless, and the network training is finished.
Step S3: after the network is trained, the industrial oil level meter image acquired on site can be input into the network after being preprocessed, and the predicted coordinates and size of the oil level meter and the height position of the liquid level can be obtained.
Compared with the prior art, the method has the following technical effects:
1) By means of deep neural network technology, a two-step method is adopted: the oil level gauge is first detected on the original image to obtain an accurate gauge region; then, taking the gauge as the region of interest, a deep convolutional network regression is used to locate small targets such as the liquid level surface, instead of line detection (Hough transform, etc.) that depends on image quality, which improves positioning robustness and accuracy;
2) The invention uses a single network to complete both tasks, i.e. detecting the oil level gauge and locating the liquid level surface by regression are performed by the same deep convolutional network, which reduces GPU memory usage and improves detection speed.
Drawings
FIG. 1 is a flow chart of an oil level meter reading method based on deep network regression according to the present invention.
Fig. 2 is a depiction of an oil level gauge and a liquid level.
FIG. 3 is a grid and oil level table interest diagram for creating a ground truth.
FIG. 4 is a deep neural network architecture of the present invention.
FIG. 5 is a structure of a convolution module in the present invention;
FIG. 6 is the structure of an Inception module in the present invention.
Detailed Description
First, training data are prepared: the position of the oil level gauge and the height of the liquid level surface are manually annotated on scene pictures of oil level gauges shot on site to generate ground-truth labels, and a training set of sufficient scale is built. A neural network with the structure described below is then established and trained on the training set until convergence. Finally, pictures not used for training are used for testing.
The technical solution provided by the present invention will be further explained with reference to the accompanying drawings.
Referring to fig. 1, a flow chart of an oil level meter reading method based on deep network regression is shown, and the method specifically includes the following steps:
step S1: acquiring image data of oil level gauges as sample images and labeling them to obtain label data for training; two types of annotation information are obtained, namely the two labels to be predicted: the oil level gauge and the liquid level surface;
step S2: establishing a deep neural network model, with GoogLeNet as the backbone; the model has three output branches, where the first branch predicts the probability of the oil level gauge position, the second branch predicts the regression of the gauge position, and the third branch predicts the regression of the liquid level height position;
step S3: training the model established in step S2; in order to let the network predict both the oil level gauge and the liquid level height, the two types of samples are first mixed when the training samples are input, one part being full-image samples used to predict the position and size of the gauge, and the other part being crops generated from the gauge annotations used to predict the liquid level height; a cross-training scheme is then used, i.e. images of the two scales are mixed as input: if a large-scale image is input, the outputs of the first and second branches are taken as the prediction results and the gauge position is used as the true value to compute the loss; if a small-scale image is input, the output of the third branch is taken as the prediction result and the liquid level height is used to compute the loss;
step S4: inputting the image collected by the industrial oil level meter monitoring camera into a trained model for prediction to obtain the position and size of the oil level meter in the image, and predicting the height position coordinate of the liquid level surface by using the oil level meter screenshot.
In step S1, a large number of cameras for monitoring oil level gauges are installed in the substation scene, and the invention makes use of this large amount of image data. Each image may contain one oil level gauge, i.e. one target; if several gauges are installed close together, a single image may contain several gauges, i.e. several targets. A dedicated camera monitors each gauge so that its reading can be taken, so at least one oil level gauge exists in every image.
Meanwhile, the sample images are annotated with a labeling tool to obtain two types of annotation information: the oil level gauge position and size O = [x_o, y_o, w_o, h_o], i.e. the horizontal and vertical coordinates of the gauge center region and its width and height, and the liquid level height position Q = [y_q], i.e. the vertical coordinate of the liquid level in the cropped image, as indicated by the red boxes and lines in fig. 2.
Finally, ground truth data for training are established. The information of the two labels, oil level gauge and liquid level surface, is to be predicted: the occurrence probability and the position/size information of the gauge, i.e. a matrix of size 7×7×5, and the height information of the liquid level surface, i.e. a vector of size 1×1. The first 7×7×1 matrix M_OProb is the probability of the gauge appearing and is initialized to 0; the original image is scaled to 448×448 and divided into a 7×7 grid of 64×64 blocks, as shown in fig. 3(a), and if the gauge falls into a certain block, the corresponding element of M_OProb is assigned 1. The second 7×7×4 matrix M_OLoc holds the coordinates and size of the gauge relative to the 7×7 grid, within the range [0,1]. A region of interest containing the gauge is cropped, as shown in fig. 3(b); the 1×1 vector is the coordinate y_q of the liquid level height relative to the cropped image. The labels in the generated training set are normalized to the range [0,1] to improve subsequent training accuracy.
Similar Python code to create the oil level gauge ground truth data is as follows:
import numpy as np

# bnd[0][1:] holds the annotated gauge box (x1, y1, x2, y2) in pixels;
# imgw, imgh are the original image width and height
plate_gt = np.zeros((7, 7, 5), dtype=np.float32)
x1, y1, x2, y2 = bnd[0][1:]
# normalize box coordinates to [0, 1]
x1 /= imgw
x2 /= imgw
y1 /= imgh
y2 /= imgh
# box center in normalized coordinates
cx = (x1 + x2) / 2
cy = (y1 + y2) / 2
# index of the 7x7 grid cell that contains the center
indx = int(np.ceil(cx * 7) - 1)
indy = int(np.ceil(cy * 7) - 1)
plate_gt[indy, indx, 0] = 1                  # gauge presence probability
plate_gt[indy, indx, 1] = cx * 7 - indx      # center offset within the cell (x)
plate_gt[indy, indx, 2] = cy * 7 - indy      # center offset within the cell (y)
plate_gt[indy, indx, 3] = np.sqrt(x2 - x1)   # square root of normalized width
plate_gt[indy, indx, 4] = np.sqrt(y2 - y1)   # square root of normalized height
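The liquid level ground truth for a gauge crop is simply the annotated level-line height normalized by the crop height; a minimal sketch of this step is given below (the variable names yq and croph are assumptions for illustration, not identifiers from the patent):
import numpy as np

# yq: annotated y-coordinate of the level line (pixels) inside the gauge crop
# croph: height of the cropped gauge image in pixels
level_gt = np.array([yq / croph], dtype=np.float32)  # 1x1 vector, normalized to [0, 1]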
in step S2, the deep neural network model structure is shown in fig. 4, and fig. 5 is a structure of the convolution module in the present invention; FIG. 6 is the structure of an acceptance module in the present invention; the detailed parameters are as follows:
(1) First-layer convolution module Conv2d_1a_7x7: input a three-channel 448×448 image; kernel size 7×7, stride 2, padding 3; output a 64-channel feature map.
(2) The second layer is the max pooling layer maxpool0, kernel size 3×3, stride 2, padding 1.
(3) The third layer is a local response normalization layer lrn1 with parameters size=5, alpha=0.0001, beta=0.75, k=1.
(4) Fourth-layer convolution module Conv2d_2a_3x3: input a 64-channel feature map; kernel size 1×1, stride 1, padding 0; output a 64-channel feature map.
(5) Fifth-layer convolution module Conv2d_2b_3x3: input a 64-channel feature map; kernel size 3×3, stride 1, padding 1; output a 192-channel feature map.
(6) The sixth layer is a local response normalization layer lrn2 with parameters size=5, alpha=0.0001, beta=0.75, k=1.
(7) The seventh layer is the max pooling layer maxpool1, kernel size 3×3, stride 2, padding 1.
(8) The eighth layer is the Inception module inception_a1, with 192 input channels and 256 output channels.
(9) The ninth layer is the Inception module inception_a2, with 256 input channels and 480 output channels.
(10) The tenth layer is the max pooling layer maxpool2, kernel size 3×3, stride 2, padding 1.
(11) The eleventh layer is the Inception module inception_b1, with 480 input channels and 512 output channels.
(12) The twelfth layer is the Inception module inception_b2, with 512 input channels and 512 output channels.
(13) The thirteenth layer is the Inception module inception_b3, with 512 input channels and 512 output channels.
(14) The fourteenth layer is the Inception module inception_b4, with 512 input channels and 528 output channels.
(15) The fifteenth layer is the Inception module inception_b5, with 528 input channels and 832 output channels.
(16) The sixteenth layer is the max pooling layer maxpool3, kernel size 3×3, stride 2, padding 1.
(17) The seventeenth layer is the Inception module inception_c1, with 832 input channels and 832 output channels.
(18) The eighteenth layer is the Inception module inception_c2, with 832 input channels and 1024 output channels.
(19) The nineteenth layer is the max pooling layer maxpool4, kernel size 3×3, stride 2, padding 1.
(20) The twentieth layer is the convolution module Conv2d_3a_3x3: input 1024 channels, output 1024 channels, kernel size 3×3, padding 1.
(21) The twenty-first layer is the convolution module Conv2d_3b_3x3: input 1024 channels, output 1024 channels, kernel size 3×3.
The network then splits into three branch outputs.
First branch: the convolution module Conv2d_3c_1x1 inputs 1024 channels and outputs 1 channel; kernel size 1×1, bias 0. The result is passed through a Sigmoid function and output as a (batch, 1, 7, 7) tensor, which indicates, after the picture is divided into 49 regions (7×7), the probability that the target is located in each region; it is used for predicting the position and size of the oil level gauge when a large-scale image is input.
Second branch: the convolution module Conv2d_3d_3x3 inputs 1024 channels and outputs 4 channels; kernel size 1×1, bias 0. The result is passed through a Sigmoid function and output as a (batch, 4, 7, 7) tensor: after the picture is divided into 49 regions (7×7), the four channels respectively indicate the offset of the target center from the region center along the x axis, the offset along the y axis, the target length along the x axis, and the target length along the y axis. The offsets and lengths are expressed as ratios of pixel values to the total width or height of the image, and are used for predicting the position of the oil level gauge when a large-scale image is input.
Third branch: the convolution module Conv2d_3e_3x3 inputs 1024 channels and outputs 1 channel, kernel size 3×3, bias 1; the convolution module Conv2d_3f_3x3 inputs 1 channel and outputs 1 channel, kernel size 3×3, bias 1, stride 3; the convolution module Conv2d_3g_3x3 inputs 1 channel and outputs 1 channel, kernel size 3×3, bias 0. The output is a (batch, 1, 1, 1) tensor whose single channel represents the offset of the liquid level surface along the y axis relative to the region center; it is used for predicting the liquid level height position when a small-scale image is input.
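For illustration only, the following is a minimal PyTorch sketch of three such output branches on a 1024-channel, 7x7 backbone feature map; the class and layer names are hypothetical, and the third branch is simplified with adaptive average pooling instead of the exact strided convolutions listed above:
import torch
import torch.nn as nn

class GaugeHeads(nn.Module):
    # three prediction branches on a 1024-channel, 7x7 backbone feature map
    def __init__(self):
        super().__init__()
        self.head_prob = nn.Conv2d(1024, 1, kernel_size=1)   # gauge presence probability per grid cell
        self.head_loc = nn.Conv2d(1024, 4, kernel_size=1)    # center offsets and sizes per grid cell
        self.head_level = nn.Sequential(                     # single scalar: liquid level height
            nn.Conv2d(1024, 1, kernel_size=3, padding=1),
            nn.AdaptiveAvgPool2d(1),
        )

    def forward(self, feat):
        prob = torch.sigmoid(self.head_prob(feat))    # (N, 1, 7, 7)
        loc = torch.sigmoid(self.head_loc(feat))      # (N, 4, 7, 7)
        level = torch.sigmoid(self.head_level(feat))  # (N, 1, 1, 1)
        return prob, loc, level

heads = GaugeHeads()
prob, loc, level = heads(torch.randn(2, 1024, 7, 7))  # example forward pass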
In step S3, the original high-resolution image is scaled to 448×448 and input to the network; after convolutional, pooling and activation layers, the oil level gauge feature maps are output. The first feature map has size 7×7 and predicts the probability that the gauge falls into a given block, the second feature map has size 7×7×4 and predicts the position of the gauge, and the third output is the liquid level height coordinate.
In order to let the network predict both the oil level gauge and the liquid level height, the two types of samples are mixed when the training samples are input, with a mixing ratio of 1:1: about half of the samples in a batch are full-image samples used to predict the gauge position, and the other half are crops generated from the gauge annotations used to predict the liquid level height. A cross-training scheme is used, i.e. images of the two scales are mixed as input: if a large-scale image is input, the outputs of the first and second branches are taken as the prediction results and the gauge position is used as the true value to compute the loss; if a small-scale image is input, the output of the third branch is taken as the prediction result and the liquid level height is used to compute the loss.
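A sketch of how such a 1:1 mixed batch could be assembled is shown below; full_samples and crop_samples are hypothetical lists of (image, ground_truth) pairs, and the per-sample flag theta marks full-image samples:
import random

def build_mixed_batch(full_samples, crop_samples, batch_size=32):
    # roughly half full-image samples (theta = 1) and half gauge-crop samples (theta = 0)
    half = batch_size // 2
    batch = [(img, gt, 1.0) for img, gt in random.sample(full_samples, half)]
    batch += [(img, gt, 0.0) for img, gt in random.sample(crop_samples, batch_size - half)]
    random.shuffle(batch)
    return batch  # each entry: (image, ground truth, theta)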
The training parameters were as follows:
A dynamic learning rate is used, starting at 0.001 and reduced to one tenth of its previous value every 10 epochs.
batchsize=32
momentum=0.9
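With these hyperparameters, a plausible PyTorch optimizer setup is shown below; the use of SGD is an assumption, since the patent only specifies the learning rate schedule, batch size and momentum, and model is assumed to be the network described above:
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
# reduce the learning rate to one tenth of its previous value every 10 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)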
The training samples and the target true values are continuously fed into the network, the parameters of each layer are adjusted by error back-propagation, and iterative training continues until convergence, yielding the network model.
In addition, the deep neural network designed by the invention must predict two types of samples, and the loss corresponding to each type is different, so the different losses are fused into a total loss that drives the network update; training ends when the total loss reaches its minimum and stabilizes.
The total loss function has three parts. The first part is the prediction probability loss of the oil level gauge, using a cross-entropy loss function:
Loss_OProb = -Σ_{i,j} [ g_ij·log(p_ij) + (1 - g_ij)·log(1 - p_ij) ]
where g_ij is the true probability at grid cell (i, j) and p_ij is the predicted probability at grid cell (i, j), already activated by the Sigmoid function. The second part is the prediction loss of the oil level gauge position and size, using the Smooth L1 loss function:
Smooth_L1(x) = 0.5·x², if |x| < 1;  |x| - 0.5, otherwise
where x is the difference between the predicted position/size and the true value. The Smooth L1 loss function is insensitive to outliers and abnormal values, has a relatively smaller gradient change, and is more stable during training. The third part is the prediction loss of the liquid level height position, which also uses the Smooth L1 loss function and is denoted Loss_QLoc. The final overall loss function is as follows:
Loss = θ·(α·Loss_OProb + β·Loss_OLoc) + (1 - θ)·Loss_QLoc
where θ is the type of the sample, taking the value 1 when the sample comes from the full image and 0 when the sample is a crop of the oil level gauge. α is the weight of the gauge probability loss, taken as 1, and β is the weight of the gauge position/size loss, taken as 5.
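A sketch of this fused loss in PyTorch is given below, with theta supplied per sample; the function name and the per-sample reduction details are assumptions. For samples of the other type, the unused ground truth can be a zero placeholder, since theta masks its contribution out:
import torch
import torch.nn.functional as F

def fused_loss(prob, loc, level, gt_prob, gt_loc, gt_level, theta, alpha=1.0, beta=5.0):
    # theta: (N,) tensor, 1 for full-image samples, 0 for gauge-crop samples
    loss_oprob = F.binary_cross_entropy(prob, gt_prob, reduction='none').mean(dim=(1, 2, 3))
    loss_oloc = F.smooth_l1_loss(loc, gt_loc, reduction='none').mean(dim=(1, 2, 3))
    loss_qloc = F.smooth_l1_loss(level, gt_level, reduction='none').mean(dim=(1, 2, 3))
    per_sample = theta * (alpha * loss_oprob + beta * loss_oloc) + (1 - theta) * loss_qloc
    return per_sample.mean()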
In step S4, after the loss stabilizes below 0.001 (after about 50 epochs of training), the images collected by the industrial oil level gauge monitoring cameras are preprocessed and input to the network to obtain the position and size of the gauge in each image; the liquid level height coordinate is then predicted from the gauge crop, and images not included in the training set are used for testing.
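A sketch of the resulting two-step inference is shown below: the 7x7 grid prediction is decoded back to a bounding box on the full frame, the gauge is cropped, and the level height is read from the crop. The helper predict (preprocessing, resizing to 448x448 and the forward pass) and the exact decoding details are assumptions consistent with the ground truth encoding above:
import numpy as np

def decode_box(prob, loc, img_w, img_h):
    # prob: (7, 7) gauge presence probabilities; loc: (4, 7, 7) offsets and sqrt sizes in [0, 1]
    iy, ix = np.unravel_index(np.argmax(prob), prob.shape)
    cx = (ix + loc[0, iy, ix]) / 7.0 * img_w
    cy = (iy + loc[1, iy, ix]) / 7.0 * img_h
    w = loc[2, iy, ix] ** 2 * img_w   # widths and heights were stored as square roots in the GT
    h = loc[3, iy, ix] ** 2 * img_h
    return cx, cy, w, h

def read_gauge(model, frame):
    # step 1: locate the gauge on the full frame
    prob, loc, _ = predict(model, frame)   # predict() is a hypothetical preprocessing + forward helper
    cx, cy, w, h = decode_box(prob, loc, frame.shape[1], frame.shape[0])
    x0, y0 = int(cx - w / 2), int(cy - h / 2)
    crop = frame[y0:y0 + int(h), x0:x0 + int(w)]
    # step 2: regress the normalized liquid level height on the gauge crop
    _, _, level = predict(model, crop)
    return y0 + float(level) * crop.shape[0]   # level-line y-coordinate in the full frame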
In the invention, the image data come from industrial oil level gauge monitoring. A total of 10000 pictures with resolution 1920×1080 were collected to verify the algorithm; the data set contains different types of oil level gauges and is randomly divided into a training set (90% of the images) and a test set (10% of the images), which are then used in the experiments to verify the effectiveness of the algorithm for liquid level positioning. The experiments use the deep learning framework PyTorch. They show that, with sufficient training, the average positioning error (Euclidean distance between predicted and true values) of the liquid level height coordinate on the test set is 5.54 pixels, roughly as expected; the liquid level position predicted by the network differs little from the true value.
Compared with the prior art, the method is clearly more accurate. By learning from large-scale data with deep neural network technology, the influence of imaging quality can be removed and the oil level gauge region can be found accurately; after the gauge is found, a deep convolutional network regresses the liquid level height position instead of relying on line detection (Hough transform, etc.) that depends on image quality, reducing the interference of image quality. Because the liquid level surface is a small target, a two-step method is adopted: the larger target (the oil level gauge) is first detected on the original image to obtain an accurate gauge region; the gauge is then taken as the region of interest, within which the liquid level surface is no longer a small target, so its accurate position can be obtained by the network. To this end, the training method innovatively mixes training samples: a training batch contains full images and cropped images in proportion, which come from two different probability distributions, and by selectively weighting the loss functions the model can be trained on both types of samples simultaneously. Since neither the oil level gauge nor the liquid level surface is a small target relative to its respective input image, the network scale can be reduced appropriately, which accelerates inference. Therefore, the invention reduces GPU memory usage and improves detection speed.
The above description of the embodiments is only intended to facilitate the understanding of the method of the invention and its core idea. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (3)

1. A method for reading an oil level meter based on deep network regression is characterized by comprising the following steps:
step S1: acquiring image data of an oil level meter as a sample image, and labeling the sample image to obtain label data for training; obtaining two types of sample labeling information, namely information of two labels of a predicted oil level meter and a liquid level surface;
step S2: establishing a deep neural network model, with GoogLeNet as the backbone; the model has three output branches, where the first branch predicts the probability of the oil level gauge position, the second branch predicts the regression of the gauge position, and the third branch predicts the regression of the liquid level height position;
step S3: training the model established in step S2; in order to let the network predict both the oil level gauge and the liquid level height, the two types of samples are first mixed when the training samples are input, one part being full-image samples used to predict the position and size of the gauge, and the other part being crops generated from the gauge annotations used to predict the liquid level height; a cross-training scheme is then used, i.e. images of the two scales are mixed as input: if a large-scale image is input, the outputs of the first and second branches are taken as the prediction results and the gauge position is used as the true value to compute the loss; if a small-scale image is input, the output of the third branch is taken as the prediction result and the liquid level height is used to compute the loss;
step S4: inputting the image collected by the industrial oil level meter monitoring camera into a trained model for prediction to obtain the position and size of the oil level meter in the image, and predicting the height position coordinate of the liquid level surface by using the oil level meter screenshot.
2. The deep network regression-based oil level gauge reading method according to claim 1, wherein in step S3, different types of losses are fused, and then the network is driven to update, wherein the total loss function is:
Loss = θ·(α·Loss_OProb + β·Loss_OLoc) + (1 - θ)·Loss_QLoc
wherein θ is the type of the sample, taking the value 1 when the sample comes from the full image and 0 when the sample is a crop of the oil level gauge; α is the weight of the gauge probability loss, and β is the weight of the gauge position/size loss;
the above loss function includes the following three parts:
the first part is the prediction probability loss of the oil level meter, and a cross entropy loss function is adopted:
Loss_OProb = -Σ_{i,j} [ g_ij·log(p_ij) + (1 - g_ij)·log(1 - p_ij) ]
wherein g_ij is the true probability at grid cell (i, j) and p_ij is the predicted probability at grid cell (i, j), already activated by the Sigmoid function;
the second part is the predicted loss of the position and size of the oil level gauge, and the Smooth L1 loss function is adopted:
Smooth_L1(x) = 0.5·x², if |x| < 1;  |x| - 0.5, otherwise
wherein x is the difference between the predicted position/size and the true value; the Smooth L1 loss function is insensitive to outliers and abnormal values, has a relatively smaller gradient change, and is more stable during training;
the third part is the prediction loss of the liquid level height position, which also uses the Smooth L1 loss function and is denoted Loss_QLoc.
3. The deep network regression-based oil level gauge reading method according to claim 1, wherein in step S1, sample images are acquired through a large number of cameras installed in a substation scene for monitoring the oil level gauge.
CN202110478518.6A 2021-04-30 2021-04-30 Oil level meter reading method based on deep network regression Active CN113392823B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110478518.6A CN113392823B (en) 2021-04-30 2021-04-30 Oil level meter reading method based on deep network regression


Publications (2)

Publication Number Publication Date
CN113392823A true CN113392823A (en) 2021-09-14
CN113392823B CN113392823B (en) 2024-03-19

Family

ID=77617928

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110478518.6A Active CN113392823B (en) 2021-04-30 2021-04-30 Oil level meter reading method based on deep network regression

Country Status (1)

Country Link
CN (1) CN113392823B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110298387A (en) * 2019-06-10 2019-10-01 天津大学 Incorporate the deep neural network object detection method of Pixel-level attention mechanism
CN110348376A (en) * 2019-07-09 2019-10-18 华南理工大学 A kind of pedestrian's real-time detection method neural network based

Also Published As

Publication number Publication date
CN113392823B (en) 2024-03-19


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant