Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a method for removing water mist from a drilling platform monitoring picture by machine learning, which removes the influence of platform mist from the image using consecutive frames of a video and recovers a high-quality, clear image.
According to the method, no modeling of atmospheric parameters is needed: the restored image is output directly, in an end-to-end manner, by analyzing the context information in the video. The method is therefore better suited to the actual drilling platform scene and more robust.
The purpose of the invention is realized by the following technical scheme:
A method for removing water mist from a drilling platform monitoring picture by machine learning comprises the following steps:
S100: acquiring a video sample of the drilling platform through a camera at a preset frame rate, and performing frame extraction on the video sample, wherein the video sample comprises a fog-containing video and a fog-free video;
S200: taking 8 consecutive frames from the fog-containing video and combining them into a video frame sequence X = {x_i, x_{i+1}, ..., x_{i+7}}; selecting from the fog-free video one actual picture x' similar to the images in the video frame sequence X as a target picture, and forming a training set X_train = {X, x'};
S300: dividing the prepared training set X_train into two parts, A and B, where part A accounts for 30% of the training set and part B is the remainder, and training the model, model = {G_model, D_model}, where G_model is a generative model and D_model is a discriminative model; the training is divided into 2 stages, and parts A and B of the training set are used as the input of G_model in the 2 stages, respectively;
S400: after the model is trained, defogging the video-based continuous-frame images of the drilling platform using the generative model G_model.
Preferably, the preset frame rate is 25 frames per second.
Preferably, step S300 includes:
stage 1 of training:
1) inputting part A of the training set into G_model to obtain the restored picture y' output by G_model;
2) calculating the mean square error loss of y' and x':

loss = (1/n) * Σ_{j=1}^{n} (x'_j − y'_j)²,

where x'_j and y'_j are the j-th pixel values of x' and y' respectively, n is the number of pixels, and j indexes the pixels from 1 to n;
3) if loss is not less than the set first threshold, calculating the gradient by differentiating loss and back-propagating the gradient to G_model to update its parameters, until loss is less than the set first threshold;
if loss is less than the first threshold, entering stage 2 of the training;
stage 2 of training:
1) inputting part B of the training set into G_model to obtain the restored picture y' output by G_model;
2) inputting y' and x' respectively into the discriminative model D_model to obtain two discrimination probabilities d_f and d_r, and maximizing:

max(log(d_r) + log(1 − d_f)),

using the maximized value to update D_model; when the two discrimination probabilities d_f and d_r both approach 0.5, stopping the update;
3) obtaining the intermediate features produced by D_model when calculating the discrimination probabilities, and calculating the perception error Ploss:

Ploss = (1/m) * Σ_{k=1}^{m} (v_{r,k} − v_{f,k})²,

where v_f and v_r are the intermediate features of y' and x' respectively, m is the number of intermediate features, and k runs from 1 to m;
if Ploss is not less than the set second threshold, back-propagating Ploss to update the parameters of G_model until Ploss is less than the set second threshold;
when Ploss is less than the set second threshold, the training is finished.
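The threshold-gated, two-stage training flow described above can be sketched as follows. This is a minimal control-flow illustration, not the patent's implementation: the `stage1_step`/`stage2_step` callables and the `max_iters` safeguard are assumptions, each callable standing for one parameter update that returns its current loss value.

```python
# Hypothetical sketch of the two-stage, threshold-gated training loop:
# stage 1 runs until loss < first_threshold, stage 2 until Ploss < second_threshold.

def train_two_stage(part_a, part_b, stage1_step, stage2_step,
                    first_threshold, second_threshold, max_iters=10000):
    loss = float("inf")
    ploss = float("inf")
    # Stage 1: warm up G_model alone on part A.
    for _ in range(max_iters):
        loss = stage1_step(part_a)
        if loss < first_threshold:
            break
    # Stage 2: adversarial + perceptual training on part B.
    for _ in range(max_iters):
        ploss = stage2_step(part_b)
        if ploss < second_threshold:
            break
    return loss, ploss
```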
Preferably, when G_model is a deep neural network, the method further comprises the following steps:
s301: manually marking each image for training the deep neural network, and marking the positions of key points to obtain each marked image;
s302: and taking each marked image as the input of a corresponding stage of training to construct an auxiliary neural network, and assisting the deep neural network in learning and training the images.
Preferably, step S302 includes the steps of:
s3021: taking each marked image as input, selecting a proper middle layer from the deep neural network, and acquiring the output of the middle layer;
s3022: establishing an auxiliary neural network formed by convolution functions;
s3023: inputting the output of the middle layer and the corresponding attitude estimation matrix of each image before labeling into an auxiliary neural network;
s3024: and combining the outputs of the auxiliary neural network and the deep neural network, and jointly inputting the outputs into a loss function of the deep neural network to optimize the learning of the deep neural network.
Preferably, the attitude estimation matrix in step S3023 is obtained by:
S30231: calibrating the camera and solving its internal parameters, wherein the internal parameters comprise: the principal point of the image optical axis, the focal lengths in the X and Y directions, the tangential distortion coefficient and the radial distortion coefficient;
s30232: the attitude estimation matrix is further solved as follows:
solving the attitude estimation matrix [R | t] from x = M [R | t] X,
where M is the internal parameter matrix of the camera, X is the world coordinate of a known shot object, and x is its image pixel coordinate; R and t are the rotation vector and the translation vector of the attitude estimation matrix, respectively.
Preferably, step S3021 includes:
the internal parameters of the camera are solved by a Zhang Zhengyou calibration method and a checkerboard with known size.
Preferably, each grid square of the checkerboard is 10 cm × 10 cm.
Preferably, the deep neural network is ResNet50.
Preferably, the auxiliary neural network is ResNet18.
In addition, the invention also discloses a method for removing water mist from the monitoring picture of the drilling platform by using machine learning, which comprises the following steps:
in the first step, a video sample is collected through a camera according to a preset frame rate, and frame extraction is carried out on the video sample:
wherein X is a video frame sequence containing fog, X' is a video frame sequence with no or negligible fog influence, and x_n, x'_m are images extracted from X and X' respectively after frame extraction;
in the second step, 8 consecutive images are taken from the fog-containing video sequence X and combined into a video frame sequence X = {x_i, x_{i+1}, ..., x_{i+7}}; a video frame x' similar to the images in the sequence is selected from the fog-free video as the target picture, forming the training set X_train = {X, x'};
in the third step, the prepared training set X_train is divided into two parts, A and B, where part A accounts for 30% of the training set and part B is the remainder, and the model is trained;
and in the fourth step, the trained model is used for removing water mist from the images of the drilling platform based on the video continuous frames.
Compared with traditional methods, the invention provides a computer vision algorithm that removes the influence of platform fog from the image through consecutive frames of the video and recovers a high-quality, clear image. Unlike other traditional methods of single-picture image enhancement, such as estimating an atmospheric degradation model or histogram equalization, the method needs no modeling of atmospheric parameters: it directly outputs the restored image by analyzing the context information in the video in an end-to-end manner, better fits the actual application in a drilling platform scene, and is more robust.
The above description is only an overview of the technical solutions of the present invention. In order to make the technical means of the present invention clearer, to enable those skilled in the art to implement the content of the description, and to make the above and other objects, features and advantages of the present invention more apparent, specific embodiments of the present invention are described below by way of example.
Detailed Description
Specific embodiments of the present invention will be described in more detail below with reference to fig. 1 to 4. While specific embodiments of the invention are shown in the drawings, it should be understood that the invention may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
It should be noted that certain terms are used throughout the description and claims to refer to particular components. As one skilled in the art will appreciate, various names may be used to refer to a component. This specification and claims do not intend to distinguish between components that differ in name but not function. In the following description and in the claims, the terms "include" and "comprise" are used in an open-ended fashion and thus should be interpreted to mean "including, but not limited to". The description which follows is a preferred embodiment of the invention, but is made for the purpose of illustrating the general principles of the invention and not for the purpose of limiting the scope of the invention. The scope of the present invention is defined by the appended claims.
For the purpose of facilitating understanding of the embodiments of the present invention, the following description will be made by taking specific embodiments as examples with reference to the accompanying drawings, and the drawings are not to be construed as limiting the embodiments of the present invention.
A method for defogging a rig monitor screen using machine learning, the method comprising the steps of:
in the first step, a video sample is collected through a camera according to a preset frame rate, and frame extraction is carried out on the video sample:
wherein X is a video frame sequence containing fog, X' is a video frame sequence with no or negligible fog influence, and x_n, x'_m are images extracted from X and X' respectively after frame extraction;
in the second step, 8 consecutive images are taken from the fog-containing video sequence X and combined into a video frame sequence X = {x_i, x_{i+1}, ..., x_{i+7}}; a video frame x' similar to the images in the sequence is selected from the fog-free video as the target picture, forming the training set X_train = {X, x'};
In the third step, the prepared training set is divided into two parts A and B, wherein the part A accounts for 30% of the training set, and the rest part is B, so as to train the model;
and in the fourth step, the trained model is used for removing water mist from the images of the drilling platform based on the video continuous frames.
In the third step, if the training cannot meet the requirements, the method returns to the second step: another 8 consecutive images are selected, a new similar video frame x' is selected as the target picture, and the second and third steps are executed again.
In a preferred embodiment of the method, the preset frame rate is 25 frames per second.
To further understand the present invention: in one embodiment, based on a deep learning computer vision model, the method uses video acquired from the actual scene to train a model capable of reconstructing a clear image from an original image disturbed by fog. The model is divided into two parts: model = {G_model, D_model},
where G_model is a generative model and D_model is a discriminative model. During training, the method inputs the video sequence disturbed by water mist into G_model, expecting it to produce the defogged picture, while D_model judges the restored pictures produced by G_model and thereby supervises the training of G_model. G_model is a deep neural network. Preferably, in one embodiment, D_model is also a deep neural network. More preferably, in one embodiment, only the generative model G_model needs to be deployed in the actual algorithm deployment to obtain the restored picture.
For the present invention, the preparation process of the training sample is as follows:
1) acquiring a video sample at a frame rate of 25 frames per second through a camera, and performing frame extraction on the video sample:
wherein X and X' are, respectively, a video frame sequence containing fog and a video frame sequence with no or little fog influence, and x_n, x'_m are images extracted from X and X' respectively after frame extraction. In the training of other deep learning models for image restoration or image reconstruction, a restored image whose content is fully consistent with the original image to be reconstructed must be provided as the training target. In this method, however, the inventors found that for drilling platform monitoring, the largely static scene already provides a good constraint on the reconstructed image. Therefore, it is only necessary that the backgrounds of the two shots match, the foregrounds be similar, and the illumination conditions show no significant difference.
2) Preprocessing the acquired image:
In general, an image has foreground and background portions; the foreground portion often changes, while the background portion changes little. In the field of water mist removal and image restoration, a deep learning model usually needs two nearly identical images, one free of mist influence and one affected by mist, for paired training.
The inventors noticed that if the foreground and background parts are input to the network indiscriminately, training of the algorithm model fails when the foreground changes markedly, because the input picture then contains content that should be disregarded. To solve this problem, see fig. 3: the left ellipse in fig. 3 represents the frequently changing foreground part, and, as shown in the right diagram, this foreground part is masked out of the training picture. By shielding the foreground with a mask, the deep learning model focuses only on the image region affected by the water mist. Accordingly, for the above fog-containing video frame sequence and the sequence with no or little fog influence, it is only required that the backgrounds match, the foregrounds be similar, and the illumination conditions show no significant difference. During preprocessing, the foreground part is shielded by means of a mask.
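The mask-based preprocessing above can be sketched minimally as follows: the frequently changing foreground region is zeroed out so that the model attends only to the fog-affected background. Images are plain nested lists here (a real pipeline would use NumPy/OpenCV arrays), and the rectangular mask region is an illustrative assumption.

```python
# Hypothetical sketch: zero out a rectangular foreground region of an image.

def apply_foreground_mask(image, top, left, bottom, right):
    """Return a copy of `image` with rows [top, bottom) and
    columns [left, right) set to 0 (the masked foreground)."""
    masked = [row[:] for row in image]  # copy rows so the input is untouched
    for r in range(top, bottom):
        for c in range(left, right):
            masked[r][c] = 0
    return masked
```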
3) Taking 8 consecutive images from the fog-containing video sequence, preprocessing them, and combining them into a video frame sequence X = {x_i, x_{i+1}, ..., x_{i+7}};
4) Selecting from the fog-free video sequence a video frame x' similar to the images in the sequence extracted in the previous step as the target picture, forming the training set X_train = {X, x'}.
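Steps 3)-4) above can be sketched as a sliding window over the fog video: each window of 8 consecutive (preprocessed) frames forms a sequence X, which is paired with a similar fog-free target frame x'. The `select_target` callable that picks x' by similarity is a stand-in assumption; frames here are just indices.

```python
# Hypothetical sketch of training-pair assembly: X_train pairs of ({8 frames}, x').

SEQ_LEN = 8  # the method uses 8 consecutive frames

def make_training_pairs(fog_frames, select_target):
    """Return a list of (X, x_prime) pairs, X being 8 consecutive fog frames."""
    pairs = []
    for i in range(len(fog_frames) - SEQ_LEN + 1):
        X = fog_frames[i:i + SEQ_LEN]
        pairs.append((X, select_target(X)))
    return pairs
```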
The model training method disclosed herein is shown in fig. 1 and fig. 2. Two models need to be trained, so that the generative model G_model can generate, from the input fog picture, a restored picture y' that is closer to the actual picture x' (note: the degree of closeness is determined by the first threshold below, i.e. the threshold in fig. 1). Then x' and y' are input into the discriminative model D_model so that it learns to distinguish which is the actual picture and which is the generated one, and this information is fed back to the generative model G_model so that it produces a more realistic reconstructed picture (note: the degree of realism is determined by the second threshold below, i.e. the threshold in fig. 2; the first and second thresholds can be determined from empirical values).
In another embodiment, the invention discloses a staged training method in which training is divided into two phases:
first, G_model is trained with a small portion of the data to bring it to a relatively stable state, avoiding the model-collapse phenomenon that easily occurs when G_model and D_model are trained simultaneously from scratch;
then, the remaining majority of the data is used to train G_model and D_model simultaneously.
As an example, the prepared training set is divided into two parts, A and B, where part A accounts for 30% of the training set (i.e. the aforementioned small portion of data) and part B is the remainder.
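The A/B split can be sketched as below: part A takes 30% of the prepared training set for the stage-1 warm-up of G_model, and part B takes the remaining 70%. Shuffling with a fixed seed before splitting is an added assumption, not stated in the method.

```python
# Hypothetical sketch: split the training set into part A (30%) and part B (70%).
import random

def split_training_set(samples, ratio_a=0.3, seed=0):
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # assumed shuffle for balance
    cut = int(len(shuffled) * ratio_a)
    return shuffled[:cut], shuffled[cut:]  # (part A, part B)
```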
Stage 1 of training:
1) inputting part A of the training set into G_model to obtain the restored picture y' output by G_model;
2) calculating the mean square error loss of y' and x':

loss = (1/n) * Σ_{j=1}^{n} (x'_j − y'_j)²,

where x'_j and y'_j are the j-th pixel values of x' and y' respectively, n is the number of pixels, and j indexes the pixels from 1 to n;
3) if loss is not less than the set first threshold, calculating the gradient by differentiating loss and back-propagating the gradient to G_model to update its parameters, until loss is less than the set first threshold;
if loss is less than the first threshold, entering stage 2 of the training;
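The stage-1 mean-square-error loss above can be written out directly. Pixel values are flattened into plain lists in this sketch; a real implementation would operate on image tensors.

```python
# Sketch of the stage-1 loss: loss = (1/n) * sum_j (x'_j - y'_j)^2.

def mse_loss(x_pixels, y_pixels):
    """Mean square error between target pixels x' and restored pixels y'."""
    n = len(x_pixels)
    return sum((xj - yj) ** 2 for xj, yj in zip(x_pixels, y_pixels)) / n
```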
stage 2 of training:
1) inputting part B of the training set into G_model to obtain the restored picture y' output by G_model;
2) inputting y' and x' respectively into the discriminative model D_model to obtain two discrimination probabilities d_f and d_r, and maximizing:

max(log(d_r) + log(1 − d_f)),

using the maximized value to update D_model; when the two discrimination probabilities d_f and d_r both approach 0.5, stopping the update;
3) obtaining the intermediate features produced by D_model when calculating the discrimination probabilities, and calculating the perception error Ploss:

Ploss = (1/m) * Σ_{k=1}^{m} (v_{r,k} − v_{f,k})²,

where v_f and v_r are the intermediate features of y' and x' respectively, m is the number of intermediate features, and k runs from 1 to m;
If Ploss is not less than the set second threshold, Ploss is back-propagated to update the parameters of G_model until Ploss is less than the set second threshold; when Ploss is less than the set second threshold, the training ends;
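The two stage-2 quantities can be sketched numerically as follows. The discriminator objective is written here in the conventional GAN form, rewarding D_model for assigning a high probability d_r to the real picture x' and a low probability d_f to the generated y'; the perception error Ploss is the mean squared difference of the m intermediate features. Plain lists stand in for D_model's feature maps, which are an assumption of this sketch.

```python
# Hypothetical sketch of the stage-2 discriminator objective and perception error.
import math

def discriminator_objective(d_r, d_f):
    """Value D_model maximizes: log(d_r) + log(1 - d_f)."""
    return math.log(d_r) + math.log(1.0 - d_f)

def perceptual_loss(v_r, v_f):
    """Ploss = (1/m) * sum_k (v_{r,k} - v_{f,k})^2 over m intermediate features."""
    m = len(v_r)
    return sum((vr - vf) ** 2 for vr, vf in zip(v_r, v_f)) / m
```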
Considering that today's image-oriented neural networks tend to feed only the features of the last layer into a final Softmax classifier, the features are filtered layer by layer so as to express the final "useful" features of the image; the problem is that this loses some features and omits information. Therefore, the present embodiment also uses the intermediate features of some or all intermediate feature layers to calculate the perception error Ploss.
Referring to figs. 1 and 2, it can be seen that, similarly to loss, the perception error Ploss also controls whether the parameters of G_model continue to be updated.
If Ploss is not less than the second threshold, Ploss is back-propagated to update the parameters of G_model until Ploss is less than the set second threshold; as shown in fig. 2, once Ploss is less than the set second threshold, the training ends.
Thus, through the two training phases, the final goal of training G_model is achieved. For the present invention, after the training of G_model is completed, G_model can be directly deployed to an on-site server. An input picture has its water mist removed after being restored by G_model, and the defogged picture can be sent to other applications for subsequent use.
It can be appreciated that the back-propagation of Ploss is analogous to that of loss: the gradient is calculated by differentiating Ploss and is back-propagated to G_model.
In another embodiment of the present invention, when G_model is a deep neural network, the method further comprises the following steps:
s301: manually marking each image for training the deep neural network, and marking the positions of key points to obtain each marked image;
s302: and taking each marked image as the input of a corresponding stage of training to construct an auxiliary neural network, and assisting the deep neural network in learning and training the images.
In this embodiment, it is considered that the final result of the image recognition is to output the coordinates of the respective key points on the image. However, in the prior art the deep neural network learns the image and directly outputs two-dimensional coordinates for optimization learning; this is an extremely nonlinear process, and during optimization the loss function constrains the weights in the neural network only weakly. Therefore, in this embodiment, an auxiliary neural network is constructed from the training images and the positions of the manually labeled key points, building an intermediate state that assists the training and learning of the deep neural network. It can be understood that when D_model is also a deep neural network, the same auxiliary-neural-network approach can be adopted to assist the corresponding network in learning and training on the images.
In another embodiment, step S302 includes:
s3021: taking each marked image as input, selecting a proper middle layer from the deep neural network, and acquiring the output of the middle layer;
s3022: establishing an auxiliary neural network formed by convolution functions;
s3023: inputting the output of the middle layer and the corresponding attitude estimation matrix of each image before labeling into an auxiliary neural network;
s3024: and combining the outputs of the auxiliary neural network and the deep neural network, and jointly inputting the outputs into a loss function of the deep neural network to optimize the learning of the deep neural network.
In another embodiment of the present invention, the attitude estimation matrix in step S3023 is obtained by the following steps:
s30231: calibrating the camera, and solving the intrinsic parameters of the camera, wherein the intrinsic parameters comprise: the image optical axis principal point, the focal lengths in the X direction and the Y direction, the tangential distortion coefficient and the radial distortion coefficient;
s30232: the attitude estimation matrix is further solved as follows:
solving the attitude estimation matrix [R | t] from x = M [R | t] X,
where M is the internal parameter matrix of the camera, X is the world coordinate of a known shot object, and x is its image pixel coordinate; R and t are the rotation vector and the translation vector of the attitude estimation matrix, respectively.
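A small worked example of the relation x = M [R|t] X: a world point X is mapped to pixel coordinates with assumed intrinsics M and an identity pose (R = I, t = 0). Pure-Python matrices are used; a real pipeline would recover [R|t] with a PnP solver from known 3D-2D correspondences. The focal lengths and principal point below are illustrative assumptions.

```python
# Hypothetical sketch of camera projection x = M [R|t] X.

def project(M, R, t, X_world):
    """Pixel (u, v) of world point X_world under intrinsics M and pose R, t."""
    # Camera-frame coordinates: Xc = R * X + t
    Xc = [sum(R[i][j] * X_world[j] for j in range(3)) + t[i] for i in range(3)]
    # Homogeneous image coordinates: (u*w, v*w, w) = M * Xc
    uw, vw, w = (sum(M[i][j] * Xc[j] for j in range(3)) for i in range(3))
    return uw / w, vw / w

# Assumed intrinsics: focal lengths 800 px, principal point (320, 240).
M = [[800, 0, 320], [0, 800, 240], [0, 0, 1]]
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```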
In another embodiment of the present invention, step S3021 includes:
the internal parameters of the camera are solved by a Zhang Zhengyou calibration method and a checkerboard with known size.
For example, the intrinsic parameters of the camera are solved by shooting checkerboards of known dimensions in different directions and at different positions, obtaining their pixel coordinates in the image coordinate system, and applying the Zhang Zhengyou calibration method.
In another embodiment of the present invention, each grid square of the checkerboard is 10 cm × 10 cm.
In another embodiment of the present invention, the deep neural network is ResNet50. Typically, the implementation is carried out in Python.
In another embodiment of the present invention, the auxiliary neural network is ResNet18.
In another embodiment of the present invention, the convolution function is Conv(input, w), where input represents the input and w represents the weight.
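The convolution function Conv(input, w) can be sketched in pure Python as a 2-D "valid" convolution (strictly, cross-correlation, as deep-learning frameworks implement it) of a single-channel input with kernel w; the single-channel, unit-stride form is a simplifying assumption.

```python
# Hypothetical sketch of Conv(input, w): 2-D valid cross-correlation.

def conv2d(inp, w):
    kh, kw = len(w), len(w[0])
    out_h = len(inp) - kh + 1
    out_w = len(inp[0]) - kw + 1
    return [[sum(inp[i + a][j + b] * w[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]
```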
In another embodiment, the loss function is selected as a mean square error function.
Referring to fig. 4, in another embodiment, the intermediate-layer outputs described above are taken from intermediate layer C5, of size 7 × 7 × 2048.
For the embodiments described above, the present disclosure effectively reduces the fitting difficulty during training of the relevant model while improving the model's robustness. On the same test set, after the labeled images were used as a training set and the above training and optimization methods were adopted, the mAP@0.5 accuracy of the model is 2.76% higher than without them. It is further noted that, when the attitude estimation matrix is solved by PnP based on the RANSAC algorithm, even though the average error of the PnP solution is about 5% in actual-scene verification, the fitting difficulty during training is still effectively reduced and the robustness of the model improved, so that subsequent actual-scene verification is not affected by the error of the attitude estimation matrix.
Although the embodiments of the present invention have been described above with reference to the accompanying drawings, the present invention is not limited to the above-described embodiments and application fields, and the above-described embodiments are illustrative, instructive, and not restrictive. Those skilled in the art, having the benefit of this disclosure, may effect numerous modifications thereto without departing from the scope of the invention as defined by the appended claims.