CN117560494A - Encoding method for rapidly enhancing underground low-quality video - Google Patents

Encoding method for rapidly enhancing underground low-quality video

Info

Publication number
CN117560494A
CN117560494A
Authority
CN
China
Prior art keywords
frame
frames
interpolation
low
pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410038681.4A
Other languages
Chinese (zh)
Other versions
CN117560494B (en)
Inventor
赵作鹏
高宇蒙
刘营
闵冰冰
缪小然
胡建峰
贺晨
赵广明
周杰
赵强
芈欢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Biteda Information Technology Co ltd
China University of Mining and Technology CUMT
Original Assignee
Jiangsu Biteda Information Technology Co ltd
China University of Mining and Technology CUMT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Biteda Information Technology Co ltd, China University of Mining and Technology CUMT filed Critical Jiangsu Biteda Information Technology Co ltd
Priority to CN202410038681.4A priority Critical patent/CN117560494B/en
Publication of CN117560494A publication Critical patent/CN117560494A/en
Application granted granted Critical
Publication of CN117560494B publication Critical patent/CN117560494B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/132Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/177Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a group of pictures [GOP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

An encoding method for rapidly enhancing underground low-quality video comprises the following steps: acquiring low-illumination, low-definition video stream data under various lighting conditions in an underground coal mine using a detection camera; processing the acquired video stream data, extracting low-illumination scene videos, and constructing an underground low-quality video data set; receiving two reference frames through a frame interpolator and performing a frame interpolation operation to generate one reference frame; determining the GOP structure and encoding the I frames and P frames with an I-frame encoder and a P-frame encoder respectively; and encoding the input B frames with the current P-frame encoder, using the generated reference frame as reference. The invention adds a B-frame encoding capability to an existing neural P-frame codec, greatly improving the enhancement-encoding performance of the P-frame encoder on low-quality video, with strong flexibility and generalization; it realizes comprehensive analysis of the GOP structure and improves overall coding efficiency; and it provides scientific data support for coal mine safety management, raising the level of safety management.

Description

Encoding method for rapidly enhancing underground low-quality video
Technical Field
The invention relates to an encoding method for rapidly enhancing underground low-quality video, and belongs to the technical field of safe coal mining.
Background
The structure of a mine causes light to be distributed unevenly throughout the scene, often resulting in insufficient illumination, especially deep in the mine. In addition, dust, water mist, harmful gases and the like are present in the mine, all of which degrade the quality of surveillance video. To ensure the safety and productivity of miners, it is important to be able to see all the details of the environment clearly.
In recent years, artificial intelligence technology has been widely applied in the field of video processing. AI techniques can extract useful information from video by learning from large amounts of data, and can classify, analyze, and process that information. In underground low-quality video enhancement encoding, AI can be used to preprocess, extract, and optimize video, improving video quality and readability and enabling the detection of potential hazards such as cracks, collapse, and water accumulation; for automated and machine-assisted operations, clear video can further improve the accuracy and reliability of machines. In addition, underground low-quality video enhancement encoding must also meet real-time and robustness requirements: the real-time requirement means the enhancement algorithm can process video data in a short time and output an enhanced result, while robustness requires the algorithm to adapt to different underground environments and equipment conditions and output stable, reliable results. Cameras in underground environments are often affected by dust, water mist, and similar factors, so the quality of the captured video is unstable. Existing video enhancement algorithms, however, are sensitive to environmental changes and struggle to deliver stable enhancement results.
Disclosure of Invention
The invention provides an encoding method for rapidly enhancing underground low-quality video, which offers a more robust and stable video enhancement algorithm, improves the enhancement-encoding performance and coding efficiency for low-quality video, provides data support for coal mine safety management, and raises the level of underground safety management.
To achieve the above purpose, the present invention provides an encoding method for rapidly enhancing underground low-quality video, comprising the following steps:
step one, acquiring low-illumination and low-definition video stream data under various light conditions in a coal mine by using a detection camera;
step two, processing the video stream data obtained in the step one, extracting low-illumination scene videos, and constructing an underground low-quality video data set;
step three, receiving two reference frames of a single underground coal mine low-quality video from a data set through a frame interpolator and performing frame interpolation operation to generate one reference frame;
step four, determining the GOP, namely the group-of-pictures structure, and encoding the I frame and the P frame using an I-frame encoder and a P-frame encoder respectively;
and fifthly, taking the reference frame as a reference, and encoding the input B frame through a current P frame encoder.
Further, the step of processing the video stream data and constructing the downhole low-quality video data set in the step two is as follows:
(1) Pulling a video stream from the detection camera using OpenCV and GStreamer to provide real-time video preview and acquisition functions;
(2) Performing data acquisition in a multithreaded manner.
Further, the frame interpolation operation in the step three includes the steps of:
(1) Performing frame interpolation on low-illumination video data by using a Super-SloMo-based frame interpolation method, and normalizing a time index to two reference points of 0 and 1 to interpolate to time t, wherein 0< t <1;
(2) In the Super-SloMo-based frame interpolation, calculating the forward and backward optical flows between the reference frames with FlowNet;
(3) Interpolating at time t using the linear interpolation method in FlowInterpolation; the interpolated optical flows are calculated as:

$F_{t\to 0} = -(1-t)\,t\,F_{0\to 1} + t^2\,F_{1\to 0}$
$F_{t\to 1} = (1-t)^2\,F_{0\to 1} - t\,(1-t)\,F_{1\to 0}$

where $F_{0\to 1}$ and $F_{1\to 0}$ are the forward and backward optical flows between the reference frames $I_0$ and $I_1$;
(4) Warping the reference frames $I_0$ and $I_1$ using the interpolated optical flows $F_{t\to 0}$ and $F_{t\to 1}$, and delivering the two warped reference frames, together with the original reference frames and the interpolated flows, to RefineNet, which further adjusts the bidirectional flows into $F'_{t\to 0}$ and $F'_{t\to 1}$ and generates a visibility mask $V_{t\leftarrow 0}$;
(5) Generating the interpolation result by bidirectional warping:

$\hat I_t = \dfrac{(1-t)\,V_{t\leftarrow 0}\odot g(I_0, F'_{t\to 0}) + t\,V_{t\leftarrow 1}\odot g(I_1, F'_{t\to 1})}{(1-t)\,V_{t\leftarrow 0} + t\,V_{t\leftarrow 1}}$

where $F'_{t\to 0}$ and $F'_{t\to 1}$ are the forward and backward optical flows adjusted by RefineNet, $g(\cdot,\cdot)$ is the backward warping function, $\odot$ denotes element-wise multiplication, and $V_{t\leftarrow 0}$ and $V_{t\leftarrow 1} = 1 - V_{t\leftarrow 0}$ are the masks generated by RefineNet.
Further, the step of calculating the forward and backward optical flows between the reference frames by FlowNet in the step (2) of the frame interpolation operation in the step three is:
(1) Between two adjacent frames, finding matched feature point pairs using a feature point matching algorithm;
(2) From the matched feature point pairs, establishing the correspondence between the two adjacent frames by fitting methods such as least squares, i.e., finding the corresponding position in the next frame of each pixel of the current frame;
(3) From the established correspondence, calculating the motion vector of each pixel by an optical flow estimation method to form the forward optical flow field;
(4) From the matched feature point pairs, likewise establishing the correspondence from the next frame back to the current frame, i.e., finding the corresponding position in the current frame of each pixel of the next frame;
(5) From this correspondence, calculating the motion vector of each pixel by the optical flow estimation method to form the backward optical flow field.
Further, in the step (4) of the frame interpolation operation in the step three, the step of using the interpolation optical flow to perform warping between the reference frames is as follows:
(1) Using a feature point matching algorithm between two adjacent frames, where matching determines whether the feature vectors of two key points are similar by comparing their Euclidean distance (or, for binary descriptors, their Hamming distance), thereby finding matched feature point pairs; the Euclidean distance is calculated as:

$d(D_i, D_j) = \sqrt{\sum_{k=1}^{N}\left(D_i[k] - D_j[k]\right)^2}$

where $D_i$ and $D_j$ are the descriptors of the two feature points, $N$ is the descriptor dimension, and $D_i[k]$ and $D_j[k]$ are the values of the two feature points in the $k$-th dimension;
(2) From the matched feature point pairs, calculating the motion vector of each pixel using an optical flow estimation method to form a preliminary optical flow field;
(3) On the basis of the preliminary optical flow field, interpolating pixels between adjacent frames using bilinear or bicubic interpolation to obtain a more accurate optical flow estimate;
(4) Performing distortion correction on the reference frame according to the difference between the new position given by the interpolated optical flow estimate and the current position: applying optical flow estimation and interpolation to the current frame and then interpolating the corrected pixel values onto the reference frame; displacement vectors record how many pixels each pixel moves horizontally and vertically, capturing the relative displacement between the two frames so that the motion of a target object between the frames appears smoother and more natural;
(5) Obtaining the final interpolated optical flow estimate after distortion correction.
Further, the step of determining the GOP structure in the step four is as follows:
(1) Setting the first frame and the last frame of the video as I frames, setting the intermediate frames as B frames and P frames, and encoding the B frames and P frames alternately;
(2) In IBP configuration, the first I frame is used as a reference frame, the subsequent P frame references the I frame, and the B frame references the previous and subsequent I frame, P frame, or B frame.
Further, in the fourth step, the step of encoding the I frame and the P frame by using the I frame encoder and the P frame encoder respectively includes:
(1) The I-frame codec compresses the input frame $x_1$ through a single autoencoder into the reconstruction $\hat x_1$;
(2) The P-frame codec generates a prediction $\bar x_t$ of the input frame $x_t$ through the motion estimation of Flow-AE and the motion compensation of Warp;
(3) The prediction is corrected with the residual from Residual-AE to reconstruct $\hat x_t$; the calculation process is:

$\hat f_t = \text{Flow-AE}(x_t, x_{ref}),\quad \bar x_t = \text{Warp}(x_{ref}, \hat f_t),\quad \hat r_t = \text{Residual-AE}(x_t - \bar x_t),\quad \hat x_t = \bar x_t + \hat r_t$

where $\hat f_t$ denotes the optical flow, $x_t - \bar x_t$ and $\hat r_t$ denote the encoder residual and decoder residual respectively, the reference $x_{ref}$ is the interpolation result $\hat I_t$ calculated in step three, Flow-AE is the motion estimator, and Residual-AE is the residual corrector.
Further, in the fifth step, the step of encoding the input B frame by the current P frame encoder using the reference frame as a reference includes:
(1) Taking the B frame to be encoded as input. A B frame is a bidirectionally predicted frame whose pixel values can be calculated by referring to the pixel values of the previous and next frames. Specifically, let the B frame to be coded be frame $B$, the previous frame be $P_1$, and the next frame be $P_2$, and let $(x, y)$ denote a pixel position in the B frame; its motion vectors $(dx_1, dy_1)$ and $(dx_2, dy_2)$, obtained by motion estimation, give the direction and distance of the pixel's motion in $P_1$ and $P_2$ respectively. The pixel value is then obtained by the weighted-average formula:

$B(x, y) = w_1\,P_1(x + dx_1, y + dy_1) + w_2\,P_2(x + dx_2, y + dy_2),\quad w_1 + w_2 = 1$;
(2) Taking the single frame obtained through the interpolation operation as the reference frame, this frame being the interpolation result $\hat I_t$ calculated in step three;
(3) Mapping the pixel values of the B frame onto the pixel values of the reference frame, and performing motion estimation on the basis of this pixel-domain mapping; motion estimation estimates the motion information of an object by finding, for each pixel in the current B frame, the best-matching pixel in the reference frame. Specifically: dividing the current frame into small blocks; defining a search window for each block; searching the reference frame for the block most similar to the current block; comparing the similarity between the current block and each candidate block in the search window using the sum of absolute differences (SAD); selecting the displacement that minimizes this measure as the motion vector of the current block, which represents the displacement of the current block relative to the corresponding block in the reference frame; and repeating this process for every block in the video sequence to obtain the motion vector field of the entire frame;
(4) Performing motion compensation on the B frame according to the motion estimation result: translating (and rotating) pixel values in the reference frame according to the motion information to obtain the predicted pixel values of the B frame. Specifically: using the motion vector to displace the corresponding block in the reference frame so as to predict the block at its position in the current frame; calculating the residual between the current block and the predicted block, i.e., the difference between the actual and predicted pixel values; and encoding and transmitting only this residual, so that the decoder reconstructs the original block by adding the predicted block and the residual;
(5) Coding the difference between the predicted pixel value and the actual pixel value of the B frame through a residual error network;
(6) And outputting the coded B frame as a coding result, wherein the result comprises a predicted pixel value of the B frame and corresponding motion information.
Further, in the step five, the reference frame is taken as a reference, and in the step (3) of encoding the input B frame by the current P frame encoder, the step of mapping the pixel value of the B frame onto the pixel value of the reference frame is as follows:
(1) For each block in the B frame, finding a similar block in the reference frame and mapping the similar block to the B frame, wherein the mapping relation is determined by the information of a motion vector obtained by motion estimation;
(2) According to the determined mapping relation, a mapping table is established, wherein the mapping table is a two-dimensional array, and each element represents the corresponding position of one pixel point in the B frame in the reference frame;
(3) Mapping each pixel point in the B frame to a corresponding position in the reference frame by using the established mapping table;
(4) And replacing the original pixel value in the B frame with the new mapped pixel value to finish the mapping of the pixel domain.
Further, the feature point matching algorithm is either the SIFT algorithm or the SURF algorithm; the optical flow estimation method is either the Lucas-Kanade algorithm or the Farnebäck algorithm.
The method acquires low-illumination, low-definition video stream data under various lighting conditions in an underground coal mine using a detection camera; processes the acquired video stream data, extracts low-illumination scene videos, and constructs an underground low-quality video data set; receives two reference frames of a single underground coal mine low-quality video from the data set through a frame interpolator and performs a frame interpolation operation to generate one reference frame; determines the GOP, namely the group-of-pictures structure, and encodes the I frames and P frames with an I-frame encoder and a P-frame encoder respectively; and encodes the input B frames with the current P-frame encoder, using the generated reference frame as reference. By adding an interpolation block, the method adds a B-frame encoding capability to an existing neural P-frame codec, greatly improving the enhancement-encoding performance of the P-frame encoder on low-quality video, with strong flexibility and generalization; it realizes comprehensive analysis of the GOP structure and improves overall coding efficiency; and it realizes data integration and analysis, uploading the enhancement-encoded video to a platform for integration and analysis, providing scientific data support for coal mine safety management and improving the level of safety management.
Drawings
FIG. 1 is a schematic illustration of the working principle of the present invention;
FIG. 2 is a schematic diagram of a frame interpolation method in an embodiment of the invention;
FIG. 3 is a schematic diagram of I-frame and P-frame encoder structures in an embodiment of the invention;
FIG. 4 is a schematic diagram of an in-mine video screenshot after enhancement encoding in accordance with an embodiment of the invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
An encoding method for rapidly enhancing underground low-quality video comprises the following steps:
step one, acquiring low-illumination and low-definition video stream data under various light conditions in a coal mine by using a detection camera;
step two, processing the video stream data obtained in the step one, extracting low-illumination scene videos, and constructing an underground low-quality video data set;
step three, receiving two reference frames of a single underground coal mine low-quality video from a data set through a frame interpolator and performing frame interpolation operation to generate one reference frame;
step four, determining the GOP, namely the group-of-pictures structure, and encoding the I frame and the P frame using an I-frame encoder and a P-frame encoder respectively;
and fifthly, taking the reference frame as a reference, and encoding the input B frame through a current P frame encoder.
As a preferred embodiment, the step of processing the video stream data and constructing the downhole low-quality video data set in the step two is:
(1) Pulling a video stream from the detection camera using OpenCV and GStreamer to provide real-time video preview and acquisition functions;
(2) Performing data acquisition in a multithreaded manner; a capture sketch is given below.
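The sketch below illustrates one way such a multithreaded acquisition pipeline could look, assuming OpenCV was built with GStreamer support; the RTSP address, GStreamer pipeline string, queue size, frame rate, and output path are illustrative assumptions, not values specified by the invention.

```python
import threading
import queue

import cv2

# Hypothetical RTSP address of an underground detection camera (assumption).
RTSP_URL = "rtsp://192.168.1.64/stream1"

# GStreamer pipeline: pull the RTSP stream, decode H.264, hand BGR frames
# to OpenCV via appsink. Element names are standard GStreamer plugins.
PIPELINE = (
    f"rtspsrc location={RTSP_URL} latency=200 ! "
    "rtph264depay ! h264parse ! avdec_h264 ! "
    "videoconvert ! video/x-raw,format=BGR ! appsink"
)

frame_queue = queue.Queue(maxsize=256)

def capture_worker() -> None:
    """Capture thread: pull frames from the camera and enqueue them."""
    cap = cv2.VideoCapture(PIPELINE, cv2.CAP_GSTREAMER)
    try:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            frame_queue.put(frame)  # blocks when the queue is full
    finally:
        cap.release()
        frame_queue.put(None)  # sentinel: end of stream

def writer_worker(path: str) -> None:
    """Writer thread: dequeue frames and save the low-illumination clip."""
    writer = None
    while True:
        frame = frame_queue.get()
        if frame is None:
            break
        if writer is None:
            h, w = frame.shape[:2]
            writer = cv2.VideoWriter(
                path, cv2.VideoWriter_fourcc(*"mp4v"), 25.0, (w, h))
        writer.write(frame)
    if writer is not None:
        writer.release()

threads = [
    threading.Thread(target=capture_worker, daemon=True),
    threading.Thread(target=writer_worker, args=("clip.mp4",), daemon=True),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```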
As a preferred embodiment, as shown in fig. 1, the step of the frame interpolation operation in the step three is:
(1) Performing frame interpolation on low-illumination video data by using a Super-SloMo-based frame interpolation method, and normalizing a time index to two reference points of 0 and 1 to interpolate to time t, wherein 0< t <1;
(2) In the Super-SloMo-based frame interpolation, calculating the forward and backward optical flows between the reference frames with FlowNet;
(3) Interpolating at time t using the linear interpolation method in FlowInterpolation; the interpolated optical flows are calculated as:

$F_{t\to 0} = -(1-t)\,t\,F_{0\to 1} + t^2\,F_{1\to 0}$
$F_{t\to 1} = (1-t)^2\,F_{0\to 1} - t\,(1-t)\,F_{1\to 0}$

where $F_{0\to 1}$ and $F_{1\to 0}$ are the forward and backward optical flows between the reference frames $I_0$ and $I_1$;
(4) Warping the reference frames $I_0$ and $I_1$ using the interpolated optical flows $F_{t\to 0}$ and $F_{t\to 1}$, and delivering the two warped reference frames, together with the original reference frames and the interpolated flows, to RefineNet, which further adjusts the bidirectional flows into $F'_{t\to 0}$ and $F'_{t\to 1}$ and generates a visibility mask $V_{t\leftarrow 0}$;
(5) Generating the interpolation result by bidirectional warping:

$\hat I_t = \dfrac{(1-t)\,V_{t\leftarrow 0}\odot g(I_0, F'_{t\to 0}) + t\,V_{t\leftarrow 1}\odot g(I_1, F'_{t\to 1})}{(1-t)\,V_{t\leftarrow 0} + t\,V_{t\leftarrow 1}}$

where $F'_{t\to 0}$ and $F'_{t\to 1}$ are the forward and backward optical flows adjusted by RefineNet, $g(\cdot,\cdot)$ is the backward warping function, $\odot$ denotes element-wise multiplication, and $V_{t\leftarrow 0}$ and $V_{t\leftarrow 1} = 1 - V_{t\leftarrow 0}$ are the masks generated by RefineNet. A NumPy sketch of these formulas is given below.
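The two intermediate-flow formulas and the bidirectional blend reconstructed above can be expressed directly in NumPy. This is a minimal sketch, assuming the Super-SloMo convention that the two visibility masks sum to one ($V_{t\leftarrow 1} = 1 - V_{t\leftarrow 0}$); function names are illustrative.

```python
import numpy as np

def intermediate_flows(f01: np.ndarray, f10: np.ndarray, t: float):
    """Linear combination of the bidirectional flows at time t (0 < t < 1),
    following the interpolation formulas above. f01/f10 are H x W x 2."""
    f_t0 = -(1.0 - t) * t * f01 + t * t * f10          # F_{t->0}
    f_t1 = (1.0 - t) ** 2 * f01 - t * (1.0 - t) * f10  # F_{t->1}
    return f_t0, f_t1

def blend(i0, i1, g0, g1, v0, t):
    """Mask-weighted bidirectional blend producing the interpolated frame.
    g0/g1 are I0/I1 backward-warped by the refined flows; v0 is the
    visibility mask for I0 and (1 - v0) the mask for I1."""
    v1 = 1.0 - v0
    num = (1.0 - t) * v0 * g0 + t * v1 * g1
    den = (1.0 - t) * v0 + t * v1
    return num / np.maximum(den, 1e-8)  # avoid division by zero
```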
As a preferred embodiment, the step of calculating the forward and reverse optical flows between the reference frames by FlowNet in the step (2) of the frame interpolation operation in the step three is:
(1) Between two adjacent frames, finding matched feature point pairs using a feature point matching algorithm;
(2) From the matched feature point pairs, establishing the correspondence between the two adjacent frames by fitting methods such as least squares, i.e., finding the corresponding position in the next frame of each pixel of the current frame;
(3) From the established correspondence, calculating the motion vector of each pixel by an optical flow estimation method to form the forward optical flow field;
(4) From the matched feature point pairs, likewise establishing the correspondence from the next frame back to the current frame, i.e., finding the corresponding position in the current frame of each pixel of the next frame;
(5) From this correspondence, calculating the motion vector of each pixel by the optical flow estimation method to form the backward optical flow field. A sketch of computing these two fields with a classical estimator is given below.
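As a concrete illustration of computing the forward and backward fields, the sketch below uses OpenCV's dense Farnebäck estimator (one of the optical flow estimation methods named later in this document) in place of the learned FlowNet; all parameter values are assumptions.

```python
import cv2
import numpy as np

def bidirectional_flow(frame0: np.ndarray, frame1: np.ndarray):
    """Dense forward (frame0 -> frame1) and backward (frame1 -> frame0)
    optical flow fields using the Farneback algorithm."""
    g0 = cv2.cvtColor(frame0, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
    params = dict(pyr_scale=0.5, levels=3, winsize=15,
                  iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
    f01 = cv2.calcOpticalFlowFarneback(g0, g1, None, **params)  # forward
    f10 = cv2.calcOpticalFlowFarneback(g1, g0, None, **params)  # backward
    return f01, f10  # each is H x W x 2 (dx, dy per pixel)
```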
As a preferred embodiment, the step of using the interpolated optical flow for warping between the reference frames in the step (4) of the frame interpolation operation in the step three is:
(1) Using a feature point matching algorithm between two adjacent frames, where matching determines whether the feature vectors of two key points are similar by comparing their Euclidean distance (or, for binary descriptors, their Hamming distance), thereby finding matched feature point pairs; the Euclidean distance is calculated as:

$d(D_i, D_j) = \sqrt{\sum_{k=1}^{N}\left(D_i[k] - D_j[k]\right)^2}$

where $D_i$ and $D_j$ are the descriptors of the two feature points, $N$ is the descriptor dimension, and $D_i[k]$ and $D_j[k]$ are the values of the two feature points in the $k$-th dimension;
(2) From the matched feature point pairs, calculating the motion vector of each pixel using an optical flow estimation method to form a preliminary optical flow field;
(3) On the basis of the preliminary optical flow field, interpolating pixels between adjacent frames using bilinear or bicubic interpolation to obtain a more accurate optical flow estimate;
(4) Performing distortion correction on the reference frame according to the difference between the new position given by the interpolated optical flow estimate and the current position: applying optical flow estimation and interpolation to the current frame and then interpolating the corrected pixel values onto the reference frame; displacement vectors record how many pixels each pixel moves horizontally and vertically, capturing the relative displacement between the two frames so that the motion of a target object between the frames appears smoother and more natural;
(5) Obtaining the final interpolated optical flow estimate after distortion correction. A backward-warping sketch is given below.
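Backward warping of a frame by a dense flow field, as used in the steps above, can be implemented with bilinear resampling. The following is a minimal sketch using cv2.remap, illustrative rather than the invention's exact implementation.

```python
import cv2
import numpy as np

def warp(frame: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Backward-warp `frame` by the per-pixel displacement field `flow`
    (H x W x 2, in pixels) using bilinear interpolation."""
    h, w = flow.shape[:2]
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    # Sample each output pixel from (x + dx, y + dy) in the source frame.
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    return cv2.remap(frame, map_x, map_y, interpolation=cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_REPLICATE)
```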
Further, the step of determining the GOP structure in the step four is as follows:
(1) Setting the first frame and the last frame of the video as I frames, setting the intermediate frames as B frames and P frames, and encoding the B frames and P frames alternately;
(2) In the IBP configuration, the first I frame serves as the reference frame, subsequent P frames reference the I frame, and B frames reference the preceding and following I, P, or B frames; a frame-type assignment sketch is given below.
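A minimal sketch of this frame-type assignment follows; whether a B or a P frame comes first among the intermediate frames is an assumption based on the IBP configuration described above.

```python
def assign_gop_types(num_frames: int) -> list[str]:
    """Assign frame types per the GOP rule above: first and last frames
    are I frames, intermediate frames alternate between B and P."""
    if num_frames <= 0:
        return []
    if num_frames == 1:
        return ["I"]
    types = ["I"]
    for i in range(1, num_frames - 1):
        types.append("B" if i % 2 == 1 else "P")
    types.append("I")
    return types

# Example: a 10-frame sequence -> I B P B P B P B P I
print(assign_gop_types(10))
```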
As a preferred embodiment, the step of encoding the I frame and the P frame with the I frame encoder and the P frame encoder in the step four includes:
(1) The I-frame codec compresses the input frame $x_1$ through a single autoencoder into the reconstruction $\hat x_1$;
(2) The P-frame codec generates a prediction $\bar x_t$ of the input frame $x_t$ through the motion estimation of Flow-AE and the motion compensation of Warp;
(3) The prediction is corrected with the residual from Residual-AE to reconstruct $\hat x_t$; the calculation process is:

$\hat f_t = \text{Flow-AE}(x_t, x_{ref}),\quad \bar x_t = \text{Warp}(x_{ref}, \hat f_t),\quad \hat r_t = \text{Residual-AE}(x_t - \bar x_t),\quad \hat x_t = \bar x_t + \hat r_t$

where $\hat f_t$ denotes the optical flow, $x_t - \bar x_t$ and $\hat r_t$ denote the encoder residual and decoder residual respectively, the reference $x_{ref}$ is the interpolation result $\hat I_t$ calculated in step three, Flow-AE is the motion estimator, and Residual-AE is the residual corrector. A conceptual sketch with stand-in modules is given below.
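The reconstruction arithmetic above can be illustrated with stand-in components. In the sketch below, flow_ae, residual_ae, and warp are hypothetical placeholders for the learned Flow-AE, Residual-AE, and Warp modules, whose internals the text does not specify.

```python
import numpy as np

def p_frame_roundtrip(x_t, x_ref, flow_ae, residual_ae, warp):
    """Conceptual P-frame codec pass matching the equations above."""
    f_hat = flow_ae(x_t, x_ref)          # motion estimation
    x_bar = warp(x_ref, f_hat)           # motion compensation (prediction)
    r = x_t - x_bar                      # encoder-side residual
    r_hat = residual_ae(r)               # decoded residual
    x_hat = x_bar + r_hat                # reconstruction
    return x_hat

# With identity stand-ins, the reconstruction is exact:
x = np.random.rand(4, 4, 3)
ref = np.random.rand(4, 4, 3)
out = p_frame_roundtrip(x, ref,
                        flow_ae=lambda a, b: np.zeros(a.shape[:2] + (2,)),
                        residual_ae=lambda r: r,
                        warp=lambda frame, f: frame)
assert np.allclose(out, x)
```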
As a preferred embodiment, in the fifth step, the step of encoding the input B frame by the current P frame encoder using the reference frame as a reference includes:
(1) Taking the B frame to be encoded as input. A B frame is a bidirectionally predicted frame whose pixel values can be calculated by referring to the pixel values of the previous and next frames. Specifically, let the B frame to be coded be frame $B$, the previous frame be $P_1$, and the next frame be $P_2$, and let $(x, y)$ denote a pixel position in the B frame; its motion vectors $(dx_1, dy_1)$ and $(dx_2, dy_2)$, obtained by motion estimation, give the direction and distance of the pixel's motion in $P_1$ and $P_2$ respectively. The pixel value is then obtained by the weighted-average formula:

$B(x, y) = w_1\,P_1(x + dx_1, y + dy_1) + w_2\,P_2(x + dx_2, y + dy_2),\quad w_1 + w_2 = 1$;
(2) Taking the single frame obtained through the interpolation operation as the reference frame, this frame being the interpolation result $\hat I_t$ calculated in step three;
(3) Mapping the pixel values of the B frame onto the pixel values of the reference frame, and performing motion estimation on the basis of this pixel-domain mapping; motion estimation estimates the motion information of an object by finding, for each pixel in the current B frame, the best-matching pixel in the reference frame. Specifically: dividing the current frame into small blocks; defining a search window for each block; searching the reference frame for the block most similar to the current block; comparing the similarity between the current block and each candidate block in the search window using the sum of absolute differences (SAD); selecting the displacement that minimizes this measure as the motion vector of the current block, which represents the displacement of the current block relative to the corresponding block in the reference frame; and repeating this process for every block in the video sequence to obtain the motion vector field of the entire frame;
(4) Performing motion compensation on the B frame according to the motion estimation result: translating (and rotating) pixel values in the reference frame according to the motion information to obtain the predicted pixel values of the B frame. Specifically: using the motion vector to displace the corresponding block in the reference frame so as to predict the block at its position in the current frame; calculating the residual between the current block and the predicted block, i.e., the difference between the actual and predicted pixel values; and encoding and transmitting only this residual, so that the decoder reconstructs the original block by adding the predicted block and the residual;
(5) Coding the difference between the predicted pixel value and the actual pixel value of the B frame through a residual error network;
(6) Outputting the encoded B frame as the encoding result, which comprises the predicted pixel values of the B frame and the corresponding motion information; a block-matching sketch covering steps (3) and (4) is given below.
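The block-based motion estimation and compensation of steps (3) and (4) can be illustrated with a full-search SAD matcher; the block size and search range below are illustrative assumptions, and frame dimensions are assumed to be multiples of the block size.

```python
import numpy as np

def block_match(cur: np.ndarray, ref: np.ndarray,
                block: int = 8, search: int = 7) -> np.ndarray:
    """Full-search block matching with a SAD criterion: for each block of
    the current frame, find the displacement within +/-`search` pixels of
    the reference frame that minimizes the sum of absolute differences."""
    h, w = cur.shape[:2]
    mv = np.zeros((h // block, w // block, 2), dtype=np.int32)
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            cur_blk = cur[by:by + block, bx:bx + block].astype(np.int32)
            best, best_mv = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if y < 0 or x < 0 or y + block > h or x + block > w:
                        continue
                    ref_blk = ref[y:y + block, x:x + block].astype(np.int32)
                    sad = np.abs(cur_blk - ref_blk).sum()
                    if best is None or sad < best:
                        best, best_mv = sad, (dx, dy)
            mv[by // block, bx // block] = best_mv
    return mv

def compensate(ref: np.ndarray, mv: np.ndarray, block: int = 8) -> np.ndarray:
    """Predict the current frame by shifting reference blocks by their
    motion vectors; the residual (current - prediction) is what gets coded."""
    pred = np.zeros_like(ref)
    for by in range(mv.shape[0]):
        for bx in range(mv.shape[1]):
            dx, dy = mv[by, bx]
            y, x = by * block, bx * block
            pred[y:y + block, x:x + block] = \
                ref[y + dy:y + dy + block, x + dx:x + dx + block]
    return pred
```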
Further, in the step five, the reference frame is taken as a reference, and in the step (3) of encoding the input B frame by the current P frame encoder, the step of mapping the pixel value of the B frame onto the pixel value of the reference frame is as follows:
(1) For each block in the B frame, finding a similar block in the reference frame and mapping the similar block to the B frame, wherein the mapping relation is determined by the information of a motion vector obtained by motion estimation;
(2) According to the determined mapping relation, a mapping table is established, wherein the mapping table is a two-dimensional array, and each element represents the corresponding position of one pixel point in the B frame in the reference frame;
(3) Mapping each pixel point in the B frame to a corresponding position in the reference frame by using the established mapping table;
(4) Replacing the original pixel values in the B frame with the new mapped pixel values to complete the pixel-domain mapping; a sketch of the mapping table follows below.
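The mapping table of steps (2) and (3) can be sketched as a per-pixel lookup derived from the block motion vectors produced by block_match above; the helper names are hypothetical, and out-of-bounds positions are simply clipped here.

```python
import numpy as np

def build_mapping_table(mv: np.ndarray, h: int, w: int, block: int = 8):
    """Expand per-block motion vectors into the per-pixel mapping table
    described above: entry (y, x) holds the corresponding (y', x') in the
    reference frame."""
    table = np.zeros((h, w, 2), dtype=np.int32)
    ys, xs = np.mgrid[0:h, 0:w]
    dx = np.repeat(np.repeat(mv[..., 0], block, 0), block, 1)[:h, :w]
    dy = np.repeat(np.repeat(mv[..., 1], block, 0), block, 1)[:h, :w]
    table[..., 0] = np.clip(ys + dy, 0, h - 1)
    table[..., 1] = np.clip(xs + dx, 0, w - 1)
    return table

def apply_mapping(ref: np.ndarray, table: np.ndarray) -> np.ndarray:
    """Replace each B-frame pixel with the mapped reference-frame pixel."""
    return ref[table[..., 0], table[..., 1]]
```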
As a preferred embodiment, the feature point matching algorithm is either the SIFT algorithm or the SURF algorithm; the optical flow estimation method is either the Lucas-Kanade algorithm or the Farnebäck algorithm.
Examples
In this embodiment, the underground coal mine low-quality video data set uses videos provided by the video surveillance system of the Chensilou coal mine. From these, 2000 video clips were extracted, yielding 1879 low-quality clips covering 5 scenes and 4 lighting conditions captured by 10 detection cameras; the training set contains 1503 low-quality clips from 4 underground environments (backlight, non-uniform illumination, strong-light interference, and low noise), and the test set contains 376 low-quality clips. In this embodiment, the detection camera needs a wide viewing angle, the ability to adapt to complex environments, strong real-time capability, and durability, so that it can work stably and reliably in the underground environment and guarantee real-time encoding of underground low-quality video.
as shown in fig. 2, the frame interpolation operation is performed by using a Super-SloMo based frame interpolation method, specifically: calculating forward and backward optical flows between reference frames through FlowNet; interpolation at time t using linear interpolation in FlowInterpolation; warping between reference frames using the interpolated optical flow and submitting the two warped reference points, the original reference point and the interpolated optical flow together to refinnenet to further adjust the bi-directional optical flow and generate a mask; generating an interpolation result by using the bidirectional warping;
The first reference frame of the video is encoded as an I frame and the later reference frame as a P frame, and the encoded B frame serves as the reference for the next B frame. The I frame and the P frame are encoded by the I-frame encoder and the P-frame encoder, whose structures are shown in fig. 3. The encoding process is as follows: the I-frame codec compresses the input frame $x_1$ through a single autoencoder into the reconstruction $\hat x_1$; the P-frame codec generates a prediction $\bar x_t$ of the input frame $x_t$ through the motion estimation of Flow-AE and the motion compensation of Warp; the prediction is corrected with the residual from Residual-AE to reconstruct $\hat x_t$.
The input B frame is encoded by the current P-frame encoder as follows: the B frame to be encoded is taken as input. A B frame is a bidirectionally predicted frame whose pixel values can be calculated by referring to the pixel values of the previous and next frames. The single frame obtained through the interpolation operation is taken as the reference frame; the pixel values of the B frame are mapped onto the pixel values of the reference frame; motion estimation is performed on the basis of the pixel-domain mapping, i.e., the motion information of an object is estimated by finding, for each pixel in the current B frame, the best-matching pixel in the reference frame; motion compensation is then performed on the B frame according to the motion estimation result, translating and rotating pixel values in the reference frame according to the motion information to obtain the predicted pixel values of the B frame; the difference between the predicted and actual pixel values of the B frame is encoded by a residual network; and the encoded B frame is output as the encoding result, comprising the predicted pixel values of the B frame and the corresponding motion information. A comparison of PSNR and MS-SSIM on the UVG, MCL-JCV, and HEVC data sets and on the self-built underground data set, together with the average BD-rate gain of MS-SSIM relative to the H.264 standard, is shown in Table 1. PSNR (peak signal-to-noise ratio) is an index measuring image or video quality that estimates the degree of image distortion by comparing the mean squared error between the original image and the encoded or decoded image; a minimal PSNR sketch is given below. BD-rate is an index measuring the improvement in video coding performance. MS-SSIM (multi-scale structural similarity) measures image or video quality more comprehensively than single-scale SSIM because it takes structural information at different spatial scales into account: it computes the structural similarity index at multiple scales and then takes a weighted average. Compared with PSNR, MS-SSIM is more consistent with human perception of image quality. A screenshot of the underground video after enhancement encoding is shown in fig. 4.
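A minimal sketch of the PSNR metric described above follows (MS-SSIM and BD-rate require additional machinery and are omitted):

```python
import numpy as np

def psnr(original: np.ndarray, reconstructed: np.ndarray,
         max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio between an original frame and its
    encoded/decoded version: PSNR = 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((original.astype(np.float64)
                   - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10((max_val ** 2) / mse)
```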
Table 1. Comparison of the present invention with H.265 and SSF in PSNR and MS-SSIM on the UVG, MCL-JCV, and HEVC data sets and on the self-built underground data set, and the average BD-rate gain of MS-SSIM relative to the H.264 standard
From Table 1 it can be seen that the encoding method of the invention greatly improves the enhancement-encoding performance and video quality for low-quality video and improves the overall coding efficiency.

Claims (10)

1. An encoding method for rapidly enhancing underground low-quality video, characterized by comprising the following steps:
step one, acquiring low-illumination and low-definition video stream data under various light conditions in a coal mine by using a detection camera;
step two, processing the video stream data obtained in the step one, extracting low-illumination scene videos, and constructing an underground low-quality video data set;
step three, receiving two reference frames of a single underground coal mine low-quality video from a data set through a frame interpolator and performing frame interpolation operation to generate one reference frame;
step four, determining the GOP structure, and encoding the I frame and the P frame using an I-frame encoder and a P-frame encoder respectively;
and fifthly, taking the reference frame as a reference, and encoding the input B frame through a current P frame encoder.
2. The encoding method for rapidly enhancing underground low-quality video according to claim 1, wherein the steps of processing the video stream data and constructing the underground low-quality video data set in the step two are:
(1) Pulling a video stream from the detection camera using OpenCV and GStreamer to provide real-time video preview and acquisition functions;
(2) Performing data acquisition in a multithreaded manner.
3. The encoding method for rapidly enhancing underground low-quality video according to claim 1, wherein the steps of the frame interpolation operation in the step three are:
(1) Performing frame interpolation on the low-illumination video data by using a Super-SloMo-based frame interpolation method, and normalizing a time index to be two reference points of 0 and 1 to interpolate to time t, wherein 0< t <1;
(2) In the Super-SloMo-based frame interpolation, calculating the forward and backward optical flows between the reference frames with FlowNet;
(3) Interpolating at time t using the linear interpolation method in FlowInterpolation; the interpolated optical flows are calculated as:

$F_{t\to 0} = -(1-t)\,t\,F_{0\to 1} + t^2\,F_{1\to 0}$
$F_{t\to 1} = (1-t)^2\,F_{0\to 1} - t\,(1-t)\,F_{1\to 0}$

where $F_{0\to 1}$ and $F_{1\to 0}$ are the forward and backward optical flows between the reference frames $I_0$ and $I_1$;
(4) Warping the reference frames $I_0$ and $I_1$ using the interpolated optical flows $F_{t\to 0}$ and $F_{t\to 1}$, and delivering the two warped reference frames, together with the original reference frames and the interpolated flows, to RefineNet, which further adjusts the bidirectional flows into $F'_{t\to 0}$ and $F'_{t\to 1}$ and generates a visibility mask $V_{t\leftarrow 0}$;
(5) Generating the interpolation result by bidirectional warping:

$\hat I_t = \dfrac{(1-t)\,V_{t\leftarrow 0}\odot g(I_0, F'_{t\to 0}) + t\,V_{t\leftarrow 1}\odot g(I_1, F'_{t\to 1})}{(1-t)\,V_{t\leftarrow 0} + t\,V_{t\leftarrow 1}}$

where $F'_{t\to 0}$ and $F'_{t\to 1}$ are the forward and backward optical flows adjusted by RefineNet, $g(\cdot,\cdot)$ is the backward warping function, $\odot$ denotes element-wise multiplication, and $V_{t\leftarrow 0}$ and $V_{t\leftarrow 1} = 1 - V_{t\leftarrow 0}$ are the masks generated by RefineNet.
4. The encoding method for rapidly enhancing underground low-quality video according to claim 3, wherein the step of calculating the forward and backward optical flows between the reference frames by FlowNet in the step (2) is:
(1) Between two adjacent frames, finding matched feature point pairs using a feature point matching algorithm;
(2) From the matched feature point pairs, establishing the correspondence between the two adjacent frames by fitting methods such as least squares, i.e., finding the corresponding position in the next frame of each pixel of the current frame;
(3) From the established correspondence, calculating the motion vector of each pixel by an optical flow estimation method to form the forward optical flow field;
(4) From the matched feature point pairs, likewise establishing the correspondence from the next frame back to the current frame, i.e., finding the corresponding position in the current frame of each pixel of the next frame;
(5) From this correspondence, calculating the motion vector of each pixel by the optical flow estimation method to form the backward optical flow field.
5. The encoding method for rapidly enhancing underground low-quality video according to claim 3, wherein the step of warping between the reference frames using the interpolated optical flow in the step (4) is:
(1) Using a feature point matching algorithm between two adjacent frames, where matching determines whether the feature vectors of two key points are similar by comparing their Euclidean distance (or, for binary descriptors, their Hamming distance), thereby finding matched feature point pairs; the Euclidean distance is calculated as:

$d(D_i, D_j) = \sqrt{\sum_{k=1}^{N}\left(D_i[k] - D_j[k]\right)^2}$

where $D_i$ and $D_j$ are the descriptors of the two feature points, $N$ is the descriptor dimension, and $D_i[k]$ and $D_j[k]$ are the values of the two feature points in the $k$-th dimension;
(2) From the matched feature point pairs, calculating the motion vector of each pixel using an optical flow estimation method to form a preliminary optical flow field;
(3) On the basis of the preliminary optical flow field, interpolating pixels between adjacent frames using bilinear or bicubic interpolation to obtain a more accurate optical flow estimate;
(4) Performing distortion correction on the reference frame according to the difference between the new position given by the interpolated optical flow estimate and the current position: applying optical flow estimation and interpolation to the current frame and then interpolating the corrected pixel values onto the reference frame; displacement vectors record how many pixels each pixel moves horizontally and vertically, capturing the relative displacement between the two frames;
(5) Obtaining the final interpolated optical flow estimate after distortion correction.
6. The encoding method for rapidly enhancing underground low-quality video according to claim 1, wherein the step of determining the GOP structure in the step four is:
(1) Setting the first frame and the last frame of the video as I frames, setting the intermediate frames as B frames and P frames, and encoding the B frames and P frames alternately;
(2) In IBP configuration, the first I frame is used as a reference frame, the subsequent P frame references the I frame, and the B frame references the previous and subsequent I frame, P frame, or B frame.
7. The encoding method for rapidly enhancing underground low-quality video according to claim 1, wherein the step of encoding the I frame and the P frame by an I-frame encoder and a P-frame encoder respectively in the step four is:
(1) The I-frame codec compresses the input frame $x_1$ through a single autoencoder into the reconstruction $\hat x_1$;
(2) The P-frame codec generates a prediction $\bar x_t$ of the input frame $x_t$ through the motion estimation of Flow-AE and the motion compensation of Warp;
(3) The prediction is corrected with the residual from Residual-AE to reconstruct $\hat x_t$; the calculation process is:

$\hat f_t = \text{Flow-AE}(x_t, x_{ref}),\quad \bar x_t = \text{Warp}(x_{ref}, \hat f_t),\quad \hat r_t = \text{Residual-AE}(x_t - \bar x_t),\quad \hat x_t = \bar x_t + \hat r_t$

where $\hat f_t$ denotes the optical flow, $x_t - \bar x_t$ and $\hat r_t$ denote the encoder residual and decoder residual respectively, the reference $x_{ref}$ is the interpolation result $\hat I_t$ calculated in step three, Flow-AE is the motion estimator, and Residual-AE is the residual corrector.
8. The encoding method for rapidly enhancing underground low-quality video according to claim 1, wherein, in the step five, with the reference frame as reference, the step of encoding the input B frame by the current P-frame encoder is:
(1) Taking the B frame to be encoded as input. A B frame is a bidirectionally predicted frame whose pixel values are calculated by referring to the pixel values of the previous and next frames. Specifically, let the B frame to be coded be frame $B$, the previous frame be $P_1$, and the next frame be $P_2$, and let $(x, y)$ denote a pixel position in the B frame; its motion vectors $(dx_1, dy_1)$ and $(dx_2, dy_2)$, obtained by motion estimation, give the direction and distance of the pixel's motion in $P_1$ and $P_2$ respectively. The pixel value is then obtained by the weighted-average formula:

$B(x, y) = w_1\,P_1(x + dx_1, y + dy_1) + w_2\,P_2(x + dx_2, y + dy_2),\quad w_1 + w_2 = 1$;
(2) Taking the single frame obtained through the interpolation operation as the reference frame, this frame being the interpolation result $\hat I_t$ calculated in step three;
(3) Mapping the pixel values of the B frame onto the pixel values of the reference frame, and performing motion estimation on the basis of this pixel-domain mapping; motion estimation estimates the motion information of an object by finding, for each pixel in the current B frame, the best-matching pixel in the reference frame. Specifically: dividing the current frame into small blocks; defining a search window for each block; searching the reference frame for the block most similar to the current block; comparing the similarity between the current block and each candidate block in the search window using the sum of absolute differences (SAD); selecting the displacement that minimizes this measure as the motion vector of the current block, which represents the displacement of the current block relative to the corresponding block in the reference frame; and repeating this process for every block in the video sequence to obtain the motion vector field of the entire frame;
(4) Performing motion compensation on the B frame according to the motion estimation result: translating (and rotating) pixel values in the reference frame according to the motion information to obtain the predicted pixel values of the B frame. Specifically: using the motion vector to displace the corresponding block in the reference frame so as to predict the block at its position in the current frame; calculating the residual between the current block and the predicted block, i.e., the difference between the actual and predicted pixel values; and encoding and transmitting only this residual, so that the decoder reconstructs the original block by adding the predicted block and the residual;
(5) Coding the difference between the predicted pixel value and the actual pixel value of the B frame through a residual error network;
(6) And outputting the coded B frame as a coding result, wherein the result comprises a predicted pixel value of the B frame and corresponding motion information.
9. The encoding method for rapidly enhancing underground low-quality video according to claim 8, wherein the step of mapping the pixel values of the B frame onto the pixel values of the reference frame in the step (3) is:
(1) For each block in the B frame, finding a similar block in the reference frame and mapping the similar block to the B frame, wherein the mapping relation is determined by the information of a motion vector obtained by motion estimation;
(2) According to the determined mapping relation, a mapping table is established, wherein the mapping table is a two-dimensional array, and each element represents the corresponding position of one pixel point in the B frame in the reference frame;
(3) Mapping each pixel point in the B frame to a corresponding position in the reference frame by using the established mapping table;
(4) And replacing the original pixel value in the B frame with the new mapped pixel value to finish the mapping of the pixel domain.
10. The encoding method for rapidly enhancing underground low-quality video according to claim 4, wherein the feature point matching algorithm is either the SIFT algorithm or the SURF algorithm, and the optical flow estimation method is either the Lucas-Kanade algorithm or the Farnebäck algorithm.
CN202410038681.4A 2024-01-11 2024-01-11 Encoding method for rapidly enhancing underground low-quality video Active CN117560494B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410038681.4A CN117560494B (en) 2024-01-11 2024-01-11 Encoding method for rapidly enhancing underground low-quality video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410038681.4A CN117560494B (en) 2024-01-11 2024-01-11 Encoding method for rapidly enhancing underground low-quality video

Publications (2)

Publication Number Publication Date
CN117560494A 2024-02-13
CN117560494B 2024-03-19

Family

ID=89818927

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410038681.4A Active CN117560494B (en) 2024-01-11 2024-01-11 Encoding method for rapidly enhancing underground low-quality video

Country Status (1)

Country Link
CN (1) CN117560494B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120069895A1 (en) * 2010-09-17 2012-03-22 David Frederique Blum High Quality Video Encoder
CN103051894A (en) * 2012-10-22 2013-04-17 北京航空航天大学 Fractal and H.264-based binocular three-dimensional video compression and decompression method
CN110545402A (en) * 2019-08-18 2019-12-06 宁波职业技术学院 underground monitoring video processing method, computer equipment and storage medium
CN110662042A (en) * 2018-06-29 2020-01-07 英特尔公司 Global motion estimation and modeling for accurate global motion compensation for video processing
CN115984958A (en) * 2022-12-19 2023-04-18 中国矿业大学 Method for identifying actions of underground coal mine personnel applied to low-illumination environment
CN116600134A (en) * 2023-05-04 2023-08-15 光线云(杭州)科技有限公司 Parallel video compression method and device adapting to graphic engine


Also Published As

Publication number Publication date
CN117560494B (en) 2024-03-19


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant