CN115883764A - Underwater high-speed video frame interpolation method and system based on data cooperation - Google Patents

Underwater high-speed video frame interpolation method and system based on data cooperation

Info

Publication number
CN115883764A
Authority
CN
China
Prior art keywords
optical flow
flow estimation
event
data
frame
Prior art date
Legal status
Granted
Application number
CN202310076493.6A
Other languages
Chinese (zh)
Other versions
CN115883764B (en)
Inventor
姜宇
齐红
赵明浩
王跃航
张永霁
魏枫林
王凯
Current Assignee
Jilin University
Original Assignee
Jilin University
Priority date
Filing date
Publication date
Application filed by Jilin University
Priority to CN202310076493.6A
Publication of CN115883764A
Application granted
Publication of CN115883764B
Legal status: Active

Classifications

    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A - TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A 90/00 - Technologies having an indirect contribution to adaptation to climate change
    • Y02A 90/30 - Assessment of water resources

Abstract

An underwater high-speed video frame interpolation method and system based on data cooperation. RGB data and event data are respectively acquired by a conventional camera and an event camera; the acquired RGB data and event data are fused by a U-shaped synthesis network to obtain a synthesis result; frame optical flow estimation is performed through a three-layer multi-scale optical flow estimation network using the synthesis result and the acquired RGB data; event optical flow estimation is performed through a three-layer multi-scale optical flow estimation network using the acquired RGB data and event data; and the synthesis result, the frame optical flow estimation result, and the event optical flow estimation result are fused by a U-shaped fusion network, which outputs an intermediate frame. The method and system generate intermediate frames of a video from RGB data and event data, increase the frame rate of the video, and improve the robustness of the frame interpolation effect under nonlinear motion in underwater scenes.

Description

Underwater high-speed video frame interpolation method and system based on data cooperation
Technical Field
The invention belongs to the field of video frame synthesis, and particularly relates to an underwater high-speed video frame interpolation method and system based on RGB data and event data cooperation.
Background
The purpose of underwater video frame interpolation is to reasonably restore the image of an intermediate frame from the information of the preceding and following frames in a given underwater low-frame-rate video, as close to the real motion trajectory as possible, so as to increase the frame rate of the video. Underwater video frame interpolation has very high application value: recording underwater high-frame-rate video is extremely expensive, whereas low-frame-rate video is easily acquired by a variety of devices. For example, underwater animals move rapidly and underwater equipment rotates, and current conventional equipment can hardly capture complete visual images of such rapid motion, mainly because the frame rate of the video is not high enough, so the motion does not appear coherent and clear from a visual point of view. Video frame interpolation can effectively solve this problem by increasing the frame rate of the video.
However, existing video frame interpolation methods suffer from inaccurate motion estimation and incomplete details when faced with nonlinear motion. The prior art is mainly based on motion estimation methods and kernel-based methods. Kernel-based methods mainly estimate inter-frame motion with deformable convolution, but their efficiency is limited by the size of the deformable convolution kernel: the kernel size strongly affects the computational resources occupied by the algorithm, and once the motion range exceeds the size of the deformable convolution kernel, the efficiency of kernel-based methods drops significantly. Motion-estimation-based techniques estimate the optical flow between two RGB frames and obtain an intermediate frame by forward or backward mapping, and are therefore limited by the accuracy of the motion estimation. The optical flow method relies on the brightness-constancy and linear-motion assumptions, whereas most motion under water is nonlinear, so optical-flow-based methods cannot accurately restore the intermediate frame between two nonlinearly moving frames. An event camera has extremely low latency and can output an asynchronous event stream in real time, that is, the brightness change of a pixel at a given position at any instant. Once the brightness change exceeds the event camera's threshold, the event camera outputs an event with polarity (positive or negative), time, and location (x, y). The event information contains the actual motion information of the object; by combining the event information with the RGB image information and performing deep learning with a convolutional neural network, the actual intermediate motion state of the object can be simulated and restored, i.e., an intermediate frame close to reality.
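As a minimal illustration of the event-stream data just described (not taken from the patent; the names are hypothetical), each asynchronous event can be modeled as an (x, y, t, polarity) record:

```python
from typing import NamedTuple, List

class Event(NamedTuple):
    """One asynchronous event emitted when the brightness change at a pixel
    exceeds the event camera's threshold."""
    x: int         # pixel column
    y: int         # pixel row
    t: float       # timestamp (event cameras give microsecond-level resolution)
    polarity: int  # +1 for a brightness increase, -1 for a decrease

# A recording between two RGB boundary frames is simply a time-ordered stream.
EventStream = List[Event]
```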
Disclosure of Invention
The invention provides an underwater high-speed video frame interpolation method based on cooperation of RGB data and event data, which realizes generation of an intermediate frame of a video by utilizing the RGB data and the event data, improves the frame rate of the video, and optimizes the robustness of a frame interpolation effect under nonlinear motion in an underwater scene.
The invention provides an underwater high-speed video frame interpolation system based on cooperation of RGB data and event data, which realizes generation of an intermediate frame of a video by utilizing the RGB data and the event data, improves the frame rate of the video, and optimizes the robustness of a frame interpolation effect under nonlinear motion in an underwater scene.
The invention is realized by the following technical scheme:
an underwater high-speed video frame interpolation method based on cooperation of RGB data and event data comprises the following steps:
step 1, respectively acquiring RGB data and event data of a visual object in a space where the visual object is located by a traditional camera and an event camera;
step 2, fusing the RGB data and the event data acquired in the step 1 by utilizing a U-shaped synthetic network to acquire a synthetic result;
step 3, performing frame optical flow estimation through a three-layer multi-scale optical flow estimation network by using the synthesis result of the step 2 and the RGB data acquired in the step 1;
step 4, performing event optical flow estimation by using the RGB data and the event data acquired in the step 1 through a three-layer multi-scale optical flow estimation network;
and 5, fusing the synthesis result of step 2, the frame optical flow estimation result obtained by the three-layer multi-scale optical flow estimation network in step 3, and the event optical flow estimation result obtained by the three-layer multi-scale optical flow estimation network in step 4 through a U-shaped fusion network, and outputting an intermediate frame.
An underwater high-speed video frame interpolation method based on cooperation of RGB data and event data, wherein step 1 further comprises converting the asynchronous event data into a synchronous representation form; specifically, 5 bins are selected over the time span between the two boundary frames, events at nearby timestamps are compressed into the corresponding bins by bilinear interpolation, and a synchronous 5-channel event frame is obtained.
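A minimal sketch of this synchronization step, assuming the events between the two boundary frames are given as NumPy arrays x, y, t, p; the function name and the linear-in-time weighting are illustrative assumptions rather than the patent's exact procedure:

```python
import numpy as np

def events_to_voxel_grid(x, y, t, p, height, width, num_bins=5):
    """Compress an asynchronous event stream spanning the interval between two
    boundary frames into a synchronous num_bins-channel event frame. Each event
    is split between its two nearest temporal bins with bilinear weights."""
    voxel = np.zeros((num_bins, height, width), dtype=np.float32)
    if len(t) == 0:
        return voxel
    # Normalize timestamps to [0, num_bins - 1] over the inter-frame interval.
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9) * (num_bins - 1)
    left = np.floor(t_norm).astype(int)
    right = np.clip(left + 1, 0, num_bins - 1)
    w_right = t_norm - left
    np.add.at(voxel, (left, y, x), p * (1.0 - w_right))
    np.add.at(voxel, (right, y, x), p * w_right)
    return voxel  # shape (5, H, W) for num_bins=5; one such frame per boundary pair
```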
An underwater high-speed video frame interpolation method based on cooperation of RGB data and event data, wherein the U-shaped synthesis network of step 2 is specifically a residual-connected convolutional neural network model; the input RGB images and the corresponding event data are synthesized through its encoding-decoding structure, and the output is recorded as the synthesis result.
An underwater high-speed video frame interpolation method based on cooperation of RGB data and event data, wherein the synthesis result of step 2 is obtained as follows: the two 3-channel RGB image frames and the two 5-channel event frames pass through 12 groups of convolutions and an encoding-decoding structure comprising 4 downsampling and 4 upsampling operations, which fuses and learns the information of the two modalities; a 3-channel RGB image is output as the predicted value, L1 loss and perceptual loss functions are used as the loss functions, and the real intermediate frame data is used as the ground truth for supervised learning.
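A simplified PyTorch sketch of such a synthesis network, with the channel counts stated above (two 3-channel RGB frames plus two 5-channel event frames = 16 input channels, 4 downsamplings, 4 upsamplings, 3-channel output); the layer widths, the use of max-pooling, and all names are assumptions, not the patent's exact architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as Fn

class UNetSynthesis(nn.Module):
    """U-shaped synthesis network: fuses two RGB frames and two 5-channel event
    frames into one predicted 3-channel intermediate frame."""
    def __init__(self, in_ch=2 * 3 + 2 * 5, base=32):
        super().__init__()
        chs = [base, base * 2, base * 4, base * 8, base * 16]
        self.inc = self._block(in_ch, chs[0])
        self.downs = nn.ModuleList(self._block(chs[i], chs[i + 1]) for i in range(4))
        self.ups = nn.ModuleList(
            nn.ConvTranspose2d(chs[4 - i], chs[3 - i], 2, stride=2) for i in range(4))
        self.dec = nn.ModuleList(self._block(2 * chs[3 - i], chs[3 - i]) for i in range(4))
        self.out = nn.Conv2d(chs[0], 3, 1)          # 3-channel RGB prediction

    @staticmethod
    def _block(cin, cout):
        return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(inplace=True),
                             nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU(inplace=True))

    def forward(self, rgb0, rgb1, ev0, ev1):
        x = torch.cat([rgb0, rgb1, ev0, ev1], dim=1)        # (B, 16, H, W)
        skips, h = [], self.inc(x)
        for down in self.downs:                             # 4 downsamplings
            skips.append(h)
            h = down(Fn.max_pool2d(h, 2))
        for up, dec, skip in zip(self.ups, self.dec, reversed(skips)):
            h = dec(torch.cat([up(h), skip], dim=1))        # 4 upsamplings with skips
        return self.out(h)
```

Training such a network would minimize an L1 plus perceptual (feature-space) loss against the real intermediate frame, as described above.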
An underwater high-speed video frame interpolation method based on cooperation of RGB data and event data, wherein the three-layer multi-scale optical flow estimation network of steps 3 and 4 is specifically as follows: the synthesis result of the U-shaped synthesis network and the RGB images are fed to a three-layer multi-scale residual-connected convolutional neural network model, which fuses multi-scale feature information and outputs feature vectors, and the frame synthesis result is obtained through optical flow mapping;
the RGB images and the corresponding event data are likewise fed to a three-layer multi-scale residual-connected convolutional neural network model, which fuses multi-modal, multi-scale feature information and outputs feature vectors, and the event synthesis result is obtained through optical flow mapping.
An underwater high-speed video frame interpolation method based on cooperation of RGB data and event data, wherein the frame optical flow estimation comprises: splicing the two 3-channel RGB image frames and the obtained 3-channel RGB synthesized frame together to obtain initial input data F; passing F through a first optical flow estimation module to obtain two 2-channel optical flows F1; splicing F1 with the bilinearly scaled F and passing the result through a second optical flow estimation module, the obtained features being added to F1 to obtain F2; splicing F2 with the bilinearly scaled F and passing the result through a third optical flow estimation module, the obtained features being added to F2 to obtain F3. The input feature F thus passes through 3 optical flow estimation modules in total, finally yielding two 2-channel optical flows that represent the motion vectors from the intermediate frame to the left and right boundary frames, respectively;
two 3-channel frame-based RGB estimation results can be obtained from the two boundary frames and the two motion vectors through backward mapping; the L1 loss function is used as the loss function, and the real intermediate frame data is used as the ground truth for supervised learning;
the optical flow estimation module comprises 10 convolutional layers, one transposed convolutional layer, and ReLU activation functions.
An underwater high-speed video frame interpolation method based on cooperation of RGB data and event data, wherein the event optical flow estimation of step 4 comprises: splicing the two 3-channel RGB image frames and the two 5-channel event frames together to obtain initial input data F; passing F through a first optical flow estimation module to obtain two 2-channel optical flows F1; splicing F1 with the bilinearly scaled F and passing the result through a second optical flow estimation module, the obtained features being added to F1 to obtain F2; splicing F2 with the bilinearly scaled F and passing the result through a third optical flow estimation module, the obtained features being added to F2 to obtain F3. The input feature F thus passes through 3 optical flow estimation modules in total, finally yielding two 2-channel optical flows that represent the motion vectors from the intermediate frame to the left and right boundary frames, respectively;
two 3-channel event-based RGB estimation results can be obtained from the two boundary frames and the two motion vectors through backward mapping; the L1 loss function is used as the loss function, and the real intermediate frame data is used as the ground truth for supervised learning;
the optical flow estimation module comprises 10 convolutional layers, one transposed convolutional layer, and ReLU activation functions.
An underwater high-speed video frame interpolation method based on cooperation of RGB data and event data, wherein the 3-channel RGB synthesized frame, the two 3-channel frame-based RGB estimation results, and the two 3-channel event-based RGB estimation results are spliced together along the channel dimension;
the information of the two modalities is first fused and learned through a two-layer dynamic convolution network with 10 experts and 10 groups of convolutions and an encoding-decoding structure comprising 4 downsampling and 4 upsampling operations; a 3-channel RGB image is output as the predicted value, L1 loss and perceptual loss functions are used as the loss functions, and the real intermediate frame data is used as the ground truth for supervised learning.
An underwater high-speed video frame interpolation method based on cooperation of RGB data and event data, wherein the calculation formula of the video frame interpolation model for event optical flow estimation is as follows:
[Formula not reproduced in this text version]
In the formula, 0 and 1 are the times of the two boundary frames, t is the time of the intermediate frame, V represents a motion vector, and i and j denote channels of the event representation vector.
An underwater high-speed video frame interpolation system based on RGB data and event data cooperation comprises a synthesis module, a frame optical flow estimation module, an event optical flow estimation module, and a fusion module;
the synthesis module is used for directly synthesizing information of two modes of RGB data and event data to obtain a synthesized video intermediate frame;
the frame optical flow estimation module is used for estimating the optical flows from the intermediate frame to the two boundary frames by utilizing a three-layer multi-scale optical flow estimation network according to the synthesized intermediate frame and RGB data, and obtaining a video intermediate frame after the optical flows are mapped;
the event optical flow estimation module is used for estimating the optical flow from an intermediate frame to two boundary frames by using the event data and the RGB data and utilizing a three-layer multi-scale optical flow estimation network, and obtaining a video intermediate frame after optical flow mapping;
and the fusion module is used for fusing the results of the three modules to obtain the most accurate video intermediate frame.
The invention has the beneficial effects that:
the invention realizes the generation of the intermediate frame of the video by using the RGB data and the event data, improves the frame rate of the video and optimizes the robustness of the frame interpolation effect under the nonlinear motion.
The invention utilizes a convolutional neural network to perform deep learning, and can simulate and restore the real intermediate motion state of the object, i.e., an intermediate frame close to reality.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention.
FIG. 2 is a block diagram of the modules of the present invention.
Detailed description of the preferred embodiments
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
An underwater high-speed video frame interpolation method based on cooperation of RGB data and event data comprises the following steps:
step 1, respectively acquiring RGB data and event data of a visual object in a space where the visual object is located by a traditional camera and an event camera;
step 2, fusing the RGB data and the event data acquired in the step 1 by utilizing a U-shaped synthetic network to acquire a synthetic result;
step 3, performing frame optical flow estimation through a three-layer multi-scale optical flow estimation network by using the synthesis result of the step 2 and the RGB data acquired in the step 1;
step 4, performing event optical flow estimation by using the RGB data and the event data acquired in the step 1 through a three-layer multi-scale optical flow estimation network;
and 5, fusing the synthesis result of step 2, the frame optical flow estimation result obtained by the three-layer multi-scale optical flow estimation network in step 3, and the event optical flow estimation result obtained by the three-layer multi-scale optical flow estimation network in step 4 through a U-shaped fusion network, and outputting an intermediate frame.
An underwater high-speed video frame interpolation method based on cooperation of RGB data and event data, wherein step 1 further comprises converting the asynchronous event data into a synchronous representation form; specifically, 5 bins are selected over the time span between the two boundary frames, events at nearby timestamps are compressed into the corresponding bins by bilinear interpolation, and a synchronous 5-channel event frame is obtained.
An underwater high-speed video frame interpolation method based on cooperation of RGB data and event data, wherein the U-shaped synthesis network of step 2 is specifically a residual-connected convolutional neural network model; the input RGB images and the corresponding event data are synthesized through its encoding-decoding structure, and the output is recorded as the synthesis result.
An underwater high-speed video frame interpolation method based on cooperation of RGB data and event data, wherein the synthesis result of step 2 is obtained as follows: the two 3-channel RGB image frames and the two 5-channel event frames pass through 12 groups of convolutions and an encoding-decoding structure comprising 4 downsampling and 4 upsampling operations, which fuses and learns the information of the two modalities; a 3-channel RGB image is output as the predicted value, L1 loss and perceptual loss functions are used as the loss functions, and the real intermediate frame data is used as the ground truth for supervised learning.
An underwater high-speed video frame interpolation method based on cooperation of RGB data and event data, wherein the three-layer multi-scale optical flow estimation network of steps 3 and 4 is specifically as follows: the synthesis result of the U-shaped synthesis network and the RGB images are fed to a three-layer multi-scale residual-connected convolutional neural network model, which fuses multi-scale feature information and outputs feature vectors, and the frame synthesis result is obtained through optical flow mapping;
the RGB images and the corresponding event data are likewise fed to a three-layer multi-scale residual-connected convolutional neural network model, which fuses multi-modal, multi-scale feature information and outputs feature vectors, and the event synthesis result is obtained through optical flow mapping.
An underwater high-speed video frame interpolation method based on cooperation of RGB data and event data, wherein the frame optical flow estimation comprises: splicing the two 3-channel RGB image frames and the obtained 3-channel RGB synthesized frame together to obtain initial input data F; passing F through a first optical flow estimation module to obtain two 2-channel optical flows F1; splicing F1 with the bilinearly scaled F and passing the result through a second optical flow estimation module, the obtained features being added to F1 to obtain F2; splicing F2 with the bilinearly scaled F and passing the result through a third optical flow estimation module, the obtained features being added to F2 to obtain F3. The input feature F thus passes through 3 optical flow estimation modules in total, finally yielding two 2-channel optical flows that represent the motion vectors from the intermediate frame to the left and right boundary frames, respectively;
two 3-channel frame-based RGB estimation results can be obtained from the two boundary frames and the two motion vectors through backward mapping; the L1 loss function is used as the loss function, and the real intermediate frame data is used as the ground truth for supervised learning;
the optical flow estimation module comprises 10 convolutional layers, one transposed convolutional layer, and ReLU activation functions.
An underwater high-speed video frame interpolation method based on cooperation of RGB data and event data, wherein the event optical flow estimation of step 4 comprises: splicing the two 3-channel RGB image frames and the two 5-channel event frames together to obtain initial input data F; passing F through a first optical flow estimation module to obtain two 2-channel optical flows F1; splicing F1 with the bilinearly scaled F and passing the result through a second optical flow estimation module, the obtained features being added to F1 to obtain F2; splicing F2 with the bilinearly scaled F and passing the result through a third optical flow estimation module, the obtained features being added to F2 to obtain F3. The input feature F thus passes through 3 optical flow estimation modules in total, finally yielding two 2-channel optical flows that represent the motion vectors from the intermediate frame to the left and right boundary frames, respectively;
two 3-channel event-based RGB estimation results can be obtained from the two boundary frames and the two motion vectors through backward mapping; the L1 loss function is used as the loss function, and the real intermediate frame data is used as the ground truth for supervised learning;
the optical flow estimation module comprises 10 convolutional layers, one transposed convolutional layer, and ReLU activation functions.
An underwater high-speed video frame interpolation method based on cooperation of RGB data and event data, wherein the 3-channel RGB synthesized frame, the two 3-channel frame-based RGB estimation results, and the two 3-channel event-based RGB estimation results are spliced together along the channel dimension;
the information of the two modalities is first fused and learned through a two-layer dynamic convolution network with 10 experts and 10 groups of convolutions and an encoding-decoding structure comprising 4 downsampling and 4 upsampling operations; a 3-channel RGB image is output as the predicted value, L1 loss and perceptual loss functions are used as the loss functions, and the real intermediate frame data is used as the ground truth for supervised learning.
An underwater high-speed video frame interpolation method based on cooperation of RGB data and event data, wherein the calculation formula of the video frame interpolation model for event optical flow estimation is as follows:
[Formula not reproduced in this text version]
In the formula, 0 and 1 are the times of the two boundary frames, t is the time of the intermediate frame, V represents a motion vector, and i and j respectively denote channels of the event representation vector. For example, the motion vector over the period from the interpolated-frame instant to the next event representation is estimated in this way; the overall motion vector estimates from the intermediate frame to the two boundary frames are denoted V(t→0) and V(t→1), and they are the motion vectors after calibration with the event data.
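The formula itself is only present as an image in the original publication. Based on the surrounding description (per-channel motion vectors of the event representation accumulated between the intermediate time t and the boundary times 0 and 1), one plausible reading, offered purely as an assumption, is:

```latex
% Hedged reconstruction; V_i is the motion vector estimated over the i-th
% event-representation channel, i_t is the channel containing time t, and N
% is the number of channels. Summing the per-channel vectors toward each
% boundary gives the event-calibrated flows used for backward mapping.
V_{t \rightarrow 0} = -\sum_{i=1}^{i_t} V_i , \qquad
V_{t \rightarrow 1} = \sum_{j=i_t+1}^{N} V_j
```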
An underwater high-speed video frame interpolation system based on RGB data and event data cooperation comprises a synthesis module, a frame optical flow estimation module, an event optical flow estimation module, and a fusion module;
the synthesis module is used for directly synthesizing information of two modes of RGB data and event data to obtain a synthesized video intermediate frame;
the frame optical flow estimation module is used for estimating the optical flows from the intermediate frame to the two boundary frames by utilizing a three-layer multi-scale optical flow estimation network according to the synthesized intermediate frame and RGB data, and obtaining a video intermediate frame after the optical flows are mapped;
the event optical flow estimation module is used for estimating the optical flow from an intermediate frame to two boundary frames by using the event data and the RGB data and utilizing a three-layer multi-scale optical flow estimation network, and obtaining a video intermediate frame after optical flow mapping;
and the fusion module is used for fusing the results of the three modules to obtain the most accurate video intermediate frame.
The U-shaped synthesis network is a residual-connected convolutional neural network model; the input RGB images and the corresponding event data are synthesized through its encoding-decoding structure, and the output is recorded as the synthesis result;
the three-layer multi-scale optical flow estimation network feeds the synthesis result of the U-shaped synthesis network and the RGB images to a three-layer multi-scale residual-connected convolutional neural network model, fuses multi-scale feature information, outputs feature vectors, and obtains the frame synthesis result through optical flow mapping;
it likewise feeds the RGB images and the corresponding event data to a three-layer multi-scale residual-connected convolutional neural network model, fuses multi-modal, multi-scale feature information, outputs feature vectors, and obtains the event synthesis result through optical flow mapping;
the U-shaped fusion network is a U-shaped residual-connected convolutional neural network model with double-layer dynamic convolution; through its encoding-decoding structure it fuses the synthesis result of the U-shaped synthesis network, the frame synthesis result obtained by the three-layer multi-scale optical flow estimation network, and the event synthesis result, and outputs a single 3-channel RGB image.
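Putting the pieces together, a high-level forward pass of the system described above might look as follows; the module classes refer to the sketches given earlier in this description, and the composition details (channel ordering, warping both boundary frames with both flow types) are assumptions:

```python
import torch

def interpolate_intermediate_frame(rgb0, rgb1, ev0, ev1,
                                   synth_net, frame_flow_net, event_flow_net, fusion_net):
    """Sketch of the four-module pipeline: synthesis, frame flow, event flow, fusion."""
    # 1) U-shaped synthesis network: direct fusion of the two modalities.
    synth = synth_net(rgb0, rgb1, ev0, ev1)                          # (B, 3, H, W)

    # 2) Frame optical flow: boundary frames + synthesized frame (9 input channels).
    flows_f = frame_flow_net(torch.cat([rgb0, rgb1, synth], dim=1))  # (B, 4, H, W)
    warp_f0 = backward_warp(rgb0, flows_f[:, 0:2])
    warp_f1 = backward_warp(rgb1, flows_f[:, 2:4])

    # 3) Event optical flow: boundary frames + event frames (16 input channels).
    flows_e = event_flow_net(torch.cat([rgb0, rgb1, ev0, ev1], dim=1))
    warp_e0 = backward_warp(rgb0, flows_e[:, 0:2])
    warp_e1 = backward_warp(rgb1, flows_e[:, 2:4])

    # 4) U-shaped fusion network: 15-channel input, single 3-channel intermediate frame.
    return fusion_net(torch.cat([synth, warp_f0, warp_f1, warp_e0, warp_e1], dim=1))
```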

Claims (10)

1. An underwater high-speed video frame interpolation method based on cooperation of RGB data and event data is characterized by comprising the following steps of:
step 1, respectively acquiring RGB data and event data by a traditional camera and an event camera;
step 2, fusing the RGB data and the event data acquired in the step 1 by utilizing a U-shaped synthetic network to acquire a synthetic result;
step 3, performing frame optical flow estimation through a three-layer multi-scale optical flow estimation network by using the synthesis result of the step 2 and the RGB data acquired in the step 1;
step 4, performing event optical flow estimation by using the RGB data and the event data acquired in the step 1 through a three-layer multi-scale optical flow estimation network;
and 5, fusing the synthesis result of step 2, the frame optical flow estimation result obtained by the three-layer multi-scale optical flow estimation network in step 3, and the event optical flow estimation result obtained by the three-layer multi-scale optical flow estimation network in step 4 through a U-shaped fusion network, and outputting an intermediate frame.
2. The underwater high-speed video frame interpolation method based on the cooperation of the RGB data and the event data as claimed in claim 1, wherein step 1 further includes converting asynchronous event data into a synchronous representation form, specifically, selecting 5 bins over the time span between the two boundary frames, compressing events at nearby timestamps into the corresponding bins in a manner similar to bilinear interpolation, and acquiring a synchronous 5-channel event frame.
3. The underwater high-speed video frame interpolation method based on the cooperation of the RGB data and the event data according to claim 2, wherein the U-shaped synthesis network of step 2 is specifically a residual-connected convolutional neural network model in which the input RGB images and the corresponding event data are synthesized through an encoding-decoding structure and recorded as the synthesis result.
4. The method as claimed in claim 3, wherein the synthesis result obtained in step 2 is specifically that the two 3-channel RGB image frames and the two 5-channel event frames undergo 12 groups of convolutions and an encoding-decoding structure including 4 downsampling and 4 upsampling operations that fuses and learns the information of the two modalities; a 3-channel RGB image is output as the predicted value, L1 loss and perceptual loss functions are used as the loss functions, and the real intermediate frame data is used as the ground truth for supervised learning.
5. The underwater high-speed video frame interpolation method based on the cooperation of the RGB data and the event data as claimed in claim 1, wherein the three-layer multi-scale optical flow estimation network of steps 3 and 4 specifically uses a three-layer multi-scale residual-connected convolutional neural network model that fuses multi-scale feature information and outputs feature vectors, the frame synthesis result being obtained through optical flow mapping;
and a three-layer multi-scale residual-connected convolutional neural network model that fuses multi-modal, multi-scale feature information and outputs feature vectors, the event synthesis result being obtained through optical flow mapping.
6. The underwater high-speed video frame interpolation method based on the cooperation of the RGB data and the event data as claimed in claim 5, wherein the frame optical flow estimation of step 3 specifically comprises: splicing the two 3-channel RGB image frames and the obtained 3-channel RGB synthesized frame to obtain initial input data F; passing F through a first optical flow estimation module to obtain two 2-channel optical flows F1; splicing F1 with the bilinearly scaled F and passing the result through a second optical flow estimation module, the obtained features being added to F1 to obtain F2; splicing F2 with the bilinearly scaled F and passing the result through a third optical flow estimation module, the obtained features being added to F2 to obtain F3; the input feature F passes through 3 optical flow estimation modules in total, finally yielding two 2-channel optical flows that respectively represent the motion vectors from the intermediate frame to the left and right boundary frames;
two 3-channel frame-based RGB estimation results are obtained from the two boundary frames and the two motion vectors through backward mapping; the L1 loss function is used as the loss function, and the real intermediate frame data is used as the ground truth for supervised learning;
the optical flow estimation module includes 10 convolutional layers, one transposed convolutional layer, and ReLU activation functions.
7. The method as claimed in claim 6, wherein the event optical flow estimation of step 4 is implemented by splicing the two 3-channel RGB image frames and the two 5-channel event frames to obtain initial input data F; passing F through a first optical flow estimation module to obtain two 2-channel optical flows F1; splicing F1 with the bilinearly scaled F and passing the result through a second optical flow estimation module, the obtained features being added to F1 to obtain F2; splicing F2 with the bilinearly scaled F and passing the result through a third optical flow estimation module, the obtained features being added to F2 to obtain F3; the input feature F passes through 3 optical flow estimation modules in total, finally yielding two 2-channel optical flows that respectively represent the motion vectors from the intermediate frame to the left and right boundary frames;
two 3-channel event-based RGB estimation results are obtained from the two boundary frames and the two motion vectors through backward mapping; the L1 loss function is used as the loss function, and the real intermediate frame data is used as the ground truth for supervised learning;
the optical flow estimation module includes 10 convolutional layers, one transposed convolutional layer, and ReLU activation functions.
8. The underwater high-speed video frame interpolation method based on the cooperation of the RGB data and the event data as claimed in claim 1, wherein the 3-channel RGB synthesized frame, the two 3-channel frame-based RGB estimation results, and the two 3-channel event-based RGB estimation results are spliced together along the channel dimension;
the information of the two modalities is first fused and learned through a two-layer dynamic convolution network with 10 experts and 10 groups of convolutions and an encoding-decoding structure comprising 4 downsampling and 4 upsampling operations; a 3-channel RGB image is output as the predicted value, L1 loss and perceptual loss functions are used as the loss functions, and the real intermediate frame data is used as the ground truth for supervised learning.
9. The underwater high-speed video frame interpolation method based on RGB data and event data cooperation as claimed in claim 4, wherein the calculation formula of the video frame interpolation model for event optical flow estimation is as follows:
[Formula not reproduced in this text version]
in the formula, 0 and 1 are the times of the two boundary frames, t is the time of the intermediate frame, V represents a motion vector, and i and j denote channels of the event representation vector.
10. An underwater high-speed video frame interpolation system based on RGB data and event data cooperation, characterized by comprising a synthesis module, a frame optical flow estimation module, an event optical flow estimation module, and a fusion module;
the synthesis module is used for directly synthesizing information of two modes of RGB data and event data to obtain a synthesized video intermediate frame;
the frame optical flow estimation module is used for estimating the optical flows from the intermediate frame to the two boundary frames by utilizing a three-layer multi-scale optical flow estimation network according to the synthesized intermediate frame and RGB data, and obtaining a video intermediate frame after the optical flows are mapped;
the event optical flow estimation module is used for estimating the optical flow from an intermediate frame to two boundary frames by using the event data and the RGB data and utilizing a three-layer multi-scale optical flow estimation network, and obtaining a video intermediate frame after optical flow mapping;
and the fusion module is used for fusing the results of the three modules to obtain the most accurate video intermediate frame.
CN202310076493.6A 2023-02-08 2023-02-08 Underwater high-speed video frame inserting method and system based on data collaboration Active CN115883764B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310076493.6A CN115883764B (en) 2023-02-08 2023-02-08 Underwater high-speed video frame inserting method and system based on data collaboration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310076493.6A CN115883764B (en) 2023-02-08 2023-02-08 Underwater high-speed video frame inserting method and system based on data collaboration

Publications (2)

Publication Number Publication Date
CN115883764A (en) 2023-03-31
CN115883764B (en) 2023-05-23

Family

ID=85760841

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310076493.6A Active CN115883764B (en) 2023-02-08 2023-02-08 Underwater high-speed video frame inserting method and system based on data collaboration

Country Status (1)

Country Link
CN (1) CN115883764B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116389912A (en) * 2023-04-24 2023-07-04 北京大学 Method for reconstructing high-frame-rate high-dynamic-range video by fusing pulse camera with common camera
CN116405626A (en) * 2023-06-05 2023-07-07 吉林大学 Global matching underwater moving object vision enhancement method
CN117745596A (en) * 2024-02-19 2024-03-22 吉林大学 Cross-modal fusion-based underwater de-blocking method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200265590A1 (en) * 2019-02-19 2020-08-20 The Trustees Of The University Of Pennsylvania Methods, systems, and computer readable media for estimation of optical flow, depth, and egomotion using neural network trained using event-based learning
WO2021241804A1 (en) * 2020-05-29 2021-12-02 연세대학교 산학협력단 Multi-flow-based frame interpolation device and method
CN114245007A (en) * 2021-12-06 2022-03-25 西北工业大学 High frame rate video synthesis method, device, equipment and storage medium
WO2022096158A1 (en) * 2020-11-05 2022-05-12 Huawei Technologies Co., Ltd. Device and method for video interpolation
WO2022141376A1 (en) * 2020-12-31 2022-07-07 华为技术有限公司 Posture estimation method related apparatus
CN114885144A (en) * 2022-03-23 2022-08-09 清华大学 High frame rate 3D video generation method and device based on data fusion

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200265590A1 (en) * 2019-02-19 2020-08-20 The Trustees Of The University Of Pennsylvania Methods, systems, and computer readable media for estimation of optical flow, depth, and egomotion using neural network trained using event-based learning
WO2021241804A1 (en) * 2020-05-29 2021-12-02 연세대학교 산학협력단 Multi-flow-based frame interpolation device and method
WO2022096158A1 (en) * 2020-11-05 2022-05-12 Huawei Technologies Co., Ltd. Device and method for video interpolation
WO2022141376A1 (en) * 2020-12-31 2022-07-07 华为技术有限公司 Posture estimation method related apparatus
CN114245007A (en) * 2021-12-06 2022-03-25 西北工业大学 High frame rate video synthesis method, device, equipment and storage medium
CN114885144A (en) * 2022-03-23 2022-08-09 清华大学 High frame rate 3D video generation method and device based on data fusion

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116389912A (en) * 2023-04-24 2023-07-04 北京大学 Method for reconstructing high-frame-rate high-dynamic-range video by fusing pulse camera with common camera
CN116389912B (en) * 2023-04-24 2023-10-10 北京大学 Method for reconstructing high-frame-rate high-dynamic-range video by fusing pulse camera with common camera
CN116405626A (en) * 2023-06-05 2023-07-07 吉林大学 Global matching underwater moving object vision enhancement method
CN116405626B (en) * 2023-06-05 2023-09-22 吉林大学 Global matching underwater moving object vision enhancement method and system
CN117745596A (en) * 2024-02-19 2024-03-22 吉林大学 Cross-modal fusion-based underwater de-blocking method

Also Published As

Publication number Publication date
CN115883764B (en) 2023-05-23

Similar Documents

Publication Publication Date Title
CN115883764B (en) Underwater high-speed video frame inserting method and system based on data collaboration
CN111311490B (en) Video super-resolution reconstruction method based on multi-frame fusion optical flow
CN110782490B (en) Video depth map estimation method and device with space-time consistency
CN111260560B (en) Multi-frame video super-resolution method fused with attention mechanism
CN113837938B (en) Super-resolution method for reconstructing potential image based on dynamic vision sensor
CN108765479A (en) Using deep learning to monocular view estimation of Depth optimization method in video sequence
CN111105432B (en) Unsupervised end-to-end driving environment perception method based on deep learning
CN111028150A (en) Rapid space-time residual attention video super-resolution reconstruction method
CN111582483A (en) Unsupervised learning optical flow estimation method based on space and channel combined attention mechanism
CN112019828B (en) Method for converting 2D (two-dimensional) video into 3D video
CN114170286B (en) Monocular depth estimation method based on unsupervised deep learning
CN113077505B (en) Monocular depth estimation network optimization method based on contrast learning
CN111260680B (en) RGBD camera-based unsupervised pose estimation network construction method
CN115187638B (en) Unsupervised monocular depth estimation method based on optical flow mask
CN112040222A (en) Visual saliency prediction method and equipment
CN115035171A (en) Self-supervision monocular depth estimation method based on self-attention-guidance feature fusion
CN113554032A (en) Remote sensing image segmentation method based on multi-path parallel network of high perception
CN115100090A (en) Monocular image depth estimation system based on space-time attention
Chai et al. Expression-aware face reconstruction via a dual-stream network
Liu et al. GridDehazeNet+: An enhanced multi-scale network with intra-task knowledge transfer for single image dehazing
CN112750092A (en) Training data acquisition method, image quality enhancement model and method and electronic equipment
CN111767679A (en) Method and device for processing time-varying vector field data
CN115841523A (en) Double-branch HDR video reconstruction algorithm based on Raw domain
CN115941872B (en) Video frame inserting method and system for underwater high-speed moving target
CN114663802A (en) Cross-modal video migration method of surveillance video based on characteristic space-time constraint

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant