CN114885144A - High frame rate 3D video generation method and device based on data fusion - Google Patents
- Publication number
- CN114885144A (application CN202210293645.3A)
- Authority
- CN
- China
- Prior art keywords
- frame rate
- video
- event stream
- event
- frames
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N13/00—Stereoscopic video systems; Multi-view video systems; Details thereof
- H04N13/10—Processing, recording or transmission of stereoscopic or multi-view image signals
- H04N13/106—Processing image signals
- H04N13/111—Transformation of image signals corresponding to virtual viewpoints, e.g. spatial image interpolation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/01—Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
- H04N7/0127—Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level by changing the field or frame frequency of the incoming video signal, e.g. frame rate converter
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
Abstract
The application discloses a high frame rate 3D video generation method and device based on data fusion. The method comprises the following steps: acquiring video below a preset frame rate, together with event data, from an event camera; combining adjacent image frames of the video in pairs to generate a plurality of groups of adjacent image frames and calculating the timestamp set of all desired intermediate frames; intercepting the event streams from the two boundary frames to each desired intermediate frame and inputting them into a preset impulse neural network for forward propagation to obtain event stream data feature vectors; splicing the feature vectors with the adjacent image frames and inputting the result into a preset multi-modal fusion network for forward propagation to obtain all intermediate frames and generate a high frame rate video above a second preset frame rate; and performing forward propagation with a preset 3D depth estimation network to obtain all high frame rate depth maps, which together form the high frame rate 3D video. This solves the technical problem in the related art that the generated image quality is low because only the event stream is used as input and the initial brightness value of each pixel is missing.
Description
Technical Field
The present application relates to the field of computer vision and neuromorphic computing technologies, and in particular, to a method and an apparatus for generating a high frame rate 3D video based on data fusion.
Background
On the one hand, traditional cameras are limited by their frame rate, and the professional high-speed cameras required to shoot high frame rate video are extremely expensive; on the other hand, existing approaches to generating high frame rate 3D video, i.e., high frame rate depth-map video, from low frame rate video still have drawbacks in achieving high-speed 3D observation.
In the related art, video is generated from a pure event stream: the event stream is stacked into a grid-like tensor representation, and images are then generated by deep learning so as to achieve high-speed 3D observation.
However, the related art uses only the event stream as input, so the initial brightness value of each pixel is missing; estimating brightness solely from a record of brightness changes is an underdetermined problem, and the quality of the generated images is therefore low and needs to be improved.
Disclosure of Invention
The application provides a high frame rate 3D video generation method and device based on data fusion, aiming to solve the technical problem in the related art that only event streams are used as input and the initial brightness value of each pixel is missing, so that the generated image quality is low.
An embodiment of a first aspect of the present application provides a high frame rate 3D video generation method based on data fusion, including the following steps: acquiring video and event data lower than a preset frame rate from an event camera; combining every two adjacent image frames in the video to generate a plurality of groups of adjacent image frames, and calculating a timestamp set of all the intermediate frames expected to be obtained; intercepting a first event stream and a second event stream from two boundary frames to an expected intermediate frame according to the timestamp set, and inputting the first event stream and the second event stream to a preset impulse neural network for forward propagation to obtain a first event stream data feature vector and a second event stream data feature vector; splicing the adjacent image frames, the first event stream data feature vector and the second event stream data feature vector, inputting the adjacent image frames, the first event stream data feature vector and the second event stream data feature vector into a preset multi-mode fusion network for forward propagation to obtain all intermediate frames, and generating a high frame rate video higher than a second preset frame rate; and based on the high frame rate video, performing forward propagation by using a preset 3D depth estimation network to obtain all the high frame rate depth maps, and combining all the high frame rate depth maps to form the high frame rate 3D video.
Optionally, in an embodiment of the present application, before inputting the first event stream and the second event stream into the preset spiking neural network for forward propagation, the method further includes: and constructing the impulse neural network based on a Spike Response model as a neuron dynamic model.
Optionally, in an embodiment of the present application, the multi-modal fusion network includes a coarse synthesis sub-network and a fine tuning sub-network, where the coarse synthesis sub-network uses a first U-Net structure, the number of input channels of an input layer is 64+2 × k, the number of output channels of an output layer is k, and the fine tuning sub-network uses a second U-Net structure, the number of input channels of the input layer is 3 × k, the number of output channels of the output layer is k, and k is the number of channels of the image frame of the video lower than the preset frame rate.
Optionally, in an embodiment of the present application, the 3D depth estimation network uses a third U-Net structure, and the number of input channels of the input layer is 3 × k, and the number of output channels of the output layer is 1.
Optionally, in an embodiment of the present application, the timestamps of all intermediate frames are calculated as:

τ_i^{j,j+1} = t_j + i · (t_{j+1} - t_j) / n,  i = 1, 2, …, n,  j = 1, 2, …, N - 1,

where N is the total number of frames of the input low frame rate video, n is the multiple by which the frame rate is to be raised, and t_j is the timestamp of the j-th frame of the input low frame rate video.
The embodiment of the second aspect of the present application provides a high frame rate 3D video generating device based on data fusion, including: the first acquisition module is used for acquiring videos and event data which are lower than a preset frame rate from the event camera; the computing module is used for combining every two adjacent image frames in the video to generate a plurality of groups of adjacent image frames and computing a timestamp set of all the intermediate frames expected to be obtained; the second acquisition module is used for intercepting a first event stream and a second event stream from two boundary frames to an expected intermediate frame according to the timestamp set, and inputting the first event stream and the second event stream to a preset impulse neural network for forward propagation to obtain a first event stream data feature vector and a second event stream data feature vector; the fusion module is used for splicing the adjacent image frames, the first event stream data feature vector and the second event stream data feature vector, inputting the adjacent image frames, the first event stream data feature vector and the second event stream data feature vector into a preset multi-mode fusion network for forward propagation to obtain all intermediate frames, and generating a high frame rate video higher than a second preset frame rate; and the generating module is used for carrying out forward propagation by utilizing a preset 3D depth estimation network based on the high frame rate video to obtain all the high frame rate depth maps, and combining all the high frame rate depth maps to form the high frame rate 3D video.
Optionally, in an embodiment of the present application, the method further includes: and the construction module is used for constructing the impulse neural network based on a Spike Response model as a neuron dynamic model.
Optionally, in an embodiment of the present application, the multi-modal fusion network includes a coarse synthesis sub-network and a fine tuning sub-network, where the coarse synthesis sub-network uses a first U-Net structure, the number of input channels of an input layer is 64+2 × k, the number of output channels of an output layer is k, and the fine tuning sub-network uses a second U-Net structure, the number of input channels of an input layer is 3 × k, the number of output channels of an output layer is k, and k is the number of channels of the image frame of the video lower than the preset frame rate.
Optionally, in an embodiment of the present application, the 3D depth estimation network uses a third U-Net structure, and the number of input channels of the input layer is 3 × k, and the number of output channels of the output layer is 1.
Optionally, in an embodiment of the present application, the timestamps of the intermediate frames, which determine the first event stream and the second event stream, are calculated as:

τ_i^{j,j+1} = t_j + i · (t_{j+1} - t_j) / n,  i = 1, 2, …, n,  j = 1, 2, …, N - 1,

where N is the total number of frames of the input low frame rate video, n is the multiple by which the frame rate is to be raised, and t_j is the timestamp of the j-th frame of the input low frame rate video.
An embodiment of a third aspect of the present application provides an electronic device, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the program to implement the data fusion based high frame rate 3D video generation method according to the above embodiments.
A fourth aspect of the present application provides a computer-readable storage medium, which stores computer instructions for causing the computer to execute the method for generating a high frame rate 3D video based on data fusion according to the foregoing embodiment.
According to the method and the device, event data can provide inter-frame motion information. The event stream is encoded by a pulse neural network, all intermediate frames are obtained through a multi-modal fusion network to generate a high frame rate video, and a 3D depth estimation network then forms the high frame rate 3D video, achieving effective stereoscopic observation of high-speed scenes. Using the event stream and the low frame rate video image frames together as input makes better use of the multi-modal data and further improves the quality of the high frame rate 3D video. Therefore, the technical problem in the related art that the generated image quality is low because only the event stream is used as input and the initial brightness value of each pixel is missing is solved.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a flowchart of a high frame rate 3D video generation method based on data fusion according to an embodiment of the present application;
FIG. 2 is a flowchart of a high frame rate 3D video generation method based on data fusion according to an embodiment of the present application;
fig. 3 is a schematic diagram of low frame rate video data and event stream data of a high frame rate 3D video generation method based on data fusion according to an embodiment of the present application;
fig. 4 is a schematic diagram of inter-frame video data of a high frame rate 3D video generation method based on data fusion according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an input event stream, a low frame rate video and generated high frame rate video data of a high frame rate 3D video generation method based on data fusion according to an embodiment of the present application;
FIG. 6 is a high frame rate depth map at 10 times frame rate boost for a high frame rate 3D video generation method based on data fusion according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a high frame rate 3D video generation apparatus based on data fusion according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
The following describes a high frame rate 3D video generation method and apparatus based on data fusion according to embodiments of the present application with reference to the drawings. To solve the technical problem mentioned in the Background above, namely that in the related art only event streams are used as input and the initial brightness value of each pixel is missing, so that the generated image quality is low, the present application provides a high frame rate 3D video generation method based on data fusion. The technical problem in the related art that the generated image quality is low because only the event stream is used as input and the initial brightness value of each pixel is missing is thereby solved.
Specifically, fig. 1 is a schematic flowchart of a high frame rate 3D video generation method based on data fusion according to an embodiment of the present application.
As shown in fig. 1, the high frame rate 3D video generation method based on data fusion includes the following steps:
in step S101, video and event data below a preset frame rate are acquired from an event camera.
In the actual execution process, the embodiment of the application can acquire the video whose frame rate is lower than the preset frame rate, together with the event data, from the event camera, thereby obtaining the raw data and laying a data foundation for the subsequent generation of the high frame rate video.
It can be understood that the event camera is a biologically inspired sensor whose working principle differs greatly from that of a traditional camera: whereas a traditional camera captures the absolute light intensity of a scene at a fixed frame rate, the event camera outputs an event stream only when the scene light intensity changes. Compared with a traditional camera, the event camera has the advantages of high dynamic range, high temporal resolution and freedom from motion blur, which helps ensure the generation of high frame rate video.
As a novel vision sensor, the event camera cannot directly use the various algorithms developed for traditional cameras and images. The event camera has no concept of frame rate: each of its pixels works asynchronously and outputs an event when a light intensity change is detected. Each event is a quadruple (x, y, t, p) containing the pixel coordinates (x, y), a timestamp t and the event polarity p (where p = -1 indicates that the light intensity at the pixel decreases and p = 1 indicates that it increases). The event data output by all pixels are aggregated into a list of events, which serves as the event stream data output by the camera.
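As a concrete illustration of this representation, the snippet below stores a toy event stream as an array of (x, y, t, p) quadruples and accumulates a time window of events into a two-channel positive/negative polarity grid. This is only a minimal sketch; the array layout, field types and the to_polarity_grid helper are illustrative assumptions rather than elements of the described method.

```python
import numpy as np

# A toy event stream: each event is a quadruple (x, y, t, p).
events = np.array(
    [(12, 40, 0.0015, 1), (13, 40, 0.0021, -1), (12, 41, 0.0034, 1)],
    dtype=[("x", "u2"), ("y", "u2"), ("t", "f8"), ("p", "i1")],
)

def to_polarity_grid(ev, height, width):
    """Accumulate events into a 2-channel grid: channel 0 counts positive-polarity
    events and channel 1 counts negative-polarity events, per pixel."""
    grid = np.zeros((2, height, width), dtype=np.float32)
    for x, y, t, p in ev:
        grid[0 if p > 0 else 1, y, x] += 1.0
    return grid

grid = to_polarity_grid(events, height=64, width=64)  # shape (2, 64, 64)
```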
The preset frame rate may be set by a person skilled in the art, and is not limited herein.
In step S102, two adjacent image frames in the video are combined to generate a plurality of groups of adjacent image frames, and a timestamp set of all the intermediate frames is calculated.
As a possible implementation, the embodiment of the present application may combine every two adjacent image frames of the low frame rate video to generate a plurality of groups of adjacent image frames and, for each group of adjacent image frames, calculate the timestamp set T of all desired intermediate frames, recorded as:

T = {τ_1^{1,2}, τ_2^{1,2}, …, τ_n^{1,2}, τ_1^{2,3}, τ_2^{2,3}, …, τ_n^{2,3}, …, τ_1^{N-1,N}, τ_2^{N-1,N}, …, τ_n^{N-1,N}}.
Optionally, in an embodiment of the present application, the timestamp of each desired intermediate frame may be calculated as follows:

τ_i^{j,j+1} = t_j + i · (t_{j+1} - t_j) / n,  i = 1, 2, …, n,  j = 1, 2, …, N - 1,

where N is the total number of frames of the input low frame rate video, n is the multiple by which the frame rate is to be raised, and t_j is the timestamp of the j-th frame of the input low frame rate video.
According to the embodiment of the application, the time stamp sets of all the intermediate frames can be obtained through calculation, so that the data can be preprocessed, and a basis is provided for subsequent data fusion.
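For concreteness, a minimal sketch of this preprocessing step is given below. It assumes the interpolation formula given above; the function name and the use of a plain Python list are illustrative choices.

```python
def intermediate_timestamps(frame_ts, n):
    """Timestamp set T of all desired intermediate frames:
    tau_i^{j,j+1} = t_j + i * (t_{j+1} - t_j) / n for i = 1..n, j = 1..N-1."""
    T = []
    for j in range(len(frame_ts) - 1):
        t0, t1 = frame_ts[j], frame_ts[j + 1]
        T.extend(t0 + i * (t1 - t0) / n for i in range(1, n + 1))
    return T

# Example matching the description below: 31 frames at 20 FPS (50 ms apart), n = 10.
T = intermediate_timestamps([50.0 * j for j in range(31)], n=10)
assert len(T) == 300 and T[14] == 75.0  # the 15th intermediate frame falls at 75 ms
```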
In step S103, a first event stream and a second event stream from two boundary frames to an expected intermediate frame are intercepted according to the timestamp set, and the first event stream and the second event stream are input to a preset impulse neural network for forward propagation, so as to obtain a first event stream data feature vector and a second event stream data feature vector.
Further, the embodiment of the present application may intercept, according to the intermediate-frame timestamp set calculated in step S102, a first event stream ε_1 and a second event stream ε_2 from the two boundary frames to the desired intermediate frame, and input the first event stream and the second event stream into a preset impulse neural network for forward propagation to obtain a first event stream data feature vector F_1 and a second event stream data feature vector F_2. By encoding the event stream with the impulse neural network, the embodiment of the application can better denoise the event stream data and thereby further improve the quality of the generated video.
Wherein the first event stream ε_1 and the second event stream ε_2 may be taken as:

ε_1 = {(x, y, t, p) : t_j ≤ t ≤ τ_i^{j,j+1}},  ε_2 = {(x, y, t, p) : τ_i^{j,j+1} ≤ t ≤ t_{j+1}},

where τ_i^{j,j+1} is the timestamp of the desired intermediate frame, and t_j and t_{j+1} are the timestamps of the input low frame rate video frames adjacent to the desired intermediate frame.
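A minimal sketch of this slicing step is shown below; it assumes the event-window definition given above (closed intervals on both sides), and the function name and list-based representation are illustrative.

```python
def slice_event_streams(events, t_j, t_j1, tau):
    """Split the events recorded between the two boundary frames (timestamps t_j and
    t_j1) into the two streams used for one intermediate frame at timestamp tau:
    eps1 covers [t_j, tau] and eps2 covers [tau, t_j1]; events is an iterable of
    (x, y, t, p) quadruples."""
    eps1 = [e for e in events if t_j <= e[2] <= tau]
    eps2 = [e for e in events if tau <= e[2] <= t_j1]
    return eps1, eps2
```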
It should be noted that the predetermined spiking neural network will be described in detail below.
Optionally, in an embodiment of the present application, before inputting the first event stream and the second event stream into a preset impulse neural network for forward propagation, the method further includes: and constructing an impulse neural network based on a Spike Response model as a neuron dynamic model.
The spiking neural network is described in detail herein.
It can be understood that the impulse neural network is a third-generation artificial neural network. Its neurons are not activated at every iteration of propagation; instead, a neuron fires only when its membrane potential reaches a specific value, and when it fires it emits a signal to other neurons that raises or lowers their membrane potentials. The simulated neurons of the impulse neural network are therefore closer to biological reality and better suited to processing time-series pulse signals.
In an actual implementation process, the embodiment of the application can use a Spike Response model as a neuron dynamic model to construct the pulse convolution neural network.
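To make the neuron dynamics concrete, the snippet below simulates a single, heavily simplified Spike-Response-Model-style neuron in discrete time: inputs are filtered by an exponential response kernel and the neuron fires when its membrane potential reaches a threshold. The time constant, threshold and reset rule are illustrative assumptions, not the exact dynamics of the disclosed network.

```python
import numpy as np

def srm_neuron(input_spikes, weights, threshold=1.0, tau=10.0, dt=1.0):
    """Simplified discrete-time spike-response neuron.
    input_spikes: (T, num_inputs) array of 0/1 spikes; returns a 0/1 output train."""
    decay = np.exp(-dt / tau)          # exponential response kernel per time step
    u = 0.0                            # membrane potential
    out = np.zeros(input_spikes.shape[0], dtype=np.int8)
    for t in range(input_spikes.shape[0]):
        u = u * decay + float(input_spikes[t] @ weights)
        if u >= threshold:             # fire when the membrane potential reaches threshold
            out[t] = 1
            u = 0.0                    # simplified reset after the spike
    return out
```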
In particular, the spiking neural network may include an input convolutional layer, a hidden convolutional layer, and an output convolutional layer. The input convolutional layer has 2 input channels, corresponding to the positive-polarity and negative-polarity events of the event stream, a 3 × 3 convolution kernel, a stride of 1, and 16 output channels; the hidden convolutional layer has 16 input channels, a 3 × 3 convolution kernel, a stride of 1, and 16 output channels; the output convolutional layer has 16 input channels, a 3 × 3 convolution kernel, a stride of 1, and 32 output channels.
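The layer and channel layout just described can be sketched as follows. This sketch uses plain PyTorch convolutions only to show the channel sizes; the actual network applies Spike Response Model neuron dynamics between layers, which would require a dedicated SNN framework, and the padding value is an assumption.

```python
import torch.nn as nn

class EventEncoderLayout(nn.Module):
    """Channel layout of the event-stream encoder (spiking dynamics omitted)."""
    def __init__(self):
        super().__init__()
        self.input_conv = nn.Conv2d(2, 16, kernel_size=3, stride=1, padding=1)   # +/- polarity
        self.hidden_conv = nn.Conv2d(16, 16, kernel_size=3, stride=1, padding=1)
        self.output_conv = nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1)

    def forward(self, x):                 # x: (batch, 2, H, W) per-polarity event grid
        x = self.input_conv(x)
        x = self.hidden_conv(x)
        return self.output_conv(x)        # (batch, 32, H, W) event feature map
```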
In step S104, the adjacent image frames, the first event stream data feature vector and the second event stream data feature vector are spliced and input to a preset multi-modal fusion network for forward propagation, so as to obtain all intermediate frames, and generate a high frame rate video higher than a second preset frame rate.
As a possible implementation, the embodiment of the present application may splice the adjacent image frames of the low frame rate video obtained in step S102 with the first event stream data feature vector F_1 and the second event stream data feature vector F_2 obtained in step S103, and input the result into the preset multi-modal fusion network for forward propagation to generate one intermediate frame, thereby completing the calculation of a single high frame rate image frame.
Specifically, the embodiment of the application may first splice the adjacent image frames of the low frame rate video with the event stream data feature vectors F_1 and F_2 and input them into the coarse synthesis sub-network to obtain a coarse output; the coarse output is then spliced with the input adjacent image frames and input into the fine tuning sub-network to obtain the final output.
Further, in the embodiment of the present application, the above steps may be repeated for the timestamp of each expected intermediate frame calculated in step S102, so as to complete the calculation of all intermediate frames, and further generate a high frame rate video higher than the second preset frame rate.
It should be noted that the pre-configured multimodal fusion network is described in detail below.
Optionally, in an embodiment of the present application, the multi-modal fusion network includes a coarse synthesis sub-network and a fine tuning sub-network, wherein the coarse synthesis sub-network uses a first U-Net structure, the number of input channels of an input layer is 64+2 × k, the number of output channels of an output layer is k, and the fine tuning sub-network uses a second U-Net structure, the number of input channels of the input layer is 3 × k, the number of output channels of the output layer is k, and k is the number of channels of an image frame of the video lower than a preset frame rate.
A multimodal fusion network is described in detail herein.
It will be appreciated that the data fusion network comprises a coarse synthesis sub-network and a fine tuning sub-network. The coarse synthesis sub-network uses a first U-Net structure whose input layer has 64 + 2 × k input channels and whose output layer has k output channels; the fine tuning sub-network uses a second U-Net structure whose input layer has 3 × k input channels and whose output layer has k output channels.
Where k is the number of channels of the image frame of the low frame rate video input in step S101, that is, k is 1 when the image frame of the low frame rate video input in step S101 is a grayscale image, and k is 3 when the image frame of the low frame rate video input in step S101 is an RGB image.
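Putting the channel bookkeeping and the coarse-to-fine forward pass described in step S104 together, a minimal sketch could look as follows. The make_unet factory, the ordering of tensors inside the concatenations and the class name are assumptions for illustration; the U-Net internals are deliberately left abstract.

```python
import torch
import torch.nn as nn

class MultiModalFusion(nn.Module):
    """Coarse synthesis + fine tuning fusion of two frames and two event features."""
    def __init__(self, make_unet, k=1):
        super().__init__()
        # 64 = two 32-channel event feature maps; 2 * k = the two adjacent frames.
        self.coarse = make_unet(in_channels=64 + 2 * k, out_channels=k)
        self.fine = make_unet(in_channels=3 * k, out_channels=k)

    def forward(self, frame_prev, frame_next, F1, F2):
        coarse_in = torch.cat([frame_prev, frame_next, F1, F2], dim=1)
        coarse_out = self.coarse(coarse_in)              # rough intermediate frame
        fine_in = torch.cat([frame_prev, coarse_out, frame_next], dim=1)
        return self.fine(fine_in)                        # refined intermediate frame
```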
In step S105, based on the high frame rate video, forward propagation is performed by using a preset 3D depth estimation network to obtain all the high frame rate depth maps, and all the high frame rate depth maps are combined to form the high frame rate 3D video.
In an actual execution process, in the embodiment of the present application, each high frame rate image frame obtained in the above steps may be spliced with its preceding and following adjacent high frame rate image frames and propagated forward through the preset 3D depth estimation network to generate a series of high frame rate depth maps, which are combined to form the high frame rate 3D video. According to the method and the device, event data can provide inter-frame motion information; the event stream is encoded by a pulse neural network, all intermediate frames are obtained through a multi-modal fusion network to generate a high frame rate video, and a 3D depth estimation network then forms the high frame rate 3D video, achieving effective stereoscopic observation of high-speed scenes. Using the event stream and the low frame rate video image frames together as input makes better use of the multi-modal data and further improves the quality of the high frame rate 3D video.
Optionally, in an embodiment of the present application, the 3D depth estimation network uses a third U-Net structure, and the number of input channels of the input layer is 3 × k, and the number of output channels of the output layer is 1.
The construction of the 3D depth estimation network is described in detail herein.
Specifically, the 3D depth estimation network constructed in the embodiment of the present application may use a third U-Net structure, where the number of input channels of the input layer is 3 × k, and the number of output channels of the output layer is 1, where k is the number of channels of the image frame of the low frame rate video input in step S101, that is, when the image frame of the low frame rate video input in step S101 is a grayscale map, k is 1, and when the image frame of the low frame rate video input in step S101 is an RGB image, k is 3.
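A minimal sketch of this depth-estimation pass over the generated high frame rate sequence is given below. Repeating the first and last frames at the sequence boundaries and the function name are illustrative assumptions; depth_net stands for the third U-Net with 3 × k input channels and 1 output channel.

```python
import torch

def estimate_depth_sequence(frames, depth_net):
    """frames: list of (k, H, W) tensors, the generated high frame rate frames.
    Each frame is concatenated with its previous and next neighbours along the
    channel axis (3 * k channels) and passed through the depth network."""
    depth_maps = []
    for i in range(len(frames)):
        prev_f = frames[max(i - 1, 0)]
        next_f = frames[min(i + 1, len(frames) - 1)]
        x = torch.cat([prev_f, frames[i], next_f], dim=0).unsqueeze(0)  # (1, 3k, H, W)
        with torch.no_grad():
            depth_maps.append(depth_net(x)[0, 0])                       # (H, W) depth map
    return depth_maps
```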
The embodiments of the present application will be described in detail with reference to fig. 2 to 7. As shown in fig. 2, the embodiment of the present application includes the following steps:
step S201: and acquiring the low frame rate video data and the event stream data. In the actual execution process, the embodiment of the application can acquire the video and the event data of the frame rate from the event camera, so that the acquisition of the original data is realized, and a data base is laid for the subsequent generation of the high frame rate video.
It can be understood that the event camera has no concept of frame rate: each of its pixels works asynchronously and outputs an event when a light intensity change is detected. Each event is a quadruple (x, y, t, p) containing the pixel coordinates (x, y), a timestamp t and the event polarity p (where p = -1 indicates that the light intensity at the pixel decreases, and p = 1 indicates that it increases). The event data output by all pixels are aggregated into a list of events, which serves as the event stream data output by the camera.
For example, as shown in fig. 3, the low frame rate video acquired from the event camera in the embodiment of the present application may have a frame rate of 20 FPS (frames per second) with 31 frames in total, and the duration of the corresponding event stream is 1500 ms.
Step S202: preprocess the data. The embodiment of the application can combine every two adjacent image frames of the low frame rate video and, for each group of adjacent image frames, calculate the timestamp set T of all desired intermediate frames, recorded as:

T = {τ_1^{1,2}, τ_2^{1,2}, …, τ_n^{1,2}, τ_1^{2,3}, τ_2^{2,3}, …, τ_n^{2,3}, …, τ_1^{N-1,N}, τ_2^{N-1,N}, …, τ_n^{N-1,N}},

wherein each desired intermediate-frame timestamp is calculated as follows:

τ_i^{j,j+1} = t_j + i · (t_{j+1} - t_j) / n,  i = 1, 2, …, n,  j = 1, 2, …, N - 1,

where N is the total number of frames of the input low frame rate video, n is the multiple by which the frame rate is to be raised, and t_j is the timestamp of the j-th frame of the input low frame rate video.
For example, in the embodiment of the present application, the input low frame rate video may contain N = 31 frames at a frame rate of 20 FPS, so the timestamp of the j-th frame of the input low frame rate video is t_j = (j - 1) × 50 ms. To obtain a high frame rate video with the frame rate raised by n = 10, the calculated set of intermediate-frame timestamps is T = {5, 10, 15, 20, …} ms.
Step S203: construct the pulse neural network. In an actual implementation, the embodiment of the application can use the Spike Response model as the neuron dynamics model to construct the pulse convolutional neural network.
In particular, the spiking neural network may include an input convolutional layer, a hidden convolutional layer, and an output convolutional layer. The input convolutional layer has 2 input channels, corresponding to the positive-polarity and negative-polarity events of the event stream, a 3 × 3 convolution kernel, a stride of 1, and 16 output channels; the hidden convolutional layer has 16 input channels, a 3 × 3 convolution kernel, a stride of 1, and 16 output channels; the output convolutional layer has 16 input channels, a 3 × 3 convolution kernel, a stride of 1, and 32 output channels.
Step S204: compute the event stream encodings. In the embodiment of the present application, for the intermediate-frame timestamp τ_i^{j,j+1} calculated in step S202, the event streams ε_1 and ε_2 from the two boundary frames to the desired intermediate frame are intercepted, and ε_1 and ε_2 are respectively input into the impulse neural network obtained in step S203 for forward propagation to obtain the event stream data feature vectors F_1 and F_2.
The event streams ε_1 and ε_2 from the two boundary frames to the desired intermediate frame may be taken as:

ε_1 = {(x, y, t, p) : t_j ≤ t ≤ τ_i^{j,j+1}},  ε_2 = {(x, y, t, p) : τ_i^{j,j+1} ≤ t ≤ t_{j+1}},

where τ_i^{j,j+1} is the timestamp of the desired intermediate frame, and t_j and t_{j+1} are the timestamps of the input low frame rate video frames adjacent to the desired intermediate frame.
For example, take the 15th desired intermediate frame, i.e., the 5th frame inserted between the 2nd and 3rd frames of the input low frame rate video, whose timestamp is τ_5^{2,3} = 75 ms; the event streams ε_1 and ε_2 from the two boundary frames to this intermediate frame are shown in Table 1 and Table 2, respectively.
Step S205: construct the multi-modal fusion network. It will be appreciated that the data fusion network comprises a coarse synthesis sub-network and a fine tuning sub-network. The coarse synthesis sub-network adopts a U-Net structure whose input layer has 64 + 2 × k input channels and whose output layer has k output channels; the fine tuning sub-network adopts a U-Net structure whose input layer has 3 × k input channels and whose output layer has k output channels.
Where k is the number of channels of the image frame of the low frame rate video input in step S201, that is, k is 1 when the image frame of the low frame rate video input in step S201 is a grayscale map, and k is 3 when the image frame of the low frame rate video input in step S201 is an RGB image.
For example, in the embodiment of the present application, the image frames of the low frame rate video input in step S201 may be grayscale images, i.e., k = 1. In this case, the input layer of the coarse synthesis sub-network has 66 input channels and its output layer has 1 output channel; the input layer of the fine tuning sub-network has 3 input channels and its output layer has 1 output channel.
Step S206: calculate a single high frame rate image frame. As a possible implementation, the embodiment of the present application may splice the adjacent image frames of the low frame rate video obtained in step S202 with the first event stream data feature vector F_1 and the second event stream data feature vector F_2 obtained in step S204, and input the result into the preset multi-modal fusion network for forward propagation to generate one intermediate frame, thereby completing the calculation of a single high frame rate image frame.
Specifically, the embodiment of the application may first splice the adjacent image frames of the low frame rate video with the event stream data feature vectors F_1 and F_2 and input them into the coarse synthesis sub-network to obtain a coarse output; the coarse output is then spliced with the input adjacent image frames and input into the fine tuning sub-network to obtain the final output.
For example, for the 15th desired intermediate frame, the generated intermediate frame is shown in fig. 4.
Step S207: calculate all high frame rate image frames. Further, the embodiment of the present application may repeat the above steps S202 to S206 for the timestamp of each desired intermediate frame calculated in step S202, completing the calculation of all intermediate frames.
For example, in the embodiment of the present application, the input low frame rate video may contain N = 31 frames of images; to obtain a high frame rate video with the frame rate raised by n = 10 times, steps S202 to S206 need to be repeated a total of 300 times.
In the embodiment of the present application, all the intermediate frames obtained in step S207 are combined to form a high frame rate video, so as to implement high frame rate video generation.
For example, the input event stream, the low frame rate video and the generated high frame rate video may be as shown in fig. 5, where a high frame rate video with the frame rate raised by n = 10 times is obtained.
Step S208: construct the 3D depth estimation network. Specifically, the 3D depth estimation network constructed in the embodiment of the present application may use a third U-Net structure whose input layer has 3 × k input channels and whose output layer has 1 output channel, where k is the number of channels of the image frames of the low frame rate video input in step S201, i.e., k = 1 when the input image frames are grayscale images and k = 3 when they are RGB images.
Step S209: high frame rate 3D depth estimation calculations.
Step S210: post-process the data. In an actual execution process, in the embodiment of the present application, each high frame rate image frame obtained in the above steps may be spliced with its preceding and following adjacent high frame rate image frames and propagated forward through the preset 3D depth estimation network to generate a series of high frame rate depth maps, which are combined to form the high frame rate 3D video, thereby completing high frame rate 3D video generation.
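Tying steps S202 through S210 together, an end-to-end sketch of the pipeline could look as follows. It reuses the slice_event_streams and estimate_depth_sequence sketches above and inlines the intermediate-timestamp formula; the snn and fusion_net callables stand in for the trained networks, and converting the raw event lists into the per-polarity grids the encoder expects is glossed over, so this is an assumed outline rather than the exact implementation.

```python
def generate_high_frame_rate_3d_video(frames, frame_ts, events, n,
                                       snn, fusion_net, depth_net):
    """frames: low frame rate frames; frame_ts: their timestamps; events: the full
    event stream; n: frame-rate multiple. Returns high frame rate frames and depths."""
    hfr_frames = []
    for j in range(len(frames) - 1):
        t0, t1 = frame_ts[j], frame_ts[j + 1]
        for i in range(1, n + 1):
            tau = t0 + i * (t1 - t0) / n                       # intermediate timestamp
            eps1, eps2 = slice_event_streams(events, t0, t1, tau)
            F1, F2 = snn(eps1), snn(eps2)                      # event feature vectors
            hfr_frames.append(fusion_net(frames[j], frames[j + 1], F1, F2))
    depth_maps = estimate_depth_sequence(hfr_frames, depth_net)
    return hfr_frames, depth_maps
```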
For example, as shown in fig. 6, the embodiment of the present application can achieve high frame rate depth-map video generation under a 10-times frame rate increase, enabling effective stereoscopic scene observation in a high-speed environment.
According to the high frame rate 3D video generation method based on data fusion proposed in the embodiments of the present application, event data can provide inter-frame motion information; the event stream is encoded by an impulse neural network, all intermediate frames are obtained through the multi-modal fusion network to generate a high frame rate video, and a 3D depth estimation network then forms the high frame rate 3D video, achieving effective stereoscopic observation of high-speed scenes. Using the event stream and the low frame rate video image frames as input makes better use of the multi-modal data and further improves the quality of the high frame rate 3D video. Therefore, the technical problem in the related art that the generated image quality is low because only the event stream is used as input and the initial brightness value of each pixel is missing is solved.
Next, a high frame rate 3D video generation apparatus based on data fusion proposed according to an embodiment of the present application is described with reference to the drawings.
Fig. 7 is a schematic block diagram of a high frame rate 3D video generation apparatus based on data fusion according to an embodiment of the present application.
As shown in fig. 7, the high frame rate 3D video generation apparatus 10 based on data fusion includes: a first acquisition module 100, a calculation module 200, a second acquisition module 300, a fusion module 400, and a generation module 500.
Specifically, the first acquiring module 100 is configured to acquire video and event data from an event camera, where the video and event data are lower than a preset frame rate.
The calculating module 200 is configured to combine every two adjacent image frames in the video to generate a plurality of groups of adjacent image frames, and calculate a timestamp set of all the intermediate frames expected to be obtained.
The second obtaining module 300 is configured to intercept a first event stream and a second event stream from two boundary frames to an expected intermediate frame according to the timestamp set, and input the first event stream and the second event stream to a preset impulse neural network for forward propagation to obtain a first event stream data feature vector and a second event stream data feature vector.
And the fusion module 400 is configured to splice adjacent image frames, the first event stream data feature vector and the second event stream data feature vector, input the spliced image frames, the first event stream data feature vector and the second event stream data feature vector to a preset multi-modal fusion network for forward propagation, obtain all intermediate frames, and generate a high frame rate video higher than a second preset frame rate.
The generating module 500 is configured to perform forward propagation by using a preset 3D depth estimation network based on the high frame rate video, obtain all the high frame rate depth maps, and combine all the high frame rate depth maps to form the high frame rate 3D video.
Optionally, in an embodiment of the present application, the data fusion-based high frame rate 3D video generation apparatus 10 further includes: and constructing a module.
The construction module is used for constructing the impulse neural network based on a Spike Response model as a neuron dynamic model.
Optionally, in an embodiment of the present application, the multi-modal fusion network includes a coarse synthesis sub-network and a fine tuning sub-network, wherein the coarse synthesis sub-network uses a first U-Net structure, the number of input channels of an input layer is 64+2 × k, the number of output channels of an output layer is k, and the fine tuning sub-network uses a second U-Net structure, the number of input channels of the input layer is 3 × k, the number of output channels of the output layer is k, and k is the number of channels of an image frame of the video lower than a preset frame rate.
Optionally, in an embodiment of the present application, the 3D depth estimation network uses a third U-Net structure, and the number of input channels of the input layer is 3 × k, and the number of output channels of the output layer is 1.
Optionally, in an embodiment of the present application, the timestamps of all intermediate frames are calculated as:

τ_i^{j,j+1} = t_j + i · (t_{j+1} - t_j) / n,  i = 1, 2, …, n,  j = 1, 2, …, N - 1,

where N is the total number of frames of the input low frame rate video, n is the multiple by which the frame rate is to be raised, and t_j is the timestamp of the j-th frame of the input low frame rate video.
It should be noted that the foregoing explanation on the embodiment of the high frame rate 3D video generation method based on data fusion is also applicable to the high frame rate 3D video generation device based on data fusion of this embodiment, and details are not repeated here.
According to the high frame rate 3D video generation device based on data fusion proposed in the embodiments of the present application, event data can provide inter-frame motion information; the event stream is encoded by a pulse neural network, all intermediate frames are obtained through the multi-modal fusion network to generate a high frame rate video, and a 3D depth estimation network then forms the high frame rate 3D video, achieving effective stereoscopic observation of high-speed scenes. Using the event stream and the low frame rate video image frames as input makes better use of the multi-modal data and further improves the quality of the high frame rate 3D video. Therefore, the technical problem in the related art that the generated image quality is low because only the event stream is used as input and the initial brightness value of each pixel is missing is solved.
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device may include:
a memory 801, a processor 802, and a computer program stored on the memory 801 and executable on the processor 802.
The processor 802, when executing the program, implements the data fusion-based high frame rate 3D video generation method provided in the above-described embodiments.
Further, the electronic device further includes:
a communication interface 803 for communicating between the memory 801 and the processor 802.
A memory 801 for storing computer programs operable on the processor 802.
The memory 801 may comprise high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
If the memory 801, the processor 802 and the communication interface 803 are implemented independently, the communication interface 803, the memory 801 and the processor 802 may be connected to each other via a bus and communicate with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 8, but this is not intended to represent only one bus or type of bus.
Alternatively, in practical implementation, if the memory 801, the processor 802 and the communication interface 803 are integrated into one chip, the memory 801, the processor 802 and the communication interface 803 may communicate with each other through an internal interface.
The processor 802 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits configured to implement embodiments of the present Application.
The present embodiment also provides a computer-readable storage medium on which a computer program is stored, which when executed by a processor, implements the data fusion-based high frame rate 3D video generation method as above.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or N embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "N" means at least two, e.g., two, three, etc., unless explicitly defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more N executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of implementing the embodiments of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or N wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the N steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.
Claims (10)
1. A high frame rate 3D video generation method based on data fusion is characterized by comprising the following steps:
acquiring video and event data lower than a preset frame rate from an event camera;
combining every two adjacent image frames in the video to generate a plurality of groups of adjacent image frames, and calculating a timestamp set of all the intermediate frames expected to be obtained;
intercepting a first event stream and a second event stream from two boundary frames to an expected intermediate frame according to the timestamp set, and inputting the first event stream and the second event stream to a preset impulse neural network for forward propagation to obtain a first event stream data feature vector and a second event stream data feature vector;
splicing the adjacent image frames, the first event stream data feature vector and the second event stream data feature vector, inputting the adjacent image frames, the first event stream data feature vector and the second event stream data feature vector into a preset multi-mode fusion network for forward propagation to obtain all intermediate frames, and generating a high frame rate video higher than a second preset frame rate;
and based on the high frame rate video, performing forward propagation by using a preset 3D depth estimation network to obtain all the high frame rate depth maps, and combining all the high frame rate depth maps to form the high frame rate 3D video.
2. The method of claim 1, further comprising, prior to inputting the first and second event streams into the pre-defined spiking neural network for forward propagation:
and constructing the impulse neural network based on a Spike Response model as a neuron dynamic model.
3. The method of claim 1, wherein the multi-modal fusion network comprises a coarse synthesis sub-network and a fine tuning sub-network, wherein the coarse synthesis sub-network uses a first U-Net structure, the number of input channels of an input layer is 64+2 xk, the number of output channels of an output layer is k, and the fine tuning sub-network uses a second U-Net structure, the number of input channels of the input layer is 3 xk, the number of output channels of the output layer is k, and k is the number of channels of the image frames of the video with the frame rate lower than the preset frame rate.
4. The method of claim 1, wherein the 3D depth estimation network uses a third U-Net structure, and the number of input channels of the input layer is 3 x k, and the number of output channels of the output layer is 1.
5. The method according to any one of claims 1-4, wherein the timestamps of all intermediate frames are calculated as:

τ_i^{j,j+1} = t_j + i · (t_{j+1} - t_j) / n,  i = 1, 2, …, n,  j = 1, 2, …, N - 1,

where N is the total number of frames of the input low frame rate video, n is the multiple by which the frame rate is to be raised, and t_j is the timestamp of the j-th frame of the input low frame rate video.
6. A high frame rate 3D video generation apparatus based on data fusion, comprising:
the first acquisition module is used for acquiring videos and event data which are lower than a preset frame rate from the event camera;
the computing module is used for combining every two adjacent image frames in the video to generate a plurality of groups of adjacent image frames and computing a timestamp set of all the intermediate frames expected to be obtained;
the second acquisition module is used for extracting, according to the timestamp set, a first event stream and a second event stream running from the two boundary frames to a desired intermediate frame, and inputting the first event stream and the second event stream into a preset spiking neural network for forward propagation to obtain a first event stream data feature vector and a second event stream data feature vector;
the fusion module is used for concatenating the adjacent image frames, the first event stream data feature vector and the second event stream data feature vector, inputting the concatenated result into a preset multi-modal fusion network for forward propagation to obtain all the intermediate frames, and generating a high frame rate video higher than a second preset frame rate;
and the generating module is used for carrying out forward propagation by utilizing a preset 3D depth estimation network based on the high frame rate video to obtain all the high frame rate depth maps, and combining all the high frame rate depth maps to form the high frame rate 3D video.
7. The apparatus of claim 6, further comprising: a construction module, which is used for constructing the spiking neural network using a Spike Response Model as the neuron dynamics model.
8. The apparatus according to any one of claims 6-7, wherein the timestamp set of all the intermediate frames is calculated as:
T = { t_j + (i / n) · (t_{j+1} − t_j) | j = 1, 2, …, N − 1; i = 1, 2, …, n − 1 },
where N is the total number of frames of the input low frame rate video, n is the multiple by which the frame rate is to be increased, and t_j is the timestamp of the j-th frame of the input low frame rate video.
9. An electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the data fusion based high frame rate 3D video generation method according to any one of claims 1-5.
10. A computer readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the data fusion based high frame rate 3D video generation method according to any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210293645.3A CN114885144B (en) | 2022-03-23 | 2022-03-23 | High frame rate 3D video generation method and device based on data fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114885144A true CN114885144A (en) | 2022-08-09 |
CN114885144B CN114885144B (en) | 2023-02-07 |
Family
ID=82667857
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210293645.3A Active CN114885144B (en) | 2022-03-23 | 2022-03-23 | High frame rate 3D video generation method and device based on data fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114885144B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115883764A (en) * | 2023-02-08 | 2023-03-31 | 吉林大学 | Underwater high-speed video frame interpolation method and system based on data cooperation |
WO2024179078A1 (en) * | 2023-02-28 | 2024-09-06 | 万有引力(宁波)电子科技有限公司 | Fused display method and system, and storage medium |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160140733A1 (en) * | 2014-11-13 | 2016-05-19 | Futurewei Technologies, Inc. | Method and systems for multi-view high-speed motion capture |
CN111667442A (en) * | 2020-05-21 | 2020-09-15 | 武汉大学 | High-quality high-frame-rate image reconstruction method based on event camera |
CN113888639A (en) * | 2021-10-22 | 2022-01-04 | 上海科技大学 | Visual odometer positioning method and system based on event camera and depth camera |
CN114071114A (en) * | 2022-01-17 | 2022-02-18 | 季华实验室 | Event camera, depth event point diagram acquisition method, device, equipment and medium |
Non-Patent Citations (1)
Title |
---|
SONG Caixia et al.: "Research on the Transmission Method of VR Video over 5G Networks" (5G网络下VR视频的传输方法研究), 《电视技术》 (Video Engineering) * |
Also Published As
Publication number | Publication date |
---|---|
CN114885144B (en) | 2023-02-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Lan et al. | MADNet: A fast and lightweight network for single-image super resolution | |
Yan et al. | Multi-scale dense networks for deep high dynamic range imaging | |
CN111402130B (en) | Data processing method and data processing device | |
CN114885144B (en) | High frame rate 3D video generation method and device based on data fusion | |
US20210133920A1 (en) | Method and apparatus for restoring image | |
CN110717851A (en) | Image processing method and device, neural network training method and storage medium | |
CN113076685B (en) | Training method of image reconstruction model, image reconstruction method and device thereof | |
CN111835983B (en) | Multi-exposure-image high-dynamic-range imaging method and system based on generation countermeasure network | |
CN114885112B (en) | High-frame-rate video generation method and device based on data fusion | |
TWI770432B (en) | Method, device and electronic apparatus for image restoration and storage medium thereof | |
CN112270692B (en) | Monocular video structure and motion prediction self-supervision method based on super-resolution | |
CN111079507B (en) | Behavior recognition method and device, computer device and readable storage medium | |
CN111652921A (en) | Generation method of monocular depth prediction model and monocular depth prediction method | |
CN114881921B (en) | Anti-occlusion imaging method and device based on event and video fusion | |
CN113554726B (en) | Image reconstruction method and device based on pulse array, storage medium and terminal | |
CN114841897B (en) | Depth deblurring method based on self-adaptive fuzzy kernel estimation | |
CN114640885B (en) | Video frame inserting method, training device and electronic equipment | |
CN100465994C (en) | Method and apparatus for downscaling a digital matrix image | |
CN110503002B (en) | Face detection method and storage medium | |
CN113658091A (en) | Image evaluation method, storage medium and terminal equipment | |
CN113158970A (en) | Action identification method and system based on fast and slow dual-flow graph convolutional neural network | |
Hua et al. | An Efficient Multiscale Spatial Rearrangement MLP Architecture for Image Restoration | |
CN117408916A (en) | Image deblurring method based on multi-scale residual Swin transducer and related product | |
JP7508525B2 (en) | Information processing device, information processing method, and program | |
CN115909088A (en) | Optical remote sensing image target detection method based on super-resolution feature aggregation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||