WO2022250372A1 - AI-based frame interpolation method and apparatus - Google Patents
AI-based frame interpolation method and apparatus
- Publication number: WO2022250372A1 (PCT/KR2022/007140)
- Authority: WO - WIPO (PCT)
- Prior art keywords: frame, optical flow, feature map, level, warped
Classifications
- G06N3/00 Computing arrangements based on biological models; G06N3/02 Neural networks; G06N3/08 Learning methods
- G06T15/00 3D [Three Dimensional] image rendering; G06T15/50 Lighting effects
- G06T3/00 Geometric image transformations in the plane of the image; G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T7/00 Image analysis; G06T7/20 Analysis of motion
- G06T7/50 Depth or shape recovery; G06T7/55 Depth or shape recovery from multiple images; G06T7/593 Depth or shape recovery from stereo images
- H04N5/00 Details of television systems; H04N5/14 Picture signal circuitry for video frequency region
- H04N7/00 Television systems; H04N7/01 Conversion of standards, e.g. involving analogue television standards or digital television standards processed at pixel level
Definitions
- the present disclosure relates to a method and apparatus for interpolating a frame of an image. More specifically, the present disclosure relates to a technique of interpolating a frame of an image using AI (Artificial Intelligence).
- Provided are an AI-based frame interpolation method and apparatus for an image that obtain a more accurate bidirectional optical flow between two frames based on a flow prediction neural network, obtain an AI-based interpolation filter having different filter coefficients for each pixel of a frame through an interpolation filter neural network based on the bidirectional optical flow, and interpolate a new frame between the two frames using the AI-based interpolation filter, thereby improving image restoration performance and image quality.
- According to an embodiment, an AI-based frame interpolation method includes: obtaining feature maps of a plurality of levels for a first frame and feature maps of a plurality of levels for a second frame, among successive frames of an image; obtaining, through a flow prediction neural network, a first optical flow from a first feature map of a predetermined level to a second feature map of the predetermined level and a second optical flow from the second feature map of the predetermined level to the first feature map of the predetermined level; obtaining a forward warped first feature map by forward warping the first feature map using the first optical flow, and obtaining a forward warped second feature map by forward warping the second feature map using the second optical flow; updating the first optical flow using the forward warped first feature map, and updating the second optical flow using the forward warped second feature map; obtaining a first optical flow of a higher level by up-scaling the updated first optical flow to correspond to a level higher than the predetermined level; and obtaining a second optical flow of the higher level by up-scaling the updated second optical flow to correspond to the higher level.
- the upper level is the highest level among the plurality of levels, and the highest level may be a level corresponding to the first frame and the second frame.
- According to an embodiment, a first feature map of the first frame corresponding to the highest level among the plurality of levels and a second feature map of the second frame corresponding to the highest level are obtained through a first neural network, the first feature maps and second feature maps of levels below the highest level are obtained through a downsampling neural network, and the feature maps of the plurality of levels for the first frame and the feature maps of the plurality of levels for the second frame include the feature maps of the highest level and the feature maps of the levels below the highest level.
- According to an embodiment, the obtaining of the first optical flow and the second optical flow of the higher level may include: obtaining a first importance weight of the predetermined level, the first importance weight indicating the number of pixels of the first feature map of the predetermined level that are mapped to one pixel of the second feature map of the predetermined level; and obtaining a second importance weight of the predetermined level, the second importance weight indicating the number of pixels of the second feature map of the predetermined level that are mapped to one pixel of the first feature map of the predetermined level.
- According to an embodiment, the forward warped first feature map may be obtained by additionally using the first importance weight of the predetermined level, and the forward warped second feature map may be obtained by additionally using the second importance weight of the predetermined level.
- a first importance weight of the higher level may be obtained based on the first optical flow of the higher level, and a second importance weight of the higher level may be obtained based on the second optical flow of the higher level.
- According to an embodiment, the determining of the AI-based frame interpolation filter for the third frame may include: obtaining a first intermediate optical flow from the third frame to the first frame and a second intermediate optical flow from the third frame to the second frame through an intermediate flow prediction neural network, based on the first optical flow of the higher level, the second optical flow of the higher level, the first importance weight of the higher level, and the second importance weight of the higher level; obtaining, using the first intermediate optical flow and the second intermediate optical flow, a first frame forward warped with respect to a time t of the third frame, a second frame forward warped with respect to the time t, a first frame backward warped with respect to the time t, and a second frame backward warped with respect to the time t; and determining the AI-based frame interpolation filter for the third frame through the interpolation filter neural network based on the forward warped first frame, the forward warped second frame, the backward warped first frame, and the backward warped second frame.
- According to an embodiment, the determining of the AI-based frame interpolation filter for the third frame may include: obtaining a first intermediate optical flow from the third frame to the first frame and a second intermediate optical flow from the third frame to the second frame through an intermediate flow prediction neural network, based on the first optical flow of the higher level, the second optical flow of the higher level, the first importance weight of the higher level, and the second importance weight of the higher level; obtaining a first frame forward warped with respect to a time t of the third frame and a second frame forward warped with respect to the time t, using the first intermediate optical flow and the second intermediate optical flow; and determining the AI-based frame interpolation filter for the third frame through the interpolation filter neural network based on the forward warped first frame and the forward warped second frame.
- According to an embodiment, the determining of the AI-based frame interpolation filter for the third frame may include: obtaining a first intermediate optical flow from the third frame to the first frame and a second intermediate optical flow from the third frame to the second frame through an intermediate flow prediction neural network, based on the first optical flow of the higher level, the second optical flow of the higher level, the first importance weight of the higher level, and the second importance weight of the higher level; obtaining a first frame backward warped with respect to a time t of the third frame and a second frame backward warped with respect to the time t, using the first intermediate optical flow and the second intermediate optical flow; and determining the AI-based frame interpolation filter for the third frame through the interpolation filter neural network based on the backward warped first frame and the backward warped second frame.
- According to an embodiment, the first optical flow of the predetermined level may be updated based on a first correlation value between the forward warped first feature map and the second feature map of the predetermined level, and the second optical flow of the predetermined level may be updated based on a second correlation value between the forward warped second feature map and the first feature map of the predetermined level.
- According to an embodiment, the first optical flow of the predetermined level may be updated based on candidate pixels within a predetermined range of the first optical flow of the predetermined level, and the second optical flow of the predetermined level may be updated based on candidate pixels within a predetermined range of the second optical flow of the predetermined level.
- the predetermined range may vary according to the size of the feature map of the predetermined level.
- According to an embodiment, pixels used for calculating the first correlation value are determined, among the pixels within the predetermined range, by a first filter set by a user, and pixels used for calculating the second correlation value are determined, among the pixels within the predetermined range, by a second filter set by the user.
- According to an embodiment, pixels used for calculating the first correlation value are determined, among the pixels within the predetermined range, by a first filter based on a trained neural network, and pixels used for calculating the second correlation value are determined, among the pixels within the predetermined range, by a second filter based on a trained neural network.
- According to an embodiment, the highest correlation value among correlation values with pixels within a predetermined range of the second feature map of the predetermined level is determined as the first correlation value, and the highest correlation value among correlation values with pixels within a predetermined range of the first feature map of the predetermined level is determined as the second correlation value.
- the first optical flow and the second optical flow initially obtained at the lowest level among the plurality of levels may be set to zero.
- the AI-based frame interpolation filter may include one filter kernel corresponding to each of pixels in the first frame and the second frame.
- According to an embodiment, contextual feature maps of the first frame and the second frame are additionally input to the interpolation filter neural network, and each contextual feature map may be determined as the sum of an output value of a second neural network that takes the corresponding frame as an input and an output value of a predetermined classification network that takes the first frame and the second frame as inputs.
- the AI-based frame interpolation filter may include a filter kernel for bilinear interpolation used for sub-pixel calculation.
- the AI-based frame interpolation filter may include a filter kernel based on at least one of time and Z-map of the third frame.
- the AI-based frame interpolation filter may include a first frame interpolation filter applied to the first frame and a second frame interpolation filter applied to the second frame.
- the depth information of the first frame and the depth information of the second frame may be additionally input to the interpolation filter neural network.
- According to an embodiment, an AI-based frame interpolation apparatus includes a memory and a processor, wherein the processor is configured to: acquire feature maps of a plurality of levels for a first frame and feature maps of a plurality of levels for a second frame, among successive frames of an image; obtain, through a flow prediction neural network, a first optical flow from a first feature map of a predetermined level to a second feature map of the predetermined level and a second optical flow from the second feature map of the predetermined level to the first feature map of the predetermined level; obtain a forward warped first feature map by forward warping the first feature map using the first optical flow, and obtain a forward warped second feature map by forward warping the second feature map using the second optical flow; and update the first optical flow using the forward warped first feature map and update the second optical flow using the forward warped second feature map.
- A method and apparatus for AI-based frame interpolation of an image obtain a more accurate bidirectional optical flow based on a flow prediction neural network and, through an interpolation filter neural network based on the bidirectional optical flow, an interpolation filter having different filter coefficients for each pixel of a frame, so that image restoration performance can be improved.
- FIG. 1 is a diagram illustrating an example of a frame interpolation process of an image based on AI according to an embodiment.
- FIG. 2 is a diagram of an example of backward warping and forward warping based on optical flow between frames according to one embodiment.
- FIG. 3A is a diagram of an example of a method of calculating a correlation value using a backward warped feature map according to an embodiment.
- FIG. 3B is a diagram of an example of a method of calculating a correlation value using a backward warped feature map according to an embodiment.
- FIG. 3C is a diagram of an example of a method of calculating a correlation value using a forward warped feature map according to an embodiment.
- FIG. 4 is a diagram of an example of a filter used to select a pixel candidate to be a calculation target when calculating a correlation value according to an embodiment.
- FIG. 5 is a diagram of an example of a method of updating an optical flow according to an embodiment.
- FIG. 6 is a diagram of an example of a process of interpolating a frame through an AI-based frame interpolation filter according to an embodiment.
- FIG. 7 is a diagram of an example of a frame interpolation filter based on AI according to an embodiment.
- FIG. 8 is a flowchart of a frame interpolation method based on AI according to an embodiment.
- FIG. 9 is a diagram illustrating a configuration of a frame interpolation device based on AI according to an embodiment.
- The expression “at least one of a, b, or c” means “a”, “b”, “c”, “a and b”, “a and c”, “b and c”, “a, b, and c”, or variations thereof.
- When one component is referred to as being “connected” or “coupled” to another component, the one component may be directly connected or directly coupled to the other component, but unless otherwise described, it should be understood that the components may also be connected or coupled via another component in between.
- Components expressed as '~ unit', 'module', and the like may be two or more components combined into one component, or one component differentiated into two or more components according to more subdivided functions. Each of the components described below may additionally perform some or all of the functions of other components in addition to its own main function, and, of course, some of the main functions of each component may be performed exclusively by another component.
- an 'image' may represent a still image, a motion picture composed of a plurality of continuous still images (or frames), or a video.
- A 'neural network' is a representative example of an artificial neural network model that mimics the neurons of the brain, and is not limited to an artificial neural network model using a specific algorithm.
- a neural network may also be referred to as a deep neural network.
- a 'parameter' may be a value used in an operation process of each layer constituting a neural network. For example, it can be used when applying an input value to a predetermined arithmetic expression.
- the parameter may be a value set as a result of training, and may be updated through separate training data as needed.
- a 'feature map' may refer to an image map output by inputting image data to a neural network.
- a feature map represents potential features of the input data.
- The 'optical flow of the current level' refers to the optical flow of the feature map of the current level, and the 'optical flow of the upper level' refers to the optical flow of the feature map of the level above the current level.
- A 'sample' is data allocated to a sampling position in an image or feature map, and may mean data to be processed. For example, it may be a pixel value in a frame in the spatial domain.
- FIG. 1 is a diagram illustrating an example of a frame interpolation process of an image based on AI according to an embodiment.
- Referring to FIG. 1, a first feature map for the first frame and a second feature map for the second frame are obtained. Then, downsampled first feature maps 110, 120, and 130 of a plurality of levels and downsampled second feature maps 115, 125, and 135 of a plurality of levels are obtained.
- The first neural network may be a general neural network that extracts features of an input image, for example, a general convolutional neural network (CNN).
- a neural network for downsampling may be trained to take one feature map as an input and obtain a plurality of levels of downsampled feature maps as an output.
- Alternatively, downsampled feature maps of a plurality of levels may be acquired by repeatedly applying, to one feature map, a downsampling neural network trained to downsample by a specific ratio.
- Optical flow may be used to interpolate a new frame between successive frames.
- Optical flow may be defined as a positional difference between samples in two consecutive frames, that is, a first frame I0 (100) and a second frame I1 (105). In other words, the optical flow indicates where the samples in the first frame I0 (100) are located within the second frame I1 (105), or where the samples in the second frame I1 (105) are located within the first frame I0 (100). If the position of a sample changes by (f(x), f(y)) between the two frames, the optical flow for the corresponding sample may be derived as (f(x), f(y)).
- Warping is a type of geometric transformation that moves the positions of samples in an image.
- For example, a forward warped frame similar to the second frame is obtained by warping the first frame according to the optical flow representing the relative positional relationship between the samples in the first frame I0 (100) and the samples in the second frame I1 (105). For example, if a sample located at (1, 1) in the first frame I0 (100) is most similar to a sample located at (2, 1) in the second frame I1 (105), the position of the sample located at (1, 1) in the first frame I0 (100) may be changed to (2, 1) through warping.
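As an illustration of forward warping (a minimal NumPy sketch with nearest-neighbour rounding, not part of the original disclosure), the example below scatters each pixel of a source frame to the position indicated by its optical flow vector, reproducing the (1, 1) → (2, 1) movement described above.

```python
import numpy as np

def forward_warp(frame, flow):
    """Scatter each source pixel to the position given by its flow vector.

    frame: (H, W, C) array; flow: (H, W, 2) array of (dx, dy) displacements.
    Target pixels that no source pixel reaches stay 0 (holes), and several
    source pixels may land on the same target (overwrites).
    """
    H, W, _ = frame.shape
    out = np.zeros_like(frame)
    for y in range(H):
        for x in range(W):
            tx = int(round(x + flow[y, x, 0]))
            ty = int(round(y + flow[y, x, 1]))
            if 0 <= tx < W and 0 <= ty < H:
                out[ty, tx] = frame[y, x]
    return out

# Toy example: the sample at (x, y) = (1, 1) moves to (2, 1), as in the text.
frame0 = np.arange(16, dtype=float).reshape(4, 4, 1)
flow01 = np.zeros((4, 4, 2))
flow01[1, 1] = (1.0, 0.0)
warped = forward_warp(frame0, flow01)
```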
- The flow prediction neural network 140 is a neural network trained to acquire a first optical flow (flow 0→1) from the first frame I0 (100) to the second frame I1 (105) and a second optical flow (flow 1→0) from the second frame I1 (105) to the first frame I0 (100), using the feature maps of the plurality of levels of the pyramid structure.
- An initial value of the first optical flow (flow 1 0→1) of the lowest level, from the first feature map 130 at the lowest level of the first frame I0 (100) to the second feature map 135 at the lowest level of the second frame I1 (105), is set to 0, and an initial value of the second optical flow (flow 1 1→0) of the lowest level, from the second feature map 135 at the lowest level to the first feature map 130 at the lowest level, is set to 0. Hereinafter, the lowest level is referred to as level 1.
- The first optical flow of level 1, from the first feature map 130 of level 1 to the second feature map 135 of level 1, is obtained, and a forward warped first feature map of level 1 is obtained by forward warping the first feature map of level 1 using the first optical flow of level 1. Correlation values between the forward warped first feature map of level 1 and the second feature map of level 1 are calculated, and the position of the pixel having the highest correlation value is determined, so that the first optical flow of level 1 (flow 1 0→1) is updated (141).
- Likewise, the second optical flow of level 1, from the second feature map 135 of level 1 to the first feature map 130 of level 1, is obtained, and a forward warped second feature map of level 1 is obtained by forward warping the second feature map of level 1 using the second optical flow of level 1. Correlation values between the forward warped second feature map of level 1 and the first feature map of level 1 are calculated, and the position of the pixel having the highest correlation value is determined, so that the second optical flow of level 1 (flow 1 1→0) is updated (141).
- Because the initial values of the first optical flow and the second optical flow of level 1 are set to 0, the forward warped first feature map of level 1 and the forward warped second feature map of level 1 may be the same as the first feature map and the second feature map of level 1, respectively. The first feature map of level 1 and the second feature map of level 1 are therefore compared with each other to calculate correlation values, so that the first optical flow of level 1 and the second optical flow of level 1 are updated to non-zero values. The updated first optical flow of level 1 and second optical flow of level 1 are upscaled (142) to a size corresponding to level 2, the level above the lowest level, so that the first optical flow of level 2 (flow 2 0→1) and the second optical flow of level 2 (flow 2 1→0) are obtained.
- Calculating the correlation value means, for example, comparing the forward warped first feature map and the second feature map to determine how similar they are to each other: if the optical flow were exactly correct, the forward warped first feature map and the second feature map would be identical. However, since a warped image cannot be completely identical to the target image, the optical flow is updated through the correlation value calculation.
- a warping method using optical flow and forward warping will be described later with reference to FIG. 2
- a method for calculating a correlation value using optical flow and forward warping will be described later with reference to FIG. 3C .
- a first importance weight of level 1, which is the lowest level, and a second importance weight of level 1 may be obtained through the flow prediction neural network 140.
- The first importance weight of level 1 indicates how many pixels of the first feature map of level 1 are mapped to one pixel of the second feature map of level 1, and the second importance weight of level 1 indicates how many pixels of the second feature map of level 1 are mapped to one pixel of the first feature map of level 1.
- the initial values of the first importance weight (w 1 0 ) of level 1 and the second importance weight (w 1 1 ) of level 1 are set to zero.
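The importance weights can be interpreted as occupancy counts of the forward warping. The sketch below is an assumption about one possible realisation (not the patented implementation): it counts, for each target pixel, how many source pixels of the other feature map are mapped onto it by the flow; such counts can then be used to weight or normalise the forward warped feature map.

```python
import numpy as np

def importance_weight(flow, height, width):
    """Count how many source pixels land on each target pixel under the flow.

    A count greater than 1 marks a many-to-one mapping; a count of 0 marks a hole.
    """
    count = np.zeros((height, width))
    for y in range(height):
        for x in range(width):
            tx = int(round(x + flow[y, x, 0]))
            ty = int(round(y + flow[y, x, 1]))
            if 0 <= tx < width and 0 <= ty < height:
                count[ty, tx] += 1
    return count
```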
- After the first optical flow (flow 2 0→1) and the second optical flow (flow 2 1→0) of level 2 are obtained, the first importance weight of level 2 and the second importance weight of level 2 are obtained based on the first optical flow (flow 2 0→1) and the second optical flow (flow 2 1→0) of level 2. The first importance weight (w 2 0) of level 2 and the second importance weight (w 2 1) of level 2 may be further used, together with the first optical flow (flow 2 0→1) and the second optical flow (flow 2 1→0) of level 2, to obtain a forward warped first feature map of level 2 and a forward warped second feature map of level 2 for the update.
- The first optical flow (flow 2 0→1) and the second optical flow (flow 2 1→0) of level 2 are updated using the first feature map and the second feature map of level 2, the level above the lowest level, and the updated first optical flow of level 2 and second optical flow of level 2 are upscaled to obtain the first optical flow of level 3, the level above level 2, and the second optical flow of level 3.
- After the first optical flow (flow 3 0→1) and the second optical flow (flow 3 1→0) of level 3 are obtained, the first importance weight of level 3 and the second importance weight of level 3 are obtained based on the first optical flow (flow 3 0→1) and the second optical flow (flow 3 1→0) of level 3. The first importance weight (w 3 0) of level 3 and the second importance weight (w 3 1) of level 3 may be further used, together with the first optical flow (flow 3 0→1) and the second optical flow (flow 3 1→0) of level 3, to obtain a forward warped first feature map of level 3 and a forward warped second feature map of level 3 for the update.
- This process is repeated so that the first optical flow (flow L-1 0→1) of level L-1 and the second optical flow (flow L-1 1→0) of level L-1 are acquired (111) and updated, and then upscaled to level L, the highest level, so that the first optical flow (flow 0→1) of the highest level, corresponding to the first frame, and the second optical flow (flow 1→0) of the highest level, corresponding to the second frame, are determined.
- Because the flow prediction neural network 140 shares its parameters while sequentially updating and upscaling the optical flow over the feature maps of the plurality of levels, more accurate optical flows can be obtained by effectively expanding the receptive field without increasing the number of parameters of the neural network.
- the receptive field represents the size of the input region for generating feature maps.
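The coarse-to-fine structure can be summarised as the loop below (a simplified sketch, not the disclosed network): the same update routine is applied at every level with shared parameters, and the flow is upscaled, with its displacement values doubled, before moving to the next level. Here `refine_flow` is a placeholder for the correlation-based correction sketched with FIGS. 3C and 5 below.

```python
import numpy as np

def upscale_flow(flow):
    """Nearest-neighbour 2x upscaling; the displacement values are doubled too."""
    up = np.repeat(np.repeat(flow, 2, axis=0), 2, axis=1)
    return up * 2.0

def refine_flow(feat_src, feat_dst, flow):
    """Placeholder for the shared update step (141): a real implementation
    forward-warps feat_src with `flow` and corrects the flow from local
    correlation values with feat_dst."""
    return flow

def pyramid_flow(feats_src, feats_dst):
    """feats_*: lists of (H, W, C) feature maps ordered from level 1 (coarsest)
    up to the highest level; returns the flow at the highest level."""
    H, W, _ = feats_src[0].shape
    flow = np.zeros((H, W, 2))            # level-1 flow is initialised to zero
    for level, (fs, fd) in enumerate(zip(feats_src, feats_dst)):
        flow = refine_flow(fs, fd, flow)  # the same update at every level
        if level < len(feats_src) - 1:    # the top-level flow is not upscaled
            flow = upscale_flow(flow)
    return flow
```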
- Through the flow prediction neural network 140, a first final importance weight (w 0) of the highest level corresponding to the first frame and a second final importance weight (w 1) of the highest level corresponding to the second frame may also be obtained.
- The flow prediction neural network 140 may be trained to minimize a loss between the frame that is the target of the input feature map and the frame warped with the final optical flow.
- the obtained first optical flow of the first frame and the second optical flow of the second frame correspond to a bi-directional optical flow between the first frame and the second frame.
- Each intermediate optical flow for a time t of the third frame between the first frame and the second frame is predicted through the intermediate optical flow prediction neural network 155 using the bidirectional optical flow.
- The intermediate optical flows may include a first intermediate optical flow (flow t→0) from the third frame, corresponding to a time t between the first frame and the second frame, to the first frame, and a second intermediate optical flow (flow t→1) from the third frame to the second frame.
- the first importance weight of the first frame and the second importance weight of the second frame may be additionally used in the intermediate optical flow prediction neural network 155 .
- An AI-based frame interpolation filter 175 for the third frame 180 between the first frame and the second frame is obtained through the interpolation filter neural network 170 trained based on the predicted intermediate optical flow.
- A forward warped first frame, a forward warped second frame, a backward warped first frame, and a backward warped second frame can be obtained using the first frame, the second frame, the first intermediate optical flow, and the second intermediate optical flow. Thereafter, the obtained forward warped first frame, forward warped second frame, backward warped first frame, and backward warped second frame are input to the interpolation filter neural network 170 to obtain the AI-based frame interpolation filter 175.
- The first frame and the second frame may be additionally input to the interpolation filter neural network 170 and used to obtain the AI-based frame interpolation filter 175.
- The first intermediate optical flow and the second intermediate optical flow may be additionally input to the interpolation filter neural network 170 and used to obtain the AI-based frame interpolation filter 175.
- The first frame, the second frame, the first intermediate optical flow, and the second intermediate optical flow may be additionally input to the interpolation filter neural network 170 and used to obtain the AI-based frame interpolation filter 175.
- a forward warped first frame and a forward warped second frame may be obtained using the first frame, the second frame, the first intermediate optical flow, and the second intermediate optical flow. Thereafter, the obtained forward warped first frame and forward warped second frame are input to the interpolation filter neural network 170, and an AI-based frame interpolation filter 175 may be obtained.
- The first frame and the second frame may be additionally input to the interpolation filter neural network 170 and used to obtain the AI-based frame interpolation filter 175.
- The first intermediate optical flow and the second intermediate optical flow may be additionally input to the interpolation filter neural network 170 and used to obtain the AI-based frame interpolation filter 175.
- The first frame, the second frame, the first intermediate optical flow, and the second intermediate optical flow may be additionally input to the interpolation filter neural network 170 and used to obtain the AI-based frame interpolation filter 175.
- a backward warped first frame and a backward warped second frame may be obtained using the first frame, the second frame, the first intermediate optical flow, and the second intermediate optical flow. Thereafter, the obtained backward warped first frame and backward warped second frame are input to the interpolation filter neural network 170, and an AI-based frame interpolation filter 175 may be obtained.
- The first frame and the second frame may be additionally input to the interpolation filter neural network 170 and used to obtain the AI-based frame interpolation filter 175.
- The first intermediate optical flow and the second intermediate optical flow may be additionally input to the interpolation filter neural network 170 and used to obtain the AI-based frame interpolation filter 175.
- The first frame, the second frame, the first intermediate optical flow, and the second intermediate optical flow may be additionally input to the interpolation filter neural network 170 and used to obtain the AI-based frame interpolation filter 175.
- The AI-based frame interpolation filter 175 may be obtained by additionally inputting the contextual feature map of the first frame and the contextual feature map of the second frame to the interpolation filter neural network 170.
- The AI-based frame interpolation filter has different filter kernels for the pixels of each of the first frame and the second frame.
- A first intermediate optical flow reversal, representing a flow in the direction opposite to the first intermediate optical flow, and a second intermediate optical flow reversal, representing a flow in the direction opposite to the second intermediate optical flow, may be additionally input to the interpolation filter neural network 170 and used to obtain the AI-based frame interpolation filter 175.
- the third frame 180 between the first frame and the second frame is interpolated using the AI-based frame interpolation filter 175. Since the AI-based frame interpolation filter 175 dynamically determines a filter kernel for each pixel, the third frame 180 may be interpolated more accurately.
- This interpolation method can be applied to fields requiring data generation, such as light-field data synthesis, frame rate up-conversion, and 3D rendering.
- FIG. 2 is a diagram of an example of backward warping and forward warping based on optical flow between frames according to one embodiment.
- Referring to FIG. 2, the first frame or the second frame is warped based on the optical flow 220 from the first frame 210 to the second frame 230, and this is referred to as forward warping or backward warping. Warping the second frame toward the first frame, in the direction opposite to the optical flow 220, is called backward warping, and warping the first frame toward the second frame, in the direction of the optical flow 220, is called forward warping.
- In the backward warped image, occlusion areas may overlap as a result of the warping, whereas in the forward warped image 255, the pixel values of occluded areas are set to 0 as a result of the warping, so that a hole area may appear. With backward warping, it is difficult to correct the flow by calculating correlation values because the occlusion areas appear overlapped; in contrast, forward warping is suitable for correcting the flow by calculating correlation values because the matched area remains a single area and the occluded pixels appear as a hole area.
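For contrast with the forward-warping sketch above, a backward (gather) warp reads a source pixel for every output pixel, so no holes appear but occluded content is duplicated. The example below is a minimal nearest-neighbour sketch, assuming the flow is given from the output frame toward the frame being sampled.

```python
import numpy as np

def backward_warp(frame, flow):
    """Gather: each output pixel samples `frame` at (x + dx, y + dy).

    Every output pixel receives a value, so occluded regions are duplicated
    (overlap) instead of being left as holes.
    """
    H, W, _ = frame.shape
    out = np.zeros_like(frame)
    for y in range(H):
        for x in range(W):
            sx = min(max(int(round(x + flow[y, x, 0])), 0), W - 1)
            sy = min(max(int(round(y + flow[y, x, 1])), 0), H - 1)
            out[y, x] = frame[sy, sx]
    return out
```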
- FIG. 3A is a diagram of an example of a method of calculating a correlation value using a backward warped feature map according to an embodiment.
- FIG. 3B is a diagram of an example of a method of calculating a correlation value using a backward warped feature map according to an embodiment.
- FIG. 3C is a diagram of an example of a method of calculating a correlation value using a forward warped feature map according to an embodiment.
- Referring to FIG. 3A, a correlation value is calculated between a first backward warped feature map 320, obtained by backward warping the first feature map 310, and a second feature map 330. Correlation values with the pixels of the second feature map 330 are calculated within a range of (2r+1)×(2r+1) centered on each pixel in the first backward warped feature map 320, and the pixel having the highest correlation value is found. Because the computational complexity increases quadratically as the resolution increases, the correlation value is calculated only within a specific range.
- Referring to FIG. 3B, a correlation value is calculated between a first backward warped feature map 350, obtained by backward warping within a range of (2r+1)×(2r+1) centered on each pixel in the first feature map 340, and a second feature map 360. Correlation values with the pixels of the second feature map 360 are calculated within a range of (2r+1)×(2r+1) centered on each pixel in the first backward warped feature map 350, and the pixel having the highest correlation value is found.
- Here, r may vary according to the pyramid level or the size of the corresponding feature map. Also, r may vary depending on the performance or amount of hardware resources (e.g., memory) of the frame interpolation device.
- However, because FIGS. 3A and 3B calculate a correlation value using a backward warped feature map, as described above with reference to FIG. 2, blur may occur where occlusion areas overlap due to the backward warping, and this can confuse the correlation value calculation used to correct the flow.
- Referring to FIG. 3C, a correlation value is calculated between a forward warped first feature map 380, obtained by forward warping the first feature map 370, and a second feature map 390. In an embodiment, the correlation value calculation for the flow update is performed according to the method of FIG. 3C.
- the correlation value calculation and flow update method will be described later with reference to FIG. 5 .
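A rough sketch of the correlation search of FIG. 3C (an illustrative example, not the disclosed implementation): for each pixel, the dot product between the forward warped first feature map and the second feature map is evaluated within a (2r+1)×(2r+1) window, and the flow is shifted toward the offset with the highest correlation.

```python
import numpy as np

def correlation_argmax_update(warped_feat1, feat2, flow, r=2):
    """Shift the flow toward the offset with the highest feature correlation
    inside a (2r+1)x(2r+1) window of feat2."""
    H, W, C = feat2.shape
    padded = np.pad(feat2, ((r, r), (r, r), (0, 0)))
    new_flow = flow.copy()
    for y in range(H):
        for x in range(W):
            best, best_off = -np.inf, (0.0, 0.0)
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    corr = warped_feat1[y, x] @ padded[y + r + dy, x + r + dx]
                    if corr > best:
                        best, best_off = corr, (dx, dy)
            new_flow[y, x, 0] += best_off[0]
            new_flow[y, x, 1] += best_off[1]
    return new_flow
```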
- FIG. 4 is a diagram of an example of a filter used to select a pixel candidate to be a calculation target when calculating a correlation value according to an embodiment.
- Referring to FIG. 4, pixels in the second feature map 410 that are targets of the correlation value calculation may be pixels within a range of (2r+1)×(2r+1) centered on one pixel. Rather than calculating correlation values for all of these candidate pixels, the amount of calculation may be reduced by calculating only some pixels within the (2r+1)×(2r+1) range, selected through the geometric filter 420. Accordingly, a correlation value between the filtered second feature map 430 and the forward warped first feature map may be calculated.
- That is, a filter based on a neural network can be used to refine suitable candidates within a large comparison region: if the geometric filter 420, capable of extracting n candidates within the (2r+1)×(2r+1) range, is used, an optimal flow can be corrected with small space complexity.
- N pixels are selected by the geometric filter from the (2r+1)² range of the second feature map, and the correlation value for the filtered second feature map 430 is calculated using only those N pixels within the (2r+1)² range.
- Here, H is the height of the feature map, W is the width of the feature map, and C is the number of channels of the feature map.
- The geometric filter 420 may be preset by a user, or may be acquired through a trained neural network.
- The geometric filter 420 is not only determined to select only some of the pixels within the (2r+1)×(2r+1) range; it may also have a range wider than (2r+1)×(2r+1) and be determined to use (2r+1)×(2r+1) pixels among them.
- Because the geometric filter 420 is additionally used for the flow update in the flow prediction neural network 140 to select the target pixels for the correlation value calculation, memory waste is reduced and the accuracy of flow candidate selection in the flow prediction neural network 140 can increase.
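The effect of the geometric filter 420 can be pictured as a binary mask over the (2r+1)×(2r+1) window that keeps only N candidate offsets; the cross-shaped pattern below is purely illustrative (the actual filter may be user-set or learned, as stated above).

```python
import numpy as np

r = 2
mask = np.zeros((2 * r + 1, 2 * r + 1), dtype=bool)   # geometric filter mask
mask[r, :] = True   # keep horizontal candidates
mask[:, r] = True   # keep vertical candidates

# Offsets (dx, dy) at which the correlation is actually evaluated; all other
# positions in the (2r+1)^2 window are skipped, saving memory and computation.
candidate_offsets = [(dx - r, dy - r)
                     for dy in range(2 * r + 1)
                     for dx in range(2 * r + 1)
                     if mask[dy, dx]]
```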
- FIG. 5 is a diagram of an example of a method of updating an optical flow according to an embodiment.
- an optical flow is updated from a current level to a next higher level using feature maps of a plurality of levels of a pyramid structure.
- The optical flow (f_warp(Flow n-1 1→2)) 510, obtained by up-scaling the flow updated at the lower level to correspond to the current level (n-1), has the coordinates of the pixel with the highest correlation value. Candidate flows are obtained by adding candidate offsets 520 of pixels within a specific range, centered on the coordinates of the pixel having the highest correlation value, to the optical flow (f_warp(Flow n-1 1→2)) 510 (e.g., through the adding operation 570).
- Dot products 550 between pixels within a specific range of the forward warped first feature map (Feat n-1 1→2) 530 of the current level and the second feature map (Feat n-1 2) 540 of the current level are input to the soft argmax function 560 to obtain the location of the pixel having the largest correlation value. That is, the forward warped first feature map 530 of the current level serves as a query and the second feature map 540 of the current level serves as a key, so that the location of the pixel having the largest correlation value is obtained as a result, and an updated flow (f_warp(Flow n-1 1→2)′) may be obtained.
- The optical flow of the higher level may be obtained by up-scaling the updated optical flow of the current level (f_warp(Flow n-1 1→2)′) to correspond to the level n above the current level.
- An optimal optical flow can be predicted while reducing space complexity by calculating a correlation value through comparison within a limited range.
- Here, (H, W, 2) of the optical flow obtained by up-scaling to correspond to the current level after being updated at the lower level represents the height (H), the width (W), and the two components of each pixel's flow (x and y coordinates). (H, W, C, 1) of the forward warped feature map of the current level represents the height (H), the width (W), and the channels (C) of the feature map, and the single pixel currently being updated (1). (H, W, C, (2r+1)²) of the second feature map of the current level represents the height (H), the width (W), and the channels (C) of the feature map, and the range of pixels subject to the correlation value calculation ((2r+1)²).
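The soft argmax update of FIG. 5 can be sketched as below (a simplified example, not the disclosed network): the dot-product correlations over the candidate window are turned into softmax weights, and the flow is corrected by the weighted average of the candidate offsets rather than by a hard argmax.

```python
import numpy as np

def soft_argmax_update(warped_feat1, feat2, flow, r=2):
    """Correct the flow with the softmax-weighted average of candidate offsets,
    weighted by the dot-product correlation of the feature vectors."""
    H, W, C = feat2.shape
    padded = np.pad(feat2, ((r, r), (r, r), (0, 0)))
    offsets = np.array([(dx, dy)
                        for dy in range(-r, r + 1)
                        for dx in range(-r, r + 1)], dtype=float)  # ((2r+1)^2, 2)
    new_flow = flow.copy()
    for y in range(H):
        for x in range(W):
            window = padded[y:y + 2 * r + 1, x:x + 2 * r + 1].reshape(-1, C)
            corr = window @ warped_feat1[y, x]        # one value per candidate
            w = np.exp(corr - corr.max())
            w /= w.sum()                              # softmax over candidates
            new_flow[y, x] += w @ offsets             # soft argmax offset
    return new_flow
```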
- FIG. 6 is a diagram of an example of a process of interpolating a frame through an AI-based frame interpolation filter according to an embodiment.
- a first frame interpolation filter 615 for each pixel of the first frame 610 and a second frame interpolation filter 635 for each pixel of the second frame 630 are generated through an interpolation filter neural network.
- The first frame interpolation filter 615 has different filter coefficients for each pixel of the first frame 610, and the second frame interpolation filter 635 has different filter coefficients for each pixel of the second frame 630.
- Warping the frames inevitably causes deterioration in image quality, and when blending is performed using images warped with an optical flow, afterimages or image quality degradation appears in the resulting image.
- Therefore, a first frame interpolation filter having different filter coefficients for each pixel of the first frame and a second frame interpolation filter having different filter coefficients for each pixel of the second frame are obtained through a trained interpolation filter neural network, and the third frame is interpolated using the first frame interpolation filter 615 and the second frame interpolation filter 635. In this way, the warping mismatch between frames is compensated, correcting flow and brightness and improving the accuracy of the interpolated frame. Accordingly, this may help improve restoration performance and accuracy for high-definition video such as 4K.
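A per-pixel interpolation filter can be applied as in the sketch below, a minimal example under the assumption of one (2k+1)×(2k+1) kernel per pixel and per input frame (the actual kernels are predicted by the interpolation filter neural network): each output pixel is the sum of two locally filtered patches, one from each input frame.

```python
import numpy as np

def apply_per_pixel_filters(frame1, frame2, kernels1, kernels2, k=2):
    """Blend two frames with a separate (2k+1)x(2k+1) kernel per pixel.

    kernels1, kernels2: (H, W, 2k+1, 2k+1) filter coefficients for each pixel
    of frame1 and frame2, respectively.
    """
    H, W, C = frame1.shape
    p1 = np.pad(frame1, ((k, k), (k, k), (0, 0)))
    p2 = np.pad(frame2, ((k, k), (k, k), (0, 0)))
    out = np.zeros_like(frame1)
    for y in range(H):
        for x in range(W):
            patch1 = p1[y:y + 2 * k + 1, x:x + 2 * k + 1]   # (2k+1, 2k+1, C)
            patch2 = p2[y:y + 2 * k + 1, x:x + 2 * k + 1]
            out[y, x] = (np.tensordot(kernels1[y, x], patch1, axes=2)
                         + np.tensordot(kernels2[y, x], patch2, axes=2))
    return out
```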
- FIG. 7 is a diagram of an example of a frame interpolation filter based on AI according to an embodiment.
- Referring to FIG. 7, the AI-based frame interpolation filter obtained through the interpolation filter neural network includes a transform kernel 710 for warping and a transform kernel 720 for occlusion, which are derived using the learned parameters of the interpolation filter neural network.
- The AI-based frame interpolation filter may additionally include a bilinear interpolation kernel 730 and an attention kernel 740. The bilinear interpolation kernel 730 is a transform kernel for sub-pixel calculation, and the attention kernel 740 is a kernel calculated based on previously known information, such as the time t of the third frame, depth map information when the first frame and the second frame include depth map information, and geometry information of the first frame and the second frame.
- the attention kernel 740 may be derived through a trained neural network.
- the bilinear interpolation kernel 730 may be determined by the predicted flow, and the attention kernel 740 may use a weight for each kernel position, for example, a Gaussian weight.
- The transform kernel 710 for warping and the transform kernel 720 for occlusion may be learned and output not by one interpolation filter neural network but by separate neural networks, one for the transform kernel 710 for warping and one for the transform kernel 720 for occlusion.
- The neural network for the transform kernel 710 for warping may be trained mainly using input data for the optical flow, and the neural network for the transform kernel 720 for occlusion may be trained mainly using the importance weights.
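Two of the auxiliary kernels mentioned above can be written down directly. The sketch below shows a normalised Gaussian weight per kernel position, as one possible form of the attention kernel 740, and the 2×2 bilinear weights for a sub-pixel offset, as used by the bilinear interpolation kernel 730; the sizes and the Gaussian parameterisation are assumptions for illustration.

```python
import numpy as np

def gaussian_attention_kernel(k=2, sigma=1.0):
    """(2k+1)x(2k+1) Gaussian weight per kernel position, normalised to sum to 1."""
    ys, xs = np.mgrid[-k:k + 1, -k:k + 1]
    g = np.exp(-(xs ** 2 + ys ** 2) / (2.0 * sigma ** 2))
    return g / g.sum()

def bilinear_kernel(dx, dy):
    """2x2 bilinear weights for a sub-pixel offset (dx, dy) in [0, 1)."""
    return np.array([[(1 - dx) * (1 - dy), dx * (1 - dy)],
                     [(1 - dx) * dy,       dx * dy]])
```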
- FIG. 8 is a flowchart of a frame interpolation method based on AI according to an embodiment.
- In step S810, the AI-based frame interpolation apparatus 900 obtains first feature maps of a plurality of levels for a first frame and second feature maps of a plurality of levels for a second frame, among successive frames of an image.
- In an embodiment, a first feature map of the first frame corresponding to the top level and a second feature map of the second frame corresponding to the top level are obtained through a first neural network, the first feature maps and second feature maps of levels below the top level are obtained through a downsampling neural network, and the first feature maps of the plurality of levels and the second feature maps of the plurality of levels include the feature maps of the top level and the feature maps of the levels below the top level.
- In step S820, the AI-based frame interpolation apparatus 900 obtains, through the flow prediction neural network, a first optical flow from the first feature map of a predetermined level to the second feature map of the predetermined level and a second optical flow from the second feature map of the predetermined level to the first feature map of the predetermined level.
- In step S830, the AI-based frame interpolation apparatus 900 forward warps the first feature map using the first optical flow to obtain a forward warped first feature map, and forward warps the second feature map using the second optical flow to obtain a forward warped second feature map.
- In step S840, the AI-based frame interpolation apparatus 900 updates the first optical flow using the forward warped first feature map, and updates the second optical flow using the forward warped second feature map.
- In step S850, the AI-based frame interpolation apparatus 900 obtains the first optical flow of the higher level by up-scaling the updated first optical flow to correspond to the level above the predetermined level, and obtains the second optical flow of the higher level by up-scaling the updated second optical flow to correspond to the higher level.
- the upper level is the highest level, and the highest level may be a level corresponding to the first frame and the second frame.
- the first optical flow of the highest level and the second optical flow of the highest level may not be upscaled after being updated.
- That is, a forward warped first feature map is obtained by forward warping the first feature map of the highest level using the first optical flow of the highest level, and a forward warped second feature map is obtained by forward warping the second feature map of the highest level using the second optical flow of the highest level. The first optical flow of the highest level is updated using the forward warped first feature map, the second optical flow of the highest level is updated using the forward warped second feature map, and the updated first optical flow of the highest level and second optical flow of the highest level may be used without being upscaled.
- In an embodiment, the forward warped first feature map may be obtained by additionally using a first importance weight of the predetermined level, and the forward warped second feature map may be obtained by additionally using a second importance weight of the predetermined level.
- a first importance weight of a higher level may be obtained based on a first optical flow of a higher level
- a second importance weight of a higher level may be obtained based on a second optical flow of a higher level.
- In an embodiment, the determining, through the interpolation filter neural network, of an AI-based frame interpolation filter for the third frame using the obtained first optical flow of the higher level and the obtained second optical flow of the higher level includes: obtaining a first intermediate optical flow from the third frame to the first frame and a second intermediate optical flow from the third frame to the second frame through an intermediate flow prediction neural network, based on the first optical flow of the higher level, the second optical flow of the higher level, the first importance weight of the higher level, and the second importance weight of the higher level; obtaining, based on the first intermediate optical flow and the second intermediate optical flow, a first frame forward warped with respect to a time t of the third frame, a second frame forward warped with respect to the time t, a first frame backward warped with respect to the time t, and a second frame backward warped with respect to the time t; and determining the AI-based frame interpolation filter for the third frame through the interpolation filter neural network based on the forward warped first frame, the forward warped second frame, the backward warped first frame, and the backward warped second frame.
- In another embodiment, the determining of the AI-based frame interpolation filter for the third frame includes: obtaining the first intermediate optical flow from the third frame to the first frame and the second intermediate optical flow from the third frame to the second frame through the intermediate flow prediction neural network, based on the first optical flow of the higher level, the second optical flow of the higher level, the first importance weight of the higher level, and the second importance weight of the higher level; obtaining a first frame forward warped with respect to the time t of the third frame and a second frame forward warped with respect to the time t, based on the first intermediate optical flow and the second intermediate optical flow; and determining the AI-based frame interpolation filter for the third frame through the interpolation filter neural network based on the forward warped first frame and the forward warped second frame.
- In another embodiment, the determining of the AI-based frame interpolation filter for the third frame includes: obtaining the first intermediate optical flow from the third frame to the first frame and the second intermediate optical flow from the third frame to the second frame through the intermediate flow prediction neural network, based on the first optical flow of the higher level, the second optical flow of the higher level, the first importance weight of the higher level, and the second importance weight of the higher level; obtaining a first frame backward warped with respect to the time t of the third frame and a second frame backward warped with respect to the time t, based on the first intermediate optical flow and the second intermediate optical flow; and determining the AI-based frame interpolation filter for the third frame through the interpolation filter neural network based on the backward warped first frame and the backward warped second frame.
- In an embodiment, the first optical flow of the predetermined level may be updated based on the first correlation value between the forward warped first feature map and the second feature map of the predetermined level, and the second optical flow of the predetermined level may be updated based on the second correlation value between the forward warped second feature map and the first feature map of the predetermined level.
- In an embodiment, the first optical flow of the predetermined level may be updated based on the first correlation value and candidate pixels within a predetermined range of the forward warped first feature map of the predetermined level, and the second optical flow of the predetermined level may be updated based on the second correlation value and candidate pixels within a predetermined range of the forward warped second feature map of the predetermined level.
- the predetermined range may vary according to the size of a feature map of a predetermined level.
- The predetermined range is a range of radius r centered on the pixel that is the target of the correlation value calculation in the feature map; if the coordinates of the target pixel are (x, y), the range covers the pixels (x', y') with x-r ≤ x' ≤ x+r and y-r ≤ y' ≤ y+r. The size of the radius r may vary depending on the size of the feature map of the predetermined level.
- Pixels used for calculating the first correlation value may be determined by a filter set by a user, and pixels used for calculating the second correlation value may be determined by a filter set by the user.
- Alternatively, pixels used for calculating the first correlation value may be determined based on a trained neural network, and pixels used for calculating the second correlation value may be determined based on a trained neural network.
- In an embodiment, the highest correlation value among correlation values with pixels within a predetermined range of the second feature map of the predetermined level is determined as the first correlation value, and the highest correlation value among correlation values with pixels within a predetermined range of the first feature map of the predetermined level is determined as the second correlation value.
- the first optical flow and the second optical flow initially obtained at the lowest level among the plurality of levels may be set to zero.
- In step S860, the AI-based frame interpolation apparatus 900 determines, through the interpolation filter neural network, an AI-based frame interpolation filter for a third frame between the first frame and the second frame, using the obtained first optical flow of the higher level and the obtained second optical flow of the higher level.
- the AI-based frame interpolation filter may include one filter kernel corresponding to each of pixels in the first frame and the second frame.
- In an embodiment, contextual feature maps of the first frame and the second frame are additionally input to the interpolation filter neural network, and each contextual feature map may be determined as the sum of an output value of the second neural network that takes the corresponding frame as an input and an output value of a predetermined classification network that takes the first frame and the second frame as inputs.
- In an embodiment, the predetermined classification network may be VGG16, one of the VGGNet structures developed by the VGG research team at Oxford University, or ResNet.
- The output value of the predetermined classification network may be any one of an output value of a final layer of the network, an output value of an intermediate layer of the network, an output value of some layers of the network, and an output value of the intermediate layer or the final layer.
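As one hypothetical way to form such a contextual feature map (the layer cut-off, channel width, and use of untrained weights below are assumptions for illustration, not the disclosed configuration), the output of a small "second" network can be summed with early-layer features of a classification network such as VGG16:

```python
import torch
import torch.nn as nn
import torchvision

# Hypothetical "second neural network": a small conv stack with 64 output channels.
second_net = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
)

# Early layers of VGG16 as the classification-network branch (64 channels at
# full resolution); in practice pretrained weights would normally be loaded.
vgg_features = torchvision.models.vgg16().features[:4]

def contextual_feature_map(frame):
    """Sum of the two branches, following the description above."""
    with torch.no_grad():
        return second_net(frame) + vgg_features(frame)

frame = torch.rand(1, 3, 64, 64)      # dummy frame, N x C x H x W
ctx = contextual_feature_map(frame)   # shape (1, 64, 64, 64)
```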
- the AI-based frame interpolation filter may further include a filter kernel for bilinear interpolation used for sub-pixel computation.
- the AI-based frame interpolation filter may further include a filter kernel based on at least one of a time of the third frame and a Z-map.
- the depth information of the first frame and the depth information of the second frame may be additionally input to the interpolation filter neural network.
- In step S870, the AI-based frame interpolation apparatus 900 obtains the third frame by using the first frame, the second frame, and the AI-based frame interpolation filter.
- the AI-based frame interpolation filter may include a first frame interpolation filter applied to a first frame and a second frame interpolation filter applied to a second frame.
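- a minimal sketch of this step (Python/PyTorch; the kernel size k, the assumption that each per-pixel kernel is already normalized, and the simple addition of the two filtered frames are illustrative choices rather than the exact blending of the disclosure) is:

```python
import torch
import torch.nn.functional as F

def apply_local_kernels(frame, kernels, k=5):
    """Apply one k x k filter kernel per pixel.
    frame: (N, C, H, W); kernels: (N, k*k, H, W)."""
    N, C, H, W = frame.shape
    patches = F.unfold(frame, kernel_size=k, padding=k // 2)   # (N, C*k*k, H*W)
    patches = patches.view(N, C, k * k, H, W)
    return (patches * kernels.unsqueeze(1)).sum(dim=2)         # (N, C, H, W)

def synthesize_third_frame(frame1, frame2, kernels1, kernels2, k=5):
    """Blend the per-pixel-filtered first and second frames; the two kernel
    sets are assumed to jointly sum to 1 so the outputs are simply added."""
    return apply_local_kernels(frame1, kernels1, k) + apply_local_kernels(frame2, kernels2, k)
```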
- FIG. 9 is a diagram illustrating a configuration of a frame interpolation device based on AI according to an embodiment.
- the AI-based frame interpolation device 900 includes a feature map acquisition unit 910, an optical flow acquisition unit 920, a forward warping feature map acquisition unit 930, an optical flow update unit 940, an optical flow upscaling unit 950, an interpolation filter acquisition unit 960, and a frame acquisition unit 970.
- the feature map acquisition unit 910, the forward warping feature map acquisition unit 930, the optical flow update unit 940, the optical flow upscaling unit 950, the interpolation filter acquisition unit 960, and the frame acquisition unit 970 may be implemented as a processor, and may operate according to instructions stored in a memory (not shown).
- although FIG. 9 shows the feature map acquisition unit 910, the forward warping feature map acquisition unit 930, the optical flow update unit 940, the optical flow upscaling unit 950, the interpolation filter acquisition unit 960, and the frame acquisition unit 970 as separate elements, these units may be implemented through one processor.
- the feature map acquisition unit 910, the forward warping feature map acquisition unit 930, the optical flow update unit 940, the optical flow upscaling unit 950, the interpolation filter acquisition unit 960, and the frame acquisition unit 970 may be implemented as a dedicated processor, or through a combination of software and a general-purpose processor such as an application processor (AP), a central processing unit (CPU), or a graphics processing unit (GPU).
- a dedicated processor may include a memory for implementing an embodiment of the present disclosure or a memory processing unit for using an external memory.
- the feature map acquisition unit 910, the forward warping feature map acquisition unit 930, the optical flow update unit 940, the optical flow upscaling unit 950, the interpolation filter acquisition unit 960, and the frame acquisition unit 970 may also be configured as a plurality of processors. In this case, they may be implemented as a combination of dedicated processors, or through a combination of software and a plurality of general-purpose processors such as APs, CPUs, or GPUs.
- the feature map acquisition unit 910 obtains first feature maps of a plurality of levels for a first frame and second feature maps of a plurality of levels for a second frame, among successive frames of an image.
- the optical flow acquisition unit 920 obtains, through a flow prediction neural network, a first optical flow from a first feature map of a predetermined level to a second feature map and a second optical flow from the second feature map of the predetermined level to the first feature map.
- the forward warping feature map acquisition unit 930 obtains a forward warped first feature map by forward warping the first feature map using the first optical flow, and obtains a forward warped second feature map by forward warping the second feature map using the second optical flow.
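- as an illustration of forward warping, the following sketch (Python/PyTorch) splats each source feature to its flow target by nearest-neighbour accumulation; the disclosed method additionally uses importance weights and may splat sub-pixel positions bilinearly, so this shows only the basic idea:

```python
import torch

def forward_warp(feat, flow):
    """Forward warp by nearest-neighbour splatting.
    feat: (C, H, W); flow: (2, H, W) with flow[0] = dx and flow[1] = dy."""
    C, H, W = feat.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    tx = (xs + flow[0]).round().long().clamp(0, W - 1)    # target x of each source pixel
    ty = (ys + flow[1]).round().long().clamp(0, H - 1)    # target y of each source pixel
    out = torch.zeros_like(feat)
    idx = (ty * W + tx).view(-1)                          # flattened target positions
    out.view(C, -1).index_add_(1, idx, feat.view(C, -1))  # accumulate (splat) the features
    return out
```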
- the optical flow updater 940 updates the first optical flow using the forward warped first feature map and updates the second optical flow using the forward warped second feature map.
- the optical flow upscaling unit 950 obtains a first optical flow of a higher level by upscaling the updated first optical flow to correspond to a level higher than the predetermined level, and obtains a second optical flow of the higher level by upscaling the updated second optical flow to correspond to the higher level.
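- a minimal sketch of such upscaling (Python/PyTorch; bilinear resizing with the motion vectors rescaled by the same factor is an assumption about the exact resampling used) is:

```python
import torch.nn.functional as F

def upscale_flow(flow, scale=2):
    """Resize a (2, H, W) flow field to the higher level and scale the motion
    vectors by the same factor so they remain in pixel units."""
    up = F.interpolate(flow.unsqueeze(0), scale_factor=scale,
                       mode="bilinear", align_corners=False)
    return up.squeeze(0) * scale
```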
- the interpolation filter acquisition unit 960 determines, through an interpolation filter neural network, an AI-based frame interpolation filter for a third frame between the first frame and the second frame by using the first optical flow of the higher level and the second optical flow of the higher level.
- the frame acquisition unit 970 obtains a third frame by using the first frame, the second frame, and an AI-based frame interpolation filter.
- the above-described embodiments of the present disclosure can be written as a program that can be executed on a computer, and the written program can be stored in a storage medium readable by a device.
- the device-readable storage medium may be provided in the form of a non-transitory storage medium.
- the term 'non-transitory storage medium' only means that the medium is a tangible device and does not contain a signal (e.g., an electromagnetic wave); the term does not distinguish between a case where data is stored semi-permanently in the storage medium and a case where data is stored temporarily.
- for example, a 'non-transitory storage medium' may include a buffer in which data is temporarily stored.
- the method according to various embodiments disclosed in this document may be provided by being included in a computer program product.
- Computer program products may be traded between sellers and buyers as commodities.
- a computer program product may be distributed in the form of a device-readable storage medium (e.g., a compact disc read-only memory (CD-ROM)), or may be distributed (e.g., downloaded or uploaded) online through an application store or directly between two user devices (e.g., smartphones).
- in the case of online distribution, at least a part of the computer program product (e.g., a downloadable app) may be temporarily stored in, or temporarily created in, a device-readable storage medium such as a memory of a manufacturer's server, an application store server, or a relay server.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Graphics (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
Description
Claims (15)
- An AI-based frame interpolation method comprising: obtaining, from among successive frames of an image, feature maps of a plurality of levels for a first frame and feature maps of the plurality of levels for a second frame; obtaining, through a flow prediction neural network, a first optical flow from a first feature map of a predetermined level to a second feature map and a second optical flow from the second feature map of the predetermined level to the first feature map; obtaining a forward warped first feature map by forward warping the first feature map using the first optical flow, and obtaining a forward warped second feature map by forward warping the second feature map using the second optical flow; updating the first optical flow using the forward warped first feature map, and updating the second optical flow using the forward warped second feature map; obtaining a first optical flow of a higher level by upscaling the updated first optical flow to correspond to the higher level, which is higher than the predetermined level, and obtaining a second optical flow of the higher level by upscaling the updated second optical flow to correspond to the higher level; determining, through an interpolation filter neural network, an AI-based frame interpolation filter for a third frame between the first frame and the second frame by using the obtained first optical flow of the higher level and the obtained second optical flow of the higher level; and obtaining the third frame by using the first frame, the second frame, and the AI-based frame interpolation filter.
- The method of claim 1, wherein the higher level is a highest level among the plurality of levels, and the highest level is a level corresponding to the first frame and the second frame.
- The method of claim 1, wherein a first feature map of the first frame corresponding to a highest level among the plurality of levels and a second feature map of the second frame corresponding to the highest level are obtained through a first neural network, first feature maps of levels below the highest level and second feature maps of the levels below the highest level are obtained through a downsampling neural network, and the feature maps of the plurality of levels for the first frame and the feature maps of the plurality of levels for the second frame are the first feature maps of the levels below the highest level and the second feature maps of the levels below the highest level.
- The method of claim 1, wherein obtaining the first optical flow of the higher level and the second optical flow of the higher level further comprises: obtaining a first importance weight of the predetermined level, the first importance weight indicating a number of pixels of the first feature map of the predetermined level that are mapped to one pixel of the second feature map of the predetermined level; and obtaining a second importance weight of the predetermined level, the second importance weight indicating a number of pixels of the second feature map of the predetermined level that are mapped to one pixel of the first feature map of the predetermined level.
- The method of claim 4, wherein the forward warped first feature map is obtained by additionally using the first importance weight of the predetermined level, and the forward warped second feature map is obtained by additionally using the second importance weight of the predetermined level.
- The method of claim 4, wherein a first importance weight of the higher level is obtained based on the first optical flow of the higher level, and a second importance weight of the higher level is obtained based on the second optical flow of the higher level.
- The method of claim 4, wherein determining the AI-based frame interpolation filter for the third frame further comprises: obtaining, through an intermediate flow prediction neural network, a first intermediate optical flow from the third frame to the first frame and a second intermediate optical flow from the third frame to the second frame, based on the first optical flow of the higher level, the second optical flow of the higher level, the first importance weight of the higher level, and the second importance weight of the higher level; obtaining, by using the first intermediate optical flow and the second intermediate optical flow, a forward warped first frame based on a time t of the third frame, a forward warped second frame based on the time t, a backward warped first frame based on the time t, and a backward warped second frame based on the time t; and determining, through the interpolation filter neural network, the AI-based frame interpolation filter for the third frame based on the forward warped first frame, the forward warped second frame, the backward warped first frame, and the backward warped second frame.
- The method of claim 1, wherein the first optical flow of the predetermined level is updated based on a first correlation value between the forward warped first feature map and the second feature map of the predetermined level, and the second optical flow of the predetermined level is updated based on a second correlation value between the forward warped second feature map and the first feature map of the predetermined level.
- The method of claim 8, wherein the first optical flow of the predetermined level is updated based on candidate pixels within a predetermined range of the first optical flow of the predetermined level, and the second optical flow of the predetermined level is updated based on candidate pixels within a predetermined range of the second optical flow of the predetermined level.
- The method of claim 9, wherein the predetermined range varies according to a size of a feature map of the predetermined level.
- The method of claim 1, wherein a first optical flow and a second optical flow initially obtained at a lowest level among the plurality of levels are set to 0.
- The method of claim 1, wherein the AI-based frame interpolation filter includes one filter kernel corresponding to each of the pixels in the first frame and the second frame.
- The method of claim 12, wherein, to determine the AI-based frame interpolation filter, contextual feature maps of the first frame and the second frame are additionally input to the interpolation filter neural network, and the contextual feature maps are determined as a sum of an output value of a second neural network that takes the first frame and the second frame as inputs and an output value of a predetermined classification network that takes the first frame and the second frame as inputs.
- The method of claim 1, wherein the AI-based frame interpolation filter includes a first frame interpolation filter applied to the first frame and a second frame interpolation filter applied to the second frame.
- An AI-based frame interpolation apparatus comprising: a memory; and a processor, wherein the processor is configured to perform: obtaining, from among successive frames of an image, feature maps of a plurality of levels for a first frame and feature maps of the plurality of levels for a second frame; obtaining, through a flow prediction neural network, a first optical flow from a first feature map of a predetermined level to a second feature map and a second optical flow from the second feature map of the predetermined level to the first feature map; obtaining a forward warped first feature map by forward warping the first feature map using the first optical flow, and obtaining a forward warped second feature map by forward warping the second feature map using the second optical flow; updating the first optical flow using the forward warped first feature map, and updating the second optical flow using the forward warped second feature map; obtaining a first optical flow of a higher level by upscaling the updated first optical flow to correspond to the higher level, which is higher than the predetermined level, and obtaining a second optical flow of the higher level by upscaling the updated second optical flow to correspond to the higher level; determining, through an interpolation filter neural network, an AI-based frame interpolation filter for a third frame between the first frame and the second frame by using the obtained first optical flow of the higher level and the obtained second optical flow of the higher level; and obtaining the third frame by using the first frame, the second frame, and the AI-based frame interpolation filter.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202280037578.7A CN117377974A (zh) | 2021-05-24 | 2022-05-18 | AI-based frame interpolation method and device |
EP22811565.5A EP4318376A4 (en) | 2021-05-24 | 2022-05-18 | AI-BASED FRAME INTERPOLATION METHOD AND DEVICE |
US17/752,347 US20220375030A1 (en) | 2021-05-24 | 2022-05-24 | Method and apparatus for interpolating frame based on artificial intelligence |
Applications Claiming Priority (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10-2021-0066498 | 2021-05-24 | ||
KR20210066498 | 2021-05-24 | ||
KR20210108356 | 2021-08-17 | ||
KR10-2021-0108356 | 2021-08-17 | ||
KR1020220019101A KR20220158598A (ko) | 2021-05-24 | 2022-02-14 | AI-based frame interpolation method and apparatus |
KR10-2022-0019101 | 2022-02-14 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/752,347 Continuation US20220375030A1 (en) | 2021-05-24 | 2022-05-24 | Method and apparatus for interpolating frame based on artificial intelligence |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022250372A1 true WO2022250372A1 (ko) | 2022-12-01 |
Family
ID=84229318
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2022/007140 WO2022250372A1 (ko) | 2021-05-24 | 2022-05-18 | Ai에 기반한 프레임 보간 방법 및 장치 |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2022250372A1 (ko) |
- 2022-05-18 WO PCT/KR2022/007140 patent/WO2022250372A1/ko active Application Filing
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10812825B2 (en) * | 2016-11-14 | 2020-10-20 | Google Llc | Video frame synthesis with deep learning |
US10970600B2 (en) * | 2017-03-08 | 2021-04-06 | Tencent Technology (Shenzhen) Company Limited | Method and apparatus for training neural network model used for image processing, and storage medium |
Non-Patent Citations (3)
Title |
---|
HYEONJUN SIM; JIHYONG OH; MUNCHURL KIM: "XVFI: eXtreme Video Frame Interpolation", arXiv.org, Cornell University Library, 30 March 2021 (2021-03-30), XP081919485 *
SIMON NIKLAUS; FENG LIU: "Softmax Splatting for Video Frame Interpolation", arXiv.org, Cornell University Library, 11 March 2020 (2020-03-11), XP081619671 *
ZACHARY TEED; JIA DENG: "RAFT: Recurrent All-Pairs Field Transforms for Optical Flow", arXiv.org, Cornell University Library, 26 March 2020 (2020-03-26), XP081628903 *
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
- WO2024122856A1 (ko) * | 2022-12-05 | 2024-06-13 | Samsung Electronics Co., Ltd. | Electronic device and control method therefor |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021080158A1 (en) | Image processing method, apparatus, electronic device and computer readable storage medium | |
AU2018319215B2 (en) | Electronic apparatus and control method thereof | |
US8170376B2 (en) | Super-resolution device and method | |
- JPS6163893A (ja) | Method for displaying pseudo-halftone images on a display device | |
- WO2021085757A1 (ko) | Video frame interpolation method robust to exceptional motion, and apparatus therefor | |
- JP4564564B2 (ja) | Moving image playback device, moving image playback method, and moving image playback program | |
- WO2019132091A1 (ko) | Method for removing noise from upscaled video using machine-learning-based dynamic parameters | |
- WO2020017871A1 (ko) | Image processing device and operation method thereof | |
- WO2020027519A1 (ko) | Image processing device and operation method thereof | |
- WO2022250372A1 (ko) | AI-based frame interpolation method and apparatus | |
EP4367628A1 (en) | Image processing method and related device | |
WO2022075688A1 (en) | Occlusion processing for frame rate conversion using deep learning | |
- CN111553843B (zh) | Image processing method, image processing device, and display device | |
- JPH0522648A (ja) | Image processing device | |
WO2023055033A1 (en) | Method and apparatus for enhancing texture details of images | |
- JP3175914B2 (ja) | Image encoding method and image encoding device | |
- WO2022250402A1 (ko) | Image processing device and operation method thereof | |
US20110109794A1 (en) | Caching structure and apparatus for use in block based video | |
WO2020204287A1 (en) | Display apparatus and image processing method thereof | |
- WO2019135524A1 (ko) | Apparatus and method for determining edge position based on adaptive weighting of gradients | |
- JP2009116730A (ja) | Image processing device and method | |
WO2023085862A1 (en) | Image processing method and related device | |
- WO2021241994A1 (ko) | Method and apparatus for generating a 3D model through tracking of an RGB-D camera | |
- WO2021158039A1 (ko) | Method and apparatus for downscaling an image | |
- WO2024053840A1 (ko) | Image processing device including a neural network model and operation method thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22811565 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2022811565 Country of ref document: EP |
|
ENP | Entry into the national phase |
Ref document number: 2022811565 Country of ref document: EP Effective date: 20231026 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 202280037578.7 Country of ref document: CN |
|
NENP | Non-entry into the national phase |
Ref country code: DE |