WO2021134222A1 - Selective control of conditional filters in resolution-adaptive video coding - Google Patents

Selective control of conditional filters in resolution-adaptive video coding

Info

Publication number
WO2021134222A1
Authority
WO
WIPO (PCT)
Prior art keywords
reference frame
header
interpolation filter
frame
level
Application number
PCT/CN2019/129943
Other languages
French (fr)
Inventor
Yuchen SUN
Tsuishan CHANG
Jian Lou
Original Assignee
Alibaba Group Holding Limited
Application filed by Alibaba Group Holding Limited
Priority to PCT/CN2019/129943
Publication of WO2021134222A1


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/132: Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H04N19/105: Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N19/117: Filters, e.g. for pre-processing or post-processing
    • H04N19/176: Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/182: Adaptive coding characterised by the coding unit, the unit being a pixel
    • H04N19/523: Motion estimation or motion compensation with sub-pixel accuracy
    • H04N19/59: Predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
    • H04N19/70: Characterised by syntax aspects related to video coding, e.g. related to compression standards

Definitions

  • FIGS. 1A, 1B, 1C, and 1D illustrate examples of partitioning of frames and subunits thereof.
  • FIGS. 4A and 4B illustrate a flowchart of a video coding method implementing resolution-adaptive video coding according to example embodiments of the present disclosure.
  • FIGS. 5A through 5C illustrate an example of motion prediction without resizing a reference frame according to example embodiments of the present disclosure.
  • motion prediction coding formats may refer to data formats wherein frames are encoded with motion vector information and prediction information of a frame by the inclusion of one or more references to motion information and prediction units (PUs) of one or more other frames.
  • Motion information may refer to data describing motion of a block structure of a frame or a unit or subunit thereof, such as motion vectors and references to blocks of a current frame or of another frame.
  • PUs may refer to a unit or multiple subunits corresponding to a block structure among multiple block structures of a frame, such as an MB or a CTU, wherein blocks are partitioned based on the frame data and are coded according to established video codecs.
  • Motion information corresponding to a PU may describe motion prediction as encoded by any motion vector coding tool, including, but not limited to, those described herein.
  • a sequence of CTUs of a frame which encompass a rectangular region of a frame may be collectively referred to as a tile.
  • CTUs of a frame may, collectively, make up multiple tiles.
  • Tiles of a same width may make up a column of tiles, and tiles of a same height may make up a row of tiles.
  • a frame may be subdivided into one or more columns of tiles and one or more rows of tiles.
  • Tiles or CTUs may make up slices of a frame.
  • a frame may be subdivided into some number of slices, and the entire frame may be subdivided according to raster-scan slice mode or according to rectangular slice mode.
  • a slice of a frame subdivided according to raster-scan slice mode may include multiple tiles which make up a region shaped not according to geometrically shaped regions but generally according to tiles in a raster scan sequence.
  • a slice according to rectangular slice mode may include multiple tiles, or multiple CTUs of a tile, which make up a complete rectangular region. In either case, the multiple tiles may be any number of complete, contiguous tiles of the picture, not necessarily encompassing entire columns of tiles or entire rows of tiles.
  • Slices may make up a subpicture of a frame.
  • a frame may be subdivided into some number of subpictures, each of which may be a rectangular region of the frame, composed of some number of slices of the frame.
  • Slices making up a subpicture may be according to raster-scan slice mode and/or rectangular slice mode as long as the slices collectively make up a rectangular region.
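To make the tile and CTU geometry above concrete, here is a minimal sketch that maps a CTU position to its containing tile, assuming a uniform grid described by cumulative CTU offsets; the function name and data layout are illustrative, not taken from any standard's reference software.

```python
# Sketch: locate the tile containing a CTU, given tile column/row boundaries
# expressed in CTU units. All names here are illustrative.

def tile_of_ctu(ctu_x, ctu_y, col_bounds, row_bounds):
    """col_bounds/row_bounds are cumulative CTU offsets of tile edges,
    e.g. col_bounds = [0, 4, 8, 12] describes three tile columns 4 CTUs wide."""
    col = max(i for i, b in enumerate(col_bounds) if b <= ctu_x)
    row = max(i for i, b in enumerate(row_bounds) if b <= ctu_y)
    return row * (len(col_bounds) - 1) + col  # raster-scan tile index

# A frame 12 CTUs wide and 6 CTUs tall split into a 3x2 grid of tiles:
print(tile_of_ctu(5, 4, [0, 4, 8, 12], [0, 3, 6]))  # -> 4 (middle tile, bottom row)
```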
  • FIGS. 1A, 1B, 1C, and 1D illustrate examples of partitioning of frames and subunits thereof.
  • a frame is subdivided into 12 tiles, each outlined in non-bold lines; each tile is subdivided into a further number of CTUs, each outlined in broken lines, where the exact number of CTUs illustrated need not be the actual number of CTUs making up each tile.
  • the frame is also subdivided into three slices, each slice according to raster-scan slice mode; picture data of the frame is completely encompassed by the three slices, each outlined in bold lines.
  • Each of the slices is made up of a number of complete tiles.
  • the three slices are alternatingly shaded merely to distinguish adjacent slices, and there is no particular distinction between shaded slices and non-shaded slices.
  • a frame is subdivided into 24 tiles, each outlined in non-bold lines; each tile is subdivided into a further number of CTUs, each outlined in broken lines, where the exact number of CTUs illustrated need not be the actual number of CTUs making up each tile.
  • the tiles of the frame further make up nine slices, each slice according to rectangular slice mode; picture data of the frame is completely encompassed by the nine slices, each outlined in bold lines.
  • Each of the slices is made up of a number of complete tiles.
  • the nine slices are alternatingly shaded merely to distinguish adjacent slices, and there is no particular distinction between shaded slices and non-shaded slices.
  • a frame is subdivided into four tiles, each outlined in non-bold lines (though some of the non-bold lines are obstructed by bold lines, it should be understood that the four tiles are equal in size) ; each tile is subdivided into a further number of CTUs, each outlined in broken lines, where the exact number of CTUs illustrated need not be the actual number of CTUs making up each tile.
  • the tiles of the frame further make up four slices, each slice according to rectangular slice mode; picture data of the frame is completely encompassed by the four slices, each outlined in bold lines. Not all of the slices are made up of complete tiles, and some of the slices are made up of CTUs of a tile making up a complete rectangular region instead of a complete tile.
  • a frame is subdivided into 28 subpictures, not all subpictures having like dimensions or proportions. Slices making up each of these subpictures are not illustrated in further detail, for conciseness, though slices making up subpictures having like dimensions or proportions need not be alike in shape, dimensions or proportions therebetween. Tiles making up the frame are also not illustrated in further detail, for conciseness.
  • a video encoder may obtain a picture from a video source and code the frame to obtain a reconstructed frame that may be output for transmission.
  • Blocks of a reconstructed frame may be intra-coded or inter-coded.
  • FIG. 2 illustrates an example block diagram of a video coding process 200 according to an example embodiment of the present disclosure.
  • a picture from a video source 202 may be encoded to generate a reconstructed frame, and the reconstructed frame may be output at a destination such as a reference frame buffer 204 or a transmission buffer 216.
  • the picture may be input into a coding loop, which may include the steps of inputting the picture into a first in-loop up-sampler or down-sampler 206, generating an up-sampled or down-sampled picture, inputting the up-sampled or down-sampled picture into a video encoder 208, generating a reconstructed frame based on a previous reconstructed frame of the reference frame buffer 204, inputting the reconstructed frame into one or more in-loop filters 210, and outputting the reconstructed frame from the loop, which may include inputting the reconstructed frame into a second up-sampler or down-sampler 214, generating an up-sampled or down-sampled reconstructed frame, and outputting the up-sampled or down-sampled reconstructed frame into the reference frame buffer 204 (this loop is sketched below).
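The following is a minimal sketch of the encoder-side loop just described, assuming each numbered element of FIG. 2 is available as a callable; the function signature and the list-based buffer handling are assumptions for illustration only.

```python
# Sketch of the encoder-side coding loop of FIG. 2. The stage callables
# (resample_206, encode_208, in_loop_filters_210, resample_214) are stand-ins
# for the numbered elements, not an API from any codec implementation.

def encode_picture(picture, target_resolution, ref_buffer,
                   resample_206, encode_208, in_loop_filters_210, resample_214):
    resampled = resample_206(picture, target_resolution)      # first up-/down-sampler
    reference = ref_buffer[-1] if ref_buffer else None        # previous reconstructed frame
    reconstructed = encode_208(resampled, reference)          # video encoder 208
    reconstructed = in_loop_filters_210(reconstructed)        # in-loop filters 210
    ref_buffer.append(resample_214(reconstructed, target_resolution))  # reference frame buffer 204
    return reconstructed                                      # also routed to transmission buffer 216
```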
  • video coding standards such as AVC may not currently support headers at other subunit levels, such as headers at a tile level.
  • persons skilled in the art who wish to implement headers at a tile level may develop similar video coding standards wherein headers specify tile-level parameters. Implementation of such standards shall not be described in detail herein.
  • a coded frame is obtained from a source such as a bitstream 220.
  • a previous frame having position N–1 in the bitstream 220 may have a resolution larger than or smaller than a resolution of current frame
  • a next frame having position N+1 in the bitstream 220 may have a resolution larger than or smaller than the resolution of the current frame.
  • the current frame may be input into a coding loop, which may include the steps of inputting the current frame into a video decoder 222, inputting the current frame into one or more in-loop filters 224, inputting the current frame into a third in-loop up-sampler or down-sampler 228, generating an up-sampled or down-sampled reconstructed frame, and outputting the up-sampled or down-sampled reconstructed frame into the reference frame buffer 204.
  • the current frame may be output from the loop, which may include outputting the up-sampled or down-sampled reconstructed frame into a display buffer.
  • the video encoder 208 and the video decoder 222 may each implement a motion prediction coding format, including, but not limited to, those coding formats described herein.
  • Generating a reconstructed frame based on a previous reconstructed frame of the reference frame buffer 204 may include inter-coded motion prediction as described herein, wherein the previous reconstructed frame may be an up-sampled or down-sampled reconstructed frame output by the in-loop up-sampler or down-sampler 214/228, and the previous reconstructed frame serves as a reference picture in inter-coded motion prediction as described herein.
  • motion prediction information may include a motion vector identifying a predictor block.
  • a motion vector may be a displacement vector representing a displacement between a current block and a predictor block that is referenced for coding of the current block.
  • Displacement may be measured in pixels in a horizontal direction and a vertical direction over a current frame.
  • the displacement vector may represent a displacement between a pixel of the current block and a corresponding pixel of the predictor block at the same positions within the respective blocks.
  • the displacement vector may represent a displacement from a pixel at an upper-left corner of the current block to a pixel at an upper-left corner of the predictor block.
  • Inter-coded motion prediction may add the coordinates of a block of a current frame to a motion vector of the current frame to locate a predictor block.
  • a motion vector may indicate a coordinate of a predictor block of a reference frame from which motion information should be derived for the block of the current frame.
  • the coordinate of the predictor block of the reference frame may be located by adding the motion vector to the coordinate of the block of the current frame, assuming that the current frame and the reference frame have the same resolution such that pixels correspond one-to-one between the current frame and the reference frame.
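As a minimal sketch of this same-resolution case, the predictor coordinate is simply the element-wise sum of the block coordinate and the motion vector; the helper name is illustrative.

```python
# Sketch of the same-resolution case: the predictor block's coordinate is the
# block coordinate plus the motion vector, since pixels correspond one-to-one
# between the current frame and the reference frame.

def locate_predictor(block_xy, motion_vector):
    bx, by = block_xy
    mx, my = motion_vector
    return (bx + mx, by + my)

print(locate_predictor((3, 3), (1, 1)))  # -> (4, 4)
```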
  • motion prediction may support accuracy to an integer pixel scale or to a sub-pixel scale.
  • motion prediction may be accurate to a half-pixel scale, such that an interpolation filter is applied to a frame to interpolate the frame by a factor of 2. That is, between every two pixels of the frame, one pixel is generated as sub-pixel picture information.
  • An interpolation filter by a factor of 2 may be implemented as, for example, a 2-tap bilinear filter.
  • motion prediction may be accurate to a quarter-pixel scale, such that an interpolation filter is applied to a frame to interpolate the frame by a factor of 4.
  • An interpolation filter by a factor of 4 may be implemented as, for example, a 7-tap bilinear filter and an 8-tap Discrete Cosine Transform (DCT) -based finite impulse response (FIR) filter.
  • Interpolation may occur in a first stage wherein interpolation is performed to half-pixel accuracy, such that a first interpolation filter is applied to the frame to interpolate the frame by a factor of 2, and then a second stage wherein interpolation is performed to a quarter-pixel accuracy.
  • Motion prediction accuracy to a sub-pixel scale may increase quality of compressed frames over motion prediction accuracy to an integer pixel scale, but at the cost of increased computational cost and computing time for each pixel interpolated.
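The following is a sketch of the first interpolation stage under the assumption of a 2-tap bilinear filter applied along one row of pixels; real codecs apply separable filters in both dimensions and longer DCT-based taps for the quarter-pixel stage.

```python
# Sketch of half-pixel interpolation with a 2-tap bilinear filter: one new
# sample is generated between every pair of existing samples along a row.
# A quarter-pixel second stage would repeat interpolation on this output.

def bilinear_half_pel(row):
    out = []
    for a, b in zip(row, row[1:]):
        out.append(a)
        out.append((a + b + 1) // 2)  # rounded average = 2-tap bilinear filter
    out.append(row[-1])
    return out

print(bilinear_half_pel([10, 20, 40]))  # -> [10, 15, 20, 30, 40]
```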
  • a first up-sampler or down-sampler 206, a second up-sampler or down-sampler 214, and a third up-sampler or down-sampler 228 may each implement an up-sampling or down-sampling algorithm suitable for respectively at least up-sampling or down-sampling coded pixel information of a frame coded in a motion prediction coding format.
  • a first up-sampler or down-sampler 206, a second up-sampler or down-sampler 214, and a third up-sampler or down-sampler 228 may each implement an up-sampling or down-sampling algorithm further suitable for respectively upscaling and downscaling motion information such as motion vectors.
  • a frame serving as a reference picture in generating a reconstructed frame for the current frame may therefore be up-sampled or down-sampled in accordance with the resolution of the current frame relative to the resolutions of the previous frame and of the next frame.
  • the frame serving as the reference picture may be up-sampled in the case that the current frame has a resolution larger than the resolutions of either or both the previous frame and the next frame.
  • the frame serving as the reference picture may be down-sampled in the case that the current frame has a resolution smaller than either or both the previous frame and the next frame.
  • FIG. 3 illustrates an example 300 of motion prediction by up-sampling a reference frame as described above.
  • a current frame 310 has a resolution three times the resolution of the reference frame 320, such that the current frame 310 has 9 pixels for each pixel of the reference frame 320; the ratio of the resolution of the reference frame 320 to the resolution of the current frame 310 may be 1: 3.
  • Given a block 312 of the current frame 310 having coordinates (3, 3) (i.e., an upper-leftmost pixel of the block 312 has pixel coordinates (3, 3)) and a motion vector (1, 1) of the block 312, adding the motion vector to the coordinates of the block 312 yields the coordinates (4, 4).
  • the motion vector indicates a predictor block having coordinates at (4, 4) .
  • the coordinates (4, 4) may be applied directly to the up-scaled reference frame 330, and a predictor block 332 at (4, 4) in the up-scaled reference frame 330 may be used in motion prediction for the current frame 310.
  • up-sampling a reference frame utilizes interpolation filters to generate additional sub-pixel picture information between pixels of the original reference frame, to fill in the additional pixels of the up-sampled reference frame so that pixels of the reference frame correspond one-to-one to pixels of the current frame.
  • references to the reference frame are generally to particular predictor blocks of the reference frame pointed to by motion vectors of a current frame.
  • applying an interpolation filter to a reference frame may cause computation of many pixels that ultimately do not contribute to the video decoding process.
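A rough, purely illustrative calculation of that waste: interpolating an entire up-sampled reference frame generates far more pixels than the referenced predictor blocks actually read. All numbers below are assumptions chosen for illustration.

```python
# Sketch of why whole-frame interpolation wastes work: only pixels inside
# referenced predictor blocks are ever read, so interpolating everything else
# is discarded effort. Numbers are illustrative only.

ref_w, ref_h, scale = 640, 360, 3
interpolated_pixels = ref_w * scale * ref_h * scale   # full up-sampled frame
used_pixels = 1200 * (16 * 16)                        # e.g. 1200 referenced 16x16 blocks
print(interpolated_pixels, used_pixels, used_pixels / interpolated_pixels)
# -> 2073600 307200 ~0.15: here roughly 85% of interpolated pixels go unused
```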
  • example systems performing the above-described operations may vary in processing power, battery charge, and otherwise computational capacity utilized to perform such operations.
  • an example system may be a mobile device having comparatively low processing power and limited battery capacity and not being recharged while a user of the mobile device plays a video on the mobile device.
  • the mobile device may play the video while the battery charge is not full, or while the battery charge is low.
  • video decoding is generally computationally intense, the battery charge may drain quickly as a consequence of applying an interpolation filter to a reference frame during motion prediction in such computing environments.
  • a reference frame having a sub-pixel interpolation filter applied thereto still provides an approximation of motion information at sub-pixel accuracy; when motion information from a reference frame not interpolated at the sub-pixel level is used, the approximation may become less reliable, resulting in loss in picture information.
  • making determinations as to whether to decrease application of an interpolation filter may increase computation time of the motion prediction process by causing a video decoder to perform more operations per frame reconstructed.
  • consistently decreasing application of an interpolation filter may increase latency of video playback to undesirable length. Therefore, it may be desirable to selectively decrease application of an interpolation filter, allowing for decreases while decreases are desirable and no decreases while decreases are not desirable, according to at least the computing environments as described above.
  • an example system may be a computing device having comparatively high processing power and/or a continual power source rather than a limited battery.
  • Computational intensity of video decoding and applying an interpolation filter to a reference frame during motion prediction may be comparatively tolerable in such computing environments. Therefore, it may be desirable to selectively increase application of an interpolation filter, allowing for increases while increases are desirable and no increases while increases are not desirable, according to at least the computing environments as described above.
  • the video decoder may instead resize motion information of the current frame, including motion vectors. Based on parameters set in a header (such as a NAL unit header) which applies to the current frame (examples of which are given below), and under some conditions, the video decoder may apply, or partially apply, an interpolation filter to the reference frame before referencing motion information of the reference frame; based on such parameters and under other conditions, the video decoder may not apply an interpolation filter to the reference frame before referencing motion information of the reference frame.
  • the video decoder may determine a ratio of a resolution of the current frame to a resolution of the reference frame.
  • Various deciding conditions may be set for determining whether to apply, or partially apply, an interpolation filter to the reference frame before referencing motion information of the reference frame. Deciding conditions may be predicated on one or more factors.
  • the video decoder may apply, or partially apply, an interpolation filter thereto, and with regard to other components of the reference frame, the video decoder may not apply an interpolation filter to the reference frame, all before referencing motion information of the reference frame.
  • Various discriminating conditions may be set for determining which components of a reference frame an interpolation filter is to be applied to.
  • parameters of each example header are generally in accordance with established versions of the AVC standard, except as indicated.
  • those parameters of each example header which are not in accordance with established versions of the AVC standard may be implemented without limitation as to where those parameters are implemented with regard to line number or with regard to relative syntactical position to any other parameters of the example headers, except that, by convention, a parameter including an if () clause referencing another parameter follows the referenced parameter and may be immediately subsequent to the referenced parameter.
  • the left column of each table names parameters identified by particular bits of the NAL unit header.
  • the right column describes how those bits should be parsed.
  • u (n) in the right column denotes that n consecutive bits of the header (trailing the bits described in rows above and preceding the bits described in rows below) should be parsed as a value for the corresponding parameter named in the left column.
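Below is a minimal sketch of a reader for the u (n) descriptor; the BitReader class is a stand-in for illustration, not an API from any codec library.

```python
# Sketch of the u(n) descriptor: read n consecutive bits of the header,
# most significant bit first, as an unsigned value.

class BitReader:
    def __init__(self, data: bytes):
        self.data, self.pos = data, 0          # pos counts bits from the start

    def u(self, n: int) -> int:
        value = 0
        for _ in range(n):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value

r = BitReader(bytes([0b10110000]))
print(r.u(1), r.u(3))  # -> 1 3 (first bit, then the next three bits 011)
```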
  • An example of a video sequence level NAL unit header syntax is given below in Table 1, applying to frames of a video sequence which is transmitted trailing the NAL unit header in a bitstream.
  • a parameter according to example embodiments of the present disclosure is illustrated in row 15 of Table 1. According to an example embodiment of the present disclosure, the if () statement of row 14 is evaluated; according to another example embodiment of the present disclosure, the if () statement of row 14 is not evaluated (indicated by the “or” and the struck-out text in row 14) .
  • the parameter of row 15 may be conditionally set as true or false only when the if () statement evaluates the value of the parameter of row 13 ( “ref_pic_resampling_enabled_flag” ) as true; otherwise, the parameter of row 15 may not be set. According to those example embodiments where the if () statement is not evaluated, the parameter of row 15 may always be set as true or false.
  • the parameter of row 15 need not be located at those specific positions relative to the other parameters, and may be located anywhere relative to the other parameters, though by convention in those cases where the if () statement of row 14 is evaluated then the parameter of row 15 may directly follow the parameter of row 13 ( “ref_pic_resampling_enabled_flag” ) .
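Here is a sketch of how a decoder might parse rows 13 through 15 under both variants (the if () statement of row 14 evaluated or omitted); the parameter names come from Table 1, while the function and reader object are illustrative (the reader follows the BitReader sketch above).

```python
# Sketch of conditional parsing of the Table 1 flags: the row-15 flag is parsed
# only when ref_pic_resampling_enabled_flag (row 13) evaluates as true, unless
# the row-14 if() is omitted, in which case it is always parsed.

def parse_sequence_header(r, evaluate_if: bool = True):
    params = {}
    params["ref_pic_resampling_enabled_flag"] = r.u(1)        # row 13, u(1)
    if not evaluate_if or params["ref_pic_resampling_enabled_flag"]:
        params["simplified_resampling_filter_flag"] = r.u(1)  # row 15, u(1)
    return params
```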
  • the parameter of row 15 being set to a value which evaluates as false may allow for, for example, computing environments wherein computational capacity is not limited, battery charge is not low, and/or battery capacity is not limited, wherein application of an interpolation filter may remain normal.
  • Although the video encoder may not be aware of the computing environment wherein the video decoder ultimately performs motion prediction, the video encoder may nevertheless set the parameter of row 13 and the parameter of row 15 based on, for example, whether motion prediction may be simplified by below-mentioned decreases of application of interpolation filters for pictures encoded by the video encoder which are packetized into NAL units designated by each header. For example, where picture data trailing the header is encoded referencing frames of different resolutions, where the reference frame has a smaller resolution, interpolation may be conditionally decreased as described above to reduce computation cost, and thus the video encoder may set the parameter of row 13 and the parameter of row 15 to values which evaluate as true.
  • interpolation being conditionally decreased as described above may fail to reduce computation cost, and thus the video encoder may set at least either the parameter of row 13 or the parameter of row 15 to a value which evaluates as false.
  • the conditional decrease of interpolation may be selectively controlled through NAL unit headers.
  • a video encoder may always set the parameter of row 15 to either a value which evaluates as true or a value which evaluates as false, so that interpolation is always conditionally decreased.
  • the conditional decrease of interpolation may be uniformly applied rather than selectively controlled.
  • the parameter of row 15 may be “simplified_resampling_filter_flag, ” which, when evaluating as true, instructs the video decoder to decrease (as described subsequently) application of an interpolation filter to picture data of a video sequence trailing the header.
  • the parameter of row 15 may be a different parameter which, when evaluating as true, instructs the video decoder to increase (as described subsequently) application of an interpolation filter to picture data of a video sequence trailing the header.
  • Examples of a picture level NAL unit parameter header syntax and a picture level NAL unit header syntax, respectively, are given below in Tables 2 and 3, applying to a picture which is transmitted trailing the NAL unit header in a bitstream.
  • a parameter according to example embodiments of the present disclosure is illustrated in row 6 of Table 2 and in row 90 of Table 3. According to example embodiments of the present disclosure, the if () statement of row 5 of Table 2 or the if () statement of row 89 of Table 3 is evaluated; according to other example embodiments of the present disclosure, the if () statement of row 5 of Table 2 or the if () statement of row 89 of Table 3 is not evaluated (indicated by the “or” and the struck-out text in row 5 of Table 2 and row 89 of Table 3) .
  • the parameter of row 6 of Table 2 or the parameter of row 90 of Table 3 may be conditionally set as true or false only when the respective if () statement evaluates the value of the parameter “ref_pic_resampling_enabled_flag” (which may be found in a video sequence-level header as shown in Table 1, rather than a picture-level parameter header as shown in Table 2 or a picture-level header as shown in Table 3) as true; otherwise, the parameter of row 6 of Table 2 or the parameter of row 90 of Table 3 may not be set. According to those example embodiments where the if () statement is not evaluated, the parameter of row 6 of Table 2 or the parameter of row 90 of Table 3 may always be set as true or false.
  • the parameter of row 6 of Table 2 or the parameter of row 90 of Table 3 need not be located at those specific positions relative to the other parameters, and may be located anywhere relative to the other parameters; since the if () statement of row 5 of Table 2 and the if () statement of row 89 of Table 3 cannot follow the parameter they are evaluating (which is located in a different header) , convention does not dictate positioning of the if () statements either.
  • a video encoder may selectively set the parameter of “ref_pic_resampling_enabled_flag” at a video sequence-level header to a value which evaluates as true (causing the parameter of row 6 of Table 2 or the parameter of row 90 of Table 3 to be set) or a value which evaluates as false (causing the parameter of row 6 of Table 2 or the parameter of row 90 of Table 3 to not be set) .
  • the video encoder may then set the parameter of row 6 of Table 2 or the parameter of row 90 of Table 3 to a value which evaluates as true or a value which evaluates as false.
  • the parameter “ref_pic_resampling_enabled_flag” and the parameter of row 6 of Table 2 or the parameter “ref_pic_resampling_enabled_flag” and the parameter of row 90 of Table 3 being set to a value which evaluates as true may compensate for, for example, computing environments wherein computational capacity is limited, battery charge is low, and/or battery capacity is limited, wherein application of an interpolation filter should be decreased.
  • the parameter “ref_pic_resampling_enabled_flag” and the parameter of row 6 of Table 2 or the parameter “ref_pic_resampling_enabled_flag” and the parameter of row 90 of Table 3 being set to a value which evaluates as true may allow for, for example, computing environments wherein computational capacity is not limited, battery charge is not low, and/or battery capacity is not limited, wherein application of an interpolation filter may remain normal.
  • Although the video encoder may not be aware of the computing environment wherein the video decoder ultimately performs motion prediction, the video encoder may nevertheless set the parameter of row 6 of Table 2 or the parameter of row 90 of Table 3 based on, for example, whether motion prediction may be simplified by below-mentioned decreases of application of interpolation filters for pictures encoded by the video encoder which are packetized into NAL units designated by each header. For example, where picture data trailing the header is encoded referencing frames of different resolutions, where the reference frame has a smaller resolution, interpolation may be conditionally decreased as described above to reduce computation cost, and thus the video encoder may set the parameter of row 6 of Table 2 or the parameter of row 90 of Table 3 to values which evaluate as true.
  • interpolation being conditionally decreased as described above may fail to reduce computation cost, and thus the video encoder may set at least either the parameter of row 6 of Table 2 or the parameter of row 90 of Table 3 to a value which evaluates as false.
  • the conditional decrease of interpolation may be selectively controlled through NAL unit headers.
  • the parameter of row 6 of Table 2 or the parameter of row 90 of Table 3 may be “simplified_resampling_filter_flag, ” which, when evaluating as true, instructs the video decoder to decrease (as described subsequently) application of an interpolation filter to picture data of a video sequence trailing the header.
  • the parameter of row 6 of Table 2 or the parameter of row 90 of Table 3 may be a different parameter which, when evaluating as true, instructs the video decoder to increase (as described subsequently) application of an interpolation filter to picture data of a video sequence trailing the header.
  • the parameter of row 39 of Table 4 need not be located at those specific positions relative to the other parameters, and may be located anywhere relative to the other parameters; since the if () statement of row 38 of Table 4 cannot follow the parameter it is evaluating (which is located in a different header) , convention does not dictate positioning of the if () statements either.
  • a video encoder may selectively set the parameter of “ref_pic_resampling_enabled_flag” at a video sequence-level header to a value which evaluates as true (causing the parameter of row 39 of Table 4 to be set) or a value which evaluates as false (causing the parameter of row 39 of Table 4 to not be set) .
  • the video encoder may then set the parameter of row 39 of Table 4 to a value which evaluates as true or a value which evaluates as false.
  • the parameter “ref_pic_resampling_enabled_flag” and the parameter of row 39 of Table 4 being set to a value which evaluates as true may compensate for, for example, computing environments wherein computational capacity is limited, battery charge is low, and/or battery capacity is limited, wherein application of an interpolation filter should be decreased.
  • the parameter “ref_pic_resampling_enabled_flag” and the parameter of row 39 of Table 4 being set to a value which evaluates as true may allow for, for example, computing environments wherein computational capacity is not limited, battery charge is not low, and/or battery capacity is not limited, wherein application of an interpolation filter may remain normal.
  • Although the video encoder may not be aware of the computing environment wherein the video decoder ultimately performs motion prediction, the video encoder may nevertheless set the parameter of row 39 of Table 4 based on, for example, whether motion prediction may be simplified by below-mentioned decreases of application of interpolation filters for pictures encoded by the video encoder which are packetized into NAL units designated by each header. For example, where picture data trailing the header is encoded referencing frames of different resolutions, where the reference frame has a smaller resolution, interpolation may be conditionally decreased as described above to reduce computation cost, and thus the video encoder may set the parameter of row 39 of Table 4 to values which evaluate as true.
  • a video encoder may always set the parameter of row 39 of Table 4 to either a value which evaluates as true or a value which evaluates as false, so that interpolation is always conditionally decreased.
  • the conditional decrease of interpolation may be uniformly applied rather than selectively controlled.
  • the parameter of row 39 of Table 4 may be “simplified_resampling_filter_flag, ” which, when evaluating as true, instructs the video decoder to decrease (as described subsequently) application of an interpolation filter to picture data of a video sequence trailing the header.
  • the parameter of row 39 of Table 4 may be a different parameter which, when evaluating as true, instructs the video decoder to increase (as described subsequently) application of an interpolation filter to picture data of a video sequence trailing the header.
  • a video decoder obtains an inter-coded current frame from a sequence.
  • the current frame may have a position N.
  • a previous frame having position N–1 in the sequence may have a resolution larger than or smaller than a resolution of the current frame, and a next frame having position N+1 in the sequence may have a resolution larger than or smaller than the resolution of the current frame.
  • the video decoder obtains a reference frame from a reference frame buffer.
  • this determination may instruct the video decoder to conditionally decrease application of an interpolation filter to the reference frame or components thereof according to the subsequent steps 408 to 420, or to increase application of an interpolation filter to the reference frame or components thereof according to other alternative steps not described herein.
  • in the case that the one or more headers of the sequence applying to at least the current frame or a subunit thereof instruct the video decoder to not increase or decrease application of an interpolation filter to the reference frame or components thereof (such as a “ref_pic_resampling_enabled_flag” parameter of a video sequence level header evaluating to false), the subsequent steps 408 to 420 or other alternative steps not described herein may be skipped.
  • the video decoder compares a resolution of the reference frame to a resolution of the current frame and determines that a resolution of the reference frame is different from the resolution of the current frame.
  • the reference frame having a resolution different from the resolution of the current frame may be, for example, the most recent frame of the reference frame buffer, though it may also be a frame other than the most recent frame of the reference frame buffer.
  • the video decoder may further determine that the resolution of the reference frame is larger than the resolution of the current frame, or that the resolution of the reference frame is smaller than the resolution of the current frame.
  • the video decoder determines a ratio of the resolution of the reference frame to the resolution of the current frame.
  • the video decoder determines, by a deciding condition and/or a discriminating condition, whether to apply an interpolation filter to the reference frame or components thereof, or to partially apply an interpolation filter to the reference frame or components thereof, or to not apply an interpolation filter to the reference frame.
  • a deciding condition results in interpolation filters not being applied or not being fully applied to all reference frames of all current frames, but still being applied or partially applied to at least some reference frames of some current frames; in other words, the video decoder decides to decrease application of an interpolation filter to the reference frame.
  • Partial application of an interpolation filter may refer to applying a first stage of an interpolation filter but not applying a second stage of the interpolation filter, as described above.
  • a deciding condition may be whether the resolution of the reference frame is larger than or smaller than the resolution of the current frame, such that in the case that the resolution of the reference frame is larger than the resolution of the current frame, an interpolation filter is not applied to the reference frame or an interpolation filter is only partially applied to the reference frame, or such that in the case that the resolution of the reference frame is smaller than the resolution of the current frame, an interpolation filter is not applied to the reference frame or an interpolation filter is only partially applied to the reference frame.
  • a deciding condition may be whether dimensions of blocks of a reference frame or blocks of the current frame are larger than particular threshold dimensions or are smaller than particular threshold dimensions, such that in the case that blocks of the reference frame or blocks of the current frame are larger than particular threshold dimensions, an interpolation filter is not applied to the reference frame or an interpolation filter is only partially applied to the reference frame, or such that in the case that blocks of the reference frame or blocks of the current frame are smaller than particular threshold dimensions, an interpolation filter is not applied to the reference frame or an interpolation filter is only partially applied to the reference frame.
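Below is a sketch of one possible configuration of deciding conditions, combining the resolution-based and block-dimension-based conditions above; the threshold value and the three-way outcome labels are assumptions, not normative behavior.

```python
# Sketch of a deciding condition: choose among full, partial, or no application
# of the interpolation filter from frame resolutions and block dimensions.

def decide_filter(ref_res, cur_res, block_dims, max_block_pixels=64 * 64):
    ref_pixels = ref_res[0] * ref_res[1]
    cur_pixels = cur_res[0] * cur_res[1]
    if ref_pixels == cur_pixels:
        return "full"                    # same resolution: apply normally
    if block_dims[0] * block_dims[1] > max_block_pixels:
        return "none"                    # large blocks: skip the filter entirely
    return "partial"                     # differing resolutions: e.g. first stage only

print(decide_filter((1280, 720), (1920, 1080), (128, 128)))  # -> "none"
```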
  • various discriminating conditions are possible, as long as, over the course of a video coding process for a video source, for at least some reference frames of some current frames, interpolation filters are not applied to some components of those reference frames, while interpolation filters are applied to other components of those reference frames.
  • the video decoder decides to decrease application of an interpolation filter to components of the reference frame.
  • a discriminating condition may be to apply an interpolation filter to chroma components of a reference frame and to not apply an interpolation filter to luma components of a reference frame, or to apply an interpolation filter to luma components of a reference frame and to not apply an interpolation filter to chroma components of a reference frame.
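The following sketch shows a discriminating condition applied per component, assuming frame components are held in a dict keyed by plane name; all names are illustrative.

```python
# Sketch of a discriminating condition: apply the interpolation filter to one
# set of frame components (here chroma) but not the other (here luma).

def filter_components(planes, interpolate, filter_luma=False, filter_chroma=True):
    out = {}
    for name, plane in planes.items():
        is_luma = (name == "Y")
        if (is_luma and filter_luma) or (not is_luma and filter_chroma):
            out[name] = interpolate(plane)   # filtered component
        else:
            out[name] = plane                # filter skipped for this component
    return out

planes = {"Y": [[16, 17]], "Cb": [[128]], "Cr": [[128]]}
out = filter_components(planes, lambda p: p)  # identity stands in for the filter
```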
  • the video coding method 400 may proceed through to step 420 below and then perform those portions of step 420 corresponding to the respective case. In those cases where the video decoder determines, by a discriminating condition, to apply an interpolation filter to components of the reference frame, or to partially apply an interpolation filter to components of the reference frame, the video coding method 400 may proceed through to step 420 below and then perform those portions of step 420 corresponding to the respective case as applied to only those components of the reference frame. In those cases where the video decoder determines, by a deciding condition, to not apply an interpolation filter to the reference frame, the video coding method 400 may proceed by each step as described below except skipping step 420 as described below.
  • FIG. 5A illustrates an example 500 of motion prediction without resizing a reference frame as described herein. Similar to the illustration of FIG. 3, the current frame 510 has a resolution three times the resolution of the reference frame 520, such that the current frame 510 has 9 pixels for each pixel of the reference frame 520, and the ratio of the resolution of the reference frame 520 to the resolution of the current frame 510 is 1: 3.
  • the video decoder determines a motion vector of the block of the current frame, and calculates a pixel coordinate indicated by the motion vector of the block of the current frame.
  • the motion vector may be determined in accordance with steps of motion prediction. Steps of performing motion prediction determining motion vectors shall not be described in detail herein, but may include, for example, deriving a motion candidate list for the block of the current frame; selecting a motion candidate from the derived motion candidate list or merging candidate list; and deriving a motion vector of the motion candidate as a motion vector of the block of the current frame.
  • a decoder may decode a frame on a per-block basis in a coding order among blocks of the frame, such as a raster scan order wherein a first-decoded block is an uppermost and leftmost block of the frame, according to video encoding standards.
  • FIG. 5A As illustrated by FIG. 5A, as an example 500 of calculating a pixel coordinate indicated by a motion vector, given a block 512 of the current frame 510 having coordinates (3, 3) (i.e., an upper-leftmost pixel of the block 512 has pixel coordinates (3, 3) ) , and a motion vector (1, 1) of the block 512, adding the motion vector to the coordinates of the block 512 yields the coordinates (4, 4) .
  • the motion vector indicates a predictor block having coordinates at (4, 4) .
  • the video decoder resizes motion information of the block of the current frame to a resolution of the reference frame in accordance with the ratio.
  • Were the reference frame at the same resolution as the current frame 510, the coordinates (4, 4) would locate a predictor block directly; FIG. 5A illustrates this hypothetical predictor block 514, outlined within a hypothetical reference frame 516 at the same resolution as the current frame 510, neither of which exists according to example embodiments of the present disclosure. Instead, the video decoder may multiply the coordinates (4, 4) by a factor of 1/3 based on the ratio 1:3, resulting in the coordinates (4/3, 4/3).
  • FIG. 5A illustrates a hypothetical block 518 having these coordinates in the reference frame 520.
  • the video decoder locates a predictor block of the reference frame in accordance with the motion information.
  • Resized motion information, by itself, may be insufficient for locating a predictor block of the reference frame.
  • scaled coordinates indicated by the motion vector may be in proportion to a resolution of the reference frame, but may not correspond to an integer pixel coordinate of the reference frame; additionally, they may not correspond to a half-pixel coordinate of the reference frame in the case that the video decoder implements half-pixel motion prediction, and may not correspond to a quarter-pixel coordinate of the reference frame in the case that the video decoder implements quarter-pixel motion prediction.
  • the video decoder may further round the scaled coordinate of the block to a nearest pixel scale or sub-pixel scale supported by the video decoder.
  • scaled coordinates of (4/3, 4/3) indicated by the motion vector may not correspond to any pixel coordinate in the reference frame 520, whether at integer pixel accuracy, half-pixel accuracy, or quarter-pixel accuracy. Therefore, the video decoder may round the scaled coordinates to quarter-pixel accuracy in the case that the video decoder supports quarter-pixel accuracy; thus, (4/3, 4/3) may be rounded to (1.25, 1.25) , locating a predictor block 522 at (1.25, 1.25) .
  • the video decoder may round the scaled coordinates to half-pixel accuracy in the case that the video decoder does not support quarter-pixel accuracy but does support half-pixel accuracy; thus, (4/3, 4/3) may be rounded to (1.5, 1.5) , locating a predictor block (not illustrated) at (1.5, 1.5) .
  • the video decoder may round the scaled coordinates to integer pixel accuracy in the case that the video decoder does not support either level of sub-pixel accuracy; thus, (4/3, 4/3) may be rounded to (1, 1) , locating a predictor block 524 at (1, 1) .
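Here is a worked sketch of the scaling and rounding in steps 414 and 416, reproducing the document's (4, 4) example at quarter-pixel, half-pixel, and integer accuracy; the function name and the granularity encoding (1 = integer, 2 = half, 4 = quarter) are illustrative.

```python
# Sketch of steps 414-416: scale the coordinate indicated by the motion vector
# by the reference/current resolution ratio, then round to the finest
# sub-pixel granularity the decoder supports.

def scale_and_round(coord, ratio, granularity=4):
    x, y = coord[0] * ratio, coord[1] * ratio
    return (round(x * granularity) / granularity,
            round(y * granularity) / granularity)

print(scale_and_round((4, 4), 1 / 3, 4))  # -> (1.25, 1.25), quarter-pel accuracy
print(scale_and_round((4, 4), 1 / 3, 2))  # -> (1.5, 1.5), half-pel accuracy
print(scale_and_round((4, 4), 1 / 3, 1))  # -> (1.0, 1.0), integer-pel accuracy
```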
  • the video decoder may locate the predictor block directly at the scaled coordinates at the reference frame.
  • the video decoder may nevertheless not round the scaled coordinates to the highest granularity level of accuracy supported by the video decoder. Instead, the video decoder may round the scaled coordinates to a lower granularity level of accuracy than the highest level supported.
  • the video decoder applies an interpolation filter to a block at the scaled coordinates at the reference frame to generate sub-pixel values of the predictor block.
  • the interpolation filter may be applied as described above, and, furthermore, in the cases that the scaled coordinates are at half-pixel accuracy or are rounded to half-pixel accuracy, only the first stage of interpolation as described above may be performed, skipping the second stage, therefore reducing computational costs and computing time of decoding.
  • the video decoder does not need to apply an interpolation filter to pixels of the reference block, and step 418 may be skipped with pixels at a block at the scaled coordinates at the reference frame being used directly in motion prediction. Avoidance of application of the interpolation filter may greatly reduce computational costs and computing time of decoding.
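The following sketch ties these cases together: integer coordinates skip filtering entirely, half-pel coordinates partially apply the filter (first stage only), and quarter-pel coordinates fully apply it; the stage callables are stand-ins.

```python
# Sketch of full, partial, and skipped application of the interpolation filter,
# keyed on the fractional part of the scaled, rounded coordinate.

def interpolate_block(frac, block, first_stage, second_stage):
    if frac == 0.0:
        return block              # skip: pixels used directly (step 418 skipped)
    half = first_stage(block)     # stage 1: half-pel samples
    if frac == 0.5:
        return half               # partial application: second stage skipped
    return second_stage(half)     # full application: quarter-pel samples
```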
  • the video decoder may determine, by a deciding condition and/or a discriminating condition, to increase application of an interpolation filter to the reference frame or components thereof, or to decrease application of an interpolation filter to the reference frame or components thereof.
  • the video decoder deciding to increase application of the interpolation filter may mean that by the normal operation of step 420, the interpolation filter would not be applied to the reference frame or components thereof, and the outcome of step 420 is that the video decoder decides to partially apply or fully apply the interpolation filter to the reference frame or components thereof. Or, this may mean that, by the normal operation of step 420, the interpolation filter would be applied to the reference frame or components thereof in at most one stage, and the outcome of step 420 is that the video decoder decides to apply another interpolation filter to the reference frame or components thereof in a second stage.
  • Alternatively, this may mean that, by the normal operation of step 420, the interpolation filter would be at most partially applied to the reference frame or components thereof, and the outcome of step 420 is that the video decoder decides to at least fully apply the interpolation filter to the reference frame or components thereof. Or, this may mean that, by the normal operation of step 420, the interpolation filter would be fully or partially applied to some components of the reference frame, and the outcome of step 420 is that the video decoder decides to fully or partially apply the interpolation filter to all components of the reference frame.
  • Increasing application may also mean that, by the normal operation of step 420, the interpolation filter applied would sample a smaller number of coefficients for each pixel generated, such as 2-tap filters, and the outcome of step 420 is that the video decoder decides to apply an interpolation filter that would apply a larger number of coefficients for each pixel generated, such as 7-tap and 8-tap filters.
  • the video decoder deciding to decrease application of the interpolation filter may mean that by the normal operation of step 420, the interpolation filter would be fully applied to the reference frame or components thereof, and the outcome of step 420 is that the video decoder decides to partially apply or not apply the interpolation filter to the reference frame or components thereof. Or, this may mean that by the normal operation of step 420, the interpolation filter would be partially applied to the reference frame or components thereof, and the outcome of step 420 is that the video decoder decides to not apply the interpolation filter to the reference frame or components thereof.
  • Alternatively, this may mean that, by the normal operation of step 420, the interpolation filter would be fully or partially applied to all components of the reference frame, and the outcome of step 420 is that the video decoder decides to fully or partially apply the interpolation filter to only some components of the reference frame. Or, this may mean that, by the normal operation of step 420, the interpolation filter applied would sample a larger number of coefficients for each pixel generated, such as 7-tap and 8-tap filters, and the outcome of step 420 is that the video decoder decides to apply an interpolation filter that would apply a smaller number of coefficients for each pixel generated, such as 2-tap filters.
  • Some of the following outcomes may result from step 412 in conjunction with step 420 according to various configurations of deciding conditions and/or discriminating conditions. Each of these outcomes may be desirable over applying an interpolation filter to a reference frame without decrease due to deciding conditions and/or discriminating conditions, due to reducing computing time of the decoding process, though each of these outcomes may result in some degree of information loss.
  • the deciding condition is that the resolution of the reference frame is larger than the resolution of the current frame
  • the video decoder determines, by the deciding condition, to decrease application of an interpolation filter to the reference frame, and in step 420, the video decoder decreases application of an interpolation filter to the reference frame but does not skip application of an interpolation filter to the reference frame.
  • the above outcome may be desirable because, in conditions of decreased application of an interpolation filter, loss incurred from deriving motion information from a reference frame larger than the resolution of the current frame may not be too great, as the larger reference frame inherently contains more picture information than the current frame.
  • the deciding condition is that the resolution of the reference frame is smaller than the resolution of the current frame
  • the video decoder determines, by the deciding condition, to decrease application of an interpolation filter to the reference frame, and in step 420, the video decoder decreases application of an interpolation filter to the reference frame.
  • loss incurred from deriving motion information from a reference frame smaller than the resolution of the current frame has been experimentally shown to be somewhat greater than loss incurred from deriving motion information from a reference frame larger than the resolution of the current frame.
  • the loss incurred is still less than baseline loss incurred from skipping application of an interpolation filter.
  • the discriminating condition is to apply an interpolation filter to chroma components of a reference frame and to not apply an interpolation filter to luma components of a reference frame
  • the video decoder determines, by the discriminating condition, to decrease application of an interpolation filter to the luma components of the reference frame
  • the video decoder decreases application of an interpolation filter to the luma component of the reference frame.
  • the above outcome may be desirable because application of an interpolation filter to luma components may incur greater computational costs than application of an interpolation filter to chroma components.
  • the discriminating condition is to apply an interpolation filter to luma components of a reference frame and to not apply an interpolation filter to chroma components of a reference frame
  • the video decoder determines, by the discriminating condition, to decrease application of an interpolation filter to the chroma components of the reference frame
  • the video decoder decreases application of an interpolation filter to the chroma component of the reference frame.
  • the deciding condition is that dimensions of blocks of a reference frame are larger than particular threshold dimensions, and the video decoder determines, by the deciding condition, to decrease application of an interpolation filter to the reference frame, and in step 420, the video decoder decreases application of an interpolation filter to the reference frame.
  • the above outcome may be desirable because application of an interpolation filter to frames having larger blocks may incur greater computational costs than application of an interpolation filter to frames having smaller blocks.
  • the deciding condition is that dimensions of blocks of a reference frame are smaller than particular threshold dimensions, and the video decoder determines, by the deciding condition, to decrease application of an interpolation filter to the reference frame, and in step 420, the video decoder decreases application of an interpolation filter to the reference frame.
  • the video decoder increases application of an interpolation filter to the reference frame. This may mean that the video decoder decides to apply an interpolation filter that would apply a larger number of coefficients for each pixel generated, such as 10-tap and 12-tap filters. A sketch of such a tap-count selection follows this list.
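As an illustration of the tap-count outcomes above, the following minimal sketch selects a kernel length from the header instruction and a deciding condition. The function, its parameters, and the policy are hypothetical and do not appear in the disclosed syntax; the tap counts merely echo the examples given above (2-tap for a decrease, 7- or 8-tap for normal operation, 10- or 12-tap for an increase).

```c
#include <stdbool.h>

/* Hypothetical kernel lengths echoing the examples above; real codecs
 * define fixed, normative coefficient tables per sub-pixel phase. */
#define TAPS_REDUCED    2   /* e.g., a bilinear filter (decrease)    */
#define TAPS_NORMAL     8   /* e.g., a DCT-based FIR filter (normal) */
#define TAPS_INCREASED 12   /* e.g., a longer filter (increase)      */

/* Select a kernel length from the header instruction and a deciding
 * condition; all names here are illustrative. */
static int select_tap_count(bool header_decrease, bool header_increase,
                            int ref_width, int cur_width)
{
    if (header_decrease && ref_width != cur_width)
        return TAPS_REDUCED;     /* decrease: fewer coefficients per pixel */
    if (header_increase)
        return TAPS_INCREASED;   /* increase: more coefficients per pixel  */
    return TAPS_NORMAL;          /* normal operation of step 420           */
}
```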
  • FIG. 6 illustrates an example system 600 for implementing the processes and methods described above for implementing resolution-adaptive video coding.
  • the techniques and mechanisms described herein may be implemented by multiple instances of the system 600 as well as by any other computing device, system, and/or environment.
  • the system 600 shown in FIG. 6 is only one example of a system and is not intended to suggest any limitation as to the scope of use or functionality of any computing device utilized to perform the processes and/or procedures described above.
  • the system 600 may include one or more processors 602 and system memory 604 communicatively coupled to the processor (s) 602.
  • the processor (s) 602 may execute one or more modules and/or processes to cause the processor (s) 602 to perform a variety of functions.
  • the processor (s) 602 may include a central processing unit ( “CPU” ) , a graphics processing unit ( “GPU” ) , both CPU and GPU, or other processing units or components known in the art. Additionally, each of the processor (s) 602 may possess its own local memory, which also may store program modules, program data, and/or one or more operating systems.
  • the modules 606 may include, but are not limited to, a motion prediction module 608, which includes a frame obtaining submodule 610, a reference frame obtaining submodule 612, a header determining submodule 614, a resolution comparing submodule 616, a ratio determining submodule 618, a filter application determining submodule 620, a motion vector determining submodule 622, a motion information resizing submodule 624, a predictor block locating submodule 626, and an interpolation filter applying submodule 628. A structural sketch of this decomposition follows the submodule descriptions below.
  • the frame obtaining submodule 610 may be configured to obtain an inter-coded current frame from a sequence as abovementioned with reference to FIGS. 4A and 4B.
  • the reference frame obtaining submodule 612 may be configured to obtain a reference frame from a reference frame buffer as abovementioned with reference to FIGS. 4A and 4B.
  • the header determining submodule 614 may be configured to determine that one or more headers of the sequence applying to at least the current frame or a subunit thereof instructs the video decoder to conditionally decrease or increase application of an interpolation filter to the reference frame or components thereof, as abovementioned with reference to FIGS. 4A and 4B.
  • the resolution comparing submodule 616 may be configured to compare a resolution of the reference frame to a resolution of the current frame and determine that the resolution of the reference frame is different from the resolution of the current frame, as abovementioned with reference to FIGS. 4A and 4B.
  • the ratio determining submodule 618 may be configured to determine a ratio of the resolution of the reference frame to the resolution of the current frame, as abovementioned with reference to FIGS. 4A and 4B.
  • the filter application determining submodule 620 may be configured to determine, by a deciding condition and/or a discriminating condition, whether to apply an interpolation filter to the reference frame or components thereof, or to partially apply an interpolation filter to the reference frame or components thereof, or to not apply an interpolation filter to the reference frame, as abovementioned with reference to FIGS. 4A and 4B.
  • the motion vector determining submodule 622 may be configured to determine a motion vector of the block of the current frame, and calculate a pixel coordinate indicated by the motion vector of the block of the current frame, as abovementioned with reference to FIGS. 4A and 4B.
  • the motion information resizing submodule 624 may be configured to resize motion information of the block of the current frame to a resolution of the reference frame in accordance with the ratio, as abovementioned with reference to FIGS. 4A and 4B.
  • the predictor block locating submodule 626 may be configured to locate a predictor block of the reference frame in accordance with the resized motion information, as abovementioned with reference to FIGS. 4A and 4B.
  • the interpolation filter applying submodule 628 may be configured to, in the cases that the scaled coordinates are at sub-pixel accuracy or are rounded to sub-pixel accuracy, apply an interpolation filter to a block at the scaled coordinates at the reference frame to generate sub-pixel values of the predictor block, and/or decrease the application of the interpolation filter thereto, in accordance with a determination by the filter application determining submodule 620, as abovementioned with reference to FIGS. 4A and 4B.
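The submodule decomposition above may be pictured, purely as a sketch, as a structure of function pointers. All types and signatures here are assumptions made for illustration and do not appear in the disclosure; the trailing comments give the reference numerals of the corresponding submodules.

```c
#include <stdbool.h>

typedef struct Frame Frame;            /* opaque picture type (assumed)    */
typedef struct MotionInfo MotionInfo;  /* motion vector + pixel coordinate */

typedef struct MotionPredictionModule {
    Frame      *(*obtain_frame)(void *sequence);                             /* 610 */
    Frame      *(*obtain_reference)(void *reference_frame_buffer);           /* 612 */
    bool        (*header_instructs)(const void *headers);                    /* 614 */
    bool        (*resolutions_differ)(const Frame *ref, const Frame *cur);   /* 616 */
    double      (*determine_ratio)(const Frame *ref, const Frame *cur);      /* 618 */
    int         (*decide_application)(const Frame *ref);                     /* 620 */
    MotionInfo *(*determine_motion_vector)(const void *block);               /* 622 */
    void        (*resize_motion_info)(MotionInfo *mi, double ratio);         /* 624 */
    const void *(*locate_predictor)(const Frame *ref, const MotionInfo *mi); /* 626 */
    void        (*apply_filter)(Frame *ref, int decision);                   /* 628 */
} MotionPredictionModule;
```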
  • the system 600 may additionally include an input/output (I/O) interface 640 for receiving video source data and bitstream data, and for outputting decoded frames into a reference frame buffer and/or a display buffer.
  • the system 600 may also include a communication module 650 allowing the system 600 to communicate with other devices (not shown) over a network (not shown) .
  • the network may include the Internet, wired media such as a wired network or direct-wired connections, and wireless media such as acoustic, radio frequency ( “RF” ) , infrared, and other wireless media.
  • a non-transient computer-readable storage medium is an example of computer-readable media.
  • Computer-readable media includes at least two types of computer-readable media, namely computer-readable storage media and communications media.
  • Computer-readable storage media includes volatile and non-volatile, removable and non-removable media implemented in any process or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data.
  • Computer-readable storage media includes, but is not limited to, phase change memory ( “PRAM” ) , static random-access memory ( “SRAM” ) , dynamic random-access memory ( “DRAM” ) , other types of random-access memory ( “RAM” ) , read-only memory ( “ROM” ) , electrically erasable programmable read-only memory ( “EEPROM” ) , flash memory or other memory technology, compact disk read-only memory ( “CD-ROM” ) , digital versatile disks ( “DVD” ) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information for access by a computing device.
  • the computer-readable instructions stored on one or more non-transitory computer-readable storage media, when executed by one or more processors, may perform operations described above with reference to FIGS. 1A-6.
  • computer-readable instructions include routines, programs, objects, components, data structures, and the like that perform particular functions or implement particular abstract data types.
  • the order in which the operations are described is not intended to be construed as a limitation, and any number of the described operations can be combined in any order and/or in parallel to implement the processes.
  • the present disclosure provides selective control of conditional application of interpolation filters to a reference frame to enable inter-frame adaptive resolution changes based on motion prediction video coding standards, decreasing application of an interpolation filter to compensate for computing environments wherein computational capacity is limited, battery charge is low, and/or battery capacity is limited, and maintaining or increasing application of an interpolation filter to allow for computing environments wherein computational capacity is not limited, battery charge is not low, and/or battery capacity is not limited.
  • a method comprising: determining that one or more headers of a sequence applying to at least a current frame or a subunit thereof instructs, by a header syntax, conditional decreasing or conditional increasing of application of an interpolation filter during motion prediction; determining motion information of a block of the current frame or the subunit thereof, the motion information comprising a motion vector of the block and at least one pixel coordinate indicated by the motion vector; locating a predictor block of a reference frame in accordance with the motion information; decreasing or increasing, based on at least one of a deciding condition and a discriminating condition, application of an interpolation filter to the reference frame or components thereof; and performing motion prediction on the current block by reference to the located predictor block.
  • the method as paragraph A recites, further comprising determining that the resolution of the reference frame is different from a resolution of the current frame, and determining that the resolution of the reference frame is larger than or smaller than the resolution of the current frame.
  • a discriminating condition is to apply the interpolation filter to luma components of the reference frame and to not apply the interpolation filter to chroma components of the reference frame, and decreasing application of the interpolation filter to the reference frame is based on the discriminating condition.
  • the one or more headers comprises a video sequence-level, picture-level, or slice-level first header instructing, by the header syntax, conditional decreasing or conditional increasing of application of an interpolation filter to the reference frame or components thereof.
  • the one or more headers further comprises a video sequence-level second header selectively controlling, by the header syntax, whether the first header is set.
  • the motion prediction module further comprises a resolution comparing submodule configured to determine that the resolution of the reference frame is different from a resolution of the current frame, and determine that the resolution of the reference frame is larger than or smaller than the resolution of the current frame.
  • a discriminating condition is to apply the interpolation filter to chroma components of the reference frame and to not apply the interpolation filter to luma components of the reference frame, and decreasing application of the interpolation filter to the reference frame is based on the discriminating condition.
  • a discriminating condition is to apply the interpolation filter to luma components of the reference frame and to not apply the interpolation filter to chroma components of the reference frame, and decreasing application of the interpolation filter to the reference frame is based on the discriminating condition.
  • a deciding condition is that dimensions of blocks of the reference frame are smaller than particular threshold dimensions, and decreasing application of the interpolation filter to the reference frame is based on the deciding condition.
  • the one or more headers comprises a video sequence-level, picture-level, or slice-level first header instructing, by the header syntax, conditional decreasing or conditional increasing of application of an interpolation filter to the reference frame or components thereof.
  • the one or more headers further comprises a video sequence-level second header selectively controlling, by the header syntax, whether the first header is set.
  • a computer-readable storage medium storing computer-readable instructions executable by one or more processors, that when executed by the one or more processors, cause the one or more processors to perform operations comprising: determining that one or more headers of a sequence applying to at least a current frame or a subunit thereof instructs, by a header syntax, conditional decreasing or conditional increasing of application of an interpolation filter during motion prediction; determining motion information of a block of the current frame or the subunit thereof, the motion information comprising a motion vector of the block and at least one pixel coordinate corresponding to the motion vector; locating a predictor block of a reference frame in accordance with the motion information; decreasing or increasing, based on at least one of a deciding condition and a discriminating condition, application of an interpolation filter to the reference frame; and performing motion prediction on the current block by reference to the located predictor block.
  • the computer-readable storage medium as paragraph HH recites, the operations further comprising resizing the motion information according to a resolution of the reference frame, and wherein the predictor block of the reference frame is located in accordance with the resized motion information.
  • the computer-readable storage medium as paragraph HH recites, the operations further comprising determining that the resolution of the reference frame is different from a resolution of the current frame, and determining that the resolution of the reference frame is larger than or smaller than the resolution of the current frame.
  • the computer-readable storage medium as paragraph JJ recites, the operations further comprising determining, by the at least one of a deciding condition or a discriminating condition, whether to apply an interpolation filter to the reference frame or components thereof, or to partially apply an interpolation filter to the reference frame or components thereof, or to not apply an interpolation filter to the reference frame.
  • a deciding condition is that the resolution of the reference frame is larger than the resolution of the current frame, and decreasing application of the interpolation filter to the reference frame is based on the deciding condition.
  • a deciding condition is that the resolution of the reference frame is smaller than the resolution of the current frame, and decreasing application of the interpolation filter to the reference frame is based on the deciding condition.
  • a discriminating condition is to apply the interpolation filter to chroma components of the reference frame and to not apply the interpolation filter to luma components of the reference frame, and decreasing application of the interpolation filter to the reference frame is based on the discriminating condition.
  • the computer-readable storage medium as paragraph HH recites, wherein the deciding condition is that dimensions of blocks of the reference frame are smaller than particular threshold dimensions, and decreasing application of the interpolation filter to the reference frame is based on the deciding condition.
  • the computer-readable storage medium as paragraph HH recites, wherein the one or more headers comprises a video sequence-level header.
  • the computer-readable storage medium as paragraph HH recites, wherein the one or more headers comprises a picture-level header.
  • the computer-readable storage medium as paragraph HH recites, wherein the one or more headers comprises a slice-level header.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Systems and methods are provided for selective control of conditional application of interpolation filters to a reference frame to enable inter-frame adaptive resolution changes based on motion prediction video coding standards. The methods and systems described herein include determining that one or more headers instructs a video decoder to conditionally decrease or increase application of an interpolation filter to a reference frame or components thereof; determining a ratio of the resolution of a reference frame to the resolution of a current frame; determining a motion vector of the block of the current frame, and calculating a pixel coordinate indicated by the motion vector of the block of the current frame; locating a predictor block of a reference frame in accordance with the motion information; and subsequently applying an interpolation filter to a block at the scaled coordinates at the reference frame to generate sub-pixel values of the predictor block.

Description

SELECTIVE CONTROL OF CONDITIONAL FILTERS IN RESOLUTION-ADAPTIVE VIDEO CODING BACKGROUND
In conventional video coding formats, such as the H. 264/AVC (Advanced Video Coding) and H. 265/HEVC (High Efficiency Video Coding) standards, video frames in a sequence have their size and resolution recorded at the sequence-level in a header. Thus, in order to change frame resolution, a new video sequence must be generated, starting with an intra-coded frame, which carries significantly larger bandwidth costs to transmit than inter-coded frames. Consequently, although it is desirable to adaptively transmit a down-sampled, low resolution video over a network when network bandwidth becomes low, reduced or throttled, it is difficult to realize bandwidth savings while using conventional video coding formats, because the bandwidth costs of adaptively down-sampling offset the bandwidth gains.
In the development of the next-generation video codec specification, VVC/H. 266, several new motion prediction coding tools are provided to support motion vector coding which references previous frames when those previous frames have different resolutions. Based on these new coding tools, resolution differences in a video sequence may enable techniques which conserve computational costs. For example, it has been proposed to resize motion information of a current frame during motion prediction implemented at a decoder to account for these resolution differences, while avoiding computationally costly processes of up-sampling of reference filters and application of interpolation filters thereto.
However, since video decoding is performed on end-user computing devices, processing power and battery charge of these devices may vary. Therefore, in practical implementations of coding tools on end-user computing devices, it is desirable to take actual processing power and specific systems into consideration in performing the above-referenced proposed techniques. Thus, it is desired to exercise greater control over when techniques that conserve computational power should be performed, and when they do not need to be performed.
BRIEF DESCRIPTION OF THE DRAWINGS
The detailed description is set forth with reference to the accompanying figures. In the figures, the left-most digit (s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in different figures indicates similar or identical items or features.
FIGS. 1A, 1B, 1C, and 1D illustrate examples of partitioning of frames and subunits thereof.
FIG. 2 illustrates an example block diagram of a video coding process according to an example embodiment of the present disclosure.
FIG. 3 illustrates an example of motion prediction by up-sampling a reference frame.
FIGS. 4A and 4B illustrate a flowchart of a video coding method implementing resolution-adaptive video coding according to example embodiments of the present disclosure.
FIGS. 5A through 5C illustrate an example of motion prediction without resizing a reference frame according to example embodiments of the present disclosure.
FIG. 6 illustrates an example system for implementing the processes and methods described herein for implementing resolution-adaptive video coding.
DETAILED DESCRIPTION
Systems and methods discussed herein are directed to implement inter-frame adaptive resolution change in a video encoder and a video decoder, and more specifically to implement selective control of conditional application of interpolation filters to a reference frame to enable inter-frame adaptive resolution changes based on motion prediction video coding standards.
According to example embodiments of the present disclosure implemented to be compatible with AVC, HEVC, VVC, and such video coding standards implementing motion prediction, a frame may be subdivided into macroblocks (MBs) each having dimensions of 16x16 pixels, which may be further subdivided into partitions. According to example embodiments of the present disclosure implemented to be compatible with the HEVC standard, a frame may be subdivided into coding tree  units (CTUs) , the luma and chroma components of which may be further subdivided into coding tree blocks (CTBs) which are further subdivided into coding units (CUs) . According to example embodiments of the present disclosure implemented as other standards, a frame may be subdivided into units of NxN pixels, which may then be further subdivided into subunits. Each of these largest subdivided units of a frame may generally be referred to as a “block” for the purpose of this disclosure.
According to example embodiments of the present disclosure, a block may be subdivided into partitions having dimensions in multiples of 4x4 pixels. For example, a partition of a block may have dimensions of 8x4 pixels, 4x8 pixels, 8x8 pixels, 16x8 pixels, or 8x16 pixels.
According to example embodiments of the present disclosure, motion prediction coding formats may refer to data formats wherein frames are encoded with motion vector information and prediction information of a frame by the inclusion of one or more references to motion information and prediction units (PUs) of one or more other frames. Motion information may refer to data describing motion of a block structure of a frame or a unit or subunit thereof, such as motion vectors and references to blocks of a current frame or of another frame. PUs may refer to a unit or multiple subunits corresponding to a block structure among multiple block structures of a frame, such as an MB or a CTU, wherein blocks are partitioned based on the frame data and are coded according to established video codecs. Motion information corresponding to a PU may describe motion prediction as encoded by any motion vector coding tool, including, but not limited to, those described herein.
Likewise, frames may be encoded with transform information by the inclusion of one or more transformation units (TUs) . Transform information may refer to coefficients representing one of several spatial transformations, such as a diagonal flip, a vertical flip, or a rotation, which may be applied to a sub-block.
Sub-blocks of CUs such as PUs and TUs may be arranged in any combination of sub-block dimensions as described above. A CU may be subdivided into a residual quadtree (RQT) , a hierarchical structure of TUs. The RQT provides an order for motion  prediction and residual coding over sub-blocks of each level and recursively down each level of the RQT.
A sequence of CTUs of a frame which encompass a rectangular region of a frame may be collectively referred to as a tile. CTUs of a frame may, collectively, make up multiple tiles. Tiles of a same width may make up a column of tiles, and tiles of a same height may make up a row of tiles. A frame may be subdivided into one or more columns of tiles and one or more rows of tiles.
Tiles or CTUs may make up slices of a frame. A frame may be subdivided into some number of slices, and the entire frame may be subdivided according to raster-scan slice mode or according to rectangular slice mode. A slice of a frame subdivided according to raster-scan slice mode may include multiple tiles which make up a region shaped not according to geometrically shaped regions but generally according to tiles in a raster scan sequence. A slice according to rectangular slice mode may include multiple tiles, or multiple CTUs of a tile, which make up a complete rectangular region. In either case, the multiple tiles may be any number of complete, contiguous tiles of the picture, not necessarily encompassing entire columns of tiles or entire rows of tiles. In the case of a slice according to raster-scan slice mode, the tiles making up a slice may be a consecutive sequence of tiles in a raster scan order across the frame (that is, a sequence of tiles proceeding row-wise from left to right over a row of tiles, which may start mid-row in any row of tiles and may end mid-row in any row of tiles) . In the case of a slice according to rectangular slice mode, the tiles making up a slice may be some number of tiles which collectively make up a rectangular region of the frame, or may be some number of CTUs making up consecutive, complete rows of CTUs within a tile, the CTUs collectively making up a rectangular region of the frame.
Slices may make up a subpicture of a frame. A frame may be subdivided into some number of subpictures, each of which may be a rectangular region of the frame, composed of some number of slices of the frame. Slices making up a subpicture may be according to raster-scan slice mode and/or rectangular slice mode as long as the slices collectively make up a rectangular region.
FIGS. 1A, 1B, 1C, and 1D illustrate examples of partitioning of frames and subunits thereof. As FIG. 1A illustrates, a frame is subdivided into 12 tiles, each outlined in non-bold lines; each tile is subdivided into a further number of CTUs, each outlined in broken lines, where the exact number of CTUs illustrated need not be the actual number of CTUs making up each tile. The frame is also subdivided into three slices, each slice according to raster-scan slice mode; picture data of the frame is completely encompassed by the three slices, each outlined in bold lines. Each of the slices is made up of a number of complete tiles. The three slices are alternatingly shaded merely to distinguish adjacent slices, and there is no particular distinction between shaded slices and non-shaded slices.
As FIG. 1B illustrates, a frame is subdivided into 24 tiles, each outlined in non-bold lines; each tile is subdivided into a further number of CTUs, each outlined in broken lines, where the exact number of CTUs illustrated need not be the actual number of CTUs making up each tile. The tiles of the frame further make up nine slices, each slice according to rectangular slice mode; picture data of the frame is completely encompassed by the nine slices, each outlined in bold lines. Each of the slices is made up of a number of complete tiles. The nine slices are alternatingly shaded merely to distinguish adjacent slices, and there is no particular distinction between shaded slices and non-shaded slices.
As FIG. 1C illustrates, a frame is subdivided into four tiles, each outlined in non-bold lines (though some of the non-bold lines are obstructed by bold lines, it should be understood that the four tiles are equal in size) ; each tile is subdivided into a further number of CTUs, each outlined in broken lines, where the exact number of CTUs illustrated need not be the actual number of CTUs making up each tile. The tiles of the frame further make up four slices, each slice according to rectangular slice mode; picture data of the frame is completely encompassed by the four slices, each outlined in bold lines. Not all of the slices are made up of complete tiles, and some of the slices are made up of CTUs of a tile making up a complete rectangular region instead of a complete tile.
As FIG. 1D illustrates, a frame is subdivided into 28 subpictures, not all subpictures having like dimensions or proportions. Slices making up each of these subpictures are not illustrated in further detail, for conciseness, though slices making up subpictures having like dimensions or proportions need not be alike in shape, dimensions or proportions therebetween. Tiles making up the frame are also not illustrated in further detail, for conciseness.
A video encoder according to motion prediction coding may obtain a picture from a video source and code the frame to obtain a reconstructed frame that may be output for transmission. Blocks of a reconstructed frame may be intra-coded or inter-coded.
FIG. 2 illustrates an example block diagram of a video coding process 200 according to an example embodiment of the present disclosure.
In a video coding process 200, a picture from a video source 202 may be encoded to generate a reconstructed frame, and the reconstructed frame may be output at a destination such as a reference frame buffer 204 or a transmission buffer 216. The picture may be input into a coding loop, which may include the steps of inputting the picture into a first in-loop up-sampler or down-sampler 206, generating an up-sampled or down-sampled picture, inputting the up-sampled or down-sampled picture into a video encoder 208, generating a reconstructed frame based on a previous reconstructed frame of the reference frame buffer 204, inputting the reconstructed frame into one or more in-loop filters 210, and outputting the reconstructed frame from the loop, which may include inputting the reconstructed frame into a second up-sampler or down-sampler 214, generating an up-sampled or down-sampled reconstructed frame, and outputting the up-sampled or down-sampled reconstructed frame into the reference frame buffer 204 or into a transmission buffer 216 to be transmitted to a bitstream.
Data in the transmission buffer 216 carries headers defined according to a video coding standard. For example, according to the AVC standard, data in the transmission buffer 216 may be in the form of Network Abstraction Layer ( “NAL” ) units. Each NAL unit may include parameters recited by a series of headers written by  a video encoder which instruct a video decoder to generate reconstructed frames in accordance with the parameters.
Parameters of these headers may apply to pictures in a video sequence or any smaller unit of pictures or subunits thereof within a video sequence. For example, according to the AVC standard, a video encoder may write a header seq_parameter_set_rbsp () at a video sequence level, the header including parameters applying to all pictures in a video sequence and all subunits thereof. A video encoder may write a header pic_parameter_set_rbsp () at a picture level, the header including parameters applying to a picture and all subunits thereof. At a subpicture level the picture-level header pic_parameter_set_rbsp () may be shared in common with each subpicture except with respect to parameters pertaining to, for example, position, width, and height. A video encoder may write a header slice_header () at a slice level, the header including parameters applying to a slice subunit of a picture and all further subunits thereof.
While video coding standards such as AVC may not currently support headers at other subunit levels, such as headers at a tile level, persons skilled in the art who wish to implement headers at a tile level may develop similar video coding standards wherein headers specify tile-level parameters. Implementation of such standards shall not be described in detail herein.
In a video decoding process 218, a coded frame is obtained from a source such as a bitstream 220. According to example embodiments of the present disclosure, given a current frame having position N in the bitstream 220, a previous frame having position N–1 in the bitstream 220 may have a resolution larger than or smaller than a resolution of the current frame, and a next frame having position N+1 in the bitstream 220 may have a resolution larger than or smaller than the resolution of the current frame. The current frame may be input into a coding loop, which may include the steps of inputting the current frame into a video decoder 222, inputting the current frame into one or more in-loop filters 224, inputting the current frame into a third in-loop up-sampler or down-sampler 228, generating an up-sampled or down-sampled reconstructed frame, and outputting the up-sampled or down-sampled reconstructed frame into the reference frame buffer 204. Alternatively, the current frame may be output from the loop, which may include outputting the up-sampled or down-sampled reconstructed frame into a display buffer.
Upon leading NAL units of a video sequence, a picture, a subpicture, or a slice being input into a video decoder 222, the video decoder 222 may read headers at a video sequence level, a picture level, a subpicture level, or a slice level from the NAL units, and perform motion prediction coding of the trailing video sequence, picture, subpicture, or slice based on parameters written into the respective headers.
According to example embodiments of the present disclosure, the video encoder 208 and the video decoder 222 may each implement a motion prediction coding format, including, but not limited to, those coding formats described herein. Generating a reconstructed frame based on a previous reconstructed frame of the reference frame buffer 204 may include inter-coded motion prediction as described herein, wherein the previous reconstructed frame may be an up-sampled or down-sampled reconstructed frame output by the in-loop up-sampler or down-sampler 214/228, and the previous reconstructed frame serves as a reference picture in inter-coded motion prediction as described herein.
According to example embodiments of the present disclosure, motion prediction information may include a motion vector identifying a predictor block. A motion vector may be a displacement vector representing a displacement between a current block and a predictor block that is referenced for coding of the current block. Displacement may be measured in pixels in a horizontal direction and a vertical direction over a current frame. The displacement vector may represent a displacement between a pixel of the current block and a corresponding pixel of the predictor block at the same positions within the respective blocks. For example, the displacement vector may represent a displacement from a pixel at an upper-left corner of the current block to a pixel at an upper-left corner of the predictor block.
Inter-coded motion prediction may add a block of a current frame and a motion vector of the current frame to locate a predictor block. For example, while decoding a block of the current frame, given a coordinate of a pixel at an upper-left  corner of the block of the current frame, a motion vector may indicate a coordinate of a predictor block of a reference frame from which motion information should be derived for the block of the current frame. The coordinate of the predictor block of the reference frame may be located by adding the motion vector to the coordinate of the block of the current frame, assuming that the current frame and the reference frame have the same resolution such that pixels correspond one-to-one between the current frame and the reference frame.
Moreover, motion prediction may support accuracy to an integer pixel scale or to a sub-pixel scale. For example, according to example embodiments of the present disclosure implemented according to the HEVC standard, motion prediction may be accurate to a half-pixel scale, such that an interpolation filter is applied to a frame to interpolate the frame by a factor of 2. That is, between every two pixels of the frame, one pixel is generated as sub-pixel picture information. An interpolation filter by a factor of 2 may be implemented as, for example, a 2-tap bilinear filter. For example, according to example embodiments of the present disclosure implemented according to the HEVC standard, motion prediction may be accurate to a quarter-pixel scale, such that an interpolation filter is applied to a frame to interpolate the frame by a factor of 4. That is, between every two pixels of the frame, three pixels are generated as sub-pixel picture information. An interpolation filter by a factor of 4 may be implemented as, for example, a 7-tap bilinear filter and an 8-tap Discrete Cosine Transform (DCT) -based finite impulse response (FIR) filter.
Interpolation may occur in a first stage wherein interpolation is performed to half-pixel accuracy, such that a first interpolation filter is applied to the frame to interpolate the frame by a factor of 2, and then a second stage wherein interpolation is performed to a quarter-pixel accuracy. Motion prediction accuracy to a sub-pixel scale may increase quality of compressed frames over motion prediction accuracy to an integer pixel scale, but at the cost of increased computational cost and computing time for each pixel interpolated.
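For a rough sense of the per-pixel cost difference between these tap counts, the following sketch computes one half-pixel luma sample with a 2-tap bilinear kernel and with an 8-tap kernel. This is a minimal sketch, not the normative filtering process: the 8-tap coefficients are those commonly published for HEVC's half-pel luma DCT-based interpolation filter, and edge handling and bit-depth clipping are omitted.

```c
#include <stdint.h>

/* 2-tap bilinear half-pel sample: rounded average of the two neighbors. */
static int16_t halfpel_bilinear(const uint8_t *src, int x)
{
    return (int16_t)((src[x] + src[x + 1] + 1) >> 1);
}

/* 8-tap half-pel sample; coefficients as commonly published for HEVC's
 * half-pel luma DCT-based interpolation filter (sum = 64). Edge handling
 * and bit-depth clipping are omitted for brevity. */
static int16_t halfpel_8tap(const uint8_t *src, int x)
{
    static const int c[8] = { -1, 4, -11, 40, 40, -11, 4, -1 };
    int acc = 0;
    for (int i = 0; i < 8; i++)
        acc += c[i] * src[x - 3 + i];   /* taps centered between x and x+1 */
    return (int16_t)((acc + 32) >> 6);  /* round and normalize by 64 */
}
```

The 8-tap kernel performs four times as many multiply-accumulate operations per generated sample as the bilinear kernel, which is the cost difference the tap-count adjustments described later in this disclosure exploit.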
According to example embodiments of the present disclosure, a first up-sampler or down-sampler 206, a second up-sampler or down-sampler 214, and a third  up-sampler or down-sampler 228 may each implement an up-sampling or down-sampling algorithm suitable for respectively at least up-sampling or down-sampling coded pixel information of a frame coded in a motion prediction coding format. A first up-sampler or down-sampler 206, a second up-sampler or down-sampler 214, and a third up-sampler or down-sampler 228 may each implement an up-sampling or down-sampling algorithm further suitable for respectively upscaling and downscaling motion information such as motion vectors.
A frame serving as a reference picture in generating a reconstructed frame for the current frame, such as the previous reconstructed frame, may therefore be up-sampled or down-sampled in accordance with the resolution of the current frame relative to the resolutions of the previous frame and of the next frame. For example, the frame serving as the reference picture may be up-sampled in the case that the current frame has a resolution larger than the resolutions of either or both the previous frame and the next frame. The frame serving as the reference picture may be down-sampled in the case that the current frame has a resolution smaller than either or both the previous frame and the next frame.
FIG. 3 illustrates an example 300 of motion prediction by up-sampling a reference frame as described above. A current frame 310 has a resolution three times the resolution of the reference frame 320, such that the current frame 310 has 9 pixels for each pixel of the reference frame 320; the ratio of the resolution of the reference frame 320 to the resolution of the current frame 310 may be 1: 3. Given a block 312 of the current frame 310 having coordinates (3, 3) (i.e., an upper-leftmost pixel of the block 312 has pixel coordinates (3, 3) ) , and a motion vector (1, 1) of the block 312, adding the motion vector to the coordinates of the block 312 yields the coordinates (4, 4) . Thus, the motion vector indicates a predictor block having coordinates at (4, 4) .
By up-sampling the reference frame 320 to an up-scaled reference frame 330, this results in the up-scaled reference frame 330 having pixels corresponding one-to-one to the current frame 310. Therefore, the coordinates (4, 4) may be applied directly to the up-scaled reference frame 330, and a predictor block 332 at (4, 4) in the up-scaled reference frame 330 may be used in motion prediction for the current frame 310.
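The FIG. 3 arithmetic can be stated compactly; the following minimal sketch (names illustrative) locates the predictor coordinates once the reference frame has been up-sampled to the current resolution.

```c
typedef struct { int x, y; } Coord;

/* FIG. 3: with reference frame 320 up-sampled to up-scaled reference frame
 * 330, pixels correspond one-to-one with current frame 310, so the
 * predictor block is located by direct addition: (3, 3) + (1, 1) = (4, 4). */
static Coord locate_predictor_upsampled(Coord block, Coord mv)
{
    return (Coord){ block.x + mv.x, block.y + mv.y };
}
```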
However, performing such operations upon a reference frame may incur substantial computational cost and computing time during a video coding process and/or a video decoding process. For example, up-sampling a reference frame utilizes interpolation filters to generate additional sub-pixel picture information between pixels of the original reference frame, to fill in the additional pixels of the up-sampled reference frame so that pixels of the reference frame correspond one-to-one to pixels of the current frame. When a picture is upsized by a factor of x, for every pixel of the original picture, at least an additional x –1 number of pixels of the up-sampled picture will start out empty and must be filled by an interpolation filter. However, motion prediction utilizing a reference frame is unlikely to reference the majority of the new pixels generated by the interpolation filter, as references to the reference frame are generally to particular predictor blocks of the reference frame pointed to by motion vectors of a current frame. Thus, during a video decoding process, applying an interpolation filter to a reference frame may cause computation of many pixels that ultimately do not contribute to the video decoding process.
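The cost described above can be quantified with simple arithmetic; the helper below is illustrative. Up-sampling by a factor of x in each dimension leaves x * x - 1 empty pixels per original pixel, consistent with "at least x - 1" above; for FIG. 3, where x = 3, 8 of every 9 pixels of the up-scaled reference frame 330 must be generated by the interpolation filter.

```c
/* Number of pixels of the up-sampled picture that start out empty and must
 * be filled by an interpolation filter, when a picture of the given
 * dimensions is up-sampled by factor x in each dimension. */
static long interpolated_pixel_count(long orig_width, long orig_height, int x)
{
    return orig_width * orig_height * ((long)x * x - 1);
}
```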
In particular, example systems performing the above-described operations (such as examples of system 600 of FIG. 6 as described below) may vary in processing power, battery charge, and otherwise computational capacity utilized to perform such operations. For example, an example system may be a mobile device having comparatively low processing power and limited battery capacity and not being recharged while a user of the mobile device plays a video on the mobile device. Moreover, the mobile device may play the video while the battery charge is not full, or while the battery charge is low. As video decoding is generally computationally intense, the battery charge may drain quickly as a consequence of applying an interpolation filter to a reference frame during motion prediction in such computing environments. Thus, on such example systems, it is desirable to decrease application of an interpolation filter in computing environments as described above.
Conversely, applying an interpolation filter has advantages that would be lost if the filter were entirely bypassed throughout the coding process. A reference frame having a sub-pixel interpolation filter applied thereto still provides an approximation of  motion information at sub-pixel accuracy; when motion information from a reference frame not interpolated at the sub-pixel level is used, the approximation may become less reliable, resulting in loss in picture information.
Furthermore, making determinations as to whether to decrease application of an interpolation filter may increase computation time of the motion prediction process by causing a video decoder to perform more operations per frame reconstructed. Thus, consistently decreasing application of an interpolation filter may increase latency of video playback to undesirable length. Therefore, it may be desirable to selectively decrease application of an interpolation filter, allowing for decreases while decreases are desirable and no decreases while decreases are not desirable, according to at least the computing environments as described above.
Alternatively, an example system may be a computing device having comparatively high processing power and/or a continual power source rather than a limited battery. Computational intensity of video decoding and applying an interpolation filter to a reference frame during motion prediction may be comparatively tolerable in such computing environments. Therefore, it may be desirable to selectively increase application of an interpolation filter, allowing for increases while increases are desirable and no increases while increases are not desirable, according to at least the computing environments as described above.
According to example embodiments of the present disclosure, upon a video decoder determining that a resolution of a reference frame is different from a resolution of a current frame, the video decoder may instead resize motion information of the current frame, including motion vectors, and based on parameters set in a header (such as a NAL unit header) which applies to the current frame (examples of which are given below) and under some conditions, the video decoder may apply, or partially apply, an interpolation filter to the reference frame before referencing motion information of the reference frame, and based on parameters set in a header (such as a NAL unit header) which applies to the current frame (examples of which are given below) and under other conditions, the video decoder may not apply an interpolation filter to the reference frame before referencing motion information of the reference frame. To resize motion  information of the current frame, the video decoder may determine a ratio of a resolution of the current frame to a resolution of the reference frame. Various deciding conditions may be set for determining whether to apply, or partially apply, an interpolation filter to the reference frame before referencing motion information of the reference frame. Deciding conditions may be predicated on one or more factors.
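A minimal sketch of this resizing follows, assuming for illustration a 1/16-pel fixed-point representation; the function, its parameters, and the fixed-point precision are not specified by the disclosure, and coordinates are assumed non-negative.

```c
/* Sketch: resize a pixel coordinate of the current frame to the reference
 * frame's resolution. ratio_num : ratio_den is the ratio of the reference
 * resolution to the current resolution (1 : 3 in FIG. 3). */
static int scale_coordinate(int v, int ratio_num, int ratio_den, int *phase16)
{
    int scaled16 = (v * 16 * ratio_num) / ratio_den;  /* 1/16-pel fixed point */
    *phase16 = scaled16 & 15;   /* nonzero phase: the coordinate falls at
                                 * sub-pixel accuracy, where the deciding
                                 * and/or discriminating conditions govern
                                 * whether an interpolation filter is
                                 * applied, partially applied, or skipped */
    return scaled16 >> 4;       /* integer pixel part */
}
```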
Alternatively, with regard to some components of the reference frame, the video decoder may apply, or partially apply, an interpolation filter thereto, and with regard to other components of the reference frame, the video decoder may not apply an interpolation filter to the reference frame, all before referencing motion information of the reference frame. Various discriminating conditions may be set for determining which components of a reference frame an interpolation filter is to be applied to.
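Combining a deciding condition with a discriminating condition might look like the following sketch. The policy shown (restrict the filter to luma components, and decrease to partial application on a resolution mismatch) is one illustrative configuration among many, not a normative rule; all names are hypothetical.

```c
#include <stdbool.h>

typedef enum { APPLY_FULL, APPLY_PARTIAL, APPLY_NONE } FilterApplication;

/* Decide how fully to apply the interpolation filter to one component of
 * the reference frame, given the header instruction, a deciding condition
 * (resolution mismatch), and a discriminating condition (luma only). */
static FilterApplication decide_application(bool conditional_decrease,
                                            int ref_width, int cur_width,
                                            bool is_luma, bool luma_only)
{
    if (!conditional_decrease)
        return APPLY_FULL;        /* header does not instruct a decrease    */
    if (luma_only && !is_luma)
        return APPLY_NONE;        /* discriminating condition: skip chroma  */
    if (ref_width != cur_width)
        return APPLY_PARTIAL;     /* deciding condition: decrease, not skip */
    return APPLY_FULL;
}
```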
Below, examples of NAL unit header syntax according to the AVC video coding standard are listed, according to example embodiments of the present disclosure, in several tables. It should be understood that parameters of each example header are generally in accordance with established versions of the AVC standard, except as indicated. Moreover, according to example embodiments of the present disclosure, those parameters of each example header which are not in accordance with established versions of the AVC standard may be implemented without limitation as to where those parameters are implemented with regard to line number or with regard to relative syntactical position to any other parameters of the example headers, except that, by convention, a parameter including an if () clause referencing another parameter follows the referenced parameter and may be immediately subsequent to the referenced parameter. For parameters which do not reference other parameters, no such positional convention applies.
The left column of each table names parameters identified by particular bits of the NAL unit header. The right column describes how those bits should be parsed. For the purpose of understanding example embodiments of the present disclosure, it is sufficient to understand that “u (n) ” in the right column denotes that n consecutive bits of the header (trailing the bits described in rows above and preceding the bits described in rows below) should be parsed as a value for the corresponding parameter named in the left column.
References to rows of each table begin with row 1 being the second row from the top.
An example of a video sequence level NAL unit header syntax is given below in Table 1, applying to frames of a video sequence which is transmitted trailing the NAL unit header in a bitstream. A parameter according to example embodiments of the present disclosure is illustrated in row 15 of Table 1. According to an example embodiment of the present disclosure, the if () statement of row 14 is evaluated; according to another example embodiment of the present disclosure, the if () statement of row 14 is not evaluated (indicated by the “or” and the struck-out text in row 14) . According to those example embodiments where the if () statement is evaluated, the parameter of row 15 may be conditionally set as true or false only when the if () statement evaluates the value of the parameter of row 13 ( “ref_pic_resampling_enabled_flag” ) as true; otherwise, the parameter of row 15 may not be set. According to those example embodiments where the if () statement is not evaluated, the parameter of row 15 may always be set as true or false.
According to example embodiments of the present disclosure, the parameter of row 15 need not be located at those specific positions relative to the other parameters, and may be located anywhere relative to the other parameters, though by convention in those cases where the if () statement of row 14 is evaluated then the parameter of row 15 may directly follow the parameter of row 13 ( “ref_pic_resampling_enabled_flag” ) .
According to those example embodiments of the present disclosure where the if () statement is evaluated, a video encoder may selectively set the parameter of row 13 to a value which evaluates as true (causing the parameter of row 15 to be set) or a value which evaluates as false (causing the parameter of row 15 to not be set) . In those cases where the parameter of row 15 is set, the video encoder may then set the parameter of row 15 to a value which evaluates as true or a value which evaluates as false. The parameter of row 13 and the parameter of row 15 being set to a value which evaluates as true may compensate for, for example, computing environments wherein computational capacity is limited, battery charge is low, and/or battery capacity is limited, wherein application of an interpolation filter should be decreased. Alternately,  the parameter of row 15 being set to a value which evaluates as false may allow for, for example, computing environments wherein computational capacity is not limited, battery charge is not low, and/or battery capacity is not limited, wherein application of an interpolation filter may remain normal.
Though the video encoder may not be aware of a computing environment wherein the video decoder ultimately performs motion prediction, the video encoder may nevertheless set the parameter of row 13 and the parameter of row 15 based on, for example, whether motion prediction may be simplified by the below-mentioned decreases of application of interpolation filters for pictures encoded by the video encoder which are packetized into NAL units designated by each header. For example, where picture data trailing the header is encoded referencing frames of different resolutions, where the reference frame has a smaller resolution, interpolation may be conditionally decreased as described above to reduce computation cost, and thus the video encoder may set the parameter of row 13 and the parameter of row 15 to values which evaluate as true. However, where picture data trailing the header is encoded referencing frames where the reference frame does not have a different resolution, or has a larger resolution, interpolation being conditionally decreased as described above may fail to reduce computation cost, and thus the video encoder may set at least either the parameter of row 13 or the parameter of row 15 to a value which evaluates as false. Thus, the conditional decrease of interpolation may be selectively controlled through NAL unit headers.
According to those example embodiments of the present disclosure where the if () statement is not evaluated, a video encoder may always set the parameter of row 15 to either a value which evaluates as true or a value which evaluates as false, so that interpolation is always conditionally decreased. Thus, the conditional decrease of interpolation may be uniformly applied rather than selectively controlled.
According to above example embodiments of the present disclosure, the parameter of row 15 may be “simplified_resampling_filter_flag, ” which, when evaluating as true, instructs the video decoder to decrease (as described subsequently) application of an interpolation filter to picture data of a video sequence trailing the  header. However, according to yet further example embodiments of the present disclosure, the parameter of row 15 may be a different parameter which, when evaluating as true, instructs the video decoder to increase (as described subsequently) application of an interpolation filter to picture data of a video sequence trailing the header.
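An encoder-side policy for setting these flags, in line with the discussion above, might be sketched as follows. The function and its input are hypothetical, and a real encoder would weigh additional factors.

```c
#include <stdbool.h>

/* Illustrative encoder-side policy: enable the conditional decrease only
 * where it can reduce computation, i.e., where trailing picture data
 * references frames of smaller resolution. */
static void set_sequence_flags(bool refs_include_smaller_resolution,
                               bool *ref_pic_resampling_enabled_flag,
                               bool *simplified_resampling_filter_flag)
{
    *ref_pic_resampling_enabled_flag   = refs_include_smaller_resolution;
    *simplified_resampling_filter_flag = refs_include_smaller_resolution;
}
```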
[Table 1 — video sequence-level NAL unit header syntax, reproduced in the original filing as images PCTCN2019129943-appb-000001 through PCTCN2019129943-appb-000004; it includes ref_pic_resampling_enabled_flag at row 13, the optional if () statement at row 14, and simplified_resampling_filter_flag at row 15.]
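On the decoder side, the gating described above for Table 1 might be parsed as in the following sketch: when the if () statement of row 14 is evaluated, the row 15 flag is read only if ref_pic_resampling_enabled_flag is true. The bit reader, the u (1) widths, and the surrounding header layout are assumptions for illustration; a real decoder would parse these fields inside its full sequence-header parser.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { const uint8_t *buf; size_t pos; } BitReader;  /* pos in bits */

/* u (n) of the syntax tables: read n consecutive bits, MSB first. */
static unsigned read_u(BitReader *br, int n)
{
    unsigned v = 0;
    while (n--) {
        v = (v << 1) | ((br->buf[br->pos >> 3] >> (7 - (br->pos & 7))) & 1u);
        br->pos++;
    }
    return v;
}

/* Sketch of the Table 1 gating: surrounding sequence-header fields omitted. */
static void parse_resampling_flags(BitReader *br, bool evaluate_if,
                                   bool *resampling_enabled,
                                   bool *simplified_filter)
{
    *resampling_enabled = read_u(br, 1);      /* row 13, u (1)              */
    *simplified_filter = false;               /* default when not set       */
    if (!evaluate_if || *resampling_enabled)  /* row 14: optional if ()     */
        *simplified_filter = read_u(br, 1);   /* row 15, u (1)              */
}
```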
Examples of a picture level NAL unit parameter header syntax and a picture level NAL unit header syntax, respectively, are given below in Tables 2 and 3, applying to a picture which is transmitted trailing the NAL unit header in a bitstream. A parameter according to example embodiments of the present disclosure is illustrated in row 6 of Table 2 and in row 90 of Table 3. According to example embodiments of the present disclosure, the if () statement of row 5 of Table 2 or the if () statement of row 89 of Table 3 is evaluated; according to other example embodiments of the present disclosure, the if () statement of row 5 of Table 2 or the if () statement of row 89 of Table 3 is not evaluated (indicated by the “or” and the struck-out text in row 5 of Table 2 and row 89 of Table 3) . According to those example embodiments where the if () statement is evaluated, the parameter of row 6 of Table 2 or the parameter of row 90 of Table 3 may be conditionally set as true or false only when the respective if () statement evaluates the value of the parameter “ref_pic_resampling_enabled_flag” (which may be found in a video sequence-level header as shown in Table 1, rather than a picture-level parameter header as shown in Table 2 or a picture-level header as shown in Table  3) as true; otherwise, the parameter of row 6 of Table 2 or the parameter of row 90 of Table 3 may not be set. According to those example embodiments where the if () statement is not evaluated, the parameter of row 6 of Table 2 or the parameter of row 90 of Table 3 may always be set as true or false.
According to example embodiments of the present disclosure, the parameter of row 6 of Table 2 or the parameter of row 90 of Table 3 need not be located at those specific positions relative to the other parameters, and may be located anywhere relative to the other parameters; since the if () statement of row 5 of Table 2 and the if () statement of row 89 of Table 3 cannot follow the parameter they are evaluating (which is located in a different header) , convention does not dictate positioning of the if () statements either.
According to those example embodiments of the present disclosure where the if () statement is evaluated, a video encoder may selectively set the parameter “ref_pic_resampling_enabled_flag” at a video sequence-level header to a value which evaluates as true (causing the parameter of row 6 of Table 2 or the parameter of row 90 of Table 3 to be set) or a value which evaluates as false (causing the parameter of row 6 of Table 2 or the parameter of row 90 of Table 3 to not be set) . In those cases where the parameter of row 6 of Table 2 or the parameter of row 90 of Table 3 is set, the video encoder may then set the parameter of row 6 of Table 2 or the parameter of row 90 of Table 3 to a value which evaluates as true or a value which evaluates as false. The parameter “ref_pic_resampling_enabled_flag” and the parameter of row 6 of Table 2, or the parameter “ref_pic_resampling_enabled_flag” and the parameter of row 90 of Table 3, being set to values which evaluate as true may compensate for, for example, computing environments wherein computational capacity is limited, battery charge is low, and/or battery capacity is limited, wherein application of an interpolation filter should be decreased. Alternately, the parameter “ref_pic_resampling_enabled_flag” being set to a value which evaluates as true and the parameter of row 6 of Table 2 or the parameter of row 90 of Table 3 being set to a value which evaluates as false may allow for, for example, computing environments wherein computational capacity is not limited, battery charge is not low, and/or battery capacity is not limited, wherein application of an interpolation filter may remain normal.
Though the video encoder may not be aware of the computing environment wherein the video decoder ultimately performs motion prediction, the video encoder may nevertheless set the parameter of row 6 of Table 2 or the parameter of row 90 of Table 3 based on, for example, whether motion prediction may be simplified by the below-mentioned decreases of application of interpolation filters for pictures encoded by the video encoder which are packetized into NAL units designated by each header. For example, where picture data trailing the header is encoded referencing frames of different resolutions, and the reference frame has a smaller resolution, interpolation may be conditionally decreased as described above to reduce computation cost, and thus the video encoder may set the parameter of row 6 of Table 2 or the parameter of row 90 of Table 3 to a value which evaluates as true. However, where picture data trailing the header is encoded referencing frames where the reference frame does not have a different resolution, or has a larger resolution, conditionally decreasing interpolation as described above may fail to reduce computation cost, and thus the video encoder may set at least one of the parameter of row 6 of Table 2 and the parameter of row 90 of Table 3 to a value which evaluates as false. Thus, the conditional decrease of interpolation may be selectively controlled through NAL unit headers.
According to those example embodiments of the present disclosure where the if () statement is not evaluated, a video encoder always sets the parameter of row 6 of Table 2 or the parameter of row 90 of Table 3, whether to a value which evaluates as true or to a value which evaluates as false, so that whether interpolation is conditionally decreased is always signaled. Thus, the conditional decrease of interpolation may be uniformly applied rather than selectively controlled.
According to example embodiments of the present disclosure, the parameter of row 6 of Table 2 or the parameter of row 90 of Table 3 may be “simplified_resampling_filter_flag, ” which, when evaluating as true, instructs the video decoder to decrease (as described subsequently) application of an interpolation filter to picture data of a video sequence trailing the header. According to other example  embodiments of the present disclosure, the parameter of row 6 of Table 2 or the parameter of row 90 of Table 3 may be a different parameter which, when evaluating as true, instructs the video decoder to increase (as described subsequently) application of an interpolation filter to picture data of a video sequence trailing the header.
[Tables 2 and 3: picture-level NAL unit parameter header syntax and picture-level NAL unit header syntax, respectively. Reproduced only as images (PCTCN2019129943-appb-000005 through -000010) in the original document; row 6 of Table 2 and row 90 of Table 3 carry the parameter discussed above.]
An example of a slice level NAL unit header syntax is given below in Table 4, applying to a slice of a picture which is transmitted trailing the NAL unit header in a bitstream. A parameter according to example embodiments of the present disclosure is illustrated in row 39 of Table 4. According to example embodiments of the present disclosure, the if () statement of row 38 of Table 4 is evaluated; according to other example embodiments of the present disclosure, the if () statement of row 38 of Table 4 is not evaluated (indicated by the “or” and the struck-out text in row 38 of Table 4) . According to those example embodiments where the if () statement is evaluated, the parameter of row 39 of Table 4 may be conditionally set as true or false only when the respective if () statement evaluates the value of the parameter “ref_pic_resampling_enabled_flag” (which may be found in a video sequence-level header as shown in Table 1, rather than a slice-level header as shown in Table 4) as true; otherwise, the parameter of row 39 of Table 4 may not be set. According to those example embodiments where the if () statement is not evaluated, the parameter of row 39 of Table 4 may always be set as true or false.
According to example embodiments of the present disclosure, the parameter of row 39 of Table 4 need not be located at that specific position relative to the other parameters, and may be located anywhere relative to the other parameters; since the if () statement of row 38 of Table 4 cannot follow the parameter it is evaluating (which is located in a different header) , convention does not dictate positioning of the if () statement either.
According to those example embodiments of the present disclosure where the if () statement is evaluated, a video encoder may selectively set the parameter “ref_pic_resampling_enabled_flag” at a video sequence-level header to a value which evaluates as true (causing the parameter of row 39 of Table 4 to be set) or a value which evaluates as false (causing the parameter of row 39 of Table 4 to not be set) . In those cases where the parameter of row 39 of Table 4 is set, the video encoder may then set the parameter of row 39 of Table 4 to a value which evaluates as true or a value which evaluates as false. The parameter “ref_pic_resampling_enabled_flag” and the parameter of row 39 of Table 4 being set to values which evaluate as true may compensate for, for example, computing environments wherein computational capacity is limited, battery charge is low, and/or battery capacity is limited, wherein application of an interpolation filter should be decreased. Alternately, the parameter “ref_pic_resampling_enabled_flag” being set to a value which evaluates as true and the parameter of row 39 of Table 4 being set to a value which evaluates as false may allow for, for example, computing environments wherein computational capacity is not limited, battery charge is not low, and/or battery capacity is not limited, wherein application of an interpolation filter may remain normal.
Though the video encoder may not be aware of the computing environment wherein the video decoder ultimately performs motion prediction, the video encoder may nevertheless set the parameter of row 39 of Table 4 based on, for example, whether motion prediction may be simplified by the below-mentioned decreases of application of interpolation filters for pictures encoded by the video encoder which are packetized into NAL units designated by each header. For example, where picture data trailing the header is encoded referencing frames of different resolutions, and the reference frame has a smaller resolution, interpolation may be conditionally decreased as described above to reduce computation cost, and thus the video encoder may set the parameter of row 39 of Table 4 to a value which evaluates as true. However, where picture data trailing the header is encoded referencing frames where the reference frame does not have a different resolution, or has a larger resolution, conditionally decreasing interpolation as described above may fail to reduce computation cost, and thus the video encoder may set the parameter of row 39 of Table 4 to a value which evaluates as false. Thus, the conditional decrease of interpolation may be selectively controlled through NAL unit headers.
According to those example embodiments of the present disclosure where the if () statement is not evaluated, a video encoder always sets the parameter of row 39 of Table 4, whether to a value which evaluates as true or to a value which evaluates as false, so that whether interpolation is conditionally decreased is always signaled. Thus, the conditional decrease of interpolation may be uniformly applied rather than selectively controlled.
According to example embodiments of the present disclosure, the parameter of row 39 of Table 4 may be “simplified_resampling_filter_flag, ” which, when evaluating as true, instructs the video decoder to decrease (as described subsequently) application of an interpolation filter to picture data of a video sequence trailing the header. According to other example embodiments of the present disclosure, the parameter of row 39 of Table 4 may be a different parameter which, when evaluating as true, instructs the video decoder to increase (as described subsequently) application of an interpolation filter to picture data of a video sequence trailing the header.
[Table 4: slice-level NAL unit header syntax. Reproduced only as images (PCTCN2019129943-appb-000011 through -000013) in the original document; row 39 carries the parameter discussed above.]
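Since the same pair of flags recurs at the video sequence level (Table 1), the picture level (Tables 2 and 3), and the slice level (Table 4), a decoder must resolve which header governs a given frame or subunit. The following sketch is for illustration only: it assumes a precedence in which a slice-level flag overrides a picture-level flag, which overrides a sequence-level flag, an ordering the present disclosure does not itself mandate.

# Illustrative sketch only: resolving the flag across header levels. Each
# header argument is a dictionary such as produced by the parsing sketch
# above; the slice-over-picture-over-sequence precedence is an assumption.
def simplified_filter_enabled(seq_header, pic_header=None, slice_header=None):
    # The gating flag resides only in the video sequence-level header (Table 1).
    if not seq_header.get("ref_pic_resampling_enabled_flag", False):
        return False
    for header in (slice_header, pic_header, seq_header):
        if header is not None and "simplified_resampling_filter_flag" in header:
            return header["simplified_resampling_filter_flag"]
    return False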
FIGS. 4A and 4B illustrate a flowchart of a video coding method 400 implementing resolution-adaptive video coding according to example embodiments of the present disclosure.
At step 402, a video decoder obtains an inter-coded current frame from a sequence. The current frame may have a position N. A previous frame having position N–1 in the sequence may have a resolution larger than or smaller than a resolution of the current frame, and a next frame having position N+1 in the sequence may have a resolution larger than or smaller than the resolution of the current frame.
At step 404, the video decoder obtains a reference frame from a reference frame buffer.
At step 406, the video decoder determines that one or more headers of the sequence applying to at least the current frame or a subunit thereof instructs the video decoder to conditionally decrease or increase application of an interpolation filter to the reference frame or components thereof.
As described above with reference to Tables 1 to 4, the video decoder may determine that a “ref_pic_resampling_enabled_flag” parameter of a video sequence-level header evaluates to true, and then may determine that a subsequent parameter of row 15 of Table 1, row 6 of Table 2, row 90 of Table 3, or row 39 of Table 4 evaluates to true. Alternately, according to those example embodiments where the if () statement is not evaluated, the video decoder may directly determine that a parameter of row 15 of Table 1, row 6 of Table 2, row 90 of Table 3, or row 39 of Table 4 evaluates to true. According to example embodiments of the present disclosure, this determination may instruct the video decoder to conditionally decrease application of an interpolation filter to the reference frame or components thereof according to the subsequent steps 408 to 420, or to increase application of an interpolation filter to the reference frame or components thereof according to other alternative steps not described herein. In the event that the one or more headers of the sequence applying to at least the current frame or a subunit thereof instructs the video decoder to not increase or decrease application of an interpolation filter to the reference frame or components thereof (such as a “ref_pic_resampling_enabled_flag” parameter of a video sequence-level header evaluating to false) , the subsequent steps 408 to 420 or other alternative steps not described herein may be skipped.
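For illustration only, the determination of step 406 may be sketched as the following gate over the subsequent steps; the header dictionary and flag names follow Tables 1 to 4, and the returned labels are hypothetical.

# Illustrative sketch only: step 406 as a gate over steps 408 to 420.
def header_gate(header):
    if not header.get("ref_pic_resampling_enabled_flag", False):
        return "skip_steps_408_to_420"
    if header.get("simplified_resampling_filter_flag", False):
        return "conditionally_decrease_interpolation"  # proceed to steps 408-420
    # Per other example embodiments, a different parameter may instead
    # instruct an increase; that alternative path is not sketched here.
    return "normal_interpolation"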
At step 408, the video decoder compares a resolution of the reference frame to a resolution of the current frame and determines that a resolution of the reference frame is different from the resolution of the current frame.
According to example embodiments of the present disclosure, the reference frame having a resolution different from the resolution of the current frame may be, for example, a most recent frame of the reference frame buffer, though it may also be a frame other than the most recent frame of the reference frame buffer.
According to example embodiments of the present disclosure, the video decoder may further determine that the resolution of the reference frame is larger than the resolution of the current frame, or that the resolution of the reference frame is smaller than the resolution of the current frame.
At step 410, the video decoder determines a ratio of the resolution of the reference frame to the resolution of the current frame.
At step 412, the video decoder determines, by a deciding condition and/or a discriminating condition, whether to apply an interpolation filter to the reference frame or components thereof, or to partially apply an interpolation filter to the reference frame or components thereof, or to not apply an interpolation filter to the reference frame.
According to example embodiments of the present disclosure, various deciding conditions are possible, as long as, over the course of a video coding process for a video source, a deciding condition results in interpolation filters not being applied, or not being fully applied, to the reference frames of at least some current frames, while still being applied or partially applied to at least some reference frames of some current frames; in other words, the video decoder decides to decrease application of an interpolation filter to the reference frame. Partial application of an interpolation filter may refer to applying a first stage of an interpolation filter but not applying a second stage of the interpolation filter, as described above.
For example, a deciding condition may be whether the resolution of the reference frame is larger than or smaller than the resolution of the current frame, such that in the case that the resolution of the reference frame is larger than the resolution of the current frame, an interpolation filter is not applied to the reference frame or an interpolation filter is only partially applied to the reference frame, or such that in the case that the resolution of the reference frame is smaller than the resolution of the current frame, an interpolation filter is not applied to the reference frame or an interpolation filter is only partially applied to the reference frame. Alternatively, a deciding condition may be whether dimensions of blocks of a reference frame or blocks of the current frame are larger than particular threshold dimensions or are smaller than particular threshold dimensions, such that in the case that blocks of the reference frame or blocks of the current frame are larger than particular threshold dimensions, an interpolation filter is not applied to the reference frame or an interpolation filter is only partially applied to the reference frame, or such that in the case that blocks of the reference frame or blocks of the current frame are smaller than particular threshold dimensions, an interpolation filter is not applied to the reference frame or an interpolation filter is only partially applied to the reference frame.
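For illustration only, the two example deciding conditions above may be sketched as follows; the 64×64 threshold dimensions and the returned policies are assumed values, since the disclosure permits either skipping or partially applying the filter under either condition.

# Illustrative sketch only: example deciding conditions. The 64x64 threshold
# and the mapping of conditions to policies are assumptions for illustration.
def deciding_condition(ref_resolution, cur_resolution, block_dims,
                       threshold=(64, 64)):
    ref_w, ref_h = ref_resolution
    cur_w, cur_h = cur_resolution
    if ref_w * ref_h > cur_w * cur_h:
        return "partial"  # e.g., apply only the first interpolation stage
    if block_dims[0] > threshold[0] and block_dims[1] > threshold[1]:
        return "none"     # do not apply the interpolation filter
    return "full"         # apply the interpolation filter normally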
According to example embodiments of the present disclosure, various discriminating conditions are possible, as long as, over the course of a video coding process for a video source, for at least some reference frames of some current frames, interpolation filters are not applied to some components of those reference frames, while interpolation filters are applied to other components of those reference frames. In other words, the video decoder decides to decrease application of an interpolation filter to components of the reference frame.
For example, a discriminating condition may be to apply an interpolation filter to chroma components of a reference frame and to not apply an interpolation filter to luma components of a reference frame, or to apply an interpolation filter to luma components of a reference frame and to not apply an interpolation filter to chroma components of a reference frame.
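The first example discriminating condition may likewise be sketched, for illustration only:

# Illustrative sketch only: interpolate chroma components but not luma
# components; the opposite policy is equally permitted above.
def discriminating_condition(component):
    return component in ("cb", "cr")  # True: apply the filter; False: skip it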
In those cases where the video decoder determines, by a deciding condition, to apply an interpolation filter to the reference frame, or to partially apply an interpolation filter to the reference frame, the video coding method 400 may proceed through to step 420 below and then perform those portions of step 420 corresponding to the respective case. In those cases where the video decoder determines, by a discriminating condition, to apply an interpolation filter to components of the reference frame, or to partially apply an interpolation filter to components of the reference frame, the video coding method 400 may proceed through to step 420 below and then perform those portions of step 420 corresponding to the respective case as applied to only those components of the reference frame. In those cases where the video decoder determines, by a deciding condition, to not apply an interpolation filter to the reference frame, the video coding method 400 may proceed through each subsequent step as described below, except that step 420 is skipped.
FIG. 5A illustrates an example 500 of motion prediction without resizing a reference frame as described herein. Similar to the illustration of FIG. 3, the current frame 510 has a resolution three times the resolution of the reference frame 520, such that the current frame 510 has 9 pixels for each pixel of the reference frame 520, and the ratio of the resolution of the reference frame 520 to the resolution of the current frame 510 is 1: 3.
At step 414, the video decoder determines a motion vector of the block of the current frame, and calculates a pixel coordinate indicated by the motion vector of the block of the current frame.
The motion vector may be determined in accordance with steps of motion prediction. Steps of performing motion prediction determining motion vectors shall not be described in detail herein, but may include, for example, deriving a motion candidate  list for the block of the current frame; selecting a motion candidate from the derived motion candidate list or merging candidate list; and deriving a motion vector of the motion candidate as a motion vector of the block of the current frame.
A decoder may decode a frame on a per-block basis in a coding order among blocks of the frame, such as a raster scan order wherein a first-decoded block is an uppermost and leftmost block of the frame, according to video encoding standards.
As illustrated by FIG. 5A, as an example 500 of calculating a pixel coordinate indicated by a motion vector, given a block 512 of the current frame 510 having coordinates (3, 3) (i.e., an upper-leftmost pixel of the block 512 has pixel coordinates (3, 3) ) , and a motion vector (1, 1) of the block 512, adding the motion vector to the coordinates of the block 512 yields the coordinates (4, 4) . Thus, the motion vector indicates a predictor block having coordinates at (4, 4) .
At step 416, the video decoder resizes motion information of the block of the current frame to a resolution of the reference frame in accordance with the ratio.
According to example embodiments of the present disclosure, resizing motion information may include adding the motion vector to the block coordinate, then scaling the resulting coordinate in accordance with the ratio to derive a scaled coordinate indicated by the motion vector.
As illustrated by FIG. 5A, given the ratio of the resolution of the reference frame 520 to the resolution of the current frame 510 being 1: 3, locating a predictor block at the coordinates (4, 4) in the reference frame 520 would produce an incorrect outcome. FIG. 5A illustrates this hypothetical predictor block 514 outlined by a hypothetical reference frame 516 at the same resolution as the current frame 510, which does not exist according to example embodiments of the present disclosure. Instead, the video decoder may multiply the coordinates (4, 4) by a factor of 1/3 based on the ratio 1: 3, resulting in the coordinates (4/3, 4/3) . FIG. 5A illustrates a hypothetical block 518 having these coordinates in the reference frame 520.
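The calculation of steps 414 and 416 for the example of FIG. 5A may be restated numerically as follows, for illustration only:

# Worked restatement of FIG. 5A: block at (3, 3), motion vector (1, 1), and
# a 1:3 ratio of reference resolution to current resolution.
block_coord = (3, 3)
motion_vector = (1, 1)
indicated = (block_coord[0] + motion_vector[0],
             block_coord[1] + motion_vector[1])  # (4, 4)
ratio = 1 / 3
scaled = (indicated[0] * ratio, indicated[1] * ratio)  # (4/3, 4/3)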
At step 418, the video decoder locates a predictor block of the reference frame in accordance with the motion information.
Resized motion information, by itself, may be insufficient for locating a predictor block of the reference frame. Particularly, scaled coordinates indicated by the motion vector may be in proportion to a resolution of the reference frame, but may not correspond to an integer pixel coordinate of the reference frame; may, additionally, not correspond to a half-pixel coordinate of the reference frame in the case that the video decoder implements half-pixel motion prediction; and may, additionally, not correspond to a quarter-pixel coordinate of the reference frame in the case that the video decoder implements quarter-pixel motion prediction. Thus, the video decoder may further round the scaled coordinate of the block to a nearest pixel scale or sub-pixel scale supported by the video decoder.
As illustrated by FIGS. 5B and 5C, for example, scaled coordinates of (4/3, 4/3) indicated by the motion vector may not correspond to any pixel coordinate in the reference frame 520, whether at integer pixel accuracy, half-pixel accuracy, or quarter-pixel accuracy. Therefore, the video decoder may round the scaled coordinates to quarter-pixel accuracy in the case that the video decoder supports quarter-pixel accuracy; thus, (4/3, 4/3) may be rounded to (1.25, 1.25) , locating a predictor block 522 at (1.25, 1.25) . The video decoder may round the scaled coordinates to half-pixel accuracy in the case that the video decoder does not support quarter-pixel accuracy but does support half-pixel accuracy; thus, (4/3, 4/3) may be rounded to (1.5, 1.5) , locating a predictor block (not illustrated) at (1.5, 1.5) . The video decoder may round the scaled coordinates to integer pixel accuracy in the case that the video decoder does not support either level of sub-pixel accuracy; thus, (4/3, 4/3) may be rounded to (1, 1) , locating a predictor block 524 at (1, 1) .
In the case that the scaled coordinates are already at sub-pixel accuracy, rounding may be unnecessary, and the video decoder may locate the predictor block directly at the scaled coordinates at the reference frame.
According to other example embodiments of the present disclosure, in the case that the scaled coordinates do not correspond to any level of accuracy supported by the video decoder, the video decoder may nevertheless not round the scaled coordinates to the highest granularity level of accuracy supported by the video decoder. Instead, the video decoder may round the scaled coordinates to a lower granularity level of accuracy than the highest level supported.
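The rounding described above may be restated, for illustration only, as rounding each scaled coordinate to the nearest multiple of the supported step size, where the step is 1 for integer accuracy, 0.5 for half-pixel accuracy, and 0.25 for quarter-pixel accuracy:

# Illustrative sketch only: round scaled coordinates to a supported pixel
# or sub-pixel grid; ties follow Python's round-half-to-even behavior.
def round_to_accuracy(coord, step):
    return tuple(round(c / step) * step for c in coord)

round_to_accuracy((4 / 3, 4 / 3), 0.25)  # (1.25, 1.25): quarter-pixel accuracy
round_to_accuracy((4 / 3, 4 / 3), 0.5)   # (1.5, 1.5): half-pixel accuracy
round_to_accuracy((4 / 3, 4 / 3), 1)     # (1, 1): integer pixel accuracy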
At step 420, in the cases that the scaled coordinates are at sub-pixel accuracy or are rounded to sub-pixel accuracy, the video decoder applies an interpolation filter to a block at the scaled coordinates at the reference frame to generate sub-pixel values of the predictor block. The interpolation filter may be applied as described above, and, furthermore, in the cases that the scaled coordinates are at half-pixel accuracy or are rounded to half-pixel accuracy, only the first stage of interpolation as described above may be performed, skipping the second stage, therefore reducing computational costs and computing time of decoding.
In the cases that the scaled coordinates are at integer pixel accuracy or are rounded to integer pixel accuracy, the video decoder does not need to apply an interpolation filter to pixels of the reference block, and step 420 may be skipped, with pixels at a block at the scaled coordinates at the reference frame being used directly in motion prediction. Avoidance of application of the interpolation filter may greatly reduce computational costs and computing time of decoding.
Similarly, in the case that the video decoder rounds the scaled coordinates to a lower granularity level of accuracy than the highest level supported as described with regard to step 418, computational costs and computing time may be likewise reduced.
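For illustration only, the selection at step 420 among skipping interpolation, performing only the first stage, and performing both stages may be sketched from the fractional parts of the (possibly rounded) scaled coordinates:

# Illustrative sketch only: choose how much of the two-stage interpolation
# to perform; frac_x and frac_y are the coordinate parts beyond the integer
# position, e.g. 0, 0.25, 0.5, or 0.75.
def interpolation_stages(frac_x, frac_y):
    if frac_x == 0 and frac_y == 0:
        return "none"        # integer position: step 420 is skipped entirely
    if frac_x in (0, 0.5) and frac_y in (0, 0.5):
        return "first_only"  # half-pixel position: the second stage is skipped
    return "both"            # quarter-pixel positions require both stages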
Subsequently, whether step 420 is performed or skipped, the video decoder may decode the current block by reference to the reference frame and the located predictor block therein. The decoded frame may be up-sampled or down-sampled to multiple resolutions each in accordance with a different resolution of a plurality of resolutions supported by a bitstream. The up-sampled and/or down-sampled decoded frames may be input into at least one of a reference frame buffer and a display buffer.
The above describes the normal operation of step 420 without application of deciding conditions and/or discriminating conditions of step 412. However, based on the outcome of step 412, the video decoder may determine, by a deciding condition and/or a discriminating condition, to increase application of an interpolation filter to the  reference frame or components thereof, or to decrease application of an interpolation filter to the reference frame or components thereof.
The video decoder deciding to increase application of the interpolation filter may mean that by the normal operation of step 420, the interpolation filter would not be applied to the reference frame or components thereof, and the outcome of step 420 is that the video decoder decides to partially apply or fully apply the interpolation filter to the reference frame or components thereof. Or, this may mean that, by the normal operation of step 420, the interpolation filter would be applied to the reference frame or components thereof in at most one stage, and the outcome of step 420 is that the video decoder decides to apply another interpolation filter to the reference frame or components thereof in a second stage. Or, this may mean that by the normal operation of step 420, the interpolation filter would be at most partially applied to the reference frame or components thereof, and the outcome of step 420 is that the video decoder decides to at least fully apply the interpolation filter to the reference frame or components thereof. Or, this may mean that by the normal operation of step 420, the interpolation filter would be fully or partially applied to some components of the reference frame, and the outcome of step 420 is that the video decoder decides to fully or partially apply the interpolation filter to all components of the reference frame. Or, this may mean that by the normal operation of step 420, the interpolation filter applied would sample a smaller number of coefficients for each pixel generated, such as 2-tap filters, and the outcome of step 420 is that the video decoder decides to apply an interpolation filter that would apply a larger number of coefficients for each pixel generated, such as 7-tap and 8-tap filters.
The video decoder deciding to decrease application of the interpolation filter may mean that by the normal operation of step 420, the interpolation filter would be fully applied to the reference frame or components thereof, and the outcome of step 420 is that the video decoder decides to partially apply or not apply the interpolation filter to the reference frame or components thereof. Or, this may mean that by the normal operation of step 420, the interpolation filter would be partially applied to the reference frame or components thereof, and the outcome of step 420 is that the video decoder decides to not apply the interpolation filter to the reference frame or components thereof. Or, this may mean that by the normal operation of step 420, the interpolation filter would be fully or partially applied to all components of the reference frame, and the outcome of step 420 is that the video decoder decides to fully or partially apply the interpolation filter to some components of the reference frame. Or, this may mean that by the normal operation of step 420, the interpolation filter applied would sample a larger number of coefficients for each pixel generated, such as 7-tap and 8-tap filters, and the outcome of step 420 is that the video decoder decides to apply an interpolation filter that would apply a smaller number of coefficients for each pixel generated, such as 2-tap filters.
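The tap-count reading of increasing and decreasing application may be sketched as follows; the tap counts are those mentioned in this disclosure (2-tap as smaller, 7-tap and 8-tap as typical, 10-tap and 12-tap as larger), but the specific mapping is an assumption for illustration.

# Illustrative sketch only: swap filter lengths to decrease or increase
# application; the 8-tap baseline and the 2-tap/12-tap mapping are assumed.
def select_filter_taps(adjustment, normal_taps=8):
    if adjustment == "decrease":
        return 2
    if adjustment == "increase":
        return 12
    return normal_taps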
Some of the following outcomes may result from step 412 in conjunction with step 420 according to various configurations of deciding conditions and/or discriminating conditions. Each of these outcomes may be preferable to applying an interpolation filter to a reference frame without any decrease by deciding conditions and/or discriminating conditions, because each reduces computing time of the decoding process, though each of these outcomes may result in some degree of information loss.
In step 412, the deciding condition is that the resolution of the reference frame is larger than the resolution of the current frame, and the video decoder determines, by the deciding condition, to decrease application of an interpolation filter to the reference frame, and in step 420, the video decoder decreases application of an interpolation filter to the reference frame but does not skip application of an interpolation filter to the reference frame.
The above outcome may be desirable because, in conditions of decreased application of an interpolation filter, loss incurred from deriving motion information from a reference frame larger than the resolution of the current frame may not be too great, as the larger reference frame inherently contains more picture information than the current frame.
In step 412, the deciding condition is that the resolution of the reference frame is smaller than the resolution of the current frame, and the video decoder determines, by the deciding condition, to decrease application of an interpolation filter  to the reference frame, and in step 420, the video decoder decreases application of an interpolation filter to the reference frame.
Experimentally, in conditions of decreased application of an interpolation filter, loss incurred from deriving motion information from a reference frame smaller than the resolution of the current frame has been shown to be not too great, even though the smaller reference frame contains less picture information than the current frame.
As between the two above outcomes, loss incurred from deriving motion information from a reference frame larger than the resolution of the current frame has been experimentally shown to be somewhat greater than loss incurred from deriving motion information from a reference frame smaller than the resolution of the current frame. However, in either case, the loss incurred is still less than baseline loss incurred from skipping application of an interpolation filter.
In step 412, the discriminating condition is to apply an interpolation filter to chroma components of a reference frame and to not apply an interpolation filter to luma components of a reference frame, and the video decoder determines, by the discriminating condition, to decrease application of an interpolation filter to the luma components of the reference frame, and in step 420, the video decoder decreases application of an interpolation filter to the luma components of the reference frame.
The above outcome may be desirable because application of an interpolation filter to luma components may incur greater computational costs than application of an interpolation filter to chroma components.
In step 412, the discriminating condition is to apply an interpolation filter to luma components of a reference frame and to not apply an interpolation filter to chroma components of a reference frame, and the video decoder determines, by the discriminating condition, to decrease application of an interpolation filter to the chroma components of the reference frame, and in step 420, the video decoder decreases application of an interpolation filter to the chroma components of the reference frame.
In step 412, the deciding condition is that dimensions of blocks of a reference frame are larger than particular threshold dimensions, and the video decoder determines, by the deciding condition, to decrease application of an interpolation filter to the  reference frame, and in step 420, the video decoder decreases application of an interpolation filter to the reference frame.
The above outcome may be desirable because application of an interpolation filter to frames having larger blocks may incur greater computational costs than application of an interpolation filter to frames having smaller blocks.
In step 412, the deciding condition is that dimensions of blocks of a reference frame are smaller than particular threshold dimensions, and the video decoder determines, by the deciding condition, to decrease application of an interpolation filter to the reference frame, and in step 420, the video decoder decreases application of an interpolation filter to the reference frame.
Alternatively, in step 420, the video decoder may increase application of an interpolation filter to the reference frame. This may mean that the video decoder decides to apply an interpolation filter that would apply a larger number of coefficients for each pixel generated, such as 10-tap and 12-tap filters.
FIG. 6 illustrates an example system 600 for implementing the processes and methods described above for implementing resolution-adaptive video coding.
The techniques and mechanisms described herein may be implemented by multiple instances of the system 600 as well as by any other computing device, system, and/or environment. The system 600 shown in FIG. 6 is only one example of a system and is not intended to suggest any limitation as to the scope of use or functionality of any computing device utilized to perform the processes and/or procedures described above. Other well-known computing devices, systems, environments and/or configurations that may be suitable for use with the embodiments include, but are not limited to, personal computers, server computers, hand-held or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, game consoles, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, implementations using field programmable gate arrays ( “FPGAs” ) and application specific integrated circuits ( “ASICs” ) , and/or the like.
The system 600 may include one or more processors 602 and system memory 604 communicatively coupled to the processor (s) 602. The processor (s) 602 may execute one or more modules and/or processes to cause the processor (s) 602 to perform a variety of functions. In some embodiments, the processor (s) 602 may include a central processing unit ( “CPU” ) , a graphics processing unit ( “GPU” ) , both CPU and GPU, or other processing units or components known in the art. Additionally, each of the processor (s) 602 may possess its own local memory, which also may store program modules, program data, and/or one or more operating systems.
Depending on the exact configuration and type of the system 600, the system memory 604 may be volatile, such as RAM, non-volatile, such as ROM, flash memory, miniature hard drive, memory card, and the like, or some combination thereof. The system memory 604 may include one or more computer-executable modules 606 that are executable by the processor (s) 602.
The modules 606 may include, but are not limited to, a motion prediction module 608, which includes a frame obtaining submodule 610, a reference frame obtaining submodule 612, a header determining submodule 614, a resolution comparing submodule 616, a ratio determining submodule 618, a filter application determining submodule 620, a motion information determining submodule 622, a motion information resizing submodule 624, a predictor block locating submodule 626, and an interpolation filter applying submodule 628.
The frame obtaining submodule 610 may be configured to obtain an inter-coded current frame from a sequence as abovementioned with reference to FIGS. 4A and 4B.
The reference frame obtaining submodule 612 may be configured to obtain a reference frame from a reference frame buffer as abovementioned with reference to FIGS. 4A and 4B.
The header determining submodule 614 may be configured to determine that one or more headers of the sequence applying to at least the current frame or a subunit thereof instructs the video decoder to conditionally decrease or increase application of  an interpolation filter to the reference frame or components thereof, as abovementioned with reference to FIGS. 4A and 4B.
The resolution comparing submodule 616 may be configured to compare a resolution of the reference frame to a resolution of the current frame and determine that the resolution of the reference frame is different from the resolution of the current frame, as abovementioned with reference to FIGS. 4A and 4B.
The ratio determining submodule 618 may be configured to determine a ratio of the resolution of the reference frame to the resolution of the current frame, as abovementioned with reference to FIGS. 4A and 4B.
The filter application determining submodule 620 may be configured to determine, by a deciding condition and/or a discriminating condition, whether to apply an interpolation filter to the reference frame or components thereof, or to partially apply an interpolation filter to the reference frame or components thereof, or to not apply an interpolation filter to the reference frame, as abovementioned with reference to FIGS. 4A and 4B.
The motion information determining submodule 622 may be configured to determine a motion vector of the block of the current frame, and calculate a pixel coordinate indicated by the motion vector of the block of the current frame, as abovementioned with reference to FIGS. 4A and 4B.
The motion information resizing submodule 624 may be configured to resize motion information of the block of the current frame to a resolution of the reference frame in accordance with the ratio, as abovementioned with reference to FIGS. 4A and 4B.
The predictor block locating submodule 626 may be configured to locate a predictor block of the reference frame in accordance with the resized motion information, as abovementioned with reference to FIGS. 4A and 4B.
The interpolation filter applying submodule 628 may be configured to, in the cases that the scaled coordinates are at sub-pixel accuracy or are rounded to sub-pixel accuracy, apply an interpolation filter to a block at the scaled coordinates at the reference frame to generate sub-pixel values of the predictor block, and/or decrease the  application of the interpolation filter thereto, in accordance with a determination by the filter application determining submodule 620, as abovementioned with reference to FIGS. 4A and 4B.
The system 600 may additionally include an input/output (I/O) interface 640 for receiving video source data and bitstream data, and for outputting decoded frames into a reference frame buffer and/or a display buffer. The system 600 may also include a communication module 650 allowing the system 600 to communicate with other devices (not shown) over a network (not shown) . The network may include the Internet, wired media such as a wired network or direct-wired connections, and wireless media such as acoustic, radio frequency ( “RF” ) , infrared, and other wireless media.
Some or all operations of the methods described above can be performed by execution of computer-readable instructions stored on a computer-readable storage medium, as defined below. The term “computer-readable instructions, ” as used in the description and claims, includes routines, applications, application modules, program modules, programs, components, data structures, algorithms, and the like. Computer-readable instructions can be implemented on various system configurations, including single-processor or multiprocessor systems, minicomputers, mainframe computers, personal computers, hand-held computing devices, microprocessor-based, programmable consumer electronics, combinations thereof, and the like.
The computer-readable storage media may include volatile memory (such as random-access memory ( “RAM” ) ) and/or non-volatile memory (such as read-only memory ( “ROM” ) , flash memory, etc. ) . The computer-readable storage media may also include additional removable storage and/or non-removable storage including, but not limited to, flash memory, magnetic storage, optical storage, and/or tape storage that may provide non-volatile storage of computer-readable instructions, data structures, program modules, and the like.
A non-transient computer-readable storage medium is an example of computer-readable media. Computer-readable media includes at least two types of computer-readable media, namely computer-readable storage media and communications media. Computer-readable storage media includes volatile and non- volatile, removable and non-removable media implemented in any process or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer-readable storage media includes, but is not limited to, phase change memory ( “PRAM” ) , static random-access memory ( “SRAM” ) , dynamic random-access memory ( “DRAM” ) , other types of random-access memory ( “RAM” ) , read-only memory ( “ROM” ) , electrically erasable programmable read-only memory ( “EEPROM” ) , flash memory or other memory technology, compact disk read-only memory ( “CD-ROM” ) , digital versatile disks ( “DVD” ) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information for access by a computing device. In contrast, communication media may embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave, or other transmission mechanism. A computer-readable storage medium employed herein shall not be interpreted as a transitory signal itself, such as a radio wave or other free-propagating electromagnetic wave, electromagnetic waves propagating through a waveguide or other transmission medium (such as light pulses through a fiber optic cable) , or electrical signals propagating through a wire.
Computer-readable instructions stored on one or more non-transitory computer-readable storage media may, when executed by one or more processors, perform operations described above with reference to FIGS. 1A-6. Generally, computer-readable instructions include routines, programs, objects, components, data structures, and the like that perform particular functions or implement particular abstract data types. The order in which the operations are described is not intended to be construed as a limitation, and any number of the described operations can be combined in any order and/or in parallel to implement the processes.
By the abovementioned technical solutions, the present disclosure provides selective control of conditional application of interpolation filters to a reference frame to enable inter-frame adaptive resolution changes based on motion prediction video coding standards, decreasing application of an interpolation filter to compensate for  computing environments wherein computational capacity is limited, battery charge is low, and/or battery capacity is limited, and maintaining or increasing application of an interpolation filter to allow for computing environments wherein computational capacity is not limited, battery charge is not low, and/or battery capacity is not limited. The methods and systems described herein include determining that one or more headers of a sequence applying to at least a current frame or a subunit thereof instructs a video decoder to conditionally decrease or increase application of an interpolation filter to a reference frame or components thereof; determining a ratio of the resolution of a reference frame to the resolution of a current frame; determining a motion vector of the block of the current frame, and calculating a pixel coordinate indicated by the motion vector of the block of the current frame; locating a predictor block of a reference frame in accordance with the motion information; and subsequently applying an interpolation filter to a block at the scaled coordinates at the reference frame to generate sub-pixel values of the predictor block.
EXAMPLE CLAUSES
A. A method comprising: determining that one or more headers of a sequence applying to at least a current frame or a subunit thereof instructs, by a header syntax, conditional decreasing or conditional increasing of application of an interpolation filter during motion prediction; determining motion information of a block of the current frame or the subunit thereof, the motion information comprising a motion vector of the block and at least one pixel coordinate indicated by the motion vector; locating a predictor block of a reference frame in accordance with the motion information; decreasing or increasing, based on at least one of a deciding condition and a discriminating condition, application of an interpolation filter to the reference frame or components thereof; and performing motion prediction on the current block by reference to the located predictor block.
B. The method as paragraph A recites, further comprising resizing the motion information according to a resolution of the reference frame, wherein the predictor block of the reference frame is located in accordance with the resized motion information.
C. The method as paragraph A recites, further comprising determining that the resolution of the reference frame is different from a resolution of the current frame, and determining that the resolution of the reference frame is larger than or smaller than the resolution of the current frame.
D. The method as paragraph C recites, further comprising determining, by the at least one of a deciding condition or a discriminating condition, whether to apply an interpolation filter to the reference frame or components thereof, or to partially apply an interpolation filter to the reference frame or components thereof, or to not apply an interpolation filter to the reference frame.
E. The method as paragraph A recites, wherein a deciding condition is that the resolution of the reference frame is larger than the resolution of the current frame, and decreasing application of the interpolation filter to the reference frame is based on the deciding condition.
F. The method as paragraph A recites, wherein a deciding condition is that the resolution of the reference frame is smaller than the resolution of the current frame, and decreasing application of the interpolation filter to the reference frame is based on the deciding condition.
G. The method as paragraph A recites, wherein a discriminating condition is to apply the interpolation filter to chroma components of the reference frame and to not apply the interpolation filter to luma components of the reference frame, and decreasing application of the interpolation filter to the reference frame is based on the discriminating condition.
H. The method as paragraph A recites, wherein a discriminating condition is to apply the interpolation filter to luma components of the reference frame and to not apply the interpolation filter to chroma components of the reference frame, and decreasing application of the interpolation filter to the reference frame is based on the discriminating condition.
I. The method as paragraph A recites, wherein a deciding condition is that dimensions of blocks of the reference frame are larger than particular threshold dimensions, and decreasing application of the interpolation filter to the reference frame is based on the deciding condition.
J. The method as paragraph A recites, wherein a deciding condition is that dimensions of blocks of the reference frame are smaller than particular threshold dimensions, and decreasing application of the interpolation filter to the reference frame is based on the deciding condition.
K. The method as paragraph J recites, wherein decreasing application of the interpolation filter to the reference frame comprises one of:
partially applying the interpolation filter to the reference frame;
not applying the interpolation filter to the reference frame; and
applying the interpolation filter to some, but not all, components of the reference frame.
L. The method as paragraph A recites, wherein the one or more headers comprises a video sequence-level header.
M. The method as paragraph A recites, wherein the one or more headers comprises a picture-level header.
N. The method as paragraph A recites, wherein the one or more headers comprises a slice-level header.
O. The method as paragraph A recites, wherein the one or more headers comprises a video sequence-level, picture-level, or slice-level first header instructing, by the header syntax, conditional decreasing or conditional increasing of application of an interpolation filter to the reference frame or components thereof.
P. The method as paragraph O recites, wherein the one or more headers further comprises a video sequence-level second header selectively controlling, by the header syntax, whether the first header is set.
Q. The method as paragraph O recites, wherein the first header is non-selectively set.
R. A system comprising: one or more processors; and memory communicatively coupled to the one or more processors, the memory storing computer-executable modules executable by the one or more processors that, when executed by  the one or more processors, perform associated operations, the computer-executable modules including: a motion prediction module comprising: a header determining submodule configured to determine that one or more headers of a sequence applying to at least a current frame or a subunit thereof instructs, by a header syntax, conditional decreasing or conditional increasing of application of an interpolation filter during motion prediction; a motion information determining submodule configured to determine motion information of a block of the current frame or the subunit thereof, the motion information comprising a motion vector of the block and at least one pixel coordinate corresponding to the motion vector; a predictor block locating submodule configured to locate a predictor block of a reference frame in accordance with the motion information; and an interpolation filter applying submodule configured to decrease or increase, based on at least one of a deciding condition and a discriminating condition, application of an interpolation filter to the reference frame; wherein the motion prediction module is further configured to perform motion prediction on the current block by reference to the located predictor block.
S. The system as paragraph R recites, further comprising a motion information resizing submodule configured to resize the motion information according to a resolution of a reference frame, and wherein the predictor block locating submodule is configured to locate the predictor block of the reference frame in accordance with the resized motion information.
T. The system as paragraph R recites, wherein the motion prediction module further comprises a resolution comparing submodule configured to determine that the resolution of the reference frame is different from a resolution of the current frame, and determine that the resolution of the reference frame is larger than or smaller than the resolution of the current frame.
U. The system as paragraph T recites, further comprising a filter application determining submodule configured to determine, by the at least one of a deciding condition or a discriminating condition, whether to apply an interpolation filter to the reference frame or components thereof, or to partially apply an interpolation filter to the reference frame or components thereof, or to not apply an interpolation filter to the reference frame.
V. The system as paragraph R recites, wherein a deciding condition is that the resolution of the reference frame is larger than the resolution of the current frame, and decreasing application of the interpolation filter to the reference frame is based on the deciding condition.
W. The system as paragraph R recites, wherein a deciding condition is that the resolution of the reference frame is smaller than the resolution of the current frame, and decreasing application of the interpolation filter to the reference frame is based on the deciding condition.
X. The system as paragraph R recites, wherein a discriminating condition is to apply the interpolation filter to chroma components of the reference frame and to not apply the interpolation filter to luma components of the reference frame, and decreasing application of the interpolation filter to the reference frame is based on the discriminating condition.
Y. The system as paragraph R recites, wherein a discriminating condition is to apply the interpolation filter to luma components of the reference frame and to not apply the interpolation filter to chroma components of the reference frame, and decreasing application of the interpolation filter to the reference frame is based on the discriminating condition.
Z. The system as paragraph R recites, wherein a deciding condition is that dimensions of blocks of the reference frame are larger than particular threshold dimensions, and decreasing application of the interpolation filter to the reference frame is based on the deciding condition.
AA. The system as paragraph R recites, wherein a deciding condition is that dimensions of blocks of the reference frame are smaller than particular threshold dimensions, and decreasing application of the interpolation filter to the reference frame is based on the deciding condition.
BB. The system as paragraph R recites, wherein the one or more headers comprises a video sequence-level header.
CC. The system as paragraph R recites, wherein the one or more headers comprises a picture-level header.
DD. The system as paragraph R recites, wherein the one or more headers comprises a slice-level header.
EE. The system as paragraph R recites, wherein the one or more headers comprises a video sequence-level, picture-level, or slice-level first header instructing, by the header syntax, conditional decreasing or conditional increasing of application of an interpolation filter to the reference frame or components thereof.
FF. The system as paragraph EE recites, wherein the one or more headers further comprises a video sequence-level second header selectively controlling, by the header syntax, whether the first header is set.
GG. The system as paragraph EE recites, wherein the first header is non-selectively set.
HH. A computer-readable storage medium storing computer-readable instructions executable by one or more processors, that when executed by the one or more processors, cause the one or more processors to perform operations comprising: determining that one or more headers of a sequence applying to at least a current frame or a subunit thereof instructs, by a header syntax, conditional decreasing or conditional increasing of application of an interpolation filter during motion prediction; determining motion information of a block of the current frame or the subunit thereof, the motion information comprising a motion vector of the block and at least one pixel coordinate corresponding to the motion vector; locating a predictor block of a reference frame in accordance with the motion information; decreasing or increasing, based on at least one of a deciding condition and a discriminating condition, application of an interpolation filter to the reference frame; and performing motion prediction on the current block by reference to the located predictor block.
II. The computer-readable storage medium as paragraph HH recites, operations further comprising resizing the motion information according to a resolution of the reference frame, and wherein the predictor block of the reference frame is located in accordance with the resized motion information.
JJ. The computer-readable storage medium as paragraph HH recites, the operations further comprising determining that a resolution of the reference frame is different from a resolution of the current frame, and determining whether the resolution of the reference frame is larger or smaller than the resolution of the current frame.
KK. The computer-readable storage medium as paragraph JJ recites, the operations further comprising determining, based on at least one of the deciding condition and the discriminating condition, whether to apply the interpolation filter to the reference frame or components thereof, to partially apply the interpolation filter to the reference frame or components thereof, or to not apply the interpolation filter to the reference frame.
LL. The computer-readable storage medium as paragraph HH recites, wherein a deciding condition is that a resolution of the reference frame is larger than a resolution of the current frame, and decreasing application of the interpolation filter to the reference frame is based on the deciding condition.
MM. The computer-readable storage medium as paragraph HH recites, wherein a deciding condition is that a resolution of the reference frame is smaller than a resolution of the current frame, and decreasing application of the interpolation filter to the reference frame is based on the deciding condition.
NN. The computer-readable storage medium as paragraph HH recites, wherein a discriminating condition is to apply the interpolation filter to chroma components of the reference frame and to not apply the interpolation filter to luma components of the reference frame, and decreasing application of the interpolation filter to the reference frame is based on the discriminating condition.
OO. The computer-readable storage medium as paragraph HH recites, wherein a discriminating condition is to apply the interpolation filter to luma components of the reference frame and to not apply the interpolation filter to chroma components of the reference frame, and decreasing application of the interpolation filter to the reference frame is based on the discriminating condition.
PP. The computer-readable storage medium as paragraph HH recites, wherein a deciding condition is that dimensions of blocks of the reference frame are larger than particular threshold dimensions, and decreasing application of the interpolation filter to the reference frame is based on the deciding condition.
QQ. The computer-readable storage medium as paragraph HH recites, wherein a deciding condition is that dimensions of blocks of the reference frame are smaller than particular threshold dimensions, and decreasing application of the interpolation filter to the reference frame is based on the deciding condition.
RR. The computer-readable storage medium as paragraph HH recites, wherein the one or more headers comprises a video sequence-level header.
SS. The computer-readable storage medium as paragraph HH recites, wherein the one or more headers comprises a picture-level header.
TT. The computer-readable storage medium as paragraph HH recites, wherein the one or more headers comprises a slice-level header.
UU. The computer-readable storage medium as paragraph HH recites, wherein the one or more headers comprises a video sequence-level, picture-level, or slice-level first header instructing, by the header syntax, conditional decreasing or conditional increasing of application of an interpolation filter to the reference frame or components thereof.
VV. The computer-readable storage medium as paragraph UU recites, wherein the one or more headers further comprises a video sequence-level second header selectively controlling, by the header syntax, whether the first header is set.
WW. The computer-readable storage medium as paragraph UU recites, wherein the first header is non-selectively set.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as exemplary forms of implementing the claims.

Claims (23)

  1. A method comprising:
    determining motion information of a block of a current frame or a subunit thereof, the motion information comprising a motion vector of the block and at least one pixel coordinate corresponding to the motion vector;
    locating a predictor block of a reference frame in accordance with the motion information;
    decreasing or increasing, based on at least one of a deciding condition and a discriminating condition, application of an interpolation filter to the reference frame or components thereof; and
    performing motion prediction on the block by reference to the located predictor block.
  2. The method of claim 1, further comprising determining that one or more headers of a sequence applying to at least the current frame or the subunit thereof instructs, by a header syntax, conditional decreasing or conditional increasing of application of an interpolation filter during motion prediction.
  3. The method of claim 2, wherein the one or more headers comprises a video sequence-level header.
  4. The method of claim 2, wherein the one or more headers comprises a picture-level header.
  5. The method of claim 2, wherein the one or more headers comprises a slice-level header.
  6. The method of claim 2, wherein the one or more headers comprises a video sequence-level, picture-level, or slice-level first header instructing, by the header syntax, conditional decreasing or conditional increasing of application of an interpolation filter to the reference frame or components thereof.
  7. The method of claim 6, wherein the one or more headers further comprises a video sequence-level second header selectively controlling, by the header syntax, whether the first header is set.
  8. The method of claim 6, wherein the first header is non-selectively set.
  9. A system comprising:
    one or more processors; and
    memory communicatively coupled to the one or more processors, the memory storing computer-executable modules that, when executed by the one or more processors, perform associated operations, the computer-executable modules including:
    a motion prediction module comprising:
    a motion information determining submodule configured to determine motion information of a block of a current frame or a subunit thereof, the motion information comprising a motion vector of the block and at least one pixel coordinate corresponding to the motion vector;
    a predictor block locating submodule configured to locate a predictor block of a reference frame in accordance with the motion information; and
    an interpolation filter applying submodule configured to decrease or increase, based on at least one of a deciding condition and a discriminating condition, application of an interpolation filter to the reference frame;
    wherein the motion prediction module is further configured to perform motion prediction on the block by reference to the located predictor block.
  10. The system of claim 9, further comprising a header determining submodule configured to determine that one or more headers of a sequence applying to at least the current frame or the subunit thereof instructs, by a header syntax, conditional decreasing or conditional increasing of application of an interpolation filter during motion prediction.
  11. The system of claim 10, wherein the one or more headers comprises a video sequence-level header.
  12. The system of claim 10, wherein the one or more headers comprises a picture-level header.
  13. The system of claim 10, wherein the one or more headers comprises a slice-level header.
  14. The system of claim 10, wherein the one or more headers comprises a video sequence-level, picture-level, or slice-level first header instructing, by the header syntax, conditional decreasing or conditional increasing of application of an interpolation filter to the reference frame or components thereof.
  15. The system of claim 14, wherein the one or more headers further comprises a video sequence-level second header selectively controlling, by the header syntax, whether the first header is set.
  16. The system of claim 14, wherein the first header is non-selectively set.
  17. A computer-readable storage medium storing computer-readable instructions executable by one or more processors that, when executed by the one or more processors, cause the one or more processors to perform operations comprising:
    determining motion information of a block of a current frame or a subunit thereof, the motion information comprising a motion vector of the block and at least one pixel coordinate corresponding to the motion vector;
    locating a predictor block of a reference frame in accordance with the motion information;
    decreasing or increasing, based on at least one of a deciding condition and a discriminating condition, application of an interpolation filter to the reference frame or components thereof; and
    performing motion prediction on the block by reference to the located predictor block.
  18. The computer-readable storage medium of claim 17, wherein the operations further comprise determining that one or more headers of a sequence applying to at least the current frame or the subunit thereof instructs, by a header syntax, conditional decreasing or conditional increasing of application of an interpolation filter during motion prediction.
  19. The computer-readable storage medium of claim 18, wherein the one or more headers comprises a video sequence-level header.
  20. The computer-readable storage medium of claim 18, wherein the one or more headers comprises a picture-level header.
  21. The computer-readable storage medium of claim 18, wherein the one or more headers comprises a slice-level header.
  22. The computer-readable storage medium of claim 18, wherein the one or more headers comprises a video sequence-level, picture-level, or slice-level first header instructing, by the header syntax, conditional decreasing or conditional increasing of application of an interpolation filter to the reference frame or components thereof.
  23. The computer-readable storage medium of claim 22, wherein the one or more headers further comprises a video sequence-level second header selectively controlling, by the header syntax, whether the first header is set.
PCT/CN2019/129943 2019-12-30 2019-12-30 Selective control of conditional filters in resolution-adaptive video coding WO2021134222A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/129943 2019-12-30 2019-12-30 Selective control of conditional filters in resolution-adaptive video coding

Publications (1)

Publication Number Publication Date
WO2021134222A1 (en) 2021-07-08

Family

ID=76687447

Family Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/129943 2019-12-30 2019-12-30 Selective control of conditional filters in resolution-adaptive video coding

Country Status (1)

Country Link
WO (1) WO2021134222A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013104713A1 (en) * 2012-01-13 2013-07-18 Thomson Licensing Method and device for coding an image block, corresponding method and decoding device
WO2014193956A1 (en) * 2013-05-31 2014-12-04 Qualcomm Incorporated Resampling using scaling factor
CN105430410A (en) * 2014-09-17 2016-03-23 联发科技股份有限公司 Motion compensation apparatus and motion compensation method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
B. Choi (Tencent), S. Wenger (Stewe), S. Liu (Tencent): "AHG8: Signaling and filtering for Reference Picture Resampling (RPR)", 15th JVET Meeting (Joint Video Exploration Team of ISO/IEC JTC1/SC29/WG11 and ITU-T SG.16), Gothenburg, 3-12 July 2019, document JVET-O0332 (m48447), 26 June 2019, XP030219280 *
Hendry (Huawei), S. Hong (Huawei), Y.-K. Wang (Huawei), J. Chen (Huawei), Y.-C. Sun (Alibaba Inc.), T.-S. Chang (Alibaba Inc.), J. Lou: "AHG19: Adaptive resolution change (ARC) support in VVC", 14th JVET Meeting (Joint Video Exploration Team of ISO/IEC JTC1/SC29/WG11 and ITU-T SG.16), Geneva, 19-27 March 2019, document JVET-N0118, 12 March 2019, XP030202642 *

Similar Documents

Publication Publication Date Title
US11902563B2 (en) Encoding and decoding method and device, encoder side apparatus and decoder side apparatus
US20240056606A1 (en) Encoding method and apparatus therefor, and decoding method and apparatus therefor
JP2022544164A (en) Implicit Signaling of Adaptive Resolution Management Based on Frame Type
KR20200005648A (en) Intra prediction mode based image processing method and apparatus therefor
US20180324441A1 (en) Method for encoding/decoding image and device therefor
JP7551784B2 (en) Encoding/Decoding Method, Apparatus and Device Thereof
JP2022544160A (en) Adaptive resolution management signaling
JP2022544156A (en) Block-based adaptive resolution management
WO2021253373A1 (en) Probabilistic geometric partitioning in video coding
KR102699681B1 (en) Inter prediction method and device
JP2022544159A (en) Adaptive resolution management using subframes
US20240137532A1 (en) Methods and systems for adaptive cropping
US20240073437A1 (en) Encoding and decoding method and apparatus, and devices
JP2022544157A (en) Adaptive resolution management predictive rescaling
WO2021003671A1 (en) Resolution-adaptive video coding
WO2021134222A1 (en) Selective control of conditional filters in resolution-adaptive video coding
WO2021046692A1 (en) Resolution-adaptive video coding with conditional interpolation filters
KR20180039722A (en) Method and apparatus for encoding / decoding image
WO2024008063A1 (en) On planar intra prediction mode
US20230239461A1 (en) Inter coding for adaptive resolution video coding

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 19958224; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 19958224; Country of ref document: EP; Kind code of ref document: A1