WO2018023352A1 - Fast motion estimation method based on online learning - Google Patents

Fast motion estimation method based on online learning

Info

Publication number
WO2018023352A1
Authority
WO
WIPO (PCT)
Prior art keywords
motion estimation
inter
sub
pixel motion
unit
Prior art date
Application number
PCT/CN2016/092751
Other languages
French (fr)
Chinese (zh)
Inventor
潘兆庆
孙星明
Original Assignee
南京信息工程大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 南京信息工程大学
Priority to PCT/CN2016/092751
Publication of WO2018023352A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00: Television systems

Definitions

  • the invention belongs to the technical field of video coding, and in particular relates to a fast motion estimation method based on online learning.
  • High Efficiency Video Coding is the latest video coding standard that effectively addresses the storage and transmission of high definition (HD) and ultra high definition video.
  • the high coding efficiency achieved by the HEVC encoder rests on a series of advanced coding techniques with high computational complexity, such as the quadtree-based coding unit (CU) and motion estimation based on variable-size prediction units, including integer-pixel motion estimation (IPME) and sub-pixel motion estimation (FPME). This coding complexity greatly limits the wide application of HEVC encoders in multimedia products.
  • CU quadtree-based coding unit
  • IPME integer pixel motion estimation
  • FPME sub-pixel motion estimation
  • integer-pixel motion estimation is performed first, and its best motion vector is then set as the starting search position of sub-pixel motion estimation, which refines the search result of integer-pixel motion estimation.
  • an 8-point half-pixel search is performed around the best motion vector obtained by integer-pixel motion estimation.
  • an 8-point quarter-pixel search is then performed around the best motion vector obtained by the half-pixel search.
  • the best search point $\varphi^{*}$ is determined by minimizing the Lagrangian rate-distortion (RD) cost function: $\varphi^{*} = \arg\min_{\varphi} \{ SATD(PU_o, PU_c) + \lambda_{MOTION} \cdot R_{MOTION} \}$
  • SATD (sum of absolute transformed differences) measures the residual between the original prediction unit $PU_o$ and its predicted unit $PU_c$
  • $\varphi$ is the MV of a candidate sub-pixel motion search point
  • $\lambda_{MOTION}$ is the Lagrangian multiplier and $R_{MOTION}$ is the number of bits required to encode the motion parameters. This "traverse all, select the best" approach to sub-pixel motion estimation significantly improves the coding efficiency of the HEVC encoder, but at the expense of high coding complexity.
  • Document 1 proposes a fast integer-pixel motion estimation algorithm that finds the best motion vector while reducing the number of pixels in the current search window.
  • In Document 2, Li et al. use the predicted motion vector to measure the motion intensity of the current coding unit and propose a content-adaptive fast integer-pixel motion estimation algorithm for HEVC.
  • Document 3 proposes a fast integer-pixel motion estimation algorithm based on motion vector inheritance: if the coded block flag (CBF) of the 2N×2N prediction unit is zero, the sub-divided prediction units inherit the best motion vector of the 2N×2N prediction unit.
  • CBF coded block flag
  • Document 4 proposes a fast integer-pixel motion estimation algorithm for HEVC based on confidence intervals, in which integer-pixel motion estimation is first formulated as a statistical inference problem and a confidence-interval-based motion estimation is then derived.
  • Document 5 proposes a depth-based adaptive search range algorithm for fast integer-pixel motion estimation. These algorithms focus on optimizing the coding complexity of integer-pixel motion estimation.
  • the fast integer-pixel TZSearch algorithm in HEVC already uses several early termination strategies, so the room left for reducing integer-pixel motion estimation complexity is very limited.
  • the sub-pixel motion estimation coding process can be further optimized.
  • Document 6 proposes a fast two-step sub-pixel motion estimation algorithm, which first models five adjacent integer-pixel points as an error surface and then uses a second-order approximation to predict the position of the best sub-pixel.
  • Document 7 proposes an optimally scalable and cost-effective sub-pixel motion estimation algorithm.
  • In Document 8, Sotetsumoto et al. propose a low-complexity sub-pixel motion estimation algorithm, which first uses an early termination strategy to stop the FPME process and then designs different search patterns for half-pixel and quarter-pixel FPME.
  • the object of the present invention is to provide a fast motion estimation method based on online learning, which can effectively save coding time and improve coding performance.
  • the solution of the present invention is:
  • a fast motion estimation method based on online learning comprising the following steps:
  • step (1): encode the current coding unit using the Inter_2N×2N prediction unit (the root prediction unit); the process includes integer-pixel motion estimation and sub-pixel motion estimation, and the best motion vector of the root prediction unit is recorded. If $MV^{root}_{best} = MV^{root}_{ipme}$, go to step (2); otherwise go to step (3). Here $MV^{root}_{best}$ is the best motion vector of the root prediction unit and $MV^{root}_{ipme}$ is its best motion vector under integer-pixel motion estimation;
  • step (5): store the encoding information, write the bitstream, and return to step (1) to encode the next coding unit.
  • each frame is divided into a series of coding tree units based on a quadtree structure; the coding tree unit is the basic processing unit of HEVC. During encoding, the coding tree unit is further divided into coding units based on the quadtree structure. According to the coding prediction type, the coding unit is further divided into one, two or four prediction units, and the prediction unit is the basic processing unit of intra prediction and inter prediction.
  • in step (1), inter-prediction coding of the coding unit supports a total of eight prediction unit modes: Inter_2Nx2N, Inter_2NxN, Inter_Nx2N, Inter_NxN, Inter_2NxnU, Inter_2NxnD, Inter_nLx2N, and Inter_nRx2N.
  • the motion estimation process is as follows: first, integer-pixel motion estimation is performed on the current coding unit; then the best motion vector obtained by integer-pixel motion estimation is set as the starting position of sub-pixel motion estimation, and sub-pixel motion estimation is performed on the current coding unit.
  • finally, the best motion vector of integer-pixel motion estimation, $MV_{ipme}$, and the best motion vector of sub-pixel motion estimation, $MV_{fpme}$, are compared by their rate-distortion costs, and the one with the lower cost becomes the best motion vector of the current prediction mode: $MV_{best} = \arg\min_{MV \in \{MV_{ipme}, MV_{fpme}\}} J(MV)$
  • $J_{ipme}$ and $J_{fpme}$ are the rate-distortion costs of $MV_{ipme}$ and $MV_{fpme}$, respectively.
  • step (3) the optimal motion vector of the sub-prediction unit Determined by learning the motion vector information of its root prediction unit, ie
  • the best motion vector representing the integer pixel motion estimation of the sub-prediction unit Indicates the best motion vector for the root prediction unit, The best motion vector representing the integer pixel motion estimation of the root prediction unit, An optimal motion vector that represents the sub-pixel motion estimation of the sub-prediction unit.
  • the present invention first divides all prediction units into a root prediction unit (Inter_2N×2N) and sub-prediction units (the other prediction unit modes), according to the characteristics of motion estimation based on variable-size prediction units. Then, by learning the sub-pixel motion estimation result of the root prediction unit, the sub-pixel motion estimation of the sub-prediction units is adaptively skipped, which reduces the coding complexity of sub-pixel motion estimation.
  • root prediction unit: Inter_2N×2N
  • sub-prediction units: the other prediction unit modes
  • Figure 1 is an architectural diagram of the inheritance relationship between prediction units
  • Figure 2 is an experimental result of statistical analysis of conditional probability.
  • To measure the ratios $C_m$ and $C_h$, five HEVC standard test video sequences with different resolutions and content characteristics, "BQSquare", "PartyScene", "KristenAndSara", "BasketballDrive" and "PeopleOnStreet", are encoded with the HEVC reference software HM12.0. The characteristics of the sequences are defined in detail in Document 11.
  • the test conditions are as follows: the maximum coding unit size and the maximum quadtree depth are 64×64 and 4, respectively; the integer-pixel motion estimation method and its search range are TZSearch and ±64, respectively; the quantization parameter (QP) is 27;
  • the Low-Delay-Main (LDM) and Random-Access-Main (RAM) encoding configurations are used, and the other encoding parameters use the default settings in HM and Document 11. $C_m$ and $C_h$ are calculated from equation (1): $C_m = T_f / T_m \times 100\%$, $C_h = T_f / T_h \times 100\%$
  • $T_f$, $T_m$ and $T_h$ are the sub-pixel motion estimation time, the motion estimation time and the total coding time of the HEVC encoder, respectively.
  • the statistical results are shown in Table 1.
  • the total coding time of the sub-pixel motion estimation accounts for a large proportion of the total coding time of the motion estimation.
  • the value of C m is 47.82% to 82.92% with an average of 68.65%.
  • the value of C m is 51.89% to 84.80%, with an average of 66.14%.
  • FPME aims to refine IPME search results and maximize coding efficiency.
  • event M: the best search point of integer-pixel motion estimation is selected as the best search point of motion estimation
  • event N: the best search point of sub-pixel motion estimation is selected as the best search point of motion estimation.
  • the statistical results for the probabilities P(M) and P(N) are shown in Table 2.
  • the best search point of integer-pixel motion estimation has a much higher probability of being selected as the best search point of the entire motion estimation process.
  • the probability of P(M) is 93.85% to 99.81%, with an average of 96.23%.
  • the probability of P(N) is only 0.19% to 6.15%, with an average of 3.77%.
  • the probability of P(M) is 92.66% to 98.42%, with an average of 95.28%.
  • the probability of P(N) is only 1.58% to 7.34%, with an average of 4.73%.
  • each frame of image is divided into a series of coding tree units based on a quadtree structure, which is the basic processing unit of HEVC.
  • the coding tree unit is further divided into coding units.
  • the coding unit is further divided into one, two or four prediction units, and the prediction unit is a basic processing unit of intra prediction and inter prediction.
  • for inter-frame predictive coding of coding units, a total of eight prediction unit modes are supported: Inter_2N×2N, Inter_2N×N, Inter_N×2N, Inter_N×N, Inter_2N×nU, Inter_2N×nD, Inter_nL×2N, and Inter_nR×2N (see Document 12).
  • all inter-prediction unit modes are sequentially encoded to achieve maximum removal of time domain data redundancy.
  • an integer pixel motion estimation operation is first performed on the current prediction unit mode.
  • the optimal motion vector obtained by the whole pixel motion estimation is set as the starting position of the sub-pixel motion estimation, and the current prediction unit mode is subjected to the sub-pixel motion estimation operation.
  • the best motion vector of integer-pixel motion estimation, $MV_{ipme}$, and the best motion vector of sub-pixel motion estimation, $MV_{fpme}$, are compared by their rate-distortion costs, and the one with the lower cost becomes the best motion vector of the current prediction mode: $MV_{best} = \arg\min_{MV \in \{MV_{ipme}, MV_{fpme}\}} J(MV)$
  • $J_{ipme}$ and $J_{fpme}$ are the rate-distortion costs of $MV_{ipme}$ and $MV_{fpme}$, respectively.
  • Inter_2Nx2N is the root of other inter prediction unit modes, and these remaining prediction unit modes are sub-prediction unit modes.
  • if the root prediction unit mode selects the best motion vector of integer-pixel motion estimation as the final best motion vector of the motion estimation process, the content of the root prediction unit is simple or its motion is slow, so the sub-prediction units are also highly likely to select the best motion vector of integer-pixel motion estimation as their final best motion vector and can skip sub-pixel motion estimation.
  • event S: the root prediction unit mode selects the best motion vector of integer-pixel motion estimation as the best motion vector of the entire motion estimation process
  • event T: the sub-prediction unit mode also selects the best motion vector of integer-pixel motion estimation as the best motion vector of the entire motion estimation process
  • the experimental results for the conditional probability P(T|S) are shown in Figure 2.
  • when the root prediction unit mode selects the best motion vector of integer-pixel motion estimation as the best motion vector of motion estimation, its sub-prediction unit modes are also very likely to select the best motion vector of integer-pixel motion estimation as the best motion vector of their motion estimation.
  • P(T|S) is 96.68% to 99.70%
  • P(T|S) is 96.70% to 99.07%.
  • the conditional probability is 99%, because the two sequences have many simple areas; for example, most of the areas in "KristenAndSara" are background.
  • when the root prediction unit mode selects the best motion vector of integer-pixel motion estimation as the best motion vector of its motion estimation, its sub-prediction unit modes also select the best motion vector of integer-pixel motion estimation as the best motion vector of their motion estimation. Therefore, the motion vector information of the root prediction unit can be used to encode its sub-prediction unit modes quickly.
  • the best motion vector of the sub-prediction unit mode, $MV^{sub}_{best}$, can be determined early by learning the motion vector information of its root prediction unit, i.e. $MV^{sub}_{best} = MV^{sub}_{ipme}$ if $MV^{root}_{best} = MV^{root}_{ipme}$; otherwise $MV^{sub}_{best}$ is whichever of $MV^{sub}_{ipme}$ and $MV^{sub}_{fpme}$ has the lower rate-distortion cost
  • $MV^{sub}_{ipme}$ is the best motion vector of integer-pixel motion estimation of the sub-prediction unit, $MV^{root}_{best}$ is the best motion vector of the root prediction unit, $MV^{root}_{ipme}$ is the best motion vector of integer-pixel motion estimation of the root prediction unit, and $MV^{sub}_{fpme}$ is the best motion vector of sub-pixel motion estimation of the sub-prediction unit.
  • the present invention provides a fast motion estimation method based on online learning, which includes the following steps:
  • step (1): the process includes integer-pixel motion estimation and sub-pixel motion estimation, and the best motion vector of the root prediction unit is recorded. If the best motion vector of the root prediction unit equals its best motion vector under integer-pixel motion estimation, that is, $MV^{root}_{best} = MV^{root}_{ipme}$, go to step (2); otherwise go to step (3);
  • step (2): the current coding unit is encoded sequentially with the sub-prediction units Inter_2N×N, Inter_N×2N, Inter_N×N, Inter_2N×nU, Inter_2N×nD, Inter_nL×2N, and Inter_nR×2N; the process includes only the integer-pixel motion estimation operation. Go to step (4);
  • step (5): store the encoding information, write the bitstream, and return to step (1) to encode the next coding unit.
  • the coding time saved by the proposed method depends on the number of root prediction units that select the integer pixel motion estimation result as the motion estimation optimal motion vector. If most of the root prediction units select the integer pixel motion estimation result as the best motion vector for motion estimation, the coding time will be greatly reduced.
  • the probability that the root prediction unit selects the integer-pixel motion estimation result as the best motion vector of motion estimation varies from 91.25% to 99.51%, with an average of 94.10%.
  • the probability that the root prediction unit selects the integer pixel motion estimation result as the motion estimation optimal motion vector is from 89.80% to 99.07%, and the average value is 94.03%.
  • the HEVC reference software HM12.0 is used as the software platform. Fifteen HEVC standard test video sequences were tested under the HEVC common test conditions with the LDM and RAM coding configurations; the integer-pixel motion estimation method and its search range were TZSearch and ±64, respectively; the maximum coding unit size and maximum quadtree depth were 64×64 and 4, respectively; four QPs were tested: 22, 27, 32 and 37. The hardware platform was an Intel Xeon CPU E3-1241 v3 @ 3.50 GHz with 4.00 GB RAM, running 64-bit Microsoft Windows 7.
  • PSNR peak signal-to-noise ratio
  • BR bit rate
  • TETS total coding time savings
  • METS motion estimation time savings
  • $T_{\psi}(QP_i)$ is the total encoding time of the fast motion estimation algorithm $\psi$ running on HM12.0 when QP is $QP_i$, with $\psi \in$ {PCS [9], the proposed method}; $T_B(QP_i)$ is the total coding time of the benchmark motion estimation algorithm on HM12.0 when QP is $QP_i$. The benchmark motion estimation algorithm consists of the original TZSearch and the original sub-pixel motion estimation.
  • the benchmark algorithm is denoted HM12.0-ME; $M_{\psi}(QP_i)$ is the corresponding measurement for the fast motion estimation algorithm $\psi$ on HM12.0 when QP is $QP_i$, with $\psi \in$ {PCS [9], the proposed method}
  • the PCS achieved a TETS of 5.63% to 23.62%, an average of 14.96%, and achieved a METS of 8.14% to 38.89% with an average of 25.70%.
  • the BDPSNR between PCS and HM12.0-ME is -0.002 dB to -0.018 dB, with an average of -0.009 dB, and the BDBR between PCS and HM12.0-ME is 0.07% to 0.53%, with an average of 0.29%.
  • PCS achieved a TETS of 6.26% to 16.56% with an average of 11.45% and a METS of 9.91% to 30.45% with an average of 22.59%.
  • the BDPSNR between PCS and HM12.0-ME is -0.004 dB to -0.026 dB, with an average of -0.009 dB
  • the BDBR between PCS and HM12.0-ME is 0.11% to 0.56%, with an average of 0.26%. These values show that PCS achieves excellent RD performance, but the TETS and METS it obtains are limited.
  • the algorithm proposed by the present invention can effectively reduce the coding complexity of the motion estimation process, and the RD performance loss is negligible.
  • the algorithm proposed by the present invention achieves a TETS of 20.16% to 35.69%, with an average of 28.82%, and a METS of 57.73% to 66.05%, with an average of 62.60%.
  • the BDPSNR between the proposed algorithm and HM12.0-ME is -0.019 dB to -0.097 dB, with an average of -0.049 dB; the BDBR between the proposed algorithm and HM12.0-ME is 0.79% to 2.18%, with an average of 1.51%.
  • the proposed algorithm achieves a TETS of 21.06% to 29.95% with an average of 24.35%, and a METS of 61.22% to 66.02% with an average of 63.82%.
  • the BDPSNR between the proposed algorithm and HM12.0-ME is -0.020 dB to -0.089 dB, with an average of -0.051 dB; the BDBR between the proposed algorithm and HM12.0-ME is 0.90% to 2.04%, with an average of 1.41%. These values show that the proposed algorithm effectively reduces coding complexity while keeping good rate-distortion performance.
  • the motion estimation method proposed by the present invention saves 15.87% of the total coding time and 49.27% of the motion estimation coding time compared to the PCS.
  • the proposed algorithm saves 14.57% of the total encoding time and 53.26% of the motion estimation encoding time.
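The time-saving figures quoted above (TETS and METS) can be illustrated with a short sketch. The exact formula is not reproduced in the text on this page, so the code below assumes the common definition: the relative time reduction against the benchmark, averaged over the tested QPs. The function name `time_saving` and the sample timings are hypothetical.

```python
from typing import Sequence


def time_saving(baseline: Sequence[float], fast: Sequence[float]) -> float:
    """Average relative time saving in percent over all tested QPs.

    Assumed definition (not quoted from the patent text):
        mean_i( (T_B(QP_i) - T_psi(QP_i)) / T_B(QP_i) ) * 100
    baseline[i] is the benchmark time at QP_i, fast[i] the fast algorithm's time.
    """
    if not baseline or len(baseline) != len(fast):
        raise ValueError("need one baseline time and one fast time per QP")
    ratios = [(b - f) / b for b, f in zip(baseline, fast)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical total coding times (seconds) at QP 22, 27, 32 and 37.
benchmark_times = [100.0, 200.0, 300.0, 400.0]
fast_times = [80.0, 160.0, 240.0, 320.0]
tets = time_saving(benchmark_times, fast_times)
```

Passing motion estimation times instead of total coding times would give the METS figure under the same assumption.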

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

A fast motion estimation method based on online learning, comprising the steps: (1) using a root prediction unit to perform encoding, comprising integer-pixel motion estimation and sub-pixel motion estimation, on a current encoding unit, and if an optimal motion vector of the root prediction unit is equal to an optimal motion vector of the root prediction unit when performing the integer-pixel motion estimation, turning to step (2), otherwise turning to step (3); (2) using a sub-prediction unit to successively perform encoding, comprising the integer-pixel motion estimation, and turning to step (4); (3) using a sub-prediction unit to perform encoding, comprising the integer-pixel motion estimation and the sub-pixel motion estimation; (4) using an intra-frame prediction unit to perform encoding; and (5) storing encoding information and a write code stream, and returning to step (1) to encode a next encoding unit. The motion estimation method can effectively economise an encoding time and improve the encoding performance.
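The inter-prediction part of the steps in the abstract, steps (1) to (3), can be sketched as follows. This is a minimal illustration under stated assumptions, not the HM implementation: the callbacks `ipme` and `fpme`, the `MEResult` record and the tie rule (the integer-pixel result wins when costs are equal) are hypothetical. Intra prediction (step 4) and bitstream writing (step 5) are left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

MV = Tuple[int, int]  # motion vector; the unit is an illustrative choice

ROOT_MODE = "Inter_2Nx2N"
SUB_MODES = ["Inter_2NxN", "Inter_Nx2N", "Inter_NxN",
             "Inter_2NxnU", "Inter_2NxnD", "Inter_nLx2N", "Inter_nRx2N"]

# Hypothetical search callbacks: each returns (best MV, RD cost).
IpmeFn = Callable[[str], Tuple[MV, float]]
FpmeFn = Callable[[str, MV], Tuple[MV, float]]


@dataclass
class MEResult:
    mode: str
    ipme_mv: MV     # best MV of integer-pixel motion estimation
    best_mv: MV     # final best MV for this prediction unit mode
    ran_fpme: bool  # whether sub-pixel motion estimation was executed


def estimate(mode: str, ipme: IpmeFn, fpme: FpmeFn, run_fpme: bool) -> MEResult:
    """Run IPME, optionally refine with FPME, keep the lower-cost MV."""
    mv_i, j_i = ipme(mode)
    if not run_fpme:
        return MEResult(mode, mv_i, mv_i, False)
    mv_f, j_f = fpme(mode, mv_i)  # FPME starts from the IPME result
    best = mv_f if j_f < j_i else mv_i  # assumed tie rule: keep IPME
    return MEResult(mode, mv_i, best, True)


def encode_inter_modes(ipme: IpmeFn, fpme: FpmeFn) -> List[MEResult]:
    # Step (1): the root prediction unit always runs IPME and FPME.
    root = estimate(ROOT_MODE, ipme, fpme, run_fpme=True)
    # If the root kept its integer-pixel MV, sub-prediction units skip
    # FPME (step 2); otherwise they run IPME and FPME (step 3).
    skip_fpme = root.best_mv == root.ipme_mv
    results = [root]
    for mode in SUB_MODES:
        results.append(estimate(mode, ipme, fpme, run_fpme=not skip_fpme))
    return results
```

The decision is made once per coding unit from the root result, so the only extra state the encoder needs is the root's integer-pixel MV and its final MV.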

Description

A Fast Motion Estimation Method Based on Online Learning
Technical field
The invention belongs to the technical field of video coding, and in particular relates to a fast motion estimation method based on online learning.
Background art
High Efficiency Video Coding (HEVC) is the latest video coding standard; it effectively addresses the storage and transmission of high-definition (HD) and ultra-high-definition video. However, the high coding efficiency achieved by the HEVC encoder rests on a series of advanced coding techniques with high computational complexity, such as the quadtree-based coding unit (CU) and motion estimation based on variable-size prediction units, including integer-pixel motion estimation (IPME) and sub-pixel motion estimation (FPME). This coding complexity greatly limits the wide application of HEVC encoders in multimedia products.
During motion estimation (ME) in the HEVC encoder, integer-pixel motion estimation is performed first, and its best motion vector is then set as the starting search position of sub-pixel motion estimation, which refines the search result of integer-pixel motion estimation. For sub-pixel motion estimation in the HEVC reference software HM12.0, an 8-point half-pixel search is performed around the best motion vector obtained by integer-pixel motion estimation. An 8-point quarter-pixel search is then performed around the best motion vector obtained by the half-pixel search. Sub-pixel motion estimation therefore evaluates 16 search points in total, and the best search point $\varphi^{*}$ is determined by minimizing the Lagrangian rate-distortion (RD) cost function:
$$\varphi^{*} = \arg\min_{\varphi} \{ SATD(PU_o, PU_c) + \lambda_{MOTION} \cdot R_{MOTION} \}$$
where SATD (sum of absolute transformed differences) measures the residual between the original prediction unit $PU_o$ and its predicted unit $PU_c$, $\varphi$ is the MV of a candidate sub-pixel motion search point, $\lambda_{MOTION}$ is the Lagrangian multiplier, and $R_{MOTION}$ is the number of bits required to encode the motion parameters. This "traverse all, select the best" approach to sub-pixel motion estimation significantly improves the coding efficiency of the HEVC encoder, but at the expense of high coding complexity.
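The two-stage search described above (8 half-pixel points, then 8 quarter-pixel points, each candidate scored by a Lagrangian cost) can be sketched as below. This is an illustration under assumptions, not HM12.0 code: motion vectors are in quarter-pixel units, `cost` is a hypothetical callback that returns the RD cost of a candidate, and the centre point is kept as a fallback when no neighbour is strictly cheaper.

```python
from typing import Callable, Tuple

MV = Tuple[int, int]  # motion vector in quarter-pixel units (assumption)

# The 8 neighbours surrounding a centre point.
NEIGHBOURS = [(dx, dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
              if (dx, dy) != (0, 0)]


def rd_cost(satd: float, bits: float, lam: float) -> float:
    """Lagrangian cost J = SATD + lambda_MOTION * R_MOTION."""
    return satd + lam * bits


def refine(center: MV, step: int, cost: Callable[[MV], float]) -> Tuple[MV, float]:
    """Evaluate the 8 neighbours at the given step and keep the cheapest point."""
    best_mv, best_cost = center, cost(center)
    for dx, dy in NEIGHBOURS:
        cand = (center[0] + dx * step, center[1] + dy * step)
        c = cost(cand)
        if c < best_cost:
            best_mv, best_cost = cand, c
    return best_mv, best_cost


def sub_pixel_search(integer_mv: MV, cost: Callable[[MV], float]) -> Tuple[MV, float]:
    """Half-pixel stage (step 2), then quarter-pixel stage (step 1)."""
    half_mv, _ = refine(integer_mv, 2, cost)
    return refine(half_mv, 1, cost)
```

In a real encoder the `cost` callback would interpolate the reference block at the candidate position, compute SATD against the original block, and add the weighted motion bits via something like `rd_cost`; those 16 interpolations and cost evaluations are what make sub-pixel motion estimation expensive.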
To reduce the coding complexity of motion estimation in the HEVC encoder, researchers have focused on optimizing the prediction process of integer-pixel and sub-pixel motion estimation. Document 1 proposes a fast integer-pixel motion estimation algorithm that finds the best motion vector while reducing the number of pixels in the current search window. In Document 2, Li et al. use the predicted motion vector to measure the motion intensity of the current coding unit and propose a content-adaptive fast integer-pixel motion estimation algorithm for HEVC. Document 3 proposes a fast integer-pixel motion estimation algorithm based on motion vector inheritance: if the coded block flag (CBF) of the 2N×2N prediction unit is zero, the sub-divided prediction units inherit the best motion vector of the 2N×2N prediction unit. Document 4 proposes a fast integer-pixel motion estimation algorithm for HEVC based on confidence intervals, in which integer-pixel motion estimation is first formulated as a statistical inference problem and a confidence-interval-based motion estimation is then derived. Document 5 proposes a depth-based adaptive search range algorithm for fast integer-pixel motion estimation. These algorithms focus on optimizing the coding complexity of integer-pixel motion estimation. However, the fast integer-pixel TZSearch algorithm in HEVC already uses several early termination strategies, so the room left for reducing integer-pixel motion estimation complexity is very limited.
To further reduce the coding complexity of motion estimation, the sub-pixel motion estimation process can also be optimized. Document 6 proposes a fast two-step sub-pixel motion estimation algorithm, which first models five adjacent integer-pixel points as an error surface and then uses a second-order approximation to predict the position of the best sub-pixel. Document 7 proposes an optimally scalable and cost-effective sub-pixel motion estimation algorithm. In Document 8, Sotetsumoto et al. propose a low-complexity sub-pixel motion estimation algorithm, which first uses an early termination strategy to stop the FPME process and then designs different search patterns for half-pixel and quarter-pixel FPME. In Document 9, Blasi et al. propose an adaptive-precision motion estimation algorithm based on residual samples and the number of edges; it adaptively determines the motion vector precision and skips half-pixel and quarter-pixel motion estimation accordingly. Based on the characteristics of the error surface and the block distortion measured at integer-pixel points, Document 10 proposes an interpolation-free algorithm to reduce the computational complexity of HEVC sub-pixel motion estimation.
These algorithms effectively reduce the coding complexity of sub-pixel motion estimation, but they do not consider the relationship between the best motion vectors of integer-pixel and sub-pixel motion estimation. By establishing this relationship, the coding complexity of sub-pixel motion estimation can be reduced further.
List of documents:
Document 1: L. Gao, S. Dong, W. Wang, and W. Gao, "A novel integer-pixel motion estimation algorithm based on quadratic prediction," in Proc. Int. Conf. Image Process. (ICIP), Quebec, Canada, Sept. 2015, pp. 2810-2814.
Document 2: X. Li, R. Wang, X. Cui, W. Wang, "Context-adaptive fast motion estimation of HEVC," in Proc. Int. Symp. Circuits Syst. (ISCAS), Lisbon, Portugal, May 2015, pp. 2784-2787.
Document 3: S. Yang, H. J. Shim, B. Jeon, "Motion vector inheritance method for fast HEVC encoding," in Proc. Int. Symp. Broadband Multimedia Syst. Broadcast. (BMSB), Beijing, China, Jun. 2014, pp. 1-4.
Document 4: N. Hu, E. H. Yang, "Fast motion estimation based on confidence interval," IEEE Trans. Circuits Syst. Video Technol., vol. 24, no. 8, pp. 1310-1322, Aug. 2014.
Document 5: T.-K. Lee, Y.-L. Chan, and W.-C. Siu, "Depth-based adaptive search range algorithm for motion estimation in HEVC," in Proc. 19th Int. Conf. Digital Signal Process., Hong Kong, Aug. 2014, pp. 919-923.
Document 6: W. Dai, Oscar C. Au, C. Pang, L. Sun, R. Zou, and S. Li, "A novel fast two step sub-pixel motion estimation algorithm in HEVC," in Proc. Int. Conf. Acoustics, Speech and Signal Process. (ICASSP), Kyoto, Japan, Mar. 2012, pp. 1197-1200.
Document 7: H. Li, Y. Zhang, H. Chao, "An optimally scalable and cost-effective fraction-pixel motion estimation algorithm for HEVC," in Proc. Int. Conf. Acoustics, Speech and Signal Process. (ICASSP), Vancouver, BC, Canada, May 2013, pp. 1399-1403.
Document 8: T. Sotetsumoto, T. Song, T. Shimamoto, "Low complexity algorithm for sub-pixel motion estimation of HEVC," in Proc. Int. Conf. Signal Process. Commun. and Comput. (ICSPCC), Kunming, China, Aug. 2013, pp. 1-4.
Document 9: S. G. Blasi, I. Zupancic, E. Izquierdo, E. Peixoto, "Adaptive precision motion estimation for HEVC coding," in Proc. Picture Coding Symp. (PCS), Cairns, Australia, May 2015, pp. 144-148.
Document 10: X. Zuo, L. Yu, "A novel interpolation-free scheme for fractional pixel motion estimation," in Proc. Picture Coding Symp. (PCS), Cairns, Australia, May 2015, pp. 80-84.
Document 11: F. Bossen, Common test conditions and software reference configurations, ITU-T/ISO/IEC Joint Collaborative Team on Video Coding (JCT-VC) Document JCTVC-J1100, Mar. 2012.
Document 12: Z. Pan, S. Kwong, M.-T. Sun, "Early MERGE mode decision based on motion estimation and hierarchical depth correlation for HEVC," IEEE Trans. Broadcast., vol. 60, no. 2, pp. 405-412, Jun. 2014.
Document 13: G. Bjontegaard, Calculation of Average PSNR Differences between RD curves, ITU-T VCEG, Document VCEG-M33, Apr. 2001.
Summary of the invention
The object of the present invention is to provide a fast motion estimation method based on online learning that effectively reduces coding time and improves coding performance.
To achieve the above object, the solution of the present invention is as follows:
A fast motion estimation method based on online learning, comprising the following steps:
(1) Encode the current coding unit using the Inter_2N×2N root prediction unit, performing both integer-pixel motion estimation and sub-pixel motion estimation. The best motion vector of the root prediction unit is denoted
Figure PCTCN2016092751-appb-000006
If
Figure PCTCN2016092751-appb-000007
go to step (2); otherwise go to step (3), where
Figure PCTCN2016092751-appb-000008
denotes the best motion vector of the root prediction unit obtained by integer-pixel motion estimation;
(2) Encode the current coding unit using the sub-prediction units Inter_2N×N, Inter_N×2N, Inter_N×N, Inter_2N×nU, Inter_2N×nD, Inter_nL×2N, and Inter_nR×2N in turn, performing integer-pixel motion estimation only, then go to step (4);
(3) Encode the current coding unit using the sub-prediction units Inter_2N×N, Inter_N×2N, Inter_N×N, Inter_2N×nU, Inter_2N×nD, Inter_nL×2N, and Inter_nR×2N in turn, performing both integer-pixel motion estimation and sub-pixel motion estimation, then go to step (4);
(4) Encode the current coding unit using the intra prediction units, then go to step (5);
(5) Store the coding information and write the bitstream, then return to step (1) to encode the next coding unit.
In step (1) above, when the HEVC encoder runs, each frame is partitioned into a series of coding tree units organized in a quadtree structure; the coding tree unit is the basic processing unit of HEVC. During encoding, each coding tree unit is further split into coding units according to the quadtree structure. Depending on the prediction type, a coding unit is further divided into one, two, or four prediction units; the prediction unit is the basic processing unit for intra prediction and inter prediction.
In step (1) above, when a 2N×2N coding unit undergoes inter-prediction coding, eight prediction unit modes are supported in total: Inter_2N×2N, Inter_2N×N, Inter_N×2N, Inter_N×N, Inter_2N×nU, Inter_2N×nD, Inter_nL×2N, and Inter_nR×2N.
In step (1) above, motion estimation proceeds as follows. Integer-pixel motion estimation is first performed on the current coding unit. The best motion vector found by integer-pixel motion estimation is then used as the starting position for sub-pixel motion estimation of the current coding unit. Finally, by comparing the rate-distortion costs of the best motion vector of integer-pixel motion estimation,
Figure PCTCN2016092751-appb-000009
and the best motion vector of sub-pixel motion estimation,
Figure PCTCN2016092751-appb-000010
the best motion vector of motion estimation for the current prediction mode is determined as
Figure PCTCN2016092751-appb-000011
that is,
Figure PCTCN2016092751-appb-000012
where J_ipme and J_fpme are the rate-distortion costs of
Figure PCTCN2016092751-appb-000013
and
Figure PCTCN2016092751-appb-000014
respectively.
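The rate-distortion comparison described above can be sketched in Python as follows. This is only an illustration: the `Candidate` type, the function name, and the quarter-pixel unit are hypothetical, and sending ties to the integer-pixel result is an assumption, since the selection formula itself is given only as an image.

```python
from typing import NamedTuple, Tuple


class Candidate(NamedTuple):
    mv: Tuple[int, int]  # motion vector, assumed to be in quarter-pixel units
    rd_cost: float       # rate-distortion cost (J_ipme or J_fpme)


def select_best_mv(ipme: Candidate, fpme: Candidate) -> Candidate:
    """Return the candidate with the lower RD cost.

    Ties go to the integer-pixel candidate (an assumption, not from the source).
    """
    return ipme if ipme.rd_cost <= fpme.rd_cost else fpme


# Hypothetical values, for illustration only.
ipme = Candidate(mv=(8, -4), rd_cost=1520.0)
fpme = Candidate(mv=(9, -4), rd_cost=1498.5)
best = select_best_mv(ipme, fpme)
```

Here the sub-pixel candidate has the lower cost, so `best` is the sub-pixel result.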
In step (3) above, the best motion vector of a sub-prediction unit,
Figure PCTCN2016092751-appb-000015
is determined by learning from the motion vector information of its root prediction unit, that is,
Figure PCTCN2016092751-appb-000016
where
Figure PCTCN2016092751-appb-000017
denotes the best motion vector of the sub-prediction unit obtained by integer-pixel motion estimation,
Figure PCTCN2016092751-appb-000018
denotes the best motion vector of the root prediction unit,
Figure PCTCN2016092751-appb-000019
denotes the best motion vector of the root prediction unit obtained by integer-pixel motion estimation, and
Figure PCTCN2016092751-appb-000020
denotes the best motion vector of the sub-prediction unit obtained by sub-pixel motion estimation.
With the above scheme, the present invention first divides all prediction units, based on the characteristics of motion estimation with variable-size prediction units, into the root prediction unit (Inter_2N×2N) and the sub-prediction units (all other prediction unit modes). Then, by learning from the sub-pixel motion estimation result of the root prediction unit, it adaptively skips sub-pixel motion estimation for the sub-prediction units, thereby reducing the coding complexity of sub-pixel motion estimation.
Description of the drawings
Figure 1 is a diagram of the inheritance relationship between prediction units;
Figure 2 shows the experimental results of the statistical analysis of the conditional probability.
Detailed description
The technical solution of the present invention is described in detail below with reference to the accompanying drawings.
To analyze the coding complexity of sub-pixel motion estimation, two ratios are measured: C_m, the ratio of the total coding time of sub-pixel motion estimation to the total coding time of motion estimation, and C_h, the ratio of the total coding time of sub-pixel motion estimation to the total coding time of the HEVC encoder. Five HEVC standard test sequences with different resolutions and content characteristics, namely "BQSquare", "PartyScene", "KristenAndSara", "BasketballDrive", and "PeopleOnStreet", are encoded with the HEVC reference software HM12.0; the characteristics of these five sequences are defined in detail in Document 11. The test conditions are as follows: the maximum coding unit size and the maximum quadtree depth are 64×64 and 4, respectively; the integer-pixel motion estimation method is TZSearch with a search range of ±64; the quantization parameter (QP) is 27; the Low-Delay-Main (LDM) and Random-Access-Main (RAM) coding configurations are used; and all other coding parameters take the default settings of HM and Document 11. C_m and C_h are calculated by equation (1):
Figure PCTCN2016092751-appb-000021
where T_f, T_m, and T_h denote the total coding time of sub-pixel motion estimation, of motion estimation, and of the HEVC encoder, respectively. The statistical results are shown in Table 1.
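Equation (1) survives here only as an image reference. From the definitions of C_m and C_h as ratios of total coding times, it presumably takes the following form; the percentage scaling is an assumption based on the values reported in Table 1.

```latex
C_m = \frac{T_f}{T_m} \times 100\%, \qquad
C_h = \frac{T_f}{T_h} \times 100\%
```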
As Table 1 shows, sub-pixel motion estimation accounts for a large share of the total coding time of motion estimation. Under the LDM configuration, C_m ranges from 47.82% to 82.92%, with an average of 68.65%. Under the RAM configuration, C_m ranges from 51.89% to 84.80%, with an average of 66.14%.
Table 1 Coding complexity analysis of FPME (%), QP=27
Figure PCTCN2016092751-appb-000022
In addition, the C_m value of "BasketballDrive" is markedly lower than that of the other sequences. This is because the content of this sequence is complex and its objects move quickly, so IPME needs more coding time to locate the best search point. The total coding time of FPME also accounts for a large share of the total coding time of the HEVC encoder: C_h ranges from 31.57% to 52.90% (41.04% on average) under the LDM configuration and from 31.72% to 41.06% (35.57% on average) under the RAM configuration. These values show that FPME greatly increases the coding complexity of both motion estimation and the HEVC encoder, so optimizing the FPME process would save significant coding time.
FPME aims to refine the IPME search result and maximize coding efficiency. To analyze the distribution of the best search point in motion estimation, let event M denote that the best search point of integer-pixel motion estimation is selected as the best search point of motion estimation, and let event N denote that the best search point of sub-pixel motion estimation is selected as the best search point of motion estimation. The statistics of the probabilities P(M) and P(N) are shown in Table 2.
Table 2 Statistics of the best MV distribution (%), QP=27
Figure PCTCN2016092751-appb-000023
As Table 2 shows, the best search point of integer-pixel motion estimation has a high probability of being selected as the best search point of the whole motion estimation. Under the LDM configuration, P(M) ranges from 93.85% to 99.81%, with an average of 96.23%, whereas P(N) ranges only from 0.19% to 6.15%, with an average of 3.77%. Under the RAM configuration, P(M) ranges from 92.66% to 98.42%, with an average of 95.28%, whereas P(N) ranges only from 1.58% to 7.34%, with an average of 4.73%. The highest P(M) values occur for "KristenAndSara" under LDM and for "PeopleOnStreet" under RAM; in "KristenAndSara", for example, the content is simple and most of the picture is background. These values show that more than 95% of prediction units select the best search point of integer-pixel motion estimation as the best search point of their motion estimation. Therefore, if the prediction units that take the integer-pixel result as the optimum of the whole motion estimation could be identified in advance, the coding time of motion estimation would be reduced significantly.
When the HEVC encoder runs, each frame is partitioned into a series of coding tree units organized in a quadtree structure; the coding tree unit is the basic processing unit of HEVC. During encoding, each coding tree unit is further split into coding units according to the quadtree structure. Depending on the prediction type, a coding unit is further divided into one, two, or four prediction units; the prediction unit is the basic processing unit for intra prediction and inter prediction. Inter-prediction coding of a coding unit supports eight prediction unit modes in total: Inter_2N×2N, Inter_2N×N, Inter_N×2N, Inter_N×N, Inter_2N×nU, Inter_2N×nD, Inter_nL×2N, and Inter_nR×2N (see Document 12). During inter-prediction coding of a coding unit, all inter prediction unit modes are encoded in turn so as to remove as much temporal redundancy as possible. In inter-prediction motion estimation, integer-pixel motion estimation is first performed for the current prediction unit mode. The best motion vector found by integer-pixel motion estimation is then used as the starting position for sub-pixel motion estimation of the current prediction unit mode. Finally, by comparing the rate-distortion costs of the best motion vector of integer-pixel motion estimation,
Figure PCTCN2016092751-appb-000024
and the best motion vector of sub-pixel motion estimation,
Figure PCTCN2016092751-appb-000025
the best motion vector of motion estimation for the current prediction mode is determined as
Figure PCTCN2016092751-appb-000026
that is,
Figure PCTCN2016092751-appb-000027
where J_ipme and J_fpme are the rate-distortion costs of
Figure PCTCN2016092751-appb-000028
and
Figure PCTCN2016092751-appb-000029
respectively.
Consider the content characteristics of prediction units and the way they are partitioned. If the content covered by Inter_2N×2N belongs to a simple region, such as background, the other inter prediction unit modes very likely belong to a simple region as well. Moreover, in video coding, prediction units with simple content are usually encoded with the integer-pixel motion estimation result. As Table 2 shows, more than 95% of prediction units select the best search point of integer-pixel motion estimation as the best search point of the whole motion estimation. As shown in Figure 1, in terms of partition type, Inter_2N×2N is the root of the other inter prediction unit modes, and the remaining modes are sub-prediction unit modes. Therefore, if the root prediction unit mode selects the best motion vector of integer-pixel motion estimation as the final best motion vector of motion estimation, the content of the root prediction unit is simple or its motion is slow, so the sub-prediction units are also very likely to select the best motion vector of integer-pixel motion estimation as their final best motion vector and can skip sub-pixel motion estimation. To verify this correlation in best motion vector selection between the root prediction unit mode and its sub-prediction unit modes, let event S denote that the root prediction unit mode selects the best motion vector of integer-pixel motion estimation as the best motion vector of the whole motion estimation, and let event T denote that a sub-prediction unit mode also does so; the conditional probability P(T|S) is then computed. The experimental results are shown in Figure 2.
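As a rough illustration of how P(T|S) could be estimated from encoder logs, the sketch below counts outcomes over (S, T) pairs, one pair per sub-prediction unit. The record layout, the function name, and the sample counts are hypothetical and do not come from the experiments reported here.

```python
from typing import Iterable, Tuple


def conditional_probability(records: Iterable[Tuple[bool, bool]]) -> float:
    """Estimate P(T|S) from (s, t) pairs.

    s: the root prediction unit kept its integer-pixel MV (event S).
    t: the sub-prediction unit kept its integer-pixel MV (event T).
    """
    s_count = 0
    st_count = 0
    for s, t in records:
        if s:
            s_count += 1
            if t:
                st_count += 1
    if s_count == 0:
        raise ValueError("event S never occurred in the records")
    return st_count / s_count


# Made-up sample: S holds for 100 records, and T also holds for 97 of them.
sample = ([(True, True)] * 97 + [(True, False)] * 3
          + [(False, True)] * 5 + [(False, False)] * 5)
p_t_given_s = conditional_probability(sample)
```

Records where S does not hold are ignored, which is what conditioning on S means.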
Figure 2 shows that when the root prediction unit mode selects the best motion vector of integer-pixel motion estimation as the best motion vector of its motion estimation, its sub-prediction unit modes are also very likely to do so. The conditional probability P(T|S) ranges from 96.68% to 99.70% under the LDM configuration and from 96.70% to 99.07% under the RAM configuration. For the sequences "KristenAndSara" and "PeopleOnStreet", the conditional probability reaches 99%, because both sequences contain many regions with simple content; for example, most of "KristenAndSara" is background. From this analysis it can be concluded that best motion vector selection is strongly correlated between the root prediction unit mode and its sub-prediction unit modes: when the root prediction unit mode selects the best motion vector of integer-pixel motion estimation as the best motion vector of its motion estimation, its sub-prediction unit modes almost always make the same choice. The motion vector information of the root prediction unit can therefore be used to encode its sub-prediction unit modes quickly.
Given this relationship between the best motion vector selection of the root prediction unit and that of its sub-prediction units, the best motion vector of a sub-prediction unit mode,
Figure PCTCN2016092751-appb-000030
can be determined early by learning from the motion vector information of its root prediction unit, that is,
Figure PCTCN2016092751-appb-000031
where
Figure PCTCN2016092751-appb-000032
denotes the best motion vector of the sub-prediction unit obtained by integer-pixel motion estimation,
Figure PCTCN2016092751-appb-000033
denotes the best motion vector of the root prediction unit,
Figure PCTCN2016092751-appb-000034
denotes the best motion vector of the root prediction unit obtained by integer-pixel motion estimation, and
Figure PCTCN2016092751-appb-000035
denotes the best motion vector of the sub-prediction unit obtained by sub-pixel motion estimation.
Based on the above analysis, the present invention provides a fast motion estimation method based on online learning, comprising the following steps:
(1) Encode the current coding unit using the Inter_2N×2N root prediction unit, performing both integer-pixel motion estimation and sub-pixel motion estimation. The best motion vector of the root prediction unit is denoted
Figure PCTCN2016092751-appb-000036
If the best motion vector of the root prediction unit equals the best motion vector of the root prediction unit obtained by integer-pixel motion estimation,
Figure PCTCN2016092751-appb-000037
that is, if
Figure PCTCN2016092751-appb-000038
go to step (2); otherwise go to step (3);
(2) Encode the current coding unit using the sub-prediction units Inter_2N×N, Inter_N×2N, Inter_N×N, Inter_2N×nU, Inter_2N×nD, Inter_nL×2N, and Inter_nR×2N in turn, performing integer-pixel motion estimation only, then go to step (4);
(3) Encode the current coding unit using the sub-prediction units Inter_2N×N, Inter_N×2N, Inter_N×N, Inter_2N×nU, Inter_2N×nD, Inter_nL×2N, and Inter_nR×2N in turn, performing both integer-pixel motion estimation and sub-pixel motion estimation, then go to step (4);
(4) Encode the current coding unit using the intra prediction units, then go to step (5);
(5) Store the coding information and write the bitstream, then return to step (1) to encode the next coding unit.
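Steps (1) to (3) above can be sketched as the following control flow. This is a simplified illustration under stated assumptions, not the HM implementation: the `ipme` and `fpme` callables, the one-MV-per-mode simplification (a real Inter_N×N mode carries several prediction units), the quarter-pixel MV unit, and the tie-breaking rule are all hypothetical. Steps (4) and (5), intra coding and bitstream writing, are omitted.

```python
from typing import Dict, Tuple

MV = Tuple[int, int]  # motion vector in quarter-pixel units (assumed convention)

ROOT_MODE = "Inter_2Nx2N"
SUB_MODES = ("Inter_2NxN", "Inter_Nx2N", "Inter_NxN", "Inter_2NxnU",
             "Inter_2NxnD", "Inter_nLx2N", "Inter_nRx2N")


def full_me(mode, ipme, fpme):
    """Run IPME, then FPME starting from the IPME result.

    Returns (best MV, IPME MV). Ties stay at the integer position (assumption).
    """
    mv_i, cost_i = ipme(mode)
    mv_f, cost_f = fpme(mode, mv_i)
    best = mv_i if cost_i <= cost_f else mv_f
    return best, mv_i


def estimate_cu(ipme, fpme) -> Dict[str, MV]:
    """Steps (1)-(3): return the chosen MV for each inter PU mode of one CU."""
    root_best, root_ipme = full_me(ROOT_MODE, ipme, fpme)   # step (1)
    chosen = {ROOT_MODE: root_best}
    skip_fpme = root_best == root_ipme
    for mode in SUB_MODES:
        if skip_fpme:
            chosen[mode] = ipme(mode)[0]                     # step (2): IPME only
        else:
            chosen[mode] = full_me(mode, ipme, fpme)[0]      # step (3): IPME + FPME
    return chosen


# Stub searches, for illustration only; they record how often FPME runs.
fpme_calls = []


def stub_ipme(mode):
    return (4, 0), 100.0


def make_stub_fpme(cost):
    def stub_fpme(mode, start):
        fpme_calls.append(mode)
        return (start[0] + 1, start[1]), cost
    return stub_fpme


static_cu = estimate_cu(stub_ipme, make_stub_fpme(110.0))  # FPME never wins
static_calls = len(fpme_calls)
fpme_calls.clear()
moving_cu = estimate_cu(stub_ipme, make_stub_fpme(90.0))   # FPME wins at the root
moving_calls = len(fpme_calls)
```

With these stubs, FPME runs once (for the root only) in the first case and eight times (root plus seven sub-prediction unit modes) in the second, which is where the time saving of the method comes from.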
The experimental results of the present invention are given below.
(1) Coding time theoretically saved by the proposed algorithm
It follows from the algorithm that the coding time saved by the proposed method depends on the number of root prediction units that select the integer-pixel motion estimation result as the best motion vector of motion estimation. If most root prediction units do so, the coding time is reduced substantially. Five video sequences of different resolutions were tested to measure, during real-time encoding, the percentage of root prediction units that select the integer-pixel motion estimation result as the best motion vector. The test results are shown in Table 3.
Table 3 Percentage of root prediction units that select the IPME MV as their final best MV, QP=27
Coding configuration  BQSquare  PartyScene  KristenAndSara  BasketballDrive  PeopleOnStreet  Average
LDM                   92.10%    91.25%      99.51%          93.53%           Not applicable  94.10%
RAM                   91.17%    89.80%      Not applicable  96.06%           99.07%          94.03%
As Table 3 shows, under the LDM configuration, the probability that a root prediction unit selects the integer-pixel motion estimation result as the best motion vector of motion estimation during real-time encoding ranges from 91.25% to 99.51%, with an average of 94.10%. Under the RAM configuration, this probability ranges from 89.80% to 99.07%, with an average of 94.03%. The percentages for the sequences "KristenAndSara" and "PeopleOnStreet" exceed 99%, because both sequences contain large regions with simple content, namely background; owing to the strong spatial correlation between consecutive video frames, prediction units with such simple content are very likely to select the integer-pixel motion estimation result as the best motion vector of motion estimation. From these data it can be concluded that the proposed method can terminate the motion estimation process early and save motion estimation time significantly.
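The averages reported in Table 3 can be checked directly from the per-sequence values. The short Python sketch below recomputes them, leaving out the sequences marked "Not applicable"; the dictionary layout is just one convenient way to hold the table.

```python
# Per-sequence percentages copied from Table 3 ("Not applicable" entries omitted).
table3 = {
    "LDM": {"BQSquare": 92.10, "PartyScene": 91.25,
            "KristenAndSara": 99.51, "BasketballDrive": 93.53},
    "RAM": {"BQSquare": 91.17, "PartyScene": 89.80,
            "BasketballDrive": 96.06, "PeopleOnStreet": 99.07},
}
reported_average = {"LDM": 94.10, "RAM": 94.03}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


averages = {config: mean(row.values()) for config, row in table3.items()}
for config, value in averages.items():
    print(f"{config}: computed {value:.4f}, reported {reported_average[config]:.2f}")
```

Both computed means agree with the reported averages to within 0.01 percentage points.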
(2) Comparison of real-time coding performance
To test the coding performance of the proposed method, the HEVC reference software HM12.0 is used as the software platform. Fifteen HEVC standard test sequences are encoded under the HEVC common test conditions with the LDM and RAM coding configurations. The integer-pixel motion estimation method is TZSearch with a search range of ±64; the maximum coding unit size and the maximum quadtree depth are 64×64 and 4, respectively; and four QPs are tested: 22, 27, 32, and 37. The hardware platform is an Intel Xeon CPU E3-1241 v3 @ 3.50 GHz with 4.00 GB of RAM, running the 64-bit Microsoft Windows 7 operating system.
The proposed algorithm is compared with the recently published PCS algorithm [9] in terms of peak signal-to-noise ratio (PSNR), bit rate (BR), total encoding time saving (TETS), and motion estimation time saving (METS). The benchmark is the HEVC reference software HM12.0, and the comparison results are summarized in Table 4. Bjontegaard Delta PSNR (BDPSNR) is the average PSNR difference in dB at the same BR, and Bjontegaard Delta BR (BDBR) is the average BR difference in percent at the same PSNR; both are calculated according to Document 13. TETS and METS are calculated as follows:
Figure PCTCN2016092751-appb-000039
where T_Φ(QP_i) denotes the total coding time of fast motion estimation algorithm Φ, Φ ∈ {PCS [9], the proposed method}, run on HM12.0 with QP equal to QP_i; T_B(QP_i) denotes the total coding time of the benchmark motion estimation algorithm run on HM12.0 with QP equal to QP_i, the benchmark consisting of the original TZSearch and the original sub-pixel motion estimation algorithm and referred to as HM12.0-ME; M_Φ(QP_i) denotes the total motion estimation coding time of fast motion estimation algorithm Φ run on HM12.0 with QP equal to QP_i; and M_B(QP_i) denotes the total motion estimation coding time of HM12.0-ME with QP equal to QP_i, QP_i ∈ {22, 27, 32, 37}.
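The TETS and METS formulas survive here only as an image reference. A common way to define such time savings, assumed in the sketch below, is the relative time reduction against the benchmark averaged over the four QPs. The function name and all timing values are hypothetical.

```python
QPS = (22, 27, 32, 37)


def average_time_saving(baseline: dict, fast: dict) -> float:
    """Mean over the QPs of (baseline - fast) / baseline, in percent.

    Passing total coding times gives TETS; passing motion estimation times
    gives METS. The averaging form is an assumption, not taken from the source.
    """
    ratios = [(baseline[qp] - fast[qp]) / baseline[qp] for qp in QPS]
    return 100.0 * sum(ratios) / len(QPS)


# Hypothetical timings in seconds, for illustration only.
baseline_total = {22: 1000.0, 27: 800.0, 32: 600.0, 37: 400.0}
fast_total = {22: 750.0, 27: 600.0, 32: 450.0, 37: 300.0}
tets = average_time_saving(baseline_total, fast_total)
```

In this made-up example the fast encoder needs 75% of the baseline time at every QP, so the saving is 25%.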
As Table 4 shows, under the LDM configuration, PCS achieves a TETS of 5.63% to 23.62% (14.96% on average) and a METS of 8.14% to 38.89% (25.70% on average). The BDPSNR between PCS and HM12.0-ME ranges from -0.002 dB to -0.018 dB (-0.009 dB on average), and the BDBR between PCS and HM12.0-ME ranges from 0.07% to 0.53% (0.29% on average). Under the RAM configuration, PCS achieves a TETS of 6.26% to 16.56% (11.45% on average) and a METS of 9.91% to 30.45% (22.59% on average). The BDPSNR between PCS and HM12.0-ME ranges from -0.004 dB to -0.026 dB (-0.009 dB on average), and the BDBR ranges from 0.11% to 0.56% (0.26% on average). These values show that PCS achieves excellent RD performance, but the TETS and METS it attains are limited.
Table 4 also shows that the algorithm proposed by the present invention effectively reduces the coding complexity of the motion estimation process while the RD performance loss is negligible. Under the LDM coding configuration, the proposed algorithm achieves a TETS of 20.16% to 35.69% (28.82% on average) and a METS of 57.73% to 66.05% (62.60% on average). The BDPSNR between the proposed algorithm and HM12.0-ME ranges from -0.019 dB to -0.097 dB (-0.049 dB on average), and the BDBR ranges from 0.79% to 2.18% (1.51% on average). Under the RAM coding configuration, the proposed algorithm achieves a TETS of 21.06% to 29.95% (24.35% on average) and a METS of 61.22% to 66.02% (63.82% on average). The BDPSNR between the proposed algorithm and HM12.0-ME ranges from -0.020 dB to -0.089 dB (-0.051 dB on average), and the BDBR ranges from 0.90% to 2.04% (1.41% on average). These values show that the proposed algorithm effectively reduces coding complexity while achieving good rate-distortion performance.
Table 4 Summary of coding results
Figure PCTCN2016092751-appb-000040
Figure PCTCN2016092751-appb-000041
In addition, under the LDM coding configuration, the motion estimation method proposed by the present invention saves 15.87% of the total encoding time and 49.27% of the motion estimation time compared with PCS. Under the RAM coding configuration, the proposed algorithm saves 14.57% of the total encoding time and 53.26% of the motion estimation time compared with PCS. From these values it can be concluded that the technical solution proposed by the present invention effectively reduces the coding complexity of motion estimation.
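As a rough cross-check, the saving of one method relative to another can be derived from their savings over a common baseline. The sketch below assumes that TETS and METS are percentage savings relative to the same HM12.0-ME baseline; the function name `saving_vs_other` is hypothetical. Applied to the RAM averages quoted above, it gives values close to the reported 14.57% and 53.26%. Applied to the LDM averages, it gives roughly 16.3% and 49.7% rather than the reported 15.87% and 49.27%, so those figures were presumably obtained differently (for example, averaged per sequence).

```python
def saving_vs_other(saving_a, saving_b):
    """Percentage of time method A saves relative to method B.

    saving_a and saving_b are percentage savings over the same baseline.
    """
    remaining_a = 1.0 - saving_a / 100.0  # fraction of baseline time A still needs
    remaining_b = 1.0 - saving_b / 100.0  # fraction of baseline time B still needs
    return 100.0 * (1.0 - remaining_a / remaining_b)


# RAM configuration averages: proposed algorithm vs. PCS.
ram_total = saving_vs_other(24.35, 11.45)  # total encoding time
ram_me = saving_vs_other(63.82, 22.59)     # motion estimation time
```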
The above embodiments merely illustrate the technical idea of the present invention and do not limit its scope of protection. Any modification made on the basis of the technical solution in accordance with the technical idea proposed by the present invention falls within the scope of protection of the present invention.

Claims (5)

  1. A fast motion estimation method based on online learning, characterized by comprising the following steps:
    (1) encoding the current coding unit using the Inter_2N×2N root prediction unit, the process comprising integer-pixel motion estimation and sub-pixel motion estimation, the optimal motion vector of the root prediction unit being denoted
    Figure PCTCN2016092751-appb-100001
    if
    Figure PCTCN2016092751-appb-100002
    go to step (2), otherwise go to step (3), where
    Figure PCTCN2016092751-appb-100003
    denotes the optimal motion vector of the root prediction unit obtained by the integer-pixel motion estimation operation;
    (2) encoding the current coding unit using the sub-prediction units Inter_2N×N, Inter_N×2N, Inter_N×N, Inter_2N×nU, Inter_2N×nD, Inter_nL×2N, and Inter_nR×2N in turn, the process comprising integer-pixel motion estimation only, then going to step (4);
    (3) encoding the current coding unit using the sub-prediction units Inter_2N×N, Inter_N×2N, Inter_N×N, Inter_2N×nU, Inter_2N×nD, Inter_nL×2N, and Inter_nR×2N in turn, the process comprising integer-pixel motion estimation and sub-pixel motion estimation, then going to step (4);
    (4) encoding the current coding unit using intra prediction units, then going to step (5);
    (5) storing the coding information and writing the bitstream, then returning to step (1) to encode the next coding unit.
  2. The fast motion estimation method based on online learning according to claim 1, characterized in that: in step (1), when the HEVC encoder operates, each frame is divided into a series of coding tree units based on a quadtree structure, the coding tree unit being the basic processing unit of HEVC; during encoding, each coding tree unit is further divided into coding units based on the quadtree structure; according to the prediction type, each coding unit is further divided into 1, 2, or 4 prediction units, the prediction unit being the basic processing unit of intra prediction and inter prediction.
  3. The fast motion estimation method based on online learning according to claim 1, characterized in that: in step (1), during inter prediction coding of a 2N×2N coding unit, a total of 8 prediction unit modes are supported, namely Inter_2N×2N, Inter_2N×N, Inter_N×2N, Inter_N×N, Inter_2N×nU, Inter_2N×nD, Inter_nL×2N, and Inter_nR×2N.
  4. The fast motion estimation method based on online learning according to claim 1, characterized in that: in step (1), motion estimation proceeds as follows: integer-pixel motion estimation is first performed on the current coding unit; the optimal motion vector obtained by integer-pixel motion estimation is then set as the starting position of sub-pixel motion estimation, and sub-pixel motion estimation is performed on the current coding unit; finally, by comparing the rate-distortion costs of the optimal motion vector of integer-pixel motion estimation
    Figure PCTCN2016092751-appb-100004
    and the optimal motion vector of sub-pixel motion estimation
    Figure PCTCN2016092751-appb-100005
    the optimal motion vector of motion estimation for the current prediction mode
    Figure PCTCN2016092751-appb-100006
    is determined, that is,
    Figure PCTCN2016092751-appb-100007
    where J_ipme and J_fpme are the rate-distortion costs of
    Figure PCTCN2016092751-appb-100008
    and
    Figure PCTCN2016092751-appb-100009
    respectively.
  5. The fast motion estimation method based on online learning according to claim 1, characterized in that: in step (3), the optimal motion vector of the sub-prediction unit
    Figure PCTCN2016092751-appb-100010
    is determined by learning the motion vector information of its root prediction unit, that is,
    Figure PCTCN2016092751-appb-100011
    where
    Figure PCTCN2016092751-appb-100012
    denotes the optimal motion vector of integer-pixel motion estimation of the sub-prediction unit,
    Figure PCTCN2016092751-appb-100013
    denotes the optimal motion vector of the root prediction unit,
    Figure PCTCN2016092751-appb-100014
    denotes the optimal motion vector of integer-pixel motion estimation of the root prediction unit, and
    Figure PCTCN2016092751-appb-100015
    denotes the optimal motion vector of sub-pixel motion estimation of the sub-prediction unit.
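The claimed control flow can be illustrated with a short Python sketch. It is not the HM implementation, and several points are assumptions: the test on the root prediction unit is available only as an image in the claims, so the sketch takes it to be "the root prediction unit's best motion vector equals its integer-pixel motion vector"; motion vectors are assumed to be stored in one common unit so the two can be compared; ties in rate-distortion cost are resolved in favour of the integer-pixel vector. The callables `ipme`, `fpme`, and `cost`, and all other names, are hypothetical stand-ins for encoder routines. Steps (4) and (5), intra prediction and bitstream writing, are omitted.

```python
from dataclasses import dataclass
from typing import Tuple

MV = Tuple[int, int]  # motion vector; a common (e.g. quarter-pixel) unit is assumed

SUB_PU_MODES = ["Inter_2NxN", "Inter_Nx2N", "Inter_NxN", "Inter_2NxnU",
                "Inter_2NxnD", "Inter_nLx2N", "Inter_nRx2N"]


@dataclass
class MEResult:
    mv_ipme: MV  # best vector from integer-pixel motion estimation
    mv_best: MV  # best vector for this prediction mode


def estimate(mode, ipme, fpme, cost, run_fpme=True):
    """Integer-pixel search, then optional sub-pixel search starting from it."""
    mv_i = ipme(mode)
    if not run_fpme:
        return MEResult(mv_i, mv_i)
    mv_f = fpme(mode, mv_i)  # sub-pixel search starts at the integer-pixel result
    # Keep the vector with the lower rate-distortion cost (tie rule assumed).
    best = mv_i if cost(mode, mv_i) <= cost(mode, mv_f) else mv_f
    return MEResult(mv_i, best)


def encode_cu_inter(ipme, fpme, cost):
    """Steps (1) to (3): root prediction unit first, then the seven sub-PUs."""
    root = estimate("Inter_2Nx2N", ipme, fpme, cost)
    # Assumed condition: sub-pixel search did not improve the root PU.
    skip_fpme = root.mv_best == root.mv_ipme
    results = {"Inter_2Nx2N": root}
    for mode in SUB_PU_MODES:
        results[mode] = estimate(mode, ipme, fpme, cost, run_fpme=not skip_fpme)
    return results
```

Under these assumptions, when the sub-pixel search brings no gain for the root prediction unit, the seven sub-prediction units run only the integer-pixel search, which is where the reported motion estimation time saving would come from.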
PCT/CN2016/092751 2016-08-01 2016-08-01 Fast motion estimation method based on online learning WO2018023352A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/092751 WO2018023352A1 (en) 2016-08-01 2016-08-01 Fast motion estimation method based on online learning

Publications (1)

Publication Number Publication Date
WO2018023352A1 true WO2018023352A1 (en) 2018-02-08

Family

ID=61072440

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/092751 WO2018023352A1 (en) 2016-08-01 2016-08-01 Fast motion estimation method based on online learning

Country Status (1)

Country Link
WO (1) WO2018023352A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102196255A (en) * 2010-03-11 2011-09-21 中国科学院微电子研究所 Method for forming video coding complexity control model
CN103414896A (en) * 2013-07-30 2013-11-27 复旦大学 Method for achieving motion estimation based on multiple cores
CN103414899A (en) * 2013-08-16 2013-11-27 武汉大学 Motion estimation method of video coding

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
PAN, ZHAOQING ET AL.: "Fast Motion Estimation Based on Content Property for Low-Complexity H. 265/HEVC Encoder", IEEE TRANSACTIONS ON BROADCASTING, vol. 62, no. 3, 28 June 2016 (2016-06-28), pages 675 - 685, XP011621500 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111510727A (en) * 2020-04-14 2020-08-07 腾讯科技(深圳)有限公司 Motion estimation method and device
CN111510727B (en) * 2020-04-14 2022-07-15 腾讯科技(深圳)有限公司 Motion estimation method and device
CN113489986A (en) * 2021-05-28 2021-10-08 杭州博雅鸿图视频技术有限公司 Integer pixel motion estimation method and device, electronic equipment and medium
CN113489994A (en) * 2021-05-28 2021-10-08 杭州博雅鸿图视频技术有限公司 Motion estimation method, motion estimation device, electronic equipment and medium

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 16910950; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 16910950; Country of ref document: EP; Kind code of ref document: A1)