KR20190050207A

KR20190050207A - System and method for motion estimation for high-performance HEVC encoder

Info

Publication number: KR20190050207A
Application number: KR1020170145514A
Authority: KR
Inventors: 류광기; 전성훈
Original assignee: 한밭대학교 산학협력단
Priority date: 2017-11-02
Filing date: 2017-11-02
Publication date: 2019-05-10
Also published as: KR102007377B1

Abstract

The present invention relates to a motion estimation system and method for a high-performance HEVC encoder that can effectively reduce hardware area and computation time by reusing sum of absolute differences (SAD) results calculated in 4×4 units to calculate the SAD results for entire prediction unit (PU) blocks.

Description

SYSTEM AND METHOD FOR MOTION ESTIMATION FOR HIGH-PERFORMANCE HEVC ENCODER

The present invention relates to a motion estimation system and method for an HEVC encoder, and more particularly to a motion estimation system and method for a high-performance HEVC encoder that reuses SAD results calculated in 4×4 units for all blocks, in order to reduce the SAD computational complexity and repetition that arise in the conventional full-search algorithm.

The recent development of various video devices supporting ultra-high-resolution UHD (Ultra High Definition) video has increased users' interest in and demand for high-resolution video. For this reason, a new video compression standard became necessary to support high-resolution video such as UHD. In line with this trend, HEVC (High Efficiency Video Coding) is a next-generation video compression standard developed by JCT-VC (Joint Collaborative Team on Video Coding), which was formed jointly in January 2010 by the ISO/IEC MPEG (Moving Picture Experts Group) and the ITU-T VCEG (Video Coding Experts Group), the groups that had standardized H.264/AVC. HEVC applies three kinds of coding units, the CU (Coding Unit), PU (Prediction Unit), and TU (Transform Unit), performs coding with a hierarchical quad-tree structure, and uses coding units of various sizes from 64×64 down to 8×8. In addition, by applying more intra prediction directions, improved motion prediction techniques, motion vector merging, and refined loop filters, it achieves about twice the compression performance of the previous codec, H.264/AVC. However, the diverse coding structures and improved prediction techniques have greatly increased the computational complexity.

Among the new techniques, motion prediction generates the prediction block most similar to the current block. When the correlation between the current PU and a reference block is compared, SAD (Sum of Absolute Differences), a simplified correlation measure, is used in consideration of the characteristics of integer pixels and sub-pixels. However, the full-search algorithm for inter prediction repeats the SAD operation for PUs of various sizes, from the 4×8 PU up to the 64×64 PU, so both the amount of computation and the computation time are large.

Specifically, motion estimation (ME) in inter prediction is a process performed in the encoder that searches a reference picture for a prediction block highly correlated with the current PU. As a result of motion estimation, the reference picture list information, the reference picture index, the motion vector, and the transformed and quantized coefficients of the difference signal are transmitted to the decoder for each PU. The decoder performs motion compensation: using the accompanying information transmitted from the encoder, it generates the same prediction block as the encoder and produces a reconstructed block using the quantized residual signal. FIG. 1 shows the motion estimation process.

When the correlation between the current PU and a reference block is compared, the SAD (Sum of Absolute Differences) of Equation 1, a simplified correlation measure, is used.

Here, B_cur is the current block, B_ref is a motion estimation candidate block in the reference picture, i and j denote the position of the current PU, and k and l denote the position of the PU targeted by motion estimation. Integer-pixel motion estimation selects one prediction block by choosing the reference block that minimizes the absolute value of the difference between the current block and the reference block.
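Equation 1 itself is not reproduced in this text. A standard SAD definition consistent with the symbols defined here would take the following form. The index convention, which treats (k, l) as the displacement of the candidate block and sums (i, j) over the pixels of the PU, is an assumption and is not taken from the source:

```latex
\mathrm{SAD}(k, l) = \sum_{i} \sum_{j} \left| B_{cur}(i, j) - B_{ref}(i + k,\ j + l) \right|
```

Under this reading, the selected prediction block is the candidate (k, l) for which SAD(k, l) is smallest.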

In addition, HEVC uses the Test Zone Search (TZS) algorithm and the Full-Search algorithm to find the optimal prediction block. As shown in FIG. 2, Full-Search repeats the search over X, Y pixel coordinates from (-64, -64) to (64, 64), so its search performance is excellent, but the amount of computation becomes very large as the search range grows. The TZS algorithm, in contrast, computes only four or eight search points per step in a grid search, so it is fast, but its prediction performance is lower than that of Full-Search.
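The cost of an exhaustive search can be illustrated with a minimal Python sketch. It is not the HM implementation: the function names `sad` and `full_search`, the picture layout (lists of rows), and the boundary handling are all assumptions made for illustration. With a range of ±64 in both directions there are 129 × 129 = 16,641 candidate positions, and each one needs a full block SAD.

```python
def sad(cur, ref, x0, y0):
    """SAD between block `cur` and the same-sized window of `ref` whose
    top-left corner is at (x0, y0)."""
    return sum(
        abs(cur[j][i] - ref[y0 + j][x0 + i])
        for j in range(len(cur))
        for i in range(len(cur[0]))
    )


def full_search(cur, ref, cx, cy, search_range):
    """Test every integer offset (dx, dy) in [-search_range, search_range]^2
    around (cx, cy) and return (best_sad, dx, dy)."""
    h, w = len(cur), len(cur[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x0, y0 = cx + dx, cy + dy
            # Skip candidates whose window leaves the reference picture
            # (hypothetical policy; real encoders pad the picture instead).
            if x0 < 0 or y0 < 0 or x0 + w > len(ref[0]) or y0 + h > len(ref):
                continue
            cost = sad(cur, ref, x0, y0)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Number of candidates for the (-64, -64) .. (64, 64) range in the text.
candidates = (2 * 64 + 1) ** 2

# Small demo: the current block is cut out of the reference at (6, 5),
# and the search starts from (4, 4) with a range of +/-3.
ref = [[x * x + 17 * y * y for x in range(16)] for y in range(16)]
cur = [row[6:10] for row in ref[5:9]]
best = full_search(cur, ref, cx=4, cy=4, search_range=3)
```

In the demo the block is found at offset (2, 1) with a SAD of zero, since it was copied from that position.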

Accordingly, the present invention proposes a system and method that apply a new algorithm to reduce the amount of computation and the computation time of the repeatedly performed SAD operation for a high-performance HEVC encoder.

Korean Patent Laid-Open Publication No. 10-2014-0056599 (2014.05.12)
Korean Registered Patent No. 10-1621358 (2016.05.17)

Accordingly, the present invention has been made to solve the above problems of the prior art. An object of the present invention is to provide a motion estimation system and method for a high-performance HEVC encoder that can effectively reduce hardware area and computation time by reusing the SAD results calculated in 4×4 units during the full search to calculate the SAD results for all PU blocks, and by being designed to process the selection of the optimal PU partition block in parallel.

To achieve the above object, a motion estimation method for a high-performance HEVC encoder according to the present invention includes: calculating the Merge/SKIP RD-cost for 2N×2N; storing the SAD results calculated in 4×4 units; calculating result values by calling the memory indexes for the corresponding area from the memory in the order of the N×N, N×2N, and 2N×N PU modes; and determining the optimal PU mode by calling the memory indexes for the corresponding area from the memory while calculating the RD-cost for asymmetric partitions according to the AMP (Asymmetric Motion Partition) condition.

Meanwhile, a motion estimation system for a high-performance HEVC encoder according to the present invention includes a CLKGen module that divides the required clock, a MemCtrl module that receives pixels from memory, a TComRDCost module that stores the SAD values calculated in 4×4 block units for the received pixels, and a TEnSearch module that obtains the SAD results for all PU block partitions using the SAD values calculated in 4×4 block units.

As described above, the motion estimation system and method for a high-performance HEVC encoder according to the present invention provide the following effect.

Hardware area and computation time can be effectively reduced by reusing the SAD results calculated in 4×4 units during the full search to calculate the SAD results for all PU blocks, and by designing the selection of the optimal PU partition block to be processed in parallel.

FIG. 1 shows the motion estimation process of conventional HEVC.
FIG. 2 shows the search range of a full search for a 64×64 PU block in conventional HEVC.
FIG. 3 shows a method of reusing blocks in a motion estimation algorithm for a high-performance HEVC encoder according to a preferred embodiment of the present invention.
FIG. 4 shows the overall flow of inter prediction proposed for reusing blocks in a motion estimation algorithm for a high-performance HEVC encoder according to a preferred embodiment of the present invention.
FIG. 5 shows an overall block diagram of a motion estimation system according to a preferred embodiment of the present invention.
FIG. 6 shows the internal structure of the TComRDCost module of a motion estimation system according to a preferred embodiment of the present invention.
FIG. 7 shows the internal structure of the TEnSearch module of a motion estimation system according to a preferred embodiment of the present invention.
FIG. 8 shows a method of verifying the hardware structure according to a preferred embodiment of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become apparent with reference to the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. The embodiments are provided only so that the disclosure of the present invention is complete and so that those of ordinary skill in the art to which the present invention pertains are fully informed of the scope of the invention, and the present invention is defined only by the scope of the claims. Like reference numerals refer to like elements throughout the specification.

Hereinafter, the motion estimation system and method for a high-performance HEVC encoder according to the present invention will be described in detail with reference to the accompanying drawings.

FIG. 3 shows a method of reusing blocks in a motion estimation algorithm for a high-performance HEVC encoder according to a preferred embodiment of the present invention.

To effectively reduce the amount of computation in the Full-Search algorithm, the present invention proposes an algorithm that reuses the SAD results of 4×4 units to calculate the SAD results for all PU blocks. As shown in FIG. 3, the sum of the SAD results of four 4×4 blocks equals the SAD result of one 8×8 block, and the sum of the SAD results of four 8×8 blocks equals the SAD result of one 16×16 block. Therefore, by storing the 4×4-unit SAD results and then calling and adding only the values for the required area (the PU partition size), the number of repeated SAD operations is effectively reduced and the SAD value can be obtained for every PU partition.
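The additive property described for FIG. 3 can be checked with a short Python sketch. The helper names (`sad_block`, `sad_grid_4x4`, `merge_2x2`) and the synthetic pixel data are assumptions for illustration only; the sketch computes the 4×4 SADs once, then builds the 8×8 and 16×16 SADs purely by addition and compares them with SADs computed directly from the pixels.

```python
def sad_block(cur, ref, x, y, w, h):
    """Direct SAD over the w x h region whose top-left corner is (x, y),
    with both blocks sampled at the same position."""
    return sum(
        abs(cur[y + j][x + i] - ref[y + j][x + i])
        for j in range(h)
        for i in range(w)
    )


def sad_grid_4x4(cur, ref):
    """Compute the SAD of every 4x4 sub-block once and store it in a grid."""
    n = len(cur) // 4
    return [
        [sad_block(cur, ref, 4 * bx, 4 * by, 4, 4) for bx in range(n)]
        for by in range(n)
    ]


def merge_2x2(grid):
    """Add each 2x2 group of stored SADs to get the next larger block size."""
    n = len(grid) // 2
    return [
        [
            grid[2 * r][2 * c] + grid[2 * r][2 * c + 1]
            + grid[2 * r + 1][2 * c] + grid[2 * r + 1][2 * c + 1]
            for c in range(n)
        ]
        for r in range(n)
    ]


# Synthetic 16x16 current and reference blocks (illustrative data only).
cur = [[(5 * x + 3 * y) % 17 for x in range(16)] for y in range(16)]
ref = [[(2 * x + 7 * y) % 13 for x in range(16)] for y in range(16)]

g4 = sad_grid_4x4(cur, ref)  # 4x4 grid of 4x4 SADs (pixel work happens here)
g8 = merge_2x2(g4)           # 2x2 grid of 8x8 SADs (additions only)
g16 = merge_2x2(g8)          # single 16x16 SAD (additions only)
```

Each pixel difference is evaluated exactly once, in `sad_grid_4x4`; every larger block costs only three additions per merged value.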

FIG. 4 shows the overall flow of inter prediction proposed for reusing blocks in a motion estimation algorithm for a high-performance HEVC encoder according to a preferred embodiment of the present invention.

Referring to FIG. 4, the Merge/SKIP RD-cost is calculated for 2N×2N in step S410. Then, instead of calculating the RD-cost separately in the order of the PU modes 2N×2N, N×N, N×2N, 2N×N, and so on, the Inter 2N×2N mode is performed when the 2N×2N SAD operation is carried out, and the SAD results calculated in 4×4 units are stored in memory (step S420). Result values are then calculated by calling the memory indexes for the corresponding area from memory in the order of the N×N, N×2N, and 2N×N PU modes (steps S431 to S433). Finally, depending on the AMP (Asymmetric Motion Partition) condition (step S440), when asymmetric partitions are evaluated (Yes in step S440), the RD-cost calculation likewise calls the memory indexes for the corresponding area from memory to determine the optimal PU mode (steps S451 to S454).
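The "call the memory indexes for the corresponding area" step can be sketched in Python as a lookup over the stored 4×4 SAD grid. This is an illustrative sketch, not the patented flow: `region_sum` and `partition_sads` are hypothetical names, the AMP mode names (2N×nU, 2N×nD, nL×2N, nR×2N) are the standard HEVC names rather than labels from this document, and the sketch assumes a single candidate position, so all partitions share the same stored grid.

```python
def region_sum(grid, c0, r0, c1, r1):
    """Sum the stored 4x4 SADs over grid columns [c0, c1) and rows [r0, r1)."""
    return sum(grid[r][c] for r in range(r0, r1) for c in range(c0, c1))


def partition_sads(grid):
    """Return the SAD of every PU in each partition mode, using only
    the stored 4x4 SADs (no pixel accesses)."""
    n = len(grid)          # CU side in 4x4 units (8 for a 32x32 CU)
    h, q = n // 2, n // 4  # half and quarter of the CU side
    rects = {
        "2Nx2N": [(0, 0, n, n)],
        "NxN":   [(0, 0, h, h), (h, 0, n, h), (0, h, h, n), (h, h, n, n)],
        "Nx2N":  [(0, 0, h, n), (h, 0, n, n)],          # left / right
        "2NxN":  [(0, 0, n, h), (0, h, n, n)],          # top / bottom
        "2NxnU": [(0, 0, n, q), (0, q, n, n)],          # AMP modes
        "2NxnD": [(0, 0, n, n - q), (0, n - q, n, n)],
        "nLx2N": [(0, 0, q, n), (q, 0, n, n)],
        "nRx2N": [(0, 0, n - q, n), (n - q, 0, n, n)],
    }
    return {mode: [region_sum(grid, *r) for r in rs] for mode, rs in rects.items()}


# Illustrative stored grid for a 32x32 CU: 8x8 entries with made-up values.
grid = [[r * 8 + c for c in range(8)] for r in range(8)]
sads = partition_sads(grid)
```

Whatever the mode, the PU SADs of one CU add up to the same 2N×2N total, which gives a simple consistency check on the index ranges.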

Experimental Example 1

To evaluate the performance improvement of the SAD algorithm proposed in the present invention, it was verified on all class sequences using HM 16.12, the HEVC reference software. BD-PSNR and BD-Bitrate over four QPs (22, 27, 32, 37) were used as the performance metrics. Table 1 shows the experimental results.

In a performance comparison with [4] and [5], the proposed SAD algorithm improved the encoding speed by 61% on average relative to the reference software, calculated as in Equation 2.

Here, TS is the encoding speed, TS_HM is the TS of the reference software HM 16.12, and TS_propose is the TS of the present invention.

Next, the motion estimation system for a high-performance HEVC encoder according to the present invention, which uses the method proposed above, will be described.

The present invention uses a parallel structure of 4×4 operators to reduce the high computational load and computation time that arise when the SAD operation is performed in units of 32×32 pixels, and to minimize the hardware area.

The conventional motion estimation SAD calculator has a top-down structure: to determine the PU block partitioning by calculating the optimal SAD value, it splits the LCU (Largest CU) block into sub-blocks down to a maximum depth of 3, as shown in Table 2, and repeats the computation at each level. The motion estimation system according to a preferred embodiment of the present invention is instead designed with a bottom-up structure that computes the lowest depth first and reuses the previously computed results up to the higher depths.

In the bottom-up approach, the SAD operation starts from 4×4, the lowest depth, and the 4×4 SAD results are then used at 8×8. The top-down approach must process a 32×32 block and therefore 1024 pixels, whereas the bottom-up approach processes 4×4 blocks in parallel and therefore only 16 pixels need to be processed per block. Thus, according to the present invention, the hardware size, the amount of computation, and the computation time are greatly reduced.

FIG. 5 shows an overall block diagram of a motion estimation system according to a preferred embodiment of the present invention. The overall structure consists of a CLKGen module that divides the required clock, a MemCtrl module that receives pixels from memory, a TComRDCost module that calculates the SAD of the received pixel data in 4×4 block units and stores the values, and a TEnSearch module that obtains the SAD results for all PU block partitions using the SAD values calculated in 4×4 block units.

FIG. 6 shows the internal structure of the TComRDCost module of a motion estimation system according to a preferred embodiment of the present invention. The TComRDCost module of the motion estimation hardware performs the SAD operation in 4×4 block units and consists of 64 operators so that the 32×32 LCU can be processed in parallel in 4×4 units. Each SAD Computing module performs the abs (absolute value) operation on the 32 values comprising the 4×4 pixels of the current block and the 4×4 pixels of the reference block, and then outputs the SAD result.

FIG. 7 shows the internal structure of the TEnSearch module of a motion estimation system according to a preferred embodiment of the present invention. The TEnSearch module of the motion estimation hardware uses the 4×4-unit SAD values received from the TComRDCost module to calculate, in a bottom-up manner, the SAD results for all PU partitions of a 32×32 block. The 64 received 4×4 SAD results are used to calculate the SAD results at each depth. The Depth_3 block calculates the SAD values for the 8×4, 4×8, and 8×8 blocks. The Depth_3 values are then reused to calculate the Depth_2 values, and the Depth_2 values are reused to calculate the Depth_1 values, giving a bottom-up computation. Because the motion estimation algorithm determines the optimal PU block partitioning and estimates the motion vector by choosing the PU block with the minimum SAD result, the Save_min block outputs as SAD_Result only the value with the minimum SAD result for each block.
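The bottom-up aggregation and the role of Save_min can be modelled in software with the following Python sketch. It is a behavioural illustration under stated assumptions, not the hardware design: `merge_2x2`, `bottom_up`, and the `SaveMin` class are hypothetical names, only square blocks are modelled (the 8×4 and 4×8 sums of Depth_3 are omitted), and the two candidate grids hold made-up SAD values.

```python
def merge_2x2(grid):
    """Add each 2x2 group of SADs to obtain the next larger block size."""
    n = len(grid) // 2
    return [
        [
            grid[2 * r][2 * c] + grid[2 * r][2 * c + 1]
            + grid[2 * r + 1][2 * c] + grid[2 * r + 1][2 * c + 1]
            for c in range(n)
        ]
        for r in range(n)
    ]


def bottom_up(grid4):
    """Build SAD grids for 8x8, 16x16, ... from the 4x4 grid by reuse."""
    levels = {4: grid4}
    size, grid = 4, grid4
    while len(grid) > 1:
        grid = merge_2x2(grid)
        size *= 2
        levels[size] = grid
    return levels


class SaveMin:
    """Keep, for every block, the smallest SAD seen and its motion vector."""

    def __init__(self):
        self.best = {}

    def update(self, levels, mv):
        for size, grid in levels.items():
            for r, row in enumerate(grid):
                for c, value in enumerate(row):
                    key = (size, r, c)
                    if key not in self.best or value < self.best[key][0]:
                        self.best[key] = (value, mv)


# Two candidate positions for a 32x32 block, each with 64 stored 4x4 SADs.
grid_a = [[1] * 8 for _ in range(8)]           # candidate MV (0, 0)
grid_b = [[2] * 8 for _ in range(8)]           # candidate MV (1, 0)
for r in range(2):
    for c in range(2):
        grid_b[r][c] = 0                       # perfect match in one 8x8 block

save_min = SaveMin()
save_min.update(bottom_up(grid_a), (0, 0))
save_min.update(bottom_up(grid_b), (1, 0))
```

In this example the top-left 8×8 block prefers the second candidate, while the enclosing 16×16 and 32×32 blocks still prefer the first, which is why a minimum has to be kept per block rather than per candidate.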

Experimental Example 2

To verify the hardware structure according to a preferred embodiment of the present invention, as shown in FIG. 8, the pixel values of a 32×32 block were extracted from HM 16.12, the HEVC reference software, and the results of HM 16.12 were compared with the results obtained by feeding the same input to the hardware according to a preferred embodiment of the present invention.

When synthesized with a 65 nm CMOS process library, the motion estimation algorithm according to a preferred embodiment of the present invention showed a maximum operating frequency of 255 MHz and a total gate count of 65.1K, as shown in Table 3.

Although the present invention has been described in more detail above with reference to several embodiments, the present invention is not necessarily limited to these embodiments and may be modified in various ways without departing from the technical spirit of the present invention.

Claims

1. A motion estimation method for a high-performance HEVC encoder, wherein SAD (Sum of Absolute Differences) results calculated in 4×4 units are reused to calculate the SAD result for an entire PU (Prediction Unit) block.

2. The method of claim 1, wherein selection of the optimal PU partition block is processed in parallel.

3. The method of claim 2, wherein the SAD results calculated in 4×4 units are stored and only the SAD result values corresponding to the PU partition size are called.

4. A motion estimation method for a high-performance HEVC encoder, comprising:
calculating a Merge/SKIP RD-cost for 2N×2N;
storing SAD results calculated in 4×4 units;
calculating result values by calling memory indexes for the corresponding area from a memory in the order of the N×N, N×2N, and 2N×N PU modes; and
determining an optimal PU mode by calling the memory indexes for the corresponding area from the memory while calculating an RD-cost for asymmetric partitions according to an AMP (Asymmetric Motion Partition) condition.

5. A motion estimation system for a high-performance HEVC encoder, wherein SAD results calculated in 4×4 units are reused to calculate the SAD result for an entire PU block.

6. The system of claim 5, wherein selection of the optimal PU partition block is processed in parallel.

7. The system of claim 5 or 6, wherein the SAD results calculated in 4×4 units are stored and only the SAD result values corresponding to the PU partition size are called.

8. The system of claim 5, wherein the optimal SAD value is computed first for the lowest depth and the previously computed results are reused up to the highest depth.

9. A motion estimation system for a high-performance HEVC encoder, comprising:
a CLKGen module that divides the required clock;
a MemCtrl module that receives pixels from a memory;
a TComRDCost module that stores SAD values calculated in 4×4 block units for the received pixels; and
a TEnSearch module that obtains SAD results for all PU block partitions using the SAD values calculated in 4×4 block units.

10. The system of claim 9, wherein the TComRDCost module performs the SAD operation in 4×4 block units and comprises a plurality of SAD operators for processing the LCU (Largest Coding Unit) size in parallel in 4×4 units.

11. The system of claim 10, wherein each SAD operator performs an absolute value operation on the 32 values comprising the 4×4 pixels of the current block and the 4×4 pixels of the reference block, and outputs the SAD result value.

12. The system of any one of claims 9 to 11, wherein the TEnSearch module calculates the SAD result at each depth in a bottom-up manner using the 4×4-unit SAD values received from the TComRDCost module, and calculates the SAD results for all PU partitions of a 32×32 block.