KR101182037B1

KR101182037B1 - Apparatus and method for high precision motion estimation in video signal

Info

Publication number: KR101182037B1
Application number: KR1020100101397A
Authority: KR
Inventors: 김병규; 김효성; 박찬섭; 정현기
Original assignee: 선문대학교 산학협력단
Priority date: 2010-10-18
Filing date: 2010-10-18
Publication date: 2012-09-11
Also published as: KR20120039936A

Abstract

An apparatus and method for high precision motion estimation of a video signal are disclosed.
A video encoder searches the candidate blocks within a search range in the previous frame corresponding to the position of a macroblock in the current frame, together with multi-scale candidate blocks obtained by scaling those candidate blocks, and matches the macroblock against the searched candidate blocks to estimate the highest-correlation candidate block, that is, the candidate block most highly correlated with the macroblock. So that the rate-distortion cost can be calculated while each candidate block is searched in enlarged and reduced states during motion estimation, the video encoder adopts a multi-scale motion estimation structure in which the candidate blocks transformed (enlarged or reduced) by scaling are normalized before the highest-correlation candidate block is estimated, and it stores the motion vector and the scaling vector (ratio vector) of the highest-correlation candidate block.
With this configuration, the motion estimation process of inter prediction, which strongly affects image quality and compression ratio in a video codec, can be made more precise, and the loss of image quality caused by quantizing the residual data generated during video encoding can be reduced while a higher compression ratio is obtained.

Description

High precision motion estimation apparatus and method thereof for video signal {APPARATUS AND METHOD FOR HIGH PRECISION MOTION ESTIMATION IN VIDEO SIGNAL}

The present invention relates to an apparatus and method for high precision motion estimation of a video signal, and more particularly to an encoding technique that improves motion estimation in the process of compressing a video signal.

The process by which a block-based video signal is compressed in an MPEG/H.26x-based video encoder is as follows.

Each frame of the video signal is divided into slices, and each slice is in turn divided into macroblocks of 16x16 size. In H.264/AVC, the latest MPEG/H.26x-based codec, intra prediction is performed if the macroblock belongs to an I slice, and both intra prediction and inter prediction are performed if it belongs to a P or B slice. The resulting residual data, mode information, motion vectors, and so on pass through frequency transform, quantization, and entropy coding to form a compressed data stream.

Information is lost when the residual data is quantized after the frequency transform, and this leads to a loss of image quality in the actual video. In addition, because what entropy coding actually compresses is the residual data, compression efficiency can be raised by making intra prediction and inter prediction more precise and thereby reducing the residual data.
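The effect of quantization on residual data can be illustrated with a minimal sketch. It assumes a uniform scalar quantizer with a hypothetical step size `qstep` applied directly to a short list of values; a real codec quantizes transform coefficients with its own scaling rules, so the function names and numbers below are illustrative only.

```python
def quantize(coeffs, qstep):
    """Map each value to an integer level (uniform scalar quantizer, assumed)."""
    return [int(round(c / qstep)) for c in coeffs]


def dequantize(levels, qstep):
    """Reconstruct approximate values from the integer levels."""
    return [level * qstep for level in levels]


def quantization_error(coeffs, qstep):
    """Sum of absolute errors introduced by quantize followed by dequantize."""
    recon = dequantize(quantize(coeffs, qstep), qstep)
    return sum(abs(c - r) for c, r in zip(coeffs, recon))


def nonzero_levels(coeffs, qstep):
    """Count the levels that still have to be entropy coded as non-zero."""
    return sum(1 for level in quantize(coeffs, qstep) if level != 0)


large_residual = [13, -27, 9, 35]   # hypothetical values from a poor prediction
small_residual = [1, -2, 0, 3]      # hypothetical values from a good prediction
print(nonzero_levels(large_residual, 8), nonzero_levels(small_residual, 8))
```

With a better prediction the residual values are smaller, so more of them quantize to zero and fewer non-zero levels remain to be coded.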

FIG. 1 shows the quarter-pixel motion estimation used to refine motion estimation in a video encoder according to the prior art, illustrating the motion estimation introduced in H.264/AVC, the latest MPEG/H.26x-based codec.

Motion estimation is the process of finding the most highly correlated region in the previous frame through a matching process that uses a macroblock of the current frame. Ordinary motion estimation finds a motion vector pointing to the highest-correlation block at an integer-pixel position. Quarter-pixel motion estimation interpolates the pixels surrounding the highest-correlation integer pixel to obtain the highest-correlation half pixel, and then interpolates the surrounding pixels again to find the position of the highest-correlation quarter pixel.
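The half-pixel and quarter-pixel interpolation described above can be sketched in one dimension. The sketch assumes the commonly described H.264/AVC luma filter, a 6-tap kernel (1, -5, 20, 20, -5, 1)/32 for half samples and a rounded average for quarter samples; the text here does not state the filter, and the edge replication and the function names are assumptions of this example.

```python
def clip255(value):
    """Clamp a sample to the 8-bit range."""
    return max(0, min(255, value))


def half_pel(row, i):
    """Half sample between row[i] and row[i + 1] (6-tap filter, 1D only)."""
    taps = (1, -5, 20, 20, -5, 1)
    last = len(row) - 1
    # Samples row[i - 2] .. row[i + 3]; replicate the edge sample outside the row.
    window = [row[min(max(i - 2 + k, 0), last)] for k in range(6)]
    acc = sum(t * s for t, s in zip(taps, window))
    return clip255((acc + 16) >> 5)  # divide by 32 with rounding


def quarter_pel(row, i):
    """Quarter sample between row[i] and the half sample to its right."""
    return (row[i] + half_pel(row, i) + 1) >> 1


row = [0, 0, 0, 0, 255, 255, 255, 255]
print(half_pel(row, 3), quarter_pel(row, 3))  # prints 128 64
```

A full implementation would apply the same idea in both directions and then run the matching step at each fractional position around the best integer-pixel match.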

The method of FIG. 1 is clearly effective in improving the precision of motion estimation, but it is limited in that it can estimate only the vertical and horizontal motion of objects in an image sequence. Objects in real image sequences, however, frequently exhibit not only such planar motion but also three-dimensional motion such as zoom-in and zoom-out, and conventional techniques cannot estimate such motion.

The present invention has been proposed to solve the problems of the prior art described above, and its object is to improve the image quality and compression ratio of a video encoder by improving the precision of the motion estimation process performed for inter prediction in the video encoder.

The technical problems to be solved by the present invention are not limited to those mentioned above, and other technical problems not mentioned here will be clearly understood from the description below by those of ordinary skill in the art to which the present invention pertains.

The apparatus for high precision motion estimation of a video signal according to the present invention includes: a candidate block search unit that searches the candidate blocks within a search range in the previous frame corresponding to the position of a macroblock in the current frame, together with multi-scale candidate blocks obtained by scaling the candidate blocks within the search range; a matching unit that estimates, through matching, the highest-correlation candidate block, which is most highly correlated with the macroblock, from the searched candidate blocks; and a block normalization unit that normalizes the multi-scale candidate blocks to the size of the macroblock and provides them so that the matching unit can perform matching by comparing the macroblock with the normalized candidate blocks.

The method for high precision motion estimation of a video signal according to the present invention includes: (a) a video encoder designating a macroblock of the current frame; (b) the video encoder searching the candidate blocks within a search range in the previous frame corresponding to the position of the macroblock in the current frame, together with multi-scale candidate blocks obtained by scaling the candidate blocks within the search range; and (c) the video encoder estimating, through matching, the highest-correlation candidate block, which is most highly correlated with the macroblock, from the searched candidate blocks. In the matching of step (c), the video encoder normalizes the searched candidate blocks to the size of the macroblock and matches the macroblock against the normalized candidate blocks.

According to the apparatus and method for high precision motion estimation of a video signal of the present invention, the motion estimation process of inter prediction, which strongly affects image quality and compression ratio in a video codec, can be made more precise.

In addition, according to the apparatus and method of the present invention, the residual data of a video signal can be reduced efficiently by using, in the compression of the video signal, a motion estimation technique that reflects the enlarging and reducing motion of the image sequence. As a result, the loss of image quality caused by quantizing the residual data generated during video encoding can be reduced while a higher compression ratio is obtained.

These effects are expected to find application in next-generation Ultra HD broadcasting services, video discs, and the like, which require resolutions of up to 8K (7680x4320). In particular, the technique is expected to be highly effective as a new tool for next-generation high-efficiency video compression codecs such as H.265.

FIG. 1 shows the quarter-pixel motion estimation used to refine motion estimation in a video encoder according to the prior art.
FIG. 2 is a block diagram of a video encoder according to an embodiment of the present invention.
FIG. 3 shows the partition types of macroblocks and sub-macroblocks for motion estimation in an embodiment of the present invention.
FIG. 4 shows the detailed configuration of the inter prediction unit in an embodiment of the present invention.
FIG. 5 shows the multi-scale motion estimation used in an embodiment of the present invention to reflect the enlargement and reduction of an image sequence in motion estimation.
FIG. 6 shows the normalization of enlarged and reduced candidate blocks in an embodiment of the present invention.
FIG. 7 shows the multi-scale sampling technique in an embodiment of the present invention.
FIG. 8 shows the sampling technique of FIG. 7 extended to two dimensions in an embodiment of the present invention.
FIG. 9 is a flowchart of a method for high precision motion estimation of a video signal according to an embodiment of the present invention.

Hereinafter, an apparatus and method for high precision motion estimation of a video signal according to a preferred embodiment of the present invention are described in detail with reference to the accompanying drawings.

FIG. 2 shows an apparatus for high precision motion estimation of a video signal according to an embodiment of the present invention, illustrated as an MPEG/H.26x-based video encoder structure.

With reference to FIG. 2, the process of compressing an input video signal into a compressed data stream is as follows.

The block partitioning unit 100 divides every frame of the input video signal into macroblocks of 16x16 size. When a partitioned macroblock belongs to an I slice, the video encoder performs intra prediction through the intra prediction unit 110 and transforms the result, namely the residual image, the mode information of the intra prediction, motion vectors, and so on, through the frequency transform unit 130. The transformed coefficients are quantized by the quantization unit 140. The video encoder then encodes the quantized result through the entropy coding unit 150 and generates the compressed data stream.

When a partitioned macroblock belongs to a P slice or a B slice, the video encoder performs inter prediction through the inter prediction unit 200 between the image of the current frame and the reconstructed image of the previous frame, which has been restored through the inverse quantization unit 160 and the inverse frequency transform unit 170.

In particular, to reflect the enlargement and reduction of an image sequence in motion estimation, the inter prediction unit 200 adopts a multi-scale motion estimation structure: so that the rate-distortion cost can be calculated while each candidate block is searched in enlarged and reduced states during motion estimation, the candidate blocks transformed (enlarged or reduced) by scaling are normalized, and the highest-correlation candidate block is then estimated.

Thereafter, the video encoder compares the rate-distortion cost of the final mode produced by the inter prediction unit 200 with that of the final mode produced by the intra prediction unit 110, selects whichever has the smaller value as the final mode, and generates the compressed data stream.

One of the eight modes shown in FIG. 3 is selected as the final mode of inter prediction.

FIG. 3 illustrates the partition types of macroblocks and sub-macroblocks for motion estimation in an embodiment of the present invention, showing the partition shapes of the macroblock P110 and the sub-macroblock P120, one of the techniques used in H.264/AVC to raise compression efficiency by reducing residual data.

To increase the precision of motion estimation, the video encoder of the embodiment divides a macroblock according to a total of eight modes, with partition sizes of 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, and 4x4. The video encoder calculates the rate-distortion cost for each mode, selects the mode with the lowest rate-distortion cost as the final mode, and generates the residual data.
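The mode decision can be sketched as picking the minimum of a per-mode cost. The sketch assumes the usual Lagrangian form J = D + lambda * R for the rate-distortion cost; the text does not give the cost formula, and the distortion and rate figures below are made-up numbers for illustration.

```python
def rd_cost(distortion, rate_bits, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R (assumed form)."""
    return distortion + lam * rate_bits


def choose_mode(candidates, lam):
    """candidates maps a mode name to (distortion, rate_bits).

    Returns the name and cost of the mode with the lowest cost.
    """
    best = min(candidates, key=lambda mode: rd_cost(*candidates[mode], lam))
    return best, rd_cost(*candidates[best], lam)


# Hypothetical measurements for three partition modes of one macroblock.
modes = {"16x16": (1000, 20), "8x8": (700, 90), "4x4": (600, 200)}
print(choose_mode(modes, lam=4))  # prints ('8x8', 1060)
```

Smaller partitions usually lower the distortion but cost more bits for motion vectors, which is why the cost combines both terms.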

FIG. 4 shows the detailed configuration of the inter prediction unit in an embodiment of the present invention.

The inter prediction unit 200 includes a candidate block search unit 210, a block normalization unit 220, and a matching unit 230.

The candidate block search unit 210 searches the candidate blocks within the search range in the previous frame corresponding to the position of the macroblock in the current frame, together with multi-scale candidate blocks obtained by scaling the candidate blocks within the search range. In other words, to reflect the enlargement and reduction of the image sequence in motion estimation, it also searches the candidate blocks in their enlarged and reduced states.

The matching unit 230 estimates, through matching, the highest-correlation candidate block, which is most highly correlated with the macroblock, from the candidate blocks searched by the candidate block search unit 210. Specifically, the matching unit 230 computes the sum of absolute differences (SAD) between the pixels of the current macroblock and each searched candidate block, takes the candidate block with the smallest SAD value as the highest-correlation candidate block, and completes the multi-scale motion estimation by storing the position of that block as the motion vector and its scale as the scaling vector.
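The SAD measure used for matching can be written in a few lines. This is a minimal sketch that represents blocks as lists of rows; the function name and the size check are choices of this example.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    same_shape = len(block_a) == len(block_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(block_a, block_b)
    )
    if not same_shape:
        raise ValueError("blocks must have the same size")
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


macroblock = [[1, 2], [3, 4]]
candidate = [[2, 0], [3, 7]]
print(sad(macroblock, candidate))  # prints 6
```

The size check reflects why scaled candidate blocks have to be normalized to the macroblock size before they can be compared.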

The block normalization unit 220 normalizes the multi-scale candidate blocks searched by the candidate block search unit 210 to the size of the macroblock (for example, 16x16) and provides them so that the matching unit 230 can perform matching by comparing the macroblock with the normalized candidate blocks.

This configuration makes the motion estimation process of inter prediction, which strongly affects image quality and compression ratio in a video codec, more precise. To this end, not only the two-dimensional motion of a candidate block obtainable through the motion vector of the video codec but also three-dimensional motion such as enlargement and reduction is reflected by scaling the candidate block, so that a candidate block more highly correlated with the macroblock of the current frame is estimated and the residual data is effectively reduced.

FIG. 5 shows the multi-scale motion estimation used in an embodiment of the present invention to reflect the enlargement and reduction of an image sequence in motion estimation.

FIG. 5(a) shows the block-based motion estimation technique performed in the inter prediction of a video encoder. In (a), for a video frame divided into macroblocks P200 of 16x16 size, the candidate blocks within the search range P210 of the previous frame that corresponds to the block position in the current frame are searched, and matching is used to find the motion vector P220 indicating the position of the candidate block most highly correlated with the macroblock P200. The matching process computes the sum of absolute differences (SAD) between the pixels of the macroblock of the current frame and those of a candidate block of the previous frame, and the position of the candidate block with the smallest SAD is stored as the motion vector P220, a two-dimensional coordinate consisting of x and y. The motion estimation process ends once matching has been performed on every candidate block in the search range P210 and the motion vector has been found.
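The conventional search of (a) can be sketched as an exhaustive (full) search over the search range. The sketch assumes frames stored as lists of rows, a square search window of plus or minus `search_range` pixels, and candidates that fall outside the frame being skipped; these conventions and all function names are assumptions of this example.

```python
def get_block(frame, top, left, size):
    """Cut a size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def full_search(cur, prev, top, left, size, search_range):
    """Return (best_sad, (dx, dy)) for the macroblock at (top, left) in cur."""
    target = get_block(cur, top, left, size)
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            # Skip candidates that are not fully inside the previous frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(target, get_block(prev, y, x, size))
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best
```

Calling `full_search(cur, prev, top, left, 16, search_range)` for each macroblock yields the motion vector (dx, dy) with the smallest SAD; practical encoders usually replace the exhaustive loop with a faster search pattern.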

The technique of (a) alone is limited in that it can estimate only the vertical and horizontal motion of objects in an image sequence, and it is difficult to estimate precisely the motion of objects that exhibit three-dimensional motion such as zoom-in and zoom-out in addition to planar motion.

FIG. 5(b) is a structural diagram of the multi-scale motion estimation proposed to address the above problem by reflecting the enlargement and reduction of an image sequence in motion estimation. Its basic procedure is the same as the motion estimation of (a). That is, for the macroblock P300 of the current frame, the candidate blocks within the search range P310 of the previous frame are searched, matching is performed to find the candidate block most highly correlated with the macroblock P300, and the position of the candidate block with the smallest SAD value is stored as the motion vector P320. The motion estimation process ends once matching has been performed for the macroblock P300 on every candidate block in the search range P310 and the motion vector has been found.

In the multi-scale motion estimation technique, after the motion estimation process for the candidate blocks has been completed as in (a), the size of the candidate block is enlarged or reduced by one step and the motion estimation process is repeated.

The video encoder of the embodiment compares the SAD values obtained by matching the candidate blocks in all enlarged and reduced states, and can thereby identify the enlargement or reduction step with the smallest SAD value.

In addition, the video encoder introduces an extension range to limit the range of enlargement and reduction. The step information of the enlargement or reduction is stored in the scaling vector P330. A positive value of the scaling vector P330 means enlargement, and a negative value means reduction. The value of the scaling vector P330 cannot exceed the extension range. FIG. 5(b) shows an example in which the candidate block is enlarged.
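The relationship between the scaling vector and the extension range can be sketched as follows. The text does not state how many pixels one scaling step adds or removes, so the `step` parameter (2 pixels per step here) is purely hypothetical, as are the function names.

```python
def scaling_levels(extension_range):
    """All scaling vector values allowed by the extension range."""
    return list(range(-extension_range, extension_range + 1))


def scaled_block_size(s, base=16, step=2, extension_range=2):
    """Side length of a candidate block for scaling vector s.

    Positive s enlarges the block and negative s reduces it.
    `step` (pixels per scaling step) is an assumed value.
    """
    if abs(s) > extension_range:
        raise ValueError("scaling vector exceeds the extension range")
    return base + s * step


print([scaled_block_size(s) for s in scaling_levels(2)])  # prints [12, 14, 16, 18, 20]
```

Under these assumptions an extension range of 2 produces five candidate sizes per search position, so the range directly bounds the extra search effort.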

To match enlarged and reduced candidate blocks, the current macroblock and the candidate block must have the same size. The enlarged or reduced candidate blocks must therefore be normalized back to 16x16, the size of the macroblock, as shown in FIG. 6. FIG. 6(a) shows the normalization of an enlarged candidate block, and FIG. 6(b) shows the normalization of a reduced candidate block.

FIG. 7 shows the multi-scale sampling technique (one-dimensional sampling technique) used to normalize enlarged and reduced candidate blocks.
FIG. 7(a) shows the process of normalizing to 16x16 a candidate block enlarged with a scaling vector of +1, and FIG. 7(b) shows the process of normalizing to 16x16 a candidate block reduced with a scaling vector of -1. Because showing all 16 pixels would be cluttered, FIG. 7 is simplified to six pixels for convenience. In the normalization process, when the size of the multi-scale candidate block to be sampled is m and the size of the macroblock is n, the distance r between pixels in the normalized multi-scale candidate block is defined by Equation 1.

[Equation 1]

Here, as a result of normalization, when the position (coordinate) of a pixel newly generated between the original pixels is called the i-th index, the distance of the first index (i = 1) is r/2.
The value x of the pixel at the i-th index is defined by Equation 2.

[Equation 2]

Here, a is the value of the original pixel at the position immediately preceding and closest to index i, and b is the value of the original pixel at the position immediately following index i.

The scaling coefficient of Equation 2 is defined by Equation 3.

[Equation 3]
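One plausible reading of the one-dimensional sampling can be sketched in code. Since the formulas themselves do not appear in this text, the sketch rests on three assumptions: the spacing is r = m / n, the i-th sample lies at r/2 + (i - 1) * r measured from the left edge of the block (with original pixel k centred at k + 0.5), and the sample value is a linear blend of the neighbouring original pixels a and b weighted by the fractional offset. It should not be taken as the exact formulation of Equations 1 to 3.

```python
import math


def resample_1d(src, n):
    """Resample a row of m original pixels to n pixels (assumed linear blend)."""
    m = len(src)
    r = m / n  # assumed spacing between samples, in original-pixel units
    out = []
    for i in range(1, n + 1):
        # Sample position relative to the centre of the first original pixel.
        t = r / 2 + (i - 1) * r - 0.5
        k = math.floor(t)
        coeff = t - k  # assumed scaling coefficient: fractional offset in [0, 1)
        a = src[min(max(k, 0), m - 1)]      # original pixel just before the sample
        b = src[min(max(k + 1, 0), m - 1)]  # original pixel just after the sample
        out.append(a + (b - a) * coeff)
    return out


print(resample_1d([0, 10, 20, 30], 2))  # prints [5.0, 25.0]
print(resample_1d([0, 10], 4))          # prints [0.0, 2.5, 7.5, 10.0]
```

The first call shrinks an enlarged row (m > n) and the second stretches a reduced row (m < n); when m equals n the row is returned unchanged.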

FIG. 8 shows the sampling technique of FIG. 7 extended to two dimensions in an embodiment of the present invention.

Because the candidate blocks to be sampled are two-dimensional, of size M x M, the sampling technique that normalizes them must also be carried out in two dimensions. To this end, sampling is performed in two stages. In the first stage, one-dimensional sampling is performed only in the horizontal direction of the candidate block. If the macroblock size is 16 x 16, as in FIG. 7, a total of 16 operations are performed here. In the second stage, one-dimensional sampling is performed in the vertical direction, only on the pixels that have already been sampled horizontally. In this case as well, a total of 16 operations are performed. FIG. 8(a) shows the first stage of the two-dimensional sampling, and FIG. 8(b) shows the second stage.
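The two-stage sampling can be sketched as a separable resampling: a horizontal pass over every row, followed by a vertical pass over the columns produced by the first pass. The one-dimensional helper assumes spacing r = m / n and a linear blend of neighbouring pixels, which is an assumption of this sketch; here the horizontal pass runs over all M rows of the candidate block, which may differ from the operation count given in the text.

```python
import math


def resample_1d(src, n):
    """Resample m original pixels to n pixels (assumed linear blend, r = m / n)."""
    m = len(src)
    r = m / n
    out = []
    for i in range(1, n + 1):
        t = r / 2 + (i - 1) * r - 0.5  # offset from the centre of src[0]
        k = math.floor(t)
        a = src[min(max(k, 0), m - 1)]
        b = src[min(max(k + 1, 0), m - 1)]
        out.append(a + (b - a) * (t - k))
    return out


def normalize_block(block, n=16):
    """Normalize an M x M candidate block to n x n in two passes."""
    # Stage 1: horizontal sampling of each row (M rows of n pixels).
    rows = [resample_1d(row, n) for row in block]
    # Stage 2: vertical sampling of each already-sampled column (n columns).
    cols = [resample_1d(list(col), n) for col in zip(*rows)]
    # Transpose the columns back into n rows of n pixels.
    return [list(row) for row in zip(*cols)]


enlarged = [[0, 10, 20, 30]] * 4        # a 4 x 4 block, normalized to 2 x 2 below
print(normalize_block(enlarged, n=2))   # prints [[5.0, 25.0], [5.0, 25.0]]
```

Doing the vertical pass only on the horizontally sampled pixels keeps the work proportional to the output width rather than the input width.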

FIG. 9 is a flowchart of a method for high precision motion estimation of a video signal according to an embodiment of the present invention.

After designating a macroblock of the current frame, the video encoder searches the candidate blocks within the search range in the previous frame corresponding to the position of the macroblock in the current frame, together with multi-scale candidate blocks obtained by scaling those candidate blocks. During the search, the video encoder increases or decreases the value of the scaling vector one step at a time from its initial value until it exceeds the extension range, so that for a given candidate block in the search range everything from the most reduced candidate block to the most enlarged candidate block is searched.

The video encoder then performs matching to estimate, from the searched candidate blocks, the highest-correlation candidate block, which is most highly correlated with the macroblock. In the matching process, the video encoder normalizes the searched candidate blocks to the size of the macroblock and matches the macroblock against the normalized candidate blocks.

An example of the multi-scale motion estimation process is described below with reference to FIG. 9.

First, the video encoder designates a macroblock of the current frame (S110) and sets the initial value of the scaling vector s (S120). In S120, the initial value of the scaling vector s is set to 'extension range (ER) * (-1)' so that the search starts from the most reduced candidate block.

The video encoder searches for a particular candidate block within the search range of the previous frame (S130) and scales that candidate block by the initial value of the scaling vector s, thereby searching the multi-scale candidate block in its most reduced state (S140).

The video encoder then matches the macroblock against the candidate block found in S130 and the multi-scale candidate block scaled in S140 to estimate the candidate block most highly correlated with the macroblock (S150).

Steps S130 to S150 are repeated until the motion vector (x, y) of the candidate block exceeds the search range (SR) (S160). Once the entire search range (SR) of candidate blocks has been searched (S160), the video encoder increases the value of the ratio vector s by one step (S180), and repeats this until the magnitude of the ratio vector s exceeds the extension range (ER) (S170).

When the magnitude of the ratio vector s exceeds the extension range (ER) (S170), the search has covered every scale up to the most enlarged candidate block. The process therefore proceeds to S190, stores the motion vector (x, y) and the ratio vector s of the highest-correlation candidate block, located at the position of maximum correlation, and ends the motion estimation process for that macroblock.
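The flow of S110 to S190 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the encoder's actual implementation: the name `multi_scale_search` and the helper functions are hypothetical, each step of the ratio vector s is assumed to grow the candidate block by one pixel on every side (side length m = n + 2s), candidates that fall outside the frame are skipped, and SAD is used as the matching cost with ties resolved in favor of the first candidate found.

```python
import math


def resample_1d(samples, n):
    """Linearly resample m values to n values (assumed sampling rule)."""
    m = len(samples)
    if m == n:
        return [float(v) for v in samples]
    r = (m - 1) / n
    out = []
    for i in range(n):
        p = (i + 0.5) * r
        k = min(int(math.floor(p)), m - 1)
        k_next = min(k + 1, m - 1)
        out.append(samples[k] + (samples[k_next] - samples[k]) * (p - k))
    return out


def normalize_block(block, n):
    """Resample an m x m block to n x n, horizontally and then vertically."""
    rows = [resample_1d(row, n) for row in block]
    cols = [resample_1d(list(col), n) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def multi_scale_search(cur, prev, bx, by, n, sr, er):
    """Return (cost, (dx, dy), s) of the best candidate, or None if none fit."""
    height, width = len(prev), len(prev[0])
    target = [row[bx:bx + n] for row in cur[by:by + n]]    # S110: macroblock
    best = None
    s = -er                                                # S120: start most reduced
    while s <= er:                                         # S170: stop once s exceeds ER
        m = n + 2 * s  # hypothetical mapping from ratio vector to block side length
        if m >= 1:
            for dy in range(-sr, sr + 1):                  # S160: cover the search range
                for dx in range(-sr, sr + 1):
                    x0, y0 = bx + dx - s, by + dy - s      # keep the block centered
                    if x0 < 0 or y0 < 0 or x0 + m > width or y0 + m > height:
                        continue                           # assumption: skip out-of-frame
                    cand = [row[x0:x0 + m] for row in prev[y0:y0 + m]]  # S130/S140
                    cost = sad(target, normalize_block(cand, n))        # S150
                    if best is None or cost < best[0]:
                        best = (cost, (dx, dy), s)
        s += 1                                             # S180: next scale step
    return best                                            # S190: vectors to store
```

Calling `multi_scale_search(cur, prev, bx, by, n, sr, er)` with two frames given as lists of pixel rows returns the lowest cost together with the motion vector (dx, dy) and the ratio vector s that would be stored in S190.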

Although embodiments of the present invention have been described above with reference to the accompanying drawings, those of ordinary skill in the art to which the present invention pertains will understand that the invention may be embodied in other specific forms without changing its technical spirit or essential features.

Therefore, the embodiments described above are provided to fully convey the scope of the invention to those of ordinary skill in the art to which the present invention pertains, and should be understood as illustrative in all respects and not restrictive. The present invention is defined only by the scope of the claims.

200: inter prediction unit
210: candidate block search unit
220: block normalization unit
230: matching execution unit

Claims

An apparatus for high precision motion estimation of a video signal, comprising:
a candidate block search unit that searches candidate blocks within a search range in a previous frame corresponding to the position of a macroblock in a current frame, and multi-scale candidate blocks obtained by scaling the candidate blocks within the search range;
a matching execution unit that estimates, through matching, the highest-correlation candidate block having the highest correlation with the macroblock from among the searched candidate blocks; and
a block normalization unit that normalizes the multi-scale candidate blocks to the size of the macroblock so that the matching execution unit performs matching by comparing the macroblock with the normalized candidate blocks.

delete

The apparatus of claim 1,
wherein, in the normalization process, when an M × M two-dimensional multi-scale candidate block is to be sampled, one-dimensional sampling is performed in the horizontal direction of the candidate block and one-dimensional sampling is then performed in the vertical direction, thereby completing the two-dimensional sampling.

The apparatus of claim 1,
wherein, in the normalization process, when the size of the multi-scale candidate block to be sampled is m and the size of the macroblock is n, the distance r between pixels in the normalized multi-scale candidate block is defined by the equation 'r = (m - 1) / n',
provided that, when the position of a pixel newly generated between original pixels as a result of normalization is referred to as the i-th index, the distance of the first index (i = 1) is r / 2.

The apparatus of claim 4,
wherein the value x of the pixel at the i-th index is defined by the equation 'x[i] = a + (b - a) * scaling coefficient', where a is the value of the original pixel at the position immediately preceding index i and b is the value of the original pixel at the position immediately following index i, and
the scaling coefficient is defined by the equation 'scaling coefficient = r * (i mod n)'.

The apparatus of claim 1, wherein the matching execution unit
computes the sum of absolute differences (SAD) between the pixels of the macroblock and the pixels of each searched candidate block, finds the highest-correlation candidate block having the smallest SAD value, and stores the position of the highest-correlation candidate block as a motion vector and the scale of the highest-correlation candidate block as a ratio vector.

A method for high precision motion estimation of a video signal, comprising:
(a) designating, by a video encoder, a macroblock of a current frame;
(b) searching, by the video encoder, candidate blocks within a search range in a previous frame corresponding to the position of the macroblock in the current frame, and multi-scale candidate blocks obtained by scaling the candidate blocks within the search range; and
(c) estimating, by the video encoder, through matching, the highest-correlation candidate block having the highest correlation with the macroblock from among the searched candidate blocks,
wherein, in the matching process of (c), the searched candidate blocks are normalized to the size of the macroblock and the macroblock is matched against the normalized candidate blocks.

delete

The method of claim 7,
wherein, in the search process of (b), the value of the ratio vector used for scaling is increased or decreased one step at a time from its initial value until the value exceeds the extension range, so that, for a specific candidate block within the search range, all candidate blocks from the most reduced candidate block to the most enlarged candidate block are searched.