KR20050074151A

KR20050074151A - Method for selecting motion vector in scalable video coding and the video compression device thereof

Info

Publication number: KR20050074151A
Application number: KR1020040002379A
Authority: KR
Inventors: 한우진; 김수현
Original assignee: 삼성전자주식회사
Priority date: 2004-01-13
Filing date: 2004-01-13
Publication date: 2005-07-18

Abstract

The present invention relates to a method of obtaining a single motion vector that takes a plurality of resolutions into account, in a scalable video coding method that supports multiple resolutions.

According to the present invention, a method of selecting a motion vector in scalable video compression supporting multiple resolutions calculates, for a given motion vector candidate, an error at each resolution, computes a total error by weighting each of those errors, and selects the candidate with the minimum total error as the actual motion vector.

According to the present invention, using an appropriate motion vector in scalable video coding improves the compression ratio of the temporal filtering process, which in turn improves overall image quality.

Description

Method for selecting motion vector in scalable video coding and the video compression device

The present invention relates generally to video compression and, more particularly, to a method of obtaining a single motion vector that takes a plurality of resolutions into account in a scalable video coding method that supports multiple resolutions.

As information and communication technology, including the Internet, develops, video communication is increasing alongside text and voice. Conventional text-oriented communication is not enough to satisfy the varied needs of consumers, and multimedia services that can carry various forms of information such as text, video, and music are increasing accordingly. Multimedia data is very large, so it requires high-capacity storage media and wide bandwidth for transmission. Compression coding is therefore essential for transmitting multimedia data that includes text, video, and audio.

The basic principle of data compression is the removal of redundancy. Data can be compressed by removing spatial redundancy, such as the same color or object repeating within an image; temporal redundancy, such as adjacent frames of a video that barely change or the same sound repeating continuously in audio; or psychovisual redundancy, which exploits the insensitivity of human vision and perception to high frequencies.

Most current video coding standards are based on motion-compensated predictive coding: temporal redundancy is removed by temporal filtering based on motion compensation, and spatial redundancy is removed by a spatial transform.

Transmitting the multimedia data produced after redundancy removal requires a transmission medium, and performance differs from one medium to another. Transmission media in current use offer widely varying rates, from high-speed networks capable of tens of megabits per second to mobile networks with a rate of 384 kbit/s.

In such an environment, a data coding method with scalability, that is, one that can support transmission media of various speeds or transmit multimedia at a rate suited to the transmission environment, is better suited to multimedia.

Scalability here means the property of a video/image coding method that allows a single compressed bitstream to be partially decoded according to conditions such as bit rate, error rate, and system resources.

Standardization of scalable video coding is already under way in MPEG-21 (Moving Picture Experts Group-21) Part 13, where wavelet-based schemes are regarded as a leading candidate for the spatial transform. For still images (hereinafter "images"), JPEG-2000 (Joint Photographic Experts Group-2000), a wavelet-based scalable image coding scheme, is already in practical use.

FIG. 1 schematically shows the overall structure of a scalable video/image coding system.

First, the encoder 100 encodes the input video/image 10 to generate a single bitstream 20.

The pre-decoder 200 then truncates the bitstream 20 received from the encoder 100 to extract various bitstreams 25. The extraction conditions, for example bit rate, resolution, or frame rate, are chosen in view of the communication environment with the decoder 300 or the device capabilities at the decoder 300.

The decoder 300 reconstructs the output video/image 30 from the extracted bitstream 25. The extraction of the bitstream according to the extraction conditions need not be performed by the pre-decoder 200; it may be performed by the decoder 300, or by both the pre-decoder 200 and the decoder 300.

FIG. 2 is a block diagram showing the configuration of the encoder in a scalable video coding system.

The motion estimation unit 110 extracts GOPs (Groups of Pictures) from the input video and selects motion vectors by performing motion estimation on the frames in each GOP. A hierarchical method such as Hierarchical Variable Size Block Matching (HVSBM) may be used for the motion estimation.

The temporal filtering unit 120 reduces temporal redundancy by using the motion vectors obtained by the motion estimation unit 110 to decompose the frames along the time axis into low-frequency and high-frequency frames. MCTF (Motion Compensated Temporal Filtering), for example, may be used as the temporal filtering method.
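As a rough illustration of the decomposition into low- and high-frequency frames, the sketch below applies one level of a Haar-style temporal split to a pair of frames. It assumes zero motion (no compensation) and an unnormalized average/difference filter. Neither that filter nor the names `temporal_split` and `temporal_merge` come from the specification, and an actual MCTF stage would first align the frames using the motion vectors.

```python
def temporal_split(frame_a, frame_b):
    """Split two frames into a low-pass (average) and a high-pass (difference) frame."""
    low = [[(a + b) / 2 for a, b in zip(row_a, row_b)]
           for row_a, row_b in zip(frame_a, frame_b)]
    high = [[b - a for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]
    return low, high


def temporal_merge(low, high):
    """Invert temporal_split and recover the two original frames."""
    frame_a = [[lo - hi / 2 for lo, hi in zip(row_lo, row_hi)]
               for row_lo, row_hi in zip(low, high)]
    frame_b = [[lo + hi / 2 for lo, hi in zip(row_lo, row_hi)]
               for row_lo, row_hi in zip(low, high)]
    return frame_a, frame_b
```

When two frames are nearly identical, the high-pass frame is close to zero, which is what makes it cheap to code.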

The spatial transform unit 130 applies a wavelet transform to the frames from which temporal redundancy has been removed by the temporal filtering unit 120, decomposing each frame into low-frequency and high-frequency sub-bands and obtaining the wavelet coefficients for each.

The embedded quantization unit 140 performs embedded quantization of the wavelet coefficients obtained by the spatial transform unit 130. Methods for embedded quantization of the wavelet coefficients of each wavelet block include EZW (Embedded Zerotrees Wavelet Algorithm), SPIHT (Set Partitioning in Hierarchical Trees), and EZBC (Embedded ZeroBlock Coding).

Finally, the entropy coding unit 150 encodes the wavelet coefficients quantized by the embedded quantization unit 140 and the motion vectors selected by the motion estimation unit 110 into the output bitstream 20.

In the temporal filtering that the temporal filtering unit 120 performs to remove temporal redundancy, the most important step is estimating the motion vector. Block matching is widely used for this estimation.

This block matching method is described with reference to FIG. 3. Block matching divides each of two consecutive images into n×m macroblocks and estimates motion by comparing the pixel differences between the two images macroblock by macroblock. The search range for motion estimation can be specified in advance as a parameter. If the motion lies within the search range, the method performs well; if the image moves so fast that the motion falls outside the search range, prediction accuracy drops.

In practice, determining the motion vector means finding where the current block has moved within the search range. As shown in FIG. 3, for each of the nine possible motion vector candidates (including the zero vector), the difference between the pixel values in the corresponding block of the reference frame and the pixel values of the current block (hereinafter the "error") is computed, and the candidate for which this value is smallest is taken as the motion vector.
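The search over nine candidates can be sketched as follows. This is a minimal illustration, not the specification's implementation: it assumes the error is the sum of absolute differences (SAD), that frames are lists of pixel rows, and that candidates whose reference block would fall outside the frame are skipped. The function names are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def best_motion_vector(cur, ref, bx, by, size, search=1):
    """Try every displacement within +/-search pixels (zero vector included)
    and return (vector, error) for the candidate with the smallest error."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose reference block falls outside the frame.
            if not (0 <= bx + dx and bx + dx + size <= width and
                    0 <= by + dy and by + dy + size <= height):
                continue
            err = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or err < best[0]:
                best = (err, (dx, dy))
    return best[1], best[0]
```

With `search=1` the loop visits exactly nine displacements, matching the nine candidates of FIG. 3.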

Consider how motion vectors were determined when encoded bitstreams at several resolutions were to be produced from one input video using a conventional coding scheme that does not support scalability, such as MPEG-1 or MPEG-2. There were roughly two methods. For convenience of explanation, assume three modes: 4×4, 8×8, and 16×16.

In the first method, a motion vector is obtained by motion estimation at the highest resolution, that is, 16×16, and is then carried over to the remaining 8×8 and 4×4 modes. In other words, the motion vector obtained in the 16×16 mode is halved and used as the motion vector in the 8×8 mode, and that vector is halved again and used as the motion vector in the 4×4 mode.

In the second method, a motion vector is estimated at the lowest resolution, that is, 4×4. The neighborhood of twice that vector is then used as the search range to obtain the motion vector in the 8×8 mode, and the neighborhood of twice the 8×8 vector is in turn used as the search range to obtain the motion vector in the 16×16 mode.
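The two conventional strategies differ only in how a vector found at one resolution is carried to the next. The sketch below shows both steps under assumed conventions: vectors are `(dx, dy)` tuples, and the refinement neighborhood is a square of radius one pixel. The names `scale_down` and `refinement_window` and the radius are illustrative assumptions.

```python
def scale_down(mv):
    """First method: reuse the higher-resolution vector at half scale."""
    dx, dy = mv
    return (dx / 2, dy / 2)


def refinement_window(mv_low, radius=1):
    """Second method: double the lower-resolution vector and list the
    candidates in its neighborhood for a search at the next resolution."""
    cx, cy = 2 * mv_low[0], 2 * mv_low[1]
    return [(cx + dx, cy + dy)
            for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)]
```

Either way, the vector at each resolution is derived from a search tuned to a single resolution, which is the limitation the invention addresses.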

With conventional methods such as the two above, a motion vector optimized for one particular resolution is applied to the whole video across its various resolutions, so the error increases at the other resolutions and the deviation varies with resolution. These methods are therefore not well suited to scalable video coding that supports various resolutions.

Moreover, because the motion vector plays an important role in temporal filtering, a poorly chosen motion vector greatly reduces the compression efficiency of the temporal filtering process, which in turn tends to degrade image quality.

There is therefore a need to improve the conventional methods so that they are suitable for scalable video coding.

The present invention was devised in view of the problems described above. Its object is to provide a method of selecting a motion vector that can guarantee at least a certain level of quality at every resolution supported in scalable video coding.

To achieve this object, a method according to the present invention for selecting a motion vector in scalable video compression supporting multiple resolutions calculates, for a given motion vector candidate, an error at each resolution, computes a total error by weighting each of those errors, and selects the candidate with the minimum total error as the actual motion vector.

To achieve this object, a video compression apparatus according to the present invention includes: a motion estimation unit that selects a motion vector by performing motion estimation on each frame of the input video; a temporal filtering unit that removes the temporal redundancy of each frame using the motion vector; and a spatial transform unit that obtains wavelet coefficients by performing a wavelet transform on the frames from which the temporal redundancy has been removed,

wherein the motion vector is selected by calculating, for a given motion vector candidate, an error at each resolution, computing a total error by weighting each of those errors, and selecting the candidate for which the total error is smallest.

The video compression apparatus preferably further includes an embedded quantization unit that quantizes the wavelet coefficients by an embedded quantization method.

The given motion vector candidate is preferably determined by selecting one of the motion vectors that can occur within the search range of the frame having the maximum resolution and, for each resolution lower than the maximum, scaling the selected vector down in proportion to the resolution.

The error is preferably the difference in pixel values between a macroblock of the current frame and the corresponding macroblock of the reference frame.

Alternatively, the error is preferably the average obtained by dividing the difference in pixel values between a macroblock of the current frame and the corresponding macroblock of the reference frame by the number of pixels in the macroblock.

The weights are preferably adjusted according to which of the resolutions is to be given more emphasis.

Throughout this specification, "video" means a moving picture and "image" means a still picture. Video and images may be referred to collectively as "pictures".

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. The advantages and features of the present invention, and the methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. The present invention is not, however, limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only to make the disclosure complete and to fully convey the scope of the invention to those of ordinary skill in the art to which it pertains; the invention is defined only by the scope of the claims. Like reference numerals refer to like elements throughout the specification.

FIG. 4 is a flowchart showing the process of obtaining a motion vector through motion estimation according to the present invention.

First, the number of macroblocks (m×n) into which a frame is to be divided is chosen (S100). Each frame, at each resolution, is then divided into that number of macroblocks (S110).

Next, a search range (the size of p macroblocks) is set for the frame having the maximum resolution (S120), and all motion vector directions that can occur within that range, that is, all motion vector candidates, are obtained (S130).

One of the motion vector candidates is assumed to be the motion vector (S140).

Next, for each frame at a lower resolution, the assumed motion vector scaled down in proportion to the resolution is taken as the motion vector for that frame (S150).

For each frame at each resolution, the difference in pixel values between the macroblock of the current frame and the corresponding macroblock of the reference frame, that is, the error value, is computed on the basis of the assumed motion vector (S160).

Each error value E_jk obtained per resolution is multiplied by a predetermined coefficient w_jk, and the results are summed over all resolutions to give the total error E_j (S170). If there are q different resolutions, the total error is expressed by [Equation 1] below. The subscripts 1, 2, ..., q are assigned in order starting from the highest resolution, and the subscript j is the index of the motion vector candidate.

$$E_j = w_{j1} E_{j1} + w_{j2} E_{j2} + \dots + w_{jq} E_{jq} \qquad \text{[Equation 1]}$$
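Equation 1 is a plain weighted sum, which can be written directly. In the sketch below, `errors` holds the per-resolution errors of one candidate in order from highest to lowest resolution and `weights` holds the matching coefficients; the function name and the length check are illustrative additions.

```python
def total_error(errors, weights):
    """Weighted sum of per-resolution errors for one motion vector candidate."""
    if len(errors) != len(weights):
        raise ValueError("one weight is required per resolution")
    return sum(w * e for w, e in zip(weights, errors))
```

For instance, errors of 4.0, 2.0, and 1.0 with weights 0.5, 0.25, and 0.25 give a total of 2.0 + 0.5 + 0.25 = 2.75.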

Next, the next candidate is assumed to be the motion vector, and steps S140 to S170 are repeated until the total error E_j has been calculated for every candidate vector (Yes at S180).

The total error values finally obtained through the steps above (E_1, E_2, ..., E_p) are compared to find the minimum (S190). Finally, the motion vector in the direction with the smallest total error is selected as the actual motion vector (S191).

The coefficients are weights that control which of the resolutions is given more emphasis, and they are chosen so that their sum (w_j1 + w_j2 + ... + w_jq) equals 1. They can be set freely in view of the communication environment or the device environment; that is, they will differ depending on whether the decoder is expected to use high-resolution or low-resolution pictures.
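One simple way to keep the coefficients summing to 1 is to start from relative emphasis values and normalize them. The sketch below assumes non-negative raw values; the function name is hypothetical and the specification does not prescribe how the weights are derived.

```python
def normalize_weights(raw):
    """Scale relative emphasis values so that the weights sum to 1."""
    total = sum(raw)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return [w / total for w in raw]
```

An encoder expecting mostly high-resolution playback might pass `[2, 1, 1]`, giving weights 0.5, 0.25, and 0.25 from highest to lowest resolution.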

The process of FIG. 4 is explained below using the example of FIG. 5, in which the maximum resolution is 16x16 and the number of macroblocks is 4x4.

First, the number of macroblocks into which a frame is divided is set to 4x4 (S100), and each of the frames (a, b, c) is divided accordingly (S110).
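Because every frame is split into the same 4x4 grid, the macroblock side shrinks with the resolution. The following sketch computes the macroblock side and the top-left pixel of each macroblock, assuming square frames of 16, 8, and 4 pixels per side for frames a, b, and c; the function name is hypothetical.

```python
def macroblock_layout(resolution, grid=4):
    """Return the macroblock side length and the top-left corner of each block."""
    size = resolution // grid
    origins = [(col * size, row * size)
               for row in range(grid) for col in range(grid)]
    return size, origins


layouts = {res: macroblock_layout(res) for res in (16, 8, 4)}
```

The resulting side lengths of 4, 2, and 1 pixels correspond to 16, 4, and 1 pixels per macroblock.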

Next, for the frame (a) having the maximum resolution, the search range is set to the six hatched macroblocks (S120), and all motion vector candidates within that range are obtained (S130). There are six candidate vectors, including the zero vector, which is not drawn as an arrow.

Of the motion vector candidates, the vector (a1) is assumed to be the motion vector (S140).

Next, for each frame at a lower resolution, the assumed motion vector is scaled down in proportion to the resolution, to 1/2 and 1/4 respectively, and the resulting vectors (a2) and (a3) are assumed to be the motion vectors for those frames (S150).

For each frame at each resolution, the difference in pixel values between the macroblock of the current frame and the corresponding macroblock of the reference frame, that is, the error value, is computed on the basis of the assumed motion vector (S160). Because the number of pixels per macroblock differs (16, 4, and 1, respectively), the error value is taken as the average over the number of pixels. Since the coefficients w_jk can compensate for the pixel count, the plain sum may also be used as the error value instead of the average.
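The remark that the coefficients can absorb the averaging is easy to check numerically: dividing each weight by the pixel count of its macroblock gives the same total from summed errors as the original weights give from averaged errors. The numbers below are made up for illustration only.

```python
def weighted_total(errors, weights):
    """Weighted sum of per-resolution errors."""
    return sum(w * e for w, e in zip(weights, errors))


pixel_counts = [16, 4, 1]          # pixels per macroblock at each resolution
sum_errors = [32.0, 8.0, 3.0]      # illustrative summed pixel differences
weights = [0.5, 0.25, 0.25]

# Option 1: average each error over the pixel count, then weight.
mean_errors = [s / n for s, n in zip(sum_errors, pixel_counts)]
total_from_means = weighted_total(mean_errors, weights)

# Option 2: keep the plain sums and fold the pixel count into the weights.
rescaled_weights = [w / n for w, n in zip(weights, pixel_counts)]
total_from_sums = weighted_total(sum_errors, rescaled_weights)
```

Both totals come out equal, so the choice between averaged and summed errors only changes how the weights are scaled. Note that the rescaled weights no longer sum to 1.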

Each error value E_jk obtained per resolution is multiplied by a predetermined coefficient w_jk, and the results are summed over all resolutions to give the total error E_j (S170). The total error is thus given by [Equation 2].

$$E_j = w_{j1} E_{j1} + w_{j2} E_{j2} + w_{j3} E_{j3} \qquad \text{[Equation 2]}$$

The total error values finally obtained through the steps above, namely E_1, E_2, E_3, E_4, E_5, and E_6, are compared to find the minimum (S190). Finally, the motion vector in the direction with the smallest total error is selected as the actual motion vector (S191).
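The whole example, steps S100 to S191, can be sketched end to end. The code below is an illustration under several assumptions that the specification does not state: lower resolutions are produced by 2x2 averaging, the error is the mean absolute pixel difference, and the six candidates are displacements of one macroblock (multiples of 4 pixels) so that the vectors scaled to 1/2 and 1/4 remain whole pixels. All function names and the test frames are hypothetical.

```python
def downsample(frame):
    """Halve the resolution by averaging each 2x2 group of pixels."""
    half = len(frame) // 2
    return [[(frame[2 * y][2 * x] + frame[2 * y][2 * x + 1] +
              frame[2 * y + 1][2 * x] + frame[2 * y + 1][2 * x + 1]) / 4
             for x in range(half)] for y in range(half)]


def mean_abs_diff(cur, ref, bx, by, dx, dy, size):
    """Mean absolute pixel difference between a block and its displaced match."""
    total = 0.0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total / (size * size)


def select_vector(cur, ref, block, candidates, weights, grid=4):
    """Return the candidate with the smallest weighted total error, plus all totals."""
    cur_levels, ref_levels = [cur], [ref]
    for _ in range(len(weights) - 1):
        cur_levels.append(downsample(cur_levels[-1]))
        ref_levels.append(downsample(ref_levels[-1]))
    totals = {}
    for dx, dy in candidates:
        total = 0.0
        for k, weight in enumerate(weights):
            size = len(cur_levels[k]) // grid  # macroblock side at level k
            bx, by = block[0] * size, block[1] * size
            # Scale the full-resolution vector by 1/2**k (exact for multiples of 4).
            sdx, sdy = dx // 2 ** k, dy // 2 ** k
            total += weight * mean_abs_diff(cur_levels[k], ref_levels[k],
                                            bx, by, sdx, sdy, size)
        totals[(dx, dy)] = total
    best = min(totals, key=totals.get)
    return best, totals


def frame_with_square(col, row, n=16, side=4, value=100):
    """Build an n x n frame that is zero except for one bright macroblock."""
    return [[value if col * side <= x < (col + 1) * side and
             row * side <= y < (row + 1) * side else 0
             for x in range(n)] for y in range(n)]


ref = frame_with_square(2, 1)   # bright block one macroblock to the right
cur = frame_with_square(1, 1)
candidates = [(0, 0), (4, 0), (-4, 0), (0, 4), (0, -4), (4, 4)]
best, totals = select_vector(cur, ref, (1, 1), candidates, [0.5, 0.25, 0.25])
```

Here the bright macroblock sits one block further right in the reference frame, so the candidate `(4, 0)` matches at all three resolutions and receives the smallest total error.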

Once the motion vector has been determined through the motion estimation process of the flowchart of FIG. 4, the rest of the encoding process can use conventional techniques as they are. That is, as described with reference to FIG. 2, the bitstream is finally generated after temporal filtering using the determined motion vector, spatial transformation, embedded quantization, and entropy coding.

FIG. 6 is a block diagram of a system for performing the encoding method according to the present invention. The system may represent a TV, a set-top box, a desktop, laptop, or palmtop computer, a PDA (personal digital assistant), or a video or image storage device (for example, a VCR (video cassette recorder) or a DVR (digital video recorder)). The system may also represent a combination of these devices, or one of these devices included as part of another device. The system may include at least one video/image source 510, one or more input/output devices 520, a processor 540, a memory 550, and a display device 530.

The video/image source 510 may represent a TV receiver, a VCR, or another video/image storage device. The source 510 may also represent one or more network connections for receiving video/images from a server over the Internet, a WAN (wide area network), a LAN (local area network), a terrestrial broadcast system, a cable network, a satellite communication network, a wireless network, a telephone network, or the like. In addition, the source may represent a combination of these networks, or one of these networks included as part of another network.

The input/output device 520, the processor 540, and the memory 550 communicate through the communication medium 560. The communication medium 560 may represent a communication bus, a communication network, or one or more internal connection circuits. Input video/image data received from the source 510 can be processed by the processor 540 according to one or more software programs stored in the memory 550, and those programs can be executed by the processor 540 to generate the output video/image provided to the display device 530.

In particular, the software programs stored in the memory 550 include a scalable wavelet-based codec that performs the method according to the present invention. The codec may be stored in the memory 550, read from a storage medium such as a CD-ROM or floppy disk, or downloaded from a server over any of various networks. The software may be replaced by hardware circuitry, or by a combination of software and hardware circuitry.

Although embodiments of the present invention have been described above with reference to the accompanying drawings, those of ordinary skill in the art to which the invention pertains will understand that it can be carried out in other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive.

Conventional motion estimation algorithms selected the motion vectors of a video with various resolutions on the basis of the motion vector of one particular resolution.

According to the present invention, by contrast, scalable video coding can use a motion vector that takes the various resolutions into account, so video at various resolutions can be generated adaptively.

In addition, according to the present invention, using an appropriate motion vector in scalable video coding improves the compression ratio of the temporal filtering process, which in turn improves overall image quality.

FIG. 1 is a diagram schematically showing the overall structure of a scalable video/image coding system.

FIG. 2 is a block diagram showing the configuration of an encoder in a scalable video coding system.

FIG. 3 is a diagram explaining the block matching method, one of the motion estimation methods.

FIG. 4 is a flowchart showing the process of obtaining a motion vector through motion estimation according to the present invention.

FIG. 5 is a diagram showing an example in which the present invention is applied when the maximum resolution is 16x16 and the number of macroblocks is 4x4.

FIG. 6 is a block diagram of a system for performing the encoding method according to the present invention.

(Description of reference numerals for the main parts of the drawings)

100: encoder    110: motion estimation unit

200: pre-decoder    300: decoder

520: input/output device    530: display device

540: processor    550: memory

Claims

A method of selecting a motion vector in scalable video coding that supports multiple resolutions,

wherein, for a predetermined motion vector candidate, an error is calculated for each resolution, a total error is calculated by applying a weight to each of the errors, and the motion vector candidate for which the total error is minimized is selected as the actual motion vector.

The method of claim 1, wherein the predetermined motion vector candidate is

obtained by selecting one vector from among the motion vectors that can occur within the search range of the frame having the maximum resolution and, for each resolution lower than the maximum resolution, reducing the selected vector in proportion to that resolution.

The method of claim 1, wherein the error is

the difference in pixel values between a macroblock of the current frame and the corresponding macroblock of the reference frame.

The method of claim 1, wherein the error is

the difference in pixel values between a macroblock of the current frame and the corresponding macroblock of the reference frame, divided by the number of pixels in the macroblock.

The method of claim 1, wherein the weight is

adjusted according to which of the resolutions is to be given greater emphasis.
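
The selection procedure recited in the method claims above can be illustrated with a minimal Python sketch. It is not the patented implementation. All names (downsample, mean_abs_diff, select_motion_vector) are hypothetical, and several choices are assumptions made only for illustration: box averaging stands in for the wavelet low-pass band, the search is exhaustive, the per-resolution error is the mean absolute pixel difference, and scaled vectors are rounded with Python's built-in round. The toy data mirrors the 16x16 frame with 4x4 macroblocks of FIG. 5.

```python
def downsample(frame, factor):
    """Average factor x factor pixel groups (assumed stand-in for a low-pass band)."""
    h, w = len(frame), len(frame[0])
    return [
        [
            sum(frame[y * factor + j][x * factor + i]
                for j in range(factor) for i in range(factor)) / factor ** 2
            for x in range(w // factor)
        ]
        for y in range(h // factor)
    ]


def mean_abs_diff(cur, ref, x, y, dx, dy, size):
    """Mean absolute pixel difference between a block and its displaced match.

    Returns None when the displaced block falls outside the reference frame.
    """
    h, w = len(ref), len(ref[0])
    if not (0 <= x + dx and x + dx + size <= w
            and 0 <= y + dy and y + dy + size <= h):
        return None
    total = sum(
        abs(cur[y + j][x + i] - ref[y + dy + j][x + dx + i])
        for j in range(size) for i in range(size)
    )
    return total / (size * size)


def select_motion_vector(cur, ref, x, y, block, search, weights):
    """Return the full-resolution vector with the smallest weighted total error.

    weights maps a downsampling factor (1 = maximum resolution) to its weight.
    """
    levels = {f: (downsample(cur, f), downsample(ref, f)) for f in weights}
    best_mv, best_err = None, float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            total = 0.0
            for f, w in weights.items():
                c, r = levels[f]
                # Scale block position, block size and vector by the resolution ratio.
                err = mean_abs_diff(c, r, x // f, y // f,
                                    round(dx / f), round(dy / f), block // f)
                if err is None:  # candidate leaves the frame at this resolution
                    total = None
                    break
                total += w * err
            if total is not None and total < best_err:
                best_mv, best_err = (dx, dy), total
    return best_mv, best_err


# Toy data: a 16x16 reference ramp; the current frame is the reference shifted by (2, 2).
ref = [[16 * y + x for x in range(16)] for y in range(16)]
cur = [[ref[y + 2][x + 2] if y < 14 and x < 14 else 0 for x in range(16)]
       for y in range(16)]
mv, err = select_motion_vector(cur, ref, x=4, y=4, block=4, search=2,
                               weights={1: 0.5, 2: 0.5})
print(mv, err)  # (2, 2) 0.0
```

Raising the weight for one downsampling factor relative to the others biases the choice toward the vector that best fits that resolution, which is the adjustment described for the weight.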

A video compression device comprising a motion estimation unit that selects a motion vector by performing motion estimation on each frame of the input video;

a temporal filtering unit that removes temporal redundancy from each frame using the motion vector; and

a spatial transform unit that obtains wavelet coefficients by performing a wavelet transform on the frames from which the temporal redundancy has been removed,

wherein, for a predetermined motion vector candidate, an error is calculated for each resolution, a total error is calculated by applying a weight to each of the errors, and the motion vector candidate for which the total error is minimized is selected.

The device of claim 6,

further comprising an embedded quantization unit that quantizes the wavelet coefficients by an embedded quantization method.

The device of claim 6, wherein the predetermined motion vector candidate is

obtained by selecting one vector from among the motion vectors that can occur within the search range of the frame having the maximum resolution and, for each resolution lower than the maximum resolution, reducing the selected vector in proportion to that resolution.

The device of claim 6, wherein the error is

The device of claim 6, wherein the weight is

adjusted according to which of the resolutions is to be given greater emphasis.