KR100843080B1

KR100843080B1 - Video transcoding method and apparatus thereof

Info

Publication number: KR100843080B1
Application number: KR1020070000791A
Authority: KR
Inventors: 신규환
Original assignee: 삼성전자주식회사
Priority date: 2006-02-24
Filing date: 2007-01-03
Publication date: 2008-07-02
Also published as: KR20070088334A; CN101026758B; CN101026758A

Abstract

The present invention relates to a method for quickly selecting a suitable reference frame from among several reference frames when transcoding an input video stream into another format having a different GOP structure.

A transcoder according to an embodiment of the present invention includes a reconstruction unit that reconstructs transform coefficients and a video frame from an input video stream; a selection unit that selects, based on the magnitude of the transform coefficients, either a first frame referenced by the video frame or a second frame at a position different from the first frame; and an encoding unit that encodes the reconstructed video frame with reference to the selected frame.

Transcoding, transcoder, reference frame, motion vector, block

Description

Video transcoding method and apparatus

FIG. 1A is a diagram illustrating the GOP structure of the MPEG-2 video Main Profile.

FIG. 1B is a diagram illustrating the GOP structure of the H.264 Baseline Profile.

FIGS. 2A and 2B are diagrams illustrating the concept of multiple references supported by H.264.

FIGS. 3A and 3B are diagrams illustrating a method of selecting a reference frame during transcoding.

FIG. 4 is a block diagram showing the configuration of a transcoder according to an embodiment of the present invention.

FIG. 5 is a block diagram showing the configuration of the reconstruction unit included in the transcoder of FIG. 4.

FIG. 6 is a block diagram showing the configuration of the encoding unit included in the transcoder of FIG. 4.

(Description of reference numerals for the main parts of the drawings)

100: transcoder    110: reconstruction unit

111: entropy decoder    112: inverse quantization unit

113: inverse transform unit    114: inverse prediction unit

120: selection unit    130: encoding unit

131: prediction unit    132: transform unit

133: quantization unit    134: entropy encoder

The present invention relates to a method for quickly selecting a suitable reference frame from among several reference frames when transcoding an input video stream into another format having a different GOP (Group of Pictures) structure.

As information and communication technology, including the Internet, develops, video communication is increasing alongside text and voice communication. Existing text-centered communication cannot satisfy the varied demands of consumers, and multimedia services that can carry text, video, music, and other forms of content are therefore increasing. Multimedia data is voluminous, so it requires large-capacity storage media and wide bandwidth for transmission. Compression coding is therefore essential for transmitting multimedia data that includes text, video, and audio.

The basic principle of data compression is the removal of redundancy. Data can be compressed by removing spatial redundancy, such as the same color or object repeating within an image; temporal redundancy, such as adjacent frames of a video barely changing or the same sound repeating in audio; or psychovisual redundancy, which exploits the fact that human vision and perception are insensitive to high frequencies. In a typical video coding method, temporal redundancy is removed by temporal filtering based on motion compensation, and spatial redundancy is removed by a spatial transform.

After the redundancy has been removed, the result is lossy-coded by quantization with a predetermined quantization step. The quantized result is finally losslessly coded by entropy coding.

Encoded video data may be delivered to the end terminal device and decoded as is, but it may also be transcoded before transmission to take account of network conditions or the capabilities of the end terminal device. For example, if the encoded video data is not suitable for transmission over the current network, the transmitting server changes the SNR (signal-to-noise ratio), frame rate, resolution, or coding scheme (codec) of the video data. This process is called "transcoding".

Conventional methods of transcoding MPEG-2 coded video data into H.264 can be divided into conversion in the frequency domain and conversion in the pixel domain. Frequency-domain conversion is mainly used when the input and output formats of the transcoding are highly similar, and pixel-domain conversion is mainly used when the similarity is low. In particular, pixel-domain conversion reuses the motion vectors that were estimated at encoding time.

However, when transcoding changes the GOP (Group of Pictures) structure or the way motion vectors reference frames, it is difficult to use the existing motion vectors as they are. Recalculating the motion vectors from the reconstructed images during transcoding consumes a great deal of time and resources. Conversely, referencing a distant frame in order to avoid recalculation produces a larger residual than referencing the immediately preceding frame, which can raise the bit rate and degrade picture quality.

When transcoding is performed between video streams with different GOP structures (reference schemes), deciding which frame to select as the reference frame so as to obtain a suitable trade-off among computational complexity, picture quality, and bit rate is therefore a very difficult problem.

The technical problem to be solved by the present invention is therefore to provide a method and apparatus for selecting a suitable reference frame, taking transcoding speed and picture quality into account, in a transcoding process whose input and output have different GOP structures (reference schemes).

The technical problems of the present invention are not limited to the one mentioned above, and other technical problems not mentioned here will be clearly understood by those skilled in the art from the following description.

To solve the above technical problem, a transcoder according to an embodiment of the present invention, which converts an input video stream to generate an output video stream, includes: a reconstruction unit that reconstructs transform coefficients and a video frame from the input video stream; a selection unit that selects, based on the magnitude of the transform coefficients, either a first frame referenced by the video frame or a second frame at a position different from the first frame; and an encoding unit that encodes the reconstructed video frame with reference to the selected frame.

To solve the above technical problem, a transcoding method according to an embodiment of the present invention includes: reconstructing transform coefficients and a video frame from an input video stream; selecting, based on the magnitude of the transform coefficients, either a first frame referenced by the video frame or a second frame at a position different from the first frame; and encoding the reconstructed video frame with reference to the selected frame.

Hereinafter, an embodiment of the present invention is described in detail with reference to the accompanying drawings.

FIG. 1A shows the GOP structure of the MPEG-2 video Main Profile, and FIG. 1B shows the GOP structure of the H.264 Baseline Profile. As shown in FIG. 1, a B frame can refer to the preceding or following I frame or P frame, but cannot refer to another B frame. A P frame, however, can refer to an I frame or another P frame. Such references are generally made within a single GOP.

In the H.264 Baseline Profile, as shown in FIG. 2, each frame refers to the immediately preceding frame. In general, however, H.264 allows a frame to refer to any frame within the same GOP and also supports multiple references.

FIGS. 2A and 2B illustrate the concept of multiple references supported by H.264. FIG. 2A shows that the current P frame (10) can refer to a plurality of frames (20, 25) at the same time. This is possible because motion vectors are estimated, and the residual of the current frame is generated, in units of macroblocks (MBs) rather than whole frames.

FIG. 2B shows different macroblocks (MB1, MB2) of the current P frame (10) referring to regions (ref1, ref2) on different frames (20, 25). By letting a suitable reference frame be selected for each macroblock, H.264 makes video coding more flexible and adaptive.

To transcode input video such as that of FIG. 1A into output video such as that of FIG. 2B, which has a different GOP structure, the transcoder needs to recalculate the motion vectors of the input video. Recalculating motion vectors so that the output video refers to the immediately preceding frame, however, consumes a great deal of computation time. If, on the other hand, the reference scheme of the input video is kept unchanged to avoid recalculation, so that a distant frame is referenced, the residual becomes larger than when the immediately preceding frame is referenced, which can degrade picture quality (or raise the bit rate). Transcoding therefore requires a trade-off between the amount of computation and picture quality (or bit rate).

FIGS. 3A and 3B illustrate a method of selecting a reference frame during transcoding: FIG. 3A shows the input video structure before transcoding, and FIG. 3B shows the output video structure after transcoding. In FIG. 3A, the frame currently being processed is B₂, and its motion vectors point to the I frame. In the MPEG-2 structure, every forward reference vector of frame B₂ points to the I frame. In an H.264 structure such as that of FIG. 3B, by contrast, the forward motion vectors (mv1, mv2) of MB1 and MB2 may point either to the I frame or to the P₁ frame. If the motion vector mv2(I) pointing to the I frame does not increase the residual much compared with the motion vector mv2(P₁) pointing to the P₁ frame, selecting mv2(I) is advantageous for computation speed. In the opposite case, selecting mv2(P₁) is advantageous.

For transcoding in which the GOP structure changes and the output video standard supports multiple references, as H.264 does, the present invention provides a method for deciding, when a reference frame is selected, between keeping the reference frame of the input video and designating the immediately preceding frame as a new reference frame. Keeping the reference frame of the input video allows fast conversion because the existing motion vectors are reused; selecting a new reference frame requires more computation but yields better picture quality. A suitable trade-off between the two makes it possible to perform transcoding that takes both transcoding speed and picture quality into account.

FIG. 4 is a block diagram showing the configuration of a transcoder (100) according to an embodiment of the present invention. The transcoder (100) converts an input video stream to generate an output video stream. To this end, the transcoder (100) may include a reconstruction unit (110), a selection unit (120), and an encoding unit (130).

The reconstruction unit (110) reconstructs transform coefficients and a video frame from the input video stream. The selection unit (120) selects, based on the magnitude of the transform coefficients, either a first frame referenced by the video frame or a second frame at a position different from the first frame. The encoding unit (130) encodes the reconstructed video frame with reference to the selected frame.

FIG. 5 is a block diagram showing the configuration of the reconstruction unit (110). The reconstruction unit (110) may include an entropy decoder (111), an inverse quantization unit (112), an inverse transform unit (113), and an inverse prediction unit (114).

The entropy decoder (111) losslessly decodes the input video stream using an algorithm such as variable-length decoding or arithmetic decoding, and recovers the quantized coefficients and motion vectors.

The inverse quantization unit (112) inverse-quantizes the recovered quantized coefficients. This inverse quantization is the inverse of the quantization performed in the video encoder, and it yields the transform coefficients. The transform coefficients are provided to the selection unit (120).

The inverse transform unit (113) inverse-transforms the transform coefficients using an inverse spatial transform such as the inverse DCT or an inverse wavelet transform.

The inverse prediction unit (114) uses the motion vectors recovered by the entropy decoder (111) to motion-compensate the reference frame of the current frame and generate a prediction frame, and then adds the prediction frame to the output of the inverse transform unit (113) to generate a reconstructed frame.

Referring again to FIG. 4, the selection unit (120) uses the transform coefficients provided by the reconstruction unit (110) to choose between keeping the first frame, which was used as the reference frame in the input video stream, and using a different second frame. The frame selected in this way is used as the reference frame by the encoding unit (130). To make this choice, the selection unit (120) calculates a threshold from the transform coefficients and uses it as the decision criterion.

The present invention gives two examples: a method that uses a threshold fixed within one frame, and a variable method in which the threshold changes adaptively even within a frame so as to suit real-time applications.

Method using a fixed threshold

In this embodiment, the threshold TH_g is fixed within a single frame. TH_g can be determined in various ways; as one example, it may be calculated as in Equation 1.

In Equation 1, N is the number of blocks in one frame, and C_m(i,j) is the transform coefficient at coordinates (i,j) within the m-th block. V_ctl is a control parameter (default value 1.0) that scales the threshold. A block may have the size of a DCT block, the unit of the DCT transform, or the size of a macroblock, the unit of motion estimation.
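The image of Equation 1 is not reproduced in this text. The following Python sketch assumes, from the symbol definitions above and from the later remark that the threshold represents an average block energy, that TH_g is V_ctl times the mean over all N blocks of the sum of absolute transform coefficients. The names `block_energy` and `fixed_threshold` are hypothetical and chosen only for illustration.

```python
from typing import Sequence

# One block of transform coefficients, indexed as block[i][j].
Block = Sequence[Sequence[float]]


def block_energy(block: Block) -> float:
    """Sum of the absolute transform coefficients of one block."""
    return sum(abs(c) for row in block for c in row)


def fixed_threshold(blocks: Sequence[Block], v_ctl: float = 1.0) -> float:
    """Assumed form of Equation 1: V_ctl times the mean block energy of a frame."""
    if not blocks:
        raise ValueError("a frame must contain at least one block")
    return v_ctl * sum(block_energy(b) for b in blocks) / len(blocks)
```

With V_ctl above 1.0 the threshold rises, so under this assumed form more blocks would keep their original reference frame; with V_ctl below 1.0 more blocks would be re-estimated.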

When the index of the current block is k, the criterion for selecting the reference frame is given by Equation 2.

if ( Σ|C_k(i,j)| < TH_g )

then select Ref_orig as the reference frame, unchanged;

else select Ref₀ as the reference frame.

In Equation 2, Σ|C_k(i,j)| is the sum of the absolute values of the transform coefficients in the current block, Ref_orig is the first frame, which was used as the reference frame of the current block in the input video stream, and Ref₀ is the second frame, at a position different from the first frame. Preferably, the second frame is the frame immediately preceding the frame to which the current block belongs (the current frame).

Equation 2 means that, for a block whose energy is larger than the average, a frame closer to the current frame is selected as the reference frame. Blocks whose energy is below the average keep the motion vector from the input video stream unchanged, while blocks whose energy is above it obtain a new motion vector using a relatively nearby frame as the reference frame. In this way a suitable trade-off between picture quality and transcoding speed can be found.
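The decision rule of Equation 2 can be sketched in Python as follows. The string labels `"Ref_orig"` and `"Ref_0"` and the function names are illustrative assumptions; the comparison itself follows the rule stated above.

```python
from typing import Sequence

Block = Sequence[Sequence[float]]


def block_energy(block: Block) -> float:
    """Sum of the absolute transform coefficients of one block."""
    return sum(abs(c) for row in block for c in row)


def select_reference(block: Block, threshold: float) -> str:
    """Return which reference frame the current block should use.

    "Ref_orig": keep the reference frame (and motion vector) of the input stream.
    "Ref_0":    switch to the second frame, e.g. the immediately preceding frame.
    """
    return "Ref_orig" if block_energy(block) < threshold else "Ref_0"
```

A block whose energy equals the threshold falls into the `else` branch, because the comparison in Equation 2 is strict.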

However, computing the threshold as in Equation 1, which also takes not-yet-processed blocks into account, may require a fair amount of computation. An alternative embodiment, for the case where the index of the block currently being processed is k, therefore considers only the blocks processed so far when calculating the threshold TH_g, as in Equation 3.
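The image of Equation 3 is likewise not reproduced here. As a rough sketch, the threshold can be kept as a running average that is updated as each block arrives. Whether the current block k itself is included in the average is not stated in the text; the Python code below assumes it is, and the class name `RunningThreshold` is hypothetical.

```python
class RunningThreshold:
    """Running form of the threshold over the blocks seen so far (assumed form)."""

    def __init__(self, v_ctl: float = 1.0) -> None:
        self.v_ctl = v_ctl
        self.total = 0.0   # accumulated block energy
        self.count = 0     # number of blocks accumulated

    def update(self, energy: float) -> float:
        """Add one block's energy and return the updated threshold."""
        self.total += energy
        self.count += 1
        return self.v_ctl * self.total / self.count
```

Each update costs a constant amount of work, so the threshold is available without a separate pass over the whole frame.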

The block used by the selection unit (120) as the unit for selecting a reference frame may differ in size from the macroblock to which a motion vector is actually assigned; in that case, motion vectors may need to be merged or split.

Method using a variable threshold

In real-time applications of a transcoder, whether frames can be processed within the time limit is an important issue. In real-time transcoding, the threshold needs to be adjusted dynamically, with the currently available computation time as one factor. That is, the variable threshold TH_l can be calculated by multiplying the fixed threshold TH_g by a variable factor RTfactor, as in Equation 4:

TH_l = TH_g × RTfactor

Equation 4 means that when the time limit for processing the current frame is likely to be exceeded, the threshold TH_l is increased to speed up transcoding, and when sufficient time remains, TH_l is decreased to improve picture quality.

RTfactor can be determined in various ways, but given that the factors to consider are the index of the block currently being processed and the time remaining before the limit, it may be determined as in Equation 5.

In Equation 5, k is the index of the block currently being processed (0 ≤ k < N), and N is the total number of blocks in the frame. T_due is the time by which conversion of the current frame must be finished, T_cur is the current time, and framerate is the number of frames per second at playback. framerate is a constant, but it is multiplied in to normalize (T_due - T_cur). Both the numerator and the denominator of Equation 5 therefore take values between 0 and 1. Equation 5 means that the more blocks remain to be processed in the current frame, the larger RTfactor becomes, which raises the transcoding speed, and the more processing time is available, the smaller RTfactor becomes, which lowers the transcoding speed in favor of picture quality.
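The image of Equation 5 is missing from this text. The Python sketch below assumes one form that matches the description above: the fraction of blocks still to be processed, (N - k)/N, divided by the normalized remaining time, (T_due - T_cur) × framerate. The function names and the small `eps` guard against division by zero are additions for illustration and are not part of the source.

```python
def rt_factor(k: int, n: int, t_due: float, t_cur: float,
              framerate: float, eps: float = 1e-6) -> float:
    """Assumed form of Equation 5: remaining-work fraction over remaining-time fraction."""
    if not 0 <= k < n:
        raise ValueError("block index k must satisfy 0 <= k < n")
    remaining_blocks = (n - k) / n                        # numerator, in (0, 1]
    remaining_time = max((t_due - t_cur) * framerate, eps)  # denominator, normalized
    return remaining_blocks / remaining_time


def variable_threshold(th_g: float, factor: float) -> float:
    """Equation 4: TH_l = TH_g x RTfactor."""
    return th_g * factor
```

When work and time shrink at the same rate the factor stays near 1 and TH_l stays near TH_g; when the deadline approaches faster than the blocks are consumed, the factor grows and more blocks keep their original motion vectors.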

In the same spirit as Equation 5, RTfactor may also be defined as in Equation 6.

The selection unit (120) compares the fixed or variable threshold described above with the sum of the absolute values of the transform coefficients in the current block, and chooses between using the motion vector and reference frame (first frame) of the input video stream unchanged and calculating a motion vector with reference to a new frame (second frame). This choice is made for each block and is provided to the encoding unit (130) as reference frame information.

Because a method of approximating a backward motion vector by a forward motion vector is already known, when a forward motion vector is not available, the backward motion vector can be approximated to obtain a forward motion vector, which is then used in place of the existing motion vector and reference frame. For example, if a macroblock of a B frame refers to a block of the following P frame, the P-frame macroblock that covers the largest area among those overlapping that block is selected, and that macroblock's motion vector with respect to the preceding I frame is obtained. The motion vector with respect to the I frame that the B frame can use is then calculated as the sum of the motion vector to the P-frame block and the I-frame motion vector of the P-frame macroblock that overlaps that block the most.
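The vector composition described above can be sketched in Python as follows. The data layout is an assumption made for illustration: macroblocks are axis-aligned squares identified by their top-left corner, and `p_frame_mvs` is a hypothetical dictionary mapping each P-frame macroblock position to its motion vector toward the I frame.

```python
from typing import Dict, Tuple

Vec = Tuple[int, int]             # (dx, dy) or (x, y)
Rect = Tuple[int, int, int, int]  # (x, y, width, height)


def overlap_area(a: Rect, b: Rect) -> int:
    """Area of the intersection of two axis-aligned rectangles."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    w = min(ax + aw, bx + bw) - max(ax, bx)
    h = min(ay + ah, by + bh) - max(ay, by)
    return max(w, 0) * max(h, 0)


def approximate_forward_mv(mb_pos: Vec, mv_b_to_p: Vec,
                           p_frame_mvs: Dict[Vec, Vec],
                           mb_size: int = 16) -> Vec:
    """Approximate a B-frame macroblock's motion vector toward the I frame."""
    # Region of the P frame that the B-frame macroblock points at.
    ref = (mb_pos[0] + mv_b_to_p[0], mb_pos[1] + mv_b_to_p[1], mb_size, mb_size)
    # P-frame macroblock covering the largest part of that region.
    best = max(p_frame_mvs,
               key=lambda pos: overlap_area(ref, (pos[0], pos[1], mb_size, mb_size)))
    mv_p_to_i = p_frame_mvs[best]
    # Sum of the B-to-P vector and the chosen macroblock's P-to-I vector.
    return (mv_b_to_p[0] + mv_p_to_i[0], mv_b_to_p[1] + mv_p_to_i[1])
```

The result is only an approximation, since the referenced region generally straddles several P-frame macroblocks with different motion vectors.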

FIG. 6 is a block diagram showing the configuration of the encoding unit (130). The encoding unit (130) may include a prediction unit (131), a transform unit (132), a quantization unit (133), and an entropy encoder (134).

Using the reference frame information, the prediction unit (131) obtains a motion vector for each block of the current frame, with either the first frame or the second frame as the reference frame. The first frame is the frame, among those reconstructed by the reconstruction unit (110), that was used as the reference frame of the current frame, and the second frame is a frame at a temporal position different from that of the first frame.

When a block of the current frame uses the first frame as its reference frame, the prediction unit (131) assigns the motion vector from the input video stream to that block unchanged. When the block uses the second frame as its reference frame, the prediction unit (131) estimates a motion vector with reference to the second frame and assigns the estimated motion vector to the block.
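The two paths of the prediction unit can be sketched in Python as follows. The source does not specify a motion estimation algorithm; the exhaustive SAD (sum of absolute differences) search below is a stand-in chosen for brevity, and all function names, the block size, and the search range are illustrative assumptions.

```python
from typing import List, Tuple

Frame = List[List[int]]  # luma samples, indexed as frame[y][x]
Vec = Tuple[int, int]    # motion vector (dx, dy)


def sad(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int, size: int) -> int:
    """Sum of absolute differences between a block and its displaced reference block."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(size) for x in range(size))


def estimate_mv(cur: Frame, ref: Frame, bx: int, by: int,
                size: int = 4, search: int = 2) -> Vec:
    """Full search over a small window; candidates leaving the frame are skipped."""
    h, w = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + size > w or y0 + size > h:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv


def assign_mv(use_original: bool, input_mv: Vec, cur: Frame, second_ref: Frame,
              bx: int, by: int, size: int = 4, search: int = 2) -> Vec:
    """Reuse the input stream's vector, or re-estimate against the second frame."""
    if use_original:
        return input_mv  # fast path: no search at all
    return estimate_mv(cur, second_ref, bx, by, size, search)
```

The fast path costs nothing, while the search path evaluates up to (2 × search + 1)² candidate positions per block, which is the speed difference the threshold trades against residual size.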

The prediction unit (131) then uses the motion vectors assigned to the blocks of the current frame to motion-compensate the corresponding reference frame (the first frame or the second frame) and generate a prediction frame, and generates a residual by subtracting the prediction frame from the current frame.

The transform unit (132) performs a spatial transform on the generated residual. The DCT (Discrete Cosine Transform), a wavelet transform, or the like may be used as the spatial transform. The spatial transform yields transform coefficients: DCT coefficients when the DCT is used, and wavelet coefficients when a wavelet transform is used.

The quantization unit (133) quantizes the transform coefficients obtained by the transform unit (132) to generate quantized coefficients. Quantization means dividing the transform coefficients, which are expressed as arbitrary real values, into intervals and representing them as discrete values. Quantization methods include scalar quantization and vector quantization; simple scalar quantization divides each transform coefficient by the corresponding value in a quantization table and rounds the result to the nearest integer.
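The simple scalar quantization described above can be sketched in Python as follows. The tie-breaking rule (halves rounded away from zero) and the example table values used in the test are assumptions; the source only says that the quotient is rounded to an integer.

```python
from typing import List, Sequence


def round_half_away(x: float) -> int:
    """Round to the nearest integer, with halves rounded away from zero."""
    magnitude = int(abs(x) + 0.5)
    return magnitude if x >= 0 else -magnitude


def quantize(coeffs: Sequence[Sequence[float]],
             qtable: Sequence[Sequence[float]]) -> List[List[int]]:
    """Divide each coefficient by its quantization-table entry and round."""
    return [[round_half_away(c / q) for c, q in zip(c_row, q_row)]
            for c_row, q_row in zip(coeffs, qtable)]
```

Larger table entries map more coefficients to zero, which lowers the bit rate at the cost of larger reconstruction error.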

The entropy encoder 134 losslessly encodes the quantized coefficients and the motion vectors provided by the prediction unit 131 to generate the output video stream. Arithmetic coding, variable-length coding, or the like may be used as the lossless coding method.
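As one concrete instance of variable-length coding, the sketch below uses Exp-Golomb codes, which H.264 employs for many syntax elements. The text does not name a specific code, so this choice is an assumption for illustration, as are the function names and the representation of the output as a string of bits.

```python
def ue(code_num):
    """Unsigned Exp-Golomb code of a non-negative integer, as a bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    bits = bin(code_num + 1)[2:]
    # Prefix with one zero per bit that follows the leading 1.
    return "0" * (len(bits) - 1) + bits


def se(value):
    """Signed Exp-Golomb: positive v maps to 2v - 1, non-positive v to -2v."""
    return ue(2 * value - 1 if value > 0 else -2 * value)


def encode_mv(dx, dy):
    """Concatenate the signed codes of the two motion vector components."""
    return se(dx) + se(dy)
```

Small magnitudes receive the shortest codewords, so vectors and coefficients close to zero, which are the common case after good prediction, cost the fewest bits.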

The components of FIGS. 4 to 6 described so far may be implemented as software, such as a task, class, subroutine, process, object, execution thread, or program executed in a predetermined area of memory; as hardware, such as a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC); or as a combination of such software and hardware. The components may be contained in a computer-readable storage medium, or parts of them may be distributed across a plurality of computers.

Although embodiments of the present invention have been described above with reference to the accompanying drawings, those of ordinary skill in the art to which the present invention pertains will understand that the invention may be embodied in other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive.

According to the present invention, when an input video stream is transcoded into another format having a different GOP structure, selecting an optimal reference frame makes it possible to achieve relatively high picture quality or a relatively low bit rate within limited computation power.

Claims

A transcoder that converts an input video stream to produce an output video stream, the transcoder comprising:

a reconstruction unit that reconstructs transform coefficients and a video frame from the input video stream;

a selection unit that selects, based on the magnitude of the transform coefficients, one of a first frame referenced by the video frame and a second frame located at a position different from the first frame; and

an encoder that encodes the reconstructed video frame with reference to the selected frame.

The transcoder of claim 1, wherein the second frame is the frame located immediately before the video frame in encoding order.

The transcoder of claim 1, wherein the input video stream is an MPEG standard video stream and the output video stream is an H.264 standard video stream.

The transcoder of claim 1, wherein the selection unit

selects the first frame as the reference frame for a specific block if the sum of the absolute values of the transform coefficients for the specific block does not exceed a predetermined threshold, and

selects the second frame as the reference frame for the specific block if the sum of the absolute values of the transform coefficients for the specific block exceeds the predetermined threshold.

The transcoder of claim 4, wherein the threshold is the sum of the absolute values of the transform coefficients belonging to a single frame divided by the number of blocks.

The transcoder of claim 4, wherein the threshold is the sum of the absolute values of those transform coefficients, among the transform coefficients belonging to a single frame, that belong to the currently processed blocks, divided by the number of the currently processed blocks.

The transcoder of claim 4, wherein the threshold is the sum of the absolute values of the transform coefficients belonging to a single frame divided by the number of blocks, multiplied by a predetermined variable coefficient, and the variable coefficient is determined by the number of remaining blocks to be processed in the single frame and the remaining time available for processing.

The transcoder of claim 7, wherein the variable coefficient is calculated by dividing the ratio of the number of remaining blocks to be processed to the number of blocks belonging to the single frame by the product of the remaining time and the frame rate.

The transcoder of claim 1, wherein the encoder uses the motion vector of the input video stream as it is if the selected frame is the first frame, and estimates a motion vector with reference to the second frame if the selected frame is the second frame.

A transcoding method for converting an input video stream to produce an output video stream, the method comprising:

reconstructing transform coefficients and a video frame from the input video stream;

selecting, based on the magnitude of the transform coefficients, one of a first frame referenced by the video frame and a second frame located at a position different from the first frame; and

encoding the reconstructed video frame with reference to the selected frame.

The method of claim 10, wherein the second frame is the frame located immediately before the video frame in encoding order.

The method of claim 10, wherein the input video stream is an MPEG standard video stream and the output video stream is an H.264 standard video stream.

The method of claim 10, wherein the selecting comprises:

selecting the first frame as the reference frame for a specific block if the sum of the absolute values of the transform coefficients for the specific block does not exceed a predetermined threshold; and

selecting the second frame as the reference frame for the specific block if the sum of the absolute values of the transform coefficients for the specific block exceeds the predetermined threshold.

The method of claim 13, wherein the threshold is the sum of the absolute values of the transform coefficients belonging to a single frame divided by the number of blocks.

The method of claim 13, wherein the threshold is the sum of the absolute values of the transform coefficients belonging to a single frame divided by the number of blocks, multiplied by a predetermined variable coefficient, and the variable coefficient is determined by the number of remaining blocks to be processed in the single frame and the remaining time available for processing.

The method of claim 16, wherein the variable coefficient is calculated by dividing the ratio of the number of remaining blocks to be processed to the number of blocks belonging to the single frame by the product of the remaining time and the frame rate.

The method of claim 10, wherein the encoding comprises using the motion vector of the input video stream as it is if the selected frame is the first frame, and estimating a motion vector with reference to the second frame if the selected frame is the second frame.
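
For illustration only, the threshold-based selection recited in the claims (sum of absolute transform coefficients per block, compared with the frame-wide average per block) might be sketched as follows. The function names, the representation of a block as rows of coefficients, and the optional `scale` argument standing in for the variable coefficient are assumptions of this sketch and form no part of the claims.

```python
def block_sums(blocks):
    """Sum of the absolute values of the transform coefficients of each block."""
    return [sum(abs(c) for row in block for c in row) for block in blocks]


def select_references(blocks, scale=1.0):
    """Return "first" or "second" for every block of a single frame.

    The threshold is the frame-wide sum of absolute coefficients divided by
    the number of blocks, optionally multiplied by a variable coefficient."""
    sums = block_sums(blocks)
    threshold = scale * sum(sums) / len(sums)
    return ["first" if s <= threshold else "second" for s in sums]
```

A block whose coefficients are small was already predicted well in the input stream, so it keeps the first frame and its original motion vector; only blocks above the threshold incur a new motion search against the second frame.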