KR20080055965A

KR20080055965A - Encoder assisted frame rate up conversion using various motion models

Info

Publication number: KR20080055965A
Application number: KR1020087010199A
Authority: KR
Inventors: 팡 시; 세이풀라 하일트 오구즈; 수밋 싱 세티; 비제이아라크쉬미 알. 라빈드란
Original assignee: 콸콤 인코포레이티드
Priority date: 2005-09-27
Filing date: 2006-09-27
Publication date: 2008-06-19
Also published as: JP2009510939A; WO2007038728A3; US20070071100A1; WO2007038728A2; KR100957322B1; AR055184A1; EP1941743A2; TW200737985A; US9258519B2

Abstract

An Encoder Assisted Frame Rate Up Conversion (EA-FRUC) system that utilizes various motion models, such as affine models, in addition to video coding and pre-processing operations at the video encoder to exploit the FRUC processing that will occur in the decoder in order to improve the modeling of moving objects, compression efficiency and reconstructed video quality. Furthermore, objects are identified in a way that reduces the amount of information necessary for encoding to render the objects on the decoder device.

Description

ENCODER ASSISTED FRAME RATE UP CONVERSION USING VARIOUS MOTION MODELS

This patent application claims priority to (a) Provisional Application No. 60/721,375, entitled "Encoder Assisted Frame Rate Up Conversion Method Using Different Motion Models," filed September 27, 2005, and (b) Provisional Application No. 60/721,376, entitled "Method and Apparatus for Encoder Assisted Frame Rate Up Conversion," filed September 27, 2005, both of which are incorporated herein by reference.

This disclosure relates to a method and apparatus for encoding video data.

Video formats supporting various frame rates exist today. The following formats are currently the most widely used, listed in order of their supported frames per second (fps): 24 (film), 25 (PAL), 30 (typically interlaced video), and 60 (high definition (HD), e.g., 720p). Although these frame rates are suitable for most applications, frame rates are sometimes reduced to as low as 15, 10, 7.5, or 3 fps to reach the low bandwidth required for mobile handset video communications. These low rates allow low-end devices with lower computational capabilities to display some video, but the resulting video suffers from "jerkiness" (i.e., a slide show effect) rather than being smooth in motion. In addition, the frames that are dropped often do not correctly track the amount of motion in the video. For example, fewer frames should be dropped during "high motion" portions of video content, such as those occurring in sporting events, while more frames may be dropped during "low motion" portions, such as those occurring in talk shows. Video compression is content dependent, and it can analyze and incorporate the motion and texture characteristics of the sequence to be coded in order to improve compression efficiency.

Frame rate up conversion (FRUC) is a process that uses video interpolation at the video decoder to increase the frame rate of the reconstructed video. In FRUC, interpolated frames are created using received frames as references. Currently, systems implementing FRUC frame interpolation include approaches based on motion-compensated interpolation and on the processing of transmitted motion vectors. FRUC is also used for converting between various video formats. For example, in telecine and inverse telecine applications, which are film-to-videotape transfer techniques that reconcile the respective color frame rate differences between film and video, progressive video (24 frames/second) is converted to NTSC interlaced video (29.97 frames/second).

Another FRUC approach uses weighted-adaptive motion-compensated interpolation (WAMCI) to reduce the blocking artifacts caused by deficiencies in motion estimation and block-based processing. This approach is based on interpolation by the weighted sum of multiple motion-compensated interpolation (MCI) images. Blocking artifacts at block boundaries are also reduced in the proposed method by applying a technique similar to overlapped block motion compensation (OBMC). Specifically, to reduce blurring while processing overlapped areas, the method uses motion analysis to determine the block motion type and applies OBMC adaptively. Experimental results indicate that the proposed approach improves the results by significantly reducing blocking artifacts.
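The "weighted sum of multiple MCI images" mentioned above can be illustrated with a minimal sketch. The text does not say how WAMCI derives its weights, so the function name `weighted_sum` and the weights used here are purely hypothetical; the sketch only shows the blending step for a set of candidate images.

```python
def weighted_sum(images, weights):
    """Blend equally sized grayscale images pixel by pixel.

    `images` is a list of 2-D lists; `weights` holds one non-negative
    weight per image (hypothetical values, normalized here).
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    height, width = len(images[0]), len(images[0][0])
    return [
        [
            sum(w * img[y][x] for img, w in zip(images, weights)) / total
            for x in range(width)
        ]
        for y in range(height)
    ]


# Two candidate motion-compensated images for the same interpolated frame.
candidate_a = [[100, 100]]
candidate_b = [[200, 0]]
blended = weighted_sum([candidate_a, candidate_b], [3, 1])
print(blended)
```

Giving a candidate a larger weight pulls the blended pixel toward that candidate, which is how an adaptive scheme could favor the more reliable prediction.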

Yet another FRUC approach uses vector reliability analysis to reduce the artifacts caused by the use of any motion vectors that were inaccurately transmitted from the encoder. In this approach, motion estimation is used to construct motion vectors that are compared with the transmitted motion vectors in order to determine the most desirable approach for frame interpretation. In conventional up-conversion algorithms using motion estimation, the estimation process is performed on two adjacent decoded frames to construct the motion vectors that allow a frame to be interpolated. However, these algorithms attempt to improve the utilization of transmission bandwidth without regard to the amount of computation required for the motion estimation operation. By comparison, in up-conversion algorithms using transmitted motion vectors, the quality of the interpolated frames depends largely on the motion vectors derived by the encoder. Using a combination of the two approaches, the transmitted motion vectors are first analyzed to decide whether they are usable for constructing interpolated frames. The interpolation method is then adaptively selected from three methods: local motion-compensated interpolation, global motion-compensated interpolation, and frame-repeated interpolation.

FRUC techniques are generally implemented as post-processing functions in the video decoder, so the video encoder is typically not involved in this operation. However, in an approach referred to as encoder-assisted FRUC (EA-FRUC), the encoder can determine whether transmission of certain information related to motion vectors or reference frames (e.g., residual data) may be eliminated, while still allowing the decoder to regenerate major portions of frames on its own without the eliminated vector or residual data. For example, a bidirectional predictive video coding method has been introduced as an improvement to B-frame coding in MPEG-2. In this method, the use of an error criterion is proposed to enable the application of true motion vectors in motion-compensated predictive coding. The distortion measure is based on the sum of absolute differences (SAD), but this measure is known to be insufficient for providing a true distortion measure, particularly where the amount of motion between two frames in a sequence needs to be quantified. In addition, the variation in thresholds is classified using fixed thresholds, although these thresholds should be variable because the classifications are preferably content dependent.
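For reference, the SAD distortion measure named above is simply the sum of per-pixel absolute differences between two blocks. A minimal sketch follows; the function name `sad` and the sample blocks are illustrative only and do not come from the patent.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D pixel blocks."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same number of rows")
    total = 0
    for row_a, row_b in zip(block_a, block_b):
        if len(row_a) != len(row_b):
            raise ValueError("blocks must have the same number of columns")
        total += sum(abs(a - b) for a, b in zip(row_a, row_b))
    return total


current_block = [[10, 12], [14, 16]]
reference_block = [[11, 12], [10, 20]]
print(sad(current_block, reference_block))  # 1 + 0 + 4 + 4 = 9
```

Because SAD only accumulates pixel differences, two very different motions can yield the same score, which is consistent with the remark that it is a weak indicator of the true amount of motion.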

FRUC video compression techniques, including those that employ encoder enhancement information, use block-based motion prediction with translational motion models to model the motion of objects within video frames. Block-based motion prediction exploits the temporal correlation structure inherent to video signals. Translational motion modeling, as used by block-based motion prediction, can reduce or eliminate temporal redundancy in video signals for bodies that retain a rigid shape while undergoing translational motion in a plane more or less parallel to the lens of the video capture device. The translational motion model uses two parameters per encoded block.
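The two parameters per block are the horizontal and vertical components of a single displacement vector. The sketch below shows one common way such a vector can be found, an exhaustive block-matching search that minimizes SAD. This is a generic illustration under stated assumptions, not the search used by any particular encoder; the names `estimate_translation`, `block_sad`, and `make_frame` and the search range are hypothetical.

```python
def block_sad(frame_a, frame_b, ax, ay, bx, by, size):
    """SAD between the size x size block at (ax, ay) in frame_a and (bx, by) in frame_b."""
    return sum(
        abs(frame_a[ay + j][ax + i] - frame_b[by + j][bx + i])
        for j in range(size)
        for i in range(size)
    )


def estimate_translation(cur, ref, x, y, size, search):
    """Return the (dx, dy) within +/-search that best maps the current block into ref."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = x + dx, y + dy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = block_sad(cur, ref, x, y, rx, ry, size)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]


def make_frame(width, height, px, py):
    """Black frame with a distinctive 2x2 patch whose top-left corner is (px, py)."""
    frame = [[0] * width for _ in range(height)]
    patch = [[50, 60], [70, 80]]
    for j in range(2):
        for i in range(2):
            frame[py + j][px + i] = patch[j][i]
    return frame


ref = make_frame(6, 6, 1, 1)   # patch at (1, 1) in the reference frame
cur = make_frame(6, 6, 3, 2)   # same patch at (3, 2) in the current frame
mv = estimate_translation(cur, ref, 3, 2, size=2, search=2)
print(mv)  # (-2, -1): the block came from 2 pixels left and 1 pixel up
```

A single (dx, dy) pair describes this shift exactly, but it cannot express rotation or zoom of the patch.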

In hybrid video compression based on motion-compensated prediction and transform coding, video frames are partitioned by conventional encoders in accordance with the use of a translational motion model, where the partitions are generated to locate object bodies that retain a rigid shape while undergoing translational motion. For example, a video sequence of a person talking to the camera while a car passes by may be segmented into objects including a still image representing the fixed background of the sequence, a video object representing the head of the person talking, an audio object representing the voice associated with the person, and another video object representing the moving car as a sprite with a rectangular region of support. The location of the sprite on the still image may move over time.

Unfortunately, translational model motion prediction cannot accurately predict or describe the motion of objects whose motion requires more than two parameters per block. Independently moving objects, in combination with camera motion and focal length changes, produce complicated motion vectors that must be approximated efficiently for motion prediction. Consequently, the residual signal (also known as the prediction error) has considerable power, and video frames containing such movements are therefore inefficient to compress. When video frames containing such objects are interpolated using block-based motion prediction, both the subjective and the objective quality of the interpolated frame are low because of the limitations of the translational motion model framework in describing block motion dynamics. Furthermore, when video sequences are partitioned in accordance with translational model motion prediction, the efficiency of algorithms that handle the interpolation of objects undergoing arbitrary motions and deformations is limited.

What is desired is an approach that provides high-quality interpolated frames at the decoder device that properly model moving objects while decreasing the amount of bandwidth needed to transmit the information necessary to perform the interpolation, and that also decreases the amount of computation needed to create these frames, so that it is well suited to multimedia mobile devices that depend on low-power processing.

Certain aspects disclosed herein provide an Encoder Assisted Frame Rate Up Conversion (EA-FRUC) system that uses various motion models, in addition to video coding and pre-processing operations at the video encoder, to exploit the FRUC processing that will occur in the decoder in order to improve the modeling of moving objects, compression efficiency, and reconstructed video quality.

In one aspect, a method of processing multimedia data is disclosed. The method comprises partitioning at least one of first and second video frames into a plurality of partitions; determining modeling information for at least one object in at least one of the partitions, the modeling information being associated with the first and second video frames; generating an interpolated frame based on the modeling information; and generating encoding information based on the interpolated frame, wherein the encoding information is used to generate a video frame temporally co-located with the interpolated frame.

In another aspect, an apparatus for processing multimedia data is disclosed. The apparatus comprises means for partitioning at least one of first and second video frames into a plurality of partitions; means for determining modeling information for at least one object in at least one of the plurality of partitions, the modeling information being associated with the first and second video frames; means for generating an interpolated frame based on the modeling information; and means for generating encoding information based on the interpolated frame, wherein the encoding information is used to generate a video frame temporally co-located with the interpolated frame.

In a further aspect, an apparatus for processing multimedia data is disclosed. The apparatus comprises a partitioning module configured to partition at least one of first and second video frames into a plurality of partitions; a modeling module configured to determine modeling information for at least one object in at least one of the plurality of partitions, the modeling information being associated with the first and second video frames; a frame generation module configured to generate an interpolated frame based on the modeling information; an encoding module configured to generate encoding information based on the interpolated frame; and a transmission module configured to transmit the encoding information to a decoder.

In another aspect, a machine-readable medium comprising instructions for processing multimedia data is disclosed. The instructions, upon execution, cause a machine to partition at least one of first and second video frames into a plurality of partitions; determine modeling information for at least one object in at least one of the partitions, the modeling information being associated with the first and second video frames; generate an interpolated frame based on the modeling information; and generate encoding information based on the interpolated frame, wherein the encoding information is used to generate a video frame temporally co-located with the interpolated frame.

In another aspect, a processor for processing multimedia data is disclosed. The processor is configured to partition at least one of first and second video frames into a plurality of partitions; determine modeling information for at least one object in at least one of the partitions, the modeling information being associated with the first and second video frames; generate an interpolated frame based on the modeling information; and generate encoding information based on the interpolated frame, wherein the encoding information is used to generate a video frame temporally co-located with the interpolated frame.

Other objects, features, and advantages will become apparent to those skilled in the art from the following detailed description. It is to be understood, however, that the detailed description and specific examples, while indicating exemplary aspects, are given by way of illustration and not limitation. Many changes and modifications within the following description may be made without departing from the spirit of the invention.

FIG. 1A is an illustration of an example of a communication system implementing an Encoder Assisted Frame Rate Up Conversion (EA-FRUC) system using various motion models, in accordance with one aspect, for the delivery of streaming video.

FIG. 1B is an illustration of an example of an EA-FRUC device configured to use various motion models, in accordance with one aspect, for the delivery of streaming video.

FIG. 2 is a flow diagram illustrating the operation of the EA-FRUC system of FIG. 1A configured to use various models.

FIG. 3 is a flow diagram illustrating the encoding of video data for upsampling using object-based modeling information and decoder information.

FIG. 4 is a flow diagram illustrating the determination of modeling information for objects in a video frame in accordance with one aspect of the invention.

FIG. 5 is a flow diagram illustrating the determination of motion vector erosion information for objects in a video frame using affine models.

FIG. 6 is a flow diagram illustrating the decoding of an encoded video data bitstream that was upsampled using object-based modeling information and decoder information, using a decoder device configured to decode motion models within a translational motion model framework, in accordance with certain aspects of the invention.

In one aspect of an encoder-assisted FRUC (EA-FRUC) system as disclosed herein, the encoder has access to the source frames as well as prior knowledge of the FRUC algorithm used at the decoder. The encoder is further configured to use various motion models, including translational motion models, to accurately model moving objects within the source frames. The encoder, using the interpolated frame generated therewith, transmits additional information to assist the decoder in performing FRUC and to improve the decisions made during interpolation. Taking advantage of the knowledge that FRUC will be performed in the decoder, the EA-FRUC system uses various motion models together with video coding and pre-processing operations at the video encoder to improve compression efficiency (thereby improving the use of transmission bandwidth) and reconstructed video quality (including the representation of reconstructed moving objects). In particular, various motion model information from the encoder, such as affine motion modeling, may be provided to the decoder to supplement or replace the information normally transmitted by the encoder, so that the motion modeling information can be used in encoder-assisted FRUC.

In one aspect, the information provided by the encoder includes parameters such as the spatial (e.g., refinements, model decisions, neighborhood characteristics) and temporal (e.g., motion vector decisions) characteristics of the image to be interpolated at the decoder, as well as differential information between normal predicted (B or P) frame coding and the interpolated frame generated by the FRUC process. The information provided by the encoder further includes the various motion models selected to represent moving objects from the original video stream accurately and efficiently.

Several motion prediction techniques may be used for video compression in addition to translational motion. Additional motion types include rotational motion; zoom-in and zoom-out motion; deformations, in which changes in the structure and morphology of scene objects violate the rigid body assumption; affine motion; global motion; and object-based motion. Affine motion models support multiple motion types, including translational motion, rotational motion, shearing, deformations, and object scaling for zoom-in and zoom-out scenarios. The affine motion model is more versatile than the translational model because it incorporates these other motion types. The affine motion model uses six parameters per encoded block to account for rotation, scaling, and shearing, and therefore allows greater adaptability to the actual dynamic motion of objects in a scene.
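A six-parameter affine model can be written as x' = a1·x + a2·y + a3 and y' = a4·x + a5·y + a6. The sketch below applies such a model to a pixel coordinate; the parameter ordering and the function name `affine_map` are conventions assumed here for illustration, not taken from the patent. The two-parameter translational model is the special case (1, 0, dx, 0, 1, dy).

```python
def affine_map(params, x, y):
    """Map (x, y) with a six-parameter affine model (a1, a2, a3, a4, a5, a6)."""
    a1, a2, a3, a4, a5, a6 = params
    return (a1 * x + a2 * y + a3, a4 * x + a5 * y + a6)


point = (4, 5)

translation = (1, 0, 3, 0, 1, -2)   # shift by (+3, -2): only a3 and a6 are free
zoom_in = (2, 0, 0, 0, 2, 0)        # uniform scaling by a factor of 2
rotate_90 = (0, -1, 0, 1, 0, 0)     # 90-degree rotation about the origin

print(affine_map(translation, *point))  # (7, 3)
print(affine_map(zoom_in, *point))      # (8, 10)
print(affine_map(rotate_90, *point))    # (-5, 4)
```

The zoom and rotation cases need the four extra coefficients, which is why a purely translational block model cannot represent them.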

Object-based motion prediction techniques are used for video frames of a scene containing multiple objects that undergo different motion types. In these situations, no single motion model can capture the different dynamics; instead, a multitude of models may be used, with an individual model tailored to each object in the scene.

Certain aspects of the encoder device discussed herein evaluate the properties of the decoder device that will be used to decode the data encoded by the encoder device, and optimize the encoding of the video data to improve compression efficiency, performance, and object rendering at the decoder device when frames are interpolated. For example, the decoder device may improve FRUC or error concealment. In one aspect, video frames are partitioned into a collection of regions of non-uniform size and non-uniform shape based on behavior, temporal variation dynamics, or uniquely identifiable objects. According to certain aspects, the encoder device analyzes the video data (in segments of varying duration) to locate global motion. Where global motion is located, the relevant model parameters and signals are estimated using various motion models, such as affine motion models. An affine motion model is then generated that describes the translation, rotation, scaling, and morphological change transforms for each object or partition. The partition information, together with the associated models, can be used to generate a prediction signal that can reduce the power of the residual signal. The partition map, together with the associated models, including type and parameter information, is transmitted to the decoder device. The residual signal can be compressed separately and sent to the decoder device to enable a higher-quality reconstruction. In certain aspects, the decoder device may analyze the encoded data using the information from the encoded motion models within a modified translational motion model framework.

Certain aspects describe a process of identifying objects that significantly reduces the amount of information that must be encoded in order to render the objects on the decoder device. In some of these aspects, one background object and any number of foreground objects are identified using image segmentation, graph-based techniques, or scene composition information. The background object is then classified. Once object-based scene analysis comprising the two steps described above has been performed and completed on a subsection of the video sequence or on the entire video sequence, the evolution of each object and its dynamic behavior can be accurately described by an appropriate motion-deformation model. For example, for an object undergoing uniform translational motion, the entire trajectory can be described simply by a motion vector (normalized with respect to the nominal inter-frame duration). This information, in combination with the visual data of a single snapshot of the object, can be used to render the object correctly on the decoder device until the object moves out of the scene or until some of its motion or visual attributes change. A change in one of the motion or visual attributes of an object can be used to identify a minimal non-uniform visual sampling pattern for the object. In a similar fashion, potentially rather complex motion trajectories and occlusion attributes can be determined for previously identified objects in the scene.
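The uniform-motion example above can be sketched as follows: given one snapshot position and one motion vector normalized per frame interval, an object's position can be computed at any time instant, including instants between transmitted frames. The function name `position_at` and the sample values are hypothetical and chosen only for illustration.

```python
def position_at(start, motion_vector, t):
    """Position of an object under uniform translational motion.

    `start` is the (x, y) position in the snapshot frame, `motion_vector`
    is the displacement per nominal frame interval, and `t` is the time in
    frame intervals since the snapshot (fractional values fall between frames).
    """
    x0, y0 = start
    dx, dy = motion_vector
    return (x0 + t * dx, y0 + t * dy)


snapshot_position = (10, 20)
motion_vector = (4, -1)

# Positions at the transmitted frame instants 0, 1, 2, 3.
trajectory = [position_at(snapshot_position, motion_vector, n) for n in range(4)]
print(trajectory)

# Position for an interpolated frame halfway between frames 0 and 1.
print(position_at(snapshot_position, motion_vector, 0.5))
```

As long as the motion vector stays constant, no further motion data is needed for the object, which is the source of the bandwidth saving described in the text.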

In the following description, specific details are given to provide a thorough understanding of aspects of the invention. However, those skilled in the art will recognize that the aspects may be practiced without these specific details. For example, electrical components may be shown in block diagrams in order not to obscure the aspects with unnecessary detail. In other instances, such components, other structures, and techniques may be shown in detail to further explain the aspects.

The aspects may be described as a process that is depicted as a flowchart, a flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently, and the process can be repeated. In addition, the order of the operations may be rearranged. A process is terminated when its operations are completed. A process may correspond to a method, a function, a procedure, a subroutine, a subprogram, and so forth. When a process corresponds to a function, its termination corresponds to a return of the function to the calling function or the main function.

FIG. 1A is an illustration of an example of a communication system that implements an encoder assisted frame rate up conversion (EA-FRUC) system using various motion models, in accordance with one aspect, for delivery of streaming video. System 100 includes an encoder device 105 and a decoder device 110.

The encoder device 105 includes a frame generator 115, a modeler 120, a partitioner 160, a multimedia encoder 125, a memory component 130, a processor 135, and a receiver/transmitter 140. The processor 135 generally controls the overall operation of the exemplary encoder device 105.

The partitioner component 160 partitions video frames into different blocks so that motion models can be associated with subset regions of a video frame. Analysis of motion-deformation information can be used successfully to segment the initial scene or frame. It can also be used to determine the minimal temporal sampling of frames that need to be compressed and transmitted, as opposed to frames that can be successfully interpolated from the data of the transmitted frames. In certain aspects, the (minimal) number of sampling instances is based on when the motion-deformation dynamics undergo changes. Proper frame interpolation can thus be carried out on the basis of a proper segmentation of the motion-deformation dynamics.
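The idea of sampling frames only where the motion-deformation dynamics change can be sketched as follows. This is an illustration under stated assumptions, not the partitioner's actual method: the function name `select_sample_frames`, the use of one motion vector per frame as the "dynamics", and the `tolerance` value are all hypothetical.

```python
def select_sample_frames(motion_per_frame, tolerance=0.5):
    """Return indices of frames to transmit.

    The first frame is always sent. A later frame is sent only when its
    motion vector differs from the motion at the last transmitted frame
    by more than `tolerance` in either component. Frames in between are
    assumed to be recoverable by interpolation.
    """
    if not motion_per_frame:
        return []
    samples = [0]
    reference = motion_per_frame[0]
    for index, mv in enumerate(motion_per_frame[1:], start=1):
        change = max(abs(mv[0] - reference[0]), abs(mv[1] - reference[1]))
        if change > tolerance:
            samples.append(index)
            reference = mv
    return samples


motion = [(1, 0), (1, 0), (1, 0.2), (3, 0), (3, 0), (0, 0)]
print(select_sample_frames(motion))  # frames where the dynamics change
```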

The modeler component 120 is configured to determine motion models and to associate them with objects found in the video frames that make up a scene.

The frame generator component 115 generates interpolated frames using data from the original video stream as well as information about the decoder that will be used to decode the data transmitted by the encoder device 105. Systems and methods for generating interpolated frames are disclosed in U.S. Patent Publication No. 2006/0165176, entitled "Method and Apparatus for Encoder Assisted-Frame Rate Up Conversion (EA-FRUC) for Video Compression," which is incorporated herein by reference.

The multimedia encoder 125 may include subcomponents, including a transformer/quantizer component that transforms and/or quantizes video (or audio or closed caption text) data from the spatial domain to another domain, such as the frequency domain in the case of the DCT (discrete cosine transform). The multimedia encoder may also include an entropy encoder component. The entropy encoder component may use context-adaptive variable length coding (CAVLC). The encoded data may include quantized data, transformed data, compressed data, or any combination thereof. The memory component 130 is used to store information such as raw video data to be encoded, encoded video data to be transmitted, header information, the header directory, or intermediate data being operated on by the various encoder components.

In this example, the receiver/transmitter component 140 contains circuitry and/or logic used to receive the data to be encoded from an external source 145. The external source 145 can be, for example, external memory, the Internet, or a live video and/or audio feed, and receiving the data can include wired and/or wireless communications. The receiver/transmitter 140 also contains circuitry and/or logic, such as a transmitter, to transmit (Tx) the encoded data over a network 150. The network 150 can be part of a wired system, such as telephone, cable, and fiber optic, or a wireless system. In the case of wireless communication systems, the network 150 can comprise, for example, part of a code division multiple access (CDMA or CDMA2000) communication system. Alternatively, the system can be a frequency division multiple access (FDMA) system; an orthogonal frequency division multiple access (OFDMA) system; a time division multiple access (TDMA) system such as GSM/GPRS (General Packet Radio Service)/EDGE (Enhanced Data GSM Environment) or TETRA (Terrestrial Trunked Radio) mobile telephone technology for the service industry; a wideband code division multiple access (WCDMA) system; a high data rate (1xEV-DO or 1xEV-DO Gold Multicast) system; or, in general, any wireless communication system employing a combination of these techniques. The transmitted data may include multiple bitstreams, such as video, audio, and/or closed caption.

One or more elements of the encoder device 105 shown in FIG. 1 may be omitted, rearranged, and/or combined. For example, the processor component 135 may be external to the encoder device 105.

The decoder device 110 contains components similar to those of the encoder device 105, including a multimedia decoder 165, a memory component 170, a receiver 175, and a processor 180. The decoder device 110 receives encoded multimedia data that has been transmitted over the network 150 or from an external storage 185. The receiver 175 contains circuitry and/or logic used for receiving (Rx) encoded data in conjunction with the network 150, as well as logic for receiving encoded data from the external storage 185. The external storage 185 can be, for example, external RAM or ROM, or a remote server.

The multimedia decoder 165 contains circuitry and/or logic used in decoding the received encoded multimedia bitstreams. Subcomponents of the multimedia decoder 165 may include a dequantization component, an inverse transform component, and various error recovery components. The error recovery components may include lower-level error detection and correction components (such as Reed-Solomon coding and/or turbo coding) as well as upper-layer error recovery and/or error concealment used to replace and/or conceal data that the lower-layer methods cannot correct.

The decoded multimedia data can be displayed with a display component 190, stored in the external storage 185, or stored in the internal memory component 170. The display component 190 can be an integrated part of the decoder device 110. The display component 190 contains such parts as video and/or audio display hardware and logic, including a display screen and/or speakers. The display component 190 can also be an external peripheral device. In this example, the receiver 175 also contains logic used to communicate the decoded multimedia data to the external storage component 185 or the display component 190.

It should be noted that one or more elements of the decoder device 110 shown in FIG. 1 may be omitted, rearranged, and/or combined. For example, the processor 180 may be external to the decoder device 110.

FIG. 1B is an illustration of an example of an EA-FRUC device 155 configured to use various motion models, in accordance with one aspect, for delivery of streaming video. The EA-FRUC device 100 configured to use various motion models includes a module 161 for partitioning first and second video streams, a module 121 for determining modeling information, a module 116 for generating an interpolated frame, and a module 126 for generating encoding information.

In one aspect, means for partitioning at least one of first and second video frames into a plurality of partitions comprises the module 161 for partitioning first and second video frames. In one aspect, means for determining modeling information for at least one object in at least one of the plurality of partitions comprises the module 121 for determining modeling information. In one aspect, means for generating an interpolated frame based on the modeling information comprises the module 116 for generating an interpolated frame. In one aspect, means for generating encoding information based on the interpolated frame comprises the module 126 for generating encoding information.

FIG. 2 is a flow diagram illustrating the operation of the EA-FRUC system of FIG. 1A configured to use various motion models. First, in step 201, video data is encoded for upsampling using object-based modeling information and information about the decoder device 110, as described in detail with reference to FIG. 3. Next, in step 202, the encoded information is transmitted to the decoder device 110. In certain aspects, the encoded information is transmitted from the transmitter module 140 of the encoder device 105 to the receiver 175 of the decoder device 110. After the encoded information is received, the process ends in step 203, when the decoder device 110 decodes the encoded information by using the encoded object-based modeling information to recreate a compressed version of the original video data. Step 203 is described further with reference to FIG. 6.

FIG. 3 is a flow diagram illustrating the encoding of video data for upsampling using object-based modeling information and decoder information. First, in step 301, modeling information is determined for objects in a video frame, as described further with reference to FIG. 4. Next, in step 302, information about the decoding system that will be used to decode the encoded video data is used to further upsample the encoded video. Finally, in step 303, the encoded video bitstream is generated as discussed in U.S. Patent Publication No. 2006/0002465, entitled "Method and Apparatus for Using Frame Rate Up Conversion Techniques in Scalable Video Coding," which is incorporated herein by reference.

FIG. 4 is a flow diagram illustrating the determination of modeling information for objects in a video frame in accordance with one aspect. In the illustrated aspect, moving objects are identified using certain techniques disclosed herein for recognizing objects that undergo arbitrary motions and deformations. In other aspects, objects may be identified by uniformly applying to each video frame a hybrid video compression scheme based on motion-compensated prediction and transform coding, as is known in the art. In addition, in the aspect discussed, the affine models cover a portion of the video frame and are commonly referred to as object-based affine models or local GMC. In this case, the encoder device 105 performs object segmentation to locate the objects in motion, and then updates the affine model estimate using the affine model itself and an object descriptor. For example, a binary bitmap may indicate the boundary of the described object within the video frame. In aspects in which the affine model covers the entire video frame, global motion compensation (GMC) is used. For GMC cases, the six parameters of the affine motion model are used to describe the motion of the frame and are transmitted to the decoder device 110 without any other motion information embedded in the bitstream. In still other aspects, motion models other than affine models may be used.
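As an illustration of how six parameters can describe the motion of a frame or object, the following sketch uses the common parameterization x' = a·x + b·y + c, y' = d·x + e·y + f. The disclosure does not spell out a parameter layout, so this ordering and the function names are assumptions made for the example.

```python
def affine_map(params, x, y):
    """Map a pixel position through a six-parameter affine model.

    params = (a, b, c, d, e, f), an assumed layout:
        x' = a*x + b*y + c
        y' = d*x + e*y + f
    """
    a, b, c, d, e, f = params
    return (a * x + b * y + c, d * x + e * y + f)


def affine_motion_vector(params, x, y):
    """Motion vector implied by the affine model at pixel (x, y)."""
    new_x, new_y = affine_map(params, x, y)
    return (new_x - x, new_y - y)


pan = (1.0, 0.0, 2.0, 0.0, 1.0, -3.0)   # pure translation by (2, -3)
zoom = (1.5, 0.0, 0.0, 0.0, 1.5, 0.0)   # zoom about the origin
print(affine_motion_vector(pan, 5, 7))
print(affine_motion_vector(zoom, 10, 20))
```

A pure translation yields the same vector at every pixel, whereas a zoom yields vectors that grow with distance from the zoom center, which is why a single translational vector cannot represent it.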

First, in step 401, the video frame is partitioned into blocks. In certain aspects, the blocks are of fixed size and shape. In other aspects, the frame may be partitioned into blocks of non-uniform size and/or non-uniform shape based on one or a combination of factors including prominent motion-deformation behaviors, temporal variation dynamics within regions, and uniquely identifiable objects.

Next, in step 402, one background object is identified and zero or more foreground objects are identified. In certain aspects, the identification may be carried out using image segmentation. Image segmentation includes analyzing pixel-domain attributes such as brightness and color values in conjunction with thresholding, as well as certain statistics of those attributes, such as mean, variance, standard deviation, min-max, median, and others, in conjunction with region-based methods. In other aspects, the identification may be carried out using Markov random fields or fractals. In other aspects, the identification may be carried out using edge/contour detection, including the watershed transformation on gradient images, and shape models. In further aspects, the identification may be carried out using connectivity-preserving relaxation-based segmentation methods, commonly referred to as active contour models. In other aspects, the identification may be carried out using temporal information such as motion fields. In certain aspects, image segmentation may use a combination of some or all of the above image segmentation approaches within a unified framework.

In certain aspects, the objects may be identified using graph-based techniques, such as those using local and global, semantic and statistical (intensity/texture) grouping cues. In further aspects, the identification of the objects described above may be carried out using scene composition information available from an authoring tool. In certain aspects, the background object and any foreground objects may be identified using a combination of some or all of the above identification approaches within a unified framework.

Then, in step 403, the background object is classified. In certain aspects, the background object may be classified as a still image, for which a single transmission of the background object suffices for future frame interpolation and/or decoding/reconstruction tasks at the decoder device 110. In other aspects, the background object may be classified as a still (nearly static) image undergoing a global motion such as a pan, scroll, rotation, zoom-in, or zoom-out. In this case, the encoder device 105 appropriately chooses to transmit certain sample states of the background image together with a description of the global motion model. This transmission may suffice for frame interpolation and/or decoding/reconstruction tasks at the decoder device 110. In further aspects, the background object may not belong to either of the above two classes, in which case a potentially denser temporal sampling of the states of the background image may be transmitted by the encoder device 105 to support successful frame interpolation and/or decoding/reconstruction at the decoder device 110.
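The three-way classification of the background in step 403 can be sketched as a simple decision rule. The inputs (a residual without motion compensation and a residual after global motion compensation), the threshold, and all names below are hypothetical choices for illustration; the disclosure does not specify how the classification is computed.

```python
from enum import Enum


class BackgroundClass(Enum):
    STILL = "still"                             # one transmission suffices
    GLOBAL_MOTION = "still_with_global_motion"  # samples plus a global motion model
    DYNAMIC = "dynamic"                         # denser temporal sampling needed


def classify_background(static_residual, global_motion_residual, threshold=2.0):
    """Classify a background object from two assumed error measures.

    static_residual: mean absolute frame difference with no compensation.
    global_motion_residual: the same measure after global motion compensation.
    """
    if static_residual <= threshold:
        return BackgroundClass.STILL
    if global_motion_residual <= threshold:
        return BackgroundClass.GLOBAL_MOTION
    return BackgroundClass.DYNAMIC


print(classify_background(0.8, 0.7))
print(classify_background(9.5, 1.1))
print(classify_background(9.5, 7.3))
```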

Next, in step 404, the motion vector information for the objects identified from the video data is processed. The motion vector information may be processed using the systems and methods disclosed in U.S. Patent Publication No. 2006/0018382, entitled "Method and Apparatus for Motion Vector Processing," which is incorporated herein by reference. In step 405, estimated affine models are associated with the moving objects. An affine model may be estimated based on the least degradation in the performance of the piecewise planar motion vector field approximation. Each affine model associated with each identified moving object is refined in step 406 using motion vector erosion information, as described below with reference to FIG. 5, and is then further refined in step 407 using motion-based object segmentation. These refinements are used to update each respective affine model in step 408, and the process finally ends in step 409, when object descriptors are generated for the affine models.
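One plausible way to estimate an affine model from block motion vectors is an ordinary least-squares fit, sketched below. Least squares is an assumption substituted here for illustration; the disclosure states only that the estimate is based on the least degradation of the piecewise planar motion vector field approximation. The function names and the parameter layout (a, b, c, d, e, f) are likewise hypothetical.

```python
def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(m, rhs):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = _det3(m)
    if abs(d) < 1e-12:
        raise ValueError("sample positions are degenerate (collinear or too few)")
    solution = []
    for col in range(3):
        replaced = [row[:] for row in m]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(_det3(replaced) / d)
    return solution


def fit_affine(samples):
    """Least-squares fit of x' = a*x + b*y + c, y' = d*x + e*y + f.

    samples: list of ((x, y), (dx, dy)) pairs, for example block centers
    with their motion vectors. Returns (a, b, c, d, e, f).
    """
    ata = [[0.0] * 3 for _ in range(3)]
    atb_x = [0.0] * 3
    atb_y = [0.0] * 3
    for (x, y), (dx, dy) in samples:
        basis = (float(x), float(y), 1.0)
        for i in range(3):
            for j in range(3):
                ata[i][j] += basis[i] * basis[j]   # accumulate A^T A
            atb_x[i] += basis[i] * (x + dx)        # accumulate A^T x'
            atb_y[i] += basis[i] * (y + dy)        # accumulate A^T y'
    a, b, c = _solve3(ata, atb_x)
    d, e, f = _solve3(ata, atb_y)
    return (a, b, c, d, e, f)


corners = [((0, 0), (2, -3)), ((16, 0), (2, -3)),
           ((0, 16), (2, -3)), ((16, 16), (2, -3))]
print(fit_affine(corners))  # close to a pure translation by (2, -3)
```

At least three non-collinear sample positions are needed; with more samples the fit averages out noise in individual block vectors.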

FIG. 5 is a flow diagram illustrating the determination of motion vector erosion information for objects in a video frame using affine models. First, in step 501, the encoder device 105 determines an affine model to associate with a moving object. The encoder device 105 then moves to the first macroblock of the object map for the video frame in step 502. In step 503, for each macroblock of the object map, the encoder device 105 determines in decision step 504 whether the macroblock matches the affine model determined in step 501. If the macroblock matches the affine model, the affine model based object map is updated in step 505 using the matching macroblock, and the encoder device 105 then proceeds to the next macroblock in step 506 by returning to step 503. If, however, the macroblock does not match the affine model, the encoder device proceeds immediately to the next macroblock in step 506 by returning to step 503. When no macroblocks remain, the process ends.
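The macroblock loop of FIG. 5 can be sketched as below. The matching criterion (the block motion vector must lie within `tolerance` pixels of the vector the affine model predicts at the block center), the 16-pixel macroblock size, and the data layout are assumptions made for the example; the disclosure does not define how a match is tested.

```python
def erode_object_map(params, block_mvs, block_size=16, tolerance=1.0):
    """Keep only the macroblocks whose motion agrees with the affine model.

    params: (a, b, c, d, e, f) with x' = a*x + b*y + c, y' = d*x + e*y + f
            (an assumed layout).
    block_mvs: dict mapping (row, col) of a macroblock to its (dx, dy).
    Returns the set of (row, col) entries that remain in the object map.
    """
    a, b, c, d, e, f = params
    object_map = set()
    for (row, col), (dx, dy) in block_mvs.items():       # step 503: each macroblock
        cx = col * block_size + block_size / 2           # block center
        cy = row * block_size + block_size / 2
        predicted_dx = a * cx + b * cy + c - cx
        predicted_dy = d * cx + e * cy + f - cy
        matches = (abs(dx - predicted_dx) <= tolerance
                   and abs(dy - predicted_dy) <= tolerance)  # step 504: decision
        if matches:
            object_map.add((row, col))                   # step 505: update the map
        # step 506: continue with the next macroblock
    return object_map


model = (1.0, 0.0, 2.0, 0.0, 1.0, -3.0)
mvs = {(0, 0): (2.0, -3.0), (0, 1): (2.5, -3.0), (1, 0): (0.0, 0.0)}
print(sorted(erode_object_map(model, mvs)))
```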

Block-based motion compensation using a translational model is widely deployed in decoder devices, whether in software or in hardware. Therefore, for EA-FRUC using different motion models to be implemented in decoder devices, the motion information from the encoder device 105 is described within the translational block-based motion vector framework. In certain aspects, the process of describing a different motion model within the translational block-based motion framework of the decoder device 110 may be carried out recursively, using the block motion vectors of smaller block sizes to generate the motion vector for a larger block size.

Using the information on the motion model encoded in the video bitstream, the decoder device 110 generates motion vectors for selected moving objects using a portion of the pixels used to display the object in the original video. In certain aspects, the selected pixels may be evenly distributed within the block. In other aspects, the pixels may be selected randomly from the block.

In certain aspects, the multiple motion vectors of a block are then merged to produce a single motion vector representing the block, and that motion vector may be subjected to further post-processing, such as the vector smoothing described above. In other aspects, a selected pixel or object motion vector may be used as a seed motion vector for a motion estimation module to generate a motion vector representing the block of interest.
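The conversion from a motion model to a single translational block vector can be sketched as follows: motion vectors are evaluated at an evenly spaced subset of the block's pixels and then merged. Merging by arithmetic mean, the sampling step, and the affine parameter layout are assumptions for this example; the disclosure says only that the vectors are merged.

```python
def block_motion_vector(params, x0, y0, block_size=16, step=4):
    """Approximate an affine model by one translational vector per block.

    params: (a, b, c, d, e, f) with x' = a*x + b*y + c, y' = d*x + e*y + f
            (an assumed layout).
    (x0, y0): top-left pixel of the block.
    step: spacing of the evenly distributed sample pixels.
    """
    a, b, c, d, e, f = params
    sum_dx = 0.0
    sum_dy = 0.0
    count = 0
    for y in range(y0, y0 + block_size, step):
        for x in range(x0, x0 + block_size, step):
            sum_dx += a * x + b * y + c - x   # per-pixel vector from the model
            sum_dy += d * x + e * y + f - y
            count += 1
    return (sum_dx / count, sum_dy / count)   # merge by averaging


print(block_motion_vector((1.0, 0.0, 2.0, 0.0, 1.0, -3.0), 32, 48))
print(block_motion_vector((1.5, 0.0, 0.0, 0.0, 1.5, 0.0), 0, 0))
```

The same averaging could be applied to the vectors of four sub-blocks to obtain the vector of the enclosing larger block, which is one way to read the recursive procedure mentioned earlier.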

FIG. 6 is a flow diagram illustrating the decoding of an encoded video data bitstream that was upsampled using object-based modeling information and decoder information, using a decoder device configured to decode motion models within a translational motion model framework, in accordance with certain aspects.

In step 601, the decoder device 110 receives encoded information for a video bitstream that includes two reference frames. Next, in decision step 602, the decoder device 110 determines whether the bitstream includes an encoder enhanced interpolated frame. If an encoder enhanced interpolated frame is included, then in step 603 the decoder device uses the interpolated frame, which carries encoder enhanced information related to the various motion models, in addition to the reference frames to generate a video frame that temporally coincides (is co-terminal) with the interpolated frame. In other words, the decoder device uses the encoder enhanced interpolated frame together with its associated reference frames to generate a video frame that takes the place of the interpolated frame. If, however, the decoder device 110 determines in step 602 that no encoder enhanced interpolated frame information is embedded in the bitstream, then in step 604 the decoder device 110 uses the reference frames to generate a bidirectional frame (B-frame).
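The branch at decision step 602 can be summarized with a small dispatch sketch. The function `decode_between` and the two interpolation callables passed to it are hypothetical placeholders; the usage example substitutes plain sample averaging for real interpolation and does not represent the actual decoding method.

```python
def decode_between(ref_prev, ref_next, enhanced_frame,
                   assisted_interpolate, plain_interpolate):
    """Choose the reconstruction path for the frame between two references.

    enhanced_frame is None when the bitstream carries no encoder enhanced
    interpolated frame (step 602 answers "no").
    """
    if enhanced_frame is not None:
        # Step 603: use the encoder enhanced information with the references.
        return assisted_interpolate(ref_prev, ref_next, enhanced_frame)
    # Step 604: fall back to a bidirectional prediction from the references.
    return plain_interpolate(ref_prev, ref_next)


def average(prev, nxt):
    """Stand-in interpolation: per-sample mean of the two references."""
    return [(p + n) / 2 for p, n in zip(prev, nxt)]


def average_with_hint(prev, nxt, hint):
    """Stand-in assisted path: mean plus a per-sample correction."""
    return [(p + n) / 2 + h for p, n, h in zip(prev, nxt, hint)]


print(decode_between([10, 20], [30, 40], None, average_with_hint, average))
print(decode_between([10, 20], [30, 40], [1, -1], average_with_hint, average))
```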

Those of ordinary skill in the art will understand that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.

Those of ordinary skill in the art will further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The various illustrative logical blocks, modules, and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, for example, a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.

The foregoing description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without the exercise of inventive faculty. Thus, the present invention is not intended to be limited to the embodiments described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

A method of processing multimedia data, comprising:

dividing at least one of first and second video frames into a plurality of partitions;

determining modeling information for at least one object in at least one of the partitions, the modeling information being associated with the first and second video frames;

generating an interpolated frame based on the modeling information; and

generating encoding information based on the interpolated frame, wherein the encoding information is used to generate a video frame temporally co-located with the interpolated frame.
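
By way of illustration only, the four steps recited above can be sketched in Python. The sketch rests on assumptions that the claim does not state: frames are small grayscale arrays, the modeling information is reduced to one symmetric translational displacement per partition (found by an exhaustive search with clamped borders), the interpolated frame is a midpoint average along that displacement, and the encoding information is a per-partition residual against the actual intermediate frame. All function names, the block size, the search radius, and the threshold are hypothetical.

```python
# Illustrative sketch only: every name, size, and threshold here is hypothetical.

def clamp(v, lo, hi):
    return max(lo, min(v, hi))

def partition(height, width, size):
    """Step 1: origins (top, left) of size x size partitions covering the frame."""
    return [(y, x) for y in range(0, height, size) for x in range(0, width, size)]

def block_pixels(y, x, size, height, width):
    return [(r, c) for r in range(y, min(y + size, height))
                   for c in range(x, min(x + size, width))]

def model_motion(prev, nxt, blocks, size, radius=1):
    """Step 2: per-partition symmetric displacement (a simple translational model)."""
    h, w = len(prev), len(prev[0])

    def cost(y, x, dy, dx):
        # Mismatch between prev sampled at p - d and nxt sampled at p + d.
        return sum(abs(prev[clamp(r - dy, 0, h - 1)][clamp(c - dx, 0, w - 1)]
                       - nxt[clamp(r + dy, 0, h - 1)][clamp(c + dx, 0, w - 1)])
                   for r, c in block_pixels(y, x, size, h, w))

    shifts = [(dy, dx) for dy in range(-radius, radius + 1)
                       for dx in range(-radius, radius + 1)]
    # Ties are broken in favour of the smallest displacement.
    return {(y, x): min(shifts, key=lambda d: (cost(y, x, *d), abs(d[0]) + abs(d[1])))
            for (y, x) in blocks}

def interpolate(prev, nxt, motion, size):
    """Step 3: build the midpoint frame by averaging along each displacement."""
    h, w = len(prev), len(prev[0])
    out = [[0] * w for _ in range(h)]
    for (y, x), (dy, dx) in motion.items():
        for r, c in block_pixels(y, x, size, h, w):
            a = prev[clamp(r - dy, 0, h - 1)][clamp(c - dx, 0, w - 1)]
            b = nxt[clamp(r + dy, 0, h - 1)][clamp(c + dx, 0, w - 1)]
            out[r][c] = (a + b) // 2
    return out

def encoding_info(actual, interp, blocks, size, threshold=0):
    """Step 4: residuals for partitions where interpolation misses the actual frame."""
    h, w = len(actual), len(actual[0])
    info = {}
    for (y, x) in blocks:
        px = block_pixels(y, x, size, h, w)
        if sum(abs(actual[r][c] - interp[r][c]) for r, c in px) > threshold:
            info[(y, x)] = [actual[r][c] - interp[r][c] for r, c in px]
    return info

def make_frame(col):
    """4 x 8 test frame with a bright 2 x 2 square whose left edge is at `col`."""
    f = [[0] * 8 for _ in range(4)]
    for r in (1, 2):
        f[r][col] = f[r][col + 1] = 100
    return f

prev, actual, nxt = make_frame(1), make_frame(2), make_frame(3)
blocks = partition(4, 8, 4)
motion = model_motion(prev, nxt, blocks, 4)
interp = interpolate(prev, nxt, motion, 4)
info = encoding_info(actual, interp, blocks, 4)
```

In this toy case the square moves two pixels between the first and second frames, so the interpolated frame places it at the midpoint and no residual needs to be sent. When the interpolation misses, only the mismatching partitions contribute encoding information.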

The method of claim 1, wherein determining modeling information for at least one object in one of the partitions comprises:

determining a block-based motion field estimate;

identifying at least one object based on the block-based motion field estimate; and

determining an affine model for the at least one object.
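
The three steps above can be sketched as follows, purely as an illustration. The sketch assumes the block-based motion field is already available as a mapping from block centres to motion vectors, identifies the object by a crude motion-magnitude threshold (a stand-in for whatever segmentation an implementation actually uses), and fits the affine model by ordinary least squares. The function names, the threshold, and the sample field are hypothetical.

```python
# Illustrative sketch only: names, threshold, and sample data are hypothetical.

def solve3(m, rhs):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    a = [row[:] + [rhs[i]] for i, row in enumerate(m)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]

def identify_object(field, min_motion=0.5):
    """Keep block centres whose motion magnitude exceeds a threshold."""
    return {p: v for p, v in field.items()
            if (v[0] ** 2 + v[1] ** 2) ** 0.5 > min_motion}

def fit_affine(points, vectors):
    """Least-squares fit of x' = a*x + b*y + c and y' = d*x + e*y + f.

    Needs at least three non-collinear points; otherwise the system is singular.
    """
    rows = [(x, y, 1.0) for x, y in points]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]

    def solve_for(targets):
        atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(3)]
        return solve3(ata, atb)

    xs = solve_for([x + u for (x, _), (u, _) in zip(points, vectors)])
    ys = solve_for([y + v for (_, y), (_, v) in zip(points, vectors)])
    return xs, ys  # (a, b, c) and (d, e, f)

# Hypothetical motion field: four object blocks that zoom by 1.1 and shift by (2, -1),
# plus two static background blocks.
field = {(x, y): (0.1 * x + 2.0, 0.1 * y - 1.0) for x in (8, 24) for y in (8, 24)}
field[(40, 8)] = (0.0, 0.0)
field[(40, 24)] = (0.0, 0.0)

obj = identify_object(field)
points = list(obj)
xs, ys = fit_affine(points, [obj[p] for p in points])
```

Fitting six parameters to the whole object replaces one motion vector per block with a single compact model, which is the kind of saving the affine model is meant to provide.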

The method of claim 1, further comprising:

using color features to identify boundaries of the at least one object.

The method of claim 1, further comprising:

using texture features to identify boundaries of the at least one object.

The method of claim 1, further comprising:

using pixel region features to identify boundaries of the at least one object.

The method of claim 1, further comprising:

determining motion vector erosion information associated with one of the partitions, wherein the transmitted encoding information includes the motion vector erosion information.

The method of claim 1,

wherein the modeling information comprises an affine model.

The method of claim 7,

wherein the affine model comprises at least one of translation, rotation, shearing, and scaling motion.
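
As an illustration of how the four motion types named above combine into a single affine model, the following sketch builds a 2x3 matrix from hypothetical parameters. The order of composition (scale, then shear along x, then rotate, then translate) and all parameter names are assumptions made for the example; the claim does not prescribe them.

```python
import math

# Illustrative sketch only: parameter names and composition order are assumptions.

def affine_matrix(tx=0.0, ty=0.0, angle=0.0, shear=0.0, sx=1.0, sy=1.0):
    """Return the 2x3 matrix [L | t] with L = R(angle) * Sh(shear) * S(sx, sy)."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    # Sh * S, where Sh = [[1, shear], [0, 1]] and S = diag(sx, sy)
    a, b = sx, shear * sy
    c, d = 0.0, sy
    return [[cos_a * a - sin_a * c, cos_a * b - sin_a * d, tx],
            [sin_a * a + cos_a * c, sin_a * b + cos_a * d, ty]]

def apply_affine(m, x, y):
    """Map the point (x, y) through the 2x3 affine matrix m."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

# Each motion type in isolation, then all four combined.
translated = apply_affine(affine_matrix(tx=3.0, ty=-2.0), 1.0, 1.0)
rotated = apply_affine(affine_matrix(angle=math.pi / 2), 1.0, 0.0)
sheared = apply_affine(affine_matrix(shear=0.5), 0.0, 1.0)
scaled = apply_affine(affine_matrix(sx=2.0, sy=3.0), 1.0, 1.0)
combined = affine_matrix(tx=3.0, ty=-2.0, angle=0.1, shear=0.05, sx=1.1, sy=0.9)
```

Setting only one parameter recovers the corresponding pure motion, which is why a model comprising "at least one of" the four motions is still expressible with the same six matrix entries.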

The method of claim 1,

wherein the modeling information comprises a global motion model.

A multimedia data processing device, comprising:

means for partitioning at least one of first and second video frames into a plurality of partitions;

means for determining modeling information for at least one object in at least one of the plurality of partitions, the modeling information being associated with the first and second video frames;

means for generating an interpolated frame based on the modeling information; and

means for generating encoding information based on the interpolated frame, wherein the encoding information is used to generate a video frame temporally co-located with the interpolated frame.

The device of claim 10, wherein the means for determining comprises:

means for determining a block-based motion field estimate;

means for identifying at least one object based on the block-based motion field estimate; and

means for determining an affine model for the at least one object.

The device of claim 10, further comprising:

means for using color features to identify boundaries of the at least one object.

The device of claim 10, further comprising:

means for using texture features to identify boundaries of the at least one object.

The device of claim 10, further comprising:

means for using pixel region features to identify boundaries of the at least one object.

The device of claim 10, further comprising:

means for determining motion vector erosion information associated with one of the partitions, wherein the transmitted encoding information includes the motion vector erosion information.

The device of claim 10,

wherein the modeling information comprises an affine model.

The device of claim 16,

A multimedia data processing device, comprising:

a partitioning module configured to partition at least one of first and second video frames into a plurality of partitions;

a modeling module configured to determine modeling information for at least one object in at least one of the plurality of partitions, the modeling information being associated with the first and second video frames;

a frame generation module configured to generate an interpolated frame based on the modeling information;

an encoding module configured to generate encoding information based on the interpolated frame; and

a transmitting module configured to transmit the encoding information to a decoder.

A machine-readable medium comprising instructions for processing multimedia data, wherein the instructions, when executed, cause a machine to:

divide at least one of first and second video frames into a plurality of partitions;

determine modeling information for at least one object in at least one of the partitions, the modeling information being associated with the first and second video frames;

generate an interpolated frame based on the modeling information; and

generate encoding information based on the interpolated frame, wherein the encoding information is used to generate a video frame temporally co-located with the interpolated frame.

A processor for processing multimedia data,

generate an interpolated frame based on the modeling information; and