TW201223249A - Coding stereo video data

Info

Publication number: TW201223249A
Application number: TW100134425A
Authority: TW (Taiwan)
Prior art keywords: enhancement layer, resolution, view, data, layer data
Other languages: Chinese (zh)
Inventors: Ying Chen, Hong-Qiang Wang, Marta Karczewicz
Original assignee: Qualcomm Inc
Priority claimed from: US 13/194,656 (published as US 2012/0075436 A1)

Landscapes

  • Compression Or Coding Systems Of TV Signals (AREA)

Abstract

In one example, a method of decoding video data comprising base layer data having a first resolution and enhancement layer data having the first resolution includes decoding the base layer data, wherein the base layer data comprises a reduced resolution version of a left view relative to the first resolution and a reduced resolution version of a right view relative to the first resolution. The method also includes decoding enhancement layer data comprising enhancement data for exactly one of the left view and the right view, wherein the enhancement data has the first resolution, and wherein decoding the enhancement layer data comprises decoding the enhancement layer data relative to at least a portion of the base layer data.

Description

TECHNICAL FIELD

This disclosure relates to video coding and, more particularly, to the coding of stereo video data.

This application claims the benefit of U.S. Provisional Application No. 61/480,336, filed April 28, 2011, and U.S. Provisional Application No. 61/386,463, filed September 24, 2010,
the entire content of each of which is incorporated herein by reference.

BACKGROUND

Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, video teleconferencing devices, and the like. Digital video devices implement video compression techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, or ITU-T H.264/MPEG-4 Part 10 (Advanced Video Coding (AVC)) and extensions of such standards, to transmit and receive digital video information more efficiently.

Video compression techniques perform spatial prediction and/or temporal prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video frame or slice may be partitioned into macroblocks, and each macroblock can be further partitioned. Macroblocks in an intra-coded (I) frame or slice are encoded using spatial prediction with respect to neighboring macroblocks. Macroblocks in an inter-coded (P or B) frame or slice may use spatial prediction with respect to neighboring macroblocks in the same frame or slice, or temporal prediction with respect to other reference frames.

Efforts have been made to develop new video coding standards based on H.264/AVC. One such standard is the scalable video coding (SVC) standard, which is the scalable extension to H.264/AVC. Another standard is multi-view video coding (MVC), which has become the multiview extension to H.264/AVC. A joint draft of MVC is described in JVT-AB204, "Joint Draft 8.0 on Multiview Video Coding," 28th JVT meeting, Hannover, Germany, July 2008, available from http://wftp3.itu.int/av-arch/jvt-site/2008_07_Hannover/JVT-AB204.zip. A version of the AVC standard is described in JVT-AD007, "Editors' draft revision to ITU-T Rec. H.264 | ISO/IEC 14496-10 Advanced Video Coding - in preparation for ITU-T SG 16 AAP Consent (in integrated form)," 30th JVT meeting, Geneva, Switzerland, February 2009, available from http://wftp3.itu.int/av-arch/jvt-site/2009_01_Geneva/JVT-AD007.zip. The JVT-AD007 document integrates SVC and MVC into the AVC specification.

SUMMARY

In general, this disclosure describes techniques for supporting stereo video data, for example, video data used to produce a three-dimensional (3D) effect. To produce a three-dimensional effect in video, two views of a scene, for example a left-eye view and a right-eye view, may be shown simultaneously or nearly simultaneously. The techniques of this disclosure include forming a scalable bitstream having a base layer and one or more enhancement layers. For example, the techniques of this disclosure include forming a base layer comprising individual frames, each frame having data for two reduced-resolution views of a scene. That is, a frame of the base layer includes data for two pictures captured from slightly different horizontal perspectives of the scene; a frame of the base layer may therefore be referred to as a packed frame. In addition to the base layer, the techniques of this disclosure include forming one or more enhancement layers corresponding to full-resolution representations of one or more of the views of the base layer. The enhancement layers may be inter-layer predicted, for example, relative to video data for the same view in the base layer, and/or inter-view predicted, for example, relative to video data of the other view of the base layer that forms a stereo view pair with the view of the enhancement layer, or relative to video data of a different enhancement layer. At least one of the enhancement layers contains the coded signal of only one of the stereo views.
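The layer structure described above can be summarized with a minimal in-memory sketch. This is illustrative only: the class and field names below are assumptions made for exposition and are not part of the H.264/AVC syntax or of this disclosure's signaling.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class PackedBaseFrame:
        """One base-layer frame: two half-resolution constituent pictures."""
        packed_pixels: object          # e.g., a 2-D luma array
        arrangement: str               # "side_by_side", "top_bottom", ...
        upper_left_is_left_view: bool  # which view occupies the first position

    @dataclass
    class EnhancementFrame:
        """Full-resolution enhancement data for exactly one of the two views."""
        view: str                      # "left" or "right"
        pixels: object

    @dataclass
    class AccessUnit:
        """All view components of a single time instance."""
        base: PackedBaseFrame
        enh_first: Optional[EnhancementFrame] = None
        enh_second: Optional[EnhancementFrame] = None

A receiver may use only base (for 2D display or half-resolution stereo), base plus one enhancement frame (a full-resolution single view, or an asymmetric stereo pair), or all three (full-resolution stereo).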
In one example, a method of decoding video data comprising base layer data and enhancement layer data includes decoding base layer data having a first resolution, wherein the base layer data comprises a reduced-resolution version of a left view relative to the first resolution and a reduced-resolution version of a right view relative to the first resolution. The method also includes decoding enhancement layer data having the first resolution and comprising enhancement data for exactly one of the left view and the right view, wherein decoding the enhancement layer data comprises decoding the enhancement layer data relative to at least a portion of the base layer data. The method also includes combining the decoded enhancement layer data with the one of the left view and the right view of the decoded base layer data to which the decoded enhancement layer corresponds.

In another example, an apparatus for decoding video data comprising base layer data and enhancement layer data includes a video decoder. In this example, the video decoder is configured to decode base layer data having a first resolution, wherein the base layer data comprises a reduced-resolution version of a left view relative to the first resolution and a reduced-resolution version of a right view relative to the first resolution. The video decoder is also configured to decode enhancement layer data having the first resolution and comprising enhancement data for exactly one of the left view and the right view, wherein decoding the enhancement layer data comprises decoding the enhancement layer data relative to at least a portion of the base layer data. The video decoder is also configured to combine the decoded enhancement layer data with the one of the left view and the right view of the decoded base layer data to which the decoded enhancement layer corresponds.

In another example, an apparatus for decoding video data comprising base layer data and enhancement layer data includes means for decoding base layer data having a first resolution, wherein the base layer data comprises a reduced-resolution version of a left view relative to the first resolution and a reduced-resolution version of a right view relative to the first resolution. The apparatus also includes means for decoding enhancement layer data having the first resolution and comprising enhancement data for exactly one of the left view and the right view, wherein decoding the enhancement layer data comprises decoding the enhancement layer data relative to at least a portion of the base layer data. The apparatus also includes means for combining the decoded enhancement layer data with the one of the left view and the right view of the decoded base layer data to which the decoded enhancement layer corresponds.
In another example, a computer program product includes a computer-readable storage medium having stored thereon instructions that, when executed, cause a processor of a device for decoding video data comprising base layer data and enhancement layer data to decode base layer data having a first resolution, wherein the base layer data comprises a reduced-resolution version of a left view relative to the first resolution and a reduced-resolution version of a right view relative to the first resolution. The instructions also cause the processor to decode enhancement layer data having the first resolution and comprising enhancement data for exactly one of the left view and the right view, wherein decoding the enhancement layer data comprises decoding the enhancement layer data relative to at least a portion of the base layer data. The instructions also cause the processor to combine the decoded enhancement layer data with the one of the left view and the right view of the decoded base layer data to which the decoded enhancement layer corresponds.

In another example, a method of encoding video data comprising base layer data and enhancement layer data includes encoding base layer data having a first resolution, wherein the base layer data comprises a reduced-resolution version of a left view relative to the first resolution and a reduced-resolution version of a right view relative to the first resolution. The method also includes encoding enhancement layer data having the first resolution and comprising enhancement data for exactly one of the left view and the right view, wherein the enhancement data has the first resolution, and wherein encoding the enhancement layer data comprises encoding the enhancement layer data relative to at least a portion of the base layer data.

In another example, an apparatus for encoding video data comprising a left view of a scene and a right view of the scene, where the left view has a first resolution and the right view has the first resolution, includes a video encoder. In this example, the video encoder is configured to encode base layer data comprising a reduced-resolution version of the left view relative to the first resolution and a reduced-resolution version of the right view relative to the first resolution. The video encoder is also configured to encode enhancement layer data comprising enhancement data for exactly one of the left view and the right view, wherein the enhancement data has the first resolution. The video encoder is also configured to output the base layer data and the enhancement layer data.

In another example, an apparatus for encoding video data comprising a left view of a scene and a right view of the scene includes means for encoding base layer data comprising a reduced-resolution version of the left view relative to a first resolution and a reduced-resolution version of the right view relative to the first resolution, wherein the left view has the first resolution and the right view has the first resolution. The apparatus also includes means for encoding enhancement layer data comprising enhancement data for exactly one of the left view and the right view, wherein the enhancement data has the first resolution. The apparatus also includes means for outputting the base layer data and the enhancement layer data.
In another example, a computer program product includes a computer-readable storage medium having stored thereon instructions that, when executed, cause a processor of a device for encoding video data to receive video data comprising a left view of a scene and a right view of the scene, wherein the left view has a first resolution and the right view has the first resolution. The instructions also cause the processor to encode base layer data comprising a reduced-resolution version of the left view relative to the first resolution and a reduced-resolution version of the right view relative to the first resolution. The instructions also cause the processor to encode enhancement layer data comprising enhancement data for exactly one of the left view and the right view, wherein the enhancement data has the first resolution. The instructions also cause the processor to output the base layer data and the enhancement layer data.

The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.

DETAILED DESCRIPTION

In general, this disclosure relates to techniques for supporting stereo video data, for example, video data used to produce a three-dimensional visual effect. To produce a three-dimensional visual effect in video, two views of a scene, for example a left-eye view and a right-eye view, are shown simultaneously or nearly simultaneously. Two pictures of the same scene, corresponding to the left-eye view and the right-eye view of the scene, may be captured from slightly different horizontal positions, representing the horizontal disparity between a viewer's left and right eyes. By displaying these two pictures simultaneously or nearly simultaneously, such that the left-eye view picture is perceived by the viewer's left eye and the right-eye view picture is perceived by the viewer's right eye, the viewer may experience a three-dimensional video effect.

This disclosure provides techniques for forming a scalable multiview bitstream that includes a base layer having a plurality of packed frames and one or more full-resolution enhancement layers. Each of the packed frames of the base layer may correspond to a single frame of video data having data for two pictures corresponding to different views of a scene (for example, a "right-eye view" and a "left-eye view"). In particular, the techniques of this disclosure may include encoding a base layer having reduced-resolution pictures of the left-eye view of a scene and reduced-resolution pictures of the right-eye view of the scene, the reduced-resolution pictures being packed into a single frame and encoded. In addition, the techniques of this disclosure include encoding, in a scalable manner, two full-resolution enhancement layers, each including one view of the stereo pair included in the base layer. For example, in addition to the base layer, the techniques of this disclosure may include encoding a first enhancement layer having full-resolution pictures of either the right-eye view or the left-eye view, and encoding a second enhancement layer having full-resolution pictures of the other respective view (for example, the right-eye view or left-eye view not included in the first enhancement layer).
According to some aspects of this disclosure, the multiview bitstream may be coded in a scalable manner. That is, a device receiving the scalable multiview bitstream may receive and use only the base layer, the base layer and one enhancement layer, or the base layer and two enhancement layers.

In some examples, the techniques of this disclosure may relate to the use of asymmetric packed frames. That is, in some examples, the base layer may be combined with one enhancement layer to produce a full-resolution picture for one view, coded in the enhancement layer, and a reduced-resolution picture for the other view, coded as part of the base layer. Without loss of generality, suppose that the full-resolution picture (for example, from the first enhancement layer) corresponds to the right-eye view and that the reduced-resolution picture corresponds to the left-eye view portion of the base layer. In this manner, a destination device may upsample the left-eye view to provide three-dimensional output. Again, in this example, the enhancement layer may be inter-layer predicted (for example, relative to data for the right-eye view in the base layer) and/or inter-view predicted (for example, relative to data for the left-eye view in the base layer).

This disclosure generally refers to a picture as a sample of one view, and generally refers to a frame as comprising one or more pictures, the frame being coded as at least a portion of an access unit representing a particular time instance. A frame may therefore correspond to a sample of one view (that is, a single picture) or, in the case of a packed frame, may include samples from multiple views (that is, two or more pictures).

In addition, this disclosure generally refers to a "layer" as a series of frames having similar characteristics. According to aspects of this disclosure, a "base layer" may include a series of packed frames (for example, frames that include data for two views at a single time instance), where each picture of each view included in a packed frame may be coded at a reduced resolution (for example, half resolution). According to aspects of this disclosure, an "enhancement layer" may include data for one of the views of the base layer that can be used, relative to decoding the base layer data alone, to reproduce a full-resolution picture for the view at relatively higher quality (for example, with reduced distortion). According to some examples, as noted above, a full-resolution picture of one view (of an enhancement layer) may be combined with a reduced-resolution picture of the other view from the base layer to form an asymmetric representation of a stereo scene.

According to some examples, the base layer may conform to H.264/AVC, which allows two pictures to be subsampled and packed into a single frame for coding. In addition, an enhancement layer may be coded relative to the base layer and/or relative to another enhancement layer. In one example, the base layer may contain a half-resolution first picture (for example, a "left-eye view") and a half-resolution second picture (for example, a "right-eye view"), packed into a single frame in a particular frame-packing arrangement, for example, top-bottom, side-by-side, interleaved rows, interleaved columns, quincunx (for example, "checkerboard"), or another arrangement.
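As a concrete illustration of the side-by-side and top-bottom arrangements just described, the following sketch packs two full-resolution views into one base-layer frame. It assumes simple 2:1 decimation; a real encoder would normally apply a low-pass filter before decimating, and the function name pack_frames is hypothetical, not part of any specification.

    import numpy as np

    def pack_frames(left: np.ndarray, right: np.ndarray, arrangement: str) -> np.ndarray:
        """Downsample two full-resolution views by half in one dimension and
        pack them into a single base-layer frame."""
        if left.shape != right.shape:
            raise ValueError("views must have the same resolution")
        if arrangement == "side_by_side":
            # Halve the width of each view, then place view 0 to the left of view 1.
            return np.hstack([left[:, ::2], right[:, ::2]])
        if arrangement == "top_bottom":
            # Halve the height of each view, then place view 0 above view 1.
            return np.vstack([left[::2, :], right[::2, :]])
        raise ValueError(f"unsupported arrangement: {arrangement}")

    # Example: two 1080p luma planes become one 1080p packed frame.
    left = np.zeros((1080, 1920), dtype=np.uint8)
    right = np.zeros((1080, 1920), dtype=np.uint8)
    packed = pack_frames(left, right, "side_by_side")
    assert packed.shape == (1080, 1920)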
In addition, the first enhancement layer may include full-resolution pictures corresponding to one of the pictures included in the base layer, and the second enhancement layer may include full-resolution pictures corresponding to the other respective picture included in the base layer.

In one example, the first enhancement layer may correspond to the first view of the base layer (for example, the left-eye view), and the second enhancement layer may correspond to the second view of the base layer (for example, the right-eye view). In this example, the first enhancement layer may include full-resolution frames that are inter-layer predicted from the left-eye view of the base layer and/or inter-view predicted from the right-eye view of the base layer. The second enhancement layer may include full-resolution frames that are inter-layer predicted from the right-eye view of the base layer and/or inter-view predicted from the left-eye view of the base layer. Alternatively or additionally, the second enhancement layer may include full-resolution frames that are inter-view predicted from the first enhancement layer.

In another example, the first enhancement layer may correspond to the second view of the base layer (for example, the right-eye view), and the second enhancement layer may correspond to the first view of the base layer (for example, the left-eye view). In this example, the first enhancement layer may include full-resolution frames that are inter-layer predicted from the right-eye view of the base layer and/or inter-view predicted from the left-eye view of the base layer, and the second enhancement layer may include full-resolution frames that are inter-layer predicted from the left-eye view of the base layer and/or inter-view predicted from the right-eye view of the base layer. Alternatively or additionally, the second enhancement layer may include full-resolution frames that are inter-view predicted from the first enhancement layer.

The techniques of this disclosure include coding data according to a scalable coding format that allows a receiving device, such as a client device having a decoder, to receive and use the base layer, the base layer and one enhancement layer, or the base layer and two enhancement layers. For example, various client devices may be able to use different operation points of the same representation.

In particular, in an example in which an operation point corresponds to only the base layer and a client device is capable of two-dimensional (2D) display, the client device may decode the base layer and discard the pictures associated with one of the views of the base layer. That is, for example, the client device may display pictures associated with one view of the base layer (for example, the left-eye view) and discard pictures associated with the other view of the base layer (for example, the right-eye view).

In another example, in which an operation point includes the base layer and a client device is capable of stereoscopic or three-dimensional (3D) display, the client device may decode the base layer and display pictures of both views associated with the base layer. That is, the client device may receive the base layer and, according to the techniques of this disclosure, reconstruct pictures of the left-eye view and the right-eye view for display. The client device may upsample the pictures of the left-eye view and the right-eye view of the base layer before displaying them.
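A corresponding decoder-side sketch for the base-layer-only operation points just described: unpack the packed frame into its two constituent pictures, then either discard one picture (2D display) or upsample both (stereoscopic display). Sample repetition stands in for the interpolation a real decoder would use, and the helper names are illustrative.

    import numpy as np

    def unpack_frame(packed: np.ndarray, arrangement: str):
        """Split a packed base-layer frame back into its two constituent pictures."""
        if arrangement == "side_by_side":
            w = packed.shape[1] // 2
            return packed[:, :w], packed[:, w:]
        if arrangement == "top_bottom":
            h = packed.shape[0] // 2
            return packed[:h, :], packed[h:, :]
        raise ValueError(f"unsupported arrangement: {arrangement}")

    def upsample(view: np.ndarray, arrangement: str) -> np.ndarray:
        """Restore full resolution by sample repetition (a real decoder would
        typically interpolate)."""
        axis = 1 if arrangement == "side_by_side" else 0
        return np.repeat(view, 2, axis=axis)

    packed = np.zeros((1080, 1920), dtype=np.uint8)
    left_half, right_half = unpack_frame(packed, "side_by_side")

    # 2D operation point: keep one view, discard the other.
    display_2d = upsample(left_half, "side_by_side")

    # 3D (base-layer-only) operation point: upsample both views for display.
    display_3d = (upsample(left_half, "side_by_side"),
                  upsample(right_half, "side_by_side"))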
In another example, an operation point may include the base layer and one enhancement layer. In this example, a client device having two-dimensional "high definition" (HD) display capability may receive the base layer and one enhancement layer and, according to the techniques of this disclosure, reconstruct full-resolution pictures of only the view of the enhancement layer. As used herein, "high definition" may refer to a native resolution of 1920x1080 pixels, although it should be understood that what constitutes a "high definition" resolution is relative, and other resolutions may also be considered "high definition."

In another example, in which an operation point includes the base layer and one enhancement layer and a client device has stereoscopic display capability, the client device may decode and reconstruct full-resolution pictures of the view of the enhancement layer, as well as half-resolution pictures of the opposite view from the base layer. The client device may then upsample the half-resolution pictures of the base layer before display.

In yet another example, an operation point may include the base layer and two enhancement layers. In this example, a client device may receive the base layer and both enhancement layers and, according to the techniques of this disclosure, reconstruct full-resolution pictures of the left-eye view and the right-eye view for 3D HD display. The client device may therefore use the enhancement layers to provide full-resolution data for both views, and may display native full-resolution pictures of the two views.

The scalable nature of the techniques of this disclosure allows various client devices to use the base layer, the base layer and one enhancement layer, or the base layer and two enhancement layers. According to some aspects, a client device capable of displaying a single view may use video data that provides a single-view reconstruction. For example, such a device may receive the base layer, or the base layer and one enhancement layer, to provide a single-view representation. In this example, the client device may avoid requesting, or may discard upon receipt, enhancement layer data associated with the other view. When a device does not receive or decode enhancement layer data for the second view, the device may upsample pictures of one view from the base layer.

According to other aspects, a client device capable of displaying more than one view (for example, a three-dimensional television, a computer, a handheld device, or the like) may use data from the base layer, the first enhancement layer, and/or the second enhancement layer. For example, such a device may use data from the base layer to produce a three-dimensional representation of a scene using the two views of the base layer at a first resolution. Alternatively, the device may use data from the base layer and one enhancement layer to produce a three-dimensional representation of the scene in which one of the views has a relatively higher resolution than the other view. Alternatively, the device may use data from the base layer and both enhancement layers to produce a three-dimensional representation of the scene in which both views have a relatively higher resolution.
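The following sketch shows one way a client might map its capabilities onto the operation points described above. The layer names and the selection policy are assumptions made for illustration; the disclosure leaves the policy to the client.

    # The four operation points described above; a client must fetch and decode
    # all layers in the chosen tuple (layer names are illustrative).
    OPERATION_POINTS = {
        "base_only":   ("base",),
        "left_full":   ("base", "enh_left"),
        "right_full":  ("base", "enh_right"),
        "full_stereo": ("base", "enh_left", "enh_right"),
    }

    def layers_to_request(stereo_display: bool, max_layers: int) -> tuple:
        """Pick the richest operation point this client can use, where
        max_layers models a bandwidth or decoder-capability budget."""
        if stereo_display and max_layers >= 3:
            return OPERATION_POINTS["full_stereo"]
        if max_layers >= 2:
            # One full-resolution view (left chosen arbitrarily here). On a
            # stereo display this yields an asymmetric representation: one
            # full-resolution view plus the upsampled half-resolution view.
            return OPERATION_POINTS["left_full"]
        # Base layer only: 2D after discarding one view, or half-resolution 3D.
        return OPERATION_POINTS["base_only"]

    assert layers_to_request(True, 3) == ("base", "enh_left", "enh_right")
    assert layers_to_request(False, 1) == ("base",)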
In this way, a representation of multimedia content may include three layers: a base layer having video data for two views (for example, a left view and a right view), a first enhancement layer for one of the two views, and a second enhancement layer for the other of the two views. As discussed above, the two views may form a stereo view pair, in that data of the two views may be displayed to produce a three-dimensional effect. According to the techniques of this disclosure, the first enhancement layer may be predicted from either or both of the corresponding view coded in the base layer and/or the opposite view coded in the base layer. The second enhancement layer may be predicted from either or both of the corresponding view coded in the base layer and/or the view coded in the first enhancement layer. This disclosure refers to prediction of an enhancement layer from the corresponding view of the base layer as "inter-layer prediction," and to prediction of an enhancement layer from the opposite view (whether from the base layer or from another enhancement layer) as "inter-view prediction." Either or both of the enhancement layers may be inter-layer predicted and/or inter-view predicted.
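The inter-layer and inter-view definitions above can be captured as a small reference-classification helper. This is a sketch of the stated rules, not signaled syntax; the function and constants are hypothetical, and temporal prediction within a layer is deliberately out of scope here.

    from typing import Optional

    BASE, ENH = "base", "enhancement"

    def classify_reference(cur_view: str, cur_layer: str,
                           ref_view: str, ref_layer: str) -> Optional[str]:
        """Classify a candidate reference for an enhancement-layer frame:
        inter-layer = same view, taken from the base layer;
        inter-view  = opposite view, from the base layer or the other
        enhancement layer."""
        if cur_layer != ENH:
            return None  # only enhancement layers use these prediction types
        if ref_layer == BASE and ref_view == cur_view:
            return "inter-layer"
        if ref_view != cur_view:
            return "inter-view"
        return None

    assert classify_reference("left", ENH, "left", BASE) == "inter-layer"
    assert classify_reference("left", ENH, "right", BASE) == "inter-view"
    assert classify_reference("right", ENH, "left", ENH) == "inter-view"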
This disclosure also provides techniques for signaling layer dependencies at the network abstraction layer (NAL), for example, in supplemental enhancement information (SEI) messages or sequence parameter sets (SPS) of NAL units. This disclosure also provides techniques for signaling decoding dependencies of NAL units within an access unit (of the same time instance). That is, this disclosure provides techniques for signaling how particular NAL units are used to predict other layers of the scalable multiview bitstream. In the example of H.264/AVC (Advanced Video Coding), coded video segments are organized into NAL units, which provide a "network-friendly" video representation addressing applications such as video telephony, storage, broadcast, or streaming. NAL units may be categorized as video coding layer (VCL) NAL units and non-VCL NAL units. VCL units may contain output from the core compression engine and may include block, macroblock, and/or slice-level data; other NAL units may be non-VCL NAL units. In some examples, a coded picture in one time instance, normally presented as a primary coded picture, may be contained in an access unit, which may include one or more NAL units.

In some examples, the techniques of this disclosure may be applied to H.264/AVC codecs, or to codecs based on Advanced Video Coding (AVC), such as scalable video coding (SVC), multiview video coding (MVC), or other extensions of H.264/AVC. Such codecs may be configured to recognize SEI messages associated with an access unit, where the SEI messages may be encapsulated within the access unit in the ISO base media file format or in an MPEG-2 Systems bitstream. The techniques may also be applied to future coding standards, for example, H.265/HEVC (High Efficiency Video Coding).

An SEI message may contain information that is not necessary for decoding coded picture samples from VCL NAL units, but that may assist processes related to decoding, display, error resilience, and other purposes. SEI messages may be contained in non-VCL NAL units. SEI messages are a normative part of some standard specifications and are thus not always mandatory for standard-compliant decoder implementations. SEI messages may be sequence-level SEI messages or picture-level SEI messages. Some sequence-level information may be contained in SEI messages, such as the scalability information SEI message in the example of SVC and the view scalability information SEI message in MVC. These example SEI messages may convey information about, for example, extraction of operation points and characteristics of the operation points.

H.264/AVC provides a frame packing SEI message, which is a codec-level message indicating the frame-packing type for frames that include two pictures (for example, the left and right views of a scene). For example, various types of frame-packing methods are supported for spatial interleaving of the two pictures. The supported interleaving methods include checkerboard, column interleaving, row interleaving, side-by-side, top-bottom, and side-by-side with checkerboard upconversion. The frame packing SEI message is described in "Information technology - Coding of audio-visual objects - Part 10: Advanced Video Coding, AMENDMENT 1: Constrained baseline profile, stereo high profile and frame packing arrangement SEI message," N101303, MPEG of ISO/IEC JTC1/SC29/WG11, Xi'an, China, October 2009, which is incorporated into a recent version of the H.264/AVC standard. In this manner, H.264/AVC supports interleaving two pictures of a left view and a right view into one picture and coding such pictures as a video sequence.

This disclosure provides an operation point SEI message that indicates the operation points available for coded video data. For example, this disclosure provides operation point SEI messages indicating operation points for various combinations of reduced-resolution and full-resolution layers. These combinations may be further categorized based on different temporal subsets corresponding to different frame rates. A decoder may use this information to determine whether a bitstream includes multiple layers, and to properly separate the base layer into the constituent pictures of the two views and the enhancement views.

In addition, according to some aspects of this disclosure, the techniques include providing a sequence parameter set ("SPS") extension for H.264/AVC. For example, a sequence parameter set may contain information used to decode a relatively large number of VCL NAL units. A sequence parameter set may apply to a series of consecutively coded pictures referred to as a coded video sequence.
According to some examples, the techniques of this disclosure may relate to providing an SPS extension that describes: (1) the position of the pictures of the left-eye view in the base layer; (2) the order of the full-resolution enhancement layers (for example, whether pictures of the left-eye view are coded before pictures of the right-eye view, or vice versa); (3) dependencies of the full-resolution enhancement layers (for example, whether an enhancement layer is predicted from the base layer or from another enhancement layer); (4) support for operation points for full-resolution single-view pictures (for example, support for one of the pictures of the base layer and a corresponding enhancement layer); (5) support for asymmetric operation points (for example, support for a base layer including frames that have a full-resolution picture for one view and a reduced-resolution picture for the other view); (6) support for inter-layer prediction; and (7) support for inter-view prediction.

FIG. 1 is a block diagram illustrating an example video encoding and decoding system that may use techniques for forming a scalable multiview bitstream including pictures of two views of a scene. As shown in FIG. 1, system 10 includes a source device 12 that transmits encoded video to a destination device 14 via a communication channel 16. Source device 12 and destination device 14 may comprise any of a wide range of devices, such as fixed or mobile computing devices, set-top boxes, gaming consoles, digital media players, or the like. In some cases, source device 12 and destination device 14 may comprise wireless communication devices, such as wireless handsets, so-called cellular or satellite radiotelephones, or any wireless devices that can communicate video information over a communication channel 16, in which case communication channel 16 is wireless.

The techniques of this disclosure relating to forming a scalable multiview bitstream, however, are not necessarily limited to wireless applications or settings. For example, these techniques may apply to over-the-air television broadcasts, cable television transmissions, satellite television transmissions, Internet video transmissions, encoded digital video encoded onto a storage medium, or other scenarios. Accordingly, communication channel 16 may comprise any combination of wireless or wired media suitable for transmission of encoded video data.

In the example of FIG. 1, source device 12 includes a video source 18, a video encoder 20, a modulator/demodulator (modem) 22, and a transmitter 24. Destination device 14 includes a receiver 26, a modem 28, a video decoder 30, and a display device 32. In accordance with this disclosure, video encoder 20 of source device 12 may be configured to apply the techniques for forming a scalable multiview bitstream, for example, a base layer and one or more enhancement layers (for example, two enhancement layers). For example, the base layer may include encoded data for two pictures, each from a different view of a scene (for example, a left-eye view and a right-eye view), where video encoder 20 reduces the resolution of the two pictures and combines them into a single frame (for example, each picture having half the resolution of a full-resolution frame).
The first enhancement layer may include encoded data for a full resolution representation of one of the views of the base layer, and the second enhancement layer may include encoded data for a full resolution representation of the other view of the base layer. In particular, video encoder 20 may implement inter-view prediction and/or inter-layer prediction to encode an enhancement layer relative to the base layer. For example, video encoder 20 may encode an enhancement layer corresponding to the left eye view pictures of the base layer. In this example, video encoder 20 may implement an inter-layer prediction scheme to predict the enhancement layer from the corresponding left eye view pictures of the base layer. In some examples, video encoder 20 may reconstruct the left eye view pictures of the base layer before predicting the pictures of the enhancement layer. For example, video encoder 20 may upsample the left eye view pictures of the base layer before predicting the pictures of the enhancement layer. Video encoder 20 may perform inter-layer prediction by performing inter-layer texture prediction based on the reconstructed base layer, or by performing inter-layer motion prediction based on motion vectors of the base layer. Alternatively or additionally, video encoder 20 may implement an inter-view prediction scheme to predict the enhancement layer from the right eye view pictures of the base layer. In this example, video encoder 20 may reconstruct full resolution pictures of the right eye view of the base layer before performing inter-view prediction for the enhancement layer. In addition to the enhancement layer corresponding to full resolution pictures of the left eye view of the base layer, video encoder 20 may encode another enhancement layer corresponding to full resolution pictures of the right eye view of the base layer. In accordance with some aspects of this disclosure, video encoder 20 may predict the enhancement layer pictures of the right eye view using inter-view prediction and/or inter-layer prediction with respect to the base layer. Additionally, video encoder 20 may predict the enhancement layer pictures of the right eye view using inter-view prediction with respect to another, previously generated enhancement layer (e.g., the enhancement layer corresponding to the left eye view). In other examples, a source device and a destination device may include other components or arrangements. For example, source device 12 may receive video data from an external video source 18, such as an external camera. Likewise, destination device 14 may interface with an external display device, rather than including an integrated display device. The system 10 illustrated in Figure 1 is merely one example. Techniques for generating a scalable multiview bitstream may be performed by any digital video encoding and/or decoding device. Although the techniques of this disclosure are generally performed by a video encoding device, the techniques may also be performed by a video encoder/decoder, typically referred to as a "CODEC." Moreover, the techniques of this disclosure may also be performed by a video preprocessor or a video postprocessor, such as a file encapsulation unit, a file decapsulation unit, a video multiplexer, or a video demultiplexer.
Source device 12 and destination device 14 are merely examples of such coding devices, in which source device 12 generates coded video data for transmission to destination device 14. In some examples, devices 12, 14 may operate in a substantially symmetrical manner, such that each of devices 12, 14 includes video encoding and decoding components. Hence, system 10 may support one-way or two-way video transmission between devices 12, 14, e.g., for video streaming, video playback, video broadcasting, video gaming, or video telephony. Video source 18 of source device 12 may include a video capture device, such as a video camera, a video archive containing previously captured video, and/or a video feed from a video content provider. As a further alternative, video source 18 may generate computer graphics-based data as the source video, or a combination of live video, archived video, and computer-generated video. In some cases, if video source 18 is a video camera, source device 12 and destination device 14 may form so-called camera phones or video phones. As mentioned above, however, the techniques described in this disclosure are applicable to video coding in general, and may be applied to wireless and/or wired applications performed by mobile computing devices or by generally non-mobile computing devices. In any case, the captured, pre-captured, or computer-generated video may be encoded by video encoder 20. Video source 18 may provide pictures from two or more views to video encoder 20. Two pictures of the same scene may be captured simultaneously or nearly simultaneously from slightly different horizontal positions, such that the two pictures can be used to produce a three-dimensional effect. Alternatively, video source 18 (or another unit of source device 12) may use depth information or disparity information to generate a second picture of a second view from a first picture of a first view. The depth or disparity information may be determined by a camera capturing the first view, or may be calculated from data in the first view. MPEG-C Part 3 provides a specification format for including a depth map for a picture in a video stream. The specification is described in "Text of ISO/IEC FDIS 23002-3 Representation of Auxiliary Video and Supplemental Information" (ISO/IEC JTC 1/SC 29/WG 11, MPEG Doc. N8768, January 2007, Marrakech, Morocco). In MPEG-C Part 3, the auxiliary video may be either a depth map or a parallax map. When representing a depth map, MPEG-C Part 3 provides flexibility in terms of the number of bits used to represent each depth value and the resolution of the depth map. For example, the depth map may be one quarter of the width and one half of the height of the picture the map describes. The map may be coded as monochrome video samples, e.g., within an H.264/AVC bitstream with only the luminance component. Alternatively, as defined in H.264/AVC, the map may be coded as auxiliary video data. In the context of this disclosure, the depth map or parallax map may have the same resolution as the primary video data. Although the H.264/AVC specification does not currently specify the use of auxiliary video data to code a depth map, the techniques of this disclosure may be used in conjunction with techniques for using such a depth map or parallax map.
Modem 22 may then modulate the encoded video information according to a communication standard, and transmitter 24 may transmit the information to destination device 14. Modem 22 may include various mixers, filters, amplifiers, or other components designed for signal modulation. Transmitter 24 may include circuits designed for transmitting data, including amplifiers, filters, and one or more antennas. Receiver 26 of destination device 14 receives information over channel 16, and modem 28 demodulates the information. Again, the video encoding process may implement one or more of the techniques described herein to provide a scalable multiview bitstream. That is, the video encoding process may implement one or more of the techniques described herein to provide a bitstream having a base layer that includes reduced resolution pictures of two views, and two enhancement layers that include full resolution pictures corresponding to the views of the base layer. The information communicated over channel 16 may include syntax information defined by video encoder 20, which is also used by video decoder 30, including syntax elements that describe characteristics and/or processing of macroblocks and other coded units (e.g., GOPs). Accordingly, video decoder 30 may decapsulate the base layer into the constituent pictures of the views, decode the pictures, and upsample the reduced resolution pictures to full resolution. Video decoder 30 may also determine the method used to encode the one or more enhancement layers (e.g., the prediction method), and decode the one or more enhancement layers to produce full resolution pictures of one or both of the views included in the base layer. Display device 32 may display the decoded pictures to a user. Display device 32 may comprise any of a variety of display devices, such as a cathode ray tube (CRT), a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display device. Display device 32 may display two pictures from the multiview bitstream simultaneously or nearly simultaneously. For example, display device 32 may comprise a stereoscopic three-dimensional display device capable of displaying two views simultaneously or nearly simultaneously. A user may wear active glasses that rapidly and alternately shutter the left and right lenses, so that display device 32 may rapidly switch between the left and right views in synchronization with the active glasses. Alternatively, display device 32 may display the two views simultaneously, and the user may wear passive glasses (e.g., with polarized lenses) that filter the views, causing the proper view to pass through to the user's respective eye. As yet another example, display device 32 may comprise an autostereoscopic display, for which no glasses are needed. In the example of Figure 1, communication channel 16 may comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines, or any combination of wireless and wired media. Communication channel 16 may form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the Internet. Communication channel 16 generally represents any suitable communication medium, or collection of different communication media, for transmitting video data from source device 12 to destination device 14, including any suitable combination of wired or wireless media.
Communication channel 16 may include routers, switches, base stations, or any other equipment that may be useful to facilitate communication from source device 12 to destination device 14. Video encoder 20 and video decoder 30 may operate according to a video compression standard, such as the ITU-T H.264 standard, alternatively referred to as MPEG-4, Part 10, Advanced Video Coding (AVC). The techniques of this disclosure, however, are not limited to any particular coding standard. Other examples include MPEG-2 and ITU-T H.263. Although not shown in Figure 1, in some aspects, video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder, and may include appropriate MUX-DEMUX units, or other hardware and software, to handle encoding of both audio and video in a common data stream or separate data streams. If applicable, MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol, or other protocols such as the user datagram protocol (UDP). The ITU-T H.264/MPEG-4 (AVC) standard was formulated by the ITU-T Video Coding Experts Group (VCEG) together with the ISO/IEC Moving Picture Experts Group (MPEG) as the product of a collective partnership known as the Joint Video Team (JVT). In some aspects, the techniques described in this disclosure may be applied to devices that generally conform to the H.264 standard. The H.264 standard is described in ITU-T Recommendation H.264 (Advanced Video Coding for generic audiovisual services), by the ITU-T Study Group and dated March 2005, which may be referred to herein as the H.264 standard or H.264 specification, or the H.264/AVC standard or specification. The Joint Video Team (JVT) continues to work on extensions to H.264/MPEG-4 AVC. The techniques of this disclosure may include modified extensions of the H.264/AVC standard. For example, video encoder 20 and video decoder 30 may utilize modified versions of scalable video coding (SVC), multiview video coding (MVC), or other extensions of H.264/AVC. In one example, the techniques of this disclosure include an H.264/AVC extension referred to as "multiview frame compatible" ("MFC"), which includes a "base view" (e.g., referred to herein as the base layer) and one or more "enhancement views" (e.g., referred to herein as enhancement layers). That is, the "base view" of the MFC extension may include reduced resolution pictures of two views of a scene, captured at slightly different horizontal perspectives but at substantially the same time. Thus, the "base view" of the MFC extension may actually include pictures from multiple "views" (e.g., a left eye view and a right eye view), as described herein. In addition, an "enhancement view" of the MFC extension may include a full resolution picture of one of the views included in the "base view." For example, an "enhancement view" of the MFC extension may include a full resolution picture of the left eye view of the "base view." Another "enhancement view" of the MFC extension may include a full resolution picture of the right eye view of the "base view." Video encoder 20 and video decoder 30 each may be implemented as any of a variety of suitable encoder circuitry, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware, or any combinations thereof.
Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective camera, computer, mobile device, subscriber device, broadcast device, set-top box, server, or the like. A video sequence typically includes a series of video frames. A group of pictures (GOP) generally comprises a series of one or more video frames. A GOP may include syntax data in a header of the GOP, in a header of one or more frames of the GOP, or elsewhere, that describes the number of frames included in the GOP. Each frame may include frame syntax data that describes an encoding mode for the respective frame. Video encoder 20 typically operates on video blocks within individual video frames in order to encode the video data. A video block may correspond to a macroblock or a partition of a macroblock. The video blocks may have fixed or varying sizes, and may differ in size according to a specified coding standard. Each video frame may include a plurality of slices. Each slice may include a plurality of macroblocks, which may be arranged into partitions, also referred to as sub-blocks. As an example, the ITU-T H.264 standard supports intra prediction in various block sizes (such as 16 by 16, 8 by 8, or 4 by 4 for luma components, and 8x8 for chroma components), as well as inter prediction in various block sizes (such as 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, and 4x4 for luma components, and corresponding scaled sizes for chroma components). In this disclosure, "NxN" and "N by N" may be used interchangeably to refer to the pixel dimensions of a block in terms of vertical and horizontal dimensions, e.g., 16x16 pixels or 16 by 16 pixels. In general, a 16x16 block has 16 pixels in the vertical direction (y = 16) and 16 pixels in the horizontal direction (x = 16). Likewise, an NxN block generally has N pixels in the vertical direction and N pixels in the horizontal direction, where N represents a nonnegative integer value. The pixels in a block may be arranged in rows and columns. Moreover, a block need not necessarily have the same number of pixels in the horizontal direction as in the vertical direction. For example, a block may comprise NxM pixels, where M is not necessarily equal to N. Block sizes smaller than 16 by 16 may be referred to as partitions of a 16 by 16 macroblock. Video blocks may comprise blocks of pixel data in the pixel domain, or blocks of transform coefficients in the transform domain, e.g., following application of a transform, such as a discrete cosine transform (DCT), an integer transform, a wavelet transform, or a conceptually similar transform, to residual video block data representing pixel differences between coded video blocks and predictive video blocks. In some cases, a video block may comprise blocks of quantized transform coefficients in the transform domain. Smaller video blocks can provide better resolution, and may be used for portions of a video frame that include high levels of detail. In general, macroblocks and the various partitions (sometimes referred to as sub-blocks) may be considered video blocks. In addition, a slice may be considered to be a plurality of video blocks, such as macroblocks and/or sub-blocks. Each slice may be an independently decodable unit of a video frame.
Alternatively, frames themselves may be decodable units, or other portions of a frame may be defined as decodable units. The term "coded unit" may refer to any independently decodable unit of a video frame, such as an entire frame, a slice of a frame, a group of pictures (GOP), also referred to as a sequence, or another independently decodable unit defined according to applicable coding techniques. Following intra-predictive or inter-predictive coding to produce predictive data and residual data, and following any transforms applied to the residual data to produce transform coefficients (such as the 4x4 or 8x8 integer transform used in H.264/AVC, or a discrete cosine transform (DCT)), quantization of the transform coefficients may be performed. Quantization generally refers to a process in which the transform coefficients are quantized to possibly reduce the amount of data used to represent the coefficients. The quantization process may reduce the bit depth associated with some or all of the coefficients. For example, an n-bit value may be rounded down to an m-bit value during quantization, where n is greater than m. Following quantization, entropy coding may be performed, e.g., according to content adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), or another entropy coding methodology. A processing unit configured for entropy coding may perform other processing functions, such as run length coding of the quantized coefficients, and/or generation of syntax information such as coded block pattern (CBP) values, macroblock type, coding mode, maximum macroblock size for a coded unit (such as a frame, a slice, a macroblock, or a sequence), or the like. Video encoder 20 may send syntax data, such as block-based syntax data, frame-based syntax data, and/or GOP-based syntax data, to video decoder 30, e.g., in a frame header, a block header, a slice header, or a GOP header. The GOP syntax data may describe the number of frames in the respective GOP, and the frame syntax data may indicate the encoding/prediction mode used to encode the corresponding frame. Accordingly, video decoder 30 may comprise a standard video decoder, and need not necessarily be specially configured to effectuate or utilize the techniques of this disclosure. Where applicable, video encoder 20 and video decoder 30 each may be implemented as any of a variety of suitable encoder or decoder circuitry, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic circuitry, software, hardware, firmware, or any combinations thereof. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, and the one or more encoders or decoders may be integrated as part of a combined video encoder/decoder (CODEC). An apparatus including video encoder 20 and/or video decoder 30 may comprise an integrated circuit, a microprocessor, a computer, and/or a wireless communication device, such as a cellular telephone. Video decoder 30 may be configured to receive a scalable multiview bitstream comprising a base layer and two enhancement layers.
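As a minimal illustration of the bit depth reduction described above, the following C++ sketch rounds an n-bit magnitude down to m bits. The function name is hypothetical, and a real quantizer instead divides by a step size derived from the quantization parameter.

    #include <cstdint>

    // Keep the m most significant of the n bits (n > m), discarding the
    // rest. Illustrative only; actual quantization in H.264/AVC scales by
    // a step size controlled by the quantization parameter.
    std::uint32_t reduceBitDepth(std::uint32_t value, unsigned n, unsigned m) {
      return value >> (n - m);
    }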
Video decoder 30 may further be configured to decapsulate the base layer into two corresponding sets of pictures, e.g., reduced resolution pictures of a left eye view and reduced resolution pictures of a right eye view. Video decoder 30 may decode the pictures and upsample (e.g., via interpolation) the reduced resolution pictures to generate decoded, full resolution pictures. In addition, in some examples, video decoder 30 may decode the enhancement layers, which include full resolution pictures corresponding to the base layer, with reference to the decoded pictures of the base layer. That is, video decoder 30 may also support inter-view prediction methods and inter-layer prediction methods. In some examples, video decoder 30 may be configured to determine whether destination device 14 is capable of decoding and displaying three-dimensional data. If destination device 14 is not capable of decoding and displaying three-dimensional data, video decoder 30 may decapsulate the received base layer, but discard one of the reduced resolution pictures. Video decoder 30 may also discard the full resolution enhancement layer corresponding to the discarded reduced resolution picture of the base layer. Video decoder 30 may decode the remaining reduced resolution picture, upsample or upconvert the reduced resolution picture, and cause display device 32 to display pictures of this view, thereby presenting two-dimensional video data. In another example, video decoder 30 may decode the remaining reduced resolution picture and the corresponding enhancement layer, and cause display device 32 to display pictures of this view, thereby presenting two-dimensional video data. Thus, video decoder 30 may decode only a portion of the frames and provide the decoded pictures to display device 32, without attempting to decode all of the frames. In this manner, whether or not destination device 14 is capable of displaying three-dimensional video data, destination device 14 may receive the scalable multiview bitstream comprising a base layer and two enhancement layers. Accordingly, various destination devices with various decoding and rendering capabilities may be configured to receive the same bitstream from video encoder 20. That is, some destination devices may be capable of decoding and rendering three-dimensional video data, while others may be capable of decoding and/or rendering only two-dimensional video data, yet each of the devices may be configured to receive and use data from the same scalable multiview bitstream. According to some examples, the scalable multiview bitstream may include multiple operation points to facilitate decoding and displaying a subset of the received encoded data. For example, in accordance with aspects of this disclosure, the scalable multiview bitstream includes four operation points: (1) the base layer, including reduced resolution pictures of two views (e.g., a left eye view and a right eye view); (2) the base layer and an enhancement layer including full resolution pictures of the left eye view; (3) the base layer and an enhancement layer including full resolution pictures of the right eye view; and (4) the base layer, the first enhancement layer, and the second enhancement layer, such that the two enhancement layers together include full resolution pictures of the two views.
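The following C++ sketch illustrates how a destination device might choose among the four operation points enumerated above, under assumed capability flags; the structure and function names are hypothetical.

    // Hypothetical capability flags for a destination device.
    struct DeviceCaps {
      bool supports3D;       // device can render stereo output
      bool supportsFullRes;  // device can decode full resolution layers
    };

    // Return the operation point, numbered (1)-(4) as in the list above.
    int selectOperationPoint(const DeviceCaps& caps) {
      if (!caps.supports3D) {
        // Two-dimensional fallback: decode one view of the base layer,
        // optionally with its matching enhancement layer, i.e., operation
        // point (1), (2), or (3).
        return caps.supportsFullRes ? 2 : 1;
      }
      // Stereo rendering: full resolution 3D requires both enhancement
      // layers, i.e., operation point (4); otherwise the reduced
      // resolution base layer alone already carries both views.
      return caps.supportsFullRes ? 4 : 1;
    }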
Figure 2A is a block diagram illustrating an example of a video encoder 20 that may implement techniques for generating a scalable multiview bitstream having a base layer that includes reduced resolution pictures of two views of a scene (e.g., a left eye view and a right eye view), a first enhancement layer that includes full resolution pictures of one of the views of the base layer, and a second enhancement layer that includes full resolution pictures of the other respective view of the base layer. It should be understood that certain components of Figure 2A may be shown and described with respect to a single component for conceptual purposes, but may include one or more functional units. In addition, although certain components of Figure 2A may be shown and described with respect to a single component, such components may physically comprise one or more than one discrete and/or integrated units. With respect to Figure 2A, and elsewhere in this disclosure, video encoder 20 is described as encoding one or more frames of video data. As described above, a layer (e.g., a base layer or an enhancement layer) may include a series of frames that make up the multimedia content. Accordingly, a "base frame" may refer to a single frame of video data in the base layer. In addition, an "enhancement frame" may refer to a single frame of video data in an enhancement layer. In general, video encoder 20 may perform intra- and inter-coding of blocks within video frames, including macroblocks, or partitions or sub-partitions of macroblocks. Intra-coding relies on spatial prediction to reduce or remove spatial redundancy in video within a given video frame. An intra mode (I-mode) may refer to any of several spatial based compression modes, and inter modes such as uni-directional prediction (P-mode) or bi-directional prediction (B-mode) may refer to any of several temporal-based compression modes. Inter-coding relies on temporal prediction to reduce or remove temporal redundancy in video within adjacent frames of a video sequence. In some examples, video encoder 20 may also be configured to perform inter-view prediction and inter-layer prediction of enhancement layers. For example, video encoder 20 may be configured to perform inter-view prediction in accordance with the multiview video coding (MVC) extension of H.264/AVC. In addition, video encoder 20 may be configured to perform inter-layer prediction in accordance with the scalable video coding (SVC) extension of H.264/AVC. Accordingly, an enhancement layer may be inter-view predicted from the base layer, or inter-layer predicted from the base layer. In addition, one enhancement layer may be inter-view predicted from another enhancement layer. As shown in Figure 2A, video encoder 20 receives a current video block within a video picture to be encoded. In the example of Figure 2A, video encoder 20 includes motion compensation unit 44, motion estimation/disparity unit 42, reference frame store 64, summer 50, transform unit 52, quantization unit 54, and entropy coding unit 56. For video block reconstruction, video encoder 20 also includes inverse quantization unit 58, inverse transform unit 60, and summer 62. A deblocking filter (not shown in Figure 2A) may also be included to filter block boundaries, to remove blockiness artifacts from the reconstructed video. If desired, the deblocking filter would typically filter the output of summer 62.
During the encoding process, video encoder 20 receives a video picture or slice to be coded. The picture or slice may be divided into multiple video blocks. Motion estimation/disparity unit 42 and motion compensation unit 44 perform inter-predictive coding of a received video block relative to one or more blocks in one or more reference frames. That is, motion estimation/disparity unit 42 may perform inter-predictive coding of a received video block relative to one or more blocks in one or more reference frames of a different temporal instance, e.g., motion estimation using one or more reference frames of the same view. In addition, motion estimation/disparity unit 42 may perform inter-predictive coding of a received video block relative to one or more blocks in one or more reference frames of the same temporal instance, e.g., disparity estimation using one or more reference frames of a different view. Intra prediction unit 46 may perform intra-predictive coding of a received video block relative to one or more neighboring blocks in the same frame or slice as the block to be coded, to provide spatial compression. Mode select unit 40 may select one of the intra or inter coding modes, e.g., based on error results, and provide the resulting intra- or inter-coded block to summer 50 to generate residual block data, and to summer 62 to reconstruct the encoded block for use in a reference frame. In particular, video encoder 20 may receive pictures of two views that form a stereo view pair. The two views may be referred to as view 0 and view 1, where view 0 corresponds to the left eye view pictures and view 1 corresponds to the right eye view pictures. It should be understood that the views could be labeled differently, such that instead view 1 corresponds to the left eye view and view 0 corresponds to the right eye view. In one example, video encoder 20 may encode the base layer by encoding the view 0 and view 1 pictures at a reduced resolution (e.g., half resolution). That is, video encoder 20 may downsample the view 0 and view 1 pictures by a factor of two before encoding them. Video encoder 20 may further encapsulate the encoded pictures into a packed frame. Suppose, for example, that video encoder 20 receives a view 0 picture and a view 1 picture, each having a height of h pixels and a width of w pixels, where h and w are nonnegative, nonzero integers. Video encoder 20 may form a top-bottom arrangement packed frame by downsampling the heights of the view 0 and view 1 pictures to h/2 pixels, and arranging the downsampled view 0 picture above the downsampled view 1 picture. In another example, video encoder 20 may form a side-by-side arrangement packed frame by downsampling the widths of the view 0 and view 1 pictures to w/2 pixels, and arranging the downsampled view 0 picture to the left of the downsampled view 1 picture. The side-by-side and top-bottom frame packing arrangements are provided merely as examples, and it should be understood that video encoder 20 may encapsulate the view 0 and view 1 pictures of a base frame in other arrangements, such as a checkerboard pattern, interleaved rows, or interleaved columns.
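A minimal C++ sketch of the side-by-side arrangement follows, assuming naive column decimation for the downsampling step; a practical encoder would apply a low-pass filter before discarding samples, and the type and function names here are hypothetical.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct Picture {
      int width = 0, height = 0;
      std::vector<std::uint8_t> luma;  // width * height samples, row major
    };

    // Pack two w x h views into one w x h frame: each view is reduced to
    // width w/2 by keeping every other column, then placed side by side.
    Picture packSideBySide(const Picture& view0, const Picture& view1) {
      Picture packed;
      packed.width = view0.width;
      packed.height = view0.height;
      packed.luma.resize(static_cast<std::size_t>(packed.width) * packed.height);
      const int half = packed.width / 2;
      for (int y = 0; y < packed.height; ++y) {
        for (int x = 0; x < half; ++x) {
          packed.luma[y * packed.width + x] = view0.luma[y * view0.width + 2 * x];
          packed.luma[y * packed.width + half + x] = view1.luma[y * view1.width + 2 * x];
        }
      }
      return packed;
    }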
For example, video encoder 20 may support the frame packing arrangements of the H.264/AVC specification. In addition to the base layer, video encoder 20 may encode two enhancement layers corresponding to the views included in the base layer. That is, video encoder 20 may encode full resolution pictures of view 0 and full resolution pictures of view 1. Video encoder 20 may perform inter-view prediction and inter-layer prediction to predict the two enhancement layers. Video encoder 20 may further provide information indicative of various characteristics of the scalable multiview bitstream. For example, video encoder 20 may provide an indication of the packing arrangement of the base layer, the ordering of the enhancement layers (e.g., whether the enhancement layer corresponding to view 0 precedes or follows the enhancement layer corresponding to view 1), whether the enhancement layers are predicted from each other, and other information. As an example, video encoder 20 may provide this information in the form of a sequence parameter set (SPS) extension, which applies to a series of consecutive coded frames. The SPS extension may be defined according to the example data structure in Table 1 below:

Table 1 - seq_parameter_set_mfc_extension SPS message

seq_parameter_set_mfc_extension() { | C | Descriptor
  upper_left_frame_0 | 0 | u(1)
  left_view_enhance_first | 0 | u(1)
  full_left_right_dependent | 0 | u(1)
  one_view_full_idc | 0 | u(2)
  asymmetric_flag | 0 | u(1)
  inter_layer_pred_disable_flag | 0 | u(1)
  inter_view_pred_disable_flag | 0 | u(1)
}

The SPS may inform a video decoder, such as video decoder 30, that the output decoded pictures contain a frame consisting of multiple distinct, spatially packed constituent pictures using the indicated frame packing arrangement scheme. The example SPS message may also inform video decoder 30 of characteristics of the enhancement frames. In particular, video encoder 20 may set upper_left_frame_0 to a value of 1 to indicate that the upper-left luma sample of each constituent frame belongs to the left view, thereby indicating which portions of the base layer correspond to the left and right views. Video encoder 20 may set upper_left_frame_0 to a value of 0 to indicate that the upper-left luma sample of each constituent frame belongs to the right view. This disclosure also refers to a coded picture of a particular view as a "view component." That is, a view component may comprise a coded picture for a particular view (and/or a particular layer) at a particular time. Accordingly, an access unit may be defined as containing all view components of a common temporal instance. The decoding order of access units, and of the view components of an access unit, need not be identical to the output or display order. Video encoder 20 may set left_view_enhance_first to specify the decoding order of the view components in each access unit. In some examples, video encoder 20 may set left_view_enhance_first to a value of 1 to indicate that the full resolution left view frame follows the base frame NAL units in decoding order, and that the full resolution right view frame follows the full resolution left view frame in decoding order. Video encoder 20 may set left_view_enhance_first to a value of 0 to indicate that the full resolution right view frame follows the base frame NAL units in decoding order, and that the full resolution left view frame follows the full resolution right view frame in decoding order.
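The fixed-length fields of Table 1 could be read as in the following C++ sketch. The BitReader helper is an assumption of this example rather than part of any standard API.

    #include <cstddef>
    #include <cstdint>

    // Minimal MSB-first bit reader (assumed helper, no error handling).
    class BitReader {
     public:
      BitReader(const std::uint8_t* data, std::size_t size)
          : data_(data), size_(size) {}
      std::uint32_t u(int n) {
        std::uint32_t v = 0;
        for (int i = 0; i < n && pos_ < 8 * size_; ++i, ++pos_)
          v = (v << 1) | ((data_[pos_ / 8] >> (7 - pos_ % 8)) & 1u);
        return v;
      }
     private:
      const std::uint8_t* data_;
      std::size_t size_;
      std::size_t pos_ = 0;
    };

    struct SpsMfcExtension {
      bool upper_left_frame_0;
      bool left_view_enhance_first;
      bool full_left_right_dependent;
      std::uint8_t one_view_full_idc;  // u(2)
      bool asymmetric_flag;
      bool inter_layer_pred_disable_flag;
      bool inter_view_pred_disable_flag;
    };

    // Read the fields in the order listed in Table 1.
    SpsMfcExtension parseSpsMfcExtension(BitReader& br) {
      SpsMfcExtension s{};
      s.upper_left_frame_0 = br.u(1) != 0;
      s.left_view_enhance_first = br.u(1) != 0;
      s.full_left_right_dependent = br.u(1) != 0;
      s.one_view_full_idc = static_cast<std::uint8_t>(br.u(2));
      s.asymmetric_flag = br.u(1) != 0;
      s.inter_layer_pred_disable_flag = br.u(1) != 0;
      s.inter_view_pred_disable_flag = br.u(1) != 0;
      return s;
    }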
Video encoder 20 may set full_left_right_dependent_flag to a value of 0 to indicate that the full resolution right view frame and the full resolution left view frame are independently decodable, meaning that decoding of the full resolution left view frame and the full resolution right view frame depends on the base view, but not on each other. Video encoder 20 may set full_left_right_dependent_flag to a value of 1 to indicate that one of the full resolution frames (e.g., the full resolution right view frame or the full resolution left view frame) depends on the other full resolution frame. Video encoder 20 may set one_view_full_idc to a value of 0 to indicate that no operation point for full resolution single view presentation exists. Video encoder 20 may set one_view_full_idc to a value of 1 to indicate that a full resolution single view operation point is allowed after extracting the third view component in decoding order. Video encoder 20 may set one_view_full_idc to a value of 2 to indicate that, in addition to the operation point supported when the value is equal to 1, a full resolution single view operation point is also allowed after extracting the second view component in decoding order. Video encoder 20 may set asymmetric_flag to a value of 0 to indicate that no asymmetric operation point is allowed. Video encoder 20 may set asymmetric_flag to a value of 1 to indicate that asymmetric operation points are allowed, such that when any full resolution single view operation point is decoded, the full resolution view is allowed to form an asymmetric representation together with the other view in the base view. Video encoder 20 may set inter_layer_pred_disable_flag to a value of 1 to indicate that no inter-layer prediction is used when the bitstream is encoded and the sequence parameter set is active. Video encoder 20 may set inter_layer_pred_disable_flag to a value of 0 to indicate that inter-layer prediction may be used. Video encoder 20 may set inter_view_pred_disable_flag to a value of 1 to indicate that no inter-view prediction is used when the bitstream is encoded and the sequence parameter set is active. Video encoder 20 may set inter_view_pred_disable_flag to a value of 0 to indicate that inter-view prediction may be used. In addition to the SPS extension, video encoder 20 may also provide VUI messages. In particular, for an asymmetric operation point corresponding to a full resolution frame (e.g., one of the enhancement frames), video encoder 20 may apply a VUI message to specify a cropping region of the base view. The cropping region, combined with the full resolution view, forms the representation for the asymmetric operation point. The cropping region may be described such that the full resolution picture and the reduced resolution picture in an asymmetrically packed frame can be distinguished. Video encoder 20 may also define a number of operation points for various combinations of the base frames and the enhancement frames. That is, video encoder 20 may signal multiple operation points in an operation point SEI message.
In one example, video encoder 20 may provide the operation points via an SEI message, as provided in Table 2 below:

Table 2 - operation_point_info(payloadSize) SEI message

operation_point_info( payloadSize ) { | C | Descriptor
  max_temporal_id | 5 | u(3)
  for( i = 0; i < (3 + full_left_right_dependent_flag); i++ ) {
    profile_idc | 5 | u(8)
    for( j = 0; j <= max_temporal_id; j++ ) {
      level_info_predict_flag[i][j] | 5 | u(1)

Each such bitstream conforms to a profile or a level. H.264/AVC does not specify the encoder, but the encoder is tasked with guaranteeing that the generated bitstreams are standard-compliant for a decoder. In the context of video coding standards, a "profile" corresponds to a subset of algorithms, features, or tools, and constraints that apply to them. As defined by the H.264 standard, for example, a "profile" is a subset of the entire bitstream syntax that is specified by the H.264 standard. A "level" corresponds to limitations of the decoder resource consumption, such as decoder memory and computation, which are related to the resolution of the pictures, the bit rate, and the macroblock (MB) processing rate. A profile may be signaled with a profile_idc (profile indicator) value, while a level may be signaled with a level_idc (level indicator) value. The example SEI message of Table 2 describes the operation points for a representation of video data. The max_temporal_id element generally corresponds to the maximum frame rate of the operation points of the representation. The SEI message also provides indications of the profile of the bitstream and the level of each of the operation points. The level_idc of an operation point may vary; an operation point may, however, be the same as a previously signaled operation point having temporal_id equal to index_j and layer_id equal to index_i. The SEI message further describes the average frame rate for each of the temporal_id values using the average_frame_rate element. Although in this example an operation point SEI message is used to signal characteristics of the operation points of a representation, it should be understood that, in other examples, other data structures or techniques may be used to signal similar characteristics of the operation points. For example, the signaling may form part of a multiview frame compatible (MFC) extension of a sequence parameter set. Video encoder 20 may also generate NAL unit header extensions.
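As a simple illustration of how the signaled profile and level values might be used, the following C++ sketch checks whether a decoder can handle an operation point; the structure is hypothetical, and it assumes the usual convention that a decoder supports all levels up to the one it implements within a given profile.

    #include <cstdint>

    struct OperationPoint {
      std::uint8_t profile_idc;  // signaled profile indicator
      std::uint8_t level_idc;    // signaled level indicator
      std::uint8_t temporal_id;  // selects the frame rate subset
    };

    // True if the decoder implements the profile and a sufficient level.
    bool decoderSupports(const OperationPoint& op,
                         std::uint8_t decoder_profile,
                         std::uint8_t decoder_level) {
      return op.profile_idc == decoder_profile && op.level_idc <= decoder_level;
    }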

In accordance with aspects of this disclosure, video encoder 20 may generate a NAL unit header for the packed base frames, and a separate NAL unit header for the enhancement frames. In some examples, the base layer NAL unit header may be used to indicate that a view of an enhancement layer is predicted from the base layer NAL units. The enhancement layer NAL unit header may be used to indicate whether a NAL unit belongs to the second view, from which it can be derived whether the second view is the left view. In addition, the enhancement layer NAL unit header may be used for inter-view prediction of the other full resolution enhancement frame.
In one example, the NAL unit header for the base frames may be defined according to Table 3 below:

Table 3 - nal_unit_header_base_view_extension NAL unit

nal_unit_header_base_view_extension() { | C | Descriptor
  anchor_pic_flag | All | u(1)
  inter_view_frame_0_flag | All | u(1)
  inter_view_frame_1_flag | All | u(1)
  inter_layer_frame_0_flag | All | u(1)
  inter_layer_frame_1_flag | All | u(1)
  temporal_id | All | u(3)
}

Video encoder 20 may set anchor_pic_flag to a value of 1 to specify that the current NAL unit belongs to an anchor access unit. In one example, video encoder 20 may set anchor_pic_flag to a value of 1 when the non_idr_flag value is equal to 0. In another example, video encoder 20 may set anchor_pic_flag to a value of 0 when the nal_ref_idc value is equal to 0. In accordance with some aspects of this disclosure, the value of anchor_pic_flag may be the same for all VCL NAL units of an access unit. Video encoder 20 may set inter_view_frame_0_flag to a value of 0 to specify that the frame 0 portion (e.g., the left view) of the current view component (e.g., the current layer) is not used for inter-view prediction by any other view component (e.g., another layer) in the current access unit. Video encoder 20 may set inter_view_frame_0_flag to a value of 1 to specify that the frame 0 portion (e.g., the left view) of the current view component may be used for inter-view prediction by other view components in the current access unit. Video encoder 20 may set inter_view_frame_1_flag to a value of 0 to specify that the frame 1 portion (e.g., the right view) of the current view component is not used for inter-view prediction by any other view component in the current access unit. Video encoder 20 may set inter_view_frame_1_flag to a value of 1 to specify that the frame 1 portion of the current view component may be used for inter-view prediction by other view components in the current access unit. Video encoder 20 may set inter_layer_frame_0_flag to a value of 0 to specify that the frame 0 portion (e.g., the left view) of the current view component is not used for inter-layer prediction by any other view component in the current access unit. Video encoder 20 may set inter_layer_frame_0_flag to a value of 1 to specify that the frame 0 portion of the current view component may be used for inter-layer prediction by other view components in the current access unit. Video encoder 20 may set inter_layer_frame_1_flag to a value of 0 to specify that the frame 1 portion (e.g., the right view) of the current view component is not used for inter-layer prediction by any other view component in the current access unit. Video encoder 20 may set inter_layer_frame_1_flag to a value of 1 to specify that the frame 1 portion of the current view component may be used for inter-layer prediction by other view components in the current access unit. In another example, inter_view_frame_0_flag and inter_view_frame_1_flag may be combined into a single flag.
For example, if either the frame 0 portion or the frame 1 portion may be used for inter-view prediction, video encoder 20 may set inter_view_flag (a flag representing the combination of inter_view_frame_0_flag and inter_view_frame_1_flag described above) to a value of 1. In another example, inter_layer_frame_0_flag and inter_layer_frame_1_flag may be combined into a single flag. For example, if either the frame 0 portion or the frame 1 portion may be used for inter-layer prediction, video encoder 20 may set inter_layer_flag (a flag representing the combination of inter_layer_frame_0_flag and inter_layer_frame_1_flag) to a value of 1. In another example, inter_view_frame_0_flag and inter_layer_frame_0_flag may be combined into a single flag. For example, if the frame 0 portion may be used for prediction of other view components, video encoder 20 may set inter_component_frame_0_flag (a flag representing the combination of inter_view_frame_0_flag and inter_layer_frame_0_flag) to a value of 1. In another example, inter_view_frame_1_flag and inter_layer_frame_1_flag may be combined into a single flag. For example, if the frame 1 portion may be used for prediction of other view components, video encoder 20 may set inter_component_frame_1_flag (a flag representing the combination of inter_view_frame_1_flag and inter_layer_frame_1_flag) to a value of 1. In another example, inter_view_flag and inter_layer_flag may be combined into a single flag. For example, if either the frame 0 portion or the frame 1 portion may be used for inter-view or inter-layer prediction, video encoder 20 may set inter_component_flag (a flag representing the combination of inter_view_flag and inter_layer_flag) to a value of 1. Video encoder 20 may set second_view_flag to indicate whether the subject view component is the second view or the third view, where the "subject view component" refers to the view component to which the second view flag corresponds. For example, video encoder 20 may set second_view_flag to a value of 1 to specify that the subject view component is the second view. Video encoder 20 may set second_view_flag to a value of 0 to specify that the subject view component is the third view. Video encoder 20 may set temporal_id to specify a temporal identifier for the NAL unit. The assignment of values to temporal_id may be constrained by the sub-bitstream extraction process. According to some examples, the value of temporal_id is the same for all prefix NAL units and all coded slice MFC extension NAL units of an access unit. When an access unit contains any NAL unit with nal_unit_type equal to 5, or with idr_flag equal to 1, temporal_id may be equal to 0. In one example, the NAL unit header for the full resolution enhancement frames may be defined according to Table 4 below:

Table 4 - nal_unit_header_full_view_extension NAL unit

nal_unit_header_full_view_extension() { | C | Descriptor
  non_idr_flag | All | u(1)
  anchor_pic_flag | All | u(1)
  inter_view_flag | All | u(1)
  second_view_flag | All | u(1)
  temporal_id | All | u(3)
  reserved_two_bits | All | u(2)
}

The example NAL unit header of Table 4 may describe the NAL unit to which the header corresponds. The non_idr_flag may describe whether the NAL unit belongs to an instantaneous decoding refresh (IDR) picture.
An IDR picture is typically a picture of a group of pictures (GOP) that can be decoded independently (e.g., an intra-coded picture), where all other pictures of the GOP can be decoded relative to the IDR picture or other pictures of the GOP. Thus, no pictures of a GOP are predicted relative to pictures outside the GOP. The anchor_pic_flag indicates whether the corresponding NAL unit corresponds to an anchor picture, that is, a coded picture in which all slices reference only slices within the same access unit (i.e., inter prediction is not used). The inter_view_flag indicates whether the picture corresponding to the NAL unit is used for inter-view prediction by any other view component in the current access unit.
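For illustration, the fixed-length fields of Table 4 (nine bits in total) could be packed as in the following C++ sketch; the layout shown is a convenience of this example, and the normative bit positions are those of the syntax table.

    #include <cstdint>

    // Pack, MSB first: non_idr_flag, anchor_pic_flag, inter_view_flag,
    // second_view_flag (1 bit each), temporal_id (3 bits), and
    // reserved_two_bits (2 bits, set to zero) into the low 9 bits.
    std::uint16_t packFullViewHeader(bool non_idr, bool anchor_pic,
                                     bool inter_view, bool second_view,
                                     std::uint8_t temporal_id /* 0..7 */) {
      std::uint16_t bits = 0;
      bits = static_cast<std::uint16_t>((bits << 1) | (non_idr ? 1 : 0));
      bits = static_cast<std::uint16_t>((bits << 1) | (anchor_pic ? 1 : 0));
      bits = static_cast<std::uint16_t>((bits << 1) | (inter_view ? 1 : 0));
      bits = static_cast<std::uint16_t>((bits << 1) | (second_view ? 1 : 0));
      bits = static_cast<std::uint16_t>((bits << 3) | (temporal_id & 0x7u));
      bits = static_cast<std::uint16_t>(bits << 2);  // reserved_two_bits
      return bits;
    }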

SeC〇nd-View-nag指示對應於NAL·單元之視圖分量為第一 增強層或是第二增強層qemp〇ral—id值規定NAL單元之時 間識別符(其可對應於圖框率)。 模式選擇單元40可自視圖〇圖像及在時間上對應於視圖〇 圖像之視圖1圖像接收呈區塊之形式的原始視頻資料。亦 即,視圖0圖像及視圖丨圖像可能已在實質上相同時間被俘 獲。根據本發明之一些態樣,可降取樣視圖〇圖像及視圖】 圖像’且視頻編碼器可編碼經降取樣圖像。舉例而言,視 頻編碼器20可編碼經封裝圖框中之視圖〇圖像及視圖1圖 像。視頻編碼器20亦可編碼全解析度增強圖框。亦即,視 頻編瑪器20可編碼包括全解析度視圖〇圖像之增強圖框及 包括全解析度視圖1圖像之增強圖框。視頻編碼器2〇可將 視圖0圖像及視圖1圖像之經解碼版本儲存於參考圖框儲存 器64中以促進增強圖框之層間預測及視圖間預測。 運動估計/差別單元42及運動補償單元44可高度地整 合’但出於概念目的而被分離地說明。運動估計為產生運 動向量之程序’該等運動向量估計視頻區塊之運動。舉例 而言,運動向量可指示預測性參考圖框(或其他經編碼單 元)内之預測性區塊相對於經編碼於當前圖框(或其他經編 碼單元)内之當前區塊的位移。預測性區塊為被發現在像 素差方面緊密地匹配於待編碼區塊之區塊,該像素差可藉 158748.doc •45- 201223249 由絕對差和(SAD)、平方差和(SSD)或其他差量度予以判 定。運動向量亦可指示巨集區塊之分割區的位移。運動補 償可涉及基於藉由運動估計/差別單元42判定之運動向量 (或位移向量)來取出或產生預測性區塊。再次,在一些實 例中,運動估計/差別單元42及運動補償單元44可功能地 整合。 運動估計/差別單元42可藉由比較經框間編碼圖像之視 頻區塊與參考圖框儲存器64中參考圖框之視頻區塊來演算 經框間編碼圖像之視頻區塊的運動向量(或差別向量)^運 動補償單元44亦可内插參考圖框之次整數像素,例如,ι 圖框或P圖框。ITU-T H.264標準提及參考圖框「清單」, 例如,清單0及清單^清單〇包括具有早於當前圖像之顯 不次序的參考圖框,而清單丨包括具有遲於當前圖像之顯 示次序的參考圖框。運動估計/差別單元42比較來自參考 圖框儲存器64之一或多個參考圖框之區塊與當前圖像(例 如,P圖像或B圖像)之待編碼區塊。當參考圖框儲存器料 中之參考圖框包括次整數像素之值時,藉由運動估計/差 別單元42 /秀真之運動向;g可指代參考圖框之次整數像素部 位。運動估計/差別單元42將經演算運動向量發送至熵編 碼單元56及運動補償單元44。可將藉由運動向量識別之參 考圖框稱為預測性區塊。運動補償單元44演算參考圖框之 預測性區塊之殘餘誤差值。 運動估計/差別單元42亦可經組態以執行視圖間預測, 在此狀況下’運動估計/差別單元42可演算一個視圖圖像 158748.doc •46· 201223249 (例如,視圖〇)之區塊與參考圖框視圖圖像(例如,視圖1) 之對應區塊之間的位移向量。或者或另外,運動估計/差 另J單元42可經組態以執行層間預測。亦即,運動估計/差 別單元42可經組態以執行以運動為基礎之層間預測,在此 狀况下,運動估計/差別單元42可基於與基礎圖框相關聯 之經縮放運動向量來演算預測子。 如上文所描述,框内預測單元46可相對於作為待編碼區 塊之同一圖框或片段中之一或多個相鄰區塊執行經接收視 頻區塊之框内預測性編碼以提供空間壓縮。根據一些實 例,框内預測單元46可經組態以執行增強圖框之層間預 測。亦即,框内預測單元46可經組態以執行以紋理為基礎 之層間預測,在此狀況下,框内預測單元46可升取樣基礎 圖框且基於基礎圖框及增強圖框中之經共同定位紋理來演 算預測子。在一些實例中,以層間紋理為基礎之預測僅可 用於如下增強圖框之區塊:該增強圖框具有在對應基礎圖 框中經編碼為受約束框内模式之經共同定位區塊。舉例而 &,爻約束框内模式區塊被框内編碼而不參考來自經框間 編碼之相鄰區塊之任何樣本。 根據本發明之態樣,可獨立地編碼該等層(例如,基礎 層、第一增強層及第二增強層)中每一者。假設(例如)視頻 編碼器20編碼三個層:(1)具有視圖〇(例如,左眼視圖)及 視圖1(例如,右眼視圖)之縮減解析度圖像之基礎層;(2) 具有視圖0之全解析度圖像之第一增強層;及(3)具有視圖i 之全解析度圖像之第二增強層。在此實例中,視頻編碼器 158748.doc •47- 201223249 2 0可針對每一層而實施不同編瑪模式(例如,經由模式選 擇單元40) » 在此實例中’運動估計/差別單元42及運動補償單元44 可經組態以框間編碼基礎層之兩個縮減解析度圖像。亦 即’運動估計/差別單元42可藉由比較基礎圖框之圖像之 視頻區塊與參考圖框儲存器64中之參考圖框之視頻區塊來 演算基礎圖框之圖像之視頻區塊之運動向量,而運動補償 單元4 4可演算參考圖框之預測性區塊之殘餘誤差值。或者 或另外’框内預測單元46可框内編碼基礎層之兩個縮減解 析度圖像。 視頻編碼器20亦可實施運動估計/差別單元42、運動補 .償單元44及框内預測單元46以框内預測、框間預測、層間 預測或視圖間預測該等增強層(亦即,第一增強層(例如, 對應於視圖0)及第二增強層(例如,對應於視圖υ)中每一 者。舉例而言,除了框内預測模式及框間預測模式以外, 視頻編碼器20亦可利用基礎層之視圖〇之縮減解析度圖像 以層間預測第一增強層之全解析度圖像。或者,視頻編碼 器20可利用基礎層之視圖!之縮減解析度圊像以視圖間預 測第一增強層之全解析度圖像。根據本發明之一些態樣, 在用層間或視圖間預測方法來預測增強層之前,可升取樣 或以其他方式重新建構基礎層之縮減解析度圖像。 當使用層間預測來預測第一增強層時,視頻編碼器2〇可 使用紋理預測方法或運動預測方法。當使用以紋理為基礎 之層間預測以預測第一增強層時,視頻編碼器2〇可將基礎 158748.doc •48· 201223249 層之視圖〇之圖像升取樣至全解析度,且視頻編碼器2〇可 使用基礎層之視圖〇之圖像之經共同定位紋理作為第一增 強層之圖像之預測子。視頻編碼器2〇可使用多種滤波器 (包括自調適性濾波器)來升取樣基礎層之視圖〇之圖像。視 頻編碼器20可使用與上文關於經運動補償殘餘部分所描述 之方法相同的方法來編碼殘餘部分(例如,預測子與基礎 層之視圖0之圖像中之原始紋理之間的殘餘部分)。在解碼 器(例如,圖1所示之視頻解碼器30)處,解碼器30可使用預 測子及殘餘值來重新建構像素值。 當使用以運動為基礎之層間預測以自基礎層之對應縮減 解析度圖像預測第一增強層時’視頻編碼器2〇可縮放與基 礎層之視圖0之圖像相關聯的運動向量。舉例而言,在視 圖〇之圖像及視圖1之圖像並列地封裝於基礎層中的配置 中’視頻編碼器20可在水平方向上縮放與基礎層之視圖〇 之經預測圖像相關聯的運動向量以補償縮減解析度基礎層 與全解析度增強層之間的差。在一些實例中,視頻編碼器 2〇可藉由用信號傳輸運動向量差(MVD)值來進一步改進與 基礎層之視圖0之圖像相關聯的運動向量,該MVD值考量 同縮減解析度基礎層相關聯之運動向量與同全解析度增強 層相關聯之運動向量之間的差。 在另一實例中,視頻編碼器20可使用運動跳過技術來執 行層間運動預測,該技術被定義於對H.264/AVC之聯合多 視圖視頻模型(「JMVM」)延伸中。jmvM延伸被論述於 (例如)JVT-U207(2006年10月20日至27日,中國杭州第21次 158748.doc •49· 201223249 JVT 會議)中’其可得自 http://ftp3.itu.int/av-arch/jvt-site/2006_10_Hangzhou/JVT-U207.zip。運動跳過技術可使 視頻編碼器20能夠藉由給定差別而重新使用來自在同一時 間例項中但為另一視圖之圓像的運動向量。在一些實例 中,可將差別值全域地用信號傳輸且局域地延伸至使用運 動跳過技術之每一區塊或片段》根據一些態樣,視頻編碼 器20可將差別值設定至零’此係因為用以預測增強層的基 礎層之部分被共同定位。 當使用視圖間預測來預測第一增強層之圖框時,相似於 框間編碼’視頻編碼器2 0可利用運動估計/差別單元4 2以 演算增強層圖框之區塊與參考圖框之對應區塊(例如,基 礎圖框之視圖1之圖像)之間的位移向量。在一些實例中, 視頻編碼器20可在預測第一增強層之前升取樣基礎圖框之 
As noted above, video encoder 20 may upsample the view 1 pictures of the base layer and store the upsampled pictures in reference frame store 64, such that the pictures are available for prediction purposes. According to some examples, video encoder 20 may use inter-view prediction to encode a block or block partition only when the reference block or block partition of the base frame has been inter-coded. In accordance with some aspects of this disclosure, video encoder 20 may encode the second enhancement layer (e.g., corresponding to view 1) similarly or identically to the first enhancement layer. That is, video encoder 20 may use the reduced resolution view 1 pictures of the base layer to predict the second enhancement layer (e.g., the full resolution pictures of view 1) using inter-layer prediction. Video encoder 20 may also use the reduced resolution view 0 pictures of the base layer to predict the second enhancement layer using inter-view prediction. According to this example, the enhancement layers (i.e., the first enhancement layer and the second enhancement layer) do not depend on each other; rather, the second enhancement layer uses only the base layer for prediction purposes. Alternatively or additionally, video encoder 20 may use the first enhancement layer (e.g., the full resolution pictures of view 0) to encode the second enhancement layer (e.g., the full resolution pictures of view 1) for prediction purposes. That is, the first enhancement layer may be used to predict the second enhancement layer using inter-view prediction. For example, the full resolution view 0 pictures from the first enhancement layer may be stored in reference frame store 64, such that the pictures are available for prediction purposes when encoding the second enhancement layer. Transform unit 52 applies a transform, such as a discrete cosine transform (DCT), an integer transform, or a conceptually similar transform, to the residual block, producing a video block comprising residual transform coefficient values. Transform unit 52 may perform other transforms, such as those defined by the H.264 standard, which are conceptually similar to DCT. Wavelet transforms, integer transforms, sub-band transforms, or other types of transforms could also be used. In any case, transform unit 52 applies the transform to the residual block, producing a block of residual transform coefficients. The transform may convert the residual information from a pixel value domain to a transform domain, such as a frequency domain. Quantization unit 54 quantizes the residual transform coefficients to further reduce bit rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The degree of quantization may be modified by adjusting a quantization parameter. Following quantization, entropy coding unit 56 entropy codes the quantized transform coefficients. For example, entropy coding unit 56 may perform content adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), or another entropy coding technique. Following the entropy coding by entropy coding unit 56, the encoded video may be transmitted to another device, or archived for later transmission or retrieval. In the case of context adaptive binary arithmetic coding (CABAC), context may be based on neighboring macroblocks. In some cases, entropy coding unit 56 or another unit of video encoder 20 may be configured to perform other coding functions, in addition to entropy coding. For example, entropy coding unit 56 may be configured to determine CBP values for the macroblocks and partitions. Also, in some cases, entropy coding unit 56 may perform run length coding of the coefficients in a macroblock or a partition thereof. In particular, entropy coding unit 56 may apply a zig-zag scan or other scan pattern to scan the transform coefficients in a macroblock or partition, and encode runs of zeros for further compression. Entropy coding unit 56 may also construct header information with appropriate syntax elements for transmission in the encoded video bitstream. Inverse quantization unit 58 and inverse transform unit 60 apply inverse quantization and inverse transformation, respectively, to reconstruct the residual block in the pixel domain, e.g., for later use as a reference block. Motion compensation unit 44 may calculate a reference block by adding the residual block to a predictive block of one of the frames of reference frame store 64. Motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for use in motion estimation. Summer 62 adds the reconstructed residual block to the motion compensated prediction block produced by motion compensation unit 44, to produce a reconstructed video block for storage in reference frame store 64. The reconstructed video block may be used by motion estimation/disparity unit 42 and motion compensation unit 44 as a reference block to inter-code a block in a subsequent video frame. To enable inter prediction and inter-view prediction, as described above, video encoder 20 may maintain one or more reference lists. For example, the ITU-T H.264 standard refers to "lists" of reference frames, e.g., list 0 and list 1. Aspects of this disclosure relate to constructing reference picture list data that provides flexible ordering of the reference pictures used for inter prediction and inter-view prediction. According to aspects of this disclosure, video encoder 20 may construct a reference picture list according to a modified version of the list construction described in the H.264 specification. For example, video encoder 20 may initialize a reference picture list (as set forth in the H.264/AVC specification), which maintains reference pictures for inter prediction purposes. In accordance with aspects of this disclosure, the inter-view reference picture is then appended to the list. When coding a non-base-layer component (e.g., the first or second enhancement layer), video encoder 20 may make only one inter-view reference available. For example, when coding the first enhancement layer, the inter-view reference picture may be the upsampled corresponding picture of the base layer within the same access unit. In this example, full_left_right_dependent_flag may be equal to 1 and depViewID may be set to 0. When coding the second enhancement layer, the inter-view reference picture may be the upsampled corresponding picture of the base layer within the same access unit. In this example, full_left_right_dependent_flag may be equal to 0 and depViewID may be set to 0. Alternatively, the inter-view reference picture may be the full resolution first enhancement layer in the same access unit, in which case full_left_right_dependent_flag may be equal to 0 and depViewID may be set to 1. A client device may use this information to determine what data must be retrieved to successfully decode an enhancement layer. The reference picture list may be modified to flexibly arrange the order of the reference pictures. For example, video encoder 20 may construct the reference picture list according to Table 5 below:

Table 5 - ref_pic_list_mfc_modification()

ref_pic_list_mfc_modification() { | C | Descriptor
  if( slice_type % 5 != 2 && slice_type % 5 != 4 ) {
    ref_pic_list_modification_flag_l0 | 2 | u(1)
    if( ref_pic_list_modification_flag_l0 )
      do {
        modification_of_pic_nums_idc | 2 | ue(v)
        if( modification_of_pic_nums_idc == 0 || modification_of_pic_nums_idc == 1 )
          abs_diff_pic_num_minus1 | 2 | ue(v)
        else if( modification_of_pic_nums_idc == 2 )
          long_term_pic_num | 2 | ue(v)
        else if( modification_of_pic_nums_idc == 4 || modification_of_pic_nums_idc == 5 )
          abs_diff_view_idx_minus1 | 2 | ue(v)
      } while( modification_of_pic_nums_idc != 3 )
  }
  if( slice_type % 5 == 1 ) {
    ref_pic_list_modification_flag_l1 | 2 | u(1)
    if( ref_pic_list_modification_flag_l1 )
      do {
        modification_of_pic_nums_idc | 2 | ue(v)
        if( modification_of_pic_nums_idc == 0 || modification_of_pic_nums_idc == 1 )
          abs_diff_pic_num_minus1 | 2 | ue(v)
        else if( modification_of_pic_nums_idc == 2 )
          long_term_pic_num | 2 | ue(v)
        else if( modification_of_pic_nums_idc == 6 )
          continue; // no extra value needs to be signaled in this case
      } while( modification_of_pic_nums_idc != 3 )
  }
}
1 u⑴ } while( modification of_pic_nums_idc != 3 ) } i 158748.doc -54- 201223249 表5之實例參考圖像清單修改可描述參考圖像清單。舉 例而言,modification_of_pic_nums_idc 連同 abs diff pic num一minusl ' long一term_pic_num 或 abs_diff_view idx minus 1 —起可規定參考圖像或僅視圖間參考分量中哪些被 再映射。對於視圖間預測,視圖間參考圖像及當前圖像可 根據預设而屬於立體内谷之兩個相反視圖。在一也實例 中,視圖間參考圖像可對應於為基礎層之部分的經解媽圖 像。因此,在將經解碼圖像用於視圖間預測之前,可能需 要升取樣。可使用多種濾波器(包括自調適性濾波器,以 及AVC 6分接頭内插濾波器:[1,_5, 2〇, 2〇, _5, 1]/32)來升 取樣基礎層之低解析度圖像。 在另一實例中’對於視圖間預測,視圖間參考圖像可對 應於與當前圖像(例如,同一存取單元中之不同經解碼解 析度)相同的視圖及不同視圖。在該狀況下,如表6(下文) 所示,引入C〇ll〇cated_flag以指示當前圖像及視圖間預測 圖像是否對應於同一視圖。若c〇11〇cated_flag等於i,則視 圖間參考圖像及當前圖像皆可為同一視圖之表示(例如, 左視圖或右視圖,相似於層間紋理預測)。若 collocated—flag等於〇,則視圖間參考圖像及當前圖像可為 不同視圖之表示(例如,一個左視圖圖像及—個右視圖圖 像)〇 158748.doc •55· 201223249 表 6-ref一pic_list_mfc_modification() ref pic list mfc modification() { C 描述符 if( slice type % 5 != 2 && slice—type 0/〇 5 != 4 ) { ref pic list modification flag 10 2 u⑴ if( ref pic list modification flag 10 ) do { modification of pic nums idc 2 ue(v) if( modification_of_pic_nums_idc = = 0 11 modification of pic—nums_ idc = = 1 ) abs diff pic num minusl 2 ue(v) else if( modification of pic nums idc = = 2 ) long term pic num 2 ue(v) else if ( modification_of_pic_nums_idc = = 4 11 modification—of_pic—nums idc = = 5 ) abs diff view idx minusl 2 ue(v) } while( modification—of_pic nums idc != 3 ) ) if( slice type % 5 = = 1 ) { ref pic list modification 2 U⑴ if( ref pic list modification flag—11 ) do { modification of pic nums idc 2 ue(v) if( modification_of_pic_nums_idc = = 〇 11 modification of pic nums—idc = = 1 ) abs diff pic num minusl 2 ue(v) else if( modification of pic nums idc = = 2 ) long term pic num 2 ue(v) else if (modification of pic nums idc = = 6) colocated flag 1 u(l) } while( modification of pic nums idc != 3 ) ) ) 根據本發明之一些態樣,在表7(下文)中規定 158748.doc -56- 201223249 modification_of_pic_nums_idc之值。在一些實例中,緊接 地跟在 ref_pic_list_modification_flag_10 或 ref一pic list modification flag一11 之後的第一 modification of pic nums_idc之值可能不等於3 〇 表 7-modification_of一pie_nums_ide modification of pic nums idc 所規定之修改 0 abs_diffjpic_num_minusl存在且對應於待自圖 像號碼預測值減去之差 1 abs_diff_pic_num_minus 1存在且對應於待加至 圖像號瑪預測值之差 2 longterm_pic_num存在且規定參考圖像之長 期圖像號碼 3 用於初始參考圖像清單之修改之結束迴圈 6 此值指示使用視圖間參考。 根據本發明之態樣,abs_diff_view」dx_minusl加上1可 規定待放至參考圖像清單中之當前索引的視圖間參考索引 與視圖間參考索引之預測值之間的絕對差。在用於上表6 及表7中所呈現之語法之解碼程序期間,當modification_ of_Pic_nums_idc(表7)等於6時,視圖間參考圖像將被放至 當前參考圖像清單之當前索引位置中。 進行以下程序以將具有短期圖像號碼picNumLX之圖像 置放至索引位置refldxLX中、將任何其他剩餘圖像之位置 移位至清單中之較後位置,且遞增refldxLX之值: 158748.doc -57- 201223249 for( cldx = num_ref_idx_lX_active_minusl + 1; cldx > refldxLX; cldx—) RefPicListXf cldx ] = RefPicListX[ cldx - 1]The SeC〇nd-View-nag indicates that the view component corresponding to the NAL unit is the first enhancement layer or the second enhancement layer qemp〇ral_id value specifies the time identifier of the NAL unit (which may correspond to the frame rate). The mode selection unit 40 can receive the original video material in the form of a tile from the view 〇 image and the view 1 image corresponding in time to the view 图像 image. That is, the view 0 image and the view 丨 image may have been captured at substantially the same time. In accordance with some aspects of the present invention, the image and view image can be downsampled and the video encoder can encode the downsampled image. For example, video encoder 20 may encode a view image and a view 1 image in a packaged frame. Video encoder 20 may also encode a full resolution enhancement frame. 
That is, the video coder 20 can encode an enhanced frame including a full-resolution view image and an enhanced frame including a full-resolution view 1 image. The video encoder 2 may store the decoded version of the view 0 image and the view 1 image in the reference frame store 64 to facilitate inter-layer prediction and inter-view prediction of the enhanced frame. Motion estimation/difference unit 42 and motion compensation unit 44 may be highly integrated' but are separately illustrated for conceptual purposes. Motion estimation is the process of generating motion vectors. These motion vectors estimate the motion of the video block. For example, a motion vector may indicate a displacement of a predictive block within a predictive reference frame (or other coded unit) relative to a current block encoded within a current frame (or other coded unit). The predictive block is a block that is found to closely match the block to be coded in terms of pixel difference, which can be borrowed by 158748.doc •45-201223249 by absolute difference sum (SAD), squared difference sum (SSD) or Other differences are judged. The motion vector may also indicate the displacement of the partition of the macroblock. Motion compensation may involve taking out or generating a predictive block based on a motion vector (or displacement vector) determined by motion estimation/discrimination unit 42. Again, in some embodiments, motion estimation/difference unit 42 and motion compensation unit 44 may be functionally integrated. The motion estimation/discrimination unit 42 may calculate the motion vector of the video block of the inter-coded image by comparing the video block of the inter-coded image with the video block of the reference frame in the reference frame store 64. (or difference vector) ^ Motion compensation unit 44 may also interpolate the sub-integer pixels of the reference frame, for example, an ι frame or a P-frame. The ITU-T H.264 standard refers to the reference frame "list". For example, list 0 and list 〇 list include reference frames having a display order that is earlier than the current image, and list 丨 includes having a later picture than A reference frame like the display order. The motion estimation/discrimination unit 42 compares the block to be coded from the block of one or more reference frames of the reference frame store 64 with the current image (e.g., P picture or B picture). When the reference frame in the reference frame storage material includes the value of the sub-integer pixel, the motion estimation/difference unit 42/show motion direction; g may refer to the sub-integer pixel portion of the reference frame. Motion estimation/difference unit 42 sends the calculated motion vector to entropy encoding unit 56 and motion compensation unit 44. The reference frame identified by the motion vector can be referred to as a predictive block. The motion compensation unit 44 calculates the residual error value of the predictive block of the reference frame. Motion estimation/difference unit 42 may also be configured to perform inter-view prediction, in which case motion estimation/difference unit 42 may calculate a block of view images 158748.doc • 46· 201223249 (eg, view 〇) The displacement vector between the corresponding block and the reference frame view image (for example, view 1). Alternatively or additionally, the motion estimation/difference J unit 42 can be configured to perform inter-layer prediction. 
That is, motion estimation/difference unit 42 may be configured to perform motion-based inter-layer prediction, in which case motion estimation/difference unit 42 may calculate based on the scaled motion vector associated with the base frame. Forecaster. As described above, in-frame prediction unit 46 may perform in-frame predictive coding of received video blocks to provide spatial compression relative to one or more adjacent blocks in the same frame or segment as the block to be encoded. . According to some embodiments, in-frame prediction unit 46 may be configured to perform inter-layer prediction of the enhancement frame. That is, the in-frame prediction unit 46 can be configured to perform texture-based inter-layer prediction, in which case the in-frame prediction unit 46 can upsample the base frame and based on the base frame and the enhancement frame. Coordinate the texture to calculate the predictor. In some examples, the prediction based on the inter-layer texture can only be used for the block of the enhanced frame having co-located blocks that are encoded as constrained in-frame modes in the corresponding base frame. For example, &, the in-frame mode block is intra-frame coded without reference to any samples from the inter-frame coded neighboring blocks. According to aspects of the invention, each of the layers (e.g., the base layer, the first enhancement layer, and the second enhancement layer) can be independently encoded. It is assumed that, for example, video encoder 20 encodes three layers: (1) a base layer having a reduced resolution image of view 〇 (eg, left eye view) and view 1 (eg, right eye view); (2) having a first enhancement layer of the full resolution image of view 0; and (3) a second enhancement layer having a full resolution image of view i. In this example, video encoder 158748.doc • 47- 201223249 20 may implement different marshalling modes for each layer (eg, via mode selection unit 40) » In this example 'motion estimation/differentiation unit 42 and motion Compensation unit 44 may be configured to encode the two reduced resolution images of the base layer. That is, the motion estimation/differentiation unit 42 can calculate the video region of the image of the base frame by comparing the video block of the image of the base frame with the video block of the reference frame in the reference frame store 64. The motion vector of the block, and the motion compensation unit 44 can calculate the residual error value of the predictive block of the reference frame. Alternatively or additionally, the in-frame prediction unit 46 may encode the two reduced resolution images of the base layer in-frame. The video encoder 20 may also implement the motion estimation/differentiation unit 42, the motion compensation unit 44, and the intra-frame prediction unit 46 to perform enhancements such as intra-frame prediction, inter-frame prediction, inter-layer prediction, or inter-view prediction (ie, An enhancement layer (e.g., corresponding to view 0) and a second enhancement layer (e.g., corresponding to view υ). For example, in addition to the intra-frame prediction mode and the inter-frame prediction mode, video encoder 20 also The resolution image of the base layer can be used to reduce the resolution image to inter-predict the full-resolution image of the first enhancement layer. Alternatively, the video encoder 20 can utilize the view of the base layer to reduce the resolution artifacts to inter-view prediction. Full resolution image of the first enhancement layer. 
According to some aspects of the invention, the reduced resolution image of the base layer may be upsampled or otherwise reconstructed prior to predicting the enhancement layer using inter-layer or inter-view prediction methods. When inter-layer prediction is used to predict the first enhancement layer, the video encoder 2 may use a texture prediction method or a motion prediction method. When using texture-based inter-layer prediction to predict the first In the strong layer, the video encoder 2 can upsample the image of the base 158748.doc •48· 201223249 layer to full resolution, and the video encoder 2 can use the image of the base layer view. The co-located texture is used as a predictor for the image of the first enhancement layer. The video encoder 2 can use a variety of filters (including self-adaptive filters) to upsample the image of the base layer view. Video Encoder 20 The residual portion (eg, the residual portion between the predictor and the original texture in the image of view 0 of the base layer) may be encoded using the same method as described above with respect to the motion compensated residual portion. (eg, at video decoder 30 shown in Figure 1), decoder 30 may reconstruct the pixel values using the predictor and residual values. When using motion-based inter-layer prediction to reduce the resolution map from the corresponding base layer Like the prediction of the first enhancement layer, the 'video encoder 2' can scale the motion vector associated with the image of view 0 of the base layer. For example, the image of the view and the image of view 1 are juxtaposed. The video encoder 20 can scale the motion vector associated with the predicted image of the base layer view in the horizontal direction to compensate for the reduced resolution base layer and the full resolution enhancement layer. In some examples, video encoder 2 may further improve the motion vector associated with the image of view 0 of the base layer by signaling a motion vector difference (MVD) value that is reduced in the same way. The difference between the motion vector associated with the resolution base layer and the motion vector associated with the full resolution enhancement layer. In another example, video encoder 20 may use motion skipping techniques to perform inter-layer motion prediction, the technique It is defined in the extension of the Joint Multiview Video Model ("JMVM") for H.264/AVC. The jmvM extension is discussed, for example, in JVT-U207 (October 20-27, 2006, Hangzhou, China, 21st) 158748.doc •49· 201223249 JVT Conference) 'It is available from http://ftp3.itu.int/av-arch/jvt-site/2006_10_Hangzhou/JVT-U207.zip. The motion skipping technique can enable video encoder 20 to reuse motion vectors from round images in the same time instance but in another view by a given difference. In some examples, the difference values may be signaled globally and locally extended to each block or segment using motion skipping techniques. According to some aspects, video encoder 20 may set the difference value to zero' This is because the portions of the base layer used to predict the enhancement layer are co-located. When inter-view prediction is used to predict the frame of the first enhancement layer, similar to the inter-frame coding 'video encoder 20 can use the motion estimation/discrimination unit 42 to calculate the block of the enhancement layer frame and the reference frame. The displacement vector between the corresponding block (for example, the image of view 1 of the base frame). 
In some examples, video encoder 20 may upsample the image of view 1 of the base frame before predicting the first enhancement layer. That is, the video encoder 2 can upsample the image of the base layer and store the upsampled image in the reference frame store 64 such that the images are available for prediction purposes. According to some examples, when the reference block or block partition of the base frame has been inter-frame encoded, video encoder 20 may only use inter-view prediction to encode the block or block partition. In accordance with some aspects of the present invention, video encoder 2 may encode a second enhancement layer (e.g., corresponding to view 1) similarly or identically to the first enhancement layer. That is, video encoder 20 may utilize the base layer The view reduces the resolution image to predict the second enhancement layer (eg, the full resolution image of the view) using inter-layer prediction. Video encoder 20 may also utilize the view of the base layer to reduce the resolution image to use the view. Inter-prediction to predict the second enhancement layer. According to this example 158748.doc -50-201223249, the enhancement layers (ie, the first enhancement layer and the second enhancement layer) are not dependent on each other. Conversely, the second enhancement layer only The base layer is used for prediction purposes. Alternatively or additionally, the video encoder 2 may encode the second enhancement layer using a first enhancement layer (eg, a full resolution image of view 0) (eg, full resolution of view i) Image) for prediction purposes. That is, the first enhancement layer can be used to predict the second enhancement layer using inter-view prediction. For example, full-resolution image storage of view 0 from the first enhancement layer can be stored. Yushen The picture frame store 64 is such that the pictures can be used for prediction purposes when encoding the second enhancement layer. The transform unit 52 applies a transform such as a discrete cosine transform (DCT), an integer transform, or a conceptually similar transform to the residual Blocks, thereby producing video blocks containing residual transform coefficient values. Transform unit 52 may perform other transforms, such as transforms defined by the H.264 standard, which are conceptually similar to DCT. Wavelet transforms may also be used. Integer transform, sub-band transform, or other type of transform. In any case, transform unit 52 applies the transform to the residual block, thereby generating a residual transform coefficient block. The transform unit can now remnant information from the pixel value. The domain is converted to a transform domain such as the frequency domain. Quantization unit 54 quantizes the residual transform coefficients to further reduce the bit rate. The quantization procedure may reduce the bit depth associated with some or all of the coefficients. The degree of quantization is modified. After quantization, entropy encoding unit 56 entropy encodes the quantized transform coefficients. For example, s, entropy encoding unit 56 can execute Content self-adaptive variable length coding (CAVLC), context self-adaptive binary arithmetic coding (cabac), or another entropy coding technique. After entropy coding by entropy coding unit 56, 158748.doc 51 201223249 The context may be based on neighboring macroblocks. The encoded video is transmitted to another device. 
The context is self-adapting two pieces or is sealed for later transmission or carry arithmetic coding (CABAC). In some cases, in addition to the code, the entropy coding unit 56 or the other unit of the video encoder 2 can also be configured to perform other coding functions. For example, the _ coding unit 56 can be configured to determine the macro. The block and sub-(4) have a CBP value of 4. In some cases, the entropy marshalling unit % can perform the extended length coding of the coefficients in the macroblock or its partition. The 'entropy encoding unit 56 may apply a zigzag (zig_zag) scan or other scan pattern to scan the transform coefficients' in the macroblock or partition and encode a zero extension for further compression. Entropy encoding unit 56 may also construct header information with appropriate syntax elements for transmission in the encoded video bitstream. The inverse quantization unit 7058 and the inverse transform unit 6 应用 apply inverse quantization and inverse transform, respectively, to reconstruct the residual block in the pixel domain, for example, for later use as a reference block. Motion compensation unit 44 may calculate the reference block by adding the residual block to the predictive block of one of the frames of reference frame store 64. Motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for use in motion estimation. Summer 62 adds the reconstructed residual block to the motion compensated prediction block generated by motion compensation unit 44 to produce a reconstructed video block for storage in reference frame store 64. The reconstructed video block can be used by the motion estimation/discrimination unit 42 and the motion compensation unit 44 as reference blocks to encode blocks in subsequent video frames. In order to enable inter-frame prediction and inter-view prediction, as described above, 158748.doc 201223249, the ITU-T List 1 frequency encoder 20 may maintain - or multiple reference lists. For example, the H.264 standard refers to the reference list "list". For example, the list and the format and construction of the month provide a reference image for the interview ranking of the reference between the inter-frame prediction and the inter-view prediction. Related to the material. According to this month's a, the aspect 'video encoder 2' can construct a reference picture list according to a modified version of the list described in the h specification. For example, video encoder 20 may initialize a list of reference images (eg, 264 264/Αν (: stated in the specification), which maintains the reference image for inter-frame prediction purposes. In accordance with aspects of the present invention, An inter-view reference image is appended to the list. When encoding a non-base layer component (eg, a first or second enhancement layer), video encoder 20 may make only one inter-view reference available, for example, when encoding When an enhancement layer is used, the inter-view reference image may be an upsampled corresponding image of the base layer in the same access unit. In this example, the luminance dependent_flag may be equal to 1 and the depViewID may be set to 〇. When encoding the second enhancement layer The inter-view reference image may be an upsampled corresponding image of the base layer in the same access unit. In this example, full_left_right_ dependent_flag may be equal to 〇 and depViewID may be set to 〇. 
Alternatively, the inter-view reference image may be The full resolution first enhancement layer in the same access unit. Therefore 'full_left_right_dependent_flag can be equal to 〇 and depViewID can be set to 1. The user device can make This information is necessary to determine what data is captured for successful decoding of the enhancement layer. The reference image list can be modified to flexibly configure the order of the reference images. For example, video encoder 20 can be according to Table 5 below. To construct the reference picture 158748.doc 53- 201223249 like list: Table 5 - ref a pic "ist_mfc_modification () ref pic list mfc modification) { c descriptor if (slice type % 5 ! = 2 && slice type % 5 ! = 4 ) { ref pic list modification flag 10 2 u(l) if( ref pic list modification flag 10) do { modification of pic nums idc 2 ue(v) if( modification_of_pic_nums_idc = = 0 11 modification of pic nums idc = = 1 ) abs diff pic num minusl 2 ue(v) else if( modification of pic nums idc = = 2 ) long term pic num 2 ue(v) else if (modification_of_pic_nums_idc == 4 11 modification of pic nums idc == 5 Abs diff view idx minusl 2 ue(v) } while( modification of pic nums idc != 3 ) I if( slice type % 5 = = 1) { ref pic list modification flag 11 2 u(l) if( ref pic List Modification flag 11 ) do { modiflcation of pic nums idc 2 ue(v) if( modification_of_pic_nums_idc = = 0 11 modification of pic nums idc = = 1 ) abs diff pic num minusl 2 ue(v) else if( modification of pic nums idc = = 2 ) long term pic num 2 ue(v) else if (modification of pic nums idc == 6) continue; // no extra value needs to be signalled in this case. 1 u(1) } while( modification of_pic_nums_idc != 3 ) } i 158748.doc -54- 201223249 Example of Table 5 The reference image list modification can describe a list of reference images. For example, modification_of_pic_nums_idc along with abs diff pic num-minusl 'long_term_pic_num or abs_diff_view idx minus 1 may specify which of the reference pictures or only inter-view reference components are remapped. For inter-view prediction, the inter-view reference image and the current image can belong to two opposite views of the stereo valley according to the preset. In an example, the inter-view reference image may correspond to a decoded mom image that is part of the base layer. Therefore, it may be necessary to upsample before the decoded image is used for inter-view prediction. A variety of filters (including self-adaptive filters, and AVC 6 tap interpolation filters: [1,_5, 2〇, 2〇, _5, 1]/32) can be used to upsample the low resolution of the base layer image. In another example, for inter-view prediction, the inter-view reference image may correspond to the same view and different views as the current image (e.g., different decoded resolutions in the same access unit). In this case, as shown in Table 6 (below), C〇ll〇cated_flag is introduced to indicate whether the current image and the inter-view predicted image correspond to the same view. If c〇11〇cated_flag is equal to i, the inter-view reference image and the current image may both be representations of the same view (eg, left or right view, similar to inter-layer texture prediction). 
If collocated_flag is equal to 〇, the inter-view reference image and the current image can be represented by different views (for example, a left view image and a right view image) 〇158748.doc •55· 201223249 Table 6- Ref_pic_list_mfc_modification() ref pic list mfc modification() { C descriptor if( slice type % 5 != 2 && slice_type 0/〇5 != 4 ) { ref pic list modification flag 10 2 u(1) if ( ref pic list modification flag 10 ) do { modification of pic nums idc 2 ue(v) if( modification_of_pic_nums_idc = = 0 11 modification of pic—nums_ idc = = 1 ) abs diff pic num minusl 2 ue(v) else if( Modification of pic nums idc = = 2 ) long term pic num 2 ue(v) else if ( modification_of_pic_nums_idc = = 4 11 modification—of_pic—nums idc = = 5 ) abs diff view idx minusl 2 ue(v) } while( modification —of_pic nums idc != 3 ) ) if( slice type % 5 = = 1 ) { ref pic list modification 2 U(1) if( ref pic list modification flag—11 ) do { modification of pic nums i Dc 2 ue(v) if( modification_of_pic_nums_idc = = 〇11 modification of pic nums—idc = = 1 ) abs diff pic num minusl 2 ue(v) else if( modification of pic nums idc = = 2 ) long term pic num 2 Ue(v) else if (modification of pic nums idc == 6) colocated flag 1 u(l) } while( modification of pic nums idc != 3 ) ) ) According to some aspects of the invention, in Table 7 (below ) specifies the value of 158748.doc -56- 201223249 modification_of_pic_nums_idc. In some instances, the value of the first modification of pic nums_idc immediately following ref_pic_list_modification_flag_10 or ref-pic list modification flag-11 may not be equal to 3 〇Table 7-modification_of-pie_nums_ide modification of pic nums idc Abs_diffjpic_num_minus1 exists and corresponds to the difference 1 abs_diff_pic_num_minus 1 to be subtracted from the image number prediction value and corresponds to the difference 2 longterm_pic_num to be added to the image number prediction value and the long-term image number 3 of the reference image is used for End of modification of initial reference image list Loop 6 This value indicates the use of inter-view references. According to an aspect of the present invention, abs_diff_view"dx_minusl plus 1 specifies the absolute difference between the inter-view reference index of the current index to be placed in the reference picture list and the predicted value of the inter-view reference index. During the decoding procedure for the syntax presented in Tables 6 and 7, when the modification_of_Pic_nums_idc (Table 7) is equal to 6, the inter-view reference image will be placed in the current index position of the current reference image list. The following procedure is performed to place the image with the short-term image number picNumLX into the index position refldxLX, shift the position of any other remaining image to the later position in the list, and increment the value of refldxLX: 158748.doc - 57- 201223249 for( cldx = num_ref_idx_lX_active_minusl + 1; cldx >refldxLX; cldx—) RefPicListXf cldx ] = RefPicListX[ cldx - 1]

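/* The loop above shifts each entry from index refIdxLX onward one position
   toward the end of RefPicListX, opening a slot at index refIdxLX for the
   picture being placed by the statement below. */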
RefPicListX[ refIdxLX++ ] = short-term reference picture with PicNum equal to picNumLX
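/* After this placement, refIdxLX points just past the inserted picture; the
   compaction pass below begins reading and writing at that position. */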

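/* Compaction pass: every remaining entry is kept except the stale duplicate
   of the picture just placed, i.e. the entry whose PicNum equals picNumLX
   and whose view identifier equals depViewID. */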
nIdx = refIdxLX
for( cIdx = refIdxLX; cIdx <= num_ref_idx_lX_active_minus1 + 1; cIdx++ )
    if( PicNumF( RefPicListX[ cIdx ] ) != picNumLX | | viewID( RefPicListX[ cIdx ] ) != depViewID )

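/* Each retained entry is copied back at write position nIdx, which advances
   with every copy. For example, with refIdxLX = 0 and a list { A, B, C } in
   which C is the picture with PicNum equal to picNumLX and view depViewID,
   the shift yields { A, A, B, C }, the placement yields { C, A, B, C }, and
   the compaction pass leaves { C, A, B }. */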
RefPicListX[ nldx-H- ] = RefPicListX[ cldx ] 其中viewID ()返回至每一視圖分量之view_id。當參考圖 像為來自基礎層之圖像之經升取樣版本時,viewID()可返 回至基礎層之同一 view_id,其為0。當參考圖像不屬於基 礎層(例如,參考圖像為第一增強層)時,viewID ()可返回 至適當視圖之viewjd,其可為1(第一增強層)或2(第二增 強層)。 視頻編碼器20亦可用經編碼視頻資料(例如,由解碼器 (解碼器30,圖1)所使用之資訊)來提供特定語法以適當地 解碼經編碼視頻資料。根據本發明之一些態樣,為了使能 夠進行層間預測,視頻編碼器20可在片段標頭中提供語法 元素以指示:(1)在片段中,無區塊被層間紋理預測;(2) 在片段中,全部區塊皆被層間紋理預測;或(3)在片段中, 一些區塊可能被層間紋理預測且一些區塊可能不被層間紋 理預測。另外,視頻編碼器20可在片段標頭中提供語法元 素以指示:(1)在片段中,無區塊被層間運動預測;(2)在 片段中,全部區塊皆被層間運動預測;或(3)在片段中,一 些區塊可能被層間運動預測且一些區塊可能不被層間運動 預測。 另外,為了使能夠進行層間預測,視頻編碼器20可提供 158748.doc -58 - 201223249 在區塊位準下之一些語法資料。舉例而言,本發明之態樣 包括名稱為mb_base_texture—flag之語法元素。此旗標·用 以指示是否針對整個區塊(例如,整個巨集區塊)調用層間 纹理預測。視頻編碼器20可將1111)_5&36」6乂111116_£>138設定為 等於1以用信號傳輸:將對應基礎層中之經重新建構像素 用作參考以使用層間紋理預測來重新建構當前區塊。另 外,視頻編碼器可將mb_base_texture_flag設定為等於1以 用信號傳輸:跳過當前區塊中之其他語法元素之編碼,惟 用於殘餘編碼之語法元素(亦即,CBP、8x8變換旗標,及 係數)除外。視頻編碼器20可將mb_base_texture_flag設定 為等於0以用信號傳輸··應用規則區塊編碼》若區塊為規 則框内區塊,則編碼程序等同於H.264/AVC規範中所陳述 之規則框内區塊編碼。 為了使能夠進行層間預測,視頻編碼器20可提供在區塊 位準下之其他語法資料。舉例而言,本發明之態樣包括名 稱為 mbPart_texture_prediction_flag [mbPartldx]之語法元 素,其經編碼以指示視頻編碼器20是否使用層間預測以編 碼分割區mbPartldx。此旗標可應用於具有框間16x16、 8x16、16x8及8x8之分割區類型之區塊,但通常不應用於 以下8x8之分割區類型之區塊。視頻編碼器20可將 mbPart_texture_prediction_flag設定為等於 1 以指示:將層 間紋理預測應用於對應分割區。視頻編碼器20可將 mbPart_texture_prediction_flag設定為等於 0 以指示:編碼 被稱作 motion_prediction_flag_10/l [mbPartldx]之旗標。視 158748.doc •59- 201223249 頻編碼器20可將motion_prediction_flag_10/l設定為等於1 以指示:可使用基礎層中之對應分割區之運動向量來預測 分割區mbPartldx之運動向量。視頻編碼器20可將 motion_prediction_flag_10/l設定為等於0以指示:可以與 H.264/AVC規範中之方式相同的方式重新建構運動向量。 下文所示之表8包括區塊位準語法元素: 表 8-macroblock_layer_in_mfc_extension() macroblock layer in mfc extension() { C 描述符 mb base texture flag 2 u⑴丨ae(v) if( ! mb base texture flag) { mb type 2 ue(v) | ae(v) if(mb type = = I PCM) { while( !byte aligned()) pcm alignment zero bit 3 f(l) for( i = 0; i < 256; i++) pcm sample luma[ i ] 3 u(v) for( i = 0; i < 2 * MbWidthC * MbHeightC; i++) pcm sample chromaf i 1 3 u(v) } else { noSubMbPartSizeLessThan8x8Flag = 1 if( mb_type != I_NxN && MbPartPredMode( mb type, 0 ) != Intra_16xl6 && NumMbPart( mb type) = = 4) { sub mb pred in mfc extension( mb type) 2 for( mbPartldx = 0; mbPartldx < 4; mbPartldx-H-) if( sub_mb_type[ mbPartldx ] != B Direct 8x8 ) { if( NumSubMbPart( sub_mb_type [mbPartldx 1) > 1) noSubMbPartSizeLessThan8x8Flag = 0 158748.doc -60- 201223249 } else if(! 
direct 8x8 inference flag ) noSubMbPartSizeLessThan8x8Flag = 0 } else { if( transform一8x8_mode_flag && mb一type == I NxN) transform size_8x8 flag 2 u⑴ | ae(v) mb jpred in mfc extension( mb type) 2 I I ) if( scan idx end >= scan idx start) { if( base_mode一flag 11 MbPartPredMode( mb—type, 0) != Intra—16x16 ) { coded block pattern 2 me(v) | ae(v) if( CodedBlockPattemLuma > 0 && transform一8x8一mode一flag && (base一mode一flag 11 (mb_type != I_NxN && noSubMbPartSizeLessThan8x8Flag && (mb_type != B一Direct一 16x16 丨 | direct—8x8 inference flag )))) transform size 8x8 flag 2 u(l)|ae(v) ) if( CodedBlockPattemLuma〉0 | 丨 CodedBlockPattemChroma > 0 11 ( MbPartPredMode( mb type,0) = = Intra—16x16 ) ) { mb qp delta 2 se(v) | ae(v) residual(scan—idx—start,scan idx end) 3|4 } ) } 在表8所示之實例中,視頻編碼器20可將mb_base_ texture_flag設定為等於1以指示:針對整個巨集區塊應用 158748.doc •61 · 201223249 層間紋理預測"另外,視頻編碼器20可將mb_base_ texture_flag設定為等於0以指示:在「多視圖圖框相容」 MFC結構中,語法元素mb_type及其他有關語法元素存在 於巨集區塊中。 下文所示之表9亦包括區塊位準語法元素: 表 9-mb_pred_in_mfc_extension( mb_type ) mb pred in mfc extension( mb type) { C 描述符 if( MbPartPredMode( mb_type5 0) = = Intra_4x4 11 MbPartPredMode( mb type, 0) = = Intra_8x8 11 MbPartPredMode( mb type, 0) = = Intra 16x16 ) { if( MbPartPredMode( mb type, 0) = = Intra 4x4 ) for( luma4x4BlkIdx = 0; luma4x4BlkIdx < 16; luma4x4BlkIdx-H-) { prev intra4x4 pred mode flag[ luma4x4BlkIdx ] 2 u(l) | ae(v) if( !prev intra4x4 pred mode flagf Iuma4x4BIkIdx ]) rem intra4x4 pred mode[ luma4x4BlkIdx ] 2 u(3) | ae(v) ) if( MbPartPredMode( mb type, 0) = = Intra 8x8 ) for( luma8x8BlkIdx = 0; luma8x8BlkIdx < 4; luma8x8BlkIdx-H-) { prev intra8x8 pred mode flagf luma8x8BlkIdx ] 2 u⑴丨ae(v) if( !prev intra8x8 pred mode flagf luma8x8BlkIdx 1) rem intra8x8 pred mode[ luma8x8BlkIdx Ί 2 u(3) | ae(v) I if( ChromaArrayType != 0) intra chroma pred mode 2 ue(v) | ae(v) } else if( MbPartPredMode( mb type, 0) ! = Direct) { for( mbPartldx = 0; mbPartldx < NumMbPart( mb_type ); mbPartldx-H-) mbPart texture prediction flagf mbPartldx 1 2 u(l) | ae(v) for( mbPartldx = 0; mbPartldx < NumMbPart( mb type); -62- 158748.doc 201223249 mbPartIdx++) if(! mbPart_texture_prediction_flag[ mbPartldx ] &&MbPartPredMode( mb_type, mbPartldx ) != Pred LI) motion prediction flagJ0[ mbPartldx ] 2 u(l) | ae(v) for( mbPartldx = 0; mbPartldx < NximMbPart( mb_type); mbPartIdx++) if(! mbPart_texture_prediction_flag[ mbPartldx ] &&MbPartPredMode( mb_type, mbPartldx ) != Pred L0) motion prediction flag 11『mbPartldx 1 2 u(l) | ae(v) for( mbPartldx = 0; mbPartldx < NumMbPart( mb type); mbPartldx-H-) if( (! mbPart_texture_prediction_flag[ mbPartldx ] && !motion_prediction一flag—10 [ mbPartldx ]&& (num_ref_idx_10_active_minusl > 0 丨丨 mb—field_ decoding_flag) && MbPartPredMode( mb_type, mbPartldx ) != Pred LI) ref idx 10[ mbPartldx ] 2 te(v) | ae(v) for( mbPartldx = 0; mbPartldx < NumMbPart( mb__type); mbPartIdx++) if((! mbPart_texture_prediction_flag[ mbPartldx ] && ! motionjprediction_flag_ll [ mbPartldx ]&& (num_ref_idx_ll_active_minusl > 0 11 mb_field_ decoding一flag )&& MbPartPredMode( mb_type, mbPartldx ) != Pred_L0 && Imotion prediction flag_ll[ mbPartldx ]) ref一idx—11 [ mbPartldx ] 2 te(v) | ae(v) for( mbPartldx = 0; mbPartldx < NumMbPart( mb type); mbPartIdx++) if(mbPart_texture_prediction_flag[ mbPartldx ] && MbPartPredMode ( mb一type,mbPartldx ) != Pred LI) 158748.doc -63- 201223249RefPicListX[ nldx-H- ] = RefPicListX[ cldx ] where viewID () returns to the view_id of each view component. 
When the reference image is an upsampled version of the image from the base layer, viewID() can be returned to the same view_id of the base layer, which is zero. When the reference image does not belong to the base layer (eg, the reference image is the first enhancement layer), the viewID() may be returned to the viewjd of the appropriate view, which may be 1 (first enhancement layer) or 2 (second enhancement layer) ). Video encoder 20 may also use encoded video material (e.g., information used by a decoder (decoder 30, Fig. 1)) to provide a particular syntax to properly decode the encoded video material. In accordance with some aspects of the present invention, to enable inter-layer prediction, video encoder 20 may provide syntax elements in the slice header to indicate: (1) in the segment, no block is predicted by inter-layer texture; (2) In the segment, all blocks are predicted by the inter-layer texture; or (3) in the segment, some blocks may be predicted by the inter-layer texture and some blocks may not be predicted by the inter-layer texture. In addition, video encoder 20 may provide syntax elements in the slice header to indicate: (1) in the segment, no block is predicted by inter-layer motion; (2) in the segment, all blocks are predicted by inter-layer motion; or (3) In the segment, some blocks may be predicted by inter-layer motion and some blocks may not be predicted by inter-layer motion. In addition, to enable inter-layer prediction, video encoder 20 may provide some grammar data at 158748.doc -58 - 201223249 at block level. For example, aspects of the invention include a syntax element named mb_base_texture_flag. This flag is used to indicate whether inter-layer texture prediction is invoked for the entire block (for example, the entire macro block). Video encoder 20 may set 1111)_5&36"6乂111116_£>138 equal to 1 for signal transmission: use the reconstructed pixels in the corresponding base layer as a reference to reconstruct the current texture using inter-layer texture prediction. Block. In addition, the video encoder may set mb_base_texture_flag equal to 1 to signal: skip the encoding of other syntax elements in the current block, but use the syntax elements for residual encoding (ie, CBP, 8x8 transform flags, and Except for the coefficient). Video encoder 20 may set mb_base_texture_flag equal to 0 to signal the application of regular block coding. If the block is an intra-block, the encoding procedure is equivalent to the rule box stated in the H.264/AVC specification. Inner block coding. In order to enable inter-layer prediction, video encoder 20 may provide other syntax material at block level. For example, aspects of the present invention include a syntax element called mbPart_texture_prediction_flag [mbPartldx] that is encoded to indicate whether video encoder 20 uses inter-layer prediction to encode partition mbPartldx. This flag can be applied to blocks with partition types of 16x16, 8x16, 16x8, and 8x8 between frames, but is generally not applied to the following 8x8 partition type blocks. Video encoder 20 may set mbPart_texture_prediction_flag equal to 1 to indicate that the inter-layer texture prediction is applied to the corresponding partition. Video encoder 20 may set mbPart_texture_prediction_flag equal to 0 to indicate that the encoding is referred to as a motion_prediction_flag_10/l [mbPartldx] flag. 
158 748.doc • 59- 201223249 The frequency encoder 20 may set the motion_prediction_flag_10/l equal to 1 to indicate that the motion vector of the partition mbPartldx may be predicted using the motion vector of the corresponding partition in the base layer. Video encoder 20 may set motion_prediction_flag_10/l equal to 0 to indicate that motion vectors may be reconstructed in the same manner as in the H.264/AVC specification. Table 8 shown below includes block level syntax elements: Table 8 - macroblock_layer_in_mfc_extension() macroblock layer in mfc extension() { C descriptor mb base texture flag 2 u(1)丨ae(v) if( ! mb base texture flag) { mb type 2 ue(v) | ae(v) if(mb type = = I PCM) { while( !byte aligned()) pcm alignment zero bit 3 f(l) for( i = 0; i <256; i++) pcm sample luma[ i ] 3 u(v) for( i = 0; i < 2 * MbWidthC * MbHeightC; i++) pcm sample chromaf i 1 3 u(v) } else { noSubMbPartSizeLessThan8x8Flag = 1 if( mb_type != I_NxN && MbPartPredMode( mb type, 0 ) != Intra_16xl6 && NumMbPart( mb type) = = 4) { sub mb pred in mfc extension( mb type) 2 for( mbPartldx = 0; mbPartldx &lt ; 4; mbPartldx-H-) if( sub_mb_type[ mbPartldx ] != B Direct 8x8 ) { if( NumSubMbPart( sub_mb_type [mbPartldx 1) > 1) noSubMbPartSizeLessThan8x8Flag = 0 158748.doc -60- 201223249 } else if(! direct 8x8 inference flag ) noSubMbPartSizeLessThan8x8Flag = 0 } else { i f(transform-8x8_mode_flag && mb_type == I NxN) transform size_8x8 flag 2 u(1) | ae(v) mb jpred in mfc extension( mb type) 2 II ) if( scan idx end >= scan idx start ) { if( base_mode_flag 11 MbPartPredMode( mb—type, 0) != Intra—16x16 ) { coded block pattern 2 me(v) | ae(v) if( CodedBlockPattemLuma > 0 && transform one 8x8 one Mode_flag && (base_mode_flag 11 (mb_type != I_NxN && noSubMbPartSizeLessThan8x8Flag && (mb_type != B_Direct_16x16 丨| direct—8x8 inference flag ))))) 8x8 flag 2 u(l)|ae(v) ) if( CodedBlockPattemLuma>0 | 丨CodedBlockPattemChroma > 0 11 ( MbPartPredMode( mb type,0) = = Intra—16x16 ) ) { mb qp delta 2 se(v) | Ae(v) residual(scan_idx_start, scan idx end) 3|4 } ) } In the example shown in Table 8, video encoder 20 may set mb_base_ texture_flag equal to 1 to indicate: for the entire macro Block Application 158748.doc •61 · 201223249 Interlayer Prediction " In addition, video encoder 20 may mb_base_ texture_flag set equal to zero to indicate that: "multi-view frame compatible" MFC in the structure, and other syntax elements mb_type syntax elements present in the relevant macro block. Table 9 shown below also includes block level syntax elements: Table 9 - mb_pred_in_mfc_extension ( mb_type ) mb pred in mfc extension ( mb type) { C descriptor if( MbPartPredMode( mb_type5 0) = = Intra_4x4 11 MbPartPredMode( mb type , 0) = = Intra_8x8 11 MbPartPredMode( mb type, 0) = = Intra 16x16 ) { if( MbPartPredMode( mb type, 0) = = Intra 4x4 ) for( luma4x4BlkIdx = 0; luma4x4BlkIdx <16; luma4x4BlkIdx-H-) { prev intra4x4 pred mode flag[ luma4x4BlkIdx ] 2 u(l) | ae(v) if( !prev intra4x4 pred mode flagf Iuma4x4BIkIdx ]) rem intra4x4 pred mode[ luma4x4BlkIdx ] 2 u(3) | ae(v) ) if( MbPartPredMode( mb type, 0) = = Intra 8x8 ) for( luma8x8BlkIdx = 0; luma8x8BlkIdx <4; luma8x8BlkIdx-H-) { prev intra8x8 pred mode flagf luma8x8BlkIdx ] 2 u(1)丨ae(v) if( !prev intra8x8 pred mode Flagf luma8x8BlkIdx 1) rem intra8x8 pred mode[ luma8x8BlkIdx Ί 2 u(3) | ae(v) I if( ChromaArrayType != 0) intra chroma pred mode 2 ue(v) | ae(v) } else if( MbPartPredMode( mb Type, 0) ! 
= Direct) { for( mbPartldx = 0; mbPartldx < NumMbPart( mb_type ); mbPartldx-H-) mbPart texture prediction flagf mbPartldx 1 2 u(l) | ae(v) for( mbPartldx = 0; mbPartldx < NumMbPart( Mb type); -62- 158748.doc 201223249 mbPartIdx++) if(! mbPart_texture_prediction_flag[ mbPartldx ] &&MbPartPredMode( mb_type, mbPartldx ) != Pred LI) motion prediction flagJ0[ mbPartldx ] 2 u(l) | ae(v ) for( mbPartldx = 0; mbPartldx < NximMbPart( mb_type); mbPartIdx++) if(! mbPart_texture_prediction_flag[ mbPartldx ] &&MbPartPredMode( mb_type, mbPartldx ) != Pred L0) motion prediction flag 11『mbPartldx 1 2 u(l ) | ae(v) for( mbPartldx = 0; mbPartldx < NumMbPart( mb type); mbPartldx-H-) if( (! mbPart_texture_prediction_flag[ mbPartldx ] && !motion_prediction_flag_10 [ mbPartldx ]&&; (num_ref_idx_10_active_minusl > 0 丨丨mb_field_decoding_flag) && MbPartPredMode( mb_type, mbPartldx ) != Pred LI) ref idx 10[ mbPartldx ] 2 te(v) | ae(v) for( mbPartldx = 0; mbPartldx < NumMbPart( mb__type); mbPartIdx++) if((! mbPart_texture_prediction_flag[ mbPartldx ] && ! motionjprediction_flag_ll [ mbPartldx ]&& (num_ref_idx_ll_active_minusl &gt ; 0 11 mb_field_ decoding_flag )&& MbPartPredMode( mb_type, mbPartldx ) != Pred_L0 && Imotion prediction flag_ll[ mbPartldx ]) ref idx—11 [ mbPartldx ] 2 te(v) | ae(v) For( mbPartldx = 0; mbPartldx < NumMbPart( mb type); mbPartIdx++) if(mbPart_texture_prediction_flag[ mbPartldx ] && MbPartPredMode ( mb_type,mbPartldx ) != Pred LI) 158748.doc -63- 201223249

for( compldx = 0; compldx < 2; compIdx++) mvd JOf mbPartldx ][ 0 ][ compldx ] 2 se(v) | ae(v) for( mbPartldx = 0; mbPartldx < NumMbPart( mb_type); mbPartldx-H-) if(mbPart_texture_prediction_£lag[ mbPartldx ]&& MbPartPredMode( mb—type,mbPartldx ) != Pred L0 ) for( compldx = 0; compldx < 2; compIdx++) mvd Ilf mbPartldx ]f 0 ]Γ compldx 1 2 se(v) | ae(v) ) I 在表8所示之實例中,視頻編碼器20可將mbPart_ texture_prediction_flag[ mbPartldx ]設定為等於1以指示: 針對對應分割區mbPartldx調用層間紋理預測。視頻編碼 器 20 可將 mbPart_texture_prediction_flag設定為等於 〇 以指 示:針對分割區mbPartldx未調用層間紋理預測。另外’ 視頻編碼器 20可將 motion—prediction_flag_ll/0[mbPartIdx] 設定為等於1以指示:將使用基礎層之運動向量作為參考 的替代運動向量預測程序用於導出巨集區塊分割區 mbPartldx之清單1/0運動向量,及自基礎層推斷巨集區塊 分割區mbPartldx之清單1/0參考索引。 下文所示之表10亦包括子區塊位準語法元素: 表 10-sub_mb_pred_in_mfc_extensi〇n(mb—type ) sub mb pred in mfc extension( mb type) { c 描述符 for( mbPartldx = 0; mbPartldx < 4; mbPartIdx++) { mbPart texture prediction flag f mbPartldx 1 2 u(l) | ae(v) if(!texture prediction flag[ mbPartldx 1) sub mb type[mbPartldx] 2 ue(v) | ae(v) } 158748.doc -64- 201223249 for( mbPartldx = 0; mbPartldx < 4; mbPartldx-H-) if(!mbPart_texture_prediction_flag [ mbPartldx ] && SubMbPredMode( sub_mb_type[ mbPartldx ] ) != Direct && SubMbPredMode( sub_mb_type[ mbPartldx ]) != Pred LI) motion prediction flag 10f mbPartldx ] 2 u(l)|ae(v) for( mbPartldx = 0; mbPartldx < 4; mbPartIdx++) if(!mbPart_texturej3rediction_flag [ mbPartldx ] && SubMbPredMode( sub_mb_type[ mbPartldx ] ) != Direct && SubMbPredMode( sub_ mb type[ mbPartldx ]) != Pred L0) motion prediction flag Ilf mbPartldx 1 2 u(l) | ae(v) for( mbPartldx = 0; mbPartldx < 4; mbPartIdx++) if(! mbPart_texture_prediction_flag [ mbPartldx ] !motion_prediction_flag_10[ mbPartldx ] && && (num_ref_idx_10_active_minusl > 0 | | mb_field_ decoding_flag )&& mb_type != P_8x8refD && sub_mb_type [mbPartldx ] != B_Direct_8x8 &&SubMbPredMode( sub_mb_type[ mbPartldx ] ) != Pred LI))) ref—idx—10 [ mbPartldx 1 2 te(v) | ae(v) for( mbPartldx = 0; mbPartldx < 4; mbPartldx-H-) if(! mbPart_texture_prediction_flag [ mbPartldx ] !motion_prediction_flag_ll [ mbPartldx ] && && (num_ref_idx_ll_active_minusl > 0 | | mb_field_decoding_ flag) && sub_mb_type[ mbPartldx ] != B_Direct_8x8 && SubMbPredMode( sub_mb_type[ mbPartldx ] ) != Pred L0 ))) ref一idx Ilf mbPartldx 1 2 te(v) | ae(v) for( mbPartldx = 0; mbPartldx < 4; mbPartldx-H-) if(! rabPart_texture_prediction_flag [ mbPartldx ] && sub一mb一type[ mbPartldx ] != B_Direct一8x8 && SubMbPredMode( sub_mb_type[ mbPartldx ] ) != Pred LI) •65- 158748.doc 201223249 for( subMbPartldx = 0; subMbPartldx < NumSubMbPart( sub_mb_type[ mbPartldx ]); subMbPartldx-H-) for( compldx = 0; compldx < 2; compIdx-H-) mvd_10[ mbPartldx ][ subMbPartldx ][ compldx ] 2 se(v) | ae(v) for( mbPartldx = 0; mbPartldx < 4; mbPartIdx++) if( ! 
mbPart_texture_prediction_flag [ mbPartldx ] && sub_mb_type[ mbPartldx ] != B_Direct_8x8 && SubMbPredMode( sub_mb_type[ mbPartldx ] ) != Pred L0 ) for( subMbPartldx = 0; subMbPartldx < NumSubMbPart( sub_mb_type[ mbPartldx ]); subMbPartIdx++) for( compldx = 0; compldx < 2; compIdx-H-) mvd 11[ mbPartldx ][ subMbPartldx If compldx 1 2 se(v) | ae(v) } 在表10所示之實例中,視頻編碼器20可將mbPart_ texture_prediction_flag[ mbPartldx ]設定為等於1以指示: 針對對應分割區mbPartldx調用層間紋理預測《視頻編碼 器 20 可將 mbPart_texture_prediction_flag設定為等於 0 以指 示:針對分割區mbPartldx未調用層間紋理預測。 視頻編碼器 20 可將 motion_prediction_flag_ll/0 [mbPartldx]設定為等於1以指示:將使用基礎層之運動向 量作為參考的替代運動向量預測程序用於導出巨集區塊分 割區mbPartldx之清單1/0運動向量,及自基礎層推斷巨集 區塊分割區mbPartldx之清單1/0參考索引。 視頻編碼器20可能不設定motion_prediction_flag_ll/0 -66- 158748.doc 201223249 [mbPartldx]旗標(例如’不存在旗標)以指示:未將層間運 動預測用於巨集區塊分割區mbPartldx。 根據本發明之一些態樣’視頻編碼器20可啟用或停用在 片段標頭位準下之 mb_base_texture_flag、mbPart_texture prediction—flag 及 motion一prediction_flag_ll/0。舉例而 言,當片段中之全部區塊皆具有相同特性時,在片段位準 下而非在區塊位準下用信號傳輸此等特性可能會提供相對 位元節約。 以此方式’圖2A為說明可實施用於產生可縮放多視圖位 元串流之技術之視頻編碼器2 〇之實例的方塊圖,該可縮放 多視圖位元串流具有包括對應於一場景之兩個視圖(例 如,左眼視圖及右眼視圖)之兩個縮減解析度圖像之一個 基礎層,以及兩個額外增強層。第一增強層可包括基礎層 之視圖中之一者之全解析度圖像,而第二增強層可包括基 礎層之另一各別視圖之全解析度圖像。 再次,應理解’圖2A之特定組件可出於概念目的而關於 單一組件予以展示及描述’但可包括一或多個功能單元。 舉例而言,如關於圖2B更詳細地所描述,運動估計/差別 單元42可包含用於執行運動估計及運動差別演算之分離單 元。 圖2B為說明可實施用於產生可縮放多視圖位元串流之技 術之視頻編碼器之另一實例的方塊圖,該可縮放多視圖位 元_ 具有一個基礎層及兩個增強層。如上文所提到,視 頻編碼器20之特定組件可關於單一组件予以展示及描述, 158748.doc -67· 201223249 但可包括一個以上離散及/或整合式單元。此外,視頻編 碼器20之特定組件可高度地整合或併入至同一實體組件 令’但出於概念目的而予以分離地說明。因此,圖2B所示 之實例可包括與圖2A所示之視頻編碼器20之組件相同的許 多組件’但以替代配置予以展示以在概念上說明三個層 (例如’基礎層142、第一增強層84及第二增強層86)之編 碼。 圖2B所示之實例說明視頻編碼器2〇產生包括三個層之可 縮放多視圖位元申流。如上文所描述,該等層中每一者可 包括構成多媒體内容之一系列圖框。根據本發明之態樣, 該二個層包括基礎層82、第一增強層84及第二增強層86。 在一些實例中’基礎層142之圖框可包括兩個並列式經封 裝縮減解析度圖像(例如,左眼視圖(「B1」)及右眼視圖 (「B2」))。第一增強層可包括基礎層之左眼視圖之全解析 度圖像(「E1」),且第二增強層可包括基礎層之右眼視圖 之全解析度圖像(「E2」)。然而,圖2B所示之基礎層配置 及增強層序列僅僅係作為一個實例而提供。在另一實例 中’基礎層82可包括呈替代封裝配置(例如,上下式、列 交錯式、行交錯式、棋盤式及其類似者)之縮減解析度圖 像。此外’第一増強層可包括右眼視圖之全解析度圖像, 而第二增強層可包括左眼視圖之全解析度圖像。 在圖2B所不之實例中,視頻編碼器2〇包括三個框内預測 單元46及二個運動估計/運動補償單元9〇(例如,其可與圖 2A所示之組合式運動估計/差別單元42及運動補償單元44 158748.doc •68· 201223249 相似地或相同地被組態),其中每一層82至86具有一關聯 框内預測單元46及運動估計/補償單元90。另外,第一增 強層84及第二增強層86各自與包括層間紋理預測單元 及層間運動預測單元1 〇2之層間預測單元(藉由虛線%分組) 以及視圖間預測單元1〇〇相關聯。圖2B之剩餘組件可與圖 2A所不之組件相似地被組態。亦即,求和器5〇及參考圖框 儲存器64可在兩個表示中相似地被組態,而圖⑶之變換及 置化單元114可與圖2 A所示之組合式變換單元52及量化單 兀*54相似地被組態。另外,圖⑸之逆量化/逆變換單元/重 新建構/解區塊單元122可與圖2A所示之組合式逆量化單元 58及逆變換單元6〇相似地被組態。模式選擇單元在圖a 中表示為開關,其在該等預測單元中每一者之間進行雙態 觸發,其可(例如)基於誤差結果而選擇編碼方式(框内、框 間、層間運動、層間紋理,或視圖間)中之一者。 一般而言,視頻編碼器20可使用上文關於圖2八所描述之 框内或框間編碼方法來編碼基礎層82(>舉例而言,視頻編 碼器20可❹框内預測單元46來框⑽碼包括於基礎⑽ 縮減解析度圖像。視頻編碼器2〇可使用運動估計/補 f單元90(例如,其可與圖2績示之組合式運動估計/差別 單元42及㈣補償單元44彳目㈣或相同地被組態)來框間 編碼包括於基礎層82中之縮減解析度圖像。另夕卜視頻編 碼器20可使用框内預測單元46來框内編碼第—增強層㈣ 第二增強層,或使用運動補償估計/補償單元9〇來框間編 蜗第一增強層84或第二增強層86。 158748.doc •69· 201223249 根據本發明之態樣,視頻編碼器20亦可實施特定其他視 圖間或層間編碼方法以編碼第一增強層84及第二增強層 86。舉例而言,視頻編碼器2〇可使用層間預測單元(藉由 虛線98分組)以編碼第一增強層84及第二增強層%。舉例 而言,根據第一增強層84包括左眼視圖之全解析度圖像的 實例,視頻編碼器20可使用層間預測單元98以自基礎層之 左眼視圖(例如’ B1)之縮減解析度圖像來層間預測第一增 強層84。此外,視頻編碼器2〇可使用層間預測單元98以自 基礎層之右眼視圖(例如,B2)之縮減解析度圖像來層間預 測第二增強層86。在圖2B所示之實例中,層間預測單元98 可自與基礎層82相關聯之運動估計/補償單元9〇接收資料 (例如’運動向量資料、紋理資料及其類似者)。 在圖2B所示之實例中,層間預測單元98包括用於層間紋 理預測第一增強圖框84及第二增強圖框86之層間紋理預測 單元100 ’以及用於層間運動預測第一增強圖框84及第二 增強圖框86之層間運動預測單元1 〇2。 視頻編碼器20亦可包括視圖間預測單元1 〇6以視圖間預 測第一增強層84及第二增強層86。根據一些實例,視頻編 碼器20可自基礎層之右眼視圖(B2)之縮減解析度圖像來視 圖間預測第一增強層84(例如,左眼視圖之全解析度圖 像)。相似地,視頻編碼器20可自基礎層之左眼視圖(b 1)之 縮減解析度圖像來視圖間預測第二增強層86(例如,右眼 視圖之全解析度圖像)。此外,根據一些實例,視頻編碼 器20亦可基於第一增強層84來視圖間預測第二增強層86。 158748.doc -70- 201223249 在藉由變換及量化單元114執行的殘餘變換係數之變換 及量化之後,視頻編碼器20可用熵編碼及多工單元118來 執行經量化殘餘變換係數之熵編碼及多工。亦即,熵編碼 及多工單元118可編碼經量化變換係數,例如,執行内容 自調適性可變長度編碼(CAVLC)、上下文自調適性二進位 
算術編碼(CABAC)或另一熵編碼技術(如關於圖2A所描 述)。另外,熵編碼及多工單元118可產生語法資訊,諸 如,經編碼區塊型樣(CBP)值、巨集區塊類型、編碼模 式、經編碼單元(諸如,圖框、片段、巨集區塊或序列)之 最大巨集區塊大小,或其類似者。熵編碼及多工單元i i 8 可將此經壓縮視頻資料格式化成所謂「網路抽象層單元」 或NAL單元《每一NAL單元包括識別儲存至nal單元之資 料之類型的標頭。根據本發明之—些態樣,如上文關於圖 2A所描述,視頻編碼器2〇可將不同於用於第一增強層料及 第一增強層86之NAL格式的NAL格式用於基礎層82。 再次’雖,然圖2B所示之特定組件可表示為相異單元,但 應理解,視頻編碼器2G之特定組件可高度地整合或併入至 同一實體組件令。因此,作為-實例,雖然圖2B包括三個 離散柜内關單元46,但視頻編碼n2()可使㈣—實體組 件以執行框内預測。 圖3為說明視頻解碼器3〇之實例的方塊圖,視頻解碼器 3〇解碼經編碼視頻序列°在圖3之實财,視頻解碼器3〇 包括熵解踢單元13()、運動補償單元m、框内預測單元 ⑴、逆量化單元136、逆變換單元138、參考圖框錯存器 158748.doc -71. 201223249 142及求和器140。在一些實例中,視頻解碼器3〇可執行與 關於視頻編碼器20(圖2A及圖2B)所描述之編碼遍次 (encoding pass)大體上互逆的解碼遍次(心⑶⑴叩pass)。 詳S之,視頻解碼器30可經組態以接收包括基礎層、第 一增強層及第二增強層之可縮放多視圖位元串流。視頻解 碼器30可接收指示用於基礎層之圖框封裝配置、增強層之 次序的資訊,以及用於適當地解碼可縮放多視圖位元串流 之其他資訊。舉例而言,視頻解碼器3〇可經組態以解譯 「多視圖圖框相容」(MFC)SPS及SEI訊息。視頻解碼器3〇 亦可經組態以判定是解碼多視圖位元串流之全部三個層, 或是僅解碼該等層之子集(例如,基礎層及第一增強層\。 此判定可基於視頻顯示器32(1)是否能夠顯示三維視頻資 料、視頻解碼器30是否具有解碼特定位元率及/或圖框率 之多個視ffil (及升取樣特定位元率及/或圖框率之縮減解析 度視圖)之能力’或關於視頻解碼器3〇及/或視頻顯示器32 之其他因素。 當目的地器件14不能夠解螞及/或顯示三維視頻資料 時,視頻解碼器30可將經接收基礎層解封裝成構成縮減解 析度經編碼圖像’接著捨棄該等縮減解析度經編碼圖像中 之一者。因此,視頻解碼器3〇可推選僅解碼基礎層之一半 (例如,左眼視圖之圖像)。另外,視頻解碼器30可推選僅 解碼该等增強層中之-者。亦即,視頻解碼器⑽可推選解 碼對應於基礎圖框之經保留縮減解析度圖像的增強層,同 時捨棄對應於基礎層之經捨棄圖像的增強層。藉由保留該 158748.doc -72· 201223249 等增強層中之一者’視頻解碼器30可能能夠縮減與升取樣 或内插基礎層之經保留圖像相關聯的錯誤。 當目的地器件14能夠解碼及顯示三維視頻資料時,視頻 解碼器30可將經接收基礎層解封裝成構成縮減解析度經編 碼圖像’且解碼該等縮減解析度圖像中每一者。根據一些 實例,視頻解碼器3 0亦可相依於視頻解碼器3 〇及/或視頻 顯示器32之能力而解瑪該等增強層中之一者或其兩者。藉 由保留該等增強層中之一者或其兩者’視頻解碼器3 〇可縮 減與升取樣或内插基礎層之圖像相關聯的錯誤β再次,藉 由解碼器30解碼之層可相依於視頻解碼器3〇及/或目的地 器件14及/或通信頻道16(圖1)之能力。 視頻解碼器30可擷取經視圖間編碼圖像之位移向量,或 經框間或層間編碼圖像(例如’基礎層之兩個縮減解析度 圖像及增強層之兩個全解析度圖像)之運動向量。視頻解 碼器30可使用位移向量或運動向量以擷取預測區塊以解碼 圖像之區塊。在一些實例中,在解碼基礎層之縮減解析度 圖像之後’視頻解碼器30可將經解碼圖像升取樣至與增強 層圖像之解析度相同的解析度。 運動補償單元132可基於自熵解碼單元130所接收之運動 向量來產生預測資料。運動補償單元132可使用在位元串 流中所接收之運動向量以識別參考圖框儲存器丨42中之參 考圖框中之預測區塊.。框内預測單元134可使用在位元串 流中所接收之框内預測模式以由空間鄰近區塊形成預測區 塊。逆量化單元136逆量化(亦即,解量化)提供於位元串流 158748.doc •73· 201223249 中且藉由熵解碼單元〗3 〇解碼之經量化區塊係數。逆量化 程序可包括(例如)如由Η.264解碼標準所定義之習知程序。 逆量化程序亦可包括針對每一巨集區塊使用藉由編碼器2〇 演算之量化參數QPY,以判定量化程度且同樣地判定應被 應用之逆量化程度。 逆變換單元58將逆變換(例如,逆DCT、逆整數變換或 概念上相似逆變換程序)應用於變換係數,以便在像素域 中產生殘餘區塊。運動補償單元132產生經運動補償區 鬼從而了犯地基於内插遽波器來執行内插。待用於具有 子像素精確度之運動估計的用於内插濾波器之識別符可包 括於語法元素中。運動補償單元132可在視頻區塊之編碼 期間使用如藉由視镅媼石基哭> μ in. „For( compldx = 0; compldx <2; compIdx++) mvd JOf mbPartldx ][ 0 ][ compldx ] 2 se(v) | ae(v) for( mbPartldx = 0; mbPartldx < NumMbPart( mb_type); mbPartldx-H -) if(mbPart_texture_prediction_£lag[ mbPartldx ]&& MbPartPredMode( mb-type,mbPartldx ) != Pred L0 ) for( compldx = 0; compldx <2; compIdx++) mvd Ilf mbPartldx ]f 0 ]Γ compldx 1 2 se(v) | ae(v) ) I In the example shown in Table 8, video encoder 20 may set mbPart_ texture_prediction_flag[ mbPartldx ] equal to 1 to indicate: Inter-layer texture prediction is invoked for the corresponding partition mbPartldx. Video encoder 20 may set mbPart_texture_prediction_flag equal to 〇 to indicate that inter-layer texture prediction is not invoked for partition mbPartldx. In addition, video encoder 20 may set motion_prediction_flag_ll/0[mbPartIdx] equal to 1 to indicate that an alternative motion vector predictor that uses the motion vector of the base layer as a reference is used to derive a list of macroblock partitions mbPartldx. The 1/0 motion vector, and the list 1/0 reference index of the macroblock partition mbPartldx is inferred from the base layer. 
Table 10 shown below also includes sub-block level syntax elements: Table 10-sub_mb_pred_in_mfc_extensi〇n(mb-type) sub mb pred in mfc extension( mb type) { c descriptor for( mbPartldx = 0; mbPartldx < mbPart 。 。 。 。 。 。 .doc -64- 201223249 for( mbPartldx = 0; mbPartldx <4; mbPartldx-H-) if(!mbPart_texture_prediction_flag [ mbPartldx ] && SubMbPredMode( sub_mb_type[ mbPartldx ] ) != Direct && SubMbPredMode( sub_mb_type [ mbPartldx ]) != Pred LI) motion prediction flag 10f mbPartldx ] 2 u(l)|ae(v) for( mbPartldx = 0; mbPartldx <4; mbPartIdx++) if(!mbPart_texturej3rediction_flag [ mbPartldx ] && SubMbPredMode ( sub_mb_type[ mbPartldx ] ) != Direct && SubMbPredMode( sub_ mb type[ mbPartldx ]) != Pred L0) motion prediction flag Ilf mbPartldx 1 2 u(l) | ae(v) for( mbPartldx = 0; mbPartldx<4; mbPartIdx++) if(! mbPart_texture_prediction_flag [ mbPartldx ] !motion_prediction_flag_10[ mbPartldx ] &&&& (num_ref_idx_10_active_minusl > 0 | | mb_field_ decoding_flag )&& mb_type != P_8x8refD && sub_mb_type [ mbPartldx ] != B_Direct_8x8 &&SubMbPredMode( sub_mb_type[ mbPartldx ] ) != Pred LI))) ref—idx—10 [ mbPartldx 1 2 te(v) | ae(v) for( mbPartldx = 0; mbPartldx <4; mbPartldx-H-) if(! mbPart_texture_prediction_flag [ mbPartldx ] !motion_prediction_flag_ll [ mbPartldx ] &&&& (num_ref_idx_ll_active_minusl > 0 | | mb_field_decoding_flag) && sub_mb_type[ mbPartldx ] != B_Direct_8x8 && SubMbPredMode( sub_mb_type[ mbPartldx ] ) != Pred L0 ))) ref - idx Ilf mbPartldx 1 2 te(v) | ae(v) for( mbPartldx = 0; mbPartldx <4; mbPartldx-H-) if( ! rabPart_texture_prediction_flag [ mbPartldx ] && sub_mb_type[ mbPartldx ] != B_Direct-8x8 && SubMbPredMode( sub_mb_ty Pe[ mbPartldx ] ) != Pred LI) •65- 158748.doc 201223249 for( subMbPartldx = 0; subMbPartldx < NumSubMbPart( sub_mb_type[ mbPartldx ]); subMbPartldx-H-) for( compldx = 0; compldx <2; compIdx-H-) mvd_10[ mbPartldx ][ subMbPartldx ][ compldx ] 2 se(v) | ae(v) for( mbPartldx = 0; mbPartldx <4; mbPartIdx++) if( ! mbPart_texture_prediction_flag [ mbPartldx ] && sub_mb_type [ mbPartldx ] != B_Direct_8x8 && SubMbPredMode( sub_mb_type[ mbPartldx ] ) != Pred L0 ) for( subMbPartldx = 0; subMbPartldx < NumSubMbPart( sub_mb_type[ mbPartldx ]); subMbPartIdx++) for( compldx = 0; compldx <2; compIdx-H-) mvd 11[ mbPartldx ][ subMbPartldx If compldx 1 2 se(v) | ae(v) } In the example shown in Table 10, video encoder 20 may set mbPart_ texture_prediction_flag[ mbPartldx ] to Equal to 1 to indicate: Inter-layer texture prediction is invoked for the corresponding partition mbPartldx. Video Encoder 20 may set mbPart_texture_prediction_flag equal to 0 to indicate: for partition mbPartldx does not call inter-layer texture prediction. Video encoder 20 may set motion_prediction_flag_ll_0 [mbPartldx] equal to 1 to indicate that an alternative motion vector predictor that uses the motion vector of the base layer as a reference is used to derive list 1/0 motion of the macroblock partition mbPartldx The vector, and the list 1/0 reference index of the mbPartldx of the macroblock partition from the base layer. Video encoder 20 may not set motion_prediction_flag_ll/0 -66- 158748.doc 201223249 [mbPartldx] flag (e.g., 'no flag') to indicate that inter-layer motion prediction is not used for macroblock partition mbPartldx. In accordance with some aspects of the present invention, video encoder 20 may enable or disable mb_base_texture_flag, mbPart_texture prediction_flag, and motion-prediction_flag_ll/0 under the slice header level. 
For example, when all of the blocks in the segment have the same characteristics, signaling these characteristics at the segment level rather than at the block level may provide relative bit savings. In this manner, FIG. 2A is a block diagram illustrating an example of a video encoder 2 that may implement techniques for generating a scalable multi-view bitstream having a stream corresponding to a scene. Two base views (for example, the left eye view and the right eye view) are two base layers of the reduced resolution image, and two additional enhancement layers. The first enhancement layer may comprise a full resolution image of one of the views of the base layer, and the second enhancement layer may comprise a full resolution image of another respective view of the base layer. Again, it should be understood that the particular components of FIG. 2A may be shown and described with respect to a single component for conceptual purposes, but may include one or more functional units. For example, as described in more detail with respect to Figure 2B, motion estimation/discrimination unit 42 may include separate units for performing motion estimation and motion difference calculations. 2B is a block diagram illustrating another example of a video encoder that may implement techniques for generating a scalable multiview bitstream, the scalable multiview bit_ having one base layer and two enhancement layers. As mentioned above, certain components of video encoder 20 may be shown and described with respect to a single component, 158748.doc - 67 201223249 but may include more than one discrete and/or integrated unit. In addition, certain components of video encoder 20 may be highly integrated or incorporated into the same physical component, but are described separately for conceptual purposes. Thus, the example shown in FIG. 2B can include many of the same components as the components of video encoder 20 shown in FIG. 2A' but is shown in an alternate configuration to conceptually illustrate three layers (eg, 'base layer 142, first Encoding of enhancement layer 84 and second enhancement layer 86). The example shown in Figure 2B illustrates a video encoder 2 that produces a scalable multiview bitstream that includes three layers. As described above, each of the layers can include a series of frames that make up the multimedia content. According to aspects of the invention, the two layers include a base layer 82, a first reinforcement layer 84, and a second reinforcement layer 86. In some examples, the frame of the base layer 142 may include two side-by-side encapsulated reduced resolution images (e.g., left eye view ("B1") and right eye view ("B2"). The first enhancement layer may include a full-resolution image ("E1") of the left-eye view of the base layer, and the second enhancement layer may include a full-resolution image ("E2") of the right-eye view of the base layer. However, the base layer configuration and enhancement layer sequence shown in Fig. 2B are provided merely as an example. In another example, the base layer 82 can include reduced resolution images in alternative package configurations (e.g., top and bottom, column interlaced, line interlaced, checkerboard, and the like). Further, the 'first prime layer can include a full resolution image of the right eye view, and the second enhancement layer can include a full resolution image of the left eye view. In the example of FIG. 
In the example of FIG. 2B, video encoder 20 includes three intra-prediction units 46 and three motion estimation/motion compensation units 90 (e.g., which may be configured similarly or identically to the combined motion/disparity estimation unit 42 and motion compensation unit 44 shown in FIG. 2A), with each of layers 82 to 86 having an associated intra-prediction unit 46 and motion estimation/compensation unit 90. In addition, first enhancement layer 84 and second enhancement layer 86 are each associated with an inter-layer prediction unit (grouped by dashed line 98) that includes an inter-layer texture prediction unit 100 and an inter-layer motion prediction unit 102, as well as with an inter-view prediction unit 106. The remaining components of FIG. 2B may be configured similarly to the components of FIG. 2A. That is, the summers 50, 62 and reference frame store 64 may be configured similarly in both representations, and transform and quantization unit 114 of FIG. 2B may be configured similarly to the combined transform unit 52 and quantization unit 54 of FIG. 2A. Likewise, the inverse quantization/inverse transform/reconstruction/deblocking unit 122 of FIG. 2B may be configured similarly to the combined inverse quantization unit 58 and inverse transform unit 60 shown in FIG. 2A. The mode select unit is represented in FIG. 2B as a switch that toggles between each of the prediction units, and may, for example, select a coding mode (one of intra, inter, inter-layer motion, inter-layer texture, or inter-view) based on error results.

In general, video encoder 20 may encode base layer 82 using the intra- or inter-coding methods described above with respect to FIG. 2A. For example, video encoder 20 may use intra-prediction units 46 to intra-code blocks of the reduced resolution images included in base layer 82. Video encoder 20 may use motion estimation/compensation units 90 (e.g., which may be configured similarly or identically to the combined motion/disparity estimation unit 42 and motion compensation unit 44 of FIG. 2A) to inter-code the reduced resolution images included in base layer 82. Similarly, video encoder 20 may use intra-prediction units 46 to intra-code first enhancement layer 84 or second enhancement layer 86, or use motion estimation/compensation units 90 to inter-code first enhancement layer 84 or second enhancement layer 86.

According to aspects of this disclosure, video encoder 20 may also implement certain other inter-view or inter-layer coding methods to encode first enhancement layer 84 and second enhancement layer 86. For example, video encoder 20 may use the inter-layer prediction units (grouped by dashed line 98) to encode first enhancement layer 84 and second enhancement layer 86. In the example in which first enhancement layer 84 includes full resolution images of the left eye view, video encoder 20 may use inter-layer prediction unit 98 to inter-layer predict first enhancement layer 84 from the reduced resolution images of the left eye view of the base layer (e.g., B1). Likewise, video encoder 20 may use inter-layer prediction unit 98 to inter-layer predict second enhancement layer 86 from the reduced resolution images of the right eye view of the base layer (e.g., B2).
In the example shown in FIG. 2B, inter-layer prediction unit 98 may receive data (e.g., motion vector data, texture data, and the like) from the motion estimation/compensation unit 90 associated with base layer 82. In the example shown in FIG. 2B, inter-layer prediction unit 98 includes an inter-layer texture prediction unit 100 for inter-layer texture predicting first enhancement layer 84 and second enhancement layer 86, and an inter-layer motion prediction unit 102 for inter-layer motion predicting first enhancement layer 84 and second enhancement layer 86.

Video encoder 20 may also include inter-view prediction units 106 to inter-view predict first enhancement layer 84 and second enhancement layer 86. For example, video encoder 20 may inter-view predict first enhancement layer 84 (e.g., the full resolution images of the left eye view) from the reduced resolution images of the right eye view (B2) of the base layer. Likewise, video encoder 20 may inter-view predict second enhancement layer 86 (e.g., the full resolution images of the right eye view) from the reduced resolution images of the left eye view (B1) of the base layer. Moreover, according to some examples, video encoder 20 may also inter-view predict second enhancement layer 86 based on first enhancement layer 84.

Following the transformation and quantization of residual transform coefficients performed by transform and quantization unit 114, entropy coding and multiplexing of the quantized residual transform coefficients may be performed by entropy coding and multiplexing unit 118. That is, entropy coding and multiplexing unit 118 may encode the quantized transform coefficients, e.g., by performing context-adaptive variable length coding (CAVLC), context-adaptive binary arithmetic coding (CABAC), or another entropy coding technique (as described with respect to FIG. 2A). In addition, entropy coding and multiplexing unit 118 may generate syntax information such as coded block pattern (CBP) values, macroblock types, coding modes, maximum macroblock sizes for coded units (such as a frame, slice, macroblock, or sequence), or the like. Entropy coding and multiplexing unit 118 may format this compressed video data into so-called "network abstraction layer units," or NAL units. Each NAL unit includes a header that identifies the type of data stored in the NAL unit. In accordance with aspects of this disclosure, as described above with respect to FIG. 2A, video encoder 20 may use a NAL format for base layer 82 that differs from the NAL format used for first enhancement layer 84 and second enhancement layer 86.

Again, although certain components shown in FIG. 2B may be represented as distinct units, it should be understood that certain components of video encoder 20 may be highly integrated or incorporated into the same physical component. Thus, as an example, although FIG. 2B includes three discrete intra-prediction units 46, video encoder 20 may use a single physical component to perform intra prediction.
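The layer-specific NAL encapsulation just described might be sketched as follows. The numeric type values, including the NEWTYPE1 and NEWTYPE2 names that appear later in the sub-bitstream extraction discussion, are placeholders assumed for illustration rather than values fixed by this disclosure.

    /* Sketch of layer-aware NAL unit formation; type codes are illustrative. */
    #include <stdint.h>
    #include <stddef.h>

    enum {
        NAL_BASE_LAYER_SLICE = 1,   /* frame-packed base layer, AVC-compatible  */
        NAL_NEWTYPE1         = 20,  /* hypothetical: first enhancement layer    */
        NAL_NEWTYPE2         = 21   /* hypothetical: second enhancement layer   */
    };

    typedef struct {
        uint8_t        nal_ref_idc;
        uint8_t        nal_unit_type;
        const uint8_t *payload;
        size_t         payload_len;
    } NalUnit;

    static NalUnit make_nal(int layer, const uint8_t *rbsp, size_t len, int is_ref)
    {
        NalUnit n;
        n.nal_ref_idc   = is_ref ? 3 : 0;
        n.nal_unit_type = (layer == 0) ? NAL_BASE_LAYER_SLICE
                        : (layer == 1) ? NAL_NEWTYPE1 : NAL_NEWTYPE2;
        n.payload       = rbsp;
        n.payload_len   = len;
        return n;
    }

Keeping the base layer in an AVC-compatible NAL type while giving the enhancement layers distinct types means a legacy decoder can ignore the unfamiliar units and still decode the frame-packed base layer.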
FIG. 3 is a block diagram illustrating an example of a video decoder 30 that decodes an encoded video sequence. In the example of FIG. 3, video decoder 30 includes an entropy decoding unit 130, a motion compensation unit 132, an intra-prediction unit 134, an inverse quantization unit 136, an inverse transform unit 138, a reference frame store 142, and a summer 140. In some examples, video decoder 30 may perform a decoding pass that is generally reciprocal to the encoding pass described with respect to video encoder 20 (FIGS. 2A and 2B).

In particular, video decoder 30 may be configured to receive a scalable multiview bitstream comprising a base layer, a first enhancement layer, and a second enhancement layer. Video decoder 30 may receive information indicating the frame packing arrangement of the base layer, the ordering of the enhancement layers, and other information for properly decoding the scalable multiview bitstream. For example, video decoder 30 may be configured to interpret "multi-view frame compatible" (MFC) SPS and SEI messages. Video decoder 30 may also be configured to determine whether to decode all three layers of the multiview bitstream, or to decode only a subset of the layers (e.g., the base layer and the first enhancement layer). This determination may be based on whether video display 32 is capable of displaying three-dimensional video data, whether video decoder 30 is capable of decoding video data having a particular bit rate and/or frame rate (and of upsampling the reduced resolution views), or other factors relating to video decoder 30 and/or video display 32.

When destination device 14 is not capable of decoding and/or displaying three-dimensional video data, video decoder 30 may decapsulate the received base layer into its reduced resolution encoded images and retain only one of the reduced resolution encoded images. Accordingly, video decoder 30 may elect to decode only one half of the base layer. In addition, video decoder 30 may elect to decode only one of the enhancement layers. That is, video decoder 30 may elect to decode the enhancement layer corresponding to the retained reduced resolution image of the base layer, and to discard the enhancement layer corresponding to the discarded image of the base layer. By retaining one of the enhancement layers, video decoder 30 may be able to reduce errors associated with upsampling or interpolating the base layer image.

When destination device 14 is capable of decoding and displaying three-dimensional video data, video decoder 30 may decapsulate the received base layer into its reduced resolution encoded images, and decode each of the reduced resolution images. According to some examples, video decoder 30 may also decode one or both of the enhancement layers, depending on the capabilities of video decoder 30 and/or video display 32. By decoding one or both of the enhancement layers, video decoder 30 may again reduce errors associated with upsampled or interpolated base layer images. The layers decoded by video decoder 30 may depend on the capabilities of video decoder 30 and/or destination device 14, and/or on communication channel 16 (FIG. 1).

Video decoder 30 may retrieve displacement vectors for inter-view encoded images, or motion vectors for inter- or inter-layer coded images (e.g., the two reduced resolution images of the base layer and the two full resolution images of the enhancement layers). Video decoder 30 may use the displacement vectors or motion vectors to retrieve prediction blocks in order to decode blocks of the images.
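The capability-driven layer selection described above might be sketched as follows; the capability fields, the bandwidth estimates, and the helper names are assumptions for illustration rather than part of this disclosure.

    /* Sketch: choosing which layers of the scalable multiview bitstream
     * to decode, based on device capability and channel bandwidth. */
    typedef struct {
        int  can_display_3d;    /* display capable of stereo output?          */
        int  can_decode_enh1;   /* enough cycles for first enhancement layer  */
        int  can_decode_enh2;   /* ...and for the second                      */
        long available_bw;      /* current channel estimate, bits per second  */
    } Capabilities;

    typedef struct { int base_left, base_right, enh1, enh2; } LayerSelection;

    LayerSelection select_operation_point(const Capabilities *c,
                                          long bw_enh1, long bw_enh2)
    {
        LayerSelection s = { 1, 1, 0, 0 };      /* always decode the base layer */
        if (!c->can_display_3d)
            s.base_right = 0;                   /* retain only one packed view  */
        if (c->can_decode_enh1 && c->available_bw >= bw_enh1)
            s.enh1 = 1;
        if (s.enh1 && c->can_decode_enh2 && c->can_display_3d &&
            c->available_bw >= bw_enh1 + bw_enh2)
            s.enh2 = 1;                         /* enh2 may depend on enh1      */
        return s;
    }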
In some examples, after decoding the reduced resolution images of the base layer, video decoder 30 may upsample the decoded images to the same resolution as that of the enhancement layer images. Motion compensation unit 132 may generate prediction data based on motion vectors received from entropy decoding unit 130. Motion compensation unit 132 may use motion vectors received in the bitstream to identify a prediction block in reference frames in reference frame store 142. Intra-prediction unit 134 may use intra-prediction modes received in the bitstream to form a prediction block from spatially adjacent blocks.

Inverse quantization unit 136 inverse quantizes, i.e., de-quantizes, the quantized block coefficients provided in the bitstream and decoded by entropy decoding unit 130. The inverse quantization process may include, for example, a conventional process as defined by the H.264 decoding standard. The inverse quantization process may also include using a quantization parameter QPY calculated by video encoder 20 for each macroblock to determine a degree of quantization and, likewise, a degree of inverse quantization that should be applied. Inverse transform unit 138 applies an inverse transform (e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process) to the transform coefficients in order to produce residual blocks in the pixel domain.

Motion compensation unit 132 produces motion compensated blocks, possibly performing interpolation based on interpolation filters. Identifiers for interpolation filters to be used for motion estimation with sub-pixel precision may be included in the syntax elements. Motion compensation unit 132 may use interpolation filters as used by video encoder 20 during encoding of the video block to calculate interpolated values for sub-integer pixels of a reference block.
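As one illustration of such interpolation, H.264 derives luma half-pel samples with a 6-tap filter; the sketch below applies that filter horizontally. Border handling is assumed to be done by the caller, so this is a simplification rather than the complete normative process.

    /* Half-pel luma interpolation with the H.264 6-tap filter
     * (1, -5, 20, 20, -5, 1), rounded and scaled by 32. The caller must
     * guarantee that indices x-2 .. x+3 are inside the padded row. */
    #include <stdint.h>

    static uint8_t clip255(int v) { return v < 0 ? 0 : (v > 255 ? 255 : (uint8_t)v); }

    uint8_t half_pel_h(const uint8_t *row, int x)
    {
        int b = row[x-2] - 5*row[x-1] + 20*row[x]
              + 20*row[x+1] - 5*row[x+2] + row[x+3];
        return clip255((b + 16) >> 5);
    }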

Motion compensation unit 132 uses some of the syntax information to determine sizes of macroblocks used to encode the frames of the encoded video sequence, partition information that describes how each macroblock of a frame of the encoded video sequence is partitioned, modes indicating how each partition is encoded, one or more reference frames for each inter-encoded macroblock or partition, and other information to decode the encoded video sequence.

Summer 140 sums the residual blocks with the corresponding prediction blocks generated by motion compensation unit 132 or intra-prediction unit 134 to form decoded blocks. If desired, a deblocking filter may also be applied to filter the decoded blocks in order to remove blockiness artifacts. The decoded video blocks are then stored in reference frame store 142, which provides reference blocks for subsequent motion compensation and also produces decoded video for presentation on a display device (such as display device 32 of FIG. 1).

In accordance with some aspects of this disclosure, video decoder 30 may manage decoded pictures, e.g., the decoded pictures stored in reference frame store 142, separately from one another. In some examples, video decoder 30 may manage the decoded pictures of each layer separately in accordance with the H.264/AVC specification. After video decoder 30 has decoded a corresponding enhancement layer, video decoder 30 may remove any upsampled decoded pictures, e.g., decoded pictures from the base layer that were upsampled for enhancement layer prediction purposes.
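One way to realize this management is sketched below: the base layer view is upsampled into a temporary reference for enhancement layer prediction and evicted once the enhancement layer picture has been decoded. The structures and helper functions are assumptions for illustration, not the H.264/AVC decoded picture buffer process itself.

    /* Sketch: temporary upsampled reference for enhancement-layer decoding. */
    typedef struct Picture Picture;
    extern Picture *upsample_to_full_res(const Picture *half_res);  /* assumed */
    extern void     dpb_insert(Picture *p);                         /* assumed */
    extern void     dpb_remove(Picture *p);                         /* assumed */
    extern Picture *decode_enhancement(const Picture *ref);         /* assumed */

    Picture *decode_enh_with_interlayer_ref(const Picture *base_view_half)
    {
        Picture *up = upsample_to_full_res(base_view_half);
        dpb_insert(up);                /* available for inter-layer prediction  */
        Picture *enh = decode_enhancement(up);
        dpb_remove(up);                /* evict once the enhancement picture exists */
        return enh;
    }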
In one example, video decoder 30 may receive an encoded scalable multiview bitstream having a base layer that includes reduced resolution images of a left eye view and a right eye view, as well as a first enhancement layer that includes full resolution images of the left eye view of the base layer. In this example, video decoder 30 may decode the reduced resolution images of the left eye view included in the base layer, and upsample the reduced resolution images to inter-layer predict the first enhancement layer. That is, video decoder 30 may upsample the reduced resolution images of the base layer before decoding the first enhancement layer. After decoding the first enhancement layer, video decoder 30 may then remove the upsampled images of the left eye view (e.g., from the base layer) from reference frame store 142.

Video decoder 30 may be configured to manage decoded pictures according to received flags. For example, received encoded video data may be provided with particular flags identifying which pictures of the base layer must be upsampled for prediction purposes. According to one example, if video decoder 30 receives an inter_view_frame_0_flag, inter_layer_frame_0_flag, or inter_component_frame_0_flag equal to one ("1"), video decoder 30 may identify that the frame 0 portion, i.e., the portion of the base layer corresponding to view 0, should be upsampled. On the other hand, if video decoder 30 receives an inter_view_frame_1_flag, inter_layer_frame_1_flag, or inter_component_frame_1_flag equal to one ("1"), video decoder 30 may identify that the frame 1 portion, i.e., the portion of the base layer corresponding to view 1, should be upsampled.

In accordance with some aspects of this disclosure, video decoder 30 may be configured to extract and decode sub-bitstreams. That is, for example, video decoder 30 may be able to decode a scalable multiview bitstream using a variety of operation points. In some examples, video decoder 30 may extract a frame-packed sub-bitstream corresponding to the base layer (e.g., packed in accordance with the H.264/AVC specification). Video decoder 30 may also decode single-view operation points. Video decoder 30 may also decode asymmetric operation points.

Video decoder 30 may receive syntax or instructions identifying the operation points from an encoder, such as video encoder 20 shown in FIGS. 2A and 2B. For example, video decoder 30 may receive a variable twoFullViewsFlag (when present), a variable twoHalfViewsFlag (when present), a variable tIdTarget (when present), and a variable LeftViewFlag (when present). In this example, video decoder 30 may apply the following operations, using the input variables described above, to derive a sub-bitstream:

1. Mark views 0, 1, and 2 as target views.
2. When twoFullViewsFlag is false:
   a. If LeftViewFlag and left_view_enhance_first are both 1 or both 0 ((LeftViewFlag + left_view_enhance_first) % 2 == 0), mark view 2 as a non-target view;
   b. Otherwise ((LeftViewFlag + left_view_enhance_first) % 2 == 1):
      i. If full_left_right_dependent_flag is 1, mark view 1 as a non-target view.
3. Mark as "to be removed from the bitstream" all VCL NAL units and filler data NAL units for which any of the following conditions is true:
   a. temporal_id is greater than tIdTarget,
   b. nal_ref_idc is equal to 0 and inter_component_flag is equal to 0 (or all of the following flags are equal to 0: inter_view_frame_0_flag, inter_view_frame_1_flag, inter_layer_frame_0_flag, inter_layer_frame_1_flag, inter_view_flag, and inter_layer_flag),
   c. the view with view_id equal to (2 - second_view_flag) is a non-target view.
4. Remove all access units for which all VCL NAL units are marked as "to be removed from the bitstream."
5. Remove all VCL NAL units and filler data NAL units that are marked as "to be removed from the bitstream."
6. When twoHalfViewsFlag is equal to 1, remove the following NAL units:
   a. all NAL units with nal_unit_type equal to NEWTYPE1 or NEWTYPE2,
   b. all NAL units containing the SPS MFC extension (possibly having a new type) and the SEI messages defined in this amendment (having distinct SEI types).

In this example, when twoFullViewsFlag is not present as an input to this subclause, twoFullViewsFlag is inferred to be equal to 1. When twoHalfViewsFlag is not present as an input to this subclause, twoHalfViewsFlag is inferred to be equal to 0. When tIdTarget is not present as an input to this subclause, tIdTarget is inferred to be equal to 7. When LeftViewFlag is not present as an input to this subclause, LeftViewFlag is inferred to be true.
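The marking pass of the derivation above (steps 1 through 3) might be sketched in code as follows. The NAL unit fields mirror the flags named in the steps, while the container layout and the target_view array are assumptions for illustration.

    /* Sketch of the sub-bitstream extraction marking pass. */
    #include <stdbool.h>

    typedef struct {
        int  nal_unit_type, nal_ref_idc, temporal_id;
        int  second_view_flag;           /* which packed view this unit carries */
        bool inter_view_frame_0_flag, inter_view_frame_1_flag;
        bool inter_layer_frame_0_flag, inter_layer_frame_1_flag;
        bool inter_view_flag, inter_layer_flag;
        bool to_remove;
    } Nal;

    void mark_for_extraction(Nal *nals, int n, int tIdTarget,
                             const bool target_view[3] /* from steps 1-2 */)
    {
        for (int i = 0; i < n; i++) {
            Nal *u = &nals[i];
            bool any_pred_flag =
                u->inter_view_frame_0_flag  || u->inter_view_frame_1_flag  ||
                u->inter_layer_frame_0_flag || u->inter_layer_frame_1_flag ||
                u->inter_view_flag          || u->inter_layer_flag;
            if (u->temporal_id > tIdTarget ||                  /* step 3a */
                (u->nal_ref_idc == 0 && !any_pred_flag) ||     /* step 3b */
                !target_view[2 - u->second_view_flag])         /* step 3c */
                u->to_remove = true;   /* steps 4-5 then drop the marked units */
        }
    }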

Although described with respect to video decoder 30, in other examples sub-bitstream extraction may be performed by another device or component of a destination device (e.g., destination device 14 shown in FIG. 1). For example, in accordance with some aspects of this disclosure, a sub-bitstream may be identified as an attribute, e.g., an attribute included as part of a manifest of a video service. In this example, the manifest may be transmitted before a client (e.g., destination device 14) begins playing back any particular video representation, such that the client may use the attributes to select an operation point. That is, the client may elect to receive only the base layer, the base layer and one enhancement layer, or the base layer and both enhancement layers.

FIG. 4 is a conceptual diagram illustrating a left eye view image 180 and a right eye view image 182 that are combined by video encoder 20 to form a packed frame of a base layer having reduced resolution images corresponding to left eye view image 180 and right eye view image 182 ("base layer frame 184"). Video encoder 20 also forms a frame 186 of an enhancement layer corresponding to left eye view image 180 ("enhancement layer frame 186"). In this example, video encoder 20 receives image 180, which includes raw video data for a left eye view of a scene, and image 182, which includes raw video data for a right eye view of the scene. The left eye view may correspond to view 0, while the right eye view may correspond to view 1. Images 180, 182 may correspond to two images of the same temporal instance. For example, images 180, 182 may have been captured by cameras at substantially the same time.

In the example of FIG. 4, samples (e.g., pixels) of image 180 are indicated with X's, while samples of image 182 are indicated with O's. As shown, video encoder 20 may downsample image 180, downsample image 182, and combine the images to form base layer frame 184, which video encoder 20 may encode. In this example, video encoder 20 arranges downsampled image 180 and downsampled image 182 in base layer frame 184 in a side-by-side arrangement. To downsample images 180 and 182 and arrange the downsampled images in side-by-side base layer frame 184, video encoder 20 may extract alternating columns of each of images 180 and 182. As another example, video encoder 20 may remove alternating columns of images 180 and 182 entirely to produce downsampled versions of images 180 and 182.
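A side-by-side packing of this kind might be sketched as follows, halving each view's width by keeping every other column. The simple image structure is assumed for illustration, and a real encoder could low-pass filter before decimating to reduce aliasing.

    /* Sketch: side-by-side frame packing by column decimation.
     * Assumes both views share the same dimensions (luma plane only). */
    #include <stdint.h>
    #include <stdlib.h>

    typedef struct { int w, h; uint8_t *pix; } Image;   /* w*h bytes */

    Image pack_side_by_side(const Image *left, const Image *right)
    {
        Image out = { left->w, left->h, malloc((size_t)left->w * left->h) };
        for (int y = 0; y < out.h; y++)
            for (int x = 0; x < out.w / 2; x++) {
                out.pix[y * out.w + x]             = left->pix[y * left->w + 2 * x];
                out.pix[y * out.w + out.w / 2 + x] = right->pix[y * right->w + 2 * x];
            }
        return out;   /* caller frees out.pix */
    }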
However, in other examples, video encoder 20 may pack downsampled image 180 and downsampled image 182 in other arrangements. For example, video encoder 20 may alternate the columns of images 180 and 182. In another example, video encoder 20 may decimate or remove the rows of images 180 and 182, and arrange the downsampled images in a top-bottom or alternating arrangement. In yet another example, video encoder 20 may sample images 180 and 182 in a quincunx (checkerboard) fashion and arrange the samples in base layer frame 184.

In addition to base layer frame 184, video encoder 20 may also encode a full resolution enhancement layer frame 186 corresponding to the image of the left eye view (e.g., view 0) of base layer frame 184. According to some aspects of this disclosure, as previously described, video encoder 20 may encode enhancement layer frame 186 using inter-layer prediction (represented by dashed line 188). That is, video encoder 20 may encode enhancement layer frame 186 using inter-layer prediction with inter-layer texture prediction or inter-layer prediction with inter-layer motion prediction. Alternatively or additionally, as previously described, video encoder 20 may encode enhancement layer frame 186 using inter-view prediction (represented by dashed line 190).

In the illustration of FIG. 4, base layer frame 184 includes X's corresponding to data from image 180 and O's corresponding to data from image 182. However, it should be understood that the data of base layer frame 184 corresponding to images 180 and 182 will not necessarily align exactly with the data of images 180 and 182 following downsampling. Likewise, after encoding, the data of the images in base layer frame 184 will likely differ from the data of images 180, 182. Accordingly, it should not be assumed that the data of one X or O in base layer frame 184 is necessarily identical to the corresponding X or O in images 180, 182, or that an X or O in base layer frame 184 has the same resolution as an X or O in images 180, 182.

FIG. 5 is a conceptual diagram illustrating a left eye view image 180 and a right eye view image 182 that are combined by video encoder 20 to form a frame 184 of a base layer ("base layer frame 184") and a frame 192 of an enhancement layer corresponding to right eye view image 182 ("enhancement layer frame 192"). In this example, video encoder 20 receives image 180, which includes raw video data for a left eye view of a scene, and image 182, which includes raw video data for a right eye view of the scene. The left eye view may correspond to view 0, while the right eye view may correspond to view 1. Images 180, 182 may correspond to two images of the same temporal instance. For example, images 180, 182 may have been captured by cameras at substantially the same time.

Similar to the example shown in FIG. 4, the example shown in FIG. 5 includes samples (e.g., pixels) of image 180 indicated with X's, and samples of image 182 indicated with O's. As shown, video encoder 20 may downsample and encode image 180, downsample and encode image 182, and combine the images to form base layer frame 184 in the same manner as shown in FIG. 4.

In addition to base layer frame 184, video encoder 20 may also encode a full resolution enhancement layer frame 192 corresponding to the image of the right eye view (e.g., view 1) of base layer frame 184.
According to some aspects of this disclosure, as previously described, video encoder 20 may encode enhancement layer frame 192 using inter-layer prediction (represented by dashed line 188). That is, video encoder 20 may encode enhancement layer frame 192 using inter-layer prediction with inter-layer texture prediction or inter-layer prediction with inter-layer motion prediction. Alternatively or additionally, as previously described, video encoder 20 may encode enhancement layer frame 192 using inter-view prediction (represented by dashed line 190).

FIG. 6 is a conceptual diagram illustrating a left eye view image 180 and a right eye view image 182 that are combined by video encoder 20 to form a frame 184 of a base layer ("base layer frame 184"), a frame of a first enhancement layer that includes a full resolution image of left eye view 180 ("first enhancement layer frame 186"), and a frame of a second enhancement layer that includes a full resolution image of right eye view 182 ("second enhancement layer frame 192"). In this example, video encoder 20 receives image 180, which includes raw video data for a left eye view of a scene, and image 182, which includes raw video data for a right eye view of the scene. The left eye view may correspond to view 0, while the right eye view may correspond to view 1. Images 180, 182 may correspond to two images of the same temporal instance. For example, images 180, 182 may have been captured by cameras at substantially the same time.

Similar to the examples shown in FIGS. 4 and 5, the example shown in FIG. 6 includes samples of image 180 indicated with X's, and samples of image 182 indicated with O's. As shown, video encoder 20 may downsample and encode image 180, downsample and encode image 182, and combine the images to form base layer frame 184 in the same manner as shown in FIGS. 4 and 5.

In addition to base layer frame 184, video encoder 20 may also encode a first enhancement layer frame 186 corresponding to the left eye view image (e.g., view 0) of base layer frame 184. Video encoder 20 may also encode a second enhancement layer frame 192 corresponding to the right eye view image (e.g., view 1) of base layer frame 184. However, the ordering of the enhancement layer frames is provided merely as an example. That is, in other examples, video encoder 20 may encode a first enhancement layer frame corresponding to the image of the right eye view of base layer frame 184, and a second enhancement layer frame corresponding to the image of the left eye view of base layer frame 184.

In the example shown in FIG. 6, as previously described, video encoder 20 may encode first enhancement layer frame 186 using inter-layer prediction (represented by dashed line 188) based on base layer frame 184. That is, video encoder 20 may encode first enhancement layer frame 186 using inter-layer prediction with inter-layer texture prediction or inter-layer prediction with inter-layer motion prediction based on base layer frame 184. Alternatively or additionally, as previously described, video encoder 20 may encode first enhancement layer frame 186 using inter-view prediction (represented by dashed line 190) based on base layer frame 184.
As described above, video encoder 20 may also encode second enhancement layer frame 192 using inter-layer prediction (represented by dashed line 194) based on base layer frame 184. That is, video encoder 20 may encode second enhancement layer frame 192 using inter-layer prediction with inter-layer texture prediction or inter-layer prediction with inter-layer motion prediction based on base layer frame 184. Alternatively or additionally, video encoder 20 may encode second enhancement layer frame 192 using inter-view prediction (represented by dashed line 190) based on first enhancement layer frame 186.

According to aspects of this disclosure, the amount of bandwidth of the scalable multiview bitstream devoted to each layer (i.e., base layer 184, first enhancement layer 186, and second enhancement layer 192) may vary according to the dependencies of the layers. For example, in general, video encoder 20 may assign 50% to 60% of the bandwidth of the scalable multiview bitstream to base layer 184. That is, the data associated with base layer 184 makes up 50% to 60% of all of the data devoted to the bitstream. If first enhancement layer 186 and second enhancement layer 192 do not depend on one another (e.g., second enhancement layer 192 does not use first enhancement layer 186 for prediction purposes), video encoder 20 may assign approximately equal amounts of the remaining bandwidth to each of the respective enhancement layers 186, 192 (e.g., 20% to 25% of the bandwidth for each respective enhancement layer 186, 192). Alternatively, if second enhancement layer 192 is predicted from first enhancement layer 186, video encoder 20 may assign a relatively larger amount of bandwidth to first enhancement layer 186. That is, video encoder 20 may assign approximately 25% to 30% of the bandwidth to first enhancement layer 186 and approximately 15% to 20% of the bandwidth to second enhancement layer 192.
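Under the stated assumptions (a fixed total rate and the dependency-driven splits above, taking range midpoints), the allocation might be sketched as:

    /* Sketch: dependency-driven bandwidth split across the three layers. */
    typedef struct { double base, enh1, enh2; } RateSplit;

    RateSplit allocate_rate(double total_bps, int enh2_depends_on_enh1)
    {
        RateSplit r;
        r.base = 0.55 * total_bps;                   /* 50-60% to the base layer  */
        if (enh2_depends_on_enh1) {
            r.enh1 = 0.275 * total_bps;              /* 25-30%: referenced more   */
            r.enh2 = 0.175 * total_bps;              /* 15-20%                    */
        } else {
            r.enh1 = r.enh2 = (total_bps - r.base) / 2;  /* roughly equal shares  */
        }
        return r;
    }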
FIG. 7 is a flowchart illustrating an example method 200 for forming and encoding a scalable multiview bitstream that includes a base layer having two reduced resolution images of two different views, as well as a first enhancement layer and a second enhancement layer. Although generally described with respect to the example components of FIGS. 1 and 2A-2B, it should be understood that other encoders, encoding units, and encoding devices may be configured to perform the method of FIG. 7. Moreover, the steps of the method of FIG. 7 need not necessarily be performed in the order shown in FIG. 7, and fewer, additional, or alternative steps may be performed.

In the example of FIG. 7, video encoder 20 initially receives an image of the left eye view (e.g., view 0) (202). Video encoder 20 may also receive an image of the right eye view (e.g., view 1) (204), such that the two received images form a stereo image pair. The left eye view and the right eye view may form a stereo view pair, also referred to as a complementary view pair. The received image of the right eye view may correspond to the same temporal location as the received image of the left eye view. That is, the image of the left eye view and the image of the right eye view may have been captured or generated at substantially the same time. Video encoder 20 may then reduce the resolution of the image of the left eye view and the image of the right eye view (206). In some examples, a preprocessing unit of video encoder 20 may receive the images. In some examples, the video preprocessing unit may be external to video encoder 20.

In the example of FIG. 7, video encoder 20 reduces the resolution of the image of the left eye view and the image of the right eye view (206). For example, video encoder 20 may subsample the received left eye view image and right eye view image (e.g., using row-by-row, column-by-column, or quincunx (checkerboard) subsampling), decimate rows or columns of the received left eye view image and right eye view image, or otherwise reduce the resolution of the received left eye view image and right eye view image. In some examples, video encoder 20 may produce two reduced resolution images having one half the width or one half the height of the corresponding full resolution images. In other examples that include a video preprocessor, the video preprocessor may be configured to reduce the resolution of the left eye view and right eye view images.

Video encoder 20 may then form a base layer frame that includes both the downsampled left eye view image and the downsampled right eye view image (208). For example, video encoder 20 may form the base layer frame in a side-by-side arrangement, a top-bottom arrangement, with columns of the left view image interleaved with columns of the right view image, with rows of the left view image interleaved with rows of the right view image, or in a "checkerboard" type arrangement.

Video encoder 20 may then encode the base layer frame (210). According to aspects of this disclosure, as described with respect to FIGS. 2A and 2B, video encoder 20 may intra- or inter-code the images of the base layer. After encoding the base layer frame, video encoder 20 may then encode a first enhancement layer frame (212). According to the example shown in FIG. 7, video encoder 20 encodes the left view image as the first enhancement layer frame, but in other examples video encoder 20 may encode the right view image as the first enhancement layer frame. Video encoder 20 may intra-, inter-, inter-layer (e.g., inter-layer texture prediction or inter-layer motion prediction), or inter-view code the first enhancement layer frame. Video encoder 20 may use the corresponding reduced resolution image of the base layer (e.g., the image of the left eye view) as a reference for prediction purposes. If video encoder 20 uses inter-layer prediction to encode the first enhancement layer frame, video encoder 20 may first upsample the left eye view image of the base layer frame for prediction purposes. Alternatively, if video encoder 20 uses inter-view prediction to encode the first enhancement layer frame, video encoder 20 may first upsample the right eye view image of the base layer frame for prediction purposes.

After encoding the first enhancement layer frame, video encoder 20 may then encode a second enhancement layer frame (214). According to the example shown in FIG. 7, video encoder 20 encodes the right view image as the second enhancement layer frame, but in other examples video encoder 20 may encode the left view image as the second enhancement layer frame. Similar to the first enhancement layer frame, video encoder 20 may intra-, inter-, inter-layer (e.g., inter-layer texture prediction or inter-layer motion prediction), or inter-view code the second enhancement layer frame.
Video encoder 20 may use the corresponding image of the base layer frame (e.g., the image of the right eye view) as a reference for prediction purposes when encoding the second enhancement layer frame. For example, if video encoder 20 uses inter-layer prediction to encode the second enhancement layer frame, video encoder 20 may first upsample the right eye view image of the base layer frame for prediction purposes. Alternatively, if video encoder 20 uses inter-view prediction to encode the second enhancement layer frame, video encoder 20 may first upsample the left eye view image of the base layer frame for prediction purposes.

According to aspects of this disclosure, video encoder 20 may also (or alternatively) use the first enhancement layer frame to predict the second enhancement layer frame. That is, video encoder 20 may use the first enhancement layer to inter-view code the second enhancement layer frame for prediction purposes.

Video encoder 20 may then output the encoded layers (216). That is, video encoder 20 may output a scalable multiview bitstream that includes frames from the base layer, the first enhancement layer, and the second enhancement layer. According to some examples, video encoder 20, or a unit coupled to video encoder 20, may store the encoded layers to a computer-readable storage medium, broadcast the encoded layers, transmit the encoded layers via network transmission or network broadcast, or otherwise provide the encoded video data.

It should also be understood that video encoder 20 need not necessarily provide information indicating the frame packing arrangement of the base layer frames and the ordering of the layers for every frame of the bitstream. In some examples, video encoder 20 may provide a single set of information, e.g., SPS and SEI messages, for the entire bitstream that indicates this information for every frame of the bitstream. In some examples, video encoder 20 may provide the information periodically (e.g., after every video fragment, after every group of pictures (GOP), after every video segment, every certain number of frames, or at other periodic intervals). In some examples, video encoder 20, or another unit associated with video encoder 20, may also provide the SPS and SEI messages on demand (e.g., in response to a request from a client device for the SPS or SEI messages, or a general request for header data of the bitstream).
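The overall flow of method 200 might be summarized as follows; the coding primitives are assumed to be provided elsewhere (e.g., by the units of FIGS. 2A-2B), so only the step ordering is illustrated.

    /* Sketch of method 200: pack, then encode base and enhancement layers. */
    typedef struct Image Image;
    typedef struct Frame Frame;
    extern Image *downsample(const Image *src);                    /* step 206 */
    extern Frame *pack(const Image *l_half, const Image *r_half);  /* step 208 */
    extern Frame *encode_base(Frame *packed);                      /* step 210 */
    extern Frame *encode_enh(const Image *full, const Frame *base, int view);
    extern void   output_layers(Frame *base, Frame *enh1, Frame *enh2);

    void encode_access_unit(const Image *left, const Image *right) /* 202, 204 */
    {
        Image *l_half = downsample(left);
        Image *r_half = downsample(right);
        Frame *base = encode_base(pack(l_half, r_half));           /* 208-210 */
        Frame *enh1 = encode_enh(left,  base, /*view=*/0);         /* step 212 */
        Frame *enh2 = encode_enh(right, base, /*view=*/1);         /* step 214 */
        output_layers(base, enh1, enh2);                           /* step 216 */
    }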
FIG. 8 is a flowchart illustrating an example method 240 for decoding a scalable multiview bitstream having a base layer, a first enhancement layer, and a second enhancement layer. Although generally described with respect to the example components of FIGS. 1 and 3, it should be understood that other decoders, decoding units, and decoding devices may be configured to perform the method of FIG. 8. Moreover, the steps of the method of FIG. 8 need not necessarily be performed in the order shown, and fewer, additional, or alternative steps may be performed.

Initially, video decoder 30 may receive an indication of the potential operation points of a particular representation (242). That is, video decoder 30 may receive an indication of which layers are provided in the scalable multiview bitstream, as well as the dependencies of those layers. For example, video decoder 30 may receive SPS, SEI, and NAL messages that provide information regarding the encoded video data. In some examples, video decoder 30 may have previously received the messages of the bitstream before receiving the encoded layers, in which case video decoder 30 may have already determined the layers of the scalable multiview bitstream before receiving the encoded layers. In some examples, transmission constraints (e.g., bandwidth limitations or restrictions of the transmission medium) may cause an enhancement layer to be degraded or discarded, such that a particular operation point is not available.

A client device that includes video decoder 30 (e.g., destination device 14 of FIG. 1) may also determine the decoding and rendering capabilities of the client device (244). In some examples, video decoder 30, or the client device in which video decoder 30 is installed, may not have the ability to decode or render images of a three-dimensional representation, or may not have the ability to decode the images of one or both of the enhancement layers. In still other examples, bandwidth availability in the network may prohibit retrieval of the base layer and one or both enhancement layers. Accordingly, the client device may select an operation point (246) based on the decoding capabilities of video decoder 30, the rendering capabilities of the client device in which video decoder 30 is installed, and/or current network conditions. In some examples, the client device may reevaluate the network conditions and request data for a different operation point based on new network conditions, e.g., retrieving more data (such as one or both enhancement layers) when the available bandwidth increases, or retrieving less data (such as only one enhancement layer or no enhancement layers) when the available bandwidth decreases.

After selecting an operation point, video decoder 30 may decode the base layer of the scalable multiview bitstream (248). For example, video decoder 30 may decode the images of the left eye view and the images of the right eye view of the base layer, separate the decoded images, and upsample the images to full resolution. According to some examples, video decoder 30 may decode the images of the left eye view of the base layer first, followed by the images of the right eye view of the base layer. After video decoder 30 separates the decoded base layer into its constituent images (e.g., an image of the left eye view and an image of the right eye view), video decoder 30 may store copies of the left eye view image and the right eye view image for reference in decoding the enhancement layers. In addition, the left eye view image and the right eye view image of the base layer may both be reduced resolution images. Accordingly, video decoder 30 may upsample the left eye view image and the right eye view image, e.g., by interpolating missing data to form full resolution versions of the left eye view image and the right eye view image.

In some examples, video decoder 30, or the device in which video decoder 30 is installed (e.g., destination device 14 shown in FIG. 1), may not have the ability to decode one or both of the enhancement layers. In other examples, transmission constraints (e.g., bandwidth limitations or restrictions of the transmission medium) may cause an enhancement layer to be degraded or discarded. In other examples, video display 32 may not have the ability to present two views, e.g., may not be 3D-capable. Accordingly, in the example shown in FIG. 8, video decoder 30 determines whether the selected operation point (of step 246) includes decoding the first enhancement layer (250).
If video decoder 30 does not decode the first enhancement layer, or if the first enhancement layer is no longer present in the bitstream, video decoder 30 may upsample (e.g., interpolate) the left eye view image and the right eye view image of the base layer and send the upsampled representations of the left eye view image and the right eye view image to video display 32, which may display the left eye view and right eye view images simultaneously or nearly simultaneously (252). In another example, if video display 32 is not capable of displaying stereoscopic (e.g., 3D) content, video decoder 30 or video display 32 may discard the left eye view image or the right eye view image prior to display.

However, video decoder 30 may decode the first enhancement layer (254). As described above with respect to FIG. 3, video decoder 30 may receive syntax to assist video decoder 30 in decoding the first enhancement layer. For example, video decoder 30 may determine whether intra-, inter-, inter-layer (e.g., texture or motion), or inter-view prediction was used to encode the first enhancement layer. Video decoder 30 may then decode the first enhancement layer accordingly. According to some aspects of this disclosure, video decoder 30 may upsample the corresponding images of the base layer before decoding the first enhancement layer.

As described above, video decoder 30, or the device in which video decoder 30 is installed, may not have the ability to decode both enhancement layers, or transmission constraints may cause the second enhancement layer to be degraded or discarded. Accordingly, after decoding the first enhancement layer, video decoder 30 determines whether the selected operation point (of step 246) includes decoding the second enhancement layer (256).

If video decoder 30 does not decode the second enhancement layer, or if the second enhancement layer is not present in the bitstream, video decoder 30 may discard the image of the base layer that is not associated with the first enhancement layer, and send the image associated with the first enhancement layer to display 32 (258). That is, for a video display 32 that is not capable of displaying stereoscopic content, video decoder 30 or video display 32 may discard the image of the base layer that is not associated with the first enhancement layer prior to display. For example, if the first enhancement layer includes full resolution left eye view images, video decoder 30 or display 32 may discard the right eye view images of the base layer prior to display. Alternatively, if the first enhancement layer includes full resolution right eye view images, video decoder 30 or display 32 may discard the left eye view images of the base layer prior to display.

In another example, if video decoder 30 does not decode the second enhancement layer, or if the second enhancement layer is no longer present in the bitstream, video decoder 30 may send one upsampled image (e.g., from the base layer) and one full resolution image (e.g., from the enhancement layer) to display 32, which may display the left eye view image and the right eye view image simultaneously or nearly simultaneously. That is, if the first enhancement layer corresponds to the left view images, video decoder 30 may send the full resolution left view image from the first enhancement layer and the upsampled right view image from the base layer to display 32.
In some examples, video decoder 30, or a device in which video decoder 30 is installed (e.g., destination device 14 shown in FIG. 1), may not have the ability to decode one or both of the enhancement layers. In other instances, transmission restrictions (e.g., bandwidth limitations or limitations of the transmission medium) may cause an enhancement layer to be degraded or discarded. In still other examples, video display 32 may not have the ability to present two views, for example, may not have 3D capabilities. Thus, in the example shown in FIG. 8, video decoder 30 determines whether the selected operating point (step 246) includes decoding the first enhancement layer (250).

If video decoder 30 does not decode the first enhancement layer, or if the first enhancement layer is not present in the bitstream, video decoder 30 may upsample (e.g., interpolate) the left eye view image and the right eye view image of the base layer and send the upsampled representations of the left eye view image and the right eye view image to video display 32, which may display the left eye view image and the right eye view image simultaneously or nearly simultaneously (252). In another example, if video display 32 is unable to display stereoscopic (e.g., 3D) content, video decoder 30 or video display 32 may discard the left eye view image or the right eye view image prior to display.

Otherwise, video decoder 30 may decode the first enhancement layer (254). As described above with respect to FIG. 3, video decoder 30 may receive syntax data to assist video decoder 30 in decoding the first enhancement layer. For example, video decoder 30 may determine whether intra-frame, inter-frame, inter-layer (e.g., texture or motion), or inter-view prediction was used to encode the first enhancement layer, and may then decode the first enhancement layer accordingly. In accordance with some aspects of this disclosure, video decoder 30 upsamples the corresponding image of the base layer prior to decoding the first enhancement layer.

As described above, video decoder 30, or a device in which video decoder 30 is installed, may not have the ability to decode both enhancement layers, or transmission restrictions may cause the second enhancement layer to be degraded or discarded. Thus, after decoding the first enhancement layer, video decoder 30 determines whether the selected operating point (step 246) includes decoding the second enhancement layer (256). If video decoder 30 does not decode the second enhancement layer, or if the second enhancement layer is not present in the bitstream, video decoder 30 may discard the image of the base layer that is not associated with the first enhancement layer and send the image associated with the first enhancement layer to display 32 (258). In one example, if the first enhancement layer includes a full-resolution left eye view image, video decoder 30 or display 32 may discard the right eye view image of the base layer prior to display. Alternatively, if the first enhancement layer includes a full-resolution right eye view image, video decoder 30 or display 32 may discard the left eye view image of the base layer prior to display. In another example, if video decoder 30 does not decode the second enhancement layer, or if the second enhancement layer is not present in the bitstream, video decoder 30 may send an upsampled image (e.g., from the base layer) and a full-resolution image (e.g., from the enhancement layer) to display 32, which may display the left eye view image and the right eye view image simultaneously or nearly simultaneously. That is, if the first enhancement layer corresponds to the left view image, video decoder 30 may send the full-resolution left view image from the first enhancement layer and the upsampled right view image from the base layer to display 32. Alternatively, if the first enhancement layer corresponds to the right view image, video decoder 30 may send the full-resolution right view image from the first enhancement layer and the upsampled left view image from the base layer to display 32. Display 32 may present the full-resolution image and the upsampled image simultaneously or nearly simultaneously.

Otherwise, video decoder 30 may decode the second enhancement layer (260). As described above with respect to FIG. 3, video decoder 30 may receive syntax data to assist video decoder 30 in decoding the second enhancement layer. For example, video decoder 30 may determine whether intra-frame, inter-frame, inter-layer (e.g., texture or motion), or inter-view prediction was used to encode the second enhancement layer, and may then decode the second enhancement layer accordingly. In accordance with this disclosure, video decoder 30 upsamples the corresponding decoded image of the base layer before decoding the second enhancement layer. Alternatively, if video decoder 30 determines that the second enhancement layer is predicted from the first enhancement layer, video decoder 30 may use the decoded first enhancement layer when decoding the second enhancement layer. After decoding both the first enhancement layer (254) and the second enhancement layer (260), video decoder 30 may send the full-resolution left view image and the full-resolution right view image from the enhancement layers to display 32. Display 32 may present the full-resolution left view image and the full-resolution right view image simultaneously or nearly simultaneously (262).

In some instances, video decoder 30, or a device in which video decoder 30 is installed (e.g., destination device 14 shown in FIG. 1), may not have the capability to play 3D video. In such examples, video decoder 30 may not decode both views. That is, video decoder 30 may decode only the left eye view image of the base layer and skip (e.g., discard) the right eye view image of the base layer. In addition, video decoder 30 may decode only the enhancement layer corresponding to the decoded view of the base layer. In this manner, devices may be able to receive and decode the scalable multiview bitstream regardless of whether the devices are capable of decoding and/or rendering 3D video data.
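Before turning to other devices and implementation details, the view-selection logic traced through steps 250 to 262 can be summarized with one hypothetical helper. This is a sketch only; the argument names are illustrative, and the inputs are assumed to be the upsampled base-layer views plus whichever full-resolution enhancement-layer views were actually decoded.

    def assemble_stereo_pair(base_left_up, base_right_up,
                             enh1=None, enh1_is_left=True, enh2=None):
        # Default to the upsampled base-layer views (steps 250-252).
        left, right = base_left_up, base_right_up
        if enh1 is not None:
            # The first enhancement layer replaces exactly one view with a
            # full-resolution image (steps 254-258).
            if enh1_is_left:
                left = enh1
            else:
                right = enh1
        if enh2 is not None:
            # The second enhancement layer carries the remaining view,
            # yielding a full-resolution stereo pair (steps 260-262).
            if enh1_is_left:
                right = enh2
            else:
                left = enh2
        return left, right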
Although generally described with respect to video encoders and video decoders, the techniques of this disclosure may be implemented in other devices and coding units. For example, the techniques for forming a scalable multiview bitstream including a base layer, a first enhancement layer, and a second enhancement layer may be performed by a transcoder that receives two separate, complementary bitstreams and transcodes the two bitstreams to form a single bitstream comprising the base layer, the first enhancement layer, and the second enhancement layer. Alternatively, the techniques for decomposing the scalable multiview bitstream may be performed by a transcoder that receives a bitstream comprising the base layer, the first enhancement layer, and the second enhancement layer and generates two separate bitstreams corresponding to the respective views of the base layer, each of which includes encoded video data for the respective view.

In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over a computer-readable medium as one or more instructions or code, and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media, which is non-transitory, or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code, and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.

By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor," as used herein, may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.

The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (e.g., a chip set).
Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but they do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.

Various examples have been described. These and other examples are within the scope of the following claims.

[Brief description of the drawings]

FIG. 1 is a block diagram illustrating an example video encoding and decoding system that may utilize techniques for forming a scalable multiview bitstream including images of two views of a scene.

FIG. 2A is a block diagram illustrating an example of a video encoder that may implement techniques for generating a scalable multiview bitstream having a base layer with two reduced-resolution images, and two additional enhancement layers each including a respective full-resolution image corresponding to a view of the base layer.

FIG. 2B is a block diagram illustrating another example of a video encoder that may implement techniques for generating a scalable multiview bitstream having a base layer with two reduced-resolution images, and two additional enhancement layers each including a respective full-resolution image corresponding to a view of the base layer.

FIG. 3 is a block diagram illustrating an example of a video decoder that decodes an encoded video sequence.

FIG. 4 is a conceptual diagram illustrating a left eye view image and a right eye view image that are combined by a video encoder to form a base layer having reduced-resolution images for the two views, and a full-resolution enhancement layer for the left eye view image.

FIG. 5 is a conceptual diagram illustrating a left eye view image and a right eye view image that are combined by a video encoder to form a base layer having reduced-resolution images for the two views, and a full-resolution enhancement layer for the right eye view image.

FIG. 6 is a conceptual diagram illustrating a left eye view image and a right eye view image that are combined by a video encoder to form a base layer, a full-resolution left eye view image, and a full-resolution right eye view image.

FIG. 7 is a flow diagram illustrating an example method for forming and encoding a scalable multiview bitstream that includes a base layer having two reduced-resolution images of two different views, a first enhancement layer, and a second enhancement layer.

FIG. 8 is a flow diagram illustrating an example method for decoding a scalable multiview bitstream having a base layer, a first enhancement layer, and a second enhancement layer.

[Main component symbol description]

10 system
12 source device
14 destination device
16 communication channel
18 video source
20 video encoder
22 modulator/demodulator (modem)
24 transmitter
26 receiver
28 modem
30 video decoder
32 display device
40 mode selection unit
42 motion/disparity estimation unit
44 motion compensation unit
46 intra-prediction unit
50 summer
52 transform unit
54 quantization unit
56 entropy coding unit
58 inverse quantization unit
60 inverse transform unit
62 summer
64 reference frame store
82 base layer
84 first enhancement layer
86 second enhancement layer
90 motion estimation/motion compensation unit
98 inter-layer prediction unit
100 inter-layer texture prediction unit
102 inter-layer motion prediction unit
106 inter-view prediction unit
114 transform and quantization unit
118 entropy coding and multiplexing unit
122 inverse quantization/inverse transform/reconstruction/deblocking unit
130 entropy decoding unit
132 motion compensation unit
134 intra-prediction unit
136 inverse quantization unit
138 inverse transform unit
140 summer
142 reference frame store
180 left eye view image
182 right eye view image
184 base layer/base layer frame
186 enhancement layer/enhancement layer frame/first enhancement layer
188 inter-layer prediction
190 inter-view prediction
192 enhancement layer/enhancement layer frame/second enhancement layer
194 inter-layer prediction
B1 base layer left eye view
B2 base layer right eye view
E1 full-resolution image of left eye view
E2 full-resolution image of right eye view

Claims (1)

VII. Claims

1. A method of decoding video data comprising base layer data and enhancement layer data, the method comprising: decoding base layer data having a first resolution, wherein the base layer data comprises a reduced resolution version of a left view relative to the first resolution and a reduced resolution version of a right view relative to the first resolution; decoding enhancement layer data having the first resolution and comprising enhancement data for exactly one of the left view and the right view, wherein the enhancement data has the first resolution, and wherein decoding the enhancement layer data comprises decoding the enhancement layer data relative to at least a portion of the base layer data; and combining the decoded enhancement layer data with the one of the left view and the right view of the decoded base layer data to which the decoded enhancement layer corresponds.

2. The method of claim 1, wherein the enhancement layer data comprises first enhancement layer data, the method further comprising decoding, separately from the first enhancement layer data, second enhancement layer data that is not associated with the first enhancement layer data and that is for exactly one of the left view and the right view, wherein the second enhancement layer has the first resolution, and wherein decoding the second enhancement layer data comprises decoding the second enhancement layer data relative to at least a portion of the base layer data or at least a portion of the first enhancement layer data.

3. The method of claim 2, wherein decoding the second enhancement layer data comprises retrieving inter-layer prediction data for the second enhancement layer data from an upsampled version of the view of the base layer data to which the second enhancement layer corresponds, wherein the upsampled version has the first resolution.

4. The method of claim 2, wherein decoding the second enhancement layer data comprises retrieving inter-view prediction data for the second enhancement layer data from at least one of an upsampled version of the other view of the base layer having the first resolution and the first enhancement layer data.

5. The method of claim 4, further comprising decoding reference picture list construction data in a slice header associated with the second enhancement layer, the reference picture list construction data indicating whether the prediction data is associated with the upsampled version of the other view of the base layer having the first resolution or with the first enhancement layer data.

6. The method of claim 1, wherein decoding the enhancement layer data comprises retrieving inter-layer prediction data for the enhancement layer data from an upsampled version of the view of the base layer data to which the enhancement layer corresponds, wherein the upsampled version has the first resolution.

7. The method of claim 1, wherein decoding the enhancement layer data comprises retrieving inter-view prediction data for the enhancement layer data from an upsampled version of the other view of the base layer data, wherein the upsampled version has the first resolution.

8. An apparatus for decoding video data comprising base layer data and enhancement layer data, the apparatus comprising a video decoder configured to: decode base layer data having a first resolution, wherein the base layer data comprises a reduced resolution version of a left view relative to the first resolution and a reduced resolution version of a right view relative to the first resolution; decode enhancement layer data having the first resolution and comprising enhancement data for exactly one of the left view and the right view, wherein the enhancement data has the first resolution, and wherein decoding the enhancement layer data comprises decoding the enhancement layer data relative to at least a portion of the base layer data; and combine the decoded enhancement layer data with the one of the left view and the right view of the decoded base layer data to which the decoded enhancement layer corresponds.

9. The apparatus of claim 8, wherein the enhancement layer data comprises first enhancement layer data, and wherein the video decoder is further configured to decode, separately from the first enhancement layer data, second enhancement layer data that is not associated with the first enhancement layer data and that is for exactly one of the left view and the right view, wherein the second enhancement layer has the first resolution, and wherein decoding the second enhancement layer data comprises decoding the second enhancement layer data relative to at least a portion of the base layer data or at least a portion of the first enhancement layer data.

10. The apparatus of claim 9, wherein, to decode the second enhancement layer data, the video decoder is configured to retrieve inter-layer prediction data for the second enhancement layer data from an upsampled version of the view of the base layer data to which the second enhancement layer corresponds, wherein the upsampled version has the first resolution.

11. The apparatus of claim 9, wherein, to decode the second enhancement layer data, the video decoder is configured to retrieve inter-view prediction data for the second enhancement layer data from at least one of an upsampled version of the other view of the base layer having the first resolution and the first enhancement layer data.

12. The apparatus of claim 11, wherein the video decoder is further configured to decode reference picture list construction data in a slice header associated with the second enhancement layer, the reference picture list construction data indicating whether the prediction data is associated with the upsampled version of the other view of the base layer having the first resolution or with the first enhancement layer data.

13. The apparatus of claim 8, wherein, to decode the enhancement layer data, the video decoder is configured to retrieve inter-layer prediction data for the enhancement layer data from an upsampled version of the view of the base layer data to which the enhancement layer corresponds, wherein the upsampled version has the first resolution.

14. The apparatus of claim 8, wherein, to decode the enhancement layer data, the video decoder is configured to retrieve inter-view prediction data for the enhancement layer data from an upsampled version of the other view of the base layer data, wherein the upsampled version has the first resolution.

15. The apparatus of claim 8, wherein the apparatus comprises at least one of: an integrated circuit; a microprocessor; and a wireless communication device that includes the video decoder.

16. An apparatus for decoding video data comprising base layer data and enhancement layer data, the apparatus comprising: means for decoding base layer data having a first resolution, wherein the base layer data comprises a reduced resolution version of a left view relative to the first resolution and a reduced resolution version of a right view relative to the first resolution; means for decoding enhancement layer data having the first resolution and comprising enhancement data for exactly one of the left view and the right view, wherein the enhancement data has the first resolution, and wherein decoding the enhancement layer data comprises decoding the enhancement layer data relative to at least a portion of the base layer data; and means for combining the decoded enhancement layer data with the one of the left view and the right view of the decoded base layer data to which the decoded enhancement layer corresponds.

17. The apparatus of claim 16, wherein the enhancement layer data comprises first enhancement layer data, the apparatus further comprising means for decoding, separately from the first enhancement layer data, second enhancement layer data that is not associated with the first enhancement layer data and that is for exactly one of the left view and the right view, wherein the second enhancement layer has the first resolution, and wherein decoding the second enhancement layer data comprises decoding the second enhancement layer data relative to at least a portion of the base layer data or at least a portion of the first enhancement layer data.

18. A computer program product comprising a computer-readable storage medium having stored thereon instructions that, when executed, cause a processor of a device for decoding video data comprising base layer data and enhancement layer data to: decode base layer data having a first resolution, wherein the base layer data comprises a reduced resolution version of a left view relative to the first resolution and a reduced resolution version of a right view relative to the first resolution; decode enhancement layer data having the first resolution and comprising enhancement data for exactly one of the left view and the right view, wherein the enhancement data has the first resolution, and wherein decoding the enhancement layer data comprises decoding the enhancement layer data relative to at least a portion of the base layer data; and combine the decoded enhancement layer data with the one of the left view and the right view of the decoded base layer data to which the decoded enhancement layer corresponds.

19. The computer program product of claim 18, wherein the enhancement layer data comprises first enhancement layer data, the computer program product further comprising instructions that cause the processor to decode, separately from the first enhancement layer data, second enhancement layer data that is not associated with the first enhancement layer data and that is for exactly one of the left view and the right view, wherein the second enhancement layer has the first resolution, and wherein decoding the second enhancement layer data comprises decoding the second enhancement layer data relative to at least a portion of the base layer data or at least a portion of the first enhancement layer data.

20. A method of encoding video data comprising base layer data and enhancement layer data, the method comprising: encoding base layer data having a first resolution, wherein the base layer data comprises a reduced resolution version of a left view relative to the first resolution and a reduced resolution version of a right view relative to the first resolution; and encoding enhancement layer data having the first resolution and comprising enhancement data for exactly one of the left view and the right view, wherein the enhancement data has the first resolution, and wherein encoding the enhancement layer data comprises encoding the enhancement layer data relative to at least a portion of the base layer data.

21. The method of claim 20, wherein the enhancement layer data comprises first enhancement layer data, the method further comprising encoding, separately from the first enhancement layer data, second enhancement layer data that is not associated with the first enhancement layer data and that is for exactly one of the left view and the right view, wherein the second enhancement layer has the first resolution, and wherein encoding the second enhancement layer data comprises encoding the second enhancement layer data relative to at least a portion of the base layer data or at least a portion of the first enhancement layer data.

22. The method of claim 21, wherein encoding the second enhancement layer data comprises inter-layer predicting the second enhancement layer data from an upsampled version of the view of the base layer data to which the second enhancement layer corresponds, wherein the upsampled version has the first resolution.

23. The method of claim 21, wherein encoding the second enhancement layer data comprises inter-view predicting the second enhancement layer data from at least one of an upsampled version of the other view of the base layer having the first resolution and the first enhancement layer data.

24. The method of claim 21, further comprising providing information indicating whether inter-layer prediction is enabled and whether inter-view prediction is enabled for at least one of the first enhancement layer data and the second enhancement layer data.

25. The method of claim 21, further comprising providing information indicating operation points of a representation comprising the base layer, the first enhancement layer, and the second enhancement layer, wherein the information indicating the operation points indicates the layers included in each of the operation points, a maximum temporal identifier representing a maximum frame rate of the operation points, a profile indicator representing a video coding profile to which the operation points conform, a level indicator representing a level of the video coding profile to which the operation points conform, and average frame rates of the operation points.

26. The method of claim 21, further comprising encoding reference picture list construction data in a slice header associated with the second enhancement layer, the reference picture list construction data indicating whether the prediction data is associated with the upsampled version of the other view of the base layer having the first resolution or with the first enhancement layer data.

27. The method of claim 20, wherein encoding the enhancement layer data comprises inter-layer predicting the enhancement layer data from an upsampled version of the corresponding left view or right view of the base layer data, wherein the upsampled version has the first resolution.

28. The method of claim 20, wherein encoding the enhancement layer data comprises inter-view predicting the enhancement layer data from an upsampled version of the view of the base layer data opposite the corresponding left view or right view, wherein the upsampled version has the first resolution.

29. An apparatus for encoding video data comprising a left view of a scene and a right view of the scene, wherein the left view has a first resolution and the right view has the first resolution, the apparatus comprising a video encoder configured to: encode base layer data comprising a reduced resolution version of the left view relative to the first resolution and a reduced resolution version of the right view relative to the first resolution; encode enhancement layer data comprising enhancement data for exactly one of the left view and the right view, wherein the enhancement data has the first resolution; and output the base layer data and the enhancement layer data.

30. The apparatus of claim 29, wherein the enhancement layer data comprises first enhancement layer data, and wherein the video encoder is further configured to encode, separately from the first enhancement layer data, second enhancement layer data that is not associated with the first enhancement layer data and that is for exactly one of the left view and the right view, wherein the second enhancement layer has the first resolution, and wherein encoding the second enhancement layer data comprises encoding the second enhancement layer data relative to at least a portion of the base layer data or at least a portion of the first enhancement layer data.

31. The apparatus of claim 30, wherein encoding the second enhancement layer data comprises inter-layer predicting the second enhancement layer data from an upsampled version of the view of the base layer data to which the second enhancement layer corresponds, wherein the upsampled version has the first resolution.

32. The apparatus of claim 30, wherein encoding the second enhancement layer data comprises inter-view predicting the second enhancement layer data from at least one of an upsampled version of the other view of the base layer having the first resolution and the first enhancement layer data.

33. The apparatus of claim 30, wherein the video encoder is further configured to provide information indicating whether inter-layer prediction is enabled and whether inter-view prediction is enabled for at least one of the first enhancement layer data and the second enhancement layer data.

34. The apparatus of claim 30, wherein the video encoder is further configured to provide information indicating operation points of a representation comprising the base layer, the first enhancement layer, and the second enhancement layer, wherein the information indicating the operation points indicates the layers included in each of the operation points, a maximum temporal identifier representing a maximum frame rate of the operation points, a profile indicator representing a video coding profile to which the operation points conform, a level indicator representing a level of the video coding profile to which the operation points conform, and average frame rates of the operation points.

35. The apparatus of claim 30, wherein the video encoder is further configured to encode reference picture list construction data in a slice header associated with the second enhancement layer, the reference picture list construction data indicating whether the prediction data is associated with the upsampled version of the other view of the base layer having the first resolution or with the first enhancement layer data.

36. The apparatus of claim 29, wherein encoding the enhancement layer data comprises inter-layer predicting the enhancement layer data from an upsampled version of the corresponding left view or right view of the base layer data, wherein the upsampled version has the first resolution.

37. The apparatus of claim 29, wherein encoding the enhancement layer data comprises inter-view predicting the enhancement layer data from an upsampled version of the view of the base layer data opposite the corresponding left view or right view, wherein the upsampled version has the first resolution.

38. The apparatus of claim 29, wherein the apparatus comprises at least one of: an integrated circuit; a microprocessor; and a wireless communication device that includes the video encoder.

39. An apparatus for encoding video data comprising a left view of a scene and a right view of the scene, wherein the left view has a first resolution and the right view has the first resolution, the apparatus comprising: means for encoding base layer data comprising a reduced resolution version of the left view relative to the first resolution and a reduced resolution version of the right view relative to the first resolution; means for encoding enhancement layer data comprising enhancement data for exactly one of the left view and the right view, wherein the enhancement data has the first resolution; and means for outputting the base layer data and the enhancement layer data.

40. The apparatus of claim 39, wherein the enhancement layer data comprises first enhancement layer data, the apparatus further comprising means for encoding, separately from the first enhancement layer data, second enhancement layer data that is not associated with the first enhancement layer data and that is for exactly one of the left view and the right view, wherein the second enhancement layer has the first resolution, and wherein encoding the second enhancement layer data comprises encoding the second enhancement layer data relative to at least a portion of the base layer data or at least a portion of the first enhancement layer data.

41. A computer program product comprising a computer-readable storage medium having stored thereon instructions that, when executed, cause a processor of a device for encoding video data to: receive video data comprising a left view of a scene and a right view of the scene, wherein the left view has a first resolution and the right view has the first resolution; encode base layer data comprising a reduced resolution version of the left view relative to the first resolution and a reduced resolution version of the right view relative to the first resolution; encode enhancement layer data comprising enhancement data for exactly one of the left view and the right view, wherein the enhancement data has the first resolution; and output the base layer data and the enhancement layer data.

42. The computer program product of claim 41, wherein the enhancement layer data comprises first enhancement layer data, the computer program product further comprising instructions that, when executed, cause the processor to encode, separately from the first enhancement layer data, second enhancement layer data that is not associated with the first enhancement layer data and that is for exactly one of the left view and the right view, wherein the second enhancement layer has the first resolution, and wherein encoding the second enhancement layer data comprises encoding the second enhancement layer data relative to at least a portion of the base layer data or at least a portion of the first enhancement layer data.
TW100134425A 2010-09-24 2011-09-23 Coding stereo video data TW201223249A (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US38646310P 2010-09-24 2010-09-24
US48033611P 2011-04-28 2011-04-28
US13/194,656 US20120075436A1 (en) 2010-09-24 2011-07-29 Coding stereo video data
PCT/US2011/050699 WO2012039936A1 (en) 2010-09-24 2011-09-07 Coding stereo video data

Publications (1)

Publication Number Publication Date
TW201223249A 2012-06-01

Family

ID=46725450

Family Applications (1)

Application Number Title Priority Date Filing Date
TW100134425A TW201223249A (en) 2010-09-24 2011-09-23 Coding stereo video data

Country Status (1)

Country Link
TW (1) TW201223249A (en)


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9549205B2 (en) 2012-07-27 2017-01-17 Novatek Microelectronics Corp. Method and device for encoding video
US9900593B2 (en) 2012-08-29 2018-02-20 Vid Scale, Inc. Method and apparatus of motion vector prediction for scalable video coding
US10939130B2 (en) 2012-08-29 2021-03-02 Vid Scale, Inc. Method and apparatus of motion vector prediction for scalable video coding
US11343519B2 (en) 2012-08-29 2022-05-24 Vid Scale, Inc. Method and apparatus of motion vector prediction for scalable video coding
CN104798377A (en) * 2012-10-01 2015-07-22 Qualcomm Inc. Sub-bitstream extraction for multiview, three-dimensional (3D) and scalable video bitstreams
CN104798377B (en) * 2012-10-01 2018-07-03 Qualcomm Inc. Sub-bitstream extraction for multiview, three-dimensional (3D) and scalable video bitstreams
TWI502545B (en) * 2013-06-25 2015-10-01 Method of storing a content of a three-dimensional image
