TWI416962B - Method, apparatus, and computer readable medium for frame prediction in hybrid video compression to enable temporal scalability

Info

Publication number: TWI416962B
Application number: TW94111031A
Authority: TW
Inventors: Peisong Chen; Vijayalakshmi R Raveendran
Original assignee: Qualcomm Inc
Priority date: 2004-04-07
Filing date: 2005-04-07
Publication date: 2013-11-21
Also published as: TW200614822A; AR048535A1

Description

Method, apparatus, and computer readable medium for frame prediction in hybrid video compression to enable temporal scalability

[Claim of Priority under 35 U.S.C. §119]

The present application for patent claims priority to Provisional Application No. 60/560,433, entitled "METHOD AND APPARATUS FOR PREDICTION FRAME IN HYBRID VIDEO COMPRESSION TO ENABLE TEMPORAL SCALABILITY," filed April 7, 2004, and to Provisional Application No. 60/625,700, entitled "METHOD AND APPARATUS FOR PREDICTION FRAME IN HYBRID VIDEO COMPRESSION TO ENABLE TEMPORAL SCALABILITY," filed November 4, 2004, both assigned to the assignee hereof and hereby expressly incorporated by reference herein.

The present invention relates to methods, apparatus, and systems for distributing digital data encoded in a way that provides temporal scalability.

Owing to the explosive growth and great success of the Internet and wireless communication, as well as the increasing demand for multimedia services, streaming media over the Internet and over mobile/wireless channels has drawn tremendous attention. In heterogeneous Internet Protocol (IP) networks, video is provided by a server and can be streamed by one or more clients. Wired connections include dial-up, integrated services digital network (ISDN), cable, digital subscriber line protocols (collectively referred to as xDSL), fiber, local area networks (LAN), wide area networks (WAN), and others. The transmission mode can be either unicast or multicast. The variety of individual client devices, including personal digital assistants (PDAs), laptops, desktops, set-top boxes, TVs, HDTVs, mobile phones, and others, requires bitstreams of different bandwidths simultaneously for the same content. The connection bandwidth can vary quickly over time (from 9.6 kbps to 100 Mbps and above), and can change faster than a server can react.

Mobile/wireless communication is similar to the heterogeneous IP network. Transporting multimedia content over mobile/wireless channels is very challenging because these channels are often severely impaired by multipath fading, shadowing, inter-symbol interference, and noise disturbances. Other causes, such as mobility and competing traffic, also produce bandwidth variations and loss. Channel noise and the number of users being served determine the time-varying nature of the channel environment. In addition to environmental conditions, the destination network can vary from second- to third-generation cellular networks to broadband data-only networks because of geographic location and mobile roaming. All of these variations in available bandwidth call for adaptive rate adjustment of the multimedia content being transmitted, even on the fly. Successful transmission of video over heterogeneous wired/wireless networks therefore requires efficient coding and adaptability to varying network conditions, device characteristics, and user preferences, while also being resilient to losses.

To meet different user requirements and to adapt to channel variation, multiple independent versions of a bitstream could be generated, each meeting one class of constraints based on transmission bandwidth, user display, and computational capability, but this is inefficient for server storage and for multicast applications. In scalable coding, where a single macro-bitstream accommodating high-end users is built at the server, the bitstreams for low-end applications are simply embedded as subsets of the macro-bitstream. A single bitstream can thus be adapted to diverse application environments by selectively transmitting sub-bitstreams. Another advantage of scalable coding is robust video transmission over error-prone channels: error protection and error concealment can be implemented easily, and a more reliable transmission channel or better error protection can be applied to the base-layer bits, which contain the most significant information.

Spatial, temporal, and signal-to-noise ratio (SNR) scalabilities exist in hybrid coders such as MPEG-1, MPEG-2, and MPEG-4 (collectively referred to as MPEG-x) and H.261, H.262, H.263, and H.264 (collectively referred to as H.26x). In hybrid coding, temporal redundancy is removed by motion-compensated prediction (MCP). A video is typically divided into a series of groups of pictures (GOPs), where each GOP begins with an intra-coded frame (I) followed by an arrangement of forward predicted frames (P) and bidirectional predicted frames (B). Both P frames and B frames are inter-frames. The B frame is the key to temporal scalability in most MPEG-like coders. However, some profiles, such as the MPEG-4 Simple Profile and the H.264 Baseline Profile, do not support B frames.

In MPEG-4, profiles and levels provide a means of defining subsets of the syntax and semantics based on the decoder capabilities required to decode a particular bitstream. A profile is a defined subset of the entire bitstream syntax. A level is a defined set of constraints imposed on parameters in the bitstream. For any given profile, levels generally correspond to decoder processing load and memory capability. Profiles and levels thus specify restrictions on bitstreams and hence place limits on the capabilities needed to decode them. In general, a decoder is deemed to conform to a given profile at a given level if it can properly decode all allowed values of all syntactic elements specified by that profile at that level.

It is an object of the present invention to provide a method and apparatus that provide simple yet effective temporal scalability while also conforming to the MPEG-4 Simple Profile and the H.264 Baseline Profile. The MPEG-4 standard is described in ISO/IEC 14496-2. The H.264 standard is described in ISO/IEC 14496-10.

A coding scheme and a delivery scheme are described for providing temporal scalability in video compression (e.g., MPEG-x or H.26x), and for providing temporal scalability to devices conforming to the MPEG-4 Simple Profile and the H.264 Baseline Profile.

In one example, an encoder or a transcoder can create a single bitstream that can be adapted to provide variable data rates and video quality for multiple users. The single bitstream can be created on the fly or stored in memory. Temporal scaling frames can be omitted from the video stream in order, for example, to meet bandwidth requirements, to satisfy channel conditions such as environmental noise, or to deliver variable-quality video.

In another example, a decoder can choose to omit decoding of temporal scaling frames in order, for example, to conserve battery power or decoding time.

In several communication systems, the data to be transmitted is compressed so that the available bandwidth is used more efficiently. For example, the Moving Picture Experts Group (MPEG) has developed several standards relating to digital data delivery systems. The MPEG-4 standard was developed for low to high data rate channels that typically experience high data loss. A similar standard is H.264, developed by the ITU-T Video Coding Experts Group (VCEG) together with ISO/IEC MPEG.

The MPEG-x and H.26x standards describe data processing and manipulation techniques that are well suited to the compression and delivery of video, audio, and other information using fixed- or variable-length source coding techniques. In particular, the above standards, and other hybrid coding standards and techniques, compress video information using intra-frame coding techniques (such as run-length coding, Huffman coding, and the like) and inter-frame coding techniques (such as forward and backward predictive coding, motion compensation, and the like). Specifically, in the case of video processing systems, hybrid video processing systems are characterized by prediction-based compression encoding of video frames with intra-frame and/or inter-frame motion-compensated encoding.

A method and apparatus are described for encoding a video stream that includes intra-coded frames, forward and backward predicted frames, and unidirectional predicted temporal scaling frames. Temporal scaling can take place at an originating device, at an intermediate device, or at a receiving device during video delivery.

Intra-frame coding refers to encoding a picture (a field or a frame) without reference to any other picture, although the intra-coded frame can itself be used as a reference for other frames. The terms "intra-frame," "intra-coded frame," and "I frame" are all examples of video objects formed with intra-coding, and are used throughout this application.

Inter-frame coding, or predictive coding, refers to encoding a picture (a field or a frame) with reference to another picture. Compared with intra-coded frames, inter-coded or predicted frames can be coded with greater efficiency. Examples of inter-frames used throughout this application are predicted frames (either forward or backward predicted, also referred to as "P frames"), bidirectional predicted frames (also referred to as "B frames"), and unidirectional predicted temporal scaling frames (also referred to as "P* frames"). Other terms for inter-coding include high-pass coding, residual coding, motion-compensated interpolation, and others well known to those of ordinary skill in the art.

In a typical MPEG decoder, predictively coded pixel blocks (i.e., blocks comprising one or more motion vectors and a residual error component) are decoded with respect to a reference frame, where an intra-frame or another predicted frame can serve as the reference frame. FIG. 1A is a diagram illustrating a conventional MPEG-4 Simple Profile data stream, depicting the frame dependencies of a GOP. GOP 10 is made up of initial I frame 12 followed by several forward predicted P frames 14. The dependency of P frames on a previous I or P frame can limit the temporal scalability available to a system that supports only forward predicted frames, such as systems conforming to the MPEG-4 Simple Profile and the H.264 Baseline Profile. Removing any of the P frames 14 can result in a loss of information that may be crucial to decoding other P frames. Removal of a P frame can result, for example, in video jitter or in the decoder being unable to continue decoding until the next I frame 16, which marks the beginning of the next GOP.
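The dependency chain of FIG. 1A can be illustrated with a minimal Python sketch. It assumes the simplest case, in which every P frame references the immediately preceding I or P frame; the function name and the list-of-strings representation of a GOP are illustrative choices, not part of the described method.

```python
def decodable_frames(frame_types, dropped):
    """Return the indices of frames that can still be decoded.

    Assumption: each P frame references the immediately preceding I or P
    frame, and an I frame references nothing (the chain of FIG. 1A).
    """
    decodable = []
    chain_intact = False
    for index, frame_type in enumerate(frame_types):
        if index in dropped:
            # A missing frame breaks the chain for every later P frame.
            chain_intact = False
            continue
        if frame_type == "I":
            chain_intact = True  # an I frame restarts the chain
        if chain_intact:
            decodable.append(index)
    return decodable


gop = ["I", "P", "P", "P", "P", "I"]
print(decodable_frames(gop, dropped={2}))  # [0, 1, 5]
```

Dropping the single P frame at index 2 also makes frames 3 and 4 undecodable; decoding resumes only at the next I frame.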

One solution to the temporal scalability problem is the bidirectional predicted frame used in the prior art. FIG. 1B is a diagram illustrating a conventional encoded data stream that enables temporal scalability, depicting the frame dependencies of a GOP. GOP 20 is made up of I frame 22A, forward predicted P frames 24, and bidirectional predicted B frames 26. Each B frame can combine forward and backward motion vectors and residual errors referenced to I frame 22A or to forward predicted P frames 24 (backward predicted P frames could also be used but are not shown in this example). I frame 22B marks the beginning of the next GOP. As shown in FIG. 1B, only one B frame 26 is contained between I frame 22A and P frame 24, or between two P frames 24. Several B frames could be inserted between reference frames to allow greater flexibility in temporal scalability. Because no other frame relies on a B frame as a reference, B frames 26 can be removed without losing information needed to decode other frames. This property allows B frames 26 to be inserted into a bitstream from which an encoder, a transcoder, or a decoder can choose to remove them to accommodate channel conditions, bandwidth limitations, battery power, and other considerations. For example, if there are three B frames between reference frames, all three B frames can be removed to reduce the frame rate by three quarters, or the middle B frame can be kept and the other two removed to reduce the frame rate by one half. The data rate decreases accordingly.
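The frame-rate arithmetic in this example can be checked with a short Python sketch. It assumes a repeating pattern of one reference frame followed by a fixed number of B frames; the function and parameter names are illustrative only.

```python
from fractions import Fraction


def remaining_frame_rate(b_between_refs, b_kept):
    """Fraction of the original frame rate left after dropping B frames.

    Assumption: the stream repeats one reference frame followed by
    `b_between_refs` B frames, of which `b_kept` are retained.
    """
    frames_per_period = 1 + b_between_refs  # reference frame plus its B frames
    kept_per_period = 1 + b_kept
    return Fraction(kept_per_period, frames_per_period)


print(1 - remaining_frame_rate(3, 0))  # 3/4 reduction when all three are removed
print(1 - remaining_frame_rate(3, 1))  # 1/2 reduction when only the middle one is kept
```

Keeping the middle B frame rather than either outer one also leaves the surviving frames evenly spaced in time.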

Although bidirectional prediction provides improved compression over forward (unidirectional) prediction alone, it has a downside: it increases computational requirements. Bidirectional predicted frames entail extra encoding complexity because macroblock matching (the most computationally intensive encoding process) must be performed twice for each target macroblock, once with the past reference frame and once with the future reference frame. Introducing B frames also increases the computational complexity of the decoder and complicates scheduling. This increase in complexity is a major reason why the MPEG-4 Simple Profile and the H.264 Baseline Profile do not support bidirectional prediction. These profiles were developed for devices that require efficient use of battery and processing power, such as mobile phones, PDAs, and the like. The present invention provides an effective way to offer temporal scalability to such power-limited devices.

The present invention involves a unidirectional predicted temporal scaling frame that provides temporal scalability without changing any syntax in the MPEG-4 Simple Profile or the H.264 Baseline Profile. A unidirectional predicted temporal scaling frame uses only forward prediction or only backward prediction, rather than both types of prediction as used by conventional B frames. In addition, no other predicted frame may reference a unidirectional predicted temporal scaling frame. Because no other frame relies on a temporal scaling frame, temporal scaling frames can be removed from the bitstream without affecting the remaining frames. As a result, no extra syntax needs to be introduced into the MPEG-4 Simple Profile or the H.264 Baseline Profile. The addition of a single overhead bit can be used to identify a frame as a unidirectional predicted temporal scaling frame rather than a normal predicted frame.
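A minimal Python sketch of this property follows. The `Frame` record, its `droppable` field (a stand-in for the single overhead bit), and the helper functions are hypothetical; the text does not specify how the bit is carried in the bitstream.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Frame:
    index: int                 # position in the stream
    kind: str                  # "I", "P", or "P*" (temporal scaling frame)
    reference: Optional[int]   # index of the single reference frame, None for I
    droppable: bool = False    # hypothetical stand-in for the one overhead bit


def drop_temporal_scaling_frames(frames):
    """Remove every frame flagged as a temporal scaling frame."""
    return [f for f in frames if not f.droppable]


def references_are_intact(frames):
    """True if every remaining frame's reference is still in the stream."""
    present = {f.index for f in frames}
    return all(f.reference is None or f.reference in present for f in frames)


stream = [
    Frame(0, "I", None),
    Frame(1, "P*", 0, droppable=True),
    Frame(2, "P", 0),
    Frame(3, "P*", 2, droppable=True),
    Frame(4, "P", 2),
]
thinned = drop_temporal_scaling_frames(stream)
print([f.kind for f in thinned])       # ['I', 'P', 'P']
print(references_are_intact(thinned))  # True
```

Because no frame lists a P* frame as its reference, removing the flagged frames leaves every remaining reference intact.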

FIG. 2 is a diagram illustrating an example of a forward predicted temporal scalability scheme in accordance with the present invention. GOP 200 includes I frame 210A, P frames 212, and temporal scaling frames 214. As shown in FIG. 2, a single forward predicted frame can be used as the unidirectional predicted temporal scaling frame P* 214 between successive P frames 212. It should be appreciated that multiple unidirectional temporal scaling frames could depend on a single reference frame. Having multiple temporal scaling frames between successive P frames 212 allows finer adaptation to data rate requirements. I frame 210B marks the beginning of the next GOP.

FIG. 3 is a diagram illustrating an example of a backward predicted temporal scalability scheme in accordance with the present invention. GOP 300 includes I frame 310A, P frames 312, and temporal scaling frames 314. As shown in FIG. 3, a single backward predicted frame can be used as the unidirectional predicted temporal scaling frame P* 314 between successive P frames 312. I frame 310B marks the beginning of the next GOP. As seen in both the backward and the forward cases, no other frames reference temporal scaling frames 214 and 314, respectively. Because no frame references them, the temporal scaling frames can be omitted from encoding, transmission, or decoding without affecting any other frame. This allows a gradual reduction in quality and/or data rate, depending on the number of unidirectional predicted temporal scaling frames excluded from transmission or decoding.
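The gradual reduction can be sketched in Python as follows. The per-frame sizes are made-up numbers chosen only for illustration, and the function name is hypothetical.

```python
def bits_after_dropping(frame_bits, droppable, drop_count):
    """Total bits left after omitting the first `drop_count` droppable frames.

    frame_bits: size of each frame in bits (illustrative values only).
    droppable:  indices of the temporal scaling frames in the stream.
    """
    to_drop = set(droppable[:drop_count])
    return sum(bits for i, bits in enumerate(frame_bits) if i not in to_drop)


frame_bits = [4000, 500, 1500, 500, 1500, 500]  # I, P*, P, P*, P, P*
droppable = [1, 3, 5]
for n in range(len(droppable) + 1):
    print(n, bits_after_dropping(frame_bits, droppable, n))
# 0 8500
# 1 8000
# 2 7500
# 3 7000
```

Each additional omitted temporal scaling frame lowers the data rate by one step while every remaining frame stays decodable.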

Because unidirectional predicted temporal scaling frames require less computation than B frames, they can be used advantageously in power-limited or computationally limited devices. Because unidirectional predicted temporal scaling frames are not used to predict subsequent P frames, the coding efficiency of the P frames can decrease compared with using only P frames. This drop in coding efficiency can be tolerated given the added benefit of temporal scalability. The examples of unidirectional predicted temporal scaling frames in FIG. 2 and FIG. 3 reference only one frame. It should be appreciated, however, that a unidirectional predicted temporal scaling frame could reference more than one frame. Referencing more than one preceding or succeeding frame increases computational complexity but can also reduce the size of the residual error.

In addition to the computational benefits, a shorter delay can be achieved when forward predicted unidirectional temporal scaling frames are used instead of bidirectional frames. Bidirectional frames are encoded after the frame from which they are backward predicted, which means there is an additional delay before the B frames can be displayed. FIG. 4 is an illustration of an example of frame ordering for display and encoding processes using forward predicted unidirectional temporal scaling frames of the present invention. As shown in FIG. 4, unlike bidirectional predicted frames, the unidirectional predicted temporal scaling frames of the present invention can be encoded and transmitted in the same order in which they will be displayed at the remote device. The ability to encode and transmit forward predicted unidirectional temporal scaling frames sequentially avoids the additional delay encountered when B frames are used, which can be an added benefit for applications such as video conferencing.
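The ordering difference can be illustrated with a small Python sketch. It assumes that each B frame waits for the next I or P frame in display order, and that the two schemes are not mixed within one stream; the function name is hypothetical.

```python
def coding_order(display_types):
    """Return display indices in the order the frames must be encoded and sent.

    Assumption: a "B" frame waits for the next I or P frame (its future
    reference); every other frame type, including a forward predicted
    "P*" frame, is sent in display order.
    """
    order = []
    pending_b = []
    for index, kind in enumerate(display_types):
        if kind == "B":
            pending_b.append(index)  # needs a future reference first
        else:
            order.append(index)
            if kind in ("I", "P"):
                order.extend(pending_b)  # the future reference is now available
                pending_b = []
    order.extend(pending_b)  # trailing B frames with no later reference here
    return order


print(coding_order(["I", "B", "P", "B", "P"]))    # [0, 2, 1, 4, 3]
print(coding_order(["I", "P*", "P", "P*", "P"]))  # [0, 1, 2, 3, 4]
```

With B frames the coding order differs from the display order, so the receiver must wait for the later reference; with forward predicted P* frames the two orders coincide.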

FIG. 5 is a block diagram of a general communication system for encoding and decoding streaming pictures. System 500 includes encoder device 505 and decoder device 510. Encoder device 505 further includes intra-coding component 515, predictive coding component 520, temporal scaling component 525, and memory component 530. Encoder device 505 can access data from external source 535. External source 535 could be, for example, external memory, the Internet, or a live video and/or audio feed. The data contained in external source 535 can be in a raw (unencoded) or an encoded state. Intra-coding component 515 is used to encode intra-coded frames. Predictive coding component 520 is used to encode predicted frames of all kinds, including unidirectional predicted temporal scaling frames. In addition to the logic used to encode predicted frames, predictive coding component 520 also contains the logic used to choose reference frames and the logic used to exclude temporal scaling frames from being referenced by other frames. Predictive coding component 520 can access raw or encoded data for encoding. Encoded data can be accessed in order to replace normal P frames or I frames with unidirectional predicted temporal scaling frames. When encoded data (either intra-coded or inter-coded) is accessed, the logic contained in intra-coding component 515 and predictive coding component 520 decodes the encoded data to produce reconstructed raw data. This reconstructed raw data is then encoded as a unidirectional predicted temporal scaling frame (or as any other type of frame).

After encoding, the encoded frames are stored in memory component 530 or in external memory. The external memory can be the same as external source 535 or a separate memory component (not shown). The encoded frames are transmitted (Tx) over network 540. Network 540 can be wired or wireless. Temporal scaling component 525 contains logic to determine whether temporal scaling is desired before transmission. Temporal scaling component 525 can also contain logic to identify the temporal scaling frames and to omit them from transmission if it is determined that temporal scaling is desired. The encoding process performed by the encoder device is described more fully below.

Decoder device 510 contains components similar to those of encoder device 505, including intra-decoding component 545, predictive decoding component 550, temporal scaling component 555, and memory component 560. Decoder device 510 can receive encoded data that has been transmitted over network 540, or it can receive encoded data from external storage 565. Intra-decoding component 545 is used to decode intra-coded data. Predictive decoding component 550 is used to decode predicted data, including unidirectional predicted temporal scaling frames. Temporal scaling component 555 contains logic to determine whether temporal scaling is desired before decoding. In this example, temporal scaling component 555 also contains logic to identify the temporal scaling frames and to omit them from decoding if it is determined that temporal scaling is desired. After decoding, the decoded frames can be displayed on display component 570 or stored in internal memory 560 or external storage 565. Display component 570 can be an integrated part of the decoding device, such as a display screen on a phone or a PDA. Display component 570 could also be an external peripheral device. The decoding process performed by the decoder device is described more fully below.

The modifications needed for a decoder device to support unidirectional predicted temporal scaling frames are minor. Because H.264 supports multiple reference coding, no decoder modification may be needed to support unidirectional predicted temporal scaling frames if the baseline decoder can support at least two reference frames. A decoder conforming to the MPEG-4 Simple Profile allows only one reference frame in the buffer. Therefore, after a forward predicted unidirectional temporal scaling frame is decoded, the reference frame is kept in the reference frame buffer for the subsequent P frame, instead of being replaced by the temporal scaling frame that was just decoded.
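The single-reference buffer behaviour can be sketched in Python as follows. This is a simulation of the bookkeeping only, under the assumption that every predicted frame uses whatever frame is currently in a one-slot buffer; the function name and frame labels are illustrative.

```python
def decode_with_single_reference(frame_kinds):
    """Simulate which frame each decoded frame uses as its reference.

    Assumption: the decoder holds exactly one reference frame, as in an
    MPEG-4 Simple Profile decoder. Returns (frame_index, reference_index)
    pairs in decoding order.
    """
    reference = None
    used = []
    for index, kind in enumerate(frame_kinds):
        if kind == "I":
            used.append((index, None))
            reference = index  # an I frame becomes the new reference
        elif kind == "P":
            used.append((index, reference))
            reference = index  # a normal P frame replaces the reference
        elif kind == "P*":
            used.append((index, reference))
            # Temporal scaling frame: the buffer is left unchanged so the
            # next P frame still sees the previous reference.
        else:
            raise ValueError(f"unsupported frame kind: {kind}")
    return used


print(decode_with_single_reference(["I", "P*", "P", "P*", "P"]))
# [(0, None), (1, 0), (2, 0), (3, 2), (4, 2)]
```

The only change relative to a plain I/P decoder is the branch that skips the buffer update after a P* frame.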

In addition to the encoding and decoding devices, temporal scaling can take place at an intermediate device known as a transcoder. Referring to FIG. 6, a block diagram of a transcoder device is shown. Transcoder device 600 is situated between first network 605 and second network 620. Transcoder device 600 receives encoded data over first network 605 from a device such as encoder device 505 depicted in FIG. 5. Transcoder device 600 stores the received data in memory component 615. The transcoder device also contains temporal scaling component 610. Temporal scaling component 610 contains logic to determine whether temporal scaling is desired before transmission on second network 620. Temporal scaling component 610 can also contain logic to identify the temporal scaling frames and to omit them from transmission if it is determined that temporal scaling is desired. The transcoding process performed by transcoder device 600 is described more fully below.
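One possible shape for the temporal scaling logic is sketched below in Python. The bit-budget criterion, the function name, and the (kind, bits) tuples are assumptions made for illustration; the text does not specify how the component decides that temporal scaling is desired.

```python
def frames_to_forward(frames, budget_bits):
    """Pick the frames to send on the second network.

    frames:      list of (kind, bits) tuples; "P*" marks a temporal scaling frame.
    budget_bits: hypothetical bit budget for this group of frames.
    Temporal scaling frames are omitted, earliest first, until the total
    fits the budget. Other frames are never dropped, so the result may
    still exceed the budget if the I and P frames alone are too large.
    """
    total = sum(bits for _, bits in frames)
    if total <= budget_bits:
        return list(frames)  # temporal scaling is not needed
    kept = []
    for kind, bits in frames:
        if kind == "P*" and total > budget_bits:
            total -= bits  # omit this temporal scaling frame
            continue
        kept.append((kind, bits))
    return kept


received = [("I", 4000), ("P*", 500), ("P", 1500), ("P*", 500), ("P", 1500)]
print([kind for kind, _ in frames_to_forward(received, 7600)])
# ['I', 'P', 'P*', 'P']
```

The same selection could run in the encoder's or the decoder's temporal scaling component, with a different criterion such as battery level or decoding time.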

FIG. 7 is a flowchart illustrating an example of an encoding process including temporal scaling in accordance with the invention. The encoding process takes place in an encoder such as device 505 depicted in FIG. 5. Digital video data is encoded into GOPs, each made up of a plurality of frames. A GOP starts with an intra-coded frame, encoded at 720. The intra-coded frame serves as a reference point for at least some of the subsequent inter frames (or for previous inter frames in the case of backward prediction with an open GOP, where an open GOP can reference frames from another GOP). Encoding process 700 also includes encoding predicted frames (730), which can include forward or backward predicted frames. The predicted frames can contain motion compensation data, such as motion vectors and residual error, that can be referenced to a previous intra-coded or predicted frame. Predicted frames can also serve as reference frames for other predicted frames (both normal and temporal scaling frames). Encoding unidirectionally predicted temporal scaling frames (740) enables temporal scalability. These frames can be computed in a manner similar to the predicted frames (730), in that they can contain motion compensation referenced to an intra-coded or predicted frame. The temporal scaling frames themselves, however, cannot be referenced by another frame (i.e., a temporal scaling frame cannot be used to predict any other frame). The temporal scaling frame data can also contain overhead information that identifies the frame as a temporal scaling frame. Because other frames do not depend on the presence of the temporal scaling frames, the temporal scaling frames can be removed without adversely affecting other frames. The encoded frames can be stored in memory for subsequent delivery (750). The encoded frames could also be delivered after encoding without the storing step 750.
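The constraint described above, namely that a temporal scaling frame may reference an intra-coded or predicted frame but may never itself be referenced, can be illustrated with the following Python sketch. The frame labels "I", "P" and "T", the `ref` field and the particular GOP layout are assumptions chosen for illustration and are not taken from the figures.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Frame:
    number: int                # position in the stream
    kind: str                  # "I", "P" or "T" (temporal scaling); labels are illustrative
    ref: Optional[int] = None  # number of the single reference frame; None if intra-coded


def build_gop() -> List[Frame]:
    """One illustrative GOP in which each T frame is predicted from a preceding P frame."""
    return [
        Frame(0, "I"),
        Frame(1, "P", ref=0),
        Frame(2, "T", ref=1),
        Frame(3, "P", ref=1),
        Frame(4, "T", ref=3),
    ]


def is_valid(gop: List[Frame]) -> bool:
    """True if no frame is predicted from a temporal scaling frame."""
    kinds = {f.number: f.kind for f in gop}
    return all(f.ref is None or kinds[f.ref] != "T" for f in gop)


def references_intact(frames: List[Frame]) -> bool:
    """True if every remaining frame still has its reference frame available."""
    present = {f.number for f in frames}
    return all(f.ref is None or f.ref in present for f in frames)


gop = build_gop()
without_t = [f for f in gop if f.kind != "T"]      # temporal scaling frames removed
without_p1 = [f for f in gop if f.number != 1]     # a referenced P frame removed
```

Here `references_intact(without_t)` holds because nothing depends on the T frames, whereas `references_intact(without_p1)` fails because frames 2 and 3 are predicted from frame 1.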

Encoding process 700 can continue to encode GOPs until the video data is exhausted (710). A GOP can be made up of different numbers of frames of different frame types to meet different objectives. Encoding a larger number of temporal scaling frames in a GOP provides more flexibility in adjusting the quality or complexity of delivering or decoding that GOP.

FIG. 8 is a flowchart of an example of a video delivery process including temporal scaling in accordance with the invention. The left side of FIG. 8 corresponds to a process in a video source, such as encoder device 505 depicted in FIG. 5, and the right side corresponds to a process in a destination device, such as decoder device 510 depicted in FIG. 5. A wired/wireless network, which can be a combination of wired or wireless networks, connects the two sides. Transitions to a new network can include a transcoder device, such as transcoder device 600 depicted in FIG. 6. Process 800 in FIG. 8 starts by retrieving video frame data from memory (810). This memory can be permanent memory created previously, or it can be dynamic memory that holds frame data being computed at the time of transmission.

A decision is made whether or not to temporally scale the video data (820). Factors considered in the decision can include, for example: providing a quality level lower than the maximum, lowering the data rate below a maximum capability of one of the networks, controlling traffic, preserving battery power of a source or a destination device, or limiting the time to encode and/or decode. If temporal scaling is to be performed, temporal scaling frames are identified and selectively removed from the data stream (830). Because no frame references a temporal scaling frame, removing any unidirectionally predicted temporal scaling frame will not affect any other frame. Identification can take a number of forms including, for example, a single overhead bit or flag that, when set equal to one, identifies the frame as a temporal scaling frame. This overhead bit or flag may be coded using standards-compliant syntax or in a proprietary fashion. If the bitstream is to be standards (or specification) compliant, the temporal scaling frames could instead be identified through mutual a priori encoder-server communication (in the case of network adaptation) or a mutual a priori encoder-decoder identifier (in the case of device complexity/power adaptation). The mutual a priori identifier may be, for example, frame location (e.g., odd or even frame numbers), decoding or presentation timestamps, or frame ordering. Another form of identification could involve the decoder using information in the bitstream regarding whether a frame is referenced by another frame. The video frames that are not removed are transmitted to the destination device over the wired and/or wireless network(s) (840). There can be multiple destination devices in the case of multicast delivery, or a single destination device in the case of unicast delivery.
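Two of the identification forms mentioned above, an overhead flag carried with the frame and a mutually agreed a priori rule based on frame position, may be sketched as follows. This is a minimal illustration only: the field name `overhead_flag` and the choice of odd frame numbers as the a priori rule are assumptions, and no particular bitstream syntax is implied.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Frame:
    number: int
    overhead_flag: int = 0  # hypothetical one-bit field: 1 marks a temporal scaling frame


def flagged(frame: Frame) -> bool:
    """Identification by an overhead bit set equal to one."""
    return frame.overhead_flag == 1


def odd_numbered(frame: Frame) -> bool:
    """Identification by an a priori rule: odd frame numbers are temporal scaling frames."""
    return frame.number % 2 == 1


def remove_temporal(frames: List[Frame], is_temporal: Callable[[Frame], bool]) -> List[Frame]:
    """Selectively remove the frames identified as temporal scaling frames."""
    return [f for f in frames if not is_temporal(f)]


stream = [Frame(0), Frame(1, 1), Frame(2), Frame(3, 1)]
by_flag = remove_temporal(stream, flagged)
by_position = remove_temporal(stream, odd_numbered)
```

With the a priori rule nothing extra is written into the bitstream, which is why that form suits a stream that must remain standards compliant.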

At the destination device (a decoder such as decoder device 510 of FIG. 5), or at an intermediate network device (a router or a transcoder such as device 600 of FIG. 6), the encoded video data is retrieved from the network (850). After retrieving the data, the destination device or the intermediate network device, respectively, can decide whether or not to provide temporal scaling (860). Reasons for temporal scaling can be similar to those at the video source, particularly, for an intermediate network router, with regard to network capability or network loading. Reasons can also include, for example, preservation of battery power, especially for resource-limited devices such as PDAs, mobile phones and the like. If temporal scaling is chosen, temporal scaling frames are identified and omitted to meet a target parameter, for example a data rate or a decoding time. After omitting temporal scaling frames, the remaining frames are decoded (880) in a manner determined by their type (e.g., intra-coded decoding, forward predicted decoding, etc.).
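As a hedged illustration of omitting temporal scaling frames to meet a target parameter, the sketch below drops removable frames until an average data rate target is met. The function name `scale_to_rate`, the `Frame` fields and the latest-first dropping order are assumptions made for the example; the disclosure does not prescribe which temporal scaling frames are omitted first.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Frame:
    number: int
    size_bits: int
    removable: bool  # True for a temporal scaling frame


def scale_to_rate(frames: List[Frame], duration_s: float, target_bps: float) -> List[Frame]:
    """Omit removable frames, latest first, until the average rate meets the target.

    Best effort: if the non-removable frames alone exceed the target,
    all removable frames are omitted and the rest are returned unchanged.
    """
    kept = list(frames)
    total = sum(f.size_bits for f in kept)
    for f in reversed(frames):
        if total / duration_s <= target_bps:
            break
        if f.removable:
            kept.remove(f)
            total -= f.size_bits
    return kept


received = [
    Frame(0, 8000, False),
    Frame(1, 2000, True),
    Frame(2, 4000, False),
    Frame(3, 2000, True),
]
to_decode = scale_to_rate(received, duration_s=1.0, target_bps=14_000)
```

The frames left in `to_decode` would then be decoded according to their type. The same routine could use a decoding-time budget instead of a data rate by substituting a per-frame time estimate for `size_bits`.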

The temporal scaling decision making and removal processes discussed above can be performed at an encoder such as encoder device 505 (FIG. 5), a transcoder such as transcoder device 600 (FIG. 6), or a decoder such as decoder device 510 (FIG. 5). One or more of these three devices can be involved in deciding to remove temporal scaling frames within the same bitstream.

While, for purposes of simplicity of explanation, the methods shown in FIGS. 7-8 are shown and described as a series of acts, it is to be understood and appreciated that the invention is not limited by the order of acts, as some acts may, in accordance with the invention, occur in different orders and/or concurrently with other acts from that shown and described herein.

While the invention has been fully described in connection with the use of intra frames and forward predicted frames as reference frames for the unidirectionally predicted temporal scaling frames, it should be understood that other frames, such as backward predicted frames, could serve as reference frames as well.

While the invention has been fully described in connection with MPEG-x and H.26x type compression schemes, it should be understood that other video compression schemes can also implement the methods of the invention.

Aspects of the invention include, but are not limited to, the descriptions below.

A method of encoding multimedia frames, comprising encoding a removable temporal scaling frame by unidirectionally predicting the removable temporal scaling frame, wherein the removable temporal scaling frame cannot be used to predict any other frame.

An apparatus for encoding multimedia frames, comprising means for encoding a removable temporal scaling frame by unidirectionally predicting the removable temporal scaling frame, wherein the removable temporal scaling frame cannot be used to predict any other frame.

An electronic device for encoding multimedia frames, the electronic device configured to encode a removable temporal scaling frame by unidirectionally predicting the removable temporal scaling frame, wherein the removable temporal scaling frame cannot be used to predict any other frame.

A computer readable medium having instructions for causing a computer to execute a method of encoding multimedia frames, the method comprising: encoding an intra-coded frame that is not predicted from another frame; encoding a predicted frame, wherein the predicted frame is predicted from at least one intra-coded or predicted frame; and encoding a removable temporal scaling frame by unidirectionally predicting the removable temporal scaling frame, wherein the removable temporal scaling frame cannot be used to predict any other frame.

A method of decoding multimedia frames, comprising: receiving encoded frame data; identifying any removable temporal scaling frame that is unidirectionally predicted, wherein the removable temporal scaling frame cannot be used to predict any other frame; and decoding the received encoded frames so as to omit at least one removable temporal scaling frame from being decoded.

An apparatus for decoding multimedia frames, comprising: means for receiving encoded frame data; means for identifying any removable temporal scaling frame that is unidirectionally predicted, wherein the removable temporal scaling frame cannot be used to predict any other frame; and means for decoding the received encoded frames so as to omit at least one removable temporal scaling frame from being decoded.

An electronic device for decoding multimedia frames, the electronic device configured to: receive encoded frame data; identify any removable temporal scaling frame that is unidirectionally predicted, wherein the removable temporal scaling frame cannot be used to predict any other frame; and decode the received encoded frame data so as to omit at least one removable temporal scaling frame from being decoded.

A computer readable medium having instructions for causing a computer to execute a method of decoding multimedia frames, the method comprising: receiving encoded frame data; identifying any removable temporal scaling frame that is unidirectionally predicted, wherein the removable temporal scaling frame cannot be used to predict any other frame; and decoding the received encoded frame data so as to omit at least one removable temporal scaling frame from being decoded.

A method of temporally scaling multimedia frames, comprising: receiving an encoded frame over a first network; receiving a removable temporal scaling frame over the first network, wherein the removable temporal scaling frame is unidirectionally predicted from at least one encoded frame and cannot be used to predict any other frame; transmitting the received encoded frame over a second network; and omitting the removable temporal scaling frame from transmission.

An apparatus for temporally scaling multimedia frames, comprising: means for receiving an encoded frame over a first network; means for receiving a removable temporal scaling frame over the first network, wherein the removable temporal scaling frame is unidirectionally predicted from at least one encoded frame and cannot be used to predict any other frame; means for transmitting the accessed encoded frame over a second network; and means for omitting the removable temporal scaling frame from transmission.

Those of ordinary skill in the art will understand that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.

Those of ordinary skill in the art will further appreciate that the various illustrative logical blocks, modules, and algorithm steps described in connection with the examples disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the invention.

The various illustrative logical blocks, modules, and circuits described in connection with the examples disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, for example a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.

The steps of a method or algorithm described in connection with the examples disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), flash memory, read-only memory (ROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an application specific integrated circuit (ASIC). The ASIC may reside in a modem. In the alternative, the processor and the storage medium may reside as discrete components in the modem.

The previous description of the disclosed examples is provided to enable any person of ordinary skill in the art to make or use the invention. Various modifications to these examples will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other examples without departing from the spirit or scope of the invention.

Thus, a method, apparatus and system have been described for encoding, transcoding and decoding a video stream that includes intra-coded frames, forward and backward predicted frames, and unidirectionally predicted temporal scaling frames.

10 ... group of pictures (GOP)
12 ... initial I frame
14 ... forward predicted P frame
16 ... next I frame
22A ... I frame
22B ... I frame
24 ... forward predicted P frame
26 ... bidirectionally predicted B frame
200 ... group of pictures (GOP)
210A ... I frame
210B ... I frame
212 ... P frame
214 ... temporal scaling frame
300 ... group of pictures (GOP)
310A ... I frame
310B ... I frame
312 ... P frame
314 ... temporal scaling frame
500 ... system
505 ... encoder device
510 ... decoder device
515 ... intra-coding element
520 ... predictive coding element
525 ... temporal scaling element
530 ... memory element
535 ... external source
540 ... network
545 ... intra-decoding element
550 ... predictive decoding element
555 ... temporal scaling element
560 ... memory element (internal memory)
565 ... external storage
600 ... transcoder device
605 ... first network
610 ... temporal scaling element
615 ... memory element
620 ... second network

FIG. 1A is a diagram illustrating a conventional MPEG-4 Simple Profile data stream.
FIG. 1B is a diagram illustrating a conventional encoded data stream that enables temporal scalability.
FIG. 2 is a diagram illustrating an example of a forward predicted temporal scalability scheme in accordance with the invention.
FIG. 3 is a diagram illustrating an example of a backward predicted temporal scalability scheme in accordance with the invention.
FIG. 4 is an illustration of an example of frame ordering for the display and encoding processes using forward predicted unidirectional temporal scaling frames of the invention.
FIG. 5 is a block diagram of a general communications system for encoding and decoding streaming pictures.
FIG. 6 is a block diagram of a transcoder device.
FIG. 7 is a flowchart illustrating an example of an encoding process including temporal scaling in accordance with the invention.
FIG. 8 is a flowchart of an example of a video delivery process including temporal scaling in accordance with the invention.

Claims

A method of encoding multimedia frames in a single-layer bitstream at an encoder, comprising: encoding removable temporal scaling multimedia frames in the single-layer bitstream by unidirectionally backward predicting, for a display order, all of the removable temporal scaling multimedia frames in the single-layer bitstream; encoding the removable temporal scaling multimedia frames with overhead data by which a decoder identifies a multimedia frame as a removable temporal scaling multimedia frame, wherein, in order to temporally scale a data rate of the single-layer bitstream, the removable temporal scaling multimedia frames are encoded to be removable by the decoder based on the overhead data; and encoding all of the multimedia frames in the single-layer bitstream, including the removable temporal scaling multimedia frames, without using any of the removable temporal scaling multimedia frames to predict the multimedia frames, wherein the method is performed by one or more processors of the encoder.

The method of claim 1, further comprising: encoding at least one of the other multimedia frames as an intra-coded frame, wherein the intra-coded frame is not predicted from another frame.

The method of claim 2, further comprising: encoding at least one of the other multimedia frames as a predicted frame, wherein the predicted frame is predicted from at least one intra-coded or predicted frame.

The method of claim 3, wherein encoding the predicted frame comprises forward predicting the predicted frame.

The method of claim 1, further comprising: storing the encoded frames in a memory.

The method of claim 3, further comprising: transmitting the encoded frames over a network.

The method of claim 3, further comprising: transmitting the encoded intra-coded frame and the encoded predicted frame over a network, and omitting the encoded removable temporal scaling multimedia frames from the transmission.

The method of claim 3, further comprising: encoding the predicted frame with motion vector and residual error data; and encoding the removable temporal scaling multimedia frames with motion vector and residual error data.

The method of claim 6, further comprising: receiving the transmitted frames; and decoding the received frames.

The method of claim 6, further comprising: receiving the transmitted frames; and decoding the received intra-coded frame and the received predicted frame while omitting the received removable temporal scaling multimedia frames.

The method of claim 6, further comprising: receiving the transmitted frames; and identifying each of the received removable temporal scaling multimedia frames with an a priori identifier.

An electronic device for encoding multimedia frames in a single-layer bitstream, the electronic device configured to: encode removable temporal scaling multimedia frames in the single-layer bitstream by unidirectionally backward predicting, for a display order, all of the removable temporal scaling multimedia frames in the single-layer bitstream; encode the removable temporal scaling multimedia frames with overhead data by which a decoder identifies a multimedia frame as a removable temporal scaling multimedia frame, wherein, in order to temporally scale a data rate of the single-layer bitstream, the removable temporal scaling multimedia frames are encoded to be removable by the decoder based on the overhead data; and encode all of the multimedia frames, including the removable temporal scaling multimedia frames, without using any of the removable temporal scaling multimedia frames to predict the multimedia frames.

The electronic device of claim 12, further configured to encode at least one of the other multimedia frames as an intra-coded frame, wherein the intra-coded frame is not predicted from another frame.

The electronic device of claim 13, further configured to encode at least one of the other multimedia frames as a predicted frame, wherein the predicted frame is predicted from at least one intra-coded or predicted frame.

The electronic device of claim 14, further configured to encode the predicted frame using forward prediction.

The electronic device of claim 12, further configured to store the encoded frames in a memory.

The electronic device of claim 14, further configured to transmit the encoded frames over a network.

The electronic device of claim 14, further configured to transmit the encoded intra-coded frame and the encoded predicted frame over a network, and to omit the encoded removable temporal scaling multimedia frames from the transmission.

The electronic device of claim 14, further configured to encode the predicted frame with motion vector and residual error data, and to encode the removable temporal scaling multimedia frames with motion vector and residual error data.

A non-transitory computer readable medium having instructions for causing a computer to perform the method of encoding multimedia frames of claim 1.

A method of decoding multimedia frames at a decoder, comprising: receiving encoded frame data of one or more removable temporal scaling multimedia frames in a single-layer bitstream and of other multimedia frames in the single-layer bitstream, wherein all of the removable temporal scaling multimedia frames in the single-layer bitstream are unidirectionally backward predicted for a display order, wherein the removable temporal scaling multimedia frames are encoded with overhead data for identifying a multimedia frame as a removable temporal scaling multimedia frame, wherein the multimedia frames, including the removable temporal scaling multimedia frames, are encoded without using any of the removable temporal scaling multimedia frames to predict the multimedia frames, and wherein, in order to temporally scale a data rate of the single-layer bitstream, the removable temporal scaling multimedia frames are encoded to be removable; identifying, based on the overhead data, at least one of the unidirectionally predicted removable temporal scaling multimedia frames; and decoding the received encoded frame data so as to omit at least one of the removable temporal scaling multimedia frames from being decoded, wherein the method is performed by one or more processors of the decoder.

The method of claim 21, further comprising: receiving at least one intra-coded frame with the encoded frame data, wherein the at least one intra-coded frame is not predicted from another frame; and decoding the intra-coded frame.

The method of claim 22, further comprising: receiving at least one predicted frame with the encoded frame data, wherein the predicted frame is predicted from at least one encoded frame; and decoding the predicted frame.

The method of claim 21, wherein the receiving comprises receiving over a wireless network.

The method of claim 23, further comprising: receiving a predicted frame that is forward predicted.

The method of claim 21, further comprising: identifying each of the received removable temporal scaling multimedia frames with an a priori identifier.

An electronic device for decoding multimedia frames, the electronic device configured to: receive encoded frame data comprising one or more removable temporal scaling multimedia frames in a single-layer bitstream, all of the removable temporal scaling frames being unidirectionally backward predicted for a display order, and other multimedia frames in the single-layer bitstream, wherein the removable temporal scaling multimedia frames are encoded with overhead data for identifying a multimedia frame as a removable temporal scaling multimedia frame, wherein the multimedia frames, including the removable temporal scaling multimedia frames, are encoded without using any of the removable temporal scaling multimedia frames to predict the multimedia frames, and wherein, in order to temporally scale a data rate of the single-layer bitstream, the removable temporal scaling multimedia frames are encoded to be removable; identify, based on the overhead data, at least one of the unidirectionally predicted removable temporal scaling multimedia frames; and decode the received encoded frame data so as to omit at least one of the removable temporal scaling multimedia frames from being decoded.

The electronic device of claim 27, further configured to receive at least one intra-coded frame with the encoded frame data, wherein the at least one intra-coded frame is not predicted from another frame, and to decode the intra-coded frame.

The electronic device of claim 28, further configured to receive at least one predicted frame with the encoded frame data, wherein the predicted frame is predicted from at least one encoded frame, and to decode the predicted frame.

The electronic device of claim 27, further configured to receive the encoded frame data over a wireless network.

The electronic device of claim 29, further configured to receive a predicted frame that is forward predicted.

The electronic device of claim 29, further configured to identify each of the received removable temporal scaling frames with an a priori identifier.

A computer readable medium having instructions for causing a computer to perform the method of decoding multimedia frames of claim 21.