TWI323130B - Device for video decoding - Google Patents

Device for video decoding

Info

Publication number
TWI323130B
Authority
TW
Taiwan
Prior art keywords
decoding
video decoding
index
memory
syntax
Prior art date
Application number
TW95120653A
Other languages
Chinese (zh)
Other versions
TW200746832A (en)
Inventor
Jiunin Guo
Yaochang Yang
Original Assignee
Jiunin Guo
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiunin Guo filed Critical Jiunin Guo
Priority to TW95120653A priority Critical patent/TWI323130B/en
Publication of TW200746832A publication Critical patent/TW200746832A/en
Application granted granted Critical
Publication of TWI323130B publication Critical patent/TWI323130B/en

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Compression Of Band Width Or Redundancy In Fax (AREA)
  • Image Processing (AREA)

Description

IX. Description of the Invention

[Technical Field]

The present invention relates to a device for video decoding, and more particularly to a video decoding device that reduces the computational load of entropy decoding in the H.264/AVC video compression standard.

[Prior Art]

The H.264/AVC video compression standard specifies two new entropy coding algorithms: Context Adaptive Variable Length Coding (CAVLC), used in the Baseline Profile, and Context-based Adaptive Binary Arithmetic Coding (CABAC), used in the Main Profile.

Because CAVLC refers to the surrounding context, it can choose a coding mode that better matches the probability distribution and thus improves coding efficiency. CABAC, in turn, uses the symbols that have just been coded to estimate the conditional probability of the next symbol, which greatly reduces inter-symbol redundancy, and it adaptively adjusts the code assigned to each symbol to follow the probability distribution, so it achieves a very high compression ratio.

Both algorithms save more than 50% of the bit-rate compared with the conventional Variable Length Coding (VLC) used in earlier entropy coders, and CABAC saves a further 9% to 14% of the bit-rate compared with CAVLC. The price is that CABAC requires more than 10% additional computational complexity: when decoding the same samples, CABAC spends over 10% more computation than CAVLC. To reach the targets of high-definition television (HDTV) and real-time CABAC decoding, raising the overall performance and throughput under this heavy computational load is an issue that must be overcome.

In the Main Profile, apart from the slice header, which is handled by the slice parser, most of the data, such as the macroblock (MB) information and the residual data, is coded by CABAC. A slice is a collection of macroblocks, and a picture is composed of many slices. The slice is the smallest self-decodable unit of the H.264/AVC standard: a slice can be decoded from its own compressed data alone, without relying on other slices. The benefit is that, at the receiving end, each slice can be decoded as soon as its compressed data has been received, without waiting for the whole picture, and if data is lost or corrupted during transmission only that slice is affected; the other slices remain intact.

In addition, CABAC defines a probability model that is continuously updated with the probabilities of occurrence of "0" and "1" and is used to code each sample. This allows samples to be coded more efficiently with fewer bits, but it also creates a strong probability dependency between samples: the next sample cannot be decoded until the previous sample has finished updating the probability model. This serial flow severely limits the data throughput per unit time. The limitation is a computational bottleneck, and it is the problem that must be solved in a CABAC decoder design.
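For illustration only, the serial dependency described above can be sketched in C as follows; this sketch is not part of the original disclosure, and the type and function names (ArithState, Context, decode_bin, and so on) are assumptions.

/* Minimal sketch of why CABAC bin decoding is inherently serial.
 * Illustrative only; names and structures are assumptions, not the
 * patent's implementation. */
#include <stdint.h>

typedef struct {             /* arithmetic-decoder registers            */
    uint32_t codIRange;      /* coding range                            */
    uint32_t codIOffset;     /* coding offset                           */
} ArithState;

typedef struct {             /* one context (index) entry               */
    uint8_t state;           /* probability state                       */
    uint8_t mps;             /* most probable symbol                    */
} Context;

/* Decode one bin.  Both the context and the arithmetic state are read
 * AND written, so the next call cannot start before this one has
 * finished updating them. */
extern int decode_bin(ArithState *a, Context *ctx);

int decode_bins(ArithState *a, Context *ctx, int num_bins)
{
    int value = 0;
    for (int i = 0; i < num_bins; ++i) {
        int bin = decode_bin(a, ctx);    /* bin i+1 depends on bin i    */
        value = (value << 1) | bin;
    }
    return value;
}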
[Summary of the Invention]

To solve the above and other problems and to achieve the technical advantages claimed herein, the present invention provides a device for video decoding.

An object of the present invention is therefore to provide a video decoding device that uses a pipeline schedule controller to arrange the overall decoding flow, shorten the decoding time, and improve decoding performance.

Another object of the present invention is to provide a video decoding device that uses a context (index) table partitioned into several small blocks, so that no extra time is wasted waiting for memory access and the efficiency of accessing context data is improved.

According to these objects, a video decoding device with a pipeline schedule controller is proposed. In a preferred embodiment, the device contains three decoding engines, a context module, and a probability module. The three decoding engines are decode decision, decode bypass, and decode terminate. When a symbol is decoded, only one of the engines is enabled, selected according to the requirements of the current syntax element, and after decoding the values in the context module and the probability module are updated. In addition, for the decode decision engine, the invention provides a look-ahead parsing technique that allows one extra symbol to be decoded within a single cycle.

Also according to these objects, a partitioned context table and a cached context register are proposed for effective memory management. The partitioned context table allows several context entries to be accessed in the same cycle. The cached context register pre-fetches context data that will be used shortly; furthermore, when an inner decoding loop occurs in the syntax, only a few specific context entries are used, so the cached register holds the context data that is used repeatedly during the loop and writes the updated values back to memory only when the loop ends.

The invention effectively raises the throughput per unit time and also provides a way to shut down part of the decoding engines to lower power consumption. Moreover, because the memory does not have to be accessed continuously, power consumption is reduced indirectly.
[Embodiments]

Referring to FIG. 1, which is an architecture diagram of a CABAD (context-based adaptive binary arithmetic decoding) core at the decoding-system level according to a preferred embodiment of the present invention, the system comprises a CABAD decoding core 100, a bit-stream manager 110, a syntax parser 120, a system controller 130, a macroblock syntax-information memory 140, and a macroblock coefficient memory 150.

The bit-stream manager 110 first passes the bit stream to the syntax parser 120 to parse the slice header, and then passes the bit stream to the CABAD decoding core 100. The system controller 130 first prepares the syntax information of the upper macroblock 121 and the left macroblock 122 from the macroblock syntax-information memory 140. After decoding, the CABAD decoding core 100 produces two kinds of data: the syntax information of the current macroblock, which is written through the system controller 130 into the macroblock syntax-information memory 140, and the residual data, which is written into the macroblock coefficient memory 150 for the inverse quantization (IQ) and inverse transform (IT) modules to perform data reconstruction.
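The data flow of FIG. 1 can be summarized by the following sketch; it is illustrative only, and every function and type name in it (parse_slice_header, prepare_neighbor_info, MbSyntax, and so on) is an assumption rather than an interface defined by the patent.

/* Illustrative per-slice flow for the FIG. 1 system.
 * All names and the struct contents are assumptions for this sketch. */
typedef struct { int mb_type; /* ... */ } MbSyntax;   /* syntax info, memory 140  */
typedef struct { short coeff[384]; }       Residual;  /* coefficients, memory 150 */
typedef struct BitStream BitStream;                   /* opaque bit-stream handle */

extern void parse_slice_header(BitStream *bs);                       /* syntax parser 120    */
extern const MbSyntax *neighbor_top (const MbSyntax *mem, int mb);   /* from memory 140      */
extern const MbSyntax *neighbor_left(const MbSyntax *mem, int mb);
extern void prepare_neighbor_info(const MbSyntax *top,
                                  const MbSyntax *left);             /* system controller 130 */
extern void cabad_decode_mb(BitStream *bs, MbSyntax *cur,
                            Residual *res);                          /* CABAD core 100       */

void decode_slice_system(BitStream *bs, MbSyntax *syntax_mem,
                         Residual *coeff_mem, int num_mbs)
{
    parse_slice_header(bs);                          /* slice header handled by parser 120 */
    for (int mb = 0; mb < num_mbs; ++mb) {
        prepare_neighbor_info(neighbor_top(syntax_mem, mb),
                              neighbor_left(syntax_mem, mb));
        cabad_decode_mb(bs, &syntax_mem[mb], &coeff_mem[mb]);
        /* syntax_mem[mb] is written back to memory 140; coeff_mem[mb]
           feeds memory 150 for IQ/IT data reconstruction.            */
    }
}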
Referring to FIG. 2, which is a flowchart of CABAD decoding according to a preferred embodiment of the present invention, before each slice is decoded the context table is initialized (Initialize Context Table 200) and the probability model is initialized (Initialize Probability Model 210). The probability module 240 holds two values: the coding range (codIRange) and the coding offset (codIOffset). To decode a symbol, the context table 230 is first queried using the syntax information of the upper and left macroblocks 220 together with the base index 221 and the bin index 222 of the syntax element being parsed; the resulting context value contains two elements, a state and the most probable symbol (MPS). From the outputs of the context table 230 and the probability module 240, the decode-symbol stage 250 produces one symbol (also called a bin). The decode-symbol stage 250 contains three decoding engines, one of which is selected for each decoding operation; afterwards the decode-symbol stage 250 updates the context value and the probability values and writes them back to the context table 230 and the probability module 240, respectively.

Meanwhile, updating the probability module 240 may require renormalization from the bit-stream data 260, and every produced symbol (bin) is examined by the bin-string analysis 270 to check whether the bin string of the current syntax element is complete. If the current syntax element has been fully decoded, there are three cases. First, if the current syntax element 281 is mb_type and its decoded value is I_PCM, the probability model 210 must be re-initialized before the next syntax element is decoded. Second, if the current syntax element 282 is end_of_slice_flag and its decoded value is 1, the current slice has been completely decoded, so the context table 200 and the probability model 210 must be re-initialized and the process described at the beginning of this paragraph is repeated. Third, in the remaining cases decoding simply continues with the next syntax element. If the current syntax element has not yet been fully decoded, the next bin of the syntax element is decoded.
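A simplified reading of the FIG. 2 control flow is sketched below; the helper functions and the syntax-element identifiers are assumptions, and the I_PCM value is shown only for illustration.

/* Simplified control flow of the FIG. 2 decoding loop.
 * Helper functions and identifiers are assumptions for this sketch. */
extern void init_context_table(void);               /* step 200             */
extern void init_probability_model(void);           /* step 210             */
extern int  next_syntax_element(void);              /* driven by parser     */
extern int  decode_syntax_element(int syntax_id);   /* steps 220 to 270     */

#define SE_MB_TYPE        1   /* illustrative syntax-element IDs            */
#define SE_END_OF_SLICE   2
#define MB_TYPE_I_PCM    25   /* I_PCM value for I slices; for illustration */

void decode_slice_cabad(void)
{
    init_context_table();                    /* once per slice (200)        */
    init_probability_model();                /* once per slice (210)        */

    for (;;) {
        int syntax_id = next_syntax_element();
        int value     = decode_syntax_element(syntax_id);

        if (syntax_id == SE_MB_TYPE && value == MB_TYPE_I_PCM) {
            init_probability_model();        /* case 281: I_PCM macroblock  */
        } else if (syntax_id == SE_END_OF_SLICE && value == 1) {
            break;                           /* case 282: slice finished    */
        }
        /* remaining case: simply continue with the next syntax element     */
    }
}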

Referring to FIG. 3, which is a schematic diagram of the CABAD decoding engines according to a preferred embodiment of the present invention, FIG. 3 shows the decode-symbol stage 250 of FIG. 2. This block contains three decoding engines: decode decision 300, decode bypass 310, and decode terminate 320. The most frequently used engine is decode decision 300, with a usage rate of over ninety percent. Decode decision 300 decodes according to the probability module 240 and the context value read from the context table 230 of FIG. 2, and it must also consult three tables, namely the LPS range table (rangeLPS) 330, the LPS state-transition table (transIdxLPS) 331, and the MPS state-transition table (transIdxMPS) 332, to update the context value 340. The remaining two engines need only the probability module 240 described in FIG. 2.

Because only one decoding engine is used at a time, the invention proposes shutting down the other two engines to reduce power consumption. Since this guarantees that only one engine produces an output at any time, the final output can be merged with a single OR logic gate instead of a multiplexer, which also lowers the hardware-area cost.

Referring to FIG. 4(a), which is a schematic diagram of a conventional decode-decision engine, the engine contains two cores: the most probable symbol (MPS) core 410 and the least probable symbol (LPS) core 420. Through a few arithmetic operations the two cores each produce a set of results, including the produced bin value, the new context value, and the new probability-module values, and a multiplexer 430 finally selects the correct set of outputs. With this conventional decode-decision architecture, at most one bin can be produced per cycle.
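The MPS core 410 and the LPS core 420 realize the decode-decision procedure of H.264/AVC CABAC; the following C sketch shows that structure for reference. The table contents are omitted, renormalization is reduced to a helper call, and the names are assumptions, so this is a sketch rather than the patent's circuit.

/* Sketch of the decode-decision path (MPS core 410 / LPS core 420).
 * Follows the general H.264/AVC CABAC decodeDecision structure;
 * details are simplified and names are assumed. */
#include <stdint.h>

extern const uint8_t rangeLPS[64][4];     /* table 330 */
extern const uint8_t transIdxLPS[64];     /* table 331 */
extern const uint8_t transIdxMPS[64];     /* table 332 */

typedef struct { uint32_t codIRange, codIOffset; } ArithState;
typedef struct { uint8_t state, mps; } Context;

extern void renormalize(ArithState *a);   /* refills from bit-stream data 260
                                             while codIRange < 256           */

int decode_decision(ArithState *a, Context *c)
{
    int bin;
    uint32_t rLPS = rangeLPS[c->state][(a->codIRange >> 6) & 3];
    a->codIRange -= rLPS;

    if (a->codIOffset >= a->codIRange) {          /* LPS core 420 */
        bin            = !c->mps;
        a->codIOffset -= a->codIRange;
        a->codIRange   = rLPS;
        if (c->state == 0)
            c->mps = !c->mps;
        c->state = transIdxLPS[c->state];
    } else {                                      /* MPS core 410 */
        bin      = c->mps;
        c->state = transIdxMPS[c->state];
    }
    renormalize(a);
    return bin;
}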
Referring to FIG. 4(b), which shows the decode-decision diagram with a look-ahead parsing detector according to a preferred embodiment of the present invention, the invention adds a look-ahead parsing technique on top of the original architecture to increase the overall throughput. The multiplexer 433 passes the context value produced by the first half of the circuit to the look-ahead parsing detector (LAPD) 440, which checks two conditions. The first condition is that the value of the coding range in the probability module 240 is greater than or equal to 256, which means that the probability module 240 does not need to be renormalized from the bit-stream data. The second condition is that the coding range is greater than or equal to the coding offset, which means that the next bin must be produced through the MPS core 410. At the MPS stage 410 only a state transition based on the rangeLPS table 330 is required, and looking up the rangeLPS table 330 needs only two pieces of information, the state value and the coding range (codIRange), which are selected by the multiplexer 432.

Since this extension performs only a state transition and no other arithmetic, its hardware cost and timing impact are very limited, yet it effectively increases the overall throughput. The context value used by the look-ahead parsing detector 440 may come from the result produced by the MPS core 410 or from another context entry, and the context value that is finally written back may come from the MPS core 410 or from the MPS core 411; the multiplexer 431 selects the context value that is written back.
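The two conditions checked by the LAPD can be expressed as the following sketch. How the extra bin is produced in hardware is not detailed here, so the MPS-only step below is an illustrative assumption rather than the patent's exact datapath.

/* Sketch of the look-ahead parsing detector (LAPD 440) conditions.
 * If both conditions hold after the first bin of the cycle, a second
 * bin can be produced in the same cycle through the MPS path only. */
#include <stdint.h>

typedef struct { uint32_t codIRange, codIOffset; } ArithState;
typedef struct { uint8_t state, mps; } Context;

extern const uint8_t rangeLPS[64][4];      /* table 330 */
extern const uint8_t transIdxMPS[64];      /* table 332 */

/* Returns 1 and produces a second MPS bin when look-ahead is possible. */
int lapd_try_second_bin(ArithState *a, Context *c, int *second_bin)
{
    if (a->codIRange < 256)                 /* would need renormalization      */
        return 0;
    if (a->codIRange < a->codIOffset)       /* next bin would take the LPS path */
        return 0;

    /* MPS-only step: reduce the range and take the MPS state transition. */
    a->codIRange -= rangeLPS[c->state][(a->codIRange >> 6) & 3];
    *second_bin   = c->mps;
    c->state      = transIdxMPS[c->state];
    return 1;
}

Because this check is folded into the same cycle as the first bin, it adds almost no delay, which is why the hardware cost and timing impact remain very limited.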

Referring to FIG. 5(a), which is a schematic diagram of a conventional decoding flow, reading the context memory 510, decoding a symbol 520, and writing back to the context memory 530 each take one cycle, so spending three cycles to decode a single symbol is very inefficient. Pipeline scheduling is the most intuitive and straightforward remedy: letting the context-memory read 510, the symbol decoding 520, and the context-memory write-back 530 overlap within the same cycle effectively solves this problem.

Referring to FIGS. 5(b) and 5(c), which show a pipelined decoding flow according to a preferred embodiment of the present invention and the decoding flow of syntax elements in a decoding loop of the present invention, FIG. 5(b) shows the decoding flow after pipeline scheduling is applied. As the figure shows, the more pipeline stages there are, the higher the overall saving, rising from 33% toward 50%. In addition, there is a situation in which the context values used by the syntax elements decoded inside a loop are fetched from only a few specific addresses. As shown in FIG. 5(c), CABAD must decode two consecutive syntax elements, (1) prev_intra4x4_pred_mode_flag and (2) rem_intra4x4_pred_mode, in a loop of sixteen iterations. The invention therefore proposes a cached context register to reduce the number of context-memory accesses.

Referring to FIG. 5(d), which is a schematic diagram of the decoding flow with a syntax-pipelined cache register according to a preferred embodiment of the present invention, it suffices to read the context memory 510 into the cache register when the loop starts and to write the updated context values from the cache register 540 back to the context memory when the loop ends, that is, two reads and two writes in total, compared with sixteen reads and sixteen writes originally. The saved memory accesses also indirectly save power. Combining the pipeline scheduling and the cached context register reduces the cycle count required for this decoding from ninety-six cycles to thirty-four cycles, a saving of about 61.4%.
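The cached-register idea of FIG. 5(d) applied to the loop of FIG. 5(c) can be sketched as follows; the memory interface, the helper names, and the bin handling are simplifications assumed for illustration.

/* Sketch of the cached context register (540 / 730) around the
 * 16-iteration intra-4x4 prediction-mode loop of FIG. 5(c).
 * Memory layout and helper names are assumptions for this sketch;
 * the arithmetic-coder state is omitted for brevity. */
#include <stdint.h>

typedef struct { uint8_t state, mps; } Context;

extern void ctx_mem_read (int addr, Context *dst);        /* context memory 510 */
extern void ctx_mem_write(int addr, const Context *src);  /* write-back 530     */
extern int  decode_bin_with(Context *c);                  /* decode symbol 520  */

void decode_intra4x4_pred_modes(int addr_flag, int addr_rem)
{
    Context cache_flag, cache_rem;                 /* cached context register   */

    ctx_mem_read(addr_flag, &cache_flag);          /* one read per syntax element */
    ctx_mem_read(addr_rem,  &cache_rem);

    for (int blk = 0; blk < 16; ++blk) {           /* no memory traffic inside   */
        int flag = decode_bin_with(&cache_flag);   /* prev_intra4x4_pred_mode_flag */
        if (!flag) {
            int mode = 0;
            for (int b = 0; b < 3; ++b)            /* rem_intra4x4_pred_mode bins */
                mode |= decode_bin_with(&cache_rem) << b;
            (void)mode;                            /* would feed intra prediction */
        }
    }

    ctx_mem_write(addr_flag, &cache_flag);         /* one write per syntax element */
    ctx_mem_write(addr_rem,  &cache_rem);
}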
FIG. 6 is a schematic diagram of the decoding flow of a system according to a preferred embodiment of the present invention. Two context values are fed to the decode-decision stage, that is, to a first decoder 620 and a second decoder 621. With a single block of memory it would be impossible to access two context entries at the same time, so the context memory is partitioned into several small blocks that can be read simultaneously through the cache context registers 640, 641, 642 and 643, as in the memory blocks 610, 611, 612 and 613, and written simultaneously, as in the memory blocks 630, 631, 632 and 633.

FIG. 7 is a schematic diagram of a CABAD decoding core according to a preferred embodiment of the present invention. The CABAD decoding core 700 includes a look-ahead parsing detector 710, a partitioned context memory 720, a cached context register 730, and a pipeline scheduler and bin-analysis controller 740 (the pipeline scheduler combines the results of the bin-analysis controller in order to apply different schedules), together with a context read-only memory 750 used for initialization.

When the parsing conditions contained in the look-ahead parsing detector 710 are satisfied, one extra symbol can be decoded within a cycle, which raises the output efficiency of this video decoder. The partitioned context memory 720 lets the pipeline scheduler and bin-analysis controller 740 read the memory, decode a symbol, and write the memory in the same memory and the same cycle. The cached context register 730 pre-fetches context information to avoid unnecessary waiting and, together with the partitioned context memory 720, manages the context memory effectively. The pipeline scheduler and bin-analysis controller 740 controls the scheduling within one decoding syntax element and between multiple decoding syntax elements.

The overall throughput of this CABAD decoding core can meet the compression specification of thirty frames per second at the HD1080 level.
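A bank-partitioned context memory of the kind used by the partitioned context memory 720 can be sketched as follows; the bank count, the address mapping, and the conflict handling are assumptions made for illustration.

/* Sketch of a bank-partitioned context memory (720).  A read for the
 * next bin and a write-back for the previous bin can be served in the
 * same cycle when they fall into different banks.  Bank count and
 * address mapping are illustrative assumptions. */
#include <stdint.h>

#define NUM_BANKS   4
#define BANK_DEPTH  128

typedef struct { uint8_t state, mps; } Context;

static Context bank[NUM_BANKS][BANK_DEPTH];

static inline int bank_of(int ctx_addr)   { return ctx_addr % NUM_BANKS; }
static inline int offset_of(int ctx_addr) { return ctx_addr / NUM_BANKS; }

/* Returns 1 when both accesses complete in this cycle, 0 on a bank
 * conflict (in which case one access must be stalled by the scheduler). */
int same_cycle_read_write(int rd_addr, Context *rd_val,
                          int wr_addr, const Context *wr_val)
{
    if (bank_of(rd_addr) == bank_of(wr_addr))
        return 0;                                   /* bank conflict */

    *rd_val = bank[bank_of(rd_addr)][offset_of(rd_addr)];
    bank[bank_of(wr_addr)][offset_of(wr_addr)] = *wr_val;
    return 1;
}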

As the preferred embodiments described above show, applying the present invention has the following advantages:

1. The invention provides a high-performance entropy-decoding architecture for the video compression standard, in which the pipeline schedule controller optimizes the cycle time needed to decode one symbol.

2. The invention applies the look-ahead parsing technique, so that one extra symbol can be decoded in each cycle, and the partitioned context table combined with the cached context register manages the context memory efficiently and reduces the number of memory accesses.

3. Because the invention effectively shortens the decoding cycle of a single symbol and greatly reduces the number of memory accesses, it lowers the power consumption of the overall decoding system.

Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit the invention. Anyone skilled in the art may make various changes and refinements without departing from the spirit and scope of the invention, and the scope of protection of the invention is therefore defined by the appended claims.

[Brief Description of the Drawings]

To make the above and other objects, features, advantages and embodiments of the present invention easier to understand, the accompanying drawings are described in detail as follows:

FIG. 1 is an architecture diagram of a CABAD core at the decoding-system level according to a preferred embodiment of the present invention.

FIG. 2 is a flowchart of CABAD decoding according to a preferred embodiment of the present invention.

FIG. 3 is a schematic diagram of a CABAD decoding engine according to a preferred embodiment of the present invention.

FIG. 4(a) is a schematic diagram of a conventional decode-decision engine.

FIG. 4(b) is a decode-decision diagram with a look-ahead parsing detector according to a preferred embodiment of the present invention.

FIG. 5(a) is a schematic diagram of a conventional decoding flow.

FIG. 5(b) is a schematic diagram of a pipelined decoding flow according to a preferred embodiment of the present invention.
FIG. 5(c) is a flowchart of the decoding of syntax elements in a decoding loop according to a preferred embodiment of the present invention.

FIG. 5(d) is a schematic diagram of the decoding flow with a syntax-pipelined cache register according to a preferred embodiment of the present invention.

FIG. 6 is a schematic diagram of the decoding flow of a system according to a preferred embodiment of the present invention.

FIG. 7 is a schematic diagram of a CABAD decoding core according to a preferred embodiment of the present invention.

[Description of Main Reference Numerals]

100: CABAD decoding core; 110: bit-stream manager; 120: syntax parser; 121: upper macroblock; 122: left macroblock; 130: system controller; 140: macroblock syntax-information memory; 150: macroblock coefficient memory;
200: initialize context table; 210: initialize probability model; 220: upper and left macroblocks; 221: base index; 222: bin index; 230: context table; 240: probability module; 250: decode symbol; 260: bit-stream data; 270: bin-string analysis; 281: syntax element; 282: syntax element;
300: decode decision; 310: decode bypass; 320: decode terminate; 330: rangeLPS table; 331: transIdxLPS table; 332: transIdxMPS table; 340: context value; (numeral missing): logic gate;
410: most probable symbol (MPS); 411: most probable symbol (MPS); 420: least probable symbol (LPS); 430: multiplexer; 431: multiplexer; 432: multiplexer; 433: multiplexer; 440: look-ahead parsing detector (LAPD);
510: read context memory; 520: decode symbol; 530: write back context memory; 540: cache register; 550: cache register;
610, 611, 612, 613: memory; 620: first decoder; 621: second decoder; 630, 631, 632, 633: memory; 640, 641, 642, 643: cache context register;
700: CABAD decoding core; 710: look-ahead parsing detector; 720: partitioned context memory; 730: cached context register; 740: pipeline scheduler and bin-analysis controller; 750: context read-only memory for initialization.


Claims (1)

X. Scope of the Patent Application:

1. A device for video decoding, comprising at least:
a CABAD decoding core comprising a look-ahead parsing detector, wherein when the conditions contained in the look-ahead parsing detector are satisfied, one extra symbol can be decoded within one cycle to raise the output efficiency of the video decoding method, and the decode-symbol core outputs at least two symbols, one of which comes from the look-ahead parsing detector;
a pipeline scheduler and bin-analysis controller for receiving the at least two symbols, controlling the scheduling within one decoding syntax element or between a plurality of decoding syntax elements, and generating at least two context addresses;
a partitioned context memory that, according to the at least two context addresses, allows the pipeline scheduler to read and write the same memory in the same cycle; and
a cached context register that pre-fetches the context information in the partitioned context memory to avoid unnecessary waiting, the cached context register cooperating with the partitioned context memory to manage the context memory.

2. The device for video decoding of claim 1, wherein the operations performed by the pipeline scheduler within the same cycle are reading the memory, decoding a symbol, and writing the memory.

3. The device for video decoding of claim 1, wherein the decode-symbol core comprises a plurality of decoding engines.

4. The device for video decoding of claim 3, wherein the decoding engines are decode-decision engines.

5. The device for video decoding of claim 4, wherein the decode decision comprises at least one conditional expression.

6. The device for video decoding of claim 5, wherein the conditional expression is that the coding range is greater than 256.

7. The device for video decoding of claim 5, wherein the conditional expression is that the coding range is equal to 256.

8. The device for video decoding of claim 5, wherein the conditional expression is that the coding range is greater than the coding offset.

9. The device for video decoding of claim 5, wherein the conditional expression is that the coding range is equal to the coding offset.

10. The device for video decoding of claim 5, wherein, when the conditional expression or expressions are satisfied, the decode decision can decode one extra symbol within one cycle.
11. The device for video decoding of claim 3, wherein the decoding engines are decode-bypass engines.

12. The device for video decoding of claim 3, wherein the decoding engines are decode-terminate engines.

13. The device for video decoding of claim 3, wherein at least one decoding engine is turned off while the decode-symbol core is decoding.

14. The device for video decoding of claim 3, wherein only one decoding engine is enabled while the decode-symbol core is decoding.

15. The device for video decoding of claim 1, wherein the decode-symbol core comprises a logic gate.

16. The device for video decoding of claim 15, wherein the logic gate is an OR logic gate.
TW95120653A 2006-06-09 2006-06-09 Device for video decoding TWI323130B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW95120653A TWI323130B (en) 2006-06-09 2006-06-09 Device for video decoding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW95120653A TWI323130B (en) 2006-06-09 2006-06-09 Device for video decoding

Publications (2)

Publication Number Publication Date
TW200746832A TW200746832A (en) 2007-12-16
TWI323130B true TWI323130B (en) 2010-04-01

Family

ID=45074019

Family Applications (1)

Application Number Title Priority Date Filing Date
TW95120653A TWI323130B (en) 2006-06-09 2006-06-09 Device for video decoding

Country Status (1)

Country Link
TW (1) TWI323130B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI510067B (en) * 2012-12-06 2015-11-21 Gemtek Technology Co Ltd Video playback system with multiple video decoders and related computer program products

Also Published As

Publication number Publication date
TW200746832A (en) 2007-12-16

Similar Documents

Publication Publication Date Title
JP4139330B2 (en) Improved variable length decoder
TWI330042B (en)
US7286066B1 (en) Acceleration of bitstream decoding
US8233545B2 (en) Run length encoding in VLIW architecture
US9001882B2 (en) System for entropy decoding of H.264 video for real time HDTV applications
JP5149444B2 (en) Decoding system and method
US7411529B2 (en) Method of decoding bin values using pipeline architecture and decoding device therefor
KR20210063483A (en) Methods and apparatus for video encoding and decoding binary sets using adaptive tree selection
US7557740B1 (en) Context-based adaptive binary arithmetic coding (CABAC) decoding apparatus and decoding method thereof
Guo et al. A new reference frame recompression algorithm and its VLSI architecture for UHDTV video codec
US7345601B2 (en) Variable length coding algorithm for multiple coding modes
US7339507B1 (en) Device for video decoding
Zhou et al. Reducing power consumption of HEVC codec with lossless reference frame recompression
TWI323130B (en) Device for video decoding
CN100466743C (en) Method for programmable entropy decoding based on shared storage and countra-quantization
Chen et al. A 2014 Mbin/s deeply pipelined CABAC decoder for HEVC
CN101267559A (en) Universal entropy decoding method and device for video decoder
Lee et al. A design of high-performance pipelined architecture for H. 264/AVC CAVLC decoder and low-power implementation
WO2010095181A1 (en) Variable-length decoding device
WO2023174407A1 (en) Encoding and decoding methods and apparatuses, and devices therefor
CN102148971A (en) Method of designing high-performance low-power consumption CAVLC decoder
Hong et al. A 360Mbin/s CABAC decoder for H. 264/AVC level 5.1 applications
Huang et al. High throughput VLSI architecture for H. 264/AVC context-based adaptive binary arithmetic coding (CABAC) decoding
US20090006664A1 (en) Linked DMA Transfers in Video CODECS
Ahangar et al. Real time low complexity VLSI decoder for prefix coded images

Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees