TW201924340A - Image encoder using machine learning and data processing method of the image encoder

Image encoder using machine learning and data processing method of the image encoder

Info

Publication number
TW201924340A
Authority
TW
Taiwan
Prior art keywords
block
prediction
machine learning
prediction block
enhanced
Prior art date
Application number
TW107130991A
Other languages
Chinese (zh)
Other versions
TWI748125B (en)
Inventor
楊政燁
Original Assignee
南韓商三星電子股份有限公司
Priority date
Filing date
Publication date
Application filed by 南韓商三星電子股份有限公司
Publication of TW201924340A
Application granted granted Critical
Publication of TWI748125B


Classifications

    • G06N20/20 Ensemble learning
    • G06N20/00 Machine learning
    • G06N20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G06N3/08 Learning methods (neural networks)
    • G06N3/045 Combinations of networks (neural networks)
    • G06N5/01 Dynamic search techniques; heuristics; dynamic trees; branch-and-bound
    • G06T9/002 Image coding using neural networks
    • G06V20/40 Scenes; scene-specific elements in video content
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/124 Quantisation
    • H04N19/136 Incoming video signal characteristics or properties
    • H04N19/139 Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
    • H04N19/147 Data rate or code amount at the encoder output according to rate-distortion criteria
    • H04N19/157 Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/159 Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • H04N19/176 Coding unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/44 Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
    • H04N19/50 Predictive coding
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/52 Processing of motion vectors by predictive encoding
    • H04N19/625 Transform coding using discrete cosine transform [DCT]
    • H04N19/91 Entropy coding, e.g. variable length coding [VLC] or arithmetic coding

Abstract

An image encoder for outputting a bitstream by encoding an input image includes a predictive block, a machine learning based prediction enhancement (MLBE) block, and a subtractor. The predictive block is configured to generate a prediction block using data of a previous input block. The MLBE block is configured to transform the prediction block into an enhanced prediction block by applying a machine learning technique to the prediction block. The subtractor is configured to generate a residual block by subtracting pixel data of the enhanced prediction block from pixel data of a current input block.

Description

Image encoder using machine learning and data processing method of the image encoder

The present disclosure relates to an electronic device. More specifically, the present disclosure relates to an electronic device having an image encoder that uses machine learning technology, and to an encoding method for the image encoder.

The demand for high-definition video services featuring high resolution, high frame rate, high bit depth, and the like has grown rapidly. Accordingly, the importance of codecs that efficiently encode and decode large amounts of video data has attracted attention.

H.264, or advanced video coding (AVC), is a video compression technology that provides enhanced performance in terms of compression efficiency, image quality, bit rate, and so on compared with earlier video compression techniques. This video compression technology has been commercialized through digital television (TV) and is widely used in various application fields, for example, video telephony, video conferencing, digital versatile discs (DVDs), games, and three-dimensional (3D) TV. H.264/AVC currently provides superior performance in compression efficiency, image quality, bit rate, and so on compared with previous versions. However, the motion prediction modes of this technology can become more complicated, and the limit of its compression efficiency may therefore gradually be reached.

Embodiments of the present disclosure provide an image encoder and an encoding method of the image encoder that generate a prediction block as an enhanced prediction block having a small difference from the source block, without adding control data.

According to an aspect of an embodiment, an image encoder that outputs a bitstream by encoding an input image includes a predictive block, a machine learning based prediction enhancement (MLBE) block, and a subtractor. The predictive block is configured to generate a prediction block using data of a previous input block. The MLBE block is configured to transform the prediction block into an enhanced prediction block by applying a machine learning technique to the prediction block. The subtractor is configured to generate a residual block of residual data by subtracting pixel data of the enhanced prediction block from pixel data of a current input block.
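
As a rough illustration of this structure, the following Python sketch wires the three elements together. It is not the patent's implementation: the `predict_block` and `enhance` callables and the block shapes are hypothetical placeholders standing in for the predictive block and the MLBE block.

```python
import numpy as np

def encode_block(current_block, previous_block, predict_block, enhance):
    """Illustrative data flow of the claimed encoder (a sketch, not the disclosed design).

    predict_block: produces a prediction block P from previously coded data.
    enhance:       a learned model mapping P to an enhanced prediction block EP.
    """
    p = predict_block(previous_block)          # prediction block P
    ep = enhance(p)                            # MLBE output: enhanced prediction block EP
    # residual block = current input block - enhanced prediction block
    residual = current_block.astype(np.int32) - ep.astype(np.int32)
    return residual, ep
```

With a well-trained `enhance` model, the residual carries less energy than `current_block - p`, which is the effect the disclosure attributes to the MLBE block.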

According to another aspect of an embodiment, a method of processing image data includes: generating a prediction block from time-domain data of a previous input block; transforming the prediction block into an enhanced prediction block by applying at least one of a plurality of available machine learning techniques to the prediction block; and generating a residual block of residual data by subtracting the enhanced prediction block from a current input block.

According to another aspect of an embodiment, a method of processing image data includes: generating a prediction block from time-domain data of a previous input block; transforming the prediction block into an enhanced prediction block by applying at least one of a plurality of available machine learning techniques to the prediction block; selecting one of the prediction block and the enhanced prediction block using a rate-distortion optimization (RDO) value corresponding to each of the prediction block and the enhanced prediction block; and generating a residual block of residual data by subtracting the selected block from the current input block.
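
A minimal sketch of the rate-distortion choice described in this aspect is shown below. The Lagrangian cost J = D + λ·R with SSD distortion and a caller-supplied bit estimator are standard assumptions, not values taken from the disclosure.

```python
import numpy as np

def rd_cost(current_block, candidate, lam, estimate_bits):
    """J = D + lambda * R, with SSD distortion and a caller-supplied bit estimator R."""
    diff = current_block.astype(np.int64) - candidate.astype(np.int64)
    return np.sum(diff ** 2) + lam * estimate_bits(diff)

def select_prediction(current_block, p, ep, lam, estimate_bits):
    """Pick the prediction block P or the enhanced prediction block EP with lower RD cost."""
    cost_p = rd_cost(current_block, p, lam, estimate_bits)
    cost_ep = rd_cost(current_block, ep, lam, estimate_bits)
    return ep if cost_ep < cost_p else p
```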

In the following, embodiments of the present disclosure are described with reference to the accompanying drawings so that the disclosure will be thorough and complete and will fully convey its scope to those skilled in the art. Hereinafter, the term "image" in this specification has a comprehensive meaning that includes moving images (for example, video) as well as still images (for example, photographs).

FIG. 1 is a block diagram illustrating a configuration of a machine-learning-based (MLB) encoder according to an embodiment of the present disclosure. Referring to FIG. 1, the MLB encoder 100 may divide an input image 10 into a plurality of blocks and may perform MLB predictive encoding on each of the blocks.

The MLB encoder 100 may process the input image 10 to produce output data 20. The MLB encoder 100 according to an embodiment of the present disclosure may use machine learning techniques to generate a predicted image and a prediction block. For example, when generating a prediction block from the input image 10, the MLB encoder 100 may apply parameters learned through machine learning. That is, the machine learning algorithm may have decision parameters learned using a plurality of predetermined training data sets. In this case, the prediction block can approach the source block without increasing the header data 22 when machine learning is used. The closer the prediction block and the source block are to each other, the more the size of the residual data 24 can be reduced.

The output data 20 generated by the MLB encoder 100 applying machine learning techniques according to an embodiment of the present disclosure may generally include header data 22 and residual data 24. The MLB encoder 100 according to an embodiment of the present disclosure may perform MLB predictive coding to encode a residual block of the prediction block with respect to the source block. In this case, the data corresponding to the residual block is the residual data 24; that is, the residual data of the residual block becomes the residual data 24. On the other hand, the motion data, image data, and various setting values required for prediction are output as the header data 22. As the difference between the prediction block and the source block becomes smaller, the size of the residual data 24 can be reduced by a larger amount.

In general, the more precisely prediction is performed in order to reduce the amount of information in the residual block and residual image, the more information the header data 22 required for the prediction tends to contain. However, if machine learning according to an embodiment of the present disclosure is used, accurate prediction can be achieved without increasing the header data 22. A previously generated prediction block can be enhanced using machine learning so that it approaches the source block. In this case, although the difference between the prediction block and the source block does not have a large influence on the size of the header data 22, that difference can be effectively reduced. Therefore, the size of the residual data 24, which corresponds to the difference between the prediction block and the source block, is greatly reduced.

FIG. 2 is a block diagram illustrating a schematic configuration of the MLB encoder shown in FIG. 1. Referring to FIG. 2, the MLB encoder 100 includes a subtractor 110, an MLBE block 120 (MLB prediction enhancement block), a transformer/quantizer 130, a prediction block 140, and an entropy encoder device 150.

The description herein may refer to structural device elements such as encoders, blocks, and encoder devices as representative elements of an encoder (for example, the MLB encoder or an image encoder). Any such representative element may, where appropriate, be implemented by a circuit element or by a circuit formed of a plurality of circuit elements. In addition, any such representative element may, where appropriate, be implemented by a processor (for example, a central processing unit, a microcontroller, a microprocessor, or a digital signal processor) that executes a specific set of dedicated software instructions (for example, a software module), or by a combination of a processor and software instructions. Thus, the encoders, blocks, and encoder devices that are structural device elements may be implemented with circuits and circuitry, and/or with one or more processors and software instructions executed by the one or more processors. Such a processor may execute software instructions to perform one or more processes carried out by the elements of interest described herein.

Any processor (or similar element) described herein is tangible and non-transitory. As used herein, the term "non-transitory" is not to be interpreted as a permanent characteristic of a state, but rather as a characteristic of a state that lasts for a period of time. The term "non-transitory" specifically disavows fleeting characteristics, such as the characteristics of a particular carrier wave or signal or of other forms that exist only transitorily at any time and place. A processor is an article of manufacture and/or a machine component. A processor is configured to execute software instructions to perform functions as described in the various embodiments herein. A processor may be a general-purpose processor or may be part of an application-specific integrated circuit (ASIC). A processor may also be a microprocessor, a microcomputer, a processor chip, a controller, a microcontroller, a digital signal processor (DSP), a state machine, or a programmable logic device. A processor may also be a logic circuit, including a programmable gate array (PGA) such as a field programmable gate array (FPGA), or another type of circuit that includes discrete gate and/or transistor logic. A processor may be a central processing unit (CPU). In addition, any processor described herein may include multiple processors, parallel processors, or both. A set of instructions may be read from a computer-readable medium. Furthermore, the instructions, when executed by a processor, may be used to perform one or more of the methods and processes described herein. In particular embodiments, the instructions may reside completely, or at least partially, in the main memory, in static memory, and/or in the processor during execution.

In alternative embodiments, dedicated hardware implementations, such as application-specific integrated circuits (ASICs), programmable logic arrays, and other hardware components, can be constructed to implement one or more of the methods described herein, including the functional blocks, bus protectors, and system managers described herein. One or more embodiments described herein may implement functions using two or more specific interconnected hardware modules or devices with related control and data signals that can be communicated between and through the modules. Accordingly, the present disclosure encompasses software, firmware, and hardware implementations. Nothing in the present application should be interpreted as being implemented, or implementable, solely with software and not hardware (for example, a tangible, non-transitory processor and/or memory).

The subtractor 110 may generate a residual block from the difference between an input block and the generated prediction block. The transformer/quantizer 130 may transform the residual block to output transform coefficients and may quantize the transform coefficients using at least one of a quantization parameter and a quantization matrix, thereby producing quantized coefficients. In this case, the output quantized coefficients may ultimately correspond to the residual data Residual_Data.

The entropy encoder device 150 may perform entropy encoding using the generated residual data Residual_Data or the header data Header_Data (for example, encoding parameters generated during the encoding process). Entropy coding is a type of lossless coding that compresses digital data by representing frequently occurring patterns with relatively few bits and infrequently occurring patterns with relatively many bits. An example of entropy coding is given below. A bitstream Bitstream can be output by the entropy encoding operation of the entropy encoder device 150. When entropy coding is applied, a small number of bits may be allocated to symbols with a high probability of occurrence, and a large number of bits may be allocated to symbols with a low probability of occurrence. Accordingly, the size of the bitstream for the symbols to be encoded can be reduced by this symbol representation.

The prediction block 140 may generate a prediction block P based on the input quantized residual data Residual_Data and various parameters. The prediction block 140 may perform encoding in an intra mode or an inter mode to output the prediction block P. The prediction block 140 may generate the prediction block P for a source block S of the input image 10 and may provide the generated prediction block P to the MLBE block 120.

The MLBE block 120 may process the prediction block P to output an enhanced prediction block EP as the processing result. The MLBE block 120 may include a processor that executes an algorithm (for example, one or more available machine learning algorithms) to process the prediction block P and transform it into an enhanced prediction block. The MLBE block 120 may use, for example, a machine learning algorithm to process the prediction block P so that it approaches the source block S. In other words, the MLBE block 120 may refer to various information (for example, the prediction mode, the characteristics of the motion vector, the partitioning form of the image, and the size of the transform unit) to select the most suitable of various machine learning techniques. Various techniques, such as decision trees, neural networks (NN), convolutional neural networks (CNN), support vector machines (SVM), the K-nearest neighbor (K-NN) algorithm, and reinforcement learning, may be used as the machine learning technique.
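
As one hedged illustration of what such an enhancement step might look like, the sketch below applies a single learned 3×3 convolution to the prediction block. A real MLBE block would run a trained CNN, SVM, or other model; the single kernel here is purely a placeholder, and the function name is hypothetical.

```python
import numpy as np

def enhance_prediction(p, kernel):
    """Apply one learned 3x3 filter to prediction block P (a stand-in for the MLBE model)."""
    pad = np.pad(p.astype(np.float32), 1, mode="edge")     # replicate border pixels
    ep = np.zeros_like(p, dtype=np.float32)
    for dy in range(3):
        for dx in range(3):
            ep += kernel[dy, dx] * pad[dy:dy + p.shape[0], dx:dx + p.shape[1]]
    # Clip back to the pixel range and original dtype of the prediction block.
    return np.clip(np.rint(ep), 0, 255).astype(p.dtype)
```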

According to an embodiment of the present disclosure, the MLB encoder 100 uses machine learning to process the prediction block P so that it approaches the source block S. The present disclosure does not require additional data to produce the enhanced prediction block EP. Here, the enhanced prediction block EP may be provided by the MLBE block 120, which provides an optimal filtering effect through learning. The MLBE block 120 may maintain or update its performance through (online or offline) learning without providing additional data. Therefore, according to the MLB encoder 100 of the embodiment of the present disclosure, the residual data 24 of FIG. 1 can be reduced without increasing the header data 22 of FIG. 1.

FIG. 3 is a block diagram illustrating a detailed configuration of the MLB encoder shown in FIG. 2. Referring to FIG. 3, the MLB encoder 100 includes a subtractor 110, an MLBE block 120, a transformer 132, a quantizer 134, a prediction block 140, an entropy encoder device 150, and an encoder device controller 160. Here, the prediction block 140 includes a dequantizer 141, an inverse transformer 142, an adder 143, an in-loop filter 144, a buffer 145, a motion estimation block 146, a motion compensation block 147, an intra prediction block 148, and a mode decision block 149. The MLB encoder 100 configured as described above can provide an MLB prediction enhancement function. Therefore, the residual data 24 of FIG. 1 can be reduced without increasing the header data 22 of FIG. 1.

The subtractor 110 may generate a residual block of residual data, which is the difference between the input block (or source block) and the prediction block. In detail, the subtractor 110 may calculate, for the current spatial-domain block being processed among a plurality of spatial-domain blocks included in the input frame, the difference between the value of that block and the value of the enhanced prediction block EP output from the MLBE block 120. The subtractor 110 may generate the values of a spatial-domain residual block corresponding to the calculated differences (hereinafter referred to as "residual data").

In terms of data processing, each of the spatial-domain blocks may include m×n pixels. Here, each of m and n may be a natural number greater than or equal to 2, and m may be equal to or different from n. The pixels included in a spatial-domain block may be, but are not limited to, data in a luminance and chrominance (YUV) format, data in a YCbCr format, or data in a red, green, and blue (RGB) format. For example, a spatial-domain block may include, but is not limited to, 4×4 pixels, 8×8 pixels, 16×16 pixels, 32×32 pixels, or 64×64 pixels. The subtractor 110 may calculate a difference for each calculation block and may output the calculated differences for each spatial-domain block. For example, the size of a calculation block may be smaller than that of a spatial-domain block; when a calculation block includes 4×4 pixels, the spatial-domain block may include 16×16 pixels. However, the sizes of the calculation block and the spatial-domain block are not limited to these examples.
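
For instance, a 16×16 spatial-domain block can be differenced in 4×4 calculation blocks as sketched below; the sizes are chosen only to mirror the example in the text, and the helper name is hypothetical.

```python
import numpy as np

def residual_by_calc_blocks(src, ep, calc=4):
    """Subtract an enhanced prediction block from a source block, one calc x calc block at a time."""
    assert src.shape == ep.shape
    residual = np.empty(src.shape, dtype=np.int32)
    for y in range(0, src.shape[0], calc):
        for x in range(0, src.shape[1], calc):
            residual[y:y + calc, x:x + calc] = (
                src[y:y + calc, x:x + calc].astype(np.int32)
                - ep[y:y + calc, x:x + calc].astype(np.int32)
            )
    return residual
```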

The transformer 132 may transform the residual block to output transform coefficients. The transformer 132 may perform a time-domain-to-frequency-domain transform on the block values included in the spatial-domain residual block. For example, the transformer 132 may transform spatial coordinates in the time domain into values in the frequency domain, and may generate frequency-domain coefficients from the values of the spatial-domain residual block using a discrete cosine transform (DCT). In other words, the transformer 132 may transform the residual data, which is time-domain data, into frequency-domain data.
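
A compact sketch of the separable 2D DCT-II mentioned above is given below; orthonormal scaling is assumed, whereas practical codecs typically use integer approximations.

```python
import numpy as np

def dct_2d(block):
    """Orthonormal 2D DCT-II of a square residual block (illustrative sketch)."""
    n = block.shape[0]
    k = np.arange(n)
    # 1D DCT-II basis matrix: basis[freq, sample] = cos(pi * (2*sample + 1) * freq / (2n))
    basis = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    basis *= np.sqrt(2.0 / n)
    basis[0, :] = np.sqrt(1.0 / n)
    # Separable transform: rows first, then columns.
    return basis @ block @ basis.T
```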

The quantizer 134 may quantize the input transform coefficients using at least one of a quantization parameter and a quantization matrix, and may output quantized coefficients as the result of the quantization. That is, the quantizer may be configured to output quantized coefficients by quantizing the frequency-domain data.
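
The sketch below shows a uniform quantizer driven by a quantization parameter and an optional quantization matrix. The QP-to-step mapping (step doubles every 6 QP) is an assumption borrowed from H.264-style quantizers, not necessarily the scheme used in this disclosure.

```python
import numpy as np

def quantize(coeffs, qp, qmatrix=None):
    """Uniform quantization of frequency-domain coefficients (illustrative sketch)."""
    step = 2.0 ** ((qp - 4) / 6.0)          # assumed QP-to-step-size mapping
    if qmatrix is not None:
        step = step * qmatrix / 16.0        # optional per-frequency weighting
    return np.round(coeffs / step).astype(np.int32)
```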

The entropy encoder device 150 may perform entropy encoding based on the values calculated by the quantizer 134 or on encoding parameters calculated during the encoding process, and may output a bitstream. When entropy coding is applied, a small number of bits may be allocated to symbols with a high probability of occurrence, and a large number of bits may be allocated to symbols with a low probability of occurrence, so the size of the bit string for the symbols to be encoded can be reduced. The entropy encoder device 150 may perform entropy coding using an encoding method such as exponential-Golomb coding, context-adaptive variable-length coding (CAVLC), or context-adaptive binary arithmetic coding (CABAC).
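
For example, order-0 exponential-Golomb coding, one of the methods named above, maps small (frequent) values to short code words, as in this sketch:

```python
def exp_golomb_encode(value: int) -> str:
    """Order-0 exponential-Golomb code word for an unsigned integer (illustrative)."""
    assert value >= 0
    code = bin(value + 1)[2:]                 # binary representation of value + 1
    return "0" * (len(code) - 1) + code       # prefix of (len - 1) zeros

# Small symbols get short code words, larger symbols get longer ones:
# 0 -> '1', 1 -> '010', 2 -> '011', 3 -> '00100', ...
```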

The currently encoded block or image may need to be decoded or stored so that it can be used as a reference block or reference image. Therefore, the coefficients quantized by the quantizer 134 may be dequantized by the dequantizer 141 and inverse transformed by the inverse transformer 142. The dequantized and inverse-transformed coefficients become a reconstructed residual block and may be added to the prediction block P by the adder 143, whereby a reconstructed block can be generated.

The reconstructed block computed by the adder 143 may be transferred to the intra prediction block 148 and may be used to predict an intra directional mode. The reconstructed block output from the adder 143 may also be transferred to the in-loop filter 144.

The in-loop filter 144 may apply at least one of a deblocking filter, a sample adaptive offset (SAO) filter, and an adaptive loop filter (ALF) to the reconstructed block or reconstructed image. The deblocking filter removes block distortion occurring at the boundaries between blocks. The SAO filter adds an appropriate offset value to the pixel values to compensate for coding errors. The ALF performs filtering based on values obtained by comparing the reconstructed block with the source block. The reconstructed block processed by the in-loop filter 144 may be stored in the buffer 145, which stores reference images.
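
As a hedged illustration of the deblocking idea, the sketch below smooths the two pixels on either side of a vertical block boundary. Actual deblocking filters (for example, those in H.264 or HEVC) use boundary-strength decisions and clipping rules that are omitted here; the threshold and function name are assumptions.

```python
import numpy as np

def simple_deblock_vertical(frame, boundary_x, threshold=8):
    """Naive smoothing across a vertical block boundary at column boundary_x."""
    left = frame[:, boundary_x - 1].astype(np.int32)
    right = frame[:, boundary_x].astype(np.int32)
    mask = np.abs(left - right) < threshold          # only smooth small, blocking-like steps
    avg = (left + right + 1) // 2
    frame[:, boundary_x - 1] = np.where(mask, (left + avg + 1) // 2, left).astype(frame.dtype)
    frame[:, boundary_x] = np.where(mask, (right + avg + 1) // 2, right).astype(frame.dtype)
    return frame
```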

The buffer 145 may store the reconstructed blocks output from the in-loop filter 144 and may provide them to the motion estimation block 146 and the motion compensation block 147. The buffer 145 may also provide the reconstructed block output from the in-loop filter 144 as the output data Output Data that is input to the entropy encoder device 150.

In the intra mode, the intra prediction block 148 may perform spatial prediction using the pixel values of previously encoded blocks around the current block, and may generate a first prediction block P1 as the result of the spatial prediction. In the inter mode, the motion estimation block 146 may find, during motion prediction, the reference block in the reference image stored in the buffer 145 that most closely matches the input block, to obtain a motion vector. The motion compensation block 147 may perform motion compensation using the motion vector to generate a second prediction block P2. Here, the motion vector may be a two-dimensional (2D) vector used for inter prediction and may represent the offset between the reference block and the block currently to be encoded/decoded.
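
The following sketch illustrates the motion-estimation and motion-compensation steps described above with a brute-force block match under a SAD criterion. It is illustrative only: real encoders use fast search patterns and sub-pixel refinement, and the search range and function names here are assumptions.

```python
import numpy as np

def full_search_me(cur_block, ref_frame, top, left, search_range=8):
    """Brute-force block matching (SAD criterion) around the collocated position."""
    n = cur_block.shape[0]
    best_sad, best_mv = float("inf"), (0, 0)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + n > ref_frame.shape[0] or x + n > ref_frame.shape[1]:
                continue
            cand = ref_frame[y:y + n, x:x + n]
            sad = np.abs(cur_block.astype(np.int32) - cand.astype(np.int32)).sum()
            if sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv

def motion_compensate(ref_frame, top, left, mv, n):
    """Fetch the prediction block pointed to by the motion vector."""
    dy, dx = mv
    return ref_frame[top + dy:top + dy + n, left + dx:left + dx + n]
```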

The mode decision block 149 may receive the current block, the first prediction block P1 provided from the motion compensation block 147, and the second prediction block P2 provided from the intra prediction block 148. The mode decision block 149 may determine one of the first prediction block P1 and the second prediction block P2 to be the prediction block P and may provide the prediction block P to the MLBE block 120. The mode decision block 149 may determine and output the prediction block P based on the current block value, the dequantized coefficients, the block values of the first prediction block P1 and the second prediction block P2, and the control signal CNT3.

In another embodiment, the mode decision block 149 may apply a machine learning algorithm to each of the first prediction block P1 and the second prediction block P2 and then perform the mode decision. For example, the mode decision block 149 may apply a machine learning algorithm to the first prediction block P1 to generate an enhanced first prediction block EP1, and may apply a machine learning algorithm to the second prediction block P2 to generate an enhanced second prediction block EP2. The mode decision block 149 may determine one of the enhanced first prediction block EP1 and the enhanced second prediction block EP2 to be the prediction block P and may provide it to the MLBE block 120. In this case, a value indicating whether a block has been processed by the machine learning algorithm should be included in the enhanced first prediction block EP1 or the enhanced second prediction block EP2. The image compression ratio can be further improved by this operation of the mode decision block 149.
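
A minimal sketch of this variant of the mode decision is shown below, assuming a SAD cost and a simple boolean flag to indicate that a candidate was processed by the machine learning algorithm; the actual signalling and cost function are not specified by the disclosure.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two blocks."""
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def decide_mode(current, p1, p2, enhance):
    """Enhance both candidate predictions, then keep the one closer to the current block."""
    candidates = [
        {"block": enhance(p1), "origin": "P1", "ml_processed": True},
        {"block": enhance(p2), "origin": "P2", "ml_processed": True},
    ]
    # The 'ml_processed' flag stands in for the value that signals ML processing.
    return min(candidates, key=lambda c: sad(current, c["block"]))
```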

The MLBE block 120 may process the prediction block P provided from the mode decision block 149 to output an enhanced prediction block EP as the processing result. The MLBE block 120, which performs the MLB processing, may process the prediction block P so that it approaches the source block S. The MLBE block 120 may select one of various available machine learning techniques with reference to the encoding information MLS_Info. The encoding information MLS_Info may include various information such as the prediction mode previously determined by the mode decision block 149, the characteristics of the motion vector, the partitioning form of the image, and the size of the transform unit. Various techniques such as decision trees, CNN, SVM, the K-nearest neighbor (K-NN) algorithm, and reinforcement learning may be used as the machine learning technique. The detailed characteristics of the MLBE block 120 will be described with reference to the accompanying drawings.

The encoder device controller 160 may control the overall elements of the MLB encoder 100 according to the input image or block. The encoder device controller 160 may determine the partitioning of the input image, the size of the coding blocks, and so on, and may control the encoding and decoding of the image according to the determined criteria. The encoder device controller 160 may generate a plurality of control signals CNT1 to CNT4 for these control operations and may provide the control signals CNT1 to CNT4 to the motion estimation block 146, the transformer 132, the mode decision block 149, and the dequantizer 141, respectively. The encoder device controller 160 may provide the control data Control Data included in the header data 22 (see FIG. 1) of the bitstream to the entropy encoder device 150.

As described above, according to an embodiment of the present disclosure, the MLB encoder 100 uses machine learning to process the prediction block P so that it approaches the source block S. According to an embodiment of the present disclosure, the MLB encoder 100 includes the MLBE block 120, which provides an optimal reconstruction effect through learning. The enhanced prediction block EP can be provided by the MLBE block 120 without increasing the residual data. The MLBE block 120 may maintain or update its performance through (online or offline) learning. Therefore, according to the MLB encoder 100 of the embodiment of the present disclosure, the residual data 24 can be reduced without increasing the header data 22.

FIG. 4 is a block diagram illustrating the characteristics of the MLBE block 120 shown in FIG. 3. Referring to FIG. 4, the MLBE block 120 may transform the prediction block P into an optimal enhanced prediction block EP using various pieces of encoding information MLS_Info.

The MLBE block 120 may have various machine learning algorithms ML1 to MLn (where n is an integer). The MLBE block 120 may use the encoding information MLS_Info to select the machine learning algorithm that provides the best enhancement performance. It should be well understood that each of the machine learning algorithms ML1 to MLn may be provided as a machine learning engine implemented in hardware. The machine learning algorithms ML1 to MLn may include various algorithms, for example, decision trees, CNN, SVM, and reinforcement learning.

The encoding information MLS_Info may include various parameter conditions, for example, the prediction mode, the characteristics of the motion vector, the intra direction, the size of the coding unit, the partitioning form of the image, and the size of the transform unit. It is well known that the machine learning algorithms ML1 to MLn have different filter characteristics for particular images or features. Therefore, the quality of the enhanced prediction block EP may vary according to various conditions or combinations of conditions. The MLBE block 120 according to an embodiment of the present disclosure may select the optimal machine learning algorithm determined through a learning process and may generate an enhanced prediction block EP that approaches the source block S without increasing the header data 22 of FIG. 1. Therefore, the residual data 24 (see FIG. 1) can be reduced without increasing the header data 22.

FIGS. 5A and 5B are block diagrams illustrating the MLBE block selecting the optimal machine learning algorithm for each prediction mode. FIG. 5A illustrates the MLBE block 120 selecting the second machine learning algorithm ML2 from among a plurality of available machine learning algorithms in the intra mode. FIG. 5B illustrates the MLBE block 120 selecting the third machine learning algorithm ML3 from among a plurality of available machine learning algorithms in the inter mode.

Referring to FIG. 5A, if the encoding information MLS_Info provided to the MLBE block 120 indicates the intra mode, an intra prediction block P_Intra is transferred from the intra prediction block 148 shown in FIG. 3. The intra prediction block P_Intra in the intra mode is generated using only information within a limited screen. Therefore, the intra prediction block P_Intra may be relatively coarse in terms of resolution or quality. The MLBE block 120 may select the second machine learning algorithm ML2 to process such an intra prediction block P_Intra into an enhanced prediction block EP that approaches the source block S. This selection may be performed based on the results of various previously performed learning.

Referring to FIG. 5B, if the encoding information MLS_Info provided to the MLBE block 120 indicates the inter mode, an inter prediction block P_Inter is transferred from the motion compensation block 147 shown in FIG. 3. The inter prediction block P_Inter is generated with reference to another, previously processed frame of the image. Therefore, the inter prediction block P_Inter may be relatively finer in resolution than the intra prediction block P_Intra generated in the intra mode, or may have higher quality than the intra prediction block P_Intra. The MLBE block 120 may select the third machine learning algorithm ML3 to process such an inter prediction block P_Inter into an enhanced prediction block EP that approaches the source block S. This selection may be performed based on the results of various previously performed learning.

As described above, the method of selecting a machine learning algorithm according to the prediction mode has been described only as a choice among independent alternatives. However, this is merely an exemplary embodiment. It should be well understood that one or more machine learning algorithms may be combined and applied in various ways according to combinations of various pieces of encoding information MLS_Info.

FIGS. 6A and 6B are flowcharts illustrating encoding methods that select a machine learning technique according to the characteristics of a prediction block, according to embodiments of the present disclosure. Referring to FIGS. 6A and 6B, exemplary operating characteristics of the MLBE block 120 (see FIG. 4) according to an embodiment of the present disclosure will be described.

Referring to FIG. 6A, the MLBE block 120 may select a machine learning algorithm according to the characteristics of the prediction block.

In operation S110, the MLBE block 120 may receive the prediction block P shown in FIG. 4 and the encoding information MLS_Info. The encoding information MLS_Info may include various parameters or conditions, for example, the prediction mode, the magnitude or direction of the motion vector, the intra direction, the size of the coding unit CU, the partitioning form of the image, and the size of the transform unit. Such encoding information MLS_Info may be provided from the mode decision block 149, the encoder device controller 160, the motion estimation block 146, the in-loop filter 144, and the like shown in FIG. 3. However, it should be well understood that the type or range of the encoding information MLS_Info is not limited thereto. A combination of encoding information MLS_Info in various data forms may be used to generate an enhanced prediction block EP with high accuracy.

In operation S120, the MLBE block 120 may check and analyze the provided encoding information MLS_Info. The MLBE block 120 may classify the provided information (for example, the prediction mode, the magnitude or direction of the motion vector, the intra direction, the size of the coding unit CU, the partitioning form of the image, and the size of the transform unit) according to predetermined criteria. The predetermined criteria may include information indicating whether any of the information is applied first, as well as the detailed operation flow that depends on the corresponding information.

In operation S130, the MLBE block 120 may check the prediction mode, and the flow may branch accordingly. To simplify the description of the embodiment, it is assumed that the MLBE block 120 determines the machine learning technique according to the prediction mode and the motion vector. Of course, it should be well understood that a combination of various pieces of encoding information MLS_Info may be applied to determine the machine learning technique. If the prediction mode is the intra mode, the flow moves to operation S180. In contrast, if the prediction mode is the inter mode, the flow moves to operation S140.

In operation S140, the flow may branch according to the motion vector. To simplify the description, it is assumed that the flow branches according to the direction of the motion vector. If the motion vector corresponds to the first direction Dir1, the flow moves to operation S150. If the motion vector corresponds to the second direction Dir2, the flow moves to operation S160. If the motion vector corresponds to the third direction Dir3, the flow moves to operation S170.

In each of operations S150 to S180, the prediction block P may be processed according to the selected machine learning technique. As an illustrative example, in operation S150 the prediction block P may be processed according to the decision tree machine learning algorithm ML1. In operation S160, the prediction block P may be processed according to the CNN machine learning algorithm ML2. In operation S170, the prediction block P may be processed according to the SVM machine learning algorithm ML3. In operation S180, the prediction block P may be processed according to the machine learning algorithm ML4, a K-nearest neighbor (K-NN) type algorithm suitable for pattern recognition and decision making. In addition, a reinforcement learning algorithm or various other machine learning algorithms may be used to produce the enhanced prediction block EP from the prediction block P according to embodiments of the present disclosure.

In operation S190, the MLBE block 120 may output the enhanced prediction block EP generated by the selected machine learning algorithm. The output enhanced prediction block EP is transmitted to the subtractor 110 (see FIG. 3).

As described above, the type of the machine learning algorithm may be selected according to the characteristics of the prediction block. However, the advantages of the present disclosure are not limited to the above embodiment. Another characteristic will be described below with reference to FIG. 6B.

Referring to FIG. 6B, one of various parameter sets may be selected within a single machine learning algorithm (for example, a CNN) according to the characteristics of the prediction block. Herein, a CNN is described as an example of the machine learning algorithm. However, it should be understood that the present disclosure is not limited thereto.

In operation S210, the MLBE block 120 may receive the prediction block P shown in FIG. 4 and the encoding information MLS_Info. The encoding information MLS_Info may include various parameters or conditions, for example, the prediction mode, the magnitude or direction of a motion vector, the intra-frame direction, the size of a coding unit CU, the partition form of the image, and the size of a transform unit. Such encoding information MLS_Info may be provided from the mode decision block 149, the encoder device controller 160, the motion estimation block 146, the in-loop filter 144, and the like.

In operation S220, the MLBE block 120 may check and analyze the provided encoding information MLS_Info. The MLBE block 120 may classify the provided information (for example, the prediction mode, the magnitude or direction of the motion vector, the intra-frame direction, the size of the coding unit CU, the partition form of the image, and the size of the transform unit) according to predetermined criteria. The predetermined criteria may include information indicating whether a given piece of information is applied first, as well as the detailed operation flow that depends on the corresponding information.

In operation S230, the MLBE block 120 may check the prediction mode and may branch accordingly. To simplify the description of the embodiment, it is assumed that the MLBE block 120 determines the machine learning technique based on the prediction mode and the motion vector. Of course, it should be understood that a combination of various kinds of encoding information MLS_Info may be applied to determine the machine learning technique. If the prediction mode is the intra-frame mode, the flow may move to operation S280. On the other hand, if the prediction mode is the inter-frame mode, the flow may move to operation S240.

In operation S240, the operation may branch according to the motion vector. To simplify the description, it is assumed that the operation branches according to the direction of the motion vector. If the motion vector corresponds to the first direction Dir1, the flow may move to operation S250. If the motion vector corresponds to the second direction Dir2, the flow may move to operation S260. If the motion vector corresponds to the third direction Dir3, the flow may move to operation S270.

In each of operations S250 through S280, the prediction block P may be processed according to the selected parameter set. As an illustrative example, in operation S250, the prediction block P may be processed according to the CNN algorithm set to the first parameter set. In operation S260, the prediction block P may be processed according to the CNN algorithm set to the second parameter set. In operation S270, the prediction block P may be processed according to the CNN algorithm set to the third parameter set. In operation S280, the prediction block P may be processed according to the CNN algorithm set to the fourth parameter set. Although the embodiment is described as being divided into four parameter sets, embodiments of the present disclosure are not limited thereto.
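
For the single-CNN case, the algorithm choice of the previous flow reduces to a parameter-set lookup. The sketch below is an assumption-laden illustration: the dictionary keys and weight file names are placeholders, and an actual implementation would load trained CNN weights rather than strings.

```python
# Hypothetical parameter sets; in practice each entry would hold trained CNN weights.
CNN_PARAMETER_SETS = {
    "set1": {"weights": "cnn_inter_dir1.pt"},  # inter, direction Dir1 (operation S250)
    "set2": {"weights": "cnn_inter_dir2.pt"},  # inter, direction Dir2 (operation S260)
    "set3": {"weights": "cnn_inter_dir3.pt"},  # inter, direction Dir3 (operation S270)
    "set4": {"weights": "cnn_intra.pt"},       # intra prediction      (operation S280)
}

def select_parameter_set(prediction_mode, direction):
    """Choose one parameter set for a single CNN, following operations S230 and S240."""
    if prediction_mode == "intra":
        return CNN_PARAMETER_SETS["set4"]
    return {"Dir1": CNN_PARAMETER_SETS["set1"],
            "Dir2": CNN_PARAMETER_SETS["set2"],
            "Dir3": CNN_PARAMETER_SETS["set3"]}[direction]
```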

In operation S290, the MLBE block 120 may output the enhanced prediction block EP generated by the machine learning algorithm with the selected parameter set. The output enhanced prediction block EP is transmitted to the subtractor 110 (see FIG. 3).

As described above, the type of machine learning may be selected by the MLBE block 120 according to the encoding information MLS_Info, and a parameter set within the same machine learning algorithm may likewise be selected by the MLBE block 120 according to embodiments of the present disclosure. Because the optimal machine learning algorithm or parameter set corresponding to each of the various prediction blocks P is selected, the difference between the enhanced prediction block EP and the source block S can be minimized.

FIG. 7 is a block diagram illustrating a training method of an MLBE block according to an embodiment of the present disclosure. Referring to FIG. 7, the machine learning algorithms included in the MLBE block 120 may be learned or trained offline using various patterns or images.

Each of the machine learning algorithms ML1 through MLn of the MLBE block 120 may be trained using the source block S 121 and the prediction block P 122 as inputs. For example, in the case of an NN machine learning algorithm, the prediction block P 122 may represent a plurality of different prediction blocks of the source block S 121. For example, the machine learning parameters (ML parameters) may be updated such that the prediction blocks P 122 generated by the various prediction modes are mapped to the source block S 121.

Training using the source block S 121 and the prediction block P 122 may be performed for each of the machine learning algorithms ML1 through MLn. Once the various previously prepared images or patterns have been used for training, the parameters of each of the machine learning algorithms ML1 through MLn may be fixed. For example, in the case of ImageNet, a data set used for training CNNs, approximately 14,000,000 or more training images may be used. Accordingly, each machine learning algorithm may have decision parameters learned using one or more predetermined training data sets.
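
One way to picture this offline mapping from prediction blocks P to source blocks S is a small PyTorch training loop. The sketch below is only illustrative under stated assumptions: the network depth, learning rate, loss function, and residual formulation are choices made for the example, not details of the disclosure, and `dataset` is assumed to yield (P, S) tensor pairs of shape (N, 1, H, W).

```python
import torch
import torch.nn as nn

class EnhanceCNN(nn.Module):
    """A deliberately small CNN standing in for one enhancement network."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1))

    def forward(self, p):
        return p + self.net(p)   # predict a correction to the prediction block P

def train_offline(model, dataset, epochs=10):
    """Offline training: minimize the difference between the enhanced
    prediction block EP = model(P) and the source block S."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for p_block, s_block in dataset:   # (prediction block P, source block S)
            optimizer.zero_grad()
            loss = loss_fn(model(p_block), s_block)
            loss.backward()
            optimizer.step()
    return model   # parameters are then fixed for use in the encoder
```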

When the above learning or training process is complete, the MLBE block 120 may generate, from the prediction block P 122 that is input together with the encoding information MLS_Info, an enhanced prediction block EP having values most similar to the source block S 121.

FIG. 8 is a diagram illustrating a training method of an MLBE block according to another embodiment of the present disclosure. Referring to FIG. 8, the machine learning algorithms included in the MLBE block 120 may be trained, according to an online training scheme, using the images to be processed.

In this case, the machine learning algorithms may be trained using the frames of the input image, rather than using machine learning algorithms trained in advance. When the training session ends, the training result may be used thereafter and only parameter updates may be performed. For example, the training of the MLBE block 120 may be performed using the frames F0 through F4 of the input image that correspond to the training session (for example, a training interval). When the training session ends, only updates of the parameters may be performed using the subsequently input frames F5 through F11. Accordingly, when an input image is provided, each machine learning algorithm may be trained using the frames of the input image, for example, during the training interval.
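
A minimal sketch of such a training interval is given below, assuming hypothetical callbacks `full_train`, `update_params`, and `enhance` and an interval length of five frames (F0 through F4); none of these names or values come from the disclosure.

```python
def encode_sequence(frames, model, full_train, update_params, enhance,
                    training_interval=5):
    """Online scheme sketch: the frames of the training interval (e.g. F0..F4)
    train the enhancement model, while later frames (e.g. F5..F11) only refresh
    its parameters using the already-obtained training result."""
    for index, frame in enumerate(frames):
        if index < training_interval:
            full_train(model, frame)      # training session on the early frames
        else:
            update_params(model, frame)   # parameter update only
        enhance(model, frame)             # apply MLB prediction enhancement
```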

When the online training scheme is used, a separate data set for training is not required. The machine learning algorithms may be trained using the input image itself, and the parameter size may therefore be relatively small. However, if the elements and resources for supporting online training are not provided, it may be difficult to achieve adequate performance.

FIG. 9 is a block diagram illustrating an MLB encoder according to another embodiment of the present disclosure. Referring to FIG. 9, the MLB encoder 200 includes a subtractor 210, an MLBE block 220, a transformer 232, a quantizer 234, a prediction block 240, an entropy encoder device 250, and an encoder device controller 260. Herein, the prediction block 240 includes an inverse quantizer 241, an inverse transformer 242, an adder 243, an in-loop filter 244, a buffer 245, a motion estimation block 246, a motion compensation block 247, an intra-frame prediction block 248, and a mode decision block 249. The MLB encoder 200 may selectively provide the enhancement function for the MLB prediction block. For example, the MLB encoder 200 may determine whether to use the MLBE block 220 according to a rate-distortion optimization (RDO) value indicating coding efficiency. Accordingly, whichever of the prediction block and the enhanced prediction block has the RDO value with the better compression efficiency may be selected.

Herein, except that the MLBE block 220 determines whether to use MLB prediction enhancement according to the RDO value, the subtractor 210, the transformer 232, the quantizer 234, the prediction block 240, the entropy encoder device 250, and the encoder device controller 260 may be substantially the same as those shown in FIG. 3. Accordingly, detailed descriptions of the functions of the subtractor 210, the transformer 232, the quantizer 234, the prediction block 240, the entropy encoder device 250, and the encoder device controller 260 will be omitted.

On the other hand, the MLBE block 220 may have the enhancement function for the MLB prediction block P shown in FIG. 3 and may additionally determine whether to generate the enhanced prediction block EP or to bypass the provided prediction block P. If it is determined from the RDO value that the machine learning technique provides no performance gain from processing the prediction block, the MLBE block 220 may bypass the prediction block P provided from the mode decision block 249 to the subtractor 210. If it is determined from the RDO value that the machine learning technique provides a performance gain from processing the prediction block, the MLBE block 220 may process the prediction block P provided from the mode decision block 249 using the machine learning technique and may transmit the enhanced prediction block EP, as the processing result, to the subtractor 210.

The selective prediction enhancement operation of the MLBE block 220 described above can prevent the overhead that prediction enhancement would otherwise cause. Herein, the RDO value is described as an example of the information used to determine whether to apply the prediction enhancement operation of the MLBE block 220. However, the MLBE block 220 according to embodiments of the present disclosure may perform the selective prediction enhancement operation using various performance parameters as well as the RDO value.

FIG. 10 is a block diagram illustrating the function of the MLBE block shown in FIG. 9. Referring to FIG. 10, the MLBE block 220 includes an MLBE block 222 and a selection block 224. The MLBE block 222 may transform the prediction block P into the optimal enhanced prediction block EP using various kinds of encoding information MLS_Info. The selection block 224 may select one of the prediction block P and the enhanced prediction block EP.

The MLBE block 222 may include various machine learning algorithms ML1 through MLn. The MLBE block 222 may use the encoding information MLS_Info to select the machine learning algorithm with the best enhancement performance. The MLBE block 222 may perform substantially the same functions as the MLBE block 120 shown in FIG. 4. Accordingly, a description of the MLBE block 222 will be omitted.

The selection block 224 may select one of the prediction block P and the enhanced prediction block EP with reference to the RDO values. The selected block may be output as the selected prediction block SP provided to the subtractor 210 shown in FIG. 9. The overhead incurred by applying machine learning can be reduced by selecting either the prediction block P or the enhanced prediction block EP according to the RDO values.
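
A one-line sketch of the selection block is shown below; `rdo_cost` is a hypothetical callable returning the rate-distortion cost of encoding with a given block, and is only an assumption for illustration.

```python
def select_prediction_block(p_block, ep_block, rdo_cost):
    """Keep whichever of the prediction block P and the enhanced prediction
    block EP has the lower rate-distortion cost (better RDO value)."""
    return ep_block if rdo_cost(ep_block) < rdo_cost(p_block) else p_block
```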

FIG. 11 is a flowchart illustrating the operation of the MLBE block shown in FIG. 10. Referring to FIG. 11, the MLBE block 220 shown in FIG. 9 may, with reference to the RDO value, bypass the prediction block P, to which no enhancement operation is applied, from the mode decision block 249 shown in FIG. 9 to the subtractor 210 shown in FIG. 9.

In operation S310, the MLBE block 220 may receive the prediction block P generated by the mode decision block 249.

In operation S320, the MLBE block 220 may calculate the RDO values. According to the RDO values, the MLBE block 220 may determine whether to perform the MLB prediction enhancement operation or to transmit the prediction block P provided from the mode decision block 249 to the subtractor 210 without performing the MLB prediction enhancement operation.

In operation S330, if, according to the RDO values, the performance when the prediction block P is processed using machine learning is equal to or less than (worse than) the performance when machine learning is not used (ML ≤ non-ML), the flow may move to operation S340. In contrast, if the performance when the prediction block P is processed using machine learning is greater than (better than) the performance when machine learning is not used (ML > non-ML), the flow may move to operation S350.

In operation S340, the MLBE block 220 may select the prediction block P provided from the mode decision block 249 and may transmit the selected prediction block P to the subtractor 210.

In operation S350, the MLBE block 220 may process the prediction block P provided from the mode decision block 249 through the MLBE block 222 to obtain a processing result. The MLBE block 220 may select the enhanced prediction block EP and may transmit the selected enhanced prediction block EP to the subtractor 210.

In operation S360, the MLB encoder 200 may write, in the video stream syntax, a flag indicating whether the transmitted bit stream is compressed by applying machine learning.

As described above, the RDO values may be used to determine whether to apply the activation operation of the MLB prediction block according to embodiments of the present disclosure. In some cases, a special situation may occur in which the overhead increases when MLB activation is applied. In such a case, the MLB encoder 200 according to an embodiment of the present disclosure may select the prediction block P to which machine learning is not applied, thereby preventing the overhead caused by performing machine learning.

FIG. 12 is a diagram illustrating an example of the video stream syntax according to an embodiment of the present disclosure described with reference to FIG. 11. Referring to FIG. 12, it can be seen that the MLB prediction enhancement operation according to an embodiment of the present disclosure is applied through the syntax representing a coding unit.

When transmitting the bit stream of an image or block, if MLB prediction enhancement is applied, the MLB encoder 200 shown in FIG. 9 may write '1' in the flag (ml_based_pred_enhancement_flag) of the video stream syntax. In contrast, when transmitting the bit stream of an image or block, if MLB prediction enhancement is not applied, the MLB encoder 200 may write '0' in the flag of the video stream syntax.

With reference to the flag (ml_based_pred_enhancement_flag) of the video stream syntax, a decoder may apply or skip the prediction enhancement operation performed by machine learning when performing a decoding operation.
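
The flag handling can be sketched as follows. Modeling the bitstream as a plain Python list of bits is an assumption made only for illustration and does not reflect the actual entropy-coded coding-unit syntax.

```python
def write_cu_syntax(bitstream, used_ml_enhancement):
    """Encoder side: signal whether MLB prediction enhancement was applied,
    mirroring the single-bit ml_based_pred_enhancement_flag."""
    bitstream.append(1 if used_ml_enhancement else 0)

def read_cu_syntax(bitstream, position):
    """Decoder side: parse the flag and decide whether to apply or skip the
    machine-learning-based prediction enhancement during decoding."""
    apply_ml_enhancement = (bitstream[position] == 1)
    return apply_ml_enhancement, position + 1
```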

FIG. 13 is a block diagram illustrating an MLB decoder. Referring to FIG. 13, the MLB decoder 300 according to an embodiment of the present disclosure includes an MLBE block 390 for decoding.

The MLBE block 390 may perform operations the same as or similar to those of the MLBE block 120 shown in FIG. 3 or the MLBE block 220 shown in FIG. 9. In other words, the MLBE block 390 may select the optimal machine learning algorithm using the encoding information MLS_Info and may use the selected machine learning algorithm to generate the enhanced prediction block EP from the prediction block P. Alternatively, the MLBE block 390 may select the prediction block P or the enhanced prediction block EP with reference to the flag included in the video stream syntax. The MLB decoder 300 may use such an MLB-enhanced prediction block EP to reconstruct the bit stream 30 into the output image 40.

FIG. 14 is a block diagram illustrating a detailed configuration of the MLB decoder shown in FIG. 13. Referring to FIG. 14, the MLB decoder 300 includes an entropy decoder 310, an inverse quantizer 320, an inverse transformer 330, an adder 340, an in-loop filter 350, a buffer 360, an intra-frame prediction block 370, a motion compensation block 372, a motion estimation block 374, a mode decision block 380, and an MLBE block 390.

The MLB decoder 300 may receive the bit stream output from the MLB encoder 100 shown in FIG. 1 or the MLB encoder 200 shown in FIG. 9 and may perform decoding in the intra-frame mode or the inter-frame mode to output a reconstructed image. The MLB decoder 300 may obtain a reconstructed residual block from the received bit stream and may generate the prediction block P. When the MLBE block 390 performs MLB processing on the prediction block P, the enhanced prediction block EP may be generated. The MLB decoder 300 may add the reconstructed residual block to the enhanced prediction block EP to generate a reconstructed block.
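
The final reconstruction step can be sketched as an element-wise addition followed by clipping. Representing blocks as 2-D lists of integers and clipping to an 8-bit sample range are assumptions made only for illustration.

```python
def reconstruct_block(residual_block, enhanced_prediction, bit_depth=8):
    """Add the reconstructed residual block to the enhanced prediction block EP
    and clip each sample to the valid range for the given bit depth."""
    max_value = (1 << bit_depth) - 1
    return [[min(max(r + e, 0), max_value)
             for r, e in zip(residual_row, ep_row)]
            for residual_row, ep_row in zip(residual_block, enhanced_prediction)]
```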

The overall elements of the MLB decoder 300 may be substantially the same as the overall elements of the MLB encoder 100 or 200 described above. Accordingly, a detailed description of the elements of the MLB decoder 300 will be omitted below.

FIG. 15 is a block diagram illustrating a portable terminal for performing an MLB prediction enhancement operation according to an embodiment of the present disclosure. Referring to FIG. 15, the portable terminal 1000 according to an embodiment of the present disclosure includes an image processing unit 1100, a wireless transceiver, an audio processing unit, a power management integrated circuit (PMIC) 1400, a battery 1450, a memory 1500, a user interface 1600, and a controller 1700.

The image processing unit 1100 includes a lens 1110, an image sensor 1120, an image processor 1130, and a display unit 1140. The wireless transceiver includes an antenna 1210, a transceiver 1220, and a modem 1230. The audio processing unit includes an audio processor 1310, a microphone 1320, and a speaker 1330.

In particular, the image processing unit 1100 according to an embodiment of the present disclosure may process the prediction block by applying a machine learning technique. In this case, the image processing unit 1100 can reduce the residual data without increasing the size of the header data of the video signal.

According to the encoder and the encoding method of the embodiments of the present disclosure, an encoder and a decoder can be implemented that enhance the data compression rate by minimizing the difference between the prediction block and the source block, while keeping the degradation of image quality low.

Although the present disclosure has been described with reference to exemplary embodiments, it will be apparent to those skilled in the art that various changes and modifications can be made without departing from the spirit and scope of the present disclosure. Therefore, it should be understood that the above embodiments are not limiting, but illustrative.

10‧‧‧Input image
20‧‧‧Output data
22‧‧‧Header data
24‧‧‧Residual data
30‧‧‧Bit stream
40‧‧‧Output image
100, 200‧‧‧MLB encoder
110, 210‧‧‧Subtractor
120, 220, 222, 390‧‧‧MLBE block
121‧‧‧Source block S
122‧‧‧Prediction block P
130‧‧‧Transformer/quantizer
132, 232‧‧‧Transformer
134, 234‧‧‧Quantizer
140, 240‧‧‧Prediction block
141, 241, 320‧‧‧Inverse quantizer
142, 242, 330‧‧‧Inverse transformer
143, 243, 340‧‧‧Adder
144, 244, 350‧‧‧In-loop filter
145, 245, 360‧‧‧Buffer
146, 374‧‧‧Motion estimation block
147, 247, 372‧‧‧Motion compensation block
148, 248, 370‧‧‧Intra-frame prediction block
149, 249, 380‧‧‧Mode decision block
150, 250‧‧‧Entropy encoder device
160, 260‧‧‧Encoder device controller
224‧‧‧Selection block
246‧‧‧Motion estimation
300‧‧‧MLB decoder
310‧‧‧Entropy decoder
1000‧‧‧Portable terminal
1100‧‧‧Image processing unit
1110‧‧‧Lens
1120‧‧‧Image sensor
1130‧‧‧Image processor
1140‧‧‧Display unit
1210‧‧‧Antenna
1220‧‧‧Transceiver
1230‧‧‧Modem
1310‧‧‧Audio processor
1320‧‧‧Microphone
1330‧‧‧Speaker
1400‧‧‧Power management integrated circuit
1450‧‧‧Battery
1500‧‧‧Memory
1600‧‧‧User interface
1700‧‧‧Controller
Bitstream‧‧‧Bit stream
CNT1, CNT2, CNT3, CNT4‧‧‧Control signals
Dir1‧‧‧First direction
Dir2‧‧‧Second direction
Dir3‧‧‧Third direction
EP‧‧‧Enhanced prediction block/MLB-enhanced prediction block
F0, F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11‧‧‧Frames
ML1‧‧‧Machine learning algorithm/decision tree machine learning algorithm
ML2‧‧‧Second machine learning algorithm/CNN machine learning algorithm/machine learning algorithm
ML3‧‧‧Third machine learning algorithm/SVM machine learning algorithm/machine learning algorithm
MLn‧‧‧Machine learning algorithm
MLS_Info‧‧‧Encoding information
P‧‧‧Prediction block/MLB prediction block
P_Inter‧‧‧Inter-frame prediction block
P_Intra‧‧‧Intra-frame prediction block
P1‧‧‧First prediction block
P2‧‧‧Second prediction block
S‧‧‧Source block
S110, S120, S130, S140, S150, S160, S170, S180, S190, S210, S220, S230, S240, S250, S260, S270, S280, S290, S310, S320, S330, S340, S350, S360‧‧‧Operations
SP‧‧‧Selected prediction block

The above and other objects and features will become apparent from the following description read with reference to the accompanying figures, in which, unless otherwise specified, like reference numerals refer to like parts throughout, and in which:
FIG. 1 is a block diagram illustrating a configuration of an MLB encoder according to an embodiment of the present disclosure.
FIG. 2 is a block diagram illustrating a schematic configuration of the MLB encoder shown in FIG. 1.
FIG. 3 is a block diagram illustrating a detailed configuration of the MLB encoder shown in FIG. 2.
FIG. 4 is a block diagram illustrating the characteristics of the machine learning based prediction enhancement (MLBE) block shown in FIG. 3.
FIG. 5A and FIG. 5B are block diagrams illustrating an MLBE block for selecting the optimal machine learning algorithm according to each prediction mode.
FIG. 6A and FIG. 6B are flowcharts illustrating encoding methods of selecting a machine learning technique according to the characteristics of a prediction block, according to embodiments of the present disclosure.
FIG. 7 is a block diagram illustrating a training method of an MLBE block according to an embodiment of the present disclosure.
FIG. 8 is a diagram illustrating a training method of an MLBE block according to another embodiment of the present disclosure.
FIG. 9 is a block diagram illustrating an MLB encoder according to another embodiment of the present disclosure.
FIG. 10 is a block diagram illustrating the function of the MLBE block shown in FIG. 9.
FIG. 11 is a flowchart illustrating the operation of the MLBE block shown in FIG. 10.
FIG. 12 is a diagram illustrating an example of a video stream syntax according to an embodiment of the present disclosure, described with reference to FIG. 11.
FIG. 13 is a block diagram illustrating an MLB decoder.
FIG. 14 is a block diagram illustrating a detailed configuration of the MLB decoder shown in FIG. 13.
FIG. 15 is a block diagram illustrating a portable terminal for performing an MLB prediction enhancement operation according to an embodiment of the present disclosure.

Claims (10)

1. An image encoder that outputs a bit stream by encoding an input image, the image encoder comprising: a prediction block configured to generate a prediction block using data of a previously input block; a machine learning based prediction enhancement (MLBE) block configured to transform the prediction block into an enhanced prediction block by applying a machine learning technique to the prediction block; and a subtractor configured to generate a residual block of residual data by subtracting pixel data of the enhanced prediction block from pixel data of a current input block.
2. The image encoder of claim 1, wherein the machine learning based prediction enhancement block is configured to: execute a plurality of machine learning algorithms to process the prediction block.
3. The image encoder of claim 2, wherein the machine learning based prediction enhancement block is configured to: select, with reference to encoding information of the input image, at least one of the plurality of machine learning algorithms as a selected machine learning algorithm; and process the prediction block using the selected machine learning algorithm.
4. The image encoder of claim 3, wherein the encoding information comprises at least one of: a prediction mode corresponding to the prediction block, a magnitude and a direction of a motion vector, an intra-frame direction, a size of a coding unit, a partition from the image, and a size of a transform unit.
5. The image encoder of claim 2, wherein the plurality of machine learning algorithms comprise at least one of: a decision tree, a neural network (NN), a convolutional neural network (CNN), a support vector machine (SVM), reinforcement learning, and a K-nearest neighbor (K-NN) algorithm.
6. The image encoder of claim 1, wherein the machine learning based prediction enhancement block is configured to: transmit one of the prediction block and the enhanced prediction block to the subtractor according to a rate-distortion optimization (RDO) value of each of the prediction block and the enhanced prediction block.
7. The image encoder of claim 6, wherein the machine learning based prediction enhancement block comprises: a machine learning based prediction enhancement block configured to select one of a plurality of machine learning algorithms as a selected machine learning algorithm, to process the prediction block using the selected machine learning algorithm to obtain a processing result, and to generate the enhanced prediction block according to the processing result; and a selection block configured to select one of the prediction block and the enhanced prediction block as a selected block according to the rate-distortion optimization value of each of the prediction block and the enhanced prediction block, and to transmit the selected block to the subtractor.
8. The image encoder of claim 1, wherein the machine learning based prediction enhancement block is configured to: execute a machine learning algorithm that selects one of a plurality of parameter sets as a selected parameter set according to encoding information and processes the prediction block using the selected parameter set.
9. A method of processing image data, the method comprising: generating a prediction block from time-domain data of a previously input block; transforming the prediction block into an enhanced prediction block by applying at least one of a plurality of machine learning techniques to the prediction block; and generating a residual block by subtracting the enhanced prediction block from a current input block.
10. A method of processing image data, the method comprising: generating a prediction block from time-domain data of a previously input block; transforming the prediction block into an enhanced prediction block by applying at least one of a plurality of machine learning techniques to the prediction block; selecting one of the prediction block and the enhanced prediction block as a selected block using a rate-distortion optimization (RDO) value corresponding to each of the prediction block and the enhanced prediction block; and generating a residual block by subtracting the selected block from the current input block.
TW107130991A 2017-10-19 2018-09-04 Image encoder using machine learning and data processing method of the image encoder TWI748125B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
KR1020170136043A KR102535361B1 (en) 2017-10-19 2017-10-19 Image encoder using machine learning and data processing method thereof
KR10-2017-0136043 2017-10-19

Publications (2)

Publication Number Publication Date
TW201924340A true TW201924340A (en) 2019-06-16
TWI748125B TWI748125B (en) 2021-12-01

Family

ID=66170216

Family Applications (1)

Application Number Title Priority Date Filing Date
TW107130991A TWI748125B (en) 2017-10-19 2018-09-04 Image encoder using machine learning and data processing method of the image encoder

Country Status (4)

Country Link
US (2) US11115673B2 (en)
KR (1) KR102535361B1 (en)
CN (1) CN109688406B (en)
TW (1) TWI748125B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI733270B (en) * 2019-12-11 2021-07-11 中華電信股份有限公司 Training device and training method for optimized hyperparameter configuration of machine learning model

Families Citing this family (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102535361B1 (en) * 2017-10-19 2023-05-24 삼성전자주식회사 Image encoder using machine learning and data processing method thereof
CN108665067B (en) * 2018-05-29 2020-05-29 北京大学 Compression method and system for frequent transmission of deep neural network
KR102022648B1 (en) * 2018-08-10 2019-09-19 삼성전자주식회사 Electronic apparatus, method for controlling thereof and method for controlling server
US11240492B2 (en) * 2019-01-22 2022-02-01 Apple Inc. Neural network based residual coding and prediction for predictive coding
US10652581B1 (en) * 2019-02-27 2020-05-12 Google Llc Entropy coding in image and video compression using machine learning
US10771807B1 (en) * 2019-03-28 2020-09-08 Wipro Limited System and method for compressing video using deep learning
US10552121B1 (en) * 2019-05-07 2020-02-04 Capital One Services, Llc System and method for dynamic process flow control based on real-time events
CN110163370B (en) * 2019-05-24 2021-09-17 上海肇观电子科技有限公司 Deep neural network compression method, chip, electronic device and medium
CN113874916A (en) * 2019-05-26 2021-12-31 阿里巴巴集团控股有限公司 AI-assisted programmable hardware video codec
JP7318314B2 (en) * 2019-05-30 2023-08-01 富士通株式会社 Encoding program, decoding program, encoding device, decoding device, encoding method and decoding method
CN110557646B (en) * 2019-08-21 2021-12-07 天津大学 Intelligent inter-view coding method
KR20210042588A (en) 2019-10-10 2021-04-20 엘지전자 주식회사 Method and apparatus for compressing or restoring image
CN110738313B (en) * 2019-10-15 2022-05-31 阿波罗智能技术(北京)有限公司 Method, apparatus, device and medium for evaluating quantization operation
CN110740319B (en) * 2019-10-30 2024-04-05 腾讯科技(深圳)有限公司 Video encoding and decoding method and device, electronic equipment and storage medium
KR102245682B1 (en) * 2019-11-11 2021-04-27 연세대학교 산학협력단 Apparatus for compressing image, learning apparatus and method thereof
CN111160487B (en) * 2019-12-31 2024-02-13 清华大学 Expansion method and device for face image dataset
WO2021165569A1 (en) * 2020-02-21 2021-08-26 Nokia Technologies Oy A method, an apparatus and a computer program product for video encoding and video decoding
US20230110503A1 (en) * 2020-02-24 2023-04-13 Nokia Technologies Oy Method, an apparatus and a computer program product for video encoding and video decoding
US11490135B2 (en) 2020-06-19 2022-11-01 Micron Technology, Inc. Surveillance camera upgrade via removable media having deep learning accelerator and random access memory
US11356601B2 (en) 2020-06-19 2022-06-07 Micron Technology, Inc. Intelligent digital camera having deep learning accelerator and random access memory
US20210400286A1 (en) * 2020-06-19 2021-12-23 Micron Technology, Inc. Video Compression in Removable Storage Device having Deep Learning Accelerator and Random Access Memory
KR102394951B1 (en) * 2020-08-27 2022-05-09 한국전자기술연구원 Method and apparatus for encoding and decoding
WO2022186620A1 (en) * 2021-03-04 2022-09-09 현대자동차주식회사 Video coding method and apparatus for improving prediction signals of intra prediction
CN113052257B (en) * 2021-04-13 2024-04-16 中国电子科技集团公司信息科学研究院 Deep reinforcement learning method and device based on visual transducer
CN113240605A (en) * 2021-05-21 2021-08-10 南开大学 Image enhancement method for forward and backward bidirectional learning based on symmetric neural network
WO2023287018A1 (en) * 2021-07-13 2023-01-19 현대자동차주식회사 Video coding method and apparatus for refining intra-prediction signals based on deep learning
WO2023198057A1 (en) * 2022-04-12 2023-10-19 Beijing Bytedance Network Technology Co., Ltd. Method, apparatus, and medium for video processing
WO2023211253A1 (en) * 2022-04-28 2023-11-02 인텔렉추얼디스커버리 주식회사 Neural network-based video compression method using motion vector field compression
WO2024039166A1 (en) * 2022-08-18 2024-02-22 삼성전자 주식회사 Image decoding apparatus and image encoding apparatus using ai and method by said apparatuses
WO2024058642A1 (en) * 2022-09-16 2024-03-21 한국전자통신연구원 Method and device for image encoding/decoding, and recording medium

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4735375B2 (en) 2006-04-04 2011-07-27 株式会社日立製作所 Image processing apparatus and moving image encoding method.
US8488668B2 (en) 2007-06-15 2013-07-16 Qualcomm Incorporated Adaptive coefficient scanning for video coding
JP4717860B2 (en) * 2007-08-22 2011-07-06 眞一郎 湯村 Data compression method, image display method, and display image enlargement method
JP2009111691A (en) 2007-10-30 2009-05-21 Hitachi Ltd Image-encoding device and encoding method, and image-decoding device and decoding method
US8929440B2 (en) 2010-04-09 2015-01-06 Sony Corporation QP adaptive coefficients scanning and application
JP2012065081A (en) 2010-09-15 2012-03-29 Hitachi Ltd Image encoding method, image decoding method, image encoding apparatus, and image decoding apparatus
KR20120100448A (en) 2011-03-04 2012-09-12 (주)스프링웨이브 Apparatus and method for video encoder using machine learning
CN106162167B (en) 2015-03-26 2019-05-17 中国科学院深圳先进技术研究院 Efficient video coding method based on study
JP6715466B2 (en) * 2015-06-12 2020-07-01 パナソニックIpマネジメント株式会社 Image coding method, image decoding method, image coding device, and image decoding device
US10827186B2 (en) * 2016-08-25 2020-11-03 Intel Corporation Method and system of video coding with context decoding and reconstruction bypass
CN106973293B (en) * 2017-04-21 2020-10-27 中国科学技术大学 Light field image coding method based on parallax prediction
EP3451293A1 (en) 2017-08-28 2019-03-06 Thomson Licensing Method and apparatus for filtering with multi-branch deep learning
KR102535361B1 (en) * 2017-10-19 2023-05-24 삼성전자주식회사 Image encoder using machine learning and data processing method thereof


Also Published As

Publication number Publication date
KR102535361B1 (en) 2023-05-24
US11115673B2 (en) 2021-09-07
TWI748125B (en) 2021-12-01
KR20190043930A (en) 2019-04-29
US20220007045A1 (en) 2022-01-06
US20190124348A1 (en) 2019-04-25
CN109688406B (en) 2023-05-30
US11694125B2 (en) 2023-07-04
CN109688406A (en) 2019-04-26

Similar Documents

Publication Publication Date Title
TWI748125B (en) Image encoder using machine learning and data processing method of the image encoder
KR102165340B1 (en) Methods of determination for chroma quantization parameter and apparatuses for using the same
CA2867807C (en) Binarization of dqp using separate absolute value and sign (savs) in cabac
KR20170059040A (en) Optimal mode decision unit of video encoder and video encoding method using the optimal mode decision
KR20210134402A (en) Encoders, decoders and corresponding methods for intra prediction
WO2020258010A1 (en) Image encoding method, image decoding method, encoder, decoder and storage medium
JP7319389B2 (en) Encoders, decoders and corresponding methods using adaptive loop filters
WO2020125595A1 (en) Video coder-decoder and corresponding method
KR20210113384A (en) Early shutdown for optical flow purification
KR20210064332A (en) Encoders, decoders and countermeasures using compact MV storage
JP2022172137A (en) Method and apparatus for image filtering with adaptive multiplier coefficients
KR102573294B1 (en) Individual merge list for subblock merge candidates and intra-inter technology harmonization for video coding
TW202127884A (en) Bit shifting for cross-component adaptive loop filtering for video coding
WO2019189904A1 (en) Intra prediction device, image encoding device, image decoding device, and program
CN114145019B (en) Clip level for nonlinear adaptive loop filter
WO2016194380A1 (en) Moving image coding device, moving image coding method and recording medium for storing moving image coding program
US11336891B2 (en) Coding method and system with improved transform domain coefficient computation
WO2022178686A1 (en) Encoding/decoding method, encoding/decoding device, encoding/decoding system, and computer readable storage medium
WO2020259353A1 (en) Entropy coding/decoding method for syntactic element, device, and codec
US10057583B2 (en) Encoding method of image encoding device
KR101620593B1 (en) Dependency Breaking method for a hardware-based HEVC Intra Prediction
KR20210122800A (en) Encoders, decoders and corresponding methods to limit the size of sub-partitions from intra-sub-partition coding mode tools
KR20210126940A (en) Device for encoding and decoding for 2d image for 3d image generation
CN113785565A (en) Method and system for video coding and decoding
JP2021093634A (en) Coding device, decoding device and program