TW202218428A - Image encoding method, image decoding method, and related apparatuses - Google Patents

Image encoding method, image decoding method, and related apparatuses

Info

Publication number
TW202218428A
Authority
TW
Taiwan
Prior art keywords
block
residual
prediction
current
current coding
Prior art date
Application number
TW110130846A
Other languages
Chinese (zh)
Inventor
馬展
劉浩杰
Original Assignee
大陸商Oppo廣東移動通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 大陸商Oppo廣東移動通信有限公司
Publication of TW202218428A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124 Quantisation
    • H04N19/134 Adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/149 Data rate or code amount at the encoder output by estimating the code amount by means of a model, e.g. mathematical model or statistical model
    • H04N19/157 Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/159 Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • H04N19/169 Adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 The unit being an image region, e.g. an object
    • H04N19/176 The region being a block, e.g. a macroblock
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/90 Methods or arrangements using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/94 Vector quantisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Algebra (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Disclosed in embodiments of the present application are an image encoding method, an image decoding method, and related apparatuses. The image encoding method comprises: obtaining an original residual block of a current coding block, the current coding block comprising a currently processed video frame or a coding unit obtained by partitioning the currently processed video frame; obtaining a transform feature of the current coding block according to the original residual block and a pre-trained feature prediction model; quantizing the transform feature of the current coding block to obtain a quantized feature of the current coding block; determining, by means of a pre-trained probability prediction model, the probability of each pixel in the quantized feature of the current coding block; and generating a binary bitstream of the current coding block using the probability of each pixel. According to the embodiments of the present application, adaptive dynamic residual compensation is implemented, and different forms of inter-frame residual information can be encoded effectively.

Description

Image encoding method, image decoding method, and related apparatuses

The present application relates to the field of electronic device technology, and in particular to an image encoding method, an image decoding method, and related apparatuses.

Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital live-broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video game devices, video game consoles, mobile or satellite radio telephones, video conferencing devices, video streaming devices, and so on.

Digital video devices implement video compression techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), and ITU-T H.265 High Efficiency Video Coding (HEVC), as well as in extensions of those standards, in order to transmit and receive digital video information more efficiently. By implementing these video coding techniques, video devices can transmit, receive, encode, decode, and/or store digital video information more efficiently.

With the proliferation of online video, ever higher compression ratios are demanded even as digital video compression technology continues to evolve.

Embodiments of the present application provide an image encoding method, an image decoding method, and related apparatuses, so as to achieve adaptive dynamic residual compensation and effectively encode different forms of inter-frame residual information.

In a first aspect, an embodiment of the present application provides an image encoding method, including:

obtaining an original residual block of a current coding block, where the current coding block includes a currently processed video frame or a coding unit obtained by partitioning the currently processed video frame;

obtaining a transform feature of the current coding block according to the original residual block and a pre-trained feature prediction model;

quantizing the transform feature of the current coding block to obtain a quantized feature of the current coding block;

determining, by means of a pre-trained probability prediction model, the probability of each pixel in the quantized feature of the current coding block; and

generating a binary bitstream of the current coding block using the probability of each pixel.

Compared with the prior art, the solution of the present application performs adaptive dynamic residual compensation on the current predicted frame and obtains the final inter-frame reconstruction, and can therefore effectively encode different forms of inter-frame residual information.
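
To make the flow of the first aspect concrete, the following is a minimal sketch of the encoder-side steps. The helper names `feature_model`, `prob_model`, and `entropy_encode` are hypothetical placeholders, not names from the application; the real models are the pre-trained networks described later.

```python
import numpy as np

def encode_block(residual_block, feature_model, prob_model, entropy_encode):
    """Sketch of the first-aspect encoding flow (hypothetical helper names)."""
    # Step 1: transform the original residual block into a feature representation.
    transform_feature = feature_model(residual_block)
    # Step 2: quantize the transform feature (rounding is the simplest choice).
    quantized_feature = np.round(transform_feature)
    # Step 3: predict a probability for every element of the quantized feature.
    probabilities = prob_model(quantized_feature)
    # Step 4: drive an arithmetic coder with those probabilities to get the bitstream.
    return entropy_encode(quantized_feature, probabilities)
```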

In a second aspect, an embodiment of the present application provides an image decoding method, including:

obtaining a binary bitstream of a current decoding block, where the current decoding block includes the bitstream of a currently processed video frame or a decoding unit obtained by partitioning the currently processed video frame;

transforming the binary bitstream into a quantized feature of the current decoding block by means of a pre-trained probability prediction model;

determining a residual block of the current decoding block according to the quantized feature and a pre-trained residual prediction model; and

determining a reconstructed block of the current decoding block according to the residual block and a prediction block of the current decoding block.

Compared with the prior art, the solution of the present application performs adaptive dynamic residual compensation on the current predicted frame and obtains the final inter-frame reconstruction, and can therefore effectively encode different forms of inter-frame residual information.
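
A matching sketch of the second-aspect decoding flow, again with hypothetical helpers (`entropy_decode`, `prob_model`, `residual_model`) standing in for the arithmetic decoder and the pre-trained models:

```python
def decode_block(bitstream, prediction_block, prob_model, residual_model, entropy_decode):
    """Sketch of the second-aspect decoding flow (hypothetical helper names)."""
    # Recover the quantized feature from the bitstream using the probability model.
    quantized_feature = entropy_decode(bitstream, prob_model)
    # Map the quantized feature back to a residual block with the residual prediction model.
    residual_block = residual_model(quantized_feature)
    # Reconstruct the block by adding the residual to the prediction.
    return prediction_block + residual_block
```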

In a third aspect, an embodiment of the present application provides an image encoding apparatus, including:

an obtaining unit, configured to obtain an original residual block of a current coding block, where the current coding block includes a currently processed video frame or a coding unit obtained by partitioning the currently processed video frame;

a first prediction unit, configured to obtain a transform feature of the current coding block according to the original residual block and a pre-trained feature prediction model;

a quantization unit, configured to quantize the transform feature of the current coding block to obtain a quantized feature of the current coding block;

a second prediction unit, configured to determine, by means of a pre-trained probability prediction model, the probability of each pixel in the quantized feature of the current coding block; and

a generating unit, configured to generate a binary bitstream of the current coding block using the probability of each pixel.

In a fourth aspect, an embodiment of the present application provides an image decoding apparatus, including:

an obtaining unit, configured to obtain a binary bitstream of a current decoding block, where the current decoding block includes the bitstream of a currently processed video frame or a decoding unit obtained by partitioning the currently processed video frame;

a first prediction unit, configured to transform the binary bitstream into a quantized feature of the current decoding block by means of a pre-trained probability prediction model;

a second prediction unit, configured to determine a residual block of the current decoding block according to the quantized feature and a pre-trained residual prediction model; and

a determining unit, configured to determine a reconstructed block of the current decoding block according to the residual block and a prediction block of the current decoding block.

In a fifth aspect, an embodiment of the present application provides an encoder, including a processor and a memory coupled to the processor, where the processor is configured to execute the method described in the first aspect.

In a sixth aspect, an embodiment of the present application provides a decoder, including a processor and a memory coupled to the processor, where the processor is configured to execute the method described in the second aspect.

In a seventh aspect, an embodiment of the present application provides a terminal, including one or more processors, a memory, and a communication interface, where the memory and the communication interface are connected to the one or more processors, and the terminal communicates with other devices through the communication interface; the memory is configured to store computer program code including instructions which, when executed by the one or more processors, cause the terminal to perform the method described in the first aspect or the second aspect.

In an eighth aspect, an embodiment of the present application provides a computer-readable storage medium storing instructions which, when run on a computer, cause the computer to perform the method described in the first aspect or the second aspect.

In a ninth aspect, an embodiment of the present application provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the method described in the first aspect or the second aspect.

To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only intended to explain the present invention, not to limit it.

It will be understood that the terms "first", "second", and the like used in the present invention may be used herein to describe various elements, but these elements are not limited by these terms; the terms are only used to distinguish one element from another. For example, a first client may be referred to as a second client, and similarly a second client may be referred to as a first client, without departing from the scope of the present invention. The first client and the second client are both clients, but they are not the same client.

First, the terms and related technologies used in the embodiments of the present application are introduced.

A complete image in a video is usually called a "frame", and a video consisting of many frames in temporal order is also called a video sequence. A video sequence contains a range of redundancies, such as spatial redundancy, temporal redundancy, visual redundancy, information-entropy redundancy, structural redundancy, knowledge redundancy, and importance redundancy. To remove as much of this redundant information as possible and reduce the amount of data needed to represent the video, video coding technology has been proposed, so as to reduce storage space and save transmission bandwidth. Video coding technology is also called video compression technology.

As far as the current state of the technology is concerned, video coding mainly involves intra prediction, inter prediction, transform and quantization, entropy coding, and deblocking filtering. Within internationally adopted video compression standards, such as MPEG-2 and MPEG-4 Part 10 Advanced Video Coding (AVC) formulated by the Moving Picture Experts Group (MPEG), and H.263, H.264, and the H.265 High Efficiency Video Coding (HEVC) standard formulated by the International Telecommunication Union Telecommunication Standardization Sector (ITU-T), there are four mainstream compression coding approaches: chroma subsampling, predictive coding, transform coding, and quantization coding.

Predictive coding: the data of previously coded frames is used to predict the frame currently being coded. The encoder obtains a predicted value through prediction, and there is a certain residual between this predicted value and the actual value. The better the prediction, the closer the predicted value is to the actual value and the smaller the residual, so encoding the residual greatly reduces the amount of data. When decoding, the decoder adds the residual to the predicted value to restore and reconstruct the original image. In mainstream coding standards, predictive coding is divided into two basic types: intra prediction and inter prediction.
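
As a small illustration of the residual idea (invented numbers, not taken from the application), the encoder sends only the difference between the original block and its prediction, and the decoder adds it back:

```python
import numpy as np

original   = np.array([[52, 55], [61, 59]], dtype=np.int16)   # toy 2x2 block
prediction = np.array([[50, 54], [60, 60]], dtype=np.int16)   # prediction from a coded frame

residual = original - prediction          # small values, cheap to encode
reconstruction = prediction + residual    # decoder-side reconstruction
assert np.array_equal(reconstruction, original)
```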

Inter prediction is a prediction technique based on motion compensation. Its main processing is to determine the motion information of the current block, obtain a reference image block from a reference frame of the current block according to the motion information, and generate a predicted image of the current block. The current block is predicted using one of forward prediction, backward prediction, or bidirectional prediction; the prediction direction is indicated by the inter prediction direction in the motion information, and the displacement of the reference image block in the reference frame relative to the current block is indicated by the motion vector in the motion information, with one motion vector corresponding to one reference frame. Inter prediction of an image block may use only one motion vector and the pixels of one reference frame to generate the predicted image, which is called unidirectional prediction; it may also use two motion vectors and combine the pixels of two reference frames to generate the predicted image, which is called bidirectional prediction. That is, an image block generally contains one or two motion vectors. In some multi-hypothesis inter prediction techniques, an image block may contain more than two motion vectors.

Inter prediction indicates the reference frame through a reference frame index (ref_idx) and indicates the position offset of the reference block of the current block in the reference frame, relative to the current block, through a motion vector (MV). An MV is a two-dimensional vector containing a horizontal displacement component and a vertical displacement component. An MV corresponds to two frames, each of which has a picture order count (POC) representing the picture's number in display order, so an MV also corresponds to a POC difference. The POC difference is linearly related to the time interval. Motion vector scaling usually adopts a POC-difference-based scaling method, converting a motion vector between one pair of pictures into a motion vector between another pair of pictures.
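
A minimal sketch of POC-difference-based motion vector scaling as described above; because the relationship is linear, the MV is simply rescaled by the ratio of POC differences (the fixed-point rounding details of real codecs are omitted here, and the function name is an assumption for illustration):

```python
def scale_mv(mv, poc_diff_src, poc_diff_dst):
    """Scale an MV defined over a source POC difference to a destination POC difference."""
    mvx, mvy = mv
    s = poc_diff_dst / poc_diff_src
    return (round(mvx * s), round(mvy * s))

# Example: an MV of (8, -4) over a POC gap of 2, rescaled to a POC gap of 4.
print(scale_mv((8, -4), poc_diff_src=2, poc_diff_dst=4))  # (16, -8)
```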

There are two commonly used inter prediction modes.

1) Advanced motion vector prediction (AMVP) mode: the bitstream signals the inter prediction direction (forward, backward, or bidirectional) used by the current block, the reference frame index (reference index), the motion vector predictor index (MVP index), and the motion vector difference (MVD). The reference frame list to use is determined by the inter prediction direction, the reference frame pointed to by the MV of the current block is determined by the reference frame index, the motion vector predictor index indicates one MVP in the MVP list as the predictor of the MV of the current block, and an MVP is added to an MVD to obtain an MV.
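
In AMVP the decoder rebuilds the MV from the signalled pieces as MV = MVP + MVD; a toy sketch with invented candidate values, for illustration only:

```python
mvp_list = [(12, 4), (10, 6)]      # candidate motion vector predictors (invented values)
mvp_index = 1                      # signalled in the bitstream
mvd = (-2, 1)                      # signalled motion vector difference

mvp = mvp_list[mvp_index]
mv = (mvp[0] + mvd[0], mvp[1] + mvd[1])   # MV = MVP + MVD -> (8, 7)
```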

2) Merge/skip mode: the bitstream signals a merge index, a merge candidate is selected from the merge candidate list according to the merge index, and the motion information of the current block (including the prediction direction, reference frame, and motion vector) is determined by this merge candidate. The main difference between merge mode and skip mode is that merge mode implies that the current block has residual information, whereas skip mode implies that the current block has no residual information (that is, the residual is 0); the two modes derive the motion information in the same way.

A merge candidate is specifically a motion information data structure containing the inter prediction direction, reference frame, motion vector, and other information. The current block may select the corresponding merge candidate from the merge candidate list according to the merge index, and either use the motion information of the merge candidate directly as the motion information of the current block, or use it after scaling. In the HEVC standard, a merge candidate may be the motion information of an image block adjacent to the current block, called a spatial merge candidate, or the motion information of the image block at the corresponding position of the current block in another coded picture, called a temporal merge candidate. In addition, a merge candidate may also be a bi-predictive merge candidate formed by combining the forward motion information of one merge candidate with the backward motion information of another, or a zero motion vector merge candidate whose motion vector is forced to be the zero vector.
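
Since a merge candidate is just a bundle of motion information, the selection-by-index behaviour of merge/skip mode can be sketched as below; the `MergeCandidate` structure and its fields are invented for illustration and do not come from the application or the HEVC specification text:

```python
from dataclasses import dataclass

@dataclass
class MergeCandidate:           # hypothetical structure mirroring the description above
    direction: str              # "forward", "backward" or "bi"
    ref_idx: int
    mv: tuple

merge_candidate_list = [
    MergeCandidate("forward", 0, (4, 0)),    # e.g. a spatial candidate
    MergeCandidate("bi", 1, (0, 0)),         # e.g. a zero motion vector candidate
]

merge_index = 0                              # signalled in the bitstream
current_block_motion = merge_candidate_list[merge_index]  # reused as-is by merge/skip
```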

The partitioning of the inter prediction unit includes a 2N×2N partition mode (shown as A in FIG. 4), an N×N partition mode (B in FIG. 4), an N×2N partition mode (C in FIG. 4), a 2N×N partition mode (D in FIG. 4), a 2N×nD partition mode (E in FIG. 4), a 2N×nU partition mode (F in FIG. 4), an nL×2N partition mode (G in FIG. 4), and an nR×2N partition mode (H in FIG. 4), where N is any positive integer, n = x×N, and 0 ≤ x ≤ 1.

In the 2N×2N partition mode the image block is not partitioned; in the N×N partition mode the image block is partitioned into four equal-sized sub-blocks; in the N×2N partition mode the image block is partitioned into left and right sub-blocks of equal size; and in the 2N×N partition mode the image block is partitioned into upper and lower sub-blocks of equal size. In the 2N×nD partition mode the image block is partitioned into upper and lower sub-blocks, with the partition line shifted down by n relative to the horizontal midline of the block (D denotes the downward shift); in the 2N×nU partition mode the image block is partitioned into upper and lower sub-blocks, with the partition line shifted up by n relative to the horizontal midline of the block (U denotes the upward shift); in the nL×2N partition mode the image block is partitioned into left and right sub-blocks, with the partition line shifted left by n relative to the vertical midline of the block (L denotes the leftward shift); and in the nR×2N partition mode the image block is partitioned into left and right sub-blocks, with the partition line shifted right by n relative to the vertical midline of the block (R denotes the rightward shift).
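
The asymmetric modes only move the split line by n away from the block's midline. The sketch below returns the sub-block sizes for the partition modes listed above, interpreting the shifts exactly as in the description; the function name is an assumption for illustration:

```python
def sub_block_sizes(mode, N, n):
    """Return (width, height) of each sub-block of a 2N x 2N block for the modes above."""
    two_n = 2 * N
    if mode == "2Nx2N":
        return [(two_n, two_n)]
    if mode == "NxN":
        return [(N, N)] * 4
    if mode == "Nx2N":
        return [(N, two_n), (N, two_n)]
    if mode == "2NxN":
        return [(two_n, N), (two_n, N)]
    if mode == "2NxnU":                       # horizontal split line moved up by n
        return [(two_n, N - n), (two_n, N + n)]
    if mode == "2NxnD":                       # horizontal split line moved down by n
        return [(two_n, N + n), (two_n, N - n)]
    if mode == "nLx2N":                       # vertical split line moved left by n
        return [(N - n, two_n), (N + n, two_n)]
    if mode == "nRx2N":                       # vertical split line moved right by n
        return [(N + n, two_n), (N - n, two_n)]
    raise ValueError(mode)

print(sub_block_sizes("2NxnU", N=32, n=16))   # [(64, 16), (64, 48)]
```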

For picture partitioning, in order to represent video content more flexibly, the High Efficiency Video Coding (HEVC) standard defines the coding tree unit (CTU), coding unit (CU), prediction unit (PU), and transform unit (TU). CTUs, CUs, PUs, and TUs are all image blocks.

Coding tree unit (CTU): a picture consists of multiple CTUs. A CTU usually corresponds to a square image area and contains the luma pixels and chroma pixels in that area (or it may contain only luma pixels, or only chroma pixels). A CTU also contains syntax elements that indicate how to partition the CTU into at least one coding unit (CU) and how to decode each CU to obtain the reconstructed image. As shown in FIG. 1, picture 1 consists of multiple CTUs (including CTU A, CTU B, CTU C, and so on). The coding information corresponding to a CTU contains the luma and/or chroma values of the pixels in the square image area corresponding to that CTU. In addition, the coding information corresponding to a CTU may also contain syntax elements indicating how to partition the CTU into at least one CU and how to decode each CU to obtain the reconstructed image. The image area corresponding to one CTU may include 64×64, 128×128, or 256×256 pixels. In one example, a 64×64-pixel CTU contains a rectangular lattice of 64 columns of 64 pixels each, with each pixel containing a luma component and/or a chroma component. A CTU may also correspond to a rectangular image area or an image area of another shape, and the image area corresponding to one CTU may have a different number of pixels horizontally than vertically, for example 64×128 pixels.
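
As an arithmetic aside (not from the application), the number of CTUs covering a picture follows directly from the CTU size, with partial CTUs at the right and bottom edges rounded up:

```python
import math

def ctu_grid(width, height, ctu_size=64):
    """Number of CTU columns and rows needed to cover a width x height picture."""
    return math.ceil(width / ctu_size), math.ceil(height / ctu_size)

print(ctu_grid(1920, 1080))   # (30, 17) for 64x64 CTUs
```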

Coding unit (CU): usually corresponds to an A×B rectangular area in the picture and contains A×B luma pixels and/or the corresponding chroma pixels, where A is the width of the rectangle and B its height. A and B may be equal or different, and their values are usually integer powers of 2, such as 128, 64, 32, 16, 8, or 4. The width referred to in the embodiments of the present application is the length along the X-axis (horizontal) direction of the two-dimensional Cartesian coordinate system XoY shown in FIG. 1, and the height is the length along the Y-axis (vertical) direction. The reconstructed image of a CU is obtained by adding the predicted image and the residual image: the predicted image is generated through intra prediction or inter prediction and may consist of one or more prediction blocks (PBs), and the residual image is generated by applying inverse quantization and inverse transform to the transform coefficients and may consist of one or more transform blocks (TBs). Specifically, a CU contains coding information, including the prediction mode, transform coefficients, and other information; according to this coding information, the CU undergoes the corresponding decoding processes such as prediction, inverse quantization, and inverse transform, producing the reconstructed image corresponding to that CU. The relationship between the coding tree unit CTU and the coding unit CU is shown in FIG. 2.

Digital video compression techniques operate on video sequences whose colour encoding is YCbCr (also called YUV) with a colour format of 4:2:0, 4:2:2, or 4:4:4. Here Y represents luminance (Luma), i.e. the grey-scale value, Cb represents the blue chroma component, Cr represents the red chroma component, and U and V represent chrominance (Chroma), which describes colour and saturation. In terms of colour format, 4:2:0 means 4 luma components and 2 chroma components per 4 pixels (YYYYCbCr), 4:2:2 means 4 luma components and 4 chroma components per 4 pixels (YYYYCbCrCbCr), and 4:4:4 means full-resolution chroma (YYYYCbCrCbCrCbCrCbCr). FIG. 3 shows the distribution of the components under the different colour formats, where the circles are the Y component and the triangles are the UV components.
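
A small illustration of what these formats mean in terms of sample counts (a sketch, not code from the application): in 4:2:0 each chroma plane carries a quarter of the luma samples, in 4:2:2 half, and in 4:4:4 the same number.

```python
def plane_sample_counts(width, height, chroma_format="4:2:0"):
    """Luma and per-plane chroma sample counts for the three YCbCr formats above."""
    luma = width * height
    if chroma_format == "4:2:0":
        chroma = (width // 2) * (height // 2)   # subsampled horizontally and vertically
    elif chroma_format == "4:2:2":
        chroma = (width // 2) * height          # subsampled horizontally only
    elif chroma_format == "4:4:4":
        chroma = luma                           # full-resolution chroma
    else:
        raise ValueError(chroma_format)
    return luma, chroma, chroma                 # Y, Cb, Cr

print(plane_sample_counts(1920, 1080, "4:2:0"))  # (2073600, 518400, 518400)
```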

Prediction unit (PU): the basic unit of intra prediction and inter prediction. The motion information defined for an image block includes the inter prediction direction, reference frame, motion vector, and so on. An image block that is being encoded is called the current coding block (CCB), and an image block that is being decoded is called the current decoding block (CDB); for example, when prediction processing is being performed on an image block, the current coding block or current decoding block is a prediction block, and when residual processing is being performed, it is a transform block. The picture in which the current coding block or current decoding block is located is called the current frame. In the current frame, image blocks located to the left of or above the current block may lie inside the current frame and have already completed encoding/decoding, yielding reconstructed images; they are called reconstructed blocks, and information such as the coding mode and reconstructed pixels of a reconstructed block is available. A frame whose encoding/decoding has been completed before the current frame is encoded/decoded is called a reconstructed frame. When the current frame is a unidirectionally predicted frame (P frame) or a bidirectionally predicted frame (B frame), it has one or two reference frame lists respectively, called L0 and L1; each list contains at least one reconstructed frame, called a reference frame of the current frame. Reference frames provide reference pixels for inter prediction of the current frame.

Transform unit (TU): processes the residual between the original image block and the predicted image block.

A pixel (also called a pixel point) is a point in an image, such as a point in a coding block, a point in a luma-component pixel block (also called a luma pixel), or a point in a chroma-component pixel block (also called a chroma pixel).

A sample (also called a pixel value or sample value) is the value of a pixel. In the luma component domain this value refers specifically to luminance (i.e. the grey-scale value), and in the chroma component domain it refers to the chroma value (i.e. colour and saturation). Depending on the processing stage, the samples of a pixel specifically include original samples, predicted samples, and reconstructed samples.

At present, with the development and maturing of deep learning, deep-learning-based video and image processing and coding are being widely studied. Through data-driven approaches and end-to-end learning, deep neural networks can optimize the entire system end to end based on rate-distortion. Convolutional neural networks adopt learnable feature transforms and differentiable quantization, and dynamic probability distribution estimation can remove the redundancy between video images more efficiently and yield a more compact feature-space representation of the video image, achieving higher reconstruction quality at the same bit rate. At the same time, hardware acceleration and development for specific neural networks help further advance the acceleration and deployment of learning-based coding systems. However, owing to the complexity of video coding, implementing a complete end-to-end learning-based video coding method is still an urgent problem in this field; the optimization and analysis of each specific module, and its impact on the entire end-to-end system, still involve great uncertainty and research value. Standardization work on learning-based end-to-end video coding systems has only just begun at home and abroad, and both MPEG and AVS are essentially at the call-for-evidence stage for intelligent coding standardization.
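
The rate-distortion optimization mentioned here is commonly expressed as minimizing a Lagrangian cost of the form R + lambda * D. The sketch below assumes a mean-squared-error distortion and a rate estimated from the predicted probabilities; the exact loss used by the application is not specified at this point, so this is purely illustrative.

```python
import numpy as np

def rate_distortion_loss(original, reconstruction, probabilities, lam=0.01):
    """L = R + lambda * D, with R in bits (from -log2 p) and D as mean squared error."""
    original = np.asarray(original, dtype=np.float64)
    reconstruction = np.asarray(reconstruction, dtype=np.float64)
    p = np.asarray(probabilities, dtype=np.float64)
    distortion = np.mean((original - reconstruction) ** 2)
    rate_bits = -np.sum(np.log2(p))       # expected code length of the coded symbols
    return rate_bits + lam * distortion
```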

In existing end-to-end system solutions, end-to-end intra coding is applied directly to the residual information, without considering the particular nature of residual information and its non-uniform distribution after prediction, and without embedding a residual sparsification method that approximates the skip mode of traditional coding methods.

In view of the above problems, the embodiments of the present application provide an image encoding method, an image decoding method, and related apparatuses. The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings of the embodiments.

FIG. 5 is a block diagram of an example coding system 1 described in the embodiments of the present application. The coding system 1 includes a video encoder 100 and a video decoder 200, which are used to implement the learning-based end-to-end adaptive inter-frame residual coding method proposed in the present application.

As shown in FIG. 5, the coding system 1 includes a source device 10 and a destination device 20. The source device 10 generates encoded video data and may therefore be referred to as a video encoding device. The destination device 20 may decode the encoded video data generated by the source device 10 and may therefore be referred to as a video decoding device. Various implementations of the source device 10, the destination device 20, or both may include one or more processors and a memory coupled to the one or more processors. The memory may include, but is not limited to, RAM, ROM, EEPROM, flash memory, or any other medium that can be used to store the desired program code in the form of instructions or data structures accessible by a computer, as described herein.

The source device 10 and the destination device 20 may include a variety of devices, including desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called "smart" phones, televisions, cameras, display devices, digital media players, video game consoles, in-vehicle computers, or the like.

The destination device 20 may receive the encoded video data from the source device 10 via a link 30. The link 30 may include one or more media or devices capable of moving the encoded video data from the source device 10 to the destination device 20. In one example, the link 30 may include one or more communication media that enable the source device 10 to transmit the encoded video data directly to the destination device 20 in real time. In this example, the source device 10 may modulate the encoded video data according to a communication standard (e.g., a wireless communication protocol) and may transmit the modulated video data to the destination device 20. The one or more communication media may include wireless and/or wired communication media, such as the radio frequency (RF) spectrum or one or more physical transmission lines. The one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (e.g., the Internet). The one or more communication media may include routers, switches, base stations, or other equipment facilitating communication from the source device 10 to the destination device 20. In another example, the encoded data may be output from an output interface 140 to a storage device 40.

The image coding techniques of the present application may be applied to video coding to support a variety of multimedia applications, such as over-the-air television broadcasting, cable television transmission, satellite television transmission, streaming video transmission (e.g., over the Internet), encoding of video data for storage on a data storage medium, decoding of video data stored on a data storage medium, or other applications. In some examples, the coding system 1 may be used to support one-way or two-way video transmission for applications such as video streaming, video playback, video broadcasting, and/or video telephony.

The coding system 1 illustrated in FIG. 5 is merely an example, and the techniques of the present application may apply to video coding settings (e.g., video encoding or video decoding) that do not necessarily involve any data communication between the encoding device and the decoding device. In other examples, data is retrieved from local memory, streamed over a network, and so on. A video encoding device may encode data and store it in memory, and/or a video decoding device may retrieve data from memory and decode it. In many examples, encoding and decoding are performed by devices that do not communicate with each other but simply encode data to memory and/or retrieve data from memory and decode it.

In the example of FIG. 5, the source device 10 includes a video source 120, the video encoder 100, and the output interface 140. In some examples, the output interface 140 may include a modulator/demodulator (modem) and/or a transmitter. The video source 120 may include a video capture device (e.g., a camera), a video archive containing previously captured video data, a video feed interface for receiving video data from a video content provider, and/or a computer graphics system for generating video data, or a combination of such sources of video data.

The video encoder 100 may encode the video data from the video source 120. In some examples, the source device 10 transmits the encoded video data directly to the destination device 20 via the output interface 140. In other examples, the encoded video data may also be stored on the storage device 40 for later access by the destination device 20 for decoding and/or playback.

In the example of FIG. 5, the destination device 20 includes an input interface 240, the video decoder 200, and a display device 220. In some examples, the input interface 240 includes a receiver and/or a modem. The input interface 240 may receive the encoded video data via the link 30 and/or from the storage device 40. The display device 220 may be integrated with the destination device 20 or may be external to it. In general, the display device 220 displays the decoded video data. The display device 220 may include a variety of display devices, such as a liquid crystal display (LCD), a plasma display, an organic light-emitting diode (OLED) display, or another type of display device.

Although not shown in FIG. 5, in some aspects the video encoder 100 and the video decoder 200 may each be integrated with an audio encoder and decoder, and may include appropriate multiplexer-demultiplexer units or other hardware and software to handle the encoding of both audio and video in a common data stream or in separate data streams.

The video encoder 100 and the video decoder 200 may each be implemented as any of a variety of circuits, such as one or more microprocessors, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), discrete logic, hardware, or any combination thereof. If the present application is implemented partly in software, a device may store the instructions for the software in a suitable non-volatile computer-readable storage medium and execute the instructions in hardware using one or more processors to implement the techniques of the present application. Any of the foregoing (including hardware, software, a combination of hardware and software, and so on) may be regarded as one or more processors. Each of the video encoder 100 and the video decoder 200 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (codec) in the corresponding device.

FIG. 6 is an example block diagram of the video encoder 100 described in the embodiments of the present application. The video encoder 100 is configured to output video to a post-processing entity 41. The post-processing entity 41 represents an example of a video entity that can process the encoded video data from the video encoder 100, such as a media-aware network element (MANE) or a splicing/editing device. In some cases, the post-processing entity 41 may be an example of a network entity. In some video encoding systems, the post-processing entity 41 and the video encoder 100 may be parts of separate devices, while in other cases the functionality described with respect to the post-processing entity 41 may be performed by the same device that includes the video encoder 100. In one example, the post-processing entity 41 is an example of the storage device 40 of FIG. 1.

In the example of FIG. 6, the video encoder 100 includes a prediction processing unit 108, a filter unit 106, a memory 107, a summer 112, a transformer 101, a quantizer 102, and an entropy encoder 103. The prediction processing unit 108 includes an inter predictor 110 and an intra predictor 109. For image block reconstruction, the video encoder 100 further includes an inverse quantizer 104, an inverse transformer 105, and a summer 111. The filter unit 106 represents one or more loop filters, such as a deblocking filter, an adaptive loop filter (ALF), and a sample adaptive offset (SAO) filter. Although the filter unit 106 is shown in FIG. 6 as an in-loop filter, in other implementations it may be implemented as a post-loop filter. In one example, the video encoder 100 may further include a video data memory and a partitioning unit (not shown in the figure).

FIG. 7 is an example block diagram of the video decoder 200 described in the embodiments of the present application. In the example of FIG. 7, the video decoder 200 includes an entropy decoder 203, a prediction processing unit 208, an inverse quantizer 204, an inverse transformer 205, a summer 211, a filter unit 206, and a memory 207. The prediction processing unit 208 may include an inter predictor 210 and an intra predictor 209. In some examples, the video decoder 200 may perform a decoding process that is substantially the reverse of the encoding process described with respect to the video encoder 100 of FIG. 6.

During decoding, the video decoder 200 receives from the video encoder 100 an encoded video bitstream representing the image blocks of an encoded video slice and the associated syntax elements. The video decoder 200 may receive the video data from a network entity 42 and, optionally, may also store the video data in a video data memory (not shown in the figure). The video data memory may store video data to be decoded by the components of the video decoder 200, such as the encoded video bitstream. The video data stored in the video data memory may be obtained, for example, from the storage device 40, from a local video source such as a camera, via wired or wireless network communication of video data, or by accessing a physical data storage medium. The video data memory may serve as a coded picture buffer (CPB) for storing the encoded video data from the encoded video bitstream.

The network entity 42 may be, for example, a server, a MANE, a video editor/splicer, or another such device for implementing one or more of the techniques described above. The network entity 42 may or may not include a video encoder such as the video encoder 100. Before the network entity 42 sends the encoded video bitstream to the video decoder 200, it may implement parts of the techniques described in this application. In some video decoding systems, the network entity 42 and the video decoder 200 may be parts of separate devices, while in other cases the functionality described with respect to the network entity 42 may be performed by the same device that includes the video decoder 200.

It should be understood that other structural variants of the video decoder 200 may be used to decode the encoded video bitstream. For example, the video decoder 200 may generate the output video stream without processing by the filter unit 206; or, for some image blocks or image frames, the entropy decoder 203 of the video decoder 200 does not decode quantized coefficients, and accordingly processing by the inverse quantizer 204 and the inverse transformer 205 is not required.

圖8A為本申請實施例中圖像編碼方法的一種流程示意圖,該圖像編碼方法可以應用於圖5示出的編解碼系統1中的源裝置10或圖6示出的影像編碼器100。圖8A示出的流程以執行主體為圖6示出的影像編碼器100為例進行說明。如圖8A所示,本申請實施例提供的圖像編碼方法包括:FIG. 8A is a schematic flowchart of an image encoding method in an embodiment of the present application, and the image encoding method may be applied to the source device 10 in the encoding/decoding system 1 shown in FIG. 5 or the image encoder 100 shown in FIG. 6 . The flow shown in FIG. 8A is described by taking the execution subject as the video encoder 100 shown in FIG. 6 as an example. As shown in FIG. 8A , the image coding method provided by the embodiment of the present application includes:

步驟S110,獲取當前編碼塊的原始殘差塊,所述當前編碼塊包括當前處理的影像幀或者劃分所述當前處理的影像幀而得到的編碼單元。Step S110: Obtain an original residual block of a current coding block, where the current coding block includes a currently processed image frame or a coding unit obtained by dividing the currently processed image frame.

其中,所述編碼單元的劃分方式包括如圖4所示的各種劃分方式,此處不做唯一限定。Wherein, the division manner of the coding unit includes various division manners as shown in FIG. 4 , which is not uniquely limited here.

In a specific implementation, when the current coding block is the currently processed image frame, the smallest data-processing object is a single image frame, so this approach is more efficient but loses some accuracy and performance.

When the current coding block is a coding unit obtained by dividing the currently processed image frame, the smallest data-processing granularity is the divided coding unit, so the overall algorithm is more complex and takes longer to run, but the accuracy and performance are relatively higher.

步驟S120,根據所述原始殘差塊和預先訓練好的特徵預測模型,得到所述當前編碼塊的變換特徵。Step S120: Obtain the transform feature of the current coding block according to the original residual block and the pre-trained feature prediction model.

其中,所述特徵預測模型具體可以透過本端設備的影像處理器GPU實現資料處理,可以採用任意常用的神經網路架構,例如深度神經網路(Deep Neural Network,DNN)、支援向量機等,該模型輸入為殘差塊,輸出為變換特徵。Specifically, the feature prediction model can realize data processing through the image processor GPU of the local device, and can adopt any commonly used neural network architecture, such as deep neural network (DNN), support vector machine, etc. The model inputs are residual blocks and outputs are transformed features.

步驟S130,對所述當前編碼塊的變換特徵進行量化,得到所述當前編碼塊的量化特徵。Step S130: Quantize the transform feature of the current coding block to obtain the quantized feature of the current coding block.

步驟S140,透過預先訓練好的概率預測模型,確定所述當前編碼塊的量化特徵中每個像素的概率。Step S140: Determine the probability of each pixel in the quantized feature of the current coding block through a pre-trained probability prediction model.

In the arithmetic coding process, for every pixel to be coded, the probability of the corresponding pixel (a value between 0 and 1) needs to be predicted. This probability represents how frequently the current pixel value is expected to occur: the higher the predicted probability, the more frequent the value and the smaller the bitstream generated by arithmetic coding.
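As a rough illustration of this relationship (a minimal sketch, not part of the original text; the helper name and the NumPy usage are assumptions), the ideal arithmetic-coding cost of a set of coded pixels can be estimated as the sum of −log2 of their predicted probabilities:

```python
import numpy as np

def estimated_bits(probs):
    """Ideal arithmetic-coding cost, in bits, of symbols with the given
    predicted probabilities: a symbol with probability p costs about
    -log2(p) bits, so higher predicted probabilities mean a smaller
    bitstream."""
    probs = np.asarray(probs, dtype=np.float64)
    return float(np.sum(-np.log2(probs)))

print(estimated_bits([0.9, 0.9, 0.9, 0.9]))  # ~0.61 bits
print(estimated_bits([0.5, 0.5, 0.5, 0.5]))  # 4.0 bits
```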

步驟S150,利用所述每個像素的概率生成所述當前編碼塊的二進位位元流。Step S150, using the probability of each pixel to generate a binary bit stream of the current coding block.

In this possible example, obtaining the original residual block of the current coding block includes: determining the prediction block of the current coding block, and taking the difference between the prediction block of the current coding block and the original image block of the current coding block to obtain the original residual block.

In a specific implementation, the prediction block x̂ of the current coding block is numerically transformed and quantized, so that the original continuous floating-point distribution over (0, 1) becomes a discrete distribution x̄ over (0, 255); taking the difference with the current coding block x yields the integer-valued residual r, i.e. r = x − x̄.

In this possible example, taking the difference between the prediction block of the current coding block and the original image block of the current coding block to obtain the original residual block includes: performing numerical transformation and quantization on the prediction block of the current coding block to generate a discrete distribution of the prediction block; and taking the difference between the discrete distribution of the prediction block and the original image block of the current coding block to obtain the original residual block as an integer signal.
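A minimal sketch of this step is given below; it assumes (outside the original text) that the prediction lies in the continuous range (0, 1) and that the original block uses 8-bit samples, and is only meant to illustrate the transform, quantize, and subtract order:

```python
import numpy as np

def integer_residual(pred_float01, original_uint8):
    """Form the integer residual r = x - x_bar described above.

    pred_float01   : prediction block with values in (0, 1)
    original_uint8 : original image block with 8-bit samples
    """
    # Numerical transform + quantization: (0, 1) floats -> (0, 255) integers.
    pred_discrete = np.clip(np.round(pred_float01 * 255.0), 0, 255).astype(np.int16)
    # The residual may be negative, hence the signed dtype.
    return original_uint8.astype(np.int16) - pred_discrete

original = np.array([[120, 130], [140, 150]], dtype=np.uint8)
prediction = np.array([[0.47, 0.52], [0.55, 0.58]], dtype=np.float32)
print(integer_residual(prediction, original))  # e.g. [[0, -3], [0, 2]]
```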

在本可能的示例中,所述根據所述原始殘差塊和預先訓練好的特徵預測模型,得到所述當前編碼塊的變換特徵,包括:對所述原始殘差塊進行重歸一化,得到歸一化後的第一殘差塊;對所述第一殘差塊進行稀疏化處理,得到處理後的第二殘差塊;將所述第二殘差塊輸入預先訓練好的特徵預測模型,得到所述當前編碼塊的變換特徵。In this possible example, obtaining the transform feature of the current coding block according to the original residual block and the pre-trained feature prediction model includes: renormalizing the original residual block, obtaining a normalized first residual block; performing sparse processing on the first residual block to obtain a processed second residual block; inputting the second residual block into the pre-trained feature prediction model to obtain the transform feature of the current coding block.

In a specific implementation, energy-based renormalization is used to map the residuals, whose distributions differ after prediction, into the common range (−1, 1). For different image sequences, this energy-based normalization unifies the data distribution and makes training more stable.

In addition, the energy-based renormalization may be replaced by other standardization methods, such as 0-1 normalization or linear-function normalization; the goal is to unify the residual distributions, whose variance is large after prediction, and thereby speed up model training and convergence.

It can be seen that, in this example, under the same rate constraint, threshold sparsification allows end-to-end coding to allocate more bits to regions such as motion boundaries and occlusions while saving much of the rate that background regions would otherwise require; in addition, the energy-based renormalization accelerates model training and convergence, making the model more robust to different residual distributions.

In this possible example, renormalizing the original residual block to obtain the normalized first residual block includes: converging the different residual distributions of the original residual block into the same distribution space according to an energy unification mechanism, to obtain the normalized first residual block.

在本可能的示例中,所述根據能量統一機制,將所述原始殘差塊的不同殘差分佈收斂到相同分佈空間,得到歸一化後的第一殘差塊,包括:In this possible example, according to the energy unification mechanism, the different residual distributions of the original residual blocks are converged to the same distribution space to obtain a normalized first residual block, including:

extracting the minimum pixel value x_min and the maximum pixel value x_max in the original residual block; normalizing the original residual block to the interval (0, 1) by

x′ = (x − x_min) / (x_max − x_min)

where x′ denotes the pixel value after the first transform and x denotes the pixel value before normalization; and applying a second transform to x′,

x″ = 2x′ − 1

which yields a continuous residual distribution in the interval (−1, 1), i.e. the normalized first residual block, where x″ denotes the normalized pixel value.
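The two-step renormalization can be sketched as follows (an illustrative NumPy version; the linear map 2x′ − 1 used for the second transform is an assumption consistent with the stated target interval (−1, 1)):

```python
import numpy as np

def renormalize_residual(residual):
    """Energy-based renormalization sketch: min-max normalize into (0, 1),
    then linearly map into (-1, 1)."""
    x = residual.astype(np.float32)
    x_min, x_max = x.min(), x.max()
    x_prime = (x - x_min) / (x_max - x_min)  # first transform, range [0, 1]
    return 2.0 * x_prime - 1.0               # second transform, range [-1, 1]

print(renormalize_residual(np.array([[-12, 0], [7, 25]], dtype=np.int16)))
```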

In this possible example, sparsifying the first residual block to obtain the processed second residual block includes: obtaining a preset threshold set that contains multiple thresholds; selecting from the preset threshold set a target threshold suited to the current coding block; and traversing the residual sample of every pixel in the first residual block and setting to zero the residual samples that are smaller than the target threshold, to obtain the processed second residual block.

In a specific implementation, the target threshold can be obtained as follows: starting from the smallest threshold in the preset threshold set, rate-distortion optimization is performed at the encoder for every threshold to obtain a corresponding result, and the threshold whose result is best is selected as the threshold most suitable for residual coding of the current frame. Performing rate-distortion optimization for every threshold means that each selected threshold requires one encode/decode pass to obtain a corresponding result, and the best result is chosen at the end. As shown in FIG. 8B, x denotes the pixel value before normalization, m1 denotes the first threshold in the preset threshold set, and m_n denotes the n-th threshold. Residuals processed with different thresholds have different sparsity: the larger the threshold, the sparser the resulting residual and the smaller the residual region that needs to be coded. By traversing the preset threshold set, the threshold most suitable for residual coding of the current frame can be selected accurately, improving coding efficiency.

In a specific implementation, different thresholds are set and the normalized residual is sparsified, so that more of the useful information is allocated to the significant pixels, as sketched below.
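A compact sketch of the sparsification and of the threshold search follows. It is illustrative only: the magnitude comparison and the `encode_and_measure` callback (one encode/decode pass returning a rate-distortion cost) are assumptions, not details fixed by the original text.

```python
import numpy as np

def sparsify(residual, threshold):
    """Zero out residual samples whose magnitude falls below the threshold."""
    out = residual.copy()
    out[np.abs(out) < threshold] = 0.0
    return out

def pick_threshold(residual, thresholds, encode_and_measure):
    """Traverse the preset threshold set and keep the threshold whose
    encode/decode result has the lowest rate-distortion cost."""
    thresholds = sorted(thresholds)
    costs = [encode_and_measure(sparsify(residual, t)) for t in thresholds]
    return thresholds[int(np.argmin(costs))]
```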

It should be noted that threshold-based sparsification follows the idea of conventional mode selection, realizing a skip mode that adaptively codes the residual information; the threshold sparsification here can also operate directly on the quantized features.

It can be seen that, in this example, under the same rate constraint, threshold sparsification allows end-to-end coding to allocate more bits to regions such as motion boundaries and occlusions while saving much of the rate required for background regions.

在本可能的示例中,所述多個閾值中每個閾值按照預設的採樣間隔對所述當前編碼塊的像素進行均勻採樣得到。In this possible example, each of the plurality of thresholds is obtained by uniformly sampling the pixels of the current coding block according to a preset sampling interval.

The value range of the sampling interval is determined as follows: a histogram of the residual values is generated from the residual distribution of the current frame, and the interval corresponding to the 1/α peak portion of the residual distribution is taken.

其中,α的數值可以是4、6、8等,此處不做唯一限定。Wherein, the value of α can be 4, 6, 8, etc., which is not uniquely limited here.

In addition, in other possible examples, each of the multiple thresholds is obtained by non-uniformly sampling the pixels of the current coding block according to a preset sampling interval; under typical conditions, using no more than four thresholds gives a better trade-off between complexity and performance.

In this possible example, quantizing the transform feature of the current coding block to obtain the quantized feature of the current coding block includes: applying a differentiable quantization mechanism to the transform feature of the current coding block to convert the floating-point feature into a quantized integer feature, obtaining the quantized feature of the current coding block.

In a specific implementation, a differentiable quantization method is applied to the extracted feature to convert the floating-point (float32) feature into a quantized integer feature. The forward computation is

ŷ = round(y) = y + u

where round(·) is the rounding-to-nearest function and u is a noise term distributed over ±1/2 (the rounding error); back-propagation approximates this function as a linear function, using 1 as the gradient of the backward derivative.
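A minimal PyTorch sketch of such a differentiable quantizer is shown below; the uniform ±0.5 noise during training and the straight-through trick at inference are one common way to realize the behavior described above, not necessarily the exact formulation of the original.

```python
import torch

def differentiable_quantize(y, training=True):
    """Round a floating-point feature to integers while keeping gradients.

    Training : add uniform noise in [-0.5, 0.5] as a proxy for rounding.
    Inference: straight-through estimator; round in the forward pass,
               pass the gradient through with slope 1 in the backward pass.
    """
    if training:
        return y + torch.empty_like(y).uniform_(-0.5, 0.5)
    return y + (torch.round(y) - y).detach()

y = torch.tensor([0.2, 1.7, -2.4], requires_grad=True)
q = differentiable_quantize(y, training=False)
q.sum().backward()
print(q, y.grad)  # quantized values, gradient of all ones
```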

In this possible example, as shown in FIG. 8C, the feature prediction model includes a first branch and a second branch connected in parallel; the first branch includes three cascaded residual extraction modules and one downsampling module; the second branch includes three cascaded residual extraction modules, one downsampling module, and one activation module.

The residual extraction module may be any mainstream neural-network module, such as a residual block or a densely connected block, and the downsampling module uses a strided convolution kernel. The other branch extracts features with cascaded convolutional layers and activates them with a sigmoid function, producing a spatial-channel-wise adaptive mask that adaptively gates the extracted features. The upsampling module can be implemented with transposed convolution.

具體實現中,殘差提取模組用於針對輸入的殘差塊進行特徵提取,多個殘差提取模組用於提取多個特徵進行堆疊,從而實現級聯特徵提取。In a specific implementation, the residual extraction module is used for feature extraction for the input residual block, and multiple residual extraction modules are used to extract multiple features for stacking, thereby realizing cascade feature extraction.

It should be noted that the first branch is the main feature extraction path, the modules after the sigmoid in the second branch form a self-attention activation mapping path, and the outputs of the two branches are multiplied to generate the final transform feature.
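The two-branch structure can be sketched in PyTorch as follows. Channel widths, kernel sizes, the stride-2 downsampling convolution, and the single-channel input are illustrative assumptions; only the overall layout (three residual modules per branch, a downsampling module, a sigmoid activation branch, and an element-wise product of the two outputs) follows the description above.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """One possible residual extraction module (two 3x3 convolutions)."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class FeaturePredictor(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.stem = nn.Conv2d(1, ch, 3, padding=1)
        # Main branch: three residual modules + strided-conv downsampling.
        self.main = nn.Sequential(ResBlock(ch), ResBlock(ch), ResBlock(ch),
                                  nn.Conv2d(ch, ch, 3, stride=2, padding=1))
        # Attention branch: same layout plus a sigmoid activation module.
        self.attn = nn.Sequential(ResBlock(ch), ResBlock(ch), ResBlock(ch),
                                  nn.Conv2d(ch, ch, 3, stride=2, padding=1),
                                  nn.Sigmoid())

    def forward(self, residual_block):
        f = self.stem(residual_block)
        return self.main(f) * self.attn(f)  # spatial-channel-wise gating

print(FeaturePredictor()(torch.randn(1, 1, 64, 64)).shape)  # [1, 64, 32, 32]
```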

此外,所述特徵預測模型的訓練過程中,碼率和損失函數可以透過如下方式確定。In addition, in the training process of the feature prediction model, the code rate and the loss function can be determined in the following manner.

The rate estimate is obtained by

R = Σ −log2 P

where R is the rate-constraint loss and P is the probability of each pixel in the quantized transform feature.

The loss function is

L = D(x, x̄ + r̂)

where D(·) is the mean squared error (MSE) function or an L2 loss, x̂ is the prediction block of the current coding block, x is the current coding block, r = x − x̄ is the integer residual, r̂ is the residual reconstructed from the coded feature, and x̄ is the discrete distribution of the prediction block of the current coding block.

Rate-distortion optimization is then applied to the rate and the loss function,

J = L + λ·R

where L is the per-frame reconstruction loss and R is the rate-constraint loss; by adjusting λ, feature prediction models for different bit rates are obtained through training.
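Put together, the training objective can be sketched as below; the exact argument list is an assumption, since the text only fixes the distortion form D(·) and the −log2 P rate term.

```python
import torch

def rd_loss(x, x_bar, r_hat, probs, lam):
    """Rate-distortion objective J = L + lambda * R.

    x     : current coding block          x_bar : discrete prediction block
    r_hat : reconstructed residual        probs : per-pixel probabilities of
                                                  the quantized feature
    lam   : trade-off weight; sweeping it yields models for different rates.
    """
    distortion = torch.mean((x - (x_bar + r_hat)) ** 2)  # MSE / L2 loss
    rate = torch.sum(-torch.log2(probs))                 # estimated bits
    return distortion + lam * rate
```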

In a specific implementation, the feature prediction model may use a self-attention mechanism; the number of residual extraction modules used in the two branches can be adjusted flexibly as needed, or the residual extraction modules can be replaced with plain convolutions, which helps accelerate and simplify encoding and decoding.

例如,所述第一支路和第二支路可以分別包括四個殘差提取模組,或者分別包括四個卷積模組。For example, the first branch and the second branch may respectively include four residual extraction modules, or respectively include four convolution modules.

It can be seen that, in the embodiments of the present application, a pre-trained neural-network model is used to code the residual information, which lets the model implicitly learn residuals with different distortions. Compared with generic end-to-end residual coding, this method codes adaptively and performs inter-frame compensation; at the same bit rate it allocates the spatial residual information more efficiently and yields higher-quality reconstructed image frames.

Corresponding to the image encoding method described in FIG. 8A, FIG. 9A is a schematic flowchart of an image decoding method in an embodiment of the present application; the image decoding method may be applied to the destination device 20 in the encoding/decoding system 1 shown in FIG. 5 or to the image decoder 200 shown in FIG. 7. The flow shown in FIG. 9A is described by taking the image decoder 200 shown in FIG. 7 as the execution subject. As shown in FIG. 9A, the image decoding method provided by the embodiment of the present application includes:

步驟S210,獲取當前解碼塊的二進位位元流,所述當前解碼塊包括當前處理的影像幀的位元流或者劃分所述當前處理的影像幀而得到的解碼單元。Step S210: Acquire a binary bit stream of a current decoding block, where the current decoding block includes a bit stream of a currently processed image frame or a decoding unit obtained by dividing the currently processed image frame.

其中,所述解碼單元的劃分方式包括如圖4所示的各種劃分方式,此處不做唯一限定。Wherein, the division manner of the decoding unit includes various division manners as shown in FIG. 4 , which is not uniquely limited here.

其中,所述解碼塊與前述編碼方法實施例中所涉及到的編碼塊是對應的,具體可以表現為大小一致。Wherein, the decoding block corresponds to the encoding block involved in the foregoing encoding method embodiments, and may specifically be represented as having the same size.

In a specific implementation, when the current decoding block is the bitstream of the currently processed image frame, the smallest data-processing object is the bitstream of a single image frame, so this approach is more efficient but loses some accuracy and performance.

When the current coding block is the bitstream of a coding unit obtained by dividing the currently processed image frame, the smallest data-processing granularity is the divided coding unit, so the overall algorithm is more complex and takes longer to run, but the accuracy and performance are relatively higher.

步驟S220,透過預先訓練好的概率預測模型,將所述二進位位元流變換成所述當前解碼塊的量化特徵。Step S220 , transform the binary bit stream into the quantized feature of the current decoding block through a pre-trained probability prediction model.

其中,所述變換為無損變換。Wherein, the transformation is a lossless transformation.

In the arithmetic coding process, for every pixel to be coded, the probability of the corresponding pixel (a value between 0 and 1) needs to be predicted. This probability represents how frequently the current pixel value is expected to occur: the higher the predicted probability, the more frequent the value and the smaller the bitstream generated by arithmetic coding.

步驟S230,根據所述量化特徵和預先訓練好的殘差預測模型,確定所述當前解碼塊的殘差塊。Step S230: Determine the residual block of the current decoding block according to the quantized feature and the pre-trained residual prediction model.

其中,所述殘差預測模型具體可以透過本端設備的影像處理器GPU實現資料處理,可以採用任意常用的神經網路架構,例如深度神經網路DNN、遞迴神經網路(Recurrent Neural Network,RNN)、卷積神經網路(Convolutional Neural Network,CNN)等,該模型輸入為量化特徵,輸出為殘差塊。The residual prediction model can specifically realize data processing through the image processor GPU of the local device, and can adopt any commonly used neural network architecture, such as deep neural network DNN, recurrent neural network (Recurrent Neural Network, RNN), convolutional neural network (Convolutional Neural Network, CNN), etc., the input of this model is quantized features, and the output is residual block.

步驟S240,根據所述殘差塊與所述當前解碼塊的預測塊,確定所述當前解碼塊的重建塊。Step S240: Determine the reconstructed block of the current decoding block according to the residual block and the prediction block of the current decoding block.

在本可能的示例中,所述根據所述原始殘差塊與所述當前解碼塊的預測塊,確定所述當前解碼塊的重建塊,包括:確定所述當前解碼塊的預測塊;利用所述原始殘差塊對所述當前解碼塊的預測塊做殘差補償,得到所述當前解碼塊的重建塊。In this possible example, the determining the reconstructed block of the current decoding block according to the original residual block and the prediction block of the current decoding block includes: determining the prediction block of the current decoding block; using the The original residual block performs residual compensation on the prediction block of the current decoding block to obtain a reconstructed block of the current decoding block.

本申請實施例的圖像解碼方法具體可以解釋為如下步驟。The image decoding method in the embodiment of the present application can be specifically explained as the following steps.

First, a bitstream is obtained; it corresponds to the binary bitstream of the current decoding block and may specifically include the common parameter set of the current decoding block and the coding information of the picture of the current decoding block.

其次,以初始化後的全零特徵開始,二進位位元流讀取後的數值為預先訓練好的概率預測模型的輸入,運行該模型以輸出當前解碼塊的量化特徵;Secondly, starting with the initialized all-zero feature, the value read from the binary bit stream is the input of the pre-trained probability prediction model, and the model is run to output the quantized feature of the current decoding block;

再次,以模型預測得到的量化特徵為預先訓練好的殘差預測模型的輸入,運行該模型以輸出對應的殘差塊,Again, take the quantized features predicted by the model as the input of the pre-trained residual prediction model, run the model to output the corresponding residual block,

最後,根據模型預測得到的殘差塊與當前解碼塊的預測塊,計算重建塊或重建圖像。Finally, according to the residual block predicted by the model and the predicted block of the current decoding block, the reconstructed block or reconstructed image is calculated.
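The final residual-compensation step can be sketched as follows (illustrative only; it assumes the probability model and the residual prediction model have already turned the bitstream into an integer residual, and that samples are 8-bit):

```python
import numpy as np

def reconstruct_block(pred_block_uint8, residual_int, bit_depth=8):
    """Add the decoded residual back onto the prediction block and clip
    the result to the valid sample range."""
    max_val = (1 << bit_depth) - 1
    recon = pred_block_uint8.astype(np.int16) + residual_int
    return np.clip(recon, 0, max_val).astype(np.uint8)

pred = np.array([[120, 133], [140, 148]], dtype=np.uint8)
res = np.array([[0, -3], [0, 2]], dtype=np.int16)
print(reconstruct_block(pred, res))  # [[120, 130], [140, 150]]
```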

其中,所述預測塊可以根據解碼訊息中攜帶的幀間預測模式對當前解碼塊預測得到。The prediction block can be obtained by predicting the current decoding block according to the inter-frame prediction mode carried in the decoding message.

在本可能的示例中,所述確定所述當前解碼塊的預測塊,包括:對所述當前解碼塊進行熵解碼以產生語法元素;根據語法元素確定對所述當前解碼塊進行解碼的幀間預測模式;根據確定的所述幀間預測模式,對所述當前解碼塊執行幀間預測以獲取所述當前解碼塊的預測塊。In this possible example, the determining the prediction block of the current decoding block includes: performing entropy decoding on the current decoding block to generate a syntax element; determining the inter frame for decoding the current decoding block according to the syntax element a prediction mode; according to the determined inter prediction mode, perform inter prediction on the current decoding block to obtain a prediction block of the current decoding block.

In this possible example, as shown in FIG. 9B, the residual prediction model includes a first branch and a second branch connected in parallel; the first branch includes three cascaded residual extraction modules and one upsampling module; the second branch includes three cascaded residual extraction modules, one upsampling module, and one activation module.

此外,所述殘差預測模型的訓練過程中,碼率和損失函數可以透過如下方式確定。In addition, during the training process of the residual prediction model, the code rate and the loss function can be determined in the following manner.

The rate estimate is obtained by

R = Σ −log2 P

where R is the rate-constraint loss and P is the probability of each pixel in the quantized transform feature.

The loss function is

L = D(x, x̄ + r̂)

where D(·) is the mean squared error (MSE) function or an L2 loss, x̂ is the prediction block of the current coding block, x is the current coding block, r = x − x̄ is the integer residual, r̂ is the residual reconstructed from the coded feature, and x̄ is the discrete distribution of the prediction block of the current coding block.

Rate-distortion optimization is then applied to the rate and the loss function,

J = L + λ·R

where L is the per-frame reconstruction loss and R is the rate-constraint loss; by adjusting λ, residual prediction models for different bit rates are obtained through training.

In a specific implementation, the residual prediction model may use a self-attention mechanism; the number of residual extraction modules used in the two branches can be adjusted flexibly as needed, or the residual extraction modules can be replaced with plain convolutions, which helps accelerate and simplify encoding and decoding.

具體實現中,殘差預測模型用於針對輸入的殘差塊進行特徵提取,多個殘差提取模組用於提取多個特徵進行堆疊,從而實現級聯特徵提取。In the specific implementation, the residual prediction model is used to extract features for the input residual blocks, and multiple residual extraction modules are used to extract multiple features for stacking, thereby realizing cascade feature extraction.

It should be noted that the first branch is the main feature extraction path, the modules after the sigmoid in the second branch form a self-attention activation mapping path, and the outputs of the two branches are multiplied to generate the final residual block.

It can be seen that, in the embodiments of the present application, a pre-trained neural-network model is used to code the residual information, which lets the model implicitly learn residuals with different distortions. Compared with generic end-to-end residual coding, this method codes adaptively and performs inter-frame compensation; at the same bit rate it allocates the spatial residual information more efficiently and yields higher-quality reconstructed image frames.

An embodiment of the present application provides an image encoding apparatus, which may be an image encoder. Specifically, the image encoding apparatus is configured to perform the steps performed by the image encoder in the above encoding method. The image encoding apparatus provided by the embodiments of the present application may include modules corresponding to the corresponding steps.

本申請實施例可以根據上述方法示例對圖像編碼裝置進行功能模組的劃分,例如,可以對應各個功能劃分各個功能模組,也可以將兩個或兩個以上的功能集成在一個處理模組中。上述集成的模組既可以採用硬體的形式實現,也可以採用軟體功能模組的形式實現。本申請實施例中對模組的劃分是示意性的,僅僅為一種邏輯功能劃分,實際實現時可以有另外的劃分方式。In this embodiment of the present application, the image coding apparatus can be divided into functional modules according to the above method examples. For example, each functional module can be divided corresponding to each function, or two or more functions can be integrated into one processing module. middle. The above-mentioned integrated modules can be implemented in the form of hardware, or can be implemented in the form of software function modules. The division of modules in the embodiments of the present application is schematic, and is only a logical function division, and other division methods may be used in actual implementation.

在採用對應各個功能劃分各個功能模組的情況下,圖10示出上述實施例中所涉及的圖像編碼裝置的一種可能的結構示意圖。如圖10所示,圖像編碼裝置1000包括獲取單元1001,用於獲取當前編碼塊的原始殘差塊,所述當前編碼塊包括當前處理的影像幀或者劃分所述當前處理的影像幀而得到的編碼單元;第一預測單元1002,用於根據所述原始殘差塊和預先訓練好的特徵預測模型,得到所述當前編碼塊的變換特徵;量化單元1003,用於對所述當前編碼塊的變換特徵進行量化,得到所述當前編碼塊的量化特徵;第二預測單元1004,用於透過預先訓練好的概率預測模型,確定所述當前編碼塊的量化特徵中每個像素的概率;生成單元1005,用於利用所述每個像素的概率生成所述當前編碼塊的二進位位元流。In the case where each functional module is divided according to each function, FIG. 10 shows a possible schematic structural diagram of the image coding apparatus involved in the above embodiment. As shown in FIG. 10 , the image coding apparatus 1000 includes an obtaining unit 1001 for obtaining an original residual block of a current coding block, where the current coding block includes a currently processed image frame or is obtained by dividing the currently processed image frame The first prediction unit 1002 is used for obtaining the transform feature of the current coding block according to the original residual block and the pre-trained feature prediction model; the quantization unit 1003 is used for the current coding block. Quantize the transform feature of the current coding block to obtain the quantized feature of the current coding block; the second prediction unit 1004 is used to determine the probability of each pixel in the quantized feature of the current coding block through a pre-trained probability prediction model; Unit 1005, configured to generate a binary bit stream of the current coding block by using the probability of each pixel.

In this possible example, with respect to obtaining the original residual block of the current coding block, the obtaining unit 1001 is specifically configured to: determine the prediction block of the current coding block, and take the difference between the prediction block of the current coding block and the original image block of the current coding block to obtain the original residual block.

In this possible example, with respect to taking the difference between the prediction block of the current coding block and the original image block of the current coding block to obtain the original residual block, the obtaining unit 1001 is specifically configured to: perform numerical transformation and quantization on the prediction block of the current coding block to generate a discrete distribution of the prediction block; and take the difference between the discrete distribution of the prediction block and the original image block of the current coding block to obtain the original residual block as an integer signal.

In this possible example, with respect to obtaining the transform feature of the current coding block according to the original residual block and the pre-trained feature prediction model, the first prediction unit 1002 is specifically configured to: renormalize the original residual block to obtain a normalized first residual block; sparsify the first residual block to obtain a processed second residual block; and input the second residual block into the pre-trained feature prediction model to obtain the transform feature of the current coding block.

在本可能的示例中,在所述對所述原始殘差塊進行重歸一化,得到歸一化後的第一殘差塊方面,所述第一預測單元1002具體用於:根據能量統一機制,將所述原始殘差塊的不同殘差分佈收斂到相同分佈空間,得到歸一化後的第一殘差塊。In this possible example, in terms of renormalizing the original residual block to obtain a normalized first residual block, the first prediction unit 1002 is specifically configured to: unify according to energy A mechanism is used to converge the different residual distributions of the original residual blocks to the same distribution space to obtain a normalized first residual block.

In this possible example, with respect to converging the different residual distributions of the original residual block into the same distribution space according to the energy unification mechanism to obtain the normalized first residual block, the first prediction unit 1002 is specifically configured to: extract the minimum pixel value x_min and the maximum pixel value x_max in the original residual block; normalize the original residual block to the interval (0, 1) by

x′ = (x − x_min) / (x_max − x_min)

where x′ denotes the pixel value after the first transform and x denotes the pixel value before normalization; and apply a second transform to x′,

x″ = 2x′ − 1

which yields a continuous residual distribution in the interval (−1, 1), i.e. the normalized first residual block, where x″ denotes the normalized pixel value.

In this possible example, with respect to sparsifying the first residual block to obtain the processed second residual block, the first prediction unit 1002 is specifically configured to: obtain a preset threshold set that contains multiple thresholds; select from the preset threshold set a target threshold suited to the current coding block; and traverse the pixel value of every pixel in the first residual block and set to zero the pixel values that are smaller than the target threshold, to obtain the processed second residual block.

在本可能的示例中,所述多個閾值中每個閾值按照預設的採樣間隔對所述當前編碼塊的像素進行均勻採樣得到。In this possible example, each of the plurality of thresholds is obtained by uniformly sampling the pixels of the current coding block according to a preset sampling interval.

In this possible example, with respect to quantizing the transform feature of the current coding block to obtain the quantized feature of the current coding block, the quantization unit 1003 is specifically configured to: apply a differentiable quantization mechanism to the transform feature of the current coding block to convert the floating-point feature into a quantized integer feature, obtaining the quantized feature of the current coding block.

In this possible example, the feature prediction model includes a first branch and a second branch connected in parallel; the first branch includes three cascaded residual extraction modules and one downsampling module; the second branch includes three cascaded residual extraction modules, one downsampling module, and one activation module.

其中,上述方法實施例涉及的各步驟的所有相關內容均可以援引到對應功能模組的功能描述,在此不再贅述。當然,本申請實施例提供的圖像編碼裝置1000包括但不限於上述模組,例如:圖像編碼裝置1000還可以包括儲存單元。儲存單元可以用於儲存該圖像編碼裝置的程式碼和資料。Wherein, all relevant contents of the steps involved in the above method embodiments can be cited in the functional descriptions of the corresponding functional modules, which will not be repeated here. Of course, the image encoding apparatus 1000 provided in the embodiment of the present application includes but is not limited to the above modules. For example, the image encoding apparatus 1000 may further include a storage unit. The storage unit can be used to store the code and data of the image encoding device.

在採用集成的單元的情況下,本申請實施例提供的圖像編碼裝置的結構示意圖如圖11所示。在圖11中,圖像編碼裝置11包括:處理模組1102和通訊模組1101。處理模組1102用於對圖像編碼裝置的動作進行控制管理,例如,執行獲取單元1001、第一預測單元1002、量化單元1003、第二預測單元1004、生成單元1005執行的步驟,和/或用於執行本文所描述的技術的其它過程。通訊模組1101用於支援圖像編碼裝置與其他設備之間的交互。如圖11所示,圖像編碼裝置還可以包括儲存模組1103,儲存模組1103用於儲存圖像編碼裝置的程式碼和資料,例如儲存上述儲存單元所保存的內容。In the case of using an integrated unit, a schematic structural diagram of the image encoding apparatus provided by the embodiment of the present application is shown in FIG. 11 . In FIG. 11 , the image encoding device 11 includes: a processing module 1102 and a communication module 1101 . The processing module 1102 is used to control and manage the actions of the image encoding device, for example, to execute the steps performed by the acquisition unit 1001, the first prediction unit 1002, the quantization unit 1003, the second prediction unit 1004, and the generation unit 1005, and/or Other processes for performing the techniques described herein. The communication module 1101 is used to support the interaction between the image encoding device and other devices. As shown in FIG. 11 , the image encoding apparatus may further include a storage module 1103, and the storage module 1103 is used to store the program codes and data of the image encoding apparatus, for example, to store the content stored in the above-mentioned storage unit.

其中,處理模組1102可以是處理器或控制器,例如可以是中央處理器(Central Processing Unit,CPU),通用處理器,數位訊號處理器(Digital  Signal  Processor,DSP),ASIC,FPGA或者其他可程式邏輯器件、電晶體邏輯器件、硬體部件或者其任意組合。其可以實現或執行結合本申請公開內容所描述的各種示例性的邏輯方塊,模組和電路。所述處理器也可以是實現計算功能的組合,例如包含一個或多個微處理器組合,DSP和微處理器的組合等等。通訊模組1101可以是收發器、RF電路或通訊介面等。儲存模組1103可以是記憶體。The processing module 1102 may be a processor or a controller, such as a central processing unit (Central Processing Unit, CPU), a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), ASIC, FPGA or other Program logic devices, transistor logic devices, hardware components, or any combination thereof. It may implement or execute the various exemplary logic blocks, modules and circuits described in connection with this disclosure. The processor may also be a combination that implements computing functions, such as a combination of one or more microprocessors, a combination of a DSP and a microprocessor, and the like. The communication module 1101 can be a transceiver, an RF circuit, or a communication interface. The storage module 1103 may be a memory.

All relevant content of the scenarios involved in the above method embodiments can be cited in the functional descriptions of the corresponding functional modules and is not repeated here. Both the image encoding apparatus 1000 and the image encoding apparatus 11 can execute the image encoding method shown in FIG. 8A; specifically, the image encoding apparatus 1000 and the image encoding apparatus 11 may be video encoding apparatuses or other devices with video encoding functionality.

本申請還提供一種影像編碼器,包括非揮發性儲存媒介,以及中央處理器,所述非揮發性儲存媒介儲存有可執行程式,所述中央處理器與所述非揮發性儲存媒介連接,並執行所述可執行程式以實現本申請實施例的圖像編碼方法。The present application also provides an image encoder, including a non-volatile storage medium, and a central processing unit, wherein the non-volatile storage medium stores an executable program, the central processing unit is connected to the non-volatile storage medium, and The executable program is executed to implement the image encoding method of the embodiment of the present application.

An embodiment of the present application provides an image decoding apparatus, which may be an image decoder. Specifically, the image decoding apparatus is configured to perform the steps performed by the image decoder in the above decoding method. The image decoding apparatus provided by the embodiments of the present application may include modules corresponding to the corresponding steps.

本申請實施例可以根據上述方法示例對圖像解碼裝置進行功能模組的劃分,例如,可以對應各個功能劃分各個功能模組,也可以將兩個或兩個以上的功能集成在一個處理模組中。上述集成的模組既可以採用硬體的形式實現,也可以採用軟體功能模組的形式實現。本申請實施例中對模組的劃分是示意性的,僅僅為一種邏輯功能劃分,實際實現時可以有另外的劃分方式。In this embodiment of the present application, the image decoding apparatus may be divided into functional modules according to the above method examples. For example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. middle. The above-mentioned integrated modules can be implemented in the form of hardware, or can be implemented in the form of software function modules. The division of modules in the embodiments of the present application is schematic, and is only a logical function division, and other division methods may be used in actual implementation.

在採用對應各個功能劃分各個功能模組的情況下,圖12示出上述實施例中所涉及的圖像解碼裝置的一種可能的結構示意圖。如圖12所示,圖像解碼裝置12包括:In the case where each functional module is divided according to each function, FIG. 12 shows a possible schematic structural diagram of the image decoding apparatus involved in the above embodiment. As shown in FIG. 12, the image decoding device 12 includes:

獲取單元124,用於獲取當前解碼塊的二進位位元流,所述當前解碼塊包括當前處理的影像幀的位元流或者劃分所述當前處理的影像幀而得到的解碼單元;an obtaining unit 124, configured to obtain a binary bit stream of a current decoding block, where the current decoding block includes a bit stream of a currently processed image frame or a decoding unit obtained by dividing the currently processed image frame;

第一預測單元121,用於透過預先訓練好的概率預測模型,將所述二進位位元流變換成所述當前解碼塊的量化特徵;The first prediction unit 121 is used to transform the binary bit stream into a quantized feature of the current decoding block through a pre-trained probability prediction model;

第二預測單元122,用於根據所述量化特徵和預先訓練好的殘差預測模型,確定所述當前解碼塊的殘差塊;The second prediction unit 122 is configured to determine the residual block of the current decoding block according to the quantized feature and the pre-trained residual prediction model;

確定單元123,用於根據所述殘差塊與所述當前解碼塊的預測塊,確定所述當前解碼塊的重建塊。The determining unit 123 is configured to determine the reconstructed block of the current decoding block according to the residual block and the prediction block of the current decoding block.

在一個可能的示例中,在所述根據所述原始殘差塊與所述當前解碼塊的預測塊,確定所述當前解碼塊的重建塊方面,所述確定單元123具體用於:確定所述當前解碼塊的預測塊;利用所述原始殘差塊對所述當前解碼塊的預測塊做殘差補償,得到所述當前解碼塊的重建塊。In a possible example, in the aspect of determining the reconstructed block of the current decoding block according to the original residual block and the prediction block of the current decoding block, the determining unit 123 is specifically configured to: determine the The prediction block of the current decoding block; using the original residual block to perform residual compensation on the prediction block of the current decoding block to obtain the reconstructed block of the current decoding block.

在一個可能的示例中,在所述確定所述當前解碼塊的預測塊方面,所述確定單元123具體用於:對所述當前解碼塊進行熵解碼以產生語法元素;根據語法元素確定對所述當前解碼塊進行解碼的幀間預測模式;根據確定的所述幀間預測模式,對所述當前解碼塊執行幀間預測以獲取所述當前解碼塊的預測塊。In a possible example, in the aspect of determining the prediction block of the current decoding block, the determining unit 123 is specifically configured to: perform entropy decoding on the current decoding block to generate a syntax element; according to the determined inter prediction mode, performing inter prediction on the current decoding block to obtain the prediction block of the current decoding block.

In a possible example, the residual prediction model includes a first branch and a second branch connected in parallel; the first branch includes three cascaded residual extraction modules and one upsampling module; the second branch includes three cascaded residual extraction modules, one upsampling module, and one activation module.

其中,上述方法實施例涉及的各步驟的所有相關內容均可以援引到對應功能模組的功能描述,在此不再贅述。當然,本申請實施例提供的圖像解碼裝置包括但不限於上述模組,例如:圖像解碼裝置還可以包括儲存單元。儲存單元可以用於儲存該圖像解碼裝置的程式碼和資料。Wherein, all relevant contents of the steps involved in the above method embodiments can be cited in the functional descriptions of the corresponding functional modules, which will not be repeated here. Of course, the image decoding apparatus provided in the embodiment of the present application includes but is not limited to the above-mentioned modules. For example, the image decoding apparatus may further include a storage unit. The storage unit can be used to store the code and data of the image decoding device.

在採用集成的單元的情況下,本申請實施例提供的圖像解碼裝置的結構示意圖如圖13所示。在圖13中,圖像解碼裝置13包括:處理模組130和通訊模組131。處理模組130用於對圖像解碼裝置的動作進行控制管理,例如,執行獲取單元124、第一預測單元121、第二預測單元122和確定單元123執行的步驟,和/或用於執行本文所描述的技術的其它過程。通訊模組131用於支援圖像解碼裝置與其他設備之間的交互。如圖13所示,圖像解碼裝置還可以包括儲存模組132,儲存模組132用於儲存圖像解碼裝置的程式碼和資料,例如儲存上述儲存單元123所保存的內容。In the case of using an integrated unit, a schematic structural diagram of an image decoding apparatus provided by an embodiment of the present application is shown in FIG. 13 . In FIG. 13 , the image decoding device 13 includes: a processing module 130 and a communication module 131 . The processing module 130 is used to control and manage the actions of the image decoding device, for example, to perform the steps performed by the acquisition unit 124, the first prediction unit 121, the second prediction unit 122, and the determination unit 123, and/or to perform the steps described herein. other procedures of the described techniques. The communication module 131 is used to support the interaction between the image decoding apparatus and other devices. As shown in FIG. 13 , the image decoding apparatus may further include a storage module 132 , and the storage module 132 is used to store the program codes and data of the image decoding apparatus, for example, to store the contents stored in the above-mentioned storage unit 123 .

其中,處理模組130可以是處理器或控制器,例如可以是中央處理器(Central Processing Unit,CPU),通用處理器,數位訊號處理器(Digital  Signal  Processor,DSP),ASIC,FPGA或者其他可程式邏輯器件、電晶體邏輯器件、硬體部件或者其任意組合。其可以實現或執行結合本申請公開內容所描述的各種示例性的邏輯方塊,模組和電路。所述處理器也可以是實現計算功能的組合,例如包含一個或多個微處理器組合,DSP和微處理器的組合等等。通訊模組131可以是收發器、RF電路或通訊介面等。儲存模組132可以是記憶體。The processing module 130 may be a processor or a controller, such as a central processing unit (Central Processing Unit, CPU), a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), ASIC, FPGA or other Program logic devices, transistor logic devices, hardware components, or any combination thereof. It may implement or execute the various exemplary logic blocks, modules and circuits described in connection with this disclosure. The processor may also be a combination that implements computing functions, such as a combination of one or more microprocessors, a combination of a DSP and a microprocessor, and the like. The communication module 131 can be a transceiver, an RF circuit, a communication interface, or the like. The storage module 132 may be a memory.

All relevant content of the scenarios involved in the above method embodiments can be cited in the functional descriptions of the corresponding functional modules and is not repeated here. Both the image decoding apparatus 12 and the image decoding apparatus 13 can execute the image decoding method shown in FIG. 9A; specifically, the image decoding apparatus 12 and the image decoding apparatus 13 may be video decoding apparatuses or other devices with video decoding functionality.

本申請還提供一種影像解碼器,包括非揮發性儲存媒介,以及中央處理器,所述非揮發性儲存媒介儲存有可執行程式,所述中央處理器與所述非揮發性儲存媒介連接,並執行所述可執行程式以實現本申請實施例的圖像解碼方法。The present application also provides an image decoder, including a non-volatile storage medium, and a central processing unit, wherein the non-volatile storage medium stores an executable program, the central processing unit is connected to the non-volatile storage medium, and The executable program is executed to implement the image decoding method of the embodiment of the present application.

本申請還提供一種終端,該終端包括:一個或多個處理器、記憶體、通訊介面。該記憶體、通訊介面與一個或多個處理器耦合;記憶體用於儲存電腦程式代碼,電腦程式代碼包括指令,當一個或多個處理器執行指令時,終端執行本申請實施例的圖像編碼和/或圖像解碼方法。這裡的終端可以是影像顯示裝置,智慧手機,可擕式電腦以及其它可以處理影像或者播放影像的設備。The present application also provides a terminal, where the terminal includes: one or more processors, a memory, and a communication interface. The memory and the communication interface are coupled with one or more processors; the memory is used to store computer program codes, and the computer program codes include instructions. When one or more processors execute the instructions, the terminal executes the images of the embodiments of the present application. Encoding and/or image decoding methods. The terminal here can be an image display device, a smart phone, a portable computer, and other devices that can process images or play images.

本申請另一實施例還提供一種電腦可讀儲存媒介,該電腦可讀儲存媒介包括一個或多個程式碼,該一個或多個程式包括指令,當解碼設備中的處理器在執行該程式碼時,該解碼設備執行本申請實施例的圖像編碼方法、圖像解碼方法。Another embodiment of the present application further provides a computer-readable storage medium, the computer-readable storage medium includes one or more program codes, and the one or more programs include instructions, when the processor in the decoding device executes the program code , the decoding device executes the image encoding method and the image decoding method of the embodiments of the present application.

在本申請的另一實施例中,還提供一種電腦程式產品,該電腦程式產品包括電腦執行指令,該電腦執行指令儲存在電腦可讀儲存媒介中;解碼設備的至少一個處理器可以從電腦可讀儲存媒介讀取該電腦執行指令,至少一個處理器執行該電腦執行指令使得終端實施執行本申請實施例的圖像編碼方法、圖像解碼方法。In another embodiment of the present application, there is also provided a computer program product, the computer program product includes computer-executable instructions, and the computer-executable instructions are stored in a computer-readable storage medium; at least one processor of the decoding device can be obtained from the computer The read storage medium reads the computer-executed instruction, and at least one processor executes the computer-executed instruction so that the terminal implements the image encoding method and the image decoding method of the embodiments of the present application.

在上述實施例中,可以全部或部分的透過軟體,硬體,韌體或者其任意組合來實現。當使用軟體程式實現時,可以全部或部分地以電腦程式產品的形式出現。所述電腦程式產品包括一個或多個電腦指令。在電腦上載入和執行所述電腦程式指令時,全部或部分地產生按照本申請實施例所述的流程或功能。In the above embodiments, it may be implemented in whole or in part through software, hardware, firmware or any combination thereof. When implemented using a software program, it may be in the form of a computer program product, in whole or in part. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on the computer, all or part of the processes or functions described in the embodiments of the present application are generated.

所述電腦可以是通用電腦、專用電腦、電腦網路、或者其他可程式裝置。所述電腦指令可以儲存在電腦可讀儲存媒介中,或者從一個電腦可讀儲存媒介向另一個電腦可讀儲存媒介傳輸,例如,所述電腦指令可以從一個網站、電腦、伺服器或資料中心透過有線(例如同軸電纜、光纖、數位用戶線路(DSL))  或無線(例如紅外、無線、微波等)方式向另一個網站、電腦、伺服器或資料中心傳輸。The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable device. The computer instructions may be stored on or transmitted from one computer readable storage medium to another computer readable storage medium, for example, the computer instructions may be downloaded from a website, computer, server or data center Transmission to another website, computer, server or data center via wired (eg coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (eg infrared, wireless, microwave, etc.).

所述電腦可讀儲存媒介可以是電腦能夠存取的任何可用媒介或者是包含一個或多個可用媒介集成的伺服器、資料中心等資料存放裝置。該可用媒介可以是磁性媒介,(例如,軟碟,硬碟、磁帶)、光媒介(例如,DVD)或者半導體媒介(例如固態硬碟Solid  State  Disk(SSD))等。The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server, a data center, etc., which is integrated with one or more available media. The available media may be magnetic media (eg, floppy disk, hard disk, magnetic tape), optical media (eg, DVD), or semiconductor media (eg, Solid State Disk (SSD)), and the like.

透過以上的實施方式的描述,所屬領域的技術人員可以清楚地瞭解到,為描述的方便和簡潔,僅以上述各功能模組的劃分進行舉例說明,實際應用中,可以根據需要而將上述功能分配由不同的功能模組完成,即將裝置的內部結構劃分成不同的功能模組,以完成以上描述的全部或者部分功能。Through the description of the above embodiments, those skilled in the art can clearly understand that, for the convenience and brevity of the description, only the division of the above functional modules is used as an example for illustration. The allocation is completed by different functional modules, that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above.

在本申請所提供的幾個實施例中,應該理解到,所揭露的裝置和方法,可以透過其它的方式實現。例如,以上所描述的裝置實施例僅僅是示意性的,例如,所述模組或單元的劃分,僅僅為一種邏輯功能劃分,實際實現時可以有另外的劃分方式,例如多個單元或元件可以結合或者可以集成到另一個裝置,或一些特徵可以忽略,或不執行。另一點,所顯示或討論的相互之間的耦合或直接耦合或通訊連接可以是透過一些介面,裝置或單元的間接耦合或通訊連接,可以是電性,機械或其它的形式。In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the device embodiments described above are only illustrative. For example, the division of the modules or units is only a logical function division. In actual implementation, there may be other division methods. For example, multiple units or elements may be Incorporation may either be integrated into another device, or some features may be omitted, or not implemented. On the other hand, the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be electrical, mechanical or other forms.

所述作為分離部件說明的單元可以是或者也可以不是實體上分開的,作為單元顯示的部件可以是一個實體單元或多個實體單元,即可以位於一個地方,或者也可以分佈到多個不同地方。可以根據實際的需要選擇其中的部分或者全部單元來實現本實施例方案的目的。The unit described as a separate component may or may not be physically separated, and a component shown as a unit may be one entity unit or multiple entity units, that is, it may be located in one place, or may be distributed to multiple different places. . Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.

另外,在本申請各個實施例中的各功能單元可以集成在一個處理單元中,也可以是各個單元單獨實體存在,也可以兩個或兩個以上單元集成在一個單元中。上述集成的單元既可以採用硬體的形式實現,也可以採用軟體功能單元的形式實現。In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist independently, or two or more units may be integrated into one unit. The above-mentioned integrated units can be implemented in the form of hardware, or can be implemented in the form of software functional units.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium. Based on such an understanding, the technical solutions of the embodiments of this application essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

The foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

1: image; 10: source device; 11: image encoding apparatus; 12: image decoding apparatus; 13: image decoding apparatus; 20: destination device; 30: link; 40: storage device; 41: post-processing entity; 42: network entity; 100: video encoder; 101: transformer; 102: quantizer; 103: entropy encoder; 104: inverse quantizer; 105: inverse transformer; 106: filter unit; 107: memory; 108: prediction processing unit; 109: intra predictor; 110: inter predictor; 111: summer; 112: summer; 120: image source; 121: first prediction unit; 122: second prediction unit; 123: determining unit; 124: obtaining unit; 130: processing module; 131: communication module; 132: storage module; 140: output interface; 200: video decoder; 203: entropy decoder; 204: inverse quantizer; 205: inverse transformer; 206: filter unit; 207: memory; 208: prediction processing unit; 209: intra predictor; 210: inter predictor; 211: summer; 220: display device; 240: input interface; 1000: image encoding apparatus; 1001: obtaining unit; 1002: first prediction unit; 1003: quantization unit; 1004: second prediction unit; 1005: generating unit; 1101: communication module; 1102: processing module; 1103: storage module; S110~S150: steps; S210~S240: steps

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the following briefly describes the accompanying drawings required for describing the embodiments or the prior art. Apparently, the accompanying drawings in the following description show merely some embodiments of the present invention, and a person of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative efforts.

FIG. 1 is a schematic block diagram of a coding tree unit according to an embodiment of the present application;

FIG. 2 is a schematic block diagram of a CTU and a coding block (CU) according to an embodiment of the present application;

FIG. 3 is a schematic block diagram of color formats according to an embodiment of the present application;

FIG. 4 is a schematic diagram of image partitioning manners according to an embodiment of the present application;

FIG. 5 is a schematic block diagram of an encoding and decoding system according to an embodiment of the present application;

FIG. 6 is a schematic block diagram of a video encoder according to an embodiment of the present application;

FIG. 7 is a schematic block diagram of a video decoder according to an embodiment of the present application;

FIG. 8A is a schematic flowchart of an image encoding method according to an embodiment of the present application;

FIG. 8B is a schematic diagram of residual maps generated after processing with different thresholds according to an embodiment of the present application;

FIG. 8C is a structural diagram of a feature prediction model according to an embodiment of the present application;

FIG. 9A is a schematic flowchart of an image decoding method according to an embodiment of the present application;

FIG. 9B is a structural diagram of a residual prediction model according to an embodiment of the present application;

FIG. 10 is a block diagram of functional units of an image encoding apparatus according to an embodiment of the present application;

FIG. 11 is another block diagram of functional units of an image encoding apparatus according to an embodiment of the present application;

FIG. 12 is a block diagram of functional units of an image decoding apparatus according to an embodiment of the present application;

FIG. 13 is another block diagram of functional units of an image decoding apparatus according to an embodiment of the present application.

S110~S150: steps

Claims (21)

1. An image encoding method, comprising:
obtaining an original residual block of a current coding block, wherein the current coding block comprises a currently processed image frame or a coding unit obtained by partitioning the currently processed image frame;
obtaining a transform feature of the current coding block according to the original residual block and a pre-trained feature prediction model;
quantizing the transform feature of the current coding block to obtain a quantized feature of the current coding block;
determining, through a pre-trained probability prediction model, a probability of each pixel in the quantized feature of the current coding block; and
generating a binary bit stream of the current coding block by using the probability of each pixel.

2. The method according to claim 1, wherein obtaining the original residual block of the current coding block comprises:
determining a prediction block of the current coding block; and
performing a difference operation between the prediction block of the current coding block and the original image block of the current coding block to obtain the original residual block.

3. The method according to claim 2, wherein performing the difference operation between the prediction block of the current coding block and the original image block of the current coding block to obtain the original residual block comprises:
performing numerical transformation and quantization on the prediction block of the current coding block to generate a discrete distribution of the prediction block; and
performing a difference operation between the discrete distribution of the prediction block and the original image block of the current coding block to obtain the original residual block as an integer signal.

4. The method according to claim 1, wherein obtaining the transform feature of the current coding block according to the original residual block and the pre-trained feature prediction model comprises:
renormalizing the original residual block to obtain a normalized first residual block;
performing sparsification on the first residual block to obtain a processed second residual block; and
inputting the second residual block into the pre-trained feature prediction model to obtain the transform feature of the current coding block.

5. The method according to claim 4, wherein renormalizing the original residual block to obtain the normalized first residual block comprises:
converging different residual distributions of the original residual block into a same distribution space according to an energy unification mechanism, to obtain the normalized first residual block.

6. The method according to claim 5, wherein converging the different residual distributions of the original residual block into the same distribution space according to the energy unification mechanism to obtain the normalized first residual block comprises:
extracting a minimum pixel value x_min and a maximum pixel value x_max from the original residual block;
normalizing the original residual block to the interval (0, 1) according to the following formula:
x' = (x - x_min) / (x_max - x_min)

where x' denotes the pixel value after the initial transformation and x denotes the pixel value before normalization; and
performing a secondary transformation on x' according to the following formula to obtain a continuous residual distribution lying in the interval (-1, 1), namely the normalized first residual block:

x'' = 2x' - 1

where x'' denotes the normalized pixel value.
7. The method according to any one of claims 4 to 6, wherein performing sparsification on the first residual block to obtain the processed second residual block comprises:
obtaining a preset threshold set, wherein the preset threshold set comprises a plurality of thresholds;
selecting, from the preset threshold set, a target threshold adapted to the current coding block; and
traversing the pixel value of each pixel in the first residual block, and setting to zero the pixel values of pixels whose pixel values are less than the target threshold, to obtain the processed second residual block.

8. The method according to claim 7, wherein each of the plurality of thresholds is obtained by uniformly sampling the pixels of the current coding block at a preset sampling interval.

9. The method according to claim 1, wherein quantizing the transform feature of the current coding block to obtain the quantized feature of the current coding block comprises:
applying a differentiable quantization mechanism to the transform feature of the current coding block to transform floating-point features into quantized integer features, so as to obtain the quantized feature of the current coding block.

10. The method according to any one of claims 1 to 9, wherein the feature prediction model comprises a first branch and a second branch, the first branch and the second branch being connected in parallel;
the first branch comprises three cascaded residual extraction modules and a downsampling module; and
the second branch comprises three cascaded residual extraction modules, a downsampling module, and an activation module.

11. An image decoding method, comprising:
obtaining a binary bit stream of a current decoding block, wherein the current decoding block comprises a bit stream of a currently processed image frame or a decoding unit obtained by partitioning the currently processed image frame;
transforming the binary bit stream into a quantized feature of the current decoding block through a pre-trained probability prediction model;
determining a residual block of the current decoding block according to the quantized feature and a pre-trained residual prediction model; and
determining a reconstructed block of the current decoding block according to the residual block and a prediction block of the current decoding block.

12. The method according to claim 11, wherein determining the reconstructed block of the current decoding block according to the residual block and the prediction block of the current decoding block comprises:
determining the prediction block of the current decoding block; and
performing residual compensation on the prediction block of the current decoding block by using the residual block, to obtain the reconstructed block of the current decoding block.
13. The method according to claim 12, wherein determining the prediction block of the current decoding block comprises:
performing entropy decoding on the current decoding block to generate syntax elements;
determining, according to the syntax elements, an inter prediction mode for decoding the current decoding block; and
performing inter prediction on the current decoding block according to the determined inter prediction mode to obtain the prediction block of the current decoding block.

14. The method according to claim 11, wherein the residual prediction model comprises a first branch and a second branch, the first branch and the second branch being connected in parallel;
the first branch comprises three cascaded residual extraction modules and an upsampling module; and
the second branch comprises three cascaded residual extraction modules, an upsampling module, and an activation module.

15. An image encoding apparatus, comprising:
an obtaining unit configured to obtain an original residual block of a current coding block, wherein the current coding block comprises a currently processed image frame or a coding unit obtained by partitioning the currently processed image frame;
a first prediction unit configured to obtain a transform feature of the current coding block according to the original residual block and a pre-trained feature prediction model;
a quantization unit configured to quantize the transform feature of the current coding block to obtain a quantized feature of the current coding block;
a second prediction unit configured to determine, through a pre-trained probability prediction model, a probability of each pixel in the quantized feature of the current coding block; and
a generating unit configured to generate a binary bit stream of the current coding block by using the probability of each pixel.

16. An image decoding apparatus, comprising:
an obtaining unit configured to obtain a binary bit stream of a current decoding block, wherein the current decoding block comprises a bit stream of a currently processed image frame or a decoding unit obtained by partitioning the currently processed image frame;
a first prediction unit configured to transform the binary bit stream into a quantized feature of the current decoding block through a pre-trained probability prediction model;
a second prediction unit configured to determine a residual block of the current decoding block according to the quantized feature and a pre-trained residual prediction model; and
a determining unit configured to determine a reconstructed block of the current decoding block according to the residual block and a prediction block of the current decoding block.
17. An encoder, comprising a non-volatile storage medium and a central processing unit, wherein the non-volatile storage medium stores an executable program, the central processing unit is connected to the non-volatile storage medium, and when the central processing unit executes the executable program, the encoder performs the method according to any one of claims 1 to 10.

18. A decoder, comprising a non-volatile storage medium and a central processing unit, wherein the non-volatile storage medium stores an executable program, the central processing unit is connected to the non-volatile storage medium, and when the central processing unit executes the executable program, the decoder performs the method according to any one of claims 11 to 14.

19. A terminal, comprising one or more processors, a memory, and a communication interface, wherein the memory and the communication interface are connected to the one or more processors, the terminal communicates with other devices through the communication interface, and the memory is configured to store computer program code comprising instructions which, when executed by the one or more processors, cause the terminal to perform the method according to any one of claims 1 to 10 or 11 to 14.

20. A computer program product comprising instructions which, when run on a terminal, cause the terminal to perform the method according to any one of claims 1 to 10 or 11 to 14.

21. A computer-readable storage medium comprising instructions which, when run on a terminal, cause the terminal to perform the method according to any one of claims 1 to 10 or 11 to 14.
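To make the renormalization recited in claims 5 and 6 concrete, the following is a minimal numpy sketch of the two-step mapping. The closed-form expressions used (min-max scaling onto (0, 1), then the affine map 2x' - 1 onto (-1, 1)) are the standard choices consistent with the claim wording; the exact formulas in the published claims are reproduced only as images, so this is an assumption rather than the authoritative text, and the helper `denormalize_residual` is likewise hypothetical.

```python
import numpy as np

def renormalize_residual(residual: np.ndarray):
    """Map a residual block onto (-1, 1), in the spirit of claims 5-6.

    Assumes the two transforms are the usual affine maps: first
    (x - x_min) / (x_max - x_min) onto (0, 1), then 2*x' - 1 onto (-1, 1).
    Returns the normalized block plus (x_min, x_max) so the mapping can be undone.
    """
    x = residual.astype(np.float64)
    x_min, x_max = float(x.min()), float(x.max())
    if x_max == x_min:                           # flat residual: avoid division by zero
        return np.zeros_like(x), (x_min, x_max)
    x_prime = (x - x_min) / (x_max - x_min)      # initial transform, range (0, 1)
    x_norm = 2.0 * x_prime - 1.0                 # secondary transform, range (-1, 1)
    return x_norm, (x_min, x_max)

def denormalize_residual(x_norm: np.ndarray, bounds):
    """Inverse of renormalize_residual (hypothetical helper, not from the claims)."""
    x_min, x_max = bounds
    x_prime = (x_norm + 1.0) / 2.0
    return x_prime * (x_max - x_min) + x_min

if __name__ == "__main__":
    residual = np.array([[-7, 0, 3], [12, -2, 5]], dtype=np.int32)
    norm, bounds = renormalize_residual(residual)
    print(norm)                                   # values lie in [-1, 1]
    print(denormalize_residual(norm, bounds))     # recovers the original residual values
```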
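Claims 7 to 9 describe threshold-based sparsification of the normalized residual and integer quantization of the transform feature. The sketch below illustrates one plausible reading in numpy: candidate thresholds taken by uniformly sampling the block's pixels (claim 8), a target threshold chosen per block (claim 7), and plain rounding as the inference-time stand-in for the differentiable quantization of claim 9. The magnitude-based thresholding, the sparsity-driven selection rule, the default sampling interval, and the scale factor are all assumptions not fixed by the claims.

```python
import numpy as np

def threshold_set(block: np.ndarray, sampling_interval: int = 16) -> np.ndarray:
    """Claim 8 reading: candidate thresholds taken by uniformly sampling the
    block's pixel magnitudes at a preset interval (16 is an assumed default)."""
    samples = np.abs(block).flatten()[::sampling_interval]
    return np.unique(samples)                    # sorted, duplicate-free candidates

def sparsify(block: np.ndarray, thresholds: np.ndarray, target_nonzero_ratio: float = 0.25):
    """Claim 7 reading: pick a target threshold from the preset set and zero out
    pixels whose magnitude falls below it. The selection rule (smallest threshold
    reaching a target non-zero ratio) is an assumption; the claims only say the
    threshold is 'adapted to the current coding block'."""
    for t in thresholds:
        mask = np.abs(block) >= t
        if mask.mean() <= target_nonzero_ratio:
            return np.where(mask, block, 0.0), float(t)
    t = thresholds[-1]
    return np.where(np.abs(block) >= t, block, 0.0), float(t)

def quantize(features: np.ndarray, scale: float = 255.0) -> np.ndarray:
    """Claim 9 at inference time: map floating-point features to integers by
    rounding. The 'differentiable' aspect (e.g. additive-noise or straight-through
    tricks during training) is not shown here."""
    return np.round(features * scale).astype(np.int32)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    block = rng.uniform(-1.0, 1.0, size=(8, 8))          # e.g. a normalized residual block
    sparse, t = sparsify(block, threshold_set(block, sampling_interval=4))
    ratio = np.count_nonzero(sparse) / sparse.size
    print(f"chosen threshold {t:.3f}, non-zero ratio {ratio:.2f}")
    print(quantize(sparse)[:2])
```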
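On the decoder side, claim 12 amounts to residual compensation: adding the decoded residual back onto the prediction block. A minimal sketch follows, assuming the result is clipped to the pixel range implied by the bit depth (the clipping itself is not recited in the claim).

```python
import numpy as np

def reconstruct_block(prediction: np.ndarray, residual: np.ndarray, bit_depth: int = 8) -> np.ndarray:
    """Claim 12 reading: residual compensation = prediction + decoded residual,
    clipped to the valid pixel range (clipping is an assumption)."""
    lo, hi = 0, (1 << bit_depth) - 1
    recon = prediction.astype(np.int32) + residual.astype(np.int32)   # residual compensation
    return np.clip(recon, lo, hi)

if __name__ == "__main__":
    pred = np.full((4, 4), 128, dtype=np.uint8)
    res = np.array([[-3, 0, 2, 1]] * 4, dtype=np.int32)
    print(reconstruct_block(pred, res))   # values stay within [0, 255]
```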
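Claims 10 and 14 specify the feature prediction model and the residual prediction model as two parallel branches, each with three cascaded residual extraction modules plus a resampling module, the second branch ending in an activation module. The PyTorch sketch below shows one possible arrangement for the encoder-side (downsampling) variant of claim 10; the internal form of a residual extraction module, the sigmoid activation, the stem convolution, and the element-wise product used to fuse the branches are all assumptions, since the claims leave these details unspecified.

```python
import torch
import torch.nn as nn

class ResidualExtractionModule(nn.Module):
    """Assumed form of a 'residual extraction module': two 3x3 convolutions with a skip."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

class TwoBranchPredictor(nn.Module):
    """Two parallel branches as in claim 10 (a claim-14 style model would swap the
    stride-2 convolutions for upsampling). Fusion by element-wise product is assumed."""
    def __init__(self, in_ch: int = 1, channels: int = 64):
        super().__init__()
        self.stem = nn.Conv2d(in_ch, channels, 3, padding=1)   # channel-matching stem (assumption)
        self.branch1 = nn.Sequential(                          # 3 REMs + downsampling module
            ResidualExtractionModule(channels),
            ResidualExtractionModule(channels),
            ResidualExtractionModule(channels),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1),
        )
        self.branch2 = nn.Sequential(                          # 3 REMs + downsampling + activation module
            ResidualExtractionModule(channels),
            ResidualExtractionModule(channels),
            ResidualExtractionModule(channels),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = self.stem(x)
        return self.branch1(x) * self.branch2(x)               # assumed fusion of the parallel branches

if __name__ == "__main__":
    model = TwoBranchPredictor()
    y = model(torch.randn(1, 1, 64, 64))
    print(y.shape)   # torch.Size([1, 64, 32, 32])
```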
TW110130846A 2020-10-28 2021-08-20 Image encoding method, image decoding method, and related apparatuses TW202218428A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011176891.8A CN114501010B (en) 2020-10-28 2020-10-28 Image encoding method, image decoding method and related devices
CN202011176891.8 2020-10-28

Publications (1)

Publication Number Publication Date
TW202218428A true TW202218428A (en) 2022-05-01

Family

ID=81383511

Family Applications (1)

Application Number Title Priority Date Filing Date
TW110130846A TW202218428A (en) 2020-10-28 2021-08-20 Image encoding method, image decoding method, and related apparatuses

Country Status (3)

Country Link
CN (1) CN114501010B (en)
TW (1) TW202218428A (en)
WO (1) WO2022088631A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115052154B (en) * 2022-05-30 2023-04-14 北京百度网讯科技有限公司 Model training and video coding method, device, equipment and storage medium
CN115174908B (en) * 2022-06-30 2023-09-15 北京百度网讯科技有限公司 Transformation quantization method, device, equipment and storage medium for video coding
CN115037933B (en) * 2022-08-09 2022-11-18 浙江大华技术股份有限公司 Method and equipment for inter-frame prediction
CN116962713A (en) * 2022-11-04 2023-10-27 腾讯科技(深圳)有限公司 Video compression method, video decoding method and related devices
CN116112694B (en) * 2022-12-09 2023-12-15 无锡天宸嘉航科技有限公司 Video data coding method and system applied to model training
CN115941966B (en) * 2022-12-30 2023-08-22 深圳大学 Video compression method and electronic equipment
CN116708934B (en) * 2023-05-16 2024-03-22 深圳东方凤鸣科技有限公司 Video coding processing method and device
CN118381936B (en) * 2024-06-25 2024-09-03 腾讯科技(深圳)有限公司 Video data processing method, device, equipment and storage medium
CN118474394B (en) * 2024-07-10 2024-09-13 北京中星微人工智能芯片技术有限公司 Image encoding and decoding method, device, electronic equipment and computer readable medium

Family Cites Families (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8767835B2 (en) * 2010-12-28 2014-07-01 Mitsubishi Electric Research Laboratories, Inc. Method for coding videos using dictionaries
MX2013013912A (en) * 2011-06-27 2013-12-16 Panasonic Corp Image encoding method, image decoding method, image encoding device, image decoding device, and image encoding/decoding device.
KR101955374B1 (en) * 2011-06-30 2019-05-31 에스케이 텔레콤주식회사 Method and Apparatus for Image Encoding/Decoding By Fast Coding Unit Mode Decision
CN102970536B (en) * 2012-11-15 2015-10-28 上海交通大学 A kind of method for video coding with prediction residual adjustment of improvement
CN103117546B (en) * 2013-02-28 2016-03-16 武汉大学 A kind of Ultrashort-term slide prediction method for wind power
CN106412579B (en) * 2015-07-30 2019-07-16 浙江大华技术股份有限公司 A kind of coding of image, coding/decoding method and device
CN105430416B (en) * 2015-12-04 2019-03-01 四川大学 A kind of Method of Fingerprint Image Compression based on adaptive sparse domain coding
EP3471418A1 (en) * 2017-10-12 2019-04-17 Thomson Licensing Method and apparatus for adaptive transform in video encoding and decoding
US10798402B2 (en) * 2017-10-24 2020-10-06 Google Llc Same frame motion estimation and compensation
WO2019117645A1 (en) * 2017-12-14 2019-06-20 한국전자통신연구원 Image encoding and decoding method and device using prediction network
US10841577B2 (en) * 2018-02-08 2020-11-17 Electronics And Telecommunications Research Institute Method and apparatus for video encoding and video decoding based on neural network
CN110324623B (en) * 2018-03-30 2021-09-07 华为技术有限公司 Bidirectional interframe prediction method and device
CN108550131B (en) * 2018-04-12 2020-10-20 浙江理工大学 SAR image vehicle detection method based on feature fusion sparse representation model
EP3890321A4 (en) * 2018-12-15 2022-05-11 Huawei Technologies Co., Ltd. Image reconstruction method and device
CN111641832B (en) * 2019-03-01 2022-03-25 杭州海康威视数字技术股份有限公司 Encoding method, decoding method, device, electronic device and storage medium
US10771807B1 (en) * 2019-03-28 2020-09-08 Wipro Limited System and method for compressing video using deep learning
CN110503833B (en) * 2019-08-29 2021-06-08 桂林电子科技大学 Entrance ramp linkage control method based on depth residual error network model
CN110740319B (en) * 2019-10-30 2024-04-05 腾讯科技(深圳)有限公司 Video encoding and decoding method and device, electronic equipment and storage medium
CN110753225A (en) * 2019-11-01 2020-02-04 合肥图鸭信息科技有限公司 Video compression method and device and terminal equipment
CN111681298A (en) * 2020-06-08 2020-09-18 南开大学 Compressed sensing image reconstruction method based on multi-feature residual error network

Also Published As

Publication number Publication date
CN114501010B (en) 2023-06-06
CN114501010A (en) 2022-05-13
WO2022088631A1 (en) 2022-05-05

Similar Documents

Publication Publication Date Title
WO2022088631A1 (en) Image encoding method, image decoding method, and related apparatuses
TWI841033B (en) Method and apparatus of frame inter prediction of video data
CN113923455B (en) Bidirectional inter-frame prediction method and device
US20210306643A1 (en) Picture reconstruction method and apparatus
WO2021238540A1 (en) Image encoding method, image decoding method, and related apparatuses
CN113497937B (en) Image encoding method, image decoding method and related devices
WO2020006969A1 (en) Motion vector prediction method and related device
WO2021244197A1 (en) Image encoding method, image decoding method, and related apparatuses
CN114071161B (en) Image encoding method, image decoding method and related devices
CN111586406B (en) VVC intra-frame inter-frame skipping method, system, equipment and storage medium
CN113259671B (en) Loop filtering method, device, equipment and storage medium in video coding and decoding
CN115442618A (en) Time domain-space domain self-adaptive video compression based on neural network
WO2022022622A1 (en) Image coding method, image decoding method, and related apparatus
CN115118976A (en) Image coding method, readable medium and electronic device thereof
WO2022037300A1 (en) Encoding method, decoding method, and related devices
WO2022022299A1 (en) Method, apparatus, and device for constructing motion information list in video coding and decoding
WO2023092256A1 (en) Video encoding method and related apparatus therefor
CN112055970B (en) Construction method of candidate motion information list, inter-frame prediction method and device
WO2022037458A1 (en) Method, apparatus and device for constructing motion information list in video coding and decoding
WO2020007187A1 (en) Image block decoding method and device