TWI821923B - Video coding concept allowing for limitation of drift

Video coding concept allowing for limitation of drift

Info

Publication number
TWI821923B
Authority
TW
Taiwan
Prior art keywords
image
prediction
coding
block
tool
Prior art date
Application number
TW111107381A
Other languages
Chinese (zh)
Other versions
TW202234891A (en)
Inventor
Robert Skupin
Christian Bartnik
Adam Wieckowski
Yago Sánchez de la Fuente
Cornelius Hellge
Benjamin Bross
Thomas Schierl
Thomas Wiegand
Detlev Marpe
Original Assignee
Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Priority date
Filing date
Publication date
Application filed by Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Publication of TW202234891A
Application granted
Publication of TWI821923B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/105 Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N19/132 Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/157 Assigned coding mode, i.e. the coding mode being predefined or preselected to be further used for selection of another element or parameter
    • H04N19/159 Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, e.g. an object
    • H04N19/172 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, the region being a picture, frame or field
    • H04N19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N19/33 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability in the spatial domain
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/513 Processing of motion vectors
    • H04N19/521 Processing of motion vectors for estimating the reliability of the determined motion vectors or motion vector field, e.g. for smoothing the motion vector field or for correcting motion vectors
    • H04N19/587 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal sub-sampling or interpolation, e.g. decimation or subsequent interpolation of pictures in a video sequence
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

A video decoder for decoding a video from a data stream is configured to decode an indication from the data stream which is valid for a sequence of pictures of the video, and indicates that RASL pictures within the sequence of pictures are coded in a manner excluding a predetermined set of one or more coding tools.

Description

Video coding concept allowing for limitation of drift

Field of the Invention

The present application relates to video coding and to concepts suitable for limiting drift.

Background of the Invention

HTTP streaming of coded video has become an important distribution path over the past decade, and OTT service providers now reach hundreds of millions of users over the public Internet. Standard protocols such as Dynamic Adaptive Streaming over HTTP (DASH) [1] enable service providers to stream media to clients by having the server offer the media in time-segmented form at various bitrates. The client device can then download consecutive segments for continuous playback by selecting, in a dynamic and adaptive manner, among the offered variants of a particular segment according to the available network bandwidth and its decoding capabilities. In practice, the content is offered as multiple so-called representations generated from an optimized bitrate ladder, which typically involves multiple resolutions and fidelities, so as to optimize the perceptual quality at a given bitrate and thereby the user experience [2]. Since each segment is typically coded using a so-called closed group-of-pictures (GOP) coding structure, without dependencies on earlier segments [2], the downloaded and depacketized segment data can be concatenated into a conforming bitstream and fed into the decoder. In contrast to such closed GOP structures, segments using a so-called open GOP coding structure contain some pictures that use inter prediction from pictures of earlier segments, which benefits coding efficiency. While pictures using inter prediction from earlier segments can be skipped from the output without playback issues or visual artifacts when random access is performed at the segment that is first in presentation order, problems arise when a resolution switch occurs during continuous playback, because these pictures are skipped in such a non-seamless switch. Even in pure bitrate switching, some pictures may be lost or exhibit severe visual artifacts when the segments are not properly encoded for switching.
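
Purely for illustration, the following C++ sketch shows the kind of per-segment representation selection described above; it is not the DASH API, and all names as well as the selection rule (highest bitrate not exceeding the measured throughput) are simplifying assumptions.

#include <string>
#include <vector>

struct Representation { std::string id; double bitrateKbps; };

// Representations are assumed to be non-empty and sorted by ascending bitrate.
const Representation& selectRepresentation(const std::vector<Representation>& reps,
                                           double measuredThroughputKbps) {
    const Representation* best = &reps.front();        // fall back to the lowest bitrate
    for (const auto& r : reps)
        if (r.bitrateKbps <= measuredThroughputKbps)
            best = &r;                                  // highest bitrate that still fits
    return *best;
}

Because the chosen representation may change from one segment to the next, resolution and fidelity can change at segment boundaries, which is exactly the situation in which the switching problems described above arise.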

Widely deployed earlier codecs such as AVC [4] and HEVC [5] do not provide the reference picture resampling (RPR) functionality required to use reference pictures of a different resolution. Therefore, after a resolution switch under such an open GOP structure, some pictures of a segment cannot be decoded correctly because the reference pictures from earlier segments are not available at the required resolution, which results in non-constant frame-rate playback due to dropped pictures at the segment switch. In [6], the authors proposed to overcome the open GOP resolution-switching problem either by normative changes to the HEVC decoding process or by using the less widely deployed scalability extension of HEVC (SHVC), which provides RPR functionality. However, the solutions available to date have not led to broad adoption of open GOP coding in HTTP streaming.

The recently finalized version 1 of the Versatile Video Coding (VVC) standard [7] is the latest video coding standard developed jointly by the ITU-T Video Coding Experts Group and ISO/IEC Subcommittee 29 (also known as the Moving Picture Experts Group). Besides providing substantially improved coding efficiency compared to earlier codecs [8], VVC also includes many application-driven features, such as RPR, already in its initial Main 10 profile. During the development of VVC, RPR was mainly studied in the context of conversational scenarios with low-delay coding structures [9], where real-world latency and buffer-size requirements place severe limits on the feasibility of inserting intra-coded pictures for resolution switching.

However, RPR in VVC can also provide substantial coding-efficiency benefits for video encoding in the streaming domain.

It would be advantageous to have a concept at hand that enables the use of open GOP resolution switching in HTTP streaming with codecs such as VVC, where the above problems occur not only with respect to RPR but also when a video bitstream is formed by concatenating segments associated with, for example, different SNRs.

It is therefore an object of the present invention to provide a video coding concept that enables a more effective limitation of the negative impact of drift on video quality, such drift being caused, for example, by segment-wise video bitstream formation with switching between different video bitstream representations.

This object is achieved by the subject matter of the independent claims of the present application.

Summary of the Invention

An embodiment according to a first aspect of the present invention provides a video decoder for decoding a video from a data stream, the video decoder being configured to decode from the data stream an indication (e.g., gci_rasl_pictures_tool_constraint_flag) that is valid for a sequence of pictures of the video and indicates that RASL pictures within the sequence of pictures are coded in a manner excluding a predetermined set of one or more coding tools. For example, the indication may serve as a kind of promise so that the decoder knows that open GOP switching, performed by concatenating separately coded open GOP versions of the video coded at different spatial resolutions and/or different SNRs, does not lead to excessive drift in the RASL pictures. Other embodiments provide a video encoder for encoding a video into a data stream, the video encoder being configured to encode the indication into the data stream. A RASL picture denotes, for example, a picture that follows an intra-coded picture, such as a CRA picture, of the sequence of pictures in decoding order but precedes it in presentation order, and that may use reference pictures preceding the intra-coded picture in decoding order. Such preceding pictures may, for example, belong to a previous sequence of pictures; the intra-coded picture may, for example, be the first picture of its sequence in coding order. Because of the references to pictures preceding the intra-coded picture in decoding order, the above-mentioned drift artifacts, or other kinds of artifacts, may occur when the resolution is switched at the intra-coded picture with respect to the previous segment of the video containing the previously intra-coded pictures. The above-mentioned indication signaled in the data stream may guarantee to the decoder that a resolution switch at the sequence of pictures, relative to the previous sequence of pictures, does not involve excessive drift in the RASL pictures. The decoder may therefore determine, based on the indication, whether a resolution switch is advantageous.
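
A minimal C++ sketch of how a streaming client could act on such a sequence-level indication is given below; the structure and function names are illustrative assumptions rather than part of any decoder specification, and the sketch only shows the decision that the indication enables.

struct SequenceInfo {
    // Promise that drift-prone coding tools are excluded in RASL pictures.
    bool gci_rasl_pictures_tool_constraint_flag;
};

enum class RaslHandling { Decode, Discard };

// Decide how to treat the RASL pictures associated with the CRA picture at which
// the client switched to a representation with a different resolution or SNR.
RaslHandling handleRaslAfterSwitch(const SequenceInfo& seq, bool switchedRepresentation) {
    if (!switchedRepresentation)
        return RaslHandling::Decode;   // references match the encoder side, no drift
    if (seq.gci_rasl_pictures_tool_constraint_flag)
        return RaslHandling::Decode;   // drift is bounded by the constraint, keep playback seamless
    return RaslHandling::Discard;      // unconstrained RASL pictures may show severe artifacts
}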

Other embodiments according to the present invention provide a video decoder for decoding a video from a data stream, the video decoder being configured to decode from the data stream an indication (e.g., using sps_extra_ph_bit_present_flag and ph_extra_bit, or using gci_rasl_pictures_tool_constraint_flag) that indicates, per picture of a sequence of pictures of the video, either globally for the respective picture or on a per-slice basis, whether the respective picture is coded in a manner excluding a predetermined set of one or more coding tools, the predetermined set comprising a prediction tool based on a cross-component linear model (e.g., as a picture-wise indication that makes it possible to see that the potential drift at RASL pictures is sufficiently low). Other embodiments provide a video encoder for encoding a video into a data stream, the video encoder being configured to encode the indication into the data stream.
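
The following C++ sketch illustrates, under simplifying assumptions, how such a picture-wise indication could be read from extra picture-header bits announced in the SPS; the member names merely mirror the syntax elements mentioned above, the bit reader is a stub, and the meaning attached to the first extra bit is an assumption made for illustration, not something mandated by the standard itself.

#include <cstddef>
#include <vector>

struct BitReader {
    // Stub standing in for an actual bitstream reader.
    bool readFlag() { return false; }
};

struct Sps {
    std::vector<bool> extraPhBitPresentFlag; // per extra bit: transmitted in the picture header or not
};

struct PictureHeaderInfo {
    bool toolConstrained = false; // assumed meaning of the first extra bit
};

PictureHeaderInfo parseExtraPhBits(BitReader& br, const Sps& sps) {
    PictureHeaderInfo ph;
    for (std::size_t i = 0; i < sps.extraPhBitPresentFlag.size(); ++i) {
        const bool bit = sps.extraPhBitPresentFlag[i] && br.readFlag();
        if (i == 0)
            ph.toolConstrained = bit; // convention assumed here purely for illustration
    }
    return ph;
}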

Detailed Description of Preferred Embodiments

The following description of the figures starts with a description of an encoder and a decoder of a block-based predictive codec for coding pictures of a video, so as to form an example of a coding framework into which embodiments of the present invention may be built. The respective encoder and decoder are described with reference to Figures 1 to 3. Thereafter, a description of embodiments of the inventive concepts is presented, along with a description of how these concepts could be built into the encoder and decoder of Figures 1 and 2, respectively, although the embodiments described with Figure 4 and the subsequent figures may also be used to form encoders and decoders that do not operate according to the coding framework underlying the encoder and decoder of Figures 1 and 2.

Figure 1 shows an apparatus for predictively coding a picture A12 into a data stream A14, exemplarily using transform-based residual coding. The apparatus, or encoder, is indicated by reference sign A10. Figure 2 shows a corresponding decoder A20, i.e., an apparatus A20 configured to predictively decode the picture A12' from the data stream A14, also using transform-based residual decoding, wherein the apostrophe is used to indicate that the picture A12' as reconstructed by the decoder A20 deviates from the picture A12 originally encoded by the apparatus A10 in terms of the coding loss introduced by the quantization of the prediction residual signal. Although Figures 1 and 2 exemplarily use transform-based prediction residual coding, embodiments of the present application are not restricted to this kind of prediction residual coding. The same holds for other details described with respect to Figures 1 and 2, as will be outlined hereinafter.

The encoder A10 is configured to subject the prediction residual signal to a spatial-to-spectral transformation and to encode the prediction residual signal thus obtained into the data stream A14. Likewise, the decoder A20 is configured to decode the prediction residual signal from the data stream A14 and to subject the prediction residual signal thus obtained to a spectral-to-spatial transformation.

Internally, the encoder A10 may comprise a prediction residual signal former A22 which generates a prediction residual A24 so as to measure the deviation of a prediction signal A26 from the original signal, i.e., from the picture A12. The prediction residual signal former A22 may, for instance, be a subtractor that subtracts the prediction signal from the original signal, i.e., from the picture A12. The encoder A10 then further comprises a transformer A28 which subjects the prediction residual signal A24 to a spatial-to-spectral transformation to obtain a spectral-domain prediction residual signal A24', which is then subjected to quantization by a quantizer A32, also comprised by the encoder A10. The thus quantized prediction residual signal A24'' is coded into the bitstream A14. To this end, the encoder A10 may optionally comprise an entropy coder A34 which entropy-codes the transformed and quantized prediction residual signal into the data stream A14. The prediction signal A26 is generated by a prediction stage A36 of the encoder A10 on the basis of the prediction residual signal A24'' encoded into, and decodable from, the data stream A14. To this end, the prediction stage A36 may internally, as shown in Figure 1, comprise a dequantizer A38 which dequantizes the prediction residual signal A24'' so as to obtain a spectral-domain prediction residual signal A24''' that corresponds to the signal A24' except for the quantization loss, followed by an inverse transformer A40 which subjects the latter prediction residual signal A24''' to an inverse transformation, i.e., a spectral-to-spatial transformation, to obtain a prediction residual signal A24'''' that corresponds to the original prediction residual signal A24 except for the quantization loss. A combiner A42 of the prediction stage A36 then recombines, such as by addition, the prediction signal A26 and the prediction residual signal A24'''' so as to obtain a reconstructed signal A46, i.e., a reconstruction of the original signal A12. The reconstructed signal A46 may correspond to the signal A12'. A prediction module A44 of the prediction stage A36 then generates the prediction signal A26 on the basis of the signal A46 by using, for instance, spatial prediction (i.e., intra-picture prediction) and/or temporal prediction (i.e., inter-picture prediction).
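
For illustration only, the following C++ sketch reduces the residual coding loop described above to its arithmetic core; transforms, entropy coding and block handling are omitted, the quantizer is a plain uniform quantizer, and the names are chosen freely rather than taken from any codec implementation. It merely shows why encoder and decoder share the dequantization and combination steps (A38, A42), so that both operate on the same reconstructed signal.

#include <cmath>
#include <cstddef>
#include <vector>

struct ReconstructionResult {
    std::vector<int>    levels;        // what would be entropy-coded into the stream (A24'')
    std::vector<double> reconstructed; // decoder-side reconstruction (A12')
};

// original and prediction must have equal size; qStep must be non-zero.
ReconstructionResult codeBlock(const std::vector<double>& original,
                               const std::vector<double>& prediction,
                               double qStep) {
    ReconstructionResult r;
    for (std::size_t i = 0; i < original.size(); ++i) {
        const double residual = original[i] - prediction[i];                // A22: residual former
        const int level = static_cast<int>(std::lround(residual / qStep));  // A32: quantization (lossy)
        r.levels.push_back(level);
        const double dequantized = level * qStep;                           // A38: dequantization
        r.reconstructed.push_back(prediction[i] + dequantized);             // A42: combiner
    }
    return r;
}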

Likewise, as shown in Figure 2, the decoder A20 may be internally composed of components corresponding to, and interconnected in a manner corresponding to, the prediction stage A36. In particular, an entropy decoder A50 of the decoder A20 may entropy-decode the quantized spectral-domain prediction residual signal A24'' from the data stream, whereupon a dequantizer A52, an inverse transformer A54, a combiner A56 and a prediction module A58, interconnected and cooperating in the manner described above with respect to the modules of the prediction stage A36, recover the reconstructed signal on the basis of the prediction residual signal A24'', so that, as shown in Figure 2, the output of the combiner A56 yields the reconstructed signal, namely the picture A12'.

Although not specifically described above, it is readily apparent that the encoder A10 may set some coding parameters, including, for instance, prediction modes, motion parameters and the like, according to some optimization scheme, such as in a manner optimizing some rate-and-distortion-related criterion, i.e., the coding cost. For example, the encoder A10 and the decoder A20, and the corresponding modules A44 and A58, respectively, may support different prediction modes such as intra-coding modes and inter-coding modes. The granularity at which encoder and decoder switch between these prediction mode types may correspond to a subdivision of the pictures A12 and A12', respectively, into coding segments or coding blocks. In units of these coding segments, for instance, the picture may be subdivided into blocks that are intra-coded and blocks that are inter-coded. Intra-coded blocks are predicted on the basis of a spatial, already coded/decoded neighborhood of the respective block, as outlined in more detail below. Several intra-coding modes may exist and be selected for a respective intra-coded segment, including directional or angular intra-coding modes, according to which the respective segment is filled by extrapolating the sample values of the neighborhood along a certain direction specific to the respective directional intra-coding mode into the respective intra-coded segment. The intra-coding modes may, for instance, also comprise one or more further modes such as a DC coding mode, according to which the prediction for the respective intra-coded block assigns a DC value to all samples within the respective intra-coded segment, and/or a planar intra-coding mode, according to which the prediction of the respective block is approximated or determined to be a spatial distribution of sample values described by a two-dimensional linear function over the sample positions of the respective intra-coded block, the tilt and offset of the plane defined by the two-dimensional linear function being derived on the basis of the neighboring samples. Compared thereto, inter-coded blocks may be predicted, for instance, temporally. For inter-coded blocks, motion vectors may be signaled within the data stream, the motion vectors indicating the spatial displacement of the portion of a previously coded picture of the video, to which the picture A12 belongs, at which the previously coded/decoded picture is sampled in order to obtain the prediction signal for the respective inter-coded block.
This means that, in addition to the residual signal coding comprised by the data stream A14, such as the entropy-coded transform coefficient levels representing the quantized spectral-domain prediction residual signal A24'', the data stream A14 may have encoded thereinto coding mode parameters for assigning coding modes to the various blocks, prediction parameters for some of the blocks, such as motion parameters for inter-coded segments, and optional further parameters, such as parameters for controlling and signaling the subdivision of the pictures A12 and A12', respectively, into the segments. The decoder A20 uses these parameters to subdivide the picture in the same way as the encoder did, to assign the same prediction modes to the segments, and to perform the same prediction to result in the same prediction signal.
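
As a small illustration of one of the intra-coding modes mentioned above, the following C++ sketch fills a block with the rounded mean of its already reconstructed neighboring samples, i.e., a DC prediction; the array layout and the border handling are simplifying assumptions and not taken from any particular codec.

#include <cstddef>
#include <vector>

// above: the W reconstructed samples of the row directly above the block,
// left:  the H reconstructed samples of the column directly left of the block.
// Returns a W x H block (row-major) filled with the rounded mean of those samples.
std::vector<int> predictDc(const std::vector<int>& above,
                           const std::vector<int>& left,
                           std::size_t W, std::size_t H) {
    long long sum = 0;
    for (int v : above) sum += v;
    for (int v : left)  sum += v;
    const long long count = static_cast<long long>(above.size() + left.size());
    const int dc = count > 0 ? static_cast<int>((sum + count / 2) / count) : 0;
    return std::vector<int>(W * H, dc); // constant prediction signal for the whole block
}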

Figure 3 illustrates the relationship between the reconstructed signal, i.e., the reconstructed picture A12', on the one hand, and the combination of the prediction residual signal A24'''' as signaled in the data stream A14 and the prediction signal A26, on the other hand. As already indicated above, the combination may be an addition. The prediction signal A26 is illustrated in Figure 3 as a subdivision of the picture area into intra-coded blocks, illustratively indicated using hatching, and inter-coded blocks, illustratively indicated non-hatched. The subdivision may be any subdivision, such as a regular subdivision of the picture area into rows and columns of square blocks or non-square blocks, or a multi-tree subdivision of the picture A12 from a tree root block into a plurality of leaf blocks of varying size, such as a quadtree subdivision or the like, a mixture of which is illustrated in Figure 3, in which the picture area is first subdivided into rows and columns of tree root blocks which are then further subdivided in accordance with a recursive multi-tree subdivision into one or more leaf blocks.

Again, the data stream A14 may have an intra-coding mode coded thereinto for intra-coded blocks A80, which assigns one of several supported intra-coding modes to the respective intra-coded block A80. For inter-coded blocks A82, the data stream A14 may have one or more motion parameters coded thereinto. Generally speaking, the inter-coded blocks A82 are not restricted to being temporally coded. Alternatively, the inter-coded blocks A82 may be any blocks predicted from previously coded portions beyond the current picture A12 itself, such as previously coded pictures of the video to which the picture A12 belongs, or pictures of another view or a hierarchically lower layer in the case of the encoder and decoder being scalable encoders and decoders, respectively.

The prediction residual signal A24'''' in Figure 3 is also illustrated as a subdivision of the picture area into blocks A84. These blocks might be called transform blocks in order to distinguish them from the coding blocks A80 and A82. In effect, Figure 3 illustrates that the encoder A10 and the decoder A20 may use two different subdivisions of the picture A12 and the picture A12', respectively, into blocks, namely one subdivision into coding blocks A80 and A82, respectively, and another subdivision into transform blocks A84. Both subdivisions might be the same, i.e., each coding block A80 and A82 may concurrently form a transform block A84, but Figure 3 illustrates the case in which, for instance, the subdivision into transform blocks A84 forms an extension of the subdivision into coding blocks A80, A82, so that any border between two of the blocks A80 and A82 overlays a border between two blocks A84, or, alternatively speaking, each block A80, A82 either coincides with one of the transform blocks A84 or coincides with a cluster of transform blocks A84. However, the subdivisions may also be determined or selected independently of each other, so that the transform blocks A84 could alternatively cross block borders between the blocks A80, A82. As far as the subdivision into transform blocks A84 is concerned, statements similar to those brought forward with respect to the subdivision into blocks A80, A82 thus hold true, i.e., the blocks A84 may be the result of a regular subdivision of the picture area into blocks (with or without an arrangement into rows and columns), the result of a recursive multi-tree subdivision of the picture area, or a combination thereof, or any other sort of blocking. Just as an aside, it is noted that the blocks A80, A82 and A84 are not restricted to being square, rectangular or of any other shape.

Figure 3 further illustrates that the combination of the prediction signal A26 and the prediction residual signal A24'''' directly results in the reconstructed signal A12'. However, it should be noted that, according to alternative embodiments, more than one prediction signal A26 may be combined with the prediction residual signal A24'''' to result in the picture A12'.

In Figure 3, the transform blocks A84 shall have the following significance. The transformer A28 and the inverse transformer A54 perform their transformations in units of these transform blocks A84. For instance, many codecs use some sort of DST or DCT for all transform blocks A84. Some codecs allow for skipping the transformation, so that, for some of the transform blocks A84, the prediction residual signal is coded in the spatial domain directly. However, in accordance with embodiments described below, the encoder A10 and the decoder A20 are configured in such a manner that they support several transforms. For example, the transforms supported by the encoder A10 and the decoder A20 could comprise:
○ DCT-II (or DCT-III), where DCT stands for Discrete Cosine Transform
○ DST-IV, where DST stands for Discrete Sine Transform
○ DCT-IV
○ DST-VII
○ Identity transform (IT)

Naturally, while the transformer A28 would support all of the forward transform versions of these transforms, the decoder A20 or the inverse transformer A54 would support the corresponding backward or inverse versions thereof:
○ Inverse DCT-II (or inverse DCT-III)
○ Inverse DST-IV
○ Inverse DCT-IV
○ Inverse DST-VII
○ Identity transform (IT)
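
As an illustration of one of the listed transforms, the following C++ sketch implements an orthonormal DCT-II together with its inverse (DCT-III) in a straightforward, non-optimized form; actual codecs use scaled integer approximations, so this floating-point version only serves to illustrate the underlying mathematics.

#include <cmath>
#include <cstddef>
#include <vector>

// Orthonormal DCT-II of a length-N signal.
std::vector<double> dct2(const std::vector<double>& x) {
    const std::size_t N = x.size();
    const double pi = std::acos(-1.0);
    std::vector<double> X(N, 0.0);
    for (std::size_t k = 0; k < N; ++k) {
        double acc = 0.0;
        for (std::size_t n = 0; n < N; ++n)
            acc += x[n] * std::cos(pi / N * (n + 0.5) * k);
        const double scale = std::sqrt(2.0 / N) * (k == 0 ? std::sqrt(0.5) : 1.0);
        X[k] = scale * acc;
    }
    return X;
}

// Orthonormal DCT-III, i.e. the inverse of the DCT-II above.
std::vector<double> idct2(const std::vector<double>& X) {
    const std::size_t N = X.size();
    const double pi = std::acos(-1.0);
    std::vector<double> x(N, 0.0);
    for (std::size_t n = 0; n < N; ++n) {
        double acc = 0.0;
        for (std::size_t k = 0; k < N; ++k) {
            const double scale = std::sqrt(2.0 / N) * (k == 0 ? std::sqrt(0.5) : 1.0);
            acc += scale * X[k] * std::cos(pi / N * (n + 0.5) * k);
        }
        x[n] = acc;
    }
    return x;
}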

The subsequent description provides more details on which transforms could be supported by the encoder A10 and the decoder A20. In any case, it should be noted that the set of supported transforms may comprise merely one transform, such as one spectral-to-spatial or spatial-to-spectral transform.

As already outlined above, Figures 1 to 3 have been presented as an example in which the inventive concepts described further below may be implemented in order to form specific examples of encoders and decoders according to the present application. Insofar, the encoder and decoder of Figures 1 and 2 may represent possible implementations of the encoders and decoders described herein below, respectively. Figures 1 and 2 are, however, only examples. An encoder according to embodiments of the present application may perform block-based encoding of a picture A12 using the concepts outlined in more detail below and may differ from the encoder of Figure 1 in that, for instance, the subdivision into blocks A80 is performed in a manner different from that exemplified in Figure 3. Likewise, decoders according to embodiments of the present application may perform block-based decoding of the picture A12' from the data stream A14 using the coding concepts further outlined below, but may differ from, for instance, the decoder A20 of Figure 2 in that they do not support intra prediction, or in that they subdivide the picture A12' into blocks in a manner different from that described with respect to Figure 3, and/or in that they derive the prediction residual from the data stream A14 not in the transform domain but, for instance, in the spatial domain.

As discussed, Figures 1 to 3 are merely meant to provide a rough overview of a video codec on which the subsequently outlined embodiments of the present application may be based. For example, VVC is an example of a video codec according to which the video decoder and video encoder of Figures 1 and 2 may be designed.

The following description is structured as follows. As a preliminary, VVC is used as an example of a video codec environment, and on the basis of this example the following description provides a report on experiments investigating the general coding-efficiency impact of open GOP coding structures as well as the picture-quality impact at segment switches. Again, the embodiments described later are not restricted to VVC, and the coding tools discussed with respect to these embodiments are not restricted to those discussed with respect to VVC, but these experiments and the presentation of their results provide the motivation leading to the embodiments described later. Further, the subsequent description provides an overview of GOP coding structures and segmentation, followed by a presentation of constrained encoding to enable open GOP switching, such as open GOP resolution switching, while effectively limiting the drift associated with the switching. Thereafter, several embodiments of the present application emerging from the considerations regarding VVC are presented.

In the following, an overview of the structures within VVC bitstreams and media segments used for streaming is provided. Media segments are typically aligned with intra random access point (IRAP) pictures, which use only intra-coding tools. IRAP pictures may occur frequently in a coded video bitstream to allow functionality such as seeking or fast-forwarding, and also serve as switching points for adaptive HTTP streaming. Systems for video-on-demand (VoD) streaming typically align segments with the IRAP picture period, i.e., an IRAP picture is typically placed at the beginning of a segment and the desired segment duration determines the temporal distance between IRAP pictures. However, there are use cases in which not all media segments contain an IRAP picture, e.g., very-low-latency streaming, so that small segments can be used for transmission without waiting for an IRAP picture, thereby reducing the latency on the content generation side. The segment size may vary in length depending on the target application. For instance, VoD services allow the player to build up a larger buffer (e.g., 30 seconds) to overcome throughput fluctuations, in which case segment sizes of up to several seconds (e.g., 5 seconds) can be a reasonable design choice [3]. However, real-time services requiring a tighter end-to-end delay do not allow such large buffers on the client side and therefore need more frequent switching points and shorter segments of 1 second or less.

Whenever the decoding delay requirements allow, the pictures between two IRAP pictures are typically encoded in a bi-directionally predicted hierarchical GOP structure, which involves reordering before presentation, since this structure provides substantial coding-efficiency benefits, as introduced with AVC [10]. The hierarchical structure of the GOP can be used for temporal scalability, where decoding all pictures up to a given layer corresponds to a given frame rate, and each picture is assigned a corresponding temporal ID (Tid) value, as shown in Figure 1 for a GOP size of 8 pictures. A GOP may be defined as all pictures in decoding order from the first Tid 0 picture up to, but not including, the following Tid 0 picture. Typically, a segment comprises one or more GOP structures, depending on the IRAP period and the GOP size. In HEVC, the number of reference picture slots in the decoded picture buffer (DPB) allows for a typical GOP size of 16 pictures, while in VVC the increased DPB capacity allows for hierarchical GOP sizes of up to 32 pictures.
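
The dyadic layer assignment described above can be illustrated by the following C++ sketch, which derives a temporal ID from the position of a picture inside a GOP whose size is a power of two (e.g., 8); the function is a simplifying assumption and is not taken from any encoder implementation.

#include <cstdint>

// Tid 0 for the GOP anchor pictures, increasing Tid for pictures at finer temporal levels;
// decoding only pictures with Tid <= t yields a frame rate reduced by a factor of 2^(maxTid - t).
int temporalId(std::int64_t poc, int gopSize /* power of two, e.g. 8 */) {
    const int pos = static_cast<int>(poc % gopSize);
    if (pos == 0) return 0;                       // anchor picture of the GOP
    int tid = 0;
    for (int step = gopSize; step > 1; step /= 2) {
        ++tid;
        if (pos % (step / 2) == 0) return tid;    // first level whose spacing divides the position
    }
    return tid;
}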

Pictures that follow an IRAP picture in decoding order but precede it in presentation order were introduced in HEVC and are referred to as leading pictures. They can further be distinguished into random access skipped leading (RASL) and random access decodable leading (RADL) pictures. While RADL pictures may only use reference pictures starting from the IRAP picture in decoding order, RASL pictures may additionally use reference pictures preceding the IRAP. IRAP pictures of the instantaneous decoding refresh (IDR) type reset the DPB and may only have leading pictures that are RADL pictures, resulting in a so-called closed GOP structure. IRAP pictures of the clean random access (CRA) type, on the other hand, do not reset the DPB. Hence, pictures reconstructed before the CRA in decoding order can be used as references for future pictures, i.e., for RASL pictures, allowing a so-called open GOP coding structure. RASL pictures exhibit a higher coding efficiency compared to RADL pictures, but may turn out to be undecodable when the reference pictures are not available, e.g., during random access at the associated IRAP at the beginning of a segment without decoding the preceding segment. A more detailed overview of the VVC high-level syntax can be found in [11].
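
The distinction between RASL and RADL pictures can be illustrated by the following C++ sketch, which classifies a leading picture according to whether any of its direct or indirect references precedes the associated IRAP in decoding order; the data structures are assumptions made for illustration, and the reference structure is assumed to be acyclic.

#include <vector>

struct Picture {
    int decodeOrder;
    int outputOrder;
    std::vector<const Picture*> references; // direct reference pictures
};

enum class LeadingType { NotLeading, RADL, RASL };

// True if the picture directly or transitively references a picture that precedes
// the IRAP in decoding order.
static bool referencesBefore(const Picture& p, int irapDecodeOrder) {
    for (const Picture* r : p.references) {
        if (r->decodeOrder < irapDecodeOrder) return true;
        if (referencesBefore(*r, irapDecodeOrder)) return true;
    }
    return false;
}

LeadingType classifyLeading(const Picture& p, const Picture& irap) {
    if (p.decodeOrder <= irap.decodeOrder || p.outputOrder >= irap.outputOrder)
        return LeadingType::NotLeading; // follows in decoding order and precedes in output order?
    return referencesBefore(p, irap.decodeOrder) ? LeadingType::RASL : LeadingType::RADL;
}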

Figure 4 illustrates, for example, a video data stream formed by the concatenation of two consecutive segments of different resolution, wherein the second segment uses an open GOP coding structure with reference pictures from the first segment. In detail, the referenced reference pictures are those rectangles in Figure 4 from which arrows emerge. The arrows themselves illustrate the prediction interdependencies, i.e., they point from the reference picture to the referencing picture. Each picture is associated with a certain temporal ID Tid and, as can be seen, the coding order deviates from the output/presentation order of the pictures. As can be seen, the pictures at output order ranks 9 to 15 are RASL pictures which directly or indirectly reference the CRA picture of the segment they belong to (segment 1) as well as pictures stemming from the previous segment (segment 0), primarily the picture at output order rank 8. A segment of the video may, for example, also be called a sequence of pictures, which may, for instance, comprise one GOP.

When the reference pictures of RASL pictures are located in the previous segment and a streaming client switches the representation after this previous segment, the client-side decoder will use at least partially different variants of the reference pictures, compared to the encoder side, when decoding the RASL pictures. This can lead to a non-conforming bitstream if the content is not generated appropriately, or to a significant mismatch in the reconstructed RASL pictures, and this drift can propagate to all RASL pictures up to, but not including, the associated CRA picture. In the following, an appropriate generation of the content is discussed, which allows the use of open GOP structures while maintaining bitstream conformance at segment switches and avoiding undesired drift that would otherwise compromise the visual quality during switches.

When open GOP switching is carried out, the various inter-prediction tools in VVC exhibit different potentials for causing drift and, at the same time, tool usage is restricted by conformance constraints. In the following, the drift potential of the inter-prediction tools in VVC under open GOP resolution switching is analyzed, and a constrained encoding method is proposed herein that overcomes the severe artifacts of open GOP resolution switching while ensuring VVC conformance.

As to the drift potential of the VVC coding tools, a first group of coding tools in VVC can be classified as sample-to-sample prediction, e.g., the conventional block-based translational motion-compensated sample prediction known from many predecessors of VVC, or the newly introduced inter-prediction mode in VVC referred to as affine motion compensation (AMC), which decomposes a prediction block into smaller sub-blocks that are individually motion-compensated so as to emulate affine motion [12]. Prediction refinement with optical flow (PROF), an optional component of AMC, and bi-directional optical flow (BDOF) are recently introduced inter-prediction tools in VVC which modify the predicted samples by relying on optical-flow-based methods so as to emulate sample-wise inter prediction. When a different representation is used as reference for a reconstruction using these sample-to-sample prediction tools, the visual quality of the reconstructed pictures will tend towards the visual quality of that representation and away from the visual quality of the original representation. However, such sample-to-sample prediction has a comparatively low potential for causing visually disturbing artifacts and rather leads to a graceful quality transition over a given RASL picture sequence, because the prediction source samples of the first visual quality are gradually updated at the second visual quality through the residual information.

Figure 5 illustrates, for example, an optical flow tool 300 and its functionality. An inter-predicted block 10c within a picture 12 is shown. The inter-predicted block 10c is associated with a motion vector 302; that is, the motion vector 302 is signaled in the data stream for the inter-predicted block 10c. The motion vector 302 indicates the translational displacement of the inter-predicted block 10c at which a reference picture 304 would be sampled/copied in order to yield a translational inter-prediction signal for the inter-predicted block 10c. If the optical flow tool 300 is to be used for the inter-predicted block 10c, the optical flow tool 300 refines the translational inter-prediction signal by means of an optical-flow-based analysis. More precisely, instead of sampling the reference picture 304 merely at the footprint of the block 10c displaced according to the motion vector 302, the optical flow tool 300 uses an area slightly larger than the footprint of the block 10c within the reference picture 304, namely area 306, in order to determine the inter-prediction signal for the inter-predicted block 10c, i.e., by examining gradients within the area 306 in order to determine the inter-prediction signal. In other words, each sample of the inter-prediction signal of block 10c may be determined, when using the optical flow tool 300, by means of gradient-sensitive FIR filtering, possibly in addition to the interpolation filtering applied in case the samples of the block 10c, displaced according to the motion vector 302, fall onto sub-pel positions within the reference picture 304.
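
The following C++ sketch conveys, in a strongly reduced form, the idea of such a gradient-based refinement: the prediction is taken from the motion-shifted block area of the reference picture, and each sample is adjusted using gradients computed from an area one sample larger than the block footprint. The correction formula and the given (vx, vy) are placeholders and do not reproduce the normative BDOF/PROF derivations.

#include <vector>

struct Plane {
    int width, height;
    std::vector<int> samples;
    int at(int x, int y) const { return samples[y * width + x]; }
};

// Predict a W x H block whose motion-shifted top-left position in the reference is (x0, y0).
// The caller must ensure that the extended area (one extra sample on each side) lies inside
// the reference plane; border handling is intentionally omitted here.
std::vector<int> predictWithGradientRefinement(const Plane& ref, int x0, int y0,
                                               int W, int H, double vx, double vy) {
    std::vector<int> pred(static_cast<std::size_t>(W) * H);
    for (int y = 0; y < H; ++y) {
        for (int x = 0; x < W; ++x) {
            const int px = x0 + x, py = y0 + y;
            // Gradients from the extended area (306) around the block footprint.
            const double gx = 0.5 * (ref.at(px + 1, py) - ref.at(px - 1, py));
            const double gy = 0.5 * (ref.at(px, py + 1) - ref.at(px, py - 1));
            // Translational sample plus a small optical-flow correction (vx, vy assumed given).
            pred[static_cast<std::size_t>(y) * W + x] =
                ref.at(px, py) + static_cast<int>(gx * vx + gy * vy);
        }
    }
    return pred;
}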

It should be noted that while Figure 5 merely shows one reference picture and one motion vector, the optical flow tool may also perform the optical-flow analysis with respect to two reference pictures and two motion vectors, with the picture 12, which contains the inter-predicted block 10c, lying between the two reference pictures.

Subsequently, when describing embodiments of the present application, the optical flow tool 300 may form one example of the coding tools that are subject to exclusion from the encoding in order to avoid drift. Accordingly, video decoders and/or video encoders in accordance with some of the embodiments described below support this optical flow tool. As Figures 1 and 2 show possible implementations for video decoders and video encoders, video decoders and encoders according to Figures 1 and 2 that support the optical flow tool according to Figure 5 may represent an illustrative basis for embodiments of the present application. In any case, different possibilities exist as to how the decision whether to apply the optical flow tool 300 to the inter-predicted block 10c is made at the decoder side and the encoder side, respectively. For example, the optical flow tool 300 may be an inherently applied coding tool: whether the optical flow tool is applied to block 10c may, for instance, depend on one or more coding options signaled in the data stream for block 10c with respect to a coding tool other than the optical flow tool. Even alternatively, the optical flow tool 300 may be an inherently applied coding tool where the decision whether the optical flow tool 300 is applied to block 10c depends on the size of block 10c. Naturally, both dependencies may apply. Even alternatively, the optical flow tool 300 may be an explicitly applied coding tool, meaning that a syntax element is coded into the data stream which exclusively signals whether the optical flow tool 300 is applied to block 10c. In other words, this syntax element would be specific for block 10c. It should be noted that this syntax element need not be a flag or binary value in the sense of merely being able to switch between non-application and application of the tool 350. Rather, the syntax element may be an m-ary syntax element, one of whose m states is, for instance, associated with the application of the tool 350, or, alternatively, one of whose m states is, for instance, associated with the non-application of the tool 350.

In the embodiments explained subsequently, the encoder signals to the decoder the exclusion of certain coding tools, such as the optical flow tool 300, from the encoding of RASL pictures. The signaling discussed hereinafter need not act as the actual control over whether these coding tools are available for blocks of a certain picture or for slices of a picture. Rather, the signaling or indication presented in the embodiments described further below may act as a kind of additional signaling or promise from encoder to decoder that certain coding tools discussed further below (or merely one coding tool) have been excluded from the encoding of certain pictures such as RASL pictures. In the latter case, the indication or signaling discussed subsequently is redundant to, or supplementary to, configuration signaling inside the data stream which deactivates certain coding tools so that they are unavailable for application to picture blocks within certain pictures or picture slices. The encoder obeys the promise thus given by setting the configuration signaling and, if applicable, the syntax elements associated with block-wise tool application accordingly, i.e., so that the tools are or are not used in, for instance, RASL pictures. Thus, the optical flow tool may be a deactivatable coding tool which, as far as its application to inter-predicted blocks such as block 10c is concerned, may be deactivated picture-wise or slice-wise by configuration data inside the data stream. Such configuration signaling may, for instance, be comprised by a slice header or a picture header. Whether the optical flow tool 300 is to be applied to a certain inter-predicted block 10c would thus be decided on the basis of the configuration signaling, and only if the configuration signaling indicates for the picture 12 (or the slice of picture 12 of which the inter-predicted block 10c forms a part) that the optical flow tool 300 is activated, i.e., available, is the explicitly signaled or inherent decision about its application, as described above, made.

後一情形再次在圖6中描繪,此係因為用於決定是否應用某一寫碼工具之一般可能性係與針對本文中所論述的寫碼工具類似。因此,圖6說明當前經寫碼圖像12以及為圖像12之一部分的區塊10。圖像區塊10僅僅經說明性地描繪,且實際上僅僅為圖像12分割成之一個圖像區塊。區塊10所屬之區塊層級可例如對應於進行框內/框間預測模式決策之區塊,但亦存在其他可能性,諸如區塊小於後面的區塊。已經關於圖3描述用於將圖像12分割成諸如區塊10之區塊的可能實例。圖6中展示代表性寫碼工具350。此寫碼工具實際上是否應用於區塊10可經由以下情形來控制:該應用可取決於如由箭頭352所指示之區塊之大小及/或在資料串流中針對預定區塊10傳信之寫碼選項,諸如係使用框內區塊模式抑或框間區塊模式來寫碼區塊10。在圖6中使用箭頭354來描繪後一相關性。作為區塊大小相關性之替代方案或另外,對對應的基於遞歸多分樹之分割樹的區塊縱橫比或樹分割層級之相關性可適用。寫碼選項可係關於區塊經框內寫碼、區塊經框間寫碼、區塊經雙向預測、區塊藉由等距間隔開且相對置放的參考圖像而經雙向預測等等中之一或多者。二個資訊實體,亦即區塊大小/縱橫比/分裂層級及寫碼選項在資料串流中傳信,亦即藉由關於圖像12分割成包括區塊10之區塊的分割資訊之區塊大小/縱橫比/分裂層級,及特定用於例如區塊10之寫碼選項。代替依賴於隱式區塊層級決策,呈語法元素353形式之顯式信令可用於控制工具350針對區塊10之應用。語法元素將特定針對區塊10,且對於用於工具350之工具決策係特定的。雖然在區塊層級上、在較大層級上進行此應用決策356,但應用決策可另外取決於啟動決策358,其係在較大層級上諸如關於整個圖像12或圖像12沿著圖像12經寫碼成資料串流之區塊寫碼次序所細分成的切片而進行。啟動決策358可由前述組配信令來控制,諸如區塊10所屬之切片之切片標頭中或圖像12之圖像標頭中的設置或諸如圖像參數集之與圖像12相關聯的參數集。並非所有的決策356及358均可適用。其中無一者可適用,然而其中在下文所論述之指示接著將不僅將某一寫碼工具350對某一切片、圖像或RASL圖像的啟動或去啟動指示為一種承諾或冗餘信令,且亦類似於組配信令起作用。The latter scenario is again depicted in Figure 6 because the general possibilities for deciding whether to apply a certain coding tool are similar to those discussed for the coding tools herein. Thus, FIG. 6 illustrates the currently coded image 12 and the block 10 that is part of the image 12 . Image block 10 is depicted illustratively only, and is actually only one image block into which image 12 is divided. The block level to which block 10 belongs may, for example, correspond to the block where the intra/inter prediction mode decision is made, but there are other possibilities, such as a block being smaller than a subsequent block. Possible examples for segmenting the image 12 into blocks such as block 10 have been described with respect to FIG. 3 . A representative coding tool 350 is shown in Figure 6. Whether this coding tool is actually applied to block 10 can be controlled by: the application can depend on the size of the block as indicated by arrow 352 and/or signaling for the predetermined block 10 in the data stream. Coding options, such as whether to use intra-frame block mode or inter-frame block mode to write the code block 10. The latter correlation is depicted in Figure 6 using arrow 354. As an alternative to or in addition to block size dependence, dependence on block aspect ratio or tree partitioning level of the corresponding recursive polytree-based partitioning tree may apply. Coding options may relate to blocks being coded in-frame, blocks being coded in-frame, blocks being bi-predicted, blocks being bi-predicted from equidistantly spaced and oppositely placed reference images, etc. one or more of them. Two information entities, namely block size/aspect ratio/split level and coding options are communicated in the data stream, i.e. by partitioning information about the partitioning of image 12 into blocks including block 10 Block size/aspect ratio/split level, and coding options specific to, for example, block 10. Instead of relying on implicit block-level decisions, explicit signaling in the form of syntax element 353 may be used to control the application of tool 350 to block 10 . The syntax elements will be specific to block 10 and to tool decisions for tool 350. While this application decision 356 is made at the block level, at a larger level, the application decision may additionally depend on an activation decision 358 at a larger level, such as with respect to the entire image 12 or along the image 12 12 is performed by coding the data stream into slices subdivided in the block coding order. 
The activation decision 358 may be controlled by the aforementioned configuration signaling, such as a setting in the slice header of the slice to which block 10 belongs, in the picture header of picture 12, or in a parameter set associated with picture 12 such as a picture parameter set. Not both of the decisions 356 and 358 need to apply. None of them may apply, in which case the indication discussed below would then not merely indicate, as a kind of promise or redundant signaling, the activation or deactivation of a certain coding tool 350 for a certain slice, picture or RASL picture, but would also act similarly to configuration signaling.
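
The two-stage decision just described, an activation decision 358 at picture/slice level followed by an application decision 356 at block level, can be summarized in the following minimal sketch. It is purely illustrative: the container types, the field names (tool_enabled, explicit_flag) and the size threshold are assumptions made for this example and do not correspond to actual VVC syntax parsing.

from dataclasses import dataclass
from typing import Optional

@dataclass
class SliceConfig:
    tool_enabled: bool             # activation decision 358 (picture/slice level)

@dataclass
class Block:
    width: int
    height: int
    is_inter: bool
    explicit_flag: Optional[bool]  # syntax element 353; None if the tool is inherently applied

def tool_applies_to_block(cfg: SliceConfig, blk: Block) -> bool:
    # Activation decision 358: configuration signaling can deactivate the tool
    # for a whole picture or slice, in which case the block-level decision is moot.
    if not cfg.tool_enabled:
        return False
    # Application decision 356, explicit variant: a block-specific syntax element 353.
    if blk.explicit_flag is not None:
        return blk.explicit_flag
    # Application decision 356, inherent variant: derived from block size and/or
    # other coding options signaled for the block (threshold chosen arbitrarily here).
    return blk.is_inter and blk.width * blk.height >= 64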

因此,雖然圖6之寫碼工具350可為圖5之光流工具300,但其在圖6中經描繪為表示經受下文進一步論述之編碼器受約束決策的寫碼工具中之任一者。Thus, while the coding tool 350 of FIG. 6 may be the optical flow tool 300 of FIG. 5, it is depicted in FIG. 6 as representing any of the coding tools subject to the encoder constrained decisions discussed further below.

The second group of coding tools in VVC is used for predicting syntax (i.e. model parameters) from the syntax of pictures or from samples. Similar to its predecessors, VVC allows motion vector (MV) prediction on a block basis via temporal motion vector prediction (TMVP) [13], using temporal MV candidates from a so-called collocated reference picture. This feature is extended in VVC by introducing a finer-grained variant of TMVP on a sub-block basis (SBTMVP), which adds a displacement step for locating the corresponding motion information in the collocated reference picture.

圖7說明時間運動向量預測工具500。此工具500為用於圖6之寫碼工具350的另一實例,及用於下文針對編碼器受約束指示進一步論述之寫碼工具的另一實例。時間運動向量預測工具係用於基於與參考圖像502內之區塊506相關聯的運動向量510針對圖像12之經框間預測區塊10c預測運動向量508。雖然根據圖7之實例,工具500可使用參考圖像502內之並置區塊的運動向量作為預測子508,但工具500首先導出用於區塊10c之位移向量504,諸如空間預測運動向量,且使用此位移向量504來在參考圖像502內定位「並置區塊506」,且此並置區塊506之運動向量510接著用於時間預測運動向量候選者508。此外,圖7說明工具500可操作以僅針對至用於區塊10c之運動向量候選者清單512中之插入來判定時間預測運動向量508,諸如藉由使用在資料串流中針對區塊10c傳信之索引而最終自該清單選擇一個運動向量預測子。替代地,清單512係以產生清單512內之運動向量候選者的某一次序之某一方式來理解,且具有最高排序之運動向量候選者經簡單地最終選取/選擇以用於框間預測區塊10c中。Figure 7 illustrates a temporal motion vector prediction tool 500. This tool 500 is another example of the coding tool 350 used in FIG. 6 and discussed further below with respect to encoder constrained indications. A temporal motion vector prediction tool is used to predict motion vector 508 for inter-predicted block 10c of image 12 based on motion vector 510 associated with block 506 within reference image 502. Although according to the example of FIG. 7 , tool 500 may use motion vectors of collocated blocks within reference image 502 as predictors 508 , tool 500 first derives displacement vectors 504 for block 10c, such as spatially predicted motion vectors, and This displacement vector 504 is used to locate "collocated block 506" within the reference image 502, and the motion vector 510 of this collocated block 506 is then used to temporally predict motion vector candidates 508. Additionally, FIG. 7 illustrates that the tool 500 is operable to determine the temporal prediction motion vector 508 only for insertion into the motion vector candidate list 512 for block 10c, such as by using the The index is then used to select a motion vector predictor from the list. Alternatively, list 512 is understood in some way that the motion vector candidates within list 512 are generated in some order, and the motion vector candidate with the highest ranking is simply finalized/selected for use in the inter prediction region In block 10c.
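
As a rough illustration of the mechanism described above, the following sketch derives a temporal MV candidate and appends it to a candidate list. The motion_at interface, the center-based lookup and the omission of POC-based scaling are assumptions of this example only; they do not reproduce the actual (SB)TMVP derivation.

def add_temporal_mv_candidate(candidates, block, collocated_picture, displacement):
    """Append a temporal MV candidate 508 for block 10c, if one is available.

    candidates         -- motion vector candidate list 512 (list of (mvx, mvy) tuples)
    block              -- (x, y, w, h) of the current inter-predicted block
    collocated_picture -- object exposing motion_at(x, y) -> (mvx, mvy) or None (assumed API)
    displacement       -- displacement vector 504, e.g. a spatially predicted MV
    """
    x, y, w, h = block
    # Locate the "collocated" block 506 in the collocated reference picture,
    # shifted by the displacement vector (simplified to the block center).
    cx = x + w // 2 + displacement[0]
    cy = y + h // 2 + displacement[1]
    mv = collocated_picture.motion_at(cx, cy)
    if mv is not None:
        # The stored MV 510 would additionally be scaled according to POC
        # distances in a real codec; this step is omitted here.
        candidates.append(mv)
    return candidates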

It should be noted that the motion vector prediction tool of Figure 6 is to be understood broadly enough to also cover, in the merge sense, the case of temporal motion vector prediction, i.e. the case where a motion vector predictor is provided to a block from a collocated block together with further motion prediction settings such as reference picture indices.

All of the options discussed with respect to Figure 6 may be used for tool 500 in order to decide whether tool 500 is actually applied to block 10c. More precisely, tool 500 may be an instance of the coding tool 350 of Figure 6 for which explicit signaling in the form of syntax element 353 is used to control the application of tool 500 to block 10, namely the index signaled in the data stream for block 10c, which selects or does not select the TMVP candidate 508 in list 512. If not selected, tool 500 remains inactive for block 10c, which is to be interpreted as non-application of tool 500 for block 10c. However, configuration signaling at a higher level, via decision 358, may be used to deactivate tool 500 more globally, so that for blocks residing in a region (picture or slice) for which tool 500 is signaled to be deactivated, list 512 is in any case understood as not including the TMVP candidate 508, and the list index no longer acts as the aforementioned block-level decision control. The encoder may decide which of the two ways to use in order to obey any promise given to the decoder, so as to avoid drift in RASL pictures as taught herein.

Further tools of the aforementioned second group may be characterized as sample-to-syntax prediction tools. A completely new inter prediction tool introduced in VVC is decoder-side motion vector refinement (DMVR), which improves the accuracy of MVs in bi-prediction based on the mirroring property of two reference pictures that have the same but opposite temporal distance to the current picture.

圖8說明解碼器側運動向量改進工具400。若應用於圖像12之經框間預測區塊10d,則工具400會藉助於藉由使用最佳匹配搜索來改進在資料串流中針對此區塊10d寫碼/傳信之此運動向量402而改良運動向量402以用於對來自參考圖像404之此區塊10d進行框間預測。可在由解碼器及編碼器支援之最高解析度(1/16像素解析度)下執行最佳匹配搜索。經傳信運動向量可具有較低解析度,且僅僅用以「實質上」指示最終由DMVR工具判定之經改進運動向量。關於什麼應與參考圖像匹配以便執行最佳匹配搜索,存在不同可能性。一種可能性將為使用鄰近經框間預測區塊10d之已經解碼部分。此部分將經受使用經傳信運動向量處及周圍之運動向量候選者的運動向量位移,且產生最佳匹配之候選者將經選擇為經改進運動向量。替代地,區塊10d可為經雙向預測區塊,其中存在將藉由工具400改進之一對經傳信運動向量。亦即,在該狀況下,經框間預測區塊10d為經雙向預測區塊。參考圖像可能將在其間具有按呈現次序之圖像12,亦即,參考圖像在時間上置放在圖像12前方及後方。任擇地,二個參考圖像在時間上與圖像12同等地間隔開。工具400甚至可專用於此狀況,亦即工具400將在區塊層級上取決於區塊寫碼選項而固有地經啟動,該等區塊寫碼選項指示將基於與當前圖像10等距間隔開且在其間具有圖像10之參考圖像來雙向預測區塊10d。在彼狀況下,將藉由在包括該對經傳信運動向量及該對經傳信運動向量周圍之運動向量配對候選者當中執行最佳匹配搜索來改進該對經傳信運動向量,其中之一將為向量402。舉例而言,可藉由測試在稱為運動向量配對候選者之部分處取樣的參考圖像之間的相似性來執行最佳匹配搜索。運動向量配對候選者可限於以下運動向量配對候選者:其中用於一個參考圖像之一個運動向量候選者與用於此參考圖像之對應的信號運動向量以與用於另一參考圖像之另一運動向量候選者與用於此另一參考圖像之另一信號運動向量之偏差相反的方式偏離。對於類似性,可使用SAD或SSD。最佳匹配運動向量配對候選者接著將用作信號運動向量之替換406,亦即,用於參考圖像404之向量402將由406替換,且用於另一參考圖像之另一經傳信向量將由運動向量配對候選者之另一向量替換。亦將存在其他可能性。舉例而言,可藉由在經傳信運動向量處及周圍之二個經傳信運動向量處在二個參考圖像中執行經取樣貼片之平均值的最佳匹配搜索來個別地改進經雙向預測區塊10d之二個經傳信運動向量。Figure 8 illustrates a decoder side motion vector improvement tool 400. If applied to the inter-frame predicted block 10d of image 12, the tool 400 will do so by improving the motion vector 402 coded/signaled for this block 10d in the data stream by using a best match search. Motion vector 402 is modified for inter prediction of this block 10d from reference image 404. Best match searches can be performed at the highest resolution supported by the decoder and encoder (1/16 pixel resolution). The signaled motion vectors may be of lower resolution and only serve to "substantially" indicate the improved motion vectors ultimately determined by the DMVR tool. There are different possibilities as to what should match the reference image in order to perform a best match search. One possibility would be to use the already decoded portion of the neighboring inter-prediction block 10d. This portion will be subjected to motion vector displacement using motion vector candidates at and around the signaled motion vector, and the candidate that yields the best match will be selected as the improved motion vector. Alternatively, block 10d may be a bi-predicted block where there is a pair of signaled motion vectors to be improved by tool 400. That is, in this situation, the inter-predicted block 10d is a bi-predicted block. The reference image may have the image 12 between it in presentation order, that is, the reference image is placed temporally in front of and behind the image 12 . Optionally, the two reference images are equidistant in time from image 12. The tool 400 may even be specialized for this situation, i.e. the tool 400 will be inherently enabled at the block level depending on block coding options which instructions will be based on equidistant intervals from the current image 10 Block 10d is bidirectionally predicted with a reference picture of picture 10 in between. In that case, the pair of signaled motion vectors will be improved by performing a best match search among motion vector pairing candidates including the pair of signaled motion vectors and the pair of signaled motion vectors, one of which will be vector402. For example, a best match search may be performed by testing the similarity between reference images sampled at portions called motion vector pairing candidates. 
The motion vector pairing candidates may be limited to pairs in which the motion vector candidate for one reference picture deviates from the corresponding signaled motion vector for that reference picture in a manner opposite to the deviation of the other motion vector candidate from the other signaled motion vector for the other reference picture. As a similarity measure, SAD or SSD may be used. The best-matching motion vector pairing candidate would then be used as a replacement 406 for the signaled motion vectors, i.e. the vector 402 for reference picture 404 would be replaced by 406, and the other signaled vector for the other reference picture would be replaced by the other vector of that motion vector pairing candidate. Other possibilities exist as well. For example, the two signaled motion vectors of the bi-predicted block 10d may be refined individually by performing a best-match search on the average of the patches sampled in the two reference pictures at and around the signaled motion vectors.
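
The mirrored bilateral matching variant described above can be sketched as follows, using SAD as the similarity measure. This is a simplified, hypothetical illustration only: integer-pel search, luma only, no sub-pel refinement, and no handling of picture-boundary padding (out-of-bounds offsets are not caught).

import numpy as np

def dmvr_refine(ref0, ref1, block_xy, block_size, mv0, mv1, search_range=2):
    """Refine a pair of signaled MVs (mv0, mv1) by a mirrored best-match search.

    ref0, ref1 -- 2D numpy arrays (the two reference pictures 404, luma samples)
    block_xy   -- (x, y) top-left position of the bi-predicted block 10d
    block_size -- (w, h)
    mv0, mv1   -- signaled motion vectors 402 as (dx, dy), integer-pel in this sketch
    """
    x, y = block_xy
    w, h = block_size

    def patch(ref, mv):
        return ref[y + mv[1]: y + mv[1] + h, x + mv[0]: x + mv[0] + w]

    best = (float("inf"), mv0, mv1)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Mirrored candidate pair: the offset applied towards one reference
            # is applied with opposite sign towards the other reference.
            c0 = (mv0[0] + dx, mv0[1] + dy)
            c1 = (mv1[0] - dx, mv1[1] - dy)
            cost = np.abs(patch(ref0, c0).astype(int) - patch(ref1, c1).astype(int)).sum()
            if cost < best[0]:
                best = (cost, c0, c1)
    # The best-matching pair serves as the replacement 406 of the signaled MV pair.
    return best[1], best[2]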

關於MVR工具400是否將應用於區塊10d之決策,關於圖6所論述之所有替代方案均可適用。亦即,MVR工具400可為寫碼工具350。Regarding the decision of whether MVR tool 400 will be applied to block 10d, all alternatives discussed with respect to Figure 6 may apply. That is, the MVR tool 400 may be the coding tool 350.

Another new tool in VVC is the cross-component linear model (CCLM), which allows the chroma components of a block to be intra-predicted from the respective luma component using a linear model whose model parameters are derived from reconstructed luma sample values. The linear model transforms the subsampled luma samples rec'_L into a chroma prediction by means of the following equation:

    P(i, j) = a · rec'_L(i, j) + b,

where the parameters a and b are derived from the neighboring luma and chroma samples as follows. With X_l and X_s denoting the averages of the two largest and the two smallest neighboring luma samples, respectively, and Y_l and Y_s denoting the averages of the corresponding chroma sample pairs, the parameters are derived as:

    a = (Y_l − Y_s) / (X_l − X_s)
    b = Y_s − a · X_s
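
A minimal numerical sketch of this parameter derivation and prediction is given below. It assumes floating-point arithmetic and ignores the neighbor selection, luma subsampling and fixed-point details of the actual VVC derivation; the zero-denominator case is also not handled.

import numpy as np

def cclm_params(neighbor_luma, neighbor_chroma):
    """Derive (a, b) from reconstructed neighboring luma/chroma sample pairs."""
    order = np.argsort(neighbor_luma)
    # Averages of the two smallest / two largest neighboring luma samples ...
    x_s = neighbor_luma[order[:2]].mean()
    x_l = neighbor_luma[order[-2:]].mean()
    # ... and of the corresponding chroma samples.
    y_s = neighbor_chroma[order[:2]].mean()
    y_l = neighbor_chroma[order[-2:]].mean()
    a = (y_l - y_s) / (x_l - x_s)   # division by zero not handled in this sketch
    b = y_s - a * x_s
    return a, b

def cclm_predict(rec_luma_subsampled, a, b):
    """P(i, j) = a * rec'_L(i, j) + b for every sample of the block."""
    return a * rec_luma_subsampled + b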

由於參數求導程序僅考慮相鄰樣本值之極值,因此即使在相鄰區塊中之單樣本漂移離群值之狀況下,該程序亦易於發生廣泛漂移。並且,由於線性模型,若a較大,則明度漂移可經放大。對於考慮所有相鄰樣本值的其他框內預測模式,漂移傳播不那麼明顯,且無法經線性地放大。由於此固有的不穩定性,此模式在應用於其中受約束漂移係可接受的應用中時需要特別小心,例如HTTP自適應串流中之開放GOP切換。除此之外,由於在經描述應用之上下文中,該漂移可僅發生在RASL圖框中,亦即運動經預測圖框。若該編碼器決定使用CCLM,亦即框內預測模式,則此通常係由於缺少適當的經運動補償之預測子,此意指高時間活動區。在此類區中,用於開放GOP切換之所預期的重建漂移預期為高的,對所論述的不穩定效應的貢獻甚至更大。Because the parameter derivation procedure only considers extreme values of adjacent sample values, it is prone to widespread drift even in the presence of single-sample drift outliers in adjacent blocks. Also, due to the linear model, if a is larger, the brightness drift can be amplified. For other in-box prediction modes that consider all adjacent sample values, the drift propagation is less obvious and cannot be linearly amplified. Due to this inherent instability, this mode requires special care when used in applications where constrained drift is acceptable, such as open GOP switching in HTTP adaptive streaming. In addition, since in the context of the described application, the drift may only occur in RASL frames, ie motion predicted frames. If the encoder decides to use CCLM, i.e. intra prediction mode, this is usually due to the lack of appropriate motion compensated predictors, meaning regions of high temporal activity. In such regions, the reconstruction drift expected for open GOP switching is expected to be high, contributing even more to the discussed instability effects.

圖9示意性地展示跨分量線性模型工具100之操作模式。相對於應用哪一工具100來展示區塊10a。在組合用於明度之預測信號及殘餘信號內,使用任何類型的預測120且藉由自資料串流解碼122殘餘信號來重建此區塊10a之明度分量。工具100之目的為基於經重建明度分量124來預測色度分量。此係使用線性模型或線性映射106來進行。此線性映射106使用純量線性函數以便基於樣本126之經重建明度分量針對區塊10a之每一樣本126來逐樣本地預測樣本126之色度分量。純量線性函數之線性參數,亦即上文所表示的a及b,係藉由區塊10a之鄰域112中之已經重建樣本的明度及色度分量之統計資料之分析針對區塊10a經區塊全域地判定。詳言之,藉由工具100執行之統計分析124在圖9中在128處指示,且可針對每一分量判定在鄰域112中之經重建樣本內出現的外部明度及色度值。舉例而言,可使用二個最大明度值之平均值,以及二個最小明度值。對於經預測色度分量亦如此。基於所得四個平均值,用於鄰域112內之明度值之跨度及色度值之跨度的量度經判定,且其間的比率係用作用於線性映射106之純量線性函數的斜率。用於明度之最小值的平均值—斜率x,用於色度之最小值之平均值—係用於判定純量線性函數之截距。如此執行之參數求導108產生純量線性函數,且經重建明度分量124之每一明度樣本係用於預測區塊10a內之對應的色度樣本值,藉此產生用於區塊10a之色度分量間預測信號。圖9中未展示但係可能的為,資料串流可具有經寫碼成其的色度殘餘信號,以便針對色度分量C 1及/或C 2來校正色度分量間預測信號。 Figure 9 schematically illustrates the mode of operation of the cross-component linear model tool 100. Block 10a is shown relative to which tool 100 is applied. In combining the prediction signal and the residual signal for luma, the luma component of this block 10a is reconstructed using any type of prediction 120 and by decoding 122 the residual signal from the data stream. The purpose of tool 100 is to predict chroma components based on reconstructed luma components 124 . This is done using a linear model or linear mapping 106 . This linear mapping 106 uses a scalar linear function to predict the chroma component of sample 126 on a sample-by-sample basis for each sample 126 of block 10a based on the reconstructed luma component of sample 126. The linear parameters of the scalar linear function, namely a and b represented above, are determined for the block 10a through the analysis of the statistical data of the luminance and chroma components of the reconstructed samples in the neighborhood 112 of the block 10a. Block-wide determination. In detail, the statistical analysis 124 performed by the tool 100 is indicated in FIG. 9 at 128 and can determine for each component the external luminance and chroma values that occur within the reconstructed sample in the neighborhood 112 . For example, you can use the average of the two maximum brightness values, and the two minimum brightness values. The same is true for predicted chrominance components. Based on the four resulting averages, measures for the span of luma values and the span of chroma values within neighborhood 112 are determined, and the ratio therebetween is used as the slope of the scalar linear function for linear mapping 106 . The average of the minimum values for lightness - the slope x, and the average of the minimum values for chroma - are the intercepts used to determine the scalar linear function. The parameter derivation 108 so performed produces a scalar linear function, and each luma sample of the reconstructed luma component 124 is used to predict the corresponding chroma sample value within block 10a, thereby generating the color for block 10a. Inter-component prediction signal. Not shown in Figure 9, but possible, the data stream may have a chroma residual signal coded therein to correct the inter-chroma component prediction signal for chroma components C 1 and/or C 2 .

Again, tool 100 is another example of the coding tool 350 of Figure 6. In other words, whether this coding tool is applied to a block 10a of a picture may be decided according to any of the options discussed with respect to Figure 6. It is worth noting that VVC does not provide any means for deactivating tool 100 picture-globally, or at least slice-globally, but according to an embodiment described later such signaling is proposed, thereby avoiding detrimental drift caused by tool 100. More precisely, tool 100 may be an instance of the coding tool 350 of Figure 6 for which explicit signaling in the form of a syntax element 353 is used to control the application of tool 100 to block 10. The syntax element may, for example, be a flag which switches tool 100 on or off for block 10. With respect to the encoding constraint indication discussed subsequently for the embodiments, there are two possibilities: it may merely inform the recipient (i.e. the decoder) of the fact that all syntax elements 353 for a certain picture 12 indicate non-application of tool 100, or it may alternatively also act as configuration signaling which deactivates tool 100 for picture 12, in which case the data stream would not convey any syntax element 353 for the blocks 10 inside that picture 12. Taking VVC as an example, there is no configuration signaling for deactivating tool 100 at the granularity of pictures or slices. In VVC, such configuration signaling is only available for controlling the activation of tool 100 for a whole sequence of pictures. A RASL-wise deactivation is therefore not feasible.

又一新的工具經引入至VVC之迴路濾波級,且被稱作明度映射及色度縮放(LMCS),其中色度樣本值使用自明度樣本導出之參數來進行縮放,如圖10中所說明。Yet another new tool was introduced into the loop filtering stage of VVC and is called Luminance Mapping and Chroma Scaling (LMCS), where the chroma sample values are scaled using parameters derived from the luma samples, as illustrated in Figure 10 .

此處亦存在色度至明度相關性,但不如在CCLM之狀況下那麼明顯。在該程序之色度縮放部分期間,經變換且經反量化色度殘餘係根據自相鄰虛擬管線資料單元(VPDU)之明度樣本導出的模型參數來縮放。出於管線潛時縮減的目的,CCLM依賴於相鄰VPDU之樣本。然而,在LMCS中,考慮所有相鄰明度樣本,允許平均化VPDU相鄰樣本中之漂移離群值。並且,該等模型參數係用於縮放殘餘信號,其不聚集漂移但經直接傳信。出於彼等原因,工具放大漂移的可能要小得多,但在為受控漂移應用進行編碼時仍應考慮。There is also a chroma-to-lightness correlation here, but it is not as obvious as in the case of CCLM. During the chroma scaling portion of the procedure, the transformed and inverse quantized chroma residues are scaled according to model parameters derived from the luma samples of adjacent virtual pipeline data units (VPDUs). For the purpose of pipeline latency reduction, CCLM relies on samples of adjacent VPDUs. However, in LMCS, all adjacent brightness samples are considered, allowing averaging of drifting outliers in adjacent samples of the VPDU. Also, these model parameters are used to scale the residual signal, which does not aggregate drift but is directly signaled. For these reasons, tools are much less likely to amplify drift, but should still be considered when coding for controlled drift applications.

圖11中描繪用於LMCS工具200之操作模式。此處之想法為執行明度工具映射212以便在寫碼明度色調標度208而非呈現明度色調標度210中針對預定圖像12執行明度分量預測202及明度分量殘餘解碼204。更精確而言,雖然經重建明度值可在線性標度上在某一位元深度處表示用於圖像12之經重信號的明度分量,但明度色調映射212可使用諸如逐圖像線性色調映射函數或某一其他色調映射函數來將此標度210映射至寫碼標度208上。色調映射函數可在諸如圖像12之圖像參數集的資料串流處傳信。該函數係由編碼器適當地判定。因此,藉由框間預測202針對圖像12之區塊10b獲得的框間預測信號係在於寫碼標度208處與殘餘信號204組合之前經受明度色調映射212,以產生用於圖像12之經重建明度分量。對於經框內預測區塊,使用框間預測206。在寫碼標度208域內執行框內預測。工具200之另一目標為根據明度色調映射212來控制色度分量之量化誤差。亦即,色度分量量化誤差係針對每一區塊個別地經控制,且藉由明度色調映射212來適應明度分量之影響。為此目的,針對圖像12之區塊10b自圖像區塊10b之鄰域222內之圖像12的經重建明度分量之寫碼明度色調標度版本的平均值220來判定色度殘餘縮放因數216。針對圖像區塊10b自資料串流解碼之色度殘餘信號224係根據如此判定之色度殘餘縮放因數216經縮放226,且此縮放係用於針對圖像區塊10b校正228框內色度預測信號230。相較於明度分量,框內色度預測信號230可使用相同預測工具或其子集。藉由使用鄰域222以用於判定平均值220,用於區塊10b之明度及色度分量可並行而非串行地經重建。圖像12之經重建明度及色度分量接著經受反明度色調映射240以產生用於圖像12之經重建最終結果且產生接下來寫碼/解碼圖像之基礎,亦即用作用於此經隨後寫碼/解碼圖像之參考圖像。The operating mode for the LMCS tool 200 is depicted in Figure 11. The idea here is to perform the luma tool mapping 212 in order to perform luma component prediction 202 and luma component residual decoding 204 for the predetermined image 12 in a written luma tone scale 208 rather than a rendered luma tone scale 210 . More precisely, while the reconstructed luma values may represent the luma components of the weighted signal for image 12 on a linear scale at a certain bit depth, luma tone mapping 212 may use, for example, per-image linear tone mapping function or some other tone mapping function to map this scale 210 onto the coding scale 208. The tone mapping function may be signaled at the data stream such as the image parameter set of image 12. This function is appropriately determined by the encoder. Therefore, the inter prediction signal obtained by inter prediction 202 for block 10b of image 12 is subjected to luma tone mapping 212 before being combined with residual signal 204 at coding scale 208 to produce the The reconstructed lightness component. For intra-predicted blocks, inter-prediction 206 is used. Intra-box prediction is performed within the coding scale 208 domain. Another goal of the tool 200 is to control the quantization error of the chroma components according to the luma tone map 212 . That is, the chroma component quantization error is controlled individually for each block, and the effects of the luma component are accommodated by luma tone mapping 212 . To this end, chroma residual scaling is determined for block 10b of image 12 from an average 220 of coded luma tone scaled versions of the reconstructed luma components of image 12 within a neighborhood 222 of image block 10b Factor 216. The chroma residual signal 224 decoded from the data stream for image block 10b is scaled 226 according to the chroma residual scaling factor 216 so determined, and this scaling is used to correct 228 the in-frame chroma for image block 10b Prediction signal 230. Compared to the luma component, the in-frame chroma prediction signal 230 may use the same prediction tool or a subset thereof. By using neighborhood 222 for determining mean 220, the luma and chroma components for block 10b can be reconstructed in parallel rather than serially. The reconstructed luma and chroma components of image 12 are then subjected to inverse luma tone mapping 240 to produce the reconstructed final result for image 12 and to produce the basis for subsequent coding/decoding of the image, i.e., used for this process. The reference image for subsequent writing/decoding of images.
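
The chroma residual scaling part of the tool just described can be illustrated roughly as follows. The derivation of the scaling factor from the luma average is represented here by a caller-supplied lookup function, since the actual VVC derivation uses a piecewise-linear model signaled in an APS; all names and the interface are assumptions of this sketch.

import numpy as np

def scale_chroma_residual(chroma_residual, neighbor_mapped_luma, scale_lut):
    """Scale the decoded chroma residual (224) for one block 10b.

    chroma_residual      -- 2D array, residual decoded from the data stream
    neighbor_mapped_luma -- reconstructed luma samples in the block's neighborhood (222),
                            in the coded (tone-mapped) luma scale
    scale_lut            -- hypothetical callable mapping the luma average (220)
                            to a chroma residual scaling factor (216)
    """
    avg = float(np.mean(neighbor_mapped_luma))
    factor = scale_lut(avg)
    # The scaled residual (226) is then used to correct (228) the intra
    # chroma prediction signal (230).
    return chroma_residual * factor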

關於寫碼工具200,與關於前述圖論述的其他寫碼工具一樣,該註釋亦為有效的,亦即,寫碼工具200為用於圖6之寫碼工具350的實例,且可不使用用於決定此寫碼工具對關於圖6所論述之特定區塊的應用之選項。作為一實例,關於寫碼工具300,可省略逐區塊決策356,但組配信令可用於逐圖像或逐切片地控制應用。Regarding coding tool 200, this comment is valid as with other coding tools discussed with regard to the preceding figures, that is, coding tool 200 is an example of coding tool 350 for FIG. 6, and may not be used for Options that determine the application of this coding tool to the specific blocks discussed in Figure 6. As an example, with respect to coding tool 300, block-by-block decision 356 may be omitted, but assembly signaling may be used to control the application on a per-image or per-slice basis.

The probability that errors in MVs predicted by syntax-to-syntax and sample-to-syntax inter prediction tools lead to severe artifacts in subsequent sample-to-sample prediction tools is relatively high, since those subsequent sample-to-sample prediction tools use the mispredicted MVs as spatial or temporal MV candidates. This holds in particular for (SB)TMVP and DMVR, which exhibit most of the visible artifacts in open GOP switching, because errors of improper motion vectors may propagate with ever-increasing magnitude through subsequent pictures. However, this also applies to other prediction models, e.g. CCLM and/or LMCS, which operate based on parameters derived from reconstructed sample values. Figure 12 illustrates the impact of typical syntax or parameter prediction errors on the visual and objective quality of RASL pictures in conventional open GOP coding with a GOP size of 32 pictures. Evidently, the RASL pictures exhibit clear artifacts in both the luma and chroma components of the reconstructed pictures.

A third problem with open GOP switching may originate from the use of adaptation parameter sets (APS) in VVC, which carry the filter coefficients of the adaptive loop filter (ALF), the parameters of luma mapping with chroma scaling (LMCS), and quantization scaling lists. RASL pictures may refer to APSs transmitted in decoding order before the respective CRA; these APSs are available during continuous decoding, but not upon random access at the CRA picture, where, however, the associated RASL pictures are discarded in any case. Therefore, open GOP resolution switching may lead to references to missing APSs, which can cause a non-error-resilient decoder to crash, or produce visual artifacts when the parameters of a wrong APS with a coincidentally matching identifier value are used. Similar to the syntax prediction tools, this problem is highly likely to cause visual disturbances, up to a complete decoder failure.

為了避免在執行開放GOP解析度切換時出現上文所描述的問題,可使用在下文描述之由三個支柱組成的受約束VVC編碼方法。In order to avoid the problems described above when performing open GOP resolution switching, the constrained VVC encoding method consisting of three pillars described below can be used.

首先,與CRA相關聯之RASL圖像受約束,使得按解碼次序在CRA之前的圖像不會經選擇為並置參考圖像,來執行語法至語法預測,亦即(SB) TMVP。藉此,與在編碼器側完全相同的參考圖像及運動資訊經使用,並且防止了來自不正確的源運動資訊至較早區段圖像的任何語法預測誤差。在可能的實施中,按解碼次序之第一RASL圖像被限制為僅使用其相關聯的CRA圖像作為並置參考圖像,該參考圖像自然地僅代管零運動向量,而其他RASL圖像可存取按解碼次序之第一RASL圖像及以下圖像的非零時間MV候選者。關於樣本至語法預測工具,DMVR對所有RASL圖像停用,該等RASL圖像具有按解碼次序在相關聯的CRA之前的有效參考圖像。在另一替代方案中,DMVR對所有RASL圖像停用,而不管其參考圖像如何,且在另一替代方案中,DMVR僅對用作以下圖像之並置參考圖像的RASL圖像停用,並且在又一替代方案中,DMVR對所有RASL圖像停用,除了屬於最高時間層且藉此不用作參考之RASL圖像以外。藉此,來自早期區段之參考圖像的錯誤樣本值與其他RASL圖像的編碼器側或受漂移影響的樣本不同,不會導致樣本至語法預測中之誤差。First, the RASL pictures associated with the CRA are constrained such that pictures preceding the CRA in decoding order are not selected as collocated reference pictures to perform syntax-to-syntax prediction, that is, (SB) TMVP. Thereby, the exact same reference pictures and motion information as on the encoder side are used, and any syntax prediction errors from incorrect source motion information to earlier segment pictures are prevented. In a possible implementation, the first RASL picture in decoding order is restricted to using only its associated CRA picture as a collocated reference picture, which naturally only hosts zero motion vectors, while the other RASL pictures The image has access to non-zero time MV candidates for the first RASL picture and subsequent pictures in decoding order. Regarding the sample-to-syntax prediction tool, DMVR is disabled for all RASL images that have a valid reference image preceding the associated CRA in decoding order. In another alternative, DMVR is disabled for all RASL images regardless of their reference images, and in another alternative, DMVR is disabled only for RASL images used as collocated reference images for is used, and in yet another alternative, DMVR is disabled for all RASL images except those belonging to the highest temporal layer and thereby not used as a reference. Thereby, erroneous sample values from reference pictures in early segments that differ from encoder-side or drift-affected samples in other RASL pictures do not result in sample-to-syntax errors in prediction.

為了確保區段切換後的VVC一致性,必須針對具有在相關聯的CRA之前的參考圖像之所有RASL圖像限制其他工具,亦即光流相關工具BDOF及PROF的使用被停用。在一替代方案中,為了簡單起見,可能對所有RASL圖像停用BDOF及PROF。另外,視訊內之獨立寫碼子圖像的VVC之新特徵例如適用於360度視埠相關視訊串流,該新特徵必須停用以使用RPR。所有以上工具約束亦為在VVC規範中定義以啟用RPR之一致性約束的一部分。除與VVC中之RPR使用相關聯的一致性約束以外,亦需要其他工具約束,此係因為使用自經重建樣本之參數預測的預測技術亦可產生明顯假影。因此,在吾人之實施中,藉由編碼器側搜尋演算法逐區塊約束來停用CCLM,此係因為當前的VVC語法僅允許逐序列停用,此顯著降低整體寫碼效率。此有效地允許確保編碼器側避免漂移,但若無徹底的低層級剖析,則無法在解碼器上容易地確認。並且,由於啟用但不使用該工具,不必要的位元(亦即用於CCLM使用之寫碼單元層級旗標,諸如cclm_mode_flag或cclm_mode_idx)經發送以傳信不使用其之編碼決策。To ensure VVC consistency after section switching, the use of other tools, namely the optical flow related tools BDOF and PROF, must be disabled for all RASL images with a reference image before the associated CRA. In an alternative, for simplicity, BDOF and PROF might be disabled for all RASL images. In addition, the new feature of VVC for independently coded sub-images within a video, such as 360-degree viewport-related video streaming, must be disabled to use RPR. All the above tool constraints are also part of the conformance constraints defined in the VVC specification to enable RPR. In addition to the consistency constraints associated with the use of RPR in VVC, other tool constraints are also required because prediction techniques that use parameter predictions from reconstructed samples can also produce significant artifacts. Therefore, in our implementation, CCLM is disabled by block-by-block constraints of the encoder-side search algorithm. This is because the current VVC syntax only allows sequence-by-sequence disabling, which significantly reduces the overall coding efficiency. This effectively allows ensuring that drift is avoided on the encoder side, but cannot be easily confirmed on the decoder without thorough low-level profiling. Also, because the tool is enabled but not used, unnecessary bits (ie, code unit level flags used by CCLM, such as cclm_mode_flag or cclm_mode_idx) are sent to signal the encoding decision not to use them.
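
The per-picture tool restrictions described in the two preceding paragraphs could be collected, on the encoder side, in a structure of the following kind. This is a schematic sketch with hypothetical picture attributes; it is not VVC reference software and represents only one of the alternatives mentioned above.

def rasl_tool_constraints(picture):
    """Per-picture restrictions of the presented method for a RASL picture.

    picture is assumed to expose: is_rasl, is_first_rasl_after_cra and
    has_reference_before_cra (hypothetical attributes).
    """
    if not picture.is_rasl:
        return {}
    return {
        # (SB)TMVP: never select a picture preceding the associated CRA as the
        # collocated picture; the first RASL picture only uses the CRA itself.
        "collocated_must_not_precede_cra": True,
        "collocated_restricted_to_cra": picture.is_first_rasl_after_cra,
        # DMVR: disabled if an active reference precedes the CRA in decoding order.
        "dmvr_disabled": picture.has_reference_before_cra,
        # BDOF / PROF: disabled under the same condition (or, in the simpler
        # alternative, for all RASL pictures).
        "bdof_prof_disabled": picture.has_reference_before_cra,
        # CCLM: avoided by an encoder-side search constraint, since VVC version 1
        # offers no picture-/slice-level switch for it.
        "cclm_avoided_by_encoder_search": True,
    }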

其次,亦對於開放GOP寫碼結構,亦即按解碼次序在其相關聯的RASL圖像之前的CRA圖像,用於所有圖像之必要的APS需要存在於該區段內。應注意,對於隨機存取開放GOP串流,此約束並非必要的,且允許RASL圖像指代以位元串流傳輸的按解碼次序在相關聯的CRA圖像之前的APS。由於當在此類CRA圖像處隨機存取此類圖像時,RASL圖像會被丟棄且此引用並非有問題的。並且,當不執行切換時,此類APS在連續解碼中係可用的。然而,此等APS在具有開放GOP切換之串流中可能不可用,且因此需要防止被引用。在吾人之實施中,與ALF、LMCS及量化縮放清單相關的處理係以與封閉GOP寫碼結構類似的方式來重設。Secondly, also for open GOP coding structures, that is, CRA pictures that precede their associated RASL pictures in decoding order, the necessary APS for all pictures need to be present in this section. It should be noted that for random access open GOP streaming, this constraint is not necessary and a RASL picture is allowed to refer to an APS transmitted in the bit stream that precedes the associated CRA picture in decoding order. Since when such an image is randomly accessed at such a CRA image, the RASL image is discarded and this reference is not problematic. Also, such APS is available in continuous decoding when switching is not performed. However, these APS may not be available in streams with open GOP switching, and therefore need to be prevented from being referenced. In our implementation, the processing related to ALF, LMCS, and quantized scaling lists is reconfigured in a similar manner to the closed GOP coding structure.

第三,自VVC高層級語法之視角來看,位元串流階梯中之變型的個別編碼必須以協調方式實行,且牢記開放GOP切換之目標在解碼器側。因此,所有區段變型之序列參數集(SPS)需要經對準,使得區段切換不會藉由SPS之改變觸發新的經寫碼層視訊序列之開始。舉例而言,藉由適當協調,SPS將指示位元串流階梯內之最大解析度、匹配區塊大小及色度格式、適當匹配層級指示器及相關約束旗標,諸如gci_no_res_change_in_clvs_constraint_flag、sps_ref_pic_resampling_enabled_flag及sps_res_change_in_clvs_allowed_flag,其具有在解碼器側啟用RPR之使用的適當組配。能力比經指示最大解析度或層級所需的能力低的裝置需要藉由系統機制運用經調節SPS來經服務。Third, from the perspective of VVC high-level syntax, the individual encodings of the variants in the bitstreaming ladder must be implemented in a coordinated manner, keeping in mind that the target of open GOP switching is on the decoder side. Therefore, the sequence parameter sets (SPS) of all segment variants need to be aligned so that segment switching does not trigger the start of a new coded layer video sequence through changes in the SPS. For example, with appropriate coordination, SPS will indicate the maximum resolution within the bitstream ladder, match block size and chroma format, appropriate match level indicators and related constraint flags such as gci_no_res_change_in_clvs_constraint_flag, sps_ref_pic_resampling_enabled_flag and sps_res_change_in_clvs_allowed_flag, It has the appropriate configuration to enable the use of RPR on the decoder side. Devices with capabilities lower than those required for the indicated maximum resolution or level will need to be serviced by system mechanisms using adjusted SPS.

VVC中之RPR已經以受約束方式設計,以限制如自以上工具約束論述而顯而易見的其實施及運行時間複雜性。此複雜性考慮之重要態樣在於當存取RPR使用中之經縮放參考樣本時之記憶體頻寬係可接受的,且相比於無RPR之情況下,並不顯著較高。VVC中之經寫碼圖像伴隨著所謂的縮放窗口,其用於判定二個圖像之間的縮放因數。為了為RPR之記憶體頻寬要求設定一個界限,使用RPR之圖像的縮放窗口與其參考圖像之縮放窗口之間的關係經限制以最多允許八倍放大及二倍縮小。換言之,假設每一縮放窗口匹配其表示之圖像大小,允許當切換為具有八倍高的圖像大小之表示時使用RPR。然而,若圖像大小在每一維度上減少不少於一半,則下切換可能僅使用RPR。RPR in VVC has been designed in a constrained manner to limit its implementation and runtime complexity as apparent from the discussion of tool constraints above. An important aspect of this complexity consideration is that the memory bandwidth when accessing scaled reference samples in use with RPR is acceptable and not significantly higher than without RPR. Coded images in VVC are accompanied by a so-called scaling window, which is used to determine the scaling factor between two images. In order to set a bound on the memory bandwidth requirements of RPR, the relationship between the zoom window of the image using RPR and the zoom window of its reference image is limited to allow a maximum of eight times magnification and two times reduction. In other words, each zoom window is assumed to match the image size of its representation, allowing RPR to be used when switching to a representation with eight times the height of the image size. However, if the image size is reduced by no less than half in each dimension, down switching may only use RPR.
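
The scaling-window constraint mentioned above (at most eight-fold upscaling and two-fold downscaling between a picture and its reference) can be checked along the following lines. The actual VVC conformance check is expressed in terms of scaling-window widths/heights and fixed-point ratios; this sketch simplifies each window to a (width, height) pair.

def rpr_ratio_allowed(cur_scaling_window, ref_scaling_window):
    """Check the RPR constraint between a picture and one of its references.

    The reference may be at most 2x larger (downscaling limit) and at most
    8x smaller (upscaling limit) than the current picture in each dimension.
    """
    cw, ch = cur_scaling_window
    rw, rh = ref_scaling_window
    horizontal_ok = rw <= 2 * cw and cw <= 8 * rw
    vertical_ok = rh <= 2 * ch and ch <= 8 * rh
    return horizontal_ok and vertical_ok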

Usually, in adaptive streaming scenarios, up-switching is carried out in a progressive manner, i.e. the resolution or quality is increased gradually. On down-switching, however, it may happen that a player whose buffer is running low switches to the lowest quality in order to avoid a buffer underrun, which means that down-switching may not happen gradually. One way of mitigating this limitation of RPR in VVC is to encode the lowest-quality representation with a closed GOP structure, so that it can serve as a fallback when the picture size is reduced to less than half during such non-progressive down-switching events.
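
In a streaming client, the limitation just described might be handled along these lines. This is a hypothetical player-side heuristic, not part of any specification; representation selection in a real player would also consider bitrate, buffer level and codec capabilities.

def pick_down_switch_target(current_res, open_gop_resolutions, lowest_closed_gop_res):
    """Pick a target representation for a non-progressive down-switch.

    current_res, lowest_closed_gop_res -- (width, height)
    open_gop_resolutions               -- list of (width, height), open-GOP coded variants
    Open-GOP (RPR-based) switching is only used if neither dimension shrinks to
    less than half; otherwise fall back to the closed-GOP lowest-quality representation.
    """
    cw, ch = current_res
    usable = [(w, h) for (w, h) in open_gop_resolutions
              if w * 2 >= cw and h * 2 >= ch]
    if usable:
        return min(usable, key=lambda r: r[0] * r[1])
    return lowest_closed_gop_res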

在下文中,描述實施例,其係關於支援上文所論述之寫碼工具100、200、300、400及500中之一個、更多個或全部之解碼器及編碼器。接下來所描述之解碼器及編碼器可以遵從圖1及圖2之方式來實施。儘管上文已經主要關於解碼器側論述了工具100、200、300、400及500,但顯而易見,對應的工具之描述可轉移至編碼器側上,該差異在於該編碼器將經涉及資訊插入至資料串流中而非自該資料串流對經涉及資訊進行解碼。每一所支援的寫碼工具表示一個寫碼工具350。在區塊基礎上使用顯式語法元素控制353之任一寫碼工具涉及該編碼器在區塊基礎上編碼語法元素及該解碼器自資料串流解碼該語法元素。寫碼工具100可為僅有的使用此顯式區塊基礎語法元素之寫碼工具。其他寫碼工具200、300、400及500可使用固有的區塊基礎應用決策356連同圖像基礎或切片基礎組配信令,以用於完全地去啟動工具。In the following, embodiments are described regarding decoders and encoders supporting one, more or all of the coding tools 100, 200, 300, 400 and 500 discussed above. The decoder and encoder described next can be implemented in accordance with the manner of FIG. 1 and FIG. 2 . Although the tools 100, 200, 300, 400 and 500 have been discussed above mainly with respect to the decoder side, it is obvious that the description of the corresponding tools can be transferred to the encoder side, the difference being that the encoder inserts the involved information into The information involved is decoded in the data stream rather than from the data stream. Each supported coding tool represents a coding tool 350 . Any coding tool using explicit syntax element control 353 on a block basis involves the encoder encoding the syntax elements on a block basis and the decoder decoding the syntax elements from the data stream. Coding tool 100 may be the only coding tool that uses this explicit block-based syntax element. Other coding tools 200, 300, 400, and 500 may use native block-based application decisions 356 along with image-based or slice-based assembly signaling for fully deactivating the tool.

The subsequently explained embodiments relate to an indication or signaling which indicates to the decoder whether certain encoding constraints regarding the use of the just-mentioned set of one or more coding tools have been obeyed. The encoder signals this indication in the data stream and restricts its encoding accordingly by obeying the corresponding encoding constraints. The decoder, in turn, uses the indication in case of segment switching and interprets it as a guarantee or indication of drift restriction. According to alternative embodiments, the indication/signaling discussed below may also be used to actually deactivate one or more of the coding tools. For instance, in VVC it has so far not been possible to deactivate tool 100 on a picture or slice basis. Besides the promise function, the indication/signaling discussed below may thus also assume the function of configuration signaling in order to deactivate tool 100 for certain pictures/slices. The block-wise syntax elements for the block-wise decision 356 may then be omitted, i.e. neither coded into nor decoded from the data stream.

Subsequently, a combination is presented which uses signaling of the presented constraints to enable open GOP resolution switching, i.e. when stream switching is performed, the RASL pictures of a CRA can be decoded with acceptable drift because certain coding tools are not active in the RASL pictures. While the current state of the art allows such an indication for some of the tools that are part of the presented method (e.g. TMVP, SBTMVP, BDOF, PROF and DMVR), the presented method includes significant additional constraints which are needed to avoid severe artifacts from the sample-to-syntax prediction tools (i.e. CCLM and/or LMCS). Therefore, there is a need for an encoder able to indicate in the bitstream that such tools are not active for certain pictures (i.e. the RASL pictures of a CRA).

VVC has an extension mechanism to add bit flags to the picture header (PH) and slice header (SH) syntax in a backward-compatible manner. For this purpose, the respective SPS indicates the number of extra bits in the PH or SH syntax to be used for such purposes; these extra bits must be parsed when reading the syntax, and a derivation assigns the extra bits to flags or variable values. The following shows the respective SPS and PH syntax along with the respective semantics. The SH syntax and semantics are analogous to the PH syntax and semantics.

    seq_parameter_set_rbsp( ) {                                        Descriptor
        [...]
        sps_num_extra_ph_bytes                                         u(2)
        for( i = 0; i < ( sps_num_extra_ph_bytes * 8 ); i++ )
            sps_extra_ph_bit_present_flag[ i ]                         u(1)
        [...]

sps_extra_ph_bit_present_flag[ i ] equal to 1 specifies that the i-th extra bit is present in PH syntax structures referring to the SPS. sps_extra_ph_bit_present_flag[ i ] equal to 0 specifies that the i-th extra bit is not present in PH syntax structures referring to the SPS.

The variable NumExtraPhBits is derived as follows:

    NumExtraPhBits = 0
    for( i = 0; i < ( sps_num_extra_ph_bytes * 8 ); i++ )
        if( sps_extra_ph_bit_present_flag[ i ] )                                (1)
            NumExtraPhBits++

    picture_header_structure( ) {                                      Descriptor
        [...]
        for( i = 0; i < NumExtraPhBits; i++ )
            ph_extra_bit[ i ]                                          u(1)
        [...]

ph_extra_bit[ i ] may have any value. Decoders conforming to this version of this specification shall ignore the presence and value of ph_extra_bit[ i ]. Its value does not affect the decoding process specified in this version of this specification.
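
A minimal parsing sketch corresponding to the syntax above is given below. The bit-reader interface (reader.read_bits) is a hypothetical abstraction, and all other SPS/PH syntax elements are omitted.

def parse_sps_extra_ph_bits(reader):
    """Parse sps_num_extra_ph_bytes and sps_extra_ph_bit_present_flag[ i ]."""
    sps_num_extra_ph_bytes = reader.read_bits(2)                         # u(2)
    present = [reader.read_bits(1)
               for _ in range(sps_num_extra_ph_bytes * 8)]               # u(1) each
    num_extra_ph_bits = sum(present)                                     # derivation (1)
    return present, num_extra_ph_bits

def parse_ph_extra_bits(reader, num_extra_ph_bits):
    """Parse ph_extra_bit[ i ]; a decoder unaware of their meaning simply ignores them."""
    return [reader.read_bits(1) for _ in range(num_extra_ph_bits)]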

A decoder unaware of the meaning of the extra bits can at least parse the bitstream correctly and decode it correctly, whereas a decoder aware of the meaning of the extra bits can further interpret the extra-bit indication and act accordingly, e.g. suggest to the client that stream switching is possible without severe drift because the constraints according to the presented method are complied with. Likewise, a file format packager, an HTTP streaming server or even an RTP streaming server can take this bitstream indication into account when packaging, offering and serving the content in a manner that makes use of bitstream switching.

本發明之實施例將如下在RASL圖像或相關聯的CRA圖像之PH或SH語法的額外位元中攜載經呈現方法之指示。在SPS語義中,藉由sps_extra_ph_bit_present_flag[ i ]之指數i將特定額外位元旗標識別為指示PH/SH額外位元旗標之存在,其指示經呈現方法。舉例而言,PH中之第一額外位元存在可如下藉由第一SPS PH額外位元組之第一SPS PH額外位元(i=0)來識別。Embodiments of the present invention would carry an indication of the rendered method in extra bits of the PH or SH syntax of the RASL image or associated CRA image as follows. In SPS semantics, a specific extra bit flag is identified by the index i of sps_extra_ph_bit_present_flag[i] as indicating the presence of a PH/SH extra bit flag, which indicates the presentation method. For example, the presence of the first extra bit in the PH may be identified by the first SPS PH extra bit (i=0) of the first SPS PH extra byte.

變數ConstraintMethodFlagPresentFlag之值經設定為等於sps_extra_ph_bit_present_flag[ 0 ]。應注意,使用索引0,但可替代地使用另一索引,亦即,sps_extra_ph_bit_present_flag[ i ]當中之位元經選擇以具有以下意義:在所使用的工具方面約束RASL圖像。The value of the variable ConstraintMethodFlagPresentFlag is set equal to sps_extra_ph_bit_present_flag[ 0 ]. It should be noted that index 0 is used, but another index could be used instead, namely, the bits in sps_extra_ph_bit_present_flag[i] are chosen to have the following meaning: constrain the RASL image in terms of the tool used.

在PH語義中,如下導出指示經呈現約束方法之特點的各別變數。In PH semantics, the respective variables indicating the characteristics of the rendered constraint method are derived as follows.

變數ConstrainedRASLFlagEnabledFlag/ConstrainedCRAFlagEnabledFlag之值經設定成等於(ConstraintMethodFlagPresentFlag && ph_extra_bit[ 0 ])。應注意,使用索引0,但取決於由sps_extra_ph_bit_present_flag[ i ]指示之值以及哪一索引用於針對RASL圖像ph_extra_bit[ j ]之約束,PH中之額外旗標中之第j旗標將指示該等約束對於RASL圖像是否為適當的。The value of the variable ConstrainedRASLFlagEnabledFlag/ConstrainedCRAFlagEnabledFlag is set equal to (ConstraintMethodFlagPresentFlag && ph_extra_bit[ 0 ]). It should be noted that index 0 is used, but depending on the value indicated by sps_extra_ph_bit_present_flag[i] and which index is used for the constraint against the RASL image ph_extra_bit[j], the jth flag among the extra flags in the PH will indicate that. Whether the equality constraints are appropriate for RASL images.
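
Following the semantics above, the variable derivation can be sketched as below. The two lists are assumed to be those parsed from the SPS and PH as shown earlier, and the chosen index 0 is only an example, as noted in the text.

def derive_constraint_flags(sps_extra_ph_bit_present_flag, ph_extra_bit, index=0):
    """Derive ConstraintMethodFlagPresentFlag and
    ConstrainedRASLFlagEnabledFlag / ConstrainedCRAFlagEnabledFlag for one picture header."""
    constraint_method_flag_present = bool(sps_extra_ph_bit_present_flag[index])
    constrained_flag_enabled = constraint_method_flag_present and bool(ph_extra_bit[index])
    return constraint_method_flag_present, constrained_flag_enabled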

替代方案1 (攜載RASL圖像中之信令):當ConstrainedRASLFlagEnabledFlag等於1時,在不使用CCLM之情況下編碼當前圖像。用於BDOF、DMVR、PROF、(SB)TMVP及LMCS之PH/SH控制旗標以及序列層級約束旗標已經在VVC版本1中,而CCLM錯過了具有圖像或切片範疇之控制旗標。Alternative 1 (carrying signaling in RASL images): When ConstrainedRASLFlagEnabledFlag is equal to 1, encode the current image without using CCLM. PH/SH control flags and sequence-level constraint flags for BDOF, DMVR, PROF, (SB)TMVP and LMCS are already in VVC version 1, while CCLM misses control flags with image or slice scope.

替代方案2 (攜載相關聯的CRA圖像中之信令):當ConstrainedCRAFlagEnabledFlag等於1時,在不使用TOOLSET之情況下編碼與當前圖像相關聯之RASL圖像,其中TOOLSET是指CCLM及/或LMCS及/或BDOF及/或PROF及/或DMVR,及/或RASL圖像不使用在當前圖像(亦即CRA圖像)之前的用於(sb)TMVP之任何並置圖像。Alternative 2 (carrying signaling in the associated CRA image): When ConstrainedCRAFlagEnabledFlag is equal to 1, encode the RASL image associated with the current image without using TOOLSET, where TOOLSET refers to CCLM and/or Or LMCS and/or BDOF and/or PROF and/or DMVR, and/or RASL images do not use any collocated image used for (sb)TMVP before the current image (ie CRA image).

類似於上文之替代性實施例可經建構以用於RASL或CRA圖像之切片的SH信令。Alternative embodiments similar to those above may be constructed for SH signaling of slices of RASL or CRA images.

In another alternative embodiment, the above constraints are indicated as a property/constraint of the CVS, CLVS and/or bitstream by adding a general constraint flag with the corresponding meaning to the DCI, VPS or SPS (e.g. gci_rasl_pictures_tool_constraint_flag or gci_cra_pictures_tool_constraint_flag in the general constraints information syntax).

對應於替代方案1:gci_rasl_pictures_tool_constraint_flag等於1規定用於OlsInScope中之所有RASL圖像的ConstrainedRASLFlagEnabledFlag應等於1。gci_rasl_pictures_tool_constraint_flag等於0不施加此類約束。Corresponding to alternative 1: gci_rasl_pictures_tool_constraint_flag equal to 1 specifies that the ConstrainedRASLFlagEnabledFlag for all RASL images in OlsInScope should be equal to 1. gci_rasl_pictures_tool_constraint_flag equal to 0 does not impose such constraints.

對應於替代方案2:gci_cra_pictures_tool_constraint_flag等於1規定用於OlsInScope中之所有CRA圖像的ConstrainedCRAFlagEnabledFlag應等於1。gci_cra_pictures_tool_constraint_flag等於0不施加此類約束。Corresponding to alternative 2: gci_cra_pictures_tool_constraint_flag equal to 1 specifies that the ConstrainedCRAFlagEnabledFlag for all CRA images in OlsInScope should be equal to 1. gci_cra_pictures_tool_constraint_flag equal to 0 does not impose such constraints.

亦即,當設定此一般約束旗標時,在不使用TOOLSET之情況下編碼與CVS、CLVS及/或位元串流中之CRA相關聯的所有RASL圖像—其中TOOLSET係指CCLM及/或LMCS及/或BDOF及/或PROF及/或DMVR,及/或RASL圖像不使用在當前圖像(亦即CRA圖像)之前的用於(sb)TMVP之任何並置圖像。That is, when this general constraint flag is set, all RASL images associated with CVS, CLVS and/or CRA in the bitstream are encoded without using TOOLSET - where TOOLSET refers to CCLM and/or The LMCS and/or BDOF and/or PROF and/or DMVR, and/or RASL images do not use any collocated image used for (sb)TMVP before the current image (that is, the CRA image).

在另一替代實施例中,以上約束係在PPS擴展語法中指示。與CRA相關聯之RASL圖像可指代PPS,其指示以上約束係有效的,而位元串流之其他圖像確實指代不指示以上約束之PPS。In another alternative embodiment, the above constraints are indicated in the PPS extension syntax. The RASL image associated with the CRA may refer to a PPS that indicates that the above constraints are in effect, while other images of the bitstream do indeed refer to PPSs that do not indicate the above constraints.

在另一替代實施例中,以上約束信令係藉由CRA圖像或相關聯的RASL圖像中之SEI訊息或針對圖像之整個經寫碼層視訊序列來實行。In another alternative embodiment, the above constraint signaling is performed by SEI messages in the CRA image or associated RASL image or the entire coded layer video sequence for the image.

在另一替代實施例中,以上約束信令係用於在寫碼單元層級上有條件地發送CCLM旗標。 在另一替代實施例中,以上約束信令不適用於與CRA相關聯之所有RASL圖像,但考慮到可接受漂移及招致的寫碼效率損失,取決於實際工具,限於RASL圖像之子集: In another alternative embodiment, the above constraint signaling is used to conditionally send the CCLM flag at the code writing unit level. In another alternative embodiment, the above constraint signaling does not apply to all RASL images associated with CRA, but is limited to a subset of RASL images, depending on the actual tool, taking into account the acceptable drift and incurring loss of coding efficiency. :

- DMVR:(與以上描述本文相同):DMVR對具有按解碼次序在相關聯的CRA之前的有效參考圖像之所有RASL圖像停用。應注意,當前圖像可具有:其參考圖像清單(RPL)中之有效參考圖像(實際上用於預測),及非有效參考圖像,其不用於預測當前圖像而是後續(按解碼次序)圖像之樣本或語法,且其因此未準備好自經解碼圖像緩衝區(DPB)移除。在另一替代方案中,DMVR對所有RASL圖像停用,而不管其參考圖像,且在另一替代方案中,DMVR僅對用作用於以下圖像之並置參考圖像的RASL圖像停用,並且在又一替代方案中,DMVR對不屬於最高時間層之RASL圖像停用。可組合該等替代方案。- DMVR: (Same as described above in this article): DMVR is disabled for all RASL pictures that have a valid reference picture preceding the associated CRA in decoding order. It should be noted that the current picture can have valid reference pictures in its reference picture list (RPL) (actually used for prediction), and non-valid reference pictures, which are not used for prediction of the current picture but subsequent (as per decoding order) image samples or syntax, and it is therefore not ready to be removed from the Decoded Picture Buffer (DPB). In another alternative, DMVR is disabled for all RASL images regardless of their reference images, and in another alternative, DMVR is disabled only for RASL images used as collocated reference images for used, and in yet another alternative, DMVR is disabled for RASL images that do not belong to the highest temporal layer. These alternatives can be combined.

- BDOF及PROF:對於所有RASL圖像或僅對具有按解碼次序在相關聯的CRA之前的有效參考圖像之RASL圖像停用。- BDOF and PROF: Disabled for all RASL pictures or only for RASL pictures with a valid reference picture preceding the associated CRA in decoding order.

根據一實施例,該指示在資料串流中以SEI訊息之形式來傳信。如上文所提及,SEI訊息對於圖像序列(例如,經寫碼視訊序列,CVS)中之所有圖像可為有效的。例如,SEI訊息可在該序列中傳信。因此,該解碼器可自SEI訊息之存在或自SEI訊息中之指示推斷出該序列中之所有RASL圖像以不包括寫碼工具之預定集合之方式經寫碼。舉例而言,根據此實施例,該組寫碼工具至少包含基於跨分量線性模型之預測工具100及解碼器側運動向量改進工具400。According to one embodiment, the indication is communicated in the data stream in the form of an SEI message. As mentioned above, the SEI message may be valid for all images in the image sequence (eg, coded video sequence, CVS). For example, SEI messages may be signaled in this sequence. Therefore, the decoder can infer from the presence of the SEI message or from the indication in the SEI message that all RASL images in the sequence are coded in a manner that does not include the predetermined set of coding tools. For example, according to this embodiment, the set of coding tools includes at least a prediction tool 100 based on a cross-component linear model and a decoder-side motion vector improvement tool 400.

在下文中,描述上文所描述之發明的其他實施例。In the following, further embodiments of the invention described above are described.

D1.1.  A video decoder for decoding a video from a data stream, configured to decode from the data stream an indication [e.g., gci_rasl_pictures_tool_constraint_flag] which is valid for a sequence of pictures of the video and indicates that RASL pictures within the sequence of pictures are coded without a predetermined set of one or more coding tools [e.g., as a promise, so that the decoder knows that open GOP switching by concatenating separately coded open GOP versions of the video coded at different spatial resolutions and/or different SNRs does not lead to excessive drift in the RASL pictures].

D1.2.如任一前述實施例D1.#之視訊解碼器,其中一或多個寫碼工具之該集合包含 一基於跨分量線性模型之預測工具(100)。 D1.2. The video decoder of any preceding embodiment D1.#, wherein the set of one or more coding tools includes A prediction tool (100) based on a cross-component linear model.

D1.3.  如實施例D1.2之視訊解碼器,其中,根據該基於跨分量線性模型之預測工具,一圖像區塊(10a)之一色度分量(102)係使用一線性模型(106)自該圖像區塊(10a)之一明度分量(104)來預測,該線性模型之參數係自該圖像區塊之一已經解碼鄰域(112)中之明度及色度極值(110)來判定(108)。D1.3. The video decoder of embodiment D1.2, wherein according to the prediction tool based on a cross-component linear model, a chroma component (102) of an image block (10a) uses a linear model (106 ) is predicted from a brightness component (104) of the image block (10a), the parameters of the linear model are derived from the brightness and chroma extrema (112) of one of the decoded neighborhoods (112) of the image block 110) to determine (108).

D1.4.  如任一前述實施例D1.#之視訊解碼器,其中一或多個寫碼工具之該集合包含 一明度色調映射及色度殘餘縮放預測工具(200)。 D1.4. The video decoder of any preceding embodiment D1.#, wherein the set of one or more coding tools includes A luma tone mapping and chroma residual scaling prediction tool (200).

D1.5.  如實施例D1.4之視訊解碼器,其中,根據該明度色調映射及色度殘餘縮放預測工具, 用於一預定圖像(12)之一明度分量預測(202)[例如框間預測]及一明度分量殘餘解碼(204)係以一寫碼明度色調標度(208)來執行,一呈現明度色調標度(210)係藉由一明度色調映射(212)而經映射至該寫碼明度色調標度上,以獲得該預定圖像之一經重建明度分量之一寫碼明度色調標度版本(214), 用於該預定圖像之一圖像區塊(10b)之一色度殘餘縮放因數(216)係自該圖像區塊之一鄰域(222)內之該預定圖像的該經重建明度分量之該寫碼明度色調標度版本之一平均值(220)來判定,且 針對該圖像區塊自該資料串流解碼之一色度殘餘信號(224)係根據該色度殘餘縮放因數而經縮放(226)且用於針對該圖像區塊來校正(228)一框內色度預測信號(230)。 D1.5. The video decoder of embodiment D1.4, wherein based on the luma tone mapping and chroma residual scaling prediction tools, Luminance component prediction (202) [eg, inter-frame prediction] and a luma component residual decoding (204) for a predetermined image (12) are performed with a coded luma tone scale (208), a rendered luma The hue scale (210) is mapped onto the coded value tone scale by a value tone map (212) to obtain a coded value tone scale version of the reconstructed value component of the predetermined image ( 214), A chroma residual scaling factor (216) for an image block (10b) of the predetermined image is derived from the reconstructed luma component of the predetermined image within a neighborhood (222) of the image block It is judged by the average value (220) of the lightness and hue scale version of the code, and A chroma residual signal (224) decoded from the data stream for the image block is scaled (226) according to the chroma residual scaling factor and used to correct (228) a frame for the image block Inner chroma prediction signal (230).

D1.6.  The video decoder of any preceding embodiment D1.#, wherein the set of one or more coding tools comprises
an optical flow tool (300).

D1.7.  The video decoder of embodiment D1.6, wherein the optical flow tool serves
to refine a translational inter prediction signal of a predetermined inter-predicted block (10c) by means of an optical-flow-based analysis.
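A heavily simplified sketch of an optical-flow-based refinement as in embodiment D1.7 is given below. It estimates a single flow vector per block from the gradients of two translational prediction signals and uses it to correct their average; the normative per-sample derivation and clipping are omitted, so this is only an approximation of the idea.

```python
import numpy as np

# A heavily simplified, non-normative sketch of optical-flow-based refinement of a
# bi-directional translational prediction: one flow vector (vx, vy) is estimated per
# block from the sample gradients of the two prediction signals and used to correct
# their average.

def optical_flow_refine(pred0, pred1):
    pred0 = np.asarray(pred0, dtype=np.float64)
    pred1 = np.asarray(pred1, dtype=np.float64)

    # horizontal/vertical gradients of each prediction (central differences)
    gx0, gy0 = np.gradient(pred0, axis=1), np.gradient(pred0, axis=0)
    gx1, gy1 = np.gradient(pred1, axis=1), np.gradient(pred1, axis=0)

    gx, gy = gx0 + gx1, gy0 + gy1           # summed gradients
    diff = pred0 - pred1                     # temporal difference of the predictions

    # least-squares flow estimate for the whole block
    vx = -np.sum(diff * gx) / (np.sum(gx * gx) + 1e-6)
    vy = -np.sum(diff * gy) / (np.sum(gy * gy) + 1e-6)

    correction = 0.5 * (vx * (gx0 - gx1) + vy * (gy0 - gy1))
    return 0.5 * (pred0 + pred1) + correction

print(optical_flow_refine([[10, 12, 14], [11, 13, 15]], [[12, 14, 16], [13, 15, 17]]))
```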

D1.8.  The video decoder of any preceding embodiment D1.#, wherein the set of one or more coding tools comprises
a decoder-side motion vector refinement tool (400).

D1.9.  The video decoder of embodiment D1.8, wherein the decoder-side motion vector refinement tool serves
to refine a signaled motion vector (402), coded in the data stream, by performing a best-match search among motion vector candidates at and around the signaled motion vector, for use in inter-predicting a predetermined inter-predicted block (10d) from a reference picture (404).

D1.9a. The video decoder of embodiment D1.9, wherein the decoder-side motion vector refinement tool is configured to
perform the best-match search relative to the reference picture using an already decoded neighborhood of the inter-predicted block.

D1.9b.   The video decoder of embodiment D1.8, wherein the decoder-side motion vector refinement tool is configured to
refine a pair of signaled motion vectors (402), coded in the data stream, by performing a best-match search among motion vector pair candidates which include the pair of signaled motion vectors and motion vector pairs around the pair of signaled motion vectors, for use in inter-predicting a predetermined bi-predicted block (10d) from a pair of reference pictures (404) placed temporally in front of and behind [in presentation order] the picture containing the predetermined bi-predicted block (10d).
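Embodiments D1.9 to D1.9b can be illustrated with the following sketch of a bilateral refinement search: mirrored integer offsets around the signaled motion vector pair are tested and the offset minimizing the sum of absolute differences between the two motion-compensated predictions is selected. Sub-pel interpolation, the normative search pattern, and early termination are omitted.

```python
import numpy as np

# A minimal sketch of decoder-side refinement of a pair of signaled motion vectors
# using a simple bilateral matching criterion with mirrored integer offsets.

def sad(a, b):
    return np.abs(a.astype(np.int64) - b.astype(np.int64)).sum()

def fetch(ref, x, y, w, h):
    return ref[y:y + h, x:x + w]

def dmvr_refine(ref0, ref1, mv0, mv1, block_pos, block_size, search_range=2):
    (bx, by), (w, h) = block_pos, block_size
    best, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            p0 = fetch(ref0, bx + mv0[0] + dx, by + mv0[1] + dy, w, h)
            p1 = fetch(ref1, bx + mv1[0] - dx, by + mv1[1] - dy, w, h)  # mirrored offset
            cost = sad(p0, p1)
            if best_cost is None or cost < best_cost:
                best_cost, best = cost, (dx, dy)
    return (mv0[0] + best[0], mv0[1] + best[1]), (mv1[0] - best[0], mv1[1] - best[1])

# Usage with two toy reference pictures shifted by one sample
ref0 = np.arange(256, dtype=np.uint8).reshape(16, 16)
ref1 = np.roll(ref0, 1, axis=1)
print(dmvr_refine(ref0, ref1, (4, 4), (4, 4), (4, 4), (4, 4)))
```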

D1.10.   The video decoder of any preceding embodiment D1.#, wherein the set of one or more coding tools comprises
a temporal motion vector prediction tool (500).

D1.11. The video decoder of embodiment D1.10, wherein, according to the temporal motion vector prediction tool, motion vector candidate list formation for inter-predicted blocks includes supplementing the list with a motion vector candidate from a previously decoded picture (502).

D1.12.   The video decoder of embodiment D1.11, wherein, according to the temporal motion vector prediction tool, motion vector candidate list formation for inter-predicted blocks includes supplementing the list with a motion vector candidate from a block (506) of the previously decoded picture which is pointed to by a motion vector predictor (504).

D1.13.   The video decoder of embodiment D1.12, wherein the motion vector predictor comprises a temporal motion vector predictor.
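The candidate list supplementation of embodiments D1.11 to D1.13 can be sketched as follows: after the spatial candidates, a motion vector taken from a block of a previously decoded picture is scaled by the ratio of picture-order-count distances and appended. The scaling rule and data structures are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# A minimal sketch of temporal motion vector candidate supplementation: a candidate
# taken from a block of a previously decoded picture is scaled by the ratio of
# picture-order-count (POC) distances and appended after the spatial candidates.

@dataclass
class Block:
    mv: Tuple[int, int]          # stored motion vector of the collocated block
    ref_poc: int                 # POC of the picture that mv refers to

def scale_mv(mv, num, den):
    return (round(mv[0] * num / den), round(mv[1] * num / den)) if den else mv

def build_candidate_list(spatial: List[Tuple[int, int]],
                         col_block: Optional[Block],
                         cur_poc: int, cur_ref_poc: int, col_poc: int,
                         max_candidates: int = 5) -> List[Tuple[int, int]]:
    candidates = list(dict.fromkeys(spatial))            # de-duplicated spatial MVs
    if col_block is not None and len(candidates) < max_candidates:
        num = cur_poc - cur_ref_poc                       # current temporal distance
        den = col_poc - col_block.ref_poc                 # collocated temporal distance
        candidates.append(scale_mv(col_block.mv, num, den))
    return candidates[:max_candidates]

print(build_candidate_list([(2, 0), (2, 0), (1, -1)],
                           Block(mv=(4, 2), ref_poc=8),
                           cur_poc=12, cur_ref_poc=10, col_poc=16))
```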

D1.14.   The video decoder of any preceding embodiment D1.#, wherein the indication is included in one of:
a decoder capability information section of the data stream,
a video or sequence parameter set of the data stream, and
a supplemental enhancement information message.

D1.15.   The video decoder of any preceding embodiment D1.#, wherein the indication comprises one bit which jointly indicates that, with respect to the coding of the RASL pictures within the picture sequence, all coding tools of the predetermined set of one or more coding tools are excluded.

D1.16.   The video decoder of any preceding embodiment D1.#, wherein the decoder is configured to support reference picture resampling.

D1.17.   The video decoder of embodiment D1.16, wherein, according to the reference picture resampling, a reference picture of an inter-predicted block is subjected to sample resampling in order to bridge a scaling window size deviation or sample resolution deviation between the reference picture and the picture containing the inter-predicted block, so as to provide an inter prediction signal for the inter-predicted block.
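Reference picture resampling as in embodiment D1.17 can be illustrated by the following sketch, which fetches a reference block at scaled coordinates when the reference and current pictures differ in resolution. Nearest-neighbor sampling is assumed for brevity; the normative interpolation filters are not reproduced.

```python
import numpy as np

# A minimal sketch of reference picture resampling: when the reference picture and
# the current picture differ in sample resolution (or scaling window size), the
# reference block is fetched at scaled coordinates so that an inter prediction
# signal of the current block size results. Nearest-neighbor sampling only.

def rpr_fetch_block(ref, ref_size, cur_size, cur_x, cur_y, w, h):
    ref_h, ref_w = ref.shape
    scale_x = ref_size[0] / cur_size[0]
    scale_y = ref_size[1] / cur_size[1]
    ys = np.clip((np.arange(cur_y, cur_y + h) * scale_y).astype(int), 0, ref_h - 1)
    xs = np.clip((np.arange(cur_x, cur_x + w) * scale_x).astype(int), 0, ref_w - 1)
    return ref[np.ix_(ys, xs)]

ref = np.arange(64, dtype=np.uint8).reshape(8, 8)     # 8x8 reference picture
print(rpr_fetch_block(ref, ref_size=(8, 8), cur_size=(16, 16),
                      cur_x=4, cur_y=4, w=4, h=4))    # block in a 16x16 current picture
```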

D1.18.   The video decoder of any of the preceding embodiments D1.#, wherein the set of one or more coding tools comprises [e.g., 200, 300, 400, 500]
one or more first inherently applied coding tools, each of which is applied for a predetermined block depending on one or more coding options signaled in the data stream for the predetermined block and relating to a coding tool other than the respective coding tool, and/or
one or more second inherently applied coding tools, each of which is applied for a predetermined block depending on a size of the predetermined block.

D1.19.   The video decoder of any of the preceding embodiments D1.#, wherein the set of one or more coding tools comprises [e.g., 100, 500]
one or more explicitly applied coding tools, each of which is applied to a predetermined block depending on a syntax element coded into the data stream for dedicatedly signaling the application of the respective coding tool to the predetermined block.

D1.20.   The video decoder of embodiment D1.19, wherein the decoder is configured to decode the syntax element from the data stream for blocks within the RASL pictures as well as for blocks of pictures other than the RASL pictures.

D1.21.   The video decoder of embodiment D1.19, wherein the decoder is configured to decode the syntax element from the data stream only for blocks within pictures other than RASL pictures [e.g., thereby saving bits in the RASL pictures].

D1.22.   The video decoder of any of the preceding embodiments D1.#, wherein the decoder is configured to support an intra-prediction block decoding mode and an inter-prediction block decoding mode.

D1.23.   The video decoder of any of the preceding embodiments D1.#, wherein the picture sequence
starts at, and includes, one CRA picture and comprises the pictures up to, in coding order, and ending at a picture immediately preceding a CRA picture, or
comprises pictures which are consecutive in coding order and which include more than one CRA.

D1.24.   The video decoder of any of the preceding embodiments D1.#, wherein the set of one or more coding tools comprises
one or more de-activatable coding tools, each of which can be de-activated, as far as its application to picture blocks is concerned, in units of pictures or slices by configuration signaling within the data stream.

D1.25.   The video decoder of any of the preceding embodiments D1.#, configured to use the indication in order to check whether open GOP switching leads to persistent drift.

D1.25a.  The video decoder of any of the preceding embodiments D1.#, configured to use the indication in order to check whether open GOP switching leads to sample mismatches while syntax and parameter settings are kept.

D1.26.   The video decoder of any of the preceding embodiments D1.#, wherein
the indication indicates that all RASL pictures within the picture sequence are coded in the manner not using the predetermined set of one or more coding tools, and/or
the indication indicates that all RASL pictures within the picture sequence which have reference pictures preceding, in decoding order, the CRA picture associated with them are coded in the manner not using the predetermined set of one or more coding tools, and/or
the indication indicates that all RASL pictures within the picture sequence which serve as temporal motion vector prediction reference pictures for following pictures are coded in the manner not using the predetermined set of one or more coding tools, and/or
the indication indicates that all RASL pictures within the picture sequence which do not belong to a highest temporal layer are coded in the manner not using the predetermined set of one or more coding tools.

D1.27.   The video decoder of any of the preceding embodiments D1.#, wherein
the indication indicates that the RASL pictures within the picture sequence are coded in the manner not using the predetermined set of one or more coding tools in such a way that
for a first subset of the one or more coding tools of the predetermined set, all RASL pictures to which a first characteristic applies are coded in a manner not using the first subset, and
for a second subset of the one or more coding tools of the predetermined set, all RASL pictures to which a second characteristic applies, or all RASL pictures, are coded in a manner not using the second subset, the first subset and the second subset being mutually disjoint.

D1.28.   The video decoder of embodiment D1.27, wherein the first characteristic and/or the second characteristic is selected from:
having reference pictures which precede, in decoding order, the CRA picture associated with the respective RASL picture,
serving as a temporal motion vector prediction reference picture for following pictures, and
not belonging to a highest temporal layer.

D1.29.   The video decoder of embodiment D1.27 or higher, wherein
the first subset includes one or more of a decoder-side motion vector refinement tool and a temporal motion vector prediction tool.

D1.30.   The video decoder of embodiment D1.29, wherein the first characteristic is selected from:
having reference pictures which precede, in decoding order, the CRA picture associated with the respective RASL picture,
serving as a temporal motion vector prediction reference picture for following pictures, and
not belonging to a highest temporal layer.

D2.1.  A video decoder for decoding a video from a data stream, configured to
decode from the data stream an indication [e.g., using sps_extra_ph_bit_present_flag and ph_extra_bit, or using gci_rasl_pictures_tool_constraint_flag] which indicates, per picture of a picture sequence of the video, globally for the respective picture or on a per-slice basis, whether the respective picture is coded in a manner not using a predetermined set of one or more coding tools, the predetermined set comprising a cross-component linear model based prediction tool [e.g., as a picture-wise indication which makes it possible to check that the potential drift at RASL pictures is sufficiently low].
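A possible, purely illustrative way of carrying such a per-picture indication in extra picture-header bits is sketched below: the number of extra bits is announced at sequence level, and one of them is read as the tool-constraint indication. The assignment of the bit position and its meaning are assumptions of this sketch, not defined syntax.

```python
# A minimal sketch of reading a per-picture indication from extra picture-header
# bits: the sequence level announces how many extra bits each picture header
# carries, and one of those bits is interpreted here as "this picture (or its
# associated RASL pictures) avoids the drift-sensitive tool set".

def parse_extra_ph_bits(bitreader, num_extra_ph_bits: int, constraint_bit_idx: int = 0):
    extra_bits = [bitreader.read_bit() for _ in range(num_extra_ph_bits)]
    tool_constrained = bool(extra_bits[constraint_bit_idx]) if extra_bits else False
    return tool_constrained, extra_bits

class BitReader:
    """Toy MSB-first bit reader over a bytes object."""
    def __init__(self, data: bytes):
        self.data, self.pos = data, 0
    def read_bit(self) -> int:
        byte, off = divmod(self.pos, 8)
        self.pos += 1
        return (self.data[byte] >> (7 - off)) & 1

reader = BitReader(b"\x80")                                # first extra bit set
print(parse_extra_ph_bits(reader, num_extra_ph_bits=2))    # -> (True, [1, 0])
```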

D2.2.  The video decoder of any of the preceding embodiments D2.#, wherein the decoder is configured to support an intra-prediction block decoding mode and an inter-prediction block decoding mode.

D2.3.  The video decoder of any preceding embodiment D2.#, wherein, according to the cross-component linear model based prediction tool,
a chroma component of a picture block is predicted from a luma component of the picture block using a linear model, the parameters of which are determined from luma and chroma extreme values in an already decoded neighborhood of the picture block.

D2.4.  The video decoder of any preceding embodiment D2.#, wherein the set of one or more coding tools further comprises
a luma tone mapping and chroma residual scaling prediction tool.

D2.5.  The video decoder of embodiment D2.4, wherein, according to the luma tone mapping and chroma residual scaling prediction tool,
luma component prediction and luma component residual decoding for a predetermined picture are performed in a coded luma tone scale, a presentation luma tone scale being mapped onto the coded luma tone scale by a luma tone mapping so as to obtain a coded-luma-tone-scale version of a reconstructed luma component of the predetermined picture,
a chroma residual scaling factor for a picture block of the predetermined picture is determined from a mean value of the coded-luma-tone-scale version of the reconstructed luma component of the predetermined picture within a neighborhood of the picture block, and
a chroma residual signal decoded from the data stream for the picture block is scaled according to the chroma residual scaling factor and used to correct an intra chroma prediction signal for the picture block.

D2.6.  The video decoder of any preceding embodiment D2.#, wherein the set of one or more coding tools further comprises
an optical flow tool.

D2.7.  The video decoder of embodiment D2.6, wherein the optical flow tool serves
to refine a translational inter prediction signal of a predetermined inter-predicted block by means of an optical-flow-based analysis.

D2.8.  The video decoder of any preceding embodiment D2.#, wherein the set of one or more coding tools further comprises
a decoder-side motion vector refinement tool.

D2.9.  The video decoder of embodiment D2.8, wherein the decoder-side motion vector refinement tool serves
to refine a signaled motion vector (402), coded in the data stream, by performing a best-match search among motion vector candidates at and around the signaled motion vector, for use in inter-predicting a predetermined inter-predicted block (10d) from a reference picture (404).

D2.9a. The video decoder of embodiment D2.9, wherein the decoder-side motion vector refinement tool is configured to
perform the best-match search relative to the reference picture using an already decoded neighborhood of the inter-predicted block.

D2.9b.   The video decoder of embodiment D2.8, wherein the decoder-side motion vector refinement tool is configured to
refine a pair of signaled motion vectors (402), coded in the data stream, by performing a best-match search among motion vector pair candidates which include the pair of signaled motion vectors and motion vector pairs around the pair of signaled motion vectors, for use in inter-predicting a predetermined bi-predicted block (10d) from a pair of reference pictures (404) placed temporally in front of and behind the picture containing the predetermined bi-predicted block (10d).

D2.10.   The video decoder of any preceding embodiment D2.#, wherein the set of one or more coding tools further comprises
a temporal motion vector prediction tool.

D2.11. The video decoder of embodiment D2.10, wherein, according to the temporal motion vector prediction tool, motion vector candidate list formation for inter-predicted blocks includes supplementing the list with a motion vector candidate from a previously decoded picture.

D2.12.   The video decoder of embodiment D2.11, wherein, according to the temporal motion vector prediction tool, motion vector candidate list formation for inter-predicted blocks includes supplementing the list with a motion vector candidate from a block of the previously decoded picture which is pointed to by a motion vector predictor.

D2.13.   The video decoder of embodiment D2.12, wherein the motion vector predictor comprises a temporal motion vector predictor.

D2.14.   The video decoder of any preceding embodiment D2.#, wherein the indication is included in one of:
one or more picture parameter sets referenced by the pictures of the picture sequence,
a picture header of the pictures of the picture sequence, and
a slice header of slices of the pictures of the picture sequence.

D2.15.   The video decoder of any preceding embodiment D2.#, wherein the indication is included in
picture parameter sets referenced by the pictures of the picture sequence, wherein the picture parameter sets comprise: at least one first picture parameter set indicating that pictures referencing the at least one first picture parameter set are coded in a manner not using the predetermined set of one or more coding tools; and at least one second picture parameter set indicating that pictures referencing the at least one second picture parameter set are coded in a manner possibly using the predetermined set of one or more coding tools, or
picture parameter sets referenced by the pictures of the picture sequence, wherein the picture parameter sets comprise: at least one first picture parameter set indicating that RASL pictures associated with pictures referencing the at least one first picture parameter set are coded in a manner not using the predetermined set of one or more coding tools; and at least one second picture parameter set indicating that RASL pictures associated with pictures referencing the at least one second picture parameter set are coded in a manner possibly using the predetermined set of one or more coding tools.

D2.16.   The video decoder of embodiment D2.15, wherein the indication comprises a syntax element within an extension syntax portion of the picture parameter sets [e.g., using sps_extra_ph_bit_present_flag and ph_extra_bit].

D2.17.   The video decoder of embodiment D2.16, wherein a length of the extension syntax portion of the picture parameter sets is specified in a sequence or video parameter set of the data stream.

D2.18.   The video decoder of any preceding embodiment D2.#, wherein the indication is a syntax element in an extension portion of:
a picture header of the pictures of the picture sequence, and/or
a slice header of slices of the pictures of the picture sequence,
wherein a length of the extension portion [e.g., NumExtraPhBits] is indicated in a picture, sequence or video parameter set of the data stream.

D2.18a.  The video decoder of embodiment D2.18, wherein the syntax element indicates
whether a picture to which the syntax element belongs [e.g., to which the picture header or slice header relates] is coded in a manner not using the predetermined set of one or more coding tools, or
whether RASL pictures associated with the picture to which the syntax element belongs are coded in a manner not using the predetermined set of one or more coding tools.

D2.19.   The video decoder of any preceding embodiment D2.#, wherein the decoder is configured to support reference picture resampling.

D2.22.   The video decoder of embodiment D2.19, wherein, according to the reference picture resampling, a reference picture of an inter-predicted block is subjected to sample resampling in order to bridge a scaling window size deviation or sample resolution deviation between the reference picture and the picture containing the inter-predicted block, so as to provide an inter prediction signal for the inter-predicted block.

D2.23.   The video decoder of any of the preceding embodiments D2.#, wherein the set of one or more coding tools comprises [e.g., 200, 300, 400, 500]
one or more first inherently applied coding tools, each of which is applied for a predetermined block depending on one or more coding options signaled in the data stream for the predetermined block and relating to a coding tool other than the respective coding tool, and/or
one or more second inherently applied coding tools, each of which is applied for a predetermined block depending on a size of the predetermined block.

D2.24.   The video decoder of any of the preceding embodiments D2.#, wherein the set of one or more coding tools comprises [e.g., 100, 500]
one or more explicitly applied coding tools, each of which is applied to a predetermined block depending on a syntax element coded into the data stream for dedicatedly signaling the application of the respective coding tool to the predetermined block.

D2.25.   The video decoder of embodiment D2.24, wherein the decoder is configured to decode the syntax element from the data stream for blocks within pictures or slices for which the predetermined set of one or more coding tools is signaled as excluded from the coding, as well as for blocks within pictures or slices for which the predetermined set of one or more coding tools is not signaled as excluded from the coding.

D2.26.   The video decoder of embodiment D2.24, wherein the decoder is configured to decode the syntax element from the data stream only for blocks within pictures or slices for which the predetermined set of one or more coding tools is signaled as excluded from the coding.

D2.27.   The video decoder of any of the preceding embodiments D2.24 or higher, wherein the cross-component linear model based prediction tool belongs to the one or more explicitly applied coding tools.

D2.28.   The video decoder of any of the preceding embodiments D2.16 or higher, wherein the syntax element is one bit which jointly indicates that all coding tools of the predetermined set of one or more coding tools are excluded.

D2.29.   The video decoder of any of the preceding embodiments D2.#, wherein the set of one or more coding tools comprises
one or more de-activatable coding tools, each of which can be de-activated, as far as its application to picture blocks is concerned, in units of pictures or slices by configuration signaling within the data stream.

D2.30.   The video decoder of any of the preceding embodiments D2.#, configured to use the indication in order to check whether open GOP switching leads to persistent drift.

D2.30a.  The video decoder of any of the preceding embodiments D2.#, configured to use the indication in order to check whether open GOP switching leads to sample mismatches while syntax and parameter settings are kept.

E1.1.  A video encoder for encoding a video into a data stream, configured to
encode into the data stream an indication [e.g., gci_rasl_pictures_tool_constraint_flag] which is valid for a picture sequence of the video and indicates that RASL pictures within the picture sequence are coded in a manner not using a predetermined set of one or more coding tools [e.g., as a promise by the encoder that open GOP switching by concatenating separately coded open-GOP versions of the video coded at different spatial resolutions and/or different SNRs does not lead to excessive drift in the RASL pictures].
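On the encoder side, honoring the signaled promise can be as simple as removing the drift-sensitive tools from the tool budget of every RASL picture, as in the following sketch. The tool names and the flag are placeholders for whatever set the encoder actually constrains.

```python
# A minimal sketch (with hypothetical tool names) of how an encoder could enforce
# the constraint it signals: when the sequence-level indication is set, the
# drift-sensitive tools are disabled for every RASL picture, so the written
# bitstream honors the promise expressed by the flag.

DRIFT_SENSITIVE_TOOLS = {"cclm", "lmcs_crs", "bdof", "dmvr", "tmvp"}

def tools_for_picture(pic_type: str, enabled_tools: set, rasl_constraint: bool) -> set:
    """Return the coding tools the encoder may use for one picture."""
    if rasl_constraint and pic_type == "RASL":
        return enabled_tools - DRIFT_SENSITIVE_TOOLS
    return enabled_tools

enabled = {"cclm", "lmcs_crs", "bdof", "dmvr", "tmvp", "deblocking", "sao"}
print(sorted(tools_for_picture("RASL", enabled, rasl_constraint=True)))
print(sorted(tools_for_picture("TRAIL", enabled, rasl_constraint=True)))
```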

E1.2.  The video encoder of any preceding embodiment E1.#, wherein the set of one or more coding tools comprises
a cross-component linear model based prediction tool (100).

E1.3.  The video encoder of embodiment E1.2, wherein, according to the cross-component linear model based prediction tool,
a chroma component (102) of a picture block (10a) is predicted from a luma component (104) of the picture block (10a) using a linear model (106), the parameters of which are determined (108) from luma and chroma extreme values (110) in an already decoded neighborhood (112) of the picture block.

E1.4.  The video encoder of any preceding embodiment E1.#, wherein the set of one or more coding tools comprises
a luma tone mapping and chroma residual scaling prediction tool (200).

E1.5.  The video encoder of embodiment E1.4, wherein, according to the luma tone mapping and chroma residual scaling prediction tool,
luma component prediction (202) [e.g., inter prediction] and luma component residual encoding (204) for a predetermined picture (12) are performed in a coded luma tone scale (208), a presentation luma tone scale (210) being mapped onto the coded luma tone scale by a luma tone mapping (212) so as to obtain a coded-luma-tone-scale version (214) of a reconstructed luma component of the predetermined picture,
a chroma residual scaling factor (216) for a picture block (10b) of the predetermined picture is determined from a mean value (220) of the coded-luma-tone-scale version of the reconstructed luma component of the predetermined picture within a neighborhood (222) of the picture block, and
a chroma residual signal (224) coded in the data stream for the picture block is scaled (226) according to the chroma residual scaling factor and used to correct (228) an intra chroma prediction signal (230) for the picture block.

E1.6.  The video encoder of any preceding embodiment E1.#, wherein the set of one or more coding tools comprises
an optical flow tool (300).

E1.7.  The video encoder of embodiment E1.6, wherein the optical flow tool serves
to refine a translational inter prediction signal of a predetermined inter-predicted block (10c) by means of an optical-flow-based analysis.

E1.8.  The video encoder of any preceding embodiment E1.#, wherein the set of one or more coding tools comprises
a decoder-side motion vector refinement tool (400).

E1.9.  The video encoder of embodiment E1.8, wherein the decoder-side motion vector refinement tool serves
to refine a signaled motion vector (402), coded in the data stream, by performing a best-match search among motion vector candidates at and around the signaled motion vector, for use in inter-predicting a predetermined inter-predicted block (10d) from a reference picture (404).

E1.9a. The video encoder of embodiment E1.9, wherein the decoder-side motion vector refinement tool is configured to
perform the best-match search relative to the reference picture using an already decoded neighborhood of the inter-predicted block.

E1.9b. The video encoder of embodiment E1.8, wherein the decoder-side motion vector refinement tool is configured to
refine a pair of signaled motion vectors (402), coded in the data stream, by performing a best-match search among motion vector pair candidates which include the pair of signaled motion vectors and motion vector pairs around the pair of signaled motion vectors, for use in inter-predicting a predetermined bi-predicted block (10d) from a pair of reference pictures (404) placed temporally in front of and behind the picture containing the predetermined bi-predicted block (10d).

E1.10. The video encoder of any preceding embodiment E1.#, wherein the set of one or more coding tools comprises
a temporal motion vector prediction tool (500).

E1.11. The video encoder of embodiment E1.10, wherein, according to the temporal motion vector prediction tool, motion vector candidate list formation for inter-predicted blocks includes supplementing the list with a motion vector candidate from a previously decoded picture (502).

E1.12. The video encoder of embodiment E1.11, wherein, according to the temporal motion vector prediction tool, motion vector candidate list formation for inter-predicted blocks includes supplementing the list with a motion vector candidate from a block (506) of the previously encoded picture which is pointed to by a motion vector predictor (504).

E1.13. The video encoder of embodiment E1.12, wherein the motion vector predictor comprises a temporal motion vector predictor.

E1.14. The video encoder of any preceding embodiment E1.#, wherein the indication is included in one of:
a decoder capability information section of the data stream,
a video or sequence parameter set of the data stream, and
a supplemental enhancement information message.

E1.15. The video encoder of any preceding embodiment E1.#, wherein the indication comprises one bit which jointly indicates that, with respect to the coding of the RASL pictures within the picture sequence, all coding tools of the predetermined set of one or more coding tools are excluded.

E1.16. The video encoder of any preceding embodiment E1.#, wherein the encoder is configured to support reference picture resampling.

E1.17. The video encoder of embodiment E1.16, wherein, according to the reference picture resampling, a reference picture of an inter-predicted block is subjected to sample resampling in order to bridge a scaling window size deviation or sample resolution deviation between the reference picture and the picture containing the inter-predicted block, so as to provide an inter prediction signal for the inter-predicted block.

E1.18. The video encoder of any of the preceding embodiments E1.#, wherein the set of one or more coding tools comprises [e.g., 200, 300, 400, 500]
one or more first inherently applied coding tools, each of which is applied for a predetermined block depending on one or more coding options signaled in the data stream for the predetermined block and relating to a coding tool other than the respective coding tool, and/or
one or more second inherently applied coding tools, each of which is applied for a predetermined block depending on a size of the predetermined block.

E1.19. The video encoder of any of the preceding embodiments E1.#, wherein the set of one or more coding tools comprises [e.g., 100]
one or more explicitly applied coding tools, each of which is applied to a predetermined block depending on a syntax element coded into the data stream for dedicatedly signaling the application of the respective coding tool to the predetermined block.

E1.20. The video encoder of embodiment E1.19, wherein the encoder is configured to encode the syntax element into the data stream for blocks within the RASL pictures as well as for blocks of pictures other than the RASL pictures.

E1.21. The video encoder of embodiment E1.19, wherein the encoder is configured to encode the syntax element into the data stream only for blocks within pictures other than RASL pictures [e.g., thereby saving bits in the RASL pictures].

E1.22. The video encoder of any of the preceding embodiments E1.#, wherein the encoder is configured to support an intra-prediction block encoding mode and an inter-prediction block encoding mode.

E1.23. The video encoder of any of the preceding embodiments E1.#, wherein the picture sequence
starts at, and includes, one CRA picture and comprises the pictures up to, in coding order, and ending at a picture immediately preceding a CRA picture, or
comprises pictures which are consecutive in coding order and which include more than one CRA.

E1.24. The video encoder of any of the preceding embodiments E1.#, wherein the set of one or more coding tools comprises
one or more de-activatable coding tools, each of which can be de-activated, as far as its application to picture blocks is concerned, in units of pictures or slices by configuration signaling within the data stream.

E1.25. The video encoder of any of the preceding embodiments E1.#, configured to obey the indication as an encoding constraint when encoding the video into the data stream.

E1.26. The video encoder of any of the preceding embodiments E1.#, wherein
the indication indicates that all RASL pictures within the picture sequence are coded in the manner not using the predetermined set of one or more coding tools, and/or
the indication indicates that all RASL pictures within the picture sequence which have reference pictures preceding, in decoding order, the CRA picture associated with them are coded in the manner not using the predetermined set of one or more coding tools, and/or
the indication indicates that all RASL pictures within the picture sequence which serve as temporal motion vector prediction reference pictures for following pictures are coded in the manner not using the predetermined set of one or more coding tools, and/or
the indication indicates that all RASL pictures within the picture sequence which do not belong to a highest temporal layer are coded in the manner not using the predetermined set of one or more coding tools.

E1.27. The video encoder of any of the preceding embodiments E1.#, wherein
the indication indicates that the RASL pictures within the picture sequence are coded in the manner not using the predetermined set of one or more coding tools in such a way that
for a first subset of the one or more coding tools of the predetermined set, all RASL pictures to which a first characteristic applies are coded in a manner not using the first subset, and
for a second subset of the one or more coding tools of the predetermined set, all RASL pictures to which a second characteristic applies, or all RASL pictures, are coded in a manner not using the second subset, the first subset and the second subset being mutually disjoint.

E1.28. The video encoder of embodiment E1.27, wherein the first characteristic and/or the second characteristic is selected from:
having reference pictures which precede, in decoding order, the CRA picture associated with the respective RASL picture,
serving as a temporal motion vector prediction reference picture for following pictures, and
not belonging to a highest temporal layer.

E1.29. The video encoder of embodiment E1.27 or higher, wherein
the first subset includes one or more of a decoder-side motion vector refinement tool and a temporal motion vector prediction tool.

E1.30. The video encoder of embodiment E1.29, wherein the first characteristic is selected from:
having reference pictures which precede, in decoding order, the CRA picture associated with the respective RASL picture,
serving as a temporal motion vector prediction reference picture for following pictures, and
not belonging to a highest temporal layer.

E2.1.  A video encoder for encoding a video into a data stream, configured to
encode into the data stream an indication [e.g., using sps_extra_ph_bit_present_flag and ph_extra_bit, or using gci_rasl_pictures_tool_constraint_flag] which indicates, per picture of a picture sequence of the video, globally for the respective picture or on a per-slice basis, whether the respective picture is coded in a manner not using a predetermined set of one or more coding tools, the predetermined set comprising a cross-component linear model based prediction tool [e.g., as a picture-wise indication which makes it possible to check that the potential drift at RASL pictures is sufficiently low].

E2.2.  The video encoder of any of the preceding embodiments E2.#, wherein the encoder is configured to support an intra-prediction block encoding mode and an inter-prediction block encoding mode.

E2.3.  The video encoder of any preceding embodiment E2.#, wherein, according to the cross-component linear model based prediction tool,
a chroma component of a picture block is predicted from a luma component of the picture block using a linear model, the parameters of which are determined from luma and chroma extreme values in an already encoded neighborhood of the picture block.

E2.4.  The video encoder of any preceding embodiment E2.#, wherein the set of one or more coding tools further comprises
a luma tone mapping and chroma residual scaling prediction tool.

E2.5.  The video encoder of embodiment E2.4, wherein, according to the luma tone mapping and chroma residual scaling prediction tool,
luma component prediction and luma component residual encoding for a predetermined picture are performed in a coded luma tone scale, a presentation luma tone scale being mapped onto the coded luma tone scale by a luma tone mapping so as to obtain a coded-luma-tone-scale version of a reconstructed luma component of the predetermined picture,
a chroma residual scaling factor for a picture block of the predetermined picture is determined from a mean value of the coded-luma-tone-scale version of the reconstructed luma component of the predetermined picture within a neighborhood of the picture block, and
a chroma residual signal coded in the data stream for the picture block is scaled according to the chroma residual scaling factor and used to correct an intra chroma prediction signal for the picture block.

E2.6.  The video encoder of any preceding embodiment E2.#, wherein the set of one or more coding tools further comprises
an optical flow tool.

E2.7.  The video encoder of embodiment E2.6, wherein the optical flow tool serves
to refine a translational inter prediction signal of a predetermined inter-predicted block by means of an optical-flow-based analysis.

E2.8.  The video encoder of any preceding embodiment E2.#, wherein the set of one or more coding tools further comprises
a decoder-side motion vector refinement tool.

E2.9.  The video encoder of embodiment E2.8, wherein the decoder-side motion vector refinement tool serves
to refine a signaled motion vector (402), coded in the data stream, by performing a best-match search among motion vector candidates at and around the signaled motion vector, for use in inter-predicting a predetermined inter-predicted block (10d) from a reference picture (404).

E2.9a. The video encoder of embodiment E2.9, wherein the decoder-side motion vector refinement tool is configured to
perform the best-match search relative to the reference picture using an already decoded neighborhood of the inter-predicted block.

E2.9b. The video encoder of embodiment E2.8, wherein the decoder-side motion vector refinement tool is configured to
refine a pair of signaled motion vectors (402), coded in the data stream, by performing a best-match search among motion vector pair candidates which include the pair of signaled motion vectors and motion vector pairs around the pair of signaled motion vectors, for use in inter-predicting a predetermined bi-predicted block (10d) from a pair of reference pictures (404) placed temporally in front of and behind the picture containing the predetermined bi-predicted block (10d).

E2.10. The video encoder of any preceding embodiment E2.#, wherein the set of one or more coding tools further comprises
a temporal motion vector prediction tool.

E2.11. The video encoder of embodiment E2.10, wherein, according to the temporal motion vector prediction tool, motion vector candidate list formation for inter-predicted blocks includes supplementing the list with a motion vector candidate from a previously encoded picture.

E2.12. The video encoder of embodiment E2.11, wherein, according to the temporal motion vector prediction tool, motion vector candidate list formation for inter-predicted blocks includes supplementing the list with a motion vector candidate from a block of the previously encoded picture which is pointed to by a motion vector predictor.

E2.13. The video encoder of embodiment E2.12, wherein the motion vector predictor comprises a temporal motion vector predictor.

E2.14. The video encoder of any preceding embodiment E2.#, wherein the indication is included in one of:
one or more picture parameter sets referenced by the pictures of the picture sequence,
a picture header of the pictures of the picture sequence, and
a slice header of slices of the pictures of the picture sequence.

E2.15. The video encoder of any preceding embodiment E2.#, wherein the indication is included in
picture parameter sets referenced by the pictures of the picture sequence, wherein the picture parameter sets comprise: at least one first picture parameter set indicating that pictures referencing the at least one first picture parameter set are coded in a manner not using the predetermined set of one or more coding tools; and at least one second picture parameter set indicating that pictures referencing the at least one second picture parameter set are coded in a manner possibly using the predetermined set of one or more coding tools, or
picture parameter sets referenced by the pictures of the picture sequence, wherein the picture parameter sets comprise: at least one first picture parameter set indicating that RASL pictures associated with pictures referencing the at least one first picture parameter set are coded in a manner not using the predetermined set of one or more coding tools; and at least one second picture parameter set indicating that RASL pictures associated with pictures referencing the at least one second picture parameter set are coded in a manner possibly using the predetermined set of one or more coding tools.

E2.16. The video encoder of embodiment E2.15, wherein the indication comprises a syntax element within an extension syntax portion of the picture parameter sets [e.g., using sps_extra_ph_bit_present_flag and ph_extra_bit].

E2.17. The video encoder of embodiment E2.16, wherein a length of the extension syntax portion of the picture parameter sets is specified in a sequence or video parameter set of the data stream.

E2.18. The video encoder of any preceding embodiment E2.#, wherein the indication is a syntax element in an extension portion of:
a picture header of the pictures of the picture sequence, and/or
a slice header of slices of the pictures of the picture sequence,
wherein a length of the extension portion [e.g., NumExtraPhBits] is indicated in a picture, sequence or video parameter set of the data stream.

E2.18a.  The video encoder of embodiment E2.18, wherein the syntax element indicates
whether a picture to which the syntax element belongs [e.g., to which the picture header or slice header relates] is coded in a manner not using the predetermined set of one or more coding tools, or
whether RASL pictures associated with the picture to which the syntax element belongs are coded in a manner not using the predetermined set of one or more coding tools.

E2.19. 如任一前述實施例E2.#之視訊編碼器,其中該編碼器經組配以支援參考圖像重新取樣。E2.19. The video encoder of any preceding embodiment E2.#, wherein the encoder is configured to support reference picture resampling.

E2.22. 如實施例E2.19之視訊編碼器,其中,根據該參考圖像重新取樣,一經框間預測區塊之一參考圖像係經受樣本重新取樣,以便橋接該參考圖像與其中含有該經框間預測區塊之一圖像之間的一縮放窗大小偏差或樣本解析度偏差,以為該經框間預測區塊提供一框間預測信號。E2.22. The video encoder of embodiment E2.19, wherein based on the reference picture resampling, one of the reference pictures of an inter-frame prediction block is subjected to sample resampling in order to bridge the reference picture and therein A scaling window size deviation or sample resolution deviation between images containing the inter-predicted block provides an inter-prediction signal for the inter-predicted block.

E2.23.如前述實施例E2.#中任一項之視訊編碼器,其中一或多個寫碼工具之該集合包含[例如200、300、400、500] 一或多個第一固有應用的寫碼工具,其中之每一者係針對一預定區塊取決於在該資料串流中針對該預定區塊傳信之一或多個寫碼選項而應用,且與除各別另一寫碼工具以外之一另一寫碼工具相關,及/或 一或多個第二固有應用的寫碼工具,其中之每一者係針對一預定區塊取決於該預定區塊之一大小而應用。 E2.23. The video encoder as in any one of the preceding embodiments E2.#, wherein the set of one or more coding tools includes [for example, 200, 300, 400, 500] one or more first application-specific coding tools, each of which is applied for a predetermined block depending on one or more coding options communicated in the data stream for the predetermined block, and Relevant to a coding tool other than the respective coding tool, and/or One or more second application-specific coding tools, each of which is applied to a predetermined block depending on a size of the predetermined block.

E2.24. 如前述實施例E2.#中任一項之視訊編碼器,其中一或多個寫碼工具之該集合包含 一或多個明確應用的寫碼工具,其中之每一者係針對一預定區塊取決於經寫碼成該資料串流之一語法元素而應用於該預定區塊,以用於專門地傳信各別寫碼工具對該預定區塊之應用。 E2.24. The video encoder as in any one of the preceding embodiments E2.#, wherein the set of one or more coding tools includes One or more explicitly applied coding tools, each of which is applied to a predetermined block in dependence on a syntax element coded into the data stream for specifically transmitting The application of each coding tool to the predetermined block.

E2.25.如實施例E2.24之視訊編碼器,其中該編碼器經組配以針對以下各者內之區塊將該語法元素編碼成該資料串流:一或多個寫碼工具之該預定集合經傳信以自編碼排除之圖像或切片;及一或多個寫碼工具之該預定集合未經傳信以自編碼排除之圖像或切片。E2.25. The video encoder of embodiment E2.24, wherein the encoder is configured to encode the syntax element into the data stream for blocks within: one or more encoding tools The predetermined set of images or slices that are signaled to exclude from self-encoding; and the predetermined set of one or more coding tools that are not signaled to exclude from self-encoding.

E2.26. The video encoder of embodiment E2.24, wherein the encoder is configured to encode the syntax element into the data stream only for blocks within pictures or slices for which the predetermined set of one or more coding tools is signalled to be excluded from the encoding.

E2.27. The video encoder of any one of the preceding embodiments E2.24 or higher, wherein the cross-component linear-model-based prediction tool is one of the one or more explicitly applied coding tools.

E2.28. The video encoder of any one of the preceding embodiments E2.16 or higher, wherein the syntax element is a single bit which jointly indicates that all coding tools of the predetermined set of one or more coding tools are not used.

E2.29. The video encoder of any one of the preceding embodiments E2.#, wherein the set of one or more coding tools comprises
one or more deactivatable coding tools, each of which, with respect to its application to picture blocks, can be deactivated in units of pictures or slices by configuration signalling inside the data stream.
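The deactivation hierarchy of embodiment E2.29 could be modelled roughly as follows: a tool is usable for a block only if it is enabled at sequence level and not deactivated by the picture- or slice-level configuration signalling. All flag names below are hypothetical.

```python
# Minimal sketch of a per-picture / per-slice deactivation hierarchy: a block may use a
# tool only if the tool is enabled by the configuration signalling of the enclosing
# picture or slice. Flag names are illustrative.

def tool_usable(tool: str, sps_enabled: dict, ph_disabled: dict, block_flag: int) -> bool:
    if not sps_enabled.get(tool, False):      # sequence-level enable
        return False
    if ph_disabled.get(tool, False):          # picture/slice-level deactivation
        return False
    return bool(block_flag)                   # block-level application (if explicit)

sps = {"cclm": True, "dmvr": True}
ph  = {"cclm": True, "dmvr": False}           # CCLM deactivated for this (RASL) picture
print(tool_usable("cclm", sps, ph, block_flag=1))  # False
print(tool_usable("dmvr", sps, ph, block_flag=1))  # True
```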

E2.30. The video encoder of any one of the preceding embodiments E2.#, configured to obey the indication as an encoding constraint when encoding the video into the data stream.

B1.1. A data stream having a video encoded thereinto, comprising
an indication [e.g., gci_rasl_pictures_tool_constraint_flag] which is valid for a picture sequence of the video and indicates that RASL pictures within the picture sequence are coded in a manner that does not use a predetermined set of one or more coding tools [e.g., as a promise, made known by the encoder, that open-GOP switching by concatenating separately coded open-GOP versions of the video, coded at different spatial resolutions and/or different SNRs, does not lead to excessive drift in the RASL pictures].
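A possible client-side use of such a sequence-level indication, sketched under the assumption of a simple representation object exposing the flag: open-GOP switching into a representation is treated as drift-safe only if that representation carries the constraint.

```python
# Minimal sketch: a streaming client deciding whether an open-GOP switch between two
# representations is acceptable, based on a sequence-level constraint indication of the
# kind described above. The attribute name mirrors gci_rasl_pictures_tool_constraint_flag,
# but the surrounding objects are purely illustrative.

from dataclasses import dataclass

@dataclass
class Representation:
    name: str
    rasl_tool_constraint_flag: bool  # True: RASL pictures avoid the drift-prone tool set

def open_gop_switch_allowed(src: Representation, dst: Representation) -> bool:
    # Drift at the RASL pictures following the switch point stays bounded only if the
    # destination representation was encoded under the constraint.
    return dst.rasl_tool_constraint_flag

r1080 = Representation("1080p", rasl_tool_constraint_flag=True)
r2160 = Representation("2160p", rasl_tool_constraint_flag=False)
print(open_gop_switch_allowed(r2160, r1080))  # True  -> switching down is safe
print(open_gop_switch_allowed(r1080, r2160))  # False -> expect visible drift
```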

B1.2. The data stream of embodiment B1.1, generated by an encoder according to any one of embodiments E1.#.

B2.1. A data stream having a video encoded thereinto, comprising
an indication [e.g., using sps_extra_ph_bit_present_flag and ph_extra_bit, or using gci_rasl_pictures_tool_constraint_flag] which indicates, per picture of a picture sequence of the video, globally for the respective picture or on a per-slice basis, whether the respective picture is coded in a manner that does not use a predetermined set of one or more coding tools, the predetermined set comprising a cross-component linear-model-based prediction tool [e.g., as a picture-wise indication which makes it possible to see that the potential drift at RASL pictures is sufficiently low].
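With a picture-wise (or slice-wise) indication of this kind, a player could assess, per RASL picture at a switch point, whether drift is expected to stay low. The sketch below assumes a hypothetical per-picture metadata record; it is not derived from actual bitstream syntax.

```python
# Minimal sketch: with a per-picture (or per-slice) indication of the kind described,
# a player can check, picture by picture, whether the RASL pictures at a switch point
# were coded without the drift-prone tool set. Field names are illustrative.

def low_drift_rasl_pictures(pictures: list[dict]) -> list[int]:
    """Return POCs of RASL pictures flagged as coded without the predetermined tool set."""
    return [p["poc"] for p in pictures
            if p["type"] == "RASL" and p.get("tools_excluded", False)]

gop = [
    {"poc": 16, "type": "CRA",  "tools_excluded": False},
    {"poc": 12, "type": "RASL", "tools_excluded": True},
    {"poc": 14, "type": "RASL", "tools_excluded": True},
    {"poc": 15, "type": "RASL", "tools_excluded": False},
]
print(low_drift_rasl_pictures(gop))  # [12, 14]
```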

B2.2. The data stream of embodiment B2.1, generated by an encoder according to any one of embodiments E2.#.

M. A method performed by any one of the above decoders and encoders.

P. A computer program having a program code for performing the method of embodiment M when the program is executed on a computer.

Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, where a block or a device corresponds to a method step or a feature of a method step. Analogously, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus. Some or all of the method steps may be executed by (or using) a hardware apparatus, such as a microprocessor, a programmable computer or an electronic circuit. In some embodiments, one or more of the most important method steps may be executed by such an apparatus.

The inventive data stream can be stored on a digital storage medium or can be transmitted on a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.

Depending on certain implementation requirements, embodiments of the invention can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, for example a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a flash memory, having electronically readable control signals stored thereon, which cooperate (or are capable of cooperating) with a programmable computer system such that the respective method is performed. Therefore, the digital storage medium may be computer readable.

Some embodiments according to the invention comprise a data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, such that one of the methods described herein is performed.

Generally, embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative for performing one of the methods when the computer program product runs on a computer. The program code may, for example, be stored on a machine-readable carrier.

Other embodiments comprise the computer program for performing one of the methods described herein, stored on a machine-readable carrier.

In other words, an embodiment of the inventive method is, therefore, a computer program having a program code for performing one of the methods described herein, when the computer program runs on a computer.

A further embodiment of the inventive method is, therefore, a data carrier (or a digital storage medium, or a computer-readable medium) comprising, recorded thereon, the computer program for performing one of the methods described herein. The data carrier, the digital storage medium or the recorded medium is typically tangible and/or non-transitory.

A further embodiment of the inventive method is, therefore, a data stream or a sequence of signals representing the computer program for performing one of the methods described herein. The data stream or the sequence of signals may, for example, be configured to be transferred via a data communication connection, for example via the Internet.

A further embodiment comprises a processing means, for example a computer or a programmable logic device, configured to or adapted to perform one of the methods described herein.

A further embodiment comprises a computer having installed thereon the computer program for performing one of the methods described herein.

A further embodiment according to the invention comprises an apparatus or a system configured to transfer (for example, electronically or optically) a computer program for performing one of the methods described herein to a receiver. The receiver may, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.

In some embodiments, a programmable logic device (for example, a field programmable gate array) may be used to perform some or all of the functionalities of the methods described herein. In some embodiments, a field programmable gate array may cooperate with a microprocessor in order to perform one of the methods described herein. Generally, the methods are preferably performed by any hardware apparatus.

The apparatus described herein may be implemented using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.

The apparatus described herein, or any components of the apparatus described herein, may be implemented at least partially in hardware and/or in software.

The methods described herein may be performed using a hardware apparatus, or using a computer, or using a combination of a hardware apparatus and a computer.

The methods described herein, or any components of the apparatus described herein, may be performed at least partially by hardware and/or by software.

The above-described embodiments are merely illustrative of the principles of the present invention. It is understood that modifications and variations of the arrangements and details described herein will be apparent to others skilled in the art. It is the intent, therefore, to be limited only by the scope of the following patent claims and not by the specific details presented by way of the description and explanation of the embodiments herein.

References

[1] ISO/IEC JTC 1, Information technology — Dynamic adaptive streaming over HTTP (DASH) — Part 1: Media presentation description and segment formats, ISO/IEC 23009-1, 2012 (and subsequent editions).

[2] J. De Cock, Z. Li, M. Manohara, A. Aaron, "Complexity-based consistent-quality encoding in the cloud," 2016 IEEE International Conference on Image Processing (ICIP), IEEE, 2016.

[3] DASH Industry Forum Implementation Guidelines. [Online]. Available: https://dashif.org/guidelines/

[4] ITU-T and ISO/IEC JTC 1, Advanced Video Coding for generic audio-visual services, Rec. ITU-T H.264 and ISO/IEC 14496-10 (AVC), May 2003 (and subsequent editions).

[5] ITU-T and ISO/IEC JTC 1, "High Efficiency Video Coding," Rec. ITU-T H.265 and ISO/IEC 23008-2 (HEVC), April 2013 (and subsequent editions).

[6] Y. Yan, M. Hannuksela, and H. Li, "Seamless switching of H.265/HEVC-coded DASH representations with open GOP prediction structure," 2015 IEEE International Conference on Image Processing (ICIP), IEEE, 2015.

[7] ITU-T and ISO/IEC JTC 1, "Versatile video coding," Rec. ITU-T H.266 and ISO/IEC 23090-3 (VVC), August 2020.

[8] V. Baroncini and M. Wien, "VVC verification test report for UHD SDR video content," doc. JVET-T2020 of ITU-T/ISO/IEC Joint Video Experts Team (JVET), 21st meeting, October 2020.

[9] D. Luo, V. Seregin, W. Wan, "Description of Core Experiment 1 (CE1): Reference picture resampling filters," doc. JVET-Q2021 of ITU-T/ISO/IEC Joint Video Experts Team (JVET), 15th meeting, July 2019.

[10] H. Schwarz, D. Marpe, and T. Wiegand, "Analysis of hierarchical B pictures and MCTF," ICME 2006, IEEE International Conference on Multimedia and Expo, Toronto, Ontario, Canada, July 2006.

[11] Y.-K. Wang et al., "The High-Level Syntax of the Versatile Video Coding (VVC) Standard," IEEE Trans. Circuits Syst. Video Technol., in press.

[12] H. Yang et al., "Subblock based Motion Derivation and Inter-Prediction Refinement in Versatile Video Coding Standard," IEEE Trans. Circuits Syst. Video Technol., in press.

[13] W.-J. Chien et al., "Motion Vector Coding and Block Merging in Versatile Video Coding Standard," IEEE Trans. Circuits Syst. Video Technol., in press.

10, 10a, 10b: picture blocks; 10c, 10d: predetermined inter-predicted blocks; 12: predetermined picture; 100: cross-component linear model tool; 102: chroma component; 104: luma component; 106: linear model; 108: parameter derivation; 110: luma and chroma extrema; 112: already decoded neighborhood; 120: prediction of any type; 122: decoding; 124: reconstructed luma component; 126: samples; 128: operation; 200: luma tone mapping and chroma residual scaling prediction tool; 202: luma component prediction; 204: luma component residual decoding; 206: inter prediction; 208: coded luma tone scale; 210: presentation luma tone scale; 212: luma tone mapping; 214: coded luma tone scale version; 216: chroma residual scaling factor; 220: mean value; 222: neighborhood; 224: chroma residual signal; 226: scaling; 228: correction; 230: intra chroma prediction signal; 240: inverse luma tone mapping; 300: optical flow tool; 302: motion vector; 304: reference picture; 306: area; 350: coding tool; 352, 354: arrows; 353: syntax element; 356: application decision; 358: activation decision; 400: decoder-side motion vector refinement tool; 402: signalled motion vector; 404: reference picture; 406: replacement; 500: temporal motion vector prediction tool; 502: previously decoded picture; 504: motion vector predictor; 506, A84: block; 508, 510: motion vectors; 512: motion vector candidate list; A10: apparatus or encoder; A12, A12': picture; A14: data stream; A20: decoder; A22: prediction residual signal former; A24: prediction residual; A24', A24''': spectral-domain prediction residual signal; A24'', A24'''': prediction residual signal; A26: prediction signal; A28: transformer; A32: quantizer; A34: entropy coder; A36: prediction stage; A38, A52: dequantizer; A40, A54: inverse transformer; A42, A56: combiner; A46: reconstructed signal; A50: entropy decoder; A58, A44: prediction module; A80: intra-coded block; A82: inter-coded block; A84: block; C1, C2: chroma components

Advantageous aspects of the present application are the subject of the dependent claims. Preferred embodiments of the present application are described below with respect to the figures:
Fig. 1 illustrates a video encoder according to an embodiment,
Fig. 2 illustrates a video decoder according to an embodiment,
Fig. 3 illustrates a block-based residual coding scheme according to an embodiment,
Fig. 4 illustrates a video data stream comprising two segments according to an embodiment,
Fig. 5 illustrates an operation scheme of an optical flow tool according to an embodiment,
Fig. 6 illustrates an application scheme for coding tools according to an embodiment,
Fig. 7 illustrates an operation scheme of a temporal motion vector prediction tool according to an embodiment,
Fig. 8 illustrates an operation scheme of a decoder-side motion vector refinement tool according to an embodiment,
Fig. 9 illustrates an operation scheme of a cross-component linear model tool according to an embodiment,
Fig. 10 illustrates an operation scheme of a luma mapping and chroma scaling tool according to an embodiment,
Fig. 11 illustrates a further operation scheme of a luma mapping and chroma scaling tool according to an embodiment,
Fig. 12 illustrates an example of prediction error in an open-GOP scenario.

A12': picture; A14: data stream; A20: decoder; A24'': prediction residual signal; A50: entropy decoder; A52: dequantizer; A54: inverse transformer; A56: combiner; A58: prediction module

Claims (6)

1. A video decoder configured to: decode, from a data stream, a supplemental enhancement information (SEI) message indicating that the coding of all random access skipped leading (RASL) pictures within a picture sequence is constrained in a manner that does not use a predetermined set of one or more coding tools, wherein the predetermined set comprises: a cross-component linear-model-based prediction tool (100), and a decoder-side motion vector refinement tool (400), wherein the picture sequence comprises at least one random access skipped leading (RASL) picture and a clean random access (CRA) picture associated with the at least one RASL picture.

2. The video decoder of claim 1, wherein the supplemental enhancement information (SEI) message further indicates that the coding of the at least one RASL picture is constrained such that no collocated reference picture used for temporal motion vector prediction (TMVP) or subblock-based temporal motion vector prediction (sbTMVP) of a motion vector of the at least one RASL picture precedes, in decoding order, the CRA picture associated with the RASL picture.

3. The video decoder of claim 1 or 2, wherein the video decoder is configured to support reference picture resampling, according to which a reference picture of an inter-predicted block is subjected to resampling so as to bridge a scaling-window size deviation or a sample resolution deviation between the reference picture and a picture containing the inter-predicted block, in order to provide an inter-prediction signal for the inter-predicted block.

4. A video encoder configured to: encode, into a data stream, a supplemental enhancement information (SEI) message indicating that the coding of all random access skipped leading (RASL) pictures within a picture sequence is constrained in a manner that does not use a predetermined set of one or more coding tools, wherein the predetermined set comprises: a cross-component linear-model-based prediction tool (100), and a decoder-side motion vector refinement tool (400), wherein the picture sequence comprises at least one random access skipped leading (RASL) picture and a clean random access (CRA) picture associated with the at least one RASL picture.

5. The video encoder of claim 4, wherein the supplemental enhancement information (SEI) message further indicates that the coding of the at least one RASL picture is constrained such that no collocated reference picture used for temporal motion vector prediction (TMVP) or subblock-based temporal motion vector prediction (sbTMVP) of a motion vector of the at least one RASL picture precedes, in decoding order, the CRA picture associated with the RASL picture.

6. The video encoder of claim 4 or 5, wherein the video encoder is configured to support reference picture resampling, according to which a reference picture of an inter-predicted block is subjected to resampling so as to bridge a scaling-window size deviation or a sample resolution deviation between the reference picture and a picture containing the inter-predicted block, in order to provide an inter-prediction signal for the inter-predicted block.
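Purely as a non-authoritative illustration of the decoder behaviour recited in claim 1, the sketch below parses an SEI payload and records the guarantee that RASL pictures avoid the cross-component linear model tool and decoder-side motion vector refinement; the payload type value and the one-byte payload layout are invented for this example and are not taken from the standard.

```python
# Minimal sketch of the decoder-side handling implied by claim 1: an SEI message is
# parsed and the decoder records the promise that RASL pictures of the coded picture
# sequence avoid CCLM (100) and DMVR (400). The payload type value and the one-byte
# payload layout are assumptions made for illustration only.

RASL_TOOL_CONSTRAINT_SEI = 0xB0          # hypothetical payload type
CONSTRAINED_TOOLS = ("cclm", "dmvr")     # the predetermined tool set of claim 1

def handle_sei(payload_type: int, payload: bytes, decoder_state: dict) -> None:
    if payload_type == RASL_TOOL_CONSTRAINT_SEI and payload and payload[0] & 1:
        decoder_state["rasl_constrained_tools"] = set(CONSTRAINED_TOOLS)

state: dict = {}
handle_sei(RASL_TOOL_CONSTRAINT_SEI, bytes([0x01]), state)
print(state)  # {'rasl_constrained_tools': {'cclm', 'dmvr'}} (set order may vary)
```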
TW111107381A 2021-02-26 2022-03-01 Video coding concept allowing for limitation of drift TWI821923B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP21159767.9 2021-02-26
EP21159767 2021-02-26

Publications (2)

Publication Number Publication Date
TW202234891A TW202234891A (en) 2022-09-01
TWI821923B true TWI821923B (en) 2023-11-11

Family

ID=74867389

Family Applications (1)

Application Number Title Priority Date Filing Date
TW111107381A TWI821923B (en) 2021-02-26 2022-03-01 Video coding concept allowing for limitation of drift

Country Status (10)

Country Link
US (1) US20240155114A1 (en)
EP (1) EP4298796A1 (en)
JP (1) JP2024509680A (en)
KR (1) KR20230130088A (en)
CN (1) CN117083862A (en)
AR (1) AR126316A1 (en)
AU (1) AU2022225089A1 (en)
MX (1) MX2023008464A (en)
TW (1) TWI821923B (en)
WO (1) WO2022180261A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201444341A (en) * 2013-01-07 2014-11-16 Qualcomm Inc Video buffering operations for random access in video coding
TW202042552A (en) * 2019-01-03 2020-11-16 美商高通公司 Block size restriction for illumination compensation

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7359873B2 (en) * 2019-05-15 2023-10-11 華為技術有限公司 Encoders, decoders and corresponding methods
CN113841395B (en) * 2019-05-16 2022-10-25 北京字节跳动网络技术有限公司 Adaptive resolution change in video coding and decoding

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201444341A (en) * 2013-01-07 2014-11-16 Qualcomm Inc Video buffering operations for random access in video coding
TW202042552A (en) * 2019-01-03 2020-11-16 美商高通公司 Block size restriction for illumination compensation

Also Published As

Publication number Publication date
WO2022180261A1 (en) 2022-09-01
CN117083862A (en) 2023-11-17
EP4298796A1 (en) 2024-01-03
US20240155114A1 (en) 2024-05-09
AU2022225089A1 (en) 2023-07-27
JP2024509680A (en) 2024-03-05
AR126316A1 (en) 2023-10-04
KR20230130088A (en) 2023-09-11
TW202234891A (en) 2022-09-01
MX2023008464A (en) 2023-07-27

Similar Documents

Publication Publication Date Title
CN113545083B (en) Video encoding and decoding method and device
CN110719469B (en) Video encoding and decoding method, apparatus and storage medium
JP6580576B2 (en) Device and method for scalable coding of video information
CN110719488B (en) Video decoding method and device, computer equipment and storage medium
TWI504237B (en) Buffering prediction data in video coding
CN112106371B (en) Method, device, readable medium and equipment for video decoding
JP7152512B2 (en) Video encoding and decoding method, device, and computer program
CN110708541B (en) Video decoding method, video decoder, apparatus and storage medium
CN112655205B (en) Video encoding and decoding method and device
TWI558179B (en) Signaling long-term reference pictures for video coding
CN113615192B (en) Video decoding method, device and storage medium
CN112913232B (en) Method, apparatus and computer readable medium for performing merge candidate list creation for video codec
JP6434044B2 (en) Apparatus and method for scalable coding of video information
JP2017507539A (en) Method for coding recovery point supplemental enhancement information (SEI) messages and region refresh information SEI messages in multi-layer coding
CN113228667B (en) Video encoding and decoding method, device and storage medium
CN113287315B (en) Video coding and decoding method, device and storage medium
CN115428445A (en) Method and apparatus for video encoding
CN113348668B (en) Video decoding method, device and storage medium
TWI821923B (en) Video coding concept allowing for limitation of drift
JP7493613B2 (en) METHOD AND APPARATUS FOR VIDEO CODING - Patent application
KR102683361B1 (en) Method and apparatus for signaling predictor candidate list size for intra-picture block compensation
TW202312732A (en) Video coding concept for handling of motion-vectors in streaming scenarios
KR20240110668A (en) Method and apparatus for predictor candidate list size signaling for intra picture block compensation