RU2575411C2

RU2575411C2 - Method and apparatus for scalable video coding

Info

Publication number: RU2575411C2
Application number: RU2013154579/08A
Authority: RU
Inventors: Tzu-Der Chuang; Ching-Yeh Chen; Chih-Ming Fu; Yu-Wen Huang; Shaw-Min Lei
Original assignee: MediaTek Inc.
Priority date: 2011-06-10
Filing date: 2012-05-31
Publication date: 2016-02-20

Abstract

FIELD: physics, computer engineering.

SUBSTANCE: invention relates to computer engineering. A method of coding a CU (coding unit) structure, coding mode information or encoding motion information for scalable video coding, wherein video data are configured into a base layer (BL) and an enhancement layer (EL) and wherein the EL has higher spatial resolution or better video quality than the BL, the method comprising: determining CU structure (coding unit structure), mode, motion information, or a combination of the CU structure, the mode and the motion information for a CU (coding unit) in the BL; and determining CU structure, mode, motion vector predictor (MVP) information, or a combination of the CU structure, the mode and the MVP information for a corresponding CU in the EL based on the CU structure, the mode, the motion information, or the combination of the CU structure, the mode and the motion information for the CU in the BL respectively; wherein the mode is skip mode, merge mode, or intra mode.

EFFECT: high efficiency of coding an enhancement layer.

36 cl, 11 dwg

Description

Cross-reference to related applications

[0001] The present invention claims priority to US provisional patent application Serial No. 61/495,740, filed June 10, 2011, entitled "Scalable Coding of High Efficiency Video Coding", and US provisional patent application Serial No. 61/567,774, filed December 7, 2011. The US provisional patent application is incorporated herein by reference in its entirety.

Field of the invention

[0002] The present invention relates to video coding. In particular, the present invention relates to scalable video coding that uses base layer information to code an enhancement layer.

Background

[0003] Compressed digital video is widely used in applications such as video streaming over digital networks and video transmission over digital channels. Very often, a single piece of video content is delivered over networks with different characteristics. For example, a live sports event may be carried as a high-bandwidth stream over broadband networks for a paid subscription video service. In such applications, the compressed video usually preserves both high resolution and high quality, so that the content is suitable for high-definition devices such as an HDTV or a high-resolution LCD display. The same content may also be carried over a cellular data network so that it can be viewed on a portable device such as a smartphone or a network-connected portable media device. In such applications, because of network bandwidth constraints and the typically low-resolution displays of smartphones and portable devices, the video content is usually compressed to a lower resolution and a lower bit rate. Therefore, the requirements for video resolution and video quality differ widely across network environments and applications. Even for the same type of network, users may experience different available bandwidth because of differences in network infrastructure and traffic conditions. A user may therefore wish to receive higher-quality video when the available bandwidth is high, and lower-quality but smooth video when the network is congested. In another scenario, a high-end media player can handle high-resolution, high-bit-rate compressed video, while a low-cost media player can only handle low-resolution, low-bit-rate compressed video because of its limited computational resources. Accordingly, it is desirable to construct the compressed video in a scalable manner, so that video at different spatial and temporal resolutions and/or quality can be derived from the same compressed bitstream.

[0004] The current H.264/AVC video standard has an extension called Scalable Video Coding (SVC). SVC provides temporal, spatial, and quality scalability based on a single bitstream. An SVC bitstream contains scalable video information ranging from low frame rate, low resolution, and low quality to high frame rate, high definition, and high quality. Accordingly, SVC is suitable for various video applications, such as video broadcasting, video streaming, and video surveillance, and can adapt to network infrastructure, traffic conditions, user preferences, and so on.

[0005] SVC provides three types of scalability: temporal scalability, spatial scalability, and quality scalability. SVC uses a multi-layer coding structure to realize these three dimensions of scalability. The main goal of SVC is to generate one scalable bitstream that can be easily and rapidly adapted to the bit-rate requirements associated with various transmission channels, diverse display capabilities, and different computational resources, without transcoding or re-encoding. An important feature of the SVC design is that scalability is provided at the bitstream level. In other words, bitstreams for video with reduced spatial and/or temporal resolution can be obtained simply by extracting from the scalable bitstream the Network Abstraction Layer (NAL) units (or network packets) that are required for decoding the intended video. NAL units for quality refinement can additionally be truncated to reduce the bit rate and the associated video quality.
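The bitstream-level extraction described above can be illustrated with a minimal sketch. It assumes a toy model in which every NAL unit carries three layer identifiers (spatial, temporal, quality); the class `NalUnit` and the function `extract_substream` are hypothetical names introduced for illustration, and a real extractor must also follow the dependency rules of the standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NalUnit:
    """Toy NAL unit: only the layer identifiers matter for extraction."""
    dependency_id: int  # spatial layer (0 = lowest resolution)
    temporal_id: int    # temporal layer (0 = key pictures)
    quality_id: int     # quality refinement layer (0 = base quality)
    payload: bytes = b""


def extract_substream(nal_units, max_dependency, max_temporal, max_quality):
    """Keep only the NAL units needed for the requested operating point.

    No transcoding or re-encoding takes place: units are kept or dropped
    as they are, in their original order.
    """
    return [
        unit for unit in nal_units
        if unit.dependency_id <= max_dependency
        and unit.temporal_id <= max_temporal
        and unit.quality_id <= max_quality
    ]
```

Dropping units by identifier alone is the simplest possible policy; it is shown only to make the "extract, do not re-encode" idea concrete.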

[0006] For example, temporal scalability can be derived from a hierarchical coding structure based on B-pictures according to the H.264/AVC standard. FIG. 1 illustrates an example of a hierarchical B-picture structure with four temporal layers and a group of pictures (GOP) of eight pictures. Pictures 0 and 8 in FIG. 1 are called key pictures. Inter prediction of key pictures uses only previous key pictures as references. The other pictures between two key pictures are predicted hierarchically. Video consisting of only the key pictures forms the coarsest temporal resolution of the scalable system. Temporal scalability is achieved by progressively refining the lower-level (coarser) video by adding more B-pictures, which correspond to the enhancement layers of the scalable system. In the example of FIG. 1, picture 4 is first bi-directionally predicted from the key pictures, i.e., pictures 0 and 8, after the two key pictures have been coded. After picture 4 is processed, pictures 2 and 6 are processed. Picture 2 is bi-directionally predicted from pictures 0 and 4, and picture 6 is bi-directionally predicted from pictures 4 and 8. After pictures 2 and 6 are coded, the remaining pictures, i.e., pictures 1, 3, 5, and 7, are bi-directionally processed using their two respective neighboring pictures, as shown in FIG. 1. Accordingly, the processing order for the GOP is 0, 8, 4, 2, 6, 1, 3, 5, and 7. The pictures processed according to the hierarchical process of FIG. 1 form four hierarchical levels: pictures 0 and 8 belong to the first temporal order, picture 4 belongs to the second temporal order, pictures 2 and 6 belong to the third temporal order, and pictures 1, 3, 5, and 7 belong to the fourth temporal order. Higher-level video can be provided by decoding the base-level pictures and adding pictures of higher temporal order. For example, base-level pictures 0 and 8 can be combined with picture 4 of the second temporal order to form the second-level video. Adding the third-temporal-order pictures to the second-level video forms the third-level video. Similarly, adding the fourth-temporal-order pictures to the third-level video forms the fourth-level video. Temporal scalability is achieved accordingly. If the original video has a frame rate of 30 frames per second, the base-level video has a frame rate of 30/8 = 3.75 frames per second. The second-level, third-level, and fourth-level video correspond to 7.5, 15, and 30 frames per second, respectively. The first-temporal-order pictures are also called base-level video or base-level pictures. The pictures from the second through the fourth temporal order are also called enhancement-level video or enhancement-level pictures. In addition to enabling temporal scalability, the hierarchical B-picture coding structure also improves coding efficiency over the typical IBBP GOP structure, at the cost of increased encoding-decoding delay.
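The processing order and frame rates in the example above can be reproduced with a short sketch. It assumes a dyadic GOP (size a power of two); the function names `hierarchical_order` and `frame_rates` are hypothetical and serve only to illustrate the arithmetic.

```python
from fractions import Fraction


def hierarchical_order(gop_size):
    """Return (coding_order, temporal_order) for one dyadic hierarchical-B GOP.

    Assumes gop_size is a power of two. Key pictures 0 and gop_size are
    coded first; each later pass codes the pictures midway between two
    already coded pictures.
    """
    temporal_order = {0: 1, gop_size: 1}
    coding_order = [0, gop_size]
    step, current = gop_size, 2
    while step > 1:
        half = step // 2
        for mid in range(half, gop_size, step):
            coding_order.append(mid)
            temporal_order[mid] = current
        step, current = half, current + 1
    return coding_order, temporal_order


def frame_rates(full_rate, gop_size):
    """Frame rate of each temporal level, from base level to full rate."""
    rates = []
    step = gop_size
    while step >= 1:
        rates.append(Fraction(full_rate) / step)
        step //= 2
    return rates
```

For a GOP of eight pictures at 30 frames per second, `hierarchical_order(8)` yields the order 0, 8, 4, 2, 6, 1, 3, 5, 7 and `frame_rates(30, 8)` yields 3.75, 7.5, 15, and 30 frames per second.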

[0007] In SVC, spatial scalability is supported based on the pyramid coding scheme shown in FIG. 2. In an SVC system with spatial scalability, the video sequence is first down-sampled to obtain smaller pictures at different spatial resolutions (layers). For example, picture 210 at the original resolution can be processed by spatial decimation 220 to obtain reduced-resolution picture 211. Reduced-resolution picture 211 can be further processed by spatial decimation 221 to obtain further reduced-resolution picture 212, as shown in FIG. 2. In addition to dyadic spatial resolution, where the spatial resolution is halved at each level, SVC also supports arbitrary resolution ratios, which is called extended spatial scalability (ESS). The SVC system in FIG. 2 illustrates an example of a spatially scalable system with three layers, where layer 0 corresponds to the pictures with the lowest spatial resolution and layer 2 corresponds to the pictures with the highest resolution. The layer-0 pictures are coded without reference to other layers, i.e., as single-layer coding. For example, lowest-layer picture 212 is coded using motion-compensated and intra prediction 230.
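A dyadic three-layer pyramid of this kind can be sketched as follows. The text does not specify the decimation filter, so the 2x2 averaging used here is an assumption chosen for brevity, and `decimate_2x` and `build_pyramid` are hypothetical helper names.

```python
def decimate_2x(picture):
    """Halve width and height by averaging each 2x2 block (rounded).

    `picture` is a list of rows of integer samples with even dimensions.
    The averaging filter is an illustrative assumption only.
    """
    height, width = len(picture), len(picture[0])
    return [
        [
            (picture[y][x] + picture[y][x + 1]
             + picture[y + 1][x] + picture[y + 1][x + 1] + 2) // 4
            for x in range(0, width - 1, 2)
        ]
        for y in range(0, height - 1, 2)
    ]


def build_pyramid(picture, num_layers):
    """Return pictures ordered from layer 0 (smallest) to the original."""
    layers = [picture]
    for _ in range(num_layers - 1):
        layers.append(decimate_2x(layers[-1]))
    layers.reverse()
    return layers
```

With `num_layers=3`, the returned list mirrors the roles of pictures 212, 211, and 210: index 0 holds the lowest-resolution picture and index 2 the original.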

[0008] Motion-compensated and intra prediction 230 generates syntax elements as well as coding-related information, such as motion information, for further entropy coding 240. FIG. 2 actually illustrates a combined SVC system that provides spatial scalability as well as quality scalability (also called SNR scalability). The system may also provide temporal scalability, which is not explicitly shown. For each single-layer coding, the residual coding errors can be refined using SNR enhancement layer coding 250. The SNR enhancement layer in FIG. 2 may provide multiple quality levels (quality scalability). Each supported resolution layer can be coded by its own single-layer motion-compensated and intra prediction, as in a non-scalable coding system. Each higher spatial layer may also be coded using inter-layer coding based on one or more lower spatial layers. For example, layer-1 video can be adaptively coded using either inter-layer prediction based on layer-0 video or single-layer coding, on a macroblock basis or on the basis of another block unit. Similarly, layer-2 video can be adaptively coded using either inter-layer prediction based on reconstructed layer-1 video or single-layer coding. As shown in FIG. 2, layer-1 pictures 211 can be coded by motion-compensated and intra prediction 231, base-layer entropy coding 241, and SNR enhancement layer coding 251. Similarly, layer-2 pictures 210 can be coded by motion-compensated and intra prediction 232, base-layer entropy coding 242, and SNR enhancement layer coding 252. Coding efficiency can be improved by inter-layer coding. Furthermore, the information required to code spatial layer 1 may depend on reconstructed layer 0 (inter-layer prediction). The inter-layer differences are called the enhancement layers. H.264 SVC provides three types of inter-layer prediction tools: inter-layer motion prediction, inter-layer intra prediction, and inter-layer residual prediction.

[0009] In SVC, the enhancement layer (EL) can reuse the motion information of the base layer (BL) to reduce inter-layer motion data redundancy. For example, EL macroblock coding may use a flag, such as base_mode_flag, signaled before mb_type to indicate whether the EL motion information is derived directly from the BL. If base_mode_flag is equal to 1, the partitioning data of the EL macroblock, together with the associated reference indices and motion vectors, is derived from the corresponding data of the co-located 8x8 block in the BL. The reference picture index of the BL is used directly in the EL. The motion vectors of the EL are scaled from the data associated with the BL. In addition, the scaled BL motion vector can be used as an additional motion vector predictor for the EL.
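The derivation of EL motion data from the BL when base_mode_flag is 1 can be sketched as below. This is a simplified illustration: the function name `derive_el_motion`, the tuple layout, and the round-half-up rule are assumptions, and the exact scaling and rounding of the standard are not reproduced.

```python
def scale_component(value, el_dim, bl_dim):
    """Scale one motion vector component by el_dim / bl_dim, rounding half up."""
    return (2 * value * el_dim + bl_dim) // (2 * bl_dim)


def derive_el_motion(bl_mv, bl_ref_idx, bl_size, el_size):
    """Derive EL motion data from the co-located BL block.

    bl_mv      -- (mv_x, mv_y) of the BL block
    bl_ref_idx -- BL reference picture index, reused unchanged in the EL
    bl_size    -- (width, height) of the BL picture
    el_size    -- (width, height) of the EL picture
    """
    el_mv = (
        scale_component(bl_mv[0], el_size[0], bl_size[0]),
        scale_component(bl_mv[1], el_size[1], bl_size[1]),
    )
    return el_mv, bl_ref_idx
```

For dyadic spatial scalability the scaling simply doubles each component, so a BL vector (5, -3) becomes (10, -6) in the EL while the reference index is unchanged.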

[0010] Inter-layer residual prediction uses the up-sampled BL residual information to reduce the EL residual information. The co-located BL residual can be block-wise up-sampled using a bilinear filter and used as a prediction for the residual of the current macroblock in the EL. The up-sampling of the reference-layer residual is performed on a transform-block basis to ensure that no filtering is applied across transform block boundaries.

[0011] Similar to inter-layer residual prediction, inter-layer intra prediction reduces the redundant texture information of the EL. The prediction in the EL is generated by block-wise up-sampling of the co-located BL reconstruction signal. In the up-sampling procedure for inter-layer intra prediction, 4-tap and 2-tap FIR filters are applied to the luma and chroma components, respectively. Unlike inter-layer residual prediction, filtering for inter-layer intra prediction is always performed across sub-block boundaries. For decoding simplicity, inter-layer intra prediction can be restricted to intra-coded macroblocks in the BL.

[0012] In SVC, quality scalability is realized by coding multiple quality ELs that consist of refinement coefficients. The scalable video bitstream can be easily truncated or extracted to provide different video bitstreams with different video qualities or bitstream sizes. In SVC, quality scalability (also called SNR scalability) can be provided through two strategies: coarse-grain scalability (CGS) and medium-grain scalability (MGS). CGS can be regarded as a special case of spatial scalability in which the spatial resolution of the BL and the EL is the same, but the quality of the EL is better (the QP of the EL is smaller than the QP of the BL). The same inter-layer prediction mechanism as for spatially scalable coding can be used. However, the corresponding up-sampling and deblocking operations are not performed. Furthermore, inter-layer intra and residual prediction are performed directly in the transform domain. For inter-layer prediction in CGS, refinement of the texture information is typically achieved by re-quantizing the residual signal in the EL with a smaller quantization step size than that used for the preceding CGS layer. CGS can provide multiple predefined quality points.
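The effect of re-quantizing the residual with a smaller step size in the EL can be shown on a single transform coefficient. The sketch assumes a plain uniform scalar quantizer and hypothetical function names; the actual quantizer and entropy coding of H.264 SVC are not modeled.

```python
import math


def quantize(value, step):
    """Uniform scalar quantizer: nearest integer level, halves rounded up."""
    return math.floor(value / step + 0.5)


def dequantize(level, step):
    """Reconstruct a value from its quantization level."""
    return level * step


def cgs_refine(coefficient, bl_step, el_step):
    """Return (BL reconstruction, EL reconstruction) of one coefficient.

    The BL codes the coefficient with a coarse step. The EL codes only
    what the BL missed, using a smaller step (el_step < bl_step).
    """
    bl_recon = dequantize(quantize(coefficient, bl_step), bl_step)
    refinement = quantize(coefficient - bl_recon, el_step)
    el_recon = bl_recon + dequantize(refinement, el_step)
    return bl_recon, el_recon
```

With a coefficient of 37, a BL step of 16, and an EL step of 4, the BL reconstructs 32 and the EL refines this to 36, reducing the error from 5 to 1.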

[0013] To provide finer bit-rate granularity while maintaining reasonable complexity for quality scalability, H.264 SVC uses MGS. MGS can be regarded as an extension of CGS in which the quantized coefficients of one CGS slice can be divided into several MGS slices. The quantized coefficients in CGS are classified into 16 categories based on their scan position in the zig-zag scan order. These 16 categories of coefficients can be distributed into different slices to provide more quality extraction points than CGS.
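The distribution of the 16 zig-zag scan positions over several slices can be sketched for one 4x4 block as follows. The split points and the function names are illustrative assumptions; the sketch also assumes that slices are only ever dropped from the high-frequency end.

```python
# 4x4 zig-zag scan: scan position -> raster index (row * 4 + column).
ZIGZAG_4X4 = [0, 1, 4, 8, 5, 2, 3, 6, 9, 12, 13, 10, 7, 11, 14, 15]


def split_coefficients(block, split_points):
    """Split a 4x4 block (16 raster-order values) into per-slice lists.

    split_points are increasing scan positions where a new slice starts,
    e.g. [4, 10] gives scan positions 0-3, 4-9, and 10-15.
    """
    scanned = [block[index] for index in ZIGZAG_4X4]
    bounds = [0, *split_points, 16]
    return [scanned[start:end] for start, end in zip(bounds, bounds[1:])]


def merge_coefficients(received_slices):
    """Rebuild the raster-order block; missing trailing slices become zeros."""
    scanned = [value for part in received_slices for value in part]
    scanned += [0] * (16 - len(scanned))
    block = [0] * 16
    for scan_pos, raster_index in enumerate(ZIGZAG_4X4):
        block[raster_index] = scanned[scan_pos]
    return block
```

Each additional slice that reaches the decoder restores more high-frequency coefficients, which is what yields more extraction points than a single CGS layer.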

[0014] The current HEVC provides only single-layer coding based on a hierarchical-B coding structure, without spatial scalability or quality scalability. It is desirable to provide spatial scalability and quality scalability for the current HEVC. Furthermore, it is desirable to provide an improved SVC over H.264 SVC to achieve higher efficiency and/or greater flexibility.

SUMMARY OF THE INVENTION

[0015] A method and apparatus for scalable video coding that use base layer (BL) information for the enhancement layer (EL) are disclosed, where the EL has higher resolution and/or better quality than the BL. Embodiments of the present invention use various pieces of BL information to improve the coding efficiency of the EL. In one embodiment according to the present invention, the method and apparatus use the CU structure information, mode information or motion information of the BL to derive the respective CU structure information, mode information or motion vector predictor (MVP) information for the EL. A combination of the CU structure, the mode information and the motion information can also be used to derive the respective information for the EL. In another embodiment according to the present invention, the method and apparatus derive the MVP candidates or merge candidates of the EL based on the MVP candidates or merge candidates of the BL. In yet another embodiment of the present invention, the method and apparatus derive the intra prediction mode of the EL based on the intra prediction mode of the BL.

[0016] An embodiment of the present invention uses the residual quadtree structure information of the BL to derive the residual quadtree structure for the EL. Another embodiment of the present invention derives the texture of the EL by resampling the texture of the BL. A further embodiment of the present invention derives the predictor of the EL residual by resampling the residual of the BL.

[0017] One aspect of the present invention addresses the coding efficiency of context-adaptive entropy coding for the EL. An embodiment of the present invention determines the context information for processing a syntax element of the EL using information of the BL. Another aspect of the present invention addresses the coding efficiency related to in-loop processing. An embodiment of the present invention derives the ALF information, SAO information or DF information for the EL using the ALF information, SAO information or DF information of the BL, respectively.

Brief Description of the Drawings

[0018] FIG. 1 illustrates an example of temporal scalable video coding using hierarchical B-pictures.

[0019] FIG. 2 illustrates an example of a combined scalable video coding system that provides spatial scalability as well as quality scalability, where three spatial layers are provided.

[0020] FIG. 3 illustrates an example of CU structure reuse for scalable video coding, where the CU structure for the base layer is scaled and used as the initial CU structure for the enhancement layer.

[0021] FIG. 4 illustrates an exemplary flowchart of CU structure coding or motion information coding for scalable video coding according to an embodiment of the present invention.

[0022] FIG. 5 illustrates an exemplary flowchart of MVP derivation or merge candidate derivation for scalable video coding according to an embodiment of the present invention.

[0023] FIG. 6 illustrates an exemplary flowchart of intra prediction mode derivation for scalable video coding according to an embodiment of the present invention.

[0024] FIG. 7 illustrates an exemplary flowchart of residual quadtree structure coding for scalable video coding according to an embodiment of the present invention.

[0025] FIG. 8 illustrates an exemplary flowchart of texture prediction and resampling for scalable video coding according to an embodiment of the present invention.

[0026] FIG. 9 illustrates an exemplary flowchart of residual prediction and resampling for scalable video coding according to an embodiment of the present invention.

[0027] FIG. 10 illustrates an exemplary flowchart of context-adaptive entropy coding for scalable video coding according to an embodiment of the present invention.

[0028] FIG. 11 illustrates an exemplary flowchart of ALF information coding, SAO information coding and DF information coding for scalable video coding according to an embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

[0029] In HEVC, the coding unit (CU) structure was introduced as a new block structure for the coding process. A picture is divided into largest CUs (LCUs), and each LCU is adaptively partitioned into CUs until a leaf CU is obtained or the minimum CU size is reached. The CU structure information has to be conveyed to the decoder side so that the same CU structure can be recovered there. To improve the coding efficiency associated with the CU structure for scalable HEVC, an embodiment according to the present invention allows the CU structure of the BL to be reused by the EL. At the EL LCU or CU level, one flag is transmitted to indicate whether the CU structure is reused from the corresponding CU of the BL. If the BL CU structure is reused, it is scaled to match the resolution of the EL, and the scaled BL CU structure is reused by the EL. In some embodiments, the CU structure information that can be reused by the EL includes the CU split flag and the residual quadtree split flag. Moreover, a leaf CU of the scaled CU structure can be further split into sub-CUs. FIG. 3 illustrates an example of CU partition reuse. Partition 310 corresponds to the CU structure of the BL. The video resolution of the EL is twice the video resolution of the BL both horizontally and vertically. The CU structure of the corresponding CU partition 315 of the BL is scaled up by a factor of 2. The scaled CU structure 320 is then used as the initial CU structure for the EL LCU. The leaf CUs of the scaled CU structure in the EL can be further split into sub-CUs; the result is indicated by 330 in FIG. 3. A flag may be used to indicate whether a leaf CU is further divided into sub-CUs. While FIG. 3 illustrates an example in which the CU structure is reused, other information may be reused as well, for example the prediction type, prediction size, merge index, inter reference direction, reference picture index, motion vectors, MVP index and intra mode. The information/data can be scaled when necessary before it is reused in the EL.
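The CU structure reuse described above can be sketched as follows. This is a minimal illustration under stated assumptions: the quadtree representation (`None` for a leaf, a list of four children in z-order otherwise), the function names, the `split_flags` dictionary and the single level of additional splitting are all hypothetical and are not the syntax of any codec.

```python
def leaves(tree, x, y, size):
    """Return (x, y, size) for every leaf CU of a quadtree rooted at (x, y)."""
    if tree is None:  # None marks a leaf CU
        return [(x, y, size)]
    half = size // 2
    out = []
    for i, child in enumerate(tree):  # four children in z-order
        out += leaves(child, x + (i % 2) * half, y + (i // 2) * half, half)
    return out


def derive_el_cus(bl_tree, bl_size, factor, split_flags, min_size=8):
    """Scale the BL CU structure and use it as the initial EL CU structure.

    `split_flags` maps a scaled leaf CU to True when the EL signals that
    this leaf is split once more into four sub-CUs.
    """
    el_cus = []
    for x, y, size in leaves(bl_tree, 0, 0, bl_size):
        x, y, size = x * factor, y * factor, size * factor
        if split_flags.get((x, y, size), False) and size > min_size:
            half = size // 2
            el_cus += [(x + dx, y + dy, half)
                       for dy in (0, half) for dx in (0, half)]
        else:
            el_cus.append((x, y, size))
    return el_cus


# A 32x32 BL block whose top-right quadrant is split once more.
bl_tree = [None, [None, None, None, None], None, None]
initial = derive_el_cus(bl_tree, 32, 2, {})
refined = derive_el_cus(bl_tree, 32, 2, {(0, 32, 32): True})
```

Here `initial` plays the role of the scaled structure 320 and `refined` the role of the further-split result 330; only the extra split flags would need to be signalled for the EL.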

[0030] In another embodiment according to the present invention, the mode information of a leaf CU is reused. The mode information may include the skip flag, prediction type, prediction size, inter reference direction, reference picture index, motion vectors, motion vector index, merge flag, merge index, skip mode, merge mode and intra mode. The mode information of a leaf CU in the EL can share the same, or a scaled version of, the mode information of the corresponding CU in the BL. One flag can be used to indicate whether the EL reuses the mode information from the BL. Alternatively, for one or more pieces of mode information, a separate flag can be used to indicate whether the EL reuses that piece of mode information from the BL. In yet another embodiment according to the present invention, the motion information of the corresponding prediction unit (PU) or coding unit (CU) in the BL is reused to derive the motion information of a PU or CU in the EL. The motion information may include the inter prediction direction, reference picture index, motion vectors (MVs), motion vector predictors (MVPs), MVP index, merge index, merge candidates and intra mode. The motion information of the BL can be used as predictors or candidates for the motion vector predictor (MVP) information in the EL. For example, the BL MVs and BL MVPs can be added to the MVP list and/or the merge list for EL MVP derivation. The BL MVs mentioned above can be the MVs of the corresponding PU in the BL, the MVs of neighboring PUs of the corresponding PU in the BL, the MVs of merge candidates of the corresponding PU in the BL, the MVPs of the corresponding PU in the BL, or the co-located MVs of the corresponding PU in the BL.
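A minimal sketch of adding a BL motion vector to the EL MVP list is given below. The function name, the integer scale factors, the list size and the choice to place the scaled BL candidate first are assumptions made for illustration; the text only states that BL MVs and MVPs can be added to the list.

```python
def build_el_mvp_list(el_candidates, bl_mv, scale=(2, 2), max_candidates=3):
    """Build an EL MVP candidate list that includes the scaled BL MV.

    Motion vectors are (mvx, mvy) tuples. The BL MV is scaled by the
    spatial ratio between EL and BL (assumed to be an integer here) and
    duplicates are pruned while preserving order.
    """
    scaled_bl_mv = (bl_mv[0] * scale[0], bl_mv[1] * scale[1])
    candidates = []
    for mv in [scaled_bl_mv] + list(el_candidates):
        if mv not in candidates:
            candidates.append(mv)
    return candidates[:max_candidates]


mvp_list = build_el_mvp_list([(4, -2), (6, 0), (4, -2)], bl_mv=(3, 0))
print(mvp_list)
```

The same pattern would apply to a merge candidate list, with each candidate carrying the prediction direction and reference picture index in addition to the motion vector.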

[0031] In another example, the merge candidate derivation for the EL can use the motion information of the BL. For example, the merge candidates of the corresponding PU in the BL can be added to the merge candidate list and/or the MVP list. The BL motion information mentioned above can be the motion information of the corresponding PU in the BL, the motion information associated with a neighboring PU of the corresponding PU in the BL, the merge candidates of the corresponding PU in the BL, the MVPs of the corresponding PU in the BL, or the co-located PU of the corresponding PU in the BL. In this case, the motion information includes the inter prediction direction, the reference picture index and the motion vectors.

[0032] In yet another example, the intra mode of the corresponding PU or CU in the BL can be reused for the EL. For example, the intra mode of the corresponding PU or CU in the BL can be added to the most probable intra mode list. An embodiment according to the present invention uses the motion information of the BL to predict the intra mode for the EL. The order of the most probable mode list in the EL can be adaptively changed according to the intra prediction mode information of the BL. Accordingly, the codeword lengths of the codewords in the most probable mode list in the EL can be adaptively changed according to the intra prediction mode information of the BL. For example, the codewords of the remaining intra modes whose prediction directions are close to the prediction direction of the coded BL intra mode are assigned a shorter length. As another example, the neighboring direction modes of the BL intra mode can also be added to the most probable mode (MPM) list for EL intra mode coding. The intra prediction mode information of the BL can be the intra prediction mode of the corresponding PU in the BL, the neighboring direction modes of the BL intra mode, or the intra prediction mode of a neighboring PU of the corresponding PU in the BL.
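The MPM construction described above can be sketched as follows. The mode numbering (0 for planar, 1 for DC, 2 to 34 for angular modes), the simple wrap-around used to find neighboring directions, the list size and the candidate order are all assumptions for illustration and do not reproduce the HM 3.0 derivation.

```python
PLANAR, DC = 0, 1
FIRST_ANGULAR, NUM_ANGULAR = 2, 33  # assumed numbering: angular modes 2..34


def adjacent_directions(mode):
    """Return the two angular modes next to `mode`, or [] if non-angular."""
    if mode < FIRST_ANGULAR:
        return []
    offset = mode - FIRST_ANGULAR
    return [FIRST_ANGULAR + (offset - 1) % NUM_ANGULAR,
            FIRST_ANGULAR + (offset + 1) % NUM_ANGULAR]


def build_el_mpm_list(el_neighbor_modes, bl_mode, size=3):
    """Build an EL MPM list led by the BL intra mode and its neighbors."""
    mpm = []
    for mode in [bl_mode] + adjacent_directions(bl_mode) + list(el_neighbor_modes):
        if mode not in mpm:
            mpm.append(mode)
    return mpm[:size]


print(build_el_mpm_list([10, DC], bl_mode=26))
```

Placing the BL mode first gives it the shortest index, which corresponds to the adaptive reordering of the MPM list that the paragraph mentions.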

[0033] The selected MVP index, merge index and intra mode index of the BL motion information can be used to adaptively change the index order in the EL MVP list, merge index list and most probable intra mode list. For example, in HEVC Test Model version 3.0 (HM 3.0), the order of the MVP list is {left MVP, above MVP, co-located MVP}. If the corresponding BL PU selects the above MVP, the above MVP is moved to the front in the EL. Accordingly, the MVP list in the EL becomes {above MVP, left MVP, co-located MVP}. Furthermore, the BL coded MV, scaled coded MV, MVP candidates, scaled MVP candidates, merge candidates and scaled merge candidates can replace part of the EL MVP candidates and/or merge candidates. The process of deriving the motion information for a PU or CU in the EL based on the motion information of the corresponding PU or CU in the BL is invoked when an MVP candidate or a merge candidate for the PU or CU in the EL is needed for encoding or decoding.
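The reordering in the HM 3.0 example above can be written as a short sketch. The function name and the string labels are hypothetical; only the before and after orders come from the text.

```python
def reorder_by_bl_choice(el_list, bl_choice):
    """Move the candidate selected by the BL to the front of the EL list."""
    if bl_choice not in el_list:
        return list(el_list)
    return [bl_choice] + [c for c in el_list if c != bl_choice]


default_order = ["left", "above", "co-located"]
print(reorder_by_bl_choice(default_order, "above"))
```

Because shorter indices are cheaper to code, moving the candidate chosen by the BL to the front reduces the expected cost of signalling the EL MVP index whenever the two layers agree.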

[0034] As mentioned above, the CU structure information of the BL can be used to determine the CU structure information for the EL. Furthermore, the CU structure information, mode information and motion information of the BL can be used jointly to determine the CU structure information, mode information and motion information for the EL. The mode information or motion information of the BL can also be used to determine the mode information or motion information for the EL. The process of deriving the CU structure information, mode information, motion information or any combination thereof for the EL based on the corresponding information of the BL can be invoked when the CU structure information, mode information, motion information or any combination thereof for the EL needs to be encoded or decoded.

[0035] In HM 3.0, the prediction residual is further processed using quadtree partitioning, and a coding type is selected for each block resulting from the residual quadtree partition. Both the residual quadtree partition information and the coded block pattern (CBP) information have to be incorporated into the bitstream so that the decoder can recover the residual quadtree information. An embodiment according to the present invention reuses the residual quadtree partition and CBP of the corresponding CU in the BL for the EL. The residual quadtree partition and CBP can be scaled and used as a predictor for EL residual quadtree partition coding and CBP coding. In HEVC, the unit for the block transform is termed the transform unit (TU), and a TU can be partitioned into smaller TUs. In an embodiment of the present invention, one flag at the root TU level or TU level of the EL is transmitted to indicate whether the residual quadtree (RQT) coding structure of the corresponding TU in the BL is used to predict the RQT structure of the current TU in the EL. If so, the RQT structure of the corresponding TU in the BL is scaled and used as the initial RQT structure of the current TU in the EL. At a leaf TU of the initial RQT structure for the EL, one split flag can be transmitted to indicate whether the TU is divided into sub-TUs. The process of deriving the RQT structure of the EL based on the RQT structure information of the BL is performed when an encoder needs to encode the RQT structure of the EL or a decoder needs to decode the RQT structure of the EL.

[0036] In the scalable extension of H.264/AVC, 4-tap and 2-tap FIR filters are adopted for upsampling the texture signal of the luma and chroma components, respectively. An embodiment according to the present invention resamples the BL texture as the predictor of the EL texture, where the resampling uses improved upsampling methods to replace the 4-tap and 2-tap FIR filters of the scalable extension of H.264/AVC. The filter according to the present invention uses one of the following filters or a combination of them: a discrete cosine transform interpolation filter (DCTIF), a discrete sine transform interpolation filter (DSTIF), a Wiener filter, a non-local mean filter, a smoothing filter, and a bilateral filter. The filter may cross TU boundaries or may be restricted to within TU boundaries. An embodiment according to the present invention may skip the padding and deblocking procedures in inter-layer intra prediction to alleviate the computational complexity and data dependency problems. The sample adaptive offset (SAO), adaptive loop filter (ALF), non-local mean filter, and/or smoothing filter in the BL may also be skipped. The skipping of padding, deblocking, SAO, ALF, the non-local mean filter, and the smoothing filter can be applied to an entire LCU, leaf CU, PU, TU, or pre-defined region, or to an LCU boundary, leaf CU boundary, PU boundary, TU boundary, or the boundary of a pre-defined region. In another embodiment, the BL texture is processed using a filter to form a filtered BL texture, where the BL texture has the same resolution as the EL texture and is used as the predictor of the EL texture. A Wiener filter, ALF, non-local mean filter, smoothing filter, or SAO can be applied to the BL texture before the BL texture is used as the predictor of the EL texture.
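
As an illustration of replacing a short FIR filter with a DCT-based interpolation filter, the following minimal sketch upsamples one row of BL samples by a factor of two. It assumes the 8-tap half-sample luma taps known from HEVC motion compensation; applying those taps to inter-layer upsampling, the helper name `upsample_row_2x`, and the edge clamping are assumptions of this sketch, not details taken from the description.

```python
# HEVC half-sample luma interpolation taps (a DCT-based interpolation filter).
# Using them for inter-layer upsampling is an assumption of this sketch.
HALF_PEL_TAPS = (-1, 4, -11, 40, 40, -11, 4, -1)  # taps sum to 64


def upsample_row_2x(row, lo=0, hi=255):
    """Upsample a 1-D row of BL samples by 2 (hypothetical helper)."""
    n = len(row)
    out = []
    for i in range(n):
        # Even output position: copy the co-located BL sample.
        out.append(row[i])
        # Odd output position: interpolate halfway between row[i] and row[i + 1].
        acc = 0
        for k, tap in enumerate(HALF_PEL_TAPS):
            # Taps cover input positions i-3 .. i+4; clamp at the row edges.
            j = min(max(i - 3 + k, 0), n - 1)
            acc += tap * row[j]
        # Round, normalise by 64, and clip to the sample range.
        out.append(min(max((acc + 32) >> 6, lo), hi))
    return out
```

Clamping the index `j` to the first and last sample of a TU instead of the whole row would be one way to keep the filter from crossing TU boundaries.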

[0037] To improve picture quality, an embodiment of the present invention applies a Wiener filter or an adaptive filter to the BL texture before the BL texture is resampled. Alternatively, the Wiener filter or adaptive filter may be applied to the BL texture after the BL texture is resampled. In addition, an embodiment of the present invention applies SAO or ALF to the BL texture before the BL texture is resampled.

[0038] Another embodiment according to the present invention uses a CU-based or LCU-based Wiener filter and/or adaptive offset for inter-layer intra prediction. The filtering can be applied to the BL texture data or to the upsampled BL texture data.

[0039] In H.264 SVC, a 2-tap FIR filter is adopted for upsampling the residual signal of both the luma and chroma components. An embodiment according to the present invention uses improved upsampling methods to replace the 2-tap FIR filter of H.264 SVC. The filter can be one of the following filters or a combination of them: a discrete cosine transform interpolation filter (DCTIF), a discrete sine transform interpolation filter (DSTIF), a Wiener filter, a non-local mean filter, a smoothing filter, and a bilateral filter. When the EL has higher spatial resolution than the BL, the above filters can be applied to resample the BL residual. Each of the above filters can be configured either to cross TU boundaries or not to cross them. Furthermore, the residual prediction can be performed in the spatial domain or in the frequency domain if the BL and the EL have the same resolution or the EL has higher resolution than the BL. When the EL has higher spatial resolution than the BL, the BL residual can be resampled in the frequency domain to form predictors for the EL residual. The process of deriving the EL residual predictor by resampling the BL residual can be performed when an encoder or a decoder needs to derive the EL residual predictor based on the resampled BL residual.
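
The choice between crossing and not crossing TU boundaries can be sketched as follows. The example uses a plain 2-tap average as a stand-in for the filter (it is not one of the improved filters listed above), works on a single row of residual samples, and assumes equally sized TUs; the function name and its parameters are hypothetical.

```python
def upsample_residual_2x(residual, tu_size, cross_tu=False):
    """2x upsampling of a 1-D residual row with a 2-tap averaging filter.

    When cross_tu is False, the right-hand neighbour is clamped to the last
    sample of the TU that contains the current sample, so no interpolated
    value mixes residuals from two different TUs.
    """
    n = len(residual)
    out = []
    for i in range(n):
        # Even output position: copy the BL residual sample.
        out.append(residual[i])
        if cross_tu:
            last = n - 1
        else:
            # Index of the last sample in the TU containing position i.
            last = min((i // tu_size + 1) * tu_size - 1, n - 1)
        neighbour = residual[min(i + 1, last)]
        # Odd output position: rounded average of the two neighbours.
        out.append((residual[i] + neighbour + 1) >> 1)
    return out
```

With `residual = [0, 0, 8, 8]` and `tu_size = 2`, the restricted variant keeps the step between the two TUs sharp, while `cross_tu=True` produces an intermediate value of 4 at the TU boundary.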

[0040] An embodiment according to the present invention may use BL information for context-adaptive entropy coding in the EL. For example, the context formation or binarization of CABAC (context-based adaptive binary arithmetic coding) may use BL information. The EL may use different context models, different context formation methods, or different context sets based on the corresponding information in the BL. For example, an EL PU may use different context models depending on whether the corresponding PU in the BL is coded in skip mode or not. In another embodiment of the present invention, the probability or most probable symbol (MPS) of part of the context models for CABAC in the BL can be reused to derive the initial probability and MPS of part of the context models for CABAC in the EL. The syntax element can be a split flag, skip flag, merge flag, merge index, chroma intra mode, luma intra mode, partition size, prediction mode, inter prediction direction, motion vector difference, motion vector predictor index, reference index, delta quantization parameter, significance flag, last significant position, coefficient-greater-than-one, coefficient-magnitude-minus-one, ALF (adaptive loop filter) control flag, ALF flag, ALF footprint size, ALF merge flag, ALF ON/OFF decision, ALF coefficient, sample adaptive offset (SAO) flag, SAO type, SAO offset, SAO merge flag, SAO run, SAO ON/OFF decision, transform subdivision flags, residual quadtree CBF (coded block flag), or residual quadtree root CBF. The codeword corresponding to the syntax elements can be adaptively changed according to the BL information, and the order of the codewords corresponding to the EL syntax elements in a codeword look-up table can also be adaptively changed according to the BL information. The process of determining context information for processing an EL syntax element using the BL information is performed when the EL syntax element needs to be encoded or decoded.
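
The two ideas in this paragraph, selecting a context model from the BL skip status and seeding EL context state from the BL, can be sketched as below. The `ContextModel` representation, the two-context layout for the skip flag, and both function names are assumptions made for illustration; the description does not specify them.

```python
from dataclasses import dataclass


@dataclass
class ContextModel:
    state: int  # probability state index (hypothetical representation)
    mps: int    # most probable symbol, 0 or 1


def select_skip_flag_context(bl_pu_is_skip: bool) -> int:
    """Pick one of two EL context indices from the co-located BL PU.

    Assumed layout: index 1 when the corresponding BL PU is coded in skip
    mode, index 0 otherwise.
    """
    return 1 if bl_pu_is_skip else 0


def init_el_context_from_bl(bl_ctx: ContextModel) -> ContextModel:
    """Seed an EL context with the probability state and MPS of a BL context."""
    return ContextModel(state=bl_ctx.state, mps=bl_ctx.mps)
```

The EL context is created as a separate object, so later adaptation in the EL does not alter the BL context it was initialized from.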

[0041] An embodiment of the present invention uses some ALF information in the BL to derive the ALF information in the EL. The ALF information may include the filter adaptation mode, filter coefficients, filter footprint, region partition, ON/OFF decision, enable flag, and merge results. For example, the EL may use part of the ALF parameters in the BL as the ALF parameters, or as predictors of the ALF parameters, in the EL. When the ALF information is reused directly from the ALF information of the BL, there is no need to transmit the associated ALF parameters for the EL. A flag can be used to indicate whether the ALF information for the EL is predicted from the ALF information of the BL. If the flag indicates that the ALF information for the EL is predicted from the ALF information of the BL, the ALF information of the BL can be scaled and used as the predictor for the ALF information of the EL. A value can be used to denote the difference between the ALF information predictor and the ALF information of the EL. The process of deriving the ALF information for the EL using the ALF information of the BL is performed when an encoder or a decoder needs to derive the ALF information of the EL.
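
The flag, predictor, and difference value described above can be sketched as a small reconstruction routine on the decoder side. The function name, the argument names, and the representation of the parameters as a flat list of integers with a single multiplicative scale are assumptions of this sketch.

```python
def reconstruct_el_params(bl_params, predict_flag, deltas=None,
                          explicit=None, scale=1):
    """Reconstruct EL filter parameters (hypothetical helper).

    predict_flag False: the EL parameters are signalled explicitly.
    predict_flag True:  the scaled BL parameters act as the predictor;
                        if no deltas are sent, the predictor is reused
                        directly, otherwise each delta is added to it.
    """
    if not predict_flag:
        if explicit is None:
            raise ValueError("explicit parameters required when not predicted")
        return list(explicit)
    predictor = [p * scale for p in bl_params]
    if deltas is None:
        return predictor
    return [p + d for p, d in zip(predictor, deltas)]
```

An encoder following the same assumptions would transmit `el - predictor` for each parameter, which is typically cheaper to code than the parameters themselves when the two layers are similar.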

[0042] An embodiment of the present invention uses some SAO information in the BL to derive the SAO information in the EL. The SAO information may include the offset type, offsets, region partition, ON/OFF decision, enable flag, and merge results. For example, the EL may use part of the SAO parameters in the BL as the SAO parameters for the EL. When the SAO information is reused directly from the SAO information of the BL, there is no need to transmit the associated SAO parameters for the EL. A flag can be used to indicate whether the SAO information for the EL is predicted from the SAO information of the BL. If the flag indicates that the SAO information for the EL is predicted from the SAO information of the BL, the SAO information of the BL can be scaled and used as the predictor for the SAO information of the EL. A value can be used to denote the difference between the SAO information predictor and the SAO information of the EL. The process of deriving the SAO information for the EL using the SAO information of the BL is performed when an encoder or a decoder needs to derive the SAO information of the EL.

[0043] An embodiment of the present invention uses some deblocking filter (DF) information in the BL to derive the DF information in the EL. The DF information may include threshold values, such as the thresholds α, β, and t_c, that are used to determine the boundary strength (BS). The DF information may also include filter parameters, the filter ON/OFF decision, the strong/weak filter selection, or the filtering strength. When the DF information is reused directly from the DF information of the BL, there is no need to transmit the associated DF parameters for the EL. A flag can be used to indicate whether the DF information for the EL is predicted from the DF information of the BL. If the flag indicates that the DF information for the EL is predicted from the DF information of the BL, the DF information of the BL can be scaled and used as the predictor for the DF information of the EL. A value can be used to denote the difference between the DF information predictor and the DF information of the EL. The process of deriving the DF information for the EL using the DF information of the BL is performed when an encoder or a decoder needs to derive the DF information of the EL.

[0044] FIGS. 4-11 illustrate exemplary flowcharts of scalable video coding according to various embodiments of the present invention. FIG. 4 illustrates an exemplary flowchart of CU structure coding or motion information coding for scalable video coding according to an embodiment of the present invention, in which video data is configured into a base layer (BL) and an enhancement layer (EL), and the EL has higher spatial resolution or better video quality than the BL. The CU structure (coding unit structure), the motion information, or a combination of the CU structure and the motion information for a CU (coding unit) in the BL is determined in step 410. The CU structure, the motion vector predictor (MVP) information, or a combination of the CU structure and the MVP information for the corresponding CU in the EL is determined in step 420 based on the CU structure, the motion information, or the combination of the CU structure and the motion information for the CU in the BL, respectively. FIG. 5 illustrates an exemplary flowchart of MVP derivation or merge candidate derivation for scalable video coding according to an embodiment of the present invention, in which video data is configured into the BL and the EL, and the EL has higher spatial resolution or better video quality than the BL. The motion information for the BL is determined in step 510. The MVP candidates or merge candidates in the EL are derived in step 520 based on the motion information of the BL. FIG. 6 illustrates an exemplary flowchart of intra prediction mode derivation for scalable video coding according to an embodiment of the present invention, in which video data is configured into the BL and the EL, and the EL has higher spatial resolution or better video quality than the BL. The information on the intra prediction mode of the BL is determined in step 610. The intra prediction mode of the EL is derived in step 620 based on the information on the intra prediction mode of the BL.
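
Steps 510 and 520 of FIG. 5 can be sketched as follows for the case where the EL has higher spatial resolution than the BL. The sketch assumes that the BL motion vector is scaled by the resolution ratio and placed first in the EL candidate list, ahead of the usual EL spatial candidates; the candidate ordering, the list size, and the function names are illustrative assumptions rather than details given in this paragraph.

```python
def scale_bl_mv(bl_mv, ratio_x, ratio_y):
    """Scale a BL motion vector (mvx, mvy) to EL resolution.

    Python's round() is used here for brevity (round-half-to-even);
    a real codec would define exact integer arithmetic instead.
    """
    mvx, mvy = bl_mv
    return (round(mvx * ratio_x), round(mvy * ratio_y))


def build_el_mvp_candidates(bl_mv, spatial_candidates,
                            ratio_x=2, ratio_y=2, max_candidates=2):
    """Derive EL MVP candidates from BL motion information (step 520)."""
    # The BL-derived candidate goes first (an assumption of this sketch).
    candidates = [scale_bl_mv(bl_mv, ratio_x, ratio_y)]
    for mv in spatial_candidates:
        # Prune duplicates so every candidate index is useful.
        if mv not in candidates:
            candidates.append(mv)
    return candidates[:max_candidates]
```

For example, with 2x spatial scalability a BL vector `(3, -2)` becomes the EL candidate `(6, -4)`, and an identical EL spatial candidate is pruned from the list.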

[0045] FIG. 7 illustrates an exemplary flowchart of residual quadtree structure coding for scalable video coding according to an embodiment of the present invention, in which video data is configured into a base layer (BL) and an enhancement layer (EL), and the EL has higher spatial resolution or better video quality than the BL. The information on the RQT (residual quadtree coding) structure of the BL is determined in step 710. The RQT structure of the EL is derived in step 720 based on the information on the RQT structure of the BL. FIG. 8 illustrates an exemplary flowchart of texture prediction and resampling for scalable video coding according to an embodiment of the present invention, in which video data is configured into the BL and the EL, and the EL has higher spatial resolution or better video quality than the BL. The information on the texture of the BL is determined in step 810. The predictor of the EL texture is derived in step 820 based on the information on the texture of the BL. FIG. 9 illustrates an exemplary flowchart of residual prediction and resampling for scalable video coding according to an embodiment of the present invention, in which video data is configured into the BL and the EL, and the EL has higher spatial resolution or better video quality than the BL. The residual information of the BL is determined in step 910. The predictor of the EL residual is derived in step 920 by resampling the BL residual.

[0046] FIG. 10 illustrates an exemplary flowchart of context-adaptive entropy coding for scalable video coding according to an embodiment of the present invention, in which video data is configured into a base layer (BL) and an enhancement layer (EL), and the EL has higher spatial resolution or better video quality than the BL. The information of the BL is determined in step 1010. The context information for processing a syntax element of the EL is determined in step 1020 using the information of the BL. FIG. 11 illustrates an exemplary flowchart of ALF information coding, SAO information coding, and DF information coding for scalable video coding according to an embodiment of the present invention, in which video data is configured into the BL and the EL, and the EL has higher spatial resolution or better video quality than the BL. The ALF information, SAO information, or DF information of the BL is determined in step 1110. The ALF information, SAO information, or DF information for the EL is derived in step 1120 using the ALF information, SAO information, or DF information of the BL, respectively.

[0047] Embodiments of scalable video coding in which the enhancement layer coding exploits the information of the base layer according to the present invention, as described above, may be implemented in various hardware, software code, or a combination of both. For example, an embodiment of the present invention can be a circuit integrated into a video compression chip, or program code integrated into video compression software, to perform the processing described herein. An embodiment of the present invention may also be program code to be executed on a digital signal processor (DSP) to perform the processing described herein. The invention may also involve a number of functions to be performed by a computer processor, a digital signal processor, a microprocessor, or a field-programmable gate array (FPGA). These processors can be configured to perform particular tasks according to the invention by executing machine-readable software code or firmware code that defines the particular methods embodied by the invention. The software code or firmware code may be developed in different programming languages and in different formats or styles. The software code may also be compiled for different target platforms. However, different code formats, styles, and languages of software code, and other means of configuring code to perform the tasks in accordance with the invention, do not depart from the spirit and scope of the invention.

[0048] The invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described examples are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is therefore indicated by the appended claims rather than by the foregoing description. All changes that come within the meaning and range of equivalency of the claims are to be embraced within their scope.

Claims

1. A method of coding a CU (coding unit) structure, coding mode information, or coding motion information for scalable video coding, wherein video data is configured into a base layer (BL) and an enhancement layer (EL), and wherein the EL has higher spatial resolution or better video quality than the BL, the method comprising the steps of:
determining a CU structure, a mode, motion information, or a combination of the CU structure, the mode, and the motion information for a CU in the BL; and
determining a CU structure, a mode, motion vector predictor (MVP) information, or a combination of the CU structure, the mode, and the MVP information for a corresponding CU in the EL based on the CU structure, the mode, the motion information, or the combination of the CU structure, the mode, and the motion information for the CU in the BL, respectively;
wherein the mode is a skip mode, a merge mode, or an intra mode.

2. The method of claim 1, wherein said determining the CU structure, the mode, the MVP information, or the combination of the CU structure, the mode, and the MVP information for the corresponding CU in the EL based on the CU structure, the mode, the motion information, or the combination of the CU structure, the mode, and the motion information for the CU in the BL, respectively, is performed when an encoder needs to encode the CU structure, the mode, the MVP information, or the combination of the CU structure, the mode, and the MVP information, respectively, for the corresponding CU in the EL.

3. The method of claim 1, wherein said determining the CU structure, the mode, the MVP information, or the combination of the CU structure, the mode, and the MVP information for the corresponding CU in the EL based on the CU structure, the mode, the motion information, or the combination of the CU structure, the mode, and the motion information for the CU in the BL, respectively, is performed when a decoder needs to decode the CU structure, the mode, the MVP information, or the combination of the CU structure, the mode, and the MVP information, respectively, for the corresponding CU in the EL.

4. The method of claim 1, further comprising incorporating a first flag to indicate whether the CU structure, the mode, the MVP information, or the combination of the CU structure, the mode, and the MVP information for the corresponding CU in the EL is predicted based on the CU structure, the mode, the motion information, or the combination of the CU structure, the mode, and the motion information for the CU in the BL, respectively.

5. The method of claim 4, wherein the CU structure for the corresponding CU in the EL is scaled from the CU structure for the CU in the BL and is used as an initial CU structure for the corresponding CU in the EL if the first flag indicates that said determining of the CU structure for the corresponding CU in the EL is predicted based on the CU structure for the CU in the BL.

6. The method of claim 5, wherein a split flag is incorporated to indicate whether a leaf CU of the corresponding CU in the EL is split into sub-CUs.

7. The method of claim 4, wherein the CU structure, the mode and the MVP information for the corresponding CU in the EL are scaled from the CU structure, the mode and the motion information for the CU in the BL if the first flag indicates that said determining of the CU structure, the mode and the MVP information for the corresponding CU in the EL is predicted based on the CU structure, the mode and the motion information for the CU in the BL.

8. The method of claim 1, wherein the CU structure is a CU split flag or a residual quad-tree split flag, and the motion information comprises one or a combination of an inter prediction direction, a reference picture index, a motion vector, a merge index and an MVP index, when said determining of the combination of the CU structure, the mode and the MVP information for the corresponding CU in the EL is based on the combination of the CU structure, the mode and the motion information for the CU in the BL, respectively.

9. The method of claim 1, wherein the CU is a leaf CU, and wherein said determining of the mode or the MVP information for the corresponding CU in the EL is based on the mode or the motion information, respectively, for the CU in the BL.

10. The method of claim 9, wherein said determining of the mode or the MVP information for the corresponding CU in the EL based on the mode or the motion information for the CU in the BL, respectively, is performed when an encoder needs to encode the mode or the MVP information, respectively, for the corresponding CU in the EL.

11. The method of claim 9, wherein said determining of the mode or the MVP information for the corresponding CU in the EL based on the mode or the motion information for the CU in the BL, respectively, is performed when a decoder needs to decode the mode or the MVP information, respectively, for the corresponding CU in the EL.

12. The method of claim 9, further comprising incorporating a first flag to indicate whether said determining of the mode or the MVP information for the corresponding leaf CU in the EL is predicted based on the mode or the motion information, respectively, for the leaf CU in the BL.

13. The method of claim 12, wherein the mode or the MVP information for the corresponding leaf CU in the EL is scaled from the CU structure for the leaf CU in the BL if the first flag indicates that said determining of the mode or the MVP information for the corresponding leaf CU in the EL is predicted based on the mode or the motion information for the leaf CU in the BL.

14. The method of claim 1, wherein the MVP information comprises one or a combination of an MVP candidate list, an MVP candidate, an order of the MVP candidate list, a merge candidate list, a merge candidate, an order of the merge candidate list, a merge index, and an MVP index.

15. An apparatus for coding a CU (coding unit) structure, coding mode information, or coding motion information for scalable video coding, in which video data is configured into a base layer (BL) and an enhancement layer (EL), wherein the EL has higher spatial resolution or better video quality than the BL, the apparatus comprising:
means for determining a CU structure (coding unit structure), a mode, motion information, or a combination of the CU structure, the mode and the motion information for a CU (coding unit) in the BL; and
means for determining a CU structure, a mode, motion vector predictor (MVP) information, or a combination of the CU structure, the mode and the MVP information for a corresponding CU in the EL based on the CU structure, the mode, the motion information, or the combination of the CU structure, the mode and the motion information for the CU in the BL, respectively;
wherein the mode is a skip mode, a merge mode, or an intra mode.

16. The apparatus of claim 15, further comprising means for incorporating a first flag to indicate whether said determining of the CU structure, the mode, the MVP information, or the combination of the CU structure, the mode and the MVP information for the corresponding CU in the EL is predicted based on the CU structure, the mode, the motion information, or the combination of the CU structure, the mode and the motion information for the CU in the BL, respectively.

17. The apparatus of claim 15, wherein the CU structure is a CU split flag or a residual quad-tree split flag, and wherein the motion information comprises one or a combination of an inter prediction direction, a reference picture index, a motion vector, a merge index and an MVP index, when said determining of the combination of the CU structure, the mode and the MVP information for the corresponding CU in the EL is based on the combination of the CU structure, the mode and the motion information for the CU in the BL, respectively.

18. The apparatus of claim 15, wherein the CU is a leaf CU, and wherein said determining of the mode and the MVP information for the corresponding CU in the EL is based on the mode and the motion information, respectively, for the CU in the BL.

19. A method of deriving motion vector predictor (MVP) candidates or merge candidates for scalable video coding, in which video data is configured into a base layer (BL) and an enhancement layer (EL), wherein the EL has higher spatial resolution or better video quality than the BL, the method comprising:
determining motion information in the BL; and
deriving MVP candidates or merge candidates in the EL based on the motion information in the BL.

20. The method of claim 19, wherein said deriving of the MVP candidates or the merge candidates in the EL based on the motion information in the BL is performed when encoding or decoding of the video data requires deriving the MVP candidates or the merge candidates, respectively, in the EL.

21. The method of claim 19, wherein the MVP candidate list for the EL includes at least one motion vector (MV) in the BL.

22. The method of claim 21, wherein the MV in the BL comprises an MV of a corresponding prediction unit (PU) in the BL, an MV of a neighboring PU of the corresponding PU in the BL, an MV of a merge candidate of the corresponding PU in the BL, an MVP of the corresponding PU in the BL, or a co-located MV of the corresponding PU in the BL.

23. The method of claim 21, wherein the MV in the BL is scaled up for the MVP candidate list according to the video resolution ratio of the EL to the BL.

24. The method of claim 19, wherein at least one motion vector (MV) in the BL replaces at least one MVP candidate of the MVP candidate list for the EL or is added to the MVP candidate list for the EL.

25. The method of claim 24, wherein the MV in the BL comprises an MV of a corresponding PU in the BL, an MV of a neighboring PU of the corresponding PU in the BL, an MV of a merge candidate of the corresponding PU in the BL, an MVP of the corresponding PU in the BL, or a co-located MV of the corresponding PU in the BL.

26. The method of claim 24, wherein the MV in the BL is scaled up for the MVP candidate list according to the video resolution ratio of the EL to the BL.

27. An apparatus for deriving motion vector predictor (MVP) candidates or merge candidates for scalable video coding, in which video data is configured into a base layer (BL) and an enhancement layer (EL), wherein the EL has higher spatial resolution or better video quality than the BL, the apparatus comprising:
means for determining motion information in the BL; and
means for deriving MVP candidates or merge candidates in the EL based on the motion information in the BL.

28. The apparatus of claim 27, wherein at least one motion vector in the BL replaces at least one MVP candidate of the MVP candidate list for the EL or is added to the MVP candidate list for the EL.

29. A method of deriving an intra prediction mode for scalable video coding, in which video data is configured into a base layer (BL) and an enhancement layer (EL), wherein the EL has higher spatial resolution or better video quality than the BL, the method comprising:
determining intra prediction mode information of the BL; and
deriving an intra prediction mode of the EL based on the intra prediction mode information of the BL;
wherein the intra prediction mode of the BL is added to a most probable mode (MPM) list for the EL.

30. The method of claim 29, wherein said deriving of the intra prediction mode of the EL based on the intra prediction mode information of the BL is performed when an encoder needs to encode the intra prediction mode of the EL.

31. The method of claim 29, wherein said deriving of the intra prediction mode of the EL based on the intra prediction mode information of the BL is performed when a decoder needs to decode the intra prediction mode of the EL.

32. The method of claim 29, wherein the intra prediction mode information of the BL comprises one or a combination of an intra prediction mode of a corresponding prediction unit (PU) in the BL, a neighboring direction mode of the intra prediction mode of the BL, and an intra prediction mode of a neighboring PU of the corresponding PU in the BL.

33. The method of claim 29, wherein the order of the most probable mode (MPM) list for the EL changes adaptively according to the intra prediction mode information of the BL.

34. The method of claim 29, wherein a codeword for a remaining mode associated with the intra prediction mode of the EL depends on the prediction direction of the remaining mode, and the codeword is shorter if the prediction direction of the remaining mode is closer to the prediction direction of the intra prediction mode of the BL.

35. The method of claim 29, wherein the intra prediction mode is an intra prediction mode of a luminance signal or an intra prediction mode of a chroma signal.

36. An apparatus for deriving an intra prediction mode for scalable video coding, in which video data is configured into a base layer (BL) and an enhancement layer (EL), wherein the EL has higher spatial resolution or better video quality than the BL, the apparatus comprising:
means for determining intra prediction mode information of the BL; and
means for deriving an intra prediction mode of the EL based on the intra prediction mode information of the BL;
wherein the intra prediction mode of the BL is added to a most probable mode (MPM) list for the EL.
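
Illustrative note (not part of the claims). The inter-layer derivations recited in claims 21 to 26 and in claim 29 can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names, the list sizes (`max_size`), the truncating rounding of the scaled vector, the removal of duplicates, and the choice to place the BL-derived entry first are all hypothetical, since the claims only state that the BL motion vector is scaled by the EL-to-BL resolution ratio and replaces or is added to the EL candidate list, and that the BL intra prediction mode is added to the EL MPM list.

```python
def scale_bl_mv(mv, ratio):
    """Scale a BL motion vector (x, y) by the EL/BL resolution ratio.

    Truncation toward zero via int() is an assumption; the claims do not
    specify a rounding rule.
    """
    return (int(mv[0] * ratio), int(mv[1] * ratio))


def build_el_mvp_list(el_candidates, bl_mv, ratio, max_size=2, replace=False):
    """Add the scaled BL MV to the EL MVP candidate list, or replace a candidate.

    Assumptions: the BL-derived candidate goes first, duplicates are dropped,
    and the list is cut to a hypothetical max_size.
    """
    scaled = scale_bl_mv(bl_mv, ratio)
    if replace and el_candidates:
        merged = [scaled] + list(el_candidates[1:])  # replace the first candidate
    else:
        merged = [scaled] + list(el_candidates)      # add to the list
    unique = []
    for cand in merged:
        if cand not in unique:
            unique.append(cand)
    return unique[:max_size]


def build_el_mpm_list(el_mpms, bl_mode, max_size=3):
    """Add the BL intra prediction mode to the EL most probable mode list.

    Placing the BL mode first and the max_size of 3 are assumptions.
    """
    merged = [bl_mode] + [m for m in el_mpms if m != bl_mode]
    return merged[:max_size]


if __name__ == "__main__":
    # 2x spatial scalability: the BL vector (3, -1) is scaled to (6, -2).
    print(build_el_mvp_list([(0, 0), (1, 1)], (3, -1), 2))
    # Mode numbers here are arbitrary placeholders.
    print(build_el_mpm_list([0, 1, 26], 10))
```

With `replace=True` the scaled BL vector takes the place of the first EL candidate instead of being prepended, which corresponds to the "replaces ... or is added" alternative in claim 24.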