WO2016098280A1

WO2016098280A1 - Video encoding apparatus, video decoding apparatus and video delivery system

Info

Publication number: WO2016098280A1
Application number: PCT/JP2015/005758
Authority: WO
Inventors: Keiichi Chono (蝶野慶一)
Original assignee: NEC Corporation (日本電気株式会社)
Priority date: 2014-12-16
Filing date: 2015-11-18
Publication date: 2016-06-23
Also published as: JPWO2016098280A1

Abstract

A video encoding apparatus comprises: a low resolution layer encoding means that encodes a low resolution video and outputs a low resolution layer bitstream; a high resolution layer encoding means that encodes a high resolution video and outputs a high resolution layer bitstream; and a multiplexing means that multiplexes the low resolution layer bitstream and the high resolution layer bitstream and outputs a scalable bitstream. The video encoding apparatus is provided with an auxiliary information generating means that multiplexes, with the scalable bitstream, auxiliary information required for identifying an area of interest included in the high resolution layer bitstream.

Description

Video encoding device, video decoding device, and video distribution system

The present invention relates to a video encoding device, a video decoding device, and a video distribution system that use a scalable encoding system.

In the video coding scheme based on Scalable High Efficiency Video Coding (SHVC) described in Non-Patent Document 1, a low-resolution video obtained by downsampling an input image is encoded as a low-resolution layer (BL: Base Layer), and the input image itself is encoded as a high-resolution layer (EL: Enhancement Layer).

Each frame of the video at the BL resolution and each frame of the video at the EL resolution is divided into coding tree units (CTUs). The CTUs are processed in raster-scan order. Each CTU is recursively split into coding units (CUs) in a quadtree structure and then encoded.

FIG. 12 is an explanatory diagram showing an example of the CTU partitioning of frame t when the spatial resolution of the frame is CIF (Common Intermediate Format) and the CTU size is 64. It also shows an example of the recursive CU partitioning of CTU8 in frame t. In the example shown in FIG. 12, the resolution of the EL video is assumed to be 352 × 288 pixels and the resolution of the BL video 176 × 144 pixels.
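The CTU grid in FIG. 12 can be checked with a few lines of arithmetic. The sketch below counts the CTU columns and rows covering a frame, where partial CTUs at the right and bottom edges still occupy a grid position. It also maps a raster-scan CTU address to the pixel position of that CTU's top-left corner. Two points are assumptions: the function names are hypothetical, and addresses are taken to start at 0.

```python
def ctu_grid(width: int, height: int, ctu_size: int) -> tuple[int, int]:
    """Return (columns, rows) of CTUs covering a width x height frame.

    Partial CTUs at the right and bottom edges still occupy a CTU position,
    so both dimensions are rounded up.
    """
    cols = (width + ctu_size - 1) // ctu_size
    rows = (height + ctu_size - 1) // ctu_size
    return cols, rows


def ctu_origin(address: int, width: int, height: int, ctu_size: int) -> tuple[int, int]:
    """Map a 0-based raster-scan CTU address to the (x, y) of its top-left pixel."""
    cols, rows = ctu_grid(width, height, ctu_size)
    if not 0 <= address < cols * rows:
        raise ValueError(f"CTU address {address} is outside 0..{cols * rows - 1}")
    return (address % cols) * ctu_size, (address // cols) * ctu_size
```

For the FIG. 12 setting, the results are as follows:
- `ctu_grid(352, 288, 64)` gives `(6, 5)`, that is, 30 EL CTUs.
- `ctu_grid(176, 144, 64)` gives `(3, 3)` for the BL.
- `ctu_origin(8, 352, 288, 64)` places CTU8 at pixel `(128, 64)`.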

In SHVC, the CU is the unit to which intra prediction, inter-frame prediction, and inter-layer prediction are applied. These three types of prediction are described below.

Intra prediction generates a prediction image from the reconstructed image of the frame being encoded. Non-Patent Document 1 defines the 33 types of angular intra prediction shown in FIG. 13. In angular intra prediction, the reconstructed pixels surrounding the block being encoded are extrapolated in one of the 33 directions shown in FIG. 13 to generate an intra prediction signal.
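The extrapolation step can be illustrated with the HEVC angular formula. HEVC is the base specification that SHVC extends. The sketch below is simplified and covers only vertical-direction modes with a non-negative `intraPredAngle`, which is 0 for pure vertical (mode 26) and 32 for the diagonal (mode 34). Each predicted sample is a two-tap, 1/32-sample interpolation of the reconstructed row above the block.

Several parts are deliberately left out:
- negative angles, which need a projected left column;
- horizontal modes, which are the transposed case;
- the boundary smoothing filters.

The function name is hypothetical.

```python
def angular_predict_vertical(ref: list[int], n: int, angle: int) -> list[list[int]]:
    """Simplified HEVC-style angular intra prediction for an n x n block.

    ref[0] is the top-left corner sample; ref[1..2n] are the reconstructed
    samples above and above-right of the block.
    angle is intraPredAngle in 1/32-sample units, restricted here to 0..32.
    Returns pred[y][x].
    """
    if not 0 <= angle <= 32:
        raise ValueError("this sketch only covers 0 <= angle <= 32")
    if len(ref) < 2 * n + 1:
        raise ValueError("need 2n + 1 reference samples")
    pred = [[0] * n for _ in range(n)]
    for y in range(n):
        idx = ((y + 1) * angle) >> 5    # whole-sample offset along the top row
        fact = ((y + 1) * angle) & 31   # fractional offset in 1/32 units
        for x in range(n):
            a = ref[x + idx + 1]
            # With no fractional part the second tap has zero weight; skip it
            # so the diagonal case never reads past ref[2n].
            b = ref[x + idx + 2] if fact else a
            pred[y][x] = ((32 - fact) * a + fact * b + 16) >> 5
    return pred
```

With `angle=0`, every row copies the samples directly above the block. With `angle=32`, row `y` is shifted `y + 1` samples to the right along the reference row.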

Inter-frame prediction is prediction based on the image of a reconstructed frame (reference picture) whose display time differs from that of the frame being encoded. Hereinafter, inter-frame prediction is also referred to as inter prediction. FIG. 14 is an explanatory diagram illustrating an example of inter-frame prediction. The motion vector MV = (mv_x, mv_y) indicates the translational displacement of the reconstructed image block of the reference picture relative to the block being encoded. Inter prediction generates an inter prediction signal from the reconstructed image block of the reference picture, using pixel interpolation if necessary.
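A minimal sketch of the displacement described above, under two assumptions. First, the motion vector is in whole samples; HEVC actually uses quarter-sample luma vectors with interpolation filters, which are omitted here. Second, reference coordinates outside the picture are clamped to the nearest edge sample, mimicking picture-boundary padding. The function name is hypothetical.

```python
def motion_compensate(ref_pic: list[list[int]], x0: int, y0: int,
                      w: int, h: int, mv: tuple[int, int]) -> list[list[int]]:
    """Integer-sample inter prediction of the w x h block at (x0, y0).

    ref_pic is a reconstructed reference picture as rows of samples and
    mv = (mv_x, mv_y) is the displacement in whole samples.
    """
    mv_x, mv_y = mv
    pic_h, pic_w = len(ref_pic), len(ref_pic[0])

    def clamp(v: int, limit: int) -> int:
        # Repeat edge samples for positions outside the picture.
        return max(0, min(v, limit - 1))

    return [[ref_pic[clamp(y0 + y + mv_y, pic_h)][clamp(x0 + x + mv_x, pic_w)]
             for x in range(w)]
            for y in range(h)]
```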

Inter-layer prediction is prediction based on an upsampled image of an encoded BL frame, and is classified as a type of inter prediction. FIG. 15 is an explanatory diagram showing inter-layer prediction. In inter-layer prediction, the encoded BL frame is upsampled to the same resolution as the EL frame, and an inter-layer prediction signal is generated from this upsampled image by inter-frame prediction.
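The inter-layer path can be sketched as "upsample the BL reconstruction, then predict from it with a zero motion vector". The sketch makes two assumptions:
- a 2:1 spatial ratio, as in the CIF/QCIF example;
- nearest-neighbour sample repetition as a placeholder resampling filter. SHVC specifies its own multi-tap upsampling filters, which are not reproduced here.

Both function names are hypothetical.

```python
def upsample_2x_nearest(bl: list[list[int]]) -> list[list[int]]:
    """Double the width and height of a 2-D sample array by repeating samples."""
    return [[row[x // 2] for x in range(2 * len(row))]
            for row in bl
            for _ in range(2)]  # emit each upsampled row twice


def inter_layer_prediction(bl_recon: list[list[int]], x0: int, y0: int,
                           w: int, h: int) -> list[list[int]]:
    """Predict the EL block at (x0, y0) from the upsampled BL picture (zero MV)."""
    up = upsample_2x_nearest(bl_recon)
    return [row[x0:x0 + w] for row in up[y0:y0 + h]]
```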

Hereinafter, a CU using intra prediction is called an intra CU, and a CU using inter prediction or inter-layer prediction is called an inter CU.

Next, with reference to FIG. 16, the configuration and operation of a general video encoding apparatus using the scalable encoding method will be described.

The video encoding device shown in FIG. 16 includes three components:
- a BL encoder 101 that encodes each CTU of the BL frame;
- an EL encoder 102 that encodes each CTU of the EL frame;
- a multiplexer 103 that multiplexes the bitstream of the BL frame and the bitstream of the EL frame.

The BL encoder 101 encodes each CTU of the BL frame based on intra prediction and inter-frame prediction, and outputs a BL bitstream. The EL encoder 102 uses the upsampled image of the BL frame encoded by the BL encoder 101 as a reference image. It encodes each CTU of the EL frame based on intra prediction, inter-frame prediction, and inter-layer prediction, and outputs an EL bitstream.

Note that a downsampler (not shown) placed upstream of the apparatus shown in FIG. 16 downsamples the input image to generate the BL video.

The multiplexer 103 multiplexes the BL bit stream and the EL bit stream, and outputs a scalable bit stream.

Suppose the video encoding device described above is used in an application where the receiving side reproduces only the high-resolution video of a region of interest within the screen. The following two problems then arise.
・ Compressed data outside the region of interest must also be distributed, so the transmission band cannot be used effectively.
・ Compressed data outside the region of interest must also be decoded, so extra decoding processing occurs for regions that are not needed for viewing.

An object of the present invention is to use the transmission band effectively and to suppress extra decoding processing for regions that are not needed for viewing.

A video encoding apparatus according to the present invention includes:
- low-resolution layer encoding means for encoding a low-resolution video and outputting a low-resolution layer bitstream;
- high-resolution layer encoding means for encoding a high-resolution video and outputting a high-resolution layer bitstream;
- multiplexing means for multiplexing the low-resolution layer bitstream and the high-resolution layer bitstream to output a scalable bitstream.

The apparatus further includes auxiliary information generating means for generating auxiliary information necessary for identifying the region of interest included in the high-resolution layer bitstream. The multiplexing means multiplexes the auxiliary information into the scalable bitstream.

A video decoding apparatus according to the present invention includes:
- separating means for separating a low-resolution layer bitstream and a high-resolution layer bitstream from a scalable bitstream;
- low-resolution layer decoding means for decoding the low-resolution layer bitstream and outputting a low-resolution video;
- high-resolution layer decoding means for decoding the high-resolution layer bitstream and outputting a high-resolution video.

The apparatus further includes auxiliary information decoding means for decoding, from the scalable bitstream, auxiliary information necessary for identifying the region of interest included in the high-resolution layer bitstream.

A video encoding method according to the present invention performs three steps:
- encoding a low-resolution video and outputting a low-resolution layer bitstream;
- encoding a high-resolution video and outputting a high-resolution layer bitstream;
- multiplexing the low-resolution layer bitstream and the high-resolution layer bitstream to output a scalable bitstream.

The method is characterized by generating auxiliary information necessary for identifying the region of interest included in the high-resolution layer bitstream, and multiplexing that auxiliary information into the scalable bitstream.

A video decoding method according to the present invention performs three steps:
- separating a low-resolution layer bitstream and a high-resolution layer bitstream from a scalable bitstream;
- decoding the low-resolution layer bitstream and outputting a low-resolution video;
- decoding the high-resolution layer bitstream and outputting a high-resolution video.

The method is characterized by decoding, from the scalable bitstream, auxiliary information necessary for identifying the region of interest included in the high-resolution layer bitstream.

A video encoding program according to the present invention causes a computer to execute the following processes:
- encoding a low-resolution video and outputting a low-resolution layer bitstream;
- encoding a high-resolution video and outputting a high-resolution layer bitstream;
- multiplexing the low-resolution layer bitstream and the high-resolution layer bitstream and outputting a scalable bitstream.

The program further causes the computer to generate auxiliary information necessary for identifying the region of interest included in the high-resolution layer bitstream, and to multiplex that auxiliary information into the scalable bitstream.

A video decoding program according to the present invention causes a computer to execute the following processes:
- separating a low-resolution layer bitstream and a high-resolution layer bitstream from a scalable bitstream;
- decoding the low-resolution layer bitstream and outputting a low-resolution video;
- decoding the high-resolution layer bitstream and outputting a high-resolution video.

The program further causes the computer to decode, from the scalable bitstream, auxiliary information necessary for identifying the region of interest included in the high-resolution layer bitstream.

A video distribution system according to the present invention includes the above video encoding device and the above video decoding device.

A video distribution method according to the present invention executes the above video encoding method and the above video decoding method.

According to the present invention, the video encoding device does not need to distribute the bitstream for regions other than the region of interest, so the transmission band is used effectively. In addition, the video decoding device does not need to decode the bitstream for regions other than the region of interest, so extra decoding processing for regions not needed for viewing is suppressed.

FIG. 1 is a block diagram showing the video encoding apparatus of the first embodiment.
FIG. 2 is an explanatory diagram showing an example of an EL video and a region of interest.
FIG. 3 is an explanatory diagram of el_roi_descriptor().
FIG. 4 is an explanatory diagram showing an example of an EL bitstream.
FIG. 5 is a flowchart showing the operation of the video encoding apparatus of the first embodiment.
FIG. 6 is a block diagram showing the video decoding apparatus of the second embodiment.
FIG. 7 is a flowchart showing the operation of the video decoding apparatus of the second embodiment.
FIG. 8 is a block diagram showing an example of a video distribution system.
FIG. 9 is a block diagram showing a configuration example of an information processing system that can implement the functions of the video encoding apparatus and the video decoding apparatus according to the present invention.
FIG. 10 is a block diagram showing the main part of the video encoding apparatus according to the present invention.
FIG. 11 is a block diagram showing the main part of the video decoding apparatus according to the present invention.
FIG. 12 is an explanatory diagram showing an example of the CTU partitioning of frame t and the recursive CU partitioning of CTU8 in frame t.
FIG. 13 is an explanatory diagram showing the 33 types of angular intra prediction.
FIG. 14 is an explanatory diagram showing an example of inter-frame prediction.
FIG. 15 is an explanatory diagram showing inter-layer prediction.
FIG. 16 is an explanatory diagram showing the configuration of a general video encoding apparatus.

Embodiment 1.
FIG. 1 is a block diagram showing the configuration of the video encoding apparatus of this embodiment. The video encoding apparatus shown in FIG. 1 includes four components:
- a BL encoder 101 that encodes each CTU of a BL frame;
- an EL encoder 102 that encodes each CTU of an EL frame;
- a multiplexer 103 that multiplexes the bitstream of the BL frame and the bitstream of the EL frame;
- an auxiliary information generator 104 that multiplexes, into the scalable bitstream, auxiliary information necessary for identifying the region of interest of the EL frame included in the EL bitstream.

The BL encoder 101 encodes each CTU of the BL frame based on intra prediction and inter-frame prediction, and outputs a BL bit stream in the same manner as the above-described general BL encoder (see FIG. 16).

The EL encoder 102 uses the upsampled image of the BL frame encoded by the BL encoder 101 as a reference image. It encodes each CTU of the EL frame based on intra prediction, inter-frame prediction, and inter-layer prediction, and outputs an EL bitstream. However, unlike the general EL encoder described above (see FIG. 16), the EL encoder 102 controls two things based on the coordinates of an externally set EL region of interest: the prediction type used for each CTU, and the EL bitstream it outputs.

The scalability-related control in this embodiment is described below using the example EL image and region of interest shown in FIG. 2.

If the EL region of interest does not include the CTU of the EL frame being encoded, the EL encoder 102 simply fills the high-resolution image of that CTU with the inter-layer prediction image and outputs no EL bitstream for it. For that CTU, the encoder operates as if the following syntax elements had been multiplexed into the bitstream and transmitted. As is apparent from FIG. 4(b), they are not actually multiplexed.
- cu_split_flag, with the value corresponding to CU partitioning at the maximum size permitted by the relationship between the EL resolution and the CTU size.
- part_mode, with the value corresponding to the 2N×2N prediction unit shape.
- pred_mode_flag, merge_flag, merge_idx, inter_pred_idc, ref_idx_l0, ref_idx_l1, mvp_l0_flag, mvp_l1_flag, and mvd_coding(), with values corresponding to inter-layer prediction with a zero motion vector.
- cbf_luma, cbf_cb, and cbf_cr, with values indicating that there is no prediction error.

Details of the syntax are described in Non-Patent Document 2.
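The phrase "CU partitioning at the maximum size permitted" is read here through the HEVC picture-boundary rule. Under that rule, a CU that crosses the right or bottom picture edge must be split while it is larger than the minimum CU size. A CU lying fully inside the picture can stay whole. Quadrants whose top-left corner falls outside the picture are not coded at all.

The source does not spell out the partition, so this reading is an assumption, as are the 8×8 minimum CU size and the function name. The sketch lists the resulting CUs for one CTU in z-order.

```python
def max_size_cus(x0: int, y0: int, size: int, pic_w: int, pic_h: int,
                 min_cb: int = 8) -> list[tuple[int, int, int]]:
    """Largest CUs (x, y, size) covering one CTU, split only at picture edges."""
    if x0 >= pic_w or y0 >= pic_h:
        return []                       # quadrant lies outside the picture: not coded
    if (x0 + size <= pic_w and y0 + size <= pic_h) or size <= min_cb:
        return [(x0, y0, size)]         # fits inside the picture: keep the CU whole
    half = size // 2
    cus: list[tuple[int, int, int]] = []
    for dy in (0, half):                # z-order: top-left, top-right,
        for dx in (0, half):            # bottom-left, bottom-right
            cus += max_size_cus(x0 + dx, y0 + dy, half, pic_w, pic_h, min_cb)
    return cus
```

For a 352 × 288 EL picture with 64 × 64 CTUs:
- interior CTUs stay as a single 64 × 64 CU;
- the right-edge CTU at (320, 0) becomes two 32 × 32 CUs, `[(320, 0, 32), (320, 32, 32)]`.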

Otherwise, when the EL region of interest includes the CTU of the EL frame being encoded, the EL encoder 102 behaves like the general EL encoder described above. It encodes that CTU based on intra prediction, inter-frame prediction, and inter-layer prediction, and outputs an EL bitstream.
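The per-CTU control in the two paragraphs above reduces to a membership test. The sketch walks CTUs in raster order and emits coded data only for CTUs inside the region of interest. It records the remaining CTUs as filled by inter-layer prediction, with no bits emitted for them.

The following are hypothetical stand-ins for the real EL encoding stage:
- `roi_ctus` and `encode_ctu` (a caller-supplied function);
- the function name;
- the byte-string payloads.

```python
from typing import Callable


def encode_el_frame(num_ctus: int, roi_ctus: set[int],
                    encode_ctu: Callable[[int], bytes]) -> tuple[bytes, list[int]]:
    """Sketch of the EL encoder 102's per-CTU control.

    Returns the EL payload (ROI CTUs only) and the list of CTU addresses that
    were filled with the inter-layer prediction image instead of being coded.
    """
    payload = bytearray()
    filled = []
    for addr in range(num_ctus):        # raster-scan order
        if addr in roi_ctus:
            payload += encode_ctu(addr)  # normal intra/inter/inter-layer coding
        else:
            filled.append(addr)          # inter-layer prediction image, no EL bits
    return bytes(payload), filled
```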

The auxiliary information generator 104 multiplexes the following into the scalable bitstream:
- the number of slice segments that contain the EL region of interest (EL region-of-interest slice segments);
- the start CTU address of each EL region-of-interest slice segment;
- the number of CTUs in each EL region-of-interest slice segment.

For example, el_roi_descriptor() shown in FIG. 3(b) carries these three items in the following syntax elements:
- the number of EL region-of-interest slice segments, in num_el_roi_slice_segment_minus1;
- the address of each EL region-of-interest slice segment (its start CTU address), in el_roi_slice_segment_address[i];
- the number of CTUs in each EL region-of-interest slice segment, in num_el_roi_slice_segment_ctus_minus1[i].
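The text does not say how each syntax element of el_roi_descriptor() is coded at the bit level. Purely for illustration, the sketch below assumes unsigned Exp-Golomb ue(v) coding for all three elements; HEVC uses ue(v) for many count and index fields. It then writes and reads the descriptor as a list of bits. The bit layout and all function names are assumptions, not the format defined in FIG. 3.

```python
def write_ue(bits: list[int], value: int) -> None:
    """Append value as an unsigned Exp-Golomb ue(v) code."""
    code = value + 1
    n = code.bit_length()
    bits.extend([0] * (n - 1))                           # leading zeros
    bits.extend((code >> i) & 1 for i in range(n - 1, -1, -1))


def read_ue(bits: list[int], pos: int) -> tuple[int, int]:
    """Read one ue(v) value starting at pos; return (value, new position)."""
    zeros = 0
    while bits[pos + zeros] == 0:
        zeros += 1
    code = 0
    for b in bits[pos + zeros: pos + 2 * zeros + 1]:
        code = (code << 1) | b
    return code - 1, pos + 2 * zeros + 1


def write_el_roi_descriptor(segments: list[tuple[int, int]]) -> list[int]:
    """segments: (start CTU address, CTU count) per EL ROI slice segment."""
    bits: list[int] = []
    write_ue(bits, len(segments) - 1)        # num_el_roi_slice_segment_minus1
    for address, count in segments:
        write_ue(bits, address)              # el_roi_slice_segment_address[i]
        write_ue(bits, count - 1)            # num_el_roi_slice_segment_ctus_minus1[i]
    return bits


def read_el_roi_descriptor(bits: list[int]) -> list[tuple[int, int]]:
    """Inverse of write_el_roi_descriptor."""
    num_minus1, pos = read_ue(bits, 0)
    segments = []
    for _ in range(num_minus1 + 1):
        address, pos = read_ue(bits, pos)
        count_minus1, pos = read_ue(bits, pos)
        segments.append((address, count_minus1 + 1))
    return segments
```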

In the example shown in FIG. 2, there are two EL region-of-interest slice segments. The first starts at CTU address 8 and contains 2 CTUs. The second starts at CTU address 14 and contains 2 CTUs. The syntax values are therefore:
num_el_roi_slice_segment_minus1 = 1
el_roi_slice_segment_address[0] = 8
num_el_roi_slice_segment_ctus_minus1[0] = 1
el_roi_slice_segment_address[1] = 14
num_el_roi_slice_segment_ctus_minus1[1] = 1
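One way to arrive at values like those above is to group the ROI's CTU addresses into runs of consecutive raster-scan addresses, with one slice segment per run. The text does not say how segment boundaries are chosen, so that rule is an assumption. So is the geometry used below, chosen because it reproduces the listed values: a grid 6 CTUs wide, as in the CIF example of FIG. 12, and a 2 × 2 ROI covering columns 2 to 3 of rows 1 to 2. The function names are hypothetical.

```python
def rect_roi(col0: int, row0: int, cols: int, rows: int, grid_cols: int) -> set[int]:
    """CTU addresses of a rectangular ROI on a grid grid_cols CTUs wide."""
    return {(row0 + r) * grid_cols + col0 + c for r in range(rows) for c in range(cols)}


def roi_slice_segments(roi_ctus: set[int]) -> list[tuple[int, int]]:
    """Group ROI CTU addresses into runs of consecutive addresses.

    Each run becomes one EL region-of-interest slice segment (start, count).
    """
    segments: list[tuple[int, int]] = []
    for addr in sorted(roi_ctus):
        if segments and segments[-1][0] + segments[-1][1] == addr:
            start, count = segments[-1]
            segments[-1] = (start, count + 1)   # extend the current run
        else:
            segments.append((addr, 1))          # start a new run
    return segments


def descriptor_values(segments: list[tuple[int, int]]) -> dict:
    """Map slice segments to the el_roi_descriptor() syntax values."""
    return {
        "num_el_roi_slice_segment_minus1": len(segments) - 1,
        "el_roi_slice_segment_address": [start for start, _ in segments],
        "num_el_roi_slice_segment_ctus_minus1": [count - 1 for _, count in segments],
    }
```

Under those assumptions, `roi_slice_segments(rect_roi(2, 1, 2, 2, 6))` returns `[(8, 2), (14, 2)]`. Passing that result to `descriptor_values` yields the syntax values listed above.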

FIG. 4 is an explanatory diagram showing an example of an EL bitstream. FIG. 4(a) illustrates an EL bitstream output by the general video encoding device described above. FIG. 4(b) illustrates an EL bitstream output by the video encoding apparatus of this embodiment.

The receiving side can identify the EL region of interest from el_roi_descriptor(). Unlike the general video encoding device described above, this embodiment therefore does not need to output the EL bitstream for CTUs outside the EL region of interest (see FIG. 4(b)).

Next, the operation of the video encoding device of this embodiment is described with reference to the flowchart in FIG. 5.

In step S101, the BL encoder 101 encodes each CTU of the BL frame. In step S102, the EL encoder 102 encodes each CTU of the EL frame included in the EL region of interest.

In step S103, the auxiliary information generator 104 generates el_roi_descriptor () as auxiliary information necessary for identifying the region of interest of the EL frame. As described above, the auxiliary information includes information on the number of EL region-of-interest slice segments, the address of the EL region-of-interest slice segment, and the number of CTUs of the EL region-of-interest slice segment.

In step S104, the multiplexer 103 multiplexes the BL bitstream, the EL bitstream, and el_roi_descriptor(), and outputs a scalable bitstream.

Embodiment 2.
FIG. 6 is a block diagram showing the configuration of the video decoding apparatus of this embodiment. The video decoding apparatus shown in FIG. 6 includes four components:
- a separator 201 that separates a scalable bitstream;
- a BL decoder 202 that decodes each CTU of the BL bitstream;
- an auxiliary information decoder 203 that decodes el_roi_descriptor();
- an EL decoder 204 that decodes each CTU of the EL bitstream using the decoded region of interest (specifically, information from which the region of interest can be identified).

Note that the video decoding apparatus shown in FIG. 6 receives a bitstream that uses el_roi_descriptor() from the video encoding apparatus of the first embodiment.

The separator 201 extracts the BL bit stream, EL bit stream, and el_roi_descriptor () by separating the scalable bit stream.
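On the receiving side, the layers can be separated by reading `nuh_layer_id` back out of each two-byte HEVC NAL unit header. The sketch below assumes BL units carry layer 0 and EL units a non-zero layer id. The text does not say how el_roi_descriptor() is carried, so its extraction is omitted. The function names are hypothetical.

```python
def parse_nal_header(nal: bytes) -> tuple[int, int, int]:
    """Return (nal_unit_type, nuh_layer_id, temporal_id) from a 2-byte HEVC NAL header."""
    if len(nal) < 2 or nal[0] & 0x80:
        raise ValueError("truncated header or forbidden_zero_bit set")
    nal_unit_type = (nal[0] >> 1) & 0x3F
    nuh_layer_id = ((nal[0] & 0x01) << 5) | (nal[1] >> 3)
    temporal_id = (nal[1] & 0x07) - 1          # nuh_temporal_id_plus1 - 1
    return nal_unit_type, nuh_layer_id, temporal_id


def separate(nal_units: list[bytes]) -> tuple[list[bytes], list[bytes]]:
    """Split NAL units into BL (layer 0) and EL (layer > 0) payload lists."""
    bl, el = [], []
    for nal in nal_units:
        _, layer_id, _ = parse_nal_header(nal)
        (bl if layer_id == 0 else el).append(nal[2:])   # strip the header
    return bl, el
```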

The BL decoder 202 decodes each CTU included in the BL bitstream and reconstructs the BL video.

The auxiliary information decoder 203 decodes el_roi_descriptor() and outputs the region of interest of the EL frame.

The EL decoder 204 decodes each CTU included in the EL bitstream, based on the region of interest of the EL frame supplied by the auxiliary information decoder 203, and reconstructs the EL video. Regions of the EL video that have no EL bitstream (regions other than the region of interest) are filled with the inter-layer prediction image. For each CTU outside the region of interest, the decoder operates as if the following syntax elements had been decoded from the bitstream. As is clear from the first embodiment, they are not actually decoded.
- cu_split_flag, with the value corresponding to CU partitioning at the maximum size permitted by the relationship between the EL resolution and the CTU size.
- part_mode, with the value corresponding to the 2N×2N prediction unit shape.
- pred_mode_flag, merge_flag, merge_idx, inter_pred_idc, ref_idx_l0, ref_idx_l1, mvp_l0_flag, mvp_l1_flag, and mvd_coding(), with values corresponding to inter-layer prediction with a zero motion vector.
- cbf_luma, cbf_cb, and cbf_cr, with values indicating that there is no prediction error.
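The decoder-side logic above can be sketched in three steps:
1. Expand the (start address, CTU count) pairs from el_roi_descriptor() into a set of ROI CTU addresses.
2. Consume one coded CTU from the EL bitstream for each ROI CTU, in raster order.
3. Fill every other CTU from inter-layer prediction.

`decode_ctu` and `inter_layer_ctu` are hypothetical caller-supplied functions standing in for the real reconstruction stages, and the function name is also hypothetical.

```python
from typing import Any, Callable


def decode_el_frame(num_ctus: int,
                    segments: list[tuple[int, int]],
                    ctu_payloads: list[bytes],
                    decode_ctu: Callable[[int, bytes], Any],
                    inter_layer_ctu: Callable[[int], Any]) -> list[Any]:
    """Sketch of the EL decoder 204's per-CTU reconstruction."""
    roi = {start + k for start, count in segments for k in range(count)}
    payloads = iter(ctu_payloads)              # coded ROI CTUs, raster order
    frame = []
    for addr in range(num_ctus):
        if addr in roi:
            frame.append(decode_ctu(addr, next(payloads)))
        else:
            frame.append(inter_layer_ctu(addr))  # no EL bits: use inter-layer prediction
    return frame
```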

Next, the operation of the video decoding apparatus of this embodiment is described with reference to the flowchart in FIG. 7.

In step S201, the separator 201 separates the scalable bit stream, and extracts the BL bit stream, EL bit stream, and el_roi_descriptor ().

In step S202, the BL decoder 202 decodes each CTU of the BL bit stream extracted by the separator 201.

In step S203, the auxiliary information decoder 203 decodes el_roi_descriptor () extracted by the separator 201 and outputs the region of interest of the EL frame.

In step S204, the EL decoder 204 decodes each CTU of the EL bitstream extracted by the separator 201. In addition, the EL decoder 204 identifies a region where there is no EL bitstream (a region other than the region of interest) based on the region of interest (specifically, information that can identify the region of interest) output from the auxiliary information decoder 203. Then, the EL video image of the identified region is filled with the inter-layer prediction image.

Embodiment 3.
FIG. 8 is a block diagram showing an example of a video distribution system using the video encoding device (encoder) of the first embodiment described above and the video decoding device (decoder) of the second embodiment described above.

In the video distribution system shown in FIG. 8, the distribution side includes the encoder 100 of the first embodiment, and the receiving side includes the decoder 200 of the second embodiment. The bitstream from the encoder 100 is transmitted to the decoder 200 via the network 300. FIG. 8 also shows a user 400 on the distribution side and a display device 500 on the receiving side.

As an example, the distribution side is equipment in a content distribution system or a broadcasting station, and the reception side is a television receiver, a personal computer, or a portable terminal.

In the video distribution system of this embodiment, the encoder 100 does not need to distribute the bitstream for regions other than the region of interest, so the transmission band is used effectively. In addition, the decoder 200 does not need to decode the bitstream for regions other than the region of interest, so extra decoding processing for regions not needed for viewing is suppressed.

Note that each of the above embodiments can be implemented in hardware, but can also be implemented by a computer program.

The information processing system shown in FIG. 9 includes a processor 1001, a program memory 1002, a storage medium 1003 for storing video data, and a storage medium 1004 for storing a bitstream. The storage medium 1003 and the storage medium 1004 may be separate storage media, or may be storage areas composed of the same storage medium. A magnetic storage medium such as a hard disk can be used as the storage medium.

In the information processing system shown in FIG. 9, the program memory 1002 stores a program that implements the function of each block (excluding buffer blocks) shown in FIG. 1 and FIG. 6. The processor 1001 executes processing according to this program, and thereby implements the functions of the video encoding device and the video decoding device described in the above embodiments.

FIG. 10 is a block diagram showing the main part of the video encoding apparatus according to the present invention. As shown in FIG. 10, the video encoding apparatus includes:
- low-resolution layer encoding means (BL encoding means) 11, which encodes a low-resolution layer frame (BL frame) and outputs a low-resolution layer bitstream (BL bitstream);
- high-resolution layer encoding means (EL encoding means) 12, which encodes a high-resolution layer frame (EL frame) and outputs a high-resolution layer bitstream (EL bitstream);
- multiplexing means 13, which multiplexes the BL bitstream and the EL bitstream and outputs a scalable bitstream;
- auxiliary information generating means 14, which generates auxiliary information necessary for identifying the region of interest of the EL frame included in the EL bitstream.

The multiplexing means 13 multiplexes the auxiliary information into the scalable bitstream.

FIG. 11 is a block diagram showing the main part of the video decoding apparatus according to the present invention. As shown in FIG. 11, the video decoding apparatus includes:
- separating means 21, which separates a scalable bitstream;
- low-resolution layer decoding means (BL decoding means) 22, which decodes the BL bitstream separated from the scalable bitstream and outputs a BL video (low-resolution video);
- auxiliary information decoding means 23, which decodes auxiliary information necessary for identifying the region of interest of the EL frame included in the EL bitstream separated from the scalable bitstream;
- high-resolution layer decoding means (EL decoding means) 24, which decodes the EL bitstream separated from the scalable bitstream and outputs an EL video (high-resolution video).

Although the present invention has been described with reference to the embodiments and examples, the present invention is not limited to the above embodiments and examples. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the present invention within the scope of the present invention.

This application claims priority based on Japanese Patent Application No. 2014-254478 filed on December 16, 2014, the entire disclosure of which is incorporated herein.

11 Low-resolution layer coding means (BL coding means)
12 High-resolution layer coding means (EL coding means)
13 Multiplexing means
14 Auxiliary information generating means
21 Separating means
22 Low-resolution layer decoding means (BL decoding means)
23 Auxiliary information decoding means
24 High-resolution layer decoding means (EL decoding means)
100 Encoder (video encoding device)
101 BL encoder
102 EL encoder
103 Multiplexer
104 Auxiliary information generator
200 Decoder (video decoding device)
201 Separator
202 BL decoder
203 Auxiliary information decoder
204 EL decoder
300 Network
400 User
500 Display device
1001 Processor
1002 Program memory
1003, 1004 Storage medium

Claims

1. A video encoding device comprising:
low-resolution layer encoding means for encoding a low-resolution video and outputting a low-resolution layer bitstream;
high-resolution layer encoding means for encoding a high-resolution video and outputting a high-resolution layer bitstream;
multiplexing means for multiplexing the low-resolution layer bitstream and the high-resolution layer bitstream to output a scalable bitstream; and
auxiliary information generating means for generating auxiliary information necessary for identifying a region of interest included in the high-resolution layer bitstream,
wherein the multiplexing means multiplexes the auxiliary information into the scalable bitstream.
2. A video decoding device comprising:
separation means for separating a low-resolution layer bitstream and a high-resolution layer bitstream from a scalable bitstream;
low-resolution layer decoding means for decoding the low-resolution layer bitstream and outputting a low-resolution video;
high-resolution layer decoding means for decoding the high-resolution layer bitstream and outputting a high-resolution video; and
auxiliary information decoding means for decoding, from the scalable bitstream, auxiliary information necessary for identifying a region of interest included in the high-resolution layer bitstream.
3. A video encoding method comprising:
encoding a low-resolution video and outputting a low-resolution layer bitstream;
encoding a high-resolution video and outputting a high-resolution layer bitstream;
multiplexing the low-resolution layer bitstream and the high-resolution layer bitstream and outputting a scalable bitstream;
generating auxiliary information necessary for identifying a region of interest included in the high-resolution layer bitstream; and
multiplexing the auxiliary information into the scalable bitstream.
4. A video decoding method comprising:
separating a low-resolution layer bitstream and a high-resolution layer bitstream from a scalable bitstream;
decoding the low-resolution layer bitstream and outputting a low-resolution video;
decoding the high-resolution layer bitstream and outputting a high-resolution video; and
decoding, from the scalable bitstream, auxiliary information necessary for identifying a region of interest included in the high-resolution layer bitstream.
5. A video encoding program causing a computer to execute:
a process of encoding a low-resolution video and outputting a low-resolution layer bitstream;
a process of encoding a high-resolution video and outputting a high-resolution layer bitstream;
a process of multiplexing the low-resolution layer bitstream and the high-resolution layer bitstream and outputting a scalable bitstream;
a process of generating auxiliary information necessary for identifying a region of interest included in the high-resolution layer bitstream; and
a process of multiplexing the auxiliary information into the scalable bitstream.
6. A video decoding program causing a computer to execute:
a process of separating a low-resolution layer bitstream and a high-resolution layer bitstream from a scalable bitstream;
a process of decoding the low-resolution layer bitstream and outputting a low-resolution video;
a process of decoding the high-resolution layer bitstream and outputting a high-resolution video; and
a process of decoding, from the scalable bitstream, auxiliary information necessary for identifying a region of interest included in the high-resolution layer bitstream.
7. A video distribution system comprising the video encoding device according to claim 1 and the video decoding device according to claim 2.
8. A video distribution method that executes the video encoding method according to claim 3 and the video decoding method according to claim 4.