WO2017051493A1

WO2017051493A1 - Video encoding device and video decoding device

Info

Publication number: WO2017051493A1
Application number: PCT/JP2016/003322
Authority: WO
Inventors: 貴之石田; 慶一蝶野
Original assignee: 日本電気株式会社
Priority date: 2015-09-25
Filing date: 2016-07-14
Publication date: 2017-03-30
Also published as: JPWO2017051493A1; JP6489227B2

Abstract

A video encoding device equipped with: an analysis means that analyzes encoding statistics information; an estimation means that, on the basis of the analysis result from the analysis means, estimates whether it is possible to select an optimal motion vector near a slice boundary; and an encoding structure determination means that, on the basis of the estimation result from the estimation means, adaptively sets the encoding structure as a SOP structure formed only with pictures for which the temporal ID is 0, a SOP structure formed with pictures for which the temporal ID is 0 and pictures for which the temporal ID is 1, a SOP structure formed with pictures for which the temporal ID is 0, pictures for which the temporal ID is 1, and pictures for which the temporal ID is 2, or a SOP structure formed with pictures for which the temporal ID is 0, pictures for which the temporal ID is 1, pictures for which the temporal ID is 2, and pictures for which the temporal ID is 3.

Description

Video encoding device and video decoding device

The present invention relates to a video encoding device, a video decoding device, a video system, a video encoding method, and a video encoding program based on an encoding method in which a video screen is divided and then compressed.

In response to the demand for higher definition of video, full HD (High Definition) video content of 1920 (horizontal) × 1080 (pixel) in the horizontal direction is supplied. In addition, test broadcasts and commercial broadcasts of high-definition video (hereinafter referred to as 4K) in the horizontal direction 3840 × vertical direction 2160 (pixels) have been started. Furthermore, commercial broadcasting of high-definition video (hereinafter referred to as “8K”) in the horizontal direction 7680 × vertical direction 4320 (pixels) is planned.

In video content distribution systems, video signals are generally encoded based on the H.264 / AVC (Advanced Video Coding) standard or HEVC (High Efficiency Video Coding) standard on the transmission side, and video is decoded and processed on the reception side. The signal is reproduced, but in the case of 8K, since the number of pixels is large, the processing load in the encoding process and the decoding process becomes high.

As a method for reducing the processing load in the case of 8K, there is, for example, screen quadrant coding using a slice described in Non-Patent Document 1 (see FIG. 10). As illustrated in FIG. 11, in Non-Patent Document 1, when quad-frame coding is used, when inter prediction is performed, a motion vector for motion compensation (MC) is detected in a block near a slice boundary. There is a restriction that the vertical (vertical) component is 128 pixels or less. Note that there is no restriction on the motion vector range in the vertical direction across the slice boundary (hereinafter referred to as motion vector restriction) for blocks that do not belong to the vicinity of the slice boundary.

When there is a motion vector restriction, when encoding a scene where an object on the screen or the entire screen moves fast in the vertical direction, an optimal motion vector may not be selected at the slice boundary. As a result, local image quality degradation may occur. The degree of deterioration increases as the M value increases during fast movement. The M value is a reference picture interval. Note that “optimal motion vector” means an original (normal) motion vector selected by a predictor that performs inter-screen prediction (inter prediction) processing in the video encoding device.

FIG. 13 illustrates the interval between reference pictures when M = 4 and M = 8. In general, when the M value is small, the inter-frame distance is small, so the value of the motion vector tends to be small. However, especially in a stationary scene, since the time-direction hierarchy is reduced, code amount distribution according to the hierarchy (layer) is restricted, so that the encoding efficiency is lowered. On the other hand, when the M value is large, the inter-frame distance is large, so that the value of the motion vector tends to be large. However, especially in a stationary scene, the number of time direction hierarchies increases, so that restrictions on code amount distribution according to the hierarchies (layers) are relaxed, so that coding efficiency is improved. As an example, when the M value is changed from 8 to 4, the value of the motion vector is halved, and when the M value is changed from 4 to 8, the value of the motion vector is doubled.

In Non-Patent Document 1, the concept of SOP (Set of pictures) is introduced. SOP is a unit for describing the coding order and reference relationship of each AU (Access Unit) when performing temporal direction hierarchical encoding. The temporal direction hierarchical coding is coding that enables partial extraction of a frame from a plurality of frames of video.

The SOP structure includes a structure with L = 0, a structure with L = 1, a structure with L = 2, and a structure with L = 3. As shown in FIG. 14, Lx (x = 0, 1, 2, 3) has the following structure.
Structure with L = 0: SOP structure composed only of pictures with Temporal ID 0 (that is, the number of stages of pictures included in the SOP is 1. It can be said that L indicating the maximum Temporal ID is 0. )
Structure with L = 1: SOP structure composed of a picture with a temporal ID of 0 and a picture of 1 (that is, the number of stages of pictures included in the SOP is two. L indicating the maximum temporal ID is 1) It can also be said.)
Structure with L = 2: SOP structure composed of a picture with a temporal ID of 0, a picture of 1, and a picture of 2 (that is, the number of stages of pictures included in the SOP is three. The maximum temporal ID is It can be said that L shown is 2.)
Structure with L = 3: SOP structure composed of picture with

Temporal ID

0, 1 picture, 2 pictures, and 3 pictures (that is, the number of stages of pictures included in the SOP is four. (It can be said that L indicating Temporal ID is 3.)

In the description of the present specification, M = 1 corresponds to a SOP having a structure of L = 0, M = 2 corresponds to a SOP having a structure of L = 1 when N = 1 (see FIG. 14), and M = 3 corresponds to the SOP having the structure of L = 1 in the case of N = 2 (see FIG. 14), M = 4 corresponds to the SOP having the structure of L = 2, and M = 8 corresponds to the SOP 構造 having the structure of L = 3. Corresponding to

For a stationary scene (for example, an object in the screen or a scene in which the entire screen does not move quickly), as described above, the coding efficiency increases as the reference picture interval (M value) increases. Therefore, in order to encode a high-definition video such as 8K at a low rate, it is preferable that the video encoding device basically operates at M = 8.

However, as described above, when the M value is increased, the value of the motion vector tends to increase. Therefore, particularly in a scene in which an object on the screen or the entire screen moves fast in the vertical direction, the image quality is caused by the motion vector limitation. Deteriorates. This is because an optimal motion vector may not be selected at a slice boundary due to motion vector limitation.

The present invention is an encoding method that compresses after dividing a video screen, and an object of the present invention is to suppress deterioration in image quality when using an encoding method that restricts motion vector selection in the vicinity of a slice boundary. And

A video encoding device according to the present invention is a video encoding device that divides a video into a predetermined number of slices and performs encoding processing under a motion vector restriction in the vicinity of a slice boundary, and analyzes the encoding statistical information Means, an estimation means for estimating whether or not an optimal motion vector can be selected in the vicinity of the slice boundary based on the analysis result of the analysis means, and an encoding structure based on the estimation result of the estimation means, Temporal ID is 0 SOP structure consisting only of pictures, Temporal ID SOP structure consisting of 0 and 1 pictures, Temporal ID ピクチ

ャ picture

0, 1 picture, and SOP structure consisting of 2 pictures, Temporal Coding structure determining means for adaptively determining any one of SOP structures composed of a picture with ID ０ 0, 1 picture, 2 pictures and 3 pictures It is characterized by.

The video decoding apparatus according to the present invention includes an SOP structure composed only of pictures having a temporal ID of 0, an SOP structure composed of pictures having a temporal ID of 0 and 1 pictures, a picture having a temporal ID of 0, a picture of 1, Decoding means for decoding a video encoded with any one of the SOP structure composed of the

pictures

1 and 2 and the SOP structure composed of the picture whose Temporal ID is 0, 1 picture, 2 pictures, and 3 pictures It is characterized by providing.

A video encoding method according to the present invention is a video encoding method that divides a video into a predetermined number of slices and performs an encoding process under a motion vector restriction near a slice boundary, and analyzes encoding statistical information, Based on the analysis result, it is estimated whether or not an optimal motion vector can be selected near the slice boundary, and based on the estimation result, the coding structure is composed of an SOP structure consisting only of pictures whose Temporal ID is 0, Temporal ID SOP structure composed of 0 picture and 1 picture, picture whose Temporal ID is 0, SOP structure composed of 1 picture and 2 pictures, picture whose Temporal ID is 0, picture of 1 and 2 It is characterized in that it is adaptively determined to one of the SOP structures composed of pictures and three pictures.

The video decoding method according to the present invention includes an SOP structure composed only of pictures whose Temporal ID is 0, an SOP structure composed of pictures whose Temporal ID is 0 and 1 pictures, pictures whose Temporal ID is 0, 1 pictures, And SOP structure composed of two pictures, Temporal ID ０

picture

0, 1 picture, 2 pictures, and SOP ３ structure composed of 3 pictures Features.

A video encoding program according to the present invention includes a computer that analyzes encoding statistical information, a process that estimates whether an optimal motion vector can be selected in the vicinity of a slice boundary based on the analysis result, and an estimation result The coding structure is based on the SOP structure consisting only of pictures whose Temporal ID is 0, the SOP structure consisting of pictures whose Temporal ID is 0 and 1 pictures, pictures whose Temporal ID is 0 and 1 pictures. And SOP structure composed of two pictures, and Temporal ID ピクチャ is a picture having a zero value, 1 picture, 2 pictures, and SOP 構成される structure composed of 3 pictures. It is characterized by.

The video decoding program according to the present invention allows a computer to store an SOP structure composed only of a picture whose Temporal ID is 0, an SOP structure composed of a picture whose Temporal ID is 0 and a picture of 1 and a picture whose Temporal ID is 0, 1 Decode a video encoded with one of the SOP structure consisting of the picture of, and the picture of 2 and the SOP structure consisting of the picture whose Temporal ID is 0, 1 picture, 2 pictures, and 3 pictures It is characterized in that the processing is performed.

According to the present invention, image quality deterioration can be suppressed.

It is a block diagram which shows the structural example of embodiment of a video coding apparatus. It is a block diagram which shows the structural example of embodiment of a video decoding apparatus. It is a flowchart which shows operation | movement of 1st Embodiment of a video coding apparatus. It is a flowchart which shows operation | movement of 2nd Embodiment of a video coding apparatus. It is a flowchart which shows operation | movement of 3rd Embodiment of a video coding apparatus. It is a block diagram which shows an example of a video system. It is a block diagram which shows the structural example of the information processing system which can implement | achieve the function of a video coding apparatus and a video decoding apparatus. It is a block diagram which shows the principal part of a video coding apparatus. It is a block diagram which shows the principal part of a video decoding apparatus. It is explanatory drawing which shows an example of a screen division. It is explanatory drawing for demonstrating a motion vector restriction | limiting. It is explanatory drawing which shows an SOP structure. It is explanatory drawing which shows an example of the space | interval of a reference picture. It is explanatory drawing which shows an SOP structure.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

FIG. 1 is a block diagram illustrating a configuration example of an embodiment of a video encoding device. A video encoding apparatus 100 illustrated in FIG. 1 includes an encoding unit 101, an analysis unit 111, a determination unit 112, and an M value determination unit 113. Note that the video encoding apparatus 100 executes the encoding process based on the HEVC standard, but may execute the encoding process based on another standard, for example, the H.264 / AVC standard. Hereinafter, an example in which 8K video is input will be described.

The encoding unit 101 includes a screen divider 102 that divides an input image into a plurality of screens, a frequency transformer / quantizer 103, an inverse quantizer / inverse frequency transformer 104, a buffer 105, a predictor 106, and an entropy encoder. 107 is included.

The screen divider 102 divides the input video screen into four screens (see FIG. 10). The frequency converter / quantizer 103 performs frequency conversion on the prediction error image obtained by subtracting the prediction signal from the input video signal. The frequency transformer / quantizer 103 further quantizes the frequency-converted prediction error image (frequency transform coefficient). Hereinafter, the quantized frequency transform coefficient is referred to as a transform quantization value.

The entropy encoder 107 entropy encodes the prediction parameter and the transform quantization value, and outputs a bit stream. The prediction parameters are information related to CTU (Coding | Tree | Unit | Unit) and block prediction, such as prediction mode (intra prediction, inter prediction), intra prediction block size, intra prediction direction, inter prediction block size, and motion vector.

The predictor 106 generates a prediction signal for the input video signal. The prediction signal is generated based on intra prediction or inter-frame prediction.

The inverse quantization / inverse frequency converter 104 inversely quantizes the transform quantization value. Further, the inverse quantization / inverse frequency converter 104 performs inverse frequency conversion on the inversely quantized frequency conversion coefficient. The reconstructed prediction error image subjected to inverse frequency conversion is supplied with a prediction signal and supplied to the buffer 105. The buffer 105 stores the reconstructed image.

The analysis unit 111 analyzes the encoded statistical information. Based on the analysis result of the analysis unit 111, the determination unit 112 determines whether or not an optimal motion vector can be selected near the slice boundary with the above-described motion vector restriction. The encoding statistical information is information on the encoding result of a past frame (for example, a frame immediately before the current encoding target frame), and a specific example of the encoding statistical information will be described later.

Note that the vicinity of the slice boundary is an area in which an optimal motion vector could not be selected. However, for example, when realizing the following control, for example, a range of ± 128 pixels or a range of ± 256 pixels from the slice boundary. May be near the slice boundary. Further, when realizing the following control, the range of “near the slice boundary” may be appropriately changed according to the state of the video (such as large / small motion). For example, when the generation ratio of a motion vector having a large value is high, the range “near the slice boundary” may be set wide.

The M value determination unit 113 adaptively determines the M value based on the determination result of the determination unit 112. As described above, determining the M value is equivalent to determining the Lx (x = 0, 1, 2, 3) structure in the SOP structure. The encoded statistical information will be described later.

FIG. 2 is a block diagram illustrating a configuration example of an embodiment of the video decoding apparatus. The video decoding apparatus 200 shown in FIG. 2 includes an entropy decoder 202, an inverse quantization / inverse frequency converter 203, a predictor 204, and a buffer 205.

The entropy decoder 202 entropy decodes the video bitstream. The entropy decoder 202 supplies the transform quantization value subjected to entropy decoding to the inverse quantization / inverse frequency converter 203.

The inverse quantization / inverse frequency converter 203 obtains a frequency conversion coefficient by inversely quantizing the converted quantization values of luminance and chrominance with a quantization step width. Further, the inverse quantization / inverse frequency converter 203 performs inverse frequency conversion on the inversely quantized frequency conversion coefficient.

After the inverse frequency conversion, the predictor 204 generates a prediction signal using the image of the reconstructed picture stored in the buffer 205 (the prediction is also referred to as motion compensation prediction or MC reference). The reconstructed prediction error image subjected to inverse frequency conversion by the inverse quantization / inverse frequency converter 203 is added with the prediction signal supplied from the predictor 204 and supplied to the buffer 205 as a reconstructed picture. Then, the reconstructed picture stored in the buffer 205 is output as decoded video.

Next, operations of the analysis unit 111, the determination unit 112, and the M value determination unit 113 in the video encoding device 100 will be described.

Embodiment 1. FIG.
FIG. 3 is a flowchart showing the operation of the first embodiment of the video encoding apparatus 100 shown in FIG. In the first embodiment, it is assumed that an 8K video is divided into four (see FIG. 10) and there is a motion vector restriction near the slice boundary. Further, as a motion vector restriction, ± 128 is taken as an example. The 8K video is divided into four, and the motion vector limitation is the same in other embodiments. Note that the initial value of the M value is 8 (M = 8).

The analysis unit 111 analyzes past encoding results stored in the buffer 105 (for example, the encoding result of the immediately preceding frame). Specifically, the analysis unit 111 calculates an average value or median value of motion vectors in blocks other than the slice boundary (hereinafter, the average value or median value is referred to as M _avg ) (step S101). In the first embodiment, the encoded statistical information is a value of a motion vector, and the analysis result is an average value or a median value of the motion vectors.

The determination unit 112 determines how large M _avg is based on ± 128 as the motion vector limit (step S102).

Then, the M value determination unit 113 determines the M value based on the determination result of how large M _avg is (step S103).

The M value determination unit 113 determines the M value based on the determination result as follows, for example.

Even when the M value is any other value, the M value determining unit 113, when the M value is set to 8, as in the cases (1) and (2), When it is estimated that the value of the motion vector near the slice boundary is within ± 128, the M value is returned to 8. In other words, the M value determination unit 113 returns the M value to 8 when it can be estimated that an optimal motion vector can be selected near the slice boundary under the motion vector restriction. In other cases, the M value is determined in accordance with M _{avg so} that the value of the motion vector near the slice boundary is within ± 128.

Note that the above case division (threshold value setting) is an example, and the threshold value may be changed or a finer case division may be performed.

The control of the video encoding device of the first embodiment is based on the following concept.

When the video is a video of a scene in which the entire screen moves quickly, the ratio of the number of motion vectors having a large value is high for all generated motion vectors, both near the slice boundary and near the slice boundary. However, since there is a motion vector restriction, there is a possibility that an optimal motion vector is not selected near the slice boundary. Therefore, the determination unit 112 performs coding based on a motion vector as encoded statistical information generated in a region other than the slice boundary (there is no motion vector limitation, and thus a normal, in other words, an optimal motion vector). It is estimated whether or not the target screen is a screen image of a scene that moves fast. When the determination unit 112 estimates that the video is a fast moving scene, the M value determination unit 113 changes the M value so that an optimal motion vector can be selected in the vicinity of the slice boundary.

Note that if it is a fast-moving scene video, the optimal motion vector may not be selected in the vicinity of the slice boundary. This is equivalent to estimating that the optimal motion vector is not selected near the slice boundary.

Further, as described above, the M value and the SOP structure are correlated. Therefore, the M value determining unit 113 determining the M value is equivalent to determining the SOP structure (that is, the Lx (x = 0, 1, 2, 3) structure).

Embodiment 2. FIG.
FIG. 4 is a flowchart showing the operation of the second embodiment of the video encoding apparatus 100 shown in FIG.

The analysis unit 111 analyzes past encoding results stored in the buffer 105 (for example, the encoding result of the immediately preceding frame). Specifically, the analysis unit 111 calculates a block ratio P _{1 in} which intra-screen prediction (intra prediction) is used for all blocks (for example, PU: Prediction Unit) in a range other than the slice boundary. (step S201), for all the blocks in the vicinity of a slice boundary, and calculates the ratio _{P 2} of the blocks used are intra prediction (step S202). In the second embodiment, the encoded statistical information is a prediction mode of a block near a slice boundary (specifically, the number of blocks for intra-screen prediction), and the analysis result is a ratio P ₁ and a ratio P. ₂ .

The determination unit 112 compares the ratio P ₁ and the ratio P ₂ and determines the degree of deviation between them. Specifically, as compared with the ratio P _1, it determines whether or not the ratio P ₂ is quite large. Determination unit 112, for example, determines whether the difference between the ratio _{P 2} and the ratio _{P 1} exceeds a predetermined value (step S203).

M value determining unit 113, when the difference between the ratio _{P 2} and the ratio _{P 1} exceeds a predetermined value, the smaller the M value (step S204). A plurality of predetermined values are provided. For example, when the difference exceeds the first predetermined value, the M value is decreased by a plurality of levels, and the difference exceeds the second predetermined value (<first predetermined value). Sometimes the M value may be decreased by one step.

Further, M value determining unit 113, the difference between the ratio _{P 2} and the ratio _{P 1} is the case is less than the predetermined value, maintaining or M value, or increases the M value (step S205). For example, the M value determination unit 113 increases the M value when the difference is equal to or smaller than a third predetermined value (<second predetermined value), and maintains the M value when the difference exceeds the third predetermined value. To do.

The control of the video encoding device of the second embodiment is based on the following concept.

The encoding unit 101 can use either intra prediction or inter prediction (inter prediction) as a prediction mode when encoding each block in the screen. When the video is a video of a scene where the entire screen moves fast, the occurrence rate of the number of motion vectors having a large value when the inter-screen prediction is used is considered to be high even near the slice boundary (the motion vector limit is If not). Since there is a motion vector restriction, an optimal motion vector (large motion vector) cannot be generated near the slice boundary, and as a result, it is considered that intra prediction is often used near the slice boundary. . Since there is no motion vector limitation outside the vicinity of the slice boundary, it is considered that intra prediction is less used than in the vicinity of the slice boundary.

Therefore, when the ratio P ₁ and the ratio P ₂ are greatly different from each other, it is estimated that a video signal of a fast moving scene is input to the encoding unit 101.

Note that if it is a fast-moving scene video, the optimal motion vector may not be selected in the vicinity of the slice boundary. This is equivalent to the fact that the ratio P ₁ and the ratio P ₂ are greatly deviated.

As an example, the empirical or experimental use of such a value as a threshold value as a predetermined value for determining whether or not there is a large divergence may result in an optimal motion vector not being selected near the slice boundary. A value that can be estimated to be sexual is selected.

Embodiment 3. FIG.
FIG. 5 is a flowchart showing the operation of the third embodiment of the video encoding apparatus 100 shown in FIG.

The analysis unit 111 analyzes past encoding results stored in the buffer 105 (for example, the encoding result of the immediately preceding frame). Specifically, the analysis unit 111, the previous frame (e.g., two frames before the current encoding target frame) is calculated generated code amount C ₁ in the block in the vicinity of a slice boundary (step S301). Further, the analysis unit 111 calculates a generated code amount _{C 2} in the block near the slice boundary of the previous frame (step S302). In the third embodiment, the encoded statistical information is the generated code amount of blocks near the slice boundary, and the analysis results are the generated code amount C ₁ and the generated code amount C ₂ .

The determination unit 112 compares the generated code amount C ₁ and the generated code amount C ₂ and determines the degree of deviation between them. Specifically, as compared to the amount of generated codes C _1, determines whether or not considerably large amount of generated codes C _2. Determination unit 112, for example, determines whether the difference between the generated code amount _{C 2} and the generated code amount _{C 1} exceeds a predetermined amount (step S303).

M value determining unit 113, when the difference between the generated code amount C ₂ and the generated code amount C ₁ exceeds a predetermined amount, the smaller the M value (step S304). A plurality of predetermined amounts are provided. For example, when the difference exceeds the first predetermined amount, the M value is decreased by a plurality of levels, and the difference exceeds the second predetermined amount (<first predetermined amount). Sometimes the M value may be decreased by one step.

Further, M value determining unit 113, when the difference between the generated code amount C ₂ and the generated code amount C ₁ is equal to or less than the predetermined amount, maintaining or M value, or increases the M value (step S305 ). For example, the M value determination unit 113 increases the M value when the difference is equal to or smaller than a third predetermined amount (<second predetermined amount), and maintains the M value when the difference exceeds the third predetermined amount. To do.

The control of the video encoding device of the third embodiment is based on the following concept.

As described above, when the entire screen is a fast-moving scene video, the ratio of the number of motion vectors having a large value when the inter-screen prediction is used is considered to be high even near the slice boundary (motion vector restriction). If not). However, since there is a motion vector limitation, an optimal motion vector (large motion vector) cannot be generated near the slice boundary, and as a result, in-screen prediction is often used near the slice boundary. Conceivable. In general, the amount of generated code is larger when intra prediction is used than when inter prediction is used.

Therefore, when the generated code amount C ₂ is considerably larger than the generated code amount C ₁ , it is presumed that the situation has changed so that the video signal of the fast moving scene is input to the encoding unit 101.

Note that when it comes to a fast-moving scene video, the optimal motion vector may not be selected near the slice boundary. in is equivalent to the generated code amount C ₂ increases greatly.

As an example of a predetermined amount for determining whether or not it has greatly increased, empirically or experimentally, if such an amount is used as a threshold value, an optimal motion vector may not be selected in the vicinity of the slice boundary. A value that can be estimated is selected.

As described above, in each of the above embodiments, the M value is adaptively switched based on the past encoding result (encoding statistical information). Based on the encoded statistical information, it is estimated whether or not an optimal motion vector (in other words, a motion vector outside the motion vector limit) can be selected near the slice boundary under the motion vector restriction. If it is estimated that it cannot be selected, the M value is changed to a smaller value. If it is determined that it can be selected, it is considered that an optimal motion vector can be selected in the vicinity of the slice boundary under the motion vector restriction even with the current M value, so that the M value is maintained or a larger value. Changed to

As a result, it is possible to avoid as much as possible that the optimal motion vector cannot be selected near the slice boundary due to the motion vector restriction, and the possibility that local image quality degradation will occur can be reduced. That is, since the M value is adaptively switched according to the speed of movement, a suitable image quality can be obtained.

In addition, since the M value can be switched based on the encoding result (for example, the encoding result of the immediately preceding frame), pre-analysis (analysis processing executed as pre-processing when encoding the current frame) is performed. It is not necessary to perform this, and it is possible to prevent the processing time for encoding from being extended as compared with the case of performing the pre-analysis.

Note that in the video encoding device 100, the analysis unit 111, the determination unit 112, and the M value determination unit 113 are configured so that any two or all of the first to third embodiments are incorporated. It may be.

In addition, the video decoding apparatus shown in FIG. 2 converts a bitstream encoded using an M value set in a range that satisfies the motion vector restriction as exemplified in the first to third embodiments. Decrypt.

FIG. 6 is a block diagram illustrating an example of a video system. The video system shown in FIG. 6 is a system in which the video encoding device 100 of each of the above embodiments and the video decoding device 200 shown in FIG. 2 are connected by a wireless transmission line or a wired transmission line 300. The video encoding device 100 is the video encoding device 100 of any of the first to third embodiments described above, but the video encoding device 100 is an arbitrary one of the first to third embodiments. The analysis unit 111, the determination unit 112, and the M value determination unit 113 may be configured to execute two or all processes.

Further, although each of the above embodiments can be configured by hardware, it can also be realized by a computer program.

The information processing system shown in FIG. 7 includes a processor 1001, a program memory 1002, a storage medium 1003 for storing video data, and a storage medium 1004 for storing a bitstream. The storage medium 1003 and the storage medium 1004 may be separate storage media, or may be storage areas composed of the same storage medium. A magnetic storage medium such as a hard disk can be used as the storage medium.

In the information processing system shown in FIG. 7, the program memory 1002 has a program (video encoding program) for realizing the function of each block (except for the buffer block) shown in FIGS. Or a video decoding program). The processor 1001 executes processing according to the program stored in the program memory 1002, thereby realizing the functions of the video encoding device or the video decoding device shown in FIGS.

FIG. 8 is a block diagram showing the main part of the video encoding device. As illustrated in FIG. 8, the video encoding device 10 includes an analysis unit 11 (equivalent to the analysis unit 111 in the embodiment) that analyzes encoded statistical information, and a slice boundary near the analysis result of the analysis unit 11. An estimation unit 12 that estimates whether or not an optimal motion vector can be selected (in the embodiment, realized by the determination unit 112), and an encoding structure is adaptively determined based on the estimation result of the estimation unit 12. A coding structure determination unit 13 (implemented in the embodiment by the M value determination unit 113).

FIG. 9 is a block diagram showing the main part of the video decoding apparatus. As shown in FIG. 9, the video decoding apparatus 20 decodes a bitstream encoded based on an encoding structure that is set so that an optimal motion vector can be selected near a slice boundary under motion vector restriction. The decoding part 21 (it implement | achieved by the predictor 204 grade | etc., In this embodiment) is provided.

Note that the decoding unit 21 has, as the set coding structure, an SOP structure composed only of pictures with Temporal ID 0, an SOP structure composed of pictures with Temporal ID ０ 0 and 1 pictures, and Temporal ID 0 SOP structure composed of pictures of 1, 1 and 2; SOP structure of SOP の structure composed of pictures whose temporal ID is 0, 1 picture, 2 pictures and 3 pictures Can be decoded.

Furthermore, the decoding unit 21 is divided into four slices as shown in FIG. 10, and when the PU of one slice refers to another slice with motion compensation (MC) as shown in FIG. 11, The MC reference of the same PU across the slice boundary is limited to refer to only pixels within 128 lines from the slice boundary, so that the encoded bit stream can be decoded.

In the embodiment, when a 120P moving image is handled, the following SOP cage structure as shown in FIG. 12 can be used on the video encoding and decoding side.

Structure with L = 0: SOP structure composed of only pictures with Temporal ID 0 (that is, the number of picture stages included in the SOP is 1. It can be said that L indicating the maximum Temporal ID is 0. )
Structure with L = 1: SOP structure composed of a picture with a temporal ID of 0 and a picture with 1 (or M) (that is, the number of stages of pictures included in the SOP is two. L indicating the maximum temporal ID) Can be said to be 1 (or M).
L = 2 structure: an SOP structure composed of a picture with a temporal ID of 0, a picture of 1 and a picture of 2 (or M) (that is, the number of stages of pictures included in the SOP is three). (It can also be said that L indicating the maximum Temporal ID is 2 (or M).)
Structure with L = 3: SOP structure composed of picture with

Temporal ID

0, 1 picture, 2 pictures, and 3 (or M) pictures (that is, the number of stages of pictures included in the SOP is 4) (It can also be said that L indicating the maximum Temporal ID is 3 (or M).)
Structure with L = 4: SOP structure composed of a picture with a temporal ID of 0, a picture of 1, a picture of 2, a picture of 3, a picture of 4 (or M) (that is, a picture included in the SOP) (The number of stages is 4. It can also be said that L indicating the maximum Temporal ID is 4 (or M).)

Although the present invention has been described with reference to the embodiments and examples, the present invention is not limited to the above embodiments and examples. Various changes that can be understood by those skilled in the art can be made to the configuration and details of the present invention within the scope of the present invention.

This application claims priority based on Japanese Patent Application No. 2015-188043 filed on September 25, 2015, the entire disclosure of which is incorporated herein.

DESCRIPTION OF SYMBOLS 10 Video coding apparatus 11 Analysis part 12 Estimation part 13 Coding structure determination part 20 Video decoding apparatus 21 Decoding part 100 Video coding apparatus 101 Encoding part 102 Screen divider 103 Frequency converter / quantizer 104 Inverse quantization / inverse Frequency converter 105 Buffer 106 Predictor 107 Entropy encoder 111 Analysis unit 112 Determination unit 113 M-value determination unit 200 Video decoding device 202 Entropy decoder 203 Inverse quantization / inverse frequency converter 204 Predictor 205 Buffer 1001 Processor 1002

Program Memory

1003, 1004 Storage medium

Claims

A video encoding device that divides a video into a predetermined number of slices and performs encoding processing under motion vector restriction near a slice boundary,
An analysis means for analyzing the encoded statistical information;
Estimating means for estimating whether an optimal motion vector can be selected near the slice boundary based on the analysis result of the analyzing means;
Based on the estimation result of the estimation means, the coding structure is composed of an SOP structure composed only of pictures with a temporal ID of 0, an SOP structure composed of pictures with a temporal ID of 0 and 1 pictures, and a temporal ID of 0. SOP structure consisting of picture, picture of 1 and 2 pictures, adaptive decision to one of picture whose temporal ID is 0, SOP structure consisting of 1 picture, 1 picture, 2 pictures and 3 pictures A video encoding device comprising: an encoding structure determining unit.
The video coding apparatus according to claim 1, wherein the coding structure determining unit determines a reference picture distance.
The video encoding apparatus according to claim 1, wherein the analysis unit analyzes a motion vector as the encoded statistical information.
The video encoding device according to any one of claims 1 to 3, wherein the analysis unit analyzes a prediction mode of a block near a slice boundary as the encoded statistical information.
The video encoding apparatus according to any one of claims 1 to 4, wherein the analysis unit analyzes a generated code amount of a block near a slice boundary as the encoded statistical information.
SOP structure consisting of only pictures with Temporal ID 0, SOP structure consisting of pictures with Temporal ID 0 and 1 pictures, pictures with Temporal ID 0, 1 pictures, and 2 pictures A video decoding apparatus comprising: decoding means for decoding a video encoded with any one of an SOP structure, a picture having a temporal ID of 0, a picture of 1, a picture of 2, a picture of 2, and a picture of 3;
The video to be decoded is divided into a predetermined number of slices, encoded under the motion vector limit near the slice boundary, and set so that the optimal motion vector can be selected near the slice boundary under the motion vector limit The video decoding apparatus according to claim 6, wherein the video decoding apparatus is encoded with a structure.
A video encoding method that divides a video into a predetermined number of slices and performs an encoding process under a motion vector restriction near a slice boundary,
Analyze encoding statistics,
Based on the analysis results, estimate whether the optimal motion vector can be selected near the slice boundary,
Based on the estimation result, the coding structure is composed of an SOP structure composed only of pictures with a temporal ID of 0, an SOP structure composed of pictures with a temporal ID of 0 and 1 pictures, a picture with a temporal ID of 0, and a 1 Picture coding method for adaptively determining one of a picture and a SOP structure composed of two pictures, a picture whose temporal ID is 0, a picture of one picture, a picture of two, and a picture of three .
The video encoding method according to claim 8, wherein a reference picture distance is determined as the encoding structure.
The video encoding method according to claim 8 or 9, wherein a motion vector is analyzed as the encoded statistical information.
The video encoding method according to any one of claims 8 to 10, wherein a prediction mode of a block near a slice boundary is analyzed as the encoding statistical information.
The video coding method according to any one of claims 8 to 11, wherein a generated code amount of a block near a slice boundary is analyzed as the coding statistical information.
SOP structure consisting of only pictures with Temporal ID 0, SOP structure consisting of pictures with Temporal ID 0 and 1 pictures, pictures with Temporal ID 0, 1 pictures, and 2 pictures A video decoding method for decoding video encoded with an SOP structure, a picture having a temporal ID of 0, a picture of 1, a picture of 2, a picture of 2, and a picture of SOP.
It is divided into a predetermined number of slices, encoded under the motion vector limit near the slice boundary, and encoded with the SOP structure set so that the optimal motion vector can be selected near the slice boundary under the motion vector limit. The video decoding method according to claim 13, wherein the video is decoded.
A program for executing a video encoding method that divides a video into predetermined slices and performs an encoding process under a motion vector restriction near a slice boundary,
On the computer,
A process of analyzing the encoding statistics,
Based on the analysis result, a process for estimating whether an optimal motion vector can be selected near the slice boundary;
Based on the estimation result, the coding structure is composed of an SOP structure composed only of pictures with a temporal ID of 0, an SOP structure composed of pictures with a temporal ID of 0 and 1 pictures, a picture with a temporal ID of 0, and a 1 SOP structure composed of pictures and 2 pictures, and processing to adaptively determine any of SOP structures composed of pictures with temporal ID 0, 1 picture, 2 pictures and 3 pictures Video encoding program for
16. The video encoding program according to claim 15, which causes a computer to execute a process of determining a reference picture distance as the encoding structure.
The video encoding program according to claim 15, wherein the computer analyzes a prediction mode of a block near a slice boundary as the encoding statistical information.
The video encoding program according to any one of claims 15 to 17, which causes a computer to analyze a generated code amount of a block near a slice boundary as the encoded statistical information.
On the computer,
SOP structure consisting of only pictures with Temporal ID 0, SOP structure consisting of pictures with Temporal ID 0 and 1 pictures, pictures with Temporal ID 0, 1 pictures, and 2 pictures A video decoding program for executing processing for decoding a video encoded with any of the SOP structure, a picture with a temporal ID of 0, 1 picture, 2 pictures, and 3 pictures.
On the computer,
It is divided into a predetermined number of slices, encoded under the motion vector limit near the slice boundary, and encoded with the SOP structure set so that the optimal motion vector can be selected near the slice boundary under the motion vector limit. The video decoding program according to claim 19, wherein the video is decoded.