KR101868270B1 - Content-aware video encoding method, controller and system based on single-pass consistent quality control - Google Patents

Content-aware video encoding method, controller and system based on single-pass consistent quality control

Info

Publication number
KR101868270B1
Authority
KR
South Korea
Prior art keywords
current frame
frame
quantization parameter
distortion
optimal
Prior art date
Application number
KR1020170026356A
Other languages
Korean (ko)
Inventor
김기원
경종민
Original Assignee
재단법인 다차원 스마트 아이티 융합시스템 연구단
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 재단법인 다차원 스마트 아이티 융합시스템 연구단 filed Critical 재단법인 다차원 스마트 아이티 융합시스템 연구단
Priority to KR1020170026356A priority Critical patent/KR101868270B1/en
Application granted granted Critical
Publication of KR101868270B1 publication Critical patent/KR101868270B1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124Quantisation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/142Detection of scene cut or scene change
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/164Feedback from the receiver or from the transmission channel
    • H04N19/166Feedback from the receiver or from the transmission channel concerning the amount of transmission errors, e.g. bit error rate [BER]

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

According to an embodiment, a method for content-aware video encoding based on single-pass consistent quality control includes: a step of detecting a screen change for a current frame of a group of pictures, the group of pictures including a plurality of frames; a step of determining an optimal frame type of the current frame based on a result of detecting the screen change; a step of setting an initial quantization parameter in the current frame based on an optimal quantization parameter in a previous frame of the current frame; a step of obtaining model parameters in the current frame based on the optimal frame type of the current frame and the initial quantization parameter by using a pre-built model parameter lookup table; a step of calculating a predicted encoding distortion of the current frame based on the obtained model parameters and a screen descriptor of the previous frame received from a video encoder by using a pre-built distortion prediction model for the current frame; and a step of obtaining an optimal quantization parameter in the current frame minimizing the difference between the predicted encoding distortion of the current frame and the target encoding distortion of the current frame.

Description

TECHNICAL FIELD [0001] The present invention relates to a content-aware video encoding method, a controller, and a system based on single-pass consistent picture quality control.

The following description relates to a content-aware video encoding method, a controller and a system based on a single-pass encoding structure, and more particularly, to a method and apparatus for determining the quantization parameter (QP) of each frame so as to minimize the inter-frame distortion variance.

Recent advances in video technology have led to a variety of battery-powered miniature cameras that capture, record, and stream image / video of a user's personal activities. Although such a video camera system requires high video quality, a trade-off relationship between the encoding bit-rate and the distortion of the video frame becomes a very important problem due to the limitations on storage capacity, channel bandwidth and battery life.

Many techniques have been developed to control the encoding bit-rate and/or distortion while satisfying the constraints of the encoding system, in order to improve rate-distortion (R-D) performance. In particular, since the size of the quantization step greatly affects the R-D performance in the video encoding process, techniques that control the bit-rate and distortion by controlling the quantization parameter, which determines the size of the quantization step, have been developed.

Specifically, with respect to R-D control, video encoding is classified into variable bit-rate (VBR) encoding and constant bit-rate (CBR) encoding. VBR encoding allocates a different bit-rate to each frame, while CBR encoding allocates a uniform bit-rate to all frames. Due to storage capacity or channel bandwidth constraints, most conventional techniques employ CBR encoding. However, despite the differing screen characteristics, uniform bit-rate allocation for all frames often causes severe distortion variation. Such increased inter-frame distortion variation causes flicker and is undesirable because it degrades subjective video quality as perceived by the human visual system.

Thus, consistent image quality control methods for reducing inter-frame distortion variance in CBR encoding have been proposed. Consistent image quality control focuses on minimizing the difference between the encoded distortion level and the target distortion level for each frame, i.e., minimizing the distortion variance over all frames. To achieve consistent image quality control, different bit-rates may be assigned to each frame depending on the screen characteristics by selecting an appropriate quantization parameter value for each frame. However, this increases the overall bit-rate and, furthermore, has the disadvantage that buffer overflow frequently causes frame degradation.

To solve the buffer overflow problem, most commercial hardware video encoders attempt to reduce the frame rate according to the degree of buffer fullness. However, reducing the frame rate according to buffer fullness cannot compensate for the frame degradation, which often degrades perceptual video quality.

Constrained Variable Bit-Rate (CVBR) encoding assigns a different bit-rate to each frame as long as the bit-rate constraint is satisfied. In the case of CVBR encoding, the existing consistent quality control scheme determines the optimal quantization parameter for each frame based on the result of a first-pass encoding, and uses the resulting quantization parameters for the second-pass encoding. However, in addition to increased computational complexity and power consumption, the resulting longer encoding delay makes multi-pass encoding unsuitable for real-time video streaming applications.

Thus, in order to overcome the problems of multi-pass encoding, consistent picture quality management based on single-pass encoding has been proposed. Unlike multi-pass encoding, single-pass encoding requires accurate prediction of encoding distortion because it cannot determine the quantization parameter of the current frame using the encoding result of the current frame.

Since the influence of screen complexity on the R-D performance is significant, the screen descriptor indicating the screen complexity is a very important factor in encoding distortion prediction. Conventional techniques have proposed several screen descriptors for various applications such as perceptual coding, image/video quality evaluation, and object recognition, as well as consistent image quality management. However, conventional screen descriptors have the drawback that they are not suitable for real-time video streaming applications due to their high computational complexity. Furthermore, when a sudden screen change occurs, inaccurate distortion prediction and propagation of the prediction error result in large distortion variation.

Therefore, it is necessary to provide a single-pass consistent image quality control technique that uses a lightweight screen descriptor, an accurate distortion prediction model, and a powerful screen change detection method.

One embodiment of the present invention proposes a content-aware video encoding method, a controller, and a system based on a single-pass consistent picture quality control using a lightweight screen descriptor, an accurate distortion prediction model, and a strong screen change detection method.

In particular, one embodiment proposes a screen descriptor that can be obtained without additional cost, calculated from DCT (Discrete Cosine Transform) coefficients so as to quantitatively express the screen complexity of a frame.

In addition, one embodiment proposes a distortion prediction model considering screen complexity, by defining a relationship between the screen descriptor and intra-frame distortion based on a distortion-quantization (D-Q) relation derived from a Cauchy distribution.

In addition, one embodiment proposes a screen change detection method that is robust against intensity changes and improves accuracy by using the ratio of prediction modes of macroblocks (MBs) within one frame.

According to an exemplary embodiment, a content-aware video encoding method based on single-pass consistent image quality control includes: performing screen change detection on a current frame of a group of pictures (the group of pictures includes a plurality of frames); determining an optimal frame type of the current frame based on a result of performing the screen change detection; setting an initial quantization parameter in the current frame based on an optimal quantization parameter in a previous frame of the current frame; obtaining model parameters in the current frame based on the optimal frame type of the current frame and the initial quantization parameter using a pre-built model parameter lookup table; calculating a predicted encoding distortion of the current frame based on the screen descriptor of the previous frame received from the video encoder and the obtained model parameters, using a distortion prediction model constructed beforehand for the current frame; and obtaining an optimal quantization parameter in the current frame that minimizes the difference between the predicted encoding distortion of the current frame and the target encoding distortion of the current frame.

According to one aspect, the screen descriptor of the previous frame may be calculated based on the horizontal slope AC coefficient and the vertical slope AC coefficient among the DCT (Discrete Cosine Transform) coefficients in the previous frame, so as to quantitatively express the screen complexity of the previous frame.

According to another aspect, the distortion prediction model can be constructed in advance to define a relationship between the screen descriptor and the predicted encoding distortion using the model parameters.

According to another aspect, performing the scene change detection on the current frame may include performing the scene change detection on the current frame based on a ratio between the number of macroblocks encoded in the intra mode in the previous frame, received from the video encoder, and the number of macroblocks encoded in the skip mode in the previous frame.

According to another aspect, performing the scene change detection on the current frame may include: recognizing that a scene change occurs in the current frame if the ratio between the number of macroblocks encoded in the intra mode in the previous frame and the number of macroblocks encoded in the skip mode in the previous frame is greater than or equal to a preset threshold; or recognizing that a scene change does not occur in the current frame if the ratio between the number of macroblocks encoded in the intra mode in the previous frame and the number of macroblocks encoded in the skip mode in the previous frame is smaller than the threshold. Determining the optimal frame type of the current frame may include: determining the optimal frame type of the current frame as an I frame if it is recognized that a scene change has occurred in the current frame; or determining the optimal frame type of the current frame as a P frame if it is recognized that a scene change has not occurred in the current frame.
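The threshold test and frame-type decision described above can be sketched as follows. The function and variable names, the numeric threshold used in the example, and the guard against a zero skip count are illustrative assumptions, not the patent's exact procedure.

```python
# Hypothetical sketch of the intra/skip-ratio scene-change test and the
# resulting frame-type decision. Names and the zero-skip guard are
# illustrative assumptions.

I_FRAME, P_FRAME = 1, 0  # the description expresses I frames as 1, P frames as 0

def decide_frame_type(n_intra: int, n_skip: int, threshold: float) -> int:
    """Decide the current frame's type from the previous frame's statistics.

    n_intra -- macroblocks of the previous frame encoded in intra mode
    n_skip  -- macroblocks of the previous frame encoded in skip mode
    """
    ratio = n_intra / max(n_skip, 1)  # guard against a zero skip count
    # A ratio at or above the threshold is taken as a scene change -> I frame.
    return I_FRAME if ratio >= threshold else P_FRAME
```

For instance, a previous frame with many intra-coded and few skip-coded macroblocks yields a large ratio and forces an I frame for the current frame.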

According to another aspect of the present invention, performing the screen change detection on the current frame includes: determining whether the current frame is the first frame of the group of pictures; and performing the screen change detection on the current frame if, as a result of the determination, the current frame is not the first frame of the group of pictures.

According to another aspect, the model parameter lookup table may be pre-built to define a relationship between the optimal frame types, quantization parameters, and model parameters in sample frames.

According to another aspect, obtaining the optimal quantization parameter in the current frame may comprise: increasing or decreasing the initial quantization parameter such that the difference between the predicted encoding distortion of the current frame and the target encoding distortion of the current frame is minimized; and obtaining the increased or decreased initial quantization parameter as the optimal quantization parameter in the current frame.

According to another aspect, the steps of obtaining the model parameters in the current frame, calculating the predicted encoding distortion of the current frame, and obtaining the optimal quantization parameter in the current frame may be repeatedly performed at least once until the difference between the predicted encoding distortion of the current frame and the target encoding distortion of the current frame is minimized. If the difference between the predicted encoding distortion of the current frame and the target encoding distortion of the current frame is not minimized, the increased or decreased initial quantization parameter may be used as the initial quantization parameter in the next iteration of obtaining the model parameters in the current frame.
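The iterative loop described above — fetch model parameters for the current QP, predict the distortion, adjust the QP, and repeat — can be sketched as below. The lookup-table shape, the predictor signature, the QP range, and the termination tolerance are assumptions for illustration, not the patent's definitions.

```python
# Minimal sketch of the iterative QP search, assuming a generic distortion
# predictor `predict_distortion(qp, frame_type, params)` and a lookup table
# keyed by (frame_type, qp). Both are illustrative stand-ins.

def find_optimal_qp(qp_init, frame_type, lut, predict_distortion,
                    target_distortion, qp_min=0, qp_max=51, tol=1e-3):
    qp, best_qp = qp_init, qp_init
    best_diff = float("inf")
    for _ in range(qp_max - qp_min + 1):          # bounded iteration
        params = lut[(frame_type, qp)]            # model parameters for (F, QP)
        diff = predict_distortion(qp, frame_type, params) - target_distortion
        if abs(diff) < best_diff:
            best_diff, best_qp = abs(diff), qp
        if best_diff <= tol:
            break
        # Increase QP if the predicted distortion is below target, else
        # decrease it, and reuse the adjusted QP for the next lookup.
        step = 1 if diff < 0 else -1
        if not qp_min <= qp + step <= qp_max:
            break
        qp += step
    return best_qp
```

With a predictor in which distortion grows monotonically with QP, the loop walks the QP toward the value whose predicted distortion is closest to the target.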

According to another aspect, the content-aware video encoding method based on single-pass consistent picture quality control further comprises: calculating a target encoding bit-rate in the current frame based on a constraint on the average number of encoded bits per frame; and calculating a target encoding distortion of the current frame based on the target encoding bit-rate in the current frame, wherein obtaining the optimal quantization parameter in the current frame comprises: estimating a predicted encoding bit-rate in the current frame based on the obtained optimal quantization parameter; and increasing the optimal quantization parameter until the predicted encoding bit-rate in the current frame is less than or equal to the target encoding bit-rate in the current frame.
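The CVBR post-check described above — raise the distortion-driven QP until the predicted bit-rate fits the per-frame target — can be sketched as follows; `predict_bitrate` is an assumed stand-in for the patent's bit-rate predictor, and the QP ceiling is illustrative.

```python
# Hedged sketch of the CVBR bit-rate check: after the distortion-driven QP
# is found, increase it until the predicted bit-rate meets the target.
# `predict_bitrate(qp, frame_type)` is an assumed stand-in.

def enforce_bitrate(qp, frame_type, predict_bitrate, target_bits, qp_max=51):
    while qp < qp_max and predict_bitrate(qp, frame_type) > target_bits:
        qp += 1  # a larger QP coarsens quantization and lowers the bit-rate
    return qp
```

If the frame already fits the target, the QP is returned unchanged.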

According to another aspect, the content-aware video encoding method based on single-pass consistent picture quality control comprises encoding the current frame based on the optimal frame type of the current frame and the optimal quantization parameter in the current frame.

According to another aspect, the step of encoding the current frame may further comprise: calculating an update parameter using the encoding distortion of the current frame, obtained as a result of encoding the current frame, and the predicted encoding distortion of the current frame; and using the update parameter in calculating the predicted encoding distortion for the next frame of the current frame.
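The patent derives an update parameter from the actual and predicted encoding distortion of the current frame and feeds it into the prediction for the next frame. The multiplicative correction below is an illustrative assumption about one simple form such feedback can take; the patent does not specify this exact formula in the excerpt.

```python
# Assumed multiplicative-correction sketch of the "update parameter":
# the ratio of measured to predicted distortion scales the next prediction.

def update_factor(actual_distortion: float, predicted_distortion: float) -> float:
    if predicted_distortion <= 0.0:
        return 1.0  # no usable prediction; leave the model unchanged
    return actual_distortion / predicted_distortion

def corrected_prediction(raw_prediction: float, factor: float) -> float:
    # Apply the factor computed on frame i to the raw prediction for frame i+1.
    return raw_prediction * factor
```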

According to one embodiment, there is provided a computer program stored on a medium for executing, in combination with a computer implementing an electronic device, a content-aware video encoding method based on single-pass consistent quality control, the method comprising: performing screen change detection on a current frame of a group of pictures including a plurality of frames; determining an optimal frame type of the current frame based on a result of performing the screen change detection; setting an initial quantization parameter in the current frame based on an optimal quantization parameter in a previous frame of the current frame; obtaining model parameters in the current frame based on the optimal frame type of the current frame and the initial quantization parameter using a pre-built model parameter lookup table; calculating a predicted encoding distortion of the current frame based on the screen descriptor of the previous frame received from the video encoder and the obtained model parameters, using a distortion prediction model constructed beforehand for the current frame; and obtaining an optimal quantization parameter in the current frame that minimizes the difference between the predicted encoding distortion of the current frame and the target encoding distortion of the current frame.

According to one embodiment, a content-aware video encoding controller based on single-pass consistent picture quality control includes: a screen change detection unit that performs screen change detection on a current frame of a group of pictures (the group of pictures includes a plurality of frames) and determines an optimal frame type of the current frame based on a result of performing the screen change detection; a distortion prediction/quantization parameter setting unit that sets an initial quantization parameter in the current frame based on an optimal quantization parameter in a previous frame of the current frame; and a model parameter obtaining unit that obtains model parameters in the current frame based on the optimal frame type of the current frame and the initial quantization parameter using a pre-built model parameter lookup table, wherein the distortion prediction/quantization parameter setting unit calculates the predicted encoding distortion of the current frame based on the screen descriptor of the previous frame received from the video encoder and the obtained model parameters, using a distortion prediction model constructed beforehand for the current frame, and obtains an optimal quantization parameter in the current frame that minimizes the difference between the predicted encoding distortion of the current frame and the target encoding distortion of the current frame.

According to one embodiment, a content-aware video encoding system based on single-pass consistent quality control comprises: a video encoder; and a content-aware video encoding controller for controlling the video encoder, wherein the content-aware video encoding controller includes: a screen change detection unit that performs screen change detection on a current frame of a group of pictures (the group of pictures includes a plurality of frames) and determines an optimal frame type of the current frame based on a result of performing the screen change detection; a distortion prediction/quantization parameter setting unit that sets an initial quantization parameter in the current frame based on an optimal quantization parameter in a previous frame of the current frame; and a model parameter obtaining unit that obtains model parameters in the current frame based on the optimal frame type of the current frame and the initial quantization parameter using a pre-built model parameter lookup table, wherein the distortion prediction/quantization parameter setting unit calculates the predicted encoding distortion of the current frame based on the screen descriptor of the previous frame received from the video encoder and the obtained model parameters, using a distortion prediction model constructed beforehand for the current frame, and obtains an optimal quantization parameter in the current frame that minimizes the difference between the predicted encoding distortion of the current frame and the target encoding distortion of the current frame, and wherein the video encoder encodes the current frame based on the optimal frame type of the current frame and the optimal quantization parameter in the current frame.

One embodiment can propose a content-aware video encoding method, a controller and a system based on a single-pass consistent picture quality control using a lightweight screen descriptor, an accurate distortion prediction model, and a powerful screen change detection method.

In particular, one embodiment may propose a screen descriptor that can be obtained at no additional cost by computing the screen descriptor based on DCT coefficients to quantitatively represent intra-frame screen complexity.

In addition, one embodiment can propose a distortion prediction model considering screen complexity, by defining a relationship between the screen descriptor and intra-frame distortion based on a distortion-quantization relation derived from a Cauchy distribution.

In addition, one embodiment can propose a screen change detection method that is robust to intensity changes and improves accuracy by using the ratio of prediction modes of the macroblocks in one frame.

Therefore, one embodiment can propose a technique for reducing the distortion variation while satisfying the bit-rate constraint through the image quality control method applicable to the VBR and CVBR.

FIG. 1 is a diagram illustrating the correlation between the screen descriptor and the Sobel operator value according to an exemplary embodiment of the present invention.
FIG. 2 is a graph showing the standard deviation of the screen descriptor and the Sobel operator value in a frame for 15 video sequences according to an exemplary embodiment.
FIGS. 3 to 5 are graphs showing, for five video sequences according to an embodiment, the model parameters of each of the I frame and the P frame for each quantization parameter, together with the screen descriptor.
FIG. 6 is a diagram illustrating the model parameters of the distortion prediction model according to an embodiment.
FIGS. 7 and 8 are diagrams showing a comparison between the distortion estimated by the distortion prediction model according to an embodiment and the actually measured distortion.
FIG. 9 is a diagram for explaining the ratio of prediction modes of intra-frame macroblocks used in screen change detection according to an embodiment.
FIG. 10 is a diagram illustrating the type of a frame after a screen change in single-pass encoding according to an exemplary embodiment.
FIG. 11 is a block diagram illustrating a content-aware video encoding system in accordance with one embodiment.
FIG. 12 is a diagram for explaining a process of calculating the predicted encoding distortion during the content-aware video encoding process according to an embodiment.
FIG. 13 is a flowchart illustrating a content-aware video encoding method according to an embodiment.
FIG. 14 is a diagram illustrating a content-aware quantization parameter determination algorithm according to an embodiment.
FIG. 15 is a flowchart showing a specific example in which the content-aware video encoding method shown in FIG. 13 is performed in VBR encoding.
FIG. 16 is a flowchart showing a specific example in which the content-aware video encoding method shown in FIG. 13 is performed in CVBR encoding.

Hereinafter, embodiments according to the present invention will be described in detail with reference to the accompanying drawings. However, the present invention is not restricted or limited by these embodiments. In addition, the same reference numerals shown in the drawings denote the same members.

Also, the terminologies used herein are terms selected to properly describe preferred embodiments of the present invention, and their meaning may vary depending on the intention of a user or operator, or the custom in the field to which the present invention belongs. Therefore, the definitions of these terms should be based on the contents throughout this specification.

The content-aware video encoding method in accordance with an embodiment minimizes variation in the encoding distortion while adhering, under the given capacity of the encoding buffer, to a given average encoded bit-rate \(R_{avg}\) (in bits per frame) and a given target distortion \(D_T\).

In this case, two criteria can be considered to represent the distortion variation: minimizing the average distortion (MINAVE) and minimizing the variance of distortion (MINVAR). Since MINVAR is a more suitable and intuitive criterion than MINAVE in terms of consistent video quality, the content-aware video encoding method according to an exemplary embodiment formulates the problem, in the case of VBR encoding, as searching, based on MINVAR, for the optimal quantization parameter \(Q_i^*\) as well as the optimal frame type \(F_i^*\) of each frame.

<Formula 1>

\[
\{Q^{*}, F^{*}\} = \underset{Q,\,F}{\arg\min}\; \sigma_{D}^{2}(Q, F)
\]

<Formula 2>

\[
\sigma_{D}^{2}(Q, F) = \frac{1}{N} \sum_{i=1}^{N} \bigl( D_{i}(Q_{i}, F_{i}) - D_{T} \bigr)^{2}
\]

In Equation 2, \(N\) represents the number of frames, \(D_T\) represents the given target distortion, \(D_i(Q_i, F_i)\) represents the encoding distortion of the i-th frame encoded with quantization parameter \(Q_i\) and frame type \(F_i\), and \(\sigma_D^2\) represents the variance of the encoding distortion around the target.

In Equation 1, \(Q^* = \{Q_1^*, \ldots, Q_N^*\}\) is the set of optimal quantization parameters for all frames. Likewise, \(F^* = \{F_1^*, \ldots, F_N^*\}\) is the set of optimal frame types for all frames. Below, \(F_i\) is expressed as 1 when the frame is an I frame and 0 when it is a P frame.
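As a numerical illustration of the MINVAR objective of Equation 2 above, the following sketch computes the mean squared deviation of per-frame encoding distortions from the target; the distortion values themselves are made up for the example.

```python
# Numerical illustration of the MINVAR objective: the mean squared
# deviation of per-frame encoding distortions from the target distortion.
# The per-frame distortion values are invented for the example.

def minvar_objective(distortions, target):
    n = len(distortions)
    return sum((d - target) ** 2 for d in distortions) / n

uniform   = [30.0, 30.0, 30.0, 30.0]   # consistent quality
fluctuant = [20.0, 40.0, 25.0, 35.0]   # same mean, visible flicker
assert minvar_objective(uniform, 30.0) < minvar_objective(fluctuant, 30.0)
```

Two sequences with the same average distortion can thus differ sharply under MINVAR, which is exactly the flicker that consistent quality control suppresses.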

The content-aware video encoding method according to an embodiment adds, in the case of CVBR encoding, an encoding bit-rate constraint such as Equation 3.

<Formula 3>

\[
\frac{1}{N} \sum_{i=1}^{N} R_{i}(Q_{i}, F_{i}) \le R_{avg}
\]

In Equation 3, \(R_i(Q_i, F_i)\) represents the encoding bit-rate of the i-th frame, and \(W\) and \(H\) represent the frame width and the frame height in pixels, respectively.

By solving Equations 2 and 3, the optimal quantization parameter \(Q_i^*\) and the optimal frame type \(F_i^*\) that minimize the variation of the encoding distortion of the i-th frame can be obtained. However, finding the exact solution of Equation 1 by solving Equations 2 and 3 requires searching every combination of quantization parameter and frame type over all frames, which entails an enormous computational overhead.

Thus, the content-aware video encoding method according to one embodiment simplifies the solution by localizing the global problem of Equations 2 and 3 into per-frame local problems, searching for the optimal quantization parameter \(Q_i^*\) and the optimal frame type \(F_i^*\) of the i-th frame as shown in Equation 4.

<Formula 4>

\[
\{Q_{i}^{*}, F_{i}^{*}\} = \underset{Q_{i},\,F_{i}}{\arg\min}\; \bigl| \hat{D}_{i}(Q_{i}, F_{i}) - D_{T,i} \bigr|
\]

In Equation 4, \(D_{T,i}\) represents the target encoding distortion of the i-th frame, \(\hat{D}_i(Q_i, F_i)\) represents the predicted encoding distortion of the i-th frame, and \(|\hat{D}_i - D_{T,i}|\) represents the difference between the predicted encoding distortion and the target encoding distortion of the i-th frame.

Further, in the CVBR encoding, the bit-rate constraint is added as shown in Equation 5 below.

<Formula 5>

\[
\hat{R}_{i}(Q_{i}^{*}, F_{i}^{*}) \le R_{T,i}
\]

In Equation 5, \(R_{T,i}\) represents the target encoding bit-rate of the i-th frame, and \(\hat{R}_i(Q_i^*, F_i^*)\) represents the predicted encoding bit-rate of the i-th frame.

The terms used in Equations 1 to 5, together with the terms used in the following expressions, are summarized in Table 1.

Terms — Explanation
\(D_T\) — Given target distortion
\(R_{avg}\) — Given average encoded bit-rate
\(Q^*\) — Set of optimal quantization parameters
\(F^*\) — Set of optimal frame types
\(D_i\) — Encoding distortion of the i-th frame
\(R_i\) — Encoding bit-rate of the i-th frame
\(Q_i^*\) — Optimal quantization parameter of the i-th frame
\(F_i^*\) — Optimal frame type of the i-th frame
\(\hat{D}_i\) — Predicted encoding distortion of the i-th frame
\(\hat{R}_i\) — Predicted encoding bit-rate of the i-th frame
\(N\) — Number of frames
\(W\) — Frame width in pixels
\(H\) — Frame height in pixels
\(D_{T,i}\) — Target encoding distortion of the i-th frame
\(R_{T,i}\) — Target encoding bit-rate of the i-th frame
\(Q_i\) — Quantization parameter of the i-th frame
\(F_i\) — Frame type of the i-th frame
\(N_{MB}\) — Number of macroblocks in the frame
\(SD_{i,j}\) — Screen descriptor of the j-th macroblock of the i-th frame
\(S_h\) — Horizontal Sobel operator of an N×N block
\(S_v\) — Vertical Sobel operator of an N×N block
\(p(k, l)\) — Pixel value at position (k, l) of the frame
\(|S|\) — Magnitude of the Sobel gradient of an N×N block
\(C(u, v)\) — DCT coefficient at position (u, v) of an N×N block
\(SD\) — Screen descriptor of an N×N block
\(D_{i,j}\) — Encoding distortion of the j-th macroblock of the i-th frame
\(q_i\) — Quantization step size of the i-th frame
\(N_{intra}\) — Number of macroblocks encoded in the intra mode in a frame
\(N_{skip}\) — Number of macroblocks encoded in the skip mode in a frame
\(T_{SC}\) — Threshold for screen change detection

FIG. 1 is a diagram illustrating the correlation between the screen descriptor and the Sobel operator value according to an exemplary embodiment, and FIG. 2 is a diagram illustrating the screen descriptor and the Sobel operator value per frame for 15 video sequences according to an exemplary embodiment.

Referring to FIGS. 1 and 2, the screen complexity is the degree of variation among neighboring pixel values in a frame and can be expressed as a slope magnitude through a slope operator. In general, the Sobel operator, which can accurately measure the slope information of an image among the various operators, is widely used for measuring screen complexity. The horizontal and vertical Sobel operators of an N * N block are defined as Equations 6 and 7.

<Formula 6>

Figure 112017020511876-pat00068

Equation (7)

Figure 112017020511876-pat00069

In Equations 6 and 7,

Figure 112017020511876-pat00070
Represents a horizontal Sobel operator,
Figure 112017020511876-pat00071
Represents a vertical Sobel operator,
Figure 112017020511876-pat00072
Represents the pixel value at the position (k, l) of the frame. From Equations 6 and 7, the magnitude of the Sobel slope of the N * N block
Figure 112017020511876-pat00073
Is defined as Eq. (8).

<Formula 8>

Figure 112017020511876-pat00074
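As an illustration of Equations 6 to 8, the sketch below applies the standard 3 * 3 Sobel kernels to an N * N block and sums the resulting slope magnitudes. It is a minimal pure-Python sketch; the patent's exact operator definitions appear only as images, so the kernels and function names here are assumptions.

```python
SOBEL_H = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal Sobel kernel (assumed)
SOBEL_V = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical Sobel kernel (assumed)

def sobel_magnitude(block):
    """Sum of Sobel slope magnitudes over the interior pixels of an N*N block."""
    n = len(block)
    total = 0.0
    for k in range(1, n - 1):
        for l in range(1, n - 1):
            # Convolve the 3*3 neighborhood of pixel (k, l) with both kernels.
            gh = sum(SOBEL_H[a][b] * block[k - 1 + a][l - 1 + b]
                     for a in range(3) for b in range(3))
            gv = sum(SOBEL_V[a][b] * block[k - 1 + a][l - 1 + b]
                     for a in range(3) for b in range(3))
            total += (gh * gh + gv * gv) ** 0.5
    return total
```

A flat block yields a magnitude of zero, while any edge inside the block yields a positive value.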

The slope information in the spatial domain can be expressed by transform coefficients in the frequency domain. In particular, among the various transforms, the DCT is often used in image / video encoders. The DCT coefficients consist of a DC coefficient, a horizontal slope AC coefficient, a vertical slope AC coefficient, and other AC coefficients. According to the definition of the N * N DCT, the horizontal slope AC coefficient and the vertical slope AC coefficient are defined as Equations 9 and 10.

Equation (9)

Figure 112017020511876-pat00075

<Formula 10>

Figure 112017020511876-pat00076

In Equations 9 and 10,

Figure 112017020511876-pat00077
Represents the horizontal slope AC coefficient of the N * N block DCT,
Figure 112017020511876-pat00078
Represents the vertical slope AC coefficient of the N * N block DCT. Such
Figure 112017020511876-pat00079
And
Figure 112017020511876-pat00080
Can be obtained from the video encoder at no additional cost.

In the content-aware video encoding method according to an exemplary embodiment, the screen descriptor is defined as Equation 11 based on the horizontal slope AC coefficient and the vertical slope AC coefficient among the DCT coefficients.

<Formula 11>

Figure 112017020511876-pat00081

In Equation 11,

Figure 112017020511876-pat00082
Represents the screen descriptor of an N * N block. To verify how well this screen descriptor
Figure 112017020511876-pat00083
Represents the slope information, the per-frame averages of
Figure 112017020511876-pat00084
And
Figure 112017020511876-pat00085
Were calculated for the five video sequences of Aspen, Pedestrian, Sunflower, Bluesky, and Tractor. Referring to FIG. 1, the screen descriptor proposed by the content-aware video encoding method according to an exemplary embodiment is very similar to the magnitude of the Sobel slope, with a correlation coefficient (degree of similarity)
Figure 112017020511876-pat00086
Of 0.981.
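Equation 11 itself is shown only as an image; the sketch below therefore assumes one plausible form of the descriptor, the magnitude of the two slope AC coefficients C(0, 1) and C(1, 0) of a naive 2-D DCT-II. The function names and the combining rule are illustrative assumptions, not the patent's verbatim definition.

```python
import math

def dct2(block):
    """Naive 2-D DCT-II of an N*N block, returning coefficients C[u][v]."""
    n = len(block)
    def c(w):
        return math.sqrt(1.0 / n) if w == 0 else math.sqrt(2.0 / n)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for k in range(n):
                for l in range(n):
                    s += (block[k][l]
                          * math.cos((2 * k + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * l + 1) * v * math.pi / (2 * n)))
            out[u][v] = c(u) * c(v) * s
    return out

def screen_descriptor(block):
    """Descriptor built from the horizontal (C[0][1]) and vertical (C[1][0])
    slope AC coefficients; combining them by magnitude is an assumption."""
    coeffs = dct2(block)
    return math.hypot(coeffs[0][1], coeffs[1][0])
```

A uniform block gives a descriptor of zero, and a block with a strong horizontal or vertical ramp gives a large value, mirroring the Sobel-slope behavior described above.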

Furthermore, the screen complexity

Figure 112017020511876-pat00087
Is defined as the standard deviation of
Figure 112017020511876-pat00088
. By comparing it with the standard deviation of the screen descriptor, it is possible to know how well the screen descriptor proposed by the content-aware video encoding method according to the embodiment quantitatively expresses the screen complexity. For the 15 video sequences of Bridge-far, Mother, Akiyo, Water-fall, Silent, Soccer, Coastguard, Bridge-close, Tempete, Harbor, Football, Bus, Paris, Flower, and Mobile, the standard deviations of
Figure 112017020511876-pat00089
And
Figure 112017020511876-pat00090
Are shown in FIG. 2. Referring to FIG. 2, the relation between the screen descriptor proposed by the content-aware video encoding method according to an embodiment and the Sobel operator value is quasi-linear.

As described above, since the screen descriptor proposed by the content-aware video encoding method according to the embodiment is calculated using only some of the DCT coefficients (the horizontal slope AC coefficient and the vertical slope AC coefficient), it can be obtained at a low computational cost and can be utilized as an indicator that sufficiently reflects the screen complexity.

Accordingly, the screen descriptor according to one embodiment may be used independently in classifying video sequences by screen complexity, as well as in the distortion prediction model described later. For example, based on the screen descriptor, the fifteen video sequences of Bridge-far, Mother, Akiyo, Water-fall, Silent, Soccer, Coastguard, Bridge-close, Tempete, Harbor, Football, Bus, Paris, Flower, and Mobile can be classified into the three screen-complexity groups shown in Table 2 according to the conditions of Equation (12).

<Formula 12>

Figure 112017020511876-pat00091

Figure 112017020511876-pat00092

Figure 112017020511876-pat00093

In Equation 12,

Figure 112017020511876-pat00094
Represents a threshold value for classifying low screen complexity,
Figure 112017020511876-pat00095
Represents a threshold value for classifying the screen complexity as high. For example,
Figure 112017020511876-pat00096
Is set to 100, and
Figure 112017020511876-pat00097
Is set to 150. Also,
Figure 112017020511876-pat00098
Denotes the average of
Figure 112017020511876-pat00099
In the frame. Table 2 below shows the 15 video sequences of Bridge-far, Mother, Akiyo, Water-fall, Silent, Soccer, Coastguard, Bridge-close, Tempete, Harbor, Football, Bus, Paris, Flower, and Mobile classified according to Equation (12).

Group | Video sequences
Low | Bridge-far, Mother, Akiyo, Water-fall, Silent, Soccer, Coastguard, Bridge-close
Middle | Tempete, Harbor, Football, Bus, Paris
High | Flower, Mobile
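The three-way classification of Equation 12 and Table 2 can be sketched as follows, using the example thresholds of 100 and 150 mentioned in the text; the function name, constant names, and boundary handling are assumptions.

```python
# Example thresholds from the text (low / high screen-complexity thresholds).
T_LOW, T_HIGH = 100, 150

def classify_complexity(mean_descriptor):
    """Map a frame's average screen descriptor to a complexity group.
    The strictness of the boundary comparisons is an assumption."""
    if mean_descriptor < T_LOW:
        return "low"
    if mean_descriptor < T_HIGH:
        return "middle"
    return "high"
```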

Thus, the screen descriptor of the j-th macroblock of the i-th frame

Figure 112017020511876-pat00100
Is defined as Eq. (13).

Equation (13)

Figure 112017020511876-pat00101

In Equation 13, n and m denote the position indices of the 4 * 4 DCT blocks within the j-th macroblock. In video encoding, the residual macroblock, which is the difference between the current macroblock and the macroblock predicted by inter / intra prediction, can be transformed via the DCT. Therefore,

Figure 112017020511876-pat00102
For the calculation, the DCT coefficients
Figure 112017020511876-pat00103
Of the residual macroblock can be used.

Such

Figure 112017020511876-pat00104
May be used in constructing, in advance, the distortion prediction model used in the content-aware video encoding method according to an embodiment. A detailed description thereof will be given below.

FIGS. 3 to 5 are diagrams illustrating, for five video sequences according to one embodiment, the relation between the screen descriptor and the model parameter

Figure 112017020511876-pat00105
In the I frame and the P frame for each quantization parameter, FIG. 6 is a diagram illustrating the model parameters
Figure 112017020511876-pat00106
And
Figure 112017020511876-pat00107
According to various quantization parameters in each of the I frame and the P frame according to an embodiment, and FIGS. 7 to 8 are diagrams showing a comparison between the distortion estimated by the distortion prediction model according to an embodiment and the actually measured distortion.

Referring to FIGS. 3 to 8, the DQ relation based on the Cauchy distribution is defined as Equation 14.

<Formula 14>

Figure 112017020511876-pat00108

In Equation 14,

Figure 112017020511876-pat00109
And
Figure 112017020511876-pat00110
Indicate content-dependent model parameters,
Figure 112017020511876-pat00111
Represents the encoding distortion in terms of the Mean Square Error (MSE) for the j-th macroblock of the i-th frame,
Figure 112017020511876-pat00112
Is the quantization step size of the i-th frame and is determined by the quantization parameter
Figure 112017020511876-pat00113
. In H.264 / AVC, the relation between
Figure 112017020511876-pat00114
And
Figure 112017020511876-pat00115
Is expressed as shown in Equation (15).

<Formula 15>

Figure 112017020511876-pat00116

In Equation 15,

Figure 112017020511876-pat00117
And
Figure 112017020511876-pat00118
Represent constants. From Equations 14 and 15, the relation between
Figure 112017020511876-pat00119
And
Figure 112017020511876-pat00120
Is defined as Eq. (16).

<Formula 16>

Figure 112017020511876-pat00121

In Equation 16,

Figure 112017020511876-pat00122
And
Figure 112017020511876-pat00123
Represents a model parameter. Equation 16, which indicates the encoding distortion of the j-th macroblock of the i-th frame, maintains high fitting accuracy even when
Figure 112017020511876-pat00124
Is fixed to a constant value for each frame type. For example, when, for the five video sequences of Aspen, Pedestrian, Sunflower, Bluesky, and Tractor,
Figure 112017020511876-pat00125
Is set to 0.139 for the I frame and 0.116 for the P frame, the fitting accuracy of Equation 16 in terms of
Figure 112017020511876-pat00126
Is as shown in Table 3.

Video sequence | I frame | P frame
Aspen | 0.999 | 0.998
Bluesky | 0.996 | 0.994
Pedestrian | 0.989 | 0.985
Sunflower | 0.985 | 0.988
Tractor | 0.998 | 0.997
Average | 0.993 | 0.992
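As an illustration of the DQ relation, the sketch below assumes the common power-law form D = alpha * Qstep ** beta for the Cauchy-distribution-based model of Equation 14, and the widely used H.264 / AVC approximation Qstep = 2 ** ((QP - 4) / 6) for Equation 15. Both forms are assumptions, since the patent's formulas appear only as images.

```python
def qstep(qp):
    """H.264/AVC quantization step size: doubles every 6 QP, with
    Qstep(4) = 1.0 (a common approximation of the Equation 15 relation)."""
    return 2.0 ** ((qp - 4) / 6.0)

def cauchy_distortion(qp, alpha, beta):
    """Cauchy-density-based D-Q model, assumed here as D = alpha * Qstep**beta,
    with content-dependent parameters alpha and beta."""
    return alpha * qstep(qp) ** beta
```

With beta > 0 the predicted distortion increases monotonically with the quantization parameter, which matches the qualitative behavior the surrounding text describes.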

Especially,

Figure 112017020511876-pat00127
Is closely related to the screen complexity; since
Figure 112017020511876-pat00128
And
Figure 112017020511876-pat00129
Are constants,
Figure 112017020511876-pat00130
Has a higher correlation with the screen complexity. For the five video sequences in Table 3, the relation between the model parameter
Figure 112017020511876-pat00131
And the screen descriptor
Figure 112017020511876-pat00132
In the I and P frames with quantization parameters of 24, 32, and 40 is shown in FIGS. 3 to 5. Therefore, the relation between
Figure 112017020511876-pat00133
And the screen descriptor
Figure 112017020511876-pat00134
Is defined as Equation (17).

<Formula 17>

Figure 112017020511876-pat00135

In Equation 17,

Figure 112017020511876-pat00136
And
Figure 112017020511876-pat00137
Represent model parameters that depend on the frame type and the quantization parameter. Such
Figure 112017020511876-pat00138
And
Figure 112017020511876-pat00139
Can be obtained by curve fitting Equation 17 for each quantization parameter over the five video sequences of Table 3. At this time, as shown in FIG. 6, for each quantization parameter, the highest average fitting accuracy
Figure 112017020511876-pat00140
Is achieved by a particular pair of
Figure 112017020511876-pat00141
And
Figure 112017020511876-pat00142
, Which can be adopted.

From Equations 16 and 17, the relation between

Figure 112017020511876-pat00143
And
Figure 112017020511876-pat00144
Is derived as shown in Eq. (18).

<Formula 18>

Figure 112017020511876-pat00145

In Equation 18,

Figure 112017020511876-pat00146
Represents the model parameter. From Equation 18, the encoding distortion of the i-th frame
Figure 112017020511876-pat00147
Over the encoding distortions
Figure 112017020511876-pat00148
Of all the macroblocks in the frame is defined as Equation 19.

<Formula 19>

Figure 112017020511876-pat00149

In Equation 19,

Figure 112017020511876-pat00150
Represents the number of macroblocks in a frame.

As described above, the content-aware video encoding method according to an embodiment can construct, from the encoding distortion of the j-th macroblock of the i-th frame defined using the screen descriptor as in Equation 18, a content-characteristic-based distortion model that defines the frame distortion

Figure 112017020511876-pat00151
. Since this distortion model is defined using the screen descriptor indicating the screen complexity, the intra-frame distortion can be computed in consideration of not only the quantization parameter but also the screen complexity.

Accordingly, the content-aware video encoding method according to one embodiment can construct, from this distortion model, a distortion prediction model for the predicted encoding distortion

Figure 112017020511876-pat00152
. A detailed description thereof will be given below.

The fitting accuracy of the distortion prediction model constructed by Equations 18 to 19 for the ten video sequences of Akiyo, Bus, Crew, Football, Aspen, Station2, RushFieldCuts, OldTownCross, Sunflower, and DucksTakeOff is shown in Table 4. Here, Model 1 and Model 2 are models that use the SATD of the DCT and the SATD of the HT, respectively, as the screen descriptor and that do not consider the quantization parameter in the relation between the screen descriptor and the model parameters of the DQ model. As shown in Table 4, the distortion prediction model according to one embodiment has the highest average fitting accuracy between the estimated distortion and the actually measured distortion (e.g., an average

Figure 112017020511876-pat00153
Of 0.991).

Resolution | Video sequence | Model 1 | Model 2 | Distortion prediction model
CIF | Akiyo | 0.937 | 0.958 | 0.993
CIF | Bus | 0.950 | 0.916 | 0.996
CIF | Crew | 0.940 | 0.992 | 0.984
CIF | Football | 0.940 | 0.905 | 0.993
HD | Aspen | 0.920 | 0.959 | 0.997
HD | Station2 | 0.870 | 0.982 | 0.997
HD | RushFieldCuts | 0.982 | 0.912 | 0.992
HD | OldTownCross | 0.967 | 0.945 | 0.971
HD | Sunflower | 0.784 | 0.956 | 0.984
HD | DucksTakeOff | 0.991 | 0.953 | 0.998
 | Average | 0.930 | 0.948 | 0.991

Referring to FIGS. 7 to 8, which compare the distortion estimated by the distortion prediction model according to an embodiment with the actually measured distortion based on Table 4, the distortion prediction model according to an embodiment shows a higher average fitting accuracy than the conventional Models 1 and 2.

Hereinafter, the distortion prediction model according to an embodiment is described as being used to obtain the optimal quantization parameter, but it is not limited thereto and can be applied to various techniques such as rate-distortion-optimized macroblock mode decision. A detailed description thereof is omitted because it goes beyond the technical idea of the present invention.

FIG. 9 is a diagram for explaining the ratio of the prediction modes of the macroblocks in a frame, which is used in screen change detection according to an embodiment.

Due to the nature of video encoding using reference frames, a sudden screen change causes significant degradation of R-D performance. Therefore, in order to prevent degradation from prediction error due to the screen change, the first frame after the screen change must be encoded as an I frame. Detecting screen changes during video encoding is thus a necessary process for preventing serious deterioration of the R-D performance.

Accordingly, the content-aware video encoding method according to an exemplary embodiment of the present invention uses the macroblock mode counts such as

Figure 112017020511876-pat00154
To efficiently detect a screen change at a low computational cost.

When an abrupt screen change occurs, the pixels of a macroblock in the current frame have little correlation with the corresponding pixels of the corresponding macroblock in the previous frame. It therefore becomes more efficient to encode the macroblocks in intra mode than in inter mode or skip mode, resulting in a high proportion of intra-mode macroblocks in the frame.

Referring to FIG. 9 (a), for the first frame after a screen change captured under high-intensity light, the ratio of the macroblocks encoded in intra mode among those encoded in one of the skip, inter, and intra modes, i.e., the intra-encoded MB ratio (IMR,

Figure 112017020511876-pat00155
), Is more than 98%. However, referring to FIG. 9 (b), the IMR for the first frame after a screen change captured under low-intensity light is reduced by the increased proportion of skip-mode macroblocks; as the quantization parameter increases, more macroblocks are encoded in skip mode.

Accordingly, the content-aware video encoding method according to an exemplary embodiment may use the modified intra-encoded MB ratio (MIMR), as shown in Equation 20, to remove the effect of the quantization parameter and the increased skip mode on the IMR.

<Formula 20>

Figure 112017020511876-pat00156

In Equation 20, the MIMR represents the ratio between the number of macroblocks encoded in intra mode and the number of macroblocks encoded in skip mode, and

Figure 112017020511876-pat00157
Represents a threshold value for screen change detection. The MIMR can be used as a robust screen change detection indicator because it is independent of the quantization parameter for both high-intensity and low-intensity video sequences. At this time,
Figure 112017020511876-pat00158
Can be adaptively determined according to the MIMR analyzed for high-intensity and low-intensity video sequences.

For example, as shown in Table 5, for 512 concatenations of high-intensity and low-intensity video sequences, the minimum MIMR of the frames with a screen change is 0.955 and 0.728, respectively, while the maximum MIMR of the frames without a screen change is 0.476 and 0.655, respectively. Thus, for accurate detection,

Figure 112017020511876-pat00159
Should be lower than the minimum MIMR with a screen change and higher than the maximum MIMR without a screen change, and may be set to 0.7.

Concatenated video | Screen-change frames | Average MIMR | Minimum | Maximum
High intensity | w/ screen change | 0.983 | 0.955 | 0.998
High intensity | w/o screen change | 0.291 | 0.077 | 0.476
Low intensity | w/ screen change | 0.867 | 0.728 | 0.949
Low intensity | w/o screen change | 0.325 | 0.093 | 0.655

As described above, the MIMR, which represents the ratio between the number of macroblocks encoded in intra mode and the number of macroblocks encoded in skip mode, is a suitable indicator for detecting screen changes regardless of the intensity of the light. Thus, the content-aware video encoding method according to an exemplary embodiment performs screen change detection based on the MIMR.
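Equation 20 is likewise shown only as an image. One reading consistent with removing the effect of the increased skip mode from the IMR is to exclude skip-mode macroblocks from the denominator, as in the sketch below; the exact formula and all names are assumptions.

```python
def imr(n_intra, n_total):
    """Intra-encoded MB ratio: share of intra-mode macroblocks in the frame."""
    return n_intra / n_total

def mimr(n_intra, n_skip, n_total):
    """Modified IMR: skip-mode macroblocks are excluded from the denominator.
    This exact form is an assumption about the image-only Equation 20."""
    return n_intra / (n_total - n_skip)

def scene_change(n_intra, n_skip, n_total, threshold=0.7):
    """Declare a screen change when the MIMR exceeds the threshold
    (0.7 is the example value given in the text)."""
    return mimr(n_intra, n_skip, n_total) > threshold
```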

FIG. 10 is a diagram illustrating the types of the frames after a screen change in single-pass encoding according to an exemplary embodiment.

Referring to FIG. 10, in order to suppress error propagation due to a screen change, the first frame after the screen change needs to be encoded as an I frame. Although the first frame after the screen change has a high proportion of macroblocks encoded in intra mode, the remaining macroblocks encoded in inter or skip mode may still propagate errors caused by the screen change, so the R-D performance may be significantly degraded.

Accordingly, the content-aware video encoding method according to the embodiment can determine the frame type of the frames after a screen change by using the MIMR calculated as in Equation (20). More specifically, as shown in FIG. 10 (a), when the first frame after the screen change is encoded as an I frame, all of its macroblocks are already encoded in intra mode, so the error propagation caused by the screen change is suppressed. In this case, the content-aware video encoding method according to the embodiment does not need to detect the screen change using the MIMR calculated as in Equation 20, and the next frame can be encoded as a P frame.

However, as shown in FIG. 10 (b), if the first frame after the screen change is encoded as a P frame, the next frame (the second frame after the screen change) should be encoded as an I frame in order to suppress the propagation of errors caused by the screen change. Accordingly, in the content-aware video encoding method according to the embodiment, when the first frame after the screen change has been encoded as a P frame, screen change detection is performed using the MIMR calculated as in Equation (20) so that the following frame is encoded as an I frame.

That is, the content-aware video encoding method according to the embodiment compares the MIMR calculated as shown in Equation (20) with the threshold

Figure 112017020511876-pat00160
. If the MIMR is greater than the threshold, it is determined that a screen change occurred, and the encoding frame type of the current frame is determined to be an I frame as shown in FIG. 10 (b); otherwise, it is recognized that no screen change occurred, and the encoding frame type of the current frame can be determined to be a P frame as shown in FIG. 10 (a).
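The frame-type decision of FIG. 10 can be sketched as a simple rule: skip detection when the previous frame was already an I frame, otherwise force an I frame when the MIMR exceeds the threshold. This is a sketch under the stated assumptions, not the patent's verbatim algorithm.

```python
def decide_frame_type(prev_frame_type, mimr_value, threshold=0.7):
    """Single-pass frame-type decision sketch (FIG. 10): if the previous frame
    was an I frame, detection is skipped and a P frame is used; otherwise a
    screen change (MIMR above threshold) forces an I frame."""
    if prev_frame_type == "I":
        return "P"
    return "I" if mimr_value > threshold else "P"
```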

FIG. 11 is a block diagram illustrating a content-aware video encoding system according to an embodiment, FIG. 12 is a diagram for explaining a process of calculating the predicted encoding distortion during the content-aware video encoding process according to an embodiment, FIG. 13 is a flowchart illustrating a content-aware video encoding method according to an exemplary embodiment of the present invention, and FIG. 14 illustrates a content-aware quantization parameter determination algorithm according to an exemplary embodiment.

Referring to FIGS. 11 through 14, the content-aware video encoding system 1100 according to an embodiment includes a video encoder 1110 and a content-aware video encoding controller 1120. Hereinafter, the content-aware video encoding controller 1120 is described as being embodied as a hardware module combined with the video encoder 1110; however, the present invention is not limited thereto, and the controller may also be implemented as a computer program installed in a processor included in the video encoder, or embodied in the form of a computer program stored in a recording medium. The content-aware video encoding controller 1120 includes a screen change detection unit 1121, a distortion prediction / quantization parameter setting unit 1122, and a model parameter obtaining unit 1123, which perform the content-aware video encoding method described above in connection with the video encoder 1110 and control the video encoder 1110.

In single-pass video encoding, the features of the frame currently being encoded cannot be used until the video encoding of that frame is completed. That is, extracting the characteristics of the current frame prior to video encoding necessarily incurs significant computational overhead and is not suitable for real-time video streaming applications.

Accordingly, the content-aware video encoding controller 1120 according to an exemplary embodiment calculates the indicators such as the screen descriptor described with reference to FIGS. 1 to 10, the number of macroblocks encoded in intra mode, and the number of macroblocks encoded in skip mode from the previous frame of the current frame instead of from the current frame itself. This is based on the assumption that the screen characteristics change gradually and thus are similar between the current frame and the previous frame; the content-aware video encoding method is applied on this assumption when no rapid screen change has occurred, while rapid screen changes are handled by the screen change detection described on the basis of Equation 20.

Hereinafter, the current frame is described as an i-th frame, and the previous frame is described as an (i-1) th frame.

The screen change detection unit 1121 performs screen change detection on the current frame of a group of pictures including a plurality of frames (1320), and determines the optimal frame type of the current frame based on the result of the screen change detection (1330).

Specifically, the screen change detection unit 1121 receives the number of macroblocks encoded in intra mode in the previous frame

Figure 112017020511876-pat00161
And the number of macroblocks encoded in skip mode in the previous frame
Figure 112017020511876-pat00162
From the video encoder 1110, calculates the ratio MIMR between the number of macroblocks encoded in intra mode in the previous frame and the number of macroblocks encoded in skip mode in the previous frame as shown in Equation 20, performs screen change detection based on the MIMR (1320), and determines the optimal frame type of the current frame according to the detection result (1330).

For example, if the ratio MIMR between the number of macroblocks encoded in intra mode in the previous frame and the number of macroblocks encoded in skip mode in the previous frame is greater than the threshold

Figure 112017020511876-pat00163
The screen change detection unit 1121 recognizes that a screen change occurred in the current frame and accordingly determines the optimal frame type
Figure 112017020511876-pat00164
As an I frame (1311). In this case, the screen change detection unit 1121 obtains the optimal quantization parameter
Figure 112017020511876-pat00165
(1312) based on the optimal quantization parameter of the previous frame. A detailed description thereof will be given below.

For another example, if the ratio MIMR between the number of macroblocks encoded in intra mode in the previous frame and the number of macroblocks encoded in skip mode in the previous frame is less than or equal to the threshold

Figure 112017020511876-pat00166
The screen change detection unit 1121 recognizes that no screen change occurred in the current frame, and accordingly the optimal frame type
Figure 112017020511876-pat00167
Can be determined as a P frame.

In this case, instead of unconditionally performing the screen change detection on the current frame, the screen change detection unit 1121 may adaptively perform the screen change detection only when the current frame is not the first frame of the image group. That is, the screen change detection unit 1121 determines whether the current frame is the first frame of the image group (1310), and performs the screen change detection on the current frame only when it is determined that the current frame is not the first frame of the image group (1320).

If the current frame is the first frame of the image group, the screen change detection unit 1121 determines the optimal frame type

Figure 112017020511876-pat00168
As an I frame (1311), and determines an optimum quantization parameter
Figure 112017020511876-pat00169
(1312) based on the optimal quantization parameter of the previous frame. Because the distortion of an I frame is generally lower than that of a P frame under the same quantization parameter, in step 1312 the screen change detection unit 1121 may set the optimal quantization parameter
Figure 112017020511876-pat00170
To a value obtained by adding 1 to the optimal quantization parameter
Figure 112017020511876-pat00171
Of the previous frame.

The optimal frame type of the current frame thus determined may be transmitted to the model parameter obtaining unit 1123 and the video encoder 1110, respectively. In addition, if the current frame is the first frame of the image group, or if the ratio MIMR between the number of macroblocks encoded in intra mode in the previous frame and the number of macroblocks encoded in skip mode in the previous frame is greater than the threshold

Figure 112017020511876-pat00172
, The optimal frame type of the current frame determined in step 1311
Figure 112017020511876-pat00173
And the optimal quantization parameter of the current frame obtained in step 1312
Figure 112017020511876-pat00174
May be transmitted to the video encoder 1110.

The distortion prediction / quantization parameter setting unit 1122 sets the initial quantization parameter of the current frame based on the optimal quantization parameter of the previous frame

Figure 112017020511876-pat00175
(1340). For example, when the current frame is the first P frame in the image group, the distortion prediction / quantization parameter setting unit 1122 may set, as the initial quantization parameter, a value obtained by decrementing the optimal quantization parameter
Figure 112017020511876-pat00176
By 1. This is because the distortion of an I frame is lower than that of a P frame at the cost of a generally higher bit-rate, so achieving the target encoding distortion
Figure 112017020511876-pat00177
In the first P frame requires a quantization parameter lower than the quantization parameter of the previous I frame. On the other hand, if the current frame is not the first P frame in the image group, the distortion prediction / quantization parameter setting unit 1122 may set the optimal quantization parameter
Figure 112017020511876-pat00178
Of the previous frame as the initial quantization parameter.
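The initial-quantization-parameter rule of step 1340 can be sketched as follows (function and parameter names are assumptions):

```python
def initial_qp(prev_optimal_qp, is_first_p_after_i):
    """Initial QP choice of step 1340: the first P frame after an I frame
    starts one QP lower (I-frame distortion is lower at the same QP);
    otherwise the previous frame's optimal QP is reused."""
    return prev_optimal_qp - 1 if is_first_p_after_i else prev_optimal_qp
```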

The model parameter obtaining unit 1123 obtains the model parameters of the current frame (1350) by using the model parameter lookup tables 1210 and 1220 constructed in advance, based on the optimal frame type

Figure 112017020511876-pat00179
And the initial quantization parameter.

Here, the model parameter lookup tables 1210 and 1220 store the model parameters

Figure 112017020511876-pat00180
And
Figure 112017020511876-pat00181
In correspondence with each optimal frame type and each quantization parameter. That is, the model parameter lookup tables consist of the first lookup table 1210, in which the model parameter
Figure 112017020511876-pat00182
Is calculated in advance for each quantization parameter and each frame type, and the second lookup table 1220, in which the model parameter used in Equations 18 to 19
Figure 112017020511876-pat00183
Is calculated in advance for each quantization parameter and each frame type.

Accordingly, the model parameter obtaining unit 1123 applies the optimal frame type of the current frame transmitted from the screen change detection unit 1121

Figure 112017020511876-pat00184
And the initial quantization parameter transmitted from the distortion prediction / quantization parameter setting unit 1122 to the model parameter lookup tables 1210 and 1220 as shown in FIG. 12, thereby obtaining
Figure 112017020511876-pat00185
And
Figure 112017020511876-pat00186
, On the basis of which the model parameter used in Equations 18 to 19
Figure 112017020511876-pat00187
Is obtained (1230).

The model parameters thus obtained

Figure 112017020511876-pat00188
And
Figure 112017020511876-pat00189
May be transmitted to the distortion prediction / quantization parameter setting unit 1122.

In response, the distortion prediction / quantization parameter setting unit 1122 uses the distortion prediction model constructed in advance, together with the screen descriptor of the previous frame received from the video encoder 1110

Figure 112017020511876-pat00190
And the model parameters
Figure 112017020511876-pat00191
And
Figure 112017020511876-pat00192
, To obtain the predicted encoding distortion of the current frame
Figure 112017020511876-pat00193
(1360).

Specifically, based on the distortion model expressed by Equations 18 to 19, the distortion prediction / quantization parameter setting unit 1122 can construct in advance, as shown in Equation 21, a distortion prediction model that defines the relation among the model parameters

Figure 112017020511876-pat00194
And
Figure 112017020511876-pat00195
, The screen descriptor
Figure 112017020511876-pat00196
, And the predicted encoding distortion
Figure 112017020511876-pat00197
.

<Formula 21>

Figure 112017020511876-pat00198

In equation 21,

Figure 112017020511876-pat00199
Is an update parameter for compensating the prediction error of the distortion prediction model due to the real-time change of the screen complexity, and is defined as Equation (22).

<Formula 22>

Figure 112017020511876-pat00200

In Equation 22,

Figure 112017020511876-pat00201
Represents an adjustable weight according to the target distortion.

Here, the screen descriptor of the previous frame used by the distortion prediction / quantization parameter setting unit 1122 can be calculated based on the horizontal slope AC coefficient and the vertical slope AC coefficient among the DCT coefficients, as expressed in Equation 11 and Equation 13.

Therefore, the distortion prediction / quantization parameter setting unit 1122 receives the screen descriptor

Figure 112017020511876-pat00202
Calculated from
Figure 112017020511876-pat00203
(1240), obtains the predicted encoding distortion of the current frame
Figure 112017020511876-pat00204
Using the distortion prediction model as shown in Equation (21) (1250), and, based on the predicted encoding distortion of the current frame
Figure 112017020511876-pat00205
, Obtains the optimal quantization parameter of the current frame
Figure 112017020511876-pat00206
(1370).

More specifically, in step 1370, the distortion prediction/quantization parameter setting unit 1122 can obtain the optimal quantization parameter in the current frame that minimizes the difference between the predicted encoding distortion

Figure 112017020511876-pat00207
and the target encoding distortion of the current frame
Figure 112017020511876-pat00208
, namely
Figure 112017020511876-pat00209
. For example, the distortion prediction/quantization parameter setting unit 1122 may increase or decrease the initial quantization parameter so that the difference between the predicted encoding distortion
Figure 112017020511876-pat00210
and the target encoding distortion of the current frame
Figure 112017020511876-pat00211
is minimized, thereby obtaining the increased or decreased initial quantization parameter as the optimal quantization parameter in the current frame
Figure 112017020511876-pat00212
.

Steps 1350 to 1370 may be performed repeatedly for at least one generation. That is, the optimal quantization parameter obtained through steps 1350 to 1370 of the first generation is used as the initial quantization parameter in step 1350 of the second generation, i.e., the next generation, and steps 1360 and 1370 of the second generation are then performed in succession. This repetition of steps 1350 to 1370 can be carried out by the content-aware quantization parameter determination algorithm 1 of FIG. 14.
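The iterative refinement of steps 1350 to 1370 might look like the following sketch. Here `predict_distortion`, the unit step size, and the stopping rule are assumptions, not the patent's exact algorithm:

```python
def find_optimal_qp(initial_qp, target_distortion, predict_distortion,
                    qp_min=0, qp_max=51, max_generations=8):
    """Hedged sketch of the generation-by-generation QP refinement.

    `predict_distortion(qp)` stands in for the distortion prediction model
    of Equation 21 (shown only as an image in the source). Each generation
    nudges the QP one step toward the target distortion and keeps the QP
    with the smallest distortion gap seen so far.
    """
    qp = initial_qp
    best_qp = qp
    best_gap = abs(predict_distortion(qp) - target_distortion)
    for _ in range(max_generations):
        # A larger QP increases distortion; a smaller QP decreases it.
        if predict_distortion(qp) < target_distortion and qp < qp_max:
            qp += 1
        elif predict_distortion(qp) > target_distortion and qp > qp_min:
            qp -= 1
        else:
            break  # already on target (or at a QP bound)
        gap = abs(predict_distortion(qp) - target_distortion)
        if gap < best_gap:
            best_qp, best_gap = qp, gap
        else:
            break  # the gap stopped shrinking; keep the best QP so far
    return best_qp
```

With a toy linear model `predict_distortion = lambda q: 2.0 * q` and a target of 60, starting from QP 25 the loop walks up to QP 30, where the predicted distortion matches the target exactly.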

As described above, the content-aware video encoding controller 1120 according to an exemplary embodiment of the present invention may obtain the optimal quantization parameter

Figure 112017020511876-pat00213
through an iterative procedure, so that the video encoder 1110 can encode the current frame with less deviation from the target distortion than conventional techniques.

The distortion prediction/quantization parameter setting unit 1122 may transmit the obtained optimal quantization parameter

Figure 112017020511876-pat00214
to the video encoder 1110.

The video encoder 1110 may encode the current frame (1380) based on the optimal frame type of the current frame and the optimal quantization parameter in the current frame received from the content-aware video encoding controller 1120, and may output the encoded result.

Although the content-aware video encoding controller 1120 has been described as including the screen change detection unit 1121, the distortion prediction/quantization parameter setting unit 1122, and the model parameter obtaining unit 1123, it is not limited thereto and may include more or fewer components to perform the content-aware video encoding method described above.

FIG. 15 is a flowchart showing a specific example in which the content-aware video encoding method shown in FIG. 13 is performed in VBR encoding.

Referring to FIG. 15, the content-aware video encoding method in VBR (variable bit-rate) encoding is performed as follows, based on the process described above with reference to FIG. 13. Since VBR encoding generally places more emphasis on picture quality, the constraint on the bit-rate is relatively weak. Thus, for every frame in VBR encoding, a fixed target encoding distortion of the i-th frame

Figure 112017020511876-pat00215
(the target encoding distortion of the current frame)
Figure 112017020511876-pat00216
is used.

First, the content recognition video encoding controller according to an embodiment determines whether the current frame is the first frame of the video group (1510).

As a result of the determination, if the current frame is the first frame of the video group, the content-aware video encoding controller performs I frame coding on the current frame (1520). Specifically, if the current frame is the first frame of the video group, the content-aware video encoding controller determines the optimal frame type of the current frame as an I frame (1521) and determines the optimal quantization parameter of the current frame (1522). For example, in step 1522, the content-aware video encoding controller may determine the optimal quantization parameter of the current frame

Figure 112017020511876-pat00217
as
Figure 112017020511876-pat00218
.

On the other hand, if it is determined that the current frame is not the first frame of the video group, the content-aware video encoding controller performs screen change detection on the current frame (1530). Specifically, in step 1530, the content-aware video encoding controller can perform the screen change detection based on the ratio between the number of macroblocks encoded in the intra mode in the previous frame received from the video encoder and the number of macroblocks encoded in the skip mode in the previous frame. For example, if this ratio is greater than or equal to a threshold, the content-aware video encoding controller can recognize that a screen change occurs in the current frame; if the ratio is less than the threshold, the controller can recognize that no screen change occurs in the current frame.

Accordingly, if a screen change occurs in the current frame (recognizing that a screen change has occurred), the content recognition video encoding controller may perform I frame coding on the current frame (1520).

On the other hand, if the screen change has not occurred in the current frame (recognizing that no screen change has occurred), the content recognition video encoding controller may perform P frame coding on the current frame (1540).
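The ratio test and the resulting frame-type decision above can be sketched as follows. The threshold value and the handling of a zero skip count are assumptions, since the patent does not fix them in this passage:

```python
def detect_scene_change(intra_mb_count, skip_mb_count, threshold=2.0):
    """Hedged sketch of the screen (scene) change detection.

    The controller compares the ratio of intra-coded to skip-coded
    macroblocks in the previous frame against a threshold; the value
    2.0 and the all-intra fallback are assumptions.
    """
    if skip_mb_count == 0:
        return True  # all-intra previous frame: treat as a scene change
    return intra_mb_count / skip_mb_count >= threshold


def choose_frame_type(intra_mb_count, skip_mb_count):
    # An I frame is chosen when a scene change is recognized;
    # otherwise the frame is predicted from its predecessor (P frame).
    return "I" if detect_scene_change(intra_mb_count, skip_mb_count) else "P"
```

Many intra macroblocks relative to skipped ones indicate that motion prediction failed in the previous frame, which is why the ratio serves as a single-pass scene-change signal.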

Specifically, when it is recognized that no screen change occurs in the current frame, the content-aware video encoding controller determines the optimal frame type of the current frame as a P frame (1541), and may obtain the optimal quantization parameter of the current frame using the content-aware quantization parameter determination algorithm 1 described with reference to FIGS. 11 to 14 (1542).

More specifically, in step 1542, the content-aware video encoding controller sets an initial quantization parameter in the current frame based on the optimal quantization parameter in the previous frame, obtains model parameters in the current frame based on the optimal frame type and the initial quantization parameter using the pre-built model parameter lookup table, calculates the predicted encoding distortion of the current frame based on the screen descriptor of the previous frame received from the video encoder and the obtained model parameters using the pre-built distortion prediction model, and then obtains the optimal quantization parameter in the current frame based on the predicted encoding distortion of the current frame. The operation of the content-aware quantization parameter determination algorithm 1 has been described in detail with reference to FIGS. 11 to 14 and is not repeated here.

The content aware video encoding controller then encodes the current frame (1550) based on the optimal quantization parameter and optimal frame type in the current frame.

The content-aware video encoding controller then calculates (1560) an update parameter used in the distortion prediction model, based on the encoding distortion of the current frame obtained as a result of encoding the current frame and the predicted encoding distortion of the current frame.

Accordingly, the content-aware video encoding controller may use the update parameter calculated in step 1560 in the process of calculating the predicted encoding distortion for the next frame. That is, the content-aware video encoding controller updates the distortion prediction model defined by Equation 21 based on the update parameter calculated in step 1560, so that the updated distortion prediction model can be utilized in the process of acquiring the optimal quantization parameter for the next frame.
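The update-parameter feedback described above might be sketched as below. Since Equation 22 appears only as an image in this text, this exponentially weighted correction of the prediction error is an assumed shape, not the patent's formula:

```python
def update_compensation(prev_update, actual_distortion, predicted_distortion,
                        weight=0.5):
    """Hedged sketch of the update parameter of Equations 21-22.

    The prediction error of the last encoded frame is blended into a
    running correction term; `weight` plays the role of the adjustable
    weight tied to the target distortion. Both the blending rule and the
    default weight are assumptions.
    """
    prediction_error = actual_distortion - predicted_distortion
    return (1.0 - weight) * prev_update + weight * prediction_error
```

The correction term would then be added into the next frame's predicted distortion, compensating for real-time changes in screen complexity that the static model parameters cannot track.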

FIG. 16 is a flowchart showing a specific example in which the content-aware video encoding method shown in FIG. 13 is performed in CVBR encoding.

Referring to FIG. 16, the content-aware video encoding method in CVBR encoding, a constrained variable bit-rate encoding, is performed as follows, based on the process described above with reference to FIG. 13. Here, CVBR encoding must search for the solution of Equation 4 under the bit-rate constraint of Equation 5. Therefore, in order to minimize the distortion fluctuation in CVBR encoding while considering the bit-rate constraint, the target encoding distortion of the i-th frame (the target encoding distortion of the current frame)

Figure 112017020511876-pat00219
and the target encoding bit-rate of the i-th frame (the target encoding bit-rate of the current frame)
Figure 112017020511876-pat00220
should be determined.

Due to changes in screen complexity between consecutive frames, the number of bits allocated to each frame may be varied so as to reduce distortion fluctuation while satisfying Equation 5. Using the screen descriptor of the j-th macroblock of the i-th frame

Figure 112017020511876-pat00221
, the target encoding bit-rate of the i-th frame can be derived as Equation 23.

<Formula 23>

Figure 112017020511876-pat00222

In Equation 23,

Figure 112017020511876-pat00223
represents the constraint on the average number of bits encoded per frame.

Further, using the RD relation, an individual target encoding distortion for each frame that satisfies the bit-rate constraint, i.e., the target encoding distortion of the i-th frame (the target encoding distortion of the current frame), can be derived as Equation 24.

<Formula 24>

Figure 112017020511876-pat00224

First, the content-aware video encoding controller according to an embodiment calculates (1610) the target encoding bit-rate of the current frame based on the constraint on the average number of encoded bits per frame, and the target encoding distortion of the current frame based on the target encoding bit-rate of the current frame, as shown in Equations 23 and 24.
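A complexity-proportional bit allocation consistent with the description around Equation 23 might look like this sketch. The proportional rule is an assumption, since the equation itself appears only as an image in the source:

```python
def target_bitrates(descriptors, avg_bits_per_frame):
    """Hedged sketch of per-frame bit allocation under Equation 23's
    average-bits-per-frame constraint.

    Each frame's budget is assumed to scale with its screen descriptor,
    so more complex frames receive more bits, while the total budget
    still equals `avg_bits_per_frame` times the number of frames.
    """
    total_complexity = sum(descriptors)
    budget = avg_bits_per_frame * len(descriptors)  # total bit budget
    return [budget * d / total_complexity for d in descriptors]
```

Note the allocation preserves the average: summing the returned targets always gives back the total budget, so Equation 5's constraint stays satisfied on average.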

Then, the content recognition video encoding controller determines whether the current frame is the first frame of the image group (1620).

As a result of the determination, if the current frame is the first frame of the video group, the content-aware video encoding controller performs I frame coding on the current frame (1630). Specifically, if the current frame is the first frame of the video group, the content-aware video encoding controller determines the optimal frame type of the current frame as an I frame (1631) and determines the optimal quantization parameter of the current frame (1632). For example, in step 1632, the content-aware video encoding controller may determine the optimal quantization parameter of the current frame

Figure 112017020511876-pat00225
as
Figure 112017020511876-pat00226
.

On the other hand, if it is determined that the current frame is not the first frame of the video group, the content-aware video encoding controller performs screen change detection on the current frame (1640). Specifically, in step 1640, the content-aware video encoding controller can perform the screen change detection based on the ratio between the number of macroblocks encoded in the intra mode in the previous frame received from the video encoder and the number of macroblocks encoded in the skip mode in the previous frame. For example, if this ratio is greater than or equal to a threshold, the content-aware video encoding controller can recognize that a screen change occurs in the current frame; if the ratio is less than the threshold, the controller can recognize that no screen change occurs in the current frame.

Accordingly, when a screen change occurs in the current frame (when it is recognized that a screen change has occurred), the content recognition video encoding controller may perform I frame coding on the current frame (1630).

On the other hand, if the screen change has not occurred in the current frame (recognizing that no screen change has occurred), the content recognition video encoding controller may perform P frame coding on the current frame (1650).

Specifically, when it is recognized that no screen change occurs in the current frame, the content-aware video encoding controller determines the optimal frame type of the current frame as a P frame (1651), and may obtain the optimal quantization parameter of the current frame using the content-aware quantization parameter determination algorithm 1 described with reference to FIGS. 11 to 14 (1652).

More specifically, in step 1652, the content-aware video encoding controller sets an initial quantization parameter in the current frame based on the optimal quantization parameter in the previous frame, obtains model parameters in the current frame based on the optimal frame type and the initial quantization parameter using the pre-built model parameter lookup table, calculates the predicted encoding distortion of the current frame based on the screen descriptor of the previous frame received from the video encoder and the obtained model parameters using the pre-built distortion prediction model, and then obtains the optimal quantization parameter in the current frame based on the predicted encoding distortion of the current frame. The operation of the content-aware quantization parameter determination algorithm 1 has been described in detail with reference to FIGS. 11 to 14 and is not repeated here.

The content-aware video encoding controller then estimates the predicted encoding bit-rate in the current frame

Figure 112017020511876-pat00227
based on the optimal quantization parameter in the current frame, and determines (1660) whether it violates the bit-rate constraint of Equation 5.

More specifically, in step 1660 the content-aware video encoding controller determines whether the predicted encoding bit-rate in the current frame is greater than the target encoding bit-rate in the current frame. If it is less than or equal to the target encoding bit-rate, the current frame may be encoded (1670) based on the optimal quantization parameter and the optimal frame type in the current frame.

The content-aware video encoding controller then calculates (1680) an update parameter used in the distortion prediction model, based on the encoding distortion of the current frame obtained as a result of encoding the current frame and the predicted encoding distortion of the current frame.

Accordingly, the content-aware video encoding controller can use the update parameter calculated in step 1680 in the process of calculating the predicted encoding distortion for the next frame. That is, the content-aware video encoding controller updates the distortion prediction model defined by Equation 21 based on the update parameter calculated in step 1680, so that the updated distortion prediction model can be utilized in the process of acquiring the optimal quantization parameter for the next frame.

On the other hand, if it is determined in step 1660 that the predicted encoding bit-rate in the current frame is greater than the target encoding bit-rate in the current frame, the content-aware video encoding controller may increase (1690) the optimal quantization parameter until the predicted encoding bit-rate is less than or equal to the target encoding bit-rate in the current frame.
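Steps 1660 and 1690 together form a simple constraint loop, which might be sketched as follows. Here `predict_bits` and the unit QP step are assumptions:

```python
def enforce_bitrate(qp, target_bits, predict_bits, qp_max=51):
    """Hedged sketch of the CVBR bit-rate enforcement (steps 1660-1690).

    `predict_bits(qp)` stands in for the encoder's bit-rate prediction
    at a given quantization parameter. The QP is coarsened one step at a
    time until the predicted bit-rate no longer exceeds the per-frame
    target; the step size of 1 is an assumption.
    """
    while predict_bits(qp) > target_bits and qp < qp_max:
        qp += 1  # larger QP -> coarser quantization -> fewer bits
    return qp
```

Because a larger QP monotonically reduces the bit-rate, this loop always terminates, either when the constraint is met or when the QP reaches its maximum.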

The apparatus described above may be implemented as a hardware component, a software component, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may execute an operating system (OS) and one or more software applications running on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the processing device is sometimes described as being used singly, but those skilled in the art will recognize that it may include a plurality of processing elements and/or multiple types of processing elements. For example, the processing device may comprise a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or command the processing device independently or collectively. The software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, or computer storage medium in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to an embodiment may be implemented in the form of program instructions that can be executed through various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be those specially designed and constructed for the embodiments, or they may be known and available to those skilled in the art of computer software. Examples of computer-readable media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specifically configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include machine language code, such as that produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

While the present invention has been particularly shown and described with reference to exemplary embodiments thereof, the invention is not limited to the disclosed embodiments. For example, appropriate results may be achieved even if the described techniques are performed in a different order, and/or components of the described systems, structures, devices, or circuits are combined or coupled in a form different from that described, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents to the claims are also within the scope of the following claims.

Claims (15)

A content-aware video encoding method based on single-pass consistent picture quality control, the method comprising:
Performing a screen change detection on a current frame of a group of pictures (the group of pictures including a plurality of frames);
Determining an optimal frame type of the current frame based on a result of performing the screen change detection;
Setting an initial quantization parameter in the current frame based on an optimal quantization parameter in a previous frame of the current frame;
Obtaining model parameters in the current frame based on the optimal frame type and the initial quantization parameter of the current frame using a pre-built model parameter lookup table;
Calculating a predicted encoding distortion of the current frame based on the screen descriptor of the previous frame received from the video encoder and the obtained model parameters using a distortion prediction model constructed beforehand for the current frame; And
Obtaining an optimal quantization parameter in the current frame that minimizes the difference between the predicted encoding distortion of the current frame and the target encoding distortion of the current frame,
Wherein obtaining the optimal quantization parameter in the current frame comprises:
Increasing or decreasing the initial quantization parameter such that the difference between the predicted encoding distortion of the current frame and the target encoding distortion of the current frame is minimized; And
Obtaining the increased or decreased initial quantization parameter as the optimal quantization parameter in the current frame,
Wherein obtaining the model parameters in the current frame, calculating the predicted encoding distortion of the current frame, and obtaining the optimal quantization parameter in the current frame
are performed repeatedly for at least one generation until the difference between the predicted encoding distortion of the current frame and the target encoding distortion of the current frame is minimized, and
Wherein increasing or decreasing the initial quantization parameter further comprises:
Using, when the difference between the predicted encoding distortion of the current frame and the target encoding distortion of the current frame is not minimized, the increased or decreased initial quantization parameter as the initial quantization parameter in the step of obtaining model parameters in the current frame of the next generation.
The method according to claim 1,
Wherein the screen descriptor of the previous frame is calculated based on a horizontal-slope AC coefficient and a vertical-slope AC coefficient of the discrete cosine transform coefficients in the previous frame, so as to quantitatively represent the screen complexity of the previous frame.
The method according to claim 1,
Wherein the distortion prediction model is constructed in advance to define, using the model parameters, the relationship between the screen descriptor and the predicted encoding distortion.
The method according to claim 1,
Wherein the step of performing screen change detection on the current frame comprises:
Performing screen change detection on the current frame based on a ratio between the number of macroblocks encoded in the intra mode in the previous frame received from the video encoder and the number of macroblocks encoded in the skip mode in the previous frame.
5. The method of claim 4,
Wherein the step of performing screen change detection on the current frame comprises:
Recognizing that a screen change occurs in the current frame when the ratio between the number of macroblocks encoded in the intra mode in the previous frame and the number of macroblocks encoded in the skip mode in the previous frame is greater than or equal to a threshold value; or
Recognizing that no screen change occurs in the current frame when the ratio is smaller than the threshold value,
Wherein determining the optimal frame type of the current frame comprises:
Determining an optimal frame type of the current frame as an I frame when it is recognized that a change in the screen occurs in the current frame; or
Determining the optimal frame type of the current frame as a P frame when it is recognized that no screen change occurs in the current frame.
The method according to claim 1,
Wherein the step of performing screen change detection on the current frame comprises:
Determining whether the current frame is the first frame of the image group; And
Performing screen change detection on the current frame when it is determined that the current frame is not the first frame of the video group.
The method according to claim 1,
Wherein the model parameter lookup table is constructed in advance to define the relationship among optimal frame types, quantization parameters, and model parameters in sample frames.
delete

delete

The method according to claim 1,
Calculating a target encoding bit-rate in the current frame based on a constraint of an average number of encoded bits per frame; And
Calculating a target encoding distortion of the current frame based on a target encoding bit-rate in the current frame
Further comprising:
Wherein obtaining the optimal quantization parameter in the current frame comprises:
Estimating a predicted encoding bit-rate in the current frame based on an optimal quantization parameter in the current frame; And
Increasing the optimal quantization parameter until the predicted encoding bit-rate in the current frame is less than or equal to the target encoding bit-rate in the current frame.
The method according to claim 1,
Further comprising:
Encoding the current frame based on the optimal frame type of the current frame and the optimal quantization parameter in the current frame.
12. The method of claim 11,
Wherein encoding the current frame comprises:
Calculating an update parameter used in the distortion prediction model based on the encoding distortion of the current frame obtained as a result of encoding the current frame and the predicted encoding distortion of the current frame; And
Using the update parameter in calculating the predicted encoding distortion for the next frame of the current frame.
A computer program stored in a recording medium for executing a content recognition video encoding method based on single-pass consistent image quality control in combination with a computer embodying an electronic device,
The content-aware video encoding method includes:
Performing a screen change detection on a current frame of a group of pictures (the group of pictures including a plurality of frames);
Determining an optimal frame type of the current frame based on a result of performing the screen change detection;
Setting an initial quantization parameter in the current frame based on an optimal quantization parameter in a previous frame of the current frame;
Obtaining model parameters in the current frame based on the optimal frame type and the initial quantization parameter of the current frame using a pre-built model parameter lookup table;
Calculating a predicted encoding distortion of the current frame based on the screen descriptor of the previous frame received from the video encoder and the obtained model parameters using a distortion prediction model constructed beforehand for the current frame; And
Obtaining an optimal quantization parameter in the current frame that minimizes the difference between the predicted encoding distortion of the current frame and the target encoding distortion of the current frame,
Wherein obtaining the optimal quantization parameter in the current frame comprises:
Increasing or decreasing the initial quantization parameter such that the difference between the predicted encoding distortion of the current frame and the target encoding distortion of the current frame is minimized; And
Obtaining the increased or decreased initial quantization parameter as the optimal quantization parameter in the current frame,
Wherein obtaining the model parameters in the current frame, calculating the predicted encoding distortion of the current frame, and obtaining the optimal quantization parameter in the current frame
are performed repeatedly for at least one generation until the difference between the predicted encoding distortion of the current frame and the target encoding distortion of the current frame is minimized, and
Wherein increasing or decreasing the initial quantization parameter further comprises:
Using, when the difference between the predicted encoding distortion of the current frame and the target encoding distortion of the current frame is not minimized, the increased or decreased initial quantization parameter as the initial quantization parameter in the step of obtaining model parameters in the current frame of the next generation.
A content-aware video encoding controller based on single-pass consistent picture quality control,
A group of pictures (group of pictures), which includes a plurality of frames, is subjected to screen change detection on a current frame, and based on a result of performing the screen change detection, an optimal frame type of the current frame A screen change detecting unit for determining the screen change;
A distortion prediction / quantization parameter setting unit that sets an initial quantization parameter in the current frame based on an optimal quantization parameter in a previous frame of the current frame; And
A model parameter obtaining unit that acquires model parameters in the current frame based on the optimal frame type of the current frame and the initial quantization parameter, using a pre-built model parameter lookup table,
Wherein the distortion prediction/quantization parameter setting unit
calculates the predicted encoding distortion of the current frame based on the screen descriptor of the previous frame received from the video encoder and the obtained model parameters, using the distortion prediction model previously constructed for the current frame, and obtains the optimal quantization parameter in the current frame that minimizes the difference between the predicted encoding distortion of the current frame and the target encoding distortion of the current frame,
by increasing or decreasing the initial quantization parameter so that the difference between the predicted encoding distortion of the current frame and the target encoding distortion of the current frame is minimized and obtaining the increased or decreased initial quantization parameter as the optimal quantization parameter,
Wherein acquiring the model parameters, calculating the predicted encoding distortion of the current frame, and obtaining the optimal quantization parameter in the current frame, performed by the distortion prediction/quantization parameter setting unit,
are performed repeatedly for at least one generation until the difference between the predicted encoding distortion of the current frame and the target encoding distortion of the current frame is minimized, and
Wherein, in increasing or decreasing the initial quantization parameter, the distortion prediction/quantization parameter setting unit
uses, when the difference between the predicted encoding distortion of the current frame and the target encoding distortion of the current frame is not minimized, the increased or decreased initial quantization parameter as the initial quantization parameter for obtaining model parameters in the current frame of the next generation.
A content-aware video encoding system based on single-pass consistent picture quality control,
A video encoder; and
A content-aware video encoding controller,
The content-aware video encoding controller includes:
A group of pictures (group of pictures), which includes a plurality of frames, is subjected to screen change detection on a current frame, and based on a result of performing the screen change detection, an optimal frame type of the current frame A screen change detecting unit for determining the screen change;
A distortion prediction / quantization parameter setting unit that sets an initial quantization parameter in the current frame based on an optimal quantization parameter in a previous frame of the current frame; And
Acquiring model parameters in the current frame based on the optimum frame type of the current frame and the initial quantization parameter using a pre-built model parameter lookup table,
Lt; / RTI &gt;
Wherein the distortion prediction/quantization parameter setting unit
calculates a predicted encoding distortion of the current frame, using a distortion prediction model constructed in advance for the current frame, based on a screen descriptor of the previous frame received from the video encoder and the obtained model parameters, to obtain an optimal quantization parameter for the current frame that minimizes the difference between the predicted encoding distortion of the current frame and the target encoding distortion of the current frame,
and increases or decreases the initial quantization parameter so that the difference between the predicted encoding distortion of the current frame and the target encoding distortion of the current frame is minimized, setting the increased or decreased initial quantization parameter as the optimal quantization parameter,
Wherein the calculating of the predicted encoding distortion of the current frame and the obtaining of the optimal quantization parameter, performed by the distortion prediction/quantization parameter setting unit, are repeated for at least one generation until the difference between the predicted encoding distortion of the current frame and the target encoding distortion of the current frame is minimized,
And wherein the increasing or decreasing of the initial quantization parameter comprises: if the difference between the predicted encoding distortion of the current frame and the target encoding distortion of the current frame is not minimized, using the increased or decreased initial quantization parameter as the initial quantization parameter in the step of obtaining model parameters for the current frame in the next generation,
And wherein the video encoder
encodes the current frame based on the optimal frame type of the current frame and the optimal quantization parameter for the current frame delivered from the content-aware video encoding controller.
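The claimed division of labor between controller and encoder (controller detects screen changes to choose the frame type and supplies the optimal QP; encoder encodes with both and returns a screen descriptor that feeds the next frame's prediction) can be sketched as a per-frame loop. This sketch is illustrative only: the `controller`/`encoder` interfaces and their method names are assumptions, not the patent's actual API.

```python
def encode_sequence(frames, controller, encoder):
    """Illustrative per-frame loop of the claimed system.

    For each frame of the GOP: the controller performs screen change
    detection to pick the optimal frame type and derives the optimal QP
    from the previous frame's QP and screen descriptor; the encoder then
    encodes the frame with both, feeding its descriptor back.
    """
    prev_qp, prev_descriptor = None, None
    bitstream = []
    for frame in frames:
        # Screen change detection determines the optimal frame type
        # (e.g. an intra frame on a scene change).
        frame_type = controller.detect_scene_change(frame)
        # Optimal QP for the current frame, seeded by the previous
        # frame's optimal QP and screen descriptor.
        qp = controller.optimal_qp(frame_type, prev_qp, prev_descriptor)
        encoded, descriptor = encoder.encode(frame, frame_type, qp)
        bitstream.append(encoded)
        prev_qp, prev_descriptor = qp, descriptor  # feedback for next frame
    return bitstream
```

Because the controller only consumes statistics already produced while encoding the previous frame, the loop makes one pass over the sequence, which is the "single-pass" property the title refers to.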
KR1020170026356A 2017-02-28 2017-02-28 Content-aware video encoding method, controller and system based on single-pass consistent quality control KR101868270B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020170026356A KR101868270B1 (en) 2017-02-28 2017-02-28 Content-aware video encoding method, controller and system based on single-pass consistent quality control

Publications (1)

Publication Number Publication Date
KR101868270B1 true KR101868270B1 (en) 2018-06-15

Family

ID=62628742

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020170026356A KR101868270B1 (en) 2017-02-28 2017-02-28 Content-aware video encoding method, controller and system based on single-pass consistent quality control

Country Status (1)

Country Link
KR (1) KR101868270B1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20030082818A (en) * 2002-04-18 2003-10-23 삼성전자주식회사 Apparatus and method for performing variable bit rate control in real time
JP2010541386A (en) * 2007-09-28 2010-12-24 ドルビー・ラボラトリーズ・ライセンシング・コーポレーション Video compression and video transmission techniques
US20130089137A1 (en) * 2011-10-06 2013-04-11 Synopsys, Inc. Rate distortion optimization in image and video encoding
US20150215621A1 (en) * 2014-01-30 2015-07-30 Qualcomm Incorporated Rate control using complexity in video coding

Similar Documents

Publication Publication Date Title
CN109862359B (en) Code rate control method and device based on layered B frame and electronic equipment
KR100468726B1 (en) Apparatus and method for performing variable bit rate control in real time
US9615085B2 (en) Method and system for structural similarity based rate-distortion optimization for perceptual video coding
US8179981B2 (en) Method and apparatus for determining bit allocation for groups of pixel blocks in a picture according to attention importance level
US8331449B2 (en) Fast encoding method and system using adaptive intra prediction
US20100111180A1 (en) Scene change detection
US20090097546A1 (en) System and method for enhanced video communication using real-time scene-change detection for control of moving-picture encoding data rate
JP2006140758A (en) Method, apparatus and program for encoding moving image
JP2002010259A (en) Image encoding apparatus and its method and recording medium recording image encoding program
EP2041984A1 (en) Method and apparatus for adapting a default encoding of a digital video signal during a scene change period
US8050320B2 (en) Statistical adaptive video rate control
Jing et al. A novel intra-rate estimation method for H.264 rate control
US8792562B2 (en) Moving image encoding apparatus and method for controlling the same
JP5649296B2 (en) Image encoding device
Zhou et al. Complexity-based intra frame rate control by jointing inter-frame correlation for high efficiency video coding
JP4179917B2 (en) Video encoding apparatus and method
KR101868270B1 (en) Content-aware video encoding method, controller and system based on single-pass consistent quality control
Wu et al. A region of interest rate-control scheme for encoding traffic surveillance videos
KR101242560B1 (en) Device and method for adjusting search range
US20120328007A1 (en) System and method for open loop spatial prediction in a video encoder
KR20130032807A (en) Method and apparatus for encoding a moving picture
Li et al. Low-delay window-based rate control scheme for video quality optimization in video encoder
KR20090037288A (en) Method for real-time scene-change detection for rate control of video encoder, method for enhancing qulity of video telecommunication using the same, and system for the video telecommunication
Wu et al. A content-adaptive distortion-quantization model for intra coding in H.264/AVC
US8064526B2 (en) Systems, methods, and apparatus for real-time encoding

Legal Events

Date Code Title Description
E701 Decision to grant or registration of patent right
GRNT Written decision to grant