CN115567712A

CN115567712A - Screen content video coding perception code rate control method and device based on just noticeable distortion by human eyes

Info

Publication number: CN115567712A
Application number: CN202211156529.3A
Authority: CN
Inventors: 陈婧; 王世萍; 陈淋淋; 曾焕强; 朱建清; 蔡灿辉
Original assignee: Huaqiao University
Current assignee: Huaqiao University
Priority date: 2022-09-22
Filing date: 2022-09-22
Publication date: 2023-01-03

Abstract

The invention discloses a screen content video coding perception code rate control method and device based on distortion just noticeable to human eyes, and belongs to the field of video coding. Firstly, acquiring a screen content video, carrying out edge modeling on the screen content video to obtain a two-dimensional edge model, extracting edge characteristics of the two-dimensional edge model, and calculating to obtain edge model parameters of the two-dimensional edge model; acquiring a brightness self-adaptive threshold, a contrast masking effect threshold, an edge non-structural distortion sensitivity threshold and a structural distortion sensitivity threshold based on the edge model parameters, constructing a pixel domain JND model of the screen content video, and determining a JND factor; then guiding perception complexity classification and target bit allocation by using edge features and JND factors; and finally, constructing a perception bit rate control model under the JND constraint condition through the edge similarity of the reference video frame and the reconstructed video frame. The method can improve the code rate control precision of the screen content video and obviously improve the coding rate distortion performance of the screen content video.

Description

Screen content video coding perception code rate control method and device based on just noticeable distortion by human eyes

Technical Field

The invention relates to the field of video coding, in particular to a method and a device for controlling perceptual code rate of screen content video coding based on distortion just noticeable by human eyes.

Background

With the more mature internet technology, video applications become more and more abundant, and the demand of video communication is increasing day by day. The data volume of the original video signal is huge, and the transmission and the storage are huge burdens, so that the compression coding of the video is necessary. In order to interconnect and intercommunicate different products of different companies, the international video organization sets a series of video coding standards to unify the code stream format after video compression. Among them, h.264/AVC is currently the most widely used video coding standard, which makes high definition video and internet video widely spread. With the requirement of people on video definition and compression efficiency being improved, high efficiency video coding HEVC/H.265 is provided, compared with H.264/AVC, half of code rate can be saved under the same video coding quality, and application of high dynamic video and ultra-high definition video is promoted.

In recent years, screen content video applications are gradually deepened into daily life of people, such as video conferences, online education, live game broadcasting and the like, and people have an increasing demand for screen content videos. Screen content video is computer generated and has many different features than natural video captured by a camera. On one hand, on the spatial domain, screen content video has a large number of flat areas, repetitive patterns, sharp edges, limited color categories, and the like; on the other hand, there may be no correlation between adjacent frames in the time domain because of no physical limitation, that is, the screen content video contains a large number of abrupt frames and still frames. The traditional video coding standard is designed aiming at the characteristics of natural video by default, and in order to adapt to the characteristics of screen content video, the coding standard based on screen content video (HEVC-SCC) is extended on the basis of the HEVC standard. Four new coding tools are mainly added on the basis of the HEVC video coding standard, including Intra Block Copy (IBC), palette Mode (PLT), adaptive Color Transform (ACT), and Adaptive Motion Vector Resolution (AMVR). But in the aspect of rate control, the R-lambda model designed for natural video is still adopted.

In order to efficiently utilize bandwidth resources and improve transmission efficiency, code rate control is an essential link in an encoder. The main work of code rate control is to establish a mathematical relation model between the coding code rate and the quantization parameter, and determine the coding parameter according to the target code rate, so that the coded code rate can adapt to the requirement of the current video transmission bandwidth. However, the current screen content video coding standard HEVC-SCC still performs rate control on natural video, and there is no rate control algorithm for screen content video characteristics and human eye visual perception characteristics. Therefore, it is necessary to research the perceptual code rate control method for screen content video.

Disclosure of Invention

The technical problems mentioned above are addressed. The embodiment of the application aims to provide a screen content video coding perception code rate control method and device based on distortion just noticeable by human eyes by combining screen content characteristics and human eye visual characteristics, so as to solve the technical problems mentioned in the background technology part and improve screen content video coding quality, rate distortion performance and perception code rate control precision.

In a first aspect, an embodiment of the present application provides a method for controlling perceptual bit rate of video coding of screen content based on distortion just noticeable to human eyes, including the following steps:

s1, acquiring a screen content video, carrying out edge modeling on the screen content video to obtain a two-dimensional edge model, extracting edge characteristics of the two-dimensional edge model, and calculating to obtain edge model parameters of the two-dimensional edge model;

s2, acquiring a brightness self-adaptive threshold, a contrast masking effect threshold, an edge non-structural distortion sensitivity threshold and a structural distortion sensitivity threshold based on edge model parameters, constructing a pixel domain JND model of the screen content video according to the brightness self-adaptive threshold, the contrast masking effect threshold, the edge non-structural distortion sensitivity threshold and the structural distortion sensitivity threshold, and determining a JND factor;

s3, obtaining an edge feature factor according to the edge feature, and guiding target bit allocation by using the edge feature factor and a JND factor;

s4, under the constraint of a pixel domain JND model, calculating to obtain a similarity measurement factor of the reference frame and the reconstructed frame according to the edge feature factor of the reference frame and the edge feature factor of the reconstructed frame, and constructing a perception rate distortion model based on the similarity measurement factor of the reference frame and the reconstructed frame;

and S5, estimating coding parameters through the perceptual rate distortion model, and updating the perceptual rate distortion model according to actual coding parameters.

Preferably, step S1 specifically includes:

expanding the one-dimensional edge model into a two-dimensional edge model, wherein the expanded two-dimensional edge model is calculated as follows:

wherein, the edge model parameters b, c and w respectively represent the brightness, contrast and structure of the edge profile, x represents the horizontal direction coordinate of the two-dimensional edge model, y represents the vertical direction coordinate of the two-dimensional edge model, theta represents the direction angle of the two-dimensional edge model, and erf (-) is an error function;

performing two-dimensional Gaussian partial derivative filtering on a two-dimensional edge model of a screen content video to obtain a two-dimensional edge detection operator e _2D (x, y; c, w, σ, θ), calculated as follows:

e _2D (x,y；c,w,σ,θ)＝|e _x (x,y；c,w,σ,θ)|+|e _y (x,y；c,w,σ,θ)|；

wherein σ represents a smoothing parameter of a gaussian function;

operator e for detecting two-dimensional edge _2D (x, y; c, w, sigma, theta) is convolved with the screen content video and extractedIts edge features;

the edge model parameters are calculated as follows:

wherein e ₁ 、e ₂ 、e ₃ One-dimensional edge x = (0, a, -a) gaussian derivative filtered response of three positions respectively,

preferably, the screen content video is divided into an edge pixel set S after edge detection _E And a non-edge pixel set, based on the edge pixel set S in step S2 _E And acquiring a brightness self-adaptive threshold, a contrast masking effect threshold, an edge non-structural distortion sensitivity threshold and a structural distortion sensitivity threshold.

Preferably, step S2 specifically includes:

edge pixel set S _E The luminance adaptive threshold for the middle edge pixel p is calculated as follows:

T _elum (p)＝s _2D (p；T _lum (p)+b,c,w)-s(p；b,c,w)；

wherein, T _lum (p) denotes the luminance masking effect, when an edge pixel p belongs to the set of edge pixels S _E When the average luminance along the edge contour is I (p) = b + c/2, seeNumber alpha ₁ 、α ₂ Beta is a constant;

the contrast masking effect threshold for the edge pixel p is calculated as follows:

T _econ (p)＝min{|s _2D (p；b,T _con+ ,w)-s _2D (p；b,c,w)|,|s _2D (p；b,T _con- ,w)-s _2D (p；b,c,w)|}；

where c (p) denotes the contrast of the edge pixel p, f _th Is a constant;

by adaptation of the threshold T to the luminance _elum (p) and a contrast masking effect threshold T _econ (p) fusing the nonlinear additive models to obtain an edge unstructured distortion sensitivity threshold T _nstr (p) the calculation formula is as follows:

T _nstr (p)＝T _elum (p)+T _econ (p)-C _nstr ·min{T _elum (p),T _econ (p)}；

wherein, 0 is more than C _nstr ＜1；

The structural distortion sensitivity threshold for the edge pixel p is calculated as follows:

T _str (p)＝|s _2D (p；b,c,w+Δw)-s _2D (p；b,c,w)|；

combining the edge non-structural distortion sensitivity and the structural distortion sensitivity threshold value to obtain the edge pixel human eye just noticeable distortion threshold value T suitable for the screen content video _e (p), the JND factor for edge pixel p, is calculated as follows:

T _e (p)＝T _nstr (p)+T _str (p)-C _e ·min{T _str (p),T _nstr (p)}；

wherein, C _e As a constant, only the brightness is considered for non-edge pixel setsAnd (3) obtaining a pixel domain JND model of the final screen content video after integration according to a visibility threshold of the degree masking effect, wherein the pixel domain JND model comprises the following steps:

preferably, step S3 specifically includes:

at the frame level, allocating target bits to the current coding frame according to the perceptual weight of the current coding frame in the current GOP, wherein the perceptual weight PW of the current coding frame in the current GOP _cur The calculation is as follows:

wherein, EF _GOP Representing the edge characteristic factor, EF, of the current whole GOP _coded Representing edge characteristic factors, EF, of the encoded frame _curRefpic Representing edge feature factors of the reference frame, according to a two-dimensional edge detection operator e _2D (x, y; c, w, σ, θ) is summed by convolution with the reference frame, and thus, the frame-level target bit allocation formula is as follows:

Tar _curfpic ＝(Tar _GOP -Act _codedpics )×PW _curpic ×ε+0.5；

Tar _GOP representing GOP level target bits, obtained from target bits of the video sequence in the coding profile, act _codedpics Representing the actual bits consumed by the encoded frames in the GOP, epsilon being a frame-level bit allocation adjustment factor, associated with the screen content video, obtained by experimentation;

at the CTU level, the specific allocation formula is:

wherein, tar _CTU (i) Represents the ith CTU target bit of the current code, B _curleft Representing the remaining bits, tar, of the current coded frame _codedCTU And Act _codedCTU Representing the encoded CTU target and actual bits, SW representing the size of the sliding window, W _JND (i) The weight representing the just tolerable distortion threshold JND for the eye currently encoding the ith CTU is calculated as follows:

JND _AvecurCTU is the mean of all JND factors of pixels in the ith CTU of the current code, and JND _Avecurpic Is the mean, PW, of JND factors of all pixels within the currently coded frame _curCTU Is the perceptual weight of the current CTU, similar to the frame level, F (CW) _curCTU ) The weighting factors representing different types of CTUs are defined as follows:

CW _curCTU the complexity weight representing the current block is determined by the edge feature factor and is calculated as follows:

EF _curCTU edge feature factor, EF, representing the current CTU _curpic Representing the edge feature factor of the current frame.

Preferably, step S4 specifically includes:

on the basis of the original rate distortion model R-lambda, the code rate control parameter lambda of the CTU level is obtained _scc The calculation is as follows:

λ _scc ＝τ×bpp _JND ^γ ；

where τ and γ are model parameters related to video characteristics of the screen content, bpp _JND Represents the coded bits per pixel, namely:

wherein, tar _CTU (i) As target bit of the current CTU, W _CTU And H _CTU Respectively representing the width and the height of the current CTU, taking just noticeable distortion JND factors of human eyes as perception constraints, combining similarity measurement factors of a reference frame and a reconstructed frame, and constructing a perception rate distortion model, wherein the parameter lambda of the perception rate distortion model _JND And quantization parameter QP _JND The calculation is as follows:

λ _JND ＝k×(T _JND ×SSIM _JND ×λ _scc )；

QP _JND ＝4.2005×λ _JND +13.7122+0.5；

where k is a model parameter related to the characteristics of the video content of the screen content, obtained by experiment, T _JND Represents the average visibility threshold factor for the current coded frame, calculated as follows:

SSIM _JND is a similarity measurement factor of the reference frame and the reconstructed frame, calculated as follows:

wherein, EF _r Representing the edge characteristic factor, EF, of the current reference frame _d Edge feature factor representing reconstructed frame, c ₁ Is a constant.

Preferably, step S5 specifically includes:

predicting the optimal coding parameters by using a perception rate distortion model to carry out coding;

and after the actual coding is finished, updating the perception rate distortion model by using the actual coding parameters.

In a second aspect, an embodiment of the present application provides a device for controlling perceptual bit rate in video coding of screen content based on distortion just noticeable to human eyes, including:

the edge modeling module is configured to acquire a screen content video, perform edge modeling on the screen content video to obtain a two-dimensional edge model, extract edge characteristics of the two-dimensional edge model, and calculate edge model parameters of the two-dimensional edge model;

the JND factor module is configured to acquire a brightness self-adaptive threshold, a contrast masking effect threshold, an edge non-structural distortion sensitivity threshold and a structural distortion sensitivity threshold based on the edge model parameters, construct a pixel domain JND model of the screen content video according to the brightness self-adaptive threshold, the contrast masking effect threshold, the edge non-structural distortion sensitivity threshold and the structural distortion sensitivity threshold, and determine a JND factor;

the target bit distribution module is configured to obtain an edge characteristic factor according to the edge characteristic and guide target bit distribution by using the edge characteristic factor and the JND factor;

the model construction module is configured to calculate similarity measurement factors of the reference frame and the reconstructed frame according to the edge feature factors of the reference frame and the reconstructed frame under the constraint of a pixel domain JND model, and construct a perception rate distortion model based on the similarity measurement factors of the reference frame and the reconstructed frame;

and the updating module is configured to estimate the coding parameters through the perceptual rate distortion model and update the perceptual rate distortion model according to the actual coding parameters.

In a third aspect, embodiments of the present application provide an electronic device comprising one or more processors; storage means for storing one or more programs which, when executed by one or more processors, cause the one or more processors to carry out a method as described in any one of the implementations of the first aspect.

In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the method as described in any implementation manner of the first aspect.

Compared with the prior art, the invention has the following beneficial effects:

(1) The method of the invention fully considers the content characteristics of the screen content video and the characteristics of the human eye vision system, and constructs a Just Noticeable Distortion (JND) model which can well describe the human eye vision system according to the edge parameters of the screen content video. The edge characteristics and the JND factors are used for guiding the perception complexity classification and the target bit allocation, and compared with the allocation mode without considering the content characteristics of the screen video in the standard, the target bit rate allocation scheme is more accurate.

(2) The method of the invention fully considers the problem that the rate distortion model in the standard does not consider the characteristics of the screen content video, combines the human eye perceptible distortion JND model and the similarity measurement factor of the reference frame and the reconstructed frame to construct the perceptual rate distortion model of the screen content video, improves the quality and rate distortion performance of the coded video sequence and improves the perceptual code rate control precision.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.

FIG. 1 is an exemplary device architecture diagram in which one embodiment of the present application may be applied;

fig. 2 is a schematic flowchart of a method for controlling perceptual bit rate of video coding of screen content based on distortion just noticeable by human eyes according to an embodiment of the present application;

fig. 3 is an overall flowchart of a method for controlling perceptual bit rate of video coding of screen content based on distortion just noticeable by human eyes according to an embodiment of the present application;

fig. 4 is an overall block diagram of rate control of the screen content video coding perceptual rate control method based on distortion just noticeable by human eyes according to the embodiment of the present application;

fig. 5 is a schematic diagram of a one-dimensional smooth step-type edge model of a screen content video coding perceptual code rate control method based on distortion just noticeable by human eyes according to an embodiment of the present application;

FIG. 6 is a diagram of a perceptual bitrate control apparatus for video coding of screen content based on distortion just noticeable by human eyes according to an embodiment of the present application;

fig. 7 is a schematic structural diagram of a computer device suitable for implementing an electronic apparatus according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Fig. 1 illustrates an exemplary device architecture 100 to which a screen content video coding perceptual rate control method based on distortion just noticeable to the human eye or a screen content video coding perceptual rate control device based on distortion just noticeable to the human eye according to an embodiment of the present application may be applied.

As shown in fig. 1, the apparatus architecture 100 may include

terminal devices

101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the

terminal devices

101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

The user may use the

terminal devices

101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various applications, such as data processing type applications, file processing type applications, and the like, may be installed on the

terminal devices

101, 102, 103.

The

terminal apparatuses

101, 102, and 103 may be hardware or software. When the

terminal devices

101, 102, 103 are hardware, they may be various electronic devices including, but not limited to, smart phones, tablet computers, laptop portable computers, desktop computers, and the like. When the

terminal devices

101, 102, 103 are software, they can be installed in the electronic devices listed above. It may be implemented as a plurality of software or software modules (e.g., software or software modules used to provide distributed services) or as a single software or software module. And is not particularly limited herein.

The server 105 may be a server that provides various services, such as a background data processing server that processes files or data uploaded by the

terminal devices

101, 102, 103. The background data processing server can process the acquired files or data to generate a processing result.

It should be noted that the method for controlling perceptual bitrate of screen content video coding based on distortion just noticeable by human eyes provided in the embodiment of the present application may be executed by the server 105, or may also be executed by the

terminal devices

101, 102, and 103, and accordingly, the apparatus for controlling perceptual bitrate of screen content video coding based on distortion just noticeable by human eyes may be disposed in the server 105, or may also be disposed in the

terminal devices

101, 102, and 103.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. In the case where the processed data does not need to be acquired from a remote location, the apparatus architecture described above may not include a network, but only a server or a terminal device.

Fig. 2 shows a method for controlling perceptual bitrate of video coding based on screen content distortion just noticeable by human eyes according to an embodiment of the present application, which includes the following steps:

s1, acquiring a screen content video, carrying out edge modeling on the screen content video to obtain a two-dimensional edge model, extracting edge characteristics of the two-dimensional edge model, and calculating to obtain edge model parameters of the two-dimensional edge model.

In a specific embodiment, the overall flow of the embodiment of the present application is shown in fig. 3, and the overall flow block diagram of the rate control of the embodiment of the present application is shown in fig. 4. The step S1 specifically includes:

wherein, the edge model parameters b, c and w respectively represent the brightness, contrast and structure of the edge profile, x represents the horizontal coordinate of the two-dimensional edge model, y represents the vertical coordinate of the two-dimensional edge model, theta represents the direction angle of the two-dimensional edge model, and erf (·) is an error function;

performing two-dimensional Gaussian partial derivative filtering on a two-dimensional edge model of a screen content video to obtain a two-dimensional edge detection operator e _2D (x, y; c, w, σ, θ), calculated as:

e _2D (x,y；c,w,σ,θ)＝|e _x (x,y；c,w,σ,θ)|+|e _y (x,y；c,w,σ,θ)|；

wherein σ represents a smoothing parameter of a gaussian function;

detecting operator e of two-dimensional edge _2D (x, y; c, w, sigma, theta) is convolved with the screen content video, and the edge features of the screen content video are extracted; the edge model parameters are calculated as follows:

wherein e is ₁ 、e ₂ 、e ₃ One-dimensional edge x = (0, a, -a) gaussian derivative filtered response of three positions respectively,

specifically, a screen content video is input in an HM + SCM platform, edge modeling is performed on the screen content video to obtain edge characteristics of the screen content video, a one-dimensional edge model is expanded into a two-dimensional edge model, the two-dimensional edge model of the screen content video is established, and the edge characteristics of the screen content video are detected through a two-dimensional Gaussian partial derivative function. Preferably, the one-dimensional edge model is a one-dimensional smooth step-type edge model, and the one-dimensional smooth step-type edge model is as shown in fig. 5. The parameter b determines the substrate strength of the edge, the parameter c reflects the contrast strength of the edge, the higher c corresponds to the stronger edge, the edge structure is controlled by w, w corresponds to a specific shape, and the smaller w is, the sharper the edge profile becomes. And obtaining a two-dimensional smooth step type edge model after expansion.

Performing two-dimensional Gaussian partial derivative filtering on the two-dimensional edge model of the screen content image to obtain a two-dimensional edge detection operator e _2D (x, y; c, w, sigma, theta), convolving with the screen content image and extracting the edge characteristics, and calculating as follows:

for simplicity of calculation, in the embodiment of the present application, the two-dimensional edge detection operator e _2D The calculation formula of (x, y; c, w, σ, θ) is simplified as follows:

e _2D (x,y；c,w,σ,θ)＝|e _x (x,y；c,w,σ,θ)|+|e _y (x,y；c,w,σ,θ)|；

edge center point (x) ₀ ,y ₀ ) The edge direction of (2) is calculated as follows:

s2, acquiring a brightness self-adaptive threshold, a contrast masking effect threshold, an edge non-structural distortion sensitivity threshold and a structural distortion sensitivity threshold based on the edge model parameters, constructing a pixel domain JND model of the screen content video according to the brightness self-adaptive threshold, the contrast masking effect threshold, the edge non-structural distortion sensitivity threshold and the structural distortion sensitivity threshold, and determining a JND factor.

In a specific embodiment, after the edge detection, the screen content video is divided into an edge pixel set S _E And a non-edge pixel set, based on the edge pixel set S in step S2 _E And acquiring a brightness self-adaptive threshold, a contrast masking effect threshold, an edge non-structural distortion sensitivity threshold and a structural distortion sensitivity threshold, and acquiring a JND factor of the threshold.

In a specific embodiment, step S2 specifically includes:

T _elum (p)＝s _2D (p；T _lum (p)+b,c,w)-s(p；b,c,w)；

wherein, T _lum (p) denotes the luminance masking effect, when an edge pixel p belongs to the set of edge pixels S _E When the average brightness along the edge profile is I (p) = b + c/2, parameter α ₁ 、α ₂ Beta is a constant, in this example, parameter alpha ₁ ＝17，α ₂ ＝3/128，β＝3。

where c (p) represents the contrast of the edge pixel p, f _th Is a constant; in this example, f _th ＝0.14。

By adaptation of the threshold T to the luminance _elum (p) and a contrast masking effect threshold T _econ (p) fusing the nonlinear additive models to obtain an edge unstructured distortion sensitivity threshold T _nstr (p), the calculation formula is as follows:

T _nstr (p)＝T _elum (p)+T _econ (p)-C _nstr ·min{T _elum (p),T _econ (p)}；

wherein, 0 < C _nstr < 1, in this embodiment, set C _nstr Is 0.2.

T _str (p)＝|s _2D (p；b,c,w+Δw)-s _2D (p；b,c,w)|；

the threshold value of the edge non-structural distortion sensitivity and the structural distortion sensitivity is fused to obtain the threshold value T of the edge pixel which is suitable for the screen content video and just perceived by human eyes _e (p), the JND factor for edge pixel p, is calculated as follows:

T _e (p)＝T _nstr (p)+T _str (p)-C _e ·min{T _str (p),T _nstr (p)}；

wherein, C _e Is a constant, in this example, C _e Set to 0.2. For the non-edge pixel set, only considering the visibility threshold of the brightness masking effect, and obtaining a pixel domain JND model of the final screen content video after integration as follows:

and S3, obtaining an edge feature factor according to the edge feature, and guiding target bit allocation by using the edge feature factor and the JND factor.

In a specific embodiment, step S3 specifically includes:

wherein, EF _GOP Representing the edge characteristic factor, EF, of the current whole GOP _coded Representing edge characteristic factors, EF, of the encoded frame _curRefpic Representing edge feature factors of a reference frame, according to a two-dimensional edge detection operator e _2D (x, y; c, w, σ, θ) is summed over the reference frame convolution, and thus, the frame-level target bit allocation formula is as follows:

Tar _curfpic ＝(Tar _GOP -Act _codedpics )×PW _curpic ×ε+0.5；

Tar _GOP representing GOP level target bits, obtained from target bits of the video sequence in the coding profile, act _codedpics Representing the actual bits consumed by the encoded frames in the GOP, epsilon being a frame-level bit allocation adjustment factor, related to the screen content video, obtained by experimentation;

at the CTU level, the specific allocation formula is:

wherein, tar _CTU (i) Represents the ith CTU target bit of the current code, B _curleft Representing the remaining bits, tar, of the current coded frame _codedCTU And Act _codedCTU Representing the encoded CTU target and actual bits, SW representing the size of the sliding window, W _JND (i) The weight of the just tolerable distortion threshold JND for the human eye currently encoding the ith CTU is calculated as follows:

JND _AvecurCTU is the mean of all JND factors of pixels in the ith CTU of the current code, and JND _Avecurpic Is the mean, PW, of JND factors of all pixels within the currently encoded frame _curCTU Is the perceptual weight of the current CTU, similar to the frame level, F (CW) _curCTU ) The weighting factors representing different types of CTUs are defined as follows:

EF _curCTU representing the edge characteristic factor, EF, of the current CTU _curpic Representing the edge feature factor of the current frame.

Specifically, at a frame level, the perceptual weight of a current coding frame in a current GOP is calculated according to the edge characteristics of a screen content video, a target bit is allocated to the current coding frame according to the perceptual weight ratio of the current coding frame and an uncoded frame, at a CTU level, besides the perceptual weight of the current CTU is considered, a human eye just tolerable distortion threshold JND weight of the CTU is fused, and a larger JND weight indicates that the current CTU can tolerate a larger distortion degree, and a smaller target bit number can be allocated.

Specifically, at the CTU level, the CTUs are divided into complex, contiguous, and simple CTU blocks according to complexity weights of encoding the CTUs. For a CTU block with higher complexity, the perceptual information contained is more, and more bits need to be allocated, whereas for a simple CTU, it is required to allocate more bitsFewer bits are allocated. For different classes, besides the perception weight of the current CTU is considered, the just tolerable distortion threshold JND weight of the human eye of the CTU is fused. W is a group of _JND (i) Weight, W, representing the just tolerable distortion threshold JND for the eye currently encoding the ith CTU _JND (i) A larger value indicates a larger tolerable distortion degree of the current CTU, and the distributed target bit is correspondingly smaller, F (CW) _curCTU ) The weighting factors of different types of CTUs are represented, more bits need to be distributed to complex CTU blocks, the weighting factors are large, the number of continuous CTU blocks is the next, and the number of simple CTU blocks is the smallest.

And S4, under the constraint of the pixel domain JND model, calculating to obtain a similarity measurement factor of the reference frame and the reconstructed frame according to the edge characteristic factor of the reference frame and the edge characteristic factor of the reconstructed frame, and constructing a perception rate distortion model based on the similarity measurement factor of the reference frame and the reconstructed frame.

In a specific embodiment, after the GOP level, frame level and CTU level assignment is completed, the process proceeds to step S4. Step S4 specifically includes:

λ _scc ＝τ×bpp _JND ^γ ；

wherein, tar _CTU (i) As target bit of the current CTU, W _CTU And H _CTU Respectively representing the width and the height of the current CTU, taking just noticeable distortion JND factors of human eyes as perception constraints, combining similarity measurement factors of a reference frame and a reconstructed frame, constructing a perception rate distortion model, and constructing a parameter lambda of the perception rate distortion model _JND And quantization parameter QP _JND The calculation is as follows:

λ _JND ＝k×(T _JND ×SSIM _JND ×λ _scc )；

QP _JND ＝4.2005×λ _JND +13.7122+0.5；

where k is a model parameter related to the characteristics of the video content of the screen content, obtained by experiment, T _JND Represents the average visibility threshold factor for the current encoded frame, calculated as follows:

Specifically, on the basis of an R-lambda model, a JND model is used as constraint, a similarity measurement factor of a video reference frame and a video reconstruction frame is added to be used as actual coding feedback, and a perception rate distortion model is constructed. In order to improve the perceptual coding performance of the screen content video, the embodiment of the application adds a Just Noticeable Distortion (JND) factor of human eyes as a perceptual constraint, and constructs a perceptual rate distortion model based on just noticeable distortion of human eyes by combining with a similarity measurement factor of a reference frame and a reconstructed frame.

In a specific embodiment, step S5 specifically includes:

estimating optimal coding parameters by using a perception rate distortion model to carry out coding;

The specific algorithm is consistent with the maintenance platform. And the updated perception rate distortion model is a perception code rate control model. The embodiment of the application improves the rate distortion performance of video coding and improves the code rate control precision and the coding quality.

With further reference to fig. 6, as an implementation of the methods shown in the above figures, the present application provides an embodiment of a device for controlling perceptual bitrate in video coding of screen content based on distortion just noticeable by human eyes, where the embodiment of the device corresponds to the embodiment of the method shown in fig. 2, and the device can be applied to various electronic devices in particular.

The embodiment of the application provides a screen content video coding perception code rate control device based on just noticeable distortion by human eyes, which comprises:

the edge modeling module 1 is configured to acquire a screen content video, perform edge modeling on the screen content video to obtain a two-dimensional edge model, extract edge characteristics of the two-dimensional edge model, and calculate edge model parameters of the two-dimensional edge model;

the JND factor module 2 is configured to acquire a brightness adaptive threshold, a contrast masking effect threshold, an edge unstructured distortion sensitivity threshold and a structural distortion sensitivity threshold based on the edge model parameters, construct a pixel domain JND model of the screen content video according to the brightness adaptive threshold, the contrast masking effect threshold, the edge unstructured distortion sensitivity threshold and the structural distortion sensitivity threshold, and determine a JND factor;

the target bit distribution module 3 is configured to obtain an edge feature factor according to the edge feature and guide target bit distribution by using the edge feature factor and a JND factor;

the model building module 4 is configured to calculate and obtain a similarity measurement factor between the reference frame and the reconstructed frame according to the edge feature factor of the reference frame and the edge feature factor of the reconstructed frame under the constraint of the pixel domain JND model, and build a perceptual rate distortion model based on the similarity measurement factor between the reference frame and the reconstructed frame;

and the updating module 5 is configured to perform coding parameter estimation through the perceptual rate distortion model, and update the perceptual rate distortion model according to the actual coding parameters.

Reference is now made to fig. 7, which is a schematic diagram illustrating a computer device 700 suitable for use in implementing an electronic device (e.g., the server or the terminal device shown in fig. 1) according to an embodiment of the present application. The electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the use range of the embodiment of the present application.

As shown in fig. 7, the computer apparatus 700 includes a Central Processing Unit (CPU) 701 and a Graphics Processing Unit (GPU) 702, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 703 or a program loaded from a storage section 709 into a Random Access Memory (RAM) 704. In the RAM704, various programs and data necessary for the operation of the apparatus 700 are also stored. The CPU 701, GPU702, ROM 703, and RAM704 are connected to each other via a bus 705. An input/output (I/O) interface 706 is also connected to bus 705.

The following components are connected to the I/O interface 706: an input section 707 including a keyboard, a mouse, and the like; an output section 708 including a display such as a Liquid Crystal Display (LCD) and a speaker; a storage section 709 including a hard disk and the like; and a communication section 710 including a network interface card such as a LAN card, a modem, or the like. The communication section 710 performs communication processing via a network such as the internet. The driver 711 may also be connected to the I/O interface 706 as needed. A removable medium 712 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 711 as necessary, so that a computer program read out therefrom is mounted into the storage section 709 as necessary.

In particular, the processes described above with reference to the flow diagrams may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via the communication section 710, and/or installed from the removable media 712. The computer program performs the above-described functions defined in the method of the present application when executed by a Central Processing Unit (CPU) 701 and a Graphics Processing Unit (GPU) 702.

It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable medium or any combination of the two. The computer readable medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor device, apparatus, or a combination of any of the foregoing. More specific examples of the computer readable medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution apparatus, device, or apparatus. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution apparatus, device, or apparatus. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, smalltalk, C + +, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based devices that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules described in the embodiments of the present application may be implemented by software or hardware. The modules described may also be provided in a processor.

As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may be separate and not incorporated into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring a screen content video, carrying out edge modeling on the screen content video to obtain a two-dimensional edge model, extracting edge characteristics of the two-dimensional edge model, and calculating to obtain edge model parameters of the two-dimensional edge model; acquiring a brightness self-adaptive threshold, a contrast masking effect threshold, an edge non-structural distortion sensitivity threshold and a structural distortion sensitivity threshold based on edge model parameters, constructing a pixel domain JND model of the screen content video according to the brightness self-adaptive threshold, the contrast masking effect threshold, the edge non-structural distortion sensitivity threshold and the structural distortion sensitivity threshold, and determining a JND factor; obtaining an edge feature factor according to the edge feature, and guiding target bit allocation by using the edge feature factor and a JND factor; under the constraint of a pixel domain JND model, calculating to obtain a similarity measurement factor of a reference frame and a reconstructed frame according to an edge feature factor of the reference frame and an edge feature factor of the reconstructed frame, and constructing a perception rate distortion model based on the similarity measurement factor of the reference frame and the reconstructed frame; and estimating coding parameters through the perceptual rate distortion model, and updating the perceptual rate distortion model according to actual coding parameters.

The foregoing description is only exemplary of the preferred embodiments of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims

1. A screen content video coding perception code rate control method based on just noticeable distortion by human eyes is characterized by comprising the following steps:

s2, acquiring a brightness self-adaptive threshold, a contrast masking effect threshold, an edge non-structural distortion sensitivity threshold and a structural distortion sensitivity threshold based on the edge model parameters, constructing a pixel domain JND model of the screen content video according to the brightness self-adaptive threshold, the contrast masking effect threshold, the edge non-structural distortion sensitivity threshold and the structural distortion sensitivity threshold, and determining a JND factor;

s3, obtaining an edge feature factor according to the edge feature, and guiding target bit allocation by using the edge feature factor and the JND factor;

s4, under the constraint of a pixel domain JND model, calculating according to the edge characteristic factors of the reference frame and the reconstructed frame to obtain similarity measurement factors of the reference frame and the reconstructed frame, and constructing a perceptual rate distortion model based on the similarity measurement factors of the reference frame and the reconstructed frame;

and S5, estimating coding parameters through a perceptual rate distortion model, and updating the perceptual rate distortion model according to actual coding parameters.

2. The method for controlling perceptual bit rate of video coding of screen content based on just noticeable distortion to human eyes according to claim 1, wherein the step S1 specifically comprises:

performing two-dimensional Gaussian partial derivative filtering on the two-dimensional edge model of the screen content video,obtaining a two-dimensional edge detection operator e _2D (x, y; c, w, σ, θ), calculated as:

e _2D (x,y；c,w,σ,θ)＝|e _x (x,y；c,w,σ,θ)|+|e _y (x,y；c,w,σ,θ)|；

wherein σ represents a smoothing parameter of the gaussian function;

detecting the two-dimensional edge operator e _2D (x, y; c, w, sigma, theta) is convolved with the screen content video, and the edge features of the screen content video are extracted;

the edge model parameters are calculated as follows:

wherein e ₁ 、e ₂ 、e ₃ One-dimensional edge x = (0, a, -a) response of three positions after gaussian bias filtering respectively,

3. the method as claimed in claim 2, wherein the screen content video is divided into an edge pixel set S after edge detection _E And a non-edge pixel set, in the step S2, based on the edge pixel set S _E And acquiring a brightness adaptive threshold, a contrast masking effect threshold, an edge non-structural distortion sensitivity threshold and a structural distortion sensitivity threshold.

4. The method for controlling perceptual bit rate of video coding of screen content based on just noticeable distortion to human eyes as claimed in claim 3, wherein said step S2 specifically comprises:

the edge pixel set S _E The luminance adaptive threshold for the middle edge pixel p is calculated as follows:

T _elum (p)＝s _2D (p；T _lum (p)+b,c,w)-s(p；b,c,w)；

wherein, T _lum (p) denotes the luminance masking effect, when an edge pixel p belongs to the set of edge pixels S _E When the average luminance along the edge contour is I (p) = b + c/2, parameter α ₁ 、α ₂ Beta is a constant;

where c (p) represents the contrast of the edge pixel p, f _th Is a constant;

T _nstr (p)＝T _elum (p)+T _econ (p)-C _nstr ·min{T _elum (p),T _econ (p)}；

wherein, 0 is more than C _nstr ＜1；

The structural distortion sensitivity threshold of the edge pixel p is calculated as follows:

T _str (p)＝|s _2D (p；b,c,w+Δw)-s _2D (p；b,c,w)|；

fusing the edge non-structural distortion sensitivity and the structural distortion sensitivity threshold value to obtain an edge pixel human eye just noticeable distortion threshold value T suitable for screen content video _e (p), i.e. the JND factor of the edge pixel p, is specifically calculated as follows:

T _e (p)＝T _nstr (p)+T _str (p)-C _e ·min{T _str (p),T _nstr (p)}；

wherein, C _e As a constant, only considering the visibility threshold of the luminance masking effect for the non-edge pixel set, and obtaining a pixel domain JND model of the final screen content video after integration as follows:

5. the method for controlling perceptual bit rate of video coding of screen content based on just noticeable distortion to the human eye as claimed in claim 4, wherein the step S3 specifically comprises:

at a frame level, allocating target bits to a current coding frame according to the perceptual weight of the current coding frame in a current GOP (group of pictures), wherein the perceptual weight PW of the current coding frame in the current GOP _cur The calculation is as follows:

wherein, EF _GOP Representing the edge characteristic factor, EF, of the current whole GOP _coded Representing edge characteristic factors, EF, of the encoded frame _curRefp Representing edges of a reference frameEdge feature factor according to the two-dimensional edge detection operator e _2D (x, y; c, w, σ, θ) is summed by convolution with the reference frame, and thus, the frame-level target bit allocation formula is as follows:

Tar _curfpic ＝(Tar _GOP -Act _codedpics )×PW _curpic ×ε+0.5；

at the CTU level, the specific allocation formula is:

wherein, tar _CTU (i) Represents the current coded i-th CTU target bit, B _curleft Representing the remaining bits, tar, of the current coded frame _codedCTU And Act _codedCTU Representing the target and actual bits of the coded CTU, SW representing the size of the sliding window, W _JND (i) The weight representing the just tolerable distortion threshold JND for the eye currently encoding the ith CTU is calculated as follows:

6. The method for controlling perceptual bit rate of video coding of screen content based on just noticeable distortion to human eyes according to claim 5, wherein the step S4 specifically comprises:

λ _scc ＝τ×bpp _JND ^γ ；

wherein τ and γ are model parameters, bpp, related to video characteristics of the screen content _JND Represents the coded bits per pixel, namely:

wherein, tar _CTU (i) As target bit of the current CTU, W _CTU And H _CTU Respectively setting the width and the height of the current CTU, taking just noticeable distortion JND factors of human eyes as perception constraints, combining similarity measurement factors of a reference frame and a reconstructed frame, and constructing the perception rate distortion model, wherein the parameter lambda of the perception rate distortion model _JND And quantization parameter QP _JND The calculation is as follows:

λ _JND ＝k×(T _JND ×SSIM _JND ×λ _scc )；

QP _JND ＝4.2005×λ _JND +13.7122+0.5；

where k is a model parameter related to a characteristic of the video content of the screen content, is calculated byObtained by experiment, T _JND Represents the average visibility threshold factor for the current encoded frame, calculated as follows:

wherein, EF _r Representing the edge characteristic factor, EF, of the current reference frame _d Representing edge feature factors of the reconstructed frame, c ₁ Is a constant.

7. The method for controlling perceptual bit rate of video coding of screen content based on just noticeable distortion to human eyes according to claim 1, wherein the step S5 specifically comprises:

8. A screen content video coding perception code rate control device based on just noticeable distortion by human eyes is characterized by comprising:

a JND factor module configured to obtain a brightness adaptive threshold, a contrast masking effect threshold, an edge unstructured distortion sensitivity threshold and a structural distortion sensitivity threshold based on the edge model parameters, construct a pixel domain JND model of the screen content video according to the brightness adaptive threshold, the contrast masking effect threshold, the edge unstructured distortion sensitivity threshold and the structural distortion sensitivity threshold, and determine a JND factor;

the target bit distribution module is configured to obtain an edge feature factor according to the edge feature and guide target bit distribution by using the edge feature factor and the JND factor;

the model construction module is configured to calculate and obtain similarity measurement factors of the reference frame and the reconstructed frame according to the edge feature factors of the reference frame and the reconstructed frame under the constraint of a pixel domain JND model, and construct a perception rate distortion model based on the similarity measurement factors of the reference frame and the reconstructed frame;

and the updating module is configured to estimate the coding parameters through a perceptual rate distortion model, and update the perceptual rate distortion model according to the actual coding parameters.

9. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs,

when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.

10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-7.