CN115567712A - Screen content video coding perceptual rate control method and device based on just-noticeable distortion of the human eye

Info

Publication number: CN115567712A
Application number: CN202211156529.3A
Authority: CN (China)
Original language: Chinese (zh)
Legal status: Withdrawn
Prior art keywords: edge, model, JND, screen content, distortion
Inventors: 陈婧, 王世萍, 陈淋淋, 曾焕强, 朱建清, 蔡灿辉
Applicant and current assignee: Huaqiao University
Application filed by Huaqiao University; priority to CN202211156529.3A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Using adaptive coding
    • H04N19/134 Characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/147 Data rate or code amount at the encoder output according to rate distortion criteria
    • H04N19/149 Data rate or code amount at the encoder output by estimating the code amount by means of a model, e.g. mathematical model or statistical model
    • H04N19/169 Characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/177 The unit being a group of pictures [GOP]


Abstract

The invention discloses a perceptual rate control method and device for screen content video coding based on the just-noticeable distortion (JND) of the human eye, belonging to the field of video coding. First, a screen content video is acquired and its edges are modeled to obtain a two-dimensional edge model; edge features are extracted from the model and the edge model parameters are computed. Based on these parameters, a luminance adaptation threshold, a contrast masking threshold, an edge non-structural distortion sensitivity threshold and a structural distortion sensitivity threshold are obtained, from which a pixel-domain JND model of the screen content video is constructed and a JND factor is determined. The edge features and the JND factor then guide perceptual complexity classification and target bit allocation. Finally, a perceptual rate control model is constructed under the JND constraint from the edge similarity between the reference video frame and the reconstructed video frame. The method improves the rate control accuracy for screen content video and significantly improves its coding rate-distortion performance.

Description

Screen content video coding perceptual rate control method and device based on just-noticeable distortion of the human eye
Technical Field
The invention relates to the field of video coding, and in particular to a perceptual rate control method and device for screen content video coding based on the just-noticeable distortion of the human eye.
Background
As Internet technology matures, video applications have become increasingly abundant and the demand for video communication grows daily. The raw video signal carries an enormous amount of data, making transmission and storage a heavy burden, so video compression coding is essential. To enable interoperability between products from different companies, international video standardization bodies have defined a series of video coding standards that unify the compressed bitstream format. Among them, H.264/AVC is currently the most widely deployed video coding standard and enabled the spread of high-definition and Internet video. As the demand for video resolution and compression efficiency increased, High Efficiency Video Coding (HEVC/H.265) was introduced; compared with H.264/AVC it can save about half the bit rate at the same video quality, promoting high-dynamic-range and ultra-high-definition video applications.
In recent years, screen content video applications such as video conferencing, online education and game live-streaming have gradually become part of daily life, and the demand for screen content video keeps growing. Screen content video is computer-generated and has many features that differ from natural video captured by a camera. In the spatial domain, screen content video contains large flat areas, repetitive patterns, sharp edges and a limited number of colors; in the temporal domain, because there is no physical-motion constraint, adjacent frames may be uncorrelated, i.e., screen content video contains many abrupt scene-change frames and still frames. Traditional video coding standards are designed by default for the characteristics of natural video; to adapt to screen content, the Screen Content Coding extension (HEVC-SCC) was built on top of the HEVC standard. It mainly adds four new coding tools to HEVC: Intra Block Copy (IBC), Palette Mode (PLT), Adaptive Color Transform (ACT) and Adaptive Motion Vector Resolution (AMVR). For rate control, however, it still adopts the R-λ model designed for natural video.
To use bandwidth resources efficiently and improve transmission efficiency, rate control is an essential component of an encoder. Its main task is to establish a mathematical model relating the coding bit rate to the quantization parameter and to determine the coding parameters from the target bit rate, so that the produced bit rate matches the available transmission bandwidth. However, the current screen content coding standard HEVC-SCC still performs rate control as if coding natural video; there is no rate control algorithm tailored to the characteristics of screen content and of human visual perception. It is therefore necessary to study perceptual rate control for screen content video.
Disclosure of Invention
To address the technical problems described above, embodiments of the present application combine the characteristics of screen content with those of the human visual system to provide a perceptual rate control method and device for screen content video coding based on the just-noticeable distortion of the human eye, thereby improving coding quality, rate-distortion performance and perceptual rate control accuracy for screen content video.
In a first aspect, an embodiment of the present application provides a perceptual rate control method for screen content video coding based on the just-noticeable distortion of the human eye, comprising the following steps:
S1, acquiring a screen content video, performing edge modeling on it to obtain a two-dimensional edge model, extracting the edge features of the model, and computing its edge model parameters;
S2, obtaining a luminance adaptation threshold, a contrast masking threshold, an edge non-structural distortion sensitivity threshold and a structural distortion sensitivity threshold based on the edge model parameters, constructing a pixel-domain JND model of the screen content video from these four thresholds, and determining a JND factor;
S3, obtaining an edge feature factor from the edge features, and using the edge feature factor together with the JND factor to guide target bit allocation;
S4, under the constraint of the pixel-domain JND model, computing a similarity measurement factor between the reference frame and the reconstructed frame from their edge feature factors, and constructing a perceptual rate-distortion model based on this similarity factor;
S5, estimating coding parameters with the perceptual rate-distortion model, and updating the model with the actual coding parameters.
Preferably, step S1 specifically includes:
extending the one-dimensional edge model to a two-dimensional edge model, wherein the extended two-dimensional edge model is calculated as follows:
s_2D(x, y; b, c, w, θ) = b + (c/2)·(1 + erf((x·cosθ + y·sinθ)/(√2·w)));
wherein the edge model parameters b, c and w represent the luminance, contrast and structure of the edge profile respectively, x is the horizontal coordinate of the two-dimensional edge model, y is its vertical coordinate, θ is its orientation angle, and erf(·) is the error function;
performing two-dimensional Gaussian partial-derivative filtering on the two-dimensional edge model of the screen content video to obtain the two-dimensional edge detection operator e_2D(x, y; c, w, σ, θ), calculated as follows:
e_2D(x, y; c, w, σ, θ) = |e_x(x, y; c, w, σ, θ)| + |e_y(x, y; c, w, σ, θ)|;
wherein σ represents the smoothing parameter of the Gaussian function;
the two-dimensional edge detection operator e_2D(x, y; c, w, σ, θ) is convolved with the screen content video and its edge features are extracted;
the edge model parameters are calculated as follows:
[the closed-form expressions recovering w, c and b from the filter responses appear only as equation images in the original publication]
wherein e_1, e_2, e_3 are the Gaussian-derivative filter responses of the one-dimensional edge at the three positions x = (0, a, -a); an auxiliary definition likewise appears only as an equation image in the original.
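As a concrete illustration of this step, the following Python sketch builds the erf-based two-dimensional edge profile s_2D and the Gaussian partial-derivative edge detector e_2D described above. It is a minimal sketch under stated assumptions, not the reference implementation: the filter scale σ, the patch size and the parameter values are arbitrary demonstration choices.

```python
import numpy as np
from scipy.special import erf
from scipy.ndimage import gaussian_filter

def edge_profile_2d(x, y, b, c, w, theta):
    """Smoothed-step edge s_2D(x, y; b, c, w, theta): base luminance b,
    contrast c, profile width w, orientation angle theta."""
    t = x * np.cos(theta) + y * np.sin(theta)
    return b + 0.5 * c * (1.0 + erf(t / (np.sqrt(2.0) * w)))

def edge_response(img, sigma=1.0):
    """e_2D = |e_x| + |e_y| using first-order Gaussian derivative filters."""
    img = img.astype(np.float64)
    ex = gaussian_filter(img, sigma, order=(0, 1))  # derivative along x (columns)
    ey = gaussian_filter(img, sigma, order=(1, 0))  # derivative along y (rows)
    return np.abs(ex) + np.abs(ey)

# usage: synthesize a 45-degree edge and detect it
yy, xx = np.mgrid[-32:32, -32:32]
patch = edge_profile_2d(xx, yy, b=64.0, c=128.0, w=1.5, theta=np.pi / 4)
e2d = edge_response(patch, sigma=1.0)
print(e2d.max())  # strongest response lies along the edge contour
```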
preferably, the screen content video is divided into an edge pixel set S after edge detection E And a non-edge pixel set, based on the edge pixel set S in step S2 E And acquiring a brightness self-adaptive threshold, a contrast masking effect threshold, an edge non-structural distortion sensitivity threshold and a structural distortion sensitivity threshold.
Preferably, step S2 specifically includes:
for an edge pixel p in the edge pixel set S_E, the luminance adaptation threshold is calculated as follows:
T_elum(p) = s_2D(p; T_lum(p) + b, c, w) - s_2D(p; b, c, w);
T_lum(p) = α_1·(1 - √(I(p)/127)) + β, if I(p) ≤ 127; T_lum(p) = α_2·(I(p) - 127) + β, otherwise;
wherein T_lum(p) denotes the luminance masking effect; when an edge pixel p belongs to the edge pixel set S_E, the average luminance along the edge contour is I(p) = b + c/2; the parameters α_1, α_2 and β are constants;
the contrast masking threshold of the edge pixel p is calculated as follows:
T_econ(p) = min{|s_2D(p; b, T_con+, w) - s_2D(p; b, c, w)|, |s_2D(p; b, T_con-, w) - s_2D(p; b, c, w)|};
[the expressions for T_con+ and T_con- appear only as equation images in the original publication]
wherein c(p) denotes the contrast of the edge pixel p and f_th is a constant;
fusing the luminance adaptation threshold T_elum(p) and the contrast masking threshold T_econ(p) with a nonlinear additive model yields the edge non-structural distortion sensitivity threshold T_nstr(p), calculated as follows:
T_nstr(p) = T_elum(p) + T_econ(p) - C_nstr·min{T_elum(p), T_econ(p)};
wherein 0 < C_nstr < 1;
the structural distortion sensitivity threshold of the edge pixel p is calculated as follows:
T_str(p) = |s_2D(p; b, c, w + Δw) - s_2D(p; b, c, w)|;
fusing the edge non-structural and structural distortion sensitivity thresholds yields the just-noticeable distortion threshold T_e(p) of an edge pixel suited to screen content video, i.e. the JND factor of edge pixel p, calculated as follows:
T_e(p) = T_nstr(p) + T_str(p) - C_e·min{T_str(p), T_nstr(p)};
wherein C_e is a constant; for the non-edge pixel set, only the visibility threshold of the luminance masking effect is considered; after integration, the pixel-domain JND model of the screen content video is:
T_JND(p) = T_e(p), if p ∈ S_E; T_JND(p) = T_lum(p), otherwise.
preferably, step S3 specifically includes:
at the frame level, target bits are allocated to the current coding frame according to its perceptual weight within the current GOP; the perceptual weight PW_curpic of the current coding frame in the current GOP is calculated as follows:
PW_curpic = EF_curRefpic / (EF_GOP - EF_coded);
wherein EF_GOP is the edge feature factor of the whole current GOP, EF_coded is the edge feature factor of the already-encoded frames, and EF_curRefpic is the edge feature factor of the reference frame, obtained by convolving the two-dimensional edge detection operator e_2D(x, y; c, w, σ, θ) with the reference frame and summing the responses; the frame-level target bit allocation formula is therefore:
Tar_curpic = (Tar_GOP - Act_codedpics) × PW_curpic × ε + 0.5;
wherein Tar_GOP is the GOP-level target bit budget, obtained from the target bits of the video sequence in the coding configuration, Act_codedpics is the actual number of bits consumed by the already-encoded frames of the GOP, and ε is a frame-level bit allocation adjustment factor related to the screen content video and obtained experimentally;
at the CTU level, the specific allocation formula is:
[the CTU-level allocation expression appears only as an equation image in the original publication]
wherein Tar_CTU(i) is the target bit budget of the i-th CTU of the current frame, B_curleft is the number of bits remaining for the current frame, Tar_codedCTU and Act_codedCTU are the target and actual bits of the already-encoded CTUs, SW is the sliding-window size, and W_JND(i) is the weight of the just-tolerable distortion threshold (JND) of the i-th CTU being encoded, calculated as follows:
W_JND(i) = JND_AvecurCTU / JND_Avecurpic;
wherein JND_AvecurCTU is the mean of the JND factors of the pixels in the i-th CTU being encoded and JND_Avecurpic is the mean of the JND factors of all pixels in the current frame; PW_curCTU is the perceptual weight of the current CTU, defined analogously to the frame level; F(CW_curCTU) denotes the weighting factors of the different CTU types:
[the piecewise definition of F(CW_curCTU) appears only as an equation image in the original publication]
CW_curCTU is the complexity weight of the current block, determined by the edge feature factor and calculated as follows:
CW_curCTU = EF_curCTU / EF_curpic;
wherein EF_curCTU is the edge feature factor of the current CTU and EF_curpic is the edge feature factor of the current frame.
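A minimal Python sketch of the frame-level allocation just described; the edge feature factors are assumed to be precomputed sums of e_2D responses, and the adjustment factor ε = 1.0 is a placeholder value (the patent obtains ε experimentally).

```python
def frame_target_bits(tar_gop, act_coded_bits, ef_gop, ef_coded, ef_ref, eps=1.0):
    """Frame-level target bits: remaining GOP budget weighted by the
    perceptual weight PW_curpic = EF_curRefpic / (EF_GOP - EF_coded)."""
    pw_curpic = ef_ref / (ef_gop - ef_coded)
    return int((tar_gop - act_coded_bits) * pw_curpic * eps + 0.5)  # +0.5 rounds

# usage: 1 Mbit GOP budget, 400 kbit already spent by encoded frames
print(frame_target_bits(tar_gop=1_000_000, act_coded_bits=400_000,
                        ef_gop=5.0e6, ef_coded=2.1e6, ef_ref=0.6e6))
```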
Preferably, step S4 specifically includes:
on the basis of the original R-λ rate-distortion model, the CTU-level rate control parameter λ_scc is calculated as follows:
λ_scc = τ × bpp_JND^γ;
wherein τ and γ are model parameters related to the characteristics of the screen content video, and bpp_JND is the number of coded bits per pixel, namely:
bpp_JND = Tar_CTU(i) / (W_CTU × H_CTU);
wherein Tar_CTU(i) is the target bit budget of the current CTU, and W_CTU and H_CTU are the width and height of the current CTU; taking the human-eye just-noticeable distortion (JND) factor as a perceptual constraint and combining it with the similarity measurement factor of the reference frame and the reconstructed frame, a perceptual rate-distortion model is constructed; its parameter λ_JND and quantization parameter QP_JND are calculated as follows:
λ_JND = k × (T_JND × SSIM_JND × λ_scc);
QP_JND = 4.2005 × ln(λ_JND) + 13.7122 + 0.5;
wherein k is a model parameter related to the characteristics of the screen content video, obtained experimentally, and T_JND is the average visibility threshold factor of the current coding frame:
[the expression for T_JND appears only as an equation image in the original publication]
SSIM_JND is the similarity measurement factor between the reference frame and the reconstructed frame, calculated as follows:
SSIM_JND = (2·EF_r·EF_d + c_1) / (EF_r^2 + EF_d^2 + c_1);
wherein EF_r is the edge feature factor of the current reference frame, EF_d is the edge feature factor of the reconstructed frame, and c_1 is a constant.
Preferably, step S5 specifically includes:
predicting the optimal coding parameters with the perceptual rate-distortion model and encoding with them;
after the actual encoding is finished, updating the perceptual rate-distortion model with the actual coding parameters.
In a second aspect, an embodiment of the present application provides a perceptual rate control device for screen content video coding based on the just-noticeable distortion of the human eye, comprising:
an edge modeling module configured to acquire a screen content video, perform edge modeling on it to obtain a two-dimensional edge model, extract the edge features of the model, and compute its edge model parameters;
a JND factor module configured to obtain a luminance adaptation threshold, a contrast masking threshold, an edge non-structural distortion sensitivity threshold and a structural distortion sensitivity threshold based on the edge model parameters, construct a pixel-domain JND model of the screen content video from these thresholds, and determine a JND factor;
a target bit allocation module configured to obtain an edge feature factor from the edge features and use the edge feature factor together with the JND factor to guide target bit allocation;
a model construction module configured to compute, under the constraint of the pixel-domain JND model, a similarity measurement factor between the reference frame and the reconstructed frame from their edge feature factors, and to construct a perceptual rate-distortion model based on it;
an updating module configured to estimate coding parameters with the perceptual rate-distortion model and update the model with the actual coding parameters.
In a third aspect, embodiments of the present application provide an electronic device comprising one or more processors; storage means for storing one or more programs which, when executed by one or more processors, cause the one or more processors to carry out a method as described in any one of the implementations of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the method as described in any implementation manner of the first aspect.
Compared with the prior art, the invention has the following beneficial effects:
(1) The method fully considers the content characteristics of screen content video and the properties of the human visual system, and constructs a just-noticeable distortion (JND) model that describes the human visual system well from the edge parameters of the screen content video. The edge features and the JND factor guide perceptual complexity classification and target bit allocation; compared with the allocation scheme in the standard, which ignores the content characteristics of screen video, the resulting target bit allocation is more accurate.
(2) The method addresses the fact that the rate-distortion model in the standard does not consider the characteristics of screen content video: it combines the human-eye JND model with the similarity measurement factor of the reference and reconstructed frames to construct a perceptual rate-distortion model for screen content video, improving the quality and rate-distortion performance of the coded video sequence and the accuracy of perceptual rate control.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is an exemplary device architecture diagram in which one embodiment of the present application may be applied;
fig. 2 is a schematic flowchart of the perceptual rate control method for screen content video coding based on just-noticeable distortion of the human eye according to an embodiment of the present application;
fig. 3 is an overall flowchart of the method according to an embodiment of the present application;
fig. 4 is an overall rate control block diagram of the method according to an embodiment of the present application;
fig. 5 is a schematic diagram of the one-dimensional smoothed-step edge model used by the method according to an embodiment of the present application;
FIG. 6 is a diagram of the perceptual rate control device for screen content video coding based on just-noticeable distortion of the human eye according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a computer device suitable for implementing an electronic apparatus according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 illustrates an exemplary device architecture 100 to which the perceptual rate control method or device for screen content video coding based on just-noticeable distortion of the human eye of the embodiments of the present application may be applied.
As shown in fig. 1, the apparatus architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. Various applications, such as data processing type applications, file processing type applications, and the like, may be installed on the terminal devices 101, 102, 103.
The terminal apparatuses 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices including, but not limited to, smart phones, tablet computers, laptop portable computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they can be installed in the electronic devices listed above. It may be implemented as a plurality of software or software modules (e.g., software or software modules used to provide distributed services) or as a single software or software module. And is not particularly limited herein.
The server 105 may be a server that provides various services, such as a background data processing server that processes files or data uploaded by the terminal devices 101, 102, 103. The background data processing server can process the acquired files or data to generate a processing result.
It should be noted that the perceptual rate control method for screen content video coding based on just-noticeable distortion of the human eye provided in the embodiments of the present application may be executed by the server 105 or by the terminal devices 101, 102, 103; accordingly, the corresponding device may be disposed in the server 105 or in the terminal devices 101, 102, 103.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. In the case where the processed data does not need to be acquired from a remote location, the apparatus architecture described above may not include a network, but only a server or a terminal device.
Fig. 2 shows the perceptual rate control method for screen content video coding based on the just-noticeable distortion of the human eye according to an embodiment of the present application, comprising the following steps:
s1, acquiring a screen content video, carrying out edge modeling on the screen content video to obtain a two-dimensional edge model, extracting edge characteristics of the two-dimensional edge model, and calculating to obtain edge model parameters of the two-dimensional edge model.
In a specific embodiment, the overall flow of the embodiment of the present application is shown in fig. 3, and the overall rate control block diagram is shown in fig. 4. Step S1 specifically includes:
extending the one-dimensional edge model to a two-dimensional edge model, wherein the extended two-dimensional edge model is calculated as follows:
s_2D(x, y; b, c, w, θ) = b + (c/2)·(1 + erf((x·cosθ + y·sinθ)/(√2·w)));
wherein the edge model parameters b, c and w represent the luminance, contrast and structure of the edge profile respectively, x is the horizontal coordinate of the two-dimensional edge model, y is its vertical coordinate, θ is its orientation angle, and erf(·) is the error function;
performing two-dimensional Gaussian partial-derivative filtering on the two-dimensional edge model of the screen content video to obtain the two-dimensional edge detection operator e_2D(x, y; c, w, σ, θ), calculated as follows:
e_2D(x, y; c, w, σ, θ) = |e_x(x, y; c, w, σ, θ)| + |e_y(x, y; c, w, σ, θ)|;
wherein σ represents the smoothing parameter of the Gaussian function;
the two-dimensional edge detection operator e_2D(x, y; c, w, σ, θ) is convolved with the screen content video, and the edge features of the video are extracted; the edge model parameters are calculated as follows:
[the closed-form expressions recovering w, c and b from the filter responses appear only as equation images in the original publication]
wherein e_1, e_2, e_3 are the Gaussian-derivative filter responses of the one-dimensional edge at the three positions x = (0, a, -a); an auxiliary definition likewise appears only as an equation image in the original.
Specifically, a screen content video is input on the HM+SCM reference platform and edge modeling is performed on it to obtain its edge features: the one-dimensional edge model is extended into a two-dimensional edge model of the screen content video, and the edge features of the video are detected with a two-dimensional Gaussian partial-derivative function. Preferably, the one-dimensional edge model is a one-dimensional smoothed-step edge model, as shown in fig. 5. The parameter b determines the base strength of the edge, the parameter c reflects its contrast strength (a larger c corresponds to a stronger edge), and the edge structure is controlled by w, which determines the profile shape: the smaller w is, the sharper the edge profile becomes. After extension, a two-dimensional smoothed-step edge model is obtained.
Performing two-dimensional Gaussian partial-derivative filtering on the two-dimensional edge model of the screen content image yields the two-dimensional edge detection operator e_2D(x, y; c, w, σ, θ), which is convolved with the screen content image to extract its edge features:
[the expressions for the Gaussian partial-derivative responses e_x and e_y appear only as equation images in the original publication]
For simplicity of calculation, in the embodiment of the present application the two-dimensional edge detection operator e_2D(x, y; c, w, σ, θ) is simplified to:
e_2D(x, y; c, w, σ, θ) = |e_x(x, y; c, w, σ, θ)| + |e_y(x, y; c, w, σ, θ)|;
and the edge direction at an edge center point (x_0, y_0) is calculated as follows:
θ(x_0, y_0) = arctan(e_y(x_0, y_0) / e_x(x_0, y_0)).
s2, acquiring a brightness self-adaptive threshold, a contrast masking effect threshold, an edge non-structural distortion sensitivity threshold and a structural distortion sensitivity threshold based on the edge model parameters, constructing a pixel domain JND model of the screen content video according to the brightness self-adaptive threshold, the contrast masking effect threshold, the edge non-structural distortion sensitivity threshold and the structural distortion sensitivity threshold, and determining a JND factor.
In a specific embodiment, after edge detection the screen content video is partitioned into an edge pixel set S_E and a non-edge pixel set; in step S2, the luminance adaptation threshold, the contrast masking threshold, the edge non-structural distortion sensitivity threshold and the structural distortion sensitivity threshold are obtained based on the edge pixel set S_E, and the JND factor is derived from these thresholds.
In a specific embodiment, step S2 specifically includes:
for an edge pixel p in the edge pixel set S_E, the luminance adaptation threshold is calculated as follows:
T_elum(p) = s_2D(p; T_lum(p) + b, c, w) - s_2D(p; b, c, w);
T_lum(p) = α_1·(1 - √(I(p)/127)) + β, if I(p) ≤ 127; T_lum(p) = α_2·(I(p) - 127) + β, otherwise;
wherein T_lum(p) denotes the luminance masking effect; when an edge pixel p belongs to the edge pixel set S_E, the average luminance along the edge contour is I(p) = b + c/2; the parameters α_1, α_2 and β are constants; in this embodiment, α_1 = 17, α_2 = 3/128, β = 3.
The contrast masking threshold of the edge pixel p is calculated as follows:
T_econ(p) = min{|s_2D(p; b, T_con+, w) - s_2D(p; b, c, w)|, |s_2D(p; b, T_con-, w) - s_2D(p; b, c, w)|};
[the expressions for T_con+ and T_con- appear only as equation images in the original publication]
wherein c(p) denotes the contrast of the edge pixel p and f_th is a constant; in this embodiment, f_th = 0.14.
Fusing the luminance adaptation threshold T_elum(p) and the contrast masking threshold T_econ(p) with a nonlinear additive model yields the edge non-structural distortion sensitivity threshold T_nstr(p), calculated as follows:
T_nstr(p) = T_elum(p) + T_econ(p) - C_nstr·min{T_elum(p), T_econ(p)};
wherein 0 < C_nstr < 1; in this embodiment, C_nstr is set to 0.2.
The structural distortion sensitivity threshold of the edge pixel p is calculated as follows:
T_str(p) = |s_2D(p; b, c, w + Δw) - s_2D(p; b, c, w)|;
fusing the edge non-structural and structural distortion sensitivity thresholds yields the just-noticeable distortion threshold T_e(p) of an edge pixel suited to screen content video, i.e. the JND factor of edge pixel p, calculated as follows:
T_e(p) = T_nstr(p) + T_str(p) - C_e·min{T_str(p), T_nstr(p)};
wherein C_e is a constant; in this embodiment, C_e is set to 0.2. For the non-edge pixel set, only the visibility threshold of the luminance masking effect is considered; after integration, the pixel-domain JND model of the screen content video is:
T_JND(p) = T_e(p), if p ∈ S_E; T_JND(p) = T_lum(p), otherwise.
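To illustrate how these thresholds combine, the sketch below assembles a per-pixel JND map in Python using the constants of this embodiment (α_1 = 17, α_2 = 3/128, β = 3, C_nstr = C_e = 0.2). It is a simplified illustration, not the patent's implementation: the per-pixel maps T_elum, T_econ and T_str are taken as precomputed inputs (toy constants in the usage example), since their evaluation from the s_2D edge profile is described above.

```python
import numpy as np

def t_lum(I, a1=17.0, a2=3.0 / 128.0, beta=3.0):
    """Piecewise luminance-adaptation threshold with this embodiment's constants."""
    I = np.asarray(I, dtype=np.float64)
    return np.where(I <= 127.0,
                    a1 * (1.0 - np.sqrt(I / 127.0)) + beta,
                    a2 * (I - 127.0) + beta)

def fuse_nonlinear(t_a, t_b, c):
    """Nonlinear additive fusion: t_a + t_b - c * min(t_a, t_b)."""
    return t_a + t_b - c * np.minimum(t_a, t_b)

def pixel_jnd(luma, edge_mask, t_elum, t_econ, t_str, c_nstr=0.2, c_e=0.2):
    """Pixel-domain JND: T_e(p) on edge pixels, T_lum(p) elsewhere."""
    t_nstr = fuse_nonlinear(t_elum, t_econ, c_nstr)   # non-structural threshold
    t_e = fuse_nonlinear(t_nstr, t_str, c_e)          # edge-pixel JND factor
    return np.where(edge_mask, t_e, t_lum(luma))

# usage with toy threshold maps on a 4x4 patch
luma = np.full((4, 4), 100.0)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
jnd = pixel_jnd(luma, mask,
                t_elum=np.full((4, 4), 4.0),
                t_econ=np.full((4, 4), 2.0),
                t_str=np.full((4, 4), 1.5))
print(jnd)
```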
and S3, obtaining an edge feature factor according to the edge feature, and guiding target bit allocation by using the edge feature factor and the JND factor.
In a specific embodiment, step S3 specifically includes:
at the frame level, target bits are allocated to the current coding frame according to its perceptual weight within the current GOP; the perceptual weight PW_curpic of the current coding frame in the current GOP is calculated as follows:
PW_curpic = EF_curRefpic / (EF_GOP - EF_coded);
wherein EF_GOP is the edge feature factor of the whole current GOP, EF_coded is the edge feature factor of the already-encoded frames, and EF_curRefpic is the edge feature factor of the reference frame, obtained by convolving the two-dimensional edge detection operator e_2D(x, y; c, w, σ, θ) with the reference frame and summing the responses; the frame-level target bit allocation formula is therefore:
Tar_curpic = (Tar_GOP - Act_codedpics) × PW_curpic × ε + 0.5;
wherein Tar_GOP is the GOP-level target bit budget, obtained from the target bits of the video sequence in the coding configuration, Act_codedpics is the actual number of bits consumed by the already-encoded frames of the GOP, and ε is a frame-level bit allocation adjustment factor related to the screen content video and obtained experimentally;
at the CTU level, the specific allocation formula is:
[the CTU-level allocation expression appears only as an equation image in the original publication]
wherein Tar_CTU(i) is the target bit budget of the i-th CTU of the current frame, B_curleft is the number of bits remaining for the current frame, Tar_codedCTU and Act_codedCTU are the target and actual bits of the already-encoded CTUs, SW is the sliding-window size, and W_JND(i) is the weight of the just-tolerable distortion threshold (JND) of the i-th CTU being encoded, calculated as follows:
W_JND(i) = JND_AvecurCTU / JND_Avecurpic;
wherein JND_AvecurCTU is the mean of the JND factors of the pixels in the i-th CTU being encoded and JND_Avecurpic is the mean of the JND factors of all pixels in the current frame; PW_curCTU is the perceptual weight of the current CTU, defined analogously to the frame level; F(CW_curCTU) denotes the weighting factors of the different CTU types:
[the piecewise definition of F(CW_curCTU) appears only as an equation image in the original publication]
CW_curCTU is the complexity weight of the current block, determined by the edge feature factor and calculated as follows:
CW_curCTU = EF_curCTU / EF_curpic;
wherein EF_curCTU is the edge feature factor of the current CTU and EF_curpic is the edge feature factor of the current frame.
Specifically, at the frame level, the perceptual weight of the current coding frame within the current GOP is computed from the edge features of the screen content video, and target bits are allocated to the current frame according to the ratio of its perceptual weight to that of the not-yet-encoded frames. At the CTU level, besides the perceptual weight of the current CTU, the just-tolerable distortion threshold (JND) weight of the CTU is also incorporated: a larger JND weight means the current CTU can tolerate more distortion and can be allocated fewer target bits.
Specifically, at the CTU level, CTUs are classified into complex, continuous and simple blocks according to their coding complexity weights. A CTU block of higher complexity carries more perceptual information and needs more bits, whereas a simple CTU is allocated fewer bits. For each class, besides the perceptual weight of the current CTU, the human-eye just-tolerable distortion threshold (JND) weight of the CTU is fused in. W_JND(i) denotes the JND weight of the i-th CTU being encoded; a larger W_JND(i) indicates that the current CTU tolerates a larger degree of distortion, so its allocated target bits are correspondingly smaller. F(CW_curCTU) denotes the weighting factors of the different CTU types: complex CTU blocks need more bits and get the largest weighting factor, continuous CTU blocks the next largest, and simple CTU blocks the smallest.
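The sketch below illustrates one plausible realization of this CTU-level weighting in Python. Because the exact allocation expression and the piecewise F(CW) definition are given only as equation images in the original, the class thresholds on CW (0.8 and 0.3) and the weighting factors (1.5 / 1.0 / 0.5) are assumed placeholder values, and the remaining frame budget is simply split in proportion to F(CW)·PW/W_JND so that a larger JND weight yields fewer bits, as the text requires.

```python
import numpy as np

def f_cw(cw, th_complex=0.8, th_simple=0.3):
    """Assumed piecewise weighting factor F(CW) for the three CTU classes."""
    if cw >= th_complex:
        return 1.5   # complex block: carries the most perceptual information
    if cw >= th_simple:
        return 1.0   # continuous block
    return 0.5       # simple block: fewest bits

def allocate_ctu_bits(bits_left, cw, pw, w_jnd):
    """Split the remaining frame budget over the unencoded CTUs in
    proportion to F(CW) * PW / W_JND (larger JND weight -> fewer bits)."""
    weights = np.array([f_cw(c) * p / w for c, p, w in zip(cw, pw, w_jnd)])
    return bits_left * weights / weights.sum()

# usage: three CTUs, the middle one highly distortion-tolerant
print(allocate_ctu_bits(12000,
                        cw=[0.9, 0.5, 0.1],
                        pw=[0.40, 0.35, 0.25],
                        w_jnd=[0.8, 1.6, 1.0]))
```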
S4, under the constraint of the pixel-domain JND model, computing a similarity measurement factor between the reference frame and the reconstructed frame from their edge feature factors, and constructing a perceptual rate-distortion model based on this similarity factor.
In a specific embodiment, after the GOP-level, frame-level and CTU-level bit allocation is completed, the process proceeds to step S4. Step S4 specifically includes:
on the basis of the original R-λ rate-distortion model, the CTU-level rate control parameter λ_scc is calculated as follows:
λ_scc = τ × bpp_JND^γ;
wherein τ and γ are model parameters related to the characteristics of the screen content video, and bpp_JND is the number of coded bits per pixel, namely:
bpp_JND = Tar_CTU(i) / (W_CTU × H_CTU);
wherein Tar_CTU(i) is the target bit budget of the current CTU, and W_CTU and H_CTU are the width and height of the current CTU; taking the human-eye just-noticeable distortion (JND) factor as a perceptual constraint and combining it with the similarity measurement factor of the reference frame and the reconstructed frame, a perceptual rate-distortion model is constructed; its parameter λ_JND and quantization parameter QP_JND are calculated as follows:
λ_JND = k × (T_JND × SSIM_JND × λ_scc);
QP_JND = 4.2005 × ln(λ_JND) + 13.7122 + 0.5;
wherein k is a model parameter related to the characteristics of the screen content video, obtained experimentally, and T_JND is the average visibility threshold factor of the current coding frame:
[the expression for T_JND appears only as an equation image in the original publication]
SSIM_JND is the similarity measurement factor between the reference frame and the reconstructed frame, calculated as follows:
SSIM_JND = (2·EF_r·EF_d + c_1) / (EF_r^2 + EF_d^2 + c_1);
wherein EF_r is the edge feature factor of the current reference frame, EF_d is the edge feature factor of the reconstructed frame, and c_1 is a constant.
Specifically, on the basis of the R-λ model, the JND model is used as a constraint and the similarity measurement factor between the video reference frame and the reconstructed frame is added as actual coding feedback to construct the perceptual rate-distortion model. To improve the perceptual coding performance for screen content video, the embodiment of the present application adds the human-eye just-noticeable distortion (JND) factor as a perceptual constraint and, combined with the similarity measurement factor of the reference and reconstructed frames, constructs a perceptual rate-distortion model based on just-noticeable distortion.
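As an illustration, the following Python sketch evaluates λ_JND and QP_JND from the quantities defined above. The starting values τ = 3.2 and γ = -1.367 are borrowed from HEVC-style R-λ rate control as plausible defaults, and k, T_JND, the edge feature factors and c_1 are toy inputs, not values from the patent.

```python
import math

def lambda_qp_jnd(tar_bits, ctu_w, ctu_h, t_jnd, ef_ref, ef_rec,
                  k=1.0, tau=3.2, gamma=-1.367, c1=1e-4):
    """Perceptual R-D model: lambda_scc from the R-lambda curve, scaled by the
    JND visibility factor and the SSIM-style edge similarity factor."""
    bpp = tar_bits / (ctu_w * ctu_h)                 # bpp_JND
    lam_scc = tau * bpp ** gamma                     # base R-lambda parameter
    ssim_jnd = (2 * ef_ref * ef_rec + c1) / (ef_ref**2 + ef_rec**2 + c1)
    lam_jnd = k * t_jnd * ssim_jnd * lam_scc
    qp_jnd = int(4.2005 * math.log(lam_jnd) + 13.7122 + 0.5)
    return lam_jnd, qp_jnd

# usage: a 64x64 CTU with a 2000-bit budget
print(lambda_qp_jnd(2000, 64, 64, t_jnd=1.1, ef_ref=0.52, ef_rec=0.49))
```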
S5, estimating coding parameters with the perceptual rate-distortion model, and updating the model with the actual coding parameters.
In a specific embodiment, step S5 specifically includes:
estimating the optimal coding parameters with the perceptual rate-distortion model and encoding with them;
after the actual encoding is finished, updating the perceptual rate-distortion model with the actual coding parameters.
The specific update algorithm is consistent with that of the reference platform. The updated perceptual rate-distortion model is the perceptual rate control model. The embodiment of the present application improves the rate-distortion performance of video coding as well as the rate control accuracy and the coding quality.
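The patent states only that the update follows the reference platform; the sketch below therefore assumes the HM-style least-mean-square refinement of the R-λ parameters, with learning rates δ_τ = 0.1 and δ_γ = 0.05 and a positivity clamp on τ chosen as placeholders.

```python
import math

def update_rd_model(tau, gamma, bpp_actual, lambda_used,
                    d_tau=0.1, d_gamma=0.05):
    """LMS-style update of the R-lambda parameters after encoding one unit,
    using the actually consumed bits per pixel and the lambda that was used."""
    lambda_est = tau * bpp_actual ** gamma           # model's prediction
    err = math.log(lambda_used) - math.log(lambda_est)
    tau_new = max(1e-3, tau + d_tau * err * tau)     # keep tau positive
    gamma_new = gamma + d_gamma * err * math.log(bpp_actual)
    return tau_new, gamma_new

# usage: the encoder actually spent 0.40 bpp with lambda = 10.0
print(update_rd_model(tau=3.2, gamma=-1.367, bpp_actual=0.40, lambda_used=10.0))
```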
With further reference to fig. 6, as an implementation of the method shown in the figures above, the present application provides an embodiment of a perceptual rate control device for screen content video coding based on the just-noticeable distortion of the human eye; the device embodiment corresponds to the method embodiment shown in fig. 2, and the device can be applied to various electronic devices.
The embodiment of the application provides a perceptual rate control device for screen content video coding based on the just-noticeable distortion of the human eye, comprising:
an edge modeling module 1 configured to acquire a screen content video, perform edge modeling on it to obtain a two-dimensional edge model, extract the edge features of the model, and compute its edge model parameters;
a JND factor module 2 configured to obtain a luminance adaptation threshold, a contrast masking threshold, an edge non-structural distortion sensitivity threshold and a structural distortion sensitivity threshold based on the edge model parameters, construct a pixel-domain JND model of the screen content video from these thresholds, and determine a JND factor;
a target bit allocation module 3 configured to obtain an edge feature factor from the edge features and use the edge feature factor together with the JND factor to guide target bit allocation;
a model construction module 4 configured to compute, under the constraint of the pixel-domain JND model, a similarity measurement factor between the reference frame and the reconstructed frame from their edge feature factors, and to construct a perceptual rate-distortion model based on it;
an updating module 5 configured to estimate coding parameters with the perceptual rate-distortion model and update the model with the actual coding parameters.
Reference is now made to fig. 7, which is a schematic diagram illustrating a computer device 700 suitable for use in implementing an electronic device (e.g., the server or the terminal device shown in fig. 1) according to an embodiment of the present application. The electronic device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the use range of the embodiment of the present application.
As shown in fig. 7, the computer apparatus 700 includes a Central Processing Unit (CPU) 701 and a Graphics Processing Unit (GPU) 702, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 703 or a program loaded from a storage section 709 into a Random Access Memory (RAM) 704. In the RAM704, various programs and data necessary for the operation of the apparatus 700 are also stored. The CPU 701, GPU702, ROM 703, and RAM704 are connected to each other via a bus 705. An input/output (I/O) interface 706 is also connected to bus 705.
The following components are connected to the I/O interface 706: an input section 707 including a keyboard, a mouse, and the like; an output section 708 including a display such as a liquid crystal display (LCD) and a speaker; a storage section 709 including a hard disk and the like; and a communication section 710 including a network interface card such as a LAN card or a modem. The communication section 710 performs communication processing via a network such as the Internet. A drive 711 may also be connected to the I/O interface 706 as needed. A removable medium 712, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 711 as necessary, so that a computer program read out from it is installed into the storage section 709 as needed.
In particular, the processes described above with reference to the flow diagrams may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via the communication section 710, and/or installed from the removable media 712. The computer program performs the above-described functions defined in the method of the present application when executed by a Central Processing Unit (CPU) 701 and a Graphics Processing Unit (GPU) 702.
It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable medium or any combination of the two. The computer readable medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor device, apparatus, or a combination of any of the foregoing. More specific examples of the computer readable medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution apparatus, device, or apparatus. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution apparatus, device, or apparatus. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the remote-computer case, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based devices that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present application may be implemented by software or hardware. The modules described may also be provided in a processor.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a screen content video, perform edge modeling on it to obtain a two-dimensional edge model, extract the edge features of the model and compute its edge model parameters; obtain a luminance adaptation threshold, a contrast masking threshold, an edge non-structural distortion sensitivity threshold and a structural distortion sensitivity threshold based on the edge model parameters, construct a pixel-domain JND model of the screen content video from these thresholds and determine a JND factor; obtain an edge feature factor from the edge features and use it together with the JND factor to guide target bit allocation; under the constraint of the pixel-domain JND model, compute a similarity measurement factor between the reference frame and the reconstructed frame from their edge feature factors and construct a perceptual rate-distortion model based on it; and estimate coding parameters with the perceptual rate-distortion model and update the model with the actual coding parameters.
The foregoing description is only exemplary of the preferred embodiments of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (10)

1. A perceptual rate control method for screen content video coding based on just-noticeable distortion of the human eye, characterized by comprising the following steps:
S1, acquiring a screen content video, performing edge modeling on it to obtain a two-dimensional edge model, extracting the edge features of the model, and computing its edge model parameters;
S2, obtaining a luminance adaptation threshold, a contrast masking threshold, an edge non-structural distortion sensitivity threshold and a structural distortion sensitivity threshold based on the edge model parameters, constructing a pixel-domain JND model of the screen content video from these four thresholds, and determining a JND factor;
S3, obtaining an edge feature factor from the edge features, and using the edge feature factor together with the JND factor to guide target bit allocation;
S4, under the constraint of the pixel-domain JND model, computing a similarity measurement factor between the reference frame and the reconstructed frame from their edge feature factors, and constructing a perceptual rate-distortion model based on this similarity factor;
S5, estimating coding parameters with the perceptual rate-distortion model, and updating the model with the actual coding parameters.
2. The method for controlling perceptual bit rate of video coding of screen content based on just noticeable distortion to human eyes according to claim 1, wherein the step S1 specifically comprises:
expanding the one-dimensional edge model into a two-dimensional edge model, wherein the expanded two-dimensional edge model is calculated as follows:
s_2D(x, y; b, c, w, θ) = b + (c/2) × [1 + erf( (x·cosθ + y·sinθ) / (√2 · w) )];
wherein the edge model parameters b, c and w respectively represent the brightness, the contrast and the structure of the edge profile, x represents the horizontal coordinate of the two-dimensional edge model, y represents the vertical coordinate, θ represents the direction angle of the edge, and erf(·) is the error function;
performing two-dimensional Gaussian partial derivative filtering on the two-dimensional edge model of the screen content video to obtain a two-dimensional edge detection operator e_2D(x, y; c, w, σ, θ), calculated as:
e_2D(x, y; c, w, σ, θ) = |e_x(x, y; c, w, σ, θ)| + |e_y(x, y; c, w, σ, θ)|;
wherein σ represents the smoothing parameter of the Gaussian function;
the two-dimensional edge detection operator e_2D(x, y; c, w, σ, θ) is convolved with the screen content video to extract the edge features of the screen content video;
the edge model parameters are calculated as follows:
w = √( a² / ln( e_1² / (e_2 · e_3) ) − σ² );

c = e_1 · √( 2π(w² + σ²) ) · exp( μ² / (2(w² + σ²)) ), where μ = (w² + σ²) · ln(e_2 / e_3) / (2a) is the sub-pixel offset of the edge;

b = I(p) − c/2;

wherein e_1, e_2, e_3 are respectively the responses of the one-dimensional edge at the three positions x = (0, a, −a) after Gaussian partial-derivative filtering, and I(p) is the average luminance measured along the edge contour (I(p) = b + c/2).
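By way of illustration only (not part of the claimed method), the following is a minimal NumPy/SciPy sketch of the edge modeling step of claim 2. All function and parameter names are hypothetical, and the closed-form parameter estimates are reconstructions obtained by fitting the Gaussian-derivative response of the erf edge at the three sample positions, since the published equations are available only as images.

```python
import numpy as np
from scipy.special import erf
from scipy.ndimage import gaussian_filter

def edge_model_2d(x, y, b, c, w, theta):
    # Erf-profile edge: b = brightness, c = contrast, w = structure (width),
    # theta = direction angle of the edge.
    t = x * np.cos(theta) + y * np.sin(theta)  # coordinate across the edge
    return b + 0.5 * c * (1.0 + erf(t / (np.sqrt(2.0) * w)))

def edge_feature_map(img, sigma):
    # e_2D = |e_x| + |e_y|: sum of absolute Gaussian partial-derivative
    # responses, used both for edge detection and as the edge feature.
    f = img.astype(np.float64)
    ex = gaussian_filter(f, sigma, order=(0, 1))  # d/dx
    ey = gaussian_filter(f, sigma, order=(1, 0))  # d/dy
    return np.abs(ex) + np.abs(ey)

def edge_params(e1, e2, e3, a, sigma, mean_lum):
    # Three-point fit of the Gaussian-shaped derivative response sampled
    # at x = 0, a, -a; s2 = w^2 + sigma^2 is the blurred-edge variance.
    s2 = a * a / np.log(e1 * e1 / (e2 * e3))
    w = np.sqrt(max(s2 - sigma * sigma, 1e-12))
    mu = s2 * np.log(e2 / e3) / (2.0 * a)           # sub-pixel edge offset
    c = e1 * np.sqrt(2.0 * np.pi * s2) * np.exp(mu * mu / (2.0 * s2))
    b = mean_lum - 0.5 * c                          # since I(p) = b + c/2
    return b, c, w
```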
3. The method as claimed in claim 2, wherein after edge detection the screen content video is divided into an edge pixel set S_E and a non-edge pixel set, and in the step S2 the brightness self-adaptive threshold, the contrast masking effect threshold, the edge non-structural distortion sensitivity threshold and the structural distortion sensitivity threshold are acquired based on the edge pixel set S_E.
4. The method for controlling perceptual bit rate of video coding of screen content based on just noticeable distortion to human eyes as claimed in claim 3, wherein said step S2 specifically comprises:
the luminance adaptive threshold of an edge pixel p in the edge pixel set S_E is calculated as follows:

T_elum(p) = s_2D(p; T_lum(p) + b, c, w) − s_2D(p; b, c, w);

T_lum(p) = α_1 · (1 − √(I(p)/127)) + β if I(p) ≤ 127, and T_lum(p) = α_2 · (I(p) − 127) + β otherwise;

wherein T_lum(p) denotes the luminance masking effect; when the edge pixel p belongs to the edge pixel set S_E, the average luminance along the edge contour is I(p) = b + c/2; the parameters α_1, α_2 and β are constants;
the contrast masking effect threshold for the edge pixel p is calculated as follows:
[The expressions for the contrast masking effect threshold T_econ(p) appear only as equation images (FDA0003858998450000026, FDA0003858998450000031, FDA0003858998450000032) in the published text; they define T_econ(p) in terms of the edge contrast c(p), the edge model parameters and the constant f_th.]

wherein c(p) represents the contrast of the edge pixel p, and f_th is a constant;
the luminance adaptive threshold T_elum(p) and the contrast masking effect threshold T_econ(p) are fused by a nonlinear additivity model to obtain the edge non-structural distortion sensitivity threshold T_nstr(p), the calculation formula being as follows:

T_nstr(p) = T_elum(p) + T_econ(p) − C_nstr · min{ T_elum(p), T_econ(p) };

wherein 0 < C_nstr < 1;
The structural distortion sensitivity threshold of the edge pixel p is calculated as follows:
T_str(p) = |s_2D(p; b, c, w + Δw) − s_2D(p; b, c, w)|;
fusing the edge non-structural distortion sensitivity threshold and the structural distortion sensitivity threshold to obtain the edge-pixel just noticeable distortion threshold T_e(p) suitable for screen content video, i.e. the JND factor of the edge pixel p, specifically calculated as follows:

T_e(p) = T_nstr(p) + T_str(p) − C_e · min{ T_str(p), T_nstr(p) };

wherein C_e is a constant; for the non-edge pixel set, only the visibility threshold of the luminance masking effect is considered, and the pixel domain JND model of the final screen content video obtained after integration is:

T_JND(p) = T_e(p) if p ∈ S_E, and T_JND(p) = T_lum(p) otherwise.
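As an illustrative sketch only: the threshold fusion in claim 4 follows the nonlinear additivity model for masking (NAMM). The snippet below assumes the classical Chou–Li form for the luminance masking term and illustrative constants (α_1 = 17, α_2 = 3/128, β = 3, C_nstr = C_e = 0.3); the patent's actual constants are not published in the text.

```python
import numpy as np

def t_lum(I, a1=17.0, a2=3.0 / 128.0, beta=3.0):
    # Luminance masking threshold (assumed Chou-Li form).
    I = np.asarray(I, dtype=np.float64)
    return np.where(I <= 127.0,
                    a1 * (1.0 - np.sqrt(I / 127.0)) + beta,
                    a2 * (I - 127.0) + beta)

def namm(t_a, t_b, C):
    # Nonlinear additivity model for masking: the overlap between two
    # masking effects is partially discounted (0 < C < 1).
    return t_a + t_b - C * np.minimum(t_a, t_b)

def pixel_jnd(I, edge_mask, T_elum, T_econ, T_str, C_nstr=0.3, C_e=0.3):
    # Edge pixels: fuse luminance/contrast masking into T_nstr, then fuse
    # with the structural sensitivity T_str to get T_e.  Non-edge pixels
    # fall back to the luminance masking threshold only.
    T_nstr = namm(T_elum, T_econ, C_nstr)
    T_e = namm(T_nstr, T_str, C_e)
    return np.where(edge_mask, T_e, t_lum(I))
```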
5. the method for controlling perceptual bit rate of video coding of screen content based on just noticeable distortion to the human eye as claimed in claim 4, wherein the step S3 specifically comprises:
at the frame level, target bits are allocated to the current coding frame according to the perceptual weight of the current coding frame in the current GOP (group of pictures), wherein the perceptual weight PW_cur of the current coding frame in the current GOP is calculated as follows:
PW_cur = EF_curRefp / (EF_GOP − EF_coded);
wherein EF_GOP represents the edge feature factor of the whole current GOP, EF_coded represents the edge feature factor of the encoded frames, and EF_curRefp represents the edge feature factor of the reference frame, obtained by convolving the two-dimensional edge detection operator e_2D(x, y; c, w, σ, θ) with the reference frame and summing the responses; accordingly, the frame-level target bit allocation formula is as follows:
Tar_curpic = (Tar_GOP − Act_codedpics) × PW_curpic × ε + 0.5;
wherein Tar_GOP represents the GOP-level target bits, obtained from the target bits of the video sequence in the coding configuration, Act_codedpics represents the actual bits consumed by the encoded frames in the GOP, and ε is a frame-level bit allocation adjustment factor, related to the screen content video and obtained by experiment;
at the CTU level, the specific allocation formula is:
Tar_CTU(i) = ( B_curleft − (Act_codedCTU − Tar_codedCTU) / SW ) × W_JND(i) / Σ_j W_JND(j), the sum Σ_j being taken over the CTUs of the current frame not yet coded;
wherein Tar_CTU(i) represents the target bits of the i-th CTU currently being coded, B_curleft represents the remaining bits of the current coded frame, Tar_codedCTU and Act_codedCTU represent the target and the actual bits of the coded CTUs, SW represents the size of the sliding window, and W_JND(i) represents the weight of the human-eye just noticeable distortion threshold JND for the i-th CTU currently being coded, calculated as follows:
W_JND(i) = ( JND_Avecurpic / JND_AvecurCTU ) × PW_curCTU × F(CW_curCTU);
wherein JND_AvecurCTU is the mean of the JND factors of all pixels in the i-th CTU currently being coded, JND_Avecurpic is the mean of the JND factors of all pixels in the currently encoded frame, PW_curCTU is the perceptual weight of the current CTU, defined similarly to the frame level, and F(CW_curCTU) represents the weighting factor of the different types of CTUs, defined as follows:
[The weighting factor F(CW_curCTU) is defined piecewise over ranges of the complexity weight CW_curCTU; its values appear only as an equation image (FDA0003858998450000043) in the published text.]
wherein CW_curCTU, the complexity weight of the current block, is determined by the edge feature factor and is calculated as follows:
CW_curCTU = EF_curCTU / EF_curpic;
wherein EF_curCTU represents the edge feature factor of the current CTU and EF_curpic represents the edge feature factor of the current frame.
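A minimal sketch of the two-level allocation of claim 5, for illustration only. Because the weight and CTU allocation formulas are published only as equation images, the code encodes the structure implied by the surrounding definitions (a weighted share of the remaining bits with sliding-window feedback, and a weight that grows as the CTU's mean JND falls); the exact published forms may differ, and all names are hypothetical.

```python
def frame_target_bits(tar_gop, act_coded_pics, pw_curpic, eps):
    # Frame level: share of the remaining GOP bits, scaled by the frame's
    # perceptual weight and the adjustment factor eps; +0.5 rounds.
    return int((tar_gop - act_coded_pics) * pw_curpic * eps + 0.5)

def w_jnd(jnd_ave_ctu, jnd_ave_pic, pw_ctu, f_cw):
    # Assumed form: bits scale inversely with the CTU's mean JND (a low
    # JND marks a distortion-sensitive CTU) and with its complexity weight.
    return (jnd_ave_pic / max(jnd_ave_ctu, 1e-6)) * pw_ctu * f_cw

def ctu_target_bits(i, b_curleft, weights, tar_coded, act_coded, sw):
    # CTU level: weighted share of the frame's remaining bits, corrected
    # by the over/under-spend of already-coded CTUs over a sliding window.
    feedback = (act_coded - tar_coded) / sw
    return max((b_curleft - feedback) * weights[i] / sum(weights[i:]), 1.0)
```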
6. The method for controlling perceptual bit rate of video coding of screen content based on just noticeable distortion to human eyes according to claim 5, wherein the step S4 specifically comprises:
on the basis of the original R-λ rate distortion model, the CTU-level rate control parameter λ_scc is obtained as follows:

λ_scc = τ × bpp_JND^γ;
wherein τ and γ are model parameters related to the characteristics of the screen content video, and bpp_JND represents the coded bits per pixel, namely:
bpp_JND = Tar_CTU(i) / (W_CTU × H_CTU);
wherein Tar_CTU(i) is the target bits of the current CTU, and W_CTU and H_CTU are respectively the width and the height of the current CTU; taking the human-eye just noticeable distortion JND factor as a perceptual constraint and combining the similarity measurement factor of the reference frame and the reconstructed frame, the perceptual rate distortion model is constructed, and its parameter λ_JND and quantization parameter QP_JND are calculated as follows:
λ_JND = k × (T_JND × SSIM_JND × λ_scc);

QP_JND = 4.2005 × ln(λ_JND) + 13.7122 + 0.5;
wherein k is a model parameter related to the characteristics of the screen content video, obtained by experiment, and T_JND represents the average visibility threshold factor of the current encoded frame, calculated as follows:
[T_JND appears only as an equation image (FDA0003858998450000052) in the published text; per the surrounding description it averages the pixel-domain JND visibility thresholds over the current encoded frame.]
SSIM_JND is the similarity measurement factor of the reference frame and the reconstructed frame, calculated as follows:
SSIM_JND = (2 × EF_r × EF_d + c_1) / (EF_r² + EF_d² + c_1);
wherein EF_r represents the edge feature factor of the current reference frame, EF_d represents the edge feature factor of the reconstructed frame, and c_1 is a constant.
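For illustration, a sketch of the perceptual R-D parameter computation of claim 6. The λ-to-QP mapping is the standard HEVC one (QP = 4.2005·ln λ + 13.7122, rounded); the SSIM-style similarity form is a reconstruction from the listed terms EF_r, EF_d and c_1, and the value of c_1 is an assumption.

```python
import math

def lambda_scc(tau, gamma, bpp_jnd):
    # CTU-level R-lambda model: lambda = tau * bpp^gamma.
    return tau * bpp_jnd ** gamma

def ssim_jnd(ef_r, ef_d, c1=1e-4):
    # Assumed SSIM-style similarity of reference vs. reconstructed
    # edge feature factors (c1 avoids division by zero).
    return (2.0 * ef_r * ef_d + c1) / (ef_r * ef_r + ef_d * ef_d + c1)

def perceptual_lambda_qp(k, t_jnd, sim, lam_scc):
    # Scale the base lambda by the perceptual factors, then map to QP.
    lam = k * (t_jnd * sim * lam_scc)
    qp = int(4.2005 * math.log(lam) + 13.7122 + 0.5)  # standard HEVC map
    return lam, qp

# Example: lam = 50 gives QP = int(4.2005 * ln(50) + 14.2122) = 30.
```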
7. The method for controlling perceptual bit rate of video coding of screen content based on just noticeable distortion to human eyes according to claim 1, wherein the step S5 specifically comprises:
estimating the optimal coding parameters by using the perceptual rate distortion model and performing the coding;

and after the actual coding is finished, updating the perceptual rate distortion model by using the actual coding parameters.
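A sketch of the step-S5 update, assuming an HM-style least-mean-squares refresh of the τ, γ parameters after each coded unit; the learning rates 0.1 and 0.05 are the usual HM defaults, not values from the patent.

```python
import math

def update_rd_model(tau, gamma, bpp_actual, lambda_used,
                    lr_tau=0.1, lr_gamma=0.05):
    # Compare the lambda the model would have predicted for the actually
    # consumed bpp with the lambda actually used, then nudge tau/gamma.
    lambda_est = tau * bpp_actual ** gamma
    err = math.log(lambda_used) - math.log(lambda_est)
    tau_new = tau * math.exp(lr_tau * err)            # keeps tau positive
    gamma_new = gamma + lr_gamma * err * math.log(bpp_actual)
    return tau_new, gamma_new
```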
8. A screen content video coding perception code rate control device based on just noticeable distortion by human eyes is characterized by comprising:
the edge modeling module is configured to acquire a screen content video, perform edge modeling on the screen content video to obtain a two-dimensional edge model, extract edge characteristics of the two-dimensional edge model, and calculate edge model parameters of the two-dimensional edge model;
a JND factor module configured to obtain a brightness adaptive threshold, a contrast masking effect threshold, an edge unstructured distortion sensitivity threshold and a structural distortion sensitivity threshold based on the edge model parameters, construct a pixel domain JND model of the screen content video according to the brightness adaptive threshold, the contrast masking effect threshold, the edge unstructured distortion sensitivity threshold and the structural distortion sensitivity threshold, and determine a JND factor;
the target bit distribution module is configured to obtain an edge feature factor according to the edge feature and guide target bit distribution by using the edge feature factor and the JND factor;
the model construction module is configured to calculate and obtain similarity measurement factors of the reference frame and the reconstructed frame according to the edge feature factors of the reference frame and the reconstructed frame under the constraint of a pixel domain JND model, and construct a perception rate distortion model based on the similarity measurement factors of the reference frame and the reconstructed frame;
and the updating module is configured to estimate the coding parameters through a perceptual rate distortion model, and update the perceptual rate distortion model according to the actual coding parameters.
9. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-7.
CN202211156529.3A 2022-09-22 2022-09-22 Screen content video coding perception code rate control method and device based on just noticeable distortion by human eyes Withdrawn CN115567712A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211156529.3A CN115567712A (en) 2022-09-22 2022-09-22 Screen content video coding perception code rate control method and device based on just noticeable distortion by human eyes


Publications (1)

Publication Number Publication Date
CN115567712A true CN115567712A (en) 2023-01-03

Family

ID=84741390

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211156529.3A Withdrawn CN115567712A (en) 2022-09-22 2022-09-22 Screen content video coding perception code rate control method and device based on just noticeable distortion by human eyes

Country Status (1)

Country Link
CN (1) CN115567712A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115967806A (en) * 2023-03-13 2023-04-14 阿里巴巴(中国)有限公司 Data frame coding control method and system and electronic equipment
CN115967806B (en) * 2023-03-13 2023-07-04 阿里巴巴(中国)有限公司 Data frame coding control method, system and electronic equipment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20230103)