WO2020098751A1

WO2020098751A1 - Video data encoding processing method and computer storage medium

Info

Publication number: WO2020098751A1
Application number: PCT/CN2019/118526
Authority: WO
Inventors: 徐科; 宋剑军; 宋利; 王浩
Original assignee: 深圳市中兴微电子技术有限公司
Priority date: 2018-11-14
Filing date: 2019-11-14
Publication date: 2020-05-22
Also published as: CN111193931B; CN111193931A

Abstract

Disclosed are a video data encoding processing method and a computer storage medium. The method comprises: before encoding an object to be encoded, acquiring spatial domain perception information k_si and time domain perception information k_ti of each coding unit in the object, wherein i is an integer greater than or equal to 1; calculating joint spatio-temporal perception information k_pi according to the spatial domain perception information k_si and the time domain perception information k_ti of each coding unit; using the joint spatio-temporal perception information of each coding unit to calculate an adjustment coefficient η_i of a Lagrange multiplier corresponding to each coding unit; and, in the process of performing an encoding operation on the object, encoding each coding unit in the object according to the adjustment coefficient η_i and the Lagrange multiplier.

Description

Video data encoding processing method and computer storage medium

This application claims priority to Chinese patent application No. 201811353976.1, filed with the Chinese Patent Office on November 14, 2018 and entitled "A video data encoding processing method and computer storage medium", the entire contents of which are incorporated herein by reference.

Technical field

Embodiments of the present application relate to, but are not limited to, the field of signal processing, and provide a video data encoding processing method and a computer storage medium.

Background

The HEVC (High Efficiency Video Coding) standard mainly exploits the statistical correlation of video signals, eliminating spatial and temporal redundancy through coding techniques such as intra-frame and inter-frame prediction, but these techniques do not consider the subjective visual characteristics of the human eye. In addition, to give the reconstructed video higher quality at a given bit rate, many video encoding modules use rate-distortion optimization (RDO) to select the optimal encoding mode. Rate-distortion optimization requires a distortion function that characterizes the video signal well and is easy to calculate. Because knowledge of the human visual system (HVS) is limited, visual quality is difficult to quantify accurately. Therefore, mean square error (MSE) or the sum of squared errors (SSE) is often used as the distortion measure in rate-distortion optimization. MSE and SSE do not take any human visual characteristics into account, so the subjective visual quality of the encoded video is not ideal. At the same time, the human visual system, as the final recipient of video image information, has a large amount of perceptual redundancy. Therefore, as research progresses on video quality assessment (VQA) indicators that reflect subjective perception and on the visual characteristics of the human eye, these perception-based quality indicators and visual characteristics can be combined and applied to video coding optimization: a coding optimization scheme based on visual perception is designed to eliminate visual perceptual redundancy and improve the subjective visual quality of the decoded video.

In the related art, some methods have been proposed to improve coding performance by studying human visual characteristics. One approach is to propose objective quality assessment indicators that reflect perceptual distortion. For example, the commonly used structural similarity index (Structural Similarity, SSIM) considers the structural information of the image as well as luminance and contrast masking factors. Because of its good consistency with subjective judgments, it is widely used as a quality evaluation index for video coding. Accordingly, SSIM-based rate-distortion optimization methods have been proposed to improve the mode decision process in inter-frame coding, or to establish an SSIM-related distortion model for adjusting the distortion and the Lagrange multiplier of the rate-distortion equation. The other approach is to use visual distortion sensitivity models, such as the just noticeable difference (JND) model, to improve perceptual coding performance. Methods have been proposed that use JND for adaptive motion estimation to reduce perceptual redundancy in pixel-domain residuals, or that adaptively adjust the quantization of DCT frequency-domain transform coefficients based on JND.

With the above methods, the bit rate consumed by encoding remains relatively high, so how to effectively reduce the encoding bit rate is an urgent problem to be solved.

Summary of the invention

In order to solve the above technical problems, the present application provides a video data encoding processing method and a computer storage medium, which can effectively reduce the bit rate consumed by encoding.

In order to achieve the above object of the invention, the present application provides a video data encoding processing method, including:

Before encoding an object to be encoded, obtaining spatial domain perception information k_si and temporal domain perception information k_ti of each coding unit in the object to be encoded, where i is an integer greater than or equal to 1;

Calculating joint spatio-temporal perception information k_pi of each coding unit according to the spatial domain perception information k_si and the temporal domain perception information k_ti of that coding unit;

Calculating an adjustment coefficient η_i of the Lagrange multiplier corresponding to each coding unit by using the joint spatio-temporal perception information of that coding unit;

During the encoding operation on the object to be encoded, encoding each coding unit in the object to be encoded according to the adjustment coefficient η_i and the Lagrange multiplier.

In an exemplary embodiment, the spatial domain perception information k_si of each coding unit is determined according to the gradient amplitude k_gi and/or the variance value k_σi of that coding unit.

In an exemplary embodiment, calculating the gradient amplitude k_gi and/or the variance value k_σi of each coding unit requires the value of each pixel. For a YUV sequence, the pixel value used is any one of the luminance component Y, the chrominance component U, and the chrominance component V, or a weighted average of the three.

In an exemplary embodiment, the spatial domain perception information k_si of each coding unit is obtained by the following expression:

k_si = (1-τ) · k_gi + τ · k_σi;

where τ is a constant weighting coefficient with a value range of [0,1].

In an exemplary embodiment, the gradient amplitude k_gi of each coding unit is obtained as follows:

Calculating the gradient amplitude of each pixel in the i-th coding unit in the horizontal and vertical directions;

Calculating the average gradient amplitude of the i-th coding unit according to the horizontal and vertical gradient amplitudes of each pixel;

After the average gradient amplitudes of the coding units of the object to be encoded are obtained, calculating the normalized gradient amplitude k_gi of the i-th coding unit.

In an exemplary embodiment, the normalized gradient amplitude k_gi of the i-th coding unit is obtained by the following expression:

[equation omitted in source]

where G(i) represents the average gradient amplitude of the i-th coding unit, N_block represents the total number of coding units in the object to be encoded, and j is an integer greater than or equal to 1.

In an exemplary embodiment, the variance value k_σi of each coding unit is obtained as follows:

Obtaining the variance value between the pixel values of the i-th coding unit and the pixel values of the reference coding unit of the reference image;

After the variance values of the coding units of the object to be encoded are obtained, calculating the normalized variance value k_σi of the i-th coding unit.

In an exemplary embodiment, the normalized variance value k_σi of the i-th coding unit is obtained by the following expression:

[equation omitted in source]

where [symbol omitted in source] represents the variance of the i-th coding unit, N_block represents the total number of coding units in the object to be encoded, c_2 is a constant coefficient, and j is an integer greater than or equal to 1.

In an exemplary embodiment, the time domain perception information k_ti of each coding unit is calculated from the motion vectors and the motion compensation in the coding unit, where the motion compensation is the vector distance between the object to be encoded and a preset reference frame.

In an exemplary embodiment, calculating the time domain perception information k_ti of each coding unit requires the value of each pixel. For a YUV sequence, the pixel value used is any one of the luminance component Y, the chrominance component U, and the chrominance component V, or a weighted average of the three.

In an exemplary embodiment, the time domain perception information k_ti of each coding unit is obtained by the following expression:

[equation omitted in source]

where (v_x, v_y) represents the motion vector of a coding block in the coding unit; d(o, p) represents the distance between the frame containing the current coding unit and the frame containing its reference unit, where the reference units of different coding units in the same frame may lie in the same frame or in different frames; and o and p represent the coordinate information of the i-th coding unit and are real numbers.

In an exemplary embodiment, the joint spatio-temporal perception information k_pi of each coding unit is obtained by the following expression:

[equation omitted in source]

where c is a constant of the same order of magnitude as k_ti, and A_s is an adjustment parameter of the spatial domain perception information k_si.

In an exemplary embodiment, the adjustment parameter A_s of the spatial domain perception information k_si is obtained by calculating the mean square error (MSE) of the spatial domain perception information k_si; or by calculating the sum of absolute differences (SAD) of the spatial domain perception information k_si; or by calculating the sum of absolute transformed differences (SATD), using the Hadamard transform, of the spatial domain perception information k_si.

In an exemplary embodiment, the adjustment coefficient η_i corresponding to each coding unit is obtained by the following expression:

[equation omitted in source]

where [symbol omitted in source] is the linear transformation result of k_pi, N_block represents the total number of coding units in the object to be encoded, and j is an integer greater than or equal to 1.

In an exemplary embodiment, the value of the adjustment coefficient η_i corresponding to each coding unit is calculated as follows:

[equation omitted in source]

In an exemplary embodiment, the above linear transformation result is obtained by the following expression:

[equation omitted in source]

where a and b are constant parameters of the same order of magnitude as k_pi.

In an exemplary embodiment, encoding each coding unit in the object to be encoded according to the adjustment coefficient η_i and the Lagrange multiplier includes:

Obtaining the Lagrange multiplier of the i-th coding unit by the following expression:

[equation omitted in source]

where [symbol omitted in source] represents the Lagrange multiplier with the sum of squared errors (SSE) as the distortion measure;

Encoding the i-th coding unit using the Lagrange multiplier of the i-th coding unit.

In order to achieve the above object of the invention, the present application provides a computer storage medium for storing a computer program, where the above computer program is executed by a processor to implement any of the above methods.

Compared with the related art, in this application the spatial domain perception information k_si and the temporal domain perception information k_ti of each coding unit in the object to be encoded are obtained before the object is encoded; the joint spatio-temporal perception information k_pi of each coding unit is then calculated from k_si and k_ti; the joint spatio-temporal perception information of each coding unit is used to calculate the adjustment coefficient η_i of the Lagrange multiplier corresponding to that coding unit; and finally each coding unit in the object to be encoded is encoded according to the adjustment coefficient η_i and the Lagrange multiplier. The Lagrange multiplier is thus adjusted adaptively and dynamically during rate-distortion optimization, which effectively reduces the bit rate consumed by encoding while keeping the subjective quality essentially unchanged.

Other features and advantages of the present application will be explained in the subsequent description, and partly become obvious from the description, or be understood by implementing the present application. The purpose and other advantages of the present application can be realized and obtained by the structures particularly pointed out in the description, claims and drawings.

Brief description of the drawings

The drawings are used to provide a further understanding of the technical solutions of the present application, and form a part of the specification. They are used to explain the technical solutions of the present application together with the embodiments of the present application, and do not constitute a limitation on the technical solutions of the present application.

FIG. 1 is a flowchart of a video data encoding processing method provided by this application;

FIG. 2 is a flowchart of a rate-distortion coding optimization method based on the visual masking effect in the space-time domain provided by the present application.

Detailed description

To make the objectives, technical solutions, and advantages of the present application clearer, the embodiments of the present application will be described in detail below with reference to the drawings. It should be noted that the embodiments in the present application and the features in the embodiments can be arbitrarily combined with each other without conflict.

The steps shown in the flowcharts of the figures can be performed in a computer system, for example as a set of computer-executable instructions. Although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.

FIG. 1 is a flowchart of a video data encoding processing method provided by this application. The method shown in Figure 1 includes:

Step 101: Before encoding the object to be encoded, obtain the spatial domain perception information k_si and the time domain perception information k_ti of each coding unit in the object to be encoded, where i is an integer greater than or equal to 1;

In this step, the object to be encoded may be a video frame or an area within a video frame; the object to be encoded includes one or at least two coding units, and the spatial domain perception information k_si and the time domain perception information k_ti of each coding unit are calculated;

In an exemplary embodiment, the spatial domain perception information k_si of each coding unit is determined according to the gradient amplitude k_gi and/or the variance value k_σi of that coding unit;

Step 102: According to the spatial domain perception information k_si and the time domain perception information k_ti of each coding unit, calculate the joint spatio-temporal perception information k_pi of that coding unit;

In an exemplary embodiment, the joint spatio-temporal perception information k_pi of each coding unit is obtained by the following expression:

[equation omitted in source]

Step 103: Calculate the adjustment coefficient η_i of the Lagrange multiplier corresponding to each coding unit using the joint spatio-temporal perception information of that coding unit;

The adjustment coefficient η_i corresponding to each coding unit is obtained by the following expression:

[equation omitted in source]

where [symbol omitted in source] is the linear transformation result of k_pi, N_block represents the total number of coding units in the object to be encoded, and j is an integer greater than or equal to 1.

Step 104: During the encoding operation on the object to be encoded, encode each coding unit in the object to be encoded according to the adjustment coefficient η_i and the Lagrange multiplier.

In an exemplary embodiment, the Lagrange multiplier of the i-th coding unit is obtained by the following expression:

[equation omitted in source]

The i-th coding unit is then encoded using the Lagrange multiplier of the i-th coding unit.

In the method embodiment provided by the present application, the spatial domain perception information k_si and the time domain perception information k_ti of each coding unit in the object to be encoded are obtained before the object is encoded; the joint spatio-temporal perception information k_pi of each coding unit is then calculated from k_si and k_ti; the joint spatio-temporal perception information of each coding unit is used to calculate the adjustment coefficient η_i of the Lagrange multiplier corresponding to that coding unit; and finally each coding unit in the object to be encoded is encoded according to the adjustment coefficient η_i and the Lagrange multiplier. The Lagrange multiplier is thus adjusted adaptively and dynamically during rate-distortion optimization, which effectively reduces the bit rate consumed by encoding while keeping the subjective quality essentially unchanged.

The method embodiments provided in this application are further described below:

In the process of implementing the present application, the inventors found the following. For encoding methods that use objective quality assessment indicators, a large amount of temporally redundant information exists between video frames, while SSIM only considers spatial structural characteristics, so SSIM is less effective for video quality assessment than for image quality assessment. Encoding methods that use visual distortion sensitivity do not jointly consider the content and the visual perception characteristics of the temporal and spatial domains, and therefore also suffer from an excessively high encoding bit rate.

Based on this analysis, the present application proposes to calculate the Lagrange multiplier adjustment coefficient of each coding unit from joint spatio-temporal perception information, to adaptively adjust the Lagrange multiplier during the encoding process, and then to encode using the adjusted Lagrange multiplier.

In an exemplary embodiment, calculating the gradient amplitude k_gi and/or the variance value k_σi of each coding unit requires the value of each pixel. For a YUV sequence, the pixel value used is any one of the luminance component Y, the chrominance component U, and the chrominance component V, or a weighted average of the three.

In this exemplary embodiment, for a YUV sequence, the pixel value can be any one of the three YUV values, a weighted average of two of the three YUV values, or a weighted average of all three YUV values.
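The choice of pixel value described above can be sketched as a single weighted average. This is a minimal illustration, not the application's implementation: the function name `combine_yuv`, the default weights, and the assumption that the three samples are co-located (no chroma subsampling) are all hypothetical.

```python
from typing import Sequence


def combine_yuv(y: float, u: float, v: float,
                weights: Sequence[float] = (1.0, 0.0, 0.0)) -> float:
    """Weighted average of co-located Y, U and V samples.

    weights=(1, 0, 0) selects luma only; two non-zero weights average two
    components; three non-zero weights average all three.
    The default weights are an illustrative assumption.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    wy, wu, wv = (w / total for w in weights)
    return wy * y + wu * u + wv * v
```

With the default weights the function returns the luma sample unchanged, which corresponds to using Y alone.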

In an exemplary embodiment, the spatial domain perception information k_si of each coding unit is obtained by the following expression:

k_si = (1-τ) · k_gi + τ · k_σi;

where τ is a constant weighting coefficient with a value range of [0,1].

In this exemplary embodiment, the gradient amplitude k_gi and the variance value k_σi of the coding unit can be used together to determine the spatial domain perception information of the coding unit more accurately; when the two values are used jointly, the calculation of the spatial domain perception information can be completed by assigning different weights to the two values.

According to the horizontal and vertical gradient amplitudes of each pixel, the average gradient amplitude of the i-th coding unit is calculated;

After the average gradient amplitudes of the coding units of the object to be encoded are obtained, the normalized gradient amplitude k_gi of the i-th coding unit is calculated.

In an exemplary embodiment, the average gradient amplitude of the coding unit can be obtained by the following expression:

[equation omitted in source]

where G_h and G_v respectively represent the gradient of each pixel in the horizontal direction and the vertical direction, N_pixel represents the number of pixels of the current coding unit, and r and s are the coordinate positions of the pixels, where r and s are real numbers.

In an exemplary embodiment, the normalized gradient amplitude k_gi of the i-th coding unit is obtained by the following expression:

[equation omitted in source]

In an exemplary embodiment, the variance value k_σi of each coding unit is obtained by the following expression:

[equation omitted in source]

In an exemplary embodiment, the time domain perception information k_ti of each coding unit is calculated from the motion vectors in the coding unit, where each motion vector is obtained by a motion search that minimizes the variance value.

In an exemplary embodiment, calculating the time domain perception information k_ti of each coding unit requires the value of each pixel. For a YUV sequence, the pixel value used is any one of the luminance component Y, the chrominance component U, and the chrominance component V, or a weighted average of the three.

In the expression for k_ti, (v_x, v_y) represents the motion vector of a coding block in the coding unit; d(o, p) represents the distance between the frame containing the current coding unit and the frame containing its reference unit, where the reference units of different coding units in the same frame may lie in the same frame or in different frames; and o and p represent the coordinate information of the i-th coding unit and are real numbers.

In an exemplary embodiment, the adjustment parameter A_s of the spatial domain perception information k_si is obtained by calculating the mean square error (MSE) of the spatial domain perception information k_si; or by calculating the sum of absolute differences (SAD) of the spatial domain perception information k_si; or by calculating the sum of absolute transformed differences (SATD), using the Hadamard transform, of the spatial domain perception information k_si.

In an exemplary embodiment, the adjustment coefficient η_i corresponding to each coding unit is obtained by the following expression:

[equation omitted in source]

Here, the value of the adjustment coefficient η_i corresponding to each coding unit is calculated as follows:

[equation omitted in source]

In the above expression, the range of the adjustment coefficient η_i is limited. This prevents η_i from becoming too large or too small, which would produce extreme outliers in the Lagrange multiplier, and so keeps the calculated data within a normal range.

In an exemplary embodiment, the linear transformation result of k_pi is obtained by the following expression:

[equation omitted in source]

The joint spatio-temporal perception information k_pi also takes into account video content characteristics such as spatial texture complexity and temporal motion intensity. For areas with complex textures and intense motion, the spatial domain perception information k_si and the time domain perception information k_ti are relatively large, which makes the joint spatio-temporal perception information k_pi smaller. Applying a linear transformation to k_pi compensates for this effect, so that it can be better applied in rate-distortion optimization.

This application takes visual characteristics of the human eye, such as the temporal and spatial visual masking effects, as the starting point for optimizing perceptual video coding. Specifically, for the spatial masking effect, distortion in areas of complex texture is much less noticeable to the human eye than distortion in flat areas; that is, the human eye is not sensitive to distortion in areas of complex texture. These areas can therefore accommodate or hide more visual distortion than flat areas. Similarly, for the temporal masking effect, the details and distortions of objects in areas of intense motion are harder for the human eye to notice than those in static or slowly moving areas. As the motion speeds up, the perceived clarity of the object decreases further, so the human eye is not sensitive to distortion in areas of intense motion. Consequently, when the same distortion is introduced, areas with complex textures or intense motion yield higher subjective visual quality than flat or still areas. Based on these spatial and temporal masking effects, the spatial and temporal perception factors of each coding unit are first calculated, and the Lagrange multiplier used in rate-distortion optimization during encoding is then adaptively adjusted according to the synthesized joint spatio-temporal perception factor.

The following provides further explanation with the examples provided in this application:

FIG. 2 is a flowchart of a rate-distortion coding optimization method based on the visual masking effect in the space-time domain provided by the present application. The method shown in Figure 2 includes:

Step 201: Before encoding a video frame, calculate the gradient amplitudes of all coding units in the object to be encoded, and normalize the gradient amplitude of each coding unit according to the average gradient amplitude of all coding units of the current frame to obtain the normalized gradient amplitude k_g of each coding unit.

In the present exemplary embodiment, the gradient information in the horizontal direction and the vertical direction can be calculated using the Sobel gradient operator.

After the gradient amplitude of each coding unit is obtained, the normalized gradient amplitude k_gi of each coding unit is calculated based on the average gradient amplitude of the frame image, as shown in equation (2).

[equation (2) omitted in source]

where G(i) represents the gradient amplitude of the i-th coding unit calculated according to formula (1), and N_block represents the number of coding units in the object to be encoded.
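Step 201 can be sketched as follows. Formulas (1) and (2) are not reproduced in this text, so the sketch makes two labeled assumptions: the per-pixel amplitude is the Euclidean magnitude of the horizontal and vertical Sobel responses, and normalization divides G(i) by the mean of G over all coding units. The function names and the handling of border pixels (skipped) are likewise illustrative.

```python
import math
from typing import List

Block = List[List[float]]

SOBEL_H = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))   # horizontal gradient kernel
SOBEL_V = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))   # vertical gradient kernel


def average_gradient(block: Block) -> float:
    """G(i): mean Sobel gradient magnitude over the interior pixels of a unit."""
    rows, cols = len(block), len(block[0])
    total, count = 0.0, 0
    for r in range(1, rows - 1):
        for s in range(1, cols - 1):
            gh = gv = 0.0
            for dr in range(3):
                for ds in range(3):
                    p = block[r + dr - 1][s + ds - 1]
                    gh += SOBEL_H[dr][ds] * p
                    gv += SOBEL_V[dr][ds] * p
            total += math.hypot(gh, gv)   # assumed amplitude: sqrt(Gh^2 + Gv^2)
            count += 1
    return total / count if count else 0.0


def normalized_gradients(blocks: List[Block]) -> List[float]:
    """k_g per unit: G(i) divided by the mean of G over all units (assumed form)."""
    g = [average_gradient(b) for b in blocks]
    mean_g = sum(g) / len(g)
    if mean_g == 0:
        return [1.0] * len(g)   # completely flat frame: treat all units alike
    return [gi / mean_g for gi in g]
```

Under this normalization the values average to 1 over the frame, so a unit with k_g above 1 is more textured than the frame average.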

Step 202: Before encoding a frame, calculate the variances of all coding units in the frame, and normalize the variance of each coding unit according to the average of the variances of all coding units in the current frame.

The normalized variance value of each coding unit is shown in equation (3).

[equation (3) omitted in source]

where [symbol omitted in source] represents the variance of the i-th coding unit, N_block represents the number of coding units in the current frame, and c_2 is a constant coefficient of the SSIM model used to ensure numerical stability.
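Step 202 can be sketched in the same way. Equation (3) is not reproduced in this text, so the form used here, (variance + c_2) divided by the frame mean of (variance + c_2), is an assumption, as is the exact placement of c_2. The default value of c_2 is the usual SSIM constant (0.03 · 255)² for 8-bit samples; the function name is hypothetical.

```python
from statistics import pvariance
from typing import List, Sequence

C2_SSIM_8BIT = (0.03 * 255) ** 2   # SSIM stabilizing constant for 8-bit samples


def normalized_variances(units: Sequence[Sequence[float]],
                         c2: float = C2_SSIM_8BIT) -> List[float]:
    """k_sigma per unit, assuming (var_i + c2) / mean_j(var_j + c2).

    Each unit is given as a flat sequence of its pixel values.
    """
    terms = [pvariance(u) + c2 for u in units]
    mean_term = sum(terms) / len(terms)
    return [t / mean_term for t in terms]
```

The constant c_2 keeps the ratio well defined even when every unit in the frame is perfectly flat.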

Step 203: According to the results of steps 201 and 202, the gradient value and the variance value of each coding unit are weighted to obtain the spatial domain perception factor of each coding unit.

Combining the results of equations (2) and (3), the spatial domain perception factor k_si can be calculated by weighting k_gi and k_σi, as shown in equation (4), where τ is a constant weighting coefficient with a value range of [0,1].

k_si = (1-τ) · k_gi + τ · k_σi (4)
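Equation (4) translates directly into code. The sketch below follows the equation as stated; only the function name and the default τ = 0.5 are illustrative assumptions, since the text gives the range of τ but no specific value.

```python
def spatial_factor(k_g: float, k_sigma: float, tau: float = 0.5) -> float:
    """Equation (4): k_s = (1 - tau) * k_g + tau * k_sigma, with tau in [0, 1]."""
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    return (1.0 - tau) * k_g + tau * k_sigma
```

Setting τ = 0 uses the gradient term alone and τ = 1 uses the variance term alone, which covers the "and/or" wording of the embodiment.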

Step 204: Before encoding a video frame, use the previous frame as a reference frame to perform motion estimation, calculate the motion vectors and residuals of all coding units in the current frame, and calculate the motion vector intensity of all coding units in the current frame; the motion vector intensity of each coding unit is normalized and used as the time domain perception factor k_ti.

Step 205: First, motion vector estimation is performed on all 16x16 coding blocks of the current coding unit, and then the motion intensity of the current coding unit is synthesized according to formula (5).

[equation (5) omitted in source]

where (v_x, v_y) represents the motion vector of a coding block in the current coding unit, and d(i, j) represents the distance from the current frame to its reference frame, which can be the POC (picture order count) distance between the current frame and its reference frame.
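Steps 204 and 205 can be sketched as below. Formula (5) is not reproduced in this text, so the form used here is an assumption: the mean motion vector magnitude over the 16x16 blocks of a coding unit, divided by the POC distance to the reference frame, followed by normalization by the frame mean. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

MotionVector = Tuple[float, float]


def motion_intensity(mvs: Sequence[MotionVector], poc_distance: int) -> float:
    """Assumed form of (5): mean |(vx, vy)| over the blocks, per unit POC distance."""
    if poc_distance == 0:
        raise ValueError("current frame and reference frame must differ")
    if not mvs:
        return 0.0
    mean_mag = sum(math.hypot(vx, vy) for vx, vy in mvs) / len(mvs)
    return mean_mag / abs(poc_distance)


def temporal_factors(intensities: Sequence[float]) -> List[float]:
    """k_t per unit: motion intensity normalized by the frame mean (assumed)."""
    mean_i = sum(intensities) / len(intensities)
    if mean_i == 0:
        return [0.0] * len(intensities)   # static frame: no temporal masking
    return [m / mean_i for m in intensities]
```

Dividing by the POC distance makes motion vectors that point to more distant reference frames comparable with those that point to the previous frame.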

Step 206: Based on the quality prediction model MOSp, the spatial and temporal perception factors obtained in steps 203 and 205 are synthesized into a joint spatio-temporal perception factor.

MOSp is a common video quality prediction model, as shown in equation (6), where k is a preset coefficient.

MOSp = 1 - k · MSE (6)
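Equation (6) is simple enough to evaluate directly, and doing so shows why a smaller coefficient corresponds to stronger masking. The sketch below implements equation (6) as stated; the function name and the numeric values in the usage note are illustrative assumptions.

```python
def mosp(mse: float, k: float) -> float:
    """Equation (6): predicted quality score MOSp = 1 - k * MSE."""
    return 1.0 - k * mse
```

For the same MSE of 10, a coefficient of 0.01 predicts a score of about 0.9 while a coefficient of 0.005 predicts about 0.95, so a region with a smaller coefficient tolerates the same distortion with less predicted quality loss.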

Based on the mathematical model of MOSp in equation (6), after the spatial domain perception factor k_si and the temporal domain perception factor k_ti are obtained through steps 203 and 205, the joint spatio-temporal perception factor k_pi of each coding unit is given by equation (7).

[equation (7) omitted in source]

where c is a constant of the same order of magnitude as k_t.

Step 207: Calculate the Lagrangian multiplier adjustment coefficient of each coding unit, and perform adaptive dynamic adjustment on the Lagrange multiplier during the encoding process.

The joint spatio-temporal perception factor k_pi, improved on the basis of MOSp, takes into account video content characteristics such as spatial texture complexity and temporal motion intensity. For regions with complex textures and intense motion, the spatial domain perception factor k_si and the temporal domain perception factor k_ti are relatively large, resulting in a small joint spatio-temporal perception factor k_pi. To better apply it to rate-distortion optimization, a new distortion index D_p related to MSE is first defined, as shown in equation (8).

[equation (8) omitted in source]

where [symbol omitted in source] is the linear transformation result of k_p, as shown in equation (9), and a and b are constant parameters of the same order of magnitude as k_p. According to equation (8), under the same distortion conditions, an image area with complex texture and intense motion has a larger linearly transformed factor and can hide more visual distortion, which is consistent with the visual masking effect in the spatial and temporal domains.

Then, by substituting the newly defined distortion index D_p for the distortion D in the original rate-distortion equation, the following relationship is obtained:

[equation omitted in source]

which can be further simplified to:

[equation (11) omitted in source]

It can be seen from equation (11) that the change to the distortion D has been transferred to the Lagrange multiplier. In addition, under normal circumstances, the bit rate consumed by a coding unit and the resulting distortion follow the relationship model below:

[equation omitted in source]

where r(d) represents the bit rate consumed by the coding unit, d represents the SSE distortion of the coding unit, σ² represents the variance of the coding distortion of the coding unit, α is a constant coefficient, and N_pixel represents the number of pixels of the current coding unit. According to the above rate-distortion model, the Lagrange multiplier corresponding to the new distortion index D_p can be obtained, as shown in equation (13), where N_block represents the number of coding units and η_i is an adaptive adjustment coefficient.

[equation (13) omitted in source]
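The statement that the change to the distortion is transferred to the Lagrange multiplier can be illustrated with a short derivation. This is a sketch under stated assumptions, not the application's own equations: the symbol \tilde{k}_p for the linearly transformed factor, the symbol \lambda_{SSE} for the SSE-based multiplier, and the form D_p = D / \tilde{k}_p are chosen only to agree with the surrounding description (a larger transformed factor hides more distortion).

```latex
\begin{aligned}
J_p &= D_p + \lambda_{SSE}\, R,
  \qquad D_p = \frac{D}{\tilde{k}_p}
  \quad \text{(assumed form, with } \tilde{k}_p > 0 \text{)} \\
\tilde{k}_p\, J_p &= D + \tilde{k}_p\, \lambda_{SSE}\, R \\
\arg\min J_p &= \arg\min \left( D + \lambda'\, R \right),
  \qquad \lambda' = \tilde{k}_p\, \lambda_{SSE}
\end{aligned}
```

Because \tilde{k}_p is constant within a coding unit, scaling the cost by it does not change which coding mode minimizes the cost, so under this assumption the perceptual weighting acts purely as a scaling of the Lagrange multiplier.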

According to the above analysis, for areas with complex texture and intense motion, the calculated value is relatively large. According to the visual masking effect, these regions can hide more visual distortion. Rate-distortion optimization should therefore tend to allocate fewer bits to these regions, which is equivalent to choosing a larger Lagrange multiplier for them during encoding. Accordingly, the Lagrange multiplier of the i-th coding unit is adaptively adjusted according to equation (13) during actual coding. In addition, to prevent extreme abnormal values, the range of the adaptive coefficient η_i is limited, as shown in equation (14).
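The adaptive adjustment and range limiting described above can be sketched as follows. This is a minimal illustration, not the reference implementation: the multiplicative form (adjusted multiplier = η_i × SSE-based multiplier), the function names, and the bounds `eta_min` and `eta_max` are assumptions, because the exact forms of equations (13) and (14) are not reproduced here.

```python
def clamp_eta(eta, eta_min=0.5, eta_max=2.0):
    """Limit the adaptive coefficient to [eta_min, eta_max].

    The default bounds are hypothetical placeholders, not values
    taken from equation (14).
    """
    if eta_min > eta_max:
        raise ValueError("eta_min must not exceed eta_max")
    return max(eta_min, min(eta, eta_max))


def adjusted_lambda(lambda_sse, eta, eta_min=0.5, eta_max=2.0):
    """Scale the SSE-based Lagrange multiplier by the clamped coefficient.

    Assumes a multiplicative adjustment: a larger eta yields a larger
    multiplier, so rate-distortion optimization favors fewer bits.
    """
    return clamp_eta(eta, eta_min, eta_max) * lambda_sse
```

With these assumed bounds, a coefficient of 7.5 would be limited to 2.0 before it scales the multiplier, which keeps a single outlier coding unit from being starved of bits.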

The method provided in this application example comprehensively considers the content characteristics of spatial texture complexity and temporal motion intensity, and derives the spatio-temporal joint perception factor from the MOSp (perceptual Mean Opinion Score) subjective quality prediction model in order to adaptively adjust the Lagrange multiplier during rate-distortion optimization, thereby effectively reducing the bit rate consumed by encoding while keeping the subjective quality basically unchanged.

Compared with the related art, the coding bit rate can be effectively reduced while keeping the subjective quality of the video sequence basically unchanged. Optionally, with the subjective perceived quality basically unchanged, for standard test sequences with global motion (taking the HEVC CTC sequences as an example), the bit rate can be reduced by about 10% compared with the HEVC standard reference model HM: the average bit-rate reduction is 10.32%, and the average SSIM reduction is 0.00253.

The present application also provides a computer storage medium for storing a computer program, where the computer program is executed by a processor to implement any of the above methods.

Those of ordinary skill in the art may understand that all or some of the steps, systems, and functional modules/units in the method disclosed above may be implemented as software, firmware, hardware, and appropriate combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all components may be implemented as software executed by a processor, such as a digital signal processor or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include but are not limited to RAM, ROM, EEPROM, flash memory or other memory technologies, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cartridges, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
In addition, it is well known to those of ordinary skill in the art that communication media generally contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transmission mechanism, and may include any information delivery medium.

Industrial applicability

Claims

A video data encoding processing method, comprising:

before encoding an object to be encoded, obtaining the spatial domain perception information k_si and the time domain perception information k_ti of each coding unit in the object to be encoded, where i is an integer greater than or equal to 1;

calculating the time and spatial domain joint perception information k_pi of each coding unit according to the spatial domain perception information k_si and the time domain perception information k_ti of that coding unit;

calculating the adjustment coefficient η_i of the Lagrange multiplier corresponding to each coding unit by using the time and spatial domain joint perception information of each coding unit;

during the encoding operation on the object to be encoded, encoding each coding unit in the object to be encoded according to the adjustment coefficient η_i and the Lagrange multiplier.
The method according to claim 1, wherein the spatial domain perception information k_si of each coding unit is determined according to the gradient amplitude k_gi and/or the variance value k_σi of each coding unit.
The method according to claim 2, wherein the calculation of the gradient amplitude k_gi and/or the variance value k_σi of each coding unit uses the pixel values of the coding unit; for a YUV sequence, the pixel value is any one of the luminance component Y, the chrominance component U and the chrominance component V, or a weighted average of the three.
The method according to claim 2, wherein the spatial domain perception information k_si of each coding unit is obtained by the following calculation expression:

k_si = (1 − τ) · k_gi + τ · k_σi;

where τ is a constant weighting coefficient with a value range of [0, 1].
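The weighted combination in this expression can be sketched in a few lines. The function and parameter names, and the default value of τ, are illustrative assumptions; only the formula itself comes from the text.

```python
def spatial_perception(k_g, k_sigma, tau=0.5):
    """Combine the normalized gradient amplitude and variance value.

    Implements k_si = (1 - tau) * k_gi + tau * k_sigma_i with tau in [0, 1].
    The default tau of 0.5 is an arbitrary example value.
    """
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    return (1.0 - tau) * k_g + tau * k_sigma
```

Setting τ to 0 uses only the gradient amplitude and setting it to 1 uses only the variance value, which matches the "and/or" wording of the preceding claim.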
The method according to any one of claims 2 to 4, wherein the gradient amplitude k_gi of each coding unit is obtained as follows:

calculating the horizontal and vertical gradient amplitudes of each pixel in the i-th coding unit;

calculating the average gradient amplitude of the i-th coding unit according to the horizontal and vertical gradient amplitudes of each pixel;

after the average gradient amplitudes of the coding units of the object to be encoded are obtained, calculating the normalized gradient amplitude k_gi of the i-th coding unit.
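The three steps above can be sketched as follows. Several details are assumptions made for illustration: forward differences with edge replication as the gradient operator, the Euclidean norm as the per-pixel amplitude, and normalization by the mean amplitude over all coding units. The normalization expression of the method itself is not reproduced here and may differ.

```python
import math


def mean_gradient_amplitude(block):
    """Average gradient amplitude of one coding unit.

    `block` is a non-empty list of equally long rows of pixel values.
    Gradients are forward differences; the last row and column reuse
    their own value (assumed operator, not taken from the source).
    """
    height, width = len(block), len(block[0])
    total = 0.0
    for y in range(height):
        for x in range(width):
            gx = block[y][min(x + 1, width - 1)] - block[y][x]
            gy = block[min(y + 1, height - 1)][x] - block[y][x]
            total += math.hypot(gx, gy)
    return total / (height * width)


def normalized_gradient_amplitudes(blocks):
    """Normalize each unit's average amplitude by the frame-wide mean.

    Assumed normalization: k_gi = G(i) / mean_j G(j). A completely flat
    frame returns 1.0 for every unit to avoid division by zero.
    """
    amplitudes = [mean_gradient_amplitude(b) for b in blocks]
    mean_amplitude = sum(amplitudes) / len(amplitudes)
    if mean_amplitude == 0:
        return [1.0] * len(amplitudes)
    return [g / mean_amplitude for g in amplitudes]
```

With this assumed normalization, a value above 1 marks a coding unit that is more textured than the frame average, and a value below 1 marks a smoother one.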
The method according to claim 5, wherein the normalized gradient amplitude k_gi of the i-th coding unit is obtained by the following calculation expression:

where G(i) represents the average gradient amplitude of the i-th coding unit, N_block represents the total number of coding units in the object to be encoded, and j is an integer greater than or equal to 1.
The method according to claim 2 or 3, wherein the variance value k_σi of each coding unit is obtained as follows:

acquiring the variance value between the pixel values of the i-th coding unit and the pixel values of the reference coding unit of the reference image;

after the variance values of the coding units of the object to be encoded are obtained, calculating the normalized variance value k_σi of the i-th coding unit.
The method according to claim 7, wherein the normalized variance value k_σi of the i-th coding unit is obtained by the following calculation expression:

where the variance term in the expression represents the variance of the i-th coding unit, N_block represents the total number of coding units in the object to be encoded, c_2 is a constant coefficient, and j is an integer greater than or equal to 1.
The method according to claim 1 or 2, wherein the time domain perception information k_ti of each coding unit is calculated from a motion vector and motion compensation in the coding unit, wherein the motion compensation is the vector distance between the object to be encoded and a preset reference frame.
The method according to claim 9, wherein the calculation of the time domain perception information k_ti of each coding unit uses the pixel values of the coding unit; for a YUV sequence, the pixel value is any one of the luminance component Y, the chrominance component U and the chrominance component V, or a weighted average of the three.
The method according to claim 9, wherein the time domain perception information k_ti of each coding unit is obtained by the following calculation expression:

where (v_x, v_y) represents the motion vector of the coding block in the coding unit; d(o, p) represents the distance between the frame containing the current coding unit and the frame containing the reference unit of the current coding unit, the reference units of different coding units in the same frame being located in the same frame or in different frames; and o and p represent coordinate information of the i-th coding unit and are real numbers.
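One plausible reading of these definitions is a motion magnitude normalized by the distance to the reference frame. The sketch below implements that reading only; the formula, the function name and the parameter names are assumptions, since the calculation expression of the method is not reproduced here.

```python
import math


def temporal_perception(v_x, v_y, frame_distance):
    """Motion intensity per unit of frame distance (assumed form).

    Computes sqrt(v_x**2 + v_y**2) / frame_distance, where
    frame_distance plays the role of d(o, p) and must be positive.
    """
    if frame_distance <= 0:
        raise ValueError("frame_distance must be positive")
    return math.hypot(v_x, v_y) / frame_distance
```

Dividing by the frame distance makes motion vectors that point to reference frames at different temporal distances comparable, which matters because coding units in the same frame may use different reference frames.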
The method according to claim 1, wherein the spatio-temporal joint perception information k_p(i) of each coding unit is obtained by the following calculation expression:

where c is a constant of the same order of magnitude as k_ti, and A_s is an adjustment parameter of the spatial domain perception information k_si.
The method according to claim 12, wherein the adjustment parameter A_s of the spatial domain perception information k_si is obtained by calculating the mean square error (MSE) of the spatial domain perception information k_si; or by calculating the sum of absolute differences (SAD) of the spatial domain perception information k_si; or by calculating the Hadamard-transformed sum of absolute differences (SATD) of the spatial domain perception information k_si.
The method according to claim 1 or 11 or 12, wherein the adjustment coefficient η_i corresponding to each coding unit is obtained by the following calculation expression:

where the factor appearing in the expression is the linear transformation result of k_pi, N_block represents the total number of coding units in the object to be encoded, and j is an integer greater than or equal to 1.
The method according to claim 14, wherein the value of the adjustment coefficient η_i corresponding to each coding unit is calculated as follows:
The method according to claim 14, wherein the linear transformation result of k_pi is obtained by the following calculation expression:

where a and b are constant parameters of the same order of magnitude as k_pi.
The method according to claim 1, wherein encoding each coding unit in the object to be encoded according to the adjustment coefficient η_i and the Lagrange multiplier comprises:

obtaining the Lagrange multiplier of the i-th coding unit by the following calculation expression:

where the reference multiplier in the expression is the Lagrange multiplier with the sum of squared errors (SSE) as the distortion index;

encoding the i-th coding unit by using the Lagrange multiplier of the i-th coding unit.
A computer storage medium for storing a computer program, wherein the computer program is executed by a processor to implement the method according to any one of claims 1 to 17.