CN114222121A - Video encoding method, video encoding device, electronic device, and computer-readable storage medium - Google Patents

Video encoding method, video encoding device, electronic device, and computer-readable storage medium

Info

Publication number
CN114222121A
Authority
CN
China
Prior art keywords
coding unit
coding
quantization parameter
video
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111572857.7A
Other languages
Chinese (zh)
Other versions
CN114222121B (en)
Inventor
薛毅
黄跃
黄博
白瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202111572857.7A priority Critical patent/CN114222121B/en
Publication of CN114222121A publication Critical patent/CN114222121A/en
Application granted granted Critical
Publication of CN114222121B publication Critical patent/CN114222121B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124Quantisation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/189Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
    • H04N19/196Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding being specially adapted for the computation of encoding parameters, e.g. by averaging previously computed encoding parameters

Abstract

The present disclosure relates to a video encoding method, an apparatus, an electronic device, and a computer-readable storage medium, the video encoding method including: obtaining a quantization parameter of each image frame of a video to be coded, wherein each image frame comprises at least one coding unit; acquiring the edge density and the texture complexity of at least one coding unit; determining a first quantization parameter offset of the at least one coding unit according to the edge density of the at least one coding unit; for each coding unit, determining a second quantization parameter offset of the corresponding coding unit based on the texture complexity of the coding unit and the coding reference relationship between the coding unit and other coding units; adjusting the quantization parameter of the corresponding coding unit based on the first quantization parameter offset and the second quantization parameter offset; and coding the video to be coded according to the quantization parameter of each coding unit. The method can improve the coding efficiency.

Description

Video encoding method, video encoding device, electronic device, and computer-readable storage medium
Technical Field
The present disclosure relates to the field of video technologies, and in particular, to a video encoding method, an apparatus, an electronic device, and a computer-readable storage medium.
Background
Video transmission is inseparable from video encoding, in which an original video file in one format is converted into a video file in another format through compression technology. For a video, the content of different image frames, and of different areas within the same image frame, may differ greatly, and so may the requirement for retaining image details. The smaller the quantization parameter used in encoding, the more detail in the image can be preserved. Therefore, if a fixed quantization parameter is used for video encoding, content with a high detail-retention requirement may be distorted because the quantization parameter is too large, while content with a low detail-retention requirement wastes code rate because the quantization parameter is too small; it is difficult to balance the detail-retention requirement against the code-rate saving requirement, and the encoding efficiency is low.
To address this problem, the related art provides coding optimization schemes that adjust quantization parameters according to the different requirements for retaining image details. However, different schemes often determine those requirements from different angles, each scheme has its own bias, and the improvement in coding efficiency is limited.
Disclosure of Invention
The present disclosure provides a video encoding method, apparatus, electronic device and computer-readable storage medium to at least solve the problem of how to improve encoding efficiency in the related art.
According to a first aspect of the present disclosure, there is provided a video encoding method, comprising: obtaining a quantization parameter of each image frame of a video to be coded, wherein each image frame comprises at least one coding unit; acquiring the edge density and the texture complexity of the at least one coding unit; determining a first quantization parameter offset for the at least one coding unit based on the edge density of the at least one coding unit; for each coding unit, determining a second quantization parameter offset of the corresponding coding unit based on the texture complexity of the coding unit and the coding reference relation between the coding unit and other coding units; adjusting the quantization parameter of the respective coding unit based on the first quantization parameter offset and the second quantization parameter offset; and coding the video to be coded according to the quantization parameter of each coding unit.
Optionally, the determining, based on the texture complexity of the coding unit and the coding reference relationship between the coding unit and other coding units, a second quantization parameter offset of the corresponding coding unit includes: and if the texture complexity is determined to be less than or equal to a texture threshold, determining the second quantization parameter offset based on the coding reference relation.
Optionally, the determining, based on the texture complexity of the coding unit and the coding reference relationship between the coding unit and other coding units, a second quantization parameter offset of the corresponding coding unit further includes: if the texture complexity is determined to be greater than the texture threshold and the coding unit belongs to the face region, determining the second quantization parameter offset based on the coding reference relationship; and if the texture complexity is determined to be greater than the texture threshold and the coding unit does not belong to the face region, determining that the second quantization parameter offset is 0.
Optionally, the video encoding method further comprises: acquiring a face area of each image frame; determining whether each of the coding units belongs to a face region based on a positional relationship of the corresponding coding unit to the face region of the image frame.
Optionally, the image frame of the video to be encoded includes a plurality of reference frames and an intermediate frame located between two adjacent reference frames, a face region of the reference frame is obtained through image recognition, and a face region of the intermediate frame is obtained through at least one of a track translation algorithm and a skin color matching algorithm based on the face regions of the two corresponding reference frames.
Optionally, the coding reference relationship includes a referenced degree, the referenced degree being the degree to which the coding unit is referenced by other coding units in inter prediction, and the determining the second quantization parameter offset based on the coding reference relationship includes: determining the second quantization parameter offset based on the referenced degree, wherein the second quantization parameter offset is a negative value that is negatively correlated with the referenced degree.
Optionally, the texture complexity is obtained by: traversing each image frame, determining the gradient amplitude of each pixel of the coding unit and summing for each coding unit of the current image frame to obtain the absolute texture complexity of the corresponding coding unit; determining an average value based on the absolute texture complexity of the at least one coding unit of the current image frame to obtain a reference texture complexity of the current image frame; and determining the ratio of the absolute texture complexity to the reference texture complexity for each coding unit of the current image frame to obtain the texture complexity of the corresponding coding unit.
Optionally, the determining a first quantization parameter offset of the at least one coding unit according to the edge density of the at least one coding unit includes: for each image frame, determining an average value based on the edge density of the at least one coding unit to obtain an average edge density of the corresponding image frame; determining a difference between the edge density of each coding unit and the average edge density of the corresponding image frame, and obtaining the first quantization parameter offset of the corresponding coding unit based on the difference and an intensity factor, wherein the intensity factor is positively correlated to the average edge density of the corresponding image frame.
According to a second aspect of the present disclosure, there is provided a video encoding apparatus including: an acquisition unit configured to: obtaining a quantization parameter of each image frame of a video to be coded, wherein each image frame comprises at least one coding unit; the acquisition unit is further configured to: acquiring the edge density and the texture complexity of the at least one coding unit; a first computing unit configured to: determining a first quantization parameter offset for the at least one coding unit based on the edge density of the at least one coding unit; a second calculation unit configured to: for each coding unit, determining a second quantization parameter offset of the corresponding coding unit based on the texture complexity of the coding unit and the coding reference relation between the coding unit and other coding units; an adjustment unit configured to: adjusting the quantization parameter of the respective coding unit based on the first quantization parameter offset and the second quantization parameter offset; an encoding unit configured to: and coding the video to be coded according to the quantization parameter of each coding unit.
Optionally, the second computing unit is further configured to: and if the texture complexity is determined to be less than or equal to a texture threshold, determining the second quantization parameter offset based on the coding reference relation.
Optionally, the second computing unit is further configured to: if the texture complexity is determined to be greater than the texture threshold and the coding unit belongs to the face region, determining the second quantization parameter offset based on the coding reference relationship; and if the texture complexity is determined to be greater than the texture threshold and the coding unit does not belong to the face region, determining that the second quantization parameter offset is 0.
Optionally, the obtaining unit is further configured to: acquiring a face area of each image frame; the second computing unit is further configured to: determining whether each of the coding units belongs to a face region based on a positional relationship of the corresponding coding unit to the face region of the image frame.
Optionally, the image frame of the video to be encoded includes a plurality of reference frames and an intermediate frame located between two adjacent reference frames, a face region of the reference frame is obtained through image recognition, and a face region of the intermediate frame is obtained through at least one of a track translation algorithm and a skin color matching algorithm based on the face regions of the two corresponding reference frames.
Optionally, the coding reference relationship includes a referenced degree, the referenced degree being the degree to which the coding unit is referenced by other coding units in inter prediction, and the second calculation unit is further configured to: determine the second quantization parameter offset based on the referenced degree, wherein the second quantization parameter offset is a negative value that is negatively correlated with the referenced degree.
Optionally, the texture complexity is obtained by: traversing each image frame, determining the gradient amplitude of each pixel of the coding unit and summing for each coding unit of the current image frame to obtain the absolute texture complexity of the corresponding coding unit; determining an average value based on the absolute texture complexity of the at least one coding unit of the current image frame to obtain a reference texture complexity of the current image frame; and determining the ratio of the absolute texture complexity to the reference texture complexity for each coding unit of the current image frame to obtain the texture complexity of the corresponding coding unit.
Optionally, the first computing unit is further configured to: for each image frame, determining an average value based on the edge density of the at least one coding unit to obtain an average edge density of the corresponding image frame; determining a difference between the edge density of each coding unit and the average edge density of the corresponding image frame, and obtaining the first quantization parameter offset of the corresponding coding unit based on the difference and an intensity factor, wherein the intensity factor is positively correlated to the average edge density of the corresponding image frame.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform a video encoding method according to the present disclosure.
According to a fourth aspect of the present disclosure, there is provided a computer-readable storage medium, wherein instructions, when executed by at least one processor, cause the at least one processor to perform a video encoding method according to the present disclosure.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising computer instructions which, when executed by at least one processor, implement a video encoding method according to the present disclosure.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
According to the video encoding method and apparatus of the embodiments of the present disclosure, the edge density of the coding unit is used to obtain the first quantization parameter offset for adjusting the quantization parameter. Because the edge density is related to the distortion sensitivity of human eyes to textures, the quantization parameter of coding units to which human eyes are sensitive can be appropriately reduced to improve the subjective quality of coding, while the quantization parameter of coding units to which human eyes are insensitive can be appropriately increased to save code rate, so that the subjective quality of coding can be improved at a given code rate. In addition, texture complexity and the coding reference relationship are introduced, which reflect the subjective and objective importance of the coding unit respectively. Compared with a scheme in which the second quantization parameter offset is determined directly from the coding reference relationship, jointly determining the second quantization parameter offset from both the texture complexity and the coding reference relationship, as in the embodiments of the present disclosure, weakens the contradiction between subjective and objective quality. By applying the first quantization parameter offset and the second quantization parameter offset in superposition, the subjective and objective quality of coding can be balanced, the requirement for retaining image details can be better captured, and the coding efficiency can be improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
Fig. 1 is a flowchart illustrating a video encoding method according to an exemplary embodiment of the present disclosure.
Fig. 2 is a flowchart illustrating a video encoding method according to an exemplary embodiment of the present disclosure.
Fig. 3 is a schematic diagram illustrating a structure of an MTCNN network according to an exemplary embodiment of the present disclosure.
Fig. 4 is a block diagram illustrating a video encoding apparatus according to an exemplary embodiment of the present disclosure.
Fig. 5 is a block diagram of an electronic device according to an example embodiment of the present disclosure.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The embodiments described in the following examples do not represent all embodiments consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Herein, the expression "at least one of the items" in the present disclosure covers three parallel cases: "any one of the items", "a combination of any plurality of the items", and "all of the items". For example, "including at least one of A and B" covers the following three parallel cases: (1) including A; (2) including B; (3) including A and B. For another example, "performing at least one of step one and step two" covers the following three parallel cases: (1) performing step one; (2) performing step two; (3) performing step one and step two.
The video coding optimization technology can adjust Quantization Parameters (QP) according to different requirements for preserving image details, so as to improve coding efficiency. Different optimization tools often determine the need to preserve image detail from different angles, and thus the optimization results are biased.
CAQ (Content Adaptive Quantization) is a subjective coding quality optimization tool in video coding. Because human eyes have different distortion sensitivities to different content characteristics in an image (texture complexity, motion complexity, and ROI regions), CAQ encodes different content with specific QPs: a region with higher subjective sensitivity is encoded with a relatively smaller QP to improve the subjective quality of the coding unit, and a region with lower subjective sensitivity is encoded with a relatively larger QP to save code rate, so that the subjective quality of coding can ultimately be improved at a given code rate.
CUTree (Coding Unit Tree) is an objective coding quality optimization tool in video coding. Its basic principle is to adjust the QP during preprocessing according to the degree to which the current coding unit is referenced: in a multi-layer reference frame structure, a coding unit with a high degree of reference adopts a smaller QP, and conversely a coding unit with a low degree of reference adopts a larger QP, so as to improve the objective quality of coding. Specifically, CUTree generally adjusts the QP by reducing the original QP, reducing it more for coding units with a high degree of reference and less for coding units with a low degree of reference.
According to the video coding method of the exemplary embodiment of the present disclosure, the two tools can be reasonably combined by introducing texture complexity, thereby taking both subjective quality and objective quality of coding into consideration.
Hereinafter, a video encoding method and a video encoding apparatus according to exemplary embodiments of the present disclosure will be described in detail with reference to fig. 1 to 5.
Fig. 1 is a flowchart illustrating a video encoding method according to an exemplary embodiment of the present disclosure. It should be understood that the video encoding method according to the exemplary embodiments of the present disclosure may be implemented in a terminal device such as a smart phone, a tablet computer, a Personal Computer (PC), or may be implemented in a device such as a server.
Referring to fig. 1, in step 101, a quantization parameter of each image frame of a video to be encoded is obtained, wherein each image frame includes at least one coding unit. The video to be encoded is the processing object of the video encoding method according to the exemplary embodiments of the present disclosure, and its encoding may be implemented based on the respective quantization parameters of each image frame constituting the video to be encoded. Specifically, one image frame may be divided into at least one CTU (Coding Tree Unit), each CTU having a size of 64 × 64. Each CTU can be further divided into four coding units (CUs) of smaller size, which can be subdivided further, and the size of the coding units can be selected as required during actual coding. Fig. 2 is a flowchart illustrating a video encoding method according to an exemplary embodiment of the present disclosure. Referring to fig. 2, the coding unit may be, for example, a 16 × 16 CU sub-block.
Referring back to fig. 1, in step 102, the edge density and texture complexity of at least one coding unit are obtained. Besides the original quantization parameters, the edge density and texture complexity of each coding unit can be obtained for use in subsequent steps 103 and 104.
Specifically, the edge density of the coding unit may be obtained by edge filtering and variance calculation, and the edge filtering may be, for example, Sobel edge filtering.
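The embodiment fixes nothing beyond "edge filtering and variance calculation"; purely as an illustration, the edge density of one coding unit could be computed along the following lines in Python (the choice of Sobel kernels and of the variance of the edge-magnitude map are assumptions):

import numpy as np
from scipy import ndimage

def edge_density(cu_pixels):
    # Illustrative edge density of one coding unit: Sobel edge filtering
    # followed by a variance calculation over the edge-magnitude map.
    cu_pixels = cu_pixels.astype(np.float64)
    gx = ndimage.sobel(cu_pixels, axis=1)   # horizontal gradient
    gy = ndimage.sobel(cu_pixels, axis=0)   # vertical gradient
    return float(np.hypot(gx, gy).var())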
Optionally, the texture complexity is obtained by: traversing each image frame; for each coding unit of the current image frame, determining the gradient amplitude of each pixel of the coding unit and summing them to obtain the absolute texture complexity of the corresponding coding unit; determining an average value based on the absolute texture complexity of all coding units of the current image frame to obtain the reference texture complexity of the current image frame; and, for each coding unit of the current image frame, determining the ratio of the absolute texture complexity to the reference texture complexity to obtain the texture complexity of the corresponding coding unit. By performing the calculation based on the gradient amplitude of each pixel of the coding unit, the texture detail inside each coding unit can be sufficiently extracted. The average over all coding units in one image frame is introduced as a reference and the texture complexity is expressed as a ratio, so that the parameter is normalized, its universality is improved, and subsequent application is facilitated. Specifically, the coding unit may be filtered with the horizontal and vertical convolution kernels shown in formula (1) (whose kernel coefficients appear only as images in the original publication and are not reproduced here), that is, a gradient calculation is performed to obtain the horizontal gradient G_h(i) and vertical gradient G_v(i) at each pixel position i of the coding unit, and the gradient amplitude GM_i of the corresponding pixel is then calculated as shown in formula (2), typically as the magnitude of the two gradients:

GM_i = sqrt( G_h(i)^2 + G_v(i)^2 )    (2)

The sum of the gradient amplitudes of a coding unit (i.e., its absolute texture complexity) GM_CU is calculated as shown in formula (3); the average GM_avg of the absolute texture complexity of all coding units in one image frame is then calculated as shown in formula (4); finally, the texture complexity GMR(i) of each coding unit i is calculated as shown in formula (5):

GM_CU = Σ_i GM_i    (3)

GM_avg = (1 / N_CU) · Σ_CU GM_CU    (4)

GMR(i) = GM_CU(i) / GM_avg    (5)

where N_CU is the number of coding units in the current image frame.
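A minimal Python sketch of this texture-complexity calculation for one grayscale frame is given below; the Sobel kernels and the square-root magnitude stand in for the kernels of formula (1), which are assumptions, and the 16 × 16 coding-unit size follows the example of fig. 2:

import numpy as np
from scipy import ndimage

def texture_complexity_per_cu(frame, cu_size=16):
    # Illustrative computation of the normalized texture complexity GMR
    # for each coding unit of one grayscale frame.
    frame = frame.astype(np.float64)
    gx = ndimage.sobel(frame, axis=1)          # horizontal gradient (assumed kernel)
    gy = ndimage.sobel(frame, axis=0)          # vertical gradient (assumed kernel)
    gm = np.hypot(gx, gy)                      # per-pixel gradient amplitude, formula (2)

    rows, cols = frame.shape[0] // cu_size, frame.shape[1] // cu_size
    gm_cu = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            block = gm[r * cu_size:(r + 1) * cu_size, c * cu_size:(c + 1) * cu_size]
            gm_cu[r, c] = block.sum()          # absolute texture complexity, formula (3)

    gm_avg = gm_cu.mean()                      # reference texture complexity, formula (4)
    return gm_cu / max(gm_avg, 1e-9)           # normalized texture complexity GMR, formula (5)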
It is understood that the edge density and texture complexity of each coding unit may be calculated by other tools and obtained directly by the video coding method according to the exemplary embodiments of the present disclosure, or may be calculated in step 102. In addition, step 102 is dedicated to acquiring parameters; it may be executed before steps 103 and 104, or the parameters may be acquired only when steps 103 and 104 need to apply the corresponding specific parameters, which is not limited in this disclosure.
In step 103, a first quantization parameter offset of the at least one coding unit is determined based on the edge density of the at least one coding unit. This step corresponds to the subjective tool CAQ described above; referring to fig. 2, the QP is adjusted using CAQ, and the first quantization parameter offset is determined from the edge density by an edge-adaptive quantization method. The edge density reflects the complexity of the edges of the coding unit and, in turn, the complexity of its internal texture. The higher the edge density, the more complex the internal texture of the coding unit is considered to be; human eyes have low sensitivity to distortion in texture-complex regions, so a positive offset can be added to the quantization parameter, that is, the first quantization parameter offset is positive. The lower the edge density, the flatter the internal texture of the coding unit is considered to be; human eyes have high sensitivity to distortion in texture-flat regions, so a negative offset can be added to the quantization parameter, that is, the first quantization parameter offset is negative.
Optionally, step 103 specifically includes: for each image frame, determining an average value based on the edge densities of all coding units to obtain the average edge density of the corresponding image frame; determining the difference between the edge density of each coding unit and the average edge density of the corresponding image frame, and obtaining the first quantization parameter offset of the corresponding coding unit based on the difference and an intensity factor, wherein the intensity factor is positively correlated with the average edge density of the corresponding image frame. By calculating the difference between the edge density of a single coding unit and the average edge density, the edge density of that coding unit is compared with the edge densities of the other coding units. If the edge density is greater than the average level, the edge density is large and the texture complexity is high, so the resulting first quantization parameter offset is positive and the quantization parameter is shifted upward; if the edge density is smaller than the average level, the edge density is small and the texture complexity is low, so the resulting first quantization parameter offset is negative and the quantization parameter is shifted downward. In addition, because the intensity factor is positively correlated with the average edge density, different image frames can be adjusted individually while consistency within a single image frame is maintained, which helps obtain a more reasonable first quantization parameter offset. Specifically, for one coding unit, the first quantization parameter offset may be obtained based on the difference between its edge density and the average edge density of the corresponding image frame together with the intensity factor, and the product of the difference and the intensity factor may be used as the first quantization parameter offset.
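Purely as an illustration, the first quantization parameter offsets of one frame could then be computed as follows; the concrete form of the intensity factor (here a constant times the average edge density) is an assumption, since the embodiment only requires it to be positively correlated with the average edge density:

import numpy as np

def first_qp_offsets(edge_densities, base_strength=1.0):
    # edge_densities: per-coding-unit edge densities of one image frame.
    avg_density = float(np.mean(edge_densities))
    intensity = base_strength * avg_density   # assumed positively-correlated intensity factor
    # positive offset for CUs denser than average, negative offset for flatter CUs
    return (np.asarray(edge_densities) - avg_density) * intensity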
Referring back to fig. 1, in step 104, for each coding unit, a second quantization parameter offset of the corresponding coding unit is determined based on the texture complexity of the coding unit and the coding reference relationship between the coding unit and other coding units.
Optionally, step 104 specifically includes: if the texture complexity is determined to be less than or equal to a texture threshold, determining the second quantization parameter offset based on the coding reference relationship. The coding reference relationship includes a referenced degree, which is the degree to which the coding unit is referenced by other coding units in inter-frame prediction. Referring to fig. 2, determining the second quantization parameter offset may amount to adjusting the QP with the aforementioned CUTree based on the referenced degree of the coding unit. The step of determining the second quantization parameter offset includes: determining the second quantization parameter offset based on the referenced degree, wherein the second quantization parameter offset is a negative value that is negatively correlated with the referenced degree. That is, when the quantization parameter is adjusted using CUTree, the quantization parameter is always adjusted downward, i.e., the second quantization parameter offset is negative. The more a coding unit is referenced, the more objectively important it is and the stronger the requirement for retaining its image details, so the greater the reduction of the quantization parameter, i.e., the larger the absolute value of the second quantization parameter offset, which is why the offset is negatively correlated with the referenced degree. Because CUTree can be adopted directly to adjust the QP, tool development cost is reduced, which facilitates improving existing video coding methods. Returning to step 104, when the texture complexity is small, the texture of the coding unit can be considered relatively flat; human eyes are sensitive to this, and step 103 usually applies a negative offset to the quantization parameter of such a coding unit, i.e., the first quantization parameter offset is negative. At this time, directly superimposing CUTree further reduces the quantization parameter without affecting the main adjustment direction, and superimposes the influence of the referenced degree on the quantization parameter, which helps improve both the subjective and objective quality of regions to which human eyes are sensitive.
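Purely as an illustration, a CUTree-style offset of this kind could take the following simple form; the scale factor and the linear mapping are assumptions, since the embodiment only requires the offset to be negative and negatively correlated with the referenced degree:

def cutree_offset(referenced_degree, scale=2.0):
    # Always non-positive; more negative the more the coding unit is referenced.
    return -scale * max(referenced_degree, 0.0)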
Optionally, referring to fig. 2, step 104 specifically further includes: if the texture complexity is determined to be greater than the texture threshold and the coding unit belongs to a face region, determining the second quantization parameter offset based on the coding reference relationship; and if the texture complexity is determined to be greater than the texture threshold and the coding unit does not belong to a face region, determining the second quantization parameter offset to be 0. It is to be understood that determining the second quantization parameter offset based on the referenced degree of the coding unit may likewise amount to adjusting the QP with the aforementioned CUTree, which is not repeated here. When the texture complexity is large, the texture of the coding unit can be considered relatively complex; human eyes are relatively insensitive to this, and step 103 usually applies a forward offset to the quantization parameter of such a coding unit, i.e., the first quantization parameter offset is positive. If CUTree were directly superimposed, its reduction could exceed the first quantization parameter offset obtained in step 103, so the quantization parameter would shift negatively and the subjective benefit of the CAQ tool would deteriorate. In addition, in some scenes the face region is small because the person or animal is far from the lens; the edge density algorithm then judges the face region to be a complex-texture region and increases its quantization parameter (i.e., the first quantization parameter offset is positive), so the face region is distorted and the degradation is subjectively obvious. When the texture complexity is determined to be high, whether the coding unit belongs to a face region is therefore further judged, and CUTree is still superimposed when it does, so that the quantization parameter of the corresponding coding unit can be pulled back; this weakens the effect of step 103 treating the face region as a complex-texture region and improves the subjective quality of the face region. When the coding unit is judged not to belong to a face region, it can be regarded as an ordinary complex-texture region, and not superimposing CUTree reduces the risk of deteriorating the subjective benefit of the CAQ tool. Therefore, by applying the video coding method according to the exemplary embodiments of the present disclosure, the problem of degraded subjective benefit when applying CUTree and CAQ simultaneously can be solved; CUTree is superimposed adaptively based on texture complexity, which also solves the problem in existing CAQ of identifying the face region as a region to which human eyes are insensitive, so that the subjective and objective coding quality of eye-sensitive regions is further improved at the same code rate and coding efficiency is improved.
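A minimal sketch of the resulting decision logic of step 104, reusing the cutree_offset sketch above; the texture threshold value here is an assumption, since the embodiment does not fix it:

def second_qp_offset(texture_complexity, referenced_degree, in_face_region,
                     texture_threshold=1.0):
    if texture_complexity <= texture_threshold:
        # relatively flat texture: always superimpose the CUTree-style reduction
        return cutree_offset(referenced_degree)
    if in_face_region:
        # complex texture but a face region: pull the QP back with CUTree
        return cutree_offset(referenced_degree)
    # ordinary complex-texture region: do not superimpose CUTree
    return 0.0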
Optionally, the video encoding method according to the exemplary embodiments of the present disclosure further includes: acquiring the face region of each image frame; and determining whether each coding unit belongs to the face region based on the positional relationship between the corresponding coding unit and the face region of the image frame. By extracting the face region at the level of the whole image frame, the accuracy of the extracted face region can be guaranteed. The extracted face region is then compared with the coding units in the image frame to determine whether each coding unit belongs to the face region, so face recognition does not need to be performed on each coding unit individually, which improves both the accuracy and the speed of the judgment. For example, a coding unit may be considered to belong to the face region when any part of it falls within the face region; or a coding unit may be considered to belong to the face region only when it falls entirely within the face region; or a ratio threshold (e.g., 50%) may be configured, and a coding unit is considered to belong to the face region when the portion of it falling within the face region exceeds the ratio threshold, which is not limited by the present disclosure.
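As an illustration, membership could be decided from the overlap ratio between the coding-unit rectangle and the detected face rectangle; the 50% threshold below is only one of the options mentioned above:

def cu_in_face_region(cu_box, face_box, ratio_threshold=0.5):
    # cu_box / face_box: (x0, y0, x1, y1) rectangles in pixel coordinates.
    x0 = max(cu_box[0], face_box[0])
    y0 = max(cu_box[1], face_box[1])
    x1 = min(cu_box[2], face_box[2])
    y1 = min(cu_box[3], face_box[3])
    overlap = max(0, x1 - x0) * max(0, y1 - y0)
    cu_area = (cu_box[2] - cu_box[0]) * (cu_box[3] - cu_box[1])
    return cu_area > 0 and overlap / cu_area > ratio_threshold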
Alternatively, the face region in an image frame may be obtained by image recognition. For example, MTCNN (Multi-task Cascaded Convolutional Networks) can be used, which can identify face positions and locate facial landmarks. The structure of the MTCNN network is shown in fig. 3; it consists of three sub-networks: P-Net (Proposal Network) quickly and coarsely detects candidate boxes, R-Net (Refine Network) checks the candidate boxes and further excludes non-face candidates, and O-Net (Output Network) generates accurate candidate boxes and the position coordinates of the face and its facial landmarks. As other examples, the face detection in OpenCV, YOLO-network face detection, and so on may be employed, which is not limited by this disclosure.
Optionally, the image frames of the video to be encoded include a plurality of reference frames and intermediate frames located between two adjacent reference frames; the face region of a reference frame is obtained through image recognition, and the face region of an intermediate frame is obtained through at least one of a track translation algorithm and a skin color matching algorithm based on the face regions of the two corresponding reference frames. By extracting only a number of reference frames at intervals, applying the image recognition algorithm to them, and tracking the face region of the intermediate frames from the reference frames, the computational complexity of the image recognition network (in particular the face recognition network) can be greatly reduced while the face region is still detected accurately. As an example, one reference frame may be selected out of every 3 image frames. The track translation algorithm and the skin color matching algorithm are existing, mature algorithms and are not described here.
In step 105, the quantization parameter of the corresponding coding unit is adjusted based on the first quantization parameter offset and the second quantization parameter offset. The quantization parameter offsets obtained in steps 103 and 104 may be summed with the original quantization parameter, thereby adjusting the quantization parameter.
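For illustration, assuming the two offsets are simply summed with the original QP and the result is clamped to a valid range (the clamping is an assumption added here so the result stays a legal QP value):

def adjust_qp(original_qp, first_offset, second_offset, qp_min=0, qp_max=51):
    # Illustrative step 105: superimpose both offsets onto the original QP.
    qp = original_qp + first_offset + second_offset
    return min(max(qp, qp_min), qp_max)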
In step 106, the video to be encoded is encoded according to the quantization parameter of each coding unit. The video to be encoded is encoded using the adjusted quantization parameters; on the basis of CAQ, CUTree is superimposed adaptively based on texture complexity, which solves the problem in existing CAQ of identifying the face region as a region to which human eyes are insensitive, further improves the subjective and objective coding quality of eye-sensitive regions at the same code rate, and improves coding efficiency. The specific encoding process is a mature technology and is not described here.
To evaluate the video coding method according to the exemplary embodiments of the present disclosure, 20 professional subjective evaluators were selected and a subjective blind test was performed on 100 sequences in the KwaiMp4 sequence set. With subjective quality held level (anchor : test = 50 : 50), the code rate of the test is reduced by 5% compared with that of the anchor, while the encoding time increases by only 0.5%, which is negligible.
Fig. 4 is a block diagram illustrating a video encoding apparatus according to an exemplary embodiment of the present disclosure. It should be understood that the video encoding apparatus according to the exemplary embodiments of the present disclosure may be implemented in a terminal device such as a smart phone, a tablet computer, or a personal computer (PC), in software, hardware, or a combination of software and hardware, and may also be implemented in a device such as a server.
Referring to fig. 4, video encoding apparatus 400 includes an acquisition unit 401, a first calculation unit 402, a second calculation unit 403, an adjustment unit 404, and an encoding unit 405.
The obtaining unit 401 may obtain a quantization parameter of each image frame of a video to be encoded, where each image frame includes at least one encoding unit. The video to be encoded is a processing object of the video encoding apparatus 400 according to the exemplary embodiment of the present disclosure, and encoding thereof may be implemented based on respective quantization parameters of each image frame constituting the video to be encoded. Specifically, one image frame may be divided into at least one CTU, each CTU having a size of 64 × 64. Each CTU can be further divided into four coding units of smaller size, and can be further subdivided, and the size of the coding unit can be selected as required during actual coding.
The obtaining unit 401 may further obtain an edge density and a texture complexity of at least one coding unit. Besides the original quantization parameters, the edge density and texture complexity of each coding unit can be obtained for the subsequent first calculation unit 402 and second calculation unit 403 to use.
Specifically, the edge density of the coding unit may be obtained by edge filtering and variance calculation, and the edge filtering may be, for example, Sobel edge filtering.
Optionally, the texture complexity is obtained by: traversing each image frame, determining the gradient amplitude of each pixel of the coding unit and summing for each coding unit of the current image frame to obtain the absolute texture complexity of the corresponding coding unit; determining an average value based on absolute texture complexity of all coding units of the current image frame to obtain reference texture complexity of the current image frame; and determining the ratio of the absolute texture complexity to the reference texture complexity of each coding unit of the current image frame to obtain the texture complexity of the corresponding coding unit. By performing the calculation based on the gradient magnitude of each pixel of the coding unit, the texture detail inside each coding unit can be sufficiently extracted. The average parameters of all coding units in one image frame are introduced as reference, and the texture complexity is represented in a ratio form, so that the parameters can be normalized, the universality of the parameters is improved, and the subsequent application is facilitated.
It is understood that the edge density and texture complexity of each coding unit may be calculated by other tools and obtained directly by the video encoding apparatus 400 according to the exemplary embodiments of the present disclosure, or may be calculated by the acquisition unit 401. In addition, the acquisition unit 401 is dedicated to acquiring parameters; it may operate before the first calculation unit 402 and the second calculation unit 403, or the parameters may be acquired only when the first calculation unit 402 and the second calculation unit 403 need to apply the corresponding specific parameters, which is not limited in this disclosure.
The first calculation unit 402 may determine a first quantization parameter offset of the at least one coding unit according to the edge density of the at least one coding unit. The first calculating unit 402 corresponds to the subjective tool CAQ, and may use an edge adaptive quantization method to determine a first quantization parameter offset according to the edge density. The edge density can reflect the complexity of the edge of the coding unit, and further reflect the complexity of the internal texture. The higher the edge density is, the more complex the inner texture of the coding unit is considered, and the distortion sensitivity of human eyes to the texture complex region is low, and a forward offset can be added to the quantization parameter, that is, the offset of the first quantization parameter is positive. The lower the edge density, the flatter the internal texture of the coding unit is considered, and the distortion sensitivity of human eyes to the flat texture region is high, a negative offset may be added to the quantization parameter, that is, the offset of the first quantization parameter is negative.
Optionally, the first computing unit 402 specifically performs the following actions: determining an average value of each image frame based on the edge densities of all the coding units to obtain the average edge density of the corresponding image frame; determining a difference value between the edge density of each coding unit and the average edge density of the corresponding image frame, and obtaining a first quantization parameter offset of the corresponding coding unit based on the difference value and an intensity factor, wherein the intensity factor is positively correlated with the average edge density of the corresponding image frame. By calculating the difference between the edge density of a single coding unit and the average edge density, the edge density of a single coding unit can be compared with the edge densities of other coding units. If the edge density is greater than the average level, the edge density is large, the texture complexity is high, the offset of the obtained first quantization parameter is a positive value, and the forward offset of the quantization parameter can be realized; if the edge density is smaller than the average level, the edge density is small, the texture complexity is low, the offset of the obtained first quantization parameter is a negative value, and negative offset of the quantization parameter can be realized. In addition, because the intensity factor is positively correlated with the average edge density, the method can not only perform personalized adjustment on different image frames, but also keep the consistency in a single image frame, and is favorable for obtaining more reasonable first quantization parameter offset. Specifically, for one coding unit, the first quantization parameter offset may be obtained based on a difference between the edge density and the average edge density of the corresponding image frame and the intensity factor, and a product of the difference and the intensity factor may be used as the first quantization parameter offset.
The second calculation unit 403 may determine, for each coding unit, a second quantization parameter offset of the corresponding coding unit based on the texture complexity of the coding unit and a coding reference relationship between the coding unit and other coding units.
Optionally, the second calculation unit 403 specifically performs the following actions: if the texture complexity is determined to be less than or equal to the texture threshold, determining the second quantization parameter offset based on the coding reference relationship. The coding reference relationship includes a referenced degree, which is the degree to which the coding unit is referenced by other coding units in inter-frame prediction. Determining the second quantization parameter offset based on the referenced degree of the coding unit may amount to adjusting the QP with the aforementioned CUTree. The action of determining the second quantization parameter offset includes: determining the second quantization parameter offset based on the referenced degree, wherein the second quantization parameter offset is a negative value that is negatively correlated with the referenced degree. That is, when the quantization parameter is adjusted using CUTree, the quantization parameter is always adjusted downward, i.e., the second quantization parameter offset is negative. The more a coding unit is referenced, the more objectively important it is and the stronger the requirement for retaining its image details, so the greater the reduction of the quantization parameter, i.e., the larger the absolute value of the second quantization parameter offset, which is why the offset is negatively correlated with the referenced degree. Because CUTree can be adopted directly to adjust the QP, tool development cost is reduced, which facilitates improving existing video coding methods. Returning to the second calculation unit 403, when the texture complexity is small, the texture of the coding unit can be considered relatively flat; human eyes are sensitive to this, and the first calculation unit 402 usually applies a negative offset to the quantization parameter of such a coding unit, i.e., the first quantization parameter offset is negative. At this time, by directly superimposing CUTree, the second calculation unit 403 further reduces the quantization parameter without affecting the main adjustment direction, and superimposes the influence of the referenced degree on the quantization parameter, which helps improve both the subjective and objective quality of regions to which human eyes are sensitive.
Optionally, the second calculation unit 403 may specifically perform the following actions: if the texture complexity is determined to be greater than the texture threshold and the coding unit belongs to a face region, determining the second quantization parameter offset based on the coding reference relationship; and if the texture complexity is determined to be greater than the texture threshold and the coding unit does not belong to a face region, determining the second quantization parameter offset to be 0. It is to be understood that determining the second quantization parameter offset based on the referenced degree of the coding unit may likewise amount to adjusting the QP with the aforementioned CUTree, which is not repeated here. When the texture complexity is large, the texture of the coding unit can be considered relatively complex; human eyes are relatively insensitive to this, and the first calculation unit 402 usually applies a forward offset to the quantization parameter of such a coding unit, i.e., the first quantization parameter offset is positive. If CUTree were directly superimposed, its reduction could exceed the first quantization parameter offset obtained by the first calculation unit 402, so the quantization parameter would shift negatively and the subjective benefit of the CAQ tool would deteriorate. In addition, in some scenes the face region is small because the person or animal is far from the lens; the edge density algorithm then judges the face region to be a complex-texture region and increases its quantization parameter (i.e., the first quantization parameter offset is positive), so the face region is distorted and the degradation is subjectively obvious. When the texture complexity is determined to be high, whether the coding unit belongs to a face region is therefore further judged, and CUTree is still superimposed when it does, so that the quantization parameter of the corresponding coding unit can be pulled back; this weakens the effect of the first calculation unit 402 treating the face region as a complex-texture region and improves the subjective quality of the face region. When the coding unit is judged not to belong to a face region, it can be regarded as an ordinary complex-texture region, and not superimposing CUTree reduces the risk of deteriorating the subjective benefit of the CAQ tool. Therefore, by applying the video encoding apparatus 400 according to the exemplary embodiments of the present disclosure, the problem of degraded subjective benefit when applying CUTree and CAQ simultaneously can be solved; CUTree is superimposed adaptively based on texture complexity, which also solves the problem in existing CAQ of identifying the face region as a region to which human eyes are insensitive, so that the subjective and objective coding quality of eye-sensitive regions is further improved at the same code rate and coding efficiency is improved.
Alternatively, the acquisition unit 401 may also acquire a face region of each image frame; the second calculation unit 403 may also determine whether or not the corresponding coding unit belongs to the face region based on the positional relationship of each coding unit with the face region of the corresponding image frame. By extracting the face region from the angle of the whole image frame, the accuracy of the extracted face region can be guaranteed. And then the extracted face area is compared with the coding units in the image frame, so that whether each coding unit belongs to the face area can be determined, face identification does not need to be carried out on each coding unit, the accuracy and the speed of judgment can be improved, and the judgment efficiency can be improved. For example, the second calculation unit 403 may consider a coding unit as belonging to a face region when there is a partial region in the coding unit that falls into the face region; or, a coding unit can be considered to belong to the face region only when the coding unit completely falls into the face region; a scale threshold (e.g., 50%) may also be configured, and a coding unit is considered to belong to a face region when a region of the coding unit that exceeds the scale threshold falls within the face region, which is not limited by the present disclosure.
Optionally, the face region in each image frame may be obtained by image recognition.
Optionally, the image frames of the video to be encoded include a plurality of reference frames and intermediate frames located between two adjacent reference frames; the face regions of the reference frames are obtained by image recognition, and the face region of an intermediate frame is obtained by at least one of a trajectory translation algorithm and a skin color matching algorithm based on the face regions of the two corresponding reference frames. By extracting only a number of reference frames at intervals, applying the image recognition algorithm to them, and tracking the face region of each intermediate frame from the reference frames, the computational complexity of the image recognition network (in particular the face recognition network) can be greatly reduced and the accuracy of face region detection is ensured. The trajectory translation algorithm and the skin color matching algorithm are existing, mature algorithms and are not described here.
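For illustration only, the following Python sketch uses linear interpolation of the two reference-frame face boxes as a simple stand-in for the trajectory translation step; the actual trajectory translation and skin color matching algorithms are not specified by the disclosure and may differ.

def interpolate_face_box(box_a, box_b, t):
    # Translate the face box linearly between two reference frames, 0 <= t <= 1.
    return tuple(a + t * (b - a) for a, b in zip(box_a, box_b))

# Face boxes detected on reference frames 0 and 8; estimate the box at frame 2.
ref0 = (100, 60, 180, 160)
ref8 = (140, 60, 220, 160)
print(interpolate_face_box(ref0, ref8, t=2 / 8))  # prints (110.0, 60.0, 190.0, 160.0)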
The adjustment unit 404 may adjust the quantization parameter of the corresponding coding unit based on the first quantization parameter offset and the second quantization parameter offset. That is, the two offsets obtained by the first calculation unit 402 and the second calculation unit 403 may be summed with the original quantization parameter, thereby adjusting the quantization parameter.
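For illustration only, the following Python sketch sums the two offsets with the frame-level quantization parameter; the clamping range 0 to 51 follows the usual H.264/HEVC QP range and is an assumption, not a value stated in the disclosure.

def adjust_qp(frame_qp, caq_offset, cutree_offset):
    # Sum the frame QP with both offsets and clamp to the legal QP range.
    return int(round(min(51, max(0, frame_qp + caq_offset + cutree_offset))))

print(adjust_qp(frame_qp=30, caq_offset=2.0, cutree_offset=-3.0))  # prints 29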
The encoding unit 405 may encode the video to be encoded according to the quantization parameter of each coding unit. Encoding the video with the adjusted quantization parameters superimposes CUTree adaptively, based on texture complexity, on top of CAQ, solves the problem that the existing CAQ identifies the face region as an eye-insensitive region, further improves the subjective and objective coding quality of eye-sensitive regions at the same code rate, and improves coding efficiency. The specific encoding process is a mature technology and is not described here.
Fig. 5 is a block diagram of an electronic device according to an example embodiment of the present disclosure.
Referring to fig. 5, an electronic device 500 includes at least one memory 501 and at least one processor 502. The at least one memory 501 stores a set of computer-executable instructions that, when executed by the at least one processor 502, cause the at least one processor 502 to perform the video encoding method according to an exemplary embodiment of the present disclosure.
By way of example, the electronic device 500 may be a PC, a tablet device, a personal digital assistant, a smartphone, or any other device capable of executing the above set of instructions. Here, the electronic device 500 need not be a single electronic device; it can be any collection of devices or circuits that can execute the above instructions (or instruction sets) individually or jointly. The electronic device 500 may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces with local or remote devices (e.g., via wireless transmission).
In the electronic device 500, the processor 502 may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a special-purpose processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, processors may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like.
The processor 502 may execute instructions or code stored in the memory 501, wherein the memory 501 may also store data. The instructions and data may also be transmitted or received over a network via a network interface device, which may employ any known transmission protocol.
The memory 501 may be integrated with the processor 502, for example, by having RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, memory 501 may comprise a stand-alone device, such as an external disk drive, storage array, or any other storage device usable by a database system. The memory 501 and the processor 502 may be operatively coupled or may communicate with each other, e.g., through I/O ports, network connections, etc., such that the processor 502 is able to read files stored in the memory.
In addition, the electronic device 500 may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the electronic device 500 may be connected to each other via a bus and/or a network.
According to an exemplary embodiment of the present disclosure, there may also be provided a computer-readable storage medium having stored therein instructions that, when executed by at least one processor, cause the at least one processor to perform the video encoding method according to an exemplary embodiment of the present disclosure. Examples of the computer-readable storage medium here include: read-only memory (ROM), random-access programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or optical disc storage, a hard disk drive (HDD), a solid-state drive (SSD), card-type memory (such as a multimedia card, a Secure Digital (SD) card, or an eXtreme Digital (XD) card), magnetic tape, a floppy disk, a magneto-optical data storage device, an optical data storage device, a hard disk, a solid-state disk, and any other device configured to store a computer program and any associated data, data files, and data structures in a non-transitory manner and to provide them to a processor or computer so that the processor or computer can execute the computer program. The computer program in the computer-readable storage medium described above can run in an environment deployed on computer equipment such as a client, a host, a proxy device, or a server; furthermore, in one example, the computer program and any associated data, data files, and data structures are distributed across networked computer systems so that they are stored, accessed, and executed in a distributed fashion by one or more processors or computers.
According to an exemplary embodiment of the present disclosure, a computer program product may also be provided, the computer program product comprising computer instructions which, when executed by at least one processor, cause the at least one processor to carry out the video encoding method according to an exemplary embodiment of the present disclosure.
According to the video encoding method, the video encoding device, the electronic device, and the computer-readable storage medium of the exemplary embodiments of the present disclosure, the first quantization parameter offset used to adjust the quantization parameter is obtained from the edge density of the coding unit. Because edge density is related to how sensitive human eyes are to texture distortion, the quantization parameter of eye-sensitive coding units can be appropriately reduced to improve the subjective quality of the encoding, while the quantization parameter of relatively insensitive coding units can be appropriately increased to save code rate; overall, the subjective quality of the encoding is improved at a given code rate. In addition, the referenced degree of a coding unit reflects its importance in the encoding process, and adjusting the quantization parameter according to the referenced degree improves the objective quality of the encoding. Directly superimposing the two adjustment strategies, however, may make the adjustment directions opposite so that they cancel each other out. By introducing texture complexity, which also reflects the sensitivity of human eyes, as a condition, and deciding based on it whether to superimpose the influence of the referenced degree on top of the edge-density adjustment, the objective quality of the encoding can be improved as much as possible while the subjective quality is guaranteed first, and the coding efficiency is improved.
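For illustration only, the following Python sketch ties the two offsets together for a single coding unit; the edge-density mapping, the CUTree mapping, and all thresholds are hypothetical placeholders rather than formulas taken from the disclosure.

def coding_unit_qp(frame_qp, edge_density, texture_complexity,
                   referenced_degree, in_face_region, texture_threshold=30.0):
    # First offset (CAQ-style): dense edges -> complex texture -> raise the QP;
    # sparse edges -> eye-sensitive region -> lower the QP.
    caq_offset = (edge_density - 0.5) * 8.0

    # Second offset (CUTree-style): a negative value, larger in magnitude for
    # coding units that are referenced more heavily in inter prediction.
    cutree_offset = -min(4.0, referenced_degree)

    # Texture-complexity gate: skip the CUTree offset for complex, non-face
    # regions so the two adjustments do not cancel each other out.
    if texture_complexity > texture_threshold and not in_face_region:
        cutree_offset = 0.0

    return int(round(min(51, max(0, frame_qp + caq_offset + cutree_offset))))

# A small, distant face: edge density says "complex", but the face gate keeps
# the CUTree reduction, so the final QP stays close to the frame-level QP.
print(coding_unit_qp(frame_qp=32, edge_density=0.8, texture_complexity=45.0,
                     referenced_degree=3.0, in_face_region=True))  # prints 31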
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A video encoding method, comprising:
obtaining a quantization parameter of each image frame of a video to be coded, wherein each image frame comprises at least one coding unit;
acquiring the edge density and the texture complexity of the at least one coding unit;
determining a first quantization parameter offset for the at least one coding unit based on the edge density of the at least one coding unit;
for each coding unit, determining a second quantization parameter offset of the corresponding coding unit based on the texture complexity of the coding unit and the coding reference relationship between the coding unit and other coding units;
adjusting the quantization parameter of the respective coding unit based on the first quantization parameter offset and the second quantization parameter offset;
and coding the video to be coded according to the quantization parameter of each coding unit.
2. The video encoding method of claim 1, wherein determining the second quantization parameter offset of the corresponding coding unit based on the texture complexity of the coding unit and the coding reference relationship between the coding unit and other coding units comprises:
and if the texture complexity is determined to be less than or equal to a texture threshold, determining the second quantization parameter offset based on the coding reference relation.
3. The video encoding method of claim 2, wherein determining the second quantization parameter offset of the corresponding coding unit based on the texture complexity of the coding unit and the coding reference relationship between the coding unit and other coding units further comprises:
if the texture complexity is determined to be greater than the texture threshold and the coding unit belongs to the face region, determining the second quantization parameter offset based on the coding reference relationship;
and if the texture complexity is determined to be greater than the texture threshold and the coding unit does not belong to the face region, determining that the second quantization parameter offset is 0.
4. The video encoding method of claim 3, wherein the video encoding method further comprises:
acquiring a face region of each image frame;
determining whether each coding unit belongs to the face region based on a positional relationship between the coding unit and the face region of the corresponding image frame.
5. The video encoding method of claim 4, wherein the image frames of the video to be coded comprise a plurality of reference frames and an intermediate frame located between two adjacent reference frames, the face regions of the reference frames are obtained by image recognition, and the face region of the intermediate frame is obtained by at least one of a trajectory translation algorithm and a skin color matching algorithm based on the face regions of the corresponding two reference frames.
6. The video encoding method of any one of claims 2 to 5, wherein the coding reference relationship comprises a referenced degree, the referenced degree being a degree to which the coding unit is referenced by other coding units in inter prediction, and wherein determining the second quantization parameter offset based on the coding reference relationship comprises:
determining the second quantization parameter offset based on the referenced degree, wherein the second quantization parameter offset is a negative value that is negatively correlated with the referenced degree.
7. A video encoding apparatus, comprising:
an acquisition unit configured to: obtaining a quantization parameter of each image frame of a video to be coded, wherein each image frame comprises at least one coding unit;
the acquisition unit is further configured to: acquiring the edge density and the texture complexity of the at least one coding unit;
a first computing unit configured to: determining a first quantization parameter offset for the at least one coding unit based on the edge density of the at least one coding unit;
a second calculation unit configured to: for each coding unit, determining a second quantization parameter offset of the corresponding coding unit based on the texture complexity of the coding unit and the coding reference relationship between the coding unit and other coding units;
an adjustment unit configured to: adjusting the quantization parameter of the respective coding unit based on the first quantization parameter offset and the second quantization parameter offset;
an encoding unit configured to: and coding the video to be coded according to the quantization parameter of each coding unit.
8. An electronic device, comprising:
at least one processor;
at least one memory storing computer-executable instructions,
wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform the video encoding method of any of claims 1 to 6.
9. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by at least one processor, cause the at least one processor to perform the video encoding method of any of claims 1 to 6.
10. A computer program product comprising computer instructions, characterized in that the computer instructions, when executed by at least one processor, implement the video encoding method of any of claims 1 to 6.
CN202111572857.7A 2021-12-21 2021-12-21 Video encoding method, apparatus, electronic device, and computer-readable storage medium Active CN114222121B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111572857.7A CN114222121B (en) 2021-12-21 2021-12-21 Video encoding method, apparatus, electronic device, and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN114222121A true CN114222121A (en) 2022-03-22
CN114222121B CN114222121B (en) 2023-11-14

Family

ID=80704779

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111572857.7A Active CN114222121B (en) 2021-12-21 2021-12-21 Video encoding method, apparatus, electronic device, and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN114222121B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6307975B1 (en) * 1996-11-05 2001-10-23 Sony Corporation Image coding technique employing shape and texture coding
US20090052536A1 (en) * 2006-02-09 2009-02-26 Nec Corporation Video Decoding Device, Decoded Image Recording Device, Their Method and Program
CN106170979A (en) * 2014-04-30 2016-11-30 英特尔公司 Constant Quality video encodes
CN110324708A (en) * 2019-07-16 2019-10-11 浙江大华技术股份有限公司 Method for processing video frequency, terminal device and computer storage medium
CN111901597A (en) * 2020-08-05 2020-11-06 杭州当虹科技股份有限公司 CU (CU) level QP (quantization parameter) allocation algorithm based on video complexity
CN111988611A (en) * 2020-07-24 2020-11-24 北京达佳互联信息技术有限公司 Method for determining quantization offset information, image coding method, image coding device and electronic equipment
CN112165620A (en) * 2020-09-24 2021-01-01 北京金山云网络技术有限公司 Video encoding method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN114222121B (en) 2023-11-14

Similar Documents

Publication Publication Date Title
Wang et al. An experimental-based review of image enhancement and image restoration methods for underwater imaging
CN108895981B (en) Three-dimensional measurement method, device, server and storage medium
US10373380B2 (en) 3-dimensional scene analysis for augmented reality operations
Manap et al. Non-distortion-specific no-reference image quality assessment: A survey
US20180158199A1 (en) Image alignment for burst mode images
US9536321B2 (en) Apparatus and method for foreground object segmentation
CN112862830B (en) Multi-mode image segmentation method, system, terminal and readable storage medium
KR20090122275A (en) Two stage detection for photographic eye artifacts
WO2022166316A1 (en) Light supplementing method and apparatus for facial recognition, and facial recognition device and system therefor
CN111292272B (en) Image processing method, image processing apparatus, image processing medium, and electronic device
TW202011267A (en) Method and device for damage segmentation of vehicle damage image
CN116740728B (en) Dynamic acquisition method and system for wafer code reader
CN108846837A (en) Body surface defect inspection method and device
CN112785572A (en) Image quality evaluation method, device and computer readable storage medium
Zhang et al. Depth enhancement with improved exemplar-based inpainting and joint trilateral guided filtering
WO2020087434A1 (en) Method and device for evaluating resolution of face image
CN111784658A (en) Quality analysis method and system for face image
Saleh et al. Adaptive uncertainty distribution in deep learning for unsupervised underwater image enhancement
CN107944497A (en) Image block method for measuring similarity based on principal component analysis
CN114399480A (en) Method and device for detecting severity of vegetable leaf disease
US11380117B1 (en) Zero-footprint image capture by mobile device
CN109409305A (en) A kind of facial image clarity evaluation method and device
CN114222121B (en) Video encoding method, apparatus, electronic device, and computer-readable storage medium
CN116980549A (en) Video frame processing method, device, computer equipment and storage medium
CN110728692A (en) Image edge detection method based on Scharr operator improvement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant