CN114222121B - Video encoding method, apparatus, electronic device, and computer-readable storage medium - Google Patents

Info

Publication number
CN114222121B
Authority
CN
China
Prior art keywords
coding unit
coding
quantization parameter
image frame
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111572857.7A
Other languages
Chinese (zh)
Other versions
CN114222121A (en)
Inventor
薛毅
黄跃
黄博
白瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202111572857.7A priority Critical patent/CN114222121B/en
Publication of CN114222121A publication Critical patent/CN114222121A/en
Application granted granted Critical
Publication of CN114222121B publication Critical patent/CN114222121B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124Quantisation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/172Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a picture, frame or field
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/189Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding
    • H04N19/196Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the adaptation method, adaptation tool or adaptation type used for the adaptive coding being specially adapted for the computation of encoding parameters, e.g. by averaging previously computed encoding parameters

Abstract

The present disclosure relates to a video encoding method, apparatus, electronic device, and computer-readable storage medium, the video encoding method including: acquiring quantization parameters of each image frame of a video to be encoded, wherein each image frame comprises at least one encoding unit; acquiring edge density and texture complexity of at least one coding unit; determining a first quantization parameter offset of the at least one coding unit according to an edge density of the at least one coding unit; determining, for each coding unit, a second quantization parameter offset for the corresponding coding unit based on the texture complexity of the coding unit and a coding reference relationship between the coding unit and other coding units; adjusting quantization parameters of the corresponding coding units based on the first quantization parameter offset and the second quantization parameter offset; and encoding the video to be encoded according to the quantization parameter of each encoding unit. The method can improve the coding efficiency.

Description

Video encoding method, apparatus, electronic device, and computer-readable storage medium
Technical Field
The present disclosure relates to the field of video technologies, and in particular, to a video encoding method, apparatus, electronic device, and computer readable storage medium.
Background
Video transmission is inseparable from video coding, that is, converting a file in an original video format into a file in another video format through compression. Within a video, content may differ greatly between different image frames and between different areas of the same image frame, and the need to preserve image detail varies accordingly. The smaller the quantization parameter used during encoding, the more image detail is retained. Therefore, if a fixed quantization parameter is used for the whole video, content requiring high detail retention may be distorted because the quantization parameter is too large, while content requiring little detail retention may waste bit rate because the quantization parameter is too small; it is difficult to balance detail retention against bit-rate savings, and coding efficiency is low.
To address this, the related art offers coding optimization schemes that adjust quantization parameters according to differing requirements for preserving image detail. However, different schemes tend to judge those requirements from different, partial perspectives, so the improvement in coding efficiency is limited.
Disclosure of Invention
The present disclosure provides a video encoding method, apparatus, electronic device, and computer-readable storage medium to at least solve the problem of how to improve encoding efficiency in the related art.
According to a first aspect of the present disclosure, there is provided a video encoding method, comprising: acquiring quantization parameters of each image frame of a video to be encoded, wherein each image frame comprises at least one encoding unit; acquiring the edge density and texture complexity of the at least one coding unit; determining a first quantization parameter offset of the at least one coding unit according to an edge density of the at least one coding unit; for each coding unit, determining a second quantization parameter offset of the corresponding coding unit based on the texture complexity of the coding unit and coding reference relationships between the coding unit and other coding units; adjusting the quantization parameter of the corresponding coding unit based on the first quantization parameter offset and the second quantization parameter offset; and encoding the video to be encoded according to the quantization parameter of each encoding unit.
Optionally, the determining the second quantization parameter offset of the corresponding coding unit based on the texture complexity of the coding unit and the coding reference relationship between the coding unit and other coding units includes: and if the texture complexity is determined to be less than or equal to a texture threshold value, determining the second quantization parameter offset based on the coding reference relation.
Optionally, the determining the second quantization parameter offset of the corresponding coding unit based on the texture complexity of the coding unit and the coding reference relationship between the coding unit and other coding units further includes: if the texture complexity is determined to be greater than the texture threshold and the coding unit belongs to a face region, determining the second quantization parameter offset based on the coding reference relationship; if the texture complexity is determined to be greater than the texture threshold and the coding unit does not belong to the face region, determining that the second quantization parameter offset is 0.
Optionally, the video encoding method further comprises: acquiring a face area of each image frame; based on the positional relationship of each of the encoding units and the face area of the corresponding image frame, it is determined whether the corresponding encoding unit belongs to the face area.
Optionally, the image frame of the video to be encoded includes a plurality of reference frames and an intermediate frame located between two adjacent reference frames, the face area of the reference frames is obtained through image recognition, and the face area of the intermediate frame is obtained through at least one of a track translation algorithm and a skin color matching algorithm based on the face areas of the two corresponding reference frames.
Optionally, the coding reference relation includes a referenced degree, the referenced degree being a degree by which the coding unit is referenced by other coding units in inter prediction, and the determining the second quantization parameter offset based on the coding reference relation includes: determining the second quantization parameter offset based on the referenced degree, wherein the second quantization parameter offset is a negative value that is inversely related to the referenced degree.
Optionally, the texture complexity is obtained by: traversing each image frame, determining the gradient amplitude of each pixel of each coding unit for each coding unit of the current image frame, and summing to obtain the absolute texture complexity of the corresponding coding unit; determining an average value based on the absolute texture complexity of the at least one coding unit of the current image frame, resulting in a reference texture complexity of the current image frame; and determining the ratio of the absolute texture complexity to the reference texture complexity for each coding unit of the current image frame to obtain the texture complexity of the corresponding coding unit.
Optionally, the determining the first quantization parameter offset of the at least one coding unit according to the edge density of the at least one coding unit includes: for each image frame, determining an average value based on the edge density of the at least one coding unit, to obtain an average edge density of the corresponding image frame; determining a difference between the edge density of each coding unit and the average edge density of the corresponding image frame, and obtaining the first quantization parameter offset of the corresponding coding unit based on the difference and an intensity factor, wherein the intensity factor is positively correlated with the average edge density of the corresponding image frame.
According to a second aspect of the present disclosure, there is provided a video encoding apparatus comprising: an acquisition unit configured to: acquiring quantization parameters of each image frame of a video to be encoded, wherein each image frame comprises at least one encoding unit; the acquisition unit is further configured to: acquiring the edge density and texture complexity of the at least one coding unit; a first computing unit configured to: determining a first quantization parameter offset of the at least one coding unit according to an edge density of the at least one coding unit; a second computing unit configured to: for each coding unit, determining a second quantization parameter offset of the corresponding coding unit based on the texture complexity of the coding unit and coding reference relationships between the coding unit and other coding units; an adjustment unit configured to: adjusting the quantization parameter of the corresponding coding unit based on the first quantization parameter offset and the second quantization parameter offset; an encoding unit configured to: and encoding the video to be encoded according to the quantization parameter of each encoding unit.
Optionally, the second computing unit is further configured to: and if the texture complexity is determined to be less than or equal to a texture threshold value, determining the second quantization parameter offset based on the coding reference relation.
Optionally, the second computing unit is further configured to: if the texture complexity is determined to be greater than the texture threshold and the coding unit belongs to a face region, determining the second quantization parameter offset based on the coding reference relationship; if the texture complexity is determined to be greater than the texture threshold and the coding unit does not belong to the face region, determining that the second quantization parameter offset is 0.
Optionally, the acquisition unit is further configured to: acquiring a face area of each image frame; the second computing unit is further configured to: based on the positional relationship of each of the encoding units and the face area of the corresponding image frame, it is determined whether the corresponding encoding unit belongs to the face area.
Optionally, the image frame of the video to be encoded includes a plurality of reference frames and an intermediate frame located between two adjacent reference frames, the face area of the reference frames is obtained through image recognition, and the face area of the intermediate frame is obtained through at least one of a track translation algorithm and a skin color matching algorithm based on the face areas of the two corresponding reference frames.
Optionally, the coding reference relation includes a referenced degree, the referenced degree being a degree by which the coding unit is referenced by other coding units in inter prediction, and the second calculating unit is further configured to: determining the second quantization parameter offset based on the referenced degree, wherein the second quantization parameter offset is a negative value that is inversely related to the referenced degree.
Optionally, the texture complexity is obtained by: traversing each image frame, determining the gradient amplitude of each pixel of each coding unit for each coding unit of the current image frame, and summing to obtain the absolute texture complexity of the corresponding coding unit; determining an average value based on the absolute texture complexity of the at least one coding unit of the current image frame, resulting in a reference texture complexity of the current image frame; and determining the ratio of the absolute texture complexity to the reference texture complexity for each coding unit of the current image frame to obtain the texture complexity of the corresponding coding unit.
Optionally, the first computing unit is further configured to: for each image frame, determining an average value based on the edge density of the at least one coding unit, to obtain an average edge density of the corresponding image frame; determining a difference between the edge density of each coding unit and the average edge density of the corresponding image frame, and obtaining the first quantization parameter offset of the corresponding coding unit based on the difference and an intensity factor, wherein the intensity factor is positively correlated with the average edge density of the corresponding image frame.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when executed by the at least one processor, cause the at least one processor to perform a video encoding method according to the present disclosure.
According to a fourth aspect of the present disclosure, there is provided a computer-readable storage medium storing instructions which, when executed by at least one processor, cause the at least one processor to perform a video encoding method according to the present disclosure.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising computer instructions which, when executed by at least one processor, implement a video encoding method according to the present disclosure.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
according to the video coding method and apparatus of the embodiments of the disclosure, the first quantization parameter offset used to adjust the quantization parameter is derived from the edge density of the coding unit. Because edge density relates to how sensitive the human eye is to texture distortion, the quantization parameter of coding units to which the eye is more sensitive can be appropriately lowered to improve subjective coding quality, while the quantization parameter of coding units to which the eye is relatively insensitive can be appropriately raised to save bit rate; overall, subjective quality is improved at a given bit rate. In addition, texture complexity and the coding reference relationship are introduced, which reflect the subjective and objective importance of a coding unit respectively. Compared with schemes that determine the second quantization parameter offset directly from the coding reference relationship, embodiments of the present disclosure combine texture complexity with the coding reference relationship to determine it jointly, which eases the conflict between subjective and objective goals. By superimposing the first and second quantization parameter offsets, the subjective and objective quality of coding can be balanced and the requirements for preserving image detail better captured, thereby improving coding efficiency.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.
Fig. 1 is a flowchart illustrating a video encoding method according to an exemplary embodiment of the present disclosure.
Fig. 2 is a flowchart illustrating a video encoding method according to an exemplary embodiment of the present disclosure.
Fig. 3 is a schematic diagram illustrating the structure of an MTCNN network according to an exemplary embodiment of the present disclosure.
Fig. 4 is a block diagram illustrating a video encoding apparatus according to an exemplary embodiment of the present disclosure.
Fig. 5 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The embodiments described in the examples below are not representative of all embodiments consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
It should be noted that, in this disclosure, "at least one of the items" covers three parallel cases: "any one of the items", "any combination of the items", and "all of the items". For example, "including at least one of A and B" covers three parallel cases: (1) including A; (2) including B; (3) including A and B. Likewise, "performing at least one of step one and step two" covers: (1) performing step one; (2) performing step two; (3) performing both step one and step two.
Video coding optimization techniques can adjust the quantization parameter (QP) according to differing requirements for preserving image detail, thereby improving coding efficiency. Different optimization tools often judge the need to preserve image detail from different angles, so each has a biased optimization effect.
CAQ (Content Adaptive Quantization) is a subjective coding-quality optimization tool in video coding. Because the human eye has different distortion sensitivity to different content characteristics in an image (texture complexity, motion complexity, region of interest (ROI)), CAQ encodes different content with content-specific QPs: regions with higher subjective sensitivity are coded with a relatively smaller QP to improve the subjective quality of the coding unit, and regions with lower subjective sensitivity are coded with a larger QP to save bit rate, so that subjective quality is ultimately improved at a given bit rate.
CUTree (Coding Unit Tree) is an objective coding-quality optimization tool in video coding. Its basic principle is to adjust the QP during preprocessing according to how strongly the current coding unit is referenced: in a multi-layer reference-frame structure, a smaller QP is used for coding units that are heavily referenced and a larger QP for coding units that are lightly referenced, improving the objective quality of the coding. Specifically, CUTree generally adjusts the QP downward from the original QP, but the reduction is larger for heavily referenced coding units and smaller for lightly referenced ones.
According to the video coding method of the exemplary embodiment of the present disclosure, the two tools can be reasonably combined by introducing texture complexity, so as to consider both subjective quality and objective quality of coding.
Hereinafter, a video encoding method and a video encoding apparatus according to exemplary embodiments of the present disclosure will be described in detail with reference to fig. 1 to 5.
Fig. 1 is a flowchart illustrating a video encoding method according to an exemplary embodiment of the present disclosure. It should be appreciated that the video encoding method according to the exemplary embodiments of the present disclosure may be implemented in a terminal device such as a smart phone, a tablet computer, a Personal Computer (PC), or in a device such as a server.
Referring to fig. 1, in step 101, a quantization parameter is acquired for each image frame of a video to be encoded, where each image frame includes at least one coding unit. The video to be encoded is the processing object of the video encoding method according to an exemplary embodiment of the present disclosure; its encoding is based on the quantization parameters of the image frames that make it up, and the aim here is to adjust the quantization parameter of each coding unit reasonably, thereby improving coding efficiency. Specifically, one image frame may be divided into at least one CTU (Coding Tree Unit), each of size 64×64. Each CTU can be further divided into four smaller coding units (CU), which can be subdivided again; the coding-unit size can be chosen as needed for the actual encoding. Fig. 2 is a flowchart illustrating a video encoding method according to an exemplary embodiment of the present disclosure. Referring to fig. 2, the coding unit may be, for example, a 16×16 CU sub-block.
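Purely as an illustration, a minimal sketch of cutting a frame into fixed 16×16 coding-unit blocks is given below; a real encoder performs the recursive CTU/CU quad-tree split described above, and the helper name and the crop-instead-of-pad behaviour are assumptions made for brevity.

```python
import numpy as np

def split_into_cus(frame: np.ndarray, cu_size: int = 16):
    """Yield ((row, col), block) for each cu_size x cu_size sub-block of a frame.

    Simplification: frames whose dimensions are not multiples of cu_size are
    cropped; an actual encoder would pad or use variable CU sizes instead.
    """
    h, w = frame.shape[:2]
    for y in range(0, h - h % cu_size, cu_size):
        for x in range(0, w - w % cu_size, cu_size):
            yield (y // cu_size, x // cu_size), frame[y:y + cu_size, x:x + cu_size]
```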
Referring back to fig. 1, at step 102, the edge density and texture complexity of the at least one coding unit are obtained. In addition to the original quantization parameters, the edge density and texture complexity of each coding unit may be obtained for use in subsequent steps 103 and 104.
Specifically, the edge density of the coding unit may be obtained by edge filtering and variance calculation; the edge filtering may be, for example, Sobel edge filtering.
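A minimal sketch of one possible edge-density computation follows. It assumes that "edge filtering and variance calculation" means Sobel-filtering the block and taking the variance of the resulting gradient magnitudes; the text does not fix the exact formula, so this interpretation and the function name are assumptions.

```python
import numpy as np

def edge_density(cu: np.ndarray) -> float:
    """Edge density of one coding unit, e.g. a 16x16 luma block (illustrative only)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)  # horizontal Sobel kernel
    ky = kx.T                                                              # vertical Sobel kernel
    block = cu.astype(np.float64)
    h, w = block.shape
    gm = np.zeros((h - 2, w - 2))
    for y in range(h - 2):
        for x in range(w - 2):
            patch = block[y:y + 3, x:x + 3]
            gx = float((patch * kx).sum())
            gy = float((patch * ky).sum())
            gm[y, x] = (gx * gx + gy * gy) ** 0.5
    # variance of the gradient magnitudes inside the block, used here as the edge density
    return float(gm.var())
```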
Optionally, the texture complexity is obtained by: traversing each image frame; for each coding unit of the current image frame, determining the gradient magnitude of each pixel of the coding unit and summing them to obtain the absolute texture complexity of the corresponding coding unit; determining an average value based on the absolute texture complexity of the at least one coding unit of the current image frame, resulting in the reference texture complexity of the current image frame; and, for each coding unit of the current image frame, determining the ratio of its absolute texture complexity to the reference texture complexity to obtain the texture complexity of the corresponding coding unit. By performing the calculation based on the gradient magnitude of each pixel of the coding unit, the texture detail inside each coding unit can be fully captured. By introducing the average over all coding units of one image frame as a reference and expressing texture complexity as a ratio, the parameter is normalized, which improves its generality and eases its later use. Specifically, the coding unit may be filtered with the horizontal and vertical convolution kernels shown in formula (1), i.e. a gradient calculation is performed to obtain the horizontal and vertical gradients at each pixel position of the coding unit, and the gradient magnitude GM_i of the corresponding pixel is then calculated as shown in formula (2).
The sum of the gradient magnitudes of a coding unit (i.e. its absolute texture complexity) GM_CU is calculated as shown in formula (3). After obtaining GM_CU, the average absolute texture complexity GM_avg over all coding units within one image frame is calculated as shown in formula (4). The final texture complexity GMR(i) of each coding unit is calculated as shown in formula (5).
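Since the figures carrying formulas (1)-(5) are not reproduced here, the following sketch shows the described computation under stated assumptions: simple central differences stand in for the convolution kernels of formula (1), the magnitude of formula (2) is taken as sqrt(gx² + gy²), and the function and variable names are illustrative.

```python
import numpy as np

def texture_complexity_gmr(frame: np.ndarray, cu_size: int = 16) -> np.ndarray:
    """Per-CU texture complexity GMR(i) for one frame (sketch of formulas (1)-(5))."""
    f = frame.astype(np.float64)
    # formulas (1)/(2): per-pixel horizontal/vertical gradients and magnitude GM_i
    gx = np.zeros_like(f)
    gy = np.zeros_like(f)
    gx[:, 1:-1] = f[:, 2:] - f[:, :-2]
    gy[1:-1, :] = f[2:, :] - f[:-2, :]
    gm = np.sqrt(gx * gx + gy * gy)

    rows, cols = f.shape[0] // cu_size, f.shape[1] // cu_size
    gm_cu = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            block = gm[r * cu_size:(r + 1) * cu_size, c * cu_size:(c + 1) * cu_size]
            gm_cu[r, c] = block.sum()      # formula (3): absolute texture complexity GM_CU
    gm_avg = gm_cu.mean()                  # formula (4): reference texture complexity GM_avg
    return gm_cu / max(gm_avg, 1e-9)       # formula (5): GMR(i) = GM_CU(i) / GM_avg
```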
It is understood that the edge density and texture complexity of each coding unit may be calculated by other tools and directly obtained for use by the video encoding method according to the exemplary embodiment of the present disclosure, or they may be calculated in step 102. In addition, step 102 is dedicated to acquiring parameters; it may be executed before steps 103 and 104, or only when step 103 or 104 needs the corresponding parameter, which is not limited by this disclosure.
In step 103, a first quantization parameter offset of the at least one coding unit is determined based on the edge density of the at least one coding unit. This step corresponds to the subjective tool CAQ described above; referring to fig. 2, the QP is adjusted using CAQ, and the first quantization parameter offset is determined from the edge density using an edge-adaptive quantization method. Edge density reflects the complexity of the coding unit's edges, and hence its internal texture complexity. The higher the edge density, the more complex the internal texture of the coding unit is considered to be; since the human eye is less sensitive to distortion in texture-complex regions, a positive offset can be added to the quantization parameter, i.e. the first quantization parameter offset is positive. The lower the edge density, the flatter the internal texture of the coding unit is considered to be; since the human eye is highly sensitive to distortion in flat-texture regions, a negative offset can be added to the quantization parameter, i.e. the first quantization parameter offset is negative.
Optionally, step 103 specifically includes: for each image frame, determining an average value based on the edge densities of all the coding units to obtain the average edge density of the corresponding image frame; determining the difference between the edge density of each coding unit and the average edge density of the corresponding image frame, and obtaining the first quantization parameter offset of the corresponding coding unit based on the difference and an intensity factor, where the intensity factor is positively correlated with the average edge density of the corresponding image frame. Computing the difference between a coding unit's edge density and the average edge density compares a single coding unit against the other coding units. If its edge density is above the average level, its texture complexity is high and the resulting first quantization parameter offset is positive, realizing a positive offset of the quantization parameter; if its edge density is below the average level, its texture complexity is low and the resulting first quantization parameter offset is negative, realizing a negative offset of the quantization parameter. Moreover, because the intensity factor is positively correlated with the average edge density, the adjustment is individualized per image frame while remaining consistent within a single image frame, yielding a more reasonable first quantization parameter offset. Specifically, for a coding unit, the first quantization parameter offset may be obtained from the difference between its edge density and the average edge density of the corresponding image frame together with the intensity factor, e.g. by taking the product of the difference and the intensity factor as the first quantization parameter offset.
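A minimal sketch of step 103 under stated assumptions: the concrete form strength = k × average edge density (with a constant k) and the ±3 QP clamp are illustrative guesses, since the text only requires the intensity factor to be positively correlated with the frame's average edge density and the offset to be the product of the difference and that factor.

```python
import numpy as np

def first_qp_offsets(edge_density_per_cu: np.ndarray, k: float = 0.05) -> np.ndarray:
    """First QP offset per CU for one frame (CAQ-style edge-adaptive quantization)."""
    avg_density = float(edge_density_per_cu.mean())
    strength = k * avg_density                                  # assumed: positively correlated with avg density
    offsets = strength * (edge_density_per_cu - avg_density)    # positive for busy CUs, negative for flat CUs
    return np.clip(offsets, -3.0, 3.0)                          # illustrative clamp, not from the text
```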
Referring back to fig. 1, at step 104, for each coding unit, a second quantization parameter offset for the corresponding coding unit is determined based on the texture complexity of the coding unit and the coding reference relationship between the coding unit and other coding units.
Optionally, step 104 specifically includes: if the texture complexity is determined to be less than or equal to a texture threshold, determining the second quantization parameter offset based on the coding reference relationship. Here the coding reference relationship includes a referenced degree, i.e. the degree to which the coding unit is referenced by other coding units in inter prediction. Referring to fig. 2, determining the second quantization parameter offset based on the coding unit's referenced degree may amount to adjusting the QP with the aforementioned CUTree, specifically to determine the second quantization parameter offset. The step of determining the second quantization parameter offset includes: determining the second quantization parameter offset based on the referenced degree, where the second quantization parameter offset is a negative value inversely related to the referenced degree. That is, when the quantization parameter is adjusted by CUTree, it is always adjusted downward, i.e. the second quantization parameter offset is negative; and the more heavily the coding unit is referenced, the more objectively important it is and the stronger the need to preserve its image detail, so the larger the reduction of the quantization parameter, i.e. the larger the absolute value of the second quantization parameter offset (the smaller the offset itself), inversely related to the referenced degree. Since CUTree can be used to adjust the QP directly, tool development cost is reduced and existing video encoding methods are easy to improve. Returning to step 104: when the texture complexity is small, the texture of the coding unit is considered relatively flat and the human eye is sensitive to it, and step 103 tends to shift its quantization parameter negatively, i.e. the first quantization parameter offset is negative. Directly superimposing CUTree at this point further reduces the quantization parameter without affecting the main direction of adjustment, and additionally reflects the influence of the referenced degree, which helps improve both the subjective and objective quality of regions the eye is sensitive to.
Optionally, referring to fig. 2, step 104 further includes: if the texture complexity is determined to be greater than the texture threshold and the coding unit belongs to a face region, determining the second quantization parameter offset based on the coding reference relationship; if the texture complexity is determined to be greater than the texture threshold and the coding unit does not belong to the face region, determining that the second quantization parameter offset is 0. It is understood that determining the second quantization parameter offset based on the coding unit's referenced degree may again amount to adjusting the QP with the aforementioned CUTree, which is not repeated here. When the texture complexity is high, the texture of the coding unit is considered relatively complex and the human eye is relatively insensitive to it, and step 103 tends to shift its quantization parameter positively, i.e. the first quantization parameter offset is positive. If CUTree were superimposed directly, the reduction might exceed the first quantization parameter offset obtained in step 103 and the quantization parameter would end up shifted negatively, degrading the subjective benefit of the CAQ tool. In addition, in some scenes the person or animal is far from the camera, so the face region is small; the edge-density algorithm may then judge the face region to be a texture-complex region and raise the quantization parameter (i.e. make the first quantization parameter offset positive), distorting the face region in a way that is subjectively very noticeable. By further checking the case of high texture complexity and continuing to superimpose CUTree for face regions, the quantization parameter of the corresponding coding unit can be pulled back, weakening the effect of step 103 on texture-complex face areas and improving the subjective quality of the face region. When the coding unit is judged not to belong to a face region, it can be treated as an ordinary texture-complex region, and not superimposing CUTree reduces the risk of degrading the subjective gain of the CAQ tool. Therefore, the video encoding method according to the exemplary embodiment of the present disclosure resolves the subjective-gain degradation that occurs when CUTree and CAQ are applied simultaneously, adaptively superimposes CUTree based on texture complexity, addresses the problem in existing CAQ of treating face regions as insensitive to the human eye, further improves the subjective and objective coding quality of eye-sensitive regions at the same bit rate, and improves coding efficiency.
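The branching just described can be summarized in a short sketch. The texture threshold of 1.0 (GMR is normalized around 1 per frame) and the function name are assumptions, and cutree_offset stands for the negative QP delta a CUTree-style pass would produce from the CU's referenced degree.

```python
def second_qp_offset(gmr: float, cutree_offset: float, in_face_region: bool,
                     texture_threshold: float = 1.0) -> float:
    """Second QP offset for one CU (sketch of the step-104 decision logic)."""
    if gmr <= texture_threshold:
        return cutree_offset        # flat texture: always superimpose the CUTree offset
    if in_face_region:
        return cutree_offset        # complex texture but a face region: still superimpose it
    return 0.0                      # complex texture, not a face region: skip the CUTree offset
```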
Optionally, the video encoding method according to an exemplary embodiment of the present disclosure further includes: acquiring the face area of each image frame; and determining, based on the positional relationship between each coding unit and the face area of the corresponding image frame, whether the corresponding coding unit belongs to the face area. Extracting the face region from the whole image frame ensures the accuracy of the extracted region. The extracted face region is then compared with the coding units of the frame, so whether each coding unit belongs to the face region can be determined without running face recognition on every coding unit, which improves both the accuracy and the speed of the decision. For example, a coding unit may be considered to belong to the face region when any part of it falls inside the face region; or only when the coding unit falls completely inside the face region; or a proportion threshold (e.g. 50%) may be configured, and the coding unit is considered to belong to the face region when more than that proportion of its area falls inside the face region. The present disclosure is not limited in this respect.
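A sketch of the proportion-threshold variant mentioned above, assuming axis-aligned (x, y, w, h) boxes; the 50% default follows the example in the text, and the any-overlap and fully-contained policies can be obtained by setting the threshold near 0 or to 1.

```python
def cu_in_face_region(cu_box, face_boxes, ratio_threshold: float = 0.5) -> bool:
    """Decide whether a coding unit belongs to the face region (boxes are (x, y, w, h))."""
    cx, cy, cw, ch = cu_box
    cu_area = cw * ch
    best_overlap = 0
    for fx, fy, fw, fh in face_boxes:
        ix = max(0, min(cx + cw, fx + fw) - max(cx, fx))   # horizontal intersection
        iy = max(0, min(cy + ch, fy + fh) - max(cy, fy))   # vertical intersection
        best_overlap = max(best_overlap, ix * iy)
    return cu_area > 0 and best_overlap >= ratio_threshold * cu_area
```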
Alternatively, the face region in an image frame may be obtained by image recognition. For example, MTCNN (Multi-task Cascaded Convolutional Neural Network), which can locate faces and facial landmarks, may be employed. As shown in fig. 3, the MTCNN network consists of three parts: P-Net (Proposal Network) quickly and roughly proposes candidate boxes, R-Net (Refine Network) screens the candidates and further excludes non-face boxes, and O-Net (Output Network) produces accurate boxes together with face and facial-landmark coordinates. Alternatively, face detection in OpenCV, YOLO-based face detection, and the like may be employed, which is not limited by the present disclosure.
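The embodiment above uses MTCNN; purely as an illustration of the OpenCV alternative also named above, a minimal face-detection sketch using the Haar-cascade detector shipped with opencv-python follows.

```python
import cv2

def detect_face_boxes(frame_bgr):
    """Return face boxes (x, y, w, h) for one reference frame (OpenCV Haar cascade)."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
```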
Optionally, the image frames of the video to be encoded include a plurality of reference frames and intermediate frames located between two adjacent reference frames; the face area of a reference frame is obtained by image recognition, and the face area of an intermediate frame is obtained from the face areas of the two corresponding reference frames through at least one of a track-translation algorithm and a skin-color-matching algorithm. By applying the image recognition algorithm only to reference frames extracted at intervals and tracking the face areas of intermediate frames from those reference frames, the computational complexity of the image recognition network (in particular a face recognition network) can be greatly reduced while the accuracy of face-area detection is preserved. As an example, one reference frame may be selected per 3 image frames. The track-translation algorithm and the skin-color-matching algorithm are mature existing algorithms and are not described here.
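A sketch of this reference-frame scheme under stated assumptions: one reference frame per 3 frames as in the example, linear interpolation of box coordinates as a stand-in for the track-translation algorithm (which is not detailed here), and an assumption that both reference frames contain the same number of faces in the same order.

```python
def face_boxes_per_frame(frames, detect, ref_interval: int = 3):
    """Detect faces on reference frames only and interpolate boxes for intermediate frames."""
    n = len(frames)
    boxes = [None] * n
    ref_idx = list(range(0, n, ref_interval))
    for i in ref_idx:
        boxes[i] = [tuple(b) for b in detect(frames[i])]      # image recognition on reference frames
    for a, b in zip(ref_idx, ref_idx[1:]):
        for i in range(a + 1, b):                             # intermediate frames between two references
            t = (i - a) / (b - a)
            boxes[i] = [tuple(int(round((1 - t) * pa + t * pb)) for pa, pb in zip(ba, bb))
                        for ba, bb in zip(boxes[a], boxes[b])]
    # frames after the last reference frame keep no box in this simplified sketch
    return boxes
```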
In step 105, the quantization parameter of the corresponding coding unit is adjusted based on the first quantization parameter offset and the second quantization parameter offset. The offsets obtained in steps 103 and 104 may be summed with the original quantization parameter, thereby adjusting the quantization parameter.
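Step 105 thus reduces to a sum; the sketch below adds rounding and a clamp to the 0-51 QP range used by H.264/HEVC, which is a safeguard assumed here rather than something the text specifies.

```python
def adjust_qp(base_qp: float, offset1: float, offset2: float) -> int:
    """Final per-CU quantization parameter: original QP plus both offsets."""
    return int(round(min(51.0, max(0.0, base_qp + offset1 + offset2))))
```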
In step 106, the video to be encoded is encoded according to the quantization parameter of each coding unit. Encoding with the adjusted quantization parameters adaptively superimposes CUTree, based on texture complexity, on top of CAQ; this addresses the problem in existing CAQ of treating face regions as insensitive to the human eye, further improves the subjective and objective coding quality of eye-sensitive regions at the same bit rate, and improves coding efficiency. The specific encoding process is a mature technique and is not described in detail here.
With the video encoding method according to the exemplary embodiment of the present disclosure, 20 professional subjective evaluators performed a subjective blind test on 100 sequences from the KwaiMp4 sequence set. With subjective quality held even (anchor : test = 50 : 50), the bit rate of the test relative to the anchor was reduced by 5%, while the encoding time increased by only 0.5%, which is negligible.
Fig. 4 is a block diagram illustrating a video encoding apparatus according to an exemplary embodiment of the present disclosure. It should be understood that the video encoding apparatus according to the exemplary embodiments of the present disclosure may be implemented in a terminal device such as a smart phone, a tablet computer, a Personal Computer (PC), in software, hardware, or a combination of software and hardware, or may be implemented in a device such as a server.
Referring to fig. 4, the video encoding apparatus 400 includes an acquisition unit 401, a first calculation unit 402, a second calculation unit 403, an adjustment unit 404, and an encoding unit 405.
The acquisition unit 401 may acquire a quantization parameter of each image frame of the video to be encoded, wherein each image frame includes at least one encoding unit. The video to be encoded is a processing object of the video encoding apparatus 400 according to an exemplary embodiment of the present disclosure, and encoding thereof may be implemented based on respective quantization parameters of each image frame constituting the video to be encoded, and the purpose of the present disclosure is to reasonably adjust quantization parameters of each encoding unit, thereby improving encoding efficiency. Specifically, one image frame may be divided into at least one CTU, each having a size of 64×64. Each CTU can be further divided into four smaller-sized coding units, and further subdivided, and the size of the coding units can be selected as desired for actual encoding.
The acquisition unit 401 may also acquire the edge density and texture complexity of at least one coding unit. In addition to the original quantization parameters, the edge density and texture complexity of each coding unit may be obtained for subsequent use by the first computing unit 402 and the second computing unit 403.
Specifically, the edge density of the coding unit may be obtained by edge filtering and variance calculation; the edge filtering may be, for example, Sobel edge filtering.
Optionally, the texture complexity is obtained by: traversing each image frame, determining the gradient amplitude of each pixel of each coding unit for each coding unit of the current image frame, and summing to obtain the absolute texture complexity of the corresponding coding unit; determining an average value based on absolute texture complexity of all coding units of the current image frame to obtain reference texture complexity of the current image frame; and determining the ratio of the absolute texture complexity to the reference texture complexity for each coding unit of the current image frame to obtain the texture complexity of the corresponding coding unit. By performing the calculation based on the gradient magnitude of each pixel of the coding unit, the texture details inside each coding unit can be sufficiently extracted. By introducing the average parameters of all the coding units in one image frame as a reference and characterizing the texture complexity in the form of a ratio, the parameters can be normalized, so that the universality of the parameters is improved, and the subsequent application is facilitated.
It is understood that the edge density and texture complexity of each coding unit may be calculated by other tools and directly obtained for use by the video encoding apparatus 400 according to the exemplary embodiment of the present disclosure, or they may be calculated by the acquisition unit 401. In addition, the acquisition unit 401 is dedicated to acquiring parameters; it may run before the first computing unit 402 and the second computing unit 403, or acquire a parameter only when the first computing unit 402 or the second computing unit 403 needs it, which is not limited by this disclosure.
The first computing unit 402 may determine the first quantization parameter offset of the at least one coding unit according to the edge density of the at least one coding unit. The first computing unit 402 corresponds to the subjective tool CAQ and may determine the first quantization parameter offset from the edge density using an edge-adaptive quantization method. Edge density reflects the complexity of the coding unit's edges, and hence its internal texture complexity. The higher the edge density, the more complex the internal texture of the coding unit is considered to be; since the human eye is less sensitive to distortion in texture-complex regions, a positive offset can be added to the quantization parameter, i.e. the first quantization parameter offset is positive. The lower the edge density, the flatter the internal texture of the coding unit is considered to be; since the human eye is highly sensitive to distortion in flat-texture regions, a negative offset can be added to the quantization parameter, i.e. the first quantization parameter offset is negative.
Optionally, the first computing unit 402 specifically performs the following actions: for each image frame, determining an average value based on the edge densities of all the coding units to obtain the average edge density of the corresponding image frame; determining the difference between the edge density of each coding unit and the average edge density of the corresponding image frame, and obtaining the first quantization parameter offset of the corresponding coding unit based on the difference and an intensity factor, where the intensity factor is positively correlated with the average edge density of the corresponding image frame. Computing the difference between a coding unit's edge density and the average edge density compares a single coding unit against the other coding units. If its edge density is above the average level, its texture complexity is high and the resulting first quantization parameter offset is positive, realizing a positive offset of the quantization parameter; if its edge density is below the average level, its texture complexity is low and the resulting first quantization parameter offset is negative, realizing a negative offset of the quantization parameter. Moreover, because the intensity factor is positively correlated with the average edge density, the adjustment is individualized per image frame while remaining consistent within a single image frame, yielding a more reasonable first quantization parameter offset. Specifically, for a coding unit, the first quantization parameter offset may be obtained from the difference between its edge density and the average edge density of the corresponding image frame together with the intensity factor, e.g. by taking the product of the difference and the intensity factor as the first quantization parameter offset.
The second calculation unit 403 may determine, for each coding unit, a second quantization parameter offset for the corresponding coding unit based on the texture complexity of the coding unit and the coding reference relationship between the coding unit and other coding units.
Optionally, the second computing unit 403 specifically performs the following actions: if the texture complexity is determined to be less than or equal to a texture threshold, determining the second quantization parameter offset based on the coding reference relationship. Here the coding reference relationship includes a referenced degree, i.e. the degree to which the coding unit is referenced by other coding units in inter prediction. Determining the second quantization parameter offset based on the coding unit's referenced degree may amount to adjusting the QP with the aforementioned CUTree, specifically to determine the second quantization parameter offset. The act of determining the second quantization parameter offset includes: determining the second quantization parameter offset based on the referenced degree, where the second quantization parameter offset is a negative value inversely related to the referenced degree. That is, when the quantization parameter is adjusted by CUTree, it is always adjusted downward, i.e. the second quantization parameter offset is negative; and the more heavily the coding unit is referenced, the more objectively important it is and the stronger the need to preserve its image detail, so the larger the reduction of the quantization parameter, i.e. the larger the absolute value of the second quantization parameter offset (the smaller the offset itself), inversely related to the referenced degree. Since CUTree can be used to adjust the QP directly, tool development cost is reduced and existing video encoding methods are easy to improve. Returning to the second computing unit 403: when the texture complexity is small, the texture of the coding unit is considered relatively flat and the human eye is sensitive to it, and the first computing unit 402 tends to shift its quantization parameter negatively, i.e. the first quantization parameter offset is negative. At this point the second computing unit 403 may directly superimpose CUTree to further reduce the quantization parameter without affecting the main direction of adjustment, additionally reflecting the influence of the referenced degree, which helps improve both the subjective and objective quality of regions the eye is sensitive to.
Optionally, the second computing unit 403 may specifically perform the following actions: if the texture complexity is determined to be greater than the texture threshold and the coding unit belongs to a face region, determining the second quantization parameter offset based on the coding reference relationship; if the texture complexity is determined to be greater than the texture threshold and the coding unit does not belong to the face region, determining that the second quantization parameter offset is 0. It is understood that determining the second quantization parameter offset based on the coding unit's referenced degree may again amount to adjusting the QP with the aforementioned CUTree, which is not repeated here. When the texture complexity is high, the texture of the coding unit is considered relatively complex and the human eye is relatively insensitive to it, and the first computing unit 402 tends to shift its quantization parameter positively, i.e. the first quantization parameter offset is positive. If CUTree were superimposed directly, the reduction might exceed the first quantization parameter offset obtained by the first computing unit 402 and the quantization parameter would end up shifted negatively, degrading the subjective benefit of the CAQ tool. In addition, in some scenes the person or animal is far from the camera, so the face region is small; the edge-density algorithm may then judge the face region to be a texture-complex region and raise the quantization parameter (i.e. make the first quantization parameter offset positive), distorting the face region in a way that is subjectively very noticeable. By further checking the case of high texture complexity and continuing to superimpose CUTree for face regions, the quantization parameter of the corresponding coding unit can be pulled back, weakening the effect of the first computing unit 402 on texture-complex face areas and improving the subjective quality of the face region. When the coding unit is judged not to belong to a face region, it can be treated as an ordinary texture-complex region, and not superimposing CUTree reduces the risk of degrading the subjective gain of the CAQ tool. Therefore, with the video encoding apparatus 400 according to the exemplary embodiment of the present disclosure, the subjective-gain degradation that occurs when CUTree and CAQ are applied simultaneously can be resolved, CUTree is adaptively superimposed based on texture complexity, the problem in existing CAQ of treating face regions as insensitive to the human eye is addressed, the subjective and objective coding quality of eye-sensitive regions is further improved at the same bit rate, and coding efficiency is improved.
Optionally, the acquisition unit 401 may also acquire the face area of each image frame, and the second computing unit 403 may determine, based on the positional relationship between each coding unit and the face area of the corresponding image frame, whether the corresponding coding unit belongs to the face area. Extracting the face region from the whole image frame ensures the accuracy of the extracted region. The extracted face region is then compared with the coding units of the frame, so whether each coding unit belongs to the face region can be determined without running face recognition on every coding unit, which improves both the accuracy and the speed of the decision. For example, the second computing unit 403 may consider a coding unit to belong to the face region when any part of it falls inside the face region; or only when the coding unit falls completely inside the face region; or a proportion threshold (e.g. 50%) may be configured, and the coding unit is considered to belong to the face region when more than that proportion of its area falls inside the face region. The present disclosure is not limited in this respect.
Optionally, the face region in the image frame may be obtained by image recognition.
Optionally, the image frames of the video to be encoded include a plurality of reference frames and intermediate frames located between two adjacent reference frames; the face region of each reference frame is obtained by image recognition, and the face region of an intermediate frame is obtained by at least one of a trajectory panning algorithm and a skin color matching algorithm based on the face regions of the two corresponding reference frames. By extracting reference frames only at intervals, applying the image recognition algorithm to those reference frames, and tracking the face region of the intermediate frames from them, the computational complexity of the image recognition network (in particular a face recognition network) can be greatly reduced while the accuracy of face region detection is still ensured. The trajectory panning algorithm and the skin color matching algorithm are existing mature algorithms and are not described here.
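One possible reading of the trajectory panning step is a linear interpolation of the face bounding box between the two surrounding reference frames; the sketch below is an assumption for illustration only and omits the skin color matching refinement:

    def interpolate_face_box(box_prev, box_next, t):
        """box_prev, box_next: face boxes (x0, y0, x1, y1) of the two reference
        frames; t in [0, 1] is the relative position of the intermediate frame."""
        return tuple(round((1 - t) * p + t * n)
                     for p, n in zip(box_prev, box_next))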
The adjusting unit 404 may adjust the quantization parameter of the corresponding coding unit based on the first quantization parameter offset and the second quantization parameter offset. For example, the quantization parameter offsets obtained by the first computing unit 402 and the second computing unit 403 may be summed with the original quantization parameter, thereby implementing the adjustment of the quantization parameter.
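A minimal sketch of this adjustment, assuming the two offsets are simply added to the frame-level quantization parameter and clipped to a valid range (the 0-51 range is an assumption borrowed from common codecs, not part of the disclosure):

    def adjust_qp(frame_qp, first_offset, second_offset, qp_min=0, qp_max=51):
        # Sum the original quantization parameter with both offsets,
        # then clip to the codec's valid quantization parameter range.
        qp = frame_qp + first_offset + second_offset
        return max(qp_min, min(qp_max, qp))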
The encoding unit 405 may encode the video to be encoded according to the quantization parameter of each coding unit. By encoding the video to be encoded with the adjusted quantization parameters, the CUTree offset can be adaptively superimposed on top of CAQ based on texture complexity, the problem that the existing CAQ identifies the face region as a human-eye-insensitive region can be solved, the subjective and objective encoding quality of human-eye-sensitive regions can be further improved at the same code rate, and the encoding efficiency can be improved. The specific encoding process is a mature technology and is not described in detail here.
Fig. 5 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Referring to fig. 5, an electronic device 500 includes at least one memory 501 and at least one processor 502. The at least one memory 501 stores a set of computer-executable instructions that, when executed by the at least one processor 502, cause the at least one processor 502 to perform a video encoding method according to an exemplary embodiment of the present disclosure.
By way of example, the electronic device 500 may be a PC, a tablet device, a personal digital assistant, a smart phone, or another device capable of executing the above-described set of instructions. Here, the electronic device 500 is not necessarily a single electronic device, but may be any apparatus or collection of circuits capable of executing the above-described instructions (or instruction sets) individually or in combination. The electronic device 500 may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (e.g., via wireless transmission).
In the electronic device 500, the processor 502 may include a central processing unit (CPU), a graphics processing unit (GPU), a programmable logic device, a special-purpose processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, the processor may also include an analog processor, a digital processor, a microprocessor, a multi-core processor, a processor array, a network processor, and the like.
The processor 502 may execute instructions or code stored in the memory 501, wherein the memory 501 may also store data. The instructions and data may also be transmitted and received over a network via a network interface device, which may employ any known transmission protocol.
The memory 501 may be integrated with the processor 502, for example, RAM or flash memory disposed within an integrated circuit microprocessor or the like. In addition, memory 501 may include a stand-alone device, such as an external disk drive, a storage array, or other storage device usable by any database system. The memory 501 and the processor 502 may be operatively coupled or may communicate with each other, for example, through an I/O port, network connection, etc., such that the processor 502 is able to read files stored in the memory.
In addition, the electronic device 500 may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the electronic device 500 may be connected to each other via a bus and/or a network.
According to an exemplary embodiment of the present disclosure, there may also be provided a computer-readable storage medium storing instructions which, when executed by at least one processor, cause the at least one processor to perform a video encoding method according to an exemplary embodiment of the present disclosure. Examples of the computer-readable storage medium here include: read-only memory (ROM), random-access programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or optical disc storage, hard disk drives (HDD), solid-state drives (SSD), card memory (such as multimedia cards, Secure Digital (SD) cards, or eXtreme Digital (XD) cards), magnetic tape, floppy disks, magneto-optical data storage devices, hard disks, solid-state disks, and any other device configured to store a computer program and any associated data, data files, and data structures in a non-transitory manner and to provide the computer program and any associated data, data files, and data structures to a processor or computer so that the processor or computer can execute the program. The computer program in the computer-readable storage medium described above can be run in an environment deployed in a computer device, such as a client, a host, a proxy device, or a server. Further, in one example, the computer program and any associated data, data files, and data structures are distributed across networked computer systems, such that the computer program and any associated data, data files, and data structures are stored, accessed, and executed in a distributed fashion by one or more processors or computers.
According to an exemplary embodiment of the present disclosure, there may also be provided a computer program product comprising computer instructions which, when executed by at least one processor, cause the at least one processor to perform a video encoding method according to an exemplary embodiment of the present disclosure.
According to the video encoding method, apparatus, electronic device, and computer-readable storage medium of the exemplary embodiments of the present disclosure, the edge density of each coding unit is first used to obtain the first quantization parameter offset for adjusting the quantization parameter. Since edge density is related to how sensitive the human eye is to texture distortion, coding units to which the human eye is more sensitive can have their quantization parameter appropriately lowered to improve the subjective quality of encoding, while coding units to which the human eye is relatively insensitive can have their quantization parameter appropriately raised to save code rate, so that the subjective quality of encoding is improved at a fixed code rate. In addition, the referenced degree of a coding unit reflects its importance in the encoding process, and adjusting the quantization parameter in combination with the referenced degree can improve the objective quality of encoding. However, if the two adjustment strategies are superimposed directly, their adjustment directions may be opposite and cancel each other out. By introducing the texture complexity, which also reflects human-eye sensitivity, and using it to decide whether to superimpose the influence of the referenced degree on top of the edge density when adjusting the quantization parameter, the subjective quality of encoding is guaranteed first, the objective quality of encoding is improved as much as possible, and the encoding efficiency is improved.
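As a rough sketch of the edge-density branch described above, assuming the first quantization parameter offset is linear in the difference from the average edge density and the intensity factor is simply proportional to that average (both the linear form and the scale parameter are assumptions for illustration only):

    def first_qp_offsets(edge_densities, scale=1.0):
        """edge_densities: one value per coding unit of a single image frame
        (assumed non-empty)."""
        avg = sum(edge_densities) / len(edge_densities)
        intensity = scale * avg  # positively correlated with the average edge density
        # Above-average edge density (texture the eye is less sensitive to) gets a
        # positive offset (coarser quantization); below-average gets a negative one.
        return [intensity * (d - avg) for d in edge_densities]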
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (16)

1. A video encoding method, comprising:
acquiring quantization parameters of each image frame of a video to be encoded, wherein each image frame comprises at least one encoding unit;
acquiring the edge density and texture complexity of the at least one coding unit;
determining a first quantization parameter offset of the at least one coding unit according to an edge density of the at least one coding unit;
For each coding unit, determining a second quantization parameter offset of the corresponding coding unit based on the texture complexity of the coding unit and coding reference relationships between the coding unit and other coding units;
adjusting the quantization parameter of the corresponding coding unit based on the first quantization parameter offset and the second quantization parameter offset;
coding the video to be coded according to the quantization parameter of each coding unit;
the determining the second quantization parameter offset of the corresponding coding unit based on the texture complexity of the coding unit and the coding reference relationship between the coding unit and other coding units includes:
if the texture complexity is determined to be less than or equal to a texture threshold value, determining the second quantization parameter offset based on the coding reference relationship.
2. The method of video coding according to claim 1, wherein said determining a second quantization parameter offset for a corresponding coding unit based on a texture complexity of the coding unit and coding reference relationships between the coding unit and other coding units, further comprises:
If the texture complexity is determined to be greater than the texture threshold and the coding unit belongs to a face region, determining the second quantization parameter offset based on the coding reference relationship;
if the texture complexity is determined to be greater than the texture threshold and the coding unit does not belong to the face region, determining that the second quantization parameter offset is 0.
3. The video coding method of claim 2, wherein the video coding method further comprises:
acquiring a face region of each image frame;
determining, based on the positional relationship between each of the coding units and the face region of the corresponding image frame, whether the corresponding coding unit belongs to the face region.
4. The video encoding method according to claim 3, wherein the image frame of the video to be encoded includes a plurality of reference frames and an intermediate frame located between two adjacent ones of the reference frames, the face regions of the reference frames being obtained by image recognition, the face regions of the intermediate frame being obtained by at least one of a trajectory panning algorithm and a skin tone matching algorithm based on the face regions of the corresponding two reference frames.
5. The video coding method of any of claims 1-4, wherein the coding reference relationship comprises a referenced degree, the referenced degree being the degree to which the coding unit is referenced by other coding units in inter prediction, the determining the second quantization parameter offset based on the coding reference relationship comprising:
Determining the second quantization parameter offset based on the referenced degree, wherein the second quantization parameter offset is a negative value that is inversely related to the referenced degree.
6. The video coding method according to any of claims 1 to 4, wherein the texture complexity is obtained by:
traversing each image frame, and for each coding unit of the current image frame, determining the gradient amplitude of each pixel of the coding unit and summing the gradient amplitudes to obtain the absolute texture complexity of the corresponding coding unit;
determining an average value based on the absolute texture complexity of the at least one coding unit of the current image frame, resulting in a reference texture complexity of the current image frame;
and determining the ratio of the absolute texture complexity to the reference texture complexity for each coding unit of the current image frame to obtain the texture complexity of the corresponding coding unit.
7. The video coding method of any of claims 1-4, wherein the determining a first quantization parameter offset for the at least one coding unit based on an edge density of the at least one coding unit comprises:
For each image frame, determining an average value based on the edge density of the at least one coding unit, to obtain an average edge density of the corresponding image frame;
determining a difference between the edge density of each coding unit and the average edge density of the corresponding image frame, and obtaining the first quantization parameter offset of the corresponding coding unit based on the difference and an intensity factor, wherein the intensity factor is positively correlated with the average edge density of the corresponding image frame.
8. A video encoding apparatus, comprising:
an acquisition unit configured to: acquiring quantization parameters of each image frame of a video to be encoded, wherein each image frame comprises at least one encoding unit;
the acquisition unit is further configured to: acquiring the edge density and texture complexity of the at least one coding unit;
a first computing unit configured to: determining a first quantization parameter offset of the at least one coding unit according to an edge density of the at least one coding unit;
a second computing unit configured to: for each coding unit, determining a second quantization parameter offset of the corresponding coding unit based on the texture complexity of the coding unit and coding reference relationships between the coding unit and other coding units;
An adjustment unit configured to: adjusting the quantization parameter of the corresponding coding unit based on the first quantization parameter offset and the second quantization parameter offset;
an encoding unit configured to: coding the video to be coded according to the quantization parameter of each coding unit;
the second computing unit is further configured to:
if the texture complexity is determined to be less than or equal to a texture threshold value, determining the second quantization parameter offset based on the coding reference relationship.
9. The video encoding device of claim 8, wherein the second computing unit is further configured to:
if the texture complexity is determined to be greater than the texture threshold and the coding unit belongs to a face region, determining the second quantization parameter offset based on the coding reference relationship;
if the texture complexity is determined to be greater than the texture threshold and the coding unit does not belong to the face region, determining that the second quantization parameter offset is 0.
10. The video encoding apparatus of claim 9, wherein,
the acquisition unit is further configured to: acquiring a face region of each image frame;
the second computing unit is further configured to: determining, based on the positional relationship between each of the coding units and the face region of the corresponding image frame, whether the corresponding coding unit belongs to the face region.
11. The video encoding apparatus of claim 10, wherein the image frame of the video to be encoded includes a plurality of reference frames and an intermediate frame located between two adjacent ones of the reference frames, the face regions of the reference frames being obtained by image recognition, the face regions of the intermediate frame being obtained by at least one of a trajectory panning algorithm and a skin tone matching algorithm based on the face regions of the corresponding two reference frames.
12. The video coding apparatus according to any one of claims 8 to 11, wherein the coding reference relationship includes a referenced degree, the referenced degree being the degree to which the coding unit is referenced by other coding units in inter prediction, the second computing unit being further configured to:
determining the second quantization parameter offset based on the referenced degree, wherein the second quantization parameter offset is a negative value that is inversely related to the referenced degree.
13. The video coding device according to any of claims 8 to 11, wherein the texture complexity is obtained by:
traversing each image frame, and for each coding unit of the current image frame, determining the gradient amplitude of each pixel of the coding unit and summing the gradient amplitudes to obtain the absolute texture complexity of the corresponding coding unit;
determining an average value based on the absolute texture complexity of the at least one coding unit of the current image frame, resulting in a reference texture complexity of the current image frame;
and determining the ratio of the absolute texture complexity to the reference texture complexity for each coding unit of the current image frame to obtain the texture complexity of the corresponding coding unit.
14. The video encoding device of any one of claims 8 to 11, wherein the first computing unit is further configured to:
for each image frame, determining an average value based on the edge density of the at least one coding unit, to obtain an average edge density of the corresponding image frame;
determining a difference between the edge density of each coding unit and the average edge density of the corresponding image frame, and obtaining the first quantization parameter offset of the corresponding coding unit based on the difference and an intensity factor, wherein the intensity factor is positively correlated with the average edge density of the corresponding image frame.
15. An electronic device, comprising:
at least one processor;
at least one memory storing computer-executable instructions,
wherein the computer executable instructions, when executed by the at least one processor, cause the at least one processor to perform the video encoding method of any of claims 1 to 7.
16. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by at least one processor, cause the at least one processor to perform the video encoding method of any one of claims 1 to 7.
CN202111572857.7A 2021-12-21 2021-12-21 Video encoding method, apparatus, electronic device, and computer-readable storage medium Active CN114222121B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111572857.7A CN114222121B (en) 2021-12-21 2021-12-21 Video encoding method, apparatus, electronic device, and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN114222121A CN114222121A (en) 2022-03-22
CN114222121B true CN114222121B (en) 2023-11-14

Family

ID=80704779

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111572857.7A Active CN114222121B (en) 2021-12-21 2021-12-21 Video encoding method, apparatus, electronic device, and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN114222121B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6307975B1 (en) * 1996-11-05 2001-10-23 Sony Corporation Image coding technique employing shape and texture coding
CN106170979A (en) * 2014-04-30 2016-11-30 英特尔公司 Constant Quality video encodes
CN110324708A (en) * 2019-07-16 2019-10-11 浙江大华技术股份有限公司 Method for processing video frequency, terminal device and computer storage medium
CN111901597A (en) * 2020-08-05 2020-11-06 杭州当虹科技股份有限公司 CU (CU) level QP (quantization parameter) allocation algorithm based on video complexity
CN111988611A (en) * 2020-07-24 2020-11-24 北京达佳互联信息技术有限公司 Method for determining quantization offset information, image coding method, image coding device and electronic equipment
CN112165620A (en) * 2020-09-24 2021-01-01 北京金山云网络技术有限公司 Video encoding method and device, storage medium and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007091588A1 (en) * 2006-02-09 2007-08-16 Nec Corporation Dynamic image decoding device, decoded image recording device, their method, and program

Also Published As

Publication number Publication date
CN114222121A (en) 2022-03-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant