CN115428451A

CN115428451A - Video encoding method, encoder, system, and computer storage medium

Info

Publication number: CN115428451A
Application number: CN202080099999.3A
Authority: CN
Inventors: 元辉; 周兰; 李明; 姜东冉
Original assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date: 2020-07-31
Filing date: 2020-07-31
Publication date: 2022-12-02
Also published as: WO2022021422A1

Abstract

The embodiment of the application discloses a video coding method, a coder, a system and a computer storage medium, wherein the method comprises the following steps: determining a pre-parameter of a video to be coded, and determining a first Lagrangian multiplier and a second Lagrangian multiplier according to the pre-parameter; determining a target Lagrangian multiplier according to the first Lagrangian multiplier and the second Lagrangian multiplier; determining a first distortion value according to a first distortion metric criterion, wherein the first distortion metric criterion comprises a semantic distortion metric criterion; determining a second distortion value according to a second distortion metric criterion, wherein the second distortion metric criterion comprises a numerical error metric criterion; determining a target distortion value according to the first distortion value and the second distortion value; and determining the encoding parameters of the video to be encoded by using the target Lagrange multiplier and the target distortion value, and encoding the video to be encoded.

Description

Video encoding method, encoder, system, and computer storage medium

Technical Field

Embodiments of the present disclosure relate to the field of video encoding and decoding technologies, and in particular, to a video encoding method, an encoder, a system, and a computer storage medium.

Background

Currently, the International Telecommunication Union (ITU) and the International Organization for Standardization (ISO) have established the Joint Video Experts Team (JVET) to develop the latest video coding standard, H.266/Versatile Video Coding (VVC). H.266/VVC improves performance by about 40% compared with H.265/High Efficiency Video Coding (HEVC), and is the leading video compression solution in the industry.

Generally speaking, for the same video coding algorithm, the higher the code rate, the better the reconstructed video quality and the smaller the distortion; however, a higher code rate also means that the encoded file occupies more storage space. A balance point therefore needs to be found between the distortion and the code rate of the reconstructed video through Rate Distortion Optimization (RDO), so that the compression effect is optimal.

However, in the current related art, the rate-distortion optimization algorithm either guarantees only the fidelity of the reconstructed video, or guarantees its subjective quality at the cost of greatly reduced fidelity. In particular, in video coding oriented to machine vision and human-machine vision, the distortion criterion adopted by the existing rate-distortion optimization algorithm is single and incomplete, so the algorithm cannot adapt well to such application scenarios.

Disclosure of Invention

The embodiment of the application provides a video coding method, a video coder, a video coding system and a computer storage medium, which can be well suitable for application scenes facing machine vision and human-machine vision, can improve the semantic segmentation accuracy of a reconstructed video under the condition of a certain code rate, and can keep better fidelity performance at the same time, thereby improving the coding efficiency.

The technical scheme of the embodiment of the application can be realized as follows:

In a first aspect, an embodiment of the present application provides a video encoding method applied to an encoder, where the method includes:

determining a pre-parameter of a video to be coded, and determining a first Lagrange multiplier and a second Lagrange multiplier according to the pre-parameter;

determining a target Lagrangian multiplier according to the first Lagrangian multiplier and the second Lagrangian multiplier;

determining a first distortion value according to a first distortion metric criterion, wherein the first distortion metric criterion comprises a semantic distortion metric criterion;

determining a second distortion value based on a second distortion metric criterion, wherein the second distortion metric criterion comprises a numerical error metric criterion;

determining a target distortion value according to the first distortion value and the second distortion value;

and determining the encoding parameters of the video to be encoded by using the target Lagrange multiplier and the target distortion value, and encoding the video to be encoded.

In a second aspect, an embodiment of the present application provides an encoder, which includes a determining unit, a calculating unit, and an encoding unit; wherein,

the determining unit is configured to determine a pre-parameter of a video to be encoded, and determine a first Lagrangian multiplier and a second Lagrangian multiplier according to the pre-parameter;

the calculating unit is configured to determine a target Lagrangian multiplier according to the first Lagrangian multiplier and the second Lagrangian multiplier;

the determining unit is further configured to determine a first distortion value according to a first distortion metric criterion, wherein the first distortion metric criterion comprises a semantic distortion metric criterion; and determining a second distortion value according to a second distortion metric criterion, wherein the second distortion metric criterion comprises a numerical error metric criterion;

the calculating unit is further configured to determine a target distortion value according to the first distortion value and the second distortion value;

the encoding unit is configured to determine an encoding parameter of the video to be encoded by using the target lagrangian multiplier and the target distortion value, and encode the video to be encoded.

In a third aspect, an embodiment of the present application provides an encoder, including a memory and a processor; wherein,

the memory for storing a computer program operable on the processor;

the processor, when executing the computer program, is adapted to perform the method according to the first aspect.

In a fourth aspect, embodiments of the present application provide a computer storage medium storing a computer program, which when executed by at least one processor implements the method according to the first aspect.

In a fifth aspect, embodiments of the present application provide a video system comprising an encoder and a decoder; wherein,

the encoder is configured to determine a pre-parameter of a video to be encoded, and determine a first Lagrangian multiplier and a second Lagrangian multiplier according to the pre-parameter; determining a target Lagrangian multiplier according to the first Lagrangian multiplier and the second Lagrangian multiplier; and determining a first distortion value according to a first distortion metric criterion, wherein the first distortion metric criterion comprises a semantic distortion metric criterion; and determining a second distortion value according to a second distortion metric criterion, wherein the second distortion metric criterion comprises a numerical error metric criterion; determining a target distortion value according to the first distortion value and the second distortion value; determining encoding parameters of the video to be encoded by using the target Lagrange multiplier and the target distortion value, encoding the video to be encoded to generate a code stream, and transmitting the code stream to the decoder;

the decoder is configured to parse the code stream to obtain a decoded video.

The embodiment of the application provides a video coding method, a coder, a system and a computer storage medium, wherein a first Lagrange multiplier and a second Lagrange multiplier are determined according to a pre-parameter of a video to be coded; determining a target Lagrangian multiplier according to the first Lagrangian multiplier and the second Lagrangian multiplier; determining a first distortion value according to a first distortion metric criterion, wherein the first distortion metric criterion comprises a semantic distortion metric criterion; determining a second distortion value according to a second distortion metric criterion, wherein the second distortion metric criterion comprises a numerical error metric criterion; determining a target distortion value according to the first distortion value and the second distortion value; and determining the encoding parameters of the video to be encoded by using the target Lagrange multiplier and the target distortion value, and encoding the video to be encoded. Therefore, rate distortion optimization is carried out by comprehensively considering the first distortion measurement criterion based on the semantic distortion measurement and the second distortion measurement criterion based on the numerical error measurement in video coding, the method can be well suitable for application scenes facing machine vision and man-machine vision, the semantic segmentation accuracy of the reconstructed video can be improved under the condition of a certain code rate, and meanwhile, better fidelity performance can be kept, so that the coding efficiency is improved.

Drawings

Fig. 1 is a schematic diagram of an RD curve provided in the related art;

Fig. 2 is a schematic block diagram of the system composition of an encoder according to an embodiment of the present application;

Fig. 3 is a flowchart illustrating a video encoding method according to an embodiment of the present application;

Fig. 4 is a schematic diagram of a curve of a functional relationship between a first distortion value and a code rate according to an embodiment of the present application;

Fig. 5 is a schematic diagram of a curve of a functional relationship between a code rate and a quantization parameter according to an embodiment of the present application;

Fig. 6 is a schematic diagram of a curve of a functional relationship between a first distortion value and an MSE according to an embodiment of the present application;

Fig. 7 is a schematic structural diagram of an encoder according to an embodiment of the present application;

Fig. 8 is a schematic hardware structure diagram of an encoder according to an embodiment of the present application;

Fig. 9 is a schematic structural diagram of a video system according to an embodiment of the present application.

Detailed Description

To provide a more detailed understanding of the features and technical content of the embodiments of the present application, the embodiments are described below with reference to the accompanying drawings, which are provided for illustration only and are not intended to limit the embodiments of the present application.

With the development of the digital media era, continuous media data transmission via network has become a trend, and more users want to perform video communication and service via internet and wireless network by using Personal Computers (PCs) and non-PC devices, and such video communication and service at any time and any place presents a greater challenge to the current video coding technology.

It should be understood that the International Telecommunication Union (ITU) and the International Organization for Standardization (ISO) have established the Joint Video Experts Team (JVET) to develop the next-generation video coding standard, H.266/Versatile Video Coding (VVC). With the technology accumulated in the industry so far, H.266/VVC improves performance by about 40% compared with H.265/HEVC, and is currently the leading video compression solution in the industry.

Generally speaking, for the same video coding algorithm, the higher the code rate, the better the reconstructed video quality and the smaller the distortion; however, a higher code rate also means that the encoded file occupies more storage space. A balance point therefore needs to be found between the distortion and the code rate of the reconstructed video through a rate-distortion optimization algorithm, so that the compression effect is optimal.

It should be noted that rate-distortion optimization can be expressed as minimizing distortion of a video reconstructed by decoding when a coded file does not exceed a certain code rate, as shown in the following formula (1).

$$\min\{D\} \quad \text{s.t.} \quad R \le R_{\max} \tag{1}$$

Where D and R represent distortion and code rate, respectively, under certain coding parameters.

The video is encoded with given encoding parameters, and the encoded code rate (R) and the distortion (D) of the reconstructed video are calculated. By changing the encoding parameters and repeatedly encoding the video to be encoded, a plurality of R-D points consisting of code rate and distortion can be obtained, as shown in fig. 1. In general, for a predetermined code rate, the point with the minimum distortion will appear on the convex curve (i.e., the RD curve) in fig. 1. For the input video to be encoded, the encoder needs to determine a set of encoding parameters, so that the encoded R-D points can approximate the convex curve as much as possible.

The constrained problem of formula (1) can be converted into an unconstrained problem by the Lagrange multiplier method, as shown in formula (2).

$$\min\{J = D + \lambda \cdot R\} \tag{2}$$

Where λ represents the Lagrange multiplier and J represents the rate-distortion cost function. Each value of λ corresponds to the magnitude of the slope of a tangent to the RD curve; for a given λ, the encoder can find the optimal encoding parameters by minimizing the rate-distortion cost function.

Thus, using the rate-distortion optimization algorithm, the encoder can determine the optimal block partitioning mode, the optimal intra-frame prediction mode, and the optimal inter-frame prediction mode (including motion vectors, reference images, prediction weights, etc.) to achieve the optimal encoding performance.
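As an illustration of how formula (2) drives a parameter decision, the following minimal sketch selects, from a handful of trial encodings, the one with the smallest cost J = D + λ·R. The class `Candidate`, the functions `rd_cost` and `select_best`, and all numeric values are hypothetical and serve only to illustrate the principle; they are not taken from any encoder implementation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Candidate:
    """One trial encoding of a block (all values are illustrative)."""
    name: str          # hypothetical label for a coding-parameter choice
    distortion: float  # D, e.g. the SSE of the reconstructed block
    rate: float        # R, the bits spent on the block


def rd_cost(candidate: Candidate, lam: float) -> float:
    """Rate-distortion cost J = D + lambda * R, as in formula (2)."""
    return candidate.distortion + lam * candidate.rate


def select_best(candidates: Sequence[Candidate], lam: float) -> Candidate:
    """Return the candidate with the smallest rate-distortion cost."""
    return min(candidates, key=lambda c: rd_cost(c, lam))


candidates = [
    Candidate("coarse", distortion=900.0, rate=40.0),
    Candidate("medium", distortion=400.0, rate=120.0),
    Candidate("fine", distortion=100.0, rate=400.0),
]

# A small lambda favours low distortion; a large lambda favours low rate.
print(select_best(candidates, lam=0.5).name)   # fine
print(select_best(candidates, lam=2.0).name)   # medium
print(select_best(candidates, lam=10.0).name)  # coarse
```

Sweeping λ in this way moves the selected operating point along the RD curve, which is why the choice of λ matters as much as the choice of distortion measure.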

In related art schemes (such as VVC), rate-distortion optimization uses the Sum of Squared Errors (SSE) as the distortion criterion, and the corresponding reconstructed video quality can be measured by the Peak Signal-to-Noise Ratio (PSNR). SSE distortion objectively measures the fidelity of a video, and is calculated as shown in formula (3).

$$SSE = \sum_{x=1}^{M} \sum_{y=1}^{N} \left[ f(x, y) - g(x, y) \right]^2 \tag{3}$$

Where M and N represent the horizontal and vertical spatial resolution of the video, respectively, f(x, y) represents the original pixel value at pixel location (x, y), and g(x, y) represents the reconstructed pixel value at pixel location (x, y).
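The SSE distortion and the PSNR derived from it can be sketched as follows. This is a minimal illustration that assumes images are given as nested lists of integer pixel values; the function names `sse` and `psnr` and the sample pixel values are hypothetical, and the PSNR uses the standard definition based on the mean squared error (SSE divided by the number of pixels).

```python
import math


def sse(original, reconstructed):
    """Sum of squared errors between two equally sized 2-D pixel arrays."""
    if len(original) != len(reconstructed):
        raise ValueError("images must have the same number of rows")
    total = 0
    for row_f, row_g in zip(original, reconstructed):
        if len(row_f) != len(row_g):
            raise ValueError("rows must have the same length")
        total += sum((f - g) ** 2 for f, g in zip(row_f, row_g))
    return total


def psnr(original, reconstructed, bit_depth=8):
    """PSNR in dB, computed from the mean squared error (SSE / pixel count)."""
    num_pixels = sum(len(row) for row in original)
    mse = sse(original, reconstructed) / num_pixels
    if mse == 0:
        return math.inf  # identical images
    peak = (1 << bit_depth) - 1  # 255 for 8-bit content
    return 10.0 * math.log10(peak * peak / mse)


f = [[10, 20], [30, 40]]  # illustrative original block
g = [[12, 20], [29, 44]]  # illustrative reconstructed block
print(sse(f, g))   # 21
print(psnr(f, g))  # PSNR of the block in dB
```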

Rate-distortion optimization is a key technology in video coding and directly affects the performance of the encoder. The SSE distortion adopted by rate-distortion optimization in existing video coding measures the fidelity of a video objectively; however, SSE distortion is not consistent with the perception of the human visual system. For example, in some regions with large SSE distortion, the human eye does not perceive any degradation of the reconstructed video quality. When the encoder needs to ensure the subjective quality of the reconstructed video, the distortion criterion therefore needs to be changed to a metric that reflects subjective quality, such as Structural SIMilarity (SSIM) distortion, which is consistent with human eye perception and is calculated as shown in formula (4).

$$SSIM(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \tag{4}$$

Where x and y represent the original image and the reconstructed image, respectively; $\mu_x$ and $\mu_y$ represent the means of the original image and the reconstructed image, respectively; $\sigma_x^2$ and $\sigma_y^2$ represent the variances of the original image and the reconstructed image, respectively; $\sigma_{xy}$ represents the covariance of the original image and the reconstructed image; and $C_1$ and $C_2$ are two constants that avoid instability when $\mu_x^2 + \mu_y^2$ and $\sigma_x^2 + \sigma_y^2$ are close to 0. To obtain a robust quality evaluation result, $C_1 = (K_1 L)^2$ and $C_2 = (K_2 L)^2$ may be used, where $L = 2^{bit\_depth} - 1$ (bit_depth denotes the bit depth; L = 255 for an image with a bit depth of 8), $K_1 = 0.01$, and $K_2 = 0.03$.
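A minimal sketch of the SSIM calculation is given below. It makes simplifying assumptions that are not stated in the text: the statistics are computed once over the whole input (a single global window) using population variance and covariance, whereas practical implementations usually compute SSIM over local windows and average the results. The function name `ssim_global` and the sample pixel values are hypothetical.

```python
def ssim_global(x, y, bit_depth=8, k1=0.01, k2=0.03):
    """SSIM computed once over two flat, equally long pixel sequences.

    Assumption: one global window and population statistics, for illustration.
    """
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    dynamic_range = (1 << bit_depth) - 1  # L = 2^bit_depth - 1
    c1 = (k1 * dynamic_range) ** 2        # C1 = (K1 * L)^2
    c2 = (k2 * dynamic_range) ** 2        # C2 = (K2 * L)^2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


original = [10, 20, 30, 40]       # illustrative pixel values
reconstructed = [12, 20, 29, 44]  # illustrative pixel values
print(ssim_global(original, original))       # identical inputs give a value of 1 (up to rounding)
print(ssim_global(original, reconstructed))  # slightly below 1
```

Because SSIM is a similarity score, a distortion term for rate-distortion optimization is typically derived from it, for example as 1 − SSIM; that particular mapping is an assumption here and is not specified in the text.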

In the related art, the rate distortion optimization algorithm based on SSE distortion can ensure the fidelity of the reconstructed video; however, although the SSIM distortion considering subjective quality can guarantee the subjective quality of the reconstructed video, the fidelity performance of the video is greatly reduced.

Current rate-distortion optimization algorithms all target traditional application scenarios in which reconstructed videos are provided for people to watch and study. However, the Fifth Generation mobile communication (5G) era has brought about massive machine-oriented applications with machine vision content, such as the Internet of Vehicles, autonomous driving, the industrial Internet, smart and safe cities, wearable devices, and video surveillance, and the application scenarios are broader. In the 5G and post-5G eras, most videos will be used by machines; for example, reconstructed videos are subjected to intelligent analysis such as pedestrian detection, semantic segmentation, and object detection. However, in video coding oriented to machine vision and human-machine vision, the distortion criterion adopted by current rate-distortion optimization algorithms considers only fidelity distortion and not semantic distortion. Although the reconstructed video obtained through coding has good fidelity, its semantic accuracy cannot be guaranteed, so the existing rate-distortion optimization algorithms cannot adapt well to many scenarios oriented to machine vision and human-machine vision.

Based on this, the embodiment of the present application provides a video encoding method, and the basic idea is that: determining a pre-parameter of a video to be coded, and determining a first Lagrange multiplier and a second Lagrange multiplier according to the pre-parameter; determining a target Lagrangian multiplier according to the first Lagrangian multiplier and the second Lagrangian multiplier; determining a first distortion value according to a first distortion metric criterion, wherein the first distortion metric criterion comprises a semantic distortion metric criterion; determining a second distortion value according to a second distortion metric criterion, wherein the second distortion metric criterion comprises a numerical error metric criterion; determining a target distortion value according to the first distortion value and the second distortion value; and determining the encoding parameters of the video to be encoded by using the target Lagrange multiplier and the target distortion value, and encoding the video to be encoded. Therefore, rate distortion optimization is carried out by comprehensively considering the first distortion measurement criterion based on the semantic distortion measurement and the second distortion measurement criterion based on the numerical error measurement in video coding, the method can be well suitable for application scenes facing machine vision and man-machine vision, the semantic segmentation accuracy of the reconstructed video can be improved under the condition of a certain code rate, and meanwhile, better fidelity performance can be kept, so that the coding efficiency is improved.

Embodiments of the present application will be described in detail below with reference to the accompanying drawings.

Referring to Fig. 2, a block diagram example of the system composition of an encoder provided in an embodiment of the present application is shown. As shown in Fig. 2, the encoder 10 may include a transform and quantization unit 101, an intra estimation unit 102, an intra prediction unit 103, a motion compensation unit 104, a motion estimation unit 105, an inverse transform and inverse quantization unit 106, a filter control analysis unit 107, a filtering unit 108, an encoding unit 109, a decoded picture buffer unit 110, and the like. The filtering unit 108 may implement deblocking filtering and Sample Adaptive Offset (SAO) filtering, and the encoding unit 109 may implement header information encoding and Context-based Adaptive Binary Arithmetic Coding (CABAC). For an input original video signal, a video coding block can be obtained by Coding Unit (CU) partitioning. The residual pixel information obtained after intra-frame or inter-frame prediction is then processed by the transform and quantization unit 101, which transforms the residual information from the pixel domain to the transform domain and quantizes the resulting transform coefficients to further reduce the bit rate. The intra estimation unit 102 and the intra prediction unit 103 are used for intra prediction of the video coding block; in particular, they are used to determine the intra prediction mode to be used to encode the video coding block. The motion compensation unit 104 and the motion estimation unit 105 are used to perform inter-prediction encoding of the received video coding block relative to one or more blocks in one or more reference frames to provide temporal prediction information. Motion estimation performed by the motion estimation unit 105 is the process of generating motion vectors that estimate the motion of the video coding block, and the motion compensation unit 104 then performs motion compensation based on the motion vectors determined by the motion estimation unit 105. After determining the intra prediction mode, the intra prediction unit 103 is also configured to supply the selected intra prediction data to the encoding unit 109, and the motion estimation unit 105 likewise sends the calculated motion vector data to the encoding unit 109. Further, the inverse transform and inverse quantization unit 106 is used for reconstruction of the video coding block: a residual block is reconstructed in the pixel domain, blocking artifacts are removed from the reconstructed residual block through the filter control analysis unit 107 and the filtering unit 108, and the reconstructed residual block is then added to a predictive block in a frame of the decoded picture buffer unit 110 to generate a reconstructed video coding block. The encoding unit 109 is configured to encode various encoding parameters and quantized transform coefficients; in a CABAC-based encoding algorithm, the context may be based on adjacent coding blocks, and the encoding unit 109 may encode information indicating the determined intra prediction mode and output the code stream of the video signal. The decoded picture buffer unit 110 is used to store reconstructed video coding blocks for prediction reference. As video coding proceeds, new reconstructed video coding blocks are continuously generated and stored in the decoded picture buffer unit 110.

The video encoding method in the embodiment of the present application is mainly applied to the encoding control portion of the encoder 10, such as the portion comprising Coding Unit (CU) partitioning, the intra prediction unit 103, the motion compensation unit 104, and the motion estimation unit 105 shown in Fig. 2. That is to say, the video encoding method according to the embodiment of the present application is mainly used for determining the encoding parameters, so that encoding is performed according to the determined encoding parameters. The encoding parameters may include the CU partitioning method and the intra prediction mode or inter prediction mode determined for a CU.

Based on this, the technical solution of the present application is further elaborated below with reference to the drawings and the embodiments. Before detailed explanation, it should be noted that "first", "second", "third", and the like, are mentioned throughout the specification only for distinguishing different features, and do not have functions of defining priority, precedence, size relationships, and the like.

The embodiment of the application provides a video coding method, which is applied to a video coding device, namely an encoder. The functions implemented by the method may be implemented by a processor in the encoder invoking a computer program, which of course may be stored in a memory, and the encoder comprises at least a processor and a memory.

Referring to fig. 3, a schematic flowchart of a video encoding method provided by an embodiment of the present application is shown. As shown in fig. 3, the method may include:

S301: determining a pre-parameter of a video to be coded, and determining a first Lagrangian multiplier and a second Lagrangian multiplier according to the pre-parameter;

S302: determining a target Lagrangian multiplier according to the first Lagrangian multiplier and the second Lagrangian multiplier;

S303: determining a first distortion value according to a first distortion metric criterion, wherein the first distortion metric criterion comprises a semantic distortion metric criterion; and determining a second distortion value according to a second distortion metric criterion, wherein the second distortion metric criterion comprises a numerical error metric criterion;

S304: determining a target distortion value according to the first distortion value and the second distortion value;

S305: determining the encoding parameters of the video to be encoded by using the target Lagrangian multiplier and the target distortion value, and encoding the video to be encoded.
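The flow of steps S301 to S305 can be sketched as follows. This is only an illustrative skeleton under stated assumptions: the function name `choose_encoding_parameters`, the candidate tuple layout, and the callables passed in are hypothetical, and the cost form J = D + λ·R is assumed to carry over from formula (2). How the two multipliers and the two distortion values are combined is not specified in the steps above, so the combination rules are left as caller-supplied functions.

```python
from typing import Callable, Iterable, Optional, Tuple

# (label, first_distortion, second_distortion, rate): hypothetical layout.
Candidate = Tuple[str, float, float, float]


def choose_encoding_parameters(
    qp: float,
    candidates: Iterable[Candidate],
    lambda_first: Callable[[float], float],
    lambda_second: Callable[[float], float],
    combine_lambdas: Callable[[float, float], float],
    combine_distortions: Callable[[float, float], float],
) -> Tuple[Optional[str], float]:
    """Pick the candidate with the lowest assumed cost J = D_target + lambda_target * R."""
    # S301-S302: derive both multipliers from the pre-parameter, then combine them.
    lam_target = combine_lambdas(lambda_first(qp), lambda_second(qp))
    best_label, best_cost = None, float("inf")
    for label, d_first, d_second, rate in candidates:
        # S303 is assumed to have produced d_first and d_second for this candidate.
        d_target = combine_distortions(d_first, d_second)  # S304
        cost = d_target + lam_target * rate                # S305 (assumed cost form)
        if cost < best_cost:
            best_label, best_cost = label, cost
    return best_label, best_cost


# Placeholder models, NOT taken from the source: constant multipliers and plain averaging.
def average(a: float, b: float) -> float:
    return 0.5 * (a + b)


trial_candidates = [("A", 10.0, 100.0, 5.0), ("B", 20.0, 40.0, 10.0)]
best = choose_encoding_parameters(
    qp=32,
    candidates=trial_candidates,
    lambda_first=lambda qp: 2.0,
    lambda_second=lambda qp: 4.0,
    combine_lambdas=average,
    combine_distortions=average,
)
print(best)  # ('B', 60.0)
```

Passing the models and combination rules as arguments keeps the skeleton independent of any particular weighting between semantic distortion and numerical error.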

It should be noted that the video encoding method in the embodiment of the present application may be applied to an encoder of the H.266/VVC standard, an encoder of the H.265/HEVC standard, or an encoder of another standard, such as an encoder of the first-generation video coding standard developed by the Alliance for Open Media (Alliance for Open Media Video 1, AV1); the present application is not limited in this respect.

It should be further noted that the rate-distortion optimization algorithm used in the video coding method of the embodiment of the present application comprehensively considers a first distortion metric criterion based on the semantic distortion metric and a second distortion metric criterion based on the numerical error metric, so that the rate-distortion optimization may be a multi-distortion criterion rate-distortion optimization algorithm oriented to human and machine vision. That is to say, in video encoding, in addition to the second lagrangian multiplier and the second distortion value derived by using the related technical scheme, for the human-machine vision application scene of video semantic segmentation, the embodiment of the present application may further define a semantic distortion metric, and then derive a corresponding calculation formula of the first lagrangian multiplier and the first distortion value.

Thus, the target Lagrange multiplier can be determined according to the first Lagrange multiplier and the second Lagrange multiplier, and the target distortion value can also be determined according to the first distortion value determined by the first distortion measurement criterion and the second distortion value determined by the second distortion measurement criterion; therefore, after the encoding parameters of the video to be encoded are determined according to the target Lagrange multiplier and the target distortion value, the video to be encoded is encoded by using the encoding parameters, the semantic segmentation accuracy of the reconstructed video can be improved, the fidelity of the reconstructed video is improved, the encoding code rate of the video can be reduced, the time required by encoding is shortened, the encoding speed is improved, and the encoding efficiency is improved.

It will be appreciated that in one possible implementation, the pre-parameters of the video to be encoded may include Quantization Parameters (QPs).

At this time, for S301, the determining the pre-parameter of the video to be encoded may include:

determining quantization parameters of coding units in the video to be coded; wherein the encoding unit may include at least one of: picture, slice (Slice), sub-picture (Sub-picture), tile (tile), coded block.

Here, the quantization parameter may be a quantization step size of a quantizer in the encoder, or an index number value corresponding to the quantization step size of the quantizer in the encoder.

For the determining of the first lagrangian multiplier, in some embodiments, for S301, the determining the first lagrangian multiplier according to the pre-parameter may include:

determining a first computational model parameter, the first computational model representing a correspondence between the first Lagrangian multiplier and a quantization parameter;

and determining the first Lagrangian multiplier according to the quantization parameter and the first calculation model.

It should be noted that, for the first computational model, the determining parameters of the first computational model may include:

setting, in the first computational model, the first Lagrangian multiplier equal to a weighted exponential power of the quantization parameter;

wherein the first computational model parameter includes a first exponential parameter indicating the exponent of the power and a first weighting coefficient indicating the weight.

The first Lagrangian multiplier is denoted by $\lambda_{miou}$ and the quantization parameter by QP; the calculation formula of the first calculation model is as follows:

$$\lambda_{miou} = 2.30422 \times 10^{-8} \times QP^{6.3612072} \tag{5}$$

Here, formula (5) is the first calculation model, which represents the correspondence between the first Lagrangian multiplier and the quantization parameter. The first calculation model parameters may comprise a first exponential parameter (i.e., 6.3612072 in the formula) and a first weighting coefficient (i.e., $2.30422 \times 10^{-8}$ in the formula).

Further, the first calculation model parameter may be determined as a preset value, or may be obtained by fitting a large amount of experimental test data of the test video, which is not limited herein.

Optionally, in some embodiments, the method may further include:

and setting the first calculation model parameter as a preset value.

Here, for the first calculation model parameter, the first exponential parameter may be set to 6.3612072, and the first weighting coefficient may be set to $2.30422 \times 10^{-8}$. After the first exponential parameter and the first weighting coefficient are determined, the first calculation model can be obtained, so that the first Lagrangian multiplier can be determined according to the quantization parameter.
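With the preset values above, the first calculation model of formula (5) can be evaluated directly. The sketch below is a minimal illustration; the function name `first_lagrangian_multiplier`, the constant names, and the sample QP values are hypothetical, while the two numeric constants are the preset values quoted in the text.

```python
# Preset first calculation model parameters quoted in formula (5).
FIRST_WEIGHTING_COEFFICIENT = 2.30422e-8
FIRST_EXPONENTIAL_PARAMETER = 6.3612072


def first_lagrangian_multiplier(
    qp,
    weight=FIRST_WEIGHTING_COEFFICIENT,
    exponent=FIRST_EXPONENTIAL_PARAMETER,
):
    """Evaluate lambda_miou = weight * QP ** exponent (formula (5))."""
    if qp <= 0:
        raise ValueError("QP must be positive for the power model")
    return weight * qp ** exponent


# Illustrative QP values; the multiplier grows quickly as QP increases.
for qp in (22, 27, 32, 37):
    print(qp, first_lagrangian_multiplier(qp))
```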

Optionally, in some embodiments, the method may further include:

determining a first relation function between the first distortion value and the code rate of the test video by using the first distortion metric criterion based on the test video, and performing derivative operation on the first relation function to determine a derivative function of the first relation function;

determining a second relation function between the code rate and the quantization parameter based on the test video, and determining the first calculation model parameter according to the derivative function and the second relation function.

It should be noted that the test video of the embodiment of the present application may be one or more test videos; for example, the test video may be a plurality (for example, 59) of test video sequences in the large-scale city scene data set (Cityscapes).

Thus, using a first distortion metric criterion for the test video, a first relationship function between a first distortion value and a bitrate of the test video may be determined, the first relationship function being represented by,

$$D_{miou} = 0.2299 \cdot R^{-0.7553} + 3.848 \tag{6}$$

where D_miou represents the first distortion value and R represents the code rate.

In the RD curve shown in FIG. 1, the value of λ_miou is the magnitude of the slope of the tangent to the curve (λ_miou > 0), i.e., it equals the negative of the derivative of the curve. Based on the test video, the average code rate of the reconstructed video under different quantization parameters can be computed from the files obtained by encoding, and the first distortion value (D_miou) can then be determined as a function of the code rate (R) by fitting a large amount of test data; the fitted curve is shown in FIG. 4.

Performing a derivative operation on equation (6) may obtain a derivative function of the first relation function, where the derivative function is used to represent a correspondence between the first lagrangian multiplier and the code rate. Here, the derivative function is shown as follows,

$$\lambda_{miou} = 0.17364347 \cdot R^{-1.7553} \tag{7}$$

In addition, based on the test video, the second relation function between the code rate (R) and the quantization parameter (QP) can be determined by fitting a large amount of experimental test data; the fitted curve is shown in FIG. 5. Here, the second relation function is shown as follows,

$$R = 8278 \cdot \mathrm{QP}^{-3.624} \tag{8}$$

By substituting equation (8) into equation (7), the correspondence between λ_miou and QP, that is, between the first Lagrangian multiplier and the quantization parameter, can be obtained, which yields the first calculation model represented by equation (5); the first calculation model parameters are thereby determined, so that the first Lagrangian multiplier can be determined according to the quantization parameter.
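The substitution of equation (8) into equation (7) can be checked numerically. The sketch below composes the two fitted power laws and recovers the exponent and weighting coefficient of equation (5); the helper name compose_power_laws is an assumption made for illustration.

```python
import math

# Fitted constants quoted in equations (7) and (8).
A7, B7 = 0.17364347, -1.7553   # lambda_miou = A7 * R**B7
A8, B8 = 8278.0, -3.624        # R = A8 * QP**B8


def compose_power_laws(a_outer, b_outer, a_inner, b_inner):
    """Compose y = a_outer * x**b_outer with x = a_inner * q**b_inner.

    Returns (coefficient, exponent) of the resulting power law in q:
    y = a_outer * a_inner**b_outer * q**(b_outer * b_inner).
    """
    return a_outer * a_inner ** b_outer, b_outer * b_inner


coeff, exponent = compose_power_laws(A7, B7, A8, B8)

if __name__ == "__main__":
    # Compare against the constants of equation (5).
    print(coeff, exponent)
    print(math.isclose(coeff, 2.30422e-8, rel_tol=1e-3))
```

The exponent is the product (-1.7553) × (-3.624) = 6.3612072, and the coefficient 0.17364347 × 8278^(-1.7553) agrees with 2.30422 × 10^-8 to within the rounding of the fitted constants.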

It should be noted that the calculation formula of the first lagrangian multiplier can be modified into other functional forms. For example, the functional relationship of the above equation (8) can also be fit to an e-exponential form, and the corresponding equation (5), i.e., the calculation equation of the first lagrange multiplier, can also be an e-exponential form of QP, which is not limited here.

For the determining of the second lagrangian multiplier, in some embodiments, for S301, the determining the second lagrangian multiplier according to the pre-parameter may include:

determining the second Lagrange multiplier according to a preset third calculation model; wherein the third computational model represents a correspondence between the second Lagrangian multiplier and a quantization parameter.

It should be noted that, for the third calculation model, it may be constructed by using the SSE distortion criterion in the prior art. In some embodiments, the determining the third computational model parameter may include:

setting the second Lagrangian multiplier to a weight equal to an exponential power of 2 in the third computational model;

the third calculation model parameter comprises a third exponential parameter indicative of the power of the exponent and a third weighting coefficient indicative of the weighting, the third exponential parameter being related to a quantization parameter.

Denoting the second Lagrangian multiplier by λ_SSE and the quantization parameter by QP, the calculation formula of the third calculation model is as follows,

$$\lambda_{SSE} = 0.57 \cdot 2^{(\mathrm{QP}-12)/3} \tag{9}$$

Here, equation (9) is the third calculation model, representing the correspondence between the second Lagrangian multiplier and the quantization parameter. The third calculation model parameters may include a third exponential parameter (i.e., (QP-12)/3 in the equation) and a third weighting coefficient (i.e., 0.57 in the equation), and the value of the third exponential parameter is related to the quantization parameter (QP).
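A minimal sketch of the third calculation model of equation (9) follows. The function name is an assumption; the constants 0.57, 12 and 3 are the ones quoted in the text.

```python
def lambda_sse_from_qp(qp: float) -> float:
    """Second Lagrangian multiplier per equation (9).

    The third weighting coefficient is 0.57 and the third exponential
    parameter is (QP - 12) / 3, so the value doubles every 3 QP steps.
    """
    return 0.57 * 2.0 ** ((qp - 12) / 3.0)


if __name__ == "__main__":
    for qp in (22, 27, 32, 37):
        print(qp, lambda_sse_from_qp(qp))
```

At QP = 12 the exponent is zero, so the multiplier equals the weighting coefficient 0.57.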

It should be further noted that, the determination of the third calculation model parameter may be a preset value; or may be fit from a large amount of experimental test data of the test video, and is not limited herein.

Further, both the first and second lagrangian multipliers are related to the quantization parameter. For the determination of the quantization parameter, in some embodiments, the determining the quantization parameter of the coding unit in the video to be coded may include: and determining the quantization parameter of the coding unit in the video to be coded by using a code rate control mode.

Alternatively, in some embodiments, the determining a quantization parameter of a coding unit in the video to be coded may include: and setting the quantization parameter as a preset value.

That is, for the quantization parameter in the video to be encoded, on the one hand, the quantization parameter may be set to a preset value, such as 22, 27, 32, 37, etc. On the other hand, the quantization parameter can also be determined by using a code rate control mode; specifically, the current code stream control algorithm mainly controls the code stream by adjusting the size of the quantization parameter; therefore, by controlling the size of the code rate, the required quantization parameter can be obtained.

In another possible implementation, the pre-parameters of the video to be encoded may include a quantization parameter and a target bitrate.

determining quantization parameters and target code rates of coding units in the video to be coded; wherein the encoding unit may include at least one of: picture, slice (Slice), sub-picture (Sub-picture), tile (tile), coded block.

determining a second calculation model parameter, wherein the second calculation model represents the corresponding relation between the first Lagrange multiplier and the code rate;

and determining the first Lagrangian multiplier according to the target code rate and the second calculation model.

It should be noted that, the determining a target code rate of a coding unit in the video to be coded may include: and determining the target code rate of the coding unit in the video to be coded by using a bit allocation mode.

That is to say, the target code rate for the coding unit in the video to be coded can be obtained by adopting a bit allocation manner. Here, the target bit rate can be dynamically adjusted according to the number of bits consumed by the coding unit in the video to be coded, so as to ensure the real-time performance and accuracy of bit allocation.

It should be further noted that, for the second calculation model, the determining parameters of the second calculation model may include:

setting the first Lagrangian multiplier to a weight equal to an exponential power of the target bitrate in the second computational model;

the second computational model parameter includes a second index parameter indicative of the exponential power and a second weighting coefficient indicative of the weighting.

Denoting the first Lagrangian multiplier by λ_miou and the code rate by R, the calculation formula of the second calculation model is as follows,

$$\lambda_{miou} = 0.17364347 \cdot R^{-1.7553} \tag{10}$$

Here, equation (10) is the second calculation model, representing the correspondence between the first Lagrangian multiplier and the code rate. The second calculation model parameters may include a second exponential parameter (i.e., -1.7553 in the equation) and a second weighting coefficient (i.e., 0.17364347 in the equation).

Further, the second calculation model parameter may be determined as a preset value, or may be obtained by fitting a large amount of experimental test data of the test video, which is not limited herein.

Optionally, in some embodiments, the method may further include:

and setting the second calculation model parameter as a preset value.

Here, for the second calculation model parameters, the second exponential parameter may be set to -1.7553, and the second weighting coefficient may be set to 0.17364347. After the second exponential parameter and the second weighting coefficient are determined, the second calculation model can be obtained, so that the first Lagrangian multiplier can be determined according to the target code rate.

Optionally, in some embodiments, the method may further include:

determining, based on a test video, a first relationship function between the first distortion value and a code rate of the test video using the first distortion metric criterion;

and carrying out derivative operation on the first relation function to determine the second calculation model parameter.

It should be noted that the test video may also be one or more test videos, for example, the test video may be a plurality (e.g., 59) of test video sequences in a large-scale city scene data set (Cityscapes).

Thus, using the first distortion metric criterion for the test video, a first relation function between the first distortion value and the code rate of the test video can be determined, the first relation function being shown in equation (6) above. Then, a derivative operation is performed on equation (6) to obtain the derivative function of the first relation function, which represents the correspondence between the first Lagrangian multiplier and the code rate and yields the second calculation model shown in equation (10); the second calculation model parameters are thereby determined, so that the first Lagrangian multiplier can be determined according to the target code rate.
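The relationship between equation (6) and equation (10) can be illustrated with a finite-difference check: the negative slope of the fitted D_miou(R) curve should match the closed-form multiplier. The sketch below is illustrative only; the function names and the sample rate values are assumptions, and no unit for R is implied.

```python
import math


def d_miou_from_rate(r: float) -> float:
    """First relation function, equation (6)."""
    return 0.2299 * r ** -0.7553 + 3.848


def lambda_miou_from_rate(r: float) -> float:
    """Second calculation model, equation (10)."""
    return 0.17364347 * r ** -1.7553


def negative_slope(f, r: float, h: float = 1e-5) -> float:
    """Negative of the central-difference derivative of f at r."""
    return -(f(r + h) - f(r - h)) / (2.0 * h)


if __name__ == "__main__":
    for r in (0.5, 1.0, 2.0, 5.0):   # arbitrary sample rates
        numeric = negative_slope(d_miou_from_rate, r)
        closed = lambda_miou_from_rate(r)
        print(r, numeric, closed, math.isclose(numeric, closed, rel_tol=1e-5))
```

Analytically, differentiating 0.2299 · R^(-0.7553) gives -0.2299 × 0.7553 · R^(-1.7553), and 0.2299 × 0.7553 = 0.17364347, which is the second weighting coefficient.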

It is noted that, in determining the first relation function, based on the RD curve shown in FIG. 1, the value of λ_miou is the magnitude of the slope of the tangent to the curve (λ_miou > 0), i.e., it equals the negative of the derivative of the curve. Based on the test video, the average code rate of the reconstructed video under different quantization parameters can be computed from the files obtained by encoding; the first distortion value (D_miou) can then be determined as a function of the code rate (R) by fitting a large amount of test data, and the fitted curve shown in FIG. 4 gives the first relation function.

It should be noted that, for the third calculation model, it may be constructed by using the SSE distortion criterion in the prior art. In some embodiments, the determining the third computational model parameters may include:

Denoting the second Lagrangian multiplier by λ_SSE and the quantization parameter by QP, the calculation formula of the third calculation model is as shown in equation (9) above.

It should be further noted that, the determination of the third calculation model parameter may be a preset value; or may be fit to a large amount of experimental test data from a test video, and is not limited herein.

That is, for the quantization parameter in the video to be encoded, on one hand, the quantization parameter may be set to a preset value, such as 22, 27, 32, 37, etc. On the other hand, a quantization parameter can also be determined in a code rate control mode; specifically, the current code stream control algorithm mainly controls the code stream by adjusting the size of the quantization parameter; therefore, by controlling the size of the code rate, the required quantization parameter can be obtained.

Thus, for the rate-distortion optimization algorithm of the embodiment of the present application, in addition to the first lagrangian multiplier and the second lagrangian multiplier, a distortion value also needs to be determined. Here, it is assumed that what is obtained based on the first distortion metric criterion is referred to as a first distortion value, and what is obtained based on the second distortion metric criterion is referred to as a second distortion value.

It should be noted that the first distortion metric criterion may be a semantic distortion metric criterion. Taking an encoder of the H.266/VVC standard as an example, in order to improve the semantic segmentation accuracy of the reconstructed video, a semantic distortion metric needs to be defined first. Specifically, a plurality of quantization parameters may be selected; then, VVC encoding may be performed on a plurality (for example, 59) of test video sequences in the large-scale city scene data set (Cityscapes) under the Random Access (RA) condition, and video semantic segmentation may be performed on the videos before and after encoding, so that the accuracy of the semantic segmentation results can be calculated according to the corresponding labeled data.

The accuracy of semantic segmentation is generally expressed using the mean Intersection over Union (mIoU), which is the average of the Intersection over Union (IoU) of all categories. Here, IoU serves as a detection evaluation function; put simply, it is the overlap rate between the generated prediction window and the real window, i.e., the ratio of the intersection of the detection result region (Detection Result) and the true value region (Ground Truth) to their union, which gives the semantic accuracy (expressed by IoU).

In a possible implementation manner, for S303, the determining a first distortion value according to a first distortion metric criterion may include:

performing semantic segmentation on a test video based on the test video, and determining semantic accuracy of one or more categories;

determining a target semantic accuracy from the semantic accuracies of the one or more categories;

and carrying out distortion measurement on the target semantic accuracy by utilizing a fourth calculation model to obtain the first distortion value.

Further, the determining a target semantic accuracy from the semantic accuracies of the one or more categories may include:

calculating a weighted sum of semantic accuracies of the one or more categories, determining the resulting weighted sum as the target semantic accuracy.

It should be noted that, a weighted sum of the semantic accuracies of one or more categories is calculated, and in one embodiment, the weight may be set to 1, that is, an average of the semantic accuracies of the one or more categories is calculated, and the obtained average is determined as the target semantic accuracy.

Illustratively, in the context of semantic segmentation, the two sets may represent the predicted and true values, respectively, i.e., A_pred is the predicted segmentation result area and A_true is the labeled segmentation result area; the semantic accuracy of each category can be expressed in terms of IoU, which is calculated as follows,

$$IoU = \frac{|A_{pred} \cap A_{true}|}{|A_{pred} \cup A_{true}|} \tag{11}$$

After obtaining the IoU of each category, the target semantic accuracy can be obtained by averaging, and can be expressed by mIoU. Here, mIoU refers to the average IoU of all categories, and its value ranges from 0 to 1; a larger value indicates a higher semantic accuracy. The mIoU of the n categories is calculated as follows,

$$mIoU = \frac{1}{n} \sum_{i=1}^{n} IoU_i \tag{12}$$

where IoU_i denotes the IoU of the i-th category, i = 1, …, n, and n denotes the number of all categories.
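The per-category IoU and the mIoU described above can be sketched as follows. The representation of a segmentation result as a flat sequence of per-pixel category ids, and all function names, are assumptions made for this illustration; equal weights are used, matching the averaging case in the text.

```python
from typing import Iterable, Sequence, Set


def iou(a_pred: Set[int], a_true: Set[int]) -> float:
    """Intersection over Union of two pixel-index sets."""
    union = a_pred | a_true
    if not union:
        raise ValueError("category absent from both prediction and labels")
    return len(a_pred & a_true) / len(union)


def mean_iou(pred: Sequence[int], true: Sequence[int],
             categories: Iterable[int]) -> float:
    """Average the per-category IoU over the given categories."""
    if len(pred) != len(true):
        raise ValueError("prediction and labels must have the same size")
    scores = []
    for c in categories:
        a_pred = {i for i, label in enumerate(pred) if label == c}
        a_true = {i for i, label in enumerate(true) if label == c}
        scores.append(iou(a_pred, a_true))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pred = [0, 0, 1, 1]
    true = [0, 1, 1, 1]
    # Category 0: 1/2, category 1: 2/3, so the mean is 7/12.
    print(mean_iou(pred, true, categories=[0, 1]))
```

This sketch raises an error for a category that appears in neither map; how such categories should be treated is not specified in the text.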

It should be further noted that, for the fourth calculation model, the performing distortion measurement on the target semantic accuracy by using the fourth calculation model to obtain the first distortion value includes:

determining the fourth computational model parameters, the fourth computational model representing a correspondence between the first distortion value and the target semantic accuracy;

and obtaining the first distortion value according to the target semantic accuracy and the fourth calculation model.

Further, the determining the fourth calculation model parameter may include:

in the fourth computational model, setting the first distortion value to a weighting value equal to a logarithm of the target semantic accuracy;

the fourth calculation model parameter includes a base parameter indicative of the logarithm and a fourth weighting coefficient parameter indicative of the weighting.

Wherein the fourth calculation model parameter is set to a preset value.

Here, for a video semantic segmentation scenario, a semantic distortion metric (i.e., the first distortion value, denoted by D_miou) may be defined, whose calculation formula is shown in the following equation (13).

$$D_{miou} = -10 \cdot \ln(mIoU) \tag{13}$$

Equation (13) is the fourth calculation model, representing the correspondence between the first distortion value (D_miou) and the target semantic accuracy (mIoU). The fourth calculation model parameters may include a base parameter (i.e., the base e of ln in the formula) and a fourth weighting coefficient parameter (i.e., -10 in the formula), where the fourth weighting coefficient parameter may also be regarded as a preset amplification factor applied to ln(mIoU).

It should be further noted that, the fourth calculation model parameter may be determined as a preset value; or may be fit from a large amount of experimental test data of the test video, and is not limited herein.

In addition, according to equation (13), the natural logarithm maps the bounded mIoU value onto an unbounded range, and the multiplying coefficient scales the result to match the distortion magnitude in the rate-distortion optimization algorithm. Thus, when mIoU tends to 0, D_miou tends to infinity; when mIoU tends to 1, D_miou tends to 0.
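The fourth calculation model of equation (13) can be sketched in a few lines. The function name and the input validation are assumptions added for illustration.

```python
import math


def semantic_distortion(miou: float) -> float:
    """First distortion value per equation (13): D_miou = -10 * ln(mIoU)."""
    if not 0.0 < miou <= 1.0:
        raise ValueError("mIoU must lie in (0, 1]")
    return -10.0 * math.log(miou)


if __name__ == "__main__":
    # Distortion shrinks toward 0 as mIoU approaches 1 and grows
    # without bound as mIoU approaches 0.
    for miou in (0.1, 0.5, 0.9, 1.0):
        print(miou, semantic_distortion(miou))
```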

In another possible implementation, the first distortion value may also relate to a target mean square error of an encoding unit in the video to be encoded. The determining a first distortion value according to a first distortion metric criterion may include:

determining fifth computational model parameters representing a third relationship function between the first distortion value and a mean square error;

and determining a target mean square error of an encoding unit in the video to be encoded, and determining the first distortion value according to the target mean square error and the fifth calculation model.

The reconstructed video is obtained by performing video decoding reconstruction on the encoded video. After the test video is coded by using the quantization parameter to obtain the coded video under the quantization parameter, video reconstruction can be performed on the coded video under the quantization parameter to obtain a reconstructed video under the quantization parameter; according to the reconstructed video and the original video, the Mean Squared Error (MSE) of the reconstructed video under the quantization parameter can be obtained. Here, the mean square error is an expected value of the square of the difference between the predicted parameter value and the actual parameter value. The MSE can evaluate the change degree of the data, and the smaller the value of the MSE is, the better the accuracy of the prediction model for describing the experimental data is.

It should be further noted that, for the fifth calculation model, the determining the fifth calculation model parameters may include:

setting, in the fifth calculation model, the first distortion value equal to the sum of a second parameter factor and the product of the target mean square error and a first parameter factor;

the fifth computational model parameter includes the first parametric factor and the second parametric factor.

Denoting the first distortion value by D_miou and the mean square error by MSE, the calculation formula of the fifth calculation model is as follows,

$$D_{miou} = 0.6276 \cdot MSE + 3.48 \tag{14}$$

Here, equation (14) is the fifth calculation model, which represents the correspondence between the first distortion value and the mean square error. The fifth calculation model parameters may include a first parameter factor (i.e., 0.6276 in the equation) and a second parameter factor (i.e., 3.48 in the equation).

Further, the fifth calculation model parameter may be determined as a preset value, or may be obtained by fitting a large amount of experimental test data of the test video, which is not limited herein.

Optionally, in some embodiments, the method may further include:

and setting the parameters of the fifth calculation model as preset values.

Here, for the fifth calculation model parameter, the first parameter factor may be set to 0.6276, and the second parameter factor may be set to 3.48. After the first parameter factor and the second parameter factor are determined, a fifth calculation model can be obtained, so that a first distortion value can be determined according to the target mean square error.

Optionally, in some embodiments, the method may further include:

determining, based on a test video, a third relation function between the first distortion value and a mean square error of the test video using the first distortion metric criterion;

and determining the fifth calculation model parameter according to the third relation function.

It should be noted that the test video may also be one or more test videos; for example, the test video may be a plurality (e.g., 59) of test video sequences in the large-scale city scene data set (Cityscapes).

In this way, using the first distortion metric criterion for the test video, the average MSE of the reconstructed video under different quantization parameters is computed from the files obtained by encoding, and the relationship between the first distortion value (D_miou) and the MSE can then be determined by fitting a large amount of experimental test data. The fitted curve is shown in FIG. 6; since the fitted curve is linear, the fifth calculation model can be obtained.

It should be noted that the determining the pre-parameter of the video to be encoded may further include: and determining the target mean square error of the coding unit in the video to be coded. Thus, after the fifth calculation model is obtained, the first distortion value can be determined according to the target mean square error and the fifth calculation model shown in the formula (14).
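As a sketch of how the fifth calculation model of equation (14) might be applied, the function below maps a target mean square error to the first distortion value using the preset factors quoted above. The function and constant names are assumptions.

```python
FIRST_PARAMETER_FACTOR = 0.6276   # slope of the linear fit in equation (14)
SECOND_PARAMETER_FACTOR = 3.48    # offset of the linear fit in equation (14)


def d_miou_from_mse(target_mse: float) -> float:
    """First distortion value estimated from the target mean square error."""
    if target_mse < 0:
        raise ValueError("mean square error cannot be negative")
    return FIRST_PARAMETER_FACTOR * target_mse + SECOND_PARAMETER_FACTOR


if __name__ == "__main__":
    for mse in (0.0, 5.0, 10.0):
        print(mse, d_miou_from_mse(mse))
```

Compared with the fourth calculation model, this variant needs only the pixel-domain MSE, so no semantic segmentation pass is required at encoding time.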

In addition, other manners may also be adopted for determining the first distortion value, such as the difference between the semantic segmentation results of the video before and after encoding. Here, given the semantic segmentation result of the video before encoding (a first mIoU value) and that of the video after encoding (a second mIoU value), the first distortion value is determined according to the difference between the first mIoU value and the second mIoU value; the embodiments of the present application are not particularly limited in this respect.

It is further noted that the second distortion metric criterion may be a numerical error criterion. In some embodiments, for S303, the determining a second distortion value according to a second distortion metric criterion may include:

determining a reconstruction value for a coding unit in the video, wherein the coding unit comprises at least one of: a picture, a Slice (Slice), a Sub-picture (Sub-picture), a tile (tile), and a coding block;

determining the second distortion value according to a reconstructed value and an original value of the encoding unit based on the numerical error criterion;

wherein the numerical error criterion is one of: the Sum of Absolute Differences (SAD) criterion, the Mean Absolute Deviation (MAD) criterion, the Sum of Squared Errors (SSE) criterion, and the Mean Squared Error (MSE) criterion. It should be noted that the numerical error criterion is not limited to these criteria and may be another criterion; the embodiments of the present application are not particularly limited in this respect.

It should be noted that, taking the SSE criterion as an example of the numerical error criterion, the second distortion value is represented by SSE, and the calculation formula is as follows,

$$SSE = \sum_{x=1}^{M} \sum_{y=1}^{N} \left[ f(x, y) - g(x, y) \right]^2 \tag{15}$$

where M and N represent the horizontal and vertical spatial resolutions of the video, respectively, f(x, y) represents the original pixel value at pixel location (x, y), and g(x, y) represents the reconstructed pixel value at pixel location (x, y).
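The SSE criterion, i.e., the sum over all pixel locations of the squared difference between f(x, y) and g(x, y), can be sketched as below. Representing a picture as a list of rows of pixel values is an assumption made for this illustration.

```python
from typing import Sequence


def sse(original: Sequence[Sequence[float]],
        reconstructed: Sequence[Sequence[float]]) -> float:
    """Sum of squared errors between original f and reconstruction g."""
    if len(original) != len(reconstructed):
        raise ValueError("pictures must have the same number of rows")
    total = 0.0
    for row_f, row_g in zip(original, reconstructed):
        if len(row_f) != len(row_g):
            raise ValueError("rows must have the same length")
        for f_xy, g_xy in zip(row_f, row_g):
            diff = f_xy - g_xy
            total += diff * diff
    return total


if __name__ == "__main__":
    f = [[1, 2], [3, 4]]
    g = [[1, 3], [5, 4]]
    print(sse(f, g))  # squared differences are 0, 1, 4, 0
```

Dividing the result by the number of pixels would give the MSE criterion listed among the alternatives.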

Understandably, after the first Lagrangian multiplier (λ_miou), the second Lagrangian multiplier (λ_SSE), the first distortion value (D_miou), and the second distortion value (SSE) are obtained, the target Lagrangian multiplier (denoted by λ) can be calculated from λ_miou and λ_SSE, and the target distortion value (denoted by D) can be calculated from D_miou and SSE.

In some embodiments, for S302, the determining a target lagrangian multiplier based on the first lagrangian multiplier and the second lagrangian multiplier may comprise:

determining a first preset parameter; the first preset parameter is used for controlling weight values corresponding to the first Lagrangian multiplier and the second Lagrangian multiplier;

and performing weighted calculation on the first Lagrangian multiplier and the second Lagrangian multiplier by using the first preset parameter to obtain the target Lagrangian multiplier.

It should be noted that the first preset parameter may control the weight values corresponding to the first lagrangian multiplier and the second lagrangian multiplier. Specifically, in some embodiments, the determining the first preset parameter may include:

and setting the first preset parameter according to the configuration information of the encoder.

Further, the method may further include:

when the configuration information of the encoder indicates that the first preset parameter is equal to k, setting the target Lagrangian multiplier to be equal to a weighted sum of the first Lagrangian multiplier and the second Lagrangian multiplier, wherein k is any value greater than or equal to 0 and less than or equal to 1, the weighting coefficient of the first Lagrangian multiplier is set to be equal to 1-k, and the weighting coefficient of the second Lagrangian multiplier is set to be equal to k.

It should be noted that, assuming that the first preset parameter is represented by k, when the weighting coefficient of the second lagrangian multiplier is set to k, the weighting coefficient of the first lagrangian multiplier may be set to 1-k; thus, the target Lagrangian multiplier is calculated as follows,

$$\lambda = k \cdot \lambda_{SSE} + (1-k) \cdot \lambda_{miou} \tag{16}$$

where λ represents the target Lagrangian multiplier, λ_miou represents the first Lagrangian multiplier, and λ_SSE represents the second Lagrangian multiplier; 1-k and k represent the weighting coefficients of the first and second Lagrangian multipliers, respectively.

Here, k may be a constant in the range of 0 to 1. The value of k may be 0.5 or 0.75, or it may be a variable value (for example, obtained by performing some calculation on the current coding unit). A typical value of k is 0.75.

In some embodiments, for S304, the determining a target distortion value according to the first distortion value and the second distortion value comprises:

determining a second preset parameter; the second preset parameter is used for controlling weight values corresponding to the first distortion value and the second distortion value;

and performing weighted calculation on the first distortion value and the second distortion value by using the second preset parameter to obtain the target distortion value.

It should be noted that the second preset parameter may control a weight value corresponding to the first distortion value and the second distortion value. Specifically, in some embodiments, the determining the second preset parameter may include:

and setting the second preset parameter according to the configuration information of the encoder.

Further, the method may further include:

when the configuration information of the encoder indicates that the second preset parameter is equal to m, setting the target distortion value equal to a weighted sum of the first distortion value and the second distortion value, where m is any value greater than or equal to 0 and less than or equal to 1, setting a weighting coefficient of the first distortion value equal to 1-m, and setting a weighting coefficient of the second distortion value equal to m.

It should be noted that, assuming that the second preset parameter is represented by m, when the weighting factor of the second distortion value is set to m, the weighting factor of the first distortion value may be set to 1-m; thus, the calculation formula of the target distortion value is as follows,

$$D = m \cdot SSE + (1-m) \cdot D_{miou} \tag{17}$$

where D represents the target distortion value, D_miou represents the first distortion value, and SSE represents the second distortion value; 1-m and m represent the weighting coefficients of the first distortion value and the second distortion value, respectively.

Here, m may be a constant in the range of 0 to 1. The value of m may be 0.5 or 0.75, or it may be a variable value (for example, obtained by performing some calculation on the current coding unit). A typical value of m is 0.75.

It should be noted that, the values of the first preset parameter and the second preset parameter may be set to be the same or different. Generally, the values of the first preset parameter and the second preset parameter are the same, and for example, both values can be represented by θ. The target lagrange multiplier and the target distortion value may be calculated as follows,

$$\lambda = \theta \cdot \lambda_{SSE} + (1-\theta) \cdot \lambda_{miou} \tag{18}$$

$$D = \theta \cdot SSE + (1-\theta) \cdot D_{miou} \tag{19}$$

The constant θ is in the range of 0 to 1; it can be used to control the respective weights of the first Lagrangian multiplier and the second Lagrangian multiplier, and also the respective weights of the semantic distortion (i.e., the first distortion value) and the fidelity distortion (i.e., the second distortion value). Optionally, θ is typically set to 0.75.
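The weighted combinations of equations (18) and (19) share the same form, which the sketch below factors into a single helper. The function names are assumptions; the default of 0.75 is the typical value of θ stated in the text.

```python
def blend(theta: float, fidelity_term: float, semantic_term: float) -> float:
    """Return theta * fidelity_term + (1 - theta) * semantic_term."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    return theta * fidelity_term + (1.0 - theta) * semantic_term


def target_lambda(lambda_sse: float, lambda_miou: float,
                  theta: float = 0.75) -> float:
    """Target Lagrangian multiplier per equation (18)."""
    return blend(theta, lambda_sse, lambda_miou)


def target_distortion(sse_value: float, d_miou: float,
                      theta: float = 0.75) -> float:
    """Target distortion value per equation (19)."""
    return blend(theta, sse_value, d_miou)


if __name__ == "__main__":
    print(target_lambda(4.0, 8.0))         # 0.75 * 4 + 0.25 * 8
    print(target_distortion(100.0, 20.0))  # 0.75 * 100 + 0.25 * 20
```

Using separate weights k and m, as in equations (16) and (17), only requires passing different theta arguments to the two functions.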

In short, existing video coding methods are not well suited to machine-oriented video applications. In the embodiments of the present application, λ_miou represents machine-oriented quality, λ_SSE represents the subjective quality of human viewing, and θ allows adjustment between the two. For example, if θ is equal to 1, the target distortion value reflects purely the subjective quality of human viewing; if θ is equal to 0, the target distortion value reflects purely machine-oriented quality.

Here, the value of θ can be set through the configuration information of the encoder. Specifically, one implementation is to set it directly according to the application requirements, such as the cases of 0 and 1 described above. Another implementation is for the encoder to provide different working modes: if the working mode is set to human viewing, the encoder sets θ to 1; if the working mode is set to machine, the encoder sets θ to 0; if the working mode is set to human-machine hybrid, the encoder determines θ adaptively, for example by pre-encoding the video to be encoded in a preprocessing stage and then estimating the value of θ from the pre-encoding result.
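The working-mode behaviour described above might be sketched as follows. The mode names, the estimate_theta callback standing in for the pre-encoding analysis, and the clamping of its result to [0, 1] are all assumptions introduced for illustration; the text does not specify how θ is estimated in the hybrid mode.

```python
from typing import Callable, Optional


def theta_for_mode(mode: str,
                   estimate_theta: Optional[Callable[[], float]] = None) -> float:
    """Map a (hypothetical) encoder working mode to a value of theta."""
    if mode == "human":
        return 1.0          # purely human-viewing quality
    if mode == "machine":
        return 0.0          # purely machine-oriented quality
    if mode == "hybrid":
        if estimate_theta is None:
            raise ValueError("hybrid mode needs a theta estimator")
        # Assumption: keep the estimate inside the valid range [0, 1].
        return min(1.0, max(0.0, estimate_theta()))
    raise ValueError(f"unknown working mode: {mode!r}")


if __name__ == "__main__":
    print(theta_for_mode("human"))
    print(theta_for_mode("machine"))
    # Placeholder estimator returning the typical value from the text.
    print(theta_for_mode("hybrid", estimate_theta=lambda: 0.75))
```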

It should be further noted that after the target lagrangian multiplier and the target distortion value are obtained, according to the target lagrangian multiplier and the target distortion value, the encoding parameters of the video to be encoded can be determined, so as to encode the video to be encoded. In some embodiments, for S305, the determining the encoding parameter of the video to be encoded by using the target lagrangian multiplier and the target distortion value may include:

constructing a rate distortion cost function based on the target Lagrange multiplier and the target distortion value;

carrying out pre-coding processing on the video to be coded by utilizing one or more candidate coding parameters, and determining rate distortion cost values corresponding to the one or more candidate coding parameters;

and selecting a minimum rate distortion cost value from the determined rate distortion cost values, and determining candidate coding parameters corresponding to the minimum rate distortion cost value as the coding parameters of the video to be coded.

Here, the encoding parameters at least include a parameter indicating a partition manner of the video to be encoded and a parameter constructing a prediction value of a coding block in the video to be encoded.

Further, in some embodiments, the encoding the video to be encoded may include: and writing the coding parameters into a code stream.

It should be noted that a rate distortion cost function can be constructed from the target Lagrangian multiplier and the target distortion value. The video to be encoded is then pre-coded with one or more candidate coding parameters to determine the rate distortion cost value corresponding to each candidate. The minimum rate distortion cost value is selected from the determined values, and the candidate coding parameters corresponding to it are determined as the coding parameters of the video to be encoded. These are the optimal coding parameters (those with the minimum rate distortion cost), and the video is encoded according to them. In this process, the encoder may also write the coding parameters into the code stream for transmission to the decoder, so that the video can be reconstructed at the decoder side.
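The selection procedure above can be sketched in a few lines. The sketch assumes the common cost form J = D + λ·R, which this passage does not spell out, and weights the semantic terms by 1 − θ and the fidelity terms by θ, as the description of θ indicates. `PrecodeResult`, `rd_cost`, `select_coding_parameters`, and the `precode` callback are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Iterable, Optional, Tuple


@dataclass(frozen=True)
class PrecodeResult:
    """Hypothetical outcome of pre-coding with one candidate parameter set."""
    d_semantic: float  # first distortion value (semantic)
    d_fidelity: float  # second distortion value (e.g. SSE)
    bits: float        # rate spent by this candidate


def rd_cost(res: PrecodeResult, lam_semantic: float, lam_fidelity: float,
            theta: float) -> float:
    # (1 - theta) weights the semantic terms, theta the fidelity terms.
    d_target = (1.0 - theta) * res.d_semantic + theta * res.d_fidelity
    lam_target = (1.0 - theta) * lam_semantic + theta * lam_fidelity
    # Assumed cost form: J = D_target + lambda_target * R.
    return d_target + lam_target * res.bits


def select_coding_parameters(candidates: Iterable[Hashable],
                             precode: Callable[[Hashable], PrecodeResult],
                             lam_semantic: float, lam_fidelity: float,
                             theta: float) -> Tuple[Hashable, float]:
    """Return the candidate with the minimum rate distortion cost and that cost."""
    best: Optional[Tuple[Hashable, float]] = None
    for cand in candidates:
        cost = rd_cost(precode(cand), lam_semantic, lam_fidelity, theta)
        if best is None or cost < best[1]:
            best = (cand, cost)
    if best is None:
        raise ValueError("no candidate coding parameters supplied")
    return best
```

In a real encoder the `precode` step is the expensive part, since each candidate (for example a partition mode or prediction mode) has to be trial-encoded before its distortion and rate are known.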

In the embodiment of the present application, a VVC standard encoder is taken as an example. To improve the semantic accuracy of the reconstructed video while ensuring fidelity and meeting subjective viewing requirements in human-machine vision scenarios, the distortion criterion in the VVC rate distortion optimization process is modified into a weighting of the semantic distortion D_miou and the fidelity distortion SSE, as expressed by the above equation (17) or equation (19). Correspondingly, the target Lagrangian multiplier is modified into a weighting of λ_miou and λ_SSE, as shown in the above equation (16) or equation (18). In this way, the rate distortion process of the VVC standard encoder is optimized according to a rate distortion optimization algorithm with multiple distortion criteria oriented to human-machine vision, which improves the semantic segmentation accuracy of the reconstructed video at a given code rate while maintaining good fidelity.

Illustratively, the method is implemented on the VVC reference software test platform (VTM), version VTM7.1. Different QPs are selected, a test video sequence from a large-scale urban scene dataset is encoded under the random access (RA) configuration, and a semantic segmentation test is performed on the reconstructed video.

First, different QPs are selected and the test video is encoded by the VVC standard encoder, which yields the coding rate and the PSNR of the reconstructed video; semantic segmentation is then performed on the reconstructed video and the segmentation accuracy is calculated. Next, the rate distortion process in VVC is optimized according to the video coding method of the embodiment of the present application, different QPs are selected, and the test video is encoded by the optimized encoder, which yields the coding rate; semantic segmentation is performed on the resulting reconstructed video and the segmentation accuracy is calculated. From the experimental results in the two cases, the BD-rate and BD-miou of the reconstructed video relative to the VVC standard encoder can be calculated. These measure how the video coding method of the embodiment of the present application compares with the VVC standard encoder in terms of semantic accuracy at the same code rate.

Specifically, in the case of QP of 22, 27, 32, and 37, the performance of the video coding method of the embodiment of the present application in terms of semantic accuracy can be measured by calculating BD-miou and BD-rate of the reconstructed video of the embodiment of the present application compared to the reconstructed video of the VVC standard encoder according to the experimental results, which are shown in table 1. The BD-miou represents the improvement condition of the semantic accuracy of the reconstructed video under the condition of the same code rate, and the BD-miou is larger than 0, which shows that the semantic accuracy is improved; BD-miou is less than 0, indicating a decrease in semantic accuracy; BD-rate represents the increase of the coding rate under the condition of the same semantic accuracy, and BD-rate is greater than 0, which indicates that the coding rate is increased; the BD-rate is less than 0, which indicates that the code rate is reduced, namely the coding efficiency is improved.

In terms of video fidelity, the PSNR and the coding rate of the reconstructed video are collected from the files produced by encoding, and the BD-rate and BD-PSNR of the reconstructed video relative to the VVC standard encoder are obtained from the experimental results. These measure how the video coding method of the embodiment of the present application compares with the VVC standard encoder in terms of fidelity at the same code rate.

Specifically, the BD-PSNR and BD-rate of the reconstructed video of the embodiment of the present application compared to the reconstructed video of the VVC standard encoder are calculated according to the experimental results, which can measure the performance of the video encoding method of the embodiment of the present application in terms of fidelity, and the experimental results are shown in table 2. Here, the BD-PSNR represents the increase of the fidelity of the reconstructed video under the condition of the same code rate, and the BD-PSNR is greater than 0, which indicates that the fidelity is increased; BD-PSNR is less than 0, indicating a reduced fidelity; the BD-rate represents the increase of the coding rate under the condition of the same fidelity, and the BD-rate is greater than 0, which indicates that the coding rate is increased; when the BD-rate is less than 0, the code rate is reduced, namely the coding efficiency is improved.
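As a rough illustration of how a BD-rate figure is obtained from two rate-quality curves, the sketch below averages the difference in log10(rate) over the quality range shared by both curves. It is a simplified variant: the standard Bjøntegaard calculation fits a cubic polynomial to each curve, whereas this sketch uses piecewise-linear interpolation to stay dependency-free. The function names and the sample points in the usage note are hypothetical and are not taken from the experiment.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (rate, quality), e.g. (kbps, PSNR or mIoU)


def _interp(x: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Linear interpolation; xs must be strictly increasing and contain x."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x lies outside the curve")


def _integrate(xs: Sequence[float], ys: Sequence[float], lo: float, hi: float) -> float:
    """Exact integral of the piecewise-linear curve over [lo, hi]."""
    knots = sorted({lo, hi, *(x for x in xs if lo < x < hi)})
    vals = [_interp(k, xs, ys) for k in knots]
    return sum(0.5 * (vals[i] + vals[i + 1]) * (knots[i + 1] - knots[i])
               for i in range(len(knots) - 1))


def _split(curve: Sequence[Point]) -> Tuple[List[float], List[float]]:
    """Sort by quality and return (qualities, log10 rates)."""
    pts = sorted(curve, key=lambda p: p[1])
    return [q for _, q in pts], [math.log10(r) for r, _ in pts]


def bd_rate_percent(anchor: Sequence[Point], test: Sequence[Point]) -> float:
    """Average rate change (%) of `test` versus `anchor` at equal quality."""
    qa, la = _split(anchor)
    qt, lt = _split(test)
    lo, hi = max(qa[0], qt[0]), min(qa[-1], qt[-1])
    if hi <= lo:
        raise ValueError("curves share no quality interval")
    avg = (_integrate(qt, lt, lo, hi) - _integrate(qa, la, lo, hi)) / (hi - lo)
    return (10.0 ** avg - 1.0) * 100.0
```

A negative result means the test encoder needs less rate for the same quality. BD-PSNR and BD-miou follow the same pattern with the axes swapped: the quality difference is averaged over the shared range of log-rate.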

TABLE 1

Setting	BD-miou	BD-rate
θ = 0.75	0.0112	-24.8673

TABLE 2

Setting	BD-PSNR	BD-rate
θ = 0.75	0.0316	-1.0836

Further, according to the above experimental results, the following technical advantages are obtained by using the video coding method of the embodiment of the present application:

According to table 1, the BD-miou obtained from the experimental result is 0.0112, which shows that the video coding method of the embodiment of the present application can improve the semantic segmentation accuracy of the reconstructed video at the same code rate. In addition, at lower code rates, the overall semantic effect of the embodiment of the present application is better than that of the VVC standard encoder.

According to table 1, the BD-rate obtained from the experimental result is -24.8673, which shows that the video coding method of the embodiment of the present application can reduce the coding rate of the video at the same semantic accuracy.

According to table 2, the BD-PSNR obtained according to the experimental result is 0.0316, which shows that the video coding method according to the embodiment of the present application can improve the fidelity of the reconstructed video under the condition of the same code rate. That is to say, the embodiment of the application can improve the fidelity of the reconstructed video under the condition of the same code rate.

According to table 2, the BD-rate obtained from the experimental result is -1.0836, which shows that the video coding method of the embodiment of the present application can reduce the coding rate of the video at the same fidelity.

In addition, after the video coding method of the embodiment of the application is used, the PSNR performance of the reconstructed video is basically not reduced compared with that of a VVC standard encoder. Under the condition of a low code rate, the subjective performance of the video coding method is superior to VVC, which shows that the video coding method of the embodiment of the application can ensure the fidelity of the reconstructed video while improving the semantic accuracy, and meets the subjective watching requirement of the video. That is to say, the embodiment of the application can ensure the fidelity of the video while improving the semantic effect.

It should be further noted that, in the video encoding method in the embodiment of the present application, the rate distortion process in the VVC is optimized, and the process of video encoding and decoding and the code stream structure are not changed, so that the complexity of encoding and decoding is not increased. In addition, the video coding method of the embodiment of the application can also reduce the coding rate of the video, thereby shortening the time required by coding and improving the coding speed.

That is to say, for an application scenario of video semantic segmentation, i.e. human-machine vision, in the embodiment of the present application, a semantic distortion metric is defined, a corresponding first lagrangian multiplier is derived, and the weights of semantic distortion and SSE distortion and the weights of the first lagrangian multiplier and the second lagrangian multiplier are adjusted by preset parameters (including a first preset parameter and a second preset parameter), so that a rate distortion process of video coding is optimized, so that the semantic segmentation accuracy of a reconstructed video can be improved under the condition of a certain code rate, and better fidelity performance can be maintained.

The embodiment provides a video coding method applied to an encoder. Determining a first Lagrange multiplier and a second Lagrange multiplier according to a pre-parameter by determining the pre-parameter of a video to be coded; determining a target Lagrangian multiplier according to the first Lagrangian multiplier and the second Lagrangian multiplier; determining a first distortion value according to a first distortion metric criterion, wherein the first distortion metric criterion comprises a semantic distortion metric criterion; determining a second distortion value according to a second distortion metric criterion, wherein the second distortion metric criterion comprises a numerical error metric criterion; determining a target distortion value according to the first distortion value and the second distortion value; and determining the encoding parameters of the video to be encoded by using the target Lagrange multiplier and the target distortion value, and encoding the video to be encoded. Therefore, rate distortion optimization is carried out by comprehensively considering the first distortion measurement criterion based on the semantic distortion measurement and the second distortion measurement criterion based on the numerical error measurement in video coding, the method can be well suitable for application scenes facing machine vision and man-machine vision, the semantic segmentation accuracy of the reconstructed video can be improved under the condition of a certain code rate, and meanwhile, better fidelity performance can be kept, so that the coding efficiency is improved.

Based on the same inventive concept as the foregoing embodiment, refer to fig. 7, which illustrates a schematic structural diagram of an encoder 70 provided in an embodiment of the present application. As shown in fig. 7, the encoder 70 may include: a determination unit 701, a calculation unit 702, and an encoding unit 703; wherein:

a determining unit 701 configured to determine a pre-parameter of a video to be encoded, and determine a first lagrangian multiplier and a second lagrangian multiplier according to the pre-parameter;

a calculating unit 702 configured to determine a target lagrangian multiplier according to the first lagrangian multiplier and the second lagrangian multiplier;

a determining unit 701, further configured to determine a first distortion value according to a first distortion metric criterion, wherein the first distortion metric criterion comprises a semantic distortion metric criterion; and determining a second distortion value in accordance with a second distortion metric criterion, wherein the second distortion metric criterion comprises a numerical error metric criterion;

a calculating unit 702, further configured to determine a target distortion value according to the first distortion value and the second distortion value;

an encoding unit 703 configured to determine an encoding parameter of the video to be encoded by using the target lagrangian multiplier and the target distortion value, and encode the video to be encoded.

In some embodiments, the pre-parameter includes a quantization parameter, and the determining unit 701 is further configured to determine the quantization parameter of a coding unit in the video to be coded, where the coding unit includes at least one of: picture, slice, sub-picture, tile, coded block.

In some embodiments, the determining unit 701 is further configured to determine a first calculation model parameter, where the first calculation model represents a correspondence between the first lagrangian multiplier and a quantization parameter; and determining the first Lagrangian multiplier according to the quantization parameter and the first calculation model.

In some embodiments, the determining unit 701 is further configured to set, in the first calculation model, the first Lagrangian multiplier equal to a weighted exponential power of the quantization parameter; wherein the first calculation model parameter comprises a first exponential parameter indicative of the exponential power and a first weighting coefficient indicative of the weighting.

In some embodiments, the determining unit 701 is further configured to set the first calculation model parameter to a preset value.

In some embodiments, the determining unit 701 is further configured to determine, based on a test video, a first relation function between the first distortion value and a code rate of the test video using the first distortion metric criterion, perform a derivative operation on the first relation function, and determine a derivative function of the first relation function; and determining a second relation function between the code rate and the quantization parameter based on the test video, and determining the first calculation model parameter according to the derivative function and the second relation function.

In some embodiments, the pre-parameters include a quantization parameter and a target code rate, and the determining unit 701 is further configured to determine the quantization parameter and the target code rate of a coding unit in the video to be coded, where the coding unit includes at least one of: picture, slice, sub-picture, tile, coded block.

In some embodiments, the determining unit 701 is further configured to determine a second calculation model parameter, where the second calculation model represents a correspondence between the first lagrangian multiplier and a code rate; and determining the first Lagrangian multiplier according to the target code rate and the second calculation model.

In some embodiments, the determining unit 701 is further configured to set, in the second calculation model, the first Lagrangian multiplier equal to a weighted exponential power of the target code rate; wherein the second calculation model parameter comprises a second exponential parameter indicative of the exponential power and a second weighting coefficient indicative of the weighting.

In some embodiments, the determining unit 701 is further configured to set the second calculation model parameter to a preset value.

In some embodiments, the determining unit 701 is further configured to determine, based on a test video, a first relation function between the first distortion value and a code rate of the test video using the first distortion metric criterion; and performing derivative operation on the first relation function to determine the second calculation model parameter.
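The link between the fitted relation function and the second calculation model can be illustrated with a short derivation. The power-law form of the first relation function and the symbols a, b, α, and β below are assumptions made only for illustration; this passage does not state the fitted form. The first line is the usual stationarity condition for a cost of the form J = D + λR.

```latex
\begin{align}
J &= D_{miou}(R) + \lambda_{miou} R, \qquad
\frac{\partial J}{\partial R} = 0
\;\Rightarrow\;
\lambda_{miou} = -\frac{\mathrm{d} D_{miou}}{\mathrm{d} R} \\
D_{miou}(R) &= a R^{b} \quad (\text{assumed form})
\;\Rightarrow\;
\lambda_{miou} = -a b \, R^{\,b-1} = \alpha R^{\beta},
\qquad \alpha = -a b, \quad \beta = b - 1
\end{align}
```

Under this assumption, with a > 0 and b < 0 (distortion falls as rate rises), the weighting coefficient α is positive and the multiplier is a weighted power of the code rate, which matches the shape of the second calculation model.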

In some embodiments, the determining unit 701 is further configured to determine a target bitrate of a coding unit in the video to be coded by using a bit allocation manner.

In some embodiments, the determining unit 701 is further configured to determine the second lagrangian multiplier according to a preset third calculation model; wherein the third computational model represents a correspondence between the second Lagrangian multiplier and a quantization parameter.

In some embodiments, the determining unit 701 is further configured to determine a quantization parameter of a coding unit in the video to be coded in a rate control manner.

In some embodiments, the determining unit 701 is further configured to set the quantization parameter to a preset value.

In some embodiments, the determining unit 701 is further configured to perform semantic segmentation on the test video based on the test video, and determine semantic accuracy of one or more categories; and determining a target semantic accuracy from the semantic accuracies of the one or more categories;

the calculating unit 702 is further configured to perform a distortion measure on the target semantic accuracy by using a fourth calculation model, so as to obtain the first distortion value.

In some embodiments, the calculating unit 702 is further configured to calculate a weighted sum of the semantic accuracies of the one or more classes, the resulting weighted sum being determined as the target semantic accuracy.

In some embodiments, the determining unit 701 is further configured to determine the fourth calculation model parameter, where the fourth calculation model represents a correspondence between the first distortion value and the target semantic accuracy; and obtaining the first distortion value according to the target semantic accuracy and the fourth calculation model.

In some embodiments, the determining unit 701 is further configured to set, in the fourth calculation model, the first distortion value equal to a weighted logarithm of the target semantic accuracy; wherein the fourth calculation model parameter comprises a base parameter indicative of the logarithm and a fourth weighting coefficient parameter indicative of the weighting.

In some embodiments, the determining unit 701 is further configured to set the fourth calculation model parameter to a preset value.
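The semantic distortion computation described in the preceding embodiments can be sketched as follows: per-class accuracies are combined into a target semantic accuracy by a weighted sum, and the first distortion value is a weighted logarithm of that accuracy. The function names, the equal default class weights, and the example base and weighting coefficient are assumptions for illustration, since the preset values are not given here.

```python
import math
from typing import Optional, Sequence


def target_semantic_accuracy(class_accuracies: Sequence[float],
                             weights: Optional[Sequence[float]] = None) -> float:
    """Weighted sum of per-class accuracies (equal weights give the mean)."""
    if not class_accuracies:
        raise ValueError("at least one class accuracy is required")
    if weights is None:
        # Assumption: equal weights, so the result is the plain mean.
        weights = [1.0 / len(class_accuracies)] * len(class_accuracies)
    if len(weights) != len(class_accuracies):
        raise ValueError("weights and accuracies must have the same length")
    return sum(w * a for w, a in zip(weights, class_accuracies))


def semantic_distortion(accuracy: float, base: float, weight: float) -> float:
    """First distortion value as weight * log_base(accuracy)."""
    if not 0.0 < accuracy <= 1.0:
        raise ValueError("accuracy must lie in (0, 1]")
    return weight * math.log(accuracy, base)
```

With a negative weighting coefficient, for instance `semantic_distortion(acc, base=math.e, weight=-1.0)`, the distortion is zero at perfect accuracy and grows as accuracy drops; whether the preset coefficient is in fact negative is an assumption of this sketch.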

In some embodiments, the determining unit 701 is further configured to determine fifth calculation model parameters, the fifth calculation model representing a third relation function between the first distortion value and a mean square error; and determining a target mean square error of an encoding unit in the video to be encoded, and determining the first distortion value according to the target mean square error and the fifth calculation model.

In some embodiments, the determining unit 701 is further configured to set, in the fifth calculation model, the first distortion value equal to the product of the target mean square error and a first parameter factor, plus a second parameter factor; wherein the fifth calculation model parameter includes the first parameter factor and the second parameter factor.

In some embodiments, the determining unit 701 is further configured to set the fifth calculation model parameter to a preset value.

In some embodiments, the determining unit 701 is further configured to determine, based on a test video, a third relation function between the first distortion value and a mean square error of the test video using the first distortion metric criterion; and determining the fifth calculation model parameter according to the third relation function.
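One way to obtain the two parameter factors of the fifth calculation model from test-video measurements is an ordinary least-squares line fit, sketched below. The use of least squares and the names `fit_fifth_model` and `first_distortion_from_mse` are assumptions; the text only says that the parameters are determined from the relation between the first distortion value and the mean square error.

```python
from typing import Sequence, Tuple


def fit_fifth_model(mse_values: Sequence[float],
                    distortion_values: Sequence[float]) -> Tuple[float, float]:
    """Fit distortion = p1 * mse + p2 by ordinary least squares."""
    n = len(mse_values)
    if n < 2 or n != len(distortion_values):
        raise ValueError("need at least two paired measurements")
    mean_x = sum(mse_values) / n
    mean_y = sum(distortion_values) / n
    sxx = sum((x - mean_x) ** 2 for x in mse_values)
    if sxx == 0.0:
        raise ValueError("MSE values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(mse_values, distortion_values))
    p1 = sxy / sxx            # first parameter factor (slope)
    p2 = mean_y - p1 * mean_x  # second parameter factor (offset)
    return p1, p2


def first_distortion_from_mse(target_mse: float, p1: float, p2: float) -> float:
    """Evaluate the fitted linear model at the target mean square error."""
    return p1 * target_mse + p2
```

Once fitted offline, the model lets the encoder estimate the semantic distortion of a coding unit from its mean square error alone, without running semantic segmentation inside the rate distortion loop.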

In some embodiments, the determining unit 701 is further configured to determine a reconstruction value of a coding unit in the video, wherein the coding unit includes at least one of: pictures, slices, sub-pictures, tiles, coded blocks;

a calculation unit 702, further configured to determine the second distortion value from a reconstructed value and an original value of the encoding unit based on the numerical error criterion; wherein the numerical error criterion is one of: sum of absolute error criterion, mean absolute error criterion, sum of squared error criterion, mean square error criterion.
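The four numerical error criteria listed above can be written out directly. This is a minimal sketch over flat sequences of sample values; the function names are illustrative, and a real encoder would operate on the blocks of a coding unit.

```python
from typing import List, Sequence


def _diffs(original: Sequence[float], reconstructed: Sequence[float]) -> List[float]:
    """Per-sample differences between original and reconstructed values."""
    if len(original) == 0 or len(original) != len(reconstructed):
        raise ValueError("inputs must be non-empty and of equal length")
    return [o - r for o, r in zip(original, reconstructed)]


def sad(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Sum of absolute errors."""
    return sum(abs(d) for d in _diffs(original, reconstructed))


def mae(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Mean absolute error."""
    return sad(original, reconstructed) / len(original)


def sse(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Sum of squared errors."""
    return sum(d * d for d in _diffs(original, reconstructed))


def mse(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Mean square error."""
    return sse(original, reconstructed) / len(original)
```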

In some embodiments, the determining unit 701 is further configured to determine a first preset parameter; the first preset parameter is used for controlling weight values corresponding to the first Lagrange multiplier and the second Lagrange multiplier;

the calculating unit 702 is further configured to perform weighted calculation on the first lagrangian multiplier and the second lagrangian multiplier by using the first preset parameter, so as to obtain the target lagrangian multiplier.

In some embodiments, referring to fig. 7, the encoder 70 may further include a configuration unit 704 configured to set the first preset parameter according to configuration information of the encoder.

In some embodiments, the configuration unit 704 is further configured to set the target lagrangian multiplier equal to a weighted sum of the first lagrangian multiplier and the second lagrangian multiplier when the configuration information of the encoder indicates that the first preset parameter is equal to k, where k is any value greater than or equal to 0 and less than or equal to 1, a weighting coefficient of the first lagrangian multiplier is set equal to 1-k, and a weighting coefficient of the second lagrangian multiplier is set equal to k.

In some embodiments, k is equal to 0.75.

In some embodiments, the determining unit 701 is further configured to determine a second preset parameter; the second preset parameter is used for controlling weight values corresponding to the first distortion value and the second distortion value;

the calculating unit 702 is further configured to perform weighted calculation on the first distortion value and the second distortion value by using the second preset parameter, so as to obtain the target distortion value.

In some embodiments, the configuration unit 704 is further configured to set the second preset parameter according to the configuration information of the encoder.

In some embodiments, the configuring unit 704 is further configured to set the target distortion value equal to a weighted sum of the first distortion value and the second distortion value when the configuration information of the encoder indicates that the second preset parameter is equal to m, where m is any value greater than or equal to 0 and less than or equal to 1, a weighting coefficient of the first distortion value is set equal to 1-m, and a weighting coefficient of the second distortion value is set equal to m.

In some embodiments, m is equal to 0.75.
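The two weighted sums described in the preceding embodiments can be written compactly as below. The subscripts 1 and 2 denote the first and second quantities, and the numerical values in the last line are hypothetical, chosen only to show the arithmetic for k = 0.75.

```latex
\begin{align}
\lambda_{target} &= (1 - k)\,\lambda_{1} + k\,\lambda_{2}, \qquad 0 \le k \le 1 \\
D_{target} &= (1 - m)\,D_{1} + m\,D_{2}, \qquad 0 \le m \le 1 \\
\text{e.g. } k = 0.75,\ \lambda_{1} = 40,\ \lambda_{2} = 100:\quad
\lambda_{target} &= 0.25 \times 40 + 0.75 \times 100 = 85
\end{align}
```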

In some embodiments, the determining unit 701 is further configured to construct a rate-distortion cost function based on the target lagrangian multiplier and the target distortion value; carrying out pre-coding processing on the video to be coded by utilizing one or more candidate coding parameters, and determining rate distortion cost values corresponding to the one or more candidate coding parameters; and selecting a minimum rate distortion cost value from the determined rate distortion cost values, and determining candidate coding parameters corresponding to the minimum rate distortion cost value as the coding parameters of the video to be coded.

In some embodiments, the encoding parameters include at least a parameter indicating a partitioning manner of the video to be encoded and a parameter constructing a prediction value of a coding block in the video to be encoded.

In some embodiments, referring to fig. 7, the encoder 70 may further include a writing unit 705 configured to write the encoding parameters into the code stream.

It is understood that in the embodiments of the present application, a "unit" may be a part of a circuit, a part of a processor, a part of a program or software, and the like, and may also be a module, and may also be non-modular. Moreover, each component in this embodiment may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware or a form of a software functional module.

Based on this understanding, the technical solution of the present embodiment, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute all or part of the steps of the method of the present embodiment. The aforementioned storage medium includes: a USB flash disk, a removable hard disk, a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.

Accordingly, the present embodiment provides a computer storage medium applied to the encoder 70, which stores a computer program that, when executed by at least one processor, implements the steps of the method of any one of the preceding embodiments.

Based on the above-mentioned composition of the encoder 70 and the computer storage medium, fig. 8 shows a specific hardware structure example of the encoder 70 provided in the embodiment of the present application, which may include: a communication interface 801, a memory 802, and a processor 803; the various components are coupled together by a bus system 804. It is understood that the bus system 804 is used to enable connection and communication among these components. The bus system 804 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are all labeled as the bus system 804 in fig. 8. The communication interface 801 is used for receiving and sending signals in the process of exchanging information with other external network elements;

a memory 802 for storing a computer program capable of running on the processor 803;

a processor 803, configured to, when running the computer program, perform:

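determining a pre-parameter of a video to be encoded, and determining a first Lagrangian multiplier and a second Lagrangian multiplier according to the pre-parameter;

determining a target Lagrangian multiplier according to the first Lagrangian multiplier and the second Lagrangian multiplier;

determining a first distortion value according to a first distortion metric criterion, wherein the first distortion metric criterion comprises a semantic distortion metric criterion;

determining a second distortion value according to a second distortion metric criterion, wherein the second distortion metric criterion comprises a numerical error metric criterion;

determining a target distortion value according to the first distortion value and the second distortion value;

and determining the encoding parameters of the video to be encoded by using the target Lagrangian multiplier and the target distortion value, and encoding the video to be encoded.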

It will be appreciated that the memory 802 in the embodiments of the present application can be either volatile memory or non-volatile memory, or can include both volatile and non-volatile memory. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM), which serves as an external cache. By way of example, and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDRSDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), Synchlink Dynamic Random Access Memory (SLDRAM), and Direct Rambus Random Access Memory (DRRAM). The memory 802 of the systems and methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.

The processor 803 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 803 or by instructions in the form of software. The processor 803 may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. The processor may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software modules may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory 802, and the processor 803 reads the information in the memory 802 and completes the steps of the above method in combination with its hardware.

It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For a hardware implementation, the processing units may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), general purpose processors, controllers, micro-controllers, microprocessors, other electronic units configured to perform the functions described herein, or a combination thereof.

For a software implementation, the techniques described herein may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.

Optionally, as another embodiment, the processor 803 is further configured to perform the steps of the method of any one of the previous embodiments when running the computer program.

The present embodiment provides an encoder including a determination unit, a calculation unit, and an encoding unit; the determining unit is configured to determine a pre-parameter of a video to be encoded, and determine a first Lagrangian multiplier and a second Lagrangian multiplier according to the pre-parameter; the computing unit is configured to determine a target Lagrangian multiplier according to the first Lagrangian multiplier and the second Lagrangian multiplier; and the determining unit is further configured to determine the first distortion value in accordance with a first distortion metric criterion, wherein the first distortion metric criterion comprises a semantic distortion metric criterion; and determining a second distortion value according to a second distortion metric criterion, wherein the second distortion metric criterion comprises a numerical error metric criterion; the calculation unit is further configured to determine a target distortion value based on the first distortion value and the second distortion value; the encoding unit is configured to determine an encoding parameter of the video to be encoded by using the target lagrangian multiplier and the target distortion value, and encode the video to be encoded. Therefore, rate distortion optimization is carried out by comprehensively considering the first distortion measurement criterion based on the semantic distortion measurement and the second distortion measurement criterion based on the numerical error measurement in video coding, the method can be well suitable for application scenes facing machine vision and man-machine vision, the semantic segmentation accuracy of the reconstructed video can be improved under the condition of a certain code rate, and meanwhile, better fidelity performance can be kept, so that the coding efficiency is improved.

Referring to fig. 9, a schematic diagram of a component structure of a video system provided in an embodiment of the present application is shown. As shown in fig. 9, the video system 90 may include an encoder 901 and a decoder 902. The encoder 901 may be the encoder 70 described in any of the previous embodiments.

The encoder 901 is configured to: determine a pre-parameter of a video to be encoded, and determine a first Lagrangian multiplier and a second Lagrangian multiplier according to the pre-parameter; determine a target Lagrangian multiplier according to the first Lagrangian multiplier and the second Lagrangian multiplier; determine a first distortion value according to a first distortion metric criterion, wherein the first distortion metric criterion comprises a semantic distortion metric criterion; determine a second distortion value according to a second distortion metric criterion, wherein the second distortion metric criterion comprises a numerical error metric criterion; determine a target distortion value according to the first distortion value and the second distortion value; determine encoding parameters of the video to be encoded by using the target Lagrangian multiplier and the target distortion value; encode the video to be encoded to generate a code stream; and transmit the code stream to the decoder 902.

The decoder 902 is configured to parse the code stream to obtain a decoded video.

Further, in some embodiments, the decoder 902 is further configured to parse the code stream, obtain a decoding parameter, and obtain the decoded video according to the decoding parameter; the decoding parameters at least comprise parameters indicating the dividing mode of the video to be decoded and parameters for constructing a predicted value of a decoding block in the video to be decoded.

In the embodiments of the present application, the video system 90 performs rate-distortion optimization in video coding by jointly considering a first distortion metric criterion based on a semantic distortion metric and a second distortion metric criterion based on a numerical error metric. The video system 90 is therefore well suited to application scenarios oriented to machine vision and human-machine vision: at a given code rate, it can improve the semantic segmentation accuracy of the reconstructed video while maintaining good fidelity, thereby improving coding efficiency.
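The rate-distortion optimization performed by the encoder 901 can be pictured as choosing, among candidate coding parameters, the one with the smallest rate-distortion cost. The Python sketch below assumes the conventional cost form J = D + λ·R, with D the target distortion value, λ the target Lagrangian multiplier, and R the number of bits produced by pre-coding. The cost form, the `Candidate` type, and all numeric values are assumptions made for illustration rather than details confirmed by this passage.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Candidate:
    name: str          # hypothetical label for one set of candidate coding parameters
    distortion: float  # target distortion value obtained by pre-coding
    bits: float        # rate consumed by pre-coding with this candidate


def rd_cost(candidate: Candidate, lam: float) -> float:
    """Assumed cost form: J = D + lambda * R."""
    return candidate.distortion + lam * candidate.bits


def select_candidate(candidates: Iterable[Candidate], lam: float) -> Candidate:
    """Return the candidate with the minimum rate-distortion cost."""
    pool = list(candidates)
    if not pool:
        raise ValueError("at least one candidate is required")
    return min(pool, key=lambda c: rd_cost(c, lam))


if __name__ == "__main__":
    pool = [Candidate("fine_partition", 10.0, 100.0),
            Candidate("coarse_partition", 14.0, 40.0)]
    # A larger multiplier penalizes rate more heavily.
    print(select_candidate(pool, 0.1).name)
    print(select_candidate(pool, 0.01).name)
```

In this toy example the larger multiplier favors the cheaper candidate and the smaller multiplier favors the lower-distortion candidate, which is the trade-off that the target Lagrangian multiplier controls.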

It should also be noted that, in the present application, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of another like element in the process, method, article, or apparatus that comprises the element.

The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.

The methods disclosed in the several method embodiments provided in the present application may be combined arbitrarily without conflict to obtain new method embodiments.

Features disclosed in several of the product embodiments provided in the present application may be combined in any combination to yield new product embodiments without conflict.

The features disclosed in the several method or apparatus embodiments provided herein may be combined in any combination to arrive at a new method or apparatus embodiment without conflict.

The above describes only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Industrial applicability

In the embodiments of the present application, a first Lagrangian multiplier and a second Lagrangian multiplier are determined according to a pre-parameter of a video to be encoded; a target Lagrangian multiplier is determined according to the first Lagrangian multiplier and the second Lagrangian multiplier; a first distortion value is determined according to a first distortion metric criterion, wherein the first distortion metric criterion comprises a semantic distortion metric criterion; a second distortion value is determined according to a second distortion metric criterion, wherein the second distortion metric criterion comprises a numerical error metric criterion; a target distortion value is determined according to the first distortion value and the second distortion value; and encoding parameters of the video to be encoded are determined by using the target Lagrangian multiplier and the target distortion value, and the video to be encoded is encoded. In this way, rate-distortion optimization in video coding jointly considers the first distortion metric criterion, based on a semantic distortion metric, and the second distortion metric criterion, based on a numerical error metric. The method is therefore well suited to application scenarios oriented to machine vision and human-machine vision: at a given code rate, it can improve the semantic segmentation accuracy of the reconstructed video while maintaining good fidelity, thereby improving coding efficiency.
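The second distortion value is a numerical error between the original and reconstructed sample values of a coding unit. A minimal Python sketch of four common numerical error measures (sum of absolute errors, mean absolute error, sum of squared errors, and mean square error) is given below. The function name, the flat-list representation of a coding block, and the sample values are assumptions made for illustration.

```python
from typing import Dict, Sequence


def numerical_errors(original: Sequence[float],
                     reconstructed: Sequence[float]) -> Dict[str, float]:
    """Compute SAD, MAD, SSE and MSE between two equally sized sample lists."""
    if len(original) != len(reconstructed) or len(original) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [o - r for o, r in zip(original, reconstructed)]
    count = len(diffs)
    sad = sum(abs(d) for d in diffs)   # sum of absolute errors
    sse = sum(d * d for d in diffs)    # sum of squared errors
    return {"SAD": sad, "MAD": sad / count, "SSE": sse, "MSE": sse / count}


if __name__ == "__main__":
    # Hypothetical 2x2 block flattened in raster order.
    print(numerical_errors([10, 12, 14, 16], [11, 12, 12, 16]))
```

Any one of the four measures could serve as the second distortion value; the sum-based measures grow with block size, whereas the mean-based measures are normalized by the number of samples.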

Claims

A video encoding method applied to an encoder, the method comprising:

determining a pre-parameter of a video to be coded, and determining a first Lagrange multiplier and a second Lagrange multiplier according to the pre-parameter;

determining a target Lagrangian multiplier according to the first Lagrangian multiplier and the second Lagrangian multiplier;

determining a first distortion value according to a first distortion metric criterion, wherein the first distortion metric criterion comprises a semantic distortion metric criterion;

determining a second distortion value according to a second distortion metric criterion, wherein the second distortion metric criterion comprises a numerical error metric criterion;

determining a target distortion value according to the first distortion value and the second distortion value;

and determining the encoding parameters of the video to be encoded by using the target Lagrange multiplier and the target distortion value, and encoding the video to be encoded.
The method of claim 1, wherein the pre-parameter comprises a quantization parameter, and the determining the pre-parameter of the video to be encoded comprises:

determining a quantization parameter of a coding unit in the video to be coded, wherein the coding unit comprises at least one of the following: picture, slice, sub-picture, tile, coded block.
The method as claimed in claim 2, wherein said determining a first Lagrangian multiplier according to said pre-parameter comprises:

determining a first computational model parameter, the first computational model representing a correspondence between the first Lagrangian multiplier and a quantization parameter;

and determining the first Lagrangian multiplier according to the quantization parameter and the first calculation model.
The method of claim 3, wherein the determining first computational model parameters comprises:

setting, in the first computational model, the first Lagrangian multiplier equal to a weighted value of an exponential power of the quantization parameter;

the first computational model parameter includes a first exponential parameter indicative of the exponential power and a first weighting coefficient indicative of the weighting.
The method of claim 4, wherein the method further comprises:

and setting the first calculation model parameter as a preset value.
The method of claim 4, wherein the method further comprises:

determining a first relation function between the first distortion value and the code rate of the test video by using the first distortion metric criterion based on the test video, and performing derivative operation on the first relation function to determine a derivative function of the first relation function;

determining a second relation function between the code rate and the quantization parameter based on the test video, and determining the first calculation model parameter according to the derivative function and the second relation function.
The method of claim 1, wherein the pre-parameters comprise a quantization parameter and a target code rate, and the determining the pre-parameters of the video to be encoded comprises:

determining a quantization parameter and a target code rate of an encoding unit in the video to be encoded, wherein the encoding unit comprises at least one of the following: picture, slice, sub-picture, tile, coded block.
The method as claimed in claim 7, wherein said determining a first Lagrangian multiplier according to said pre-parameter comprises:

determining a second calculation model parameter, wherein the second calculation model represents the corresponding relation between the first Lagrange multiplier and the code rate;

and determining the first Lagrangian multiplier according to the target code rate and the second calculation model.
The method of claim 8, wherein the determining second computational model parameters comprises:

setting, in the second computational model, the first Lagrangian multiplier equal to a weighted value of an exponential power of the target code rate;

the second calculation model parameter includes a second exponential parameter indicative of the exponential power and a second weighting coefficient indicative of the weighting.
The method of claim 9, wherein the method further comprises:

and setting the parameters of the second calculation model to preset values.
The method of claim 9, wherein the method further comprises:

determining, based on a test video, a first relationship function between the first distortion value and a bitrate of the test video using the first distortion metric criterion;

and carrying out derivative operation on the first relation function to determine the second calculation model parameter.
The method of claim 7, wherein the determining a target code rate for coding units in the video to be encoded comprises:

and determining the target code rate of the coding unit in the video to be coded by using a bit allocation mode.
The method according to claim 2 or 7, wherein said determining a second Lagrangian multiplier according to said pre-parameter comprises:

determining the second Lagrange multiplier according to a preset third calculation model; wherein the third computational model represents a correspondence between the second Lagrangian multiplier and a quantization parameter.
The method of claim 2 or 7, wherein the determining quantization parameters for coding units in the video to be encoded comprises:

and determining the quantization parameter of the coding unit in the video to be coded by using a code rate control mode.
The method of claim 2 or 7, wherein the determining quantization parameters for coding units in the video to be encoded comprises:

and setting the quantization parameter as a preset value.
The method of claim 1, wherein the determining a first distortion value according to a first distortion metric criterion comprises:

performing semantic segmentation on a test video, and determining semantic accuracy of one or more categories;

determining a target semantic accuracy from the semantic accuracies of the one or more categories;

and carrying out distortion measurement on the target semantic accuracy by utilizing a fourth calculation model to obtain the first distortion value.
The method of claim 16, wherein the determining a target semantic accuracy from the semantic accuracies of the one or more categories comprises:

calculating a weighted sum of semantic accuracies of the one or more categories, determining the resulting weighted sum as the target semantic accuracy.
The method of claim 16, wherein the distortion measuring the target semantic accuracy using the fourth computational model to obtain the first distortion value comprises:

determining the fourth computational model parameters, the fourth computational model representing a correspondence between the first distortion value and the target semantic accuracy;

and obtaining the first distortion value according to the target semantic accuracy and the fourth calculation model.
The method of claim 18, wherein said determining fourth computational model parameters comprises:

setting, in the fourth computational model, the first distortion value equal to a weighted value of a logarithm of the target semantic accuracy;

the fourth calculation model parameter includes a base parameter indicative of the logarithm and a fourth weighting coefficient parameter indicative of the weighting.
The method of claim 19, wherein the method further comprises:

and setting the fourth calculation model parameter as a preset value.
The method of claim 1, wherein the determining a first distortion value according to a first distortion metric criterion comprises:

determining fifth computational model parameters representing a third relationship function between the first distortion value and a mean square error;

and determining a target mean square error of an encoding unit in the video to be encoded, and determining the first distortion value according to the target mean square error and the fifth calculation model.
The method of claim 21, wherein said determining fifth computational model parameters comprises:

setting, in the fifth calculation model, the first distortion value equal to the sum of a second parameter factor and the product of the target mean square error and a first parameter factor;

the fifth computational model parameter includes the first parametric factor and the second parametric factor.
The method of claim 22, wherein the method further comprises:

and setting the parameters of the fifth calculation model as preset values.
The method of claim 22, wherein the method further comprises:

determining, based on a test video, a third relationship function between the first distortion value and a mean square error of the test video using the first distortion metric criterion;

and determining the fifth calculation model parameter according to the third relation function.
The method as recited in claim 1, wherein the determining a second distortion value in accordance with a second distortion metric criterion comprises:

determining a reconstruction value for a coding unit in the video, wherein the coding unit comprises at least one of: pictures, slices, sub-pictures, tiles, coded blocks;

determining the second distortion value according to a reconstructed value and an original value of the encoding unit based on the numerical error criterion;

wherein the numerical error criterion is one of: sum of absolute error criterion, mean absolute error criterion, sum of squared error criterion, mean square error criterion.
The method as recited in claim 1, wherein said determining a target Lagrangian multiplier from the first Lagrangian multiplier and the second Lagrangian multiplier comprises:

determining a first preset parameter; the first preset parameter is used for controlling weight values corresponding to the first Lagrangian multiplier and the second Lagrangian multiplier;

and performing weighted calculation on the first Lagrangian multiplier and the second Lagrangian multiplier by using the first preset parameter to obtain the target Lagrangian multiplier.
The method of claim 26, wherein the determining a first preset parameter comprises:

and setting the first preset parameter according to the configuration information of the encoder.
The method of claim 26, wherein the method further comprises:

when the configuration information of the encoder indicates that the first preset parameter is equal to k, setting the target Lagrangian multiplier to be equal to a weighted sum of the first Lagrangian multiplier and the second Lagrangian multiplier, wherein k is any value greater than or equal to 0 and less than or equal to 1, the weighting coefficient of the first Lagrangian multiplier is set to be equal to 1-k, and the weighting coefficient of the second Lagrangian multiplier is set to be equal to k.
The method of claim 28, wherein k is equal to 0.75.
The method as recited in claim 1, wherein the determining a target distortion value as a function of the first distortion value and the second distortion value comprises:

determining a second preset parameter; the second preset parameter is used for controlling weight values corresponding to the first distortion value and the second distortion value;

and performing weighted calculation on the first distortion value and the second distortion value by using the second preset parameter to obtain the target distortion value.
The method of claim 30, wherein said determining a second preset parameter comprises:

and setting the second preset parameter according to the configuration information of the encoder.
The method of claim 31, wherein the method further comprises:

when the configuration information of the encoder indicates that the second preset parameter is equal to m, setting the target distortion value to be equal to a weighted sum of the first distortion value and the second distortion value, where m is any value greater than or equal to 0 and less than or equal to 1, setting a weighting coefficient of the first distortion value to be equal to 1-m, and setting a weighting coefficient of the second distortion value to be equal to m.
The method of claim 32, wherein m is equal to 0.75.
The method according to claim 1, wherein said determining the encoding parameters of the video to be encoded using the target Lagrangian multiplier and the target distortion value comprises:

constructing a rate distortion cost function based on the target Lagrange multiplier and the target distortion value;

carrying out pre-coding processing on the video to be coded by utilizing one or more candidate coding parameters, and determining rate distortion cost values corresponding to the one or more candidate coding parameters;

and selecting a minimum rate distortion cost value from the determined rate distortion cost values, and determining candidate coding parameters corresponding to the minimum rate distortion cost value as the coding parameters of the video to be coded.
The method according to claim 1 or 34, wherein said coding parameters comprise at least a parameter indicating a partitioning manner of said video to be coded and a parameter constructing a prediction value of a coding block in said video to be coded.
The method of claim 35, wherein the encoding the video to be encoded comprises:

and writing the coding parameters into a code stream.
An encoder comprising a determining unit, a calculation unit and an encoding unit; wherein

the determining unit is configured to determine a pre-parameter of a video to be encoded, and determine a first Lagrangian multiplier and a second Lagrangian multiplier according to the pre-parameter;

the calculation unit is configured to determine a target Lagrangian multiplier according to the first Lagrangian multiplier and the second Lagrangian multiplier;

the determining unit is further configured to determine a first distortion value according to a first distortion metric criterion, wherein the first distortion metric criterion comprises a semantic distortion metric criterion; and determining a second distortion value in accordance with a second distortion metric criterion, wherein the second distortion metric criterion comprises a numerical error metric criterion;

the calculation unit is further configured to determine a target distortion value according to the first distortion value and the second distortion value;

the encoding unit is configured to determine an encoding parameter of the video to be encoded by using the target Lagrangian multiplier and the target distortion value, and encode the video to be encoded.
An encoder, the encoder comprising a memory and a processor; wherein

the memory is configured to store a computer program executable on the processor;

the processor is configured to perform the method of any one of claims 1 to 36 when running the computer program.
A computer storage medium, wherein the computer storage medium stores a computer program which, when executed by at least one processor, implements the method of any one of claims 1 to 36.
A video system, the video system comprising an encoder and a decoder; wherein

the encoder is configured to determine a pre-parameter of a video to be encoded, and determine a first Lagrangian multiplier and a second Lagrangian multiplier according to the pre-parameter; determining a target Lagrangian multiplier according to the first Lagrangian multiplier and the second Lagrangian multiplier; and determining a first distortion value according to a first distortion metric criterion, wherein the first distortion metric criterion comprises a semantic distortion metric criterion; and determining a second distortion value according to a second distortion metric criterion, wherein the second distortion metric criterion comprises a numerical error metric criterion; determining a target distortion value according to the first distortion value and the second distortion value; determining encoding parameters of the video to be encoded by using the target Lagrange multiplier and the target distortion value, encoding the video to be encoded to generate a code stream, and transmitting the code stream to the decoder;

the decoder is configured to parse the code stream to obtain a decoded video.
The system of claim 40, wherein,

the decoder is further configured to parse the code stream to obtain decoding parameters, and to obtain the decoded video according to the decoding parameters; the decoding parameters at least comprise a parameter indicating the partition mode of the video to be decoded and a parameter for constructing a predicted value of a decoding block in the video to be decoded.
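For illustration only, the parametric models recited in claims 9, 19 and 22 might be sketched in Python as follows. The sketch reads "a weighted value of an exponential power of the target code rate" as alpha * R ** beta, which is one possible reading and is not confirmed by the text. All function names, parameter names, and numeric values are hypothetical, and no parameter value is taken from the application.

```python
import math


def lambda_from_rate(rate: float, alpha: float, beta: float) -> float:
    """Second calculation model, assumed form: lambda = alpha * rate ** beta."""
    if rate <= 0.0:
        raise ValueError("rate must be positive")
    return alpha * rate ** beta


def distortion_from_accuracy(accuracy: float, weight: float, base: float) -> float:
    """Fourth calculation model, assumed form: D = weight * log_base(accuracy)."""
    if accuracy <= 0.0 or base <= 0.0 or base == 1.0:
        raise ValueError("accuracy and base must be positive, and base must not be 1")
    return weight * math.log(accuracy, base)


def distortion_from_mse(mse: float, factor_a: float, factor_b: float) -> float:
    """Fifth calculation model: D = factor_a * mse + factor_b (linear in the MSE)."""
    return factor_a * mse + factor_b


if __name__ == "__main__":
    print(lambda_from_rate(4.0, alpha=2.0, beta=-0.5))
    print(distortion_from_accuracy(0.5, weight=-1.0, base=2.0))
    print(distortion_from_mse(10.0, factor_a=0.5, factor_b=2.0))
```

A negative weight is used in the logarithmic example because the logarithm of an accuracy below 1 is negative; whether the application intends a negative weighting coefficient is an assumption of this sketch.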