WO2022021422A1

WO2022021422A1 - Video coding method and system, coder, and computer storage medium

Info

Publication number: WO2022021422A1
Application number: PCT/CN2020/106416
Authority: WO
Inventors: 元辉; 周兰; 李明; 姜东冉
Original assignee: Oppo广东移动通信有限公司
Priority date: 2020-07-31
Filing date: 2020-07-31
Publication date: 2022-02-03
Also published as: CN115428451A

Abstract

Disclosed are a video coding method and system, a coder, and a computer storage medium. The method comprises: determining a pre-parameter of a video to be coded, and determining a first Lagrange multiplier and a second Lagrange multiplier according to the pre-parameter; determining a target Lagrange multiplier according to the first Lagrange multiplier and the second Lagrange multiplier; determining a first distortion value according to a first distortion measure criterion, the first distortion measure criterion comprising a semantic distortion measure criterion; determining a second distortion value according to a second distortion measure criterion, the second distortion measure criterion comprising a numerical error measurement criterion; determining a target distortion value according to the first distortion value and the second distortion value; and using the target Lagrange multiplier and the target distortion value to determine a coding parameter of said video, and coding said video.

Description

Video encoding method, encoder, system, and computer storage medium

Technical field

The embodiments of the present application relate to the technical field of video coding and decoding, and in particular, to a video coding method, an encoder, a system, and a computer storage medium.

Background

At present, the International Telecommunication Union (ITU) and the International Organization for Standardization (ISO) have established the Joint Video Experts Team (JVET) to develop the latest video coding standard, H.266/Versatile Video Coding (VVC). H.266/VVC improves on the performance of H.265/High Efficiency Video Coding (HEVC) by about 40% and is the most advanced video compression technology in the industry.

Generally speaking, for the same video encoding algorithm, a higher bit rate yields better reconstructed video quality and smaller distortion; however, the encoded file then occupies more storage space. It is therefore necessary to find a balance between the distortion of the reconstructed video and the bit rate through rate-distortion optimization (RDO), so that the compression effect is optimal.

However, in the current related art, the rate-distortion optimization algorithm either guarantees only the fidelity of the reconstructed video, or guarantees its subjective quality at the cost of greatly reduced fidelity. Especially in video coding for machine vision and human-machine vision, the distortion criteria adopted by existing rate-distortion optimization algorithms are single and incomplete, so these algorithms cannot adapt well to application scenarios oriented to machine vision and human-machine vision.

SUMMARY OF THE INVENTION

Embodiments of the present application provide a video encoding method, encoder, system, and computer storage medium that adapt well to application scenarios oriented to machine vision and human-machine vision. At a given bit rate, they can improve the semantic segmentation accuracy of the reconstructed video while maintaining good fidelity, thereby improving coding efficiency.

The technical solutions of the embodiments of the present application can be implemented as follows:

In a first aspect, an embodiment of the present application provides a video encoding method, which is applied to an encoder, and the method includes:

determining pre-parameters of the video to be encoded, and determining a first Lagrangian multiplier and a second Lagrangian multiplier according to the pre-parameters;

determining a target Lagrangian multiplier according to the first Lagrangian multiplier and the second Lagrangian multiplier;

determining a first distortion value according to a first distortion metric criterion, wherein the first distortion metric criterion includes a semantic distortion metric criterion;

determining a second distortion value according to a second distortion metric criterion, wherein the second distortion metric criterion includes a numerical error metric criterion;

determining a target distortion value according to the first distortion value and the second distortion value;

using the target Lagrangian multiplier and the target distortion value to determine the encoding parameters of the video to be encoded, and encoding the video to be encoded.

In a second aspect, an embodiment of the present application provides an encoder, where the encoder includes a determining unit, a computing unit, and an encoding unit; wherein,

the determining unit is configured to determine pre-parameters of the video to be encoded, and determine a first Lagrangian multiplier and a second Lagrangian multiplier according to the pre-parameters;

the computing unit is configured to determine a target Lagrangian multiplier according to the first Lagrangian multiplier and the second Lagrangian multiplier;

the determining unit is further configured to determine a first distortion value according to a first distortion metric criterion, wherein the first distortion metric criterion includes a semantic distortion metric criterion; and determine a second distortion value according to a second distortion metric criterion, wherein the second distortion metric criterion includes a numerical error metric criterion;

the computing unit is further configured to determine a target distortion value according to the first distortion value and the second distortion value;

the encoding unit is configured to use the target Lagrangian multiplier and the target distortion value to determine the encoding parameters of the video to be encoded, and to encode the video to be encoded.

In a third aspect, an embodiment of the present application provides an encoder, where the encoder includes a memory and a processor; wherein,

the memory is configured to store a computer program executable on the processor;

the processor is configured to execute the method according to the first aspect when running the computer program.

In a fourth aspect, an embodiment of the present application provides a computer storage medium, where the computer storage medium stores a computer program, and the computer program implements the method according to the first aspect when the computer program is executed by at least one processor.

In a fifth aspect, an embodiment of the present application provides a video system, where the video system includes an encoder and a decoder; wherein,

the encoder is configured to: determine pre-parameters of the video to be encoded, and determine a first Lagrangian multiplier and a second Lagrangian multiplier according to the pre-parameters; determine a target Lagrangian multiplier according to the first Lagrangian multiplier and the second Lagrangian multiplier; determine a first distortion value according to a first distortion metric criterion, wherein the first distortion metric criterion includes a semantic distortion metric criterion; determine a second distortion value according to a second distortion metric criterion, wherein the second distortion metric criterion includes a numerical error metric criterion; determine a target distortion value according to the first distortion value and the second distortion value; use the target Lagrangian multiplier and the target distortion value to determine the encoding parameters of the video to be encoded; encode the video to be encoded to generate a code stream; and transmit the code stream to the decoder;

The decoder is configured to parse the code stream to obtain decoded video.

Embodiments of the present application provide a video encoding method, encoder, system, and computer storage medium. Pre-parameters of the video to be encoded are determined, and a first Lagrangian multiplier and a second Lagrangian multiplier are determined according to the pre-parameters; a target Lagrangian multiplier is determined according to the first Lagrangian multiplier and the second Lagrangian multiplier; a first distortion value is determined according to a first distortion metric criterion, wherein the first distortion metric criterion includes a semantic distortion metric criterion; a second distortion value is determined according to a second distortion metric criterion, wherein the second distortion metric criterion includes a numerical error metric criterion; a target distortion value is determined according to the first distortion value and the second distortion value; and the target Lagrangian multiplier and the target distortion value are used to determine the encoding parameters of the video to be encoded and to encode the video to be encoded. In this way, the first distortion metric criterion, based on semantic distortion, and the second distortion metric criterion, based on numerical error, are jointly considered in rate-distortion optimization for video coding, which adapts well to application scenarios oriented to machine vision and human-machine vision. At a given bit rate, this improves the semantic segmentation accuracy of the reconstructed video while maintaining good fidelity, thereby also improving coding efficiency.

Description of drawings

FIG. 1 is a schematic diagram of an RD curve provided by a related technical solution;

FIG. 2 is a schematic block diagram of the system composition of an encoder according to an embodiment of the present application;

FIG. 3 is a schematic flowchart of a video encoding method provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of a curve of a functional relationship between a first distortion value and a bit rate according to an embodiment of the present application;

FIG. 5 is a schematic diagram of a curve of a functional relationship between a bit rate and a quantization parameter according to an embodiment of the present application;

FIG. 6 is a schematic diagram of a curve of a functional relationship between a first distortion value and MSE provided by an embodiment of the present application;

FIG. 7 is a schematic diagram of the composition and structure of an encoder provided by an embodiment of the present application;

FIG. 8 is a schematic diagram of a specific hardware structure of an encoder provided by an embodiment of the present application;

FIG. 9 is a schematic structural diagram of a video system according to an embodiment of the present application.

Detailed description

In order to have a more detailed understanding of the features and technical contents of the embodiments of the present application, the implementation of the embodiments of the present application will be described in detail below with reference to the accompanying drawings.

With the development of the digital media era, transmitting continuous media data over networks has become the general trend, and more and more users hope to use personal computers (PCs) and non-PC devices for video communication and video services over the Internet and wireless networks. Such anytime, anywhere video communication and service poses a greater challenge to current video coding technologies.

It should be understood that the International Telecommunication Union (ITU) and the International Organization for Standardization (ISO) established the Joint Video Experts Team (JVET) to develop the next-generation video coding standard, H.266/Versatile Video Coding (VVC). Building on technology accumulated in the industry, H.266/VVC improves performance by about 40% compared to H.265/HEVC and is the most advanced video compression technology in the industry.

Generally speaking, for the same video encoding algorithm, a higher bit rate yields better reconstructed video quality and smaller distortion; however, the encoded file then occupies more storage space. It is therefore necessary to find a balance between the distortion of the reconstructed video and the bit rate through a rate-distortion optimization algorithm, so that the compression effect is optimal.

It should be noted that the rate-distortion optimization can be expressed as minimizing the distortion of the decoded and reconstructed video when the encoded file does not exceed a certain bit rate, as shown in the following formula (1).

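$$\min\{D\} \quad \text{s.t.} \quad R \le R_{\max} \tag{1}$$

Here, D and R represent the distortion and the bit rate under given coding parameters, respectively, and $R_{\max}$ is the bit rate that the encoded file must not exceed.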

The video is encoded with given encoding parameters, and the bit rate (R) of the encoded output and the distortion (D) of the reconstructed video are calculated. By changing the encoding parameters and repeatedly encoding the to-be-encoded video, multiple R-D points, each consisting of a bit rate and a distortion, can be obtained, as shown in FIG. 1. Normally, for a preset bit rate, the point with the least distortion lies on the convex curve (i.e., the RD curve) in FIG. 1. For the input video to be encoded, the encoder needs to determine a set of encoding parameters so that the resulting R-D point approximates this convex curve as closely as possible.

At this time, the constrained problem of the above formula (1) can be transformed into an unconstrained problem by the Lagrange multiplier method, as shown in the following formula (2).

$$\min\{J = D + \lambda \cdot R\} \tag{2}$$

where λ is the Lagrange multiplier and J is the rate-distortion cost function. Each value of λ is the magnitude of the slope of a tangent to the RD curve (the curve is decreasing, so λ > 0), and the encoder can find the optimal encoding parameters by minimizing the rate-distortion cost function.
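The selection rules in formulas (1) and (2) can be sketched as follows. This is a minimal illustration, not an encoder's actual mode-decision code: the `Candidate` type, the candidate names, and the distortion and rate numbers are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """One hypothetical trial encoding: a label plus its distortion and rate."""
    name: str
    distortion: float  # D, e.g. SSE of the reconstructed block
    rate: float        # R, e.g. bits spent on the block


def rd_cost(c: Candidate, lam: float) -> float:
    """Rate-distortion cost J = D + lambda * R, as in formula (2)."""
    return c.distortion + lam * c.rate


def best_unconstrained(cands: Iterable[Candidate], lam: float) -> Candidate:
    """Pick the candidate that minimizes J for a given multiplier."""
    return min(cands, key=lambda c: rd_cost(c, lam))


def best_constrained(cands: Iterable[Candidate], r_max: float) -> Optional[Candidate]:
    """Formula (1): least distortion among candidates with R <= r_max."""
    feasible = [c for c in cands if c.rate <= r_max]
    return min(feasible, key=lambda c: c.distortion) if feasible else None


# Hypothetical trial encodings of one block (made-up numbers).
CANDIDATES = [
    Candidate("mode_a", distortion=100.0, rate=10.0),
    Candidate("mode_b", distortion=60.0, rate=20.0),
    Candidate("mode_c", distortion=40.0, rate=40.0),
]
```

In this sketch a small λ favors the low-distortion candidate and a large λ favors the low-rate candidate, which is the trade-off the multiplier is meant to control.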

In this way, using the rate-distortion optimization algorithm, the encoder can determine the optimal block division method, the optimal intra-frame prediction mode, and the optimal inter-frame prediction motion mode (including motion vector, reference image, prediction weight, etc.) to achieve optimal encoding performance.

In related technical solutions (such as VVC), rate-distortion optimization adopts the sum of squared errors (SSE) as the distortion criterion, and the corresponding reconstructed video quality is measured by the peak signal-to-noise ratio (PSNR). SSE distortion objectively measures the fidelity of the video, and its calculation formula is shown in the following formula (3).
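A sketch of formula (3), assuming the standard sum-of-squared-errors form and 1-based pixel indices (the index range is an assumption):

```latex
\mathrm{SSE} = \sum_{x=1}^{M} \sum_{y=1}^{N} \bigl[ f(x, y) - g(x, y) \bigr]^{2} \tag{3}
```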

Here, M and N represent the horizontal and vertical spatial resolution of the video, respectively, f(x, y) represents the original pixel value at pixel position (x, y), and g(x, y) represents the reconstructed pixel value at pixel position (x, y).

Rate-distortion optimization is a key technology in video coding and affects the performance of the encoder. The rate-distortion optimization algorithm in existing video coding uses SSE distortion, which measures the fidelity of the video from an objective point of view. However, SSE distortion is not consistent with the perception of the human visual system; for example, in some areas with large SSE distortion, the human eye does not perceive any degradation of the reconstructed video quality. When the encoder needs to ensure the subjective quality of the reconstructed video, the distortion criterion therefore needs to be changed to a distortion metric that can measure subjective quality, such as structural similarity (SSIM). The calculation formula is shown in the following formula (4).
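The symbol definitions that follow match the standard structural similarity (SSIM) index, so formula (4) is sketched here in that standard form. Reading formula (4) as SSIM is an assumption based on those definitions and on the later reference to SSIM distortion.

```latex
\mathrm{SSIM}(x, y) =
  \frac{(2\mu_x \mu_y + C_1)\,(2\sigma_{xy} + C_2)}
       {(\mu_x^2 + \mu_y^2 + C_1)\,(\sigma_x^2 + \sigma_y^2 + C_2)} \tag{4}
```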

where x and y represent the original image and the reconstructed image, respectively; $\mu_x$ and $\mu_y$ represent the mean of the original image and the reconstructed image, respectively; $\sigma_x^2$ and $\sigma_y^2$ represent the variance of the original image and the reconstructed image, respectively; $\sigma_{xy}$ represents the covariance of the original image and the reconstructed image; and $C_1$ and $C_2$ are two constants that avoid instability when $\mu_x^2 + \mu_y^2$ and $\sigma_x^2 + \sigma_y^2$ are close to 0. Here, in order to obtain a robust quality evaluation result, $C_1 = (K_1 L)^2$ and $C_2 = (K_2 L)^2$, where $L = 2^{bit\_depth} - 1$ (bit_depth represents the bit depth; for an image with a bit depth of 8 bits, L = 255), $K_1 = 0.01$, and $K_2 = 0.03$.

In the related art, the rate-distortion optimization algorithm based on SSE distortion ensures the fidelity of the reconstructed video, whereas SSIM distortion, which accounts for subjective quality, guarantees the subjective quality of the reconstructed video but greatly reduces its fidelity.

Current rate-distortion optimization algorithms all target traditional application scenarios in which reconstructed videos are watched and studied by people. The arrival of the fifth-generation mobile communication (5G) era has spawned a large number of machine-oriented applications; machine vision content such as the Internet of Vehicles, unmanned driving, the industrial Internet, smart and safe cities, wearables, and video surveillance has a wider range of application scenarios. In the 5G era and the post-5G era, most videos will be used by machines, for example for intelligent analysis of reconstructed videos such as pedestrian detection, semantic segmentation, and target detection. However, in video coding for machine vision and human-machine vision, the distortion criterion adopted by current rate-distortion optimization algorithms considers only fidelity distortion and not semantic distortion. Such algorithms can guarantee fidelity performance but not the semantic accuracy of the reconstructed video, so they cannot adapt well to the many scenarios oriented to machine vision and human-machine vision.

Based on this, an embodiment of the present application provides a video encoding method. The basic idea is: determine pre-parameters of the video to be encoded, and determine a first Lagrangian multiplier and a second Lagrangian multiplier according to the pre-parameters; determine a target Lagrangian multiplier according to the first Lagrangian multiplier and the second Lagrangian multiplier; determine a first distortion value according to a first distortion metric criterion, wherein the first distortion metric criterion includes a semantic distortion metric criterion; determine a second distortion value according to a second distortion metric criterion, wherein the second distortion metric criterion includes a numerical error metric criterion; determine a target distortion value according to the first distortion value and the second distortion value; and use the target Lagrangian multiplier and the target distortion value to determine the encoding parameters of the video to be encoded and to encode the video to be encoded. In this way, the first distortion metric criterion, based on semantic distortion, and the second distortion metric criterion, based on numerical error, are jointly considered in rate-distortion optimization for video coding, which adapts well to application scenarios oriented to machine vision and human-machine vision. At a given bit rate, this improves the semantic segmentation accuracy of the reconstructed video while maintaining good fidelity, thereby also improving coding efficiency.

The embodiments of the present application will be described in detail below with reference to the accompanying drawings.

Referring to FIG. 2, it shows an example of a system composition block diagram of an encoder provided by an embodiment of the present application. As shown in FIG. 2, the encoder 10 may include a transform and quantization unit 101, an intra-frame estimation unit 102, an intra-frame prediction unit 103, a motion compensation unit 104, a motion estimation unit 105, an inverse transform and inverse quantization unit 106, a filter control analysis unit 107, a filtering unit 108, an encoding unit 109, a decoded image buffer unit 110, and so on. The filtering unit 108 can implement deblocking filtering and sample adaptive offset (SAO) filtering, and the encoding unit 109 can implement header information coding and context-based adaptive binary arithmetic coding (CABAC).

For the input original video signal, a video coding block can be obtained through coding unit (CU) division. The residual pixel information obtained after intra-frame or inter-frame prediction is then processed by the transform and quantization unit 101, which transforms the residual information from the pixel domain to the transform domain and quantizes the resulting transform coefficients to further reduce the bit rate. The intra-frame estimation unit 102 and the intra-frame prediction unit 103 are used to perform intra prediction on the video coding block; specifically, they determine the intra prediction mode to be used to encode the video coding block. The motion compensation unit 104 and the motion estimation unit 105 are used to perform inter-predictive encoding of the received video coding block relative to one or more blocks in one or more reference frames, so as to provide temporal prediction information. Motion estimation performed by the motion estimation unit 105 is the process of generating motion vectors, which estimate the motion of the video coding block; the motion compensation unit 104 then performs motion compensation based on the motion vectors determined by the motion estimation unit 105. After the intra prediction mode is determined, the intra-frame prediction unit 103 also supplies the selected intra prediction data to the encoding unit 109, and the motion estimation unit 105 also sends the calculated motion vector data to the encoding unit 109.

In addition, the inverse transform and inverse quantization unit 106 is used for reconstruction of the video coding block: a residual block is reconstructed in the pixel domain, blocking artifacts are removed from the reconstructed residual block through the filter control analysis unit 107 and the filtering unit 108, and the reconstructed residual block is then added to a predictive block in a frame of the decoded image buffer unit 110 to generate a reconstructed video coding block. The encoding unit 109 is used to encode the various coding parameters and the quantized transform coefficients. In the CABAC-based coding algorithm, the context content can be based on adjacent coding blocks and can be used to encode information indicating the determined intra prediction mode, and the encoding unit 109 outputs the code stream of the video signal. The decoded image buffer unit 110 is used to store the reconstructed video coding blocks for prediction reference. As video image coding proceeds, new reconstructed video coding blocks are continuously generated, and these reconstructed video coding blocks are all stored in the decoded image buffer unit 110.

The video coding method in this embodiment of the present application is mainly applied to the coding control part of the encoder 10, which includes, for example, the coding unit (CU) division shown in FIG. 2, the intra-frame prediction unit 103, the motion compensation unit 104, and the motion estimation unit 105. That is, the video encoding method of the embodiment of the present application is mainly used to determine encoding parameters, so that encoding is performed according to the determined encoding parameters. The encoding parameters may include a CU division mode and the intra-frame prediction mode or inter-frame prediction mode determined for the CU.

Based on this, the technical solutions of the present application are further elaborated below with reference to the accompanying drawings and embodiments. Before going into detail, it should be noted that the terms "first", "second", "third", etc. used throughout the specification only distinguish different features and do not imply any priority, sequence, or size relationship.

An embodiment of the present application provides a video encoding method, and the method is applied to a video encoding device, that is, an encoder. The functions implemented by the method can be implemented by the processor in the encoder calling a computer program, and of course the computer program can be stored in a memory. It can be seen that the encoder includes at least a processor and a memory.

Referring to FIG. 3 , it shows a schematic flowchart of a video encoding method provided by an embodiment of the present application. As shown in Figure 3, the method may include:

S301: Determine pre-parameters of the video to be encoded, and determine a first Lagrangian multiplier and a second Lagrangian multiplier according to the pre-parameters;

S302: Determine a target Lagrangian multiplier according to the first Lagrangian multiplier and the second Lagrangian multiplier;

S303: Determine a first distortion value according to a first distortion metric criterion, wherein the first distortion metric criterion includes a semantic distortion metric criterion; and determine a second distortion value according to a second distortion metric criterion, wherein the second distortion metric criterion includes a numerical error metric criterion;

S304: Determine a target distortion value according to the first distortion value and the second distortion value;

S305: Using the target Lagrangian multiplier and the target distortion value, determine the encoding parameter of the video to be encoded, and encode the video to be encoded.

It should be noted that the video coding method in this embodiment of the present application may be applicable to an encoder of the H.266/VVC standard, an encoder of the H.265/HEVC standard, or even an encoder of other standards, such as an encoder of the AOMedia Video 1 (AV1) standard developed by the Alliance for Open Media; the embodiments of this application place no limitation on this.

It should also be noted that the rate-distortion optimization algorithm used in the video coding method of the embodiment of the present application jointly considers the first distortion metric criterion, based on semantic distortion, and the second distortion metric criterion, based on numerical error, so that the rate-distortion optimization can be a multi-distortion-criterion rate-distortion optimization algorithm for human-machine vision. That is, in video coding, in addition to the second Lagrangian multiplier and the second distortion value derived using the related technical solution, the embodiment of the present application can also define a semantic distortion metric for the human-machine vision application scenario of video semantic segmentation, and then derive the corresponding calculation formulas for the first Lagrangian multiplier and the first distortion value.

In this way, the target Lagrangian multiplier can be determined according to the first Lagrangian multiplier and the second Lagrangian multiplier, and the target distortion value can be determined according to the first distortion value (determined by the first distortion metric criterion) and the second distortion value (determined by the second distortion metric criterion). After the encoding parameters of the video to be encoded are determined according to the target Lagrangian multiplier and the target distortion value, encoding the video with these parameters can improve the semantic segmentation accuracy and the fidelity of the reconstructed video, and can also reduce the coding bit rate, thereby shortening the time required for coding, improving coding speed, and improving coding efficiency.
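Steps S301 to S305 can be pictured with the sketch below. The text up to this point does not say how the two multipliers or the two distortion values are combined, so the weighted sums used here, the weight `w`, and all type, function, and field names are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Trial:
    """Hypothetical result of trial-encoding one block with one parameter set."""
    params: str
    semantic_distortion: float  # first distortion value (semantic criterion)
    numeric_distortion: float   # second distortion value (numerical error, e.g. SSE)
    rate: float                 # bits produced by this trial


def combine(first: float, second: float, w: float) -> float:
    """Assumed combination rule: a convex weighted sum with weight w in [0, 1]."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return w * first + (1.0 - w) * second


def choose_params(trials: Sequence[Trial], lam_first: float,
                  lam_second: float, w: float) -> str:
    """Sketch of S302-S305: build target multiplier and distortion, minimize J."""
    lam_target = combine(lam_first, lam_second, w)  # S302 (assumed rule)

    def cost(t: Trial) -> float:
        # S303-S304: combine the two distortion values (assumed rule).
        d_target = combine(t.semantic_distortion, t.numeric_distortion, w)
        # S305: rate-distortion cost J = D + lambda * R.
        return d_target + lam_target * t.rate

    return min(trials, key=cost).params
```

With `w = 1` the sketch reduces to purely semantic rate-distortion optimization, and with `w = 0` to the conventional numerical-error case; intermediate weights trade one criterion against the other.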

It can be understood that, in a possible implementation manner, the pre-parameter of the video to be encoded may include a quantization parameter (Quantization Parameter, QP).

In this case, for S301, determining the pre-parameters of the video to be encoded may include:

determining a quantization parameter of a coding unit in the video to be encoded, wherein the coding unit may include at least one of the following: a picture, a slice, a sub-picture, a tile, and a coding block.

Here, the quantization parameter may be the quantization step size of the quantizer in the encoder, or the index number value corresponding to the quantization step size of the quantizer in the encoder.
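As a rough illustration of the relation between a quantization index and a step size, the sketch below uses the approximate mapping commonly quoted for H.264/HEVC-style codecs, in which the step size doubles for every increase of 6 in QP and equals 1 at QP = 4. The function name is hypothetical, and real encoders use integer scaling tables rather than this closed form.

```python
def approx_qstep(qp: int) -> float:
    """Approximate quantization step size for an H.264/HEVC-style QP index.

    Assumption: Qstep ~= 2 ** ((QP - 4) / 6), i.e. the step size doubles
    every 6 QP values. This is an approximation, not a normative formula.
    """
    return 2.0 ** ((qp - 4) / 6.0)
```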

For the determination of the first Lagrangian multiplier, in some embodiments, for S301, the determining the first Lagrangian multiplier according to the pre-parameter may include:

determining a first calculation model parameter, the first calculation model representing the correspondence between the first Lagrangian multiplier and the quantization parameter;

The first Lagrangian multiplier is determined according to the quantization parameter and the first calculation model.

It should be noted that, for the first calculation model, determining the first calculation model parameter may include:

in the first calculation model, setting the first Lagrangian multiplier equal to a weighted power of the quantization parameter;

wherein the first calculation model parameter includes a first exponential parameter indicating the power and a first weighting coefficient indicating the weighting.

If the first Lagrangian multiplier is denoted by $\lambda_{miou}$ and the quantization parameter by QP, the first calculation model is calculated as follows:

$$\lambda_{miou} = 2.30422 \times 10^{-8} \times QP^{6.3612072} \tag{5}$$

Here, formula (5) is the first calculation model, which represents the correspondence between the first Lagrangian multiplier and the quantization parameter. The first calculation model parameter may include a first exponential parameter (i.e., 6.3612072 in the formula) and a first weighting coefficient (i.e., $2.30422 \times 10^{-8}$ in the formula).
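Formula (5) can be evaluated directly, as in the following minimal sketch. The constants come from the formula; the function and constant names are hypothetical, and rejecting negative QP values is an added safeguard rather than something stated in the text.

```python
FIRST_WEIGHT = 2.30422e-8    # first weighting coefficient in formula (5)
FIRST_EXPONENT = 6.3612072   # first exponential parameter in formula (5)


def lambda_miou(qp: float,
                weight: float = FIRST_WEIGHT,
                exponent: float = FIRST_EXPONENT) -> float:
    """First Lagrangian multiplier as a weighted power of QP (formula (5))."""
    if qp < 0:
        # A negative base with a fractional exponent is not a real number.
        raise ValueError("QP must be non-negative for the power-law model")
    return weight * qp ** exponent
```

Because the exponent is larger than 6, the multiplier grows quickly with QP, so coarser quantization puts much more weight on the rate term of the cost function.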

Further, the first calculation model parameter may be a preset value, or may be obtained by fitting a large amount of test data from test videos, which is not limited herein.

Optionally, in some embodiments, the method may further include:

The first calculation model parameter is set to a preset value.

Here, for the first calculation model parameter, the first exponential parameter may be set to 6.3612072, and the first weighting coefficient may be set to $2.30422 \times 10^{-8}$. After the first exponential parameter and the first weighting coefficient are determined, the first calculation model can be obtained, so as to determine the first Lagrangian multiplier according to the quantization parameter.

Optionally, in some embodiments, the method may further include:

Based on the test video, using the first distortion metric criterion, determine a first relationship function between the first distortion value and the bit rate of the test video, and perform a derivative operation on the first relationship function to determine the derivative function of the first relationship function;

Based on the test video, a second relationship function between the bit rate and the quantization parameter is determined, and the first calculation model parameter is determined according to the derivative function and the second relationship function.

It should be noted that the test video in this embodiment of the present application may be one or more test videos, for example, the test video may be multiple (eg, 59) test video sequences in a large-scale city scene dataset (Cityscapes).

In this way, using the first distortion metric criterion for the test video, the first relationship function between the first distortion value and the bit rate of the test video can be determined, and the first relationship function is shown in the following formula:

$$D_{miou} = 0.2299 \cdot R^{-0.7553} + 3.848 \tag{6}$$

where D_miou represents the first distortion value, and R represents the code rate.

It should be noted that, in the RD curve shown in FIG. 1, the value of λ_miou is the absolute value of the slope of the tangent to the curve (λ_miou > 0), that is, the negative of the derivative of the curve. Based on the test video, the average bit rate of the reconstructed video under different quantization parameters can be calculated from the encoded files. A large amount of experimental test data can then be used to fit the relationship between the first distortion value (D_miou) and the bit rate (R); the fitting curve is shown in FIG. 4.

The derivative function of the first relationship function can be obtained by differentiating formula (6) with respect to R and negating the result; the derivative function represents the correspondence between the first Lagrangian multiplier and the code rate. Here, the derivative function is as follows:

$$\lambda_{miou} = 0.17364347 \cdot R^{-1.7553} \tag{7}$$

In addition, according to the test video, a second relationship function between the bit rate (R) and the quantization parameter (QP) can be determined by fitting using a large amount of experimental test data. The fitting curve is shown in FIG. 5 . Here, the second relation function is as follows,

$$R = 8278 \cdot QP^{-3.624} \tag{8}$$

Substituting Equation (8) into Equation (7) yields the functional relationship between λ_miou and QP, that is, the first calculation model shown in formula (5), which relates the first Lagrangian multiplier to the quantization parameter. The parameters of the first calculation model are thereby determined, so that the first Lagrangian multiplier can be determined according to the quantization parameter.
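As a worked check of the substitution described above, and assuming that λ_miou is taken as the negative derivative of formula (6) with respect to R, the parameters of formula (5) follow from formulas (6) and (8) as sketched below. The final coefficient is rounded.

```latex
\begin{align*}
\lambda_{miou} &= -\frac{dD_{miou}}{dR}
  = 0.2299 \times 0.7553 \cdot R^{-1.7553}
  = 0.17364347 \cdot R^{-1.7553} \\
 &= 0.17364347 \cdot \left(8278 \cdot QP^{-3.624}\right)^{-1.7553} \\
 &= 0.17364347 \cdot 8278^{-1.7553} \cdot QP^{\,3.624 \times 1.7553} \\
 &\approx 2.304 \times 10^{-8} \cdot QP^{6.3612072}
\end{align*}
```

The exponent 3.624 × 1.7553 = 6.3612072 matches the first exponent parameter, and the product 0.17364347 × 8278^(-1.7553) matches the first weighting coefficient to the precision shown.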

It should be noted that the calculation formula of the first Lagrangian multiplier can also be modified into other functional forms. For example, the functional relationship of formula (8) above can also be fitted in an exponential form with base e, in which case the corresponding formula (5), that is, the calculation formula of the first Lagrangian multiplier, can also be an exponential function of QP with base e, which is not limited here.

Regarding the determination of the second Lagrangian multiplier, in some embodiments, for S301, the determining the second Lagrangian multiplier according to the pre-parameter may include:

The second Lagrangian multiplier is determined according to a preset third calculation model; wherein, the third calculation model represents the corresponding relationship between the second Lagrangian multiplier and the quantization parameter.

It should be noted that, for the third calculation model, it may be constructed using the SSE distortion criterion in the prior art. In some embodiments, the determining the third calculation model parameter may include:

In the third calculation model, the second Lagrangian multiplier is set equal to a weighted value of an exponential power of 2;

The third calculation model parameter includes a third exponent parameter indicating the power of the exponent and a third weighting coefficient indicating the weighting, the third exponent parameter being related to a quantization parameter.

It should be noted that, with the second Lagrangian multiplier denoted by λ_SSE and the quantization parameter denoted by QP, the calculation formula of the third calculation model is as follows:

$$\lambda_{SSE} = 0.57 \times 2^{(QP-12)/3} \tag{9}$$

Here, Equation (9) is the third calculation model, which represents the correspondence between the second Lagrangian multiplier and the quantization parameter. The third calculation model parameter may include a third exponent parameter (i.e., (QP-12)/3 in the formula) and a third weighting coefficient (i.e., 0.57 in the formula), and the value of the third exponent parameter is related to the quantization parameter (QP).
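For illustration, the third calculation model of Equation (9) can be sketched as follows. The function name `lambda_sse_from_qp` is a hypothetical name introduced for this sketch; the default weighting coefficient 0.57 is the value given above.

```python
def lambda_sse_from_qp(qp: float, weight: float = 0.57) -> float:
    """Equation (9): lambda_SSE = weight * 2 ** ((QP - 12) / 3).

    The exponent (QP - 12) / 3 is the third exponent parameter, so the
    multiplier doubles every time QP increases by 3.
    """
    return weight * 2.0 ** ((qp - 12) / 3.0)


# Second Lagrangian multiplier for a few commonly used QP values.
lambda_sse_by_qp = {qp: lambda_sse_from_qp(qp) for qp in (22, 27, 32, 37)}
```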

It should also be noted that the parameters of the third calculation model may be preset values, or may be obtained by fitting to a large amount of experimental test data from the test video, which is not limited here.

Further, whether it is the first Lagrangian multiplier or the second Lagrangian multiplier, both are related to the quantization parameter. Regarding the determination of the quantization parameter, in some embodiments, the determining the quantization parameter of the coding unit in the to-be-coded video may include: using a rate control manner to determine the quantization parameter of the coding unit in the to-be-coded video.

Alternatively, in some embodiments, the determining the quantization parameter of the coding unit in the to-be-coded video may include: setting the quantization parameter to a preset value.

That is, for the quantization parameter of the video to be encoded, on the one hand, the quantization parameter may be set to a preset value, such as 22, 27, 32, or 37. On the other hand, the quantization parameter can also be determined by rate control; specifically, current rate control algorithms mainly control the bit rate by adjusting the quantization parameter, so the required quantization parameter can be obtained by controlling the bit rate.

In another possible implementation, the pre-parameters of the video to be encoded may include a quantization parameter and a target bit rate. In this case, the determining the first Lagrangian multiplier according to the pre-parameters may include:

determining the quantization parameter and the target code rate of the coding unit in the video to be encoded, wherein the coding unit may include at least one of the following: a picture, a slice (Slice), a sub-picture (Sub-picture), a tile (tile), and a coding block;

determining a second calculation model parameter, the second calculation model representing the correspondence between the first Lagrange multiplier and the code rate;

The first Lagrangian multiplier is determined according to the target code rate and the second calculation model.

It should be noted that the determining the target bit rate of the coding unit in the to-be-encoded video may include: determining the target bit-rate of the encoding unit in the to-be-encoded video by using a bit allocation method.

That is to say, the target bit rate for the coding unit in the video to be encoded can be obtained by way of bit allocation. Here, the target bit rate can be dynamically adjusted according to the number of bits consumed by the coding unit in the video to be encoded, so as to ensure real-time and accurate bit allocation.

It should also be noted that, for the second calculation model, the determining the parameters of the second calculation model may include:

In the second calculation model, the first Lagrangian multiplier is set to a weighted value equal to the exponential power of the target code rate;

The second calculation model parameter includes a second exponent parameter indicating the power of the exponent and a second weighting coefficient indicating the weighting.

It should be noted that, with the first Lagrangian multiplier denoted by λ_miou and the code rate denoted by R, the calculation formula of the second calculation model is as follows:

$$\lambda_{miou} = 0.17364347 \cdot R^{-1.7553} \tag{10}$$

Here, Equation (10) is the second calculation model, which represents the correspondence between the first Lagrangian multiplier and the code rate. The second calculation model parameter may include a second exponent parameter (i.e., -1.7553 in the formula) and a second weighting coefficient (i.e., 0.17364347 in the formula).

Further, the parameters of the second calculation model may be preset values, or may be obtained by fitting to a large amount of experimental test data from the test video, which is not limited herein.

Optionally, in some embodiments, the method may further include:

The second calculation model parameter is set to a preset value.

Here, for the second calculation model parameter, the second exponent parameter may be set to -1.7553, and the second weighting coefficient may be set to 0.17364347. After the second exponent parameter and the second weighting coefficient are determined, the second calculation model can be obtained, so as to determine the first Lagrangian multiplier according to the target code rate.
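A minimal sketch of the second calculation model of Equation (10) is given below. The function name `lambda_miou_from_rate` is hypothetical, and the unit of the target code rate is assumed to be the same unit that was used when the model parameters were fitted, which is not stated here.

```python
def lambda_miou_from_rate(target_rate: float,
                          weight: float = 0.17364347,
                          exponent: float = -1.7553) -> float:
    """Equation (10): lambda_miou = weight * R ** exponent.

    `weight` is the second weighting coefficient and `exponent` is the
    second exponent parameter; both default to the preset values.
    """
    if target_rate <= 0:
        raise ValueError("target code rate must be positive")
    return weight * target_rate ** exponent
```

Because the exponent is negative, a larger target code rate gives a smaller first Lagrangian multiplier, so the rate term is penalised less when more bits are available.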

Optionally, in some embodiments, the method may further include:

Based on the test video, using the first distortion metric, determine a first relationship function between the first distortion value and the bit rate of the test video;

A derivative operation is performed on the first relational function to determine the second calculation model parameter.

It should be noted that the test video here may also be one or more test videos, for example, the test video may be multiple (eg, 59) test video sequences in a large-scale urban scene dataset (Cityscapes).

In this way, using the first distortion metric criterion for the test video, the first relationship function between the first distortion value and the bit rate of the test video can be determined; the first relationship function is shown in formula (6) above. Differentiating formula (6) then gives the derivative function of the first relationship function, which represents the correspondence between the first Lagrangian multiplier and the code rate and yields the second calculation model shown in formula (10). The parameters of the second calculation model are thereby determined, so that the first Lagrangian multiplier can be determined according to the target code rate.

It should be noted that, when determining the first relationship function, based on the RD curve shown in FIG. 1, the value of λ_miou is the absolute value of the slope of the tangent to the curve (λ_miou > 0), that is, the negative of the derivative of the curve. Based on the test video, the average bit rate of the reconstructed video under different quantization parameters can be calculated from the encoded files. A large amount of experimental test data can then be used to fit the relationship between the first distortion value (D_miou) and the bit rate (R); the fitting curve is shown in FIG. 4, from which the first relationship function is obtained.

It should be noted that, the second Lagrange multiplier is represented by λ _SSE , and the quantization parameter is represented by QP, then the calculation formula of the third calculation model is shown in the above formula (9).

In this way, for the rate-distortion optimization algorithm of the embodiment of the present application, in addition to determining the first Lagrangian multiplier and the second Lagrangian multiplier, the distortion value also needs to be determined. Here, it is assumed that the first distortion value is obtained based on the first distortion metric criterion, and the second distortion value is obtained based on the second distortion metric criterion.

It should be noted that the first distortion metric criterion may be a semantic distortion metric criterion. Taking the H.266/VVC standard encoder as an example, in order to improve the semantic segmentation accuracy of reconstructed video, it is first necessary to define a semantic distortion metric. Specifically, multiple quantization parameters can be selected, and then VVC encoding is performed on multiple (for example, 59) test video sequences in the large-scale urban scene dataset (Cityscapes) under the condition of random access (RA). Video semantic segmentation is performed on the video before and after encoding, so that the accuracy of the semantic segmentation result can be calculated according to the corresponding annotation data.

The accuracy of semantic segmentation is usually expressed by the mean Intersection over Union (mIoU), where mIoU is the average of the Intersection over Union (IoU) over all categories. Here, IoU is used as a detection evaluation function; put simply, it is the overlap rate between the generated prediction window and the real window, that is, the ratio of the intersection of the detection result area (Detection Result) and the ground truth area (Ground Truth) to their union. This ratio is the semantic accuracy (represented by IoU).

In a possible implementation manner, for S303, the determining the first distortion value according to the first distortion metric criterion may include:

Based on the test video, semantically segment the test video to determine the semantic accuracy of one or more categories;

determining the target semantic accuracy according to the semantic accuracy of the one or more categories;

Distortion measurement is performed on the semantic accuracy of the target by using the fourth calculation model to obtain the first distortion value.

Further, determining the target semantic accuracy according to the semantic accuracy of one or more categories may include:

A weighted sum of the semantic accuracy of the one or more categories is calculated, and the resulting weighted sum is determined as the target semantic accuracy.

It should be noted that, to calculate the weighted sum of the semantic accuracy of one or more categories, one specific implementation is to weight all categories equally (each weight set to 1); in this case, the average value of the semantic accuracy of the one or more categories is calculated, and the obtained average is determined as the target semantic accuracy.

Exemplarily, in the scenario of semantic segmentation, the two sets represent the predicted value and the real value respectively, that is, A_pred is the predicted segmentation result area, and A_true is the labeled segmentation result area. The semantic accuracy of each category can be represented by IoU, which is calculated as follows:

$$IoU = \frac{|A_{pred} \cap A_{true}|}{|A_{pred} \cup A_{true}|} \tag{11}$$

After obtaining the IoU of each category, the target semantic accuracy can be obtained by taking the average value, which can be expressed as mIoU. Here, mIoU refers to the average IoU of all categories, and its value ranges from 0 to 1; the larger the value, the higher the semantic accuracy. For the IoU of n categories, mIoU is calculated as follows:

$$mIoU = \frac{1}{n} \sum_{i=1}^{n} IoU_i \tag{12}$$

where IoU_i represents the IoU of the i-th category, i = 1, ..., n, and n represents the number of all categories.
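The IoU and mIoU computation described above can be sketched as follows, assuming that the predicted and labeled segmentation results are given as flat per-pixel category sequences of equal length. The function names are hypothetical, and taking the category set as the categories present in either sequence is a simplifying assumption of this sketch; an evaluation on a dataset would normally use its fixed category list.

```python
from typing import Dict, Hashable, Sequence


def per_class_iou(pred: Sequence[Hashable],
                  true: Sequence[Hashable]) -> Dict[Hashable, float]:
    """IoU of each category: |A_pred ∩ A_true| / |A_pred ∪ A_true|."""
    if len(pred) != len(true):
        raise ValueError("pred and true must have the same length")
    ious = {}
    for c in set(pred) | set(true):
        # Pixels labelled c in both maps (intersection) or in either map (union).
        inter = sum(1 for p, t in zip(pred, true) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, true) if p == c or t == c)
        ious[c] = inter / union  # union > 0 because c occurs in pred or true
    return ious


def mean_iou(ious: Dict[Hashable, float]) -> float:
    """mIoU: unweighted average of the per-category IoU values."""
    if not ious:
        raise ValueError("at least one category is required")
    return sum(ious.values()) / len(ious)


# Toy 4-pixel example with two categories (illustrative values only).
ious = per_class_iou(pred=[0, 0, 1, 1], true=[0, 1, 1, 1])
miou = mean_iou(ious)
```

In the toy example, category 0 has an intersection of 1 pixel and a union of 2 pixels, and category 1 has an intersection of 2 pixels and a union of 3 pixels.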

It should also be noted that, for the fourth calculation model, the use of the fourth calculation model to perform distortion measurement on the target semantic accuracy to obtain the first distortion value includes:

determining the fourth calculation model parameter, the fourth calculation model representing the correspondence between the first distortion value and the target semantic accuracy;

The first distortion value is obtained according to the target semantic accuracy and the fourth calculation model.

Further, the determining of the fourth calculation model parameter may include:

In the fourth calculation model, the first distortion value is set as a weighted value equal to the logarithm of the target semantic accuracy;

The fourth calculation model parameter includes a base parameter indicating the logarithm and a fourth weighting coefficient parameter indicating the weighting.

Wherein, the fourth calculation model parameter is set as a preset value.

Here, for the scenario of video semantic segmentation, a semantic distortion metric (i.e., the first distortion value, represented by D_miou) can be defined, and its calculation formula is shown in the following formula (13):

$$D_{miou} = -10 \cdot \ln(mIoU) \tag{13}$$

Formula (13) is the fourth calculation model, which represents the correspondence between the first distortion value (D_miou) and the target semantic accuracy (mIoU). The fourth calculation model parameter may include a base parameter (i.e., the base e of the natural logarithm ln in the formula) and a fourth weighting coefficient parameter (i.e., -10 in the formula), which applies a preset magnification to ln(mIoU).

It should also be noted that the parameters of the fourth calculation model may be preset values, or may be obtained by fitting to a large amount of test data from a test video, which is not limited here.

In addition, according to equation (13), the natural logarithmic function maps the finite mIoU value to an infinite range, and the multiplied coefficient amplifies the obtained value to match the distortion size in the rate-distortion optimization algorithm. In this way, when mIoU tends to 0, D _miou tends to infinity; when mIoU tends to 1, D _miou tends to 0.
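The behaviour of formula (13) at the two ends of the mIoU range can be illustrated with the following sketch. The function name `semantic_distortion` is hypothetical; the default weighting coefficient -10 and the natural logarithm are taken from formula (13).

```python
import math


def semantic_distortion(miou: float, weight: float = -10.0) -> float:
    """Formula (13): D_miou = weight * ln(mIoU), with weight = -10.

    mIoU must lie in (0, 1]; the distortion is 0 at mIoU = 1 and grows
    without bound as mIoU approaches 0.
    """
    if not 0.0 < miou <= 1.0:
        raise ValueError("mIoU must be in the interval (0, 1]")
    return weight * math.log(miou)


# The distortion grows as the segmentation accuracy drops.
distortions = [semantic_distortion(m) for m in (1.0, 0.8, 0.5, 0.1)]
```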

In another possible implementation, the first distortion value may also be related to the target mean square error of the coding unit in the video to be encoded. The determining the first distortion value according to the first distortion metric criterion may include:

determining a fifth calculation model parameter, the fifth calculation model representing a third relationship function between the first distortion value and the mean square error;

The target mean square error of the coding unit in the video to be encoded is determined, and the first distortion value is determined according to the target mean square error and the fifth calculation model.

It should be noted that the reconstructed video is obtained by performing video decoding and reconstruction on the encoded video. After the test video is encoded using a quantization parameter to obtain the encoded video under that quantization parameter, video reconstruction can be performed on the encoded video to obtain the reconstructed video under that quantization parameter; from the reconstructed video and the original video, the mean squared error (Mean Squared Error, MSE) of the reconstructed video under that quantization parameter can be obtained. Here, the mean squared error refers to the expected value of the square of the difference between the predicted value of a parameter and its actual value. MSE can evaluate the degree of change of the data; the smaller the value of MSE, the better the accuracy of the prediction model in describing the experimental data.

It should also be noted that, for the fifth calculation model, the determining the parameters of the fifth calculation model may include:

In the fifth calculation model, the first distortion value is set equal to the product of the target mean square error and the first parameter factor, plus the second parameter factor;

The fifth calculation model parameter includes the first parameter factor and the second parameter factor.

It should be noted that, with the first distortion value denoted by D_miou and the mean square error denoted by MSE, the calculation formula of the fifth calculation model is as follows:

$$D_{miou} = 0.6276 \cdot MSE + 3.48 \tag{14}$$

Here, Equation (14) is the fifth calculation model, which is used to represent the corresponding relationship between the first distortion value and the mean square error. The fifth calculation model parameter may include a first parameter factor (ie, 0.6276 in the formula) and a second parameter factor (ie, 3.48 in the formula).

Further, the parameters of the fifth calculation model may be preset values, or may be obtained by fitting to a large amount of experimental test data from the test video, which is not limited herein.

Optionally, in some embodiments, the method may further include:

The fifth calculation model parameter is set to a preset value.

Here, for the fifth calculation model parameter, the first parameter factor may be set to 0.6276, and the second parameter factor may be set to 3.48. After the first parameter factor and the second parameter factor are determined, a fifth calculation model can be obtained, so as to determine the first distortion value according to the target mean square error.

Optionally, in some embodiments, the method may further include:

Based on the test video, using the first distortion metric, determine a third relationship function between the first distortion value and the mean square error of the test video;

The fifth calculation model parameter is determined according to the third relational function.

In this way, the first distortion metric is applied to the test video, and the average MSE of the reconstructed video under different quantization parameters is computed from the encoded files. A large amount of experimental test data can then be used to fit the relationship between the first distortion value (D_miou) and MSE; the fitting curve is shown in FIG. 6. The fitted relationship is linear, and the fifth calculation model is obtained from it.

It should be noted that the determining the pre-parameter of the video to be encoded may further include: determining the target mean square error of the coding units in the video to be encoded. In this way, after the fifth calculation model is obtained, the first distortion value can be determined according to the target mean square error and the fifth calculation model shown in formula (14).
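As a sketch of this alternative, the first distortion value can be estimated from the mean square error of a coding unit using formula (14). The function names are hypothetical, the sample values are illustrative only, and the coding unit is assumed to be given as flat sequences of original and reconstructed sample values.

```python
from typing import Sequence


def mean_squared_error(original: Sequence[float],
                       reconstructed: Sequence[float]) -> float:
    """MSE between the original and reconstructed samples of a coding unit."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)


def semantic_distortion_from_mse(mse: float,
                                 first_factor: float = 0.6276,
                                 second_factor: float = 3.48) -> float:
    """Formula (14): D_miou = first_factor * MSE + second_factor."""
    return first_factor * mse + second_factor


# Illustrative 4-sample coding unit.
mse = mean_squared_error([100, 102, 98, 101], [101, 100, 98, 103])
d_miou = semantic_distortion_from_mse(mse)
```

Compared with formula (13), this route avoids running semantic segmentation during encoding, since only the mean square error of the coding unit is needed.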

Besides, other methods may also be used to determine the first distortion value, such as the difference between the video semantic segmentation results before and after encoding. Here, the first distortion value is determined according to the difference between the semantic segmentation result of the video before encoding (the first mIoU value) and the semantic segmentation result of the encoded video (the second mIoU value); the embodiment of the present application does not make any specific limitation.

It should also be noted that the second distortion metric criterion may be a numerical error criterion. In some embodiments, for S303, the determining the second distortion value according to the second distortion metric criterion may include:

determining a reconstruction value of a coding unit in the video, wherein the coding unit includes at least one of the following: a picture, a slice (Slice), a sub-picture (Sub-picture), a tile (tile), and a coding block;

Based on the numerical error criterion, the second distortion value is determined according to the reconstructed value and the original value of the coding unit;

Wherein, the numerical error criterion is one of the following: the Sum of Absolute Differences (SAD) criterion, the Mean Absolute Deviation (MAD) criterion, the Sum of Square Error (SSE) criterion, and the Mean Square Error (MSE) criterion. It should be noted that the numerical error criterion is not limited to these criteria, and may also be other criteria, which are not specifically limited in the embodiments of the present application.

It should be noted that, taking the SSE criterion as the numerical error criterion as an example, the second distortion value is represented by SSE, which is calculated as the sum of the squared differences between the original values and the reconstructed values of the coding unit.
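The numerical error criteria listed above can be sketched as follows for a coding unit given as flat sequences of original and reconstructed sample values. The function name `numerical_distortion` is hypothetical, and MAD is computed here as the mean of the absolute sample differences, which is an assumption about the intended definition.

```python
from typing import Sequence


def numerical_distortion(original: Sequence[float],
                         reconstructed: Sequence[float],
                         criterion: str = "SSE") -> float:
    """Second distortion value under the SAD, MAD, SSE or MSE criterion."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [o - r for o, r in zip(original, reconstructed)]
    if criterion == "SAD":   # sum of absolute differences
        return sum(abs(d) for d in diffs)
    if criterion == "MAD":   # mean of absolute differences (assumed definition)
        return sum(abs(d) for d in diffs) / len(diffs)
    if criterion == "SSE":   # sum of squared errors
        return sum(d * d for d in diffs)
    if criterion == "MSE":   # mean of squared errors
        return sum(d * d for d in diffs) / len(diffs)
    raise ValueError(f"unknown criterion: {criterion}")


# Illustrative 4-sample coding unit.
orig_samples = [100, 102, 98, 101]
recon_samples = [101, 100, 98, 103]
sse = numerical_distortion(orig_samples, recon_samples, "SSE")
```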

Understandably, after obtaining the first Lagrangian multiplier (λ _miou ) and the second Lagrangian multiplier (λ _SSE ), as well as the first distortion value (D _miou ) and the second distortion value (SSE) , the target Lagrangian multiplier (represented by λ) can be calculated by λ _miou and λ _SSE , and the target distortion value (represented by D) can be calculated by D _miou and SSE.

In some embodiments, for S302, the determining a target Lagrangian multiplier according to the first Lagrangian multiplier and the second Lagrangian multiplier may include:

determining a first preset parameter; wherein, the first preset parameter is used to control weight values corresponding to the first Lagrangian multiplier and the second Lagrangian multiplier;

The first Lagrangian multiplier and the second Lagrangian multiplier are weighted and calculated by using the first preset parameter to obtain the target Lagrangian multiplier.

It should be noted that the first preset parameter can control the weight values corresponding to the first Lagrangian multiplier and the second Lagrangian multiplier. Specifically, in some embodiments, the determining the first preset parameter may include:

The first preset parameter is set according to the configuration information of the encoder.

Further, the method can also include:

When the configuration information of the encoder indicates that the first preset parameter is equal to k, the target Lagrangian multiplier is set equal to a weighted sum of the first Lagrangian multiplier and the second Lagrangian multiplier, where k is any value greater than or equal to 0 and less than or equal to 1, the weighting coefficient of the first Lagrangian multiplier is set equal to 1-k, and the weighting coefficient of the second Lagrangian multiplier is set equal to k.

It should be noted that, assuming that the first preset parameter is represented by k, then when the weighting coefficient of the second Lagrangian multiplier is set to k, the weighting coefficient of the first Lagrangian multiplier can be set to 1-k ; in this way, the calculation formula of the target Lagrange multiplier is as follows,

$$\lambda = k \cdot \lambda_{SSE} + (1-k) \cdot \lambda_{miou} \tag{16}$$

where λ represents the target Lagrangian multiplier, λ_miou represents the first Lagrangian multiplier, and λ_SSE represents the second Lagrangian multiplier; 1-k and k represent the weighting coefficients of the first Lagrangian multiplier and the second Lagrangian multiplier, respectively.

It should be noted that k may be a constant within the range of 0 to 1. Here, the value of k may be equal to 0.5 or 0.75, or may be a variable value (for example, obtained by performing a certain calculation on the current coding unit), which is not specifically limited in this embodiment of the present application. A typical value of k is 0.75.
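A minimal sketch of the weighting in formula (16) follows. The function name `target_lambda` is hypothetical, the default k = 0.75 is the typical value mentioned above, and the numeric inputs in the usage line are illustrative only.

```python
def target_lambda(lambda_miou: float, lambda_sse: float, k: float = 0.75) -> float:
    """Formula (16): lambda = k * lambda_SSE + (1 - k) * lambda_miou."""
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must lie in the range [0, 1]")
    return k * lambda_sse + (1.0 - k) * lambda_miou


# With k = 0.75 the second multiplier contributes three quarters of the result.
lam = target_lambda(lambda_miou=2.0, lambda_sse=10.0, k=0.75)
```

Setting k to 1 reduces the target Lagrangian multiplier to λ_SSE, and setting k to 0 reduces it to λ_miou.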

In some embodiments, for S304, the determining a target distortion value according to the first distortion value and the second distortion value includes:

determining a second preset parameter; wherein the second preset parameter is used to control the weight value corresponding to the first distortion value and the second distortion value;

The first distortion value and the second distortion value are weighted and calculated by using the second preset parameter to obtain the target distortion value.

It should be noted that the second preset parameter can control the weight values corresponding to the first distortion value and the second distortion value. Specifically, in some embodiments, the determining the second preset parameter may include:

The second preset parameter is set according to the configuration information of the encoder.

Further, the method can also include:

When the configuration information of the encoder indicates that the second preset parameter is equal to m, the target distortion value is set equal to a weighted sum of the first distortion value and the second distortion value, where m is any value greater than or equal to 0 and less than or equal to 1, the weighting coefficient of the first distortion value is set equal to 1-m, and the weighting coefficient of the second distortion value is set equal to m.

It should be noted that, assuming that the second preset parameter is represented by m, when the weighting coefficient of the second distortion value is set to m, the weighting coefficient of the first distortion value can be set to 1-m; in this way, the calculation formula of the target distortion value is as follows:

$$D = m \cdot SSE + (1-m) \cdot D_{miou} \tag{17}$$

where D represents the target distortion value, D_miou represents the first distortion value, and SSE represents the second distortion value; 1-m and m represent the weighting coefficients of the first distortion value and the second distortion value, respectively.

It should be noted that m may be a constant within the range of 0 to 1. Here, the value of m may be equal to 0.5 or 0.75, or may be a variable value (for example, obtained by performing a certain calculation on the current coding unit), which is not specifically limited in this embodiment of the present application. A typical value of m is 0.75.
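The weighting in formula (17) can be sketched in the same manner. The function name `target_distortion` is hypothetical, the default m = 0.75 is the typical value mentioned above, and the numeric inputs in the usage line are illustrative only.

```python
def target_distortion(d_miou: float, sse: float, m: float = 0.75) -> float:
    """Formula (17): D = m * SSE + (1 - m) * D_miou."""
    if not 0.0 <= m <= 1.0:
        raise ValueError("m must lie in the range [0, 1]")
    return m * sse + (1.0 - m) * d_miou


# Semantic distortion 4.0 and fidelity distortion 12.0 combined with m = 0.75.
d_target = target_distortion(d_miou=4.0, sse=12.0, m=0.75)
```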

It should also be noted that the values of the first preset parameter and the second preset parameter may be set to be the same or different. Generally speaking, the values of the first preset parameter and the second preset parameter are the same, for example, both can be represented by θ. At this time, the calculation formula of the target Lagrange multiplier and the target distortion value can be as follows,

$$\lambda = \theta \cdot \lambda_{SSE} + (1-\theta) \cdot \lambda_{miou} \tag{18}$$

$$D = \theta \cdot SSE + (1-\theta) \cdot D_{miou} \tag{19}$$

where θ is a constant in the range of 0 to 1, which can be used not only to control the respective weights of the first Lagrangian multiplier and the second Lagrangian multiplier, but also to control the respective weights of the semantic distortion (i.e., the first distortion value) and the fidelity distortion (i.e., the second distortion value). Optionally, θ is typically set to 0.75.

In short, existing video coding methods are not well suited to machine-vision-oriented applications. In the embodiment of the present application, λ_miou represents the machine-oriented quality, λ_SSE represents the subjective quality viewed by the human eye, and θ adjusts the balance between the subjective quality viewed by the human eye and the machine-oriented quality. For example, if θ is equal to 1, the target distortion value reflects entirely the subjective quality viewed by the human eye; if θ is equal to 0, the target distortion value reflects entirely the machine-oriented quality.

Here, the value of θ can be set through the configuration information of the encoder. Specifically, one implementation is to set θ directly according to the application requirements, such as the cases of 0 and 1 described above. Another implementation is to configure the working mode of the encoder: if it is set to the human-eye working mode, the encoder sets the value of θ to 1; if it is set to the machine working mode, the encoder sets the value of θ to 0; if it is set to the human-machine hybrid mode, the encoder adaptively determines the value of θ. For example, in the preprocessing stage, the video to be encoded is pre-encoded, and the value of θ is then estimated from the pre-encoding result.
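The mode-based setting of θ can be sketched as follows. The names `WorkingMode` and `select_theta` are hypothetical, and the estimator used in the hybrid mode is left as a caller-supplied function, since the way θ is estimated from the pre-encoding result is not specified here.

```python
from enum import Enum
from typing import Callable, Optional


class WorkingMode(Enum):
    HUMAN = "human"      # human-eye viewing
    MACHINE = "machine"  # machine analysis
    HYBRID = "hybrid"    # human-machine hybrid


def select_theta(mode: WorkingMode,
                 estimate_theta: Optional[Callable[[], float]] = None) -> float:
    """Map the configured working mode of the encoder to a value of theta."""
    if mode is WorkingMode.HUMAN:
        return 1.0
    if mode is WorkingMode.MACHINE:
        return 0.0
    # Hybrid mode: theta comes from a caller-supplied estimator
    # (for example one based on a pre-encoding pass), clamped to [0, 1].
    if estimate_theta is None:
        raise ValueError("hybrid mode requires an estimator for theta")
    return min(1.0, max(0.0, estimate_theta()))


theta = select_theta(WorkingMode.HYBRID, estimate_theta=lambda: 0.75)
```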

It should also be noted that, after obtaining the target Lagrangian multiplier and the target distortion value, the encoding parameters of the video to be encoded can be determined according to the target Lagrangian multiplier and the target distortion value, so as to encode the video to be encoded. In some embodiments, for S305, the determining the encoding parameter of the video to be encoded by using the target Lagrangian multiplier and the target distortion value may include:

constructing a rate-distortion cost function based on the target Lagrangian multiplier and the target distortion value;

pre-encoding the video to be encoded using one or more candidate encoding parameters, to determine the rate-distortion cost value corresponding to each of the one or more candidate encoding parameters; and

selecting a minimum rate-distortion cost value from the determined rate-distortion cost values, and determining the candidate encoding parameter corresponding to the minimum rate-distortion cost value as the encoding parameter of the video to be encoded.

Here, the encoding parameters include at least a parameter indicating how the video to be encoded is partitioned and a parameter for constructing the prediction value of a coding block in the video to be encoded.

Further, in some embodiments, the encoding the video to be encoded may include: writing the encoding parameter into a code stream.

It should be pointed out that the rate-distortion cost function can be constructed from the target Lagrangian multiplier and the target distortion value. One or more candidate encoding parameters are then used to pre-encode the video to be encoded, so as to determine the rate-distortion cost value corresponding to each candidate. The minimum rate-distortion cost value is then selected from the determined values, and the candidate encoding parameter corresponding to it is determined as the encoding parameter of the video to be encoded. The encoding parameters determined in this way are the optimal encoding parameters (those with the lowest rate-distortion cost), and encoding is then performed with them. In this process, the encoding parameters can also be written into the code stream and transmitted from the encoder to the decoder, where they are used to restore the video to be encoded.
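The selection procedure above can be illustrated with a minimal sketch. It assumes the common cost form J = D + λ·R, where D is the target distortion value and λ is the target Lagrangian multiplier; the class `TrialResult`, the function names, and the idea that each pre-encoding trial is summarized by a (distortion, rate) pair are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class TrialResult:
    """Outcome of pre-encoding with one candidate (hypothetical container)."""
    params: str          # label of the candidate encoding parameters
    distortion: float    # target distortion value measured for this candidate
    rate_bits: float     # bits spent by this candidate


def rd_cost(distortion: float, rate_bits: float, lagrange_multiplier: float) -> float:
    """Rate-distortion cost, assuming the usual form J = D + lambda * R."""
    return distortion + lagrange_multiplier * rate_bits


def select_encoding_params(trials: Sequence[TrialResult],
                           lagrange_multiplier: float) -> Tuple[str, float]:
    """Return the candidate with the minimum rate-distortion cost and that cost."""
    if not trials:
        raise ValueError("at least one candidate is required")
    best = min(trials,
               key=lambda t: rd_cost(t.distortion, t.rate_bits, lagrange_multiplier))
    return best.params, rd_cost(best.distortion, best.rate_bits, lagrange_multiplier)
```

A small multiplier favors low-distortion candidates, while a large multiplier favors low-rate candidates, which is why the choice of the target Lagrangian multiplier directly shapes the selected encoding parameters.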

In the embodiments of the present application, taking the VVC standard encoder as an example, in order to improve the semantic accuracy of the reconstructed video while ensuring its fidelity, and to meet the subjective viewing requirements in human-machine vision scenarios, the distortion criterion in the VVC rate-distortion optimization process is modified to a weighted combination of the semantic distortion D_miou and the fidelity distortion SSE, as shown in equation (17) or equation (19) above. The corresponding target Lagrangian multiplier is modified to a weighted combination of λ_miou and λ_SSE, as shown in equation (16) or equation (18) above. In this way, the rate-distortion process of the VVC standard encoder is optimized according to a multi-distortion-criterion rate-distortion optimization algorithm for human-machine vision, improving the semantic segmentation accuracy of the reconstructed video at a given bit rate while maintaining good fidelity.

Exemplarily, the method is implemented on the VVC reference software test platform (VVC Test Model, VTM), version VTM 7.1. Different QPs are selected, the test video sequences of a large-scale urban scene dataset are encoded under the random access (RA) configuration, and a semantic segmentation test is performed on the reconstructed video.

First, different QPs are selected and the test video is encoded by the VVC standard encoder to obtain the coding bit rate and the PSNR of the reconstructed video; semantic segmentation is performed on the reconstructed video and the segmentation accuracy is calculated. Then, the rate-distortion process in VVC is optimized according to the video encoding method of the embodiment of the present application, different QPs are again selected, and the test video is encoded by the optimized encoder to obtain the coding bit rate; the resulting reconstructed video is semantically segmented and the segmentation accuracy is calculated. From the experimental results in these two cases, the BD-rate and BD-miou of the reconstructed video relative to the VVC standard encoder can be calculated, which measure the performance in terms of semantic accuracy at the same bit rate.

Specifically, with QPs of 22, 27, 32, and 37, the BD-miou and BD-rate of the reconstructed video of the embodiment of the present application relative to the reconstructed video of the VVC standard encoder can be calculated from the experimental results. Table 1 shows the performance of the video coding method of the embodiment of the present application in terms of semantic accuracy. Here, BD-miou represents the change in the semantic accuracy of the reconstructed video at the same bit rate: a BD-miou greater than 0 indicates that the semantic accuracy has improved, and a BD-miou less than 0 indicates that it has decreased. BD-rate represents the change in the coding bit rate at the same semantic accuracy: a BD-rate greater than 0 indicates that the bit rate has increased, and a BD-rate less than 0 indicates that the bit rate has decreased, that is, the coding efficiency has improved.

In terms of video fidelity, the PSNR and the coding bit rate of the reconstructed video are calculated from the encoded files. From the experimental results, the BD-rate and BD-PSNR of the reconstructed video relative to the VVC standard encoder are obtained, which measure the performance of the video encoding method of the embodiment of the present application, compared with the VVC standard encoder, in terms of video fidelity at the same bit rate.

Specifically, the BD-PSNR and BD-rate of the reconstructed video of the embodiment of the present application relative to the reconstructed video of the VVC standard encoder are calculated from the experimental results to measure the performance of the video coding method of the embodiment of the present application in terms of fidelity; the experimental results are shown in Table 2. Here, BD-PSNR represents the change in the fidelity of the reconstructed video at the same bit rate: a BD-PSNR greater than 0 indicates that the fidelity has improved, and a BD-PSNR less than 0 indicates that it has decreased. BD-rate represents the change in the coding bit rate at the same fidelity: a BD-rate greater than 0 indicates that the bit rate has increased, and a BD-rate less than 0 indicates that the bit rate has decreased, that is, the coding efficiency has improved.

Table 1

	BD-miou	BD-rate
θ=0.75	0.0112	-24.8673

Table 2

	BD-PSNR	BD-rate
θ=0.75	0.0316	-1.0836
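To make the meaning of BD-rate concrete, the following is a simplified sketch of how an average bit-rate difference at equal quality can be computed from two rate-quality curves. It is not the procedure used to produce the tables: the usual Bjøntegaard calculation fits a cubic polynomial to each curve, whereas this sketch substitutes piecewise-linear interpolation of log(rate) so that it needs only the standard library. All function names and the sample points in the usage note are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (bit rate, quality), e.g. (kbps, PSNR) or (kbps, mIoU)


def _interp(x: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Piecewise-linear interpolation of ys over strictly increasing xs."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x lies outside the sampled quality range")


def _mean_log_rate(points: Sequence[Point], lo: float, hi: float,
                   steps: int = 1000) -> float:
    """Average of log(rate) over the quality interval [lo, hi] (trapezoidal rule)."""
    ordered = sorted(points, key=lambda p: p[1])
    quality = [q for _, q in ordered]
    log_rate = [math.log(r) for r, _ in ordered]
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = min(lo + i * h, hi)  # guard against rounding past hi
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * _interp(x, quality, log_rate)
    return total * h / (hi - lo)


def bd_rate(anchor: Sequence[Point], test: Sequence[Point]) -> float:
    """Average bit-rate change (percent) of `test` versus `anchor` at equal quality."""
    lo = max(min(q for _, q in anchor), min(q for _, q in test))
    hi = min(max(q for _, q in anchor), max(q for _, q in test))
    if hi <= lo:
        raise ValueError("the two curves do not overlap in quality")
    diff = _mean_log_rate(test, lo, hi) - _mean_log_rate(anchor, lo, hi)
    return (math.exp(diff) - 1.0) * 100.0
```

With four points per curve (one per QP), a test curve that reaches every quality level at a lower bit rate than the anchor yields a negative value, which matches the interpretation of a negative BD-rate given above.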

Further, according to the above experimental results, the video coding method of the embodiment of the present application achieves the following beneficial technical effects:

According to Table 1, it can be obtained that the BD-miou obtained according to the experimental results is 0.0112, indicating that the video coding method of the embodiment of the present application can improve the accuracy of semantic segmentation of reconstructed video under the same bit rate. In addition, in the case of a lower code rate, the overall semantic effect of the embodiments of the present application is better than that of the VVC standard encoder. That is to say, the embodiments of the present application can improve the semantic segmentation accuracy of the reconstructed video under the condition of the same bit rate.

It can also be obtained from Table 1 that the BD-rate obtained according to the experimental results is -24.8673, indicating that the video coding method of the embodiment of the present application can reduce the video coding bit rate under the same semantic accuracy. That is to say, the embodiments of the present application can reduce the code rate with the same semantic accuracy.

According to Table 2, it can be obtained that the BD-PSNR obtained according to the experimental results is 0.0316, indicating that the video coding method of the embodiment of the present application can improve the fidelity of the reconstructed video under the condition of the same bit rate. That is to say, the embodiments of the present application can improve the fidelity of the reconstructed video under the condition of the same bit rate.

According to Table 2, it can also be obtained that the BD-rate obtained according to the experimental results is -1.0836, indicating that the video encoding method of the embodiment of the present application can reduce the video encoding bit rate under the same fidelity. That is to say, the embodiments of the present application can reduce the code rate with the same fidelity.

In addition, after the video encoding method of the embodiment of the present application is applied, the PSNR performance of the reconstructed video is essentially not degraded compared with the VVC standard encoder. At lower bit rates, the subjective performance of the embodiment of the present application is better than that of VVC, which shows that the video coding method of the embodiment of the present application can ensure the fidelity of the reconstructed video while improving the semantic accuracy, and can satisfy the subjective viewing requirements for the video. That is to say, the embodiments of the present application can ensure the fidelity of the video while improving the semantic effect.

It should also be noted that the video encoding method of the embodiment of the present application optimizes the rate-distortion process in VVC, and does not change the video encoding and decoding process and code stream structure, and therefore does not increase the complexity of encoding and decoding. Moreover, the video coding method of the embodiment of the present application can also reduce the coding bit rate of the video, thereby shortening the time required for coding and improving the coding speed.

That is to say, for the human-machine vision application scenario of video semantic segmentation, the embodiment of the present application defines a semantic distortion metric and derives the corresponding first Lagrangian multiplier. Through preset parameters (including the first preset parameter and the second preset parameter), it adjusts the weights of the semantic distortion and the SSE distortion, as well as the weights of the first Lagrangian multiplier and the second Lagrangian multiplier, so as to optimize the rate-distortion process of video coding. As a result, the semantic segmentation accuracy of the reconstructed video can be improved at a given bit rate while good fidelity is maintained.

This embodiment provides a video encoding method applied to an encoder. Pre-parameters of the video to be encoded are determined, and a first Lagrangian multiplier and a second Lagrangian multiplier are determined according to the pre-parameters; a target Lagrangian multiplier is determined according to the first Lagrangian multiplier and the second Lagrangian multiplier; a first distortion value is determined according to a first distortion metric criterion, wherein the first distortion metric criterion includes a semantic distortion metric criterion; a second distortion value is determined according to a second distortion metric criterion, wherein the second distortion metric criterion includes a numerical error metric criterion; a target distortion value is determined according to the first distortion value and the second distortion value; and the target Lagrangian multiplier and the target distortion value are used to determine the encoding parameters of the video to be encoded, and the video to be encoded is encoded. In this way, the first distortion metric based on semantic distortion and the second distortion metric based on numerical error are jointly considered for rate-distortion optimization in video coding, which adapts well to machine-vision and human-machine-vision application scenarios. At a given bit rate, the semantic segmentation accuracy of the reconstructed video can be improved while good fidelity is maintained, thereby improving coding efficiency.

Based on the same inventive concept as the foregoing embodiments, see FIG. 7, which shows a schematic structural diagram of an encoder 70 provided by an embodiment of the present application. As shown in FIG. 7, the encoder 70 may include a determining unit 701, a calculation unit 702, and an encoding unit 703, wherein:

The determining unit 701 is configured to determine pre-parameters of the video to be encoded, and determine a first Lagrangian multiplier and a second Lagrangian multiplier according to the pre-parameters;

The calculation unit 702 is configured to determine a target Lagrangian multiplier according to the first Lagrangian multiplier and the second Lagrangian multiplier;

The determining unit 701 is further configured to determine a first distortion value according to a first distortion metric criterion, wherein the first distortion metric criterion includes a semantic distortion metric criterion; and determine a second distortion value according to a second distortion metric criterion, wherein the second distortion metric criterion includes a numerical error metric criterion;

The calculation unit 702 is further configured to determine a target distortion value according to the first distortion value and the second distortion value;

The encoding unit 703 is configured to use the target Lagrangian multiplier and the target distortion value to determine encoding parameters of the video to be encoded, and to encode the video to be encoded.

In some embodiments, the pre-parameter includes a quantization parameter, and the determining unit 701 is further configured to determine a quantization parameter of a coding unit in the to-be-encoded video, wherein the coding unit includes at least one of the following: an image, a slice, a sub-image, a tile, and a coding block.

In some embodiments, the determining unit 701 is further configured to determine a first calculation model parameter, where the first calculation model represents the correspondence between the first Lagrangian multiplier and the quantization parameter; and determine the first Lagrangian multiplier according to the quantization parameter and the first calculation model.

In some embodiments, the determining unit 701 is further configured to, in the first calculation model, set the first Lagrangian multiplier equal to a weighted value of an exponential power of the quantization parameter; wherein the first calculation model parameter includes a first exponent parameter indicating the exponential power and a first weighting coefficient indicating the weighting.

In some embodiments, the determining unit 701 is further configured to set the parameter of the first calculation model to a preset value.

In some embodiments, the determining unit 701 is further configured to: based on a test video, use the first distortion metric criterion to determine a first relationship function between the first distortion value and the bit rate of the test video; perform a derivative operation on the first relationship function to determine a derivative function of the first relationship function; based on the test video, determine a second relationship function between the bit rate and the quantization parameter; and determine the first calculation model parameter according to the derivative function and the second relationship function.

In some embodiments, the pre-parameter includes a quantization parameter and a target bit rate, and the determining unit 701 is further configured to determine a quantization parameter and a target bit rate of a coding unit in the to-be-encoded video, wherein the coding unit includes at least one of the following: an image, a slice, a sub-image, a tile, and a coding block.

In some embodiments, the determining unit 701 is further configured to determine a second calculation model parameter, where the second calculation model represents the correspondence between the first Lagrangian multiplier and the bit rate; and determine the first Lagrangian multiplier according to the target bit rate and the second calculation model.

In some embodiments, the determining unit 701 is further configured to, in the second calculation model, set the first Lagrangian multiplier equal to a weighted value of an exponential power of the target bit rate; wherein the second calculation model parameter includes a second exponent parameter indicating the exponential power and a second weighting coefficient indicating the weighting.

In some embodiments, the determining unit 701 is further configured to set the parameter of the second calculation model to a preset value.

In some embodiments, the determining unit 701 is further configured to, based on the test video, use the first distortion metric criterion to determine a first relationship function between the first distortion value and the bit rate of the test video; and A derivative operation is performed on the first relational function to determine the second calculation model parameter.
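As a worked illustration of how a derivative of the rate-distortion relationship can yield a model of this kind, suppose the first relationship function fitted on the test video has a hyperbolic form. This form, and the symbols C, K, α, β and g, are assumptions introduced here for illustration only; they are not taken from the equations of this application.

```latex
% Assumed (hypothetical) first relationship function, with C > 0 and K > 0 fitted on the test video
D_{miou}(R) = C \, R^{-K}

% The Lagrangian multiplier is the negative slope of the rate-distortion curve
\lambda_{miou} = -\frac{\partial D_{miou}}{\partial R}
               = C K \, R^{-K-1}
               = \alpha \, R^{\beta},
\qquad \alpha = C K, \quad \beta = -K - 1

% If a second relationship function R = g(QP) is also fitted, substituting it gives a model in QP
\lambda_{miou}(QP) = \alpha \, \bigl( g(QP) \bigr)^{\beta}
```

Under this assumption, α plays the role of a weighting coefficient and β the role of an exponent parameter, so the multiplier is a weighted power of the bit rate; substituting a fitted rate-QP relationship expresses the same multiplier as a function of the quantization parameter.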

In some embodiments, the determining unit 701 is further configured to determine the target bit rate of the coding unit in the to-be-coded video by using a bit allocation method.

In some embodiments, the determining unit 701 is further configured to determine the second Lagrangian multiplier according to a preset third calculation model; wherein the third calculation model represents the correspondence between the second Lagrangian multiplier and the quantization parameter.

In some embodiments, the determining unit 701 is further configured to use a rate control manner to determine the quantization parameter of the coding unit in the to-be-coded video.

In some embodiments, the determining unit 701 is further configured to set the quantization parameter to a preset value.

In some embodiments, the determining unit 701 is further configured to perform semantic segmentation on a test video and determine the semantic accuracy of one or more categories; and determine the target semantic accuracy according to the semantic accuracy of the one or more categories;

The calculation unit 702 is further configured to use a fourth calculation model to perform a distortion measurement on the target semantic accuracy to obtain the first distortion value.

In some embodiments, the calculating unit 702 is further configured to calculate a weighted sum of the semantic accuracy of the one or more categories, and determine the obtained weighted sum as the target semantic accuracy.

In some embodiments, the determining unit 701 is further configured to determine a fourth calculation model parameter, where the fourth calculation model represents the correspondence between the first distortion value and the target semantic accuracy; and obtain the first distortion value according to the target semantic accuracy and the fourth calculation model.

In some embodiments, the determining unit 701 is further configured to, in the fourth calculation model, set the first distortion value equal to a weighted value of the logarithm of the target semantic accuracy; wherein the fourth calculation model parameter includes a base parameter indicating the base of the logarithm and a fourth weighting coefficient indicating the weighting.

In some embodiments, the determining unit 701 is further configured to set the fourth calculation model parameter to a preset value.
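The computation of the first distortion value from per-category semantic accuracy can be sketched as follows. The function names are hypothetical, the default of uniform category weights (which reduces the target accuracy to a plain mean) is an assumption, and so is the default weighting coefficient of -1, chosen here only so that lower accuracy gives a larger distortion.

```python
import math
from typing import Optional, Sequence


def target_semantic_accuracy(per_class_accuracy: Sequence[float],
                             weights: Optional[Sequence[float]] = None) -> float:
    """Weighted sum of per-category accuracies (uniform weights if none are given)."""
    n = len(per_class_accuracy)
    if n == 0:
        raise ValueError("at least one category is required")
    if weights is None:
        weights = [1.0 / n] * n  # assumption: plain mean over categories
    if len(weights) != n:
        raise ValueError("one weight per category is required")
    return sum(w * a for w, a in zip(weights, per_class_accuracy))


def semantic_distortion(accuracy: float, weight: float = -1.0,
                        base: float = math.e) -> float:
    """Weighted logarithm of the target semantic accuracy.

    `weight` and `base` stand for the fourth weighting coefficient and the
    base parameter; their default values here are illustrative assumptions.
    """
    if not 0.0 < accuracy <= 1.0:
        raise ValueError("accuracy must lie in (0, 1]")
    return weight * math.log(accuracy, base)
```

With a negative weighting coefficient, a perfect accuracy of 1 maps to zero distortion and the distortion grows as the accuracy falls, which is the behavior a distortion measure needs inside a rate-distortion cost.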

In some embodiments, the determining unit 701 is further configured to determine a fifth calculation model parameter, where the fifth calculation model represents a third relationship function between the first distortion value and the mean square error; determine a target mean square error of a coding unit in the to-be-encoded video; and determine the first distortion value according to the target mean square error and the fifth calculation model.

In some embodiments, the determining unit 701 is further configured to, in the fifth calculation model, set the first distortion value equal to the sum of the second parameter factor and the product of the target mean square error and the first parameter factor; wherein the fifth calculation model parameter includes the first parameter factor and the second parameter factor.

In some embodiments, the determining unit 701 is further configured to set the parameter of the fifth calculation model to a preset value.

In some embodiments, the determining unit 701 is further configured to, based on the test video, use the first distortion metric criterion to determine a third relationship function between the first distortion value and the mean square error of the test video; and determining the fifth calculation model parameter according to the third relational function.
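The linear fifth calculation model and one possible way of obtaining its two parameter factors from test-video measurements can be sketched as follows. The use of an ordinary least-squares fit is an assumption (the text only states that the parameters are determined from the third relationship function), and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def fit_fifth_model(mse_values: Sequence[float],
                    semantic_distortions: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of distortion = first_factor * MSE + second_factor."""
    n = len(mse_values)
    if n < 2 or n != len(semantic_distortions):
        raise ValueError("need at least two paired samples")
    mean_x = sum(mse_values) / n
    mean_y = sum(semantic_distortions) / n
    sxx = sum((x - mean_x) ** 2 for x in mse_values)
    if sxx == 0.0:
        raise ValueError("MSE samples must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(mse_values, semantic_distortions))
    first_factor = sxy / sxx
    second_factor = mean_y - first_factor * mean_x
    return first_factor, second_factor


def first_distortion_from_mse(target_mse: float, first_factor: float,
                              second_factor: float) -> float:
    """Apply the linear model to the target mean square error of a coding unit."""
    return first_factor * target_mse + second_factor
```

A model of this kind lets the encoder estimate the semantic distortion of a coding unit from its mean square error alone, without running semantic segmentation for every candidate during rate-distortion optimization.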

In some embodiments, the determining unit 701 is further configured to determine a reconstruction value of a coding unit in the video, wherein the coding unit includes at least one of the following: an image, a slice, a sub-image, a tile, and a coding block;

The calculation unit 702 is further configured to, based on the numerical error criterion, determine the second distortion value according to the reconstructed value and the original value of the coding unit; wherein the numerical error criterion is one of the following: a sum of absolute errors criterion, a mean absolute error criterion, a sum of squared errors criterion, and a mean squared error criterion.
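The four numerical error criteria listed above can be sketched as follows for a coding unit whose original and reconstructed sample values are given as flat sequences. The function name and the short criterion labels ("sad", "mad", "sse", "mse") are hypothetical.

```python
from typing import Sequence


def numerical_distortion(original: Sequence[float],
                         reconstructed: Sequence[float],
                         criterion: str = "sse") -> float:
    """Second distortion value between original and reconstructed samples."""
    if len(original) == 0 or len(original) != len(reconstructed):
        raise ValueError("inputs must be non-empty and of equal length")
    diffs = [o - r for o, r in zip(original, reconstructed)]
    if criterion == "sad":   # sum of absolute errors
        return float(sum(abs(d) for d in diffs))
    if criterion == "mad":   # mean absolute error
        return sum(abs(d) for d in diffs) / len(diffs)
    if criterion == "sse":   # sum of squared errors
        return float(sum(d * d for d in diffs))
    if criterion == "mse":   # mean squared error
        return sum(d * d for d in diffs) / len(diffs)
    raise ValueError(f"unknown criterion: {criterion}")
```

For example, original samples [10, 20, 30, 40] and reconstructed samples [12, 18, 30, 44] differ by -2, 2, 0 and -4, so the four criteria evaluate to 8, 2, 24 and 6 respectively.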

In some embodiments, the determining unit 701 is further configured to determine a first preset parameter; wherein the first preset parameter is used to control the weight values corresponding to the first Lagrangian multiplier and the second Lagrangian multiplier;

The calculation unit 702 is further configured to perform a weighted calculation on the first Lagrangian multiplier and the second Lagrangian multiplier by using the first preset parameter to obtain the target Lagrangian multiplier.

In some embodiments, referring to FIG. 7 , the encoder 70 may further include a configuration unit 704 configured to set the first preset parameter according to the configuration information of the encoder.

In some embodiments, the configuration unit 704 is further configured to, when the configuration information of the encoder indicates that the first preset parameter is equal to k, set the target Lagrangian multiplier equal to the weighted sum of the first Lagrangian multiplier and the second Lagrangian multiplier, where k is any value greater than or equal to 0 and less than or equal to 1, the weighting coefficient of the first Lagrangian multiplier is set equal to 1-k, and the weighting coefficient of the second Lagrangian multiplier is set equal to k.

In some embodiments, the value of k is equal to 0.75.

In some embodiments, the determining unit 701 is further configured to determine a second preset parameter; wherein the second preset parameter is used to control the weight value corresponding to the first distortion value and the second distortion value;

The calculation unit 702 is further configured to perform weighted calculation on the first distortion value and the second distortion value by using the second preset parameter to obtain the target distortion value.

In some embodiments, the configuration unit 704 is further configured to set the second preset parameter according to the configuration information of the encoder.

In some embodiments, the configuration unit 704 is further configured to, when the configuration information of the encoder indicates that the second preset parameter is equal to m, set the target distortion value equal to the weighted sum of the first distortion value and the second distortion value, where m is any value greater than or equal to 0 and less than or equal to 1, the weighting coefficient of the first distortion value is set equal to 1-m, and the weighting coefficient of the second distortion value is set equal to m.

In some embodiments, the value of m is equal to 0.75.
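The two weighted sums described above share the same form, so both can be sketched with a single helper. The function name is hypothetical, and the numeric inputs in the usage note are made-up values for illustration only.

```python
def weighted_combination(first: float, second: float, weight: float) -> float:
    """Return (1 - weight) * first + weight * second for a weight in [0, 1].

    With weight = k this combines the first and second Lagrangian multipliers
    into the target Lagrangian multiplier; with weight = m it combines the
    first and second distortion values into the target distortion value.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (1.0 - weight) * first + weight * second
```

For example, with k = m = 0.75, a first value of 4.0 and a second value of 8.0 combine to 0.25 × 4.0 + 0.75 × 8.0 = 7.0. A weight of 0 keeps only the first (semantic) term and a weight of 1 keeps only the second (numerical error) term.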

In some embodiments, the determining unit 701 is further configured to construct a rate-distortion cost function based on the target Lagrangian multiplier and the target distortion value; pre-encode the video to be encoded using one or more candidate encoding parameters to determine the rate-distortion cost value corresponding to each of the one or more candidate encoding parameters; and select a minimum rate-distortion cost value from the determined rate-distortion cost values, and determine the candidate encoding parameter corresponding to the minimum rate-distortion cost value as the encoding parameter of the video to be encoded.

In some embodiments, the encoding parameters include at least a parameter indicating how the video to be encoded is divided and a parameter constructing a predictor of an encoded block in the video to be encoded.

In some embodiments, referring to FIG. 7 , the encoder 70 may further include a writing unit 705 configured to write the encoding parameters into the code stream.

It can be understood that, in the embodiments of the present application, a "unit" may be a part of a circuit, a part of a processor, a part of a program or software, etc., of course, it may also be a module, and it may also be non-modular. Moreover, each component in this embodiment may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above-mentioned integrated units can be implemented in the form of hardware, or can be implemented in the form of software function modules.

If the integrated unit is implemented in the form of a software function module and is not sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this embodiment, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method described in this embodiment. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disk.

Therefore, an embodiment of the present application provides a computer storage medium applied to the encoder 70, where the computer storage medium stores a computer program which, when executed by at least one processor, implements the steps of the method in any one of the foregoing embodiments.

Based on the composition of the encoder 70 and the computer storage medium described above, see FIG. 8, which shows a specific hardware structure example of the encoder 70 provided by the embodiment of the present application, which may include a communication interface 801, a memory 802, and a processor 803; the components are coupled together through a bus system 804. It will be appreciated that the bus system 804 is used to implement connection and communication between these components. In addition to a data bus, the bus system 804 also includes a power bus, a control bus, and a status signal bus. However, for clarity of illustration, the various buses are all labeled as the bus system 804 in FIG. 8. The communication interface 801 is used for receiving and sending signals in the process of exchanging information with other external network elements;

a memory 802 for storing computer programs that can be executed on the processor 803;

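The processor 803 is configured to, when running the computer program, execute: determining pre-parameters of a video to be encoded, and determining a first Lagrangian multiplier and a second Lagrangian multiplier according to the pre-parameters; determining a target Lagrangian multiplier according to the first Lagrangian multiplier and the second Lagrangian multiplier; determining a first distortion value according to a first distortion metric criterion, wherein the first distortion metric criterion includes a semantic distortion metric criterion; determining a second distortion value according to a second distortion metric criterion, wherein the second distortion metric criterion includes a numerical error metric criterion; determining a target distortion value according to the first distortion value and the second distortion value; and using the target Lagrangian multiplier and the target distortion value to determine encoding parameters of the video to be encoded, and encoding the video to be encoded.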

It can be understood that the memory 802 in this embodiment of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or a flash memory. The volatile memory may be a random access memory (Random Access Memory, RAM), which acts as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as static RAM (Static RAM, SRAM), dynamic RAM (Dynamic RAM, DRAM), synchronous DRAM (Synchronous DRAM, SDRAM), double data rate synchronous DRAM (Double Data Rate SDRAM, DDRSDRAM), enhanced synchronous DRAM (Enhanced SDRAM, ESDRAM), synchronous link DRAM (Synchronous link DRAM, SLDRAM), and direct Rambus RAM (Direct Rambus RAM, DRRAM). The memory 802 of the systems and methods described herein is intended to include, but not be limited to, these and any other suitable types of memory.

The processor 803 may be an integrated circuit chip with signal processing capability. In the implementation process, each step of the above-mentioned method can be completed by an integrated logic circuit of hardware in the processor 803 or by instructions in the form of software. The processor 803 may be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of this application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in conjunction with the embodiments of the present application may be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor. The software modules may be located in a storage medium that is mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 802, and the processor 803 reads the information in the memory 802 and completes the steps of the above method in combination with its hardware.

It will be appreciated that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or a combination thereof. For a hardware implementation, the processing unit can be implemented in one or more application-specific integrated circuits (Application Specific Integrated Circuit, ASIC), digital signal processors (Digital Signal Processor, DSP), digital signal processing devices (DSP Device, DSPD), programmable logic devices (Programmable Logic Device, PLD), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions described herein, or a combination thereof.

For a software implementation, the techniques described herein may be implemented through modules (e.g., procedures, functions, etc.) that perform the functions described herein. Software code may be stored in a memory and executed by a processor. The memory can be implemented in the processor or external to the processor.

Optionally, as another embodiment, the processor 803 is further configured to execute the steps of the method in any one of the foregoing embodiments when running the computer program.

This embodiment provides an encoder that includes a determination unit, a calculation unit, and an encoding unit. The determination unit is configured to determine pre-parameters of a video to be encoded, and to determine a first Lagrangian multiplier and a second Lagrangian multiplier according to the pre-parameters. The calculation unit is configured to determine a target Lagrangian multiplier according to the first Lagrangian multiplier and the second Lagrangian multiplier. The determination unit is further configured to determine a first distortion value according to a first distortion metric criterion, which includes a semantic distortion metric criterion, and to determine a second distortion value according to a second distortion metric criterion, which includes a numerical error metric criterion. The calculation unit is further configured to determine a target distortion value according to the first distortion value and the second distortion value. The encoding unit is configured to use the target Lagrangian multiplier and the target distortion value to determine the encoding parameters of the video to be encoded, and to encode the video to be encoded. In this way, rate-distortion optimization in video coding jointly considers the first distortion metric criterion, based on semantic distortion, and the second distortion metric criterion, based on numerical error. The encoder is therefore well suited to machine-vision and human-machine vision application scenarios: at a given bit rate, it can improve the semantic segmentation accuracy of the reconstructed video while maintaining good fidelity, thereby also improving coding efficiency.

FIG. 9 shows a schematic structural diagram of a video system provided by an embodiment of the present application. As shown in FIG. 9, the video system 90 may include an encoder 901 and a decoder 902. The encoder 901 may be the encoder 70 described in any one of the foregoing embodiments.

The encoder 901 is configured to: determine pre-parameters of the video to be encoded, and determine a first Lagrangian multiplier and a second Lagrangian multiplier according to the pre-parameters; determine a target Lagrangian multiplier according to the first Lagrangian multiplier and the second Lagrangian multiplier; determine a first distortion value according to a first distortion metric criterion, wherein the first distortion metric criterion includes a semantic distortion metric criterion; determine a second distortion value according to a second distortion metric criterion, wherein the second distortion metric criterion includes a numerical error metric criterion; determine a target distortion value according to the first distortion value and the second distortion value; and use the target Lagrangian multiplier and the target distortion value to determine the encoding parameters of the video to be encoded, encode the video to be encoded to generate a code stream, and transmit the code stream to the decoder 902.

The decoder 902 is configured to parse the code stream to obtain a decoded video.

Further, in some embodiments, the decoder 902 is further configured to parse the code stream to obtain decoding parameters, and to obtain the decoded video according to the decoding parameters, wherein the decoding parameters include at least a parameter indicating the division mode of the video to be decoded and a parameter for constructing the predicted values of the decoded blocks in the video to be decoded.

In the embodiment of the present application, the video system 90 jointly considers the first distortion metric criterion, based on semantic distortion, and the second distortion metric criterion, based on numerical error, when performing rate-distortion optimization in video coding. It is therefore well suited to machine-vision and human-machine vision application scenarios: at a given bit rate, it can improve the semantic segmentation accuracy of the reconstructed video while maintaining good fidelity, thereby improving coding efficiency.

It should also be noted that, in this application, the terms "comprising", "including" or any other variation thereof are intended to cover non-exclusive inclusion, such that a process, method, article or apparatus comprising a series of elements includes not only those elements, but also other elements not expressly listed, or elements inherent to such a process, method, article or apparatus. Without further limitation, an element qualified by the phrase "comprising a..." does not preclude the presence of additional identical elements in the process, method, article or apparatus that includes the element.

The above-mentioned serial numbers of the embodiments of the present application are only for description, and do not represent the advantages or disadvantages of the embodiments.

The methods disclosed in the several method embodiments provided in this application can be arbitrarily combined under the condition of no conflict to obtain new method embodiments.

The features disclosed in the several product embodiments provided in this application can be combined arbitrarily without conflict to obtain a new product embodiment.

The features disclosed in several method or device embodiments provided in this application can be combined arbitrarily without conflict to obtain new method embodiments or device embodiments.

The above are only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive of within the technical scope disclosed in the present application should be covered within the protection scope of this application. Therefore, the protection scope of the present application should be subject to the protection scope of the claims.

Industrial Applicability

In the embodiment of the present application, the pre-parameters of the video to be encoded are determined, and a first Lagrangian multiplier and a second Lagrangian multiplier are determined according to the pre-parameters; a target Lagrangian multiplier is determined according to the first Lagrangian multiplier and the second Lagrangian multiplier; a first distortion value is determined according to a first distortion metric criterion, wherein the first distortion metric criterion includes a semantic distortion metric criterion; a second distortion value is determined according to a second distortion metric criterion, wherein the second distortion metric criterion includes a numerical error metric criterion; a target distortion value is determined according to the first distortion value and the second distortion value; and the target Lagrangian multiplier and the target distortion value are used to determine the encoding parameters of the video to be encoded, and the video to be encoded is encoded. In this way, rate-distortion optimization in video coding jointly considers the first distortion metric criterion, based on semantic distortion, and the second distortion metric criterion, based on numerical error. The method is therefore well suited to machine-vision and human-machine vision application scenarios: at a given bit rate, it can improve the semantic segmentation accuracy of the reconstructed video while maintaining good fidelity, thereby also improving coding efficiency.

Claims

A video coding method, applied to an encoder, the method comprising:

determining pre-parameters of the video to be encoded, and determining a first Lagrangian multiplier and a second Lagrangian multiplier according to the pre-parameters;

determining a target Lagrangian multiplier according to the first Lagrangian multiplier and the second Lagrangian multiplier;

determining a first distortion value according to a first distortion metric criterion, wherein the first distortion metric criterion includes a semantic distortion metric criterion;

determining a second distortion value according to a second distortion metric criterion, wherein the second distortion metric criterion includes a numerical error metric criterion;

determining a target distortion value according to the first distortion value and the second distortion value;

using the target Lagrangian multiplier and the target distortion value to determine the encoding parameters of the video to be encoded, and encoding the video to be encoded.
The method according to claim 1, wherein the pre-parameter comprises a quantization parameter, and the determining the pre-parameter of the video to be encoded comprises:

determining a quantization parameter of a coding unit in the video to be encoded, wherein the coding unit includes at least one of the following: an image, a slice, a sub-image, a tile, and a coding block.
The method according to claim 2, wherein the determining the first Lagrangian multiplier according to the pre-parameter comprises:

determining a first calculation model parameter, the first calculation model representing the correspondence between the first Lagrangian multiplier and the quantization parameter;

The first Lagrangian multiplier is determined according to the quantization parameter and the first calculation model.
The method according to claim 3, wherein said determining the first calculation model parameter comprises:

in the first calculation model, setting the first Lagrangian multiplier equal to a weighted value of an exponential power of the quantization parameter;

wherein the first calculation model parameter includes a first exponential parameter indicating the exponential power and a first weighting coefficient indicating the weighting.
The method of claim 4, wherein the method further comprises:

The first calculation model parameter is set to a preset value.
The method of claim 4, wherein the method further comprises:

based on a test video, using the first distortion metric criterion to determine a first relationship function between the first distortion value and the bit rate of the test video, and performing a derivative operation on the first relationship function to determine the derivative function of the first relationship function;

based on the test video, determining a second relationship function between the bit rate and the quantization parameter, and determining the first calculation model parameter according to the derivative function and the second relationship function.
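
As an illustration of how the two relationship functions might combine, the following derivation assumes power-law fits for both. The source does not state the functional forms, so the constants $a$, $b$, $c$, $d$ and the fitted shapes are assumptions made only for this sketch. The Lagrangian multiplier is taken, as is conventional in rate-distortion optimization, to be the negative slope of the distortion-rate curve.

```latex
\begin{align*}
D_s(R) &= a\,R^{b}, \quad a > 0,\ b < 0
  && \text{(assumed first relationship function)} \\
R(QP) &= c\,QP^{-d}, \quad c > 0,\ d > 0
  && \text{(assumed second relationship function)} \\
\lambda_s &= -\frac{\mathrm{d}D_s}{\mathrm{d}R} = -a\,b\,R^{\,b-1}
  && \text{(derivative function)} \\
\lambda_s(QP) &= -a\,b\,c^{\,b-1}\,QP^{\,d(1-b)} = \alpha \cdot QP^{\beta}
  && \text{with } \alpha = -a\,b\,c^{\,b-1},\ \beta = d\,(1-b)
\end{align*}
```

Under these assumptions the multiplier takes a weighted-power form in the quantization parameter, so a weighting coefficient and an exponent parameter could both be read off the two fits. Other fitted shapes (for example an exponential rate model) would lead to a different closed form.
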
The method according to claim 1, wherein the pre-parameter includes a quantization parameter and a target bit rate, and the determining the pre-parameter of the video to be encoded comprises:

A quantization parameter and a target code rate of a coding unit in the to-be-coded video are determined, wherein the coding unit includes at least one of the following: an image, a slice, a sub-image, a tile, and a coding block.
The method according to claim 7, wherein the determining the first Lagrangian multiplier according to the pre-parameter comprises:

determining a second calculation model parameter, the second calculation model representing the correspondence between the first Lagrange multiplier and the code rate;

The first Lagrangian multiplier is determined according to the target code rate and the second calculation model.
The method according to claim 8, wherein said determining the second calculation model parameter comprises:

in the second calculation model, setting the first Lagrangian multiplier equal to a weighted value of an exponential power of the target code rate;

wherein the second calculation model parameter includes a second exponential parameter indicating the exponential power and a second weighting coefficient indicating the weighting.
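
A minimal sketch of the second calculation model, assuming the phrase "weighted value of an exponential power of the target code rate" means the power-law form lambda = alpha * R ** beta. The function name, the argument names and the numeric values in the usage note are hypothetical and are not taken from the source.

```python
def first_lagrangian_from_rate(target_rate: float, alpha: float, beta: float) -> float:
    """Evaluate the assumed model: lambda = alpha * target_rate ** beta.

    alpha plays the role of the second weighting coefficient and beta the
    role of the second exponential parameter (both hypothetical values).
    """
    if target_rate <= 0:
        raise ValueError("target_rate must be positive")
    return alpha * target_rate ** beta
```

For example, `first_lagrangian_from_rate(0.25, alpha=2.0, beta=-0.5)` evaluates 2.0 * 0.25 ** -0.5. With a negative exponent, the multiplier grows as the target code rate shrinks, which matches the usual behaviour of rate-lambda models.
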
The method of claim 9, wherein the method further comprises:

The second calculation model parameter is set to a preset value.
The method of claim 9, wherein the method further comprises:

Based on the test video, using the first distortion metric, determine a first relationship function between the first distortion value and the bit rate of the test video;

A derivative operation is performed on the first relational function to determine the second calculation model parameter.
The method according to claim 7, wherein the determining the target bit rate of the coding unit in the to-be-coded video comprises:

The target code rate of the coding unit in the to-be-coded video is determined by using a bit allocation method.
The method according to claim 2 or 7, wherein the determining the second Lagrangian multiplier according to the pre-parameter comprises:

The second Lagrangian multiplier is determined according to a preset third calculation model; wherein, the third calculation model represents the corresponding relationship between the second Lagrangian multiplier and the quantization parameter.
The method according to claim 2 or 7, wherein the determining the quantization parameter of the coding unit in the to-be-coded video comprises:

The quantization parameter of the coding unit in the to-be-coded video is determined by using a rate control method.
The method according to claim 2 or 7, wherein the determining the quantization parameter of the coding unit in the to-be-coded video comprises:

The quantization parameter is set to a preset value.
The method according to claim 1, wherein the determining the first distortion value according to the first distortion metric criterion comprises:

Based on the test video, semantically segment the test video to determine the semantic accuracy of one or more categories;

determining the target semantic accuracy according to the semantic accuracy of the one or more categories;

performing distortion measurement on the target semantic accuracy by using a fourth calculation model to obtain the first distortion value.
The method of claim 16, wherein the determining the target semantic accuracy according to the semantic accuracy of one or more categories comprises:

A weighted sum of the semantic accuracy of the one or more categories is calculated, and the resulting weighted sum is determined as the target semantic accuracy.
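
The weighted sum can be sketched as follows. The class names, the per-class weights and the choice of accuracy measure are assumptions for illustration; the source only requires a weighted sum over one or more categories.

```python
from typing import Mapping


def target_semantic_accuracy(
    class_accuracy: Mapping[str, float],
    class_weight: Mapping[str, float],
) -> float:
    """Return the weighted sum of per-category semantic accuracy values.

    Every category in class_accuracy must have a weight in class_weight.
    """
    missing = set(class_accuracy) - set(class_weight)
    if missing:
        raise KeyError(f"no weight for categories: {sorted(missing)}")
    return sum(class_weight[name] * acc for name, acc in class_accuracy.items())
```

With hypothetical accuracies `{"road": 0.9, "car": 0.8, "person": 0.6}` and weights `{"road": 0.5, "car": 0.25, "person": 0.25}`, the categories contribute 0.45, 0.2 and 0.15 respectively. Choosing weights that sum to 1 keeps the result in the same range as the per-category values.
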
The method according to claim 16, wherein the performing distortion measurement on the target semantic accuracy by using a fourth calculation model to obtain the first distortion value comprises:

determining the fourth calculation model parameter, the fourth calculation model representing the correspondence between the first distortion value and the target semantic accuracy;

The first distortion value is obtained according to the target semantic accuracy and the fourth calculation model.
The method of claim 18, wherein said determining a fourth calculation model parameter comprises:

in the fourth calculation model, setting the first distortion value equal to a weighted value of the logarithm of the target semantic accuracy;

wherein the fourth calculation model parameter includes a base parameter indicating the base of the logarithm and a fourth weighting coefficient indicating the weighting.
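
A minimal sketch of the fourth calculation model, read as D = weight * log_base(accuracy). The sign convention is an assumption: because the logarithm of an accuracy in (0, 1] is non-positive, a negative weight is used here so that lower accuracy yields a larger distortion. The function and argument names are hypothetical.

```python
import math


def semantic_distortion(target_accuracy: float, log_base: float, weight: float) -> float:
    """Evaluate the assumed model: weight * log(target_accuracy, log_base)."""
    if not 0.0 < target_accuracy <= 1.0:
        raise ValueError("target_accuracy must lie in (0, 1]")
    if log_base <= 0.0 or log_base == 1.0:
        raise ValueError("log_base must be positive and different from 1")
    return weight * math.log(target_accuracy, log_base)
```

With `log_base=2.0` and `weight=-1.0`, a perfect accuracy of 1.0 maps to zero distortion, and the distortion increases as the accuracy falls.
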
The method of claim 19, wherein the method further comprises:

The fourth calculation model parameter is set to a preset value.
The method according to claim 1, wherein the determining the first distortion value according to the first distortion metric criterion comprises:

determining a fifth calculation model parameter, the fifth calculation model representing a third relationship function between the first distortion value and the mean square error;

The target mean square error of the coding unit in the video to be encoded is determined, and the first distortion value is determined according to the target mean square error and the fifth calculation model.
The method of claim 21, wherein said determining a fifth computational model parameter comprises:

in the fifth calculation model, setting the first distortion value equal to the sum of the second parameter factor and the product of the target mean square error and the first parameter factor;

The fifth calculation model parameter includes the first parameter factor and the second parameter factor.
The method of claim 22, wherein the method further comprises:

The fifth calculation model parameter is set to a preset value.
The method of claim 22, wherein the method further comprises:

Based on the test video, using the first distortion metric, determine a third relationship function between the first distortion value and the mean square error of the test video;

The fifth calculation model parameter is determined according to the third relational function.
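
One way to obtain the two parameter factors from test-video measurements is an ordinary least-squares line fit. The source does not say how the third relationship function is fitted, so least squares, the function names and the sample data in the usage note are all assumptions of this sketch.

```python
from typing import Sequence, Tuple


def fit_fifth_model(
    mse_values: Sequence[float],
    first_distortions: Sequence[float],
) -> Tuple[float, float]:
    """Least-squares fit of first_distortion = p1 * mse + p2; returns (p1, p2)."""
    n = len(mse_values)
    if n < 2 or n != len(first_distortions):
        raise ValueError("need at least two paired measurements")
    mean_x = sum(mse_values) / n
    mean_y = sum(first_distortions) / n
    sxx = sum((x - mean_x) ** 2 for x in mse_values)
    if sxx == 0:
        raise ValueError("mse_values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(mse_values, first_distortions))
    p1 = sxy / sxx              # first parameter factor (slope)
    p2 = mean_y - p1 * mean_x   # second parameter factor (offset)
    return p1, p2


def first_distortion_from_mse(target_mse: float, p1: float, p2: float) -> float:
    """Apply the fitted linear model to a coding unit's target mean square error."""
    return p1 * target_mse + p2
```

For hypothetical measurements `mse_values=[2, 4, 6, 8]` and `first_distortions=[2, 3, 4, 5]`, which lie exactly on a line, the fit recovers a slope of 0.5 and an offset of 1.0. The fitted factors can then be reused for every coding unit without running semantic segmentation during encoding.
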
The method according to claim 1, wherein the determining the second distortion value according to the second distortion metric criterion comprises:

determining a reconstruction value of a coding unit in the video, wherein the coding unit includes at least one of the following: an image, a slice, a sub-image, a tile, and a coding block;

Based on the numerical error criterion, the second distortion value is determined according to the reconstructed value and the original value of the coding unit;

Wherein, the numerical error criterion is one of the following: absolute error sum criterion, mean absolute error criterion, error sum of square criterion, and mean square error criterion.
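
The four numerical error criteria listed above can be sketched as follows, assuming the original and reconstructed values of the coding unit are given as flat sequences of sample values of equal length (a two-dimensional block would be flattened first). The function names are chosen for illustration.

```python
from typing import Sequence


def _check(original: Sequence[float], reconstructed: Sequence[float]) -> None:
    if len(original) == 0 or len(original) != len(reconstructed):
        raise ValueError("inputs must be non-empty and of equal length")


def sum_abs_error(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Sum of absolute errors."""
    _check(original, reconstructed)
    return sum(abs(o - r) for o, r in zip(original, reconstructed))


def mean_abs_error(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum_abs_error(original, reconstructed) / len(original)


def sum_sq_error(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Sum of squared errors."""
    _check(original, reconstructed)
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed))


def mean_sq_error(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Mean square error."""
    return sum_sq_error(original, reconstructed) / len(original)
```

For the hypothetical samples `[10, 20, 30, 40]` and their reconstruction `[12, 18, 30, 44]`, the per-sample errors are -2, 2, 0 and -4.
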
The method according to claim 1, wherein the determining a target Lagrangian multiplier according to the first Lagrangian multiplier and the second Lagrangian multiplier comprises:

determining a first preset parameter; wherein, the first preset parameter is used to control weight values corresponding to the first Lagrangian multiplier and the second Lagrangian multiplier;

The first Lagrangian multiplier and the second Lagrangian multiplier are weighted and calculated by using the first preset parameter to obtain the target Lagrangian multiplier.
The method according to claim 26, wherein said determining the first preset parameter comprises:

The first preset parameter is set according to the configuration information of the encoder.
The method of claim 26, wherein the method further comprises:

When the configuration information of the encoder indicates that the first preset parameter is equal to k, the target Lagrangian multiplier is set equal to a weighted sum of the first Lagrangian multiplier and the second Lagrangian multiplier, where k is any value greater than or equal to 0 and less than or equal to 1, the weighting coefficient of the first Lagrangian multiplier is set equal to 1−k, and the weighting coefficient of the second Lagrangian multiplier is set equal to k.
The method of claim 28, wherein the value of k is equal to 0.75.
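
The weighting of the two multipliers can be written directly from the claim wording: the first multiplier receives the coefficient 1−k and the second receives k. The function and argument names below are illustrative, and the default of 0.75 is the value named in the claims.

```python
def target_lagrangian(lambda_first: float, lambda_second: float, k: float = 0.75) -> float:
    """Return (1 - k) * lambda_first + k * lambda_second for k in [0, 1]."""
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must lie in [0, 1]")
    return (1.0 - k) * lambda_first + k * lambda_second
```

Setting k to 0 uses only the first Lagrangian multiplier, and setting k to 1 uses only the second, so a single configuration value moves the encoder between the two extremes.
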
The method according to claim 1, wherein the determining a target distortion value according to the first distortion value and the second distortion value comprises:

determining a second preset parameter; wherein, the second preset parameter is used to control the weight value corresponding to the first distortion value and the second distortion value;

The first distortion value and the second distortion value are weighted and calculated by using the second preset parameter to obtain the target distortion value.
The method according to claim 30, wherein said determining the second preset parameter comprises:

The second preset parameter is set according to the configuration information of the encoder.
The method of claim 31, wherein the method further comprises:

When the configuration information of the encoder indicates that the second preset parameter is equal to m, the target distortion value is set equal to a weighted sum of the first distortion value and the second distortion value, where m is any value greater than or equal to 0 and less than or equal to 1, the weighting coefficient of the first distortion value is set equal to 1−m, and the weighting coefficient of the second distortion value is set equal to m.
The method of claim 32, wherein the value of m is equal to 0.75.
The method according to claim 1, wherein the determining the encoding parameter of the to-be-encoded video by using the target Lagrangian multiplier and the target distortion value comprises:

constructing a rate-distortion cost function based on the target Lagrangian multiplier and the target distortion value;

Using one or more candidate encoding parameters to pre-encode the video to be encoded, to determine the rate-distortion cost value corresponding to the one or more candidate encoding parameters;

A minimum rate-distortion cost value is selected from the determined rate-distortion cost values, and a candidate encoding parameter corresponding to the minimum rate-distortion cost value is determined as the encoding parameter of the video to be encoded.
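
The selection step can be sketched as below, assuming the conventional cost form J = D + lambda * R; the source states only that the cost function is built from the target Lagrangian multiplier and the target distortion value. The `Candidate` type, its field names and the numbers in the usage note are hypothetical, and the distortion and bit figures stand in for results that pre-encoding would produce.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Candidate:
    name: str                 # label of the candidate encoding parameter
    first_distortion: float   # semantic distortion after pre-encoding
    second_distortion: float  # numerical-error distortion after pre-encoding
    bits: float               # bits spent by pre-encoding


def rd_cost(c: Candidate, target_lambda: float, m: float = 0.75) -> float:
    """Assumed cost: ((1 - m) * D1 + m * D2) + target_lambda * bits."""
    target_distortion = (1.0 - m) * c.first_distortion + m * c.second_distortion
    return target_distortion + target_lambda * c.bits


def select_candidate(candidates: Iterable[Candidate], target_lambda: float,
                     m: float = 0.75) -> Candidate:
    """Return the candidate with the minimum rate-distortion cost."""
    return min(candidates, key=lambda c: rd_cost(c, target_lambda, m))
```

With two hypothetical candidates, `Candidate("A", 4.0, 8.0, 100.0)` and `Candidate("B", 2.0, 6.0, 200.0)`, a larger multiplier favours the cheaper candidate A, while a smaller multiplier favours the lower-distortion candidate B.
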
The method according to claim 1 or 34, wherein the encoding parameters include at least a parameter indicating a division manner of the to-be-encoded video and a parameter for constructing a prediction value of an encoded block in the to-be-encoded video.
The method of claim 35, wherein said encoding the video to be encoded comprises:

Write the encoding parameters into the code stream.
An encoder comprising a determination unit, a calculation unit and an encoding unit; wherein,

The determining unit is configured to determine pre-parameters of the video to be encoded, and determine a first Lagrangian multiplier and a second Lagrangian multiplier according to the pre-parameters;

the calculation unit is configured to determine a target Lagrangian multiplier according to the first Lagrangian multiplier and the second Lagrangian multiplier;

the determination unit is further configured to determine a first distortion value according to a first distortion metric criterion, wherein the first distortion metric criterion includes a semantic distortion metric criterion, and to determine a second distortion value according to a second distortion metric criterion, wherein the second distortion metric criterion includes a numerical error metric criterion;

the calculation unit is further configured to determine a target distortion value according to the first distortion value and the second distortion value;

The encoding unit is configured to use the target Lagrangian multiplier and the target distortion value to determine encoding parameters of the video to be encoded, and to encode the video to be encoded.
An encoder comprising a memory and a processor; wherein,

the memory for storing a computer program executable on the processor;

The processor is configured to execute the method according to any one of claims 1 to 36 when running the computer program.
A computer storage medium, wherein the computer storage medium stores a computer program which, when executed by at least one processor, implements the method according to any one of claims 1 to 36.
A video system, the video system includes an encoder and a decoder; wherein,

the encoder is configured to: determine pre-parameters of the video to be encoded, and determine a first Lagrangian multiplier and a second Lagrangian multiplier according to the pre-parameters; determine a target Lagrangian multiplier according to the first Lagrangian multiplier and the second Lagrangian multiplier; determine a first distortion value according to a first distortion metric criterion, wherein the first distortion metric criterion includes a semantic distortion metric criterion; determine a second distortion value according to a second distortion metric criterion, wherein the second distortion metric criterion includes a numerical error metric criterion; determine a target distortion value according to the first distortion value and the second distortion value; and use the target Lagrangian multiplier and the target distortion value to determine the encoding parameters of the video to be encoded, encode the video to be encoded to generate a code stream, and transmit the code stream to the decoder;

The decoder is configured to parse the code stream to obtain decoded video.
The system of claim 40, wherein,

The decoder is further configured to parse the code stream to obtain decoding parameters, and to obtain the decoded video according to the decoding parameters, wherein the decoding parameters include at least a parameter indicating the division mode of the video to be decoded and a parameter for constructing the predicted value of a decoded block in the video to be decoded.